We currently implement freelists for the following object and internal structs
str
float
tuple (about 20 freelists, which is really wasteful)
list
async generator
contexts
small dicts
small dict keys
slice (a freelist of one)
All of these are implemented independently and rather inefficiently.
They take up 3672 bytes of space, instead of the ~200 bytes they should take.
This is not a lot in terms of memory, but it is a lot in terms of L1 cache.
Only one test is needed for allocation and deallocation (on the fast path).
Allocation needs to test freelist.head != NULL. Deallocation needs to test freelist.space != 0.
The actual list is threaded through the objects on the list, terminated by NULL.
Cache locality is good. The head and space are adjacent, and 4 freelists fit in a single cache line.
When freeing, the object is hot (and thus in cache).
When allocating, the object is about to be used, so needs to be moved to cache anyway.
The capacity fields are there to allow the capacity of a freelist to be temporarily set to 0, ensuring that all allocations go through the main allocator, for use cases like tracemalloc. Currently tracemalloc doesn't see a lot of allocations, due to freelists.
Unifying the code for freelists reduces code duplication, and simplifies things for further improvements.
markshannon commentedDec 14, 2022
•
edited
We currently implement freelists for the following object and internal structs
All of these are implemented independently and rather inefficiently.
They take up 3672 bytes of space, instead of the ~200 bytes they should take.
This is not a lot in terms of memory, but it is a lot in terms of L1 cache.
A freelist should look like this:
Only one test is needed for allocation and deallocation (on the fast path).
Allocation needs to test
freelist.head != NULL
. Deallocation needs to testfreelist.space != 0
.The actual list is threaded through the objects on the list, terminated by
NULL
.Cache locality is good. The
head
andspace
are adjacent, and 4 freelists fit in a single cache line.When freeing, the object is hot (and thus in cache).
When allocating, the object is about to be used, so needs to be moved to cache anyway.
The
capacity
fields are there to allow the capacity of a freelist to be temporarily set to 0, ensuring that all allocations go through the main allocator, for use cases liketracemalloc
. Currentlytracemalloc
doesn't see a lot of allocations, due to freelists.Unifying the code for freelists reduces code duplication, and simplifies things for further improvements.
Original discussion
faster-cpython/ideas#132
The text was updated successfully, but these errors were encountered: