Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a principled, and consistent, implementation of freelists. #100240

Open
markshannon opened this issue Dec 14, 2022 · 0 comments
Open

Use a principled, and consistent, implementation of freelists. #100240

markshannon opened this issue Dec 14, 2022 · 0 comments
Assignees
Labels
performance Performance or resource usage

Comments

@markshannon
Copy link
Member

markshannon commented Dec 14, 2022

We currently implement freelists for the following object and internal structs

  • str
  • float
  • tuple (about 20 freelists, which is really wasteful)
  • list
  • async generator
  • contexts
  • small dicts
  • small dict keys
  • slice (a freelist of one)

All of these are implemented independently and rather inefficiently.
They take up 3672 bytes of space, instead of the ~200 bytes they should take.
This is not a lot in terms of memory, but it is a lot in terms of L1 cache.

A freelist should look like this:

typedef struct {
    void *head;
    uint32_t space;
    uint16_t current_capacity;
    uint16_t limit_capacity;
} _PyFreelist;

Only one test is needed for allocation and deallocation (on the fast path).
Allocation needs to test freelist.head != NULL. Deallocation needs to test freelist.space != 0.

The actual list is threaded through the objects on the list, terminated by NULL.

Cache locality is good. The head and space are adjacent, and 4 freelists fit in a single cache line.
When freeing, the object is hot (and thus in cache).
When allocating, the object is about to be used, so needs to be moved to cache anyway.

The capacity fields are there to allow the capacity of a freelist to be temporarily set to 0, ensuring that all allocations go through the main allocator, for use cases like tracemalloc. Currently tracemalloc doesn't see a lot of allocations, due to freelists.

Unifying the code for freelists reduces code duplication, and simplifies things for further improvements.

Original discussion

faster-cpython/ideas#132

@markshannon markshannon added the performance Performance or resource usage label Dec 14, 2022
@markshannon markshannon self-assigned this Dec 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage
Projects
None yet
Development

No branches or pull requests

1 participant