Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-45526: obmalloc radix use 64 addr bits #29062

Merged
merged 5 commits into from Oct 21, 2021

Conversation

nascheme
Copy link
Member

@nascheme nascheme commented Oct 19, 2021

In obmalloc, set ADDRESS_BITS to POINTER_BITS, rather than to 48. That is safer in the case that the kernel gives user-space virtual addresses that span a range greater than 48 bits.

https://bugs.python.org/issue45526

@methane
Copy link
Member

methane commented Oct 20, 2021

Update these lines too.

cpython/Objects/obmalloc.c

Lines 1283 to 1294 in 3163e68

/* radix tree for tracking arena usage
bit allocation for keys
64-bit pointers and 2^20 arena size:
16 -> ignored (POINTER_BITS - ADDRESS_BITS)
10 -> MAP_TOP
10 -> MAP_MID
8 -> MAP_BOT
20 -> ideal aligned arena
----
64

@nascheme
Copy link
Member Author

nascheme commented Oct 20, 2021

Update these lines too.

Done. I had meant to do that but forgot. I hopefully improved the comment a bit more.

/* Most current 64-bit processors are limited to 48-bit physical addresses. If
* the kernel never gives user-space a virtual memory address outside a certain
* range then ADDRESS_BITS can be safely set to something smaller than
* POINTER_BITS. If set smaller, the code assumes that the high bits of virtual
* addresses can be masked as zero when indexing into the radix tree. That
* reduces the amount of virtual memory used by the top and middle layers of
* the radix tree. Using POINTER_BITS is the safest setting and seems to have
* pretty minor overhead.
Copy link
Contributor

@ambv ambv Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused. You're essentially saying "if kernel only gives out addresses between 1 .. 2^48 then it's safe to set ADDRESS_BITS to a value lower than 48".

It seems to me that the opposite is true and the opposite is what you're in fact doing in your change. What am I missing?

Copy link
Member Author

@nascheme nascheme Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should probably improve the wording if it's confusing. If ADDRESS_BITS < POINTER_BITS, the code assumes the high bits of memory addresses can be ignored (i.e. upper 16 bits) when indexing into the radix tree memory map.

The comment is explaining why it might be safe to set ADDRESS_BITS to 48. I'm changing the default to 64 because that's the safer default assumption, even though it has a bit more overhead (mostly as virtual memory usage).

@jrtc27 has suggested that machines certainly exist which violate that assumption that makes ADDRESS_BITS=48 safe. Setting ADDRESS_BITS to 64 has some small additional overhead vs using 48. I don't like the idea of Python randomly crashing.

Copy link
Member Author

@nascheme nascheme Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ambv, is this comment better:

/* High bits of memory addresses that will be ignored when indexing into the
 * radix tree.  Setting this to zero is the safe default.  For most 64-bit
 * machines, setting this to 16 would be safe.  The kernel would not give
 * user-space virtual memory addresses that have significant information in
 * those high bits.  The main advantage to setting IGNORE_BITS > 0 is that less
 * virtual memory will be used for the top and middle radix tree arrays.  Those
 * arrays are allocated in the BSS segment and so will typically consume real
 * memory only if actually accessed.
 */
#define IGNORE_BITS 0

I would replace ADDRESS_BITS with POINTER_BITS-IGNORE_BITS. The name ADDRESS_BITS is confusing since the number of physical address bits on the machine is not really the key thing.

Copy link
Contributor

@ambv ambv Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is muuuch clearer, thanks!

Copy link
Contributor

@jrtc27 jrtc27 Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, not quite. You care about the number of bits in a virtual address, which is a 32-bit or 64-bit quantity that just happens to have some of the bits required to be copies of the others. On traditional architectures, POINTER_BITS is the number of bits in a virtual address, but this is not true for CHERI, and thus Arm's experimental Morello prototype; here, C language pointers are represented with hardware capabilities, which are unforgeable and consist of a standard integer address, plus bounds and other metadata (exactly what isn't important). This provides fine-grained spatial memory safety for C/C++ (and temporal with some tricks), but means that sizeof(void *) is 16 on 64-bit architectures, whilst only 8 of those bytes are the actual integer address, and are the only bits you're interested in here. So you do mean ADDRESS_BITS, not POINTER_BITS.

Improve the comment that explains what IGNORE_BITS means.
@ambv ambv merged commit 0224b71 into python:main Oct 21, 2021
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants