Description
I apologize if this has already been reported. I tried searching for it, but I couldn't find any related issues.
Bug report
Exactly as the title says: the hash of -1 is -2, but the hash of -2 is also -2.
>>> hash(-1)
-2
>>> hash(-2)
-2
I'm working on a project that makes heavy use of a custom cache implementation, and a function that takes a single integer argument was fetching the wrong value from the cache because -1 and -2 hash to the same value.
Other numeric types with the value -1 hash the same way:
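A minimal sketch of how this kind of failure can happen, assuming a cache that stores results under hash(arg) alone and never keeps the argument itself. HashKeyedCache is a hypothetical class for illustration, not the actual cache from my project:

```python
from typing import Any, Callable, Dict


class HashKeyedCache:
    """Hypothetical cache that stores results under hash(arg) only."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func
        self.store: Dict[int, Any] = {}

    def __call__(self, arg: Any) -> Any:
        key = hash(arg)  # only the hash is kept; the argument is discarded
        if key not in self.store:
            self.store[key] = self.func(arg)
        return self.store[key]


square = HashKeyedCache(lambda n: n * n)
first = square(-1)   # computes 1 and caches it under hash(-1), which is -2
second = square(-2)  # hash(-2) is also -2, so the cached 1 is returned
print(first, second)  # 1 1
```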
>>> import fractions, decimal
>>> hash(-1.0)
-2
>>> hash(fractions.Fraction(-1, 1))
-2
>>> hash(decimal.Decimal('-1'))
-2
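The Python documentation's "Hashing of numeric types" section describes a single scheme shared by int, float, Fraction and Decimal: the value is reduced modulo sys.hash_info.modulus, negated for negative numbers, and a result of -1 is replaced with -2. The sketch below is adapted from the hash_fraction example given there and shows why every numeric type reports -2 for the value -1 (and for -2):

```python
import sys
from decimal import Decimal
from fractions import Fraction


def hash_fraction(m: int, n: int) -> int:
    """Hash of the rational m / n (n > 0), following the documented scheme."""
    P = sys.hash_info.modulus
    # Remove common factors of P (unnecessary if m and n are already coprime).
    while m % P == n % P == 0:
        m, n = m // P, n // P
    if n % P == 0:
        hash_value = sys.hash_info.inf
    else:
        # Fermat's little theorem: pow(n, P - 2, P) is the inverse of n mod P.
        hash_value = (abs(m) % P) * pow(n, P - 2, P) % P
    if m < 0:
        hash_value = -hash_value
    if hash_value == -1:
        hash_value = -2  # -1 is reserved, so it is remapped to -2
    return hash_value


values = [-1, -1.0, Fraction(-1, 1), Decimal("-1")]
print([hash(v) for v in values], hash_fraction(-1, 1), hash_fraction(-2, 1))
```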
This is easy enough to work around in my current project, since I can just add a check for it in the cache implementation, but it is likely to cause other hard-to-find bugs in any project that relies on hashing numeric values.
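One way to add such a check, sketched under the assumption that the cache keeps the original argument next to each result and compares it for equality before returning a hit. CheckedCache is a hypothetical name; a plain dict keyed by the argument itself performs the same hash-then-equality check internally:

```python
from typing import Any, Callable, Dict, List, Tuple


class CheckedCache:
    """Hypothetical cache that buckets by hash but confirms equality."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func
        self.buckets: Dict[int, List[Tuple[Any, Any]]] = {}

    def __call__(self, arg: Any) -> Any:
        bucket = self.buckets.setdefault(hash(arg), [])
        for stored_arg, result in bucket:
            if stored_arg == arg:  # equality check resolves hash collisions
                return result
        result = self.func(arg)
        bucket.append((arg, result))
        return result


square = CheckedCache(lambda n: n * n)
print(square(-1), square(-2))  # 1 4
```

Both entries end up in the same bucket (hash -2), but each lookup only returns the result stored for an equal argument.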
I found this section of the source code for the long type (lines 3121 to 3132 in commit 4a1dd73):
The switch-case statement appears to be the culprit, but I don't have any experience with C, so I'm not 100% sure what this function returns. Specifically, I don't know what the expression after the ternary operator is supposed to return, since I couldn't locate the ob_digit implementation and I'm not familiar enough with C to infer anything about it.
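As a rough Python model of what the ternary operator mentioned above appears to compute, assuming the switch in long_hash dispatches on the signed digit count of the int (negative for negative values) and that ob_digit[0] holds the lowest digit of the magnitude. single_digit_hash is a hypothetical name, and this is a sketch of the observable behaviour, not a transcription of the C code:

```python
import sys

# Assumption: CPython stores an int's magnitude as base-2**k "digits", where k
# is reported by sys.int_info.bits_per_digit (30 on typical 64-bit builds).
DIGIT_BITS = sys.int_info.bits_per_digit


def single_digit_hash(n: int) -> int:
    """Hypothetical model of long_hash's fast path for one-digit ints."""
    magnitude = abs(n)
    if magnitude >= 1 << DIGIT_BITS:
        raise ValueError("n needs more than one digit")
    if n >= 0:
        # Non-negative one-digit ints (and 0) hash to themselves.
        return magnitude
    # Negative one-digit ints: -2 when the digit is 1 (i.e. n == -1),
    # otherwise the negated digit.
    return -2 if magnitude == 1 else -magnitude


for n in (-3, -2, -1, 0, 1, 2):
    print(n, single_digit_hash(n), hash(n))
```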
This comment in test_numeric_tower.py also indicates that -1 is intentionally treated as an invalid hash value (cpython/Lib/test/test_numeric_tower.py, lines 124 to 134 in commit f4c0348):
However, having the same hash value for both -1 and -2 still seems like a nasty bug to me, so even if hash(-1) should not return -1, it also should not return -2.
That comment mentions a bug in long_hash, which I'm guessing is at least related to this problem, so if this is already being tracked, please feel free to close this issue as a duplicate. I couldn't find any related issues, so I opened this one to be sure. :)
As an aside, users of the functools.lru_cache decorator may run into some very difficult-to-debug cache hits because of this, since functools.lru_cache hashes all arguments together as a tuple, which makes the problem less obvious than it was in my case (lines 432 to 446 in commit f9433ff).
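A quick check of the tuple-hashing point above: a tuple's hash is computed from its elements' hashes, so argument tuples that differ only by -1 versus -2 get identical hash values. The argument tuples here are arbitrary examples:

```python
# Single-argument and multi-argument tuples differing only by -1 vs. -2.
single_arg_match = hash((-1,)) == hash((-2,))
multi_arg_match = hash(("x", -1, 3.5)) == hash(("x", -2, 3.5))
print(single_arg_match, multi_arg_match)  # True True
```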
As previously stated, I'm not experienced with C, so I'm not sure, but the presence of the ternary operator in the long_hash implementation makes me wonder whether only certain environments get a hash value of -2 while others get the intended value.
Please let me know if there is any other system/environment information that I need to provide in order to make such a distinction. :)
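A small script that collects the interpreter details most likely to matter for this question, using only sys and platform from the standard library; the choice of fields is my own assumption about what is relevant:

```python
import platform
import sys

# Details that affect numeric hashing on this interpreter.
info = {
    "python": sys.version,
    "platform": platform.platform(),
    "machine": platform.machine(),
    "hash_width": sys.hash_info.width,
    "hash_modulus": sys.hash_info.modulus,
    "bits_per_digit": sys.int_info.bits_per_digit,
    "hash(-1)": hash(-1),
    "hash(-2)": hash(-2),
}
for name, value in info.items():
    print(f"{name}: {value}")
```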
Your environment
- CPython versions tested on:
  - 3.10.5 (main, Aug 1 2022, 07:53:20) [GCC 12.1.0] (provided by the Arch package repo)
  - 3.12.0a0 (heads/main:44f1f63ad5, Aug 4 2022, 18:35:48) [GCC 12.1.1 20220730] (built from source)
- Operating system and architecture: Arch Linux, kernel ver. 5.18.16, x86_64.