hash(-1) == -2 and hash(-2) == -2 #95695

Closed as not planned
@GrammAcc

I apologize if this has already been reported. I tried searching for it, but I couldn't find any related issues.

Bug report

Exactly as the title says: the hash of -1 is -2, but the hash of -2 is also -2.

>>> hash(-1)
-2
>>> hash(-2)
-2

I'm working on a project that makes heavy use of a custom cache implementation, and a function that accepts a single integer argument was fetching the wrong value from the cache because -1 and -2 hash to the same value.
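A collision like this only produces a wrong lookup when the cache uses the hash value itself as the key. The cache implementation is not shown in this report, so the following is a minimal sketch under that assumption; `HashKeyedCache` and `ObjectKeyedCache` are hypothetical names used for illustration only:

```python
class HashKeyedCache:
    """Hypothetical cache that stores entries under hash(key) only."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[hash(key)] = value

    def get(self, key):
        return self._store.get(hash(key))


class ObjectKeyedCache:
    """Hypothetical cache that stores entries under the key object itself."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)


by_hash = HashKeyedCache()
by_hash.put(-1, "result for -1")
by_hash.put(-2, "result for -2")    # same hash as -1 on CPython, so it overwrites

by_object = ObjectKeyedCache()
by_object.put(-1, "result for -1")
by_object.put(-2, "result for -2")  # dict also compares keys with ==, so both survive

print(by_hash.get(-1))
print(by_object.get(-1))
```

Keying on the object rather than on its hash lets the underlying dict resolve the collision through its equality check.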

Also, other numeric types hash the same way:

>>> import fractions, decimal
>>> hash(-1.0)
-2
>>> hash(fractions.Fraction(-1, 1))
-2
>>> hash(decimal.Decimal('-1'))
-2
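
These values all compare equal to each other, and Python requires objects that compare equal to have equal hashes, so they have to share whatever hash -1 gets. A quick check of both properties (a sketch, not a statement about how each type computes its hash internally):

```python
import decimal
import fractions

values = [-1, -1.0, fractions.Fraction(-1, 1), decimal.Decimal("-1")]

# Every pair of values compares equal ...
all_equal = all(a == b for a in values for b in values)
# ... so the set of their hashes should contain exactly one value.
distinct_hashes = {hash(v) for v in values}

print(all_equal, distinct_hashes)
```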

This is easy enough to work around in my current project since I can just add a check for it in the cache implementation, but it is likely to cause other hard-to-find bugs in any project that relies on hashing of numeric values.

I found this section of the source code for the long type:

cpython/Objects/longobject.c

Lines 3121 to 3132 in 4a1dd73

long_hash(PyLongObject *v)
{
    Py_uhash_t x;
    Py_ssize_t i;
    int sign;
    i = Py_SIZE(v);
    switch(i) {
    case -1: return v->ob_digit[0]==1 ? -2 : -(sdigit)v->ob_digit[0];
    case 0: return 0;
    case 1: return v->ob_digit[0];
    }

The switch-case statement appears to be the culprit, but I don't have any experience with C, so I'm not 100% sure what this function is returning. Specifically, I don't know what the expression after the ternary operator is supposed to return since I couldn't locate the ob_digit implementation and I'm not familiar enough with C to infer anything about it.
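
Read as Python, the three cases cover integers small enough to fit in a single internal digit: `Py_SIZE(v)` is the signed digit count and `ob_digit[0]` holds the magnitude of the lowest digit. A rough sketch of the equivalent logic follows; `small_int_hash` is an illustrative name, and the sketch covers only the single-digit cases shown in the excerpt, not the multi-digit path that follows in the C source:

```python
def small_int_hash(n):
    """Sketch of the switch in long_hash for single-digit integers."""
    digit = abs(n)  # plays the role of v->ob_digit[0]
    if n < 0:
        # case -1: one digit, negative. The value -1 is remapped to -2;
        # every other value is simply the negated digit.
        return -2 if digit == 1 else -digit
    if n == 0:
        # case 0: zero has no digits.
        return 0
    # case 1: one digit, positive.
    return digit


for n in (-3, -2, -1, 0, 1, 2):
    print(n, small_int_hash(n), hash(n))
```

In other words, the expression after the `:` just negates the digit, so every small negative integer other than -1 hashes to itself.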

Also, this comment in test_numeric_tower.py indicates that it is intended that -1 is an invalid hash value:

def test_hash_normalization(self):
    # Test for a bug encountered while changing long_hash.
    #
    # Given objects x and y, it should be possible for y's
    # __hash__ method to return hash(x) in order to ensure that
    # hash(x) == hash(y). But hash(x) is not exactly equal to the
    # result of x.__hash__(): there's some internal normalization
    # to make sure that the result fits in a C long, and is not
    # equal to the invalid hash value -1. This internal
    # normalization must therefore not change the result of
    # hash(x) for any x.
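
The normalization that comment describes can be observed from Python on CPython: a `__hash__` method that returns -1 is not passed through unchanged by the built-in `hash()`. A small check, where `MinusOne` is an illustrative class:

```python
class MinusOne:
    """Illustrative class whose __hash__ returns the reserved value -1."""

    def __hash__(self):
        return -1


obj = MinusOne()
raw = obj.__hash__()    # the method's own return value
normalized = hash(obj)  # the value after CPython's internal normalization

print(raw, normalized)
```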

However, having the same hash value for both -1 and -2 still seems like a nasty bug to me, so even if hash(-1) should not return -1, it also should not return -2.

That comment mentions a bug in long_hash, which I'm guessing is at least related to this problem, so if this is already being tracked, please feel free to close this issue as a duplicate. I couldn't find any related issues, so I opened one to be sure. :)

As an aside, users of the functools.lru_cache decorator may run into cache hits that are very difficult to debug because of this: functools.lru_cache hashes all arguments together as a tuple, which would make the problem less obvious than it was in my case:

cpython/Lib/functools.py

Lines 432 to 446 in f9433ff

class _HashedSeq(list):
    """ This class guarantees that hash() will be called no more than once
        per element. This is important because the lru_cache() will hash
        the key multiple times on a cache miss.
    """
    __slots__ = 'hashvalue'
    def __init__(self, tup, hash=hash):
        self[:] = tup
        self.hashvalue = hash(tup)
    def __hash__(self):
        return self.hashvalue
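
Whether this actually yields wrong cache hits can be checked directly. In the sketch below (`pair` is an illustrative function), the two argument tuples share a hash on CPython, but the dict behind the cache also compares keys for equality, so the two calls should remain distinct entries:

```python
import functools


@functools.lru_cache(maxsize=None)
def pair(a, b):
    """Illustrative cached function taking two integers."""
    return (a, b)


first = pair(-1, 0)
second = pair(-2, 0)

# The argument tuples collide on their hash on CPython ...
same_hash = hash((-1, 0)) == hash((-2, 0))
# ... but each call is still recorded as its own cache miss.
info = pair.cache_info()

print(same_hash, first, second, info.misses, info.hits)
```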

As previously stated, I'm not experienced with C, so I'm not sure about it, but the presence of the ternary operator in the long_hash implementation makes me wonder if only certain environments would get a hash value of -2, and others would get the intended value.
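
One way to compare environments is to print the interpreter's hash parameters next to the two values. The ternary in the excerpt above depends only on the digit's value, not on the platform, so the expectation is the same -2 on any CPython build; this sketch just makes that easy to verify and share:

```python
import platform
import sys

report = {
    "implementation": platform.python_implementation(),
    "version": platform.python_version(),
    "hash_width_bits": sys.hash_info.width,
    "hash_modulus": sys.hash_info.modulus,
    "hash(-1)": hash(-1),
    "hash(-2)": hash(-2),
}

for name, value in report.items():
    print(f"{name}: {value}")
```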

Please let me know if there is any other system/environment information that I need to provide in order to make such a distinction. :)

Your environment

  • CPython versions tested on:
    3.10.5 (main, Aug 1 2022, 07:53:20) [GCC 12.1.0] (Provided by Arch package repo.)
    3.12.0a0 (heads/main:44f1f63ad5, Aug 4 2022, 18:35:48) [GCC 12.1.1 20220730] (Built from source)

  • Operating system and architecture:
    Arch Linux kernel ver. 5.18.16, x86_64.

Labels: type-bug (An unexpected behavior, bug, or error)