
bpo-25457: Allow json.encode() to handle mixed keys when sort_keys=True #8011

Status: Open. Wants to merge 9 commits into base: main.

Conversation


@jheiv jheiv commented Jun 29, 2018

Note: This causes 3 regression tests to fail. The first two fail because the ordering of items with sort_keys=True differs from the hard-coded expectation. The third fails because specifying sort_keys=True with mixed keys no longer raises the expected TypeError:

BPO Issue: https://bugs.python.org/issue25457
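For reference, the behavior under discussion can be reproduced on stock (unpatched) CPython, where the encoder sorts the original keys before coercing them to strings:

```python
import json

# On unpatched CPython, sort_keys=True with mixed str/int keys fails,
# because sorting compares the uncoerced keys directly.
try:
    json.dumps({'a': 1, 1: 'a'}, sort_keys=True)
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```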

python.bat -m test.regrtest test_json -l  -v
...
======================================================================
FAIL: test_encode_truefalse (test.test_json.test_dump.TestCDump)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\james\Source\Repos\cpython\lib\test\test_json\test_dump.py", line 21, in test_encode_truefalse
    '{"false": 1, "2": 3.0, "4.0": 5, "6": true}')
AssertionError: '{"2": 3.0, "4.0": 5, "6": true, "false": 1}' != '{"false": 1, "2": 3.0, "4.0": 5, "6": true}'
- {"2": 3.0, "4.0": 5, "6": true, "false": 1}
+ {"false": 1, "2": 3.0, "4.0": 5, "6": true}


======================================================================
FAIL: test_encode_truefalse (test.test_json.test_dump.TestPyDump)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\james\Source\Repos\cpython\lib\test\test_json\test_dump.py", line 21, in test_encode_truefalse
    '{"false": 1, "2": 3.0, "4.0": 5, "6": true}')
AssertionError: '{"2": 3.0, "4.0": 5, "6": true, "false": 1}' != '{"false": 1, "2": 3.0, "4.0": 5, "6": true}'
- {"2": 3.0, "4.0": 5, "6": true, "false": 1}
+ {"false": 1, "2": 3.0, "4.0": 5, "6": true}


======================================================================
FAIL: test_unsortable_keys (test.test_json.test_speedups.TestEncode)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\james\Source\Repos\cpython\lib\test\test_json\test_speedups.py", line 73, in test_unsortable_keys
    self.json.encoder.JSONEncoder(sort_keys=True).encode({'a': 1, 1: 'a'})
AssertionError: TypeError not raised

----------------------------------------------------------------------
Ran 153 tests in 3.518s

FAILED (failures=3, skipped=1)


abadger added a commit to abadger/ansible that referenced this issue Dec 5, 2018
…keys that aren't strings

This fixes one of the problems reported in ansible#49343

Upstream Python3 bug for the json traceback:
https://bugs.python.org/issue25457

and PR that may fix it:
python/cpython#8011
abadger added a commit to ansible/ansible that referenced this issue Dec 5, 2018
…keys that aren't strings

This fixes one of the problems reported in #49343

Upstream Python3 bug for the json traceback:
https://bugs.python.org/issue25457

and PR that may fix it:
python/cpython#8011
abadger added a commit to abadger/ansible that referenced this issue Dec 19, 2018
…s dictionary keys that aren't strings

This fixes one of the problems reported in ansible#49343

Upstream Python3 bug for the json traceback:
https://bugs.python.org/issue25457

and PR that may fix it:
python/cpython#8011.
(cherry picked from commit c817bef)

Co-authored-by: Toshio Kuratomi <a.badger@gmail.com>
kbreit pushed a commit to kbreit/ansible that referenced this issue Jan 11, 2019
…keys that aren't strings

This fixes one of the problems reported in ansible#49343

Upstream Python3 bug for the json traceback:
https://bugs.python.org/issue25457

and PR that may fix it:
python/cpython#8011
@aeros aeros added the stale label Sep 8, 2019

@aeros aeros commented Sep 8, 2019

An updated PR has been opened to address this issue (#15691). All of the CI tests are passing on that one and it's been a while since this one was last updated, so I'll label this as stale and recommend for it to be closed once the author fixes the title for #15691.

@serhiy-storchaka serhiy-storchaka (Member) left a comment

In principle this looks good to me, but the drawback of this PR is that it makes the common case, where all keys are strings, slower.


@aeros aeros commented Sep 8, 2019

@serhiy-storchaka:

In principle this looks good to me, but the drawback of this PR is that it makes the common case, where all keys are strings, slower.

How significant is the performance loss? I guess the question is whether providing the functionality to properly sort mixed keys is worth slowing down the sorting process in the common case.

Would it be reasonable to give users a kwarg for json.encode that provides this functionality optionally, without slowing down the sorting process for the more common cases? The option would be False by default and could be called mixed_keys or something along those lines.
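For illustration, here is a rough pure-Python sketch (a hypothetical helper, not the PR's actual code) of sorting by the coerced JSON key form, which produces the ordering the updated tests expect:

```python
def mixed_sort_items(d):
    """Sort dict items by the JSON string form of each key, so mixed
    str/int/float/bool/None keys become mutually comparable.
    Sketch only; the real patch does this inside the encoders."""
    def as_json_key(k):
        # bool must be checked before int, since True/False are ints too
        if k is True:
            return 'true'
        if k is False:
            return 'false'
        if k is None:
            return 'null'
        if isinstance(k, (int, float)):
            return str(k)  # simplified; the json module uses repr() for floats
        return k           # already a string
    return sorted(d.items(), key=lambda kv: as_json_key(kv[0]))

print([k for k, _ in mixed_sort_items({False: 1, 2: 3.0, 4.0: 5, 6: True})])
# [2, 4.0, 6, False]
```

This yields the key order "2", "4.0", "6", "false" seen in the updated test_encode_truefalse expectation quoted in the PR description.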


@aeros aeros commented Sep 8, 2019

Removing the stale label since #15691 was closed. This also seems to be more of an enhancement than a bugfix, but I think we should look into updating this PR and providing users an option to accurately sort mixed keys.

@aeros aeros added type-feature and removed stale labels Sep 8, 2019

@yoann9344 yoann9344 commented Sep 9, 2019

I'm the author of #15691. My modification of the source seemed to work for my special case, but my code didn't call json.dumps directly; it went through Flask's jsonify, which in turn goes through itsdangerous. Let's trace the chain upstream.

The dict definition:

BuildingInfo_Names = {
    'Wood': 1,
    'Clay': 2,
    'Iron': 3,
    'Crop': 4,
    ...}

Then reversed keys are added for code simplification, in a module's __init__ (maybe not the best way ^^"):

for name, b_type in list(BuildingInfo_Names.items()): 
    BuildingInfo_Names[b_type] = name

The line of my code causing the issue: return jsonify(BuildingInfo_Names)

jsonify from Flask (Pallets):

from itsdangerous import json as _json
...
def dumps(obj, app=None, **kwargs):
    _dump_arg_defaults(kwargs, app=app)
    encoding = kwargs.pop("encoding", None)
    rv = _json.dumps(obj, **kwargs)
    if encoding is not None and isinstance(rv, text_type):
        rv = rv.encode(encoding)
    return rv
...
def _dump_arg_defaults(kwargs, app=None):
    """Inject default arguments for dump functions."""
    if app is None:
        app = current_app

    if app:
        bp = app.blueprints.get(request.blueprint) if request else None
        kwargs.setdefault(
            "cls", bp.json_encoder if bp and bp.json_encoder else app.json_encoder
        )

        if not app.config["JSON_AS_ASCII"]:
            kwargs.setdefault("ensure_ascii", False)

        kwargs.setdefault("sort_keys", app.config["JSON_SORT_KEYS"])
    else:
        kwargs.setdefault("sort_keys", True)
        kwargs.setdefault("cls", JSONEncoder)
...
def jsonify(*args, **kwargs):
    indent = None
    separators = (",", ":")

    if current_app.config["JSONIFY_PRETTYPRINT_REGULAR"] or current_app.debug:
        indent = 2
        separators = (", ", ": ")

    if args and kwargs:
        raise TypeError("jsonify() behavior undefined when passed both args and kwargs")
    elif len(args) == 1:  # single args are passed directly to dumps()
        data = args[0]
    else:
        data = args or kwargs

    return current_app.response_class(
        dumps(data, indent=indent, separators=separators) + "\n",
        mimetype=current_app.config["JSONIFY_MIMETYPE"],
    )

itsdangerous (Pallets), in its __init__: from ._json import json

try:
    import simplejson as json
except ImportError:
    import json


class _CompactJSON(object):
    """Wrapper around json module that strips whitespace."""

    @staticmethod
    def loads(payload):
        return json.loads(payload)

    @staticmethod
    def dumps(obj, **kwargs):
        kwargs.setdefault("ensure_ascii", False)
        kwargs.setdefault("separators", (",", ":"))
        return json.dumps(obj, **kwargs)

I hope the water wasn't too cold and that no one drowned!

NB: I applied the modification I submitted and it worked, but in the end I opted to set app.config["JSON_SORT_KEYS"] to True for compatibility with the default behavior.
PS: note how a long chain of dependencies can be painful ^^
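Putting the chain above together, a minimal standalone reproduction (no Flask required), assuming sort_keys ends up True as in jsonify's defaults:

```python
import json

BuildingInfo_Names = {'Wood': 1, 'Clay': 2, 'Iron': 3, 'Crop': 4}

# Adding the reverse lookups gives the dict both str and int keys.
for name, b_type in list(BuildingInfo_Names.items()):
    BuildingInfo_Names[b_type] = name

try:
    # jsonify ultimately calls json.dumps with sort_keys=True by default,
    # which fails on the mixed keys.
    json.dumps(BuildingInfo_Names, sort_keys=True)
except TypeError as exc:
    print(exc)
```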


@csabella csabella commented Feb 4, 2020

@jheiv, please resolve the merge conflict and respond to the code reviews. Thank you!


@jheiv jheiv commented Mar 24, 2020

@csabella Sorry for the delay -- merge conflicts are now resolved and the failing tests have been updated to account for mixed-type keys.

Re @serhiy-storchaka's concern about slowing down the common case: I agree, and I will follow up with some profiling tomorrow/this week so you can get an idea of the cost.


@jheiv jheiv commented Mar 24, 2020

Here are the results of some benchmarks:

   object_cache         imp       sort_keys       pre -> post perf_counter
---------------------------------------------------------------------------------------
   100k-1000-0.1.pkl    c         True      138696600 ->  135319600    2.435 % faster
   100k-1000-0.1.pkl    c         False     136588000 ->  135399100    0.870 % faster
   100k-1000-0.5.pkl    c         True      137782000 ->  135307000    1.796 % faster
   100k-1000-0.5.pkl    c         False     136207500 ->  134547300    1.219 % faster
   100k-1000-0.8.pkl    c         True      137624500 ->  135440500    1.587 % faster
   100k-1000-0.8.pkl    c         False     136098100 ->  134688400    1.036 % faster
    10k-0010-0.1.pkl    c         True       13736300 ->   13545600    1.388 % faster
    10k-0010-0.1.pkl    c         False      13614400 ->   13429000    1.362 % faster
    10k-0010-0.5.pkl    c         True       13804300 ->   13574100    1.668 % faster
    10k-0010-0.5.pkl    c         False      13738200 ->   13726700    0.084 % faster
    10k-0010-0.8.pkl    c         True       13764000 ->   13537200    1.648 % faster
    10k-0010-0.8.pkl    c         False      13644500 ->   13458200    1.365 % faster
    10k-0100-0.1.pkl    c         True       13841400 ->   13456100    2.784 % faster
    10k-0100-0.1.pkl    c         False      13647000 ->   13772900    0.923 % slower
    10k-0100-0.5.pkl    c         True       13793600 ->   13421800    2.695 % faster
    10k-0100-0.5.pkl    c         False      13605500 ->   13498100    0.789 % faster
    10k-0100-0.8.pkl    c         True       13716400 ->   13459500    1.873 % faster
    10k-0100-0.8.pkl    c         False      13579900 ->   13539500    0.297 % faster
    10k-1000-0.1.pkl    c         True       13748200 ->   13480800    1.945 % faster
    10k-1000-0.1.pkl    c         False      13622000 ->   13449200    1.269 % faster
    10k-1000-0.5.pkl    c         True       13771900 ->   13452800    2.317 % faster
    10k-1000-0.5.pkl    c         False      13598600 ->   13649500    0.374 % slower
    10k-1000-0.8.pkl    c         True       13744100 ->   13553500    1.387 % faster
    10k-1000-0.8.pkl    c         False      13646000 ->   13431300    1.573 % faster
   100k-1000-0.1.pkl    python    True      396553200 ->  398647100    0.528 % slower
   100k-1000-0.1.pkl    python    False     396115400 ->  400246100    1.043 % slower
   100k-1000-0.5.pkl    python    True      397332200 ->  399220700    0.475 % slower
   100k-1000-0.5.pkl    python    False     398192100 ->  398421100    0.058 % slower
   100k-1000-0.8.pkl    python    True      399345500 ->  399795800    0.113 % slower
   100k-1000-0.8.pkl    python    False     395161600 ->  398907400    0.948 % slower
    10k-0010-0.1.pkl    python    True       38967100 ->   39088400    0.311 % slower
    10k-0010-0.1.pkl    python    False      39113900 ->   39128600    0.038 % slower
    10k-0010-0.5.pkl    python    True       38796600 ->   39046200    0.643 % slower
    10k-0010-0.5.pkl    python    False      39129000 ->   38929200    0.511 % faster
    10k-0010-0.8.pkl    python    True       39188900 ->   39143000    0.117 % faster
    10k-0010-0.8.pkl    python    False      38901400 ->   39334300    1.113 % slower
    10k-0100-0.1.pkl    python    True       38851800 ->   39085200    0.601 % slower
    10k-0100-0.1.pkl    python    False      39042700 ->   38743900    0.765 % faster
    10k-0100-0.5.pkl    python    True       39044100 ->   38872200    0.440 % faster
    10k-0100-0.5.pkl    python    False      39196300 ->   39073100    0.314 % faster
    10k-0100-0.8.pkl    python    True       38734600 ->   38963900    0.592 % slower
    10k-0100-0.8.pkl    python    False      38613000 ->   39179800    1.468 % slower
    10k-1000-0.1.pkl    python    True       38911700 ->   39270300    0.922 % slower
    10k-1000-0.1.pkl    python    False      39057800 ->   39241600    0.471 % slower
    10k-1000-0.5.pkl    python    True       39380600 ->   39411700    0.079 % slower
    10k-1000-0.5.pkl    python    False      38666500 ->   39381500    1.849 % slower
    10k-1000-0.8.pkl    python    True       39263000 ->   38900400    0.924 % faster
    10k-1000-0.8.pkl    python    False      39197900 ->   38836400    0.922 % faster
  • object_cache - a pickled list of randomly generated dictionaries. The three hyphen-separated parts of the filename are (# of records, # of items per record, fraction of keys that are non-strings).
  • imp - JSON Encoder implementation -- either cjson or pyjson as defined in https://github.com/python/cpython/blob/master/Lib/test/test_json/__init__.py#L9-L10
  • sort_keys - the sort_keys value passed to the encoder constructor.
  • pre -> post perf_counter - I used time.perf_counter_ns to time encoding of the unpickled list:
    def time_dumps(self, module, sort_keys, runs=5, runs_per_run=10, now=time.perf_counter_ns):
        encoder = module.encoder.JSONEncoder(sort_keys=sort_keys)
        encode  = encoder.encode

        times = []
        for _ in range(runs):
            t0 = now()
            for _ in range(runs_per_run):
                for obj in self.objects:
                    _ = encode(obj)
            td = now() - t0
            times.append(td)
        return min(times)  # keep the minimum of the run deltas
For each pickle file I kept the minimum of the run deltas (though the mean showed similar results).
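For context, a hypothetical sketch of how such a cache of random mixed-key dicts could be generated (names, key shapes, and distributions are assumptions, not taken from the actual benchmark):

```python
import random
import string

def make_records(n_records, n_items, frac_nonstring, seed=0):
    """Build n_records dicts of up to n_items entries each, where a key is
    a non-string (int or float) with probability frac_nonstring."""
    rng = random.Random(seed)

    def key():
        if rng.random() < frac_nonstring:
            # non-string key: a random int or a random float
            return rng.choice([rng.randrange(10**6), rng.random()])
        return ''.join(rng.choices(string.ascii_lowercase, k=8))

    return [{key(): rng.randrange(100) for _ in range(n_items)}
            for _ in range(n_records)]
```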

I ran the test (loading the same pickled lists of dictionaries) for both the upstream master (without patch) and remote master (my fork, with patch).

The results are... confusing -- the C implementation seems slightly faster with the changes and the Python implementation slightly slower, but all differences are within roughly +/- 2%.

If there's another method of benchmarking that's more appropriate here, I'd be happy to run them -- please let me know.

Review comment context (from the C encoder changes):

    goto bail;
    }

    coerced_items = PyList_New(0);
@aeros aeros (Member) commented Mar 25, 2020


This seems to introduce a few possible refleaks for coerced_items, effectively in each goto bail location where coerced_items isn't NULL, prior to the Py_DECREF(coerced_items). IMO, the easiest solution would be adding a Py_XDECREF(coerced_items) to bail since there are multiple paths where this could potentially occur.

There should probably also be the following error condition, in case the list can't be instantiated:

Suggested change:

    - coerced_items = PyList_New(0);
    + coerced_items = PyList_New(0);
    + if (coerced_items == NULL) {
    +     goto bail;
    + }

What do you think @serhiy-storchaka? My experience with the C-API is fairly limited, so I'm not entirely certain about the above.


@rhettinger rhettinger commented May 10, 2022

-1 from me.

IMO sorting mixed keys is an anti-pattern that shouldn't be encouraged. This is doubly true on the encoding side, when the user has complete control over the data going in.

For the most part, it is a feature that sorting fails here. It is an indication that something is seriously wrong with the data modeling.
