New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-25457: Allow json.encode() to handle mixed keys when sort_keys=True #8011
base: main
Are you sure you want to change the base?
Conversation
…tead of creating a PyList object.
…tead of creating a PyList object.
…keys that aren't strings This fixes one of the problems reported in ansible#49343 Upstream Python3 bug for the json traceback: https://bugs.python.org/issue25457 and PR that may fix it: python/cpython#8011
…keys that aren't strings This fixes one of the problems reported in #49343 Upstream Python3 bug for the json traceback: https://bugs.python.org/issue25457 and PR that may fix it: python/cpython#8011
…s dictionary keys that aren't strings This fixes one of the problems reported in ansible#49343 Upstream Python3 bug for the json traceback: https://bugs.python.org/issue25457 and PR that may fix it: python/cpython#8011. (cherry picked from commit c817bef) Co-authored-by: Toshio Kuratomi <a.badger@gmail.com>
…keys that aren't strings This fixes one of the problems reported in ansible#49343 Upstream Python3 bug for the json traceback: https://bugs.python.org/issue25457 and PR that may fix it: python/cpython#8011
In principle looks good to me, bet the drawback of this PR is that it makes slower the common case, when all keys are strings.
How significant is the performance loss? I guess the question is whether or not providing the functionality to properly sort mixed keys is worth slowing down the sorting process in the common cases. Would it be reasonable to provide the users a kwarg for json.encode so that it provides this functionality optionally without slowing down the sorting process for the more common cases? This option would be |
Removing the stale label since #15691 was closed. This also seems to be more of an enhancement than a bugfix, but I think we should look into updating this PR and providing users an option to accurately sort mixed keys. |
I'm the author of #15691, my modification of source code seemed to work for special case, but it didn't directly call json.dumps, it pass through jsonify passing through itsdangerous. Let's go up to the river : BuildingInfo_Names = {
'Wood': 1,
'Clay': 2,
'Iron': 3,
'Crop': 4,
...} add of reversed keys, for code simplification (in the __init__ of a module) (maybe not the best way ^^") for name, b_type in list(BuildingInfo_Names.items()):
BuildingInfo_Names[b_type] = name my line of code causing the issue jsonify from flask (pallets) from itsdangerous import json as _json
...
def dumps(obj, app=None, **kwargs):
_dump_arg_defaults(kwargs, app=app)
encoding = kwargs.pop("encoding", None)
rv = _json.dumps(obj, **kwargs)
if encoding is not None and isinstance(rv, text_type):
rv = rv.encode(encoding)
return rv
...
def _dump_arg_defaults(kwargs, app=None):
"""Inject default arguments for dump functions."""
if app is None:
app = current_app
if app:
bp = app.blueprints.get(request.blueprint) if request else None
kwargs.setdefault(
"cls", bp.json_encoder if bp and bp.json_encoder else app.json_encoder
)
if not app.config["JSON_AS_ASCII"]:
kwargs.setdefault("ensure_ascii", False)
kwargs.setdefault("sort_keys", app.config["JSON_SORT_KEYS"])
else:
kwargs.setdefault("sort_keys", True)
kwargs.setdefault("cls", JSONEncoder)
...
def jsonify(*args, **kwargs):
indent = None
separators = (",", ":")
if current_app.config["JSONIFY_PRETTYPRINT_REGULAR"] or current_app.debug:
indent = 2
separators = (", ", ": ")
if args and kwargs:
raise TypeError("jsonify() behavior undefined when passed both args and kwargs")
elif len(args) == 1: # single args are passed directly to dumps()
data = args[0]
else:
data = args or kwargs
return current_app.response_class(
dumps(data, indent=indent, separators=separators) + "\n",
mimetype=current_app.config["JSONIFY_MIMETYPE"],
) itsdangerous (pallets) in __init__ : try:
import simplejson as json
except ImportError:
import json
class _CompactJSON(object):
"""Wrapper around json module that strips whitespace."""
@staticmethod
def loads(payload):
return json.loads(payload)
@staticmethod
def dumps(obj, **kwargs):
kwargs.setdefault("ensure_ascii", False)
kwargs.setdefault("separators", (",", ":"))
return json.dumps(obj, **kwargs) I hope the water was not too cold and that no one drowns ! nb : I have done the modification that I have submitted and it worked, but finally I have opted for putting True on app.config["JSON_SORT_KEYS"] for compatibility with default behavior |
@jheiv, please resolve the merge conflict and respond to the code reviews. Thank you! |
@csabella Sorry for the delay -- merge conflicts are now resolved and the failing tests have been updated to account for mixed-type-keys. Re @serhiy-storchaka 's concern about slowing down the common case, I agree and will follow up with some profiling tomorrow/this week so you guys can have an idea of the cost. |
Here are the results of some benchmarks:
def time_dumps(self, module, sort_keys, runs=5, runs_per_run=10, now=time.perf_counter_ns):
encoder = module.encoder.JSONEncoder(sort_keys=sort_keys)
encode = encoder.encode
times = []
for _ in range(runs):
t0 = now()
for _ in range(runs_per_run):
for obj in self.objects:
_ = encode(obj)
td = now() - t0
times.append(td) For each pickle file I "kept" the minimum time from each of the runs deltas (though mean showed similar results). I ran the test (loading the same pickled lists of dictionaries) for both the upstream master (without patch) and remote master (my fork, with patch). The results are... confusing -- with the c implementation seeming slightly faster with the changes and the python implementation seeming slightly slower (but all, in general, less than +/- 2%) If there's another method of benchmarking that's more appropriate here, I'd be happy to run them -- please let me know. |
goto bail; | ||
} | ||
|
||
coerced_items = PyList_New(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to introduce a few possible refleaks for coerced_items
, effectively in each goto bail
location where coerced_items isn't NULL
, prior to the Py_DECREF(coerced_items)
. IMO, the easiest solution would be adding a Py_XDECREF(coerced_items)
to bail
since there are multiple paths where this could potentially occur.
There should probably also be the following error condition, in case the list can't be instantiated:
coerced_items = PyList_New(0); | |
coerced_items = PyList_New(0); | |
if (coerced_items == NULL) { | |
goto bail; | |
} |
What do you think @serhiy-storchaka? My experience with the C-API is fairly limited, so I'm not entirely certain about the above.
-1 from me. IMO sorting mix keys is an anti-pattern that shouldn't be encouraged. This is doubly true on the encoding side when the user have complete control over the data going in. For the most part, it is a feature that sorting fails here. It is an indication that something is seriously wrong with the data modeling. |
Note: This causes 3 regression tests to fail. The first two fail because the ordering of items when sort_keys=True is different than the hard coded expectation. The third failure is because specifying sort_keys=True and mixed keys no longer raises the expected TypeError:
BPO Issue: https://bugs.python.org/issue25457
https://bugs.python.org/issue25457