Optimizations for the response data format for speedup and cost-effectiveness in production mode #1030
The relevant discussion in Slack is at https://cube-js.slack.com/archives/CC0403RRR/p1598379807118300 cc @paveltiunov
Yeah, I have the same problem. It would be better to have https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html. That's the best format for
I also totally agree that it is crucial to make cube.js run faster, with less memory and less network bandwidth. I think the main discussion here should be:
I am also up for the task and happy to help with this one!
Let's introduce it, @amitripshtos! Contributions are very welcome here!
Hi @paveltiunov, could you point us to the parts of the code that generate the output? Thanks!
Hey @tobilg! Please see `cube.js/packages/cubejs-api-gateway/index.js`, line 610 (commit 18e48e1).
Is your feature request related to a problem? Please describe.
Currently, the JSON responses of successful queries are very verbose and contain a lot of duplicate information, which yields the following:
Describe the solution you'd like
Ideally, there would be an option to use a "compact mode" for query responses when production mode is enabled (via a configuration flag or something similar).
This would then change the response data format from something like
to
So, basically, the data would be sent as an array of records, where each record is an array of raw values in their original data types, plus an `annotation.mapping` object containing a `columns` property that defines the order of the "columns" in each record. In the example above, this would shrink the `data` payload from 577 chars to 329 chars, a reduction of around 43%. I assume the savings will scale with the number of columns and records. This change would also need to be supported in the cube.js frontend libraries.
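A minimal Python sketch of the proposed encoding and the matching client-side decoding, assuming the payload layout stated in the proposal (an `annotation.mapping.columns` list plus a `data` array of records). The helper names `to_compact` and `from_compact`, the rule that column order comes from the first row, and the sample member names are hypothetical illustrations, not part of the cube.js API gateway or its client libraries.

```python
from __future__ import annotations

import json
from typing import Any


def to_compact(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Encode row objects as one column list plus one value array per row."""
    # Column order is taken from the first row; an empty result has no columns.
    columns = list(rows[0].keys()) if rows else []
    # Each record keeps raw values in their original types (numbers stay numbers).
    # A key missing from a later row is encoded as None (JSON null).
    data = [[row.get(col) for col in columns] for row in rows]
    return {"annotation": {"mapping": {"columns": columns}}, "data": data}


def from_compact(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Rebuild row objects from the compact payload, as a client library might."""
    columns = payload["annotation"]["mapping"]["columns"]
    return [dict(zip(columns, record)) for record in payload["data"]]


# Made-up sample rows using cube.js-style member names.
rows = [
    {"Orders.status": "completed", "Orders.count": 10},
    {"Orders.status": "processing", "Orders.count": 4},
    {"Orders.status": "shipped", "Orders.count": 7},
]

# Serialize both forms without whitespace so the lengths are comparable.
verbose = json.dumps(rows, separators=(",", ":"))
compact = json.dumps(to_compact(rows), separators=(",", ":"))
print(len(verbose), len(compact))

# Decoding the compact JSON restores the original rows.
assert from_compact(json.loads(compact)) == rows
```

Because each member name is written once per response instead of once per row, the absolute savings grow with the number of records, while the fixed cost of the `annotation.mapping` object only pays off once there are more than a handful of rows. The pandas `DataFrame.to_json` function linked in the comments produces a comparable layout with `orient="split"` (separate `columns`, `index`, and `data` arrays).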