feat(databend sink): support csv encoding & compression none #16829
Regression Detector Results
Run ID: 12fbae7f-b915-405a-b271-a57bbed593e8
Explanation: A regression test is an integrated performance test for Vector. The table below, if present, lists those experiments that have experienced a statistically significant change in mean optimization goal performance between baseline and comparison SHAs with 90.00% confidence OR have been detected as newly erratic. Negative values mean that the baseline is faster; positive values mean that the comparison is faster. Results that do not exhibit more than a ±5.00% change in their mean optimization goal are discarded. An experiment is erratic if its coefficient of variation is greater than 0.1. The abbreviated table will be omitted if no interesting change is observed.
No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.
Regression Detector Results, Run ID: f5d796d5-6e34-4494-9caa-dfb63e7804a9: no interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.
```cue
default: "gzip"
enum: {
	gzip: """
		[Gzip][gzip] compression.
		"""
}
```
As far as I understand, Databend supports more compression algorithms: https://databend.rs/doc/load-data/#supported-file-formats
Is that true? If so, would you like to add support for the other compression algorithms that Databend supports?
That's true.
Since `sinks::util::buffer::compression::Compression` only provides Gzip and Zlib, we could add support for more compression formats to Vector later.
Well, I do not recommend using `sinks::util::buffer::compression::Compression` for sink-specific compression settings, since every sink has its own set of supported compression algorithms; `sinks::util::buffer::compression::Compression` is too generic.
I advise creating a Databend-specific `Compression` enum, filling it with the algorithms Databend supports, and then using it for compression.
However, I see that here you are bound by the trait, so you cannot use this approach :( Hopefully, we will later extend `sinks::util::buffer::compression::Compression` with more algorithms...
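A minimal sketch of the Databend-specific enum suggested above, written in std-only Rust (Vector's real config types would also derive serde and configurable-schema traits, omitted here). The type name `DatabendCompression`, the `Zstd` variant, the `file_format_option` and `extension` helpers, and the exact `COMPRESSION` option spellings are assumptions for illustration, not the sink's actual API. The point is that only variants listed in the enum can be configured, so the generated documentation cannot advertise an algorithm the sink does not support.

```rust
use std::str::FromStr;

/// Hypothetical Databend-specific compression setting for the sink.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum DatabendCompression {
    /// Send the payload as-is, saving CPU on both ends.
    None,
    /// Gzip-compress the payload (matches the existing "gzip" default).
    #[default]
    Gzip,
    /// Zstandard; an example of an algorithm the generic enum lacks
    /// (assumed to be accepted by Databend).
    Zstd,
}

impl DatabendCompression {
    /// Value for Databend's `COMPRESSION` file-format option
    /// (assumed spelling; verify against the Databend docs).
    pub fn file_format_option(self) -> &'static str {
        match self {
            Self::None => "NONE",
            Self::Gzip => "GZIP",
            Self::Zstd => "ZSTD",
        }
    }

    /// File extension for the staged file, if any.
    pub fn extension(self) -> Option<&'static str> {
        match self {
            Self::None => None, // the right-hand `None` is `Option::None`
            Self::Gzip => Some("gz"),
            Self::Zstd => Some("zst"),
        }
    }
}

impl FromStr for DatabendCompression {
    type Err = String;

    /// Parses the user-facing config value, case-insensitively.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "none" => Ok(Self::None),
            "gzip" => Ok(Self::Gzip),
            "zstd" => Ok(Self::Zstd),
            other => Err(format!("unsupported Databend compression: {other}")),
        }
    }
}
```

Rejecting unknown values in `from_str` keeps the error at config-load time rather than at upload time.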
Regression Detector Results, Run ID: 9573939e-8e2e-4971-b4f9-55cade13826b: no interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.
We definitely want to take a look at both the `encoding` and `compression` options and avoid exposing a number of unsupported options in the documentation.
At a high level, though, I'm not sure how this impacts a user. Regardless of encoding, wouldn't Databend expose the data appropriately via a UI, SQL, etc., without leaking details of how the data is stored at rest?
Is the intention to let users configure this to aid manual debugging, by allowing them to read the files directly from blob storage without Databend in the middle?
This configuration is meant to help with performance optimization, such as reducing CPU usage by transferring data without compressing and decompressing it. There's no need to read the files from blob storage directly, since they are temporary.
Regression Detector Results, Run ID: 2987e507-ccd2-4937-92ee-5628d1a55150: changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00% were reported.
Regression Detector Results, Run ID: 54c7c3ef-88e8-4443-aebf-8452a6f4864e: changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00% were reported.
Flaky failure in the Checks job; retrying.
I'm still unclear on when a user would prefer CSV over JSON encoding, but I'm not sure that's a blocker for me. I've raised one question with our team about consistency of defaults.
Actually, the main reason is that CSV is much smaller than JSON encoding in most cases.
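To illustrate the size argument, here is a minimal, std-only Rust sketch that encodes the same rows as newline-delimited JSON and as CSV and compares byte counts. The `Row` struct, its three fields, and all helper functions are hypothetical and are not the sink's actual encoders; the JSON escaping is deliberately minimal (quotes, backslashes, and newlines only).

```rust
/// A hypothetical three-field log event used only for this comparison.
pub struct Row<'a> {
    pub timestamp: &'a str,
    pub host: &'a str,
    pub message: &'a str,
}

/// Minimal JSON string quoting; not a complete JSON encoder.
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            _ => out.push(c),
        }
    }
    out.push('"');
    out
}

/// RFC 4180-style CSV field: quote only when needed, doubling inner quotes.
pub fn csv_field(s: &str) -> String {
    if s.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s.to_string()
    }
}

/// One NDJSON line: every row repeats every field name.
pub fn to_json_line(r: &Row) -> String {
    format!(
        "{{\"timestamp\":{},\"host\":{},\"message\":{}}}\n",
        json_str(r.timestamp),
        json_str(r.host),
        json_str(r.message)
    )
}

/// One CSV line: values only, in a fixed column order.
pub fn to_csv_line(r: &Row) -> String {
    format!(
        "{},{},{}\n",
        csv_field(r.timestamp),
        csv_field(r.host),
        csv_field(r.message)
    )
}

/// Total encoded size in bytes, returned as (json, csv).
pub fn encoded_sizes(rows: &[Row]) -> (usize, usize) {
    let json: usize = rows.iter().map(|r| to_json_line(r).len()).sum();
    let csv: usize = rows.iter().map(|r| to_csv_line(r).len()).sum();
    (json, csv)
}
```

The difference comes from the field names and JSON punctuation repeated on every row. Compression narrows the gap, since repeated field names compress well, so the real benefit for a given payload would need to be measured.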
Thanks @everpcpc!
Regression Detector Results, Run ID: f24c8a2c-ae21-482d-8d61-91d980654684: no interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.
Requires: #16828