YUV->RGBA conversion: Special case the edge pixels, do the middle without index clamping #12
This PR splits the color space conversion of the edge pixels and "the rest", to allow fewer operations on the inside pixels that are the straightforward case.
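As a rough illustration of the idea (not the code from this PR), here is a minimal sketch on a single row of `u8` samples with a 3-tap filter. The names `tap`, `smooth_clamped` and `smooth_split` are hypothetical and exist only for this example; the real conversion works on YUV planes and is more involved.

```rust
/// Weighted average of a sample and its two neighbours (1-2-1 kernel, rounded).
fn tap(left: u8, mid: u8, right: u8) -> u8 {
    ((left as u16 + 2 * mid as u16 + right as u16 + 2) / 4) as u8
}

/// Reference version: every sample clamps its neighbour indices,
/// even though only the first and last samples actually need it.
fn smooth_clamped(row: &[u8]) -> Vec<u8> {
    let last = row.len().saturating_sub(1);
    (0..row.len())
        .map(|i| tap(row[i.saturating_sub(1)], row[i], row[(i + 1).min(last)]))
        .collect()
}

/// Split version: the two edge samples are special-cased,
/// and the interior is processed without any index clamping.
fn smooth_split(row: &[u8]) -> Vec<u8> {
    let n = row.len();
    if n < 3 {
        // Too short to have an interior; fall back to the clamped path.
        return smooth_clamped(row);
    }
    let mut out = Vec::with_capacity(n);
    // Left edge: the missing left neighbour is replaced by the sample itself.
    out.push(tap(row[0], row[0], row[1]));
    // Interior: every window of three samples is fully in bounds.
    out.extend(row.windows(3).map(|w| tap(w[0], w[1], w[2])));
    // Right edge: the missing right neighbour is replaced by the sample itself.
    out.push(tap(row[n - 2], row[n - 1], row[n - 1]));
    out
}
```

Both functions should produce identical output for any input; the split version just moves the clamping cost out of the hot interior loop.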
The effects of each included commit (plus two excluded ones) on the runtime of a particular command are:
These numbers are the output of `time` (in seconds) running the following command (after compilation is done), with each sample averaged from three runs. The error bars are one standard deviation long in both directions.

`time cargo run --package=exporter --release -- ../../Downloads/z0r-de_4145.swf --frames 1000`
I also commented out the actual saving of the frames into files, so the effect on rendering itself is more directly measurable.
While the "utility functions" commit regresses a little bit, doing it is almost a necessity for the one after it, which is the one providing the significant gains.
Overall, these changes sped up the rendering by about 25%.
I also made two more experiments (independently) that I then discarded because they both regressed slightly:
- The first one was doing the bilinear interpolation differently: on `f32` numbers, in two steps (the usual way, in a rotated H-shape).
- The second one was simply omitting the `.min()` and `.max()` calls from `clamp()`, relying on the saturating property of the `f32` to `u8` cast instead.

I don't know if this is starting to stretch the "code simplicity/cleanliness" vs. "runtime performance" trade-off a little bit too far, but at least there is still no `unsafe` anywhere... :)
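For readers unfamiliar with the two discarded experiments, here is a small sketch of what each one amounts to. The function names (`lerp`, `bilinear_two_step`, `clamp_explicit`, `clamp_by_cast`) are made up for this illustration and are not the ones used in the PR. The cast behaviour relies on Rust's `as` conversion from float to integer, which saturates at the target type's bounds and maps NaN to 0 (since Rust 1.45).

```rust
/// Linear interpolation between `a` and `b` with weight `t` in [0, 1].
fn lerp(a: f32, b: f32, t: f32) -> f32 {
    a + (b - a) * t
}

/// Two-step bilinear interpolation: first along the top and bottom rows,
/// then between the two intermediate results (the "rotated H" shape).
fn bilinear_two_step(tl: f32, tr: f32, bl: f32, br: f32, tx: f32, ty: f32) -> f32 {
    let top = lerp(tl, tr, tx);
    let bottom = lerp(bl, br, tx);
    lerp(top, bottom, ty)
}

/// Explicit clamping to the u8 range before the cast.
fn clamp_explicit(v: f32) -> u8 {
    v.max(0.0).min(255.0) as u8
}

/// Relies on the cast alone: `as u8` saturates out-of-range floats
/// and truncates the fractional part.
fn clamp_by_cast(v: f32) -> u8 {
    v as u8
}
```

The two clamping variants agree on every input, including out-of-range values and NaN, so the difference between them is purely about which one the compiler turns into faster code.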