When looking at why our GHE backups were taking so long, I noticed that various operations are CPU constrained on gzip compression/decompression. For example, when dumping the MySQL database, I witnessed the MySQL process consume ~14 minutes of CPU time total and the gzip process it was piping to consume ~45 minutes of CPU time! Put another way, if the compressor could ingest data as fast as the MySQL dump is capable of emitting it, GHE backups would complete ~30 minutes faster on this instance on the MySQL data alone. On the decompression side, the MySQL archive consumed ~39 minutes of CPU time with gzip.
Large backup operations could be substantially faster if a modern, faster compression library were used. I personally recommend zstd, which yields better and faster compression than zlib/gzip at default/normal compression levels. I compressed the MySQL dump of our GHE instance with zstd (using level 3, the default) and the resulting archive was ~75% the size of the gzip version and took far less CPU to compress. On the decompression side, zstd required ~130s of CPU time versus ~2,400s for gzip.
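For anyone who wants to reproduce this kind of comparison on their own data, here is a minimal sketch of how the CPU cost of a compressor process could be measured. It is not how the numbers above were collected. The helper name `measure_compressor` and the sample payload are hypothetical, and it assumes a Unix-like system where `os.times()` reports child CPU time and where the `gzip` and `zstd` command-line tools may or may not be installed.

```python
import os
import shutil
import subprocess
import time


def measure_compressor(command, payload):
    """Pipe payload through command and return (output_bytes, cpu_seconds, wall_seconds).

    Hypothetical helper: CPU time is the user + system time accumulated by
    child processes while the command ran (reported on Unix-like systems).
    """
    before = os.times()
    start = time.perf_counter()
    result = subprocess.run(
        command, input=payload, stdout=subprocess.PIPE, check=True
    )
    wall = time.perf_counter() - start
    after = os.times()
    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return len(result.stdout), cpu, wall


if __name__ == "__main__":
    # Stand-in payload (an assumption); a real test would feed an actual dump.
    sample = b"INSERT INTO t VALUES (1, 'example row');\n" * 200_000

    candidates = {
        "gzip": ["gzip", "-c"],          # gzip at its default level
        "zstd": ["zstd", "-3", "-c"],    # zstd level 3, its default
    }
    for name, command in candidates.items():
        if shutil.which(command[0]) is None:
            print(f"{name}: not installed, skipping")
            continue
        size, cpu, wall = measure_compressor(command, sample)
        print(f"{name}: {size} bytes, {cpu:.2f}s CPU, {wall:.2f}s wall")
```

Highly repetitive sample data like this flatters every compressor, so the ratios it prints are not representative; running the helper against a real dump file is what would make the comparison meaningful.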
In summary, various GHE backup operations are CPU constrained by gzip compression. Replacing gzip with something more modern like zstd will make these operations faster, substantially so on larger GHE instances.