Lossless Compression Benchmarks
Lossless compression algorithms and their implementations are routinely tested in head-to-head benchmarks, and several of these benchmarks are well known. Some measure only compression ratio, so their winners may be unsuitable for everyday use because the top performers are slow. Another drawback of some benchmarks is that their data files are public, so program authors can tune their programs for best performance on that particular data set. The winners on these benchmarks often come from the class of context-mixing compression software.
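As a minimal sketch of what such a benchmark measures, the following Python harness times compression and decompression and reports the ratio, using the standard-library codecs zlib, bz2, and lzma as stand-ins for the archivers these benchmarks actually test; the input file name is a placeholder.

```python
import bz2
import lzma
import time
import zlib

# Candidate codecs: name -> (compress, decompress). Standard-library
# codecs stand in for the programs a real benchmark would test.
CODECS = {
    "zlib-9": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "bz2-9": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

def benchmark(data: bytes) -> None:
    """Report compression ratio and (de)compression time per codec."""
    for name, (compress, decompress) in CODECS.items():
        t0 = time.perf_counter()
        packed = compress(data)
        t1 = time.perf_counter()
        restored = decompress(packed)
        t2 = time.perf_counter()
        assert restored == data  # lossless round trip
        ratio = len(data) / len(packed)
        print(f"{name:8s} ratio={ratio:5.2f} "
              f"comp={t1 - t0:6.3f}s decomp={t2 - t1:6.3f}s")

if __name__ == "__main__":
    with open("testfile.bin", "rb") as f:  # hypothetical input file
        benchmark(f.read())
```

A real benchmark would run each program on several data types (text, images, executables), exactly because a codec tuned to one kind of data can rank very differently on another.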
The benchmarks listed in the 5th edition of the Handbook of Data Compression (Springer, 2009) are:
- The Maximum Compression benchmark, started in 2003 and frequently updated, covers over 150 programs. Maintained by Werner Bergmans, it tests programs on a variety of data sets, including text, images, and executable code. Two types of results are reported: single file compression (SFC) and multiple file compression (MFC). Unsurprisingly, context-mixing programs often win here; entries from the PAQ series and WinRK are frequently at the top. The site also keeps a list of pointers to other benchmarks.
- UCLC (the ultimate command-line compressors) benchmark by Johan de Bock is another actively maintained benchmark covering over 100 programs. PAQ programs and WinRK win most of its tests, with the exception of lossless audio encoding and grayscale image compression, where some specialized algorithms shine.
- Squeeze Chart by Stephan Busch is another frequently updated site.
- The EmilCont benchmarks by Berto Destasio are somewhat outdated, having been last updated in 2004. A distinctive feature is that the data set is not public, which prevents optimizations that target it specifically. Nevertheless, the best-ratio winners are again the PAQ family, SLIM, and WinRK.
- The Archive Comparison Test (ACT) by Jeff Gilchrist included 162 DOS/Windows and 8 Macintosh lossless compression programs, but it was last updated in 2002.
- The Art Of Lossless Data Compression by Alexander Ratushnyak provides a similar test performed in 2003.
Matt Mahoney, in his February 2010 edition of the free booklet Data Compression Explained, additionally lists the following:
- The Calgary Corpus, dating back to 1987, is no longer widely used due to its small size, although Leonid A. Broukhis still maintains The Calgary Corpus Compression Challenge, which started in 1996.
- The Large Text Compression Benchmark and the similar Hutter Prize both use a trimmed Wikipedia XML UTF-8 data set.
- The Generic Compression Benchmark, maintained by Mahoney himself, tests compression of algorithmically generated data (the output of random programs) rather than a fixed public file.
- Sami Runsas (the author of NanoZip) maintains Compression Ratings, a benchmark similar to Maximum Compression's multiple file test but with minimum speed requirements. It also offers a calculator that lets the user weight the importance of speed against compression ratio (see the sketch after this list). The top programs here differ considerably because of the speed requirement: in January 2010 they were NanoZip, followed by FreeArc, CCM, flashzip, and 7-Zip.
- The Monster of Compression benchmark by N. F. Antonio tests compression on 1 GB of public data with a 40-minute time limit. As of December 20, 2009, the top-ranked archiver is NanoZip 0.07a and the top-ranked single-file compressor is ccmx 1.30c, both context mixing.
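The weighting calculator mentioned above can be illustrated with a short sketch. The scoring below normalizes each program's ratio and speed to the best value observed and blends them with a user-chosen weight; this is an assumption for illustration, not the formula Compression Ratings actually uses, and the measurements are hypothetical.

```python
def rank(results, speed_weight):
    """Rank (name, ratio, seconds) tuples by a weighted score.

    Each metric is normalized to the best observed value so the two
    are comparable; speed_weight in [0, 1] slides the emphasis from
    pure compression ratio (0) to pure speed (1). Illustrative only;
    the formula Compression Ratings actually uses may differ.
    """
    best_ratio = max(r for _, r, _ in results)
    best_time = min(t for _, _, t in results)

    def score(entry):
        _, ratio, seconds = entry
        return ((1 - speed_weight) * ratio / best_ratio
                + speed_weight * best_time / seconds)

    return sorted(results, key=score, reverse=True)

# Hypothetical measurements: (program, compression ratio, seconds).
results = [("A", 4.1, 600.0), ("B", 3.2, 45.0), ("C", 2.5, 8.0)]
for w in (0.0, 0.5, 1.0):
    ranked = [name for name, *_ in rank(results, w)]
    print(f"speed weight {w}: " + " > ".join(ranked))
```

Shifting the weight reorders the field: the slow, strong compressor wins at weight 0, while the fast, weaker one wins at weight 1, which is exactly why ratio-only benchmarks and speed-constrained benchmarks crown different programs.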
Compression Ratings also publishes a chart summarizing the "frontier" in compression ratio and time: the set of programs that no other entry beats on both measures at once.
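A program sits on that frontier when no other program compresses at least as well and runs at least as fast. A short sketch of the computation, again on hypothetical measurements:

```python
def frontier(results):
    """Return the programs on the ratio/time frontier.

    A program is on the frontier if no other entry has a ratio at
    least as high *and* a time at least as low, with one of the two
    strictly better. Input: (name, ratio, seconds) tuples.
    """
    keep = []
    for name, ratio, secs in results:
        dominated = any(
            r >= ratio and t <= secs and (r > ratio or t < secs)
            for n, r, t in results if n != name
        )
        if not dominated:
            keep.append((name, ratio, secs))
    # Sort fastest-first, as a chart would plot them.
    return sorted(keep, key=lambda e: e[2])

results = [("A", 4.1, 600.0), ("B", 3.2, 45.0),
           ("C", 2.5, 8.0), ("D", 3.0, 90.0)]
print(frontier(results))  # D is dominated by B: worse ratio, slower
```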