Replies: 7 comments 34 replies
-
Hi! Sorry I'm replying so late. MCM is a great algorithm, with a very competitive compression ratio. That doesn't mean it can't be used if you treat dwarfs like an archiver. I saw in a thread about brotli that there is an abstraction in dwarfs for adding new compression algorithms without too much trouble. I could definitely get behind an MCM-backed dwarfs for some uses.
-
Here's a quick and dirty comparison of some algorithms: Text version:
The MCM result was produced using mhx's script to extract all blocks and compress them individually.
-
That is very cool! Did using dwarfs make it multithreaded, though? Also, how is the speed with MT? Is it comparable to LZMA?
-
Hey, just a quick note that I'm not ignoring this thread (or the precomp one). This is all really interesting, but I need to find time to write down some thoughts. While it would be incredibly cool to do all this, it's likely not straightforward. I'll get back.
-
I've been pondering this (and a few other feature requests / ideas) for a bit now and here are some thoughts:
BCJ filtering would definitely help ZSTD compression as much as it does XZ. MCM is still better, but as far as I can tell it currently only implements x86 filtering (which is why it's only marginally better for ARM binaries):
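For readers unfamiliar with why such a filter helps, here is a minimal sketch of the general idea behind an x86 branch filter: the relative operand of each `CALL` (opcode `0xE8`) is rewritten as an absolute target, so repeated calls to the same function become identical byte sequences that a compressor can match. This is a deliberately simplified "E8 transform" written for illustration. It is not the actual BCJ filter from XZ (which applies extra heuristics) and not MCM's or dwarfs's implementation; the function name `e8_filter` is hypothetical.

```python
import struct

def e8_filter(data: bytes, encode: bool = True) -> bytes:
    """Simplified x86 CALL filter (illustrative only, not the XZ BCJ filter).

    encode=True  : relative CALL operand -> absolute target address
    encode=False : absolute target address -> relative CALL operand
    """
    buf = bytearray(data)
    i = 0
    while i + 5 <= len(buf):
        if buf[i] == 0xE8:  # CALL rel32
            (operand,) = struct.unpack_from("<I", buf, i + 1)
            next_insn = i + 5  # rel32 is relative to the next instruction
            if encode:
                operand = (operand + next_insn) & 0xFFFFFFFF
            else:
                operand = (operand - next_insn) & 0xFFFFFFFF
            struct.pack_into("<I", buf, i + 1, operand)
            i += 5  # skip the operand so encoder and decoder scan identically
        else:
            i += 1
    return bytes(buf)

# Two calls to the same target (0x100) from offsets 0x00 and 0x10.
code = b"\xE8\xFB\x00\x00\x00" + b"\x90" * 11 + b"\xE8\xEB\x00\x00\x00"
filtered = e8_filter(code)
```

After filtering, both call operands hold the same four bytes, whereas the unfiltered operands differ. Running the function again with `encode=False` restores the original input.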
-
Sounds great, especially the filters in combination with zstd. I don't really care about extreme algorithms anymore, since I want to use the format with its mounting feature. So helping zstd be better is the way!
-
There are the codecs from Bulat Ziganshin's FreeArc. One of them is called dispack (actually an adaptation of the original one). It's what I use in my scripts to filter executable data, and it works great on exes of all origins. rep is an extremely fast deduper which can speed things up and help the data blocks compress better. They can be separated from FA, of course. Then there's also this, with lots of transforms, including ones for exes.
-
Hi, I deal with compression of compiled binary data and am always looking to improve it (while keeping it streamable within dwarfs most of the time).
Recently I stumbled upon mcm, a compression algorithm specialized for binary data.
https://github.com/mathieuchartier/mcm
There is also a fork with improvements up until last year: https://github.com/eternaleye/mcm (which I unfortunately couldn't compile).
The algorithm is single-threaded, but perhaps implementing it in a format like dwarfs could make it multithreaded.
Even with MT support, it is not that fast. But it does make me curious about how well it can do, or whether its context mixing techniques could be brought into modern algorithms. (I'm not a scientist.)
If you know of other algorithms that suit this task, please share them; I'm happy to learn about what I'm not aware of.
A benchmark: on 6 GiB of binary data, mcm's output was 50 MiB smaller than that of lzma level 9 extreme.
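To illustrate the multithreading idea: a single-threaded compressor can still be used in parallel when the data is split into independent blocks and each block is compressed on its own worker. The sketch below is only an illustration under assumptions. It uses Python's standard `lzma` module as a stand-in for mcm (no Python binding for mcm is assumed here), the names `compress_blocks` and `decompress_blocks` and the 1 MiB block size are hypothetical, and it is not how dwarfs itself is implemented.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor

def compress_blocks(data: bytes, block_size: int = 1 << 20, workers: int = 4) -> list[bytes]:
    """Split data into fixed-size blocks and compress each one independently."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the blocks can be rejoined later.
        return list(pool.map(lzma.compress, blocks))

def decompress_blocks(blocks: list[bytes]) -> bytes:
    """Decompress every block and concatenate the results in order."""
    return b"".join(lzma.decompress(block) for block in blocks)
```

The trade-off is that each block is compressed without knowledge of the others, so redundancy across block boundaries is not exploited. Larger blocks usually compress better but offer less parallelism and coarser random access.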