Thread replies: 33
Thread images: 1

File: zip-1[1].png (1MB, 500x500px)
What is the superior compression algorithm?

The most famous ones are of course zip and rar, but zip is dead and done for any professional work, since it was designed back when computers had 64 KB of RAM and is now outdated as hell.
So what should a neckbeard use to archive his millions of text files, images, and other stuff today if he wants to keep up with the times?
>>
LZMA2 probs
>>
>>61137124
Probably, which means you could just use 7zip, as .7z uses LZMA
>>
Electro-optical attenuators or FETs are the way to go.
>>
How does file compression work?
>>
>>61137114
>he wants to compress already compressed images
>he doesn't know what happens when compressing high entropy files
>>
>>61137231
>>he doesn't know what happens when compressing high entropy files
Nothing. Literally.
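You can check that with a few lines of Python; zlib stands in here for any general-purpose compressor (strictly, the output even grows a little, by the container's overhead):

import os, zlib

data = os.urandom(1_000_000)              # high entropy: uniformly random bytes
packed = zlib.compress(data, 9)           # maximum effort
print(len(packed) - len(data))            # small positive number: it got slightly BIGGER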
>>
>>61137206

It finds repeating strings in the data and replaces them with a dictionary marker instead. For a really, really simple example, let's pretend your file is, at its core, this string

AKGISGUAPWKGISFJWUKGISODSPKGIS

You'll notice KGIS appears 4 times. So when compressing, you can put a dictionary marking in each of those places, a single reference character. Your compressed archive now looks like this

A*GUAPW*FJWU*ODSP*

And a dictionary file says

* = KGIS

Congratulations, you just shaved 16 characters off the file, saving space. When it decompresses, the * is replaced with KGIS and your file is restored to its original form.
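A minimal sketch of that scheme in Python (purely illustrative: the single-character marker and the separate dictionary are simplifications, and real compressors emit binary tokens):

# Toy dictionary compression: swap a repeated substring for a marker.
def compress(text, pattern, marker="*"):
    assert marker not in text                 # marker must not clash with the data
    return text.replace(pattern, marker), {marker: pattern}

def decompress(packed, dictionary):
    for marker, pattern in dictionary.items():
        packed = packed.replace(marker, pattern)
    return packed

data = "AKGISGUAPWKGISFJWUKGISODSPKGIS"
packed, table = compress(data, "KGIS")
print(packed, table)                          # A*GUAPW*FJWU*ODSP* {'*': 'KGIS'}
assert decompress(packed, table) == data      # lossless round trip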
>>
>>61137426
>>61137206

Sorry, I meant to say you saved 12 characters. You replaced 16 characters with 4.
>>
Who cares, they're all worthless until someone implements multithreaded decompression.
>>
>>61137426
>>61137437
Thanks
>>
>if he wants to keep up with the times?

Get an 8 TB drive for $180 and leave all your shit uncompressed.
>>
>>61137426
There's more to it than that, especially in certain applications like image compression.
>>
>>61137671
Not that guy, but image compression is a different subject from general file compression.

Saving the same .jpeg file over and over with any image editing software will drastically reduce the image quality (JPEG is like that by design), but compressing it with WinRAR a million times won't.
However, the size reduction will be pretty small.
>>
>>61137114
>superior compression

That'll be BASE64.

The "64" is because it tries the top 64 different algorithms on each block, using the most efficient each time.
>>
>>61137114
LZMA in the 7zip container
>>
>>61137114
your own custom version of Zlib
>>
>>61137741
>>61137671
Lossless compression (FLAC, 7z, PNG) differs from lossy compression (MP3, JPG)
>>
>>61137480

Computerphile has a great set of videos about compression that go into a lot more detail than the anon who replied to you:

https://www.youtube.com/watch?v=Lto-ajuqW3w

here's a really good one about jpg compression too, which is a totally different ballgame: https://www.youtube.com/watch?v=Q2aEzeMDHMA
>>
>>61137671
The brief history of ZIP:
At first there were two major ideas in compression:
- one (as told earlier): marking repeated sequences
- and two: Huffman coding https://en.wikipedia.org/wiki/Huffman_coding
In short, data is usually structured in bytes (8-bit groups), but if some characters are far more frequent, like an 'a' present in the data in vast amounts, we can mark it not by its 8-bit signature but by a 3- or 4-bit one (example: 1110), less than half its original size. This leads to variable-length markings, which have to be chosen carefully: say 'b' is marked 1111 and 'c' is marked 11110; we would never detect 'c', because 11110 starts with 1111, which is 'b', and the trailing 0 would be read as the start of the next character mark. For this reason there is a fixed Huffman coding table, where the most used characters (and sequence markings) get 7-bit signatures and the least used get 9-bit ones, but it's also possible to build a specialized Huffman table for each piece of data. A file doesn't necessarily consist of a single compression block, either: each block of compressed data can have its own Huffman table. https://en.wikipedia.org/wiki/DEFLATE
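A minimal sketch of that construction in Python, using the textbook greedy Huffman algorithm (illustrative only; this builds a per-input table, not DEFLATE's fixed one):

import heapq
from collections import Counter

def huffman_code(text):
    # One weighted leaf per symbol; repeatedly merge the two lightest
    # subtrees, prepending one bit to every code inside each.
    heap = [[freq, [sym, ""]] for sym, freq in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return dict(heap[0][1:])

print(huffman_code("aaaaaaaabbbc"))   # e.g. {'a': '1', 'b': '01', 'c': '00'}

The result is prefix-free (no code is the start of another), which is exactly what avoids the 1111 vs 11110 ambiguity described above.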
Also there are LZ77 and LZ78:
https://en.wikipedia.org/wiki/LZ77_and_LZ78
LZ77 is the one inside ZIP's DEFLATE. It's basically a moving search window: it sets how far back the algorithm may look in the already-seen data for repeating sequences. By default this is a 32 KB window (legacy reasons; this is the parameter most modern compression software raises, like WinRAR and 7z, and it's quite surprising how little raising it improves efficiency).
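A toy version of that search window in Python ("window" below plays the role of the look-back distance; the brute-force scan is for clarity, where real encoders use hash chains, and DEFLATE also caps match lengths):

def lz77_tokens(data, window=32 * 1024, min_match=3):
    i, out = 0, []
    while i < len(data):
        best_len, best_off = 0, 0
        # look back at most `window` bytes for the longest match
        for j in range(max(0, i - window), i):
            length = 0
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            out.append(("match", best_off, best_len))   # (distance back, length)
            i += best_len
        else:
            out.append(("lit", data[i]))                # literal byte
            i += 1
    return out

print(lz77_tokens(b"abcabcabcabc"))
# [('lit', 97), ('lit', 98), ('lit', 99), ('match', 3, 9)]
# note the match is longer than its distance: it overlaps itself, which LZ77 allows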
Then there is the ZIP64 extension, a small patch to the file structure that lifts the 32-bit size limitation on files and related directory limits.

damn character limitations...
>>
>>61137382
But the rotational velocidensity will get out of whack
>>
>>61138395
A few years ago I worked on building an in-house compression system for our company, optimized for our data structures. We started by understanding ZIP... boy, it was a long run, but worth every minute.
I'd like to give some spotlight to two major players in what ZIP is today.
First is its maker, Phil Katz:
https://en.wikipedia.org/wiki/Phil_Katz
He was the one who first put the ZIP file format together, with amazing insight into forward and backwards compatibility.
The other is Mark Adler, who together with Jean-loup Gailly made the "zlib" library, used all around the world from games to business software; he still wanders around the internet helping people understand how ZIP really works:
https://stackoverflow.com/questions/20762094/how-are-zlib-gzip-and-zip-related-what-do-they-have-in-common-and-how-are-they
>>
Why does adding one tiny text file to an existing zip archive take a million billion years to complete?
>>
>>61138821
Well, it depends on the software; some redo the whole archive even for the slightest change.
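For what it's worth, Python's standard zipfile module can append ('a' mode) without recompressing what's already there; archive.zip and note.txt are placeholder names:

import zipfile

# 'a' mode appends the new entry and rewrites only the central directory
# at the end of the file; existing compressed entries are left untouched.
with zipfile.ZipFile("archive.zip", "a", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.write("note.txt")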
>>
>>61137114
Depends on what you mean by superior.
Size reduction?
Compression speed? (trust me, this matters more than you think it would, especially for large files)
Decompression speed? (doesn't take as much CPU, but it is a metric)
Something else?

Different algorithms do different things.
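A rough way to measure those trade-offs yourself, comparing zlib levels in Python (big_input.bin is a placeholder path):

import time, zlib

data = open("big_input.bin", "rb").read()       # placeholder input
for level in (1, 6, 9):
    t0 = time.perf_counter()
    packed = zlib.compress(data, level)
    print(f"level {level}: {len(packed)} bytes, {time.perf_counter() - t0:.2f}s")

Typically the higher levels trade noticeably more CPU time for somewhat smaller output.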
>>
>>61137124
>>61137155
>>61137819
the answer is LZMA, which yields the smallest file sizes yet still decompresses faster than bzip2. 7zip uses LZMA in the traditional Windows archive-file way, and xz uses LZMA in the traditional Unix way
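Python even ships LZMA in its standard library, and by default lzma.compress emits the .xz container, i.e. the Unix flavor mentioned above:

import lzma

data = b"some repetitive payload " * 1000
packed = lzma.compress(data)              # .xz container, LZMA2 filter by default
assert lzma.decompress(packed) == data    # lossless round trip
print(len(data), "->", len(packed))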
>>
use 7-zip and tell WinRAR to go fuck itself

use lzma

but really, just play around until you get the size you want.
>>
>>61137611
>just throw more hardware at it XD
do you work for Currysoft by any chance?
>>
>>61140065

Why so sour, bro? What did WinRAR do to you?

>>61140230

You do realize that even if he were to compress all his shit, he would still need to throw more hardware at it in order to download any more, right?

If he's compressing just to compress, then he's not gaining any space. Videos and images are already as well compressed as they can be as distributed, and I doubt he has 1-3 TB worth of text files.

I'm not saying to buy new hardware. I'm saying don't waste time compressing videos/images/pretty much anything, because it's already compressed. Unless you and he just want to waste time and hard drive lifespan on excessive, unnecessary reads/writes and the time it takes to read and write compressed files.
>>
>>61137114
Hash the file.
Whenever you want it back just brute force it.
>>
>>61141221
That is very unreliable. Hashes can produce duplicates.
>>
>>61141236
Yeah, but chances are most collisions you find will be random bullshit, or at least nothing close to the original file.

Good luck "decompressing" anything over any meaningful size tho.
>>
>>61137790
But then you have to store an extra 6 bits of information at the beginning of each block so that you can tell what each block is compressed with.