If we had the computing power to brute-force hashing algorithms like MD5, where the original string/bin data was ~10MB, wouldn't we essentially have created the greatest form of compression?
Why am I wrong?
I guess technically it would, but it would not be time efficient, since you would have to run through all the previous combinations before finding yours.
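Here's what that brute force looks like scaled way down to two-byte inputs so it actually finishes (Python, stdlib hashlib; the sizes are shrunk for the demo, the idea is otherwise OP's):
[code]
# Brute-force a preimage for a known MD5. Two-byte inputs take ~25k tries;
# 10MB inputs would take up to 256^10485760 tries, and (per the rest of
# this thread) you'd hit countless collisions before the "right" one.
import hashlib
from itertools import product

target = hashlib.md5(b"ab").digest()   # pretend this is the "compressed" file
for cand in product(range(256), repeat=2):
    data = bytes(cand)
    if hashlib.md5(data).digest() == target:
        print(data)                    # b'ab' -- only recoverable because
        break                          # the search space is microscopic
[/code]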
I don't fully understand your question, but I think you're trying to say that if you can reverse an md5 hash via brute force, you will have figured out the content of the file or string, therefore creating "compression".
/sqt/ is that way
>>59010764
Just step through pi until you find your exact 10mb of data and share the position/length of that.
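Here's the catch in a toy Python version (mpmath assumed installed just to generate digits; this treats pi's digits as effectively uniform random, which is conjectured but unproven):
[code]
from mpmath import mp

mp.dps = 100_000                       # compute ~100k decimal digits of pi
digits = mp.nstr(mp.pi, 100_000)[2:]   # digit string with the "3." stripped

target = "1234"                        # stand-in for your exact 10MB of data
pos = digits.find(target)
print(pos, len(str(pos)))              # the expected offset of an n-digit
                                       # target is ~10^n, so the "pointer"
                                       # takes about as many digits as the
                                       # data it was supposed to replace
[/code]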
Due to collisions, there's no guarantee that the brute-forced data is the original uncompressed data, even if it matches the checksum.
No. There are 2^(10*1024*1024 - 16) = 2^10485744 different 10MB files with the same MD5 hash.
>>59011016
Are you sure about that
is this some kind of troll thread?
>>59010987
this.
There are theoretically an infinite number of inputs n for which MD5(n) produces a given hash Z. A hash is not lossless, i.e. you lose information that is unrecoverable.
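You can watch the information loss happen if you shrink the output space. Rough Python sketch (stdlib hashlib; truncating to 20 bits is just so a collision shows up in ~1000 tries, a full 128-bit birthday collision needs ~2^64):
[code]
import hashlib
from itertools import count

seen = {}
for i in count():
    h = hashlib.md5(str(i).encode()).hexdigest()[:5]  # 5 hex chars = 20 bits
    if h in seen:
        print(f"md5({seen[h]}) and md5({i}) share the prefix {h}")
        break                          # two different inputs, one output:
    seen[h] = i                        # the mapping cannot be inverted
[/code]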
>>59011055
No, thanks for pointing that out. It's actually more: 2^(8*10485744)
>>59010764
Collisions.
>>59011172
>more possibilities than the amount of atoms in the observable universe
Are you sure about that
>>59011172
Only off by a factor of 8.
>>59011191
>2^10485744 / 2^83885952 = 8
Are you sure about that
An MD5 sum is 16 bytes, or 128 bits. So, there are 2^128 different MD5 sums possible.
10MB = 10000KB = 10000000B = 80000000 bits. So, there are 2^80000000 possible 10MB files.
Therefore, on average, each MD5 sum can "decompress" to any of 2^(80000000-128) 10MB files.
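If anyone wants to check the exponents, in Python (decimal megabytes, as above; the values themselves are too big to print, so compare logs):
[code]
hash_bits = 128                    # an MD5 sum is 16 bytes = 128 bits
file_bits = 10 * 1000 * 1000 * 8   # 10MB = 80000000 bits
print(file_bits - hash_bits)       # 79999872: log2 of the average number
                                   # of distinct 10MB files per MD5 sum
[/code]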
are you all retarded?
this is a serious question
>>59011256
Not too sure, too tired to think it through properly
Did I get the math wrong?
There are 16^32 possible md5 hashes and 2^10485760 possible 10MB files.
I'll give you a hint: there are vastly more possible 10MB files than there are MD5 hashes.
>>59011276
Whoops, I didn't convert bytes to bits. Yeah, >>59011243
>>59011055
He seems to be off a little bit.
number of 10MB files:
2^(8*10*2^20)
number of md5sums:
2^(8*16)
average number of 10MB files with a specific md5sum:
2^(8*10*2^20) / 2^(8*16) = 2^(8*(10*2^20 - 16)) = 256^(10*2^20 - 16)
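Same arithmetic in Python, comparing exponents only (variable names are mine; the actual values have ~25 million digits):
[code]
lhs_log2 = 8 * 10 * 2**20 - 8 * 16   # log2 of 2^(8*10*2^20) / 2^(8*16)
rhs_log2 = 8 * (10 * 2**20 - 16)     # log2 of 256^(10*2^20 - 16)
assert lhs_log2 == rhs_log2 == 83885952
print(lhs_log2)                      # the identity above checks out
[/code]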
I don't think you can say they're perfectly uniform. One md5sum value might have more preimages than another by a percent or two. The odds of even one md5sum having a unique 10MB file seem astronomical. For lossless compression to work, a given compressed version has to map to only one uncompressed version. A reversed md5sum maps to a gorillion.
Hashing and compression are dramatically different things. Any similarities you see between them are superficial; the mathematics behind a good hashing algorithm and a good compression algorithm share almost nothing in common.
>>59010987
>>59011016
>>59011276
It would be more practical to divide the uncompressed data into much smaller blocks (e.g. 256 bytes or so) and perform such a conversion on each individual block instead.
In addition, for every block you would want to store some key information as part of the final compressed data: say the original first, middle, and last bytes, plus a length value (for odd-sized blocks), stored alongside the resulting hash to better confirm a match to the original data when reversed.
Sure, you wouldn't achieve 10485760:32 compression ratios like in OP's scenario, but ~256:40 would still be pretty high compared to what many modern compression algorithms achieve.
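For scale, here's the per-block candidate count under this scheme (my reading of the sizes above; one byte assumed for the length field):
[code]
block_bits  = 256 * 8            # 2048 bits of input per block
stored_bits = (16 + 3 + 1) * 8   # md5 + first/middle/last bytes + length
print(block_bits - stored_bits)  # 1888: each stored record still matches
                                 # ~2^1888 distinct 256-byte blocks
[/code]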
>>59011603
>10485760:16 / 256:24*
I forgot to factor in storing the hash as byte data instead of text.
>>59011603
What you fail to realize is that there is no point at which it becomes viable.
>>59011620
No, you forgot to factor in the fact that if you map m items onto n items and n < m, you lose information.
The vast majority of possible 10MB files map to ~10MB files when compressed using lossless compression algorithms.
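The m-to-n loss in miniature (a trivial modulo "hash", obviously nothing like MD5):
[code]
# Ten inputs through a three-bucket "hash": the inverse is already ambiguous.
buckets = {}
for x in range(10):
    buckets.setdefault(x % 3, []).append(x)
print(buckets)   # {0: [0, 3, 6, 9], 1: [1, 4, 7], 2: [2, 5, 8]}
[/code]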