
If we had the computing power to bruteforce hashing algorithms


Thread replies: 26
Thread images: 2

File: file.png (26KB, 685x163px)
If we had the computing power to brute-force hashing algorithms like MD5, where the original string/binary data was ~10MB, wouldn't we essentially have created the greatest form of compression?

Why am I wrong?
>>
I guess technically it would, but it would not be time efficient, since you would have to run through all the previous combinations before finding yours.
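A toy version of that brute-force "decompression" can be sketched by shrinking the problem to a 16-bit truncation of MD5 so the search actually finishes; the real 128-bit preimage search would be ~2^127 work, and doubling the message length squares the search space:

```python
import hashlib
from itertools import product

# Toy "decompressor": recover a 2-byte message from a 16-bit
# truncated MD5 digest by exhaustive search. This is a sketch of the
# idea only; it does not scale to real digests or real file sizes.
def brute_force(target_prefix: bytes, length: int = 2) -> bytes:
    for candidate in product(range(256), repeat=length):
        data = bytes(candidate)
        if hashlib.md5(data).digest()[:2] == target_prefix:
            # First match wins -- it may be a collision, not the original.
            return data
    raise ValueError("no preimage found")

original = b"ok"
prefix = hashlib.md5(original).digest()[:2]
recovered = brute_force(prefix)
print(recovered)
```

Note that `recovered` is not guaranteed to equal `original`: any earlier input whose truncated digest collides with it is returned first, which is the collision problem raised further down the thread.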
>>
I don't fully understand your question, but I think you're trying to say that if you can reverse an MD5 hash via brute force, you will have figured out the content of the file or string, therefore creating "compression".

/sqt/ is that way
>>
>>59010764
Just step through pi until you find your exact 10mb of data and share the position/length of that.
>>
Due to collisions there's no guarantee that the bruteforced data is the original uncompressed data even if it matches the checksum.
>>
No. There are 2^(10*1024*1024 - 16) = 2^10485744 different 10MB files with the same MD5 hash.
>>
>>59011016
Are you sure about that
>>
is this some kind of troll thread?
>>
>>59010987
this.

There are theoretically an infinite number of inputs n for which MD5(n) produces a given hash Z. A hash is not lossless, i.e. you lose information that is unrecoverable.
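The many-preimages point shows up immediately once the digest is small enough to search. A birthday-style sketch over a 16-bit truncation of MD5 (65536 possible truncated digests, so two distinct inputs sharing one are guaranteed by pigeonhole within 65537 tries):

```python
import hashlib

# Find two distinct inputs whose MD5 digests agree on the first
# 2 bytes. Full MD5 behaves the same way in principle, just with
# 2^128 buckets instead of 2^16.
seen = {}
for i in range(100000):
    data = str(i).encode()
    prefix = hashlib.md5(data).digest()[:2]
    if prefix in seen:
        print("collision:", seen[prefix], data, prefix.hex())
        break
    seen[prefix] = data
```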
>>
>>59011055
No, thanks for pointing that out. It's actually more: 2^(8*10485744)
>>
>>59010764
Collisions.
>>
>>59011172
>more possibilites than the amount of atoms in the observable universe
Are you sure about that
>>
>>59011172
Only off by a factor of 8.
>>
>>59011191
>2^10485744 / 2^83885952 = 8
Are you sure about that
>>
An MD5 sum is 16 bytes, or 128 bits. So, there are 2^128 different MD5 sums possible.

10MB = 10000KB = 10000000B = 80000000 bits. So, there are 2^80000000 possible 10MB files.

Therefore, on average, each MD5 sum can "decompress" to any of 2^(80000000-128) 10MB files.
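The arithmetic in that post can be sanity-checked with Python's arbitrary-precision integers, using its decimal convention of 10MB = 10^7 bytes:

```python
# Back-of-the-envelope check of the counting argument: 2^128 possible
# MD5 sums versus 2^80000000 possible 10MB files.
md5_bits = 128
file_bits = 10_000_000 * 8   # 80,000,000 bits in a (decimal) 10MB file
num_hashes = 2 ** md5_bits
print(num_hashes)            # 340282366920938463463374607431768211456
# Average number of 10MB files per MD5 sum is 2^(80000000 - 128);
# the exponent alone shows why "decompressing" a hash is hopeless.
print(file_bits - md5_bits)  # 79999872
```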
>>
are you all retarded?
this is a serious question
>>
File: raw.gif (1MB, 498x278px)
>>59011243
>>
>>59011256
Not too sure, too tired to think it through properly

Did I get the math wrong?
>>
There are 16^32 possible md5 hashes and 2^10485760 possible 10MB files.

I will give you a hint: there are vastly more possible 10MB files than there are MD5 hashes.
>>
>>59011276
Whoops, I didn't convert bytes to bits. Yeah, >>59011243 has it right.
>>
>>59011055
He seems to be off a little bit.

number of 10MB files
2^(8*10*2^20)
number of md5sums
2^(8*16)

average number of 10MB files with a specific md5sum:
2^(8*10*2^20)/2^(8*16) = 2^(8*(10*2^20 - 16))
= 256^(10*2^20-16)

I don't think you can say they're perfectly uniform. One md5sum value might have more than another by a percent or two. The odds of even one md5sum having a unique 10MB file seem astronomical. For lossless compression to work you need for a given compressed version to map to only one uncompressed version. Reverse md5sum maps to a gorillion.
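The rough-uniformity claim is easy to eyeball empirically with a 1-byte truncation of MD5 (256 buckets). This is a sketch over 100000 sequential inputs; a good hash fills the buckets nearly evenly, with deviations that are just sampling noise:

```python
import hashlib
from collections import Counter

# Count how many of 100000 inputs land in each of the 256 possible
# 1-byte truncated MD5 values.
counts = Counter(hashlib.md5(str(i).encode()).digest()[:1]
                 for i in range(100000))
expected = 100000 / 256
worst = max(abs(c - expected) for c in counts.values())
print(len(counts), round(expected, 1), worst)
```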
>>
Hashing and compression are dramatically different things. Any similarities you see between them are aesthetic; the mathematics behind a good hashing algorithm and a good compression algorithm share almost nothing in common.
>>
>>59010987
>>59011016
>>59011276
It would be more practical to divide the uncompressed data into smaller blocks (much smaller than 10MB, e.g. 256 bytes or so) and perform such a conversion on each individual block instead.

In addition, for every block you would want to store key points of information as part of the final compressed data: say, the original first, middle, and last bytes, as well as a length value (for odd-sized blocks), stored alongside the resulting hash to better confirm a match to the original data when reversed.

Sure, you wouldn't achieve 10485760:32 compression ratios like OP's scenario, but ~256:40 would still be pretty high compared to what modern compression algorithms achieve in most cases.
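The counting problem doesn't go away at 256-byte blocks. Taking the thread's corrected ~24-byte record per block (16-byte binary hash plus the sample bytes and length), a quick sketch of the information lost:

```python
# A 256-byte block has 2^2048 possible contents, but a 24-byte
# "compressed" record can only take 2^192 values, so on average each
# record still matches an astronomical number of different blocks.
block_bits = 256 * 8    # 2048
record_bits = 24 * 8    # 192
print(block_bits - record_bits)  # 1856 bits of information lost per block
```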
>>
>>59011603
>10485760:16 / 256:24*
I forgot to factor in storing the hash as byte data instead of text.
>>
>>59011603
What you fail to realize is that there is no point at which it becomes viable.
>>
>>59011620
No, you forgot to factor in the fact that if you map m items to n items and n < m, you lose information.

The vast majority of possible 10MB files map to ~10MB files when compressed using lossless compression algorithms.
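That pigeonhole point is easy to check empirically: a real lossless codec like zlib cannot shrink random (incompressible) data and typically emits it slightly larger, as stored blocks plus framing overhead. A quick sketch:

```python
import os
import zlib

# Compress 1 MB of random bytes at maximum effort. Random data has
# ~8 bits of entropy per byte, so no lossless codec can shrink it.
data = os.urandom(1_000_000)
compressed = zlib.compress(data, level=9)
print(len(data), len(compressed))
```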