
Compressing random data



File: ant.jpg (16KB, 350x316px)
So explain to me why we can't compress random binary data again?

Couldn't you just make a rainbow table of all possibilities that can be stored in 32 bytes, and index those entries in a database with a RIPEMD160 hash of the data truncated to 16 bytes? (This is bound to generate collisions.)

chunk_size = 32                   # bytes per chunk
possibilities = chunk_size ** 8   # possible chunk values assumed here: 32**8 = 2**40
space_required = (possibilities * chunk_size) / 1024 / 1024 / 1024   # table size in GiB, one chunk_size entry each

Or 32,768 gigabytes. That's not that large. Now, to compress the data: hash the original data with SHA256 to get a checksum, and convert every 32-byte chunk into a 16-byte RIPEMD160 index into your table. To decompress, try every possible collision for the associated index hashes. If that's infeasible, you could always store a small amount of the original data too, to help brute-force the original data until it matches the initial checksum.

Result: ~50% reduction in any kind of data, no matter how random, but it requires a metric fuckload of hard drives. Would this be possible? Ay lmaooooo
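
Roughly what the compression step above would look like in Python, as a sketch only: it assumes hashlib's underlying OpenSSL actually exposes ripemd160 (not guaranteed on OpenSSL 3.x), and the 32/16-byte sizes are taken straight from the post. The decompression side, brute-forcing all the chunks that share each index, is the part the replies below go after.

import hashlib

CHUNK = 32   # bytes per chunk, as in the post
INDEX = 16   # bytes kept of each digest (the post's 16-byte index)

def compress(data):
    # Whole-file checksum used later to verify a brute-forced reconstruction.
    checksum = hashlib.sha256(data).digest()
    indices = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        # ripemd160 is only available if the underlying OpenSSL provides it.
        digest = hashlib.new("ripemd160", chunk).digest()
        indices.append(digest[:INDEX])   # 20-byte digest truncated to 16 bytes
    return checksum, indices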
>>
>>58833831
>Couldn't you just make a rainbow table of all possibilities that can be stored in 32 bytes
32 bytes = 32 * 8 bits = 256 bits.

Possible combinations for 256 bits = 2^256, which is about 10^77 possibilities.

I am gonna say no.
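
Spelling that out next to OP's figure (back-of-the-envelope arithmetic only):

chunk_size = 32
op_possibilities = chunk_size ** 8       # OP's count: 32**8 = 2**40, about 1.1 trillion
real_possibilities = 256 ** chunk_size   # actual count of 32-byte values: 2**256

print(op_possibilities)    # 1099511627776
print(real_possibilities)  # ~1.16e77, and each table entry would be another 32 bytes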
>>
>>58833903
Plus OP's scheme doesn't actually compress anything.

>To decompress, try every possibility of collisions for the associated index hashes.

I.e. this amounts to having to brute-force 50% of the original data anyway.
>>
>>58833831
Why make the chunks so big? If this were a good idea, a chunk size of 2 bits would suffice. Now we can just do what you suggested.

Random string:
00110110

Rainbow table:
00 -> 00
01 -> 11
10 -> 01
11 -> 10

Our compressed string:
00101101

And we have accomplished nothing; if the data is random enough, every entry of the rainbow table actually gets used and it does nothing.
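
A throwaway sketch of that table, just to show the "compressed" output is bit-for-bit the same length as the input (the table is the hypothetical one from the post above):

TABLE = {"00": "00", "01": "11", "10": "01", "11": "10"}   # a bijection on 2-bit chunks

def substitute(bits):
    # Reversible, but every 2-bit chunk still becomes a 2-bit chunk.
    return "".join(TABLE[bits[i:i + 2]] for i in range(0, len(bits), 2))

data = "00110110"
out = substitute(data)
print(out, len(out) == len(data))   # 00101101 True -- same length, nothing gained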
>>
>>58833831
You should really learn about entropy, OP.
>>
>>58833831
No. That's not how entropy works.

Hint: how many entries would you need in a table that stores all possible 32-bit values? One for every 32-bit value, i.e. 2**32 of them.

It's mathematically impossible for your algorithm to compress sufficiently random or already-compressed data any further; it'll generate way too many collisions (i.e. this will never do better than gzip).

It's a bit like declaring you've managed to store all the digits of pi in zero bytes using your amazing new compression scheme where an empty file = pi.

It's a bit like thermodynamics; once you've optimized it to a certain point, you can't make any further improvements.
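
To see the gzip comparison concretely, a small experiment with zlib (same DEFLATE family as gzip); exact numbers vary run to run, but random input never shrinks while repetitive input collapses:

import os
import zlib

random_data = os.urandom(65536)        # high-entropy input
repetitive = b"hello world " * 5000    # low-entropy input, 60000 bytes

print(len(zlib.compress(random_data, 9)))   # slightly larger than 65536
print(len(zlib.compress(repetitive, 9)))    # a few hundred bytes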
>>
OP, you may be onto something with using hash methods. The likelihood of a string of data (with a given length) having a collision with a string of data that is the same length is small. So, simply hash any given string of data. Do this in portions, of course.

Decompression would take ages, but there's a good chance the compression ratios would be good even on the most entropic of data.

However, if even one of these hashes has a collision that would be tried before the original value that produced it, the decompressed data won't be identical to the original data. This could be mitigated by testing decompression immediately after compressing and, if it fails, using an offset or a different hash size or something.
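
A toy of that brute-force decompression idea, shrunk to 2-byte chunks with a 1-byte truncated SHA-256 "index" so the search actually finishes (RIPEMD-160 swapped out for availability; the sizes are nothing like the 32/16 split discussed above, where the loop would never terminate):

import hashlib
from itertools import product

def index_of(chunk):
    return hashlib.sha256(chunk).digest()[:1]   # 1-byte index: ~256 chunks share it

def brute_force(index, checksum):
    # Try every possible 2-byte chunk until one matches both the stored
    # index and the whole-data checksum.
    for candidate in product(range(256), repeat=2):
        chunk = bytes(candidate)
        if index_of(chunk) == index and hashlib.sha256(chunk).digest() == checksum:
            return chunk
    raise ValueError("no candidate matched")

original = b"hi"
print(brute_force(index_of(original), hashlib.sha256(original).digest()))   # b'hi'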
>>
>>58833831
Random data does not contain any redundancies.
>>
>>58833831
Metric for the win.
Pounds make me feel fatter than kilos.
Miles make you feel like you're speeding less than you are.
Inches make me feel like I have a little dick.
Fahrenheit is shit because you don't know when it's freezing; Celsius is simple, 0 is freezing.
>>
Are you really this stupid, /g/? And I come here regularly. I think I should stop.

A compressor is a one-to-one mapping. For each string you compress, there is one that you make longer.

Now, if you count how many strings of length n there are, and how many strings are shorter, you will see the relation between the number of strings you can compress by at least 1 bit and the number of shorter strings that your compressor must make longer.
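
Doing that count for, say, n = 10 bits (trivial arithmetic, but it makes the gap explicit):

n = 10
length_n = 2 ** n                         # bit strings of exactly n bits
shorter = sum(2 ** i for i in range(n))   # bit strings of 0..n-1 bits = 2**n - 1

print(length_n, shorter)   # 1024 1023: one short, so some input can't shrink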

morons. I'm out of here. Forever.
>>
>>58836154
It might though
>>
>>58833831
The problem with this scheme is that the SHA256 sum will have collisions too.

Statistically, each 16-byte hash will have 2^128 colliding preimages in the 32-byte space. If you have two such blocks, you're likely to hit a spurious SHA256 match while brute-forcing them.

You can't cheat entropy.
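
Putting rough numbers on that, using the post's 32-byte blocks and 16-byte indices (illustrative arithmetic only):

block_bits = 32 * 8    # 256-bit blocks
index_bits = 16 * 8    # 128-bit truncated hash

preimages_per_index = 2 ** (block_bits - index_bits)    # 2**128 blocks share each index
candidate_pairs = preimages_per_index ** 2              # 2**256 ways to fill two blocks
expected_false_matches = candidate_pairs / 2 ** 256     # against a 256-bit checksum: ~1.0
print(preimages_per_index, expected_false_matches)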
>>
>>58833831

Because there is no redundant info in random data that you can get rid of.
>>
>>58833831
This is actually sort of how Riverbed SteelHeads work; however, you need an appliance on both ends of your WAN connection.
>>
>>58833831
The Pigeon Hole Principle proves that for lossless compression, if some inputs get mapped to shorter outputs, *some* inputs necessarily must map to larger ones.

For truly random data, you can't predict which inputs should get shrunk and which get made larger, so you're fucked. Things like gzip limit the degree to which expansion can hurt you by having a tiny flag indicating "next block of data not actually compressed", so you still lose, but not by very much.
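
A rough illustration of the "not by very much" part, using zlib (DEFLATE, same family as gzip): when the input is incompressible, the encoder can fall back to stored blocks, so the output only grows by a small amount of framing overhead.

import os
import zlib

blob = os.urandom(100_000)     # effectively incompressible
out = zlib.compress(blob, 9)
print(len(out) - len(blob))    # a few dozen bytes of overhead, not a blow-up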
>>
>>58833831
You've hit the minimum when "compressing" the data just shuffles the bits around in a circle.
>>
Read Claude Shannon. He is good, though von Neumann is a total badass.
>>
>>58833831
Entropy and information density are directly related.
>>
>>58838759
God is perfectly just. You can't compress random numbers.
>>
The BS here is that real data sets are rarely random, doubly so for anything a computer would actually compress/decompress; things like Squid rely on this.

On a large enough set of random data even RLE can still make a dent, because random or not there will still be a standard distribution of data.
>>
>>58833831

If your data is really random you can compress it like this:

stat -c%s random.dat > random.dat.rz

and uncompress with:

dd if=/dev/random of=random.dat bs=1 count="$(cat random.dat.rz)"
>>
>see a bunch of rambling incoherent thoughts
>none of you have a god damned clue about what you're babbling on about
>go watch "Silicon Valley" and learn about middle-out concepts
>stupid fucking people will be the death of us all
>>
File: 1485406470793.png (642KB, 726x1040px)
>>58833831
>random
>>
>>58833831
A table of all combinations of, say, 4 bytes will require 4 bytes to index any entry in it.
>>
Lyra is best pony.