
Compressing random data



File: ant.jpg (16KB, 350x316px)
So explain to me why we can't compress random binary data again?

Couldn't you just make a rainbow table of all possibilities that can be stored in 32 bytes, and index those entries in a database with a RIPEMD160 hash of the data truncated to 16 bytes? (This is bound to generate collisions.)

chunk_size = 32                   # bytes per chunk
possibilities = chunk_size ** 8   # possible chunk values assumed here: 32**8 = 2**40
space_required = (possibilities * chunk_size) / 1024 / 1024 / 1024   # table size in GiB, one chunk_size entry each

Or 32,768 gigabytes. That's not that large. Now, to compress the data: hash the original data with SHA256 to get a checksum, and convert every 32-byte chunk into a 16-byte RIPEMD160 index into your table. To decompress, try every possible collision for the associated index hashes. If that's infeasible, you could always store a small amount of the original data too, to help brute-force the original data until it matches the initial checksum.

Result: ~50% reduction in any kind of data, no matter how random, but it requires a metric fuckload of hard drives. Would this be possible? Ay lmaooooo
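
Roughly what the compression step above would look like in Python, as a sketch only: it assumes hashlib's underlying OpenSSL actually exposes ripemd160 (not guaranteed on OpenSSL 3.x), and the 32/16-byte sizes are taken straight from the post. The decompression side, brute-forcing all the chunks that share each index, is the part the replies below go after.

import hashlib

CHUNK = 32   # bytes per chunk, as in the post
INDEX = 16   # bytes kept of each digest (the post's 16-byte index)

def compress(data):
    # Whole-file checksum used later to verify a brute-forced reconstruction.
    checksum = hashlib.sha256(data).digest()
    indices = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        # ripemd160 is only available if the underlying OpenSSL provides it.
        digest = hashlib.new("ripemd160", chunk).digest()
        indices.append(digest[:INDEX])   # 20-byte digest truncated to 16 bytes
    return checksum, indices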
>>
>>58833831
>Couldn't you just make a rainbow table of all possibilities that can be stored in 32 bytes
32 bytes = 32 * 8 bits = 256 bits.

Possible combinations for 256 bits = 2^256, which is about 10^77 possibilities.

I am gonna say no.
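
Spelling that out next to OP's figure (back-of-the-envelope arithmetic only):

chunk_size = 32
op_possibilities = chunk_size ** 8       # OP's count: 32**8 = 2**40, about 1.1 trillion
real_possibilities = 256 ** chunk_size   # actual count of 32-byte values: 2**256

print(op_possibilities)    # 1099511627776
print(real_possibilities)  # ~1.16e77, and each table entry would be another 32 bytes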
>>
>>58833903
Plus OP's scheme doesn't actually compress anything.

>To decompress, try every possibility of collisions for the associated index hashes.

I.e. this amounts to having to brute-force 50% of the original data anyway.
>>
>>58833831
Why make the chunks so big? If this were a good idea, a chunk size of 2 bits would suffice. Now we can just do what you suggested.

Random string:
00110110

Rainbow table:
00 -> 00
01 -> 11
10 -> 01
11 -> 10

Our compressed string:
00101101

And we have accomplished nothing; if the data is random enough, every entry of the rainbow table actually gets used and it does nothing.
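
A throwaway sketch of that table, just to show the "compressed" output is bit-for-bit the same length as the input (the table is the hypothetical one from the post above):

TABLE = {"00": "00", "01": "11", "10": "01", "11": "10"}   # a bijection on 2-bit chunks

def substitute(bits):
    # Reversible, but every 2-bit chunk still becomes a 2-bit chunk.
    return "".join(TABLE[bits[i:i + 2]] for i in range(0, len(bits), 2))

data = "00110110"
out = substitute(data)
print(out, len(out) == len(data))   # 00101101 True -- same length, nothing gained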
>>
>>58833831
You should really learn about entropy, OP.
>>
>>58833831
No. That's not how entropy works.

Hint: how many entries would you need in a table that stores all possible 32-bit values? One for every 32-bit value, i.e. 2**32 of them.

It's mathematically impossible for your algorithm to compress sufficiently random or already-compressed data any further; it'll generate way too many collisions (i.e. this will never do better than gzip).

It's a bit like declaring you've managed to store all the digits of pi in zero bytes using your amazing new compression scheme where an empty file = pi.

It's a bit like thermodynamics; once you've optimized it to a certain point, you can't make any further improvements.
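
To see the gzip comparison concretely, a small experiment with zlib (same DEFLATE family as gzip); exact numbers vary run to run, but random input never shrinks while repetitive input collapses:

import os
import zlib

random_data = os.urandom(65536)        # high-entropy input
repetitive = b"hello world " * 5000    # low-entropy input, 60000 bytes

print(len(zlib.compress(random_data, 9)))   # slightly larger than 65536
print(len(zlib.compress(repetitive, 9)))    # a few hundred bytes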
>>
OP, you may be onto something with using hash methods. The likelihood of a string of data (with a given length) having a collision with a string of data that is the same length is small. So, simply hash any given string of data. Do this in portions, of course.

Decompression would take ages, but there's a good chance the compression ratios would be good even on the most entropic of data.

However, if even one of these hashes has a collision that would be tried before the original value that produced it, the decompressed data won't be identical to the original data. This could be mitigated by testing decompression immediately after compressing and, if it fails, using an offset or a different hash size or something.
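
A toy of that brute-force decompression idea, shrunk to 2-byte chunks with a 1-byte truncated SHA-256 "index" so the search actually finishes (RIPEMD-160 swapped out for availability; the sizes are nothing like the 32/16 split discussed above, where the loop would never terminate):

import hashlib
from itertools import product

def index_of(chunk):
    return hashlib.sha256(chunk).digest()[:1]   # 1-byte index: ~256 chunks share it

def brute_force(index, checksum):
    # Try every possible 2-byte chunk until one matches both the stored
    # index and the whole-data checksum.
    for candidate in product(range(256), repeat=2):
        chunk = bytes(candidate)
        if index_of(chunk) == index and hashlib.sha256(chunk).digest() == checksum:
            return chunk
    raise ValueError("no candidate matched")

original = b"hi"
print(brute_force(index_of(original), hashlib.sha256(original).digest()))   # b'hi'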
>>
>>58833831
Random data does not contain any redundancies.
>>
>>58833831
Metric for the win.
Pounds make me feel fatter than kilos.
Miles make you feel like you're speeding less than you are.
Inches make me feel like I have a little dick.
Fahrenheit is shit because you don't know when it's freezing; Celsius is simple, 0 is freezing.
>>
Are you really this stupid, /g/? And I come here regularly. I think I should stop.

A compressor is a one-to-one mapping. For each string you compress, there is one that you make longer.

Now, if you count how many strings of length n there are, and how many strings are shorter, you will see the relation between the number of strings you can compress by at least 1 bit and the number of shorter strings that your compressor must make longer.
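
Doing that count for, say, n = 10 bits (trivial arithmetic, but it makes the gap explicit):

n = 10
length_n = 2 ** n                         # bit strings of exactly n bits
shorter = sum(2 ** i for i in range(n))   # bit strings of 0..n-1 bits = 2**n - 1

print(length_n, shorter)   # 1024 1023: one short, so some input can't shrink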

morons. I'm out of here. Forever.
>>
>>58836154
It might though
>>
>>58833831
The problem with this scheme is that the SHA256 sum will have collisions too.

Statistically, each 16-byte hash will have 2^128 colliding preimages in the 32-byte space. If you have two such blocks, you're likely to hit a spurious SHA256 match while brute-forcing them.

You can't cheat entropy.
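
Putting rough numbers on that, using the post's 32-byte blocks and 16-byte indices (illustrative arithmetic only):

block_bits = 32 * 8    # 256-bit blocks
index_bits = 16 * 8    # 128-bit truncated hash

preimages_per_index = 2 ** (block_bits - index_bits)    # 2**128 blocks share each index
candidate_pairs = preimages_per_index ** 2              # 2**256 ways to fill two blocks
expected_false_matches = candidate_pairs / 2 ** 256     # against a 256-bit checksum: ~1.0
print(preimages_per_index, expected_false_matches)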
>>
>>58833831

Because there is no redundant info in random data that you can get rid of.
>>
>>58833831
This is actually sort of how Riverbed SteelHeads work; however, you need an appliance on both ends of your WAN connection.
>>
>>58833831
The Pigeon Hole Principle proves that for lossless compression, if some inputs get mapped to shorter outputs, *some* inputs necessarily must map to larger ones.

For truly random data, you can't predict which inputs should get shrunk and which get made larger, so you're fucked. Things like gzip limit the degree to which expansion can hurt you by having a tiny flag indicating "next block of data not actually compressed", so you still lose, but not by very much.
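
A rough illustration of the "not by very much" part, using zlib (DEFLATE, same family as gzip): when the input is incompressible, the encoder can fall back to stored blocks, so the output only grows by a small amount of framing overhead.

import os
import zlib

blob = os.urandom(100_000)     # effectively incompressible
out = zlib.compress(blob, 9)
print(len(out) - len(blob))    # a few dozen bytes of overhead, not a blow-up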
>>
>>58833831
You've hit the minimum when "compressing" the data just shuffles the bits around in a circle.
>>
Read Claude Shannon. He is good, though von Neumann is a total badass.
>>
>>58833831
Entropy and information density are directly related.
>>
>>58838759
God is perfectly just. You can't compress random numbers.
>>
The BS here is that real data sets are rarely random, doubly so for anything a computer would actually compress/decompress; things like Squid rely on this.

On a large enough set of random data even RLE can still make a dent, because random or not there will still be a standard distribution of data.
>>
>>58833831

If your data is really random you can compress it like this:

stat -c%s random.dat > random.dat.rz

and uncompress with:

dd if=/dev/random of=random.dat bs=1 count="$(cat random.dat.rz)"
>>
>see a bunch of rambling incoherent thoughts
>none of you have a god damned clue about what you're babbling on about
>go watch "Silicon Valley" and learn about middle-out concepts
>stupid fucking people will be the death of us all
>>
File: 1485406470793.png (642KB, 726x1040px)
>>58833831
>random
>>
>>58833831
A table of all combinations of, say, 4 bytes will require 4 bytes to index any entry in it.
>>
Lyra is best pony.