Does anyone understand the specifics of lossless compression?
I understand what it does but not how it does it.
>>60125755
Get a bachelors degree in Computer Science.
Essentially it's taking a hash file from a torrent and turning it into the source file with no errors. Hard but not impossible.
Which specific lossless encoding algorithm do you not understand?
I can trivially explain to you how something like Huffman encoding does lossless compression.
>>60125755
It depends on the specific algorithm, but generally it goes like this:
-compress the original source using some lossy algorithm based on a prediction function
-store differences
-pack both in a single file
>>60125796
I don't understand the algorithms used in Huffman encoding, and why they work the way they do
You can take very common patterns of data and make a shortcut reference to them. Say you have a file that is 10,000 bits long. It's mostly random, but there are patterns that occur often enough that you can make shortcuts to reference them. For example, think of an image like some MS Paint reaction image. There are going to be a lot of white pixels, so you might say that A references 10 white pixels in a row, and then replace all the data that means 10 white pixels in a row with A, which is less data than 001011000001001000... or whatever the bits look like. There's some overhead to store "10 white pixels = A", but if you take the most common patterns, then you are able to compress a lot of the data while still preserving all of it.
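To tie this back to the Huffman question: Huffman coding does the same "shortcut" idea at the bit level, giving frequent symbols shorter bit codes. A minimal Python sketch of the idea (a toy, not how any production codec is implemented):

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a prefix code: frequent symbols get shorter bit strings."""
    freq = Counter(data)
    # Each heap entry: (frequency, unique tiebreaker, {symbol: code-so-far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: only one distinct symbol
        return {data[0]: "0"}
    count = len(heap)
    while len(heap) > 1:
        # Repeatedly merge the two least frequent subtrees
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # Prepend one bit to every code in each merged subtree
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes("AAAAABBBCCCCCCCCC")
# 'C' (9 occurrences) gets a shorter code than 'B' (3 occurrences)
assert len(codes["C"]) < len(codes["B"])
```

With those frequencies the encoded message needs 25 bits total instead of 17 bytes of ASCII, and the code is a prefix code, so decoding is unambiguous.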
>>60125755
To put it simply, it often works like this:
AAAAABBBCCCCCCCCC = 5A3B9C
The original data can be determined from the compressed one.
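The 5A3B9C trick is run-length encoding. A minimal encode/decode pair in Python (a toy format: decimal count followed by the symbol, so it assumes the data contains no digits):

```python
def rle_encode(s):
    """Run-length encode: 'AAAAABBBCCCCCCCCC' -> '5A3B9C'."""
    out = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1  # extend the current run
        out.append(f"{j - i}{s[i]}")
        i = j
    return "".join(out)

def rle_decode(s):
    """Invert rle_encode; lossless by construction."""
    out = []
    count = ""
    for ch in s:
        if ch.isdigit():
            count += ch  # accumulate multi-digit counts
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)

assert rle_encode("AAAAABBBCCCCCCCCC") == "5A3B9C"
assert rle_decode("5A3B9C") == "AAAAABBBCCCCCCCCC"
```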
https://en.wikipedia.org/wiki/Information_theory
https://en.wikipedia.org/wiki/A_Mathematical_Theory_of_Communication
https://en.wikipedia.org/wiki/Entropy_(information_theory)
https://en.wikipedia.org/wiki/Shannon%27s_source_coding_theorem
https://en.wikipedia.org/wiki/Data_compression
>>60125909
But what if it's ABAABCCA
How would it know where to place the A, B, and C if it's compressed as 4A 2B 2C
>>60125964
>ABAABCCA
This information is much less redundant than
>AAAAABBBCCCCCCCCC
so compressing the former losslessly won't necessarily decrease the file size.
>>60125964
That would be 'compressed' as 1A1B2A1B2C1A
Time travel stupid goy
>>60125964
That's where algorithms come in: they search the file for patterns and then create an index, so that you can easily turn that ABAABCCA into a single byte.
Example (index entry, then the compressed data):
0x01 ABAABCCA
0x01
>>60125755
The other guys said this before, it's basically about patterns.
Imagine this is a part of a picture, where the numbers are colors:
111222
111222
111222
111333
111444
123456
It could be expressed like this:
a = 111222
line1, line2, line3 = a
line4 = a + 000111
line5 = line4 + 000111
line6 = 123456
If you want to know details, look up how to do matrix compression.
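The row-reference idea above can be sketched as a tiny deduplication scheme (a hypothetical format, not any real codec: each entry is either a literal row or a back-reference to an earlier identical row):

```python
def compress_rows(rows):
    """Replace repeated rows with references to their first occurrence."""
    seen = {}  # row -> index of its first occurrence
    out = []
    for i, row in enumerate(rows):
        if row in seen:
            out.append(("ref", seen[row]))  # back-reference, cheap to store
        else:
            seen[row] = i
            out.append(("lit", row))        # literal row data
    return out

def decompress_rows(packed):
    """Rebuild the original rows exactly; nothing is lost."""
    rows = []
    for kind, val in packed:
        rows.append(rows[val] if kind == "ref" else val)
    return rows

image = ["111222", "111222", "111222", "111333", "111444", "123456"]
packed = compress_rows(image)
assert decompress_rows(packed) == image
```

The repeated 111222 rows are stored once plus two small references, which is the whole point: redundancy becomes savings.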
>>60126956
Sorry, messed up the numbers.
But you get the idea.
>>60125924
The source coding theorem shows that (...) it is impossible to compress the data such that the code rate (average number of bits per symbol) is less than the Shannon entropy of the source, without it being virtually certain that information will be lost. However it is possible to get the code rate arbitrarily close to the Shannon entropy, with negligible probability of loss.
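To make the quoted bound concrete, here's how you'd compute the Shannon entropy of a short string under an i.i.d. symbol model (a sketch; real sources have context between symbols, so their true entropy rate is lower than this estimate):

```python
from collections import Counter
from math import log2

def entropy_bits_per_symbol(data):
    """H = -sum(p * log2(p)) over the observed symbol frequencies."""
    freq = Counter(data)
    n = len(data)
    return -sum((c / n) * log2(c / n) for c in freq.values())

# 'AAAAABBBCCCCCCCCC': p(A)=5/17, p(B)=3/17, p(C)=9/17
h = entropy_bits_per_symbol("AAAAABBBCCCCCCCCC")
# No lossless code can average fewer than h bits per symbol (about 1.45 here),
# far below the 8 bits/symbol of plain ASCII storage.
```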
So, how close are current state-of-the-art lossless algorithms to the Shannon entropy of the source material? I'm wondering how much improvement can still be made.
>>60125964
That's where algorithms come in: they search the file for patterns and then create an index, so that you can easily turn that ABAABCCA into a single byte.
Example:
0x00 XXXXXXXX - reserved header
0x01 EFLKABAA
0x02 BCCAPIZF
0x03 JHERVBNS
0x04 ITWQOUGF
0x05 ABAABCCA
Becomes:
0x00 XABAABCC
0x01 AZEFLKXP
0x02 IZFJHERV
0x03 BNSITWQO
0x04 UGFX
The letter "X" was used as a variable. You can expand the compression capacity by using variables, addresses, and headers in your algorithms.
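That scheme is essentially dictionary substitution. A minimal sketch in Python, simplified to one pattern and one sentinel character that must never occur in the raw data (the data stream below is the one from the example above):

```python
def dict_compress(data, pattern, sentinel="X"):
    """Store the pattern once in a header; replace each occurrence with the sentinel."""
    assert sentinel not in data and sentinel not in pattern
    return pattern, data.replace(pattern, sentinel)

def dict_decompress(pattern, body, sentinel="X"):
    """Expand every sentinel back into the pattern: perfectly reversible."""
    return body.replace(sentinel, pattern)

data = "EFLKABAABCCAPIZFJHERVBNSITWQOUGFABAABCCA"
pattern, body = dict_compress(data, "ABAABCCA")
assert dict_decompress(pattern, body) == data
assert len(body) < len(data)  # 26 characters instead of 40
```

Real dictionary coders (LZ77/LZ78 family) find the patterns automatically and use back-references instead of a fixed sentinel, but the reversibility argument is the same.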
>>60127045
Depends entirely on the context. If you have a video file and you want to compress it, it's already compressed beyond the Shannon limit of the source material because of the lossy compression. I don't know the answer for lossless video/audio/picture codecs, however.
>>60127221
Well, I asked about lossless algorithms specifically. Like lossless video codecs, or image compression like LZW for TIFF or whatever PNG uses, or FLAC, etc.
>>60125755
>I understand what it does but not how it does it.
Then you don't understand what it does.
>>60127652
Again, it's dependent on context. The Shannon limit only tells you how much information you could possibly stuff into a channel (of certain properties) without having any errors decoding it on the receiving end.
Before you can answer your question, you need to know how the information is going to be represented and encoded. By represented I mean how you would encode something like an audio signal, and by encoded I mean the coding you'll use to turn the audio waves into 0s and 1s.
https://www.ee.columbia.edu/~dpwe/e6820/lectures/L07-coding.pdf
Here's a PDF that can explain it far better than I ever could. Slide #10 shows how you'd calculate the Shannon limit for audio, given the method used to represent the audio.
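For reference, the usual "Shannon limit" of an analog channel is the Shannon-Hartley capacity, C = B * log2(1 + S/N) bits per second for bandwidth B and linear signal-to-noise ratio S/N. A quick sketch (the bandwidth and SNR numbers below are purely illustrative, not taken from the slides):

```python
from math import log2

def channel_capacity(bandwidth_hz, snr_linear):
    """Shannon-Hartley: the maximum error-free bit rate over a noisy channel."""
    return bandwidth_hz * log2(1 + snr_linear)

# Illustrative numbers: ~20 kHz audio bandwidth, 60 dB SNR (10**6 linear)
c = channel_capacity(20_000, 10**6)
# Roughly 0.4 Mbit/s: a ceiling on how many bits that channel can carry.
```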
>By represented I mean *the method* you'd use to encode something like an audio signal
Fix'd
>>60127932
>>60127944
ftp://svr-ftp.eng.cam.ac.uk/pub/reports/auto-pdf/robinson_tr156.pdf
Here's the paper FLAC is based on.
>>60125774
lel
>>60128090
>>60127932
So can you give a ballpark estimate of where we are in terms of compression? How close to ideal are we? 10%? 50%? 90%?
>>60129129
depends on what you're compressing
if you want to know more, check out:
https://en.wikipedia.org/wiki/Entropy_(information_theory)
>>60129340
For: recorded audio, video, and image data. Give me some numbers.
>>60129897
no idea
>>60130005
Thought so.
Speaking of Huffman compression, what is it called when you have separate code tables dependent on earlier code word?