Does anyone understand the specifics of lossless compression?
I understand what it does but not how it does it.
>>60125755
Get a bachelors degree in Computer Science.
Essentially it's taking a hash file from a torrent and turning it into the source file with no errors. Hard but not impossible.
Which specific lossless encoding algorithm do you not understand?
I can trivially explain to you how something like Huffman encoding does lossless compression.
>>60125755
It depends on the specific algorithm, but generally it goes like this:
-compress the original source using some lossy algorithm based on a prediction function
-store differences
-pack both in a single file
>>60125796
I don't understand the algorithms used in Huffman encoding, and why they work the way they do
You can take very common patterns of data and make a shortcut reference to them. Say you have a file that is 10,000 bits long. It's mostly random, but there are patterns that occur often enough that you can make shortcuts to reference them. For example, think of an image like some MS Paint reaction image. There are going to be a lot of white pixels, so you might say that A references 10 white pixels in a row, and then replace all the data that means 10 white pixels in a row with A, which is less data than 001011000001001000... or whatever the bits look like. There's some overhead to store "10 white pixels = A", but if you take the most common patterns, then you are able to compress a lot of the data while still preserving all of it.
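To tie this back to the Huffman question: Huffman coding does the same "shortcut" idea at the bit level, giving frequent symbols shorter bit codes. A minimal Python sketch of the idea (a toy, not how any production codec is implemented):

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a prefix code: frequent symbols get shorter bit strings."""
    freq = Counter(data)
    # Each heap entry: (frequency, unique tiebreaker, {symbol: code-so-far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: only one distinct symbol
        return {data[0]: "0"}
    count = len(heap)
    while len(heap) > 1:
        # Repeatedly merge the two least frequent subtrees
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # Prepend one bit to every code in each merged subtree
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes("AAAAABBBCCCCCCCCC")
# 'C' (9 occurrences) gets a shorter code than 'B' (3 occurrences)
assert len(codes["C"]) < len(codes["B"])
```

With those frequencies the encoded message needs 25 bits total instead of 17 bytes of ASCII, and the code is a prefix code, so decoding is unambiguous.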
>>60125755
To put it simply, it often works like this:
AAAAABBBCCCCCCCCC = 5A3B9C
The original data can be determined from the compressed one.
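The 5A3B9C trick is run-length encoding. A minimal encode/decode pair in Python (a toy format: decimal count followed by the symbol, so it assumes the data contains no digits):

```python
def rle_encode(s):
    """Run-length encode: 'AAAAABBBCCCCCCCCC' -> '5A3B9C'."""
    out = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1  # extend the current run
        out.append(f"{j - i}{s[i]}")
        i = j
    return "".join(out)

def rle_decode(s):
    """Invert rle_encode; lossless by construction."""
    out = []
    count = ""
    for ch in s:
        if ch.isdigit():
            count += ch  # accumulate multi-digit counts
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)

assert rle_encode("AAAAABBBCCCCCCCCC") == "5A3B9C"
assert rle_decode("5A3B9C") == "AAAAABBBCCCCCCCCC"
```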
https://en.wikipedia.org/wiki/Information_theory
https://en.wikipedia.org/wiki/A_Mathematical_Theory_of_Communication
https://en.wikipedia.org/wiki/Entropy_(information_theory)
https://en.wikipedia.org/wiki/Shannon%27s_source_coding_theorem
https://en.wikipedia.org/wiki/Data_compression
>>60125909
But what if it's ABAABCCA
How would it know where to place the A, B, and C if it's compressed as 4A 2B 2C
>>60125964
>ABAABCCA
This information is much less redundant than
>AAAAABBBCCCCCCCCC
so compressing the former losslessly won't necessarily decrease the file size.
>>60125964
That would be 'compressed' as 1A1B2A1B2C1A
Time travel stupid goy
>>60125964
That's where algorithms come in: they search the file for patterns and then create an index, so that you can easily turn that ABAABCCA into a single byte.
Example (index entry, then the compressed data):
0x01 ABAABCCA
0x01
>>60125755
The other guys said this before, it's basically about patterns.
Imagine this is a part of a picture, where the numbers are colors:
111222
111222
111222
111333
111444
123456
It could be expressed like this:
a = 111222
line1, line2, line3 = a
line4 = a + 000111
line5 = line4 + 000111
line6 = 123456
If you want to know details, look up how to do matrix compression.
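The row-reference idea above can be sketched as a tiny deduplication scheme (a hypothetical format, not any real codec: each entry is either a literal row or a back-reference to an earlier identical row):

```python
def compress_rows(rows):
    """Replace repeated rows with references to their first occurrence."""
    seen = {}  # row -> index of its first occurrence
    out = []
    for i, row in enumerate(rows):
        if row in seen:
            out.append(("ref", seen[row]))  # back-reference, cheap to store
        else:
            seen[row] = i
            out.append(("lit", row))        # literal row data
    return out

def decompress_rows(packed):
    """Rebuild the original rows exactly; nothing is lost."""
    rows = []
    for kind, val in packed:
        rows.append(rows[val] if kind == "ref" else val)
    return rows

image = ["111222", "111222", "111222", "111333", "111444", "123456"]
packed = compress_rows(image)
assert decompress_rows(packed) == image
```

The repeated 111222 rows are stored once plus two small references, which is the whole point: redundancy becomes savings.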
>>60126956
Sorry, messed up the numbers.
But you get the idea.
>>60125924
The source coding theorem shows that (...) it is impossible to compress the data such that the code rate (average number of bits per symbol) is less than the Shannon entropy of the source, without it being virtually certain that information will be lost. However it is possible to get the code rate arbitrarily close to the Shannon entropy, with negligible probability of loss.
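To make the quoted bound concrete, here's how you'd compute the Shannon entropy of a short string under an i.i.d. symbol model (a sketch; real sources have context between symbols, so their true entropy rate is lower than this estimate):

```python
from collections import Counter
from math import log2

def entropy_bits_per_symbol(data):
    """H = -sum(p * log2(p)) over the observed symbol frequencies."""
    freq = Counter(data)
    n = len(data)
    return -sum((c / n) * log2(c / n) for c in freq.values())

# 'AAAAABBBCCCCCCCCC': p(A)=5/17, p(B)=3/17, p(C)=9/17
h = entropy_bits_per_symbol("AAAAABBBCCCCCCCCC")
# No lossless code can average fewer than h bits per symbol (about 1.45 here),
# far below the 8 bits/symbol of plain ASCII storage.
```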
So, how close are current state-of-the-art lossless algorithms to the Shannon entropy of the source material? I'm wondering how much improvement can still be made.
>>60125964
That's where algorithms come in: they search the file for patterns and then create an index, so that you can easily turn that ABAABCCA into a single byte.
Example:
0x00 XXXXXXXX - reserved header
0x01 EFLKABAA
0x02 BCCAPIZF
0x03 JHERVBNS
0x04 ITWQOUGF
0x05 ABAABCCA
Becomes:
0x00 XABAABCC
0x01 AZEFLKXP
0x02 IZFJHERV
0x03 BNSITWQO
0x04 UGFX
The letter "X" was used as a variable. You can expand the compression capacity by using variables, addresses, and headers in your algorithms.
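That scheme is essentially dictionary substitution. A minimal sketch in Python, simplified to one pattern and one sentinel character that must never occur in the raw data (the data stream below is the one from the example above):

```python
def dict_compress(data, pattern, sentinel="X"):
    """Store the pattern once in a header; replace each occurrence with the sentinel."""
    assert sentinel not in data and sentinel not in pattern
    return pattern, data.replace(pattern, sentinel)

def dict_decompress(pattern, body, sentinel="X"):
    """Expand every sentinel back into the pattern: perfectly reversible."""
    return body.replace(sentinel, pattern)

data = "EFLKABAABCCAPIZFJHERVBNSITWQOUGFABAABCCA"
pattern, body = dict_compress(data, "ABAABCCA")
assert dict_decompress(pattern, body) == data
assert len(body) < len(data)  # 26 characters instead of 40
```

Real dictionary coders (LZ77/LZ78 family) find the patterns automatically and use back-references instead of a fixed sentinel, but the reversibility argument is the same.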
>>60127045
Depends entirely on the context. If you have a video file and you want to compress it, it's already compressed beyond the Shannon limit of the source material because of the lossy compression. I don't know the answer for lossless video/audio/picture codecs, however.
>>60127221
Well, I asked about lossless algorithms specifically. Like lossless video codecs, or image compression like LZW for TIFF or whatever PNG uses, or FLAC, etc.
>>60125755
>I understand what it does but not how it does it.
Then you don't understand what it does.
>>60127652
Again, it's dependent on context. The Shannon limit only tells you how much information you could possibly stuff into a channel (of certain properties) without having any errors decoding it on the receiving end.
Before you can answer your question, you need to know how the information is going to be represented and encoded. By represented I mean how you would encode something like an audio signal, and by encoded I mean the coding you'll use to turn the audio waves into 0s and 1s.
https://www.ee.columbia.edu/~dpwe/e6820/lectures/L07-coding.pdf
Here's a PDF that can explain it far better than I ever could. Slide #10 shows how you'd calculate the Shannon limit for audio, given the method used to represent the audio.
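For reference, the usual "Shannon limit" of an analog channel is the Shannon-Hartley capacity, C = B * log2(1 + S/N) bits per second for bandwidth B and linear signal-to-noise ratio S/N. A quick sketch (the bandwidth and SNR numbers below are purely illustrative, not taken from the slides):

```python
from math import log2

def channel_capacity(bandwidth_hz, snr_linear):
    """Shannon-Hartley: the maximum error-free bit rate over a noisy channel."""
    return bandwidth_hz * log2(1 + snr_linear)

# Illustrative numbers: ~20 kHz audio bandwidth, 60 dB SNR (10**6 linear)
c = channel_capacity(20_000, 10**6)
# Roughly 0.4 Mbit/s: a ceiling on how many bits that channel can carry.
```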
>By represented I mean *the method* you'd use to encode something like an audio signal
Fix'd
>>60127932
>>60127944
ftp://svr-ftp.eng.cam.ac.uk/pub/reports/auto-pdf/robinson_tr156.pdf
Here's the paper FLAC is based on.
>>60125774
lel
>>60128090
>>60127932
So can you give a ballpark estimate of where we are in terms of compression? How close to ideal are we? 10%? 50%? 90%?
>>60129129
depends on what you're compressing
if you want to know more, check out:
https://en.wikipedia.org/wiki/Entropy_(information_theory)
>>60129340
For: recorded audio, video, and image data. Give me some numbers.
>>60129897
no idea
>>60130005
Thought so.
Speaking of Huffman compression, what is it called when you have separate code tables dependent on earlier code word?