
Lossless Compression


Thread replies: 31
Thread images: 2

Does anyone understand the specifics of lossless compression?
I understand what it does but not how it does it.
>>
>>60125755
Get a bachelor's degree in Computer Science.
>>
Essentially it's re-encoding a file into something smaller that can be turned back into the exact source file with no errors. It's not like a torrent hash, which throws the actual data away. Hard but not impossible.
>>
Which specific lossless encoding algorithm do you not understand?

I can trivially explain to you how something like Huffman encoding does lossless compression.
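If it helps, here's a rough sketch of the core idea in Python (a throwaway illustration, not any real library's API): count how often each symbol occurs, keep merging the two least frequent nodes into a tree, then read each symbol's code off its path from the root. Frequent symbols end up with short codes, rare symbols with long ones, and because no code is a prefix of another the bit stream decodes back unambiguously, so nothing is lost.

import heapq
from collections import Counter

def huffman_codes(data):
    # Count how often each symbol occurs.
    freq = Counter(data)
    # Heap entries are (frequency, tiebreaker, node); a node is either a symbol
    # or a (left, right) pair. The tiebreaker keeps Python from comparing nodes.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tiebreak, (left, right)))
        tiebreak += 1
    # Walk the finished tree: going left appends "0", going right appends "1".
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"   # lone-symbol edge case
    walk(heap[0][2], "")
    return codes

data = "AAAAABBBCCCCCCCCC"
codes = huffman_codes(data)
encoded = "".join(codes[c] for c in data)
print(codes)         # here: {'B': '00', 'A': '01', 'C': '1'} - frequent C gets the short code
print(len(encoded))  # 25 bits instead of the 136 bits (17 bytes) of raw ASCII

The decoder just walks the same tree bit by bit and emits a symbol every time it reaches a leaf.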
>>
>>60125755
It depends on the specific algorithm, but generally it goes like this:
-approximate the original source using some lossy scheme based on a prediction function
-store the differences (residuals) between that approximation and the real data
-pack both into a single file
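A minimal sketch of that idea in Python (toy code, using the simplest possible predictor, "each sample equals the previous one"; real codecs like FLAC use smarter predictors, but the principle is the same):

def encode(samples):
    # Predict each sample as "same as the previous one" and keep only the error.
    residuals = [samples[0]]            # first sample stored as-is
    for i in range(1, len(samples)):
        residuals.append(samples[i] - samples[i - 1])
    return residuals

def decode(residuals):
    samples = [residuals[0]]
    for r in residuals[1:]:
        samples.append(samples[-1] + r)
    return samples

samples = [100, 102, 104, 104, 103, 101]
res = encode(samples)
print(res)                      # [100, 2, 2, 0, -1, -2]
assert decode(res) == samples   # bit-exact round trip, so nothing is lost

The residuals aren't fewer in number, they just cluster around zero, so a second stage (Huffman, Rice coding, etc.) can store them in far fewer bits.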
>>
>>60125796
I don't understand the algorithms used in Huffman encoding, and why they work the way they do
>>
You can take very common long patterns in the data and make a short reference to them. Say you have a file that is 10,000 bits long. It's mostly random, but some patterns occur often enough that it pays to make shortcuts for them. Think of an image like some MS Paint reaction image: there will be a lot of white pixels, so you might say that A stands for 10 white pixels in a row, and replace every stretch of data that means 10 white pixels with A, which is less data than 001011000001001000... or whatever the raw bits look like. There's some overhead to store the fact that A means 10 white pixels, but if you pick the most common patterns you can compress a lot of the data while still preserving all of it.
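As a toy sketch (made-up, not a real format): say 0xFF is a white pixel, and we reserve one byte value that never occurs in the data as the shortcut for "10 white pixels in a row":

WHITE_RUN = b"\xff" * 10      # "10 white pixels in a row"
SHORTCUT = b"\x01"            # a byte value that never appears in this image data

def shrink(data):
    assert SHORTCUT not in data          # otherwise the shortcut would be ambiguous
    return data.replace(WHITE_RUN, SHORTCUT)

def expand(data):
    return data.replace(SHORTCUT, WHITE_RUN)

img = WHITE_RUN + b"\x80\x23" + WHITE_RUN + WHITE_RUN + b"\x42"
small = shrink(img)
assert expand(small) == img              # every pixel comes back exactly
print(len(img), "->", len(small))        # 33 -> 6

The shortcut table here is agreed on in advance, so it isn't stored; a real format would write it into a small header, which is the overhead mentioned above.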
>>
>>60125755
To put it simply, it often works like this:
AAAAABBBCCCCCCCCC = 5A3B9C

The original data can be determined from the compressed one.
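That scheme is called run-length encoding. A toy version in Python (illustration only; it assumes the data itself contains no digit characters):

from itertools import groupby

def rle_encode(s):
    # collapse each run of identical characters into "<count><char>"
    return "".join(f"{len(list(group))}{char}" for char, group in groupby(s))

def rle_decode(s):
    out, count = [], ""
    for ch in s:
        if ch.isdigit():
            count += ch              # counts can be more than one digit
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)

print(rle_encode("AAAAABBBCCCCCCCCC"))            # 5A3B9C
assert rle_decode("5A3B9C") == "AAAAABBBCCCCCCCCC"

It only wins when the data actually has long runs; on run-free data every character costs two characters, so the "compressed" version grows.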
>>
https://en.wikipedia.org/wiki/Information_theory

https://en.wikipedia.org/wiki/A_Mathematical_Theory_of_Communication

https://en.wikipedia.org/wiki/Entropy_(information_theory)

https://en.wikipedia.org/wiki/Shannon%27s_source_coding_theorem

https://en.wikipedia.org/wiki/Data_compression
>>
>>60125909
But what if it's ABAABCCA?
How would it know where to place the A, B and C if it's compressed as 4A 2B 2C?
>>
>>60125964
>ABAABCCA
This information is much less redundant than
>AAAAABBBCCCCCCCCC
this one.

Meaning that compressing the former losslessly won't necessarily decrease the file size.
>>
>>60125964
That would be 'compressed' as 1A1B2A1B2C1A
>>
Time travel stupid goy
>>
>>60125964
That's where algorithms come in: they search the file for patterns and build an index, so that something like ABAABCCA can be stored as a single byte.

Example:

index entry: 0x01 = ABAABCCA
stored data:  0x01
>>
>>60125755

The other guys said this before: it's basically about patterns.

Imagine this is a part of a picture, the numbers are colors:
111222
111222
111222
111333
111444
123456


It could be expressed like this:

a = 111222
line1, line2, line3 = a
line4 = a + 000111
line5 = line4 + 000111
line6 = 123456 (no pattern, stored as-is)


If you want to know the details, look up delta encoding and matrix compression.
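The same trick as a toy Python sketch (roughly what PNG-style row filtering does before the real compressor runs): store the first row as-is and every other row as its difference from the row above, so identical or similar rows turn into rows of mostly zeros, which compress very well.

def encode_rows(rows):
    out = [rows[0]]                                   # first row stored as-is
    for prev, cur in zip(rows, rows[1:]):
        out.append([c - p for c, p in zip(cur, prev)])
    return out

def decode_rows(deltas):
    rows = [deltas[0]]
    for d in deltas[1:]:
        rows.append([p + x for p, x in zip(rows[-1], d)])
    return rows

image = [
    [1, 1, 1, 2, 2, 2],
    [1, 1, 1, 2, 2, 2],
    [1, 1, 1, 2, 2, 2],
    [1, 1, 1, 3, 3, 3],
    [1, 1, 1, 4, 4, 4],
    [1, 2, 3, 4, 5, 6],
]
deltas = encode_rows(image)
print(deltas[1])                     # [0, 0, 0, 0, 0, 0] - repeated rows vanish into zeros
assert decode_rows(deltas) == image  # the exact picture comes back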
>>
>>60126956

The numbers are only a rough illustration, but you get the idea.
>>
>>60125924
The source coding theorem shows that (...) it is impossible to compress the data such that the code rate (average number of bits per symbol) is less than the Shannon entropy of the source, without it being virtually certain that information will be lost. However it is possible to get the code rate arbitrarily close to the Shannon entropy, with negligible probability of loss.

So, how close are current state-of-the-art lossless algorithms to the Shannon entropy of the source material? I'm wondering how much improvement is still possible.
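To put a toy number on it: a source that emits A half the time and B or C a quarter of the time each has entropy 0.5*1 + 0.25*2 + 0.25*2 = 1.5 bits per symbol, and a Huffman code (A=0, B=10, C=11) also averages exactly 1.5 bits per symbol, so for a source that simple we're already at the limit. I'm asking about real-world data.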
>>
>>60125964
That's where algorithms come in: they search the file for patterns and build an index, so that something like ABAABCCA can be stored as a single byte.

Example:
0x00 XXXXXXXX - reserved header
0x01 EFLKABAA
0x02 BCCAPIZF
0x03 JHERVBNS
0x04 ITWQOUGF
0x05 ABAABCCA


Becomes:
0x00 XABAABCC
0x01 AZEFLKXP
0x02 IZFJHERV
0x03 BNSITWQO
0x04 UGFX


The letter "X" was used as variable. You can expand the capacity of compression using variables, addresses and headers in algorithms.
>>
>>60127045
Depends entirely on the context. If you have a video file and you want to compress it, it's already compressed beyond the Shannon limit of the source material because of the lossy compression. I don't know the answer for things like video/audio/picture codecs, however.
>>
>>60127221
Well, I asked about lossless algorithms specifically. Like lossless video codecs, or image compression like LZW for TIFF or whatever PNG uses, or FLAC, etc.
>>
>>60125755
>I understand what it does but not how it does it.
Then you don't understand what it does.
>>
>>60127652
Again, it's dependent on context. The Shannon limit only tells you how much information you could possibly stuff into a channel (with certain properties) without any errors when decoding it on the receiving end.
Before you can answer your question you need to know how the information is going to be represented and encoded. By represented I mean how you would encode something like an audio signal, and by encoded I mean the coding which you'll use to turn the audio waves into 0s and 1s.

https://www.ee.columbia.edu/~dpwe/e6820/lectures/L07-coding.pdf

Here's a PDF that could explain it far better than I ever could. Slide #10 shows how you'd calculate the Shannon limit for audio, given the method used to represent the audio.
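For a rough feel, the usual back-of-envelope for a noisy channel is the Shannon-Hartley formula C = B * log2(1 + S/N). The numbers below are my own assumptions (roughly CD-ish bandwidth and SNR), not taken from those slides:

import math

bandwidth_hz = 20_000      # assumed audio bandwidth (~20 kHz)
snr_db = 96                # assumed signal-to-noise ratio (~16-bit audio)
snr_linear = 10 ** (snr_db / 10)

capacity_bps = bandwidth_hz * math.log2(1 + snr_linear)
print(f"{capacity_bps / 1000:.0f} kbit/s")   # ~638 kbit/s of actual information in the channel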
>>
>By represented I mean *the method* you'd use to encode something like an audio signal
Fix'd
>>
>>60127932
>>60127944
ftp://svr-ftp.eng.cam.ac.uk/pub/reports/auto-pdf/robinson_tr156.pdf

Here's the paper FLAC is based on.
>>
>>60125774
lel
>>
>>60128090
>>60127932
So can you give a ballpark estimate of where we are in terms of compression? How close to ideal are we? 10%? 50%? 90%?
>>
File: 1460966780850.jpg (225KB, 627x502px)
>>60129129
depends on what you're compressing
if you want to know more, check out:
https://en.wikipedia.org/wiki/Entropy_(information_theory)
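if you want to eyeball it for one specific file, here's a quick and dirty sketch (toy code; the filename is a placeholder, and it only measures byte-level entropy, ignoring all structure between bytes):

import math, zlib
from collections import Counter

def order0_entropy_bits_per_byte(data):
    # Shannon entropy assuming each byte is drawn independently
    # from the file's own byte-frequency distribution.
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

data = open("some_file.bin", "rb").read()   # hypothetical input file
h = order0_entropy_bits_per_byte(data)
compressed = zlib.compress(data, 9)
print(f"order-0 entropy:       {h:.2f} bits/byte")
print(f"deflate actually uses: {8 * len(compressed) / len(data):.2f} bits/byte")

the catch is that "the Shannon entropy of the source" depends on the statistical model you assume for the source. a per-byte count like this ignores everything between bytes, so real compressors routinely beat it, and nobody knows the true entropy of stuff like photos or music. that's why there's no clean "we're at X% of optimal" number.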
>>
>>60129340
For: recorded audio, video, and image data. Give me some numbers.
>>
>>60129897
no idea
>>
>>60130005
Thought so.
>>
Speaking of Huffman compression, what is it called when you have separate code tables that depend on the previous code word?