I want to store 35 petabytes of data, how can I do it?

Thread replies: 94
Thread images: 6

File: petahdd.jpg (75KB, 602x672px)
I want to store 35 petabytes of data, how can I do it?
>>
>>45445767
Install gentoo
>>
You post on /g/ asking for advice on a $50+ million system.
>>
Buy about 18000 2TB hard drives.
>>
>implying you have 35PB
>>
Depends on what you need to store it for. Probably the cheapest option would be tape cartridges, though you'd still be setting yourself back quite a bit. A quick search shows 10-packs of 1.5 TB tape cartridges on Amazon for $210 per pack, so the 2,334 packs you'd need would be $490,140 altogether. Though I'm going to assume that if you are going to buy 23,340 tape cartridges you're going to get a better deal and some kind of bulk discount. I also know they make higher-capacity cartridges, and you will have to account for some of them failing.

I'd estimate it'd still set you back at least 200-300 grand though.
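
For anyone who wants to redo the numbers, here is a rough sketch of that arithmetic. It assumes decimal units (1 PB = 1000 TB) and reads the $210 as the price of one 10-pack, which is the reading under which the total above works out. No bulk discount, spares, or tape drives are included, and the function names are made up for illustration.

```python
import math

def packs_needed(total_tb: float, cartridge_tb: float, cartridges_per_pack: int) -> int:
    """Whole packs required to hold total_tb terabytes."""
    pack_capacity_tb = cartridge_tb * cartridges_per_pack
    return math.ceil(total_tb / pack_capacity_tb)

def media_cost(packs: int, price_per_pack: float) -> float:
    """Cost of the media alone (no drives, no spares)."""
    return packs * price_per_pack

TOTAL_TB = 35 * 1000   # 35 PB expressed in TB (decimal units, an assumption)
CARTRIDGE_TB = 1.5     # cartridge size quoted in the post
PER_PACK = 10          # cartridges in one pack
PRICE_PER_PACK = 210   # assumed to be the price of a 10-pack

packs = packs_needed(TOTAL_TB, CARTRIDGE_TB, PER_PACK)
cartridges = packs * PER_PACK
cost = media_cost(packs, PRICE_PER_PACK)
print(packs, cartridges, cost)
```

Swap in a bigger cartridge size or a different price to see how quickly the total moves.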
>>
>>45445767
Get a job at the NSA
>>
>>45445767
open 2.333×10^6 google drive accounts
>>
>>45445790
this is the only way to do it
>>
>>45445805
Eh, I think a VNX 7400 will do about that for ~8 mil...
>>
You can actually compress any amount of data into a very small size by removing all the zeroes, and then replacing the long string of 1s with a binary integer count of how many there are.
>>
>>45445825
this.
and upboatted.
>>
>>45445767
Contact Cyborg Technologies for their Shadow compression.

http://www.cyborg.co/shadow/
>>
>>45445911
How could you possibly unpack that later on? You would have no knowledge of where the 0s are supposed to be.
>>
>>45445911
>>45445950
stop it you guys, this is srs businez
>>
>>45445825
seems the best solution so far

there is also the option of AWS for 35k/month, but it'll be much more expensive in the long run
>>
>>45445911
This is the most basic form of compression that I could program in my basement.
The fuck moron
You literally encode repeating integers.
Example: 123451234512345 = 3 × 12345, written as 312345, except in binary
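
For reference, a minimal run-length encoding sketch. It works on a string of '0'/'1' characters rather than real bits, and the function names are made up for illustration. Note that it stores the length of the 0 runs as well as the 1 runs, which is what makes decoding possible at all.

```python
from itertools import groupby

def rle_encode(bits: str) -> list[tuple[str, int]]:
    """Collapse each run of identical symbols into (symbol, run_length)."""
    return [(symbol, len(list(run))) for symbol, run in groupby(bits)]

def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (symbol, run_length) pairs back into the original string."""
    return "".join(symbol * length for symbol, length in runs)

data = "1111000000110"
encoded = rle_encode(data)
decoded = rle_decode(encoded)
print(encoded)
```

This only saves space when the input has long runs. Data with no repetition comes out bigger than it went in.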
>>
>>45445971
It doesn't matter, the zeroes don't contain any information anyway. They're just padding.
>>
>>45445767
I'm fairly certain you'll need a distributed file system on top of quite a few servers.
Lustre and Ceph come to mind.
>>
>>45445767

get a tape service. Only way, really.
>>
>>45445950

Is this a joke? A scam?

>claims unbreakable encryption
>>Digital Library of Congress (~10PB): 1.2 kilobytes
top
>>
>>45446105
didn't know about them, thnx
>>
Quantum hdd when?
>>
>>45446332
can someone check if the Shadow-compressed files start with "magnet:?"
>>
>>45446332
It's a retard.
>>
>>45446644
Seems more like a joke/satire to me honestly.
>>
>>45446659
Yeah, it does actually.
But the problem is that it's not.
It's the 'dream in code' kid.
>>
>>45446332
The unbreakable encryption claim is likely to be true. Try breaking the encryption on data that doesn't exist.
>>
>>45446332
>>45446644
>>45446659
>>45445950

Time for a Nicolas DuPont thread?
>>
>>45446669
"Unbreakable encryption" has existed since before computers have, it's just not practical for most situations.
>>
>>45446664
Wouldn't he realize pretty quickly that his algorithms don't actually work though?
>>
>>45446709
You'd think that. But this has the smell of typical shit like miracle free energy devices. Someone, through their own retardation, fools themselves into believing that their magic technology actually works, and seeks out investors. Using the enthusiasm they gain from being stupid, they manage to kindle some interest.
What happens from here is where it turns from retard to scam. You can't stop here and go 'whoops it was a mistake' to the investors, and get back the money you wasted. You keep going. You either fool yourself even harder, or you realise it doesn't work, have that 'oh god' moment and keep up the illusion that it's real out of panic. Fleischmann and Pons is the classic story of fools turned scammers. One repeated over and over again.
>>
>>45446756
Aren't those usually people who know damn well it doesn't work and just want to make a quick buck off gullible morons?
>>
>>45446765
They're rarer than you might expect. People don't usually go into this kind of thing trying to scam people right away.
Well okay a lot do, but the difference between a scammer and a fool is that the fool actually believes his own nonsense.

I recommend reading Voodoo Science by Robert Park.
Honestly it should be mandatory reading in schools. Or at least something with the same kind of ideas.
>>
>>45446709
I made a case earlier this year that 'compression' of this scale actually breaks the laws of thermodynamics. You can break down information itself into a thermodynamically compatible process. Flipping a bit from a 0 to a 1 requires energy, and states of bits are states of entropy levels. Basically, with the idea that even a 100% efficient processor would still use minute amounts of energy to flip bits around, compression of this level is akin to saying you get more energy out of a computation than you put in.
I.e. more bits are flipped out than you put energy in to flip them.

It's a fairly flimsy argument, but I think it's more or less on the right lines.
>>
>>45446699
>omg anon i lost my password, can't you just like reset it
>>
>>45446811
Do you know what compression is?
>>
>>45445767
Data cloud on one or more server farms or a slow but comparatively less expensive tape robot setup plus a bit of manual work are the two "realistic" ways to do it.

You'll probably need a budget exceeding a decent nation's GDP to realize that one soon, though.
>>
>>45446872
Yes. 10 PB into 1.2 kilobytes is not compression.
Keep in mind that the claim also included it being fast and requiring very little processor usage.
>>
>>45446811
But lossless compression is possible and common. It's not that rare to see certain types of data compressed to 10% or less of their original size. I'm not sure what your cutoff would be where a certain compression rate starts to violate the laws of physics.
>>
>>45446911
Just reading 10 PB of data into memory would take a long time.
>>
>>45446912
The cutoff is at meaningful data for lossless compression. Lossless compression is pretty simple. Just cut out the nonsense. Since it's a lossless compression claim we're talking about- imagine an entire book of words and meaning compressed into just a single word.
That's nonsense. You can see how just a single word can never mean as much as the whole book without a book's worth of algorithms to 'decode' it.

The cutoff point isn't hard and fast I'm afraid. As I said, the argument is flimsy.
>>
>>45446911
Why would you think 10 PB into 1.2KB isn't compression?
You don't even know what compression is, do you?
>>
>>45446936
In what way is 10PB into 1.2kB compression?
It's nonsense.
You could compress a 10PB long chain of 0s into 1.2kB. Hell I just did it myself.
"10PB long chain of 0s" clearly takes less space than 10PB.
But what we're talking about is 10PB of books, papers, words, letters, language.
Or the other claims are compressing a 40GB bluray movie into 20 bytes.
Claiming that this is possible is more absurd than claiming it is not.
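
The counting argument behind this fits in a few lines. Decompression has to map each compressed file back to exactly one original, so a compressed size of N bits can stand for at most 2^N different originals. The sketch below assumes 1.2kB = 1200 bytes and 10PB = 10^16 bytes (decimal units, my assumption), and the function names are made up for illustration.

```python
def count_strings_of_length(n_bits: int) -> int:
    """Number of distinct inputs that are exactly n_bits long."""
    return 2 ** n_bits

def count_strings_shorter_than(n_bits: int) -> int:
    """Number of distinct bit strings of length 0 .. n_bits - 1."""
    return sum(2 ** k for k in range(n_bits))  # equals 2**n_bits - 1

# Pigeonhole: there are always fewer shorter strings than inputs, so no
# lossless scheme can shrink every input of a given length.
for n in (1, 8, 64):
    assert count_strings_shorter_than(n) < count_strings_of_length(n)

compressed_bits = 8 * 1200          # a 1.2 kB output
original_bits = 8 * 10 * 10**15     # a 10 PB input

# At most 1 in 2**exponent_gap of the possible 10 PB inputs could get its
# own 1.2 kB code; the exponents are compared instead of building the numbers.
exponent_gap = original_bits - compressed_bits
print(exponent_gap)
```

A 10PB run of zeros can be one of the lucky inputs. An arbitrary 10PB pile of books and movies can't all be.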
>>
>>45446931
But it's not always intuitively obvious what is "nonsense" or what kinds of patterns can be constructed to compress the data further. Of course there is a lower limit of bits that some data can't possibly be reduced any further, but I don't think we have any way to really know what that limit is. Only that we cannot reduce it further with known methods.
>>
>>45445767
> buy a printer and paper
> print out the binary code of every file

problem solved
>>
tapes

HDDs are cheaper for relatively small amounts of data, but at 35PB, the cost of a tape drive becomes insignificant
>>
>>45446996
And my argument revolves around the idea that you could use thermodynamics to find these lower limits, and that this kid's claims probably fall below them.
Like I said, it's not a strong argument.
>>
>>45447010
There's no way OP can afford that much paper and time.
>>
>>45447010
i printed an ebook onto a piece of paper once

i did it by printing ~80KB worth of datamatrix 2d barcodes on each side
>>
>>45447024
Sounds neat, could you explain how?
>>
>>45445767
Save 42.zip 8 times (~331KiB). It's more than 35PB extracted. You are welcome.
>>
>>45447010
Poor rainforests, goodbye oxygen!
>>
>>45447044
i just compressed the ebook (after stripping out the cover jpeg), and split it into chunks small enough to be encoded into datamatrix codes, then converted them and laid them out in LibreOffice

before that i also tried printing codes at various DPI to see how small i could print them before they became too poor quality to read back

it was just for fun, i didn't put effort into making an automated way of producing or scanning the pages
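
Roughly what that process looks like in code, as a sketch only: compress the bytes, then cut the result into pieces small enough for one barcode each. zlib stands in for whatever compressor was actually used, CHUNK_BYTES is a placeholder and not a real Data Matrix limit, and the barcode encoding and page layout steps are left out entirely.

```python
import zlib

CHUNK_BYTES = 1000  # hypothetical per-barcode payload, not a real Data Matrix limit

def to_chunks(data: bytes, chunk_bytes: int = CHUNK_BYTES) -> list[bytes]:
    """Compress data, then split the compressed stream into fixed-size chunks."""
    compressed = zlib.compress(data, 9)
    return [compressed[i:i + chunk_bytes]
            for i in range(0, len(compressed), chunk_bytes)]

def from_chunks(chunks: list[bytes]) -> bytes:
    """Reassemble the chunks in order and decompress them."""
    return zlib.decompress(b"".join(chunks))

# Stand-in for the stripped ebook text
book = b"\n".join(f"Page {i}: some text".encode() for i in range(20000))
chunks = to_chunks(book)
restored = from_chunks(chunks)
print(len(book), "bytes ->", len(chunks), "chunks")
```

The chunks have to be scanned back in the right order, so a real version would want a sequence number in each one.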
>>
>>45447075
You could use this tho:
http://ollydbg.de/Paperbak/
It's GPL, someone could port it to Linux
>>
File: a.jpg (2MB, 2896x2172px)
>>45447075
this is what it looked like
>>
>>45447100
>now a 2.09 MB image

Nice compression
>>
>>45447100
and how big was the ebook uncompressed?
>>
>>45447100
Reported for spreading copyrighted material :^)
>>
>>45447100
Embedding this and scrolling is trippy as fuck
>>
>>45447115
don't worry, if you look closely, all those codes are identical, i only printed the *number* of codes the book was converted to, i couldn't be assed laying out all of the actual ones

plus the photo isn't good enough to read back the code
>>
>>45447109
1.9M epub
763K extracted with the jpegs and embedded fonts removed
165K compressed with lrzip/zpaq
>>
What if we stored every possible bit pattern of a certain length in a big hash table and then replaced each bit string in the data with its index in the table? It would require a shitload of storage space but you could just store one copy of the table on a server somewhere and then all the clients make use of the server for compressing/decompressing. Would probably be slow as fuck too since you basically have to download the entire program from the cloud server, but you should be able to get a constant high rate of compression for anything, so it might be good for archival purposes.

I guess you'd be fucked if the server went down but you could just recompute the table on your end if you really desperately needed to decompress something.
>>
>>45447159
On second thought I'm retarded, just realized why this is a really dumb idea...
>>
>>45447159
sounds similar to huffman coding, which is a common lossless compression technique
>>
>>45447181
Yeah, wouldn't work the way I imagined it though, since to address a table with 2^n entries you'd need an n-bit index, which would be as large as the data you wanted to compress in the first place.
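
That can be checked directly. A sketch with a tiny n, where the table holds every n-bit pattern in order, so the "index" of a pattern turns out to be the pattern itself read as a number (function names are made up for illustration):

```python
from itertools import product

def build_table(n_bits: int) -> list[str]:
    """Every possible n-bit pattern, in lexicographic order."""
    return ["".join(bits) for bits in product("01", repeat=n_bits)]

def index_width(table_size: int) -> int:
    """Bits needed to address any entry in a table of this size."""
    return (table_size - 1).bit_length()

n = 8
table = build_table(n)
width = index_width(len(table))

pattern = "10110001"
idx = table.index(pattern)
# Writing the index back out in n bits reproduces the original pattern,
# so replacing the pattern with its index saves nothing.
index_as_bits = format(idx, f"0{n}b")
print(len(table), width, idx, index_as_bits)
```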
>>
>>45445950
>www.cyborg.co/shadow
>Linux 12.04+
Still not fixed.
>>
File: 1395967572221.jpg (93KB, 958x545px)
>>45445767
Maybe this will help you.
>>
>>45446007
HAHAHAHAHAHAHAHAHAHAHAHAHA
>>
>>45447223
Why don't we just use DNA to store data?
>>
>>45445767
1. Have a lot of money
2. Use Amazon's cloud storage systems
3. No longer have any money
>>
>>45447159
>>45447165
it's like a stupid version of plan9's file server
>>
>>45447238
Making and sequencing DNA is not easy and takes a long time.

Also not sure what the shelf life is like if you just let it sit around.
>>
>>45447262
Well sperm dies in 15 minutes when left out
>>
>>45445767
wait 10 years
>>
>>45445767
ZFS.
>>
>>45446936
>>45446953
I always just figured it was people who didn't know what a fucking Hash was.
>>
>The year is 2055
>Nanotechnology has gone mainstream in the game for 30 years already
>We now have 5PB HDD
Feels good in the future
>>
>>45446007
holy shit
>>
>>45447453
That feature probably makes use of a centralised cloud with "infinite space" and personal archives are for neckbeards who still use windows 10.
>>
>>45447453
At some point before 2055 we should have memristor based drives that are not only petabytes big, but also so fast as to make RAM unnecessary. We might even have that by 2035, but that's probably a little optimistic.
>>
i've filled all the physical space in my PC with 1tb and 2tb hard drives because i'm too scared to buy anything bigger because muh failure rate

now I'm out of space and I can't hold any more 50gb blu ray rips

what do /g/
>>
>>45447801
when I say "all the space" I really mean it too. I got rid of my optical drive and used the space there for more HDs
>>
>>45447801
put together a barebones pc and fill that with HDDs, use as a NAS or iSCSI server
>>
>>45446996
>But it's not always intuitively obvious what is "nonsense" or what kinds of patterns can be constructed to compress the data further. Of course there is a lower limit of bits that some data can't possibly be reduced any further, but I don't think we have any way to really know what that limit is. Only that we cannot reduce it further with known methods.
It's called information theory. It's been researched for years. The relevant topic is information entropy. A VERY simplified summary is: the more random your data appears, the less you can compress it. And compressed (reduced size) data always has more entropy (is more random) than before compression.
Due to the fact that claimed numbers are way off the scale of what is currently documented, tested and verified you may understand why people here are very suspicious of those claims and unless proven (in the scientific sense) assume them to be uninformed at best, malicious at worst.
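
A small experiment makes the point. This sketch computes the byte-level Shannon entropy of some data and compares how well zlib shrinks it. The sample inputs are my own, and zlib is just a convenient stand-in for "a real compressor".

```python
import math
import os
import zlib
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte frequencies, between 0 and 8 bits per byte."""
    counts = Counter(data)
    total = len(data)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)

repetitive = b"0" * 100_000          # one symbol repeated: no surprise at all
random_data = os.urandom(100_000)    # OS randomness: close to 8 bits per byte

for name, data in (("repetitive", repetitive), ("random", random_data)):
    print(name,
          round(entropy_bits_per_byte(data), 3),
          round(compressed_ratio(data), 3))
```

The repetitive input collapses to almost nothing, while the random input doesn't shrink at all and typically grows by a few bytes of overhead.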
>>
>>45445767
Use your mind faggot
>>
>>45445767
A cubic fuckload of zip disks
>>
>>45445767
Lots of these
https://www.backblaze.com/blog/backblaze-storage-pod-4/
>>
>>45449044
>taking any kind of storage advice from backblaze
>>
>>45446007
Haha. Okay, buddy.
>>
File: niger_laughing.jpg (52KB, 536x400px)
>>45446007
>>
>>45446007
you gotta have a few 0's every now and then to know when the 1's come along

waiting on the 1's now
>>
>>45446811
>A computer with energy flowing in is a closed system

lel
>>
That's a lot of porn...


This is a 4chan archive - all of the content originated from that site. This means that 4Archive shows an archive of their content. If you need information for a Poster - contact them.