[Boards: 3 / a / aco / adv / an / asp / b / biz / c / cgl / ck / cm / co / d / diy / e / fa / fit / g / gd / gif / h / hc / his / hm / hr / i / ic / int / jp / k / lgbt / lit / m / mlp / mu / n / news / o / out / p / po / pol / qa / qst / r / r9k / s / s4s / sci / soc / sp / t / tg / toy / trash / trv / tv / u / v / vg / vip /vp / vr / w / wg / wsg / wsr / x / y ] [Search | Home]
I'm building a 1 Petabyte storage system,...

You are currently reading a thread in /g/ - Technology

Thread replies: 72
Thread images: 6
File: proggin.jpg (98 KB, 600x516)
I'm building a 1 Petabyte storage system, my budget is <$100K USD. /g/ could you give me some advice?

What would you do if you were tasked with this?
>>
>>42955215
Only storage? Tapes.
>>
Chained hardware RAID cards
>>
http://www.engadget.com/2014/04/30/sony-185tb-data-tape/

What >>42955225 said isn't far off, but good luck trying to buy half a dozen super tapes.
>>
lots of SAS backplanes
lots of cooling
>>
What sort of IOPS do you need?
Assuming what raid?
What are you connecting it to? (Fiber, copper)

Are you doing a whitebox, or are you thinking about an engineered solution?
>>
>>42955215
what are you gonna use it for?
>>
>>42955255
How much do these data tapes cost?
>>
>>42955215
No dice, double the money.
One solution would be normal consumer gear, i.e. mATX mobos with 4-6 SATA connectors, switches, 500 * 2TB HDs, and pooling the storage with something like Hadoop.
But even that, with PSUs and all that extra cruft, would cost you more than 100k.
And that would be at the cheapest end, provided you can land decent discounts, which you should for that cash.
>>
File: lifestream.gif (1 MB, 160x118)
>>42955360

OP here, I'm going to use it for storing image streams of people's lives taken from a first-person perspective
>>
>>42955472
confirmed for google glass database administrator
>>
>>42955415

Couldn't you just line up a bunch of Storinators?
http://www.45drives.com/products/direct-wired-redundant.php

Each one of those holds 45 drives.

You would need 12 enclosures and 500x2TB drives.

12 enclosures alone would cost $84000 give or take.

Just picking a decent NAS drive, ST2000VN000, $100 per 2TB.

500x of those is gonna cost $50000

Total budget should be moved up to $134,000 for that.

If you want a real storage solution, you're probably going to need to increase your budget by 5-6 times. But I doubt you're looking for enterprise equipment (3PAR, HDS, NetApp FAS, etc).
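
The enclosure and drive arithmetic in this estimate can be reproduced with a short sketch. The figures (45 bays per enclosure, $100 per 2TB drive, and $7,000 per enclosure as implied by the $84,000 quote for 12) come from the post; the function name and the decimal assumption that 1 PB = 1000 TB are only illustrative.

```python
import math

def raw_build_cost(target_tb, drive_tb, drive_price,
                   bays_per_enclosure, enclosure_price):
    """Return (drives, enclosures, total_cost) for a raw capacity target."""
    # Round up: a partial drive or a partially filled enclosure still has to be bought.
    drives = math.ceil(target_tb / drive_tb)
    enclosures = math.ceil(drives / bays_per_enclosure)
    total = drives * drive_price + enclosures * enclosure_price
    return drives, enclosures, total

# Assumption: 1 PB = 1000 TB, and no capacity is set aside for RAID parity or spares.
drives, enclosures, total = raw_build_cost(
    target_tb=1000, drive_tb=2, drive_price=100,
    bays_per_enclosure=45, enclosure_price=7000)
print(drives, enclosures, total)
```

Note that this is raw capacity only; any parity or hot spares would push the drive count above 500.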
>>
>>42955273

Going to do long-term storage, plus a cache layer on the front that serves up recently / commonly accessed portions. Cache layer is unrelated, this is simply for cold, long-term storage.
>>
>>42955273
>>
>>42955273

OP Here:

100 IOPS
RAID 6
Copper
Whitebox
>>
>>42955484
Not quite google glass, but close, https://www.youtube.com/watch?v=j1gQvnIgzQg
>>
>>42955648
So is this project for creepers and voyeurs?
>>
question
why 2tb drives at consumer prices
why not 4tb drives at bulk discount?
why not 6tb helium drives?
>>
>>42955648

Who gives you guys the money for this bullshit and where does the profit come from?
>>
File: karl.png (178 KB, 504x360)
>>42955667

No, it's for people to record their lives.

Like Karl.


Also see:
>>42949101
>>
>>42955724
People conditioned to believe that everyone's private life should be non-existent and that other people actually care about their lives (or maybe they hope they can profit off of people spectating on their lives, somehow).
>>
>>42955698

Our users will pay a monthly fee to store the prior N months of their life. We've calculated the cost of storing a picture every second of every day plus audio to be around 35GB / month.
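
As a rough sanity check on that figure, here is a sketch of what 35GB / month implies per second of recording, and how many users fit in 1 PB. The 30-day month, decimal units, and the 12-month retention value are assumptions for illustration; the post only says "N months".

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: 30-day month
GB = 10**9    # decimal gigabyte
PB = 10**15   # decimal petabyte

def bytes_per_second(gb_per_month):
    """Average storage budget per second (one picture plus its share of audio)."""
    return gb_per_month * GB / SECONDS_PER_MONTH

def users_per_petabyte(gb_per_month, months_retained):
    """Whole users that fit in 1 PB of usable space, ignoring redundancy overhead."""
    return PB // (gb_per_month * GB * months_retained)

print(round(bytes_per_second(35)))                    # bytes per second of recording
print(users_per_petabyte(35, months_retained=12))     # hypothetical N = 12
```

Under these assumptions the budget works out to roughly 13.5 KB per second, so the pictures would have to be small or heavily compressed.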
>>
>>42955273

Any suggestions?
>>
>>42955864
Workin on it right now, gimmie a bit.

Just doing a few space checks and raw vs. usable. Then I'll get some prices together
>>
>>42955769
Somehow I don't think wearing a cheaply made ball cap that records your boring and uninteresting life is somehow cyberpunk related.
>>
>>42955909
You clearly haven't read the literature then. It's somewhere between simstim, gargoyles, and karl here.
>>
>>42955946
I hope they arrest you.
>>
>>42955215
I think the best advice I can give you for that money is to start small and scale up, since 100k won't give you 1PB of storage, unless you harness old VCRs, acquire a fuckton of tapes and write some sort of data encoder, DIY tape drives yo..

So build a system that is easy to expand, relies on common gear that will be available for a while, and uses software solutions that are active projects, like the already mentioned Hadoop.
Don't rely on "costly" hardware solutions for data duplication etc. You don't want to rely on tech that you can't easily change or swap; one example would be HW RAID solutions.
Don't get vendor locked so to speak.
This is the only way to have cheap and functional mass storage that can scale.
Look at what other so-called pioneers are doing: Google uses consumer tech to scale fast and not get held up in costly investments if it needs to scale down.
>>
>>42955864
>>42955883
Here's what I've come up with, and I really should have used 4TB drives in my original suggestion but that was me just being lazy.

NewEgg has solid prices, but a reseller can likely negotiate for a special pricing for the volume you're looking for.
This is all budgetary regardless.

7x Storinator (45 Drive enclosure, redundant boot and power) - $6108.04 ea - $42756.28
1x NetShelter SX 42U 600mm Wide x 1070mm Deep Enclosure - $1200 ea - $1200
272x Seagate 4TB NAS Drive - $180 ea - $48960

Total: $92916.28

In a RAID 6 you'll have roughly 1PB usable.

I didn't bother with power requirements, but you'll need a PDU to mount inside of that cabinet and power everything in there.
8K should cover that.

Feel free to ask any questions or expand on this.
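
For anyone who wants to check the parts list above, a minimal sketch follows. Quantities and prices are the ones quoted in the post; treating all 272 drives as a single RAID 6 set (two drives' worth of parity) and using decimal TB are assumptions.

```python
# (description, quantity, unit price in USD) as quoted in the post
parts = [
    ("Storinator 45-drive enclosure", 7, 6108.04),
    ("NetShelter SX 42U rack", 1, 1200.00),
    ("Seagate 4TB NAS drive", 272, 180.00),
]

total = round(sum(qty * price for _, qty, price in parts), 2)

def raid6_usable_tb(drives, drive_tb):
    """Usable capacity of one RAID 6 set: two drives' worth goes to parity."""
    return (drives - 2) * drive_tb

usable_tb = raid6_usable_tb(272, 4)   # assumption: one big RAID 6 set
total_bays = 7 * 45                   # bays available across the enclosures
spare_bays = total_bays - 272         # bays left for hot spares or growth

print(total, usable_tb, spare_bays)
```

The leftover bays leave room for spare drives without buying another enclosure.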
>>
>>42956095
Also, I believe only 2 drives would be allowed to fail, which seems kind of risky with 272 spinning drives.

You may want to consider several extra drives to act as backup/spares in the array.
>>
>>42955946
To be honest I'd rather the guys manufacturing and buying this be upfront about being peeping toms and creeps. That at least I can understand. It's easier to deal with than the guy in cargo shorts and sandals who thinks he's part of the cyberpunk movement.
>>
>>42956130
Yea, I was also considering something like RAID (6 + 0) so that we could have two failures per RAID 6 set.

>>42956095
Is there a reason you chose that Seagate as opposed to something like:

WD Green WD40EZRX 4TB @ $150
or
Seagate Desktop ST4000DM000 4TB @ $150
>>
How much data does Google have?
>>
>>42956229
FB has >100 PB (facebook.com/notes/facebook-engineering/under-the-hood-hadoop-distributed-filesystem-reliability-with-namenode-and-avata/10150888759153920)

I would assume Google has more. Back of the envelope on GMail alone is ~127 PB (cyber-knowledge.net/blog/gmail/)
>>
>>42956197
Personally, I would only avoid Green because they have a weird low power mode they sit in when they're not being used.

It really started to drive me nuts waiting for them to spin back up every time I wanted to use them.

That was just my experience though, others may have had better.

I went with ST4000VN000 (should have thrown that in) because of the dedicated NAS design. That was only personal preference.

If you want to save a little, you can go with the ST4000DM000 (as you mentioned) that the 45 drives guys use in their system. http://blog.backblaze.com/2014/03/19/backblaze-storage-pod-4/

I don't think you would have an issue, but if I'm already spending that much I personally would bump up the drives to something of a bit higher caliber.
>>
>>42956291
Un-fucking-believable. And people say government wouldn't be able to record every phone call. They probably have all calls for the last 10 or 20 years.
>>
>>42956304
Thanks a lot, the blaze guys did some great analysis and that's where I started when I began researching this. Gotta love how Cost / GB just keeps halving every few years.
>>
>>42956412

No problem, it's a nice change of pace from the enterprise solutions I usually deal with.

You're just gonna need to get rackspace for everything, it's a shitload of drives to store.
>>
>>42956384

Here's an article that you should read, the internet archive's brewster kahle does some back of the envelope calculations:

http://blog.archive.org/2013/06/15/cost-to-store-all-us-phonecalls-made-in-a-year-in-cloud-storage-so-it-could-be-datamined/
>>
>>42956462
Haha yea, w.r.t. off the shelf solutions, if you haven't read that article above, you should too. I have a few friends who work at the archive and they did something like buy a bunch of off-the-shelf external hard drives from amazon b/c seagate sells them for less when they're packaged as "USB external storage drives".
>>
You need to employ a permanent sysadmin. Will need at least a 10Gb/s line for his customers, warehouse cost, business insurance, websites.
>>
>>42956563
Don't forget marketing. Judging by your YouTube video with sub-2000 views, you look like a college kid who has been posting this to your friends and family.

You need to think bigger, you can't just rely on viral
>>
>>42956563

That's assumed ;). Of course none of this even includes colocation or electricity bill, but the point of the thread was the fixed costs. Thanks for the answers on that end.

>>42956598

Worry not, the marketing plan includes more than just a youtube video.
>>
>>42955526
May want to set up RAID 6 or something.
>>
>>42956901
I was also thinking RAID 60, so that I could get up to 2 disk failures in each sub-array.
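
To see what RAID 60 would cost in usable space, here is a rough sketch. The 272 drives at 4TB come from the parts list earlier in the thread; the set size of 16 drives is purely a hypothetical choice, and the function name is made up for illustration.

```python
def raid60_layout(total_drives, drives_per_set, drive_tb):
    """Split drives into equal RAID 6 sets that are then striped together (RAID 60)."""
    if drives_per_set < 4:
        raise ValueError("a RAID 6 set needs at least 4 drives")
    sets = total_drives // drives_per_set
    return {
        "sets": sets,
        # Each set gives up two drives' worth of capacity to parity.
        "usable_tb": sets * (drives_per_set - 2) * drive_tb,
        # Guaranteed: any 2 failures. Best case: 2 failures in every set.
        "best_case_failures": 2 * sets,
        "leftover_drives": total_drives - sets * drives_per_set,
    }

layout = raid60_layout(total_drives=272, drives_per_set=16, drive_tb=4)
print(layout)
```

Smaller sets tolerate more failures overall but give up more capacity to parity, so with this set size the array lands under the 1PB target and would need a few more drives.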
>>
>>42955215

Tape.

Or better, use Amazon S3.
>>
>>42956969
The cost of storing 1 PB on S3... would be... ungodly.

Something like $45,000 per month.

http://calculator.s3.amazonaws.com/index.html
>>
You're going to want some sort of clustering file system.

OpenAFS
Gluster
Ceph

Pick one
>>
>>42957061
Ceph using github to host their source makes me want to use it over the others.
>>
You could look into something like Backblaze, it's cheap and you don't have to spend a ton of money buying parts yourself.
>>
>>42957061
Or do real sysadmins use 9P?
>>
>>42955648
your website is shit

how much when
>>
>>42957252
9P would work too. I figured that OP was using linux. 9P in userspace is turdy.
>>
>>42955822
wait no, fuck you.
I buy the hat, it records to micro sdhc
WHEN I WANT IT TO

I transfer the data to my pc and have full ownership of it

no wireless or gps

I also doubt your 18 hour battery life
>>
>>42956304
>It really started to drive me nuts waiting for them to spin back up every time I wanted to use them.
i know this feel
>>
>>42956197
If you're using WD use Reds.
>>
>>42957033
$10k/month on Glacier
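
A quick sketch comparing the monthly cloud figures quoted in this thread ($45,000 for S3, $10k for Glacier) against the $92,916.28 whitebox parts list posted earlier. All three numbers are taken from the thread as-is, not from current pricing; the comparison ignores colocation, power, staff, and retrieval fees, and 1 PB is assumed to be 10^6 GB.

```python
HARDWARE_COST = 92916.28   # one-time parts cost quoted earlier in the thread
GB_PER_PB = 10**6          # assumption: decimal units

# Monthly quotes for 1 PB as posted in the thread
quotes = {"S3": 45000, "Glacier": 10000}

def implied_price_per_gb(monthly_usd):
    """USD per GB per month implied by a flat monthly quote for 1 PB."""
    return monthly_usd / GB_PER_PB

def months_to_match_hardware(monthly_usd, hardware_usd=HARDWARE_COST):
    """Months of cloud fees that add up to the one-time hardware spend."""
    return hardware_usd / monthly_usd

for name, monthly in quotes.items():
    print(name, implied_price_per_gb(monthly),
          round(months_to_match_hardware(monthly), 1))
```

On these numbers the cloud bill overtakes the hardware purchase within the first year either way, though the hardware side still has running costs that are not counted here.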
>>
>>42956384
I saw this documentary about satellites for the gubbernmt that record a million terabytes a day.

A DAY
>>
>>42957672
There is some speculation that Glacier is tape or BD, probably not appropriate for OP's needs
>>
>>42957670
+upvote ;)
>>
lol at the retards suggesting cloud storage.
>>
>>42957559

Winter

>>42957617

That's the default mode that you get for free. People who don't care as much about their privacy will want to use the cloud service. I'm more like you and will probably only transfer files over USB.

Never said it would have 18 hours of battery life. Current tests are 7-10 hours depending on usage.
>>
>>42957678
what
>>
>>42958873
In the video the guy clearly states
>0:14 enough battery life to be on from the moment you wake up to the time you go to bed.


>>42959791
>excerpt
>1 million terabytes a day saved forever.
>FOREVER

https://www.youtube.com/watch?v=QGxNyaXfJsA
>>
>>42955215

how big can a porn library get?
>>
>>42955395
I don't know about newer ones, but I can tell you it's around $10 for an 800GB LTO-3 tape
>>
>>42955533
Tapes for sure.
>>
>>42957717
What? Reds' firmware is designed for things like being in a RAID array
>>
>>42961982
1863GB Western Digital WDC WD20EFRX-68AX9N0 ATA Device (SATA) 28 °C
>>
>>42962073
Oh yeah?
>>
>>42962136
FUCK YEAH !
>>
File: sweatin man.jpg (15 KB, 300x300)
>>42960510
Thread DB ID: 3962




All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.
This is a 4chan archive - all of the shown content originated from that site. This means that 4Archive shows their content, archived. If you need information for a Poster - contact them.
If a post contains personal/copyrighted/illegal content, then use the post's [Report] link! If a post is not removed within 24h contact me at [email protected] with the post's information.