
big data ideas


Thread replies: 12
Thread images: 1

File: Esri-and-Big-Data.jpg (245KB, 1338x1142px)
Alright /g/
I need some ideas for a big data system for my final year project.
Ideally it should integrate multiple sources of data (even in different formats) and, after applying different machine learning algorithms, produce some insights from that data.

I can't think of a problem or data sets I could use for this.
Anyone have any ideas?
>>
>>56956453

1. write a webcrawler for 4chan
2. get posts from /g/, /v/, /pol/, /b/ and /mlp/ over one month
3. map/reduce posts to topics/bad words/...
4. do some analytical circlejerking with some figures
5. come back here and post results
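Step 3 in miniature: a map/reduce-style count of flagged words per board, in plain Python. The `posts` list and the `BAD_WORDS` set are made-up placeholders; a real crawler would feed in the scraped posts, and at that scale you'd run the same map and reduce steps on Hadoop or Spark instead of in memory.

```python
from collections import Counter
from functools import reduce
import re

# Hypothetical sample data: (board, post text) pairs a crawler might produce.
posts = [
    ("g", "Install Gentoo, this distro is trash"),
    ("v", "This game is trash and the devs are hacks"),
    ("g", "Rust is not trash, you just can't write it"),
]

# Hypothetical word list to track; swap in topic keywords or whatever you want.
BAD_WORDS = {"trash", "hacks"}

def map_post(post):
    """Map step: emit a Counter of (board, word) -> 1 for each tracked word in one post."""
    board, text = post
    words = re.findall(r"[a-z']+", text.lower())
    return Counter((board, w) for w in words if w in BAD_WORDS)

def reduce_counts(a, b):
    """Reduce step: merge two partial counts by summing per key."""
    return a + b

totals = reduce(reduce_counts, map(map_post, posts), Counter())
for (board, word), n in sorted(totals.items()):
    print(f"/{board}/ {word}: {n}")
```

Because the map step only looks at one post and the reduce step is just addition, both parallelize trivially, which is the whole point of doing it this way.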
>>
>>56956453
Do what a German anon did once:

Get all 4chan posts/information and try to separate them by user, using writing patterns and shit like that.
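One common way to do the "writing patterns" part is stylometry with character n-grams: build a trigram frequency profile per text and compare profiles with cosine similarity. This is not necessarily what that anon did; the texts and the `anon_a`/`anon_b` labels are invented, and a real attempt would need much more text per author plus proper clustering over anonymous posts.

```python
from collections import Counter
import math

def trigram_profile(text):
    """Count overlapping character trigrams in the lowercased text."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical posts with known authors (e.g. tripfags).
known = {
    "anon_a": "desu desu, install gentoo senpai desu",
    "anon_b": "The proper approach, frankly, is to benchmark first.",
}
unknown = "just install gentoo desu"

profiles = {name: trigram_profile(text) for name, text in known.items()}
query = trigram_profile(unknown)
# Attribute the unknown post to the most similar known profile.
best = max(profiles, key=lambda name: cosine(query, profiles[name]))
print(best)
```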
>>
>>56957010

Source, plox.
>>
>>56956453
Nobody will care about what exactly the result of that is, right? Just about the data system?

Shit, put some data from a US or Swiss or whatever statistics bureau into Apache Spark / Hadoop shit and wrangle it by state / canton.

Not big enough data? Grab Wikipedia and try to learn what caused most of the reverts in the edit history, at what time of day they were done, who did them, yadda yadda. Or whatever.
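The "what time of day, who did it" part is just filtering and grouping. A minimal sketch, assuming revision records of (timestamp, user, edit summary) have already been extracted; the records are made up, and `is_revert` is a crude hypothetical heuristic on the summary text, not how Wikipedia itself marks reverts.

```python
from collections import Counter
from datetime import datetime

# Hypothetical revision records (UTC timestamp, user, edit summary).
revisions = [
    ("2016-09-01T03:12:00Z", "UserA", "Reverted edits by 1.2.3.4"),
    ("2016-09-01T03:40:00Z", "UserB", "fix typo"),
    ("2016-09-01T14:05:00Z", "UserA", "rv vandalism"),
    ("2016-09-02T03:55:00Z", "UserC", "Undid revision 123 by Foo"),
]

def is_revert(summary):
    """Crude heuristic: treat summaries starting with common revert phrasings as reverts."""
    return summary.lower().startswith(("revert", "rv", "undid"))

reverts = [r for r in revisions if is_revert(r[2])]
# Group reverts by hour of day and by user.
by_hour = Counter(datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").hour for ts, _, _ in reverts)
by_user = Counter(user for _, user, _ in reverts)
print(by_hour.most_common())
print(by_user.most_common())
```

Figuring out *why* things got reverted is the actually hard part; that's where the text classification on the reverted edits would come in.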

Or try to classify the pictures and comments posted on some fucking social network.
>>
>>56956453
Try weather prediction; there's a massive amount of weather data to train on. Just look into the Yahoo Weather API.
>>
>>56957114

They even tracked the weather?

Those motherfuckers...
>>
>>56956453
Try using the EM clustering algorithm to diagnose diseases in patients. I was going to do the same thing for my master's, but unfortunately I chose to work on actuarial models (more $$).
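For reference, EM clustering usually means fitting a Gaussian mixture with expectation-maximization. A minimal 1-D, two-component sketch: the readings are made up (imagine a single blood marker from two unlabelled patient groups), and actual diagnosis work would need real clinical data, many features and proper validation.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, mu, var, weights, iters=50):
    """Fit a 1-D Gaussian mixture with EM, starting from the given initial guesses."""
    k = len(mu)
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [weights[j] * normal_pdf(x, mu[j], var[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            # Floor the variance so a component can't collapse onto a single point.
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
    # Assign each point to its most likely component under the final parameters.
    labels = [max(range(k), key=lambda j: weights[j] * normal_pdf(x, mu[j], var[j])) for x in xs]
    return mu, var, weights, labels

# Hypothetical readings from two unlabelled groups.
xs = [4.8, 5.1, 5.0, 4.9, 5.2, 9.7, 10.1, 10.3, 9.9, 10.0]
mu, var, weights, labels = em_gmm_1d(xs, mu=[4.0, 11.0], var=[1.0, 1.0], weights=[0.5, 0.5])
print([round(m, 2) for m in mu], labels)
```

Note that EM only gives you clusters; mapping a cluster to "has disease X" still needs labelled cases.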
>>
>>56957114
Wouldn't do this for a final year project.

It won't work well and in most instances you'll just get poor grades for that.

Usually you just do some damn shit nobody else has done (too) much and get some result.

If you're good like that, how about you instead try to find possible (past) rivers and lakes and so on from topographic data. Or find sites where humans mined minerals or had stone quarries.
>>
My suggestion: get a shitload of images or audio files and train a sparse autoencoder on them.
Next, using the vector representation you got, you can create an image/audio search engine using cosine or some other similarity measure on the feature space you created.
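The search-engine half, sketched in plain Python: rank stored feature vectors by cosine similarity to a query vector. The autoencoder itself isn't shown; the short vectors and file names below are hypothetical stand-ins for the hidden-layer codes it would produce.

```python
import math

def cosine(a, b):
    """Cosine similarity of two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query, index, top_k=2):
    """Return the top_k (name, score) pairs most similar to query."""
    scored = [(name, cosine(query, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:top_k]

# Hypothetical feature vectors standing in for the autoencoder's codes.
index = {
    "cat1.jpg": [0.9, 0.1, 0.0, 0.2],
    "cat2.jpg": [0.8, 0.2, 0.1, 0.3],
    "car.jpg": [0.0, 0.9, 0.8, 0.1],
}
query = [0.85, 0.15, 0.05, 0.25]
results = search(query, index)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

This brute-force scan is fine for a demo; with a real collection you'd want an approximate nearest-neighbour index instead of comparing against every vector.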


>>56957148
Why would you use an unsupervised learning technique for a classification task?
>>
>>56957032
Don't have it; I read about it here a few months back. The guy even said that the German NSA contacted him about it.
>>
>>56957596

Oh shit, if they are able to track people down by the way they post... I don't like this idea.

BigData is cool, but why are all BigData applications evil deeds?



This is a 4chan archive - all of the content originated from that site.
This means that RandomArchive shows their content, archived.