
Data Mining


Thread replies: 7
Thread images: 1

File: blogs_kdnuggets[1].jpg (19KB, 495x245px)
I've recently finished my computer science bachelor's and want to get into data science / machine learning, so I'm looking for a project to do in my spare time.

I'm hoping to build a decision-support tool using data-mining algorithms, ideally one that could make me some money (predicting specific social changes?).

I know that football games can be predicted with roughly 70% accuracy; I'm looking for something a little more certain than that. For my final-year project, I used multiple patient databases to build a classification model that predicts the chance of a new patient developing colon cancer.

Has anyone worked with big data, data analytics or data mining who can shed some light or suggest some useful project ideas?

I'm looking to do everything in Python / C#, but could probably pick up R pretty easily.

Anyone have any thoughts?

Thanks in advance
>>
>>61245640
Stick with Python; C# and R are garbage.
numpy, scipy and pandas, along with shit tons of deep learning libraries, are all you need.
>>
>>61245666

Yeah, I've been doing some bits with numpy and pandas; it's what I'd most likely be using.

I mainly put C# in there for a front-end sorta thing, and because I'm slightly more comfortable with it than with any other language.
>>
>>61245733
With the exception of GUI and multithreading/multiprocessing, anything you can do in C# you can whip up in Python much faster. With a little bit of luck it might even run faster, if you use the native libraries right.

Jupyter Notebook is also a really good IDE substitute that allows you to prototype and iterate very fast, which is crucial in data science.

I was a C# programmer before; I code almost exclusively in Python now.
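
To illustrate the "use the native libraries right" point, here is a rough sketch, not a benchmark. It computes a sum of squares twice: once with a plain Python loop and once by handing the whole array to numpy, which does the work in compiled code. The function names and the test data are made up for the example, and how much faster the numpy version runs (if at all) depends on your data size and machine.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Pure-Python version: one interpreter step per element."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_numpy(values):
    """Vectorised version: the loop runs inside numpy's compiled code."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))


# Hypothetical test data, only here to show both versions agree.
data = list(range(1000))
loop_result = sum_of_squares_loop(data)
numpy_result = sum_of_squares_numpy(data)
print(loop_result, numpy_result)
```

If you want to compare speed yourself, the `%timeit` magic in a Jupyter cell is the easy way to do it.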
>>
>>61245835

Thank you for that recommendation. I had never heard of Jupyter before; it's pretty handy.
>>
>>61246004
Here's a list of my cookie cutter python libraries for data science:
numpy
scipy
pandas
scikit-learn
xgboost
nltk
[deep learning library of your choice]
matplotlib + seaborn
pickle
tqdm

You can try your hand at some Kaggle competitions if you've got the time.
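
For a feel of how a couple of the libraries above fit together, here is a minimal sketch of a classification workflow with numpy and scikit-learn. Everything in it is an assumption made for illustration: the data is synthetic (generated by `make_classification`, not real patient or match data), logistic regression is just one convenient model, and the split sizes are arbitrary. It also compares the model against a "always guess the most common class" baseline, since an accuracy figure on its own (like the 70% mentioned for football) only means something relative to that.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 500 rows, 10 numeric features, binary label.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

# Hold out a quarter of the rows so the score is measured on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Accuracy of the trained model on the held-out rows.
accuracy = accuracy_score(y_test, model.predict(X_test))

# Baseline: always predict the most common class seen in training.
majority_class = np.bincount(y_train).argmax()
baseline_accuracy = accuracy_score(y_test, np.full_like(y_test, majority_class))

# Per-row probability of the positive class, i.e. a "chance" rather than
# a hard yes/no prediction.
probabilities = model.predict_proba(X_test)[:, 1]

print(f"model accuracy:    {accuracy:.3f}")
print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"first five probabilities: {np.round(probabilities[:5], 3)}")
```

On a real project you would swap the synthetic data for a pandas DataFrame loaded from your own source, and probably try xgboost or another model alongside this one.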
>>
>>61246102

Much appreciated!

This is a 4chan archive - all of the content originated from that site.
This means that RandomArchive shows their content, archived.