Data Science

Thread replies: 24
Thread images: 1

File: 300k.jpg (34KB, 640x427px)
What is this "Data Science" meme and how do I profit off of it?
>>
>>1061954
Glorified software dev+analyst jobs
>>
>>1061970
What the fuck is this whole "analyst" thing? Every fucking job I see has "analyst" in the title. What the hell do "analysts" do all day?
>>
>>1062037
>sit in front of a computer
>google company websites for research and analysis
>make phone calls to get non-public info
>write reports
>rinse and repeat
>>
>>1062047
Lol, you have no fucking clue what you're talking about.

At work now will elaborate later.
>>
>>1062078
How late is "later"?
>>
>>1063033
Ok. Fine.

What this fag >>1062047 is referring to is market analysis which is a subset of business strategy, and has barely anything to do with what a data scientist does on a day to day basis.

Generally a data scientist is someone who brings together business knowledge and deep statistics knowledge to address business problems. I know that sounds like BS, so here are a couple of typical things data scientists have actually done:

1) You work at a large credit card company and you want to be able to identify identity theft before it costs you money. You build a model with hundreds of variables that can identify suspicious transactions. Now these transactions get flagged in real time and the card gets declined before any money comes out of your pocket.

2) You work at a large online retailer that is interested in improving its conversion rate. You notice that some customers are returning later to buy similar products. You implement a recommender system that displays products which similar customers bought after browsing a given product page. Conversion shoots through the roof.

3) You work at a multinational oil company. Each rig nets you $200K USD of oil each week. Your goal is to minimize downtime and costly equipment failures, so you build a statistical model which takes the sensor data collected during the operation of the rig and uses it to predict when the rig will need to be temporarily decommissioned for maintenance. This saves you tens of millions of dollars each year over thousands of oil rigs.

I could go on, but that's the gist. You take statistical modeling knowledge, use it to solve a business problem and then implement it in production.
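
For what it's worth, example 2 above can be sketched in a few lines. This is only a toy co-occurrence counter, not what any real retailer runs: the event log, the product names, and the `recommend` helper are all made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical log: (customer, page viewed, product later bought).
# Every row here is invented for illustration.
events = [
    ("alice", "tent", "sleeping_bag"),
    ("bob", "tent", "sleeping_bag"),
    ("carol", "tent", "camp_stove"),
    ("dave", "boots", "wool_socks"),
    ("erin", "tent", "sleeping_bag"),
]

# For each viewed page, count what customers went on to buy.
bought_after_viewing = defaultdict(Counter)
for _customer, viewed, bought in events:
    bought_after_viewing[viewed][bought] += 1


def recommend(viewed, k=2):
    """Return up to k products most often bought after viewing `viewed`."""
    counts = bought_after_viewing.get(viewed, Counter())
    return [product for product, _count in counts.most_common(k)]


print(recommend("tent"))
```

A production system would presumably weight by recency, filter out items the customer already owns, and so on; the point is just that "customers who looked at X bought Y" starts out as a counting problem.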
>>
>>1063066
Interesting, thank you.
>>
>>1063066
Thanks a lot. I'm teaching myself R and machine learning at home. Is there anything else I should be doing to prepare?
>>
>>1063198
Is R really the best language for that though? My Data Structures course is in Java.
>>
>>1063198
Get good at R. Also make sure you know SQL pretty well. Pick up big data technologies as you go. Most of them have a SQLesque semantic layer so it won't be super challenging. I'm a big fan of Python over R for data science but R is more or less the de facto standard. Also I suggest learning Tableau for visualization.

>>1063393
R is the standard right now, mostly because of the wide range of available packages. Java is more useful if you are leaning on the software development side.
>>
Data science is machine learning/AI CompSci jobs applied to things like commerce or stocks

Very complex math
>>
>>1063066
it's basically just a new buzzword to describe what statisticians and people in operations research have been doing for decades... only now with more data and technology to work with
>>
>>1063393
>My Data Structures course is in Java.

I think you're getting a bit confused - a data structures course would be a fairly standard thing to have in a general CS degree... it has little to do with statistics etc.
>>
>>1063615
I'm not doing a CS degree but a math degree.

In my faculty everyone is constantly talking about how Data Science is gonna be the next big thing.
>>
>>1063572
ah I'm already doing a sort of python-y data-y course for my physics degree so that's why I elected to focus on R.
>>
>>1063066
>>1063610
this. These guys know their stuff.

>>1063198
>>1063572
Surprisingly, Haskell is getting bigger as a data analysis language. While weird and exotic, it has performance comparable to C, unlike R, which is really slow, and it scales very, very well. This means that you can use the same code to crunch 100 data sets or 100 million without performance taking too big a hit.

>>1063623
It has been the next big thing for a few years now, the hype should probably diminish within the next 5-10 years.
>>
>>1065275
My company only uses SAS/SQL. Things run very slow. It's very common for code to take several hours to run. I get that we are pulling in a lot from our data warehouses, but I feel like this is just too damn slow.

Anyway, is R faster? Is Haskell faster? This is my first data analyst job so I don't honestly know if this is normal or not.
>>
>>1065299
haskell is the business

look into 'apama' as well, it's a weird kind of ai for automatically handling the kind of scenarios >>1063066 described
>>
>>1062037

excel monkey
>>
business intelligence here
>be accountant
>gain basic knowledge of vba, autohotkey, sql
>automate job
>get promoted to senior bi analyst

Most companies are just falling for the meme and don't have a real need for data science.
>>
>>1065299
Haskell is definitely faster, you can count on it being within ±30% of C.
Not sure about R, it might be faster but probably not by that much.

>>1065329
Also check out:
https://www.fpcomplete.com/
It has resources to learn haskell and even an online IDE.
It concentrates primarily on Haskell for data analysis.
>>
>>1065275
>>1065299
>>1065329
>>1065394

I don't know that I've seen much Data Science done in Haskell. As far as I knew Scala was where it's at, mostly due to Spark and Cascading (Scalding). It seems like all the big-data-competent companies have their own pet language though.

Anyways, the primary advantage of functional languages is that they scale well in distributed computing due to the limited amount of message-passing between nodes and the ability to do lazy evaluation.
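
To make the lazy evaluation point concrete, here is a rough sketch using Python generators. Generators are only an analogy for Haskell's laziness, not the same mechanism, and the `readings`, `running_mean`, and `mean_of_first` names plus the fake sensor values are assumptions made up for this example. The idea is that values are produced only when asked for, so the same code handles 5 records or a million without ever holding them all in memory.

```python
from itertools import islice


def readings():
    """Endless stream of fake sensor readings (hypothetical data)."""
    n = 0
    while True:
        yield 20.0 + (n % 5)  # cycles through 20.0, 21.0, ..., 24.0
        n += 1


def running_mean(stream):
    """Lazily yield the mean of everything seen so far."""
    total = 0.0
    for count, value in enumerate(stream, start=1):
        total += value
        yield total / count


def mean_of_first(n):
    """Mean of the first n readings; only n values are ever generated."""
    return next(islice(running_mean(readings()), n - 1, None))


print(mean_of_first(5))
print(mean_of_first(1_000_000))
```

Nothing is computed until `next` pulls on the pipeline, and the infinite `readings()` stream never causes trouble because only the requested prefix is consumed.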

As far as speed R/SAS are both going to be pretty bad (never used SAS though). R especially has the limitation of needing to put tables into memory (though there is a commercial distribution that gets around this.) Fundamentally though the basic data analysis interpreted languages (R/Python/MATLAB/SAS) are not going to be terribly different given that they all are calling the same C libraries on the backend.
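
A quick way to see the "calling C on the backend" point for yourself, sketched in plain Python. The `python_loop_sum` helper is made up for this example; the only real claim is that CPython's built-in `sum()` is implemented in C. Actual timings depend entirely on your machine, so none are quoted here.

```python
import timeit

data = list(range(1_000_000))


def python_loop_sum(values):
    """Add numbers one at a time in interpreted bytecode."""
    total = 0
    for v in values:
        total += v
    return total


# Both approaches must agree on the answer; only the speed differs.
assert python_loop_sum(data) == sum(data)

# Built-in sum() runs its loop in compiled C code inside CPython.
loop_time = timeit.timeit(lambda: python_loop_sum(data), number=5)
builtin_time = timeit.timeit(lambda: sum(data), number=5)
print(f"python loop: {loop_time:.3f}s, built-in sum: {builtin_time:.3f}s")
```

The same logic is why vectorized calls in R, Python, or MATLAB tend to land in the same ballpark: once the work is handed to a compiled routine, the interpreter on top matters much less.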
>>
>>1065442
Most companies using Haskell on a wide scale consider it a competitive advantage and as such don't make that fact public. A bit like Lisp, where it too seems as if no one is using it.


This is a 4chan archive - all of the content originated from that site.