What >>1062047 is referring to is market analysis, which is a subset of business strategy and has barely anything to do with what a data scientist does on a day-to-day basis.
Generally, a data scientist is someone who brings together business knowledge and deep statistics knowledge to address business problems. I know that sounds like BS, so here are a couple of typical things data scientists have actually done:
1) You work at a large credit card company and you want to be able to identify identity theft before it costs you money. You build a model with hundreds of variables that can identify suspicious transactions. Now these transactions get flagged in real time and the card gets declined before any money comes out of your pocket.
2) You work at a large online retailer that is interested in improving its conversion rate. You notice that some customers are returning later to buy similar products. You implement a recommender system that displays products that similar customers bought after browsing a given product page. Conversion shoots through the roof.
3) You work at a multinational oil company. Each rig nets you $200K USD of oil each week. Your goal is to minimize downtime and costly equipment failures, so you build a statistical model that takes the sensor data collected during the operation of the rig and uses it to predict when the rig will need to be temporarily decommissioned for maintenance. This saves you tens of millions of dollars each year across thousands of oil rigs.
I could go on, but that's the gist. You take statistical modeling knowledge, use it to solve a business problem, and then implement it in production.
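To make example 2 a bit more concrete, here is a minimal sketch of a "customers who viewed this later bought that" recommender. It is a toy co-occurrence counter, not what any real retailer runs; the event log, the names `build_recommender` and `recommend`, and the products are all made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical log: (customer, product viewed, product bought later).
EVENTS = [
    ("alice", "tent", "sleeping bag"),
    ("bob", "tent", "sleeping bag"),
    ("carol", "tent", "camp stove"),
    ("dave", "laptop", "laptop sleeve"),
    ("erin", "tent", "sleeping bag"),
]


def build_recommender(events):
    """Count, for each viewed product, what customers went on to buy."""
    bought_after = defaultdict(Counter)
    for _customer, viewed, bought in events:
        if viewed != bought:
            bought_after[viewed][bought] += 1
    return bought_after


def recommend(bought_after, product, k=3):
    """Return up to k products most often bought after viewing `product`."""
    counts = bought_after.get(product, Counter())
    return [item for item, _count in counts.most_common(k)]


model = build_recommender(EVENTS)
print(recommend(model, "tent"))  # ['sleeping bag', 'camp stove']
```

A production system would weight by recency, filter out-of-stock items, and use something smarter than raw counts, but the business logic is the same: turn past behavior into a ranked list shown on the product page.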
>>1063198 Get good at R. Also make sure you know SQL pretty well. Pick up big data technologies as you go. Most of them have a SQLesque semantic layer so it won't be super challenging. I'm a big fan of Python over R for data science but R is more or less the de facto standard. Also I suggest learning Tableau for visualization.
>>1063393 R is the standard right now, mostly because of the wide range of available packages. Java is more useful if you are leaning on the software development side.
>>1063198 >>1063572 Surprisingly, Haskell is getting bigger as a data analysis language. While weird and exotic, it has performance comparable to C, unlike R, which is really slow, and it scales very, very well. This means that you can use the same code to crunch 100 data sets or 100 million without performance taking too big a hit.
>>1063623 It has been the next big thing for a few years now; the hype should probably diminish within the next 5-10 years.
>>1065275 My company only uses SAS/SQL. Things run very slow. It's very common for code to take several hours to run. I get that we are pulling in a lot from our data warehouses, but I feel like this is just too damn slow.
Anyway, is R faster? Is Haskell faster? This is my first data analyst job so I don't honestly know if this is normal or not.
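Not a diagnosis of this particular setup, but one common reason warehouse jobs crawl is pulling raw rows out and aggregating them client-side instead of letting the database do it. A small sketch of the difference, using Python's built-in `sqlite3` as a stand-in for the warehouse (the `sales` table and its rows are hypothetical):

```python
import sqlite3

# In-memory stand-in for a warehouse table; schema and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 15.0), ("west", 7.5), ("west", 2.5)],
)

# Pattern A: pull every row to the client, then aggregate in application code.
client_side = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    client_side[region] = client_side.get(region, 0.0) + amount

# Pattern B: let the database aggregate and return one row per group.
db_side = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

print(client_side)
print(db_side)
```

Both give the same totals, but pattern B moves one row per region over the wire instead of every transaction. If the slow step is the pull itself, switching the client language is unlikely to help much.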
I don't know that I've seen much data science done in Haskell. As far as I knew, Scala was where it's at, mostly due to Spark and Cascading (Scalding). It seems like all the big-data-competent companies have their own pet language though.
Anyways, the primary advantage of functional languages is that they scale well in distributed computing due to the limited amount of message-passing between nodes and the ability to do lazy evaluation.
As far as speed goes, R and SAS are both going to be pretty bad (never used SAS though). R especially has the limitation of needing to put tables into memory (though there is a commercial distribution that gets around this). Fundamentally, though, the basic interpreted data analysis languages (R/Python/MATLAB/SAS) are not going to be terribly different, given that they all call the same C libraries on the backend.
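For what the in-memory limitation and lazy evaluation mean in practice, here is a rough sketch using Python generators as a stand-in for laziness (not the same thing as Haskell's lazy evaluation, just the closest everyday analogue). Rows are read one at a time and only running totals are kept, so the whole table never has to fit in memory. The CSV contents and the function names are invented for the example.

```python
import csv
import io

# Hypothetical CSV; in practice this would be a file too large to load at once.
RAW = "sensor,reading\na,1.0\nb,4.0\na,3.0\nb,6.0\n"


def read_rows(handle):
    """Yield one parsed row at a time instead of materializing the table."""
    for row in csv.DictReader(handle):
        yield row["sensor"], float(row["reading"])


def running_means(rows):
    """Compute per-sensor means while holding only running totals in memory."""
    totals, counts = {}, {}
    for sensor, reading in rows:
        totals[sensor] = totals.get(sensor, 0.0) + reading
        counts[sensor] = counts.get(sensor, 0) + 1
    return {s: totals[s] / counts[s] for s in totals}


means = running_means(read_rows(io.StringIO(RAW)))
print(means)  # {'a': 2.0, 'b': 5.0}
```

Memory use here grows with the number of distinct sensors, not the number of rows, which is the same idea the distributed frameworks apply across many machines.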