How do I become a data scientist?

Answer by Pronojit Saha:

SELF STARTER WAY
For a self-starter novice, here is an outline to start with. (This is reproduced from my blog post: How to acquire the "Essential Skill Set"? - the Self Starter way.)

0. Basic Pre-requisites:

1. Acquire & Scrub Data:

2. Filter & Mine data:

3. Represent & Refine Data: Tableau-Training & Tutorials, Data visualisation in R with ggplot2 and plyr, Predictive Analytics: Overview and Data visualization, Flowing Data-Tutorials, UC Berkeley-Data Visualization, D3.js Tutorial

4. Domain Knowledge: The Black-Box, as per your interest.

Combining all the above:
Data Literacy Course — IAP
UC Berkeley Introduction to Data Science
Coursera-Introduction to Data Science
Teach Data Science-Syracuse University

Apply the knowledge:
Harvard Data Science Course Homework
Analyzing Big Data with Twitter
Analyzing Twitter Data with Apache Hadoop

FORMAL WAY
For a more formal way of becoming a data scientist, one can look into this post (reproduced below): How to acquire the "Essential Skill Set"? - the Formal way.
The Essential Skill Set comprises the fundamental skills that every data scientist is expected to have. Traditionally, these are acquired through a computer science or statistics degree from an institution. The Stanford Computer Science courses & Statistics courses provide a good reference list of courses to take. Some of these courses are relevant, while many others are not. For example, in Computer Science one would do well to learn about large-scale distributed databases and algorithms, but there is no need to learn HCI and UX, or pure-play storage and operating systems, networking, etc. Similarly, some statistics courses focus too much on, let's say, "old school statistics", including thousands of ways of hypothesis testing, instead of on machine learning (clustering, regression, classification, etc.). So both streams have many nice-to-have courses as well as must-have courses for a data scientist (I dare to claim that at present the share of must-have courses is greater in a traditional Statistics stream than in a Computer Science stream). As such, one needs to pick courses wisely.

Alternatively, one can look into the new Data Science programs that some universities are offering, which build on the points mentioned above. They combine the must-have courses from both traditional statistics and computer science programs to impart the 4 Essential Skills, and also include courses to develop the Differentiator Skills in students. The MS in Data Science at NYU & MS in Analytics at USF are good examples of such an amalgamation of the requisite courses. A complete list of such programs is presented here: Colleges with Data Science Degrees.

The right program obviously depends on the individual's goal. A recent O'Reilly publication titled 'Analyzing the Analyzers' does a very good job of aggregating the various data scientist roles into 4 main categories according to their skills. An individual may therefore select a program according to the category of data scientist they most identify with, as shown below.

  • Data Businesspeople are the product and profit-focused data scientists. They're leaders, managers, and entrepreneurs, but with a technical bent. A common educational path is an engineering degree paired with an MBA or the new Data Science programs as mentioned above.
  • Data Creatives are eclectic jacks-of-all-trades, able to work with a broad range of data and tools. They may think of themselves as artists or hackers, and excel at visualization and open source technologies. They are expected to have an engineering degree (mostly in statistics or economics) but not much in the way of business skills.
  • Data Developers are focused on writing software to do analytic, statistical, and machine learning tasks, often in production environments. They often have computer science degrees, and often work with so-called "big data".
  • Data Researchers apply their scientific training, and the tools and techniques they learned in academia, to organizational data. They may have an MS or PhD in statistics, economics, physics, etc., and their creative applications of mathematical tools yield valuable insights and products.

The skills associated with the 4 main categories, which justify the above mentioned program recommendation, are as below:
