What is a data scientist about?
A data scientist is, I believe, ultimately someone on a data science team who can generate value for a company or product. To do this, they use programming and statistics to answer questions related to the business.
For example, say I am a data scientist at a streaming service for programming tutorials. Some questions my team and I may want to answer to improve the business are:
- What promotions gain the most new subscribers?
- Which topics are gaining in popularity?
- What videos do users watch the most?
Answering these questions and others can help the business go in a direction that serves the customer the best.
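To make one of these questions concrete, here is a minimal sketch of how "What videos do users watch the most?" might be answered in Python. The watch log, the video titles, and the `most_watched` helper are all made up for illustration; a real streaming service would pull this data from its own logs or database.

```python
from collections import Counter

# Hypothetical watch log: (user_id, video_title) pairs, invented for this example.
watch_events = [
    ("u1", "Intro to Python"),
    ("u2", "Intro to Python"),
    ("u1", "Git Basics"),
    ("u3", "Intro to Python"),
    ("u2", "SQL Joins"),
    ("u3", "Git Basics"),
]


def most_watched(events, top_n=3):
    """Return the top_n (title, view_count) pairs, most viewed first."""
    counts = Counter(title for _user, title in events)
    return counts.most_common(top_n)


top_videos = most_watched(watch_events, top_n=2)
print(top_videos)  # [('Intro to Python', 3), ('Git Basics', 2)]
```

Counting views is only the programming half. Deciding whether a difference between two videos is meaningful is where the statistics comes in.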
As far as technical knowledge goes, this post has a quote that I think best captures the difference between a data scientist and a statistician.
A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.
Duties and knowledge areas
Data scientists have, I would say, a wide range of duties, and each of the areas they need to know for their daily tasks is itself very broad.
For example, based on a diagram from this post, a data scientist may need to know the following:
- Statistics
- Programming
- Domain knowledge
If you have all three of these, you are considered a data scientist. Each of these areas runs very deep.
Statistics and programming are disciplines in their own right. You can get a Ph.D. in either of them, which alone tells you how broad each subject is.
Domain knowledge is really something you tend to learn on the job. To take our streaming service for programming tutorials example, you would learn a lot about sales and customer retention as well as the technologies used to provide the best streaming experience for the users. Other products and services require much deeper domain knowledge. If you’re on a team that supplies software to doctors, you may need to know a lot about the medical industry and medical products.
Data science and statistics similarities
Some may argue that a data scientist is just a modern-day statistician. In fact, Nate Silver, the well-known statistician behind the blog FiveThirtyEight and the book The Signal and the Noise, mentions in this post that
Statistics is a branch of science. Data scientist is slightly redundant in some way and people shouldn’t berate the term statistician.
And this makes sense. Statistics has been around for centuries, while data science has only been around a few years. You can get a doctorate in statistics. You may be lucky to be able to get an undergraduate degree in data science.
However, as mentioned above, a data scientist needs to know a lot besides statistics. They need to be proficient in programming to analyze data and create models. You could argue, though, that statisticians can do this already, hence the creation of the R programming language.
My relation to the two
My undergraduate degree was in math and computer science. However, I never had to take any statistics courses. I wish I had, but I didn’t see the value in statistics back then as I do now. I took more calculus and abstract algebra courses instead.
My main focus was programming and computer science. I enjoyed making programs and automating wherever I could.
However, the more I got into stats, thanks to data science, the more I enjoyed it, and the more I wish I had learned in college. Honestly, if someone asked me why math is important, I would point to statistics, since it is probably one of the most applicable branches of math that exists.
My love will always be programming, but statistics is something I can see loving just as much.