Understanding Data Science

Data science is an exciting new field that many people do not fully understand, and it is often mistaken for a rebranded name for statistics. In my first blog post, I will summarize what I see as the major differences between a statistician and a data scientist.

Data science is an interdisciplinary field involving not just statistics, but also programming and business or product knowledge. A data scientist needs to understand the business deeply so they can use data analytics to solve the right problem in the right way. Unlike a statistician, a data scientist is often concerned with building the most accurate predictive model and applying it to guide or make business decisions. A statistician may be more interested in inference: quantifying uncertainty and understanding how the variables affect the response. A data scientist will also often work with much larger data sets than a statistician, and the data can come from very different sources. A statistician may work with survey data or experimental data, for example, while a data scientist is more likely to train a model on big data combined from many different sources.
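To make the inference-versus-prediction distinction a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the data is simulated, the helper names `fit_simple_ols` and `mean_squared_error` are my own, and the interval uses a rough normal approximation. The same fitted line is read two ways: a statistician might focus on the slope and its uncertainty, while a data scientist might focus on how well the line predicts data it has not seen.

```python
import random
import statistics


def fit_simple_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns the intercept, the slope, and the standard error of the slope.
    """
    n = len(xs)
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares; the variance estimate uses n - 2 degrees of freedom.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    slope_se = (rss / (n - 2) / sxx) ** 0.5
    return intercept, slope, slope_se


def mean_squared_error(intercept, slope, xs, ys):
    """Average squared prediction error of the fitted line on (xs, ys)."""
    errors = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return statistics.fmean(errors)


# Simulated data (an assumption): true intercept 2.0, true slope 0.5, unit noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

# Hold out the last 50 points to check predictions on unseen data.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

intercept, slope, slope_se = fit_simple_ols(train_x, train_y)

# Inference view: how does x affect y, and how sure are we?
# Approximate 95% interval using the normal quantile 1.96.
slope_interval = (slope - 1.96 * slope_se, slope + 1.96 * slope_se)
print("slope estimate:", slope)
print("approximate 95% interval for slope:", slope_interval)

# Prediction view: how accurate is the model on data it was not fitted to?
test_mse = mean_squared_error(intercept, slope, test_x, test_y)
print("held-out mean squared error:", test_mse)
```

In practice the two views overlap, and a real project would use far richer models and data than a single straight line, but the difference in emphasis is the point.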

As a statistician who has had some training in statistical learning methods, I understand some of the techniques that go into building predictive models. However, if I were to pursue a role as a data scientist, I would need deeper business and product knowledge and stronger programming skills, including tools such as Python, SQL, and Spark. Even if I do not pursue the title of data scientist, knowledge of programming methods, UI, and handling large datasets, along with business acumen, will be invaluable in my career to come.

Written on May 20, 2021