Reflections from the Data Science Course

After taking the Data Science for Statisticians course, I have a better understanding of what a data scientist does. The biggest change in my thinking is about how much of the day-to-day work is actually machine learning or data analysis. I’ve learned that this is only a small part of the job, and that the bulk of the work is understanding business needs, processing and cleaning data, and deploying and interpreting the models.


Predictive Models and Automated Reports Project Reflection

In this project, I worked with a partner to develop models that predict the number of bicycle rentals from a set of variables. First, we performed Exploratory Data Analysis, creating several graphs and summary statistics to understand our data better. Then, we fit four separate models - two linear regression models, one random forest, and one boosting model - on a training set using cross-validation. We then evaluated those models on a test set and compared the mean squared error (MSE) of each fit. Finally, we set up the markdown code to create a separate report for each day of the week when knit.
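The workflow described above (cross-validate on a training set, then compare MSE on a held-out test set) can be sketched roughly as follows. This is a minimal illustration in Python, not the code from the project: the data are simulated, the predictor `temps` is a made-up stand-in for the real variables, and a mean baseline and a one-variable linear regression stand in for the four models the project actually used.

```python
import random

def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training response."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_simple_linear(xs, ys):
    """Least-squares line y = intercept + slope * x for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

def cv_mse(fit, xs, ys, k=5):
    """Average validation MSE over k folds (every k-th row forms one fold)."""
    n = len(xs)
    scores = []
    for fold in range(k):
        val_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in val_idx]
        train_y = [ys[i] for i in range(n) if i not in val_idx]
        val_x = [xs[i] for i in sorted(val_idx)]
        val_y = [ys[i] for i in sorted(val_idx)]
        scores.append(mse(fit(train_x, train_y), val_x, val_y))
    return sum(scores) / k

# Simulated data (an assumption for illustration, not the real rental data).
rng = random.Random(0)
temps = [rng.uniform(0, 35) for _ in range(200)]
rentals = [50 + 8 * t + rng.gauss(0, 20) for t in temps]

# Hold out the last 50 rows as a test set.
train_x, test_x = temps[:150], temps[150:]
train_y, test_y = rentals[:150], rentals[150:]

candidates = {"mean baseline": fit_mean, "linear regression": fit_simple_linear}

# Cross-validated MSE on the training set, then MSE on the held-out test set.
cv_scores = {name: cv_mse(fit, train_x, train_y) for name, fit in candidates.items()}
test_scores = {
    name: mse(fit(train_x, train_y), test_x, test_y)
    for name, fit in candidates.items()
}
best = min(test_scores, key=test_scores.get)
```

Printing `cv_scores` and `test_scores` side by side shows whether the ranking from cross-validation carries over to the test set. `best` holds the name of the model with the lowest test MSE.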


API Project Reflection

In this project I created several functions to access different endpoints from the NHL API. I used these functions to create one “master” function that I could use to call any function and access any endpoint easily. After completing this, I did some Exploratory Data Analysis with data pulled from this API. I created several tables and charts, and got a general understanding of the structure of much of this data.
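The "master" function idea described above can be sketched as a small dispatcher that looks up an endpoint-specific helper by name. This is a hedged Python illustration, not the project's code: the base URL is a placeholder, the endpoint names and paths (`teams`, `roster`, `schedule`) are hypothetical, and the helpers only build request URLs rather than calling the real NHL API.

```python
from urllib.parse import urlencode

# Placeholder address (an assumption); this is not the real NHL API base URL.
BASE_URL = "https://api.example.com/v1"

def teams_url(team_id=None):
    """URL for all teams, or for one team when team_id is given."""
    path = "/teams" if team_id is None else f"/teams/{team_id}"
    return BASE_URL + path

def roster_url(team_id):
    """URL for a single team's roster."""
    return f"{BASE_URL}/teams/{team_id}/roster"

def schedule_url(**params):
    """URL for the schedule, with optional query parameters."""
    query = urlencode(params)
    return f"{BASE_URL}/schedule" + (f"?{query}" if query else "")

# Hypothetical endpoint names mapped to the helper that handles each one.
ENDPOINTS = {
    "teams": teams_url,
    "roster": roster_url,
    "schedule": schedule_url,
}

def build_request_url(endpoint, *args, **kwargs):
    """Single entry point: look up the helper by name and forward the arguments."""
    try:
        builder = ENDPOINTS[endpoint]
    except KeyError:
        raise ValueError(
            f"Unknown endpoint {endpoint!r}; choose from {sorted(ENDPOINTS)}"
        ) from None
    return builder(*args, **kwargs)
```

For example, `build_request_url("roster", 12)` returns the placeholder roster URL for team 12, and an unrecognized endpoint name raises a `ValueError` that lists the valid choices. Keeping the helpers in a dictionary means a new endpoint only needs one new function and one new entry.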


Thoughts on R and other programming

R is the first programming language I have had the opportunity to gain deep experience with. I had some experience with R before starting the Master’s program from taking Dataquest lessons. I found these lessons very helpful because they paired short readings with short exercises, so I quickly got used to writing code to solve problems. They also had lessons on Python, and I found the two languages about equal in difficulty. Since starting the statistics program, I have nearly always had R open on my computer to work on assignments and projects. It’s the first language I really feel comfortable using, and I did not find it too difficult to learn.


Understanding Data Science

Data science is an exciting new field that many people do not fully understand, and it is often mistaken for just another name for statistics, or a rebranding of it. In my first blog post, I will summarize what I see as the major differences between a statistician and a data scientist.
