ECON 413
Introduction to Data Science


Erol Taymaz
Department of Economics
Middle East Technical University

Topics

What is “Data Science”?

Data science, also known as data-driven science, is an interdisciplinary field about scientific methods, processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured …

Data science is a “concept to unify statistics, data analysis and their related methods” in order to “understand and analyze actual phenomena” with data. It employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, information science, and computer science, in particular from the subdomains of machine learning, classification, cluster analysis, data mining, databases, and visualization. (Wikipedia/Data Science)

What is “data”?

Why is data science important?

The 50 Best Jobs in America in 2022

  1. Enterprise Architect
  2. Full Stack Engineer
  3. Data Scientist
  4. DevOps Engineer
  5. Strategy Manager
  6. Machine Learning Engineer
  7. Data Engineer
  8. Software Engineer
  9. Java Developer
  10. Product Manager

Source: Glassdoor/50 Best Jobs in America

Data Science Process

What is ECON 413?

Textbooks

Topics

Part 1. Basics

  1. Introduction

  2. Data types and data objects

  3. Algorithms, loops, functions

  4. Basic functions

  5. Data manipulation with data.table

  6. Data visualization and ggplot2

  7. Factors, lists, functionals

Part 2. Applications

  1. Reproducible and interactive research (Rmarkdown, Quarto and Shiny packages)

  2. Web scraping and text analysis

  3. Regression analysis

  4. Maps

  5. Animations and simulations

  6. R best practice

  7. Review

Lectures

Please review the presentation slides and try examples before the lecture.

Grading

The course consists of lectures, quizzes, homework, and projects.

Course grades will be based on 6 quizzes (10 pts each), 1 project (40 pts), and forum participation (as a bonus, up to 10 pts).

There will be 7 quizzes in total, and you can take any 6 of them. There will be no make-up.
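
The grading arithmetic above can be sketched in a few lines of R. All scores below are made up for illustration, and the variable names are hypothetical. The slides do not say whether the total is capped at 100 once the bonus is added, so no cap is applied here.

```r
# Hypothetical scores for one student (illustration only)
quiz_scores   <- c(8, 7, 10, 9, 6, 9)  # the 6 quizzes that count, 10 pts each
project_score <- 34                    # project, out of 40
forum_bonus   <- 4                     # forum participation bonus, 0 to 10

# Basic sanity checks on the inputs
stopifnot(length(quiz_scores) == 6,
          all(quiz_scores >= 0 & quiz_scores <= 10),
          project_score >= 0, project_score <= 40,
          forum_bonus >= 0, forum_bonus <= 10)

base_points  <- sum(quiz_scores) + project_score  # maximum is 6 * 10 + 40 = 100
total_points <- base_points + forum_bonus         # assumption: bonus is added without a cap
total_points
```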

Project teams will consist of 3 students. Projects will be presented online on January 23, 2024, and must be submitted by midnight the same day.

DataCamp

“This class is supported by DataCamp, the most intuitive learning platform for data science and analytics. Learn any time, anywhere and become an expert in R, Python, SQL, and more. DataCamp’s learn-by-doing methodology combines short expert videos and hands-on-the-keyboard exercises to help learners retain knowledge. DataCamp offers 350+ courses by expert instructors on topics such as importing data, data visualization, and machine learning. They’re constantly expanding their curriculum to keep up with the latest technology trends and to provide the best learning experience for all skill levels. Join over 6 million learners around the world and close your skills gap.”

I will register your name at DataCamp. If you do not want to use it, please inform me by e-mail.

What is R?

Why R?

Why not Stata, or SPSS, or …?

1-year forecasts for $/TL

Assume that we need to forecast the $/TL exchange rate and report the results every week.

Data science process - the usual one

Assume that we need to forecast the $/TL exchange rate and report the results every week.

Data science process with R

Only 7 lines of code

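library(CBRT)      # data from the Central Bank of the Republic of Turkey
library(forecast)  # auto.arima(), forecast(), autoplot()

# Download the USD series at weekly frequency (freq = 3), starting in 2015
myData <- getDataSeries("TP.DK.USD.A.EF", start = "2015-01-01", freq = 3)
# Convert to a weekly time series (52 observations per year)
usd <- ts(myData$TP.DK.USD.A.EF, frequency = 52, start = c(2015, 1))
musd <- auto.arima(usd)         # select and fit an ARIMA model automatically
fusd <- forecast(musd, h = 52)  # forecast 52 weeks (1 year) ahead
autoplot(fusd) + ggplot2::theme_bw()  # theme_bw() is in ggplot2, which forecast does not attach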
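
To try the same workflow without downloading anything, here is a minimal sketch that uses only base R. It is not the code from the slides: the series is simulated (it is not real exchange rate data), the ARIMA order is fixed by hand instead of being chosen by auto.arima(), and the names fake_rate, usd_sim, fit and fc are hypothetical.

```r
set.seed(413)

# Simulated weekly series (random walk with drift), standing in for the exchange rate
fake_rate <- 2.5 + cumsum(rnorm(260, mean = 0.01, sd = 0.05))
usd_sim <- ts(fake_rate, frequency = 52, start = c(2015, 1))

# Fit a fixed ARIMA(0,1,1) with base R; auto.arima() would pick the order itself
fit <- arima(usd_sim, order = c(0, 1, 1))

# Forecast 52 weeks ahead: fc$pred holds point forecasts, fc$se their standard errors
fc <- predict(fit, n.ahead = 52)

# Approximate 95% interval around the point forecasts
upper <- fc$pred + 1.96 * fc$se
lower <- fc$pred - 1.96 * fc$se

# Plot the history, the forecasts, and the interval (dashed)
ts.plot(usd_sim, fc$pred, lower, upper, gpars = list(lty = c(1, 1, 2, 2)))
```

Because the order is fixed and no drift term is estimated, the forecasts from this sketch will generally differ from what auto.arima() would produce on the same data.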

To do this week

File organization

Using RStudio

Make errors!

Note that you will make errors frequently when you start using R. Do not get frustrated when you get error messages: making errors is an essential part of the learning process. Try to fix these errors by yourself first.

If you get an error message while using R, check your code first. Most of these errors will be due to missing parentheses and commas.
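
As an illustration of these two typical mistakes, the sketch below checks three short code strings with parse(), which reads R code without running it. The helper check_syntax() is hypothetical (it is not part of R or of the course material), and the snippets contain deliberate typos.

```r
# Hypothetical helper: returns "OK" if the code parses, otherwise the error message
check_syntax <- function(code) {
  tryCatch({
    parse(text = code)  # parse only; the code is not executed
    "OK"
  }, error = function(e) conditionMessage(e))
}

check_syntax("mean(c(1, 2, 3))")  # well-formed code
check_syntax("mean(c(1, 2, 3)")   # missing closing parenthesis
check_syntax("c(1 2, 3)")         # missing comma between 1 and 2
```

The messages returned for the last two calls point to the place where R stopped understanding the code, which is usually at or just after the typo.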

If you cannot solve the problem in a reasonable time, submit a question at the Forum page of ODTUClass. When you submit your question, please add the error message and provide sufficient info to reproduce the error.
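
A forum question is easier to answer when it includes a small example that others can run. The sketch below shows one possible way to prepare such a post; the data frame and the failing call are made up for illustration.

```r
# A small, made-up data set that anyone can recreate by copy-paste
df <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))
dput(df)  # prints R code that rebuilds df exactly; paste it into the question

# The failing call, wrapped so that the error message can be captured and pasted
msg <- tryCatch(sum(df$y), error = function(e) conditionMessage(e))
msg

# The R version is often useful to include as well
R.version.string
```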

Try to solve the problems/errors posted on the Forum page, and share your solutions with others. This is one of the best methods to learn R programming.

Search for the error on Google (just copy the error message into the search bar) and try to find an answer. Among the search results, check Stack Overflow first.





Good luck!