In this tutorial, I will show how to create interactive charts and maps in R with the plotly and leaflet packages, using the Corona dataset from Kaggle (https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset). The source of the dataset is Johns Hopkins University.
#install.packages("plotly")
library(ggplot2)
library(dplyr)
library(plotly)
library(hrbrthemes)
The dataset contains the following information about the nCoV virus, known as Corona:
Country where the virus was seen.
Province where the virus was seen.
Total number of patients with the virus.
Number of deaths caused by the virus.
Total number of recovered patients.
Before reading the dataset, I split it into two parts: I extracted the date column manually into a separate file and corrected it to simplify the analysis.
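# Case counts; the file is semicolon-separated
corona <- read.csv("ncov_2019.csv", header = TRUE, sep = ";")
# Dates, extracted manually into their own file
date <- read.table("date.txt", header = TRUE)
# Attach the date column to the case counts by row position
corona <- data.frame(corona, date)
head(corona)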
##   Sno Province.State Country  Time Confirmed Deaths Recovered  X
## 1   1          Anhui   China 12:00         1      0         0 NA
## 2   2        Beijing   China 12:00        14      0         0 NA
## 3   3      Chongqing   China 12:00         6      0         0 NA
## 4   4         Fujian   China 12:00         1      0         0 NA
## 5   5          Gansu   China 12:00         0      0         0 NA
## 6   6      Guangdong   China 12:00        26      0         0 NA
##         Date
## 1 22-01-2020
## 2 22-01-2020
## 3 22-01-2020
## 4 22-01-2020
## 5 22-01-2020
## 6 22-01-2020
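To draw the plot, I converted the Date column to the Date class. The dates are stored as day-month-year, so the format has to be given explicitly; otherwise as.Date() reads 22-01-2020 as a date in the year 0022.
corona$Date=as.Date(corona$Date, format="%d-%m-%Y")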
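As a small check of why the format string matters, the sketch below parses one date string from the data both with and without an explicit format. The variable names are only for illustration.

```r
# One date string as it appears in the Date column (day-month-year)
raw_date <- "22-01-2020"

# With an explicit format, day, month and year are read correctly
parsed <- as.Date(raw_date, format = "%d-%m-%Y")
format(parsed, "%Y-%m-%d")   # "2020-01-22"

# Without a format, as.Date() tries year-month-day first,
# so the leading "22" is taken as the year
default_parse <- as.Date(raw_date)
default_parse
```

Printing default_parse next to parsed makes the difference visible before any plot is drawn.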
Total Number of People with the Disease between 22.01.2020 and 31.01.2020
I calculated the total number of confirmed patients for each date between 22.01.2020 and 31.01.2020.
sum_data=group_by(corona, Date) %>% summarize(sum_confirm = sum(Confirmed))
sum_data
## # A tibble: 10 x 2
##    Date       sum_confirm
##    <date>           <dbl>
##  1 2020-01-22         555
##  2 2020-01-23         653
##  3 2020-01-24         941
##  4 2020-01-25        2019
##  5 2020-01-26        2794
##  6 2020-01-27        4473
##  7 2020-01-28        6057
##  8 2020-01-29        7783
##  9 2020-01-30        9776
## 10 2020-01-31       11374
p <- sum_data %>%
  ggplot(aes(x = Date, y = sum_confirm)) +
  geom_area(fill = "#69b3a2", alpha = 0.5) +
  geom_line(color = "#69b3a2") +
  ylab("Total Number of Patients") +
  theme_ipsum()
# Turn it interactive with ggplotly
p <- ggplotly(p)
p
The number of confirmed cases grows steadily over the period, from 555 on 22.01.2020 to 11,374 on 31.01.2020, and from 26.01.2020 onward the increase is approximately linear.
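One way to check how linear the trend is, sketched here with base R only, is to look at the day-to-day increases. The values are copied from the sum_data output above; the variable names are illustrative.

```r
# Daily totals of confirmed cases, copied from the sum_data output
sum_confirm <- c(555, 653, 941, 2019, 2794, 4473, 6057, 7783, 9776, 11374)

# Day-to-day increase: a roughly constant value means roughly linear growth
daily_increase <- diff(sum_confirm)
daily_increase
# 98 288 1078 775 1679 1584 1726 1993 1598
```

The increases are small in the first days and then settle between roughly 1,600 and 2,000 per day, which is what a close-to-linear rise looks like.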
Total Number of Deaths Caused by the Disease between 22.01.2020 and 31.01.2020
The total number of deaths caused by the disease is also visualized below.
death_data=group_by(corona, Date) %>% summarize(sum_death = sum(Deaths))
death_data
## # A tibble: 10 x 2
##    Date       sum_death
##    <date>         <dbl>
##  1 2020-01-22         0
##  2 2020-01-23        18
##  3 2020-01-24        26
##  4 2020-01-25        56
##  5 2020-01-26        80
##  6 2020-01-27       107
##  7 2020-01-28       132
##  8 2020-01-29       170
##  9 2020-01-30       213
## 10 2020-01-31       259
p1 <- death_data %>%
  ggplot(aes(x = Date, y = sum_death)) +
  geom_area(fill = "#CC0000", alpha = 0.5) +
  geom_line(color = "#CC0000") +
  ylab("Total Number of Deaths caused by Corona") +
  theme_ipsum()
# Turn it interactive with ggplotly
p1 <- ggplotly(p1)
p1
The number of deaths behaves similarly to the number of confirmed cases: it rises on every day of the period, from 0 on 22.01.2020 to 259 on 31.01.2020.
Total Number of Recovered Patients between 22.01.2020 and 31.01.2020
The total number of recovered patients is visualized below.
rec_data=group_by(corona, Date) %>% summarize(sum_rec = sum(Recovered))
rec_data
## # A tibble: 10 x 2
##    Date       sum_rec
##    <date>       <dbl>
##  1 2020-01-22       0
##  2 2020-01-23      30
##  3 2020-01-24      36
##  4 2020-01-25      49
##  5 2020-01-26      54
##  6 2020-01-27      63
##  7 2020-01-28     110
##  8 2020-01-29     133
##  9 2020-01-30     187
## 10 2020-01-31     252
p2 <- rec_data %>%
  ggplot(aes(x = Date, y = sum_rec)) +
  geom_area(fill = "#D55E00", alpha = 0.5) +
  geom_line(color = "#D55E00") +
  ylab("Total Number of Recovered Patients") +
  theme_ipsum()
# Turn it interactive with ggplotly
p2 <- ggplotly(p2)
p2
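The three daily summaries used in the plots above can also be computed in one step. The sketch below uses base R's aggregate() instead of dplyr, purely to keep it free of dependencies; the toy data frame and its values are hypothetical and only mimic the column names of corona.

```r
# Hypothetical toy data with the same column names as corona
toy <- data.frame(
  Date      = as.Date(c("2020-01-22", "2020-01-22", "2020-01-23", "2020-01-23")),
  Confirmed = c(1, 14, 9, 22),
  Deaths    = c(0, 0, 1, 0),
  Recovered = c(0, 0, 0, 2)
)

# Sum all three columns per date in a single call
daily <- aggregate(cbind(Confirmed, Deaths, Recovered) ~ Date,
                   data = toy, FUN = sum)
daily
```

The result has one row per date and one column per measure, so each plot can read its y variable from the same table.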
Drawing a map using Leaflet
First, we create a dataset with the total number of confirmed cases for each country, summed over all provinces and dates.
country_data=group_by(corona, Country) %>% summarize(total= sum(Confirmed))
country_data
## # A tibble: 31 x 2
## Country total
## <fct> <dbl>
## 1 Australia 43
## 2 Brazil 0
## 3 Cambodia 5
## 4 Canada 13
## 5 China 549
## 6 Finland 3
## 7 France 31
## 8 Germany 20
## 9 Hong Kong 68
## 10 India 2
## # ... with 21 more rows
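Summing Confirmed over all dates counts the same patients several times if the daily figures are running totals. The steadily rising daily sums suggest this, but it is an assumption here. Under that assumption, one alternative is to keep only the most recent date and sum over provinces. The sketch below uses base R; the toy data and its values are hypothetical.

```r
# Hypothetical toy data: two countries, two dates, running totals
toy <- data.frame(
  Country   = c("A", "A", "B", "A", "A", "B"),
  Province  = c("A1", "A2", "B1", "A1", "A2", "B1"),
  Date      = as.Date(c(rep("2020-01-30", 3), rep("2020-01-31", 3))),
  Confirmed = c(10, 5, 2, 12, 7, 3)
)

# Keep only the rows of the most recent date
latest <- toy[toy$Date == max(toy$Date), ]

# Sum over provinces to get one total per country
latest_total <- aggregate(Confirmed ~ Country, data = latest, FUN = sum)
latest_total
```

A country with no row on the most recent date would drop out of this result, so the approach only fits data where every country is reported on the last day.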
To draw a map with leaflet, we need the latitude and longitude of each country. The following dataset provides this information for the countries listed above.
lattitude_longitude=read.table("lattitude_longitude.txt",sep = "\t",header=T)
lattitude_longitude
## country latitude longitude name
## 1 AT 47.516231 14.550072 Austria
## 2 BR -14.235004 -51.925280 Brazil
## 3 CB 11.559720 104.917500 Cambodia
## 4 CA 56.130366 -106.346771 Canada
## 5 CN 35.861660 104.195397 China
## 6 FI 61.924110 25.748151 Finland
## 7 FR 46.227638 2.213749 France
## 8 DE 51.165691 10.451526 Germany
## 9 HK 22.396428 114.109497 Hong Kong
## 10 IN 20.593684 78.962880 India
## 11 IT 41.871940 12.567380 Italy
## 12 CI 7.539989 -5.547080 Ivory Coast
## 13 JP 36.204824 138.252924 Japan
## 14 MO 22.198745 113.543873 Macau
## 15 CN 35.861660 104.195397 Mainland China
## 16 MY 4.210484 101.975766 Malaysia
## 17 MX 23.634501 -102.552784 Mexico
## 18 NP 28.394857 84.124008 Nepal
## 19 PH 12.879721 121.774017 Philippines
## 20 RU 61.524010 105.318756 Russia
## 21 SG 1.352083 103.819836 Singapore
## 22 KR 35.907757 127.766922 South Korea
## 23 ES 40.463667 -3.749220 Spain
## 24 LK 7.873054 80.771797 Sri Lanka
## 25 SE 60.128161 18.643501 Sweden
## 26 TW 23.697810 120.960515 Taiwan
## 27 TH 15.870032 100.992541 Thailand
## 28 TH 51.477780 -0.001390 United Kingdom
## 29 AE 23.424076 53.847818 United Arab Emirates
## 30 US 37.090240 -95.712891 United States
## 31 VN 14.058324 108.277199 Vietnam
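Then, combine the two datasets. data.frame() places them side by side and pairs the rows by position, not by country name, so both tables must list the countries in the same order.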
leaflet_data=data.frame(country_data,lattitude_longitude)
leaflet_data
## Country total country latitude longitude
## 1 Australia 43 AT 47.516231 14.550072
## 2 Brazil 0 BR -14.235004 -51.925280
## 3 Cambodia 5 CB 11.559720 104.917500
## 4 Canada 13 CA 56.130366 -106.346771
## 5 China 549 CN 35.861660 104.195397
## 6 Finland 3 FI 61.924110 25.748151
## 7 France 31 FR 46.227638 2.213749
## 8 Germany 20 DE 51.165691 10.451526
## 9 Hong Kong 68 HK 22.396428 114.109497
## 10 India 2 IN 20.593684 78.962880
## 11 Italy 4 IT 41.871940 12.567380
## 12 Ivory Coast 0 CI 7.539989 -5.547080
## 13 Japan 61 JP 36.204824 138.252924
## 14 Macau 46 MO 22.198745 113.543873
## 15 Mainland China 45207 CN 35.861660 104.195397
## 16 Malaysia 41 MY 4.210484 101.975766
## 17 Mexico 0 MX 23.634501 -102.552784
## 18 Nepal 7 NP 28.394857 84.124008
## 19 Philippines 2 PH 12.879721 121.774017
## 20 Russia 2 RU 61.524010 105.318756
## 21 Singapore 59 SG 1.352083 103.819836
## 22 South Korea 39 KR 35.907757 127.766922
## 23 Spain 1 ES 40.463667 -3.749220
## 24 Sri Lanka 5 LK 7.873054 80.771797
## 25 Sweden 1 SE 60.128161 18.643501
## 26 Taiwan 51 TW 23.697810 120.960515
## 27 Thailand 94 TH 15.870032 100.992541
## 28 UK 2 TH 51.477780 -0.001390
## 29 United Arab Emirates 12 AE 23.424076 53.847818
## 30 US 39 US 37.090240 -95.712891
## 31 Vietnam 18 VN 14.058324 108.277199
## name
## 1 Austria
## 2 Brazil
## 3 Cambodia
## 4 Canada
## 5 China
## 6 Finland
## 7 France
## 8 Germany
## 9 Hong Kong
## 10 India
## 11 Italy
## 12 Ivory Coast
## 13 Japan
## 14 Macau
## 15 Mainland China
## 16 Malaysia
## 17 Mexico
## 18 Nepal
## 19 Philippines
## 20 Russia
## 21 Singapore
## 22 South Korea
## 23 Spain
## 24 Sri Lanka
## 25 Sweden
## 26 Taiwan
## 27 Thailand
## 28 United Kingdom
## 29 United Arab Emirates
## 30 United States
## 31 Vietnam
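data.frame() pairs rows by position, so a mismatch between the two tables goes unnoticed: in the output above, row 1 combines the Australia count with the coordinates of Austria (AT). A minimal sketch of a name-based join that exposes such rows is given below. It uses base R's merge() on a three-row excerpt of the values shown above; the object names counts, coords, joined and unmatched are illustrative.

```r
# Three-row excerpts of country_data and the coordinate table
counts <- data.frame(
  Country = c("Australia", "Brazil", "UK"),
  total   = c(43, 0, 2)
)
coords <- data.frame(
  name      = c("Austria", "Brazil", "United Kingdom"),
  latitude  = c(47.516231, -14.235004, 51.477780),
  longitude = c(14.550072, -51.925280, -0.001390)
)

# Left join on the country name; unmatched countries get NA coordinates
joined <- merge(counts, coords, by.x = "Country", by.y = "name", all.x = TRUE)

# Countries that still need a matching row in the coordinate table
unmatched <- as.character(joined$Country[is.na(joined$latitude)])
unmatched
```

Here Australia and UK come back unmatched, which points to the rows of the coordinate table whose names need to be corrected or harmonized before mapping.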
library(leaflet)
# Center points for the map
center_lon <- median(leaflet_data$longitude, na.rm = TRUE)
center_lat <- median(leaflet_data$latitude, na.rm = TRUE)
leaflet(leaflet_data) %>%
  addProviderTiles("Esri") %>%
  # Circle weight grows with the square root of the case total
  addCircles(lng = ~longitude, lat = ~latitude, weight = ~sqrt(total),
             popup = ~name, color = "red") %>%
  setView(lng = center_lon, lat = center_lat, zoom = 2)
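The circle weight is set to the square root of the total, not to the total itself. A short base R sketch of what this does to the range of values is given below, using four totals copied from the leaflet_data output above; the variable name is illustrative.

```r
# Totals copied from the leaflet_data output
total <- c("Mainland China" = 45207, "China" = 549, "Thailand" = 94, "Spain" = 1)

# Square-root scaling, as used for the circle weight
round(sqrt(total), 1)

# Ratio between the largest and smallest value, before and after scaling
max(total) / min(total)
max(sqrt(total)) / min(sqrt(total))
```

The square root compresses the range considerably, so countries with few cases stay visible next to the largest circle.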
You can find the code and the dataset on my GitHub page. (https://github.com/ozancanozdemir)