My first Tidy Tuesday
I have seen some cool graphs on Twitter created for Tidy Tuesday. I wanted to join in on the fun, so I downloaded the data from week 3 and started playing. The data are from Our World in Data, and I downloaded the data file from GitHub.
mortality <- readxl::read_excel(here::here("global_mortality.xlsx"))
glimpse(mortality)
## Observations: 6,156
## Variables: 35
## $ country <chr> "Afghanistan", "Afg...
## $ country_code <chr> "AFG", "AFG", "AFG"...
## $ year <dbl> 1990, 1991, 1992, 1...
## $ `Cardiovascular diseases (%)` <dbl> 17.61040, 17.80181,...
## $ `Cancers (%)` <dbl> 4.025975, 4.054145,...
## $ `Respiratory diseases (%)` <dbl> 2.106626, 2.134176,...
## $ `Diabetes (%)` <dbl> 3.832555, 3.822228,...
## $ `Dementia (%)` <dbl> 0.5314287, 0.532497...
## $ `Lower respiratory infections (%)` <dbl> 10.886362, 10.35696...
## $ `Neonatal deaths (%)` <dbl> 9.184653, 8.938897,...
## $ `Diarrheal diseases (%)` <dbl> 2.497141, 2.572228,...
## $ `Road accidents (%)` <dbl> 3.715944, 3.729142,...
## $ `Liver disease (%)` <dbl> 0.8369093, 0.845515...
## $ `Tuberculosis (%)` <dbl> 5.877075, 5.891704,...
## $ `Kidney disease (%)` <dbl> 1.680611, 1.671115,...
## $ `Digestive diseases (%)` <dbl> 1.058771, 1.049322,...
## $ `HIV/AIDS (%)` <dbl> 0.01301948, 0.01451...
## $ `Suicide (%)` <dbl> 0.4366105, 0.442280...
## $ `Malaria (%)` <dbl> 0.4488863, 0.455019...
## $ `Homicide (%)` <dbl> 1.287020, 1.290991,...
## $ `Nutritional deficiencies (%)` <dbl> 0.3505045, 0.343212...
## $ `Meningitis (%)` <dbl> 3.037603, 2.903202,...
## $ `Protein-energy malnutrition (%)` <dbl> 0.3297599, 0.322171...
## $ `Drowning (%)` <dbl> 0.9838624, 0.954586...
## $ `Maternal deaths (%)` <dbl> 1.769213, 1.749264,...
## $ `Parkinson disease (%)` <dbl> 0.02515859, 0.02545...
## $ `Alcohol disorders (%)` <dbl> 0.02899828, 0.02917...
## $ `Intestinal infectious diseases (%)` <dbl> 0.1833303, 0.178107...
## $ `Drug disorders (%)` <dbl> 0.04120540, 0.04203...
## $ `Hepatitis (%)` <dbl> 0.1387378, 0.135008...
## $ `Fire (%)` <dbl> 0.1741567, 0.170671...
## $ `Heat-related (hot and cold exposure) (%)` <dbl> 0.1378229, 0.134826...
## $ `Natural disasters (%)` <dbl> 0.00000000, 0.79760...
## $ `Conflict (%)` <dbl> 0.932, 2.044, 2.408...
## $ `Terrorism (%)` <dbl> 0.007, 0.040, 0.027...
The variable names had some special characters so I started by doing some name tidying.
names(mortality) <- mortality %>%
  names() %>%
  to_snake_case() %>%
  str_remove(pattern = "\\(%\\)") %>%
  str_replace_all(pattern = "-", "_") %>%
  str_replace_all(pattern = "/", "_") %>%
  str_replace_all(pattern = "\\(", "_") %>%
  str_remove(pattern = "\\)")
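To make the cleaning steps easier to follow, here is a minimal sketch on three column names copied from the `glimpse()` output. It uses only stringr and a slightly different set of patterns than the pipeline in this post, so treat it as an illustration rather than a drop-in replacement; `raw_names` and `clean_names` are names made up for the example.

```r
library(stringr)

# Three column names copied from the glimpse() output
raw_names <- c(
  "Cardiovascular diseases (%)",
  "HIV/AIDS (%)",
  "Heat-related (hot and cold exposure) (%)"
)

# 1. Drop the trailing " (%)"
clean_names <- str_remove(raw_names, " \\(%\\)$")
# 2. Turn every run of non-alphanumeric characters into a single underscore
clean_names <- str_replace_all(clean_names, "[^A-Za-z0-9]+", "_")
# 3. Drop the trailing underscore left by a closing parenthesis, then lower-case
clean_names <- str_to_lower(str_remove(clean_names, "_$"))

clean_names
# "cardiovascular_diseases" "hiv_aids" "heat_related_hot_and_cold_exposure"
```

Collapsing whole runs of punctuation in one step avoids having to list each special character separately.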
Next it was time to reshape the data.
world <- mortality %>%
  filter(country == "World") %>%
  select(-country_code) %>%
  gather(key = disease, value = "percent", -(country:year))
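`gather()` still works, but since tidyr 1.0.0 the package documentation recommends `pivot_longer()` for this kind of wide-to-long reshape. A minimal sketch on a made-up two-row table (the numbers in `toy` are invented, not taken from the mortality data):

```r
library(tidyr)

# Invented wide data: one column per cause of death
toy <- data.frame(
  country  = c("World", "World"),
  year     = c(1990, 1991),
  cancers  = c(12.1, 12.3),
  diabetes = c(1.5, 1.6),
  stringsAsFactors = FALSE
)

# Every column except country and year becomes a (disease, percent) pair
long <- pivot_longer(
  toy,
  cols      = -(country:year),
  names_to  = "disease",
  values_to = "percent"
)

long  # 4 rows: one per year and cause
```

One visible difference from `gather()` is row order: `pivot_longer()` keeps the rows of each original observation together, whereas `gather()` stacks one column after another.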
There is a lot of data, so I decided to plot a time series for the five causes that accounted for the largest share of deaths worldwide in 2016.
top5_world <- world %>%
  filter(year == 2016) %>%
  arrange(desc(percent)) %>%
  top_n(5)
## Selecting by percent
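In dplyr 1.0.0 and later, `top_n()` is superseded by `slice_max()`, which takes the ranking column explicitly and returns the rows already sorted in descending order. A small sketch on invented values (`toy` and its cause names are made up for the example):

```r
library(dplyr)

# Invented shares of deaths for six hypothetical causes
toy <- data.frame(
  disease = c("cause_a", "cause_b", "cause_c", "cause_d", "cause_e", "cause_f"),
  percent = c(5, 30, 12, 1, 18, 7),
  stringsAsFactors = FALSE
)

# Keep the three rows with the largest percent, largest first
top3 <- slice_max(toy, percent, n = 3)

top3$disease
# "cause_b" "cause_e" "cause_c"
```

Naming the column also avoids the "Selecting by percent" message, which appears because `top_n()` falls back to the last column when no ranking variable is given.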
I have been trying to get a different project to work in tweenr for a while, so I thought this was a good time to try my luck on a different data set. This data set is very similar to the gapminder data that is used in a tweenr tutorial, so I relied heavily on that code.
mortality_edit <- world %>%
  filter(disease %in% top5_world$disease) %>%
  select(-country) %>%
  rename(x=year,y=percent,time=year) %>%
  mutate(ease = "linear")
mortality_tween <- tween_elements(mortality_edit, time = "time", group = "disease",
                                  ease = "ease", nframes = 150) %>%
  mutate(year = round(time), disease = .group) %>%
  left_join(world, by = c("disease", "year")) %>%
  mutate(disease = Hmisc::capitalize(str_replace(disease, pattern = "_", " ")))
## Warning: Column `disease` joining factor and character vector, coercing
## into character vector
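To make the idea behind a linear ease concrete: tweening fills in intermediate frames between the observed yearly values. The sketch below does that with base R's `approx()` on invented numbers. It does not use tweenr and is not how `tween_elements()` is implemented; it only illustrates linear interpolation between keyframes.

```r
# Invented keyframes: one observed value per year
years   <- c(1990, 1991, 1992)
percent <- c(10, 12, 11)

# Ask for 5 equally spaced points between the first and last year;
# values between the observed years are interpolated linearly
frames <- approx(x = years, y = percent, n = 5)

frames$x
# 1990.0 1990.5 1991.0 1991.5 1992.0
frames$y
# 10.0 11.0 12.0 11.5 11.0
```

With 150 frames instead of 5, the same idea produces the small steps that make a line appear to grow smoothly.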
It worked! Now I just need to apply my new skills to my old project.
p2 <- ggplot(mortality_tween,
             aes(x = year, y = percent, group = disease, frame = .frame, cumulative = TRUE)) +
  geom_line(aes(color = disease), size = 1) +
  theme_minimal() +
  ggtitle("Top 5 causes of death in the world") +
  xlab("Year") +
  scale_colour_discrete(name = "Disease")
gganimate(p2, title_frame = FALSE, interval = 0.05, filename = "world_mortality_2016.gif")
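The `gganimate()` function and the `frame` and `cumulative` aesthetics used here belong to the pre-1.0 versions of gganimate. From version 1.0.0 onward the package uses `transition_*()` layers together with `animate()` and `anim_save()`. A rough sketch of a similar plot under the newer API, on invented data (`toy` and its values are made up, and the rendering calls are commented out because they need a renderer such as gifski installed):

```r
library(ggplot2)
library(gganimate)

# Invented yearly shares for two causes of death
toy <- data.frame(
  year    = rep(1990:1993, times = 2),
  disease = rep(c("Cancers", "Diabetes"), each = 4),
  percent = c(12.0, 12.4, 12.9, 13.1, 1.5, 1.6, 1.8, 2.0)
)

anim <- ggplot(toy, aes(x = year, y = percent, group = disease, colour = disease)) +
  geom_line() +
  theme_minimal() +
  labs(title = "Top causes of death in the world", x = "Year", colour = "Disease") +
  transition_reveal(year)  # draw each line progressively along the year axis

# Rendering and saving (interval = 0.05 in the old API corresponds to fps = 20):
# rendered <- animate(anim, nframes = 150, fps = 20)
# anim_save("world_mortality_2016.gif", animation = rendered)
```

With `transition_reveal()` the interpolation between years is handled by gganimate itself, so a separate tweening step should not be needed for this kind of plot.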
Packages I used
library(tidyverse)
library(snakecase)
library(tweenr)
library(gganimate)
library(readxl)