Consider this IMDB data set on 5,000 movies, available in the environment as movies.
This is what glimpse(movies) outputs.
## Observations: 3,258
## Variables: 29
## $ color <chr> "Color", "Color", "Color", "Color", ...
## $ director_name <chr> "James Cameron", "Gore Verbinski", "...
## $ num_critic_for_reviews <int> 723, 302, 813, 462, 392, 324, 635, 6...
## $ duration <int> 178, 169, 164, 132, 156, 100, 141, 1...
## $ director_facebook_likes <int> 0, 563, 22000, 475, 0, 15, 0, 0, 0, ...
## $ actor_3_facebook_likes <int> 855, 1000, 23000, 530, 4000, 284, 19...
## $ actor_2_name <chr> "Joel David Moore", "Orlando Bloom",...
## $ actor_1_facebook_likes <int> 1000, 40000, 27000, 640, 24000, 799,...
## $ gross <int> 760505847, 309404152, 448130642, 730...
## $ genres <chr> "Action|Adventure|Fantasy|Sci-Fi", "...
## $ actor_1_name <chr> "CCH Pounder", "Johnny Depp", "Tom H...
## $ movie_title <chr> "Avatar ", "Pirates of the Caribbean...
## $ num_voted_users <int> 886204, 471220, 1144337, 212204, 383...
## $ cast_total_facebook_likes <int> 4834, 48350, 106759, 1873, 46055, 20...
## $ actor_3_name <chr> "Wes Studi", "Jack Davenport", "Jose...
## $ facenumber_in_poster <int> 0, 0, 0, 1, 0, 1, 4, 0, 0, 2, 1, 0, ...
## $ plot_keywords <chr> "avatar|future|marine|native|paraple...
## $ movie_imdb_link <chr> "http://www.imdb.com/title/tt0499549...
## $ num_user_for_reviews <int> 3054, 1238, 2701, 738, 1902, 387, 11...
## $ language <chr> "English", "English", "English", "En...
## $ country <chr> "USA", "USA", "USA", "USA", "USA", "...
## $ content_rating <chr> "PG-13", "PG-13", "PG-13", "PG-13", ...
## $ budget <int> 237000000, 300000000, 250000000, 263...
## $ title_year <int> 2009, 2007, 2012, 2012, 2007, 2010, ...
## $ actor_2_facebook_likes <int> 936, 5000, 23000, 632, 11000, 553, 2...
## $ imdb_score <dbl> 7.9, 7.1, 8.5, 6.6, 6.2, 7.8, 7.5, 6...
## $ aspect_ratio <dbl> 1.78, 2.35, 2.35, 2.35, 2.35, 1.85, ...
## $ movie_facebook_likes <int> 33000, 0, 164000, 24000, 0, 29000, 1...
## $ genre1 <chr> "Action", "Action", "Action", "Actio...
Let’s start easy with a simple scatter plot comparing box office gross to the budget.
ggplot(______) + geom_________(aes(x=____,y=_____))
ggplot(movies) + geom_point(aes(x=gross,y=budget))
Let’s change the color of the circles to blue.
ggplot(______) + geom_________(aes(x=_____,y=_____), ________)
ggplot(movies) + geom_point(aes(x=gross,y=budget), color="blue")
Map color to a variable this time: color the points by content_rating.
ggplot(______) + geom_________(aes(x=_____, y=______, _______=_____))
ggplot(movies) + geom_point(aes(x=gross,y=budget, color=content_rating))
Did you notice where color sits this time? It is now inside aes(), before the second-to-last parenthesis, because it maps to a variable instead of setting one fixed value.
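To make the difference between setting and mapping a color concrete, here is a minimal sketch. The data frame toy is a hypothetical three-row stand-in for movies (its values are copied from the first rows of the glimpse() output), so the block runs without the real data set.

```r
library(ggplot2)

# Hypothetical stand-in for movies: first three rows of the glimpse() output.
toy <- data.frame(
  gross = c(760505847, 309404152, 448130642),
  budget = c(237000000, 300000000, 250000000),
  content_rating = c("PG-13", "PG-13", "PG-13")
)

# Outside aes(): one fixed value applied to every point, no legend.
p_set <- ggplot(toy) +
  geom_point(aes(x = gross, y = budget), color = "blue")

# Inside aes(): color is read from a column, and ggplot2 adds a legend.
p_map <- ggplot(toy) +
  geom_point(aes(x = gross, y = budget, color = content_rating))
```

Printing p_set and p_map side by side shows the same points, once in a single fixed color and once colored by rating with a legend.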
Make a bar chart that counts the number of titles per year (title_year).
ggplot(______,
       aes(x=_________)) +
  geom__________()
ggplot(movies,
       aes(x=title_year)) +
  geom_bar()
Now group each year's count by content_rating to create a stacked bar chart.
ggplot(______,
       aes(x=_________,_________)) +
  geom__________()
ggplot(movies,
       aes(x=title_year, fill=content_rating)) +
  geom_bar()
Hint: You may want to use the fill argument in aes().
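It can help to see the numbers behind the stacked bars. The sketch below tallies the same year and rating pairs with dplyr's count() and compares them with what geom_bar() computes. The data frame toy and its ratings are made up for illustration; they are not rows from movies.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data: six made-up movies, not taken from the real data set.
toy <- data.frame(
  title_year = c(2007, 2007, 2009, 2012, 2012, 2012),
  content_rating = c("PG-13", "R", "PG-13", "PG-13", "PG-13", "R")
)

# One row per year/rating pair; n is the height of that bar segment.
counts <- toy %>% count(title_year, content_rating)

# geom_bar() does the same tally itself (its default stat counts rows).
p <- ggplot(toy, aes(x = title_year, fill = content_rating)) +
  geom_bar()

# layer_data() exposes the values ggplot2 computed for the bars.
computed <- layer_data(p)
```

The n column in counts and the count column in computed hold the same tallies, which is why no y aesthetic is needed for this chart.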
Great, now split up the bars so they sit next to each other instead of being stacked.
We’ll also focus on movies released after 2001 (title_year is the variable).
movies %>%
  filter(___________) %>%
  ggplot(aes(x=_________,fill=________)) +
  geom__________(________________)
movies %>%
  filter(title_year>2001) %>%
  ggplot(aes(x=title_year,fill=content_rating)) +
  geom_bar(position="dodge")
Hint: You may want to use the position argument in the geom_bar() function.
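One optional variation, not required by the exercise: with position="dodge", a year that contains only one rating gets a single bar as wide as a whole group. Passing position_dodge(preserve = "single") instead keeps every bar the same width. The sketch uses a made-up data frame toy in place of movies.

```r
library(ggplot2)

# Hypothetical data: 2009 has only one rating, so its bar would
# otherwise be drawn at full group width.
toy <- data.frame(
  title_year = c(2007, 2007, 2009, 2012, 2012, 2012),
  content_rating = c("PG-13", "R", "PG-13", "PG-13", "PG-13", "R")
)

# Same dodged chart, but each bar keeps the width of a single bar.
p <- ggplot(toy, aes(x = title_year, fill = content_rating)) +
  geom_bar(position = position_dodge(preserve = "single"))
```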
Alright, let’s make a percent stacked chart this time:
movies %>%
  filter(___________) %>%
  ggplot(aes(x=_________,fill=_________)) +
  geom__________(position=________)
movies %>%
  filter(title_year>2001) %>%
  ggplot(aes(x=title_year,fill=content_rating)) +
  geom_bar(position="fill")
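With position="fill", each year's bar is rescaled so its segments add up to 1. A rough way to check what the chart shows is to compute those shares by hand. The data frame toy below is hypothetical (made-up years and ratings), and share is a column name chosen for this sketch.

```r
library(dplyr)

# Hypothetical data; the 1999 row is there to show the filter dropping it.
toy <- data.frame(
  title_year = c(1999, 2007, 2007, 2009, 2012, 2012, 2012),
  content_rating = c("R", "PG-13", "R", "PG-13", "PG-13", "PG-13", "R")
)

shares <- toy %>%
  filter(title_year > 2001) %>%          # same filter as the exercise
  count(title_year, content_rating) %>%  # segment heights before rescaling
  group_by(title_year) %>%
  mutate(share = n / sum(n)) %>%         # fraction of that year's movies
  ungroup()
```

Within each year the share values sum to 1, which matches the full-height bars in the percent stacked chart.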
Let’s look at box office results for all the movies that James Cameron has directed (the variable is director_name).
movies %>%
  filter(__________) %>%
  ggplot(aes(x=_________,y=___________)) +
  geom_bar(___________)
movies %>%
  filter(director_name=="James Cameron") %>%
  ggplot(aes(x=movie_title,y=gross)) +
  geom_bar(stat="identity")
Hint: You may want to pass the argument stat= to the geom_bar() function. What do you fill with stat? You’ll need to check your notes.
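As a side note, ggplot2 also provides geom_col(), which draws bars from a y value already in the data and so behaves like geom_bar(stat="identity"). The sketch below compares the two on a hypothetical data frame toy; only the Avatar gross comes from the glimpse() output, the other titles and values are invented.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data: only the Avatar gross is from the glimpse() output.
toy <- data.frame(
  director_name = c("James Cameron", "James Cameron", "Gore Verbinski"),
  movie_title = c("Avatar", "Hypothetical Film", "Another Film"),
  gross = c(760505847, 100000000, 309404152)
)

cameron <- toy %>% filter(director_name == "James Cameron")

# stat = "identity" uses gross as the bar height instead of counting rows.
p_bar <- ggplot(cameron, aes(x = movie_title, y = gross)) +
  geom_bar(stat = "identity")

# geom_col() is the shorthand for the same thing.
p_col <- ggplot(cameron, aes(x = movie_title, y = gross)) +
  geom_col()
```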
Transpose that chart so that the movies are on the y axis instead of the x axis, without swapping the x and y mappings in aes() from the code above.
movies %>%
  filter(__________) %>%
  ggplot(aes(x=movie_title,y=gross)) +
  geom_bar(___________) +
  __________()
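One function that fits this pattern is coord_flip(), which swaps the axes at drawing time while leaving aes() untouched. This is a sketch of one possible answer, not necessarily the intended one, and it runs on a hypothetical data frame toy (only the Avatar gross comes from the glimpse() output).

```r
library(dplyr)
library(ggplot2)

# Hypothetical data: only the Avatar gross is from the glimpse() output.
toy <- data.frame(
  director_name = c("James Cameron", "James Cameron", "Gore Verbinski"),
  movie_title = c("Avatar", "Hypothetical Film", "Another Film"),
  gross = c(760505847, 100000000, 309404152)
)

# aes() still has movie_title on x; coord_flip() draws it on the y axis.
p <- toy %>%
  filter(director_name == "James Cameron") %>%
  ggplot(aes(x = movie_title, y = gross)) +
  geom_bar(stat = "identity") +
  coord_flip()
```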