1  Exercises (Chapter 1)

  1. How many rows are in penguins? How many columns?

    # Your R code here

    Your answer here.
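
    Hint: a minimal sketch of how the dimensions of a data frame can be checked. It uses the built-in mtcars data frame rather than penguins; that choice is an assumption made purely for illustration, and the same functions apply to any data frame.

```r
# mtcars ships with base R, so this sketch needs no extra packages.
nrow(mtcars)  # number of rows
ncol(mtcars)  # number of columns
dim(mtcars)   # both at once, as c(rows, columns)
```

    Swapping mtcars for penguins should give the numbers this exercise asks for.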

  2. What does the bill_depth_mm variable in the penguins data frame describe? Read the help for ?penguins to find out.


    Type your answer here.

  3. Make a scatterplot of bill_depth_mm vs. bill_length_mm. That is, make a scatterplot with bill_depth_mm on the y-axis and bill_length_mm on the x-axis. Describe the relationship between these two variables.

    # Your R code here

    Your answer here.

  4. What happens if you make a scatterplot of species vs. bill_depth_mm? What might be a better choice of geom?

    # Your R code here

    Your answer here.

    # Your R code here
  5. Why does the following give an error and how would you fix it?

    ggplot(data = penguins) +
      geom_point()

    Your answer here.

    # Correct code here
  6. What does the na.rm argument do in geom_point()? What is the default value of the argument? Create a scatterplot where you successfully use this argument set to TRUE.


    Your answer here.

    # Your R code here
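
    Hint: a minimal sketch of where na.rm goes in a call, not a solution. The toy data frame below is hypothetical and exists only so that there is a missing value to plot; what each setting does is described in ?geom_point.

```r
library(ggplot2)

# Hypothetical toy data with one missing y value (not part of the exercise).
toy <- data.frame(x = c(1, 2, 3), y = c(2, NA, 5))

# na.rm left at its default value.
p_default <- ggplot(toy, aes(x = x, y = y)) +
  geom_point()

# na.rm set explicitly inside the geom call.
p_na_rm <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(na.rm = TRUE)
```

    Printing p_default and then p_na_rm and comparing the console output is one way to see the difference.
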
  7. Add the following caption to the plot you made in the previous exercise: “Data come from the palmerpenguins package.” Hint: Take a look at the documentation for labs().

    # Your R code here
  8. Recreate the following visualization. What aesthetic should bill_depth_mm be mapped to? And should it be mapped at the global level or at the geom level?

    [Figure: A scatterplot of body mass vs. flipper length of penguins, colored by bill depth. A smooth curve of the relationship between body mass and flipper length is overlaid. The relationship is positive, fairly linear, and moderately strong.]

    # Your R code here

    Your answer here.

  9. Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions.

    ggplot(
      data = penguins,
      mapping = aes(x = flipper_length_mm, y = body_mass_g, color = island)
    ) +
      geom_point() +
      geom_smooth(se = FALSE)

    Your answer here.

    # Your R code here
  10. Will these two graphs look different? Why/why not?

    ggplot(
      data = penguins,
      mapping = aes(x = flipper_length_mm, y = body_mass_g)
    ) +
      geom_point() +
      geom_smooth()

    ggplot() +
      geom_point(
        data = penguins,
        mapping = aes(x = flipper_length_mm, y = body_mass_g)
      ) +
      geom_smooth(
        data = penguins,
        mapping = aes(x = flipper_length_mm, y = body_mass_g)
      )

    Your answer here.

    # Your R code here
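
    Hint: a minimal sketch of the difference between mapping an aesthetic globally in ggplot() and locally in a single geom. It uses the mpg data frame and the drv variable instead of penguins; both are assumptions chosen for illustration.

```r
library(ggplot2)

# Global mapping: color is set in ggplot(), so every layer inherits it.
p_global <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(se = FALSE)

# Local mapping: color is set in geom_point() only,
# so geom_smooth() does not receive it.
p_local <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = drv)) +
  geom_smooth(se = FALSE)
```

    Printing p_global and p_local and counting the smooth curves in each shows which layers received the color mapping.
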
  11. Make a bar plot of species of penguins, where you assign species to the y aesthetic. How is this plot different?


    Your answer here.

    # Your R code here
  12. How are the following two plots different? Which aesthetic, color or fill, is more useful for changing the color of bars?

    ggplot(penguins, aes(x = species)) +
      geom_bar(color = "red")

    ggplot(penguins, aes(x = species)) +
      geom_bar(fill = "red")

    # Side-by-side comparison. Combining plots with `+` assumes the
    # patchwork package is installed.
    library(patchwork)
    p1 <- ggplot(penguins, aes(x = species)) +
      geom_bar(color = "red")
    p2 <- ggplot(penguins, aes(x = species)) +
      geom_bar(fill = "red")
    p1 + p2

    Your answer here.

  13. What does the bins argument in geom_histogram() do?


    Your answer here.
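
    Hint: a minimal sketch of how bins is passed to geom_histogram(), not a full answer. It uses the built-in faithful data frame, which is an assumption made for illustration; the two bin counts are arbitrary.

```r
library(ggplot2)

# faithful is a built-in data frame; eruptions is the eruption time in minutes.
# bins is set inside the geom call. Try a few values and compare the plots.
p_few <- ggplot(faithful, aes(x = eruptions)) +
  geom_histogram(bins = 5)

p_many <- ggplot(faithful, aes(x = eruptions)) +
  geom_histogram(bins = 50)
```

    Comparing p_few with p_many side by side should make the role of the argument clear.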

  14. Make a histogram of the carat variable in the diamonds dataset that is available when you load the tidyverse package. Experiment with different binwidths. What binwidth reveals the most interesting patterns?

    # Your R code here

    Your answer here.

  15. The mpg data frame that is bundled with the ggplot2 package contains 234 observations collected by the US Environmental Protection Agency on 38 car models. Which variables in mpg are categorical? Which variables are numerical? (Hint: Type ?mpg to read the documentation for the dataset.) How can you see this information when you run mpg?

    # Your R code here

    Your answer here.

  16. Make a scatterplot of hwy vs. displ using the mpg data frame. Next, map a third, numerical variable to color, then size, then both color and size, then shape. How do these aesthetics behave differently for categorical vs. numerical variables?

    # Your R code here

    Your answer here.

  17. In the scatterplot of hwy vs. displ, what happens if you map a third variable to linewidth?

    # Your R code here

    Your answer here.

  18. What happens if you map the same variable to multiple aesthetics?

    # Your R code here

    Your answer here.

  19. Make a scatterplot of bill_depth_mm vs. bill_length_mm and color the points by species. What does adding coloring by species reveal about the relationship between these two variables? What about faceting by species?

    # Your R code here

    Your answer here.

  20. Why does the following yield two separate legends? How would you fix it to combine the two legends?

    ggplot(
      data = penguins,
      mapping = aes(
        x = bill_length_mm, y = bill_depth_mm,
        color = species, shape = species
      )
    ) +
      geom_point() +
      labs(color = "Species")

    # Your R fix here

    Your answer here.

  21. Create the two following stacked bar plots. Which question can you answer with the first one? Which question can you answer with the second one?

    ggplot(penguins, aes(x = island, fill = species)) +
      geom_bar(position = "fill")

    ggplot(penguins, aes(x = species, fill = island)) +
      geom_bar(position = "fill")

    # Stacked comparison. Combining plots with `/` assumes the
    # patchwork package is installed.
    library(patchwork)
    p1 <- ggplot(penguins, aes(x = island, fill = species)) +
      geom_bar(position = "fill")
    p2 <- ggplot(penguins, aes(x = species, fill = island)) +
      geom_bar(position = "fill")
    p1 / p2

    Your answer here.

  22. Run the following lines of code. Which of the two plots is saved as mpg-plot.png? Why?

    ggplot(mpg, aes(x = class)) +
      geom_bar()
    ggplot(mpg, aes(x = cty, y = hwy)) +
      geom_point()
    ggsave("mpg-plot.png")
    # Your R code here

    Your answer here.

  23. What do you need to change in the code above to save the plot as a PDF instead of a PNG? How could you find out what types of image files would work in ggsave()?


    Your answer here.
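
    Hint: a minimal sketch of saving a plot with an explicit plot argument and a file extension other than .png. The temporary file path and the width and height values are assumptions chosen so the sketch does not write into the working directory; ?ggsave lists the supported devices.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point()

# tempfile() keeps the output out of the working directory.
out_file <- tempfile(fileext = ".pdf")

# Unless `device` is given, ggsave() chooses the graphics device
# from the file extension.
ggsave(out_file, plot = p, width = 6, height = 4)

file.exists(out_file)
```

    Passing plot = p explicitly avoids relying on whichever plot was displayed last.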