5  Exercises (Chapter 5)

Tables
table1
# A tibble: 6 × 4
  country      year  cases population
  <chr>       <dbl>  <dbl>      <dbl>
1 Afghanistan  1999    745   19987071
2 Afghanistan  2000   2666   20595360
3 Brazil       1999  37737  172006362
4 Brazil       2000  80488  174504898
5 China        1999 212258 1272915272
6 China        2000 213766 1280428583
table2
# A tibble: 12 × 4
   country      year type            count
   <chr>       <dbl> <chr>           <dbl>
 1 Afghanistan  1999 cases             745
 2 Afghanistan  1999 population   19987071
 3 Afghanistan  2000 cases            2666
 4 Afghanistan  2000 population   20595360
 5 Brazil       1999 cases           37737
 6 Brazil       1999 population  172006362
 7 Brazil       2000 cases           80488
 8 Brazil       2000 population  174504898
 9 China        1999 cases          212258
10 China        1999 population 1272915272
11 China        2000 cases          213766
12 China        2000 population 1280428583
table3
# A tibble: 6 × 3
  country      year rate             
  <chr>       <dbl> <chr>            
1 Afghanistan  1999 745/19987071     
2 Afghanistan  2000 2666/20595360    
3 Brazil       1999 37737/172006362  
4 Brazil       2000 80488/174504898  
5 China        1999 212258/1272915272
6 China        2000 213766/1280428583
  1. For each of the sample tables, describe what each observation and each column represents.

    Answer

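    In table1, each observation (row) is one country in one year. The columns are country, year, cases (the number of TB cases), and population.

    In table2, each row holds a single variable for one country in one year, so every country-year observation is spread across two rows. The type column says whether the row records cases or population, and count holds the corresponding value.

    In table3, each observation is again one country in one year. The rate column is a character column that stores cases and population together as a string of the form cases/population.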

  2. Sketch out the process you’d use to calculate the rate for table2 and table3. You will need to perform four operations:

    1. Extract the number of TB cases per country per year.
    2. Extract the matching population per country per year.
    3. Divide cases by population, and multiply by 10000.
    4. Store back in the appropriate place.

    You haven’t yet learned all the functions you’d need to actually perform these operations, but you should still be able to think through the transformations you’d need.

    Answer
    table2 |>
      pivot_wider(names_from = type,
                  values_from = count) |>
      mutate(rate = cases / population * 10000)
    # A tibble: 6 × 5
      country      year  cases population  rate
      <chr>       <dbl>  <dbl>      <dbl> <dbl>
    1 Afghanistan  1999    745   19987071 0.373
    2 Afghanistan  2000   2666   20595360 1.29 
    3 Brazil       1999  37737  172006362 2.19 
    4 Brazil       2000  80488  174504898 4.61 
    5 China        1999 212258 1272915272 1.67 
    6 China        2000 213766 1280428583 1.67 
    # The same calculation for table3:
    table3 |>
      separate_wider_delim(
        cols = rate,
        delim = "/",
        names = c("cases", "population")
      ) |>
      mutate(
        cases = as.numeric(cases),
        population = as.numeric(population),
        rate = cases / population * 10000
      )
    # A tibble: 6 × 5
      country      year  cases population  rate
      <chr>       <dbl>  <dbl>      <dbl> <dbl>
    1 Afghanistan  1999    745   19987071 0.373
    2 Afghanistan  2000   2666   20595360 1.29 
    3 Brazil       1999  37737  172006362 2.19 
    4 Brazil       2000  80488  174504898 4.61 
    5 China        1999 212258 1272915272 1.67 
    6 China        2000 213766 1280428583 1.67 

    For table2, reshape the data so that cases and population each get their own column, then divide cases by population and multiply by 10000 to get the rate. One possible approach is shown above.

    For table3, split the rate column into separate cases and population columns, convert both from character to numeric, and then compute the rate in the same way. One possible approach is shown above.
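
    The four operations listed in the question can also be followed literally for table2, without pivoting. The sketch below is one possible way to do that, assuming dplyr and tidyr are installed; the object names (t2_cases, t2_population, t2_rate, table2_with_rate) are made up for illustration, and storing the rate as extra rows with type "rate" is just one reading of "the appropriate place".

```r
library(dplyr)
library(tidyr)  # provides the table1 and table2 datasets

# Step 1: extract the number of TB cases per country per year.
t2_cases <- table2 |>
  filter(type == "cases") |>
  select(country, year, cases = count)

# Step 2: extract the matching population per country per year.
t2_population <- table2 |>
  filter(type == "population") |>
  select(country, year, population = count)

# Step 3: match the two on country and year, then divide cases by
# population and multiply by 10000.
t2_rate <- t2_cases |>
  inner_join(t2_population, by = c("country", "year")) |>
  mutate(rate = cases / population * 10000)

# Step 4: store the result back in table2's long layout as new rows
# (assumption: a new type value, "rate", is an acceptable place for it).
table2_with_rate <- t2_rate |>
  transmute(country, year, type = "rate", count = rate) |>
  bind_rows(table2) |>
  arrange(country, year, type)
```

    Compared with the pivot_wider() approach, this takes more code and needs an explicit join, which illustrates why the long layout of table2 is awkward to compute with.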