7  Exercises (Chapter 7)

  1. What function would you use to read a file where fields were separated with “|”?

    Answer

    read_delim(), with the argument delim = "|".
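
    As a minimal sketch, assuming readr 2.0 or later (where I() marks a string as literal data), pipe-delimited input can be read with read_delim(). The inline data is made up for illustration.

    ```r
    library(readr)

    # Hypothetical inline data with "|" as the field separator
    piped <- read_delim(I("x|y\n1|a\n2|b"), delim = "|", show_col_types = FALSE)
    piped
    ```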

  2. Apart from file, skip, and comment, what other arguments do read_csv() and read_tsv() have in common?

    Answer

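    read_csv() and read_tsv() take the same arguments and differ only in the delimiter they assume. Besides file, skip, and comment, the shared arguments include col_names, col_types, locale, na, quote, trim_ws, n_max, and guess_max.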
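
    One way to compare the arguments of read_csv() and read_tsv(), sketched here on the assumption that readr is installed, is to look at the formal argument names of the two functions. The exact list depends on the installed readr version.

    ```r
    library(readr)

    # Argument names of each function
    csv_args <- names(formals(read_csv))
    tsv_args <- names(formals(read_tsv))

    # Shared arguments, excluding the three named in the question
    common <- setdiff(intersect(csv_args, tsv_args), c("file", "skip", "comment"))
    common

    # Arguments that appear in only one of the two functions
    setdiff(union(csv_args, tsv_args), intersect(csv_args, tsv_args))
    ```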

  3. What are the most important arguments to read_fwf()?

    Answer

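    The most important arguments are file and col_positions. col_positions describes where each field starts and ends, and is usually built with a helper such as fwf_widths() or fwf_positions().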
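
    A minimal sketch of read_fwf() with explicit column widths, assuming readr 2.0 or later. The inline data, the widths, and the column names name and age are hypothetical and chosen only for illustration.

    ```r
    library(readr)

    # Hypothetical fixed-width data: name in characters 1-5, age in characters 6-7
    fwf_text <- I("John 23\nJane 31\n")

    people <- read_fwf(
      fwf_text,
      col_positions = fwf_widths(c(5, 2), col_names = c("name", "age")),
      show_col_types = FALSE
    )
    people
    ```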

  4. Sometimes strings in a CSV file contain commas. To prevent them from causing problems, they need to be surrounded by a quoting character, like " or '. By default, read_csv() assumes that the quoting character will be ". To read the following text into a data frame, what argument to read_csv() do you need to specify?

    "x,y\n1,'a,b'"
    Answer

    We need to specify the quote argument.

    read_csv("x,y\n1,'a,b'", quote = "'")
    Rows: 1 Columns: 2
    ── Column specification ────────────────────────────────────────────────────────
    Delimiter: ","
    chr (1): y
    dbl (1): x
    
    ℹ Use `spec()` to retrieve the full column specification for this data.
    ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
    # A tibble: 1 × 2
          x y    
      <dbl> <chr>
    1     1 a,b  
  5. Identify what is wrong with each of the following inline CSV files. What happens when you run the code?

    read_csv("a,b\n1,2,3\n4,5,6")
    read_csv("a,b,c\n1,2\n1,2,3,4")
    read_csv("a,b\n\"1")
    read_csv("a,b\n1,2\na,b")
    read_csv("a;b\n1;3")
    Answer
    read_csv("a,b\n1,2,3\n4,5,6")
    Warning: One or more parsing issues, call `problems()` on your data frame for details,
    e.g.:
      dat <- vroom(...)
      problems(dat)
    Rows: 2 Columns: 2
    ── Column specification ────────────────────────────────────────────────────────
    Delimiter: ","
    dbl (1): a
    num (1): b
    
    ℹ Use `spec()` to retrieve the full column specification for this data.
    ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
    # A tibble: 2 × 2
          a     b
      <dbl> <dbl>
    1     1    23
    2     4    56

    The header defines only two columns, but each row has three values, so the extra value is merged into the last column (2 and 3 become 23; 5 and 6 become 56).
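
    The warning in the output points to problems(). As a sketch, assuming readr 2.0 or later, the parsing issues for this input can be inspected as follows; the object name too_many is arbitrary.

    ```r
    library(readr)

    # Same input as above; the warning is suppressed so problems() can be called directly
    too_many <- suppressWarnings(
      read_csv(I("a,b\n1,2,3\n4,5,6"), show_col_types = FALSE)
    )

    # One row per parsing issue, with its location, what was expected, and what was found
    problems(too_many)
    ```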

    Answer
    read_csv("a,b,c\n1,2\n1,2,3,4")
    Warning: One or more parsing issues, call `problems()` on your data frame for details,
    e.g.:
      dat <- vroom(...)
      problems(dat)
    Rows: 2 Columns: 3
    ── Column specification ────────────────────────────────────────────────────────
    Delimiter: ","
    dbl (2): a, b
    num (1): c
    
    ℹ Use `spec()` to retrieve the full column specification for this data.
    ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
    # A tibble: 2 × 3
          a     b     c
      <dbl> <dbl> <dbl>
    1     1     2    NA
    2     1     2    34

    There are only three column headers. The first row is missing a value in the last column, so it gets an NA there. The second row has four values, so the last two get merged.

    Answer
    read_csv("a,b\n\"1")
    Rows: 0 Columns: 2
    ── Column specification ────────────────────────────────────────────────────────
    Delimiter: ","
    chr (2): a, b
    
    ℹ Use `spec()` to retrieve the full column specification for this data.
    ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
    # A tibble: 0 × 2
    # ℹ 2 variables: a <chr>, b <chr>

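    The data row opens a quote that is never closed and supplies only one of the two expected values. Here, no rows are read in.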

    Answer
    read_csv("a,b\n1,2\na,b")
    Rows: 2 Columns: 2
    ── Column specification ────────────────────────────────────────────────────────
    Delimiter: ","
    chr (2): a, b
    
    ℹ Use `spec()` to retrieve the full column specification for this data.
    ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
    # A tibble: 2 × 2
      a     b    
      <chr> <chr>
    1 1     2    
    2 a     b    

    Each column has a numerical and a character value, so the column type is coerced to character.

    Answer
    read_csv("a;b\n1;3")
    Rows: 1 Columns: 1
    ── Column specification ────────────────────────────────────────────────────────
    Delimiter: ","
    chr (1): a;b
    
    ℹ Use `spec()` to retrieve the full column specification for this data.
    ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
    # A tibble: 1 × 1
      `a;b`
      <chr>
    1 1;3  

    The delimiter is ; but it’s not specified, therefore this is read in as a single-column data frame with a single observation.
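
    A sketch of two ways to read this input with the intended delimiter, assuming readr 2.0 or later: read_delim() with delim = ";", or read_csv2(), which uses ; as the field separator and , as the decimal mark.

    ```r
    library(readr)

    # Name the delimiter explicitly
    semi <- read_delim(I("a;b\n1;3"), delim = ";", show_col_types = FALSE)
    semi

    # read_csv2() assumes ";" as the delimiter and "," as the decimal mark
    semi2 <- read_csv2(I("a;b\n1;3"), show_col_types = FALSE)
    semi2
    ```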

  6. Practice referring to non-syntactic names in the following data frame by:

    set.seed(321)
    annoying <- tibble(
      `1` = 1:10,
      `2` = `1` * 2 + rnorm(length(`1`))
    )
    1. Extracting the variable called 1.

      Answer
      annoying |> 
        select(`1`)
      # A tibble: 10 × 1
           `1`
         <int>
       1     1
       2     2
       3     3
       4     4
       5     5
       6     6
       7     7
       8     8
       9     9
      10    10
      # or
      annoying$`1`
       [1]  1  2  3  4  5  6  7  8  9 10
    2. Plotting a scatterplot of 1 vs. 2.

      Answer
      annoying |> 
        ggplot(aes(x = `2`, y = `1`)) + 
          geom_point()

    3. Creating a new column called 3, which is 2 divided by 1.

      Answer
      annoying |> 
        mutate(`3` = `2`/`1`)
      # A tibble: 10 × 3
           `1`   `2`   `3`
         <int> <dbl> <dbl>
       1     1  3.70  3.70
       2     2  3.29  1.64
       3     3  5.72  1.91
       4     4  7.88  1.97
       5     5  9.88  1.98
       6     6 12.3   2.04
       7     7 14.7   2.10
       8     8 16.2   2.03
       9     9 18.3   2.04
      10    10 19.4   1.94
    4. Renaming the columns to one, two, and three.

      Answer
      annoying |> 
        mutate(`3` = `2`/`1`) |> 
        rename(
          "one" = `1`,
          "two" = `2`,
          "three" = `3`
        )
      # A tibble: 10 × 3
           one   two three
         <int> <dbl> <dbl>
       1     1  3.70  3.70
       2     2  3.29  1.64
       3     3  5.72  1.91
       4     4  7.88  1.97
       5     5  9.88  1.98
       6     6 12.3   2.04
       7     7 14.7   2.10
       8     8 16.2   2.03
       9     9 18.3   2.04
      10    10 19.4   1.94
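
    As a further sketch for exercise 6, a non-syntactic column can also be extracted as a vector with [[ ]] and a string, or with pull() and backticks. This assumes the tibble and dplyr packages and R 4.1 or later for the native pipe; the by_* object names are arbitrary.

    ```r
    library(tibble)
    library(dplyr)

    set.seed(321)
    annoying <- tibble(
      `1` = 1:10,
      `2` = `1` * 2 + rnorm(length(`1`))
    )

    # Three equivalent ways to extract the column named 1 as a vector
    by_dollar  <- annoying$`1`
    by_bracket <- annoying[["1"]]
    by_pull    <- annoying |> pull(`1`)
    ```

    Note that the backticks matter in pull(): without them, pull(1) would select by position rather than by name.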