7 Exercises (Chapter 7)

What function would you use to read a file where fields were separated with “|”?

Answer

Use read_delim() with the argument delim = "|".
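A minimal sketch of reading pipe-delimited text with read_delim(). The sample data and the names pipe_text and pipe_df are made up for illustration; they are not part of the exercise.

```r
library(readr)

# Inline, pipe-delimited sample data (hypothetical, for illustration only).
pipe_text <- "name|score\nana|1\nben|2"

# I() marks the string as literal data rather than a file path.
pipe_df <- read_delim(I(pipe_text), delim = "|", show_col_types = FALSE)
pipe_df
```

With a real file, the first argument would be the file path instead of I(pipe_text).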
Apart from file, skip, and comment, what other arguments do read_csv() and read_tsv() have in common?

Answer

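read_csv() and read_tsv() take the same arguments; the two functions differ only in the delimiter they assume. Besides file, skip, and comment, commonly used shared arguments include col_names, col_types, locale, na, quote, trim_ws, n_max, and guess_max.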
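One way to check the shared arguments is to compare the formal argument names of the two functions. This is only a sketch; the exact list depends on the installed readr version, and the variable names are illustrative.

```r
library(readr)

# Argument names of each function, taken from their formal definitions.
csv_args <- names(formals(read_csv))
tsv_args <- names(formals(read_tsv))

# Arguments present in both functions.
shared_args <- intersect(csv_args, tsv_args)
shared_args

# Arguments that appear in only one of the two
# (likely empty in recent readr versions, but this is version dependent).
setdiff(union(csv_args, tsv_args), shared_args)
```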
What are the most important arguments to read_fwf()?

Answer

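The most important argument is col_positions, which tells read_fwf() where each column starts and ends; it is usually built with fwf_widths(), fwf_positions(), or fwf_empty(). The file argument is also required.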
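A small sketch of col_positions in use, assuming a made-up fixed-width layout of a 2-character code followed by a 3-digit number. The data and the column names code and num are hypothetical.

```r
library(readr)

# Two fixed-width records: a 2-character code followed by a 3-digit number.
fwf_text <- "AB123\nCD456\n"

# fwf_widths() describes each column by its width and (optionally) its name.
fwf_df <- read_fwf(
  I(fwf_text),
  col_positions = fwf_widths(c(2, 3), col_names = c("code", "num")),
  show_col_types = FALSE
)
fwf_df
```

fwf_positions() would describe the same layout by start and end positions instead of widths.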

Sometimes strings in a CSV file contain commas. To prevent them from causing problems, they need to be surrounded by a quoting character, like " or '. By default, read_csv() assumes that the quoting character will be ". To read the following text into a data frame, what argument to read_csv() do you need to specify?

"x,y\n1,'a,b'"

Answer

We need to specify the quote argument.

read_csv("x,y\n1,'a,b'", quote = "\'")

Rows: 1 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): y
dbl (1): x

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# A tibble: 1 × 2
      x y    
  <dbl> <chr>
1     1 a,b
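As a side note, the backslash in "\'" is optional inside a double-quoted R string, so the argument can also be written as quote = "'". The sketch below checks that both spellings are the same string and rereads the exercise text; quoted_df is an illustrative name.

```r
library(readr)

# "\'" and "'" are the same one-character string in R.
identical("\'", "'")

# Reading the exercise text with the unescaped form of the quote argument.
quoted_df <- read_csv(I("x,y\n1,'a,b'"), quote = "'", show_col_types = FALSE)
quoted_df$y
```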

Identify what is wrong with each of the following inline CSV files. What happens when you run the code?

read_csv("a,b\n1,2,3\n4,5,6")
read_csv("a,b,c\n1,2\n1,2,3,4")
read_csv("a,b\n\"1")
read_csv("a,b\n1,2\na,b")
read_csv("a;b\n1;3")

Answer

read_csv("a,b\n1,2,3\n4,5,6")

Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)

Rows: 2 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (1): a
num (1): b

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# A tibble: 2 × 2
      a     b
  <dbl> <dbl>
1     1    23
2     4    56

There are only two column headers but three values in each row, so the last two values in each row are merged into column b (2 and 3 become 23; 5 and 6 become 56).
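The warning suggests calling problems() on the result. A minimal sketch of doing that for the same inline file follows; bad_df and issues are illustrative names, and suppressWarnings() is used here only to keep the output short.

```r
library(readr)

# Same inline file as above; suppressWarnings() hides the parsing warning
# so the details can be inspected explicitly instead.
bad_df <- suppressWarnings(
  read_csv(I("a,b\n1,2,3\n4,5,6"), show_col_types = FALSE)
)

# One row per parsing issue, showing what was expected and what was found.
issues <- problems(bad_df)
issues
```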

Answer

read_csv("a,b,c\n1,2\n1,2,3,4")

Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)

Rows: 2 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): a, b
num (1): c

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# A tibble: 2 × 3
      a     b     c
  <dbl> <dbl> <dbl>
1     1     2    NA
2     1     2    34

There are only three column headers. The first row is missing a value in the last column, so it gets an NA there; the second row has four values, so the last two get merged.

Answer

read_csv("a,b\n\"1")

Rows: 0 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): a, b

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# A tibble: 0 × 2
# ℹ 2 variables: a <chr>, b <chr>

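The opening quote before 1 is never closed, and no rows are read in; the result is an empty tibble with two character columns.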

Answer

read_csv("a,b\n1,2\na,b")

Rows: 2 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): a, b

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# A tibble: 2 × 2
  a     b    
  <chr> <chr>
1 1     2    
2 a     b

Each column contains both a numeric and a character value, so both columns are read as character.
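If the columns were meant to be numeric, one option is to declare the types up front so the stray text is flagged rather than silently changing the column type. This is a sketch of that alternative, not part of the original answer; typed_df is an illustrative name.

```r
library(readr)

# Declare both columns as double ("d") instead of letting readr guess.
typed_df <- suppressWarnings(
  read_csv(I("a,b\n1,2\na,b"), col_types = "dd")
)
typed_df

# The non-numeric entries in the second row become NA and are recorded
# as parsing problems.
problems(typed_df)
```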

Answer

read_csv("a;b\n1;3")

Rows: 1 Columns: 1
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): a;b

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# A tibble: 1 × 1
  `a;b`
  <chr>
1 1;3

The delimiter is ; but it’s not specified, therefore this is read in as a single-column data frame with a single observation.
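A minimal sketch of one fix, assuming the intent was two columns: pass the delimiter explicitly to read_delim(). The name semi_df is illustrative.

```r
library(readr)

# Tell readr that fields are separated by semicolons.
semi_df <- read_delim(I("a;b\n1;3"), delim = ";", show_col_types = FALSE)
semi_df
```

read_csv2() also assumes ; as the delimiter, but it additionally treats , as the decimal mark, so read_delim() is the more neutral choice here.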

Practice referring to non-syntactic names in the following data frame by:

set.seed(321)
annoying <- tibble(
  `1` = 1:10,
  `2` = `1` * 2 + rnorm(length(`1`))
)

Extracting the variable called 1.

Answer

annoying |> 
  select(`1`)

# A tibble: 10 × 1
     `1`
   <int>
 1     1
 2     2
 3     3
 4     4
 5     5
 6     6
 7     7
 8     8
 9     9
10    10

# or
annoying$`1`

 [1]  1  2  3  4  5  6  7  8  9 10
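Two further ways to get the column as a vector, shown as a sketch. The data frame is recreated so the block runs on its own.

```r
library(tibble)
library(dplyr)

# Recreate the data frame from the exercise.
set.seed(321)
annoying <- tibble(
  `1` = 1:10,
  `2` = `1` * 2 + rnorm(length(`1`))
)

# Extract by name as a string; no backticks are needed inside [[ ]].
annoying[["1"]]

# pull() returns the column as a vector; backticks make `1` a column name.
annoying |> pull(`1`)
```

Without the backticks, pull(1) would select the first column by position, which happens to be the same column here.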

Plotting a scatterplot of 1 vs. 2.

Answer

annoying |> 
  ggplot(aes(x = `2`, y = `1`)) + 
    geom_point()

Creating a new column called 3, which is 2 divided by 1.

Answer

annoying |> 
  mutate(`3` = `2`/`1`)

# A tibble: 10 × 3
     `1`   `2`   `3`
   <int> <dbl> <dbl>
 1     1  3.70  3.70
 2     2  3.29  1.64
 3     3  5.72  1.91
 4     4  7.88  1.97
 5     5  9.88  1.98
 6     6 12.3   2.04
 7     7 14.7   2.10
 8     8 16.2   2.03
 9     9 18.3   2.04
10    10 19.4   1.94

Renaming the columns to one, two, and three.

Answer

annoying |> 
  mutate(`3` = `2` / `1`) |> 
  rename(
    "one" = `1`,
    "two" = `2`,
    "three" = `3`
  )

# A tibble: 10 × 3
     one   two three
   <int> <dbl> <dbl>
 1     1  3.70  3.70
 2     2  3.29  1.64
 3     3  5.72  1.91
 4     4  7.88  1.97
 5     5  9.88  1.98
 6     6 12.3   2.04
 7     7 14.7   2.10
 8     8 16.2   2.03
 9     9 18.3   2.04
10    10 19.4   1.94