1 Directions

Reproduce this document. The data frame WATER is from the PASWR2 package written by Arnholt (2016). The example is from Ugarte, Militino, and Arnholt (2015).

1.1 Example Problem 9.12

A bottled water company acquires its water from two independent sources, x and y. The company suspects that the sodium content in the water from source x is less than the sodium content in the water from source y. An independent agency measures the sodium content in 20 samples from source x and 10 samples from source y and stores the values in the data frame WATER. Is there statistical evidence to suggest the average sodium content in the water from source x is less than the average sodium content in the water from source y? The sodium measurements are in mg/L. Use a significance level of \(\alpha = 0.05\) to test the appropriate hypotheses.

1.2 Partial Solution

To solve this problem, start by verifying the reasonableness of the normality assumption. The side-by-side boxplots and normal quantile-quantile plots depicted in Figures 1.1 and 1.2, respectively, suggest it is reasonable to assume the sodium values for both sources follow normal distributions; however, the boxplots make it clear that the variances are very different.

library(PASWR2)
ggplot(data = WATER, mapping = aes(x = source, y = sodium)) + 
  geom_boxplot() + 
  theme_bw()

Figure 1.1: Side-by-side boxplots of the sodium content for source x and y

ggplot(data = WATER, mapping = aes(sample = sodium, color = source)) +
  stat_qq() + 
  theme_bw()

Figure 1.2: Normal quantile-quantile plots of the sodium content for source x and source y

Step 1: Hypotheses — Because the question is whether the mean sodium content from source x is less than the mean sodium content from source y, use a lower one-sided alternative hypothesis as shown in Equation (1.1).

\[\begin{align} H_0&: \mu_x - \mu_y = 0 \nonumber \\ H_A&: \mu_x - \mu_y < 0. \tag{1.1} \end{align}\]
library(dplyr)
NDF <- WATER %>%
  group_by(source) %>%
  summarize(MEAN = mean(sodium), SD = sd(sodium), n = n())
knitr::kable(NDF, caption = "Summary statistics for the `WATER` data frame")
Table 1.1: Summary statistics for the WATER data frame

source   MEAN   SD          n
x        76.4   11.080566   20
y        81.2    2.299758   10
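
As a rough numerical check on the unequal-variance observation drawn from the boxplots, the standard deviations printed in Table 1.1 can be converted into a ratio of sample variances. This is only a descriptive sketch, not a formal test, and the names `s_x`, `s_y`, and `var_ratio` are illustrative rather than part of the original solution.

```r
# Sample standard deviations as printed in Table 1.1
s_x <- 11.080566  # source x, n = 20
s_y <- 2.299758   # source y, n = 10

# Ratio of the sample variances (larger over smaller)
var_ratio <- s_x^2 / s_y^2
var_ratio
```

The ratio is above 20, which is in line with choosing a procedure that does not pool the two variances.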

Step 2: Test Statistic — The test statistic chosen is \(\bar{X} - \bar{Y}\) because \(E\left[\bar{X} - \bar{Y} \right] = \mu_x - \mu_y\). The value of this test statistic is \(76.4 - 81.2 = -4.8\). The standardized test statistic under the assumption that \(H_0\) is true, together with its approximate distribution, is given in Equation (1.2), where \(\delta_0 = 0\) is the difference in means specified by \(H_0\).

\[\begin{equation} \frac{\left[(\bar{X} - \bar{Y}) - \delta_0 \right]}{\sqrt{\left(\frac{S_x^2}{n_x} + \frac{S_y^2}{n_y}\right)}} \overset{\bullet}{\sim} t_{\nu} \tag{1.2} \end{equation}\]
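
Equation (1.2) does not spell out the degrees of freedom \(\nu\). Assuming the usual Welch-Satterthwaite approximation, which is what `t.test()` applies when `var.equal = FALSE` (its default), \(\nu\) is estimated from the sample variances and sample sizes as

\[\begin{equation} \nu = \frac{\left(\frac{S_x^2}{n_x} + \frac{S_y^2}{n_y}\right)^2}{\frac{\left(S_x^2/n_x\right)^2}{n_x - 1} + \frac{\left(S_y^2/n_y\right)^2}{n_y - 1}} \end{equation}\]

Because \(\nu\) is generally not an integer, the reported degrees of freedom in a Welch test are usually fractional.
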
t.test(sodium ~ source, data = WATER, alternative = "less")

    Welch Two Sample t-test

data:  sodium by source
t = -1.8589, df = 22.069, p-value = 0.03822
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
       -Inf -0.3665724
sample estimates:
mean in group x mean in group y 
           76.4            81.2 
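
The `t.test()` output above can be cross-checked by hand from the summary statistics in Table 1.1. The following is a minimal sketch, assuming the Welch-Satterthwaite approximation for the degrees of freedom; the variable names are illustrative, and the inputs are the rounded values printed in the table, so the results agree with the output only up to rounding.

```r
# Summary statistics copied from Table 1.1 (rounded as printed there)
xbar <- 76.4; s_x <- 11.080566; n_x <- 20   # source x
ybar <- 81.2; s_y <- 2.299758;  n_y <- 10   # source y
delta0 <- 0                                 # difference in means under H_0

# Estimated variance of each sample mean
v_x <- s_x^2 / n_x
v_y <- s_y^2 / n_y

# Standardized test statistic from Equation (1.2)
se    <- sqrt(v_x + v_y)
t_obs <- ((xbar - ybar) - delta0) / se

# Welch-Satterthwaite degrees of freedom (assumed form of nu)
nu <- (v_x + v_y)^2 / (v_x^2 / (n_x - 1) + v_y^2 / (n_y - 1))

# Lower-tail p-value and upper bound of the one-sided 95% interval
p_value <- pt(t_obs, df = nu)
upper   <- (xbar - ybar) + qt(0.95, df = nu) * se

round(c(t = t_obs, df = nu, p.value = p_value, upper = upper), 4)
```

Working from summary statistics keeps the sketch independent of the PASWR2 package; with the raw data available, `t.test()` remains the more direct route.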

References

Arnholt, Alan T. 2016. PASWR2: Probability and Statistics with R, Second Edition. https://CRAN.R-project.org/package=PASWR2.

Ugarte, Maria Dolores, Ana F. Militino, and Alan T. Arnholt. 2015. Probability and Statistics with R, Second Edition. 2nd ed. Boca Raton: Chapman & Hall/CRC.