Chapter 2 Graphs

Creating a frequency table and bar chart for PV (political viewpoint) with StatCrunch, using the data set CategoricalData.



Creating a contingency table and a side-by-side bar chart showing the conditional distribution of SEX for each category of political viewpoint.



A more helpful bar plot:



Segmented Bar Chart:



Note that the distributions of political view within SEX are different, suggesting that political view and SEX are not independent.
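For readers following along in R, the independence claim can be checked numerically. The sketch below is one way to do it, not the chapter's StatCrunch workflow: it rebuilds the contingency table from the counts reported in this chapter (the object names `cond_pv_given_sex`, `marginal_pv`, and `diff_from_marginal` are illustrative assumptions) and compares each conditional distribution of PV given SEX with the marginal distribution of PV. If the two variables were independent, every row of the conditional table would match the marginal distribution and the differences would be near zero.

```r
# Contingency table rebuilt from the counts reported in this chapter
# (matrix() fills column by column: Conservative, Liberal, Moderate).
T1 <- as.table(matrix(
  c(6, 21, 35, 50, 36, 44),
  nrow = 2,
  dimnames = list(SEX = c("Female", "Male"),
                  PV  = c("Conservative", "Liberal", "Moderate"))
))

# Conditional distribution of PV within each SEX (rows sum to 1)
cond_pv_given_sex <- prop.table(T1, 1)

# Marginal distribution of PV, ignoring SEX
marginal_pv <- colSums(T1) / sum(T1)

# Row-wise difference from the marginal distribution;
# values far from 0 suggest the variables are not independent.
diff_from_marginal <- sweep(cond_pv_given_sex, 2, marginal_pv)
round(diff_from_marginal, 3)
```

The largest gaps are in the Conservative column, where the Female proportion falls below the marginal proportion and the Male proportion rises above it.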

Simpson’s Paradox

Consider the StatCrunch data set DISCRIM_CLEAN, a slightly modified and cleaned version of the data used in the article Simpson’s Paradox: A Data Set and Discrimination Case Study Exercise (Taylor and Mickel 2014).

Compute the Department of Developmental Services (DDS) average expenditure:

  • by Gender
  • by Ethnicity


It appears as if the DDS is spending more money on White non-Hispanics than on Hispanics. But is this really the case?

Consider the average expenditure by age_group as well as the number of individuals in each age_group when accounting for Ethnicity.
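The mechanism behind the reversal can be seen in a small R sketch. The numbers below are entirely made up for illustration and are NOT the DISCRIM_CLEAN data; the data frame `toy`, its groups "A" and "B", and the column names are hypothetical. Group A sits mostly in the high-expenditure age group, so its overall average is higher even though group B has the higher average within every age group.

```r
# HYPOTHETICAL toy data (not the DDS data): 10 people per group.
toy <- data.frame(
  group = rep(c("A", "B"), each = 10),
  age_group = c(rep(c("young", "old"), times = c(2, 8)),   # A: mostly old
                rep(c("young", "old"), times = c(8, 2))),  # B: mostly young
  expenditure = c(rep(c(1000, 40000), times = c(2, 8)),
                  rep(c(1200, 41000), times = c(8, 2)))
)

# Average expenditure by group, ignoring age
overall <- tapply(toy$expenditure, toy$group, mean)

# Average expenditure by group within each age group
by_age <- tapply(toy$expenditure, list(toy$group, toy$age_group), mean)

# Number of individuals in each group/age_group cell
counts <- xtabs(~ group + age_group, data = toy)

overall
by_age
counts
```

Comparing `overall` with `by_age` shows the direction of the difference flipping once age group is taken into account, and `counts` shows why: the two groups are distributed very differently across the age groups.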





Using R

# Read in the data to CD
CD <- read.csv("../data/CategoricalData.csv")
head(CD)  # View first six rows
     SEX      PV
1 Female Liberal
2 Female Liberal
3 Female Liberal
4 Female Liberal
5 Female Liberal
6 Female Liberal
# Marginal distribution of SEX
xtabs(~SEX, data = CD)
SEX
Female   Male 
    77    115 
# Marginal distribution of PV
xtabs(~PV, data = CD)
PV
Conservative      Liberal     Moderate 
          27           85           80 
# Contingency table stored in T1
(T1 <- xtabs(~SEX + PV, data = CD))
        PV
SEX      Conservative Liberal Moderate
  Female            6      35       36
  Male             21      50       44
# Conditional on SEX
prop.table(T1, 1)
        PV
SEX      Conservative    Liberal   Moderate
  Female   0.07792208 0.45454545 0.46753247
  Male     0.18260870 0.43478261 0.38260870
# Conditional on PV
prop.table(T1, 2)
        PV
SEX      Conservative   Liberal  Moderate
  Female    0.2222222 0.4117647 0.4500000
  Male      0.7777778 0.5882353 0.5500000
# Total
prop.table(T1)
        PV
SEX      Conservative   Liberal  Moderate
  Female    0.0312500 0.1822917 0.1875000
  Male      0.1093750 0.2604167 0.2291667
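
The bar charts made with StatCrunch earlier in the chapter can be approximated with base R's barplot(). This is a minimal sketch rather than a reproduction of the StatCrunch output: it rebuilds T1 from the counts shown above so that it runs without the CSV file, and the object names and axis labels are illustrative assumptions.

```r
# T1 rebuilt from the contingency table printed above
T1 <- as.table(matrix(
  c(6, 21, 35, 50, 36, 44),
  nrow = 2,
  dimnames = list(SEX = c("Female", "Male"),
                  PV  = c("Conservative", "Liberal", "Moderate"))
))

# Side-by-side bars: conditional distribution of SEX for each PV category
cond_sex_given_pv <- prop.table(T1, 2)
barplot(cond_sex_given_pv, beside = TRUE, legend.text = TRUE,
        xlab = "Political viewpoint", ylab = "Proportion")

# Side-by-side bars: conditional distribution of PV within each SEX
cond_pv_given_sex <- prop.table(T1, 1)
barplot(t(cond_pv_given_sex), beside = TRUE, legend.text = TRUE,
        xlab = "SEX", ylab = "Proportion")

# Segmented (stacked) bars: one bar per SEX, segments are PV proportions;
# barplot() returns the bar midpoints, kept here in seg_mids
seg_mids <- barplot(t(cond_pv_given_sex), legend.text = TRUE,
                    xlab = "SEX", ylab = "Proportion")
```

Transposing with t() puts PV in the rows, so each bar (or cluster of bars) corresponds to one level of SEX.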

Same thing with a tidyverse approach
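
One possible tidyverse version is sketched below with dplyr. It is an assumption about how the same tables might be computed, not necessarily the code originally shown here. So that it runs without the CSV file, CD is rebuilt from the contingency-table counts above; the names `cell_counts`, `cond_on_sex`, `cond_on_pv`, and `total` are illustrative.

```r
library(dplyr)

# Rebuild CD from the cell counts shown above
# (assumption: stands in for read.csv("../data/CategoricalData.csv")).
cell_counts <- data.frame(
  SEX = rep(c("Female", "Male"), each = 3),
  PV  = rep(c("Conservative", "Liberal", "Moderate"), times = 2),
  n   = c(6, 35, 36, 21, 50, 44)
)
CD <- cell_counts[rep(seq_len(nrow(cell_counts)), cell_counts$n),
                  c("SEX", "PV")]
rownames(CD) <- NULL

# Marginal distributions
CD %>% count(SEX)
CD %>% count(PV)

# Conditional on SEX: proportions sum to 1 within each SEX
cond_on_sex <- CD %>%
  count(SEX, PV) %>%
  group_by(SEX) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Conditional on PV: proportions sum to 1 within each PV
cond_on_pv <- CD %>%
  count(SEX, PV) %>%
  group_by(PV) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Total: proportions sum to 1 over the whole table
total <- CD %>%
  count(SEX, PV) %>%
  mutate(prop = n / sum(n))

cond_on_sex
cond_on_pv
total
```

Unlike prop.table(), which returns a two-way table, this approach returns one row per SEX and PV combination, which is convenient for plotting with ggplot2.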


References

Taylor, Stanley A., and Amy E. Mickel. 2014. “Simpson’s Paradox: A Data Set and Discrimination Case Study Exercise.” Journal of Statistics Education 22 (1). https://doi.org/10.1080/10691898.2014.11889697.