Cross-Tabulations

R Language

Vision Zero

NYC DOT's Vision Zero data

monthly: Zip GZip
R script and R library


average casualties per month	before Vision Zero	after Vision Zero

less than 0.05	39.1 %	47.1 %
0.05 to 0.14	34.1 %	31.7 %
0.15 to 0.24	13.1 %	9.4 %
0.25 to 0.39	7.2 %	7.0 %
0.40 to 0.59	3.4 %	2.6 %
0.60 to 0.99	2.2 %	1.6 %
1.00 and over	0.9 %	0.6 %

In our discussion of charts and graphs, we observed that New York City's Vision Zero initiative has sharply reduced traffic injuries and fatalities. Between 2013 and 2017, the number of injuries decreased 20 percent and the number of fatalities decreased 35 percent.

Here, we explore the same question by comparing the distribution of casualties before and after the Vision Zero initiative began. And once again, I define "casualties" as the sum of "fatalities" and "injuries."

From the graph above (and the accompanying table) we see that the number of intersections with less than 0.05 average monthly casualties has increased. And we see that the number of intersections at each of the dangerous levels has fallen.

Put differently, the distribution of average monthly casualties has shifted towards zero.

The percentage of intersections with less than 0.05 average monthly casualties rose 8 percentage points. (And by definition, the more dangerous categories fell by a combined 8 percentage points).

To create such a comparison, we first use the tapply function to create a table of intersections with average monthly casualties before and after the Vision Zero initiative began:

## make casualties by intersection, before and after Vision Zero
cas_table <- tapply( X = nycdot$Casualties,
INDEX = nycdot[,c("NODEID", "VisionZero")], FUN = mean )

In that cas_table, each row represents one intersection. In one column of each row is the average monthly casualties before Vision Zero. And in the other column is the average monthly casualties after Vision Zero.

Next, we need a function that categorizes the average monthly casualties:

## function to cut casualty counts into buckets
makeCasualtyBuckets <- function(x) {
  ifelse( x < 0.05 , "a. less than 0.05" ,
    ifelse( x < 0.15 , "b. 0.05 to 0.14" ,
      ifelse( x < 0.25 , "c. 0.15 to 0.24" ,
        ifelse( x < 0.40 , "d. 0.25 to 0.39" ,
          ifelse( x < 0.60 , "e. 0.40 to 0.59" ,
            ifelse( x < 1.00 , "f. 0.60 to 0.99" ,
              "g. 1.00 and over"
              ))))))
}

With that function, we can then count intersections by category. If each category (of average monthly casualties) is represented both before and after Vision Zero began, we can simply cbind together two tables:

## quick and dirty cross-tabulations
cas_ctabs <- cbind(
table( makeCasualtyBuckets( cas_table[,"0"] )),
table( makeCasualtyBuckets( cas_table[,"1"] ))
)
colnames(cas_ctabs) <- c("before Vision Zero","after Vision Zero")

But it's helpful to have a more general crosstabs function that can handle the case where the count is zero for one or more categories.

## create cross tabs function
crosstabs <- function( tapp , FUN ) {

  ## fetch column names
  cnames <- colnames( tapp )

  ## run function on each column, store values in list
  cvals <- list()
  allvals <- NA
  for (cname in cnames) {
    cvals[[cname]] <- table(FUN(tapp[,cname]))
    allvals <- c( allvals , names( cvals[[cname]] ))
  }
  allvals <- sort(unique(allvals))

  ## create a table to store the cross tabulations
  ctabs <- matrix( data = 0 , nrow = length(allvals) , ncol = length(cnames) )
  rownames(ctabs) <- allvals
  colnames(ctabs) <- cnames

  ## populate the table
  for (cname in cnames) {
    lsnames <- names( cvals[[cname]] )
    ctabs[lsnames,cname] <- cvals[[cname]]
  }

  ## return the cross tabulations
  ctabs
}

Then, as before, we use the tapply function to create a table of intersections with average monthly casualties before and after the Vision Zero initiative began. But this time, we use our crosstabs function.

## make casualties by intersection, before and after Vision Zero
cas_table <- tapply( X = nycdot$Casualties,
INDEX = nycdot[,c("NODEID", "VisionZero")], FUN = mean )

## make crosstabs
cas_vzero <- crosstabs( tapp = cas_table , FUN = makeCasualtyBuckets )
print( cas_vzero )

Finally, we can compare the two distributions by plotting them side-by-side:

## compare the distributions
barplot( t(cas_vzero) , beside = TRUE , main = "NYC Casualty Distribution",
xlab = "average casualties per month" , ylab = "number of intersections",
legend = c("before Vision Zero", "after Vision Zero"), col = c("blue","orange"))

More detailed examples can be found in the R script and R library that I wrote for this analysis. And in our discussion of the Perl language, we will create the data structures that convert the original data into the dataset used for this analysis.

:: previous ::
regressions

table of contents