# Cross-Tabulations ### Vision Zero

 averagecasualtiesper month beforeVisionZero afterVisionZero less than 0.05 39.1 % 47.1 % 0.05 to 0.14 34.1 % 31.7 % 0.15 to 0.24 13.1 % 9.4 % 0.25 to 0.39 7.2 % 7.0 % 0.40 to 0.59 3.4 % 2.6 % 0.60 to 0.99 2.2 % 1.6 % 1.00 and over 0.9 % 0.6 %

In our discussion of charts and graphs, we observed that New York City's Vision Zero initiative has sharply reduced traffic injuries and fatalities. Between 2013 and 2017, the number of injuries decreased 20 percent and the number of fatalities decreased 35 percent.

Here, we explore the same question by comparing the distribution of casualties before and after the Vision Zero initiative began. And once again, I define "casualties" as the sum of "fatalities" and "injuries."

From the graph above (and the accompanying table) we see that the number of intersections with less than 0.05 average monthly casualties has increased. And we see that the number of intersections at each of the dangerous levels has fallen.

Put differently, the distribution of average monthly casualties has shifted towards zero.

The percentage of intersections with less than 0.05 average monthly casualties rose 8 percentage points. (And by definition, the more dangerous categories fell by a combined 8 percentage points).

To create such a comparison, we first use the tapply function to create a table of intersections with average monthly casualties before and after the Vision Zero initiative began:

## make casualties by intersection, before and after Vision Zero
cas_table <- tapply( X = nycdot\$Casualties,
INDEX = nycdot[,c("NODEID", "VisionZero")], FUN = mean )

In that cas_table, each row represents one intersection. In one column of each row is the average monthly casualties before Vision Zero. And in the other column is the average monthly casualties after Vision Zero.

Next, we need a function that categorizes the average monthly casualties:

## function to cut casualty counts into buckets
makeCasualtyBuckets <- function(x) {
ifelse( x < 0.05 , "a. less than 0.05" ,
ifelse( x < 0.15 , "b. 0.05 to 0.14" ,
ifelse( x < 0.25 , "c. 0.15 to 0.24" ,
ifelse( x < 0.40 , "d. 0.25 to 0.39" ,
ifelse( x < 0.60 , "e. 0.40 to 0.59" ,
ifelse( x < 1.00 , "f. 0.60 to 0.99" ,
"g. 1.00 and over"
))))))
}

With that function, we can then count intersections by category. If each category (of average monthly casualties) is represented both before and after Vision Zero began, we can simply cbind together two tables:

## quick and dirty cross-tabulations
cas_ctabs <- cbind(
table( makeCasualtyBuckets( cas_table[,"0"] )),
table( makeCasualtyBuckets( cas_table[,"1"] ))
)
colnames(cas_ctabs) <- c("before Vision Zero","after Vision Zero")

But it's helpful to have a more general crosstabs function that can handle the case where the count is zero for one or more categories.

## create cross tabs function
crosstabs <- function( tapp , FUN ) {

## fetch column names
cnames <- colnames( tapp )

## run function on each column, store values in list
cvals <- list()
allvals <- NA
for (cname in cnames) {
cvals[[cname]] <- table(FUN(tapp[,cname]))
allvals <- c( allvals , names( cvals[[cname]] ))
}
allvals <- sort(unique(allvals))

## create a table to store the cross tabulations
ctabs <- matrix( data = 0 , nrow = length(allvals) , ncol = length(cnames) )
rownames(ctabs) <- allvals
colnames(ctabs) <- cnames

## populate the table
for (cname in cnames) {
lsnames <- names( cvals[[cname]] )
ctabs[lsnames,cname] <- cvals[[cname]]
}

## return the cross tabulations
ctabs
}

Then, as before, we use the tapply function to create a table of intersections with average monthly casualties before and after the Vision Zero initiative began. But this time, we use our crosstabs function.

## make casualties by intersection, before and after Vision Zero
cas_table <- tapply( X = nycdot\$Casualties,
INDEX = nycdot[,c("NODEID", "VisionZero")], FUN = mean )

## make crosstabs
cas_vzero <- crosstabs( tapp = cas_table , FUN = makeCasualtyBuckets )
print( cas_vzero )

Finally, we can compare the two distributions by plotting them side-by-side:

## compare the distributions
barplot( t(cas_vzero) , beside = TRUE , main = "NYC Casualty Distribution",
xlab = "average casualties per month" , ylab = "number of intersections",
legend = c("before Vision Zero", "after Vision Zero"), col = c("blue","orange"))

More detailed examples can be found in the R script and R library that I wrote for this analysis. And in our discussion of the Perl language, we will create the data structures that convert the original data into the dataset used for this analysis.