Court told TUSD’s MAS data flawed

TUSD Board member Dr. Mark Stegeman wrote to federal Judge David Bury “to express concerns with two aspects of the draft unitary plan for the Tucson Unified School District” in its desegregation case.

Recently, plaintiffs, parents, and concerned citizens have raised concerns about issues in the proposed settlement agreement as news leaks out about upheavals in the team of consultants assembled by the court-appointed Special Master, Willis Hawley.

Two weeks ago, Latino activist Gabriela Saucedo Mercer offered the Court a letter signed by nearly two hundred people opposing the reinstitution of the District’s controversial Mexican American Studies classes. Last year, Saucedo Mercer introduced legislation in the Arizona Legislature prohibiting teachers from indoctrinating schoolchildren. Students in the District’s Mexican American Studies classes had complained about teachers imposing their radical views on them.

That legislation met strong opposition from teachers’ unions and from Bill Ayers’ organization, the American Educational Research Association. Saucedo Mercer’s legislation had tried to shift the focus of current law, which prohibits the resegregation of students and the teaching of racial resentment in the classroom, away from protecting only Latino students against indoctrination to include all students in grades K-12.

Stegeman, who holds a Ph.D. from MIT and is an economics professor at the University of Arizona, wrote a carefully crafted 10-page letter to the Court “as a private citizen and as one member of the TUSD Governing Board but not as a representative of the Board as a whole.”

Fellow TUSD Board member Michael Hicks endorsed Stegeman’s letter, saying that he was concerned that no federal court has ever before mandated courses to be taught to students in a desegregation case. Mr. Hicks asked, “If this happens, what courses will be mandated by federal courts next? Will local school boards matter anymore? Will state boards of education matter anymore?”

Stegeman began his letter by advising the Court that overall he thinks the “draft plan is much better, in many ways, than the post-unitary plan that the Board sent to the court in 2009.”

He wrote to the Court expressing concern “that the student assignment portion of the proposed plan reimposes racial preferences through a weighted lottery, something that TUSD has not seen in years. This adds a layer of complexity and possible perceptions of unequal treatment and, according to my understanding, may not be legally sustainable over the long term.”

He said that while “supporting the inclusion of more Mexican American history, literature, and perspective into the standard high school core sequences in social studies and English, I also continue to support the previous Board’s objection to the mandate for culturally relevant courses, what I shall call the ‘curriculum mandate.’”

Stegeman argued that the curriculum mandate should not be imposed by the court and said his “reasons can be summarized in seven points, which are elaborated below:

(1) The much-cited achievement evidence concerning TUSD’s former Mexican American Studies (MAS) program is weak.

(2) The mandate may not represent the best investment of desegregation funds, from the standpoint of raising student achievement.

(3) The mandate tends to promote rather than to reduce segregation.

(4) Problematic curriculum could reappear, despite the district’s stated intentions.

(5) Such a curriculum mandate is unusual in desegregation cases and driven by no previous action in this case.

(6) The mandate could force the district into conflict with a provision in state law.

(7) The proposed mandate is essentially an attempt to recruit the court into taking sides in a political dispute.

The achievement data are weak.

School districts make many decisions and investments that are not clearly justified by any academic study.

Indeed, it is the general nature of educational data and research that unambiguous recommendations based on sound statistical technique are hard to find. Academic studies are thus only one of many considerations when making educational policy.

The statistical analysis of achievement data has, however, become prominent in the public debate over MAS.

Many persons have asked why the Board suspended a program that produced such strong achievement results, but the statistical evidence that the MAS classes caused higher achievement is, in fact, weak. The data are suggestive (to use a word that economists like) but I have seen no analysis that provides serious support for the achievement claim.

A basic limitation is that the data set is relatively small. No other school district appears to have operated a similar curriculum program for any significant period of time, so the data come only from TUSD.

Within TUSD, the MAS program served at most a few hundred students a year, and persons studying the question have, as far as I know, considered only recent graduation cohorts and fewer than 1,600 MAS students altogether.

Two relatively serious attempts to study the data are a memo (March 11, 2011) from TUSD’s Accountability and Research Department and an unpublished paper by Cabrera et al. (June 20, 2012). Documents such as the Cambium audit add little to the debate, because that study’s authors make no claim to be statisticians and, as far as I know, did no independent statistical analysis.

The staff memo (which does not claim to offer a serious statistical study) considers two achievement measures:

(a) the AIMS passing rates of high school juniors who failed the AIMS test during their sophomore year, comparing students who took a MAS course in the junior year to juniors who did not;

(b) the graduation rates of high school seniors, comparing students who took a MAS course in the senior year to seniors who did not.

The memo reports that students taking a MAS course in the junior year passed the AIMS reading test at a rate 5% to 16% higher than other students, depending on the year. The writing test, but not the math test, shows similar differences. Also, seniors taking a MAS course during their senior year graduated at a rate 5% to 11% higher than seniors who did not.

The Cabrera et al. paper appears to report much stronger effects on AIMS results and graduation rates. (Their estimations also suggest that taking MAS courses makes students less likely to attend college but they argue that this result is not meaningful.)

It is possible that MAS courses causally increase either AIMS scores or graduation rates, but neither of these two documents demonstrates this. Here is a partial list of the issues.

Reasons that causal effects cannot be inferred from the 3/11/11 staff memo.

(A) No control for class size.

On average, MAS classes had lower enrollments than the standard core courses for which they substituted.

The higher achievement for MAS students could be entirely due to class size rather than to the curriculum.

For example, in 2008-2010, combining Rincon and Tucson High Schools, the average class size in junior level “traditional” history was 28.4 while the average class size in MAS history was 21.5. I do not know whether that gap (calculated by staff) is representative, but any credible analysis of the causal effects of MAS classes should account for class size.

(B) No control for variables that might raise student achievement and also be correlated with taking MAS courses.

For example, students who failed the 10th-grade AIMS test by a relatively small margin might be more likely to pass it later and also more likely to take MAS courses.

A second example: About half of MAS enrollments occurred at Tucson High School; therefore, if some other characteristic of that school improved graduation rates, then higher graduation rates could be correlated with MAS enrollment district-wide even if taking MAS courses had no causal effect.

Many similar examples can be constructed. The point is that inferring causality from MAS classes is unjustified when other plausible explanatory variables are excluded from the analysis.
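To see how such a confound could work in practice, here is a minimal simulation sketch, in Python, of the school-effect example above. All numbers, group sizes, and probabilities are hypothetical and invented for illustration; none are drawn from TUSD data.

```python
import random

random.seed(0)

# Hypothetical illustration only: two schools, one with a higher baseline
# graduation rate and also a higher share of MAS enrollment.  MAS itself
# has no causal effect on graduation in this simulation.
students = []
for _ in range(10_000):
    school_a = random.random() < 0.5            # half attend "School A"
    took_mas = random.random() < (0.5 if school_a else 0.1)
    grad_prob = 0.85 if school_a else 0.65      # school effect, not MAS effect
    graduated = random.random() < grad_prob
    students.append((took_mas, graduated))

def grad_rate(flag):
    group = [g for m, g in students if m == flag]
    return sum(group) / len(group)

print(f"graduation rate, MAS takers:     {grad_rate(True):.2f}")
print(f"graduation rate, non-MAS takers: {grad_rate(False):.2f}")
# MAS takers graduate at a visibly higher rate even though MAS has no causal
# effect here; the entire gap is driven by the school variable.
```

The point of the sketch is simply that a district-wide correlation between MAS enrollment and graduation can appear even when the classes themselves do nothing, exactly as the letter argues.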

(C) Highly motivated students may have self-selected into MAS courses.

Among the students who failed the 10th grade AIMS test, the ones who chose to enroll (or were recruited to enroll) in MAS courses may have had relatively high academic motivation. In the absence of any other proxy for this higher motivation, the MAS variable would pick it up and taking MAS classes would thus (falsely) appear to be causing higher achievement.

(D) No tests of statistical significance.

None of TUSD’s internal analyses attempt to determine whether the observed differences in achievement are statistically significant. Especially in small samples, such differences can reflect random variation rather than causation.
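As a rough illustration of the kind of check the memo omits, the sketch below applies a standard chi-square test to a hypothetical two-group comparison of roughly the scale the memo describes. The counts are invented for illustration and are not TUSD figures.

```python
from scipy.stats import chi2_contingency

# Hypothetical counts (not TUSD data): 60 of 100 MAS juniors passed the
# AIMS reading retake vs. 520 of 1,000 non-MAS juniors -- an 8-point gap.
table = [[60, 40],       # MAS: passed, failed
         [520, 480]]     # non-MAS: passed, failed

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.3f}")
# With only a hundred students on the MAS side, an 8-point gap of this kind
# may fail to clear conventional significance thresholds -- which is why a
# formal test is needed before inferring anything from the raw rates.
```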

Reasons that causal effects cannot be inferred from the 6/20/12 Cabrera et al. paper.


All of the concerns (A)-(D) apply also to the Cabrera et al. study, except that the latter study does include tests for statistical significance. The Cabrera et al. study does, however, suffer from the following additional shortcomings.

(E) Mis-specification of the logistic equation.

The regression that underlies the entire analysis appears to be fundamentally mis-specified: the dependent variable should be the logarithm of the odds, ln[P(Y)/(1−P(Y))], rather than ln(P(Y))/[1−P(Y)] as reported in the paper (p. 3).

The reported specification makes no sense. It might be a typographical error, except that the same error appears twice and should have been obvious to anyone familiar with the logistic model. If the authors actually estimated the equation that they say they estimated, then the results are meaningless.
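For reference, the standard logistic (log-odds) specification that point (E) refers to can be written as below. The notation is supplied here for illustration (Y for the outcome, such as graduation or AIMS passage, and the X's for the covariates, including the MAS indicator); it is not taken from the paper itself.

```latex
% Standard logistic specification: the log of the odds of the outcome
% is linear in the covariates.
\[
  \ln\!\left[\frac{P(Y=1)}{1 - P(Y=1)}\right]
    = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k
\]
% By contrast, the form reported in the paper,
\[
  \frac{\ln P(Y=1)}{1 - P(Y=1)},
\]
% divides the log of a probability by a probability and is not a
% recognized regression specification.
```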

(F) The non-standard estimation of ratios raises several issues.

The paper estimates binary treatment effects (taking a MAS course being the binary treatment), a standard statistical problem that arises in many fields. The average treatment effect estimators appropriate for such a problem are well understood. But Cabrera et al. do something different and unusual, by estimating and reporting the odds ratios. This strategy could be viable but it raises several issues. First, the distribution of the standard errors of the odds ratio, and thus the p-values, are not obvious, and Cabrera et al. beg the question by saying only that they use “conventionally accepted standards.” All of their tests of statistical significance depend upon this.

Estimating the odds ratios, instead of differences in the averages, also makes the estimation less robust to specification errors. Because mis-specification is almost inevitable in this and similar models, their nonstandard approach increases the likelihood of badly flawed estimates.
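To make the distinction concrete, the sketch below computes both quantities for the same hypothetical outcome table (invented numbers, not from either study); the odds ratio and the simple difference in graduation rates can leave quite different impressions of the same data.

```python
# Hypothetical counts (not from either study): graduation outcomes for
# seniors who did and did not take a MAS course.
mas_grad, mas_total = 85, 100
other_grad, other_total = 750, 1000

# Simple difference in graduation rates (the "difference in averages").
rate_mas = mas_grad / mas_total
rate_other = other_grad / other_total
print(f"difference in graduation rates: {rate_mas - rate_other:+.2f}")

# Odds ratio, the quantity Cabrera et al. report instead.
odds_mas = mas_grad / (mas_total - mas_grad)
odds_other = other_grad / (other_total - other_grad)
print(f"odds ratio: {odds_mas / odds_other:.2f}")
# Here a 10-point difference in rates corresponds to an odds ratio of
# roughly 1.9, showing how differently the two summaries scale.
```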

(G) A serious selection problem.

Unlike the staff memo, the paper’s sample appears to include the cohort of 10th graders who fail the AIMS test in 10th grade and then drop out after 10th grade. This could greatly exaggerate the estimated achievement effects.

To consider an extreme example, suppose that half of the students who fail 10th grade AIMS drop out after 10th grade; of the remainder who matriculate to 11th grade, suppose that half take MAS courses and half fail to graduate but that those events are uncorrelated. Then, of the original cohort who failed AIMS in 10th grade, half of those who take MAS courses will graduate but only 1/6 of the rest will graduate (because many of them did not even advance to 11th grade). Then students in the sample who take MAS courses will graduate at thrice the rate of those who do not, even though there is no causal relationship!
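The arithmetic in this extreme example can be checked in a few lines. The sketch below simply reproduces the letter's stipulated assumptions (a starting cohort size is invented for convenience, and MAS is given no causal effect).

```python
# Reproduce the letter's extreme example.  Start from a cohort of students
# who failed the AIMS test in 10th grade; all proportions are the letter's
# stipulated assumptions, with MAS having no causal effect on graduation.
cohort = 1200.0

dropouts = cohort / 2                       # half drop out after 10th grade
juniors = cohort - dropouts                 # the rest advance to 11th grade

mas_juniors = juniors / 2                   # half of juniors take MAS
non_mas_juniors = juniors - mas_juniors

# Graduation is independent of MAS among juniors: half graduate either way.
mas_grads = mas_juniors * 0.5
non_mas_grads = non_mas_juniors * 0.5

# Non-MAS group in the sample = dropouts (who cannot graduate) + non-MAS juniors.
non_mas_total = dropouts + non_mas_juniors

print(f"graduation rate, MAS takers: {mas_grads / mas_juniors:.3f}")       # 0.500
print(f"graduation rate, non-takers: {non_mas_grads / non_mas_total:.3f}") # 0.167
# MAS takers appear to graduate at three times the rate (1/2 vs. 1/6) purely
# because the 10th-grade dropouts sit in the non-MAS denominator.
```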

Students who drop out after 11th grade cause a similar but probably less dramatic error in the estimated effect on graduation: they can obviously neither graduate nor take a senior-level MAS course, regardless of whether any causal relationship exists between those outcomes.

This problem alone could explain why Cabrera et al. find much larger effects than seem plausible given the data in the TUSD staff memo.

(H) Omitted variables.

Unlike the staff study, the Cabrera paper uses a regression to account for many (mainly demographic) control variables, but the regression omits many plausible causal factors. Class size, 10th-grade AIMS results, and the school attended are all omitted, and any of these omissions (as explained in points (A) and (B)) could generate a spurious relationship between taking MAS classes and student achievement.

The authors deliberately excluded the dummy variables for Asian Americans and Native Americans from the sample; they write that in some cases they also excluded African Americans and persons with high incomes. They explain that these dummy variables exhibited too little variation to be included (e.g. there were not enough Native Americans in the sample), but that is generally not a good reason to exclude variables that are presumed to have explanatory power; and that is certainly the case here because the premise underlying culturally relevant pedagogy is that ethnicity affects how students respond to different pedagogical approaches!

In short, the paper selectively and rather mysteriously omits explanatory variables that should have been included. This could exaggerate the estimates of achievement due to MAS. For example, if Native American students rarely took MAS courses and also exhibited lower achievement as measured by the paper, then one would expect the dummy variable for Native American identification to have an estimated negative effect on achievement; but excluding that variable could spuriously push that effect into the MAS variable, driving up the estimated effect of taking MAS courses.”
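To illustrate the omitted-variable mechanism described in this last point, here is a minimal simulation sketch. The group shares and probabilities are hypothetical and invented for illustration; they are not drawn from either study. In the sketch, one group rarely takes MAS and also graduates at a lower rate for unrelated reasons, so omitting the group indicator pushes that effect into the MAS comparison.

```python
import random

random.seed(1)

# Hypothetical illustration: students in "group B" rarely take MAS and also
# have a lower graduation rate for reasons unrelated to MAS.  Within each
# group, MAS has no effect on graduation.
students = []
for _ in range(20_000):
    group_b = random.random() < 0.2
    took_mas = random.random() < (0.05 if group_b else 0.40)
    grad_prob = 0.60 if group_b else 0.80      # group effect only
    students.append((group_b, took_mas, random.random() < grad_prob))

def rate(mas_flag, group=None):
    rows = [g for b, m, g in students
            if m == mas_flag and (group is None or b == group)]
    return sum(rows) / len(rows)

# Pooled comparison (group indicator omitted): MAS looks beneficial.
print(f"pooled gap:         {rate(True) - rate(False):+.3f}")
# Within-group comparisons: the apparent MAS effect largely disappears.
print(f"gap within group A: {rate(True, False) - rate(False, False):+.3f}")
print(f"gap within group B: {rate(True, True) - rate(False, True):+.3f}")
```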