Monday, October 24, 2011

The China Study II: Animal protein, wheat, and mortality … there is something odd here!

WarpPLS and HealthCorrelator for Excel were used in the analyses below. For other China Study analyses, many using WarpPLS and HealthCorrelator for Excel, click here. For the dataset used, visit the HealthCorrelator for Excel site and check under the sample datasets area. I thank Dr. T. Colin Campbell and his collaborators at the University of Oxford for making the data publicly available for independent analyses.

The graph below shows the results of a multivariate linear WarpPLS analysis including the following variables: Wheat (wheat flour consumption in g/d), Aprot (animal protein consumption in g/d), Mor35_69 (number of deaths per 1,000 people in the 35-69 age range), and Mor70_79 (number of deaths per 1,000 people in the 70-79 age range).


Just a technical comment here, regarding the possibility of ecological fallacy. I am not going to get into this in any depth now, but let me say that the patterns in the data suggest that, with the possible exception of some variables (e.g., blood glucose, gender; the latter will get us going in the next few posts), ecological fallacy due to county aggregation is not a big problem. The threat of ecological fallacy exists, here and in many other datasets, but it is generally overstated (often by those whose previous findings are contradicted by aggregated results).

I have not included plant protein consumption in the analysis because plant protein consumption is very strongly and positively associated with wheat flour consumption. The reason is simple. Almost all of the plant protein consumed by the participants in this study was probably gluten, from wheat products. Fruits and vegetables have very small amounts of protein. Keeping that in mind, what the graph above tells us is that:

- Wheat flour consumption is significantly and negatively associated with animal protein consumption. This is probably due to those eating more wheat products tending to consume less animal protein.

- Wheat flour consumption is positively associated with mortality in the 35-69 age range. The P value (P=0.06) just misses the 5 percent level (i.e., P=0.05) that most researchers would consider to be the threshold for statistical significance. More consumption of wheat in a county, more deaths in this age range.

- Wheat flour consumption is significantly and positively associated with mortality in the 70-79 age range. More consumption of wheat in a county, more deaths in this age range.

- Animal protein consumption is not significantly associated with mortality in the 35-69 age range.

- Animal protein consumption is significantly and negatively associated with mortality in the 70-79 age range. More consumption of animal protein in a county, fewer deaths in this age range.

Let me tell you, from my past experience analyzing health data (as well as other types of data, from different fields), that these coefficients of association do not suggest super-strong associations. Actually this is also indicated by the R-squared coefficients, which vary from 3 to 7 percent. Each R-squared coefficient is the proportion of variance that the model explains in the variable shown directly above it in the graph. They are low, which means that the model has weak explanatory power.
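To make the idea of "variance explained" concrete, here is a minimal Python sketch of how an R-squared coefficient can be computed for a plain least-squares line with one predictor. This is not how WarpPLS computes its coefficients internally; the function name `r_squared` and the numbers are made up purely for illustration and are not from the China Study II dataset.

```python
from statistics import mean


def r_squared(x, y):
    """Proportion of the variance in y explained by a least-squares line on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares: what the fitted line fails to explain.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1.0 - ss_res / syy


# Made-up numbers, only to show the two extremes of explanatory power.
x = [1, 2, 3, 4, 5]
y_perfect = [2, 4, 6, 8, 10]  # lies exactly on a line
y_noisy = [3, 1, 4, 1, 5]     # only loosely related to x

print(r_squared(x, y_perfect))
print(r_squared(x, y_noisy))
```

The first series is fully explained by the line, while the second leaves most of its variance unexplained, which is the situation that R-squared values in the 3 to 7 percent range describe.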

R-squared coefficients of 20 percent and above would be more promising. I hate to disappoint hardcore carnivores and the fans of the “wheat is murder” theory, but these coefficients of association and variance explained are probably way less than what we would expect to see if animal protein was humanity's salvation and wheat its demise.

Moreover, the lack of association between animal protein consumption and mortality in the 35-69 age range is a bit strange, given that there is an association suggestive of a protective effect in the 70-79 age range.

Of course death happens for all kinds of reasons, not only what we eat. Still, let us take a look at some other graphs involving these foodstuffs to see if we can form a better picture of what is going on here. Below is a graph showing mortality at the two age ranges for different levels of animal protein consumption. The results are organized in quintiles.


As you can see, the participants in this study consumed relatively little animal protein. The lowest mortality in the 70-79 age range, arguably the range of higher vulnerability, was for the 28 to 35 g/d quintile of consumption. That was the highest consumption quintile. About a quarter to a third of 1 lb/d of beef, and less of seafood (in general), would give you that much animal protein.

Keep in mind that the unit of analysis here is the county, and that these results are based on county averages. I wish I had access to data on individual participants! Still I stand by my comment earlier on ecological fallacy. Don't worry too much about it just yet.
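For readers who want to build this kind of grouped summary from county averages themselves, here is a minimal Python sketch. It assumes the five groups are equal-width ranges of consumption, which is what ranges such as 21-28 and 28-35 g/d suggest; that is my assumption for the sketch, and the function name `mean_by_equal_width_bins` and the numbers are made up for illustration.

```python
from statistics import mean


def mean_by_equal_width_bins(x, y, n_bins=5):
    """Split the range of x into n_bins equal-width bins and average y in each.

    Returns a list of (bin_low, bin_high, mean_y) tuples; mean_y is None
    for a bin that contains no observations.
    """
    lo, hi = min(x), max(x)
    if hi == lo:
        raise ValueError("x must contain at least two distinct values")
    width = (hi - lo) / n_bins
    groups = [[] for _ in range(n_bins)]
    for xi, yi in zip(x, y):
        # The maximum value of x falls on the upper edge; keep it in the last bin.
        idx = min(int((xi - lo) / width), n_bins - 1)
        groups[idx].append(yi)
    return [
        (lo + i * width, lo + (i + 1) * width, mean(g) if g else None)
        for i, g in enumerate(groups)
    ]


# Made-up county averages: consumption in g/d and deaths per 1,000 people.
consumption = [0, 5, 10, 15, 20, 25, 30, 35]
mortality = [10, 12, 9, 8, 10, 14, 7, 5]

for low, high, avg in mean_by_equal_width_bins(consumption, mortality):
    print(f"{low:.0f}-{high:.0f} g/d: {avg}")
```

Each row of the output is one bar in a graph like the ones discussed here. Because every observation is a county average, a bin with few counties can swing a lot, which is one reason an isolated spike should be read with caution.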

Clearly the above results and graphs contradict claims that animal protein consumption makes people die earlier, and go somewhat against the notion that animal protein consumption causes things that make people die earlier, such as cancer. But they do so in a messy way - that spike in mortality in the 70-79 age range for 21-28 g/d of animal protein is a bit strange.

Below is a graph showing mortality at the two age ranges (i.e., 35-69 and 70-79) for different levels of wheat flour consumption. Again, the results are shown in quintiles.


Without a doubt the participants in this study consumed a lot of wheat flour. The lowest mortality in the 70-79 age range, which is the range of higher vulnerability, was for the 300 to 450 g/d quintile of wheat flour consumption. The high end of this range is about 1 lb/d of wheat flour! How many slices of bread would this be equivalent to? I don’t know, but my guess is that it would be many.

Well, this is not exactly the smoking gun linking wheat with early death, a connection that has been reaching near mythical proportions on the Internetz lately. Overall, the linear trend seems to be one of decreased longevity associated with wheat flour consumption, as suggested by the WarpPLS results, but the relationship between these two variables is messy and somewhat weak. It is not even clearly nonlinear, at least in terms of the ubiquitous J-curve relationship.

Frankly, there is something odd about these results.

This oddity led me to explore, using HealthCorrelator for Excel, all ordered associations between mortality in the 35-69 and 70-79 age ranges and all of the other variables in the dataset. That in turn led me to a more complex WarpPLS analysis, which I’ll talk about in my next post, which is still being written.
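The general idea of ordering associations can be sketched in a few lines of Python: compute the correlation between one target variable and every other variable, then sort by absolute strength. This is only an illustration under my own assumptions, not a description of what HealthCorrelator for Excel does internally; the function names, the variable names other than Mor70_79, and all the numbers are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranked_associations(columns, target):
    """Correlate every other column with `target`; strongest (by absolute value) first."""
    t = columns[target]
    pairs = [
        (name, pearson(values, t))
        for name, values in columns.items()
        if name != target
    ]
    return sorted(pairs, key=lambda p: abs(p[1]), reverse=True)


# Made-up data: each list holds one value per county.
columns = {
    "Mor70_79": [1, 2, 3, 4, 5],
    "var_a": [2, 4, 6, 8, 10],
    "var_b": [5, 3, 4, 1, 2],
    "var_c": [1, 3, 2, 3, 1],
}

for name, r in ranked_associations(columns, "Mor70_79"):
    print(f"{name}: {r:+.2f}")
```

Sorting by absolute value matters because a strong negative association is just as interesting as a strong positive one when looking for oddities.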

I can tell you right now that there will be more oddities there, which will eventually take us to what I refer to as the mysterious factor X. Ah, by the way, that factor X is not gender - but gender leads us to it.