Monday, September 26, 2011

Calling self-experimentation N=1 is incorrect and misleading

This is not a post about semantics. Using “N=1” to refer to self-experimentation is okay, as long as one understands that self-experimentation is one of the most powerful ways to improve one’s health. Typically the term “N=1” is used in a demeaning way, as in: “It is just my N=1 experience, so it’s not worth much, but …” This is the reason behind this post. Using the “N=1” term to refer to self-experimentation in this way is both incorrect and misleading.

Calling self-experimentation N=1 is incorrect

The table below shows a dataset that is discussed in this YouTube video on HealthCorrelator for Excel (HCE). It refers to one single individual. Nearly all health-related datasets will look somewhat like this, with columns referring to health variables and rows referring to multiple measurements for the health variables. (This actually applies to datasets in general, including datasets about non-health-related phenomena.)


Often each individual measurement, or row, will be associated with a particular point in time, such as a date. This makes the measurement approach longitudinal, as opposed to cross-sectional. An example of the latter would be a dataset where each row refers to a different individual, with the data on all rows collected at the same point in time. Longitudinal health-related measurement is frequently considered superior to cross-sectional measurement in terms of the insights that it can provide.

As you can see, the dataset has 10 rows, with the top row containing the names of the variables. So this dataset contains nine rows of data, which means that in this dataset “N=9”, even though the data is for one single individual. To call this an “N=1” experiment is incorrect.

As a side note, an empty cell, like that on the top row for HDL cholesterol, essentially means that a measurement for that variable was not taken on that date, or that it was left out because of obvious measurement error (e.g., the value received from the lab was “-10”, which would be a mistake since nobody has a negative HDL cholesterol level). The N of the dataset as a whole would still be technically 9 in a situation like this, with only one missing cell on the row in question. But the software would typically calculate associations for that variable (HDL cholesterol) based on a sample of 8.
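To make the counting concrete, here is a minimal Python sketch of the idea. The variable names and all the values are made up for illustration (they are not the numbers from the dataset in the video), and the sketch assumes that rows with a missing value are simply skipped for the pair of variables being compared; this post does not document exactly how HCE handles them internally.

```python
# Hypothetical measurements for ONE person, one entry per date.
# None marks a value that was not taken or was discarded as an obvious error.
hdl = [None, 48, 50, 51, 53, 55, 54, 57, 60]
weight = [170, 169, 168, 166, 165, 163, 162, 160, 158]

# One row per measurement date, one column per health variable.
rows = [{"hdl": h, "weight": w} for h, w in zip(hdl, weight)]


def pairwise_n(rows, var_a, var_b):
    """Count the rows in which both variables have a value."""
    return sum(
        1 for r in rows if r[var_a] is not None and r[var_b] is not None
    )


n_dataset = len(rows)                             # N for the dataset as a whole
n_hdl_weight = pairwise_n(rows, "hdl", "weight")  # N usable for this pair

print(n_dataset, n_hdl_weight)
```

With one empty HDL cell, the dataset still has nine rows, but only eight of them can be used for any association involving HDL cholesterol.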

Calling self-experimentation N=1 is misleading

Calling self-experimentation “N=1”, meaning that the results of self-experimentation are not a good basis for generalization, is very misleading. But there is a twist. Those results may indeed not be a good basis for generalization to other people, but they provide a particularly good basis for generalization for you. It is often much safer to generalize based on self-experimentation, even with small samples (e.g., N=9).

The reason, as I pointed out in this interview with Jimmy Moore, is that data about only oneself tends to be much more uniform than data about a sample of individuals. When multiple individuals are included in an analysis, the number of sources of error (e.g., confounding variables, measurement problems) is much higher than when the analysis is based on one single individual. Thus analyses based on data from one single individual yield results that are more uniform and stable across the sample.

Moreover, analyses of data about a sample of individuals are typically summarized through averages, and those averages tend to be biased by outliers. There are always outliers in any dataset; you might be one of them if you were part of a dataset, which would render the average results at best misleading, and at worst meaningless, to you. This is a point that has also been made by Richard Nikoley, who has been discussing self-experimentation for quite some time, in this very interesting video.
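A small made-up example shows how this can happen. The numbers below are invented for illustration only: they stand for the change in some health marker in ten hypothetical people after the same dietary change, with one person responding very differently from the rest.

```python
from statistics import mean, median

# Invented changes in a health marker for ten hypothetical people.
# Nine respond similarly; the last person is an outlier.
changes = [-5, -4, -6, -5, -3, -4, -5, -6, -4, 30]

group_mean = mean(changes)             # what a summary table would report
typical = median(changes)              # less sensitive to the outlier
mean_without_outlier = mean(changes[:-1])

print(group_mean, typical, round(mean_without_outlier, 2))
```

The group average here describes neither the nine people who responded similarly nor the one who did not, so it would be a poor guide for any of them individually.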

Another person who has been talking about self-experimentation, and showing how it can be useful in personal health management, is Seth Roberts. He and the idea of self-experimentation were prominently portrayed in this article in the New York Times. Check out this video, where Dr. Roberts talks about how he found out through self-experimentation that, among other things, consuming butter reduced his arterial plaque deposits. Plaque reduction is something that only rarely happens, at least in folks who follow the traditional American diet.

HCE generates coefficients of association and graphs at the click of a button, making it relatively easy for anybody to understand how his or her health variables are associated with one another, and thus what modifiable health factors (e.g., consumption of certain foods) could be causing health effects (e.g., body fat accumulation). It may also help you identify other, more counter-intuitive links, such as those between certain thought and behavior patterns (e.g., wealth accumulation thoughts, looking in the mirror multiple times a day) and undesirable mental states (e.g., depression, panic attacks).

Just keep in mind that you need to have at least some variation in all the variables involved. Without variation there is no correlation, and thus causation may remain hidden from view.
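The need for variation can be seen directly in how a correlation coefficient is computed. The sketch below assumes a plain Pearson correlation as the coefficient of association; that is an assumption made for illustration, not a statement about what HCE calculates. The `pearson` helper and the data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Return Pearson's r, or None if either variable does not vary."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # No variation: the denominator is zero, so r is undefined.
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Invented data: the same daily butter intake on every measurement date.
constant_butter = [2, 2, 2, 2, 2]
varied_butter = [0, 1, 2, 3, 4]
hdl = [50, 52, 54, 56, 58]

r_constant = pearson(constant_butter, hdl)  # undefined, returned as None
r_varied = pearson(varied_butter, hdl)      # a perfect positive association

print(r_constant, r_varied)
```

If butter intake never changes, nothing can be said about its association with HDL cholesterol, no matter how many rows the dataset has.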