Regression analysis is an interesting concept.

You start with the most important variable, the one you want to predict, the one you call the dependent variable.

Then, you collect a bunch of other variables. If you’re lucky, you’ve got a dataset with hundreds of other variables to pick from. These are the ones you call independent variables.

With the variables outlined, you can now say:

*The dependent variable depends on the independent variables*

*The independent variables predict the dependent variable*

*The dependent variable is influenced by the independent variables*

*The dependent variable is explained by the independent variables*

But just because you call a variable dependent doesn’t make it a dependent variable. As with correlation, regression does not signify causation. Regression is a slightly more complicated, slightly cooler, slightly more interesting correlation.

If you wanted to, you could call any variable the dependent variable. You could easily run a regression model to find the coefficients behind any of these equations, each with a dependent variable on the left and independent variables on the right:

*Education is a function of salary + number of cars you own + square footage of your residence*

*Attitude towards the environment is a function of how much meat you eat + the number of fur coats you own*

*Your favourite colour is a function of the colour of your car + the colour of your living room + the colour of your front door*

In each case, is the dependent variable truly caused by the independent variables? Absolutely not. Is there a statistical relationship between the dependent and independent variables? Absolutely. It’s purely correlational, but it is most definitely a clear relationship.
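To see how indifferent the arithmetic is, here is a minimal sketch using entirely made-up data for the first equation above. The variable names (`salary`, `cars`, `sqft`, `education_years`) and the numbers behind them are invented for illustration; the point is only that an ordinary least-squares solver returns coefficients for whatever you put on either side:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic, made-up "independent" variables.
salary = rng.normal(50_000, 10_000, n)
cars = rng.integers(0, 4, n).astype(float)
sqft = rng.normal(1_500, 400, n)

# An invented "dependent" variable, loosely related to the others.
education_years = 5 + 1e-4 * salary + 0.3 * cars + rng.normal(0, 1, n)

# Ordinary least squares: solve X @ beta ≈ y. The solver never asks
# whether the relationship is causal -- it just fits coefficients.
X = np.column_stack([np.ones(n), salary, cars, sqft])
beta, *_ = np.linalg.lstsq(X, education_years, rcond=None)
print(beta)  # intercept plus one coefficient per "independent" variable
```

Swap `education_years` with `salary` and the solver will fit that equation just as happily; nothing in the output tells you which direction, if any, is causal.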

So the next time you plop variables onto one side or the other of an equation, consider whether your model is causal or merely correlational, because the model won’t tell you either way. In the immortal words of Jean-Luc Picard: just saying it won’t make it so.


My favourite example is the blood alcohol test. We can use the amount of alcohol in somebody’s blood, via regression, to predict how much alcohol they drank. However, the alcohol in the blood did not cause the drinking; the drinking caused the blood alcohol.
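This backwards regression can be sketched in a few lines. All the numbers here are invented for illustration (the 0.02-per-drink slope is not a real physiological constant); the point is that least squares fits the cause from the effect without complaint:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

# Made-up data: drinking causes blood alcohol content (BAC), plus noise.
drinks = rng.integers(0, 10, n).astype(float)
bac = 0.02 * drinks + rng.normal(0, 0.005, n)

# Regress "backwards": predict the cause (drinks) from the effect (BAC).
# The fit works fine in this direction, even though BAC causes nothing.
slope, intercept = np.polyfit(bac, drinks, 1)
predicted_drinks = slope * bac + intercept
print(slope, intercept)
```

The fitted slope is roughly the inverse of the generating one, and the predictions are useful, which is exactly the commenter’s point: predictive direction and causal direction are independent choices.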

Another nice example, and one that becomes very controversial when applied to climate estimates from thousands of years ago, is that regression can be used to estimate the weather from the size of tree rings: the wider the rings, the better the weather. In this particular regression, the size of the tree rings is treated as the independent variable and the weather as the dependent variable. But the tree rings did not cause the weather.

However, calling it the dependent variable does make it the dependent variable, at least to scientists. But scientists do not mean what the person in the street means by “dependent”. The independent variables are the ones we are going to vary in the test, and the associated movement in the dependent variable is what we want to find out. It is dependent in the sense of being the variable of interest, but, as you say, no causality is implied.