Calling it a dependent variable does not make it so #MRX

Regression analysis is an interesting concept.

You start with the most important variable, the one you want to predict, the one you call the dependent variable.

Then, you collect a bunch of other variables. If you’re lucky, you’ve got a dataset with hundreds of other variables to pick from. These are the ones you call independent variables.

With the variables outlined, you can now say:

The dependent variable depends on the independent variables
The independent variables predict the dependent variable
The dependent variable is influenced by the independent variables
The dependent variable is explained by the independent variables

But calling a variable dependent doesn’t make it a dependent variable. As with correlation, regression does not signify causation. Regression is a slightly more complicated, slightly more cool, slightly more interesting correlation.

If you wanted to, you could call any variable a dependent variable. You could easily run a regression model to find the coefficients behind any of these equations, each containing one dependent variable and several independent variables:

Education is a function of salary + number of cars you own + square footage of your residence
Attitude towards the environment is a function of how much meat you eat + the number of fur coats you own
Your favorite colour is a function of the colour of your car + the colour of your living room + the colour of your front door

In each case, is the dependent variable truly caused by the independent variables? Absolutely not. Is there a statistical relationship between the dependent and independent variables? Absolutely. It’s purely correlational, but it is most definitely a clear relationship.
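To see how little the model cares about which side of the equation a variable sits on, here is a minimal sketch in plain Python. Everything in it is an assumption made up for illustration: the data are synthetic, the `simple_ols` helper is hypothetical, and the numbers (education in years, salary in thousands) are invented. The fake salaries are generated from education, so we know the true direction, yet the regression runs just as happily backwards.

```python
import random

def simple_ols(x, y):
    """Regress y on x (one predictor); return (intercept, slope, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared correlation of x and y
    return intercept, slope, r_squared

random.seed(0)

# Synthetic data (assumption): salary is generated FROM education plus noise.
education = [random.uniform(10, 20) for _ in range(500)]
salary = [20 + 3 * e + random.gauss(0, 5) for e in education]

# "Salary is a function of education" (the direction we built in).
_, slope_fwd, r2_fwd = simple_ols(education, salary)

# "Education is a function of salary" (the direction we did not build in).
_, slope_rev, r2_rev = simple_ols(salary, education)

print(f"salary ~ education: slope={slope_fwd:.3f}, R^2={r2_fwd:.3f}")
print(f"education ~ salary: slope={slope_rev:.3f}, R^2={r2_rev:.3f}")
```

Both directions produce a slope, and both report exactly the same R-squared, because with a single predictor R-squared is just the squared correlation. Nothing in the output says which variable came first.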

So the next time you plop variables onto one side or the other of an equation, consider whether it is a causal or a merely correlational model, because the model won’t tell you either way. In the immortal words of Jean-Luc Picard, just saying it won’t make it so.