Regression analysis is one of the most widely used methods of statistical analysis, in both academic and applied settings. Part of the reason is that it is one of the best tools available for investigating relationships between variables. It also lets us predict outcomes for data we have not yet seen. Many people who have taken a statistics course or two can fit and run a simple regression, and I expect that the average person, handed any model output, could pick out the y-intercept and the variable coefficients. Those values are crucial, but what about the rest of the information that is generated every time a model is run?
Is there anything else we need to think about? What can the other values tell us about our model?
Rather than walking through the procedure for calculating each metric, we will take a deep dive into each one, with the goal of understanding what it actually tells us about the model.
The Outcome of a Linear Regression Analysis
To get started, we will build a basic linear regression model in R, with points as the independent variable and salary as the dependent variable. The output of this regression model is as follows:
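As a minimal sketch of how a model like this might be fit in R: the data frame `nba`, its column names, and all of its values below are simulated stand-ins (an assumption, not the actual NBA dataset), so the numbers it prints will not match the output discussed in this article.

```r
# Hypothetical stand-in for the NBA data; the real dataset is not reproduced here.
set.seed(42)
nba <- data.frame(points = round(runif(100, min = 50, max = 2000)))

# Simulated salaries with right-skewed noise (an assumption, for illustration only).
nba$salary <- 1e6 + 8000 * nba$points + rexp(100, rate = 1 / 3e6)

# Fit salary as a function of points and print the full model summary:
# call, residuals, coefficients, and goodness-of-fit statistics.
model <- lm(salary ~ points, data = nba)
print(summary(model))
```

The printed summary is laid out in the same sections this article walks through, starting with the call and the residuals.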
Now that we have a model and its output, let’s break it down component by component to see what each part tells us about the model’s performance. By the end, we will be able to assess the model with more confidence.
The call section shows the formula that R used to fit the regression model. Here, we use points from the NBA dataset as the independent variable to explain salary, the dependent variable.
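The call can also be pulled straight from a fitted model object, which is handy for checking which formula and data were actually used. This is a sketch on simulated data; `nba` and its columns are hypothetical stand-ins for the real dataset.

```r
# Simulated stand-in data (an assumption; not the article's NBA dataset).
set.seed(1)
nba <- data.frame(points = runif(50, min = 0, max = 2000))
nba$salary <- 1e6 + 8000 * nba$points + rnorm(50, sd = 2e6)

model <- lm(salary ~ points, data = nba)

# The call exactly as R recorded it when the model was fit.
print(model$call)

# Just the formula: dependent variable on the left, independent on the right.
print(formula(model))
```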
The residuals are the differences between the observed values and the values the model predicts. Here, each residual is a player’s actual salary minus the salary the model predicted for him, so a positive residual means the model underestimated that player’s pay.
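To make the definition concrete, the residuals can be computed by hand as observed minus fitted values and compared with what R stores in the model. This is a sketch on simulated data; `nba` and its columns are hypothetical stand-ins for the real dataset.

```r
# Simulated stand-in data (an assumption; not the article's NBA dataset).
set.seed(42)
nba <- data.frame(points = round(runif(100, min = 50, max = 2000)))
nba$salary <- 1e6 + 8000 * nba$points + rexp(100, rate = 1 / 3e6)

model <- lm(salary ~ points, data = nba)

# Residual = observed salary minus the salary predicted by the model.
manual_resid <- nba$salary - fitted(model)

# Compare with the residuals R computed; names are dropped so only values are checked.
same <- isTRUE(all.equal(unname(manual_resid), unname(resid(model))))
print(same)
```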
How should we interpret this? Ideally, the median residual would sit close to zero, with the minimum and maximum, and the first and third quartiles, roughly equal in magnitude. That would mean the residuals are well balanced and the model predicts about equally well at the low and high ends of the data. Based on the output above, our distribution is somewhat skewed to the right: the model does worse at predicting salaries for the highest-paid players than for the lowest-paid ones. To visualise this, we can create a quantile-quantile (Q-Q) plot. The following plot shows outliers at both ends, with those at the top end more severe than those at the bottom. Overall, though, the residuals appear to follow a roughly normal distribution.
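One way to produce both the five-number summary and the Q-Q plot is sketched below. The data are simulated with deliberately right-skewed noise (an assumption chosen to mimic the pattern described above); `nba` and its columns are hypothetical stand-ins, so the values will differ from the article's output.

```r
# Simulated stand-in data with right-skewed noise (an assumption, for illustration).
set.seed(42)
nba <- data.frame(points = round(runif(100, min = 50, max = 2000)))
nba$salary <- 1e6 + 8000 * nba$points + rexp(100, rate = 1 / 3e6)

model <- lm(salary ~ points, data = nba)
res <- resid(model)

# Min, first quartile, median, third quartile, max: the same five values
# that summary(model) reports in its residuals section.
five_num <- quantile(res)
print(five_num)

# Q-Q plot: points that bend away from the reference line at either end
# indicate tails heavier or lighter than a normal distribution would have.
qqnorm(res, main = "Normal Q-Q plot of residuals")
qqline(res)
```

With right-skewed residuals, the upper-right points would typically rise above the reference line more sharply than the lower-left points fall below it.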
A player’s salary in the NBA may be estimated in part by looking at how many points he scores in a season, but this metric alone is insufficient to provide a reliable estimate.