Multivariate Linear Regression

Explains why the t-test and the F-test can be used in multivariate linear regression analysis

T Miyamoto
Dec 30, 2021

The present article is intended for undergraduates, master's and doctoral students, and researchers. It gives a mathematical explanation of the background of the t-test and the F-test in multivariate linear regression analysis.

Problem Setup

The vector X of observed data points is modeled as the product of a matrix Z of explanatory variables (aka the design matrix) and a vector beta of model parameters, plus a noise vector w.
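In matrix form, with n observations and q explanatory variables plus an intercept beta_0 (so that Z has n rows and (q+1) columns, its first column being the all-ones vector for the intercept), the model can be written as:

X = Z\beta + w.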

The least-squares (OLS) estimator of the parameter vector beta is obtained by minimizing the squared norm of the noise vector. To do so, we take the partial derivatives of the squared norm with respect to the parameters and set them to zero. We then solve the following (q+1) equations (the normal equations):
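Setting the gradient of the squared norm ||X - Z\beta||^2 with respect to beta to zero gives:

-2\, Z^T (X - Z\hat{\beta}) = 0 \quad\Longrightarrow\quad Z^T Z\, \hat{\beta} = Z^T X.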

Therefore, the least-squares estimators of the parameters are obtained.
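Assuming Z^T Z is invertible (i.e., Z has full column rank), solving the normal equations gives:

\hat{\beta} = (Z^T Z)^{-1} Z^T X.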

Thus, we see that beta hat is an unbiased estimator of beta.
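Indeed, taking expectations and using E[w] = 0, so that E[X] = Z\beta:

E[\hat{\beta}] = (Z^T Z)^{-1} Z^T E[X] = (Z^T Z)^{-1} Z^T Z \beta = \beta.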

Note that we are using one of the Gauss-Markov assumptions here, namely that the noise has zero mean, E[w] = 0.

Cochran’s theorem

Now we’d like to learn a little bit about Cochran’s theorem here. It’ll come in handy later. Suppose the vector V of random variables {V_j} has a multivariate normal distribution.
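In the form used below, the components are independent, each with mean zero and variance sigma²:

V = (V_1, V_2, \ldots, V_n)^T \sim N(0, \sigma^2 I_n).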

And we consider the squared norm of the vector, and we split it in the following way:
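We write the split as a sum of m quadratic forms:

||V||^2 = V^T V = \sum_{j=1}^{m} V^T A_j V,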

where each matrix A_j is positive semi-definite. Then, if the sum of the ranks of the A_j (over j = 1, …, m) is equal to n, the following statements are true:
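Namely, writing r_j for the rank of A_j:

V^T A_j V / \sigma^2 \sim \chi^2_{r_j} \quad \text{for each } j,

and the quadratic forms V^T A_1 V, \ldots, V^T A_m V are mutually independent.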

Cochran’s theorem application 1

OK, so we’ve seen Cochran’s theorem. Now let us take advantage of this theorem to get a useful result — we begin by introducing a matrix P_z which is defined by:
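In terms of the design matrix Z:

P_z = Z (Z^T Z)^{-1} Z^T.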

The matrix P_z is actually a projection matrix:
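That is, P_z is symmetric and idempotent:

P_z^T = P_z, \qquad P_z^2 = Z (Z^T Z)^{-1} Z^T Z (Z^T Z)^{-1} Z^T = P_z.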

Indeed, this means that
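One consequence, and the one most relevant below, is that P_z sends the data to the fitted values, i.e., it projects X onto the column space of Z:

P_z X = Z (Z^T Z)^{-1} Z^T X = Z \hat{\beta}.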

Next, we assume that the noise has a multivariate normal distribution:
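Specifically, with zero mean and covariance matrix sigma² I_n:

w \sim N(0, \sigma^2 I_n).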

Then, we divide the norm of the noise into 2 pieces:
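One way to obtain the split is to write w = P_z w + (I_n - P_z) w and expand the squared norm:

||w||^2 = w^T P_z w + 2\, w^T P_z (I_n - P_z) w + w^T (I_n - P_z) w.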

The second term of this expansion disappears because we have the following relation:
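Namely, the projection matrix is idempotent, so

P_z (I_n - P_z) = P_z - P_z^2 = O,

and the cross term vanishes, leaving the two pieces

||w||^2 = w^T P_z w + w^T (I_n - P_z)\, w.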

Here, the matrix P_z is symmetric and has eigenvalues 0 or 1, meaning that P_z is positive semi-definite; accordingly, I_n-P_z is positive semi-definite as well. Assuming Z has full column rank, P_z has a rank of (q+1), and the matrix (I_n-P_z) has a rank of n-(q+1). Thus, Cochran’s theorem assures us that
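the two pieces, each divided by sigma², are independent chi-squared variables whose degrees of freedom equal the corresponding ranks:

w^T P_z w / \sigma^2 \sim \chi^2_{q+1}, \qquad w^T (I_n - P_z)\, w / \sigma^2 \sim \chi^2_{n-(q+1)}.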

This tells us that the second piece of the norm of the noise, w^T (I_n - P_z) w (divided by sigma²), has the chi-squared distribution with DoF n-(q+1).

Cochran’s theorem application 2

Now we introduce a matrix J_n whose elements are all unity (and accordingly the matrix is symmetric). We split the norm of the noise in the following way:
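Each of the three matrices in the split below is symmetric and idempotent (hence positive semi-definite); here we also use the fact that the all-ones vector is the intercept column of Z, so that P_z (1/n) J_n = (1/n) J_n:

||w||^2 = w^T \tfrac{1}{n} J_n\, w + w^T \left(P_z - \tfrac{1}{n} J_n\right) w + w^T (I_n - P_z)\, w.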

When it comes to their ranks, we have:
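For the three matrices appearing in the split:

\operatorname{rank}(\tfrac{1}{n} J_n) = 1, \qquad \operatorname{rank}(P_z - \tfrac{1}{n} J_n) = q, \qquad \operatorname{rank}(I_n - P_z) = n - (q+1),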

and the sum of them is n. Thus, Cochran’s theorem assures us that
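the three quadratic forms, each divided by sigma², are mutually independent and chi-squared distributed:

w^T \tfrac{1}{n} J_n\, w / \sigma^2 \sim \chi^2_1, \qquad w^T (P_z - \tfrac{1}{n} J_n)\, w / \sigma^2 \sim \chi^2_q, \qquad w^T (I_n - P_z)\, w / \sigma^2 \sim \chi^2_{n-(q+1)}.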

T-test

Now that we’ve seen Cochran’s theorem, it’s time to move on to the t-test in multivariate linear regression. First let us get the covariance matrix of the OLS estimator.
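Using beta hat = (Z^T Z)^{-1} Z^T X and Cov(w) = sigma² I_n (another of the Gauss-Markov assumptions):

\operatorname{Cov}(\hat{\beta}) = (Z^T Z)^{-1} Z^T \operatorname{Cov}(w)\, Z (Z^T Z)^{-1} = \sigma^2 (Z^T Z)^{-1}.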

For the sake of convenience, we set:
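Consistent with the notation C_ii used below for the diagonal entries:

C = (Z^T Z)^{-1}, \qquad \operatorname{Cov}(\hat{\beta}) = \sigma^2 C.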

If we assume that the noise has a multivariate normal distribution, then the least-squares estimator has a multivariate normal distribution as well.
That is,
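with C = (Z^T Z)^{-1} as above,

\hat{\beta} \sim N(\beta, \sigma^2 C).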

In general, if a random variable X has a multivariate normal distribution, its affine transformation Y=a+BX has a multivariate normal distribution as well.
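More precisely, if X \sim N(\mu, \Sigma), then:

Y = a + BX \sim N(a + B\mu,\; B \Sigma B^T).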

If we consider the following affine transformation,
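for instance the estimator itself, written as an affine function of the noise w,

\hat{\beta} = (Z^T Z)^{-1} Z^T X = \beta + C Z^T w,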

then we see that the i-th component of the OLS estimator has the normal distribution with mean beta_i and variance C_ii sigma².

Then, we can standardize it:
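Centering at beta_i and dividing by the standard deviation sigma * sqrt(C_ii):

\frac{\hat{\beta}_i - \beta_i}{\sigma \sqrt{C_{ii}}} \sim N(0, 1).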

Also, since we now assume that the noise has a multivariate normal distribution, Cochran’s theorem assures us that the quadratic form w^T (I_n - P_z) w, divided by sigma², has the chi-squared distribution with DoF n-(q+1). Moreover, this quadratic form is independent of beta hat, because beta hat - beta = C Z^T w = C Z^T P_z w depends on w only through P_z w.

It is known that if a random variable X has the standard normal distribution and an independent random variable W has the chi-squared distribution with DoF p, then X over the square root of (W divided by p) has the t distribution with DoF p. Thus, we obtain
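taking the standardized estimator above for X and W = w^T (I_n - P_z) w / sigma² with p = n-(q+1), so that the unknown sigma cancels:

\frac{\hat{\beta}_i - \beta_i}{\sqrt{C_{ii}}\,\sqrt{w^T (I_n - P_z)\, w / (n - q - 1)}} \sim t_{\,n-(q+1)}.

As noted in the next section, w^T (I_n - P_z) w equals the residual sum of squares ||X - Z beta hat||², so this statistic can be computed from the data.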

and therefore we can conduct a t-test for each of the OLS parameters. Note that, in this case, the null hypothesis is beta_i=0.

F-test

In terms of the F-test for multivariate linear regression, the null hypothesis is that all the parameters except beta_0 are zero:
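In symbols:

H_{F0}:\; \beta_1 = \beta_2 = \cdots = \beta_q = 0.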

Then we define X bar by the average of X_{j} over j=1,2, …, n:
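Explicitly:

\bar{X} = \frac{1}{n} \sum_{j=1}^{n} X_j.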

Under the null hypothesis, X bar reduces to:
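Since each X_j = beta_0 + w_j under H_F0, writing w bar for the average of the w_j:

\bar{X} = \beta_0 + \bar{w}, \qquad \bar{w} = \frac{1}{n} \sum_{j=1}^{n} w_j.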

A short computation then shows that, under the null hypothesis, the difference Z beta hat - X bar 1_n (where 1_n denotes the all-ones vector) depends only on the noise, and hence so does its squared norm. The computation uses the fact that both P_z and (1/n) J_n leave the all-ones vector unchanged; a sketch is given below.
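Here is a sketch of this step, under the setup above (the first column of Z is 1_n, so P_z 1_n = 1_n, and likewise (1/n) J_n 1_n = 1_n). First,

Z \hat{\beta} - \bar{X}\, 1_n = P_z X - \tfrac{1}{n} J_n X = \left(P_z - \tfrac{1}{n} J_n\right) X.

Under H_F0 we have X = \beta_0 1_n + w and (P_z - \tfrac{1}{n} J_n)(\beta_0 1_n) = \beta_0 (1_n - 1_n) = 0, so

\left(P_z - \tfrac{1}{n} J_n\right) X = \left(P_z - \tfrac{1}{n} J_n\right) w,

and therefore, since P_z - (1/n) J_n is symmetric and idempotent,

||Z \hat{\beta} - \bar{X}\, 1_n||^2 = w^T \left(P_z - \tfrac{1}{n} J_n\right) w.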

When it comes to the norm of the noise, we have already obtained the relation:
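Namely, from Cochran’s theorem above:

w^T (I_n - P_z)\, w / \sigma^2 \sim \chi^2_{\,n-(q+1)}.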

In relation to this, consider ||w hat||², the squared norm of the residual w hat = X - P_z X. Since X - P_z X = (I_n - P_z)(Z beta + w) = (I_n - P_z) w, this squared norm is in fact equal to the quadratic form w^T (I_n - P_z) w.

For this reason, ||w hat||² divided by sigma² has the chi-squared distribution with DoF (n-q-1) as well.

Now let us define the regression sum of squares (SSR) and the error sum of squares (SSE) by:
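With X hat_j denoting the j-th fitted value (the j-th component of Z beta hat):

SSR = ||Z\hat{\beta} - \bar{X}\, 1_n||^2 = \sum_{j=1}^{n} (\hat{X}_j - \bar{X})^2, \qquad SSE = ||X - Z\hat{\beta}||^2 = \sum_{j=1}^{n} (X_j - \hat{X}_j)^2.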

By using these quantities, together with the independence guaranteed by Cochran’s theorem, we can see that, under the null hypothesis H_F0, the ratio of SSR (divided by q) to SSE (divided by n-q-1) has the F distribution with DoFs (q, n-q-1).
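Indeed, under H_F0 we have SSR = w^T (P_z - (1/n) J_n) w and SSE = w^T (I_n - P_z) w, which (divided by sigma²) are independent chi-squared variables with DoF q and n-q-1, so:

F = \frac{SSR / q}{SSE / (n - q - 1)} \sim F(q,\; n - q - 1).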

Thus, we can conduct an F-test for the multivariate linear regression.

Coefficient of Determination

In the previous section, we introduced SSR and SSE to construct the F statistic. Now we consider the sum of squared deviations from the mean (also called the adjusted total sum of squares, SST). The SST is defined by:
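In the notation above:

SST = \sum_{j=1}^{n} (X_j - \bar{X})^2 = ||X - \bar{X}\, 1_n||^2.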

Actually, under the null hypothesis H_F0, we can express it as follows:
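Using X = beta_0 1_n + w under H_F0 and the fact that (I_n - (1/n) J_n) 1_n = 0:

SST = ||(I_n - \tfrac{1}{n} J_n)\, X||^2 = ||(I_n - \tfrac{1}{n} J_n)\, w||^2 = w^T (I_n - \tfrac{1}{n} J_n)\, w.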

Considering this fact, we can split the SST into SSR and SSE:
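Since I_n - (1/n) J_n = (P_z - (1/n) J_n) + (I_n - P_z), the quadratic form splits accordingly:

SST = w^T \left(P_z - \tfrac{1}{n} J_n\right) w + w^T (I_n - P_z)\, w = SSR + SSE.

(The identity SST = SSR + SSE in fact holds whenever the model includes an intercept, not only under the null hypothesis.)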

Then we consider the ratio of SSR to SST:
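This ratio is the coefficient of determination:

R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}.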

This quantity R² represents how much of the SST is accounted for by the SSR. In other words, the higher this value is, the stronger the fit of the regression to the data.

Adding explanatory variables to the model can never decrease the coefficient of determination, so models with more parameters tend to show larger values of R². To take this into account, the adjusted R², defined as follows, is often used:
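One common definition penalizes the number of parameters through the degrees of freedom:

\bar{R}^2 = 1 - \frac{SSE / (n - q - 1)}{SST / (n - 1)}.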
