Visualizing Lasso Polytope Geometry
June 8, 2014
Some recent research about the lasso exploits a beautiful geometric picture: Suppose you fix the design matrix X and the regularization parameter $\lambda$. For a particular value of y, the n-dimensional response variable, you can then solve the lasso problem and examine the signs of the coefficients. Now, if you partition n-dimensional space based upon the signs of the coefficients you'd get by solving the lasso problem at each value of y, you'll find that each of the resulting regions is a convex polytope. This is kind of a mouthful, so I made the visualization below to illustrate the partitioning. You can drag x1, x2, and x3 to rotate them, and slide the slider to increase or decrease $\lambda$.
For this visualization, the design matrix X has two rows (so there are two observations, and y lives in the two-dimensional plane). X also has three columns (variables), called x1, x2, and x3, each normalized to have length one. Each of these variables (i.e., columns of X) also lives in the plane, and is drawn above. I've labeled each of the regions in the plane with the signs of the coefficients that would result from fitting the lasso with y in that region, and colored them based upon how many variables would be in the model. For example, if you fit the lasso using any y in a region labeled "-+0", then the resulting x1 coefficient will be negative, the x2 coefficient will be positive, and the x3 coefficient will be zero.
To be concrete, the version of the lasso that I'm using here is the one that solves this optimization problem: $$ \min_{\beta} \frac{1}{2}||y - X\beta||_2^2 + \lambda ||\beta||_1 $$
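If you want to poke at this numerically, here's a minimal sketch in Python (the X and y values are made up for illustration, and this is not the code behind the visualization). It assumes scikit-learn, whose Lasso scales the squared-error term by 1/(2n), so alpha = $\lambda$/n corresponds to the $\lambda$ in the objective above.

```python
import numpy as np
from sklearn.linear_model import Lasso

n = 2  # two observations, so y lives in the plane
# three unit-norm columns standing in for x1, x2, x3 (made-up example values)
X = np.array([[1.0, 0.0],
              [0.6, 0.8],
              [-0.5, np.sqrt(0.75)]]).T
y = np.array([1.5, 0.7])
lam = 1.0

# scikit-learn's Lasso minimizes (1/(2n))||y - Xb||^2 + alpha*||b||_1,
# so alpha = lam / n matches the objective written above
fit = Lasso(alpha=lam / n, fit_intercept=False).fit(X, y)
signs = "".join("+" if b > 1e-10 else "-" if b < -1e-10 else "0" for b in fit.coef_)
print(signs)  # the label of the region containing this y, e.g. "+00"
```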
If you play around with this picture long enough, you'll notice a number of beautiful features. Each of the ones I describe below also generalizes to more observations, more variables, and variables that aren't normalized to length one:
Varying $\lambda$
When $\lambda$ is small, notice that almost the whole space becomes blue (regions with two variables in the model). When $\lambda$ is large, the whole space becomes white (the null model, with no variables in it). This is the picture's way of showing that you fit larger models with smaller values of $\lambda$.
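Here's a rough numerical version of that statement, again just a sketch with the same made-up X as above: sweep y over a grid and tally how many variables the lasso selects at each grid point, for a few values of $\lambda$.

```python
import numpy as np
from sklearn.linear_model import Lasso

n = 2
X = np.array([[1.0, 0.0], [0.6, 0.8], [-0.5, np.sqrt(0.75)]]).T  # made-up unit-norm columns

def n_active(y, lam):
    coef = Lasso(alpha=lam / n, fit_intercept=False).fit(X, y).coef_
    return int(np.sum(np.abs(coef) > 1e-10))

grid = [np.array([a, b]) for a in np.linspace(-3, 3, 25) for b in np.linspace(-3, 3, 25)]
for lam in [0.1, 1.0, 3.0]:
    counts = np.bincount([n_active(y, lam) for y in grid], minlength=3)
    # shares of grid points with 0, 1, and 2 active variables:
    # the 2-variable (blue) share shrinks as lambda grows
    print(lam, counts / counts.sum())
```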
Characterizing the Null Model Polytope
The white central polytope is the set of y's resulting in coefficients that are all zero. Notice that each of its faces is orthogonal to one of the x's, and a distance of exactly $\lambda$ away from the origin. You can see this most clearly when $\lambda=1$; at that point, the faces each touch one of the x vectors (or one of the negated x vectors), since each of the x's is normalized to have length one. This feature is used by lasso solvers to determine the value of $\lambda$ at which the first variable enters the model: it's just the largest absolute inner product between y and any of the columns of X.
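That last fact is easy to check numerically. The sketch below (same made-up X and y as before, scikit-learn assumed) computes $\lambda_{max} = \max_j |x_j^T y|$ and verifies that the model is empty just above it and picks up its first variable just below it.

```python
import numpy as np
from sklearn.linear_model import Lasso

n = 2
X = np.array([[1.0, 0.0], [0.6, 0.8], [-0.5, np.sqrt(0.75)]]).T  # made-up unit-norm columns
y = np.array([1.5, 0.7])

lam_max = np.max(np.abs(X.T @ y))  # largest absolute inner product |<x_j, y>|
for lam in [1.01 * lam_max, 0.99 * lam_max]:
    coef = Lasso(alpha=lam / n, fit_intercept=False).fit(X, y).coef_
    # 0 active variables just above lam_max, 1 just below it
    print(round(lam, 3), np.count_nonzero(np.abs(coef) > 1e-10))
```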
"Dimensions" of the Polytopes
The blue regions (those for which the lasso selects two variables) are two-dimensional pie slices. If you squint (or make $\lambda$ small), you'll notice that the red regions (those for which the lasso selects just one variable) look approximately like one-dimensional lines. And of course, the null model is a bounded polytope, which is kind of like a "0-dimensional point" (so long as the variables span n-dimensional space). So if you squint, the polytope associated with a k-dimensional model is "sort of k-dimensional".
This pattern holds more generally: You can show that for any point y in the region associated with a model of size k, if you travel away from y in the cone generated by the (signed) variables in the model, then you'll still stay in that region. For example, if you start at any point in a red region, and walk in the direction of the associated x, you'll stay in that region. And for any point in a blue region, if you walk in either of the directions indicated by the active variables, you'll also stay in that region.
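Here's a quick sanity check of that cone property, again a sketch with made-up values: start from a y, read off its sign pattern, walk along the sum of the signed active variables (a direction inside the cone), and confirm the pattern never changes.

```python
import numpy as np
from sklearn.linear_model import Lasso

n = 2
X = np.array([[1.0, 0.0], [0.6, 0.8], [-0.5, np.sqrt(0.75)]]).T  # made-up unit-norm columns

def sign_pattern(y, lam, tol=1e-10):
    coef = Lasso(alpha=lam / n, fit_intercept=False).fit(X, y).coef_
    return tuple(int(np.sign(c)) if abs(c) > tol else 0 for c in coef)

y, lam = np.array([1.5, 0.7]), 0.5
pattern = np.array(sign_pattern(y, lam))
active = np.nonzero(pattern)[0]
direction = X[:, active] @ pattern[active]  # a direction in the cone of signed active variables
for t in [0.5, 2.0, 10.0]:
    print(sign_pattern(y + t * direction, lam))  # the same sign pattern for every t >= 0
```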
The Projection Picture
There is substantial structure associated with the model-regions. To see this, pick any point y in the plane, and project it onto the null-model polytope. (That is, find the nearest point on the white null polytope.) You'll find that all of the points in a particular model-region project onto the same face of the null-model polytope. Moreover, the dimension of this face is two minus the number of nonzero coefficients in the model (or more generally, the number of observations minus the number of nonzero coefficients), and the face is formed from the intersection of facets associated with the variables in the model. Tibshirani and Taylor (2012) used this projection picture as a critical component of their proof that the degrees of freedom of the lasso, for a fixed value of $\lambda$, is exactly equal to the expected number of variables in the model.
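Here's a small sketch of that projection picture (made-up X and y's, scikit-learn assumed). The lasso residual r = y - X$\beta$ is exactly the projection of y onto the null-model polytope {u : $||X^T u||_\infty \le \lambda$}, and the KKT conditions force $x_j^T r = \lambda \cdot \mathrm{sign}(\beta_j)$ for every active variable, which is what pins r to the face picked out by the model.

```python
import numpy as np
from sklearn.linear_model import Lasso

n = 2
X = np.array([[1.0, 0.0], [0.6, 0.8], [-0.5, np.sqrt(0.75)]]).T  # made-up unit-norm columns
lam = 0.5

# For each y, the residual r = y - X @ coef is y's projection onto the null-model
# polytope. For active variables x_j' r sits exactly at +/- lam (r lies on that
# face); for inactive variables it stays strictly inside.
for y in [np.array([2.0, 0.5]), np.array([3.0, 1.2])]:
    coef = Lasso(alpha=lam / n, fit_intercept=False).fit(X, y).coef_
    r = y - X @ coef
    print(np.round(r, 3), np.round(X.T @ r, 3))
# Both of these y's select the same two variables, so (up to solver tolerance)
# both project onto the same 0-dimensional face (a vertex): the same r both times.
```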
Sets of Accessible Models
In a forthcoming paper, my friend Amir Sepehri and I also noticed that this projection picture admits a very beautiful description of which (signed) variables can possibly be in a model together: only those which form a face of the convex hull of all the variables and their negations. Using the upper bound theorem for polytopes, we were able to bound the number of possible lasso models. As a simple consequence of this bound, we were also able to prove model-selection inconsistency for the lasso when the number of variables in the true model exceeds half the number of observations, simply because with extremely high probability there is no y at all in n-dimensional space that results in a lasso fit with the signs of the true model.
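A brute-force way to get the flavor of this claim (exploration only, not a proof; same made-up X, scikit-learn assumed): sweep y over a grid, collect every sign pattern the lasso produces, and note how few of the conceivable patterns ever show up.

```python
import numpy as np
from sklearn.linear_model import Lasso

n, lam = 2, 0.3
X = np.array([[1.0, 0.0], [0.6, 0.8], [-0.5, np.sqrt(0.75)]]).T  # made-up unit-norm columns

patterns = set()
for a in np.linspace(-4, 4, 60):
    for b in np.linspace(-4, 4, 60):
        coef = Lasso(alpha=lam / n, fit_intercept=False).fit(X, np.array([a, b])).coef_
        patterns.add("".join("+" if c > 1e-10 else "-" if c < -1e-10 else "0" for c in coef))
# far fewer than the 3^3 = 27 conceivable sign patterns ever appear
print(len(patterns), sorted(patterns))
```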