6 votes

For a square matrix of data, I achieve $R^2=1$ for Linear Regression and $R^2=0$ for Lasso. What's the intuition behind this?

A few things are going on here: your matrix is 100x100, so you have no degrees of freedom left in a linear model, which will cause $R^2=1$ (see this post). You use random numbers, so they should make ...
Peter • 7,866
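A minimal sketch of the setup described above, assuming a 100x100 matrix of random numbers and in-sample scoring (synthetic data, not the asker's):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 100))   # square data matrix: n = p = 100
y = rng.normal(size=100)          # random target, unrelated to X

# With p >= n, OLS can interpolate the training data, so in-sample R^2 = 1
print(LinearRegression().fit(X, y).score(X, y))

# The L1 penalty shrinks every (useless) coefficient to ~0, so the model
# predicts roughly the mean of y, giving in-sample R^2 ~ 0
print(Lasso(alpha=1.0).fit(X, y).score(X, y))
```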
4 votes

Why does Lasso behave "erratically" when the number of features is greater than the number of training instances?

When $p > n$, the LASSO model can only sustain up to $n$ variables (this can be proven using linear algebra, the rank of the data matrix in particular), leaving at least $p - n$ variables out (some that ...
aranglol • 2,236
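A quick check of the "at most $n$ active features" behaviour, on assumed toy data with $n = 50$ and $p = 200$ (the alpha value is illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 50, 200                                   # more features than samples
X = rng.normal(size=(n, p))
y = X[:, :10] @ rng.normal(size=10) + 0.1 * rng.normal(size=n)

model = Lasso(alpha=0.05, max_iter=50_000).fit(X, y)
print("non-zero coefficients:", int(np.sum(model.coef_ != 0)), "<= n =", n)
```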
4 votes

Normalisation results in $R^2$ score of 0 - Lasso regression

Standardizing/normalizing is generally the right thing to do, but it will make little/no difference with just one independent variable if you also adjust the regularization strength. With more than ...
Ben Reiniger • 12.7k
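A sketch of the single-feature equivalence the answer alludes to, on assumed toy data: rescaling the one feature and rescaling the penalty by the same factor leaves the fitted model unchanged:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = 3 * x[:, 0] + rng.normal(size=200)

c = 10.0                                      # pretend scaling factor
m1 = Lasso(alpha=0.1).fit(x, y)
m2 = Lasso(alpha=0.1 / c).fit(x / c, y)       # rescaled feature, rescaled penalty

print(m1.predict(x[:3]))
print(m2.predict(x[:3] / c))                  # essentially the same predictions
```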
3 votes

Group lasso and feature selection

Presumably, you need a sparse group logistic regression model to perform feature selection while considering the binary response. skglm is a new modular, scikit-...
Badr MOUFAD
3 votes
Accepted

Elegant way to plot the L2 regularization path of logistic regression in python?

sklearn already has such functionality for regression problems, in enet_path and lasso_path. There's an example notebook here....
Ben Reiniger • 12.7k
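For the logistic case specifically, a hand-rolled sketch (toy data assumed) that refits over a grid of C values and plots each coefficient against C:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
Cs = np.logspace(-3, 2, 30)                  # inverse regularization strengths

# One row of coefficients per value of C
coefs = np.array([
    LogisticRegression(penalty="l2", C=C, max_iter=5000).fit(X, y).coef_.ravel()
    for C in Cs
])

plt.semilogx(Cs, coefs)
plt.xlabel("C (inverse regularization strength)")
plt.ylabel("coefficient value")
plt.title("L2 regularization path")
plt.show()
```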
3 votes

How do standardization and normalization impact the coefficients of linear models?

When you have a linear regression (without any scaling, just plain numbers) and you have a model with one explanatory variable $x$ and coefficients $\beta_0=0$ and $\beta_1=1$, then you essentially ...
Peter • 7,866
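A tiny illustration of that setup, assuming $y = x$ exactly (so $\beta_0=0$, $\beta_1=1$): after standardizing, the slope becomes the standard deviation of $x$ rather than 1:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(300, 1))
y = x[:, 0]                                   # beta0 = 0, beta1 = 1

print(LinearRegression().fit(x, y).coef_)     # ~[1.0] on the raw scale

z = StandardScaler().fit_transform(x)
print(LinearRegression().fit(z, y).coef_)     # ~[std(x)] ~ [2.0] after scaling
```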
3 votes
Accepted

Need advice regarding cross-validation to obtain optimal lambda in Lasso

Welcome to DS.SE @h_ihkam! "So how can I decide on the search range? What is the best practice? Please provide me with some guidance." Good questions!! Choosing the Optimal Lambda in LASSO Using ...
Robert Long • 3,238
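One common way to search a lambda/alpha range, sketched with LassoCV on assumed toy data (the grid bounds are illustrative, not a recommendation):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=30, noise=10, random_state=0)

# Standardize, then cross-validate over a log-spaced alpha grid
model = make_pipeline(
    StandardScaler(),
    LassoCV(alphas=np.logspace(-4, 1, 100), cv=5, max_iter=50_000),
)
model.fit(X, y)
print("chosen alpha:", model[-1].alpha_)
```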
2 votes

How does Lasso regression shrink coefficients to zero, and why does ridge regression not shrink coefficients to zero?

These diagrams show the "constrained" version of lasso/ridge, in which you minimize the pure loss function subject to a constraint $\|\beta\|_1\leq t$ or $\|\beta\|_2\leq t$. (Another ...
Ben Reiniger • 12.7k
2 votes
Accepted

What is the meaning of the sparsity parameter

When we implement penalized regression models we are saying that we are going to add a penalty to the sum of the squared errors. Recall that the sum of squared errors is the following and that we are ...
Ethan • 1,657
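For reference, the penalized objective this answer is building towards, in the usual lasso notation (the formula itself is cut off in the excerpt, so this is the standard form, not a quote):

$$\min_{\beta_0,\beta}\ \sum_{i=1}^{n}\Bigl(y_i-\beta_0-\sum_{j=1}^{p}x_{ij}\beta_j\Bigr)^2 \;+\; \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert$$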
2 votes

Lasso regression not getting better without random features

In answer to your first question: The reason that your RMSE proceeded to increase as you increased the strength of your regularization (the value of $\lambda$) can be explained by reviewing the ...
Ethan • 1,657
2 votes
Accepted

Do I have to remove features with pairwise correlation even if I am doing a regularized logistic regression?

Yes, the L1 regularization will shrink the irrelevant feature coefficients to zero, and hence it doesn't require feature selection. In fact, it IS a commonly used feature selection technique. So ...
spectre • 2,223
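A sketch of using the L1-penalized model itself as the feature selector, as the answer suggests (toy data and the C value are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

# L1-penalized logistic regression doubles as the feature selector
selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
)
pipe = make_pipeline(StandardScaler(), selector,
                     LogisticRegression(max_iter=1000))
pipe.fit(X, y)
print("features kept:", pipe[1].get_support().sum(), "of", X.shape[1])
```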
2 votes
Accepted

What's the correct cost function for Linear Regression

Interesting question. I'd say it is correct not to divide, due to the following reasoning... For linear regression there is no difference. The optimum of the cost function stays the same, regardless ...
MB-F • 286
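In symbols (standard notation, not quoted from the answer): dividing by $n$ rescales the objective but not its minimizer,

$$\arg\min_{\beta}\ \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - x_i^\top\beta\bigr)^2 \;=\; \arg\min_{\beta}\ \sum_{i=1}^{n}\bigl(y_i - x_i^\top\beta\bigr)^2 .$$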
2 votes
Accepted

Difference between PCA and regularisation

Lasso does feature selection in the sense that a penalty is added to the OLS loss function (see figure below). So you can say that features with low "impact" will be "shrunken" by ...
Peter • 7,866
1 vote

Lack of standardization in Kaggle jupyter notebooks when using lasso/ridge?

Kaggle is a crowd-sourced platform with no quality control. It is to be expected that there will be deviations from best practices.
Brian Spiering
1 vote

How to compare two multivariate methods for filling NAs

You don't at this stage. Train a few models with each method and compare.
lpounng • 1,177
1 vote
Accepted

Why is gridsearchCV.best_estimator_.score giving me r2_score even if I mentioned MAE as my main scoring metric?

This is the default behavior for any Scikit-learn regressor, and as far as I know, it cannot be modified. So for regressors, the score method will return the $R^2$ ...
Multivac • 3,139
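A sketch of the contrast on assumed toy data: the grid-search object scores with its own scoring argument, while best_estimator_.score falls back to the regressor's default $R^2$:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=300, n_features=10, noise=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

grid = GridSearchCV(Lasso(max_iter=50_000),
                    {"alpha": [0.01, 0.1, 1.0]},
                    scoring="neg_mean_absolute_error", cv=5)
grid.fit(X_tr, y_tr)

print(grid.score(X_te, y_te))                   # negative MAE (the grid's scoring)
print(grid.best_estimator_.score(X_te, y_te))   # R^2 (the regressor's default)
print(mean_absolute_error(y_te, grid.predict(X_te)))
```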
1 vote

Is it possible to explain why Lasso models eliminated certain coefficient?

Have a look at "Introduction to Statistical Learning" (Chapter 6.2.2). The Lasso adds an additional penalty term to the original OLS loss. In addition to the residual sum of squares (RSS, ...
Peter • 7,866
1 vote
Accepted

Accessing regression coefficients when using MultiOutputRegressor

Instead of using the estimator attribute you should be using the best_estimator attribute, after which you can access the ...
Oxbowerce • 8,492
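A minimal sketch of pulling per-target coefficients out of a fitted MultiOutputRegressor (toy data assumed; if the regressor sits inside a grid search, go through best_estimator_ first):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = X @ rng.normal(size=(5, 3)) + 0.1 * rng.normal(size=(200, 3))  # 3 targets

model = MultiOutputRegressor(Lasso(alpha=0.01)).fit(X, Y)
for i, est in enumerate(model.estimators_):     # one fitted Lasso per target
    print(f"target {i}: coef_ = {est.coef_}")
```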
1 vote

What (linear) model is it common practice to use on a sample size of 500 with 26 features?

The predictive power of a model is highly contingent on the data generating process and it is ex ante hard to tell what will work best (especially with limited information about the data as in this ...
Peter • 7,866
1 vote

How to set coefficient limit in lasso regression in Python?

Scikit-learn (which I'm assuming you're using) does not allow you to constrain the coefficients in such a way (at most you can constrain them to all be positive with ...
mdgrogan
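The one constraint scikit-learn does expose, sketched on assumed toy data: positive=True forces all lasso coefficients to be non-negative:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([2.0, -3.0, 0.0, 1.0, 0.0]) + 0.1 * rng.normal(size=200)

print(Lasso(alpha=0.1).fit(X, y).coef_)                  # may contain negatives
print(Lasso(alpha=0.1, positive=True).fit(X, y).coef_)   # all coefficients >= 0
```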
1 vote

How to remove features from a sklearn pipeline after it has already been fitted?

The StandardScaler was trained on 30 features, so it expects exactly 30 features. One simple hack is to create a new ...
Uday • 576
1 vote
Accepted

Lasso Regression for Feature Importance saying almost every feature is unimportant?

Change (search over) the penalty parameter of lasso. FinalRevenue = RevenueSoFar is a good baseline "model," but hopefully your other features can ...
Ben Reiniger • 12.7k
1 vote

Interpreting machine learning coefficients

Neural Networks are notoriously good at performance and bad at interpretability, i.e. it's very difficult (almost impossible) to explain why a particular prediction was made. It's even more difficult ...
Erwan • 26.2k
1 vote

What is the meaning of the sparsity parameter

@Ethan is correct about the formulation of the lasso penalty, and I think it's particularly important to understand it in that form (for one thing, because that same penalty can work with other models ...
Ben Reiniger • 12.7k
1 vote

How does Lasso regression shrink coefficients to zero, and why does ridge regression not shrink coefficients to zero?

This StatQuest video does a fantastic job of explaining in simple terms why this is the case.
Oliver Foster
1 vote

regarding lasso.score in lasso modeling using scikit-learn

$R^2$ is a statistical measure of how close the data are to the fitted regression line. It does this by measuring the percentage of the variance of the dependent variable that's explained by the independent variables. ...
prashant0598 • 1,561
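A sketch (toy data assumed) showing that lasso.score is just $R^2$ computed on the data you pass in:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=150, n_features=8, noise=10, random_state=0)
lasso = Lasso(alpha=0.5).fit(X, y)

print(lasso.score(X, y))                # score method of the fitted Lasso
print(r2_score(y, lasso.predict(X)))    # identical value, computed explicitly
```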
1 vote

How do standardization and normalization impact the coefficients of linear models?

I believe that with scaling the coefficients are scaled by the same factor, i.e. by the standard deviation with standardization and by (max − min) with normalization. If we look at all the features individually, we ...
10xAI • 5,839
1 vote
Accepted

How is learning rate calculated in sklearn Lasso regression?

With sklearn you have two approaches for linear regression: 1) the LinearRegression object uses the Ordinary Least Squares (OLS) solver from scipy, as the learning rate (...
Carlos Mougan
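A sketch of the usual contrast here, under the assumption that the truncated answer goes on to compare solvers: sklearn's Lasso uses coordinate descent, which has no learning rate, whereas SGDRegressor with an L1 penalty is the variant that exposes one:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, SGDRegressor

X, y = make_regression(n_samples=200, n_features=10, noise=5, random_state=0)

lasso = Lasso(alpha=0.1).fit(X, y)           # coordinate descent: no learning rate
sgd = SGDRegressor(penalty="l1", alpha=0.1,
                   learning_rate="invscaling",
                   eta0=0.01).fit(X, y)      # gradient steps: eta0 sets the step size

print(lasso.coef_)
print(sgd.coef_)                             # rougher fit; SGD also likes scaled features
```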
1 vote
Accepted

When should we start using stacking of models?

Stacking is going to help most when individual models capture unique characteristics of the data. It is often the case that different architectures perform similarly, if somewhat differently, on the ...
HEITZ • 911
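A minimal stacking sketch (the base estimators are illustrative choices, not a recommendation): it tends to help most when the base models make different kinds of errors:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LassoCV, Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=400, n_features=20, noise=15, random_state=0)

# A linear base model plus a tree ensemble, blended by a ridge meta-learner
stack = StackingRegressor(
    estimators=[("lasso", LassoCV(cv=5)),
                ("rf", RandomForestRegressor(random_state=0))],
    final_estimator=Ridge(),
)
print(cross_val_score(stack, X, y, cv=5).mean())
```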
1 vote

LASSO remaining features for different penalisation

Lambda is a tuning parameter ("how much regularisation"; I think it is called alpha in sklearn) and you would choose lambda so that you optimise fit (e.g. by MSE). You can do this by running cross ...
Peter • 7,866

Only top scored, non community-wiki answers of a minimum length are eligible