6
votes
For a square matrix of data, I achieve $R^2=1$ for Linear Regression and $R^2=0$ for Lasso. What's the intuition behind this?
A few things going on here:
Your matrix is 100x100. So you have no degrees of freedom left in a linear model, which will cause $R^2=1$. See this post.
You use random numbers. Thus, they should make ...
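A minimal sketch of the setup described above, using synthetic data (not the asker's; the alpha value is an assumption):

```python
# With as many random features as samples, OLS can interpolate the
# training data exactly, while Lasso's penalty keeps the coefficients
# at zero on pure noise, leaving only the intercept (hence R^2 = 0).
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 100))  # square data matrix: n = p = 100
y = rng.normal(size=100)         # random target, unrelated to X

print(LinearRegression().fit(X, y).score(X, y))  # ~1.0 (perfect interpolation)
print(Lasso(alpha=1.0).fit(X, y).score(X, y))    # ~0.0 (all coefficients shrunk to 0)
```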
4
votes
Why does Lasso behave "erratically" when the number of features is greater than the number of training instances?
When p > n, the LASSO model can retain at most n variables (this can be proven using linear algebra, via the rank of the data matrix in particular), leaving at least p - n variables out (some that ...
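A quick illustration of that bound on my own synthetic example (alpha chosen arbitrarily, not from the answer):

```python
# With p = 200 features but only n = 50 samples, the exact lasso
# solution can have at most n non-zero coefficients.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

coef = Lasso(alpha=0.05, max_iter=50_000).fit(X, y).coef_
print(np.sum(coef != 0))  # <= 50, even though there are 200 features
```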
4
votes
Normalisation results in R^2 score of 0 - Lasso regression
Standardizing/normalizing is generally the right thing to do, but it will make little/no difference with just one independent variable if you also adjust the regularization strength.
With more than ...
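A sketch of the single-variable case claimed above (my synthetic data; the factor-of-std rescaling of alpha follows sklearn's Lasso objective):

```python
# With one feature, standardizing the feature and rescaling alpha by
# the feature's standard deviation yields identical predictions.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
x = rng.normal(loc=5, scale=3, size=(200, 1))
y = 2 * x[:, 0] + rng.normal(size=200)

s = x.std()
m1 = Lasso(alpha=0.1).fit(x, y)
m2 = Lasso(alpha=0.1 / s).fit((x - x.mean()) / s, y)  # standardized feature

diff = np.abs(m1.predict(x) - m2.predict((x - x.mean()) / s)).max()
print(diff)  # ~0: same fitted model in both parameterizations
```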
3
votes
Group lasso and feature selection
Presumably, you need a sparse group logistic regression model to perform feature selection while considering the binary response.
skglm is a new modular, scikit-...
3
votes
Accepted
Elegant way to plot the L2 regularization path of logistic regression in python?
sklearn has such a functionality already for regression problems, in enet_path and lasso_path. There's an example notebook here....
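Since enet_path and lasso_path cover the regression case, a hedged sketch of the analogous manual approach for logistic regression (dataset and grid are my choices, not the answer's):

```python
# Refit an L2-penalized logistic regression over a grid of C values
# and plot each coefficient against the regularization strength.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

Cs = np.logspace(-4, 2, 30)
coefs = [LogisticRegression(C=C, penalty="l2", max_iter=5000).fit(X, y).coef_.ravel()
         for C in Cs]

plt.semilogx(Cs, coefs)  # one line per coefficient
plt.xlabel("C (inverse regularization strength)")
plt.ylabel("coefficient value")
plt.show()
```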
3
votes
How do standardization and normalization impact the coefficients of linear models?
When you have a linear regression (without any scaling, just plain numbers) and you have a model with one explanatory variable $x$ and coefficients $\beta_0=0$ and $\beta_1=1$, then you essentially ...
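A minimal sketch of that effect (synthetic data, my construction):

```python
# Standardizing a feature rescales its OLS coefficient by the
# feature's standard deviation: here beta goes from 1 to ~std(x).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.normal(scale=4, size=(500, 1))
y = 1.0 * x[:, 0]  # beta_0 = 0, beta_1 = 1, as in the answer

raw = LinearRegression().fit(x, y).coef_[0]            # ~1.0
std = LinearRegression().fit(x / x.std(), y).coef_[0]  # ~std(x), i.e. ~4.0
print(raw, std)
```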
3
votes
Accepted
Need advice regarding cross-validation to obtain the optimal lambda in Lasso
Welcome to DS.SE @h_ihkam!
So how can I decide on the search range? What is the best practice? Please provide me with some guidance.
Good questions!
Choosing the Optimal Lambda in LASSO Using ...
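A hedged sketch of the usual practice the answer builds toward (synthetic data; grid size and fold count are assumptions):

```python
# LassoCV searches a log-spaced alpha grid (derived from the data)
# via k-fold cross-validation instead of hand-picking lambda.
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=200, n_features=50, noise=5.0, random_state=0)

model = LassoCV(n_alphas=100, cv=5).fit(X, y)
print(model.alpha_)  # the lambda chosen by cross-validation
```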
2
votes
How does Lasso regression shrink coefficients to zero, and why does ridge regression not shrink coefficients to zero?
These diagrams show the "constrained" version of lasso/ridge, in which you minimize the pure loss function subject to a constraint $\|\beta\|_1\leq t$ or $\|\beta\|_2\leq t$. (Another ...
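For reference, the two equivalent formulations being contrasted (reconstructed in standard notation, not quoted from the answer):

```latex
% Constrained form (what the diagrams show):
\min_{\beta} \; \|y - X\beta\|_2^2
\quad \text{subject to} \quad
\|\beta\|_1 \le t \;\;(\text{lasso})
\quad \text{or} \quad
\|\beta\|_2^2 \le t \;\;(\text{ridge})

% Equivalent penalized (Lagrangian) form:
\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1
\qquad \text{or} \qquad
\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2
```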
2
votes
Accepted
What is the meaning of the sparsity parameter
When we implement penalized regression models we are saying that we are going to add a penalty to the sum of the squared errors.
Recall that the sum of squared errors is the following and that we are ...
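The excerpt is truncated before the formula; the standard objective it sets up looks like the following (a reconstruction in the usual lasso notation, not the answer's exact formula):

```latex
\text{SSE} = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2,
\qquad
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \; \text{SSE} + \lambda \sum_{j=1}^{p} |\beta_j|
```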
2
votes
Lasso regression not getting better without random features
In answer to your first question:
The reason that your RMSE proceeded to increase as you increased the strength of your regularization (the value of $\lambda$) can be explained by reviewing the ...
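A sketch of how one could see this effect directly (my synthetic data and alpha grid, not the asker's setup):

```python
# When the unpenalized model is already near-optimal, increasing
# lambda (alpha in sklearn) only biases the coefficients, so the
# cross-validated RMSE rises with the regularization strength.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:
    rmse = -cross_val_score(Lasso(alpha=alpha), X, y,
                            scoring="neg_root_mean_squared_error", cv=5).mean()
    print(alpha, round(rmse, 2))
```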
2
votes
Accepted
Do I have to remove features with pairwise correlation even if I am doing a regularized logistic regression?
Yes, the L1 regularization will shrink the irrelevant feature coefficients to zero, hence it doesn't require a separate feature-selection step. In fact it IS a commonly used feature selection technique. So ...
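A minimal sketch of L1-based selection (synthetic data; the C value is an assumption):

```python
# An L1-penalized logistic regression zeroes out weak features;
# SelectFromModel then keeps only the surviving columns.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=30, n_informative=5,
                           random_state=0)

l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
selector = SelectFromModel(l1, prefit=True)
print(selector.transform(X).shape)  # far fewer than 30 columns survive
```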
2
votes
Accepted
What's the correct cost function for Linear Regression
Interesting question. I'd say it is correct not to divide, due to the following reasoning...
For linear regression there is no difference. The optimum of the cost function stays the same, regardless ...
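A one-line way to see it (my notation): dividing the cost by a positive constant rescales the objective but does not move its minimizer.

```latex
\arg\min_{\theta} \; \frac{1}{2n} \sum_{i=1}^{n} \big( y_i - \hat{y}_i(\theta) \big)^2
\;=\;
\arg\min_{\theta} \; \sum_{i=1}^{n} \big( y_i - \hat{y}_i(\theta) \big)^2
```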
2
votes
Accepted
Difference between PCA and regularisation
Lasso does feature selection in the way that a penalty is added to the OLS loss function (see figure below). So you can say that features with low "impact" will be "shrunken" by ...
1
vote
Lack of standardization in Kaggle jupyter notebooks when using lasso/ridge?
Kaggle is a crowd-sourced platform with no quality control. It is to be expected that there will be deviations from best practices.
1
vote
How to compare between two methods of multivariate to filling NA
You don't at this stage. Train a few models with each method and compare.
1
vote
Accepted
Why is gridsearchCV.best_estimator_.score giving me r2_score even if I mentioned MAE as my main scoring metric?
This is the default behavior for any Scikit-learn regressor, and as far as I know, it cannot be modified.
So for regressors, the score method will return the $R^2$ ...
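A sketch of the distinction (synthetic data and grid are my choices):

```python
# best_estimator_.score falls back to the regressor's default R^2,
# while grid.score honors the MAE scorer passed to GridSearchCV.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

grid = GridSearchCV(Lasso(), {"alpha": [0.01, 0.1, 1.0]},
                    scoring="neg_mean_absolute_error", cv=5).fit(X, y)

print(grid.best_estimator_.score(X, y))  # R^2 (default regressor score)
print(grid.score(X, y))                  # negative MAE (uses the scorer)
```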
1
vote
Is it possible to explain why Lasso models eliminated certain coefficient?
Have a look at "Introduction to Statistical Learning" (Chapter 6.2.2). The Lasso adds an additional penalty term to the original OLS loss. In addition to the residual sum of squares (RSS, ...
1
vote
Accepted
Accessing regression coefficients when using MultiOutputRegressor
Instead of using the estimator attribute you should be using the best_estimator attribute, after which you can access the ...
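A sketch assuming a plain (non-grid-searched) MultiOutputRegressor; the fitted per-target models live in the estimators_ attribute:

```python
# One fitted Lasso per target column of y; each exposes its own coef_.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.multioutput import MultiOutputRegressor

X, y = make_regression(n_samples=200, n_features=5, n_targets=3, random_state=0)

model = MultiOutputRegressor(Lasso(alpha=0.1)).fit(X, y)
for i, est in enumerate(model.estimators_):
    print(i, est.coef_)
```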
1
vote
What (linear) model is common practice to use on sample size of 500 with 26 features?
The predictive power of a model is highly contingent on the data generating process and it is ex ante hard to tell what will work best (especially with limited information about the data as in this ...
1
vote
How to set coefficient limit in lasso regression in Python?
Scikit-learn (which I'm assuming you're using) does not allow you to constrain the coefficients in such a way (at most you can constrain them to all be positive with ...
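A sketch of the one constraint sklearn does expose (synthetic data, my alpha):

```python
# positive=True forces all lasso coefficients to be non-negative.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

coef = Lasso(alpha=0.1, positive=True).fit(X, y).coef_
print(np.all(coef >= 0))  # True
```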
1
vote
How to remove features from a sklearn pipeline after it has already been fitted?
StandardScaler was trained on 30 features, so it expects exactly 30 features. One simple hack you can do is, you can create a new ...
1
vote
Accepted
Lasso Regression for Feature Importance saying almost every feature is unimportant?
Change (search over) the penalty parameter of lasso. FinalRevenue = RevenueSoFar is a good baseline "model," but hopefully your other features can ...
1
vote
Interpreting machine learning coefficients
Neural Networks are notoriously good at performance and bad at interpretability, i.e. it's very difficult (almost impossible) to explain why a particular prediction was made. It's even more difficult ...
1
vote
What is the meaning of the sparsity parameter
@Ethan is correct about the formulation of the lasso penalty, and I think it's particularly important to understand it in that form (for one thing, because that same penalty can work with other models ...
1
vote
How does Lasso regression shrink coefficients to zero, and why does ridge regression not shrink coefficients to zero?
This StatQuest video does a fantastic job of explaining in simple terms why this is the case.
1
vote
regarding lasso.score in lasso modeling using scikit-learn
$R^2$ is a statistical measure of how close the data are to the fitted regression line: it measures the percentage of the variance of the dependent variable that is explained by the independent variable(s).
...
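For reference, the usual definition behind lasso.score (not quoted from the answer):

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
```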
1
vote
How do standardization and normalization impact the coefficients of linear models?
I believe that with scaling, the coefficients are rescaled correspondingly, i.e. multiplied by the standard deviation under standardization and by (max - min) under normalization.
If we look at all the features individually, we ...
1
vote
Accepted
How is learning rate calculated in sklearn Lasso regression?
With sklearn you have two approaches for linear regression:
1) The LinearRegression object uses the Ordinary Least Squares (OLS) solver from scipy, as Learning rate (...
1
vote
Accepted
When should we start using stacking of models?
Stacking is going to help most when individual models capture unique characteristics of the data. It is often the case that different architectures perform similarly, if somewhat differently, on the ...
1
vote
LASSO remaining features for different penalisation
Lambda is a tuning parameter ("how much regularisation"; I think it is called alpha in sklearn) and you would choose lambda so that you optimise fit (e.g. by MSE). You can do this by running cross ...
Only top scored, non community-wiki answers of a minimum length are eligible
Related Tags
lasso × 46
machine-learning × 17
regression × 16
linear-regression × 12
scikit-learn × 11
regularization × 11
ridge-regression × 10
python × 9
feature-selection × 7
r × 4
research × 3
elastic-net × 3
dataset × 2
pca × 2
feature-scaling × 2
sparsity × 2
neural-network × 1
time-series × 1
predictive-modeling × 1
data-mining × 1
data × 1
machine-learning-model × 1
logistic-regression × 1
data-science-model × 1
cross-validation × 1