MemoryError when fitting on sparse X, as apparently a Hessian matrix is being instantiated? #485
Comments
Hi @mathurinm! Unfortunately, glum is not the right package to estimate this type of model. Are you sure your design matrix is correctly specified? You are trying to train a model containing potentially 4.3 million coefficients with fewer than 20k observations (according to the description of the dataset here). You will probably need to transform your design matrix to reduce the number of coefficients before using glum for this purpose. In general, glum is fine-tuned to solve problems involving many observations (potentially millions) with a number of parameters significantly lower than this (a few thousand at most).
I just wanted to leave a comment saying that implementing a solver more appropriate for this purpose would not be a huge undertaking, and 75% of the necessary pieces already exist within glum.
@MarcAntoineSchmidtQC I may be wrong in the way I do it, but I am trying to fit a Lasso. Statistical analysis of the Lasso shows that it recovers the relevant variables even when the number of features is exponential in the number of observations. Sparse solvers such as Celer, scikit-learn, Blitz, or glmnet handle this problem. I saw your impressive benchmark results and tried to reproduce them on large-scale data.
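As a rough illustration of that claim (a minimal sketch on synthetic sparse data, not the data from this report): scikit-learn's coordinate-descent Lasso accepts a sparse design matrix directly and never allocates a dense n_features x n_features matrix.

```python
# Minimal sketch (synthetic data, not from this report): scikit-learn's Lasso
# fits directly on a sparse design matrix without densifying it.
import numpy as np
from scipy import sparse
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = sparse.random(1_000, 50_000, density=1e-3, format="csr", random_state=rng)
y = rng.randn(1_000)

clf = Lasso(alpha=0.1, fit_intercept=False)
clf.fit(X, y)  # memory usage scales with nnz(X), not with n_features ** 2
print((clf.coef_ != 0).sum(), "nonzero coefficients")
```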
Requirement: pip install libsvmdata -- a Python utility to download data from the LIBSVM website; the first time, downloading the data may take about 2 minutes.

The following script causes a MemoryError on my machine:
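(The original script is not preserved in this thread. Below is a hypothetical reconstruction of the kind of call being described, assuming a high-dimensional LIBSVM dataset fetched via libsvmdata and an L1-penalized fit with glum; the dataset name and alpha value are placeholders, not taken from the report.)

```python
# Hypothetical reconstruction (placeholders, not the original script): a
# Lasso-type fit with glum on a sparse LIBSVM dataset where n_samples << n_features.
from glum import GeneralizedLinearRegressor
from libsvmdata import fetch_libsvm

X, y = fetch_libsvm("news20.binary")  # placeholder dataset; returns a sparse CSR matrix

# l1_ratio=1.0 makes the elastic-net penalty a pure L1 (Lasso) penalty.
model = GeneralizedLinearRegressor(family="normal", l1_ratio=1.0, alpha=0.01)
model.fit(X, y)  # raises MemoryError if a dense Hessian over all features is allocated
```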
output:
ping @QB3