
lazy module loading #128

Merged · 17 commits · May 20, 2022

Conversation

@CarloLucibello (Member) commented May 10, 2022

Fixes #126 by using a method similar to FileIO's for lazy loading of modules.
Time-to-first-MNIST goes from 14.5s to 5.5s.

###### MASTER ##############

julia> @time using MLDatasets
 12.421854 seconds (19.82 M allocations: 1.144 GiB, 6.86% gc time, 61.53% compilation time)

julia> @time MNIST()
  1.965113 seconds (1.34 M allocations: 301.964 MiB, 13.25% gc time, 44.99% compilation time)
dataset MNIST:
  metadata    =>    Dict{String, Any} with 3 entries
  split       =>    :train
  features    =>    28×28×60000 Array{Float32, 3}
  targets     =>    60000-element Vector{Int64}

#### THIS PR #####################
julia> @time using MLDatasets
  3.757601 seconds (5.07 M allocations: 293.185 MiB, 5.89% gc time, 66.09% compilation time)

julia> @time MNIST()
  1.653665 seconds (1.33 M allocations: 299.937 MiB, 3.27% gc time, 48.02% compilation time)
dataset MNIST:
  metadata    =>    Dict{String, Any} with 3 entries
  split       =>    :train
  features    =>    28×28×60000 Array{Float32, 3}
  targets     =>    60000-element Vector{Int64}
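For readers unfamiliar with the FileIO-style approach mentioned above, here is a minimal sketch of the idea: check at runtime whether an optional backend package has been loaded, and raise a helpful error if not, so the heavy dependency never has to be imported eagerly. All names here are hypothetical (the PR's actual implementation lives in `src/require.jl`).

```julia
# Hypothetical sketch of a FileIO-style lazy dependency check.
# Loaded packages are registered in Base.loaded_modules, so we can
# test for an optional backend without importing it ourselves.
function require_import(pkg::Symbol)
    any(m -> nameof(m) === pkg, values(Base.loaded_modules)) && return nothing
    error("Add `import $pkg` or `using $pkg` to your code to unlock this functionality.")
end

# A dataset loader that needs an optional backend would call the check
# first, e.g. require_import(:DataFrames) before building a table.
```

The trade-off, discussed in the review below, is that the user must explicitly load the backend before calling the loader, instead of the package loading everything up front.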

@CarloLucibello (Member, Author)

@johnnychen94 I'm running into some world-age issues; see the failing tests. Do you know how to fix them?
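For context (this sketch is not taken from the PR): world-age errors typically arise when a function defined by runtime `eval` — which lazy module loading tends to do — is called from code compiled in an older "world". The standard escape hatch in Julia is `Base.invokelatest`:

```julia
# Sketch of the world-age problem and its usual fix.
function load_and_call()
    # Define a function at runtime, as lazy loading machinery might.
    f = Core.eval(Main, :(lazily_defined() = 42))
    # Calling f() directly here can throw a MethodError, because
    # load_and_call was compiled before the new method existed.
    return Base.invokelatest(f)  # always sees the newest method table
end
```

Whether this applies to the failing tests here depends on where the lazily evaluated code is invoked.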

@codecov-commenter commented May 14, 2022

Codecov Report

Merging #128 (53a6ab7) into master (e3cc061) will decrease coverage by 3.61%.
The diff coverage is 43.47%.

@@            Coverage Diff             @@
##           master     #128      +/-   ##
==========================================
- Coverage   33.26%   29.64%   -3.62%     
==========================================
  Files          38       39       +1     
  Lines        1527     1565      +38     
==========================================
- Hits          508      464      -44     
- Misses       1019     1101      +82     
| Impacted Files | Coverage Δ |
|---|---|
| src/MLDatasets.jl | 100.00% <ø> (ø) |
| src/containers/tabledataset.jl | 0.00% <0.00%> (-89.29%) ⬇️ |
| src/datasets/graphs/reddit.jl | 4.25% <0.00%> (ø) |
| src/datasets/text/udenglish.jl | 56.86% <ø> (ø) |
| src/datasets/vision/cifar10.jl | 2.40% <0.00%> (-0.03%) ⬇️ |
| src/datasets/vision/cifar100.jl | 2.40% <ø> (ø) |
| src/datasets/vision/emnist.jl | 10.00% <0.00%> (ø) |
| src/datasets/vision/svhn2.jl | 1.96% <0.00%> (ø) |
| src/abstract_datasets.jl | 22.58% <37.50%> (-1.56%) ⬇️ |
| src/require.jl | 37.50% <37.50%> (ø) |

... and 14 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e3cc061...53a6ab7.

@johnnychen94 (Member) left a comment

The overall change looks good to me, although I'm not confident that users want the @require error — I understand that you want to reduce the package loading latency; I'm just unsure if it's a good direction.

That said, if we introduce the "hard" requirement on package loading, then we might need to add some explanation before/after https://juliaml.github.io/MLDatasets.jl/stable/#Basic-Usage to tell users how to properly handle it.

Because the hard requirement will break people's code, this will need a 0.7.0 version bump.

Development

Successfully merging this pull request may close these issues.

using MLDatasets is very slow
3 participants