
lazy module loading #128

Merged · 17 commits · May 20, 2022

Conversation

@CarloLucibello (Member) commented May 10, 2022

Fixes #126 by using a method similar to FileIO's for lazy loading of modules.
Time-to-first-MNIST goes from 14.5s to 5.5s.

###### MASTER ##############

julia> @time using MLDatasets
 12.421854 seconds (19.82 M allocations: 1.144 GiB, 6.86% gc time, 61.53% compilation time)

julia> @time MNIST()
  1.965113 seconds (1.34 M allocations: 301.964 MiB, 13.25% gc time, 44.99% compilation time)
dataset MNIST:
  metadata    =>    Dict{String, Any} with 3 entries
  split       =>    :train
  features    =>    28×28×60000 Array{Float32, 3}
  targets     =>    60000-element Vector{Int64}

#### THIS PR #####################
julia> @time using MLDatasets
  3.757601 seconds (5.07 M allocations: 293.185 MiB, 5.89% gc time, 66.09% compilation time)

julia> @time MNIST()
  1.653665 seconds (1.33 M allocations: 299.937 MiB, 3.27% gc time, 48.02% compilation time)
dataset MNIST:
  metadata    =>    Dict{String, Any} with 3 entries
  split       =>    :train
  features    =>    28×28×60000 Array{Float32, 3}
  targets     =>    60000-element Vector{Int64}
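For readers unfamiliar with the FileIO-style approach mentioned above, here is a minimal sketch of the idea: check at runtime whether an optional backend package has been loaded, and raise a helpful error if not, so the heavy dependency never has to be imported eagerly. All names here are hypothetical (the PR's actual implementation lives in `src/require.jl`).

```julia
# Hypothetical sketch of a FileIO-style lazy dependency check.
# Loaded packages are registered in Base.loaded_modules, so we can
# test for an optional backend without importing it ourselves.
function require_import(pkg::Symbol)
    any(m -> nameof(m) === pkg, values(Base.loaded_modules)) && return nothing
    error("Add `import $pkg` or `using $pkg` to your code to unlock this functionality.")
end

# A dataset loader that needs an optional backend would call the check
# first, e.g. require_import(:DataFrames) before building a table.
```

The trade-off, discussed in the review below, is that the user must explicitly load the backend before calling the loader, instead of the package loading everything up front.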

@CarloLucibello (Member, Author)

@johnnychen94 I'm running into some world-age issues; see the failing tests. Do you know how to fix them?
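For context (this sketch is not taken from the PR): world-age errors typically arise when a function defined by runtime `eval` — which lazy module loading tends to do — is called from code compiled in an older "world". The standard escape hatch in Julia is `Base.invokelatest`:

```julia
# Sketch of the world-age problem and its usual fix.
function load_and_call()
    # Define a function at runtime, as lazy loading machinery might.
    f = Core.eval(Main, :(lazily_defined() = 42))
    # Calling f() directly here can throw a MethodError, because
    # load_and_call was compiled before the new method existed.
    return Base.invokelatest(f)  # always sees the newest method table
end
```

Whether this applies to the failing tests here depends on where the lazily evaluated code is invoked.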

@codecov-commenter commented May 14, 2022

Codecov Report

Merging #128 (53a6ab7) into master (e3cc061) will decrease coverage by 3.61%.
The diff coverage is 43.47%.

@@            Coverage Diff             @@
##           master     #128      +/-   ##
==========================================
- Coverage   33.26%   29.64%   -3.62%     
==========================================
  Files          38       39       +1     
  Lines        1527     1565      +38     
==========================================
- Hits          508      464      -44     
- Misses       1019     1101      +82     
| Impacted Files | Coverage Δ |
|---|---|
| src/MLDatasets.jl | 100.00% <ø> (ø) |
| src/containers/tabledataset.jl | 0.00% <0.00%> (-89.29%) ⬇️ |
| src/datasets/graphs/reddit.jl | 4.25% <0.00%> (ø) |
| src/datasets/text/udenglish.jl | 56.86% <ø> (ø) |
| src/datasets/vision/cifar10.jl | 2.40% <0.00%> (-0.03%) ⬇️ |
| src/datasets/vision/cifar100.jl | 2.40% <ø> (ø) |
| src/datasets/vision/emnist.jl | 10.00% <0.00%> (ø) |
| src/datasets/vision/svhn2.jl | 1.96% <0.00%> (ø) |
| src/abstract_datasets.jl | 22.58% <37.50%> (-1.56%) ⬇️ |
| src/require.jl | 37.50% <37.50%> (ø) |

... and 14 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e3cc061...53a6ab7.

@johnnychen94 (Member) left a comment

The overall change looks good to me, although I'm not confident that users want the @require error — I understand that you want to reduce the package loading latency; I'm just unsure if it's a good direction.

That said, if we introduce the "hard" requirement on package loading, then we might need to add some explanation before/after https://juliaml.github.io/MLDatasets.jl/stable/#Basic-Usage to tell users how to properly handle it.

Because the hard requirement will break people's code, this will need a 0.7.0 version bump.

Development

Successfully merging this pull request may close these issues.

using MLDatasets is very slow
3 participants