Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Splitting functionality into different Packages #4

Open
favba opened this issue Nov 14, 2024 · 5 comments
Open

Splitting functionality into different Packages #4

favba opened this issue Nov 14, 2024 · 5 comments

Comments

@favba
Copy link

favba commented Nov 14, 2024

Hi there!

The SmallVector provided by this package is something that I needed a while ago.
I didn't find your package back then and ended up reproducing the functionality here.
I remember I looked into every package in the JuliaArrays organization looking for this.
I just found it through a comment of yours in a post in the Julia discourse forum.

Maybe I just didn't do enough research, but I thought it might be a good idea to split the functionality provided by this package into separate packages (e.g., SmallVectors.jl, SmallDicts.jl, etc.) and make this one a meta package that exports all of them.
Then SmallVectors could live in the JuliaArray organization and maybe improve its discoverability.

@matthias314
Copy link
Owner

ended up reproducing the functionality

Interesting! Any lessons learnt that you can share? For example, I've seen that you use UInt8 to store the vector length. While I'm currently using Int, I've been wondering if some other type (say, UInt8 or Int16) would be better (i.e., saving memory without affecting performance).

split the functionality provided by this package into separate packages. [...] Then SmallVectors could live in the JuliaArray organization and maybe improve its discoverability.

I'm almost done with adding several new types (FixedVector, SmallDict, SmallSet plus their mutable counterparts including MutableSmallVector, see the dev documentation). Once this is done, I want to announce the package on Discourse. I hope that will make it better known.

@mbauman I'm pinging you as a member of @JuliaArrays. What would be your recommendation about splitting this package and/or putting it under the @JuliaArrays umbrella? I don't have any experience with this kind of question.

@favba
Copy link
Author

favba commented Nov 18, 2024

Interesting! Any lessons learnt that you can share? For example, I've seen that you use UInt8 to store the vector length. While I'm currently using Int, I've been wondering if some other type (say, UInt8 or Int16) would be better (i.e., saving memory without affecting performance).

Since UInt8 uses only 1 byte, it gives "Julia" more freedom when padding the ImmutableVector data, instead of being constrained by the 8 bytes from a Int field:

julia> using ImmutableVectors, SmallCollections

julia> im = ImmutableVector{7,Float16}(1,2,3,4)
4-element ImmutableVector{7, Float16}:
 1.0
 2.0
 3.0
 4.0

julia> sm = SmallVector{7,Float16}(im)
4-element SmallVector{7, Float16}:
 1.0
 2.0
 3.0
 4.0

julia> sizeof(im)
16

julia> sizeof(sm)
24

julia> imb = ImmutableVector{9,Bool}(1,0,0,0,1)
5-element ImmutableVector{9, Bool}:
 1
 0
 0
 0
 1

julia> smb = SmallVector{9,Bool}(imb)
5-element SmallVector{9, Bool}:
 1
 0
 0
 0
 1

julia> sizeof(imb)
10

julia> sizeof(smb)
24

julia> struct my_struct
           a::Float16
           l::Bool
       end

julia> ims = ImmutableVector{7,my_struct}(my_struct(1,1))
1-element ImmutableVector{7, my_struct}:
 my_struct(Float16(1.0), true)

julia> sms = SmallVector{7,my_struct}(ims)
1-element SmallVector{7, my_struct}:
 my_struct(Float16(1.0), true)

julia> sizeof(ims)
30

julia> sizeof(sms)
40

The difference in size only goes away when using elements of 8 bytes (in 64bit systems):

julia> im64 = ImmutableVector{6}(1.0,2.0,3.0)
3-element ImmutableVector{6, Float64}:
 1.0
 2.0
 3.0

julia> sm64 = SmallVector{6}(im64)
3-element SmallVector{6, Float64}:
 1.0
 2.0
 3.0

julia> sizeof(im64)
56

julia> sizeof(sm64)
56

However, an obvious disadvantage of using UInt8 is that you now have a hard definition of what "Small" means. It means < 256, which was more than necessary for my use case.
But I think it is a reasonable constraint, If I'm not mistaken it is around that tuple length that things start working strangely for tuples (like it starts getting heap allocated, or functions not getting inlined, etc, but I'm not sure of that).

If I have time later I'll try including the ImmutableVector in the benchmark you have to see if there is any difference in performance. And maybe include some benchmark for my typical usage of such vectors (constructing them in hot loops)

@matthias314
Copy link
Owner

In the meantime I added a type SmallLength for the length field of a SmallVector or MutableSmallVector, see 332135d in master. It's currently set to Int16 because on the computers I tried UInt8 was somewhat slower and Int8 might be too small. You can play around with it and see if you get similar results.

@favba
Copy link
Author

favba commented Nov 19, 2024

Cool, I'll try the master branch out.
It also occurred to me that another option is to make this a user choice, by parameterizing the n field.
I believe maintaining the same API would still be possible by defining a type alias.
Something like:

struct ImmutableVector{N,T,TL<:Integer} <: AbstractSmallVector{N,T}
    b::Values{N,T}
    n::TL
end

const SmallVector{N,T} = ImmutableVector{N, T, SmallLength}
export SmallVector

@matthias314
Copy link
Owner

parameterizing the n field

That looks to complicated to me for such a minor aspect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants