RFC: marking private and public APIs #48772

Closed
@Roger-luo

This is based on #42117, but it is a somewhat different proposal that might be breaking for Julia 1.x. Because it rests on different considerations, I decided to open a separate issue to make things easier to read.

In short, I'd like a similar macro/keyword that marks an object as public, with the following semantics:

  • public names are accessible via <module>.<name>, or via <name> after being explicitly imported with import/using
  • unmarked names are considered private, which means they cannot be accessed from outside the module at all

The syntax may vary depending on compatibility concerns, e.g.:

Keep things non-breaking

Because we currently do not hide things by default, we will need a marker for the things we want to make private:

@public <something>
@private <something>
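As a rough illustration of the non-breaking variant, a marker macro could simply record names in a registry that tooling (a linter, a doc generator) consults later. Everything in this sketch is hypothetical: `VisibilitySketch`, `PUBLIC_NAMES`, `mark_public!` and `ispublic_sketch` are names invented for illustration, and nothing here enforces privacy the way the proposal would.

```julia
module VisibilitySketch

# Hypothetical registry: module => set of names marked public.
const PUBLIC_NAMES = Dict{Module,Set{Symbol}}()

# Record `name` as public for module `m`.
function mark_public!(m::Module, name::Symbol)
    push!(get!(() -> Set{Symbol}(), PUBLIC_NAMES, m), name)
    return nothing
end

# Hypothetical `@public name`: registers `name` for the module
# in which the macro is called.
macro public(name::Symbol)
    :(mark_public!($__module__, $(QuoteNode(name))))
end

# Query used by tooling; unmarked names count as private.
ispublic_sketch(m::Module, name::Symbol) =
    name in get(PUBLIC_NAMES, m, Set{Symbol}())

end # module VisibilitySketch

module Demo
using ..VisibilitySketch: @public

@public greet
greet() = "hello"
helper() = "internal"   # unmarked, so treated as private by the sketch
end # module Demo
```

With this in place, `VisibilitySketch.ispublic_sketch(Demo, :greet)` is true and the same query for `:helper` is false, although `Demo.helper()` remains callable, which is exactly the gap the proposal wants the language to close.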

Breaking but simpler

If we choose something more breaking, it could be a public/export keyword combined with #39235, where export/public would simply mean "publicly accessible API", and everything else would not be directly accessible from outside the module.

On the other hand, names brought in by using XXX and import XXX need to be private by default unless marked with export/public, so that we can prevent someone from using a long module path to reach deep dependencies of a package (see point 2 in Why).
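To make the concern concrete, here is a small sketch of the current behavior using toy modules (`Dep`, `A`, `B`, `solve` and `_internal` are all made-up names). A dependency that `B` merely uses is reachable from outside through `B`'s module path, including its unexported internals.

```julia
module Dep
export solve
solve(x) = 2x
_internal(x) = x + 1    # not exported, intended as private
end # module Dep

module A
module B
using ...Dep            # brings `Dep` and its export `solve` into A.B
end # module B
end # module A

# Both of these work today, even though A.B never re-exported anything:
A.B.solve(2)            # == 4
A.B.Dep._internal(1)    # == 2
```

Under the breaking variant, both accesses would presumably be rejected unless `B` explicitly marked the names as public.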

Additionally

I'd also like a macro that marks the stability status of an API. This is something I find quite nice in the Rust community, where they have things like

#[cfg_attr(not(test), rustc_diagnostic_item = "IpAddr")]
#[stable(feature = "ip_addr", since = "1.7.0")]
#[derive(Copy, Clone, Eq, PartialEq, Hash, PartialOrd, Ord)]
pub enum IpAddr {
    /// An IPv4 address.
    #[stable(feature = "ip_addr", since = "1.7.0")]
    V4(#[stable(feature = "ip_addr", since = "1.7.0")] Ipv4Addr),
    /// An IPv6 address.
    #[stable(feature = "ip_addr", since = "1.7.0")]
    V6(#[stable(feature = "ip_addr", since = "1.7.0")] Ipv6Addr),
}

It marks the stability of an item programmatically, at the same place where the item is defined, so that a linter can warn users based on their current toolchain version.

In Julia, we only have a poor docstring saying "use at your own risk", which I think could be improved by this. Adding it shouldn't be breaking: it could be a single macro that marks the availability and stability of struct fields and function arguments. This could make experimental features easier to provide and easier to play with downstream.
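A minimal sketch of what such a marker might look like in Julia, loosely modeled on the Rust attribute above. The macro name `@stability`, the `Stability` record and the registry are assumptions made for illustration, not an existing API, and this version only handles plain function definitions rather than struct fields or arguments.

```julia
module StabilitySketch

# Hypothetical record, loosely mirroring Rust's `stable(feature, since)`.
struct Stability
    status::Symbol         # e.g. :stable or :unstable
    since::VersionNumber   # version in which this status took effect
end

const REGISTRY = Dict{Tuple{Module,Symbol},Stability}()

function record!(m::Module, name::Symbol, status::Symbol, since::VersionNumber)
    REGISTRY[(m, name)] = Stability(status, since)
    return nothing
end

# Pull the function name out of `function f(...) ... end` or `f(...) = ...`.
# Only unqualified names are supported in this sketch.
function defname(ex::Expr)
    if ex.head in (:function, :(=))
        sig = ex.args[1]
        while sig isa Expr && sig.head in (:where, :(::))
            sig = sig.args[1]
        end
        if sig isa Expr && sig.head === :call && sig.args[1] isa Symbol
            return sig.args[1]
        end
    end
    error("@stability sketch only supports plain function definitions")
end

# Hypothetical usage: @stability :unstable "1.2.0" function f(x) ... end
macro stability(status::QuoteNode, since::String, def::Expr)
    name = defname(def)
    quote
        $(esc(def))   # define the function in the caller's module
        record!($__module__, $(QuoteNode(name)), $status, VersionNumber($since))
    end
end

# What a linter or doc generator might query; `nothing` means unmarked.
stability(m::Module, name::Symbol) = get(REGISTRY, (m, name), nothing)

end # module StabilitySketch

module Demo
using ..StabilitySketch: @stability

@stability :stable "1.0.0" function area(r)
    pi * r^2
end

@stability :unstable "1.2.0" function fast_area(r)
    3 * r^2
end
end # module Demo
```

A tool could then call `StabilitySketch.stability(Demo, :fast_area)` and warn when the returned status is `:unstable`, or when `since` is newer than the version the user targets.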

Why?

#42117 overlaps with this proposal, so the reasons @DilumAluthge listed also apply here. I'd like to provide a few other motivations, though:

  1. Potential reduction in pkgimage/cache/sysimg size. Many functions in a package are used by the package itself rather than by downstream users, so in principle the methods of functions that are not used downstream could be deleted from the downstream package cache/sysimg. We currently cannot do that because users are allowed to access them through a deep chain of module paths (e.g. A.B.C.<a private function>). I think quite a few AOT languages have a similar mechanism to mark things private so that they can be tree-shaken away. I'm not an expert on package caches or system images, but I think this might be low-hanging fruit that could be picked by changing the semantics a bit, so please correct me if this is not something that can be improved in Julia's case.

A demonstration of this would be automatically resolving the issue that https://expronicon.rogerluo.dev/intro/bootstrap and https://github.com/thautwarm/DevOnly.jl are trying to solve:

MLStyle is an extreme example of this. What @match generates depends only on Base, yet downstream packages still have to load MLStyle, solely because users are allowed to access MLStyle's @match via AAA.BBB.CCC.@match if CCC contains using MLStyle: @match. One can get a rough estimate of the loading time improvement in this extreme case:

julia> @time using MLStyle
  0.041998 seconds (148.07 k allocations: 10.553 MiB)

julia> @time using ExproniconLite
  0.020682 seconds (53.83 k allocations: 3.991 MiB)

Here you can see that, even though Expronicon depends on MLStyle, removing MLStyle from loading entirely makes loading about twice as fast as loading MLStyle itself.

  2. Preventing downstream users from hacking on unstable things. This is more or less a consequence of 1, but I want to argue it has advantages of its own. One example is the non-public APIs in Base: we currently have a very vague way of distinguishing them, namely whether the function has a docstring or not. IMO even functions made only for developers deserve a docstring in the dev docs. If we had a marker to distinguish such APIs, the reference pages for the manual and the dev docs could be generated automatically from the corresponding functions. And for APIs like Base.print_matrix, it would be clearer to people whether they should use them and whether they may break in future versions.

But I'd like to mention one real-world example, which is the usage of DiffEq. Most of the time one uses only one ODE solver from that giant package, so in principle a downstream package should load only that solver's code rather than the whole thing. But because you are currently allowed to access other solvers through something like MyPackage.DiffEq.OrdinaryDiffEq.Vern8, we have to load the whole thing, which is super slow.

Ideally, the compiler would cache the corresponding solver code in the downstream package image and load only that piece on using MyPackage, without an explicit using DiffEq. But I think this is not allowed, because users can technically write MyPackage.DiffEq to access anything inside DiffEq.
