feat: add lazy loading #81

Merged
merged 8 commits into from
Apr 15, 2025

@serramatutu serramatutu commented Apr 14, 2025

Summary

This PR introduces new functionality to allow models to lazy load certain large fields if users want to. This is only used in Metric.dimensions, Metric.entities and Metric.measures for now, but can easily be expanded to other fields in the future if we deem necessary.

There are no breaking changes.

End-user API

This is what it looks like from an end user perspective:

from dbtsl import SemanticLayerClient

def main():
    sl = SemanticLayerClient(..., lazy=True)
    with sl.session():
        metrics = sl.metrics()
        metric = metrics[0]
        assert metric.dimensions == []
        loaded = metric.load_dimensions()
        assert len(loaded) > 0
        assert loaded == metric.dimensions
        
main()

The asyncio equivalent is:

import asyncio
from dbtsl.asyncio import AsyncSemanticLayerClient

async def main():
    sl = AsyncSemanticLayerClient(..., lazy=True)
    async with sl.session():
        metrics = await sl.metrics()
        metric = metrics[0]
        assert metric.dimensions == []
        loaded = await metric.load_dimensions()
        assert len(loaded) > 0
        assert loaded == metric.dimensions
        
asyncio.run(main())

Base implementation

The bulk of the work happened in base.py, and metric.py uses the new functionality in the Metric model (more on that later). Other changes are tests and "plumbing" of the lazy parameter, which needs to get passed around.

I added a _lazy_loadable_fields property that is set on each subclass of GraphQLModelMixin in GraphQLModelMixin._register_subclasses(). When registering a new subclass, it iterates over all fields in that subclass and determines whether each field is lazy-loadable. It makes that decision by:

  • Checking if the field has NOT_LAZY in the metadata. If that's true, the field is not added to _lazy_loadable_fields.
  • If the field is not Optional[...], Union[...] or List[...], then it's also considered not lazy-loadable.
  • If the inner argument of the type annotation is also a GraphQLModelMixin, then it's considered lazy-loadable and is added to _lazy_loadable_fields.
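
The three checks above can be sketched roughly as follows. This is a minimal illustration and not the code in base.py: `ModelBase` stands in for GraphQLModelMixin, and the `NOT_LAZY` metadata key and the `is_lazy_loadable` helper are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional, Union, get_args, get_origin

# Hypothetical metadata key; the real marker lives in the SDK.
NOT_LAZY = "not_lazy"


class ModelBase:
    """Placeholder for GraphQLModelMixin (assumption for this sketch)."""


def is_lazy_loadable(f) -> bool:
    # 1. Fields explicitly tagged NOT_LAZY are never lazy.
    if f.metadata.get(NOT_LAZY, False):
        return False
    # 2. Only Optional[...], Union[...] and List[...] annotations qualify.
    #    typing.Optional[X] is Union[X, None], so its origin is Union.
    if get_origin(f.type) not in (list, Union):
        return False
    # 3. At least one inner argument must itself be a model class.
    inner = [a for a in get_args(f.type) if a is not type(None)]
    return any(isinstance(a, type) and issubclass(a, ModelBase) for a in inner)


@dataclass
class Dimension(ModelBase):
    name: str


@dataclass
class Metric(ModelBase):
    name: str
    dimensions: List[Dimension] = field(default_factory=list)
    owner: Optional[Dimension] = field(default=None, metadata={NOT_LAZY: True})


lazy_fields = [f.name for f in fields(Metric) if is_lazy_loadable(f)]
```

Here `dimensions` is detected as lazy-loadable, `name` is skipped because it is a plain `str`, and `owner` is skipped because of the opt-out tag.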

This makes it possible for us to tag "light" fields as NOT_LAZY [1], like I did for saved queries. In that case, I believe it's better to just fetch everything at once to minimize round trip time. I did this mainly because otherwise it would be pretty annoying for the user in some cases where our API has multiple levels of nested objects like saved_query.query_params.metrics.name.

Then, I made a minor modification to GraphQLModelMixin.gql_fragments(), which now accepts a lazy: bool param. If lazy=True, it will omit lazy fields from the fragment definition.
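
To illustrate the effect of the lazy flag on the generated fragment, here is a simplified sketch. The `build_fragment` helper and the fragment naming are assumptions made for the example; the real gql_fragments() also tracks fragment dependencies, which is left out here.

```python
from typing import Dict, Optional, Set


def build_fragment(
    type_name: str,
    model_fields: Dict[str, Optional[str]],
    lazy_fields: Set[str],
    lazy: bool,
) -> str:
    """Render a GraphQL fragment, skipping lazy fields when lazy=True.

    model_fields maps a field name to the nested fragment it spreads,
    or to None for scalar fields.
    """
    lines = []
    for name, nested in model_fields.items():
        if lazy and name in lazy_fields:
            continue  # omitted now, fetched later on demand
        if nested is None:
            lines.append(f"  {name}")
        else:
            lines.append(f"  {name} {{ ...{nested} }}")
    body = "\n".join(lines)
    return f"fragment fragment{type_name} on {type_name} {{\n{body}\n}}"


metric_fields = {"name": None, "dimensions": "fragmentDimension"}
eager_text = build_fragment("Metric", metric_fields, {"dimensions"}, lazy=False)
lazy_text = build_fragment("Metric", metric_fields, {"dimensions"}, lazy=True)
```

In this sketch `eager_text` selects both `name` and `dimensions`, while `lazy_text` selects only `name`.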

Finally, GraphQLModelMixin._register_subclasses() will create load_{field}() methods in the object, which wrap a _load_{field_name} method with a sync/async loader that also sets the field after the loader returns. These _load_{field_name} methods are specific to each model implementation, which can now use self._client to make requests.
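
A rough sketch of the sync half of that wrapping, under stated assumptions: `FakeClient`, `fetch_dimensions` and `make_sync_loader` are hypothetical names for illustration, and the real registration happens inside _register_subclasses() rather than in a loop at module level. An async variant would await the underlying loader before setting the field.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


class FakeClient:
    """Hypothetical stand-in for the client stored in `_client`."""

    def fetch_dimensions(self, metric_name: str) -> List[str]:
        return [f"{metric_name}__dim"]


def make_sync_loader(field_name: str) -> Callable[[Any], Any]:
    """Build load_<field>() that calls _load_<field>() and stores the result."""

    def load(self):
        value = getattr(self, f"_load_{field_name}")()
        setattr(self, field_name, value)  # populate the field after loading
        return value

    load.__name__ = f"load_{field_name}"
    return load


@dataclass
class Metric:
    name: str
    dimensions: List[str] = field(default_factory=list)
    _client: Optional[FakeClient] = None

    def _load_dimensions(self) -> List[str]:
        # Model-specific loader: refers back to the client to make the request.
        assert self._client is not None
        return self._client.fetch_dimensions(self.name)


# Attach the public loaders for every lazy field of the model.
for lazy_field in ("dimensions",):
    setattr(Metric, f"load_{lazy_field}", make_sync_loader(lazy_field))

metric = Metric(name="revenue", _client=FakeClient())
before = list(metric.dimensions)
loaded = metric.load_dimensions()
```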

To test all this, I added some tests in test_base.py to make sure that the _lazy_loadable_fields property gets initialized properly for subclasses, and that the GraphQL fragments contain the expected GraphQL text and dependencies depending on whether we're using lazy or not.

I also added some sanity-check tests to help developers catch bad implementations in unit tests instead of having to wait for integration tests to fail with runtime errors. They ensure every lazy-loadable field has a default value (such as an empty list or None) and a corresponding _load_{field} method.
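
A sanity check of this kind could look something like the sketch below. The `check_lazy_fields` helper and the two example models are hypothetical and only illustrate the two conditions; the actual tests in the PR may be structured differently.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import List


def check_lazy_fields(model_cls, lazy_field_names) -> List[str]:
    """Return a list of problems found for the given lazy fields."""
    problems = []
    by_name = {f.name: f for f in fields(model_cls)}
    for name in lazy_field_names:
        f = by_name[name]
        # A lazy field needs a default so the model can be built without it.
        if f.default is MISSING and f.default_factory is MISSING:
            problems.append(f"{model_cls.__name__}.{name} has no default value")
        # A lazy field needs a model-specific loader to fetch it later.
        if not callable(getattr(model_cls, f"_load_{name}", None)):
            problems.append(f"{model_cls.__name__}.{name} has no _load_{name}()")
    return problems


@dataclass
class GoodModel:
    items: List[str] = field(default_factory=list)

    def _load_items(self) -> List[str]:
        return []


@dataclass
class BadModel:
    items: List[str]  # no default and no _load_items()


good_problems = check_lazy_fields(GoodModel, ["items"])
bad_problems = check_lazy_fields(BadModel, ["items"])
```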

[1] It is perfectly valid to ask why I made the default be lazy, while NOT_LAZY is opt-in, and not the other way around. My rationale was that if we ever add new fields to the API, we want to autodetect if they are supposed to be lazy (i.e. return a list), and make developers opt out if they think it's unnecessary. This way we hopefully won't end up with slow fields in the future.

Metric implementation

I made dimensions, entities and measures lazy. I had to create two new classes: SyncMetric and AsyncMetric, which are only for annotating the return type of load_{field} depending on the client. The sync clients return SyncMetric while the async clients return AsyncMetric. This is all for typing only, and at runtime everything is just regular Metric. I made both of these classes inherit from ABC to make sure users don't use them directly, and I added docs to warn them that they shouldn't.
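
As an illustration of the typing-only split, here is a simplified sketch. It assumes the loaders are declared as abstract methods, which is one way to make an ABC refuse direct instantiation; the actual class bodies in metric.py may differ, and the fields shown are reduced to a minimum.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


@dataclass
class Metric:
    """Simplified runtime model; the real Metric has many more fields."""

    name: str
    dimensions: List[str] = field(default_factory=list)


class SyncMetric(Metric, ABC):
    """Typing-only view for sync clients (assumed shape)."""

    @abstractmethod
    def load_dimensions(self) -> List[str]:
        ...


class AsyncMetric(Metric, ABC):
    """Typing-only view for async clients (assumed shape)."""

    @abstractmethod
    async def load_dimensions(self) -> List[str]:
        ...


def can_instantiate(cls) -> bool:
    """Report whether cls(name=...) succeeds."""
    try:
        cls(name="revenue")
    except TypeError:
        return False
    return True
```

With this layout, a type checker sees a plain return type on `SyncMetric.load_dimensions()` and a coroutine on `AsyncMetric.load_dimensions()`, while only `Metric` itself can be constructed at runtime.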

Integration testing

I added a new test for lazy loading dimensions, entities and measures of a metric in our integration test suite. The regular "eager" tests continue to work normally.

Docs

I added a new example to examples/, and I added a new section to our README briefly explaining when/why to use lazy-loading.

@serramatutu serramatutu requested a review from a team as a code owner April 14, 2025 15:41
@@ -20,7 +20,7 @@ class DimensionType(Enum, metaclass=FlexibleEnumMeta):
)


@dataclass(frozen=True)
Collaborator Author

Since models can change now after they're lazy loaded, dataclasses aren't frozen anymore.

assert dims == metric.dimensions
assert model_list_equal(dims, metric.dimensions)

with subtests.test("measures"):
Collaborator Author

We were missing an integration test for measures, oops...

}
notLazyOptionalA {
...fragmentA
}
Collaborator Author

note how the .gql_fragments() method does not return lazyA nor manyA if lazy=True

@serramatutu
Collaborator Author

@mirnawong1 I think we might need to update our docs for this once it gets merged and released!

@mirnawong1
Contributor

no worries, thanks for the tag @serramatutu !

Contributor

@DevonFulcher DevonFulcher left a comment

💥 Awesome stuff!

This commit introduces lazy fetching of GraphQL fields. Now, the
`GraphQLFragmentMixin.get_fragments()` method has a new `lazy` argument,
which will make it skip certain fields that are considered "large".

A field is lazy-loadable when:
1. It is a `List`, `Union` or `Optional` of `GraphQLFragmentMixin`.
2. It is not marked as `NOT_LAZY`.

This will make a difference when fetching things like metrics. In
"eager" mode, the client will fetch all subfields of each metric,
including dimensions and entities, which makes the response potentially
very large. Now, if the client is "lazy", the `.metrics()` method will
only return the metrics themselves, and the `dimensions` and `entities`
fields will be empty.

Certain things like saved query exports don't need lazy fields as their
child objects are not large, so it's worth it to just fetch everything
in one go.

I added two tests for this functionality. One is to make sure that the
`get_fragments()` method returns the expected GraphQL fragments for lazy
fields. The other is to ensure that all lazy-loadable fields have a
default value which can be used to initialize the property locally when
it's not initialized from server data.

In the next commit, I'll wire this through the client to make it
actually work in the APIs.

This commit adds a private `_client` property to all
`GraphQLFragmentMixin` which will get auto-populated by the loading
client. This is so that methods such as `Metric.load_dimensions()` will
be able to refer back to the client to make requests.

This is for type checking
@serramatutu serramatutu merged commit de0f9a1 into main Apr 15, 2025
4 checks passed
@serramatutu serramatutu deleted the serramatutu/lazy branch April 15, 2025 11:55
@mirnawong1
Contributor

hey @serramatutu - looks like this is merged now and will work on a pr!

@serramatutu
Collaborator Author

thank you @mirnawong1 !!!

@mirnawong1
Contributor

hey @serramatutu , i've created a docs pr to add lazy loading to the python sdk docs - can you give a look when you have a chance just to make sure it's looking ok?

3 participants