Commit 7261e4d

docs: Add remaining documentation for v2 (#202)
1 parent 17aa1a5 commit 7261e4d

11 files changed: +271 -11 lines changed

dataframely/collection/collection.py

Lines changed: 0 additions & 8 deletions
@@ -956,11 +956,6 @@ def scan_parquet(
             ValueError: If the provided directory does not contain parquet files for
                 all required members.
 
-        Note:
-            Due to current limitations in dataframely, this method actually reads the
-            parquet file into memory if `"validation"` is `"warn"` or `"allow"`
-            and validation is required.
-
         Attention:
             Be aware that this method suffers from the same limitations as
             :meth:`serialize`.
@@ -1049,9 +1044,6 @@ def scan_delta(
             ValueError:
                 If the provided source does not contain Delta tables for all required members.
 
-        Note:
-            Due to current limitations in dataframely, this method may read the Delta table into memory if `validation` is `"warn"` or `"allow"` and validation is required.
-
         Attention:
             Schema metadata is stored as custom commit metadata. Only the schema
             information from the last commit is used, so any table modifications

docs/_templates/classes/error.rst

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+:html_theme.sidebar_secondary.remove: true
+
+.. role:: hidden
+
+{{ name | underline }}
+
+.. currentmodule:: {{ module }}
+
+.. autoclass:: {{ name }}
+    :members:
+    :exclude-members: add_note, with_traceback
+    :autosummary:
+    :autosummary-nosignatures:

docs/api/errors/index.rst

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
1+
======
2+
Errors
3+
======
4+
5+
.. currentmodule:: dataframely
6+
.. autosummary::
7+
:toctree: _gen/
8+
:template: classes/error.rst
9+
:nosignatures:
10+
11+
~exc.SchemaError
12+
~exc.ValidationError
13+
~exc.ImplementationError
14+
~exc.AnnotationImplementationError
15+
~exc.ValidationRequiredError

docs/api/index.rst

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,15 @@ API Reference
2525

2626
columns/index
2727

28+
.. grid::
29+
30+
.. grid-item-card::
31+
32+
.. toctree::
33+
:maxdepth: 1
34+
35+
errors/index
36+
2837
.. grid-item-card::
2938

3039
.. toctree::

docs/conf.py

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -174,7 +174,9 @@ def hide_class_signature(
174174
return_annotation: str,
175175
) -> tuple[str, str] | None:
176176
if what == "class" and (
177-
name.endswith("FilterResult") or name.endswith("FailureInfo")
177+
name.endswith("FilterResult")
178+
or name.endswith("FailureInfo")
179+
or name.endswith("AnnotationImplementationError")
178180
):
179181
# Return empty signature (no args after the class name)
180182
return "", return_annotation
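
As an aside on the condition changed in `docs/conf.py`: `str.endswith` also accepts a tuple of suffixes, so the same check could be written against a single tuple. The sketch below is one possible equivalent, not the code in this commit. The constant name `HIDDEN_SIGNATURE_SUFFIXES` is hypothetical, and the parameters other than `return_annotation` are an assumption based on Sphinx's `autodoc-process-signature` event, since the diff only shows the tail of the signature.

```python
from __future__ import annotations

from typing import Any

# Hypothetical constant: the three suffixes checked in the diff above.
HIDDEN_SIGNATURE_SUFFIXES = (
    "FilterResult",
    "FailureInfo",
    "AnnotationImplementationError",
)


def hide_class_signature(
    app: Any,  # assumed: parameters follow Sphinx's autodoc-process-signature event
    what: str,
    name: str,
    obj: Any,
    options: Any,
    signature: str,
    return_annotation: str,
) -> tuple[str, str] | None:
    # str.endswith returns True if the name ends with any suffix in the tuple.
    if what == "class" and name.endswith(HIDDEN_SIGNATURE_SUFFIXES):
        # Return empty signature (no args after the class name)
        return "", return_annotation
    # Returning None leaves the signature untouched.
    return None
```

With this form, hiding the signature of another class only requires adding its suffix to the tuple.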

docs/guides/faq.md

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -29,3 +29,20 @@ class UserSchema(dy.Schema):
2929
"""Email must be unique, if provided."""
3030
return pl.col("email").is_null() | pl.col("email").is_unique()
3131
```
32+
33+
## How do I fix the ruff error `First argument of a method should be named self`?
34+
35+
If you are using [`ruff`](https://docs.astral.sh/ruff/) and introduce custom rules for your schemas, `ruff` will create
36+
the following linting error:
37+
38+
```
39+
N805 First argument of a method should be named `self`
40+
```
41+
42+
To fix this, you'll need to let `ruff` know that the `@dy.rule` decorator is applied to classmethods. This can easily
43+
be done by adding the following to your `pyproject.toml`:
44+
45+
```toml
46+
[tool.ruff.lint.pep8-naming]
47+
classmethod-decorators = ["dataframely.rule"]
48+
```
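
To illustrate why the first argument of a rule is a class rather than an instance, here is a toy sketch using only the standard library. The `rule` function below is a hypothetical, simplified stand-in and not dataframely's implementation; it only mimics the aspect relevant to N805, namely that the decorated function ends up bound to the class.

```python
from typing import Callable


def rule(fn: Callable) -> classmethod:
    """Hypothetical stand-in for a rule decorator: binds `fn` to the class."""
    return classmethod(fn)


class UserSchema:
    min_age = 18

    @rule
    def minimum_age(cls) -> str:
        # `cls` is the schema class itself, so naming it `self` would mislead.
        return f"age >= {cls.min_age}"


print(UserSchema.minimum_age())
```

A linter cannot tell from the source alone that a custom decorator behaves like `classmethod`, which is why it has to be listed explicitly in the `classmethod-decorators` setting.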

docs/guides/features/index.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,4 +8,5 @@ data-generation
 primary-keys
 serialization
 sql-generation
+lazy-validation
 ```
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+# Lazy Validation
+
+In many cases, dataframely's capability to validate and filter input data is used at core application boundaries.
+As a result, `validate` and `filter` are generally expected to be used at points where `collect` is called on a lazy
+frame. However, there may be situations where validation or filtering should simply be added to the lazy computation
+graph. Starting in dataframely v2, this is supported via a custom polars plugin.
+
+## The `eager` parameter
+
+All of the following methods expose an `eager: bool` parameter:
+
+- {meth}`Schema.validate() <dataframely.Schema.validate>`
+- {meth}`Schema.filter() <dataframely.Schema.filter>`
+- {meth}`Collection.validate() <dataframely.Collection.validate>`
+- {meth}`Collection.filter() <dataframely.Collection.filter>`
+
+By default, `eager=True`. However, users may decide to set `eager=False` in order to simply append the validation or
+the filtering operation to the lazy frame. For example, one might decide to run validation lazily:
+
+```python
+def validate_lf(lf: pl.LazyFrame) -> pl.LazyFrame:
+    return lf.pipe(MySchema.validate, eager=False)
+```
+
+When `eager=False`, validation is only run once the lazy frame is collected. If input data does not satisfy the schema,
+no error is raised here, yet.
+
+## Error Types
+
+Due to current limitations in polars plugins, the type of error that is being raised from the `validate` function (both
+for schemas and collections) is dependent on the value of the `eager` parameter:
+
+- When `eager=True`, a {class}`~dataframely.ValidationError` is raised from the `validate` function
+- When `eager=False`, a {class}`~polars.exceptions.ComputeError` is raised from the `collect` function
+
+```{note}
+For schemas, the error _message_ itself is equivalent.
+For collections, the error message for `eager=False` is limited and non-deterministic: the error message only includes
+information about a single member and, if multiple members fail validation, the member that the error message refers to
+may vary across executions.
+```
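
The timing difference described in the new guide (an error at `validate` when eager, an error at `collect` when lazy) can be modeled with a small standard-library sketch. Everything below is a toy: `ToyLazyFrame`, `ToyValidationError`, `check_non_null_id`, and this `validate` function are hypothetical names and do not use polars or dataframely. The sketch also does not reproduce the difference in exception types, only the point at which the error surfaces.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

Row = dict[str, object]
Step = Callable[[list[Row]], list[Row]]


class ToyValidationError(Exception):
    """Stand-in for the error raised when data violates the schema."""


@dataclass
class ToyLazyFrame:
    rows: list[Row]
    steps: list[Step] = field(default_factory=list)

    def pipe(self, step: Step) -> ToyLazyFrame:
        # Record the step in the computation graph without running it.
        return ToyLazyFrame(self.rows, [*self.steps, step])

    def collect(self) -> list[Row]:
        # Run all recorded steps in order; errors surface only here.
        rows = self.rows
        for step in self.steps:
            rows = step(rows)
        return rows


def check_non_null_id(rows: list[Row]) -> list[Row]:
    # Toy schema rule: the "id" column must not contain nulls.
    if any(row["id"] is None for row in rows):
        raise ToyValidationError("column 'id' contains nulls")
    return rows


def validate(lf: ToyLazyFrame, eager: bool = True) -> list[Row] | ToyLazyFrame:
    if eager:
        # Eager: collect immediately, so invalid data raises right here.
        return lf.pipe(check_non_null_id).collect()
    # Lazy: only append the check; nothing is evaluated yet.
    return lf.pipe(check_non_null_id)


lf = ToyLazyFrame([{"id": 1}, {"id": None}])
deferred = validate(lf, eager=False)  # no error raised at this point
try:
    deferred.collect()
except ToyValidationError as exc:
    print(f"raised at collect: {exc}")
```

In this model, code that calls `validate(..., eager=False)` has to place its error handling around `collect`, which mirrors the guidance in the diff above.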

docs/guides/index.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@ quickstart
 examples/index
 features/index
 development
-versioning
+migration/index
 faq
 ```
 

docs/guides/versioning.md renamed to docs/guides/migration/index.md

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,13 @@
-# Versioning policy and breaking changes
+# Migration Guides
+
+```{toctree}
+:maxdepth: 1
+:hidden:
+
+v1-v2
+```
+
+## Versioning policy and breaking changes
 
 Dataframely uses [semantic versioning](https://semver.org/).
 This versioning scheme is designed to make it easy for users to anticipate what types of change they can expect from a
