
Releases: dlt-hub/dlt

1.4.0

14 Nov 21:15
0fce1c8

Core Library

  • feat: add incremental lag (attribution window) for datetime, int, and float cursors by @donotpush in #1957
  • LanceDB: (1) support merge key to merge chunked documents correctly, removing orphaned chunks; (2) major performance improvement by loading data via Arrow by @Pipboyguy in #1620
  • Move exclude_keys() to dlt.common.utils by @burnash in #1966
  • Fix BigQueryLoadJob hiding root cause exception by @xneg in #1992
  • loads secrets from Colab userdata and Streamlit + bugfixes by @rudolfix in #1994
  • Fix pagination issue in JSONResponseCursorPaginator with empty string cursor value by @kang8 in #2016
  • fix: if name of distribution is None by @senickel in #2024
  • allows passing default values when writing specs by @rudolfix in #2018
  • enable delta partitioning on arrow normalizer load id by @jorritsandbrink in #2022
  • add session token to duckdb s3 secret by @jorritsandbrink in #2007
  • Add user agent for Databricks by @VioletM in #1987
  • Fix an incorrect missing dependency error by @burnash in #2001
  • fix resource level max_table_nesting and normalizer performance tuning by @sh-rp in #2026
  • move default pipelines of core sources into source folders by @sh-rp in #1888
  • duckdb filesystem custom secrets by @sh-rp in #2017
  • allows for empty dataset clickhouse by @rudolfix in #2045
  • add GCP default credential handling for delta table format by @jorritsandbrink in #2048
  • enables merges for bigquery autodetect schema by @sh-rp in #2035
  • logs warning if deduplication state is large by @willi-mueller in #1877
  • Add core sources extras to requirements in dlt init by @burnash in #2028
  • Fix merge write disposition for pyarrow and ClickHouse by @burnash in #2042
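
The attribution window from #1957 moves the starting point of the next incremental run back by a fixed lag, so records that arrive late or are updated just behind the last cursor value get extracted again. The sketch below illustrates only that arithmetic in plain Python. It assumes the lag is given in seconds for datetime cursors and in cursor units for int and float cursors; `start_value_with_lag` and `filter_rows` are hypothetical helpers, not dlt's API:

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Union

Cursor = Union[datetime, int, float]


def start_value_with_lag(last_value: Cursor, lag: float) -> Cursor:
    """Shift the last seen cursor value back by `lag` (hypothetical helper)."""
    if isinstance(last_value, datetime):
        # Assumption: for datetime cursors the lag is expressed in seconds.
        return last_value - timedelta(seconds=lag)
    if isinstance(last_value, bool):
        raise TypeError("bool is not a valid cursor type")
    # int and float cursors: subtract the lag in cursor units.
    return last_value - lag


def filter_rows(rows: List[Dict[str, Any]], cursor_field: str, last_value: Cursor, lag: float) -> List[Dict[str, Any]]:
    """Keep rows whose cursor is at or after the lagged start value."""
    start = start_value_with_lag(last_value, lag)
    return [row for row in rows if row[cursor_field] >= start]


last_run = datetime(2024, 11, 14, 12, 0, tzinfo=timezone.utc)
window_start = start_value_with_lag(last_run, 3600)  # one hour earlier
```

Because the window re-reads rows that were already loaded, pairing it with a merge write disposition (or other deduplication) is a sensible assumption to avoid duplicates.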

Experimental interfaces

The dlt dataset public interface and its docs are coming next week.

  • 1990 - dataset columns select and limit by @sh-rp in #2000

Docs

New Contributors

Full Changelog: 1.3.0...1.4.0

1.3.0

22 Oct 08:53
1893860

Core Library

  • Fix try/except in from_reference shadowing MissingDependencyException by @burnash in #1939
  • prefers uv over pip if found (when creating virtual envs) by @rudolfix in #1940
  • allows plugging in new or updated dlt CLI commands by @sh-rp in #1938
  • Feat/557 rest api add oauth2clientcredentials to built in auth methods by @willi-mueller in #1871
  • uses path normalize for columns in arrow tables by @rudolfix in #1947
  • Added extended jsonpath_ng parser (rest_api) by @francescomucio in #1941
  • Fix/1897 support https endpoints clickhouse by @sh-rp in #1931
  • Fix for multiple ignores is not working (rest_api) by @burnash in #1956
  • SQL Database: Support including/excluding NULL cursor values by @steinitzu in #1946
  • Add references table hint and reflect them in sql_database by @steinitzu in #1925
  • only truncate or delete from existing tables in refresh modes by @sh-rp in #1926
  • adds bigquery partition expiration and motherduck connection string by @rudolfix in #1968
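
#1940 makes dlt prefer uv over pip, when uv is found, while creating virtual environments. A minimal sketch of such a selection rule, assuming "found" means "available on PATH"; `pick_installer` is a hypothetical helper, and dlt's real logic may check other conditions:

```python
import shutil
import sys
from typing import Callable, List, Optional


def pick_installer(which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return a base install command, preferring uv when it is on PATH.

    `which` is injectable so the rule can be tested without the real PATH.
    """
    uv_path = which("uv")
    if uv_path:
        # uv exposes a pip-compatible interface under `uv pip`.
        return [uv_path, "pip", "install"]
    # Fall back to the pip module of the current interpreter.
    return [sys.executable, "-m", "pip", "install"]
```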

Experimental interfaces

The PRs below expose new pipeline._dataset and dlt._dataset interfaces that provide unified access to data loaded into a destination. We also implement a duckdb-based SQL client on the filesystem destination to access data in data lakes. We'll add documentation once the dataset interface stabilizes. However, you can already benefit from the new cursor implementation of sql_client, which lets you fetch data frames and arrow tables, also in batches:

  • dataset factory by @sh-rp in #1945
  • expose readable datasets as dataframes and arrow tables by @sh-rp in #1507
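
The dataset methods are still experimental and their exact names are not given in these notes, so the sketch below shows only the batching pattern itself: the standard DB-API `fetchmany` on an in-memory SQLite database stands in for a destination, and `iter_batches` is a hypothetical helper, not part of dlt:

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_batches(cursor: sqlite3.Cursor, batch_size: int) -> Iterator[List[Tuple]]:
    """Yield result rows in lists of at most `batch_size` rows."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(i, f"item_{i}") for i in range(10)])

cur = conn.execute("SELECT id, name FROM items ORDER BY id")
# Each batch could be turned into a data frame or arrow table before processing.
batch_sizes = [len(batch) for batch in iter_batches(cur, 4)]
```

Reading in batches keeps memory bounded by the batch size rather than by the full result set.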

The PRs below add pluggy and a first set of plugin hooks. The idea is to make much of dlt's functionality pluggable. Currently you can plug in a new CLI command (or upgrade an existing one), and you can also plug in your own runtime environment (how dlt looks for data, secrets, etc.).

Docs

New Contributors

Full Changelog: 1.2.0...1.3.0

1.2.0

07 Oct 21:10
8798c17

Core Library

Docs

New Contributors

Full Changelog: 1.1.0...1.2.0

1.1.0

26 Sep 13:33
d2b6d05

What's Changed

Docs

Verified Sources

  • Custom filter clauses supported, pyarrow/arrowmongo requirement optional for Mongo by @Pipboyguy

New Contributors

Full Changelog: 1.0.0...1.1.0

1.0.0

16 Sep 15:07

This is a major dlt release. Please check the list of breaking changes and deprecations: #1778

Core Library

  • move rest_api, sql_database and filesystem sources to dlt core by @willi-mueller in #1728
  • drops foreign_key, adds nested references (row_key - parent_key) by @rudolfix in #1774
  • deprecates complex data type, changes to json by @rudolfix in #1792
  • Feat/1749 abort load package and raise exception on terminal errors in jobs by @willi-mueller in #1781
  • Feat/1492 extend timestamp config to handle naive timestamps (without timezone) by @donotpush in #1669
  • Fix/1571 Incremental: Optionally load or ignore/exclude/include records with cursor_path missing or None value by @willi-mueller in #1576
  • creates a single source in extract for all resource instances passed as list by @rudolfix in #1535
  • Enable BigQuery schema auto-detection with partitioning and clustering hints by @Pipboyguy in #1806
  • Sqlalchemy destination (merge support and docs still in progress) by @steinitzu in #1734
  • Feat/1730 extend filesystem sftp by @donotpush in #1769
  • Stops dumping secrets to dlt traces. by @willi-mueller in #1797
  • Don't use Custom Embedding Functions on LanceDB by @Pipboyguy in #1771
  • sets default concurrency for blob upload for adlfs to 1 to avoid massive memory usage on large files by @rudolfix in #1779
  • Fix/1790 support incremental load with arrow when cursor column is not nullable by @willi-mueller in #1791
  • controls row group size and empty tables in memory buffer when writing parquet by @rudolfix in #1782
  • fix installation command by @novica in #1741
  • skips tables without jobs when merging delta tables by @rudolfix in #1803

Docs

New Contributors

Full Changelog: 0.5.4...1.0.0

0.5.4

28 Aug 20:02
9857029

Core Library

Docs

New Contributors

Full Changelog: 0.5.3...0.5.4

0.5.3

13 Aug 00:20
19c41ea

Core Library

  • Add support for continuously starting load jobs as slots free up in the loader. This will significantly speed up loading packages with many files. by @sh-rp in #1494
  • Add get_delta_tables helper function to optimize and vacuum tables by @jorritsandbrink in #1664
  • Raise/warn on incomplete columns in normalize by @steinitzu in #1504
  • Add enable_dataset_name_normalization option by @VioletM in #1676
  • updates duckdb/motherduck load job to match parquet by column names by @rudolfix in #1674
  • updates duckdb/motherduck load job to fully allow jsonl file format by @rudolfix in #1674
  • removes internal locks when loading parquet from multiple threads (fixed upstream in duckdb) in #1674
  • enables multi-statement transactions for MotherDuck in #1674
  • fixes dbt logs line endings
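
#1494 changes the loader to start a new load job as soon as a slot frees up instead of waiting for a whole batch to finish. A minimal sketch of that scheduling pattern with the standard library's `concurrent.futures`; `run_with_slots` and `make_job` are hypothetical stand-ins, not dlt's loader code:

```python
import time
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Callable, List, Set


def run_with_slots(jobs: List[Callable[[], str]], max_parallel: int) -> List[str]:
    """Run jobs with at most `max_parallel` in flight, refilling slots eagerly.

    Returns job results in completion order.
    """
    pending = list(jobs)
    running: Set[Future] = set()
    results: List[str] = []
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        while pending or running:
            # Fill every free slot with the next pending job.
            while pending and len(running) < max_parallel:
                running.add(pool.submit(pending.pop(0)))
            # Block until at least one job finishes, then loop to refill.
            done, running = wait(running, return_when=FIRST_COMPLETED)
            results.extend(f.result() for f in done)
    return results


def make_job(name: str, seconds: float) -> Callable[[], str]:
    """Hypothetical stand-in for a load job that takes `seconds` to run."""
    def job() -> str:
        time.sleep(seconds)
        return name
    return job


# "slow" holds one slot the whole time; "b" and "c" run one after another in the other.
order = run_with_slots([make_job("slow", 0.3), make_job("b", 0.01), make_job("c", 0.01)], max_parallel=2)
```

With batch-wise scheduling, "c" would wait for "slow" to finish; here it starts as soon as "b" completes.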

Docs

Verified Sources

  • Column selector added to sql_database by @steinitzu

New Contributors

Full Changelog: 0.5.2...0.5.3

0.5.2

02 Aug 19:18
e00baa0

Core Library

  • Add upsert merge strategy for Postgres and Snowflake by @jorritsandbrink in #1466
  • Add basic upsert support for delta table format in filesystem destination by @jorritsandbrink in #1600
  • query tagging for snowflake by @rudolfix in #1582
  • Support Open Source ClickHouse Deployments (MergeTree engine and more) by @Pipboyguy in #1496
  • allows nested types in BigQuery via native autodetect_schema by @rudolfix in #1591
  • Enable upsert merge strategy for more SQL destinations (Athena, BigQuery, Databricks, mssql) by @jorritsandbrink in #1628
  • Fix/1512 fixes current.pipeline() access by @rudolfix in #1581
  • feat: add config dataset_name_prefix to set custom staging dataset name by @donotpush in #1563
  • fix: add airflow db reset for all tests by @donotpush in #1559
  • Enable S3 compatible storage for delta table format by @jorritsandbrink in #1586
  • feat/1495 rest_client: renames JSONResponsePaginator to JSONLinkPaginator by @willi-mueller in #1558
  • Feat/1596 adds custom config providers + example of yaml config provider supporting profiles and jinja placeholders by @rudolfix in #1642
  • Feat/1583 rest client session timeout configuration by @willi-mueller in #1590
  • Add clarification for add_limit by @VioletM in #1594
  • Fix/1606 fixes validator incremental step order to keep it always last in the pipe by @rudolfix in #1641
  • Feat/1593 rest_client: allow setting of request kwargs by @willi-mueller in #1609
  • prevent accidental wrapping of sources in resources when using adapters by @sh-rp in #1645
  • Add empty source handling for delta table format on filesystem destination by @jorritsandbrink in #1617
  • Surface original err msg from pydantic as extended_info on DataValidationError by @codingcyclist in #1569
  • fix(dockerfile): remove extra spaces around equals sign in LABEL inst… by @thisisdope in #1573
  • Qdrant uncommitted state restore and test by @steinitzu in #1545
  • fix: suppress alembic logs for tests by @donotpush in #1578
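
The upsert merge strategy (#1466, #1628) updates rows whose primary key already exists and inserts the rest. The sketch below demonstrates only those semantics using SQLite's `INSERT ... ON CONFLICT ... DO UPDATE`; SQLite is used purely as a local stand-in, and this is not the destination-specific SQL that dlt generates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, tier TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "alice", "free"), (2, "bob", "free")],
)

# Incoming batch: id 2 changed, id 3 is new.
incoming = [(2, "bob", "pro"), (3, "carol", "free")]

# Upsert: update the row when the primary key exists, otherwise insert it.
conn.executemany(
    """
    INSERT INTO customers (id, name, tier) VALUES (?, ?, ?)
    ON CONFLICT(id) DO UPDATE SET name = excluded.name, tier = excluded.tier
    """,
    incoming,
)

rows = conn.execute("SELECT id, name, tier FROM customers ORDER BY id").fetchall()
```

Row 1 is untouched, row 2 is updated in place, and row 3 is inserted; nothing is deleted first, which is the practical difference from a delete-and-reinsert merge.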

Docs

New Contributors

Full Changelog: 0.5.1...0.5.2

0.5.1

08 Jul 15:28
d1e5666

This is a major release (0.4 -> 0.5) in our versioning scheme, so please review the breaking changes below. Most of them are relevant only for platform builders that use dlt internals. Some long-deprecated components were also removed.

Breaking Changes

Breaking Changes (internals)

  • If a function decorated with dlt.source or dlt.resource is passed None for an argument that has a default value, the None is handled exactly as in a regular Python function call. Previously, such a None requested argument injection from configuration. Read more here: (#1430)
  • dlt.config.value and dlt.secrets.value used to evaluate to None at runtime. Now they evaluate to a sentinel value. All existing code should remain backward compatible. (#1430)
  • The full_refresh flag of dlt.pipeline will be deprecated and replaced with dev_mode. (#1063) and (https://dlthub.com/devel/general-usage/pipeline#do-experiments-with-dev-mode)
  • The default resource extraction sequence has changed from fifo to round_robin. You can switch back to the previous behavior and learn more about what this means here: (https://dlthub.com/docs/reference/performance#resources-extraction-fifo-vs-round-robin)
  • If you create an instance of a SPEC (e.g. SnowflakeCredentials), it will not be marked as resolved even if all required fields are provided. Previously, some SPECs resolved and some did not. #1489
  • parse_native_representation never marks config as resolved. Previously, some implementations did and some did not. #1489
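
The fifo vs round_robin change can be made concrete with plain Python lists standing in for resources. This is a simplified sketch of the two item orderings only, not dlt's extractor; `fifo` and `round_robin` are hypothetical helpers named after the setting values:

```python
from itertools import chain
from typing import Iterable, Iterator, List

_DONE = object()  # sentinel marking an exhausted resource


def fifo(resources: List[Iterable[str]]) -> Iterator[str]:
    """Exhaust each resource completely before moving to the next one."""
    return chain.from_iterable(resources)


def round_robin(resources: List[Iterable[str]]) -> Iterator[str]:
    """Take one item from each active resource in turn until all are exhausted."""
    iterators = [iter(r) for r in resources]
    while iterators:
        still_active = []
        for it in iterators:
            item = next(it, _DONE)
            if item is _DONE:
                continue  # drop exhausted resources from the rotation
            yield item
            still_active.append(it)
        iterators = still_active


users = ["u1", "u2", "u3"]
orders = ["o1", "o2"]
```

With these inputs, `fifo` yields all users before any orders, while `round_robin` interleaves them as u1, o1, u2, o2, u3.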

Core Library

Docs

Verified Sources

We worked intensively on rest_api and sql_database:


0.4.12

29 May 13:14
b4e0491

Core Library

  • feat(pipeline): add an ability to auto truncate staging dataset by @IlyaFaer in #1292
  • Feat/1406 bumps duckdb 0.10 + dbt to <=1.8.x by @rudolfix in #1407
  • Azure service principal credentials support by @steinitzu in #1377
  • Support partitioning hints for athena iceberg by @steinitzu in #1403
  • Add recommended_file_size cap to limit data writer file size and cap BigQuery to 4gb by @steinitzu in #1368
  • limits mssql query size to fit network buffer to prevent errors on large inserts by @rudolfix in #1372
  • allows exceptions to bubble up when a standalone resource returns by @rudolfix in #1374
  • Fix: use .get on column in mssql destination for cases where the yaml… by @Daniel-Vetter-Coverwhale in #1380
  • Make path tests Windows compatible by @jorritsandbrink in #1384
  • RESTClient: Added "values" to the data pattern of the rest_api helper by @francescomucio in #1399
  • corrects single entity path detection by @rudolfix in #1394
  • RESTClient: implement AuthConfigBase.bool + update docs by @burnash in #1413
  • Fix: ensure custom session can be provided to rest client by @z3z1ma in #1396
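
#1372 limits the size of mssql queries so large inserts fit the network buffer. A minimal sketch of the general idea, splitting pre-rendered VALUES rows into several INSERT statements under a length cap; the helper name, the character-based measure and the cap value are assumptions, not dlt's actual limit or code:

```python
from typing import Iterator, List


def chunk_insert_statements(table: str, value_rows: List[str], max_query_len: int) -> Iterator[str]:
    """Group pre-rendered VALUES tuples into INSERT statements under a length cap.

    `value_rows` are already-escaped strings such as "(1, 'a')". Length is
    measured in characters; a single row longer than the cap is still
    emitted on its own rather than dropped.
    """
    header = f"INSERT INTO {table} VALUES "
    base_len = len(header) + 1  # +1 for the trailing semicolon
    batch: List[str] = []
    length = base_len
    for row in value_rows:
        extra = len(row) + (1 if batch else 0)  # +1 for the separating comma
        if batch and length + extra > max_query_len:
            # Flush the current statement and start a new one with this row.
            yield header + ",".join(batch) + ";"
            batch, length = [], base_len
            extra = len(row)
        batch.append(row)
        length += extra
    if batch:
        yield header + ",".join(batch) + ";"


statements = list(chunk_insert_statements("t", ["(1)", "(2)", "(3)"], 30))
```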

Docs

  • RESTClient: add an example for creating a custom POST paginator by @burnash in #1358
  • Add rest_api verified source documentation by @burnash in #1308
  • Fix typo in Slack Docs by @cybermaxs in #1369
  • RESTClient: docs: add the troubleshooting section by @burnash in #1367
  • Replace weather api example with github in create a pipeline walkthrough by @sultaniman in #1351
  • RESTClient: docs: Fixed snippet definition by @burnash in #1373
  • docs: destination tables: elaborate on example code by @burnash in #1386
  • add naming rules to contributing by @sh-rp in #1291
  • Added info about how to reorder the columns to adjust a schema by @dat-a-man in #1364
  • rest_api: add response_actions documentation by @burnash in #1362
  • Update the tutorial to use rest_client.paginate for pagination by @burnash in #1287
  • fix command to install dlt by @Benjamin0313 in #1404
  • improves sql database docs by @rudolfix in #1383
  • add typing classifier and update maintainers in pyproject by @sh-rp in #1391
  • Updated installation command in destination docs and a few others by @dat-a-man in #1410
  • Update filesystem docs with auto mkdir config by @VioletM in #1416
  • add page to docs for openapi generator by @sh-rp in #1417

New Contributors

Full Changelog: 0.4.11...0.4.12