pg_ducklake

PostgreSQL Extension for DuckLake


This project is under active development and is not yet ready for production use.

pg_ducklake brings a native datalake experience into PostgreSQL, powered by DuckLake (a DuckDB lakehouse format with SQL catalog metadata and open Parquet data files).

Key Features

  • Managed DuckLake tables: Create/write/query DuckLake tables in PostgreSQL via SQL (e.g., psql/JDBC).
  • HTAP support: Incremental heap-to-DuckLake conversion with pg_duckpipe.
  • DuckDB compatibility: Tables created by pg_ducklake are directly queryable from DuckDB clients.
  • Cloud storage: Store data files in AWS S3 (or GCS, R2) to decouple storage and compute for serverless analytics.
  • Fast analytics: Columnar storage plus DuckDB vectorized execution, with hybrid queries over PostgreSQL heap tables supported.
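As a sketch of the hybrid-query capability mentioned above, a regular PostgreSQL heap table and a DuckLake table can be joined in a single statement. The table and column names below are hypothetical illustrations, not part of the project's documentation:

```sql
-- Hypothetical sketch: mix a row-store (heap) table and a DuckLake
-- column-store table in one query. Names are illustrative only.
CREATE TABLE customers (id INT, name TEXT);                        -- heap table
CREATE TABLE orders (customer_id INT, amount NUMERIC) USING ducklake;

SELECT c.name, sum(o.amount) AS total_spent
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.name;
```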

See it in action

Your first Data Lake in PostgreSQL

-- Or use AWS S3 as data storage.
-- SET ducklake.default_table_path = 's3://my-bucket/prefix/';

CREATE TABLE my_table (
    id INT,
    name TEXT,
    age INT
) USING ducklake;

INSERT INTO my_table VALUES (1, 'Alice', 25), (2, 'Bob', 30);

SELECT * FROM my_table;

Access your data with DuckDB

INSTALL ducklake;
LOAD ducklake;
ATTACH 'ducklake:postgres:dbname=postgres host=localhost' AS my_ducklake (METADATA_SCHEMA 'ducklake');
SELECT * FROM my_ducklake.public.my_table;

For cloud storage (AWS S3 or Azure Blob Storage), see the DuckLake Secrets Management guide.
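For orientation, the DuckDB-side setup typically involves creating a secret so DuckDB can read the Parquet files referenced by the DuckLake catalog. The values below are placeholders; consult the linked Secrets Management guide for the authoritative syntax and options:

```sql
-- Illustrative only: register S3 credentials in DuckDB before querying
-- the attached DuckLake catalog. All credential values are placeholders.
CREATE SECRET my_s3_secret (
    TYPE s3,
    KEY_ID 'AKIA...',
    SECRET '...',
    REGION 'us-east-1'
);
```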

Quick Start

Docker

Run PostgreSQL with pg_ducklake pre-installed in a Docker container:

docker run -d -e POSTGRES_PASSWORD=duckdb --name pgducklake pgducklake/pgducklake:18-main

docker exec -it pgducklake psql

Compile from source

Requirements:

  • PostgreSQL: 14, 15, 16, 17, 18
  • Operating Systems: Ubuntu 22.04-24.04, macOS

git clone https://github.com/relytcloud/pg_ducklake
cd pg_ducklake
# (Optional) install pg_duckdb
# make install-pg_duckdb
make install

See documentation for detailed instructions.

Use cases

Convert a PostgreSQL heap table into a DuckLake table

This pattern performs a one-time ETL copy from row-store (PostgreSQL heap) tables to DuckLake (column-store) tables for fast analytics, while OLTP continues to use the original heap tables.

-- Create a PostgreSQL row-store (heap) table.
CREATE TABLE row_store_table AS
SELECT i AS id, 'hello pg_ducklake' AS msg
FROM generate_series(1, 10000) AS i;

-- Create a DuckLake column-store table via ETL.
CREATE TABLE col_store_table USING ducklake AS
SELECT *
FROM row_store_table;

-- Run analytics against the converted table.
SELECT max(id) FROM col_store_table;

Load an external dataset

External datasets (e.g., CSV/Parquet) can be ingested with DuckDB readers and materialized as tables for analytics.

CREATE TABLE titanic USING ducklake AS
SELECT * FROM read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv');

SELECT "Pclass", "Sex", COUNT(*), AVG("Survived") AS survival_rate
FROM titanic
GROUP BY "Pclass", "Sex";

Roadmap

For a detailed comparison of upstream DuckLake features and what pg_ducklake currently supports, see DuckLake Feature Coverage.

pg_ducklake

  • HTAP support for incremental row-store to column-store conversion (PostgreSQL heap to DuckLake) with pg_duckpipe
  • Better transaction concurrency model (based on PostgreSQL XID)
  • Faster metadata operations via PostgreSQL native functions (e.g., SPI)

Documentation

See docs/ for full documentation including SQL reference, settings, access control, and feature coverage.

Contributing

We welcome contributions! Please see CONTRIBUTING.md for how to get involved.

Acknowledgments

This project is built with pg_duckdb and ducklake.
