Implement #204 - Add support for POSTGRES_BINARY copy method #205

Merged
merged 2 commits into from
Apr 3, 2024

@Mytherin Mytherin commented Apr 3, 2024

Implements #204

This PR adds support for the POSTGRES_BINARY copy format, which allows you to write files in the Postgres binary format. These files can subsequently be copied into Postgres using its FORMAT binary copy method. Example use case:

DuckDB
D create table integers(i int);
D insert into integers from range(100);
D copy integers to 'pg_binary.bin' (format postgres_binary);
Postgres
postgres=# create table integers(i int);
postgres=# copy integers from '/{working_directory}/pg_binary.bin' (format binary);
postgres=# SELECT COUNT(*), MIN(i), MAX(i), SUM(i) FROM integers;
 count | min | max | sum  
-------+-----+-----+------
   100 |   0 |  99 | 4950
(1 row)
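
For readers unfamiliar with what ends up in a file like `pg_binary.bin`, here is a minimal Python sketch of the layout Postgres documents for binary COPY: an 11-byte signature, a flags field, a header extension length, one record per tuple, and a two-byte trailer. It is an illustration only, assuming a single non-array `int4` column; the helper name `encode_int4_rows` is hypothetical, and the output has not been compared byte for byte against what the extension writes.

```python
import struct

# Fixed 11-byte signature that opens every Postgres binary COPY file.
PGCOPY_SIGNATURE = b"PGCOPY\n\xff\r\n\x00"


def encode_int4_rows(values):
    """Encode one int4 column as a Postgres binary COPY stream (hypothetical helper)."""
    out = bytearray(PGCOPY_SIGNATURE)
    # 32-bit flags field (0) followed by 32-bit header extension length (0).
    out += struct.pack("!ii", 0, 0)
    for value in values:
        # Each tuple starts with a 16-bit count of the fields it contains.
        out += struct.pack("!h", 1)
        if value is None:
            # NULL is a length word of -1 with no data bytes after it.
            out += struct.pack("!i", -1)
        else:
            # 32-bit length word (4 bytes), then the int4 in network byte order.
            out += struct.pack("!ii", 4, value)
    # File trailer: a 16-bit word containing -1.
    out += struct.pack("!h", -1)
    return bytes(out)


data = encode_int4_rows(range(100))
```

With 100 non-NULL rows this produces a 19-byte header, 10 bytes per row, and a 2-byte trailer. All integers are big-endian, which is why every `struct` format string starts with `!`.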

This uses the PostgresBinaryWriter that we have developed. Not all data types are supported: for now this is limited to types that map directly to Postgres, namely bool, smallint, int, bigint, float, double, decimal, date, time, timestamp, interval, uuid, blob, varchar and composite types (arrays/structs). We could probably create a mapping for other types (e.g. unsigned types to Postgres' signed types).
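
Until such a mapping exists, one possible workaround is to cast unsigned columns to a wider supported type inside the COPY query. The Python sketch below builds that statement as a string. The `WIDENED` table and the `copy_statement` helper are hypothetical and not part of the extension; the chosen targets are simply the smallest types from the supported list that cover each unsigned range, and identifiers are not quoted or escaped.

```python
# Hypothetical widening table: each DuckDB unsigned type is cast to the
# smallest type from the supported list that holds its full range.
WIDENED = {
    "UTINYINT": "SMALLINT",       # 0..255 fits in a 2-byte signed integer
    "USMALLINT": "INTEGER",       # 0..65535 fits in a 4-byte signed integer
    "UINTEGER": "BIGINT",         # 0..2**32-1 fits in an 8-byte signed integer
    "UBIGINT": "DECIMAL(20, 0)",  # 0..2**64-1 needs 20 decimal digits
}


def copy_statement(table, columns, path):
    """Build a COPY statement that casts unsigned columns before writing.

    columns is a list of (name, duckdb_type) pairs. Names are inserted
    as-is, so this sketch only suits trusted, simple identifiers.
    """
    select_list = []
    for name, type_name in columns:
        target = WIDENED.get(type_name.upper())
        if target is None:
            select_list.append(name)  # already a supported type
        else:
            select_list.append(f"CAST({name} AS {target}) AS {name}")
    return (
        f"COPY (SELECT {', '.join(select_list)} FROM {table}) "
        f"TO '{path}' (FORMAT postgres_binary)"
    )


stmt = copy_statement(
    "readings", [("id", "INTEGER"), ("hits", "UINTEGER")], "readings.bin"
)
```

The receiving Postgres table would then need matching column types (here `integer` and `bigint`), since binary COPY does not convert between types on load.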

For now this only supports writing, although we could probably support reading from the binary format using the PostgresBinaryReader as well.

adriangb commented Apr 3, 2024

I'm amazed at how little extra code this was and how fast you implemented it. This is really an amazing project!

I tested this and it worked as described with no issues 😄

The one thing missing for DuckDB to replace pgpq is a way to copy to an in-memory buffer, at least in Python, but that's tangential to this PR. I think this should go ahead just like this!

@Mytherin Mytherin merged commit d857039 into duckdb:main Apr 3, 2024
16 checks passed

Mytherin commented Apr 3, 2024

Thanks for trying it out. Copying to an in-memory buffer might be possible with fsspec, although I'm not super well versed in that so I'm not entirely sure.

adriangb commented Apr 3, 2024

Thanks! That's certainly not related to this, so I opened duckdb/duckdb#11499 to continue that discussion. Thank you again for getting this done so quickly!
