Partial reading with duckdb-wasm on a parquet file stored on a public GCS bucket #2115

@nmondon

What happens?

Hi there and thanks for the great work on duckdb!

I'm trying to partially load a Parquet file stored in a public GCS bucket, but with duckdb-wasm I never get a 206 (Partial Content) response, only 200. I should mention that I use duckdb-wasm-kit on top of it, but that library loads duckdb-wasm as a peer dependency.

Partial loading works with hyparquet. Don't hesitate to ask if you need more details about my setup (CORS policy, ...).
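For anyone who wants to check the bucket independently of DuckDB, the 206-versus-200 behaviour can be probed with a plain ranged request. The following is a minimal TypeScript sketch, not part of duckdb-wasm or hyparquet: `probeRange`, `classifyRangeResponse`, `parseContentRange` and the `Fetcher` type are hypothetical helper names, and the default 1 KiB range is arbitrary. It assumes that `Content-Range` may be unreadable from a cross-origin page unless the bucket's CORS policy exposes it, so a missing header is reported as an unknown range rather than as a failure.

```typescript
// Outcome of a ranged GET, judged from the status code and Content-Range only.
type RangeOutcome =
  | { kind: 'partial'; start: number; end: number; total: number | null }
  | { kind: 'partial-unknown-range' } // 206, but Content-Range missing or unreadable
  | { kind: 'full' }                  // 200: the server ignored the Range header
  | { kind: 'other'; status: number };

// Minimal shape of what is needed from fetch(), so the global fetch can be passed in.
type MinimalResponse = {
  status: number;
  headers: { get(name: string): string | null };
  body?: { cancel(): Promise<void> } | null;
};
type Fetcher = (
  url: string,
  init: { headers: Record<string, string> }
) => Promise<MinimalResponse>;

// Parse a Content-Range value such as "bytes 0-1023/4096" or "bytes 0-1023/*".
function parseContentRange(
  value: string
): { start: number; end: number; total: number | null } | null {
  const m = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(value.trim());
  if (!m) return null;
  return {
    start: Number(m[1]),
    end: Number(m[2]),
    total: m[3] === '*' ? null : Number(m[3]),
  };
}

// Turn a status code and (possibly hidden) Content-Range header into an outcome.
function classifyRangeResponse(
  status: number,
  contentRange: string | null
): RangeOutcome {
  if (status === 206) {
    const parsed = contentRange ? parseContentRange(contentRange) : null;
    return parsed ? { kind: 'partial', ...parsed } : { kind: 'partial-unknown-range' };
  }
  if (status === 200) return { kind: 'full' };
  return { kind: 'other', status };
}

// Request bytes [start, end] and report how the server answered.
async function probeRange(
  url: string,
  fetcher: Fetcher,
  start = 0,
  end = 1023
): Promise<RangeOutcome> {
  const res = await fetcher(url, { headers: { Range: `bytes=${start}-${end}` } });
  // Stop the download early: on a 200 the body would be the whole file.
  if (res.body) await res.body.cancel();
  return classifyRangeResponse(res.status, res.headers.get('Content-Range'));
}
```

In a browser, `await probeRange('https://storage.googleapis.com/figdata-deces/base_deces_sample.parquet', fetch)` would be expected to report `partial` (or `partial-unknown-range`) if the bucket honours the `Range` header for that origin, and `full` otherwise.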

Here is my setup to generate the parquet file on the bucket:

TO 'gs://figdata-deces/base_deces_sample.parquet' (
    COMPRESSION 'SNAPPY',
    ROW_GROUP_SIZE 100_000,
    OVERWRITE_OR_IGNORE TRUE,
    parquet_version 'v2'
);

and my client code to load it:

import { StrictMode, useEffect } from 'react';
import { createRoot } from 'react-dom/client';
import { useDuckDb } from 'duckdb-wasm-kit';

function Demo() {
  const { db, loading, error } = useDuckDb();

  useEffect(() => {
    const fetchData = async () => {
      if (db) {
        const c = await db.connect();
        await c.query('LOAD httpfs;');
        await c.query("SET s3_endpoint='storage.googleapis.com';");

        await c
          .query(
            "SELECT nom, prenoms FROM read_parquet('https://storage.googleapis.com/figdata-deces/base_deces_sample.parquet') LIMIT 10;"
          )
          .then((result) => {
            console.log(result);
          });

        await c.close();
      }
    };

    fetchData();
  }, [db]);

  if (loading) return <div>Loading...</div>;
  if (error) return <div>Error: {error.message}</div>;

  return <div>Demo</div>;
}

createRoot(document.getElementById('root')!).render(
  <StrictMode>
    <div className="column">
      <p>
        Lorem ipsum dolor sit amet consectetur adipisicing elit. Quisquam, quos.
      </p>
      <Demo />
    </div>
  </StrictMode>
);
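As background on why 206 responses are expected for a query like this one: a Parquet reader normally needs only the footer metadata plus the column chunks it selects. A Parquet file ends with a 4-byte little-endian metadata length followed by the magic bytes `PAR1`, so the footer's position can be computed from the last 8 bytes of the file. The TypeScript sketch below illustrates only that arithmetic. `tailRangeHeader`, `footerRange` and `toRangeHeader` are hypothetical names, and nothing here claims to describe how DuckDB-Wasm or hyparquet actually schedule their requests.

```typescript
// Inclusive byte range, as used in an HTTP Range header.
type ByteRange = { start: number; end: number };

// Suffix range asking for the last n bytes of a resource, e.g. "bytes=-8".
function tailRangeHeader(n: number): string {
  return `bytes=-${n}`;
}

// Given the total file size and (at least) the last 8 bytes of a Parquet file,
// compute the inclusive byte range that holds the footer metadata.
function footerRange(fileSize: number, tail: Uint8Array): ByteRange {
  if (tail.length < 8) throw new Error('need at least the last 8 bytes');
  const last8 = tail.subarray(tail.length - 8);

  // Bytes 4..7 of the tail must spell "PAR1".
  const isParquet =
    last8[4] === 0x50 && last8[5] === 0x41 && last8[6] === 0x52 && last8[7] === 0x31;
  if (!isParquet) throw new Error('missing PAR1 magic at end of file');

  // Bytes 0..3 of the tail hold the metadata length, little-endian.
  const view = new DataView(last8.buffer, last8.byteOffset, 8);
  const metadataLength = view.getUint32(0, true);

  const start = fileSize - 8 - metadataLength;
  const end = fileSize - 8 - 1;
  // The file also starts with a 4-byte magic, so metadata cannot begin before byte 4.
  if (start < 4) throw new Error('metadata length exceeds file size');
  return { start, end };
}

// Format an inclusive byte range as a Range header value.
function toRangeHeader(r: ByteRange): string {
  return `bytes=${r.start}-${r.end}`;
}
```

A reader following this scheme would first send `Range: bytes=-8`, then request the footer range returned by `footerRange`, and only then fetch the column chunks for `nom` and `prenoms`. Each of those steps relies on the server answering with 206.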

To Reproduce

Install the dependencies, then run the React code shown above:

npm i @duckdb/duckdb-wasm duckdb-wasm-kit

### Browser/Environment:

Chrome Version 140.0.7339.215 (Official Build) (arm64)

### Device:

OS X 

### DuckDB-Wasm Version:

1.30

### DuckDB-Wasm Deployment:

React client via duckdb-wasm-kit

### Full Name:

Nicolas Mondon

### Affiliation:

Le Figaro
