Skip to content

@libsql/client Memory leak when doing batch inserts to a file #333

@kevinfiol

Description

@kevinfiol

System info:

  • OS: Fedora Linux 42
  • CPU: 12th Gen Intel® Core™ i5-12400F (12)
  • Node: 24.11.1
  • @libsql/client: 0.15.15

Steps to reproduce:

  1. Created the following table:
create table if not exists user_indexed (
  id integer primary key autoincrement,
  first_name text not null,
  last_name text not null,
  email text not null,
  birthday text,
  is_pro integer not null default 0,
  deleted_at text,
  created_at text default CURRENT_TIMESTAMP,
  updated_at text
) strict;
  1. Run the following seed script (@faker-js/faker is used to generate random data):
import { memoryUsage } from 'node:process';
import { createClient } from '@libsql/client';
import { faker } from '@faker-js/faker';

const MEMORY_LIMIT = 2_147_483_648; // 2GB
const USER_COUNT = 500_000;
const BATCH_SIZE = 20_000;

const db = createClient({
  url: 'file:db.sqlite'
});

// set pragmas
db.execute('pragma journal_mode = wal;')

// generate user data
const users = [];
for (let i = 0; i < USER_COUNT; i++) {
  const firstName = faker.person.firstName();
  const lastName = faker.person.lastName();
  const email = faker.internet.email({ firstName, lastName }).toLowerCase();
  const birthday = faker.date.birthdate().toISOString();
  const is_pro = faker.number.int({ min: 0, max: 1 });

  const today = (new Date()).toISOString();
  const created_at = faker.date.between({ from: '2010-01-01', to: today }).toISOString();
  const updated_at = faker.date.between({ from: created_at, to: today }).toISOString();

  const is_deleted = faker.datatype.boolean();
  const deleted_at = is_deleted ? faker.date.between({ from: updated_at, to: today }).toISOString() : null;

  users.push({
    firstName,
    lastName,
    email,
    birthday,
    is_pro,
    deleted_at,
    created_at,
    updated_at
  });
}

// create insert statements
const stmts = [];
for (const user of users) {
  stmts.push({
    sql: `
      insert into user_indexed (
        first_name,
        last_name,
        email,
        birthday,
        is_pro,
        deleted_at,
        created_at,
        updated_at
      ) values (
        :firstName,
        :lastName,
        :email,
        :birthday,
        :is_pro,
        :deleted_at,
        :created_at,
        :updated_at
      )
    `,
    args: user
  });
}

try {
  console.time('seeding');

  let count = 0;
  while (stmts.length > 0) {
    const memUsageInBytes = memoryUsage.rss();
    console.log(`memory usage: ${formatBytes(memUsageInBytes)}`)
    if (memUsageInBytes >= MEMORY_LIMIT) {
      console.error('exceeded memory limit. exiting');
      process.exit(1);
    }

    const batch = stmts.splice(0, BATCH_SIZE);
    await db.batch(batch);

    count += BATCH_SIZE;
    console.log(`seeded ${count} rows...`)
  }

  console.log('DB has been seeded');
  console.timeEnd('seeding');
} catch (e) {
  throw e;
}

function formatBytes(bytes) {
  const kb = 1024;
  const mb = kb * 1024;
  const gb = mb * 1024;

  if (bytes < kb) {
    return bytes + ' B';
  } else if (bytes < mb) {
    return (bytes / kb).toFixed(2) + ' KB';
  } else if (bytes < gb) {
    return (bytes / mb).toFixed(2) + ' MB';
  } else {
    return (bytes / gb).toFixed(2) + ' GB';
  }
}
  1. Track the memory usage. From my results, it seems to balloon with every batch of insertions:
$ node scripts/seed-leak.js 
memory usage: 372.71 MB
seeded 20000 rows...
memory usage: 483.02 MB
seeded 40000 rows...
memory usage: 623.32 MB
seeded 60000 rows...
memory usage: 734.32 MB
seeded 80000 rows...
memory usage: 852.62 MB
seeded 100000 rows...
memory usage: 963.37 MB
seeded 120000 rows...
memory usage: 1.06 GB
seeded 140000 rows...
memory usage: 1.18 GB
seeded 160000 rows...
memory usage: 1.28 GB
seeded 180000 rows...
memory usage: 1.40 GB
seeded 200000 rows...
memory usage: 1.51 GB
seeded 220000 rows...
memory usage: 1.63 GB
seeded 240000 rows...
memory usage: 1.75 GB
seeded 260000 rows...
memory usage: 1.85 GB
seeded 280000 rows...
memory usage: 1.97 GB
seeded 300000 rows...
memory usage: 2.08 GB
exceeded memory limit. exiting

Some observations that may be important:

  • I did not experience the memory leak when replacing @libsql/client with better-sqlite3@12.4.6.
  • I did not experience the memory leak when replacing @libsql/client with libsql, which has a better-sqlite3-compatible API (however, it was considerably slower than better-sqlite3).
  • In both cases, I wrapped inserts in transactions.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions