primary_key not removed with apply_hints(primary_key="") (same issue with merge_key) #3210

@jorritsandbrink

Description

dlt version

1.17.1

Describe the problem

Providing an empty value for primary_key to apply_hints does not remove a primary key that was set before:

  1. my_resource.apply_hints(primary_key="id") sets the primary key correctly
  2. my_resource.apply_hints(primary_key="") does not remove the primary key (neither does my_resource.apply_hints(primary_key=[]))

The documentation for apply_hints says:

"Pass empty value (for a particular type i.e. "" for a string) to remove a hint."

The same behavior occurs for merge_key.
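As a stopgap, the column-level form used at the end of the reproduction script under "Steps to reproduce" can be generated for either hint. This is a minimal sketch, not part of dlt: `unset_key_hint` is a hypothetical helper name, and using it for merge_key assumes the column-level workaround behaves the same way as it does for primary_key, which this report only verifies for primary_key.

```python
from typing import Dict, Iterable


def unset_key_hint(
    column_names: Iterable[str], hint: str = "primary_key"
) -> Dict[str, Dict[str, bool]]:
    """Build a `columns` mapping that sets `hint` to False on each named column.

    Hypothetical helper: it only builds a plain dict and does not call dlt.
    """
    if hint not in ("primary_key", "merge_key"):
        raise ValueError(f"unsupported hint: {hint!r}")
    return {name: {hint: False} for name in column_names}


# Same shape as the workaround in the reproduction script:
# my_table.apply_hints(columns={"id": {"primary_key": False}})
print(unset_key_hint(["id"]))
```

The result would be passed as `my_resource.apply_hints(columns=unset_key_hint(["id"]))`; the column names still have to be supplied by hand, since the helper does not inspect the resource.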

I found a related issue, #2440, which may suggest this is intended behavior. I'm not sure it is, and either way I find the behavior confusing.

Expected behavior

my_resource.apply_hints(primary_key="") (or my_resource.apply_hints(primary_key=[])) removes the primary key, if one exists.

Steps to reproduce

import json

import dlt


@dlt.resource
def my_table():
    return [{"id": 1, "foo": 1}, {"id": 2, "foo": 1}]


pipe = dlt.pipeline(destination="duckdb")

# Set primary_key using apply_hints
my_table.apply_hints(primary_key="id")
pipe.run(my_table())
table_schema = pipe.default_schema.get_table("my_table")
print('Table schema after `my_table.apply_hints(primary_key="id")` (primary_key set: CORRECT):')
print(json.dumps(table_schema, indent=2))

# Attempt to unset primary_key through apply_hints: doesn't work
my_table.apply_hints(primary_key="")  # primary_key=[] has same behavior as primary_key=""
pipe.run(my_table())
table_schema = pipe.default_schema.get_table("my_table")
print()
print('Table schema after `my_table.apply_hints(primary_key="")` (primary_key still set: INCORRECT):')
print(json.dumps(table_schema, indent=2))

# Set primary_key to False using `columns` as workaround: works
my_table.apply_hints(columns={"id": {"primary_key": False}})
pipe.run(my_table())
table_schema = pipe.default_schema.get_table("my_table")
print()
print('Table schema after `my_table.apply_hints(columns={"id": {"primary_key": False}})` (primary_key set to False: CORRECT, but workaround):')
print(json.dumps(table_schema, indent=2))

Output:

Table schema after `my_table.apply_hints(primary_key="id")` (primary_key set: CORRECT):
{
  "columns": {
    "id": {
      "name": "id",
      "nullable": false,
      "primary_key": true,  # CORRECT
      "data_type": "bigint"
    },
    "foo": {
      "name": "foo",
      "data_type": "bigint",
      "nullable": true
    },
    "_dlt_load_id": {
      "name": "_dlt_load_id",
      "data_type": "text",
      "nullable": false
    },
    "_dlt_id": {
      "name": "_dlt_id",
      "data_type": "text",
      "nullable": false,
      "unique": true,
      "row_key": true
    }
  },
  "write_disposition": "append",
  "name": "my_table",
  "resource": "my_table",
  "x-normalizer": {
    "seen-data": true
  }
}

Table schema after `my_table.apply_hints(primary_key="")` (primary_key still set: INCORRECT):
{
  "columns": {
    "id": {
      "name": "id",
      "nullable": false,
      "primary_key": true,  # INCORRECT
      "data_type": "bigint"
    },
    "foo": {
      "name": "foo",
      "data_type": "bigint",
      "nullable": true
    },
    "_dlt_load_id": {
      "name": "_dlt_load_id",
      "data_type": "text",
      "nullable": false
    },
    "_dlt_id": {
      "name": "_dlt_id",
      "data_type": "text",
      "nullable": false,
      "unique": true,
      "row_key": true
    }
  },
  "write_disposition": "append",
  "name": "my_table",
  "resource": "my_table",
  "x-normalizer": {
    "seen-data": true
  }
}

Table schema after `my_table.apply_hints(columns={"id": {"primary_key": False}})` (primary_key set to False: CORRECT, but workaround):
{
  "columns": {
    "id": {
      "name": "id",
      "nullable": false,
      "primary_key": false,  # CORRECT
      "data_type": "bigint"
    },
    "foo": {
      "name": "foo",
      "data_type": "bigint",
      "nullable": true
    },
    "_dlt_load_id": {
      "name": "_dlt_load_id",
      "data_type": "text",
      "nullable": false
    },
    "_dlt_id": {
      "name": "_dlt_id",
      "data_type": "text",
      "nullable": false,
      "unique": true,
      "row_key": true
    }
  },
  "write_disposition": "append",
  "name": "my_table",
  "resource": "my_table",
  "x-normalizer": {
    "seen-data": true
  }
}
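To compare the three schema dumps above without reading the JSON by eye, the key columns can be pulled out of the table schema dict. This is a minimal sketch that assumes the dict has the shape printed above (a `columns` mapping whose entries may carry boolean `primary_key` / `merge_key` flags); `columns_with_hint` is a hypothetical helper name, and the sample dicts are trimmed copies of the output above.

```python
from typing import Any, Dict, List


def columns_with_hint(table_schema: Dict[str, Any], hint: str) -> List[str]:
    """Return names of columns whose `hint` flag is exactly True."""
    columns = table_schema.get("columns", {})
    return [name for name, col in columns.items() if col.get(hint) is True]


# Trimmed copy of the schema printed after apply_hints(primary_key="")
after_empty_hint = {
    "columns": {
        "id": {"name": "id", "nullable": False, "primary_key": True},
        "foo": {"name": "foo", "nullable": True},
    },
    "name": "my_table",
}

# Trimmed copy of the schema printed after the `columns` workaround
after_workaround = {
    "columns": {
        "id": {"name": "id", "nullable": False, "primary_key": False},
        "foo": {"name": "foo", "nullable": True},
    },
    "name": "my_table",
}

print(columns_with_hint(after_empty_hint, "primary_key"))
print(columns_with_hint(after_workaround, "primary_key"))
```

In the reproduction script the same check would be `columns_with_hint(pipe.default_schema.get_table("my_table"), "primary_key")`. With the expected behavior, it would return an empty list after `apply_hints(primary_key="")`.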

Operating system

Linux

Runtime environment

Local

Python version

3.13

dlt data source

No response

dlt destination

No response

Other deployment details

No response

Additional information

No response

Labels

bug: Something isn't working

Status

In Progress
