Replies: 3 comments 3 replies
My answers to the above questions:
Note that some of the label and annotation values will likely be the same for either every run or for very large numbers of runs.
To have it reflected somewhere, I wanted to add to this discussion another alternative that was contemplated and discarded for improving read performance: business-logic-specific indexes, with queries adapted to use them. Here is how the proposal would look:

Indexed Label/Annotation Paths Configuration Proposal

When using equality operators instead of containment operators with EXISTS subqueries:

```sql
-- Slow (7 minutes): GIN index with @> operator
WHERE data->'metadata'->'labels' @> '{"app.example.com/name":"app-1"}'

-- Fast (9ms): BTREE index with equality operator
WHERE EXISTS (
  SELECT 1 FROM jsonb_each_text(r.data->'metadata'->'labels') l
  WHERE l.key = 'app.example.com/name' AND l.value = 'app-1'
)
```

The subquery approach works but:
Proposed Solution

Create dedicated BTREE indexes for frequently queried, low-selectivity label and annotation keys, and provide a configuration mechanism to inform the API which paths have dedicated indexes.

Core Components
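As a concrete illustration, a dedicated index for one configured label path could be a BTREE expression index over the JSONB text extraction, which is what lets the equality predicates in the "After" queries below use an index scan. The index name and path here are illustrative, not part of the original proposal:

```sql
-- Hypothetical dedicated BTREE expression index for one configured label path
CREATE INDEX idx_labels_app_name
    ON resource ((data->'metadata'->'labels'->>'app.example.com/name'));
```

One such index would be created per configured path, which is why the configuration mechanism is needed: the API must know which paths can be rewritten to equality form.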
Query Transformation Examples

Before (Current Implementation)

```sql
SELECT data->'metadata'->>'creationTimestamp' AS created_at, id, uuid, data
FROM resource
WHERE kind = 'PipelineRun'
  AND api_version = 'tekton.dev/v1'
  AND namespace = 'tenant-a'
  AND data->'metadata'->>'creationTimestamp' > '2025-10-15T08:33:08Z'
  AND data->'metadata'->'labels' @> '{"app.example.com/name":"app-1", "pipeline.example.com/type":"build"}'
ORDER BY data->'metadata'->>'creationTimestamp' DESC, id DESC
LIMIT 30;
```

Performance: ~6 minutes (BitmapAnd with 335,501 rows rechecked)

After (With Indexed Paths)

```sql
SELECT data->'metadata'->>'creationTimestamp' AS created_at, id, uuid, data
FROM resource
WHERE kind = 'PipelineRun'
  AND api_version = 'tekton.dev/v1'
  AND namespace = 'tenant-a'
  AND data->'metadata'->>'creationTimestamp' > '2025-10-15T08:33:08Z'
  AND data->'metadata'->'labels'->>'app.example.com/name' = 'app-1'
  AND data->'metadata'->'labels'->>'pipeline.example.com/type' = 'build'
ORDER BY data->'metadata'->>'creationTimestamp' DESC, id DESC
LIMIT 30;
```

Expected Performance: <100ms (index scan on the dedicated BTREE indexes)

Why this was discarded:
Label Normalization Proposal - Full Normalization Schema
Executive Summary
This document proposes normalizing Kubernetes resource labels out of JSONB into relational tables, using full normalization, to improve the performance of label-filtered queries while minimizing the storage overhead of value duplication.
Current Problem:
Label-filtered queries can take 7+ minutes when filtering by common labels (e.g., application name, pipeline type). This happens because:

- Common labels like `app.example.com/name=app-1` often match 10K-100K+ resources (low selectivity)
- When a bitmap exceeds `work_mem` (default: 4MB), PostgreSQL switches to "lossy mode", which requires rechecking ALL rows in matched pages, causing massive overhead

📊 Detailed technical analysis and query execution internals
Example Query Performance

Query: Find PipelineRuns with `app.example.com/name=app-1` created after a certain date

Current performance:
How PostgreSQL Executes These Queries

When querying with label filters, PostgreSQL must combine two different index types:

1. GIN Index Scan on `data->'metadata'->'labels'`: `@> '{"app.example.com/name":"app-1"}'` returns 362,394 row IDs
2. BTREE Index Scan on `(kind, api_version, namespace, creationTimestamp)`:
3. BitmapAnd Operation:
The work_mem Limitation
PostgreSQL's `work_mem` parameter (default: 4MB) determines how much memory each bitmap can use.

When the bitmap fits in work_mem (non-lossy):

When the bitmap exceeds work_mem (lossy):
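Whether a particular query falls into lossy mode can be observed directly: `EXPLAIN (ANALYZE, BUFFERS)` reports `Heap Blocks: exact=… lossy=…` on the Bitmap Heap Scan node, and raising `work_mem` for the session is a quick way to test its effect. A sketch (the query is the example label filter from above):

```sql
SHOW work_mem;          -- typically 4MB by default
SET work_mem = '64MB';  -- session-only; lets larger bitmaps stay exact (non-lossy)

EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM resource
WHERE data->'metadata'->'labels' @> '{"app.example.com/name":"app-1"}';
```

Raising `work_mem` globally is rarely a real fix, since it is allocated per sort/hash node per backend, but it is useful for confirming the lossy-mode diagnosis.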
Why Non-Selective Labels Cause Problems
Selectivity = (rows matching filter) / (total rows)
Highly selective (<0.1%): e.g., specific git commit SHA
Low selectivity (>1%): e.g., application name, pipeline type
Common Kubernetes labels with low selectivity (>1% of resources):
- `app.kubernetes.io/managed-by: pipelinesascode.tekton.dev` → ALL managed resources
- `app.example.com/name: app-1` → ALL resources for one application (10K-100K+)
- `pipeline.example.com/type: build` → ALL build pipelines
- `namespace: tenant-a` → ALL resources in namespace

As the database grows, these labels match more rows, making the problem worse.
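The selectivity of a specific label can be measured directly with a one-off query like the following (a full scan, so for analysis only, not production traffic):

```sql
-- Fraction of resources matching a given label (selectivity)
SELECT count(*) FILTER (
           WHERE data->'metadata'->'labels' @> '{"app.example.com/name":"app-1"}'
       )::numeric / count(*) AS selectivity
FROM resource;
```

Values above roughly 0.01 (1%) indicate the label is a poor candidate for the GIN containment path described above.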
Real-World Example: Query Plan Analysis
Database: 1.25 million resources, 78K PipelineRuns in namespace
Query: Find PipelineRuns with `app.example.com/name=app-1` after 2025-10-15

Execution:
Why so slow:
Solutions Comparison
Recommendation: Full normalization eliminates bitmap scan issues entirely by using direct BTREE index lookups.
Proposed Solution:
Schema Design
Full Normalization: Four-Table Design
Note: Generic table names (`metadata_key`, `metadata_value`, `metadata_pair`) allow future reuse for annotations via a separate `resource_annotation` table that references the same `metadata_pair` table.

Table Definitions
Table 1: `resource` (Existing - No Changes)

Note: The `data` JSONB field is kept for full manifest storage. Labels are duplicated in relational tables for query performance.

Table 2: `metadata_key` (New - Stores Unique Metadata Keys)

Purpose:
- Stores each unique key string once (e.g., `app.kubernetes.io/managed-by`)
- `app.kubernetes.io/managed-by` is stored once, not once per value

Estimated rows: ~500-1000 unique keys for labels, ~500-1000 for annotations (relatively small)

Table 3: `metadata_value` (New - Stores Unique Metadata Values)

Purpose:
- Stores each unique value string once (e.g., `pipelinesascode.tekton.dev`, `v0.40.0`, `app-1`)
- `pipelinesascode.tekton.dev` is stored once, not millions of times

Estimated rows: ~10K-100K unique values (depends on the cardinality of fields like commit SHAs)

Table 4: `metadata_pair` (New - Stores Unique Key-Value Pairs)

Purpose:
- Stores each unique key-value combination once (e.g., `app.kubernetes.io/managed-by=pipelinesascode.tekton.dev`)
- `app.kubernetes.io/managed-by=pipelinesascode.tekton.dev` is stored once, referenced millions of times

Estimated rows: ~50K-200K unique pairs for labels, similar for annotations (far fewer than total occurrences)

Table 5: `resource_label` (New - Links Resources to Label Pairs)

Purpose:
- Links each resource to the label pairs it carries
- A future `resource_annotation` table can reuse the same `metadata_pair` table

Estimated rows: ~20-50 million (1M resources × 20-50 labels each)
📋 Example Data with Full Normalization
Scenario: 3 PipelineRuns with Common Labels
Table: `resource`

Three PipelineRun rows, each storing the full manifest in `data`: `{"kind":"PipelineRun","metadata":{"labels":{...}},...}`

Table: `metadata_key` (Unique Keys)

Note: Each key string is stored once regardless of how many values or resources use it.

Table: `metadata_value` (Unique Values)

Note: Each value string is stored once. `pipelinesascode.tekton.dev` appears once even if used by millions of resources.

Table: `metadata_pair` (Unique Key-Value Pairs)

- `app.example.com/name=app-1`
- `pipeline.example.com/type=build`
- `pipeline.example.com/type=test`
- `scm.example.com/commit-sha=abc123def456`
- `scm.example.com/commit-sha=def789abc012`
- `app.kubernetes.io/version=v0.40.0`
- `app.kubernetes.io/managed-by=pipelinesascode.tekton.dev`
- `test.example.com/scenario=integration-tests`

Note: Each unique key-value combination is stored once. When thousands of PipelineRuns share the same label, only the pair ID is duplicated in `resource_label`.

Table: `resource_label` (Resource-Label Associations)

First PipelineRun (resource 500123):
- `app.example.com/name=app-1`
- `pipeline.example.com/type=build`
- `scm.example.com/commit-sha=abc123def456`
- `app.kubernetes.io/version=v0.40.0`
- `app.kubernetes.io/managed-by=pipelinesascode.tekton.dev`

Second PipelineRun:
- `app.example.com/name=app-1` (SAME pair as 500123)
- `pipeline.example.com/type=build` (SAME pair as 500123)
- `scm.example.com/commit-sha=def789abc012`
- `app.kubernetes.io/version=v0.40.0` (SAME pair as 500123)
- `app.kubernetes.io/managed-by=pipelinesascode.tekton.dev` (SAME pair as 500123)

Third PipelineRun:
- `app.example.com/name=app-1` (SAME pair as the others)
- `pipeline.example.com/type=test`
- `test.example.com/scenario=integration-tests`

Visual Representation:
Key insight: The pair `app.example.com/name=app-1` (pair_id=1) appears in all three PipelineRuns, meaning:

- `"app.example.com/name"` and `"app-1"` are stored once in their respective tables
- One row in `metadata_pair` links them
- One row in `resource_label` references that pair for each resource
- Instead of storing `"app.example.com/name": "app-1"` (34 bytes) thousands of times, or even `(key_id, value_id)` = 16 bytes thousands of times, we store one 8-byte integer per resource

Query Examples
Query 1: Find All Resources with `app.example.com/name=app-1`

Current JSONB approach (7 minutes):
New fully normalized approach with metadata_pair (< 100ms):
Alternative simpler version:
Query Plan (estimated):
Result: Direct B-tree index lookups using index on pair_id, no bitmap scans, ~50-100ms
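The normalized lookup for Query 1 can be sketched as follows, assuming the column names used throughout this proposal (`key`, `value`, `key_id`, `value_id`, `pair_id`); the exact SQL would depend on the final schema:

```sql
-- Sketch: normalized lookup for app.example.com/name=app-1 (assumed column names)
SELECT r.*
FROM resource r
JOIN resource_label rl ON rl.resource_id = r.id
JOIN metadata_pair  mp ON mp.id = rl.pair_id
JOIN metadata_key   mk ON mk.id = mp.key_id
JOIN metadata_value mv ON mv.id = mp.value_id
WHERE mk.key = 'app.example.com/name'
  AND mv.value = 'app-1';
```

The planner resolves the key and value to a single `pair_id` via tiny BTREE lookups, then scans `resource_label` by that one integer, which is what eliminates the bitmap-combination step entirely.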
Query 2: Find Resources with Multiple Labels
Find PipelineRuns with `app.example.com/name=app-1` AND `scm.example.com/commit-sha=abc123def456`:

Query 3: Get All Labels for a Resource
Result:
Query 4: Analytics - Most Common Label Values
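Under the same assumed schema, an analytics query over the most common values for one key might look like this (a sketch, not the proposal's final SQL):

```sql
-- Sketch: most common values for a given label key
SELECT mv.value, count(*) AS occurrences
FROM resource_label rl
JOIN metadata_pair  mp ON mp.id = rl.pair_id
JOIN metadata_key   mk ON mk.id = mp.key_id
JOIN metadata_value mv ON mv.id = mp.value_id
WHERE mk.key = 'app.example.com/name'
GROUP BY mv.value
ORDER BY occurrences DESC
LIMIT 10;
```

This kind of aggregation is essentially impossible to do efficiently against JSONB without scanning every row, so it is a side benefit of normalization.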
Result:
💾 Storage Analysis: Problem, Current State, and Solution
The Storage Problem: Value Duplication
Many label keys and values are repeated across hundreds of thousands or millions of resources:
High-frequency labels (same value on most/all resources):
- `app.kubernetes.io/managed-by: pipelinesascode.tekton.dev` - on ALL managed PipelineRuns
- `app.kubernetes.io/version: v0.40.0` - on ALL runs with this version
- `pipeline.example.com/type: build` - on ALL build pipelines
- `scm.example.com/git-provider: github` - on ALL GitHub-based pipelines

Medium-frequency labels (same value on thousands of resources):

- `app.example.com/name: app-1` - on ALL resources for this application (~13K resources)
- `namespace: tenant-a` - on ALL resources in this namespace

Low-frequency labels (unique or rare values):

- `scm.example.com/commit-sha: eb1befa91218676771a798b14c2931594710538e` - unique per commit

Problem: Label values like `"pipelinesascode.tekton.dev"` (28 chars) are duplicated many times in partial normalization approaches. With full normalization, each unique string is stored once.

Current Storage (JSONB Only)
Estimated for 1M PipelineRuns with 20 labels each:
All labels are embedded in the `data` field.

Partial Normalization Storage
Scenario: 1 million PipelineRuns, each with 20 labels
Using partial normalization (`label(id, key, value)` + `resource_label`):

- `label` table
- `resource_label` table

Problem: Strings like `"pipelinesascode.tekton.dev"` are stored multiple times as part of different key-value pairs.

Full Normalization Storage (4-Table Design - Proposed Solution)
Estimated for 1M PipelineRuns with 20 labels each:
- `metadata_key`: keys like `app.kubernetes.io/managed-by` + timestamps
- `metadata_value`: values like `pipelinesascode.tekton.dev`, `app-1` + timestamps
- `metadata_pair`
- `resource_label`

Storage Comparison
Savings:
Critical benefit as scale increases:
vs. keeping labels in JSONB + duplicating in normalized tables:
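Whatever the estimates, the actual on-disk footprint of the new tables can be verified after migration with PostgreSQL's size functions:

```sql
-- Actual on-disk sizes of the normalized tables (data + indexes + TOAST)
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
WHERE relname IN ('metadata_key', 'metadata_value', 'metadata_pair', 'resource_label');
```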
Data Migration Strategy
📦 Phase 1: Create New Tables
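A minimal sketch of what the Phase 1 DDL might look like, assuming identity primary keys and the column names used in the examples in this proposal (`key`, `value`, `key_id`, `value_id`, `pair_id`); constraint and index names are illustrative:

```sql
CREATE TABLE metadata_key (
    id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    key        TEXT NOT NULL UNIQUE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE metadata_value (
    id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    value      TEXT NOT NULL UNIQUE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE metadata_pair (
    id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    key_id     BIGINT NOT NULL REFERENCES metadata_key(id),
    value_id   BIGINT NOT NULL REFERENCES metadata_value(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (key_id, value_id)
);

CREATE TABLE resource_label (
    resource_id BIGINT NOT NULL,  -- references resource(id)
    pair_id     BIGINT NOT NULL REFERENCES metadata_pair(id),
    PRIMARY KEY (resource_id, pair_id)
);

-- Supports "find all resources with pair X", the hot query path
CREATE INDEX idx_resource_label_pair ON resource_label (pair_id);
```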
Note on `updated_at` columns:

All tables include an `updated_at` timestamp column that should be automatically updated whenever a row is modified. We'll add a trigger function to handle this in Phase 1b below.

🔄 Phase 1b: Create Trigger for Automatic `updated_at` Updates
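A sketch of the `updated_at` machinery, using a single generic trigger function shared by all three catalog tables (function and trigger names are illustrative):

```sql
-- Generic trigger function: stamp the row on every UPDATE
CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER metadata_key_set_updated_at
    BEFORE UPDATE ON metadata_key
    FOR EACH ROW EXECUTE FUNCTION set_updated_at();

-- Repeat the CREATE TRIGGER statement for metadata_value and metadata_pair
```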
Note: These triggers automatically update the `updated_at` timestamp whenever any row in these tables is modified via an UPDATE operation. This ensures accurate tracking of when each record was last modified.

📥 Phase 2: Populate Tables from Existing Data
Note: For large databases (2M+ resources), run this in batches:
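A sketch of the backfill, assuming the schema above; each statement is idempotent via `ON CONFLICT DO NOTHING`, so a batched version can simply restrict the `resource` scan by id range and re-run safely:

```sql
-- 1. Distinct keys
INSERT INTO metadata_key (key)
SELECT DISTINCT l.key
FROM resource r
CROSS JOIN LATERAL jsonb_each_text(r.data->'metadata'->'labels') AS l(key, value)
ON CONFLICT DO NOTHING;

-- 2. Distinct values
INSERT INTO metadata_value (value)
SELECT DISTINCT l.value
FROM resource r
CROSS JOIN LATERAL jsonb_each_text(r.data->'metadata'->'labels') AS l(key, value)
ON CONFLICT DO NOTHING;

-- 3. Distinct key-value pairs
INSERT INTO metadata_pair (key_id, value_id)
SELECT DISTINCT mk.id, mv.id
FROM resource r
CROSS JOIN LATERAL jsonb_each_text(r.data->'metadata'->'labels') AS l(key, value)
JOIN metadata_key   mk ON mk.key = l.key
JOIN metadata_value mv ON mv.value = l.value
ON CONFLICT DO NOTHING;

-- 4. Resource-to-pair associations
INSERT INTO resource_label (resource_id, pair_id)
SELECT r.id, mp.id
FROM resource r
CROSS JOIN LATERAL jsonb_each_text(r.data->'metadata'->'labels') AS l(key, value)
JOIN metadata_key   mk ON mk.key = l.key
JOIN metadata_value mv ON mv.value = l.value
JOIN metadata_pair  mp ON mp.key_id = mk.id AND mp.value_id = mv.id
ON CONFLICT DO NOTHING;
```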
⚙️ Phase 3: Modify Write Path with Database Trigger
Recommended: Use a database trigger with a WHEN clause to automatically sync labels only when they change.
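A sketch of the sync trigger under the assumed schema. Simplification: it assumes the needed `metadata_key`/`metadata_value`/`metadata_pair` rows already exist; a complete version would upsert missing keys, values, and pairs first. Note that a WHEN clause may reference OLD only on UPDATE, so inserts need a separate trigger without it:

```sql
CREATE OR REPLACE FUNCTION sync_resource_labels() RETURNS trigger AS $$
BEGIN
    -- Replace the resource's associations from its current JSONB labels
    DELETE FROM resource_label WHERE resource_id = NEW.id;
    INSERT INTO resource_label (resource_id, pair_id)
    SELECT NEW.id, mp.id
    FROM jsonb_each_text(NEW.data->'metadata'->'labels') AS l(key, value)
    JOIN metadata_key   mk ON mk.key = l.key
    JOIN metadata_value mv ON mv.value = l.value
    JOIN metadata_pair  mp ON mp.key_id = mk.id AND mp.value_id = mv.id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Fires only when the labels object actually changed
CREATE TRIGGER resource_labels_sync_update
    AFTER UPDATE ON resource
    FOR EACH ROW
    WHEN (NEW.data->'metadata'->'labels' IS DISTINCT FROM OLD.data->'metadata'->'labels')
    EXECUTE FUNCTION sync_resource_labels();

-- New rows always need their labels synced
CREATE TRIGGER resource_labels_sync_insert
    AFTER INSERT ON resource
    FOR EACH ROW EXECUTE FUNCTION sync_resource_labels();
```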
Performance Impact:
Testing the Trigger:
📖 Phase 4: Application Read Code Examples
Architecture Overview
KubeArchive uses a database-agnostic architecture:
- `reader.go`: Database-agnostic business logic
- `<vendor>.go`: Vendor-specific SQL building (PostgreSQL, MySQL, etc.)

To support normalized label queries, we need to:
- Keep `reader.go` vendor-agnostic
Step 2: Implement PostgreSQL-specific Label Facade
Step 3: Create Label-Specific Filters
Step 4: Database-Agnostic Reader Code
Step 5: Optimized Single-Label Query (Using CTE)
For better performance with single labels, use a CTE-based approach:
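The CTE idea can be sketched as follows under the assumed schema: resolve the single pair id first, then scan `resource_label` by that integer, so the main query never touches the string catalogs:

```sql
-- Sketch: resolve the pair id once, then scan resource_label by pair_id
WITH pair AS (
    SELECT mp.id
    FROM metadata_pair mp
    JOIN metadata_key   mk ON mk.id = mp.key_id
    JOIN metadata_value mv ON mv.id = mp.value_id
    WHERE mk.key = 'app.example.com/name'
      AND mv.value = 'app-1'
)
SELECT r.*
FROM resource r
JOIN resource_label rl ON rl.resource_id = r.id
JOIN pair p           ON p.id = rl.pair_id;
```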
Step 6: Example Usage in API Handler
Benefits of This Architecture
- `reader.go` has no vendor-specific SQL
- The `QueryFacade` interface can be mocked for unit tests

⚖️ Write Path: Database Triggers vs. Application-Level Writes
The proposal includes database triggers for syncing labels to normalized tables. However, there's an alternative: writing to normalized tables directly from the KubeArchive sink.
Option 1: Database Triggers (Proposed in this document)
How it works:
- The application writes only to the `resource` table (JSONB); the trigger keeps the normalized tables in sync

Pros:
Cons:
Option 2: Application-Level Writes (Alternative approach)
How it works:
- The sink writes to the `resource` table AND the normalized tables in a single transaction

Example implementation:
Pros:
Cons:
Recommendation
Use Database Triggers (Option 1) for this implementation:
- Easy to roll back with `DROP TRIGGER` if issues arise
- The WHEN clause `NEW.data->'metadata'->'labels' IS DISTINCT FROM OLD.data->'metadata'->'labels'` skips unchanged labels without any application overhead

Why the trigger WHEN clause is optimal:
Consider Application-Level Writes (Option 2) only if: