
Conversation

@luoluoyuyu luoluoyuyu commented Nov 4, 2025

Description

This PR addresses two performance bottlenecks in wide-table write scenarios:

  1. Metadata analysis phase: StatementAnalyze takes too long (even longer than the MemTable write itself)
  2. TSFile table registration: unnecessary schema-conversion overhead

I. Metadata Phase Optimization

Problem Background

Flame graph analysis showed that in wide-table scenarios the StatementAnalyze stage consumes a disproportionate share of CPU time, making it the main bottleneck for write performance.

StatementAnalyze Flame Graph

Optimization Measures

1. Reduce Redundant TsTable → TableSchema Conversion

Problem: frequent schema-conversion operations consume a large share of CPU time.

Conversion Optimization Comparison
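The idea can be sketched as version-keyed memoization of the conversion result: reconvert only when the table definition has changed. The class and method names below are illustrative stand-ins, not the actual IoTDB types.

```java
import java.util.concurrent.atomic.AtomicReference;

public class TableSchemaCache {
    // Pair of (table version, converted schema) swapped atomically.
    private static final class Entry {
        final long version;
        final String schema; // stands in for the converted TableSchema object
        Entry(long version, String schema) { this.version = version; this.schema = schema; }
    }

    private final AtomicReference<Entry> cached = new AtomicReference<>();

    /** Returns the cached conversion if the table version still matches, else reconverts. */
    public String getOrConvert(long currentVersion, java.util.function.LongFunction<String> convert) {
        Entry e = cached.get();
        if (e != null && e.version == currentVersion) {
            return e.schema; // fast path: no conversion work on the hot write path
        }
        String fresh = convert.apply(currentVersion);
        cached.set(new Entry(currentVersion, fresh));
        return fresh;
    }
}
```

The per-file summary later in this page suggests the real change takes a similar shape (a version-aware TableSchema cache in DataRegion).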

2. Reduce Unnecessary semanticCheck Execution

Problem: in wide-table scenarios, the full semantic check runs on every write, even though most check items are identical from one write to the next.

Solution: introduce a check-item cache so that checks that have already passed are skipped.

semanticCheck Optimization
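A minimal sketch of the caching idea: remember that a statement's measurement list has already passed validation and skip the re-check on subsequent writes. Names are illustrative, not the actual InsertBaseStatement fields.

```java
public class CachedSemanticCheck {
    private boolean measurementsValidated = false;
    private int validationRuns = 0;

    /** Runs the expensive per-measurement checks at most once per statement. */
    public void semanticCheck(String[] measurements) {
        if (measurementsValidated) {
            return; // cached: checks already passed for this statement
        }
        for (String m : measurements) {
            if (m == null || m.isEmpty()) {
                throw new IllegalArgumentException("empty measurement name");
            }
        }
        validationRuns++;
        measurementsValidated = true;
    }

    public int getValidationRuns() { return validationRuns; }
}
```

In a real implementation the flag would have to be invalidated whenever the measurement list is mutated.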


3. Shorten TsTable.getColumnSchema Read-Lock Hold Time with Optimistic Locking

Problem: severe read-lock contention under high concurrency.

Solution:

  • Read operations first attempt an optimistic read and fall back to a pessimistic read lock if validation fails.

Lock Optimization Flame Graph
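The optimistic-read-with-fallback pattern described above is what java.util.concurrent.locks.StampedLock provides in the JDK. A minimal sketch of the pattern, using a stand-in map rather than the real TsTable internals (note that data read under an optimistic stamp may be inconsistent until validate() confirms it, so real code must tolerate or retry such reads):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.StampedLock;

public class ColumnSchemaMap {
    private final StampedLock lock = new StampedLock();
    private final Map<String, String> columns = new HashMap<>();

    public void put(String name, String schema) {
        long stamp = lock.writeLock();
        try {
            columns.put(name, schema);
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    public String getColumnSchema(String name) {
        long stamp = lock.tryOptimisticRead(); // no blocking, no contention on the lock word
        String schema = columns.get(name);
        if (!lock.validate(stamp)) {           // a writer slipped in: retry under a real read lock
            stamp = lock.readLock();
            try {
                schema = columns.get(name);
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return schema;
    }
}
```

When reads vastly outnumber writes, as with schema lookups on a hot write path, the optimistic fast path avoids almost all cache-line contention on the lock.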

II. TSFile Table Registration Optimization

Problem Background

The TSFile registration stage performs an unnecessary TsTableSchema conversion, adding avoidable overhead.

TSFile Registration Flame Graph

Optimization Measures

1. Eliminate Redundant TsTableSchema Conversion

Conversion Elimination Comparison

Implementation Points:

  • Remove intermediate TsTableSchema conversion in TsFileRegister

This PR has:

  • been self-reviewed.
    • concurrent read
    • concurrent write
    • concurrent read and write
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods.
  • added or updated version, license, or notice information
  • added comments explaining the "why" and the intent of the code wherever would not be obvious
    for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the code-coverage
    threshold is met.
  • added integration tests.
  • been tested in a test IoTDB cluster.

Key changed/added classes (or packages if there are too many classes) in this PR

@luoluoyuyu luoluoyuyu closed this Nov 6, 2025
@luoluoyuyu luoluoyuyu deleted the impl-table-util branch November 6, 2025 10:11
@luoluoyuyu luoluoyuyu restored the impl-table-util branch November 7, 2025 02:13
@luoluoyuyu luoluoyuyu reopened this Nov 7, 2025
@luoluoyuyu luoluoyuyu changed the title Optimize TableSchema conversion for write performance perf: Optimize wide table write performance Nov 7, 2025
Comment on lines 116 to 126
```java
// Skip column name
ReadWriteIOUtils.readString(buffer);
// Skip data type
ReadWriteIOUtils.readDataType(buffer);
// Skip encoding and compression for FIELD columns
if (category == TsTableColumnCategory.FIELD) {
  ReadWriteIOUtils.readEncoding(buffer);
  ReadWriteIOUtils.readCompressionType(buffer);
}
// Skip column props
ReadWriteIOUtils.readMap(buffer);
```

May add skipString and skipMap, which only change the buffer position instead of creating temporary objects.
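A sketch of what such helpers could look like, assuming a length-prefixed layout (an int length followed by the bytes, and an int entry count for maps). The actual ReadWriteIOUtils wire format may differ, for example in how null values are encoded, so treat this as illustrative only.

```java
import java.nio.ByteBuffer;

public class SkipUtils {
    /** Skips one length-prefixed string: an int length, then that many bytes. */
    public static void skipString(ByteBuffer buffer) {
        int length = buffer.getInt();
        if (length > 0) {
            // advance the position without materializing a String
            buffer.position(buffer.position() + length);
        }
    }

    /** Skips a string-to-string map: an int entry count, then key/value strings. */
    public static void skipStringMap(ByteBuffer buffer) {
        int size = buffer.getInt();
        for (int i = 0; i < size; i++) {
            skipString(buffer); // key
            skipString(buffer); // value
        }
    }
}
```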

Copilot finished reviewing on behalf of HTHou November 20, 2025 10:41

Copilot AI left a comment


Pull Request Overview

This PR optimizes wide-table write performance by addressing two main bottlenecks: metadata analysis phase and TSFile table registration. Key optimizations include:

  1. Optimistic locking in TsTable - Introduces lock-free fast paths for read operations using version tracking and write flags
  2. Semantic check caching - Adds flags to skip redundant validation of InsertNode measurements
  3. Direct schema conversion - Eliminates intermediate schema conversions during TSFile registration
  4. Lower-case transformation optimization - Adds caching to prevent redundant toLowerCase operations
  5. Test utilities - Introduces TSDataTypeTestUtils for consistent handling of supported data types in tests

Reviewed Changes

Copilot reviewed 35 out of 35 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| pom.xml | Updates the tsfile dependency version to 2.2.0-251111-SNAPSHOT |
| TsTable.java | Adds an optimistic-locking mechanism with version tracking for improved read performance |
| TsTableColumnSchema.java, FieldColumnSchema.java | Adds a getMeasurementSchema() method for schema conversion |
| TsFileTableSchemaUtil.java | New utility class for optimized TsTable ↔ TableSchema conversion without intermediate serialization |
| InsertNodeMeasurementInfo.java | New class encapsulating insert-node measurements with lazy-evaluation support |
| InsertBaseStatement.java | Adds caching flags for semantic checks, toLowerCase, and attribute columns |
| InsertTabletStatement.java, InsertRowStatement.java | Implements rebuildArraysAfterExpansion for TAG column reordering |
| WrappedInsertStatement.java | Refactors validation to use the new InsertNodeMeasurementInfo and optimized TAG column handling |
| TableHeaderSchemaValidator.java | Adds validateInsertNodeMeasurements with custom handlers for optimized validation |
| DataRegion.java | Replaces the schema cache with a version-aware TableSchema cache |
| LoadTsFileManager.java, UnsealedTsFileRecoverPerformer.java | Uses TsFileTableSchemaUtil instead of intermediate conversions |
| DataNodeTableCache.java | Renames version to instanceVersion for clarity |
| TSDataTypeTestUtils.java | New test utility for filtering unsupported TSDataType values |
| AlignedTVList.java | Optimizes bitmap initialization using markRange |
| TVList.java | Changes hasLimit() to hasSetLimit() for the pagination controller |
| IoTDBConfig.java, IoTDBDescriptor.java | Removes the deprecated loadTableSchemaCacheSizeInBytes configuration |
Comments suppressed due to low confidence (1)

iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/sql/ast/WrappedInsertStatement.java:1

  • This duplicates the same incorrect logic from InsertBaseStatement.semanticCheck() (see previous comment). If a measurement has failed and is in failedMeasurementIndex2Info, it could be null but would skip the null check, then get added to deduplicatedMeasurements, causing incorrect behavior. Failed measurements should be completely skipped from duplicate detection.


Comment on lines 458 to 464
```java
      measurementValidator.validate(
          i,
          measurementInfo.getMeasurementName(i),
          measurementInfo.getType(i),
          columnCategories[i],
          table.getColumnSchema(measurementInfo.getMeasurementName(i)));
    }
```

Copilot AI Nov 20, 2025


Variable measurementValidator may be null at this access as suggested by this null guard.

Suggested change:

```diff
-measurementValidator.validate(
-    i,
-    measurementInfo.getMeasurementName(i),
-    measurementInfo.getType(i),
-    columnCategories[i],
-    table.getColumnSchema(measurementInfo.getMeasurementName(i)));
-}
+if (measurementValidator != null) {
+  measurementValidator.validate(
+      i,
+      measurementInfo.getMeasurementName(i),
+      measurementInfo.getType(i),
+      columnCategories[i],
+      table.getColumnSchema(measurementInfo.getMeasurementName(i)));
+}
```

```java
          i,
          measurementInfo.getMeasurementName(i),
          measurementInfo.getType(i),
          columnCategories[i],
```

Copilot AI Nov 20, 2025


Variable columnCategories may be null at this access as suggested by this null guard.

Suggested change:

```diff
-columnCategories[i],
+columnCategories != null ? columnCategories[i] : null,
```

```java
        .validateTableHeaderSchema(
            database, tableSchema, context, allowCreateTable, isStrictTagColumn);
  }
```


Copilot AI Nov 20, 2025


This method overrides Metadata.validateInsertNodeMeasurements; it is advisable to add an Override annotation.

Suggested change:

```diff
+@Override
```

```java
    assertEquals(tableSchema, schema);
    return Optional.of(tableSchema);
  }
```


Copilot AI Nov 20, 2025


This method overrides TestMetadata.validateInsertNodeMeasurements; it is advisable to add an Override annotation.

Suggested change:

```diff
+@Override
```

```java
for (int oldIdx = 0; oldIdx < oldLength; oldIdx++) {
  final int newIdx = oldToNewMapping[oldIdx];
  columns[newIdx] = oldColumns[oldIdx];
  if (nullBitMaps != null && oldNullBitMaps != null) {
```

Copilot AI Nov 20, 2025


This check is useless. oldNullBitMaps cannot be null at this check, since it is guarded by ... != ....

Suggested change:

```diff
-if (nullBitMaps != null && oldNullBitMaps != null) {
+if (nullBitMaps != null) {
```

```java
public void testInsertMultiRowWithNull() throws SQLException {
  try (Connection connection = EnvFactory.getEnv().getConnection(BaseEnv.TABLE_SQL_DIALECT);
      Statement st1 = connection.createStatement()) {
    st1.execute("SET CONFIGURATION enable_auto_create_schema='false'");
```


Reset configurations when the test is done.
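One way to guarantee the reset is a try/finally around the test body, so a failing test cannot leak the changed setting. The SqlExecutor interface and method name below are illustrative stand-ins for the JDBC Statement used in the actual test.

```java
public class ConfigResetExample {
    interface SqlExecutor {
        void execute(String sql) throws Exception;
    }

    /** Runs the body with auto-create disabled, always restoring the setting afterwards. */
    static void runWithAutoCreateDisabled(SqlExecutor st, Runnable testBody) throws Exception {
        st.execute("SET CONFIGURATION enable_auto_create_schema='false'");
        try {
            testBody.run();
        } finally {
            // executed even when the test body throws
            st.execute("SET CONFIGURATION enable_auto_create_schema='true'");
        }
    }
}
```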

Comment on lines +639 to +640
```java
newNullBitMaps[newIdx] = new BitMap(rowCount);
newNullBitMaps[newIdx].markAll();
```

Not relevant to this PR, but we may add an extension of BitMap like AllMarkedBitMap, which cannot be marked/unmarked and always returns true when a position is tested.
This could reduce the memory footprint, because it would not need to store an underlying array.
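A minimal sketch of that idea, using a stand-in interface rather than the real BitMap class from the tsfile module: the all-marked variant stores no bit array at all and rejects mutation.

```java
public class AllMarkedBitMapSketch {
    interface IBitMap {
        boolean isMarked(int position);
        void mark(int position);
    }

    /** Immutable bitmap that reports every position as marked, with zero storage. */
    static final class AllMarkedBitMap implements IBitMap {
        private final int size;

        AllMarkedBitMap(int size) {
            this.size = size;
        }

        @Override
        public boolean isMarked(int position) {
            if (position < 0 || position >= size) {
                throw new IndexOutOfBoundsException("position " + position);
            }
            return true; // every position is marked by definition
        }

        @Override
        public void mark(int position) {
            throw new UnsupportedOperationException("AllMarkedBitMap is immutable");
        }
    }
}
```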

@jt2594838 jt2594838 merged commit 23be220 into apache:master Nov 28, 2025
27 of 28 checks passed