
[Bug] Table with compaction policy time_series_compaction_level_threshold=2: data files are not compacted to the expected size after compaction #48970

Open
zoy-bot opened this issue Mar 12, 2025 · 0 comments

Search before asking

  • I had searched in the issues and found no similar issues.

Version

doris-2.1.6-rc04

What's Wrong?

  • Problem description

On a table created under version 2.1.6 with compaction policy time_series, second-level compaction enabled via "time_series_compaction_level_threshold" = "2", and "time_series_compaction_goal_size_mbytes" = "1024" configured, a large number of .dat files smaller than 1024 MB still remain after data ingestion and compaction have completed. The excessive number of data files means performance falls short of expectations.

  • Key table configuration
CREATE TABLE `mytable` (
  `in_date` datetime(3) NOT NULL COMMENT 'collection time',
  `send_date` datetime(3) NULL COMMENT 'send time',
  `id` varchar(40) NOT NULL COMMENT 'data ID',
  INDEX idx_inverted_send_date (`send_date`) USING INVERTED COMMENT 'idx_inverted_send_date',
  INDEX idx_inverted_log_id (`id`) USING INVERTED PROPERTIES("parser" = "unicode", "support_phrase" = "true") COMMENT 'idx_inverted_log_id'
) ENGINE=OLAP
DUPLICATE KEY(`in_date`)
COMMENT 'mytable'
PARTITION BY RANGE(`in_date`)()
DISTRIBUTED BY RANDOM BUCKETS 64
PROPERTIES (
  "replication_allocation" = "tag.location.default: 1",
  "bloom_filter_columns" = "id",
  "is_being_synced" = "false",
  "dynamic_partition.enable" = "true",
  "dynamic_partition.time_unit" = "DAY",
  "dynamic_partition.time_zone" = "Asia/Shanghai",
  "dynamic_partition.start" = "-190",
  "dynamic_partition.end" = "2",
  "dynamic_partition.prefix" = "p",
  "dynamic_partition.replication_allocation" = "tag.location.default: 1",
  "dynamic_partition.buckets" = "64",
  "dynamic_partition.create_history_partition" = "true",
  "dynamic_partition.history_partition_num" = "100",
  "dynamic_partition.storage_medium" = "SSD",
  "storage_format" = "V2",
  "inverted_index_storage_format" = "v2",
  "light_schema_change" = "true",
  "compaction_policy" = "time_series",
  "time_series_compaction_goal_size_mbytes" = "1024",
  "time_series_compaction_file_count_threshold" = "2000",
  "time_series_compaction_time_threshold_seconds" = "1800",
  "time_series_compaction_empty_rowsets_threshold" = "5",
  "time_series_compaction_level_threshold" = "2",
  "disable_auto_compaction" = "false",
  "enable_single_replica_compaction" = "false"
);

Tablet information after compaction:

[Image: tablet information after compaction]
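For anyone trying to confirm the same symptom, a rough way to inspect the rowset layout is sketched below, assuming MySQL-client access to the Doris FE and HTTP access to the BE web server (host, port, and tablet id are placeholders, not values from this report):

```sql
-- List the tablets of the affected table (run on the Doris FE via a MySQL client)
SHOW TABLETS FROM mytable;

-- Then inspect one tablet's rowsets and compaction status through the BE HTTP API,
-- substituting a real BE host, webserver port, and tablet_id from the output above:
--   curl http://<be_host>:<webserver_port>/api/compaction/show?tablet_id=<tablet_id>
```

The JSON returned by the compaction endpoint lists the tablet's rowsets, which should make it possible to count how many segments remain below the 1024 MB goal size after compaction settles.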

What You Expected?

On a table with second-level compaction enabled via "time_series_compaction_level_threshold" = "2", data files should be compacted repeatedly toward the configured "time_series_compaction_goal_size_mbytes" = "1024", reducing the number of small files and improving performance.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
