
[opt](orc-reader)Turn on late materialization of orc complex types. #49718


Merged


@kaka11chen (Contributor) commented Mar 31, 2025

What problem does this PR solve?

Related PR: #45966

Release note

[opt] (orc-reader) Turn on late materialization of orc complex types.

#45966 implemented a new merge IO function that supports late (delayed) materialization of complex types, including the backtracking reads that this access pattern requires. Building on that, this PR turns on late materialization of ORC complex types in the ORC reader.
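To make the idea concrete, here is a minimal, hypothetical C++ sketch of late (lazy) materialization: the reader first decodes only the cheap predicate column, builds a selection vector, and then decodes the expensive complex column only for the rows that survived the filter. The types `Stripe` and `Stats` and the functions `filter_rows` and `materialize_lazy` are illustrative assumptions, not Doris or Apache ORC APIs. The sketch works on in-memory vectors, so it does not model the merged IO from #45966; in a real file reader the second pass may need byte ranges that precede ones already consumed by the first pass, which is one plausible reading of the "backtrack" requirement in the description, not a statement about the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for one ORC stripe (illustrative only, not a
// Doris or Apache ORC type): a cheap predicate column and an expensive complex
// column, modeled here as a list of strings per row.
struct Stripe {
    std::vector<int64_t> predicate_col;
    std::vector<std::vector<std::string>> complex_col;
};

// Counts how many rows of the complex column were actually decoded.
struct Stats {
    std::size_t complex_rows_decoded = 0;
};

// Pass 1: read only the predicate column and collect the indices of rows
// that satisfy the predicate (a selection vector).
std::vector<std::size_t> filter_rows(const Stripe& s,
                                     const std::function<bool(int64_t)>& pred) {
    std::vector<std::size_t> selected;
    for (std::size_t i = 0; i < s.predicate_col.size(); ++i) {
        if (pred(s.predicate_col[i])) {
            selected.push_back(i);
        }
    }
    return selected;
}

// Pass 2: materialize the lazy complex column only for the selected rows;
// rows rejected in pass 1 are never decoded.
std::vector<std::vector<std::string>> materialize_lazy(
        const Stripe& s, const std::vector<std::size_t>& selected, Stats& stats) {
    std::vector<std::vector<std::string>> out;
    out.reserve(selected.size());
    for (std::size_t row : selected) {
        assert(row < s.complex_col.size());
        out.push_back(s.complex_col[row]);  // stands in for decoding one row
        ++stats.complex_rows_decoded;
    }
    return out;
}
```

For example, with predicate values {5, 12, 7, 30} and the filter `v > 10`, `filter_rows` returns rows 1 and 3, so `materialize_lazy` decodes only two of the four complex rows.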

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas (Contributor) commented Mar 31, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@kaka11chen (Contributor Author)

run buildall

@doris-robot

TPC-H: Total hot run time: 34046 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 0e8473a04215ac88a1df3a71d8969cc09a26a921, data reload: false

------ Round 1 ----------------------------------
query	cold	hot 1	hot 2	best hot (ms)
q1	25965	5095	5044	5044
q2	2064	293	170	170
q3	10405	1236	687	687
q4	10251	1010	532	532
q5	7474	2808	2399	2399
q6	191	164	133	133
q7	933	766	624	624
q8	9333	1304	1094	1094
q9	6804	5115	5127	5115
q10	6813	2267	1894	1894
q11	461	274	253	253
q12	352	354	209	209
q13	17767	3656	3106	3106
q14	225	232	219	219
q15	526	485	488	485
q16	640	607	581	581
q17	569	861	344	344
q18	7397	7357	7073	7073
q19	1084	963	552	552
q20	325	330	186	186
q21	3874	2586	2380	2380
q22	1022	1021	966	966
Total cold run time: 114475 ms
Total hot run time: 34046 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold	hot 1	hot 2	best hot (ms)
q1	5171	5127	5158	5127
q2	241	337	233	233
q3	2175	2655	2360	2360
q4	1395	1812	1353	1353
q5	4534	4482	4467	4467
q6	215	167	126	126
q7	1947	1913	1719	1719
q8	2595	2536	2478	2478
q9	7191	7235	7129	7129
q10	2931	3150	2724	2724
q11	576	499	492	492
q12	687	748	619	619
q13	3499	3858	3286	3286
q14	277	304	261	261
q15	515	483	468	468
q16	658	689	647	647
q17	1140	1531	1393	1393
q18	7831	7482	7503	7482
q19	868	852	1040	852
q20	1995	2064	1887	1887
q21	5287	4708	4594	4594
q22	1033	1021	996	996
Total cold run time: 52761 ms
Total hot run time: 50693 ms
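The robot tables carry no column headers. Reading the numbers, each row lists one cold run followed by two hot runs, and the last column is the faster of the two hot runs: summing the first and last columns of Round 1 on commit 0e8473a reproduces the reported 114475 ms cold and 34046 ms hot totals. A small sketch of that arithmetic, with hypothetical names `QueryTiming` and `summarize` (not part of the Doris benchmark tools), could look like this:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One row of a doris-robot TPC-H/TPC-DS table: the cold run followed by two
// hot runs, all in milliseconds. Names are illustrative only.
struct QueryTiming {
    int64_t cold;
    int64_t hot1;
    int64_t hot2;
    // The last column in the tables matches the faster of the two hot runs.
    int64_t best_hot() const { return std::min(hot1, hot2); }
};

struct Totals {
    int64_t cold = 0;
    int64_t hot = 0;
};

// "Total cold run time" sums the cold column; "Total hot run time" sums the
// best hot run of each query.
Totals summarize(const std::vector<QueryTiming>& rows) {
    Totals t;
    for (const QueryTiming& r : rows) {
        assert(r.cold >= 0 && r.hot1 >= 0 && r.hot2 >= 0);
        t.cold += r.cold;
        t.hot += r.best_hot();
    }
    return t;
}
```

Feeding it the 22 Round 1 rows for commit 0e8473a returns totals of 114475 ms and 34046 ms.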

@doris-robot

TPC-DS: Total hot run time: 186180 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 0e8473a04215ac88a1df3a71d8969cc09a26a921, data reload: false

query	cold	hot 1	hot 2	best hot (ms)
query1	1025	472	464	464
query2	6568	1948	1936	1936
query3	6805	215	220	215
query4	26381	23321	23273	23273
query5	4368	661	477	477
query6	301	202	184	184
query7	4609	494	279	279
query8	302	255	253	253
query9	8623	2600	2607	2600
query10	478	307	266	266
query11	16174	15224	14715	14715
query12	173	107	106	106
query13	1653	497	399	399
query14	9881	6026	6071	6026
query15	213	184	166	166
query16	7602	600	488	488
query17	1167	712	556	556
query18	1997	404	297	297
query19	184	188	152	152
query20	126	121	114	114
query21	207	118	110	110
query22	4196	4345	4283	4283
query23	33687	32931	32889	32889
query24	8332	2366	2365	2365
query25	536	451	398	398
query26	1231	261	144	144
query27	2762	479	327	327
query28	4365	2406	2409	2406
query29	683	557	433	433
query30	284	217	187	187
query31	955	875	780	780
query32	75	62	63	62
query33	559	387	354	354
query34	768	836	500	500
query35	806	825	734	734
query36	970	992	882	882
query37	121	101	72	72
query38	4070	4189	4167	4167
query39	1458	1388	1391	1388
query40	207	120	104	104
query41	57	59	55	55
query42	123	108	137	108
query43	513	490	468	468
query44	1265	782	779	779
query45	177	171	173	171
query46	836	1010	615	615
query47	1787	1805	1743	1743
query48	378	415	305	305
query49	772	501	430	430
query50	672	723	400	400
query51	4173	4177	4094	4094
query52	115	106	93	93
query53	216	247	177	177
query54	494	482	423	423
query55	81	81	81	81
query56	279	296	269	269
query57	1145	1160	1097	1097
query58	264	259	256	256
query59	2699	2631	2665	2631
query60	303	296	284	284
query61	161	159	159	159
query62	773	721	653	653
query63	222	185	183	183
query64	4371	1116	796	796
query65	4354	4214	4308	4214
query66	1052	403	300	300
query67	15931	15617	15410	15410
query68	8453	876	515	515
query69	470	304	265	265
query70	1124	1138	1115	1115
query71	462	304	297	297
query72	5554	4754	5011	4754
query73	738	702	345	345
query74	8829	8933	8684	8684
query75	4229	3216	2717	2717
query76	3690	1229	767	767
query77	813	385	299	299
query78	10106	10084	9209	9209
query79	3115	810	559	559
query80	680	520	476	476
query81	459	261	222	222
query82	467	127	98	98
query83	222	179	161	161
query84	286	100	77	77
query85	771	356	314	314
query86	341	305	290	290
query87	4473	4484	4306	4306
query88	2990	2271	2296	2271
query89	430	351	283	283
query90	1960	202	212	202
query91	146	140	115	115
query92	72	60	61	60
query93	1945	1040	594	594
query94	672	412	312	312
query95	368	276	265	265
query96	487	555	284	284
query97	3202	3185	3093	3093
query98	221	201	212	201
query99	1458	1370	1285	1285
Total cold run time: 277348 ms
Total hot run time: 186180 ms

@doris-robot

ClickBench: Total hot run time: 30.82 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 0e8473a04215ac88a1df3a71d8969cc09a26a921, data reload: false

query	cold	hot 1	hot 2 (s)
query1	0.04	0.03	0.04
query2	0.12	0.10	0.11
query3	0.25	0.20	0.20
query4	1.60	0.19	0.21
query5	0.55	0.55	0.56
query6	1.19	0.72	0.74
query7	0.03	0.02	0.02
query8	0.04	0.03	0.04
query9	0.59	0.52	0.52
query10	0.60	0.58	0.58
query11	0.15	0.11	0.10
query12	0.16	0.11	0.11
query13	0.63	0.62	0.60
query14	2.66	2.69	2.67
query15	0.94	0.86	0.85
query16	0.39	0.39	0.39
query17	1.01	1.04	1.02
query18	0.22	0.20	0.20
query19	2.02	1.88	1.84
query20	0.01	0.01	0.01
query21	15.35	0.92	0.53
query22	0.76	1.20	0.73
query23	14.80	1.41	0.60
query24	7.10	1.34	0.45
query25	0.46	0.13	0.06
query26	0.54	0.17	0.14
query27	0.05	0.05	0.05
query28	9.49	0.89	0.45
query29	12.60	3.99	3.29
query30	0.25	0.10	0.06
query31	2.81	0.60	0.39
query32	3.24	0.59	0.49
query33	3.02	3.09	3.11
query34	15.75	5.24	4.47
query35	4.50	4.48	4.50
query36	0.66	0.50	0.49
query37	0.09	0.06	0.07
query38	0.05	0.03	0.04
query39	0.03	0.03	0.03
query40	0.17	0.13	0.12
query41	0.09	0.03	0.03
query42	0.04	0.02	0.03
query43	0.04	0.03	0.03
Total cold run time: 105.09 s
Total hot run time: 30.82 s

@hello-stephen (Contributor)

BE UT Coverage Report

Increment line coverage 0.00% (0/4) 🎉


Category	Coverage
Function Coverage	51.22% (13702/26751)
Line Coverage	40.55% (119196/293940)
Region Coverage	39.26% (60618/154420)
Branch Coverage	34.03% (30398/89328)

@kaka11chen force-pushed the turn_on_late_mat_orc_complex_types branch from 0e8473a to 4103597 on April 2, 2025 08:53
@kaka11chen (Contributor Author)

run buildall

@doris-robot

TPC-H: Total hot run time: 34219 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 41035973b35f71402eb67c2a4f93511c307f439b, data reload: false

------ Round 1 ----------------------------------
query	cold	hot 1	hot 2	best hot (ms)
q1	25452	5089	4994	4994
q2	2060	276	179	179
q3	10414	1251	673	673
q4	10209	985	530	530
q5	7522	2371	2320	2320
q6	187	161	133	133
q7	909	761	609	609
q8	9423	1306	1112	1112
q9	7030	5277	5216	5216
q10	7171	2341	1909	1909
q11	474	298	278	278
q12	352	352	219	219
q13	18358	3660	3101	3101
q14	226	231	203	203
q15	530	493	487	487
q16	642	601	571	571
q17	616	852	383	383
q18	7540	7270	7080	7080
q19	1894	951	553	553
q20	330	331	229	229
q21	4048	3438	2469	2469
q22	1063	994	971	971
Total cold run time: 116450 ms
Total hot run time: 34219 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold	hot 1	hot 2	best hot (ms)
q1	5148	5097	5142	5097
q2	241	330	242	242
q3	2124	2628	2258	2258
q4	1397	1817	1440	1440
q5	4491	4435	4371	4371
q6	219	171	127	127
q7	1987	1894	1774	1774
q8	2658	2594	2576	2576
q9	7285	7028	7319	7028
q10	2975	3150	2737	2737
q11	586	518	479	479
q12	692	742	627	627
q13	3403	3889	3343	3343
q14	291	297	271	271
q15	534	478	476	476
q16	649	657	642	642
q17	1160	1550	1371	1371
q18	7862	7781	7498	7498
q19	796	787	835	787
q20	1914	2000	1865	1865
q21	5300	4808	4922	4808
q22	1110	1089	997	997
Total cold run time: 52822 ms
Total hot run time: 50814 ms

@doris-robot

TPC-DS: Total hot run time: 192609 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 41035973b35f71402eb67c2a4f93511c307f439b, data reload: false

query	cold	hot 1	hot 2	best hot (ms)
query1	1393	1087	1048	1048
query2	6230	1988	1931	1931
query3	11050	4402	4391	4391
query4	57061	24380	23047	23047
query5	5128	459	456	456
query6	387	195	209	195
query7	5198	505	292	292
query8	326	230	229	229
query9	7063	2578	2589	2578
query10	420	324	274	274
query11	15329	15001	14924	14924
query12	160	115	109	109
query13	1241	522	413	413
query14	10202	6400	6408	6400
query15	209	203	176	176
query16	7154	659	483	483
query17	1114	750	607	607
query18	1725	405	358	358
query19	202	212	175	175
query20	133	128	124	124
query21	215	128	114	114
query22	4640	4486	4521	4486
query23	34133	33198	33280	33198
query24	6622	2424	2413	2413
query25	445	458	416	416
query26	686	275	159	159
query27	2301	498	344	344
query28	3228	2436	2454	2436
query29	579	562	431	431
query30	273	224	191	191
query31	880	897	840	840
query32	76	63	61	61
query33	448	374	324	324
query34	754	866	531	531
query35	786	827	752	752
query36	942	1009	885	885
query37	120	101	75	75
query38	4156	4245	4086	4086
query39	1495	1462	1432	1432
query40	202	120	108	108
query41	52	53	52	52
query42	128	115	109	109
query43	506	524	491	491
query44	1317	829	826	826
query45	184	176	168	168
query46	846	1036	653	653
query47	1828	1889	1819	1819
query48	395	416	309	309
query49	706	575	424	424
query50	665	701	424	424
query51	4211	4335	4293	4293
query52	113	113	110	110
query53	241	271	204	204
query54	590	602	528	528
query55	88	88	88	88
query56	329	309	300	300
query57	1209	1210	1122	1122
query58	274	252	263	252
query59	2747	2985	2822	2822
query60	322	323	306	306
query61	132	145	128	128
query62	727	740	674	674
query63	225	194	192	192
query64	1776	1057	703	703
query65	4433	4349	4203	4203
query66	699	402	305	305
query67	15856	15792	15192	15192
query68	7766	892	528	528
query69	594	301	267	267
query70	1227	1081	1089	1081
query71	500	327	287	287
query72	5750	4708	4810	4708
query73	1229	645	350	350
query74	8878	9240	8603	8603
query75	3793	3210	2721	2721
query76	4403	1185	757	757
query77	620	376	364	364
query78	10584	10036	9264	9264
query79	3246	810	554	554
query80	688	496	435	435
query81	473	261	222	222
query82	499	126	96	96
query83	389	255	254	254
query84	292	100	79	79
query85	803	351	314	314
query86	400	291	282	282
query87	4374	4470	4268	4268
query88	3391	2248	2254	2248
query89	426	304	281	281
query90	1994	209	213	209
query91	137	139	109	109
query92	73	59	54	54
query93	1965	951	575	575
query94	681	406	354	354
query95	368	281	286	281
query96	495	571	276	276
query97	3147	3249	3114	3114
query98	225	205	202	202
query99	1419	1370	1283	1283
Total cold run time: 306134 ms
Total hot run time: 192609 ms

@doris-robot

ClickBench: Total hot run time: 30.54 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 41035973b35f71402eb67c2a4f93511c307f439b, data reload: false

query	cold	hot 1	hot 2 (s)
query1	0.04	0.04	0.03
query2	0.13	0.10	0.10
query3	0.25	0.19	0.19
query4	1.59	0.20	0.20
query5	0.60	0.58	0.58
query6	1.19	0.73	0.71
query7	0.02	0.02	0.02
query8	0.04	0.04	0.04
query9	0.58	0.53	0.52
query10	0.57	0.58	0.58
query11	0.15	0.11	0.10
query12	0.14	0.11	0.11
query13	0.62	0.61	0.60
query14	2.66	2.69	2.81
query15	0.94	0.85	0.85
query16	0.38	0.38	0.41
query17	1.04	1.05	1.00
query18	0.21	0.19	0.20
query19	1.88	1.99	1.82
query20	0.01	0.01	0.02
query21	15.36	0.89	0.56
query22	0.77	1.16	0.62
query23	15.02	1.40	0.62
query24	7.19	1.89	0.36
query25	0.41	0.19	0.06
query26	0.52	0.15	0.13
query27	0.05	0.04	0.04
query28	9.01	0.85	0.45
query29	12.58	3.97	3.26
query30	0.25	0.10	0.07
query31	2.82	0.61	0.37
query32	3.24	0.55	0.47
query33	3.02	3.05	3.08
query34	15.81	5.20	4.45
query35	4.56	4.52	4.51
query36	0.66	0.50	0.48
query37	0.08	0.06	0.07
query38	0.05	0.04	0.04
query39	0.03	0.02	0.03
query40	0.17	0.14	0.13
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.04	0.04	0.03
Total cold run time: 104.8 s
Total hot run time: 30.54 s

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉


Category	Coverage
Function Coverage	52.11% (13953/26776)
Line Coverage	40.78% (119999/294285)
Region Coverage	39.56% (61205/154718)
Branch Coverage	34.22% (30612/89460)

github-actions bot (Contributor) commented Apr 8, 2025

PR approved by at least one committer and no changes requested.

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on Apr 8, 2025

github-actions bot (Contributor) commented Apr 8, 2025

PR approved by anyone and no changes requested.

@morningman merged commit 55bf08c into apache:master on Apr 8, 2025 (26 of 29 checks passed)

Labels: approved (Indicates a PR has been approved by one committer.), dev/2.1.x-experimental, dev/3.0.x-experimental, reviewed

7 participants