Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix](thirdparty) Disable BRPC 1.4 contention profiler to avoid deadlock #41891

Merged
merged 2 commits into from
Oct 15, 2024

Conversation

xinyiZzz
Copy link
Contributor

@xinyiZzz xinyiZzz commented Oct 15, 2024

BRPC contention profiler hooks pthread mutex, which may deadlock when used with Jemalloc.
This PR remove pthread mutex hook and disable BRPC contention profiler.

image

similar issue: apache/brpc#2726
reference fix: apache/brpc#2727

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@xinyiZzz
Copy link
Contributor Author

run buildall

Copy link
Contributor

sh-checker report

To get the full details, please check in the job output.

shellcheck errors
'shellcheck ' found no issues.

shfmt errors

'shfmt ' returned error 1 finding the following formatting issues:

----------
--- thirdparty/download-thirdparty.sh.orig
+++ thirdparty/download-thirdparty.sh
@@ -454,7 +454,7 @@
         if [[ ! -f "${PATCHED_MARK}" ]]; then
             for patch_file in "${TP_PATCH_DIR}"/brpc-*; do
                 echo "patch ${patch_file}"
-                patch -p1 <"${patch_file}" --ignore-whitespace
+                patch -p1 --ignore-whitespace <"${patch_file}"
             done
             touch "${PATCHED_MARK}"
         fi
----------

You can reformat the above files to meet shfmt's requirements by typing:

  shfmt  -w filename


@xinyiZzz
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 41377 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5a9bb10a5f14111e3ced8c64a1832740fc80efaf, data reload: false

------ Round 1 ----------------------------------
q1	17634	7454	7304	7304
q2	2060	163	150	150
q3	10574	1133	1170	1133
q4	10235	839	819	819
q5	7743	3085	3064	3064
q6	239	153	154	153
q7	1014	619	599	599
q8	9356	1889	1997	1889
q9	6608	6453	6415	6415
q10	7013	2414	2426	2414
q11	441	237	241	237
q12	407	220	222	220
q13	17783	2995	3002	2995
q14	243	210	209	209
q15	586	511	518	511
q16	645	588	581	581
q17	968	502	590	502
q18	7272	6814	6885	6814
q19	1354	1021	1050	1021
q20	484	188	190	188
q21	3994	3306	3168	3168
q22	1095	991	1015	991
Total cold run time: 107748 ms
Total hot run time: 41377 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7254	7293	7230	7230
q2	332	235	225	225
q3	2919	2711	2810	2711
q4	1994	1716	1731	1716
q5	5470	5468	5547	5468
q6	227	144	145	144
q7	2144	1732	1709	1709
q8	3244	3426	3451	3426
q9	8568	8559	8561	8559
q10	3501	3429	3464	3429
q11	576	477	478	477
q12	784	601	582	582
q13	8851	3033	3039	3033
q14	285	282	273	273
q15	574	523	503	503
q16	685	639	623	623
q17	1813	1572	1579	1572
q18	7829	7585	7585	7585
q19	1687	1548	1501	1501
q20	2045	1813	1844	1813
q21	5310	5343	5316	5316
q22	1114	1046	1032	1032
Total cold run time: 67206 ms
Total hot run time: 58927 ms

Copy link
Contributor

@yiguolei yiguolei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added approved Indicates a PR has been approved by one committer. reviewed labels Oct 15, 2024
Copy link
Contributor

PR approved by anyone and no changes requested.

@doris-robot
Copy link

TPC-DS: Total hot run time: 192189 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5a9bb10a5f14111e3ced8c64a1832740fc80efaf, data reload: false

query1	973	380	375	375
query2	6525	2096	2010	2010
query3	6797	223	241	223
query4	34121	23698	23505	23505
query5	4330	475	469	469
query6	252	166	173	166
query7	4612	296	294	294
query8	302	246	226	226
query9	9715	2714	2721	2714
query10	472	276	262	262
query11	18046	15402	15337	15337
query12	155	107	104	104
query13	1658	431	424	424
query14	10706	7577	7738	7577
query15	251	170	178	170
query16	7832	458	469	458
query17	1672	581	573	573
query18	2099	302	311	302
query19	367	154	152	152
query20	118	111	111	111
query21	211	106	107	106
query22	4582	4207	4179	4179
query23	34945	34286	34119	34119
query24	11160	2730	2839	2730
query25	644	418	419	418
query26	1379	165	165	165
query27	2671	292	294	292
query28	7842	2459	2432	2432
query29	886	450	449	449
query30	326	159	156	156
query31	1046	823	833	823
query32	101	56	58	56
query33	772	288	298	288
query34	959	527	543	527
query35	905	742	739	739
query36	1078	941	953	941
query37	160	98	95	95
query38	4006	3907	3935	3907
query39	1500	1437	1403	1403
query40	278	103	98	98
query41	49	46	48	46
query42	126	101	98	98
query43	537	485	488	485
query44	1222	826	820	820
query45	199	168	167	167
query46	1190	704	715	704
query47	1938	1825	1838	1825
query48	416	323	319	319
query49	1180	438	429	429
query50	836	390	396	390
query51	7085	6996	6969	6969
query52	104	90	92	90
query53	254	185	183	183
query54	1339	436	440	436
query55	82	78	79	78
query56	278	277	269	269
query57	1324	1153	1142	1142
query58	254	231	239	231
query59	3124	2817	2844	2817
query60	308	291	277	277
query61	123	133	100	100
query62	872	687	664	664
query63	217	193	182	182
query64	5325	623	593	593
query65	3295	3266	3251	3251
query66	1477	316	305	305
query67	16352	15825	15635	15635
query68	4503	561	575	561
query69	477	282	294	282
query70	1182	1087	1122	1087
query71	348	265	274	265
query72	7067	4001	3995	3995
query73	769	354	357	354
query74	10234	9061	8949	8949
query75	3372	2671	2661	2661
query76	2947	980	908	908
query77	426	305	287	287
query78	10535	9679	9719	9679
query79	2672	594	606	594
query80	1033	480	442	442
query81	572	245	246	245
query82	942	140	137	137
query83	255	134	143	134
query84	239	76	76	76
query85	1295	301	284	284
query86	425	299	308	299
query87	4531	4324	4313	4313
query88	3872	2233	2191	2191
query89	405	290	287	287
query90	1938	189	190	189
query91	138	100	98	98
query92	72	51	51	51
query93	1580	546	535	535
query94	972	284	272	272
query95	346	242	252	242
query96	609	283	289	283
query97	3227	3147	3131	3131
query98	211	205	191	191
query99	1538	1287	1303	1287
Total cold run time: 304692 ms
Total hot run time: 192189 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 32.75 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 5a9bb10a5f14111e3ced8c64a1832740fc80efaf, data reload: false

query1	0.04	0.03	0.03
query2	0.07	0.03	0.03
query3	0.23	0.07	0.06
query4	1.64	0.10	0.10
query5	0.51	0.49	0.52
query6	1.13	0.73	0.72
query7	0.02	0.01	0.02
query8	0.04	0.04	0.06
query9	0.58	0.49	0.52
query10	0.54	0.55	0.53
query11	0.14	0.10	0.10
query12	0.14	0.12	0.12
query13	0.61	0.61	0.60
query14	2.71	2.86	2.86
query15	0.90	0.83	0.85
query16	0.39	0.39	0.39
query17	1.09	1.05	0.99
query18	0.23	0.23	0.22
query19	1.87	1.89	2.01
query20	0.01	0.00	0.01
query21	15.38	0.59	0.59
query22	2.58	2.09	2.94
query23	17.07	0.85	0.75
query24	3.01	0.58	1.40
query25	0.23	0.06	0.14
query26	0.50	0.13	0.13
query27	0.05	0.04	0.04
query28	10.85	1.10	1.07
query29	12.62	3.24	3.24
query30	0.25	0.06	0.06
query31	2.87	0.40	0.38
query32	3.26	0.47	0.46
query33	3.00	3.02	3.05
query34	17.17	4.47	4.50
query35	4.50	4.51	4.52
query36	0.69	0.49	0.49
query37	0.08	0.06	0.06
query38	0.05	0.04	0.04
query39	0.04	0.02	0.03
query40	0.17	0.12	0.12
query41	0.07	0.03	0.02
query42	0.03	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 107.39 s
Total hot run time: 32.75 s

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 37.46% (9708/25916)
Line Coverage: 28.74% (80630/280504)
Region Coverage: 28.18% (41704/147970)
Branch Coverage: 24.77% (21213/85638)
Coverage Report: http://coverage.selectdb-in.cc/coverage/5a9bb10a5f14111e3ced8c64a1832740fc80efaf_5a9bb10a5f14111e3ced8c64a1832740fc80efaf/report/index.html

@yiguolei yiguolei merged commit 0d5830c into apache:master Oct 15, 2024
26 of 28 checks passed
xinyiZzz added a commit to xinyiZzz/incubator-doris that referenced this pull request Oct 16, 2024
…ock (apache#41891)

BRPC contention profiler hooks pthread mutex, which may deadlock when
used with Jemalloc.
This PR remove pthread mutex hook and disable BRPC contention profiler.

![image](https://github.com/user-attachments/assets/62ccc04c-718a-43db-8354-b1bbc0565958)

similar issue: apache/brpc#2726
reference fix: apache/brpc#2727
yiguolei pushed a commit that referenced this pull request Oct 16, 2024
@chenBright
Copy link

chenBright commented Oct 31, 2024

bRPC 1.4 does not contain the code of apache/brpc#2692, so there is no issue of apache/brpc#2726. The real issue should be apache/brpc#859, which has been resolved in apache/brpc#2684.

In addition, bRPC 1.11 supports disabling pthread mutex hook with NO_PTHREAD_MUTEX_HOOK macro.

@xinyiZzz
Copy link
Contributor Author

bRPC 1.4 does not contain the code of apache/brpc#2692, so there is no issue of apache/brpc#2726. The real issue should be apache/brpc#859, which has been resolved in apache/brpc#2684.

In addition, bRPC 1.11 supports disabling pthread mutex hook using NO_PTHREAD_MUTEX_HOOK macro.

Thanks for your suggestion! this PR simply disables pthread mutex hook and contention profiler.

Next, I will consider pick apache/brpc#2684, or upgrading bRPC version (I tried several times, so sad)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.7-merged dev/3.0.x reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants