[improve](routine load) introduce routine load abnormal job monitor #48171

Open
wants to merge 1 commit into
base: master

sollhui
Contributor

@sollhui sollhui commented Feb 21, 2025

What problem does this PR solve?

Related: #48511

Add a metric, doris_fe_routine_load_abnormal_job_nums, to monitor abnormal routine load jobs.

How is an abnormal job defined?

The routine load scheduler thread checks whether a job is abnormal as follows:

  1. Whether the value of autoResumeCount is greater than or equal to Config.min_abnormal_auto_resume_count_threshold.
  2. Within a window of size c (the window is needed because the job runs continuously), the following two checks are performed:
  • Whether the abort transaction ratio is greater than or equal to Config.min_abnormal_abort_txn_ratio_threshold.
  • Whether the progress of each partition increases while its lag is not zero.
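
The checks above can be sketched roughly as follows. This is a minimal illustration in Python, not the FE implementation: the `WindowStats` structure, the threshold values, the reason strings, and the assumption that any single failing check marks the job as abnormal are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical threshold values; this description names the config items
# but does not state their defaults.
MIN_ABNORMAL_AUTO_RESUME_COUNT_THRESHOLD = 5
MIN_ABNORMAL_ABORT_TXN_RATIO_THRESHOLD = 0.5


@dataclass
class WindowStats:
    """Hypothetical per-job snapshot collected over one check window."""
    auto_resume_count: int = 0
    aborted_txns: int = 0
    total_txns: int = 0
    # partition id -> (progress at window start, progress at window end, lag)
    partitions: Dict[int, tuple] = field(default_factory=dict)


def abnormal_reason(stats):
    """Return a reason string if the job looks abnormal, else None."""
    # Check 1: the job has been auto-resumed too many times.
    if stats.auto_resume_count >= MIN_ABNORMAL_AUTO_RESUME_COUNT_THRESHOLD:
        return "auto resume count reaches threshold"
    # Check 2a: too many transactions were aborted within the window.
    if stats.total_txns > 0:
        ratio = stats.aborted_txns / stats.total_txns
        if ratio >= MIN_ABNORMAL_ABORT_TXN_RATIO_THRESHOLD:
            return "abort transaction ratio reaches threshold"
    # Check 2b: a partition still has lag but made no progress in the window.
    for pid, (start, end, lag) in stats.partitions.items():
        if lag != 0 and end <= start:
            return f"partition {pid} has lag but no progress"
    return None
```

A partition with zero lag is never flagged here, since there is nothing left for it to consume.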

How to use the metric?

The metric doris_fe_routine_load_abnormal_job_nums can be tracked in monitoring platforms such as Grafana. When its value is greater than 0, an HTTP API shows which jobs are abnormal. Here is an example:

Assume there is an abnormal job and the metric doris_fe_routine_load_abnormal_job_nums is observed to be greater than 0. Query the HTTP API to list the abnormal jobs:

curl -X GET --location-trusted -u ${User}:${Password} http://${feHttpAddress}/api/routine_load/abnormal_jobs

The result is:

{"db.example_routine_load_job":"The auto resume time reaches threshold: 5"}

The result means that the routine load job db.example_routine_load_job keeps being resumed automatically. The specific reason can be found with the SHOW ROUTINE LOAD FOR db.example_routine_load_job command:

                  Id: 1740796901932
                Name: example_routine_load_job
          CreateTime: 2025-03-02 16:34:49
           PauseTime: 2025-03-02 16:34:58
             EndTime: NULL
              DbName: db
           TableName: test
        IsMultiTable: false
               State: PAUSED
      DataSourceType: KAFKA
      CurrentTaskNum: 0
       JobProperties: {"max_batch_rows":"20000000","timezone":"Asia/Shanghai","send_batch_parallelism":"1","load_to_single_tablet":"false","current_concurrent_number":"0","delete":"*","partial_columns":"false","merge_type":"APPEND","exec_mem_limit":"2147483648","strict_mode":"false","jsonpaths":"","max_batch_interval":"60","max_batch_size":"1073741824","fuzzy_parse":"false","escape":"0","enclose":"0","partitions":"*","columnToColumnExpr":"time,stream,logtag,kubernetes","whereExpr":"*","desired_concurrent_number":"256","precedingFilter":"*","format":"json","max_error_number":"0","max_filter_ratio":"1.0","json_root":"","strip_outer_array":"false","num_as_string":"false"}
DataSourceProperties: {"topic":"test_user111","currentKafkaPartitions":"","brokerList":"127.0.0.1:9092"}
    CustomProperties: {"kafka_default_offsets":"OFFSET_BEGINNING","group.id":"example_routine_load_csv2_e3be6163-e58a-42c7-9715-488cc50b5eb8"}
           Statistic: {"receivedBytes":0,"runningTxns":[],"errorRows":0,"committedTaskNum":0,"loadedRows":0,"loadRowsRate":0,"abortedTaskNum":0,"errorRowsAfterResumed":0,"totalRows":0,"unselectedRows":0,"receivedBytesRate":0,"taskExecuteTimeMs":1}
            Progress: {}
                 Lag: {}
ReasonOfStateChanged: ErrorReason{code=errCode = 4, msg='errCode = 2, detailMessage = Failed to get all partitions of kafka topic: test_user111 error: errCode = 2, detailMessage = Failed to get info may be Kafka properties set in job is error or no partition in this topic that should check Kafka'}
        ErrorLogUrls: 
            OtherMsg: 
                User: root
             Comment: 

The job is abnormal because it keeps reporting errors when it cannot connect to Kafka; the root cause turned out to be that the topic does not exist.
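
For scripted follow-up, the JSON returned by the abnormal_jobs API could be parsed as sketched below. This assumes the response is always a flat JSON object mapping "db.job_name" to a reason string, as in the single example above; the `list_abnormal_jobs` helper is hypothetical and not part of this PR.

```python
import json


def list_abnormal_jobs(body):
    """Parse the response body into sorted (db, job, reason) tuples."""
    result = json.loads(body)
    jobs = []
    for full_name, reason in result.items():
        # Keys look like "db.job_name"; split on the first dot only.
        db, _, job = full_name.partition(".")
        jobs.append((db, job, reason))
    return sorted(jobs)


# Response body copied from the example above.
body = '{"db.example_routine_load_job":"The auto resume time reaches threshold: 5"}'
for db, job, reason in list_abnormal_jobs(body):
    print(f"{db}.{job}: {reason}")
```

Each returned job name can then be passed to SHOW ROUTINE LOAD FOR to look at ReasonOfStateChanged.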

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@sollhui
Contributor Author

sollhui commented Feb 21, 2025

run buildall

@sollhui sollhui force-pushed the rl_abnormal_job_monitor branch from d640074 to 983a4cc Compare February 21, 2025 03:41
@sollhui
Contributor Author

sollhui commented Feb 21, 2025

run buildall

@sollhui sollhui changed the title [improve](routine load)(observability) introduce routine load abnormal job monitor [improve](routine load) introduce routine load abnormal job monitor Feb 21, 2025
@doris-robot

TPC-H: Total hot run time: 31586 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 983a4cc990c8c6ca386856f404544742ebaafd5d, data reload: false

------ Round 1 ----------------------------------
q1	17642	5194	5114	5114
q2	2044	299	168	168
q3	10413	1305	755	755
q4	10208	1023	531	531
q5	7524	2455	2265	2265
q6	190	169	135	135
q7	902	757	609	609
q8	9296	1345	1173	1173
q9	5054	4660	4652	4652
q10	6835	2350	1924	1924
q11	481	267	252	252
q12	346	362	219	219
q13	17748	3718	3072	3072
q14	223	225	219	219
q15	498	465	461	461
q16	616	623	590	590
q17	563	873	344	344
q18	6552	6210	6176	6176
q19	1222	950	532	532
q20	316	318	185	185
q21	2880	2160	1912	1912
q22	360	336	298	298
Total cold run time: 101913 ms
Total hot run time: 31586 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5093	5140	5117	5117
q2	233	329	233	233
q3	2179	2674	2272	2272
q4	1414	1831	1348	1348
q5	4227	4180	4180	4180
q6	207	171	126	126
q7	1884	1844	1659	1659
q8	2632	2576	2504	2504
q9	7389	7173	7074	7074
q10	2985	3229	2729	2729
q11	585	524	479	479
q12	681	728	632	632
q13	3513	3910	3268	3268
q14	273	305	295	295
q15	507	476	475	475
q16	637	665	642	642
q17	1115	1570	1335	1335
q18	7614	7316	7267	7267
q19	805	799	954	799
q20	1981	2013	1894	1894
q21	5430	4987	4922	4922
q22	626	577	568	568
Total cold run time: 52010 ms
Total hot run time: 49818 ms

@doris-robot

TPC-DS: Total hot run time: 183367 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 983a4cc990c8c6ca386856f404544742ebaafd5d, data reload: false

query1	951	362	383	362
query2	6526	1935	1833	1833
query3	6802	213	205	205
query4	26635	23746	22896	22896
query5	4305	689	473	473
query6	301	191	187	187
query7	4611	499	307	307
query8	289	246	233	233
query9	8617	2561	2582	2561
query10	489	313	259	259
query11	15365	15106	14983	14983
query12	174	108	105	105
query13	1651	500	382	382
query14	8991	6243	6101	6101
query15	206	194	176	176
query16	7127	621	446	446
query17	896	701	533	533
query18	1947	383	289	289
query19	178	193	162	162
query20	117	114	116	114
query21	207	120	97	97
query22	4155	4313	4446	4313
query23	34260	33460	33255	33255
query24	7703	2385	2401	2385
query25	534	466	376	376
query26	1240	261	148	148
query27	2581	473	332	332
query28	4339	2404	2394	2394
query29	771	536	421	421
query30	232	181	157	157
query31	931	833	809	809
query32	68	60	59	59
query33	543	383	296	296
query34	768	836	503	503
query35	799	801	750	750
query36	973	1014	922	922
query37	117	95	74	74
query38	4249	4172	4094	4094
query39	1447	1409	1407	1407
query40	215	116	107	107
query41	54	54	50	50
query42	122	104	106	104
query43	507	519	469	469
query44	1294	778	773	773
query45	173	172	158	158
query46	864	1030	657	657
query47	1752	1797	1739	1739
query48	401	416	305	305
query49	784	521	415	415
query50	677	738	430	430
query51	4227	4207	4098	4098
query52	109	103	94	94
query53	224	261	180	180
query54	477	480	403	403
query55	87	82	77	77
query56	258	261	232	232
query57	1142	1149	1089	1089
query58	237	242	238	238
query59	2635	2758	2544	2544
query60	274	274	257	257
query61	118	122	113	113
query62	786	726	651	651
query63	244	188	187	187
query64	4471	985	641	641
query65	3216	3108	3133	3108
query66	1114	398	316	316
query67	15650	15497	15270	15270
query68	2214	777	544	544
query69	434	307	284	284
query70	1215	1157	1056	1056
query71	320	298	328	298
query72	5913	3589	3686	3589
query73	652	745	350	350
query74	9058	9166	8949	8949
query75	3114	3249	2712	2712
query76	2245	1151	750	750
query77	350	371	288	288
query78	9954	10170	9287	9287
query79	1127	937	598	598
query80	650	546	464	464
query81	487	329	236	236
query82	1272	129	99	99
query83	229	169	150	150
query84	287	92	73	73
query85	729	334	310	310
query86	329	314	283	283
query87	4470	4487	4421	4421
query88	3002	2231	2235	2231
query89	404	319	296	296
query90	1734	204	198	198
query91	138	146	105	105
query92	62	63	58	58
query93	1101	991	577	577
query94	487	401	305	305
query95	354	269	265	265
query96	507	546	296	296
query97	2767	2904	2734	2734
query98	231	204	205	204
query99	1451	1440	1253	1253
Total cold run time: 261494 ms
Total hot run time: 183367 ms

@doris-robot

ClickBench: Total hot run time: 30.83 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 983a4cc990c8c6ca386856f404544742ebaafd5d, data reload: false

query1	0.04	0.04	0.04
query2	0.07	0.03	0.03
query3	0.24	0.07	0.08
query4	1.61	0.10	0.10
query5	0.42	0.42	0.40
query6	1.16	0.65	0.66
query7	0.02	0.02	0.01
query8	0.04	0.03	0.03
query9	0.59	0.52	0.54
query10	0.58	0.58	0.57
query11	0.15	0.10	0.10
query12	0.14	0.11	0.11
query13	0.62	0.60	0.62
query14	2.68	2.70	2.70
query15	0.90	0.83	0.83
query16	0.37	0.38	0.40
query17	1.04	1.02	1.02
query18	0.21	0.20	0.20
query19	1.87	1.80	1.99
query20	0.02	0.01	0.02
query21	15.38	0.91	0.55
query22	0.75	1.14	0.62
query23	15.03	1.41	0.66
query24	7.52	1.16	1.00
query25	0.55	0.24	0.11
query26	0.63	0.17	0.14
query27	0.05	0.05	0.05
query28	10.04	0.87	0.42
query29	12.55	3.98	3.33
query30	0.25	0.09	0.06
query31	2.84	0.57	0.37
query32	3.22	0.55	0.48
query33	3.04	3.01	3.00
query34	15.78	5.16	4.51
query35	4.57	4.57	4.59
query36	0.67	0.49	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.03	0.02
query40	0.17	0.14	0.12
query41	0.09	0.02	0.02
query42	0.04	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 106.14 s
Total hot run time: 30.83 s

@sollhui sollhui force-pushed the rl_abnormal_job_monitor branch from 983a4cc to 96abe50 Compare February 21, 2025 06:12
@sollhui
Contributor Author

sollhui commented Feb 21, 2025

run buildall

@sollhui sollhui marked this pull request as draft February 21, 2025 06:17
@doris-robot

TPC-H: Total hot run time: 31192 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 96abe50dd47182a565e6f57e7a16258da71d0e53, data reload: false

------ Round 1 ----------------------------------
q1	17614	5504	5084	5084
q2	2047	284	166	166
q3	10429	1231	765	765
q4	10211	1025	539	539
q5	7538	2463	2250	2250
q6	186	185	139	139
q7	896	743	609	609
q8	9304	1309	1171	1171
q9	4825	4555	4519	4519
q10	6836	2298	1877	1877
q11	504	279	261	261
q12	352	357	219	219
q13	17761	3712	3054	3054
q14	223	227	211	211
q15	507	464	447	447
q16	624	595	577	577
q17	579	847	332	332
q18	6618	6027	6113	6027
q19	1848	942	541	541
q20	301	319	187	187
q21	2858	2122	1918	1918
q22	366	326	299	299
Total cold run time: 102427 ms
Total hot run time: 31192 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5155	5097	5109	5097
q2	230	323	231	231
q3	2156	2660	2271	2271
q4	1429	1811	1365	1365
q5	4263	4143	4125	4125
q6	208	160	125	125
q7	1851	1806	1687	1687
q8	2556	2517	2547	2517
q9	7150	7166	7093	7093
q10	3022	3199	2774	2774
q11	574	510	500	500
q12	717	783	605	605
q13	3551	3768	3344	3344
q14	288	300	273	273
q15	521	474	459	459
q16	619	675	626	626
q17	1120	1536	1369	1369
q18	7555	7369	7223	7223
q19	798	822	873	822
q20	1967	1979	1854	1854
q21	5394	5291	4753	4753
q22	616	563	532	532
Total cold run time: 51740 ms
Total hot run time: 49645 ms

@doris-robot

TPC-DS: Total hot run time: 191826 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 96abe50dd47182a565e6f57e7a16258da71d0e53, data reload: false

query1	1299	948	981	948
query2	6211	1916	1847	1847
query3	10973	4465	4293	4293
query4	53104	25847	23667	23667
query5	5216	564	494	494
query6	372	210	191	191
query7	5131	509	300	300
query8	330	237	220	220
query9	6300	2774	2763	2763
query10	424	306	262	262
query11	15538	15100	14974	14974
query12	162	109	116	109
query13	1144	559	417	417
query14	10337	6920	6467	6467
query15	198	194	180	180
query16	7096	645	490	490
query17	1096	772	619	619
query18	1510	436	324	324
query19	235	203	188	188
query20	131	126	124	124
query21	209	129	105	105
query22	4549	4497	4272	4272
query23	34167	33480	33424	33424
query24	5679	2427	2494	2427
query25	476	478	405	405
query26	678	305	167	167
query27	1892	503	357	357
query28	2949	2571	2499	2499
query29	571	572	440	440
query30	207	187	152	152
query31	909	841	801	801
query32	76	66	83	66
query33	476	389	324	324
query34	774	877	552	552
query35	822	862	763	763
query36	996	1026	904	904
query37	126	104	70	70
query38	4306	4374	4461	4374
query39	1477	1454	1456	1454
query40	213	115	107	107
query41	56	51	48	48
query42	124	110	108	108
query43	516	524	511	511
query44	1389	869	890	869
query45	186	176	169	169
query46	922	1079	669	669
query47	1837	1856	1775	1775
query48	414	430	337	337
query49	693	549	462	462
query50	717	762	439	439
query51	4351	4283	4249	4249
query52	113	107	109	107
query53	250	271	194	194
query54	498	499	432	432
query55	88	88	91	88
query56	282	283	268	268
query57	1179	1184	1105	1105
query58	257	253	253	253
query59	2785	2953	2865	2865
query60	283	284	281	281
query61	125	115	130	115
query62	722	758	708	708
query63	241	207	197	197
query64	1447	1026	688	688
query65	3194	3140	3112	3112
query66	733	396	293	293
query67	15833	15450	15335	15335
query68	5323	800	544	544
query69	521	361	268	268
query70	1233	1126	1119	1119
query71	429	304	269	269
query72	6281	3657	3509	3509
query73	1038	756	369	369
query74	9204	9128	9143	9128
query75	3217	3183	2700	2700
query76	3829	1174	752	752
query77	540	387	288	288
query78	9934	10149	9236	9236
query79	2410	863	640	640
query80	604	536	463	463
query81	506	277	240	240
query82	475	133	99	99
query83	176	170	163	163
query84	279	94	74	74
query85	758	341	306	306
query86	377	309	284	284
query87	4459	4468	4515	4468
query88	3904	2404	2366	2366
query89	407	317	302	302
query90	1805	196	195	195
query91	140	133	112	112
query92	70	58	55	55
query93	1949	1005	574	574
query94	696	401	305	305
query95	347	277	271	271
query96	516	579	294	294
query97	2825	2851	2748	2748
query98	242	206	199	199
query99	1656	1398	1269	1269
Total cold run time: 293744 ms
Total hot run time: 191826 ms

@doris-robot

ClickBench: Total hot run time: 30.52 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 96abe50dd47182a565e6f57e7a16258da71d0e53, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.03	0.04
query3	0.24	0.06	0.06
query4	1.67	0.10	0.10
query5	0.41	0.42	0.41
query6	1.15	0.66	0.66
query7	0.02	0.02	0.02
query8	0.04	0.04	0.03
query9	0.60	0.50	0.52
query10	0.56	0.57	0.57
query11	0.15	0.11	0.11
query12	0.14	0.11	0.11
query13	0.62	0.60	0.60
query14	2.72	2.74	2.72
query15	0.92	0.83	0.85
query16	0.37	0.37	0.38
query17	1.03	1.02	1.04
query18	0.22	0.20	0.19
query19	1.91	1.77	2.01
query20	0.01	0.02	0.01
query21	15.35	0.93	0.55
query22	0.76	1.23	0.64
query23	14.98	1.39	0.60
query24	7.54	1.58	0.76
query25	0.52	0.20	0.13
query26	0.57	0.15	0.14
query27	0.06	0.05	0.05
query28	9.96	0.82	0.44
query29	12.52	3.96	3.29
query30	0.26	0.09	0.06
query31	2.83	0.60	0.38
query32	3.22	0.54	0.47
query33	3.15	3.06	3.05
query34	15.79	5.15	4.53
query35	4.54	4.50	4.57
query36	0.66	0.51	0.48
query37	0.08	0.06	0.06
query38	0.06	0.04	0.04
query39	0.03	0.02	0.02
query40	0.17	0.13	0.13
query41	0.09	0.03	0.02
query42	0.03	0.02	0.02
query43	0.04	0.02	0.03
Total cold run time: 106.1 s
Total hot run time: 30.52 s

@sollhui
Contributor Author

sollhui commented Feb 26, 2025

run buildall

@sollhui sollhui marked this pull request as ready for review February 26, 2025 12:05
@doris-robot

TPC-H: Total hot run time: 31668 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f67d0d0d1bfcfccc3f7308b9d4a41bb67249f456, data reload: false

------ Round 1 ----------------------------------
q1	17612	5108	5121	5108
q2	2047	294	168	168
q3	10511	1271	696	696
q4	10288	1002	528	528
q5	8565	2414	2366	2366
q6	187	173	134	134
q7	895	730	585	585
q8	9310	1260	1116	1116
q9	5060	4857	4647	4647
q10	6812	2284	1881	1881
q11	470	273	261	261
q12	342	350	220	220
q13	17765	3684	3077	3077
q14	217	245	202	202
q15	506	463	479	463
q16	632	610	590	590
q17	578	866	350	350
q18	6966	6259	6284	6259
q19	1535	953	568	568
q20	323	327	195	195
q21	2790	2291	1951	1951
q22	366	336	303	303
Total cold run time: 103777 ms
Total hot run time: 31668 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5195	5185	5124	5124
q2	233	337	230	230
q3	2167	2694	2258	2258
q4	1439	1852	1404	1404
q5	4232	4132	4132	4132
q6	201	162	124	124
q7	1849	1787	1780	1780
q8	2585	2575	2540	2540
q9	7249	7185	7252	7185
q10	3037	3205	2788	2788
q11	575	506	480	480
q12	692	748	612	612
q13	3502	3824	3178	3178
q14	272	326	284	284
q15	513	469	464	464
q16	677	706	640	640
q17	1125	1582	1394	1394
q18	7475	7283	7222	7222
q19	799	786	826	786
q20	1929	2021	1886	1886
q21	5394	5000	4919	4919
q22	604	598	550	550
Total cold run time: 51744 ms
Total hot run time: 49980 ms

@doris-robot

TPC-DS: Total hot run time: 183377 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f67d0d0d1bfcfccc3f7308b9d4a41bb67249f456, data reload: false

query1	967	407	386	386
query2	6510	1914	1856	1856
query3	6816	206	204	204
query4	26859	23405	23342	23342
query5	4330	688	489	489
query6	307	194	193	193
query7	4604	511	289	289
query8	300	230	226	226
query9	8603	2582	2571	2571
query10	459	326	256	256
query11	15725	15009	14794	14794
query12	156	104	102	102
query13	1652	517	406	406
query14	9683	6245	6238	6238
query15	217	182	181	181
query16	7194	638	475	475
query17	1203	720	564	564
query18	1956	413	303	303
query19	194	193	163	163
query20	121	114	113	113
query21	207	124	106	106
query22	4109	4152	4418	4152
query23	34456	33493	32815	32815
query24	7779	2385	2350	2350
query25	526	442	380	380
query26	1234	270	153	153
query27	2451	466	320	320
query28	4219	2455	2394	2394
query29	792	529	412	412
query30	229	182	156	156
query31	955	841	752	752
query32	71	63	63	63
query33	553	387	303	303
query34	777	851	484	484
query35	798	814	717	717
query36	977	1011	916	916
query37	120	101	75	75
query38	4113	4110	4059	4059
query39	1423	1367	1399	1367
query40	211	110	124	110
query41	58	54	53	53
query42	124	98	102	98
query43	479	518	477	477
query44	1271	780	772	772
query45	173	175	162	162
query46	845	1022	630	630
query47	1798	1826	1738	1738
query48	373	400	294	294
query49	770	513	410	410
query50	707	735	402	402
query51	4237	4205	4163	4163
query52	103	103	88	88
query53	226	263	178	178
query54	476	481	407	407
query55	83	80	80	80
query56	274	263	246	246
query57	1136	1145	1053	1053
query58	253	240	241	240
query59	2726	2851	2746	2746
query60	281	302	285	285
query61	125	122	137	122
query62	812	736	661	661
query63	225	192	181	181
query64	4460	1004	666	666
query65	3185	3163	3098	3098
query66	1149	385	304	304
query67	15781	15484	15258	15258
query68	8383	895	488	488
query69	470	297	263	263
query70	1232	1168	1110	1110
query71	470	298	254	254
query72	5549	3548	3771	3548
query73	800	723	348	348
query74	9292	8886	8988	8886
query75	3867	3234	2691	2691
query76	3758	1161	739	739
query77	796	398	273	273
query78	10120	10142	9293	9293
query79	2513	833	591	591
query80	599	530	459	459
query81	533	277	249	249
query82	674	127	97	97
query83	178	171	152	152
query84	251	90	75	75
query85	803	355	309	309
query86	384	295	295	295
query87	4449	4660	4254	4254
query88	3619	2190	2184	2184
query89	399	315	284	284
query90	1867	193	194	193
query91	139	141	113	113
query92	80	60	58	58
query93	1698	1052	565	565
query94	649	425	297	297
query95	346	259	259	259
query96	481	554	264	264
query97	3351	3457	3253	3253
query98	239	253	204	204
query99	1352	1406	1282	1282
Total cold run time: 275327 ms
Total hot run time: 183377 ms

@doris-robot

ClickBench: Total hot run time: 30.84 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f67d0d0d1bfcfccc3f7308b9d4a41bb67249f456, data reload: false

query1	0.04	0.05	0.03
query2	0.07	0.03	0.04
query3	0.24	0.06	0.07
query4	1.61	0.10	0.10
query5	0.56	0.55	0.57
query6	1.18	0.73	0.72
query7	0.03	0.02	0.01
query8	0.04	0.03	0.03
query9	0.57	0.54	0.51
query10	0.57	0.57	0.57
query11	0.15	0.10	0.11
query12	0.15	0.11	0.11
query13	0.61	0.61	0.60
query14	2.66	2.82	2.66
query15	0.92	0.85	0.84
query16	0.38	0.38	0.39
query17	1.06	1.02	1.06
query18	0.21	0.19	0.20
query19	1.91	1.83	1.98
query20	0.02	0.01	0.02
query21	15.38	0.90	0.55
query22	0.74	1.22	0.64
query23	14.90	1.41	0.65
query24	7.32	1.32	0.82
query25	0.50	0.17	0.13
query26	0.64	0.17	0.14
query27	0.05	0.05	0.05
query28	9.16	0.85	0.45
query29	12.54	3.94	3.28
query30	0.24	0.08	0.07
query31	2.82	0.61	0.38
query32	3.23	0.54	0.47
query33	2.99	3.02	3.00
query34	15.84	5.15	4.55
query35	4.54	4.53	4.50
query36	0.66	0.49	0.49
query37	0.10	0.06	0.06
query38	0.04	0.04	0.04
query39	0.03	0.02	0.02
query40	0.17	0.13	0.13
query41	0.09	0.03	0.02
query42	0.03	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 105.02 s
Total hot run time: 30.84 s

@sollhui
Contributor Author

sollhui commented Feb 27, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 31727 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b14ccec4481d58b7355a7445e77420f3ce5aa877, data reload: false

------ Round 1 ----------------------------------
q1	17589	5207	5110	5110
q2	2066	290	168	168
q3	10412	1225	746	746
q4	10212	1010	527	527
q5	7545	2326	2340	2326
q6	191	170	131	131
q7	932	724	604	604
q8	9290	1236	1108	1108
q9	4852	4847	4853	4847
q10	6809	2305	1882	1882
q11	465	282	246	246
q12	352	355	222	222
q13	17758	3659	3032	3032
q14	227	222	208	208
q15	523	464	456	456
q16	638	613	587	587
q17	569	857	354	354
q18	6840	6163	6207	6163
q19	1218	954	533	533
q20	305	329	193	193
q21	2868	2188	1985	1985
q22	368	329	299	299
Total cold run time: 102029 ms
Total hot run time: 31727 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5136	5143	5110	5110
q2	237	329	235	235
q3	2156	2699	2282	2282
q4	1466	1866	1368	1368
q5	4234	4114	4174	4114
q6	200	162	126	126
q7	1871	1793	1660	1660
q8	2585	2616	2452	2452
q9	7268	7290	7239	7239
q10	2982	3200	2751	2751
q11	575	526	496	496
q12	691	778	649	649
q13	3504	3840	3353	3353
q14	275	287	281	281
q15	518	482	472	472
q16	646	669	660	660
q17	1135	1617	1310	1310
q18	7543	7351	7429	7351
q19	836	833	869	833
q20	1959	2059	1876	1876
q21	5436	5112	4762	4762
q22	636	601	539	539
Total cold run time: 51889 ms
Total hot run time: 49919 ms

@doris-robot

TPC-DS: Total hot run time: 190321 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b14ccec4481d58b7355a7445e77420f3ce5aa877, data reload: false

query1	1355	978	933	933
query2	6186	1876	1872	1872
query3	10979	4451	4481	4451
query4	55976	26770	23215	23215
query5	5053	513	474	474
query6	330	191	191	191
query7	4876	504	289	289
query8	314	252	239	239
query9	5494	2597	2602	2597
query10	411	340	261	261
query11	15141	15192	14845	14845
query12	163	111	112	111
query13	1048	526	386	386
query14	10701	6312	6257	6257
query15	221	198	180	180
query16	7219	704	445	445
query17	1087	730	548	548
query18	1717	428	303	303
query19	196	197	153	153
query20	129	126	118	118
query21	218	126	102	102
query22	4535	4568	4530	4530
query23	33885	33312	33483	33312
query24	5725	2421	2423	2421
query25	464	467	390	390
query26	720	284	155	155
query27	1918	474	329	329
query28	2937	2484	2420	2420
query29	580	579	423	423
query30	212	204	159	159
query31	877	898	787	787
query32	70	60	60	60
query33	458	360	300	300
query34	759	858	503	503
query35	836	868	753	753
query36	950	1011	871	871
query37	127	97	73	73
query38	4106	4169	4140	4140
query39	1653	1430	1438	1430
query40	207	123	107	107
query41	54	52	51	51
query42	124	109	105	105
query43	499	515	487	487
query44	1303	805	810	805
query45	195	177	172	172
query46	890	1061	664	664
query47	1845	1867	1802	1802
query48	399	421	314	314
query49	741	535	452	452
query50	715	759	423	423
query51	4371	4355	4239	4239
query52	122	103	103	103
query53	239	293	188	188
query54	507	502	445	445
query55	96	80	82	80
query56	291	283	261	261
query57	1194	1188	1140	1140
query58	246	250	245	245
query59	2668	2649	2750	2649
query60	290	272	274	272
query61	142	127	115	115
query62	763	752	680	680
query63	225	200	192	192
query64	1798	1031	675	675
query65	3263	3256	3230	3230
query66	707	390	314	314
query67	16101	15704	15284	15284
query68	8245	882	494	494
query69	541	304	271	271
query70	1171	1137	1131	1131
query71	502	305	257	257
query72	5937	3634	3787	3634
query73	1534	761	348	348
query74	9059	9177	9024	9024
query75	3664	3174	2677	2677
query76	4116	1187	752	752
query77	682	374	286	286
query78	10176	10219	9199	9199
query79	2376	822	589	589
query80	605	531	442	442
query81	525	276	240	240
query82	525	125	93	93
query83	174	173	158	158
query84	290	91	71	71
query85	790	347	315	315
query86	413	299	282	282
query87	4397	4540	4479	4479
query88	3776	2223	2208	2208
query89	413	319	285	285
query90	1793	194	197	194
query91	137	140	114	114
query92	80	73	56	56
query93	1775	1054	568	568
query94	667	417	289	289
query95	388	267	261	261
query96	482	558	274	274
query97	3399	3406	3329	3329
query98	221	203	200	200
query99	1405	1413	1281	1281
Total cold run time: 299879 ms
Total hot run time: 190321 ms

@doris-robot

ClickBench: Total hot run time: 31.24 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b14ccec4481d58b7355a7445e77420f3ce5aa877, data reload: false

query1	0.03	0.03	0.04
query2	0.07	0.04	0.04
query3	0.24	0.07	0.07
query4	1.62	0.10	0.10
query5	0.56	0.55	0.55
query6	1.20	0.72	0.72
query7	0.02	0.02	0.01
query8	0.04	0.04	0.03
query9	0.59	0.54	0.53
query10	0.58	0.58	0.59
query11	0.15	0.11	0.11
query12	0.14	0.11	0.12
query13	0.61	0.60	0.60
query14	2.80	2.68	2.80
query15	0.93	0.86	0.85
query16	0.38	0.38	0.38
query17	1.02	1.01	1.01
query18	0.21	0.19	0.20
query19	1.98	1.96	1.83
query20	0.01	0.01	0.01
query21	15.35	0.88	0.53
query22	0.75	1.16	0.77
query23	14.88	1.40	0.60
query24	6.47	2.38	1.18
query25	0.50	0.27	0.12
query26	0.58	0.16	0.13
query27	0.06	0.05	0.05
query28	10.32	0.79	0.43
query29	12.54	3.94	3.32
query30	0.25	0.09	0.06
query31	2.85	0.60	0.38
query32	3.23	0.54	0.45
query33	3.09	2.96	3.00
query34	15.78	5.12	4.53
query35	4.50	4.51	4.51
query36	0.68	0.51	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.03
query39	0.04	0.02	0.03
query40	0.16	0.14	0.14
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.03	0.03	0.02
Total cold run time: 105.5 s
Total hot run time: 31.24 s

@sollhui sollhui force-pushed the rl_abnormal_job_monitor branch from b14ccec to 0ab9771 Compare February 28, 2025 10:33
@sollhui
Contributor Author

sollhui commented Feb 28, 2025

run buildall

@sollhui sollhui force-pushed the rl_abnormal_job_monitor branch from 0ab9771 to 0b1ccf5 Compare March 1, 2025 02:36
@sollhui
Contributor Author

sollhui commented Mar 1, 2025

run buildall

@sollhui sollhui force-pushed the rl_abnormal_job_monitor branch from 0b1ccf5 to 5156681 Compare March 1, 2025 02:54
@sollhui
Contributor Author

sollhui commented Mar 1, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 31608 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 515668184526994b7231d0dfb670c6bc46d68cbb, data reload: false

------ Round 1 ----------------------------------
q1	17592	5163	5065	5065
q2	2059	310	168	168
q3	10659	1224	757	757
q4	10224	1021	507	507
q5	7604	2428	2343	2343
q6	194	168	131	131
q7	912	754	605	605
q8	9288	1255	1094	1094
q9	4952	4883	4871	4871
q10	6822	2288	1867	1867
q11	505	277	260	260
q12	349	350	219	219
q13	17783	3723	3050	3050
q14	228	230	209	209
q15	507	455	452	452
q16	621	616	596	596
q17	595	860	344	344
q18	7051	6216	6139	6139
q19	1665	947	522	522
q20	313	323	189	189
q21	2823	2131	1911	1911
q22	378	342	309	309
Total cold run time: 103124 ms
Total hot run time: 31608 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5116	5119	5102	5102
q2	239	324	234	234
q3	2199	2669	2358	2358
q4	1428	1807	1345	1345
q5	4287	4117	4156	4117
q6	210	161	124	124
q7	1884	1834	1725	1725
q8	2639	2567	2574	2567
q9	7216	7214	7243	7214
q10	2987	3196	2815	2815
q11	572	536	494	494
q12	692	780	635	635
q13	3441	3892	3289	3289
q14	288	293	280	280
q15	524	477	459	459
q16	636	695	650	650
q17	1179	1621	1355	1355
q18	7687	7371	7323	7323
q19	793	808	892	808
q20	1971	2119	1863	1863
q21	5508	5018	4853	4853
q22	659	592	568	568
Total cold run time: 52155 ms
Total hot run time: 50178 ms

@doris-robot

TPC-DS: Total hot run time: 189401 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 515668184526994b7231d0dfb670c6bc46d68cbb, data reload: false

query1	1321	949	944	944
query2	6241	1867	1829	1829
query3	11022	4377	4454	4377
query4	54965	25843	22916	22916
query5	5139	519	462	462
query6	357	195	196	195
query7	4983	499	297	297
query8	332	256	243	243
query9	6235	2535	2548	2535
query10	435	307	259	259
query11	15101	14992	14790	14790
query12	150	111	111	111
query13	1096	522	378	378
query14	10850	6521	6282	6282
query15	203	198	177	177
query16	7080	638	500	500
query17	1091	750	610	610
query18	1562	415	315	315
query19	207	208	202	202
query20	124	123	121	121
query21	215	124	98	98
query22	4354	4444	4329	4329
query23	34160	33437	33399	33399
query24	5701	2401	2432	2401
query25	452	469	400	400
query26	727	277	155	155
query27	1854	497	330	330
query28	2825	2463	2450	2450
query29	567	579	437	437
query30	212	202	153	153
query31	878	866	856	856
query32	71	68	67	67
query33	504	354	336	336
query34	759	873	511	511
query35	820	847	747	747
query36	935	1000	908	908
query37	119	98	74	74
query38	4326	4209	4129	4129
query39	1645	1427	1427	1427
query40	208	111	102	102
query41	54	49	53	49
query42	124	107	112	107
query43	529	516	485	485
query44	1343	805	803	803
query45	179	170	162	162
query46	886	1066	655	655
query47	1845	1855	1814	1814
query48	393	424	325	325
query49	705	504	413	413
query50	730	764	408	408
query51	4265	4266	4207	4207
query52	109	116	96	96
query53	242	259	193	193
query54	503	488	416	416
query55	81	87	79	79
query56	275	262	271	262
query57	1165	1166	1125	1125
query58	235	232	247	232
query59	2860	2826	2782	2782
query60	296	264	276	264
query61	125	134	118	118
query62	753	756	679	679
query63	231	188	197	188
query64	2128	1062	674	674
query65	3317	3344	3147	3147
query66	786	386	291	291
query67	15820	15629	15323	15323
query68	8101	865	501	501
query69	531	290	263	263
query70	1165	1127	1095	1095
query71	485	287	251	251
query72	5887	3538	3842	3538
query73	1344	747	359	359
query74	8926	9170	8801	8801
query75	3795	3118	2670	2670
query76	4261	1164	740	740
query77	648	357	276	276
query78	10092	10174	9318	9318
query79	1872	893	585	585
query80	702	521	495	495
query81	501	278	243	243
query82	601	125	93	93
query83	291	169	156	156
query84	281	98	71	71
query85	777	342	310	310
query86	371	295	301	295
query87	4513	4511	4275	4275
query88	2884	2253	2215	2215
query89	417	325	284	284
query90	1961	197	194	194
query91	136	139	109	109
query92	69	63	55	55
query93	1236	1032	577	577
query94	701	406	293	293
query95	350	275	294	275
query96	486	570	264	264
query97	3425	3375	3210	3210
query98	231	199	203	199
query99	1468	1401	1242	1242
Total cold run time: 298313 ms
Total hot run time: 189401 ms

@doris-robot

ClickBench: Total hot run time: 31.01 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 515668184526994b7231d0dfb670c6bc46d68cbb, data reload: false

query1	0.03	0.04	0.03
query2	0.07	0.03	0.03
query3	0.24	0.06	0.06
query4	1.62	0.10	0.10
query5	0.57	0.57	0.56
query6	1.17	0.72	0.71
query7	0.02	0.01	0.02
query8	0.04	0.03	0.03
query9	0.60	0.56	0.52
query10	0.58	0.59	0.58
query11	0.16	0.11	0.11
query12	0.15	0.12	0.11
query13	0.62	0.61	0.59
query14	2.68	2.70	2.78
query15	0.92	0.86	0.84
query16	0.40	0.39	0.40
query17	1.00	1.03	1.04
query18	0.21	0.19	0.20
query19	1.91	1.80	1.97
query20	0.01	0.01	0.01
query21	15.36	0.91	0.55
query22	0.76	1.17	0.72
query23	14.86	1.40	0.63
query24	7.53	0.92	1.52
query25	0.53	0.09	0.14
query26	0.67	0.16	0.15
query27	0.06	0.05	0.06
query28	9.47	0.86	0.45
query29	12.57	3.92	3.26
query30	0.25	0.09	0.06
query31	2.82	0.63	0.38
query32	3.23	0.57	0.47
query33	3.03	3.05	3.04
query34	15.76	5.17	4.54
query35	4.48	4.51	4.57
query36	0.66	0.51	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.02	0.03
query40	0.17	0.14	0.12
query41	0.07	0.03	0.02
query42	0.04	0.02	0.03
query43	0.04	0.03	0.03
Total cold run time: 105.53 s
Total hot run time: 31.01 s

sollhui force-pushed the rl_abnormal_job_monitor branch from 5156681 to f182a7e on March 1, 2025 10:07
@sollhui
Contributor Author

sollhui commented Mar 1, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 31328 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 60c2372c68d1b164b83f33ee2d58d81d2f2c905b, data reload: false

------ Round 1 ----------------------------------
q1	17610	5240	5017	5017
q2	2047	291	165	165
q3	10425	1224	730	730
q4	10210	1027	523	523
q5	7520	2396	2335	2335
q6	193	170	134	134
q7	899	772	601	601
q8	9311	1297	1087	1087
q9	4978	4684	4565	4565
q10	6807	2324	1899	1899
q11	474	270	258	258
q12	349	350	219	219
q13	17774	3676	3104	3104
q14	227	229	206	206
q15	507	479	456	456
q16	650	604	579	579
q17	576	855	335	335
q18	6605	6301	6150	6150
q19	1208	940	527	527
q20	323	328	189	189
q21	2733	2215	1937	1937
q22	364	328	312	312
Total cold run time: 101790 ms
Total hot run time: 31328 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5083	5092	5063	5063
q2	232	319	233	233
q3	2171	2644	2349	2349
q4	1395	1797	1342	1342
q5	4223	4123	4121	4121
q6	204	163	120	120
q7	1839	1817	1629	1629
q8	2557	2530	2454	2454
q9	7503	7313	7294	7294
q10	3080	3257	2802	2802
q11	576	516	506	506
q12	706	844	628	628
q13	3735	3848	3309	3309
q14	280	320	266	266
q15	516	474	470	470
q16	647	683	656	656
q17	1133	1593	1319	1319
q18	7675	7382	7331	7331
q19	783	796	948	796
q20	2009	1990	1953	1953
q21	5510	4887	4778	4778
q22	608	576	565	565
Total cold run time: 52465 ms
Total hot run time: 49984 ms

@doris-robot

TPC-DS: Total hot run time: 190656 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 60c2372c68d1b164b83f33ee2d58d81d2f2c905b, data reload: false

query1	1307	935	940	935
query2	6215	1904	1893	1893
query3	11132	4722	4592	4592
query4	25580	23418	23508	23418
query5	4300	650	483	483
query6	292	206	194	194
query7	3983	496	305	305
query8	283	235	229	229
query9	8455	2527	2523	2523
query10	456	327	243	243
query11	15908	15038	14850	14850
query12	163	108	103	103
query13	1554	527	392	392
query14	9305	6401	6148	6148
query15	201	191	176	176
query16	7605	612	481	481
query17	1174	784	578	578
query18	1986	413	351	351
query19	208	193	174	174
query20	125	123	119	119
query21	219	137	109	109
query22	4337	4814	4516	4516
query23	34275	33345	33474	33345
query24	8434	2417	2473	2417
query25	516	466	395	395
query26	1187	278	154	154
query27	2790	485	344	344
query28	4832	2475	2429	2429
query29	681	560	452	452
query30	221	188	165	165
query31	940	917	813	813
query32	70	62	61	61
query33	559	357	310	310
query34	803	877	513	513
query35	781	852	782	782
query36	982	981	929	929
query37	119	101	75	75
query38	4136	4106	4196	4106
query39	1510	1433	1420	1420
query40	213	123	121	121
query41	59	56	57	56
query42	123	113	108	108
query43	508	509	487	487
query44	1311	857	836	836
query45	184	178	199	178
query46	879	1067	667	667
query47	1859	1860	1786	1786
query48	406	430	314	314
query49	782	528	416	416
query50	703	739	422	422
query51	4232	4304	4186	4186
query52	105	110	95	95
query53	227	272	206	206
query54	497	532	425	425
query55	84	80	76	76
query56	276	280	281	280
query57	1176	1188	1135	1135
query58	260	253	240	240
query59	2762	2985	2598	2598
query60	284	280	262	262
query61	119	123	121	121
query62	807	761	682	682
query63	228	195	194	194
query64	4190	1072	689	689
query65	3343	3275	3204	3204
query66	1123	395	333	333
query67	16188	15519	15435	15435
query68	8741	882	515	515
query69	473	301	268	268
query70	1180	1158	1074	1074
query71	480	302	258	258
query72	5696	3531	3617	3531
query73	734	683	355	355
query74	8915	9087	8912	8912
query75	3796	3243	2657	2657
query76	3599	1181	750	750
query77	783	392	279	279
query78	9915	10081	9336	9336
query79	2552	827	587	587
query80	676	510	468	468
query81	517	268	244	244
query82	585	123	96	96
query83	172	171	156	156
query84	242	91	71	71
query85	776	339	298	298
query86	383	308	283	283
query87	4403	4451	4379	4379
query88	3799	2247	2244	2244
query89	424	315	289	289
query90	1836	200	189	189
query91	136	142	108	108
query92	70	59	56	56
query93	2041	1022	569	569
query94	640	406	302	302
query95	353	272	256	256
query96	478	568	271	271
query97	3333	3388	3293	3293
query98	226	202	202	202
query99	1614	1372	1286	1286
Total cold run time: 280120 ms
Total hot run time: 190656 ms

@doris-robot

ClickBench: Total hot run time: 30.47 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 60c2372c68d1b164b83f33ee2d58d81d2f2c905b, data reload: false

query1	0.05	0.04	0.03
query2	0.07	0.03	0.04
query3	0.23	0.06	0.07
query4	1.63	0.10	0.11
query5	0.55	0.55	0.56
query6	1.20	0.72	0.72
query7	0.02	0.02	0.01
query8	0.04	0.03	0.04
query9	0.57	0.54	0.52
query10	0.58	0.57	0.58
query11	0.15	0.10	0.11
query12	0.14	0.11	0.11
query13	0.60	0.60	0.59
query14	2.80	2.69	2.71
query15	0.91	0.85	0.85
query16	0.39	0.37	0.40
query17	1.09	1.03	1.06
query18	0.20	0.19	0.19
query19	1.89	1.79	1.99
query20	0.02	0.01	0.02
query21	15.39	0.90	0.55
query22	0.75	1.21	0.66
query23	14.95	1.36	0.62
query24	7.11	1.72	0.57
query25	0.46	0.12	0.07
query26	0.63	0.17	0.15
query27	0.05	0.06	0.05
query28	9.18	0.86	0.43
query29	12.54	3.94	3.26
query30	0.25	0.09	0.07
query31	2.81	0.56	0.37
query32	3.21	0.55	0.46
query33	3.01	2.98	3.04
query34	15.68	5.13	4.53
query35	4.60	4.53	4.54
query36	0.67	0.50	0.49
query37	0.09	0.07	0.06
query38	0.05	0.04	0.04
query39	0.03	0.02	0.02
query40	0.18	0.14	0.14
query41	0.08	0.03	0.02
query42	0.04	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 104.93 s
Total hot run time: 30.47 s

@sollhui
Contributor Author

sollhui commented Mar 5, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32169 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2f16f8cb529a79f49ffc526c1f73f55d04413917, data reload: false

------ Round 1 ----------------------------------
q1	17611	5141	5072	5072
q2	2041	299	169	169
q3	10516	1328	697	697
q4	10313	1037	525	525
q5	8886	2471	2312	2312
q6	193	162	132	132
q7	891	754	634	634
q8	9320	1290	1077	1077
q9	5249	4860	4585	4585
q10	6804	2302	1882	1882
q11	495	281	259	259
q12	345	345	214	214
q13	17762	3697	3087	3087
q14	224	240	203	203
q15	534	495	476	476
q16	633	619	588	588
q17	578	862	346	346
q18	6979	6457	6337	6337
q19	1263	937	532	532
q20	323	340	184	184
q21	2779	2137	1897	1897
q22	1074	1044	961	961
Total cold run time: 104813 ms
Total hot run time: 32169 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5181	5105	5100	5100
q2	237	328	239	239
q3	2128	2655	2271	2271
q4	1428	1852	1364	1364
q5	4241	4129	4140	4129
q6	206	162	120	120
q7	1929	1941	1741	1741
q8	2621	2570	2580	2570
q9	7215	7184	7233	7184
q10	3047	3223	2787	2787
q11	592	510	491	491
q12	676	769	609	609
q13	3477	3862	3200	3200
q14	290	294	270	270
q15	513	480	473	473
q16	650	707	651	651
q17	1148	1788	1311	1311
q18	7792	7613	7449	7449
q19	798	800	914	800
q20	1946	1990	1885	1885
q21	5358	4955	4749	4749
q22	1091	1077	1039	1039
Total cold run time: 52564 ms
Total hot run time: 50432 ms

@doris-robot

TPC-DS: Total hot run time: 183995 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2f16f8cb529a79f49ffc526c1f73f55d04413917, data reload: false

query1	990	389	398	389
query2	6568	1941	1884	1884
query3	6792	208	209	208
query4	26473	23630	22865	22865
query5	4369	646	470	470
query6	318	202	200	200
query7	4600	505	293	293
query8	306	251	230	230
query9	8600	2524	2523	2523
query10	489	313	252	252
query11	15642	15153	14905	14905
query12	169	106	109	106
query13	1669	514	405	405
query14	9916	6554	6213	6213
query15	209	188	176	176
query16	7546	624	495	495
query17	1203	699	561	561
query18	1973	403	305	305
query19	198	190	156	156
query20	119	117	115	115
query21	210	178	101	101
query22	4275	4190	4072	4072
query23	33613	32868	32960	32868
query24	7683	2366	2388	2366
query25	523	449	390	390
query26	1236	266	153	153
query27	2185	500	325	325
query28	3989	2442	2385	2385
query29	708	547	409	409
query30	281	220	200	200
query31	958	857	794	794
query32	74	67	61	61
query33	548	355	324	324
query34	766	834	487	487
query35	804	820	760	760
query36	963	996	887	887
query37	115	93	76	76
query38	4052	4135	4119	4119
query39	1437	1407	1412	1407
query40	208	115	101	101
query41	53	52	49	49
query42	115	105	102	102
query43	506	496	469	469
query44	1244	779	765	765
query45	181	169	164	164
query46	829	1035	608	608
query47	1746	1773	1699	1699
query48	380	407	293	293
query49	793	521	441	441
query50	681	739	402	402
query51	4170	4206	4140	4140
query52	106	102	93	93
query53	222	251	186	186
query54	484	477	429	429
query55	82	84	82	82
query56	254	258	275	258
query57	1142	1161	1069	1069
query58	242	241	244	241
query59	2723	2581	2705	2581
query60	282	263	258	258
query61	117	117	116	116
query62	781	747	673	673
query63	222	186	184	184
query64	4279	987	641	641
query65	4377	4263	4312	4263
query66	1055	398	302	302
query67	15744	15502	15215	15215
query68	8128	859	563	563
query69	464	292	265	265
query70	1187	1109	1083	1083
query71	482	295	279	279
query72	5247	3523	3761	3523
query73	775	724	340	340
query74	8957	9078	8685	8685
query75	3779	3179	2720	2720
query76	3712	1163	754	754
query77	783	380	272	272
query78	9915	9975	9269	9269
query79	2946	836	579	579
query80	697	513	437	437
query81	467	265	222	222
query82	707	125	147	125
query83	206	174	149	149
query84	280	97	69	69
query85	772	340	295	295
query86	333	295	299	295
query87	4482	4440	4359	4359
query88	3349	2128	2146	2128
query89	388	311	281	281
query90	1950	196	193	193
query91	135	140	108	108
query92	78	61	58	58
query93	1530	1012	580	580
query94	681	412	365	365
query95	352	259	257	257
query96	479	555	260	260
query97	3333	3398	3268	3268
query98	220	204	199	199
query99	1457	1394	1256	1256
Total cold run time: 273858 ms
Total hot run time: 183995 ms

@doris-robot

ClickBench: Total hot run time: 30.38 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2f16f8cb529a79f49ffc526c1f73f55d04413917, data reload: false

query1	0.03	0.03	0.04
query2	0.07	0.03	0.03
query3	0.23	0.07	0.07
query4	1.61	0.09	0.10
query5	0.55	0.56	0.55
query6	1.22	0.72	0.72
query7	0.03	0.01	0.02
query8	0.04	0.03	0.03
query9	0.58	0.52	0.52
query10	0.57	0.57	0.58
query11	0.16	0.10	0.11
query12	0.14	0.11	0.12
query13	0.61	0.60	0.60
query14	2.69	2.69	2.71
query15	0.92	0.84	0.83
query16	0.38	0.36	0.37
query17	1.02	1.02	1.02
query18	0.20	0.19	0.19
query19	1.92	1.75	1.97
query20	0.02	0.01	0.02
query21	15.36	0.90	0.58
query22	0.75	1.12	0.59
query23	15.09	1.35	0.65
query24	7.86	1.83	0.70
query25	0.53	0.29	0.08
query26	0.55	0.16	0.13
query27	0.05	0.04	0.05
query28	9.71	0.78	0.43
query29	12.53	3.91	3.22
query30	0.25	0.09	0.06
query31	2.82	0.59	0.40
query32	3.23	0.54	0.46
query33	3.00	3.06	2.99
query34	15.75	5.13	4.48
query35	4.55	4.50	4.54
query36	0.66	0.49	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.03	0.02
query40	0.17	0.13	0.12
query41	0.08	0.02	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.17 s
Total hot run time: 30.38 s

sollhui force-pushed the rl_abnormal_job_monitor branch from 2f16f8c to c952646 on March 5, 2025 14:13
@sollhui
Contributor Author

sollhui commented Mar 5, 2025

run buildall

sollhui force-pushed the rl_abnormal_job_monitor branch from c952646 to 858ad59 on March 5, 2025 14:33
@sollhui
Contributor Author

sollhui commented Mar 5, 2025

run buildall

sollhui force-pushed the rl_abnormal_job_monitor branch from 858ad59 to e4a4825 on March 5, 2025 14:37
@sollhui
Contributor Author

sollhui commented Mar 5, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32084 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e4a4825f1f7bca75536075982488ed637f59c7f8, data reload: false

------ Round 1 ----------------------------------
q1	14784	5058	5094	5058
q2	2042	289	177	177
q3	5192	1267	706	706
q4	705	1038	570	570
q5	2336	2084	2263	2084
q6	199	167	134	134
q7	896	788	605	605
q8	1072	1162	1091	1091
q9	4974	4787	4569	4569
q10	6838	2328	1880	1880
q11	467	283	259	259
q12	363	351	223	223
q13	17771	3686	3070	3070
q14	241	225	216	216
q15	535	485	498	485
q16	649	631	607	607
q17	572	850	341	341
q18	7060	6581	6354	6354
q19	1325	961	531	531
q20	560	327	193	193
q21	5055	2121	1958	1958
q22	1065	1007	973	973
Total cold run time: 74701 ms
Total hot run time: 32084 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5143	5121	5116	5116
q2	241	332	231	231
q3	2137	2689	2294	2294
q4	1419	1803	1349	1349
q5	4212	4092	4127	4092
q6	202	163	125	125
q7	1901	1839	1671	1671
q8	2492	2524	2458	2458
q9	6827	6774	6799	6774
q10	2889	3111	2659	2659
q11	566	499	476	476
q12	679	732	610	610
q13	3299	3634	3075	3075
q14	265	290	255	255
q15	510	472	478	472
q16	615	670	646	646
q17	1093	1518	1338	1338
q18	7395	7250	7166	7166
q19	777	766	862	766
q20	1912	1976	1830	1830
q21	5146	4774	4849	4774
q22	1081	1028	987	987
Total cold run time: 50801 ms
Total hot run time: 49164 ms

@doris-robot

TPC-DS: Total hot run time: 185680 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e4a4825f1f7bca75536075982488ed637f59c7f8, data reload: false

query1	993	383	406	383
query2	6562	1978	1950	1950
query3	6799	217	219	217
query4	26426	23477	23040	23040
query5	4587	650	478	478
query6	285	199	208	199
query7	4609	499	300	300
query8	303	252	237	237
query9	8632	2578	2561	2561
query10	455	349	265	265
query11	15809	15095	15034	15034
query12	166	108	115	108
query13	1682	537	400	400
query14	10442	6566	6419	6419
query15	212	196	188	188
query16	7661	618	498	498
query17	1608	725	540	540
query18	1985	417	286	286
query19	226	182	176	176
query20	119	112	114	112
query21	213	119	102	102
query22	4285	4298	3969	3969
query23	34104	32993	33163	32993
query24	7324	2342	2384	2342
query25	532	453	389	389
query26	1227	271	157	157
query27	2324	487	334	334
query28	4143	2434	2384	2384
query29	706	551	431	431
query30	280	218	186	186
query31	968	863	771	771
query32	76	69	61	61
query33	569	371	306	306
query34	777	840	494	494
query35	775	829	737	737
query36	957	975	906	906
query37	123	96	74	74
query38	4265	4182	4083	4083
query39	1459	1428	1416	1416
query40	214	117	103	103
query41	60	53	52	52
query42	123	103	102	102
query43	514	522	480	480
query44	1282	800	784	784
query45	185	175	164	164
query46	827	1017	619	619
query47	1745	1792	1721	1721
query48	371	406	291	291
query49	792	530	446	446
query50	681	738	423	423
query51	4193	4201	4163	4163
query52	110	106	96	96
query53	231	264	198	198
query54	508	504	445	445
query55	85	82	81	81
query56	305	292	270	270
query57	1125	1171	1075	1075
query58	246	241	233	233
query59	2710	2831	2858	2831
query60	304	285	262	262
query61	124	121	125	121
query62	812	743	685	685
query63	236	194	194	194
query64	4217	1006	664	664
query65	4380	4300	4338	4300
query66	1053	398	305	305
query67	15822	15417	15582	15417
query68	8370	872	502	502
query69	449	289	260	260
query70	1262	1149	1121	1121
query71	469	294	261	261
query72	5664	3497	3673	3497
query73	779	675	350	350
query74	9219	9138	8825	8825
query75	3786	3175	2821	2821
query76	3681	1182	745	745
query77	780	373	284	284
query78	9969	10189	9387	9387
query79	2672	819	584	584
query80	626	531	467	467
query81	471	266	223	223
query82	647	126	100	100
query83	244	179	150	150
query84	239	95	71	71
query85	805	361	319	319
query86	344	308	292	292
query87	4503	4479	4405	4405
query88	3374	2163	2170	2163
query89	390	315	281	281
query90	1959	196	195	195
query91	138	142	111	111
query92	73	59	63	59
query93	1452	1037	589	589
query94	682	419	303	303
query95	364	260	254	254
query96	479	556	262	262
query97	3395	3386	3244	3244
query98	220	216	200	200
query99	1667	1404	1304	1304
Total cold run time: 277095 ms
Total hot run time: 185680 ms

@doris-robot

ClickBench: Total hot run time: 30.89 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e4a4825f1f7bca75536075982488ed637f59c7f8, data reload: false

query1	0.04	0.03	0.03
query2	0.07	0.04	0.03
query3	0.24	0.07	0.07
query4	1.61	0.10	0.11
query5	0.55	0.58	0.55
query6	1.22	0.72	0.72
query7	0.03	0.02	0.01
query8	0.04	0.03	0.03
query9	0.60	0.52	0.51
query10	0.58	0.57	0.58
query11	0.15	0.10	0.10
query12	0.14	0.11	0.11
query13	0.62	0.60	0.60
query14	2.67	2.67	2.74
query15	0.95	0.86	0.85
query16	0.37	0.39	0.39
query17	1.01	1.04	1.02
query18	0.21	0.20	0.20
query19	1.90	1.74	2.04
query20	0.01	0.01	0.01
query21	15.35	0.94	0.56
query22	0.73	1.18	0.62
query23	14.99	1.39	0.65
query24	6.77	1.62	1.04
query25	0.50	0.19	0.12
query26	0.67	0.15	0.13
query27	0.05	0.05	0.06
query28	8.94	0.87	0.45
query29	12.55	3.97	3.27
query30	0.25	0.08	0.07
query31	2.82	0.59	0.38
query32	3.23	0.55	0.46
query33	3.02	3.03	3.04
query34	15.54	5.14	4.46
query35	4.53	4.54	4.49
query36	0.67	0.50	0.48
query37	0.09	0.06	0.06
query38	0.06	0.04	0.04
query39	0.03	0.03	0.02
query40	0.16	0.14	0.13
query41	0.08	0.03	0.03
query42	0.03	0.02	0.03
query43	0.03	0.02	0.02
Total cold run time: 104.1 s
Total hot run time: 30.89 s

// 1. check auto resume count
if (this.autoResumeCount >= Config.min_abnormal_auto_resume_count_threshold) {
    Env.getCurrentEnv().getRoutineLoadManager().addAbnormalJob(this.id,
            "The auto resume time reaches threshold: " + Config.min_abnormal_auto_resume_count_threshold);
Contributor

Automatic resume has failed multiple times.
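
For context, the check under review can be sketched in isolation roughly as follows. This is a minimal, self-contained illustration rather than the PR's actual code: the class `AbnormalJobSketch`, the constant standing in for `Config.min_abnormal_auto_resume_count_threshold` (set to 5 here, matching the example output in the PR description), and the in-memory map standing in for the routine load manager's abnormal-job registry are all assumptions made for this sketch. The message string is the one from the snippet above; the wording suggested in this review comment could be substituted for it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AbnormalJobSketch {
    // Hypothetical stand-in for Config.min_abnormal_auto_resume_count_threshold.
    static final int MIN_ABNORMAL_AUTO_RESUME_COUNT_THRESHOLD = 5;

    // Hypothetical stand-in for the abnormal-job registry: job id -> reason.
    static final Map<Long, String> ABNORMAL_JOBS = new ConcurrentHashMap<>();

    // Records the job as abnormal when its auto resume count reaches the threshold.
    // Returns true if the job was recorded, false otherwise.
    static boolean checkAutoResumeCount(long jobId, int autoResumeCount) {
        if (autoResumeCount >= MIN_ABNORMAL_AUTO_RESUME_COUNT_THRESHOLD) {
            ABNORMAL_JOBS.put(jobId,
                    "The auto resume time reaches threshold: "
                            + MIN_ABNORMAL_AUTO_RESUME_COUNT_THRESHOLD);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        checkAutoResumeCount(1L, 5); // reaches the threshold, so it is recorded
        checkAutoResumeCount(2L, 4); // below the threshold, so it is ignored
        System.out.println(ABNORMAL_JOBS);
    }
}
```

The comparison uses `>=`, as in the reviewed snippet, so a job is flagged as soon as its count equals the threshold.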
