[improve](routine load) improve routine load observability #46238

Merged
merged 1 commit into from
Jan 7, 2025

Contributor

@sollhui sollhui commented Jan 1, 2025

What problem does this PR solve?

related #48511

  1. Reset other msg within a stream window
    A routine load job is scheduled continuously, so errors from earlier in the run do not need to be displayed indefinitely.

  2. Show error info when the transaction of a subtask fails
    A failed subtask retries continuously, and some errors can keep the job from scheduling and consuming data properly, such as a recurring "too many segments" error (code: -235). Such errors need to be displayed promptly so that the user is aware of them.

  3. Set the pause reason as other msg when rescheduling the job
    The job manager has an auto-resume mechanism for jobs that are paused unexpectedly. In some scenarios, such as a job that cannot connect to Kafka and is auto-resumed after the pause to retry, users may not notice the problem for a long time. An unexpected pause always indicates a problem, so the reason for the error needs to be displayed even if auto resume occurs.
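
The three behaviors above can be sketched in a few lines. This is only an illustration under stated assumptions, not the Doris frontend code: the class `RoutineLoadJobSketch`, its fields, its method names, the message formats, and the fixed time window are all hypothetical, and the state strings are used purely as labels.

```python
from dataclasses import dataclass


@dataclass
class RoutineLoadJobSketch:
    """Hypothetical stand-in for a routine load job (not the Doris class)."""

    window_seconds: float = 60.0  # assumed length of one window
    state: str = "RUNNING"
    other_msg: str = ""
    pause_reason: str = ""
    window_start: float = 0.0

    def on_schedule_tick(self, now: float) -> None:
        # (1) When a new window starts, drop the message left over from
        # the previous window so stale errors are not shown forever.
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.other_msg = ""

    def on_task_txn_failed(self, reason: str) -> None:
        # (2) Surface the failure of a subtask transaction right away,
        # even though the subtask itself will keep retrying.
        self.other_msg = f"task transaction failed: {reason}"

    def pause(self, reason: str) -> None:
        self.state = "PAUSED"
        self.pause_reason = reason

    def auto_resume(self) -> None:
        # (3) Keep the pause reason visible after the job is rescheduled.
        self.other_msg = f"paused before auto resume: {self.pause_reason}"
        self.state = "NEED_SCHEDULE"


job = RoutineLoadJobSketch(window_seconds=10.0)
job.on_task_txn_failed("too many segments (code: -235)")
job.on_schedule_tick(now=5.0)    # still inside the first window
print(job.other_msg)
job.on_schedule_tick(now=10.0)   # a new window begins, message is reset
print(repr(job.other_msg))
job.pause("failed to connect to Kafka")
job.auto_resume()
print(job.state, "|", job.other_msg)
```

Passing `now` explicitly keeps the sketch deterministic; a real scheduler would read a clock instead.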

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Jan 1, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@sollhui
Contributor Author

sollhui commented Jan 1, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32703 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e14a130486984474b6dbfa21c9aa6818d9f38fe2, data reload: false

------ Round 1 ----------------------------------
q1	17572	6131	6005	6005
q2	2043	289	161	161
q3	10483	1205	754	754
q4	10250	878	444	444
q5	8149	2221	1987	1987
q6	200	184	151	151
q7	881	757	595	595
q8	9225	1400	1180	1180
q9	5203	4953	4974	4953
q10	6771	2326	1895	1895
q11	502	290	268	268
q12	336	358	225	225
q13	17833	3711	3002	3002
q14	248	233	221	221
q15	582	525	512	512
q16	625	627	587	587
q17	572	858	328	328
q18	6970	6578	6433	6433
q19	1923	964	543	543
q20	299	305	182	182
q21	2881	2177	1960	1960
q22	351	335	317	317
Total cold run time: 103899 ms
Total hot run time: 32703 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6292	6270	6249	6249
q2	241	324	233	233
q3	2288	2731	2386	2386
q4	1477	1882	1437	1437
q5	4375	4798	4928	4798
q6	192	187	145	145
q7	2148	2100	1950	1950
q8	2727	2951	2825	2825
q9	7577	7548	7523	7523
q10	3115	3389	2881	2881
q11	597	505	493	493
q12	644	740	632	632
q13	3516	3872	3184	3184
q14	280	320	299	299
q15	584	523	507	507
q16	675	684	653	653
q17	1213	1744	1282	1282
q18	7682	7570	7286	7286
q19	900	1149	1148	1148
q20	2058	2036	1940	1940
q21	5754	5332	4950	4950
q22	608	635	609	609
Total cold run time: 54943 ms
Total hot run time: 53410 ms
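
The Round 1 totals above can be reproduced from the rows themselves. The sketch below assumes a column layout that the comment does not state: query name, cold run, two hot runs, and the faster of the two hot runs, all in milliseconds. The helper names `parse_rows` and `totals` are made up for this example.

```python
# Round 1 rows copied from the TPC-H comment above (commit e14a130).
# Assumed columns: query, cold run, hot run 1, hot run 2, best hot run (ms).
ROUND_1 = """
q1 17572 6131 6005 6005
q2 2043 289 161 161
q3 10483 1205 754 754
q4 10250 878 444 444
q5 8149 2221 1987 1987
q6 200 184 151 151
q7 881 757 595 595
q8 9225 1400 1180 1180
q9 5203 4953 4974 4953
q10 6771 2326 1895 1895
q11 502 290 268 268
q12 336 358 225 225
q13 17833 3711 3002 3002
q14 248 233 221 221
q15 582 525 512 512
q16 625 627 587 587
q17 572 858 328 328
q18 6970 6578 6433 6433
q19 1923 964 543 543
q20 299 305 182 182
q21 2881 2177 1960 1960
q22 351 335 317 317
"""


def parse_rows(text):
    """Split each whitespace-separated row into (query, [int, ...])."""
    rows = []
    for line in text.strip().splitlines():
        name, *values = line.split()
        rows.append((name, [int(v) for v in values]))
    return rows


def totals(rows):
    """Return (sum of cold runs, sum of the faster hot run per query)."""
    cold = sum(values[0] for _, values in rows)
    hot = sum(min(values[1], values[2]) for _, values in rows)
    return cold, hot


rows = parse_rows(ROUND_1)
cold_total, hot_total = totals(rows)
print(f"Total cold run time: {cold_total} ms")
print(f"Total hot run time: {hot_total} ms")
```

Under that reading, the reported hot total is the sum of the last column, which is why a single slow hot run does not inflate it.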

@doris-robot

TPC-DS: Total hot run time: 197047 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e14a130486984474b6dbfa21c9aa6818d9f38fe2, data reload: false

query1	1311	987	918	918
query2	6498	2447	2357	2357
query3	10957	4877	4749	4749
query4	33014	23794	23469	23469
query5	4099	636	469	469
query6	294	206	203	203
query7	3998	533	303	303
query8	296	253	237	237
query9	9542	2714	2725	2714
query10	469	314	267	267
query11	17971	15277	15092	15092
query12	157	107	105	105
query13	1552	548	395	395
query14	10473	6793	7423	6793
query15	264	214	198	198
query16	8310	675	484	484
query17	1587	794	597	597
query18	2167	444	349	349
query19	219	206	179	179
query20	132	112	112	112
query21	209	133	109	109
query22	4529	4487	4435	4435
query23	35537	33845	33641	33641
query24	6469	2363	2454	2363
query25	499	467	402	402
query26	771	299	156	156
query27	2017	505	335	335
query28	5285	2488	2477	2477
query29	699	578	427	427
query30	217	188	160	160
query31	1012	934	826	826
query32	90	64	59	59
query33	466	361	294	294
query34	816	910	529	529
query35	822	893	751	751
query36	1049	1075	969	969
query37	119	106	75	75
query38	4355	4193	4363	4193
query39	1527	1486	1448	1448
query40	211	127	109	109
query41	52	57	44	44
query42	122	104	101	101
query43	523	551	514	514
query44	1420	837	845	837
query45	185	179	173	173
query46	974	1117	665	665
query47	1991	2043	1954	1954
query48	412	436	330	330
query49	714	482	379	379
query50	685	709	418	418
query51	7328	7150	7171	7150
query52	107	104	106	104
query53	242	274	193	193
query54	496	509	428	428
query55	103	79	80	79
query56	278	287	287	287
query57	1246	1229	1177	1177
query58	250	231	234	231
query59	3217	3306	3169	3169
query60	287	281	266	266
query61	114	138	109	109
query62	883	827	758	758
query63	236	203	195	195
query64	3790	1050	668	668
query65	3321	3327	3353	3327
query66	703	420	317	317
query67	16755	15911	15594	15594
query68	10061	760	515	515
query69	512	301	255	255
query70	1265	1153	1138	1138
query71	445	304	255	255
query72	5894	3964	4013	3964
query73	1369	806	369	369
query74	10149	9123	8926	8926
query75	4627	3181	2693	2693
query76	5642	1270	801	801
query77	1027	372	284	284
query78	10147	10284	9595	9595
query79	5197	904	599	599
query80	691	538	424	424
query81	479	275	229	229
query82	227	221	125	125
query83	200	165	148	148
query84	280	88	68	68
query85	747	356	322	322
query86	342	310	297	297
query87	4465	4431	4315	4315
query88	4350	2220	2200	2200
query89	434	331	299	299
query90	2079	196	189	189
query91	135	134	104	104
query92	67	58	51	51
query93	2741	897	535	535
query94	699	400	286	286
query95	328	274	291	274
query96	496	674	285	285
query97	2697	2836	2673	2673
query98	225	206	198	198
query99	1618	1611	1416	1416
Total cold run time: 306078 ms
Total hot run time: 197047 ms

@doris-robot

ClickBench: Total hot run time: 31.79 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e14a130486984474b6dbfa21c9aa6818d9f38fe2, data reload: false

query1	0.03	0.03	0.06
query2	0.07	0.04	0.03
query3	0.24	0.07	0.07
query4	1.61	0.11	0.10
query5	0.43	0.42	0.40
query6	1.17	0.65	0.64
query7	0.03	0.02	0.01
query8	0.04	0.03	0.03
query9	0.60	0.50	0.50
query10	0.56	0.56	0.56
query11	0.14	0.10	0.10
query12	0.14	0.11	0.12
query13	0.60	0.60	0.60
query14	2.83	2.86	2.75
query15	0.90	0.82	0.84
query16	0.38	0.37	0.40
query17	1.02	1.04	0.98
query18	0.22	0.22	0.21
query19	1.90	1.76	2.00
query20	0.01	0.01	0.01
query21	15.36	0.95	0.60
query22	0.77	0.79	0.90
query23	15.04	1.40	0.57
query24	2.84	1.40	2.24
query25	0.21	0.10	0.13
query26	0.18	0.15	0.14
query27	0.05	0.05	0.04
query28	14.63	1.47	1.04
query29	12.58	4.02	3.30
query30	0.25	0.10	0.06
query31	2.82	0.60	0.38
query32	3.23	0.52	0.45
query33	3.09	3.18	3.23
query34	16.83	5.06	4.46
query35	4.45	4.42	4.43
query36	0.63	0.51	0.48
query37	0.10	0.06	0.06
query38	0.04	0.04	0.03
query39	0.03	0.02	0.02
query40	0.16	0.14	0.12
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.37 s
Total hot run time: 31.79 s
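
The ClickBench rows have only three timing columns, so the hot total has to be derived. A minimal sketch follows, assuming the columns are one cold run followed by two hot runs (in seconds) and that the hot total takes the faster hot run per query. The function name `clickbench_totals` is hypothetical, and `Decimal` is used only to avoid floating-point rounding noise.

```python
from decimal import Decimal

# Rows copied from the ClickBench comment above (commit e14a130).
# Assumed columns: query, cold run, hot run 1, hot run 2 (seconds).
CLICKBENCH = """
query1 0.03 0.03 0.06
query2 0.07 0.04 0.03
query3 0.24 0.07 0.07
query4 1.61 0.11 0.10
query5 0.43 0.42 0.40
query6 1.17 0.65 0.64
query7 0.03 0.02 0.01
query8 0.04 0.03 0.03
query9 0.60 0.50 0.50
query10 0.56 0.56 0.56
query11 0.14 0.10 0.10
query12 0.14 0.11 0.12
query13 0.60 0.60 0.60
query14 2.83 2.86 2.75
query15 0.90 0.82 0.84
query16 0.38 0.37 0.40
query17 1.02 1.04 0.98
query18 0.22 0.22 0.21
query19 1.90 1.76 2.00
query20 0.01 0.01 0.01
query21 15.36 0.95 0.60
query22 0.77 0.79 0.90
query23 15.04 1.40 0.57
query24 2.84 1.40 2.24
query25 0.21 0.10 0.13
query26 0.18 0.15 0.14
query27 0.05 0.05 0.04
query28 14.63 1.47 1.04
query29 12.58 4.02 3.30
query30 0.25 0.10 0.06
query31 2.82 0.60 0.38
query32 3.23 0.52 0.45
query33 3.09 3.18 3.23
query34 16.83 5.06 4.46
query35 4.45 4.42 4.43
query36 0.63 0.51 0.48
query37 0.10 0.06 0.06
query38 0.04 0.04 0.03
query39 0.03 0.02 0.02
query40 0.16 0.14 0.12
query41 0.08 0.03 0.02
query42 0.04 0.02 0.02
query43 0.04 0.03 0.03
"""


def clickbench_totals(text):
    """Return (cold total, hot total) as exact decimals."""
    cold = Decimal("0")
    hot = Decimal("0")
    for line in text.strip().splitlines():
        _, c, h1, h2 = line.split()
        cold += Decimal(c)
        hot += min(Decimal(h1), Decimal(h2))  # faster of the two hot runs
    return cold, hot


cb_cold, cb_hot = clickbench_totals(CLICKBENCH)
print(f"Total cold run time: {cb_cold} s")
print(f"Total hot run time: {cb_hot} s")
```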

@sollhui sollhui force-pushed the improve_rl_observability branch from e14a130 to 39089e6 January 2, 2025 04:56
@sollhui
Contributor Author

sollhui commented Jan 2, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32931 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 39089e68f95f5949f483d0fb1644933bb937c40d, data reload: false

------ Round 1 ----------------------------------
q1	17580	6255	6137	6137
q2	2052	317	178	178
q3	10404	1257	751	751
q4	10209	873	446	446
q5	7535	2220	1993	1993
q6	210	180	144	144
q7	898	752	620	620
q8	9242	1404	1238	1238
q9	5220	4961	4964	4961
q10	6720	2332	1858	1858
q11	493	286	253	253
q12	341	354	221	221
q13	17760	3581	2920	2920
q14	260	231	212	212
q15	571	508	493	493
q16	616	624	598	598
q17	581	856	326	326
q18	7056	6539	6537	6537
q19	2676	974	566	566
q20	298	315	182	182
q21	2932	2510	1989	1989
q22	365	343	308	308
Total cold run time: 104019 ms
Total hot run time: 32931 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6414	6261	6492	6261
q2	236	323	227	227
q3	2226	2639	2347	2347
q4	1396	1840	1365	1365
q5	4327	4769	4932	4769
q6	188	176	143	143
q7	2060	2038	1857	1857
q8	2645	2818	2717	2717
q9	7349	7341	7341	7341
q10	3073	3356	2762	2762
q11	582	499	494	494
q12	651	815	611	611
q13	3397	3895	3178	3178
q14	288	304	278	278
q15	566	512	495	495
q16	645	728	660	660
q17	1238	1739	1259	1259
q18	7734	7523	7422	7422
q19	863	1213	1097	1097
q20	2000	2008	1884	1884
q21	5837	5325	4963	4963
q22	620	636	576	576
Total cold run time: 54335 ms
Total hot run time: 52706 ms

@doris-robot

TPC-DS: Total hot run time: 196550 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 39089e68f95f5949f483d0fb1644933bb937c40d, data reload: false

query1	1308	991	926	926
query2	6483	2299	2440	2299
query3	11108	4861	4662	4662
query4	32939	23724	23559	23559
query5	3875	595	471	471
query6	262	190	176	176
query7	3999	493	296	296
query8	292	238	229	229
query9	9504	2729	2717	2717
query10	439	324	253	253
query11	17865	15637	15156	15156
query12	164	107	102	102
query13	1582	540	416	416
query14	11253	7403	7511	7403
query15	266	208	186	186
query16	7568	600	453	453
query17	1514	778	584	584
query18	1685	395	326	326
query19	192	183	158	158
query20	131	116	118	116
query21	200	144	112	112
query22	4718	4634	4414	4414
query23	34849	33774	33377	33377
query24	6400	2319	2282	2282
query25	477	468	399	399
query26	772	281	148	148
query27	2012	478	333	333
query28	5990	2485	2470	2470
query29	590	544	439	439
query30	207	181	155	155
query31	979	957	842	842
query32	86	66	61	61
query33	468	344	300	300
query34	772	867	528	528
query35	816	820	759	759
query36	1048	1080	990	990
query37	119	95	75	75
query38	4189	4206	4460	4206
query39	1504	1478	1496	1478
query40	207	118	105	105
query41	44	42	43	42
query42	118	102	103	102
query43	530	534	500	500
query44	1334	816	825	816
query45	185	175	171	171
query46	889	1041	662	662
query47	2026	2023	1966	1966
query48	390	413	322	322
query49	716	475	392	392
query50	652	669	407	407
query51	7365	7198	7265	7198
query52	107	102	97	97
query53	244	259	183	183
query54	491	509	412	412
query55	84	77	84	77
query56	253	260	247	247
query57	1229	1240	1186	1186
query58	245	226	232	226
query59	3145	3301	3234	3234
query60	286	290	252	252
query61	114	103	119	103
query62	865	826	762	762
query63	234	193	190	190
query64	3496	1053	758	758
query65	3386	3323	3219	3219
query66	821	418	340	340
query67	16658	15758	15554	15554
query68	8433	693	508	508
query69	471	287	246	246
query70	1193	1108	1138	1108
query71	424	282	245	245
query72	6499	3873	3848	3848
query73	660	747	358	358
query74	10371	9146	8794	8794
query75	4008	3129	2642	2642
query76	3636	1174	775	775
query77	751	374	270	270
query78	10073	9987	9344	9344
query79	3247	821	590	590
query80	655	500	429	429
query81	477	260	227	227
query82	620	149	124	124
query83	196	162	142	142
query84	272	82	68	68
query85	769	377	308	308
query86	352	304	299	299
query87	4423	4463	4403	4403
query88	4325	2170	2165	2165
query89	427	337	294	294
query90	1871	183	187	183
query91	126	131	109	109
query92	61	52	51	51
query93	1820	869	530	530
query94	649	380	291	291
query95	334	264	246	246
query96	564	596	274	274
query97	2742	2824	2636	2636
query98	230	204	199	199
query99	1662	1589	1465	1465
Total cold run time: 297080 ms
Total hot run time: 196550 ms

@doris-robot

ClickBench: Total hot run time: 30.83 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 39089e68f95f5949f483d0fb1644933bb937c40d, data reload: false

query1	0.03	0.04	0.03
query2	0.08	0.04	0.03
query3	0.25	0.07	0.06
query4	1.60	0.11	0.11
query5	0.44	0.43	0.41
query6	1.16	0.65	0.66
query7	0.02	0.01	0.02
query8	0.04	0.03	0.03
query9	0.59	0.50	0.50
query10	0.55	0.56	0.54
query11	0.14	0.10	0.10
query12	0.14	0.11	0.11
query13	0.60	0.59	0.61
query14	2.71	2.75	2.83
query15	0.89	0.83	0.82
query16	0.39	0.37	0.39
query17	1.06	1.02	1.05
query18	0.23	0.20	0.20
query19	1.96	1.72	1.94
query20	0.02	0.00	0.02
query21	15.36	0.89	0.57
query22	0.75	0.84	0.70
query23	15.22	1.41	0.56
query24	3.36	2.01	0.64
query25	0.24	0.15	0.10
query26	0.19	0.14	0.14
query27	0.07	0.05	0.05
query28	14.02	1.49	1.05
query29	12.59	3.86	3.24
query30	0.25	0.09	0.06
query31	2.83	0.57	0.37
query32	3.23	0.54	0.46
query33	3.15	3.20	3.02
query34	16.71	5.10	4.52
query35	4.50	4.49	4.54
query36	0.67	0.50	0.49
query37	0.10	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.03	0.03
query40	0.17	0.14	0.13
query41	0.08	0.03	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.54 s
Total hot run time: 30.83 s

@sollhui sollhui force-pushed the improve_rl_observability branch from 39089e6 to 56b84ba January 2, 2025 14:06
@sollhui
Contributor Author

sollhui commented Jan 2, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32637 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 56b84ba2773a791cfe0fd35bf664e4fc045baabd, data reload: false

------ Round 1 ----------------------------------
q1	17578	6079	6013	6013
q2	2044	294	173	173
q3	10456	1212	769	769
q4	10300	858	428	428
q5	9186	2205	1968	1968
q6	208	183	145	145
q7	888	742	597	597
q8	9735	1409	1244	1244
q9	5321	4902	4989	4902
q10	6767	2309	1881	1881
q11	459	280	258	258
q12	345	365	222	222
q13	17760	3654	3024	3024
q14	256	237	212	212
q15	554	510	499	499
q16	643	604	580	580
q17	575	866	350	350
q18	6766	6332	6313	6313
q19	1954	976	563	563
q20	312	317	195	195
q21	2823	2335	1996	1996
q22	363	339	305	305
Total cold run time: 105293 ms
Total hot run time: 32637 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6269	6209	6214	6209
q2	233	326	227	227
q3	2217	2667	2297	2297
q4	1411	1835	1340	1340
q5	4385	4743	4883	4743
q6	185	177	138	138
q7	2111	1963	1813	1813
q8	2658	2841	2688	2688
q9	7362	7210	7324	7210
q10	3044	3346	2798	2798
q11	589	510	489	489
q12	657	801	644	644
q13	3536	3900	3195	3195
q14	304	302	309	302
q15	551	529	489	489
q16	637	690	635	635
q17	1219	1721	1253	1253
q18	7711	7619	7366	7366
q19	866	1121	1146	1121
q20	1959	2028	1890	1890
q21	5560	5011	4899	4899
q22	627	629	610	610
Total cold run time: 54091 ms
Total hot run time: 52356 ms

@doris-robot

TPC-DS: Total hot run time: 197974 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 56b84ba2773a791cfe0fd35bf664e4fc045baabd, data reload: false

query1	1294	960	925	925
query2	6496	2466	2336	2336
query3	10997	4739	4717	4717
query4	33047	23733	23657	23657
query5	4218	611	454	454
query6	282	211	212	211
query7	3988	495	307	307
query8	284	235	257	235
query9	9339	2625	2629	2625
query10	430	301	242	242
query11	17959	15505	15269	15269
query12	146	109	107	107
query13	1570	551	403	403
query14	9389	7305	7568	7305
query15	223	220	199	199
query16	8114	603	488	488
query17	1586	812	601	601
query18	2137	427	359	359
query19	209	205	176	176
query20	119	118	113	113
query21	209	140	115	115
query22	4564	4547	4443	4443
query23	34463	34619	33881	33881
query24	6865	2305	2340	2305
query25	527	459	392	392
query26	740	280	161	161
query27	2294	485	340	340
query28	5148	2466	2481	2466
query29	553	550	431	431
query30	210	195	160	160
query31	993	941	875	875
query32	84	61	56	56
query33	500	357	315	315
query34	760	893	503	503
query35	838	859	738	738
query36	1025	1054	955	955
query37	117	104	69	69
query38	4207	4425	4245	4245
query39	1524	1492	1481	1481
query40	205	114	103	103
query41	46	41	40	40
query42	124	109	103	103
query43	568	560	522	522
query44	1374	832	830	830
query45	192	174	167	167
query46	896	1076	675	675
query47	2035	1990	1954	1954
query48	383	439	326	326
query49	717	501	405	405
query50	679	679	391	391
query51	7196	7314	7202	7202
query52	104	103	96	96
query53	239	256	188	188
query54	499	521	422	422
query55	84	77	81	77
query56	261	255	257	255
query57	1247	1268	1170	1170
query58	233	231	227	227
query59	3315	3402	3205	3205
query60	286	274	251	251
query61	115	109	104	104
query62	900	833	764	764
query63	243	194	200	194
query64	3640	1019	690	690
query65	3336	3280	3250	3250
query66	1001	438	311	311
query67	16441	15823	15562	15562
query68	9114	701	510	510
query69	483	294	258	258
query70	1188	1090	1071	1071
query71	432	291	268	268
query72	6308	3796	3889	3796
query73	657	752	361	361
query74	10475	9140	8976	8976
query75	3976	3165	2651	2651
query76	3623	1191	784	784
query77	775	372	283	283
query78	10267	10122	9457	9457
query79	3175	809	593	593
query80	648	508	428	428
query81	490	269	228	228
query82	683	157	130	130
query83	164	159	149	149
query84	246	92	72	72
query85	798	358	377	358
query86	396	321	278	278
query87	4497	4494	4636	4494
query88	4629	2171	2167	2167
query89	420	335	297	297
query90	1744	185	186	185
query91	131	131	106	106
query92	65	56	53	53
query93	1757	867	538	538
query94	650	380	278	278
query95	331	264	246	246
query96	493	607	273	273
query97	2890	3008	2834	2834
query98	219	199	192	192
query99	1714	1547	1516	1516
Total cold run time: 297245 ms
Total hot run time: 197974 ms

@doris-robot

ClickBench: Total hot run time: 31.75 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 56b84ba2773a791cfe0fd35bf664e4fc045baabd, data reload: false

query1	0.03	0.03	0.03
query2	0.07	0.03	0.04
query3	0.24	0.07	0.07
query4	1.63	0.10	0.11
query5	0.39	0.43	0.42
query6	1.14	0.64	0.64
query7	0.02	0.01	0.02
query8	0.04	0.04	0.03
query9	0.59	0.50	0.51
query10	0.55	0.57	0.54
query11	0.14	0.09	0.10
query12	0.15	0.11	0.11
query13	0.62	0.60	0.60
query14	2.72	2.86	2.84
query15	0.89	0.82	0.83
query16	0.39	0.38	0.39
query17	1.05	1.07	1.03
query18	0.22	0.21	0.20
query19	1.92	1.86	2.00
query20	0.01	0.01	0.00
query21	15.38	0.94	0.61
query22	0.75	0.76	0.61
query23	15.40	1.40	0.64
query24	2.60	1.17	1.26
query25	0.19	0.08	0.12
query26	0.28	0.16	0.13
query27	0.07	0.04	0.05
query28	13.73	1.54	1.05
query29	12.61	4.01	3.31
query30	0.25	0.09	0.07
query31	2.83	0.60	0.38
query32	3.24	0.53	0.45
query33	3.15	3.08	3.11
query34	16.79	5.17	4.60
query35	4.54	4.47	4.50
query36	0.84	0.47	0.47
query37	0.09	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.03	0.03
query40	0.17	0.12	0.11
query41	0.09	0.02	0.02
query42	0.04	0.02	0.03
query43	0.03	0.02	0.03
Total cold run time: 105.96 s
Total hot run time: 31.75 s
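
With three benchmark runs posted on this PR (commits e14a130, 39089e6 and 56b84ba), the total hot run times can be compared side by side. The sketch below only restates the totals reported in the comments above and computes how far apart they are. The names `HOT_TOTALS` and `relative_spread` are made up for this example, and the spread is a rough indicator rather than a statistical test.

```python
# Total hot run times as reported by the bot for each pushed commit.
HOT_TOTALS = {
    "TPC-H (ms)": {"e14a130": 32703, "39089e6": 32931, "56b84ba": 32637},
    "TPC-DS (ms)": {"e14a130": 197047, "39089e6": 196550, "56b84ba": 197974},
    "ClickBench (s)": {"e14a130": 31.79, "39089e6": 30.83, "56b84ba": 31.75},
}


def relative_spread(values):
    """Return (max - min) / min for a collection of totals."""
    lowest = min(values)
    highest = max(values)
    return (highest - lowest) / lowest


for suite, by_commit in HOT_TOTALS.items():
    spread = relative_spread(by_commit.values())
    print(f"{suite}: spread across commits = {spread:.2%}")
```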

Contributor

@liaoxin01 liaoxin01 left a comment

LGTM

@github-actions github-actions bot added the approved (Indicates a PR has been approved by one committer.) label Jan 7, 2025
Contributor

github-actions bot commented Jan 7, 2025

PR approved by at least one committer and no changes requested.

Contributor

github-actions bot commented Jan 7, 2025

PR approved by anyone and no changes requested.

@liaoxin01 liaoxin01 added dev/2.1.x and removed the approved (Indicates a PR has been approved by one committer.) and reviewed labels Jan 7, 2025
Contributor

@dataroaring dataroaring left a comment

LGTM

@dataroaring dataroaring added the usercase (Important user case type label) label Jan 7, 2025
@github-actions github-actions bot added the approved (Indicates a PR has been approved by one committer.) label Jan 7, 2025
Contributor

github-actions bot commented Jan 7, 2025

PR approved by at least one committer and no changes requested.

Contributor

github-actions bot commented Jan 7, 2025

PR approved by anyone and no changes requested.

@dataroaring dataroaring merged commit d7b28f5 into apache:master Jan 7, 2025
27 of 29 checks passed
github-actions bot pushed a commit that referenced this pull request Jan 7, 2025
1. **Reset other msg within a stream window**
A routine load job is scheduled continuously, so errors from earlier in
the run do not need to be displayed indefinitely.

2. **Show error info when the transaction of a subtask fails**
A failed subtask retries continuously, and some errors can keep the job
from scheduling and consuming data properly, such as a recurring "too
many segments" error (code: -235). Such errors need to be displayed
promptly so that the user is aware of them.

3. **Set the pause reason as other msg when rescheduling the job**
The job manager has an auto-resume mechanism for jobs that are paused
unexpectedly. In some scenarios, such as a job that cannot connect to
Kafka and is auto-resumed after the pause to retry, users may not notice
the problem for a long time. An unexpected pause always indicates a
problem, so the reason for the error needs to be displayed even if auto
resume occurs.
github-actions bot pushed a commit that referenced this pull request Jan 7, 2025
yiguolei pushed a commit that referenced this pull request Jan 8, 2025
yiguolei pushed a commit that referenced this pull request Jan 8, 2025
yiguolei pushed a commit that referenced this pull request Jan 10, 2025
dataroaring pushed a commit that referenced this pull request Jan 13, 2025
Labels
approved (Indicates a PR has been approved by one committer.), dev/2.1.8-merged, dev/3.0.4-merged, reviewed, usercase (Important user case type label)
7 participants