log.txt
INFO 09-18 07:16:52 llm_engine.py:170] Initializing an LLM engine (v0.5.0) with config: model='facebook/opt-6.7b', speculative_config=None, tokenizer='facebook/opt-6.7b', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=facebook/opt-6.7b)
(VllmWorkerProcess pid=898982) INFO 09-18 07:16:54 multiproc_worker_utils.py:216] Worker ready; awaiting tasks
(VllmWorkerProcess pid=898983) INFO 09-18 07:16:54 multiproc_worker_utils.py:216] Worker ready; awaiting tasks
(VllmWorkerProcess pid=898984) INFO 09-18 07:16:54 multiproc_worker_utils.py:216] Worker ready; awaiting tasks
INFO 09-18 07:16:55 utils.py:623] Found nccl from library libnccl.so.2
INFO 09-18 07:16:55 pynccl.py:65] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=898982) INFO 09-18 07:16:55 utils.py:623] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=898982) INFO 09-18 07:16:55 pynccl.py:65] vLLM is using nccl==2.20.5
INFO 09-18 07:16:55 custom_all_reduce_utils.py:180] reading GPU P2P access cache from /home/lrq/.config/vllm/gpu_p2p_access_cache_for_0,1,2,3.json
(VllmWorkerProcess pid=898982) INFO 09-18 07:16:55 custom_all_reduce_utils.py:180] reading GPU P2P access cache from /home/lrq/.config/vllm/gpu_p2p_access_cache_for_0,1,2,3.json
(VllmWorkerProcess pid=898982) INFO 09-18 07:16:55 utils.py:623] Found nccl from library libnccl.so.2
INFO 09-18 07:16:55 utils.py:623] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=898982) INFO 09-18 07:16:55 pynccl.py:65] vLLM is using nccl==2.20.5
INFO 09-18 07:16:55 pynccl.py:65] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=898984) INFO 09-18 07:16:55 utils.py:623] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=898983) INFO 09-18 07:16:55 utils.py:623] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=898984) INFO 09-18 07:16:55 pynccl.py:65] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=898983) INFO 09-18 07:16:55 pynccl.py:65] vLLM is using nccl==2.20.5
WARNING 09-18 07:16:55 custom_all_reduce.py:170] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=898982) WARNING 09-18 07:16:55 custom_all_reduce.py:170] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=898984) WARNING 09-18 07:16:55 custom_all_reduce.py:170] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(VllmWorkerProcess pid=898983) WARNING 09-18 07:16:55 custom_all_reduce.py:170] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.
INFO 09-18 07:16:56 weight_utils.py:218] Using model weights format ['*.bin']
INFO 09-18 07:17:03 model_runner.py:212] Loading model weights took 12.4036 GB
INFO 09-18 07:17:04 distributed_gpu_executor.py:59] # GPU blocks: 964, # CPU blocks: 512
INFO 09-18 07:17:06 llm_engine.py:759] ---------Start 0'th liquid: Scale out from GPU0 to GPU1---------
INFO 09-18 07:17:06 multiproc_gpu_executor.py:258] Start to do liquid from src: 0 to dst: 1 with shard_ids: [2, 3]
INFO 09-18 07:17:06 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.508 GB, free space: 2.129GB, frag space: 0.010GB, [(898899, 21.529296875)]
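Most memory lines in this log carry the same four figures per GPU: allocated, reserved, free and frag. Throughout the log, `frag space` equals `reserved space - allocated space` to within the rounding of the three printed decimals, so it appears to measure memory the PyTorch caching allocator holds but has not handed to live tensors (compare `torch.cuda.memory_reserved()` with `torch.cuda.memory_allocated()`). That reading is inferred from the numbers, not from the fork's source. A minimal parser sketch, with a regular expression written against this log's wording (`MEM_RE`, `parse_mem_groups` and `frag_matches` are illustrative names, not vLLM APIs), can check the relation line by line:

```python
import re

# One "allocated / reserved / free / frag" group as worded in this log (not a vLLM API).
MEM_RE = re.compile(
    r"allocated space on GPU (?P<gpu>\d+): (?P<allocated>[\d.]+) ?GB, "
    r"reserved space on GPU (?P=gpu): (?P<reserved>[\d.]+) ?GB, "
    r"free space: (?P<free>[\d.]+) ?GB, "
    r"frag space: (?P<frag>[\d.]+) ?GB"
)


def parse_mem_groups(line):
    """Return one dict per GPU memory group found in a log line (values in GB)."""
    return [
        {
            "gpu": int(m["gpu"]),
            "allocated": float(m["allocated"]),
            "reserved": float(m["reserved"]),
            "free": float(m["free"]),
            "frag": float(m["frag"]),
        }
        for m in MEM_RE.finditer(line)
    ]


def frag_matches(group, tol=0.002):
    """True if frag ~= reserved - allocated, allowing for 3-decimal rounding."""
    return abs((group["reserved"] - group["allocated"]) - group["frag"]) <= tol


line = (
    "After liquid model weights, allocated space on GPU 0: 14.304 GB, "
    "reserved space on GPU 0: 14.379 GB, free space: 8.257GB, frag space: 0.075GB, "
    "allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, "
    "free space: 9.483GB, frag space: 0.000GB"
)
for g in parse_mem_groups(line):
    print(g["gpu"], g["frag"], frag_matches(g))
```

The tolerance of 0.002 GB covers the case on the line above, where 20.508 - 20.497 = 0.011 but the log prints 0.010.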
It takes: 0.28s to send model shards
INFO 09-18 07:17:07 worker.py:152] send weights shards takes: 0.33s, sent out: 6.21GB, sent bw: 18.90GB/s
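The `sent bw` figures are consistent with `sent out` divided by the elapsed time, computed before either value was rounded for printing: 6.21 GB / 0.33 s is 18.82 GB/s, while the log prints 18.90 GB/s, which fits an unrounded time of roughly 0.329 s. That formula is inferred from the numbers, not read from the fork's `worker.py`. The sketch below (the helper `bw_bounds` is hypothetical) checks that a reported bandwidth falls inside the range allowed by two-decimal rounding of both inputs:

```python
def bw_bounds(sent_gb, secs, decimals=2):
    """Range of sent_gb / secs consistent with both inputs having been rounded."""
    half = 0.5 * 10 ** -decimals
    lo = (sent_gb - half) / (secs + half)
    hi = (sent_gb + half) / (secs - half)
    return lo, hi


# (sent out GB, elapsed s, reported bw GB/s) copied from "sent bw" lines in this log
samples = [(6.21, 0.33, 18.90), (3.77, 0.11, 34.05), (3.11, 0.40, 7.87), (6.27, 0.42, 14.75)]
for sent, secs, reported in samples:
    lo, hi = bw_bounds(sent, secs)
    print(f"{sent} GB / {secs} s: naive {sent / secs:.2f} GB/s, "
          f"reported {reported} in [{lo:.2f}, {hi:.2f}]: {lo <= reported <= hi}")
```

All four samples, taken from both weight and KV-cache sends, land inside their bounds, even though the naive quotient does not reproduce the printed value exactly.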
INFO 09-18 07:17:07 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 14.304 GB, reserved space on GPU 0: 14.379 GB, free space: 8.257GB, frag space: 0.075GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 9.483GB, frag space: 0.000GB
(VllmWorkerProcess pid=898982) It takes: 0.05s to init model weights
(VllmWorkerProcess pid=898982) It takes: 0.22s to recv shards
(VllmWorkerProcess pid=898982) It takes 0.00 to load shards
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:07 worker.py:215] Before recv kvc shards, allocated space on GPU 1: 10.499 GB, reserved space on GPU 1: 16.660 GB, free space: 5.790GB, frag space: 6.161GB
INFO 09-18 07:17:07 cache_engine.py:159] send kvc shards takes: 0.11s, sent out: 3.77GB, sent bw: 34.05GB/s
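The KV-cache transfer sizes can be reproduced from the block counts under a few stated assumptions. These are OPT-6.7B's published shape (32 decoder layers, hidden size 4096), float16 as given in the engine config line, vLLM's default block size of 16 tokens (the log does not print it), and four equal head shards (shard ids 0 to 3). Under those assumptions one full block takes 8 MiB. Moving shards [2, 3] of 964 blocks is then 3.77 GiB, matching the 3.77GB sent here. The same arithmetic gives 6.60 GiB for one shard of 3381 blocks and 6.27 GiB for one shard of 3208 blocks, which are the sizes sent in the later scale-outs. This also suggests the log's "GB" means GiB (2**30 bytes). The constants and helpers below are illustrative, not taken from the fork:

```python
# Assumed model shape for facebook/opt-6.7b (not printed in this log).
NUM_LAYERS = 32
HIDDEN_SIZE = 4096
DTYPE_BYTES = 2   # float16, from the engine config line
BLOCK_SIZE = 16   # assumed vLLM default tokens per KV block; not shown in the log
NUM_SHARDS = 4    # shard ids 0..3 appear in the log


def kv_bytes_per_block():
    """Bytes for one KV block over all layers: K and V for BLOCK_SIZE tokens."""
    return 2 * NUM_LAYERS * HIDDEN_SIZE * BLOCK_SIZE * DTYPE_BYTES


def kv_shard_gib(num_blocks, shards_moved):
    """GiB moved when `shards_moved` of NUM_SHARDS equal head shards are transferred."""
    total = num_blocks * kv_bytes_per_block()
    return total * shards_moved / NUM_SHARDS / 2**30


print(kv_bytes_per_block() / 2**20, "MiB per block")
for blocks, moved, logged in [(964, 2, 3.77), (3381, 1, 6.60), (3208, 1, 6.27)]:
    print(f"{blocks} blocks, {moved} shard(s): {kv_shard_gib(blocks, moved):.2f} GiB (log: {logged}GB)")
```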
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:07 worker.py:217] After recv kvc shards, allocated space on GPU 1: 14.265 GB, reserved space on GPU 1: 20.473 GB, free space: 1.977GB, frag space: 6.208GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:07 worker.py:223] After appending kvc shards, allocated space on GPU 1: 10.499 GB, reserved space on GPU 1: 20.473 GB, free space: 1.977GB, frag space: 9.974GB
INFO 09-18 07:17:07 cache_engine.py:165] Successfully send kv cache shards: [2, 3] to rank: 1
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:07 worker.py:228] After appending kvc shards, allocated space on GPU 1: 10.499 GB, reserved space on GPU 1: 10.641 GB, free space: 11.809GB, frag space: 0.142GB
INFO 09-18 07:17:07 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 10.629 GB, free space: 12.007GB, frag space: 0.122GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.809GB, frag space: 0.000GB
2417
INFO 09-18 07:17:07 multiproc_gpu_executor.py:171] After scale out, num_gpu_blocks: #3381
INFO 09-18 07:17:08 worker.py:202] extend gpu in worker takes: 0.28s
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:08 worker.py:202] extend gpu in worker takes: 0.29s
INFO 09-18 07:17:08 llm_engine.py:773] Finished liquid for 0 times, output: Completed! Move shard: [2, 3] from [0] to [1];liquid e2e latency: 1.37s, update worker latency: 0.00s, liquid model weights latency: 0.36s, init mem latency: 0.27s, liquid kvc latency: 0.44s, extending gpu blocks latency: 0.29s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 19.948 GB, reserved space on GPU 0: 20.066 GB, free space: 2.570GB, frag space: 0.118GB, current gpu block: #3381
INFO 09-18 07:17:08 llm_engine.py:759] ---------Start 1'th liquid: Scale out from GPU[0,1] to GPU[2,3]---------
INFO 09-18 07:17:08 multiproc_gpu_executor.py:258] Start to do liquid from src: 0 to dst: 2 with shard_ids: [1]
INFO 09-18 07:17:08 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 19.948 GB, reserved space on GPU 0: 20.068 GB, free space: 2.451GB, frag space: 0.120GB, [(898899, 21.20703125)]
It takes: 0.37s to send model shards
INFO 09-18 07:17:08 worker.py:152] send weights shards takes: 0.40s, sent out: 3.11GB, sent bw: 7.87GB/s
INFO 09-18 07:17:08 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 16.852 GB, reserved space on GPU 0: 17.346 GB, free space: 5.115GB, frag space: 0.494GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.346GB, frag space: 0.000GB
(VllmWorkerProcess pid=898983) It takes: 0.08s to init model weights
(VllmWorkerProcess pid=898983) It takes: 0.28s to recv shards
(VllmWorkerProcess pid=898983) It takes 0.00 to load shards
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:08 worker.py:215] Before recv kvc shards, allocated space on GPU 2: 10.238 GB, reserved space on GPU 2: 13.672 GB, free space: 9.115GB, frag space: 3.433GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:09 worker.py:217] After recv kvc shards, allocated space on GPU 2: 16.863 GB, reserved space on GPU 2: 20.297 GB, free space: 2.490GB, frag space: 3.433GB
INFO 09-18 07:17:09 cache_engine.py:159] send kvc shards takes: 0.35s, sent out: 6.60GB, sent bw: 18.77GB/s
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:09 worker.py:223] After appending kvc shards, allocated space on GPU 2: 10.238 GB, reserved space on GPU 2: 20.297 GB, free space: 2.490GB, frag space: 10.058GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:09 worker.py:228] After appending kvc shards, allocated space on GPU 2: 10.238 GB, reserved space on GPU 2: 10.305 GB, free space: 12.482GB, frag space: 0.066GB
INFO 09-18 07:17:09 cache_engine.py:165] Successfully send kv cache shards: [1] to rank: 2
INFO 09-18 07:17:09 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 10.270 GB, reserved space on GPU 0: 10.721 GB, free space: 11.740GB, frag space: 0.451GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.346GB, frag space: 0.000GB
INFO 09-18 07:17:09 multiproc_gpu_executor.py:258] Start to do liquid from src: 1 to dst: 3 with shard_ids: [3]
INFO 09-18 07:17:09 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 10.270 GB, reserved space on GPU 0: 10.721 GB, free space: 11.740GB, frag space: 0.451GB, [(898899, 11.91796875)]
(VllmWorkerProcess pid=898982) It takes: 0.32s to send model shards
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:09 worker.py:152] send weights shards takes: 0.35s, sent out: 3.11GB, sent bw: 8.98GB/s
INFO 09-18 07:17:09 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 10.270 GB, reserved space on GPU 0: 10.721 GB, free space: 11.740GB, frag space: 0.451GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 5.352GB, frag space: 0.000GB
(VllmWorkerProcess pid=898984) It takes: 0.05s to init model weights
(VllmWorkerProcess pid=898984) It takes: 0.26s to recv shards
(VllmWorkerProcess pid=898984) It takes 0.00 to load shards
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:09 worker.py:215] Before recv kvc shards, allocated space on GPU 3: 10.238 GB, reserved space on GPU 3: 13.672 GB, free space: 9.115GB, frag space: 3.433GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:10 cache_engine.py:159] send kvc shards takes: 0.35s, sent out: 6.60GB, sent bw: 18.77GB/s
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:10 worker.py:217] After recv kvc shards, allocated space on GPU 3: 16.863 GB, reserved space on GPU 3: 20.297 GB, free space: 2.490GB, frag space: 3.433GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:10 worker.py:223] After appending kvc shards, allocated space on GPU 3: 10.238 GB, reserved space on GPU 3: 20.297 GB, free space: 2.490GB, frag space: 10.058GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:10 worker.py:228] After appending kvc shards, allocated space on GPU 3: 10.238 GB, reserved space on GPU 3: 10.305 GB, free space: 12.482GB, frag space: 0.066GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:10 cache_engine.py:165] Successfully send kv cache shards: [3] to rank: 3
INFO 09-18 07:17:10 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 10.270 GB, reserved space on GPU 0: 10.721 GB, free space: 11.740GB, frag space: 0.451GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.977GB, frag space: 0.000GB
INFO 09-18 07:17:10 multiproc_gpu_executor.py:189] After scale out, num_gpu_blocks: #8179
INFO 09-18 07:17:10 worker.py:202] extend gpu in worker takes: 0.41s
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:10 worker.py:202] extend gpu in worker takes: 0.42s
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:10 worker.py:202] extend gpu in worker takes: 0.42s
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:10 worker.py:202] extend gpu in worker takes: 0.42s
INFO 09-18 07:17:10 llm_engine.py:773] Finished liquid for 1 times, output: Completed! Move shard: [1, 3] from [0, 1] to [2, 3];liquid e2e latency: 2.32s, update worker latency: 0.00s, liquid model weights latency: 1.31s, init mem latency: 0.00s, liquid kvc latency: 0.57s, extending gpu blocks latency: 0.43s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 19.645 GB, reserved space on GPU 0: 20.096 GB, free space: 2.365GB, frag space: 0.451GB, current gpu block: #8179
INFO 09-18 07:17:11 llm_engine.py:759] ---------Start 2'th liquid: Scale in from GPU[2,3] to GPU[0,1]---------
INFO 09-18 07:17:11 multiproc_gpu_executor.py:242] Shrink to: #3381, currently using blocks: #1
INFO 09-18 07:17:11 multiproc_gpu_executor.py:243] Before move and shrink: allocated space on GPU 0: 19.645 GB, reserved space on GPU 0: 20.098 GB, free space: 2.078GB, frag space: 0.453GB
INFO 09-18 07:17:11 multiproc_gpu_executor.py:245] After move and shrink: allocated space on GPU 0: 10.270 GB, reserved space on GPU 0: 10.721 GB, free space: 11.455GB, frag space: 0.451GB
INFO 09-18 07:17:11 multiproc_gpu_executor.py:258] Start to do liquid from src: 2 to dst: 0 with shard_ids: [1]
INFO 09-18 07:17:11 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 10.270 GB, reserved space on GPU 0: 10.721 GB, free space: 11.455GB, frag space: 0.451GB, [(898899, 12.203125)]
INFO 09-18 07:17:11 worker.py:165] Before appending weights shards, allocated space on GPU 0: 10.270 GB, reserved space on GPU 0: 10.721 GB, free space: 11.455GB, frag space: 0.451GB
INFO 09-18 07:17:11 worker.py:168] After recving weights shards, allocated space on GPU 0: 13.366 GB, reserved space on GPU 0: 13.592 GB, free space: 8.564GB, frag space: 0.226GB
INFO 09-18 07:17:11 worker.py:171] After appending weights shards, allocated space on GPU 0: 13.366 GB, reserved space on GPU 0: 16.316 GB, free space: 5.839GB, frag space: 2.950GB
(VllmWorkerProcess pid=898983) It takes: 0.34s to send model shards
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:11 worker.py:152] send weights shards takes: 0.37s, sent out: 3.10GB, sent bw: 8.42GB/s
INFO 09-18 07:17:11 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 13.366 GB, reserved space on GPU 0: 16.316 GB, free space: 5.839GB, frag space: 2.950GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.747GB, frag space: 0.000GB
INFO 09-18 07:17:11 worker.py:215] Before recv kvc shards, allocated space on GPU 0: 13.366 GB, reserved space on GPU 0: 16.316 GB, free space: 5.839GB, frag space: 2.950GB
INFO 09-18 07:17:12 worker.py:217] After recv kvc shards, allocated space on GPU 0: 19.991 GB, reserved space on GPU 0: 20.449 GB, free space: 1.707GB, frag space: 0.458GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:12 cache_engine.py:159] send kvc shards takes: 0.47s, sent out: 6.60GB, sent bw: 14.07GB/s
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:12 cache_engine.py:165] Successfully send kv cache shards: [1] to rank: 0
INFO 09-18 07:17:12 worker.py:223] After appending kvc shards, allocated space on GPU 0: 19.948 GB, reserved space on GPU 0: 22.105 GB, free space: 0.050GB, frag space: 2.157GB
INFO 09-18 07:17:12 worker.py:228] After appending kvc shards, allocated space on GPU 0: 19.948 GB, reserved space on GPU 0: 20.449 GB, free space: 1.707GB, frag space: 0.501GB
INFO 09-18 07:17:12 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 19.948 GB, reserved space on GPU 0: 20.449 GB, free space: 1.707GB, frag space: 0.501GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.747GB, frag space: 0.000GB
INFO 09-18 07:17:12 multiproc_gpu_executor.py:258] Start to do liquid from src: 3 to dst: 1 with shard_ids: [3]
INFO 09-18 07:17:12 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 19.948 GB, reserved space on GPU 0: 20.449 GB, free space: 1.707GB, frag space: 0.501GB, [(898899, 21.951171875)]
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:12 worker.py:165] Before appending weights shards, allocated space on GPU 1: 10.270 GB, reserved space on GPU 1: 10.314 GB, free space: 11.747GB, frag space: 0.045GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:13 worker.py:168] After recving weights shards, allocated space on GPU 1: 13.366 GB, reserved space on GPU 1: 13.623 GB, free space: 8.419GB, frag space: 0.257GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:13 worker.py:171] After appending weights shards, allocated space on GPU 1: 13.366 GB, reserved space on GPU 1: 16.551 GB, free space: 5.491GB, frag space: 3.185GB
(VllmWorkerProcess pid=898984) It takes: 0.24s to send model shards
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:13 worker.py:152] send weights shards takes: 0.26s, sent out: 3.10GB, sent bw: 11.95GB/s
INFO 09-18 07:17:13 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 19.948 GB, reserved space on GPU 0: 20.449 GB, free space: 1.707GB, frag space: 0.501GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 5.491GB, frag space: 0.000GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:13 worker.py:215] Before recv kvc shards, allocated space on GPU 1: 13.366 GB, reserved space on GPU 1: 16.551 GB, free space: 5.491GB, frag space: 3.185GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:13 worker.py:217] After recv kvc shards, allocated space on GPU 1: 19.991 GB, reserved space on GPU 1: 20.141 GB, free space: 1.901GB, frag space: 0.149GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:13 cache_engine.py:159] send kvc shards takes: 0.49s, sent out: 6.60GB, sent bw: 13.50GB/s
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:13 cache_engine.py:165] Successfully send kv cache shards: [3] to rank: 1
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:13 worker.py:223] After appending kvc shards, allocated space on GPU 1: 19.948 GB, reserved space on GPU 1: 21.797 GB, free space: 0.245GB, frag space: 1.849GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:13 worker.py:228] After appending kvc shards, allocated space on GPU 1: 19.948 GB, reserved space on GPU 1: 20.141 GB, free space: 1.901GB, frag space: 0.192GB
INFO 09-18 07:17:13 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 19.948 GB, reserved space on GPU 0: 20.449 GB, free space: 1.707GB, frag space: 0.501GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 1.901GB, frag space: 0.000GB
INFO 09-18 07:17:13 llm_engine.py:773] Finished liquid for 2 times, output: Completed! Move shard: [1, 3] from [2, 3] to [0, 1];liquid e2e latency: 2.83s, move and shrink latency: -1726640231.14s, update worker latency: 1726640231.14s, liquid model weights latency: 1.99s, init mem latency: 0.00s, liquid kvc latency: 0.83s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 19.948 GB, reserved space on GPU 0: 20.449 GB, free space: 1.707GB, frag space: 0.501GB, current gpu block: #3381
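The "Finished liquid" summaries list per-stage latencies, and the first scale-in (the 2'th liquid) contains two that cannot be elapsed times: `move and shrink latency: -1726640231.14s` and `update worker latency: 1726640231.14s`. They cancel exactly, and 1726640231 s after the Unix epoch is 2024-09-18 06:17:11 UTC, which matches the 07:17:11 start of this scale-in up to what is presumably a one-hour time-zone offset. That pattern suggests a timestamp left at 0 (or never set) on the scale-in path; this is an inference from the numbers, not from the fork's code. The next scale-in also reports a negative `update worker latency` (-0.11s). A small parser sketch (`LATENCY_RE`, `parse_latencies` and `suspicious` are hypothetical helpers) can flag such entries:

```python
import re

# Pairs like "liquid kvc latency: 0.83s" in the "Finished liquid" summary lines.
LATENCY_RE = re.compile(r"([A-Za-z0-9 ]+?) latency: (-?\d+(?:\.\d+)?)s")


def parse_latencies(line):
    """Map each latency name (e.g. 'liquid kvc') to its value in seconds."""
    return {name.strip(): float(value) for name, value in LATENCY_RE.findall(line)}


def suspicious(latencies):
    """Return stage latencies that are negative or exceed the end-to-end latency."""
    e2e = latencies.get("liquid e2e", float("inf"))
    return {k: v for k, v in latencies.items()
            if k != "liquid e2e" and (v < 0 or v > e2e)}


summary = (
    "Completed! Move shard: [1, 3] from [2, 3] to [0, 1];liquid e2e latency: 2.83s, "
    "move and shrink latency: -1726640231.14s, update worker latency: 1726640231.14s, "
    "liquid model weights latency: 1.99s, init mem latency: 0.00s, "
    "liquid kvc latency: 0.83s, update blocks latency: 0.00s;"
)
lat = parse_latencies(summary)
print(suspicious(lat))
print(round(lat["move and shrink"] + lat["update worker"], 2))
```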
INFO 09-18 07:17:14 llm_engine.py:759] ---------Start 3'th liquid: Scale in from GPU1 to GPU 0---------
INFO 09-18 07:17:14 multiproc_gpu_executor.py:216] Shrink to: #964, currently using blocks: #1
INFO 09-18 07:17:14 multiproc_gpu_executor.py:217] Before move and shrink: on GPU: allocated space on GPU 0: 19.948 GB, reserved space on GPU 0: 20.451 GB, free space: 1.705GB, frag space: 0.503GB
INFO 09-18 07:17:14 multiproc_gpu_executor.py:219] After move and shrink: allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 11.012 GB, free space: 11.144GB, frag space: 0.505GB
INFO 09-18 07:17:14 multiproc_gpu_executor.py:258] Start to do liquid from src: 1 to dst: 0 with shard_ids: [2, 3]
INFO 09-18 07:17:14 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 11.012 GB, free space: 11.144GB, frag space: 0.505GB, [(898899, 12.513671875)]
INFO 09-18 07:17:14 worker.py:165] Before appending weights shards, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 11.012 GB, free space: 11.144GB, frag space: 0.505GB
INFO 09-18 07:17:14 worker.py:168] After recving weights shards, allocated space on GPU 0: 16.700 GB, reserved space on GPU 0: 16.830 GB, free space: 5.326GB, frag space: 0.130GB
(VllmWorkerProcess pid=898982) It takes: 0.33s to send model shards
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:14 worker.py:152] send weights shards takes: 0.36s, sent out: 6.19GB, sent bw: 17.24GB/s
INFO 09-18 07:17:14 worker.py:171] After appending weights shards, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 20.371 GB, free space: 1.785GB, frag space: 3.670GB
INFO 09-18 07:17:14 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 20.371 GB, free space: 1.785GB, frag space: 3.670GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 17.669GB, frag space: 0.000GB
INFO 09-18 07:17:14 worker.py:215] Before recv kvc shards, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 20.371 GB, free space: 1.785GB, frag space: 3.670GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:14 cache_engine.py:159] send kvc shards takes: 0.19s, sent out: 3.77GB, sent bw: 19.34GB/s
INFO 09-18 07:17:14 worker.py:217] After recv kvc shards, allocated space on GPU 0: 20.466 GB, reserved space on GPU 0: 20.930 GB, free space: 1.226GB, frag space: 0.463GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:15 cache_engine.py:165] Successfully send kv cache shards: [2, 3] to rank: 0
INFO 09-18 07:17:15 worker.py:223] After appending kvc shards, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 21.344 GB, free space: 0.812GB, frag space: 0.846GB
INFO 09-18 07:17:15 worker.py:228] After appending kvc shards, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.867 GB, free space: 1.289GB, frag space: 0.370GB
INFO 09-18 07:17:15 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.867 GB, free space: 1.289GB, frag space: 0.370GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 21.481GB, frag space: 0.000GB
INFO 09-18 07:17:15 llm_engine.py:773] Finished liquid for 3 times, output: Completed! Move shard: [2, 3] from [1] to [0];liquid e2e latency: 1.31s, move and shrink latency: 0.12s, update worker latency: -0.11s, liquid model weights latency: 0.61s, init mem latency: 0.00s, liquid kvc latency: 0.70s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.867 GB, free space: 1.289GB, frag space: 0.370GB, current gpu block: #964
INFO 09-18 07:17:15 llm_engine.py:759] ---------Start 4'th liquid: Scale out from GPU0 to GPU1---------
INFO 09-18 07:17:15 multiproc_gpu_executor.py:258] Start to do liquid from src: 0 to dst: 1 with shard_ids: [2, 3]
INFO 09-18 07:17:15 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.869 GB, free space: 1.287GB, frag space: 0.372GB, [(898899, 22.37109375)]
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:15 worker.py:165] Before appending weights shards, allocated space on GPU 1: 0.548 GB, reserved space on GPU 1: 0.561 GB, free space: 21.481GB, frag space: 0.012GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:15 worker.py:168] After recving weights shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 6.754 GB, free space: 15.288GB, frag space: 0.013GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:15 worker.py:171] After appending weights shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 8.307 GB, free space: 13.735GB, frag space: 1.565GB
It takes: 0.23s to send model shards
INFO 09-18 07:17:15 worker.py:152] send weights shards takes: 0.27s, sent out: 6.19GB, sent bw: 22.62GB/s
INFO 09-18 07:17:15 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 14.304 GB, reserved space on GPU 0: 14.891 GB, free space: 7.265GB, frag space: 0.587GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 13.735GB, frag space: 0.000GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:15 worker.py:215] Before recv kvc shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 8.307 GB, free space: 13.735GB, frag space: 1.565GB
INFO 09-18 07:17:15 cache_engine.py:159] send kvc shards takes: 0.11s, sent out: 3.77GB, sent bw: 33.62GB/s
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:15 worker.py:217] After recv kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 12.119 GB, free space: 9.923GB, frag space: 1.612GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:15 worker.py:223] After appending kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 12.238 GB, free space: 9.804GB, frag space: 1.731GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:15 worker.py:228] After appending kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 10.619 GB, free space: 11.423GB, frag space: 0.112GB
INFO 09-18 07:17:15 cache_engine.py:165] Successfully send kv cache shards: [2, 3] to rank: 1
INFO 09-18 07:17:15 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 11.021 GB, free space: 11.134GB, frag space: 0.515GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.423GB, frag space: 0.000GB
2244
INFO 09-18 07:17:15 multiproc_gpu_executor.py:171] After scale out, num_gpu_blocks: #3208
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:16 worker.py:202] extend gpu in worker takes: 0.35s
INFO 09-18 07:17:16 worker.py:202] extend gpu in worker takes: 0.36s
INFO 09-18 07:17:16 llm_engine.py:773] Finished liquid for 4 times, output: Completed! Move shard: [2, 3] from [0] to [1];liquid e2e latency: 0.86s, update worker latency: 0.00s, liquid model weights latency: 0.33s, init mem latency: 0.00s, liquid kvc latency: 0.15s, extending gpu blocks latency: 0.36s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 19.304 GB, reserved space on GPU 0: 19.891 GB, free space: 2.265GB, frag space: 0.587GB, current gpu block: #3208
INFO 09-18 07:17:16 llm_engine.py:759] ---------Start 5'th liquid: Scale out from GPU[0,1] to GPU[2,3]---------
INFO 09-18 07:17:16 multiproc_gpu_executor.py:258] Start to do liquid from src: 0 to dst: 2 with shard_ids: [1]
INFO 09-18 07:17:16 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 19.304 GB, reserved space on GPU 0: 19.893 GB, free space: 2.263GB, frag space: 0.589GB, [(898899, 21.39453125)]
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:16 worker.py:165] Before appending weights shards, allocated space on GPU 2: 0.525 GB, reserved space on GPU 2: 0.539 GB, free space: 22.000GB, frag space: 0.014GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:16 worker.py:168] After recving weights shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 3.887 GB, free space: 18.652GB, frag space: 0.265GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:16 worker.py:171] After appending weights shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
It takes: 0.29s to send model shards
INFO 09-18 07:17:16 worker.py:152] send weights shards takes: 0.31s, sent out: 3.10GB, sent bw: 9.91GB/s
INFO 09-18 07:17:16 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 16.207 GB, reserved space on GPU 0: 16.816 GB, free space: 5.339GB, frag space: 0.609GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.673GB, frag space: 0.000GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:16 worker.py:215] Before recv kvc shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
INFO 09-18 07:17:17 cache_engine.py:159] send kvc shards takes: 0.42s, sent out: 6.27GB, sent bw: 14.75GB/s
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:17 worker.py:217] After recv kvc shards, allocated space on GPU 2: 9.887 GB, reserved space on GPU 2: 10.977 GB, free space: 11.562GB, frag space: 1.090GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:17 worker.py:223] After appending kvc shards, allocated space on GPU 2: 9.887 GB, reserved space on GPU 2: 11.174 GB, free space: 11.365GB, frag space: 1.287GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:17 worker.py:228] After appending kvc shards, allocated space on GPU 2: 9.887 GB, reserved space on GPU 2: 10.020 GB, free space: 12.519GB, frag space: 0.132GB
INFO 09-18 07:17:17 cache_engine.py:165] Successfully send kv cache shards: [1] to rank: 2
INFO 09-18 07:17:17 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 9.910 GB, reserved space on GPU 0: 10.566 GB, free space: 11.589GB, frag space: 0.656GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.673GB, frag space: 0.000GB
INFO 09-18 07:17:17 multiproc_gpu_executor.py:258] Start to do liquid from src: 1 to dst: 3 with shard_ids: [3]
INFO 09-18 07:17:17 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 9.910 GB, reserved space on GPU 0: 10.566 GB, free space: 11.589GB, frag space: 0.656GB, [(898899, 12.068359375)]
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:17 worker.py:165] Before appending weights shards, allocated space on GPU 3: 0.525 GB, reserved space on GPU 3: 0.539 GB, free space: 22.000GB, frag space: 0.014GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:17 worker.py:168] After recving weights shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 3.887 GB, free space: 18.652GB, frag space: 0.265GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:17 worker.py:171] After appending weights shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
(VllmWorkerProcess pid=898982) It takes: 0.58s to send model shards
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:17 worker.py:152] send weights shards takes: 0.61s, sent out: 3.10GB, sent bw: 5.05GB/s
INFO 09-18 07:17:17 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 9.910 GB, reserved space on GPU 0: 10.566 GB, free space: 11.589GB, frag space: 0.656GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 5.659GB, frag space: 0.000GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:17 worker.py:215] Before recv kvc shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:18 cache_engine.py:159] send kvc shards takes: 0.41s, sent out: 6.27GB, sent bw: 15.24GB/s
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:18 worker.py:217] After recv kvc shards, allocated space on GPU 3: 9.887 GB, reserved space on GPU 3: 10.977 GB, free space: 11.562GB, frag space: 1.090GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:18 worker.py:223] After appending kvc shards, allocated space on GPU 3: 9.887 GB, reserved space on GPU 3: 11.174 GB, free space: 11.365GB, frag space: 1.287GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:18 worker.py:228] After appending kvc shards, allocated space on GPU 3: 9.887 GB, reserved space on GPU 3: 10.020 GB, free space: 12.519GB, frag space: 0.132GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:18 cache_engine.py:165] Successfully send kv cache shards: [3] to rank: 3
INFO 09-18 07:17:18 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 9.910 GB, reserved space on GPU 0: 10.566 GB, free space: 11.589GB, frag space: 0.656GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.909GB, frag space: 0.000GB
INFO 09-18 07:17:18 multiproc_gpu_executor.py:189] After scale out, num_gpu_blocks: #7929
INFO 09-18 07:17:18 worker.py:202] extend gpu in worker takes: 0.37s
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:18 worker.py:202] extend gpu in worker takes: 0.37s
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:18 worker.py:202] extend gpu in worker takes: 0.39s
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:18 worker.py:202] extend gpu in worker takes: 0.40s
INFO 09-18 07:17:18 llm_engine.py:773] Finished liquid for 5 times, output: Completed! Move shard: [1, 3] from [0, 1] to [2, 3];liquid e2e latency: 2.46s, update worker latency: 0.01s, liquid model weights latency: 1.54s, init mem latency: 0.00s, liquid kvc latency: 0.50s, extending gpu blocks latency: 0.41s, update blocks latency: 0.01s;, current mem info on GPU0: allocated space on GPU 0: 19.145 GB, reserved space on GPU 0: 19.754 GB, free space: 2.402GB, frag space: 0.609GB, current gpu block: #7929
INFO 09-18 07:17:18 llm_engine.py:759] ---------Start 6'th liquid: Scale in from GPU[2,3] to GPU[0,1]---------
INFO 09-18 07:17:18 multiproc_gpu_executor.py:242] Shrink to: #3208, currently using blocks: #1
INFO 09-18 07:17:18 multiproc_gpu_executor.py:243] Before move and shrink: allocated space on GPU 0: 19.145 GB, reserved space on GPU 0: 19.756 GB, free space: 2.400GB, frag space: 0.611GB
INFO 09-18 07:17:19 multiproc_gpu_executor.py:245] After move and shrink: allocated space on GPU 0: 9.910 GB, reserved space on GPU 0: 10.566 GB, free space: 11.589GB, frag space: 0.656GB
INFO 09-18 07:17:19 multiproc_gpu_executor.py:258] Start to do liquid from src: 2 to dst: 0 with shard_ids: [1]
INFO 09-18 07:17:19 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 9.910 GB, reserved space on GPU 0: 10.566 GB, free space: 11.589GB, frag space: 0.656GB, [(898899, 12.068359375)]
INFO 09-18 07:17:19 worker.py:165] Before appending weights shards, allocated space on GPU 0: 9.910 GB, reserved space on GPU 0: 10.566 GB, free space: 11.589GB, frag space: 0.656GB
INFO 09-18 07:17:19 worker.py:168] After recving weights shards, allocated space on GPU 0: 13.007 GB, reserved space on GPU 0: 13.270 GB, free space: 8.886GB, frag space: 0.263GB
INFO 09-18 07:17:19 worker.py:171] After appending weights shards, allocated space on GPU 0: 13.007 GB, reserved space on GPU 0: 16.244 GB, free space: 5.912GB, frag space: 3.237GB
(VllmWorkerProcess pid=898983) It takes: 0.35s to send model shards
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:19 worker.py:152] send weights shards takes: 0.37s, sent out: 3.10GB, sent bw: 8.38GB/s
INFO 09-18 07:17:19 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 13.007 GB, reserved space on GPU 0: 16.244 GB, free space: 5.912GB, frag space: 3.237GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.909GB, frag space: 0.000GB
INFO 09-18 07:17:19 worker.py:215] Before recv kvc shards, allocated space on GPU 0: 13.007 GB, reserved space on GPU 0: 16.244 GB, free space: 5.912GB, frag space: 3.237GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:20 cache_engine.py:159] send kvc shards takes: 0.46s, sent out: 6.27GB, sent bw: 13.60GB/s
INFO 09-18 07:17:20 worker.py:217] After recv kvc shards, allocated space on GPU 0: 19.272 GB, reserved space on GPU 0: 19.689 GB, free space: 2.466GB, frag space: 0.417GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:20 cache_engine.py:165] Successfully send kv cache shards: [1] to rank: 0
INFO 09-18 07:17:20 worker.py:223] After appending kvc shards, allocated space on GPU 0: 19.304 GB, reserved space on GPU 0: 20.416 GB, free space: 1.740GB, frag space: 1.112GB
INFO 09-18 07:17:20 worker.py:228] After appending kvc shards, allocated space on GPU 0: 19.304 GB, reserved space on GPU 0: 19.627 GB, free space: 2.529GB, frag space: 0.323GB
INFO 09-18 07:17:20 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 19.304 GB, reserved space on GPU 0: 19.627 GB, free space: 2.529GB, frag space: 0.323GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.909GB, frag space: 0.000GB
INFO 09-18 07:17:20 multiproc_gpu_executor.py:258] Start to do liquid from src: 3 to dst: 1 with shard_ids: [3]
INFO 09-18 07:17:20 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 19.304 GB, reserved space on GPU 0: 19.627 GB, free space: 2.529GB, frag space: 0.323GB, [(898899, 21.12890625)]
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:20 worker.py:165] Before appending weights shards, allocated space on GPU 1: 9.910 GB, reserved space on GPU 1: 10.133 GB, free space: 11.909GB, frag space: 0.222GB
(VllmWorkerProcess pid=898984) It takes: 0.24s to send model shards
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:20 worker.py:152] send weights shards takes: 0.26s, sent out: 3.10GB, sent bw: 11.94GB/s
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:20 worker.py:168] After recving weights shards, allocated space on GPU 1: 13.007 GB, reserved space on GPU 1: 13.309 GB, free space: 8.733GB, frag space: 0.302GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:20 worker.py:171] After appending weights shards, allocated space on GPU 1: 13.007 GB, reserved space on GPU 1: 16.111 GB, free space: 5.930GB, frag space: 3.104GB
INFO 09-18 07:17:20 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 19.304 GB, reserved space on GPU 0: 19.627 GB, free space: 2.529GB, frag space: 0.323GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 5.930GB, frag space: 0.000GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:20 worker.py:215] Before recv kvc shards, allocated space on GPU 1: 13.007 GB, reserved space on GPU 1: 16.111 GB, free space: 5.930GB, frag space: 3.104GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:21 cache_engine.py:159] send kvc shards takes: 0.42s, sent out: 6.27GB, sent bw: 14.90GB/s
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:21 worker.py:217] After recv kvc shards, allocated space on GPU 1: 19.272 GB, reserved space on GPU 1: 19.514 GB, free space: 2.528GB, frag space: 0.241GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:21 cache_engine.py:165] Successfully send kv cache shards: [3] to rank: 1
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:21 worker.py:223] After appending kvc shards, allocated space on GPU 1: 19.304 GB, reserved space on GPU 1: 20.240 GB, free space: 1.802GB, frag space: 0.936GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:21 worker.py:228] After appending kvc shards, allocated space on GPU 1: 19.304 GB, reserved space on GPU 1: 19.451 GB, free space: 2.591GB, frag space: 0.147GB
INFO 09-18 07:17:21 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 19.304 GB, reserved space on GPU 0: 19.627 GB, free space: 2.529GB, frag space: 0.323GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.591GB, frag space: 0.000GB
INFO 09-18 07:17:21 llm_engine.py:773] Finished liquid for 6 times, output: Completed! Move shard: [1, 3] from [2, 3] to [0, 1];liquid e2e latency: 2.56s, move and shrink latency: -1726640238.98s, update worker latency: 1726640238.99s, liquid model weights latency: 1.83s, init mem latency: 0.00s, liquid kvc latency: 0.72s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 19.304 GB, reserved space on GPU 0: 19.627 GB, free space: 2.529GB, frag space: 0.323GB, current gpu block: #3208
INFO 09-18 07:17:21 llm_engine.py:759] ---------Start 7'th liquid: Scale in from GPU1 to GPU 0---------
INFO 09-18 07:17:21 multiproc_gpu_executor.py:216] Shrink to: #964, currently using blocks: #1
INFO 09-18 07:17:21 multiproc_gpu_executor.py:217] Before move and shrink: on GPU: allocated space on GPU 0: 19.304 GB, reserved space on GPU 0: 19.629 GB, free space: 2.527GB, frag space: 0.325GB
INFO 09-18 07:17:21 multiproc_gpu_executor.py:219] After move and shrink: allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 10.877 GB, free space: 11.279GB, frag space: 0.370GB
INFO 09-18 07:17:21 multiproc_gpu_executor.py:258] Start to do liquid from src: 1 to dst: 0 with shard_ids: [2, 3]
INFO 09-18 07:17:21 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 10.877 GB, free space: 11.279GB, frag space: 0.370GB, [(898899, 12.37890625)]
INFO 09-18 07:17:21 worker.py:165] Before appending weights shards, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 10.877 GB, free space: 11.279GB, frag space: 0.370GB
INFO 09-18 07:17:22 worker.py:168] After recving weights shards, allocated space on GPU 0: 16.700 GB, reserved space on GPU 0: 16.898 GB, free space: 5.257GB, frag space: 0.199GB
(VllmWorkerProcess pid=898982) It takes: 0.32s to send model shards
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:22 worker.py:152] send weights shards takes: 0.35s, sent out: 6.19GB, sent bw: 17.57GB/s
INFO 09-18 07:17:22 worker.py:171] After appending weights shards, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 20.627 GB, free space: 1.529GB, frag space: 3.926GB
INFO 09-18 07:17:22 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 20.627 GB, free space: 1.529GB, frag space: 3.926GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 17.669GB, frag space: 0.000GB
INFO 09-18 07:17:22 worker.py:215] Before recv kvc shards, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 20.627 GB, free space: 1.529GB, frag space: 3.926GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:22 cache_engine.py:159] send kvc shards takes: 0.19s, sent out: 3.77GB, sent bw: 19.93GB/s
INFO 09-18 07:17:22 worker.py:217] After recv kvc shards, allocated space on GPU 0: 20.466 GB, reserved space on GPU 0: 21.010 GB, free space: 1.146GB, frag space: 0.544GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:22 cache_engine.py:165] Successfully send kv cache shards: [2, 3] to rank: 0
INFO 09-18 07:17:22 worker.py:223] After appending kvc shards, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 21.424 GB, free space: 0.732GB, frag space: 0.926GB
INFO 09-18 07:17:22 worker.py:228] After appending kvc shards, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.947 GB, free space: 1.209GB, frag space: 0.450GB
INFO 09-18 07:17:22 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.947 GB, free space: 1.209GB, frag space: 0.450GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 21.481GB, frag space: 0.000GB
INFO 09-18 07:17:22 llm_engine.py:773] Finished liquid for 7 times, output: Completed! Move shard: [2, 3] from [1] to [0];liquid e2e latency: 1.29s, move and shrink latency: 0.11s, update worker latency: -0.11s, liquid model weights latency: 0.63s, init mem latency: 0.00s, liquid kvc latency: 0.66s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.947 GB, free space: 1.209GB, frag space: 0.450GB, current gpu block: #964
INFO 09-18 07:17:22 llm_engine.py:759] ---------Start 8'th liquid: Scale out from GPU0 to GPU1---------
INFO 09-18 07:17:22 multiproc_gpu_executor.py:258] Start to do liquid from src: 0 to dst: 1 with shard_ids: [2, 3]
INFO 09-18 07:17:22 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.949 GB, free space: 1.207GB, frag space: 0.452GB, [(898899, 22.451171875)]
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:22 worker.py:165] Before appending weights shards, allocated space on GPU 1: 0.548 GB, reserved space on GPU 1: 0.561 GB, free space: 21.481GB, frag space: 0.012GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:23 worker.py:168] After recving weights shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 6.754 GB, free space: 15.288GB, frag space: 0.013GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:23 worker.py:171] After appending weights shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 8.307 GB, free space: 13.735GB, frag space: 1.565GB
It takes: 0.22s to send model shards
INFO 09-18 07:17:23 worker.py:152] send weights shards takes: 0.26s, sent out: 6.19GB, sent bw: 23.45GB/s
INFO 09-18 07:17:23 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 14.304 GB, reserved space on GPU 0: 14.906 GB, free space: 7.250GB, frag space: 0.603GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 13.735GB, frag space: 0.000GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:23 worker.py:215] Before recv kvc shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 8.307 GB, free space: 13.735GB, frag space: 1.565GB
INFO 09-18 07:17:23 cache_engine.py:159] send kvc shards takes: 0.11s, sent out: 3.77GB, sent bw: 33.87GB/s
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:23 worker.py:217] After recv kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 12.119 GB, free space: 9.923GB, frag space: 1.612GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:23 worker.py:223] After appending kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 12.238 GB, free space: 9.804GB, frag space: 1.731GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:23 worker.py:228] After appending kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 10.619 GB, free space: 11.423GB, frag space: 0.112GB
INFO 09-18 07:17:23 cache_engine.py:165] Successfully send kv cache shards: [2, 3] to rank: 1
INFO 09-18 07:17:23 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 11.037 GB, free space: 11.119GB, frag space: 0.530GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.423GB, frag space: 0.000GB
2240
INFO 09-18 07:17:23 multiproc_gpu_executor.py:171] After scale out, num_gpu_blocks: #3204
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:23 worker.py:202] extend gpu in worker takes: 0.38s
INFO 09-18 07:17:23 worker.py:202] extend gpu in worker takes: 0.38s
INFO 09-18 07:17:23 llm_engine.py:773] Finished liquid for 8 times, output: Completed! Move shard: [2, 3] from [0] to [1];liquid e2e latency: 0.87s, update worker latency: 0.00s, liquid model weights latency: 0.33s, init mem latency: 0.00s, liquid kvc latency: 0.15s, extending gpu blocks latency: 0.38s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 19.257 GB, reserved space on GPU 0: 19.906 GB, free space: 2.250GB, frag space: 0.649GB, current gpu block: #3204
INFO 09-18 07:17:23 llm_engine.py:759] ---------Start 9'th liquid: Scale out from GPU[0,1] to GPU[2,3]---------
INFO 09-18 07:17:23 multiproc_gpu_executor.py:258] Start to do liquid from src: 0 to dst: 2 with shard_ids: [1]
INFO 09-18 07:17:23 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 19.257 GB, reserved space on GPU 0: 19.908 GB, free space: 2.248GB, frag space: 0.651GB, [(898899, 21.41015625)]
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:23 worker.py:165] Before appending weights shards, allocated space on GPU 2: 0.525 GB, reserved space on GPU 2: 0.539 GB, free space: 22.000GB, frag space: 0.014GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:24 worker.py:168] After recving weights shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 3.887 GB, free space: 18.652GB, frag space: 0.265GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:24 worker.py:171] After appending weights shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
It takes: 0.26s to send model shards
INFO 09-18 07:17:24 worker.py:152] send weights shards takes: 0.28s, sent out: 3.10GB, sent bw: 11.02GB/s
INFO 09-18 07:17:24 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 16.160 GB, reserved space on GPU 0: 16.582 GB, free space: 5.574GB, frag space: 0.422GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.673GB, frag space: 0.000GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:24 worker.py:215] Before recv kvc shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
INFO 09-18 07:17:24 cache_engine.py:159] send kvc shards takes: 0.42s, sent out: 6.26GB, sent bw: 14.81GB/s
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:24 worker.py:217] After recv kvc shards, allocated space on GPU 2: 9.879 GB, reserved space on GPU 2: 10.977 GB, free space: 11.562GB, frag space: 1.097GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:24 worker.py:223] After appending kvc shards, allocated space on GPU 2: 9.879 GB, reserved space on GPU 2: 11.174 GB, free space: 11.365GB, frag space: 1.295GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:24 worker.py:228] After appending kvc shards, allocated space on GPU 2: 9.879 GB, reserved space on GPU 2: 10.020 GB, free space: 12.519GB, frag space: 0.140GB
INFO 09-18 07:17:24 cache_engine.py:165] Successfully send kv cache shards: [1] to rank: 2
INFO 09-18 07:17:24 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 9.903 GB, reserved space on GPU 0: 10.332 GB, free space: 11.824GB, frag space: 0.429GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.673GB, frag space: 0.000GB
INFO 09-18 07:17:24 multiproc_gpu_executor.py:258] Start to do liquid from src: 1 to dst: 3 with shard_ids: [3]
INFO 09-18 07:17:24 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 9.903 GB, reserved space on GPU 0: 10.332 GB, free space: 11.824GB, frag space: 0.429GB, [(898899, 11.833984375)]
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:24 worker.py:165] Before appending weights shards, allocated space on GPU 3: 0.525 GB, reserved space on GPU 3: 0.539 GB, free space: 22.000GB, frag space: 0.014GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:25 worker.py:168] After recving weights shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 3.887 GB, free space: 18.652GB, frag space: 0.265GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:25 worker.py:171] After appending weights shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
(VllmWorkerProcess pid=898982) It takes: 0.59s to send model shards
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:25 worker.py:152] send weights shards takes: 0.62s, sent out: 3.10GB, sent bw: 4.98GB/s
INFO 09-18 07:17:25 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 9.903 GB, reserved space on GPU 0: 10.332 GB, free space: 11.824GB, frag space: 0.429GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 5.659GB, frag space: 0.000GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:25 worker.py:215] Before recv kvc shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:25 cache_engine.py:159] send kvc shards takes: 0.41s, sent out: 6.26GB, sent bw: 15.19GB/s
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:25 worker.py:217] After recv kvc shards, allocated space on GPU 3: 9.879 GB, reserved space on GPU 3: 10.977 GB, free space: 11.562GB, frag space: 1.097GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:25 worker.py:223] After appending kvc shards, allocated space on GPU 3: 9.879 GB, reserved space on GPU 3: 11.174 GB, free space: 11.365GB, frag space: 1.295GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:25 worker.py:228] After appending kvc shards, allocated space on GPU 3: 9.879 GB, reserved space on GPU 3: 10.020 GB, free space: 12.519GB, frag space: 0.140GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:25 cache_engine.py:165] Successfully send kv cache shards: [3] to rank: 3
INFO 09-18 07:17:25 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 9.903 GB, reserved space on GPU 0: 10.332 GB, free space: 11.824GB, frag space: 0.429GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.909GB, frag space: 0.000GB
INFO 09-18 07:17:25 multiproc_gpu_executor.py:189] After scale out, num_gpu_blocks: #8045
INFO 09-18 07:17:26 worker.py:202] extend gpu in worker takes: 0.36s
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:26 worker.py:202] extend gpu in worker takes: 0.38s
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:26 worker.py:202] extend gpu in worker takes: 0.38s
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:26 worker.py:202] extend gpu in worker takes: 0.39s
INFO 09-18 07:17:26 llm_engine.py:773] Finished liquid for 9 times, output: Completed! Move shard: [1, 3] from [0, 1] to [2, 3];liquid e2e latency: 2.42s, update worker latency: 0.01s, liquid model weights latency: 1.52s, init mem latency: 0.00s, liquid kvc latency: 0.50s, extending gpu blocks latency: 0.39s, update blocks latency: 0.01s;, current mem info on GPU0: allocated space on GPU 0: 19.358 GB, reserved space on GPU 0: 19.770 GB, free space: 2.386GB, frag space: 0.412GB, current gpu block: #8045
INFO 09-18 07:17:26 llm_engine.py:759] ---------Start 10'th liquid: Scale in from GPU[2,3] to GPU[0,1]---------
INFO 09-18 07:17:26 multiproc_gpu_executor.py:242] Shrink to: #3204, currently using blocks: #1
INFO 09-18 07:17:26 multiproc_gpu_executor.py:243] Before move and shrink: allocated space on GPU 0: 19.358 GB, reserved space on GPU 0: 19.771 GB, free space: 2.384GB, frag space: 0.414GB
INFO 09-18 07:17:26 multiproc_gpu_executor.py:245] After move and shrink: allocated space on GPU 0: 9.903 GB, reserved space on GPU 0: 10.332 GB, free space: 11.824GB, frag space: 0.429GB
INFO 09-18 07:17:26 multiproc_gpu_executor.py:258] Start to do liquid from src: 2 to dst: 0 with shard_ids: [1]
INFO 09-18 07:17:26 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 9.903 GB, reserved space on GPU 0: 10.332 GB, free space: 11.824GB, frag space: 0.429GB, [(898899, 11.833984375)]
INFO 09-18 07:17:26 worker.py:165] Before appending weights shards, allocated space on GPU 0: 9.903 GB, reserved space on GPU 0: 10.332 GB, free space: 11.824GB, frag space: 0.429GB
INFO 09-18 07:17:26 worker.py:168] After recving weights shards, allocated space on GPU 0: 12.999 GB, reserved space on GPU 0: 13.285 GB, free space: 8.871GB, frag space: 0.286GB
INFO 09-18 07:17:26 worker.py:171] After appending weights shards, allocated space on GPU 0: 12.999 GB, reserved space on GPU 0: 16.432 GB, free space: 5.724GB, frag space: 3.433GB
(VllmWorkerProcess pid=898983) It takes: 0.34s to send model shards
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:26 worker.py:152] send weights shards takes: 0.36s, sent out: 3.10GB, sent bw: 8.51GB/s
INFO 09-18 07:17:26 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 12.999 GB, reserved space on GPU 0: 16.432 GB, free space: 5.724GB, frag space: 3.433GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.909GB, frag space: 0.000GB
INFO 09-18 07:17:26 worker.py:215] Before recv kvc shards, allocated space on GPU 0: 12.999 GB, reserved space on GPU 0: 16.432 GB, free space: 5.724GB, frag space: 3.433GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:27 cache_engine.py:159] send kvc shards takes: 0.46s, sent out: 6.26GB, sent bw: 13.59GB/s
INFO 09-18 07:17:27 worker.py:217] After recv kvc shards, allocated space on GPU 0: 19.257 GB, reserved space on GPU 0: 19.756 GB, free space: 2.400GB, frag space: 0.499GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:27 cache_engine.py:165] Successfully send kv cache shards: [1] to rank: 0
INFO 09-18 07:17:27 worker.py:223] After appending kvc shards, allocated space on GPU 0: 19.257 GB, reserved space on GPU 0: 20.482 GB, free space: 1.673GB, frag space: 1.226GB
INFO 09-18 07:17:27 worker.py:228] After appending kvc shards, allocated space on GPU 0: 19.257 GB, reserved space on GPU 0: 19.693 GB, free space: 2.463GB, frag space: 0.436GB
INFO 09-18 07:17:27 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 19.257 GB, reserved space on GPU 0: 19.693 GB, free space: 2.463GB, frag space: 0.436GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.909GB, frag space: 0.000GB
INFO 09-18 07:17:27 multiproc_gpu_executor.py:258] Start to do liquid from src: 3 to dst: 1 with shard_ids: [3]
INFO 09-18 07:17:27 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 19.257 GB, reserved space on GPU 0: 19.693 GB, free space: 2.463GB, frag space: 0.436GB, [(898899, 21.1953125)]
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:27 worker.py:165] Before appending weights shards, allocated space on GPU 1: 9.903 GB, reserved space on GPU 1: 10.133 GB, free space: 11.909GB, frag space: 0.230GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:28 worker.py:168] After recving weights shards, allocated space on GPU 1: 12.999 GB, reserved space on GPU 1: 13.309 GB, free space: 8.733GB, frag space: 0.310GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:28 worker.py:171] After appending weights shards, allocated space on GPU 1: 12.999 GB, reserved space on GPU 1: 16.111 GB, free space: 5.930GB, frag space: 3.112GB
(VllmWorkerProcess pid=898984) It takes: 0.24s to send model shards
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:28 worker.py:152] send weights shards takes: 0.27s, sent out: 3.10GB, sent bw: 11.68GB/s
INFO 09-18 07:17:28 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 19.257 GB, reserved space on GPU 0: 19.693 GB, free space: 2.463GB, frag space: 0.436GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 5.930GB, frag space: 0.000GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:28 worker.py:215] Before recv kvc shards, allocated space on GPU 1: 12.999 GB, reserved space on GPU 1: 16.111 GB, free space: 5.930GB, frag space: 3.112GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:28 cache_engine.py:159] send kvc shards takes: 0.42s, sent out: 6.26GB, sent bw: 15.07GB/s
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:28 worker.py:217] After recv kvc shards, allocated space on GPU 1: 19.257 GB, reserved space on GPU 1: 19.514 GB, free space: 2.528GB, frag space: 0.257GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:28 cache_engine.py:165] Successfully send kv cache shards: [3] to rank: 1
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:28 worker.py:223] After appending kvc shards, allocated space on GPU 1: 19.257 GB, reserved space on GPU 1: 20.240 GB, free space: 1.802GB, frag space: 0.983GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:28 worker.py:228] After appending kvc shards, allocated space on GPU 1: 19.257 GB, reserved space on GPU 1: 19.451 GB, free space: 2.591GB, frag space: 0.194GB
INFO 09-18 07:17:28 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 19.257 GB, reserved space on GPU 0: 19.693 GB, free space: 2.463GB, frag space: 0.436GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.591GB, frag space: 0.000GB
INFO 09-18 07:17:28 llm_engine.py:773] Finished liquid for 10 times, output: Completed! Move shard: [1, 3] from [2, 3] to [0, 1];liquid e2e latency: 2.61s, move and shrink latency: -1726640246.28s, update worker latency: 1726640246.28s, liquid model weights latency: 1.89s, init mem latency: 0.00s, liquid kvc latency: 0.71s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 19.257 GB, reserved space on GPU 0: 19.693 GB, free space: 2.463GB, frag space: 0.436GB, current gpu block: #3204
INFO 09-18 07:17:28 llm_engine.py:759] ---------Start 11'th liquid: Scale in from GPU1 to GPU 0---------
INFO 09-18 07:17:28 multiproc_gpu_executor.py:216] Shrink to: #964, currently using blocks: #1
INFO 09-18 07:17:28 multiproc_gpu_executor.py:217] Before move and shrink: on GPU: allocated space on GPU 0: 19.257 GB, reserved space on GPU 0: 19.695 GB, free space: 2.461GB, frag space: 0.438GB
INFO 09-18 07:17:29 multiproc_gpu_executor.py:219] After move and shrink: allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 10.943 GB, free space: 11.213GB, frag space: 0.436GB
INFO 09-18 07:17:29 multiproc_gpu_executor.py:258] Start to do liquid from src: 1 to dst: 0 with shard_ids: [2, 3]
INFO 09-18 07:17:29 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 10.943 GB, free space: 11.213GB, frag space: 0.436GB, [(898899, 12.4453125)]
INFO 09-18 07:17:29 worker.py:165] Before appending weights shards, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 10.943 GB, free space: 11.213GB, frag space: 0.436GB
INFO 09-18 07:17:29 worker.py:168] After recving weights shards, allocated space on GPU 0: 16.700 GB, reserved space on GPU 0: 16.902 GB, free space: 5.254GB, frag space: 0.203GB
(VllmWorkerProcess pid=898982) It takes: 0.32s to send model shards
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:29 worker.py:152] send weights shards takes: 0.35s, sent out: 6.19GB, sent bw: 17.61GB/s
INFO 09-18 07:17:29 worker.py:171] After appending weights shards, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 20.912 GB, free space: 1.244GB, frag space: 4.212GB
INFO 09-18 07:17:29 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 20.912 GB, free space: 1.244GB, frag space: 4.212GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 17.669GB, frag space: 0.000GB
INFO 09-18 07:17:29 worker.py:215] Before recv kvc shards, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 20.912 GB, free space: 1.244GB, frag space: 4.212GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:29 cache_engine.py:159] send kvc shards takes: 0.20s, sent out: 3.77GB, sent bw: 19.11GB/s
INFO 09-18 07:17:29 worker.py:217] After recv kvc shards, allocated space on GPU 0: 20.466 GB, reserved space on GPU 0: 21.029 GB, free space: 1.127GB, frag space: 0.563GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:30 cache_engine.py:165] Successfully send kv cache shards: [2, 3] to rank: 0
INFO 09-18 07:17:30 worker.py:223] After appending kvc shards, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 21.920 GB, free space: 0.236GB, frag space: 1.422GB
INFO 09-18 07:17:30 worker.py:228] After appending kvc shards, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.967 GB, free space: 1.189GB, frag space: 0.469GB
INFO 09-18 07:17:30 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.967 GB, free space: 1.189GB, frag space: 0.469GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 21.481GB, frag space: 0.000GB
INFO 09-18 07:17:30 llm_engine.py:773] Finished liquid for 11 times, output: Completed! Move shard: [2, 3] from [1] to [0];liquid e2e latency: 1.31s, move and shrink latency: 0.12s, update worker latency: -0.11s, liquid model weights latency: 0.62s, init mem latency: 0.00s, liquid kvc latency: 0.68s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.967 GB, free space: 1.189GB, frag space: 0.469GB, current gpu block: #964
INFO 09-18 07:17:30 llm_engine.py:759] ---------Start 12'th liquid: Scale out from GPU0 to GPU1---------
INFO 09-18 07:17:30 multiproc_gpu_executor.py:258] Start to do liquid from src: 0 to dst: 1 with shard_ids: [2, 3]
INFO 09-18 07:17:30 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.969 GB, free space: 1.187GB, frag space: 0.471GB, [(898899, 22.470703125)]
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:30 worker.py:165] Before appending weights shards, allocated space on GPU 1: 0.548 GB, reserved space on GPU 1: 0.561 GB, free space: 21.481GB, frag space: 0.012GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:30 worker.py:168] After recving weights shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 6.754 GB, free space: 15.288GB, frag space: 0.013GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:30 worker.py:171] After appending weights shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 8.307 GB, free space: 13.735GB, frag space: 1.565GB
It takes: 0.26s to send model shards
INFO 09-18 07:17:30 worker.py:152] send weights shards takes: 0.31s, sent out: 6.19GB, sent bw: 20.23GB/s
INFO 09-18 07:17:30 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 14.304 GB, reserved space on GPU 0: 14.922 GB, free space: 7.234GB, frag space: 0.618GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 13.735GB, frag space: 0.000GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:30 worker.py:215] Before recv kvc shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 8.307 GB, free space: 13.735GB, frag space: 1.565GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:30 worker.py:217] After recv kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 12.119 GB, free space: 9.923GB, frag space: 1.612GB
INFO 09-18 07:17:30 cache_engine.py:159] send kvc shards takes: 0.11s, sent out: 3.77GB, sent bw: 33.54GB/s
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:30 worker.py:223] After appending kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 12.238 GB, free space: 9.804GB, frag space: 1.731GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:30 worker.py:228] After appending kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 10.619 GB, free space: 11.423GB, frag space: 0.112GB
INFO 09-18 07:17:30 cache_engine.py:165] Successfully send kv cache shards: [2, 3] to rank: 1
INFO 09-18 07:17:30 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 11.053 GB, free space: 11.103GB, frag space: 0.546GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.423GB, frag space: 0.000GB
2236
INFO 09-18 07:17:30 multiproc_gpu_executor.py:171] After scale out, num_gpu_blocks: #3200
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:31 worker.py:202] extend gpu in worker takes: 0.35s
INFO 09-18 07:17:31 worker.py:202] extend gpu in worker takes: 0.37s
INFO 09-18 07:17:31 llm_engine.py:773] Finished liquid for 12 times, output: Completed! Move shard: [2, 3] from [0] to [1];liquid e2e latency: 0.87s, update worker latency: 0.00s, liquid model weights latency: 0.35s, init mem latency: 0.00s, liquid kvc latency: 0.15s, extending gpu blocks latency: 0.37s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 19.241 GB, reserved space on GPU 0: 19.859 GB, free space: 2.297GB, frag space: 0.618GB, current gpu block: #3200
INFO 09-18 07:17:31 llm_engine.py:759] ---------Start 13'th liquid: Scale out from GPU[0,1] to GPU[2,3]---------
INFO 09-18 07:17:31 multiproc_gpu_executor.py:258] Start to do liquid from src: 0 to dst: 2 with shard_ids: [1]
INFO 09-18 07:17:31 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 19.241 GB, reserved space on GPU 0: 19.861 GB, free space: 2.295GB, frag space: 0.620GB, [(898899, 21.36328125)]
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:31 worker.py:165] Before appending weights shards, allocated space on GPU 2: 0.525 GB, reserved space on GPU 2: 0.539 GB, free space: 22.000GB, frag space: 0.014GB
It takes: 0.28s to send model shards
INFO 09-18 07:17:31 worker.py:152] send weights shards takes: 0.31s, sent out: 3.10GB, sent bw: 9.98GB/s
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:31 worker.py:168] After recving weights shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 3.887 GB, free space: 18.652GB, frag space: 0.265GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:31 worker.py:171] After appending weights shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
INFO 09-18 07:17:31 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 16.145 GB, reserved space on GPU 0: 16.660 GB, free space: 5.496GB, frag space: 0.515GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.735GB, frag space: 0.000GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:31 worker.py:215] Before recv kvc shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
INFO 09-18 07:17:31 cache_engine.py:159] send kvc shards takes: 0.42s, sent out: 6.25GB, sent bw: 14.83GB/s
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:31 worker.py:217] After recv kvc shards, allocated space on GPU 2: 9.871 GB, reserved space on GPU 2: 10.914 GB, free space: 11.625GB, frag space: 1.043GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:31 worker.py:223] After appending kvc shards, allocated space on GPU 2: 9.871 GB, reserved space on GPU 2: 11.109 GB, free space: 11.429GB, frag space: 1.238GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:32 worker.py:228] After appending kvc shards, allocated space on GPU 2: 9.871 GB, reserved space on GPU 2: 9.957 GB, free space: 12.582GB, frag space: 0.086GB
INFO 09-18 07:17:32 cache_engine.py:165] Successfully send kv cache shards: [1] to rank: 2
INFO 09-18 07:17:32 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 9.895 GB, reserved space on GPU 0: 10.410 GB, free space: 11.746GB, frag space: 0.515GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.735GB, frag space: 0.000GB
INFO 09-18 07:17:32 multiproc_gpu_executor.py:258] Start to do liquid from src: 1 to dst: 3 with shard_ids: [3]
INFO 09-18 07:17:32 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 9.895 GB, reserved space on GPU 0: 10.410 GB, free space: 11.746GB, frag space: 0.515GB, [(898899, 11.912109375)]
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:32 worker.py:165] Before appending weights shards, allocated space on GPU 3: 0.525 GB, reserved space on GPU 3: 0.539 GB, free space: 22.000GB, frag space: 0.014GB
(VllmWorkerProcess pid=898982) It takes: 0.24s to send model shards
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:32 worker.py:152] send weights shards takes: 0.26s, sent out: 3.10GB, sent bw: 11.71GB/s
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:32 worker.py:168] After recving weights shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 3.887 GB, free space: 18.652GB, frag space: 0.265GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:32 worker.py:171] After appending weights shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
INFO 09-18 07:17:32 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 9.895 GB, reserved space on GPU 0: 10.410 GB, free space: 11.746GB, frag space: 0.515GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 5.721GB, frag space: 0.000GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:32 worker.py:215] Before recv kvc shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:33 cache_engine.py:159] send kvc shards takes: 0.41s, sent out: 6.25GB, sent bw: 15.22GB/s
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:33 worker.py:217] After recv kvc shards, allocated space on GPU 3: 9.871 GB, reserved space on GPU 3: 10.914 GB, free space: 11.625GB, frag space: 1.043GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:33 worker.py:223] After appending kvc shards, allocated space on GPU 3: 9.871 GB, reserved space on GPU 3: 11.109 GB, free space: 11.429GB, frag space: 1.238GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:33 worker.py:228] After appending kvc shards, allocated space on GPU 3: 9.871 GB, reserved space on GPU 3: 9.957 GB, free space: 12.582GB, frag space: 0.086GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:33 cache_engine.py:165] Successfully send kv cache shards: [3] to rank: 3
INFO 09-18 07:17:33 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 9.895 GB, reserved space on GPU 0: 10.410 GB, free space: 11.746GB, frag space: 0.515GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.971GB, frag space: 0.000GB
INFO 09-18 07:17:33 multiproc_gpu_executor.py:189] After scale out, num_gpu_blocks: #8001
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:33 worker.py:202] extend gpu in worker takes: 0.39s
INFO 09-18 07:17:33 worker.py:202] extend gpu in worker takes: 0.39s
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:33 worker.py:202] extend gpu in worker takes: 0.40s
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:33 worker.py:202] extend gpu in worker takes: 0.40s
INFO 09-18 07:17:33 llm_engine.py:773] Finished liquid for 13 times, output: Completed! Move shard: [1, 3] from [0, 1] to [2, 3];liquid e2e latency: 2.50s, update worker latency: 0.01s, liquid model weights latency: 1.59s, init mem latency: 0.00s, liquid kvc latency: 0.50s, extending gpu blocks latency: 0.40s, update blocks latency: 0.01s;, current mem info on GPU0: allocated space on GPU 0: 19.272 GB, reserved space on GPU 0: 19.848 GB, free space: 2.308GB, frag space: 0.576GB, current gpu block: #8001
INFO 09-18 07:17:33 llm_engine.py:759] ---------Start 14'th liquid: Scale in from GPU[2,3] to GPU[0,1]---------
INFO 09-18 07:17:33 multiproc_gpu_executor.py:242] Shrink to: #3200, currently using blocks: #2
INFO 09-18 07:17:33 multiproc_gpu_executor.py:243] Before move and shrink: allocated space on GPU 0: 19.272 GB, reserved space on GPU 0: 19.850 GB, free space: 2.306GB, frag space: 0.578GB
INFO 09-18 07:17:33 multiproc_gpu_executor.py:245] After move and shrink: allocated space on GPU 0: 9.895 GB, reserved space on GPU 0: 10.410 GB, free space: 11.746GB, frag space: 0.515GB
INFO 09-18 07:17:33 multiproc_gpu_executor.py:258] Start to do liquid from src: 2 to dst: 0 with shard_ids: [1]
INFO 09-18 07:17:33 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 9.895 GB, reserved space on GPU 0: 10.410 GB, free space: 11.746GB, frag space: 0.515GB, [(898899, 11.912109375)]
INFO 09-18 07:17:33 worker.py:165] Before appending weights shards, allocated space on GPU 0: 9.895 GB, reserved space on GPU 0: 10.410 GB, free space: 11.746GB, frag space: 0.515GB
INFO 09-18 07:17:34 worker.py:168] After recving weights shards, allocated space on GPU 0: 12.991 GB, reserved space on GPU 0: 13.215 GB, free space: 8.941GB, frag space: 0.224GB
INFO 09-18 07:17:34 worker.py:171] After appending weights shards, allocated space on GPU 0: 12.991 GB, reserved space on GPU 0: 16.361 GB, free space: 5.795GB, frag space: 3.370GB
(VllmWorkerProcess pid=898983) It takes: 0.34s to send model shards
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:34 worker.py:152] send weights shards takes: 0.36s, sent out: 3.10GB, sent bw: 8.52GB/s
INFO 09-18 07:17:34 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 12.991 GB, reserved space on GPU 0: 16.361 GB, free space: 5.795GB, frag space: 3.370GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.971GB, frag space: 0.000GB
INFO 09-18 07:17:34 worker.py:215] Before recv kvc shards, allocated space on GPU 0: 12.991 GB, reserved space on GPU 0: 16.361 GB, free space: 5.795GB, frag space: 3.370GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:34 cache_engine.py:159] send kvc shards takes: 0.44s, sent out: 6.25GB, sent bw: 14.15GB/s
INFO 09-18 07:17:34 worker.py:217] After recv kvc shards, allocated space on GPU 0: 19.241 GB, reserved space on GPU 0: 19.709 GB, free space: 2.447GB, frag space: 0.468GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:34 cache_engine.py:165] Successfully send kv cache shards: [1] to rank: 0
INFO 09-18 07:17:35 worker.py:223] After appending kvc shards, allocated space on GPU 0: 19.241 GB, reserved space on GPU 0: 20.490 GB, free space: 1.666GB, frag space: 1.249GB
INFO 09-18 07:17:35 worker.py:228] After appending kvc shards, allocated space on GPU 0: 19.241 GB, reserved space on GPU 0: 19.709 GB, free space: 2.447GB, frag space: 0.468GB
INFO 09-18 07:17:35 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 19.241 GB, reserved space on GPU 0: 19.709 GB, free space: 2.447GB, frag space: 0.468GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.971GB, frag space: 0.000GB
INFO 09-18 07:17:35 multiproc_gpu_executor.py:258] Start to do liquid from src: 3 to dst: 1 with shard_ids: [3]
INFO 09-18 07:17:35 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 19.241 GB, reserved space on GPU 0: 19.709 GB, free space: 2.447GB, frag space: 0.468GB, [(898899, 21.2109375)]
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:35 worker.py:165] Before appending weights shards, allocated space on GPU 1: 9.895 GB, reserved space on GPU 1: 10.070 GB, free space: 11.971GB, frag space: 0.175GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:35 worker.py:168] After recving weights shards, allocated space on GPU 1: 12.991 GB, reserved space on GPU 1: 13.246 GB, free space: 8.796GB, frag space: 0.255GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:35 worker.py:171] After appending weights shards, allocated space on GPU 1: 12.991 GB, reserved space on GPU 1: 16.049 GB, free space: 5.993GB, frag space: 3.058GB
(VllmWorkerProcess pid=898984) It takes: 0.24s to send model shards
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:35 worker.py:152] send weights shards takes: 0.26s, sent out: 3.10GB, sent bw: 11.76GB/s
INFO 09-18 07:17:35 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 19.241 GB, reserved space on GPU 0: 19.709 GB, free space: 2.447GB, frag space: 0.468GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 5.993GB, frag space: 0.000GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:35 worker.py:215] Before recv kvc shards, allocated space on GPU 1: 12.991 GB, reserved space on GPU 1: 16.049 GB, free space: 5.993GB, frag space: 3.058GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:35 cache_engine.py:159] send kvc shards takes: 0.42s, sent out: 6.25GB, sent bw: 14.77GB/s
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:35 worker.py:217] After recv kvc shards, allocated space on GPU 1: 19.241 GB, reserved space on GPU 1: 19.389 GB, free space: 2.653GB, frag space: 0.147GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:35 cache_engine.py:165] Successfully send kv cache shards: [3] to rank: 1
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:36 worker.py:223] After appending kvc shards, allocated space on GPU 1: 19.241 GB, reserved space on GPU 1: 20.170 GB, free space: 1.872GB, frag space: 0.929GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:36 worker.py:228] After appending kvc shards, allocated space on GPU 1: 19.241 GB, reserved space on GPU 1: 19.389 GB, free space: 2.653GB, frag space: 0.147GB
INFO 09-18 07:17:36 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 19.241 GB, reserved space on GPU 0: 19.709 GB, free space: 2.447GB, frag space: 0.468GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.653GB, frag space: 0.000GB
INFO 09-18 07:17:36 llm_engine.py:773] Finished liquid for 14 times, output: Completed! Move shard: [1, 3] from [2, 3] to [0, 1];liquid e2e latency: 2.55s, move and shrink latency: -1726640253.74s, update worker latency: 1726640253.74s, liquid model weights latency: 1.81s, init mem latency: 0.00s, liquid kvc latency: 0.73s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 19.241 GB, reserved space on GPU 0: 19.709 GB, free space: 2.447GB, frag space: 0.468GB, current gpu block: #3200
INFO 09-18 07:17:36 llm_engine.py:759] ---------Start 15'th liquid: Scale in from GPU1 to GPU 0---------
INFO 09-18 07:17:36 multiproc_gpu_executor.py:216] Shrink to: #964, currently using blocks: #2
INFO 09-18 07:17:36 multiproc_gpu_executor.py:217] Before move and shrink: on GPU: allocated space on GPU 0: 19.241 GB, reserved space on GPU 0: 19.711 GB, free space: 2.445GB, frag space: 0.470GB
INFO 09-18 07:17:36 multiproc_gpu_executor.py:219] After move and shrink: allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 11.021 GB, free space: 11.134GB, frag space: 0.515GB
INFO 09-18 07:17:36 multiproc_gpu_executor.py:258] Start to do liquid from src: 1 to dst: 0 with shard_ids: [2, 3]
INFO 09-18 07:17:36 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 11.021 GB, free space: 11.134GB, frag space: 0.515GB, [(898899, 12.5234375)]
INFO 09-18 07:17:36 worker.py:165] Before appending weights shards, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 11.021 GB, free space: 11.134GB, frag space: 0.515GB
INFO 09-18 07:17:36 worker.py:168] After recving weights shards, allocated space on GPU 0: 16.700 GB, reserved space on GPU 0: 16.918 GB, free space: 5.238GB, frag space: 0.218GB
(VllmWorkerProcess pid=898982) It takes: 0.33s to send model shards
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:36 worker.py:152] send weights shards takes: 0.36s, sent out: 6.19GB, sent bw: 17.29GB/s
INFO 09-18 07:17:36 worker.py:171] After appending weights shards, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 20.896 GB, free space: 1.259GB, frag space: 4.196GB
INFO 09-18 07:17:36 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 20.896 GB, free space: 1.259GB, frag space: 4.196GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 17.669GB, frag space: 0.000GB
INFO 09-18 07:17:36 worker.py:215] Before recv kvc shards, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 20.896 GB, free space: 1.259GB, frag space: 4.196GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:37 cache_engine.py:159] send kvc shards takes: 0.19s, sent out: 3.77GB, sent bw: 19.50GB/s
INFO 09-18 07:17:37 worker.py:217] After recv kvc shards, allocated space on GPU 0: 20.466 GB, reserved space on GPU 0: 21.088 GB, free space: 1.068GB, frag space: 0.622GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:37 cache_engine.py:165] Successfully send kv cache shards: [2, 3] to rank: 0
INFO 09-18 07:17:37 worker.py:223] After appending kvc shards, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 21.979 GB, free space: 0.177GB, frag space: 1.481GB
INFO 09-18 07:17:37 worker.py:228] After appending kvc shards, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 21.025 GB, free space: 1.130GB, frag space: 0.528GB
INFO 09-18 07:17:37 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 21.025 GB, free space: 1.130GB, frag space: 0.528GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 21.481GB, frag space: 0.000GB
INFO 09-18 07:17:37 llm_engine.py:773] Finished liquid for 15 times, output: Completed! Move shard: [2, 3] from [1] to [0];liquid e2e latency: 1.30s, move and shrink latency: 0.10s, update worker latency: -0.10s, liquid model weights latency: 0.61s, init mem latency: 0.00s, liquid kvc latency: 0.69s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 21.025 GB, free space: 1.130GB, frag space: 0.528GB, current gpu block: #964
INFO 09-18 07:17:37 llm_engine.py:759] ---------Start 16'th liquid: Scale out from GPU0 to GPU1---------
INFO 09-18 07:17:37 multiproc_gpu_executor.py:258] Start to do liquid from src: 0 to dst: 1 with shard_ids: [2, 3]
INFO 09-18 07:17:37 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 21.027 GB, free space: 1.129GB, frag space: 0.530GB, [(898899, 22.529296875)]
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:37 worker.py:165] Before appending weights shards, allocated space on GPU 1: 0.548 GB, reserved space on GPU 1: 0.561 GB, free space: 21.481GB, frag space: 0.012GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:37 worker.py:168] After recving weights shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 6.754 GB, free space: 15.288GB, frag space: 0.013GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:37 worker.py:171] After appending weights shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 8.307 GB, free space: 13.735GB, frag space: 1.565GB
It takes: 0.26s to send model shards
INFO 09-18 07:17:38 worker.py:152] send weights shards takes: 0.31s, sent out: 6.19GB, sent bw: 20.26GB/s
INFO 09-18 07:17:38 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 14.304 GB, reserved space on GPU 0: 15.047 GB, free space: 7.109GB, frag space: 0.743GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 13.735GB, frag space: 0.000GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:38 worker.py:215] Before recv kvc shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 8.307 GB, free space: 13.735GB, frag space: 1.565GB
INFO 09-18 07:17:38 cache_engine.py:159] send kvc shards takes: 0.11s, sent out: 3.77GB, sent bw: 33.72GB/s
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:38 worker.py:217] After recv kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 12.119 GB, free space: 9.923GB, frag space: 1.612GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:38 worker.py:223] After appending kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 12.238 GB, free space: 9.804GB, frag space: 1.731GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:38 worker.py:228] After appending kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 10.619 GB, free space: 11.423GB, frag space: 0.112GB
INFO 09-18 07:17:38 cache_engine.py:165] Successfully send kv cache shards: [2, 3] to rank: 1
INFO 09-18 07:17:38 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 11.178 GB, free space: 10.978GB, frag space: 0.671GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.423GB, frag space: 0.000GB
2204
INFO 09-18 07:17:38 multiproc_gpu_executor.py:171] After scale out, num_gpu_blocks: #3168
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:38 worker.py:202] extend gpu in worker takes: 0.35s
INFO 09-18 07:17:38 worker.py:202] extend gpu in worker takes: 0.36s
INFO 09-18 07:17:38 llm_engine.py:773] Finished liquid for 16 times, output: Completed! Move shard: [2, 3] from [0] to [1];liquid e2e latency: 0.87s, update worker latency: 0.00s, liquid model weights latency: 0.34s, init mem latency: 0.00s, liquid kvc latency: 0.16s, extending gpu blocks latency: 0.36s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 19.116 GB, reserved space on GPU 0: 19.859 GB, free space: 2.297GB, frag space: 0.743GB, current gpu block: #3168
INFO 09-18 07:17:38 llm_engine.py:759] ---------Start 17'th liquid: Scale out from GPU[0,1] to GPU[2,3]---------
INFO 09-18 07:17:38 multiproc_gpu_executor.py:258] Start to do liquid from src: 0 to dst: 2 with shard_ids: [1]
INFO 09-18 07:17:38 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 19.116 GB, reserved space on GPU 0: 19.861 GB, free space: 2.295GB, frag space: 0.745GB, [(898899, 21.36328125)]
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:38 worker.py:165] Before appending weights shards, allocated space on GPU 2: 0.525 GB, reserved space on GPU 2: 0.539 GB, free space: 22.000GB, frag space: 0.014GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:38 worker.py:168] After recving weights shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 3.887 GB, free space: 18.652GB, frag space: 0.265GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:38 worker.py:171] After appending weights shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
It takes: 0.24s to send model shards
INFO 09-18 07:17:38 worker.py:152] send weights shards takes: 0.27s, sent out: 3.10GB, sent bw: 11.54GB/s
INFO 09-18 07:17:38 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 16.020 GB, reserved space on GPU 0: 16.379 GB, free space: 5.777GB, frag space: 0.359GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.860GB, frag space: 0.000GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:38 worker.py:215] Before recv kvc shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:39 worker.py:217] After recv kvc shards, allocated space on GPU 2: 9.809 GB, reserved space on GPU 2: 10.852 GB, free space: 11.687GB, frag space: 1.043GB
INFO 09-18 07:17:39 cache_engine.py:159] send kvc shards takes: 0.42s, sent out: 6.19GB, sent bw: 14.76GB/s
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:39 worker.py:223] After appending kvc shards, allocated space on GPU 2: 9.809 GB, reserved space on GPU 2: 11.045 GB, free space: 11.494GB, frag space: 1.236GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:39 worker.py:228] After appending kvc shards, allocated space on GPU 2: 9.809 GB, reserved space on GPU 2: 9.895 GB, free space: 12.644GB, frag space: 0.086GB
INFO 09-18 07:17:39 cache_engine.py:165] Successfully send kv cache shards: [1] to rank: 2
INFO 09-18 07:17:39 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 9.832 GB, reserved space on GPU 0: 10.191 GB, free space: 11.964GB, frag space: 0.359GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.860GB, frag space: 0.000GB
INFO 09-18 07:17:39 multiproc_gpu_executor.py:258] Start to do liquid from src: 1 to dst: 3 with shard_ids: [3]
INFO 09-18 07:17:39 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 9.832 GB, reserved space on GPU 0: 10.191 GB, free space: 11.964GB, frag space: 0.359GB, [(898899, 11.693359375)]
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:39 worker.py:165] Before appending weights shards, allocated space on GPU 3: 0.525 GB, reserved space on GPU 3: 0.539 GB, free space: 22.000GB, frag space: 0.014GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:39 worker.py:168] After recving weights shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 3.887 GB, free space: 18.652GB, frag space: 0.265GB
(VllmWorkerProcess pid=898982) It takes: 0.24s to send model shards
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:39 worker.py:152] send weights shards takes: 0.47s, sent out: 3.10GB, sent bw: 6.63GB/s
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:39 worker.py:171] After appending weights shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
INFO 09-18 07:17:39 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 9.832 GB, reserved space on GPU 0: 10.191 GB, free space: 11.964GB, frag space: 0.359GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 5.846GB, frag space: 0.000GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:39 worker.py:215] Before recv kvc shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:40 cache_engine.py:159] send kvc shards takes: 0.41s, sent out: 6.19GB, sent bw: 15.14GB/s
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:40 worker.py:217] After recv kvc shards, allocated space on GPU 3: 9.809 GB, reserved space on GPU 3: 10.852 GB, free space: 11.687GB, frag space: 1.043GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:40 worker.py:223] After appending kvc shards, allocated space on GPU 3: 9.809 GB, reserved space on GPU 3: 11.045 GB, free space: 11.494GB, frag space: 1.236GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:40 worker.py:228] After appending kvc shards, allocated space on GPU 3: 9.809 GB, reserved space on GPU 3: 9.895 GB, free space: 12.644GB, frag space: 0.086GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:40 cache_engine.py:165] Successfully send kv cache shards: [3] to rank: 3
INFO 09-18 07:17:40 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 9.832 GB, reserved space on GPU 0: 10.191 GB, free space: 11.964GB, frag space: 0.359GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 12.034GB, frag space: 0.000GB
INFO 09-18 07:17:40 multiproc_gpu_executor.py:189] After scale out, num_gpu_blocks: #8081
INFO 09-18 07:17:40 worker.py:202] extend gpu in worker takes: 0.39s
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:40 worker.py:202] extend gpu in worker takes: 0.40s
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:40 worker.py:202] extend gpu in worker takes: 0.40s
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:40 worker.py:202] extend gpu in worker takes: 0.41s
INFO 09-18 07:17:40 llm_engine.py:773] Finished liquid for 17 times, output: Completed! Move shard: [1, 3] from [0, 1] to [2, 3];liquid e2e latency: 2.31s, update worker latency: 0.01s, liquid model weights latency: 1.34s, init mem latency: 0.00s, liquid kvc latency: 0.48s, extending gpu blocks latency: 0.41s, update blocks latency: 0.07s;, current mem info on GPU0: allocated space on GPU 0: 19.457 GB, reserved space on GPU 0: 19.816 GB, free space: 2.339GB, frag space: 0.359GB, current gpu block: #8081
INFO 09-18 07:17:40 llm_engine.py:759] ---------Start 18'th liquid: Scale in from GPU[2,3] to GPU[0,1]---------
INFO 09-18 07:17:40 multiproc_gpu_executor.py:242] Shrink to: #3168, currently using blocks: #2
INFO 09-18 07:17:40 multiproc_gpu_executor.py:243] Before move and shrink: allocated space on GPU 0: 19.457 GB, reserved space on GPU 0: 19.818 GB, free space: 2.338GB, frag space: 0.361GB
INFO 09-18 07:17:41 multiproc_gpu_executor.py:245] After move and shrink: allocated space on GPU 0: 9.832 GB, reserved space on GPU 0: 10.191 GB, free space: 11.964GB, frag space: 0.359GB
INFO 09-18 07:17:41 multiproc_gpu_executor.py:258] Start to do liquid from src: 2 to dst: 0 with shard_ids: [1]
INFO 09-18 07:17:41 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 9.832 GB, reserved space on GPU 0: 10.191 GB, free space: 11.964GB, frag space: 0.359GB, [(898899, 11.693359375)]
INFO 09-18 07:17:41 worker.py:165] Before appending weights shards, allocated space on GPU 0: 9.832 GB, reserved space on GPU 0: 10.191 GB, free space: 11.964GB, frag space: 0.359GB
INFO 09-18 07:17:41 worker.py:168] After recving weights shards, allocated space on GPU 0: 12.929 GB, reserved space on GPU 0: 13.164 GB, free space: 8.992GB, frag space: 0.235GB
INFO 09-18 07:17:41 worker.py:171] After appending weights shards, allocated space on GPU 0: 12.929 GB, reserved space on GPU 0: 16.404 GB, free space: 5.752GB, frag space: 3.476GB
(VllmWorkerProcess pid=898983) It takes: 0.33s to send model shards
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:41 worker.py:152] send weights shards takes: 0.36s, sent out: 3.10GB, sent bw: 8.72GB/s
INFO 09-18 07:17:41 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 12.929 GB, reserved space on GPU 0: 16.404 GB, free space: 5.752GB, frag space: 3.476GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 12.034GB, frag space: 0.000GB
INFO 09-18 07:17:41 worker.py:215] Before recv kvc shards, allocated space on GPU 0: 12.929 GB, reserved space on GPU 0: 16.404 GB, free space: 5.752GB, frag space: 3.476GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:41 cache_engine.py:159] send kvc shards takes: 0.44s, sent out: 6.19GB, sent bw: 14.08GB/s
INFO 09-18 07:17:41 worker.py:217] After recv kvc shards, allocated space on GPU 0: 19.116 GB, reserved space on GPU 0: 19.584 GB, free space: 2.572GB, frag space: 0.468GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:41 cache_engine.py:165] Successfully send kv cache shards: [1] to rank: 0
INFO 09-18 07:17:42 worker.py:223] After appending kvc shards, allocated space on GPU 0: 19.116 GB, reserved space on GPU 0: 20.357 GB, free space: 1.798GB, frag space: 1.241GB
INFO 09-18 07:17:42 worker.py:228] After appending kvc shards, allocated space on GPU 0: 19.116 GB, reserved space on GPU 0: 19.584 GB, free space: 2.572GB, frag space: 0.468GB
INFO 09-18 07:17:42 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 19.116 GB, reserved space on GPU 0: 19.584 GB, free space: 2.572GB, frag space: 0.468GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 12.034GB, frag space: 0.000GB
INFO 09-18 07:17:42 multiproc_gpu_executor.py:258] Start to do liquid from src: 3 to dst: 1 with shard_ids: [3]
INFO 09-18 07:17:42 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 19.116 GB, reserved space on GPU 0: 19.584 GB, free space: 2.572GB, frag space: 0.468GB, [(898899, 21.0859375)]
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:42 worker.py:165] Before appending weights shards, allocated space on GPU 1: 9.832 GB, reserved space on GPU 1: 10.008 GB, free space: 12.034GB, frag space: 0.175GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:42 worker.py:168] After recving weights shards, allocated space on GPU 1: 12.929 GB, reserved space on GPU 1: 13.184 GB, free space: 8.858GB, frag space: 0.255GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:42 worker.py:171] After appending weights shards, allocated space on GPU 1: 12.929 GB, reserved space on GPU 1: 15.986 GB, free space: 6.055GB, frag space: 3.058GB
(VllmWorkerProcess pid=898984) It takes: 0.24s to send model shards
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:42 worker.py:152] send weights shards takes: 0.27s, sent out: 3.10GB, sent bw: 11.60GB/s
INFO 09-18 07:17:42 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 19.116 GB, reserved space on GPU 0: 19.584 GB, free space: 2.572GB, frag space: 0.468GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 6.055GB, frag space: 0.000GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:42 worker.py:215] Before recv kvc shards, allocated space on GPU 1: 12.929 GB, reserved space on GPU 1: 15.986 GB, free space: 6.055GB, frag space: 3.058GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:43 cache_engine.py:159] send kvc shards takes: 0.41s, sent out: 6.19GB, sent bw: 14.96GB/s
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:43 worker.py:217] After recv kvc shards, allocated space on GPU 1: 19.116 GB, reserved space on GPU 1: 19.264 GB, free space: 2.778GB, frag space: 0.147GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:43 cache_engine.py:165] Successfully send kv cache shards: [3] to rank: 1
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:43 worker.py:223] After appending kvc shards, allocated space on GPU 1: 19.116 GB, reserved space on GPU 1: 20.811 GB, free space: 1.231GB, frag space: 1.694GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:43 worker.py:228] After appending kvc shards, allocated space on GPU 1: 19.116 GB, reserved space on GPU 1: 19.264 GB, free space: 2.778GB, frag space: 0.147GB
INFO 09-18 07:17:43 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 19.116 GB, reserved space on GPU 0: 19.584 GB, free space: 2.572GB, frag space: 0.468GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.778GB, frag space: 0.000GB
INFO 09-18 07:17:43 llm_engine.py:773] Finished liquid for 18 times, output: Completed! Move shard: [1, 3] from [2, 3] to [0, 1];liquid e2e latency: 2.71s, move and shrink latency: -1726640260.92s, update worker latency: 1726640260.93s, liquid model weights latency: 2.00s, init mem latency: 0.00s, liquid kvc latency: 0.70s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 19.116 GB, reserved space on GPU 0: 19.584 GB, free space: 2.572GB, frag space: 0.468GB, current gpu block: #3168
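The move-and-shrink and update-worker latencies on the summary line above do not look like real durations: they are equal and opposite to within 0.01 s, and their magnitude (about 1.7266e9 s) decodes as a Unix timestamp on 18 September 2024, the date of this run. The per-stage values still add up to the reported e2e latency of 2.71 s. This is consistent with the timestamp marking the end of move-and-shrink never being recorded on the GPU[2,3] to GPU[0,1] scale-in path, so that it reads as 0; that cause is an assumption, since the timing code is not part of this log. The same pattern recurs on the 22nd liquid's summary. A small Python sketch (the helper name `latencies` is hypothetical) that parses the fields and checks this:

```python
import re
from datetime import datetime, timezone

# Matches "<stage> latency: <seconds>s" fields on a "Finished liquid" summary line.
# The field layout is copied from this log; the regex itself is illustrative.
LAT_RE = re.compile(r"([a-z0-9 ]+?) latency: (-?[\d.]+)s")

def latencies(line):
    """Map each stage label (e.g. 'liquid e2e') to its logged value in seconds."""
    return {label.strip(): float(value) for label, value in LAT_RE.findall(line)}

line = ("liquid e2e latency: 2.71s, move and shrink latency: -1726640260.92s, "
        "update worker latency: 1726640260.93s, liquid model weights latency: 2.00s, "
        "init mem latency: 0.00s, liquid kvc latency: 0.70s, update blocks latency: 0.00s")
lat = latencies(line)

# The two suspicious values cancel out, leaving a plausible duration for the pair.
pair = lat["move and shrink"] + lat["update worker"]
print(f"move and shrink + update worker: {pair:.2f}s")

# All stages except e2e should sum to the e2e figure.
stages = sum(v for k, v in lat.items() if k != "liquid e2e")
print(f"sum of stages: {stages:.2f}s vs e2e: {lat['liquid e2e']:.2f}s")

# Assumption: the large magnitude is a wall-clock (epoch-seconds) timestamp.
print(datetime.fromtimestamp(lat["update worker"], tz=timezone.utc))
```

Because the two bogus values are large and nearly cancel, any tool that averages per-stage latencies across runs would be badly skewed by them; summing the pair, as above, recovers a usable figure for that combined step.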
INFO 09-18 07:17:43 llm_engine.py:759] ---------Start 19'th liquid: Scale in from GPU1 to GPU 0---------
INFO 09-18 07:17:43 multiproc_gpu_executor.py:216] Shrink to: #964, currently using blocks: #2
INFO 09-18 07:17:43 multiproc_gpu_executor.py:217] Before move and shrink: on GPU: allocated space on GPU 0: 19.116 GB, reserved space on GPU 0: 19.586 GB, free space: 2.570GB, frag space: 0.470GB
INFO 09-18 07:17:43 multiproc_gpu_executor.py:219] After move and shrink: allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 11.021 GB, free space: 11.134GB, frag space: 0.515GB
INFO 09-18 07:17:43 multiproc_gpu_executor.py:258] Start to do liquid from src: 1 to dst: 0 with shard_ids: [2, 3]
INFO 09-18 07:17:43 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 11.021 GB, free space: 11.134GB, frag space: 0.515GB, [(898899, 12.5234375)]
INFO 09-18 07:17:43 worker.py:165] Before appending weights shards, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 11.021 GB, free space: 11.134GB, frag space: 0.515GB
INFO 09-18 07:17:44 worker.py:168] After recving weights shards, allocated space on GPU 0: 16.700 GB, reserved space on GPU 0: 16.902 GB, free space: 5.254GB, frag space: 0.203GB
(VllmWorkerProcess pid=898982) It takes: 0.32s to send model shards
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:44 worker.py:152] send weights shards takes: 0.35s, sent out: 6.19GB, sent bw: 17.53GB/s
INFO 09-18 07:17:44 worker.py:171] After appending weights shards, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 21.100 GB, free space: 1.056GB, frag space: 4.399GB
INFO 09-18 07:17:44 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 21.100 GB, free space: 1.056GB, frag space: 4.399GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 17.669GB, frag space: 0.000GB
INFO 09-18 07:17:44 worker.py:215] Before recv kvc shards, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 21.100 GB, free space: 1.056GB, frag space: 4.399GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:44 cache_engine.py:159] send kvc shards takes: 0.20s, sent out: 3.77GB, sent bw: 18.96GB/s
INFO 09-18 07:17:44 worker.py:217] After recv kvc shards, allocated space on GPU 0: 20.466 GB, reserved space on GPU 0: 21.041 GB, free space: 1.115GB, frag space: 0.575GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:44 cache_engine.py:165] Successfully send kv cache shards: [2, 3] to rank: 0
INFO 09-18 07:17:45 worker.py:223] After appending kvc shards, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 21.932 GB, free space: 0.224GB, frag space: 1.434GB
INFO 09-18 07:17:45 worker.py:228] After appending kvc shards, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.979 GB, free space: 1.177GB, frag space: 0.481GB
INFO 09-18 07:17:45 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.979 GB, free space: 1.177GB, frag space: 0.481GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 21.481GB, frag space: 0.000GB
INFO 09-18 07:17:45 llm_engine.py:773] Finished liquid for 19 times, output: Completed! Move shard: [2, 3] from [1] to [0];liquid e2e latency: 1.36s, move and shrink latency: 0.11s, update worker latency: -0.11s, liquid model weights latency: 0.61s, init mem latency: 0.00s, liquid kvc latency: 0.74s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.979 GB, free space: 1.177GB, frag space: 0.481GB, current gpu block: #964
INFO 09-18 07:17:45 llm_engine.py:759] ---------Start 20'th liquid: Scale out from GPU0 to GPU1---------
INFO 09-18 07:17:45 multiproc_gpu_executor.py:258] Start to do liquid from src: 0 to dst: 1 with shard_ids: [2, 3]
INFO 09-18 07:17:45 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.980 GB, free space: 1.175GB, frag space: 0.483GB, [(898899, 22.482421875)]
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:45 worker.py:165] Before appending weights shards, allocated space on GPU 1: 0.548 GB, reserved space on GPU 1: 0.561 GB, free space: 21.481GB, frag space: 0.012GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:45 worker.py:168] After recving weights shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 6.754 GB, free space: 15.288GB, frag space: 0.013GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:45 worker.py:171] After appending weights shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 8.307 GB, free space: 13.735GB, frag space: 1.565GB
It takes: 0.22s to send model shards
INFO 09-18 07:17:45 worker.py:152] send weights shards takes: 0.26s, sent out: 6.19GB, sent bw: 23.41GB/s
INFO 09-18 07:17:45 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 14.304 GB, reserved space on GPU 0: 14.875 GB, free space: 7.281GB, frag space: 0.571GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 13.735GB, frag space: 0.000GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:45 worker.py:215] Before recv kvc shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 8.307 GB, free space: 13.735GB, frag space: 1.565GB
INFO 09-18 07:17:45 cache_engine.py:159] send kvc shards takes: 0.11s, sent out: 3.77GB, sent bw: 33.53GB/s
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:45 worker.py:217] After recv kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 12.119 GB, free space: 9.923GB, frag space: 1.612GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:45 worker.py:223] After appending kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 12.238 GB, free space: 9.804GB, frag space: 1.731GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:45 worker.py:228] After appending kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 10.619 GB, free space: 11.423GB, frag space: 0.112GB
INFO 09-18 07:17:45 cache_engine.py:165] Successfully send kv cache shards: [2, 3] to rank: 1
INFO 09-18 07:17:45 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 11.006 GB, free space: 11.150GB, frag space: 0.499GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.423GB, frag space: 0.000GB
2248
INFO 09-18 07:17:45 multiproc_gpu_executor.py:171] After scale out, num_gpu_blocks: #3212
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:45 worker.py:202] extend gpu in worker takes: 0.36s
INFO 09-18 07:17:45 worker.py:202] extend gpu in worker takes: 0.37s
INFO 09-18 07:17:45 llm_engine.py:773] Finished liquid for 20 times, output: Completed! Move shard: [2, 3] from [0] to [1];liquid e2e latency: 0.84s, update worker latency: 0.00s, liquid model weights latency: 0.30s, init mem latency: 0.00s, liquid kvc latency: 0.16s, extending gpu blocks latency: 0.38s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 19.304 GB, reserved space on GPU 0: 19.875 GB, free space: 2.281GB, frag space: 0.571GB, current gpu block: #3212
INFO 09-18 07:17:45 llm_engine.py:759] ---------Start 21'th liquid: Scale out from GPU[0,1] to GPU[2,3]---------
INFO 09-18 07:17:45 multiproc_gpu_executor.py:258] Start to do liquid from src: 0 to dst: 2 with shard_ids: [1]
INFO 09-18 07:17:46 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 19.304 GB, reserved space on GPU 0: 19.877 GB, free space: 2.279GB, frag space: 0.573GB, [(898899, 21.37890625)]
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:46 worker.py:165] Before appending weights shards, allocated space on GPU 2: 0.525 GB, reserved space on GPU 2: 0.539 GB, free space: 22.000GB, frag space: 0.014GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:46 worker.py:168] After recving weights shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 3.887 GB, free space: 18.652GB, frag space: 0.265GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:46 worker.py:171] After appending weights shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
It takes: 0.26s to send model shards
INFO 09-18 07:17:46 worker.py:152] send weights shards takes: 0.29s, sent out: 3.10GB, sent bw: 10.70GB/s
INFO 09-18 07:17:46 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 16.207 GB, reserved space on GPU 0: 16.629 GB, free space: 5.527GB, frag space: 0.422GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.673GB, frag space: 0.000GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:46 worker.py:215] Before recv kvc shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
INFO 09-18 07:17:46 cache_engine.py:159] send kvc shards takes: 0.42s, sent out: 6.27GB, sent bw: 14.77GB/s
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:46 worker.py:217] After recv kvc shards, allocated space on GPU 2: 9.895 GB, reserved space on GPU 2: 10.977 GB, free space: 11.562GB, frag space: 1.082GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:46 worker.py:223] After appending kvc shards, allocated space on GPU 2: 9.895 GB, reserved space on GPU 2: 11.174 GB, free space: 11.365GB, frag space: 1.279GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:46 worker.py:228] After appending kvc shards, allocated space on GPU 2: 9.895 GB, reserved space on GPU 2: 10.020 GB, free space: 12.519GB, frag space: 0.125GB
INFO 09-18 07:17:46 cache_engine.py:165] Successfully send kv cache shards: [1] to rank: 2
INFO 09-18 07:17:46 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 9.918 GB, reserved space on GPU 0: 10.379 GB, free space: 11.777GB, frag space: 0.461GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.673GB, frag space: 0.000GB
INFO 09-18 07:17:46 multiproc_gpu_executor.py:258] Start to do liquid from src: 1 to dst: 3 with shard_ids: [3]
INFO 09-18 07:17:46 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 9.918 GB, reserved space on GPU 0: 10.379 GB, free space: 11.777GB, frag space: 0.461GB, [(898899, 11.880859375)]
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:46 worker.py:165] Before appending weights shards, allocated space on GPU 3: 0.525 GB, reserved space on GPU 3: 0.539 GB, free space: 22.000GB, frag space: 0.014GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:47 worker.py:168] After recving weights shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 3.887 GB, free space: 18.652GB, frag space: 0.265GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:47 worker.py:171] After appending weights shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
(VllmWorkerProcess pid=898982) It takes: 0.25s to send model shards
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:47 worker.py:152] send weights shards takes: 0.48s, sent out: 3.10GB, sent bw: 6.45GB/s
INFO 09-18 07:17:47 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 9.918 GB, reserved space on GPU 0: 10.379 GB, free space: 11.777GB, frag space: 0.461GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 5.659GB, frag space: 0.000GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:47 worker.py:215] Before recv kvc shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:47 cache_engine.py:159] send kvc shards takes: 0.41s, sent out: 6.27GB, sent bw: 15.23GB/s
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:47 worker.py:217] After recv kvc shards, allocated space on GPU 3: 9.895 GB, reserved space on GPU 3: 10.977 GB, free space: 11.562GB, frag space: 1.082GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:47 worker.py:223] After appending kvc shards, allocated space on GPU 3: 9.895 GB, reserved space on GPU 3: 11.174 GB, free space: 11.365GB, frag space: 1.279GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:47 worker.py:228] After appending kvc shards, allocated space on GPU 3: 9.895 GB, reserved space on GPU 3: 10.020 GB, free space: 12.519GB, frag space: 0.125GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:47 cache_engine.py:165] Successfully send kv cache shards: [3] to rank: 3
INFO 09-18 07:17:47 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 9.918 GB, reserved space on GPU 0: 10.379 GB, free space: 11.777GB, frag space: 0.461GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.909GB, frag space: 0.000GB
INFO 09-18 07:17:47 multiproc_gpu_executor.py:189] After scale out, num_gpu_blocks: #8029
INFO 09-18 07:17:48 worker.py:202] extend gpu in worker takes: 0.37s
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:48 worker.py:202] extend gpu in worker takes: 0.39s
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:48 worker.py:202] extend gpu in worker takes: 0.41s
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:48 worker.py:202] extend gpu in worker takes: 0.41s
INFO 09-18 07:17:48 llm_engine.py:773] Finished liquid for 21 times, output: Completed! Move shard: [1, 3] from [0, 1] to [2, 3];liquid e2e latency: 2.39s, update worker latency: 0.01s, liquid model weights latency: 1.47s, init mem latency: 0.00s, liquid kvc latency: 0.49s, extending gpu blocks latency: 0.42s, update blocks latency: 0.01s;, current mem info on GPU0: allocated space on GPU 0: 19.332 GB, reserved space on GPU 0: 19.754 GB, free space: 2.402GB, frag space: 0.422GB, current gpu block: #8029
INFO 09-18 07:17:48 llm_engine.py:759] ---------Start 22'th liquid: Scale in from GPU[2,3] to GPU[0,1]---------
INFO 09-18 07:17:48 multiproc_gpu_executor.py:242] Shrink to: #3212, currently using blocks: #2
INFO 09-18 07:17:48 multiproc_gpu_executor.py:243] Before move and shrink: allocated space on GPU 0: 19.332 GB, reserved space on GPU 0: 19.756 GB, free space: 2.400GB, frag space: 0.423GB
INFO 09-18 07:17:48 multiproc_gpu_executor.py:245] After move and shrink: allocated space on GPU 0: 9.918 GB, reserved space on GPU 0: 10.379 GB, free space: 11.777GB, frag space: 0.461GB
INFO 09-18 07:17:48 multiproc_gpu_executor.py:258] Start to do liquid from src: 2 to dst: 0 with shard_ids: [1]
INFO 09-18 07:17:48 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 9.918 GB, reserved space on GPU 0: 10.379 GB, free space: 11.777GB, frag space: 0.461GB, [(898899, 11.880859375)]
INFO 09-18 07:17:48 worker.py:165] Before appending weights shards, allocated space on GPU 0: 9.918 GB, reserved space on GPU 0: 10.379 GB, free space: 11.777GB, frag space: 0.461GB
INFO 09-18 07:17:48 worker.py:168] After recving weights shards, allocated space on GPU 0: 13.015 GB, reserved space on GPU 0: 13.285 GB, free space: 8.871GB, frag space: 0.270GB
INFO 09-18 07:17:48 worker.py:171] After appending weights shards, allocated space on GPU 0: 13.015 GB, reserved space on GPU 0: 16.416 GB, free space: 5.740GB, frag space: 3.401GB
(VllmWorkerProcess pid=898983) It takes: 0.35s to send model shards
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:48 worker.py:152] send weights shards takes: 0.37s, sent out: 3.10GB, sent bw: 8.28GB/s
INFO 09-18 07:17:48 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 13.015 GB, reserved space on GPU 0: 16.416 GB, free space: 5.740GB, frag space: 3.401GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.909GB, frag space: 0.000GB
INFO 09-18 07:17:48 worker.py:215] Before recv kvc shards, allocated space on GPU 0: 13.015 GB, reserved space on GPU 0: 16.416 GB, free space: 5.740GB, frag space: 3.401GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:49 cache_engine.py:159] send kvc shards takes: 0.45s, sent out: 6.27GB, sent bw: 14.07GB/s
INFO 09-18 07:17:49 worker.py:217] After recv kvc shards, allocated space on GPU 0: 19.288 GB, reserved space on GPU 0: 19.732 GB, free space: 2.423GB, frag space: 0.444GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:49 cache_engine.py:165] Successfully send kv cache shards: [1] to rank: 0
INFO 09-18 07:17:50 worker.py:223] After appending kvc shards, allocated space on GPU 0: 19.304 GB, reserved space on GPU 0: 20.459 GB, free space: 1.697GB, frag space: 1.155GB
INFO 09-18 07:17:50 worker.py:228] After appending kvc shards, allocated space on GPU 0: 19.304 GB, reserved space on GPU 0: 19.670 GB, free space: 2.486GB, frag space: 0.366GB
INFO 09-18 07:17:50 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 19.304 GB, reserved space on GPU 0: 19.670 GB, free space: 2.486GB, frag space: 0.366GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.909GB, frag space: 0.000GB
INFO 09-18 07:17:50 multiproc_gpu_executor.py:258] Start to do liquid from src: 3 to dst: 1 with shard_ids: [3]
INFO 09-18 07:17:50 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 19.304 GB, reserved space on GPU 0: 19.670 GB, free space: 2.486GB, frag space: 0.366GB, [(898899, 21.171875)]
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:50 worker.py:165] Before appending weights shards, allocated space on GPU 1: 9.918 GB, reserved space on GPU 1: 10.133 GB, free space: 11.909GB, frag space: 0.215GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:50 worker.py:168] After recving weights shards, allocated space on GPU 1: 13.015 GB, reserved space on GPU 1: 13.309 GB, free space: 8.733GB, frag space: 0.294GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:50 worker.py:171] After appending weights shards, allocated space on GPU 1: 13.015 GB, reserved space on GPU 1: 16.111 GB, free space: 5.930GB, frag space: 3.097GB
(VllmWorkerProcess pid=898984) It takes: 0.26s to send model shards
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:50 worker.py:152] send weights shards takes: 0.28s, sent out: 3.10GB, sent bw: 10.90GB/s
INFO 09-18 07:17:50 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 19.304 GB, reserved space on GPU 0: 19.670 GB, free space: 2.486GB, frag space: 0.366GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 5.930GB, frag space: 0.000GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:50 worker.py:215] Before recv kvc shards, allocated space on GPU 1: 13.015 GB, reserved space on GPU 1: 16.111 GB, free space: 5.930GB, frag space: 3.097GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:50 cache_engine.py:159] send kvc shards takes: 0.42s, sent out: 6.27GB, sent bw: 14.92GB/s
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:50 worker.py:217] After recv kvc shards, allocated space on GPU 1: 19.288 GB, reserved space on GPU 1: 19.514 GB, free space: 2.528GB, frag space: 0.226GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:50 cache_engine.py:165] Successfully send kv cache shards: [3] to rank: 1
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:51 worker.py:223] After appending kvc shards, allocated space on GPU 1: 19.304 GB, reserved space on GPU 1: 20.240 GB, free space: 1.802GB, frag space: 0.936GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:51 worker.py:228] After appending kvc shards, allocated space on GPU 1: 19.304 GB, reserved space on GPU 1: 19.451 GB, free space: 2.591GB, frag space: 0.147GB
INFO 09-18 07:17:51 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 19.304 GB, reserved space on GPU 0: 19.670 GB, free space: 2.486GB, frag space: 0.366GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.591GB, frag space: 0.000GB
INFO 09-18 07:17:51 llm_engine.py:773] Finished liquid for 22 times, output: Completed! Move shard: [1, 3] from [2, 3] to [0, 1];liquid e2e latency: 2.78s, move and shrink latency: -1726640268.38s, update worker latency: 1726640268.39s, liquid model weights latency: 2.04s, init mem latency: 0.00s, liquid kvc latency: 0.73s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 19.304 GB, reserved space on GPU 0: 19.670 GB, free space: 2.486GB, frag space: 0.366GB, current gpu block: #3212
INFO 09-18 07:17:51 llm_engine.py:759] ---------Start 23'th liquid: Scale in from GPU1 to GPU 0---------
INFO 09-18 07:17:51 multiproc_gpu_executor.py:216] Shrink to: #964, currently using blocks: #2
INFO 09-18 07:17:51 multiproc_gpu_executor.py:217] Before move and shrink: on GPU: allocated space on GPU 0: 19.304 GB, reserved space on GPU 0: 19.672 GB, free space: 2.484GB, frag space: 0.368GB
INFO 09-18 07:17:51 multiproc_gpu_executor.py:219] After move and shrink: allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 10.920 GB, free space: 11.236GB, frag space: 0.413GB
INFO 09-18 07:17:51 multiproc_gpu_executor.py:258] Start to do liquid from src: 1 to dst: 0 with shard_ids: [2, 3]
INFO 09-18 07:17:51 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 10.920 GB, free space: 11.236GB, frag space: 0.413GB, [(898899, 12.421875)]
INFO 09-18 07:17:51 worker.py:165] Before appending weights shards, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 10.920 GB, free space: 11.236GB, frag space: 0.413GB
INFO 09-18 07:17:51 worker.py:168] After recving weights shards, allocated space on GPU 0: 16.700 GB, reserved space on GPU 0: 16.895 GB, free space: 5.261GB, frag space: 0.195GB
(VllmWorkerProcess pid=898982) It takes: 0.32s to send model shards
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:51 worker.py:152] send weights shards takes: 0.35s, sent out: 6.19GB, sent bw: 17.62GB/s
INFO 09-18 07:17:51 worker.py:171] After appending weights shards, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 20.592 GB, free space: 1.564GB, frag space: 3.891GB
INFO 09-18 07:17:51 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 20.592 GB, free space: 1.564GB, frag space: 3.891GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 17.669GB, frag space: 0.000GB
INFO 09-18 07:17:51 worker.py:215] Before recv kvc shards, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 20.592 GB, free space: 1.564GB, frag space: 3.891GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:52 cache_engine.py:159] send kvc shards takes: 0.18s, sent out: 3.77GB, sent bw: 20.92GB/s
INFO 09-18 07:17:52 worker.py:217] After recv kvc shards, allocated space on GPU 0: 20.466 GB, reserved space on GPU 0: 21.062 GB, free space: 1.093GB, frag space: 0.596GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:52 cache_engine.py:165] Successfully send kv cache shards: [2, 3] to rank: 0
INFO 09-18 07:17:52 worker.py:223] After appending kvc shards, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 21.947 GB, free space: 0.209GB, frag space: 1.450GB
INFO 09-18 07:17:52 worker.py:228] After appending kvc shards, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.994 GB, free space: 1.162GB, frag space: 0.497GB
INFO 09-18 07:17:52 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.994 GB, free space: 1.162GB, frag space: 0.497GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 21.481GB, frag space: 0.000GB
INFO 09-18 07:17:52 llm_engine.py:773] Finished liquid for 23 times, output: Completed! Move shard: [2, 3] from [1] to [0];liquid e2e latency: 1.35s, move and shrink latency: 0.11s, update worker latency: -0.11s, liquid model weights latency: 0.63s, init mem latency: 0.00s, liquid kvc latency: 0.72s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.994 GB, free space: 1.162GB, frag space: 0.497GB, current gpu block: #964
INFO 09-18 07:17:52 llm_engine.py:759] ---------Start 24'th liquid: Scale out from GPU0 to GPU1---------
INFO 09-18 07:17:52 multiproc_gpu_executor.py:258] Start to do liquid from src: 0 to dst: 1 with shard_ids: [2, 3]
INFO 09-18 07:17:52 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.996 GB, free space: 1.160GB, frag space: 0.499GB, [(898899, 22.498046875)]
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:52 worker.py:165] Before appending weights shards, allocated space on GPU 1: 0.548 GB, reserved space on GPU 1: 0.561 GB, free space: 21.481GB, frag space: 0.012GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:52 worker.py:168] After recving weights shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 6.754 GB, free space: 15.288GB, frag space: 0.013GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:52 worker.py:171] After appending weights shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 8.307 GB, free space: 13.735GB, frag space: 1.565GB
It takes: 0.24s to send model shards
INFO 09-18 07:17:52 worker.py:152] send weights shards takes: 0.29s, sent out: 6.19GB, sent bw: 21.59GB/s
INFO 09-18 07:17:52 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 14.304 GB, reserved space on GPU 0: 15.016 GB, free space: 7.140GB, frag space: 0.712GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 13.735GB, frag space: 0.000GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:52 worker.py:215] Before recv kvc shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 8.307 GB, free space: 13.735GB, frag space: 1.565GB
INFO 09-18 07:17:53 cache_engine.py:159] send kvc shards takes: 0.11s, sent out: 3.77GB, sent bw: 33.65GB/s
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:53 worker.py:217] After recv kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 12.119 GB, free space: 9.923GB, frag space: 1.612GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:53 worker.py:223] After appending kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 12.238 GB, free space: 9.804GB, frag space: 1.731GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:53 worker.py:228] After appending kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 10.619 GB, free space: 11.423GB, frag space: 0.112GB
INFO 09-18 07:17:53 cache_engine.py:165] Successfully send kv cache shards: [2, 3] to rank: 1
INFO 09-18 07:17:53 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 11.146 GB, free space: 11.009GB, frag space: 0.640GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.423GB, frag space: 0.000GB
2212
INFO 09-18 07:17:53 multiproc_gpu_executor.py:171] After scale out, num_gpu_blocks: #3176
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:53 worker.py:202] extend gpu in worker takes: 0.36s
INFO 09-18 07:17:53 worker.py:202] extend gpu in worker takes: 0.36s
INFO 09-18 07:17:53 llm_engine.py:773] Finished liquid for 24 times, output: Completed! Move shard: [2, 3] from [0] to [1];liquid e2e latency: 0.87s, update worker latency: 0.00s, liquid model weights latency: 0.34s, init mem latency: 0.00s, liquid kvc latency: 0.16s, extending gpu blocks latency: 0.37s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 19.179 GB, reserved space on GPU 0: 19.891 GB, free space: 2.265GB, frag space: 0.712GB, current gpu block: #3176
INFO 09-18 07:17:53 llm_engine.py:759] ---------Start 25'th liquid: Scale out from GPU[0,1] to GPU[2,3]---------
INFO 09-18 07:17:53 multiproc_gpu_executor.py:258] Start to do liquid from src: 0 to dst: 2 with shard_ids: [1]
INFO 09-18 07:17:53 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 19.179 GB, reserved space on GPU 0: 19.893 GB, free space: 2.263GB, frag space: 0.714GB, [(898899, 21.39453125)]
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:53 worker.py:165] Before appending weights shards, allocated space on GPU 2: 0.525 GB, reserved space on GPU 2: 0.539 GB, free space: 22.000GB, frag space: 0.014GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:53 worker.py:168] After recving weights shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 3.887 GB, free space: 18.652GB, frag space: 0.265GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:53 worker.py:171] After appending weights shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
It takes: 0.27s to send model shards
INFO 09-18 07:17:53 worker.py:152] send weights shards takes: 0.30s, sent out: 3.10GB, sent bw: 10.34GB/s
INFO 09-18 07:17:53 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 16.082 GB, reserved space on GPU 0: 16.488 GB, free space: 5.668GB, frag space: 0.406GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.798GB, frag space: 0.000GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:53 worker.py:215] Before recv kvc shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
INFO 09-18 07:17:54 cache_engine.py:159] send kvc shards takes: 0.42s, sent out: 6.20GB, sent bw: 14.77GB/s
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:54 worker.py:217] After recv kvc shards, allocated space on GPU 2: 9.825 GB, reserved space on GPU 2: 10.914 GB, free space: 11.625GB, frag space: 1.090GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:54 worker.py:223] After appending kvc shards, allocated space on GPU 2: 9.825 GB, reserved space on GPU 2: 11.109 GB, free space: 11.429GB, frag space: 1.285GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:54 worker.py:228] After appending kvc shards, allocated space on GPU 2: 9.825 GB, reserved space on GPU 2: 9.957 GB, free space: 12.582GB, frag space: 0.132GB
INFO 09-18 07:17:54 cache_engine.py:165] Successfully send kv cache shards: [1] to rank: 2
INFO 09-18 07:17:54 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 9.848 GB, reserved space on GPU 0: 10.301 GB, free space: 11.855GB, frag space: 0.453GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.798GB, frag space: 0.000GB
INFO 09-18 07:17:54 multiproc_gpu_executor.py:258] Start to do liquid from src: 1 to dst: 3 with shard_ids: [3]
INFO 09-18 07:17:54 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 9.848 GB, reserved space on GPU 0: 10.301 GB, free space: 11.855GB, frag space: 0.453GB, [(898899, 11.802734375)]
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:54 worker.py:165] Before appending weights shards, allocated space on GPU 3: 0.525 GB, reserved space on GPU 3: 0.539 GB, free space: 22.000GB, frag space: 0.014GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:55 worker.py:168] After recving weights shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 3.887 GB, free space: 18.652GB, frag space: 0.265GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:55 worker.py:171] After appending weights shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
(VllmWorkerProcess pid=898982) It takes: 0.57s to send model shards
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:55 worker.py:152] send weights shards takes: 0.60s, sent out: 3.10GB, sent bw: 5.20GB/s
INFO 09-18 07:17:55 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 9.848 GB, reserved space on GPU 0: 10.301 GB, free space: 11.855GB, frag space: 0.453GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 5.784GB, frag space: 0.000GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:55 worker.py:215] Before recv kvc shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:55 cache_engine.py:159] send kvc shards takes: 0.41s, sent out: 6.20GB, sent bw: 15.20GB/s
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:55 worker.py:217] After recv kvc shards, allocated space on GPU 3: 9.825 GB, reserved space on GPU 3: 10.914 GB, free space: 11.625GB, frag space: 1.090GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:55 worker.py:223] After appending kvc shards, allocated space on GPU 3: 9.825 GB, reserved space on GPU 3: 11.109 GB, free space: 11.429GB, frag space: 1.285GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:55 worker.py:228] After appending kvc shards, allocated space on GPU 3: 9.825 GB, reserved space on GPU 3: 9.957 GB, free space: 12.582GB, frag space: 0.132GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:55 cache_engine.py:165] Successfully send kv cache shards: [3] to rank: 3
INFO 09-18 07:17:55 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 9.848 GB, reserved space on GPU 0: 10.301 GB, free space: 11.855GB, frag space: 0.453GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.971GB, frag space: 0.000GB
INFO 09-18 07:17:55 multiproc_gpu_executor.py:189] After scale out, num_gpu_blocks: #8033
INFO 09-18 07:17:55 worker.py:202] extend gpu in worker takes: 0.38s
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:55 worker.py:202] extend gpu in worker takes: 0.39s
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:55 worker.py:202] extend gpu in worker takes: 0.39s
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:55 worker.py:202] extend gpu in worker takes: 0.40s
INFO 09-18 07:17:55 llm_engine.py:773] Finished liquid for 25 times, output: Completed! Move shard: [1, 3] from [0, 1] to [2, 3];liquid e2e latency: 2.46s, update worker latency: 0.01s, liquid model weights latency: 1.55s, init mem latency: 0.00s, liquid kvc latency: 0.49s, extending gpu blocks latency: 0.41s, update blocks latency: 0.01s;, current mem info on GPU0: allocated space on GPU 0: 19.334 GB, reserved space on GPU 0: 19.801 GB, free space: 2.355GB, frag space: 0.466GB, current gpu block: #8033
INFO 09-18 07:17:55 llm_engine.py:759] ---------Start 26'th liquid: Scale in from GPU[2,3] to GPU[0,1]---------
INFO 09-18 07:17:56 multiproc_gpu_executor.py:242] Shrink to: #3176, currently using blocks: #2
INFO 09-18 07:17:56 multiproc_gpu_executor.py:243] Before move and shrink: allocated space on GPU 0: 19.334 GB, reserved space on GPU 0: 19.803 GB, free space: 2.353GB, frag space: 0.468GB
INFO 09-18 07:17:56 multiproc_gpu_executor.py:245] After move and shrink: allocated space on GPU 0: 9.848 GB, reserved space on GPU 0: 10.301 GB, free space: 11.855GB, frag space: 0.453GB
INFO 09-18 07:17:56 multiproc_gpu_executor.py:258] Start to do liquid from src: 2 to dst: 0 with shard_ids: [1]
INFO 09-18 07:17:56 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 9.848 GB, reserved space on GPU 0: 10.301 GB, free space: 11.855GB, frag space: 0.453GB, [(898899, 11.802734375)]
INFO 09-18 07:17:56 worker.py:165] Before appending weights shards, allocated space on GPU 0: 9.848 GB, reserved space on GPU 0: 10.301 GB, free space: 11.855GB, frag space: 0.453GB
INFO 09-18 07:17:56 worker.py:168] After recving weights shards, allocated space on GPU 0: 12.944 GB, reserved space on GPU 0: 13.227 GB, free space: 8.929GB, frag space: 0.282GB
INFO 09-18 07:17:56 worker.py:171] After appending weights shards, allocated space on GPU 0: 12.944 GB, reserved space on GPU 0: 16.326 GB, free space: 5.830GB, frag space: 3.382GB
(VllmWorkerProcess pid=898983) It takes: 0.32s to send model shards
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:56 worker.py:152] send weights shards takes: 0.35s, sent out: 3.10GB, sent bw: 8.88GB/s
INFO 09-18 07:17:56 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 12.944 GB, reserved space on GPU 0: 16.326 GB, free space: 5.830GB, frag space: 3.382GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.971GB, frag space: 0.000GB
INFO 09-18 07:17:56 worker.py:215] Before recv kvc shards, allocated space on GPU 0: 12.944 GB, reserved space on GPU 0: 16.326 GB, free space: 5.830GB, frag space: 3.382GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:57 cache_engine.py:159] send kvc shards takes: 0.44s, sent out: 6.20GB, sent bw: 14.01GB/s
INFO 09-18 07:17:57 worker.py:217] After recv kvc shards, allocated space on GPU 0: 19.147 GB, reserved space on GPU 0: 19.498 GB, free space: 2.658GB, frag space: 0.351GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:17:57 cache_engine.py:165] Successfully send kv cache shards: [1] to rank: 0
INFO 09-18 07:17:57 worker.py:223] After appending kvc shards, allocated space on GPU 0: 19.179 GB, reserved space on GPU 0: 20.217 GB, free space: 1.939GB, frag space: 1.038GB
INFO 09-18 07:17:57 worker.py:228] After appending kvc shards, allocated space on GPU 0: 19.179 GB, reserved space on GPU 0: 19.436 GB, free space: 2.720GB, frag space: 0.257GB
INFO 09-18 07:17:57 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 19.179 GB, reserved space on GPU 0: 19.436 GB, free space: 2.720GB, frag space: 0.257GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.971GB, frag space: 0.000GB
INFO 09-18 07:17:57 multiproc_gpu_executor.py:258] Start to do liquid from src: 3 to dst: 1 with shard_ids: [3]
INFO 09-18 07:17:57 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 19.179 GB, reserved space on GPU 0: 19.436 GB, free space: 2.720GB, frag space: 0.257GB, [(898899, 20.9375)]
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:57 worker.py:165] Before appending weights shards, allocated space on GPU 1: 9.848 GB, reserved space on GPU 1: 10.070 GB, free space: 11.971GB, frag space: 0.222GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:57 worker.py:168] After recving weights shards, allocated space on GPU 1: 12.944 GB, reserved space on GPU 1: 13.246 GB, free space: 8.796GB, frag space: 0.302GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:57 worker.py:171] After appending weights shards, allocated space on GPU 1: 12.944 GB, reserved space on GPU 1: 16.049 GB, free space: 5.993GB, frag space: 3.104GB
(VllmWorkerProcess pid=898984) It takes: 0.26s to send model shards
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:57 worker.py:152] send weights shards takes: 0.28s, sent out: 3.10GB, sent bw: 10.92GB/s
INFO 09-18 07:17:57 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 19.179 GB, reserved space on GPU 0: 19.436 GB, free space: 2.720GB, frag space: 0.257GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 5.993GB, frag space: 0.000GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:57 worker.py:215] Before recv kvc shards, allocated space on GPU 1: 12.944 GB, reserved space on GPU 1: 16.049 GB, free space: 5.993GB, frag space: 3.104GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:58 cache_engine.py:159] send kvc shards takes: 0.42s, sent out: 6.20GB, sent bw: 14.85GB/s
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:58 worker.py:217] After recv kvc shards, allocated space on GPU 1: 19.147 GB, reserved space on GPU 1: 19.389 GB, free space: 2.653GB, frag space: 0.241GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:17:58 cache_engine.py:165] Successfully send kv cache shards: [3] to rank: 1
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:58 worker.py:223] After appending kvc shards, allocated space on GPU 1: 19.179 GB, reserved space on GPU 1: 20.107 GB, free space: 1.934GB, frag space: 0.929GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:58 worker.py:228] After appending kvc shards, allocated space on GPU 1: 19.179 GB, reserved space on GPU 1: 19.326 GB, free space: 2.716GB, frag space: 0.147GB
INFO 09-18 07:17:58 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 19.179 GB, reserved space on GPU 0: 19.436 GB, free space: 2.720GB, frag space: 0.257GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.716GB, frag space: 0.000GB
INFO 09-18 07:17:58 llm_engine.py:773] Finished liquid for 26 times, output: Completed! Move shard: [1, 3] from [2, 3] to [0, 1];liquid e2e latency: 2.63s, move and shrink latency: -1726640275.99s, update worker latency: 1726640276.00s, liquid model weights latency: 1.91s, init mem latency: 0.00s, liquid kvc latency: 0.71s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 19.179 GB, reserved space on GPU 0: 19.436 GB, free space: 2.720GB, frag space: 0.257GB, current gpu block: #3176
INFO 09-18 07:17:58 llm_engine.py:759] ---------Start 27'th liquid: Scale in from GPU1 to GPU 0---------
INFO 09-18 07:17:58 multiproc_gpu_executor.py:216] Shrink to: #964, currently using blocks: #2
INFO 09-18 07:17:58 multiproc_gpu_executor.py:217] Before move and shrink: on GPU: allocated space on GPU 0: 19.179 GB, reserved space on GPU 0: 19.438 GB, free space: 2.718GB, frag space: 0.259GB
INFO 09-18 07:17:58 multiproc_gpu_executor.py:219] After move and shrink: allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 10.811 GB, free space: 11.345GB, frag space: 0.304GB
INFO 09-18 07:17:58 multiproc_gpu_executor.py:258] Start to do liquid from src: 1 to dst: 0 with shard_ids: [2, 3]
INFO 09-18 07:17:58 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 10.811 GB, free space: 11.345GB, frag space: 0.304GB, [(898899, 12.3125)]
INFO 09-18 07:17:58 worker.py:165] Before appending weights shards, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 10.811 GB, free space: 11.345GB, frag space: 0.304GB
INFO 09-18 07:17:59 worker.py:168] After recving weights shards, allocated space on GPU 0: 16.700 GB, reserved space on GPU 0: 16.895 GB, free space: 5.261GB, frag space: 0.195GB
(VllmWorkerProcess pid=898982) It takes: 0.32s to send model shards
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:59 worker.py:152] send weights shards takes: 0.35s, sent out: 6.19GB, sent bw: 17.60GB/s
INFO 09-18 07:17:59 worker.py:171] After appending weights shards, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 20.748 GB, free space: 1.408GB, frag space: 4.047GB
INFO 09-18 07:17:59 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 20.748 GB, free space: 1.408GB, frag space: 4.047GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 17.669GB, frag space: 0.000GB
INFO 09-18 07:17:59 worker.py:215] Before recv kvc shards, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 20.748 GB, free space: 1.408GB, frag space: 4.047GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:59 cache_engine.py:159] send kvc shards takes: 0.19s, sent out: 3.77GB, sent bw: 19.33GB/s
INFO 09-18 07:17:59 worker.py:217] After recv kvc shards, allocated space on GPU 0: 20.466 GB, reserved space on GPU 0: 21.010 GB, free space: 1.146GB, frag space: 0.544GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:17:59 cache_engine.py:165] Successfully send kv cache shards: [2, 3] to rank: 0
INFO 09-18 07:17:59 worker.py:223] After appending kvc shards, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 21.424 GB, free space: 0.732GB, frag space: 0.926GB
INFO 09-18 07:17:59 worker.py:228] After appending kvc shards, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.947 GB, free space: 1.209GB, frag space: 0.450GB
INFO 09-18 07:17:59 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.947 GB, free space: 1.209GB, frag space: 0.450GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 21.481GB, frag space: 0.000GB
INFO 09-18 07:17:59 llm_engine.py:773] Finished liquid for 27 times, output: Completed! Move shard: [2, 3] from [1] to [0];liquid e2e latency: 1.30s, move and shrink latency: 0.11s, update worker latency: -0.11s, liquid model weights latency: 0.63s, init mem latency: 0.00s, liquid kvc latency: 0.67s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.947 GB, free space: 1.209GB, frag space: 0.450GB, current gpu block: #964
INFO 09-18 07:18:00 llm_engine.py:759] ---------Start 28'th liquid: Scale out from GPU0 to GPU1---------
INFO 09-18 07:18:00 multiproc_gpu_executor.py:258] Start to do liquid from src: 0 to dst: 1 with shard_ids: [2, 3]
INFO 09-18 07:18:00 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.949 GB, free space: 1.207GB, frag space: 0.452GB, [(898899, 22.451171875)]
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:00 worker.py:165] Before appending weights shards, allocated space on GPU 1: 0.548 GB, reserved space on GPU 1: 0.561 GB, free space: 21.481GB, frag space: 0.012GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:00 worker.py:168] After recving weights shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 6.754 GB, free space: 15.288GB, frag space: 0.013GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:00 worker.py:171] After appending weights shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 8.307 GB, free space: 13.735GB, frag space: 1.565GB
It takes: 0.26s to send model shards
INFO 09-18 07:18:00 worker.py:152] send weights shards takes: 0.30s, sent out: 6.19GB, sent bw: 20.51GB/s
INFO 09-18 07:18:00 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 14.304 GB, reserved space on GPU 0: 14.844 GB, free space: 7.312GB, frag space: 0.540GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 13.735GB, frag space: 0.000GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:00 worker.py:215] Before recv kvc shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 8.307 GB, free space: 13.735GB, frag space: 1.565GB
INFO 09-18 07:18:00 cache_engine.py:159] send kvc shards takes: 0.11s, sent out: 3.77GB, sent bw: 33.67GB/s
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:00 worker.py:217] After recv kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 12.119 GB, free space: 9.923GB, frag space: 1.612GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:00 worker.py:223] After appending kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 12.238 GB, free space: 9.804GB, frag space: 1.731GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:00 worker.py:228] After appending kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 10.619 GB, free space: 11.423GB, frag space: 0.112GB
INFO 09-18 07:18:00 cache_engine.py:165] Successfully send kv cache shards: [2, 3] to rank: 1
INFO 09-18 07:18:00 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 10.975 GB, free space: 11.181GB, frag space: 0.468GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.423GB, frag space: 0.000GB
2256
INFO 09-18 07:18:00 multiproc_gpu_executor.py:171] After scale out, num_gpu_blocks: #3220
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:00 worker.py:202] extend gpu in worker takes: 0.37s
INFO 09-18 07:18:00 worker.py:202] extend gpu in worker takes: 0.38s
INFO 09-18 07:18:00 llm_engine.py:773] Finished liquid for 28 times, output: Completed! Move shard: [2, 3] from [0] to [1];liquid e2e latency: 0.89s, update worker latency: 0.00s, liquid model weights latency: 0.34s, init mem latency: 0.00s, liquid kvc latency: 0.16s, extending gpu blocks latency: 0.38s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 19.319 GB, reserved space on GPU 0: 19.906 GB, free space: 2.250GB, frag space: 0.587GB, current gpu block: #3220
INFO 09-18 07:18:00 llm_engine.py:759] ---------Start 29'th liquid: Scale out from GPU[0,1] to GPU[2,3]---------
INFO 09-18 07:18:00 multiproc_gpu_executor.py:258] Start to do liquid from src: 0 to dst: 2 with shard_ids: [1]
INFO 09-18 07:18:00 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 19.319 GB, reserved space on GPU 0: 19.908 GB, free space: 2.248GB, frag space: 0.589GB, [(898899, 21.41015625)]
(VllmWorkerProcess pid=898983) INFO 09-18 07:18:00 worker.py:165] Before appending weights shards, allocated space on GPU 2: 0.525 GB, reserved space on GPU 2: 0.539 GB, free space: 22.000GB, frag space: 0.014GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:18:01 worker.py:168] After recving weights shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 3.887 GB, free space: 18.652GB, frag space: 0.265GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:18:01 worker.py:171] After appending weights shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
It takes: 0.29s to send model shards
INFO 09-18 07:18:01 worker.py:152] send weights shards takes: 0.31s, sent out: 3.10GB, sent bw: 9.95GB/s
INFO 09-18 07:18:01 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 16.223 GB, reserved space on GPU 0: 16.535 GB, free space: 5.621GB, frag space: 0.312GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.610GB, frag space: 0.000GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:18:01 worker.py:215] Before recv kvc shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
INFO 09-18 07:18:01 cache_engine.py:159] send kvc shards takes: 0.42s, sent out: 6.29GB, sent bw: 14.80GB/s
(VllmWorkerProcess pid=898983) INFO 09-18 07:18:01 worker.py:217] After recv kvc shards, allocated space on GPU 2: 9.934 GB, reserved space on GPU 2: 10.977 GB, free space: 11.562GB, frag space: 1.043GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:18:01 worker.py:223] After appending kvc shards, allocated space on GPU 2: 9.934 GB, reserved space on GPU 2: 11.174 GB, free space: 11.365GB, frag space: 1.240GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:18:01 worker.py:228] After appending kvc shards, allocated space on GPU 2: 9.934 GB, reserved space on GPU 2: 10.020 GB, free space: 12.519GB, frag space: 0.086GB
INFO 09-18 07:18:01 cache_engine.py:165] Successfully send kv cache shards: [1] to rank: 2
INFO 09-18 07:18:01 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 9.957 GB, reserved space on GPU 0: 10.223 GB, free space: 11.933GB, frag space: 0.265GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.610GB, frag space: 0.000GB
INFO 09-18 07:18:01 multiproc_gpu_executor.py:258] Start to do liquid from src: 1 to dst: 3 with shard_ids: [3]
INFO 09-18 07:18:02 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 9.957 GB, reserved space on GPU 0: 10.223 GB, free space: 11.933GB, frag space: 0.265GB, [(898899, 11.724609375)]
(VllmWorkerProcess pid=898984) INFO 09-18 07:18:02 worker.py:165] Before appending weights shards, allocated space on GPU 3: 0.525 GB, reserved space on GPU 3: 0.539 GB, free space: 22.000GB, frag space: 0.014GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:18:02 worker.py:168] After recving weights shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 3.887 GB, free space: 18.652GB, frag space: 0.265GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:18:02 worker.py:171] After appending weights shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
(VllmWorkerProcess pid=898982) It takes: 0.24s to send model shards
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:02 worker.py:152] send weights shards takes: 0.27s, sent out: 3.10GB, sent bw: 11.46GB/s
INFO 09-18 07:18:02 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 9.957 GB, reserved space on GPU 0: 10.223 GB, free space: 11.933GB, frag space: 0.265GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 5.596GB, frag space: 0.000GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:18:02 worker.py:215] Before recv kvc shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:02 cache_engine.py:159] send kvc shards takes: 0.41s, sent out: 6.29GB, sent bw: 15.23GB/s
(VllmWorkerProcess pid=898984) INFO 09-18 07:18:02 worker.py:217] After recv kvc shards, allocated space on GPU 3: 9.934 GB, reserved space on GPU 3: 10.977 GB, free space: 11.562GB, frag space: 1.043GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:18:02 worker.py:223] After appending kvc shards, allocated space on GPU 3: 9.934 GB, reserved space on GPU 3: 11.174 GB, free space: 11.365GB, frag space: 1.240GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:18:02 worker.py:228] After appending kvc shards, allocated space on GPU 3: 9.934 GB, reserved space on GPU 3: 10.020 GB, free space: 12.519GB, frag space: 0.086GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:03 cache_engine.py:165] Successfully send kv cache shards: [3] to rank: 3
INFO 09-18 07:18:03 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 9.957 GB, reserved space on GPU 0: 10.223 GB, free space: 11.933GB, frag space: 0.265GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.909GB, frag space: 0.000GB
INFO 09-18 07:18:03 multiproc_gpu_executor.py:189] After scale out, num_gpu_blocks: #8105
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:03 worker.py:202] extend gpu in worker takes: 0.42s
INFO 09-18 07:18:03 worker.py:202] extend gpu in worker takes: 0.43s
(VllmWorkerProcess pid=898984) INFO 09-18 07:18:03 worker.py:202] extend gpu in worker takes: 0.43s
(VllmWorkerProcess pid=898983) INFO 09-18 07:18:03 worker.py:202] extend gpu in worker takes: 0.43s
INFO 09-18 07:18:03 llm_engine.py:773] Finished liquid for 29 times, output: Completed! Move shard: [1, 3] from [0, 1] to [2, 3];liquid e2e latency: 2.52s, update worker latency: 0.01s, liquid model weights latency: 1.56s, init mem latency: 0.00s, liquid kvc latency: 0.50s, extending gpu blocks latency: 0.44s, update blocks latency: 0.01s;, current mem info on GPU0: allocated space on GPU 0: 19.475 GB, reserved space on GPU 0: 19.785 GB, free space: 2.371GB, frag space: 0.310GB, current gpu block: #8105
INFO 09-18 07:18:03 llm_engine.py:759] ---------Start 30'th liquid: Scale in from GPU[2,3] to GPU[0,1]---------
INFO 09-18 07:18:03 multiproc_gpu_executor.py:242] Shrink to: #3220, currently using blocks: #3
INFO 09-18 07:18:03 multiproc_gpu_executor.py:243] Before move and shrink: allocated space on GPU 0: 19.475 GB, reserved space on GPU 0: 19.787 GB, free space: 2.369GB, frag space: 0.312GB
INFO 09-18 07:18:03 multiproc_gpu_executor.py:245] After move and shrink: allocated space on GPU 0: 9.957 GB, reserved space on GPU 0: 10.223 GB, free space: 11.933GB, frag space: 0.265GB
INFO 09-18 07:18:03 multiproc_gpu_executor.py:258] Start to do liquid from src: 2 to dst: 0 with shard_ids: [1]
INFO 09-18 07:18:03 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 9.957 GB, reserved space on GPU 0: 10.223 GB, free space: 11.933GB, frag space: 0.265GB, [(898899, 11.724609375)]
INFO 09-18 07:18:03 worker.py:165] Before appending weights shards, allocated space on GPU 0: 9.957 GB, reserved space on GPU 0: 10.223 GB, free space: 11.933GB, frag space: 0.265GB
INFO 09-18 07:18:04 worker.py:168] After recving weights shards, allocated space on GPU 0: 13.054 GB, reserved space on GPU 0: 13.297 GB, free space: 8.859GB, frag space: 0.243GB
INFO 09-18 07:18:04 worker.py:171] After appending weights shards, allocated space on GPU 0: 13.054 GB, reserved space on GPU 0: 16.553 GB, free space: 5.603GB, frag space: 3.499GB
(VllmWorkerProcess pid=898983) It takes: 0.35s to send model shards
(VllmWorkerProcess pid=898983) INFO 09-18 07:18:04 worker.py:152] send weights shards takes: 0.37s, sent out: 3.10GB, sent bw: 8.33GB/s
INFO 09-18 07:18:04 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 13.054 GB, reserved space on GPU 0: 16.553 GB, free space: 5.603GB, frag space: 3.499GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.909GB, frag space: 0.000GB
INFO 09-18 07:18:04 worker.py:215] Before recv kvc shards, allocated space on GPU 0: 13.054 GB, reserved space on GPU 0: 16.553 GB, free space: 5.603GB, frag space: 3.499GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:18:04 cache_engine.py:159] send kvc shards takes: 0.46s, sent out: 6.29GB, sent bw: 13.75GB/s
INFO 09-18 07:18:04 worker.py:217] After recv kvc shards, allocated space on GPU 0: 19.366 GB, reserved space on GPU 0: 19.838 GB, free space: 2.318GB, frag space: 0.472GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:18:04 cache_engine.py:165] Successfully send kv cache shards: [1] to rank: 0
INFO 09-18 07:18:05 worker.py:223] After appending kvc shards, allocated space on GPU 0: 19.319 GB, reserved space on GPU 0: 20.627 GB, free space: 1.529GB, frag space: 1.308GB
INFO 09-18 07:18:05 worker.py:228] After appending kvc shards, allocated space on GPU 0: 19.319 GB, reserved space on GPU 0: 19.838 GB, free space: 2.318GB, frag space: 0.519GB
INFO 09-18 07:18:05 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 19.319 GB, reserved space on GPU 0: 19.838 GB, free space: 2.318GB, frag space: 0.519GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.909GB, frag space: 0.000GB
INFO 09-18 07:18:05 multiproc_gpu_executor.py:258] Start to do liquid from src: 3 to dst: 1 with shard_ids: [3]
INFO 09-18 07:18:05 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 19.319 GB, reserved space on GPU 0: 19.838 GB, free space: 2.318GB, frag space: 0.519GB, [(898899, 21.33984375)]
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:05 worker.py:165] Before appending weights shards, allocated space on GPU 1: 9.957 GB, reserved space on GPU 1: 10.133 GB, free space: 11.909GB, frag space: 0.175GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:05 worker.py:168] After recving weights shards, allocated space on GPU 1: 13.054 GB, reserved space on GPU 1: 13.309 GB, free space: 8.733GB, frag space: 0.255GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:05 worker.py:171] After appending weights shards, allocated space on GPU 1: 13.054 GB, reserved space on GPU 1: 16.111 GB, free space: 5.930GB, frag space: 3.058GB
(VllmWorkerProcess pid=898984) It takes: 0.26s to send model shards
(VllmWorkerProcess pid=898984) INFO 09-18 07:18:05 worker.py:152] send weights shards takes: 0.28s, sent out: 3.10GB, sent bw: 11.04GB/s
INFO 09-18 07:18:05 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 19.319 GB, reserved space on GPU 0: 19.838 GB, free space: 2.318GB, frag space: 0.519GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 5.930GB, frag space: 0.000GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:05 worker.py:215] Before recv kvc shards, allocated space on GPU 1: 13.054 GB, reserved space on GPU 1: 16.111 GB, free space: 5.930GB, frag space: 3.058GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:18:05 cache_engine.py:159] send kvc shards takes: 0.43s, sent out: 6.29GB, sent bw: 14.79GB/s
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:05 worker.py:217] After recv kvc shards, allocated space on GPU 1: 19.366 GB, reserved space on GPU 1: 19.514 GB, free space: 2.528GB, frag space: 0.147GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:18:05 cache_engine.py:165] Successfully send kv cache shards: [3] to rank: 1
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:06 worker.py:223] After appending kvc shards, allocated space on GPU 1: 19.319 GB, reserved space on GPU 1: 20.303 GB, free space: 1.739GB, frag space: 0.983GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:06 worker.py:228] After appending kvc shards, allocated space on GPU 1: 19.319 GB, reserved space on GPU 1: 19.514 GB, free space: 2.528GB, frag space: 0.194GB
INFO 09-18 07:18:06 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 19.319 GB, reserved space on GPU 0: 19.838 GB, free space: 2.318GB, frag space: 0.519GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.528GB, frag space: 0.000GB
INFO 09-18 07:18:06 llm_engine.py:773] Finished liquid for 30 times, output: Completed! Move shard: [1, 3] from [2, 3] to [0, 1];liquid e2e latency: 2.59s, move and shrink latency: -1726640283.50s, update worker latency: 1726640283.50s, liquid model weights latency: 1.86s, init mem latency: 0.00s, liquid kvc latency: 0.73s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 19.319 GB, reserved space on GPU 0: 19.838 GB, free space: 2.318GB, frag space: 0.519GB, current gpu block: #3220
INFO 09-18 07:18:06 llm_engine.py:759] ---------Start 31'th liquid: Scale in from GPU1 to GPU 0---------
INFO 09-18 07:18:06 multiproc_gpu_executor.py:216] Shrink to: #964, currently using blocks: #3
INFO 09-18 07:18:06 multiproc_gpu_executor.py:217] Before move and shrink: on GPU: allocated space on GPU 0: 19.319 GB, reserved space on GPU 0: 19.840 GB, free space: 2.316GB, frag space: 0.520GB
INFO 09-18 07:18:06 multiproc_gpu_executor.py:219] After move and shrink: allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 11.025 GB, free space: 11.130GB, frag space: 0.519GB
INFO 09-18 07:18:06 multiproc_gpu_executor.py:258] Start to do liquid from src: 1 to dst: 0 with shard_ids: [2, 3]
INFO 09-18 07:18:06 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 11.025 GB, free space: 11.130GB, frag space: 0.519GB, [(898899, 12.52734375)]
INFO 09-18 07:18:06 worker.py:165] Before appending weights shards, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 11.025 GB, free space: 11.130GB, frag space: 0.519GB
INFO 09-18 07:18:06 worker.py:168] After recving weights shards, allocated space on GPU 0: 16.700 GB, reserved space on GPU 0: 16.906 GB, free space: 5.250GB, frag space: 0.207GB
(VllmWorkerProcess pid=898982) It takes: 0.32s to send model shards
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:06 worker.py:152] send weights shards takes: 0.35s, sent out: 6.19GB, sent bw: 17.69GB/s
INFO 09-18 07:18:06 worker.py:171] After appending weights shards, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 20.885 GB, free space: 1.271GB, frag space: 4.184GB
INFO 09-18 07:18:06 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 20.885 GB, free space: 1.271GB, frag space: 4.184GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 17.669GB, frag space: 0.000GB
INFO 09-18 07:18:06 worker.py:215] Before recv kvc shards, allocated space on GPU 0: 16.701 GB, reserved space on GPU 0: 20.885 GB, free space: 1.271GB, frag space: 4.184GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:07 cache_engine.py:159] send kvc shards takes: 0.51s, sent out: 3.77GB, sent bw: 7.40GB/s
INFO 09-18 07:18:07 worker.py:217] After recv kvc shards, allocated space on GPU 0: 20.466 GB, reserved space on GPU 0: 21.029 GB, free space: 1.127GB, frag space: 0.563GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:07 cache_engine.py:165] Successfully send kv cache shards: [2, 3] to rank: 0
INFO 09-18 07:18:07 worker.py:223] After appending kvc shards, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 21.920 GB, free space: 0.236GB, frag space: 1.422GB
INFO 09-18 07:18:07 worker.py:228] After appending kvc shards, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.967 GB, free space: 1.189GB, frag space: 0.469GB
INFO 09-18 07:18:07 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.967 GB, free space: 1.189GB, frag space: 0.469GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 21.481GB, frag space: 0.000GB
INFO 09-18 07:18:07 llm_engine.py:773] Finished liquid for 31 times, output: Completed! Move shard: [2, 3] from [1] to [0];liquid e2e latency: 1.35s, move and shrink latency: 0.12s, update worker latency: -0.12s, liquid model weights latency: 0.63s, init mem latency: 0.00s, liquid kvc latency: 0.72s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.967 GB, free space: 1.189GB, frag space: 0.469GB, current gpu block: #964
INFO 09-18 07:18:07 llm_engine.py:759] ---------Start 32'th liquid: Scale out from GPU0 to GPU1---------
INFO 09-18 07:18:07 multiproc_gpu_executor.py:258] Start to do liquid from src: 0 to dst: 1 with shard_ids: [2, 3]
INFO 09-18 07:18:07 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 20.497 GB, reserved space on GPU 0: 20.969 GB, free space: 1.187GB, frag space: 0.471GB, [(898899, 22.470703125)]
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:07 worker.py:165] Before appending weights shards, allocated space on GPU 1: 0.548 GB, reserved space on GPU 1: 0.561 GB, free space: 21.481GB, frag space: 0.012GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:07 worker.py:168] After recving weights shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 6.754 GB, free space: 15.288GB, frag space: 0.013GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:07 worker.py:171] After appending weights shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 8.307 GB, free space: 13.735GB, frag space: 1.565GB
It takes: 0.22s to send model shards
INFO 09-18 07:18:07 worker.py:152] send weights shards takes: 0.26s, sent out: 6.19GB, sent bw: 23.42GB/s
INFO 09-18 07:18:07 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 14.304 GB, reserved space on GPU 0: 14.922 GB, free space: 7.234GB, frag space: 0.618GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 13.735GB, frag space: 0.000GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:07 worker.py:215] Before recv kvc shards, allocated space on GPU 1: 6.741 GB, reserved space on GPU 1: 8.307 GB, free space: 13.735GB, frag space: 1.565GB
INFO 09-18 07:18:07 cache_engine.py:159] send kvc shards takes: 0.11s, sent out: 3.77GB, sent bw: 33.69GB/s
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:07 worker.py:217] After recv kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 12.119 GB, free space: 9.923GB, frag space: 1.612GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:07 worker.py:223] After appending kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 12.238 GB, free space: 9.804GB, frag space: 1.731GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:07 worker.py:228] After appending kvc shards, allocated space on GPU 1: 10.507 GB, reserved space on GPU 1: 10.619 GB, free space: 11.423GB, frag space: 0.112GB
INFO 09-18 07:18:07 cache_engine.py:165] Successfully send kv cache shards: [2, 3] to rank: 1
INFO 09-18 07:18:07 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 10.507 GB, reserved space on GPU 0: 11.053 GB, free space: 11.103GB, frag space: 0.546GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.423GB, frag space: 0.000GB
2236
INFO 09-18 07:18:07 multiproc_gpu_executor.py:171] After scale out, num_gpu_blocks: #3200
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:08 worker.py:202] extend gpu in worker takes: 0.35s
INFO 09-18 07:18:08 worker.py:202] extend gpu in worker takes: 0.35s
INFO 09-18 07:18:08 llm_engine.py:773] Finished liquid for 32 times, output: Completed! Move shard: [2, 3] from [0] to [1];liquid e2e latency: 0.82s, update worker latency: 0.00s, liquid model weights latency: 0.30s, init mem latency: 0.00s, liquid kvc latency: 0.16s, extending gpu blocks latency: 0.35s, update blocks latency: 0.00s;, current mem info on GPU0: allocated space on GPU 0: 19.241 GB, reserved space on GPU 0: 19.859 GB, free space: 2.297GB, frag space: 0.618GB, current gpu block: #3200
INFO 09-18 07:18:08 llm_engine.py:759] ---------Start 33'th liquid: Scale out from GPU[0,1] to GPU[2,3]---------
INFO 09-18 07:18:08 multiproc_gpu_executor.py:258] Start to do liquid from src: 0 to dst: 2 with shard_ids: [1]
INFO 09-18 07:18:08 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 19.241 GB, reserved space on GPU 0: 19.861 GB, free space: 2.295GB, frag space: 0.620GB, [(898899, 21.36328125)]
(VllmWorkerProcess pid=898983) INFO 09-18 07:18:08 worker.py:165] Before appending weights shards, allocated space on GPU 2: 0.525 GB, reserved space on GPU 2: 0.539 GB, free space: 22.000GB, frag space: 0.014GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:18:08 worker.py:168] After recving weights shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 3.887 GB, free space: 18.652GB, frag space: 0.265GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:18:08 worker.py:171] After appending weights shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
It takes: 0.26s to send model shards
INFO 09-18 07:18:08 worker.py:152] send weights shards takes: 0.29s, sent out: 3.10GB, sent bw: 10.74GB/s
INFO 09-18 07:18:08 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 16.145 GB, reserved space on GPU 0: 16.473 GB, free space: 5.683GB, frag space: 0.328GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.735GB, frag space: 0.000GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:18:08 worker.py:215] Before recv kvc shards, allocated space on GPU 2: 3.621 GB, reserved space on GPU 2: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
INFO 09-18 07:18:09 cache_engine.py:159] send kvc shards takes: 0.42s, sent out: 6.25GB, sent bw: 14.83GB/s
(VllmWorkerProcess pid=898983) INFO 09-18 07:18:09 worker.py:217] After recv kvc shards, allocated space on GPU 2: 9.871 GB, reserved space on GPU 2: 10.914 GB, free space: 11.625GB, frag space: 1.043GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:18:09 worker.py:223] After appending kvc shards, allocated space on GPU 2: 9.871 GB, reserved space on GPU 2: 11.109 GB, free space: 11.429GB, frag space: 1.238GB
(VllmWorkerProcess pid=898983) INFO 09-18 07:18:09 worker.py:228] After appending kvc shards, allocated space on GPU 2: 9.871 GB, reserved space on GPU 2: 9.957 GB, free space: 12.582GB, frag space: 0.086GB
INFO 09-18 07:18:09 cache_engine.py:165] Successfully send kv cache shards: [1] to rank: 2
INFO 09-18 07:18:09 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 9.895 GB, reserved space on GPU 0: 10.223 GB, free space: 11.933GB, frag space: 0.328GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 2.735GB, frag space: 0.000GB
INFO 09-18 07:18:09 multiproc_gpu_executor.py:258] Start to do liquid from src: 1 to dst: 3 with shard_ids: [3]
INFO 09-18 07:18:09 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 9.895 GB, reserved space on GPU 0: 10.223 GB, free space: 11.933GB, frag space: 0.328GB, [(898899, 11.724609375)]
(VllmWorkerProcess pid=898984) INFO 09-18 07:18:09 worker.py:165] Before appending weights shards, allocated space on GPU 3: 0.525 GB, reserved space on GPU 3: 0.539 GB, free space: 22.000GB, frag space: 0.014GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:18:09 worker.py:168] After recving weights shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 3.887 GB, free space: 18.652GB, frag space: 0.265GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:18:09 worker.py:171] After appending weights shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
(VllmWorkerProcess pid=898982) It takes: 0.56s to send model shards
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:09 worker.py:152] send weights shards takes: 0.59s, sent out: 3.10GB, sent bw: 5.25GB/s
INFO 09-18 07:18:09 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 9.895 GB, reserved space on GPU 0: 10.223 GB, free space: 11.933GB, frag space: 0.328GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 5.721GB, frag space: 0.000GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:18:09 worker.py:215] Before recv kvc shards, allocated space on GPU 3: 3.621 GB, reserved space on GPU 3: 4.664 GB, free space: 17.875GB, frag space: 1.043GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:10 cache_engine.py:159] send kvc shards takes: 0.41s, sent out: 6.25GB, sent bw: 15.18GB/s
(VllmWorkerProcess pid=898984) INFO 09-18 07:18:10 worker.py:217] After recv kvc shards, allocated space on GPU 3: 9.871 GB, reserved space on GPU 3: 10.914 GB, free space: 11.625GB, frag space: 1.043GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:18:10 worker.py:223] After appending kvc shards, allocated space on GPU 3: 9.871 GB, reserved space on GPU 3: 11.109 GB, free space: 11.429GB, frag space: 1.238GB
(VllmWorkerProcess pid=898984) INFO 09-18 07:18:10 worker.py:228] After appending kvc shards, allocated space on GPU 3: 9.871 GB, reserved space on GPU 3: 9.957 GB, free space: 12.582GB, frag space: 0.086GB
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:10 cache_engine.py:165] Successfully send kv cache shards: [3] to rank: 3
INFO 09-18 07:18:10 multiproc_gpu_executor.py:280] After liquid kvc, allocated space on GPU 0: 9.895 GB, reserved space on GPU 0: 10.223 GB, free space: 11.933GB, frag space: 0.328GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.971GB, frag space: 0.000GB
INFO 09-18 07:18:10 multiproc_gpu_executor.py:189] After scale out, num_gpu_blocks: #8097
INFO 09-18 07:18:10 worker.py:202] extend gpu in worker takes: 0.40s
(VllmWorkerProcess pid=898983) INFO 09-18 07:18:10 worker.py:202] extend gpu in worker takes: 0.40s
(VllmWorkerProcess pid=898984) INFO 09-18 07:18:10 worker.py:202] extend gpu in worker takes: 0.41s
(VllmWorkerProcess pid=898982) INFO 09-18 07:18:10 worker.py:202] extend gpu in worker takes: 0.42s
INFO 09-18 07:18:10 llm_engine.py:773] Finished liquid for 33 times, output: Completed! Move shard: [1, 3] from [0, 1] to [2, 3];liquid e2e latency: 2.46s, update worker latency: 0.01s, liquid model weights latency: 1.52s, init mem latency: 0.00s, liquid kvc latency: 0.50s, extending gpu blocks latency: 0.42s, update blocks latency: 0.01s;, current mem info on GPU0: allocated space on GPU 0: 19.459 GB, reserved space on GPU 0: 19.848 GB, free space: 2.308GB, frag space: 0.388GB, current gpu block: #8097
INFO 09-18 07:18:10 llm_engine.py:759] ---------Start 34'th liquid: Scale in from GPU[2,3] to GPU[0,1]---------
INFO 09-18 07:18:10 multiproc_gpu_executor.py:242] Shrink to: #3200, currently using blocks: #3
INFO 09-18 07:18:10 multiproc_gpu_executor.py:243] Before move and shrink: allocated space on GPU 0: 19.459 GB, reserved space on GPU 0: 19.850 GB, free space: 2.306GB, frag space: 0.390GB
INFO 09-18 07:18:11 multiproc_gpu_executor.py:245] After move and shrink: allocated space on GPU 0: 9.895 GB, reserved space on GPU 0: 10.223 GB, free space: 11.933GB, frag space: 0.328GB
INFO 09-18 07:18:11 multiproc_gpu_executor.py:258] Start to do liquid from src: 2 to dst: 0 with shard_ids: [1]
INFO 09-18 07:18:11 multiproc_gpu_executor.py:265] Before liquid model weights, allocated space on GPU 0: 9.895 GB, reserved space on GPU 0: 10.223 GB, free space: 11.933GB, frag space: 0.328GB, [(898899, 11.724609375)]
INFO 09-18 07:18:11 worker.py:165] Before appending weights shards, allocated space on GPU 0: 9.895 GB, reserved space on GPU 0: 10.223 GB, free space: 11.933GB, frag space: 0.328GB
INFO 09-18 07:18:11 worker.py:168] After recving weights shards, allocated space on GPU 0: 12.991 GB, reserved space on GPU 0: 13.230 GB, free space: 8.925GB, frag space: 0.239GB
INFO 09-18 07:18:11 worker.py:171] After appending weights shards, allocated space on GPU 0: 12.991 GB, reserved space on GPU 0: 16.471 GB, free space: 5.685GB, frag space: 3.479GB
(VllmWorkerProcess pid=898983) It takes: 0.35s to send model shards
(VllmWorkerProcess pid=898983) INFO 09-18 07:18:11 worker.py:152] send weights shards takes: 0.37s, sent out: 3.10GB, sent bw: 8.33GB/s
INFO 09-18 07:18:11 multiproc_gpu_executor.py:270] After liquid model weights, allocated space on GPU 0: 12.991 GB, reserved space on GPU 0: 16.471 GB, free space: 5.685GB, frag space: 3.479GB, allocated space on GPU 1: 0.000 GB, reserved space on GPU 1: 0.000 GB, free space: 11.971GB, frag space: 0.000GB
INFO 09-18 07:18:11 worker.py:215] Before recv kvc shards, allocated space on GPU 0: 12.991 GB, reserved space on GPU 0: 16.471 GB, free space: 5.685GB, frag space: 3.479GB