@@ -8,9 +8,9 @@ It compares the performance of `foldedtensor` with various alternatives for padd
 and working with nested lists and tensors.

 Environment:
-- `torch.__version__ == '2.6.0'`
+- `torch.__version__ == '2.8.0'`
 - `foldedtensor.__version__ == '0.4.0'`
-- `python == 3.9.20`
+- `python == 3.11.3`
 - `sys.platform == 'darwin'`


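The benchmarks rely on two helpers, `make_nested_list` and `python_padding`, that are defined elsewhere in the benchmark script. Below is a minimal pure-Python sketch of what they plausibly do; the names match the source, but the bodies are assumptions (the real `python_padding` presumably also handles deeper nesting and returns a `torch` tensor rather than a list):

```python
import random

def make_nested_list(*dims, value):
    # Each dim is either a fixed length (int) or a (low, high) pair from
    # which a length is drawn at random, producing variable-length sublists.
    head, *rest = dims
    size = random.randint(*head) if isinstance(head, tuple) else head
    if not rest:
        return [value] * size
    return [make_nested_list(*rest, value=value) for _ in range(size)]

def python_padding(nested, pad=0):
    # Two-level sketch only: right-pad every row to the longest row's length.
    width = max(len(row) for row in nested)
    return [row + [pad] * (width - len(row)) for row in nested]
```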
@@ -22,79 +22,79 @@ nested_list = make_nested_list(32, (50, 100), (25, 30), value=1)

 Comparisons:
 %timeit python_padding(nested_list)
-# 100 loops, best of 5: 15.09 ms per loop
+# 100 loops, best of 5: 19.02 ms per loop

 %timeit foldedtensor.as_folded_tensor(nested_list)
-# 100 loops, best of 5: 0.73 ms per loop
+# 100 loops, best of 5: 0.82 ms per loop

 ```
-Speedup against best alternative: **20.67x** :rocket:
+Speedup against best alternative: **23.24x** :rocket:

 ## Case 2 (same lengths nested lists)

 ```python
 nested_list = make_nested_list(32, 100, 30, value=1)

 %timeit torch.tensor(nested_list)
-# 100 loops, best of 5: 6.51 ms per loop
+# 100 loops, best of 5: 7.86 ms per loop

 %timeit torch.LongTensor(nested_list)
-# 100 loops, best of 5: 2.78 ms per loop
+# 100 loops, best of 5: 3.69 ms per loop

 %timeit python_padding(nested_list)
-# 100 loops, best of 5: 18.38 ms per loop
+# 100 loops, best of 5: 23.35 ms per loop

 %timeit torch.nested.nested_tensor([torch.LongTensor(sub) for sub in nested_list]).to_padded_tensor(0)
-# 100 loops, best of 5: 3.00 ms per loop
+# 100 loops, best of 5: 3.94 ms per loop

 %timeit foldedtensor.as_folded_tensor(nested_list)
-# 100 loops, best of 5: 1.08 ms per loop
+# 100 loops, best of 5: 1.18 ms per loop

 ```
-Speedup against best alternative: **2.58x** :rocket:
+Speedup against best alternative: **3.12x** :rocket:

 ## Case 3 (simple list)

 ```python
 simple_list = make_nested_list(10000, value=1)

 %timeit torch.tensor(simple_list)
-# 100 loops, best of 5: 0.63 ms per loop
+# 100 loops, best of 5: 0.77 ms per loop

 %timeit torch.LongTensor(simple_list)
-# 100 loops, best of 5: 0.27 ms per loop
+# 100 loops, best of 5: 0.37 ms per loop

 %timeit python_padding(simple_list)
-# 100 loops, best of 5: 0.28 ms per loop
+# 100 loops, best of 5: 0.37 ms per loop

 %timeit foldedtensor.as_folded_tensor(simple_list)
-# 100 loops, best of 5: 0.08 ms per loop
+# 100 loops, best of 5: 0.10 ms per loop

 ```
-Speedup against best alternative: **3.32x** :rocket:
+Speedup against best alternative: **3.59x** :rocket:

 ## Case 4 (same lengths nested lists to flat tensor)

 ```python
 nested_list = make_nested_list(32, 100, 30, value=1)

 %timeit torch.tensor(nested_list).view(-1)
-# 100 loops, best of 5: 6.52 ms per loop
+# 100 loops, best of 5: 7.83 ms per loop

 %timeit torch.LongTensor(nested_list).view(-1)
-# 100 loops, best of 5: 2.76 ms per loop
+# 100 loops, best of 5: 3.68 ms per loop

 %timeit python_padding(nested_list).view(-1)
-# 100 loops, best of 5: 18.62 ms per loop
+# 100 loops, best of 5: 23.17 ms per loop

 %timeit foldedtensor.as_folded_tensor(nested_list).view(-1)
-# 100 loops, best of 5: 1.12 ms per loop
+# 100 loops, best of 5: 1.19 ms per loop

 %timeit foldedtensor.as_folded_tensor(nested_list, data_dims=(2,))
-# 100 loops, best of 5: 1.08 ms per loop
+# 100 loops, best of 5: 1.16 ms per loop

 ```
-Speedup against best alternative: **2.47x** :rocket:
+Speedup against best alternative: **3.10x** :rocket:
 ## Case 5 (variable lengths nested lists) to padded embeddings

 Nested lists with different lengths (second level lists have lengths between 50 and 150). We compare `foldedtensor` with `torch.nested`.
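Case 5 below pads with a non-zero value in two cheap passes: a fast zero-padding, followed by a masked fill (`x.masked_fill_(x.mask, 1)`). A hypothetical pure-Python sketch of that two-pass idea; `pad_mask` is an illustrative name here marking the positions introduced by padding, not necessarily the same convention as `FoldedTensor.mask`:

```python
# Zero-pad first (fast), then overwrite the padded slots with the target value.
rows = [[1.0, 2.0, 3.0], [4.0]]
width = max(len(r) for r in rows)
padded = [r + [0.0] * (width - len(r)) for r in rows]
pad_mask = [[False] * len(r) + [True] * (width - len(r)) for r in rows]
filled = [
    [1.0 if is_pad else v for v, is_pad in zip(row, mask_row)]
    for row, mask_row in zip(padded, pad_mask)
]
```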
@@ -104,41 +104,72 @@ nested_list = make_nested_list(32, (50, 150), 30, value=1)
 # Padding with 0

 %timeit torch.nested.nested_tensor([torch.LongTensor(sub) for sub in nested_list]).to_padded_tensor(0)
-# 100 loops, best of 5: 3.02 ms per loop
+# 100 loops, best of 5: 4.40 ms per loop

 %timeit foldedtensor.as_folded_tensor(nested_list).as_tensor()
-# 100 loops, best of 5: 1.03 ms per loop
+# 100 loops, best of 5: 1.29 ms per loop

 ```
-Speedup against best alternative: **2.95x** :rocket:
+Speedup against best alternative: **3.41x** :rocket:
 ```python
 # Padding with 1

 %timeit torch.nested.nested_tensor([torch.FloatTensor(sub) for sub in nested_list]).to_padded_tensor(1)
-# 100 loops, best of 5: 3.72 ms per loop
+# 100 loops, best of 5: 4.77 ms per loop

 %timeit x = foldedtensor.as_folded_tensor(nested_list); x.masked_fill_(x.mask, 1)
-# 100 loops, best of 5: 1.62 ms per loop
+# 100 loops, best of 5: 1.65 ms per loop

 ```
-Speedup against best alternative: **2.30x** :rocket:
+Speedup against best alternative: **2.89x** :rocket:

 ## Case 6 (2d padding)

 ```python
 nested_list = make_nested_list(160, (50, 150), value=1)

 %timeit python_padding(nested_list)
-# 100 loops, best of 5: 1.33 ms per loop
+# 100 loops, best of 5: 1.73 ms per loop

 %timeit torch.nested.nested_tensor([torch.LongTensor(sub) for sub in nested_list]).to_padded_tensor(0)
-# 100 loops, best of 5: 1.14 ms per loop
+# 100 loops, best of 5: 1.48 ms per loop

 %timeit torch.nn.utils.rnn.pad_sequence([torch.LongTensor(sub) for sub in nested_list], batch_first=True, padding_value=0)
-# 100 loops, best of 5: 0.86 ms per loop
+# 100 loops, best of 5: 1.22 ms per loop

 %timeit foldedtensor.as_folded_tensor(nested_list)
-# 100 loops, best of 5: 0.15 ms per loop
+# 100 loops, best of 5: 0.18 ms per loop

 ```
-Speedup against best alternative: **5.88x** :rocket:
+Speedup against best alternative: **6.68x** :rocket:
+
+## Case 7 (summing vectors inside each differently-sized sequence, all concatenated)
+
+```python
+def sum_all_words_per_sample(t):
+    begins = torch.arange(len(t.lengths[1]))
+    ends = begins + 1
+    indices, offsets, spans = t.lengths.make_indices_ranges(
+        begins=(begins,), ends=(ends,), indice_dims=(0,)
+    )
+    return torch.nn.functional.embedding_bag(
+        input=indices,
+        weight=t.view(-1, t.size(-1)),
+        offsets=offsets,
+        mode="sum",
+    )
+
+embedder = torch.nn.Embedding(500, 128)
+nested_list = make_nested_list(320, (150, 250), value=1)
+ft = foldedtensor.as_folded_tensor(nested_list).refold(1)
+ft = embedder(ft)
+
+
+%timeit ft.refold(0, 1).sum(-2)
+# 100 loops, best of 5: 3.54 ms per loop
+
+%timeit sum_all_words_per_sample(ft)
+# 100 loops, best of 5: 1.01 ms per loop
+
+```
+Speedup against pad-then-sum: **3.52x** :rocket:
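Case 7's `embedding_bag` call sums contiguous segments of the flattened weight tensor using per-sample offsets, so no padded tensor is ever materialised. A pure-Python sketch of the offsets idea, on toy data (scalars stand in for word vectors):

```python
# All samples' word values concatenated into one flat buffer,
# plus the number of words in each sample.
flat = [1, 2, 3, 10, 4, 5]
lengths = [3, 1, 2]

# Offsets mark where each sample's segment starts in the flat buffer.
offsets = [0]
for n in lengths[:-1]:
    offsets.append(offsets[-1] + n)

# Summing each segment directly replaces the pad-then-sum reduction.
per_sample_sums = [sum(flat[o:o + n]) for o, n in zip(offsets, lengths)]
```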