c_reference/include/conv1d.h
+17 -13
@@ -8,7 +8,7 @@
 
 NOTES for the conv layers
 -> The conv1d & conv1d_lr layers work for all cases and can be used unconstrained.
-   There are no hard constraints for the parallel version, but a points regarding the optimal usage are given below
+   There are no hard constraints for the parallel version, but a few points regarding its optimal usage are given below
 -> Dilation = 1 (no dilation) for all cases
 -> For the non-depthwise cases, store the matrices as described below. Permutation might be necessary
 -> The low-rank decomposition cannot be applied to the depthwise weight matrices. This is due to the out_channels/in_channels = 0 constraint imposed by the depthwise convolution.
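The low-rank note above can be made concrete with a small sketch. This is a hypothetical illustration, not the `conv1d.h` API: the function names and the MAC-counting convention are assumptions. It shows why factoring a full weight matrix W into U * V saves multiply-accumulates for the regular layers, a saving the depthwise layers cannot exploit since each depthwise filter only touches its own channel.

```c
/* Hypothetical sketch (names not from conv1d.h): multiply-accumulate
 * counts for one output time-step, full weights vs a low-rank
 * factorization W ~= U * V. */

/* Full conv1d weights: out_channels x (in_channels * kernel_size). */
unsigned macs_full(unsigned out_channels, unsigned in_channels,
                   unsigned kernel_size) {
    return out_channels * in_channels * kernel_size;
}

/* Low-rank conv1d_lr: y = U * (V * x),
 * V: rank x (in_channels * kernel_size), U: out_channels x rank. */
unsigned macs_low_rank(unsigned out_channels, unsigned in_channels,
                       unsigned kernel_size, unsigned rank) {
    return rank * in_channels * kernel_size + out_channels * rank;
}
```

For example, with out_channels = in_channels = 32, kernel_size = 5 and rank = 8, the factored form needs 1536 MACs per step against 5120 for the full matrix. For a depthwise layer the weight per channel is just a 1 x kernel_size vector, so there is no large matrix left to factor.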
@@ -22,10 +22,10 @@
 
 Important points regarding parallel versions
 -> Due to the above reason, the parallel layers are only recommended for large in_time inputs
-   This should typically be for in_time (without the padding) > 2 * (kernel_size + stride). Else there would not be enough time-steps to efficiently parallelize
-   For other shorter input cases, the code will skip the MatMul computation and use MatVec instead (but the MatMul-variable computation overhead would remain)
-   For such cases, the MatVec code (conv1d and conv1d_lr) would work more efficiently
-   The RAM usage would be lower and the function would not have any overheads (calculation of the iterators and MatMul-auxiliary variables)
+   This should typically be for in_time (without the padding) > 2 * num_steps_one_row + stride. Else there would not be enough time-steps to efficiently parallelise
+   We need at least 2 rows for good MatMul performance. In the worst case the starting time-step would be (stride - 1). Hence we choose 2 * num_steps_one_row + stride as the threshold
+   For the short input cases, the code will skip the MatMul computation and use MatVec instead (but the MatMul-variable computation overhead would remain)
+   For such cases, the MatVec code (conv1d and conv1d_lr) would work more efficiently due to the lower RAM usage and lack of any major overheads
 -> There is no support for depthwise for conv1d_parallel
    The regular convolution acts on all the channels while the depthwise acts only on one channel at a time
    This results in non-contiguous memory access. MatMul would need to process multiple such time-steps, while the MatVec would only need to process one
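The new threshold in this hunk can be sketched as a one-line predicate. This is a hedged illustration only: `use_matmul` and its parameter names are assumptions, not identifiers from `conv1d.h`. It encodes the reasoning stated above: at least two full MatMul rows of time-steps are needed for the batched path to pay off, even when the first usable step is delayed by up to (stride - 1).

```c
/* Hypothetical helper (names assumed, not from conv1d.h): decide
 * whether the parallel conv layer has enough time-steps for MatMul.
 * in_time excludes padding; num_steps_one_row is the number of
 * time-steps packed into one MatMul row. Below the threshold the
 * code would fall back to the MatVec path. */
int use_matmul(int in_time, int num_steps_one_row, int stride) {
    /* need > 2 rows' worth of steps plus the worst-case start offset */
    return in_time > 2 * num_steps_one_row + stride;
}
```

For instance, with num_steps_one_row = 10 and stride = 2 the threshold is 22 time-steps: an input of 100 steps would take the MatMul path, while an input of 20 steps would fall back to MatVec.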