The NaiveIterator may benefit from numba.jit because of the sequential loop.
The VectorizedCumSum may benefit from a pure numpy implementation that avoids the overhead of creating multiple pd.Series objects. The groupby.sum could be replaced by np.unique in combination with np.add.reduceat, provided the rows are sorted by group key first, since reduceat sums contiguous slices.
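
A minimal sketch of the np.unique + np.add.reduceat idea, not the actual VectorizedCumSum code. The function name group_sum and the sample keys/values are hypothetical and only for illustration. It assumes a single 1-D key array and a 1-D value array.

```python
import numpy as np


def group_sum(keys, values):
    """Sum `values` per distinct key, roughly like groupby(keys).sum().

    Hypothetical helper for illustration; not part of the existing code base.
    """
    keys = np.asarray(keys)
    values = np.asarray(values)

    # reduceat sums contiguous slices, so rows of the same group must be
    # adjacent. A stable sort keeps the original order within each group.
    order = np.argsort(keys, kind="stable")
    sorted_keys = keys[order]
    sorted_values = values[order]

    # On sorted keys, return_index gives the start offset of each group.
    unique_keys, starts = np.unique(sorted_keys, return_index=True)

    # Sum each slice sorted_values[starts[i]:starts[i + 1]].
    sums = np.add.reduceat(sorted_values, starts)
    return unique_keys, sums


# Made-up sample data, deliberately unsorted.
keys = np.array([2, 1, 2, 3, 1])
values = np.array([10.0, 1.0, 20.0, 5.0, 2.0])
unique_keys, sums = group_sum(keys, values)
```

Whether this is actually faster than groupby.sum on realistic inputs is an open question; the extra argsort has its own cost.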
However, the benchmark utility is needed first so that any speedup can be measured, as addressed in #3.