Correct me if I'm wrong, but I recall that @thomasloux ran some benchmarks and concluded that storing `system_idx` as a sparse tensor performs just as well as storing it as a dense tensor (I think on `scatter_sum` operations).
We should probably make it a sparse tensor because:
- It saves memory
- It might make some operations cheaper to perform
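
To make the memory point concrete, here is a rough sketch. It assumes "sparse" means storing one atom count per system instead of one index per atom; that is my reading and may not match what was actually benchmarked. The name `n_atoms_per_system` and the toy sizes are made up for illustration.

```python
import torch

# Hypothetical batch: 3 systems with 3, 2, and 4 atoms.
# Compact form (assumed meaning of "sparse"): one entry per system.
n_atoms_per_system = torch.tensor([3, 2, 4])
n_systems = n_atoms_per_system.numel()

# Dense form: one entry per atom, rebuilt from the compact form on demand.
system_idx = torch.repeat_interleave(torch.arange(n_systems), n_atoms_per_system)

# Per-atom values to reduce per system (stand-in for e.g. per-atom energies).
values = torch.arange(9, dtype=torch.float32)

# Scatter-sum driven by the dense per-atom index.
dense_sum = torch.zeros(n_systems).index_add_(0, system_idx, values)

# Same reduction driven by the compact form: because atoms of a system are
# contiguous, each system's sum is a difference of two prefix sums.
ptr = torch.cat([torch.zeros(1, dtype=torch.long), n_atoms_per_system.cumsum(0)])
prefix = torch.cat([values.new_zeros(1), values.cumsum(0)])
compact_sum = prefix[ptr[1:]] - prefix[ptr[:-1]]

# Index storage: per-atom entries vs per-system entries.
print(system_idx.numel(), n_atoms_per_system.numel())
```

The compact form only works this way if atoms belonging to the same system are stored contiguously, and the prefix-sum trick can lose floating-point precision on long arrays, so whether it is actually cheaper would need to be checked against the existing benchmarks.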