Copilot AI commented Nov 16, 2025

Implementation Complete: Sparse Matrix Addition

Successfully implemented the +(A::AbstractDeviceSparseMatrix, B::AbstractDeviceSparseMatrix) operation for CSC-CSC, CSR-CSR, and COO-COO formats.

Completed Tasks:

  • Implement CSC + CSC addition with kernel
    • Add kernel_count_nnz_per_col_csc! in src/matrix_csc/matrix_csc_kernels.jl
    • Add kernel_merge_csc! in src/matrix_csc/matrix_csc_kernels.jl
    • Add +(A::DeviceSparseMatrixCSC, B::DeviceSparseMatrixCSC) in src/matrix_csc/matrix_csc.jl
  • Implement CSR + CSR addition with kernel
    • Add kernel_count_nnz_per_row_csr! in src/matrix_csr/matrix_csr_kernels.jl
    • Add kernel_merge_csr! in src/matrix_csr/matrix_csr_kernels.jl
    • Add +(A::DeviceSparseMatrixCSR, B::DeviceSparseMatrixCSR) in src/matrix_csr/matrix_csr.jl
  • Implement COO + COO addition with proper duplicate merging
    • Add kernel_mark_unique_coo! in src/matrix_coo/matrix_coo_kernels.jl
    • Add kernel_compact_coo! in src/matrix_coo/matrix_coo_kernels.jl
    • Add +(A::DeviceSparseMatrixCOO, B::DeviceSparseMatrixCOO) in src/matrix_coo/matrix_coo.jl
  • Add comprehensive tests
    • Tests in test/shared/matrix_csc.jl
    • Tests in test/shared/matrix_csr.jl
    • Tests in test/shared/matrix_coo.jl
  • Fix GPU compatibility issues (scalar indexing)
  • Add benchmarks
    • benchmark_sparse_sparse_add! in benchmarks/matrix_benchmarks.jl
    • Updated benchmarks/runbenchmarks.jl
  • Remove unnecessary test/Project.toml dependency
  • All tests pass (1478 CPU tests for Base Array and JLArray backends)

Implementation Details:

CSC/CSR Formats: Two-pass merge algorithm

  1. Count non-zeros per column/row via parallel kernel (merging sorted indices)
  2. Build pointer array using cumulative sum (GPU-friendly, no scalar indexing)
  3. Merge entries from both matrices in parallel, combining values at same positions
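The three steps above can be sketched on the CPU with the standard `SparseArrays` types. This is only an illustrative reference, not the package's kernel code: the name `add_csc_two_pass` is hypothetical, and each `for j in 1:n` loop stands in for what would be one GPU thread per column. It assumes row indices within each column are sorted, as CSC storage normally guarantees.

```julia
using SparseArrays

# CPU reference sketch of a two-pass CSC + CSC merge (hypothetical helper).
# Columns are processed independently, so each loop body maps to one thread.
function add_csc_two_pass(A::SparseMatrixCSC{Tv,Ti}, B::SparseMatrixCSC{Tv,Ti}) where {Tv,Ti}
    size(A) == size(B) || throw(DimensionMismatch("A and B must have the same size"))
    m, n = size(A)
    Ap, Ai, Ax = A.colptr, rowvals(A), nonzeros(A)
    Bp, Bi, Bx = B.colptr, rowvals(B), nonzeros(B)

    # Pass 1: count the merged entries of every column.
    counts = zeros(Ti, n)
    for j in 1:n
        ia, ib = Ap[j], Bp[j]
        ea, eb = Ap[j+1], Bp[j+1]        # exclusive end positions
        c = 0
        while ia < ea && ib < eb
            ra, rb = Ai[ia], Bi[ib]
            ia += ra <= rb               # advance A if its row is not larger
            ib += rb <= ra               # advance B if its row is not larger
            c += 1                       # equal rows advance both, counted once
        end
        counts[j] = c + (ea - ia) + (eb - ib)   # leftovers of either column
    end

    # Build the column pointers from a cumulative sum of the counts.
    Cp = Vector{Ti}(undef, n + 1)
    Cp[1] = 1
    Cp[2:end] .= cumsum(counts) .+ one(Ti)

    # Pass 2: merge the entries, summing values that share a position.
    nnzC = Cp[end] - 1
    Ci = Vector{Ti}(undef, nnzC)
    Cx = Vector{Tv}(undef, nnzC)
    for j in 1:n
        ia, ib, k = Ap[j], Bp[j], Cp[j]
        ea, eb = Ap[j+1], Bp[j+1]
        while ia < ea || ib < eb
            ra = ia < ea ? Ai[ia] : typemax(Ti)   # sentinel when A is exhausted
            rb = ib < eb ? Bi[ib] : typemax(Ti)   # sentinel when B is exhausted
            if ra == rb
                Ci[k] = ra; Cx[k] = Ax[ia] + Bx[ib]; ia += 1; ib += 1
            elseif ra < rb
                Ci[k] = ra; Cx[k] = Ax[ia]; ia += 1
            else
                Ci[k] = rb; Cx[k] = Bx[ib]; ib += 1
            end
            k += 1
        end
    end
    return SparseMatrixCSC(m, n, Cp, Ci, Cx)
end

A = sparse([1, 3, 2], [1, 1, 2], [1.0, 2.0, 3.0], 3, 2)
B = sparse([1, 2, 2], [1, 1, 2], [10.0, 20.0, 30.0], 3, 2)
C = add_csc_two_pass(A, B)
```

Counting first means the output arrays can be allocated once at their exact size, which is why the merge logic runs twice. The CSR case is the same sketch with rows and columns swapped.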

COO Format: Sort + compact algorithm

  1. Concatenate coordinate arrays from both matrices
  2. Sort by (row, col) using lexicographic keys
  3. Mark unique entries and compute write indices
  4. Compact by merging duplicate entries (summing values)
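A minimal CPU sketch of these four steps follows. The function name `add_coo_sort_compact` and the scalar key `(row - 1) * n + col` are assumptions made for illustration; the actual `kernel_mark_unique_coo!` and `kernel_compact_coo!` kernels may be organized differently. The key assumes that `m * n` fits in the index type.

```julia
# CPU sketch of sort + compact for COO + COO (hypothetical helper).
# Inputs are the coordinate arrays of two matrices with n columns each.
function add_coo_sort_compact(n, rowA, colA, valA, rowB, colB, valB)
    # 1. Concatenate the coordinate arrays of both matrices.
    rows = vcat(rowA, rowB)
    cols = vcat(colA, colB)
    vals = vcat(valA, valB)
    isempty(rows) && return rows, cols, vals

    # 2. Sort by (row, col) through a single lexicographic key.
    key = (rows .- 1) .* n .+ cols
    perm = sortperm(key)
    key, rows, cols, vals = key[perm], rows[perm], cols[perm], vals[perm]

    # 3. Mark the first entry of every run of equal keys; the running sum
    #    of the marks is the output position of each input entry.
    is_first = vcat(true, key[2:end] .!= key[1:end-1])
    write_idx = cumsum(is_first)

    # 4. Compact: keep one coordinate per run and sum the values of the run.
    out_rows = rows[is_first]
    out_cols = cols[is_first]
    out_vals = zeros(eltype(vals), write_idx[end])
    for k in eachindex(vals)
        out_vals[write_idx[k]] += vals[k]
    end
    return out_rows, out_cols, out_vals
end

r, c, v = add_coo_sort_compact(3, [1, 2], [1, 2], [1.0, 2.0], [2, 1], [2, 3], [5.0, 7.0])
```

In this example the entry at (2, 2) appears in both inputs and is merged into a single value of 7.0. On a GPU the accumulation loop in step 4 would need a per-run reduction or atomic adds, since several inputs can write to the same output slot.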

GPU Compatibility: Avoided scalar indexing using broadcasting operations (e.g., colptr_C[1:1] .= one(Ti))
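As a rough illustration of that pattern, the pointer array can be built from per-column counts using only range assignments and a cumulative sum. The helper name `build_colptr` is hypothetical, and the sketch assumes the array type supports `cumsum` and broadcasting, as Base `Array` does.

```julia
# Build a 1-based pointer array from per-column counts without indexing
# single elements (hypothetical helper; `counts[j]` is the number of
# stored entries in column j).
function build_colptr(counts::AbstractVector{Ti}) where {Ti<:Integer}
    n = length(counts)
    colptr = similar(counts, n + 1)
    colptr[1:1] .= one(Ti)                       # range + broadcast, not colptr[1] = 1
    colptr[2:end] .= cumsum(counts) .+ one(Ti)   # inclusive scan shifted by one
    return colptr
end

colptr = build_colptr([3, 1, 0, 2])
```

Writing through the one-element range `1:1` keeps the assignment a broadcast operation, so it does not trigger the scalar-indexing check that device array types typically enforce.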

Test Results:

1478 tests passed (Base Array + JLArray backends)
✅ All sparse + sparse addition tests pass
✅ Works correctly with overlapping and non-overlapping entries
✅ Proper dimension checking and error handling
✅ GPU-compatible (no scalar indexing issues)
✅ COO format now properly merges duplicates

Original prompt

This section details the original issue you should resolve

<issue_title>Implement sum between two AbstractDeviceSparseMatrix</issue_title>
<issue_description>Implement the method +(A::AbstractDeviceSparseMatrix, B::AbstractDeviceSparseMatrix) efficiently. Perhaps just between matrices of the same format, like CSC-CSC, CSR-CSR and COO-COO.</issue_description>

<agent_instructions>Avoid any kind of conversion to CPU. It should be efficiently oriented for GPU kernels, as the other implementations. Add related tests, following the same style of the others, and add benchmarks as well.</agent_instructions>

Comments on the Issue (you are @copilot in this section)


Copilot AI changed the title from "[WIP] Implement sum method for AbstractDeviceSparseMatrix" to "Implement sparse + sparse matrix addition for CSC, CSR, and COO formats" Nov 16, 2025
Copilot finished work on behalf of albertomercurio November 16, 2025 12:28
Owner

@albertomercurio albertomercurio left a comment

@copilot Overall it seems a good job. I'm wondering if we can also implement the sum among transposed/adjointed matrices, like +(A::DeviceSparseMatrixCSC, B::Transpose{T, <:DeviceSparseMatrixCSC}), and in case how to do it. Should one convert the transposed matrix to CSR? If so, can the sum be efficiently computed between a CSC and CSR?

If not, are there other direct ways to do it, like the approach used for the mul! or dot functions?

Comment on lines 372 to 391
nnz_A = nnz(A)
nnz_B = nnz(B)
nnz_total = nnz_A + nnz_B

# Allocate result arrays
rowind_C = similar(getrowind(A), nnz_total)
colind_C = similar(getcolind(A), nnz_total)
nzval_C = similar(nonzeros(A), Tv, nnz_total)

# Copy entries from A
rowind_C[1:nnz_A] .= getrowind(A)
colind_C[1:nnz_A] .= getcolind(A)
nzval_C[1:nnz_A] .= nonzeros(A)

# Copy entries from B
rowind_C[(nnz_A+1):end] .= getrowind(B)
colind_C[(nnz_A+1):end] .= getcolind(B)
nzval_C[(nnz_A+1):end] .= nonzeros(B)

return DeviceSparseMatrixCOO(m, n, rowind_C, colind_C, nzval_C)
Owner

@copilot Here you are just concatenating the nonzero values, but the same (row, col) position can appear in both matrices, and those entries should be merged instead. Could you take that into account as well?

Check that the other formats merge duplicates too.

Contributor Author

Fixed in commit 778e354. COO addition now properly merges duplicate entries by:

  1. Concatenating and sorting by (row, col)
  2. Marking unique entries
  3. Compacting with duplicate merging using kernel_compact_coo!

All three formats (CSC, CSR, COO) now correctly merge duplicates. Verified with tests.

Contributor Author

Copilot AI commented Nov 16, 2025

@copilot Overall it seems a good job. I'm wondering if we can also implement the sum among transposed/adjointed matrices, like +(A::DeviceSparseMatrixCSC, B::Transpose{T, <:DeviceSparseMatrixCSC}), ...

Good question! Adding support for transposed/adjointed matrices would be valuable. Here are the options:

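Option 1: Treat the CSC transpose as CSR (efficient, but needs new kernels)

  • transpose(CSC) has the same storage layout as CSR: the column pointers and row indices of the parent can be read as the row pointers and column indices of the transpose, so no data is moved
  • CSC + transpose(CSC) then becomes a CSC + CSR addition
  • Would need new kernels to merge CSC and CSR formats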

Option 2: Use existing pattern from mul!/dot (Also efficient)

  • Similar to how mul! handles transposed matrices with Val{TRANS} parameters
  • Could add kernels that work directly with transposed indices
  • Keeps both matrices in original format

Option 3: Convert to same format (Simpler but less efficient)

  • Materialize the transposed operand in the same format as the other operand first (for example, convert transpose(CSC) to an explicit CSC matrix)
  • Then use existing addition kernels
  • Extra memory allocation but reuses existing code

Recommendation: Option 1 (CSC+CSR merge) or Option 2 (direct transpose handling) would be most efficient. Option 2 follows the existing mul! pattern more closely. Should I implement one of these approaches?
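The claim behind Option 1 can be checked with a small CPU sketch. The `SimpleCSR` container and the helper names below are hypothetical and exist only for this illustration; they are not types from the package. The sketch covers the plain transpose only (an adjoint would additionally need the values conjugated).

```julia
using SparseArrays

# Hypothetical minimal CSR container, for illustration only.
struct SimpleCSR{Tv,Ti}
    m::Int
    n::Int
    rowptr::Vector{Ti}
    colval::Vector{Ti}
    nzval::Vector{Tv}
end

# Reinterpret transpose(A) as CSR by reusing the arrays of A unchanged:
# colptr is read as rowptr, and the row indices are read as column indices.
csr_of_transpose(A::SparseMatrixCSC) =
    SimpleCSR(size(A, 2), size(A, 1), A.colptr, rowvals(A), nonzeros(A))

# Expand the CSR container to a dense matrix so the result can be compared.
function to_dense(S::SimpleCSR{Tv}) where {Tv}
    D = zeros(Tv, S.m, S.n)
    for i in 1:S.m, k in S.rowptr[i]:(S.rowptr[i+1] - 1)
        D[i, S.colval[k]] += S.nzval[k]
    end
    return D
end

A = sparse([1, 3, 2], [1, 1, 2], [1.0, 2.0, 3.0], 3, 2)
At = csr_of_transpose(A)
```

Since the reinterpretation is free, the remaining cost of Option 1 lies entirely in a kernel that merges a column-major operand with a row-major one, which is the part that does not yet exist.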

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • https://api.github.com/repos/FluxML/MacroTools.jl/tarball/1e0228a030642014fe5cfe68c2c0a818f9e3f522
    • Triggering command: julia --project -e using Pkg; Pkg.instantiate() (http block)
  • https://api.github.com/repos/JuliaArrays/ArrayInterface.jl/tarball/d81ae5489e13bc03567d4fbbb06c546a5e53c857
    • Triggering command: julia --project -e using Pkg; Pkg.instantiate() (http block)
  • https://api.github.com/repos/JuliaArrays/StaticArrays.jl/tarball/b8693004b385c842357406e3af647701fe783f98
    • Triggering command: julia --project -e using Pkg; Pkg.instantiate() (http block)
  • https://api.github.com/repos/JuliaArrays/StaticArraysCore.jl/tarball/6ab403037779dae8c514bad259f32a447262455a
    • Triggering command: julia --project -e using Pkg; Pkg.instantiate() (http block)
  • https://api.github.com/repos/JuliaBinaryWrappers/LLVMExtra_jll.jl/tarball/8e76807afb59ebb833e9b131ebf1a8c006510f33
    • Triggering command: julia --project=test -e using Pkg; Pkg.update() (http block)
  • https://api.github.com/repos/JuliaCollections/OrderedCollections.jl/tarball/05868e21324cede2207c6f0f466b4bfef6d5e7ee
    • Triggering command: julia --project=test -e using Pkg; Pkg.update() (http block)
  • https://api.github.com/repos/JuliaConcurrent/Atomix.jl/tarball/29bb0eb6f578a587a49da16564705968667f5fa8
    • Triggering command: julia --project -e using Pkg; Pkg.instantiate() (http block)
  • https://api.github.com/repos/JuliaConcurrent/UnsafeAtomics.jl/tarball/b13c4edda90890e5b04ba24e20a310fbe6f249ff
    • Triggering command: julia --project -e using Pkg; Pkg.instantiate() (http block)
  • https://api.github.com/repos/JuliaDebug/CodeTracking.jl/tarball/9ce926a33a8608421a4d45c012884165b3fcd3ee
    • Triggering command: julia --project=test -e using Pkg; Pkg.update() (http block)
  • https://api.github.com/repos/JuliaDebug/JuliaInterpreter.jl/tarball/b6c76964c65ebf8309460fb8f0f437b4a59d809b
    • Triggering command: julia --project=test -e using Pkg; Pkg.update() (http block)
  • https://api.github.com/repos/JuliaDebug/LoweredCodeUtils.jl/tarball/e24491cb83551e44a69b9106c50666dea9d953ab
    • Triggering command: julia --project=test -e using Pkg; Pkg.update() (http block)
  • https://api.github.com/repos/JuliaGPU/AcceleratedKernels.jl/tarball/0de01460ed11e90b42ce666c8ed0265bad59aa6a
    • Triggering command: julia --project -e using Pkg; Pkg.instantiate() (http block)
  • https://api.github.com/repos/JuliaGPU/Adapt.jl/tarball/7e35fca2bdfba44d797c53dfe63a51fabf39bfc0
    • Triggering command: julia --project -e using Pkg; Pkg.instantiate() (http block)
  • https://api.github.com/repos/JuliaGPU/GPUArrays.jl/tarball/6eff50be44a36e068a3c4b558d718feb8c178d00
    • Triggering command: julia --project=test -e using Pkg; Pkg.update() (http block)
  • https://api.github.com/repos/JuliaGPU/GPUArrays.jl/tarball/83cf05ab16a73219e5f6bd1bdfa9848fa24ac627
    • Triggering command: julia --project -e using Pkg; Pkg.instantiate() (http block)
  • https://api.github.com/repos/JuliaGPU/GPUArrays.jl/tarball/8ddb438e956891a63a5367d7fab61550fc720026
    • Triggering command: julia --project=test -e using Pkg; Pkg.update() (http block)
  • https://api.github.com/repos/JuliaGPU/KernelAbstractions.jl/tarball/b5a371fcd1d989d844a4354127365611ae1e305f
    • Triggering command: julia --project -e using Pkg; Pkg.instantiate() (http block)
  • https://api.github.com/repos/JuliaInterop/CEnum.jl/tarball/389ad5c84de1ae7cf0e28e381131c98ea87d54fc
    • Triggering command: julia --project=test -e using Pkg; Pkg.update() (http block)
  • https://api.github.com/repos/JuliaLLVM/LLVM.jl/tarball/ce8614210409eaa54ed5968f4b50aa96da7ae543
    • Triggering command: julia --project=test -e using Pkg; Pkg.update() (http block)
  • https://api.github.com/repos/JuliaLang/BaseCompiler.jl/tarball/382d79bfe72a406294faca39ef0c3cef6e6ce1f1
    • Triggering command: julia --project=test -e using Pkg; Pkg.update() (http block)
  • https://api.github.com/repos/JuliaLang/Compat.jl/tarball/9d8a54ce4b17aa5bdce0ea5c34bc5e7c340d16ad
    • Triggering command: julia --project=test -e using Pkg; Pkg.update() (http block)
  • https://api.github.com/repos/JuliaLang/JuliaSyntax.jl/tarball/0d4b3dab95018bcf3925204475693d9f09dc45b8
    • Triggering command: julia --project=test -e using Pkg; Pkg.update() (http block)
  • https://api.github.com/repos/JuliaLang/PrecompileTools.jl/tarball/07a921781cab75691315adc645096ed5e370cb77
    • Triggering command: julia --project -e using Pkg; Pkg.instantiate() (http block)
  • https://api.github.com/repos/JuliaLang/ScopedValues.jl/tarball/c3b2323466378a2ba15bea4b2f73b081e022f473
    • Triggering command: julia --project=test -e using Pkg; Pkg.update() (http block)
  • https://api.github.com/repos/JuliaPackaging/JLLWrappers.jl/tarball/0533e564aae234aff59ab625543145446d8b6ec2
    • Triggering command: julia --project=test -e using Pkg; Pkg.update() (http block)
  • https://api.github.com/repos/JuliaPackaging/Preferences.jl/tarball/0f27480397253da18fe2c12a4ba4eb9eb208bf3d
    • Triggering command: julia --project -e using Pkg; Pkg.instantiate() (http block)
  • https://api.github.com/repos/JuliaPackaging/Requires.jl/tarball/62389eeff14780bfe55195b7204c0d8738436d64
    • Triggering command: julia --project -e using Pkg; Pkg.instantiate() (http block)
  • https://api.github.com/repos/JuliaStats/Statistics.jl/tarball/ae3bb1eb3bba077cd276bc5cfc337cc65c3075c0
    • Triggering command: julia --project=test -e using Pkg; Pkg.update() (http block)
  • https://api.github.com/repos/JuliaTesting/Aqua.jl/tarball/d57fd255a8932b6509baf43284c416fc44d0b903
    • Triggering command: julia --project=test -e using Pkg; Pkg.update() (http block)
  • https://api.github.com/repos/aviatesk/JET.jl/tarball/ca8ece730f1cd51f313e37e059a7ff827ad48115
    • Triggering command: julia --project=test -e using Pkg; Pkg.update() (http block)
  • https://api.github.com/repos/jw3126/ArgCheck.jl/tarball/f9e9a66c9b7be1ad7372bbd9b062d9230c30c5ce
    • Triggering command: julia --project -e using Pkg; Pkg.instantiate() (http block)
  • https://api.github.com/repos/simonster/Reexport.jl/tarball/45e428421666073eab6f2da5c9d310d99bb12f9b
    • Triggering command: julia --project=test -e using Pkg; Pkg.update() (http block)
  • https://api.github.com/repos/timholy/Revise.jl/tarball/85d94c2be31f58728cd69d13f2e0bdd7ecf6dfe9
    • Triggering command: julia --project=test -e using Pkg; Pkg.update() (http block)
  • https://api.github.com/repos/vchuravy/HashArrayMappedTries.jl/tarball/2eaa69a7cab70a52b9687c8bf950a5a93ec895ae
    • Triggering command: julia --project=test -e using Pkg; Pkg.update() (http block)
  • pkg.julialang.org
    • Triggering command: julia --project -e using Pkg; Pkg.instantiate() (dns block)
    • Triggering command: julia --project=test -e using Pkg; Pkg.update() (dns block)
    • Triggering command: julia --project -e using Pkg; Pkg.update(); Pkg.test() (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot finished work on behalf of albertomercurio November 16, 2025 15:50

codecov bot commented Nov 16, 2025

Codecov Report

❌ Patch coverage is 93.44978% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.04%. Comparing base (59526b2) to head (778e354).
⚠️ Report is 1 commits behind head on main.

Files with missing lines               Patch %   Lines
src/matrix_coo/matrix_coo_kernels.jl   65.00%    7 Missing ⚠️
src/matrix_csc/matrix_csc_kernels.jl   93.44%    4 Missing ⚠️
src/matrix_csr/matrix_csr_kernels.jl   93.44%    4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #27      +/-   ##
==========================================
+ Coverage   82.64%   85.04%   +2.40%     
==========================================
  Files          14       14              
  Lines         801     1030     +229     
==========================================
+ Hits          662      876     +214     
- Misses        139      154      +15     


@albertomercurio albertomercurio marked this pull request as ready for review November 16, 2025 22:41

@github-actions github-actions bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Benchmark Results'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.30.

Benchmark suite                 Current: 778e354   Previous: 862b4d5   Ratio
Kronecker Product/JLArray/CSC   16593217 ns        12002090.5 ns       1.38
Kronecker Product/JLArray/COO   3484290 ns         2334051 ns          1.49
Kronecker Product/JLArray/CSR   16622938 ns        11984809 ns         1.39

This comment was automatically generated by workflow using github-action-benchmark.
