Skip to content

Accept AbstractSparseMatrixCSC for decompression#298

Merged
gdalle merged 10 commits intoJuliaDiff:mainfrom
blegat:abstractsparse
Mar 20, 2026
Merged

Accept AbstractSparseMatrixCSC for decompression#298
gdalle merged 10 commits intoJuliaDiff:mainfrom
blegat:abstractsparse

Conversation

@blegat
Copy link
Contributor

@blegat blegat commented Jan 19, 2026

Here is a suggestion for #296 (comment)
I can write tests if the change sounds reasonable

@codecov
Copy link

codecov bot commented Jan 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.52%. Comparing base (0996030) to head (e93a786).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #298      +/-   ##
==========================================
- Coverage   99.39%   95.52%   -3.87%     
==========================================
  Files          20       20              
  Lines        2145     2145              
==========================================
- Hits         2132     2049      -83     
- Misses         13       96      +83     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@gdalle gdalle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there are any API specifications on SparseArrays.AbstractSparseMatrixCSC, and since we make heavy use of the fields of SparseMatrixCSC, this seems like a risky move. Can you explain where it comes in handy?

@blegat
Copy link
Contributor Author

blegat commented Jan 20, 2026

Yes, the decompression relies on a few properties of SparseMatrixCSC. It seems that it needs to be able to

  • Get the size
  • Get colptr, currently it is accessing it directly via the field but it could call getcolptr instead
  • Call nonzeros

In my case, the nonzeros is a view and I want to avoid having to copy it to a Vector so that it fits in a SparseMatrixCSC. So I define a custom AbstractSparseMatrixCSC:
https://github.com/blegat/ArrayDiff.jl/pull/23/files#diff-47c27891e951c8cd946b850dc2df31082624afdf57446c21cb6992f5f4b74aa2R207-R215

The properties that the implementation relies on are that getcolptr is the colptr if a CSC representation, that seems safe to assume and that the nonzeros are sorted in increasing row indices (per column), which is perhaps the one more risky but we may also just consider that it's part of the definition of the CSC format.
The other option would be defining a custom struct in this package (so basically just copy-pasting the one I did in ArrayDiff). Then there is the risk of someone needed yet another struct.
Both options look fine to me, let me know which one you prefer ;)

@gdalle
Copy link
Member

gdalle commented Jan 29, 2026

The properties that the implementation relies on are that getcolptr is the colptr if a CSC representation, that seems safe to assume and that the nonzeros are sorted in increasing row indices (per column), which is perhaps the one more risky but we may also just consider that it's part of the definition of the CSC format.

There are CSC matrices in the ecosystem where the sorting is not enforced, see eg JuliaGPU/GPUArrays.jl#648. The question is whether there could be subtypes of SparseArrays.AbstractSparseMatrixCSC where that is also the case, and the answer is probably: since there is no documentation, anything goes?

Given that uncertainty, I kinda like the suggestion you made in #296 to split out a decompression subroutine that uses an AbstractVector (probably with 1-based, contiguous indexing checked upfront).

@blegat
Copy link
Contributor Author

blegat commented Jan 29, 2026

Reading more the implementation of the decompression, it seemed that it was also using the colptr vector while I initially though it was only using the nonzeros when I wrote that comment. Hence the change of mind with this PR. The implementation of the decompression is however not reading the rowvals since it assumes that it is sorted. So one option could be to pass along both the colptr and the nonzeros.
Another alternative is for SMC to define a new function is_rowvals_increasing_per_column with a default implementation returning false, implement it for SparseMatrixCSC and then ArrayDiff could implement it as well for its custom sparse matrix. The, we can check it and error if it returns false. Let me know what you prefer

@blegat blegat requested a review from gdalle February 12, 2026 15:07
@gdalle
Copy link
Member

gdalle commented Feb 18, 2026

I think I'd rather add an internal indirection step like

function decompress!(A::SparseMatrixCSC, ...)
    return decompress_csc!(A.nzval, A.rowval, A.colptr, A.m, A.n, ...)
end

than rely on an incompletely specified AbstractSparseMatrixCSC API. You can then use decompress_csc! in ArrayDiff, until we figure out something cleaner?

blegat and others added 2 commits March 9, 2026 15:07
Co-authored-by: Guillaume Dalle <22795598+gdalle@users.noreply.github.com>
@codecov
Copy link

codecov bot commented Mar 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.20%. Comparing base (f56da29) to head (2fd25ee).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #298      +/-   ##
==========================================
+ Coverage   95.52%   99.20%   +3.68%     
==========================================
  Files          20       20              
  Lines        2145     2145              
==========================================
+ Hits         2049     2128      +79     
+ Misses         96       17      -79     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@gdalle gdalle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Since this only affects private internals, I see no reason to delay it any further. I'll merge and release so that you may experiment with it, then we'll see whether we need to extend what you coded to other result types

@gdalle gdalle merged commit 9aebbe6 into JuliaDiff:main Mar 20, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants