Support sparse matrices in vectorize_counts_and_tree#1

Open
r0hansaxena wants to merge 10 commits into main from sparse-fix

r0hansaxena (Owner) commented Mar 10, 2026

Please complete the following checklist:

  • I have read the contribution guidelines.

  • I have documented all public-facing changes in the changelog.

  • This pull request includes code, documentation, or other content derived from external source(s). If this is the case, ensure the external source's license is compatible with scikit-bio's license. Include the license in the licenses directory and add a comment in the code giving proper attribution. Ensure any other requirements set forth by the license and/or author are satisfied.

    • It is your responsibility to disclose code, documentation, or other content derived from external source(s). If you have questions about whether something can be included in the project or how to give proper attribution, include those questions in your pull request and a reviewer will assist you.
  • This pull request does not include code, documentation, or other content derived from external source(s).

Note: This document may also be helpful to see some of the things code reviewers will be verifying when reviewing your pull request.


Description

Adds support for sparse matrix inputs to vectorize_counts_and_tree, improving memory efficiency when working with large biological datasets. Internally, sparse inputs are routed to a new sparse-aware helper that uses SciPy sparse operations instead of expanding the input into a dense matrix.
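
For reviewers, here is a minimal sketch of the general idea, not the code in this PR. It assumes the per-node counts can be obtained by multiplying a sparse sample-by-tip counts matrix with a sparse tip-by-node indicator matrix, so nothing is ever densified. The function name counts_by_node_sparse, the tip_to_nodes argument, and the toy tree are all hypothetical and exist only for illustration.

```python
import numpy as np
from scipy import sparse


def counts_by_node_sparse(counts, tip_to_nodes):
    """Aggregate tip counts onto every node of a tree without densifying.

    Hypothetical helper for illustration only.

    counts       : (n_samples, n_tips) array-like or SciPy sparse matrix
    tip_to_nodes : for each tip, the indices of all nodes on its path to
                   the root (including the tip itself)
    """
    counts = sparse.csr_matrix(counts)
    n_tips = counts.shape[1]

    # Build the tip-by-node indicator in coordinate form.
    rows, cols = [], []
    for tip, nodes in enumerate(tip_to_nodes):
        for node in nodes:
            rows.append(tip)
            cols.append(node)
    n_nodes = max(cols) + 1
    data = np.ones(len(rows), dtype=np.int64)
    indicator = sparse.csr_matrix((data, (rows, cols)), shape=(n_tips, n_nodes))

    # Sparse @ sparse stays sparse: (n_samples, n_tips) @ (n_tips, n_nodes).
    return counts @ indicator


# Toy tree: tips 0 and 1 share internal node 3; tip 2 hangs off root 4.
tip_to_nodes = [[0, 3, 4], [1, 3, 4], [2, 4]]
counts = sparse.csr_matrix(np.array([[1, 0, 2], [0, 3, 0]]))

result = counts_by_node_sparse(counts, tip_to_nodes)
print(result.toarray())
```

In this toy case each row of the result holds one sample's counts for nodes 0 through 4, and the output remains a SciPy sparse matrix, which is where the memory saving for large, mostly-zero tables would come from.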

r0hansaxena and others added 9 commits March 11, 2026 00:06
* Update action versions in gh action workflows

* Specify cibuildwheel version

* ENH: add graphembed_rs wrapping for network embedding (scikit-bio#2212)

* Fix lint errors in _graphembed.py

* ENH: Parse string taxonomies into tree

* Add changelog entry for string parsing (scikit-bio#2406)

* ENH: Generalize taxonomy parsing in TreeNode.from_taxonomy

* Restrict rank regex to standard prefixes and add extract_rank option

* Update tests to Greengenes2 nomenclature and test extract_rank

* fix Greengenes

* docs: generalize taxonomy parsing description

* Improve taxonomy rank extraction and null handling in from_taxonomy

* cleaning up graphembed commits

2 participants