added random_landmarking support for precomputed distance/affinity#88
added random_landmarking support for precomputed distance/affinity#88MattScicluna wants to merge 5 commits intomasterfrom
Conversation
Pull Request Test Coverage Report for Build 20349244901Details
💛 - Coveralls |
There was a problem hiding this comment.
Pull request overview
This PR adds support for random landmarking with precomputed distance and affinity matrices, addressing issue #87. Previously, random landmarking only worked with raw data using Euclidean distance computations. Now users can provide precomputed distance or affinity matrices and use random landmarking for landmark-based dimensionality reduction.
Key changes:
- Extended
build_landmark_op()to detect and handle precomputed matrices by using affinity-based clustering instead of distance-based clustering - Added test coverage for random landmarking with precomputed affinity matrices
- Preserved existing functionality for non-precomputed graphs
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| graphtools/graphs.py | Modified build_landmark_op() method to check for precomputed matrices and use kernel-based affinity clustering (argmax) instead of distance-based clustering (argmin) when precomputed matrices are detected |
| test/test_random_landmarking.py | Added new test test_random_landmarking_with_precomputed_affinity() to verify random landmarking works correctly with precomputed affinity matrices |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def test_random_landmarking_with_precomputed_affinity(): | ||
| """Random landmarking should work with precomputed affinity matrices""" | ||
| affinity = np.array( | ||
| [ | ||
| [1.0, 0.8, 0.1, 0.0, 0.0, 0.0], | ||
| [0.8, 1.0, 0.2, 0.0, 0.0, 0.0], | ||
| [0.1, 0.2, 1.0, 0.9, 0.4, 0.0], | ||
| [0.0, 0.0, 0.9, 1.0, 0.5, 0.2], | ||
| [0.0, 0.0, 0.4, 0.5, 1.0, 0.9], | ||
| [0.0, 0.0, 0.0, 0.2, 0.9, 1.0], | ||
| ] | ||
| ) | ||
| affinity = (affinity + affinity.T) / 2 # ensure symmetry | ||
| n_landmark = 3 | ||
| random_state = 42 | ||
|
|
||
| G = graphtools.Graph( | ||
| affinity, | ||
| precomputed="affinity", | ||
| n_landmark=n_landmark, | ||
| random_landmarking=True, | ||
| random_state=random_state, | ||
| knn=3, | ||
| thresh=0, | ||
| ) | ||
|
|
||
| # Trigger landmark construction | ||
| _ = G.landmark_op | ||
|
|
||
| rng = np.random.default_rng(random_state) | ||
| landmark_indices = rng.choice(affinity.shape[0], n_landmark, replace=False) | ||
| expected_clusters = np.asarray( | ||
| G.kernel[:, landmark_indices].argmax(axis=1) | ||
| ).reshape(-1) | ||
|
|
||
| assert np.array_equal(G.clusters, expected_clusters) | ||
| assert G.transitions.shape == (affinity.shape[0], n_landmark) | ||
| assert G.landmark_op.shape == (n_landmark, n_landmark) | ||
|
|
There was a problem hiding this comment.
While this test covers precomputed="affinity", there is no test coverage for precomputed="distance". Since the implementation in graphs.py handles precomputed distance matrices by converting them to affinity matrices in build_kernel(), it would be valuable to add a similar test case that verifies random landmarking works correctly with precomputed distance matrices as well.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def test_random_landmarking_with_precomputed_affinity(): | ||
| """Random landmarking should work with precomputed affinity matrices""" | ||
| affinity = np.array( | ||
| [ | ||
| [1.0, 0.8, 0.1, 0.0, 0.0, 0.0], | ||
| [0.8, 1.0, 0.2, 0.0, 0.0, 0.0], | ||
| [0.1, 0.2, 1.0, 0.9, 0.4, 0.0], | ||
| [0.0, 0.0, 0.9, 1.0, 0.5, 0.2], | ||
| [0.0, 0.0, 0.4, 0.5, 1.0, 0.9], | ||
| [0.0, 0.0, 0.0, 0.2, 0.9, 1.0], | ||
| ] | ||
| ) | ||
| affinity = (affinity + affinity.T) / 2 # ensure symmetry | ||
| n_landmark = 3 | ||
| random_state = 42 | ||
|
|
||
| G = graphtools.Graph( | ||
| affinity, | ||
| precomputed="affinity", | ||
| n_landmark=n_landmark, | ||
| random_landmarking=True, | ||
| random_state=random_state, | ||
| knn=3, | ||
| thresh=0, | ||
| ) | ||
|
|
||
| # Trigger landmark construction | ||
| _ = G.landmark_op | ||
|
|
||
| rng = np.random.default_rng(random_state) | ||
| landmark_indices = rng.choice(affinity.shape[0], n_landmark, replace=False) | ||
| expected_clusters = np.asarray( | ||
| G.kernel[:, landmark_indices].argmax(axis=1) | ||
| ).reshape(-1) | ||
|
|
||
| assert np.array_equal(G.clusters, expected_clusters) | ||
| assert G.transitions.shape == (affinity.shape[0], n_landmark) | ||
| assert G.landmark_op.shape == (n_landmark, n_landmark) | ||
|
|
There was a problem hiding this comment.
The test only covers dense precomputed affinity matrices. Consider adding test coverage for sparse precomputed affinity matrices to ensure the sparse matrix handling code path (lines 1210-1217 in graphs.py) works correctly with random landmarking.
| def test_random_landmarking_with_precomputed_distance(): | ||
| """Random landmarking should work with precomputed distance matrices""" | ||
| dist = np.array( | ||
| [ | ||
| [0, 1, 4, 4, 4, 4], | ||
| [1, 0, 4, 4, 4, 4], | ||
| [4, 4, 0, 1, 4, 4], | ||
| [4, 4, 1, 0, 4, 4], | ||
| [4, 4, 4, 4, 0, 1], | ||
| [4, 4, 4, 4, 1, 0], | ||
| ] | ||
| ) | ||
|
|
||
| n_landmark = 3 | ||
| random_state = 42 | ||
|
|
||
| G = graphtools.Graph( | ||
| dist, | ||
| precomputed="distance", | ||
| n_landmark=n_landmark, | ||
| random_landmarking=True, | ||
| random_state=random_state, | ||
| bandwidth=1, # deterministic affinity: exp(-dist) | ||
| decay=1, | ||
| thresh=0, | ||
| knn=3, | ||
| ) | ||
|
|
||
| # Trigger landmark construction | ||
| _ = G.landmark_op | ||
|
|
||
| rng = np.random.default_rng(random_state) | ||
| landmark_indices = rng.choice(dist.shape[0], n_landmark, replace=False) | ||
| expected_clusters = np.asarray( | ||
| G.kernel[:, landmark_indices].argmax(axis=1) | ||
| ).reshape(-1) | ||
|
|
||
| assert np.array_equal(G.clusters, expected_clusters) | ||
| assert G.transitions.shape == (dist.shape[0], n_landmark) | ||
| assert G.landmark_op.shape == (n_landmark, n_landmark) | ||
|
|
There was a problem hiding this comment.
The test only covers dense precomputed distance matrices. Consider adding test coverage for sparse precomputed distance matrices to ensure the sparse matrix code path works correctly with random landmarking.
| def test_random_landmarking_with_precomputed_affinity(): | ||
| """Random landmarking should work with precomputed affinity matrices""" | ||
| affinity = np.array( | ||
| [ | ||
| [1.0, 0.8, 0.1, 0.0, 0.0, 0.0], | ||
| [0.8, 1.0, 0.2, 0.0, 0.0, 0.0], | ||
| [0.1, 0.2, 1.0, 0.9, 0.4, 0.0], | ||
| [0.0, 0.0, 0.9, 1.0, 0.5, 0.2], | ||
| [0.0, 0.0, 0.4, 0.5, 1.0, 0.9], | ||
| [0.0, 0.0, 0.0, 0.2, 0.9, 1.0], | ||
| ] | ||
| ) | ||
| affinity = (affinity + affinity.T) / 2 # ensure symmetry | ||
| n_landmark = 3 | ||
| random_state = 42 | ||
|
|
||
| G = graphtools.Graph( | ||
| affinity, | ||
| precomputed="affinity", | ||
| n_landmark=n_landmark, | ||
| random_landmarking=True, | ||
| random_state=random_state, | ||
| knn=3, | ||
| thresh=0, | ||
| ) | ||
|
|
||
| # Trigger landmark construction | ||
| _ = G.landmark_op | ||
|
|
||
| rng = np.random.default_rng(random_state) | ||
| landmark_indices = rng.choice(affinity.shape[0], n_landmark, replace=False) | ||
| expected_clusters = np.asarray( | ||
| G.kernel[:, landmark_indices].argmax(axis=1) | ||
| ).reshape(-1) | ||
|
|
||
| assert np.array_equal(G.clusters, expected_clusters) | ||
| assert G.transitions.shape == (affinity.shape[0], n_landmark) | ||
| assert G.landmark_op.shape == (n_landmark, n_landmark) | ||
|
|
There was a problem hiding this comment.
Consider adding a test case for the edge case where some samples have zero affinity to all randomly selected landmarks. This would test the warning logic implemented in lines 1223-1228 of graphs.py, which is currently not covered by these tests.
graphtools/graphs.py
Outdated
| precomputed = getattr(self, "precomputed", None) | ||
|
|
||
| if precomputed is not None: | ||
| # Use the precomputed affinities/distances directly to avoid Euclidean fallback |
There was a problem hiding this comment.
The comment could be clearer. It says "Use the precomputed affinities/distances directly" but the code actually uses self.kernel, which is always an affinity matrix (distances are converted to affinities in build_kernel). Consider updating to "Use affinities from the kernel computed from the precomputed matrix" for clarity.
| # Use the precomputed affinities/distances directly to avoid Euclidean fallback | |
| # Use affinities from the kernel computed from the precomputed matrix to avoid Euclidean fallback |
|
@copilot open a new pull request to apply changes based on the comments in this thread |
|
@MattScicluna I've opened a new pull request, #89, to work on those changes. Once the pull request is ready, I'll request review from you. |
…prove comment clarity Co-authored-by: MattScicluna <19255250+MattScicluna@users.noreply.github.com>
Add sparse matrix and edge case test coverage for random landmarking with precomputed matrices
close #87