Several data structures in the HLSCM decimation and solver pipeline use heap-heavy containers where cache-friendly flat arrays would be more efficient. Identified during code review of PR #44.
## Items

### High priority
- **UV storage** (`solveLSCMLevel`/`prolongateUVs`): `unordered_map<size_t, array<T,2>>` is used throughout; entries are individually heap-allocated, random access is cache-hostile, and prolongation performs a full map copy. Vertex indices are dense integers, so a `vector<array<T,2>>` with a `SIZE_MAX`/NaN sentinel would be far more cache-efficient.
- **`vertexNeighbors()` after collapse**: allocates a fresh `vector` after each successful collapse to discover new PQ edges. The same neighbor data is already computed inside `tryCollapse` via `fillSortedNeighbors`; the collapse result should return those neighbors to avoid the redundant work.
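A minimal sketch of the flat UV storage proposed above, assuming vertex indices are dense. The class and member names (`UVStore`, `has`, `set`, `get`) are illustrative, not taken from the HLSCM code:

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Dense UV storage indexed by vertex id; a NaN first component marks
// "no UV assigned yet", replacing the presence check an unordered_map gives.
template <typename T>
class UVStore {
public:
    explicit UVStore(std::size_t vertexCount) {
        const T nan = std::numeric_limits<T>::quiet_NaN();
        uvs_.assign(vertexCount, std::array<T, 2>{nan, nan});
    }

    void set(std::size_t v, T u, T w) { uvs_[v] = {u, w}; }

    bool has(std::size_t v) const { return !std::isnan(uvs_[v][0]); }

    const std::array<T, 2>& get(std::size_t v) const { return uvs_[v]; }

private:
    std::vector<std::array<T, 2>> uvs_;  // contiguous, one slot per vertex
};
```

Prolongation then becomes a copy of one contiguous buffer instead of a rehash of every map entry.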
### Medium priority
- **`HierarchyLevel::originalToLocal`**: `unordered_map<size_t,size_t>` queried per vertex in `snapshot()` and `solveLSCMLevel`. A `vector<size_t>` indexed by original vertex index (with a `SIZE_MAX` sentinel) turns each lookup into a single indexed load, with no hashing and better cache behavior.
- **`buildLevelMesh`**: converts `array<size_t,3>` faces to a `vector<vector<size_t>>` per face just to call `insert_faces()`, creating 500K+ small heap allocations at the finest level. Either add an `insert_faces` overload accepting `array<size_t,3>` spans or use a flat index buffer.
- **`buildEdges_()`**: rebuilds the full edge set from scratch at each hierarchy level by iterating all faces. Edges could instead be maintained incrementally during collapses, reusing the neighbor results already computed by `vertexNeighbors`.
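The sentinel-indexed `originalToLocal` replacement can be sketched as below. The helper name `buildOriginalToLocal` and the `localToOriginal` input are assumptions for illustration; the real field layout may differ:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kInvalid = SIZE_MAX;  // "not present at this level"

// localToOriginal[i] = original index of local vertex i (assumed available
// when the hierarchy level is built).
std::vector<std::size_t> buildOriginalToLocal(
    const std::vector<std::size_t>& localToOriginal,
    std::size_t originalVertexCount) {
    std::vector<std::size_t> map(originalVertexCount, kInvalid);
    for (std::size_t local = 0; local < localToOriginal.size(); ++local)
        map[localToOriginal[local]] = local;
    return map;
}
```

Callers in `snapshot()` and `solveLSCMLevel` would check `map[orig] != kInvalid` instead of `find() != end()`.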
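For the `buildLevelMesh` item, a flat buffer avoids the per-face `vector<vector<size_t>>` allocations entirely; one contiguous block holds all face indices. `flattenFaces` is a hypothetical helper, shown only to illustrate the layout an `insert_faces` overload could consume:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Pack triangle faces into one contiguous buffer: face i occupies
// flat[3*i .. 3*i+2]. One allocation instead of one per face.
std::vector<std::size_t> flattenFaces(
    const std::vector<std::array<std::size_t, 3>>& faces) {
    std::vector<std::size_t> flat;
    flat.reserve(faces.size() * 3);
    for (const auto& f : faces)
        flat.insert(flat.end(), f.begin(), f.end());
    return flat;
}
```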
## Expected impact
On meshes with 500K+ vertices, these changes eliminate millions of heap allocations per hierarchy level, significantly reducing decimation and solver time.