Threaded mesh topology construction improvements#4083
Conversation
schnellerhase
left a comment
There was a problem hiding this comment.
Removes thread spawning for num_threads=1, recovering previous sequential execution path. New sorting algorithm affects non-threaded execution path.
| const common::IndexMap& vertex_index_map) | ||
| auto build_entity_list | ||
| = [](std::span<std::int32_t> entity_list, | ||
| std::span<std::int32_t> entity_list_sorted, auto&& cell_idx, |
There was a problem hiding this comment.
Restrict to std::int32_t ranges.
| int num_entities_per_cell = cell_type_entities[k].size(); | ||
| std::vector<std::jthread> threads(num_threads); | ||
| for (int i = 0; i < num_threads; ++i) | ||
| if (num_threads > 1) |
There was a problem hiding this comment.
Should allow for compile time deduction for no threading configuration. Resolvable with #3716.
There was a problem hiding this comment.
No need to overcomplicate - it will have no measurable runtime effect.
There was a problem hiding this comment.
Not about performance. It ensures optionality of the functionality. When passing num_threads around explicitly this needs to be deduced from the associated type.
There was a problem hiding this comment.
I don't follow. The if/else switch isn't inside a hot loop, so what's the issue?
There was a problem hiding this comment.
It would be good to maintain optionality of the threading support. We can (should) not change function signatures with an altered configuration, so this would need to be indicated by the type of the argument.
There was a problem hiding this comment.
It is optional now - '0' is no thread launch.
| std::vector<std::int32_t> sort_order( | ||
| entity_list_sorted.size() / num_vertices_per_entity, 0); | ||
| std::iota(sort_order.begin(), sort_order.end(), 0); | ||
| boost::sort::parallel_stable_sort( |
There was a problem hiding this comment.
Adds new library dependency to Boost.Sort.sort_by_perm in particular avoids lexicographical_compare - performance impact?
There was a problem hiding this comment.
Significantly faster (several factors)
… garth/mesh-hot-loops
dolfinx::sort_permdidn't respectBITStemplate parameter. PR fixes this, which gives a significant performance boost for some cases.