Auto select CAGRA build algorithm for hnsw::build #1719
tfeher wants to merge 10 commits into rapidsai:main from
Conversation
```cpp
size_t total_host =
  graph_host_mem + host_workspace_size + 2e9;  // added 2 GB extra workspace (IVF-PQ search)
size_t total_dev =
  std::max(dataset_gpu_mem, gpu_workspace_size) + 1e9;  // added 1 GB extra workspace size
```
This is still optimistic, we need to update and test before the PR can be merged.
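For illustration, the estimate under review amounts to something like the following standalone sketch. The 2 GB and 1 GB slack constants are the ones from the quoted diff; the input sizes in the usage below are hypothetical placeholders, not measured values.

```cpp
#include <algorithm>
#include <cstddef>

// Sketch of the memory heuristic quoted above: total host/device bytes
// required for the IVF-PQ based build. The fixed slack terms (2 GB host,
// 1 GB device) come from the diff under review.
struct MemoryEstimate {
  size_t total_host;
  size_t total_dev;
};

inline MemoryEstimate estimate_build_memory(size_t graph_host_mem,
                                            size_t host_workspace_size,
                                            size_t dataset_gpu_mem,
                                            size_t gpu_workspace_size)
{
  MemoryEstimate e;
  // Host side: graph plus workspace plus 2 GB extra (IVF-PQ search).
  e.total_host = graph_host_mem + host_workspace_size + static_cast<size_t>(2e9);
  // Device side: dataset and workspace are assumed not to peak at the same
  // time, hence max() rather than a sum, plus 1 GB extra workspace.
  e.total_dev =
    std::max(dataset_gpu_mem, gpu_workspace_size) + static_cast<size_t>(1e9);
  return e;
}
```

As the comment above notes, these constants are optimistic and would need to be validated against real runs before relying on them.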
```cpp
hnsw::index_params params;
params.M = 24;
params.ef_construction = 200;
params.hierarchy = cuvs::neighbors::hnsw::HnswHierarchy::GPU;
```
```cpp
int64_t topk = 12;

// HNSW index parameters
hnsw::index_params params;
```
We need to figure out how to handle ace.build_dir in this setup: the user does not set ace_params. The algorithm is automatically selected. But if we happen to choose ace, then we need to know which disk space to use. Do you have suggestions @julianmi?
I think this is a somewhat challenging problem. In general, I agree with @benfred's comment that hardcoding it is not a good approach.
I think the two most important properties are available disk space and speed. How about a layered approach:
- (1) Environment variable, e.g. `CUVS_ACE_BUILD_DIR`.
- (2) Read `/sys/block` on Linux to query for fast disks (NVMe > SSD > HDD) and check for free space.
- (3) Fall back to `/tmp` or a generated temporary path like @benfred suggested.
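The layered resolution discussed above could be sketched roughly as follows. `CUVS_ACE_BUILD_DIR` is the variable name proposed in this thread (not an existing cuvs setting), and the disk-probing step (2) is omitted, so this falls straight through to a temporary path.

```cpp
#include <cstdlib>
#include <filesystem>
#include <string>

// Hypothetical sketch of a layered build-dir resolution:
// (1) explicit user choice via environment variable,
// (3) fall back to the system temporary directory.
// Step (2), probing /sys/block for fast disks, is intentionally left out
// since the thread questions whether writing without user consent is okay.
inline std::string resolve_ace_build_dir()
{
  if (const char* env = std::getenv("CUVS_ACE_BUILD_DIR")) {
    return std::string(env);  // (1) user-provided directory
  }
  return std::filesystem::temp_directory_path().string();  // (3) fallback
}
```

Whether (3) is acceptable at all is still open in this thread, since `/tmp` may be small or slow; failing with a helpful message is the proposed alternative.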
I agree with a layered approach using only (1) and (3) which are both agreed on by the user to write to -- I don't think we can just write to a drive without user consent.
I agree that (2) is problematic. But (3) has similar problems, right? /tmp could be very small or on a slow disk.
Another approach I see would be to fail with a helpful message that a disk will be used given the graph size and the user should provide a suitable directory.
In general, I am not sure if environment variables are a good approach since the project does not use them a lot. @tfeher, @cjnolet What do you think?
mfoerste4 left a comment:
I did not go over all memory estimates in detail, but I suggest aligning the predictions with real data.
Is autotuning of ACE params part of a different PR? Besides the open question on the file location we might want to at least set the number of partitions dynamically.
```cpp
    raft::make_host_matrix_view<const T, int64_t>(dataset, nrow, this->dim_));
}

auto dataset_view = raft::make_host_matrix_view<const T, int64_t>(dataset, nrow, this->dim_);
```
Is the data expected to always reside in host memory?
ACE only supports host memory right now. The main reason is that we expect the data size to be large and memory-mapped. Further, we do the partitioning and reordering on the host, since there is no benefit in moving the data to the GPU only to write it to disk afterwards.
Anyways, I think we can support device datasets easily since these should not end up using ACE with this heuristic. @tfeher What do you think?
```cpp
namespace helpers {
/** Calculates the workspace for graph optimization
 *
 * @param[in] n_rows number of rows in the dataset (or number of points in the grapt)
```
Suggested change:

```diff
- * @param[in] n_rows number of rows in the dataset (or number of points in the grapt)
+ * @param[in] n_rows number of rows in the dataset (or number of points in the graph)
```
Also, `mst_optimize` is not documented.
```cpp
// ACE build and search example.
cagra_build_search_ace(res);
```
Maybe we want to rename this to something more generic now that the selection is hidden from the user.
```cpp
// Configure ACE parameters for CAGRA
cuvs::neighbors::cagra::graph_build_params::ace_params cagra_ace_params;
cagra_ace_params.npartitions = ace_params.npartitions;
cagra_ace_params.ef_construction = params.ef_construction;
cagra_ace_params.build_dir = ace_params.build_dir;
cagra_ace_params.use_disk = ace_params.use_disk;
cagra_ace_params.max_host_memory_gb = ace_params.max_host_memory_gb;
cagra_ace_params.max_gpu_memory_gb = ace_params.max_gpu_memory_gb;
cagra_params.graph_build_params = cagra_ace_params;
```
Are you planning to add a heuristic for npartitions depending on dimensions here as well?
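To make the question concrete, a partition-count heuristic of the kind being asked about could look like the sketch below. This is purely hypothetical: the budget split and slack factor are assumptions for illustration, not the heuristic from #1603 or anything in cuvs.

```cpp
#include <cstddef>

// Hypothetical sketch: pick enough partitions that one partition's slice of
// the dataset fits within half of a given memory budget (the other half is
// assumed to be working space). Not the actual cuvs heuristic.
inline size_t suggest_npartitions(size_t n_rows, size_t dim, size_t elem_size,
                                  size_t memory_budget_bytes)
{
  size_t dataset_bytes        = n_rows * dim * elem_size;
  size_t per_partition_budget = memory_budget_bytes / 2;
  if (per_partition_budget == 0) { return 1; }
  // Ceiling division; always return at least one partition.
  size_t parts = (dataset_bytes + per_partition_budget - 1) / per_partition_budget;
  return parts < 1 ? 1 : parts;
}
```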
julianmi left a comment:
I did not get a chance to fully review the memory heuristics yet. I wonder how we can test it though. Should max_host_memory_gb and max_gpu_memory_gb be optional HNSW parameters that we could use to test that the expected algorithm is used based on memory limits set?
```cpp
constexpr static uint32_t kIndexGroupSize = 32;
constexpr static uint32_t kIndexGroupVecLen = 16;

std::cout << "pq_dim " << params.pq_dim << ", pq_bits " << params.pq_bits << ", n_lists"
```
Is there a specific reason not to use RAFT_LOG_INFO here and in the following lines?
```cpp
  }
}

inline std::pair<size_t, size_t> get_available_memory(
```
Should this helper be placed in cuvs::util:: like get_free_host_memory?
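For reference, a host-only sketch of what such a helper might query is shown below. This is an assumption for illustration, not the cuvs implementation: a real helper would pair this with `cudaMemGetInfo` for the device side, which is omitted here so the sketch stays compilable without CUDA. `_SC_AVPHYS_PAGES` is a Linux extension to POSIX `sysconf`.

```cpp
#include <unistd.h>
#include <cstddef>

// Hypothetical host-side counterpart to the get_available_memory helper
// discussed above: query currently free physical memory via sysconf.
// Returns 0 if the query is unsupported on this platform.
inline size_t get_free_host_memory_bytes()
{
  long pages     = sysconf(_SC_AVPHYS_PAGES);  // Linux-specific extension
  long page_size = sysconf(_SC_PAGESIZE);
  if (pages < 0 || page_size < 0) { return 0; }
  return static_cast<size_t>(pages) * static_cast<size_t>(page_size);
}
```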
```diff
  }();

- RAFT_LOG_DEBUG("# Building IVF-PQ index %s", model_name.c_str());
+ RAFT_LOG_INFO("# Building IVF-PQ index %s", model_name.c_str());
```
Were this and the following logging changes intentional? Logging every 10 seconds might write a lot of output on a large run.
```cpp
size_t total_dev =
  std::max(dataset_gpu_mem, gpu_workspace_size) + 1e9;  // added 1 GB extra workspace size

std::cout << "IVF-PQ build memory requirements\ndataset_gpu " << dataset_gpu_mem / 1e9 << " GB"
```
Similar comment about using RAFT_LOG_INFO.
Co-authored-by: Julian Miller <mail@julian-miller.de>
… data file while tracking memory usage
Configuring the HNSW graph build using CAGRA is complicated because CAGRA offers multiple build algorithms. This PR implements an automatic algorithm selection. The goal is a simplified API where the user only needs to set the two parameters that control graph size and quality (`M` and `ef_construction`, respectively). This should be familiar to HNSW users and allows easier adoption of cuvs-accelerated HNSW graph building.

If we have enough memory (host and GPU) to do both the KNN graph build and the graph optimization in memory, then we choose the in-memory build and let `cagra::index_params::from_hnsw_params` derive the additional configuration parameters. If the build would require more memory than is available, then we choose the ACE method and let the number of partitions be derived using #1603.

For the host we query the OS for available memory; for the GPU it is assumed that the whole device memory is available.
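The selection described above can be sketched as follows. The function and enum names are hypothetical placeholders, not the cuvs API; the point is only the shape of the decision: in-memory build when the estimated requirements fit, ACE otherwise.

```cpp
#include <cstddef>

// Hypothetical sketch of the automatic algorithm selection described in this
// PR: compare estimated requirements against available host and GPU memory.
enum class BuildAlgo { InMemory, Ace };

inline BuildAlgo select_build_algo(size_t required_host, size_t required_dev,
                                   size_t available_host, size_t available_dev)
{
  // Both the host-side and device-side requirements must fit for the
  // in-memory build; otherwise fall back to the partitioned ACE build.
  bool fits = required_host <= available_host && required_dev <= available_dev;
  return fits ? BuildAlgo::InMemory : BuildAlgo::Ace;
}
```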