chore: custom kernel launcher API to remove macro by 0ax1 · Pull Request #6112 · vortex-data/vortex

0ax1 · 2026-01-22T22:11:32Z

@joseph-isaacs claude draft, wdyt? I'll do a clean up if we think this is the right direction.

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

codspeed-hq · 2026-01-22T22:17:41Z

CodSpeed Performance Report

Merging this PR will degrade performance by 29.68%

Comparing `ad/new-launch-kernel-fn` (5cd76db) with `develop` (238d063)

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

⚡ 3 improved benchmarks
❌ 15 regressed benchmarks
✅ 1256 untouched benchmarks
⏩ 1254 skipped benchmarks¹

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|---|
| ❌ | WallTime | `u64_FoR[10K]` | 9.7 µs | 13.8 µs | -29.68% |
| ❌ | WallTime | `u16_FoR[1K]` | 5.6 µs | 6.9 µs | -17.79% |
| ❌ | WallTime | `u16_FoR[10M]` | 9.7 µs | 11.4 µs | -14.41% |
| ⚡ | Simulation | `canonical_into_nullable[(10000, 10, 0.0)]` | 528.8 µs | 444.6 µs | +18.93% |
| ❌ | Simulation | `into_canonical_non_nullable[(10000, 100, 0.01)]` | 2.2 ms | 3 ms | -26.73% |
| ❌ | Simulation | `into_canonical_non_nullable[(10000, 100, 0.0)]` | 1.9 ms | 2.7 ms | -29.35% |
| ❌ | Simulation | `into_canonical_non_nullable[(10000, 100, 0.1)]` | 3.8 ms | 4.6 ms | -17.82% |
| ⚡ | Simulation | `into_canonical_nullable[(10000, 10, 0.0)]` | 537.4 µs | 452.3 µs | +18.82% |
| ⚡ | Simulation | `into_canonical_nullable[(10000, 10, 0.1)]` | 710.5 µs | 632.3 µs | +12.37% |
| ❌ | Simulation | `into_canonical_nullable[(10000, 100, 0.0)]` | 4.4 ms | 5.2 ms | -15.77% |
| ❌ | Simulation | `patched_take_10k_contiguous_not_patches` | 1.2 ms | 1.4 ms | -10.19% |
| ❌ | Simulation | `canonical_into_non_nullable[(10000, 1, 0.01)]` | 36 µs | 44.2 µs | -18.45% |
| ❌ | Simulation | `patched_take_10k_contiguous_patches` | 2 ms | 2.5 ms | -16.83% |
| ❌ | Simulation | `canonical_into_non_nullable[(10000, 1, 0.1)]` | 52 µs | 60.2 µs | -13.71% |
| ❌ | Simulation | `canonical_into_non_nullable[(10000, 1, 0.0)]` | 30.9 µs | 39 µs | -20.79% |
| ❌ | Simulation | `canonical_into_non_nullable[(10000, 100, 0.0)]` | 1.9 ms | 2.7 ms | -29.5% |
| ❌ | Simulation | `canonical_into_non_nullable[(10000, 100, 0.01)]` | 2.1 ms | 2.9 ms | -27.4% |
| ❌ | Simulation | `canonical_into_non_nullable[(10000, 100, 0.1)]` | 3.7 ms | 4.5 ms | -18.03% |

¹ 1254 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.

joseph-isaacs · 2026-01-23T10:37:31Z

vortex-cuda/src/kernel/launcher.rs

+//! let events = KernelLauncher::new(ctx, "for", &[array.ptype()])?
+//!     .arg_view(&cuda_view)
+//!     .arg(&reference)
+//!     .arg(&array_len)
+//!     .event_flags(CU_EVENT_DISABLE_TIMING)
+//!     .launch(array.len())?;


this is great
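For readers without the diff at hand, the chaining style in the doc example above can be sketched roughly as follows. This is a minimal, CPU-only illustration and not the PR's implementation: `KernelArg`, `LaunchBuilder`, `LaunchPlan`, and the fixed block size of 256 are all hypothetical names and values chosen for the sketch, and `launch` only computes a plan instead of calling into CUDA.

```rust
/// Hypothetical stand-in for a kernel argument. The real launcher stores
/// CUDA-compatible values, which this sketch does not model.
#[derive(Debug, Clone, PartialEq)]
pub enum KernelArg {
    DevicePtr(u64),
    U32(u32),
    U64(u64),
}

/// Hypothetical builder mirroring the chained calls in the doc example.
#[derive(Debug)]
pub struct LaunchBuilder {
    kernel_name: String,
    args: Vec<KernelArg>,
    event_flags: u32,
}

/// What `launch` would hand to the driver in this sketch.
#[derive(Debug, PartialEq)]
pub struct LaunchPlan {
    pub kernel_name: String,
    pub args: Vec<KernelArg>,
    pub event_flags: u32,
    pub grid_dim: u32,
    pub block_dim: u32,
}

impl LaunchBuilder {
    /// Assumed block size; the real value is not stated in the thread.
    const BLOCK_DIM: usize = 256;

    pub fn new(kernel_name: &str) -> Self {
        Self {
            kernel_name: kernel_name.to_string(),
            args: Vec::new(),
            event_flags: 0,
        }
    }

    /// Appends one argument; order of calls is the order of kernel parameters.
    pub fn arg(mut self, arg: KernelArg) -> Self {
        self.args.push(arg);
        self
    }

    pub fn event_flags(mut self, flags: u32) -> Self {
        self.event_flags = flags;
        self
    }

    /// Consumes the builder and computes a 1D launch configuration for `len` elements.
    pub fn launch(self, len: usize) -> Result<LaunchPlan, String> {
        if len == 0 {
            return Err("nothing to launch".to_string());
        }
        // Round up so the last, partially filled block is still launched.
        let blocks = (len + Self::BLOCK_DIM - 1) / Self::BLOCK_DIM;
        let grid_dim = u32::try_from(blocks).map_err(|_| "grid too large".to_string())?;
        Ok(LaunchPlan {
            kernel_name: self.kernel_name,
            args: self.args,
            event_flags: self.event_flags,
            grid_dim,
            block_dim: Self::BLOCK_DIM as u32,
        })
    }
}
```

Because every setter takes and returns `self`, a call site reads top to bottom like the doc example, and the builder cannot be reused after `launch` consumes it.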

joseph-isaacs · 2026-01-23T10:38:11Z

vortex-cuda/src/kernel/launcher.rs

+    pub fn new(
+        ctx: &'a CudaExecutionCtx,
+        module_name: &str,
+        ptypes: &[PType],


This is not enough for non-primitive types?
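One way the parameter could be widened, purely as an illustration of the concern: accept a type descriptor that is either a primitive or a free-form name. Everything here is an assumption made for the sketch, including the `Prim` and `KernelType` enums, the `kernel_symbol` helper, and the `module_type` naming scheme; the thread does not say how the PR derives kernel symbol names.

```rust
use std::fmt;

/// Hypothetical mirror of the primitive types listed in the launcher docs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Prim {
    U8, U16, U32, U64,
    I8, I16, I32, I64,
    F32, F64,
}

impl fmt::Display for Prim {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            Prim::U8 => "u8", Prim::U16 => "u16", Prim::U32 => "u32", Prim::U64 => "u64",
            Prim::I8 => "i8", Prim::I16 => "i16", Prim::I32 => "i32", Prim::I64 => "i64",
            Prim::F32 => "f32", Prim::F64 => "f64",
        };
        f.write_str(name)
    }
}

/// Hypothetical descriptor: a primitive, or any other type identified by name.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum KernelType {
    Prim(Prim),
    Named(String),
}

impl From<Prim> for KernelType {
    fn from(p: Prim) -> Self {
        KernelType::Prim(p)
    }
}

impl From<&str> for KernelType {
    fn from(name: &str) -> Self {
        KernelType::Named(name.to_string())
    }
}

/// Builds a symbol such as `for_u32` from a module name and its type parameters.
/// The underscore-joined scheme is an assumption for this sketch.
pub fn kernel_symbol(module_name: &str, types: &[KernelType]) -> String {
    let mut symbol = String::from(module_name);
    for ty in types {
        symbol.push('_');
        match ty {
            KernelType::Prim(p) => symbol.push_str(&p.to_string()),
            KernelType::Named(name) => symbol.push_str(name),
        }
    }
    symbol
}
```

With the two `From` impls, a caller can mix both kinds in one slice, for example `&[Prim::U16.into(), "decimal128".into()]`.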

joseph-isaacs · 2026-01-23T10:38:27Z

vortex-cuda/src/kernel/launcher.rs

+    /// - Integers: u8, u16, u32, u64, i8, i16, i32, i64
+    /// - Floats: f32, f64


What about str literal?

joseph-isaacs · 2026-01-23T10:39:36Z

vortex-cuda/src/kernel/launcher.rs

+        // The _sync guard is dropped immediately, but that's fine since we're just
+        // reading the pointer value, not scheduling any work yet.
+        let (device_ptr, _sync) = view.device_ptr(self.stream);
+        self.storage.push(device_ptr);


what keeps the view alive?
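A common Rust answer to this kind of question is to tie the builder's lifetime to the borrowed view, so the compiler rejects code that drops the view before launch. The sketch below assumes hypothetical `DeviceView` and `ArgList` types backed by host memory; it is not the PR's code, and it only guarantees the view outlives the builder. Keeping the buffer alive until an asynchronous kernel has finished would additionally require tying ownership to whatever `launch` returns.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for a device buffer view (host memory in this sketch).
pub struct DeviceView {
    data: Vec<u32>,
}

impl DeviceView {
    pub fn new(data: Vec<u32>) -> Self {
        Self { data }
    }

    /// Returns the raw pointer that would be passed to the kernel.
    pub fn device_ptr(&self) -> *const u32 {
        self.data.as_ptr()
    }
}

/// Hypothetical argument list that stores raw pointers but still carries the
/// borrow of every view it was built from, via the `'a` lifetime.
pub struct ArgList<'a> {
    ptrs: Vec<*const u32>,
    _borrowed: PhantomData<&'a DeviceView>,
}

impl<'a> ArgList<'a> {
    pub fn new() -> Self {
        Self {
            ptrs: Vec::new(),
            _borrowed: PhantomData,
        }
    }

    /// Taking `&'a DeviceView` means the view must outlive this `ArgList`,
    /// even though only the raw pointer is stored.
    pub fn arg_view(mut self, view: &'a DeviceView) -> Self {
        self.ptrs.push(view.device_ptr());
        self
    }

    /// Consumes the list; in this sketch it just returns the collected pointers.
    pub fn launch(self) -> Vec<*const u32> {
        self.ptrs
    }
}
```

With this shape, calling `drop(view)` between `arg_view(&view)` and `launch()` fails to compile, because the view is still borrowed by the `ArgList`.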


0ax1 added the `changelog/chore` (A trivial change) label on Jan 22, 2026

joseph-isaacs reviewed Jan 23, 2026

0ax1 wants to merge 1 commit into `develop` from `ad/new-launch-kernel-fn`
