[Performance] ClipNDArray with @out does redundant copy #595

@Nucs

Problem

When an @out parameter is supplied, np.maximum, np.minimum and np.clip make two passes over the data:

  1. np.copyto(@out, lhs) - copy input to output
  2. ClipArrayMin(@out, min) - clip in-place

Root Cause

ClipArrayMin<T>(T* output, T* minArr, int size) operates in place. It has no separate src parameter, so the input must first be copied into @out before it can be clipped.
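
To make the cost concrete, here is a minimal sketch of the two-pass pattern described above. It is not the library's code: the class and method names are hypothetical, and it uses Span<T> with IComparable<T> instead of the raw T* pointers and size argument of the real kernel, so that it compiles without unsafe blocks. NaN handling is deliberately left out.

```csharp
using System;

public static class TwoPassClipSketch
{
    // Hypothetical stand-in for the in-place kernel: raises every element
    // of output that is below minArr[i] up to minArr[i].
    public static void ClipArrayMinInPlace<T>(Span<T> output, ReadOnlySpan<T> minArr)
        where T : IComparable<T>
    {
        for (int i = 0; i < output.Length; i++)
        {
            if (output[i].CompareTo(minArr[i]) < 0)
                output[i] = minArr[i];
        }
    }

    // Hypothetical caller: because the kernel has no source operand,
    // the input is copied first and then clipped.
    public static void MaximumWithOut<T>(ReadOnlySpan<T> lhs, ReadOnlySpan<T> minArr, Span<T> @out)
        where T : IComparable<T>
    {
        lhs.CopyTo(@out);                   // pass 1: full copy of the input
        ClipArrayMinInPlace(@out, minArr);  // pass 2: read and rewrite @out
    }
}
```

Calling TwoPassClipSketch.MaximumWithOut<int>(new[] { 1, 5, 9 }, new[] { 3, 3, 3 }, dst) with a three-element dst leaves dst as { 3, 5, 9 }, but every element of dst is written in pass 1 and then read again in pass 2.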

Fix

Add 3-operand kernel variants:

ClipArrayMinFromSource<T>(T* dest, T* src, T* minArr, int size)
// dest[i] = max(src[i], minArr[i]) - single pass

Same for ClipArrayMax and ClipArrayBounds.
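
A sketch of what the three-operand variants could look like follows. Only the name ClipArrayMinFromSource comes from the proposal above. The other two method names, the Span<T> signatures (the proposal uses raw T* pointers plus a size) and the IComparable<T> constraint are assumptions made so the example stays self-contained and safe. NaN propagation is not modeled.

```csharp
using System;

public static class ClipFromSourceSketch
{
    // dest[i] = max(src[i], minArr[i]), computed in a single pass.
    public static void ClipArrayMinFromSource<T>(
        Span<T> dest, ReadOnlySpan<T> src, ReadOnlySpan<T> minArr)
        where T : IComparable<T>
    {
        for (int i = 0; i < src.Length; i++)
            dest[i] = src[i].CompareTo(minArr[i]) < 0 ? minArr[i] : src[i];
    }

    // dest[i] = min(src[i], maxArr[i]) (hypothetical name).
    public static void ClipArrayMaxFromSource<T>(
        Span<T> dest, ReadOnlySpan<T> src, ReadOnlySpan<T> maxArr)
        where T : IComparable<T>
    {
        for (int i = 0; i < src.Length; i++)
            dest[i] = src[i].CompareTo(maxArr[i]) > 0 ? maxArr[i] : src[i];
    }

    // dest[i] = min(max(src[i], minArr[i]), maxArr[i]) (hypothetical name).
    public static void ClipArrayBoundsFromSource<T>(
        Span<T> dest, ReadOnlySpan<T> src,
        ReadOnlySpan<T> minArr, ReadOnlySpan<T> maxArr)
        where T : IComparable<T>
    {
        for (int i = 0; i < src.Length; i++)
        {
            T v = src[i];
            if (v.CompareTo(minArr[i]) < 0) v = minArr[i];  // apply lower bound
            if (v.CompareTo(maxArr[i]) > 0) v = maxArr[i];  // then upper bound
            dest[i] = v;
        }
    }
}
```

Each element of src is read once and each element of dest is written once, so the separate np.copyto step is no longer needed. Because every index is read before it is written, passing the same buffer as both dest and src would also reproduce the old in-place behavior.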

Impact

Affects np.maximum, np.minimum, np.fmax, np.clip when @out is provided.

Labels

api: Public API surface (np.*, NDArray methods, operators)
performance: Performance improvements or optimizations