Performance Guide

RustCall.jl provides several features for speeding up calls from Julia into Rust code. This guide covers best practices and tuning tips for compilation, function calls, and memory management.

Compilation Caching
LLVM Optimization
Function Call Optimization
Memory Management
Benchmark Results
Performance Tuning Tips

Compilation Caching

RustCall.jl automatically caches compiled Rust libraries. This eliminates the need to recompile the same code and significantly reduces startup time.

How Caching Works

Cache Key: Generated from code hash, compiler settings, and target triple
Cache Location: ~/.julia/compiled/vX.Y/RustCall/
Automatic Verification: Cache integrity is checked without user intervention
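
As a rough illustration of how a key could be derived from those three inputs, the sketch below hashes the source text, the compiler settings, and the target triple into one value. It is written in Rust for illustration only: `CompilerSettings`, its fields, and `cache_key` are hypothetical names, and RustCall.jl's actual key derivation may differ.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical subset of compiler settings that affect the compiled output.
#[derive(Hash)]
pub struct CompilerSettings {
    pub optimization_level: u8,
    pub emit_debug_info: bool,
}

/// Combine source code, settings, and target triple into one cache key.
/// Changing any of the three inputs yields a different key (barring hash
/// collisions), which forces a recompile instead of a cache hit.
///
/// Note: `DefaultHasher` is not guaranteed to be stable across Rust
/// releases, so a real on-disk cache would want a fixed algorithm.
pub fn cache_key(code: &str, settings: &CompilerSettings, target_triple: &str) -> String {
    let mut hasher = DefaultHasher::new();
    code.hash(&mut hasher);
    settings.hash(&mut hasher);
    target_triple.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}
```

Calling `cache_key` twice with identical inputs returns the same string, while changing only the optimization level or only the target triple produces a different one.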

Cache Management

using RustCall

# Check cache size (named cache_bytes to avoid shadowing Base.size)
cache_bytes = RustCall.get_cache_size()
println("Cache size: $(round(cache_bytes / 1024 / 1024; digits=2)) MB")

# List cached libraries
libraries = RustCall.list_cached_libraries()
println("Cached libraries: $(length(libraries))")

# Cleanup old cache (older than 30 days)
RustCall.cleanup_old_cache(30)

# Clear cache completely
RustCall.clear_cache()

Cache Best Practices

During Development: Keep cache enabled to reduce recompilation time
Production: Warm up cache beforehand to avoid first-run delays
CI/CD: Save and restore cache to reduce build time

LLVM Optimization

RustCall.jl supports optimization at the LLVM IR level. Using the @rust_llvm macro enables more advanced optimizations.

Optimization Level Settings

using RustCall

# Create optimization configuration
config = RustCall.OptimizationConfig(
    level=3,  # 0-3 (3 is most optimized)
    enable_vectorization=true,
    enable_loop_unrolling=true,
    enable_licm=true
)

rust_code = """
#[no_mangle]
pub extern "C" fn compute(x: f64) -> f64 {
    x * x + 1.0
}
"""

# Compile Rust to LLVM IR and load module
wrapped = RustCall.wrap_rust_code(rust_code)
compiler = RustCall.get_default_compiler()
ir_path = RustCall.compile_rust_to_llvm_ir(wrapped; compiler=compiler)
rust_mod = RustCall.load_llvm_ir(ir_path; source_code=wrapped)
mod = rust_mod.mod

# Apply optimization
RustCall.optimize_module!(mod; config=config)
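
To make the `enable_licm` flag concrete: loop-invariant code motion hoists computations that do not change between iterations out of the loop. The Rust sketch below writes both forms by hand. The function names are illustrative only, and whether LLVM actually performs the transformation on a given loop depends on the pass pipeline and the surrounding code.

```rust
/// Naive form: `scale * offset` is recomputed on every iteration.
pub fn weighted_sum_naive(data: &[f64], scale: f64, offset: f64) -> f64 {
    let mut total = 0.0;
    for &x in data {
        total += x * (scale * offset);
    }
    total
}

/// Hoisted form: the invariant product is computed once before the loop,
/// which is the shape LICM aims to produce automatically.
pub fn weighted_sum_hoisted(data: &[f64], scale: f64, offset: f64) -> f64 {
    let factor = scale * offset;
    let mut total = 0.0;
    for &x in data {
        total += x * factor;
    }
    total
}
```

Both functions perform the same floating-point operations in the same order, so they return identical results; only the amount of repeated work differs.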

Optimization Presets

# Speed-optimized
RustCall.optimize_for_speed!(mod)

# Size-optimized
RustCall.optimize_for_size!(mod)

Optimization Level Selection

Level 0: No optimization (for debugging)
Level 1: Basic optimizations
Level 2: Standard optimizations (default)
Level 3: Maximum optimization (may take longer to compile)

Function Call Optimization

`@rust` vs `@rust_llvm`

@rust: Standard call via ccall. Highly stable, recommended for most cases
@rust_llvm: Call via LLVM IR integration (experimental). Offers more optimization potential but has limitations with some types

# Standard call (recommended)
result = @rust add(Int32(10), Int32(20))::Int32

# LLVM integration call (experimental)
result = @rust_llvm add(Int32(10), Int32(20))

Type Inference Optimization

Explicit type specification can reduce type inference overhead:

# With type inference (slightly slower)
result = @rust add(10, 20)

# Explicit type specification (recommended)
result = @rust add(Int32(10), Int32(20))::Int32

Function Registration Optimization

Frequently called functions can be optimized by registering them beforehand:

# Register function for LLVM path
RustCall.compile_and_register_rust_function("""
#[no_mangle]
pub extern "C" fn add(a: i32, b: i32) -> i32 {
    a + b
}
""", "add")

# Call through LLVM path
result = @rust_llvm add(Int32(10), Int32(20))

Memory Management

Efficient Use of Ownership Types

Ownership types (RustBox, RustRc, RustArc, RustVec) prevent memory leaks when used appropriately:

# Temporary allocations are automatically cleaned up
box = RustCall.RustBox(Int32(42))
# Automatically dropped after use

# Explicit drop (when early release is needed)
RustCall.drop!(box)
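
For intuition about what an owned box involves on the Rust side, here is a minimal sketch of an allocate/release pair built on `Box::into_raw` and `Box::from_raw`. The names `box_new_i32` and `box_drop_i32` are hypothetical and are not RustCall.jl's exported symbols; when built into a shared library they would also need the `#[no_mangle]` attribute used elsewhere in this guide.

```rust
/// Allocate an `i32` on the Rust heap and hand ownership to the caller.
/// The caller must eventually pass the pointer to `box_drop_i32`.
pub extern "C" fn box_new_i32(value: i32) -> *mut i32 {
    Box::into_raw(Box::new(value))
}

/// Reclaim and free a pointer produced by `box_new_i32`.
/// A null pointer is ignored.
///
/// # Safety
/// `ptr` must be null or a pointer returned by `box_new_i32` that has
/// not already been freed.
pub unsafe extern "C" fn box_drop_i32(ptr: *mut i32) {
    if !ptr.is_null() {
        // Rebuilding the Box transfers ownership back to Rust, and
        // dropping it releases the allocation.
        drop(unsafe { Box::from_raw(ptr) });
    }
}
```

Every pointer from the allocating function must reach the releasing function exactly once, which is why an explicit early drop and an automatic drop must not both free the same value.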

Efficient Use of RustVec

RustVec is a type for manipulating Rust's Vec<T> from Julia. Best practices when handling large amounts of data:

# Create RustVec from Julia array
julia_vec = Int32[1, 2, 3, 4, 5]
rust_vec = RustCall.create_rust_vec(julia_vec)

# Efficient bulk copy (recommended)
result = Vector{Int32}(undef, length(rust_vec))
RustCall.copy_to_julia!(rust_vec, result)

# Or use to_julia_vector
result = RustCall.to_julia_vector(rust_vec)

# Element-by-element access (not recommended for large data)
for i in 1:length(rust_vec)
    value = rust_vec[i]  # FFI call occurs
end

# Explicitly drop after use
RustCall.drop!(rust_vec)
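
The cost difference comes from how often the FFI boundary is crossed. The following Rust sketch contrasts a per-element accessor with a bulk copy. The function names `vec_get_i32` and `vec_copy_i32` are hypothetical, not RustCall.jl's actual exported symbols, and the export attribute is omitted for brevity.

```rust
use std::ptr;

/// Per-element accessor: one foreign call per element read.
/// Returns 0 when `data` is null or `index` is out of bounds.
///
/// # Safety
/// `data` must be null or point to `len` initialized `i32` values.
pub unsafe extern "C" fn vec_get_i32(data: *const i32, len: usize, index: usize) -> i32 {
    if data.is_null() || index >= len {
        return 0;
    }
    unsafe { *data.add(index) }
}

/// Bulk copy: a single foreign call moves up to `dst_len` elements.
/// Returns the number of elements copied.
///
/// # Safety
/// `src` must point to `len` initialized values, `dst` must have room
/// for `dst_len` values, and the two regions must not overlap.
pub unsafe extern "C" fn vec_copy_i32(
    src: *const i32,
    len: usize,
    dst: *mut i32,
    dst_len: usize,
) -> usize {
    if src.is_null() || dst.is_null() {
        return 0;
    }
    let n = len.min(dst_len);
    unsafe { ptr::copy_nonoverlapping(src, dst, n) };
    n
}
```

Reading 1000 elements through the accessor means 1000 boundary crossings, while the bulk copy needs one call followed by ordinary array access on the caller's side.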

RustVec vs Julia Array Selection

Scenario	Recommendation
Computation within Julia	Julia arrays
Input to Rust functions	RustVec
Output from Rust functions	RustVec → Convert to Julia array
Temporary storage of large data	Julia arrays (managed by GC)
Data manipulation on Rust side	RustVec

Avoiding Memory Leaks

# Pattern 1: Use try-finally
box = RustCall.RustBox(Int32(42))
try
    # Use
    value = box.ptr
finally
    RustCall.drop!(box)  # Ensure cleanup
end

# Pattern 2: Leverage local scope
function compute()
    box = RustCall.RustBox(Int32(42))
    # Use the box while it is in scope
    result = box.ptr != C_NULL
    # box is dropped automatically once it is no longer reachable
    return result
end

Benchmark Results

Basic Operations

The following benchmarks were run on Julia 1.12, Rust 1.92.0, macOS. Values in the first two tables are relative to the Julia native baseline (1.0x):

Operation	Julia Native	@rust	@rust_llvm
i32 addition	1.0x	1.2x	1.1x
i64 addition	1.0x	1.2x	1.1x
f64 addition	1.0x	1.3x	1.2x
i32 multiplication	1.0x	1.2x	1.1x
f64 multiplication	1.0x	1.3x	1.2x

Complex Computations

Computation	Julia Native	@rust	@rust_llvm
Fibonacci (n=30)	1.0x	1.1x	1.0x
Sum Range (1..1000)	1.0x	1.2x	1.1x
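
For readers who want to reproduce comparable workloads, the Rust side of the two computations might look like the sketch below. This is an assumption about their shape, not the code from the benchmark scripts: the names `fib` and `sum_range`, the naive recursive Fibonacci, and the half-open reading of `1..1000` are all illustrative choices.

```rust
/// Naive recursive Fibonacci with fib(0) = 0 and fib(1) = 1.
/// Deliberately unoptimized so the call overhead is small relative
/// to the work done inside the function.
pub extern "C" fn fib(n: u32) -> u64 {
    if n < 2 {
        n as u64
    } else {
        fib(n - 1) + fib(n - 2)
    }
}

/// Sum of the half-open integer range `start..end` (end excluded),
/// following Rust's range syntax.
pub extern "C" fn sum_range(start: i64, end: i64) -> i64 {
    (start..end).sum()
}
```

With these definitions, `fib(30)` evaluates to 832040 and `sum_range(1, 1000)` to 499500.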

Ownership Type Operations

Operation	Average Time	Notes
RustBox create+drop	~170 ns	Single value allocation/release
RustRc create+drop	~180 ns	With reference counting
RustRc clone+drop	~180 ns	Clone operation
RustArc create+drop	~190 ns	Atomic reference counting
RustArc clone+drop	~200 ns	Thread-safe

RustVec Operations

Operation	Average Time	Notes
RustVec(1000 elements) create	~1 μs	Conversion from Julia array
RustVec copy_to_julia!(1000 elements)	~500 ns	Efficient bulk copy
RustVec element access	~50 ns/element	Includes FFI call
RustVec push!	~100 ns	When no reallocation occurs

Note: These results vary by environment. Actual performance can differ significantly depending on hardware, OS, and Julia/Rust versions.

Running Benchmarks

# Basic benchmarks
julia --project benchmark/benchmarks.jl

# LLVM integration benchmarks
julia --project benchmark/benchmarks_llvm.jl

# Ownership type benchmarks
julia --threads=4 --project benchmark/benchmarks_ownership.jl

# Array operation benchmarks
julia --project benchmark/benchmarks_arrays.jl

# Generics benchmarks
julia --project benchmark/benchmarks_generics.jl

Performance Tuning Tips

1. Reducing Compilation Time

Leverage cache: Don't recompile the same code
Adjust optimization level: Level 1-2 during development, Level 3 in production
Disable debug info: emit_debug_info=false

compiler = RustCall.RustCompiler(
    optimization_level=2,  # 2 is sufficient during development
    emit_debug_info=false
)
RustCall.set_default_compiler(compiler)

2. Improving Runtime Performance

Explicit types: Reduce type inference overhead
Register functions: Pre-register frequently called functions
Batch processing: Combine multiple calls

# Inefficient: Type inference every time in loop
for i in 1:1000
    result = @rust add(i, i+1)  # Type inference runs every time
end

# Efficient: Explicit types
for i in 1:1000
    result = @rust add(Int32(i), Int32(i+1))::Int32
end

3. Optimizing Memory Usage

Appropriate use of ownership types: Drop immediately when no longer needed
Appropriate choice of Rc/Arc: Use Rc for single-threaded, Arc for multi-threaded
Cache cleanup: Regularly delete old cache

4. Parallel Processing Optimization

using Base.Threads

# Use Arc to share data between threads
shared_data = RustCall.RustArc(Int32(0))

# Work on multiple threads
@threads for i in 1:1000
    local_arc = RustCall.clone(shared_data)
    # Work
    RustCall.drop!(local_arc)
end
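
The clone-then-drop pattern above mirrors how Rust's own `Arc` behaves. The sketch below is plain Rust and independent of RustCall.jl; the function name `share_across_threads` is illustrative. It shows that each worker holds its own reference and that the count returns to one after all workers finish.

```rust
use std::sync::Arc;
use std::thread;

/// Spawn `workers` threads, give each its own `Arc` clone, and return
/// the strong count observed after every thread has finished.
pub fn share_across_threads(workers: usize) -> usize {
    let shared = Arc::new(0i32);

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each clone bumps the atomic reference count. An `Rc` could
            // not be moved into the thread because it is not `Send`.
            let local = Arc::clone(&shared);
            thread::spawn(move || {
                let value = *local; // read-only access to the shared value
                drop(local); // release this thread's reference early
                value
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker thread panicked");
    }

    // Only the original handle is left once every clone has been dropped.
    Arc::strong_count(&shared)
}
```

The atomic count is what makes `Arc` slightly more expensive than `Rc`, which matches the recommendation to reserve `Arc` for data that really crosses threads.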

5. Profiling

Use Julia's profiling tools to identify bottlenecks:

using Profile

# Start profiling
Profile.clear()
@profile for i in 1:1000
    @rust add(Int32(i), Int32(i+1))
end

# Display results
Profile.print()

Troubleshooting

When Performance is Lower Than Expected

Check cache: Verify cache is working correctly
Check optimization level: Verify optimization level is set appropriately
Explicit types: Reduce type inference overhead
Profiling: Identify bottlenecks

When Memory Usage is High

Check ownership types: Verify they are being dropped appropriately
Cache cleanup: Delete old cache
Rc/Arc usage: Avoid unnecessary clones

Summary

To optimize RustCall.jl performance:

✅ Leverage cache: Reduce compilation time
✅ Adjust optimization level: Select optimization level according to use case
✅ Explicit types: Reduce type inference overhead
✅ Memory management: Use ownership types appropriately
✅ Profiling: Identify and optimize bottlenecks

By following these best practices, you can maximize the performance of applications using RustCall.jl.
