RustCall.jl provides several features for optimizing performance when calling Rust code from Julia. This guide covers the relevant best practices and tuning tips.
- Compilation Caching
- LLVM Optimization
- Function Call Optimization
- Memory Management
- Benchmark Results
- Performance Tuning Tips
## Compilation Caching

RustCall.jl automatically caches compiled Rust libraries, which eliminates recompilation of identical code and significantly reduces startup time.
- Cache Key: Generated from the code hash, compiler settings, and target triple
- Cache Location: `~/.julia/compiled/vX.Y/RustCall/`
- Automatic Verification: Cache integrity is checked automatically
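The cache-key derivation described above can be sketched in plain Rust. The function name `cache_key` and the use of `DefaultHasher` are illustrative assumptions here, not RustCall.jl's actual implementation:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical cache key: hash the source code together with the
/// compiler settings and the target triple, so that changing any of
/// them invalidates the cached library.
fn cache_key(code: &str, opt_level: u8, target_triple: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    code.hash(&mut hasher);
    opt_level.hash(&mut hasher);
    target_triple.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let code = r#"pub extern "C" fn add(a: i32, b: i32) -> i32 { a + b }"#;
    // Same inputs produce the same key -> cache hit
    assert_eq!(
        cache_key(code, 2, "x86_64-apple-darwin"),
        cache_key(code, 2, "x86_64-apple-darwin")
    );
    // A different optimization level produces a different key -> recompile
    assert_ne!(
        cache_key(code, 2, "x86_64-apple-darwin"),
        cache_key(code, 3, "x86_64-apple-darwin")
    );
    println!("ok");
}
```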
```julia
using RustCall

# Check cache size
size = RustCall.get_cache_size()
println("Cache size: $(size / 1024 / 1024) MB")

# List cached libraries
libraries = RustCall.list_cached_libraries()
println("Cached libraries: $(length(libraries))")

# Clean up cache entries older than 30 days
RustCall.cleanup_old_cache(30)

# Clear the cache completely
RustCall.clear_cache()
```

- During Development: Keep the cache enabled to reduce recompilation time
- Production: Warm the cache beforehand to avoid first-run delays
- CI/CD: Save and restore the cache to reduce build time
## LLVM Optimization

RustCall.jl supports optimization at the LLVM IR level. The `@rust_llvm` macro enables more advanced optimizations.
```julia
using RustCall

# Create optimization configuration
config = RustCall.OptimizationConfig(
    level=3,                     # 0-3 (3 is most aggressive)
    enable_vectorization=true,
    enable_loop_unrolling=true,
    enable_licm=true
)

rust_code = """
#[no_mangle]
pub extern "C" fn compute(x: f64) -> f64 {
    x * x + 1.0
}
"""

# Compile Rust to LLVM IR and load the module
wrapped = RustCall.wrap_rust_code(rust_code)
compiler = RustCall.get_default_compiler()
ir_path = RustCall.compile_rust_to_llvm_ir(wrapped; compiler=compiler)
rust_mod = RustCall.load_llvm_ir(ir_path; source_code=wrapped)
mod = rust_mod.mod

# Apply the optimization configuration
RustCall.optimize_module!(mod; config=config)

# Speed-optimized preset
RustCall.optimize_for_speed!(mod)

# Size-optimized preset
RustCall.optimize_for_size!(mod)
```

- Level 0: No optimization (for debugging)
- Level 1: Basic optimizations
- Level 2: Standard optimizations (default)
- Level 3: Maximum optimization (may take longer to compile)
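To illustrate what the vectorization and loop-unrolling passes target, here is the kind of reduction loop LLVM typically auto-vectorizes at levels 2-3. The function is a made-up example for this guide, not part of RustCall.jl:

```rust
/// A dot product: a reduction loop with no dependencies between
/// iterations, which LLVM's vectorization and unrolling passes
/// optimize well at higher optimization levels.
#[no_mangle]
pub extern "C" fn dot(a: *const f64, b: *const f64, len: usize) -> f64 {
    let (a, b) = unsafe {
        (std::slice::from_raw_parts(a, len),
         std::slice::from_raw_parts(b, len))
    };
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let a = [1.0, 2.0, 3.0];
    let b = [4.0, 5.0, 6.0];
    // 1*4 + 2*5 + 3*6 = 32
    let r = dot(a.as_ptr(), b.as_ptr(), 3);
    assert_eq!(r, 32.0);
    println!("{r}");
}
```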
## Function Call Optimization

- `@rust`: Standard call via `ccall`. Highly stable; recommended for most cases
- `@rust_llvm`: Call via LLVM IR integration (experimental). Offers optimization potential but has limitations with some types
```julia
# Standard call (recommended)
result = @rust add(Int32(10), Int32(20))::Int32

# LLVM integration call (experimental)
result = @rust_llvm add(Int32(10), Int32(20))
```

Explicit type specification can reduce type inference overhead:
```julia
# With type inference (slightly slower)
result = @rust add(10, 20)

# Explicit type specification (recommended)
result = @rust add(Int32(10), Int32(20))::Int32
```

Frequently called functions can be optimized by registering them beforehand:
```julia
# Register a function for the LLVM path
RustCall.compile_and_register_rust_function("""
#[no_mangle]
pub extern "C" fn add(a: i32, b: i32) -> i32 {
    a + b
}
""", "add")

# Call through the LLVM path
result = @rust_llvm add(Int32(10), Int32(20))
```

## Memory Management

Ownership types (`RustBox`, `RustRc`, `RustArc`, `RustVec`) prevent memory leaks when used appropriately:
```julia
# Temporary allocations are automatically cleaned up
box = RustCall.RustBox(Int32(42))
# Automatically dropped after use

# Explicit drop (when early release is needed)
RustCall.drop!(box)
```

`RustVec` is a type for manipulating Rust's `Vec<T>` from Julia. Best practices when handling large amounts of data:
```julia
# Create a RustVec from a Julia array
julia_vec = Int32[1, 2, 3, 4, 5]
rust_vec = RustCall.create_rust_vec(julia_vec)

# Efficient bulk copy (recommended)
result = Vector{Int32}(undef, length(rust_vec))
RustCall.copy_to_julia!(rust_vec, result)

# Or use to_julia_vector
result = RustCall.to_julia_vector(rust_vec)

# Element-by-element access (not recommended for large data)
for i in 1:length(rust_vec)
    value = rust_vec[i]  # Each access incurs an FFI call
end

# Explicitly drop after use
RustCall.drop!(rust_vec)
```

| Scenario | Recommendation |
|---|---|
| Computation within Julia | Julia arrays |
| Input to Rust functions | RustVec |
| Output from Rust functions | RustVec → Convert to Julia array |
| Temporary storage of large data | Julia arrays (managed by GC) |
| Data manipulation on Rust side | RustVec |
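The per-element cost noted above comes from crossing the FFI boundary once per access. A minimal Rust sketch of the kind of `extern "C"` surface a `RustVec`-style wrapper sits on (these function names are hypothetical, not RustCall.jl's exports):

```rust
/// Hypothetical extern "C" surface for a RustVec-style wrapper.
/// Each Julia-side `rust_vec[i]` lands in one `vec_get` call, which
/// is why bulk copies beat element-by-element access.

#[no_mangle]
pub extern "C" fn vec_new(data: *const i32, len: usize) -> *mut Vec<i32> {
    // Copy the caller's buffer into an owned Vec and leak it behind
    // a raw pointer; the caller must later release it via vec_drop.
    let v = unsafe { std::slice::from_raw_parts(data, len) }.to_vec();
    Box::into_raw(Box::new(v))
}

#[no_mangle]
pub extern "C" fn vec_get(v: *const Vec<i32>, i: usize) -> i32 {
    unsafe { (*v)[i] }
}

#[no_mangle]
pub extern "C" fn vec_drop(v: *mut Vec<i32>) {
    // Reclaim ownership so the Vec is freed on drop.
    drop(unsafe { Box::from_raw(v) });
}

fn main() {
    let data = [10, 20, 30];
    let v = vec_new(data.as_ptr(), data.len());
    assert_eq!(vec_get(v, 1), 20); // one FFI-style call per element
    vec_drop(v);
    println!("ok");
}
```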
```julia
# Pattern 1: Use try-finally
box = RustCall.RustBox(Int32(42))
try
    # Use the value
    value = box.ptr
finally
    RustCall.drop!(box)  # Ensure cleanup
end

# Pattern 2: Leverage local scope
function compute()
    box = RustCall.RustBox(Int32(42))
    result = box.ptr  # use the value
    return result
    # box is automatically dropped when it goes out of scope
end
```

## Benchmark Results

The following benchmarks were run on Julia 1.12, Rust 1.92.0, macOS:
| Operation | Julia Native | @rust | @rust_llvm |
|---|---|---|---|
| i32 addition | 1.0x | 1.2x | 1.1x |
| i64 addition | 1.0x | 1.2x | 1.1x |
| f64 addition | 1.0x | 1.3x | 1.2x |
| i32 multiplication | 1.0x | 1.2x | 1.1x |
| f64 multiplication | 1.0x | 1.3x | 1.2x |
| Computation | Julia Native | @rust | @rust_llvm |
|---|---|---|---|
| Fibonacci (n=30) | 1.0x | 1.1x | 1.0x |
| Sum Range (1..1000) | 1.0x | 1.2x | 1.1x |
| Operation | Average Time | Notes |
|---|---|---|
| RustBox create+drop | ~170 ns | Single value allocation/release |
| RustRc create+drop | ~180 ns | With reference counting |
| RustRc clone+drop | ~180 ns | Clone operation |
| RustArc create+drop | ~190 ns | Atomic reference counting |
| RustArc clone+drop | ~200 ns | Thread-safe |
| Operation | Average Time | Notes |
|---|---|---|
| RustVec(1000 elements) create | ~1 μs | Conversion from Julia array |
| RustVec copy_to_julia!(1000 elements) | ~500 ns | Efficient bulk copy |
| RustVec element access | ~50 ns/element | Includes FFI call |
| RustVec push! | ~100 ns | When no reallocation occurs |
Note: These results may vary by environment. Actual performance can vary significantly depending on hardware, OS, and Julia/Rust versions.
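Figures like the ~170 ns `RustBox` create+drop above can be approximated on the Rust side with a plain `std::time::Instant` micro-benchmark. This is an illustrative sketch, not RustCall.jl's benchmark harness, and absolute numbers will differ by machine:

```rust
use std::time::Instant;

/// Rough micro-benchmark: time `iters` create+drop cycles of a boxed
/// i32 and return the average cost per operation in nanoseconds.
fn bench_box_create_drop(iters: u32) -> f64 {
    let start = Instant::now();
    for i in 0..iters {
        let b = Box::new(i as i32);  // heap allocation (create)
        std::hint::black_box(&b);    // keep the optimizer from eliding it
        drop(b);                     // explicit release (drop)
    }
    start.elapsed().as_nanos() as f64 / iters as f64
}

fn main() {
    let ns = bench_box_create_drop(100_000);
    println!("Box create+drop: ~{ns:.0} ns/op");
}
```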
```bash
# Basic benchmarks
julia --project benchmark/benchmarks.jl

# LLVM integration benchmarks
julia --project benchmark/benchmarks_llvm.jl

# Ownership type benchmarks
julia --threads=4 --project benchmark/benchmarks_ownership.jl

# Array operation benchmarks
julia --project benchmark/benchmarks_arrays.jl

# Generics benchmarks
julia --project benchmark/benchmarks_generics.jl
```

## Performance Tuning Tips

- Leverage the cache: Don't recompile the same code
- Adjust the optimization level: Level 1-2 during development, Level 3 in production
- Disable debug info: set `emit_debug_info=false`

```julia
compiler = RustCall.RustCompiler(
    optimization_level=2,   # 2 is sufficient during development
    emit_debug_info=false
)
RustCall.set_default_compiler(compiler)
```

- Explicit types: Reduce type inference overhead
- Register functions: Pre-register frequently called functions
- Batch processing: Combine multiple calls
```julia
# Inefficient: type inference runs on every loop iteration
for i in 1:1000
    result = @rust add(i, i+1)
end

# Efficient: explicit types
for i in 1:1000
    result = @rust add(Int32(i), Int32(i+1))::Int32
end
```

- Appropriate use of ownership types: Drop objects as soon as they are no longer needed
- Appropriate choice of Rc/Arc: Use `RustRc` for single-threaded code, `RustArc` for multi-threaded code
- Cache cleanup: Regularly delete old cache entries
```julia
using Base.Threads

# Use Arc to share data between threads
shared_data = RustCall.RustArc(Int32(0))

# Work on multiple threads
@threads for i in 1:1000
    local_arc = RustCall.clone(shared_data)
    # ... work with local_arc ...
    RustCall.drop!(local_arc)
end
```

Use Julia's profiling tools to identify bottlenecks:
```julia
using Profile

# Start profiling
Profile.clear()
@profile for i in 1:1000
    @rust add(Int32(i), Int32(i+1))
end

# Display results
Profile.print()
```

- Check the cache: Verify caching is working correctly
- Check optimization level: Verify optimization level is set appropriately
- Explicit types: Reduce type inference overhead
- Profiling: Identify bottlenecks
- Check ownership types: Verify they are being dropped appropriately
- Cache cleanup: Delete old cache
- Rc/Arc usage: Avoid unnecessary clones
To optimize RustCall.jl performance:
- ✅ Leverage cache: Reduce compilation time
- ✅ Adjust optimization level: Select optimization level according to use case
- ✅ Explicit types: Reduce type inference overhead
- ✅ Memory management: Use ownership types appropriately
- ✅ Profiling: Identify and optimize bottlenecks
By following these best practices, you can maximize the performance of applications using RustCall.jl.