Rust Array Loop Performance: Why Size Changes Can Affect Optimization
Question
I am benchmarking a Rust program that repeatedly sums the contents of a fixed-size array, and I noticed a dramatic slowdown when the array size is 240 or larger. With CAPACITY = 239, the program appears to run about 80 times faster than with CAPACITY = 240.
Is Rust applying a special optimization for smaller arrays, or is this behavior caused by compiler optimizations in general?
The program is compiled with:
rustc -C opt-level=3
Code:
use std::time::Instant;
const CAPACITY: usize = 240;
const IN_LOOPS: usize = 500_000;
fn main() {
let mut arr = [0; CAPACITY];
for i in 0..CAPACITY {
arr[i] = i;
}
let mut sum = 0;
let now = Instant::now();
for _ in 0..IN_LOOPS {
let mut s = 0;
for i in 0..arr.len() {
s += arr[i];
}
sum += s;
}
println!("sum:{} time:{:?}", sum, now.elapsed());
}
Short Answer
By the end of this page, you will understand why small code changes can cause large benchmark differences in Rust, especially when the compiler can fully optimize one version but not another. You will learn about constant sizes, loop unrolling, dead-code elimination, benchmark pitfalls, and how to write more reliable performance tests.
Concept
In this example, the core concept is compiler optimization during benchmarking.
When you compile Rust with -C opt-level=3, LLVM performs aggressive optimizations. If your program uses constants such as CAPACITY, and the compiler can predict exactly what a loop will do, it may transform the code heavily.
That can include:
- Loop unrolling: replacing a loop with repeated operations
- Constant folding: computing results at compile time
- Dead-code elimination: removing work that does not affect observable output
- Vectorization: using CPU instructions that process multiple values at once
In your code, the array contents are fully predictable:
- arr[i] = i
- CAPACITY is a compile-time constant
- the summation always produces the same result for a given CAPACITY
This gives the compiler a lot of freedom.
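To make that freedom concrete, here is a minimal sketch of what "the same result for a given CAPACITY" means. The helper names sum_by_loop and closed_form are illustrative and not part of the original program. Because the array holds 0, 1, ..., CAPACITY - 1, the loop result equals a simple closed form, which is the kind of fact an optimizer may or may not manage to exploit.

```rust
/// Sums the array with the same indexed loop the benchmark uses.
fn sum_by_loop<const N: usize>(arr: &[usize; N]) -> usize {
    let mut s = 0;
    for i in 0..arr.len() {
        s += arr[i];
    }
    s
}

/// Closed form for 0 + 1 + ... + (n - 1).
fn closed_form(n: usize) -> usize {
    n * n.saturating_sub(1) / 2
}

fn main() {
    const CAPACITY: usize = 240;
    let mut arr = [0usize; CAPACITY];
    for i in 0..CAPACITY {
        arr[i] = i;
    }
    // Both lines print the same number. The optimizer is free to replace
    // the first computation with a constant if it can prove the result.
    println!("loop:        {}", sum_by_loop(&arr));
    println!("closed form: {}", closed_form(CAPACITY));
}
```

Whether the compiler actually performs this replacement depends on the compiler version and its heuristics; the sketch only shows that the result is knowable in advance.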
The surprising part is that optimizations often have thresholds. For example, the optimizer may fully unroll a loop when it is below some internal size limit, but stop doing so when it becomes slightly larger. That can create a cliff where 239 and 240 behave very differently.
So the issue is usually not that Rust has a special rule for arrays below 240 elements. Instead, it is that the generated machine code changes when the optimizer decides one version is cheap enough to transform and another is not.
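One way to look for such a cliff without editing and recompiling is to run both sizes from one binary. The sketch below assumes a const-generic helper named run (hypothetical, not in the original code) and a reduced IN_LOOPS to keep the run short. Moving the loop into a generic function can itself change what the optimizer does, so the timings may not match the original program exactly.

```rust
use std::time::{Duration, Instant};

const IN_LOOPS: usize = 100_000;

/// Runs the original benchmark body for an array of N elements.
/// Returns the accumulated sum and the elapsed time.
fn run<const N: usize>() -> (usize, Duration) {
    let mut arr = [0usize; N];
    for i in 0..N {
        arr[i] = i;
    }
    let mut sum = 0usize;
    let now = Instant::now();
    for _ in 0..IN_LOOPS {
        let mut s = 0;
        for i in 0..arr.len() {
            s += arr[i];
        }
        sum += s;
    }
    (sum, now.elapsed())
}

fn main() {
    // Each call is compiled separately for its own N.
    let (sum_a, time_a) = run::<239>();
    let (sum_b, time_b) = run::<240>();
    println!("N=239 sum:{} time:{:?}", sum_a, time_a);
    println!("N=240 sum:{} time:{:?}", sum_b, time_b);
}
```

Compile with rustc -C opt-level=3 as in the question. If the two timings differ by a large factor, the generated code for the two sizes is probably structured differently.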
This matters in real programming because benchmarks can be misleading if the compiler is optimizing away the thing you think you are measuring. To benchmark accurately, you need to make sure the computation really happens at runtime.
Mental Model
Think of the compiler like a very smart assistant asked to carry out a repetitive task.
- If the task is small and predictable, the assistant may do all the math in advance and hand you the final answer immediately.
- If the task gets slightly bigger, the assistant may decide it is no longer worth precomputing everything and instead perform the work step by step.
That means two nearly identical programs can run very differently, not because the hardware changed, but because the compiler chose a different strategy.
A useful mental model is:
- Source code = your instructions
- Optimizer = a planner that rewrites those instructions
- Benchmark result = the performance of the rewritten plan, not necessarily the original code structure
So when benchmarking, you are often measuring what the compiler turned your code into, not just the loop you wrote.
Syntax and Examples
The most important syntax here is a simple array loop in Rust:
let mut s = 0;
for i in 0..arr.len() {
s += arr[i];
}
A more idiomatic Rust version is:
let s: usize = arr.iter().sum();
Example:
fn main() {
let arr = [1, 2, 3, 4, 5];
let mut sum1 = 0;
for i in 0..arr.len() {
sum1 += arr[i];
}
let sum2: i32 = arr.iter().sum();
println!("{} {}", sum1, sum2);
}
Step by Step Execution
Consider this smaller example:
fn main() {
let arr = [0, 1, 2, 3];
let mut s = 0;
for i in 0..arr.len() {
s += arr[i];
}
println!("{}", s);
}
Step by step:
- arr is created with four elements: [0, 1, 2, 3]
- s starts at 0
- The loop runs with i = 0, 1, 2, 3
- Each iteration adds one element to s
Trace:
| i | arr[i] | s after addition |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 1 | 1 |
| 2 | 2 | 3 |
| 3 | 3 | 6 |
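The running total can also be produced by the program itself. This is a small sketch with a hypothetical helper named trace that records one row per iteration of the same loop.

```rust
/// Returns one (i, arr[i], s after addition) row per loop iteration.
fn trace(arr: &[i32]) -> Vec<(usize, i32, i32)> {
    let mut rows = Vec::with_capacity(arr.len());
    let mut s = 0;
    for i in 0..arr.len() {
        s += arr[i];
        rows.push((i, arr[i], s));
    }
    rows
}

fn main() {
    for (i, value, s) in trace(&[0, 1, 2, 3]) {
        println!("i={} arr[i]={} s={}", i, value, s);
    }
}
```

The last row holds the final value of s, which for this input is 6.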
Real World Use Cases
This concept appears often in real software work:
Benchmarking algorithms
If you compare two implementations, the compiler may optimize one more aggressively than the other. That can make the benchmark unfair.
Numeric processing
In data-heavy code such as image processing, simulations, or audio work, loop structure can affect whether the compiler vectorizes or unrolls operations.
Serialization and parsing
Code that works on fixed-size buffers may be optimized differently from code that handles dynamic input sizes.
Embedded systems
When working with small fixed arrays, compilers often produce very compact and fast code. Slight changes in size can alter instruction count and memory behavior.
Performance tuning
Developers often inspect generated assembly or use benchmarking tools when a tiny source-code change causes a large timing difference.
Real Codebase Usage
In real projects, developers usually avoid drawing conclusions from a single hand-written timing loop. Instead, they use patterns that make benchmarks more trustworthy.
Common patterns
Use benchmark tools
Rust developers often use crates such as criterion for stable benchmarking instead of measuring one Instant::now() block.
Prevent optimization from removing the work
A benchmark should make the compiler treat values as genuinely used. Otherwise, it may precompute or eliminate operations.
A common tool is std::hint::black_box:
use std::hint::black_box;
fn main() {
let arr = [1usize, 2, 3, 4, 5];
let mut total = 0;
for _ in 0..1_000_000 {
let s: usize = arr.iter().copied().sum();
total += black_box(s);
}
println!("{}", total);
}
Common Mistakes
Mistake 1: Assuming the benchmark measures exactly the written loop
Broken assumption:
for _ in 0..IN_LOOPS {
let mut s = 0;
for i in 0..arr.len() {
s += arr[i];
}
sum += s;
}
Why it is a problem:
- The compiler may precompute s
- The inner loop may disappear or be rewritten completely
How to avoid it:
- Use black_box
- Benchmark with realistic runtime data
- Use a benchmarking library
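As a sketch of the first suggestion, the benchmark from the question could be adjusted as follows. The helper sum_indexed and the reduced IN_LOOPS value are choices made for this example, not part of the original code. Note that std::hint::black_box is documented as best-effort, so it reduces the chance of the work being optimized away rather than guaranteeing it.

```rust
use std::hint::black_box;
use std::time::Instant;

const CAPACITY: usize = 240;
const IN_LOOPS: usize = 100_000;

/// Sums the slice with an indexed loop, as in the question.
fn sum_indexed(arr: &[usize]) -> usize {
    let mut s = 0;
    for i in 0..arr.len() {
        s += arr[i];
    }
    s
}

fn main() {
    let mut arr = [0usize; CAPACITY];
    for i in 0..CAPACITY {
        arr[i] = i;
    }
    let mut sum = 0usize;
    let now = Instant::now();
    for _ in 0..IN_LOOPS {
        // black_box on the input asks the optimizer to treat the array as opaque;
        // black_box on the result asks it to treat the sum as used.
        let s = sum_indexed(black_box(&arr[..]));
        sum += black_box(s);
    }
    println!("sum:{} time:{:?}", sum, now.elapsed());
}
```

With the input hidden on every iteration, the compiler has less room to hoist the inner sum out of the outer loop, so timings for neighboring sizes such as 239 and 240 are more likely to be comparable.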
Mistake 2: Using fully predictable input
Broken example:
for i in 0..CAPACITY {
arr[i] = i;
}
This makes the array contents easy to reason about at compile time.
How to avoid it:
Use values that are harder to fold into constants during optimization, or hide them from the compiler during the benchmark.
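One possible way to do this is to derive the array contents from a value that only exists at runtime. The sketch below uses the sub-second part of the system clock as a seed; that seed source and the helper name fill_from_seed are assumptions made for illustration, and command-line arguments or file input would serve the same purpose.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

const CAPACITY: usize = 240;

/// Fills the array starting from a seed the compiler cannot know in advance.
fn fill_from_seed(seed: usize) -> [usize; CAPACITY] {
    let mut arr = [0usize; CAPACITY];
    for i in 0..CAPACITY {
        // Keep values small so repeated sums stay far from overflow.
        arr[i] = (seed % 1000) + i;
    }
    arr
}

fn main() {
    // Hypothetical seed source: the sub-second part of the current time.
    let seed = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.subsec_nanos() as usize)
        .unwrap_or(0);
    let arr = fill_from_seed(seed);
    let s: usize = arr.iter().sum();
    println!("seed:{} sum:{}", seed % 1000, s);
}
```

The array is still filled once before the timed region, so the compiler may still hoist a repeated sum out of an outer loop. Runtime data removes compile-time folding of the values, not every optimization.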
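Mistake 3: Thinking array size alone explains everything
The size matters only because it moves the code across an optimizer threshold. The cliff is a property of the compiler's heuristics, not of arrays with 240 elements.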
Comparisons
| Approach | Example | Readability | Optimization potential | Benchmark reliability |
|---|---|---|---|---|
| Manual index loop | for i in 0..arr.len() { s += arr[i]; } | Medium | High | Low if compiler can predict everything |
| Iterator sum | arr.iter().copied().sum() | High | High | Low if input is constant and predictable |
| Benchmark with Instant only | let now = Instant::now() | Simple | N/A | Weak |
| Benchmark with black_box | total += black_box(s) | Simple | N/A | Better, though black_box is best-effort |
Cheat Sheet
Quick reference
Fixed-size array loop
for i in 0..arr.len() {
s += arr[i];
}
Idiomatic sum
let s: usize = arr.iter().copied().sum();
Why performance may change sharply
- compiler heuristics have thresholds
- small loops may be fully unrolled
- predictable data may be constant-folded
- larger code may stop qualifying for an optimization
Benchmarking tips
- prefer criterion for serious benchmarks
- use std::hint::black_box to reduce unwanted optimization
- benchmark runtime-dependent values when possible
- do multiple runs, not just one timing
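The last tip can be sketched with the standard library alone. The helpers time_once and median are hypothetical names for this example, and the run count of 11 is arbitrary. A dedicated crate such as criterion does considerably more, including warm-up and statistical analysis.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Times a single call of `f`.
fn time_once<F: FnMut()>(mut f: F) -> Duration {
    let now = Instant::now();
    f();
    now.elapsed()
}

/// Returns the median sample (the upper median for even counts).
/// Panics if `samples` is empty.
fn median(mut samples: Vec<Duration>) -> Duration {
    samples.sort();
    samples[samples.len() / 2]
}

fn main() {
    let arr = [1usize, 2, 3, 4, 5];
    // Collect several timings instead of trusting a single measurement.
    let samples: Vec<Duration> = (0..11)
        .map(|_| {
            time_once(|| {
                let mut total = 0usize;
                for _ in 0..100_000 {
                    let s: usize = black_box(&arr).iter().sum();
                    total += black_box(s);
                }
                black_box(total);
            })
        })
        .collect();
    let runs = samples.len();
    println!("median of {} runs: {:?}", runs, median(samples));
}
```

The median is used here because it is less sensitive than the mean to a single slow run caused by other activity on the machine.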
Important idea
You are benchmarking optimized machine code, not just the source loop you wrote.
Safer benchmark pattern
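use std::hint::black_box;
fn main() {
// Example values and loop count, matching the earlier black_box example.
let arr = [1usize, 2, 3, 4, 5];
let mut total = 0usize;
for _ in 0..1_000_000 {
let s: usize = black_box(arr).iter().copied().sum();
total = black_box(total + s);
}
println!("{}", total);
}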
FAQ
Why is 239 much faster than 240 in this Rust loop?
Most likely because the compiler chose a different optimization strategy at that size. Small changes can cross an internal heuristic threshold for unrolling or constant evaluation.
Is Rust doing a special optimization for short arrays?
Not specifically for that exact array size. The behavior usually comes from LLVM optimization heuristics, not a Rust language rule tied to 239 or 240.
Is my benchmark reliable?
Not fully. The code is simple and predictable, so the compiler may optimize away much of the work. Use black_box or a benchmarking crate for more reliable results.
Should I use indexing or iterators in Rust?
In most real code, iterators are preferred because they are clearer and usually optimize very well.
Can the compiler compute the sum at compile time?
Sometimes, yes. If the input data and loop bounds are fully known and the result is predictable, the optimizer may simplify the code heavily.
How can I inspect what the compiler is doing?
You can look at generated assembly or use tools such as Compiler Explorer. This helps confirm whether loops were unrolled, removed, or vectorized.
Does opt-level=3 always make code faster?
It often helps, but not always in the way you expect. It can also make benchmarks harder to interpret because the optimizer may transform code aggressively.
Mini Project
Description
Build a small Rust benchmark that compares summing a fixed array in two ways: a plain loop and an iterator-based sum. Then make the benchmark harder for the compiler to optimize away by using black_box. This project demonstrates how benchmark structure affects the results you see.
Goal
Create a Rust program that compares summation approaches while reducing misleading compiler optimizations.
Requirements
- Create a fixed-size array of integers.
- Sum the array many times using a manual loop.
- Sum the same array many times using an iterator.
- Use std::hint::black_box so the compiler cannot easily precompute the result.
- Print both totals and elapsed times.
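A possible starting point that covers these requirements is sketched below. The names LEN, ROUNDS, sum_loop, and sum_iter, and the choice of u64 elements, are assumptions for this sketch rather than a prescribed solution.

```rust
use std::hint::black_box;
use std::time::Instant;

const LEN: usize = 240;
const ROUNDS: usize = 100_000;

/// Manual indexed loop.
fn sum_loop(arr: &[u64]) -> u64 {
    let mut s = 0;
    for i in 0..arr.len() {
        s += arr[i];
    }
    s
}

/// Iterator-based sum.
fn sum_iter(arr: &[u64]) -> u64 {
    arr.iter().sum()
}

fn main() {
    let mut arr = [0u64; LEN];
    for (i, slot) in arr.iter_mut().enumerate() {
        *slot = i as u64;
    }

    // Time the manual loop, hiding both input and output from the optimizer.
    let now = Instant::now();
    let mut total_loop = 0u64;
    for _ in 0..ROUNDS {
        total_loop += black_box(sum_loop(black_box(&arr[..])));
    }
    let time_loop = now.elapsed();

    // Time the iterator version the same way.
    let now = Instant::now();
    let mut total_iter = 0u64;
    for _ in 0..ROUNDS {
        total_iter += black_box(sum_iter(black_box(&arr[..])));
    }
    let time_iter = now.elapsed();

    println!("loop: total:{} time:{:?}", total_loop, time_loop);
    println!("iter: total:{} time:{:?}", total_iter, time_iter);
}
```

Both totals should match. To extend the project, try changing LEN to values around 239 and 240 and compare the timings with and without black_box.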