
Rust Array Loop Performance: Why Size Changes Can Affect Optimization

Tags: rust, arrays, performance, llvm-codegen

Question

I am benchmarking a Rust program that repeatedly sums the contents of a fixed-size array, and I noticed a dramatic slowdown when the array size is 240 or larger. With CAPACITY = 239, the program appears to run about 80 times faster than with CAPACITY = 240.

Is Rust applying a special optimization for smaller arrays, or is this behavior caused by compiler optimizations in general?

The program is compiled with:

rustc -C opt-level=3

Code:

use std::time::Instant;

const CAPACITY: usize = 240;
const IN_LOOPS: usize = 500_000;

fn main() {
    let mut arr = [0; CAPACITY];

    for i in 0..CAPACITY {
        arr[i] = i;
    }

    let mut sum = 0;
    let now = Instant::now();

    for _ in 0..IN_LOOPS {
        let mut s = 0;
        for i in 0..arr.len() {
            s += arr[i];
        }
        sum += s;
    }

    println!("sum:{} time:{:?}", sum, now.elapsed());
}

Short Answer

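Most likely, Rust is not applying a special rule for arrays shorter than 240 elements. The difference comes from LLVM's optimization heuristics. With CAPACITY = 239, the optimizer can transform the loops much more aggressively, for example by fully unrolling the inner loop and then avoiding the repeated work. At 240, it crosses an internal threshold, and the generated code keeps a real loop that runs on every outer iteration.

Because the array size and contents are known at compile time, this benchmark is unusually easy to optimize. The timing therefore reflects what the compiler turned your code into, not the loop as written.

By the end of this page, you will understand why small code changes can cause large benchmark differences in Rust, especially when the compiler can fully optimize one version but not another. You will learn about constant sizes, loop unrolling, dead-code elimination, benchmark pitfalls, and how to write more reliable performance tests.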

Concept

In this example, the core concept is compiler optimization during benchmarking.

When you compile Rust with -C opt-level=3, LLVM performs aggressive optimizations. If your program uses constants such as CAPACITY, and the compiler can predict exactly what a loop will do, it may transform the code heavily.

That can include:

Loop unrolling: replacing a loop with repeated operations
Constant folding: computing results at compile time
Dead-code elimination: removing work that does not affect observable output
Vectorization: using CPU instructions that process multiple values at once
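As a rough illustration of the first item, the sketch below shows what unrolling a summation loop by a factor of four looks like when written out by hand. The function names `sum_rolled` and `sum_unrolled_by_4` are hypothetical. LLVM chooses its own unroll factor, or full unrolling, based on internal heuristics, and this code does not reproduce its actual output.

```rust
/// Plain loop: one addition and one loop-condition check per element.
fn sum_rolled(arr: &[usize]) -> usize {
    let mut s = 0;
    for i in 0..arr.len() {
        s += arr[i];
    }
    s
}

/// Hand-unrolled by a factor of 4: four additions per loop-condition check,
/// followed by a remainder loop for the last `len % 4` elements.
fn sum_unrolled_by_4(arr: &[usize]) -> usize {
    let mut s = 0;
    let mut i = 0;
    while i + 4 <= arr.len() {
        s += arr[i];
        s += arr[i + 1];
        s += arr[i + 2];
        s += arr[i + 3];
        i += 4;
    }
    // Handle the elements left over when the length is not a multiple of 4.
    while i < arr.len() {
        s += arr[i];
        i += 1;
    }
    s
}

fn main() {
    let arr: Vec<usize> = (0..239).collect();
    // Both versions compute the same value; only the loop structure differs.
    assert_eq!(sum_rolled(&arr), sum_unrolled_by_4(&arr));
    println!("{}", sum_rolled(&arr));
}
```

Full unrolling is the extreme case of the same idea: the loop disappears entirely and every element gets its own addition. That is only practical when the trip count is a small compile-time constant.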

In your code, the array contents are fully predictable:

arr[i] = i
CAPACITY is a compile-time constant
the summation always produces the same result for a given CAPACITY

This gives the compiler a lot of freedom.

The surprising part is that optimizations often have thresholds. For example, the optimizer may fully unroll a loop when it is below some internal size limit, but stop doing so when it becomes slightly larger. That can create a cliff where 239 and 240 behave very differently.

So the issue is usually not that Rust has a special rule for arrays below 240 elements. Instead, it is that the generated machine code changes when the optimizer decides one version is cheap enough to transform and another is not.

This matters in real programming because benchmarks can be misleading if the compiler is optimizing away the thing you think you are measuring. To benchmark accurately, you need to make sure the computation really happens at runtime.

Mental Model

Think of the compiler like a very smart assistant asked to carry out a repetitive task.

If the task is small and predictable, the assistant may do all the math in advance and hand you the final answer immediately.
If the task gets slightly bigger, the assistant may decide it is no longer worth precomputing everything and instead perform the work step by step.

That means two nearly identical programs can run very differently, not because the hardware changed, but because the compiler chose a different strategy.

A useful mental model is:

Source code = your instructions
Optimizer = a planner that rewrites those instructions
Benchmark result = the performance of the rewritten plan, not necessarily the original code structure

So when benchmarking, you are often measuring what the compiler turned your code into, not just the loop you wrote.


Syntax and Examples

The most important syntax here is a simple array loop in Rust:

let mut s = 0;
for i in 0..arr.len() {
    s += arr[i];
}

A more idiomatic Rust version is:

let s: usize = arr.iter().sum();

Example:

fn main() {
    let arr = [1, 2, 3, 4, 5];

    let mut sum1 = 0;
    for i in 0..arr.len() {
        sum1 += arr[i];
    }

    let sum2: i32 = arr.iter().sum();

    // Both approaches produce the same total: 15
    println!("sum1 = {}, sum2 = {}", sum1, sum2);
}
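The same total can also be written with a `for` loop over the elements themselves, which avoids indexing entirely. The sketch below puts the three forms side by side, assuming an `i32` array like the one in the example above. The helper function names are illustrative only.

```rust
/// Index-based loop, as in the example above.
fn sum_indexed(arr: &[i32]) -> i32 {
    let mut s = 0;
    for i in 0..arr.len() {
        s += arr[i];
    }
    s
}

/// Loop over references to the elements; there is no index to bounds-check.
fn sum_for_in(arr: &[i32]) -> i32 {
    let mut s = 0;
    for &x in arr {
        s += x;
    }
    s
}

/// Iterator adaptor: `sum` consumes the iterator and adds every element.
fn sum_iter(arr: &[i32]) -> i32 {
    arr.iter().sum()
}

fn main() {
    let arr = [1, 2, 3, 4, 5];
    println!("{} {} {}", sum_indexed(&arr), sum_for_in(&arr), sum_iter(&arr));
}
```

With optimizations enabled, all three usually compile to similar machine code. The iterator forms are preferred mainly because they state the intent directly and cannot index out of bounds.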

Step by Step Execution

Consider this smaller example:

fn main() {
    let arr = [0, 1, 2, 3];
    let mut s = 0;

    for i in 0..arr.len() {
        s += arr[i];
    }

    println!("{}", s);
}

Step by step:

arr is created with four elements: [0, 1, 2, 3]
s starts at 0
The loop runs with i = 0, 1, 2, 3
Each iteration adds one element to s

Trace:

i	arr[i]	s after addition
0	0	0
1	1	1
2	2	3
3	3	6

After the loop, s is 6, so the program prints 6.
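The trace can be reproduced by printing the running total inside the loop. The helper `trace_sum` is a hypothetical name used for this sketch. It records each `(i, arr[i], s)` row so the rows can also be checked programmatically.

```rust
/// Returns one (i, arr[i], s) row per loop iteration.
fn trace_sum(arr: &[i32]) -> Vec<(usize, i32, i32)> {
    let mut rows = Vec::new();
    let mut s = 0;
    for i in 0..arr.len() {
        s += arr[i];
        rows.push((i, arr[i], s));
    }
    rows
}

fn main() {
    let arr = [0, 1, 2, 3];
    println!("i\tarr[i]\ts after addition");
    for (i, x, s) in trace_sum(&arr) {
        println!("{}\t{}\t{}", i, x, s);
    }
}
```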

Real World Use Cases

This concept appears often in real software work:

Benchmarking algorithms

If you compare two implementations, the compiler may optimize one more aggressively than the other. That can make the benchmark unfair.

Numeric processing

In data-heavy code such as image processing, simulations, or audio work, loop structure can affect whether the compiler vectorizes or unrolls operations.

Serialization and parsing

Code that works on fixed-size buffers may be optimized differently from code that handles dynamic input sizes.
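One concrete form of this difference is a function that takes a fixed-size array reference compared with one that takes a slice:

- With a fixed-size array, the length is part of the type, so the loop bound is a compile-time constant.
- With a slice, the length is only known at runtime.

The 16-byte buffer and the wrapping byte-sum "checksum" below are illustrative assumptions, not a real protocol. Whether the generated code actually differs depends on the compiler version and settings.

```rust
/// Fixed-size input: the loop always runs exactly 16 times,
/// and the compiler knows that from the type alone.
fn checksum_fixed(buf: &[u8; 16]) -> u8 {
    let mut sum: u8 = 0;
    for &b in buf {
        sum = sum.wrapping_add(b);
    }
    sum
}

/// Dynamic input: the trip count is only known at runtime.
fn checksum_dyn(buf: &[u8]) -> u8 {
    buf.iter().fold(0u8, |acc, &b| acc.wrapping_add(b))
}

fn main() {
    let header = [7u8; 16];
    // Same bytes, same result; only the amount of compile-time knowledge differs.
    println!("{} {}", checksum_fixed(&header), checksum_dyn(&header));
}
```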

Embedded systems

When working with small fixed arrays, compilers often produce very compact and fast code. Slight changes in size can alter instruction count and memory behavior.

Performance tuning

Developers often inspect generated assembly or use benchmarking tools when a tiny source-code change causes a large timing difference.


Real Codebase Usage

In real projects, developers usually avoid drawing conclusions from a single hand-written timing loop. Instead, they use patterns that make benchmarks more trustworthy.

Common patterns

Use benchmark tools

Rust developers often use crates such as criterion for stable benchmarking instead of measuring one Instant::now() block.

Prevent optimization from removing the work

A benchmark should make the compiler treat values as genuinely used. Otherwise, it may precompute or eliminate operations.

A common tool is std::hint::black_box:

use std::hint::black_box;

fn main() {
    let arr = [1usize, 2, 3, 4, 5];
    let mut total = 0;

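    for _ in 0..1_000_000 {
        // black_box hides the array and the per-pass sum from the optimizer,
        // so the work cannot easily be precomputed or removed
        let s: usize = black_box(&arr).iter().copied().sum();
        total += black_box(s);
    }

    println!("total: {}", total);
}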

Common Mistakes

Mistake 1: Assuming the benchmark measures exactly the written loop

Broken assumption:

for _ in 0..IN_LOOPS {
    let mut s = 0;
    for i in 0..arr.len() {
        s += arr[i];
    }
    sum += s;
}

Why it is a problem:

The compiler may precompute s
The inner loop may disappear or be rewritten completely
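To make the second point concrete: `arr` never changes inside the outer loop, so the inner sum is the same on every iteration. The optimizer is therefore permitted to compute it once and reuse it. The functions below write that rewrite out by hand.

This is a sketch of what the optimizer may do, not a claim about what LLVM actually emits for this exact program. The names `sum_as_written` and `sum_hoisted` are hypothetical, and the loop count is passed as a parameter so the example also runs quickly without optimizations.

```rust
/// The loop as written in the question: the inner sum is recomputed
/// on every outer iteration even though `arr` never changes.
fn sum_as_written(arr: &[usize], loops: usize) -> usize {
    let mut sum = 0;
    for _ in 0..loops {
        let mut s = 0;
        for i in 0..arr.len() {
            s += arr[i];
        }
        sum += s;
    }
    sum
}

/// A hand-written equivalent that an optimizer is allowed to produce:
/// the loop-invariant inner sum is computed once and then multiplied.
fn sum_hoisted(arr: &[usize], loops: usize) -> usize {
    let s: usize = arr.iter().sum();
    s * loops
}

fn main() {
    const CAPACITY: usize = 240;
    let mut arr = [0usize; CAPACITY];
    for i in 0..CAPACITY {
        arr[i] = i;
    }
    // Same observable result, so the rewrite does not change program behavior.
    // (1_000 loops instead of 500_000 keeps this quick in a debug build.)
    assert_eq!(sum_as_written(&arr, 1_000), sum_hoisted(&arr, 1_000));
    println!("{}", sum_hoisted(&arr, 1_000));
}
```

If the optimizer performs a rewrite like this for one array size but not another, the two versions do vastly different amounts of work.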

How to avoid it:

Use black_box
Benchmark with realistic runtime data
Use a benchmarking library

Mistake 2: Using fully predictable input

Broken example:

for i in 0..CAPACITY {
    arr[i] = i;
}

This makes the array contents easy to reason about at compile time.

How to avoid it:

Use values that are harder to fold into constants during optimization, or hide them from the compiler during the benchmark.
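One way to do this is sketched below. It combines two measures:

- The array length is read from the command line, so it is not a compile-time constant.
- The data is passed through `std::hint::black_box`, which has been stable since Rust 1.66.

Reading the size from the first argument, and falling back to 240 when none is given, is an assumption made for this example. `black_box` is a best-effort hint, not a guarantee.

```rust
use std::env;
use std::hint::black_box;
use std::time::Instant;

/// Builds the same data as the question (`arr[i] = i`), but with a length
/// chosen at runtime, so the optimizer cannot treat it as a constant.
fn build_data(len: usize) -> Vec<usize> {
    (0..len).map(black_box).collect()
}

/// Sums the data `loops` times. `black_box` hides the slice from the
/// optimizer, which discourages it from reusing the previous pass's result.
fn repeated_sum(data: &[usize], loops: usize) -> usize {
    let mut total = 0;
    for _ in 0..loops {
        let s: usize = black_box(data).iter().sum();
        total += s;
    }
    total
}

fn main() {
    // Hypothetical usage: `./bench 240`; defaults to 240 if no valid argument.
    let len: usize = env::args()
        .nth(1)
        .and_then(|arg| arg.parse().ok())
        .unwrap_or(240);

    let data = build_data(len);
    let now = Instant::now();
    let total = repeated_sum(&data, 500_000);
    println!("len:{} sum:{} time:{:?}", len, total, now.elapsed());
}
```

Running this with 239 and then 240 as the argument gives a fairer comparison than recompiling with a different constant.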

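Mistake 3: Thinking array size alone explains everything

The jump at 240 elements does not mean the array suddenly became too large for the CPU to sum quickly. 240 `usize` values take under 2 KB and fit easily in cache. What changes is the code the optimizer generates. The size crosses a heuristic threshold, and together with fully predictable data, that decides whether the work is simplified away or actually executed at runtime.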

Comparisons

Approach	Example	Readability	Optimization potential	Benchmark reliability
Manual index loop	`for i in 0..arr.len() { s += arr[i]; }`	Medium	High	Low if compiler can predict everything
Iterator sum	`arr.iter().copied().sum()`	High	High	Low if input is constant and predictable
Benchmark with `Instant` only	`let now = Instant::now()`	Simple	N/A	Weak
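Benchmark with `black_box`	`total += black_box(s)`	Medium	Reduced for values passed through `black_box`	Better, but less rigorous than `criterion`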

Cheat Sheet

Quick reference

Fixed-size array loop

for i in 0..arr.len() {
    s += arr[i];
}

Idiomatic sum

let s: usize = arr.iter().copied().sum();

Why performance may change sharply

compiler heuristics have thresholds
small loops may be fully unrolled
predictable data may be constant-folded
larger code may stop qualifying for an optimization

Benchmarking tips

prefer criterion for serious benchmarks
use std::hint::black_box to reduce unwanted optimization
benchmark runtime-dependent values when possible
do multiple runs, not just one timing
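When pulling in `criterion` is not an option, a minimal standard-library-only version of the last tip is to time several runs and report the minimum and median instead of a single measurement. The run count, the `time_runs` helper and the choice of min/median are assumptions for illustration. This sketch does none of the warm-up or statistical analysis that a real benchmarking crate performs.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Times `f` `runs` times and returns the durations sorted ascending.
fn time_runs<F: FnMut() -> usize>(runs: usize, mut f: F) -> Vec<Duration> {
    let mut times = Vec::with_capacity(runs);
    for _ in 0..runs {
        let start = Instant::now();
        // black_box keeps the result "used" so the call is not removed.
        black_box(f());
        times.push(start.elapsed());
    }
    times.sort();
    times
}

fn main() {
    let arr: Vec<usize> = (0..240).collect();
    let times = time_runs(15, || {
        let mut total = 0;
        for _ in 0..10_000 {
            total += black_box(&arr).iter().sum::<usize>();
        }
        total
    });
    println!("min: {:?}", times[0]);
    println!("median: {:?}", times[times.len() / 2]);
}
```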

Important idea

You are benchmarking optimized machine code, not just the source loop you wrote.

Safer benchmark pattern

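use std::hint::black_box;

fn main() {
    let arr = [1usize, 2, 3, 4, 5];
    let mut total = 0;

    for _ in 0..1_000_000 {
        // Hide the input so the optimizer cannot easily precompute the sum
        let s: usize = black_box(arr).iter().copied().sum();
        // Hide the running total so the loop cannot easily be collapsed
        total = black_box(total + s);
    }

    println!("total: {}", total);
}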

FAQ

Why is `239` much faster than `240` in this Rust loop?

Most likely because the compiler chose a different optimization strategy at that size. Small changes can cross an internal heuristic threshold for unrolling or constant evaluation.

Is Rust doing a special optimization for short arrays?

Not specifically for that exact array size. The behavior usually comes from LLVM optimization heuristics, not a Rust language rule tied to 239 or 240.

Is my benchmark reliable?

Not fully. The code is simple and predictable, so the compiler may optimize away much of the work. Use black_box or a benchmarking crate for more reliable results.

Should I use indexing or iterators in Rust?

In most real code, iterators are preferred because they are clearer and usually optimize very well.

Can the compiler compute the sum at compile time?

Sometimes, yes. If the input data and loop bounds are fully known and the result is predictable, the optimizer may simplify the code heavily.
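When you need a result that is guaranteed to be computed at compile time, Rust's `const` evaluation is the explicit way to request it. Relying on the optimizer to fold a runtime loop gives no such guarantee. The sketch below uses the question's constants, assumes a 64-bit target (the product overflows a 32-bit `usize`) and uses `while` loops in `const fn`, which require Rust 1.46 or later.

```rust
const CAPACITY: usize = 240;
const IN_LOOPS: usize = 500_000;

/// Sums 0 + 1 + ... + (n - 1), i.e. the contents of the question's array.
/// `const fn` bodies cannot use `for` loops over iterators, so this uses `while`.
const fn sum_of_indices(n: usize) -> usize {
    let mut s = 0;
    let mut i = 0;
    while i < n {
        s += i;
        i += 1;
    }
    s
}

// Evaluated by the compiler: the value is embedded in the binary,
// and no loop runs when the program starts.
const TOTAL: usize = sum_of_indices(CAPACITY) * IN_LOOPS;

fn main() {
    // Closed form n * (n - 1) / 2 as a cross-check of the const computation.
    assert_eq!(sum_of_indices(CAPACITY), CAPACITY * (CAPACITY - 1) / 2);
    println!("sum:{}", TOTAL);
}
```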

How can I inspect what the compiler is doing?

You can look at the generated assembly, for example with `rustc -C opt-level=3 --emit=asm`, or use tools such as Compiler Explorer. This helps confirm whether loops were unrolled, removed, or vectorized.

Does `opt-level=3` always make code faster?

It often helps, but not always in the way you expect. It can also make benchmarks harder to interpret because the optimizer may transform code aggressively.

Related Concepts

Loop unrolling — related because small fixed loops are often expanded into repeated instructions.
Constant folding — related because compile-time-known values can be precomputed.
Dead-code elimination — related because unused or predictable work may be removed.
Iterator performance in Rust — related because iterator-based code can compile to very efficient loops.
Benchmarking in Rust — related because measuring performance correctly requires avoiding misleading optimizations.
LLVM optimizations — related because Rust relies on LLVM for many low-level code transformations.
std::hint::black_box — related because it helps make benchmarks less vulnerable to over-optimization.
Array vs Vec in Rust — related because fixed-size arrays and dynamic vectors may be optimized differently.


Mini Project

Description

Build a small Rust benchmark that compares summing a fixed array in two ways: a plain loop and an iterator-based sum. Then make the benchmark harder for the compiler to optimize away by using black_box. This project demonstrates how benchmark structure affects the results you see.

Goal

Create a Rust program that compares summation approaches while reducing misleading compiler optimizations.

Requirements

Create a fixed-size array of integers.
Sum the array many times using a manual loop.
Sum the same array many times using an iterator.
Use std::hint::black_box so the compiler cannot easily precompute the result.
Print both totals and elapsed times.
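Below is one possible starting point for the mini project, covering the requirements above. The array size (240), the loop count and the function names are choices made for this sketch. Timings will vary by machine and compiler version, and `criterion` remains the better tool for serious measurements.

```rust
use std::hint::black_box;
use std::time::Instant;

const CAPACITY: usize = 240;
const IN_LOOPS: usize = 500_000;

/// Requirement 2: sum with a manual index loop.
fn sum_manual(arr: &[usize; CAPACITY], loops: usize) -> usize {
    let mut total = 0;
    for _ in 0..loops {
        // Requirement 4: hide the array so the optimizer cannot reuse the previous pass.
        let a = black_box(arr);
        let mut s = 0;
        for i in 0..a.len() {
            s += a[i];
        }
        total += s;
    }
    total
}

/// Requirement 3: sum with an iterator (again hiding the array with black_box).
fn sum_iterator(arr: &[usize; CAPACITY], loops: usize) -> usize {
    let mut total = 0;
    for _ in 0..loops {
        total += black_box(arr).iter().sum::<usize>();
    }
    total
}

fn main() {
    // Requirement 1: a fixed-size array, filled as in the question.
    let mut arr = [0usize; CAPACITY];
    for (i, slot) in arr.iter_mut().enumerate() {
        *slot = i;
    }

    // Requirement 5: print totals and elapsed times.
    let now = Instant::now();
    let manual = sum_manual(&arr, IN_LOOPS);
    println!("manual   sum:{} time:{:?}", manual, now.elapsed());

    let now = Instant::now();
    let iter = sum_iterator(&arr, IN_LOOPS);
    println!("iterator sum:{} time:{:?}", iter, now.elapsed());

    assert_eq!(manual, iter);
}
```

Changing `CAPACITY` between 239 and 240 in this version should show a much smaller gap than the original program, since `black_box` prevents the repeated sum from being reused. The exact numbers are not guaranteed.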


Situation	What compiler may do
Small fixed loop	Fully unroll it
Predictable constant data	Precompute result
Larger loop	Keep an actual loop
Runtime-dependent input	Less compile-time simplification


