Question

Performance Comparison in C, Python, Erlang, and Haskell: Why the Same Algorithm Runs at Different Speeds

cpythonperformancehaskellerlang

Question

I implemented Project Euler Problem 12 in C, Python, Erlang, and Haskell as a programming exercise to compare runtime performance. To make the execution times longer and easier to compare, I searched for the first triangle number with more than 1000 divisors instead of 500.

The measured results were:

C

#include <stdio.h>
#include <math.h>

int factorCount(long n)
{
    double square = sqrt(n);
    int isquare = (int)square;
    int count = isquare == square ? -1 : 0;
    long candidate;

    for (candidate = 1; candidate <= isquare; candidate++) {
        if (n % candidate == 0) {
            count += 2;
        }
    }
    return count;
}

int main(void)
{
    long triangle = 1;
    int index = 1;

    while (factorCount(triangle) < 1001) {
        index++;
        triangle += index;
    }

    printf("%ld\n", triangle);
    return 0;
}

Runtime: about 11 seconds.

Python

#!/usr/bin/env python3

import math


def factorCount(n):
    square = math.sqrt(n)
    isquare = int(square)
    count = -1 if isquare == square else 0

    for candidate in range(1, isquare + 1):
        if n % candidate == 0:
            count += 2

    return count


triangle = 1
index = 1
while factorCount(triangle) < 1001:
    index += 1
    triangle += index

print(triangle)

Runtime: about 76 seconds.

Python with PyPy

Runtime: about 13 seconds.

Erlang

-module(euler12).
-compile(export_all).

factorCount(Number) ->
    factorCount(Number, math:sqrt(Number), 1, 0).

factorCount(_, Sqrt, Candidate, Count) when Candidate > Sqrt ->
    Count;
factorCount(_, Sqrt, Candidate, Count) when Candidate == Sqrt ->
    Count + 1;
factorCount(Number, Sqrt, Candidate, Count) ->
    case Number rem Candidate of
        0 -> factorCount(Number, Sqrt, Candidate + 1, Count + 2);
        _ -> factorCount(Number, Sqrt, Candidate + 1, Count)
    end.

nextTriangle(Index, Triangle) ->
    Count = factorCount(Triangle),
    if
        Count > 1000 -> Triangle;
        true -> nextTriangle(Index + 1, Triangle + Index + 1)
    end.

solve() ->
    io:format("~p~n", [nextTriangle(1, 1)]),
    halt(0).

Runtime: about 48 seconds.

Haskell

factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
  where
    square = sqrt $ fromIntegral number
    isquare = floor square

factorCount' number sqrt candidate count
    | fromIntegral candidate > sqrt = count
    | number `mod` candidate == 0 = factorCount' number sqrt (candidate + 1) (count + 2)
    | otherwise = factorCount' number sqrt (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

Runtime: about 157 seconds.

The main questions behind this are:

Do Python, Erlang, and Haskell lose performance because they use arbitrary-precision integers, or only once values become larger than machine integers?
Why might Haskell be much slower here?
How can these implementations be optimized without changing the overall factor-counting strategy?
Do the functional versions allow tail-call optimization / last-call optimization so they avoid unnecessary stack growth?

Short Answer

By the end of this page, you will understand why the same algorithm can have very different runtimes across languages, even when the code looks similar. You will learn how runtime systems, integer representations, recursion style, compiler optimizations, laziness, and interpreter overhead affect performance in C, Python, Erlang, and Haskell. You will also see practical ways to optimize code while keeping the underlying algorithm mostly the same.

Concept

The core concept here is performance depends on both the algorithm and the language runtime model.

Many beginners expect that if two programs do the same work, they should run at roughly the same speed. In practice, that is not true. Runtime performance is influenced by several layers:

Algorithmic cost: how much work the program does
Data representation: machine integers vs boxed integers vs arbitrary-precision integers
Execution model: compiled native code, bytecode, VM, interpreter, JIT
Language semantics: strict evaluation vs lazy evaluation
Function-call overhead: loops, recursion, closures, guards, pattern matching
Library and runtime startup costs

In your example, all versions are using roughly the same high-level algorithm:

Generate triangle numbers
Count divisors of each triangle number
Stop when divisor count exceeds 1000

That means the main algorithm is held mostly constant, so differences in execution time mainly reflect language implementation details and how naturally the code maps to efficient execution in that language.

Integer representation matters

Some languages use different integer strategies:

C: usually fixed-size machine integers like long
Python: integers are objects; even small integers have object overhead
Erlang: supports small integers efficiently, but values are still managed by the VM
Haskell: depends on the type; Int is fixed-size, Integer is arbitrary precision

Mental Model

Think of the algorithm as a recipe and the programming language as the kitchen.

All four cooks are making the same dish:

C works in a professional kitchen with sharp tools and little safety overhead.
Python works in a very flexible kitchen where every tool is labeled, checked, and easy to use, but slower.
PyPy is like Python with a smart assistant who notices repeated work and speeds it up.
Erlang works in a kitchen designed for many cooks collaborating safely, not for one cook chopping vegetables as fast as possible.
Haskell works in a kitchen where ingredients may be prepared lazily and with elegant abstractions, but if you do not guide it well, extra work can build up.

So even if the recipe is the same, the kitchens behave differently. The language runtime, compiler, and data model all affect speed.

Take Quiz

Syntax and Examples

Core idea: same task, different execution models

Here is a small divisor-counting example in Python:

import math


def factor_count(n):
    root = int(math.sqrt(n))
    count = 0

    for candidate in range(1, root + 1):
        if n % candidate == 0:
            count += 2

    if root * root == n:
        count -= 1

    return count


print(factor_count(28))  # 6

Why `28` has 6 divisors

The divisors are:

The loop only checks up to sqrt(28), because divisors come in pairs:

1 × 28
2 × 14
4 × 7

This is much faster than checking every number up to .

Step by Step Execution

Trace example

Let us trace a small version with n = 16.

Python example:

import math


def factor_count(n):
    root = int(math.sqrt(n))
    count = 0

    for candidate in range(1, root + 1):
        if n % candidate == 0:
            count += 2

    if root * root == n:
        count -= 1

    return count


print(factor_count(16))

Step by step

n = 16
math.sqrt(16) is 4.0
root = 4
count = 0

Now the loop runs from 1 to 4:

Real World Use Cases

This exact Project Euler task is artificial, but the underlying ideas appear in real software.

1. CPU-bound batch processing

If your program spends most of its time doing arithmetic and loops, language runtime overhead becomes visible.

Examples:

data analysis scripts
simulation code
puzzle solving
computational search

2. Benchmarking implementations

Teams often prototype in one language and later rewrite hot paths.

Examples:

Python app with a C extension
Python code accelerated by PyPy, NumPy, or Rust
Haskell code tuned with strictness annotations

3. Choosing the right integer type

Many applications do not need arbitrary-precision integers.

Examples:

counters
IDs
indexes
small numeric calculations

Using bigger numeric abstractions than necessary can cost performance.

4. Understanding runtime trade-offs

Some languages prioritize:

raw speed
safety
concurrency
expressiveness
productivity

Erlang, for example, is often chosen for reliability and concurrency, not tight arithmetic loops.

5. Fair performance testing

A fair benchmark should control for:

optimization flags
warm-up time for JITs
equivalent data types
input size
startup cost vs steady-state runtime

Real Codebase Usage

In real codebases, developers rarely stop at “same algorithm, same syntax.” They also adapt the code to the language.

Common patterns developers use

Guard clauses and early exits

Instead of doing extra work, return as soon as a condition is met.

if n < 1:
    return 0

Fixed-size integers when possible

In Haskell, choosing Int instead of Integer can matter a lot if overflow is not a concern.

factorCount :: Int -> Int

Compiler optimization flags

For Haskell, ghc -O2 is a common baseline for performance-sensitive code.

For C, -O2 or -O3 is also common.

Avoid unnecessary conversions

Repeated conversions between integer and floating-point types can slow code and create extra work.

Tail-recursive loops in functional languages

Erlang and Haskell code often uses recursive helper functions that accumulate results.

Keep hot paths simple

In performance-critical code, developers reduce:

Common Mistakes

1. Assuming arbitrary-precision integers are the only reason for slowness

That is only part of the story.

Even when numbers fit into machine range, these still matter:

boxed objects
VM overhead
interpreter dispatch
laziness
missing optimization flags

2. Comparing unoptimized builds

A very common mistake is benchmarking debug or default builds.

Better approach

C: compile with -O2 or -O3
Haskell: compile with -O2
PyPy: allow warm-up for JIT-heavy loops

Broken comparison idea:

gcc program.c -o app
./app

Better:

gcc -O2 program.c -lm -o app
./app

3. Ignoring Haskell numeric types

If no type is written, Haskell may infer a more general or more expensive numeric type than you expect.

Potentially problematic style:

factorCount number = ...

Safer for performance-sensitive code:

factorCount :: Int -> Int
factorCount number = ...

4. Forgetting that startup time affects short programs

Comparisons

Comparing the main causes of slowdown

Factor	C	Python	Erlang	Haskell
Execution model	Native compiled	Interpreted VM	BEAM VM	Native compiled with runtime
Small integer overhead	Low	High	Medium	Depends on type
Arbitrary precision by default	No	Yes	Supports big integers	`Integer` yes, `Int` no
JIT available	Usually no	CPython no	No	No
Lazy evaluation

Cheat Sheet

Quick answers

Same algorithm does not mean same speed.
C is fast here because of native code and cheap machine integers.
Python is slower mainly because integers are objects and loops run through the interpreter.
PyPy can greatly improve Python loop performance with JIT compilation.
Erlang is optimized for concurrency and fault tolerance, not numeric inner loops.
Haskell performance depends heavily on types, strictness, and compiler flags.

Integer rules

C long: fixed-size, fast
Python int: arbitrary precision, object overhead
Erlang integer: VM-managed, supports big integers
Haskell Int: fixed-size, faster
Haskell Integer: arbitrary precision, slower

Haskell performance tips

ghc -O2 file.hs -o app

Prefer explicit types:

factorCount :: Int -> Int

Be careful with laziness in accumulators.

Safer perfect-square check

Instead of comparing float equality directly:

root = int(math.sqrt(n))
 root * root == n:
    count -=

FAQ

Why is C usually faster than Python for numeric loops?

C compiles directly to native machine code and uses cheap primitive values. Python executes through an interpreter and uses objects for integers, which adds overhead.

Does Python always use arbitrary-precision integers?

Yes, Python integers are arbitrary precision. Even small integers still behave as Python objects, so there is overhead before values become “big.”

Does Haskell always use arbitrary-precision integers?

No. Int is fixed-size and usually faster. Integer is arbitrary precision and slower but avoids overflow.

Why can PyPy be much faster than CPython?

PyPy uses a JIT compiler, which can optimize repeated pure-Python operations at runtime, especially loops and arithmetic-heavy code.

Is tail recursion enough to make Haskell fast?

Not always. Tail recursion helps, but laziness can still build deferred computations unless values are evaluated strictly.

Is Erlang a bad language for math-heavy programs?

Not bad, but it is not usually the first choice for tight CPU-bound arithmetic loops. Its strengths are concurrency, distribution, and fault tolerance.

How can I make a fair language benchmark?

Use optimized builds, equivalent numeric types, realistic inputs, and idiomatic implementations for each language. Also separate startup time from actual computation time.

Should I optimize the language choice or the algorithm first?

Usually the algorithm first. A better algorithm often gives much larger gains than switching languages.

Related Concepts

Algorithmic complexity — performance depends first on how much work the algorithm does.
Big-O notation — useful for understanding why some optimizations matter more than micro-tuning.
Tail recursion — directly related to stack usage and recursion performance.
Lazy evaluation — important for understanding Haskell performance behavior.
Compiler optimization flags — flags like -O2 can dramatically change runtime speed.
Boxing and unboxing — explains why primitive values can be faster than heap-allocated objects.
JIT compilation — helps explain why PyPy can approach or beat compiled code in some loops.
Fixed-size vs arbitrary-precision integers — central to numeric performance comparisons.
Benchmarking — teaches how to measure performance fairly and meaningfully.
Profiling — helps find real bottlenecks before optimizing.

Take Quiz

Mini Project

Description

Build a small benchmark that compares two implementations of the same divisor-counting task in Python: one using plain CPython and one intended to be run with PyPy. The project demonstrates that runtime choice can significantly affect performance even when the source code is identical.

Goal

Write a triangle-number search program, measure how long it takes, and observe how execution model affects performance.

Requirements

Write a function that counts divisors by checking candidates only up to the square root.
Write a function that finds the first triangle number with more than a given number of divisors.
Measure the runtime using Python’s time module.
Print both the triangle number and the elapsed time.
Keep the algorithm simple and close to the original approach.

Take Quiz

Keep learning

Type	Meaning	Performance	Overflow
`Int`	Fixed-size machine integer	Faster	Can overflow
`Integer`	Arbitrary precision	Slower	No overflow

Style	Common in	Notes
`for` / `while` loop	C, Python	Usually cheap and direct
Tail recursion	Erlang, functional styles	Can be optimized to reuse stack frames
Non-strict recursion	Haskell	May still need strictness help for best performance

Runtime	Typical behavior
CPython	Slower on tight loops, very common default interpreter
PyPy	Often much faster on repeated pure-Python loops due to JIT

Performance Comparison in C, Python, Erlang, and Haskell: Why the Same Algorithm Runs at Different Speeds

Question

C

Python

Python with PyPy

Erlang

Haskell

Short Answer

Concept

Integer representation matters

Mental Model

Syntax and Examples

Core idea: same task, different execution models

Why 28 has 6 divisors

Step by Step Execution

Trace example

Step by step

Real World Use Cases

1. CPU-bound batch processing

2. Benchmarking implementations

3. Choosing the right integer type

4. Understanding runtime trade-offs

5. Fair performance testing

Real Codebase Usage

Common patterns developers use

Guard clauses and early exits

Fixed-size integers when possible

Compiler optimization flags

Avoid unnecessary conversions

Tail-recursive loops in functional languages

Keep hot paths simple

Common Mistakes

1. Assuming arbitrary-precision integers are the only reason for slowness

2. Comparing unoptimized builds

Better approach

3. Ignoring Haskell numeric types

4. Forgetting that startup time affects short programs

Comparisons

Comparing the main causes of slowdown

Cheat Sheet

Quick answers

Integer rules

Haskell performance tips

Safer perfect-square check

FAQ

Why is C usually faster than Python for numeric loops?

Does Python always use arbitrary-precision integers?

Does Haskell always use arbitrary-precision integers?

Why can PyPy be much faster than CPython?

Is tail recursion enough to make Haskell fast?

Is Erlang a bad language for math-heavy programs?

How can I make a fair language benchmark?

Should I optimize the language choice or the algorithm first?

Related Concepts

Mini Project

Description

Goal

Requirements

Related questions

Building More Fault-Tolerant Embedded C++ Applications for Radiation-Prone ARM Systems

C printf Format Specifier for bool: How to Print Boolean Values

Calling C or C++ from Python: Building Python Bindings

Why this matters in real programming

Equivalent idea in C

Haskell type choice matters

Important beginner takeaway

Step-by-step triangle search

Realistic conclusion

5. Assuming all recursion is optimized

6. Using floating-point equality carelessly

7. Expecting “same code shape” to mean “fair comparison”

Int vs Integer in Haskell

CPython vs PyPy

Tail recursion vs loops

Practical answer to the original questions

Tail-call notes

Fair benchmarking checklist

Why `28` has 6 divisors

`Int` vs `Integer` in Haskell