Question
Performance Comparison in C, Python, Erlang, and Haskell: Why the Same Algorithm Runs at Different Speeds
Question
I implemented Project Euler Problem 12 in C, Python, Erlang, and Haskell as a programming exercise to compare runtime performance. To make the execution times longer and easier to compare, I searched for the first triangle number with more than 1000 divisors instead of 500.
The measured results were:
C
#include <stdio.h>
#include <math.h>
int factorCount(long n)
{
double square = sqrt(n);
int isquare = (int)square;
int count = isquare == square ? -1 : 0;
long candidate;
for (candidate = 1; candidate <= isquare; candidate++) {
if (n % candidate == 0) {
count += 2;
}
}
return count;
}
int main(void)
{
long triangle = 1;
int index = 1;
while (factorCount(triangle) < 1001) {
index++;
triangle += index;
}
printf("%ld\n", triangle);
return 0;
}
Runtime: about 11 seconds.
Python
#!/usr/bin/env python3
import math
def factorCount(n):
square = math.sqrt(n)
isquare = int(square)
count = -1 if isquare == square else 0
for candidate in range(1, isquare + 1):
if n % candidate == 0:
count += 2
return count
triangle = 1
index = 1
while factorCount(triangle) < 1001:
index += 1
triangle += index
print(triangle)
Runtime: about 76 seconds.
Python with PyPy
Runtime: about 13 seconds.
Erlang
-module(euler12).
-compile(export_all).
factorCount(Number) ->
factorCount(Number, math:sqrt(Number), 1, 0).
factorCount(_, Sqrt, Candidate, Count) when Candidate > Sqrt ->
Count;
factorCount(_, Sqrt, Candidate, Count) when Candidate == Sqrt ->
Count + 1;
factorCount(Number, Sqrt, Candidate, Count) ->
case Number rem Candidate of
0 -> factorCount(Number, Sqrt, Candidate + 1, Count + 2);
_ -> factorCount(Number, Sqrt, Candidate + 1, Count)
end.
nextTriangle(Index, Triangle) ->
Count = factorCount(Triangle),
if
Count > 1000 -> Triangle;
true -> nextTriangle(Index + 1, Triangle + Index + 1)
end.
solve() ->
io:format("~p~n", [nextTriangle(1, 1)]),
halt(0).
Runtime: about 48 seconds.
Haskell
factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
where
square = sqrt $ fromIntegral number
isquare = floor square
factorCount' number sqrt candidate count
| fromIntegral candidate > sqrt = count
| number `mod` candidate == 0 = factorCount' number sqrt (candidate + 1) (count + 2)
| otherwise = factorCount' number sqrt (candidate + 1) count
nextTriangle index triangle
| factorCount triangle > 1000 = triangle
| otherwise = nextTriangle (index + 1) (triangle + index + 1)
main = print $ nextTriangle 1 1
Runtime: about 157 seconds.
The main questions behind this are:
- Do Python, Erlang, and Haskell lose performance because they use arbitrary-precision integers, or only once values become larger than machine integers?
- Why might Haskell be much slower here?
- How can these implementations be optimized without changing the overall factor-counting strategy?
- Do the functional versions allow tail-call optimization / last-call optimization so they avoid unnecessary stack growth?
Short Answer
By the end of this page, you will understand why the same algorithm can have very different runtimes across languages, even when the code looks similar. You will learn how runtime systems, integer representations, recursion style, compiler optimizations, laziness, and interpreter overhead affect performance in C, Python, Erlang, and Haskell. You will also see practical ways to optimize code while keeping the underlying algorithm mostly the same.
Concept
The core concept here is performance depends on both the algorithm and the language runtime model.
Many beginners expect that if two programs do the same work, they should run at roughly the same speed. In practice, that is not true. Runtime performance is influenced by several layers:
- Algorithmic cost: how much work the program does
- Data representation: machine integers vs boxed integers vs arbitrary-precision integers
- Execution model: compiled native code, bytecode, VM, interpreter, JIT
- Language semantics: strict evaluation vs lazy evaluation
- Function-call overhead: loops, recursion, closures, guards, pattern matching
- Library and runtime startup costs
In your example, all versions are using roughly the same high-level algorithm:
- Generate triangle numbers
- Count divisors of each triangle number
- Stop when divisor count exceeds 1000
That means the main algorithm is held mostly constant, so differences in execution time mainly reflect language implementation details and how naturally the code maps to efficient execution in that language.
Integer representation matters
Some languages use different integer strategies:
- C: usually fixed-size machine integers like
long - Python: integers are objects; even small integers have object overhead
- Erlang: supports small integers efficiently, but values are still managed by the VM
- Haskell: depends on the type;
Intis fixed-size,Integeris arbitrary precision
Mental Model
Think of the algorithm as a recipe and the programming language as the kitchen.
All four cooks are making the same dish:
- C works in a professional kitchen with sharp tools and little safety overhead.
- Python works in a very flexible kitchen where every tool is labeled, checked, and easy to use, but slower.
- PyPy is like Python with a smart assistant who notices repeated work and speeds it up.
- Erlang works in a kitchen designed for many cooks collaborating safely, not for one cook chopping vegetables as fast as possible.
- Haskell works in a kitchen where ingredients may be prepared lazily and with elegant abstractions, but if you do not guide it well, extra work can build up.
So even if the recipe is the same, the kitchens behave differently. The language runtime, compiler, and data model all affect speed.
Syntax and Examples
Core idea: same task, different execution models
Here is a small divisor-counting example in Python:
import math
def factor_count(n):
root = int(math.sqrt(n))
count = 0
for candidate in range(1, root + 1):
if n % candidate == 0:
count += 2
if root * root == n:
count -= 1
return count
print(factor_count(28)) # 6
Why 28 has 6 divisors
The divisors are:
- 1
- 2
- 4
- 7
- 14
- 28
The loop only checks up to sqrt(28), because divisors come in pairs:
1 × 282 × 144 × 7
This is much faster than checking every number up to .
Step by Step Execution
Trace example
Let us trace a small version with n = 16.
Python example:
import math
def factor_count(n):
root = int(math.sqrt(n))
count = 0
for candidate in range(1, root + 1):
if n % candidate == 0:
count += 2
if root * root == n:
count -= 1
return count
print(factor_count(16))
Step by step
n = 16math.sqrt(16)is4.0root = 4count = 0
Now the loop runs from 1 to 4:
Real World Use Cases
This exact Project Euler task is artificial, but the underlying ideas appear in real software.
1. CPU-bound batch processing
If your program spends most of its time doing arithmetic and loops, language runtime overhead becomes visible.
Examples:
- data analysis scripts
- simulation code
- puzzle solving
- computational search
2. Benchmarking implementations
Teams often prototype in one language and later rewrite hot paths.
Examples:
- Python app with a C extension
- Python code accelerated by PyPy, NumPy, or Rust
- Haskell code tuned with strictness annotations
3. Choosing the right integer type
Many applications do not need arbitrary-precision integers.
Examples:
- counters
- IDs
- indexes
- small numeric calculations
Using bigger numeric abstractions than necessary can cost performance.
4. Understanding runtime trade-offs
Some languages prioritize:
- raw speed
- safety
- concurrency
- expressiveness
- productivity
Erlang, for example, is often chosen for reliability and concurrency, not tight arithmetic loops.
5. Fair performance testing
A fair benchmark should control for:
- optimization flags
- warm-up time for JITs
- equivalent data types
- input size
- startup cost vs steady-state runtime
Real Codebase Usage
In real codebases, developers rarely stop at “same algorithm, same syntax.” They also adapt the code to the language.
Common patterns developers use
Guard clauses and early exits
Instead of doing extra work, return as soon as a condition is met.
if n < 1:
return 0
Fixed-size integers when possible
In Haskell, choosing Int instead of Integer can matter a lot if overflow is not a concern.
factorCount :: Int -> Int
Compiler optimization flags
For Haskell, ghc -O2 is a common baseline for performance-sensitive code.
For C, -O2 or -O3 is also common.
Avoid unnecessary conversions
Repeated conversions between integer and floating-point types can slow code and create extra work.
Tail-recursive loops in functional languages
Erlang and Haskell code often uses recursive helper functions that accumulate results.
Keep hot paths simple
In performance-critical code, developers reduce:
Common Mistakes
1. Assuming arbitrary-precision integers are the only reason for slowness
That is only part of the story.
Even when numbers fit into machine range, these still matter:
- boxed objects
- VM overhead
- interpreter dispatch
- laziness
- missing optimization flags
2. Comparing unoptimized builds
A very common mistake is benchmarking debug or default builds.
Better approach
- C: compile with
-O2or-O3 - Haskell: compile with
-O2 - PyPy: allow warm-up for JIT-heavy loops
Broken comparison idea:
gcc program.c -o app
./app
Better:
gcc -O2 program.c -lm -o app
./app
3. Ignoring Haskell numeric types
If no type is written, Haskell may infer a more general or more expensive numeric type than you expect.
Potentially problematic style:
factorCount number = ...
Safer for performance-sensitive code:
factorCount :: Int -> Int
factorCount number = ...
4. Forgetting that startup time affects short programs
Comparisons
Comparing the main causes of slowdown
| Factor | C | Python | Erlang | Haskell |
|---|---|---|---|---|
| Execution model | Native compiled | Interpreted VM | BEAM VM | Native compiled with runtime |
| Small integer overhead | Low | High | Medium | Depends on type |
| Arbitrary precision by default | No | Yes | Supports big integers | Integer yes, Int no |
| JIT available | Usually no | CPython no | No | No |
| Lazy evaluation |
Cheat Sheet
Quick answers
- Same algorithm does not mean same speed.
- C is fast here because of native code and cheap machine integers.
- Python is slower mainly because integers are objects and loops run through the interpreter.
- PyPy can greatly improve Python loop performance with JIT compilation.
- Erlang is optimized for concurrency and fault tolerance, not numeric inner loops.
- Haskell performance depends heavily on types, strictness, and compiler flags.
Integer rules
- C
long: fixed-size, fast - Python
int: arbitrary precision, object overhead - Erlang integer: VM-managed, supports big integers
- Haskell
Int: fixed-size, faster - Haskell
Integer: arbitrary precision, slower
Haskell performance tips
ghc -O2 file.hs -o app
Prefer explicit types:
factorCount :: Int -> Int
Be careful with laziness in accumulators.
Safer perfect-square check
Instead of comparing float equality directly:
root = int(math.sqrt(n))
root * root == n:
count -=
FAQ
Why is C usually faster than Python for numeric loops?
C compiles directly to native machine code and uses cheap primitive values. Python executes through an interpreter and uses objects for integers, which adds overhead.
Does Python always use arbitrary-precision integers?
Yes, Python integers are arbitrary precision. Even small integers still behave as Python objects, so there is overhead before values become “big.”
Does Haskell always use arbitrary-precision integers?
No. Int is fixed-size and usually faster. Integer is arbitrary precision and slower but avoids overflow.
Why can PyPy be much faster than CPython?
PyPy uses a JIT compiler, which can optimize repeated pure-Python operations at runtime, especially loops and arithmetic-heavy code.
Is tail recursion enough to make Haskell fast?
Not always. Tail recursion helps, but laziness can still build deferred computations unless values are evaluated strictly.
Is Erlang a bad language for math-heavy programs?
Not bad, but it is not usually the first choice for tight CPU-bound arithmetic loops. Its strengths are concurrency, distribution, and fault tolerance.
How can I make a fair language benchmark?
Use optimized builds, equivalent numeric types, realistic inputs, and idiomatic implementations for each language. Also separate startup time from actual computation time.
Should I optimize the language choice or the algorithm first?
Usually the algorithm first. A better algorithm often gives much larger gains than switching languages.
Mini Project
Description
Build a small benchmark that compares two implementations of the same divisor-counting task in Python: one using plain CPython and one intended to be run with PyPy. The project demonstrates that runtime choice can significantly affect performance even when the source code is identical.
Goal
Write a triangle-number search program, measure how long it takes, and observe how execution model affects performance.
Requirements
- Write a function that counts divisors by checking candidates only up to the square root.
- Write a function that finds the first triangle number with more than a given number of divisors.
- Measure the runtime using Python’s
timemodule. - Print both the triangle number and the elapsed time.
- Keep the algorithm simple and close to the original approach.
Keep learning
Related questions
Building More Fault-Tolerant Embedded C++ Applications for Radiation-Prone ARM Systems
Learn practical C++ and compile-time techniques to reduce soft-error damage in embedded ARM systems exposed to radiation.
C printf Format Specifier for bool: How to Print Boolean Values
Learn how to print bool values in C with printf, why no %b/%B specifier exists, and the common patterns to print true/false or 0/1.
Calling C or C++ from Python: Building Python Bindings
Learn the quickest ways to call C or C++ from Python, including ctypes, C extensions, Cython, and binding tools with practical examples.