Arrays & Pointer Arithmetic.
Foreword & Credits
Don't let the word "arithmetic" intimidate you. Pointer arithmetic is just
simple addition and subtraction (+, -, ++, --) that moves pointers through
memory. If you can do x + 3, you can do pointer arithmetic.
I'm going to borrow an example straight from Low Level Learning's video on pointer arithmetic, translated to C++ and without all of the assembly stuff.
I still recommend watching his video.
First, a very long detour: what is an array?
To understand pointer arithmetic, we need to understand arrays at a low level. An array is a data structure that stores a fixed number of elements, all of the same type. For example, let's say we wanted to store the numbers 1-5 in one data structure:
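A minimal sketch of such a declaration. The name `c_array` matches the one used later in this article; `c_array_length` is a hypothetical helper name added just to show that the capacity is known at compile time.

```cpp
#include <cstddef>

// A C-style array holding the numbers 1-5. The capacity (5) is part of its type.
int c_array[5] = {1, 2, 3, 4, 5};

// Element count: total bytes divided by the bytes of one element.
constexpr std::size_t c_array_length = sizeof(c_array) / sizeof(c_array[0]);
static_assert(c_array_length == 5, "capacity is fixed at compile time");
```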
For all examples going forward, we will be using C-style arrays because, while C++
provides some niceties in std::array, C-style arrays make these concepts a lot clearer,
and we don't have to worry about all of the extra fluff that C++ gives us (with one exception
I'll note later).
Array on the stack
An array in memory is just a contiguous (sharing a common border; touching) block where elements are stored back-to-back. Take the above array declaration, and consider what that looks like in memory:
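One way to convince yourself of the back-to-back layout is to compare element addresses as plain numbers. This is only a sketch: `elements_are_back_to_back` is a hypothetical helper, and the addresses in the comment are made up (they assume a 4-byte int and an array that happens to start at 0x1000).

```cpp
#include <cstddef>
#include <cstdint>

int c_array[5] = {1, 2, 3, 4, 5};

// Illustrative layout, assuming 4-byte ints and a made-up base address:
//   0x1000  0x1004  0x1008  0x100C  0x1010
//   [  1  ] [  2  ] [  3  ] [  4  ] [  5  ]

// True when each element starts exactly sizeof(int) bytes after the previous one.
// Converting a pointer to an integer is implementation-defined, but on mainstream
// platforms the result is the numeric address.
bool elements_are_back_to_back() {
    for (std::size_t i = 0; i + 1 < 5; ++i) {
        auto current = reinterpret_cast<std::uintptr_t>(&c_array[i]);
        auto next = reinterpret_cast<std::uintptr_t>(&c_array[i + 1]);
        if (next - current != sizeof(int)) {
            return false;
        }
    }
    return true;
}

// No gaps: the whole array is exactly five ints wide.
static_assert(sizeof(c_array) == 5 * sizeof(int), "no padding between elements");
```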
But, of course, that's not the entire story.
Take this code here & observe the output:
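Here is a sketch of the kind of code that produces those names. I'm assuming the three expressions are `&c_array`, `&c_array[0]`, and `c_array`; the helper function names are hypothetical. Keep in mind that `typeid(...).name()` is implementation-defined: GCC and Clang return mangled names, other compilers may return something more readable.

```cpp
#include <string>
#include <type_traits>
#include <typeinfo>

int c_array[5] = {1, 2, 3, 4, 5};

// The three expressions have three different static types.
static_assert(std::is_same<decltype(&c_array), int (*)[5]>::value, "pointer to array of 5 ints");
static_assert(std::is_same<decltype(&c_array[0]), int*>::value, "pointer to int");
static_assert(std::is_same<decltype(c_array), int[5]>::value, "array of 5 ints");

// typeid does not decay its operand, so the last one reports the array type.
std::string name_of_address_of_array() { return typeid(&c_array).name(); }
std::string name_of_address_of_first() { return typeid(&c_array[0]).name(); }
std::string name_of_array_itself() { return typeid(c_array).name(); }

// Despite the different types, all three refer to the same starting address.
bool same_address() {
    const void* whole_array = &c_array;
    const void* first_element = &c_array[0];
    const void* decayed = c_array;
    return whole_array == first_element && first_element == decayed;
}
```

Printing the three names with `std::cout` on GCC or Clang is what yields the mangled strings discussed next.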
All of these things point to the same address in memory, but, confusingly, all
of them have different types. The names reported by typeid (mangled, as GCC and Clang print them) translate to types we're already familiar with:
| Type Name | C++ Type |
|---|---|
| PA5_i | Pointer to Array of 5 ints |
| Pi | Pointer to int |
| A5_i | Array of 5 ints |
What is pointer decay?
Let's zoom in on c_array for the moment. "Pointer decay" (formally, array-to-pointer conversion) means the array automatically
converts to a pointer to its first element in most contexts. In other words, the
compiler, because it hates you (and only because it hates you), secretly converts
your array to a pointer to its first element.
Why would someone do this to you? Well, pointer decay happens for a good reason, I guess. Efficiency! Passing a pointer is cheaper than copying an entire array.
A good rule of thumb: If you use the array name by itself in an expression
(not as the operand of sizeof or &), it decays to a pointer.
Three common contexts where arrays don't decay:
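A sketch of three contexts that are commonly cited: `sizeof`, the address-of operator, and binding to a reference. Picking these three is my assumption about which ones are meant; C++ has others too (`typeid` and `decltype`, for instance). The name `array_ref` is hypothetical.

```cpp
#include <type_traits>

int c_array[5] = {1, 2, 3, 4, 5};

// 1. sizeof sees the whole array, not a pointer.
static_assert(sizeof(c_array) == 5 * sizeof(int), "sizeof does not decay");

// 2. The address-of operator yields a pointer to the whole array.
static_assert(std::is_same<decltype(&c_array), int (*)[5]>::value, "& does not decay");

// 3. Binding to a reference keeps the full array type, size included.
int (&array_ref)[5] = c_array;
static_assert(sizeof(array_ref) == sizeof(c_array), "reference keeps the array type");

// For contrast: arithmetic on the bare name forces the decay to int*.
static_assert(std::is_same<decltype(c_array + 0), int*>::value, "decays here");
```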
Why does this matter?
The type matters for pointer arithmetic. Because of pointer decay, when we want to pass a C-style array to a function, we have to be careful about the type we use.
This is why C programmers pass size separately:
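A minimal sketch of the pattern, using hypothetical functions `sum` and `sum_of_one_to_five`. The callee only gets a pointer, so the caller has to tell it how many elements are behind it.

```cpp
#include <cstddef>

// The pointer alone carries no length, so the caller supplies it.
int sum(const int* values, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}

int sum_of_one_to_five() {
    int c_array[5] = {1, 2, 3, 4, 5};
    // sizeof still sees the whole array here because c_array has not decayed yet.
    return sum(c_array, sizeof(c_array) / sizeof(c_array[0]));
}
```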
These function signatures are ALL identical to the compiler:
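A sketch with a hypothetical function `set_first_to_zero`. The three declarations below declare one and the same function, because the compiler adjusts array parameters to pointers and ignores the `5`.

```cpp
#include <type_traits>

// Three declarations of the SAME function.
void set_first_to_zero(int* values);
void set_first_to_zero(int values[]);
void set_first_to_zero(int values[5]);

// One definition satisfies all three declarations.
void set_first_to_zero(int* values) { values[0] = 0; }

// The function types agree as well: each parameter is adjusted to int*.
static_assert(std::is_same<void(int*), void(int[])>::value, "int[] parameter is int*");
static_assert(std::is_same<void(int*), void(int[5])>::value, "int[5] parameter is int*");

// Inside the function, the parameter really is a pointer, whatever the brackets say.
bool parameter_is_pointer(int values[5]) {
    return std::is_same<decltype(values), int*>::value;
}
```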
The moment you pass an array to a function, it decays to a pointer & loses all size information. This is one of C++'s most infamous footguns inherited from C.
Fortunately, C++ provides std::array to solve this problem. Since it's a
real object that can be copied and passed by value, and its size is part of its type,
we don't actually have to worry about pointer decay with it:
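A small sketch with two hypothetical functions. Both take the `std::array` by value, so they receive a full copy that still knows its own size.

```cpp
#include <array>
#include <cstddef>

// Taken by value: the callee gets its own copy, size included.
std::size_t count_elements(std::array<int, 5> values) {
    return values.size();
}

// Changing the copy leaves the caller's array untouched.
int first_after_modifying_copy(std::array<int, 5> values) {
    values[0] = 99;
    return values[0];
}
```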
Hey, remember pointer arithmetic?
Pointer arithmetic is weird. Take this code for example:
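A minimal sketch of the snippet in question, assuming `element3` is an `int*` computed from `c_array`:

```cpp
int c_array[5] = {1, 2, 3, 4, 5};

// c_array decays to int*, then the pointer moves forward by 2.
int* element3 = c_array + 2;
```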
What would you expect element3 to hold? If c_array starts at address 0x1000,
you would be forgiven for thinking that it holds 0x1002, but that
is not the case. The deal with pointer arithmetic is that the
compiler already knows what you're talking about, and doesn't
literally add 2 to the address stored in the pointer. It instead
recognizes that:
"The user wants to move forward 2 elements. Each element is an int (4 bytes). So I'll add 2 × 4 = 8 bytes to the address."
So the math behind element3 is actually:
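Spelled out, with a made-up base address of 0x1000 and a 4-byte int (typical, but not guaranteed), and a hypothetical helper `bytes_moved` that measures the real distance:

```cpp
#include <cstddef>
#include <cstdint>

int c_array[5] = {1, 2, 3, 4, 5};

// new address = base address + 2 * sizeof(int)
// e.g.          0x1000       + 2 * 4           = 0x1008

// Distance in BYTES between c_array and c_array + 2.
// Pointer-to-integer conversion is implementation-defined, but on mainstream
// platforms it gives the numeric address.
std::size_t bytes_moved() {
    auto base = reinterpret_cast<std::uintptr_t>(c_array);
    auto third = reinterpret_cast<std::uintptr_t>(c_array + 2);
    return static_cast<std::size_t>(third - base);
}
```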
The compiler scales the addition by the size of the type. This is the magic (and confusion) of pointer arithmetic.
Array indexing is also just pointer arithmetic in disguise:
when you write c_array[2], the compiler translates it to:
c_array[2] ≡ *(c_array + 2)
That's right - array indexing IS pointer arithmetic plus dereferencing. The brackets [] are just syntactic sugar. Both expressions:
- Take the base address (c_array)
- Add 2 element-widths (+ 2 → + 8 bytes)
- Dereference to get the value (*)
You could even write this (but please don't):
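A sketch of the offending expression, wrapped in a hypothetical function so it can be checked:

```cpp
int c_array[5] = {1, 2, 3, 4, 5};

int third_via_cursed_indexing() {
    // 2[c_array] means *(2 + c_array), which is the same as *(c_array + 2).
    return 2[c_array];
}
```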
Because addition is commutative, c_array + 2 is the same as 2 + c_array, so c_array[2] and 2[c_array] are equivalent. This is cursed knowledge that you should never use in production code.
Contrived Example
Let's take this code example that I modified slightly from Low Level Learning's video. The scenario
is that we have an array of people with the type Person. We want to set the age
of everyone in that array to 0.
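A sketch of what that could look like. The layout of `Person` (a 64-byte `name` plus an `int` `age`) and the function name `reset_ages` are my assumptions; only `p_person` and `++p_person;` come from the discussion that follows.

```cpp
#include <cstddef>

struct Person {
    char name[64];
    int age;
};

// Walk the array with a pointer, setting every age to 0.
void reset_ages(Person* people, std::size_t count) {
    Person* p_person = people;
    Person* const end = people + count;  // one past the last element
    while (p_person != end) {
        p_person->age = 0;
        ++p_person;  // advance by one whole Person
    }
}
```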
Ignoring the fact that this example is a bit contrived, let's focus on the line
++p_person;. What does that do?
When you increment a pointer, it moves forward by the size of the type it points to.
In this case, p_person is a Person*, and sizeof(Person) is 68 bytes
(64 bytes for name + 4 bytes for age, assuming a 4-byte int). So, each time we do ++p_person;,
it moves forward by 68 bytes in memory, effectively pointing to the next Person in the array.
In case you were wondering why this example is a little contrived, it's because there's actually a better way to do this with a range-based for loop:
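A sketch of the range-based version, again assuming a `Person` with a 64-byte `name` and an `int` `age`. One catch: a range-based for loop needs the real array type, so this hypothetical `reset_ages` takes the array by reference to keep it from decaying.

```cpp
#include <cstddef>

struct Person {
    char name[64];
    int age;
};

// Taking the array by reference preserves its size N, so range-for works.
template <std::size_t N>
void reset_ages(Person (&people)[N]) {
    for (Person& person : people) {
        person.age = 0;
    }
}
```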
Some other trivia
When you subtract one pointer from another (both pointing into the same array), you get the number of elements between them, not the number of bytes:
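A minimal sketch with a hypothetical function name. The result of subtracting two pointers has the type `std::ptrdiff_t`.

```cpp
#include <cstddef>

int c_array[5] = {1, 2, 3, 4, 5};

std::ptrdiff_t elements_between_first_and_fourth() {
    int* first = &c_array[0];
    int* fourth = &c_array[3];
    return fourth - first;  // 3 elements, not 3 * sizeof(int) bytes
}
```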
Pointer Comparison
You can compare pointers with ==, !=, <, >, <=, >=. They compare addresses, not values.
This is occasionally useful for bounds checking, but otherwise it isn't very common:
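A sketch of a bounds check, using a hypothetical helper `points_inside_c_array`. It is only meaningful for pointers that were derived from `c_array` in the first place.

```cpp
int c_array[5] = {1, 2, 3, 4, 5};

// True when p points at one of c_array's elements.
bool points_inside_c_array(const int* p) {
    const int* begin = c_array;
    const int* end = c_array + 5;  // one past the last element
    return p >= begin && p < end;
}
```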
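Note: Only use the relational operators (<, >, <=, >=) on pointers into the same array (or one past its end). For pointers into different arrays, the result is unspecified in C++ (and undefined behavior in C). Comparing with == and != is fine either way.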
Traversal by pointer vs index
There are often multiple ways to traverse an array: by index or by pointer. For all intents and purposes, there is no meaningful difference between the two methods in modern C++. You should use index traversal unless you have a specific reason to use pointer traversal (e.g., working with C APIs).
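For comparison, here is the same sum written both ways. The function names are hypothetical, and both produce the same result.

```cpp
#include <cstddef>

// Traversal by index.
int sum_by_index(const int* values, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}

// Traversal by pointer: walk from the first element to one past the last.
int sum_by_pointer(const int* values, std::size_t count) {
    int total = 0;
    for (const int* p = values; p != values + count; ++p) {
        total += *p;
    }
    return total;
}
```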