C Compilers -- Indirection with Multidim Arrays

Question

By definition, in every standard of C, x[y] is equivalent to (and often compiled as) *((x)+(y)). Additionally, a name of an array is converted to an address operator to it -- so if x is an array, it would be *((&(x))+(y))

So, for a multidimension array, x as a 2 dimension array, x[y][z] would be equivalent to *(*((&(x))+(y))+(z))

In the small scale toy C compiler I'm working on, this fails to generate proper code, because it tries to indirectly access a pointed to address at every * instruction -- this works for single dimension arrays, but for multi dimension it results in something like (in vaguely assembly pseudocode)

load &x; add y; deref; add z; deref

Where deref is an instruction to load the value at the address of the previous calculation -- as this is how the indirection operator seems to work??

However, this will generate bad code, since we should be working with a single address throughout and only dereferencing at the very end. I'm assuming there's something in the spec I'm missing?

"name of an array is converted to an address operator to it" No. You could say that x is converted to &x[0], which has different type compared to &x. — HolyBlackCat, Commented Oct 5, 2021 at 7:15
Arrays aren't converted to pointers when used as L-values, only R-values. — Barmar, Commented Oct 5, 2021 at 7:19
What deref does depends on the type, and you have to detect that. Generally, yes, deref() { if simple pointer; then deref; if array; then only remove one dimension from type and don't change the value, if pointer to function, then do nothing } — KamilCuk, Commented Oct 5, 2021 at 7:54
Aaand there's also the case where &* is a no-op, so you have to check if the next operation is & and then do nothing, for example. — KamilCuk, Commented Oct 5, 2021 at 8:00

HolyBlackCat · Accepted Answer · 2021-10-05 07:19:37Z


name of an array is converted to an address operator to it

No. You could say that x is converted to &x[0], which has different type compared to &x.

Assuming you have T a[M][N];, evaluating a[x][y] does the following:

1. a is converted to a temporary pointer of type T (*)[N], pointing to the first array element.
2. This pointer is incremented by x * sizeof(T[N]) bytes, i.e. by x * N * sizeof(T) bytes.
3. The pointer is dereferenced, giving you a value of type T[N].
4. The result is converted to a temporary pointer of type T *.
5. The pointer is incremented by y * sizeof(T) bytes.
6. Finally, the pointer is dereferenced to produce a value of type T.

Note that an array itself (multidimensional or not) doesn't store any pointers to itself. When converted to a pointer, the resulting pointer is calculated on the fly.


This is giving the same result? I think my issue might be more in what the indirection operator is supposed to output?? Should it have more complex logic to detect this? Because when T(*)[N] is dereferenced, it sees T(*[N]) as a pointer and tries to get the value at that address?
– Popeye Otaku
Commented Oct 5, 2021 at 7:43

@PopeyeOtaku You were asking "why 2 dereferences rather than 1", and the answer is "because there are 2 temporary pointers". "Should it have more complex logic" No, there is no complex logic here, except the fact that arrays are implicitly converted to pointers to their first element when passed to [].
– HolyBlackCat
Commented Oct 5, 2021 at 17:03

"when T(*)[N] is dereferenced, it sees T(*[N]) as a pointer and tries to get the value at that address" The result of the dereference has type T[N]. When it's converted to a pointer (before the second addition), the resulting pointer is computed on the fly, rather than being taken from some memory location.
– HolyBlackCat
Commented Oct 5, 2021 at 17:03


Lundin · Accepted Answer · 2021-10-05 08:31:06Z

So, for a multidimension array, x as a 2 dimension array, x[y][z] would be equivalent to *(*((&(x))+(y))+(z))

No, a 2D array is an array of arrays. In *((x)+(y)), x decays into a pointer to its first element; after adding y, that pointer is de-referenced to give you array number y.

This array too "decays" into a pointer to its first element, so you get:

( (*((x)+(y))) + (z) )

When part of an expression, an array decays into a pointer to its first element, with a few exceptions, namely when it is the operand of the & (address-of) or sizeof operator. That is why typing out the & as done in your pseudocode is just confusing.

A practical example would be:

int arr[x][y];
for(size_t i=0; i<x; i++)
  for(size_t j=0; j<y; j++)
    arr[i][j] = ...

In the expression arr[i][j], the [] is just "syntactic sugar" for pointer arithmetic (see Do pointers support "array style indexing"?).
So we get *((arr)+(i)), where arr has decayed into a pointer to its first element, of type int(*)[y].
Pointer arithmetic on that array pointer, followed by the de-reference, yields array number i, of type int [y].
Again, there is array decay on this one, because it too is an array that is part of an expression. We get a pointer to the first element, type int*.
Pointer arithmetic of the int* + j gives the address of the integer, which is then finally de-referenced to give the actual int.

halfer · Accepted Answer · 2021-10-05 20:06:56Z

So, for a multidimension array, x as a 2 dimension array, x[y][z] would be equivalent to *(*((&(x))+(y))+(z))

You are mistaken. The expression x[y][z] is evaluated like:

*( *( x + y ) + z )

Here is a demonstration program:

#include <stdio.h>

int main(void) 
{
    enum { M = 3, N = 3 };
    int a[M][N] =
    {
        { 1, 2, 3 },
        { 4, 5, 6 },
        { 7, 8, 9 }
    };
    
    for ( size_t i = 0; i < M; i++ )
    {
        for ( size_t j = 0; j < N; j++ )
        {
            printf( "%d ", *( *( a + i ) + j ) );
        }
        putchar( '\n' );
    }

    return 0;
}

Its output is:

1 2 3 
4 5 6 
7 8 9

Array designators used in expressions (with rare exceptions) are implicitly converted to pointers to their first elements.

So if you have an array declared like:

int a[M][N];

then the array designator a is converted to a pointer to its first element ("row"). The type of the array element is int[N]. So a pointer to such object has the type int ( * )[N].

To get a pointer to the i-th element of the array, write the expression a + i. Dereferencing that expression yields the i-th row (a one-dimensional array), which in turn, when used in an expression, is converted to a pointer to its first element.

So the expression a + i has the type int ( * )[N].

The expression *( a + i ) has the type int[N], which is immediately and implicitly converted to a pointer of the type int * to its first element in the enclosing expression.

The expression *( a + i ) + j points to the j-th element of the "row" of the two-dimensional array. Dereferencing it, as in *( *( a + i ) + j ), yields the j-th element of the i-th row of the array.
