Monday, October 09, 2006

Arrays, indexes and memory

Before we go into the next article on inserting into arrays, I thought we could take a quick detour to answer one of the questions we had raised earlier:
Why do array indexes go from 0 to n-1 in languages like java, C, C++, etc?

In order to understand the reasons, we need to understand how memory is allocated for variables in general and arrays in particular.

Let us imagine the computer memory to be a large tract of land owned by the government. Whenever we want some land, we can approach the government and they will provide us the land we want, if available, as long as it is for a fixed time period, we agree to use it only to build temporary structures and we promise to return it back to the government when we are done using it.

When the government allocates the land to us, they ensure that no piece of it has already been allocated to anybody else. They also assume that we will not usurp/use any land outside of what has been allocated to us.

Some governments will keep a close watch to ensure that we do not build outside our area, while other more liberal governments will leave that responsibility with us.

If we have asked for place for 10 stalls and surreptitiously or unknowingly build the 11th one, we could end up building our stall on top of somebody else’s toilet, for example. Then when they use their toilet (or what used to be their toilet), they may end up pooping on our sofa creating a mess all around. This is exactly what happens when we misuse memory and overwrite some other variable – in unix the system will crash with a “core dump” message – kind of apt for our analogy, don’t you think?

We could ask for, say,
(1) 1000 sq. meters of land in order to put up my circus tent for a period of one month, or
(2) some contiguous space for a dozen 10 meters wide (and 20 meters long) stalls for me to run a carnival for the next one week.

The first request above is similar to asking for space for a primitive data type or a structure:

Int i;
Exchanger e = new Exchanger();

The second request is similar to asking for an array, and each stall is an element in that array, each element being the same size.

int [] intArray = new int[12];

The government will probably tell you that your area starts at a particular point and extends till some other point. The lease agreement will be for the period you have requested the land for (the scope of the variable).

Our location provided by the government is the starting address of our piece of land, which incidentally, in the second case, is also the starting address of our first stall.

Leaving the land analogy behind, we can now see how this maps to arrays.
The array variable contains the starting address of the entire memory that has been given to us, which incidentally, is also the starting address of the first element.
So the address of each of the elements can be calculated as
Address of first element = starting address of array + 0 * size of each element.
Address of second element = starting address of array + 1 * size of each element
Address of nth element = starting address of array + (n-1) * size of each element.

So it is only natural, in the programming world, that we give the array subscripts ranging from 0 to n-1. This is quite counter-intuitive for us [who has heard a door number 0, for example?] and thus some languages, like most variations of BASIC, have chosen to go with subscripts ranging from 1 to n. They would internally calculate the address of the different elements as
Address of first element = starting address of array + (1-1)* size of each element.
Address of second element = starting address of array + (2-1) * size of each element.
Address of nth element = starting address of array + (n-1) * size of each element.
Which as you can see is identical to the previous set of formulae.
I hope this has given you a brief idea about memory allocation and management.
Next article will be about the 3rd insertion technique I promised to write.

No comments: