Arrays under the hood: A gentle introduction to array data structure

If you have ever created a shopping list for a trip to the grocery store, you have already used an array without even realizing it. But if you haven't, try picturing it as a list of items you plan to receive for Christmas.

Arrays can be ordered or unordered, but that's another day's topic. The key takeaway is that arrays help developers manage lists of items efficiently.

An array is one of the fundamental data structures in programming. This article assumes that you are already familiar with it. To define it clearly, an array is a sequence of collections of a single data type (in statically typed languages) or various data types (for non-statically typed languages).

For example, if you have an e-commerce website that allows users to create a shopping list, it might resemble the below:

array = ["milk","sugar","banana","apple"]

Without the knowledge of arrays, a developer will be forced to create different variables that will store each item in the list above. This will make these variables harder to maintain and keep track of changes made to each item. The importance of arrays can not be overemphasized.

Now you understand why this data structure, let us look at how they work under the hood each time a developer declares and initializes one.

Before I go further, you should note that different programming languages have their way of storing these values in memory and mutating them. I will give you the general programming concept associated with this data structure.

To understand the array data structure or any data structure, we need to analyze the ways our code will interact with that data structure.

Data Structure Operations

Almost all data structure has common ways you can interact with them. These include:

Read: To simply put this, you are trying to look up a value at a particular index in the array. For example, you take the above list and try to read the value "milk" which is in the first position or index 0 from it. An index is how the computer keeps track of the position of each item in memory.

Search: Searching is an operation that refers to looking for a value within the data structure. In the case of an array, you are looking to see if a certain value is present within it, E.g. Searching to see if "sugar" exists in our example above.

Insert: Inserting refers to updating a data structure. If we decide to update that array above and add "yogurts" we have updated or inserted it in the array.

Delete: The name only gives you an idea. This operation refers to removing an item from our array.

How is an array saved in memory?

When we declare an array, for example

array = ["apples", "bananas", "cucumbers", "dates", "elderberries"]

The above diagram is a visual simplification of computer memory, and each number inside the block represents a memory address. This is how the computer keeps track of its storage.

When you create an array like the one we have above, the computer will look at the number of items in the list and create a sequential memory address to store each value in each memory address. The index variable in the first image is how the computer knows the position of each item. In most programming languages, the index starts at 0. We will adopt the same concept here.

Each item is saved in a memory address under the hood. Try to visualize a memory address as a street number used to identify a house. The index can be visualized the same way with a bit of difference.

An index of an array is like a number that tells us where something is in a list. Imagine a toy box with toys inside. Each toy has its number, and when you have a number, you can find the toy you want to play with. That number is like an index for the toys in the box. The index is the high-level abstraction of the memory address.

Read Operations in Array

When an array is saved, the computer knows which item is in the first position (index 0) by the memory address. So when a programmer tries to get an element in a particular index, say index 2, the computer takes the index and adds it to the base memory address (1010 for this case).

If I want to access the element in index 2

# 1010 + 2 = 1012 

print(array[2])

>>> cucumbers

You can see that when the index was added to the base address of the first item, it was easy to get the memory address that houses that particular value. This operation is so fast you won't even notice. In the computer science concept that measures the speed of operations, we can say that this operation is always constant even if the array size grows X more. So reading in an array happens in constant time ( O(1) )

Search Operations in Array

When you try to search if an item is present in an array, the computer performs two steps to find this item.

Iteration
Comparison

The computer goes over each item in the array and compares each item to see if it equals the item you are searching for. It returns true when it finds it or false when it doesn't.

This search and comparison always starts from the first index in the array to the last index. Say we have one million items, the program will perform a million searches and comparisons. In the worst-case scenario, the item may be the last item on the list or may not even be present.

While this article is not geared toward how fast our search works, you can intuitively see that our search will take a linear time to complete. It means a search operation in arrays takes longer than a read operation. There are proven efficient ways to search arrays, but that's out of the scope of the article.

Insert Operations in Array

The insert operation involves updating our array. In some cases, we can add elements to the end. Other times, we may add a new element somewhere in the middle or at the start of the array.

If we intend to add an element to a specific index, say index 2, the computer performs three operations:

Shifts
Insert
Update

Under the hood, the computer goes to that particular index, if it's not empty, it shifts every element after that index to the end before inserting the item into that specific index.

As you can see from the images above, Under the hood, the shifts are necessary to insert the new element.

In a best-case scenario, the element will be inserted at the end of the array. Since we consider worst cases as developers, imagine if we want to insert the new element at the beginning. This means that all elements in the array need to be shifted before the insertion.

This is a simple visualization of how insertion in an array works under the hood.

Delete Operations in Array

Deleting an item in an array is similar to insertion. When an element is deleted from the end, the computer frees up that space. However, when we delete an item like "cucumbers" from the array, consider the visual representation of what occurs.

Step 1: We delete the element.

Step 2: "dates" is shifted to the left.

Step 3: "elderberries" is shifted to the left.

As you can see above, the computer performs two operations which involve:

The actual deletion of the elements
The shifts that occurred to close up the gap

In the worst case, deleting elements from the beginning of the array will make the computer shift all remaining elements to the left.

Conclusion

Choosing a particular data structure to save and manipulate your data depends entirely on the use case. We just studied how the array data structure works under the hood, and now you understand how each operation works under the hood. In future articles, we will explore other data structures and go in-depth on how to use them efficiently. Kudos to getting this far. See you soon.

Photo/screenshot credit: A common sense to data structure and algorithms

Arrays under the hood

A gentle introduction to array data structure.