Home Implementing a vector in C
Post
Cancel

Implementing a vector in C

Suppose we need a generic vector data structure in C, where by generic we mean it can handle any type of data. A vector uses an underlying array, therefore it supports index-based access to its elements. Moreover, the underlying array is resizable, meaning that memory space is not wasted uselessly. If the vector is full, adding a new element causes the underlying array to double its size. If the vector is 75% empty, the underlying array halves its size.

How does a vector work?

The ASCII figure below shows the example of a vector v, initially empty and with an initial capacity of 4 items:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
                          -----------------
when v is created :       |   |   |   |   |
                          -----------------

                          -----------------
append 7 to v :           | 7 |   |   |   |
                          -----------------

                          -----------------
append 1 to v :           | 7 | 1 |   |   |
                          -----------------

                          -----------------
append 5 to v :           | 7 | 1 | 5 |   |
                          -----------------

                          -----------------
append 9 to v :           | 7 | 1 | 5 | 9 |
                          -----------------

                            0   1   2   3   4   5   6   7
                          ---------------------------------
insert 2 at index 3 :     | 7 | 1 | 5 | 2 | 9 |   |   |   |   <-- capacity doubles
                          ---------------------------------

                            0   1   2   3   4   5   6   7
                          ---------------------------------
insert 2 at index 3 :     | 7 | 1 | 5 | 2 | 9 |   |   |   |
                          ---------------------------------

                            0   1   2   3   4   5   6   7
                          ---------------------------------
remove item at index 3 :  | 7 | 1 | 5 | 9 |   |   |   |   |
                          ---------------------------------

                            0   1   2   3   4   5   6   7
                          ---------------------------------
remove item at index 1 :  | 7 | 5 | 9 |   |   |   |   |   |
                          ---------------------------------

                            0   1   2   3
                          -----------------
remove item at index 0 :  | 5 | 9 |   |   |   <-- capacity is reduced by half since 75% empty
                          -----------------

Comparison with alternative data structures

A fixed-size array allows access to its elements in \( O(1) \) time. Adding items at the end of the array also takes \( O(1) \) time. However, insertion (other than at the end of the array) and deletion require \( O(n) \) time. As its name implies, a fixed-size array cannot change its size.

A vector uses an underlying resizing array, meaning that its capacity is automatically adjusted to accommodate its elements.

In contrast to an array, a linked list allows insertion and deletion in \( O(1) \) time, but accessing the kth element requires \( O(n) \) time.

An array (a fixed-size array or a resizing array, i.e. a vector) should be used when indexing happens more often than insertion or deletion at arbitrary positions. A linked list is more appropriate when indexing happens rarely and when insertions and deletions are frequent.

Resizing the vector

A vector starts out with an initial capacity, for which we can make an educated guess depending on the application. Let us suppose a good choice for an initial capacity is 4.

When the 5th item is added to the vector, its capacity doubles, becoming 8. When the 9th item is added, the capacity doubles again, becoming 16, and so on. Doubling the vector capacity is thus performed only if it is absolutely necessary to do so.

Halving the vector capacity is more tricky. The aim is to strike a good balance between array resizing operations performed via realloc() and not wasting too much memory space. Suppose a vector has 9 items and its capacity is 16. If we remove one item, it would be tempting to halve the capacity on the spot. But if a 9th item needs to be added right away, we would have to double the capacity yet again. The best bet is to halve the vector capacity when it is one quarter full, because this means we have been removing items from the vector for quite some time and it is reasonable to assume that the need to double the capacity will not arise any time soon. Continuing the example, a vector would keep its capacity of 16 items as long as it has at least 5 items. When it only has 4 items, its capacity becomes 8. Consider another example of a vector with 513 elements and capacity 1024. If we start removing items, the capacity shrinks to 512 when only 256 items remain. The capacity further shrinks to 256 when only 128 items remain, and so on.

Vector definition

Here is a bare-bones definition of a vector in C:

1
2
3
4
5
typedef struct {
    void **data;     /* information stored in the vector */
    size_t count;    /* number of elements currently stored in the vector */
    size_t capacity; /* maximum capacity of the vector */
} Vector;

We want this data structure to be generic, meaning it must be able to handle any type of item: integers, doubles, strings, as well as user-defined data structures. This is why items in a Vector are of type void *. As vector items are organized in an array, what we need for the vector data is a pointer to pointer to void (void **).

We should also define the initial capacity of the vector. Note however that this choice is highly dependent on the application. Here, we will use an initial capacity of 4:

1
const size_t VECTOR_INIT_CAPACITY = 4;

The Vector implementation in libgcds (Library for Generic C Data Structures) defines the data structure as above, with the addition of several function pointers (structs with function pointers are the ancestors of classes; for the C ecosystem, it will just have to do). In this post the aim is to keep things simple and easy to understand, not to build a library, so we will just be defining stand-alone (“normal”) functions for the Vector API.

A basic vector API

Any basic Vector API should have the following methods:

  • A method to create a vector: Vector *vector_create() creates a Vector and returns a pointer to it, or the NULL pointer in case of failure.
  • A method to free a vector: void vector_free(Vector *vector) frees the specified vector.
  • A method to add an item at the end of a vector: int vector_add(Vector *vector, void *item) attempts to add the given item at the end of the vector, doubling the size of the underlying array if necessary. Returns 0 for success and -1 for failure.
  • A method to insert an item at an arbitrary position: int vector_insert(Vector *vector, void *item, int index) attempts to insert the given item at a specified index in the vector, doubling the size of the underlying array if necessary. Returns 0 for success and -1 for failure.
  • A method to delete an item at an arbitrary position: int vector_delete(Vector *vector, int index) attempts to delete the item at the specified index in the vector, halving the size of the underlying array if necessary. Returns 0 for success and -1 for failure.

In the remainder of this section we will implement each of these methods in turn.

vector_create()

We start by allocating memory for a vector and return NULL if the allocation fails. Then we initialize the number of elements to 0 and the capacity to the initial capacity. We must also allocate memory for the underlying array vector->data. If this is not possible, we free the vector pointer and return NULL. If everything went fine, the function returns a pointer to the brand new vector.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
Vector *vector_create()
{
    Vector *vector = (Vector *)malloc(sizeof(Vector));
    if (!vector) return NULL;

    vector->count = 0;
    vector->capacity = VECTOR_INIT_CAPACITY;

    vector->data = (void *)malloc(vector->capacity * sizeof(void *));
    if (!vector->data) {
        vector_free(vector);
        return NULL;
    }

    return vector;
}

vector_free()

If the pointer to Vector is not NULL, we attempt to deallocate its data, then the vector itself.

1
2
3
4
5
6
7
8
void vector_free(Vector *vector)
{
    if (vector) {
        if (vector->data)
            free(vector->data);
        free(vector);
    }
}

_vector_resize()

Yes, _vector_resize() is not listed above. The reason for this is that this method is not part of the public API, but it is required for methods that may need to resize the underlying array: vector_add(), vector_insert() and vector_delete(). The client of the Vector API does not even need to know that this function exists. In order to keep it private to the implementation file of the vector (the .c file), we will be declaring it static.

The function starts out by attempting to reallocate the vector’s underlying array data with the new capacity.

Note: We use a new void ** pointer for this reallocation. This is important, because realloc() is not guaranteed to return a pointer to the memory location occupied by the array to be resized.

1
2
3
4
5
6
7
8
9
10
11
12
static int _vector_resize(Vector *vector, size_t capacity)
{
    void **data = realloc(vector->data, capacity * sizeof(void *));
    vector->capacity = capacity;

    if (!data) return -1;
    if (data != vector->data)
        vector->data = data;
    data = NULL;

    return 0;
}

vector_add()

Adding an item at the end of a vector can fail if the vector or its data is NULL, or if the resizing is unsuccessful. Resizing the underlying array is performed if there is no free slot in the vector to add the new item.

1
2
3
4
5
6
7
8
9
10
11
12
13
int vector_add(Vector *vector, void *item)
{
    if (!vector || !vector->data) return -1;

    if (vector->count == vector->capacity) {
        if (_vector_resize(vector, 2 * vector->capacity) == -1)
            return -1;
    }

    vector->data[vector->count++] = item;

    return 0;
}

vector_insert()

Inserting an item at an arbitrary position in a vector can fail if the vector or its data is NULL, if the index is incorrect, or if the resizing is unsuccessful. As for vector_add(), resizing is performed if there is no free slot for the new item. In addition, every item in the vector after the position designated by index must be shifted by one position to the right. A special case is identified where insertion takes place at the end of the vector, in which case vector_add() is used directly. As we’ve seen above, vector_add() may also fail, so a check for its return code is equally performed.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
int vector_insert(Vector *vector, void *item, int index)
{
    if (!vector || !vector->data) return -1;
    if (index < 0 || index > vector->count) return -1;

    if (index == vector->count) {
        if (vector_add(vector, item) == -1) return -1;
    } else {
        if (vector->count == vector->capacity) {
            if (_vector_resize(vector, 2 * vector->capacity) == -1)
                return -1;
        }

        for (int i = vector->count; i > index; i--)
            vector->data[i] = vector->data[i-1];

        vector->data[index] = item;
        vector->count++;
    }

    return 0;
}

vector_delete()

Deleting an item at an arbitrary position in a vector can fail if the vector or its data is NULL, if the index is incorrect, or if the resizing is unsuccessful. Resizing the underlying array to half its capacity is performed if the vector is one quarter full after deletion. Every item in the vector after the position designated by index must be shifted by one position to the left.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
int vector_delete(Vector *vector, int index)
{
    if (!vector || !vector->data) return -1;
    if (index < 0 || index >= vector->count) return -1;

    vector->count--;

    for (int i = index; i < vector->count; i++)
        vector->data[i] = vector->data[i+1];

    if (vector->count == vector->capacity / 4
            && vector->capacity > VECTOR_INIT_CAPACITY) {
        if (_vector_resize(vector, vector->capacity / 2) == -1)
            return -1;
    }

    return 0;
}

An improved Vector API

For all practical purposes, there is a high chance that the basic API we have just seen above is not sufficient. As the need arises, we may add several useful functions to our Vector API, such as:

  • A method to check whether the vector contains a given item: bool vector_contains(Vector* vector, void* item) determines whether the vector contains the specified item. Returns true if the item exists, false otherwise.
  • A method to determine the index of a given item in the vector: int vector_index(Vector* vector, void* item) determines the index of the specified item in the vector, or -1 if no such item exists.

Note: In both cases that we do not actually examine the value of the item, since at this point we cannot even know what kind of items we are dealing with. We simply compare void pointers, which enables us to determine whether the given item exists in the vector’s data array.

vector_contains()

If the vector is not NULL, we iterate its data array and compare its every item against the specified one.

1
2
3
4
5
6
7
8
bool vector_contains(Vector* vector, void* item)
{
    if (!vector) return false;
    for (unsigned int i = 0; i < vector->size; i++)
        if (vector->data[i] == item)
            return true;
    return false;
}

vector_index()

If the vector is not NULL, we iterate its data array until we find the specified item or until we hit the end of the array.

1
2
3
4
5
6
7
8
9
10
11
int vector_index(Vector* vector, void* item)
{
    if (!vector) return -1;

    for (unsigned int i = 0; i < vector->size; i++) {
        if (vector->data[i] == item)
            return (int) i;
    }

    return -1;
}

Examples of vectors

Here we will see how two clients may use the Vector API that we have just examined: one client creates a vector of integers, and the other one creates a vector of user-defined data structures.

Example #1: A vector of integers

Suppose we need a vector to handle integers. First, values 2, 4 and 6 are added at the end of the vector using vector_add(). The vector is [ 2 4 6 ]. Second, the values 1, 3 and 5 are inserted in between using vector_insert(), such that the vector becomes [ 1 2 3 4 5 6 ]. Finally, the last three values are deleted from the vector using vector_delete(). The vector is now [ 1 2 3 ]. The following program details these steps:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
#include "vector.h"
#include <stdio.h>
#include <stdlib.h>

void vector_int_print(Vector *vector);

int main(void)
{
    int i;
    Vector *vector = vector_create();
    if (!vector) {
        fprintf(stderr, "Cannot allocate vector.\n");
        exit(EXIT_FAILURE);
    }

    int even[3] = {2, 4, 6};
    for (i = 0; i < 3; i++)
        vector_add(vector, &even[i]);
    vector_int_print(vector); /* [ 2 4 6 ] */

    int odd[3] = {1, 3, 5};
    vector_insert(vector, &odd[0], 0);
    vector_insert(vector, &odd[1], 2);
    vector_insert(vector, &odd[2], 4);
    vector_int_print(vector); /* [ 1 2 3 4 5 6 ] */

    for (i = 5; i > 2; i--)
        vector_delete(vector, i);
    vector_int_print(vector); /* [ 1 2 3 ] */

    vector_free(vector);

    return EXIT_SUCCESS;
}

void vector_int_print(Vector *vector)
{
    printf("[ ");
    for (size_t i = 0; i < vector->count; i++)
        printf("%d ", *((int *) vector->data[i]));
    printf("]\n");
}

Note: Checks for return codes from vector_add(), vector_insert() and vector_delete() should be performed, but have been omitted here for the sake of brevity.

Example #2: A vector of user-defined data structures

Suppose we now need a vector to handle a user-defined data structure representing 2D points. We first add the points with coordinates (1, 10) and (3, 30) to the vector using vector_add(). We then insert the point (2, 20) in between using vector_insert(). The following program details these steps:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
#include "vector.h"
#include <stdio.h>
#include <stdlib.h>

void vector_point_print(Vector *vector);

typedef struct {
    int x;
    int y;
} Point;

int main(void)
{
    Vector *vector = vector_create();
    if (!vector) {
        fprintf(stderr, "Cannot allocate vector.\n");
        exit(EXIT_FAILURE);
    }

    Point p1 = {.x = 1, .y = 10};
    Point p2 = {.x = 2, .y = 20};
    Point p3 = {.x = 3, .y = 30};

    vector_add(vector, &p1);
    vector_add(vector, &p3);
    vector_insert(vector, &p2, 1);

    vector_point_print(vector); /* [ (1, 10) (2, 20) (3, 30) ] */
    vector_free(vector);

    return EXIT_SUCCESS;
}

void vector_point_print(Vector *vector)
{
    printf("[ ");
    for (size_t i = 0; i < vector->count; i++) {
        Point *point = (Point *) vector->data[i];
        printf("(%d, %d) ", point->x, point->y);
    }
    printf("]\n");
}

Note: Checks for return codes from vector_add() and vector_insert() should be performed, but have been omitted here for the sake of brevity.

Testing the implementation

This Vector implementation has been extensively tested using the cmocka testing framework. The tests are provided in the file vector-test.c.

Install cmocka first. Then, to build the tests, run:

1
gcc -Wall vector-test.c vector.c -g -o vector-test -lcmocka

To execute the tests, run:

1
./test-vector

To execute the tests using valgrind in order to detect memory leaks, run:

1
valgrind --leak-check=full ./test-vector

There should be no memory leaks :-)

Availability

The full Vector implementation along with the tests is available on GitHub.

This post is licensed under CC BY 4.0 by the author.

(Anonymous) unions in C

How to read safely from stdin in C

Comments powered by Disqus.