Suppose we need a generic vector data structure in C, where by generic we mean it can handle any type of data. A vector uses an underlying array, therefore it supports index-based access to its elements. Moreover, the underlying array is resizable, meaning that memory space is not wasted uselessly. If the vector is full, adding a new element causes the underlying array to double its size. If the vector is 75% empty, the underlying array halves its size.
How does a vector work?
The ASCII figure below shows the example of a vector v, initially empty and with an initial capacity of 4 items:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
-----------------
when v is created : | | | | |
-----------------
-----------------
append 7 to v : | 7 | | | |
-----------------
-----------------
append 1 to v : | 7 | 1 | | |
-----------------
-----------------
append 5 to v : | 7 | 1 | 5 | |
-----------------
-----------------
append 9 to v : | 7 | 1 | 5 | 9 |
-----------------
0 1 2 3 4 5 6 7
---------------------------------
insert 2 at index 3 : | 7 | 1 | 5 | 2 | 9 | | | | <-- capacity doubles
---------------------------------
0 1 2 3 4 5 6 7
---------------------------------
insert 2 at index 3 : | 7 | 1 | 5 | 2 | 9 | | | |
---------------------------------
0 1 2 3 4 5 6 7
---------------------------------
remove item at index 3 : | 7 | 1 | 5 | 9 | | | | |
---------------------------------
0 1 2 3 4 5 6 7
---------------------------------
remove item at index 1 : | 7 | 5 | 9 | | | | | |
---------------------------------
0 1 2 3
-----------------
remove item at index 0 : | 5 | 9 | | | <-- capacity is reduced by half since 75% empty
-----------------
Comparison with alternative data structures
A fixed-size array allows access to its elements in \( O(1) \) time. Adding items at the end of the array also takes \( O(1) \) time. However, insertion (other than at the end of the array) and deletion require \( O(n) \) time. As its name implies, a fixed-size array cannot change its size.
A vector uses an underlying resizing array, meaning that its capacity is automatically adjusted to accommodate its elements.
In contrast to an array, a linked list allows insertion and deletion in \( O(1) \) time, but accessing the kth element requires \( O(n) \) time.
An array (a fixed-size array or a resizing array, i.e. a vector) should be used when indexing happens more often than insertion or deletion at arbitrary positions. A linked list is more appropriate when indexing happens rarely and when insertions and deletions are frequent.
Resizing the vector
A vector starts out with an initial capacity, for which we can make an educated guess depending on the application. Let us suppose a good choice for an initial capacity is 4.
When the 5th item is added to the vector, its capacity doubles, becoming 8. When the 9th item is added, the capacity doubles again, becoming 16, and so on. Doubling the vector capacity is thus performed only if it is absolutely necessary to do so.
Halving the vector capacity is more tricky. The aim is to strike a good balance between array resizing operations performed via realloc()
and not wasting too much memory space. Suppose a vector has 9 items and its capacity is 16. If we remove one item, it would be tempting to halve the capacity on the spot. But if a 9th item needs to be added right away, we would have to double the capacity yet again. The best bet is to halve the vector capacity when it is one quarter full, because this means we have been removing items from the vector for quite some time and it is reasonable to assume that the need to double the capacity will not arise any time soon. Continuing the example, a vector would keep its capacity of 16 items as long as it has at least 5 items. When it only has 4 items, its capacity becomes 8. Consider another example of a vector with 513 elements and capacity 1024. If we start removing items, the capacity shrinks to 512 when only 256 items remain. The capacity further shrinks to 256 when only 128 items remain, and so on.
Vector definition
Here is a bare-bones definition of a vector in C:
1
2
3
4
5
typedef struct {
void **data; /* information stored in the vector */
size_t count; /* number of elements currently stored in the vector */
size_t capacity; /* maximum capacity of the vector */
} Vector;
We want this data structure to be generic, meaning it must be able to handle any type of item: integers, doubles, strings, as well as user-defined data structures. This is why items in a Vector
are of type void *
. As vector items are organized in an array, what we need for the vector data
is a pointer to pointer to void (void **
).
We should also define the initial capacity of the vector. Note however that this choice is highly dependent on the application. Here, we will use an initial capacity of 4:
1
const size_t VECTOR_INIT_CAPACITY = 4;
The Vector
implementation in libgcds (Library for Generic C Data Structures) defines the data structure as above, with the addition of several function pointers (struct
s with function pointers are the ancestors of classes; for the C ecosystem, it will just have to do). In this post the aim is to keep things simple and easy to understand, not to build a library, so we will just be defining stand-alone (“normal”) functions for the Vector
API.
A basic vector API
Any basic Vector
API should have the following methods:
- A method to create a vector:
Vector *vector_create()
creates aVector
and returns a pointer to it, or theNULL
pointer in case of failure. - A method to free a vector:
void vector_free(Vector *vector)
frees the specifiedvector
. - A method to add an item at the end of a vector:
int vector_add(Vector *vector, void *item)
attempts to add the givenitem
at the end of thevector
, doubling the size of the underlying array if necessary. Returns 0 for success and -1 for failure. - A method to insert an item at an arbitrary position:
int vector_insert(Vector *vector, void *item, int index)
attempts to insert the givenitem
at a specifiedindex
in thevector
, doubling the size of the underlying array if necessary. Returns 0 for success and -1 for failure. - A method to delete an item at an arbitrary position:
int vector_delete(Vector *vector, int index)
attempts to delete theitem
at the specifiedindex
in thevector
, halving the size of the underlying array if necessary. Returns 0 for success and -1 for failure.
In the remainder of this section we will implement each of these methods in turn.
vector_create()
We start by allocating memory for a vector and return NULL
if the allocation fails. Then we initialize the number of elements to 0 and the capacity to the initial capacity. We must also allocate memory for the underlying array vector->data
. If this is not possible, we free the vector pointer and return NULL
. If everything went fine, the function returns a pointer to the brand new vector.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
Vector *vector_create()
{
Vector *vector = (Vector *)malloc(sizeof(Vector));
if (!vector) return NULL;
vector->count = 0;
vector->capacity = VECTOR_INIT_CAPACITY;
vector->data = (void *)malloc(vector->capacity * sizeof(void *));
if (!vector->data) {
vector_free(vector);
return NULL;
}
return vector;
}
vector_free()
If the pointer to Vector
is not NULL
, we attempt to deallocate its data, then the vector itself.
1
2
3
4
5
6
7
8
void vector_free(Vector *vector)
{
if (vector) {
if (vector->data)
free(vector->data);
free(vector);
}
}
_vector_resize()
Yes, _vector_resize()
is not listed above. The reason for this is that this method is not part of the public API, but it is required for methods that may need to resize the underlying array: vector_add()
, vector_insert()
and vector_delete()
. The client of the Vector
API does not even need to know that this function exists. In order to keep it private to the implementation file of the vector (the .c
file), we will be declaring it static
.
The function starts out by attempting to reallocate the vector’s underlying array data
with the new capacity
.
Note: We use a new
void **
pointer for this reallocation. This is important, becauserealloc()
is not guaranteed to return a pointer to the memory location occupied by the array to be resized.
1
2
3
4
5
6
7
8
9
10
11
12
static int _vector_resize(Vector *vector, size_t capacity)
{
void **data = realloc(vector->data, capacity * sizeof(void *));
vector->capacity = capacity;
if (!data) return -1;
if (data != vector->data)
vector->data = data;
data = NULL;
return 0;
}
vector_add()
Adding an item at the end of a vector can fail if the vector or its data is NULL
, or if the resizing is unsuccessful. Resizing the underlying array is performed if there is no free slot in the vector to add the new item.
1
2
3
4
5
6
7
8
9
10
11
12
13
int vector_add(Vector *vector, void *item)
{
if (!vector || !vector->data) return -1;
if (vector->count == vector->capacity) {
if (_vector_resize(vector, 2 * vector->capacity) == -1)
return -1;
}
vector->data[vector->count++] = item;
return 0;
}
vector_insert()
Inserting an item at an arbitrary position in a vector can fail if the vector or its data is NULL
, if the index is incorrect, or if the resizing is unsuccessful. As for vector_add()
, resizing is performed if there is no free slot for the new item. In addition, every item in the vector after the position designated by index
must be shifted by one position to the right. A special case is identified where insertion takes place at the end of the vector, in which case vector_add()
is used directly. As we’ve seen above, vector_add()
may also fail, so a check for its return code is equally performed.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
int vector_insert(Vector *vector, void *item, int index)
{
if (!vector || !vector->data) return -1;
if (index < 0 || index > vector->count) return -1;
if (index == vector->count) {
if (vector_add(vector, item) == -1) return -1;
} else {
if (vector->count == vector->capacity) {
if (_vector_resize(vector, 2 * vector->capacity) == -1)
return -1;
}
for (int i = vector->count; i > index; i--)
vector->data[i] = vector->data[i-1];
vector->data[index] = item;
vector->count++;
}
return 0;
}
vector_delete()
Deleting an item at an arbitrary position in a vector can fail if the vector or its data is NULL
, if the index is incorrect, or if the resizing is unsuccessful. Resizing the underlying array to half its capacity is performed if the vector is one quarter full after deletion. Every item in the vector after the position designated by index
must be shifted by one position to the left.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
int vector_delete(Vector *vector, int index)
{
if (!vector || !vector->data) return -1;
if (index < 0 || index >= vector->count) return -1;
vector->count--;
for (int i = index; i < vector->count; i++)
vector->data[i] = vector->data[i+1];
if (vector->count == vector->capacity / 4
&& vector->capacity > VECTOR_INIT_CAPACITY) {
if (_vector_resize(vector, vector->capacity / 2) == -1)
return -1;
}
return 0;
}
An improved Vector API
For all practical purposes, there is a high chance that the basic API we have just seen above is not sufficient. As the need arises, we may add several useful functions to our Vector
API, such as:
- A method to check whether the vector contains a given item:
bool vector_contains(Vector* vector, void* item)
determines whether thevector
contains the specifieditem
. Returnstrue
if the item exists,false
otherwise. - A method to determine the index of a given item in the vector:
int vector_index(Vector* vector, void* item)
determines the index of the specifieditem
in thevector
, or -1 if no such item exists.
Note: In both cases that we do not actually examine the value of the item, since at this point we cannot even know what kind of items we are dealing with. We simply compare void pointers, which enables us to determine whether the given item exists in the vector’s
data
array.
vector_contains()
If the vector is not NULL
, we iterate its data
array and compare its every item against the specified one.
1
2
3
4
5
6
7
8
bool vector_contains(Vector* vector, void* item)
{
if (!vector) return false;
for (unsigned int i = 0; i < vector->size; i++)
if (vector->data[i] == item)
return true;
return false;
}
vector_index()
If the vector is not NULL
, we iterate its data
array until we find the specified item or until we hit the end of the array.
1
2
3
4
5
6
7
8
9
10
11
int vector_index(Vector* vector, void* item)
{
if (!vector) return -1;
for (unsigned int i = 0; i < vector->size; i++) {
if (vector->data[i] == item)
return (int) i;
}
return -1;
}
Examples of vectors
Here we will see how two clients may use the Vector
API that we have just examined: one client creates a vector of integers, and the other one creates a vector of user-defined data structures.
Example #1: A vector of integers
Suppose we need a vector to handle integers. First, values 2, 4 and 6 are added at the end of the vector using vector_add()
. The vector is [ 2 4 6 ]
. Second, the values 1, 3 and 5 are inserted in between using vector_insert()
, such that the vector becomes [ 1 2 3 4 5 6 ]
. Finally, the last three values are deleted from the vector using vector_delete()
. The vector is now [ 1 2 3 ]
. The following program details these steps:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
#include "vector.h"
#include <stdio.h>
#include <stdlib.h>
void vector_int_print(Vector *vector);
int main(void)
{
int i;
Vector *vector = vector_create();
if (!vector) {
fprintf(stderr, "Cannot allocate vector.\n");
exit(EXIT_FAILURE);
}
int even[3] = {2, 4, 6};
for (i = 0; i < 3; i++)
vector_add(vector, &even[i]);
vector_int_print(vector); /* [ 2 4 6 ] */
int odd[3] = {1, 3, 5};
vector_insert(vector, &odd[0], 0);
vector_insert(vector, &odd[1], 2);
vector_insert(vector, &odd[2], 4);
vector_int_print(vector); /* [ 1 2 3 4 5 6 ] */
for (i = 5; i > 2; i--)
vector_delete(vector, i);
vector_int_print(vector); /* [ 1 2 3 ] */
vector_free(vector);
return EXIT_SUCCESS;
}
void vector_int_print(Vector *vector)
{
printf("[ ");
for (size_t i = 0; i < vector->count; i++)
printf("%d ", *((int *) vector->data[i]));
printf("]\n");
}
Note: Checks for return codes from
vector_add()
,vector_insert()
andvector_delete()
should be performed, but have been omitted here for the sake of brevity.
Example #2: A vector of user-defined data structures
Suppose we now need a vector to handle a user-defined data structure representing 2D points. We first add the points with coordinates (1, 10) and (3, 30) to the vector using vector_add()
. We then insert the point (2, 20) in between using vector_insert()
. The following program details these steps:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
#include "vector.h"
#include <stdio.h>
#include <stdlib.h>
void vector_point_print(Vector *vector);
typedef struct {
int x;
int y;
} Point;
int main(void)
{
Vector *vector = vector_create();
if (!vector) {
fprintf(stderr, "Cannot allocate vector.\n");
exit(EXIT_FAILURE);
}
Point p1 = {.x = 1, .y = 10};
Point p2 = {.x = 2, .y = 20};
Point p3 = {.x = 3, .y = 30};
vector_add(vector, &p1);
vector_add(vector, &p3);
vector_insert(vector, &p2, 1);
vector_point_print(vector); /* [ (1, 10) (2, 20) (3, 30) ] */
vector_free(vector);
return EXIT_SUCCESS;
}
void vector_point_print(Vector *vector)
{
printf("[ ");
for (size_t i = 0; i < vector->count; i++) {
Point *point = (Point *) vector->data[i];
printf("(%d, %d) ", point->x, point->y);
}
printf("]\n");
}
Note: Checks for return codes from
vector_add()
andvector_insert()
should be performed, but have been omitted here for the sake of brevity.
Testing the implementation
This Vector
implementation has been extensively tested using the cmocka testing framework. The tests are provided in the file vector-test.c.
Install cmocka first. Then, to build the tests, run:
1
gcc -Wall vector-test.c vector.c -g -o vector-test -lcmocka
To execute the tests, run:
1
./test-vector
To execute the tests using valgrind in order to detect memory leaks, run:
1
valgrind --leak-check=full ./test-vector
There should be no memory leaks :-)
Availability
The full Vector
implementation along with the tests is available on GitHub.
Comments powered by Disqus.