Tuesday, 6 November 2018

Programming C: Strings and Character Pointers

The most important thing to know about the string data-type in C is that there isn’t one! Many other languages such as Java, C# and Pascal have a string type which lets you create variables to which string literals such as “Hello world” may be assigned.

In C, you can create and initialize a string like this:

char str1[] = "Hello";

A string in C is simply an array of characters terminated by a null character, '\0'. When you initialize a character array with a string literal at the time of its declaration, as in the example above, the null terminator is added automatically, so str1 occupies six bytes: five letters plus '\0'. Functions such as printf and strlen treat the first null character they find as the end of the string. So, given this declaration:

char str2[] = "Goodbye\0 world";

When I display str2 with printf, like this:

printf("%s\n", str2);

This is what is displayed (because the string terminates on the '\0' character):

Goodbye

In C, I can declare and initialize strings either by placing a pair of square brackets after an identifier or by preceding the identifier with an asterisk (or ‘star’) like this:

char str1[] = "Hello";
char *str2 = "Goodbye";

At first sight, these two declarations appear to be more or less equivalent. Each is initialized with a string and I can display that string using printf like this:

printf("%s\n", str1);
printf("%s\n", str2);

In fact, the apparent similarity is deceptive. In order to understand why, we now have to get to grips with one of the most challenging aspects of the C language – pointers.

Pointers...

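In the example above, the asterisk or ‘star’ (*) indicates that the variable str2 is a pointer to some memory location. In this case, that is the memory location where the array of characters forming the string “Goodbye” is stored. Each piece of data in your computer’s memory is stored at some memory location or ‘address’. You can obtain that address by placing the ‘address-of’ operator & before a variable name. The usual way to print an address is with the %p format specifier, which displays it as a hexadecimal value (and expects the address to be passed as a void *). Many programmers find hexadecimal hard to understand, however, so in this example I display each address as a decimal number instead. Passing an address to %d is not safe: it is undefined behaviour, and on 64-bit systems an address typically won’t even fit in an int. So I first convert each address to the integer type uintptr_t and print it using the PRIuPTR format macro (both are available once you #include <inttypes.h> at the top of your program). This is how I would display the addresses of str1 and str2:

printf("%" PRIuPTR "\n", (uintptr_t)&str1);
printf("%" PRIuPTR "\n", (uintptr_t)&str2);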

If you run this code, it displays numbers such as these (the actual values depend on your system):

2686746
2686740

These are the addresses: that is, the positions in your computer’s memory where these variables live. Now that we have the addresses of the variables, let’s take a look at their values, the data which they store. Here I print the address of each variable followed by its value, shown first as an integer and then as a string (%s). To print an address or pointer value safely as a decimal integer, each one is converted to uintptr_t and printed with the PRIuPTR format macro from <inttypes.h>:

char str1[] = "Hello";
char *str2 = "Goodbye";
printf("%" PRIuPTR " %" PRIuPTR " %s\n", (uintptr_t)&str1, (uintptr_t)str1, str1);
printf("%" PRIuPTR " %" PRIuPTR " %s\n", (uintptr_t)&str2, (uintptr_t)str2, str2);

And this is what is displayed (though the actual numbers may vary):

2686746 2686746 Hello
2686740 4206628 Goodbye

This tells me that the address of str1 is 2686746 and its value expressed as an integer is the same number 2686746. Its value expressed as a string is the string with which it was initialized, “Hello”.
The address of str2 is 2686740, but its value expressed as an integer is a different number, 4206628. That is because the value of a pointer is the address it points to: 4206628 is the address of the memory where the characters of “Goodbye” are stored. Its value expressed as a string is the string with which it was initialized, “Goodbye”.
The important thing to observe here is that the value of the array name str1, when expressed as an integer, is the same as the address of str1 itself. In fact, we can say that:

THE VALUE OF AN ARRAY NAME IS THE ADDRESS OF THE START OF THAT ARRAY.

I’ll have more to say about arrays, pointers and addresses in a future lesson.

NOTE: If you are new to C, you may want to start with lesson 1 in this series: http://www.bitwisemag.com/2017/02/introduction-to-c-programming.html
And if you want to learn C in more depth, why not sign up to my online video course – C Programming for beginners. See here: http://www.bitwisemag.com/2017/01/learn-to-program-c-special-deal.html