The C programming language reading notes

The C Programming Language

by Brian Kernighan and Dennis Ritchie

Worldcat entry 

An HTML version with author notes and supplementary
material is made by Dr Chuck:
https://www.cc4e.com/index.php


Chapter 1


C is a small language; most of what is used comes from
libraries, many of which are defined by the ANSI standard.
You get them with #include, like:

#include <stdio.h>

printf is a commonly used function, and an example of this.



Integer division truncates numbers, so when doing
division keep in mind what will truncate to 0 before
multiplying with other numbers.
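A small sketch of the truncation trap, using the Fahrenheit to Celsius formula. The function names are made up for illustration.

```c
/* Hypothetical helpers; the names are illustrative only. */

/* 5 / 9 is evaluated first and truncates to 0, so the result is always 0. */
int fahr_to_celsius_truncated(int fahr)
{
    return 5 / 9 * (fahr - 32);
}

/* Multiplying first keeps the intermediate value large enough to divide. */
int fahr_to_celsius(int fahr)
{
    return 5 * (fahr - 32) / 9;
}
```

Calling both with 212 gives 0 from the first and 100 from the second.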

Symbolic constants help with readability when constant values
are used.

getchar()

Get the next character of input. The result is stored in an int
because getchar must be able to return EOF, a value distinct
from every valid char.

putchar(c)

print integer as character

a character in single quotes represents the integer value of
that character (incl. escape chars)

switch might be preferred over if else when the condition
is whether int or char expr matches one in a set of const.

properly designed functions should make it easy to know
what is done, and ignore how it does it.

functions may even be written and called once, to make
the code easier to understand to the reader.

function definition form:

return-type function-name (parameter declarations)
{
  declarations
  statements
}

functions can be declared using a function prototype, which
just states the fn's return type, name and parameter types
without the body.

C is call by value: arguments are passed as values, so the fn
gets temporary copies. The fn can do whatever it likes to a
copy without affecting the calling fn's variable. To change a
value in both contexts, pass a pointer, which is the address
of the memory holding the value, and modify the value through
it.

however arrays are a different model, as they pass the address
to the beginning of the array, so arrays are altered in both
contexts.
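A minimal sketch of the three cases: a plain value, a pointer, and an array. All function names here are hypothetical.

```c
/* Receives a copy: the caller's variable is untouched. */
int bump_copy(int n)
{
    n = n + 1;
    return n;
}

/* Receives an address: writes through it to the caller's variable. */
void bump_through_pointer(int *n)
{
    *n = *n + 1;
}

/* An array argument arrives as the address of its first element. */
void bump_first_element(int a[])
{
    a[0] = a[0] + 1;
}

/* Each demo returns what the caller sees after the call. */
int after_bump_copy(void)
{
    int x = 1;
    bump_copy(x);
    return x;                 /* still 1 */
}

int after_bump_through_pointer(void)
{
    int x = 1;
    bump_through_pointer(&x);
    return x;                 /* now 2 */
}

int after_bump_first_element(void)
{
    int a[3] = {1, 2, 3};
    bump_first_element(a);
    return a[0];              /* now 2 */
}
```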

\0 is the null character, its value is 0, and it marks the end
of a string, which is a char array.

extern variables are defined outside of all fns, then declared
within the fns that want to access, using extern,

int some_value;

int main()
{
        extern int some_value;
}

although when the definition is before fn, extern isn't
necessary, just convention.

definition - where variable is created/assigned storage
declaration - where variable is stated but no storage alloc


Chapter 2


constants use special notation:
L, 12345679L - long
U, 1234567890U - unsigned (also UL)
N.M, 123.456 - float (default type double)
Ne-M, 1e-2 - float (default type double)
F, 123.0F - float
L, 123.0L - long double
0N, 012 - octal (also U, L, and UL)
0XN, 0X1F - hex (also U, L, and UL)

escape sequences
\a alert
\b backspace
\f formfeed
\n newline
\r carriage return
\t horizontal tab
\v vertical tab
\\ backslash
\? question mark
\' single quote
\" double quote
\NNN octal
\xNNN hex
\0 null (has numeric value of 0)

string constant
"" empty
"any number of chars"
"these two strings " "will be concatenated"
strings terminate with \0
therefore, 'x' != "x"

enums (often alternative to #define)
enum bool {NO, YES};
enum months {JAN = 1, FEB, MAR};
enum chars {a = 'a', b = 'b'};

const qualifier, value won't be changed from declaration,

const char msg[] = "warning: ";

int strlen(const char[]);

% operator can't be used with float or double

<ctype.h> provides tests and conversions independent of
character set, tolower, isdigit,...

specify signed or unsigned if non-character data is to be
stored in char

type conversion in arithmetic
long double > double > float > int > char/short
long > not long

<math.h> generally uses doubles

unsigned operands can make conversions complicated, and the
results depend on the sizes of the types. Where long is wider
than int:
-1L < 1U (unsigned int to signed long)
-1L > 1UL (signed long to unsigned long)

conversions in assignments work by converting value
on the right side to the type on the left

a cast is used to coerce a type conversion

(type name)expression

useful when calling a fn,

int n = 5;
sqrt((double) n);

however, automatic coercion of arguments occurs if
args are declared by a function prototype, like

double sqrt(double)

sqrt(2) // the 2 is coerced to 2.0 without a cast.
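A sketch of both ideas without <math.h>, so nothing extra needs linking. average and half are hypothetical helpers, not library functions.

```c
/* Without the cast, sum / count would be integer division. */
double average(int sum, int count)
{
    return (double) sum / count;
}

/* The definition also acts as a prototype: the parameter is a double ... */
double half(double x)
{
    return x / 2;
}

/* ... so the int argument 3 is converted to 3.0 at the call, no cast needed. */
double half_of_three(void)
{
    return half(3);
}
```

average(7, 2) gives 3.5 rather than 3, and half_of_three() gives 1.5.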

the math library behind <math.h> needs to be explicitly linked with -lm,
gcc -o prog proc.c -lm

++var, increments var, then yields the new value
var++, yields the current value, then increments var
either way, these operators aren't nice.

& - AND     110 & 100 = 100
| - OR inc. 110 | 100 = 110
^ - OR exc. 110 ^ 100 = 011

<< - left shift   0001 << 2 = 0100
>> - right shift  0100 >> 2 = 0001
nb. right-shifting a signed value fills with sign bits or
0 bits, depending on the machine.

~ - one's complement  ~010 = 101
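A sketch of how these operators usually get combined to set, clear, toggle and test single bits. The helper names are made up, and unsigned is used throughout to sidestep the signed right shift issue.

```c
/* Bit n counts from 0 at the least significant end. */

unsigned set_bit(unsigned x, int n)
{
    return x | (1u << n);
}

unsigned clear_bit(unsigned x, int n)
{
    return x & ~(1u << n);
}

unsigned toggle_bit(unsigned x, int n)
{
    return x ^ (1u << n);
}

unsigned test_bit(unsigned x, int n)
{
    return (x >> n) & 1u;
}
```

With x = 6 (binary 110), toggle_bit(x, 2) gives 2 (binary 010), matching the 110 ^ 100 example.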

More info on bit fields 

Be mindful of when expr are evaluated, and
when a function mutates values before other
functions, esp when used in expr together.


Chapter 3


Statements terminate with semicolons.
When doing conditionals, use braces.

                              *

At this point I went off the book for a short time to learn
about how one might achieve some of the basic functions of a
C program without using the standard library, libc. It was
following an article I read that discussed the kinds of
complexity that are added to programs when using stdlib. I
was curious if it was as straightforward as the article
laid out for doing basic stuff like output. I found someone
had written a short tutorial on implementing this, and it
was well explained. However, I did become curious as to how
the author's executables were so much (1000s of bytes)
smaller than mine, although I had followed the same steps.

I guess some systems and compilers will just add more for
whatever reason. I discovered a bunch of interesting tools
to really see into what is going on, that now I think I'll
need to learn more about

gdb
objdump
hexdump
strip
GNU Assembler
All the gcc commands

... and basically just reading through various resources
about understanding everything the system can do.

In light of this, I think it's somewhat beyond the scope of
what I'm trying to learn from this exercise, so I plan to
follow the book without anything like I've described.


                              *

randomness requires a seed, otherwise the value will be the same
each time.

srand(time(0));

always add a break to every switch case (incl. default) that isn't
intended to fall thru.
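A sketch with intentional fall-through (several labels sharing one body) and a break closing every group. classify_char is a hypothetical function.

```c
/* Returns 1 for a lowercase vowel, 2 for a digit, 0 for anything else. */
int classify_char(int c)
{
    int kind;

    switch (c) {
    case 'a': case 'e': case 'i': case 'o': case 'u':
        kind = 1;       /* the five labels above fall through to here on purpose */
        break;
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
        kind = 2;
        break;
    default:
        kind = 0;
        break;          /* not strictly needed, but safe if cases are added later */
    }
    return kind;
}
```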

generally, goto is not recommended, but it is often used to jump to
clean-up code, for error handling, or to break out of nested loops.

Local (automatic) arrays are not initialized unless you
explicitly initialize them. Otherwise they contain whatever
random data already occupies the memory.

int arr[10] = {0}; /* will init all 10 values to 0 */

however this doesn't work for a variable length array (VLA), so
use either a loop or a call to memset to set the values to 0.
For heap memory, calloc returns storage already set to 0.
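A sketch of zeroing a VLA with memset, assuming a compiler with VLA support (C99, optional from C11). zero_then_sum is a made-up name.

```c
#include <string.h>

/* Zeroes a VLA of n ints, stores 7 in the last slot, and sums it.
   n must be at least 1. */
int zero_then_sum(int n)
{
    int values[n];                      /* VLA: "= {0}" is rejected here */
    int i, sum = 0;

    memset(values, 0, sizeof values);   /* sizeof a VLA is computed at run time */
    values[n - 1] = 7;
    for (i = 0; i < n; i++)
        sum += values[i];
    return sum;
}
```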


Chapter 4


External variables can be used to communicate between fns as an
alternative to args and returns. This can be useful when the
amount of shared data is large. This is usually bad design tho.

extern is used to declare the use of a variable defined in a different
context. An extern declaration doesn't initialise or allocate storage.
Array sizes are optional in it.

header files contain shared declarations and definitions. Up to a
moderate size, it's better to just have 1 that contains all shared
stuff.

static can be used to keep storage private to a file when the
file is used with other source files. It is oft used to hide
external variables and functions within a file from other files
that only need its public functions. It can also be used to create
permanent storage for a local variable within a function.
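A sketch of both uses of static in one file. All names are hypothetical.

```c
/* File-private: other source files cannot link to this variable. */
static int calls_made = 0;

/* File-private helper, hidden from other files in the same way. */
static void record_call(void)
{
    calls_made++;
}

/* last_id keeps its value between calls, but is only visible in here. */
int next_id(void)
{
    static int last_id = 0;

    record_call();
    return ++last_id;
}

/* The only way other files can read the private counter. */
int calls_so_far(void)
{
    return calls_made;
}
```

Two calls to next_id() return 1 and then 2, after which calls_so_far() returns 2.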

register tells the compiler that a machine register may be used
for a variable as it will be used heavily.

block scoped variables can hide variables of the same name in an
outer scope. This is possible but should be avoided to reduce
confusion.

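#defines are used for macros: each use of the name is replaced
with the macro body. Macros can also take args like fns. Wrap the
body and each param in parens so the expansion keeps its meaning
inside larger exprs, like

#define add(x,y) ((x) + (y))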

using a # in the definition will use a macro param as a string, like

#define add(x,y) printf(#x " and " #y " is %d", (x) + (y))

#define can be made multi-line with \

## joins the tokens on either side of it into a single token
(token pasting), dropping any whitespace around it.
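A sketch of why macro bodies are usually parenthesised, plus # and ## in action. The macro names are invented for this example.

```c
#include <string.h>

#define ADD_UNSAFE(x, y) x + y          /* no parens: precedence can leak in */
#define ADD(x, y) ((x) + (y))           /* parens keep the sum together */
#define STRINGIFY(x) #x                 /* # turns the argument into a string */
#define PASTE(a, b) a ## b              /* ## glues two tokens into one */

int unsafe_times_three(void)
{
    return ADD_UNSAFE(1, 2) * 3;        /* expands to 1 + 2 * 3, which is 7 */
}

int safe_times_three(void)
{
    return ADD(1, 2) * 3;               /* expands to ((1) + (2)) * 3, which is 9 */
}

int stringify_matches(void)
{
    return strcmp(STRINGIFY(a + b), "a + b") == 0;
}

int paste_demo(void)
{
    int counter1 = 42;

    return PASTE(counter, 1);           /* becomes the identifier counter1 */
}
```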


#if along with a expr can be used to determine whether a block should
be included when compiling. It is evaluated along with #elif, #else
and #endif.

defined(name) checks macro names; it evaluates to 1 if name is defined.
#if defined(NAME) can be replaced with #ifdef NAME (and #if !defined(NAME)
with #ifndef NAME). This is usually used in header files to prevent
re-including stuff. Or it can be used to include different code for
various environments etc.


Chapter 5

& retrieves the address of a variable, oft to use in assignment,

p = &c;

p "points to" c

& operator to get an address only applies to objects in memory, and
can't be used for expr, constants, or register variables.

* is indirection or dereferencing operator (when applied to pointer).
It accesses the object the pointer points to.

Declaration of pointers, e.g

int *p;

is a mnemonic, that is, it is declared how it will be used, similar
to how a function is declared (rather than defined).

unary operators like * and ++ associate right to left, so incrementing
what p points to is written (*p)++. Without the parens, *p++ would
increment p itself.

pointer arithmetic is moving arithmetically the size of the type
of the data a pointer points to.

arrays and pointers are closely related. A variable or
expression of array type evaluates to the address of element zero, ie

where a is an array of type t and pa is a pointer to t,

pa = &a[0];
pa = a;

are equivalent statements.
Note, a = pa, or a += 1 (despite pa += 1 being legal) is illegal.
But when an array variable like a is passed as an argument to
a fn that takes a pointer of the same type,

int fn(char *x);
char arr[10];
fn(arr);

the local variable x can be incremented within the scope of fn.
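A sketch of a string length fn in that style: the array is passed by name, and the fn walks a local pointer along it. string_length is a made-up name to avoid clashing with the library strlen.

```c
#include <stddef.h>

/* Walks p to the terminating '\0', then subtracts to get the length. */
size_t string_length(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;                     /* legal: p is a pointer variable, not an array */
    return (size_t)(p - s);      /* pointer subtraction counts elements */
}

size_t length_of_array_string(void)
{
    char arr[10] = "hello";

    return string_length(arr);   /* arr is passed as &arr[0] */
}
```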

it is illegal to refer to objects that aren't within the array
bounds, so some indices might fail. A negative index is only valid
when the pointer points into the middle of an array (a subarray) and
the result still lands inside the original array.

zero is never a valid address for data in C, so return value of
0 can be used to signal an abnormal event. Pointers and integers
are not interchangeable, but 0 can be assigned and compared.
Symbolically, 0 is represented by NULL, which is to state clearly
that we are dealing with a pointer to nothing (ie null pointer).

Comparisons can be used with pointers to members of the same array.
e.g p < q is true if p and q point to different elements and p
is an earlier element. Anything outside, except the first element
past the end of an array, can't be used.

+ and - move a pointer through an array in steps of the size of
the array's element type. It isn't possible to add two pointers,
or multiply, divide, shift, mask or use doubles with them.

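void copy_indexed(char *a, char *b)   /* subscript version */
{
        int i = 0;
        while ((a[i] = b[i]) != '\0')
                i++;
}

vs

void copy_pointer(char *a, char *b)   /* pointer version */
{
        while ((*a = *b) != '\0') {
                a++; b++;
        }
}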

pointer arithmetic and incr operators are used idiomatically for
stacks:

*p++ = val;  /* push val onto stack (assign val at pointer, then incr) */
val = *--p;  /* pop top of stack to val (decr pointer, then return its val) */
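A sketch of a tiny int stack built on those two idioms, with bounds checks added. STACK_MAX and the fn names are assumptions for the example.

```c
#define STACK_MAX 4                /* hypothetical capacity */

static int stack[STACK_MAX];
static int *sp = stack;            /* points at the next free slot */

/* Returns 0 on success, -1 if the stack is full. */
int push(int val)
{
    if (sp == stack + STACK_MAX)
        return -1;
    *sp++ = val;                   /* store, then move up */
    return 0;
}

/* Returns 0 on success, -1 if the stack is empty. */
int pop(int *val)
{
    if (sp == stack)
        return -1;
    *val = *--sp;                  /* move down, then fetch */
    return 0;
}

int push_two_pop_one(void)
{
    int top = 0;

    push(10);
    push(20);
    pop(&top);
    return top;                    /* last in, first out */
}
```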


int a[10][20]; /* declare a 2D array: 10 rows of exactly 20 ints each */
int *b[10]; /* declare 10 pointers, each of which can point to an array of any length */


int main(int argc, char *argv[])
             |           ^- array of char strings
             - number of cmd line args

argv[0] -> the invoked program
argv[argc] -> null pointer


void * is a generic pointer type. Any pointer can be cast to void *
and back again without loss of info. It can be used when the type
is not known at the point of declaration.

Declaring a fn pointer is possible:

int (* fn)(void *)

And fn pointers are passed like arrays as args, by name and without
the need for & as they're already addresses by name.

note:

int *fn(void *)

refers to a fn returning a pointer to int

the main thing to remember is parens take precedence over *
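A sketch using qsort from <stdlib.h>, which takes a fn pointer for comparisons. Its comparator takes two const void * args rather than the single void * shown above, and compare_ints plus the demo fns are made up for illustration.

```c
#include <stdlib.h>

/* Comparator in the shape qsort expects: negative, zero or positive. */
int compare_ints(const void *a, const void *b)
{
    int x = *(const int *) a;      /* cast the generic pointers back to int */
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

int smallest_after_sort(void)
{
    int values[] = {3, 1, 2};

    /* The fn is passed by name, no & needed. */
    qsort(values, sizeof values / sizeof values[0], sizeof values[0], compare_ints);
    return values[0];
}

int compare_through_pointer(int x, int y)
{
    int (*cmp)(const void *, const void *) = compare_ints;

    return cmp(&x, &y);            /* call through the pointer like a normal fn */
}
```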


Chapter 6


struct declares a structure:

struct point {
  int x;
  int y;
};

point is a "structure tag". this can be used as a shorthand
for the declaration in braces.

the x and y are members. these won't conflict with other vars.

operator . connects structure name to members

struct point pt = {0, 0};
printf("x %d, y %d", pt.x, pt.y);

structures may not be compared with == (compare their members
instead, e.g. in a fn).
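A sketch of a member-by-member comparison fn for the point struct. point_equal is a hypothetical name.

```c
struct point {
    int x;
    int y;
};

/* Returns 1 if both members match, 0 otherwise. */
int point_equal(struct point a, struct point b)
{
    return a.x == b.x && a.y == b.y;
}

int same_points_match(void)
{
    struct point a = {1, 2};
    struct point b = {1, 2};

    return point_equal(a, b);
}

int different_points_match(void)
{
    struct point a = {1, 2};
    struct point b = {1, 3};

    return point_equal(a, b);
}
```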

with regards to pointers to structs:
struct members operator . is higher than * so deref needs to be

(*pt).x

but it's more common to just use the alt notation:

pt->x

an example of order of precedence:

*p++->str

first, member binding
.   member lookup
->  pointer member lookup
[]  array subscript
()  function call

so, p->str (we now are looking at str, in this case is a pointer)

then * pointer deref

so, *str (we are now looking at whatever str points to)

incr/decr operators ++ --

so, p++ incr p (not str, which would be *p->str++, or what str
points to, which would be (*p->str)++).
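A sketch that checks the three forms side by side. The struct and fn names are assumptions; each fn returns 1 when the expression behaved as its comment says.

```c
struct entry {
    int len;
    char *str;
};

/* *p++->str: fetch what str points to, then advance p. */
int deref_then_advance_p(void)
{
    char a[] = "ab", b[] = "cd";
    struct entry table[2] = { {2, a}, {2, b} };
    struct entry *p = table;
    char c = *p++->str;            /* reads 'a', then p moves to table[1] */

    return c == 'a' && p == &table[1];
}

/* *p->str++: fetch what str points to, then advance str. */
int deref_then_advance_str(void)
{
    char a[] = "ab";
    struct entry e = {2, a};
    struct entry *p = &e;
    char c = *p->str++;            /* reads 'a', then str moves to a + 1 */

    return c == 'a' && p->str == a + 1;
}

/* (*p->str)++: increment the char that str points to. */
int increment_target(void)
{
    char a[] = "ab";
    struct entry e = {2, a};
    struct entry *p = &e;

    (*p->str)++;                   /* 'a' becomes 'b'; p and str stay put */
    return a[0] == 'b' && p->str == a;
}
```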

sizeof is a unary operator used to compute size of any object
at compile time.

sizeof object
sizeof (type name)

returns unsigned integer type size_t.
sizeof can be used with #define statements
array lengths can be found like

(sizeof some_arr / sizeof(some_arr[0]))
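A sketch wrapping that expression in a macro. ARRAY_LEN is a made-up name, and it only works on a real array, not on a pointer parameter, where sizeof would give the size of the pointer.

```c
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

size_t count_primes_listed(void)
{
    int primes[] = {2, 3, 5, 7, 11};

    return ARRAY_LEN(primes);      /* total bytes divided by bytes per element */
}
```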

Always make sure to never generate an illegal pointer or attempt
to access an element outside the array.

structs cannot contain their own kind, but can have pointers to.

malloc is used to create a space in memory for objects. It returns
a pointer to void (void *). The book casts it to the pointer type
it was called for, though the conversion from void * is implicit in
ANSI C.

char *s = "1234";
char *p = (char *) malloc(strlen(s) + 1); /* + 1 for \0 */

malloc returns NULL if no space is available.
free is called to free memory taken by malloc.
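A sketch of the full malloc, check, use, free cycle for a string copy. copy_string is a hypothetical fn, and the cast is left out here since void * converts implicitly in C.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of s, or NULL if malloc fails. The caller frees it. */
char *copy_string(const char *s)
{
    char *p = malloc(strlen(s) + 1);   /* + 1 for \0 */

    if (p != NULL)
        strcpy(p, s);
    return p;
}

/* Returns 1 if the copy matches the original, 0 otherwise. */
int copy_round_trip(void)
{
    char *copy = copy_string("1234");
    int same;

    if (copy == NULL)
        return 0;
    same = strcmp(copy, "1234") == 0;
    free(copy);                        /* every successful malloc gets one free */
    return same;
}
```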

typedef is used for creating new data type names.

typedef char *String;

String is now a synonym for char *:

String p;
int strcmp(String, String);
p = (String) malloc(100);

typedef is similar to #define but is interpreted by the compiler.
typedef is often used to help with portability of machine-dependent
types. If a type needs to change, then only the typedef is altered.

union holds objects of different types and sizes. the compiler
keeps track of size and alignment requirements. They are a way to
manipulate different data types in a single area of storage. This
also helps avoid machine dependency.

a union will be large enough to hold the largest of its types:

union u_some_union {
      int int_val;
      float float_val;
      char *string_val;
} union_variable;

it is up to the programmer to keep track of which type is stored.
members of a union are accessed the same as with structs.
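A common way to keep track of the stored type is to wrap the union in a struct with a tag. This is a sketch with made-up names, not something from the notes.

```c
enum value_kind { KIND_INT, KIND_FLOAT, KIND_STRING };

struct tagged_value {
    enum value_kind kind;          /* records which member is currently valid */
    union {
        int int_val;
        float float_val;
        char *string_val;
    } u;
};

/* Reads int_val only when the tag says an int is stored. */
int as_int_or_default(struct tagged_value v, int fallback)
{
    return v.kind == KIND_INT ? v.u.int_val : fallback;
}

int read_matching_member(void)
{
    struct tagged_value v;

    v.kind = KIND_INT;
    v.u.int_val = 42;
    return as_int_or_default(v, -1);
}

int read_mismatched_member(void)
{
    struct tagged_value v;

    v.kind = KIND_FLOAT;
    v.u.float_val = 1.5f;
    return as_int_or_default(v, -1);
}
```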

bit fields can be used to define flag-like values into small
amounts of storage, where masks defined like

#define WARM   01
#define RAIN   02
#define CLOUD  04

weather |= WARM | RAIN
to say set weather values to true (0000 0011)

instead can use:

struct {
       unsigned int is_warm  : 1;
       unsigned int is_rain  : 1;
       unsigned int is_cloud : 1;
} weather;

weather.is_warm = weather.is_rain = 1;

however it is implementation dependent, so be careful when working
with externally defined data. fields may only be ints, and specify 
unsigned or signed for portability.
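A sketch putting the mask style and the bit-field style next to each other. The struct tag and fn names are assumptions; each fn returns 1 when only the rain flag is left set.

```c
#define WARM   01
#define RAIN   02
#define CLOUD  04

int masks_demo(void)
{
    unsigned weather = 0;

    weather |= WARM | RAIN;        /* turn two flags on */
    weather &= ~WARM;              /* turn one back off */
    return (weather & RAIN) != 0 && (weather & WARM) == 0
        && (weather & CLOUD) == 0;
}

struct weather_flags {
    unsigned int is_warm  : 1;
    unsigned int is_rain  : 1;
    unsigned int is_cloud : 1;
};

int fields_demo(void)
{
    struct weather_flags weather = {0, 0, 0};

    weather.is_warm = weather.is_rain = 1;
    weather.is_warm = 0;
    return weather.is_rain && !weather.is_warm && !weather.is_cloud;
}
```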


Chapter 7 (brief skim notes)


scanf is like printf for input
files use a file pointer, FILE
fopen to open files
fclose to close files
fprintf and fscanf using file pointer
stdin stdout stderr are used like files, send err output to stderr
exit(num) for exiting using a specific error number, can be
called from anywhere
system("command") runs a command through the OS's command interpreter (OS dependent)
there are lots of fns for interacting w/ files


Chapter 8 (brief skim notes)


read() and write() are lower level syscalls for file ops
they take file descriptors not pointers.
open(), creat(), close(), unlink() are more file ops
lseek can move around a file without changes
there are lots of IO syscalls on UNIX analogous to stdlib stuff


Appendix (brief skim notes)


there are many libs for many tasks, math, time, chars

assert(expr) to add diagnostics
if expr is 0, it prints a message to stderr, then calls abort

setjmp/longjmp to move in and out of fn call seq (not sure exactly)
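A rough sketch of the idea: setjmp marks a spot, and a later longjmp from deeper in the call chain jumps straight back to it, skipping the normal returns. The fn names are invented.

```c
#include <setjmp.h>

static jmp_buf recovery_point;

static void fail_deep_inside(void)
{
    longjmp(recovery_point, 1);    /* never returns to its caller */
}

static void middle_layer(void)
{
    fail_deep_inside();
}

/* Returns 1 if control came back through longjmp, 0 otherwise. */
int recovered_via_longjmp(void)
{
    if (setjmp(recovery_point) != 0)
        return 1;                  /* setjmp "returns" again, this time nonzero */
    middle_layer();                /* first pass: setjmp returned 0 */
    return 0;                      /* not reached in this sketch */
}
```

The first call to setjmp returns 0, so the fn goes on to call middle_layer. The longjmp then makes setjmp appear to return a second time with the value 1.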

signal for passing a handler to occur on signal events
raise sends a signal to the program


some additional notes


there's a thing called compound literals, eg.

printf("%ld", (long int){0xFFFFFFFFFFFFF});

gcc docs on compound literals 

It's possible to iterate beyond the end of an array or whatever
a pointer is "meant" to be able to point to, and dereference it. R/W
to this is undefined and bad. Therefore, you take responsibility
for providing whatever context with the bounds of the array. If, for
example, the array is a char array string, the end boundary would
be the length of the string + 1 (because the string initialises with
a \0 at the end). If the pointer provided actually pointed to a char
somewhere in the middle of this string, and we know the fn might want
to reference an earlier char, we might provide a value representing
the offset of the pointer's char from the 0 position.

Common ways to handle bounds are to set a character that indicates
the bound, like \1, use a structure that contains the bounds and the
array pointer, have a define or global value that is used throughout
the program for setting the maximum length of all arrays of a type, or
define a fn that requires the calling context to provide the bounds.
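A sketch of the structure approach: the pointer and its length travel together, and access goes through a fn that checks the index. int_span and span_get are hypothetical names.

```c
#include <stddef.h>

struct int_span {
    int *data;
    size_t len;                    /* number of valid elements in data */
};

/* Stores data[i] in *out and returns 0, or returns -1 if i is out of bounds. */
int span_get(struct int_span s, size_t i, int *out)
{
    if (i >= s.len)
        return -1;
    *out = s.data[i];
    return 0;
}

int in_bounds_read(void)
{
    int values[3] = {10, 20, 30};
    struct int_span s = {values, 3};
    int out = 0;

    return span_get(s, 2, &out) == 0 ? out : -1;
}

int out_of_bounds_read(void)
{
    int values[3] = {10, 20, 30};
    struct int_span s = {values, 3};
    int out = 0;

    return span_get(s, 3, &out);   /* index 3 is one past the end */
}
```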



inline keyword
variadic functions using "..." and req. stdarg.h
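A sketch of a variadic fn using the <stdarg.h> macros. sum_ints is a made-up name, and passing the count first is just one convention for telling the fn how many args follow.

```c
#include <stdarg.h>

/* Adds up count int arguments that follow the count itself. */
int sum_ints(int count, ...)
{
    va_list args;
    int i, total = 0;

    va_start(args, count);         /* start after the last named parameter */
    for (i = 0; i < count; i++)
        total += va_arg(args, int);
    va_end(args);
    return total;
}
```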