The C Programming Language by Brian Kernighan and Dennis Ritchie
Worldcat entry
An HTML version with author notes and supplementary material is made by Dr Chuck: https://www.cc4e.com/index.php

Chapter 1

C is a small language; most of what is used comes from libraries, many of which are defined by the ANSI standard. You get them with #include, like:

    #include <stdio.h>

printf is a commonly used function, and an example of this.

Integer division truncates, so when dividing keep in mind what will truncate to 0 before multiplying with other numbers.

Symbolic constants help with readability when constant values are used.

getchar()   get the next character of input. The result must be stored in an int, because EOF has to be a value distinct from every possible char.
putchar(c)  print the integer c as a character

single quotes represent the integer value of the character (incl. escape chars)

switch might be preferred over if else when the condition is whether an int or char expr matches one of a set of constants.

properly designed functions should make it easy to know what is done, and ignore how it is done. functions may even be written and called only once, to make the code easier for the reader to understand.

function definition form:

    return-type function-name(parameter declarations)
    {
        declarations
        statements
    }

functions can be declared using a function prototype, which just states the fn without the body.

c is call by value: arguments are passed as values, so the function works on temporary copies. The fn can do whatever it likes to its copy without affecting the calling fn's value. To change the value in both contexts, pass a pointer, which is the address of the memory holding the value. Arrays behave differently: the address of the beginning of the array is passed, so changes to the elements are seen in both contexts.

\0 is the null character; its value is 0 and it marks the end of a string, which is a char array.
extern variables are defined outside of all fns, then declared with extern within the fns that want to access them,

    int some_value;

    int main()
    {
        extern int some_value;
    }

although when the definition comes before the fn in the same file, extern isn't necessary, just convention.

definition - where variable is created/assigned storage
declaration - where variable is stated but no storage alloc

Chapter 2

constants use special notation:

    L,    12345679L   - long
    U,    1234567890U - unsigned (also UL)
    N.M,  123.456     - floating point (default type double)
    Ne-M, 1e-2        - floating point (default type double)
    F,    123.0F      - float
    L,    123.0L      - long double
    0N,   012         - octal (also U, L, and UL)
    0XN,  0X1F        - hex (also U, L, and UL)

escape sequences

    \a    alert
    \b    backspace
    \f    formfeed
    \n    newline
    \r    carriage return
    \t    horizontal tab
    \v    vertical tab
    \\    backslash
    \?    question mark
    \'    single quote
    \"    double quote
    \NNN  octal
    \xNNN hex
    \0    null (has numeric value of 0)

string constant

    ""                                          empty
    "any number of chars"
    "these two strings " "will be concatenated"

strings terminate with \0, therefore 'x' (an integer) is not the same as "x" (a char array holding x and \0)

enums (often an alternative to #define)

    enum bool {NO, YES};
    enum months {JAN = 1, FEB, MAR};
    enum chars {a = 'a', b = 'b'};

const qualifier, value won't be changed after the declaration,

    const char msg[] = "warning: ";
    int strlen(const char[]);

% operator can't be used with float or double

<ctype.h> provides tests and conversions independent of character set, tolower, isdigit,...
specify signed or unsigned if non-character data is to be stored in char

type conversion in arithmetic: the "lower" operand is converted to the "higher" type,

    long double > double > float > int > char/short
    long > not long

<math.h> generally uses doubles

unsigned operands can make conversions complicated (assuming long is wider than int):

    -1L < 1U   (unsigned int to signed long)
    -1L > 1UL  (signed long to unsigned long)

conversions in assignments work by converting the value on the right side to the type on the left

a cast is used to coerce a type conversion

    (type name) expression

useful when calling a fn,

    int n = 5;
    sqrt((double) n);

however, automatic coercion of arguments occurs if args are declared by a function prototype, like

    double sqrt(double)
    sqrt(2) // the 2 is coerced to 2.0 without a cast.

with gcc, the math library needs to be explicitly linked with -lm,

    gcc -o prog proc.c -lm

++var, increment then use the value
var++, use the value then increment
either way, these operators aren't nice.

    &  - AND          110 & 100  = 100
    |  - OR inc.      110 | 100  = 110
    ^  - OR exc.      110 ^ 100  = 010
    << - left shift   0001 << 2  = 0100
    >> - right shift  0100 >> 2  = 0001
         nb. for a right shift, signed values will fill with sign bits or 0 bits depending on the machine.
    ~  - one's complement  ~010  = 101

More info on bit fields

Be mindful of the order in which exprs are evaluated, and of when a function mutates values before other functions use them, esp when they are used in an expr together.

Chapter 3

Statements terminate with semicolons.
When doing conditionals use braces.

* At this point I went off the book for a short time to learn about how one might achieve some of the basic functions of a C program without using the standard library, libc. It was following an article I read that discussed the kinds of complexity that are added to programs when using stdlib. I was curious if it was as straightforward as the article laid out for doing basic stuff like output. I found someone had written a short tutorial on implementing this, and it was well explained.
However, I did become curious as to how the author's executables were so much (1000s of bytes) smaller than mine, although I had followed the same steps. I guess some systems and compilers will just add more for whatever reason. I discovered a bunch of interesting tools to really see into what is going on, that now I think I'll need to learn more about:

    gdb
    objdump
    hexdump
    strip
    GNU Assembler
    All the gcc commands

... and basically just reading through various resources about understanding everything the system can do. In light of this, I think it's somewhat beyond the scope of what I'm trying to learn from this exercise, so I plan to follow the book without anything like I've described. *

rand needs a seed, otherwise the sequence of values will be the same on each run.

    srand(time(0));

always add a break to every switch case (incl. default) that isn't intended to fall thru.

generally, goto is not recommended. But it is often used to jump to clean up code, for error handling, or to break out of nested loops.

Local (automatic) arrays are not initialized unless you explicitly initialize them. Otherwise they contain whatever data already occupies the memory.

    int arr[10] = {0}; /* will init all 10 values to 0 */

however this doesn't work for a variable length array (VLA), so use either a loop to initialise the values or a call to memset; for heap storage, calloc returns memory already set to 0.

Chapter 4

External variables can be used to communicate between fns as an alternative to args and returns. This can be useful when the amount of shared data is large. This is usually bad design tho.

extern is used to declare the use of a variable defined in a different context. extern doesn't initialise. Array sizes are optional in an extern declaration.

header files contain shared declarations and definitions. For programs up to a moderate size, it's better to just have 1 that contains all shared stuff.

static can be used to keep storage private to a file when the file is used with other source files.
It is often used to hide external variables, and helper functions, within a file from the other files that use that file's functions. It can also be used to create permanent storage for a local variable within a function.

register tells the compiler that a machine register may be used for a variable as it will be used heavily.

block scoped variables that collide with (hide) the names of variables in the external scope are possible but should be avoided to reduce confusion.

#defines are used for macros, and they replace instances of the name with the contents, incl fn-like macros with params. Parenthesise the params and the whole body, otherwise operator precedence at the point of use can change the meaning:

    #define add(x,y) ((x) + (y))

using a # in the definition will turn a macro param into a string, like

    #define add(x,y) printf(#x " and " #y " is %d", (x) + (y))

#define can be made multi-line with \
## concatenates the tokens on either side of it (dropping the whitespace around it) to create a new token.

#if along with an expr can be used to determine whether a block should be included when compiling. It is evaluated along with #elif, #else and #endif.
defined(name) is used to check macro names, returns 1 if def.
#if defined(NAME) can be replaced with #ifdef NAME (and #if !defined(NAME) with #ifndef NAME)
This is usually used in header files to prevent re-including stuff. Or it can be used to include different things for various environments etc.

Chapter 5

& retrieves the address of a variable, often to use in assignment,

    p = &c;

p "points to" c

the & operator only applies to objects in memory, and can't be used on expressions, constants, or register variables.

* is the indirection or dereferencing operator (when applied to a pointer). It accesses the object the pointer points to.

Declaration of pointers, e.g

    int *p;

is a mnemonic, that is, it is declared how it will be used, similar to how a function is declared (rather than defined).

unary operators like * and ++ associate right to left, so incrementing what p points to is written (*p)++; without the parens, *p++ would increment p itself.

pointer arithmetic moves in units of the size of the type the pointer points to.

arrays and pointers are closely related.
A variable or expression of array type evaluates to the address of element zero, ie where a is an array of type t and pa is a pointer to t,

    pa = &a[0];
    pa = a;

are equivalent statements. Note, a = pa, or a += 1 (despite pa += 1 being legal) is illegal. But when an array variable like a is passed as an argument to a fn that takes a pointer of the same type,

    int fn(char *x);
    char arr[10];
    fn(arr);

the local variable x can be incremented within the scope of fn.

it is illegal to refer to objects that aren't within the array bounds, so take care with indices, esp negative ones when dealing with a pointer into a subarray.

zero is never a valid address for data in C, so a return value of 0 can be used to signal an abnormal event. Pointers and integers are not interchangeable, but 0 can be assigned and compared. Symbolically, 0 is represented by NULL, which is to state clearly that we are dealing with a pointer to nothing (ie null pointer).

Comparisons can be used with pointers to members of the same array. e.g p < q is true if p and q point to different elements and p is an earlier element. Anything outside, except the first element past the end of an array, can't be used.

+ and - operators work by moving through pointers in an array by the size of the type of the array. It isn't possible to add two pointers, or multiply, divide, shift, mask or use doubles with them.

    (char *a, char *b)
    int i = 0;
    a[i] = b[i];
    i++;

vs

    (char *a, char *b)
    *a = *b;
    a++; b++;

pointer arithmetic and incr operators are used idiomatically for stacks:

    *p++ = val;  /* push val onto stack (assign val through pointer, then incr) */
    val = *--p;  /* pop top of stack to val (decr pointer, then return its val) */

    int a[10][20]; /* declares a subscriptable 2D int array of fixed size */
    int *b[10];    /* declares 10 pointers that can point to arrays of varying size */

    int main(int argc, char *argv[])
                 |           ^- array of char strings
                 +- number of cmd line args

    argv[0]    -> the name the program was invoked with
    argv[argc] -> null pointer

void * is a generic pointer type.
Any pointer can be cast to void * and back again without loss of info. It can be used when the type is not known at the point of declaration.

Declaring a fn pointer is possible:

    int (* fn)(void *)

And fn pointers are passed like arrays as args, by name and without the need for & as they're already addresses by name.

note:

    int *fn(void *)

refers to a fn returning a pointer to int

the main thing to remember is parens take precedence over *

Chapter 6

struct declares a structure:

    struct point {
        int x;
        int y;
    };

point is a "structure tag". this can be used as a shorthand for the declaration in braces.
the x and y are members. these won't conflict with other vars.

operator . connects structure name to members

    struct point pt = {0, 0};
    printf("x %d, y %d", pt.x, pt.y);

structures may not be compared with == (compare their members instead, e.g. in a fn).

with regards to pointers to structs:

the struct member operator . binds tighter than *, so deref needs to be

    (*pt).x

but it's more common to just use the alt notation:

    pt->x

an example of order of precedence:

    *p++->str

first, the postfix operators, which bind tightest and group left to right:

    .      member lookup
    ->     pointer member lookup
    []     array subscript
    ()     function call
    ++ --  postfix incr/decr

so, p++->str (we are now looking at str, which in this case is a pointer, reached through the current p; p itself is incremented afterwards)

then, the unary operators:

    *      pointer deref

so, *str (we are now looking at whatever str points to)

the ++ here increments p (not str, which would be *p->str++, or what str points to, which would be (*p->str)++).

sizeof is a unary operator used to compute the size of any object at compile time.

    sizeof object
    sizeof (type name)

returns unsigned integer type size_t.
sizeof can't be used in an #if line, but can be used in the body of a #define.

array lengths can be found like

    (sizeof some_arr / sizeof(some_arr[0]))

(this only works on an actual array, not on a pointer or an array parameter.)

Always make sure to never generate an illegal pointer or attempt to access an element outside the array.

a struct cannot contain an instance of itself, but can contain a pointer to its own type.

malloc is used to create a space in memory for objects.
It returns a pointer to void (void *), so K&R casts it to the pointer type it was called for (in ANSI C the conversion from void * is implicit, so the cast is optional).

    char *s = "1234";
    char *p = (char *) malloc(strlen(s) + 1); /* + 1 for \0 */

malloc returns NULL if no space is available.
free is called to release memory obtained from malloc.

typedef is used for creating new data type names.

    typedef char *String;

String is now a synonym for char *:

    String p;
    int strcmp(String, String);
    p = (String) malloc(100);

typedef is similar to #define but is interpreted by the compiler.
typedef is often used to help with portability of machine-dependent types. If a type needs to change, then only the typedef is altered.

union holds objects of different types and sizes. the compiler keeps track of size and alignment requirements. They are a way to manipulate different data types in a single area of storage. This also helps avoid machine dependency.

a union will be large enough to hold the largest of its types:

    union u_some_union {
        int int_val;
        float float_val;
        char *string_val;
    } union_variable;

it is up to the programmer to keep track of which type is currently stored.
members of a union are accessed the same way as with structs.

bit fields can be used to pack flag-like values into small amounts of storage. where masks are defined like

    #define WARM  01
    #define RAIN  02
    #define CLOUD 04

    weather |= WARM | RAIN

to say set those weather values to true (0000 0011), bit fields can be used instead:

    struct {
        unsigned int is_warm : 1;
        unsigned int is_rain : 1;
        unsigned int is_cloud : 1;
    } weather;

    weather.is_warm = weather.is_rain = 1;

however the layout is implementation dependent, so be careful when working with externally defined data.
fields may only be declared as ints; specify unsigned or signed explicitly for portability.
Chapter 7 (brief skim notes)

scanf is like printf for input
files use a file pointer, FILE *
fopen to open files
fclose to close files
fprintf and fscanf work like printf and scanf, using a file pointer
stdin, stdout and stderr are used like files; send err output to stderr
exit(num) for exiting with a specific status number, can be called from anywhere
system("command") runs a command through the OS command processor (OS dependent)
there are lots of fns for interacting w/ files

Chapter 8 (brief skim notes)

read() and write() are lower level syscalls for file ops
they take file descriptors, not file pointers.
open(), creat(), close(), unlink() are more file ops
lseek can move around a file without reading or writing
there are lots of IO syscalls on UNIX analogous to stdlib stuff

Appendix (brief skim notes)

there are many libs for many tasks: math, time, chars
assert(expr) to add diagnostics; if expr is 0, it prints a message to stderr, then calls abort
setjmp/longjmp to move in and out of fn call seq (not sure exactly)
signal for registering a handler to run on signal events
raise sends a signal to the program

Some additional notes

there's a thing called compound literals, eg.

    printf("%ld", (long int){0xFFFFFFFFFFFFF});

gcc docs on compound literals

It's possible to iterate beyond the end of an array or whatever a pointer is "meant" to be able to point to, and dereference it. R/W to this is undefined and bad. Therefore, you take responsibility for providing whatever context with the bounds of the array. If, for example, the array is a char array string, the end boundary would be the length of the string + 1 (because the string is stored with a \0 at the end). If the pointer provided actually pointed to a char somewhere in the middle of this string, and we know the fn might want to reference an earlier char, we might provide a value representing the offset of the pointer's char from the 0 position.
Common ways to handle bounds are to:

    set a sentinel character that indicates the bound, like \1,
    use a structure that contains the bounds and the array pointer,
    have a define or global value that is used throughout the program for setting the maximum length of all arrays of a type,
    or define a fn that requires the calling context to provide the bounds.

inline keyword

variadic functions using "..." and req. stdarg.h