Basics, Macros, and Variables

C - Low level, structured, procedural, and statically typed programming language. Some more info here.

Developed at Bell Labs in 1970s by Dennis Ritchie and Ken Thompson.

Inspirations: BCPL -> B -> C

Standards: K&R C (unofficial), ANSI C (aka C89), C99, C11, C18, C2x (to be reviewed in Dec 2021).


Basics

Standard vs Vendors

Undefined - Dangerous, often mentioned in standard, causes surprises. Ex - Divide by 0, segment violations.

Unspecified - Not specified by standard. Ex - Order of function parameter evaluation.

Implementation defined - Vendors are free to implement and document. Ex - Size of data types.

Simple Program structure

#include<stdio.h>

int main(int argc, char*[] argv){	// char** argv

printf("Hello World!");

return 0;
}

Parameters of main()

main()	// can be called with any number of parameters
main(void)	// no parameters can be present during call

Stages

										Static Libraries
											|
Preprocessor -> Compiler -> Assembler -> Linker -> Loader
	            .i 			.s 		   .o/.obj    .exe

Error Signals

SIGSEGV - Segment Violation (invalid access to valid memory, access to memory we don't have access to)            

SIGBUS - (access to an invalid address)

SIGABRT - an abort() call throws this

SIGFPE - Floating-point exceptions (divide by zero error)  

SIGILL - Illegal statement (CPU tried to execute an instruction it didn't understand)

SIGSYS - invalid argument passed to a system call

SIGTRAP - an exception triggered explicitly by debugger 

Tokens

There are five types of tokens: keywords, identifiers, constants, operators, and separators.

Keywords

There are a total of 44 keywords in C (C89 – 32, C99 – 5, C11 – 7).

Identifiers

Name of variables, functions, etc…

  • First letter must be an alphabet, underscore(_) counts as an alphabet.
  • must contain only alphabets, underscorem and numbers, no special characters.
  • no whitespaces
  • can’t be a keyword
  • lenght limit is 31 but not enforced in anyway other than as a good formatting rule (https://stackoverflow.com/questions/2352209/max-identifier-length)
Constants or Literals
Integer - int, octal (0), hex (0x) 
Real - float(f or F), double(d or D)
Character - 'a'
String - "abhi"
Escape Sequences - '\0', '\"'. '\xFF'
Operators
Arithmetic - +, -, *, /, %
Relational - <, >, <=, >=, !=, ==
Logical - &&, ||, !
Bitwise - &, |, ~, ^
Assignment - =, compound assignments
Conditional - ?:
Unary - ++, --
Others - comma (,), dereferencing (*), addressof (&)
Separators
comma (,) 
{}, (), []

Line Splicing

// this comment extends to next line \
printf("hello");
printf("world");

Also works with macros:

#define REPLACE(a, b) if(a < b) \
						a = b;

Macros and Preprocessor

They can appear anywhere in program.

  • Comment Removal
  • File inclusion
  • Macro expansion
  • Conditional inclusion/compilation
  • Debugging
Preprocessor directive : #
#include<stdio.h>	// searches in standard list of dir 
#include "myheader" // searches in the current folder first, then standard list of dir

#define PI 3.1415  // macro as object

#define MUL(a, b) a * b // macro as function

#undef
#ifdef
#ifndef
#endif

#if
#endif
#elif
#else

#line n "filename"	//filename is optional

#warning here is a warning message to show	 //shows a compile-time warning message

#error here is your error description //throws compile time error

#pragma once 	//not standard
		warn
		startup
		exit

Standard Macros

Mostly used for debugging.

__LINE__
__TIME__
__DATE__
__FILE__
__FUNC__

Stringizing

#define string_this(a) #a

printf("%s", string_this(Hello));

Token Pasting

#define combine_these(a, b) a##b

char* name = "Logan";

printf("%s", combine_these(na, me));

Data type in Macros

They have no control over data types.

Pitfalls and Tips

1. include guard 2. No nesting

#define foo (foo + 10)
foo
// foo will be replaced by (foo + 10), not further

3. No macro definition inside a macro definition

#define DEF #define A 999

DEF // won't work

4. Redefining macros

#define A 99
printf("%s", A);
#undef  // or #define A 88
printf("%s", A);

5. Macro default initialization By default, macros are initialized to Zero (0) is not defined previously.

#if X == 3
printf("Yes");
#else 
printf("No");
#endif

6. Macros don’t get pasted in other macro definitions

#define A define
#A B 99		// compile-time error

7. Not closing macro ifs is a compilation-error.

8. An innovative way to block commnent:

#if 0
printf("%s", "A");
printf("%s", "B");
printf("%s", "C");
#endif

9. Macro arguments are not evaluated before macro expansion

#include <stdio.h>
#define MULTIPLY(a, b) a*b
int main()
{
    // The macro is expanded as 2 + 3 * 3 + 5, not as 5*8
    printf("%d", MULTIPLY(2+3, 3+5));
    return 0;
}
// Output: 16

10. An Interesting Case

#include<stdio.h>
#define A -B
#define B -C
#define C 5

int main()
{
  printf("The value of A is %d\n", A); 
  return 0;
} 

Preprocessed File -

int main()
{
  printf("The value of A is %d\n", - -5);
  return 0;
}

Variadic Macros

 we want: eprintf ("%s:%d: ", input_file, lineno)
       fprintf(stderr, "%s:%d: ", input_file, lineno)

#define eprintf(format, ...) fprintf (stderr, format, __VA_ARGS__)  

Command Line Arguments

argc will atleast be 1 since argv always has argv[0] as program filename itself.

// Input: a.exe my code
#include<stdio.h>

int main(int argc, char* argv[])
{
	printf("%s ", argv[0]);
	printf("%s ", argv[1]);
	printf("%s ", argv[2]);
}

// Output: a.exe my code

Variables and Scoping

Declaration vs Definition

Declaration: declaring variable for the enclosing scope.

Definition: allocating memory to it.

int a = 9;		// both
int a;		// both, garbage value assigned

extern int a ;	// declaration only

Static Scoping

In C, variables are always statically (or lexically) scoped i.e., binding of a variable can be determined by program text and is independent of the run-time function call stack.

For example, output for the below program is 0, i.e., the value returned by f() is not dependent on who is calling it. f() always returns the value of global variable x.

# include <stdio.h>
  
int x = 0;

int f(){
   return x;
}

int g(){
   int x = 1;
   return f();
}

int main(){
  printf("%d", g());
  printf("\n");
  getchar();
}

Scopes in C

// Structural nature of C
#include<stdio.h>

int main()
{
	int a = 9;
	{
		int b = 8;
	}

	printf("%d", a);
	printf("%d", b);

}

// Compile-time error: 'b' undeclared

There are basically four scope rules in C:

Block: if-else, loops, etc…

if(1){
	// block scope
}

File: Global variables

// file scope
int a = 5;

int main(){

}

Function Prototype: only till a funtion prototype/declaration

int sum(int a, int b);
// or
int sum(int, int);

Function: function definition body scope

int sum(int a, int b)
{
	return a + b;
}

Storage Classes

Five storage classes in C are: auto, extern, static, register, typedef. We can’t have multiple storage class for a single variable.

auto: any variable inside some block, all variables are auto by default.

extern: only useful when multiple souce files are present and need to share a variable defined once. It signifies that the variable is defined elsewhere and not within the same file where it is being used.

//DEMO 1: Global variable within the same file
#include<stdio.h>

int main()
{
	extern a;		//if we comment this, we get error
	printf("%d", a);
}

int a = 99;

//Output: 99
//DEMO 2: Global variable in another file

//file1.c
#include<stdio.h>

int main()
{
	extern int a;		//if type mismatch happens, 0 is fetched, also commenting leads to error
	printf("%d", a);
}

//file2.c
int a = 88;

// compile as => gcc file1.c file2.c
// Output: 88 
//DEMO 3: Extern works only on globally scoped variables
#include<stdio.h>

int main()
{
	int a = 9;
	{
	extern int a;		
	printf("%d", a);
	}
}

//Compile-time error, undefined reference to 'a'
extern int a;
extern int a;
extern int a;	//can do this mutiple times as its declaration and not definition

extern int a = 9;	//can't do this, compile-time error, extern can't have initializer

static: serves two purposes:

  1. they don’t get destroyed even after the flow goes out of scope (block scope),

  2. static variables and functions can’t be accessed outside the file in which they’re defined (static linkage in file scope)

  • they are initialized inside data area of memory, hence their default value is always 0.

  • static variables can only be initialized using constant literals, whose value is known during compile-time.

  • Static variables should not be declared inside structure. The reason is C compiler requires the entire structure elements to be placed together (i.e.) memory allocation for structure members should be contiguous. It is possible to declare structure inside the function (stack segment) or allocate memory dynamically(heap segment) or it can be even global (BSS or data segment). Whatever might be the case, all structure members should reside in the same memory segment because the value for the structure element is fetched by counting the offset of the element from the beginning address of the structure. Separating out one member alone to data segment defeats the purpose of static variable and it is possible to have an entire structure as static.

register: same as auto but stored in fast CPU register, address can’t be retrieved using pointers. Avoid making an array with this as it can’t be decomposed into pointer for access or passing to functions.

typedef: assign alternative names to existing types:

typedef long long ll;

typedef unsigned long long ull;

typedef struct* my_structure s;

typedef static int points;	//compile-time error, multiple storage classes are not allowed

The only storage-class specifier that shall occur in a parameter declaration is register.

Type Qualifiers

const: whatever value is assigned to it during declaration can’t be changed.

const a;	//assumed to be int by default

int const a = 9;	//same as storage class, position doesn't matter
const int a = 9;

volatile: tells compiler not to optimize access to these variables since they will be accessed and changed a lot, often from outside the program in ways which compile may not be aware of.

volatile int a = 9;
int volatile a = 9;

const volatile int a = 9;	//we can do this, although it makes no sense

Linkage

Describes where the linker finds variables and functions to link.

No Linkage: Block scope (register, auto, static within block)

Internal: File Scope (static)

External: Multiple files (extern)

While talking about linkage we’re not talking about file inclusions using #include but actual linking of two sources by $gcc file1.c file2.c.

Global Variables

  • Both are stored in data section of memory and hence initialized to 0 or NULL by default.
  • Both must be a compile-time constant literal or contant expression.
  • A global static variable can only be accessed in the same file. (Internal Linkage)
//file1.c
#include<stdio.h>

int main()
{
	extern int a;
	printf("%d", a);
}

//file2.c
static int a = 88;

//Output: compile-time error, undefined reference to 'a'
  • A global variable can be declared multiple times if its not explicitly initialized.
#include<stdio.h>

int a;
int a;
int a = 9;
static int b;
static int b = 9;
int main()
{
	printf("%d", a);
}

//Ouput: 8

Linkers resolving multiple global variables and functions

When multiple files are compiled and linked, there might be clashes in naming of global variables and functions. In that case, linker needs to decide what to keep and what not to.

Initialized global variables and function: Strong Symbols

Uninitialized global varaibles: Weak Symbols

Rule#1: Two strong symbols with the same name are not allowed. Rule#2: If one strong symbol with the same name as one weak is present, choose the strong one. Rule#3: If multiple weak symbols with same name are present, choose any (dangerous, undefined behaviour, difficult to debug).

Reading Complicated Declarations in C

Link: https://www.geeksforgeeks.org/complicated-declarations-in-c/