Awk

awk (named after its authors: Aho, Weinberger, and Kernighan)

  • AWK Operations:

    • Scans a file line by line
    • Splits each input line into fields
    • Compares input line/fields to pattern
    • Performs action(s) on matched lines
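  • Syntax: $ awk '/regex/{action}' filename (quote the whole program). $ awk /regex/'{action}' filename also works because the shell joins the two words, but only while the regex contains no characters the shell would interpret (spaces, *, ?, [, |, etc.)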
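A small sketch of the pattern/action cycle described above: awk reads each line, splits it into fields, tests the pattern, and runs the action on matches. The sample data is hypothetical, and the script assumes a POSIX sh with any standard awk (gawk, mawk, BWK awk).

```shell
#!/bin/sh
# Hypothetical sample input: name and role on each line
data='alice dev
bob ops
carol dev'

# Pattern + action: print field 1 of every line that matches /dev/
names=$(printf '%s\n' "$data" | awk '/dev/ {print $1}')
# Pattern only: the default action prints the whole matching line
line=$(printf '%s\n' "$data" | awk '/ops/')
echo "$names"   # alice, then carol
echo "$line"    # bob ops
```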

  • Default action: when a pattern has no action, awk prints the matching line; $ awk '{print}' file.txt prints every line (like cat)

  • Field separator: -F "sep", where sep is the separator. $ awk -F "," '{print $2, $3}' file.txt prints only fields 2 and 3 of each line (in this case the fields must be separated by commas)
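A minimal sketch of -F on hypothetical comma-separated data. Note that the comma inside print does not reproduce the input comma: it inserts OFS, which is a single space by default.

```shell
#!/bin/sh
# Hypothetical comma-separated input: name, age, team
data='alice,42,dev
bob,37,ops'

# Split on commas; "print $2, $3" joins the two fields with OFS (a space)
out=$(printf '%s\n' "$data" | awk -F ',' '{print $2, $3}')
echo "$out"   # 42 dev, then 37 ops
```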

  • Read the program (patterns and actions) from a file: -f progfile
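A sketch of -f, assuming mktemp is available for a scratch file; the program file name and its contents are hypothetical. The file holds exactly what would otherwise go between the single quotes, and no shell quoting is needed inside it.

```shell
#!/bin/sh
# Write a hypothetical awk program to a temporary file
prog=$(mktemp)
cat > "$prog" <<'EOF'
# Prefix each line with its line number
{ print NR ": " $0 }
EOF

out=$(printf 'first\nsecond\n' | awk -f "$prog")
rm -f "$prog"
echo "$out"   # 1: first, then 2: second
```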

  • Regex matching: $ awk '/regex/' filename.txt (or, unquoted: $ awk /regex/ filename.txt)

    • Regex in a field ($ awk '$3 ~ /programmer/' emp_data)
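    if($2 == "string")  # == compares the whole field: matches the entire date or nothing
    if($2 ~ "regex")    # ~ matches anywhere in the field, e.g. just the year in "25/12/2019"
    alternatively, if($2 ~ /regex/)
    # $2 !~ "regex" negates the match. Be careful with single quotes: the program is
    # itself single-quoted in the shell, so a ' inside it ends the program early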
    
    • gawk only: set IGNORECASE=1 (e.g. BEGIN{IGNORECASE=1}) to make regex matching with ~ and string comparisons such as == case-insensitive throughout the script; other awks treat IGNORECASE as an ordinary variable with no effect
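A sketch contrasting ==, ~ and !~ on a date field, using hypothetical data. The last command swaps in a different technique from IGNORECASE: it lowercases the field with the POSIX tolower() function before matching, which also works in awks without IGNORECASE.

```shell
#!/bin/sh
# Hypothetical input: event name, date (day/month/year), mixed-case word
data='xmas 25/12/2019 Party
newyear 01/01/2020 party'

# == compares the whole field: only the exact date string matches
exact=$(printf '%s\n' "$data" | awk '{ if ($2 == "25/12/2019") print $1 }')
# ~ succeeds if the regex matches anywhere in the field (here, just the year)
year=$(printf '%s\n' "$data" | awk '$2 ~ /2019/ { print $1 }')
# !~ selects lines whose field does NOT match
other=$(printf '%s\n' "$data" | awk '$2 !~ /2019/ { print $1 }')
# Portable case-insensitive match via tolower() instead of IGNORECASE
both=$(printf '%s\n' "$data" | awk 'tolower($3) ~ /party/ { n++ } END { print n }')
echo "$exact $year $other $both"   # xmas xmas newyear 2
```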
  • BEGIN and END blocks:

    $ awk 'BEGIN{print "hi"} {print} END{print "bye"}' file.txt
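A sketch of the run order: BEGIN fires before the first line is read, the main block once per line, and END after the last line. NR still holds the final line count inside END.

```shell
#!/bin/sh
out=$(printf 'a\nb\n' | awk 'BEGIN{print "hi"} {print} END{print "bye", NR, "lines"}')
echo "$out"   # hi, a, b, bye 2 lines
```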
    
  • Line-by-line processing: the main block runs once per line, so the value in field 2 ($2) of each line is added to the running total s

    $ cat file.txt
    apple 1
    ball 2
    cat 3
    dog 4
      
    $ awk 'BEGIN{s=0} {s=s+$2;print s}' file.txt
    1
    3
    6
    10
    
  • Built-In variables in awk:

    • $1, $2, $3, and so on ($0 is the entire line)
    • FS, OFS (input and output field separators)
    # set FS="CHAR" in the BEGIN block (or use -F)
    # $0 is rebuilt with OFS only after a field is assigned (even $1 = $1); otherwise $0 keeps its original separators even if OFS is set
    # commas in print insert OFS between fields (print $1 $2 would concatenate them instead)
    print $1, $2, $3, $4
    
    • RS, ORS (input and output record separators; by default a record is a line)
    • NR, NF, FNR (number of records read so far across all input files, number of fields in the current record, and record number within the current file)
    • FILENAME (current file’s name)
    • ARGC, ARGV (no. of cmd-line args, array that stores them (0 to ARGC-1))
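    # On comma-separated data, $ awk '{print $1}' file.txt prints whole lines,
    # because the default FS splits on runs of blanks (spaces and tabs), not commas.
    # Fixed by specifying the separator as a comma (,):
    $ awk 'BEGIN{FS=","} {print $1}' file.txt
    # The double quotes are required: the value is an awk string literal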
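A sketch of the record counters and the OFS rule, using two hypothetical files created in a temporary directory (assumes mktemp -d is available). NR keeps counting across files while FNR restarts, and setting OFS only shows up in $0 once a field has been assigned.

```shell
#!/bin/sh
# Two small hypothetical input files
dir=$(mktemp -d)
printf 'a b c\nd e\n' > "$dir/one.txt"
printf 'f\n' > "$dir/two.txt"

# NR counts across all files, FNR restarts per file, NF is the field count
counts=$(cd "$dir" && awk '{print FILENAME, NR, FNR, NF}' one.txt two.txt)

# Setting OFS alone leaves $0 unchanged; assigning a field rebuilds $0 with OFS
plain=$(echo 'x y z' | awk 'BEGIN{OFS="-"} {print $0}')
rebuilt=$(echo 'x y z' | awk 'BEGIN{OFS="-"} {$1 = $1; print $0}')
rm -rf "$dir"
echo "$counts"
echo "$plain | $rebuilt"   # x y z | x-y-z
```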
    
  • Arithmetic, pre/post increment and decrement, assignment, relational, logical, and ternary operators work as in C; ^ is exponentiation (gawk also accepts **), and ^= (or **=) is its compound assignment
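A short sketch of the C-style operators plus ^, kept to POSIX ^ so it runs in any awk. The post-increment is done in its own statement, since the order in which print evaluates its arguments should not be relied on.

```shell
#!/bin/sh
out=$(awk 'BEGIN {
    x = 2 ^ 10          # exponentiation: 1024
    x -= 24             # compound assignment: 1000
    y = x++             # post-increment: y gets 1000, x becomes 1001
    print y, x, 7 % 3, (y >= 1000 ? "big" : "small")
}')
echo "$out"   # 1000 1001 1 big
```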

  • String concatenation: there is no operator; values placed side by side (separated by a space) are joined

    $ awk 'BEGIN{str1 = "Abhi"; str2 = "Arya"; str3 = str1 str2; print str3}'
    AbhiArya
    
  • Arrays (associative: keys are strings)

    • Create: arrayName[key] = value
    • Access: arrayName[key]
    • Delete: delete arrayName[key]
    • Iterate over keys (order is unspecified):
    for (i in a)
        print a[i]
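A word-count sketch exercising create, access, delete, the `in` membership test, and for-in iteration, on hypothetical input. Because for-in order is unspecified, the script only counts keys rather than printing them in order.

```shell
#!/bin/sh
out=$(printf 'apple ball\napple cat\n' | awk '
{ for (i = 1; i <= NF; i++) count[$i]++ }   # key = word, value = count
END {
    delete count["cat"]                      # remove one element
    if ("apple" in count) print "apple", count["apple"]
    for (w in count) n++                     # visits every key, order unspecified
    print "keys:", n
}')
echo "$out"   # apple 2, then keys: 2
```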
    
  • Control Flow (same as in C)

    if(condition) { }
    else { }
    
  • Loops (for, while, do-while, break, continue, and exit, e.g. exit(10) to stop with exit status 10) (same as in C), plus for-in over array keys:

    for (i in a)
        print i
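A sketch of the C-style control flow and loops listed above, plus exit setting awk's exit status. Everything runs in BEGIN, so no input is needed.

```shell
#!/bin/sh
out=$(awk 'BEGIN {
    for (i = 1; i <= 10; i++) {
        if (i % 2 == 0) continue    # skip even numbers
        if (i > 7) break            # leave the loop once i passes 7
        odd = odd sep i; sep = ","
    }
    n = 0
    while (n < 3) n++               # n ends at 3
    do { n-- } while (n > 10)       # do-while body runs once even though the test is false
    print odd, n
}')
awk 'BEGIN { exit 3 }'              # exit sets the exit status
status=$?
echo "$out $status"   # 1,3,5,7 2 3
```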
  • Built-in Functions (https://www.tutorialspoint.com/awk/awk_built_in_functions.htm)
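A sampler of common POSIX built-ins (length, split, toupper, substr, index, gsub, sprintf) on a hypothetical string. String positions are 1-based, and gsub modifies its third argument in place.

```shell
#!/bin/sh
out=$(awk 'BEGIN {
    s = "hello,awk,world"
    n = split(s, parts, ",")           # n = 3, parts[1] = "hello"
    print length(s), n, toupper(parts[2]), substr(s, 1, 5), index(s, "awk")
    t = s
    gsub(/o/, "0", t)                  # replace every "o" in t, in place
    print t, sprintf("%05.1f", 3.14159)
}')
echo "$out"   # 15 3 AWK hello 7, then hell0,awk,w0rld 003.1
```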
  • User Defined Functions (function foo(arg1, arg2) { return bar })
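A sketch of a user-defined function; max() is a hypothetical helper. awk has no local-variable declaration, so by convention extra parameters (here m, set apart with spaces) serve as locals. When calling it, write no space between the function name and "(".

```shell
#!/bin/sh
out=$(printf '3 9\n12 4\n' | awk '
function max(a, b,    m) {
    m = (a > b) ? a : b
    return m
}
{ print max($1, $2) }')
echo "$out"   # 9, then 12
```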
  • Redirection: > and >> inside an action send output to a file (> truncates the file when it is first opened during the run, >> appends). $ awk 'BEGIN { print "Hello, World" > "/tmp/message.txt" }'
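A sketch of > versus >> using a temporary file (assumes mktemp). Within one awk run, > truncates only when the file is first opened, so every later print to it appends; a second run with >> keeps the existing contents.

```shell
#!/bin/sh
f=$(mktemp)
# ">" truncates on first open, then every print in this run appends
printf 'a\nb\n' | awk -v out="$f" '{ print > out }'
# ">>" appends to whatever is already in the file
awk -v out="$f" 'BEGIN { print "c" >> out }'
result=$(cat "$f")
rm -f "$f"
echo "$result"   # a, b, c on separate lines
```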
  • Piping: print | "command" sends output to a shell command (https://www.tutorialspoint.com/awk/awk_output_redirection.htm)
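A sketch of output piping: the command string opens a single process the first time it is used, every print | "sort" writes to that same process, and close("sort") flushes it and waits for it to finish.

```shell
#!/bin/sh
sorted=$(printf 'b\nc\na\n' | awk '{ print | "sort" } END { close("sort") }')
echo "$sorted"   # a, b, c on separate lines
```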
  • printf: printf format, value_list
    • format specifiers work as in C's printf; unlike print, printf adds no newline unless the format contains \n
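A sketch of printf with C-style width, precision, and left-justification on hypothetical input; the trailing \n in the format supplies the line break that print would otherwise add.

```shell
#!/bin/sh
out=$(printf 'apple 1.5\nball 22\n' | awk '{ printf "%-6s|%6.2f|\n", $1, $2 }')
echo "$out"   # "apple |  1.50|", then "ball  | 22.00|"
```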
  • Command-line params:
    • $ awk '{}' file.txt arg1 arg2 (access using ARGV[2] and so on). Poor way: it works, but arg1 is also treated as an input file, so awk reports an error along with the correct output
    • ARGV[0] is awk and ARGV[1] is file.txt
    • Better: pass values in with -v:
        #!/bin/sh
        awk -v arg1="$1" -v arg2="$2" '{}' file     # $1, $2 are the shell script's own command-line args
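A sketch of both approaches. `set -- 2` stands in for a hypothetical script argument so that the shell's $1 is 2; inside the single-quoted program, $1 is awk's first field, not the shell argument. The ARGV example uses only a BEGIN block, so awk reads no input and never tries to open x and y as files.

```shell
#!/bin/sh
# Hypothetical: pretend this script was called with the argument 2
set -- 2
out=$(printf '1\n2\n3\n' | awk -v max="$1" '$1 > max { print }')
# BEGIN-only program: ARGV words are visible but not opened as input files
args=$(awk 'BEGIN { print ARGC, ARGV[1], ARGV[2] }' x y)
echo "$out"    # 3
echo "$args"   # 3 x y
```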