
AWK

posted on 2016-01-29 23:54:30

intro

awk is one hell of a beast. It's named after its inventors Aho, Weinberger and Kernighan, and several implementations exist. This post only gives a rough overview; there is more to it.

Either look up the official documentation for your installed implementation (e.g. mawk is just not gawk), or try heading over here, which is where I sometimes look things up when I need to.

Pros:

  • It's fast.
  • You have a programming language at your disposal in the shell.
  • One-liners are pretty quickly written.

Cons:

  • rather steep learning curve

When working with text fragments, it can truly speed things up. Since a lot of otherwise good tutorials always lacked a proper introduction for me, this is my shot at creating one myself.

These nice things happen to exist, or can at least be created.

  • variables
  • conditions
  • loops
  • associative arrays
  • functions
  • a profiler (Yes, the software comes with its own profiler built in, depending on the implementation you use)
  • pipes (You can pipe output directly to shell commands WITHIN awk, which is a nice feature.)
  • arithmetic operators

This list is likely not complete, as this post is written almost entirely from memory.
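To illustrate the pipe item from the list, here is a minimal sketch. The input is made up and sort is just an arbitrary command picked for the example; any shell command should work the same way.

```shell
# Made-up input; every printed line is fed into a single "sort" process.
out=$(printf 'pear\napple\nfig\n' | awk '
      { print $1 | "sort" }     # pipe the output of print into a shell command
END   { close("sort") }         # close the pipe so sort can finish and flush
')
echo "$out"
```

The command string acts as the name of the pipe, so the same string is passed to close().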

creating scripts vs. executing statements in the shell

Both are easily possible. The shebang for scripts:

#!/usr/bin/awk -f

Within scripts, statements (the stuff inside the braces) are usually separated by newlines, whereas in a one-liner on the shell you separate them with semicolons. You don't need semicolons between the blocks themselves.
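A small sketch of the same program in both forms. The script file is a throwaway created with mktemp, and the names sum and count as well as the input are made up for this example.

```shell
# Hypothetical script file; the path comes from mktemp, not from the post.
script=$(mktemp)
cat > "$script" <<'EOF'
{
    sum += $1
    count++
}
END { print sum, count }
EOF

# Script form: statements inside the braces are separated by newlines.
from_script=$(printf '1\n2\n3\n' | awk -f "$script")

# One-liner form: the same statements, separated by semicolons.
from_shell=$(printf '1\n2\n3\n' | awk '{ sum += $1; count++ } END { print sum, count }')

rm -f "$script"
echo "$from_script"
echo "$from_shell"
```

Both variants should print the same line, the sum followed by the number of rows.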

structuring

Usually awk programs are based on the following structure:

#!/usr/bin/awk -f

BEGIN { ... }
BEGIN { ... }
BEGIN { ... }
CONDITION { ... }
CONDITION { ... }
CONDITION { ... }
CONDITION { ... }
END { ... }
END { ... }
END { ... }

Or on the shell:

awk 'BEGIN { ... } BEGIN { ... } BEGIN { ... } CONDITION { ... } CONDITION { ... } CONDITION { ... } CONDITION { ... } END { ... } END { ... } END { ... }'

The misleading part is that you need neither the BEGIN blocks, nor the END blocks, nor the CONDITIONs in front of the CONDITION blocks.

So the program could as well be looking just like this:

awk '{ ... }'

or

awk 'CONDITION'

So in very short:

  • awk usually processes input line by line.
  • BEGIN / END blocks are executed before or after input processing.
  • The middle part is executed while traversing the input, depending on whether the condition evaluates to 'true'.
  • If no CONDITION is specified, the block is always executed.
  • Several blocks can be used together, all are evaluated.
  • If a condition is true and no block was specified, the current line (called a RECORD, consisting of columns called FIELDS) is printed. That also happens with a variable assignment used as condition: its value is the assigned value, so the line is printed whenever that value is neither zero nor empty. (Sooner or later you will have to debug exactly this case.)
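The points above can be sketched in one small program. The input data and the variable n are made up for this example.

```shell
out=$(printf 'a 1\nb 20\nc 3\n' | awk '
BEGIN   { print "start" }          # runs once, before any input is read
$2 > 2  { print "big:", $1 }       # runs only for records where the condition holds
        { n++ }                    # no condition: runs for every record
$1 == "a"                          # no block: the matching record is printed as-is
END     { print "records:", n }    # runs once, after the last record
')
echo "$out"
```

Every record is checked against every block in order, which is why the first line is counted by the third block and also printed by the fourth.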

built-in variables

To repeat:

  • RECORD = a single row of your dataset
  • FIELD = a single data entry of a column from the current row

Which explains the variables a bit:

FS          field separator (delimiter for input data, usually ' ')
OFS         output field separator (delimiter for output data)

NF          number of fields = column count
NR          number of records read so far = current row number

RS          record separator (how input is delimited, usually '\n')
ORS         output record separator (how output is delimited)

FILENAME    name of the input file

There may be more, but these are mostly implementation-dependent and thus omitted.
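A quick sketch of the separators and counters in action. The colon-separated input is made up, loosely modelled on /etc/passwd.

```shell
out=$(printf 'root:x:0\nalice:x:1000\n' | awk '
BEGIN { FS = ":"; OFS = " | " }     # split input on colons, join output with " | "
      { print NR, NF, $1, $NF }     # row number, column count, first and last field
')
echo "$out"
```

The commas in the print statement are what make OFS show up between the values.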

user-defined variables

Unlike in bash, you don't prepend a $ to variable names ($ is used for field access like $1). Your input data are usually just strings ("like this"), so you define variables like this:

example_var_1=""; example_var_2=""

This can either be done in BEGIN or within the main code paths. See the next section for some examples.
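A minimal sketch of variables in use. The names limit, label and count are made up; -v is the standard option for setting a variable from the command line before the program starts.

```shell
out=$(printf '3\n4\n5\n' | awk -v limit=4 '
BEGIN        { label = "hits" }      # plain assignment, no $ and no declaration keyword
$1 >= limit  { count++ }             # unset variables start out as "" / 0
END          { print label, count }
')
echo "$out"
```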

arrays

All arrays are associative, which lets you emulate regular arrays, too. For these you simply create an index variable, starting at 0. When iterating, increase this running index, which serves as the key for your array values.

Associative arrays are rather easy: one of your fields is the key, another one the value which gets set.

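An example for an emulated 'normal' array is this (the counter is called idx because index is the name of a built-in function, and the array must not be initialised as a string first):

awk 'BEGIN {idx=0} {array[idx]=$1; idx++}'

An example for regular 'associative' usage:

awk '{array[$1]=$2}'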
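Both flavours with some output to look at, as a rough sketch. The input and the names line, n, count and key are made up.

```shell
# Emulated "normal" array: store every line under a running index, print in reverse.
rev=$(printf 'a\nb\nc\n' | awk '
    { line[n++] = $0 }                              # n starts at 0 and is the running index
END { for (i = n - 1; i >= 0; i--) print line[i] }
')
echo "$rev"

# Associative array: count how often each first field occurs.
counts=$(printf 'x\ny\nx\nx\n' | awk '
    { count[$1]++ }                                 # the first field is the key
END { for (key in count) print key, count[key] }    # iteration order is unspecified
' | sort)
echo "$counts"
```

The trailing sort is only there because the order of a for-in loop is not guaranteed.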

CONDITIONS

These can be comparisons and other expressions (even simple assignments), or regexes (/ ... /).
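A few typical conditions side by side, as a sketch with made-up input:

```shell
out=$(printf 'alpha 5\nbeta 50\ngamma 500\n' | awk '
/^b/              { print "regex:", $0 }      # regex matched against the whole record
$2 >= 100         { print "compare:", $0 }    # numeric comparison on a field
$1 ~ /a$/ && NR<3 { print "combined:", $0 }   # regex on one field plus a second test
')
echo "$out"
```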

built-in functions

This is a quick overview, so you know these exist:

next     jump to next record
exit     quit the program, an exit code can be specified
getline  for when you need to control getting input
print    self-explanatory
printf   formatted printing like in C
++       increment
function keyword for when you explicitly need user-defined functions
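Several of these in one small sketch. The input and the helper name square are made up for illustration.

```shell
out=$(printf '# comment\n2\n3\nstop\n4\n' | awk '
function square(x) { return x * x }          # user-defined helper (hypothetical name)
/^#/         { next }                        # skip this record, continue with the next
$1 == "stop" { exit }                        # stop reading input; END blocks still run
             { printf "%d squared is %d\n", $1, square($1) }
END          { print "done" }
')
echo "$out"
```

Note that the 4 after "stop" is never processed, while the END block is still executed.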

one-liners

Some handy examples are provided here:

## print second column (counting starts with one, not zero)
## this is what you will use awk the most for, don't use `cut`
awk '{print $2}'

## print everything EXCEPT second column
awk '{$2=""; print}'

## remove empty lines
awk 'NF>0'
# or (careful: this also drops lines whose first field is 0)
awk '$1'

## add a header (printf is analogous to C language)
## the regular print statement could be used as well, of course
awk 'BEGIN {printf "%s %s %s\n","1stcol","2ndcol", "3rdcol"} {print $0}'

## print several columns, use several different delimiters
## this works with arbitrary counts of tab, space or colon as delimiter
awk -F'[\t :]+' '{print $1, $2, $3}'

awk: show postfix mailq mail IDs for specific mail

posted on 2015-09-28 00:46:44

In short, replace <searchterm> with a regex for the address you want:

mailq | awk 'BEGIN { RS = "" } /<searchterm>/ {print $1} '
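To see why this works without a mail queue at hand, here is a sketch against made-up, simplified sample data. It assumes that queue entries are separated by blank lines and start with the queue ID; real mailq output has a header line and differs in detail.

```shell
# Made-up, simplified stand-in for mailq output: entries separated by blank lines.
sample='AAA111 1234 Mon Sep 28 00:40:01 alice@example.com
    bob@example.org

BBB222 5678 Mon Sep 28 00:41:17 carol@example.com
    dave@example.net'

# RS = "" switches awk to paragraph mode, so each queue entry is one record.
ids=$(printf '%s\n' "$sample" | awk 'BEGIN { RS = "" } /bob@example\.org/ { print $1 }')
echo "$ids"
```

Since the whole entry is a single record, the regex can match the sender or any recipient line, and $1 is still the first word of the entry.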



Unless otherwise credited all material Creative Commons License by sjas