linux: systemcheck in 60 seconds
posted on 2016-01-21 00:03

This post is copied entirely from @brendangregg from here. It is just shorter and typed out by me in the hope that I can memorize it more easily that way, plus one small change: I included htop.

Summary: check a Linux system for problems immediately after ssh'ing onto it.

  1. htop - uptime, core diversity, load, swap on first sight via a TUI.
  2. uptime - for load checking, likely unnecessary after htop
  3. dmesg | tail - check for errors like out of memory
  4. vmstat 1 - check the number of runnable processes (r), the kernel/userland CPU split and swap activity
  5. mpstat -P ALL 1 - check for a single hot core
  6. pidstat 1 - check for high load on single process
  7. iostat -xz 1 - high r/w load? awaits? util%?
  8. free -m - memory available, likely unneeded after htop
  9. sar -n DEV 1 - rxkB/s and txkB/s max out at about 125 MB/s on 1 Gbit/s NICs, util% ok?
  10. sar -n TCP,ETCP 1 - active/s = locally initiated (outbound) connections, passive/s = remotely initiated (inbound) connections, retransmits = bad, usually
  11. top - zxcV and 1 and < and > are your best friends, along with knowing status indices.
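
The 125 MB/s figure in step 9 is just 1 Gbit/s divided by 8 bits per byte. A minimal sketch of that arithmetic follows; nic_util_pct is a made-up helper name, and it assumes 1 kB = 1000 bytes, which may differ slightly from how sar itself counts.

```shell
# Hypothetical helper: NIC utilisation in percent.
# $1 = throughput in kB/s (e.g. the rxkB/s or txkB/s value), $2 = link speed in Mbit/s.
# Assumption: 1 kB is treated as 1000 bytes.
nic_util_pct() {
    awk -v kb="$1" -v mbit="$2" \
        'BEGIN { printf "%.1f\n", kb * 1000 * 8 / (mbit * 1000000) * 100 }'
}

nic_util_pct 125000 1000   # a fully saturated 1G link
nic_util_pct 40000 1000    # 40 MB/s on the same link
```

The two calls print 100.0 and 32.0, so 40 MB/s on a 1G NIC is roughly a third of the line rate.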


At 8. (free -m): buffers = block device caching, cache = page cache for the file system.
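
To see where those two numbers come from, here is a small sketch that pulls them out of /proc/meminfo-style input. buffers_and_cache is a hypothetical helper name; it only reads the Buffers and Cached lines, so the result can be a bit lower than the cache column of newer free versions, which (as far as I know) also count reclaimable slab memory.

```shell
# Hypothetical helper: print buffers and page cache in MB.
# Reads /proc/meminfo-style lines ("Name:  value kB") from stdin.
# Usage on a Linux box: buffers_and_cache < /proc/meminfo
buffers_and_cache() {
    awk '/^Buffers:/ { b = $2 }
         /^Cached:/  { c = $2 }
         END { printf "buffers_mb=%d cache_mb=%d\n", b / 1024, c / 1024 }'
}
```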

At 11. (top): press 1 to show all available cores, then have a look at the waits (wa) to see if there are disk-related issues. The angle bracket keys (< and >) switch the sort column. d followed by a number x changes the refresh time to x seconds. In general, everything concerning top can be found in the manual.

Lastly, a list of the process states from the mentioned top man page:

D = uninterruptible sleep <<-- waiting for disk
R = running
S = sleeping
T = traced or stopped
Z = zombie
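
If you want these states as numbers instead of eyeballing the S column, something like the following sketch should do. count_states is a made-up helper; it expects one state string per line (as printed by ps -eo stat=) and only looks at the first character, ignoring modifiers such as + or <.

```shell
# Hypothetical helper: count processes per state letter.
# Reads one state string per line from stdin (e.g. "Ss", "R+", "D").
# Usage on a live system: ps -eo stat= | count_states
count_states() {
    awk '{ n[substr($1, 1, 1)]++ }
         END { for (s in n) print s "=" n[s] }' | sort
}
```

A pile of processes in state D usually means it is time to look at iostat.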

Linux: 'top' explained
posted on 2015-03-04 12:54:59

To get a fast overview of what is running on your Linux box, use top. (If you want some fancy graphics, try htop, but it has less intuitive shortcuts and is not always installed.)

Sad thing is, at first you don't really know what you are doing. So some guidance:

start and sane defaults

After starting top, press: z, x, c. This will color top (z), show current sort column (x) and the full application path (c).

1 will show stats for all individual cpus.

If you are lost, press h to show the help.

If you have a newer version of top, V will also work: it gives you a nice process-tree view.

d changes the update delay, which is three seconds by default.

cpu stats explained

Straight from the manpage, the CPU statistics show the times spent in:

us = user mode
sy = system mode
ni = low priority user mode (nice)
id = idle task
wa = I/O waiting
hi = servicing IRQs
si = servicing soft IRQs
st = steal (time given to other DomU instances)

If you have low CPU and RAM usage but the system is unresponsive, have a look at the I/O wait time (wa).
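
The wa value can also be derived by hand from the counters in /proc/stat. The sketch below is only a rough cross-check: iowait_pct is a hypothetical helper, and because the counters are cumulative it gives the average since boot, not the per-interval value top shows.

```shell
# Hypothetical helper: I/O wait share of total CPU time, in percent.
# Reads /proc/stat-style input from stdin and uses the aggregate "cpu" line:
#   cpu user nice system idle iowait irq softirq steal ...
# Usage on a Linux box: iowait_pct < /proc/stat
iowait_pct() {
    awk '/^cpu / {
             total = 0
             for (i = 2; i <= 9; i++) total += $i   # user .. steal
             printf "%.1f\n", $6 * 100 / total       # $6 = iowait
         }'
}
```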

sorting and searching

Changing the sort column can be done via < and >.

Also available: (not shown in help)

N sort by PID
P sort by CPU usage
M sort by memory usage
T sort by time

R will reverse the output.
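
Outside of top, the P and M orderings can be approximated with ps and sort. This is a sketch only: sort_by_column is a made-up helper, and the column numbers refer to the ps format given in the usage comment, not to anything top itself prints.

```shell
# Hypothetical helper: sort whitespace-separated rows by one numeric column,
# highest value first. $1 = column number.
# Usage on a live system (columns: PID, %CPU, %MEM, COMMAND):
#   ps -eo pid=,pcpu=,pmem=,comm= | sort_by_column 2 | head -n 5   # like P
#   ps -eo pid=,pcpu=,pmem=,comm= | sort_by_column 3 | head -n 5   # like M
sort_by_column() {
    sort -k"$1","$1" -nr
}
```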

u prompts for a user name and shows only that user's processes.

S toggles cumulative time mode.


f will toggle a window in which you can choose the info fields to be shown. Pressing the character will toggle its state. (Shown or not shown.)

o also opens a window, in which you can reorder the columns. Press the character of the column you want to move: an uppercase letter moves it up, a lowercase one moves it down.

manipulate tasks

These should be self-explanatory:

k kill task

r renice task
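
For reference, the same two actions from a plain shell, tried out on a throwaway sleep process so nothing important gets hurt. The nice value 10 and the TERM signal are just example choices of mine.

```shell
# Start a harmless background process to practice on.
sleep 300 &
pid=$!

# Like r in top: lower the priority (a higher nice value means less CPU priority).
renice -n 10 -p "$pid"

# Like k in top: send a signal, here TERM.
kill -TERM "$pid"

# Reap the process; wait returns a non-zero status for a killed job.
wait "$pid" 2>/dev/null || true
```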

