posted on 2016-01-21 00:03
Summary: check a Linux system for problems immediately after ssh'ing onto it.
htop - uptime, per-core usage, load, swap at first sight via a TUI.
uptime - for load checking, likely unnecessary after htop
dmesg | tail - check for errors like out of memory
vmstat 1 - check the number of runnable processes (r), the kernel/userland CPU split, and swap activity
mpstat -P ALL 1 - check for a single hot core
pidstat 1 - check for high load from a single process
iostat -xz 1 - high r/w load? awaits? %util?
free -m - memory available, likely unneeded after htop
sar -n DEV 1 - rxkB/s or txkB/s tops out at about 125 MB/s on 1 Gbit/s NICs, %ifutil ok?
sar -n TCP,ETCP 1 - active/s = outbound (locally initiated) connections, passive/s = inbound (remotely initiated) connections, retrans/s = bad, usually
top - the keys z, x, c, V, 1, < and > are your best friends, along with knowing the process state codes.
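The 125 MB/s ceiling from the sar -n DEV item is plain arithmetic: 1 Gbit/s divided by 8 bits per byte. Here is a minimal sketch of that calculation in Python. The function name and the sample numbers are made up for illustration, and it assumes a kB is 1000 bytes, so treat the result as a rough estimate rather than what sar itself would report.

```python
def nic_utilization(rx_kb_s, tx_kb_s, link_mbit_s=1000):
    """Return (rx_pct, tx_pct): throughput as a percentage of link capacity.

    Assumption: 1 kB = 1000 bytes, so a 1000 Mbit/s link carries
    1000 * 1000 / 8 = 125000 kB/s, i.e. 125 MB/s.
    """
    # Mbit/s -> kbit/s (times 1000) -> kB/s (divide by 8 bits per byte)
    capacity_kb_s = link_mbit_s * 1000 / 8
    rx_pct = 100 * rx_kb_s / capacity_kb_s
    tx_pct = 100 * tx_kb_s / capacity_kb_s
    return rx_pct, tx_pct


# Hypothetical readings of rxkB/s and txkB/s for one interface
rx, tx = nic_utilization(62500, 12500)
print(f"rx {rx:.1f}%  tx {tx:.1f}%")
```

Anything that stays close to 100% in either direction means the link itself is the bottleneck.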
For free -m: buffers = block device caching, cache = page cache for the file system.
For top: press 1 to show all available cores, then have a look at the waits (wa) to see if there are disk-related issues. The angle bracket keys < and > switch the sort column.
d followed by a number x changes the refresh time to x seconds.
In general, everything concerning top can be found in its manual.
Lastly, a list of the process states from the mentioned top man page:
D = uninterruptible sleep  <<-- waiting for disk
R = running
S = sleeping
T = traced or stopped
Z = zombie
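To make the state letters a bit more tangible, here is a small Python sketch that counts processes per state. It assumes the input is a list of STAT values as printed by something like ps -eo stat=, where the first character is the state letter and any further characters are flags. The function name and the sample data are hypothetical.

```python
from collections import Counter

# State letters as listed above
STATE_NAMES = {
    "D": "uninterruptible sleep",
    "R": "running",
    "S": "sleeping",
    "T": "traced or stopped",
    "Z": "zombie",
}


def summarize_states(stat_fields):
    """Count processes per state; only the first character of each field is used."""
    counts = Counter(field[0] for field in stat_fields if field)
    return {
        STATE_NAMES.get(code, "other (" + code + ")"): n
        for code, n in counts.items()
    }


# Made-up sample of STAT values
sample = ["Ss", "R+", "D", "S<", "Z"]
print(summarize_states(sample))
```

A growing number of processes in D is usually the hint to go back to iostat.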
posted on 2015-01-17 18:50:42
This is an alphabetical list which serves as a reminder for me of which programs are out there to be looked up. :)
All this started when I stumbled across a picture on the web, which, as I later found out, came from a presentation by Brendan Gregg at LinuxCon14. It was called Linux Performance Tools and it's worth its weight in gold, platinum and whatever material you see as highly valuable. The slides are here, get your copy and study them. If you want some serious Linux sysadmin skills, there is no possible excuse for not doing it.
DO. IT. NOW.
Two more incentives can be found here and here. These may only use a small portion of the programs mentioned below, but either walk the extra mile or raise your hands in defeat once things get tough; everybody gets to choose their own path.
blktrace (8) - generate traces of the i/o traffic on block devices
dstat (1) - versatile tool for generating system resource statistics
dtrace (1) - Dtrace compatibile user application static probe generation tool.
ebpf: nothing appropriate.
ethtool (8) - query or control network driver and hardware settings
free (1) - Display amount of free and used memory in the system
ftrace: nothing appropriate.
iostat (1) - Report Central Processing Unit (CPU) statistics and input/output statistics for devices and partitions.
iotop (8) - simple top-like I/O monitor
ip (8) - show / manipulate routing, devices, policy routing and tunnels
iptraf (8) - Interactive Colorful IP LAN Monitor
ktap: nothing appropriate.
lldptool (8) - manage the LDP settings and status of lldpad
lsof (8) - list open files
ltrace (1) - A library call tracer
lttng: nothing appropriate.
mpstat (1) - Report processors related statistics.
netstat (8) - Print network connections, routing tables, interface statistics, masquerade connections, and multicast memberships.
nicstat: nothing appropriate.
pcstat: nothing appropriate.
perf (1) - Performance analysis tools for Linux
pidstat (1) - Report statistics for Linux tasks.
/proc: nothing appropriate.
ps (1) - report a snapshot of the current processes.
rdmsr: nothing appropriate.
sar (1) - Collect, report, or save system activity information.
slabtop (1) - display kernel slab cache information in real time
snmpget (1) - communicates with a network entity using SNMP GET requests
ss (8) - another utility to investigate sockets
stap (1) - systemtap script translator/driver
strace (1) - trace system calls and signals
swapon (8) - enable/disable devices and files for paging and swapping
sysdig () - the definitive system and process troubleshooting tool
tcpdump (8) - dump traffic on a network
tiptop (1) - display hardware performance counters for Linux tasks
top (1) - display Linux processes
uptime (1) - Tell how long the system has been running.
vmstat (8) - Report virtual memory statistics
First some more explanations on the ones listed above with "nothing appropriate":
ebpf, ftrace, ktap, lttng, nicstat, pcstat and rdmsr are mostly too new: either bleeding edge, or at least not available in CentOS 7 or Debian 7. If you grab the sources, you might get them working. The manpage headlines are from a CentOS 7 system. (The only exception is sysdig, which I installed via the one-liner its GitHub page provided.) /proc is of course not a command; the entry refers to the /proc directory Linux uses, where a lot of useful information can be found.
Here is another sorting, this time by 'type'. (Maybe this improves readability, or makes it easier to remember, who knows. It's worth trying, still.)
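As one example of what /proc offers, /proc/loadavg holds the same load averages uptime prints. Below is a minimal Python sketch that parses the contents of that file. The function name and dictionary keys are my own choice, and the sample line is made up; the field layout (three load averages, runnable/total scheduling entities, last PID) is the one described in the proc man page.

```python
def parse_loadavg(text):
    """Parse the single line found in /proc/loadavg into a dict."""
    fields = text.split()
    # Fourth field looks like "1/80": currently runnable / total scheduling entities
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "runnable": int(runnable),
        "total": int(total),
        "last_pid": int(fields[4]),
    }


# Made-up sample in the format of /proc/loadavg
sample = "0.20 0.18 0.12 1/80 11206\n"
print(parse_loadavg(sample))
```

On a live system you would pass in the result of open("/proc/loadavg").read() instead of the sample string.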
'stat', 'top', 'trace', 'tap':
stat: dstat iostat mpstat netstat nicstat pcstat pidstat vmstat
top: iotop slabtop tiptop top
trace: blktrace dtrace ftrace ltrace strace
tap: ktap stap
Everything else:
ebpf ethtool free ip iptraf lldptool lsof lttng perf /proc ps rdmsr sar snmpget ss swapon sysdig tcpdump uptime
These were only the 'observability' tools from the presentation. There are also more listed under 'benchmarking' and 'tuning', and maybe 'tracing'.
Just go and read up on them. NOW.