Linux I/O Monitoring and Analysis

From WikiMLT

Several tools are available for monitoring and analyzing the disk I/O performance of a Linux system. A few of them are listed here, together with installation instructions and examples of their basic usage.

The htop command

If a newer version of htop is available for your distribution, it has an additional tab that shows the I/O metrics of the instance (Screen 1). Here is how to check the available version and install htop.

sudo apt show htop 2>/dev/null | grep '^Version'
sudo apt install htop

Install the latest version of htop, 3.2.1-1, on Ubuntu Server 22.04.1 from a .deb package.

cd /tmp
wget --no-check-certificate https://http.us.debian.org/debian/pool/main/h/htop/htop_3.2.1-1_amd64.deb
sudo apt install ./htop_3.2.1-1_amd64.deb

To see all of the data, in most cases you need to run the tool as root:

sudo htop
Screen 1. The new I/O metrics tab of htop (v 3.2+). Use Tab to switch to the I/O tab, then press F6 to open the Sort by menu and sort by IO_WRITE_RATE. The screenshot was taken on Kali Linux 2022.
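Since the caption ties the I/O tab to htop 3.2 and later, it can be useful to check the installed version before looking for the tab. The following is a minimal sketch, not part of htop itself: the helper name version_ge is made up for illustration, and it assumes GNU sort with the -V (version sort) option.

```shell
#!/bin/sh
# version_ge A B: succeed if version A >= version B.
# Assumes GNU sort (-V); the helper name is illustrative only.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Take the first dotted version number printed by "htop --version", if htop exists.
if command -v htop >/dev/null 2>&1; then
    installed=$(htop --version | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)
    if version_ge "$installed" 3.2; then
        echo "htop $installed: the I/O tab should be available"
    else
        echo "htop $installed: too old for the I/O tab"
    fi
else
    echo "htop is not installed"
fi
```

The comparison is done on version strings rather than numbers, so 3.10 is correctly treated as newer than 3.2.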

The iostat command

iostat – Report Central Processing Unit (CPU) statistics and input/output statistics for devices and partitions. The iostat command is used for monitoring system input/output device loading by observing the time the devices are active in relation to their average transfer rates…

The first report generated by the iostat command provides statistics concerning the time since the system was booted, unless the -y option is used. Each subsequent report covers the time since the previous report. All statistics are reported each time the iostat command is run. The report consists of a CPU header row followed by a row of CPU statistics. On multiprocessor systems, CPU statistics are calculated system-wide as averages among all processors. A device header row is displayed followed by a line of statistics for each device that is configured…

Here is how to get the general report in human-readable format.

iostat -h
Linux 5.18.0-kali5-amd64 (kali-x) 	08/31/2022 	_x86_64_	(24 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.4%    0.0%    0.2%    0.1%    0.0%   99.2%

      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    11.50       224.2k        20.9k         0.0k       1.2G     119.2M       0.0k dm-0
    10.90       221.8k        20.9k         0.0k       1.2G     119.2M       0.0k dm-1
    10.99       315.7k        56.2k         0.0k       1.8G     320.0M       0.0k nvme0n1
     0.06         1.6k         0.0k         0.0k       8.9M       0.0k       0.0k sda
     4.14        55.1k         0.0k         0.0k     313.8M     152.0k       0.0k sdb

Here is how to get a report for a single device every minute, with a timestamp, in human-readable format. Note that the first report provides statistics for the time since the system was booted; each later report covers the preceding 60 seconds.

iostat -h /dev/nvme0n1 -d -t 60
Linux 5.18.0-kali5-amd64 (kali-x) 	08/31/2022 	_x86_64_	(24 CPU)

08/31/2022 05:39:22 PM
      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
    10.45       297.5k        53.6k         0.0k       1.8G     324.5M       0.0k nvme0n1

08/31/2022 05:40:22 PM
      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     1.10         0.6k        10.2k         0.0k      36.0k     612.0k       0.0k nvme0n1

Here is how to get the same statistics as above, but for an LVM logical volume.

iostat -h /dev/mapper/kali--x--vg-home -d -t 60
Linux 5.18.0-kali5-amd64 (kali-x) 	08/31/2022 	_x86_64_	(24 CPU)

08/31/2022 05:45:11 PM
      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     4.02        68.4k        19.7k         0.0k     437.2M     125.9M       0.0k dm-5

08/31/2022 05:46:11 PM
      tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
     0.30         0.0k         1.1k         0.0k       0.0k      68.0k       0.0k dm-5
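In the output above the logical volume is reported as dm-5 rather than under its /dev/mapper name. One way to see which dm-N node a mapper path corresponds to is to resolve the symlink. This is only a sketch: it assumes a system where udev creates the /dev/mapper entries as symlinks, GNU readlink with -f, and a hypothetical helper name, dm_node.

```shell
#!/bin/sh
# dm_node PATH: print the kernel device name (e.g. dm-5) behind a /dev/mapper symlink.
# Assumes GNU coreutils readlink (-f); the helper name is illustrative only.
dm_node() {
    basename "$(readlink -f "$1")"
}

# Example call (path taken from the iostat command above; it only resolves
# on the system where that volume exists):
#   dm_node /dev/mapper/kali--x--vg-home
```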

Watch the changes in the full statistics every half second. In the following example:

  • iostat -y -h – suppresses the report since boot time and uses human-readable format,
  • iostat 1 1 – the interval and count arguments: one report covering one second,
  • watch -n 0.5 -d – refreshes every 0.5 seconds and highlights the differences.
watch -n 0.5 -d "iostat -h -y 1 1"

Display extended statistics for the whole system in human-readable format, with a timestamp, every 6 seconds.

iostat -x -h -t 6

The iotop command

iotop – simple top-like I/O monitor – watches I/O usage information available in the Linux kernel (requires 2.6.20 or later) and displays a table of current I/O usage by processes or threads on the system. At least the CONFIG_TASK_DELAY_ACCT, CONFIG_TASK_IO_ACCOUNTING, CONFIG_TASKSTATS and CONFIG_VM_EVENT_COUNTERS options need to be enabled in your Linux kernel build configuration…

Starting with Linux kernel 5.14.x, task_delayacct is configurable at runtime and set to off by default. This setting can be changed in interactive mode with the Ctrl+T shortcut. In batch mode a warning is printed when the setting is off. From the command line it can be enabled with sudo sysctl kernel.task_delayacct=1 and disabled again with sudo sysctl kernel.task_delayacct=0. It is advisable to keep this option off when not using this or another monitoring program, because when enabled it has some effect on system performance.
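Before starting iotop it may help to check the current state of this setting. The sketch below reads the procfs view of the kernel.task_delayacct sysctl; the helper name delayacct_state and its optional file argument are assumptions made for illustration.

```shell
#!/bin/sh
# delayacct_state [FILE]: report whether delay accounting is on, off, or not exposed.
# FILE defaults to the procfs path of the kernel.task_delayacct sysctl; the
# optional argument exists only to make the sketch easy to try on a sample file.
delayacct_state() {
    file=${1:-/proc/sys/kernel/task_delayacct}
    if [ ! -r "$file" ]; then
        echo "unavailable"      # e.g. a kernel without the runtime switch
    elif [ "$(cat "$file")" = "1" ]; then
        echo "on"
    else
        echo "off"
    fi
}

delayacct_state
```

If this prints off, the sysctl command mentioned above (or Ctrl+T inside iotop) can be used to enable the setting for the duration of the monitoring session.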

In Screen 2, the bottom image shows the very basic usage of iotop without options, and the top image shows the output with the following options (reference):

  • -a – shows accumulated I/O instead of bandwidth,
  • -o – shows only the processes or threads that are actually doing I/O,
  • -P – shows only processes instead of threads.
sudo iotop -aoP
sudo iotop
Screen 2. Examples of usage of the iotop command.
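The kernel I/O accounting option mentioned above (CONFIG_TASK_IO_ACCOUNTING) also exposes cumulative per-process counters in /proc/<pid>/io, which allows a rough check of a single process without any extra tool. The following is a sketch under that assumption; the helper name proc_io_bytes is made up for illustration, and this is not a description of how iotop collects its data.

```shell
#!/bin/sh
# proc_io_bytes FILE: print the cumulative storage read/write byte counters
# from a /proc/<pid>/io style file. The helper name is illustrative only.
proc_io_bytes() {
    awk -F': ' '/^(read_bytes|write_bytes):/ { print $1, $2 }' "$1"
}

# Demo: counters of the current shell, if the kernel exposes them.
if [ -r "/proc/$$/io" ]; then
    proc_io_bytes "/proc/$$/io"
fi
```

Reading the file of another user's process generally requires root, for example sudo cat /proc/<pid>/io.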

The dstat command

dstat is a versatile tool for generating system resource statistics and a replacement for vmstat, iostat and ifstat. dstat is unique in letting you aggregate block device throughput for a certain diskset, or network bandwidth for a group of interfaces, i.e. you can see the throughput for all the block devices that make up a single filesystem or storage system.

sudo apt install dstat

dstat offers a large number of options and plugins. Here is one usage example (Screen 3), in which the following options are applied.

  • -D sda – reports the disk I/O rate of /dev/sda.
  • -t, --time – enables time/date output.
  • -a, --all – equivalent to -cdngy (-c CPU stats, -d disk stats, -n network stats, -g page stats, -y system stats).
  • --top-io – shows the most expensive I/O process.
  • --top-bio – shows the most expensive block I/O process.
  • --top-mem – shows the process using the most memory.
sudo dstat -D sda -ta --top-io --top-bio --top-mem
Screen 3. Example of usage of the dstat command.

Here is another example, which outputs the average I/O rate per minute.

dstat -tdD total 60
----system--------dsk/total--
     time      |  read  writ
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
29-08 08:40:13 | 1138M 1782M
29-08 08:41:13 |  234k  744k
29-08 08:42:13 |  293k  171k
29-08 08:43:13 |  268k  113k
29-08 08:44:13 | 1100k  129k

And one example (source) with a list of multiple drives.

dstat -tdD total,sda,sdb,sdc,md1 60
----system---- -dsk/total----dsk/sda-----dsk/sdb-----dsk/sdc-----dsk/md1--
     time     | read  writ: read  writ: read  writ: read  writ: read  writ
08-11 22:08:17|3549k  277k: 144k   28k: 851k   62k: 852k   60k:  25k   82k
08-11 22:09:17|  60k  258k:1775B   15k:  13k   63k:  15k   60k:  68B   74k
08-11 22:10:17| 176k  499k:   0    14k:  41k  122k:  41k  125k: 273B  157k
08-11 22:11:17|  42k  230k:   0    14k:9830B   54k:  14k   51k:   0    70k
08-11 22:11:52|  28k  132k:   0  5032B:5266B   33k:9479B   28k:   0    37k

The vmstat command

vmstat – virtual memory statistics – reports information about processes, memory, paging, block IO, traps, disks and cpu activity. The first report produced gives averages since the last reboot. Additional reports give information on a sampling period of length delay. The process and memory reports are instantaneous in either case.

Here is how to get statistics about the block devices (-d). The -S option sets the unit: -Sm for megabytes (1000000 bytes) or -SM for mebibytes (1048576 bytes).

sudo vmstat -d -Sm
disk-   ------------reads--------------- --------------writes--------------- ------IO-----
        total  merged   sectors       ms    total  merged   sectors       ms    cur    sec
loop0      84       0      2554       81        0       0         0        0      0      0
loop1      84       0      2382       36        0       0         0        0      0      0
loop2      52       0       856       39        0       0         0        0      0      0
loop3      60       0       814       43        0       0         0        0      0      0
loop4      52       0       764       14        0       0         0        0      0      0
loop5     539       0     11200      240        0       0         0        0      0      0
loop6      87       0      2498      105        0       0         0        0      0      0
loop7     493       0     34854      228        0       0         0        0      0      1
sda   1284880  157343  65276608   752987  3936160 2077213 135566208  4487853      0   4123
sdc    437039  119521   4760810  2594193    96132  145500  54786232 10747453      0   3024
sdd   2614304  458294  24597746  6017154    63873 1360249  19094048  7394053      0   5798
sdb    136351    1445  34564266   329162    25383    2759  47585536  4447925      0    424
sr0       120       0      897        14        0       0         0        0      0      0
loop8      49       0      752        24        0       0         0        0      0      0
loop9      88       0     3334        42        0       0         0        0      0      0
loop10     11       0       28         0        0       0         0        0      0      0
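The sectors columns above are cumulative counts of 512-byte units, so they can be converted into data volumes. The sketch below does this with awk for the layout shown above (field 4 is sectors read, field 8 is sectors written) and skips loop devices; the helper name sectors_to_gib is made up for illustration.

```shell
#!/bin/sh
# sectors_to_gib: read "vmstat -d" output on stdin and print, per disk,
# the data read and written in GiB. Loop devices and header rows are skipped.
# Field positions follow the layout above: $4 = sectors read, $8 = sectors written.
sectors_to_gib() {
    awk '$1 !~ /^(disk-|loop)/ && $2 ~ /^[0-9]+$/ {
        printf "%s %.2f %.2f\n", $1, $4 * 512 / 1073741824, $8 * 512 / 1073741824
    }'
}

# Demo on two rows copied from the output above.
sectors_to_gib <<'EOF'
disk-   ------------reads--------------- --------------writes--------------- ------IO-----
        total  merged   sectors       ms    total  merged   sectors       ms    cur    sec
loop7     493       0     34854      228        0       0         0        0      0      1
sda   1284880  157343  65276608   752987  3936160 2077213 135566208  4487853      0   4123
EOF
```

On a live system the same function can be fed directly, for example vmstat -d | sectors_to_gib.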

The sar command

The sar command is part of the sysstat package. It outputs the contents of selected cumulative activity counters in the operating system. The activities are collected by sysstat.service. After installing the package, we need to enable the collector service and wait until some statistics are collected.

sudo apt install sysstat
sudo sed -i 's/ENABLED="false"/ENABLED="true"/' /etc/default/sysstat
sudo systemctl enable --now sysstat.service
systemctl cat sysstat-collect.timer
# /lib/systemd/system/sysstat-collect.timer
# /lib/systemd/system/sysstat-collect.timer
# (C) 2014 Tomasz Torcz <tomek@pipebreaker.pl>
#
# sysstat-12.5.2 systemd unit file:
#        Activates activity collector every 10 minutes

[Unit]
Description=Run system activity accounting tool every 10 minutes

[Timer]
OnCalendar=*:00/10

[Install]
WantedBy=sysstat.service
sar
Linux 5.15.39-4-pve (ubuntu-lxc-pve) 	08/28/22 	_x86_64_	(24 CPU)

20:41:48     LINUX RESTART	(24 CPU)

20:50:05        CPU     %user     %nice   %system   %iowait    %steal     %idle
21:00:00        all      1.66      0.00      0.26      0.02      0.00     98.06
21:10:10        all      2.66      0.00      0.27      0.03      0.00     97.03
21:20:13        all      1.92      0.00      0.29      0.02      0.00     97.76
Average:        all      2.09      0.00      0.27      0.03      0.00     97.62
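Once a few samples exist, the %iowait column of this report can be scanned for the busiest interval. The following sketch assumes the 24-hour layout shown above, where %iowait is the sixth field; with an AM/PM column the field index shifts by one. The helper name peak_iowait is made up for illustration.

```shell
#!/bin/sh
# peak_iowait: read default "sar" (CPU) output on stdin and print the time and
# value of the sample with the highest %iowait. The "Average:" row is ignored.
# Assumes the 24-hour layout shown above (%iowait is field 6).
peak_iowait() {
    awk '$2 == "all" && $1 != "Average:" {
        if (!seen || $6 + 0 > max) { max = $6 + 0; at = $1; seen = 1 }
    }
    END { if (seen) printf "%s %.2f\n", at, max }'
}

# Demo on the rows shown above.
peak_iowait <<'EOF'
20:50:05        CPU     %user     %nice   %system   %iowait    %steal     %idle
21:00:00        all      1.66      0.00      0.26      0.02      0.00     98.06
21:10:10        all      2.66      0.00      0.27      0.03      0.00     97.03
21:20:13        all      1.92      0.00      0.29      0.02      0.00     97.76
Average:        all      2.09      0.00      0.27      0.03      0.00     97.62
EOF
```

On a live system the function can be fed with sar | peak_iowait, provided the time column is in the same format.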

Monitor File Size Changes Recursively

With the following command we can monitor which files larger than 800 KiB have been written to in the past 10 minutes. This is done recursively for the directories /var/lib and /var/log. The output of the command is shown in Screen 4.

sudo watch -n 3 -d \
"find /var/lib /var/log -type f -size +800k -mmin -10 -printf '%-30s \t %t %p\n' | grep -Pv '\.(gz|[0-9])$'"
Screen 4. Use watch and find to monitor file changes in real time.

Here is an extended version :) that also outputs additional data generated by iostat:

sudo watch -n 3 -d \
"find /var/lib /var/log -type f -size +800k -mmin -10 -printf '%-30s \t %t %p\n' | grep -Pv '\.(gz|[0-9])$';
 echo;
 iostat /dev/sda2"
# Output
Every 3.0s: find /var/lib /var/log -type f -size +800k -mmin -10 -printf '%s \t %t %p\n'...; iostat /dev/sda2...

4362053          Mon Aug 29 08:25:15.1573410540 2022 /var/lib/redis/dump.rdb
67108864         Mon Aug 29 08:29:34.6284410810 2022 /var/lib/mysql/undo_001
3276800          Mon Aug 29 08:29:36.1604593970 2022 /var/lib/mysql/#innodb_redo/#ib_redo10127
83886080         Mon Aug 29 08:29:33.6604295080 2022 /var/lib/mysql/mysql.ibd
31459279         Mon Aug 29 08:29:34.2284362990 2022 /var/lib/mysql/binlog.005694
5242880          Mon Aug 29 08:29:33.6284291260 2022 /var/lib/mysql/SCloud/oc_authtoken.ibd
6291456          Mon Aug 29 08:29:34.6284410810 2022 /var/lib/mysql/SCloud/oc_jobs.ibd
50331648         Mon Aug 29 08:29:34.6284410810 2022 /var/lib/mysql/undo_002
79691776         Mon Aug 29 08:29:34.6284410810 2022 /var/lib/mysql/ibdata1
11873325         Mon Aug 29 08:29:01.5600457430 2022 /var/log/syslog
31441772         Mon Aug 29 08:29:08.0921238280 2022 /var/log/auth.log
1504046          Mon Aug 29 08:25:15.2373420090 2022 /var/log/redis/redis-server.log
8388608          Mon Aug 29 08:27:04.5226471030 2022 /var/log/journal/e8dsfe54457bd2f6a44344e1/user-1000.journal
33554432         Mon Aug 29 08:29:08.5241289930 2022 /var/log/journal/e8dsfe54457bd2f6a44344e1/system.journal
1183995          Mon Aug 29 08:25:17.2973666000 2022 /var/log/apache2/wiki.error.log
1170720          Mon Aug 29 08:25:17.3013666480 2022 /var/log/apache2/wiki.access.log
1166389          Mon Aug 29 08:25:14.0613279700 2022 /var/log/apache2/cloud.access.log
1243871          Mon Aug 29 08:25:13.9613267760 2022 /var/log/apache2/bg.mirror.access.log

Linux 5.15.0-46-generic (szs.space) 	08/29/22 	_x86_64_	(16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.50    0.00    0.73    1.06    0.00   95.71

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
sda2             55.71       454.89       741.44       421.11   32361849   52747168   29958424
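The find output above is listed in directory order, so the largest files are not necessarily at the top. A possible variation is to print only size and path and sort by size. The sketch below assumes GNU find (-printf, -mmin) and uses a hypothetical helper name, recent_big_files.

```shell
#!/bin/sh
# recent_big_files DIR...: list files over 800 KiB modified in the last 10 minutes,
# largest first, as "<size in bytes><TAB><path>". Assumes GNU find.
# The helper name is illustrative only.
recent_big_files() {
    find "$@" -type f -size +800k -mmin -10 -printf '%s\t%p\n' 2>/dev/null | sort -rn
}

# Example: the five largest recently modified files under /var/log.
recent_big_files /var/log | head -n 5
```

As with the commands above, it usually needs to be run with sudo to see every file, and it can be wrapped in watch in the same way.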

From the file system: /proc/diskstats and /sys/block/<dev>/stat

The main advantage of this approach (proposed here) is that you do not need any special tools. Information about these files and the columns inside them can be obtained as follows:

cat /sys/block/sda/stat && cat /sys/block/sda/sda2/stat
              88794 40932 3793872 122954 51212 37863 1655608 55654 0   80500 191053 0   0   0   0   17864 12444
              88553 40932 3786488 122874 51212 37863 1655608 55654 0   80452 178529 0   0   0   0   0     0
# -   -  -    1     2     3       4      5     6     7       8     9   10    11     12  13  14  15  16    17
# 1   2  3    4     5     6       7      8     9    10       11    12  13    14     15  16  17  18  19    20
cat /proc/diskstats | grep -Pw 'sda[2]?'
  8   0  sda  88794 40932 3793872 122954 51212 37863 1655608 55654 0   80500 191053 0   0   0   0   17864 12444
  8   2  sda2 88553 40932 3786488 122874 51212 37863 1655608 55654 0   80452 178529 0   0   0   0   0     0
# 1   2  3    4     5     6       7      8     9    10       11    12  13    14     15  16  17  18  19    20
curl https://raw.githubusercontent.com/torvalds/linux/master/Documentation/ABI/testing/procfs-diskstats
What:		/proc/diskstats
Date:		February 2008
Contact:	Jerome Marchand <jmarchan@redhat.com>
Description:
		The /proc/diskstats file displays the I/O statistics
		of block devices. Each line contains the following 14
		fields:

		==  ===================================
		 1  major number
		 2  minor mumber
		 3  device name
		 4  reads completed successfully
		 5  reads merged
		 6  sectors read
		 7  time spent reading (ms)
		 8  writes completed
		 9  writes merged
		10  sectors written
		11  time spent writing (ms)
		12  I/Os currently in progress
		13  time spent doing I/Os (ms)
		14  weighted time spent doing I/Os (ms)
		==  ===================================

		Kernel 4.18+ appends four more fields for discard
		tracking putting the total at 18:

		==  ===================================
		15  discards completed successfully
		16  discards merged
		17  sectors discarded
		18  time spent discarding
		==  ===================================

		Kernel 5.5+ appends two more fields for flush requests:

		==  =====================================
		19  flush requests completed successfully
		20  time spent flushing
		==  =====================================

		For more details refer to Documentation/admin-guide/iostats.rst
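Because these counters are cumulative, a rate has to be derived from the difference between two samples. The sketch below does that for the sectors read (field 6) and sectors written (field 10), treating a sector as 512 bytes. The helper name disk_rate and the second sample in the demo are made up for illustration.

```shell
#!/bin/sh
# disk_rate DEV SECONDS BEFORE AFTER: print read and write throughput in kB/s
# for DEV, computed from two saved copies of /proc/diskstats taken SECONDS apart.
# Fields used (see the list above): 3 = device name, 6 = sectors read,
# 10 = sectors written. A sector is counted as 512 bytes.
disk_rate() {
    awk -v dev="$1" -v secs="$2" '
        $3 == dev && FNR == NR { r0 = $6; w0 = $10; next }   # first file: baseline
        $3 == dev              { r1 = $6; w1 = $10 }         # second file: later sample
        END {
            printf "%s read %.1f kB/s write %.1f kB/s\n", dev,
                (r1 - r0) * 512 / 1024 / secs, (w1 - w0) * 512 / 1024 / secs
        }' "$3" "$4"
}

# Demo with saved samples. The first line is the sda row shown above; the second
# is a made-up later sample (+2048 sectors read, +20480 written) for illustration.
before=$(mktemp)
after=$(mktemp)
echo "8 0 sda 88794 40932 3793872 122954 51212 37863 1655608 55654 0 80500 191053 0 0 0 0 17864 12444" > "$before"
echo "8 0 sda 88800 40932 3795920 122960 51300 37863 1676088 55700 0 80600 191200 0 0 0 0 17864 12444" > "$after"
disk_rate sda 10 "$before" "$after"
rm -f "$before" "$after"
```

For the demo values this prints sda read 102.4 kB/s write 1024.0 kB/s. On a live system the two files would be produced by copying /proc/diskstats twice with a sleep of the chosen length in between.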

Miscellaneous

Benchmark Tools

  • iozone – filesystem benchmark – generates and measures a variety of file operations. iozone has been ported to many machines and runs under many operating systems. Manual with examples: kongll.github.io/iozone
  • hdparm – get/set SATA/IDE device parameters – provides a command line interface to various kernel interfaces supported by the Linux SATA/PATA/SAS "libata" subsystem and the older IDE driver subsystem. Perform a read test:
sudo hdparm -tT /dev/nvme0n1
  • dd – convert and copy a file – perform write and read tests (greatly simplified, just as a note; with these parameters the test file is 16 GiB):
dd if=/dev/zero of=./test.file bs=4096k count=4096
dd if=./test.file of=/dev/null bs=4096k count=4096

Monitoring Tools

  • htop [-dCFhpustvH] – interactive process viewer – It is similar to top, but allows you to scroll vertically and horizontally, and interact using a pointing device (mouse). You can observe all processes running on the system, along with their command line arguments, as well as view them in a tree format, select multiple processes and act on them all at once. Tasks related to processes (killing, renicing) can be done without entering their PIDs.
  • top [-hv|-bcEeHiOSs1 -d secs -n max -u|U user -p pids -o field -w [cols]] – display Linux processes – it provides a dynamic real-time view of a running system. It can display system summary information as well as a list of processes or threads currently being managed by the Linux kernel. The types of system summary information shown and the types, order and size of information displayed for processes are all user configurable and that configuration can be made persistent across restarts.
  • atop – Advanced System & Process Monitor – The program atop is an interactive monitor to view the load on a Linux system. It shows the occupation of the most critical hardware resources (from a performance point of view) on system level, i.e. cpu, memory, disk and network. It also shows which processes are responsible for the indicated load with respect to cpu and memory load on process level. Disk load is shown per process if "storage accounting" is active in the kernel. Network load is shown per process if the kernel module netatop has been installed.

References