Change the graphed value from "current absolute value" to "15min average
value" and tune the warning limits accordingly. The healing value can
briefly be 1, and gluster will fix that by itself. Send a warning only if
it stays at 1 (or above) for 15 minutes. This will still send a warning
immediately if it goes to 3 or above.
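
The effect of averaging can be sketched as follows. This is only an illustration, not the plugin's code: it assumes Munin's default 5-minute polling interval (so 15 minutes is three samples), a warning limit of 1 on the average, and that missing history counts as 0. The function name is hypothetical.

```python
from collections import deque

def rolling_average_warnings(samples, window=3, warn_at=1.0):
    """Return one warning flag per sample, based on the rolling average."""
    recent = deque(maxlen=window)
    flags = []
    for value in samples:
        recent.append(value)
        # Divide by the full window so that missing history counts as 0.
        average = sum(recent) / window
        flags.append(average >= warn_at)
    return flags

# A value that stays at 1 only warns once it has been there for three samples.
steady = rolling_average_warnings([0, 0, 1, 1, 1])
# A single spike to 3 lifts the average to 1 right away.
spike = rolling_average_warnings([0, 0, 3])
```

Under these assumptions `steady` only flags the last sample, while `spike` flags the sample where the value jumps to 3.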
The plugin doesn't work with '--write-mostly' drives like sde1 in:
md94 : active raid1 sde1[1](W)(S) sdd1[2]
I assume it is safe to remove everything from the opening bracket onward.
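
A minimal sketch of that stripping, assuming the line comes from /proc/mdstat, where "(W)" marks a write-mostly member and "(S)" a spare. The function name is made up for the example and the plugin's real parsing may differ.

```python
def member_devices(mdstat_line):
    """Return the member device names of one /proc/mdstat array line."""
    # Everything after the colon: state, RAID level, then the members.
    _, _, rest = mdstat_line.partition(":")
    # Members are the tokens carrying a "[n]" role number; cut each token
    # at its first opening bracket so "[1](W)(S)" disappears entirely.
    return [token.split("[", 1)[0] for token in rest.split() if "[" in token]

devices = member_devices("md94 : active raid1 sde1[1](W)(S) sdd1[2]")
```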
When installing or first running the plugin, it can take longer than
one munin-update call to create the $TIMEFILE. So until it is created,
we assume the $LOCKFILE is not too old yet and let the first run finish
before we spawn additional processes.
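
The decision described above could look roughly like this. It is a sketch under assumptions: the function names and the `max_age_seconds` parameter are hypothetical, and the real plugin may compare different timestamps.

```python
import os
import time

def lock_is_stale(lock_age_seconds, timefile_exists, max_age_seconds):
    """Decide whether an existing lock may be ignored."""
    # No time file yet: the first run is probably still working, so the
    # lock is never treated as stale.
    if not timefile_exists:
        return False
    return lock_age_seconds > max_age_seconds

def lock_is_stale_on_disk(lockfile, timefile, max_age_seconds):
    """Same decision, taking the inputs from the file system."""
    lock_age = time.time() - os.path.getmtime(lockfile)
    return lock_is_stale(lock_age, os.path.exists(timefile), max_age_seconds)
```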
During certain situations, a device in the btrfs pool can show a total
capacity of 0 bytes. This is especially true when replacing or removing a
failed disk.
This fix stops the plugin from crashing in that situation and instead reports
the device's percentage as unknown (U). That way the other devices in the pool
can still be monitored.
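
A guard of this kind might look as follows. The function name and the two-decimal output format are assumptions made for the example; "U" is the value Munin accepts for "unknown".

```python
def usage_percent(used_bytes, total_bytes):
    """Return the usage as a Munin value string, or "U" if it is unknown."""
    # A device that is being replaced or removed can report 0 bytes total;
    # dividing by that would raise ZeroDivisionError.
    if total_bytes <= 0:
        return "U"
    return f"{100.0 * used_bytes / total_bytes:.2f}"
```

With this, `usage_percent(50, 200)` yields "25.00" and `usage_percent(0, 0)` yields "U" instead of crashing.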
This does not seem logical, because it explicitly wants spaces around equal signs when setting a variable, but you mustn't use them in parameters. But well, whatever makes the linter happy.
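
The message does not name the linter. Assuming it follows PEP 8-style rules, as Python linters commonly do, the two cases look like this (the function is made up for the example):

```python
# No spaces around "=" for a parameter default.
def scale(value, factor=2):
    return value * factor

# Spaces around "=" for the assignment, none for the keyword argument.
result = scale(21, factor=2)
```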
With serials like
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     285S100HTYST         KXG50ZNV512G TOSHIBA                     1         512,11 GB / 512,11 GB      512 B + 0 B      AAGA4106
/dev/nvme1n1     S4GENX0N713949       SAMSUNG MZVLB512HBJQ-00000               1         474,89 GB / 512,11 GB      512 B + 0 B      EXF7201Q
the internal names (by serial number) were rewritten to
Field         Internal name     Type    Warn  Crit  Info
/dev/nvme0n1  _85S100HTYST_w    derive
/dev/nvme1n1  S4GENX0N713949_w  derive
--> the leading underscore in the case of nvme0n1 created problems in graph processing:
[RRD ERROR] Unable to graph /var/cache/munin/www/lxdserver/....agitos.de/nvme_bytes-month.png : undefined vname c285S100HTYST_r
Therefore I added a prefix 'SN_' for internal names.
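
The effect of the prefix can be sketched like this. The cleanup rules below are a simplified approximation of how Munin field names get sanitised (an assumption, not the plugin's code), and both function names are hypothetical.

```python
import re

def clean_fieldname(name):
    """Approximate field name cleanup: no leading digit, only [A-Za-z0-9_]."""
    name = re.sub(r"^[^A-Za-z_]", "_", name)
    return re.sub(r"[^A-Za-z0-9_]", "_", name)

def field_for_serial(serial, suffix):
    """Build an internal name that always starts with the letters 'SN_'."""
    return clean_fieldname("SN_" + serial) + "_" + suffix

without_prefix = clean_fieldname("285S100HTYST") + "_w"
with_prefix = field_for_serial("285S100HTYST", "w")
```

Without the prefix the leading digit is replaced, giving "_85S100HTYST_w". With it the serial number stays intact: "SN_285S100HTYST_w".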
There is some trickery going on to avoid waking up the disk when it's in standby.
Note: this was aimed at munin-c, but was rejected since it uses a
subprocess that calls the `smartctl` tool.
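
One way to query a disk without spinning it up, which may or may not be what this plugin does, is smartctl's `-n standby` option: it makes smartctl skip the check when the disk is in standby. The sketch below is an assumption-laden illustration; the function names are hypothetical.

```python
import subprocess

def smartctl_command(device):
    """Build a smartctl call that does not wake a disk from standby."""
    # -n standby: skip the check if the disk is in standby mode
    # -A: print the vendor-specific SMART attributes
    return ["smartctl", "-n", "standby", "-A", device]

def read_attributes(device):
    """Return smartctl's output, or None if there is nothing usable."""
    result = subprocess.run(smartctl_command(device),
                            capture_output=True, text=True)
    # Simplification: smartctl's exit status is a bit mask, so a non-zero
    # value can also carry health flags. Here it just means "no data".
    if result.returncode != 0:
        return None
    return result.stdout
```

A caller would report the values as unknown (U) when `read_attributes` returns None, so a sleeping disk stays asleep.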