Still watching the rain fall ... with my laptop and a big screen TV ... LoL
Well, I managed to restore a backup from yesterday and had a look inside one of those huge files.
In vm.csv, the file contains nothing but the CSV header ... weird ...
This line:
timestamp, r, b, swpd, free, buff, cache, si, so, bi, bo, in, cs, us, sy, id, wa, st, MemTotal, MemFree, MemAvailable, Buffers, Cached, SwapCached, Active, Inactive, Active(anon), Inactive(anon), Active(file), Inactive(file), Unevictable, Mlocked, SwapTotal, SwapFree, Dirty, Writeback, AnonPages, Mapped, Shmem, Slab, SReclaimable, SUnreclaim, KernelStack, PageTables, NFS_Unstable, Bounce, WritebackTmp, CommitLimit, Committed_AS, VmallocTotal, VmallocUsed, VmallocChunk, HardwareCorrupted, AnonHugePages, CmaTotal, CmaFree, HugePages_Total, HugePages_Free, HugePages_Rsvd, HugePages_Surp, Hugepagesize, DirectMap4k, DirectMap2M, loadavg
Repeated XXXX times.
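If anyone wants to check their own files, here is a minimal Python sketch (my own, not part of Zimbra or zmstat) that counts how many lines are just the header repeated and how many are real data rows. It assumes the first line of the file is the header and that lines are newline-terminated. It streams the file line by line, so it should cope with a ~39 GB file without loading it into memory.

```python
import sys


def count_header_repeats(path):
    """Return (header_lines, data_lines) for a CSV file.

    Assumes the first line is the header. Every later line identical to it
    is counted as a repeated header; every other non-blank line is data.
    """
    header = None
    header_lines = 0
    data_lines = 0
    # Binary mode avoids decode errors if the file contains garbage bytes.
    with open(path, "rb") as f:
        for raw in f:
            line = raw.rstrip(b"\r\n")
            if header is None:
                header = line  # first line defines the header
                header_lines = 1
            elif line == header:
                header_lines += 1
            elif line:  # blank lines are ignored
                data_lines += 1
    return header_lines, data_lines


if __name__ == "__main__" and len(sys.argv) > 1:
    headers, data = count_header_repeats(sys.argv[1])
    print(f"header lines: {headers}, data lines: {data}")
```

Usage would be something like `python3 count_headers.py vm.csv`. On a file like the one above, the data count should be zero or close to it.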
But this directory:
drwxr-x--- 1 zimbra zimbra 52 Mar 11 19:50 2018-03-10/
contains an archived vm.csv that is intact up to this last line:
03/10/2018 11:07:30, 1, 0, 308, 2737300, 18732, 1584860, 0, 0, 147, 1584, 675, 2421, 10, 2, 84, 3, 0, 9292656, 2737300, 4145340, 18732, 1469668, 268, 30
So can I conclude that something actually corrupted the file around 11 AM? Here is the reboot history:
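To pin down when the corruption started, a similar sketch can return the timestamp of the last data row before the header first repeats (or the last row in the file, if the header never repeats). Again this is my own helper, not a zmstat tool. It assumes the timestamp is the first comma-separated field in the `MM/DD/YYYY HH:MM:SS` format seen in the line above, and it opens `.gz` archives with Python's `gzip` module so the archived copies can be checked without unpacking them first.

```python
import gzip
import sys
from datetime import datetime

# Timestamp format as seen in vm.csv, e.g. "03/10/2018 11:07:30".
TS_FORMAT = "%m/%d/%Y %H:%M:%S"


def last_good_timestamp(path):
    """Return the timestamp of the last data row before the header repeats.

    Returns None if no data row precedes the first repeated header.
    Raises ValueError if that row's first field is not a valid timestamp.
    """
    opener = gzip.open if path.endswith(".gz") else open
    header = None
    last_row = None
    with opener(path, "rb") as f:
        for raw in f:
            line = raw.rstrip(b"\r\n")
            if header is None:
                header = line  # first line defines the header
                continue
            if line == header:
                break  # first repeated header: treat as start of corruption
            if line:
                last_row = line
    if last_row is None:
        return None
    field = last_row.split(b",", 1)[0].decode("ascii", errors="replace").strip()
    return datetime.strptime(field, TS_FORMAT)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(last_good_timestamp(sys.argv[1]))
```

Run against the archived file, this should print the timestamp of its last good line, which can then be compared with the reboot times below.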
reboot system boot 4.4.0-116-generi Sat Mar 10 10:27 - 11:07 (00:40) -- the reboot after the upgrade, when I first noticed the problem
reboot system boot 4.4.0-116-generi Sat Mar 10 14:13 - 18:06 (03:52) -- another test to reproduce the problem
reboot system boot 4.4.0-116-generi Sat Mar 10 18:06 still running -- rebooted again; I wanted to wait a bit, so I went for a walk...
And here are the timestamps of the files that were created. After the 18:06 reboot, I stopped the zmstat service and deleted these huge files:
-rw------- 1 zimbra zimbra 38096896000 Mar 10 18:13 io-x.csv
-rw------- 1 zimbra zimbra 37955010560 Mar 10 18:13 io.csv
-rw------- 1 zimbra zimbra 39309541376 Mar 10 18:13 vm.csv
These were created after the reboot at 14:13:
-rw-r----- 1 zimbra zimbra 242968560 Mar 10 14:31 io.csv.gz
-rw-r----- 1 zimbra zimbra 208158485 Mar 10 14:28 vm.csv.gz
-rw-r----- 1 zimbra zimbra 621222084 Mar 10 14:43 io-x.csv.gz
I haven't found an explanation for the problem, but since I deleted those files ('vm.csv' and 'io*'), it hasn't shown up again.
I will keep looking.
Stay sharp
PS: Edited to reorganize the flow of events.