[Solved] InnoDB: Assertion

Discuss your pilot or production implementation with other Zimbra admins or our engineers.
Post Reply
User avatar
JDunphy
Outstanding Member
Outstanding Member
Posts: 889
Joined: Fri Sep 12, 2014 11:18 pm
Location: Victoria, BC
ZCS/ZD Version: 9.0.0_P39 NETWORK Edition

[Solved] InnoDB: Assertion

Post by JDunphy »

Greetings,

I am running 8.7.11 and I saw in my morning report from /opt/zimbra/libexec/zmdbintegrityreport that the integrity check had failed. I opened a support ticket with Zimbra. They asked me to follow this document https://wiki.zimbra.com/wiki/Mysql_Crash_Recovery after requesting my mysql_error.log file.

My limited inspection of mysql_error.log shows this:

Code: Select all

...
InnoDB: (index "PRIMARY" of table "mboxgroup3"."mail_item")
InnoDB: Corruption of an index tree: table "mboxgroup3"."mail_item", index "PRIMARY",
InnoDB: father ptr page no 1965, child page no 1962
...
followed by this:

Code: Select all

2018-07-16 10:57:28 140645797340960 [Note] /opt/zimbra/common/sbin/mysqld: ready for connections.
Version: '10.1.25-MariaDB'  socket: '/opt/zimbra/data/tmp/mysql/mysql.sock'  port: 7306  Zimbra MariaDB binary distribution
2018-07-16 10:58:49 7feaa6bd4b00  InnoDB: Assertion failure in thread 140645796498176 in file btr0pcur.cc line 457
InnoDB: Failing assertion: btr_page_get_prev(next_page, mtr) == buf_block_get_page_no(btr_pcur_get_block(cursor))
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
I have isolated the table to one user:

Code: Select all

[zimbra@mail ~]$ mysql -e "check table mboxgroup3.mail_item"
ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query
My attempts to check or verify that table results in a restart of mysql and subsequent repair shows this in the mysql_error.log. Note: I have not restarted mysql with innodb_force_recovery = 1 as of yet.

Code: Select all

InnoDB: The log sequence number 9519241261 in ibdata file do not match the log sequence number 9519701606 in the ib_logfiles!
...
...
Recovering after a crash using tc.log
Starting crash recovery...
Crash recovery finished.
...
/opt/zimbra/common/sbin/mysqld: ready for connections.
Does anyone have experience fixing what appears to be a corrupt table. The document above seems overly drastic by dropping all tables in addition to this scary command. I asked for confirmation and they told me to follow it:

Code: Select all

rm -rf /opt/zimbra/db/data/ib*
Following by a possible, zmmyinit, etc.

The user currently doesn't notice any issues nor does the system behave differently other than this error. Does anyone have a different approach to this problem? I was thinking putting it into recovery mode and looking for bad tables first:

Code: Select all

source ~/bin/zmshutil; zmsetvars
mysqlcheck --all-databases -S $mysql_socket -u root --password=$mysql_root_password
then dumping/dropping/recreating the appropriate table provided this is the proper solution. I am in the information gathering phase. It's been probably 12 years since I had to fix a database so a little rusty at present.

Any advice would be appreciated.

Jim
Last edited by JDunphy on Tue Jul 17, 2018 7:47 pm, edited 1 time in total.
User avatar
JDunphy
Outstanding Member
Outstanding Member
Posts: 889
Joined: Fri Sep 12, 2014 11:18 pm
Location: Victoria, BC
ZCS/ZD Version: 9.0.0_P39 NETWORK Edition

Re: InnoDB: Assertion

Post by JDunphy »

I have been able to create a test environment from my production box but not having much luck with this mboxgroup3 yet. I now know it is two tables inside that mboxgroup3 database - (mail_item_dumpster and mail_item) and it fails with mysqldump. I was able to dump all the other DB's from zimbra and they are clean. My process is:

Code: Select all

$ su - zimbra
% vi conf/my.cnf   (change innodb_force_recovery=X)
% mysql.server restart
% source ~/bin/zmshutil; zmsetvars
% /opt/zimbra/common/bin/mysqldump mboxgroup3 -S $mysql_socket -u root --password=$mysql_root_password > /tmp/mysql.sql/mboxgroup3
mysqldump: Error 2013: Lost connection to MySQL server during query when dumping table `mail_item` at row: 32657
I have incremented my way past 4 which is data loss and no change up to 6 before giving up on that method. I can see in mysql_error.log that innodb_force_recovery is being set after each restart.

Previously, I walked through the tables via something like this to locate the bad tables:

Code: Select all

 [zimbra@tmail ~]$ mysql -e "check table mboxgroup3.mail_item_dumpster"
+-------------------------------+-------+----------+-------------------------------------------------------------------------+
| Table                         | Op    | Msg_type | Msg_text                                                                |
+-------------------------------+-------+----------+-------------------------------------------------------------------------+
| mboxgroup3.mail_item_dumpster | check | Warning  | InnoDB: Index 'i_type' contains 1806 entries, should be 1800.           |
| mboxgroup3.mail_item_dumpster | check | Warning  | InnoDB: Index 'i_parent_id' contains 1806 entries, should be 1800.      |
| mboxgroup3.mail_item_dumpster | check | Warning  | InnoDB: Index 'i_folder_id_date' contains 1806 entries, should be 1800. |
| mboxgroup3.mail_item_dumpster | check | Warning  | InnoDB: Index 'i_index_id' contains 1806 entries, should be 1800.       |
| mboxgroup3.mail_item_dumpster | check | Warning  | InnoDB: Index 'i_date' contains 1806 entries, should be 1800.           |
| mboxgroup3.mail_item_dumpster | check | Warning  | InnoDB: Index 'i_mod_metadata' contains 1806 entries, should be 1800.   |
| mboxgroup3.mail_item_dumpster | check | Warning  | InnoDB: Index 'i_uuid' contains 1806 entries, should be 1800.           |
| mboxgroup3.mail_item_dumpster | check | error    | Corrupt                                                                 |
+-------------------------------+-------+----------+-------------------------------------------------------------------------+
Unfortunately, mboxgroup3.mail_item is the bad one because it causes an assertion and restart of mysql on failure and has me stumped.

Code: Select all

[zimbra@tmail ~]$ mysql -e "check table mboxgroup3.mail_item"
ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query
Learning a bit more... found https://github.com/chhabhaiya/undrop-for-innodb and starting to dig deeper via: https://twindb.com/recover-corrupt-mysql-database/

We will see what answers my support contract and associated open ticket provides tomorrow. :?:
User avatar
JDunphy
Outstanding Member
Outstanding Member
Posts: 889
Joined: Fri Sep 12, 2014 11:18 pm
Location: Victoria, BC
ZCS/ZD Version: 9.0.0_P39 NETWORK Edition

Re: InnoDB: Assertion

Post by JDunphy »

Small success this morning... First thing was to fix the mboxgroup3.mail_item_dumpster forcing it to recreate + analyze. Other important fact is zimbra has innodb_file_per_table defined in my.cnf

Code: Select all

[zimbra@tmail ~]$ mysql -e "check table mboxgroup3.mail_item_dumpster"
+-------------------------------+-------+----------+-------------------------------------------------------------------------+
| Table                         | Op    | Msg_type | Msg_text                                                                |
+-------------------------------+-------+----------+-------------------------------------------------------------------------+
| mboxgroup3.mail_item_dumpster | check | Warning  | InnoDB: Index 'i_type' contains 1806 entries, should be 1800.           |
| mboxgroup3.mail_item_dumpster | check | Warning  | InnoDB: Index 'i_parent_id' contains 1806 entries, should be 1800.      |
| mboxgroup3.mail_item_dumpster | check | Warning  | InnoDB: Index 'i_folder_id_date' contains 1806 entries, should be 1800. |
| mboxgroup3.mail_item_dumpster | check | Warning  | InnoDB: Index 'i_index_id' contains 1806 entries, should be 1800.       |
| mboxgroup3.mail_item_dumpster | check | Warning  | InnoDB: Index 'i_date' contains 1806 entries, should be 1800.           |
| mboxgroup3.mail_item_dumpster | check | Warning  | InnoDB: Index 'i_mod_metadata' contains 1806 entries, should be 1800.   |
| mboxgroup3.mail_item_dumpster | check | Warning  | InnoDB: Index 'i_uuid' contains 1806 entries, should be 1800.           |
| mboxgroup3.mail_item_dumpster | check | error    | Corrupt                                                                 |
+-------------------------------+-------+----------+-------------------------------------------------------------------------+
Next

Code: Select all

[zimbra@mail ~]$ mysql -e "optimize table mboxgroup3.mail_item_dumpster"
+-------------------------------+----------+----------+-------------------------------------------------------------------+
| Table                         | Op       | Msg_type | Msg_text                                                          |
+-------------------------------+----------+----------+-------------------------------------------------------------------+
| mboxgroup3.mail_item_dumpster | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| mboxgroup3.mail_item_dumpster | optimize | status   | OK                                                                |
+-------------------------------+----------+----------+-------------------------------------------------------------------+
Repaired

Code: Select all

[zimbra@mail ~]$ mysql -e "check table mboxgroup3.mail_item_dumpster"
+-------------------------------+-------+----------+----------+
| Table                         | Op    | Msg_type | Msg_text |
+-------------------------------+-------+----------+----------+
| mboxgroup3.mail_item_dumpster | check | status   | OK       |
+-------------------------------+-------+----------+----------+
Now moving to the last table mail_item. This is going to be much harder and will cause a fatal assertion error and restart of mysql should you attempt to read the corrupted rows... stopping backups dead in their tracks for that database until it is repaired. Still researching my options.
User avatar
L. Mark Stone
Ambassador
Ambassador
Posts: 2796
Joined: Wed Oct 09, 2013 11:35 am
Location: Portland, Maine, US
ZCS/ZD Version: 10.0.6 Network Edition
Contact:

Re: InnoDB: Assertion

Post by L. Mark Stone »

Not at all sure this would work, but can you export the mailbox, then delete it, then recreate it and import the backed up tgz file? The user I expect cares less about the state of MariaDB and more about their email (and I also expect I'm telling you something you already know as well! :-) )

But perhaps before you do that, can you compare the size of the InnoDB buffer pool in my.cnf with the actual size of the InnoDB database (the buffer pool should be at least 25% larger than the InnoDB database itself)? Most of the MySQL corruption I have seen happens when the buffer pool is smaller than the database, so lots of disk thrashing as table data is moved back and forth between RAM and disk. Easy to change in my.cnf, so if the InnoDB database size is, say, 5.4GB, you can change the innodb_buffer_pool_size variable equal to something like 1.25 x 5.4 = 6.75, so round up a little to "7168M". Using the M suffix makes the entry easier to read as well.

I like the tool MySQLTuner, which runs some other checks as well:

As root,

Code: Select all

wget mysqltuner.pl .
mv index.html /opt/zimbra/mysqltuner.pl
chown zimbra.zimbra /opt/zimbra/mysqltuner.pl
chmod +x /opt/zimbra/mysqltuner.pl
su - zimbra
./mysqltuner.pl
Hope that helps,
Mark
___________________________________
L. Mark Stone
Mission Critical Email - Zimbra VAR/BSP/Training Partner https://www.missioncriticalemail.com/
AWS Certified Solutions Architect-Associate
User avatar
JDunphy
Outstanding Member
Outstanding Member
Posts: 889
Joined: Fri Sep 12, 2014 11:18 pm
Location: Victoria, BC
ZCS/ZD Version: 9.0.0_P39 NETWORK Edition

Re: InnoDB: Assertion

Post by JDunphy »

Thanks Mark,

I should be good and run around 30%. I also need to look for hardware errors with my cloud provider for root cause. Still investigating all options.

I contemplated going back to drop database but tried this instead to save myself more work. What I had - A good zimbra full backup and a recent export in .tgz to fill missing data. I also had newer zimbra corrupt backups once the error happened.

Here is what I ended up doing as I didn't want to force a mysql reset assertion accessing row in that table.

Code: Select all

MariaDB [(none)]> SET FOREIGN_KEY_CHECKS = 0;
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> truncate table mboxgroup3.mail_item;
Query OK, 0 rows affected (0.02 sec)

MariaDB [(none)]> SET FOREIGN_KEY_CHECKS = 1;
Query OK, 0 rows affected (0.00 sec)
Then

Code: Select all

[zimbra@tmail ~]$ mysql -e "check table mboxgroup3.mail_item"
+----------------------+-------+----------+----------+
| Table                | Op    | Msg_type | Msg_text |
+----------------------+-------+----------+----------+
| mboxgroup3.mail_item | check | status   | OK       |
+----------------------+-------+----------+----------+
[zimbra@tmail ~]$  /opt/zimbra/libexec/zmdbintegrityreport
[zimbra@tmail ~]$ echo $?
0
The problem I ran into was any reference to mboxgroup3.mail_item inconsistencies would restart mysql when that assertion hit killing any attempt to restore or backup data. That means that all zimbra backups were useless from the error forward. I had to pull my last good backup that was offsite, and did the following:

Code: Select all

zmrestore -a user@example.com  --ignoreRedoErrors
For the newer data, we did an export from the date of the last full backup to now prior then re-imported that tgz file. Had user check and everything looks good. We were lucky that the problems appeared in older data in the table.

Now I need to do it for real on the production box. Left as an exercise for the reader. :-) I was very fortunate that the users didn't notice any problems so a little lucky here.

Lessons learned. Run zmdbintegrityreports more than the default once per week and a lot more especially if you have some sort of hot standby because you might be surprised. Put more effort into db integrity. I am sure a lot more lessons once I get through all my notes.
User avatar
L. Mark Stone
Ambassador
Ambassador
Posts: 2796
Joined: Wed Oct 09, 2013 11:35 am
Location: Portland, Maine, US
ZCS/ZD Version: 10.0.6 Network Edition
Contact:

Re: [Solved] InnoDB: Assertion

Post by L. Mark Stone »

Well that's good fortune there for sure!

Not sure if you know about it, but for 8.7 and earlier, if you enable MariaDB Binary Logging (you need some disk space for this for sure), then you can do:

Code: Select all

zmlocalconfig -e mysql_backup_retention=8
which will cause Zimbra to run nightly MySQL dumps as part the nightly Network Edition backups, and keep 8 days' worth.

To enable binary logging, you need to add to /opt/zimbra/conf/my.cnf (probably you know all about this part too...)

Code: Select all

log-bin = /opt/zimbra/backup/mysql/binlog
binlog-format = MIXED
max_binlog_size = 134217728
expire_logs_days = 9
All the best,
Mark
___________________________________
L. Mark Stone
Mission Critical Email - Zimbra VAR/BSP/Training Partner https://www.missioncriticalemail.com/
AWS Certified Solutions Architect-Associate
User avatar
JDunphy
Outstanding Member
Outstanding Member
Posts: 889
Joined: Fri Sep 12, 2014 11:18 pm
Location: Victoria, BC
ZCS/ZD Version: 9.0.0_P39 NETWORK Edition

Re: [Solved] InnoDB: Assertion

Post by JDunphy »

L. Mark Stone wrote:Well that's good fortune there for sure!

Not sure if you know about it, but for 8.7 and earlier, if you enable MariaDB Binary Logging (you need some disk space for this for sure), then you can do:

Code: Select all

zmlocalconfig -e mysql_backup_retention=8
which will cause Zimbra to run nightly MySQL dumps as part the nightly Network Edition backups, and keep 8 days' worth.

To enable binary logging, you need to add to /opt/zimbra/conf/my.cnf (probably you know all about this part too...)

Code: Select all

log-bin = /opt/zimbra/backup/mysql/binlog
binlog-format = MIXED
max_binlog_size = 134217728
expire_logs_days = 9
All the best,
Mark
I learned about that today. Glad you mentioned that to complete this thread. I might still explore using undrop-for-innodb as it appears I could have fixed this with that tool but there is a little bit of a learning curve and I wanted to get back to my happy place before digging into more detail with that method. I have never had an innodb corruption before. Hopefully I have now paid my dues to scare future bad luck away. knock on wood. :-)
User avatar
JDunphy
Outstanding Member
Outstanding Member
Posts: 889
Joined: Fri Sep 12, 2014 11:18 pm
Location: Victoria, BC
ZCS/ZD Version: 9.0.0_P39 NETWORK Edition

Re: [Solved] InnoDB: Assertion

Post by JDunphy »

After investigation with the cloud provider, it was determined that a faulty raid controller in conjunction with power issues in the Fremont data center was likely behind the corruption given the dates we first saw that InnoDB assertion failure in the mysql_error.log. We never saw a power outage on the server so it could also have been something like not using ECC memory or the any other assortments where a bit flip can occur.

As a result, we moved to new hardware and did an upgrade at the same time increasing physical memory to further mitigate this from reoccurring. The more I think about the issues, I am not sure I have a backup strategy that can handle this problem properly. We were lucky we didn't have data loss but that was more good fortune than process IMO. Initially, I added more backing store on the server and increased memory to the java virtual machine (30%) and about (25%) to the mysql service. I also increased innodb_buffer_pool_size. I never saw evidence however this machine experienced load or memory shortfalls from the zimbra logs and only did that as a precaution.

Of note: the solutions I was contemplating including hot imap sync migration to a new server would have failed as that mysql reset forced apps to behave in unexpected ways. Zimbra backups quietly terminated without completion on the users account as did other methods. Rsyncing the server allowed me to replicate the configuration in a test environment to experiment for possible solutions and cost me under a $1 for the rental. Unfortunately, I can now think of more failure scenarios than working scenarios for my backup strategies for these type of events.

I am left wondering how zimbra 8.8.X backups and a newer code base will handle these "can't happen events" by software.
User avatar
L. Mark Stone
Ambassador
Ambassador
Posts: 2796
Joined: Wed Oct 09, 2013 11:35 am
Location: Portland, Maine, US
ZCS/ZD Version: 10.0.6 Network Edition
Contact:

Re: [Solved] InnoDB: Assertion

Post by L. Mark Stone »

Disaster Recovery strategies always have a cost:benefit:risk:likelihood matrix around them that different folks in a company will weight differently.

The worst Zimbra disaster I ever came across was a 450-mailbox server where the sysadmins had run the installer a few times as part of an attempted upgrade with expired certificates.

There were no backups, and with the original .saveconfig long since gone, LDAP and MySQL were wiped clean. All we had for data was /opt/zimbra/store/0 and all the subdirectories.

We bought a backup MX Service to use for queueing inbound email, and copied the store directory to a file server.

A dozen staff were called in to read emails in the store tree, and once a directory was identified as belonging to a particular user, the sub directory was renamed with the users name.

While this was going on, we stood up a Courier IMAP server adjacent to the file server, and stood up a new Zimbra server as well.

Oh, and there was no spreadsheet not other comprehensive list of employees.

Once the store tree subdirectories were renamed entirely, we used the subdirectories listing to create a Zimbra bulk provisioning script and made all the new mailboxes.

We then pointed the Courier IMAP server at the file server to serve up as its own message store, and then ran IMAPSYNC to recover the emails into the new Zimbra server.

The original server was inadvertently destroyed on a Thursday evening after hours. By midday Saturday staff were using the new Zimbra server, getting new emails, and working on rebuilding or restoring their address books and calendars from local copies. The IMAPSYNC finished on Sunday.

Monday morning started normally.

So, in my mind, so long as you take steps to protect the integrity of /opt/zimbra/store, you can pretty much recover from a total server failure within a few days.

I’m very keen to try the new centralized store function on a replicated or distributed file system. I expect it will make recovery from mailbox server disasters much easier and quicker.

Hope that helps, and glad you are back up and running!

Mark

P. S. Never say never, but I don’t think that AWS infrastructure would have suffered the same way in the event of a hardware issue like that.
___________________________________
L. Mark Stone
Mission Critical Email - Zimbra VAR/BSP/Training Partner https://www.missioncriticalemail.com/
AWS Certified Solutions Architect-Associate
Post Reply