Performance improvement help

Post by daliose01 »

I work at a company running Zimbra 8.8.9 on CentOS 7 with the latest updates.
The system had 32 GB RAM for the first 6 months after Zimbra was installed and was then upgraded to 48 GB RAM because of performance issues
with MySQL and Java eating RAM all the time. After the RAM upgrade we made no changes to the Zimbra config, since the server was running smoothly again. Now we have the same issue again.
There are about 100 ~ 150 email accounts registered on this server and all users use the web client only. The server is not a VM and runs nothing but Zimbra.

I have looked through several forums and pages where people recommend changing mailboxd_java_heap_memory_percent
or mailboxd_java_heap_size if Java is memory hungry, but I am not sure what the proper values are for our server. Also, are there other tweaks that might help get the system running smoothly?

Any hint in the right direction is much appreciated!

Maik

Re: Performance improvement help

Post by pup_seba »

Hi,

"The system had 32 GB RAM for the first 6 months after Zimbra was installed and was then upgraded to 48 GB RAM because of performance issues"
Why did you add more RAM? Were those performance issues related to memory?

"with MySQL and Java eating RAM all the time."
Both applications have configuration settings that tell them how much RAM to consume. Most of the systems I deploy leave the OS with only 4-6 GB available (this triggers most of the default monitoring alarms my customers have, so they have to tweak the alarms; the configuration is fine).

"After the RAM upgrade we made no changes to the Zimbra config, since the server was running smoothly again."
So most likely what was running out of RAM was your OS, not Zimbra, as the RAM consumption of both your JVM and your MariaDB is set in their configuration files.

"Now we have the same issue again."
By "same issue" I guess you mean "system being slow", but again, it doesn't seem that you know why your system is slow.

"There are about 100 ~ 150 email accounts registered on this server and all users use the web client only."
That sounds like a really big server for so few accounts. Could you tell us how many mails you send/receive daily (you should see a daily report email with this info in your "admin" account) and how much space your store is using (du -sh /opt/zimbra/store)?

"I have looked through several forums and pages where people recommend changing mailboxd_java_heap_memory_percent
or mailboxd_java_heap_size if Java is memory hungry, but I am not sure what the proper values are for our server. Also, are there other tweaks that might help get the system running smoothly?"
Before jumping into changing things based on what others recommend, try to understand:
1. Why your server is running "slow" (also, if you could explain what you mean by slow, that would help. Maybe the web client responds slowly, maybe mails take 20 minutes to hit the mailbox...).
2. How the different components are configured in Zimbra. A good start is to read and understand this: https://wiki.zimbra.com/wiki/Performanc ... eployments
Although mostly outdated, it will give you some hints.
(Also, mailboxd_java_heap_memory_percent is quite old and does not apply to your Zimbra version.)

Although it seems that Zimbra still needs some tweaking of how the JVM and MariaDB are configured, with that amount of memory I really doubt that so few users can generate the traffic and mailstore size required to bring such a system down. Also, some more details about your configuration and the "slowness" are needed to understand the problem and actually help you. For instance:
Outputs of running:
$ free -h
$ cat /opt/zimbra/conf/my.cnf
$ zmlocalconfig mailboxd_java_heap
$ zmprov gas
$ zmcontrol status (on each zimbra server you may have)
$ du -sh /opt/zimbra/store
$ du -sh /opt/zimbra/db/data
$ zmlocalconfig ldap_common_threads
$ cat /proc/cpuinfo | grep processor | wc -l
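
If it is easier, the commands above can be gathered into one report in a single run. A minimal sketch in Python, assuming it is run as the zimbra user on the mail server; `COMMANDS` and `collect` are names made up for this example, the wrapper is not a Zimbra tool, and `grep -c` stands in for the `cat | grep | wc -l` pipeline:

```python
import subprocess

# The commands from the list above, as argv lists (duplicates dropped).
COMMANDS = [
    ["free", "-h"],
    ["cat", "/opt/zimbra/conf/my.cnf"],
    ["zmlocalconfig", "mailboxd_java_heap"],
    ["zmprov", "gas"],
    ["zmcontrol", "status"],
    ["du", "-sh", "/opt/zimbra/store"],
    ["du", "-sh", "/opt/zimbra/db/data"],
    ["zmlocalconfig", "ldap_common_threads"],
    ["grep", "-c", "processor", "/proc/cpuinfo"],
]


def collect(commands):
    """Run each command and return one text report (hypothetical helper)."""
    parts = []
    for argv in commands:
        try:
            done = subprocess.run(argv, capture_output=True, text=True)
            # Keep stderr too, since warnings may be printed there.
            output = done.stdout + done.stderr
        except FileNotFoundError:
            output = "command not found\n"
        parts.append("$ " + " ".join(argv) + "\n" + output)
    return "\n".join(parts)
```

Calling `print(collect(COMMANDS))` prints each command followed by whatever it wrote. A command that is not installed shows up as "command not found" instead of stopping the run.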

What HDD are you using (size and type: SSD, SATA 7.2, SAS 10, SAS 15, etc.)? Are they local to your server or in some sort of SAN? If so, what protocol are you using to present the disks to the server (NFS, iSCSI, FC, other)? What type of RAID are you using?

Are you aware of any modifications/tuning made when the system was installed that may be relevant? Is this server dedicated only to Zimbra or is it being used for other things?

Describe what you mean by "slow" and when it happens.

Did you search the logs and find anything relevant?

Let's see if we can find out what's causing the slowness.

Re: Performance improvement help

Post by daliose01 »

Thank you for your reply and for taking the time to help.
Here are the outputs you asked for:

$ free -h
              total        used        free      shared  buff/cache   available
Mem:            46G         10G        2.6G        2.3G         34G         34G
Swap:           15G        7.8M         15G

$ cat /opt/zimbra/conf/my.cnf

[mysqld]

bind-address = 127.0.0.1


basedir = /opt/zimbra/common
datadir = /opt/zimbra/db/data
socket = /opt/zimbra/data/tmp/mysql/mysql.sock
pid-file = /opt/zimbra/log/mysql.pid
port = 7306
user = zimbra
tmpdir = /opt/zimbra/data/tmp

external-locking
slow_query_log = 1
slow_query_log_file = /opt/zimbra/log/myslow.log

general_log_file = /opt/zimbra/log/mysql-mailboxd.log

long_query_time = 1
log_queries_not_using_indexes

thread_cache_size = 110
max_connections = 110

# We do a lot of writes, query cache turns out to be not useful.
query_cache_type = 0

sort_buffer_size = 1048576
read_buffer_size = 1048576

# (Num mailbox groups * Num tables in each group) + padding
table_open_cache = 1200

innodb_data_file_path = ibdata1:10M:autoextend
innodb_buffer_pool_size = 4975618867

innodb_log_file_size = 524288000
innodb_log_buffer_size = 8388608
innodb_file_per_table

# Value is: 200 + max_connections + 2 * table_open_cache
innodb_open_files = 2710

innodb_max_dirty_pages_pct = 10
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 0
max_allowed_packet = 16777216

[mysqld_safe]

log-error = /opt/zimbra/log/mysqld.log
pid-file = /opt/zimbra/log/mysql.pid





$ zmlocalconfig mailboxd_java_heap
Warning: null valued key 'mailboxd_java_heap'


$ zmprov gas
ares.XXXX.com (Shows the registered hostname)


$ zmcontrol status
Host ares.XXXX.com
amavis Running
antispam Running
antivirus Running
dnscache Running
ldap Running
logger Running
mailbox Running
memcached Running
mta Running
opendkim Running
proxy Running
service webapp Running
snmp Running
spell Running
stats Running
zimbra webapp Running
zimbraAdmin webapp Running
zimlet webapp Running
zmconfigd Running




$ du -sh /opt/zimbra/store
369G /opt/zimbra/store

$ du -sh /opt/zimbra/db/data
5.1G /opt/zimbra/db/data

$ zmlocalconfig ldap_common_threads
ldap_common_threads = 8

$ cat /proc/cpuinfo | grep processor | wc -l
8

What HDD are you using? It is running 2x SATA 7.2 in a RAID 1. The RAID is mounted on /opt. The OS is installed on another RAID 1 with SATA 7.2 disks.
The server is dedicated and runs only Zimbra.
No other changes related to the Zimbra setup were made.

What "slow" means:
When the server was installed and everyone started using it about a year ago, searching for mails took about a second. Now it often takes up to 30 seconds.
Sometimes when clicking on a mail folder in the web client, it takes up to 10 seconds for the mails to show up. Sometimes when users try to reach the web client, they get a server timeout. After a few seconds, the login page/web client shows up again.

I checked the logs from the last few days but couldn't find any message that points to a problem. I will check again, though.

Maik

Re: Performance improvement help

Post by pup_seba »

Hi,

"free -h
              total        used        free      shared  buff/cache   available
Mem:            46G         10G        2.6G        2.3G         34G         34G
Swap:           15G        7.8M         15G"

I see 2 things here:
1. You have lots of unused RAM; most of it is just being used as cache.
2. You are swapping (7.8M of swap is in use).
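
To read those two numbers without eyeballing the columns, the pasted output can be parsed. A minimal sketch in Python, assuming the usual `free` layout (a header row, then `Mem:` and `Swap:` rows); `parse_free` is a name invented for this example:

```python
def parse_free(text):
    """Parse `free` output into {"Mem": {...}, "Swap": {...}} (values stay strings)."""
    lines = [line for line in text.splitlines() if line.strip()]
    columns = lines[0].split()
    table = {}
    for line in lines[1:]:
        label, *values = line.split()
        # The Swap row has fewer columns than Mem; zip stops at the shorter list.
        table[label.rstrip(":")] = dict(zip(columns, values))
    return table


# The output posted above.
SAMPLE = """\
total used free shared buff/cache available
Mem: 46G 10G 2.6G 2.3G 34G 34G
Swap: 15G 7.8M 15G
"""

report = parse_free(SAMPLE)
print(report["Mem"]["available"])  # 34G
print(report["Swap"]["used"])      # 7.8M
```

The "available" column already counts cache that the kernel can give back, which is why 34G available next to 2.6G free is not a sign of memory pressure.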

So I would recommend changing your OS configuration so that "vm.swappiness" equals 0. As you are on CentOS 7.x, I would recommend using "tuned" to make such changes.
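
Before changing anything it helps to note the current value. A minimal sketch in Python, assuming a Linux host where the setting is exposed at /proc/sys/vm/swappiness; the function names are invented for this example, and it only reads the value, it does not change it:

```python
def parse_swappiness(text):
    """Turn the file content (a number followed by a newline) into an int."""
    return int(text.strip())


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Read the current vm.swappiness value from procfs."""
    with open(path) as handle:
        return parse_swappiness(handle.read())


def describe(value):
    """Say whether the value already matches the 0 suggested above."""
    if value == 0:
        return "vm.swappiness is already 0"
    return "vm.swappiness is %d, the suggestion above is 0" % value
```

On the server, `print(describe(read_swappiness()))` reports where you stand before and after the change.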

We will talk about the "unused RAM" later on.

"cat /opt/zimbra/conf/my.cnf"
If you did not touch this file after Zimbra was installed, good news for you: whoever installed it followed the suggestions given here (at least for this config file). https://wiki.zimbra.com/wiki/Performanc ... eployments
In fact, if you read that article, you won't even need to read my post, as you'll see for yourself what I am going to suggest is responsible for your slowness.

Based on the size of your DB, you may change this value so there is room in memory to accommodate it without problems, although you probably already have enough space, as many of the DB tables (about half of them) are usually empty. So, you could comment out this line:
# innodb_buffer_pool_size = 4975618867
And add this line instead:
innodb_buffer_pool_size = 7G

Then, if you keep an eye on the mysql process (top command), you should be able to see how much memory that process is actually consuming, which will tell you roughly how large your DB is (my guess is roughly 3G).
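
For comparison, the sizes involved can be put side by side. A small sketch in Python, assuming the my.cnf convention that the K, M and G suffixes are powers of 1024; `parse_size` is a name made up for this example:

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(value):
    """Convert a my.cnf size such as '7G' or '4975618867' to bytes."""
    value = value.strip().upper()
    if value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


current = parse_size("4975618867")   # the value in the posted my.cnf
proposed = parse_size("7G")          # the value suggested above

print(round(current / 1024**3, 2))   # 4.63
print(round(proposed / 1024**3, 2))  # 7.0
```

So the current pool is about 4.63 GiB, a bit under the 5.1G that du reported for /opt/zimbra/db/data, while 7G is above it.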

"zmlocalconfig mailboxd_java_heap"
My bad, the right command is "zmlocalconfig mailboxd_java_heap_size".
As you have enough RAM, you could set it to the maximum recommended/tested value within Zimbra's recommended limits (https://wiki.zimbra.com/wiki/Performanc ... eployments). I have servers for 8-9K users with a value no higher than 6144, and they work fine. So, as you have enough RAM, you could go overboard and see how it behaves with 6144. Do not use more than that. If it is currently higher, change it to something <= 6144.
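
With both suggested values in hand, a quick budget shows how much RAM is left over. A back-of-the-envelope sketch in Python, assuming mailboxd_java_heap_size is given in MB and ignoring that the JVM and mysqld use some memory beyond the heap and the buffer pool; `remaining` is a name invented for this example:

```python
GIB = 1024**3
MIB = 1024**2

total_ram = 46 * GIB     # "total" from the free -h output (rounded by free)
heap = 6144 * MIB        # mailboxd_java_heap_size upper limit suggested above
buffer_pool = 7 * GIB    # innodb_buffer_pool_size suggested above


def remaining(total, *reserved):
    """Bytes left after subtracting every reserved amount from the total."""
    return total - sum(reserved)


left = remaining(total_ram, heap, buffer_pool)
print(left / GIB)  # 33.0
```

That leaves roughly 33 GiB for the OS, the page cache and the other services on the box, so neither setting is anywhere near starving the machine.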

"It is running 2x SATA 7.2 in a RAID 1. The RAID is mounted on /opt. The OS is installed on another RAID 1 with SATA 7.2 disks.
The server is dedicated and runs only Zimbra.
No other changes related to the Zimbra setup were made."
Disk configuration is not good. I don't use disks that slow even for backups with the Zextras tools (which are quite delicate when it comes to disk speed or things like NFS). Lots of things that affect your users are saved on disk:
- emails
- tmp files for your amavis (the files created while scanning for viruses, etc.)
- Lucene indexes
- your DB files; although you mostly work in memory, at some point (innodb_max_dirty_pages_pct = 10) that data is written to disk
- virtual memory for your processes

I mean, it is hard to diagnose why your server is slow: maybe your CPU is overloaded, maybe your network acts weird during peak times, etc. But one thing is for sure: slow SATA disks like these should not be used.

"When the server was installed and everyone started using it about a year ago, searching for mails took about a second. Now it often takes up to 30 seconds."
Searching for a mail could read from the in-memory MariaDB data or from disk. Things like which folder the mail is in, which tags it has, "from" or "to", and the subject can be read from the in-memory MariaDB data (I'm not sure Zimbra actually reads that info from memory when performing a search, though...). All mail content is stored only on disk. So if you search mails for "a specific word or words" in their content, Zimbra will have to drill down and read every single file for that user to find that word/s. With your disks, 30 seconds is reasonable, depending on the type of search.

"Sometimes when clicking on a mail folder in the web client, it takes up to 10 seconds for the mails to show up."
All the info you see in the web UI comes from the in-memory MariaDB data. Only when you click on an email to preview its contents, or double-click it to open it in a new window, are you reading from disk. So when your users click on a folder just to see the list of mails in it, the client (maybe I'm wrong here) is not asking for data from your disk but from your in-memory DB. That should be really fast... unless you are swapping (which is your case) and what should be in memory has actually been swapped out to disk. Another possibility is that your buffer_pool_size is not big enough, but even if that's not the case, follow my instructions to "overkill it just a little bit" ;-)

"Sometimes when users try to reach the web client, they get a server timeout. After a few seconds, the login page/web client shows up again."
This is usually (in my experience) a CPU or network problem. But again, it will be hard to tell until you fix the other things. I really think that changing the swappiness and adjusting the Java and MariaDB configuration will make some difference.

Also, depending on your Zimbra antispam strategy, SpamAssassin can be very resource demanding. As you have everything on one server, make sure you are not overloading it (check the load average shown by the top command). Furthermore, you could run tools like "vmstat" to check whether processes are queued waiting for resources. Something like "vmstat 1" will show you the "r" and "b" numbers, which should usually be "0" with an occasional "1" or a rare "2" :)
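
If you save the vmstat output during a slow period, the bad samples can be picked out automatically. A minimal sketch in Python, assuming the standard vmstat layout where "r" and "b" are the first two columns; `busy_samples` is a name made up for this example, and the sample rows are invented to show the format, they are not measurements from this server:

```python
def busy_samples(vmstat_text, limit=2):
    """Return (r, b) pairs from vmstat output where r or b is above limit."""
    flagged = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        # Data rows start with two integers; the two header rows do not.
        if len(fields) < 2 or not fields[0].isdigit() or not fields[1].isdigit():
            continue
        r, b = int(fields[0]), int(fields[1])
        if r > limit or b > limit:
            flagged.append((r, b))
    return flagged


# Invented sample: one quiet second, one with five processes blocked on I/O.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0   7987 2726297  1024 35651584    0    0     5    20  100  200  3  1 95  1  0
 1  5   7987 2726297  1024 35651584    0    0  9000   300  900 1500  5  2 40 53  0
"""

print(busy_samples(SAMPLE))  # [(1, 5)]
```

The default limit of 2 simply mirrors the rule of thumb above. A high "b" together with a high "wa" CPU percentage would point at the disks rather than the CPU.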

If the physical server allows it, you could add SSDs (most likely cheaper than memory), put them in RAID 1, and configure new volumes on them for Zimbra to use as primary volumes.

Regards,

Re: Performance improvement help

Post by Jordack »

I'll throw some comments out there.
I'm running 130 users on 20GB RAM.

Throwing more RAM at Java never fixes the problem. It is also possible to give it too much memory.

We started to see some significant slowdowns a year or two ago. We moved to SSD storage and saw major improvements. I would start there.

Spinning rust is not good for random reads/writes, and email is a lot of random R/W.

If you are running everything on 2 spindles in RAID 1, you are essentially running everything on a single drive with a 2x write penalty. You do not get a read boost with RAID 1. You have 150 users fighting over 2 little read heads.

Luckily, enterprise SSDs are not super expensive. A couple of months ago I picked up some 1.9T Intel drives for $550 (they are down to $450 now). I understand everyone's budgets are different, but do not go with cheap consumer drives.