Hardware Recommendation for Environment with 250 Users

6125amartin
Advanced member
Posts: 63
Joined: Sat Sep 13, 2014 1:45 am

Hardware Recommendation for Environment with 250 Users

Post by 6125amartin »

Hello,

I am currently running Zimbra 8.x in single-server mode on Ubuntu 14.04 on a quad-core Xeon with 24GB RAM and 7200RPM HDDs in a RAID. With the increasing number of users, Zimbra performance has been decreasing and occasionally users are encountering the "Network Service Error" with a "MailboxLock$LockFailedException" message being printed in mailboxd.log. I moved /opt/zimbra/index and /opt/zimbra/redolog to an SSD, which has helped a bit, but the problem still exists. More information in this bug report:
https://bugzilla.zimbra.com/show_bug.cgi?id=84490
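
To see whether the lock failures cluster at particular times of day, it can help to count them per hour. The following is only a rough sketch: it assumes each log entry starts with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp (a common log4j layout, so check it against your own log), and the function name `lock_failures_per_hour` is made up for illustration.

```python
import re
from collections import Counter

# Assumption: entries begin with "YYYY-MM-DD HH:MM:SS,mmm". Adjust if your log differs.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}")


def lock_failures_per_hour(lines, needle="LockFailedException"):
    """Count lines containing `needle`, grouped by 'YYYY-MM-DD HH'.

    Lines without a leading timestamp (for example stack-trace lines)
    are counted under the key 'unknown'.
    """
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = TIMESTAMP.match(line)
        counts[match.group(1) if match else "unknown"] += 1
    return counts


def report(path):
    """Print one 'hour  count' row per hour for the log file at `path`."""
    with open(path, errors="replace") as handle:
        for hour, count in sorted(lock_failures_per_hour(handle).items()):
            print(f"{hour}:00  {count}")
```

Calling `report()` with the path to the mailbox log prints one row per hour. If the counts spike at the same hours that backups or other heavy jobs run, that points toward I/O pressure rather than user count alone.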

I'm considering two courses of action:

Option 1: upgrade hardware but stay with a single-server setup. I would upgrade to a newer Xeon with 4-6 cores and a high clock speed (> 3.0 GHz), at least 32GB RAM, and SSDs, or possibly NVMe SSDs if the additional speed would be justified. Would it only make sense to put /opt/zimbra/index and /opt/zimbra/redolog on SSDs, or would all of /opt/zimbra benefit significantly from the increased IOPS? Would NVMe SSDs be even more beneficial (and if so, for which parts of /opt/zimbra)? My concern is that even with NVMe drives, the bottleneck is still going to be Java threads.

Option 2: convert to a multi-server installation with the following split of components:
2x servers with Zimbra LDAP and Zimbra MTA together on them
2x servers with Zimbra Proxy
2x servers with Zimbra Store

Obviously option 2 is a better long-term approach, but there's additional complexity involved in setting up and maintaining a multi-server environment, as well as the additional hardware costs. Any suggestions on hardware for a deployment of this size, or on major advantages of one option over the other? My main concern is making sure I can run as many threads as possible, since that seems to be the problem I've encountered in bug 84490.

Thanks!
L. Mark Stone
Ambassador
Posts: 2799
Joined: Wed Oct 09, 2013 11:35 am
Location: Portland, Maine, US
ZCS/ZD Version: 10.0.7 Network Edition

Re: Hardware Recommendation for Environment with 250 Users

Post by L. Mark Stone »

Zimbra needs fast disks. When you have fast disks, you can put several thousand active users on each mailbox server no problem.

And by fast I mean Enterprise-grade SAS disks. Active Zimbra servers are near 50:50 read:write and have lots of random I/O of varying byte sizes. This is one of the most taxing workloads there is. And SSDs won't always be faster, BTW, because they are not so good at random writes. You need a really good caching controller with LOTS of cache and 15K spindles, or a proper SAN, for fast I/O.
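
To get a rough feel for how a given volume handles that kind of load, here is a minimal sketch that issues random reads and writes of varying sizes at roughly 50:50. It is not a proper benchmark: it goes through the OS page cache (so it overstates what the disks can do), it needs a Unix-like OS for `os.pread`/`os.pwrite`, and the block sizes and the name `mixed_random_io` are assumptions chosen for illustration. A dedicated tool such as fio with direct I/O will give far more representative numbers.

```python
import os
import random
import time


def mixed_random_io(path, file_size=64 * 1024 * 1024, ops=2000,
                    block_sizes=(4096, 16384, 65536), read_fraction=0.5, seed=0):
    """Run `ops` random reads/writes of varying size against `path`.

    Returns operations per second. The file is created (or resized)
    to `file_size` bytes and is left in place afterwards.
    """
    rng = random.Random(seed)
    payload = os.urandom(max(block_sizes))  # reused so data generation is not timed
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, file_size)
        start = time.perf_counter()
        for _ in range(ops):
            size = rng.choice(block_sizes)
            offset = rng.randrange(0, file_size - size)
            if rng.random() < read_fraction:
                os.pread(fd, size, offset)
            else:
                os.pwrite(fd, payload[:size], offset)
        os.fsync(fd)  # include the cost of flushing dirty pages to disk
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return ops / elapsed
```

Running it once against a file on the current array and once against a file on the candidate disks gives a crude relative comparison, nothing more.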

For 250 heavy users, I would recommend a single server with proxy (because proxy is required for 8.7 anyway) with 12-16GB RAM and 6-8 CPU cores. If you want to get fancy you can create separate sets of spindles for /opt/zimbra/store and put amavis's temp folder on a RAM disk.

Hope that helps,
Mark
___________________________________
L. Mark Stone
Mission Critical Email - Zimbra VAR/BSP/Training Partner https://www.missioncriticalemail.com/
AWS Certified Solutions Architect-Associate
6125amartin
Advanced member
Posts: 63
Joined: Sat Sep 13, 2014 1:45 am

Re: Hardware Recommendation for Environment with 250 Users

Post by 6125amartin »

Thanks for the good information!
L. Mark Stone wrote:
> Zimbra needs fast disks. When you have fast disks, you can put several thousand active users on each mailbox server no problem.
>
> And by fast I mean Enterprise-grade SAS disks. Active Zimbra servers are near 50:50 read:write and have lots of random I/O of varying byte sizes. This is one of the most taxing workloads there is. And SSDs won't always be faster, BTW, because they are not so good at random writes. You need a really good caching controller with LOTS of cache and 15K spindles, or a proper SAN, for fast I/O.

NVMe drives are quite fast, for example see table 2.2 here:
http://www.intel.com/content/www/us/en/ ... -spec.html

If the additional cost for NVMe isn't a factor and given this level of performance, is SAS still really preferred?

> For 250 heavy users, I would recommend a single server with proxy (because proxy is required for 8.7 anyway) with 12-16GB RAM and 6-8 CPU cores. If you want to get fancy you can create separate sets of spindles for /opt/zimbra/store and put amavis's temp folder on a RAM disk.

Would you run the proxy on the same server? Any recommendations on clock speed for the CPU cores? At what point (user count) would you recommend switching over to a multi-server setup? 500? 1000?

Also, with a single-server environment, any recommendations on HA? I suppose I could run /opt/zimbra on a ZFS dataset and "zfs send" snapshots to a hot-standby server.
6125amartin
Advanced member
Posts: 63
Joined: Sat Sep 13, 2014 1:45 am

Re: Hardware Recommendation for Environment with 250 Users

Post by 6125amartin »

Any additional updates on this? How can I avoid thread contention on a single server?
L. Mark Stone
Ambassador
Posts: 2799
Joined: Wed Oct 09, 2013 11:35 am
Location: Portland, Maine, US
ZCS/ZD Version: 10.0.7 Network Edition

Re: Hardware Recommendation for Environment with 250 Users

Post by L. Mark Stone »

Sorry for being scarce for a few days...

Please take a look at the Performance Wiki for some specific tips: https://wiki.zimbra.com/wiki/Performanc ... eployments

250 mailboxes is a really small single-server implementation. I have never seen the network service error you are experiencing on such a small system.

When we have "heavy" users (meaning multiple simultaneous IMAP, ActiveSync and web UI logins per user), we play it safe and limit the number of mailboxes to 1K-2K per mailbox server (assuming a farm with separate MTAs, LDAP and proxy servers). For our largest implementation to date (about 20K domains) of mostly IMAP users, we found we could run 3K-5K mailboxes per mailbox server before we started seeing performance issues.
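
As a back-of-the-envelope illustration of those per-server figures, the arithmetic can be sketched as below. The function name `mailbox_servers_needed` is made up for this example; the limits of 1,000 and 2,000 are the conservative and upper "heavy user" figures quoted above, not official sizing guidance.

```python
import math


def mailbox_servers_needed(mailboxes, per_server_limit):
    """Smallest number of mailbox servers that keeps each at or under the limit."""
    if mailboxes < 0 or per_server_limit <= 0:
        raise ValueError("mailboxes must be >= 0 and per_server_limit > 0")
    return max(1, math.ceil(mailboxes / per_server_limit))


# Compare a few deployment sizes against the conservative and upper limits.
for users in (250, 1000, 6000):
    conservative = mailbox_servers_needed(users, 1000)
    upper = mailbox_servers_needed(users, 2000)
    print(f"{users} heavy users: {conservative} server(s) at 1K each, {upper} at 2K each")
```

At 250 mailboxes the result is a single mailbox server under either limit, which is why a multi-server split is hard to justify on capacity grounds alone.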

We've worked on plenty of single servers with as many as 6K mailboxes; they were slugs, but they didn't throw the error you are describing.

Honestly, you should be able to get by with 250 heavy users just fine with four cores, 12-16GB of RAM and some fast disks. If you want the sports car version, go to 8 cores and 24GB of RAM, and put Amavis's temp directory on a RAM drive. But unless you are processing something like 20K+ emails/hour, it's a little overkill in my experience. Yes, Zimbra uses Java, and Java requires a lot of resources. And yes, MySQL likes to hang on to two CPU cores more than it should. But not so much that it would get in the way for just 250 users.

Hope that helps,
Mark
6125amartin
Advanced member
Posts: 63
Joined: Sat Sep 13, 2014 1:45 am

Re: Hardware Recommendation for Environment with 250 Users

Post by 6125amartin »

Hi Mark,

Thanks for the clarification and additional information. Some of the info on that wiki page seems very out of date, for example when talking about CPUs:

> We recommend an x86_64 dual-dual core CPU, of a speed that is not too low or too high on the price/performance ratio. Disable hyper-threading if that feature is present in your CPU (performance monitoring data is unreliable). At this time, we have not tested on dual-quad cores (coming soon).

It seems like 2x quad-core CPUs should be pretty common nowadays.

Based on the numbers you provided, it really does sound like a single server would be sufficient for my needs. Most users are doing either ActiveSync + IMAP or ActiveSync + WebUI concurrently, so there is added load due to ActiveSync, but not usually all 3 at once.

What about HA? With a single server, a hardware failure would cause downtime, and even with backups you then need somewhere beefy enough to run Zimbra. This is one of the reasons it is tempting to run 2x mail store servers, each holding half of the mailboxes on a ZFS filesystem, and then use "zfs send" and "zfs receive" to mirror each server's data to its peer. This way each server would contain all data, and if one of the servers failed you could run everything from the other one for a while. Alternatively you could configure DRBD as the backing device for the disks, but then one of the servers is sitting idle 99% of the time.
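
A rough sketch of what one replication cycle might look like, written as a small helper that only builds the command strings (it does not run them). The dataset name `tank/zimbra`, the host name `standby`, and the function name are hypothetical placeholders. Note also that a snapshot of a running mail store is only crash-consistent, so this is an assumption-laden outline rather than a tested HA procedure.

```python
import shlex


def replication_commands(dataset, prev_snapshot, new_snapshot, peer_host, peer_dataset):
    """Build the shell commands for one incremental ZFS replication cycle.

    Returns two strings: the command that takes the new snapshot, and the
    pipeline that sends the delta since `prev_snapshot` to the peer over ssh.
    """
    snap_prev = f"{dataset}@{prev_snapshot}"
    snap_new = f"{dataset}@{new_snapshot}"
    take = ["zfs", "snapshot", snap_new]
    # -i sends only the changes between the two snapshots.
    send = ["zfs", "send", "-i", snap_prev, snap_new]
    # -F rolls the target back to its latest snapshot before receiving.
    receive = ["ssh", peer_host, "zfs", "receive", "-F", peer_dataset]
    return [shlex.join(take), f"{shlex.join(send)} | {shlex.join(receive)}"]


# Hypothetical names, for illustration only.
for command in replication_commands("tank/zimbra", "hourly-01", "hourly-02",
                                    "standby", "tank/zimbra"):
    print(command)
```

The first cycle would need a full `zfs send` without `-i`, since the peer has no earlier snapshot to apply a delta to.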
uniu
Posts: 2
Joined: Tue Sep 17, 2019 1:29 pm

Re: Hardware Recommendation for Environment with 250 Users

Post by uniu »

Hello Andrew,

Have you resolved the bug described in 84490 by scaling hardware?

6125amartin wrote:
> Hello,
>
> I am currently running Zimbra 8.x in single-server mode on Ubuntu 14.04 on a quad-core Xeon with 24GB RAM and 7200RPM HDDs in a RAID. With the increasing number of users, Zimbra performance has been decreasing and occasionally users are encountering the "Network Service Error" with a "MailboxLock$LockFailedException" message being printed in mailboxd.log. I moved /opt/zimbra/index and /opt/zimbra/redolog to an SSD, which has helped a bit, but the problem still exists. More information in this bug report:
> https://bugzilla.zimbra.com/show_bug.cgi?id=84490
> Thanks!

Did you get around the bug described in 84490 by scaling hardware?
If so, what were the new hardware specs?


Sincerely
Alexander
6125amartin
Advanced member
Posts: 63
Joined: Sat Sep 13, 2014 1:45 am

Re: Hardware Recommendation for Environment with 250 Users

Post by 6125amartin »

Hello Alexander,

I would recommend putting /opt/zimbra on fast disks to help mitigate the LockFailedException errors, preferably NVMe SSDs if possible.
6125amartin
Advanced member
Posts: 63
Joined: Sat Sep 13, 2014 1:45 am

Re: Hardware Recommendation for Environment with 250 Users

Post by 6125amartin »

NVMe SSDs are quite fast now, so that seems like a good option to me.