Strange data store size increase on migration

Post by Service »

So I've successfully completed a challenging migration, yet there is something quite strange going on.
I upgraded from Zimbra 6.0.13 on Ubuntu 8.04 x32 to Zimbra 7.1.3 on Ubuntu 10.04 x64.
Because the architecture change ruled out a straight upgrade, I used imapsync to copy all user data from the old server to the new one across the network. This worked relatively well with some scripting and a user account/password list. The new system is up and running fine and there don't seem to be any problems, but there is one very strange thing.
On the old server my /opt directory was approximately 215 gigs. There were actually MORE user accounts (280) on the old system as I used this opportunity to prune old unused mailboxes. The new system has fewer user accounts (220) and yet the /opt directory is pushing 430 gigs.
This strikes me as incredibly strange, especially since the on-disk size appears to have almost exactly doubled. What could have caused this? I wouldn't be too concerned, except the drive assigned to /opt is only 500 GB on RAID 50, which "should" have lasted for years. If this is normal and uncorrectable, I will need to increase its size quickly.
The only other thing that I feel is worth mentioning is that I had synced approximately 80% of the data to the new server while it was on 7.1.2, then ran the upgrade to 7.1.3 and resynced. Is it possible that it literally duplicated data? I checked several mailboxes and no duplicate mail messages appear to be visible...
Any ideas???

Post by phoenix »

> So I've successfully completed a challenging migration yet there is something quite strange going on.
> I upgraded from Zimbra 6.0.13 on Ubuntu 8.04 x32, to Ubuntu 10.04 x64 Zimbra 7.1.3.
> because of the arch incompatibility to do a straight upgrade, ....
The correct procedure to follow would have been this: Network Edition: Moving from 32-bit to 64-bit Server - Zimbra :: Wiki
> The only other thing that I feel is worth mentioning is that I had synced approximately 80% of the data to the new server on install 7.1.2 and then ran the upgrade to 7.1.3 and resynch'd. Is it possible that it literally duplicated data??
I would think this is the likely cause. Duplicated mail won't show in the client or web UI but will be on the mail store. If I were you, I'd look at that to see if there's duplicated data in there.
BTW, the RAID level you're using is not recommended for Zimbra. If you want good (or improved) performance you should use RAID10.
Regards

Bill

Post by Service »

I would have liked to follow that guide, but we also made some configuration changes to our domain structure to accommodate unplanned acquisitions and the probability of future expansion. In addition, the original configuration was built around an improperly designed network (that was in place before I came into the picture), and further changes to the network domain and nomenclature also had to be made.
Unfortunately, my approach was unavoidable, based on lengthy research and a few conversations with Zimbra employees about it.
That being said, how can I go about checking for duplicates in the data store itself? If they do not show up in client mailboxes, I'm unclear how to locate and eliminate them. Please advise.
Thanks!

Post by Service »

How can I go about checking for duplicate data in the data store itself? If the duplicates do not show up in client mailboxes, I'm unclear how to locate and eliminate them.
I am working with Zimbra 7.1.4 on Ubuntu 10.04 LTS x64.
Please advise.
Thanks!

Post by bdial »

What kind of duplicate data are you looking for to begin with? Zimbra already kind of 'dedupes' by default. If many users on your system receive the same e-mail, it will store it once and hard link it for the other accounts.
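For anyone unfamiliar with hard links, here is a minimal sketch of why a shared message only costs disk space once. The file names are made up for illustration and are not Zimbra's actual store layout, the helper `inode_and_links` is hypothetical, and `stat -c` assumes GNU coreutils (as on Ubuntu).

```shell
#!/bin/sh
# Illustration only: the file names below are hypothetical, not Zimbra's real layout.

# Print "<inode> <link-count>" for a file (GNU stat).
inode_and_links() {
    stat -c '%i %h' "$1"
}

demo=$(mktemp -d)
printf 'Subject: staff notice\n\nSame body for everyone.\n' > "$demo/user1.msg"

# A hard link is a second name for the same inode, so no data is copied.
ln "$demo/user1.msg" "$demo/user2.msg"

inode_and_links "$demo/user1.msg"
inode_and_links "$demo/user2.msg"   # same inode as user1.msg, link count 2

# A plain copy gets its own inode and its own disk blocks.
cp "$demo/user1.msg" "$demo/user3.msg"
inode_and_links "$demo/user3.msg"   # different inode, link count 1
```

If an import writes a separate copy of the same message into each mailbox, the result looks like `user3.msg` here: identical content, but stored again.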

Post by Rich Graves »

@bdial: Dedupe only takes effect at LMTP submission time. If you use imapsync, zmrestore, or any other non-SMTP, non-LMTP import method, there is no dedupe.
@phoenix: I find it hard to imagine duplicating data with imapsync by accident and not seeing the effects in the client. Maybe you'd copy the same message twice with different INTERNALDATEs, but surely clients would notice the duplicates in search. Or did you simply mean that LMTP dedupe is lost?
You can use freedup or similar (freedup.org, hardlink) to hard-link retroactively. As long as you restrict it to /opt/zimbra/store*, it's safe. It will take a LONG time to run, at least 24 hours, but it's mostly metadata reads, so it should not hurt performance much. If you really have a nearly 50% data duplication rate among accounts, that's unusual.
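To make the idea concrete, here is a rough sketch of what such a tool does. It is not a substitute for freedup, hardlink, or fdupes. The function name `link_duplicates` and its `dry-run`/`link` modes are invented for this example. It assumes GNU findutils and coreutils, file names without newlines, backslashes, or trailing whitespace, and that all files under the directory can safely share owner, permissions, and timestamps (hard-linked names share all of these).

```shell
#!/bin/sh
# Sketch: report (default) or hard-link byte-identical, non-empty files under one directory.
# Usage: link_duplicates DIR [dry-run|link]
link_duplicates() {
    ld_root=$1
    ld_mode=${2:-dry-run}
    ld_prev=""
    ld_keeper=""
    # sha256sum prints "<hash>  <path>"; sorting puts identical content on adjacent lines.
    # -size +0 skips empty files, -xdev stays on one filesystem (hard links cannot cross).
    find "$ld_root" -xdev -type f -size +0 -exec sha256sum {} + | sort |
    while read -r ld_sum ld_file; do
        if [ "$ld_sum" != "$ld_prev" ]; then
            # First file with this hash becomes the copy that is kept.
            ld_prev=$ld_sum
            ld_keeper=$ld_file
            continue
        fi
        # Already the same inode: nothing to gain.
        [ "$ld_file" -ef "$ld_keeper" ] && continue
        # Belt and braces: confirm the bytes really match before touching anything.
        cmp -s "$ld_keeper" "$ld_file" || continue
        if [ "$ld_mode" = "link" ]; then
            ln -f -- "$ld_keeper" "$ld_file"
        else
            printf 'would link: %s -> %s\n' "$ld_file" "$ld_keeper"
        fi
    done
}

# Example (dry run only):
# link_duplicates /opt/zimbra/store
```

The dry run is the default on purpose, so the list can be reviewed before anything is changed.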
Another possibility: Is this OSS or NE? 7.x silently made --zip the default for zmbackup, which increases the size of /opt/zimbra/backup. But only after the second or third full backup.
Quick test: df -i. Are there radically more inodes in use on the new server? This correlates with number of files. This will tell you if you should be looking for lots of duplicated small files, or a smaller number of really big files.
Just as quick: mysql -e 'select count(*) from mboxgroup10.mail_item' will give you a count of messages in a 1% sample of the mail store. There are 100 mysql databases, creatively named mboxgroup1 through mboxgroup100, and individual accounts are "randomly" assigned to one. I would not expect the counts on each server to match exactly, because user jdoe is likely in a different mboxgroup on server1 than on server2, but if you consistently get a higher count on the new server than on the old server, something strange is going on.
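Since a single group is a small sample, it may help to total several groups at once. A minimal sketch, assuming the same `mysql` client invocation as above works on your server: the helper names `count_query` and `sum_mail_items` and the `MYSQL_CMD` override are inventions of this example, while `-N` (skip column names) and `-B` (batch output) are standard mysql client options.

```shell
#!/bin/sh
# Build the count query for one mboxgroup number.
count_query() {
    printf 'select count(*) from mboxgroup%d.mail_item' "$1"
}

# Sum mail_item counts for groups FIRST..LAST.
# MYSQL_CMD is a hypothetical override so a different client path can be used.
sum_mail_items() {
    smi_total=0
    smi_i=$1
    while [ "$smi_i" -le "$2" ]; do
        smi_n=$(${MYSQL_CMD:-mysql} -N -B -e "$(count_query "$smi_i")") || return 1
        smi_total=$((smi_total + smi_n))
        smi_i=$((smi_i + 1))
    done
    echo "$smi_total"
}

# Example: total for the first ten groups, run on each server and compare.
# sum_mail_items 1 10
```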
du -skh /opt/zimbra/{store,index,db,log,[...]} will tell you definitively where the space has gone, but it will take a long time to return (hours, at least).
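A small sketch for reading the result more easily: sizes in KiB, sorted largest first. The function name `usage_by_dir` is made up here, and the directory list in the example is only the part of the list named above. One thing worth knowing when interpreting the numbers: within a single run, GNU du counts a file with several hard links only once, so hard-linked messages are not double-counted.

```shell
#!/bin/sh
# Report disk usage in KiB for each directory given, largest first.
# -x keeps du on one filesystem; -s prints one total per argument.
usage_by_dir() {
    du -skx -- "$@" | sort -rn
}

# Example (adjust the list to your install):
# usage_by_dir /opt/zimbra/store /opt/zimbra/index /opt/zimbra/db /opt/zimbra/log
```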

Post by Service »

Thank you, Rich G., for the thorough response. I think losing the SMTP/LMTP dedupe is probably what I'm up against. Users in my company are constantly mass-mailing each other notices and inquiries; I would imagine 1 in 10 messages is duplicated across 50% or more of the mailboxes. It's atypical, but that's the nature of the beast in this situation.
Doing what you've suggested here are some results:
df -i (OLD /opt):
/dev/md0 23805952 1378225 22427727 6% /opt

df -i (NEW /opt):
/dev/sdb1 32776192 1669350 31106842 6% /opt

mysql -e 'select count(*) from mboxgroupX.mail_item' for X = 1 to 10 (OLD):
1:10171, 2:42741, 3:30073, 4:22586, 5:2407, 6:32999, 7:9994, 8:23582, 9:60016, 10:2697 Total: 237266

mysql -e 'select count(*) from mboxgroupX.mail_item' for X = 1 to 10 (NEW):
1:1427, 2:21283, 3:155, 4:957, 5:19096, 6:61267, 7:15310, 8:34707, 9:176, 10:6489 Total: 160867
I'm running the du -skh /opt/zimbra/... commands and tallying up for later.
Based on what I'm seeing here, I would say the problem is almost certainly the lost dedupe. I'm going to try running the freedup program you suggested and see if that helps. Let me know if anything else here strikes you.
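As a quick check of the figures above, the sketch below re-adds the per-group counts and works out the percentage change in messages and in inodes in use. The helper names `sum_counts` and `pct_change` are made up for this example, and the numbers are copied from the output above. Because hard-linked names share one inode, more inodes holding fewer messages would at least be consistent with lost hard links, though ten groups out of 100 is only a sample and some accounts were pruned.

```shell
#!/bin/sh
# Sum "group:count" pairs such as "1:10171, 2:42741".
sum_counts() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -F: '{ total += $2 } END { print total + 0 }'
}

# Percentage change from OLD to NEW, signed, one decimal place.
pct_change() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%+.1f\n", (new - old) / old * 100 }'
}

old_counts='1:10171, 2:42741, 3:30073, 4:22586, 5:2407, 6:32999, 7:9994, 8:23582, 9:60016, 10:2697'
new_counts='1:1427, 2:21283, 3:155, 4:957, 5:19096, 6:61267, 7:15310, 8:34707, 9:176, 10:6489'

old_total=$(sum_counts "$old_counts")
new_total=$(sum_counts "$new_counts")

echo "messages in groups 1-10: old=$old_total new=$new_total change=$(pct_change "$old_total" "$new_total")%"
echo "inodes in use on /opt: old=1378225 new=1669350 change=$(pct_change 1378225 1669350)%"
```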

Post by Service »

I had significant trouble attempting to compile freedup from source, and only i386 and i586 32-bit builds were available, so I opted for an alternate version of fdupes in Maverick that includes:
-L --hardlink: replace all duplicate files with hardlinks to the first file in each set of duplicates
It was available at
http://mirrors.us.kernel.org/ubuntu//po ... _amd64.deb
I ran a test command without the -L action:
fdupes -m -n -r /opt
It took HOURS to complete, and with Zimbra running live, fdupes was unable to read thousands of locked files, but it still showed:
525279 duplicate files (in 196227 sets), occupying 172900.5 megabytes
I believe I've found my solution. I would imagine the actual amount is closer to 200 GB, which would bring the data store back down to around 230 GB, much closer to what I expected.
I'll run this process over the weekend when I can shut down Zimbra to let it run overnight and report back. Thanks again for all the input.

Post by Rich Graves »

> fdupes -m -n -r /opt
Don't run that without the -n!
Keep it within /opt/zimbra/store. Other "duplicate" files are likely to be temporary lockfiles and whatnot, which would be bad to combine.

Post by Service »

Makes sense. What about /opt/zimbra/{index,db,data}?
Or are duplicates likely only within the store?