Index mail messages to a Solr index

haydenyoung
Posts: 4
Joined: Wed Aug 07, 2019 4:03 pm

Post by haydenyoung »

I am running Solr and indexing information from a number of apps to produce a federated search. One of those apps is Zimbra mail.

Currently I'm using Solr's mail data import handler, but this requires each mailbox's username and password. Because I'm indexing multiple mailboxes, and because users can be added and removed without warning, this option will not work in a production deployment.

It is my understanding that Zimbra indexes content to Lucene, so my thought was to query Lucene on a regular basis and re-index the data to Solr. However, and correct me if I'm wrong, Zimbra seems to be using a very old version of Lucene (version 3.5?). This has made using current Lucene libraries impossible, because all of the libraries I have tried only support version 6+. I have tried PHP, JavaScript and Go libraries, but none of them can open Zimbra's Lucene index (I get an invalid segment error).
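
One quick way to check that the failure is a format-version mismatch rather than a corrupt index is to look at the first four bytes of the index's segments_N file. The sketch below is in Python and rests on my understanding of the Lucene file formats, which is worth verifying against the Lucene docs: Lucene 4.0 and later start the file with a codec header magic number (0x3fd76c17), while older indexes such as 3.x start with a negative format integer. The function names are hypothetical.

```python
import struct

# Assumption: Lucene 4.0+ writes this codec magic at the start of segments_N.
CODEC_MAGIC = 0x3FD76C17


def classify_segments_header(header: bytes) -> str:
    """Guess the index generation from the first 4 bytes of a segments_N file."""
    if len(header) < 4:
        return "unknown"
    # These header ints are big-endian; read the value as a signed 32-bit int.
    (value,) = struct.unpack(">i", header[:4])
    if value == CODEC_MAGIC:
        return "4.0-or-later"
    if value < 0:
        # Assumption: pre-4.0 indexes store a negative format number here.
        return "pre-4.0"
    return "unknown"


def classify_segments_file(path: str) -> str:
    """Read the header of a segments_N file on disk and classify it."""
    with open(path, "rb") as fh:
        return classify_segments_header(fh.read(4))
```

If classify_segments_file reports "pre-4.0" for a mailbox's segments_N file, that would be consistent with modern libraries rejecting the index outright.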

I'm wondering whether there is a way to a) query Zimbra's Lucene index directly or b) index Zimbra's mailboxes to my Solr index using a different method, maybe via Zimbra's API?

I would be interested in others' thoughts on this problem.

Any help much appreciated.

Thanks


Hayden
haydenyoung
Posts: 4
Joined: Wed Aug 07, 2019 4:03 pm

Re: Index mail messages to a Solr index

Post by haydenyoung »

Okay, so some progress. The first thing was to roll back and use Lucene 3.5. That allowed me to develop a small Java class which I used to interrogate the Lucene index. However, while I was able to pull out fields and field values, I could only retrieve a subset of what should be in there. For example, I could pull out priority, l.partname and msg blob id but was unable to retrieve subject, to, from, cc, etc., i.e. all the useful fields. I'm not sure if this is because those fields are indexed but not stored for retrieval, or whether they aren't indexed at all.

My other option is to use zmaccts to loop through the mailboxes and zmmailbox to query the mail messages, then index the returned data into my Solr index. A REST equivalent might be of more use. Am I able to get a list of mail addresses using Zimbra's REST API?
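
Whichever way the messages end up being fetched, the Solr side of that plan could look roughly like the sketch below. It is in Python and assumes the raw RFC 822 text of each message is already in hand; it does not talk to Zimbra at all. The field names (mailbox, subject, body and so on), the function names and the core name "mail" in the usage note are hypothetical and would need to match the real Solr schema. The only Solr behaviour relied on is that the /update handler accepts a JSON array of documents.

```python
import json
import urllib.request
from email import message_from_string, policy


def message_to_solr_doc(raw_message: str, mailbox: str) -> dict:
    """Turn one raw RFC 822 message into a flat document for Solr."""
    msg = message_from_string(raw_message, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    return {
        # Hypothetical field names; they must match your Solr schema.
        "id": f"{mailbox}:{str(msg.get('Message-ID', ''))}",
        "mailbox": mailbox,
        "subject": str(msg.get("Subject", "")),
        "from": str(msg.get("From", "")),
        "to": str(msg.get("To", "")),
        "cc": str(msg.get("Cc", "")),
        "body": body.get_content() if body is not None else "",
    }


def build_solr_update_request(docs: list, solr_update_url: str) -> urllib.request.Request:
    """Build (but do not send) a JSON update request for Solr."""
    return urllib.request.Request(
        solr_update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then a matter of passing the request to urllib.request.urlopen, for example with a URL such as http://localhost:8983/solr/mail/update?commit=true. Prefixing the id with the mailbox keeps copies of the same message in different mailboxes from overwriting each other in the index.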