Friday, 9 December 2011

Apache Solr: DisMax and multiValued

At maxHeap, we try our best to speed up data delivery to the end user. MySQL alone just isn't fast enough, so we use Apache Solr. At its core Solr uses Lucene (both are Apache projects) to deliver data lightning fast. So where does Solr fit into CommonFloor? Well, auto-suggest and all search on CommonFloor are powered by Solr.

Recent versions of Solr offer a different search handler called Disjunction Maximum, or simply DisMax. The default search handler is pretty stupid, pardon my strong language, but that's really the case. There is no simple way to search across multiple fields! Instead, you have to repeat the same query for each field you want to search. Developers worked around this with the copyField directive, which copies the contents of a source field into a destination field, so that every query could be run against one catch-all field. But then data grew more complex, and a new requirement appeared: not all fields carry the same weight. Some are less important and some are very important. For example, the title is very important, while the URL is not so much. The copyField approach can't express this, because everything ends up in a single field. Developers needed something smarter and more robust, and that's where DisMax and eDisMax (Extended DisMax) come in.

DisMax lets you execute one query across multiple fields while giving each field a different weight (called a boost). This makes complex searches possible, with a robust way of boosting and selecting results. DisMax parameters such as qf (Query Fields, where the fields and their boosts are listed), mm (Minimum Should Match), and pf and ps (Phrase Fields and Phrase Slop) allow advanced matching criteria and let you rank the results exactly how you want. I could go on forever about DisMax and its uses, but I'll leave it to you to explore!
More about DisMax: http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/
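
To make the qf/mm/pf/ps parameters and the "disjunction maximum" idea a bit more concrete, here is a minimal Python sketch. It only builds the request parameters and models the scoring in a simplified way; it does not talk to a Solr server. The field names (title, url), the boost values, the mm value and the sample query are illustrative assumptions, and the "raw" per-field scores are hypothetical stand-ins for Lucene's real relevance scores. The combination rule follows Lucene's DisjunctionMaxQuery: a document scores its best-matching field's score plus tie times the scores of its other matching fields.

```python
from urllib.parse import urlencode


def dismax_params(user_query, field_boosts, mm="75%", phrase_boosts=None, ps=2, tie=0.0):
    """Build the query parameters for a Solr DisMax request.

    field_boosts maps a field name to its boost, e.g. {"title": 10.0, "url": 0.5}.
    """
    params = {
        "defType": "dismax",  # select the DisMax query parser
        "q": user_query,      # plain user input, searched across every qf field
        "qf": " ".join(f"{f}^{b:g}" for f, b in field_boosts.items()),  # fields + boosts
        "mm": mm,             # minimum should match
        "tie": tie,           # tie breaker weight for the non-best fields
    }
    if phrase_boosts:
        # pf boosts documents whose fields contain the query terms as a phrase;
        # ps is the phrase slop allowed for that match
        params["pf"] = " ".join(f"{f}^{b:g}" for f, b in phrase_boosts.items())
        params["ps"] = ps
    return params


def boosted_scores(raw_scores, field_boosts):
    """Apply qf boosts to hypothetical per-field match scores (only qf fields count)."""
    return [raw_scores[f] * b for f, b in field_boosts.items() if f in raw_scores]


def dismax_score(field_scores, tie=0.0):
    """Disjunction max: best field score plus tie times the other fields' scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


if __name__ == "__main__":
    boosts = {"title": 10.0, "url": 0.5}
    params = dismax_params("apartment bangalore", boosts, phrase_boosts={"title": 20.0})
    print("/select?" + urlencode(params))

    # A weak title match still beats a strong URL match once boosts are applied.
    scores = boosted_scores({"title": 0.4, "url": 0.9}, boosts)
    print(scores, dismax_score(scores, tie=0.1))
```

Because only the best field counts in full, a boosted title match outranks an unboosted URL match instead of the two simply being lumped together, and the per-field boosts are exactly the information a single copyField catch-all field loses.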

The second, and I'd say more important, change was upgrading the schema (Solr has schemas just like databases, though they differ significantly). While upgrading from Solr 1.3 to the latest version, we ran into a lot of trouble, and with incomplete documentation we were on our own. The problem: Solr supports multi-valued fields (similar to arrays), which behave differently from normal single-valued fields. Apart from holding several values, the main difference is that Solr cannot sort on a multi-valued field. The Solr schema specifies every detail of the fields: how they should be processed, which filters to apply, how to parse the values and so on, along with one very important attribute, the schema version number.

Since our Solr setup was pretty outdated, the schema version was set to '0.1', which directed Solr to apply the rules of the oldest schema version, '1.0'. Under version '1.0', all fields are multiValued by nature, so even after we set multiValued="false" on each field, Solr still treated them as multiValued! We spent a lot of time chasing errors without figuring it out. Then it clicked! One very simple change, setting the schema version to '1.1', solved the problem, because version '1.1' tells Solr that multiValued is false by default. This is completely undocumented, which is why we had such a hard time finding such a small fix!
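
The version rule above can be sketched in a few lines of Python. This is a simplified model, not Solr's own parsing code: it reads a hypothetical schema.xml (the field names and types are made up for illustration), treats every field as multiValued when the version is below 1.1, ignoring the attribute as we observed, and otherwise honors multiValued with false as the default. It ignores defaults inherited from field types and assumes version 1.0 when the attribute is missing.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragment with the outdated version number.
SCHEMA = """<schema name="example" version="0.1">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="price" type="float" indexed="true" stored="true"/>
  </fields>
</schema>"""


def effective_multivalued(schema_xml):
    """Return {field_name: is_multivalued} under the simplified version rule."""
    root = ET.fromstring(schema_xml)
    version = float(root.get("version", "1.0"))  # assumption: missing means 1.0
    result = {}
    for field in root.iter("field"):
        name = field.get("name")
        if version < 1.1:
            # Version 1.0 rules: every field is multiValued, attribute ignored
            result[name] = True
        else:
            # Version 1.1+: the attribute is honored and defaults to false
            result[name] = field.get("multiValued", "false").lower() == "true"
    return result


def unsortable_fields(schema_xml):
    """Solr cannot sort on multiValued fields, so list those."""
    return sorted(name for name, mv in effective_multivalued(schema_xml).items() if mv)


if __name__ == "__main__":
    print(unsortable_fields(SCHEMA))
    print(unsortable_fields(SCHEMA.replace('version="0.1"', 'version="1.1"')))
```

Run against the sample schema, every field is flagged as unsortable under version 0.1, and only tags remains once the version is bumped to 1.1.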
