Friday, June 28, 2013

Full-Text Search - Part 1

The journey with full-text search has not been an easy one. Two and a half years ago, we began looking for a faster way to do searches across multiple related database tables. Our primary target was to improve the performance of reporting on student data in our Learning Management System, NexPort Campus. We were already using the Castle ActiveRecord library to map our C# classes to our database tables with attributes. ActiveRecord is built upon NHibernate, a popular .NET Object Relational Mapper (ORM). Because we were already used to attribute mapping, it made sense to try to use a similar toolset for full-text search. Enter NHibernate Search.

NHibernate Search was an extension of NHibernate built upon Lucene.NET, a .NET port of the Java full-text search engine Apache Lucene. Similar to ActiveRecord, NHibernate Search used attribute mapping to designate what should be included in the documents stored in Lucene.NET's document database. (A document is a collection of text fields that is stored in a denormalized way to make queries faster.)
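To make the idea of a denormalized document concrete, here is a minimal sketch in plain Python. It does not use Lucene.NET or NHibernate Search; the records, field names, and the `build_document` and `search` helpers are all hypothetical and only illustrate how related rows are flattened into one set of text fields so a query needs no joins.

```python
# Hypothetical, simplified records; these names are illustrative only
# and are not taken from the NexPort Campus schema.
users = {1: {"name": "Ada Lovelace", "email": "ada@example.com"}}
sections = {10: {"title": "Fire Safety 101"}}
enrollments = [{"id": 100, "user_id": 1, "section_id": 10, "status": "Completed"}]


def build_document(enrollment):
    """Flatten one enrollment and its related rows into a single set of text fields."""
    user = users[enrollment["user_id"]]
    section = sections[enrollment["section_id"]]
    return {
        "id": str(enrollment["id"]),
        "status": enrollment["status"],
        "user.name": user["name"],
        "user.email": user["email"],
        "section.title": section["title"],
    }


documents = [build_document(e) for e in enrollments]


def search(field, term):
    """Naive case-insensitive substring match on one field.

    A real engine would consult an inverted index instead of scanning.
    """
    return [d for d in documents if term.lower() in d[field].lower()]


print(search("user.name", "ada"))
```

The trade-off is that every copy of the user's name lives inside each document that embeds it, so a change to the user means rewriting all of those documents.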

At the time of our design implementation, Lucene.NET was several versions behind its Java parent. This should have been a red flag, as should the fact that NHibernate Search had not had a new release in some time. Despite these troubling indicators, we plowed on. We started by mapping out all of the properties required to sustain our previous reporting SQL backend. Our model is quite complex, so this was no easy task. Primitive types and simple objects such as String and DateTime used a Field attribute, and user-defined objects used an IndexedEmbedded attribute. In addition to the basic attributes required by NHibernate Search, we also had to write separate IFieldBridge implementations and include the FieldBridge attribute on each property. Needless to say, our class files exploded with non-intuitive code.

NHibernate Search used the attributes to drive its listeners, which determined when an object had changed and needed to be re-indexed. If a related object changed, it would then trigger the next object to be processed, all in the same session. In our case, one of our indexed objects was a training record object, the section enrollment. If a user object changed, it would trigger both itself and all related subscriptions to be re-indexed, which then triggered all section enrollments to be re-indexed. This led to a very large problem in our production system, which I will detail in a bit.
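The fan-out described above can be sketched as a graph walk. This is an illustration in plain Python under made-up numbers, not NHibernate Search's listener code: the `related` graph, the object keys, and the `reindex_cascade` helper are hypothetical.

```python
from collections import deque

# Hypothetical object graph: one user with 3 subscriptions,
# each subscription with 4 section enrollments.
related = {"user:1": [f"subscription:{s}" for s in range(1, 4)]}
for s in range(1, 4):
    related[f"subscription:{s}"] = [f"enrollment:{s}-{e}" for e in range(1, 5)]


def reindex_cascade(changed):
    """Return every object that gets re-indexed when `changed` is saved,
    in breadth-first order, visiting each object once."""
    seen = {changed}
    queue = deque([changed])
    order = []
    while queue:
        current = queue.popleft()
        order.append(current)
        for dependent in related.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return order


work = reindex_cascade("user:1")
print(len(work))  # 1 user + 3 subscriptions + 12 enrollments = 16
```

Even in this toy graph a single change to one user produces sixteen units of indexing work, all within the same session as the original save.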

The whole idea of this undertaking was to decrease load on the database server while making search and reporting results faster. To that end, we put the indexing work on a separate machine. To communicate the documents to be indexed, we used Microsoft Message Queuing (MSMQ) and wrote our own backend queue processor factory for NHibernate Search. When an object changed, it would be translated by NHibernate Search into a LuceneWork object. These LuceneWork objects were then serialized into a packet that MSMQ could handle. If the packet was too large, it was split into multiple packets and re-assembled on the other side. MSMQ worked fine when the machines were on the same domain. However, when we went to our Beta system, cross-domain issues began to crop up. After hours of research and trial-and-error, we were finally able to solve the problem by tweaking the Global Catalog in the domain controller.
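The split-and-reassemble step can be sketched as follows. This is a generic illustration in plain Python, not our MSMQ code: the packet layout, the `split_into_packets` and `reassemble` helpers, the JSON serialization, and the tiny size limit are all assumptions chosen to keep the example short.

```python
import json

# Deliberately tiny so the example produces several packets;
# this is an illustrative value, not a real queue limit.
MAX_PACKET_BYTES = 64


def split_into_packets(payload, message_id, max_size=MAX_PACKET_BYTES):
    """Cut a serialized payload into numbered packets no larger than max_size."""
    chunks = [payload[i:i + max_size] for i in range(0, len(payload), max_size)] or [b""]
    return [
        {"message_id": message_id, "index": i, "total": len(chunks), "body": chunk}
        for i, chunk in enumerate(chunks)
    ]


def reassemble(packets):
    """Rebuild the payload from packets that may arrive out of order."""
    ordered = sorted(packets, key=lambda p: p["index"])
    if len(ordered) != ordered[0]["total"]:
        raise ValueError("missing packets; cannot reassemble yet")
    return b"".join(p["body"] for p in ordered)


# Hypothetical stand-in for a batch of serialized index work.
work = [{"op": "add", "entity": "SectionEnrollment", "id": i} for i in range(10)]
payload = json.dumps(work).encode("utf-8")

packets = split_into_packets(payload, message_id="batch-1")
restored = json.loads(reassemble(reversed(packets)))  # simulate out-of-order arrival
print(len(packets), restored == work)
```

Carrying the index and total in every packet lets the receiver detect a missing piece and wait rather than index a truncated batch.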

To make reads even faster, we implemented a Master-Slave relationship with our indexes. One master index was for writes, and there could be one or more slave indexes to read from. In our first attempt, we used Microsoft's Distributed File System (DFS) to keep the slaves updated from the master. We quickly ran into file-locking problems, so we went to a code-based synchronization solution. We used the Microsoft.Synchronization namespace to replicate the data, ignoring specific files that were causing locking problems.
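A one-way copy that skips troublesome files might look like the sketch below. It uses Python's standard library rather than the Microsoft.Synchronization namespace, and the ignored pattern (`*.lock`), the file names, and the `replicate_index` helper are assumptions for illustration; the files we actually excluded are not listed here.

```python
import shutil
import tempfile
from pathlib import Path

# Assumption: lock files are the ones to skip.
IGNORED_PATTERNS = ("*.lock",)


def replicate_index(master, replica):
    """Copy the master index directory to a replica, skipping ignored files.

    This only adds or overwrites files; it does not delete files that
    have disappeared from the master.
    """
    shutil.copytree(
        master,
        replica,
        ignore=shutil.ignore_patterns(*IGNORED_PATTERNS),
        dirs_exist_ok=True,  # requires Python 3.8+
    )


with tempfile.TemporaryDirectory() as tmp:
    master = Path(tmp) / "master"
    replica = Path(tmp) / "replica"
    master.mkdir()
    (master / "segments_1").write_text("segment data")
    (master / "write.lock").write_text("")

    replicate_index(master, replica)
    copied = sorted(p.name for p in replica.iterdir())

print(copied)
```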

The file synchronization code was the last piece of the puzzle. After spending months working on the new full-text search reporting backend, it was finally time to release the product. Remember the large problem I mentioned earlier? Well, as soon as users started logging into the system, the extra processing added by NHibernate Search brought the server to its knees. It took minutes for people to do something as simple as log in to the system. We immediately had to turn off the listeners for NHibernate Search and re-release NexPort Campus. It was a complete and utter disaster.

The moral of this story is not that NHibernate Search is the devil. The main problem with this solution was over-engineering. Trying to avoid future work by cobbling together too many third-party components that just did not fit well together was short-sighted and ended up being more work in the end. It made for ugly code and an unmaintainable system.

In the weeks following our disastrous release, another developer and I began to think of ways to offload the extra processing. We had some good ideas for it and were in the process of making the changes when priorities changed. Full-text search sat in an unused, half-finished state for nearly two years. When the idea came up to improve the search capability and performance of our user management system, we revisited the full-text search solution. That's when we discovered the holy grail of full-text search, Apache Solr.

For the story of how Solr saved the day, please stay tuned for my next post, Full-Text Search - Part 2.

About NexPort Solutions Group

NexPort Solutions Group is a division of Darwin Global, LLC, a systems and software engineering company that provides innovative, cost-effective training solutions and support for federal, state and local government, as well as the private sector.

