12.31.09
The Eureka Moment
As the semester finally wrapped up, I found that my list of things to write about had begun to explode. However, at the very top was something I discussed with several people when talking about my software engineering project: the eureka moment. Epiphany.
When writing software, you often analyze several ways to accomplish a task and embark on what seems like the best one; but on occasion, the stars align and the meaning of life becomes crystal clear. Your solution is so perfect and elegant that you know, even feel, that it is right and it shouldn’t be any other way. I detail my experience with this below.
First, let me describe the problem. I was tasked with writing the search subsystem of our project (which was not a trivial task if you check our schema). I was excited for the task, and I spent a great deal of time working on the details. I decided to do a filter/restriction approach, where people would specify the data they wanted to search (filter) and the data they wanted to find (restrictions). Using this method, you could add multiple filters and even multiple restrictions per filter quite easily. Given our schema, I was not immediately sure how it was going to come together. However, in order to get something semi-workable early on for other members to work with I wrote a simple, yet fairly flexible query generator. It worked quite well when searching a single table, as it could handle multiple restrictions, and would prove quite useful later.
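To make the single-table generator concrete, here is a minimal sketch in Python with SQLite. It is not the project's actual code: the `build_query` function, the `courses` table, and its columns are hypothetical names chosen for illustration, and I am assuming restrictions are simply ANDed together.

```python
import sqlite3

# Operators the generator will accept; anything else is rejected.
ALLOWED_OPS = {"=", "!=", "<", "<=", ">", ">=", "LIKE"}

def build_query(table, restrictions):
    """Build a parameterized SELECT against one table.

    restrictions is a list of (column, operator, value) tuples
    that are ANDed together (an assumption of this sketch).
    """
    clauses, params = [], []
    for column, op, value in restrictions:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        if not column.isidentifier():
            raise ValueError(f"suspicious column name: {column}")
        clauses.append(f"{column} {op} ?")  # value is bound, never inlined
        params.append(value)
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

# Hypothetical demo data, not the real class database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (title TEXT, department TEXT, credits INTEGER)")
conn.executemany(
    "INSERT INTO courses VALUES (?, ?, ?)",
    [("Software Engineering", "CS", 4),
     ("Intro Seminar", "CS", 1),
     ("Calculus I", "MATH", 4)],
)

sql, params = build_query("courses", [("department", "=", "CS"), ("credits", ">=", 3)])
matches = conn.execute(sql, params).fetchall()
```

Each extra restriction is just another tuple in the list, which is roughly why a generator like this stays simple as long as everything lives in one table.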
While discussing the project with Allyn, we decided it might save some time to use Sphinx, an open source search engine. I then spent hours installing it, working with examples and trying to set it up to search our fresh, semi-complete class database. About ten or fifteen hours of work later, I determined it was great for searching text fields, but it did not suit our project’s needs.
Frustrated, I took some time off to fiddle with other things that needed to be done. Suddenly I realized there was serendipity in my Sphinx research: I had followed a tutorial that forced me to use views, a mechanism in SQL that lets you compose data from multiple sources into what looks like a single table. I knew of views, but I had never had an actual use for them. Then it hit me like a brick wall: I knew how to search a single table easily, and with views I could easily bring all of the data I desired into a single table! I nearly shouted “Eureka!” as, in a very brief moment, I mentally saw all of the pieces fall into place. As I had not committed any of my Sphinx-related code, I took a chance: I deleted ten-plus hours of work and hacked up some code. In two hours I cut the first version of the search system we used for the rest of the project (revision 113 out of 243).
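The view trick can be sketched as follows, again in Python with SQLite. The three tables, the `search_view` name, and the sample rows are all made up for this example and are not our real schema; the point is only that once the joins live inside the view, the search itself is a plain single-table query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables standing in for a multi-table class schema.
CREATE TABLE courses     (id INTEGER PRIMARY KEY, title TEXT, department TEXT);
CREATE TABLE instructors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sections    (id INTEGER PRIMARY KEY, course_id INTEGER,
                          instructor_id INTEGER, room TEXT);

INSERT INTO courses     VALUES (1, 'Software Engineering', 'CS'),
                               (2, 'Calculus I', 'MATH');
INSERT INTO instructors VALUES (1, 'Grace Hopper'), (2, 'Emmy Noether');
INSERT INTO sections    VALUES (1, 1, 1, 'B12'), (2, 2, 2, 'A03');

-- The view hides the joins and presents one flat, searchable table.
CREATE VIEW search_view AS
SELECT s.id AS section_id,
       c.title,
       c.department,
       i.name AS instructor,
       s.room
FROM sections s
JOIN courses c     ON c.id = s.course_id
JOIN instructors i ON i.id = s.instructor_id;
""")

# Searching the view looks exactly like searching a single table.
rows = conn.execute(
    "SELECT title, instructor FROM search_view "
    "WHERE department = ? AND instructor LIKE ?",
    ("CS", "%Hopper%"),
).fetchall()
```

If a later filter needs another column, only the `SELECT` inside the view changes; the query that searches it stays the same shape.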
Later changes were mostly trivial additions or refactoring. Just as I composed all of the data into a single table for searches, I also took all of the data we needed to present and combined it in the same way (the views are here). With this done, we mostly had cosmetic changes and testing to deal with. I integrated verification functions and created a paging system to simplify work on the user interface. But the true feature of this system was its flexibility, which sadly was the biggest thing we forgot to mention to the class during our presentation (a story for another time). Adding a new filter to both the user interface and search is a matter of adding a row to the filters table in the database, and if any more search data is required you merely update the view. By changing one line of SQL and adding a row to a table, you can rapidly create filters, remove them, deactivate or reactivate them, and change or update them. Nothing is hard-coded. In fact, we removed some functionality because we only had data from one semester: the filters for semester and year were dropped since all data was from Spring 2010.
Search was the core functionality that drove the project, and without it the system would be pointless, so I was very excited to see it come together. It still has some issues, but for only a few weeks' worth of work I am impressed with the system we built. Feel free to give it a test run: http://allynbauer.com/seproject. If you get a dinosaur, try hitting the link again.
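Here is a rough sketch of how a filters table can drive the search so that nothing is hard-coded. It is written in Python with SQLite, and the `filters` columns (`name`, `label`, `column_name`, `active`), the helper functions, and the data are assumptions for illustration, not the layout we actually shipped. A plain table named `search_view` stands in for the real view to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-in for the flattened search view (a plain table for brevity).
CREATE TABLE search_view (title TEXT, department TEXT, instructor TEXT);
INSERT INTO search_view VALUES
    ('Software Engineering', 'CS',   'Grace Hopper'),
    ('Databases',            'CS',   'Edgar Codd'),
    ('Calculus I',           'MATH', 'Emmy Noether');

-- Each row describes one filter: its UI label and the column it searches.
CREATE TABLE filters (name TEXT PRIMARY KEY, label TEXT,
                      column_name TEXT, active INTEGER DEFAULT 1);
INSERT INTO filters VALUES ('dept', 'Department', 'department', 1);
""")

def active_filters(conn):
    """Map filter name -> view column for every active filter."""
    cur = conn.execute("SELECT name, column_name FROM filters WHERE active = 1")
    return dict(cur.fetchall())

def search(conn, criteria):
    """criteria maps filter names to the values the user entered."""
    columns = active_filters(conn)
    clauses, params = [], []
    for name, value in criteria.items():
        if name not in columns:
            raise KeyError(f"unknown or inactive filter: {name}")
        clauses.append(f"{columns[name]} = ?")  # column comes from the table
        params.append(value)
    sql = "SELECT title FROM search_view"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY title"
    return [row[0] for row in conn.execute(sql, params)]

cs_titles = search(conn, {"dept": "CS"})

# Adding a filter is one INSERT; no search code changes.
conn.execute("INSERT INTO filters VALUES ('teacher', 'Instructor', 'instructor', 1)")
codd_titles = search(conn, {"dept": "CS", "teacher": "Edgar Codd"})
```

Deactivating a filter in this sketch is a one-row `UPDATE filters SET active = 0 ...`, after which `search` rejects it and a user interface built from the same table would stop showing it.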
Brian said,
January 1, 2010 at 7:00 pm
Gah, there are definitely times I hate having to deal with search.
We used Xapian in one of our larger projects, and our new Advertising Order Entry system is mostly using Drupal Views with filters.
–Brian
Nick said,
January 2, 2010 at 3:54 pm
My God! You are alive! You should get on IM more, I could have used your help numerous times during this project! Plus, there are some entrepreneurship-related topics I’d like to discuss with you.
Search is an interesting beast. Xapian looks interesting at first glance. How did that turn out for you?
Brian said,
January 2, 2010 at 8:37 pm
I forgot my AIM password, just remembered it the other day actually.
Pretty well, it was nice in that it supported English word stemming and was MUCH MUCH faster than the full text index method Drupal used. Our site had over four hundred thousand nodes, so Drupal’s search system often took 5 or 6 minutes to perform a search. Xapian cut that search time to something around 5-15 seconds. Solr was likely somewhat faster, but Xapian was easier to implement as it uses flat files for its index. Solr would have been another separate daemon for me to administer if I went that route.