Search-driven solutions, NoSQL or RDBMS?
Since 2009, “NoSQL” has been the new buzzword on the net. But what does it mean, what are the benefits, and how does it compare to search-driven or RDBMS solutions?
Traditional relational databases (RDBMS) are typically not very scalable and perform poorly in some data-intensive operations.
Web 2.0 applications are typically very data intensive and do not scale well on traditional databases, at least not at a reasonable cost.
So there is a great need for a new technology to replace (or at least complement) the RDBMS in some applications, and in many cases that would be a NoSQL technology. Knowledge XChanger is a typical target application for NoSQL, and when we extended the product's capacity to handle millions of documents (instead of a few hundred thousand), that need became a necessity.
Some would argue that NoSQL technologies are not databases at all, but rather highly available data stores. The major benefit is that NoSQL can handle far larger quantities of data than an RDBMS.
NoSQL is a broad term covering many different kinds of software specialized in different areas, such as document stores, graph databases and key-value stores. In a sense RDBMS is also a subset of the definition, but what about search engines? I would argue that a search engine like Apache Solr / Lucene is a NoSQL application. The only differences I see are that the schemas might not be as flexible in Solr and that NoSQL does not provide search functionality, at least not for the number of documents we expect to handle. If we need to scale even more, to query terabytes of data, there are examples of Solr + Hadoop clusters containing thousands of servers that do the job nicely, but frankly that’s beyond my current horizon.
While developing the next generation of Knowledge XChanger, we discovered that moving away from SQL solved many performance problems and made it both easy and affordable to scale horizontally. In some cases we found Apache Solr to be up to 10,000 times faster than MS SQL Server for similar operations on the same hardware. The most surprising part is that the transition to Solr has been so easy and quick to implement. The next version of KXC will still use SQL for storing statistics and relations, but in combination with Solr this has far less of a negative effect on performance. Maybe a future version will be based on NoSQL and Solr only? At this time I believe NoSQL applications to be a bit too immature to base a commercial application on, if there are other solutions available.
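One plausible reason for gaps of this kind on text lookups is the difference between scanning every row and looking a term up in an inverted index, which is the data structure Lucene (and therefore Solr) is built around. This is an assumption on my part: the comparison above does not specify which operations were measured. The following is a minimal Python sketch of the idea, not the KXC implementation; the sample documents and all function names are hypothetical, and SQLite stands in for a relational database.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample documents; not taken from Knowledge XChanger.
DOCS = {
    1: "NoSQL stores scale horizontally",
    2: "Relational databases use SQL",
    3: "Solr builds on Lucene for search",
    4: "Search engines scale with an inverted index",
}


def sql_scan(term):
    """Find documents with a LIKE '%term%' query, which must check every row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO docs VALUES (?, ?)", DOCS.items())
    rows = conn.execute(
        "SELECT id FROM docs WHERE body LIKE ? ORDER BY id", (f"%{term}%",)
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


def build_inverted_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, body in docs.items():
        for word in body.lower().split():
            index[word].add(doc_id)
    return index


def index_lookup(index, term):
    """Find documents with one dictionary lookup instead of a scan."""
    return sorted(index.get(term.lower(), set()))


INDEX = build_inverted_index(DOCS)
print(sql_scan("scale"))             # [1, 4]
print(index_lookup(INDEX, "scale"))  # [1, 4]
```

The two approaches are not strictly equivalent: the LIKE query matches substrings (searching for "sql" also finds "NoSQL"), while this toy index matches whole words only. The point is that the index does the expensive work once, when documents are added, so that each query stays cheap as the number of documents grows.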
By: Anders Thulin, CTO Comintelli AB