Entries Tagged 'SQL Performance Tuning' ↓

Philly ETE – Database as a Value – Summary

This was the first time I’ve seen Rich Hickey’s talk on Datomic, which lent great clarity to the product. As implemented, Datomic functions as an immutable database for philosophical reasons, although in practice it doesn’t manage it’s own storage, and may eventually support deletions to satisfy legal and compliance issues around privacy.

This database technology aims to solve common problems encountered in relational database systems: inability to reason about the current state of data, concurrency issues, challenges of viewing data as of a given time, mismatched metaphors between code and data, and various frustrating design patterns. Making data immutable and adding a time dimension to data allows a number of great simplifications, and produces an index resembling git history (although unlike git, you can’t rewrite history).

There are a number of possible implementations – naive approaches would copy data or build append-logs (as is typical in database rewrite logs), but the Datomic structures bear study, as they are all based on wide, short trees. Because data nodes are immutable, the contents of a tree won’t change, but only have new data added. This lets the db track novelty, and arranged as they have it currently, it appears that you can easily traverse the history as a range scan. Each tree-node has a series of pointers to other tree locations, based on various pre-indexed ways to traverse the tree.

Architecturally, this can have many readers, which don’t require coordination (a query would specify the database/db version it queries against), but a single writer.

Data elements are designed to resemble facts (e.g. :noun :verb :o bject), so one might think of this as resembling a cell or row in a traditional database. Because these are immutable, they can also be cached anywhere, or cached split across machines, so a database could have arbitrary subsets of data in memory – query results are a merge-join of memory and disk storage. As I understood it, data is read into memory as segments (pages) like traditional databases.

This is designed to take a functional approach to database programming, so instead of running a query inside a database, you run a function on a database, e.g. f(db) -> results. This has great potential – ease of mocking data, running hypothetical inserts to see the result, passing custom code in database queries. Notably, this is not necessarily intended to work well for very large data (click streams) or non-functional uses of a database (i.e. anything that uses a db as a global variable, such as a counter).

Philly ETE – Druid Column Oriented Database

I attended a talk at Philly ETE by Metamarkets, a company doing real-time analytics for advertising. Having worked on a couple Oracle-based reporting projects, I entered with interest. Their system is built around dimensional modeling, although with atypically high volume inserts and low latency for updating reports. They attempted to build this system with a traditional relational database (star schema + pre-computed fact subsets), and then when that didn’t work out, they tried using a key-value store (!) with many possible queries pre-computed.

Rather than purchasing a database from EMC (Greenplum) or Oracle, they decided to roll their own, which apparently is cheaper(?!). Likely, the reason this works is that there are an increasing number of OSS components, such as Zookeeper, Kafka, Hive, etc, so one can construct a bundled database product (a similar model to HortonWorks). Their system is called Druid, and was recently released as an open-source project, although as of this writing they admit it’s not the easiest thing to configure and stand up.

Unsurprisingly, the solution they built makes extensive use of bitmap indexes. Fact table data is partitioned into batches of up to a 1-2 million rows – each dimension is produced per batch, then query results are merged. This limits the cardinality of dimensions – each bitmap index will have no more unique values than rows in the partition. Bitmap indexes are nice, in that they can be ANDed and ORed directly in large batches, with one bit per unique value, so are much smaller than the source data. In fact, this data is also compressed in a way which can allow for boolean operations to work – Metamarkets claims the resulting index is 30-40% the size of an inverted index. Incidentally, there is a handy Java implementation of this compression algorithm.

This system appears to allow much wider scaling than a typical RDBMS (e.g. 100 nodes), and mitigates network I/O by tracking which nodes control a segment of data (they’re not the only DB vendor who has this – perhaps it’s a feature of zookeeper?). Data segments are also considered immutable – when changes are made, a new segment is created, the old one soft-deleted, and background jobs run nightly to clean up duplicates/leftover data.

More information here-

http://metamarkets.com/2011/druid-part-i-real-time-analytics-at-a-billion-rows-per-second/

http://metamarkets.com/druid/

Data Warehousing, NoSQL, and the Cloud

With the nascent advent of NoSql, cloud computing and slick new databases, we seem to have forgotten from whence we came. I went to a conference recently on the open source search product Solr/Lucene. One of the keynote speakers, Chief Data Scientist of HortonWorks, discussed what turned him to NoSQL databases, in this case, a failed project to track every click on walmart.com in Oracle.

For all it’s idiosyncrasies and irritations Oracle (the database) is an incredibly powerful and versatile product, a power most projects do not fully use. Hortonworks appears to be trying to follow the same path as Oracle (the company), from consulting company product to vast riches. Even though many projects do not tap full featureset or power of Oracle, it is still preferred in some companies for the supposed safety of an expensive support contract. This is probably true of SQL Server as well, but I’m less experienced in that area. In the same way, I doubt few who use NoSQL solutions fully realize the power or place to use the tools available.

I think it’s worth looking at how we got here. Oracle has traditionally sells single purpose, high powered and very expensive machines, a poor fit for a scrappy web startup. The variety of configuration and installation options is overwhelming to the point that they sell pre-built HP boxes with Oracle and the OS configured for you.

Through a maze of acquisitions Oracle likely owns a company that can meet any need, if only you can figure out where to look on their website. When I last talked to their sales reps for a non database product, their preferred pricing model was revenue sharing, which to me sounds like a terrible proposal, unless you own a company that exists to lose money.

If you pay enough, Oracle will assign someone to fix your problems. When I last worked on a data warehouse, I typically found a database defect every other week, some with patches available and some without. We were running a “small” data warehousing system, recording a couple hundred million records.
Other companies in the city were well known to have larger databases,for various purposes. Had I wished for a different rendition of this project, I could well have moved to a payroll company, a grocery chain, or a computer manufacturer.

The challenge of building such a system is not unique to Oracle. Tuning queries on Postgres, in my experience, typically results in two orders of magnitude performance improvement, vs. one in Oracle. This appears to be Postgres lacking numerous micro-optimizations, while generally a solid, cleanly-designed product.

If you find this interesting and read this far- stay tuned for more. You can subscribe using RSS or the email subscription in the upper right.

Diagnosing Connection Leaks in Node.js and Postgres

In building a website scraper with Chrome and Node.js, I made mistakes that led to connection leaks. In this application, the scraper runs in a browser and connects to a node.js server, which saves data off to a database.

Once you know what the issues look like, they are easy to see, but otherwise often difficult to diagnose without code inspection. The database driver, nodepg, provides two modes for connection management. You either use built-in connection pooling, or manage connections on your own. As with connection management in any language,  the distinction can be subtle, even when the result is drastically different. If you use the nodepg callback API, the connection is naturally closed on its own:

var pg = require('pg'); 
var conString = "tcp://postgres:1234@localhost/postgres";
pg.connect(conString, function(err, client) {
client.query("SELECT NOW() as when", function(err, result) {
}); });

On the other hand, if you construct a connection object on your own, you must manage closing it:

var client = new pg.Client(conString);
client.connect();
var query = client.query("...");
// must keep track of client and close it...
query.on('end', function() { client.end(); });

If you refactor code between the methods, you must take care to manage the process carefully. The key to finding the issue is to eliminate servers from the mix. There are numerous limited resources that can run out in this environment – e.g. database connections, net TCP connections, TCP connections per process, memory, or per-process open files. Some require OS configuration, while others are triggered by application defects.

To narrow the issue, I used Apache Bench (ab), installed through XAMPP, to eliminate the front-end (browser/parser). It’s a little tricky to send post requests correctly with ab, so I proxied test requests through Fiddler, which is a little tricky it itself. The biggest win was installing pgbouncer on the NodeJS server. On it’s own, this eliminates most of the memory issues (even with connection leaks!), in spite nodepg having it’s own connection pool. The real value though, is the log message that occurs when instantiating a connection fails – you can clearly see whether the failure was connection to pgbouncer, or connection out from pgbouncer to the database.

I wish the API tracked where connections were created, then logged unreleased sessions when the request ended. This is fairly easy to set up in Java/Tomcat in a finalizer. In theory, the web server should be able to clear memory easily after a request finishes. This is likely a significant architectural downside to a single threaded web server like NodeJS. On the other hand, it forces you to fix all the issues, which may be better long term.

To detect this type of issue, my script needs to run for a while. My application typically died after parsing about 5,000 pages. After fixing the issue, the application makes it to at least 15,000 pages, and uses about 80% less memory – it switches from being memory to CPU bound.