First stats with Tokyo Cabinet
Today we started testing Tokyo Cabinet as our DBM for the new design. We had some very good references about it, so we thought we should give it a try.
After setting up Tokyo Cabinet and its Python binding, plus Tokyo Tyrant (the database server) and its Python bindings, we ran some quick tests. We drafted a new schema-less design for the new database and dumped part of some old data into Tokyo Cabinet.
For those not familiar with the term schema-less, it’s basically a database that has no table structure; that is, everything is stored as a (key, value) pair. On the one hand, a key-value database is much faster to read and write; on the other, it’s much harder to maintain and keep in sync.
So, we ran some queries (read-only operations) against both databases, and this is what we saw:
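To make the (key, value) idea concrete, here is a minimal sketch that uses a plain Python dict as a stand-in for the key-value database. The key format, the field names and the helper functions are all assumptions made for illustration; they are not the schema we actually drafted.

```python
import json

# A plain dict stands in for the key-value database (assumption for illustration).
store = {}


def make_key(feed_id, entry_id):
    """Build a composite key; this key format is hypothetical."""
    return f"feed:{feed_id}:entry:{entry_id}"


def put_entry(feed_id, entry_id, fields):
    """Store one entry as a JSON string under its composite key."""
    key = make_key(feed_id, entry_id)
    store[key] = json.dumps(fields)
    return key


def get_entry(feed_id, entry_id):
    """Fetch and decode one entry, or return None if the key is absent."""
    raw = store.get(make_key(feed_id, entry_id))
    return json.loads(raw) if raw is not None else None


# No table structure: two records of the same kind can carry different fields.
put_entry(1, 10, {"title": "Hello", "url": "http://example.com/a"})
put_entry(1, 11, {"title": "World"})
```

Nothing in the store enforces which fields a record has, which is what makes reads and writes cheap and also what makes the data harder to keep consistent.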
Test 1:
- All data from a feed (MySQL): 0.01699 s
- Partial data from a feed (TC): 0.00174 s
This first test wasn’t really fair, as MySQL had to retrieve all fields per record, while TC just had to access a bunch of buckets with fewer fields. We ran it anyway because it reflects the real scenario: we currently retrieve many more fields from a feed than we should, so the new query under TC is faster not only because of the database but also because it is much more lightweight.
Anyway, we modified the test so that both queries retrieved the same fields per row:
Test 2:
- Partial data from a feed (MySQL): 0.00346 s
- Partial data from a feed (TC): 0.00151 s
Here the two are much closer, although TC is still about twice as fast. Again, this isn’t really fair, as MySQL executes a single query while we issue several requests to TC. So, we changed the TC query into a multiget request (requesting several keys at the same time):
Test 3:
- Partial data from a feed (MySQL): 0.003533 s
- Partial data from a feed (TC with Multiget): 0.000845 s
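The difference between one request per key and a single multiget can be sketched with a small in-memory stand-in for a remote server. The class below is hypothetical and only counts simulated round trips; it is not the Tokyo Tyrant client API, and it measures no real network time.

```python
class FakeRemoteStore:
    """In-memory stand-in for a networked key-value server (hypothetical)."""

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0  # number of simulated requests sent to the server

    def get(self, key):
        """One request per key."""
        self.round_trips += 1
        return self._data.get(key)

    def multi_get(self, keys):
        """One request for many keys; missing keys are simply left out."""
        self.round_trips += 1
        return {k: self._data[k] for k in keys if k in self._data}


data = {f"entry:{i}": f"value {i}" for i in range(50)}
keys = list(data)

# Fetch the 50 keys one at a time.
one_by_one = FakeRemoteStore(data)
values_a = {k: one_by_one.get(k) for k in keys}

# Fetch the same 50 keys in a single batched request.
batched = FakeRemoteStore(data)
values_b = batched.multi_get(keys)

print(one_by_one.round_trips, batched.round_trips)  # prints: 50 1
```

Both approaches return the same values; the batched one just pays the per-request overhead once instead of once per key.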
Under equal circumstances it’s clear which one is faster: with multiget, TC was roughly four times faster than MySQL. So, I think we’ll continue experimenting with Tokyo Cabinet on more real data and see how it performs.
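For reference, the ratios implied by the three tests can be worked out directly from the timings reported above. The snippet below is only arithmetic on those numbers; the variable and function names are made up for the example.

```python
# (MySQL seconds, TC seconds) as reported in the three tests above.
timings = {
    "Test 1": (0.01699, 0.00174),
    "Test 2": (0.00346, 0.00151),
    "Test 3": (0.003533, 0.000845),
}


def speedup(mysql_s, tc_s):
    """How many times faster the TC query was than the MySQL one."""
    return mysql_s / tc_s


speedups = {name: speedup(m, t) for name, (m, t) in timings.items()}

for name, ratio in speedups.items():
    print(f"{name}: TC is {ratio:.1f}x faster")
```

This gives roughly 9.8x, 2.3x and 4.2x. Keep in mind that Test 1 compares different sets of fields, so only Tests 2 and 3 are like-for-like.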
