Posts Tagged schemaless
Tokyo Tyrant and some numbers
Tokyo Tyrant is the database server that uses Tokyo Cabinet as its backend and lets you access the database remotely. It supports three protocols: its own binary protocol, memcached and HTTP. This is great if you already have infrastructure that speaks one of them.
We needed a PHP class that implemented the protocol, so we took a look at two of them: Net_TokyoTyrant with Pete Warden’s patch, and Tyrant by Bertrand Mansion. The first one supports the HTTP and binary protocols, while Tyrant only supports the raw binary protocol.
During the first tests, Net_TokyoTyrant went crazy when inserting over 28,000 records over HTTP, so I guess there’s something wrong with that code path. When we switched to the binary protocol it worked as expected.
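To give an idea of why memcached compatibility is handy, here is a minimal sketch of what a memcached text-protocol request looks like on the wire. The function names and the sample key are ours, made up for illustration; only the `set` and `get` line formats come from the memcached text protocol. How a given server treats the flags and expiry fields is not covered here.

```python
def memcached_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build a memcached text-protocol 'set' request (hypothetical helper)."""
    # Command line: set <key> <flags> <exptime> <bytes>\r\n
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    # The data block follows the command line and is terminated by \r\n.
    return header + value + b"\r\n"


def memcached_get(key: str) -> bytes:
    """Build a memcached text-protocol 'get' request (hypothetical helper)."""
    return f"get {key}\r\n".encode("ascii")


if __name__ == "__main__":
    # "feed:1" is an invented example key.
    print(memcached_set("feed:1", b"hello"))
    print(memcached_get("feed:1"))
```

Any existing memcached client that produces requests like these could, in principle, be pointed at the server without new client code.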
Here are some quick numbers:
Net_TokyoTyrant (100000 keys)
Time inserted: 50.3662779331 secs
Time retrieved: 57.7555668354 secs
Time deleted: 34.1996 secs
Tyrant (100000 keys)
Time inserted: 39.330272913 secs
Time retrieved: 44.3433589935 secs
Time deleted: 26.9360201359 secs
The latter, Tyrant, is faster on all three operations (it takes a bit over 20% less time), so I guess we’ll go for it. It’s also a plus that the author keeps it up to date!
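For anyone who wants to compare the two classes more directly, the timings above can be turned into throughput and relative speed with a few lines of Python. This is just arithmetic on the numbers we posted; the dictionary layout and function names are ours.

```python
KEYS = 100_000  # number of keys used in both runs

# Timings in seconds, copied from the results above.
timings = {
    "Net_TokyoTyrant": {"insert": 50.3662779331, "retrieve": 57.7555668354, "delete": 34.1996},
    "Tyrant": {"insert": 39.330272913, "retrieve": 44.3433589935, "delete": 26.9360201359},
}


def ops_per_second(seconds: float, n: int = KEYS) -> float:
    """Average number of operations completed per second."""
    return n / seconds


def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """How many times faster the second timing is than the first."""
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    for op in ("insert", "retrieve", "delete"):
        net = timings["Net_TokyoTyrant"][op]
        tyr = timings["Tyrant"][op]
        print(f"{op}: Net_TokyoTyrant {ops_per_second(net):.0f} ops/s, "
              f"Tyrant {ops_per_second(tyr):.0f} ops/s, "
              f"speedup {speedup(net, tyr):.2f}x")
```

Keep in mind these are single runs, so small differences shouldn’t be read too closely.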
The Inkzee Team
First stats with Tokyo Cabinet
Today we started testing Tokyo Cabinet as our DBM for the new design. We had some very good references about it, so we thought we should give it a try.
After setting up Tokyo Cabinet and its Python binding, plus Tokyo Tyrant (the db server) with its Python bindings too, we did some quick tests. We drafted a new schema-less design for the new database and dumped part of some old data into Tokyo Cabinet.
For those not familiar with the term, a schema-less database is basically one that has no table structure: everything is stored as a (key, value) tuple. On one hand, a key-value database is much faster to read and write; on the other, it’s much harder to maintain and keep in sync.
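To make the trade-off concrete, here is a minimal sketch of a key-value layout, using a plain Python dict as a stand-in for the database. The key naming scheme (`post:<id>`, `feed:<id>:posts`) and the JSON encoding are assumptions for illustration, not our actual design.

```python
import json


def put_post(store: dict, feed_id: int, post_id: int, post: dict) -> None:
    """Store a post and keep the per-feed index in sync by hand."""
    # The record itself: one key, one serialized value.
    store[f"post:{post_id}"] = json.dumps(post)
    # There are no tables or joins, so the feed -> posts relation is our job.
    index_key = f"feed:{feed_id}:posts"
    post_ids = json.loads(store.get(index_key, "[]"))
    if post_id not in post_ids:
        post_ids.append(post_id)
    store[index_key] = json.dumps(post_ids)


def posts_for_feed(store: dict, feed_id: int) -> list:
    """Read the index, then fetch each post by key."""
    post_ids = json.loads(store.get(f"feed:{feed_id}:posts", "[]"))
    return [json.loads(store[f"post:{pid}"]) for pid in post_ids]


if __name__ == "__main__":
    db = {}  # stand-in for the key-value database
    put_post(db, 7, 1, {"title": "Hello"})
    put_post(db, 7, 2, {"title": "World"})
    print(posts_for_feed(db, 7))
```

Reads are simple lookups by key, but every write has to update the index as well. That second write is exactly the "keep in sync" burden a relational schema would otherwise carry for you.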
So, we ran some queries (read-only operations) against both databases and this is what we saw:
Test 1:
- All data from a feed (MySQL): 0.01699 s
- Partial data from a feed (TC): 0.00174 s
This first test wasn’t really fair, as MySQL had to retrieve all fields per record, while TC just had to access a bunch of buckets with fewer fields. We ran it anyway because it reflects the real scenario: we currently retrieve many more fields from a feed than we should, so the new query under TC is faster not only because of the database but also because it’s much more lightweight.
Anyway, we modified the test so that both queries retrieved the same fields per row:
Test 2:
- Partial data from a feed (MySQL): 0.00346 s
- Partial data from a feed (TC): 0.00151 s
Here we can see that the two are much closer. Again, this isn’t really fair, as MySQL executes just one query while we issue several against TC. So, we changed the TC query into a multiget request (requesting several keys at the same time):
Test 3:
- Partial data from a feed (MySQL): 0.003533 s
- Partial data from a feed (TC with Multiget): 0.000845 s
Under equal circumstances it’s clear which one is faster. So, I think we’ll continue experimenting with Tokyo Cabinet and some more real data and see how it performs.
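The difference between several single-key requests and one multiget can be sketched with a toy store that counts round trips. The class and method names here are invented for illustration and are not the API of any Tokyo Tyrant binding; a real client would pay network latency for every round trip.

```python
class CountingStore:
    """In-memory stand-in for a remote key-value server."""

    def __init__(self, data: dict):
        self._data = dict(data)
        self.round_trips = 0  # how many requests the "server" has seen

    def get(self, key):
        """Fetch one key: one request per call."""
        self.round_trips += 1
        return self._data.get(key)

    def multi_get(self, keys):
        """Fetch many keys in a single request; missing keys are skipped."""
        self.round_trips += 1
        return {k: self._data[k] for k in keys if k in self._data}


if __name__ == "__main__":
    # Invented sample keys.
    keys = [f"post:{i}" for i in range(10)]
    store = CountingStore({k: f"value {k}" for k in keys})

    one_by_one = {k: store.get(k) for k in keys}
    print("single gets:", store.round_trips, "round trips")

    store.round_trips = 0
    batched = store.multi_get(keys)
    print("multiget:", store.round_trips, "round trip")
    print("same result:", one_by_one == batched)
```

The data returned is the same either way; only the number of requests changes.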
When partitioning isn’t enough
These past weeks we’ve been partitioning our database design. The goal was to achieve better scalability. Because Inkzee grows with the number of feeds it holds, not the number of users, we needed to partition the data tables so that we could process feed posts faster.
After altering a lot of our current code so that it worked with the new database design, we’ve been experiencing problems with MySQL. It seems that, even though the solution makes the overall system much faster (around 3 to 4 times faster), some operations don’t play too well with MySQL and add unacceptable latency to the system.
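For readers unfamiliar with the idea, one common way to partition by feed is to map each feed id to one of a fixed number of tables. The sketch below is only an illustration of that general approach: the partition count, the `posts_NN` table names and the modulo rule are assumptions, not a description of our actual schema.

```python
NUM_PARTITIONS = 16  # hypothetical number of partitions


def partition_for(feed_id: int, num_partitions: int = NUM_PARTITIONS) -> str:
    """Return the (hypothetical) table name holding posts for this feed."""
    # All posts of a feed land in the same table, so processing one feed
    # only ever touches one partition.
    return f"posts_{feed_id % num_partitions:02d}"


if __name__ == "__main__":
    for feed_id in (3, 42, 1000):
        print(feed_id, "->", partition_for(feed_id))
```

With a rule like this, each table stays smaller as the number of feeds grows, which is the point of partitioning by feed rather than by user.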
We’ve been resisting the urge to migrate to a schema-less database but it seems we have no other option but to transition to it. So, even though we thought we could have the new design working by the end of the week, we are afraid we’ll have to postpone it until further notice. We’ll keep you guys updated though!
The Inkzee Team
