Posts Tagged database

Tokyo Tyrant and some numbers

Tokyo Tyrant is the database server that uses Tokyo Cabinet as its backend and lets you access the database remotely. It supports three protocols: its own binary protocol, a memcached-compatible protocol and HTTP. This is great if you already have infrastructure that speaks one of them.
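To give an idea of what the two text-based protocols look like on the wire, here is a minimal Python sketch that builds a memcached-style `set`/`get` and an HTTP `PUT`. The helper names (`memcached_set`, `memcached_get`, `http_put`, `send`) and the `localhost` host are our own assumptions for illustration; they are not part of Tokyo Tyrant or of any PHP class mentioned here. Port 1978 is Tokyo Tyrant's default.

```python
import socket
from urllib.parse import quote

TYRANT_PORT = 1978  # Tokyo Tyrant's default port


def memcached_set(key: str, value: bytes) -> bytes:
    """Build a memcached text-protocol 'set' command (flags and expiry left at 0)."""
    return b"set %s 0 0 %d\r\n%s\r\n" % (key.encode(), len(value), value)


def memcached_get(key: str) -> bytes:
    """Build a memcached text-protocol 'get' command."""
    return b"get %s\r\n" % key.encode()


def http_put(key: str, value: bytes, host: str = "localhost") -> bytes:
    """Build an HTTP PUT request that stores value under /key."""
    head = (
        f"PUT /{quote(key, safe='')} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Length: {len(value)}\r\n"
        "Connection: close\r\n\r\n"
    )
    return head.encode() + value


def send(request: bytes, host: str = "localhost", port: int = TYRANT_PORT) -> bytes:
    """Send one raw request and return the first chunk of the reply."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request)
        return sock.recv(4096)
```

With a server running locally on the default port, something like `send(memcached_set("feed:1", b"hello"))` followed by `send(memcached_get("feed:1"))` should round-trip a value. We have not included expected replies here.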

We needed a PHP class that implemented the protocol, so we took a look at two of them: Net_TokyoTyrant with Pete Warden’s patch, and Tyrant by Bertrand Mansion. The first one supports the HTTP and binary protocols, while Tyrant only supports the raw binary protocol.

During the first tests, Net_TokyoTyrant went crazy when inserting over 28000 records over HTTP, so I guess there’s something wrong with its HTTP support. When we switched to the binary protocol it worked as expected.

Here are some quick numbers:

Net_TokyoTyrant (100000 keys)

Time inserted: 50.3662779331 secs
Time retrieved: 57.7555668354 secs
Time deleted: 34.1996 secs

Tyrant (100000 keys)

Time inserted: 39.330272913 secs
Time retrieved: 44.3433589935 secs
Time deleted: 26.9360201359 secs

The latter is faster across the board, taking roughly 20% less time for each operation, so I guess we’ll go for it. The author also keeps it up to date, which is a big plus!
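For anyone who wants to check the arithmetic, here is a small Python sketch that works out the relative difference and the approximate throughput from the timings listed above. The function name `percent_less_time` is ours; the figures are copied from the lists.

```python
KEYS = 100000

# (Net_TokyoTyrant seconds, Tyrant seconds) for each operation, as listed above.
timings = {
    "inserted": (50.3662779331, 39.330272913),
    "retrieved": (57.7555668354, 44.3433589935),
    "deleted": (34.1996, 26.9360201359),
}


def percent_less_time(slower: float, faster: float) -> float:
    """Return how much less time the faster run took, as a percentage of the slower run."""
    return (slower - faster) / slower * 100


for operation, (net_tt, tyrant) in timings.items():
    saved = percent_less_time(net_tt, tyrant)
    rate = KEYS / tyrant  # keys per second with Tyrant
    print(f"{operation}: Tyrant takes {saved:.1f}% less time ({rate:.0f} keys/s)")
```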

The Inkzee Team


First stats with Tokyo Cabinet

Today we started testing Tokyo Cabinet as our DBM for the new design. We had some very good references about it, so we thought we should give it a try.

After setting up Tokyo Cabinet with its Python binding, and Tokyo Tyrant (the database server) with its Python bindings too, we ran some quick tests. We drafted a new schema-less design for the new database and dumped part of our old data into Tokyo Cabinet.

For those not familiar with the term, a schema-less database is basically one with no table structure: everything is stored as a (key, value) pair. On the one hand, a key-value database is much faster to read and write; on the other, it’s much harder to maintain and keep in sync.
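As a rough illustration of what that means in practice, here is a Python sketch that flattens a record into key-value pairs, using a plain dict as a stand-in for the database. The `kind:id:field` key scheme and the field names are made up for the example; they are not our actual design.

```python
def flatten(kind: str, record_id: int, fields: dict) -> dict:
    """Turn one record into one key-value pair per field."""
    return {f"{kind}:{record_id}:{name}": str(value) for name, value in fields.items()}


def delete_record(store: dict, kind: str, record_id: int) -> int:
    """Remove every key that belongs to a record and return how many were removed."""
    prefix = f"{kind}:{record_id}:"
    doomed = [key for key in store if key.startswith(prefix)]
    for key in doomed:
        del store[key]
    return len(doomed)


store = {}  # stand-in for the key-value database
store.update(flatten("feed", 42, {"title": "Example feed", "url": "http://example.com/rss"}))
store.update(flatten("feed", 43, {"title": "Another feed"}))

print(store["feed:42:title"])            # a read is a single key lookup
print(delete_record(store, "feed", 42))  # a delete has to find every related key
```

The delete shows the maintenance cost: nothing in the store knows that the keys of one record belong together, so the application has to keep them in sync itself.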

So, we ran some queries (read-only operations) against both databases and this is what we saw:

Test 1:

  • All data from a feed (MySQL):  0.01699 s
  • Partial data from a feed (TC): 0.00174 s

This first test wasn’t really fair, as MySQL had to retrieve all fields per record, while TC just had to access a bunch of buckets with fewer fields. We ran it anyway because it reflects the real scenario: we currently retrieve many more fields from a feed than we should, so the new query under TC is faster not only because of the database but also because it’s much more lightweight.

Anyway, we modified the test so that both queries retrieved the same fields per row:

Test 2:

  • Partial data from a feed (MySQL): 0.00346 s
  • Partial data from a feed (TC): 0.00151 s

Here the two are closer, although TC is still a bit over twice as fast. Again, this isn’t really fair, as MySQL is executing just one query against the several that we do with TC. So, we changed the TC query into a multiget request (requesting several keys at the same time):

Test 3:

  • Partial data from a feed (MySQL): 0.003533 s
  • Partial data from a feed (TC with Multiget): 0.000845 s

Under equivalent circumstances it’s clear which one is faster: TC with multiget takes about a quarter of the time MySQL does. So, I think we’ll continue experimenting with Tokyo Cabinet and some more real data and see how it performs.
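The difference between Test 2 and Test 3 comes down to round trips to the server. Here is a Python sketch of the idea using an in-memory stand-in that simply counts requests; `CountingStore` and its methods are hypothetical and are not the API of the Tokyo Tyrant bindings.

```python
class CountingStore:
    """In-memory stand-in for a remote key-value server that counts round trips."""

    def __init__(self, data: dict):
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key: str):
        """Fetch one key: one request per call."""
        self.round_trips += 1
        return self._data.get(key)

    def multiget(self, keys: list) -> dict:
        """Fetch many keys in a single request, skipping keys that do not exist."""
        self.round_trips += 1
        return {key: self._data[key] for key in keys if key in self._data}


data = {f"post:{n}:title": f"Title {n}" for n in range(1, 6)}
wanted = list(data)

one_by_one = CountingStore(data)
titles_a = {key: one_by_one.get(key) for key in wanted}

batched = CountingStore(data)
titles_b = batched.multiget(wanted)

print(one_by_one.round_trips, batched.round_trips)
```

Both approaches return the same data, but the batched version pays the network latency once instead of once per key.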


When partitioning isn’t enough

These past weeks we’ve been partitioning our database design. The goal was to achieve better scalability. Because Inkzee grows with the number of feeds it holds, not the number of users, we needed to partition the data tables so that we could process feed posts faster.
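As a generic illustration of partitioning by feed, here is a Python sketch that routes each feed to one of a fixed number of post tables. The modulo scheme, the partition count and the `posts_NN` table names are assumptions made for the example and do not describe our actual layout.

```python
from collections import defaultdict

NUM_PARTITIONS = 8  # hypothetical partition count


def posts_table(feed_id: int) -> str:
    """Pick the posts table for a feed, so all posts of one feed live together."""
    return f"posts_{feed_id % NUM_PARTITIONS:02d}"


def group_by_table(feed_ids: list) -> dict:
    """Group feeds by table so each table can be processed independently."""
    groups = defaultdict(list)
    for feed_id in feed_ids:
        groups[posts_table(feed_id)].append(feed_id)
    return dict(groups)


print(posts_table(4001))
print(group_by_table([1, 2, 9, 10, 17]))
```

Because the table depends only on the feed, adding users does not change where posts go, while each table stays smaller than a single table holding every post.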

After altering a lot of our current code so that it worked with the new database design, we’ve been experiencing problems with MySQL. It seems that, even though the solution makes the overall system much faster (around 3 to 4 times faster), some operations don’t play too well with MySQL and add an unacceptable latency to the system.

We’ve been resisting the urge to migrate to a schema-less database but it seems we have no other option but to transition to it. So, even though we thought we could have the new design working by the end of the week, we are afraid we’ll have to postpone it until further notice. We’ll keep you guys updated though!

The Inkzee Team


Step 2: Database redesign

As part of our milestones towards opening up Inkzee we have the database redesign. We currently manage more than 2 million posts and over 4000 blogs. And although it might not seem like a lot, our database is starting to complain. A lot of the queries we run against it are getting really sluggish.

That means that if we are to open up Inkzee we need to redesign the database so it can sustain a higher load of blogs and posts. We are currently working on it and we’ve made great progress. We have a prototype working with the new design but there are still some bugs and problems to resolve.

We hope to finish the new design sometime this week. We’ll then fire up our test cases to check that nothing is broken, and once we’re sure the new design is as flawless as we can get it, we’ll release it to you guys! Hopefully you’ll experience a much faster site, not only on a subscription-by-subscription basis but especially when you request all posts from all blogs.

We’ll keep you posted!

The Inkzee Team
