<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Inkzee &#187; database</title>
	<atom:link href="http://blog.inkzee.com/index.php/tag/database/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.inkzee.com</link>
	<description>How to read more in less time</description>
	<lastBuildDate>Wed, 12 May 2010 13:23:28 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/>
		<item>
		<title>Tokyo Tyrant and some numbers</title>
		<link>http://blog.inkzee.com/index.php/2009/06/28/tokyo-tyrant-and-some-numbers/</link>
		<comments>http://blog.inkzee.com/index.php/2009/06/28/tokyo-tyrant-and-some-numbers/#comments</comments>
		<pubDate>Sun, 28 Jun 2009 17:13:26 +0000</pubDate>
		<dc:creator>abarrera</dc:creator>
				<category><![CDATA[inkzee]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[schema-less]]></category>
		<category><![CDATA[schemaless]]></category>
		<category><![CDATA[tokyo cabinet]]></category>
		<category><![CDATA[tokyo tyrant]]></category>

		<guid isPermaLink="false">http://blog.inkzee.com/?p=66</guid>
		<description><![CDATA[Tokyo Tyrant is the database server that uses Tokyo Cabinet as its backend. It allows you to access the database remotely. It supports three protocols: binary, memcached and HTTP. This is great if you already have existing infrastructure.
We needed a PHP class that implemented the protocol so we took a look at two of them,  [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Tokyo Tyrant</strong> is the database server that uses Tokyo Cabinet as its backend. It allows you to access the database remotely. It<strong> supports three protocols</strong>: binary, memcached and HTTP. This is great if you already have existing infrastructure.</p>
<p>We needed a PHP class that implemented the protocol, so we looked at two of them: <a href="http://openpear.org/repository/Net_TokyoTyrant/">Net_TokyoTyrant</a> with Pete Warden&#8217;s patch and Tyrant by <a class="username" href="http://mamasam.indefero.net/u/golgote/">Bertrand Mansion</a>. The first one supports the HTTP and binary protocols, while Tyrant only supports the raw binary protocol.</p>
<p>During the first tests, Net_TokyoTyrant went crazy when inserting more than 28000 records over HTTP, so I guess something is wrong with its HTTP support. When we switched to the binary protocol it worked as expected.</p>
<p>Here are some quick numbers:</p>
<p><strong>Net_TokyoTyrant (100000 keys)</strong></p>
<p>Time inserted: 50.3662779331 secs<br />
Time retrieved: 57.7555668354 secs<br />
Time deleted: 34.1996 secs</p>
<p><strong>Tyrant (100000 keys)</strong></p>
<p>Time inserted: 39.330272913 secs<br />
Time retrieved: 44.3433589935 secs<br />
Time deleted: 26.9360201359 secs</p>
<p><strong>The latter, Tyrant, is faster</strong> on all three operations, so I guess we&#8217;ll go for it. Its <strong>author also keeps it up to date</strong>, which is a plus!</p>
<p><strong>The Inkzee Team</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.inkzee.com/index.php/2009/06/28/tokyo-tyrant-and-some-numbers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>First stats with Tokyo Cabinet</title>
		<link>http://blog.inkzee.com/index.php/2009/06/25/first-stats-with-tokyo-cabinet/</link>
		<comments>http://blog.inkzee.com/index.php/2009/06/25/first-stats-with-tokyo-cabinet/#comments</comments>
		<pubDate>Thu, 25 Jun 2009 02:15:33 +0000</pubDate>
		<dc:creator>abarrera</dc:creator>
				<category><![CDATA[inkzee]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[pytc]]></category>
		<category><![CDATA[pyTyrant]]></category>
		<category><![CDATA[schemaless]]></category>
		<category><![CDATA[tokyo cabinet]]></category>
		<category><![CDATA[tokyo tyrant]]></category>

		<guid isPermaLink="false">http://blog.inkzee.com/?p=62</guid>
		<description><![CDATA[Today we started testing Tokyo Cabinet as our DBM for the new design. We had some very good references about it, so we thought we should give it a try.
After setting up Tokyo Cabinet, its Python binding and Tokyo Tyrant (db server) with its Python bindings too, we did some fast tests. We drafted a [...]]]></description>
			<content:encoded><![CDATA[<p>Today we<strong> started testing <a href="http://tokyocabinet.sourceforge.net/">Tokyo Cabinet</a></strong> as our DBM for the new design. We had some very good references about it, so we thought we should give it a try.</p>
<p>After setting up Tokyo Cabinet, its Python binding and <strong><a href="http://tokyocabinet.sourceforge.net/tyrantdoc/">Tokyo Tyrant</a> (db server)</strong> with its Python bindings too, we did some fast tests. We drafted a new schema-less design for the new database and<strong> dumped part of some old data</strong> to Tokyo Cabinet.</p>
<p>For those not familiar with the term <strong>schema-less</strong>, it&#8217;s basically a database that has no table structure; that is, everything is stored as a tuple of (key, value). On the one hand, a key-value database is much faster to read and write; on the other, it&#8217;s much harder to maintain and keep in sync.</p>
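<p>To make the (key, value) idea concrete, here is a minimal sketch in Python. It assumes a plain dict as a stand-in for the store, and the key layout and the helper names <code>put_post</code> and <code>get_post</code> are hypothetical, not part of any Tokyo Cabinet binding.</p>

```python
import json

# A plain dict stands in for the key-value store (an assumption for
# illustration; a real store would persist the data).
store = {}


def put_post(feed_id, post_id, fields):
    """Store one post as a single (key, value) tuple."""
    key = "feed:%d:post:%d" % (feed_id, post_id)
    # The value is an opaque string; the store knows nothing about its fields.
    store[key] = json.dumps(fields, sort_keys=True)
    return key


def get_post(feed_id, post_id):
    """Fetch and decode one post; no table schema is consulted."""
    return json.loads(store["feed:%d:post:%d" % (feed_id, post_id)])


# Two records with different sets of fields can live side by side.
put_post(1, 10, {"title": "Hello", "url": "http://example.com/hello"})
put_post(1, 11, {"title": "Bye", "tags": ["misc"]})
```

<p>Because nothing enforces which fields a value contains, keeping related keys consistent is left to the application, which is the maintenance cost mentioned above.</p>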
<p>So, we ran some queries (<strong>read-only operations</strong>) against both databases and this is what we saw:</p>
<p><strong>Test 1:</strong></p>
<ul>
<li>All data from a feed (MySQL):  0.01699 s</li>
<li>Partial data from a feed (TC): 0.00174 s</li>
</ul>
<p>This first test wasn&#8217;t really fair, as MySQL had to retrieve all fields per record, while TC just had to access a bunch of buckets with fewer fields. We ran it anyway because it reflects the real scenario: we currently retrieve many more fields from a feed than we should, so the new query under TC is faster not only because of the database but also because it&#8217;s much more lightweight.</p>
<p>Anyway, we modified the test so that <strong>both queries retrieved the same fields per row</strong>:</p>
<p><strong>Test 2:</strong></p>
<ul>
<li>Partial data from a feed (MySQL): 0.00346 s</li>
<li>Partial data from a feed (TC): 0.00151 s</li>
</ul>
<p>Here the gap is much smaller. Again, this isn&#8217;t really fair, as MySQL executes just one query while we issue several against TC. So, we changed the TC query into a <strong>multiget request</strong> (requesting several keys at the same time):</p>
<p><strong>Test 3:</strong></p>
<ul>
<li>Partial data from a feed (MySQL): 0.003533 s</li>
<li>Partial data from a feed (TC with Multiget): 0.000845 s</li>
</ul>
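<p>The difference between several single gets and one multiget can be sketched in Python as follows. <code>CountingStore</code> is a hypothetical in-memory stand-in that only counts simulated round trips; it is not the API of any Tokyo Tyrant client.</p>

```python
class CountingStore:
    """Hypothetical in-memory stand-in that counts simulated round trips."""

    def __init__(self, data):
        self.data = dict(data)
        self.round_trips = 0

    def get(self, key):
        # One request per key.
        self.round_trips += 1
        return self.data.get(key)

    def multiget(self, keys):
        # One request for the whole batch; missing keys are skipped.
        self.round_trips += 1
        return {k: self.data[k] for k in keys if k in self.data}


store = CountingStore({"post:%d" % i: "body %d" % i for i in range(5)})
keys = ["post:%d" % i for i in range(5)]

# Fetch the keys one at a time.
one_by_one = {k: store.get(k) for k in keys}
trips_single = store.round_trips

# Fetch the same keys in a single batched request.
store.round_trips = 0
batched = store.multiget(keys)
trips_batched = store.round_trips
```

<p>Both approaches return the same data, but <code>trips_single</code> ends up at 5 while <code>trips_batched</code> is 1. Against a remote server each round trip adds network latency, which is one plausible reason the multiget variant comes out ahead.</p>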
<p>Under equivalent circumstances it&#8217;s clear which one is faster. So, I think we&#8217;ll continue experimenting with Tokyo Cabinet and some more real data and see how it performs.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.inkzee.com/index.php/2009/06/25/first-stats-with-tokyo-cabinet/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>When partitioning isn&#8217;t enough</title>
		<link>http://blog.inkzee.com/index.php/2009/06/24/when-partitioning-isnt-enough/</link>
		<comments>http://blog.inkzee.com/index.php/2009/06/24/when-partitioning-isnt-enough/#comments</comments>
		<pubDate>Wed, 24 Jun 2009 16:51:10 +0000</pubDate>
		<dc:creator>abarrera</dc:creator>
				<category><![CDATA[inkzee]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[partition]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[schema]]></category>
		<category><![CDATA[schemaless]]></category>
		<category><![CDATA[sharding]]></category>

		<guid isPermaLink="false">http://blog.inkzee.com/?p=59</guid>
		<description><![CDATA[These past weeks we&#8217;ve been partitioning our database design. The goal was to achieve better scalability. Because Inkzee grows with the number of feeds it holds, not the users, we needed to partition the data tables so that we could process feed posts faster.
After altering a lot of our current code so that it worked [...]]]></description>
			<content:encoded><![CDATA[<p>These past weeks we&#8217;ve been partitioning our database design. The goal was to <strong>achieve better scalability</strong>. Because Inkzee grows with the number of feeds it holds, not the users, we needed to partition the data tables so that we could process feed posts faster.</p>
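<p>As an illustration of partitioning by feed, here is a minimal Python sketch of one common scheme, hash partitioning on the feed id. This is an assumption for illustration only: the partition count, the function names and the <code>posts_pN</code> table naming are all hypothetical and do not describe the scheme Inkzee actually uses.</p>

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(feed_id):
    """Map a feed id to a partition using a stable hash (CRC32)."""
    return zlib.crc32(str(feed_id).encode("utf-8")) % NUM_PARTITIONS


def table_for(feed_id):
    """Name of the hypothetical per-partition posts table."""
    return "posts_p%d" % partition_for(feed_id)
```

<p>All posts of a given feed land in the same partition, so processing one feed only touches one smaller table.</p>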
<p>After altering a lot of our current code so that it worked with the new database design, we&#8217;ve been experiencing problems with MySQL. It seems that, even though the solution makes the overall system much faster (roughly 3 to 4 times faster), <strong>some operations don&#8217;t play too well</strong> with MySQL and add an unacceptable latency to the system.</p>
<p>We&#8217;ve been resisting the urge to<strong> migrate to a schema-less database</strong> but it seems we have no other option but to transition to it. So, even though we thought we could have the new design working by the end of the week, we are afraid we&#8217;ll have to <strong>postpone it until further notice</strong>. We&#8217;ll keep you guys updated though!</p>
<p><strong>The Inkzee Team</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.inkzee.com/index.php/2009/06/24/when-partitioning-isnt-enough/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Step 2: Database redesign</title>
		<link>http://blog.inkzee.com/index.php/2009/06/15/step-2-database-redesign/</link>
		<comments>http://blog.inkzee.com/index.php/2009/06/15/step-2-database-redesign/#comments</comments>
		<pubDate>Mon, 15 Jun 2009 16:17:19 +0000</pubDate>
		<dc:creator>abarrera</dc:creator>
				<category><![CDATA[inkzee]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[milestone]]></category>
		<category><![CDATA[redesign]]></category>

		<guid isPermaLink="false">http://blog.inkzee.com/?p=56</guid>
		<description><![CDATA[As part of our milestones towards opening up Inkzee we have the database redesign. We currently manage more than 2 million posts and over 4000 blogs. And although it might not seem like a lot, our database is starting to complain. A lot of the queries we do against it are getting really sluggish.
That means [...]]]></description>
			<content:encoded><![CDATA[<p>As part of our milestones towards opening up Inkzee we have the<strong> database redesign</strong>. We currently manage more than <strong>2 million posts and over 4000 blogs</strong>. And although it might not seem like a lot, our database is starting to complain. A lot of the queries we do against it are getting really sluggish.</p>
<p>That means that if we are to open up Inkzee we need to <strong>redesign the database so it can sustain a higher load of blogs and posts</strong>. We are currently working on it and we&#8217;ve made great progress. We have a prototype working with the new design but there are still some bugs and problems to resolve.</p>
<p>We hope the new design is finished sometime during this week. We&#8217;ll then fire up our test cases and check nothing is broken, and once we&#8217;re sure the new design is as flawless as we can get it, we&#8217;ll release it to you guys! Hopefully you&#8217;ll experience a much faster site, not only on a subscription-by-subscription basis but especially when you request all posts from all blogs.</p>
<p>We&#8217;ll keep you posted!</p>
<p><strong>The Inkzee Team</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.inkzee.com/index.php/2009/06/15/step-2-database-redesign/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
