<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Inkzee &#187; schemaless</title>
	<atom:link href="http://blog.inkzee.com/index.php/tag/schemaless/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.inkzee.com</link>
	<description>How to read more in less time</description>
	<lastBuildDate>Wed, 12 May 2010 13:23:28 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/>
		<item>
		<title>Tokyo Tyrant and some numbers</title>
		<link>http://blog.inkzee.com/index.php/2009/06/28/tokyo-tyrant-and-some-numbers/</link>
		<comments>http://blog.inkzee.com/index.php/2009/06/28/tokyo-tyrant-and-some-numbers/#comments</comments>
		<pubDate>Sun, 28 Jun 2009 17:13:26 +0000</pubDate>
		<dc:creator>abarrera</dc:creator>
				<category><![CDATA[inkzee]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[schema-less]]></category>
		<category><![CDATA[schemaless]]></category>
		<category><![CDATA[tokyo cabinet]]></category>
		<category><![CDATA[tokyo tyrant]]></category>

		<guid isPermaLink="false">http://blog.inkzee.com/?p=66</guid>
		<description><![CDATA[Tokyo Tyrant is the database server that uses Tokyo Cabinet as its backend. It allows you to access the database remotely. It supports 3 protocols: binary, memcached and HTTP. This is great if you already have existing infrastructure.
We needed a PHP class that implemented the protocol, so we took a look at two of them, [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Tokyo Tyrant</strong> is the database server that uses Tokyo Cabinet as its backend. It allows you to access the database remotely. It <strong>supports 3 protocols</strong>: binary, memcached and HTTP. This is great if you already have existing infrastructure.</p>
<p>We needed a PHP class that implemented the protocol, so we took a look at two of them: <a href="http://openpear.org/repository/Net_TokyoTyrant/">Net_TokyoTyrant</a> with Pete Warden&#8217;s patch and Tyrant by <a class="username" href="http://mamasam.indefero.net/u/golgote/">Bertrand Mansion</a>. The first one supports the HTTP and binary protocols, while Tyrant only supports the raw binary protocol.</p>
<p>During the first tests, Net_TokyoTyrant misbehaved when inserting more than 28000 records over HTTP, so I guess something is wrong with its HTTP support. When we switched to the binary protocol it worked as expected.</p>
<p>Here are some quick numbers:</p>
<p><strong>Net_TokyoTyrant (100000 keys)</strong></p>
<p>Time inserted: 50.3662779331 secs<br />
Time retrieved: 57.7555668354 secs<br />
Time deleted: 34.1996 secs</p>
<p><strong>Tyrant (100000 keys)</strong></p>
<p>Time inserted: 39.330272913 secs<br />
Time retrieved: 44.3433589935 secs<br />
Time deleted: 26.9360201359 secs</p>
<p><strong>Tyrant is faster</strong> in all three operations, taking roughly 20% less time in each, so I guess we&#8217;ll go for it. It&#8217;s also especially important that the <strong>author keeps it up to date</strong>, which is a plus!</p>
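<p>For reference, a harness of roughly this shape produces per-phase timings like the ones above. This is only a sketch, not the script we ran: it is written in Python rather than PHP, a plain dict stands in for the Tyrant connection, and the helper name <code>time_phase</code> is made up for the example.</p>

```python
import time

def time_phase(func, keys):
    """Call func once per key and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    for key in keys:
        func(key)
    return time.perf_counter() - start

# Assumption: a dict replaces the real database connection.
store = {}
keys = ["key%d" % i for i in range(100000)]

def insert(key):
    store[key] = "value"

# Each phase runs over the same keys, in the same order as the results above.
timings = {}
timings["inserted"] = time_phase(insert, keys)
timings["retrieved"] = time_phase(store.get, keys)
timings["deleted"] = time_phase(store.pop, keys)

for phase in ("inserted", "retrieved", "deleted"):
    print("Time %s: %s secs" % (phase, timings[phase]))
```

<p>Swapping the dict operations for calls on a real client is what turns this into an actual benchmark; the in-memory numbers it prints say nothing about either PHP class.</p>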
<p><strong>The Inkzee Team</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.inkzee.com/index.php/2009/06/28/tokyo-tyrant-and-some-numbers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>First stats with Tokyo Cabinet</title>
		<link>http://blog.inkzee.com/index.php/2009/06/25/first-stats-with-tokyo-cabinet/</link>
		<comments>http://blog.inkzee.com/index.php/2009/06/25/first-stats-with-tokyo-cabinet/#comments</comments>
		<pubDate>Thu, 25 Jun 2009 02:15:33 +0000</pubDate>
		<dc:creator>abarrera</dc:creator>
				<category><![CDATA[inkzee]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[pytc]]></category>
		<category><![CDATA[pyTyrant]]></category>
		<category><![CDATA[schemaless]]></category>
		<category><![CDATA[tokyo cabinet]]></category>
		<category><![CDATA[tokyo tyrant]]></category>

		<guid isPermaLink="false">http://blog.inkzee.com/?p=62</guid>
		<description><![CDATA[Today we started testing Tokyo Cabinet as our DBM for the new design. We had heard very good things about it, so we thought we should give it a try.
After setting up Tokyo Cabinet, its Python binding and Tokyo Tyrant (db server) with its Python bindings too, we did some quick tests. We drafted a [...]]]></description>
			<content:encoded><![CDATA[<p>Today we <strong>started testing <a href="http://tokyocabinet.sourceforge.net/">Tokyo Cabinet</a></strong> as our DBM for the new design. We had heard very good things about it, so we thought we should give it a try.</p>
<p>After setting up Tokyo Cabinet, its Python binding and <strong><a href="http://tokyocabinet.sourceforge.net/tyrantdoc/">Tokyo Tyrant</a> (db server)</strong> with its Python bindings too, we did some quick tests. We drafted a new schema-less design for the new database and <strong>dumped part of some old data</strong> to Tokyo Cabinet.</p>
<p>For those not familiar with the term <strong>schema-less</strong>, it&#8217;s basically a database that has no table structure; that is, everything is stored as a tuple of (key, value). On the one hand, a key-value database is much faster to read and write; on the other hand, it&#8217;s much harder to maintain and keep in sync.</p>
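<p>As a rough illustration of the idea (a sketch only: a plain Python dict stands in for the key-value store, and key names such as <code>feed:42:title</code> are invented for this example, not our actual layout), a row that would have lived in a table becomes a handful of independent keys:</p>

```python
# Assumption: a plain dict stands in for the key-value store.
store = {}

def put_record(store, kind, record_id, fields):
    """Flatten one 'row' into independent (key, value) pairs."""
    for name, value in fields.items():
        store["%s:%s:%s" % (kind, record_id, name)] = value

def get_fields(store, kind, record_id, names):
    """Read back only the requested fields, one key per field."""
    return dict((name, store["%s:%s:%s" % (kind, record_id, name)]) for name in names)

put_record(store, "feed", 42, {"title": "Example feed", "url": "http://example.com/rss"})

# Fetch a single field without touching the rest of the record.
partial = get_fields(store, "feed", 42, ["title"])
```

<p>Nothing ties the keys of one record together, which is the &#8220;harder to keep in sync&#8221; part: removing or renaming a field means touching every affected key by hand.</p>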
<p>So, we ran some queries (<strong>read-only operations</strong>) against both databases and this is what we saw:</p>
<p><strong>Test 1:</strong></p>
<ul>
<li>All data from a feed (MySQL):  0.01699 s</li>
<li>Partial data from a feed (TC): 0.00174 s</li>
</ul>
<p>This first test wasn&#8217;t really fair, as MySQL had to retrieve all fields per record, while TC just had to access a bunch of buckets with fewer fields. We ran it anyway because it reflects the real scenario: we currently retrieve many more fields from a feed than we should, so the new query under TC is faster not only because of the database but also because it is much more lightweight.</p>
<p>Anyway, we modified the test so that <strong>both queries retrieved the same fields per row</strong>:</p>
<p><strong>Test 2:</strong></p>
<ul>
<li>Partial data from a feed (MySQL): 0.00346 s</li>
<li>Partial data from a feed (TC): 0.00151 s</li>
</ul>
<p>Here the two are much closer, although TC is still a bit more than twice as fast. Again, this isn&#8217;t really fair, as MySQL executes just one query while we issue several against TC. So, we changed the TC query into a <strong>multiget request</strong> (requesting several keys at the same time):</p>
<p><strong>Test 3:</strong></p>
<ul>
<li>Partial data from a feed (MySQL): 0.003533 s</li>
<li>Partial data from a feed (TC with Multiget): 0.000845 s</li>
</ul>
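<p>To make the difference between the two access patterns concrete, here is a minimal sketch. The <code>CountingStore</code> class and its <code>get</code> and <code>multiget</code> methods are hypothetical stand-ins written for this example; they are not the pytc or pyTyrant API, and the round-trip counter only models the number of requests a remote server would see.</p>

```python
class CountingStore(object):
    """In-memory stand-in for a remote key-value server (hypothetical)."""

    def __init__(self, data):
        self.data = dict(data)
        self.round_trips = 0

    def get(self, key):
        # One request per key.
        self.round_trips += 1
        return self.data.get(key)

    def multiget(self, keys):
        # One request for the whole batch; missing keys are skipped.
        self.round_trips += 1
        return dict((k, self.data[k]) for k in keys if k in self.data)

keys = ["post:%d:title" % i for i in range(5)]
db = CountingStore((k, "title %d" % i) for i, k in enumerate(keys))

# Several single-key requests, as in Test 2.
one_by_one = dict((k, db.get(k)) for k in keys)
trips_single = db.round_trips

# One batched request, as in Test 3.
db.round_trips = 0
batched = db.multiget(keys)
trips_batched = db.round_trips
```

<p>Both approaches return the same data; the batched version simply pays the per-request overhead once instead of once per key.</p>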
<p>Under equivalent circumstances it&#8217;s clear which one is faster: TC with multiget takes roughly a quarter of the time. So, I think we&#8217;ll continue experimenting with Tokyo Cabinet and some more real data and see how it performs.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.inkzee.com/index.php/2009/06/25/first-stats-with-tokyo-cabinet/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>When partitioning isn&#8217;t enough</title>
		<link>http://blog.inkzee.com/index.php/2009/06/24/when-partitioning-isnt-enough/</link>
		<comments>http://blog.inkzee.com/index.php/2009/06/24/when-partitioning-isnt-enough/#comments</comments>
		<pubDate>Wed, 24 Jun 2009 16:51:10 +0000</pubDate>
		<dc:creator>abarrera</dc:creator>
				<category><![CDATA[inkzee]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[partition]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[schema]]></category>
		<category><![CDATA[schemaless]]></category>
		<category><![CDATA[sharding]]></category>

		<guid isPermaLink="false">http://blog.inkzee.com/?p=59</guid>
		<description><![CDATA[These past weeks we&#8217;ve been partitioning our database design. The goal was to achieve better scalability. Because Inkzee grows with the number of feeds it holds, not the number of users, we needed to partition the data tables so that we could process feed posts faster.
After altering a lot of our current code so that it worked [...]]]></description>
			<content:encoded><![CDATA[<p>These past weeks we&#8217;ve been partitioning our database design. The goal was to <strong>achieve better scalability</strong>. Because Inkzee grows with the number of feeds it holds, not the number of users, we needed to partition the data tables so that we could process feed posts faster.</p>
<p>After altering a lot of our current code so that it worked with the new database design, we&#8217;ve been experiencing problems with MySQL. It seems that, even though the solution makes the overall system much faster (about 3 to 4 times faster), <strong>some operations don&#8217;t play too well</strong> with MySQL and add unacceptable latency to the system.</p>
<p>We&#8217;ve been resisting the urge to <strong>migrate to a schema-less database</strong>, but it seems we have no option but to make the transition. So, even though we thought we could have the new design working by the end of the week, we are afraid we&#8217;ll have to <strong>postpone it until further notice</strong>. We&#8217;ll keep you guys updated though!</p>
<p><strong>The Inkzee Team</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.inkzee.com/index.php/2009/06/24/when-partitioning-isnt-enough/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
