High Scalability - Pomegranate - Storing Billions and Billions of Tiny Little Files
Pomegranate is a novel distributed file system built over distributed tabular storage that acts an awful lot like a NoSQL system. It's targeted at increasing the performance of tiny object access in order to support applications like online photo and micro-blog services, which require high concurrency, high throughput, and low latency. Their tests seem to indicate it works:
<blockquote> <p> We have demonstrated that a file system over tabular storage performs well for highly concurrent access. In our test cluster, we observed <strong>linearly</strong> increasing throughput of more than <strong>100,000</strong> aggregate read and write requests served per second (<span class="caps">RPS</span>). </p> </blockquote>
<p> Rather than sitting atop the file system like almost every other K-V store, Pomegranate is baked into the file system. The idea is that the file system API is common to every platform, so it wouldn't require a separate API to use. Every application could use it out of the box. </p>
<p> The features of Pomegranate are: </p>
<ul>
<li> It handles billions of small files efficiently, even in one directory; </li>
<li> It provides a separate and scalable caching layer, which can be snapshot-able; </li>
<li> The storage layer uses a log-structured store to absorb small file writes and make full use of the disk bandwidth; </li>
<li> It builds a global namespace for both small files and large files; </li>
<li> Columnar storage to exploit temporal and spatial locality; </li>
<li> Distributed extendible hash to index metadata; </li>
<li> Snapshot-able and reconfigurable caching to increase parallelism and tolerate failures; </li>
<li> Pomegranate should be the first file system that is built over tabular storage, and the building experience should be valuable to the file system community. </li>
</ul>
<div class="posterous_quote_citation"> via <a href="http://highscalability.com/blog/2010/8/30/pomegranate-storing-billions-and-billions-of-tiny-little-fil.html">highscalability.com</a> </div> <p> Very cool technology. 
This reminded me of a distributed filesystem Google Tech Talk (<a href="http://www.youtube.com/watch?v=3xKZ4KGkQY8">http://www.youtube.com/watch?v=3xKZ4KGkQY8</a>) on Wuala (<a href="http://www.wuala.com/">http://www.wuala.com/</a>), which I found fascinating for all the little problems they had to overcome to make it work. </p>
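<p> To make the "distributed extendible hash to index metadata" item from the feature list a bit more concrete, here is a minimal single-process sketch of extendible hashing in Python. It is not Pomegranate's code: the class names, the bucket capacity, the CRC32 hash, and the metadata values are all assumptions made for illustration. It only shows the core idea that a full bucket is split on its own, so growing the index never means rehashing every entry. In a distributed setting the buckets would presumably live on different servers. </p>

```python
import zlib


class Bucket:
    """A fixed-capacity bucket that knows how many hash bits it covers."""

    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.items = {}


class ExtendibleHash:
    """Illustrative in-memory extendible hash table (hypothetical names)."""

    HASH_BITS = 32  # CRC32 yields 32 bits; beyond that, splitting cannot help

    def __init__(self, bucket_capacity=4):
        self.bucket_capacity = bucket_capacity
        self.global_depth = 0
        self.directory = [Bucket(0)]  # always 2 ** global_depth entries

    @staticmethod
    def _hash(key):
        # Deterministic hash so results are stable across runs.
        return zlib.crc32(key.encode("utf-8"))

    def _bucket_for(self, key):
        # The low `global_depth` bits of the hash select the directory slot.
        mask = (1 << self.global_depth) - 1
        return self.directory[self._hash(key) & mask]

    def get(self, key, default=None):
        return self._bucket_for(key).items.get(key, default)

    def put(self, key, value):
        while True:
            bucket = self._bucket_for(key)
            has_room = len(bucket.items) < self.bucket_capacity
            exhausted = bucket.local_depth >= self.HASH_BITS
            if key in bucket.items or has_room or exhausted:
                bucket.items[key] = value
                return
            self._split(bucket)  # then retry against the new layout

    def _split(self, bucket):
        # Double the directory only when this bucket already uses every bit.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bit = bucket.local_depth
        low, high = Bucket(bit + 1), Bucket(bit + 1)
        # Redistribute entries by the next hash bit.
        for k, v in bucket.items.items():
            target = high if (self._hash(k) >> bit) & 1 else low
            target.items[k] = v
        # Repoint only the directory slots that referenced the old bucket.
        for i, b in enumerate(self.directory):
            if b is bucket:
                self.directory[i] = high if (i >> bit) & 1 else low


# Usage: index metadata for many small files (file names are made up).
index = ExtendibleHash(bucket_capacity=4)
for i in range(1000):
    index.put(f"photo_{i}.jpg", {"size": i})
```

<p> Each split touches a single bucket plus the directory slots that pointed to it, which is the property that makes this kind of index attractive when a single directory holds a very large number of entries. </p>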