High Scalability – High Scalability – Pomegranate – Storing Billions and Billions of Tiny Little Files

Pomegranate is a novel distributed file system built over distributed tabular storage that acts an awful lot like a NoSQL system. It’s targeted at increasing the performance of tiny object access in order to support applications like online photo and micro-blog services, which require high concurrency, high throughput, and low latency. Their tests seem to indicate it works:

We have demonstrate that file system over tabular storage performs well for highly concurrent access. In our test cluster, we observed linearly increased more than 100,000 aggregate read and write requests served per second (RPS). 

Rather than sitting atop the file system like almost every other K-V store, Pomegranate is baked into file system. The idea is that the file system API is common to every platform so it wouldn’t require a separate API to use. Every application could use it out of the box.

The features of Pomegranate are:

  • It handles billions of small files efficiently, even in one directory;
  • It provide separate and scalable caching layer, which can be snapshot-able;
  • The storage layer uses log structured store to absorb small file writes to utilize the disk bandwidth;
  • Build a global namespace for both small files and large files;
  • Columnar storage to exploit temporal and spatial locality;
  • Distributed extendible hash to index metadata;
  • Snapshot-able and reconfigurable caching to increase parallelism and tolerant failures;
  • Pomegranate should be the first file system that is built over tabular storage, and the building experience should be worthy for file system community. 

Very cool technology. This reminded me of a distributed filesystem Google Tech Talk (http://www.youtube.com/watch?v=3xKZ4KGkQY8) on Wuala (http://www.wuala.com/) that I found fascinating for all the little problems they had to overcome to make this work.

Posted via email from Sijin Joseph