Gunkan: The Landscape of Object Storage

Storage paradigm: Object vs. File vs. Block

Here we only consider storage for unstructured data. Many other storage technologies exist, but you won't find them on this page. In particular, there won't be a word about the many flavors of NoSQL services that could be seen as "just" storage services.

Block Storage

At its bare minimum, storing data is just persisting bytes on a physical medium, presented to the application as a volume: just a box for bytes, split into fixed-size blocks that you access through offsets managed by the application. Want to share it? Then you need to expose the same volume to several applications and manage the concurrency yourself. The most common usage is to "format" (wahooooo) the volume and then expose a file system (cf. below).
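To make "offsets managed by the application" concrete, here is a minimal Python sketch. It assumes a 4096-byte block size and uses an ordinary temporary file as a stand-in for a raw volume (a real device such as /dev/sdX would require privileges); `write_block` and `read_block` are hypothetical helpers, not part of any storage API.

```python
import tempfile

BLOCK_SIZE = 4096  # assumed block size; devices commonly use 512 or 4096 bytes


def write_block(f, index: int, data: bytes) -> None:
    """Write one fixed-size block; the application computes the offset itself."""
    if len(data) > BLOCK_SIZE:
        raise ValueError("data larger than one block")
    f.seek(index * BLOCK_SIZE)
    f.write(data.ljust(BLOCK_SIZE, b"\0"))  # pad to a full block


def read_block(f, index: int) -> bytes:
    """Read one fixed-size block at the offset derived from its index."""
    f.seek(index * BLOCK_SIZE)
    return f.read(BLOCK_SIZE)


# A temporary file stands in for the volume in this sketch.
with tempfile.TemporaryFile() as volume:
    write_block(volume, 3, b"hello")
    print(read_block(volume, 3).rstrip(b"\0"))  # b'hello'
```

Nothing here says which blocks are free, who owns them, or how two writers avoid overwriting each other: that bookkeeping is exactly what a file system adds on top.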

Well-known technologies include SATA, SAS, SAN, NVMe-oF and AoE (ATA-over-Ethernet).

File Storage

Volumes are fine but not very practical, since every application ends up endlessly re-developing its own organization and sharing semantics. The notion that then appeared was the filesystem, together with the POSIX standard.

Technologies: NAS, POSIX. Protocols: NFS, SMB/CIFS (e.g. Samba). Solutions: BeeGFS, (IBM) GPFS, (DDN) Lustre, GlusterFS, RozoFS, GekkoFS, DelveFS, IPFS, Tahoe-LAFS.

Object Storage

The new kid on the block.

Purpose of Object Storage

Yet another trade-off

With prices constantly evolving and new technologies breaking through (and then evolving), the cursor deciding where to place compute and where to place storage has moved through many different consensuses. It started with DC-only computers connected to "light clients". Then, depending on both the price and the availability of compute and network technologies, we moved to "heavy clients", came back to "rich clients", and entered the "cloud era" with its "data warehouses", "data lakes" and many more buzzwords.

Breaking the coupling with the system.

I was too young in the industry to have witnessed the rise of Object Storage before 2000, so I cannot tell whether the benefit of breaking the coupling with the underlying system was seen as that huge back then. But this is _THE_ point that makes the difference: when an infrastructure offers an Object Storage service, all you need is network access to it and some credentials. This was not that obvious before the era of containers, nor even in the early years of virtual machines.

The point of Object Storage is to be easy to consume and virtually limitless in terms of capacity and parallelism.

Unleash the parallelism.

Without any coupling to a mountpoint exposed by the system, deploying an application that is a client of an object storage becomes much faster. The Object Storage paradigm opens the doors to serverless computing, lambda functions, etc. An Object Storage promises flexibility and elasticity for the layer of client applications: it becomes possible to spawn many more workers than could ever share access to a single volume.

The right paradigm for the Cloud Era.

There is no way back: the adoption of the object storage paradigm will keep growing.

Key Factors for Object Storage

The use cases evolved from one decade to the next.

The 1998-2008 years: TCO. The whole story of Object Storage started circa 1998. Object storage then became an acceptable target for archiving use cases, and the only key factor was the TCO (Total Cost of Ownership), which had to be as low as possible.

The 2008-2018 years: scalability and features. The overall amount of human-centric data was increasing exponentially, and it was mostly made of media files. Scalability then became a concern, and people started talking about data warehouses and data lakes. The goal became to store the ever-increasing amount of data and to share it. On top of TCO, new goals appeared: scalability and sharing features.

The post-2018 years: performance. The amount of data never ceased to increase, but it is now mostly generated by machines and meant to be consumed by machines. The throughput of storage systems is no longer limited by the ingestion capacity of the brains of a platform's human users. The focus shifts toward extreme performance.

Does this mean that a solution should be the best on all those aspects? Not necessarily!

Object Storage Architecture Elements

Data Placement and Lookup

  • Static Placement
  • Index based
  • Algorithm based
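To illustrate the difference between index-based and algorithm-based lookup, here is a minimal Python sketch. The algorithm-based variant uses rendezvous (highest-random-weight) hashing, picked here only because it is short; it is not the algorithm of any product named on this page. The index-based variant records a location chosen at write time (round-robin, as a placeholder for any real policy). Node names are hypothetical.

```python
import hashlib
from typing import Dict, List

NODES = ["blob-1", "blob-2", "blob-3", "blob-4"]  # hypothetical blob services


def _score(node: str, key: str) -> int:
    # Deterministic pseudo-random weight for the (node, key) pair.
    digest = hashlib.sha256(f"{node}/{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def algorithmic_placement(key: str, nodes: List[str] = NODES, replicas: int = 2) -> List[str]:
    """Algorithm based: any client recomputes the locations from the key alone."""
    return sorted(nodes, key=lambda n: _score(n, key), reverse=True)[:replicas]


class IndexPlacement:
    """Index based: locations are chosen once at write time, then recorded."""

    def __init__(self, nodes: List[str] = NODES):
        self.nodes = list(nodes)
        self.index: Dict[str, List[str]] = {}
        self._next = 0

    def place(self, key: str, replicas: int = 2) -> List[str]:
        # Any policy fits here (free space, load...); round-robin keeps it short.
        chosen = [self.nodes[(self._next + i) % len(self.nodes)] for i in range(replicas)]
        self._next = (self._next + replicas) % len(self.nodes)
        self.index[key] = chosen
        return chosen

    def lookup(self, key: str) -> List[str]:
        # Every read needs the index; raises KeyError for unknown keys.
        return self.index[key]
```

The trade-off shows up quickly: the algorithmic variant needs no index at all, and adding a node only moves the keys for which the new node now wins, but the hash rather than an operator or a load policy dictates where data lands. The index-based variant can place data anywhere, at the cost of maintaining and querying the index.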

Tiering Management

  • Hardware Agnostic
  • Federation or Inner Scalability
  • Hybridization

Consistency

  • Read-after-Write consistency
  • List-after-Write consistency
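The two guarantees above are independent, and a system can offer one without the other. The toy model below is entirely hypothetical: object reads are served from synchronously updated data, while the listing index is updated asynchronously, so a freshly written key can be read before it shows up in a listing.

```python
from collections import deque


class ToyObjectStore:
    """Hypothetical store: read-after-write consistent, list-after-write eventual."""

    def __init__(self):
        self.blobs = {}         # object data, updated synchronously
        self.listing = set()    # listing index, updated asynchronously
        self.pending = deque()  # index updates not applied yet

    def put(self, key, data):
        self.blobs[key] = data    # visible to GET immediately
        self.pending.append(key)  # the listing catches up later

    def get(self, key):
        return self.blobs[key]

    def list_keys(self, prefix=""):
        return sorted(k for k in self.listing if k.startswith(prefix))

    def propagate(self):
        # Stands in for a background job applying the pending index updates.
        while self.pending:
            self.listing.add(self.pending.popleft())


store = ToyObjectStore()
store.put("photos/cat.jpg", b"...")
print(store.get("photos/cat.jpg"))  # b'...' (read-after-write holds)
print(store.list_keys("photos/"))   # [] (the new key is not listed yet)
store.propagate()
print(store.list_keys("photos/"))   # ['photos/cat.jpg']
```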

Object Storage Competition

Open Source / Open Core

OpenIO, MinIO, Ceph, SwiftStack Swift

Discontinued Projects

BlobSeer, Ursa Minor, EMC Atmos, EMC ViPR, HGST (Western Digital) ActiveScale (formerly Amplidata)

Expensive Software Vendors

Those listed below may lead you to consider selling a kidney on the black market, just to pay the yearly license. On the bright side, you will help someone live, someone who might achieve great things for humanity. The downside is that you might just as well help a moron, and also that you can't sell more than one kidney, so you will need to find healthy people to do it for you. We sincerely hope it will be done voluntarily.

Cloudian HyperStore, Caringo Swarm, DDN WOS (Web Object Scaler), Scality RING, Huawei FusionStorage, HDS HCP (Hitachi Content Platform), NetApp StorageGRID, Dell EMC ECS (Elastic Cloud Storage)

Storage as a Service

Object Storage API

  • De facto standard: Amazon S3 (Simple Storage Service).
  • The community-driven open standard: OpenStack Swift.
  • The committee-led open standard: CDMI (Cloud Data Management Interface), by SNIA.
  • Custom vendor APIs: Google Cloud Storage, Azure Blob Storage, Backblaze B2.

Object Storage Providers

  • OVHCloud public object storage
  • Amazon S3 (Simple Storage Service)
  • Google Cloud Storage (GCS)
  • Azure Blob Storage
  • Backblaze B2
  • Wasabi.com

Gunkan Competitors

Ceph vs. Gunkan

Gunkan has no static placement algorithm. In other words, there is no CRUSH map in Gunkan.

As a blob store, Gunkan's data layer is made of services whose role is very similar to that of Ceph's OSDs. The API is plain HTTP(S), for the sake of simplicity. The implementation relies on the local filesystem, as did the first OSDs, also for the sake of simplicity.
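To show what "plain HTTP(S) on top of the local filesystem" can look like, here is a minimal Python sketch using only the standard library. It is not Gunkan's blob-store-fs: the URL layout (the path is the blob id), the status codes, the lowercase-hex id rule and the temporary data directory are all assumptions made for illustration.

```python
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import Request, urlopen

ROOT = tempfile.mkdtemp()  # hypothetical data directory of one blob service
HEX = set("0123456789abcdef")


def blob_path(blob_id: str) -> str:
    """Map a blob id to a local file; only lowercase hex ids are accepted."""
    if not blob_id or not set(blob_id) <= HEX:
        raise ValueError("invalid blob id")
    return os.path.join(ROOT, blob_id)


class BlobHandler(BaseHTTPRequestHandler):
    def _resolve(self):
        # The URL path is the blob id, e.g. PUT /cafe01
        try:
            return blob_path(self.path.lstrip("/"))
        except ValueError:
            self.send_error(400, "invalid blob id")
            return None

    def do_PUT(self):
        path = self._resolve()
        if path is None:
            return
        length = int(self.headers.get("Content-Length", 0))
        fd, tmp = tempfile.mkstemp(dir=ROOT)  # unique temp file per upload
        with os.fdopen(fd, "wb") as f:
            f.write(self.rfile.read(length))
        os.replace(tmp, path)  # atomic rename: readers never see a partial blob
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        path = self._resolve()
        if path is None:
            return
        if not os.path.isfile(path):
            self.send_error(404, "no such blob")
            return
        with open(path, "rb") as f:
            data = f.read()
        self.send_response(200)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # silence the default per-request logging


# Port 0 lets the OS pick a free port for this demo.
server = ThreadingHTTPServer(("127.0.0.1", 0), BlobHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"

urlopen(Request(f"{base}/cafe01", data=b"hello", method="PUT")).close()
print(urlopen(f"{base}/cafe01").read())  # b'hello'
```

Delegating durability and space management to the local filesystem keeps such a service tiny; the write-to-temp-then-rename step is the usual way to get atomic replacement from a POSIX filesystem.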

The management of object view and the listing of objects,

MinIO vs. Gunkan

Gunkan has no static placement algorithm.

In terms of blob store,

The management of object view and the listing of objects,

Scality vs. Gunkan

Gunkan has no static placement algorithm. In other words, there is no Ring in Gunkan.

In terms of blob store,

The management of object view and the listing of objects,

OpenIO vs. Gunkan

Neither Gunkan nor OpenIO has a static placement algorithm for the data: OpenIO uses a directory instead, while Gunkan relies on lookups and caches.
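How Gunkan performs its lookups is not detailed here. Assuming, purely for illustration, that a client asks the candidate blob services directly on a cache miss and remembers which one answered, the "lookup plus cache" idea can be sketched as follows; `CachedLocator` and `has_blob` are hypothetical names.

```python
from typing import Callable, Dict, Iterable, Optional


class CachedLocator:
    """Hypothetical client-side locator: no directory, a lookup plus a cache."""

    def __init__(self, services: Iterable[str], has_blob: Callable[[str, str], bool]):
        self.services = list(services)
        self.has_blob = has_blob  # e.g. an HTTP HEAD request sent to the service
        self.cache: Dict[str, str] = {}
        self.lookups = 0  # number of (expensive) lookups performed

    def locate(self, blob_id: str) -> Optional[str]:
        if blob_id in self.cache:
            return self.cache[blob_id]  # cache hit: no network round-trip
        self.lookups += 1
        for svc in self.services:  # cache miss: ask the services one by one
            if self.has_blob(svc, blob_id):
                self.cache[blob_id] = svc
                return svc
        return None  # misses are not cached: the blob may appear later

    def invalidate(self, blob_id: str) -> None:
        # To call when the cached service no longer has the blob (e.g. a 404).
        self.cache.pop(blob_id, None)


# Toy usage: two services and the blobs they hold.
contents = {"blob-1": {"aa"}, "blob-2": {"bb"}}
loc = CachedLocator(contents, lambda svc, blob: blob in contents[svc])
print(loc.locate("bb"), loc.locate("bb"), loc.lookups)  # blob-2 blob-2 1
```

Compared with a directory, this trades extra requests on the first access for the absence of a central service to maintain; a real client would likely query the services in parallel rather than sequentially.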

As blob stores, OpenIO's RAWX service and Gunkan's blob-store-fs are *very* similar.

In OpenIO, the object view and the listing of objects are managed by the container service, which is also part of the object store. A multipart object is usually represented with an additional level of indirection. In Gunkan, the plan (this has not been implemented yet) is to separate the

Please consider Gunkan as the minimal architecture that can deliver the same service as OpenIO in small environments.