Sharding

Overview

In general, sharding is a method for horizontally splitting and distributing files across multiple storage providers in a decentralized network, and provides the following advantages:

  • Data is secure and private: By splitting and distributing data to different storage providers, the network makes sure that no single storage provider can reconstruct a file on its own. Furthermore, users can choose to encrypt their files, adding another layer of security.
  • Reduced risk of data loss: As more storage providers join the network, the level of redundancy increases, and the protocol distributes multiple copies of each block of data among different storage providers. Thus, a storage consumer can retrieve a file even when some of the storage providers are unavailable.
  • Faster download speeds: When the content of a file is distributed among a large number of storage providers, the network is not prone to bottlenecks, and the protocol can retrieve the shards simultaneously from the storage providers that store them.

To store data, the first version of the Iagon protocol used a modified version of IPFS, a distributed system that relies on a peer-to-peer storage network. IPFS splits data into shards, assigns a unique identifier to each shard, and stores this unique identifier together with the list of peers hosting that shard in a distributed hash table (DHT). A DHT is a mapping of unique IDs to the peers storing data. Note that the unique identifier does not indicate where a specific shard is stored. To retrieve a file, a storage consumer must query the DHT twice:

  • Find the list of peers storing a specific shard.
  • Find the addresses of the peers identified in the first query.
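
The two lookups can be illustrated with a toy in-memory sketch. The dictionaries, identifiers, and addresses below are hypothetical stand-ins for DHT records, not the IPFS or Iagon API; a real DHT spreads these records across many peers.

```python
# Toy in-memory stand-ins for the two DHT lookups; all names are hypothetical.
from typing import Dict, List

# Lookup 1: shard identifier -> IDs of the peers hosting that shard.
provider_records: Dict[str, List[str]] = {
    "shard-abc": ["peer-1", "peer-2"],
}

# Lookup 2: peer ID -> network addresses of that peer.
peer_records: Dict[str, List[str]] = {
    "peer-1": ["/ip4/192.0.2.10/tcp/4001"],
    "peer-2": ["/ip4/192.0.2.11/tcp/4001"],
}


def locate_shard(shard_id: str) -> Dict[str, List[str]]:
    """Return {peer_id: addresses} for every peer hosting shard_id."""
    peers = provider_records.get(shard_id, [])          # first query
    return {p: peer_records.get(p, []) for p in peers}  # second query
```

The shard identifier alone never reveals a location; only the combination of both queries does.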

To comply with GDPR and other regulatory requirements, Iagon has built a mechanism that allows the network to group nodes located in a specific country or offering a specific bandwidth.

In this version of the protocol, Iagon deploys its own sharding protocol, which differs from IPFS in a number of ways. Iagon’s sharding protocol improves the retrievability of data and the resilience of the protocol.

The following subsections describe the most important differences.

Upload a File

Encode the File Using Error-Correcting Codes

Error-correcting codes were initially used for encoding CD-ROMs and hard disk drives to protect against scratches and lost or flipped bits/bytes. A Reed-Solomon encoder pads each block of data with a sequence of redundant bytes that is computed based on the initial block of data. This allows the decoder to retrieve the entire block of data even if one or more parts are missing or contain errors. Iagon also uses this algorithm to retrieve a file when shards are lost. Note that the error-correcting codes can be scaled up or down. Thus, even if the process increases the file size, the amount of data required to decode a file can be parametrized. For example, the algorithm can be configured in such a way that 80% of the file is needed to decode the original file.

Iagon will use error-correcting codes to restore a file when a number of shards cannot be retrieved.
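
The idea of decoding from a fraction of the encoded data can be sketched with a much simpler code than Reed-Solomon. The example below is not the Iagon encoder: it swaps in a single XOR parity block, which can repair only one missing piece. With the assumed parameter k = 4, any 4 of the 5 pieces (80%) are enough to rebuild the data; the function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: bytes, k: int = 4) -> List[bytes]:
    """Split data into k equal blocks (zero-padded) and append one parity block."""
    block_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(block_len * k, b"\x00")
    blocks = [padded[i * block_len:(i + 1) * block_len] for i in range(k)]
    return blocks + [xor_blocks(blocks)]


def decode(pieces: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the data when at most one of the k + 1 pieces is None."""
    missing = [i for i, p in enumerate(pieces) if p is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one missing piece")
    if missing:
        present = [p for p in pieces if p is not None]
        pieces = list(pieces)
        # XOR of all surviving pieces equals the missing one.
        pieces[missing[0]] = xor_blocks(present)
    # Drop the parity block and the zero padding.
    return b"".join(pieces[:-1])[:original_len]
```

A Reed-Solomon code generalizes this: it adds several redundant pieces instead of one, so the ratio of required to stored data can be tuned up or down.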

Data Encryption

In addition to error-correction encoding, the data must be encrypted before it is divided into shards. The data is always encrypted in this first phase; whether further encryption is applied depends on the plan the storage consumer subscribes to. The encryption method affects the storage cost, because encryption increases the size of the data and decrypting it adds computational cost.

Improve Robustness with Random Masks

Since the encoded file is prone to the potential loss of small amounts of data, Iagon uses a probabilistic approach for the subsequent sharding. In particular, the protocol uses a random mask to select only small parts of the data. The encoded file is a long list of bytes, and the protocol randomly selects a number of these bytes. The probability of selecting any individual byte is a parameter named byte selector probability (P1), which can range from slightly over 0% to 100%.

Example: A storage consumer wishes to upload a 1 kB file, so there are 1000 bytes in a row. If P1 is 3%, each mask selects 3% of the 1000 bytes. Thus, for a 1 kB file, each mask randomly selects 30 bytes.
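
A minimal sketch of one such mask, assuming each byte is selected independently with probability P1. The function name, the seed, and the sample data are hypothetical. Under this assumption the count of selected bytes varies from mask to mask and is 30 only on average for the 1 kB example.

```python
import random
from typing import List, Tuple


def apply_random_mask(
    data: bytes, p1: float, rng: random.Random
) -> Tuple[List[bool], bytes]:
    """Select each byte independently with probability p1; zero out the rest."""
    mask = [rng.random() < p1 for _ in data]
    masked = bytes(b if keep else 0 for b, keep in zip(data, mask))
    return mask, masked


rng = random.Random(42)      # fixed seed only to make the sketch repeatable
data = bytes([0xAB]) * 1000  # a 1 kB stand-in for the encoded file
mask, masked = apply_random_mask(data, 0.03, rng)
print(sum(mask), "of", len(data), "bytes selected")
```

The masked block keeps the length of the input; only the selected positions carry data.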

Masked Data

The value of the byte selector probability parameter determines the following behaviors:

  • A low probability means that Iagon will split the file into a large number of shards. Thus, if a threat actor can decrypt a shard, they will only see a small number of bytes.
  • A high probability means that Iagon will split the file into a smaller number of shards. For files that need to be quick to reassemble, such as streaming media, this parameter should have a higher value.

Compress the Masked File Shards

The masked data is a data block of the same size as the initial block, but most of the bytes are set to the zero value (00000000). The system extracts useful information by removing all the zero bytes. This way, the shard is compressed.
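
The compression step can be sketched as follows. Removing bytes purely by value would also discard a selected byte that happens to be zero, so this sketch assumes the mask is still available and drops bytes by position instead. The function name and sample values are hypothetical.

```python
from typing import List


def compress_shard(masked: bytes, mask: List[bool]) -> bytes:
    """Keep only the bytes at selected positions, in their original order."""
    return bytes(b for b, keep in zip(masked, mask) if keep)


# Positions 2, 4 and 5 are selected; position 4 holds a genuine zero byte.
masked = bytes([0, 0, 0x41, 0, 0x00, 0x42, 0, 0])
mask = [False, False, True, False, True, True, False, False]
shard = compress_shard(masked, mask)
```

The compressed shard is exactly as long as the number of selected positions, here 3 bytes instead of 8.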

Distribute the Shards

The protocol distributes the shards to storage providers that meet all the requirements specified by the storage consumers when they upload the file.
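
A minimal sketch of requirement-based distribution, assuming the requirements are a set of allowed countries and a minimum bandwidth. The Provider fields, the function names, and the round-robin assignment are illustrative assumptions, not the protocol's actual placement logic.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Provider:
    node_id: str
    country: str
    bandwidth_mbps: int


def eligible(
    providers: List[Provider], countries: Set[str], min_bandwidth_mbps: int
) -> List[Provider]:
    """Keep only the providers that satisfy every consumer requirement."""
    return [
        p for p in providers
        if p.country in countries and p.bandwidth_mbps >= min_bandwidth_mbps
    ]


def assign(shard_ids: List[str], providers: List[Provider]) -> Dict[str, str]:
    """Spread shards over the eligible providers in round-robin order."""
    if not providers:
        raise ValueError("no provider meets the requirements")
    return {
        shard_id: providers[i % len(providers)].node_id
        for i, shard_id in enumerate(shard_ids)
    }
```

For example, restricting placement to providers in two countries with at least 100 Mbit/s would first call eligible and then pass the result to assign.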

Download a File

Recombine the Compressed Shards

When a storage consumer wishes to download a file, the protocol retrieves the shards and recombines them. The shards do not need to be decompressed first.
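
Recombination without a separate decompression step can be sketched as below, assuming each retrieved shard arrives together with the mask that produced it. The function name and the (mask, bytes) shard representation are hypothetical. Each compressed byte is written straight to its original position, and positions that no shard covers are reported so that the decoder can treat them as missing.

```python
from typing import List, Tuple


def recombine(
    shards: List[Tuple[List[bool], bytes]], length: int
) -> Tuple[bytearray, List[bool]]:
    """Place every shard byte at its original offset; track covered positions."""
    out = bytearray(length)
    filled = [False] * length
    for mask, packed in shards:
        values = iter(packed)
        for pos, keep in enumerate(mask):
            if keep:
                out[pos] = next(values)
                filled[pos] = True
    return out, filled


# Two shards of a 6-byte block; the last position was not retrieved.
shard_a = ([True, False, True, True, False, False], b"igo")
shard_b = ([False, True, False, False, True, False], b"an")
out, filled = recombine([shard_a, shard_b], 6)
```

Here out holds the first five bytes in place, and filled marks the sixth position as missing.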

Decode the Recombined Shards

Once the protocol retrieves and recombines the required number of shards, the protocol decodes them using error-correcting codes. If bytes are missing, the network uses the error-correcting codes to replace the missing bytes and reconstructs the file.