How to Use Sia Responsibly

In a previous post, I demonstrated how to use us to access low-level storage functionality, uploading raw (unencrypted) file data to a single host. As I stated in that post, this is highly discouraged! Unless you have a very specific use-case, content stored on Sia should always be encrypted and split across multiple hosts. So in today’s post, we’ll explore how to add these properties to our files.

As before, we’ll need the following packages:

import (
    "lukechampine.com/us/renter"
    "lukechampine.com/us/renter/proto"
    "lukechampine.com/us/renter/renterutil"
    "lukechampine.com/us/renterhost"

    "lukechampine.com/frand"
    "gitlab.com/NebulousLabs/Sia/crypto"
)

Code examples will omit error handling for brevity. In real code, don't do that!

Part 1: Redundancy

We’ll start by adding redundancy. This involves processing our file with a Reed-Solomon encoder. The encoder splits the file into m “data shards”, and then generates additional “parity shards” for a total of n shards. To recover the file, we just need any m of the shards, whether they’re data or parity. In this example, we’ll use a 2-of-4 code, which means that we need any 2 of the 4 shards to recover our file. Note that our total redundancy is 2x: the original file fits in 2 shards, but we have 4 shards total.

Encoding the file looks like this:

rs := renter.NewRSCode(2, 4)
data, _ := ioutil.ReadFile("myfile.jpg")
bytesPerShard := len(data) / 2 // note: assumes len(data) is even; see padding note below
shards := make([][]byte, 4)
for i := range shards {
    shards[i] = make([]byte, bytesPerShard)
}
rs.Encode(data, shards)
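
One caveat: the code above assumes that len(data) splits evenly into the data shards. If your file can have any length, pad it before encoding; here's a minimal sketch, assuming the encoder requires the data length to be a multiple of m:

if mod := len(data) % 2; mod != 0 {
    data = append(data, make([]byte, 2-mod)...) // zero-pad to a multiple of m
}

Be sure to record the unpadded filesize somewhere, since that's what tells us how much to trim after recovery.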

We can test that this works by deleting two of the shards and trying to recover the original file:

shards[0] = nil // delete a data shard
shards[2] = nil // delete a parity shard
var buf bytes.Buffer
rs.Recover(&buf, shards, len(data))

The contents of buf will now match data. If we had deleted one more shard, though, recovery would have failed.
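
If you're skeptical, compare the recovered bytes against the original:

if !bytes.Equal(buf.Bytes(), data) {
    panic("recovered data does not match original")
}

(We just destroyed two of our shards, so re-run rs.Encode to regenerate them before moving on.)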

Once we have our encoded shards, we need to upload them to separate hosts. We’ll assume that we’ve already formed the contracts we need and stored them in a directory called contracts/. First, we load the contracts and connect to each host:

contracts, _ := renter.LoadContracts("contracts")
sessions := make([]*proto.Session, 0, 4)
for _, c := range contracts {
    // siad is the API client from the previous post (see renterutil);
    // currentHeight is the current block height, also obtained from siad
    hostIP, _ := siad.ResolveHostKey(c.HostKey)
    s, _ := proto.NewSession(hostIP, c.HostKey, c.ID, c.RenterKey, currentHeight)
    defer s.Close()
    sessions = append(sessions, s)
}

Then, we upload our shards, one sector at a time:

roots := make([][]crypto.Hash, 4)
for i, shard := range shards {
    for buf := bytes.NewBuffer(shard); buf.Len() > 0; {
        // sector is zeroed on each iteration, so a short final read
        // leaves the remainder zero-padded
        var sector [renterhost.SectorSize]byte
        buf.Read(sector[:])
        root, _ := sessions[i].Append(&sector)
        roots[i] = append(roots[i], root)
    }
}

Our file is now stored redundantly across 4 hosts!
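
One detail worth noting: hosts store data in fixed-size sectors (renterhost.SectorSize, i.e. 4 MiB), so the final sector of each shard is zero-padded. That padding gets trimmed on download, which is one reason we keep the original filesize in our metadata.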

Note that the amount of metadata we need to keep track of has increased. In the previous post, we just needed the original filesize, the host's public key, and the Merkle root of each sector we uploaded. For redundant files, we still need the filesize, but now we need each host's public key and list of Merkle roots, as well as the parameters (m and n) of the erasure code.

Part 2: Encryption

Now we’ll add encryption to each shard. It’s important to add encryption after erasure-encoding, not before; otherwise, hosts could cheat. If a host knows that you’re storing a 2-of-4 file, then when you ask for their shard, they can quickly download 2 shards from other hosts and reconstruct their shard on the fly. In other words, they don't need to store their shard at all, because they can just regenerate it on demand! This is bad news for the renter, because it means that the “true” redundancy of their file is lower than they paid for. Encrypting each shard with a separate key prevents this scenario: to regenerate its shard, a host would also need the other shards’ keys, and applying the erasure code to the ciphertext just produces garbage.

In this example, we’ll use ChaCha (specifically ChaCha20) as our encryption cipher. ChaCha is a good choice because it allows us to decrypt any 64-byte block of data independently. It also has zero storage overhead, so encrypting a sector doesn’t increase its size. Oh, and it's really, really fast. :) We’ll use this package:

import "github.com/aead/chacha20"
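
ChaCha's keystream is generated in 64-byte blocks numbered by a counter, so decrypting from an arbitrary offset just means jumping the counter to the enclosing block. Here's a minimal sketch of random-access decryption, assuming the chacha subpackage's NewCipher and SetCounter (off, key, nonce, ciphertext, and plaintext are placeholders):

import "github.com/aead/chacha20/chacha"

// decrypt data beginning at byte offset off, where off is a multiple
// of 64; key and nonce must match those used to encrypt
c, _ := chacha.NewCipher(nonce, key, 20) // 20 rounds, i.e. ChaCha20
c.SetCounter(off / 64)                   // jump to the enclosing block
c.XORKeyStream(plaintext, ciphertext[off:])

We'll use this same trick in Part 3 to decrypt sectors individually.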

First, we need to generate an encryption key for each shard, using random bytes from frand:

keys := make([][]byte, 4)
for i := range keys {
    keys[i] = frand.Bytes(32)
}

Then, we simply encrypt each shard with the appropriate key before uploading:

nonce := make([]byte, 12) // a zero nonce is safe here: each key is random and never reused
for i, shard := range shards {
    chacha20.XORKeyStream(shard, shard, nonce, keys[i])
}

Adding encryption means more metadata to keep track of; namely, we need to store the encryption key for each shard.
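
To make this concrete, here's a hypothetical record (not something us defines, just a sketch, assuming hostdb.HostPublicKey from lukechampine.com/us/hostdb) of everything we now need to track per file:

type FileMeta struct {
    Filesize    int64                  // original length, for trimming padding
    MinShards   int                    // m
    TotalShards int                    // n
    HostKeys    []hostdb.HostPublicKey // one host per shard
    Roots       [][]crypto.Hash        // sector Merkle roots, per shard
    Keys        [][]byte               // ChaCha20 key, per shard
}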

Part 3: Downloading

To download, we’ll need to invert the operations we applied when uploading: download the shards, decrypt them, then apply the erasure code to recover the original data. We’ll also need to trim any padding off of the final result.

This should be pretty intuitive, so I’ll just show the whole operation:

dst, _ := os.Create("myfile2.jpg")
defer dst.Close()
rem := filesize // the original (unpadded) length of the file, from our metadata
for rootIndex := 0; rem > 0; rootIndex++ {
    // gather each shard
    shards := make([][]byte, 4)
    for i := range shards {
        var buf bytes.Buffer
        _ = sessions[i].Read(&buf, []renterhost.RPCReadRequestSection{{
            MerkleRoot: roots[i][rootIndex],
            Offset:     0,
            Length:     renterhost.SectorSize,
        }})
        sector := buf.Bytes()
        // offset the keystream to this sector's position within the
        // shard (the SetCounter trick from Part 2)
        c, _ := chacha.NewCipher(nonce, keys[i], 20)
        c.SetCounter(uint64(rootIndex) * renterhost.SectorSize / 64)
        c.XORKeyStream(sector, sector)
        shards[i] = sector
    }
    // recover the original data, trimming any padding in the final sector
    writeLen := int64(renterhost.SectorSize * 2)
    if rem < writeLen {
        writeLen = rem
    }
    rs.Recover(dst, shards, int(writeLen))
    rem -= writeLen
}

In practice, we shouldn’t be downloading every shard, since we only need two in order to recover the data. For optimal performance, we can start downloading all of the shards in parallel, and then stop as soon as any two of them complete. Or, if we want to minimize cost, we can sort the hosts by price and then try them in order until we have enough shards. There are lots of possible strategies, which is why us leaves that decision up to you.
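
Here's a sketch of the first strategy; downloadShard is a hypothetical helper that wraps the fetch-and-decrypt logic from the loop above for a single shard:

type result struct {
    index int
    shard []byte
    err   error
}
results := make(chan result, len(sessions)) // buffered, so stragglers don't block
for i := range sessions {
    go func(i int) {
        shard, err := downloadShard(sessions[i], i, rootIndex)
        results <- result{i, shard, err}
    }(i)
}
shards := make([][]byte, 4)
for done := 0; done < 2; {
    r := <-results
    if r.err == nil {
        shards[r.index] = r.shard
        done++
    }
    // (a real implementation should give up if too many downloads fail)
}
// any two shards are enough; rs.Recover skips nil entries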

Conclusion

We were able to add encryption and redundancy to our files without adding very much code. Now we have a file that is completely private, and that will remain available even if some hosts go offline! We can also apply performance optimizations, like downloading in parallel, to retrieve our file faster than if it were stored on a single host.

In the next post, we’ll look at the functions and formats that us provides to facilitate encryption and redundancy. Without spoiling too much, they’ll save you a lot of time vs. implementing these features yourself.