S3Proxy

# Encryption

## Motivation

The motivation behind this implementation is to provide fully transparent and secure encryption to the S3 client while retaining the ability to write to different clouds.

## Cipher mode

The chosen cipher is ```AES/CFB/NoPadding``` because it allows reading from an offset, for example from the middle of a ```Blob```.
When reading from an offset, the decryption process needs the 16 bytes of ciphertext (one AES block) that precede the block being read.

### Key generation

The encryption uses a 128-bit key that is derived from a given password and salt. It is used in combination with a random initialization vector (IV) that is stored in each part padding.

## How a blob is encrypted

Every uploaded part gets a padding of 64 bytes that contains the information needed for decryption. The input stream from an S3 client is passed through ```CipherInputStream``` and piped so that the 64-byte part padding is appended to the end of the encrypted stream. The encrypted input stream is then processed by the ```BlobStore``` to save the ```Blob```.

| Name      | Size     | Description                                                    |
|-----------|----------|----------------------------------------------------------------|
| Delimiter | 8 bytes  | The delimiter is used to detect if the ```Blob``` is encrypted |
| IV        | 16 bytes | AES initialization vector                                      |
| Part      | 4 bytes  | The part number                                                |
| Size      | 8 bytes  | The unencrypted size of the ```Blob```                         |
| Version   | 2 bytes  | Version can be used in the future if changes are necessary     |
| Reserved  | 26 bytes | Reserved for future use                                        |

### Multipart handling

A single ```Blob``` can be uploaded by the client in multiple parts. After completion, all parts are concatenated into a single ```Blob```.
As a result, a single ```Blob``` holds multiple parts, each followed by its own padding.

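
The 64-byte part padding from the table above can be sketched with a ```ByteBuffer```. This is a minimal illustration, not the actual S3Proxy code: the class name ```PartPaddingSketch```, the delimiter value ```EXAMPLE1```, the big-endian byte order, and the assumption that fields are stored in table order are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PartPaddingSketch {
    static final int PADDING_SIZE = 64;
    static final int IV_SIZE = 16;
    // Hypothetical 8-byte marker; the real delimiter value is not stated in this document.
    static final byte[] DELIMITER = "EXAMPLE1".getBytes(StandardCharsets.US_ASCII);

    /** Builds the 64-byte padding, assuming the fields are laid out in table order. */
    static byte[] encode(byte[] iv, int part, long size, short version) {
        if (iv.length != IV_SIZE) {
            throw new IllegalArgumentException("IV must be 16 bytes");
        }
        // ByteBuffer is big-endian by default; the actual byte order is an assumption.
        ByteBuffer buf = ByteBuffer.allocate(PADDING_SIZE);
        buf.put(DELIMITER);    // bytes 0..7
        buf.put(iv);           // bytes 8..23
        buf.putInt(part);      // bytes 24..27
        buf.putLong(size);     // bytes 28..35
        buf.putShort(version); // bytes 36..37
        // The remaining 26 reserved bytes (38..63) stay zero.
        return buf.array();
    }

    /** Returns true if the block has the right length and starts with the delimiter. */
    static boolean hasDelimiter(byte[] padding) {
        return padding.length == PADDING_SIZE
                && Arrays.equals(Arrays.copyOfRange(padding, 0, DELIMITER.length), DELIMITER);
    }

    static byte[] iv(byte[] padding) {
        return Arrays.copyOfRange(padding, 8, 24);
    }

    static int part(byte[] padding) {
        return ByteBuffer.wrap(padding).getInt(24);
    }

    static long size(byte[] padding) {
        return ByteBuffer.wrap(padding).getLong(28);
    }

    static short version(byte[] padding) {
        return ByteBuffer.wrap(padding).getShort(36);
    }

    public static void main(String[] args) {
        byte[] iv = new byte[IV_SIZE]; // all-zero IV only for this demo; real IVs are random
        byte[] padding = encode(iv, 1, 1024L, (short) 1);
        System.out.println("delimiter found: " + hasDelimiter(padding));
        System.out.println("part=" + part(padding) + " size=" + size(padding));
    }
}
```

Because every field has a fixed width, a reader can fetch only the last 64 bytes of a part and recover the IV, part number, and size without scanning the encrypted bytes.
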
### Single blob example

```
-------------------------------------
| ENCRYPTED BYTES         | PADDING |
-------------------------------------
```

### Multipart blob example

```
-------------------------------------------------------------------------------------
| ENCRYPTED BYTES | PADDING | ENCRYPTED BYTES | PADDING | ENCRYPTED BYTES | PADDING |
-------------------------------------------------------------------------------------
```

## How a blob is decrypted

Decryption is considerably more complex than encryption. The decryption process has to handle the following cases:

- decryption of the entire ```Blob```
- decryption from a specific offset by skipping the initial bytes
- decryption of bytes read from the end (tail)
- decryption of a specific byte range, such as the middle of the ```Blob```
- all of the above cases for an underlying multipart ```Blob```

### Single blob decryption

First, the ```BlobMetadata``` is requested to get the encrypted ```Blob``` size. The last 64 bytes, which hold the ```PartPadding```, are fetched and inspected to detect whether decryption is necessary.
The cipher is then initialized with the IV and the key.

### Multipart blob decryption

The process is similar to single ```Blob``` decryption, except that a list of parts is computed by fetching every ```PartPadding``` from the end to the beginning.

## Blob suffix

Each stored ```Blob``` gets the suffix ```.s3enc```, which helps to determine whether a ```Blob``` is encrypted. The ```.s3enc``` suffix is not visible to the S3 client, and the reported ```Blob``` size is always the unencrypted size.

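
Returning to the offset reads listed under "How a blob is decrypted": in CFB mode each plaintext block depends only on the key, the current ciphertext block, and the previous ciphertext block. The following sketch shows how that property allows decryption to start in the middle of the data. It uses the standard ```javax.crypto``` API; the class name ```CfbOffsetSketch``` and its methods are hypothetical and are not taken from S3Proxy.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class CfbOffsetSketch {
    static final int BLOCK = 16; // AES block size in bytes
    static final String TRANSFORMATION = "AES/CFB/NoPadding";

    private static byte[] run(int mode, byte[] key, byte[] iv, byte[] in, int off, int len) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(in, off, len);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] encrypt(byte[] key, byte[] iv, byte[] plain) {
        return run(Cipher.ENCRYPT_MODE, key, iv, plain, 0, plain.length);
    }

    /** Decrypts cipherText[offset..end) without processing the earlier blocks. */
    static byte[] decryptFrom(byte[] key, byte[] iv, byte[] cipherText, int offset) {
        if (offset < 0 || offset > cipherText.length) {
            throw new IllegalArgumentException("offset out of range");
        }
        // Round down to the start of the AES block that contains the offset.
        int blockStart = (offset / BLOCK) * BLOCK;
        // The first block chains on the stored IV; every later block chains on
        // the 16 ciphertext bytes that precede it.
        byte[] chainIv = blockStart == 0
                ? iv
                : Arrays.copyOfRange(cipherText, blockStart - BLOCK, blockStart);
        byte[] plain = run(Cipher.DECRYPT_MODE, key, chainIv,
                cipherText, blockStart, cipherText.length - blockStart);
        // Drop the bytes between the block boundary and the requested offset.
        return Arrays.copyOfRange(plain, offset - blockStart, plain.length);
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // all-zero key and IV only for this demo
        byte[] iv = new byte[16];
        byte[] plain = "0123456789abcdef0123456789abcdef0123456789abcdef"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] cipherText = encrypt(key, iv, plain);
        byte[] tail = decryptFrom(key, iv, cipherText, 21);
        System.out.println(new String(tail, StandardCharsets.US_ASCII));
    }
}
```

In a real backend the same idea would translate into a ranged read that starts 16 bytes before the block containing the requested offset, so only the needed ciphertext has to be transferred.
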
## Tested jClouds providers

- S3
- Minio
- OBS from OpenTelekomCloud
- AWS S3
- Azure
- GCP
- Local

## Limitations

- All blobs are encrypted with the same key, which is derived from a given password
- No support for re-encryption
- The returned ETag always differs, so clients should not verify it
- Decrypting a ```Blob``` always results in multiple calls to the backend; for instance, a GET results in a HEAD + GET because the size of the blob needs to be determined