RexRay is a plugin module available for use with Docker which provides the ability to use shared storage as a Docker volume. It is quick to set up and provides near-seamless data sharing between containers. We review its basic design and detail tips for its use in the AWS environment. The plugin design supports many different environments.
Quick primer on Docker volumes
Docker volumes are an abstraction provided as part of the engine, providing a filesystem space (and therefore some level of persistent storage) which is managed by the daemon itself. This allows containers to access storage while remaining agnostic to the underlying host system.
Docker volumes can provide access to local filesystems but, in order to avoid external dependencies, the more common pattern is to use volumes set up and accessed through the Docker system using the volume sub-commands. You can set up a simple volume using `docker volume create myvolume`. Connecting it to your container is as simple as passing the volume name and the location to mount it to the filesystem within the container.
You can use the same volume on any number of containers to share data, or as a means to persist data across running containers or across instances of the same container. When used this way it generally falls under the term Persistent Volume. By default the volume is local storage; concurrent access to the files follows standard Unix filesystem rules.
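To make the sharing pattern concrete, here is a minimal sketch in which one container writes a file and a second container reads it back through the same named volume. The function name `share_demo`, the `alpine` image, and the `/shared` mount point are choices made for this illustration only; nothing about them is required by Docker.

```shell
# Sketch: two short-lived containers share one named volume.
# share_demo, the alpine image, and /shared are illustrative choices.
share_demo() {
  vol="$1"

  # Create the volume (the local driver is used by default).
  docker volume create "$vol"

  # First container writes a timestamp into the volume, then exits.
  docker container run --rm -v "$vol:/shared" alpine sh -c 'date > /shared/stamp'

  # Second container mounts the same volume and reads the file back.
  docker container run --rm -v "$vol:/shared" alpine cat /shared/stamp
}
```

Calling `share_demo myvolume` should print the timestamp written by the first container. The volume outlives both containers, so remove it with `docker volume rm myvolume` when you are done.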
From the introduction of volumes, there has been a call for volumes which can span hosts. The need to share files between hosts is not new; it goes back years and spans numerous protocols (NFS, CIFS, AFS, etc.). I'm not going to go into the complications that can arise with distributed files, but you should be aware that they do exist.
The basics of using RexRay (up and running)
RexRay in conjunction with AWS comes in three flavors. We are going to use the S3FS flavor in order to use S3 as our storage provider. Other options within the AWS ecosystem are EFS and EBS. Each comes with its own qualities, as seen in this table:
| Storage | Type | Consistency | Availability | Performance |
|---|---|---|---|---|
| EFS | File | strong consistency | AWS region wide (when available in a given region) | medium IO performance |
| EBS | Block | strong consistency | bound to an AWS availability zone | high IO performance |
| S3 | Object | eventual consistency | available inside and outside of AWS | high performance in put and get operations; fair performance outside of AWS itself |
In many cases you would use the EBS storage, which offers the best basic performance for general files, but does not extend beyond the local AWS Availability Zone. What I like about using the S3 storage is that it works literally everywhere, even from your local laptop, so it is easy to test and has a broad use case. The main downside outside of AWS is that it will of course have high latency on file creation, so it is not advisable in a production use case outside of AWS except in certain limited conditions.
Set up a user account in AWS
In the AWS Console add a new user and make sure to generate a set of Access Keys. You will be prompted to download the keys and you should do so. We're going to limit the access to just S3.
Adding the user
Log into the console, select the Services and choose IAM...
Next, select the "Add user" button. Type in a username and check the "Programmatic Access" checkbox. This user does not need the "AWS Management Console access" and you may leave it unchecked. Click "Next".
Now select the "Attach existing policies directly" option, and click the checkbox next to the policy "AmazonS3FullAccess" (don't click on the name, as this jumps you over to the policy detail screen). Click "Next".
You're now on the review screen. If everything looks fine click "Create User". This will generate the API keys, which you should download using the "Download .csv" link. This file is a simple text file which holds the user's access key ID and secret access key, and is used to configure the plugin.
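If you prefer the command line, the console steps above can be approximated with the AWS CLI. This is a sketch only: it assumes the AWS CLI is installed and already configured with credentials that are allowed to manage IAM, and the function name `create_s3_user` is hypothetical.

```shell
# Sketch: CLI approximation of the console steps above.
# Assumes a configured AWS CLI; create_s3_user is an illustrative name.
create_s3_user() {
  user="$1"

  # Create the user. No console password is set, so it has API access only.
  aws iam create-user --user-name "$user" || return 1

  # Attach the AWS-managed S3 policy used in the walkthrough.
  aws iam attach-user-policy --user-name "$user" \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess || return 1

  # Generate the access key pair for the user.
  aws iam create-access-key --user-name "$user"
}
```

Run it as, for example, `create_s3_user rexray-s3`. Note that the CLI route does not produce a CSV file: the last command prints JSON containing `AccessKeyId` and `SecretAccessKey`, and the secret is shown only once, so save it somewhere safe.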
Now install the plugin and use it
From your command line, run the following command at the shell prompt. You will need to open the CSV file you downloaded in the previous steps so as to fill in the access credentials. (Don't type just the 'X's, that won't work.)
```shell
docker plugin install rexray/s3fs:0.9.2 S3FS_ACCESSKEY=XXXXXXXXXXXXX S3FS_SECRETKEY=XXXXXXXXXXXXXXXXX
```
The plugin will request certain permissions, above what is normally accorded a container. This allows the plugin to manage mounts and access the remote storage APIs.
```
Plugin "rexray/s3fs:0.9.2" is requesting the following privileges:
 - mount: [/dev]
 - allow-all-devices: [true]
 - capabilities: [CAP_SYS_ADMIN]
Do you grant the above permissions? [y/N] y
```
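For scripted setups, copying the keys by hand and answering the prompt can both be avoided. The sketch below reads the keys from the downloaded CSV and passes `--grant-all-permissions` to `docker plugin install` so that no prompt appears. The column headers `Access key ID` and `Secret access key` are an assumption about the CSV layout (check the first line of your file), and both function names are hypothetical.

```shell
# Print one named column from the first data row of a simple CSV file.
# Assumes no quoted fields containing commas.
csv_field() {
  awk -F',' -v want="$2" '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        gsub(/\r/, "", $i)          # tolerate Windows line endings
        if ($i == want) col = i
      }
      next
    }
    NR == 2 && col {
      gsub(/\r/, "", $col)
      print $col
      exit
    }
  ' "$1"
}

# Install the plugin with keys read from the CSV, skipping the prompt.
# The header names below are assumed; adjust them to match your file.
install_s3fs_from_csv() {
  csv="$1"
  access=$(csv_field "$csv" "Access key ID")
  secret=$(csv_field "$csv" "Secret access key")
  if [ -z "$access" ] || [ -z "$secret" ]; then
    echo "could not find key columns in $csv" >&2
    return 1
  fi
  docker plugin install --grant-all-permissions rexray/s3fs:0.9.2 \
    S3FS_ACCESSKEY="$access" S3FS_SECRETKEY="$secret"
}
```

A call such as `install_s3fs_from_csv ~/Downloads/credentials.csv` (the path is just an example) grants the same privileges listed in the prompt above, only without asking, so use it where you have already reviewed them.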
Once loaded you will be able to see the plugin, as well as your S3 buckets, represented as Docker volumes.
```shell
docker plugin ls
docker volume ls
```
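If you want to check the result from a script rather than by eye, something like the following may help. It is a sketch: the helper names are hypothetical, and it assumes the plugin name is reported exactly as it was installed (including the `:0.9.2` tag), which is worth confirming against the plain `docker plugin ls` output on your system.

```shell
# Succeeds only if the named plugin is listed and enabled.
# Assumes the name is reported exactly as installed, tag included.
plugin_enabled() {
  docker plugin ls --format '{{.Name}} {{.Enabled}}' | grep -Fxq "$1 true"
}

# Print only the names of volumes served by the given driver.
volumes_for_driver() {
  docker volume ls --quiet --filter "driver=$1"
}
```

For example, `plugin_enabled rexray/s3fs:0.9.2 && volumes_for_driver rexray/s3fs:0.9.2` would list just the S3-backed volumes and skip any local ones.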
This is immediately useful, but should trigger some caution. This isn't something that you can control; visibility of the buckets is not configurable as of RexRay/0.9.2. We need to accept that there may be some information leakage if our bucket names are too descriptive, but this is likely an acceptable trade-off.
As a first demo try running the following commands. I'm going to suggest that you add your phone number or postal code to the end of the volume name, just to avoid any collisions in names (remember that S3 has a completely global namespace):
```
docker volume create --driver rexray/s3fs:0.9.2 myrexvol-<random number>
docker container run -it -v myrexvol-<random number>:/myvol centos
...inside the container here...
# cd /myvol
# date >mydate
# ls -l
```
If you create any files in the /myvol directory in the container, you will see them in S3 (try running `date >mydate`). They can be shared among containers by referring to the same volume name.
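Rather than inventing a suffix by hand, you could generate one. The sketch below appends eight random hex digits to a prefix, which keeps the name within S3's bucket naming rules (lowercase letters, digits, and hyphens, 3 to 63 characters) as long as the prefix also follows them. Both function names are hypothetical.

```shell
# Build a volume name with a random 8-hex-digit suffix.
unique_volume_name() {
  suffix=$(od -An -N4 -tx1 /dev/urandom | tr -d ' \n')
  printf '%s-%s\n' "$1" "$suffix"
}

# Create an S3-backed volume under a freshly generated name.
create_s3_volume() {
  vol=$(unique_volume_name "$1")
  docker volume create --driver rexray/s3fs:0.9.2 "$vol"
}
```

`docker volume create` prints the name of the volume it created, so `vol=$(create_s3_volume myrexvol)` captures the generated name for use in later `-v "$vol:/myvol"` arguments.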
It is important to note that the buckets are used to bring storage to Docker containers, and as such the plugin places a prefix of /data on the stored files. We will see that this affects how we access the S3 storage.
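As a small illustration of that prefix, the helper below maps a path inside the volume to the S3 location where the file would be expected to appear. It is a sketch based only on the /data prefix described above; the exact key layout is an assumption worth verifying against your own bucket, and `s3_uri_for` is a hypothetical name.

```shell
# Map a path relative to the volume root to the S3 URI it is expected
# to land at, assuming every file is stored under the data/ prefix.
s3_uri_for() {
  bucket="$1"
  rel="${2#/}"        # drop one leading slash, if present
  printf 's3://%s/data/%s\n' "$bucket" "$rel"
}
```

With the AWS CLI installed, `aws s3 ls "$(s3_uri_for myrexvol-1234 '')"` would list the contents of a volume named `myrexvol-1234` (an example name), and the `mydate` file from the demo would be expected at `s3://myrexvol-1234/data/mydate`.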
In the next installment, we are going to explore more of the docker plugin behavior and how to further control access...