FreeWisdom has an excellent guide to backing up to an Amazon Elastic Block Store volume using rsync and an EC2 instance. I’m not going to copy the whole thing here, so for this post to make much sense, you’ll need to go read that first. Go ahead, I’ll wait.
Back already? OK, good.
The guide is very thorough, and I love the idea. It’s just so refreshing to think of “let’s create an entire Ubuntu machine just to temporarily mount a drive long enough to rsync to it, then we’ll get rid of it.” And what’s more, you can automate the whole process via a shell script. It’s a great example of how things change with easily-accessible virtualization.
So I like the FreeWisdom piece, but here are a couple of changes that were helpful for me.
First, when I ran the script from anywhere other than the directory where id_rsa-gsg-keypair lives (~/ec2/ or wherever you keep it), ssh couldn't find the key. And if I specified the path as -i ~/ec2/id_rsa-gsg-keypair, then the ssh command passed to rsync failed (I presume because the shell environment under which rsync runs it doesn't expand '~/'; I once knew off-hand which part of the shell startup did this, but I've flushed it to disk somewhere). You could pass the full path to your id_rsa-gsg-keypair, but I prefer to just cd there to start with. Keep in mind this means that the output of the rsync command, redirected to 'out.txt' in the script, will actually land in this same directory. If that's not what you want, redirect the output of rsync to a more suitable filename.
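In script form, the fix is just an early cd. A minimal sketch, assuming the keypair lives in ~/ec2 (the mkdir -p is only so the snippet runs standalone; your directory already exists):

```shell
# Assumed keypair directory; adjust to wherever yours actually lives.
KEY_DIR="$HOME/ec2"
mkdir -p "$KEY_DIR"     # only for this standalone sketch
cd "$KEY_DIR" || exit 1
# From here on, bare -i id_rsa-gsg-keypair paths resolve,
# and out.txt lands in this same directory.
```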
The second thing I did was a little more complicated. I have StrictHostKeyChecking turned on, which means that every time I run this against a new instance, ssh asks whether to accept the remote host's key as valid. This means I can't schedule this script to run unattended.
ssh allows some configuration options to be set on the command line. So I’ve added two ssh options to get around this problem.
The first option only has to be added on the first call to ssh. By passing -o "StrictHostKeyChecking no" to the first ssh call, I tell it to automatically accept the remote host key, adding it to the known_hosts file without asking.
This gets rid of the prompting problem, but introduces another: each run of the script drops another IP/key pair into our known_hosts file, which will eventually have to be cleaned up. Besides, we only want to trust that key for as long as the instance is ours.
So I add this second option to every ssh call in the script. First, get a reasonably likely unique temporary name (yes, I know there are more rigorous ways to do this; feel free to expand as you have time and interest):
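One simple way to get that name, using the shell's PID for reasonable uniqueness (the variable name $KNOWN_HOSTS matches the commands below; mktemp would be one of those more rigorous routes):

```shell
# Reasonably-likely-unique temporary known_hosts path; $$ is this shell's PID.
# (mktemp would be more rigorous, as noted above.)
KNOWN_HOSTS=/tmp/known_hosts.$$
```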
Then tell ssh to store the temporary instance's host key in this temporary file by using the -o "UserKnownHostsFile $KNOWN_HOSTS" option.
So the initial ssh call in the FreeWisdom script (line 16 or thereabouts in the original) becomes:
ssh -i id_rsa-gsg-keypair -o "StrictHostKeyChecking no" -o "UserKnownHostsFile $KNOWN_HOSTS" root@$EC2_HOST "mkdir /mnt/samba && mount /dev/sdh /mnt/samba"
Once we've connected to the instance with StrictHostKeyChecking turned off, its key will be in our temporary known_hosts file. Thus, we don't need the -o "StrictHostKeyChecking no" option in any subsequent ssh commands.
The rsync command then becomes:
rsync -e "ssh -i id_rsa-gsg-keypair -o 'UserKnownHostsFile $KNOWN_HOSTS'" -avz /Users/rew/Documents root@$EC2_HOST:/mnt/samba/ > out.txt
And the ssh command to umount the volume:
ssh -i id_rsa-gsg-keypair -o "UserKnownHostsFile $KNOWN_HOSTS" root@$EC2_HOST "umount /mnt/samba"
Remove the temporary known_hosts file at the end:
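The removal is a one-liner; sketched here with a fallback assignment only so the snippet stands on its own (in the script, $KNOWN_HOSTS is already set from the top):

```shell
# Path was set at the top of the script; default here is just for this sketch.
KNOWN_HOSTS=${KNOWN_HOSTS:-/tmp/known_hosts.$$}
rm -f "$KNOWN_HOSTS"
```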
and it’s all cleaned up.
Give it a try, and let me know of any clever extensions you make.