28
Nov

Backing up to an EBS volume with rsync and EC2

   Posted by: rew   in Tech

FreeWisdom has an excellent guide to backing up to an Amazon Elastic Block Store volume using rsync and an EC2 instance. I’m not going to copy the whole thing here, so for this post to make much sense, you’ll need to go read that first. Go ahead, I’ll wait.

Back already? OK, good.

The guide is very thorough, and I love the idea. It’s just so refreshing to think of “let’s create an entire Ubuntu machine just to temporarily mount a drive long enough to rsync to it, then we’ll get rid of it.” And what’s more, you can automate the whole process via a shell script. It’s a great example of how things change with easily-accessible virtualization.

So I like the FreeWisdom piece, but here are a couple of changes that were helpful for me.

First, when I ran the script from anywhere other than the directory where id_rsa-gsg-keypair lives (~/ec2/ or wherever you have it), ssh couldn’t find the key. And if I specified the path as -i ~/ec2/id_rsa-gsg-keypair, the ssh command passed to rsync failed. I presume that’s because the tilde sits inside the quoted string given to rsync’s -e option: the shell only expands an unquoted ‘~/’ at the start of a word, so ssh received the path with a literal ‘~’ in it. You could pass the full path name of your id_rsa-gsg-keypair, but I prefer to just cd there to start with. Keep in mind this means that the output of the rsync command, redirected to ‘out.txt’ in the script, will go in this same directory. If that’s not what you want, redirect the output of rsync to a more suitable path.
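To make the quoting point concrete, here is a minimal sketch of how the shell treats the tilde in each position. The variable names are only illustrative, and the key path is the ~/ec2/ example from above; adjust it to wherever your key lives.

```shell
#!/bin/sh
# Illustrative variable names; the key path is the example location from the post.

# Unquoted at the start of a word: the shell expands the tilde.
unquoted=~/ec2/id_rsa-gsg-keypair

# Inside double quotes (as in rsync -e "ssh -i ...") the tilde stays literal.
quoted="ssh -i ~/ec2/id_rsa-gsg-keypair"

# $HOME is still expanded inside double quotes, so this form survives the quoting.
with_home="ssh -i $HOME/ec2/id_rsa-gsg-keypair"

printf '%s\n' "$unquoted" "$quoted" "$with_home"
```

If you would rather not cd into the key directory, writing the path with $HOME inside the quoted -e string is one way around the problem.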

The second thing I did was a little more complicated. With ssh’s host key checking in its prompting mode (StrictHostKeyChecking ask, the default), every time I run this against a new instance, ssh asks whether to accept the remote host’s key as valid. This means I can’t schedule the script to run unattended.

ssh allows some configuration options to be set on the command line. So I’ve added two ssh options to get around this problem.

The first option only has to be added on the first call to ssh. By passing -o "StrictHostKeyChecking no" to the first ssh call, I tell it to automatically accept the remote host key, adding it to the known_hosts file without asking.

This gets rid of the prompting problem, but introduces another: each time this script is run, it’s going to drop another IP/key pair into our known_hosts file, and eventually that will have to be cleaned up. Besides, we only want to accept the key during the time that it’s our instance.

So I add a second option to every ssh call in the script. First, get a temporary file name that is reasonably likely to be unique (yes, I know there are more rigorous ways to do this; feel free to expand as you have time and interest):

KNOWN_HOSTS="/tmp/known_hosts.$$"
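One of the more rigorous ways is mktemp, which is available on most Linux, BSD and macOS systems (it is not part of POSIX, so treat its presence as an assumption). A minimal sketch:

```shell
#!/bin/sh
# mktemp replaces the XXXXXX with random characters and creates the file
# itself, so the name cannot collide with an existing file.
KNOWN_HOSTS=$(mktemp /tmp/known_hosts.XXXXXX) || exit 1

echo "Using temporary known_hosts file: $KNOWN_HOSTS"
```

The file starts out empty, which is fine for ssh; it still needs to be removed once the script is done.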

Then tell ssh to use this temporary file to store the host key from the temporary instance by using the -o "UserKnownHostsFile $KNOWN_HOSTS" option.

So the initial ssh call in the FreeWisdom script (line 16 or thereabouts in the original) becomes:

ssh -i id_rsa-gsg-keypair -o "StrictHostKeyChecking no" -o "UserKnownHostsFile $KNOWN_HOSTS" root@$EC2_HOST "mkdir /mnt/samba && mount /dev/sdh /mnt/samba"

Once we’ve connected to the instance with StrictHostKeyChecking turned off, its key will be in our temporary known_hosts file. Thus, we don’t need the -o "StrictHostKeyChecking no" option in any later ssh commands.

The rsync command then becomes:

rsync -e "ssh -i id_rsa-gsg-keypair -o 'UserKnownHostsFile $KNOWN_HOSTS'" -avz /Users/rew/Documents root@$EC2_HOST:/mnt/samba/ > out.txt

And the ssh command to umount the volume:

ssh -i id_rsa-gsg-keypair -o "UserKnownHostsFile $KNOWN_HOSTS" root@$EC2_HOST "umount /mnt/samba"
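Since the same options now appear in every call, they can be collected in one place. The following is only a sketch: KEY, RUN and ssh_tmp are hypothetical names that are not in the FreeWisdom script, and the host name is a placeholder. It prints the commands instead of running them unless RUN is set to an empty value. It uses ssh’s Key=value form of -o, which avoids the nested quotes inside rsync’s -e string.

```shell
#!/bin/sh
# Hypothetical names: KEY, RUN and ssh_tmp are illustrative only.
KEY="$HOME/ec2/id_rsa-gsg-keypair"
KNOWN_HOSTS="/tmp/known_hosts.$$"
EC2_HOST="${EC2_HOST:-ec2-host.example.com}"   # placeholder host name
RUN="${RUN-echo}"   # dry run by default; start the script with RUN= to execute

# Options shared by every ssh call; extra options and the command follow in "$@".
ssh_tmp() {
    $RUN ssh -i "$KEY" -o "UserKnownHostsFile=$KNOWN_HOSTS" "$@"
}

# First call: also accept the unknown host key without prompting.
ssh_tmp -o "StrictHostKeyChecking=no" "root@$EC2_HOST" \
    "mkdir /mnt/samba && mount /dev/sdh /mnt/samba"

# Later calls: the host key is already in the temporary file.
$RUN rsync -e "ssh -i $KEY -o UserKnownHostsFile=$KNOWN_HOSTS" -avz \
    /Users/rew/Documents "root@$EC2_HOST:/mnt/samba/"
ssh_tmp "root@$EC2_HOST" "umount /mnt/samba"
```

Note that the rsync line assumes the key path contains no spaces, since rsync splits the -e string into words itself.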

Remove the temporary known_hosts file at the end:

rm "$KNOWN_HOSTS"

and it’s all cleaned up.
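One possible extension: if an ssh or rsync step fails partway through, the final rm never runs and the temporary file is left behind. A trap on EXIT can handle that. The sketch below uses a hypothetical run_backup function with stand-in commands in place of the real ssh and rsync calls, purely to show the cleanup firing after a failure.

```shell
#!/bin/sh
KNOWN_HOSTS="/tmp/known_hosts.$$"

# Hypothetical function; its body runs in a subshell, so the EXIT trap fires
# as soon as the function ends, whether it succeeds or fails.
run_backup() (
    trap 'rm -f "$KNOWN_HOSTS"' EXIT
    touch "$KNOWN_HOSTS"   # stand-in for the first ssh call creating the file
    exit 1                 # stand-in for a failing ssh or rsync step
)

run_backup || echo "backup failed, but cleanup still ran"
[ -e "$KNOWN_HOSTS" ] || echo "temporary known_hosts file removed"
```

In a real script the same trap line can simply go near the top, right after KNOWN_HOSTS is set.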

Give it a try, and let me know of any clever extensions you make.


This entry was posted on Friday, November 28th, 2008 at 12:32 pm and is filed under Tech.

Comments

mark
 1 

Could you create an EBS volume right before starting the instance, format/mount it, do your rsync, shut down the EC2 instance, take a snapshot of the volume to S3, then delete the EBS volume? It seems like doing that should cost next to nothing, as the volume would only exist for minutes/hours and you could make it huge, then the S3 snapshot would only be the size of the actual data on the EBS volume.

Just trying to figure out another way to do this without paying for empty EBS space :)

July 13th, 2009 at 10:46 pm
rew
 2 

That might work, though if you’re just looking to rsync to S3 (which is very close to what you’d have here), there are easier ways to do that than running it through an EC2 instance.

I’m not certain how syncing an EC2 instance with an EBS block mounted would work, though it should work fine.

S3 storage actually costs a little more than EBS, but as you mention, you only pay for what you use, whereas with EBS you pay for what you allocate.

Another option to think about, though, is allocating an EBS only about the size you need. Snapshot it frequently to S3, and when you need more room, build a new, slightly larger, EBS from a snapshot and then toss the old one. Rinse, repeat. If your data size grows predictably, that might be a good option for you, so you’re paying the cheaper storage price for EBS space, but not paying for a lot of empty space.

EBS isn’t just S3 on steroids; it provides some abilities that S3 or an EC2 instance doesn’t, and so it’s priced differently. Whether creating and syncing EC2 instances that include an EBS block would end up being cheaper (versus using something like JungleDisk to rsync directly to your S3 storage from your non-Amazon data source) will depend on the size of your dataset compared to the size of the instance itself.

Let me know what you end up doing.

July 14th, 2009 at 9:22 am
