by Tobi Knaup
Airbnb is a rapidly growing marketplace and our infrastructure needs are evolving. Like many other websites, we were using MySQL as our main datastore, but we recently migrated it to RDS. RDS (Relational Database Service) is a web service by Amazon that lets you run a scalable MySQL setup in the cloud while hiding the administrative overhead that usually comes with it. This blog post is about the challenges we faced with MySQL, how RDS solves them, and how we migrated to RDS from our old setup.
The challenges
Before we switched to RDS, Airbnb was running on a single MySQL instance on top of Amazon EC2. We have dozens of background tasks that handle things like payments and analytics. Many of the queries that are run by these tasks are expensive SELECTs, and we were starting to notice an impact on frontend performance. MySQL has an easy solution for read-intensive applications (as most websites are): just add a read-only slave server that uses asynchronous replication and divide read queries between the servers… Except setting up replication takes some care and on-going attention. Our engineering team is already maxed out, so adding that much complexity didn’t seem like a good idea.
While EC2 is generally very reliable, there is always a chance of an instance failure. The recovery process we had in place was completely manual. We would have had to launch a new instance and restart the site after we changed the host name. Our data backup solution for the old setup was to take a snapshot of the underlying EBS volume every couple of hours. In other words, the site would have been offline for a couple hours and there was a possibility of data loss.
As a website grows, the database eventually becomes a bottleneck. Scaling horizontally (sharding) is always a delicate task, and scaling vertically with our old setup would have involved a significant amount of planning and manual work. In order to have consistent data, we would have had to take the site offline, take a snapshot, then launch a bigger instance with that snapshot.
How RDS solved these challenges for us
Amazon RDS supports asynchronous master-slave replication, and it can be launched with the click of a button or an API call. The time-intensive administration tasks are all handled by RDS, and our engineers can spend more time developing features.
RDS supports Multi-AZ deployments (multiple availability zones), where it transparently keeps a hot-standby master in a different AZ that is synchronously updated. Unlike asynchronous replication, there is no replication lag, meaning that the hot standby always has the same data as the live master. In case of an instance failure, network outage, or even unavailability of the whole AZ of the master, the hot standby is automatically promoted to be the new master. The fail-over process is completely automated: AWS changes the CNAME record of the master under the hood.
A Multi-AZ setup gives you a good amount of redundancy, but there is always a worst-case scenario. What happens if the hot standby fails too? For those rare cases (or when you don’t want to spend the extra money for Multi-AZ), RDS provides point-in-time recovery, which lets you boot a new database instance using a consistent snapshot of your data at any time within your data retention period. Yes, any second you want, and up to about five minutes before a failure! The data retention period is one day by default but can be extended by up to 8 days. In addition to that, RDS allows you to pull a consistent snapshot of your data at any time, much like an EBS snapshot. This is great for archiving.
RDS provides a number of different instance types, similar to EC2. You can start with a small, 1.7GB 1-core server, and scale up to a massive quadruple extra large 68GB 8-core. This range should cover our needs for quite some time. This operation inevitably causes some downtime, but since the whole process is automated, it is kept to a minimum. And again, there is an API for that.
How to switch
Everything is straightforward if you are setting up a site for the first time, but what if you have an existing database with gigabytes of data that you want to migrate to RDS? This is where the fun begins. At the time we did the migration, we already had gigabytes of data and some tables with millions of records. RDS instances only expose MySQL and don’t give you direct access to the machine via SSH or other tools, so unfortunately there is no way to do this on the file system level. The only practical way to move your data over is to do a mysqldump on the source machine, followed by a mysqlimport in RDS. This can take a significant amount of time if you have a lot of data, and unless you want to spend days or weeks figuring out a complex migration strategy, the only option is to take down the site during the process. We wanted to keep the downtime as low as possible, so we looked for simple ways to transfer a significant amount of our data while the site was still running. We came up with the idea to copy the data in tables that never receive UPDATEs or DELETEs, so we would only have to copy the diff later on. Once those tables were carefully selected, the procedure was pretty straightforward. The actual script that we used to do our migration is available at
This approach allowed us to perform the whole migration with only 15 minutes of downtime. The new setup eliminates the effects of long-running queries on our frontend performance, and generally seems to perform significantly better than off-the-shelf MySQL on EC2. Even more important than that, we are now well prepared for our future growth.
thanks for taking the time to write this up and post it!
This is really useful info, thanks. I’m running a smallish MySQL db on a dedicated machine and was wondering about RDS, or an EC2 instance with MySQL. After reading this I think I’d sleep a lot better at night with RDS. I didn’t know it came packed with such an amount of features.
how big is your database?
Hi Tobi — make sure you check that your character encodings are correct with your RDS instance. They certainly weren’t for us by default, and the ec2 tools were broken in odd ways that made fixing it painful. Here’s the secret handshake you need to do: http://matthew.mceachen.us/blog/howto-configure-an-amazon-rds-instance-to-use-utf-8-925.html
My database? Not very big, around 1GB. The reason I’m looking at RDS is primarily uptime since I’ve had trouble with a colo setup in the past. Do you think EC2 will do the job or is RDS inherently ‘better’ as far as reliability and availability goes?
For < 2 GB databases, compare prices with MongoHQ https://mongohq.com/pricing. But you will have to migrate from MySQL
RDS is great to boostrap your projects, you don’t need to worry about your database, it’s availabilty, stoarge … but when database gets bigger, you want de have control over your storage engine, replication …I wanted to leave RDS but it has been a headache, my data where around 25 Gigs, I wanted to minimize downtime, so I wanted to use "select into outfile" / "load data infile", as it sooo much faster with InnoDB, but "select into outfile" requires super privileges, which you don’t have on RDS.I have started to search a solution with mylsqldump converted into CSV so I mysqldump, convert and "load data infile" but never found something. I have started to write a script which worked for some data but didn’t work for other ( "11" != 11 , quotes in strings … so much source of errors).Do you guys have an idea how to do that ? or a script that does it ?
@smaftoul They do provide an export service at a fee and which takes time since it’s literally mailed media but its an option.
Will RDS work transparently against Java – JDBC?
@ Avishay: RDS is just MySQL running on a server you can’t ssh into, so, yes, your MySQL JDBC driver will work just fine.
Did you notice any performance issues after migration? From my limited experience it’s always seemed like local mysql meant fast mysql.
Hi. Thanks for your post. Very helpful for us with a big DB thinking of migrating to RDS.Couple of questions: Are you able to use only one master MySQL instance for all your data? I mean, 68GB of RAM sounds like a lot but I guess you guys are well over that.If not, are you guys using any sharding at all? Do you have any experience with sharding on RDS?
The listing systems I have worked on in the past that got traffic ended basically copying the active listings into cache (Memcached). These days I would lean on something like Riak or MongoDB to do the heavy lifting for the searchable data. I am sure there would still be cache involved but as far as data center outages I would think using a NoSQL database that is design to existing in many data centers would be preferred. This would also make users happier since the latency from a search in Asia would be greatly reduced since the data was located in Asia. Also I thought MySQL multi-master was pretty operations intensive. Where other options considered? I would have thought a document search would have been better just because of the constant change in the data model and addition of searchable fields. Good read thanks.
Hey Tobi, was wondering if Airbnb was still happy with there decision to switch to RDS. I’m thinking of switching myself and wondering if you’d still endorse it now about 2 years later. Have you or Airbnb made an updated post somewhere on this topic? thanks, tim