MySQL in the cloud at Airbnb

About Tobi Knaup

Airbnb is a rapidly growing marketplace and our infrastructure needs are evolving. Like many other websites, we were using MySQL as our main datastore, but we recently migrated it to RDS. RDS (Relational Database Service) is a web service by Amazon that lets you run a scalable MySQL setup in the cloud while hiding the administrative overhead that usually comes with it. This blog post is about the challenges we faced with MySQL, how RDS solves them, and how we migrated to RDS from our old setup.

The challenges

Before we switched to RDS, Airbnb was running on a single MySQL instance on top of Amazon EC2. We have dozens of background tasks that handle things like payments and analytics. Many of the queries that are run by these tasks are expensive SELECTs, and we were starting to notice an impact on frontend performance. MySQL has an easy solution for read-intensive applications (as most websites are): just add a read-only slave server that uses asynchronous replication and divide read queries between the servers… Except setting up replication takes some care and on-going attention. Our engineering team is already maxed out, so adding that much complexity didn’t seem like a good idea.

While EC2 is generally very reliable, there is always a chance of an instance failure. The recovery process we had in place was completely manual. We would have had to launch a new instance and restart the site after we changed the host name. Our data backup solution for the old setup was to take a snapshot of the underlying EBS volume every couple of hours. In other words, the site would have been offline for a couple hours and there was a possibility of data loss.

As a website grows, the database eventually becomes a bottleneck. Scaling horizontally (sharding) is always a delicate task, and scaling vertically with our old setup would have involved a significant amount of planning and manual work. In order to have consistent data, we would have had to take the site offline, take a snapshot, then launch a bigger instance with that snapshot.

How RDS solved these challenges for us

Amazon RDS supports asynchronous master-slave replication, and it can be launched with the click of a button or an API call. The time-intensive administration tasks are all handled by RDS, and our engineers can spend more time developing features.

RDS supports Multi-AZ deployments (multiple availability zones), where it transparently keeps a hot-standby master in a different AZ that is synchronously updated. Unlike asynchronous replication, there is no replication lag, meaning that the hot standby always has the same data as the live master. In case of an instance failure, network outage, or even unavailability of the whole AZ of the master, the hot standby is automatically promoted to be the new master. The fail-over process is completely automated: AWS changes the CNAME record of the master under the hood. 

A Multi-AZ setup gives you a good amount of redundancy, but there is always a worst-case scenario. What happens if the hot standby fails too? For those rare cases (or when you don’t want to spend the extra money for Multi-AZ), RDS provides point-in-time recovery, which lets you boot a new database instance using a consistent snapshot of your data at any time within your data retention period. Yes, any second you want, and up to about five minutes before a failure! The data retention period is one day by default but can be extended by up to 8 days. In addition to that, RDS allows you to pull a consistent snapshot of your data at any time, much like an EBS snapshot. This is great for archiving.

RDS provides a number of different instance types, similar to EC2. You can start with a small, 1.7GB 1-core server, and scale up to a massive quadruple extra large 68GB 8-core. This range should cover our needs for quite some time. This operation inevitably causes some downtime, but since the whole process is automated, it is kept to a minimum. And again, there is an API for that. 

How to switch

Everything is straightforward if you are setting up a site for the first time, but what if you have an existing database with gigabytes of data that you want to migrate to RDS? This is where the fun begins. At the time we did the migration, we already had gigabytes of data and some tables with millions of records. RDS instances only expose MySQL and don’t give you direct access to the machine via SSH or other tools, so unfortunately there is no way to do this on the file system level. The only practical way to move your data over is to do a mysqldump on the source machine, followed by a mysqlimport in RDS. This can take a significant amount of time if you have a lot of data, and unless you want to spend days or weeks figuring out a complex migration strategy, the only option is to take down the site during the process. We wanted to keep the downtime as low as possible, so we looked for simple ways to transfer a significant amount of our data while the site was still running. We came up with the idea to copy the data in tables that never receive UPDATEs or DELETEs, so we would only have to copy the diff later on. Once those tables were carefully selected, the procedure was pretty straightforward. The actual script that we used to do our migration is available at https://gist.github.com/671874

This approach allowed us to perform the whole migration with only 15 minutes of downtime. The new setup eliminates the effects of long-running queries on our frontend performance, and generally seems to perform significantly better than off-the-shelf MySQL on EC2. Even more important than that, we are now well prepared for our future growth.

18 comments

About Tobi Knaup

Speak Your Mind

*

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax

Comments

  1. scott

    thanks for taking the time to write this up and post it!

  2. stefanrichter

    This is really useful info, thanks. I’m running a smallish MySQL db on a dedicated machine and was wondering about RDS, or an EC2 instance with MySQL. After reading this I think I’d sleep a lot better at night with RDS. I didn’t know it came packed with such an amount of features.

  3. verbiee

    how big is your database?

  4. matthewmceachen

    Hi Tobi — make sure you check that your character encodings are correct with your RDS instance. They certainly weren’t for us by default, and the ec2 tools were broken in odd ways that made fixing it painful. Here’s the secret handshake you need to do: http://matthew.mceachen.us/blog/howto-configure-an-amazon-rds-instance-to-use-utf-8-925.html

  5. stefanrichter

    My database? Not very big, around 1GB. The reason I’m looking at RDS is primarily uptime since I’ve had trouble with a colo setup in the past. Do you think EC2 will do the job or is RDS inherently ‘better’ as far as reliability and availability goes?

  6. 437341

    For < 2 GB databases, compare prices with MongoHQ https://mongohq.com/pricing. But you will have to migrate from MySQL

  7. smaftoul

    RDS is great to boostrap your projects, you don’t need to worry about your database, it’s availabilty, stoarge … but when database gets bigger, you want de have control over your storage engine, replication …I wanted to leave RDS but it has been a headache, my data where around 25 Gigs, I wanted to minimize downtime, so I wanted to use “select into outfile” / “load data infile”, as it sooo much faster with InnoDB, but “select into outfile” requires super privileges, which you don’t have on RDS.I have started to search a solution with mylsqldump converted into CSV so I mysqldump, convert and “load data infile” but never found something. I have started to write a script which worked for some data but didn’t work for other ( “11″ != 11 , quotes in strings … so much source of errors).Do you guys have an idea how to do that ? or a script that does it ?

  8. GB

    @smaftoul They do provide an export service at a fee and which takes time since it’s literally mailed media but its an option.

  9. Avishay

    Will RDS work transparently against Java – JDBC?

  10. Matthew McEachen

    @ Avishay: RDS is just MySQL running on a server you can’t ssh into, so, yes, your MySQL JDBC driver will work just fine.

  11. Sam

    Did you notice any performance issues after migration? From my limited experience it’s always seemed like local mysql meant fast mysql.

  12. teoruiz

    Hi. Thanks for your post. Very helpful for us with a big DB thinking of migrating to RDS.Couple of questions: Are you able to use only one master MySQL instance for all your data? I mean, 68GB of RAM sounds like a lot but I guess you guys are well over that.If not, are you guys using any sharding at all? Do you have any experience with sharding on RDS?

  13. hackingthought

    The listing systems I have worked on in the past that got traffic ended basically copying the active listings into cache (Memcached). These days I would lean on something like Riak or MongoDB to do the heavy lifting for the searchable data. I am sure there would still be cache involved but as far as data center outages I would think using a NoSQL database that is design to existing in many data centers would be preferred. This would also make users happier since the latency from a search in Asia would be greatly reduced since the data was located in Asia. Also I thought MySQL multi-master was pretty operations intensive. Where other options considered? I would have thought a document search would have been better just because of the constant change in the data model and addition of searchable fields. Good read thanks.

  14. timrpeterson

    Hey Tobi, was wondering if Airbnb was still happy with there decision to switch to RDS. I’m thinking of switching myself and wondering if you’d still endorse it now about 2 years later. Have you or Airbnb made an updated post somewhere on this topic? thanks, tim

  15. Buffy

    Great post – thanks! The section about doing the diff on “create only” tables is going to save me a ton of time.

  16. Gili Chonan

    Thank you for this post!
    Have you (also) looked at: https://developers.google.com/cloud-sql/
    If you did, can you please share the pro/con you found?

  17. Chris Weiss

    Were there other strategies you considered and disqualified? If so, why?

Trackbacks

  1. [...] that we spent some extra money on redundancy of our data store. We blogged back in November about how we moved our database to RDS with a setup that provides redundancy and failover across multiple availability zones. We are very [...]