Replacing Cron & Building Scalable Data Pipelines at Airbnb
February 27, 2013
Understanding and analyzing user behavior is crucial for us at Airbnb. Our analytics team depends on very complex data pipelines, which is why we have a Data Infrastructure team that builds the systems and tools to support them. In the past, we used Cron to manage these workflows, but we quickly realized that sleep statements are not enough for managing complex dependency hierarchies. In this talk we’ll discuss how we build data pipelines. We use Chronos, a tool we built in-house: a distributed system for scheduling data pipelines that runs on Mesos. It lets us build data pipelines that are easier to manage and debug. Internal tools are just as important as user-facing products. We care about how a tool works and how it feels, which is why Chronos doesn’t stop at the command line. We built a web interface on top of Chronos that abstracts away the complexity of building and managing distributed data pipelines. We’ll talk about how design, front-end and back-end engineering come together to build products like this, and share our experience building Chronos as a Scala project on top of Mesos, using Backbone and Dropwizard.
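To make the point about sleep statements concrete, here is a minimal sketch of dependency-based ordering, the general idea that replaces "sleep long enough and hope the upstream job has finished". It is not Chronos code and does not use the Chronos API; the job names and the `run_order` and `run_pipeline` helpers are hypothetical, and it runs everything in one process using Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each name maps to the set of jobs it depends on.
JOBS = {
    "import_logs": set(),
    "import_db_snapshot": set(),
    "sessionize": {"import_logs"},
    "join_bookings": {"sessionize", "import_db_snapshot"},
    "daily_report": {"join_bookings"},
}


def run_order(jobs):
    """Return an order in which every job comes after all of its dependencies."""
    return list(TopologicalSorter(jobs).static_order())


def run_pipeline(jobs, run_job):
    """Run jobs in dependency order, skipping anything downstream of a failure."""
    failed = set()   # jobs that failed or were skipped
    results = {}
    for name in run_order(jobs):
        if jobs[name] & failed:
            # An upstream job did not succeed, so do not run on missing input.
            failed.add(name)
            results[name] = "skipped"
            continue
        try:
            run_job(name)
            results[name] = "ok"
        except Exception:
            failed.add(name)
            results[name] = "failed"
    return results
```

With cron and sleep statements, a slow or failed `sessionize` run would not stop `join_bookings` from starting on schedule. In this sketch a job starts only once its dependencies have succeeded, and a failure marks everything downstream as skipped, which makes it easier to see where a pipeline broke.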