Chronos: A Replacement for Cron

About Harrison Shoff


Chronos is our replacement for cron. It is a distributed, fault-tolerant scheduler that runs on top of Mesos. As a Mesos framework, it supports custom Mesos executors as well as the default command executor, so by default Chronos executes sh scripts (on most systems, Bash). Chronos can be used to interact with systems such as Hadoop (including EMR), even if the Mesos slaves on which execution happens do not have Hadoop installed. Included wrapper scripts can transfer files to a remote machine, execute them there in the background, and use asynchronous callbacks to notify Chronos of job completion or failure.
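To make the default command executor concrete, here is a minimal Python sketch that assembles a job definition as JSON. The field names (`name`, `schedule`, `command`, `owner`, `async`) are modeled on the job examples published with the Chronos project, but treat them as assumptions and check them against the documentation for the Chronos version you run; the job name, script path and owner address are hypothetical.

```python
import json


def make_job(name: str, schedule: str, command: str, owner: str,
             run_async: bool = False) -> str:
    """Build a Chronos-style job definition as a JSON string.

    Field names are assumptions modeled on Chronos examples; verify them
    against the documentation for your Chronos version.
    """
    job = {
        "name": name,
        # ISO 8601 repeating interval, e.g. "R/<start>/P1D" for daily runs.
        "schedule": schedule,
        # Shell command handed to the default command executor.
        "command": command,
        # Contact address for the job (hypothetical in the demo below).
        "owner": owner,
        # If true, the job is expected to report completion itself
        # (e.g. via a wrapper script's callback) rather than on process exit.
        "async": run_async,
    }
    return json.dumps(job, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical job: export one table every day at 05:00 UTC.
    payload = make_job(
        name="export_users_table",
        schedule="R/2013-03-01T05:00:00Z/P1D",
        command="bash /opt/etl/export_table.sh users",
        owner="data-eng@example.com",
    )
    print(payload)
```

As we understand it, the `async` flag corresponds to the callback pattern described above: an asynchronous job reports its own completion or failure instead of being considered done when its shell process exits. How the JSON is submitted (REST endpoint and port) varies between Chronos versions, so that step is left out here.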

Chronos has a number of advantages over regular cron. It lets you schedule jobs using ISO 8601 repeating interval notation, which offers more flexibility than cron’s syntax. Chronos also supports jobs that are triggered by the completion of other jobs, and it supports arbitrarily long dependency chains.
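A repeating interval such as `R10/2012-10-01T05:52:00Z/PT2H` has three parts: a repetition count (`R10`, or a bare `R` for no limit), a start time, and a period. The sketch below expands such a string into concrete run times. It is our own illustration, not Chronos’s parser, and it handles only a subset of ISO 8601: start times in UTC or with an offset, and periods built from weeks, days, hours, minutes and seconds (years and months are rejected because their length varies). We also assume that `Rn` means n runs in total.

```python
import re
from datetime import datetime, timedelta
from itertools import islice
from typing import Iterator, Optional, Tuple

# Supported subset of ISO 8601 durations: weeks, days, hours, minutes, seconds.
# Years and months are rejected because their length varies.
_DURATION = re.compile(
    r"^P(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    """Convert a duration such as 'PT2H' or 'P1DT30M' into a timedelta."""
    m = _DURATION.match(text)
    parts = {k: int(v) for k, v in m.groupdict().items() if v} if m else {}
    # Reject empty designators like 'P', 'PT' or a dangling 'T'.
    if not parts or text.endswith("T"):
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**parts)


def parse_schedule(spec: str) -> Tuple[Optional[int], datetime, timedelta]:
    """Split 'R[n]/<start>/<period>' into (runs, start, period)."""
    rep, start, period = spec.split("/")
    if not rep.startswith("R"):
        raise ValueError(f"not a repeating interval: {spec!r}")
    runs = int(rep[1:]) if rep[1:] else None  # bare 'R' means no limit
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on.
    start_dt = datetime.fromisoformat(start.replace("Z", "+00:00"))
    step = parse_duration(period)
    if step <= timedelta(0):
        raise ValueError("period must be positive")
    return runs, start_dt, step


def run_times(spec: str) -> Iterator[datetime]:
    """Yield each scheduled run time; infinite when the count is omitted."""
    runs, start, step = parse_schedule(spec)
    n = 0
    while runs is None or n < runs:
        yield start + n * step
        n += 1


if __name__ == "__main__":
    for t in islice(run_times("R/2012-10-01T05:52:00Z/PT2H"), 3):
        print(t.isoformat())
```

For example, the first three runs of `R/2012-10-01T05:52:00Z/PT2H` fall at 05:52, 07:52 and 09:52 UTC on 2012-10-01, while a spec starting with `R3/` stops after three runs.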

Chronos is available on GitHub.

The Backstory

At Airbnb, we rely heavily on data analysis to build great products. Our data pipeline consists of many technologies, such as Hadoop, MySQL, Amazon Redshift and S3. Our software engineers and analysts use a mix of Cascading, Cascalog, Hive and Pig to interface with Hadoop. We have scripts that export tables from a vast number of databases into S3, and we use various ETL (extract, transform, and load) processes to turn blobs of bytes into meaningful information. Many of these transformations consist of multiple steps, and some tables are composed of a myriad of data sources and joins.

We’re not in a private datacenter, and we aren’t running our own Hadoop cluster; we use Amazon’s managed Hadoop product, Elastic MapReduce (EMR). High variance in network latency, the effects of virtualization, and unpredictable I/O performance are ongoing challenges in a cloud environment. There are many sources of errors; for example, calls to web services are subject to timeouts.

In a complex processing pipeline, every step increases the chance of failure. Until December last year, we relied on a single instance running cron to kick off our hourly, daily and weekly ETL jobs. Cron is a great tool, but we wanted a system that allowed retries, was lightweight, and provided an easy-to-use interface giving analysts quick insight into which jobs had failed and which had succeeded.
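What “allowing retries” buys over plain cron can be illustrated with a short, generic sketch: a transient failure, such as the web-service timeouts mentioned above, is retried a bounded number of times with exponential backoff before the job is reported as failed. This is not Chronos’s implementation; `run_with_retries` and its parameters are hypothetical names chosen for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(task: Callable[[], T], retries: int = 2,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call task(); on an exception, retry up to `retries` more times.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    (exponential backoff) and re-raises the last exception if every attempt
    fails. `sleep` is injectable so the backoff can be tested without waiting.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise  # out of attempts: report the job as failed
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_export() -> str:
        # Hypothetical task that times out twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise TimeoutError("web service timed out")
        return "exported"

    print(run_with_retries(flaky_export, retries=2, base_delay=0.01))
```

With cron alone, the first timeout would simply fail the run until the next scheduled slot; here the job succeeds on its third attempt.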

We also wanted a system that was highly available and could manage any workload, not just Hadoop jobs. Other requirements were that the system could still run Bash scripts and fan the workload out to many machines: because we export many tables, we didn’t want to execute everything on one box, although we did want central management. At the same time, we began looking at Mesos for our data infrastructure. So we decided to build a new lightweight, fault-tolerant scheduling tool, which we named Chronos, that would run on top of Mesos and use Mesos’ primitives for storing state and distributing work. Mesos also allowed us to add new workers to our pool dynamically without having to change the configuration of the existing cluster.

Chronos UI

Chronos comes with a UI that can be used to add, delete, list, modify and run jobs. It can also show a graph of job dependencies. The screenshot below should give you a good idea of what Chronos can do.
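The dependency graph shown in the UI can be thought of as a directed acyclic graph in which each job lists its parent jobs. The Python sketch below uses the standard library’s `graphlib.TopologicalSorter` (Python 3.9+) to group a hypothetical ETL pipeline into waves so that every job runs after all of its parents. It is an offline illustration of the ordering constraint, not how Chronos itself triggers dependent jobs; we assume a job must wait for every one of its parents, and the job names are made up.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Hypothetical ETL pipeline: each job maps to the set of parent jobs that
# must finish successfully before it may start.
PARENTS: Dict[str, Set[str]] = {
    "export_users": set(),
    "export_bookings": set(),
    "clean_bookings": {"export_bookings"},
    "join_users_bookings": {"export_users", "clean_bookings"},
    "daily_report": {"join_users_bookings"},
}


def run_order(parents: Dict[str, Set[str]]) -> List[List[str]]:
    """Group jobs into waves; each job's parents all sit in earlier waves.

    Jobs within one wave do not depend on each other, so they could be
    fanned out to different machines in parallel. Raises
    graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(parents)
    sorter.prepare()
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # pretend every job in the wave succeeded
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(run_order(PARENTS), start=1):
        print(i, wave)
```

For the `PARENTS` table above, `run_order` returns four waves: the two exports first, then `clean_bookings`, then the join, then the report. Jobs within a wave do not depend on one another, which is where fanning work out across many machines pays off.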

[Screenshot: sample Chronos UI]

Check it out on GitHub

Over the past few weeks we have open-sourced Chronos; you can check it out on our GitHub page: http://airbnb.github.com/chronos

Here’s the video from our Tech Talk on Chronos: https://www.youtube.com/watch?v=FLqURrtS8IA




Comments

  1. EMG

    I am currently researching Chronos for our production environment. I’ve dug around and found quite a bit of info, but some of my questions are still unanswered. Would anyone be able to answer these?

    Is there any alerting or notification capability?
    Can it be integrated with other business applications, for example Zenoss, JIRA, Logstash, etc.?
    Is it adaptable to multi-zone calendars?
    Does it have the ability to create incident reports?
    What would be the response time for each job, for example the amount of time from when a request is submitted until the first response is produced?
    Does it have reporting features?
    Can it fail over / load balance?
    Do we have to use the GUI, or can everything be done at the command line as well?

  2. Rauf Issa

    If you want to schedule and automate ETL jobs or any kind of command-line script, give JobServer a try. It scales to hundreds of servers and has a robust web UI for job/task management. Also, its open-source developer API, soafaces, lets you build custom Tasklets using a Java API.

  3. Anonymous

    I gave Chronos a go over the last couple of days, and what’s not obvious from the article above is that, firstly, it depends on several other services (Mesos and ZooKeeper, along with their required native libraries), and secondly, it’s essentially a list of repeating jobs. Whilst you can use it to implement a scheduler, it’s not really applicable to a daily batch kind of process; it’d be better for regular repeating jobs. Getting it all working is pretty tough too, because the limited documentation seems to be out of date.

    It is pretty though!

  4. Mateusz

    I have a trivial question: how do I access the Chronos web GUI? I have started Chronos successfully and I am able to access its API, but I don’t know how to access the web GUI.

Trackbacks

  1. [...] piece of our infrastructure, helping us with service discovery, configuration management, and Chronos dependency [...]

  2. […] built a tool called Aurora (which it plans to open source) to handle this, while Airbnb built a tool called Chronos. Mesosphere’s Leibert and Knaup built Chronos while at Airbnb, and Marathon is a “meta […]

  3. […] “Chronos is a distributed and fault-tolerant scheduler which runs on top of Mesos. It’s a framework and supports custom mesos executors as well as the default command executor. Thus by default, Chronos executes SH (on most systems BASH) scripts. Chronos can be used to interact with systems such as Hadoop (incl. EMR), even if the mesos slaves on which execution happens do not have Hadoop installed. Included wrapper scripts allow transfering files and executing them on a remote machine in the background and using asynchroneous callbacks to notify Chronos of job completion or failures. […]

  4. […] Chronos, a distributed job-scheduler that Airbnb created to account for the realities of running in the cloud, also runs on Mesos. […]