Airbnb Tech Talk #3: Ben Hindman on Mesos


Join us for our third Tech Talk at Airbnb HQ in sunny San Francisco, CA.


Details:

  • Where: Airbnb Offices
  • When: Wednesday, March 28th
  • Time: 6:00pm - 8:00pm
  • Sign Up: http://bit.ly/GDs30P


Tech Talk: Ben Hindman on Mesos

This talk will be about Apache Mesos, a platform that provides efficient resource isolation and sharing for distributed applications and frameworks. You can use Mesos to run Hadoop, MPI, Hypertable, Spark (a new framework for low-latency interactive and iterative jobs), and others on the same cluster. You can also use Mesos to help build novel distributed applications and frameworks. In this talk I'll discuss the architecture and motivation of Mesos, describe how to build a new application/framework on Mesos, and explain how Mesos is currently being used at Twitter and other places.

About Benjamin Hindman

Benjamin Hindman is one of the creators of Mesos, a platform for building and running distributed systems. He has done research in the areas of programming languages and distributed systems as a graduate student at Berkeley, where he hopes to one day finish his PhD. These days he spends most of his time hacking on Mesos at Twitter where it is being used in production.

 

Integration Testing with Selenium and Capybara


Life before frontend integration testing meant clicking links and buttons and testing everything by hand before launching any feature change.  But as every programmer knows, insanity is doing the same thing over and over again, and not writing code to do it for you.

Though it might seem daunting to set up a frontend integration testing environment, automating these tests is worth the effort.  

  • JavaScript unit tests are great for testing programmatic interfaces to libraries, but they are ill-suited for testing behaviors dependent on the browser.
  • Testing by hand is unreliable and time-consuming.
  • Integration testing allows you to be more aggressive in refactoring frontend flows, since changes that break critical behaviors will be caught by the test.

For our frontend testing needs, we use a combination of Capybara and Selenium: Selenium because it's a mature solution for automating browser interactions, and Capybara because of its RSpec-like syntax for specifying browser interactions.

Setup

Using Selenium requires installation of the selenium-webdriver gem, and Capybara requires installation of the capybara gem and the launchy gem for screengrabs. In your Gemfile:
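A minimal sketch of those Gemfile entries follows. Only the three gem names come from the text above; putting them in a test group is an assumption, and no versions are pinned here.

```ruby
# Gemfile fragment (sketch). The :test group is an assumption, not something
# the post specifies; add version constraints to suit your app.
group :test do
  gem "capybara"            # DSL for describing browser interactions
  gem "selenium-webdriver"  # drives a real browser
  gem "launchy"             # used for screengrabs
end
```

Run bundle install afterwards so the new gems are available to your specs.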

Starting out

You can generate the test (with a template) with

rails g integration_test test_name

The auto-generated test lives in spec/requests and contains a skeleton spec for you to fill in.

Writing Tests

Within the test, you can use Capybara to describe any user interactions on a page.

- You can access pages:

visit some_path
click_link "Log In"

- You can interact with forms:

fill_in "email", :with => "amy.wibowo@airbnb.com"
fill_in "password", :with => "secret"
choose('radio_button_name')
check('checkbox_name')
attach_file('Image', '/path/to/image.jpg')

The selectors can be an id, a label, or text; they're lenient.

click_button("submit")

- You can verify that the right thing happened:

page.should have_content("Hi, Amy!")

- And you can test the JavaScript on your page just by setting js to true:

it "lets the user login via modal and email", :js => true do

Running Tests

Afterwards, you can run the tests with rake spec:requests. They also run as part of bundle exec rake spec.

A Few Tips

- Start off by writing tests for your main flow, for the greatest ROI.
- To keep the tests from being too fragile, use text selectors that are as general as possible (e.g., if your button says “Sign up now!”, just check for “Sign up” in case the text changes in the future).

Further Reading

http://railscasts.com/episodes/257-request-specs-and-capybara
http://rubydoc.info/github/jnicklas/capybara/master/file/README.md

Stay tuned for a followup post on how we integrated Capybara tests with Jenkins, our continuous integration server!

Airbnb Tech Talk #2: Eric Tschetter on Druid


Join us for our second Tech Talk at Airbnb HQ in sunny San Francisco, CA.



Details:

  • Where: Airbnb Offices
  • When: Wednesday, March 14th
  • Time: 6:00pm - 8:00pm
  • Sign Up: http://bit.ly/xV3Clw

 

Tech Talk: Eric Tschetter on Druid - Distributed Exploration of High Dimensional Data


Druid is a distributed system in use at Metamarkets to facilitate rapid exploration of high dimensional data. They use Druid to expose impression monetization data to internet publishers along any arbitrary combination of demographic, content and sales-based dimensions. One Druid cluster currently exposes a data set of >15 billion rows of data representing >500 billion impressions in hypercubes of varying dimensionality (largest is 28 dimensions) while allowing for exploration using top lists and timeseries in sub-second latencies. The tech talk will be a discussion of the design considerations and architecture of the system.

http://metamarkets.com/2011/druid-part-i-real-time-analytics-at-a-billion-row...

About Eric Tschetter

Eric Tschetter is the lead architect of Druid, Metamarkets' distributed, in-memory database. He held senior engineering positions at Ning and LinkedIn before joining Metamarkets. At LinkedIn, Eric productized LinkedIn's PYMK with Hadoop. He holds bachelor's degrees in Computer Science and Japanese from the University of Texas at Austin, and an M.S. in Computer Science from the University of Tokyo.

 

 

How to Add Google Analytics Page Tracking to Your Backbone.js App


A few weeks ago we launched a new mobile version of Airbnb that was built with Backbone.js and CoffeeScript. Over the next few weeks we'll be posting about our experience and sharing a bunch of the code we wrote.

To kick things off, let's start short and simple and talk about page tracking.

Step 1. The Only Step

Open up your Router and add the following to your initialize function and add the _trackPageview function:

CoffeeScript not your cup of tea? Here's the JavaScript.

A More Thorough Explanation: 

Tracking your page views is a pretty basic requirement and is dead simple with Google Analytics (GA): make a free account, drop in their script, set some variables, and you are good to go. It's not quite that straightforward when you only load one HTML page and the rest via JavaScript, but it can be close.

Take a look at the gist above and let's break it down.

What's going on here? In your Router's initialize function we are binding to any routing event that is fired and calling the _trackPageview function. In Backbone, a route event is fired anytime there is a match on the routes you define. This allows you to execute code based on the current state of the URL.

For example, if you had a router like this:

This entry matches a URL with the form: http://m.airbnb.com/#listing/86456 and would call the listing function found within the router. This function also accepts id as a parameter so we can load the appropriate listing data.

Inside the _trackPageview function we are grabbing the 'current page' from the Backbone history manager; this will return the proper fragment regardless of whether you've opted in to pushState or not. GA provides a queue to push page tracking events via a function also named _trackPageview. This is normally called all by its lonesome on page load, which adds whatever the current URL is to analytics. You can push this call onto the analytics queue manually with an additional parameter to name the page you are tracking, provided it starts with a / character.

Putting it all together: all we have to do is grab that fragment, prepend a "/" and push it into the queue. GA takes care of the rest. Check out your GA dashboard. You should be seeing pages tracked in the form of /login or /listing/86456 or whatever awesome URLs your app uses.

Happy coding and more to come!

Introducing Zonify: Simple EC2 Inventories With DNS


Zonify is available on GitHub

The Backstory

Server inventory systems provide a way to manage and query servers en masse. As part of their function, systems like Puppet, Bcfg2 and Chef Server provide server inventories that group servers together and associate them with config data and basic state information. Those familiar with enterprise environments may think of Remedy, a purpose-built inventory system, or the humble LDAP machine inventories that are part of directory services products like OpenDirectory or Active Directory.

For platform automation, a server inventory provides a way to find servers by type or function, support for running commands en masse on a group of servers and a way to associate helpful mnemonics with servers as needed. In the past, I've used YAML files for describing small clusters, where each nickname, IP address and grouping was assigned by hand. A simple CLI tool maps names and groupings to resolvable addresses. To connect as the admin user to the utility0 server, you might use:

ssh admin@$(little-tool utility0)

The utility may even provide a way to run commands on multiple machines, get uptimes across a cluster, &c.

little-tool utility-servers --run cat /var/log/cron.log

Here, you find yourself writing a lot of code around job management, SSH option handling and general systems plumbing. The helpful name-based configuration offered by SSH is of no help to you, since the tool resolves the nicknames to IPs or some kind of generic, provider-assigned domain name. Using a job automation tool like GNU Parallel requires thoughtful filtering and reformatting of the generated names. Your browser, also, doesn't resolve nicknames; you end up deploying and maintaining DNS in addition to your tool's system of nicknames so that developers and staff can easily link to staging environment(s) and internal services, like issue tracking or reporting tools.

You may also find that deploying your little tool is not so simple: you wisely chose a high-level language and a variety of libraries when authoring it, but these all complicate deployment relative to an all-in-one shell script.

Why We Made Zonify

These considerations led us in the direction of DNS as a server inventory storage system. The little tool can be split into two halves: something to regenerate the DNS zone and push it into service (probably needs to be run in only one or two places) and a little shell wrapper around `dig' for querying server groups in an idiomatic way. Zonify, a Ruby library, provides the first part of the system: it translates EC2 instance metadata like tags, security groups and load balancer membership into DNS records, using SRV records to store groups of servers. Each EC2 instance gets a unique CNAME, and simple rewrite rules allow you to ensure that the "Name" tag is translated to a short name just above the root of your zone.
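To give a feel for the querying half, here is a rough sketch of a group lookup. It uses Ruby's standard resolv library instead of a shell wrapper around dig, and the record name in the usage note is hypothetical: the real names depend on how your zone is laid out, so check the Zonify Readme for the naming scheme.

```ruby
require "resolv"

# Return "host:port" strings for every SRV record stored under `name`.
# `resolver` is injectable so the lookup can be exercised without a network;
# by default it uses the system's DNS configuration.
def servers_in_group(name, resolver: Resolv::DNS.new)
  records = resolver.getresources(name, Resolv::DNS::Resource::IN::SRV)
  records.sort_by { |r| [r.priority, r.target.to_s] }
         .map { |r| "#{r.target}:#{r.port}" }
end

# Command-line use: ruby servers_in_group.rb <srv-record-name>
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  puts servers_in_group(ARGV[0])
end
```

Called with something like servers_in_group("utility-servers.example.com") (a made-up name), it would list each member of the group, ready to feed to ssh or GNU Parallel.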

The Details

Zonify creates a prototype zone from EC2 information and constructs a changeset against a zone stored in Route 53, adding and removing records as needed to bring a chosen domain suffix into conformance with what's in EC2. The zone information, from Route 53 as well as EC2, can be stored to YAML files; as can the changeset. Zonify is helpful for working with your Route 53 zone even without generating records from EC2. Although we haven't tried it, the YAML format could be easily translated to the zone file format for BIND, TinyDNS or another DNS server.

Check out the Zonify Readme for more information.

Making Life Easier for Everyone

While not a substitute for a true server inventory system, DNS records provide the basics: getting a listing of all your servers, finding servers in particular groups and working with nicknames. On your way to true automation and management -- and as a fallback -- these facilities help a lot. By dynamically generating the zone from EC2 instance metadata, Zonify makes it easy for everyone in your organization to contribute to maintaining your server inventory: use the built-in EC2 tagging facility and your record is there.

Have a look at Zonify on GitHub

Upgrading Airbnb from Rails 2.3 to Rails 3.0


One of the major nerd goals for Airbnb in 2011 was upgrading to Rails 3. Our production instances made the final switch in the week leading up to Thanksgiving, but it didn't happen all at once.

Breaking the Upgrade into 3 Steps

We added the required pieces throughout the past year, and, looking back, breaking the upgrade into three major steps was easier to manage than trying to cram it into a single deploy. The first two steps are independent from actually upgrading to the Rails 3 gem, and any Rails 2.3 app out there should do them even if there's no intention of upgrading to 3.0.

Step 1: Add Bundler

This was our first step into Rails 3 land, and we did it at the end of 2010. Our app was running Rails 2.3.8 at the time, and the production instances were a mess. Each instance had its own set of gems and its own versions of those gems. Naturally, none of the instances matched.

One of our engineers picked the latest version of each gem initialized by config.gem in production to create our first Gemfile. He installed Bundler using their Rails 2.3 guide and added any missing gems when trying to start the app. He used the recommended ~> notation for gem versions at first, but that upgraded our Paperclip gem to 2.3.3. That version had performance problems in our environment, so the engineer specified exact versions for our gems instead, the versions we had fully tested.
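To see why the ~> notation let a newer Paperclip slip in, you can ask RubyGems directly. The sketch below uses RubyGems' own requirement classes; the "tested" version number is hypothetical, chosen only to mirror the story above.

```ruby
require "rubygems"

# Hypothetical version numbers: 2.3.1 stands in for "the version we had tested".
tested   = Gem::Version.new("2.3.1")
upgraded = Gem::Version.new("2.3.3")

pessimistic = Gem::Requirement.new("~> 2.3.1") # allows >= 2.3.1 and < 2.4
exact       = Gem::Requirement.new("= 2.3.1")  # allows 2.3.1 only

# True when `version` is an acceptable match for `requirement`.
def allowed?(requirement, version)
  requirement.satisfied_by?(version)
end

puts allowed?(pessimistic, upgraded) # true:  ~> lets the patch release in
puts allowed?(exact, upgraded)       # false: the exact pin keeps it out
puts allowed?(exact, tested)         # true
```

In a Gemfile, the exact pin is just the version string on its own, for example gem "paperclip", "2.3.1" (again, a hypothetical version).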

He then ran each of our Rake tasks and added any gems those tasks needed to run. That was the last step before our Gemfile was ready to deploy in production.

Adding Bundler at the end of 2010 was a one-engineer task that took about a week working on-and-off. If Bundler had been added at the same time as the rest of the Rails 3 upgrade, almost a full year later, it would have been a much bigger ordeal.

After installing Bundler, Airbnb ran without any thoughts of Rails 3 for a while. Our production environment was far cleaner, each production instance was running the same set of gems, and our deploy process was simpler.

Step 2: Install the rails_xss plugin

Templates in Rails 3 auto-escape all strings, which is a major change from 2.3 and breaks a lot of assumptions. Auto-escaping HTML is a good thing, but many of our templates assumed Ruby strings would be interpreted as HTML.

Skip ahead from our Bundler install about 6 months to Spring 2011. To bring the Rails 3 behavior into our app, Tobi Knaup, one of our engineers, chose to install the rails_xss plugin. The plugin escapes strings by default, like Rails 3. He knew this was likely to break a ton, if not all, of our templates, so he set up a Rails XSS Hacksession to get every engineer's help in the installation.

Tobi listed every controller in our app at the time and divided them among the engineers. The engineers went through the templates used by each controller and made sure string output in templates was marked as html_safe or rendered with raw where appropriate.
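The idea behind escape-by-default can be sketched in a few lines of plain Ruby. This is only an illustration of the concept, not how Rails or rails_xss implement it: TrustedHtml and render_value are made-up names standing in for html_safe strings and template output.

```ruby
require "erb"

# Marker class standing in for a string that has been marked html_safe.
# Illustration only; Rails uses its own buffer class for this.
class TrustedHtml < String; end

# Escape everything unless the caller has explicitly marked it as trusted.
def render_value(value)
  value.is_a?(TrustedHtml) ? value.to_s : ERB::Util.html_escape(value)
end

greeting = "<b>Hi, Amy!</b>"
puts render_value(greeting)                  # &lt;b&gt;Hi, Amy!&lt;/b&gt;
puts render_value(TrustedHtml.new(greeting)) # <b>Hi, Amy!</b>
```

Every template that built markup in a plain string suddenly behaved like the first call, which is why each one had to be reviewed and marked safe only where the content really was trusted.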

Most of the XSS work was done in a single day with all of the engineers' help, but there was cleanup on pages that weren't used very often that continued for several weeks after installing the plugin.

With rails_xss installed, Airbnb was more secure and closer to Rails 3.

Step 3 (the big one): Add the Rails 3 Gem

This is the well-known upgrade step that includes updating the 'rails' gem to 3.x. Thankfully our Gemfile was already mature, and our templates expected most strings to be HTML-escaped when we got to this step around Thanksgiving 2011.

We knew the real upgrade required major code changes, so four engineers started the upgrade on a Saturday afternoon when there were minimal changes being committed. We asked other engineers to hold off on changes over the weekend to prevent any merge nightmares.

We kept the Rails upgrade branch separate with its own Gemset in case we needed a local working version of master before the upgrade was done. The most basic version of our workflow looked like this:

  1. Create a new rails3 branch
  2. Clone the rails3 branch into a separate directory from the usual working directory (instead of simply switching branches)
  3. Create a new gemset for the rails3 branch's directory
  4. Install the rails_upgrade plugin
  5. Follow the steps from the rails_upgrade README, skipping the step that generates a Gemfile since we were already using Bundler
  6. Try running rails console and rails server
  7. Fix any raised errors
  8. Return to step 6 if there were errors in step 7
  9. Done! Sort of...

Things that caused headaches during the upgrade:

  • The auto-generated routes file from the rails_upgrade plugin was wrong. The routes for our internal API (the API that powers our iPhone app) were busted. The problems weren't isolated to those routes, but those were the most notable. While we did use the auto-generated file as a starting point, we spent a lot of time tracing problems back to the broken routes file.
  • Timezones that worked in Rails 2.3 raised errors in 3.0. We didn't change the timezones we had stored, but various ones started raising errors. Benjamin Oakes's post about Rails Timezones explains the problem and how to start fixing it.
  • The lib directory is no longer auto-loaded. This was not called out prominently anywhere, but a quick search turns up several ways to address it. We added our lib directory to the autoload paths in application.rb:
    • config.autoload_paths += %W( #{Rails.root}/lib )

The app was running on Rails 3 by the end of the weekend, and specs were passing, but there were plenty of pages that hadn't been tested.

On Monday morning, all of the engineers took ownership of features they had recently worked on and thoroughly tested them, similar to how we approached the Rails XSS install. By the middle of the week we merged the rails3 branch into master and deployed it to our production instances.

What made the upgrade successful

If everything goes smoothly, the Rails 3 upgrade should be transparent to users. That can make the upgrade a hard sell to anyone outside engineering considering the effort that goes into it. Installing Bundler was a small project done by a single engineer, but installing the XSS plugin and doing the actual Rails 3 upgrade took considerable hours that had to be planned.

We coordinated with our product team for each of the big steps. They were planned between major product updates to make sure the codebase wasn't changing while we were upgrading.

CSS box-shadow Can Slow Down Scrolling


Working on one of the Chromebooks Google lets you borrow on Virgin America flights, I noticed scrolling down the page on my airbnb.com dashboard was much slower than on my normal laptop. I chalked it up to weak Chromebook hardware, but other sites were scrolling just fine. box-shadow had caused slow scrolling on our search results page before, so I did some investigation.

I used Chrome's Timeline tab to see the duration of paint events on the page. Before each test I forced a garbage collection and scrolled to the same window position using window.scroll(0, 140). Then I clicked the down arrow in the scroll bar twice, a 40px-scroll per click, and recorded the paint times.

10px box-shadow blur-radius

(original stylesheet value)
= 3 paint events per 40px scroll
Paint area size (px x px)    Paint event duration (ms)
1260 x 436                   122
1260 x 399                   115
1260 x 399                   109
1260 x 423                   123
1260 x 400                   117

To see if box-shadow was slowing down scrolling, I cut the blur-radius in half. The scrolling was far smoother, and the numbers showed why: paint events were taking half as long, which meant more paint events per time period.

5px box-shadow blur-radius

= 3-4 paints per 40px scroll
Paint area size (px x px)    Paint event duration (ms)
1260 x 399                   58
1260 x 423                   59
1260 x 412                   59
1260 x 399                   56
1260 x 407                   61
1260 x 418                   66

Since box-shadow was the obvious offender, I tried taking it out entirely.

0px box-shadow blur radius

= 2 extra paint events in the same amount of time, much smoother scrolling
Paint area size (px x px)    Paint event duration (ms)
1260 x 399                   28
1260 x 410                   29
1260 x 411                   31
1260 x 410                   31
1260 x 400                   32
1260 x 399                   28
1260 x 410                   27
1260 x 411                   40
1260 x 411                   49
1260 x 399                   46

And then I set it to something huge. The Chromebook did not like painting a 300px blur-radius. It took 2 full seconds of paint time per scroll arrow click!

300px box-shadow blur radius

= SO SLOW!

Paint area size (px x px)    Paint event duration (ms)
1260 x 418                   943
1260 x 418                   937
1260 x 399                   962
1260 x 437                   1000 (a full second!)
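For a quick side-by-side, the durations from the four tables above can be averaged with a few lines of Ruby. The numbers are copied straight from the measurements; nothing else is assumed.

```ruby
# Paint event durations (ms) from the tables above, keyed by blur-radius.
PAINT_MS = {
  "10px"  => [122, 115, 109, 123, 117],
  "5px"   => [58, 59, 59, 56, 61, 66],
  "0px"   => [28, 29, 31, 31, 32, 28, 27, 40, 49, 46],
  "300px" => [943, 937, 962, 1000]
}

# Mean of a list of durations, rounded to one decimal place.
def average(values)
  (values.sum / values.size.to_f).round(1)
end

PAINT_MS.each do |blur, durations|
  puts "#{blur}: #{average(durations)} ms average over #{durations.size} paints"
end
# 10px: 117.2 ms, 5px: 59.8 ms, 0px: 34.1 ms, 300px: 960.5 ms
```

Halving the blur-radius from 10px to 5px roughly halves the average paint time, which matches what the scrolling felt like.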

Final Product Changes

We dropped the blur-radius for the boxes on airbnb.com/dashboard to 3px and added a 3px offset to get a cleaner look that didn't tear up performance for devices with less processing power.


After (3px blur-radius, 3px offset)
box-shadow: 0 3px 3px 0 rgba(0,0,0,0.15);


Before (10px blur)
box-shadow: 0 0 10px 0 rgba(0,0,0,0.15);

Why is this important? I have like 3 Chromebook visitors.

Your Chromebook audience is probably pretty small, but Chrome is built on WebKit just like the iOS and Android browsers. If CSS is hurting performance on a Chromebook it's likely hurting it for mobile WebKit users visiting your full site too. 

In case you try this at home, this is the Chromebook I was using: Google Chrome 14.0.835.204, Platform 811.154

Monitoring Your Servers....With Fire


Creating a speedy website is a top priority here at Airbnb.  And with 30% month-over-month growth, it's sometimes a challenge!

A few months back, the team was out at karaoke and we noticed that the DJ had these really cool cloth flame lamps on stage.  When somebody hit that perfect note belting out Journey, he would flip on the flame lamps and it created this amazing moment on stage.

Something clicked as we thought back to the little server downtime notification in New Relic's RPM (our server monitoring tool, featured prominently on large dashboards in the office), and at that moment an idea was born: we needed to monitor our servers with fire.


So without further ado we present...

How To Monitor Your Servers With Flame Lamps

1. Get yourself a flame lamp

You can buy these at Amazon.  Actually any appliance with an outlet will do (lights, sounds, etc), so feel free to be creative (but not too annoying if you're going to have lots of downtime).

2. Find a USB Net Power 8800

This is a neat little device that allows you to control a power outlet from a USB port.  They are a bit tricky to find now, but we managed to scrounge one up on eBay.  We debated building our own with an Arduino, but this was a simpler (and safer) option.

3. Set up a Python script to control it

The USB Net Power 8800 comes with some Windows software that turns it on and off.  Unfortunately, we didn't have any spare Windows machines around, and we really wanted to hook up the flame lamp to the large monitors/dashboards we already had in the office, which run Linux.

Luckily this gentleman, Paul Marks, released an open source piece of Python code to do just that.

You'll need to install the PyUSB module as well.  But when you're done you should be able to control your nice new flame lamp (or whatever is plugged into the USB Net Power) from the command line with...

python usbnetpower8800.py on

and

python usbnetpower8800.py off

4. Connect your flame lamp to the New Relic API

Last step!  Here is a little Ruby script we put together which queries the New Relic API every 10 seconds and comes back with our site's response time (in milliseconds).  If it goes over 1000 milliseconds, the flame lamp turns on!
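The shape of such a script can be sketched as below. The New Relic request itself is deliberately left as a placeholder: fetch_response_time_ms is a hypothetical callable you would supply, since the exact API call depends on your account and API version. Only the 10 second interval, the 1000 ms threshold, and the usbnetpower8800.py command come from the steps above.

```ruby
THRESHOLD_MS  = 1000 # turn the lamp on above this response time
POLL_INTERVAL = 10   # seconds between checks

# Decide which argument to pass to usbnetpower8800.py.
def lamp_state(response_time_ms, threshold_ms = THRESHOLD_MS)
  response_time_ms > threshold_ms ? "on" : "off"
end

# Switch the outlet using the Python script from step 3.
def switch_lamp(state)
  system("python", "usbnetpower8800.py", state)
end

# Poll forever (or `iterations` times, which is handy for a dry run).
# `fetch_response_time_ms` is a placeholder: any callable that returns the
# current response time in milliseconds, e.g. a wrapper around an API call.
def monitor(fetch_response_time_ms, iterations: nil,
            sleeper: method(:sleep), switcher: method(:switch_lamp))
  count = 0
  until iterations && count >= iterations
    switcher.call(lamp_state(fetch_response_time_ms.call))
    sleeper.call(POLL_INTERVAL)
    count += 1
  end
end
```

A dry run that never touches the hardware could look like monitor(-> { 1200 }, iterations: 1, switcher: ->(state) { puts state }), which prints "on" once and then waits out one polling interval.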

Fire it up from the command line with:

ruby flame_lamp.rb

and you'll probably want to add something like this to your cron so it can survive a reboot:

# m h  dom mon dow   command
@reboot  cd /home/cadmin/FlameLamp; ruby flame_lamp.rb > flame_lamp.log 2>&1

Now just wait for some downtime, and bask in its glory!

 

Look at those spikes in response time!  Yikes.

Overall this was a fun project that helped rally the team around site reliability.  Could we have made it cooler?  Feel free to leave us a comment below.  

And if you're someone who enjoys working on these sorts of problems (and in an environment where you can build stuff like this), we'd love to chat more.

How to Make Easy & Flexible Star Ratings


Here's a quick tip explaining how to create some nice star ratings. 

Stars

Setting the Stage

The goal is to avoid having a sprite that looks like this:

Big

This method works, but has some drawbacks: it uses a lot of pixels, it's a nightmare to maintain/update, and it forces you to commit to a set rating size.

Let's do better.

Each star has two states: filled and empty. We'll start here.

Here's the sprite we'll be working with (super sized stars courtesy of Airbnb's marvelous designer Steph Tekano):

Sprite

Requirements

  • Should support any number of stars. We use five star ratings right now, but another module could potentially use ten stars or four stars later down the road.
  • Should support fractional stars. We use half stars right now, but again, this could change later on and it should be easy.
  • No JavaScript. These are static ratings. There's no interaction with them, so they should be rendered on page load.

Think Layers

There are two layers, the empty layer and the filled layer. The filled layer will sit on top of the empty layer.

Layers

We'll use repeating background images to create the number of stars we want for both the empty and filled layers. The empty layer will be a fixed width based on the number of stars we're rating. So for five stars, the width of the empty layer will be 5 x (the width of one star). 

The filled layer will have a variable width that is dependent on what the rating is. 

The magic happens in the styling.

Markup

Let's start with the markup.

Pretty straightforward.

"filled" and "empty" are pretty common class names. To be extra careful that you don't override any other selectors of the same name you can give them an additional class called "stars". 

SASS

Now let's do some styling.

We use Sass because it makes life easier. Some of the big benefits are nesting, variables, functions, importing, and mixins. It compiles everything to CSS. I recommend it because you can get things done faster. Less time styling is a good thing.

The first step is to create some variables to specify the width of a star & background offset (from the sprite), the number of stars, and the number of steps (I used 2 because I want half star ratings).

Then I created a mixin called "filled" which figures out the width of the filled div based on the size of the star, the number of steps, and the rating number $n (default is 0). The power of the mixin is that you don't have to do any math.

Both the filled and empty classes use /images/sprite.png as their background image. The empty class offsets the sprite vertically by the height of a star.
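The arithmetic behind the two layers is simple enough to check by hand. Here it is as a plain Ruby sketch; the 20px star width is a hypothetical value (use whatever one star measures in your sprite), and the formula is my reading of the description above rather than a copy of the mixin.

```ruby
STAR_WIDTH = 20 # px; hypothetical, use the width of one star in your sprite
STAR_COUNT = 5  # five star ratings
STEPS      = 2  # 2 steps per star gives half-star precision

# Width of the empty layer: one star width per star.
def empty_width(star_width: STAR_WIDTH, stars: STAR_COUNT)
  star_width * stars
end

# Width of the filled layer for rating step n.
# With STEPS = 2, n = 7 means 3.5 stars.
def filled_width(n = 0, star_width: STAR_WIDTH, steps: STEPS)
  star_width * n / steps.to_f
end

puts empty_width     # 100
puts filled_width(7) # 70.0
```

With five stars and two steps there are eleven possible widths (n from 0 to 10), and the top one matches the empty layer exactly.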

Here's what the generated CSS looks like:

The Missing Piece

You can see from stars.scss that I created classes specific to the star rating: filled_5, filled_6, filled_7 etc. Those numbers are the star rating stored as attributes on whatever the model is. All we need to do is output that rating into the markup.

Going Further

It would be a simple refactor to make a Rails helper method to output the markup with a call like <%= star_rating(@obj.rating) %> in a view. For more bonus points you could turn stars.scss into a partial called _stars.scss and then @import it into other scss files when needed in other modules. 
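A rough sketch of that helper in plain Ruby follows. The markup it produces is an assumption (a filled div nested inside an empty one, using the class names discussed above), so adjust it to match your own. In a Rails 3 view the returned string would also need to be marked html_safe, or built with content_tag, to avoid being escaped.

```ruby
# Sketch of a star_rating helper. The nesting and class names are assumptions
# based on the layers described above; `max` assumes 5 stars x 2 steps.
def star_rating(rating, max: 10)
  n = [[rating.to_i, 0].max, max].min # clamp to the classes the stylesheet defines
  %(<div class="stars empty"><div class="stars filled filled_#{n}"></div></div>)
end

puts star_rating(7)
# <div class="stars empty"><div class="stars filled filled_7"></div></div>
```

Clamping keeps a bad value in the database from producing a class name that has no matching rule in the stylesheet.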

 

Lots of folks handle this sort of thing in different ways. This happens to be the way I like to do it because it's flexible, tries not to repeat itself, and requires one small sprite. Hope it helps. Thanks for reading!


How we improved search performance by 2x


If you haven't used our moving map search recently you should check it out now, because we made it more than twice as fast! Actually, every search is faster now, but it's most noticeable in map mode. So how did we do that?

Let's start with some background on our setup. We're a Rails shop, and we're using Sphinx as our search engine. The two are connected through ThinkingSphinx, an excellent Ruby gem that provides an easy-to-use query interface and a DSL for defining indexes. The queries we run are a little bit different from what an average website does, because every single one filters results with spatial constraints (latitude/longitude). We also make heavy use of facets for the various filter options such as room type, neighborhood, or amenities.

Why was it slow?

Sphinx works great for most common use cases, but it's not optimized for spatial queries. While it gives you some basic functions to query and rank by distance, it doesn't perform any spatial indexing. The latitude and longitude fields are just floats, and spatial queries have to scan the whole index, which is of course not very performant or scalable. Also, it turns out that the configuration generated by ThinkingSphinx doesn't allow Sphinx to make use of multiple processor cores. Now while it sounds like this setup doesn't fit our requirements at all in terms of performance, Sphinx is very fast in general. Rewriting or switching to a different engine wasn't an option for us at the time so we wanted to make surgical changes to get the maximum out of it. We got help from Vlad and Rich at sphinxsearch.com, who are experts in tuning Sphinx.

How we optimized it

The first objective was to allow Sphinx to use all available processor cores. To achieve this, we split the search index into multiple parts and configured Sphinx to use them as a distributed index. Sphinx then uses one thread to search each partial index, and merges the results afterwards. Here is an example configuration snippet that makes use of two cores:

searchd
{
  …
  dist_threads = 2
  …
}

source hosting_core_0
{
  …
  sql_query = SELECT … FROM hostings WHERE id % 2 = 0
  …
}

source hosting_core_1 : hosting_core_0
{
  sql_query = SELECT … FROM hostings WHERE id % 2 = 1
}

index hosting_core_0
{
  source = hosting_core_0
  path = /home/sphinx/db/hosting_core_0
}

index hosting_core_1
{
  source = hosting_core_1
  path = /home/sphinx/db/hosting_core_1
}

index hosting_core
{
  type = distributed
  local = hosting_core_0
  local = hosting_core_1
}

What's important here is to set dist_threads to the number of processor cores, and to configure one partial index per core. It's easy to split your data into multiple indexes if you have an id column with auto_increment: simply use the mod operator in the source config blocks.

Another big performance boost came from upgrading Sphinx from 0.9.9 to 2.0. It's currently in "stable beta", which basically means that core features are production quality, whereas some newly added features might be less tested. The Sphinxsearch guys recommended it, and since we weren't using any of the cutting-edge features we felt confident using it in production.

The only downside to those changes is that we had to say goodbye to the ThinkingSphinx index configuration DSL, which doesn't support these advanced settings.
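Since the index definitions are now maintained by hand, a small generator can take some of the tedium out of the one-index-per-core pattern. The sketch below is not what we run in production; it just reproduces the layout of the snippet above for any number of cores, and it leaves the elided parts ("…") elided, so you still have to fill those in.

```ruby
# Build the searchd, source and index blocks for `cores` processor cores.
# `name` and `path` default to the values used in the snippet above.
def sharded_sphinx_config(cores, name: "hosting_core", path: "/home/sphinx/db")
  query = ->(i) { "sql_query = SELECT … FROM hostings WHERE id % #{cores} = #{i}" }

  sources = (0...cores).map do |i|
    if i.zero?
      "source #{name}_0\n{\n  …\n  #{query.call(0)}\n  …\n}"
    else
      # Later sources inherit from the first one and only override the query.
      "source #{name}_#{i} : #{name}_0\n{\n  #{query.call(i)}\n}"
    end
  end

  indexes = (0...cores).map do |i|
    "index #{name}_#{i}\n{\n  source = #{name}_#{i}\n  path = #{path}/#{name}_#{i}\n}"
  end

  locals      = (0...cores).map { |i| "  local = #{name}_#{i}" }.join("\n")
  distributed = "index #{name}\n{\n  type = distributed\n#{locals}\n}"
  searchd     = "searchd\n{\n  …\n  dist_threads = #{cores}\n  …\n}"

  [searchd, *sources, *indexes, distributed].join("\n\n")
end

puts sharded_sphinx_config(4)
```

For four cores this emits dist_threads = 4, four sources filtered with id % 4 = 0 through id % 4 = 3, four partial indexes, and one distributed index that lists all four as local.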


The future

There are a few ways to get even more performance out of Sphinx. It has its own query language, SphinxQL, which allows you to bundle queries and execute them together. This is really helpful for combining multiple facet queries. It would require major changes in our app and getting rid of ThinkingSphinx though, so we'll save that for a later date. Another way to get more parallelism and scalability is to split the index across multiple machines. This works similarly to same-machine distributed indexes and is easy to set up. Although Sphinx has been great for us so far, the lack of spatial indexing will become a problem at some point. We're currently exploring other architectures that provide this feature. Stay tuned.