Location Relevance… aka knowing where you want to go in places we’ve never been

by Maxim Charkov, Riley Newman & Jan Overgoor

Here at Airbnb, as you can probably imagine, we’re big fans of travel. We love thinking about the diversity of experiences our host community offers, and we spend a fair amount of time trying to make sense of the tens of thousands of cities where people are booking trips every night. If Apple has the iPad and iPhone, we have New York and Paris. And Kavajë, Außervillgraten, and Bli Bli. The tricky thing is, most of us haven’t been to Bli Bli. So we try to come up with creative ways to help people find the experience they’re looking for in places we know very little about.

The key to this is our search algorithm – a system that combines dozens of signals to surface the listings guests want. In the early days, our approach was pretty straightforward. Lacking data or personal experience to guide an estimate of what people would want, we returned what we considered to be the highest quality set of listings within a certain radius from the center of wherever someone searched (as determined by Google).
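
To make that early approach concrete, here is a minimal sketch of radius-based search. The Listing record, the quality score field, and the radius value are illustrative assumptions, not our actual implementation; distance is computed with a standard haversine formula.

    from dataclasses import dataclass
    from math import radians, sin, cos, asin, sqrt

    @dataclass
    class Listing:
        id: int
        lat: float
        lng: float
        quality_score: float  # e.g. derived from reviews, photos, host responsiveness

    def haversine_km(lat1, lng1, lat2, lng2):
        """Great-circle distance between two points, in kilometers."""
        lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
        a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
        return 2 * 6371.0 * asin(sqrt(a))

    def radius_search(listings, center_lat, center_lng, radius_km=15.0):
        """Early approach: keep listings within a fixed radius of the geocoded
        search center, then rank purely by quality score."""
        nearby = [l for l in listings
                  if haversine_km(l.lat, l.lng, center_lat, center_lng) <= radius_km]
        return sorted(nearby, key=lambda l: l.quality_score, reverse=True)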

SF heatmap of listings returned without location relevance model

This was a decent first step, and our community worked with it resiliently. However, for a company based in San Francisco, we didn’t have to look far to realize this wasn’t perfect. A general search for our city would return great listings, but they were scattered randomly around town, in a variety of neighborhoods, or even outside of town. This is a problem because the location of a listing is as significant to the experience of a trip as the quality of the listing itself. However, while the quality of a listing is fairly easy to measure, the relevance of the location depends on the user’s query. Searching for San Francisco doesn’t mean you want to stay anywhere in San Francisco, let alone the Bay Area more broadly. Therefore, a great listing in Berkeley shouldn’t come up as the first result for someone looking to stay in San Francisco. Conversely, if a user is specifically looking to stay in the East Bay, their search result page shouldn’t be overwhelmed by San Francisco listings, even if they are some of the highest quality ones in the Bay Area.

Exponential distance curve

So we set out to build a location relevance signal into our search model that would endeavor to return the best listings possible, confined to the location where a searcher wants to stay. One heuristic that seems reasonable on the surface is that listings closer to the center of the search area are more relevant to the query. Given that intuition, we introduced an exponential demotion function based upon the distance between the center of the search and the listing location, which we applied on top of the listing’s quality score.
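
A minimal sketch of this kind of exponential demotion is below; the decay rate is an illustrative constant, not a production value.

    from math import exp

    def exponential_demotion(distance_km, decay_rate=0.3):
        # Multiplier in (0, 1] that decays exponentially with distance
        # from the search center; decay_rate is illustrative.
        return exp(-decay_rate * distance_km)

    def ranking_score(quality_score, distance_km):
        # The demotion is applied on top of the listing's quality score.
        return quality_score * exponential_demotion(distance_km)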

SF heatmap with distance demotion

This got us past the issue of random locations, but the signal overemphasized centrality, returning listings predominantly in the city center as opposed to other neighborhoods where people might prefer to stay.

Sigmoid distance curve

To deal with this, we tried shifting from an exponential to a sigmoid demotion curve. This had the benefit of an inflection point, which we could use to tune the demotion function in a more flexible manner. In an A/B test, we found this to generate a positive lift, but it still wasn’t ideal – every city required individual tweaking to accommodate its size and layout. And the city center still benefited from distance demotion.

There are, of course, simple solutions to a problem like this. For example, we could expand the radius for search results and diminish the algorithm’s distance weight relative to the weights for other factors. But most locations aren’t symmetrical or axis-aligned, so by widening our radius a search for New York could – gasp – return listings in New Jersey. It quickly became clear that predetermining and hardcoding the perfect logic is too tricky when thinking about every city in the world all at once.

Listing density relative to distance from city center, select market comparison
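
For reference, a minimal sketch of the sigmoid demotion described above; the inflection point and steepness are exactly the per-city knobs that required individual tweaking, and the values here are purely illustrative.

    from math import exp

    def sigmoid_demotion(distance_km, inflection_km=8.0, steepness=0.8):
        # Multiplier that stays near 1 inside the inflection point and falls
        # toward 0 beyond it. Both parameters are illustrative and would need
        # tuning per city to match its size and layout.
        x = min(steepness * (distance_km - inflection_km), 60.0)  # cap to avoid overflow
        return 1.0 / (1.0 + exp(x))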

So we decided to let our community solve the problem for us. Using a rich dataset of guest and host interactions, we built a model that estimated a conditional probability of booking in a location, given where the person searched. A search for San Francisco would thus skew towards neighborhoods where people who also search for San Francisco typically wind up booking, for example the Mission District or Lower Haight.
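
A minimal sketch of how such a conditional probability could be estimated from historical (search query, booked area) pairs; the data layout and names are assumptions for illustration, not our actual pipeline.

    from collections import Counter, defaultdict

    def booking_probabilities(booking_events):
        # booking_events: iterable of (search_query, booked_area) pairs drawn
        # from past guest sessions that ended in a booking.
        # Returns {search_query: {area: P(booking in area | searched query)}}.
        counts = defaultdict(Counter)
        for query, area in booking_events:
            counts[query][area] += 1
        probs = {}
        for query, area_counts in counts.items():
            total = sum(area_counts.values())
            probs[query] = {area: n / total for area, n in area_counts.items()}
        return probs

For a query like “San Francisco”, the resulting distribution would put most of its mass on neighborhoods like the Mission District or Lower Haight, and that mass can then be blended into each listing’s ranking score.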

Choropleth of probability of booking given a general query for San Francisco

This solved the centrality problem and an A/B test again showed positive lift over the previous paradigm.

SF heatmap with location relevance signal

However, it didn’t take long to realize the biases we had introduced. We were pulling every search to where we had the most bookings, creating a gravitational force toward big cities. A search for a smaller location, such as the nearby surf town Pacifica, would return some listings in Pacifica and then many more in San Francisco. But the urban experience San Francisco offers doesn’t match the surf trip most Pacifica searchers are planning. To fix this, we tried normalizing by the number of listings in the search area. In the case of Pacifica, we now returned other small beach towns over SF. Victory!
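
One plausible reading of that normalization, sketched below: dividing the booking probability for an area by its listing supply turns “where do bookings land” into “how likely is an individual listing there to be booked”, so a large neighbor no longer wins simply by having more inventory. The function and its arguments are illustrative.

    def normalized_location_score(p_book_given_search, listing_count):
        # p_book_given_search: P(booking lands in this area | this search query)
        # listing_count: number of active listings in the area.
        # Dividing by supply keeps a big city from dominating results for a
        # nearby small town just because it has far more listings.
        return p_book_given_search / max(listing_count, 1)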

Change in location ranking score before and after normalization

At this point we were close to solving the problem, but something still didn’t feel right. In the earlier world of randomly scattered listings, there were a number of serendipitous bookings. The mushroom dome, for example, is a beloved listing for our community, but few people find it by searching for Aptos, CA. Instead, the vast majority of mushroom dome guests would discover it while searching for Santa Cruz. However, by tightening up our search results for Santa Cruz to be great listings in Santa Cruz, the mushroom dome vanished. Thus, we decided to layer in another conditional probability encoding the relationship between the city people booked in and the cities they searched to get there.

Searching for Santa Cruz

The relationship between the two conditional probabilities we used is displayed in the graph above. While all of the cities in the graph have a low booking likelihood relative to Santa Cruz itself, they are also mostly small markets, and we can give them some credit for depending on searches for Santa Cruz for their bookings. At the same time, places like San Jose and Monterey have no clear connection to Santa Cruz, so we can consider them as completely separate markets in search.

It’s important that improvements to the model do not lead to regressions in other parts of the world. In this case, little changed for our bigger markets like San Francisco. But this additional signal brings back the mushroom dome and other remote but iconic properties, facilitating the unique experiences our community is looking for.

The location relevance model that we built during this effort relies completely on data from our users’ behavior. We like this because it allows our community to dynamically inform future guests where they will have great experiences, and allows us to apply the model uniformly to all of the places around the world where our hosts are offering up places to stay.
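
For concreteness, here is a rough sketch of how the two conditional probabilities could be combined into a single location score; the blend, its weight, and the function names are illustrative, not the production formula.

    def location_relevance(p_book_given_search, p_search_given_book, alpha=0.5):
        # p_book_given_search: P(booking lands in this area | the user's query),
        #   normalized by listing supply as described above.
        # p_search_given_book: P(the user searched this query | booking in this area),
        #   which credits small markets (like Aptos) that depend on searches for a
        #   larger neighbor (like Santa Cruz) for most of their bookings.
        # alpha is an illustrative mixing weight.
        return alpha * p_book_given_search + (1 - alpha) * p_search_given_book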

* * *

Huge thanks to Stamen Design and OpenStreetMap.org for sharing their map tiles and data, respectively.


Comments

  1. Richard

    Fascinating post! Thanks for writing it up.

  2. Joe Ward

    Quick thought. What if your users agreed to let you track their GEO IP during the duration of their stay?

    If so, you could build in a data model that included the efficiency of their stay, i.e. the common places they visit, travel time, etc.

    This could result in a “stay efficiency” score. When people search for center location x, their stay at locations L1, L2, L3, …, Ln tends to increase/decrease travel distance/time to common locations.

    All else being equal, you could determine the locations that are most efficient in terms of overall travel time to most common local destinations.

  3. Mike Halvorsen

    I am blown away by how in depth your search algorithm is. My guess is most of the other lodging sites do not have anything nearly as complex as the work you have done.

  4. Chris Collins

    “…it allows our community to dynamically inform future guests where they will have great experiences.”

    Great solution & nice writeup.

  5. Chuck Reynolds

    I love the data dive here and the process. Thanks for sharing. I expect more like this from you guys ;)

  6. Shishir

    A great read, thanks for such a detailed analysis!

  7. Filip

    What an in-depth post, thanks for sharing! I’m stunned by the complexity of your algorithms.

  8. Luke M

    Love the openness and clarity of this post. Great work!

  9. Christina Petersen

    I love how Airbnb strives for nothing but the best! Even after hard work on multiple algorithms, you guys still found ways to refine the search further! I love the way you guys think of things we never would, and will never have to!

  10. Joe Murphy

    Minor note: It’s spelled choropleth, not chloropleth.

  11. Ahmad Zaenudin

    This algorithm seems sophisticated, but in reality it is creating a mess. As an Airbnb host myself, the new system unfavorably moved my listing, which had been on the first page for the last four months, to the eleventh, and from an average of 20-30 views per day to 2-3 views per day, sometimes not even a soul. The system does not solve the problem it is intended to. Here are the facts:

    Some hosts list their place not under their true suburb but under the main city. For example, hosts in the Perth metropolitan area just put the city of Perth, not the name of the suburb. The new algorithm favourably places those hosts on the first page even though they are not actually in the Perth CBD. For an honest host like me who put my real suburb…well, God knows where my listing goes…
    Guests will search for the main city – not the suburb. A guest going to Perth, Western Australia certainly won’t search for a quiet suburb like City Beach. But they will choose City Beach, because it appears in the search results on the first page, because the performance of the host is excellent and the place is not far from the CBD.
    Airbnb is not flexible enough to allow the guest to search an area based on the radius of the map. If a guest wants to stay in a quiet place that is not far from the city, there are no tools to do it. Radius extension in the search area is still very important.
    My God, ranking a place based on performance is still necessary. Please allow guests to choose places which have good performance, good ratings, etc. The new algorithm lets a horrible place stay on the first pages, leaving behind much better performing places – just because they are in the CBD!

    I love Airbnb…I hope this give you ideas for a better system…for both guests and hosts

    Regards,
    Ahmad

    • STefano

      Completely agree with Ahmad. Sometimes the KISS rule gives much better results than theoretically fantastic, Star Trek super-algorithms.
      This is one of those cases.

    • Malibu Yurt

      I, too, am really upset with this new algorithm.

    • Jen Kel

      Me too. It’s like wow you have done really good. Let’s stuff you in the back to highlight some totally unproven fly-by-night hosts.

  12. Stephan

    Dear Riley, Jan and Maxim

    Thank you for deleting the comment I posted the day before yesterday. You will know the reasons why you won’t publish it; I myself have no idea. Well, please check our rating, which is absolutely unfair. We already did a lot for Airbnb and got rated in the middle of nowhere due to the algorithm you explained (thank you for this explanation, very helpful for understanding). You overvalue the distance to the center of the search; rural hosts like us will die out more and more.

  13. Malibu Yurt

    Thank you for explaining this and for your effort to make the AirBnB experience the best for users and hosts. However, I think this new algorithm has serious flaws. Our listing, with 5 stars and a 97% response rate, has not only disappeared from the top of the page… it has totally disappeared from search. In fact, when I did a search for ‘Malibu’, where we are located, there was not even one pink location on the map next to Malibu – ALL of them were in other cities that were 40-90 minutes away from Malibu! Another more specific search for ‘California’ and ‘Yurts’ also upset me – of the 11 yurts in California, we were listed last even though we were third in number of host reviews. There was even a yurt listed above us with ZERO reviews even though it had been open for 19 months, and another with only one review in spite of being open for over a year.

    As a host, I’m obviously upset that I’ve worked really hard to have a high response rate, get great reviews, review everyone, spread AirBnb links and more… yet, now all that work gains me nothing in terms of ranking.

    And, as an AirBnB guest, I find the search engine now totally frustrating… I don’t have time to troll through pages and pages of irrelevant listings. I want a search that will help me find the best and most convenient place.

    I fear that unless this is corrected you will find that many hosts and guests will migrate to other competing sites.

  14. Anonymous

    Great work. For your next update, I’d like to point out one issue I’ve experienced with location relevance search. When someone searches for a small town, your algorithm favors large cities nearby over small towns that are much closer to the original town being searched. This is obviously not ideal for guests who want to stay closer to their desired town.

    After experimenting with it, I think I may have an idea of what causes this undesired result. When someone searches for a town, you present a map that is not centered around that town. Instead, you show a map that is centered much closer to the big city nearby. Since the map itself acts as a filter by default, all the small towns that are actually closer to the desired town but happen to fall outside the map you present end up being left out of the search results. If you centered the map around the desired town, you would be showing listings in neighboring towns that are closer to the original search target, rather than favoring the big city. In my tests your search excluded listings that were only 12 miles away but showed listings that were in the big city over 42 miles away (!). Could you look into this issue?