Airbnb Engineering (nerds.airbnb.com)

Growth at Scale: Getting to product + sharing fit (February 1, 2016)

Travelers like to talk about Airbnb.

Some of this conversation occurs outside of Airbnb’s purview (like an in-person conversation or a Snap of an Airbnb listing), but other times it happens on airbnb.com, the iOS or Android apps, or on a site linking to Airbnb. In the latter case, we (Airbnb) have an opportunity to influence how the conversation transpires.

In this post, I will share a few features and optimizations that the Growth Team launched in 2015 to aid travelers in their conversation about Airbnb.

Airbnb Referrals via Twitter

The first code I shipped at Airbnb was adding a Twitter Card for the referral page. Being a Twitter nerd, I knew how nice it was to share links that are backed by Twitter Cards in order to give viewers more context (and hopefully increase the click-through rate). This is the current experience of sharing an Airbnb referral code on Twitter (via https://www.airbnb.com/invite):


The highlighted text gives the Twitter user a default tweet, which they can modify. I like how ours has the dollar amount and a short message.

[Screenshot: the Twitter Card shows a title, image, and description for the shared link.]

By adding the above Twitter Card, we reassured viewers about the referral link’s authenticity and surrounded the tweet with details which may have been left out of the user-customized message.
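
For reference, a "summary" Twitter Card boils down to a handful of meta tags in the page's head. The sketch below is illustrative only (the copy, credit amount, and image URL are placeholders, not Airbnb's actual values); it simply renders the relevant tags from Python:

# Minimal "summary" Twitter Card; every value below is a placeholder.
card_fields = [
    ("twitter:card", "summary"),
    ("twitter:site", "@Airbnb"),
    ("twitter:title", "Your friend gave you $25 off your first trip"),        # hypothetical copy
    ("twitter:description", "Sign up with this invite link to get travel credit."),
    ("twitter:image", "https://www.airbnb.com/referral-card.png"),            # hypothetical URL
]

meta_tags = "\n".join(
    '<meta name="{0}" content="{1}">'.format(name, content)
    for name, content in card_fields
)
print(meta_tags)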

Airbnb Listings via Email and Facebook

[Screenshot: the first iteration of our new listing share widget. See the current version here: https://www.airbnb.com/rooms/5781222]

Earlier this year, we looked at how people were sharing Airbnb listings and realized that most travelers preferred to share via Email and Facebook. So we ran some experiments and ended up altering our sharing widget to focus on Email and Facebook and put everything else in a “More” dropdown. This drove a significant increase in sharing since we molded our product to match people’s intended conversation format.

Airbnb via Facebook Messenger

[Screenshot: the US version of the desktop sharing module for Airbnb listings.]

One of my personal favorite changes we made in 2015 was adding Facebook Messenger as a prioritized sharing option in front of Facebook timeline sharing. Our experiment results led us to further tailor our sharing to match a private trip collaboration mindset instead of a mass share to all friends and followers.

Airbnb Listing Photos


A brand new type of sharing introduced in 2015 was the ability to share individual photos of listings. We based our work on a hunch that people might want to share different photos than just the first host-picked photo for a listing. After running a successful experiment, we made each photo shareable through all of our channels so now you can email, message, pin, or even embed any listing photo!

Sharing Airbnb in 2016

On the Growth Team we are always thinking about new ways to help people share the Airbnb experience. Each time we successfully add more metadata to a social share, tune the UI, or craft delightful contextual copy, we enhance digital conversations about Airbnb. Stay tuned for more features and updates in the coming year!

Airbnb Product and Engineering Teams Now Landing in Portland (January 15, 2016)

Airbnb is on a mission to help people belong anywhere in the world. To reach this goal, we are building an exceptional product for our guests and hosts, with a fast-growing team of some of the smartest people in the industry. That’s why today, we are excited to announce that we are expanding our presence to include a product and engineering team at our award-winning office in Portland, Oregon.

[Photos: the Airbnb Portland office]

This is the first time we’ve had engineers, product managers, designers, usability researchers, and data scientists outside of San Francisco, and we wanted to take a really thoughtful approach to expanding our team. For the last few months, we’ve had an engineering landing team that has laid the groundwork for some of the cool projects the team is going to work on this year, as well as scoped out the best places to get caffeinated. Additionally, we’ve gotten to know a lot of folks in Portland through hosting the HackPDX Winter Hackathon in December, and we are excited to announce that in 2016 we will be sponsoring Django Girls Portland. Our new product team in Portland is going to build a world-class platform and set of tools for our global customer experience organization. To learn more about some of the projects the team has tackled so far, check out this blog post from Emre, one of the engineers on the team.

If you are an engineer, designer, data scientist, or product manager and you’re as excited about this as we are, join us.

Mobile Infrastructure (December 14, 2015)

Building a Continuous testing environment for Android Performance

Performance improvements in your application shouldn’t be put off until the last minute! But sadly, performance profiling & tooling is still pretty manual and archaic. In this talk, Colt McAnlis will walk through how to build an automated perf-testing environment for your code. Then, you can run tests daily, weekly, or when your co-worker checks in their code (you know the one... they like to use ENUMs everywhere...).

Colt McAnlis is a Developer Advocate at Google focusing on Performance & Compression. Before that, he was a graphics programmer in the games industry, working at Blizzard, Microsoft (Ensemble), and Petroglyph. He’s been an adjunct professor at SMU Guildhall, a Udacity instructor (twice), and a book author. When he’s not working with developers, Colt spends his time preparing for an invasion of giant ants from outer space.

You can follow him on G+, Twitter, GitHub, LinkedIn, or his blog.

Deep Link Dispatch on Android

Deep links are great opportunities to engage users by linking them to deeper content within an application. While Android provides mechanisms to handle deep links, they aren’t perfect and leave room to improve how deep links are managed. We’ll present our tool, DeepLinkDispatch, and demonstrate how Airbnb handles deep links in a structured and convenient way.

Christian Deonier is a member of the Airbnb Android team focused on foundation for features and architecture of the application. He is also the co-creator of DeepLinkDispatch, an Android library for deep links. Before joining Airbnb, he focused on making mobile applications on both iOS and Android, and also worked at Oracle. In his free time, he races cars and, true to Airbnb, travels frequently.

Felipe Lima is a Brazilian Software Engineer at Airbnb working on the Android team, focused on its infrastructure, developer productivity and Open Source tools. Before joining Airbnb, Felipe worked at We Heart It.

Localization on iOS

Airbnb is an international company that aims to bring a local experience to all of our users. Presenting information in a way that is appropriate for the user’s locale is crucial. In this talk, we will share some of the engineering challenges involved in delivering truly local experiences on iOS, and how we’ve solved them here at Airbnb.

Youssef Francis is a Software Engineer on the Airbnb iOS team, focused on improving the search, discovery and booking experience on mobile. Before joining Airbnb, he founded a small startup dedicated to building usability enhancements for jailbroken iOS devices, and has been a member of the iOS jailbreak community since 2007. He likes to play board games and solve puzzles.

How well does NPS predict rebooking? (December 10, 2015)

Data scientists at Airbnb collect and use data to optimize products, identify problem areas, and inform business decisions. For most guests, however, the defining moments of the “Airbnb experience” happen in the real world – when they are traveling to their listing, being greeted by their host, settling into the listing, and exploring the destination. These are the moments that make or break the Airbnb experience, no matter how great we make our website. The purpose of this post is to show how we can use data to understand the quality of the trip experience, and in particular how the ‘Net promoter score’ adds value.

Currently, the best information we can gather about the offline experience is from the review that guests complete on Airbnb.com after their trip ends. The review, which is optional, asks for textual feedback and rating scores from 1-5 for the overall experience as well as subcategories: Accuracy, Cleanliness, Checkin, Communication, Location, and Value. Starting at the end of 2013, we added one more question to our review form, the NPS question.


NPS, or the “Net Promoter Score”, is a widely used customer loyalty metric introduced by Fred Reichheld in 2003 (https://hbr.org/2003/12/the-one-number-you-need-to-grow/ar/1). We ask guests “How likely are you to recommend Airbnb to a friend?” – a question called “likelihood to recommend” or LTR. Guests who respond with a 9 or 10 are labeled as “promoters”, or loyal enthusiasts, while guests who respond with a score of 0 to 6 are “detractors”, or unhappy customers. Those who leave a 7 or 8 are considered to be “passives”. Our company’s NPS (Net Promoter Score) is then calculated by subtracting the percent of “detractors” from the percent of “promoters”, and is a number that ranges from -100 (worst case scenario: all responses are detractors) to +100 (best case scenario: all responses are promoters).
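
To make the bucketing concrete, here is a minimal sketch of the calculation from a list of 0-10 LTR responses (the sample scores are made up):

def nps(ltr_scores):
    # Net Promoter Score: % promoters (9-10) minus % detractors (0-6);
    # passives (7-8) count toward the total but toward neither bucket.
    n = len(ltr_scores)
    promoters = sum(1 for s in ltr_scores if s >= 9)
    detractors = sum(1 for s in ltr_scores if s <= 6)
    return 100.0 * (promoters - detractors) / n

print(nps([10, 10, 9, 8, 7, 10, 6, 3, 9, 10]))  # 60% promoters - 20% detractors = 40.0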

By measuring customer loyalty as opposed to satisfaction with a single stay, NPS surveys aim to be a more effective methodology to determine the likelihood that the customer will return to book again, spread the word to their friends, and resist market pressure to defect to a competitor. In this blog post, we look to our data to find out if this is actually the case. We find that higher NPS does in general correspond to more referrals and rebookings. But we find that controlling for other factors, it does not significantly improve our ability to predict if a guest will book on Airbnb again in the next year. Therefore, the business impact of increasing NPS scores may be less than what we would estimate from a naive analysis.

Methodology

We will refer to a single person’s response to the NPS question as their LTR (likelihood to recommend) score. While NPS ranges from -100 to +100, LTR is an integer that ranges from 0 to 10. In this study, we look at all guests with trips that ended between January 15, 2014 and April 1, 2014. If a guest took more than one trip within that time frame, only the first trip is considered. We then try to predict if the guest will make another booking with Airbnb, up to one year after the end of the first trip.

One thing to note is that leaving a review after a trip is optional, as are the various components of the review itself. A small fraction of guests do not leave a review or leave a review but choose not to respond to the NPS question. While NPS is typically calculated only from responders, in this analysis we include non-responders by factoring in both guests who do not leave a review as well as those who leave a review but choose not to answer the NPS question.

To assess the predictive power of LTR, we control for other parameters that are correlated with rebooking. These include:

  • Overall review score and responses to review subcategories. All review categories are on a scale of 1-5.
  • Guest acquisition channel (e.g. organic or through marketing campaigns)
  • Trip destination (e.g. America, Europe, Asia, etc)
  • Origin of guest
  • Previous bookings by the guest on Airbnb
  • Trip Length
  • Number of guests
  • Price per night
  • Month of checkout (to account for seasonality)
  • Room type (entire home, private room, shared room)
  • Number of other listings the host owns

We acknowledge that our approach may have the following shortcomings:

  • There may be other forms of loyalty not captured by rebooking. While we do look at referrals submitted through our company’s referral program, customer loyalty can also be manifested through word-of-mouth referrals that are not captured in this study.
  • There may be a longer time horizon for some guests to rebook. We look one year out, but some guests may travel less frequently and would rebook in two to three years.
  • One guest’s LTR may not be a direct substitute for the aggregate NPS. It is possible that even if we cannot accurately predict one customer’s likelihood to rebook based on their LTR, we would fare better if we used NPS to predict an entire cohort’s likelihood to rebook.

Despite these shortcomings, we hope that this study will provide a data informed way to think about the value NPS brings to our understanding of the offline experience.

Descriptive Stats of the Data

Our data covers more than 600,000 guests. Of those who submitted a review, two-thirds were NPS promoters, and more than half gave an LTR of 10. Of the 600,000 guests in our data set, only 2% were detractors.


While the overall review score for a trip is aimed at assessing the quality of the trip, the NPS question serves to gauge customer loyalty. We look at how correlated these two variables are by examining the distributions of LTR scores broken down by overall review score. Although LTR and the overall review rating are correlated, they do carry somewhat different information. For example, of the small number of guests who had a disappointing experience and left a 1-star review, 26% were actually promoters of Airbnb, indicating that they were still very positive about the company.


Keeping in mind that a very small fraction of our travelers are NPS detractors and that LTR is heavily correlated to the overall review score, we investigate how LTR correlates to rebooking rates and referral rates.

We count a guest as a referrer if they referred at least one friend via our referral system in the 12 months after trip end. We see that out of guests who responded to the NPS question, higher LTR corresponds to a higher rebook rate and a higher referral rate.


Without controlling for other variables, someone with an LTR of 10 is 13% more likely to rebook and 4% more likely to submit a referral in the next 12 months than someone who is a detractor (0-6). Interestingly, we note that the increase in rebooking rates for responders is nearly linear in LTR (we did not have enough data to differentiate between people who gave responses between 0-6). These results imply that for Airbnb, collapsing people who respond with a 9 versus a 10 into one “promoter” bucket results in a loss of information. We also note that guests who did not leave a review behave the same as detractors. In fact, they are slightly less likely to rebook and submit a referral than guests with an LTR of 0-6. However, guests who submitted a review but did not answer the NPS question (labeled as “no_nps”) behave similarly to promoters. These results indicate that when measuring NPS, it is important to keep track of the response rate as well.

Next, we look at how other factors might influence rebooking rates. For instance, we find just from our 10 weeks of data that rebooking rates are seasonal. This is likely because off-season travelers tend to be loyal customers and frequent travelers.


We see that guests who had shorter trips are more likely to rebook. This could be because some guests use Airbnb mostly for longer stays, and they just aren’t as likely to take another long trip in the next year.


We also see that the rebooking rate has a roughly parabolic relationship to the price per night of the listing. Guests who stayed in very expensive listings are less likely to rebook, but guests who stayed in very cheap listings are also unlikely to rebook.


Which review categories are most predictive of rebooking?

In addition to the Overall star rating and the LTR score, guests can choose to respond to the following subcategories in their review, all of which are on a 1-5 scale:

  • Accuracy
  • Cleanliness
  • Checkin
  • Communication
  • Location
  • Value

In this section we will investigate the power of review ratings to predict whether or not a guest will take another trip on Airbnb in the 12 months after trip end. We will also study which subcategories are most predictive of rebooking.

To do this, we compare a series of nested logistic regression models. We start off with a base model, whose explanatory variables include only the non-review characteristics of the trip that we mentioned in the section above:

f0 = 'rebooked ~ dim_user_acq_channel + n_guests + nights + I(price_per_night*10) + I((price_per_night*10)^2) + guest_region + host_region + room_type + n_host_listings + first_time_guest + checkout_month'

Then, we build a series of models adding one of the review categories to this base model:

f1 = f0 + communication
f2 = f0 + cleanliness
f3 = f0 + checkin
f4 = f0 + accuracy
f5 = f0 + value
f6 = f0 + location
f7 = f0 + overall_score
f8 = f0 + ltr_score

We compare the quality of each of the models `f1` to `f8` against that of the nested model `f0` by comparing the Akaike information criterion (AIC) of the fits. AIC trades off between the goodness of the fit of the model and the number of parameters, thus discouraging overfitting.
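
As a sketch of how such a nested comparison might be run (statsmodels here is a stand-in for whatever tooling was actually used, trips.csv is a hypothetical extract with one row per guest trip, and the squared-price term is rewritten in patsy syntax):

import pandas as pd
import statsmodels.formula.api as smf

trips = pd.read_csv("trips.csv")  # hypothetical extract: one row per guest trip

f0 = ("rebooked ~ dim_user_acq_channel + n_guests + nights"
      " + price_per_night + I(price_per_night ** 2)"
      " + guest_region + host_region + room_type"
      " + n_host_listings + first_time_guest + checkout_month")

base = smf.logit(f0, data=trips).fit(disp=0)

# Add one review category at a time and compare AIC against the base model f0
for col in ["communication", "cleanliness", "checkin", "accuracy",
            "value", "location", "overall_score", "ltr_score"]:
    fit = smf.logit(f0 + " + " + col, data=trips).fit(disp=0)
    print(col, base.aic - fit.aic)  # larger AIC drop = bigger improvement over f0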


If we were just to include one review category, LTR and overall score are pretty much tied for first place. Adding any one of the subcategories also improves the model, but not as much as if we were to include the overall score or LTR.

Next, we adjust our base model to include LTR and repeat the process to see which review category we should add second.


Given LTR, the next category that improves our model the most is the overall review score. Adding a second review category to the model only marginally improves the fit (note the difference in scale between the two graphs).

We repeat this process, incrementally adding review categories to the model until the additions are no longer statistically significant. We are left with the following set of review categories:

  • LTR
  • Overall score
  • Any three of the six subcategories

These findings show that because the review categories are strongly correlated with one another, once we have the LTR and the overall score, we only need three of the six subcategories to optimize our model. Adding more subcategories will add more degrees of freedom without significantly improving the predictive accuracy of the model.

Finally we tested the predictive accuracies of our models:

  Categories                                    Accuracy
  LTR only                                      55.997%
  Trip info only                                63.495%
  Trip info + LTR                               63.58%
  Trip info + other review categories           63.593%
  Trip info + LTR + other review categories     63.595%

Using only a guest’s LTR at the end of a trip, we can accurately predict whether they will rebook in the next 12 months 56% of the time. Given just basic information we know about the guest, host, and trip, we improve this predictive accuracy to 63.5%. Adding the review categories (not including LTR) yields an additional 0.1% improvement. Given all this, adding LTR to the model improves the predictive accuracy by only another 0.002%.

Conclusions

Post-trip reviews (including LTR) only marginally improve our ability to predict whether or not a guest rebooks in the 12 months after checkout. Controlling for trip and guest characteristics, review star ratings improve our predictive accuracy by only ~0.1%. Out of all the review categories, LTR is the most useful in predicting rebooking, but it adds only a 0.002% increase in predictive accuracy once we control for the other review categories. This is because LTR and review scores are highly correlated.

Reviews serve purposes other than predicting rebooking. They enable trust in the platform, help hosts build their reputation, and can also be used for host quality enforcement. We found that guests with higher LTR are more likely to refer someone through our referral program. They could also be more likely to refer through word of mouth. Detractors could actually deter potential users from joining the platform. These additional ways in which NPS could be connected to business performance are not explored here. But given the extremely low number of detractors and passives, and the marginal power post-trip LTR has in predicting rebooking, we should be cautious about putting excessive weight on guest NPS.

How Technology and Engineers Can Impact Social Change (November 18, 2015)

At the OpenAir 2015 conference, the panel discussion “How Tech Can Reach Underserved Communities” explored how technology can create positive social change—and how engineers in particular can make a difference.

The panelists:

Alanna Scott, engineer, Airbnb
Grace Garey, co-founder, Watsi
Raquel Romano, software engineer, Google.org
Moderator: Mario Lugay, impact advisor, Kapor Center for Social Impact

Some highlights (edited for brevity or clarity):

Examples of projects to reach underserved communities (3:11 in the video)

Romano said she had been working with a Google.org group focused on crisis response and reaching people before, during, and after a natural disaster. For example, the team developed data feeds that would provide warnings about impending local floods or hurricanes in relevant search results for Google users.

Scott said Airbnb started a Disaster Response Tool three years ago in the wake of Hurricane Sandy. “We were inspired by a host (in the area where the storm hit) who started opening up her home to people who had been displaced. We wanted to build something to support what she was doing and enable the rest of our host community to participate as well.”

Scott: “The Disaster Response Tool was built as a side project. But now we can activate the tool within minutes for a specific location or area that has been hit by a natural disaster. Hosts can list their space for free, we waive all of our fees, and we create a way for displaced people in that area to find a place to stay.”

Garey: “Watsi is entirely a social impact organization. We let people directly fund healthcare for people all around the world, and 100 percent of donations go to the patient. Technology seemed to be the answer we needed to focus on. We saw people using technology like Airbnb to bust open narrow channels to allow person-to-person interaction and create new ways to solve a problem. So we decided to do the same thing to tackle healthcare in a new way.”

How technology can make a difference (16:13)

Scott: “In the case of a natural disaster, people don’t always have reliable Internet access, or they might not have much battery left on their phone. So we’ve been thinking about how those people can use Airbnb when they are facing technical limitations.”

Helping people in disaster-hit areas may require “using old technology” rather than the latest tech, Scott continued. For example, SMS messaging often continues to work after a disaster when phone calls, email and online access can be difficult, so Airbnb has been exploring ways for users to book or accept reservations through SMS.

Romano: “We’re working on an initiative at Google.org to see how technology can help people with disabilities live more independently. What if we could recognize and translate sign language? What if we could analyze content in video and provide natural language descriptions of it?” Another area of investigation is mobility, in which “eye trackers connect to a communication device, so you can communicate with the world by typing with your eyes.”

How engineers can contribute to social change (20:46)

Scott: “We have a woman user in Florence who donates 50 percent of her Airbnb earnings to a community art project. Another user donates 10 percent of his earnings, and he and his guest decide together which local organization to contribute to. So my advice is to look at how your users are already helping other people with your product, then figure out how to scale it and open it up to your whole community.”

Romano recommended marrying your passion for technology with social issues you care about, because the two are “an amazing combination.” Find others with shared passions by asking around. “Talk to people about what they’re working on and tell them what you’re interested in.”

Romano added that “it’s really hard when you’re trying to prioritize and focus to create space and resources to work on (social impact projects). What works is when people just start doing things (for social impact) without asking for permission. You get other passionate people together and come up with a proof of concept and you can start seeing how it could be better if you had a product manager, user experience person, and multiple engineers working on it.”

Garey added that in 10 to 15 years, the areas of engineering and social change will blur. “So don’t feel like you have to make a choice between working at a company with a product that’s creating value and making a lot of money vs. doing something that’s good for the world. You can do well and do good at the same time.”

Confidence Splitting Criterions Can Improve Precision And Recall in Random Forest Classifiers (October 20, 2015)

The Trust and Safety Team maintains a number of models for predicting and detecting fraudulent online and offline behaviour. A common challenge we face is attaining high confidence in the identification of fraudulent actions, both in terms of classifying a fraudulent action as fraudulent (recall) and not classifying a good action as fraudulent (precision).

A classification model we often use is a Random Forest Classifier (RFC). However, by adjusting the logic of this algorithm slightly, so that we look for high confidence regions of classification, we can significantly improve the recall and precision of the classifier’s predictions. To do this we introduce a new splitting criterion (explained below) and show experimentally that it can enable more accurate fraud detection.

Traditional Node Splitting Criterions

A RFC is a collection of randomly grown ‘Decision Trees’. A decision tree is a method for partitioning a multi-dimensional space into regions of similar behaviour. In the context of fraud detection, identifying events as ‘0’ for non-fraud and ‘1’ for fraud, a decision tree is binary and tries to find regions in the signal space that are mainly 0s or mainly 1s. Then, when we see a new event, we can look at which region it belongs to and decide if it is a 0s region or a 1s region.

Typically, a Decision Tree is grown by starting with the whole space, and iteratively dividing it into smaller and smaller regions until a region only contains 0s or only contains 1s. Each final uniform region is called a ‘leaf’. The method by which a parent region is partitioned into two child regions is often referred to as the ‘Splitting Criterion’. Each candidate partition is evaluated and the partition which optimises the splitting criterion is used to divide the region. The parent region that gets divided is called a ‘node’.

Suppose a node has \(N\) observations and take \(L_0^{(i)}\) and \(R_0^{(i)}\) to denote the number of 0s in the left and right child respectively, and similarly \(L_1^{(i)}\) and \(R_1^{(i)}\) for 1s. So for each candidate partition \(i\) we have \(N=L_0^{(i)}+L_1^{(i)}+R_0^{(i)}+R_1^{(i)}\). Now let \(l_0^{(i)}=L_0^{(i)}/(L_0^{(i)}+L_1^{(i)}\)) be the probability of selecting a 0 in the left child node for partition \(i\), and similarly denote the probabilities \(l_1^{(i)}\), \(r_0^{(i)}\), and \(r_1^{(i)}\). The two most common splitting criterions are:

A. Gini Impurity: Choose \(i\) to minimise the probability of mislabeling i.e. \(i_{gini} = \arg \min_i H_{gini}^{(i)}\) where

$$
H_{gini}^{(i)} = \frac{L_0^{(i)}+L_1^{(i)}}{N} \big[ l_0^{(i)} (1-l_0^{(i)}) + l_1^{(i)} (1-l_1^{(i)}) \big]
+ \frac{R_0^{(i)}+R_1^{(i)}}{N} \big[ r_0^{(i)} (1-r_0^{(i)}) + r_1^{(i)} (1-r_1^{(i)}) \big]
$$

B. Entropy: Choose \(i\) to maximise the informational content of the labeling i.e. \(i_{entropy} = \arg \min_i H_{entropy}^{(i)}\) where

$$
H_{entropy}^{(i)} = - \frac{L_0^{(i)}+L_1^{(i)}}{N} \big[ l_0^{(i)} \log(l_0^{(i)}) + l_1^{(i)} \log(l_1^{(i)}) \big]
- \frac{R_0^{(i)}+R_1^{(i)}}{N} \big[ r_0^{(i)} \log(r_0^{(i)}) + r_1^{(i)} \log(r_1^{(i)}) \big].
$$

However, notice that both of these criterions are scale invariant: a node with \(N=300\) observations and partition given by \(L_1=100,L_0=100,R_1=100,R_0=0\) achieves an identical splitting criterion score \(H_{gini}=1/3\) to a node with \(N=3\) observations and \(L_1=1,L_0=1,R_1=1,R_0=0\). The former split is a far stronger result (a more difficult partition to achieve) than the latter. It may be useful to have a criterion that is able to differentiate between the likelihoods of these two partitions.
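
Plugging the first example into the Gini criterion makes the point concrete:

$$
H_{gini} = \frac{200}{300} \big[ 0.5(1-0.5) + 0.5(1-0.5) \big] + \frac{100}{300} \big[ 0(1-0) + 1(1-1) \big] = \frac{2}{3} \cdot \frac{1}{2} + 0 = \frac{1}{3}
$$

The \(N=3\) split gives exactly the same score, even though such a partition could easily arise by chance.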


Confidence Splitting Criterion

Theory

Let \(L^{(i)}=L_0^{(i)}+L_1^{(i)}\) and \(R^{(i)}=R_0^{(i)}+R_1^{(i)}\). And let \(p_0\) be the proportion of 0s and \(p_1\) the proportion of 1s in the node we wish to split. Then we want to find partitions where the distribution of 0s and 1s is unlikely to have occurred by chance. If the null hypothesis is that each observation is a 0 with probability \(p_0\), then the probability of \(L_0^{(i)}\) or more 0s occurring in a partition of \(L^{(i)}\) observations is given by the Binomial random variable \(X(n,p)\):

$$
\mathbb{P}[X(L^{(i)},p_0) >= L_0^{(i)}] = 1 - B(L_0^{(i)};L^{(i)},p_0)
$$

where \(B(x;N,p)\) is the cumulative distribution function for a Binomial random variable \(X=x\) with \(N\) trials and probability \(p\) of success. Similarly, for \(p_1\) the probability of \(L_1^{(i)}\) or more 1s occurring in the left partition is \(1-B(L_1^{(i)};L^{(i)},p_1)\). Taking the minimum of these two probabilities gives us the equivalent of a two-tailed hypothesis test. The probability is essentially the p-value under the null hypothesis for a partition at least as extreme as the one given. We can repeat the statistical test for the right partition and take the product of these two p-values to give an overall partition probability.

Now, to enable biasing the splitting towards identifying high-density regions of 1s in the observation space, one idea is to modify \(B(L_0^{(i)};L^{(i)},p_0)\) to only be non-zero if it is sufficiently large. In other words, replace it with

$$
\mathbb{1}_{ \{B(L_0^{(i)};L^{(i)},p_0) >= C_0 \} }B(L_0^{(i)};L^{(i)},p_0)
$$

where \(C_0 \in [0,1]\) is the minimum confidence we require. \(C_0\) might take value 0.5, 0.9, 0.95, 0.99, etc. It should be chosen to match the identification desired. Similarly for \(C_1\). Thus we propose a new splitting criterion given by:

C. Confidence: Choose \(i\) to minimise the probability of the partition being chance i.e. \(i_{confidence} = \arg \min_i H_{confidence}^{(i)}\) where

$$
H_{confidence}^{(i)} = \min_{j=0,1} \{1 - \mathbb{1}_{ \{ B(L_j^{(i)};L^{(i)},p_j) >= C_j \} } \, B(L_j^{(i)};L^{(i)},p_j) \}
* \min_{j=0,1} \{1 - \mathbb{1}_{ \{ B(R_j^{(i)};R^{(i)},p_j) >= C_j \} } \, B(R_j^{(i)};R^{(i)},p_j) \}
$$

where \(C_0\) and \(C_1\) are to be chosen to optimise the identification of 0s or 1s respectively.
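
To make the criterion concrete, here is a minimal sketch of scoring a single candidate partition; the function and variable names are ours, and scipy's exact Binomial CDF stands in for the approximations discussed in the Implementation section below:

from scipy.stats import binom

def confidence_score(L0, L1, R0, R1, p0, p1, C0=0.95, C1=0.95):
    # H_confidence for one candidate partition; smaller means the split is less likely to be chance.
    # (L0, L1) and (R0, R1) are the class counts in the left/right children,
    # p0/p1 the class proportions in the parent node, C0/C1 the confidence floors.
    def side_term(n0, n1, n):
        b0 = binom.cdf(n0, n, p0)   # P[at most n0 zeros in n draws]
        b1 = binom.cdf(n1, n, p1)   # P[at most n1 ones in n draws]
        return min(1 - (b0 if b0 >= C0 else 0.0),
                   1 - (b1 if b1 >= C1 else 0.0))
    return side_term(L0, L1, L0 + L1) * side_term(R0, R1, R0 + R1)

# The N=300 example from above scores near 0 (very unlikely by chance),
# while an uninformative split of the same node scores near 1.
print(confidence_score(100, 100, 0, 100, p0=2/3, p1=1/3))
print(confidence_score(100, 50, 100, 50, p0=2/3, p1=1/3))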

Implementation

To run this proposed new splitting criterion, we cut a new branch of the open-source Python Scikit-Learn repository and updated the Random Forest Classifier library. Two modifications were made to the analytical \(H_{confidence}^{(i)}\) function to optimise the calculation:

  1. Speed: Calculating the exact value of \(B(x;N,p)\) is expensive, especially over many candidate partitions for many nodes across many trees. For large values of \(N\) we can approximate the Binomial cumulative distribution function by the Normal cumulative distribution \(\Phi(x;Np,\sqrt{Np(1-p)})\) which itself can be approximated using Karagiannidis & Lioumpas (2007) as a ratio of exponentials.
  2. Accuracy: For large values of \(N\) and small values of \(p\), the tails of the Binomial distribution can be very small, so subtraction and multiplication of the tail values can be corrupted by the machine’s floating-point precision. To overcome this we take the logarithm of the above approximation to calculate \(H_{confidence}^{(i)}\) (see the sketch below).
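
As a rough illustration of point 2, the upper-tail mass can be kept in log space throughout; here scipy's Normal log-survival function stands in for the exponential-ratio approximation mentioned above:

import numpy as np
from scipy.stats import norm

def log_upper_tail(k, n, p):
    # log P[X >= k] for X ~ Binomial(n, p), using the Normal approximation
    # so that tiny tail masses never underflow to zero.
    mu = n * p
    sigma = np.sqrt(n * p * (1 - p))
    return norm.logsf(k - 0.5, loc=mu, scale=sigma)  # 0.5 is a continuity correction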

After these tweaks to the algorithm we find an insignificant change to the runtime of the Scikit-Learn routines. The Python code with the new criterion looks something like this:


from sklearn.ensemble import RandomForestClassifier
# using [C_0,C_1] = [0.95,0.95]
rfc = RandomForestClassifier(n_estimators=1000,criterion='conf',conf=[0.95,0.95])
rfc.fit(x_train,y_train)
pred = rfc.predict_proba(x_test)

For more details on the Machine Learning model building process at Airbnb you can read previous posts such as Designing Machine Learning Models: A Tale of Precision and Recall and How Airbnb uses machine learning to detect host preferences. And for details on our architecture for detecting risk you can read more at Architecting a Machine Learning System for Risk.

Evaluation

Data

To test the improvements the Confidence splitting criterion can provide, we use the same dataset we used in the previous post Overcoming Missing Values In A Random Forest Classifier, namely the adult dataset from the UCI Machine Learning Repository. As before, the goal is to predict whether the income level of the adult is greater than or less than $50k per annum using the 14 features provided.

We tried 6 different combinations of \([C_0,C_1]\) against the baseline RFC with Gini Impurity and looked at the changes in the Precision-Recall curves. As always, we train on a training set and evaluate on the held-out test set. We build an RFC of 1000 trees in each of the 7 scenarios.

Results


Observe that \(C_0=0.5\) (yellow and blue lines) offers very little improvement over the baseline RFC, with a modest absolute recall improvement of 5% at the 95% precision level. However, for \(C_0=0.9\) (green and purple lines) we see a steady increase in recall at precision levels of 45% and upwards. At 80% precision and above, \(C_0=0.9\) improves recall by an absolute amount of 10%, rising to 13% at the 95% precision level. There is little variation between \(C_1=0.9\) (green line) and \(C_1=0.99\) (purple line) for \(C_0=0.9\), although \([C_0,C_1]=[0.9,0.9]\) (green line) does seem to be superior. For \(C_0=0.99\) (pale blue and pink lines), the improvement is not as impressive or consistent.

Final Thoughts

It would be useful to extend the analysis to compare the new splitting criterion against optimising existing hyper-parameters. In the Scikit-Learn implementation of RFCs we could experiment with min_samples_split or min_samples_leaf to overcome the scaling problem. We could also test different values of class_weight to capture the asymmetry introduced by non-equal \(C_0\) and \(C_1\).
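
For such a comparison, the baseline knobs are already exposed by scikit-learn; a sketch of the kind of configurations we would benchmark against (the specific values here are arbitrary):

from sklearn.ensemble import RandomForestClassifier

# Standard levers that also discourage tiny, unreliable splits
rfc_leaf = RandomForestClassifier(n_estimators=1000, criterion='gini',
                                  min_samples_leaf=50)
# Asymmetric class weighting, loosely analogous to non-equal C_0 and C_1
rfc_weighted = RandomForestClassifier(n_estimators=1000, criterion='gini',
                                      class_weight={0: 1, 1: 5})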

More work can be done on the implementation of this methodology, and there is still some outstanding analytical investigation into how the confidence thresholds \(C_j\) tie to the improvements in recall or precision. Note, however, that the methodology already generalises to non-binary classifiers, i.e. where \(j=0,1,2,3,\dots\). It could also be useful to implement this new criterion in the Apache Spark RandomForest library.

Business Impact

For the dataset examined, the new splitting criterion seems to be able to better identify regions of higher density of 0s or 1s. Moreover, by taking into account the size of the partition and the probability of such a distribution of observations under the null hypothesis, we can better detect 1s. In the context of Trust and Safety, this translates into being able to more accurately detect fraudulent actions.

The business implications of moving the Receiver Operating Characteristic curve outwards (equivalently, moving the Precision-Recall curve outwards) have been discussed in a previous post. As described in the ‘Efficiency Implications’ section of the Overcoming Missing Values In A Random Forest Classifier post, even tenth-of-a-percentage-point gains in recall or precision can lead to enormous dollar savings in fraud mitigation and efficiency respectively.

How We Partitioned Airbnb’s Main Database in Two Weeks (October 6, 2015)


“Scaling = replacing all components of a car while driving it at 100mph”

– Mike Krieger, Instagram Co-founder @ Airbnb OpenAir 2015


Airbnb peak traffic grows at a rate of 3.5x per year, with a seasonal summer peak.

Heading into the 2015 summer travel season, the infrastructure team at Airbnb was hard at work scaling our databases to handle the expected record summer traffic. One particularly impactful project aimed to partition certain tables by application function onto their own database, which typically would require a significant engineering investment in the form of application layer changes, data migration, and robust testing to guarantee data consistency with minimal downtime. In an attempt to save weeks of engineering time, one of our brilliant engineers proposed the intriguing idea of leveraging MySQL replication to do the hard part of guaranteeing data consistency. (This idea is independently listed as an explicit use case of Amazon RDS’s “Read Replica Promotion” functionality.) By tolerating a brief and limited downtime during the database promotion, we were able to perform this operation without writing a single line of bookkeeping or migration code. In this blog post, we will share some of our work and what we learned in the process.


First, some context

We tend to agree with our friends at Asana and Percona that horizontal sharding is bitter medicine, and so we prefer vertical partitions by application function for spreading load and isolating failures. For instance, we have dedicated databases, each running on its own dedicated RDS instance, that map one-to-one to our independent Java and Rails services. However for historical reasons, much of our core application data still live in the original database from when Airbnb was a single monolithic Rails app.

Using a client side query profiler that we built in-house (it’s client side due to the limitations of RDS) to analyze our database access pattern, we discovered that Airbnb’s message inbox feature, which allows guests and hosts to communicate, accounted for nearly 1/3 of the writes on our main database. Furthermore, this write pattern grows linearly with traffic, so partitioning it out would be a particularly big win for the stability of our main database. Since it is an independent application function, we were also confident that all cross-table joins and transactions could be eliminated, so we began prioritizing this project.

In examining our options for this project, two realities influenced our decision making. First, the last time we partitioned a database was three years ago in 2012, so pursuing this operation at our current size was a new challenge for us, and we were open to minimizing engineering complexity at the expense of planned downtime. Second, as we entered 2015 with around 130 software engineers, our teams were spread across a large surface area of products (ranging from personalized search, customer service tools, trust and safety, and global payments, to reliable mobile apps that assume limited connectivity), leaving only a small fraction of engineering dedicated to infrastructure. With these considerations in mind, we opted to make use of MySQL replication in order to minimize the engineering complexity and investment needed.

Our plan

The decision to use MySQL’s built-in replication to migrate the data for us meant that we no longer had to build the most challenging pieces to guarantee data consistency ourselves as replication was a proven quantity. We run MySQL on Amazon RDS, so creating new read replicas and promoting a replica to a standalone master is easy. Our setup resembled the following:

[Diagram: the replication setup described below]

We created a new replica (message-master) from our main master database that would serve as the new independent master after its promotion. We then attached a second-tier replica (message-replica) that would serve as the message-master’s replica. The catch is that the promotion process can take several minutes or longer to complete, during which time we have to intentionally fail writes to the relevant tables to maintain data consistency. Given that a site-wide downtime from an overwhelmed database would be much more costly than a localized and controlled message inbox downtime, the team was willing to make this tradeoff to cut weeks of development time. It is worth mentioning that for those who run their own database, replication filters could be used to avoid replicating unrelated tables and potentially reduce the promotion period.

Phase one: preplanning

Moving message inbox tables to a new database could render existing queries with cross-table joins invalid after the migration. Because a database promotion cannot be reverted, the success of this operation depended on our ability to identify all such cases and deprecate them or replace them with in-app joins. Fortunately, our internal query analyzer allowed us to easily identify such queries for most of our main services, and we were able to revoke relevant database permission grants for the remaining services to gain full coverage. One of the architectural tenets that we are working towards at Airbnb is that services should own their own data, which would have greatly simplified the work here. While technically straightforward, this was the most time consuming phase of the project as it required a well-communicated cross-team effort.

Next, we have a very extensive data pipeline that powers both offline data analytics and downstream production services. So the next step in the preplanning was to move all of our relevant pipelines to consume the data exports of message-replica to ensure that we consume the newest data after the promotion. One side effect of our migration plan was that the new database would have the same name as our existing database (not to be confused with the name of our RDS instances, e.g. message-master and message-replica) even though the data will diverge after the promotion. However, this actually allowed us to keep our naming convention consistent in our data pipelines, so we opted not to pursue a database rename.

Lastly, because our main Airbnb Rails app held exclusive write access to these tables, we were able to swap all relevant service traffic to the new message database replica to reduce the complexity of the main operation.

Phase two: the operation

[Photo: members of the Production Infrastructure team on the big day.]

Once all the preplanning work was done, the actual operation was performed as follows:

  1. Communicate the planned sub-10 minute message inbox downtime with our customer service team. We are very sensitive to the fact that any downtime could leave guests stranded in a foreign country as they try to check-in to their Airbnb, so it was important to keep all relevant functions in the loop and perform the op during the lowest weekly traffic.
  2. Deploy a change for message inbox queries to use the new message database user grants and database connections. At this stage, we still point the writes to the main master while reads go to the message replica, so this should have no outward impact yet. However, we delay this step until the op begins because it doubles the connections to the main master, and we want this stage to be as brief as possible. Swapping the database host in the next step does not require a deploy, as we have configuration tools to update the database host entries in Zookeeper, where they can be discovered by SmartStack.
  3. Swap all message inbox write traffic to the message master. Because it has not been promoted yet, all writes on the new master fail and we start clocking our downtime. While read queries will succeed, in practice nearly all of messaging is down during this phase because marking a message as read requires a db write.
  4. Kill all database connections on the main master with the message database user introduced in step 2. By killing connections directly, as opposed to doing a deploy or cluster restart, we minimize the time it takes to move all writes to the replica that will serve as the new master, a prerequisite for replication to catch up.
  5. Verify that replication has caught up by inspecting:
    1. The newest entries in all the message inbox tables on message master and message replica
    2. All message connections on the main master are gone
    3. New connections on the message master are made
  6. Promote message master. From our experience, the database is completely down for about 30 seconds during a promotion on RDS and in this time reads on the master fail. However, writes will fail for nearly 4 minutes as it takes about 3.5 minutes before the promotion kicks in after it is initiated.
  7. Enable Multi-AZ deployment on the newly-promoted message master before the next RDS automated backup window. In addition to improved failover support, Multi-AZ minimizes latency spikes during RDS snapshots and backups.
  8. Once all the metrics look good and databases stable, drop irrelevant tables on the respective databases. This wrap-up step is important to ensure that no service consumes stale data.

Should the op have failed, we would have reverted the database host entries in Zookeeper and the message inbox functionality would have been restored almost immediately. However, we would have lost any writes that made it to the now-independent message databases. Theoretically it would be possible to backfill to restore the lost messages, but it would be a nontrivial endeavor and confusing for our users. Thus, we robustly tested each of the above steps before pursuing the op.

The result

[Chart: clear drop in main database master writes.]

End-to-end, this project took about two weeks to complete, incurred just under 7.5 minutes of message inbox downtime, and reduced the size of our main database by 20%. Most significantly, it brought us significant database stability gains by reducing the write queries on our main master database by 33%. These offloaded queries were projected to grow by another 50% in the coming months, which would certainly have overwhelmed our main database, so this project bought us valuable time to pursue longer-term database stability and scalability investments.

One surprise: RDS snapshots can significantly elevate latency

According to the RDS documentation:

Unlike Single-AZ deployments, I/O activity is not suspended on your primary during backup for Multi-AZ deployments for the MySQL, Oracle, and PostgreSQL engines, because the backup is taken from the standby. However, note that you may still experience elevated latencies for a few minutes during backups for Multi-AZ deployments.

We generally have Multi-AZ deployment enabled on all master instances of RDS to take full advantage of RDS’s high availability and failover support. During this project, we observed that given a sufficiently heavy database load, the latency experienced during an RDS snapshot even with Multi-AZ deployment can be significant enough to create a backlog of our queries and bring down our database. We were always aware that snapshots lead to increased latency, but prior to this project we had not been aware of the possibility of full downtime from nonlinear increases in latency relative to database load.

This is significant given that RDS snapshots are a core RDS functionality that we depend on for daily automated backups. Previously unbeknownst to us, as the load on our main database increased, so did the likelihood of RDS snapshots causing site instability. Thus, in pursuing this project, we realized that it had been more urgent than we initially anticipated.

[GIF: Xinyao, lead engineer on the project, celebrates after the op.]

Acknowledgements: Xinyao Hu led the project while I wrote the initial plan with guidance from Ben Hughes and Sonic Wang. Brian Morearty and Eric Levine helped refactor the code to eliminate cross-table joins. The Production Infrastructure team enjoyed a fun afternoon running the operation.

Check out more past projects from the Production Infrastructure team on the Airbnb Engineering blog.

Unboxing the Random Forest Classifier: The Threshold Distributions (October 1, 2015)


In the Trust and Safety team at Airbnb, we use the random forest classifier in many of our risk mitigation models. Despite our successes with it, the ensemble of trees along with the random selection of features at each node makes it difficult to succinctly describe how features are being split. In this post, we propose a method to aggregate and summarize those split values by generating weighted threshold distributions.

Motivation

Despite its versatility and out-of-the-box performance, the random forest classifier is often referred to as a black box model. It is easy to see why some might be inclined to think so. First, the optimal decision split at each node is only drawn from a random subset of the feature set. And to make matters more obscure, the model generates an ensemble of trees using bootstrap samples of the training set. All this means that a feature might be split at different nodes of the same tree, possibly with different split values, and this could be repeated in multiple trees.

With all this randomness being thrown around, a certainty still remains. After training a random forest, we know every detail of the forest exactly. For each node, we know what feature is used to split, with what threshold value, and with what efficiency. All the details are there; the challenge is knowing how to piece them together to build an accurate and informative description of the alleged black box.

One common way to describe a trained random forest in terms of the features is to rank them by their importances based on their splitting efficiencies. Although this method manages to quantify each feature’s contribution to impurity decreases, it does not shed light on how the model makes decisions from them. In this post we propose one way to concisely describe the inner decisions of the forest: mine the entire forest node by node and present the distributions of the thresholds for each feature, weighted by split efficiency and sample size.

Methodology

For the purpose of illustrating this method, we resort to the publicly available Online News Popularity Data Set from the UCI Machine Learning Repository. The dataset contains 39,797 observations of Mashable articles, each with 58 features. The positive label for this dataset is defined as whether the number of shares for a particular article is greater than or equal to 1,400. The features are all numerical and range from simple statistics, like the number of words in the content, to more complex ones, like the closeness to a particular LDA-derived topic.

After training the random forest, we crawl through the entire forest and extract the following information from each non-terminal (or non-leaf) node:

  1. Feature name
  2. Split threshold – The value at which the node is splitting
  3. Sample Size – Number of observations that went through the node when training
  4. Is greater than threshold? – Direction where majority of positive observations go
  5. Gini change – Decrease in impurity after split
  6. Tree index – Identifier for the tree

which can be collected into a table like the following:

[Table: sample of the extracted node-level data]

A table like the above will very likely contain the same feature multiple times across different trees, and even within the same tree. It might be tempting at this point to just collect all the thresholds for a particular feature and pile them up in a histogram. This, however, would not be fair, since nodes where the optimal feature-threshold tuple was found from a handful of observations should not carry the same weight as those found from thousands of observations. Hence, for a splitting node \(i\), we define the sample size \(N_i\) as the number of observations that reached node \(i\) during the training phase of the random forest.

Similarly, larger impurity changes should be weighted more than those with near-zero impurity change. For that, we first define the impurity change \(\Delta G(i)\) at the splitting node \(i\) as

$$ \Delta G(i) = G(i) - \frac{L_i}{N_i} G(l_i) - \frac{R_i}{N_i} G(r_i) $$

where \(N_i\) is the sample size of the splitting node, \(L_i\) the sample size of the left child node with index \(l_i\), and \(R_i\) the sample size of the right child node with index \(r_i\). As for the Gini impurity itself at node \(i\), we define it as

$$ G(i) = 1 - \sum\limits_{k=1}^{K} \left( \frac{n_{i,k}}{N_i} \right)^2 $$

where \(K\) is the number of classes we are classifying into, \(n_{i,k}\) is the number of observations at node \(i\) with label \(k\), and \(N_i\) is the total number of observations at node \(i\). Here is an illustration that should clarify the above definitions:

[Figure: a splitting node and its two children, annotated with the sample sizes and Gini impurities defined above]

Since our goal is to generate an overall description of the trained model, we place more weight on nodes that split more efficiently. A simple combined weight factor is the product of the sample size and the impurity change. More specifically, we can express this combined weight for a non-terminal node \(i\) as \((1 + \Delta G(i)) \times N_i\) (the Gini impurity change \(\Delta G(i)\) is offset by one to make its range strictly positive).
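
To make these weights concrete, here is a small worked example with invented numbers (they are for illustration only, not taken from the dataset). Suppose node \(i\) sees \(N_i = 100\) observations (60 positive, 40 negative) and splits them into a left child of 50 observations (45 positive) and a right child of 50 observations (15 positive). Then

$$ G(i) = 1 - (0.6^2 + 0.4^2) = 0.48, \quad G(l_i) = 1 - (0.9^2 + 0.1^2) = 0.18, \quad G(r_i) = 1 - (0.3^2 + 0.7^2) = 0.42 $$

$$ \Delta G(i) = 0.48 - \frac{50}{100}(0.18) - \frac{50}{100}(0.42) = 0.18, \qquad (1 + \Delta G(i)) \times N_i = 1.18 \times 100 = 118 $$

so this node's threshold would enter its feature's distribution with a weight of 118.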

Another type of variation among nodes splitting on the same feature can be exemplified with the following case:

[Figure: two nodes splitting on the same feature X, with the positive observations concentrated on opposite sides of the threshold]

where the filled circles are the positively labeled observations and the unfilled circles are the negatively labeled ones. In this example, although both nodes split on the same feature, their reasons for doing so are quite different. In the left splitting node, the majority of the positive observations end up on the less-than-or-equal-to branch, whereas in the right splitting node, the majority of the positive observations end up on the greater-than branch. In other words, the left splitting node views the positively labeled observations as more likely to have smaller feature X values, whereas the right splitting node views them as more likely to have larger feature X values. This difference is accounted for using the is_greater_than_threshold (or chirality) flag, where 1 is true (greater than) and 0 is false (less than or equal to).

Finally, for each feature, we can build two threshold distributions (one per chirality), each weighted by the combined weight \((1 + \Delta G(i)) \times N_i\).

Example

After training the classifier, we crawl through the entire forest and collect all the information specified in the previous section. This lets us describe which thresholds dominate the splits for, say, num_hrefs (the number of links in the article):

[Figure: weighted threshold distributions for num_hrefs, one per chirality]

In the above plot we see two distributions. The orange one corresponds to the nodes where the chirality is greater-than, and the red-ish (or Rausch) one to the nodes where the chirality is less-than-or-equal-to. These weighted distributions for num_hrefs indicate that whenever num_hrefs is used to decide whether an article is popular (1,400+ shares), the dominant description is "greater than ~15 links", as illustrated by the spiking bin of the greater-than distribution near 15. Another interesting illustration of this method is on global_rate_positive_words and global_rate_negative_words, which are defined as the proportion of positive and negative words, respectively, in the content of the article. The former is depicted as follows:

[Figure: weighted threshold distributions for global_rate_positive_words]

where, as far as the model is concerned, popular articles tend to have larger values of global_rate_positive_words, with the cutoff dominated by ~0.03. However, a more interesting distribution is the one for global_rate_negative_words:

[Figure: weighted threshold distributions for global_rate_negative_words]

where the distributions indicate that a rate of negative words greater than ~0.01 per content size adds a healthy dose of popularity, whereas too much of it, say more than ~0.02, makes the model predict lower popularity. This is inferred from the greater-than distribution spiking at ~0.01, whereas the less-than-or-equal-to distribution spikes at ~0.02.

What’s Next

On the Trust and Safety team we are eager to make our models more transparent. This post deals with only one very specific way to inspect the inner workings of a trained model. Other possible approaches could start by asking the following questions:

  • What observations does the model see as clusters?
  • How do features interact in a random forest?

Mastering the tvOS Focus Engine
http://nerds.airbnb.com/tvos-focus-engine/
Fri, 11 Sep 2015

The tvOS interaction paradigm presents a unique challenge to developers and designers alike. The new Apple TV pairs a trackpad-like remote with a UI lacking a traditional cursor. As a result, “focus” is the only means by which an app can provide visual feedback to its user(s) as they navigate.

You can think of the focus engine as the bridgekeeper between users and your shiny new tvOS application. Though this bridgekeeper doesn’t expect you to know the airspeed velocity of an unladen swallow, befriending the focus engine is an essential step towards building an app that feels native to this platform.

Any seasoned iOS engineer will feel at home in UIKit on tvOS—but don’t let the platforms’ notable similarities seduce you into believing they are the same. Apple has made it easy to port your iOS app to tvOS. But if you don’t consider how your application will interact with the focus engine from the outset, you’ll find yourself fighting an uphill battle as you approach the finish line.

What Does Focus Really Mean?

Users navigate a tvOS application by moving focus between items onscreen. When an item is focused, its appearance is adjusted to stand out from the other items onscreen. Focus effects are the crux of what makes tvOS communal. They provide visual feedback not only to whoever is quarterbacking the remote, but also to any onlookers who are following along. They're what separate this native TV experience from AirPlay-ing your iPad to the big screen.

What’s Focusable?

Only views can receive focus and only one view may be in focus at a time. Consider these buttons:

[Figure: buttons A through E in a row; button C is currently in focus]

Button C is currently in focus. Swiping left on the remote will focus button B. Swiping right will focus button D. Swiping left or right more aggressively will focus button A or button E, respectively. It’s worth noting that even though a more aggressive left swipe will result in button A ultimately gaining focus, button B will instantaneously gain (and then lose) focus in the process.

Whether a particular view is focusable is determined by a new instance method added to UIView.
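
The snippet that originally accompanied this paragraph isn't reproduced here; a minimal sketch of overriding that method in tvOS 9-era Swift (the FocusableView name is just for illustration) might be:

    import UIKit

    class FocusableView: UIView {
        // UIView's default implementation returns false; override it to
        // opt this view into the focus system.
        override func canBecomeFocused() -> Bool {
            return true
        }
    }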

Apple has audited its public frameworks and provided sensible implementations for canBecomeFocused(). Only the following classes shipped with UIKit are focusable:

  • UIButton
  • UIControl
  • UISegmentedControl
  • UITabBar
  • UITextField
  • UISearchBar (although UISearchBar itself isn’t focusable, its internal text field is)

UICollectionViewCell and UITableViewCell are exceptions. Whether a cell is focusable is determined by the UICollectionView or UITableView delegate:

  • collectionView(_:canFocusItemAtIndexPath:)
  • tableView(_:canFocusRowAtIndexPath:)
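
A minimal sketch of the collection view variant, assuming a hypothetical controller that wants to keep a non-interactive header cell out of the focus chain (Swift 2-era signature):

    import UIKit

    class ListingsViewController: UIViewController, UICollectionViewDelegate {
        // Prevent the first cell (say, a non-interactive header) from ever
        // receiving focus; all other cells remain focusable.
        func collectionView(collectionView: UICollectionView,
                            canFocusItemAtIndexPath indexPath: NSIndexPath) -> Bool {
            return indexPath.item != 0
        }
    }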

Though not focusable itself, UIImageView is also a special case thanks to the adjustsImageWhenAncestorFocused property. When enabled, a UIImageView instance will display a focus effect whenever an ancestor receives focus. As the system provides no default focus effect for UICollectionView cells, this is an easy way to breathe life into image-based collection views. You will see this technique used extensively by Apple throughout the system UI and built-in applications.
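
As a sketch, a hypothetical cell subclass might enable it like this:

    import UIKit

    class ListingPhotoCell: UICollectionViewCell {
        let photoView = UIImageView()

        override init(frame: CGRect) {
            super.init(frame: frame)
            // Give the image the system focus treatment whenever an ancestor
            // (here, the cell itself) becomes focused.
            photoView.adjustsImageWhenAncestorFocused = true
            contentView.addSubview(photoView)
        }

        required init?(coder aDecoder: NSCoder) {
            fatalError("init(coder:) has not been implemented")
        }
    }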

What’s Currently in Focus?

You can ask any view whether it’s currently in focus.

You can also ask the screen for the currently focused view (if it exists).
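
The original snippets aren't reproduced here; a minimal sketch using the tvOS 9-era names (someView is a placeholder) might look like:

    import UIKit

    func logFocusState(someView: UIView) {
        // Ask a specific view whether it currently has focus.
        if someView.focused {
            print("someView is in focus")
        }
        // Ask the screen which view (if any) currently has focus.
        if let focusedView = UIScreen.mainScreen().focusedView {
            print("Currently focused view: \(focusedView)")
        }
    }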

Responding to Focus Updates

In tvOS, all classes that participate in the focus system conform to the UIFocusEnvironment protocol. Each focus environment corresponds to a particular branch of the view hierarchy, meaning that focus environments can be nested.

The UIFocusEnvironment API allows for a two-way dialogue between developers and the focus engine regarding how focus should be updated within a particular branch of the view hierarchy. UIView, UIViewController, and UIPresentationController conform to UIFocusEnvironment out of the box.

Overriding shouldUpdateFocusInContext(_:) provides an opportunity to vet the proposed focus update before it’s applied. UICollectionView and UITableView delegates provide NSIndexPath-based versions of this API where the provided context contains the previously and next focused index paths rather than the views themselves.

  • collectionView(_:shouldUpdateFocusInContext:)
  • tableView(_:shouldUpdateFocusInContext:)

Here’s a toy example showing how to use the UICollectionView delegate method to disable focus updates within a collection view when a cell has been selected.
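
The embedded example isn't reproduced here; a minimal sketch, with a hypothetical controller and hasSelectedCell flag (Swift 2-era signatures), could be:

    import UIKit

    class SelectableGridViewController: UIViewController, UICollectionViewDelegate {
        // Flipped to true once the user selects a cell.
        var hasSelectedCell = false

        func collectionView(collectionView: UICollectionView,
                            didSelectItemAtIndexPath indexPath: NSIndexPath) {
            hasSelectedCell = true
        }

        // Veto any further focus movement inside this collection view
        // once a cell has been selected.
        func collectionView(collectionView: UICollectionView,
                            shouldUpdateFocusInContext context: UICollectionViewFocusUpdateContext) -> Bool {
            return !hasSelectedCell
        }
    }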

Overriding didUpdateFocusInContext(_:withAnimationCoordinator:) provides an opportunity to take action in response to a focus update and participate in the associated animation. Given two adjacent buttons, here’s how one could horizontally center whichever one is currently in focus.
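
The original snippet isn't reproduced here; one possible sketch, assuming the buttons live in a hypothetical buttonsContainer whose horizontal position is driven by a centerX constraint (Swift 2-era signatures):

    import UIKit

    class CenteringViewController: UIViewController {
        // Hypothetical container holding the two adjacent buttons, plus a
        // constraint pinning its centerX to the view's centerX.
        @IBOutlet var buttonsContainer: UIView!
        @IBOutlet var containerCenterXConstraint: NSLayoutConstraint!

        override func didUpdateFocusInContext(context: UIFocusUpdateContext,
                                              withAnimationCoordinator coordinator: UIFocusAnimationCoordinator) {
            super.didUpdateFocusInContext(context, withAnimationCoordinator: coordinator)
            guard let nextButton = context.nextFocusedView as? UIButton
                where nextButton.superview === buttonsContainer else { return }

            // Offset the container so the newly focused button lines up with the
            // view's horizontal center, animating alongside the focus animation.
            let delta = buttonsContainer.bounds.midX - nextButton.center.x
            coordinator.addCoordinatedAnimations({
                self.containerCenterXConstraint.constant = delta
                self.view.layoutIfNeeded()
            }, completion: nil)
        }
    }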

When a focus update cycle occurs, each method is invoked on the view receiving focus, the view losing focus, and all parent focus environments of these two views.

Requesting Focus

The focus engine will automatically initiate focus updates at appropriate times like app launch or when the currently focused view is removed from the view hierarchy. Developers can also request focus updates, but any requests must be issued through the focus engine. Since only the focus engine can update focus, it’s here that the focus engine most literally takes on the role of bridgekeeper.

UIFocusEnvironment’s setNeedsFocusUpdate() and updateFocusIfNeeded() interact with the focus engine in much the same manner as setNeedsLayout() and layoutIfNeeded() interact with the layout engine. Invoking setNeedsFocusUpdate() will make a note of your request and return immediately. The focus engine will recompute focus during the next update cycle. Invoking updateFocusIfNeeded() forces a focus update immediately.

When an update cycle begins, the focus engine queries the initiating focus environment for its preferredFocusedView. If this view is non-nil and focusable, the focus engine will attempt to give that view focus by issuing the aforementioned notification events through the focus responder chain.

Let’s look at a few examples with a particularly useless AnimalsViewController that I’ve created. In each example, dogButton initially has focus.
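
The original code isn't reproduced here, but a sketch of such a controller (tvOS 9-era API names; layout omitted) might look like:

    import UIKit

    class AnimalsViewController: UIViewController {
        let dogButton = UIButton(type: .System)
        let catButton = UIButton(type: .System)

        // The view this environment would prefer to see focused whenever
        // the focus engine asks during an update cycle.
        override var preferredFocusedView: UIView? {
            return catButton
        }

        func moveFocusToCatButton() {
            // Request an update; the focus engine re-queries preferredFocusedView
            // on the next cycle (or immediately, via updateFocusIfNeeded()).
            setNeedsFocusUpdate()
            updateFocusIfNeeded()
        }
    }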

The focus engine will only honor requests that are issued by a focus environment that currently contains focus. As a result, even though catButton’s preferredFocusedView is itself, catButton’s request to update focus is ignored.

When animalsViewController requests a focus update, however, focus will move from dogButton to catButton since the currently focused view (dogButton) is within the branch of the view hierarchy governed by the animalsViewController focus environment.

When two focus environments request a focus update simultaneously, the focus engine will defer to the parent environment.

Go Forth!

With this focus engine primer in hand, you will be well on your way to taking full advantage of this new platform. The focus system and its API are familiar and well engineered, so it's easy for them to become an afterthought as you bring your application to the TV. But the more you work on this platform, the clearer it will become that harnessing the full power of the focus engine marks the difference between an iOS port and a native tvOS experience.

How It Ticks: Building the Airbnb Apple Watch App
http://nerds.airbnb.com/airbnb-watch/
Thu, 03 Sep 2015

Airbnb wants to be present on all the devices that our hosts and guests use every day so that we can bring the experience to everyone no matter where they are. Earlier this summer, a small group of us set out to define what the Airbnb experience would be like on a brand new mobile platform: Apple Watch. We were intrigued by the benefits this inherently on-the-go wearable could possibly bring to the table.

Our initial thought was perhaps the most natural one: "let's port our iOS app to the Watch!" So we started by building a prototype Watch app that could browse listings in a given city, send and receive messages, and edit and access wish lists. Because Watch interfaces are built using Interface Builder in Xcode, this was a relatively quick, drag-and-drop process of laying out the UI elements on each screen. So, we were done, right?

Not really. It quickly became evident that the prototype we had built was not the best-possible Airbnb experience that we could deliver. Browsing listings was difficult on the small screen, and many interactions like saving to a wish list were complicated to carry out. Also, we couldn’t display all the details we needed for hosts and guests to make informed decisions, and trying to do so would inevitably result in information overload.

Simplify

One of our Core Values here at Airbnb is Simplify. We were reminded to apply this lesson to the Watch app by Marco Arment’s Redesigning Overcast’s Apple Watch app, where he goes into great detail about flattening unnecessarily complicated navigation layers. We stopped trying to port the iOS app, and started from scratch. This time, we had one very specific goal in mind: deliver the best-possible messaging experience.

The Apple Watch is very much a secondary device; a second screen, if you will, of the iPhone. Naturally, it follows that our Watch app should be just that: a lightweight extension of the Airbnb iOS app. We needed to capitalize on a targeted and important part of the Airbnb experience that embodied the on-the-go nature of the Apple Watch, and messaging fit that description perfectly. The Airbnb Apple Watch app we’re launching today finally began to take its form.

[Screenshot: the Airbnb Apple Watch app]

Developing with WatchKit: a game of trade-offs

Working with WatchKit is an interesting juggle of trade-offs. Because watchOS 1 apps are inherently tethered to the main app on the iPhone, there are many things to take note of when developing within the framework of the WatchKit extension. More than ever, iOS developers at all levels need to be cognizant of the effects of their code on performance and battery usage.

A question that kept coming up was the following: should a non-trivial computation, like a network request, be performed in the WatchKit extension (which, on watchOS 1, runs on the iPhone), or should the Watch wake the parent iOS app in the background and ask it to perform the computation?

The answer depends on the structure of the existing code. If complicated business logic is already baked into the iOS app, then the easiest option would be to invoke openParentApplication: on WKInterfaceController, so that code wouldn’t need to be duplicated in the Watch app. This triggers handleWatchKitExtensionRequest: in the parent app, which returns a dictionary containing the result of the desired computation to the WatchKit extension. Note that because the result needs to be serialized and transferred via Bluetooth, all of the values it contains must conform to NSCoding.
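
As a rough sketch of the Watch-side call (hypothetical interface controller and request keys; watchOS 1-era API):

    import WatchKit

    class InboxInterfaceController: WKInterfaceController {
        func fetchThreads() {
            // Wake the parent iOS app in the background and ask it to run
            // the request on the Watch app's behalf.
            WKInterfaceController.openParentApplication(["request": "threads"]) { reply, error in
                if error == nil {
                    // Everything in `reply` was serialized (NSCoding) and sent
                    // back over Bluetooth by the parent app.
                    print("Got reply: \(reply)")
                }
            }
        }
    }

On the iOS side, the parent app's corresponding application(_:handleWatchKitExtensionRequest:reply:) delegate method performs the work and calls the reply block with an NSCoding-compliant dictionary.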

In our case, because we already had a separate framework within the app, AirbnbAPI, that encapsulates the networking layer, we were able to simply import this into the WatchKit extension, and perform everything within it, without having to constantly wake the iOS app. This reduces the communication overhead between the extension and its parent, and results in cleaner code.

When deciding who should be responsible for performing a computation, it is also important to keep in mind the complexity and duration of the computation. The more complex it is, the more the Watch should defer to the iPhone’s processing power. Today on watchOS 1, the performance difference is generally not perceptible. However, on the upcoming watchOS 2, one of the biggest changes is that the WatchKit extension will be moved off the iPhone and onto the Watch. As a result, this distinction will become much clearer, because code in the extension is executed directly on the Watch. Now, would it be faster for the iPhone to carry out the request, then send the result back to the Watch through Bluetooth or for the Watch to perform it by itself and avoid the communication overhead altogether?

Thinking hard about what really matters

The simplicity and laser focus of the Apple Watch as a platform make it both a joy and a challenge to develop for. The small screen size means that not a whole lot of content can live on each screen, so a great amount of care needs to be exercised when designing for it. Before embarking on this journey, all Apple Watch developers and designers should ask themselves questions like:

If this button will occupy 30% of the pixels on the screen, is it of proportional importance?

If this function is an important action that a person can and will frequently perform, should it be hidden behind the Force Touch gesture?

If this screen is one that people will care about the most because it displays the most crucial data, should it be accessible only through a “home screen” that would otherwise give the Watch app a beautiful front face?

The effort, time, and deliberation put into seemingly small decisions like these pay dividends at the later stages of development. We learned that the more thought we put into determining what really matters, the less time our hosts and guests would waste navigating the Watch app, and the more frustration they would avoid. One of the main yardsticks for an amazing Watch app is how quickly someone can accomplish the desired task, from raising their wrist to lowering it back down.

The final product: a messaging hub

The Airbnb Apple Watch app was not designed to replace or mirror the main iOS app. It is a messaging hub: a means for hosts to respond to their guests more quickly, and for guests to get notified of important events. With rich, interactive notifications, hosts can accept a booking request right from their wrist:

[Screenshot: accepting a booking request from an interactive notification on the Watch]

A guest can then respond to a host’s welcome message the second they receive that notification on their Apple Watch, using the built-in Dictation feature:

[Screenshot: replying to a host's message using Dictation on the Watch]

We went one step further and integrated the Watch app with iOS by adding a Settings-Watch.bundle to the main iOS app target in Xcode (see the Apple Watch Programming Guide), which enables hosts and guests to enter pre-recorded quick responses in the Apple Watch settings app on their iPhone. These responses then show up every time a "Reply" button is tapped on the Watch, allowing for one-tap responses. We hope that this will help hosts who find themselves repeatedly typing responses to common questions from their guests, like "What's the wifi password?"

[Screenshot: entering quick responses in the Apple Watch settings app on the iPhone]

Every frame matters

At Airbnb, we really believe that Every Frame Matters. Because the Watch is still a young platform, we felt that many people are not yet fully clear on what the apps on their Watch can and cannot do. We wanted to make sure our hosts and guests knew how the new Watch app could help them, so we built a first-time user experience for the Watch, something we haven't seen done before.

The designers on the Watch team, Britt Nelson, Salih Abdul-Karim, and Helen Tseng, produced a handful of beautifully illustrated and animated sequences that tell people what the Watch app does, and why we think it’s a compelling addition to their Airbnb experience. The animated illustrations are intentionally bright and light-hearted to set a friendly and approachable tone for the app.

[Screenshot: the animated first-time user experience]

The time has come!

We’re extremely excited to see the Airbnb Apple Watch app go live to our hosts and guests worldwide today. It was an incredibly fun challenge over the past few months, tackling a platform where common practices haven’t really been laid out and boundaries haven’t been fully explored.

The Watch app began as a pet project, and it slowly evolved into a first-class citizen of the Airbnb mobile experience, something the whole team is very proud of. Moving forward, we hope that the focus, discipline, and attention to detail we put into creating this app will not only inspire other developers to do the same, but also motivate ourselves as we continue to improve the Airbnb experience.
