The Antidote to Bureaucracy is Good Judgement

This was originally posted on First Round Review.

Mike Curtis may not have the typical background of someone dedicated to vanquishing bureaucracy. AltaVista, AOL, Yahoo, Facebook — he’s a veteran of some legendary Silicon Valley behemoths. Now VP of Engineering for Airbnb, he’s at the helm of a small but rapidly growing team. With nearly two decades of innovation at tech giants under his belt, he’s become an expert at chopping through red tape.

As it turns out, one simple lesson guides Curtis’ approach to building effective teams: the antidote to unproductive bureaucracy is good old-fashioned judgment — having it, hiring for it, and creating conditions that allow people to exercise it. Armed with this truth, he’s tackled the challenges of scaling a world-class engineering team at Airbnb, from taming the beast of expense reports to dramatically improving site stability. And he’s done it by eliminating rules, not making them.

At First Round’s recent CTO Summit, Curtis shared actionable tactics for what he calls “replacing policy with principles” that can guide fast, flexible growth and progress. Every startup looking to dodge a fate dictated by increasing structure and processes that inevitably slow you down can benefit from these tips.

To start, Curtis’s definition of bureaucracy (when it comes to startups) isn’t one you’ll find in the dictionary:
bu•reauc•ra•cy n. 1. The sh*t that gets in your way 2. the sh*t that gets in your engineers’ way

The curious thing about organizations is that having more people somehow doesn’t equal more output. “As size and complexity of an organization increases, productivity of individuals working in that organization tends to decrease,” he says. As headcount grows, so too does the policy-and-paperwork stuff that gets in the way of rapid iteration and scale.

Why is this the case? “I think it comes down to human nature and the way we react to problems,” Curtis says. Our natural response to any problem — from a downed server to a social gaffe — is to try to ensure that it doesn’t happen again. In companies, more often than not, those solutions take the form of new policies. “What happens when you create a new policy, of course, is that you have to fit it into all of your existing rules.” And so begins a web of ever-increasing complexity that’s all about prevention. Soon, you start to hit safeguards no matter what it is you’re trying to do.

To avoid this type of bureaucracy from the very beginning of your company, you should adopt two particular tactics: “First, you have to build teams with good judgment, because you need to be able to put your trust in people,” Curtis says. “Then you shape that good judgment with strong principles.”

Build a Trustworthy Team By Hiring Trustworthy People

Minimizing rules that become roadblocks in your organization will only work if you’ve built a team that will make good decisions in the absence of rigid structure. Your hiring process is where you can take the biggest strides toward preventing bureaucracy.
“The most important question that you have to answer when you’re hiring somebody is ‘Is this person going to be energized by unknowns?’”

No company will ever achieve perfection, ever. So when things break, you want people who will be motivated by solving problems — those are the people who won’t pause to place blame, and blame is wasteful. Even if you have a honed process for screening and interviewing candidates, it’s worth revisiting how you test for culture fit to make sure this is part of it.

Too many companies and engineering leaders are willing to compromise on culture to maximize technical savvy. Do not do this. Curtis recommends allocating at least 45 minutes to an interview that is entirely about culture and character. Diversity of backgrounds and opinions is championed at Airbnb, so ‘culture fit’ is about finding people who share the high-performance work ethic and belief in the company’s mission. If people don’t share your conviction in the company’s success, they aren’t a fit.

At Airbnb, Curtis found that these four moves truly extract the most value out of this type of interview:

• Let them shine first. For the first 15 minutes of your culture interview, let a candidate describe a project they’re particularly proud of. The idea here is to get a sense of what excites them — is it technical challenges, for example, or perhaps personal interactions? “Try to suss out what gives this person energy,” Curtis says.
• Then make them uncomfortable. The other side of that coin is that you want to learn how candidates react when they’re not excited, too. Ask them about difficult experiences, or moments when they were somehow not in control. Some of Curtis’s go-to questions are: “Describe a time you really disagreed with management on something. What happened?” and “Think of a time you had to cut corners on a project in a way you weren’t proud of to make a deadline. How did you handle it?” This exercise is all about reactions. “Does the candidate start pointing fingers and say, ‘This is why I couldn’t get my job done, this is why this company is so screwed up’? Or do they start talking about how they understood another person’s point of view and collaborated on a solution?”
• Calibrate your results. It’s easy to see if someone nailed a coding challenge. It’s a lot harder to get comparable reads on candidates when you’re working with a group of different interviewers. It takes time to get on the same page, but you can help the process along. “We get all our interviewers together in a room and have them review several packets at the same time to help expedite the process of getting to some kind of calibration on what’s important to us,” Curtis says. Essentially, try to make the subjective as objective as you can.
• Watch out for signs of coaching. If a candidate seems to have uncanny command of your internal language, take note. The public domain is exploding with tips and tricks from past interviewees and journalists. “Especially as your company starts getting more popular or well-known, there’s going to be a lot of stuff about you out on the Internet. If people start quoting things to you that they obviously read in an article or something that is your own internal language, they were probably coached. They either read something or they talked to somebody who works at the company,” Curtis says. That’s not to say you should reject them immediately, just don’t let yourself be swayed.

Make the Most of First Impressions

Ideally, your culture interview ensures that you’re hiring a diverse set of people who share your beliefs and work ethic while introducing new ideas and perspectives. Once they’re in the door, you have your next key opportunity to establish shared priorities. “Your first week is the chance for you to set expectations with new engineers,” Curtis says. He’s found that adding a few key elements to the onboarding process pays off big down the road:

1. Remind new hires they’re working with the best. “I talk about how many people applied for their position so they understand how competitive it is to get into the company and that they’re working alongside great people,” Curtis says. Beyond its morale- and excitement-boosting value, this is an effective way to build a sense of urgency and ensure that new hires hit the ground with positive momentum.

2. Emphasize the value of moving fast. At Airbnb, Curtis’s direction to new engineers is to ship small things first. That can be an adjustment for people used to working on huge systems in their last gigs, but it’s proven to be a valuable way to build that all-important shared judgment. “Get a bunch of code out the door, learn how things work, then you’ll ship bigger stuff,” he says of new hires.

3. Make imperfection an asset, not a liability. Share what you were looking for all along: Someone who draws energy from unknowns. “I talk about the fact that the people who are going to be successful are the ones who see things that aren’t perfect and draw energy from them,” Curtis says. Make it clear, on the other hand, that cynicism and complaints will not be rewarded.

4. Review your engineering values. When you first start to build your engineering organization, it’s good to codify the values that will guide your actions. These can be worded any way that feels right to you. Examples may include, “be biased toward action” or “have strong opinions but hold them weakly.” Whatever you come up with, go through each one, clarifying what they mean to you and why they made the list. “Values can be open to interpretation, so it’s good to have a voice-over,” Curtis says.

5. Welcome new hires to the recruiting team. The people who just came through your interview process are going to be conducting interviews for new candidates before long, so make sure new members of your team understand that recruiting is a major and critical part of their job now. “You want them to treat it as seriously as they do writing a piece of code. They need to be really present for recruiting,” Curtis says.

6. Establish direct lines of communication. Open communication is a powerful remedy for unnecessary bureaucracy — and there’s no message more powerful than telling your team they can take concerns to upper management and meaning it. “Sometimes people feel they need to funnel all communication through their direct management. When you say that they can come to you, to the leadership several tiers up, they know that they can communicate openly across the whole organization,” Curtis says.

7. Conduct a series of initial check-ins. Good habits are established early, so don’t let up on your efforts once new engineers are at their desks. Curtis has found that one month and three months are the sweet spots for informal check-ins. “This is super lightweight. All we do is collect a couple sentences of peer feedback from the people around that new engineer,” he says.

Your questions to these team members can be very straightforward. Curtis suggests: “How are they ramping up in an unfamiliar code base?” and “What issues have they encountered and how have they reacted?”

Share the feedback you receive with the person in writing. It will be a valuable reference point for engineers as they ramp up. And if you hear any causes for concern, address them right away. Sit down with that person and clarify your expectations.

“It’s much easier to shape how someone works early on when they first start at the company than when it’s solidified a year in.”

Build the Managers You Need

When it comes to hiring and onboarding managers, though, there’s another layer to consider. These are people who are going to actively shape your company’s culture, personality and progress every day. At Airbnb, a commitment to helping managers make good decisions has manifested in an unusual policy:

“We have a philosophy that all managers start as individual contributors. We believe that if a manager doesn’t spend a significant amount of time in the code base, they’re not going to have an intuitive sense of what makes engineers move faster and what gets in their way,” Curtis says.

Unsurprisingly, this can make it more difficult to hire managers. But he’s devised a four-step process engineering leaders can use to find managers who will be best for their companies long-term:

• Set the expectation. There’s no sense in surprising a candidate with the news that they won’t inherit a team halfway through the process. “The first time I talk to a manager who has plenty of management experience and wants to work with us, I’ll tell them straight out that they’re going to start as an individual contributor,” Curtis says.
• Conduct a coding interview. After years in management, it can come as a shock to jump back into algorithms on the fly. “Managers who aren’t comfortable anymore with the discipline of engineering are very likely to wash out at this stage,” he says. “But that means that the people that come through in the end are the people who are going to be able to meaningfully contribute to the code base and understand their engineers.”
• Try pairing. Maintain realistic expectations for coding interviews. “If somebody’s been out of the code base for the last five years, they’re probably pretty rusty, and they’re not going to nail it on your algorithmic whiteboard coding question the way a new grad will,” Curtis says. Pairing can be a helpful workaround. “If you didn’t get a great technical signal from them, but you’ve got a good feeling about them as a manager, do a pairing session,” Curtis says. Give the candidate a chance to shine in the context of working with an existing employee. This usually surfaces latent knowledge and gives you a sense of their dynamic with other engineers as they navigate code.
• Give it more time than you think. The goal is to give managers a chance to really engage with the code base, so don’t rush things — six months as an individual contributor at your company is usually about right. “The point here is to give them a chance to ship something real and establish some legacy in the code before they take on management,” Curtis says.

What It Means to Replace Policies with Principles

At this point, through careful hiring and training, you’ve built a team with good judgment. So how can you leverage that to streamline how you run your organization? “Now you can start taking a more principled approach to how you govern the organization,” Curtis says. To bring this point home, he provides several examples that succeeded at Airbnb:

OLD POLICY: All expenses require pre-approval.

NEW PRINCIPLE: If you would think twice about spending this much from your own account, gut-check it with your manager.

“I can’t tell you how much pain in my life has come from expense reports,” Curtis says. Airbnb’s old policy was a cumbersome one: Charges big and small required approval before they could be submitted. So Curtis tried replacing it with a principle, simple good judgment, using $500 as a rule of thumb for when to get a gut-check. The result? No increase in discretionary spending (but a whole lot of time saved).

OLD POLICY: Engineers can’t create new backend services without approval from managers.

NEW PRINCIPLE: While working within a set of newly articulated architectural tenets — conceived by a group of senior technical leaders — engineers are free to develop backend services.

Here’s another case where policy was creating a huge amount of overhead. “You’d have to go explain what you wanted to do to your manager, explain the rationale, get them to understand, and then get them to approve and move forward,” Curtis says. So he tried something new: A group of senior engineers set up sessions to determine the architectural processes that mattered most to the organization, then articulated them in a series of architectural tenets. Guided by that document, engineers are now free to create new backend services. “It might even be okay to go outside of those architectural tenets, as long as you gut-check it with the team,” Curtis says.

The process used here ends up being even more important than the result. “It wasn’t me sending an email saying, ‘Here’s the rules by which you must create new services.’ Instead, it was a group of peers coming together,” Curtis says. “That created great social pressure within our team, which has worked incredibly well to keep us within the boundaries of what we think we should be developing with to solve our technology problems.”

Getting Changes to Stick
“I have a theory that the only way you can effect cultural change in an organization is through positive reinforcement and social pressure.”

A few years ago at Airbnb, pretty much none of the code being pushed to production was peer reviewed. The team was moving fast, but site stability was suffering. Curtis knew it was time to make peer reviews a priority — but how? “This was a decision point for me. I could have written up a big email and sent it out to the team and said, ‘You must get your code reviewed before you push to production.’ But instead we took a different approach.”

Your team’s goals may be different, but the steps that Curtis used to effect this principled change can serve as a template for any paradigm shift:

Make it possible. Before you establish a new priority, make sure it’s feasible within your current systems. “It turned out that a lot of our tooling for code reviews was extremely cumbersome and painful, so it was taking too long for people to even get a code review if they wanted one,” Curtis says. So he made sure that tooling was improved before rolling out this initiative. People can’t do what you haven’t made possible. If you don’t take this into account, they’ll be confused and resentful.

Create positive examples. Enlist a group of well-respected engineers to lead by example. In Airbnb’s case, Curtis asked a handful of senior engineers to start requesting reviews. “It created a whole bunch of examples of great code reviews that we could draw from to set examples for the team.”

Apply social pressure. All-hands meetings can be invaluable tools for advancing a culture-shifting agenda. That time together is already booked, so why not make it work for you? “We started highlighting one or two of the best code reviews from the week before,” Curtis says. “We’d have the person who got the review talk about why it was helpful for them and why this was useful.” Your best spokesperson for a new principle is a member of the team who’s already bought in.

Address stragglers. If you don’t get everyone on board on the first pass, don’t take it personally. In fact, Curtis considers converting this crowd an important final step in the process. In the case of Airbnb’s code reviews, he and his senior engineers talked to each holdout and learned what their concerns were. “Usually the end of that conversation was just ‘Give it a try for a couple of weeks, see how it goes, see if it works.’ Most of them had a very positive experience and then were brought along.”

In roughly two months, Curtis had made peer code reviews the overwhelming norm without establishing a single policy. “This is the power of positive reinforcement and social pressure to bring about cultural change in an organization. I didn’t hand down any edicts, I didn’t say ‘It has to be done this way from now on,’ I didn’t put any formal policy in place,” he says. In fact, code reviews still aren’t enforced in any way; an engineer could still go straight to production anytime — but no one does it.

At the end of the day, though, Curtis is not advocating for the unilateral elimination of all company policies. Sometimes you need rules. “A good example for us is when you’re traveling overseas, there are very specific policies about what kind of data you can have access to and what kind you can’t,” Curtis says. When the health of your organization depends on something that can’t be left open to interpretation, go ahead and make a rule — but do so sparingly.

The real trick is to recognize that a policy doesn’t exist in a vacuum — it interacts with every policy that went before it — and adds to a collective mental and documented overhead that adds up the bigger you get. You want to minimize this overhead however possible, and the easiest way to do that is to trust your team and clearly articulate your values.
“It really comes down to putting your faith in people with good judgment, making sure you hire good judgment, and then guiding them with principles.”

OpenAir is back for 2015

We are excited to announce that Airbnb will be hosting our annual tech conference, OpenAir. We’ll be hosting it on June 4th at CityView At The Metreon from 9:00am to 7:00pm.

OpenAir is the premier tech conference that focuses on creating engineering solutions to the challenges of matching. The brightest minds in the industry will come together to tackle such issues as search and discovery, trust, internationalization, mobile, and infrastructure.

We have representation from a broad swath of companies speaking at OpenAir – Netflix, Stripe, Periscope, LinkedIn, Etsy, Pinterest, Lyft, HomeJoy, Watsi, Instagram, Facebook, and Google.org.

This year we’ll have more technical talks. We’ll hear about Scaling from Instagram co-founder Mike Krieger, Innovation at Netflix from Carlos Gomez-Uribe, Reaching underserved communities from Watsi co-founder Grace Garey, and Building Periscope from Sara Haider – among many others.

Attendees will get access to technical talks, hands-on sessions, and thought-provoking discussions to help them break through their own engineering challenges and projects. Throughout the day there will be time to network with local engineers, take part in interactive sessions, drop in for lightning talks, and meet the speakers.

Registration is $50 and all proceeds from registration fees will be donated to CODE2040.

CODE2040 is a nonprofit organization that creates programs to increase the representation of Blacks and Latino/as in the innovation economy. CODE2040 believes the tech sector, communities of color, and the country as a whole will be stronger if talent from all backgrounds is included in the creation of the companies, programs, and products of tomorrow.

Please register here.

Behind the Scenes: Building Airbnb’s First Native Tablet App

At Airbnb, we’re trying to create a world where people can connect with each other and belong anywhere. Whether you’re traveling, planning a trip with friends, or lying on your couch window-shopping your next adventure, you’re most likely using a mobile device to do your connecting, booking, or dreaming on Airbnb. Our tablet users have probably been surprised to learn that thus far, we have never had a native tablet app. Last summer, a small team decided to change that. We started exploring what Airbnb could become on tablet, and today, we’re excited to share it with the world. It’s a challenging feat to build an entirely new platform while simultaneously maintaining and shipping an ever-evolving phone app, and we’ll tell you more about what went right, what went wrong, and how we ultimately made it happen.

The first little steps are pivotal

After the successful launch of our Brand Evolution last summer, we formed a small team to start laying the groundwork for tablet. In building the tablet app, we took many of the technical learnings from the rebrand. For instance, much like the rebrand, we wanted to build the app over the course of several releases and eventually release the official tablet app. A small team of designers and three engineers (two iOS, one Android) formed to start building the foundation of the app and exploring the tablet space.

We knew we couldn’t rewrite the entire phone app, and that if we ever wanted to ship, we’d have to reuse some of the views already existing on the phone. We reviewed every screen and feature of the phone app to determine the engineering and design cost of rebuilding each screen. One thing we quickly realized was that our top-level navigation system, “AirNav,” wouldn’t translate well to the tablet space. Instead, we’d have to design and build something new.

Airbnb goes tab bar

At Airbnb, we strive to have a seamless experience across platforms and to maintain feature parity no matter the device or form factor. This meant that whatever navigation system we chose for tablet would also have to work on phone. In order to quickly find a solution while covering as much ground as possible, the team split up to prototype as many navigation systems as we could. One of our designers, Kyle Pickering, even went as far as teaching himself Swift so he could build functional prototypes. At the end of the week we had several prototypes in all different forms: fully functional (albeit hacky) prototypes built from our live code, functional prototypes built in Swift with baked data, and even some Keynote and After Effects prototypes. We took these to our user research team to quickly get some real-world user feedback on the designs.

A big part of the culture at Airbnb is to move quickly and run experiments along the way, rather than waiting until the end. With a pending phone release on the horizon, we decided to build and ship the Tab Nav on phone, wrapped behind an experiment we could roll out and test on. Since the majority of the mobile team was still hard at work on building new features, we had to build the new nav quickly and quietly, in a way that would allow the rest of the team to turn it on or off at runtime without restarting the app. We launched the new nav in November 2014, which gave us several months to collect data and iterate on the high-level information architecture while we built out the tablet app.

Fun fact: up until the launch of tablet, both navigation systems were still active and could be turned on or off via experiment flag.

Dipping our toes in MVVM (kind of)

On iOS, MVC is the name of the game. We knew we were shipping a universal binary; we weren’t going to split targets or release two apps. In terms of code architecture, we worried that shipping a universal app would cause a split in our app that would become unwieldy over time. It wouldn’t take long for the codebase to become littered with split logic, copy-and-pasted code for experiments, and duplicate tracking calls. At the same time, we didn’t want to have massive view controller classes that split functionality between platforms. This required us to rethink the MVC pattern that was previously tried and true.

What we realized was that almost every model object in our data layer (Listings, Users, Wish Lists, etc.) has three UI representations: a table view cell, a collection view cell, and a view controller. Each of these representations would differ from tablet to phone, so instead of having branching logic in place everywhere these objects were used, we decided to ask the model how it preferred to be displayed. We built a view-model protocol that allows us to ask any model object for its “default representation view controller.” The model returns a fully allocated device-specific view controller to be displayed. At first, these view-model objects simply returned the phone view controller, but when we eventually started building the tablet version we simply had to change a single line of code for the tablet view controllers to be displayed app-wide. This reduced the amount of refactoring we had to do once we started building out view controllers and allowed us to focus on polishing them. It also kept all of our code-splitting checks centralized in a few classes.

Next we started moving through the existing phone controllers and pulling all of our tracking and experiment logic into shared logic controllers that would be used for both the phone and tablet views. This allowed the team to continue working on the phone, adding experiments and features that would automatically find their way onto the tablet app.

Kicking a soccer ball uphill

By January 2015, the tablet team was all-hands-on-deck, and design was in a stage where we could start building the tablet app. We had around two months to build out the app and about a month for final polish and bug fixes. Design had produced several fully working demo apps to prototype interactions and UI animations. In producing these code-driven demos, the design team was able to identify gotchas in the design long before engineering was ramped up, which made for an overall smooth development period.

There were, however, a few issues that inevitably popped up. For several scrolling pages throughout the app, design called for a lightweight scroll snapping. The idea was to have scroll views always decelerate to perfectly frame the content. This is not a new or revolutionary idea, but on a large-scale tablet device we discovered that, more often than not, this interaction annoyed the user. One user described it as being like “trying to kick a soccer ball up a hill.” Though the final results were visually pleasing, taking control from the user undermined the beauty of the design.

Instead of cutting the feature completely, we decided to take a deeper look at the problem. Previously, we were using delegate callbacks which fired when the user finished scrolling, and then adjusting the target content offset to the closest pre-computed snapping offset. We realized the problem with this system is that it doesn’t take into account the intent of the user. If a user scrolls a view and slides their finger off the screen in a tossing manner, the system works great. If a user purposefully stops scrolling and then releases touch, the scroll view snaps to the nearest point, creating the “uphill soccer ball” effect. We decided to disable scroll snapping on the fly once the velocity of the scroll dropped below a certain point, giving the user control of the scrolling experience. Achieving these small wins and being truly thoughtful about user intent helped elevate the app experience to a whole new level of delight and usability.

Always be prepared

As we crossed the finish line and landed the project (mostly) on time, we took a little time to reflect on what worked to complete such a massive project. We were reminded of the old Boy Scout mantra: “Always Be Prepared.” Even though the entire team built the tablet app in just a few short months, it wouldn’t have been possible without the foundation work that was silently laid throughout the year before. From designers learning to code and building prototypes, to shipping the tablet navigation system on phone months ahead of release, this prep work ensured that when it came time to officially move towards our goal, we were ready.

Meet The Nerds: Gary Wu

Today we’re introducing you to Gary Wu, a traveler and family man who values simplicity above most other things.

How did you get started in Computer Science?

I was a quiet kid who loved math and electronics, so it is not surprising that I fell in love with computers. As a teenager, I picked up Basic. At first it was just curiosity, but soon I came to like the ability to create something fun with little investment. Any new tip learned from a book could be applied immediately to my naive programs. Soon I was showing off my projects to parents and friends, and they gave me feedback on how to make them better. The process of continuous innovation was exciting. After graduating and working at tech companies, I found that this repeated cycle of learn => build => get feedback is not so different from my teenage experience. More importantly, the quicker the cycle runs, the faster products innovate and people grow.

What was your path to Airbnb?

I am very interested in the sharing economy and love to travel. The Internet and mobile technology have completely revamped the ways we communicate, shop, and collaborate. I believe the relationship between people and services will change just as radically in the near future, and this may have a profound long-term impact on society. I had followed and heard many impressive Airbnb stories, but I hadn’t considered changing jobs, as I was pretty happy at my previous company. Then some friends approached me and shared a lot of internal stories, especially about how Airbnb builds teams based on its core values. That impressed on me that Airbnb has a strong vision to completely change how everyone experiences the world. If Airbnb’s vision comes true, the world will be very different from today. This convinced me to give it a try and be a part of the journey. I am glad I made this career decision.

What’s the most interesting technical challenge you’ve worked on since joining?

After joining Airbnb, I focused on prototyping several early-stage product ideas with the potential to significantly expand our market in the future. There are two sets of challenges: one on the product side and one on the infrastructure side.

On the product side, we want to maximize our learning with minimum effort, so we develop MVPs (Minimum Viable Products). Building MVPs is easy to say but hard to execute well, because if a product is not appreciated by customers, it is difficult to tell whether the cause was insufficient engineering effort or the wrong idea. As a result, we may over-emphasize engineering execution instead of reevaluating the idea.

On the infrastructure side, there are also many pitfalls. Building an MVP can unfortunately introduce technical debt that is hard to extend or scale in the long run, especially under a complex business flow. Furthermore, unlike more mature technology companies, which have abundant resources and can easily handle a sudden 5-10X load increase, Airbnb’s infrastructure is still at an early stage and doesn’t have enough cushion to deal with unusual resource usage patterns. Therefore, we have to be very thoughtful in developing MVPs, eliminate any possible threats to the system, and try to minimize any long-term debt.

In summary, it is a really enjoyable and fast learning process. Airbnb has plenty of these opportunities because there are so many areas to explore in the travel space.

What do you want to work on next?

I would like to enable more micro-entrepreneurs to create services on top of Airbnb. We are grateful that many hosts leverage Airbnb to rent out their extra space to travelers, and that is only the beginning of changing how we experience the world. To provide a magical experience for each traveler, there are a lot of opportunities for local people to participate, and Airbnb can be the perfect platform for them to contribute to the travel industry.

What is your favorite core value, and how do you live it?

Simplify. I like a simplified way of thinking and doing things. First, simplification drives me to provide a simple “interface” to others. When giving a presentation, whether it is 5 minutes or 30 minutes, I push myself to leave the audience with a single one-sentence takeaway. When discussing a comprehensive system design topic, I try to summarize my points in a couple of bullets. When writing programs, I do my best to make the function names and execution flow so intuitive that other engineers can pick them up easily. Second, simplification keeps me focused. It is important to be productive, but to me it is more important to avoid doing irrelevant things. Finally, simplification helps achieve better software quality in the long run. I like to challenge myself to avoid unnecessary complexity for marginal improvements, and instead seek architectural simplicity that allows future extension with a potential 10X improvement.

What’s your favorite Airbnb experience?

Earlier this year I spent a few days with my family living in an amazing cabin inside Sequoia National Park. The location of the cabin put us right up against nature: we were surrounded by great mountain views, hundred-year-old giant pine trees, and beautiful stony creeks. We also lived in a completely pre-Internet world, as there was no cable and the closest village with cellular signal was an hour’s drive away. We had a lot of fun hiking, climbing, and taking photographs. On the last day, an unexpected snowstorm turned the entire mountain white, and our cabin in the snowy mountains was exactly my childhood fantasy. It was a magical experience that a traditional hotel probably could not offer.

Introducing AirMapView

Many Android applications today require some form of an interactive map as part of their user interface. Google provides a native package and experience with Google Play Services to satisfy this need, but the question remains of how one creates interactive maps for devices without Play Services.

In some countries, the majority of devices are sold without Google Play Services. Device manufacturers who ship their devices without Play Services are continuing to gain popularity worldwide. In order for our application to provide a truly internationalized experience, we can’t leave out a feature as critical as maps. And because we know other companies have this same issue, we’ve created and open sourced AirMapView.

AirMapView is a view abstraction that enables interactive maps for devices with and without Google Play Services. Devices that have Google Play Services will use Google Maps V2, while devices without will use a web-based implementation of Google Maps. All of this comes as a single API modeled after the Google Maps V2 API that most developers are used to.

By default, AirMapView chooses the best map provider available for the device: native Google Maps V2 when it is available, falling back to a WebView solution when Google Play Services is not. The API is designed to be completely transparent, so developers can use the same APIs they currently use for Google Maps and gain the fallback functionality.

The native GoogleMap is implemented as a Fragment inside the AirMapView, providing exactly the same functionality as using Google Maps V2 directly. Porting an existing implementation from GoogleMap to AirMapView is as simple as replacing calls to GoogleMap with calls to AirMapView and implementing the correct callback classes for operations such as OnCameraChanged. The API is designed to be pluggable, so developers can add their own providers for specific devices, such as Amazon Maps for Amazon Kindle Fire devices.

The fallback web map displays a Google Map inside an Android WebView and uses JavaScript bridge callbacks to allow dynamic interaction with the map. Because it is a WebView rather than native code, it isn’t as performant as the native GoogleMap, but in experiments in the Airbnb app it performed only slightly worse.

Using the JavaScript bridge, we are able to implement the same API in the web map, so no client code changes are required to support the web map once AirMapView has been implemented for native maps.

The web map allows setting a location, centering, adding markers, dragging, tapping on the map and other common operations that are currently supported in the GoogleMap.

We’ve built AirMapView in a way that allows us to easily add additional map providers in the future, such as Amazon Maps V2, Baidu, Mapbox, etc.

For more information, take a look at our GitHub page: https://github.com/airbnb/airmapview

Android Frameworks from Airbnb and Square

We wrapped up the month of March with the theme of Android development. It was exciting to host Pierre-Yves Ricau from Square and feature Airbnb engineers Eric Petzel and Nick Adams.

Mapstraction

Many mobile applications today require some form of an interactive map as part of their user interface. Google provides a native package and experience with Google Play Services to satisfy this need, but the question remains of how one creates interactive maps for devices without Play Services.

In some countries, the majority of devices are sold without Google Play Services. Device manufacturers who ship their devices without Play Services are continuing to gain popularity worldwide. We have built a package that solves these problems, letting developers remain agnostic of manufacturer while providing a consistent map experience to users.

Speaker Bio

Eric Petzel: Software Engineer on the Android team at Airbnb, where he focuses on building features for our hosts and guests, as well as tools to share with the Android community. Prior to Airbnb, he worked at Skype on their Android client.

Nick Adams: Software Engineer on the Android team at Airbnb. He focuses on building features that improve the quality of the app and expanding to new form factors. Before Airbnb he was a student at the University of British Columbia in Canada.

Crash Fast: Square’s approach to Android crashes

The Square Register Android app has few crashes. Getting there requires a systematic approach: coding defensively, gathering information, measuring impact and improving architecture.

This talk presents our concrete steps towards lowering the crash rate, from the general philosophy to the tools we use, together with real crash examples.

Speaker Bio

Pierre-Yves Ricau: Android baker at Square. I started having fun with Java & Android as a consultant in Paris, then joined a startup in Barcelona and finally joined Square in San Francisco to work with some of the best engineers in the world. I like good wine & low entropy code.

How Airbnb uses machine learning to detect host preferences

At Airbnb we seek to match people who are looking for accommodation – guests – with those looking to rent out their place – hosts. Guests reach out to hosts whose listings they wish to stay in; however, a match succeeds only if the host also wants to accommodate the guest.

I first heard about Airbnb in 2012 from a friend. He offered his nice apartment on the site when he traveled to see his family during our vacations from grad school. His main goal was to fit as many booked nights as possible into the 1-2 weeks when he was away. My friend would accept or reject requests depending on whether or not the request would help him to maximize his occupancy.

About two years later, I joined Airbnb as a Data Scientist. I remembered my friend’s behavior and was curious to discover what affects hosts’ decisions to accept accommodation requests and how Airbnb could increase acceptances and matches on the platform.

What started as a small research project resulted in the development of a machine learning model that learns our hosts’ preferences for accommodation requests based on their past behavior. For each search query that a guest enters on Airbnb’s search engine, our model computes the likelihood that relevant hosts will want to accommodate the guest’s request. Then, we surface likely matches more prominently in the search results. In our A/B testing the model showed about a 3.75% increase in booking conversion, resulting in many more matches on Airbnb. In this blog post I outline the process that brought us to this model.

What affects hosts’ acceptance decisions?

I kicked off my research into hosts’ acceptances by checking if other hosts maximized their occupancy like my friend. Every accommodation request falls in a sequence or in a window of available days in the calendar, such as on April 5-10 in the calendar shown below. The gray days surrounding the window are either blocked by the host or already booked. If accepted and booked, a request may leave the host with a sub-window before the check-in date (check-in gap — April 5-7) and/or a sub-window after the check-out (check-out gap — April 10).
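
To make the gap arithmetic concrete, here is a tiny sketch of that bookkeeping; the helper is hypothetical, and the request dates of April 8-10 are inferred from the gaps quoted above rather than stated in the post:

```python
from datetime import date

def calendar_gaps(window_start, window_end_exclusive, check_in, check_out):
    """Open nights before check-in and after check-out, given the window of
    available days surrounding the request (end date exclusive)."""
    check_in_gap = (check_in - window_start).days
    check_out_gap = (window_end_exclusive - check_out).days
    return check_in_gap, check_out_gap

# The April 5-10 window from the example: a stay checking in April 8 and
# checking out April 10 leaves a 3-night gap before and a 1-night gap after.
print(calendar_gaps(date(2015, 4, 5), date(2015, 4, 11),
                    date(2015, 4, 8), date(2015, 4, 10)))  # (3, 1)
```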

A host looking to have a high occupancy will try to avoid such gaps. Indeed, when I plotted hosts’ tendency to accept over the sum of the check-in gap and the check-out gap (3 + 1 = 4 in the example above), as in the next plot, I found the effect that I expected to see: hosts were more likely to accept requests that fit well in their calendar and minimize gap days.

But do all hosts try to maximize occupancy and prefer stays with short gaps? Perhaps some hosts are not interested in maximizing their occupancy and would rather host occasionally. And maybe hosts in big markets, like my friend, are different from hosts in smaller markets.

Indeed, when I looked at listings from big and small markets separately, I found that they behaved quite differently. Hosts in big markets care a lot about their occupancy — a request with no gaps is almost 6% likelier to be accepted than one with 7 gap nights. For small markets I found the opposite effect; hosts prefer to have a small number of nights between requests. So, hosts in different markets have different preferences, but it seems likely that even within a market hosts may prefer different stays.

A similar story revealed itself when I looked at hosts’ tendency to accept based on other characteristics of the accommodation request. For example, on average Airbnb hosts prefer accommodation requests that are at least a week in advance over last minute requests. But perhaps some hosts prefer short notice?

The plot below looks at the dispersion of hosts’ preferences for last minute stays (less than 7 days) versus far in advance stays (more than 7 days). Indeed, the dispersion in preferences reveals that some hosts like last minute stays better than far in advance stays — those in the bottom right — even though on average hosts prefer longer notice. I found similar dispersion in hosts’ tendency to accept other trip characteristics like the number of guests, whether it is a weekend trip etc.

All these findings pointed to the same conclusion: if we could promote in our search results hosts who would be more likely to accept an accommodation request resulting from that search query, we would expect to see happier guests and hosts and more matches that turned into fun vacations (or productive business trips).

In other words, we could personalize our search results, but not in the way you might expect. Typically personalized search results promote results that would fit the unique preferences of the searcher — the guest. At a two-sided marketplace like Airbnb, we also wanted to personalize search by the preference of the hosts whose listings would appear in the search results.

How to model host preferences?

Encouraged by my findings, I joined forces with another data scientist and a software engineer to create a personalized search signal. We set out to associate hosts’ prior acceptance and decline decisions with the following characteristics of the trip: check-in date, check-out date and number of guests. By adding host preferences to our existing ranking model capturing guest preferences, we hoped to enable more and better matches.

At first glance, this seems like a perfect case for collaborative filtering – we have users (hosts) and items (trips) and we want to understand the preference for those items by combining historical ratings (accept/decline) with statistical learning from similar hosts. However, the application does not fully fit in the collaborative filtering framework for two reasons.

  • First, no two trips are ever identical because behind each accommodation request there is a different guest with a unique human interaction that influences the host’s acceptance decision. This results in accept/decline labels that are noisier than, for example, the ratings of a movie or a song like in many collaborative filtering applications.
  • Taking this point one step further, a host can receive multiple accommodation requests for the same trip with different guests at different points in time and give those requests conflicting votes. A host may accept last minute stays that start on a Tuesday 2 out of 4 times, and it remains unclear whether the host prefers such stays.

With these points in mind, we decided to massage the problem into something resembling collaborative filtering. We used the multiplicity of responses for the same trip to reduce the noise coming from the latent factors in the guest-host interaction. To do so, we considered hosts’ average response to a certain trip characteristic in isolation. Instead of looking at the combination of trip length, size of guest party, size of calendar gap and so on, we looked at each of these trip characteristics by itself.

With this coarser structure of preferences we were able to resolve some of the noise in our data as well as the potentially conflicting labels for the same trip. We used the mean acceptance rate for each trip characteristic as a proxy for preference. Still our data-set was relatively sparse. On average, for each trip characteristic we could not determine the preference for about 26% of hosts, because they never received an accommodation request that met those trip characteristics. As a method of imputation, we smoothed the preference using a weight function that, for each trip characteristic, averages the median preference of hosts in the region with the host’s preference. The weight on the median preference is 1 when the host has no data points and goes to 0 monotonically the more data points the host has.
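
Purely as an illustration of that construction (column names, the exact shape of the weight function, and the smoothing constant k are assumptions, not Airbnb's actual implementation), the per-characteristic preferences and their median smoothing might look like this:

```python
import pandas as pd

# responses: one row per answered accommodation request, with hypothetical
# columns host_id, market, characteristic (e.g. "last_minute"), bucket
# (e.g. True/False) and accepted (0/1).
def smoothed_host_preferences(responses: pd.DataFrame, k: float = 5.0) -> pd.DataFrame:
    # Mean acceptance rate per host for each trip-characteristic bucket,
    # used as a proxy for that host's preference.
    prefs = (responses
             .groupby(["market", "host_id", "characteristic", "bucket"])["accepted"]
             .agg(pref="mean", n="count")
             .reset_index())

    # Median preference of hosts in the same market for the same bucket.
    medians = (prefs
               .groupby(["market", "characteristic", "bucket"])["pref"]
               .median()
               .rename("market_median")
               .reset_index())
    prefs = prefs.merge(medians, on=["market", "characteristic", "bucket"])

    # Weight on the market median: 1 when a host has few or no observations
    # for the bucket, decreasing monotonically toward 0 as n grows.
    w = k / (k + prefs["n"])
    prefs["smoothed_pref"] = w * prefs["market_median"] + (1 - w) * prefs["pref"]
    return prefs
```

Hosts who never received a request with a given characteristic have no row here; in line with the text, they would simply be assigned the market median (a weight of 1).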

Using these newly defined preferences we created predictions for host acceptances using a L-2 regularized logistic regression. Essentially, we combine the preferences for different trip characteristics into a single prediction for the probability of acceptance. The weight the preference of each trip characteristic has on the acceptance decision is the coefficient that comes out of the logistic regression. To improve the prediction, we include a few more geographic and host specific features in the logistic regression.
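
A compact sketch of that final step follows, assuming each training row is one historical request with its smoothed preference features and a few host and geography features already joined in; all feature names and the regularization strength are illustrative:

```python
from sklearn.linear_model import LogisticRegression

# Illustrative feature names: smoothed preferences for each trip
# characteristic plus geographic and host-specific features.
FEATURES = ["pref_gap_days", "pref_last_minute", "pref_num_guests",
            "pref_weekend", "host_response_rate", "market_size"]

def train_acceptance_model(train_df):
    # L2-regularized logistic regression; each coefficient is the weight a
    # trip-characteristic preference carries in the acceptance prediction.
    model = LogisticRegression(penalty="l2", C=1.0)
    model.fit(train_df[FEATURES], train_df["accepted"])
    return model

def acceptance_probability(model, candidates_df):
    # Predicted probability that each candidate host accepts the searched trip.
    return model.predict_proba(candidates_df[FEATURES])[:, 1]
```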

This flow chart summarizes the modeling technique.

We ran this model on segments of hosts on our cluster using a user-defined function (UDF) on Hive. The UDF is written in Python; its inputs are accommodation requests, hosts’ responses to them, and a few other host features. Depending on the flag passed to it, the UDF either builds the preferences for the different trip characteristics or trains the logistic regression model using scikit-learn.
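
The production script itself isn't shown in the post; as a rough sketch of the flag-driven structure it describes, a Hive streaming (TRANSFORM) script in Python might be organized like this, with all column positions hypothetical and the stage bodies reduced to placeholders:

```python
#!/usr/bin/env python
"""Hypothetical sketch of a Hive TRANSFORM script: rows arrive tab-separated
on stdin, segmented by host, and a flag selects which stage to run."""
import sys
from itertools import groupby

def build_preferences(rows):
    # Placeholder aggregation: overall acceptance rate per host stands in for
    # the real per-characteristic preference construction.
    for host_id, host_rows in groupby(rows, key=lambda r: r[0]):
        accepts = [int(r[-1]) for r in host_rows]
        yield host_id, sum(accepts) / float(len(accepts))

def train_model(rows):
    # Placeholder for the scikit-learn logistic regression training stage.
    raise NotImplementedError("training stage omitted in this sketch")

def main():
    mode = sys.argv[1] if len(sys.argv) > 1 else "preferences"
    rows = (line.rstrip("\n").split("\t") for line in sys.stdin)
    stage = build_preferences if mode == "preferences" else train_model
    for out in stage(rows):
        sys.stdout.write("\t".join(map(str, out)) + "\n")

if __name__ == "__main__":
    main()
```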

Our main off-line evaluation metric for the model was mean squared error (MSE), which is more appropriate in a setting when we care about the predicted probability more than about classification. In our off-line evaluation of the model we were able to get a 10% decrease in MSE over our previous model that captured host acceptance probability. This was a promising result. But, we still had to test the performance of the model live on our site.

Experimenting with the model

To test the online performance of the model, we launched an experiment that used the predicted probability of host acceptance as a significant weight in our ranking algorithm that also includes many other features that capture guests’ preferences. Every time a guest in the treatment group entered a search query, our model predicted the probability of acceptance for all relevant hosts and influenced the order in which listings were presented to the guest, ranking likelier matches higher.

We evaluated the experiment by looking at multiple metrics, but the most important one was the likelihood that a guest requesting accommodation would get a booking (booking conversion). We found a 3.75% lift in our booking conversion and a significant increase in the number of successful matches between guests and hosts.

After concluding the initial experiment, we made a few more optimizations that improved conversion by approximately another 1% and then launched the experiment to 100% of users. This was an exciting outcome for our first full-fledged personalization search signal and a sizable contributor to our success.

Conclusions

First, this project taught us that in a two-sided marketplace personalization can be effective on the buyer as well as the seller side.

Second, the project taught us that sometimes you have to roll up your sleeves and build a machine learning model tailored to your own application. In this case, the application did not quite fit the collaborative filtering framework, and a multilevel model with host fixed effects was too computationally demanding and not suited to a sparse data set. While building our own model took more time, it was a fun learning experience.

Finally, this project would not have succeeded without the fantastic work of Spencer de Mars and Lukasz Dziurzynski.

The post How Airbnb uses machine learning to detect host preferences appeared first on Airbnb Engineering.

Overcoming Missing Values In A Random Forest Classifier http://nerds.airbnb.com/overcoming-missing-values-in-a-rfc/ http://nerds.airbnb.com/overcoming-missing-values-in-a-rfc/#comments Tue, 07 Apr 2015 18:02:05 +0000 http://nerds.airbnb.com/?p=176594836

No Strangers

Airbnb is trying to build a world where people can belong anywhere and there are no strangers. This helps hosts feel comfortable opening their homes and guests be confident traveling around the globe to stay with people they have never met before.

While almost all members of the Airbnb community interact in good faith, there is an ever-shrinking group of bad actors that seek to take advantage of the platform for profit. This problem is not unique to Airbnb: social networks battle attempts to spam or phish users for their details; ecommerce sites try to prevent the use of stolen credit cards. The Trust and Safety team at Airbnb works tirelessly to remove bad actors from the Airbnb community and to help make the platform a safer and more trustworthy place to experience belonging.

Missing Values In A Random Forest

We can train machine learning models to identify new bad actors (for more details see the previous blog post Architecting a Machine Learning System for Risk). One particular family of models we use is Random Forest Classifiers (RFCs). A RFC is a collection of trees, each independently grown using labeled and complete input training data. By complete we mean explicitly that there are no missing values, i.e. no NULL or NaN values. In practice, however, the data can often have (many) missing values. In particular, very predictive features do not always have values available, so they must be imputed before a random forest can be trained.

Typically, random forest methods/packages encourage two ways of handling missing values: a) drop data points with missing values (not recommended); b) fill in missing values with the median (for numerical features) or the mode (for categorical features). While a) throws away available information by dropping data points, b) can sometimes paint with too broad a brush for data sets with many gaps and significant structure.
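
Option b) is straightforward, for instance with pandas (the column handling below is illustrative):

```python
import pandas as pd

def impute_median_mode(df):
    """Fill numerical columns with the column median and categorical columns
    with the most frequent value (the mode)."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out
```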

There are alternative techniques for dealing with missing values, but most of them are computationally expensive, e.g. repeated iterations of random forest training to compute proximities. What we propose in this post is a one-step pre-computation method: normalise the features to construct a distance metric, then fill in each missing value with the median of its k nearest neighbours.

All of the features in our fraud prediction models fall into two types: a) numerical and b) categorical. Boolean features can be thought of as a special case of categorical features. Since we work in the business of fraud detection, our labels are binary: 0 if the data point is not fraud and 1 if the data point is fraud. Below are some feature transformations we wish to compare for missing value treatment.

Transformations

  1. Let's first deal with numerical features. Write a numerical feature as a random variable \(X\) taking values in \(\mathbb{R}\). And denote the data labels by the random variable \(Y\) taking values in \(\{0,1\}\). Then for \(X=x\) the transformation \(F_n:\mathbb{R} \rightarrow [0,1]\) defined by$$
    x \mapsto F_n(x) := \mathbb{P}[X \leq x | Y=1]
    $$has the following properties:

    i. \(F_n\) is order preserving, i.e. if \(a \leq b\) then \(F_n(a) \leq F_n(b)\)
    ii. \(F_n\) is invertible, i.e. given a value \(y=F_n(x)\) we can find \(x=F_n^{-1}(y)\) (follows from i)
    iii. \(F_n(X|{Y=1}) \sim U[0,1]\), i.e. \(F_n(X|{Y=1})\) is uniformly distributed on \([0,1]\).

     

  2. Similarly, consider a categorical feature represented by random variable \(X\) taking values in finite set \(\Omega = \{a,b,…\}\). Take ordering \(\preceq\) on \(\Omega\) according to the rate of fraud for each categorical value. Then for categorical \(X=x\) we can define the transformation \(F_c:\Omega \rightarrow [0,1]\) equivalently by$$
    x \mapsto F_c(x) := \mathbb{P}[X \preceq x | Y=1]
    $$which also has properties i), ii) and iii) from above with respect to the ordering \(\preceq\).
  3. A more common method for transforming a numerical feature \(X\) to the unit interval is given by \(G_n\) where$$
    x \mapsto G_n(x) := \mathbb{P}[X \leq x]
    $$which also has properties i) and ii); for iii), it is the unconditional distribution that becomes uniform, i.e. \(G_n(X) \sim U[0,1]\), rather than the fraud-conditional one.
  4. We can adapt the popular method \(g(x) = \mathbb{P}[Y=1 | X=x]\) for transforming a categorical feature \(X\) to a numerical value to give us \(G_c:\Omega \rightarrow [0,1]\) defined by$$
    x \mapsto G_c(x) := [ g(x) - \min g(x) ] / [\max g(x) - \min g(x)]
    $$The conditional probability transform \(g\) intuitively makes more sense, but violates properties i), ii), and iii) above. And without invertibility of \(G_c\) we cannot in the RFC easily (or practically) distinguish between two different categorical values \(a\) and \(b\) that have \(G_c(a) \approx G_c(b)\).

Interpretation

The aim of the scaling transforms \(F_n\) and \(F_c\) is twofold: first, to make the transformation invertible so that no information is lost; second, to distribute the fraud data points uniformly over the interval [0,1] for each feature. If we think of the data as points in \(N\)-dimensional space, where \(N\) is the number of features, then the distance in each dimension between two data points becomes comparable. By comparable we mean that an interval of length 0.4 in the first dimension contains (in expectation) twice as many fraud data points as an interval of length 0.2 in the second dimension. This enables better construction of distance metrics to identify fraud.
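
A minimal sketch of \(F_n\) and \(F_c\), assuming numpy/pandas inputs and fitting on training data (a hypothetical implementation, not the production code):

```python
import numpy as np
import pandas as pd

def fit_numerical_transform(x, y):
    """F_n: empirical CDF of the feature restricted to fraud points (y == 1)."""
    fraud_values = np.sort(x[y == 1])
    n = max(len(fraud_values), 1)
    def transform(v):
        # estimate of P[X <= v | Y = 1]
        return np.searchsorted(fraud_values, v, side="right") / n
    return transform

def fit_categorical_transform(x, y):
    """F_c: order categories by fraud rate, then take the fraud-conditional
    CDF with respect to that ordering."""
    x, y = pd.Series(x), pd.Series(y)
    fraud_rate = y.groupby(x).mean()                     # P[Y = 1 | X = a]
    order = fraud_rate.sort_values().index               # the ordering "preceq"
    fraud_mass = x[y == 1].value_counts().reindex(order).fillna(0)
    cdf = (fraud_mass.cumsum() / max(fraud_mass.sum(), 1)).to_dict()
    return lambda v: cdf.get(v, 0.0)                     # P[X preceq v | Y = 1]
```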

Imputation Using K-Nearest Neighbours

Suppose there are \(N\) features \(X_1,X_2,…,X_N\) and data points \(\mathbf{a},\mathbf{b},…\in \mathbf{D}\). After we have transformed the \(i\)th feature, by \(F_i\) say, for \(i=1,2,…,N\) we can construct a distance metric \(d\) between any two data points \(\mathbf{a}=(a_1,a_2,…,a_N)\) and \(\mathbf{b}=(b_1,b_2,…,b_N)\) as follows:

$$
d(\mathbf{a},\mathbf{b};\lambda) := \sum_{i=1}^{N} \left[ \lambda + \mathbf{1}_{\{a_i \neq \text{NULL},\, b_i \neq \text{NULL}\}} \left( |F_i(a_i)-F_i(b_i)| - \lambda \right) \right]
$$

where \(\lambda\) is a pre-chosen constant which determines how to weight the distance between two feature values when at least one of them is missing. By increasing \(\lambda\) we push out neighbours of a data point that have many missing values. Then, for a data point \(\mathbf{a}\) with missing value \(a_j\), say, we calculate the nearest neighbours missing value (NNMV) as:

$$
m(a_j;d,k) = \text{median} ( [\text{argmin}^{(k)}_{\mathbf{b} \in \mathbf{D}} d(\mathbf{a},\mathbf{b};\lambda)]_j )
$$

where \(k\) denotes how many nearest neighbours we wish to use for calculating the median value of the \(j\)th feature. In other words, find the \(k\) closest neighbours and then, in the \(j\)th dimension, take the median of their \(k\) values.
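
Put together, a brute-force sketch of the distance and the NNMV imputation might look as follows, assuming the features have already been mapped into [0,1] by the \(F_i\) and missing values are marked with NaN:

```python
import numpy as np

def distance(a, b, lam):
    """d(a, b; lambda): per feature, |F_i(a_i) - F_i(b_i)| when both values
    are present, and a flat penalty lambda when either one is missing."""
    both_present = ~np.isnan(a) & ~np.isnan(b)
    per_feature = np.where(both_present, np.abs(np.nan_to_num(a) - np.nan_to_num(b)), lam)
    return per_feature.sum()

def nnmv_impute(data, lam=0.5, k=100):
    """Fill each missing value with the median of that feature over the k
    nearest neighbours under the distance above. O(n^2), so a sketch rather
    than a production implementation."""
    filled = data.copy()
    for i, row in enumerate(data):
        missing = np.isnan(row)
        if not missing.any():
            continue
        dists = np.array([distance(row, other, lam) for other in data])
        dists[i] = np.inf                        # exclude the point itself
        neighbours = data[np.argsort(dists)[:k]]
        for j in np.where(missing)[0]:
            vals = neighbours[:, j]
            vals = vals[~np.isnan(vals)]
            if vals.size:
                filled[i, j] = np.median(vals)
    return filled
```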

Experiment

In order to see the effect of the above feature transforms, we use the adult dataset from the UCI Machine Learning Repository and assess the performance of the model under different feature transformations and proportions of missing values. The dataset contains 32,561 rows and 14 features, of which 8 are categorical and the remaining 6 are numerical. The boolean labels indicate whether the adult’s income is greater than or less than $50k per annum. We divide the dataset into a training set and a test set in the ratio of 4 to 1.

For the first experiment we compare different models using the following methodology (a code sketch of one trial follows the list):

  1. Use the same training and test data set split
  2. Remove \(M\)% of values from the training and test data sets
  3. Either calculate the median/mode for the missing values or use \(m(.;d(.,.;\lambda),100)\) from the training set
  4. Fill in the missing values in both training and test set using the previous step’s calculations
  5. Train with the same number of trees (100)
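
A compact sketch of one trial of that loop (illustrative only; `transform` and `impute` stand for whichever combination of feature transforms and median/NNMV imputation is under test):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def run_trial(X_train, y_train, X_test, y_test, M, transform, impute, seed=0):
    """Steps 1-5 above: knock out M% of values, transform and impute using the
    training set, train a 100-tree RFC, and score by AUC. X_train / X_test are
    assumed to be pandas DataFrames."""
    rng = np.random.default_rng(seed)
    def knock_out(X):
        mask = pd.DataFrame(rng.random(X.shape) < M / 100.0,
                            index=X.index, columns=X.columns)
        return X.mask(mask)
    X_train, X_test = knock_out(X_train), knock_out(X_test)
    X_train, X_test = transform(X_train, X_test)   # e.g. F_n / F_c, fit on train, applied to both
    X_train, X_test = impute(X_train, X_test)      # median/mode or NNMV, computed from the training set
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_train, y_train)
    return roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
```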

Observe that \(M\) and \(\lambda\) are unspecified parameters – we will loop over different values of these during experimentation. The performance of each model will be judged using the Area Under the Curve (AUC) score, which measures the area under the Receiver Operating Characteristic (ROC) curve (a plot of the true positive rate against the false positive rate). We will test the following nine models:

  0. Baseline model (no feature transformation; missing value imputation using the median/mode)
  1. \(G_n\) and \(G_c\) with median imputation
  2. \(F_n\) and \(F_c\) with median imputation
  3. \(G_n\) and \(G_c\) with NNMV imputation
  4. \(G_n\) and \(F_c\) with NNMV imputation
  5. \(F_n\) and \(G_c\) with NNMV imputation
  6. \(O_n\) and \(F_c\) with NNMV imputation
  7. \(F_n\) and \(O_c\) with NNMV imputation
  8. \(F_n\) and \(F_c\) with NNMV imputation

where \(O_n(x) = O_c(x) := 0\), i.e. the null map that takes all of a feature’s values to zero. The \(O_n\) and \(O_c\) maps help us verify that both the numerical and categorical features are contributing to the model.

In the second experiment we compare model 8) to model 0) for different combinations of number of trees and number of nearest neighbours.

Results

Models Comparison

First we consider how the models perform for different values of \(M\) and \(\lambda\) with a fixed number of trees (100) in the RFC training process and fixed number of nearest neighbours (100) for NNMV imputation.

[Figure rfc_mis_vals_cdf_transforms_13_0: AUC of each model against \(\lambda\), one panel per missing-value percentage \(M\)]

The graphs above display interesting patterns, some intuitive and some surprising:

  1. As we increase the percentage of missing values, the performance of the baseline model deteriorates
  2. In the scenario where no missing values are added (top left graph) but the data set still has some missing values, the improvement from all models is roughly the same at 0.007
  3. For all scenarios, the models with median imputation (red and yellow lines) perform very similarly to each other and sometimes over 0.005 worse than the NNMV models
  4. Across all scenarios the ‘\(F_n\) and \(F_c\) + NNMV’ graph (light brown line) outperforms the median imputation models, with the improvement increasing as the percentage of missing values increases
  5. Across all scenarios the ‘\(F_n\) and \(F_c\) + NNMV’ graph (light brown line) outperforms the other NNMV graphs, by up to 0.003 in some cases (bottom right graph)
  6. Across all scenarios the \(O_n\) and \(O_c\) graphs (light blue and pink lines) reduce the power of the model considerably – as we would expect – so much so that they fall outside the range of the graphs
  7. The results do not appear to be very sensitive to \(\lambda\) as long as \(\lambda>0\).

Robustness Checks

Having observed the outperformance of model 8) over the other candidates, we next check how model 8) compares to the baseline as we vary i) the number of trees in the RFC training and ii) the number of nearest neighbours in the NNMV imputation. We take the fourth scenario above – where 60% of values are missing – and we choose \(\lambda = 0.5\).

[Figure rfc_mis_vals_cdf_transforms_16_1: model 8) vs the baseline; AUC as the number of nearest neighbours and number of trees vary (left), and training time (right)]

The left-hand plot does not suggest that the performance of model 8) improves with the number of nearest neighbours used in the NNMV imputation. However, there is a consistent pattern of improved performance as the number of trees increases, plateauing after about 100 trees. The right-hand plot shows how much faster it is to train a RFC with the transformed feature set. This is to be expected: instead of exploding categorical features into many binary features, as in the baseline model, we keep the number of features fixed in model 8).

Efficiency Implications

Consider the ROC curves for one of the scenarios above, say, where the number of trees is 100 and the number of nearest neighbours used is 100.

[Figure rfc_mis_vals_cdf_transforms_19_0: ROC curves for model 8) and the baseline, with 100 trees and 100 nearest neighbours]

The improvement of the ROC curve suggests, for example, that holding recall fixed at 80%, the false positive rate falls from 26% to 24%. Suppose each day we are scoring 1 million events, 99% of which are non-fraud, each flagged event needs to be manually reviewed by a human, and each review takes 10 seconds (i.e. 6 reviews per minute). Then that two percentage point decrease in the false positive rate saves reviewing 1,000,000 x 0.99 x 0.02 = 19,800 events, or 19,800 / (6 x 60) = 55 hours of reviewing per day! This is why even improvements in the second or third decimal place of the AUC score of a RFC can have a dramatic effect on a department’s efficiency.

The post Overcoming Missing Values In A Random Forest Classifier appeared first on Airbnb Engineering.

Meet The Nerds: Barbara Raitz http://nerds.airbnb.com/meet-nerds-barbara-raitz/ http://nerds.airbnb.com/meet-nerds-barbara-raitz/#comments Fri, 03 Apr 2015 16:31:41 +0000 http://nerds.airbnb.com/?p=176594891

Meet Barbara Raitz! In today’s Q&A Barbara tells us about storytelling through code and how to be a magician.

How did you get started in Computer Science?

I grew up around computers and initially pursued a Computer Science degree for practical reasons. After working a few years in the industry, I discovered that I absolutely love it! There is beauty in well crafted code and elegant architectures. I love diving in deep, trying to understand the `story` of complex code, and then providing cleaner, simpler, and more powerful solutions.

If I could say anything to those considering this career path, I want to shout out that it is an amazing field! Though it can be initially daunting, there is such a rich, vast, nearly limitless set of opportunities with this skill set. I love quoting a colleague who said: “Do you want to watch the magic show, or do you want to become the magician?” I feel so lucky to have found a home where I can have such impact, while having so much fun.

What was your path to Airbnb?

I first heard of Airbnb on a podcast from The Commonwealth Club. I very quickly identified with the whole premise of Airbnb, but didn’t consider it much further until a recruiter reached out to me. Because of that initial spark, I decided to come visit the office and meet the team, and was undeniably affected by the positive energy and excitement of the place! Even though it was an introductory visit, I knew I was changed, and every subsequent visit reinforced that.

I joined Airbnb because it is a technical playground with many opportunities to dig in and make a difference; because the amazing people and culture absolutely blew me away; and because I felt a strong connection to the Airbnb mission. I am particularly drawn to the concept of unique, authentic, local travel that benefits the community, and, that in this ever virtual world, we are bringing people together again.

What’s the most interesting technical challenge you’ve worked on since joining?

I would like to share two!

Initially, I paired up with Spike Brehm to work on Rendr, a library that can re-use and render javascript code on either the client or server: original, powerful, fast — isomorphic javascript! This was a fascinating, ground-breaking, and richly challenging and rewarding project, and I am proud of our results!

From there, I moved to a completely different project that needed me most. I have spent much of my efforts diving deep into tangled legacy code and gradually moving it towards a cleaner, tested, decoupled service-oriented architecture. This process of refactoring mission critical code and data structures while still “in-flight” requires absolute attention to detail, small steps, and patience. Specific to the Calendar, it is rewarding to see powerful new features, such as seasonal availability rules, shine through.

What do you want to work on next?

Consistent with the mission of Simplify, there are other product areas that would greatly benefit from a deep dive and refactor. Each has its own challenges, and is thus richly interesting. I will likely continue supporting opportunities that deliver the biggest impact in terms of decoupling code and processes, increasing stability and maintainability, and delivering powerful new capabilities and features.

In terms of learning something new, I have my eye on React.js — I instinctively like it and want to try it out myself sometime. I have also wanted to explore and experience writing a mobile client application. Maybe in my free time =)

What is your favorite core value, and how do you live it?


I am drawn to all the core values, and absolutely love that our company truly promotes and lives them! Though hard to choose a favorite, I suppose that I best embody “Champion the Mission”. In terms of my technical role, I passionately pursue real architecture changes that I feel will have the most impact to our fast-growing team, product, and company. And then I follow through with careful, persistent, patient hard work. In terms of something bigger, I truly love how Airbnb promotes “belonging anywhere”, in being gracious and welcoming hosts, and being your individual unique self. I love how Airbnb is beneficial to local communities and real individuals. I love how this form of travel encourages people to go outside their normal bounds to perhaps discover something new. These actually map to other core values, and I promote and champion them as well!

What’s your favorite Airbnb experience?

For New Year’s 2013, I wanted to go somewhere special with my boyfriend, and we landed here; the experience was absolutely amazing! The space is truly beautiful, from the stunning landscaping and views, to the fabulous and uniquely crafted house itself. For us, it was magical. It was beautiful, relaxing, and creatively stimulating — it allowed us to both relax and go into hyper-creative mode, imagining all sorts of creative, crazy, inspiring futures — colorful ideas that are still with us today. But as any well-travelled person could tell you, the most meaningful memories are tied to the people and kindred spirits you meet along the way. Jeanne, our host, is one of the most gracious, welcoming people I’ve met. She went out of her way to make our stay uniquely special, arranging for a special boat on New Year’s Eve to go to the center of the lake to watch ALL the fireworks. But truly, she opened her heart to us, and we shared real-life stories, goals, and aspirations. It’s rare and beautiful when, unexpectedly, your life and perspective change because you went somewhere or crossed paths and made friends with someone new. This was one of those memorable experiences.

Side note, this listing won Airbnb’s Most Unique Listing award in 2014.

The post Meet The Nerds: Barbara Raitz appeared first on Airbnb Engineering.

Meet the Nerds: Phillippe Siclait http://nerds.airbnb.com/meet-nerds-phillippe-siclait/ http://nerds.airbnb.com/meet-nerds-phillippe-siclait/#comments Thu, 19 Mar 2015 19:36:02 +0000 http://nerds.airbnb.com/?p=176594831

In this Q&A we meet Phillippe Siclait. Phillippe has been with Airbnb for over 2 years and has worked on five teams since joining – clearly he likes to travel both personally and professionally!

How did you get started in Computer Science?
I started programming to make video games. When I was in elementary school, I found a book at a school book fair called “Learn to Program Basic”. I had the vague sense that programming was a thing you needed to do to make games, so I picked it up, went through all the exercises, and got hooked. In the following years, I continued to learn from all the resources I could find online. This led to programming competitions through my school and a growing interest in graphics programming. By the end of high school I was pretty set on studying either CS or Economics.

What was your path to Airbnb?
It turns out that I didn’t actually end up majoring in CS. I received a BS in Economics mostly focused on game theory and econometrics while simultaneously working in the Computer Graphics group in the Computer Science and Artificial Intelligence Lab. I decided prior to my final year of school that management consulting would help me learn how to run a business, and after an internship at the Boston Consulting Group, I accepted a full-time offer there. I learned a lot at BCG and got to do a fair amount of travel, both for work and for fun, but in the end realized that I wanted to come back to the technical side. With a couple friends, I packed my bags and moved across the country to explore what the Bay had to offer. After a few months of working independently on mobile and web projects, a friend of mine brought me over to Airbnb for a Tech Talk. It was after meeting many people, hearing about the vision of the company, and thinking about how much I wanted to see this idea spread, that I knew that I had to be here.

What’s the most interesting technical challenge you’ve worked on since joining?
Since joining Airbnb I’ve worked on many teams. I started working on Search (frontend, ranking, infrastructure, evaluation tools, and more), worked a bit on web security, and then worked on our (at the time newly formed) Discovery team. One of the problems for Discovery is recommending places and listings to our guests. I found it fascinating to think through how you could determine which locations an individual would likely be interested in and I worked with my teammates to implement the data pipelines and serving architecture for providing the recommendations to many parts of our product. It was a complex problem partly because the decisions we make while traveling are very personal and multi-faceted.

What do you want to work on next?
I’m currently working on a team that focuses on engaging people to host on Airbnb. We’re an incredibly experiment driven team and we have plans to change many parts of the product to make starting to host easier. We have a small, multidisciplinary team of designers, data scientists, a product manager and engineers who are all focused on this problem. I’m excited to see us increase the rate at which we are testing new product changes.

What is your favorite core value, and how do you live it?
Simplify. I try to simplify any code I write. Code is meant to be read by people and as a result, the simplest code is often the best code. And all of us at Airbnb live it in the product we develop. We are building a product for our guests and hosts, and a simple product leads to a better end experience.

What’s your favorite Airbnb experience?
My favorite Airbnb experience may actually have been a one-night stay in Manila earlier this year. The host had a beautiful house with wonderful, vibrant wood throughout, and he was hosting other guests who were visiting from the UK. He invited us out to dinner that night and it was fascinating to get to know these people who had lives so different from my own. Our host was an American expat who had long been in the Philippines, and the other guests were retirees who traveled the world, exploring new places and teaching motorcycle racing. The dinner was wonderful and we spent several hours afterwards chatting in his kitchen about our lives and what we had all seen of the world. In the morning our host organized our transportation back to the airport and made sure we knew that we were welcome the next time we were back in Manila.

The post Meet the Nerds: Phillippe Siclait appeared first on Airbnb Engineering.
