Airbnb Engineering (http://nerds.airbnb.com)

Netflix Algorithms Are Key to the ‘Future of Internet Television’
http://nerds.airbnb.com/netflix-algorithms-are-key-to-the-future-of-internet-television/
Fri, 24 Jul 2015


When Netflix’s 60 million subscribers log in to the streaming video service, their home page is populated with TV show and movie recommendations. The recommendations are key to Netflix’s success, as they drive two out of every three hours of video streamed.

 

The user’s home page “is where all our algorithms for recommending TV shows and movies come together,” said Carlos Gomez Uribe, Netflix VP of Innovation. In his recent OpenAir 2015 talk, Gomez Uribe said over 100 engineers are focused on developing algorithms to help Netflix meet its business goal of “inventing the future of Internet television.”

 

One Netflix algorithm organizes the entire video catalog in a personalized way for users. Another looks for similarities between all content Netflix offers. A master algorithm “looks at all the other algorithms to decide which videos make it onto a user’s home page,” said Gomez Uribe.

 

Keyword searches drive 20 percent of video streaming hours, so Netflix’s search algorithm is tied into its recommendations and other algorithms. When users search for a title Netflix doesn’t have, the search results will display recommendations for similar shows. “We try to recommend movies related to a search, even though it’s not exactly what you wanted. All this requires a large number of algorithms,” Gomez Uribe said.

 

Personalization is important because it’s more likely to drive higher engagement with Netflix content vs. simply showing a user what’s popular. When Netflix organizes videos by popularity on user home pages, the “take rate” (the percentage of suggested videos that are actually watched) is “OK,” Gomez Uribe said. “But when we personalize recommendations, the take rate goes way up.” (Go to 5:00 in the video to hear more.)

 

Algorithms also help Netflix perform long-term A/B testing on its user interface, providing alternate ways to organize and display recommendations to users, said Gomez Uribe. In turn, the A/B testing can help Netflix measure subscriber cancellation rates more effectively. Cancellations are an easier metric to track than new member sign-ups because the latter are often fueled by word of mouth—which is notoriously difficult to track.

 

A/B testing has enabled Netflix to “stand our ground” on occasion, Gomez Uribe added. In 2011, Netflix.com unveiled a new user interface. A/B test results influenced the design, as the data showed that the new look-and-feel decreased cancellations and increased hours streamed.

 

Netflix was “so proud” of the new interface that it ran a blog post about it, “New Look and Feel for the Netflix Website” (June 8, 2011). But in short order, Netflix received a considerable number of snarky comments about the new look. Wrote one displeased subscriber: “Please inform your employers that a drunken dyslexic monkey would be a more acceptable design lead for your web concepts.”

 

Data from the A/B tests told Netflix that, despite the snark, “the majority of users were better off” with the new interface. And so, rather than rolling back to the old UI, Netflix moved forward with the new one, continuing to fine-tune it along the way. (Discussion begins around 10:40 in the video.)

The post Netflix Algorithms Are Key to the ‘Future of Internet Television’ appeared first on Airbnb Engineering.

At Airbnb, Data Science Belongs Everywhere: Insights from Five Years of Hypergrowth
http://nerds.airbnb.com/scaling-data-science/
Tue, 07 Jul 2015

Five years ago, I joined Airbnb as its first data scientist.

At that time, the few people who’d even heard of the company were still figuring out how to pronounce its name, and the roughly seven-person team (depending on whether you counted that guy on the couch, the intern, and the barista at our favorite coffee shop) was still operating out of the founders’ apartment in SOMA. Put simply, it was pretty early stage.

Bringing me on was a forward-looking move on the part of our founders. This was just prior to the big data craze and the conventional wisdom that data can be a defining competitive advantage. Back then, it was a lot more common to build a data team later in a company’s lifecycle. But they were eager to learn and evolve as fast as possible, and I was attracted to the company’s culture and mission. So even though we were a very small-data shop at the time, I decided to get involved.

There’s a romanticism in Silicon Valley about the early days of a startup: you move fast, make foundational decisions, and any good idea could become the next big thing. From my perspective, that was all true.

Back then we knew so little about the business that any insight was groundbreaking; data infrastructure was fast, stable, and real-time (I was querying our production MySQL database); the company was so small that everyone was in the loop about every decision; and the data team (me) was aligned around a singular set of metrics and methodologies.

But five years and 43,000% growth later, things have gotten a bit more complicated. I’m happy to say that we’re also more sophisticated in the way we leverage data, and there’s now a lot more of it. The trick has been to manage scale in a way that brings together the magic of those early days with the growing needs of the present — a challenge that I know we aren’t alone in facing.

So I thought it might be worth pairing our posts on specific problems we’re solving with an overview of the higher-level issues data teams encounter as companies grow, and how we at Airbnb have responded. This will mostly center around how to connect data science with other business functions, but I’ll break it into three concepts — how we characterize data science, how it’s involved in decision-making, and how we’ve scaled it to reach all sides of Airbnb. I won’t say that our solutions are perfect, but we do work every day to retain the excitement, culture, and impact of the early days.

Data Isn’t Numbers, It’s People

The foundation upon which a data science team rests is the culture and perception of data elsewhere in the organization, so defining how we think about data has been a prerequisite to ingraining data science in business functions.

In the past, data was often referenced in cold, numeric terms. It was construed purely as a measurement tool, which paints data scientists as Spock-like characters expected to have statistics memorized and available upon request. Interactions with us would therefore tend to come in the form of a request for a fact: how many listings do we have in Paris? What are the top 10 destinations in Italy?

While answering questions and measuring things is certainly part of the job, at Airbnb we characterize data in a more human light: it’s the voice of our customers. A datum is a record of an action or event, which in most cases reflects a decision made by a person. If you can recreate the sequence of events leading up to that decision, you can learn from it; it’s an indirect way of the person telling you what they like and don’t like: this property is more attractive than that one, I find these features useful but those... not so much.

This sort of feedback can be a goldmine for decisions about community growth, product development, and resource prioritization. But only if you can decipher it. Thus, data science is an act of interpretation – we translate the customer’s ‘voice’ into a language more suitable for decision-making.

This idea resonates at Airbnb because listening to guests and hosts is core to our culture. Since the early days, our team has met with community members to understand how to make our product better suit their needs. We still do this, but the scale of the community is now beyond the point where it’s feasible to connect with everyone everywhere.

So, data has become an ally. We use statistics to understand individual experiences and aggregate those experiences to identify trends across the community; those trends inform decisions about where to drive the business.

Over time, our colleagues on other teams have come to understand that the data team isn’t a bunch of Vulcans, but rather that we represent the very human voices of our customers. This has paved the way for changes to the structure of data science at Airbnb.

Proactive Partnership vs. Reactive Stats-Gathering

A good data scientist is therefore able to get into the minds of the people who use our product and understand their needs. But if they’re alone in a forest with no one to act on the insights they uncover, what difference does it make?

Our distinction between good and great is impact — using insights to influence decisions and ensuring that the decisions had the intended effect. While this may seem obvious, it doesn’t happen naturally – when data scientists are pressed for time, they have a tendency to toss the results of an analysis ‘over the wall’ and then move on to the next problem. This isn’t because they don’t want to see the project through, but with so much energy invested into understanding the data, ensuring statistical methods are rigorous, and making sure results are interpreted correctly, the communication of their work can feel like a trivial afterthought.

But when decision-makers don’t understand the ramifications of an insight, they don’t act on it. When they don’t act on it, the value of the insight is lost.

The solution, we think, is connecting data scientists as tightly as possible with decision-makers. In some cases, this happens naturally; for example when we develop data products (more on this in a future post). But there’s also a strong belief in cross-functional collaboration at Airbnb, which brings up questions about how to structure the team within the broader organization.

A lot has been written about the pros and cons of centralized and embedded data science teams, so I won’t focus on that. But suffice to say we’ve landed on a hybrid of the two.

We began with the centralized model, tempted by its offering of opportunities to learn from each other and stay aligned on metrics, methodologies, and knowledge of past work. While this was all true, we’re ultimately in the business of decision-making, and found we couldn’t do this successfully when siloed: partner teams didn’t fully understand how to interact with us, and the data scientists on our team didn’t have the full context of what they were meant to solve or how to make it actionable. Over time we became viewed as a resource and, as a result, our work became reactive – responding to requests for statistics rather than being able to think proactively about future opportunities.

So we made the decision to move from a fully-centralized arrangement to a hybrid centralized/embedded structure: we still follow the centralized model, in that we have a singular data science team where our careers unfold, but we have broken this into sub-teams that partner more directly with engineers, designers, product managers, marketers, and others. Doing so has accelerated the adoption of data throughout the company, and has elevated data scientists from reactive stats-gatherers to proactive partners. And by not fully shifting toward an embedded model we’re able to maintain a vantage point over every piece of the business, allowing us to form a neural core that can help all sides of the company learn from one another.

Customer-Driven Decisions

Structure is a big step toward empowering impactful data science, but it isn’t the full story. Once situated within a team that can take action against an insight, the question becomes how and when to leverage the community’s voice for business decisions.

Through our partnership with all sides of the company, we’ve encountered many perspectives on how to integrate data into a project. Some people are naturally curious and like to begin by understanding the context of the problem they’re facing. Others view data as a reflection of the past and therefore a weaker guide for planning; these folks tend to focus more on measuring the impact of their gut-driven decisions.

Both perspectives are fair. Being completely data-driven can lead to optimizing toward a local maximum; finding a global maximum requires shocking the system from time to time. But they reflect different points where data can be leveraged in a project’s lifecycle.

Over time, we’ve identified four stages of the decision-making process that benefit from different elements of data science:
[Image: the four stages of the decision-making process]

  1. We begin by learning about the context of the problem, putting together a full synopsis of past research and efforts toward addressing the opportunity. This is more of an exploratory process aimed at sizing opportunities, and generating hypotheses that lead to actionable insights.
  2. That synopsis translates to a plan, which encompasses prioritizing the lever we intend to utilize and forming a hypothesis for the effect of our efforts. Predictive analytics is more relevant in this stage, as we have to make a decision about what path to follow, which is based on where we expect to have the largest impact.
  3. As the plan gets underway, we design a controlled experiment through which to roll the plan out. A/B testing is very common now, but our collaboration with all sides of the business opens up opportunities to use experimentation in a broader sense — operational market-based tests, as well as more traditional online environments.
  4. Finally, we measure the results of the experiment, identifying the causal impact of our efforts. If successful, we launch to the whole community; if not, we cycle back to learning why it wasn’t successful and repeat the process.

Sometimes a step is fairly straightforward, for example if the context of the problem is obvious – the fact that we should build a mobile app doesn’t necessitate a heavy synopsis upfront. But the more disciplined we’ve become about following each step sequentially, the more impactful everyone at Airbnb has become. This makes sense because, ultimately, this process pushes us to solve problems relevant to the community in a way that addresses their needs.
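To make the measurement stage concrete, here is a minimal sketch of how a controlled experiment’s result might be evaluated with a two-proportion z-test. The function and the conversion numbers are illustrative assumptions, not Airbnb’s actual experimentation tooling:

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical experiment: 12% conversion in control, 13.5% in treatment.
z, p = two_proportion_z(success_a=1200, n_a=10_000, success_b=1350, n_b=10_000)
```

If the p-value falls below the chosen significance level, the change launches to the whole community; otherwise we cycle back to learning why it didn’t work and repeat the process.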

Democratizing Data Science

The above model is great when data scientists have sufficient bandwidth. But the reality of a hypergrowth startup is that the scale and speed at which decisions need to be made will inevitably outpace the growth of the data science team.

This became especially clear in 2011 when Airbnb exploded internationally. Early in the year, we were still a small company based entirely in SF, meaning our army of three data scientists could effectively partner with everyone.

Six months later, we opened over 10 international offices simultaneously, while also expanding our product, marketing, and customer support teams. Our ability to partner directly with every employee suddenly, and irrevocably, disappeared.

Just as it became impossible to meet every new member of the community, it was now also impossible to meet and work with every employee. We needed to find a way to democratize our work, broadening from individual interactions, to empowering teams, the company, and even our community.
[Image: scaling data science from individual interactions to teams, the company, and the community]

Doing this successfully requires becoming more efficient and effective, mostly through investment in the technology surrounding data. Here are some examples of how we’ve approached each level of scale:

  1. Individual interactions become more efficient as data scientists are empowered to move more quickly. Investing in data infrastructure is the biggest lever here – adopting faster and more reliable technologies for querying an ever-growing volume of data. Stabilizing ETL has also been valuable, for example through our development of Airflow.
  2. Empowering teams is about removing the burden of reporting and basic data exploration from the shoulders of data scientists so they can focus on more impactful work. Dashboards are a common example of a solution. We’ve also developed a tool to help people author queries (Airpal) against a robust and intuitive data warehouse.
  3. Beyond individual teams, where our work is more tactical, we think about the culture of data in the company as a whole. Educating people on how we think about Airbnb’s ecosystem, as well as how to use tools like Airpal, removes barriers to entry and inspires curiosity about how everyone can better leverage data. Similar to empowering teams, this has helped liberate us from ad hoc requests for stats.
  4. The broadest example of scaling data science is enabling guests and hosts to learn from each other directly. This mostly happens through data products, where machine learning models interpret signals from one set of community-members to help guide others. Location relevance was one example we wrote about, but as this work is becoming more commonplace in other areas of the company, we’ve developed tools for making it easier to launch and understand the models we develop.

Scaling a data science team to a company in hypergrowth isn’t easy. But it is possible, especially if everyone agrees that data science isn’t just a nice part of the company but an essential one.

Wrestling the train from the monkey

Five years in, we’ve learned a lot. We’ve improved how we leverage the data we collect; how we interact with decision-makers; and how we democratize this ability out to the company. But to what extent has all of this work been successful?

Measuring the impact of a data science team is ironically difficult, but one signal is that there’s now a unanimous desire to consult data for decisions that need to be made by technical and non-technical people alike. Our team members are seen as partners in the decision-making process, not just reactive stats-gatherers.

Another is that our increasing ability to distill the causal impact of our work has helped us wrestle the train away from the monkey. This has been trickier than one might expect because Airbnb’s ecosystem is complicated — a two-sided marketplace with network effects, strong seasonality, infrequent transactions, and long time horizons — but these challenges make the work more exciting. And as much as we’ve accomplished over the last few years, I think we’re still just scratching the surface of our potential.

We’re at a point where our infrastructure is stable, our tools are sophisticated, and our warehouse is clean and reliable. We’re ready to take on exciting new problems. On the immediate horizon we look forward to shifting from batch to real-time processing; developing a more robust anomaly detection system; deepening our understanding of network effects; and increasing our sophistication around matching and personalization.

But these ideas are just the beginning. Data is the (aggregated) voice of our customers. And wherever we go next–wherever we belong next–will be driven by those voices.

This post originally appeared on VentureBeat.

The post At Airbnb, Data Science Belongs Everywhere: Insights from Five Years of Hypergrowth appeared first on Airbnb Engineering.

Recap of OpenAir
http://nerds.airbnb.com/recap-of-openair/
Mon, 06 Jul 2015

Three weeks ago we hosted OpenAir 2015, our second technology conference. We had an amazing turnout of bright minds from across the industry, more than doubling attendance from 2014.

A new generation of companies is emerging whose customers judge them not by their apps and websites but by the experiences and content those products connect them with. With that in mind, the theme for OpenAir 2015 was scaling human connection; we focused on bridging online to offline and on the better matching that enables it.

Throughout the day we learned how Instagram helps its users discover new content that inspires them; how Stripe helps people transact across borders; how LinkedIn uses data to power its social network; how Periscope came to life on Android; and, of course, how Airbnb helps turn strangers into friends.

Behind all of these challenges are central concepts that we as a tech industry need to understand better: trust, personalization, and the data that enables both.

With that in mind, Airbnb open-sourced two new tools for wrangling data. The first, Airflow, is a tool to programmatically author, schedule, and monitor data pipelines; people in the industry will know this work as ETL engineering. The second, Aerosolve, is a machine learning package for Apache Spark. It is designed to combine a high capacity to learn with an accessible workflow that encourages iteration and a deep, human-level understanding of underlying patterns. Since we launched these tools they have earned over 2,000 stars on GitHub, and we can’t wait to see how people use and contribute to them.

We also announced a new tool for our hosts called Price Tips, which is powered by Aerosolve. Price Tips creates ongoing tips for our hosts on how to price their listing, not just for one day, but for each day of the year. This pricing is fully dynamic — it takes into account demand, location, travel trends, amenities, type of home and much more. There are hundreds of signals that go into the model to produce each price tip. We believe that better pricing will be a great way to further empower our hosts to meet their personal goals through hosting.

Finally, we closed out the opening keynote morning with the launch of our brand-new Gift Cards website. Now anyone in the US can give their family, friends, colleagues, frenemies, whomever, the gift of travel on Airbnb. And for those lucky folks in the audience, we gave everyone a $100 gift card.

We will be following up with more videos from the event, so keep your eyes on this space.

The post Recap of OpenAir appeared first on Airbnb Engineering.

Designing Machine Learning Models: A Tale of Precision and Recall
http://nerds.airbnb.com/designing-machine-learning-models/
Wed, 01 Jul 2015

The post Designing Machine Learning Models: A Tale of Precision and Recall appeared first on Airbnb Engineering.

At Airbnb, we are focused on creating a place where people can belong anywhere. Part of that sense of belonging comes from trust amongst our users and knowing that their safety is our utmost concern.

While the vast majority of our community is made up of friendly and trustworthy hosts and guests, there exists a tiny group of users who try to take advantage of our site. These are very rare occurrences, but nevertheless, this is where the Trust and Safety team comes in.

The Trust and Safety team deals with any type of fraud that might happen on our platform. It is our main objective to try to protect our users and the company from various types of risks. An example risk is chargebacks – a problem that most ecommerce companies are familiar with. To reduce the number of fraudulent actions, the Data Scientists within the Trust and Safety team build various Machine Learning models to help identify the different types of risks. For more information on the architecture behind our models, please refer to a previous blog post on Architecting A Machine Learning System For Risk.

In this post, I give a brief overview of the thought process that comes with building a Machine Learning model. Of course every model is different, but hopefully it will give readers insight into how we use data in a Machine Learning application to help protect our users, and into the different approaches we use to improve our models. For this blog post, suppose we want to build a model to predict if certain fictional characters are evil*.

What are we trying to predict?

The most fundamental question in model building is determining what you would like the model to predict. I know this sounds silly, but oftentimes this question alone raises other, deeper questions.

Even a seemingly straightforward character classification model can raise many questions as we think more deeply about the kind of model to build. For example, what do we want this model to score: just newly introduced characters or all characters? If the former, how far into the introduction do we want to score the characters? If the latter, how often do we want to score these characters?

A first thought might be to build a model that scores each character upon introduction. However, with such a model, we would not be able to track characters’ scores over time. Furthermore, we could be missing out on potentially evil characters that might have “good” characteristics at the time of introduction.

We could instead build a model that scores a character every time he/she appears in the plot. This would allow us to study the scores over time and detect anything unusual. But, given that there might not be any character development in every single appearance, this may not be the most practical route to pursue.

After much consideration, we might decide on a model design that falls between these two initial ideas: build a model that scores each character each time something significant happens, such as the gathering of new allies, the acquisition of dragons, and so on. This way, we would still be able to track the characters’ scores over time without unnecessarily scoring those with no recent development.

[Image: timeline of character scoring at significant plot events]

How do we model scores?

Since our objective is to analyze scores over time, our training data set needs to reflect characters’ activities across a period of time. The resulting training data set will look similar to the following:

[Table: training observations, one row per character per period]

The periods associated with each character are not necessarily consecutive, since we are only interested in days on which significant developments occur.

In this instance, Jarden has significant character developments on 3 different occasions and is constantly growing his army over time. Dineas has significant character developments on 5 different occasions and is responsible for 4 dragons mid-plot.

Sampling

Often with Machine Learning models, it is necessary to down-sample the number of observations. The sampling process itself can be quite straightforward: once one has the desired training data set, one can do row-based sampling on the population.

However, because the model described here deals with multiple periods per character, row-based sampling might split the occasions pertaining to a character between the model-build data and the validation data. The table below shows an example of such a scenario:

[Table: row-based sampling splits one character’s periods across build and validation data]

This is not ideal because we are not getting a holistic picture of each character and those missing observations could be crucial to building a good model.

For this reason, we need to do character-based sampling. This will ensure that either all of the occasions pertaining to a character get included in the model build data, or none at all.

[Table: character-based sampling keeps all of a character’s periods together]

The same logic applies when it comes time to split our data into training and validation sets.
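A minimal sketch of character-based sampling in Python (the characters, rows, and split fraction are invented for illustration, not our actual pipeline):

```python
import random

# Hypothetical training rows: (character, period); real rows would carry
# features and a label as well.
rows = [
    ("Jarden", 1), ("Jarden", 2), ("Jarden", 3),
    ("Dineas", 1), ("Dineas", 2), ("Dineas", 3), ("Dineas", 4), ("Dineas", 5),
    ("Serion", 1), ("Serion", 2),
]

def character_based_split(rows, train_frac=0.7, seed=42):
    """Split rows so that all periods for a character land on the same side."""
    characters = sorted({character for character, _ in rows})
    rng = random.Random(seed)
    rng.shuffle(characters)
    cutoff = int(len(characters) * train_frac)
    train_characters = set(characters[:cutoff])
    train = [row for row in rows if row[0] in train_characters]
    valid = [row for row in rows if row[0] not in train_characters]
    return train, valid

train, valid = character_based_split(rows)
# No character's occasions are split across the two sets.
assert {c for c, _ in train}.isdisjoint({c for c, _ in valid})
```

Sampling on shuffled character names rather than on rows is what guarantees the either-all-or-none property described above.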

Feature Engineering

Feature engineering is an integral part of Machine Learning, and a good understanding of the data helps generate ideas on the types of features to engineer for a better model. Examples of feature engineering include feature normalization and treatment of categorical features.

Feature normalization is a way to standardize features that allows for more sensible comparisons. Let’s take the table below as an example:

[Table: each character’s number of soldiers and years in power]

Both characters have 10,000 soldiers. However, Serion has been in power for 5 years, while Dineas has been in power for only 2. Comparing the absolute number of soldiers across these characters might not be very useful, but normalizing by each character’s years in power could provide better insight and produce a more predictive feature.
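Using the invented numbers from the table, that normalization is a one-liner per character:

```python
# Hypothetical feature rows from the example above.
characters = [
    {"name": "Serion", "soldiers": 10_000, "years_in_power": 5},
    {"name": "Dineas", "soldiers": 10_000, "years_in_power": 2},
]

for character in characters:
    # Normalize the raw count by tenure: soldiers gathered per year in power.
    character["soldiers_per_year"] = (
        character["soldiers"] / character["years_in_power"]
    )
# Serion: 2,000 per year; Dineas: 5,000 per year. The normalized feature
# reveals that Dineas is building his army much faster.
```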

Feature engineering on categorical features probably deserves a separate blog post due to the many different ways to deal with them. In particular for missing values imputation, please take a look at a previous blog post on Overcoming Missing Values in a Random Forest Classifier.

The most common approach for transforming categorical features is vectorizing (also known as one-hot encoding). However, when dealing with many categorical features with many different levels, it is more practical to use conditional-probability coding (CP-coding).

The basic idea of CP-coding is to compute the probability of an event occurring given a categorical level. This method allows us to project all levels of a categorical feature into a single numerical variable.

[Table: P(evil | house) computed for each level of the categorical feature]

However, this type of transformation may result in noisy values for levels that are not well represented. In the example above, we only have one observation from the House of Tallight. As a result, the corresponding probability is either 0 or 1. To get around this issue and to reduce the noise in general, one can smooth the computed probabilities: blend each level’s probability with the global probability using a weighted average controlled by a smoothing hyperparameter.

So, which method is better? It depends on the number and levels of the categorical features. CP-coding is good because it reduces the dimensionality of the feature, but by doing so we sacrifice information on feature-to-feature interactions, which is something that vectorizing retains. Alternatively, we could integrate both methods: combine the categorical features of interest and then perform CP-coding on the interacted features.
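To make the smoothing idea concrete, here is one possible implementation of CP-coding. The houses, labels, and the particular blend formula are illustrative assumptions, not the exact scheme used in our models:

```python
from collections import defaultdict

# Hypothetical labeled observations: (house, is_evil).
observations = [
    ("Lanrister", 1), ("Lanrister", 1), ("Lanrister", 0),
    ("Stork", 0), ("Stork", 0), ("Stork", 0), ("Stork", 1),
    ("Tallight", 1),  # a single observation: noisy without smoothing
]

def cp_code(observations, alpha=5.0):
    """Map each categorical level to a smoothed P(evil | level).

    With smoothing weight alpha, a level seen n times gets
    (n_evil + alpha * global_rate) / (n + alpha), so sparsely represented
    levels shrink toward the global rate instead of snapping to 0 or 1.
    """
    global_rate = sum(label for _, label in observations) / len(observations)
    counts = defaultdict(lambda: [0, 0])  # level -> [n_evil, n_total]
    for level, label in observations:
        counts[level][0] += label
        counts[level][1] += 1
    return {
        level: (n_evil + alpha * global_rate) / (n_total + alpha)
        for level, (n_evil, n_total) in counts.items()
    }

codes = cp_code(observations)
# The lone Tallight observation no longer maps to exactly 1.0.
assert 0 < codes["Tallight"] < 1
```

Larger values of alpha pull rare levels harder toward the global rate; alpha is typically tuned like any other hyperparameter.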

Evaluating Model Performance

When it comes time to evaluate model performance, we need to be mindful of the proportion of good/evil characters. With our example model, the data is aggregated at the [character*period] level (left table below).

However, the model performance should be measured at character level (right table below).

[Figure: chart6]

As a result, the proportion of good/evil characters differs significantly between the model-build data and the model-performance data. It is crucial to assign proper weights when evaluating a model's precision and recall.

Additionally, because we would likely have down-sampled the number of observations, we need to rescale the model’s precision and recall to account for the sampling process.
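The post does not spell out the rescaling formula, but under the common assumption that only negative (good-character) observations were down-sampled at a known rate, one sketch of the correction looks like this:

```python
# Hypothetical counts observed on a down-sampled evaluation set, where only
# negatives (good characters) were kept at sample_rate. To estimate precision
# on the full population, scale the false positives back up; recall is
# unaffected because all positives (evil characters) were kept.
def rescaled_precision(tp, fp, sample_rate):
    return tp / (tp + fp / sample_rate)

tp, fp = 80, 20
naive = tp / (tp + fp)                              # 0.8 on the sampled data
corrected = rescaled_precision(tp, fp, sample_rate=0.1)
# corrected == 80 / (80 + 200), far lower than the naive estimate.
```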

Assessing Precision and Recall

The two main performance metrics for model evaluation are Precision and Recall. In our example, precision is the proportion of evil characters the model is able to predict correctly. It measures the accuracy of the model at a given threshold. Recall, on the other hand, is the proportion of evil characters the model is able to detect. It measures how comprehensive the model is at identifying evil characters at a given threshold. This can be confusing, so I’ve broken it down in the table below to illustrate the difference:

[Figure: chart7]

It is often helpful to classify the numbers into the 4 different bins:

  1. True Positives (TP): Character is evil and model predicts it as such
  2. False Positives (FP): Character is good, but model predicts it to be evil
  3. True Negatives (TN): Character is good and model predicts it as such
  4. False Negatives (FN): Character is evil, but model fails to identify it

Precision is measured by calculating: Out of the characters predicted to be evil, how many did the model identify correctly i.e. TP / (TP + FP)?

Recall is measured by calculating: Out of all evil characters, how many are predicted by the model i.e. TP / (TP + FN)?

Observe that even though the numerator is the same, the denominator refers to different sub-populations.
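The two definitions can be written directly as code (a trivial sketch with illustrative counts):

```python
# Precision: of the characters predicted evil, how many truly are?
def precision(tp, fp):
    return tp / (tp + fp)

# Recall: of all evil characters, how many did the model catch?
def recall(tp, fn):
    return tp / (tp + fn)

tp, fp, fn = 40, 10, 20
# precision(40, 10) == 0.8; recall(40, 20) == 2/3
```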

There is always a trade-off between choosing high precision vs. high recall. Depending on the purpose of the model, one might choose higher precision over higher recall. However, for fraud prediction models, higher recall is generally preferred even if some precision is sacrificed.

There are many ways to improve a model's precision and recall. These include adding better features, optimizing the pruning of trees, and building a bigger forest. However, given how extensive this discussion can be, I will leave it for a separate blog post.

Epilogue

Hopefully, this blog post has given readers a glimpse of what building a Machine Learning model entails. Unfortunately, there is no one-size-fits-all solution for building a good model, but knowing the context of the data well is key because it translates into deriving more predictive features, and thus a better model.

Lastly, classifying characters as good or evil can be subjective, but labels are a really important part of machine learning and bad labeling usually results in a poor model. Happy Modeling!

* This model assumes that each character is either born good or evil i.e. if they are born evil, then they are labeled as evil their entire lives. The model design will be completely different if we assume characters could cross labels mid-life.

The post Designing Machine Learning Models: A Tale of Precision and Recall appeared first on Airbnb Engineering.

Introducing DeepLinkDispatch: Easy Declaration and Routing of Your Deep Links
http://nerds.airbnb.com/deeplinkdispatch/ (Tue, 30 Jun 2015)

The post Introducing DeepLinkDispatch: Easy Declaration and Routing of Your Deep Links appeared first on Airbnb Engineering.

Deep links provide a way to link to specific content on either a website or an application. These links are indexable and searchable, and can provide users direct access to much more relevant information than a typical home page or screen. In the mobile context, the links are URIs that link to specific locations in the application.

At Airbnb, we use these deep links frequently to link to listings, reservations, or search queries. For example, a typical deep link to a listing may look something like this:


airbnb://rooms/8357

This deep link bypasses the home screen in the application and opens the listing information for the Mushroom Dome cabin. Other deep links lead to non-content screens, like sign-up screens or informational screens on how the application works.

Android supports deep links through declarations in the Manifest. You can add intent filters, which define a mapping between deep link schemes and Activities. Subsequently, any URI with the registered scheme, host, and path will open that Activity in the app.

While convenient for simple deep link usage, this traditional method becomes burdensome for more complicated applications. You can use the intent filter to specify a path pattern, but it is somewhat limiting: you can't easily indicate the parameters you expect in the URI you are filtering for. For complex deep links, you are likely to have to write a parsing mechanism to extract the parameters, or worse, have similar code distributed among many Activities.

DeepLinkDispatch is designed to help developers handle deep links easily without having to write a lot of boilerplate code and allows you to supply more complicated parsing logic for deciding what to do with a deep link. You can simply annotate the Activity with a URI in a similar way to other libraries. Looking at the example deep link URI from above, you could annotate an activity like so, and declare an “id” parameter that you want the application to parse:


@DeepLink("rooms/{id}")
public class SomeActivity extends Activity {
   ...
}

After annotating a particular Activity with the deep link URI that the activity should handle, DeepLinkDispatch will route the deep link automatically and parse the parameters from the URI. You can then determine whether the intent was fired by a deep link and extract out the parameters declared in the annotation. Here’s an example:


if (getIntent().getBooleanExtra(DeepLink.IS_DEEP_LINK, false)) {
  Bundle parameters = getIntent().getExtras();
  String someParameter = parameters.getString("id");
  ...
}

At its core, DeepLinkDispatch generates a simple Java class to act as a registry of which Activities are registered with which URIs, and which parameters should be extracted. DeepLinkDispatch also generates a shim Activity that tries to match any deep link with an entry in the registry; if it finds a match, it extracts the parameters and starts the appropriate Activity with an Intent populated with those parameters.
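For illustration only, here is a small Python sketch of the registry-plus-template-matching idea (this is not DeepLinkDispatch's actual generated Java; the registry contents and handler name come from the example above):

```python
import re

# Compile a URI template like "rooms/{id}" into a regex with one
# named group per {param}, analogous to the generated registry entries.
def compile_template(template):
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile("^" + pattern + "$")

# Registry: template -> the Activity registered to handle it.
REGISTRY = {compile_template("rooms/{id}"): "SomeActivity"}

# What the shim does: find a matching entry and extract the parameters.
def dispatch(path):
    for regex, activity in REGISTRY.items():
        match = regex.match(path)
        if match:
            return activity, match.groupdict()
    return None, {}

# dispatch("rooms/8357") -> ("SomeActivity", {"id": "8357"})
```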

Additionally, DeepLinkDispatch is intended to provide greater insight into deep link usage. Android by default does not give much insight into what deep links are being used nor what deep links are failing. DeepLinkDispatch provides callbacks in the Application class for any deep link call, either successful or unsuccessful, allowing developers to track and correct any problematic links firing at the application.

An example of such a callback would be:


public class SomeApplication extends Application implements DeepLinkCallback {
  @Override public void onSuccess(String uri) {
    // Handle or track a successful deep link here
  }

  @Override public void onError(DeepLinkError error) {
    // Handle or track an error here
  }
}

In summary, use DeepLinkDispatch if you'd like an easy way to manage deep links. Declaring deep links and their parameters is simple with annotations, and the library handles the more complex parsing and routing to your Activities without a lot of extra code on your part. DeepLinkDispatch also gives you greater insight into how your deep links are used by providing simple callbacks on deep link events for you to tie into.

For more information, take a look at our GitHub page: https://github.com/airbnb/DeepLinkDispatch

Aerosolve: Machine learning for humans
http://nerds.airbnb.com/aerosolve/ (Thu, 04 Jun 2015)

The post Aerosolve: Machine learning for humans appeared first on Airbnb Engineering.

Have you ever wondered how Airbnb's price tips for hosts work?

In this dynamic pricing feature, we show hosts the probability of getting a booking (green for a higher chance, red for a lower chance), or predicted demand, and allow them to easily price their listings dynamically with a click of a button.

Many features go into predicting the demand for a listing, among them seasonality, unique features of the listing, and price. These features interact in complex ways and can result in machine learning models that are difficult to interpret. So we built a package to produce machine learning models that facilitate interpretation and understanding. This is useful for us as developers, and also for our users: the interpretations map to explanations we provide to hosts about why the demand they face may be higher or lower than they expect.

Introducing Aerosolve: a machine learning package built for humans.

We have been operating on the belief that enabling humans to partner with a machine in a symbiotic way exceeds the capabilities of humans or machines alone.

From the project's inception we have focused on improving the understanding of data sets by assisting people in interpreting complex data with easy-to-understand models. Instead of hiding meaning beneath many layers of model complexity, Aerosolve models expose data to the light of understanding.

For example, we are able to easily determine the negative correlation between the price of a listing in a market and the demand for the listing just by inspecting the image below. Rather than passing features through many deep hidden layers of non-linear transforms we make models very wide, with each variable or combinations of variables modeled explicitly using additive functions. This makes the model easy to interpret while still maintaining a lot of capacity to learn.

Figure 1. Plot of model weight vs price percentile in market.

The red line encodes the general belief before looking at the data, or the prior. In this case we generally believe that demand decreases with increasing price. We can inform the model of our prior beliefs in Aerosolve by adding them to a simple text configuration file during training. The black curve is the belief of the model after learning from billions of data points. It corrects the modeler's assumptions with actual market data, while still allowing humans to feed in their initial beliefs about a variable.

We also took great care to model unique neighborhoods around the world by creating algorithms to automatically generate local neighborhoods based on where Airbnb listings are located. These differ from hand-made neighborhood polygons in two ways. First, they are automatically generated, so we can construct them quickly for newly opened markets. Second, they are built in a hierarchical manner, so we can quickly accumulate statistics that are point-like (e.g. listing views) or polygonal (e.g. search boxes) in a scalable way.

The hierarchy also lets us borrow statistical strength from parent neighborhoods, since they fully contain their child neighborhoods. These Kd-tree-constructed neighborhoods are not user-visible but are used to compute local features for the machine learning models. In the figure below, we demonstrate the ability of the Kd-tree structure to automatically create local neighborhoods. Notice the care we have taken in informing the algorithm that it should not cross large bodies of water. Even Treasure Island has a neighborhood of its own. To avoid sudden changes along a neighborhood boundary, we smooth the neighborhood information in a multi-scale manner. You can read more about, and visually see, this kind of smoothing in the Image Impressionism demo of Aerosolve on Github.

Figure 2. Automatically generated local neighborhoods for San Francisco
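As a rough illustration of the Kd-tree splitting idea (hypothetical coordinates; the real algorithm also respects water boundaries and applies multi-scale smoothing, omitted here), one could split listings on the median of alternating axes:

```python
# Recursively split listing coordinates on the median of alternating
# lat/lng axes; each leaf becomes one local "neighborhood".
def kd_regions(points, depth=0, min_size=2):
    if len(points) <= min_size:
        return [points]              # leaf: a small local neighborhood
    axis = depth % 2                 # alternate: 0 = latitude, 1 = longitude
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (kd_regions(pts[:mid], depth + 1, min_size) +
            kd_regions(pts[mid:], depth + 1, min_size))

# Four made-up San Francisco listing coordinates.
listings = [(37.77, -122.42), (37.80, -122.41), (37.73, -122.48), (37.81, -122.37)]
regions = kd_regions(listings)
```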

Because every listing is unique in its own special way, we built image analysis algorithms into Aerosolve to account for the detail and loving care the hosts have put into decorating their homes. We trained the Aerosolve models on two kinds of training data. On the left we have trained the model on scores given by professional photographers and on the right the model was trained on organic bookings. The professional photographers tend to prefer pictures of ornate, brightly lit living rooms, while the guests seem to prefer warm colors and cozy bedrooms.

Figure 3. Learning to rank images. On the left, image ordering trained from professional photographer ratings. On the right, image ordering trained from organic bookings, clicks and impressions.

We take into account many other things in computing demand, including local events. For example, in the image below we can detect increased demand for places to stay in Austin during the SXSW festival, and could perhaps ask hosts to consider opening their homes during such a high-demand period.

Figure 4. Seasonal demand for Austin

Some features, such as seasonal demand, are naturally spiky. Other features, such as the number of reviews, generally should not exhibit the same kind of spikiness. We smooth these features using cubic polynomial splines, while preserving end-point spikes using Dirac delta functions. For example, in the relationship between the number of reviews and a 3-star rating (out of five), there is a big discontinuity between no reviews and one review.

Figure 5. Smoothing features using polynomial splines

Finally, after all the feature transformations and smoothing, all this data is assembled into a pricing model with hundreds of thousands of interacting parameters to provide a dashboard for hosts to inform themselves on the probability of getting a booking at a given price.

Please check out Aerosolve on Github. There are demos on how to apply Aerosolve to your own modelling, such as teaching the algorithm to paint in the pointillism style. There is also an income prediction demo based on US census data that you can check out as well.

Figure 6. Aerosolve learning to paint in the pointillism style

Airflow: a workflow management platform
http://nerds.airbnb.com/airflow/ (Tue, 02 Jun 2015)

The post Airflow: a workflow management platform appeared first on Airbnb Engineering.


Airbnb is a fast-growing, data-informed company. Our data teams and data volume are growing quickly, and accordingly, so does the complexity of the challenges we take on. Our growing workforce of data engineers, data scientists, and analysts uses Airflow, a platform we built to let us move fast and keep our momentum as we author, monitor, and retrofit data pipelines.

Today, we are proud to announce that we are open sourcing and sharing Airflow, our workflow management platform.

DAGs are blooming

As people who work with data begin to automate their processes, they inevitably write batch jobs. These jobs need to run on a schedule, typically have a set of dependencies on other existing datasets, and have other jobs that depend on them. Throw a few data workers together for even a short amount of time and you quickly have a growing, complex graph of batch computation jobs. Now consider a fast-paced, medium-sized data team working for a few years on an evolving data infrastructure, and you have a massively complex network of computation jobs on your hands. This complexity can become a significant burden for data teams to manage, or even comprehend.

These networks of jobs are typically DAGs (directed acyclic graphs) and have the following properties:

  • Scheduled: each job should run at a certain scheduled interval
  • Mission critical: if some of the jobs aren’t running, we are in trouble
  • Evolving: as the company and the data team matures, so does the data processing
  • Heterogeneous: the stack for modern analytics is changing quickly, and most companies run multiple systems that need to be glued together
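As a toy illustration of such a dependency graph (plain Python, not Airflow itself; the job names are made up), the standard library can derive a valid execution order for a job DAG:

```python
from graphlib import TopologicalSorter

# Hypothetical batch-job DAG: each job maps to the jobs it depends on.
jobs = {
    "extract_events": [],
    "clean_events": ["extract_events"],
    "daily_metrics": ["clean_events"],
    "email_report": ["daily_metrics", "clean_events"],
}

# TopologicalSorter expects {node: predecessors}, which matches our layout.
# A scheduler could run jobs in this order, respecting all dependencies.
order = list(TopologicalSorter(jobs).static_order())
```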

Every company has one (or many)

Workflow management has become such a common need that most companies have multiple ways of creating and scheduling jobs internally. There’s always the good old cron scheduler to get started, and many vendor packages ship with scheduling capabilities. The next step forward is to have scripts call other scripts, and that can work for a short period of time. Eventually simple frameworks emerge to solve problems like storing the status of jobs and dependencies.

Typically these solutions grow reactively in response to the increasing need to schedule individual jobs, and usually because the current incarnation of the system doesn't allow for simple scaling. Also note that the people who write data pipelines are typically not software engineers, and their mission and competencies are centered around processing and analyzing data, not building workflow management systems.

Considering that internally grown workflow management systems are often at least one generation behind the company’s need, the friction around authoring, scheduling and troubleshooting jobs creates massive inefficiencies and frustrations that divert data workers off of their productive path.

 

Airflow

After reviewing the open source solutions, and leveraging Airbnb employees’ insight about systems they had used in the past, we came to the conclusion that there wasn’t anything in the market that met our current and future needs.  We decided to build a modern system to solve this problem properly. As the project progressed in development, we realized that we had an amazing opportunity to give back to the open source community that we rely so heavily upon. Therefore, we have decided to open source the project under the Apache license.

Here are some of the processes fueled by Airflow at Airbnb:

  • Data warehousing: cleanse, organize, data quality check, and publish data into our growing data warehouse
  • Growth analytics: compute metrics around guest and host engagement as well as growth accounting
  • Experimentation: compute our A/B testing experimentation frameworks logic and aggregates
  • Email targeting: apply rules to target and engage our users through email campaigns
  • Sessionization: compute clickstream and time spent datasets
  • Search: compute search ranking related metrics
  • Data infrastructure maintenance: database scrapes, folder cleanup, applying data retention policies, …

Architecture

Much like English is the language of business, Python has firmly established itself as the language of data. Airflow is written in pythonesque Python from the ground up. The code base is extensible, documented, consistent, linted and has broad unit test coverage.

Pipeline authoring is also done in Python, which means dynamic pipeline generation from configuration files or any other source of metadata comes naturally. “Configuration as code” is a principle we stand by for this purpose. While yaml or json job configuration would allow for any language to be used to generate Airflow pipelines, we felt that some fluidity gets lost in the translation. Being able to introspect code (IPython!, IDEs), subclass, meta-program, and use imported libraries to help write pipelines adds tremendous value. Note that it is still possible to author jobs in any language or markup, as long as you write Python that interprets these configurations.
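As a sketch of what such a pipeline definition looked like around the time of this post (the operator import path, DAG arguments, and task names are assumptions that may differ across Airflow versions):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.x path

# A minimal daily pipeline: "load" runs only after "extract" succeeds.
default_args = {"start_date": datetime(2015, 6, 1)}

dag = DAG("example_pipeline",
          default_args=default_args,
          schedule_interval=timedelta(days=1))

extract = BashOperator(task_id="extract", bash_command="echo extract", dag=dag)
load = BashOperator(task_id="load", bash_command="echo load", dag=dag)

load.set_upstream(extract)  # declare the dependency edge of the DAG
```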

While you can get up and running with Airflow in just a few commands, the complete architecture has the following components:

  • The job definitions, in source control.
  • A rich CLI (command line interface) to test, run, backfill, describe and clear parts of your DAGs.
  • A web application, to explore your DAGs' definitions, their dependencies, progress, metadata and logs. The web server is packaged with Airflow and is built on top of the Flask Python web framework.
  • A metadata repository, typically a MySQL or Postgres database that Airflow uses to keep track of job statuses and other persistent information.
  • An array of workers, running task instances in a distributed fashion.
  • Scheduler processes, that fire up the task instances that are ready to run.

Extensibility

While Airflow comes fully loaded with ways to interact with commonly used systems like Hive, Presto, MySQL, HDFS, Postgres and S3, and allows you to trigger arbitrary scripts, the base modules have been designed to be extended very easily.

Hooks are abstractions of external systems and share a homogeneous interface. Hooks use a centralized vault that abstracts host/port/login/password information and exposes methods to interact with these systems.

Operators leverage hooks to generate a certain type of task that becomes a node in a workflow when instantiated. All operators derive from BaseOperator and inherit a rich set of attributes and methods. There are 3 main types of operators:

  • Operators that perform an action, or tell another system to perform an action
  • Transfer operators, which move data from one system to another
  • Sensors, a certain type of operator that will keep running until a certain criterion is met

Executors implement an interface that allows Airflow components (CLI, scheduler, web server) to run jobs remotely. Airflow currently ships with a SequentialExecutor (for testing purposes), a threaded LocalExecutor, and a CeleryExecutor that leverages Celery, an excellent asynchronous task queue based on distributed message passing. We are also planning on sharing a YarnExecutor in the near future.

A Shiny UI

While Airflow exposes a rich command line interface, the best way to monitor and interact with workflows is through the web user interface. You can easily visualize your pipelines' dependencies, see how they progress, get easy access to logs, view the related code, trigger tasks, fix false positives/negatives, analyze where time is spent, and get a comprehensive view of what time of day different tasks usually finish. The UI is also a place where some administrative functions are exposed: managing connections, pools, and pausing progress on specific DAGs.

[Screenshot: the Airflow web UI]

To put a cherry on top, the UI serves a Data Profiling section that allows users to run SQL queries against the registered connections and browse through the result sets, and it offers a way to create and share simple charts. The charting application is a mashup of Highcharts, Flask-Admin's CRUD interface, and Airflow's hooks and macros libraries. URL parameters can be passed through to the SQL in your chart, and Airflow macros are available via Jinja templating. With these features, queries, result sets and charts can be easily created and shared by Airflow users.

  

A Catalyst

As a result of using Airflow, the productivity and enthusiasm of people working with data has multiplied at Airbnb. Authoring pipelines has accelerated, and the amount of time spent monitoring and troubleshooting has been reduced significantly. More importantly, the platform allows people to execute at a higher level of abstraction, creating reusable building blocks as well as computation frameworks and services.

Enough Said!

We’ve made it extremely easy to take Airflow for a test drive while powering through an enlightening tutorial. Rewarding results are a few shell commands away. Check out the quick start and tutorial sections of the Airflow documentation, and you should be able to have an Airflow web application loaded with interactive examples in just a few minutes!

https://github.com/airbnb/airflow

The Antidote to Bureaucracy is Good Judgment
http://nerds.airbnb.com/the-antidote-to-bureaucracy-is-good-judgement/ (Fri, 15 May 2015)

The post The Antidote to Bureaucracy is Good Judgment appeared first on Airbnb Engineering.

This was originally posted on First Round Review.

Mike Curtis may not have the typical background of someone dedicated to vanquishing bureaucracy. AltaVista, AOL, Yahoo, Facebook — he’s a veteran of some legendary Silicon Valley behemoths. Now VP of Engineering for Airbnb, he’s at the helm of a small but rapidly growing team. With nearly two decades of innovation at tech giants under his belt, he’s become an expert at chopping through red tape.

As it turns out, one simple lesson guides Curtis’ approach to building effective teams: the antidote to unproductive bureaucracy is good old-fashioned judgment — having it, hiring for it, and creating conditions that allow people to exercise it. Armed with this truth, he’s tackled the challenges of scaling a world-class engineering team at Airbnb, from taming the beast of expense reports to dramatically improving site stability. And he’s done it by eliminating rules, not making them.

At First Round’s recent CTO Summit, Curtis shared actionable tactics for what he calls “replacing policy with principles” that can guide fast, flexible growth and progress. Every startup looking to dodge a fate dictated by increasing structure and processes that inevitably slow you down can benefit from these tips.

To start, Curtis’s definition of bureaucracy (when it comes to startups) isn’t one you’ll find in the dictionary:
bu•reauc•ra•cy n. 1. The sh*t that gets in your way 2. the sh*t that gets in your engineers’ way

The curious thing about organizations is that having more people somehow doesn’t equal more output. “As size and complexity of an organization increases, productivity of individuals working in that organization tends to decrease,” he says. As head count grows, so too does the policy-and-paperwork stuff that gets in the way of rapid iteration and scale.

Why is this the case? “I think it comes down to human nature and the way we react to problems,” Curtis says. Our natural response to any problem — from a downed server to a social gaffe — is to try to ensure that it doesn’t happen again. In companies, more often than not, those solutions take the form of new policies. “What happens when you create a new policy, of course, is that you have to fit it into all of your existing rules.” And so begins a web of ever increasing complexity that’s all about prevention. Soon, you start to hit safeguards no matter what it is you’re trying to do.

To avoid this type of bureaucracy from the very beginning of your company, you should adopt two particular tactics: “First, you have to build teams with good judgment, because you need to be able to put your trust in people,” Curtis says. “Then you shape that good judgment with strong principles.”

Build a Trustworthy Team By Hiring Trustworthy People

Minimizing rules that become roadblocks in your organization will only work if you’ve built a team that will make good decisions in the absence of rigid structure. Your hiring process is where you can take the biggest strides toward preventing bureaucracy.
“The most important question that you have to answer when you’re hiring somebody is ‘Is this person going to be energized by unknowns?’”

No company will ever achieve perfection, ever. So when things break, you want people who will be motivated by solving problems — those are the people who won’t pause to place blame, and blame is wasteful. Even if you have a honed process for screening and interviewing candidates, it’s worth revisiting how you test for culture fit to make sure this is part of it.

Too many companies and engineering leaders are willing to compromise to maximize technical savvy. Do not do this. Curtis recommends allocating at least 45 minutes to an interview that is entirely about culture and character. Diversity of backgrounds and opinions is championed at Airbnb, so ‘Culture fit’ is about finding people who share the high-performance work ethic and belief in the company’s mission. If people don’t share your conviction in the company’s success, they aren’t a fit.

At Airbnb, Curtis found that these four moves truly extract the most value out of this type of interview:

• Let them shine first. For the first 15 minutes of your culture interview, let a candidate describe a project they’re particularly proud of. The idea here is to get a sense of what excites them — is it technical challenges, for example, or perhaps personal interactions? “Try to suss out what gives this person energy,” Curtis says.
• Then make them uncomfortable. The other side of that coin is that you want to learn how candidates react when they’re not excited, too. Ask them about difficult experiences, or moments when they were somehow not in control. Some of Curtis’s go-to questions are: “Describe a time you really disagreed with management on something. What happened?” and “Think of a time you had to cut corners on a project in a way you weren’t proud of to make a deadline. How did you handle it?” This exercise is all about reactions. “Does the candidate start pointing fingers and say, ‘This is why I couldn’t get my job done, this is why this company is so screwed up’? Or do they start talking about how they understood another person’s point of view and collaborated on a solution?”
• Calibrate your results. It’s easy to see if someone nailed a coding challenge. It’s a lot harder to get comparable reads on candidates when you’re working with a group of different interviewers. It takes time to get on the same page, but you can help the process along. “We get all our interviewers together in a room and have them review several packets at the same time to help expedite the process of getting to some kind of calibration on what’s important to us,” Curtis says. Essentially, try to make the subjective as objective as you can.
• Watch out for signs of coaching. If a candidate seems to have uncanny command of your internal language, take note. The public domain is exploding with tips and tricks from past interviewees and journalists. “Especially as your company starts getting more popular or well known, there’s going to be a lot of stuff about you out on the Internet. If people start quoting things to you that they obviously read in an article or something that is your own internal language, they were probably coached. They either read something or they talked to somebody who works at the company,” Curtis says. That’s not to say you should reject them immediately, just don’t let yourself be swayed.

Make the Most of First Impressions

Ideally, your culture interview ensures that you’re hiring a diverse set of people who share your beliefs and work ethic while introducing new ideas and perspectives. Once they’re in the door, you have your next key opportunity to establish shared priorities. “Your first week is the chance for you to set expectations with new engineers,” Curtis says. He’s found that adding a few key elements to the onboarding process pays off big down the road:

1. Remind new hires they’re working with the best. “I talk about how many people applied for their position so they understand how competitive it is to get into the company and that they’re working alongside great people,” Curtis says. Beyond its morale- and excitement-boosting value, this is an effective way to build a sense of urgency and ensure that new hires hit the ground with positive momentum.

2. Emphasize the value of moving fast. At Airbnb, Curtis’s direction to new engineers is to ship small things first. That can be an adjustment for people used to working on huge systems in their last gigs, but it’s proven to be a valuable way to build that all-important shared judgment. “Get a bunch of code out the door, learn how things work, then you’ll ship bigger stuff,” he says of new hires.

3. Make imperfection an asset, not a liability. Share what you were looking for all along: Someone who draws energy from unknowns. “I talk about the fact that the people who are going to be successful are the ones who see things that aren’t perfect and draw energy from them,” Curtis says. Make it clear, on the other hand, that cynicism and complaints will not be rewarded.

4. Review your engineering values. When you first start to build your engineering organization, it’s good to codify the values that will guide your actions. These can be worded any way that feels right to you. Examples may include, “be biased toward action” or “have strong opinions but hold them weakly.” Whatever you come up with, go through each one, clarifying what it means to you and why it made the list. “Values can be open to interpretation, so it’s good to have a voiceover,” Curtis says.

5. Welcome new hires to the recruiting team. The people who just came through your interview process are going to be conducting interviews for new candidates before long, so make sure new members of your team understand that recruiting is a major and critical part of their job now. “You want them to treat it as seriously as they do writing a piece of code. They need to be really present for recruiting,” Curtis says.

6. Establish direct lines of communication. Open communication is a powerful remedy for unnecessary bureaucracy — and there’s no message more powerful than telling your team they can take concerns to upper management and meaning it. “Sometimes people feel they need to funnel all communication through their direct management. When you say that they can come to you, to the leadership several tiers up, they know that they can communicate openly across the whole organization,” Curtis says.

7. Conduct a series of initial check-ins. Good habits are established early, so don’t let up on your efforts once new engineers are at their desks. Curtis has found that one month and three months are the sweet spots for informal check-ins. “This is super lightweight. All we do is collect a couple sentences of peer feedback from the people around that new engineer,” he says.

Your questions to these team members can be very straightforward. Curtis suggests: “How are they ramping up in an unfamiliar code base?” and “What issues have they encountered and how have they reacted?”

Share the feedback you receive with the person in writing. It will be a valuable reference point for engineers as they ramp up. And if you hear any causes for concern, address them right away. Sit down with that person and clarify your expectations.

“It’s much easier to shape how someone works early on when they first start at the company than when it’s solidified a year in.”

Build the Managers You Need

When it comes to hiring and onboarding managers, though, there’s another layer to consider. These are people who are going to actively shape your company’s culture, personality and progress every day. At Airbnb, a commitment to helping managers make good decisions has manifested in an unusual policy:

“We have a philosophy that all managers start as individual contributors. We believe that if a manager doesn’t spend a significant enough time in the code base, they’re not going to have an intuitive sense of what makes engineers move faster and what gets in their way,” Curtis says.

Unsurprisingly, this can make it more difficult to hire managers. But he’s devised a four-step process engineering leaders can use to find managers who will be best for their companies long-term:

• Set the expectation. There’s no sense in surprising a candidate with the news that they won’t inherit a team halfway through the process. “The first time I talk to a manager who has plenty of management experience and wants to work with us, I’ll tell them straight out that they’re going to start as an individual contributor,” Curtis says.
• Conduct a coding interview. After years in management, it can come as a shock to jump back into algorithms on the fly. “Managers who aren’t comfortable anymore with the discipline of engineering are very likely to wash out at this stage,” he says. “But that means that the people that come through in the end are the people who are going to be able to meaningfully contribute to the code base and understand their engineers.”
• Try pairing. Maintain realistic expectations for coding interviews. “If somebody’s been out of the code base for the last five years, they’re probably pretty rusty, and they’re not going to nail it on your algorithmic whiteboard coding question the way a new grad will,” Curtis says. Pairing can be a helpful workaround. “If you didn’t get a great technical signal from them, but you’ve got a good feeling about them as a manager, do a pairing session,” Curtis says. Give the candidate a chance to shine in the context of working with an existing employee. This usually surfaces latent knowledge and gives you a sense of their dynamic with other engineers as they navigate code.
• Give it more time than you think. The goal is to give managers a chance to really engage with the code base, so don’t rush things — six months as an individual contributor at your company is usually about right. “The point here is to give them a chance to ship something real and establish some legacy in the code before they take on management,” Curtis says.

What It Means to Replace Policies with Principles

At this point, through careful hiring and training, you’ve built a team with good judgment. So how can you leverage that to streamline how you run your organization? “Now you can start taking a more principled approach to how you govern the organization,” Curtis says. To bring this point home, he provides several examples that succeeded at Airbnb:

OLD POLICY: All expenses require pre-approval.

NEW PRINCIPLE: If you would think twice about spending this much from your own account, gut-check it with your manager.

“I can’t tell you how much pain in my life has come from expense reports,” Curtis says. Airbnb’s old policy was a cumbersome one: Charges big and small required approval before they could be submitted. So Curtis tried replacing it with a principle, simple good judgment, using $500 as a rule of thumb for when to get a gut-check. The result? No increase in discretionary spending (but a whole lot of time saved).

OLD POLICY: Engineers can’t create new backend services without approval from managers.

NEW PRINCIPLE: While working within a set of newly articulated architectural tenets — conceived by a group of senior technical leaders — engineers are free to develop backend services.

Here’s another case where policy was creating a huge amount of overhead. “You’d have to go explain what you wanted to do to your manager, explain the rationale, get them to understand, and then get them to approve and move forward,” Curtis says. So he tried something new: A group of senior engineers set up sessions to determine the architectural processes that mattered most to the organization, then articulated them in a series of architectural tenets. Guided by that document, engineers are now free to create new backend services. “It might even be okay to go outside of those architectural tenets, as long as you gut-check it with the team,” Curtis says.

The process used here ends up being even more important than the result. “It wasn’t me sending an email saying, ‘Here’s the rules by which you must create new services.’ Instead, it was a group of peers coming together,” Curtis says. “That created great social pressure within our team, which has worked incredibly well to keep us within the boundaries of what we think we should be developing with to solve our technology problems.”

Getting Changes to Stick
“I have a theory that the only way you can effect cultural change in an organization is through positive reinforcement and social pressure.”

A few years ago at Airbnb, pretty much none of the code being pushed to production was peer reviewed. The team was moving fast, but site stability was suffering. Curtis knew it was time to make peer reviews a priority — but how? “This was a decision point for me. I could have written up a big email and sent it out to the team and said, ‘You must get your code reviewed before you push to production.’ But instead we took a different approach.”

Your team’s goals may be different, but the steps that Curtis used to effect this principled change can serve as a template for any paradigm shift:

Make it possible. Before you establish a new priority, make sure it’s feasible within your current systems. “It turned out that a lot of our tooling for code reviews was extremely cumbersome and painful, so it was taking too long for people to even get a code review if they wanted one,” Curtis says. So he made sure that tooling was improved before rolling out this initiative. People can’t do what you haven’t made possible. If you don’t take this into account, they’ll be confused and resentful.

Create positive examples. Enlist a group of well-respected engineers to lead by example. In Airbnb’s case, Curtis asked a handful of senior engineers to start requesting reviews. “It created a whole bunch of examples of great code reviews that we could draw from to set examples for the team.”

Apply social pressure. All-hands meetings can be invaluable tools for advancing a culture-shifting agenda. That time together is already booked, so why not make it work for you? “We started highlighting one or two of the best code reviews from the week before,” Curtis says. “We’d have the person who got the review talk about why it was helpful for them and why this was useful.” Your best spokesperson for a new principle is a member of the team who’s already bought in.

Address stragglers. If you don’t get everyone on board on the first pass, don’t take it personally. In fact, Curtis considers converting this crowd an important final step in the process. In the case of Airbnb’s code reviews, he and his senior engineers talked to each holdout and learned what their concerns were. “Usually the end of that conversation was just ‘Give it a try for a couple of weeks, see how it goes, see if it works.’ Most of them had a very positive experience and then were brought along.”

In roughly two months, Curtis had made peer code reviews the overwhelming norm without establishing a single policy. “This is the power of positive reinforcement and social pressure to bring about cultural change in an organization. I didn’t hand down any edicts, I didn’t say ‘It has to be done this way from now on,’ I didn’t put any formal policy in place,” he says. In fact, code reviews still aren’t enforced in any way; an engineer could still go straight to production anytime — but no one does it.

At the end of the day, though, Curtis is not advocating for the unilateral elimination of all company policies. Sometimes you need rules. “A good example for us is when you’re traveling overseas, there are very specific policies about what kind of data you can have access to and what kind you can’t,” Curtis says. When the health of your organization depends on something that can’t be left open to interpretation, go ahead and make a rule — but do so sparingly.

The real trick is to recognize that a policy doesn’t exist in a vacuum: it interacts with every policy that came before it, adding to a collective mental and documentation overhead that grows the bigger you get. You want to minimize this overhead however possible, and the easiest way to do that is to trust your team and clearly articulate your values.
“It really comes down to putting your faith in people with good judgment, making sure you hire good judgment, and then guiding them with principles.”

The post The Antidote to Bureaucracy is Good Judgment appeared first on Airbnb Engineering.

OpenAir is back for 2015 http://nerds.airbnb.com/openair-tech-conference-2015/ Thu, 07 May 2015 21:08:00 +0000
We are excited to announce that Airbnb will be hosting our annual tech conference, OpenAir, on June 4th at CityView at The Metreon from 9:00am to 7:00pm.

OpenAir is the premier tech conference focused on engineering solutions to the challenges of matching. The brightest minds in the industry will come together to tackle issues such as search and discovery, trust, internationalization, mobile, and infrastructure.

We have speakers from a broad swath of companies at OpenAir: Netflix, Stripe, Periscope, LinkedIn, Etsy, Pinterest, Lyft, Homejoy, Watsi, Instagram, Facebook, and Google.org.

This year we’ll have more technical talks. We’ll hear about scaling from Instagram co-founder Mike Krieger, innovation at Netflix from Carlos Gomez-Uribe, reaching underserved communities from Watsi co-founder Grace Garey, and building Periscope from Sara Haider, among many others.

Attendees will get access to technical talks, hands-on sessions, and thought-provoking discussions to help them break through their own engineering challenges and projects. Throughout the day there will be time to network with local engineers, take part in interactive sessions, drop in for lightning talks, and meet the speakers.

Registration is $50, and all proceeds from registration fees will be donated to CODE2040, a nonprofit organization that creates programs to increase the representation of Black and Latino/a technologists in the innovation economy. CODE2040 believes the tech sector, communities of color, and the country as a whole will be stronger if talent from all backgrounds is included in the creation of the companies, programs, and products of tomorrow.

Please register here.

The post OpenAir is back for 2015 appeared first on Airbnb Engineering.

Behind the Scenes: Building Airbnb’s First Native Tablet App http://nerds.airbnb.com/airbnb-tablet/ Wed, 29 Apr 2015 23:30:20 +0000
At Airbnb, we’re trying to create a world where people can connect with each other and belong anywhere. Whether you’re traveling, planning a trip with friends, or lying on your couch window-shopping your next adventure, you’re most likely using a mobile device to do your connecting, booking, or dreaming on Airbnb. Our tablet users may be surprised to learn that, until now, we have never had a native tablet app. Last summer, a small team decided to change that. We started exploring what Airbnb could become on tablet, and today we’re excited to share it with the world. Building an entirely new platform while simultaneously maintaining and shipping an ever-evolving phone app is a challenging feat. We’ll tell you more about what went right, what went wrong, and how we ultimately made it happen.

The first little steps are pivotal

After the successful launch of our Brand Evolution last summer, we formed a small team to start laying the groundwork for tablet, carrying over many of the technical learnings from the rebrand. As with the rebrand, we planned to build the app over the course of several releases before shipping the official tablet app. The team, a few designers and three engineers (two iOS, one Android), started building the foundation of the app and exploring the tablet space. We knew we couldn’t rewrite the entire phone app, and that if we ever wanted to ship, we’d have to reuse some of the views that already existed on the phone. We reviewed every screen and feature of the phone app to weigh the engineering and design cost of rebuilding each one. One thing we quickly realized was that our top-level navigation system, “AirNav,” wouldn’t translate well to the tablet space. Instead, we’d have to design and build something new.

Airbnb goes tab bar

At Airbnb, we strive to have a seamless experience across platforms and to maintain feature parity no matter the device or form factor. This meant that whatever navigation system we chose for tablet would also have to work on phone. To find a solution quickly while covering as much ground as possible, the team split up to prototype as many navigation systems as we could. One of our designers, Kyle Pickering, even went as far as teaching himself Swift so he could build functional prototypes. At the end of the week we had prototypes in all different forms: fully functional (albeit hacky) prototypes built from our live code, functional prototypes built in Swift with baked data, and even some Keynote and After Effects prototypes. We took these to our user research team to quickly get real-world user feedback on the designs.

A big part of the culture at Airbnb is to move quickly and run experiments along the way, rather than waiting until the end. With a phone release pending on the horizon, we decided to build and ship the Tab Nav on phone, wrapped behind an experiment we could roll out and test on. Since the majority of the mobile team was still hard at work building new features, we had to build the new nav quickly and quietly, in a way that would allow the rest of the team to turn it on or off at runtime without restarting the app. We launched the new nav in November 2014, which gave us several months to collect data and iterate on the high-level information architecture while we built out the tablet app.

Fun fact: up until the launch of tablet, both navigation systems were still active and could be turned on or off via experiment flag.
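A runtime toggle like the one described above can be sketched as an experiment flag that is consulted every time the root navigation is rebuilt, rather than read once at launch. This is a hypothetical sketch in Swift; the `Experiments` type and the `"tab_nav"` flag name are illustrative assumptions, not Airbnb’s actual internals.

```swift
// Hypothetical sketch: a runtime experiment flag consulted every time the
// root navigation is (re)built, so flipping it takes effect without a restart.
final class Experiments {
    static let shared = Experiments()
    private var flags: [String: Bool] = [:]

    func isEnabled(_ name: String) -> Bool {
        return flags[name] ?? false
    }

    // Flags can be flipped at runtime (e.g. from a server-driven config fetch).
    func set(_ name: String, enabled: Bool) {
        flags[name] = enabled
    }
}

enum RootNavigation { case airNav, tabNav }

// Decide which navigation system to present, checked on every rebuild of the
// root view hierarchy rather than once at app launch.
func currentNavigation() -> RootNavigation {
    return Experiments.shared.isEnabled("tab_nav") ? .tabNav : .airNav
}
```

Because `currentNavigation()` is evaluated on each rebuild of the root view hierarchy, flipping the flag changes the navigation system without restarting the app.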

Dipping our toes in MVVM (kind of)

On iOS, MVC is the name of the game. We knew we were shipping a universal binary; we weren’t going to split targets or release two apps. In terms of code architecture, we worried that shipping a universal app would cause a split in our codebase that would become unwieldy over time. It wouldn’t take long for the codebase to become littered with branching logic, copy-and-pasted experiment code, and duplicate tracking calls. At the same time, we didn’t want massive view controller classes that split functionality between platforms. This required us to rethink the MVC pattern that had previously been tried and true.

What we realized was that almost every model object in our data layer (Listings, Users, Wish Lists, etc.) has three UI representations: a table view cell, a collection view cell, and a view controller. Each of these representations would differ from tablet to phone, so instead of putting branching logic everywhere these objects were used, we decided to ask the model how it preferred to be displayed. We built a view-model protocol that allows us to ask any model object for its “default representation view controller.” The model returns a fully allocated, device-specific view controller to be displayed. At first, these view-model objects simply returned the phone view controller, but when we eventually started building the tablet version we only had to change a single line of code for the tablet view controllers to be displayed app-wide. This reduced the amount of refactoring we had to do once we started building out view controllers and let us focus on polishing them. It also kept all of our code-splitting checks centralized in a few classes.

Next we moved through the existing phone controllers and pulled all of our tracking and experiment logic into shared logic controllers used by both the phone and tablet views. This allowed the team to continue working on the phone, adding experiments and features that would automatically find their way onto the tablet app.
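A minimal Swift sketch of such a view-model protocol might look like the following. The names here (`DefaultRepresentable` and the listing view controllers) are hypothetical stand-ins, not Airbnb’s real API; the point is that the phone/tablet branch lives in one place, inside the model’s conformance, instead of at every call site.

```swift
import UIKit

// Hypothetical sketch: each model object can vend the view controller that
// best displays it, so device-specific branching is centralized here.
protocol DefaultRepresentable {
    func defaultRepresentationViewController() -> UIViewController
}

struct Listing {
    let id: Int
}

// Illustrative placeholder view controllers for each form factor.
final class PhoneListingViewController: UIViewController {}
final class TabletListingViewController: UIViewController {}

extension Listing: DefaultRepresentable {
    func defaultRepresentationViewController() -> UIViewController {
        // Before the tablet build-out existed, this could simply always
        // return the phone controller; displaying the tablet version
        // app-wide is then a one-line change here.
        switch UIDevice.current.userInterfaceIdiom {
        case .pad:
            return TabletListingViewController()
        default:
            return PhoneListingViewController()
        }
    }
}
```

Call sites then stay device-agnostic, e.g. `navigationController?.pushViewController(listing.defaultRepresentationViewController(), animated: true)`.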

Kicking a soccer ball uphill

By January 2015, the tablet team was all hands on deck, and design was at a stage where we could start building the tablet app. We had around two months to build out the app and about a month for final polish and bug fixes. Design had produced several fully working demo apps to prototype interactions and UI animations. In producing these code-driven demos, the design team was able to identify gotchas in the design long before engineering ramped up, which made for an overall smooth development period.

There were, however, a few issues that inevitably popped up. For several scrolling pages throughout the app, design called for lightweight scroll snapping: scroll views would always decelerate to perfectly frame the content. This is not a new or revolutionary idea, but on a large tablet device we discovered that, more often than not, this interaction annoyed the user. One user described it as being like “trying to kick a soccer ball up a hill.” Though the final results were visually pleasing, taking control from the user undermined the beauty of the design.

Instead of cutting the feature completely, we decided to take a deeper look at the problem. Previously, we were using delegate callbacks fired when the user finished scrolling, then adjusting the target content offset to the closest precomputed snapping offset. The problem with this system is that it doesn’t take into account the intent of the user. If a user scrolls a view and slides their finger off the screen in a tossing manner, the system works great. But if a user purposefully stops scrolling and then releases their touch, the scroll view snaps to the nearest point, creating the “uphill soccer ball” effect. So we decided to disable scroll snapping on the fly once the velocity of the scroll dropped below a certain point, giving the user control of the scrolling experience. Achieving these small wins and being truly thoughtful about user intent helped elevate the app experience to a whole new level of delight and usability.
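The velocity cutoff can be sketched as a pure function over the values UIKit hands to `UIScrollViewDelegate`’s `scrollViewWillEndDragging(_:withVelocity:targetContentOffset:)`. The threshold constant below is an illustrative guess, not Airbnb’s actual number, and `Double` stands in for `CGFloat` to keep the sketch self-contained.

```swift
// Sketch of velocity-gated snapping: snap only when the user "tossed" the
// scroll view; if they purposefully stopped it, leave the offset alone.
func resolvedContentOffset(
    proposed: Double,                  // UIKit's proposed targetContentOffset
    velocity: Double,                  // release velocity from the delegate callback
    snapOffsets: [Double],             // precomputed offsets that frame the content
    minimumSnapVelocity: Double = 0.3  // illustrative threshold
) -> Double {
    // Below the threshold, the user intended to stop here: don't fight them.
    guard abs(velocity) >= minimumSnapVelocity, !snapOffsets.isEmpty else {
        return proposed
    }
    // Otherwise decelerate to the snap offset nearest the proposed stop.
    return snapOffsets.min { abs($0 - proposed) < abs($1 - proposed) }!
}
```

In the real delegate callback, you would write the result back into `targetContentOffset.pointee`.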

Always be prepared

As we crossed the finish line and landed the project (mostly) on time, we took a little time to reflect on what it took to complete such a massive project. We were reminded of the old Boy Scout mantra: “Always Be Prepared.” Even though the entire team built the tablet app in just a few short months, it wouldn’t have been possible without the foundation work that was quietly laid throughout the year before. From designers learning to code and building prototypes, to shipping the tablet navigation system on phone months ahead of release, this prep work ensured that when it came time to officially move toward our goal, we were ready.

The post Behind the Scenes: Building Airbnb’s First Native Tablet App appeared first on Airbnb Engineering.
