Over the course of three years, we’ve built Stripe from scratch and scaled it to process billions of dollars a year in transaction volume by making it easy for merchants to get set up and start accepting payments. While the vast majority of the transactions we process are legitimate, Stripe needs to protect its merchants and itself from rogue individuals and groups seeking to “test” or “cash” stolen credit cards. In this talk, I’ll discuss the two types of fraud Stripe faces, merchant fraud and transaction fraud, and how our approach to both has evolved over time. I’ll then demonstrate how “randomness” helps us combat fraud in each of the three stages of the model-building process. In feature generation, fraud often appears less random than legitimate behavior. In model training, our in-house, distributed random forest learner lets us build models on huge data sets with more control over how the random splits are made. In model evaluation, introducing randomness into the production scoring environment lets us reason about counterfactuals (“what would have happened if we hadn’t intervened?”) and evaluate candidate models without production experiments.
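To make the model-evaluation idea concrete, here is a minimal Python sketch of one common way to use production randomness for counterfactual evaluation: inverse-propensity weighting. It is an illustration under stated assumptions, not a description of Stripe's actual system. It assumes that production randomly lets through a small, known fraction of charges it would otherwise have blocked, and that the probability of each charge being let through is logged. The names `LoggedCharge`, `candidate_score`, `observe_prob`, and `weighted_precision_recall` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class LoggedCharge:
    """One charge whose outcome was observed (hypothetical log record)."""
    candidate_score: float  # score from the candidate model being evaluated
    observe_prob: float     # probability production let this charge through
    is_fraud: bool          # outcome observed after the charge went through


def weighted_precision_recall(
    logs: Iterable[LoggedCharge], threshold: float
) -> Tuple[float, float]:
    """Estimate a candidate model's precision and recall at `threshold`.

    Each observed charge is weighted by 1 / observe_prob, so a charge that
    had only a 5% chance of being let through stands in for the roughly 20
    similar charges whose outcomes were never observed.
    """
    tp = fp = fn = 0.0
    for charge in logs:
        weight = 1.0 / charge.observe_prob
        flagged = charge.candidate_score >= threshold
        if flagged and charge.is_fraud:
            tp += weight
        elif flagged:
            fp += weight
        elif charge.is_fraud:
            fn += weight
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


if __name__ == "__main__":
    # Assumed setup: charges production would have blocked were let through
    # with probability 0.05; all other charges were always let through.
    logs = [
        LoggedCharge(candidate_score=0.9, observe_prob=0.05, is_fraud=True),
        LoggedCharge(candidate_score=0.8, observe_prob=0.05, is_fraud=False),
        LoggedCharge(candidate_score=0.2, observe_prob=1.0, is_fraud=True),
        LoggedCharge(candidate_score=0.1, observe_prob=1.0, is_fraud=False),
    ]
    precision, recall = weighted_precision_recall(logs, threshold=0.5)
    print(f"precision={precision:.3f} recall={recall:.3f}")
```

Because the weights correct for which outcomes were observed, the same logged data can be reused to compare several candidate models offline, as long as every charge had a nonzero, recorded probability of being let through.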
Michael Manapat is a Software Engineer on Stripe’s Machine Learning team. He was previously a Software Engineer at Google, a Postdoctoral Fellow in and Lecturer on Applied Mathematics at Harvard, and a graduate student in mathematics at MIT.