This article is part one of a two-article series on fraud modeling in financial services. For a more in-depth look at model errors and adjustments, take a read through the second piece here.
One of the most transformative effects technology has had on the financial services industry involves fraud. As payments have become easier for users, such as with card-not-present transactions, new types of fraud have emerged, creating the need for better fraud detection. Fraud modeling is an important tool in this effort, and its importance will only grow as companies decide which types of models to use and continuously update them to protect against evolving threats.
Traditionally, fraud models in the financial services industry were developed to automatically detect unauthorized credit card transactions. Card issuers like Chase or Capital One use fraud models to determine when a card has been used without the owner's consent. Card networks like Visa or MasterCard use models to identify fraudulent card use in order to maintain their networks’ security and integrity, a key component of the service they provide for merchants.
But the recent proliferation of payment services presents new types of fraud challenges. Many new services access funds directly from a connected bank account, which means they do not benefit from the fraud protection a card network or issuer might provide. Payment services also face their own unique types of fraud problems.
Fraud models provide a way to mitigate these threats and protect users. Consequently, these models play an essential role in payment service providers’ profitability and sustainability. This post breaks down the factors that play into two high-level types of models: rules-based and algorithmic (or machine-learning) models.
Variables in fraud models
A fraud model considers all available or relevant information for a given transaction and then attempts to label the transaction fraudulent or legitimate. More elaborate models may even attempt to label the type of fraud they think is being committed.
Some of the common factors a fraud model can consider are:
- Merchant: What type of business is charging the transaction?
- Location: Is Steve from San Francisco's card suddenly seeing charges in rural Nebraska?
- Amount: Is a user suddenly sending large amounts of money?
- Type of transaction: Is the transaction taking place in person or online?
- Volume: Is a merchant suddenly seeing many charges?
- Account history: Is this a brand-new account, or has this user been on the service for a while?
- Transaction history: Is a rarely used card suddenly seeing a flurry of transactions?
Not all of this information is available at all points along the payment chain, so which variables a fraud model actually uses often depends on which of the above signals the company developing the model has access to.
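As a rough sketch of how these signals might be represented, the Python snippet below models a transaction as a record whose fields are all optional, since a given company may only see some of them. Every field name here is hypothetical and chosen only to mirror the list above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Transaction:
    """One transaction; each field is None when the signal is unavailable.

    All field names are hypothetical, one per factor in the list above.
    """
    merchant_category: Optional[str] = None           # Merchant
    distance_from_home_km: Optional[float] = None     # Location
    amount: Optional[float] = None                    # Amount
    card_present: Optional[bool] = None               # Type of transaction
    merchant_charges_last_hour: Optional[int] = None  # Volume
    account_age_days: Optional[int] = None            # Account history
    card_charges_last_day: Optional[int] = None       # Transaction history


def available_signals(tx: Transaction) -> List[str]:
    """Return the names of the signals that are actually populated."""
    return [f.name for f in fields(tx) if getattr(tx, f.name) is not None]


# Example: a service that only sees the amount and the account's age.
tx = Transaction(amount=2500.0, account_age_days=2)
print(available_signals(tx))  # ['amount', 'account_age_days']
```

A model built on top of a record like this can only use the fields that `available_signals` reports, which is the practical constraint described above.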
A rules-based model is a collection of rules used to identify fraudulent transactions. A single rule contains a set of conditions that, when satisfied, label a transaction as fraudulent or potentially fraudulent.
Some example rules are:
- If a user receives payments for large amounts from multiple newly created accounts
- If a charge occurs at an infrequently visited gas station far from the card’s billing address
- If someone from the same IP address is creating multiple accounts and sending money with credit cards
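The example rules above could be sketched in Python roughly as follows. This is a minimal illustration, not a production design: the field names and every threshold (such as 1,000 for a "large" amount or 500 km for "far") are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Transaction:
    # All fields are hypothetical stand-ins for the signals the rules need.
    amount: float
    new_senders_last_day: int       # payments received from newly created accounts
    merchant_type: str
    merchant_visits_by_card: int    # prior visits to this merchant with this card
    km_from_billing_address: float
    accounts_from_ip: int           # accounts created from the sender's IP address
    funded_by_credit_card: bool


@dataclass
class Rule:
    name: str
    condition: Callable[[Transaction], bool]


# One rule per example in the list above; thresholds are invented.
RULES = [
    Rule("large payments from multiple new accounts",
         lambda t: t.amount >= 1000 and t.new_senders_last_day >= 3),
    Rule("rarely visited gas station far from billing address",
         lambda t: t.merchant_type == "gas_station"
         and t.merchant_visits_by_card <= 1
         and t.km_from_billing_address >= 500),
    Rule("multiple accounts from one IP sending credit card money",
         lambda t: t.accounts_from_ip >= 3 and t.funded_by_credit_card),
]


def triggered_rules(tx: Transaction) -> List[str]:
    """Return the names of all rules whose conditions the transaction meets."""
    return [rule.name for rule in RULES if rule.condition(tx)]


suspicious = Transaction(
    amount=60.0,
    new_senders_last_day=0,
    merchant_type="gas_station",
    merchant_visits_by_card=0,
    km_from_billing_address=900.0,
    accounts_from_ip=1,
    funded_by_credit_card=False,
)
print(triggered_rules(suspicious))
```

A transaction is labeled potentially fraudulent when the returned list is non-empty, and the rule names in that list say exactly which conditions were met.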
Because rules-based models are just collections of conditions, they are easy to interpret. This extra transparency makes it simpler to diagnose and identify issues, e.g., why a model fails to identify a fraudulent transaction or mislabels a legitimate transaction as fraudulent. The interpretability also enables modelers to rapidly develop rules once a new threat is discovered.
Rules-based models are much simpler than their algorithmic counterparts. Their simplicity provides many benefits in the development cycle. For one, modelers have an easier time developing and validating rules-based models. Engineers also benefit from an easier system to implement. Rules-based models are also much faster in operation, which can be an important factor when dealing with a large number of transactions in real time.
An algorithmic model makes use of machine-learning methods to determine whether a transaction is fraudulent. An overview of these methods is outside the scope of this post, but R2D3 has an award-winning visualization that provides a gentle introduction to some core machine-learning concepts.
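To make the contrast with hand-written rules concrete, here is a minimal Python sketch of one very simple machine-learning method, a decision stump (a decision tree with a single split). Instead of a person choosing the condition, the code searches labeled examples for the split that misclassifies the fewest of them. The data, the two features (amount and account age in days), and the function names are all invented for illustration; real fraud models use far richer methods and far more data.

```python
from typing import List, Sequence, Tuple

# Invented training data: (amount, account_age_days), with label 1 = fraud.
ROWS = [(20.0, 400), (35.0, 900), (900.0, 700), (60.0, 30),
        (1200.0, 2), (800.0, 1), (1500.0, 3), (700.0, 4)]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]

Stump = Tuple[int, float, bool]  # (feature index, threshold, fraud_if_below)


def predict(stump: Stump, row: Sequence[float]) -> bool:
    """Label a row as fraud (True) or legitimate (False) using one split."""
    feature, threshold, fraud_if_below = stump
    return (row[feature] <= threshold) == fraud_if_below


def fit_stump(rows: List[Sequence[float]], labels: List[int]) -> Stump:
    """Try every feature, midpoint threshold, and direction; keep the best."""
    best_errors, best_stump = None, None
    for feature in range(len(rows[0])):
        values = sorted({row[feature] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            for fraud_if_below in (True, False):
                stump = (feature, threshold, fraud_if_below)
                errors = sum(predict(stump, row) != bool(label)
                             for row, label in zip(rows, labels))
                if best_errors is None or errors < best_errors:
                    best_errors, best_stump = errors, stump
    return best_stump


stump = fit_stump(ROWS, LABELS)
print(stump)  # (1, 17.0, True): fraud when account age is at most 17 days
```

On this toy data the search finds that account age separates the two classes perfectly, while any split on amount gets at least one example wrong.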
Algorithmic models are far more complex than rules-based models. Depending on the exact method used, the resulting models may also be more difficult to interpret. This additional complexity and opacity mean longer development cycles, more maintenance, and slower operation. That makes implementing these models more difficult for mission-critical situations that require real-time transaction labeling at scale.
Despite these difficulties, algorithmic models are still used for fraud modeling because they are better at identifying complex relationships or more nuanced interactions between variables. They can even identify patterns in data that are so complicated it would take a human years to come up with them.
In order for the algorithms to tease out these patterns, a lot of data is needed—both in terms of the number of observations used to build the models and the number of variables available for the model to consider. Having access to a large number of records for model development is especially important for fraud models because fraudulent transactions are pretty rare: Fewer than 1 in 1,000 transactions are fraudulent. Without seeing a lot of fraud cases, it’s impossible for an algorithm to identify any meaningful patterns. Machine-learning methods also benefit from having many different variables to work with and learn from; when used with a limited number of variables, the benefit of these methods over rules-based models is diminished.
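A quick back-of-the-envelope calculation shows what that rarity implies for data volume. The Python sketch below uses the 1-in-1,000 figure from the paragraph above; because the true rate is lower than that, the fraud counts are upper bounds and the required volumes are lower bounds. The target of 5,000 fraud cases is an arbitrary illustration, not a recommended sample size.

```python
ONE_IN = 1000  # from the text: fewer than 1 in 1,000 transactions are fraudulent


def expected_fraud_cases(n_transactions: int, one_in: int = ONE_IN) -> float:
    """Expected number of fraudulent transactions in a sample (upper bound)."""
    return n_transactions / one_in


def transactions_needed(target_fraud_cases: int, one_in: int = ONE_IN) -> int:
    """Transactions required to expect a given number of fraud cases (lower bound)."""
    return target_fraud_cases * one_in


print(expected_fraud_cases(100_000))  # 100.0
print(transactions_needed(5_000))     # 5000000
```

In other words, a dataset of 100,000 transactions would contain at most about 100 fraud cases for an algorithm to learn from.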
Fraud models in practice
While this binary classification of models is reasonably clear cut, the models actually in use can fall anywhere along the spectrum. Companies can even mix and match and use models in tandem or in a hierarchy. For example, an elaborate system might have a fast, rules-based model at the top that identifies potentially fraudulent transactions. This subset of transactions is now much smaller and can be run through an algorithmic model for final fraud detection.
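The hierarchy described above can be sketched in Python as a two-stage pipeline. Both stages here are stand-ins under stated assumptions: the screening rule uses invented thresholds, and the second stage is a logistic score with made-up weights where a real system would call a trained model. The field names are hypothetical as well.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Transaction:
    # Hypothetical fields chosen for illustration.
    tx_id: str
    amount: float
    account_age_days: int
    card_not_present: bool


def rules_screen(tx: Transaction) -> bool:
    """Stage 1: a cheap rule that flags potentially fraudulent transactions."""
    return tx.amount >= 500 or tx.account_age_days < 7


def model_score(tx: Transaction) -> float:
    """Stage 2 stand-in: a logistic score with made-up weights.

    A real system would call a trained algorithmic model here.
    """
    z = (-4.0
         + 0.002 * tx.amount
         + 1.5 * (tx.account_age_days < 7)
         + 0.8 * tx.card_not_present)
    return 1.0 / (1.0 + math.exp(-z))


def detect_fraud(transactions: List[Transaction],
                 threshold: float = 0.5) -> List[Transaction]:
    """Run the fast rule on everything, then score only the flagged subset."""
    flagged = [tx for tx in transactions if rules_screen(tx)]
    return [tx for tx in flagged if model_score(tx) >= threshold]


transactions = [
    Transaction("a", 20.0, 400, False),   # never reaches stage 2
    Transaction("b", 600.0, 400, False),  # flagged by the rule, cleared by the score
    Transaction("c", 1500.0, 2, True),    # flagged by the rule, labeled fraud
]
print([tx.tx_id for tx in detect_fraud(transactions)])  # ['c']
```

The design choice is that the slower second stage only ever sees the transactions the rule lets through, which keeps the expensive work proportional to the flagged subset rather than to total volume.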
Many factors contribute to how a company chooses which type of model (or combination of types) to use. There are practical constraints such as resource or expertise limitations and data availability. Since no model is 100 percent accurate, one of the most important (and most interesting!) considerations is balancing the types of errors that the model makes and the business implications of those errors, a tradeoff we've unpacked here.