Data Science Finance Machine Learning Python

Case Study – Machine Learning

Measurement of Success of the Credit Card Fraud Detection ML Model

Let’s review one of the most common use cases of machine learning (“ML”) algorithms, credit card fraud detection, and measure its success in reducing losses.

The problem is a natural fit for ML because of the following characteristics:

  1. A large number of transactions occur every second of the day
  2. A real or fraudulent transaction can occur in any part of the world for any purpose
  3. Each transaction is identified as real or fraudulent (the target) after it is completed
  4. Every fraudulent transaction has real costs and customer service headaches for card issuers, merchants, and customers alike
  5. Fraudulent transactions are a very small portion (less than 0.05%) of the total number of transactions. Correctly identifying them is akin to finding a needle in a haystack

In theory, it would be quite easy to deploy the best performing ML algorithm (as determined by whatever metric is used) given the vast amount of labeled data available at every financial institution (“FI”). The FIs should see a meaningful reduction in fraud-related losses from the very next day!

Theoretical Model Performance

Here is the model performance on test data. This model was trained on about 280,000 transactions.

Yes, the confusion matrix is not the best metric for evaluating the performance of ML algorithms on datasets containing unbalanced samples, but this model was specifically configured to adjust for it. Here are the ROC curves. The model is good to go.
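To make the point about unbalanced samples concrete, here is a minimal Python sketch of the metrics that can be read off a confusion matrix. All counts are hypothetical and chosen only for illustration; they are not the results of the model discussed here. The function name `confusion_metrics` is likewise an assumption for this example.

```python
# Hypothetical confusion-matrix counts for illustration only;
# these are NOT the results of the model discussed in this post.
def confusion_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# A "model" that labels every transaction as real, on 100,000 transactions
# of which 50 are fraudulent (a 0.05% fraud rate, assumed for this sketch).
always_real = confusion_metrics(tp=0, fp=0, fn=50, tn=99_950)

# A hypothetical model that catches 40 of the 50 frauds with 10 false alarms.
some_model = confusion_metrics(tp=40, fp=10, fn=10, tn=99_940)

for name, m in [("always real", always_real), ("hypothetical model", some_model)]:
    print(name, {k: round(v, 4) for k, v in m.items()})
```

Both cases score above 99.9% accuracy, yet the first one catches no fraud at all. That is why precision and recall (or the ROC curve) say more than raw accuracy on data like this.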

Assuming that the FI executed the project well and deployed this model into production, the model should produce meaningful savings right away. Over time, as the model is updated by training it on new data, fraud-related losses should drop to near zero.

Analysis of Success

Because the success metrics of credit card ML projects are proprietary, we can instead estimate overall success by analyzing aggregate data. Typically, credit card delinquencies (“DQ”) turn into charge-offs (“CO”) within 90 days. Thus, a “simple” success metric could be CCFraud = (Charge-Off – Delinquencies). Note: If the FIs didn’t have meaningful losses due to fraud, then DQ >= CO by definition.
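As a rough illustration of the CCFraud metric, the following Python sketch computes it per quarter. The rates are hypothetical placeholders, not actual bank data, and the helper name `fraud_proxy` is an assumption made for this example.

```python
# Hypothetical quarterly rates in percent; NOT actual bank data.
quarters = ["Q1", "Q2", "Q3", "Q4"]
charge_off_rate = [3.6, 3.7, 3.5, 3.8]    # CO, % of credit card loans
delinquency_rate = [2.4, 2.5, 2.4, 2.6]   # DQ, % of credit card loans

def fraud_proxy(co, dq):
    """CCFraud = CO - DQ for each period, rounded to 2 decimals."""
    if len(co) != len(dq):
        raise ValueError("series must have equal length")
    return [round(c - d, 2) for c, d in zip(co, dq)]

cc_fraud = fraud_proxy(charge_off_rate, delinquency_rate)
for q, value in zip(quarters, cc_fraud):
    print(f"{q}: CCFraud proxy = {value:.2f} percentage points")
```

A positive value means charge-offs exceed delinquencies in that quarter, which this post reads as a sign of fraud-related (and other) losses.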

The chart below shows the quarterly DQ rate on credit card loans by the top 100 banks (red line) and the difference between DQ and CO (blue line = DQ – CO, the negative of the CCFraud metric). The blue line is both negative and stable, and it never gets close to zero. This shows that portfolio CO is greater than DQ (probably due to fraudulent transactions and other items). Additionally, the stability of the blue line suggests that the FIs have reached the minimum possible fraud level (probably by applying really expensive technology) and that models are unable to reduce it any further.

Let’s look at another chart showing the quarterly DQ rate on credit card loans by banks other than the top 100 (regional and community banks). As shown below, there is certainly more volatility in DQ and CO. Notice also that the difference between DQ and CO (our estimate of fraud) is significantly more volatile than the same difference for the top 100 banks. One can guess (confidently) that the smaller banks are either not using ML technology or not updating their ML models in a timely fashion.
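One simple way to put a number on "more volatility" is the standard deviation of the DQ – CO difference for each group. The Python sketch below uses made-up series purely for illustration; the actual chart values are not reproduced here, and the choice of standard deviation as the volatility measure is an assumption of this example.

```python
from statistics import pstdev

# Hypothetical DQ - CO differences in percentage points; NOT actual bank data.
top_100_diff = [-1.2, -1.1, -1.2, -1.3, -1.2, -1.1]
other_banks_diff = [-0.4, -1.9, -0.8, -2.5, -0.2, -1.6]

def volatility(series):
    """Population standard deviation as a simple volatility measure."""
    return pstdev(series)

top_vol = volatility(top_100_diff)
other_vol = volatility(other_banks_diff)
print(f"top 100 banks: {top_vol:.3f}")
print(f"other banks:   {other_vol:.3f}")
print("other banks more volatile:", other_vol > top_vol)
```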


The ML applications used by the top 100 banks are certainly able to reduce credit card loan charge-offs, but they are not able to eliminate them. Institutions outside the top 100 banks can benefit from adding ML models to their workflow or updating any ML applications they already have. Both types of institutions can reduce losses further by incorporating one or all of the recommended solutions.

Data Science Finance Machine Learning R

Quantitative Modeling (Finance)

Business Problem

Build a simple model to estimate how much in charge-offs we can expect at any level of portfolio loan delinquency (time independent).

Regression Models Analyzed (in R)

  1. Linear Regression
  2. Non-Linear Regression
  3. Support Vector Machine
  4. K-Nearest Neighbor (KNN)
  5. Classification and Regression Tree (CART)
  6. Random Forest


RMSE Comparison
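For readers unfamiliar with the metric, here is a minimal sketch of how an RMSE comparison across regression models can be set up. It is written in plain Python rather than R, covers only two of the model types listed above (linear regression and KNN), and uses hypothetical data points; it does not reproduce the results of the original analysis.

```python
import math

# Hypothetical (delinquency %, charge-off %) pairs; NOT actual portfolio data.
train = [(1.0, 1.6), (1.5, 2.1), (2.0, 2.9), (2.5, 3.4),
         (3.0, 4.2), (3.5, 4.8), (4.0, 5.9)]
test = [(1.2, 1.8), (2.2, 3.1), (3.8, 5.4)]

def fit_linear(points):
    """Ordinary least squares for y = a + b * x; returns a prediction function."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_knn(points, k=3):
    """k-nearest-neighbor regression: average y of the k closest x values."""
    def predict(x):
        nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def rmse(model, points):
    """Root mean squared error of a model on held-out (x, y) points."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in points) / len(points))

models = {"linear": fit_linear(train), "knn (k=3)": fit_knn(train, k=3)}
scores = {name: rmse(model, test) for name, model in models.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: RMSE = {score:.3f}")
```

A lower RMSE on the held-out points means predicted charge-offs sit closer to the observed ones. The same loop extends to any other model that exposes a prediction function.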


Markets crashing; curve is inverting: Is recession next?

This article explores the possibility of a U.S. recession and offers suggestions to protect investments during these volatile times.

Today each of the major U.S. indices lost close to 3% of their value. The selling was broad based and indiscriminate. More importantly, the U.S. Treasury curve between two-year and ten-year maturities inverted briefly intra-day and closed at a low of 0.4 basis points spread. At market close the ten-year Treasury yielded about 1.581 percent and two-year Treasury yielded about 1.577 percent. During the day, every financial news channel was buzzing with discussions about the coming recession. This was primarily due to the near inversion of the 2-10 curve, which has been a reliable leading indicator of coming recessions in the past.
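The spread arithmetic above is easy to check. The short Python sketch below uses the two closing yields quoted in the text; the helper name `spread_bps` is an assumption made for this example.

```python
def spread_bps(long_yield_pct, short_yield_pct):
    """Yield spread in basis points (1 basis point = 0.01 percentage point)."""
    return round((long_yield_pct - short_yield_pct) * 100, 2)

ten_year = 1.581  # closing yield in percent, from the text
two_year = 1.577  # closing yield in percent, from the text

spread = spread_bps(ten_year, two_year)
print(f"2-10 spread: {spread} bps, inverted: {spread < 0}")
```

A negative spread would mean the curve is inverted. At the close it was still marginally positive.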

So, what should an average investor do? Here are a few tips when investing during turbulent times.

  1. Turn off the TV, radio and financial news. Don’t log into your brokerage accounts to check positions or how much money you have lost or gained. When markets are volatile, even the best traders have a tough time predicting trends and day-to-day movements. Timing the market when buying or selling is a well-researched losing proposition. The old Wall Street adage of “Bulls make money; Bears make money; Pigs get slaughtered” is very much applicable here.
  2. Don’t initiate any new positions or sell existing positions to reduce risk. Much of public market movement is driven by algorithms that think and trade in milliseconds. Many programs use momentum trading strategies, meaning they sell when markets are falling and buy when markets are rising. Thus, these automated trading strategies can exaggerate gains or losses and give false signals on where the markets are headed. The algos are also not emotional about making or losing money (their programmers sure are).
  3. When markets are volatile, they tend to be volatile for a while. How long? Could be six months or a year or two. There is no way to predict. If there is an investment that you have been waiting to get into or out of, wait until the markets are calm for a few months (signaling that markets are switching to calm mode that may also continue for a while!).
  4. Public markets are demonstrating abnormal levels of sensitivity to central bank policies and political headlines. Thus, opportunistic private investments are far more lucrative and offer lower volatility. These do come at the expense of liquidity though. If you are a qualified institutional investor or an accredited investor, ask your financial advisor for some ideas. There are some amazing opportunities available for sure!

The bottom line is that whatever you do or don’t do, just don’t panic. My speculation is that the 2-10 curve leading indicator is, for the most part, distorted by central bank policies, and that we are not headed into a recession because the underlying economy is strong. So, let’s not wish ourselves into a recession by panicking!