Machine Learning Interpretability: Peering Inside the Black Box

Machine Learning Models or Magic Black Boxes

Machine learning models often operate like magic, capable of producing highly accurate insights from enormous datasets in seconds. These insights are rapidly changing business: driving decision-making, growing revenue, shaping trends and opening new avenues for research and development. It's no wonder that companies are scrambling to hire top-tier data scientists and data engineers to ensure they're not left behind.

The underlying assumption is that if these scientists understand the science and mathematics behind machine learning, they can build a highly accurate model that gives their company a competitive edge, even if the scientists are the only ones who truly understand how it works. However, while data scientists and engineers have many tools, the reality is that if the models are not designed with interpretability in mind, even they are left peering helplessly into the dark.

What Does It Mean to Be Interpretable?

Interpretability is the science of making machine learning models understandable to everyone, not just to technical experts. This doesn't mean that every facet of the model is understood, but it does mean that the results of the model are understandable and explainable. At the very least, anyone should be able to explain why the model gave the output it did. The more understandable and explainable a model is, the more interpretable it is. However, this comes at a cost: as an algorithm's complexity and accuracy increase, its interpretability tends to decrease.

To better understand why this is the case, we need to understand what machine learning is at a high level and how these models relate to it. Machine learning combines linear algebra and statistics to create reusable algorithms for solving problems. The algorithms - Random Forest, K-nearest Neighbors and Linear Regression, just to name a few - can be implemented in many languages, including R, Python and Java, and with frameworks like TensorFlow and Caffe. The results of these machine learning algorithms acting on and learning from data are called models. While data scientists can understand the algorithms and the data, they can't necessarily understand how a particular model is being modified and weighted by the data. Algorithms also vary in their degree of interpretability. With higher-accuracy algorithms like neural networks and boosted trees, more and more of the internal structure is hidden, obscuring the mechanisms driving them.
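To make the distinction between an algorithm and a model concrete, here is a minimal sketch in Python. The algorithm is ordinary least squares; the model is the slope and intercept it learns from the data. The feature names and numbers are made up purely for illustration, and `fit_line` and `predict` are hypothetical helper names. The point is that this kind of model is interpretable because its learned parameters can be read directly.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Illustrative, made-up data: income and credit limit, both in thousands.
income_k = [30, 45, 60, 75, 90]
credit_limit_k = [3.1, 4.4, 6.2, 7.4, 9.0]

# The "model" is nothing more than these two learned numbers.
slope, intercept = fit_line(income_k, credit_limit_k)

def predict(income):
    """Every prediction can be explained as intercept + slope * income."""
    return intercept + slope * income

print(f"each extra 1k of income adds about {slope:.3f}k of credit limit")
```

A neural network trained on the same data would also be "just numbers", but thousands of them interacting nonlinearly, which is where the readable explanation is lost.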

Why Peer Inside the Box?

This is not to say that companies should sacrifice accuracy for interpretability. In some situations, it might be okay for a company to design a black box with little to no interpretability that gives users highly relevant purchase suggestions or entertainment selections. It provides value to the customer and revenue to the company while the risk associated with getting something wrong remains low. However, if that black box is responsible for loan approvals or financial risk assessment, that lack of interpretability starts to become a problem.

Leaving aside the business risks for a moment, the legal implications alone should be cause for concern. In loan approval, for example, a bank can't simply say, "Due to our machine learning model, you don't qualify." The bank must be able to point to specific things like credit score and income to adhere to the Equal Credit Opportunity Act (ECOA) and Fair Housing Act (FHA). As for the business concerns, it's hard to identify the risks that exist in a company's projections if it doesn't know how those projections are derived. What makes the other company worth acquiring? What makes this credit card activity suspicious? These questions become unanswerable with an uninterpretable machine learning model.

Who Benefits from Interpretability?

While a model's interpretability, or lack thereof, may not always impose such legal or financial risks, it is still important to multiple parties for many reasons.

Data scientists benefit from visibility into a model, which enables fine-tuning and increases accuracy and reliability. It provides a window into connections and correlations, sparking new insights and uncovering unexplored avenues for further research. It generates trust between the data scientist and the consumer of the model, whether that is a user or a business.
Users benefit from gaining insight into what shapes their experiences with machine learning. If recommendations are poorly tuned to an individual, this insight can empower them to enhance their own experiences with it. It provides feedback that enables behavioral adjustment.
Regulators benefit from confidence that the systems in place are transparent, fair and explainable. This enables biases and prejudices (intentional or unintentional) to be weeded out.
C-Suites benefit from the ability to communicate to business partners and shareholders on how this technology is driving innovation and producing value. It allows them to concentrate on the big picture and worry less about metrics that might not be as valuable as the insights provided by the model.

How to Peer Inside the Box

The good news is that if your company is already using a model based on a complex uninterpretable algorithm, there are tools that can be used to shed some light on what is happening inside.

LIME, or Local Interpretable Model-Agnostic Explanations, is an open source tool for probing a model to determine which inputs are driving a particular prediction. The main method of discovery is perturbation: making many small changes to the inputs around that prediction and recording how the model's output responds. Then, a simple linear model, often called a surrogate model, is fitted to these perturbed inputs and outputs, with more weight given to the samples closest to the original input. The surrogate's coefficients serve as the explanation.
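The perturb-then-fit idea can be sketched in a few lines of plain Python. This is a simplified illustration of the technique, not the API of the `lime` package: the `black_box` function, the Gaussian perturbation, the kernel width and all function names are assumptions chosen for the example.

```python
import math
import random

def black_box(x):
    """Hypothetical opaque model: a nonlinear score over two features."""
    return 1.0 / (1.0 + math.exp(-(3.0 * x[0] - 0.5 * x[1] ** 2)))

def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w

def explain_locally(model, instance, n_samples=500, scale=0.1,
                    kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear surrogate around one instance."""
    rng = random.Random(seed)
    d = len(instance)
    # Accumulate the weighted normal equations (Z^T W Z) beta = Z^T W y,
    # where each row of Z is [1, offsets from the instance].
    ztwz = [[0.0] * (d + 1) for _ in range(d + 1)]
    ztwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        offsets = [rng.gauss(0.0, scale) for _ in range(d)]
        sample = [v + o for v, o in zip(instance, offsets)]
        # Samples closer to the instance get more weight.
        dist_sq = sum(o * o for o in offsets)
        weight = math.exp(-dist_sq / kernel_width ** 2)
        row = [1.0] + offsets
        y = model(sample)
        for i in range(d + 1):
            ztwy[i] += weight * row[i] * y
            for j in range(d + 1):
                ztwz[i][j] += weight * row[i] * row[j]
    beta = solve(ztwz, ztwy)
    return beta[0], beta[1:]  # local intercept, per-feature coefficients

intercept, coefs = explain_locally(black_box, [0.2, 1.0])
print("local feature effects:", coefs)
```

The sign and size of each coefficient indicate how the prediction moves as that feature changes near this one input. The explanation is local: a different input can produce very different coefficients from the same model.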

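DeepLIFT (Deep Learning Important FeaTures) is another tool, designed specifically to work on neural networks. It compares the activation of each neuron for a given input with its activation for a chosen reference input. Then, in a backpropagation-style pass from the output node back to the input nodes, it assigns each neuron, and ultimately each input feature, a contribution score for the difference in the prediction.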
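A rough sketch of this idea on a toy network follows. It applies DeepLIFT's "rescale" rule to a network with one hidden ReLU layer. It is not the API of the `deeplift` package, and the weights, the all-zero reference input and the function names are all hypothetical values chosen for illustration.

```python
def relu(v):
    return v if v > 0 else 0.0

# Hypothetical tiny network: 3 inputs -> 2 hidden ReLU units -> 1 linear output.
W1 = [[0.8, -0.5], [-0.3, 0.9], [0.5, 0.4]]  # W1[i][j]: input i -> hidden j
B1 = [0.1, -0.2]
W2 = [1.5, -1.0]                             # hidden j -> output
B2 = 0.05

def forward(x):
    """Return hidden pre-activations, hidden activations and the output."""
    pre = [B1[j] + sum(x[i] * W1[i][j] for i in range(len(x)))
           for j in range(len(B1))]
    act = [relu(p) for p in pre]
    out = B2 + sum(a * w for a, w in zip(act, W2))
    return pre, act, out

def deeplift_rescale(x, reference):
    """Contribution of each input to (output at x - output at reference)."""
    pre, act, out = forward(x)
    pre0, act0, out0 = forward(reference)
    # Rescale rule: each ReLU's multiplier is the ratio of its change in
    # activation to its change in pre-activation, relative to the reference.
    multipliers = []
    for j in range(len(B1)):
        d_pre = pre[j] - pre0[j]
        if abs(d_pre) > 1e-9:
            multipliers.append((act[j] - act0[j]) / d_pre)
        else:
            # No change at this unit: fall back to the ordinary gradient.
            multipliers.append(1.0 if pre[j] > 0 else 0.0)
    # Chain the multipliers back from the output to each input feature.
    contributions = []
    for i in range(len(x)):
        m_i = sum(W1[i][j] * multipliers[j] * W2[j] for j in range(len(B1)))
        contributions.append((x[i] - reference[i]) * m_i)
    return contributions, out - out0

contribs, delta = deeplift_rescale([1.0, 2.0, -1.0], [0.0, 0.0, 0.0])
print("per-feature contributions:", contribs)
print("difference from reference:", delta)
```

A useful sanity check is that the contributions add up to the difference between the prediction and the reference prediction, so the explanation accounts for the whole change in output rather than an arbitrary part of it.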

Where We Go from Here

The problem of interpretability is becoming more widely discussed and studied as companies start to put more complex models into production. At the recent Strata Data Conference in San Jose, numerous sessions on this topic were held by experts in the field. Mike Lee Williams from Cloudera Fast Forward Labs and Evan Kriminger from ZestFinance drew crowds of attendees trying to figure out how to solve this problem within their own organizations. The tools mentioned above are a start, but the industry has a long way to go before these tools are mature and interpretability in machine learning is ubiquitous. As an industry, we must start trying to understand these black boxes if we ever hope to make machine learning truly transparent, fair and meaningful.
