Accurate predictions remain the Holy Grail of every business intelligence initiative. Reporting data to show what definitely happened is beneficial. Analyzing data to determine why things happened the way they did is even better. But having the confidence to predict what will happen in time to take action and alter outcomes demonstrates the real power of data.

Despite the obvious value, organizations are just now achieving predictive analytics productivity according to Gartner. Maybe the delay was a lack of data or the quality of the available data. Maybe the delay was a lack of tools or skills. Maybe the delay was simply a lack of understanding the tools available and the skills needed to leverage those tools.

Many times I have spoken on the power of Microsoft's data mining tools only to discover the audience was unaware of those capabilities. Even trusted users of SQL Server Integration Services and SQL Server Reporting Services remain unaware of the data mining capabilities built into SQL Server Analysis Services (SSAS).

SSAS has included data mining functionality for a long time. One of the complaints against SSAS data mining is the functionality has not evolved much since SQL Server 2005. That consistency might suggest the functionality is stagnant or it might mean one thing less to worry about when you upgrade your SQL Server platform.

As the high-level diagram below shows, a SSAS data mining process includes a mix of data sources, data source views, mining structures, mining models, viewers and outputs. Mining structures organize the superset of column data into unique cases to be mined. Each mining structure includes one or more mining models. The mining models are the mix of included case data, algorithms, filters and properties that actually produce predictions.

Since SQL Server 2005, SSAS has included 9 data mining algorithms for mining model use. Each algorithm supports various use cases with certain input and output limitations. The following table introduces the out-of-the-box algorithms. You can integrate your own custom data mining algorithm into SSAS, but I have not yet seen a need for custom algorithms beyond the stock ones described below.

Algorithm Name

Description

Use Case

Associations

Recommendation engine based on sales or interests

Market basket

Clustering

Groups similar sets of data to recognize anomalies

Fraud detection

Decision Trees

Most popular method for predicting discrete and continuous values from discrete and continuous inputs

Almost any prediction

Linear Regression

Linear relationship between continuous numeric input and predicted variables

Continuous numeric prediction

Logistic Regression

Simplified version of Neural Network

Not widely used in business predictions

Naïve Bayes

Uses and predicts only discrete values

Quick, simple analysis

Neural Network

Complex logic based on artificial intelligence research

Not widely used in business predictions

Sequence Clustering

Groups similar sets of sequential data

Web site click paths

Time Series

Forecasts continuous values over time

Quarterly visitor counts

The algorithms I use most include Decision Trees, Clustering, Associations and Time Series.

Since each algorithm has certain strengths and limitations, I encourage you to include multiple algorithms in your analysis then compare the predicted results to determine which algorithm best meets your needs. SSAS even includes a results comparison tool which plots out the successes of one predicted result set versus another.

SSAS's wizard-driven data mining tools with easy-to-understand visual results simplify predictive analytics. Of course that simplicity also makes the analytical process something of a black box. You put data in, you get results out and you hope your loss of flexibility and control is worth the simplicity. As long as you test and understand your results, easier predictive analytics beat no predictive analytics.

I am not arguing against statisticians or proven advanced analytical tools such as SAS. Businesses can use those skills and technologies. My point is businesses should right-size their solution. Not every business can have a skilled statistician on staff with dedicated technology.

Businesses with a licensed SQL Server Analysis Services instance, quality data and a data analyst who understands correlation from causality and probabilities from certainties can produce worthwhile predictions.

In Part II of this series, I will explain just how easy it is to leverage Microsoft's data mining tools.