Gartner recently reported[i] that nearly 50% of data projects fail on a regular basis. For every successful big data analytics implementation, there are hundreds of projects that start with a bang and end with an empty Hadoop cluster. How can an organization ensure that it is not spending time and money on a project that doesn't return results?

A successful data project can often be thought of as a process similar to panning for gold. The flow of data between and through various departments in a company is analogous to water, and the important metrics for decision-making are the nuggets of gold that can help consumers make better choices[ii], drive better decision-making[iii] across departments, and even save lives[iv].

As with a real-life mining operation involving large teams and cross-functional communication, the process of "mining" the flow of data can become complicated for a number of reasons. In a real gold mining operation, it could be that the pipe through which the water flows was built to the wrong spec, or that no one is keeping track of where the water is flowing, or that the miners at the end of the line don't quite know which nuggets to keep and which to throw back into the current.

Similarly, there are numerous reasons data projects can fail, from picking the wrong technology, to time constraints, to conflicting management philosophies.

Data project success rests on three key points:

  • Creating a data project champion
  • Involving stakeholders closely throughout the process, and
  • Choosing the right tools for the job

Project Leadership - Picking a data champion

Every project needs someone who champions its success because they understand its importance to the company, and who can influence management input into the process. Often, executives will perceive the need for better insight and leave it to other teams to plan implementation, moving on to more urgent, tactical business needs. But a big data project requires a potentially large pool of people to cooperate and shed light on the data: engineers, data scientists, system administrators, and data governance and security teams. All of these individuals need to be driven by a single overarching goal, often coming from a sponsor who is close to core company goals and can offer input to drive the project in the right direction. Having a leader champion the data project ensures that one person regularly assesses results and gives teams the incentive to move forward.

Stakeholder Involvement

It's also important to involve the stakeholders who will be on the receiving end of the data or analysis. The miners building the pipe to empty the river have a different vision than the ones who install the screens to dredge the silt, who in turn have an entirely different view than the gold diggers sifting through the silt at the end of the line. The ones examining the end result are the most important stakeholders, and they have specific needs. How do they like to receive data? Do they need a flat file for further analysis, a snapshot dashboard available on mobile, or a readout on a conference call every day? What kinds of metrics are important to them, and why? Starting with these questions and building the system backwards makes it far more likely that the data is used on a daily basis. While talking to these stakeholders, it's important to ask: "Is what we're building useful?" "Do these requirements for project leadership align with your team's internal needs?" "Once we've built the platform and created the data analysis, what do we expect to learn, and how can we adjust our expectations when we don't?"

Correct Toolset

After the heavy lifting of the strategic work has been done, the technical architecture is the final critical step. In many large organizations today, the fire hose of water requires a big data solution, but in smaller teams, the overhead of such a solution may be too much to maintain. Having the correct toolset is important. If engineers are building APIs that business analysts can't access without SQL queries, for example, the data pipeline might work, but it's not generating any insights. Even if there is buy-in from management and stakeholders are enthusiastic about the data, sporadic access to the data, or a pipeline too small to handle the amount of data the organization generates, can still hold up the project.
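As a rough illustration of closing the gap between a working pipeline and analysts who don't write SQL, the sketch below wraps a query in a small export function that produces a flat file. Everything in it is an assumption made for the example: the `orders` table, its columns, the sample rows, and the `export_metric_to_csv` helper are hypothetical, and an in-memory SQLite database stands in for whatever store an organization actually uses.

```python
import csv
import io
import sqlite3


def export_metric_to_csv(conn, out):
    """Write daily order totals as CSV so analysts need no SQL (hypothetical helper)."""
    rows = conn.execute(
        "SELECT order_date, SUM(amount) AS total "
        "FROM orders GROUP BY order_date ORDER BY order_date"
    )
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["order_date", "total"])  # header row for spreadsheet users
    writer.writerows(rows)


# Hypothetical sample data standing in for a real warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2024-01-01", 10.0), ("2024-01-01", 5.0), ("2024-01-02", 7.5)],
)

buffer = io.StringIO()  # a real export would write to a file or shared drive
export_metric_to_csv(conn, buffer)
print(buffer.getvalue())
```

The point of the design is that the query lives in one place, maintained by engineers, while the people reading the end result receive the format they asked for.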

Having all three of these components in place puts an organization in a strong position to make use of its wealth of data.