The recent data breaches at organizations such as Target, Home Depot, and the USPS have raised awareness of the importance of securing data assets. The question everyone wants answered is: 'How do I make sure we're not next?' While there may not be one definitive answer, there are steps you can take to better understand your organization's data architecture and implement a tailored solution that supports a more secure data governance model.

In the age of the customer, more and more organizations are recognizing the high value of data and moving to big data platforms to capture, store, and mine massive amounts of it. They are using this data to seek a competitive edge; to reduce fraud, waste, and abuse; and to provide a better customer experience, all while reducing software and hardware costs. Hadoop is clearly mainstream and here to stay, solving complex problems in industries such as financial services, healthcare, and utilities.

Using Hadoop to manage growing data volumes and a growing number of users accessing that data presents an entirely new set of challenges. To address them, organizations must maintain a strong data governance model without sacrificing the business agility required to turn this data into high-value information.

Hadoop security is still immature, and if governance is ignored, data can easily be compromised. To govern the data stored in Hadoop, we must first understand what that data contains. Only then can we discuss how it is secured, how it is used, and by whom. Data governance has always sought to address these questions, but in the era of big data it is even more important: in the blink of an eye, organizations can lose track of sensitive company and consumer information and find themselves on the front page.
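As a minimal sketch of what "understanding what the data contains" can look like in practice, the Python below samples records and tags columns whose values look like sensitive data. The regular expressions, the 0.8 match threshold, and the `classify_columns` helper are hypothetical illustrations, not the behavior of any product mentioned in this post; real classifiers use far richer rules and run against files in HDFS rather than an in-memory list.

```python
import re
from typing import Dict, List

# Hypothetical patterns for a few common kinds of sensitive data.
# These are deliberately simple and for illustration only.
PATTERNS = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "card_number": re.compile(r"^\d{4}(?:[ -]?\d{4}){3}$"),
}


def classify_columns(rows: List[Dict[str, str]], threshold: float = 0.8) -> Dict[str, str]:
    """Tag a column with a sensitive-data label when at least `threshold`
    of its non-empty sampled values match that label's pattern."""
    tags: Dict[str, str] = {}
    columns = {col for row in rows for col in row}
    for col in sorted(columns):
        values = [row[col] for row in rows if row.get(col)]
        if not values:
            continue  # nothing to judge this column by
        for label, pattern in PATTERNS.items():
            hits = sum(1 for v in values if pattern.match(v))
            if hits / len(values) >= threshold:
                tags[col] = label
                break  # first matching label wins
    return tags


sample = [
    {"name": "Ana", "contact": "ana@example.com", "tax_id": "123-45-6789"},
    {"name": "Ben", "contact": "ben@example.org", "tax_id": "987-65-4321"},
]
print(classify_columns(sample))  # {'contact': 'email', 'tax_id': 'ssn'}
```

Once columns carry tags like these, the questions of how the data is secured, how it is used, and by whom can be asked per tag rather than per file.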

Beyond addressing security and compliance issues, data governance and stewardship, when done right, can deliver a rich catalog of technical and business metadata. That catalog helps data scientists, analysts, and developers discover and understand the data in Hadoop. Neglecting governance and stewardship can quickly turn the vision of a data lake into a data swamp.

Providing easy access to data, increasing ROI, automating data governance, and improving data discovery were difficult in the relational database world, and they are even more challenging in the fast-moving, still-maturing big data space. Hadoop was designed at its core to be extremely flexible, file-based, and schemaless, which makes governance and stewardship harder still.

Fortunately, tools now exist to address governance and security concerns within Hadoop:

Cloudera Navigator

  • The only tool that provides column-level lineage (for supported tools) and file-based lineage within the Hadoop ecosystem

Teradata Loom

  • Provides lineage (for data transformed in Loom) and schema inference (for specific file types)

Waterline Data Science

  • Focuses strongly on data quality and business metadata capture in Hadoop

Apache Falcon

  • A framework to define, schedule and monitor data management policies in Hadoop

Apache Sentry

  • Enables fine-grained, role-based authorization for data and metadata stored on a Hadoop cluster

Apache Ranger

  • Provides a central security policy administration and auditing function for the Hadoop ecosystem

Apache HCatalog / HiveMetastore

  • A metadata and table management system for the Hadoop platform
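To make column-level lineage concrete, here is a minimal Python sketch that assumes lineage has already been captured as a mapping from each derived column to the columns it was computed from. The table and column names, the `LINEAGE` mapping, and the `upstream_sources` helper are hypothetical; this is not the API or data model of Cloudera Navigator or any other tool listed above. A breadth-first walk up the graph reveals which raw fields ultimately feed a report.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage edges: derived column -> columns it was computed from.
# A lineage tool would record these automatically; here they are hand-written.
LINEAGE: Dict[str, List[str]] = {
    "report.customer_risk": ["staging.customers.tax_id", "staging.orders.total"],
    "staging.customers.tax_id": ["raw.crm_export.ssn"],
    "staging.orders.total": ["raw.pos_feed.amount"],
}


def upstream_sources(column: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Return the original source columns (those with no recorded parents)
    that ultimately feed `column`, using a breadth-first walk."""
    sources: Set[str] = set()
    seen: Set[str] = {column}
    queue = deque([column])
    while queue:
        current = queue.popleft()
        parents = lineage.get(current, [])
        if not parents and current != column:
            sources.add(current)  # a leaf: nothing upstream of it is recorded
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sources


print(sorted(upstream_sources("report.customer_risk", LINEAGE)))
# ['raw.crm_export.ssn', 'raw.pos_feed.amount']
```

In this toy graph the risk report traces back to a raw SSN field, so a steward would know the report inherits that field's sensitivity. Walking the same edges in the opposite direction would answer the complementary question of which downstream assets a sensitive source reaches.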

Although the market is fragmented and governance tools are continuing to mature, organizations should strategically choose the tool(s) that address their data governance concerns today, while remaining agile and prepared for change in the future.

There is no silver bullet for protecting your organization against data breaches. You can, however, place more emphasis on data governance during the design phase to improve the efficiency and transparency of data operations, increase the value extracted from your data assets, and better recognize, diagnose, and act on threats to your data and organization.


Attend CapTech and Cloudera's webinar on January 20th for an inside look at what it takes to set up an effective data governance model, and learn how to protect your organization from the growing threat of security breaches and hacks that big data brings. Register today!