Alfred - Your Data Butler

RICHMOND, VA – March 27, 2018 – CapTech, a national IT management consulting firm, is pleased to announce a new contribution to the open-source community that can solve common data ingestion, metadata, and data governance problems in modern data architectures. "Alfred – Your Data Butler" is an open-source data ingestion engine that acts as a gatekeeper to prevent ungoverned data from being loaded into a data lake. This flexible tool helps data scientists find insights more quickly, helps data stewards govern data lakes, and helps IT get insights to production faster.

CapTech, which has significant experience delivering data science and modern data architecture solutions to enterprises, developed Alfred to close a common gap in data scientists' workflow that is currently filled by already overburdened IT groups. Today, data scientists need to follow a multi-step process to have an exploratory dataset loaded into a data lake or other data store, and that process can take weeks to complete.

Alfred allows business users to upload and analyze data themselves, as well as define and prepare data for ingestion. The simple and intuitive user interface allows the user to describe the metadata needed for ingestion, governance, and search. This process automatically performs much of the technical setup and configuration, which allows data scientists to start working quickly to determine if there is hidden business value in the data.

With this automated process, data scientists only need to get IT and data stewards involved when they deem data valuable and want to promote an insight to a production environment. At this point, IT and data stewards can work with the data scientist to properly load the data. The data stewards can rest assured that the ingestion engine for Alfred is designed to validate against the rules they have defined.

Alfred has no requirement to use a specific data mining or data wrangling tool, which gives users the flexibility to adjust to evolving data technologies. "CapTech created Alfred using an open-source, vendor-agnostic approach," said Ben Harden, Principal at CapTech. "It allows users to continue to use the tools they are comfortable with today and is flexible enough to support the tools they will want to use tomorrow. Providing quick, governed access to exploratory data using familiar tools makes data scientists more productive."

"Alfred isn't meant to replace your favorite data tools," said Calli Rogers, CapTech data engineer and Alfred developer. "It isn't a data catalog that indexes your data after it's in a data store – it's a gatekeeper for incoming data that allows you to catalog your data upfront. Alfred isn't a transfer tool – it's designed to ingest data into a target location to allow your scientists and analysts to work with it. It works with the Apache Hadoop open-source stack but can be applied to any type of data sources and targets. Alfred allows your data scientists, data engineers, and data stewards to be more effective and reduce unneeded overhead."

Key benefits to using Alfred include:

  • Speed – It's designed to let the data scientist get data loaded and ready for analysis without the need for IT.
  • Discovery – Data scientists can mine data to see if it is relevant before involving IT.
  • Governance – It allows data stewards to easily understand data lineage, curate business metadata, and ultimately keep the lake clean and safe.
  • Control – When data insights are found, IT can take over and operationalize the data scientist's work.

Alfred is available on GitHub.

How Alfred saves you time