Big Data – Big Governance

Why do you need data governance?

In recent years, an avalanche of data has been pouring into many companies, and there is more to come. Big Data is getting bigger. Any given company might consume raw data from different sources and in different formats, some of it streaming and some of it delayed.

In many companies, the “anarchy” works up to a point, and then things start breaking down.

A lot of companies now realize that there is a real need for proper data governance and clearly outlined processes. But it is not always easy to work out what exactly the governance should look like in order to work, or how to get there.

There are a lot of ambiguous schematic charts about Data Governance, but little explanation of what the whole point of it is.

To answer in simple terms WHY you need data governance:

The data should be:

  • Easily findable
  • Easily accessible
  • High quality (Accurate, complete and consistent)
  • Easy to work with
  • Secured against unauthorized access
  • Compliant with regulations
  • Archived when near-active
  • Properly discarded when obsolete

In addition:

  • There should be no operational friction

That’s pretty much it!

The trick is that everything should work like a well-oiled machine: the data should be well mapped, there should be no unplanned system outages caused by mismanaged data, and the process should be smooth and well documented.
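
The archive and discard items on the list are easy to turn into an explicit rule. Here is a minimal sketch, assuming that "near-active" means data nobody has used for a while and that "obsolete" means data unused for much longer. The thresholds and the names `ARCHIVE_AFTER`, `DISCARD_AFTER`, and `lifecycle_action` are made up for illustration; a real retention policy would define its own.

```python
from datetime import date, timedelta

# Hypothetical thresholds; the actual values belong in the retention policy.
ARCHIVE_AFTER = timedelta(days=365)        # unused for a year: archive
DISCARD_AFTER = timedelta(days=365 * 7)    # unused for seven years: discard


def lifecycle_action(last_used: date, today: date) -> str:
    """Return 'keep', 'archive', or 'discard' based on time since last use."""
    idle = today - last_used
    if idle >= DISCARD_AFTER:
        return "discard"
    if idle >= ARCHIVE_AFTER:
        return "archive"
    return "keep"


if __name__ == "__main__":
    today = date(2024, 6, 1)
    for last_used in (date(2024, 1, 1), date(2022, 1, 1), date(2010, 1, 1)):
        print(last_used, lifecycle_action(last_used, today))
```

Writing the rule down this way, even as a toy, forces the business side and the IT side to agree on the actual numbers.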

Now how to achieve it?

Well 🙂 that is another story:

Designing the whole process can take a while. Here are some steps to make it happen:

  • Metrics for measuring Data Governance success should be identified (number of issues opened, closed, or in progress; number of data fields standardized; and many more)
  • Data should be classified, starting with a business glossary (people have to be on the same page)
  • Data cleaning processes for quality should be outlined, and the usual sources of problems checked (initial system conversion, data consolidation, manual data entry, batch feeds, real-time interfaces, others)
  • Data connections should be specified and Master Data Management outlined
  • On the business side: how the data is created, used, updated, and disposed of should be specified (policies for data retention, access, etc.), with BI and DW components accounted for
  • On the IT side: how and where the data is maintained, stored, archived, and disposed of should be specified (functional flow diagrams with conceptual / logical / physical architectures, considering all IT and business aspects including policies, IT-optimized methods, DB management, security, etc.)
  • Who are the accountable parties
  • Who are the decision makers
  • Who are the data stewards
  • A communication protocol should be specified (meetings, emails, etc.)
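
To make the first step concrete, here is a minimal sketch of how the metrics named above (issues opened, closed, in progress, fields standardized) could be computed. The record layout and the names `governance_metrics`, `status`, and `standardized` are assumptions for illustration only; they do not come from any particular governance tool.

```python
from collections import Counter


def governance_metrics(issues, fields):
    """Summarize governance progress.

    issues: list of dicts with a 'status' key (assumed values include
            'closed' and 'in_progress'; every listed issue counts as opened).
    fields: list of dicts with a boolean 'standardized' key.
    """
    status_counts = Counter(issue["status"] for issue in issues)
    standardized = sum(1 for field in fields if field["standardized"])
    # Guard against division by zero when no fields are tracked yet.
    pct = round(100 * standardized / len(fields), 1) if fields else 0.0
    return {
        "issues_opened": len(issues),
        "issues_closed": status_counts["closed"],
        "issues_in_progress": status_counts["in_progress"],
        "fields_standardized": standardized,
        "fields_standardized_pct": pct,
    }


if __name__ == "__main__":
    sample_issues = [
        {"status": "closed"},
        {"status": "in_progress"},
        {"status": "open"},
    ]
    sample_fields = [
        {"standardized": True},
        {"standardized": False},
        {"standardized": True},
        {"standardized": True},
    ]
    print(governance_metrics(sample_issues, sample_fields))
```

Tracking the same few numbers from one project to the next is what lets you say whether the governance effort is actually working.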


Create a company Data Governance Workbook. Start small, with something simple (one project or initiative), but keep the whole company in perspective. Develop a conceptual design, test it, modify it if needed, implement it, and monitor its success. Once you have a working framework, apply it to the next project.

It can be quite a long process, but in the long run it is definitely worth the money spent.

Good luck! 🙂