Instructor Spotlight: Theresa Kushner

Theresa Kushner is presently the Vice President of Enterprise Information Management for VMware, Palo Alto. She joined in October 2012 to help the fast-growing software company develop a firm data foundation on which to build its future business. In this issue of eLearningCurve Update, Theresa shares her thoughts on finding commonalities in the way we manage both big and "little" data.

Read full bio.

Data Governance in a Big Data World

Let’s face it – we can’t escape Big Data! We can ignore it. We can tell our management that we have it under control. We can even try to make it smaller. But the fact remains – Big Data is here to stay and getting bigger every day. In this environment, as data governance managers, we have to find ways to manage unstructured Big Data as well as we manage structured “little” data. So let’s start with the principles we hold dear about how to manage “little” data and hope that we can find some commonality, but also make sure we understand the differences. Here are just a few of the important principles managed in a data governance program.

  1. Accountability
  2. Standardization
  3. Business Alignment
  4. Maintenance
  5. Access control

Most successful governance programs begin with accountability. Who in the organization is accountable for the data that is being governed? Big data is no different from any other data in this area. But finding the person to hold accountable can be tricky. Here are a few suggestions for selecting the right one:

  1. Whose job depends on information gleaned from big data? Marketing? Product development?
  2. Whose collection of big data will be the largest? Does the web data collected by marketing outstrip the terabytes dedicated to product or support data?
  3. If several groups claim accountability for big data, who has more of a corporate perspective?

Once you’ve answered some of these key questions, make sure that the person you select understands what “accountability” means in your data governance world. Here’s a short definition: Accountability belongs to an individual who can make decisions about the collection, use and management of data.

Standardization in the big data world is almost an oxymoron. One of the defining attributes of big data is its variety, and standardizing the data itself is not necessary. What is needed, however, are good cataloging and management tools to ensure that data sources can be located and, once located, used effectively. Organizing data sources with standardized metadata tags is as close as you might get to standardization. A good metadata management tool is important for any kind of data governance.
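To make that idea concrete, here is a minimal sketch in Python of what cataloging big data sources with standardized metadata tags might look like. The entry fields, the DataSourceEntry and DataCatalog names, and the example source are all hypothetical illustrations, not the interface of any particular metadata management tool.

    from dataclasses import dataclass, field
    from datetime import date

    @dataclass
    class DataSourceEntry:
        """One catalog entry: standardized metadata tags for a big data source."""
        name: str                # e.g. "web_clickstream"
        owner: str               # the accountable individual (see principle 1)
        source_system: str       # where the data originates
        refresh_frequency: str   # e.g. "streaming", "daily", "weekly"
        tags: list = field(default_factory=list)
        registered: date = field(default_factory=date.today)

    class DataCatalog:
        """Minimal catalog: register sources and locate them by tag."""
        def __init__(self):
            self._entries = {}

        def register(self, entry: DataSourceEntry):
            self._entries[entry.name] = entry

        def find_by_tag(self, tag: str):
            return [e for e in self._entries.values() if tag in e.tags]

    # Example: register a web data source so analysts can locate it later.
    catalog = DataCatalog()
    catalog.register(DataSourceEntry(
        name="web_clickstream",
        owner="VP Marketing",
        source_system="web analytics platform",
        refresh_frequency="streaming",
        tags=["customer", "web", "unstructured"],
    ))
    print([e.name for e in catalog.find_by_tag("web")])

The point is not the tooling but the discipline: every source gets an owner, an origin, a cadence, and tags that let it be found again.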

Aligning to your business is also important to big data governance. This simply means that the information provided by the analyses of big data, and the key decisions made from it, are relevant, consistent, and appropriate for meeting your business objectives. For example, mining big data for an executive who is “just curious” undermines the governance role and consumes resources, perhaps unnecessarily.

It’s important to know from the start what part big data plays in your overall data strategy. Will analyses from big data play a contextual role in your understanding of your customers, or are they essential for understanding your web-based operations? What you decide to do with the insights you get from big data is key to how your governance program will be structured and managed.

Maintenance of big data probably represents the greatest difference from traditional structured data, especially when it comes to the metrics you are used to managing, such as completeness, consistency, and overall quality. These metrics are, for the most part, unnecessary, and the definition of consistency may have to change. With big data, consistency may mean how information flows to your analytical environment, whereas with structured data it could be about maintaining record-to-record consistency of data attributes.

In addition, with big data there is usually no requirement to maintain a referential data source against which to check accuracy. Instead, consistency and accuracy are often measured against the last stream of data you analyzed.

Consistency and accuracy do, however, have a place in ensuring that the metadata you use to tag big data sources maintains an environment for analysts that is consistent, easily accessed, and well maintained.

In the world of big data, the velocity of the data through your systems makes maintenance very difficult. Your governance program now has to contend with which data sets are to be used, and when and how often they should flow to your analytical data source. Again, look to your data strategy for guidance on how this should work for you. Do you need to absorb the data as it comes in, analyzing and providing insights immediately, or is some of the analysis done by the system itself, as with automated systems that analyze information downloaded from thousands of refrigerators in 13 different countries?
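One way to keep those decisions governable is to record them explicitly, for instance in a simple flow schedule like the sketch below. The data set names, modes, and cadences are invented for illustration, assuming each set either streams in for immediate analysis, is analyzed by the system itself, or flows in batches on a fixed cadence.

    # A minimal sketch of a governance flow schedule. All names and
    # cadences here are hypothetical examples, not recommendations.
    FLOW_SCHEDULE = {
        "web_clickstream":     {"mode": "stream", "destination": "analytics"},
        "appliance_telemetry": {"mode": "system",  # analyzed by the system itself
                                "destination": "automated_analysis"},
        "support_logs":        {"mode": "batch", "cadence_hours": 24,
                                "destination": "analytics"},
    }

    def should_flow(name: str, hours_since_last_load: float) -> bool:
        """Decide whether a data set is due to flow to the analytical source."""
        rule = FLOW_SCHEDULE[name]
        if rule["mode"] in ("stream", "system"):
            return True  # absorbed as it arrives
        return hours_since_last_load >= rule["cadence_hours"]

    print(should_flow("support_logs", hours_since_last_load=30))  # True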

Archival and retrieval may not be as great a concern either, for two reasons: 1) the analyses done on big data are highly volatile and typically have a very short shelf life, and 2) privacy and data management laws are rapidly placing limits on how long big data sets CAN be maintained. You also need to watch how often these data sets are moved across country borders. That’s a big No-No in Europe, for example.

Maria Villar, Global Vice President of SAP in charge of Data Governance, noted in a recent article for Information Management (Data Governance and Big Data): "It is critically important to develop a lifecycle management strategy that includes archiving and deletion policies, business rules and IT automation. The company will not be able to store all this incoming data forever."
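As a rough sketch of what such automated archiving and deletion policies could look like, consider the following. The retention periods and data set names are purely illustrative assumptions; the actual values must come from your own legal and business requirements.

    from datetime import date, timedelta

    # Hypothetical lifecycle rules: retention periods are illustrative only.
    RETENTION_POLICY = {
        "web_clickstream":     {"retain_days": 90,  "then": "delete"},
        "appliance_telemetry": {"retain_days": 365, "then": "archive"},
    }

    def lifecycle_action(data_set: str, created: date, today=None) -> str:
        """Return 'keep', 'archive', or 'delete' for a data set instance."""
        today = today or date.today()
        rule = RETENTION_POLICY[data_set]
        if today - created <= timedelta(days=rule["retain_days"]):
            return "keep"
        return rule["then"]

    print(lifecycle_action("web_clickstream", created=date(2014, 1, 1),
                           today=date(2014, 6, 1)))  # 'delete'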

Access control IS important for big data, and rules for how to manage that access should be in place BEFORE a big data source is created. Big data analysts are great miners. They can toss out lots of great finds from big data, but be aware that some of those finds may be loose correlations, not causalities, even though the data may seem to suggest otherwise. So the first rule is to ensure that those who have access to big data understand HOW it is different from mining in a structured environment. Because big data is not often used to drive financial results, there is little worry that access by itself can cause damage to your company. It’s the connection of information and the insights that are drawn from the data that cause the issues. Take Target’s faux pas with the pregnant teenager and the wrath and publicity the chain got for finding “facts” that probably should not have been capitalized on in a marketing campaign sent to her parents’ home.
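As an illustration, the sketch below shows the kind of role-based rule set that might be defined before a big data source goes live. The roles, sources, and grants are hypothetical examples, not a prescription.

    # Hypothetical access rules defined BEFORE the sources are created:
    # each role lists the big data sources it may mine.
    ACCESS_RULES = {
        "marketing_analyst": {"web_clickstream"},
        "support_analyst":   {"support_logs"},
        "data_steward":      {"web_clickstream", "support_logs",
                              "appliance_telemetry"},
    }

    def can_access(role: str, data_set: str) -> bool:
        """Check a role's access against the pre-defined rules."""
        return data_set in ACCESS_RULES.get(role, set())

    print(can_access("marketing_analyst", "support_logs"))  # False: not granted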

These five areas are just a small sample of the differences in governance between big and “little” data.

Learn More from Theresa

Data Governance Fundamentals

Data Governance for Data Stewards

Certified Data Steward

The mission of the Certified Data Steward (CDS) Program

The CDS designation makes a clear statement that you have learned from the industry leaders and demonstrated both depth of understanding and the skills to apply concepts, techniques, and practices of data stewardship, data quality, data governance, metadata management, and master data management.

For the true standard bearers in the data stewardship profession we offer the second level of CDS certification - CDS Ex. To earn the CDS Ex designation you must demonstrate a combination of great Expertise, Experience, and Excellence. Read the CDS Rulebook for detailed CDS and CDS Ex requirements.

The most convenient and cost-efficient method to enroll in the CDS program is with one of our CDS Packages. Each package includes all courses and exams necessary to earn CDS or CDS Ex in one of the tracks, all at a savings of 20% or more. Alternatively, you can simply enroll in courses and exams one at a time.

2014 International Data Quality Summit

IAIDQ and ECCMA will be hosting their first co-located conference, the International Data Quality Summit (IDQSummit), October 6-9, 2014 at the Wyndham Virginia Crossing Hotel and Conference Center in Glen Allen, just a few miles outside of historic Richmond, Virginia.

The IDQSummit will offer 12 cutting-edge tutorials, 4 tracks of presentations with 40 speakers, and 4 keynote speakers, as well as expert data quality and data governance panels, a 2-day vendor expo, and a wide variety of exciting entertainment throughout the entire conference.

Discounted registration is available for all IAIDQ members and ECCMA members. Group discounts are also available (register 3, get the 4th free). For registration details please visit: http://idqsummit.org/registration.html.

If you or a colleague attend this event, you may be eligible for a discount on eLearningCurve courses and packages related to the subject of the conference. Visit our Conference Plus page for more details.
