Structured and Comprehensive Approach to Data Management and the Data Management Book of Knowledge (DMBOK)
Alan McSweeney
Objectives
• To provide an overview of a structured approach to developing and implementing a detailed data management policy, including frameworks, standards, a project approach, team structure and maturity assessment
Agenda
• Introduction to Data Management
• State of Information and Data Governance
• Other Data Management Frameworks
• Data Management and Data Management Book of Knowledge (DMBOK)
• Conducting a Data Management Project
• Creating a Data Management Team
• Assessing Your Data Management Maturity
Preamble
• Every good presentation should start with quotations from The Prince and Dilbert
Management Wisdom
• There is nothing more difficult to take in hand, more perilous to conduct or more uncertain in its success than to take the lead in the introduction of a new order of things. − The Prince
• Never be in the same room as a decision. I'll illustrate my point with a puppet show that I call "Journey to Blameville" starring "Suggestion Sam" and "Manager Meg." You will often be asked to comment on things you don't understand. These handouts contain nonsense phrases that can be used in any situation, so let's dominate our industry with quality implementation of methodologies. Our executives have started their annual strategic planning sessions. This involves sitting in a room with inadequate data until an illusion of knowledge is attained. Then we'll reorganise, because that's all we know how to do. − Dilbert
Information
• IT systems comprise information, applications, processes, people and infrastructure
• Information in all its forms – input, processed, outputs – is a core component of any IT system
• Applications exist to process data supplied by users and other applications
• Data breathes life into applications
• Data is stored and managed by infrastructure – hardware and software
• Data is a key organisation asset with a substantial value
• Significant responsibilities are imposed on organisations in managing data
Data, Information and Knowledge
• Data is the representation of facts as text, numbers, graphics, images, sound or video
• Data is the raw material used to create information
• Facts are captured, stored, and expressed as data
• Information is data in context
• Without context, data is meaningless – we create meaningful information by interpreting the context around data
• Knowledge is information in perspective, integrated into a viewpoint based on the recognition and interpretation of patterns, such as trends, formed with other information and experience
• Knowledge is about understanding the significance of information
• Knowledge enables effective action
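As a concrete illustration of these definitions, the short sketch below (illustrative only; the field names and values are invented for the example) shows how a bare data value becomes information once context is attached, and how knowledge is a pattern or rule applied across such information to drive action.

```python
# Illustrative sketch: data, information and knowledge (invented example values).

# Data: a bare fact with no context.
data = 42

# Information: the same fact placed in context.
information = {"measure": "customer complaints", "period": "2010-02", "value": 42}
previous = {"measure": "customer complaints", "period": "2010-01", "value": 35}

# Knowledge: a pattern formed from information and experience,
# here a trivial trend check across two periods that can drive action.
def complaints_rising(current, prior, threshold=0.10):
    """Return True if complaints grew by more than the threshold between periods."""
    return (current["value"] - prior["value"]) / prior["value"] > threshold

if complaints_rising(information, previous):
    print("Action: investigate the rise in customer complaints")
```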
Data, Information, Knowledge and Action
• Data → Information → Knowledge → Action
Information is an Organisation Asset
• Tangible organisation assets are seen as having a value and are managed and controlled using inventory and asset management systems and procedures
• Because data is less tangible, it is less widely perceived as a real asset, assigned a real value and managed as if it had a value
• High quality, accurate and available information is a prerequisite to the effective operation of any organisation
Data Management and Project Success
• Data is fundamental to the effective and efficient operation of any solution
  − Right data
  − Right time
  − Right tools and facilities
• Without data the solution has no purpose
• Data is too often overlooked in projects
• Project managers frequently do not appreciate the complexity of data issues
Generalised Information Management Lifecycle
• Lifecycle stages: Enter, Create, Acquire, Derive, Update, Capture → Store, Manage, Replicate and Distribute → Protect and Recover → Archive and Recall → Delete/Remove, all within an overall Manage, Control and Administer activity
• Design, define and implement a framework to manage information through this lifecycle
• This is a generalised lifecycle that differs for specific information types
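To make the lifecycle concrete, the minimal sketch below models the stages and the transitions a management framework might permit. The stage names follow the slide; the specific transition rules are assumptions made purely for illustration and would differ per information type.

```python
# Minimal sketch of the generalised information lifecycle as stages plus
# allowed transitions. Transition rules are assumed for illustration only.
from enum import Enum

class Stage(Enum):
    CREATE = "Enter / Create / Acquire / Derive / Update / Capture"
    STORE = "Store / Manage / Replicate and Distribute"
    PROTECT = "Protect and Recover"
    ARCHIVE = "Archive and Recall"
    DELETE = "Delete / Remove"

# Assumed allowed transitions; a real framework defines these per information type.
ALLOWED = {
    Stage.CREATE: {Stage.STORE},
    Stage.STORE: {Stage.STORE, Stage.PROTECT, Stage.ARCHIVE, Stage.DELETE},
    Stage.PROTECT: {Stage.STORE},
    Stage.ARCHIVE: {Stage.STORE, Stage.DELETE},
    Stage.DELETE: set(),
}

def can_move(current: Stage, target: Stage) -> bool:
    """Check whether the lifecycle framework permits a stage transition."""
    return target in ALLOWED[current]

print(can_move(Stage.STORE, Stage.ARCHIVE))  # True
print(can_move(Stage.DELETE, Stage.STORE))   # False
```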
Expanded Generalised Information Management Lifecycle
• Adds Plan, Design and Specify and Implement Underlying Infrastructure phases ahead of the core lifecycle stages (Enter, Create, Acquire, Derive, Update, Capture → Store, Manage, Replicate and Distribute → Protect and Recover → Archive and Recall → Delete/Remove), all within an overall Design, Implement, Manage, Control and Administer activity
• Include phases for information management lifecycle design and implementation of appropriate hardware and software to actualise the lifecycle
Data and Information Management
• Data and information management is a business process consisting of the planning and execution of policies, practices, and projects that acquire, control, protect, deliver, and enhance the value of data and information assets
Data and Information Management
• To manage and utilise information as a strategic asset
• To implement processes, policies, infrastructure and solutions to govern, protect, maintain and use information
• To make relevant and correct information available in all business processes and IT systems for the right people in the right context at the right time with the appropriate security and with the right quality
• To exploit information in business decisions, processes and relations
Data Management Goals
• Primary goals
  − To understand the information needs of the enterprise and all its stakeholders
  − To capture, store, protect, and ensure the integrity of data assets
  − To continually improve the quality of data and information, including accuracy, integrity, integration, relevance and usefulness of data
  − To ensure privacy and confidentiality, and to prevent unauthorised or inappropriate use of data and information
  − To maximise the effective use and value of data and information assets
Data Management Goals
• Secondary goals
  − To control the cost of data management
  − To promote a wider and deeper understanding of the value of data assets
  − To manage information consistently across the enterprise
  − To align data management efforts and technology with business needs
Triggers for Data Management Initiative
• When an enterprise is about to undertake architectural transformation, data management issues need to be understood and addressed
• A structured and comprehensive approach to data management enables the effective use of data and allows the organisation to exploit it for competitive advantage
Data Management Principles
• Data and information are valuable enterprise assets
• Manage data and information carefully, like any other asset, by ensuring adequate quality, security, integrity, protection, availability, understanding and effective use
• Share responsibility for data management between business data owners and IT data management professionals
• Data management is a business function and a set of related disciplines
Organisation Data Management Function
• Business function of planning for, controlling and delivering data and information assets
• Development, execution, and supervision of plans, policies, programs, projects, processes, practices and procedures that control, protect, deliver, and enhance the value of data and information assets
• The scope of the data management function and the scale of its implementation vary widely with the size, means, and experience of organisations
• The role of data management remains the same across organisations even though implementation differs widely
Scope of Complete Data Management Function
• Data Governance
• Data Architecture Management
• Data Development
• Data Operations Management
• Data Security Management
• Data Quality Management
• Reference and Master Data Management
• Data Warehousing and Business Intelligence Management
• Document and Content Management
• Metadata Management
Shared Role Between Business and IT
• Data management is a shared responsibility between data management professionals within IT and the business data owners representing the interests of data producers and information consumers
• Business data ownership is concerned with accountability for business responsibilities in data management
• Business data owners are data subject matter experts
• Business data owners represent the data interests of the business and take responsibility for the quality and use of data
Why Develop and Implement a Data Management Framework?
• Improve organisation data management efficiency
• Deliver better service to the business
• Improve cost-effectiveness of data management
• Match the requirements of the business to the management of the data
• Embed handling of compliance and regulatory rules into the data management framework
• Achieve consistency in data management across systems and applications
• Enable growth and change more easily
• Reduce data management and administration effort and cost
• Assist in the selection and implementation of appropriate data management solutions
• Implement a technology-independent data architecture
Data Management Issues
Data Management Issues
• Discovery – cannot find the right information
• Integration – cannot manipulate and combine information
• Insight – cannot extract value and knowledge from information
• Dissemination – cannot consume information
• Management – cannot manage and control information volumes and growth
Data Management Problems – User View
• Managing Storage Equipment
• Application Recoveries / Backup Retention
• Vendor Management
• Power Management
• Regulatory Compliance
• Lack of Integrated Tools
• Dealing with Performance Problems
• Data Mobility
• Archiving and Archive Management
• Storage Provisioning
• Managing Complexity
• Managing Costs
• Backup Administration and Management
• Proper Capacity Forecasting and Storage Reporting
• Managing Storage Growth
Information Management Challenges
• Explosive Data Growth
  − Value and volume of data is overwhelming
  − More data is seen as critical
  − Annual growth rate of 50+ percent
• Compliance Requirements
  − Compliance with stringent regulatory requirements and audit procedures
• Fragmented Storage Environment
  − Lack of enterprise-wide hardware and software data storage strategy and discipline
• Budgets
  − Frozen or being cut
Data Quality
• Poor data quality costs real money
• Process efficiency is negatively impacted by poor data quality
• The full potential benefits of new systems may not be realised because of poor data quality
• Decision making is negatively affected by poor data quality
State of Information and Data Governance
• Information and Data Governance Report, April 2008
  − International Association for Information and Data Quality (IAIDQ)
  − University of Arkansas at Little Rock, Information Quality Program (UALR-IQ)
Your Organisation Recognises and Values Information as a Strategic Asset and Manages it Accordingly
• Strongly Disagree: 3.4%
• Disagree: 21.5%
• Neutral: 17.1%
• Agree: 39.5%
• Strongly Agree: 18.5%
Direction of Change in the Results and Effectiveness of the Organisation's Formal or Informal Information/Data Governance Processes Over the Past Two Years
• Results and Effectiveness Have Significantly Improved: 8.8%
• Results and Effectiveness Have Improved: 50.0%
• Results and Effectiveness Have Remained Essentially the Same: 31.9%
• Results and Effectiveness Have Worsened: 3.9%
• Results and Effectiveness Have Significantly Worsened: 0.0%
• Don't Know: 5.4%
Perceived Effectiveness of the Organisation's Current Formal or Informal Information/Data Governance Processes
• Excellent (All Goals are Met): 2.5%
• Good (Most Goals are Met): 21.1%
• OK (Some Goals are Met): 51.5%
• Poor (Few Goals are Met): 19.1%
• Very Poor (No Goals are Met): 3.9%
• Don't Know: 2.0%
Actual Information/Data Governance Effectiveness vs. Organisation's Perception
• It is Better Than Most People Think: 20.1%
• It is the Same as Most People Think: 32.4%
• It is Worse Than Most People Think: 35.8%
• Don't Know: 11.8%
Current Status of Organisation's Information/Data Governance Initiatives
• Started an Information/Data Governance Initiative, but Discontinued the Effort: 1.5%
• Considered a Focused Information/Data Governance Effort but Abandoned the Idea: 0.5%
• None Being Considered – Keeping the Status Quo: 7.4%
• Exploring, Still Seeking to Learn More: 20.1%
• Evaluating Alternative Frameworks and Information Governance Structures: 23.0%
• Now Planning an Implementation: 13.2%
• First Iteration Implemented in the Past 2 Years: 19.1%
• First Iteration in Place for More Than 2 Years: 8.8%
• Don't Know: 6.4%
Expected Changes in Organisation's Information/Data Governance Efforts Over the Next Two Years
• Will Increase Significantly: 46.6%
• Will Increase Somewhat: 39.2%
• Will Remain the Same: 10.8%
• Will Decrease Somewhat: 1.0%
• Will Decrease Significantly: 0.5%
• Don't Know: 2.0%
Overall Objectives of Information / Data Governance Efforts
• Improve Data Quality: 80.2%
• Establish Clear Decision Rules and Decision-making Processes for Shared Data: 65.6%
• Increase the Value of Data Assets: 59.4%
• Provide Mechanism to Resolve Data Issues: 56.8%
• Involve Non-IT Personnel in Data Decisions IT Should not Make by Itself: 55.7%
• Promote Interdependencies and Synergies Between Departments or Business Units: 49.6%
• Enable Joint Accountability for Shared Data: 45.3%
• Involve IT in Data Decisions Non-IT Personnel Should not Make by Themselves: 35.4%
• Other: 5.2%
• None Applicable: 1.0%
• Don't Know: 2.6%
Change In Organisation's Information / Data Quality Over the Past Two Years
• Information / Data Quality Has Significantly Improved: 10.5%
• Information / Data Quality Has Improved: 68.4%
• Information / Data Quality Has Remained Essentially the Same: 15.8%
• Information / Data Quality Has Worsened: 3.5%
• Information / Data Quality Has Significantly Worsened: 0.0%
• Don't Know: 1.8%
Maturity Of Information / Data Governance Goal Setting And Measurement In Your Organisation
• 5 - Optimised: 3.7%
• 4 - Managed: 11.8%
• 3 - Defined: 26.7%
• 2 - Repeatable: 28.9%
• 1 - Ad-hoc: 28.9%
Maturity Of Information / Data Governance Processes And Policies In Your Organisation
• 5 - Optimised: 1.6%
• 4 - Managed: 4.8%
• 3 - Defined: 24.5%
• 2 - Repeatable: 46.3%
• 1 - Ad-hoc: 22.9%
Maturity Of Responsibility And Accountability For Information / Data Governance Among Employees In Your Organisation
• 5 - Optimised: 6.9%
• 4 - Managed: 3.2%
• 3 - Defined: 31.7%
• 2 - Repeatable: 25.4%
• 1 - Ad-hoc: 32.8%
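One way to summarise a response distribution like the ones above is a single weighted-average score on the 1 (Ad-hoc) to 5 (Optimised) scale. The sketch below uses the percentages from the responsibility-and-accountability chart; the scoring approach itself is an illustrative assumption, not part of the survey methodology.

```python
# Illustrative calculation: weighted-average maturity from a response distribution
# on the 1 (Ad-hoc) to 5 (Optimised) scale. Percentages taken from the chart above.

responses = {1: 32.8, 2: 25.4, 3: 31.7, 4: 3.2, 5: 6.9}  # level -> % of respondents

weighted_score = sum(level * pct for level, pct in responses.items()) / sum(responses.values())
print(f"Average maturity: {weighted_score:.2f} out of 5")  # roughly 2.3
```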
Other Data Management Frameworks
Other Data Management-Related Frameworks
• TOGAF (and other enterprise architecture standards) defines a process for arriving at an enterprise architecture definition, including data
• TOGAF has a phase relating to data architecture
• TOGAF deals with the high level; DMBOK translates the high level into specific details
• COBIT is concerned with IT governance and controls:
  − IT must implement internal controls around how it operates
  − The systems IT delivers to the business and the underlying business processes these systems actualise must be controlled – these are controls external to IT
  − To govern IT effectively, COBIT defines the activities and risks within IT that need to be managed
• COBIT has a process relating to data management
• Neither TOGAF nor COBIT is concerned with detailed data management design and implementation
DMBOK, TOGAF and COBIT
• TOGAF and COBIT can be a precursor to implementing data management
• TOGAF defines the process for creating a data architecture as part of an overall enterprise architecture
• DMBOK is a specific and comprehensive data-oriented framework
• DMBOK provides detailed guidance for the definition, implementation and operation of data management and utilisation
• DMBOK can provide a maturity model for assessing data management
• COBIT provides data governance as part of overall IT governance
DMBOK, TOGAF and COBIT – Scope and Overlap
• DMBOK-specific scope: Data Development, Data Operations Management, Reference and Master Data Management, Data Warehousing and Business Intelligence Management, Document and Content Management, Metadata Management and Data Quality Management
• Overlap between DMBOK and TOGAF: Data Architecture Management, data management and data migration
• Overlap between DMBOK and COBIT: Data Governance and Data Security Management
TOGAF and Data Management
• TOGAF ADM phases: Phase A: Architecture Vision; Phase B: Business Architecture; Phase C: Information Systems Architecture; Phase D: Technology Architecture; Phase E: Opportunities and Solutions; Phase F: Migration Planning; Phase G: Implementation Governance; Phase H: Architecture Change Management; with Requirements Management at the centre
• Phase C1 (a subset of Phase C) relates to defining a data architecture; Phase C2 covers the solutions and application architecture
TOGAF Phase C1: Information Systems Architectures - Data Architecture - Objectives
• Purpose is to define the major types and sources of data necessary to support the business, in a way that is:
  − Understandable by stakeholders
  − Complete and consistent
  − Stable
• Define the data entities relevant to the enterprise
• Not concerned with the design of logical or physical storage systems or databases
TOGAF Phase C1: Information Systems Architectures - Data Architecture - Overview
• Approach: Key Considerations for Data Architecture
• Inputs: Reference Materials External to the Enterprise; Architecture Repository; Non-Architectural Inputs; Architectural Inputs
• Steps: Select Reference Models, Viewpoints, and Tools; Develop Baseline Data Architecture Description; Develop Target Data Architecture Description; Perform Gap Analysis; Define Roadmap Components; Resolve Impacts Across the Architecture Landscape; Conduct Formal Stakeholder Review; Finalise the Data Architecture; Create Architecture Definition Document
TOGAF Phase C1: Information Systems Architectures - Data Architecture - Approach - Key Considerations for Data Architecture
• Data Management
  − Important to understand and address data management issues
  − A structured and comprehensive approach to data management enables the effective use of data to capitalise on its competitive advantages
  − Clear definition of which application components in the landscape will serve as the system of record or reference for enterprise master data
  − Will there be an enterprise-wide standard that all application components, including software packages, need to adopt
  − Understand how data entities are utilised by business functions, processes, and services
  − Understand how and where enterprise data entities are created, stored, transported, and reported
  − Level and complexity of data transformations required to support the information exchange needs between applications
  − Requirement for software in supporting data integration with external organisations
TOGAF Phase C1: Information Systems Architectures - Data Architecture - Approach - Key Considerations for Data Architecture
• Data Migration
  − Identify data migration requirements and also provide indicators as to the level of transformation for new/changed applications
  − Ensure the target application has quality data when it is populated
  − Ensure an enterprise-wide common data definition is established to support the transformation
TOGAF Phase C1: Information Systems Architectures - Data Architecture - Approach - Key Considerations for Data Architecture
• Data Governance
  − Ensures that the organisation has the necessary dimensions in place to enable the data transformation
  − Structure – ensures the organisation has the necessary structure and standards bodies to manage data entity aspects of the transformation
  − Management System – ensures the organisation has the necessary management system and data-related programs to manage the governance aspects of data entities throughout their lifecycle
  − People – addresses what data-related skills and roles the organisation requires for the transformation
TOGAF Phase C1: Information Systems Architectures - Data Architecture - Outputs
• Refined and updated versions of the Architecture Vision phase deliverables
  − Statement of Architecture Work
  − Validated data principles, business goals, and business drivers
• Draft Architecture Definition Document
  − Baseline Data Architecture
  − Target Data Architecture
    • Business data model
    • Logical data model
    • Data management process models
    • Data Entity/Business Function matrix
    • Views corresponding to the selected viewpoints addressing key stakeholder concerns
  − Draft Architecture Requirements Specification
    • Gap analysis results
    • Data interoperability requirements
    • Relevant technical requirements
    • Constraints on the Technology Architecture about to be designed
    • Updated business requirements
    • Updated application requirements
  − Data Architecture components of an Architecture Roadmap
COBIT Structure
• Plan and Organise (PO): PO1 Define a strategic IT plan; PO2 Define the information architecture; PO3 Determine technological direction; PO4 Define the IT processes, organisation and relationships; PO5 Manage the IT investment; PO6 Communicate management aims and direction; PO7 Manage IT human resources; PO8 Manage quality; PO9 Assess and manage IT risks; PO10 Manage projects
• Acquire and Implement (AI): AI1 Identify automated solutions; AI2 Acquire and maintain application software; AI3 Acquire and maintain technology infrastructure; AI4 Enable operation and use; AI5 Procure IT resources; AI6 Manage changes; AI7 Install and accredit solutions and changes
• Deliver and Support (DS): DS1 Define and manage service levels; DS2 Manage third-party services; DS3 Manage performance and capacity; DS4 Ensure continuous service; DS5 Ensure systems security; DS6 Identify and allocate costs; DS7 Educate and train users; DS8 Manage service desk and incidents; DS9 Manage the configuration; DS10 Manage problems; DS11 Manage data; DS12 Manage the physical environment; DS13 Manage operations
• Monitor and Evaluate (ME): ME1 Monitor and evaluate IT performance; ME2 Monitor and evaluate internal control; ME3 Ensure regulatory compliance; ME4 Provide IT governance
COBIT and Data Management
• COBIT objective DS11 Manage Data sits within the Deliver and Support (DS) domain
• Effective data management requires identification of data requirements
• The data management process includes establishing effective procedures to manage the media library, backup and recovery of data and proper disposal of media
• Effective data management helps ensure the quality, timeliness and availability of business data
COBIT and Data Management
• Objective is the control over the IT process of managing data that meets the business requirement for IT of optimising the use of information and ensuring information is available as required
• Focuses on maintaining the completeness, accuracy, availability and protection of data
• Involves taking actions
  − Backing up data and testing restoration
  − Managing onsite and offsite storage of data
  − Securely disposing of data and equipment
• Measured by
  − User satisfaction with availability of data
  − Percent of successful data restorations
  − Number of incidents where sensitive data were retrieved after media were disposed of
COBIT Process DS11 Manage Data
• DS11.1 Business Requirements for Data Management
  − Establish arrangements to ensure that source documents expected from the business are received, all data received from the business are processed, all output required by the business is prepared and delivered, and restart and reprocessing needs are supported
• DS11.2 Storage and Retention Arrangements
  − Define and implement procedures for data storage and archival, so data remain accessible and usable
  − Procedures should consider retrieval requirements, cost-effectiveness, continued integrity and security requirements
  − Establish storage and retention arrangements to satisfy legal, regulatory and business requirements for documents, data, archives, programmes, reports and messages (incoming and outgoing) as well as the data (keys, certificates) used for their encryption and authentication
• DS11.3 Media Library Management System
  − Define and implement procedures to maintain an inventory of onsite media and ensure their usability and integrity
  − Procedures should provide for timely review and follow-up on any discrepancies noted
• DS11.4 Disposal
  − Define and implement procedures to prevent access to sensitive data and software from equipment or media when they are disposed of or transferred to another use
  − Procedures should ensure that data marked as deleted or to be disposed cannot be retrieved
• DS11.5 Backup and Restoration
  − Define and implement procedures for backup and restoration of systems, data and documentation in line with business requirements and the continuity plan
  − Verify compliance with the backup procedures, and verify the ability to, and time required for, successful and complete restoration
  − Test backup media and the restoration process
• DS11.6 Security Requirements for Data Management
  − Establish arrangements to identify and apply security requirements applicable to the receipt, processing, physical storage and output of data and sensitive messages
  − Includes physical records, data transmissions and any data stored offsite
COBIT Data Management Goals and Metrics
• Activity Goals: backing up data and testing restoration; managing onsite and offsite storage of data; securely disposing of data and equipment
  − Measured by Key Performance Indicators: frequency of testing of backup media; average time for data restoration
• Activity goals drive Process Goals: maintain the completeness, accuracy, validity and accessibility of stored data; secure data during disposal of media; effectively manage storage media
  − Measured by Process Key Goal Indicators: % of successful data restorations; # of incidents where sensitive data were retrieved after media were disposed of; # of downtime or data integrity incidents caused by insufficient storage capacity
• Process goals drive IT goals
  − Measured by IT Key Goal Indicators: occurrences of inability to recover data critical to business process; user satisfaction with availability of data; incidents of noncompliance with laws due to storage management issues
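As a sketch of how two of the indicators above might be calculated in practice, the example below derives them from operational records. The record structures and field names are hypothetical; COBIT defines the indicators, not how they are captured.

```python
# Sketch: computing two COBIT DS11 indicators from hypothetical operational records.

restore_tests = [
    {"date": "2010-01-12", "successful": True,  "minutes": 95},
    {"date": "2010-02-09", "successful": False, "minutes": None},
    {"date": "2010-03-02", "successful": True,  "minutes": 80},
]

disposal_incidents = [
    {"date": "2010-02-20", "media_id": "TAPE-0417", "data_retrieved_after_disposal": True},
]

# Key Performance Indicator: percent of successful data restorations.
pct_successful = 100.0 * sum(t["successful"] for t in restore_tests) / len(restore_tests)

# Process Key Goal Indicator: number of incidents where sensitive data were
# retrieved after media were disposed of.
retrieval_incidents = sum(i["data_retrieved_after_disposal"] for i in disposal_incidents)

print(f"Successful restorations: {pct_successful:.0f}%")
print(f"Post-disposal retrieval incidents: {retrieval_incidents}")
```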
Data Management Book of Knowledge (DMBOK)
Data Management Book of Knowledge (DMBOK)
• DMBOK is a generalised and comprehensive framework for managing data across the entire lifecycle
• Developed by DAMA (the Data Management Association)
• DMBOK provides a detailed framework to assist development and implementation of data management processes and procedures and ensures all requirements are addressed
• Enables effective and appropriate data management across the organisation
• Provides awareness and visibility of data management issues and requirements
Data Management Book of Knowledge (DMBOK)
• Not a solution to your data management needs
• A framework and methodology for developing and implementing an appropriate solution
• A generalised framework to be customised to meet specific needs
• Provides a work breakdown structure for a data management project to allow the effort to be assessed
• No magic bullet
Scope and Structure of Data Management Book of Knowledge (DMBOK)
• Two dimensions: Data Management Functions and Data Management Environmental Elements
DMBOK Data Management Functions
• Data Governance
• Data Architecture Management
• Data Development
• Data Operations Management
• Data Security Management
• Data Quality Management
• Reference and Master Data Management
• Data Warehousing and Business Intelligence Management
• Document and Content Management
• Metadata Management
DMBOK Data Management Functions
• Data Governance - planning, supervision and control over data management and use
• Data Architecture Management - defining the blueprint for managing data assets
• Data Development - analysis, design, implementation, testing, deployment, maintenance
• Data Operations Management - providing support from data acquisition to purging
• Data Security Management - ensuring privacy, confidentiality and appropriate access
• Data Quality Management - defining, monitoring and improving data quality
• Reference and Master Data Management - managing master versions and replicas
• Data Warehousing and Business Intelligence Management - enabling reporting and analysis
• Document and Content Management - managing data found outside of databases
• Metadata Management - integrating, controlling and providing metadata
DMBOK Data Management Environmental Elements
• Goals and Principles
• Activities
• Primary Deliverables
• Roles and Responsibilities
• Practices and Techniques
• Technology
• Organisation and Culture
DMBOK Data Management Environmental Elements
• Goals and Principles - directional business goals of each function and the fundamental principles that guide performance of each function
• Activities - each function is composed of lower level activities, sub-activities, tasks and steps
• Primary Deliverables - information and physical databases and documents created as interim and final outputs of each function; some deliverables are essential, some are generally recommended, and others are optional depending on circumstances
• Roles and Responsibilities - business and IT roles involved in performing and supervising the function, and the specific responsibilities of each role in that function; many roles will participate in multiple functions
• Practices and Techniques - common and popular methods and procedures used to perform the processes and produce the deliverables; may also include common conventions, best practice recommendations, and alternative approaches without elaboration
• Technology - categories of supporting technology such as software tools, standards and protocols, product selection criteria and learning curves
• Organisation and Culture - issues such as management metrics, critical success factors, reporting structures, budgeting, resource allocation issues, expectations and attitudes, style, culture and approach to change management
DMBOK Data Management Functions and Environmental Elements
• The scope of each data management function is defined across the seven environmental elements: Goals and Principles, Activities, Primary Deliverables, Roles and Responsibilities, Practices and Techniques, Technology, and Organisation and Culture
• This applies to all ten functions: Data Governance, Data Architecture Management, Data Development, Data Operations Management, Data Security Management, Data Quality Management, Reference and Master Data Management, Data Warehousing and Business Intelligence Management, Document and Content Management and Metadata Management
Scope of Data Management Book of Knowledge (DMBOK) Data Management Framework
• Hierarchy: Function → Activity → Sub-Activity (not in all cases)
• Each activity is classified as one (or more) of:
  − Planning Activities (P) – activities that set the strategic and tactical course for other data management activities; may be performed on a recurring basis
  − Development Activities (D) – activities undertaken within implementation projects and recognised as part of the systems development lifecycle (SDLC), creating data deliverables through analysis, design, building, testing, preparation, and deployment
  − Control Activities (C) – supervisory activities performed on an on-going basis
  − Operational Activities (O) – service and support activities performed on an on-going basis
Activity Groups Within Functions
• Activity groups – Planning Activities, Development Activities, Control Activities and Operational Activities – are classifications of data management activities
• Use the activity groupings to define the scope of data management sub-projects and identify the appropriate tasks (see the sketch below):
  − Analysis and design
  − Implementation
  − Operational improvement
  − Management and administration
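A minimal sketch of how the function → activity hierarchy and the P/D/C/O classification could be held as a simple data structure and filtered to scope a sub-project. Only a handful of activities are shown and the group assignments on them are assumptions for the example, not the definitive DMBOK classification.

```python
# Sketch: DMBOK functions and activities with activity-group classifications,
# filtered by group when scoping a data management sub-project.
# The activities listed and the groups assigned to them are an illustrative,
# assumed subset, not the full or authoritative DMBOK classification.
from collections import namedtuple

Activity = namedtuple("Activity", ["function", "name", "groups"])  # groups: P, D, C, O

activities = [
    Activity("Data Governance", "Data Management Planning", {"P"}),
    Activity("Data Governance", "Data Management Control", {"C"}),
    Activity("Data Quality Management", "Define Data Quality Metrics", {"D"}),
    Activity("Data Quality Management", "Continuously Measure and Monitor Data Quality", {"C", "O"}),
    Activity("Data Operations Management", "Database Support", {"O"}),
]

def activities_in_group(group: str) -> list:
    """Return the activities classified in a given group (P, D, C or O)."""
    return [a for a in activities if group in a.groups]

# Example: list the operational activities when scoping an operational-improvement sub-project.
for a in activities_in_group("O"):
    print(f"{a.function}: {a.name}")
```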
DMBOK Function and Activity Structure
• Data Governance: Data Management Planning; Data Management Control
• Data Architecture Management: Understand Enterprise Information Needs; Develop and Maintain the Enterprise Data Model; Analyse and Align With Other Business Models; Define and Maintain the Database Architecture; Define and Maintain the Data Integration Architecture; Define and Maintain the DW / BI Architecture; Define and Maintain Enterprise Taxonomies and Namespaces; Define and Maintain the Metadata Architecture
• Data Development: Data Modeling, Analysis, and Solution Design; Detailed Data Design; Data Model and Design Quality Management; Data Implementation
• Data Operations Management: Database Support; Data Technology Management
• Data Security Management: Understand Data Security Needs and Regulatory Requirements; Define Data Security Policy; Define Data Security Standards; Define Data Security Controls and Procedures; Manage Users, Passwords, and Group Membership; Manage Data Access Views and Permissions; Monitor User Authentication and Access Behaviour; Classify Information Confidentiality; Audit Data Security
• Data Quality Management: Develop and Promote Data Quality Awareness; Define Data Quality Requirements; Profile, Analyse, and Assess Data Quality; Define Data Quality Metrics; Define Data Quality Business Rules; Test and Validate Data Quality Requirements; Set and Evaluate Data Quality Service Levels; Continuously Measure and Monitor Data Quality; Manage Data Quality Issues; Clean and Correct Data Quality Defects; Design and Implement Operational DQM Procedures; Monitor Operational DQM Procedures and Performance
• Reference and Master Data Management: Understand Reference and Master Data Integration Needs; Identify Master and Reference Data Sources and Contributors; Define and Maintain the Data Integration Architecture; Implement Reference and Master Data Management Solutions; Define and Maintain Match Rules; Establish “Golden” Records; Define and Maintain Hierarchies and Affiliations; Plan and Implement Integration of New Data Sources; Replicate and Distribute Reference and Master Data; Manage Changes to Reference and Master Data
• DW and BI Management: Understand Business Intelligence Information Needs; Define and Maintain the DW / BI Architecture; Implement Data Warehouses and Data Marts; Implement BI Tools and User Interfaces; Process Data for Business Intelligence; Monitor and Tune Data Warehousing Processes; Monitor and Tune BI Activity and Performance
• Document and Content Management: Documents / Records Management; Content Management
• Metadata Management: Understand Metadata Requirements; Define the Metadata Architecture; Develop and Maintain Metadata Standards; Implement a Managed Metadata Environment; Create and Maintain Metadata; Integrate Metadata; Manage Metadata Repositories; Distribute and Deliver Metadata; Query, Report, and Analyse Metadata
DMBOK Function and Activity - Planning Activities
• The same function and activity matrix as on the previous slide, with the planning activities within each function highlighted
DMBOK Function and Activity - Control Activities
• The same function and activity matrix, with the control activities within each function highlighted
DMBOK Function and Activity - Development Activities
• The same function and activity matrix, with the development activities within each function highlighted
DMBOK Function and Activity - Operational Activities
• The same function and activity matrix, with the operational activities within each function highlighted
DMBOK Environmental Elements Structure
• Goals and Principles: Vision and Mission; Business Benefits; Strategic Goals; Specific Objectives; Guiding Principles
• Activities: Phases, Tasks, Steps; Dependencies; Sequence and Flow; Use Cases and Scenarios; Trigger Events
• Primary Deliverables: Inputs and Outputs; Information; Documents; Databases; Other Resources
• Roles and Responsibilities: Individual Roles; Organisation Roles; Business and IT Roles; Qualifications and Skills
• Technology: Tool Categories; Standards and Protocols; Selection Criteria; Learning Curves
• Practices and Techniques: Recognised Best Practices; Common Approaches; Alternative Techniques
• Organisation and Culture: Critical Success Factors; Reporting Structures; Management Metrics; Values, Beliefs, Expectations; Attitudes, Styles, Preferences; Teamwork, Group Dynamics, Authority, Empowerment; Contracting Strategies; Change Management Approach
DMBOK Environmental Elements
Data Governance
Data Governance
• Core function of the Data Management Framework
• Interacts with and influences each of the nine surrounding data management functions
• Data governance is the exercise of authority and control (planning, monitoring, and enforcement) over the management of data assets
• The data governance function guides how all other data management functions are performed
• High-level, executive data stewardship
• Data governance is not the same thing as IT governance
• Data governance is focused exclusively on the management of data assets
Data Governance – Definition and Goals
• Definition
  − The exercise of authority and control (planning, monitoring, and enforcement) over the management of data assets
• Goals
  − To define, approve, and communicate data strategies, policies, standards, architecture, procedures, and metrics
  − To track and enforce regulatory compliance and conformance to data policies, standards, architecture, and procedures
  − To sponsor, track, and oversee the delivery of data management projects and services
  − To manage and resolve data related issues
  − To understand and promote the value of data assets
Data Governance - Overview
• Inputs: Business Goals; Business Strategies; IT Objectives; IT Strategies; Data Needs; Data Issues; Regulatory Requirements
• Suppliers: Business Executives; IT Executives; Data Stewards; Regulatory Bodies
• Participants: Executive Data Stewards; Coordinating Data Stewards; Business Data Stewards; Data Professionals; DM Executive; CIO
• Primary Deliverables: Data Policies; Data Standards; Resolved Issues; Data Management Projects and Services; Quality Data and Information; Recognised Data Value
• Consumers: Data Producers; Knowledge Workers; Managers and Executives; Data Professionals; Customers
• Tools: Intranet Website; E-Mail; Metadata Tools; Metadata Repository; Issue Management Tools; Data Governance KPI Dashboard
• Metrics: Data Value; Data Management Cost; Achievement of Objectives; # of Decisions Made; Steward Representation / Coverage; Data Professional Headcount; Data Management Process Maturity
Data Governance Function, Activities and Sub-Activities
• Data Management Planning: Understand Strategic Enterprise Data Needs; Develop and Maintain the Data Strategy; Establish Data Professional Roles and Organisations; Identify and Appoint Data Stewards; Establish Data Governance and Stewardship Organisations; Develop and Approve Data Policies, Standards, and Procedures; Review and Approve Data Architecture; Plan and Sponsor Data Management Projects and Services; Estimate Data Asset Value and Associated Costs
• Data Management Control: Supervise Data Professional Organisations and Staff; Coordinate Data Governance Activities; Manage and Resolve Data Related Issues; Monitor and Ensure Regulatory Compliance; Monitor and Enforce Conformance with Data Policies, Standards and Architecture; Oversee Data Management Projects and Services; Communicate and Promote the Value of Data Assets
Data Governance
• Data governance is accomplished most effectively as an on-going programme and a continual improvement process
• Every data governance programme is unique, taking into account distinctive organisational and cultural issues and the immediate data management challenges and opportunities
• Data governance is at the core of managing data assets
Data Governance - Possible Organisation Structure
• Organisation Data Governance Council
• Data Governance Office
• CIO
• Data Management Executive
• Business Unit Data Governance Councils
• Data Technologists
• Data Stewardship Committees
• Data Stewardship Teams
Data Governance Shared Decision Making
• Business Decisions: Information Needs; Information Specifications; Quality Requirements; Business Operating Model; IT Leadership; Capital Investments; Research and Development Funding; Issue Resolution
• Shared Decision Making: Enterprise Information Model; Enterprise Information Management Strategy; Enterprise Information Management Policies; Enterprise Information Management Standards; Enterprise Information Management Metrics; Enterprise Information Management Services; Data Governance Model
• IT Decisions: Database Architecture; Data Integration Architecture; Data Warehousing and Business Intelligence Architecture; Metadata Architecture; Technical Metadata
Data Stewardship
• Formal accountability for business responsibilities ensuring effective control and use of data assets
• A data steward is a business leader and/or recognised subject matter expert designated as accountable for these responsibilities
• Manage data assets on behalf of others and in the best interests of the organisation
• Represent the data interests of all stakeholders, including but not limited to the interests of their own functional departments and divisions
• Protect, manage, and leverage the data resources
• Must take an enterprise perspective to ensure the quality and effective use of enterprise data
Data Stewardship - Roles
• Executive Data Stewards – provide data governance and make high-level data stewardship decisions
• Coordinating Data Stewards – lead and represent teams of business data stewards in discussions across teams and with executive data stewards
• Business Data Stewards – subject matter experts who work with data management professionals on an ongoing basis to define and control data
Data Stewardship Roles Across Data Management Functions - 1
• Data Architecture Management: All Data Stewards review, validate, approve, maintain and refine data architecture; Executive Data Stewards review and approve the enterprise data architecture; Business Data Stewards define data requirements specifications
• Data Development: All Data Stewards validate physical data models and database designs and participate in database testing and conversion; Coordinating Data Stewards integrate specifications, resolving differences; Business Data Stewards define data requirements and specifications
• Data Operations Management: Business Data Stewards define requirements for data recovery, retention and performance, and help identify, acquire, and control externally sourced data
• Data Security Management: Business Data Stewards provide security, privacy and confidentiality requirements, identify and resolve data security issues, assist in data security audits, and classify information confidentiality
• Reference and Master Data Management: Business Data Stewards control the creation, update, and retirement of code values and other reference data, define master data management requirements, and identify and help resolve issues
Data Stewardship Roles Across Data Management Functions - 2
• Data Warehousing and Business Intelligence Management: Business Data Stewards provide business intelligence requirements and management metrics, and identify and help resolve business intelligence issues
• Document and Content Management: Business Data Stewards define enterprise taxonomies and resolve content management issues
• Metadata Management: Business Data Stewards create and maintain business metadata (names, meanings, business rules), define metadata access and integration needs, and use metadata to make effective data stewardship and governance decisions
• Data Quality Management: Business Data Stewards define data quality requirements and business rules, test application edits and validations, assist in the analysis, certification, and auditing of data quality, lead clean-up efforts, identify ways to solve causes of poor data quality, and promote data quality awareness
Data Strategy
• A high-level course of action to achieve high-level goals
• The data strategy is a data management programme strategy – a plan for maintaining and improving data quality, integrity, security and access
• Should address all data management functions relevant to the organisation
Elements of Data Strategy
• Vision for data management
• Summary business case for data management
• Guiding principles, values, and management perspectives
• Mission and long-term directional goals of data management
• Management measures of data management success
• Short-term data management programme objectives
• Descriptions of data management roles and business units along with a summary of their responsibilities and decision rights
• Descriptions of data management programme components and initiatives
• Outline of the data management implementation roadmap
• Scope boundaries
Data Strategy
• Data Management Programme Charter: overall vision, business case, goals, guiding principles, measures of success, critical success factors, recognised risks
• Data Management Scope Statement: goals and objectives for a defined planning horizon and the roles, organisations, and individual leaders accountable for achieving these objectives
• Data Management Implementation Roadmap: identifying specific programmes, projects, task assignments, and delivery milestones
Data Policies •
Statements of intent and fundamental rules governing the creation, acquisition, integrity, security, quality, and use of data and information
•
More fundamental, global, and business critical than data standards
•
Describe what to do and what not to do
•
There should be few data policies, and they should be stated briefly and directly
March 8, 2010
89
Data Policies •
Possible topics for data policies
− Data modeling and other data development activities
− Development and use of data architecture
− Data quality expectations, roles, and responsibilities
− Data security, including confidentiality classification policies, intellectual property policies, personal data privacy policies, general data access and usage policies, and data access by external parties
− Database recovery and data retention
− Access and use of externally sourced data
− Sharing data internally and externally
− Data warehousing and business intelligence
− Unstructured data - electronic files and physical records
March 8, 2010
90
Data Architecture •
Enterprise data model and other aspects of data architecture sponsored at the data governance level
•
Need to pay particular attention to the alignment of the enterprise data model with key business strategies, processes, business units and systems
•
Includes − Data technology architecture − Data integration architecture − Data warehousing and business intelligence architecture − Metadata architecture
March 8, 2010
91
Data Standards and Procedures •
Include naming standards, requirement specification standards, data modeling standards, database design standards, architecture standards and procedural standards for each data management function
•
Must be effectively communicated, monitored, enforced and periodically re-evaluated
•
Data management procedures are the methods, techniques, and steps followed to accomplish a specific activity or task
March 8, 2010
92
Data Standards and Procedures •
Possible topics for data standards and procedures
− Data modeling and architecture standards, including data naming conventions, definition standards, standard domains, and standard abbreviations
− Standard business and technical metadata to be captured, maintained, and integrated
− Data model management guidelines and procedures
− Metadata integration and usage procedures
− Standards for database recovery and business continuity, database performance, data retention, and external data acquisition
− Data security standards and procedures
− Reference data management control procedures
− Match / merge and data cleansing standards and procedures
− Business intelligence standards and procedures
− Enterprise content management standards and procedures, including use of enterprise taxonomies, support for legal discovery and document and e-mail retention, electronic signatures, report formatting standards and report distribution approaches
March 8, 2010
93
Regulatory Compliance •
Most organisations are impacted by government and industry regulations
•
Many of these regulations dictate how data and information is to be managed
•
Compliance is generally mandatory
•
Data governance guides the implementation of adequate controls to ensure, document, and monitor compliance with data-related regulations.
March 8, 2010
94
Regulatory Compliance •
Data governance needs to work with the business to find the best answers to the following regulatory compliance questions
− How relevant is a regulation?
− Why is it important for us?
− How do we interpret it?
− What policies and procedures does it require?
− Do we comply now?
− How do we comply now?
− How should we comply in the future?
− What will it take?
− When will we comply?
− How do we demonstrate and prove compliance?
− How do we monitor compliance?
− How often do we review compliance?
− How do we identify and report non-compliance?
− How do we manage and rectify non-compliance?
March 8, 2010
95
Issue Management •
Data governance assists in identifying, managing, and resolving data related issues
− Data quality issues
− Data naming and definition conflicts
− Business rule conflicts and clarifications
− Data security, privacy, and confidentiality issues
− Regulatory non-compliance issues
− Non-conformance issues (policies, standards, architecture, and procedures)
− Conflicting policies, standards, architecture, and procedures
− Conflicting stakeholder interests in data and information
− Organisational and cultural change management issues
− Issues regarding data governance procedures and decision rights
− Negotiation and review of data sharing agreements
March 8, 2010
96
Issue Management, Control and Escalation •
Data governance implements issue controls and procedures − Identifying, capturing, logging and updating issues − Tracking the status of issues − Documenting stakeholder viewpoints and resolution alternatives − Objective, neutral discussions where all viewpoints are heard − Escalating issues to higher levels of authority − Determining, documenting and communicating issue resolutions.
March 8, 2010
97
Data Management Projects •
Data management roadmap sets out a course of action for initiating and/or improving data management functions
•
Consists of an assessment of current functions, definition of a target environment and target objectives and a transition plan outlining the steps required to reach these targets including an approach to organisational change management
•
Every data management project should follow the project management standards of the organisation
March 8, 2010
98
Data Asset Valuation •
Data and information are truly assets because they have business value, tangible or intangible
•
Different approaches to estimating the value of data assets
•
Identify the direct and indirect business benefits derived from use of the data
•
Identify the cost of data loss, identifying the impacts of not having the current amount and quality level of data
March 8, 2010
99
Data Architecture Management
March 8, 2010
100
Data Architecture Management •
Concerned with defining and maintaining specifications that − Provide a standard common business vocabulary − Express strategic data requirements − Outline high level integrated designs to meet these requirements − Align with enterprise strategy and related business architecture
Data architecture is an integrated set of specification artifacts used to define data requirements, guide integration and control of data assets and align data investments with business strategy
•
Includes formal data names, comprehensive data definitions, effective data structures, precise data integrity rules, and robust data documentation
March 8, 2010
101
Data Architecture Management – Definition and Goals •
Definition − Defining the data needs of the enterprise and designing the master blueprints to meet those needs
•
Goals − To plan with vision and foresight to provide high quality data − To identify and define common data requirements − To design conceptual structures and plans to meet the current and long-term data requirements of the enterprise
March 8, 2010
102
Data Architecture Management - Overview
• Inputs: Business Goals, Business Strategies, Business Architecture, Process Architecture, IT Objectives, IT Strategies, Data Strategies, Data Issues and Needs, Technical Architecture
• Suppliers: Executives, Data Stewards, Data Producers, Information Consumers
• Participants: Data Stewards, Subject Matter Experts (SMEs), Data Architects, Data Analysts and Modelers, Other Enterprise Architects, DM Executive and Managers, CIO and Other Executives, Database Administrators, Data Model Administrator
• Primary Deliverables: Enterprise Data Model, Information Value Chain Analysis, Data Technology Architecture, Data Integration / MDM Architecture, DW / BI Architecture, Metadata Architecture, Enterprise Taxonomies and Namespaces, Document Management Architecture, Metadata
• Consumers: Data Producers, Knowledge Workers, Managers and Executives, Data Professionals, Customers
• Tools: Data Modeling Tools, Model Management Tool, Metadata Repository, Office Productivity Tools
• Metrics: Data Value, Data Management Cost, Achievement of Objectives, # of Decisions Made, Steward Representation / Coverage, Data Professional Headcount, Data Management Process Maturity
March 8, 2010
103
Enterprise Data Architecture •
Integrated set of specifications and documents − Enterprise Data Model - the core of enterprise data architecture − Information Value Chain Analysis - aligns data with business processes and other enterprise architecture components − Related Data Delivery Architecture - including database architecture, data integration architecture, data warehousing / business intelligence architecture, document content architecture, and metadata architecture
March 8, 2010
104
Data Architecture Management Activities
• Understand Enterprise Information Needs
• Develop and Maintain the Enterprise Data Model
• Analyse and Align With Other Business Models
• Define and Maintain the Database Architecture
• Define and Maintain the Data Integration Architecture
• Define and Maintain the Data Warehouse / Business Intelligence Architecture
• Define and Maintain Enterprise Taxonomies and Namespaces
• Define and Maintain the Metadata Architecture
March 8, 2010
105
Understanding Enterprise Information Needs
• In order to create an enterprise data architecture, the organisation must first define its information needs
• An enterprise data model is a way of capturing and defining enterprise information needs and data requirements
• Master blueprint for enterprise-wide data integration
• Enterprise data model is a critical input to all future systems development projects and the baseline for additional data requirements analysis
• Evaluate the current inputs and outputs required by the organisation, both from and to internal and external targets
March 8, 2010
106
Develop and Maintain the Enterprise Data Model •
Data is the set of facts collected about business entities
•
Data model is a set of data specifications that reflect data requirements and designs
•
Enterprise data model is an integrated, subject-oriented data model defining the critical data produced and consumed across the organisation
•
Define and analyse data requirements
•
Design logical and physical data structures that support these requirements
March 8, 2010
107
Enterprise Data Model
• Subject Area Model
• Conceptual Data Model
• Enterprise Logical Data Models
• Other Enterprise Data Model Components
− Data Steward Responsibility Assignments
− Valid Reference Data Values
− Data Quality Specifications
− Entity Life Cycles
March 8, 2010
108
Enterprise Data Model •
Build an enterprise data model in layers
•
Focus on the most critical business subject areas
March 8, 2010
109
Subject Area Model •
List of major subject areas that collectively express the essential scope of the enterprise
•
Important to the success of the entire enterprise data model
•
List of enterprise subject areas becomes one of the most significant organisation classifications
•
Acceptable to organisation stakeholders
•
Useful as the organising framework for data governance, data stewardship, and further enterprise data modeling
March 8, 2010
110
Conceptual Data Model
• Conceptual data model defines business entities and their relationships
• Business entities are the primary organisational structures in a conceptual data model
• Business needs data about business entities
• Include a glossary containing the business definitions and other metadata associated with business entities and their relationships
• Assists improved business understanding and reconciliation of terms and their meanings
• Provide the framework for developing integrated information systems to support both transactional processing and business intelligence
• Depicts how the enterprise sees information
March 8, 2010
111
Enterprise Logical Data Models •
A logical data model contains a level of detail below the conceptual data model
•
Contain the essential data attributes for each entity
•
Essential data attributes are those data attributes without which the enterprise cannot function – can be a subjective decision
March 8, 2010
112
Other Enterprise Data Model Components •
Data Steward Responsibility Assignments- for subject areas, entities, attributes, and/or reference data value sets
•
Valid Reference Data Values - controlled value sets for codes and/or labels and their business meaning
•
Data Quality Specifications - rules for essential data attributes, such as accuracy / precision requirements, currency (timeliness), integrity rules, nullability, formatting, match/merge rules, and/or audit requirements
•
Entity Life Cycles - show the different lifecycle states of the most important entities and the trigger events that change an entity from one state to another March 8, 2010
113
Analyse and Align with Other Business Models •
Information value-chain analysis maps the relationships between enterprise model elements and other business models
•
Business value chain identifies the functions of an organisation that contribute directly or indirectly to the organisation’s goals
March 8, 2010
114
Define and Maintain the Data Technology Architecture •
Data technology architecture guides the selection and integration of data-related technology
•
Data technology architecture defines standard tool categories, preferred tools in each category, and technology standards and protocols for technology integration
•
Technology categories include
− Database management systems (DBMS)
− Database management utilities
− Data modelling and model management tools
− Business intelligence software for reporting and analysis
− Extract-transform-load (ETL), changed data capture (CDC), and other data integration tools
− Data quality analysis and data cleansing tools
− Metadata management software, including metadata repositories
March 8, 2010
115
Define and Maintain the Data Technology Architecture •
Classify technology architecture components as − Current - currently supported and used − Deployment - deployed for use in the next 1-2 years − Strategic - expected to be available for use in the next 2+ years − Retirement - the organisation has retired or intends to retire this year − Preferred - preferred for use by most applications. − Containment - limited to use by certain applications − Emerging - being researched and piloted for possible future deployment
March 8, 2010
116
Define and Maintain the Data Integration Architecture •
Defines how data flows through all systems from beginning to end
•
Both data architecture and application architecture, because it includes both databases and the applications that control the data flow into the system, between databases and back out of the system
March 8, 2010
117
Define and Maintain the Data Warehouse / Business Intelligence Architecture •
Focuses on how data changes and snapshots are stored in data warehouse systems for maximum usefulness and performance
•
Data integration architecture shows how data moves from source systems through staging databases into data warehouses and data marts
•
Business intelligence architecture defines how decision support makes data available, including the selection and use of business intelligence tools
March 8, 2010
118
Define and Maintain Enterprise Taxonomies and Namespaces •
Taxonomy is the hierarchical structure used for outlining topics
•
Organisations develop their own taxonomies to organise collective thinking about topics
•
Overall enterprise data architecture includes organisational taxonomies
•
Definition of terms used in such taxonomies should be consistent with the enterprise data model
March 8, 2010
119
Define and Maintain the Metadata Architecture •
Metadata architecture is the design for integration of metadata across software tools, repositories, directories, glossaries, and data dictionaries
•
Metadata architecture defines the managed flow of metadata
•
Defines how metadata is created, integrated, controlled, and accessed
•
Metadata repository is the core of any metadata architecture
•
Focus of metadata architecture is to ensure the quality, integration, and effective use of metadata March 8, 2010
120
Data Architecture Management Guiding Principles
• Data architecture is an integrated set of specification master blueprints used to define data requirements, guide data integration, control data assets, and align data investments with business strategy
• Enterprise data architecture is part of the overall enterprise architecture, along with process architecture, business architecture, systems architecture, and technology architecture
• Enterprise data architecture includes three major categories of specifications: the enterprise data model, information value chain analysis, and data delivery architecture
• Enterprise data architecture is about more than just data - it helps to establish a common business vocabulary
• An enterprise data model is an integrated subject-oriented data model defining the essential data used across an entire organisation
• Information value-chain analysis defines the critical relationships between data, processes, roles and organisations and other enterprise elements
• Data delivery architecture defines the master blueprint for how data flows across databases and applications
• Architectural frameworks like TOGAF help organise collective thinking about architecture
March 8, 2010
121
Data Development
March 8, 2010
122
Data Development •
Analysis, design, implementation, deployment, and maintenance of data solutions to maximise the value of the data resources to the enterprise
•
Subset of project activities within the system development lifecycle focused on defining data requirements, designing the data solution components, and implementing these components
•
Primary data solution components are databases and other data structures
March 8, 2010
123
Data Development – Definition and Goals •
Definition − Designing, implementing, and maintaining solutions to meet the data needs of the enterprise
•
Goals − Identify and define data requirements − Design data structures and other solutions to these requirements − Implement and maintain solution components that meet these requirements − Ensure solution conformance to data architecture and standards as appropriate − Ensure the integrity, security, usability, and maintainability of structured data assets March 8, 2010
124
Data Development - Overview
• Inputs: Business Goals and Strategies, Data Needs and Strategies, Data Standards, Data Architecture, Process Architecture, Application Architecture, Technical Architecture
• Suppliers: Data Stewards, Subject Matter Experts, IT Steering Committee, Data Governance Council, Data Architects and Analysts, Software Developers, Data Producers, Information Consumers
• Participants: Data Stewards and SMEs, Data Architects and Analysts, Database Administrators, Data Model Administrators, Software Developers, Project Managers, DM Executives and Other IT Management
• Primary Deliverables: Data Requirements and Business Rules, Conceptual Data Models, Logical Data Models and Specifications, Physical Data Models and Specifications, Metadata (Business and Technical), Data Modeling and DB Design Standards, Data Model and DB Design Reviews, Version Controlled Data Models, Test Data, Development and Test Databases, Information Products, Data Access Services, Data Integration Services, Migrated and Converted Data
• Consumers: Data Producers, Knowledge Workers, Managers and Executives, Customers, Data Professionals, Other IT Professionals
• Tools: Data Modeling Tools, Database Management Systems, Software Development Tools, Testing Tools, Data Profiling Tools, Model Management Tools, Configuration Management Tools, Office Productivity Tools
March 8, 2010
125
Data Development Function, Activities and Sub-Activities
• Data Modelling, Analysis and Solution Design
− Analyse Information Requirements
− Develop and Maintain Conceptual Data Models (Entities, Relationships)
− Develop and Maintain Logical Data Models (Attributes, Domains, Keys)
− Develop and Maintain Physical Data Models
• Detailed Data Design
− Design Physical Databases (Physical Database Design, Performance Modifications, Physical Database Design Documentation)
− Design Information Products
− Design Data Access Services
− Design Data Integration Services
• Data Model and Design Quality Management
− Develop Data Modeling and Design Standards
− Review Data Model and Database Design Quality (Conceptual and Logical Data Model Reviews, Physical Database Design Review, Data Model Validation)
− Manage Data Model Versioning and Integration
• Data Implementation
− Implement Development / Test Database Changes
− Create and Maintain Test Data
− Migrate and Convert Data
− Build and Test Information Products
− Build and Test Data Access Services
− Validate Information Requirements
− Prepare for Data Deployment
March 8, 2010
126
Data Development - Principles
• Data development activities are an integral part of the software development lifecycle
• Data modeling is an essential technique for effective data management and system design
• Conceptual and logical data modeling express business and application requirements while physical data modeling represents solution design
• Data modeling and database design define detailed solution component specifications
• Data modeling and database design balance tradeoffs and needs
• Data professionals should collaborate with other project team members to design information products and data access and integration interfaces
• Data modeling and database design should follow documented standards
• Design reviews should review all data models and designs, in order to ensure they meet business requirements and follow design standards
• Data models represent valuable knowledge resources and should be carefully managed and controlled through library, configuration, and change management to ensure data model quality and availability
• Database administrators and other data professionals play important roles in the construction, testing, and deployment of databases and related application systems
March 8, 2010
127
Data Modeling, Analysis, and Solution Design •
Data modeling is an analysis and design method used to define and analyse data requirements, and design data structures that support these requirements
•
A data model is a set of data specifications and related diagrams that reflect data requirements and designs
•
Data modeling is a complex process involving interactions between people and technology which should not compromise the integrity or security of the data
•
Good data models accurately express and effectively communicate data requirements and quality solution design March 8, 2010
128
Data Model •
The purposes of a data model are: − Communication - a data model is a bridge to understanding data between people with different levels and types of experience. Data models help us understand a business area, an existing application, or the impact of modifying an existing structure. Data models may also facilitate training new business and/or technical staff − Formalisation - a data model documents a single, precise definition of data requirements and data related business rules − Scope – a data model can help explain the data context and scope of purchased application packages
•
Data models that include the same data may differ by: − Scope - expressing a perspective about data in terms of function (business view or application view), realm (process, department, division, enterprise, or industry view), and time (current state, short-term future, long-term future) − Focus - basic and critical concepts (conceptual view), detailed but independent of context (logical view), or optimised for a specific technology and use (physical view) March 8, 2010
129
Analyse Information Requirements
• Information is relevant and timely data in context
• To identify information requirements, first identify business information needs, often in the context of one or more business processes
• Business processes (and the underlying IT systems) consume information output from other business processes
• Requirements analysis includes the elicitation, organisation, documentation, review, refinement, approval, and change control of business requirements
• Some of these requirements identify business needs for data and information
• Logical data modeling is an important means of expressing business data requirements
March 8, 2010
130
Develop and Maintain Conceptual Data Models •
Visual, high-level perspective on a subject area of importance to the business
•
Contains the basic and critical business entities within a given realm and function with a description of each entity and the relationships between entities
•
Define the meanings of the essential business vocabulary
•
Reflect the data associated with a business process or application function
•
Independent of technology and usage context
March 8, 2010
131
Develop and Maintain Conceptual Data Models •
Entities − A data entity is a collection of data about something that the business deems important and worthy of capture − Entities appear in conceptual or logical data models
•
Relationships − Business rules define constraints on what can and cannot be done • Data Rules – define constraints on how data relates to other data • Action Rules - instructions on what to do when data elements contain certain values
March 8, 2010
132
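The distinction above between data rules and action rules can be made concrete in a database. The following is a minimal, illustrative sketch only, using Python's standard sqlite3 module; the table, column and threshold names are invented for the example and are not part of DMBOK. The data rule is expressed as a CHECK constraint, the action rule as a trigger.

import sqlite3

# Illustrative only: a "data rule" enforced as a CHECK constraint and an
# "action rule" enforced as a trigger. Table and column names are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_order (
    order_id     INTEGER PRIMARY KEY,
    order_status TEXT NOT NULL,
    order_value  NUMERIC NOT NULL,
    -- Data rule: an order value may never be negative
    CHECK (order_value >= 0)
);

CREATE TABLE order_review_queue (
    order_id INTEGER NOT NULL,
    reason   TEXT NOT NULL
);

-- Action rule: when an order over 10000 is inserted, flag it for review
CREATE TRIGGER flag_large_orders
AFTER INSERT ON customer_order
WHEN NEW.order_value > 10000
BEGIN
    INSERT INTO order_review_queue (order_id, reason)
    VALUES (NEW.order_id, 'Order value exceeds review threshold');
END;
""")

conn.execute("INSERT INTO customer_order VALUES (1, 'NEW', 25000)")
print(conn.execute("SELECT * FROM order_review_queue").fetchall())
# [(1, 'Order value exceeds review threshold')]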
Develop and Maintain Logical Data Models
• Detailed representation of data requirements and the business rules that govern data quality
• Independent of any technology or specific implementation technical constraints
• Extension of a conceptual data model
• Logical data models transform conceptual data model structures by normalisation and abstraction
− Normalisation is the process of applying rules to organise business complexity into stable data structures
− Abstraction is the redefinition of data entities, elements, and relationships by removing details to broaden the applicability of data structures to a wider class of situations
March 8, 2010
133
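As a small illustration of what normalisation achieves, the following Python sketch (with invented attribute names, not from the slides) splits a flat record with repeating customer facts into a customer structure and an order structure that refers to it by key, so each fact is stored exactly once.

# Illustrative only: normalising a flat, repeating record into stable structures.
flat_orders = [
    {"order_id": 1, "customer_name": "Acme Ltd", "customer_city": "Dublin", "item": "Widget"},
    {"order_id": 2, "customer_name": "Acme Ltd", "customer_city": "Dublin", "item": "Bracket"},
]

# Customer facts are stored once; orders hold only a customer key, so a change
# of city is made in exactly one place.
customers = {}          # customer_name -> customer record
orders = []             # order records holding only a customer key

for row in flat_orders:
    customers.setdefault(row["customer_name"], {"city": row["customer_city"]})
    orders.append({"order_id": row["order_id"],
                   "customer": row["customer_name"],
                   "item": row["item"]})

print(customers)   # {'Acme Ltd': {'city': 'Dublin'}}
print(orders)      # two orders, each referencing the single customer record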
Develop and Maintain Physical Data Models •
Physical data model optimises the implementation of detailed data requirements and business rules in light of technology constraints, application usage, performance requirements, and modeling standards
•
Physical data modeling transforms the logical data model
•
Includes specific decisions − Name of each table and column or file and field or schema and element − Logical domain, physical data type, length, and nullability of each column or field − Default values − Primary and alternate unique keys and indexes March 8, 2010
134
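A minimal sketch of the physical design decisions listed above, expressed as DDL through Python's sqlite3 module. All object names, data types and the index are invented examples, assuming a simple relational target; a real physical model would follow the organisation's own standards and DBMS.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,              -- primary key
    customer_code TEXT    NOT NULL UNIQUE,          -- alternate (candidate) key
    customer_name TEXT    NOT NULL,                 -- nullability decided per column
    country_code  TEXT    NOT NULL DEFAULT 'IE',    -- default value
    credit_limit  NUMERIC                           -- physical data type; nullable
);

-- Index chosen to support an expected access path (lookup by name)
CREATE INDEX ix_customer_name ON customer (customer_name);
""")
print(conn.execute("PRAGMA table_info(customer)").fetchall())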
Detailed Data Design •
Detailed data design activities include − Detailed physical database design, including views, functions, triggers, and stored procedures − Definition of supporting data structures, such as XML schemas and object classes − Creation of information products, such as the use of data in screens and reports − Definition of data access solutions, including data access objects, integration services, and reporting and analysis services
March 8, 2010
135
Design Physical Databases •
Create detailed database implementation specifications
• Ensure the design meets data integrity requirements
• Determine the most appropriate physical structure to house and organise the data, such as relational or other type of DBMS, files, OLAP cubes, XML, etc.
• Determine database resource requirements, such as server size and location, disk space requirements, CPU and memory requirements, and network requirements
• Create detailed design specifications for data structures, such as relational database tables, indexes, views, OLAP data cubes, XML schemas, etc.
• Ensure performance requirements are met, including batch and online response time requirements for queries, inserts, updates, and deletes
• Design for backup, recovery, archiving, and purge processing, ensuring availability requirements are met
• Design data security implementation, including authentication, encryption needs, application roles and data access and update permissions
• Review code to ensure that it meets coding standards and will run efficiently
March 8, 2010
136
Physical Database Design •
Choose a database design based on both a choice of architecture and a choice of technology
•
Base the choice of architecture (for example, relational, hierarchical, network, object, star schema, snowflake, cube, etc.) on data considerations
•
Consider factors such as how long the data needs to be kept, whether it must be integrated with other data or passed across system or application boundaries, and on requirements of data security, integrity, recoverability, accessibility, and reusability
•
Consider organisational or political factors, including organisational biases and developer skill sets, that lean toward a particular technology or vendor
March 8, 2010
137
Physical Database Design - Principles •
Performance and Ease of Use - Ensure quick and easy access to data by approved users in a usable and business-relevant form
•
Reusability - The database structure should ensure that, where appropriate, multiple applications would be able to use the data
•
Integrity - The data should always have a valid business meaning and value, regardless of context, and should always reflect a valid state of the business
•
Security - True and accurate data should always be immediately available to authorised users, but only to authorised users
•
Maintainability - Perform all data work at a cost that yields value by ensuring that the cost of creating, storing, maintaining, using, and disposing of data does not exceed its value to the organisation March 8, 2010
138
Physical Database Design - Questions
• What are the performance requirements? What is the maximum permissible time for a query to return results, or for a critical set of updates to occur?
• What are the availability requirements for the database? What are the window(s) of time for performing database operations? How often should database backups and transaction log backups be done (i.e., what is the longest period of time we can risk non-recoverability of the data)?
• What is the expected size of the database? What is the expected rate of growth of the data? At what point can old or unused data be archived or deleted?
• How many concurrent users are anticipated?
• What sorts of data virtualisation are needed to support application requirements in a way that does not tightly couple the application to the database schema?
• Will other applications need the data? If so, what data and how?
• Will users expect to be able to do ad-hoc querying and reporting of the data? If so, how and with which tools?
• What, if any, business or application processes does the database need to implement? (e.g., trigger code that does cross-database integrity checking or updating, application classes encapsulated in database procedures or functions, database views that provide table recombination for ease of use or security purposes, etc.)
• Are there application or developer concerns regarding the database, or the database development process, that need to be addressed?
• Is the application code efficient? Can a code change relieve a performance issue?
March 8, 2010
139
Performance Modifications •
Consider how the database will perform when applications make requests to access and modify data
•
Indexing can improve query performance in many cases
•
Denormalisation is the deliberate transformation of a normalised logical data model into tables with redundant data
March 8, 2010
140
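A small sketch of the two techniques above, using Python's sqlite3 module on an invented table: an index is added to support a filtered query, and a denormalised, pre-aggregated summary table trades redundant data for faster reads. This is illustrative only; real tuning decisions depend on the DBMS and the measured workload.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, region TEXT, amount NUMERIC)")
conn.executemany("INSERT INTO sale (region, amount) VALUES (?, ?)",
                 [("North", 100), ("South", 250), ("North", 75)])

query = "SELECT SUM(amount) FROM sale WHERE region = ?"
print(conn.execute("EXPLAIN QUERY PLAN " + query, ("North",)).fetchall())  # full table scan

conn.execute("CREATE INDEX ix_sale_region ON sale (region)")
print(conn.execute("EXPLAIN QUERY PLAN " + query, ("North",)).fetchall())  # now uses the index

# Denormalisation: redundant, pre-aggregated data kept deliberately for reporting
conn.execute("""CREATE TABLE sales_by_region AS
                SELECT region, SUM(amount) AS total_amount FROM sale GROUP BY region""")
print(conn.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall())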
Physical Database Design Documentation •
Create physical database design document to assist implementation and maintenance
March 8, 2010
141
Design Information Products
• Design data-related deliverables
• Design screens and reports to meet business data requirements
• Ensure consistent use of business data terminology
• Reporting services give business users the ability to execute both pre-developed and ad-hoc reports
• Analysis services give business users the ability to slice and dice data across multiple dimensions
• Dashboards display a wide array of analytics indicators, such as charts and graphs, efficiently
• Scorecards display information that indicates scores or calculated evaluations of performance
• Use data integrated from multiple databases as input to software for business process automation that coordinates multiple business processes across disparate platforms
• Data integration is a component of Enterprise Application Integration (EAI) software, enabling data to be easily passed from application to application across disparate platforms
March 8, 2010
142
Design Data Access Services
• May be necessary to access and combine data from remote databases with data in the local database
• Goal is to enable easy and inexpensive reuse of data across the organisation preventing, wherever possible, redundant and inconsistent data
• Options include
− Linked database connections − SOA web services − Message brokers − Data access classes − ETL − Replication March 8, 2010
143
Design Data Integration Services •
Critical aspect of database design is determining appropriate update mechanisms and database transaction for recovery
•
Define source-to-target mappings and data transformation designs for extract-transform-load (ETL) programs and other technology for ongoing data movement, cleansing and integration
•
Design programs and utilities for data migration and conversion from old data structures to new data structures
March 8, 2010
144
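The source-to-target mapping mentioned above can be recorded and executed in a very simple form. The sketch below, in Python with invented field names and transformation rules, shows a mapping table driving one extract-transform-load step; it is only an illustration of the idea, not a production data integration design.

# A minimal source-to-target mapping driving a simple transform step.
source_rows = [
    {"cust_nm": " acme ltd ", "ctry": "ie", "bal": "1250.50"},
    {"cust_nm": "Widget Co",  "ctry": "GB", "bal": "300"},
]

# Mapping: target column -> (source field, transformation rule)
mapping = {
    "customer_name": ("cust_nm", lambda v: v.strip().title()),
    "country_code":  ("ctry",    lambda v: v.strip().upper()),
    "balance":       ("bal",     lambda v: float(v)),
}

def transform(row):
    """Apply the mapping to one source record, producing a cleansed target record."""
    return {target: rule(row[source]) for target, (source, rule) in mapping.items()}

target_rows = [transform(r) for r in source_rows]
print(target_rows)
# [{'customer_name': 'Acme Ltd', 'country_code': 'IE', 'balance': 1250.5}, ...]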
Data Model and Design Quality Management •
Balance the needs of information consumers (the people with business requirements for data) and the data producers who capture the data in usable form
•
Time and budget constraints
•
Ensure data resides in data structures that are secure, recoverable, sharable, and reusable, and that this data is as correct, timely, relevant, and usable as possible
•
Balance the short-term versus long-term business data interests of the organisation
March 8, 2010
145
Develop Data Modeling and Design Standards •
Data modeling and database design standards serve as the guiding principles to effectively meet business data needs, conform to data architecture, and ensure data quality
•
Data modeling and database design standards should include
− A list and description of standard data modeling and database design deliverables
− A list of standard names, acceptable abbreviations, and abbreviation rules for uncommon words, that apply to all data model objects
− A list of standard naming formats for all data model objects, including attribute and column class words
− A list and description of standard methods for creating and maintaining these deliverables
− A list and description of data modeling and database design roles and responsibilities
− A list and description of all metadata properties captured in data modeling and database design, including both business metadata and technical metadata, with guidelines defining metadata quality expectations and requirements
− Guidelines for how to use data modeling tools
− Guidelines for preparing for and leading design reviews
March 8, 2010
146
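Naming standards of the kind listed above can be checked automatically. The following Python sketch is illustrative only: the lower_snake_case pattern and the approved abbreviation list are invented examples of what an organisation might mandate, not standards defined by DMBOK.

import re

APPROVED_ABBREVIATIONS = {"id", "amt", "qty", "dt"}
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

def check_column_name(name):
    """Return a list of problems found in a proposed column name (empty if acceptable)."""
    problems = []
    if not NAME_PATTERN.match(name):
        problems.append("not lower_snake_case")
    for word in name.split("_"):
        if len(word) <= 3 and word not in APPROVED_ABBREVIATIONS and not word.isdigit():
            problems.append(f"unapproved abbreviation: {word}")
    return problems

for candidate in ["customer_id", "OrderAmt", "cust_nm"]:
    print(candidate, "->", check_column_name(candidate) or "OK")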
Review Data Model and Database Design Quality •
Conduct requirements reviews and design reviews, including a conceptual data model review, a logical data model review, and a physical database design review
March 8, 2010
147
Conceptual and Logical Data Model Reviews •
Conceptual data model and logical data model design reviews should ensure that: − Business data requirements are completely captured and clearly expressed in the model, including the business rules governing entity relationships − Business (logical) names and business definitions for entities and attributes (business semantics) are clear, practical, consistent, and complementary − Data modeling standards, including naming standards, have been followed − The conceptual and logical data models have been validated
March 8, 2010
148
Physical Database Design Review •
Physical database design reviews should ensure that: − The design meets business, technology, usage, and performance requirements − Database design standards, including naming and abbreviation standards, have been followed − Availability, recovery, archiving, and purging procedures are defined according to standards − Metadata quality expectations and requirements are met in order to properly update any metadata repository − The physical data model has been validated
March 8, 2010
149
Data Model Validation •
Validate data models against modeling standards, business requirements, and database requirements
•
Ensure the model matches applicable modeling standards
•
Ensure the model matches the business requirements
•
Ensure the model matches the database requirements
March 8, 2010
150
Manage Data Model Versioning and Integration •
Data models and other design specifications require change control − Each change should include − Why the project or situation required the change − What and how the object(s) changed, including which tables had columns added, modified, or removed, etc. − When the change was approved and when the change was made to the model − Who made the change − Where the change was made
March 8, 2010
151
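The change-control record described above (why, what, when, who, where) can be captured in a very simple structure. The sketch below uses invented field values and is only an illustration of recording those five facts for each model change; most organisations would hold this in their modeling or configuration management tool.

import datetime

def log_model_change(change_log, *, why, what, who, where):
    change_log.append({
        "why":   why,                                  # project or situation driving the change
        "what":  what,                                 # objects changed and how
        "when":  datetime.datetime.now().isoformat(),  # approval / change timestamp
        "who":   who,                                  # person making the change
        "where": where,                                # model or environment changed
    })

change_log = []
log_model_change(
    change_log,
    why="Order management project CR-042",
    what="Added column customer.preferred_channel; widened order.status to 20 characters",
    who="data.modeller@example.org",
    where="Enterprise logical data model v3.2",
)
print(change_log[0]["what"])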
Data Implementation •
Data implementation consists of data management activities that support system building, testing, and deployment − Database implementation and change management in the development and test environments − Test data creation, including any security procedures − Development of data migration and conversion programs, both for project development through the SDLC and for business situations − Validation of data quality requirements − Creation and delivery of user training − Contribution to the development of effective documentation March 8, 2010
152
Implement Development / Test Database Changes •
Implement changes to the database that are required during the course of application development
•
Monitor database code to ensure that it is written to the same standards as application code
•
Identify poor SQL coding practices that could lead to errors or performance problems
March 8, 2010
153
Create and Maintain Test Data •
Populate databases in the development environment with test data
•
Observe privacy and confidentiality requirements and practices for test data
March 8, 2010
154
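One common way to observe the privacy and confidentiality requirements above is to mask direct identifiers before production-like data reaches a development database. The Python sketch below is illustrative only; the field names and masking rules are invented, and a real approach would follow the organisation's data security policy.

import hashlib
import random

def mask_record(record):
    masked = dict(record)
    # Replace the name with a deterministic pseudonym so joins still work
    digest = hashlib.sha256(record["customer_name"].encode()).hexdigest()[:8]
    masked["customer_name"] = f"Customer-{digest}"
    # Blank out the direct contact detail entirely
    masked["email"] = "masked@example.invalid"
    # Perturb the date of birth so it is realistic but not real
    masked["birth_year"] = record["birth_year"] + random.randint(-2, 2)
    return masked

production_row = {"customer_name": "Jane Doe", "email": "jane@example.com", "birth_year": 1980}
print(mask_record(production_row))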
Migrate and Convert Data •
Key component of many projects is the migration of legacy data to a new database environment, including any necessary data cleansing and reformatting
March 8, 2010
155
Build and Test Information Products •
Implement mechanisms for integrating data from multiple sources, along with the appropriate metadata to ensure meaningful integration of the data
•
Implement mechanisms for reporting and analysing the data, including online and web-based reporting, ad-hoc querying, BI scorecards, OLAP, portals, and the like
•
Implement mechanisms for replication of the data, if network latency or other concerns make it impractical to service all users from a single data source
March 8, 2010
156
Build and Test Data Access Services •
Develop, test, and execute data migration and conversion programs and procedures, first for development and test data and later for production deployment
•
Data requirements should include business rules for data quality to guide the implementation of application edits and database referential integrity constraints
•
Business data stewards and other subject matter experts should validate the correct implementation of data requirements through user acceptance testing
March 8, 2010
157
Validate Information Requirements •
Test and validate that the solution meets the requirements, and plan deployment, developing training, and documentation.
•
Data requirements may change abruptly, in response to either changed business requirements, invalid assumptions regarding the data or reprioritisation of existing requirements
•
Test the implementation of the data requirements and ensure that the application requirements are satisfied
March 8, 2010
158
Prepare for Data Deployment •
Leverage the business knowledge captured in data modeling to define clear and consistent language in user training and documentation
• Business concepts, terminology, definitions, and rules depicted in data models are an important part of application user training
• Data stewards and data analysts should participate in deployment preparation, including development and review of training materials and system documentation, especially to ensure consistent use of defined business data terminology
• Help desk support staff also require orientation and training in how system users appropriately access, manipulate, and interpret data
• Once installed, business data stewards and data analysts should monitor the early use of the system to see that business data requirements are indeed met
March 8, 2010
159
Data Operations Management
March 8, 2010
160
Data Operations Management •
Data operations management is the development, maintenance, and support of structured data to maximise the value of the data resources to the enterprise and includes − Database support − Data technology management
March 8, 2010
161
Data Operations Management – Definition and Goals •
Definition − Planning, control, and support for structured data assets across the data lifecycle, from creation and acquisition through archival and purge
•
Goals − Protect and ensure the integrity of structured data assets − Manage the availability of data throughout its lifecycle − Optimise performance of database transactions
March 8, 2010
162
Data Operations Management - Overview
• Inputs: Data Requirements, Data Architecture, Data Models, Legacy Data, Service Level Agreements
• Suppliers: Executives, IT Steering Committee, Data Governance Council, Data Stewards, Data Architects and Modelers, Software Developers
• Participants: Database Administrators, Software Developers, Project Managers, Data Stewards, Data Architects and Analysts, DM Executives and Other IT Management, IT Operators
• Primary Deliverables: DBMS Technical Environments; Dev/Test, QA, DR, and Production Databases; Externally Sourced Data; Database Performance; Data Recovery Plans; Business Continuity; Data Retention Plan; Archived and Purged Data
• Consumers: Data Creators, Information Consumers, Enterprise Customers, Data Professionals, Other IT Professionals
• Tools: Database Management Systems, Data Development Tools, Database Administration Tools, Office Productivity Tools
• Metrics: Availability, Performance
March 8, 2010
163
Data Operations Management Function, Activities and Sub-Activities
• Database Support
− Implement and Control Database Environments
− Obtain Externally Sourced Data
− Plan for Data Recovery
− Backup and Recover Data
− Set Database Performance Service Levels
− Monitor and Tune Database Performance
− Plan for Data Retention
− Archive, Retain, and Purge Data
− Support Specialised Databases
• Data Technology Management
− Understand Data Technology Requirements
− Define the Data Technology Architecture
− Evaluate Data Technology
− Install and Administer Data Technology
− Inventory and Track Data Technology Licenses
− Support Data Technology Usage and Issues
March 8, 2010
164
Data Operations Management - Principles
• Write everything down
• Keep everything
• Whenever possible, automate a procedure
• Focus to understand the purpose of each task, manage scope, simplify, do one thing at a time
• Measure twice, cut once
• React to problems and issues calmly and rationally, because panic causes more errors
• Understand the business, not just the technology
• Work together to collaborate, be accessible, share knowledge
• Use all of the resources at your disposal
• Keep up to date
March 8, 2010
165
Database Support - Scope •
Ensure the performance and reliability of the database, including performance tuning, monitoring, and error reporting
•
Implement appropriate backup and recovery mechanisms to guarantee the recoverability of the data in any circumstance
•
Implement mechanisms for clustering and failover of the database, if continual data availability is a requirement
•
Implement mechanisms for archiving data
March 8, 2010
166
Database Support - Deliverables •
A production database environment, including an instance of the DBMS and its supporting server, of a sufficient size and capacity to ensure adequate performance, configured for the appropriate level of security, reliability and availability
•
Mechanisms and processes for controlled implementation and changes to databases into the production environment
•
Appropriate mechanisms for ensuring the availability, integrity, and recoverability of the data in response to all possible circumstances that could result in loss or corruption of data
•
Appropriate mechanisms for detecting and reporting any error that occurs in the database, the DBMS, or the data server
•
Database availability, recovery, and performance in accordance with service level agreements March 8, 2010
167
Implement and Control Database Environments •
Updating DBMS software
•
Maintaining multiple installations, including different DBMS versions
•
Installing and administering related data technology, including data integration software and third party data administration tools
•
Setting and tuning DBMS system parameters
•
Managing database connectivity
•
Tune operating systems, networks, and transaction processing middleware to work with the DBMS
•
Optimise the use of different storage technology for cost-effective storage
March 8, 2010
168
Obtain Externally Sourced Data •
Managed approach to data acquisition centralises responsibility for data subscription services
•
Document the external data source in the logical data model and data dictionary
•
Implement the necessary processes to load the data into the database and/or make it available to applications
March 8, 2010
169
Plan for Data Recovery •
Establish service level agreements (SLAs) with IT data management services organisations for data availability and recovery
•
SLAs set availability expectations, allowing time for database maintenance and backup, and set recovery time expectations for different recovery scenarios, including potential disasters
•
Ensure a recovery plan exists for all databases and database servers, covering all possible scenarios − Loss of the physical database server − Loss of one or more disk storage devices − Loss of a database, including the DBMS master database, temporary storage database, transaction log segment, etc. − Corruption of database index or data pages − Loss of the database or log segment file system − Loss of database or transaction log backup files March 8, 2010
170
Backup and Recover Data •
Make regular backups of database and the database transaction logs
•
Balance the importance of the data against the cost of protecting it
•
Databases should reside on some sort of managed storage area
•
For critical data, implement some sort of replication facility
March 8, 2010
171
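As a minimal illustration of the regular backups described above, the following Python sketch uses sqlite3's online backup API, which copies a database while it remains available. This is only an example of the principle with invented file names; production DBMSs have their own backup and transaction-log tooling that should be used instead.

import sqlite3

source = sqlite3.connect("operational.db")
source.execute("CREATE TABLE IF NOT EXISTS account (id INTEGER PRIMARY KEY, balance NUMERIC)")
source.commit()

backup_target = sqlite3.connect("operational_backup.db")
source.backup(backup_target)   # copies the whole database while the source stays available
backup_target.close()
source.close()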
Set Database Performance Service Levels
• Database performance has two components - availability and performance
• An unavailable database has a performance measure of zero
• SLAs between data management services organisations and data owners define expectations for database performance
• Availability is the percentage of time that a system or database can be used for productive work
• Availability requirements are constantly increasing, raising the business risks and costs of unavailable data
March 8, 2010
172
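A worked example of the availability percentage defined above, with invented figures: a database scheduled for 24x7 use in a 30-day month that suffered 90 minutes of unplanned outage plus a 120-minute maintenance window.

scheduled_minutes = 30 * 24 * 60          # 43,200 minutes in the month
downtime_minutes = 90 + 120               # unplanned outage plus maintenance window

availability = (scheduled_minutes - downtime_minutes) / scheduled_minutes * 100
print(f"{availability:.3f}%")             # roughly 99.514%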
Set Database Performance Service Levels •
Factors affecting availability include − Manageability - ability to create and maintain an effective environment − Recoverability - ability to reestablish service after interruption, and correct errors caused by unforeseen events or component failures − Reliability - ability to deliver service at specified levels for a stated period − Serviceability - ability to determine the existence of problems, diagnose their causes, and repair / solve the problems
•
Tasks to ensure databases stay online and operational
− Running database backup utilities
− Running database reorganisation utilities
− Running statistics gathering utilities
− Running integrity checking utilities
− Automating the execution of these utilities
− Exploiting table space clustering and partitioning
− Replicating data across mirror databases to ensure high availability
March 8, 2010
173
Set Database Performance Service Levels •
Causes of loss of database availability
− Planned and unplanned outages
− Loss of the server hardware
− Disk hardware failure
− Operating system failure
− DBMS software failure
− Application problems
− Network failure
− Data center site loss
− Security and authorisation problems
− Corruption of data (due to bugs, poor design, or user error)
− Loss of database objects
− Loss of data
− Data replication failure
− Severe performance problems
− Recovery failures
− Human error
March 8, 2010
174
Monitor and Tune Database Performance •
Optimise database performance both proactively and reactively, by monitoring performance and by responding to problems quickly and effectively
•
Run activity and performance reports against both the DBMS and the server on a regular basis including during periods of heavy activity
•
When performance problems occur, use the monitoring and administration tools of the DBMS to help identify the source of the problem
− Memory allocation (buffer / cache for data)
− Locking and blocking
− Failure to update database statistics
− Poor SQL coding
− Insufficient indexing
− Application activity
− Increase in the number, size, or use of databases
− Database volatility
March 8, 2010
175
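A small sketch of the proactive monitoring described above: each statement is timed, and anything slower than a threshold is logged together with its query plan so it can be investigated. The threshold, schema and sqlite3 usage are invented for the example; a production DBMS would normally provide its own activity and performance reporting.

import sqlite3
import time

SLOW_QUERY_THRESHOLD_SECONDS = 0.05   # illustrative threshold only

def run_monitored(conn, sql, params=()):
    started = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - started
    if elapsed > SLOW_QUERY_THRESHOLD_SECONDS:
        plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
        print(f"SLOW ({elapsed:.3f}s): {sql}\n  plan: {plan}")
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO event (payload) VALUES (?)", [("x" * 100,)] * 5000)
run_monitored(conn, "SELECT COUNT(*) FROM event WHERE payload LIKE ?", ("%y%",))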
Support Specialised Databases •
Some specialised situations require specialised types of databases
March 8, 2010
176
Data Technology Management •
Managing data technology should follow the same principles and standards for managing any technology
•
Use a reference model for technology management such as Information Technology Infrastructure Library (ITIL)
March 8, 2010
177
Understand Data Technology Requirements
• Understand the data and information needs of the business
• Understand the best possible applications of technology to solve business problems and take advantage of new business opportunities
• Understand the requirements of a data technology before determining what technical solution to choose for a particular situation
− What problem does this data technology mean to solve?
− What does this data technology do that is unavailable in other data technologies?
− What does this data technology not do that is available in other data technologies?
− Are there any specific hardware requirements for this data technology?
− Are there any specific Operating System requirements for this data technology?
− Are there any specific software requirements or additional applications required for this data technology to perform as advertised?
− Are there any specific storage requirements for this data technology?
− Are there any specific network or connectivity requirements for this data technology?
− Does this data technology include data security functionality? If not, what other tools does this technology work with that provide for data security functionality?
− Are there any specific skills required to be able to support this data technology? Do we have those skills in-house or must we acquire them?
March 8, 2010
178
Define the Data Technology Architecture •
Data technology architecture addresses three core questions − What technologies are standard (which are required, preferred, or acceptable)? − Which technologies apply to which purposes and circumstances? − In a distributed environment, which technologies exist where, and how does data move from one node to another?
•
Technology is never free - even open-source technology requires maintenance
•
Technology should always be regarded as the means to an end, rather than the end itself
•
Buying the same technology that everyone else is using, and using it in the same way, does not create business value or competitive advantage for the organisation March 8, 2010
179
Define the Data Technology Architecture •
Technology categories include − Database management systems (DBMS) − Database management utilities − Data modelling and model management tools − Business intelligence software for reporting and analysis − Extract-transform-load (ETL), changed data capture (CDC), and other data integration tools − Data quality analysis and data cleansing tools − Metadata management software, including metadata repositories
March 8, 2010
180
Define the Data Technology Architecture •
Classify technology architecture components as − Current - currently supported and used − Deployment - deployed for use in the next 1-2 years − Strategic - expected to be available for use in the next 2+ years − Retirement - the organisation has retired or intends to retire this year − Preferred - preferred for use by most applications. − Containment - limited to use by certain applications − Emerging - being researched and piloted for possible future deployment
•
Create road map for the organisation consisting of these components to helps govern future technology decisions March 8, 2010
181
Evaluate Data Technology
• Selecting appropriate data related technology, particularly the appropriate database management technology, is an important data management responsibility
• Data technologies to be researched and evaluated include:
− Database management systems (DBMS) software
− Database utilities, such as backup and recovery tools, and performance monitors
− Data modeling and model management software
− Database management tools, such as editors, schema generators, and database object generators
− Business intelligence software for reporting and analysis
− Extract-transform-load (ETL) and other data integration tools
− Data quality analysis and data cleansing tools
− Data virtualisation technology
− Metadata management software, including metadata repositories
March 8, 2010
182
Evaluate Data Technology •
Use a standard technology evaluation process − Understand user needs, objectives, and related requirements − Understand the technology in general − Identify available technology alternatives − Identify the features required − Weigh the importance of each feature − Understand each technology alternative − Evaluate and score each technology alternative’s ability to meet requirements − Calculate total scores and rank technology alternatives by score − Evaluate the results, including the weighted criteria − Present the case for selecting the highest ranking alternative March 8, 2010
183
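A worked example of the weighting and scoring steps in the evaluation process above. The features, weights and scores are invented purely to show the arithmetic; each alternative's weighted scores are summed and the alternatives ranked.

weights = {"scalability": 0.4, "cost_of_ownership": 0.35, "vendor_support": 0.25}

candidate_scores = {               # each feature scored 1 (poor) to 5 (excellent)
    "DBMS A": {"scalability": 4, "cost_of_ownership": 2, "vendor_support": 5},
    "DBMS B": {"scalability": 3, "cost_of_ownership": 4, "vendor_support": 3},
}

totals = {
    name: sum(weights[feature] * score for feature, score in scores.items())
    for name, scores in candidate_scores.items()
}

for name, total in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {total:.2f}")
# DBMS A: 0.4*4 + 0.35*2 + 0.25*5 = 3.55; DBMS B: 0.4*3 + 0.35*4 + 0.25*3 = 3.35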
Evaluate Data Technology
• Selecting strategic DBMS software is very important
• Factors to consider when selecting DBMS software include:
− Product architecture and complexity
− Application profile, such as transaction processing, business intelligence, and personal profiles
− Organisational appetite for technical risk
− Hardware platform and operating system support
− Availability of supporting software tools
− Performance benchmarks
− Scalability
− Software, memory, and storage requirements
− Available supply of trained technical professionals
− Cost of ownership, such as licensing, maintenance, and computing resources
− Vendor reputation
− Vendor support policy and release schedule
− Customer references
March 8, 2010
184
Install and Administer Data Technology •
Need to deploy new technology products in development / test, QA / certification, and production environments
•
Create and document processes and procedures for administering the product
•
Cost and complexity of implementing new technology is usually underestimated
•
Features and benefits are usually overestimated
•
Start with small pilot projects and proof-of-concept (POC) implementations to get a good idea of the true costs and benefits before proceeding with larger production implementation March 8, 2010
185
Inventory and Track Data Technology Licenses •
Comply with licensing agreements and regulatory requirements
•
Track and conduct yearly audits of software license and annual support costs
•
Track other costs such as server lease agreements and other fixed costs
•
Use data to determine the total cost-of-ownership (TCO) for each type of technology and technology product
•
Evaluate technologies and products that are becoming obsolete, unsupported, less useful, or too expensive March 8, 2010
186
Support Data Technology Usage and Issues •
Work with business users and application developers to − Ensure the most effective use of the technology − Explore new applications of the technology − Address any problems or issues that surface from its use
•
Training is important to effective understanding and use of any technology
March 8, 2010
187
Data Security Management
March 8, 2010
188
Data Security Management •
Planning, development, and execution of security policies and procedures to provide proper authentication, authorisation, access, and auditing of data and information assets
•
Effective data security policies and procedures ensure that the right people can use and update data in the right way, and that all inappropriate access and update is restricted
•
An effective data security management function establishes governance mechanisms that are easy enough to abide by on a daily operational basis
March 8, 2010
189
Data Security Management – Definition and Goals •
Definition − Planning, development, and execution of security policies and procedures to provide proper authentication, authorisation, access, and auditing of data and information.
•
Goals − Enable appropriate, and prevent inappropriate, access and change to data assets − Meet regulatory requirements for privacy and confidentiality − Ensure the privacy and confidentiality needs of all stakeholders are met
March 8, 2010
190
Data Security Management •
Protect information assets in alignment with privacy and confidentiality regulations and business requirements − Stakeholder Concerns - organisations must recognise the privacy and confidentiality needs of their stakeholders, including clients, patients, students, citizens, suppliers, or business partners − Government Regulations - government regulations protect some of the stakeholder security interests. Some regulations restrict access to information, while other regulations ensure openness, transparency, and accountability − Proprietary Business Concerns - each organisation has its own proprietary data to protect - ensuring competitive advantage provided by intellectual property and intimate knowledge of customer needs and business partner relationships is a cornerstone in any business plan − Legitimate Access Needs - data security implementers must also understand the legitimate needs for data access
March 8, 2010
191
Data Security Requirements and Procedures •
Data security requirements and the procedures to meet these requirements − Authentication - validate users are who they say they are − Authorisation - identify the right individuals and grant them the right privileges to specific, appropriate views of data − Access - enable these individuals and their privileges in a timely manner − Audit - review security actions and user activity to ensure compliance with regulations and conformance with policy and standards
March 8, 2010
192
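A minimal sketch of how the four requirements above relate to each other, assuming an invented in-memory user directory and role-permission table; a real implementation would rely on the organisation's identity management and DBMS security facilities.

# Hedged sketch of the four data security requirements: authenticate, authorise,
# grant access, and audit. All users, roles and data names are invented.
import datetime

users = {"asmith": {"password": "s3cret!", "roles": {"claims_analyst"}}}
role_permissions = {"claims_analyst": {("claims", "read")}}
audit_log = []

def authenticate(user_id, password):
    # Authentication - validate users are who they say they are
    u = users.get(user_id)
    return u is not None and u["password"] == password

def authorised(user_id, dataset, action):
    # Authorisation - do the user's role groups grant this privilege?
    return any((dataset, action) in role_permissions[r] for r in users[user_id]["roles"])

def access(user_id, password, dataset, action):
    # Access - combine the checks and record the outcome for later audit
    ok = authenticate(user_id, password) and authorised(user_id, dataset, action)
    audit_log.append((datetime.datetime.now().isoformat(), user_id, dataset, action, ok))
    return ok

print(access("asmith", "s3cret!", "claims", "read"))    # True - permitted and audited
print(access("asmith", "s3cret!", "claims", "update"))  # False - refused and audited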
Data Security Management - Overview
Inputs: •Business Goals •Business Strategy •Business Rules •Business Process •Data Strategy •Data Privacy Issues •Related IT Policies and Standards
Suppliers: •Data Stewards •IT Steering Committee •Data Stewardship Council •Government •Customers
Participants: •Data Stewards •Data Security Administrators •Database Administrators •BI Analysts •Data Architects •DM Leader •CIO/CTO •Help Desk Analysts
Tools: •Database Management System •Business Intelligence Tools •Application Frameworks •Identity Management Technologies •Change Control Systems
Primary Deliverables: •Data Security Policies •Data Privacy and Confidentiality Standards •User Profiles, Passwords and Memberships •Data Security Permissions •Data Security Controls •Data Access Views •Document Classifications •Authentication and Access History •Data Security Audits
Consumers: •Data Producers •Knowledge Workers •Managers •Executives •Customers •Data Professionals
March 8, 2010
193
Data Security Management Function, Activities and Sub-Activities Data Security Management
Understand Data Security Needs and Regulatory Requirements
Define Data Security Policy
Business Requirements
Define Data Security Standards
Define Data Security Controls and Procedures
Manage Users, Passwords, and Group Membership
Manage Data Access Views and Permissions
Monitor User Authentication and Access Behaviour
Classify Information Confidentiality
Audit Data Security
Password Standards and Procedures
Regulatory Requirements
March 8, 2010
194
Data Security Management - Principles • • • • • • • • • • • • • • •
Be a responsible trustee of data about all parties. Understand and respect the privacy and confidentiality needs of all stakeholders, be they clients, patients, students, citizens, suppliers, or business partners Understand and comply with all pertinent regulations and guidelines Data-to-process and data-to-role relationship (CRUD - Create, Read, Update, Delete) matrices help map data access needs and guide definition of data security role groups, parameters, and permissions Definition of data security requirements and data security policy is a collaborative effort involving IT security administrators, data stewards, internal and external audit teams, and the legal department Identify detailed application security requirements in the analysis phase of every systems development project Classify all enterprise data and information products against a simple confidentiality classification schema Every user account should have a password set by the user following a set of password complexity guidelines, and expiring every 45 to 60 days Create role groups; define privileges by role; and grant privileges to users by assigning them to the appropriate role group. Whenever possible, assign each user to only one role group Some level of management must formally request, track, and approve all initial authorisations and subsequent changes to user and group authorisations To avoid data integrity issues with security access information, centrally manage user identity data and group membership data Use relational database views to restrict access to sensitive columns and / or specific rows Strictly limit and carefully consider every use of shared or service user accounts Monitor data access to certain information actively, and take periodic snapshots of data access activity to understand trends and compare against standards criteria Periodically conduct objective, independent, data security audits to verify regulatory compliance and standards conformance, and to analyse the effectiveness and maturity of data security policy and practice In an outsourced environment, be sure to clearly define the roles and responsibilities for data security and understand the chain of custody of data across organisations and roles. March 8, 2010
195
Understand Data Security Needs and Regulatory Requirements •
Distinguish between business rules and procedures and the rules imposed by application software products
•
Common for systems to have their own unique set of data security requirements over and above those required by business processes
March 8, 2010
196
Business Requirements •
Implementing data security within an enterprise requires an understanding of business requirements
•
Business needs of an enterprise define the degree of rigidity required for data security
•
Business rules and processes define the security touch points
•
Data-to-process and data-to-role relationship matrices are useful tools to map these needs and guide definition of data security role-groups, parameters, and permissions
•
Identify detailed application security requirements in the analysis phase of every systems development project March 8, 2010
197
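As an illustration of the data-to-role matrices mentioned above, here is a minimal Python sketch; the roles, data entities and granted operations are invented for the example.

# Hedged sketch of a data-to-role CRUD (Create, Read, Update, Delete) matrix.
# Roles, data entities and the granted operations are illustrative only.
crud_matrix = {
    # (role group, data entity): granted CRUD operations
    ("order_clerk",  "Customer"): "CRU",
    ("order_clerk",  "Order"):    "CRUD",
    ("finance_user", "Invoice"):  "CRU",
    ("auditor",      "Invoice"):  "R",
}

def allowed(role, entity, operation):
    # True if the matrix grants the single-letter CRUD operation to the role
    return operation in crud_matrix.get((role, entity), "")

print(allowed("auditor", "Invoice", "R"))  # True
print(allowed("auditor", "Invoice", "U"))  # False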
Regulatory Requirements •
Organisations must comply with a growing set of regulations
•
Some regulations impose security controls on information management
March 8, 2010
198
Define Data Security Policy •
Definition of data security policy based on data security requirements is a collaborative effort involving IT security administrators, data stewards, internal and external audit teams, and the legal department
•
Enterprise IT strategy and standards typically dictate high-level policies for access to enterprise data assets
•
Data security policies are more granular in nature and take a very data-centric approach compared to an IT security policy
March 8, 2010
199
Define Data Security Standards • • •
•
There is no one prescribed way of implementing data security to meet privacy and confidentiality requirements Regulations generally focus on achieving an end without defining the means for achieving it Organisations should design their own security controls, demonstrate that the controls meet the requirements of the law or regulations and document the implementation of those controls Information technology security standards can also affect − Tools used to manage data security − Data encryption standards and mechanisms − Access guidelines to external vendors and contractors − Data transmission protocols over the internet − Documentation requirements − Remote access standards − Security breach incident reporting procedures
March 8, 2010
200
Define Data Security Standards •
Consider physical security, especially with the explosion of portable devices and media, to formulate an effective data security strategy − Access to data using mobile devices − Storage of data on portable devices such as laptops, DVDs, CDs or USB drives − Disposal of these devices in compliance with records management policies
• • • •
An organisation should develop a practical, implementable security policy including data security guiding principles Focus should be on quality and consistency, not on creating a lengthy body of guidelines Execution of the policy requires satisfying the elements of securing information assets: authentication, authorisation, access, and audit Information classification, access rights, role groups, users, and passwords are the means of implementing policy and satisfying these elements March 8, 2010
201
Define Data Security Controls and Procedures •
Implementation and administration of data security policy is primarily the responsibility of security administrators
•
Database security is often one responsibility of database administrators
•
Implement proper controls to meet the objectives of relevant laws
•
Implement a process to validate assigned permissions against a change management system used for tracking all user permission requests
March 8, 2010
202
Manage Users, Passwords, and Group Membership •
Role groups enable security administrators to define privileges by role and to grant these privileges to users by enrolling them in the appropriate role group
•
Data consistency in user and group management is a challenge in a mixed IT environment
•
Construct group definitions at a workgroup or business unit level
•
Organise roles in a hierarchy, so that child roles further restrict the privileges of parent roles
March 8, 2010
203
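One way the role-group hierarchy described above could be sketched in Python; the role names and privileges are invented, and child roles only ever keep a subset of their parent's privileges.

# Hedged sketch: role groups in a hierarchy, where a child role retains only a
# subset of its parent's privileges. Role names and privileges are invented.
role_privileges = {
    "analyst": {"read_sales", "read_customers", "export_reports"},
}

def define_child_role(parent, child, keep):
    # A child role may only keep privileges its parent already has
    role_privileges[child] = role_privileges[parent] & keep

# A regional analyst is an analyst who may not export reports
define_child_role("analyst", "regional_analyst", {"read_sales", "read_customers"})

# Assign each user to a single role group, in line with the guiding principles
user_role = {"jdoe": "regional_analyst"}

def user_privileges(user):
    return role_privileges[user_role[user]]

print(user_privileges("jdoe"))  # {'read_sales', 'read_customers'}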
Password Standards and Procedures •
Passwords are the first line of defense in protecting access to data
•
Every user account should be required to have a password set by the user with a sufficient level of password complexity defined in the security standards
March 8, 2010
204
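A minimal sketch of a password complexity and expiry check; the specific rules (length, character classes, 60-day maximum age) are examples only, not a recommended standard — each organisation defines its own in its security standards.

# Hedged sketch of password complexity and expiry checks; the rules are illustrative.
import datetime
import re

def is_complex_enough(password):
    # Require minimum length plus upper case, lower case, digit and special character
    return (len(password) >= 10
            and re.search(r"[A-Z]", password) is not None
            and re.search(r"[a-z]", password) is not None
            and re.search(r"\d", password) is not None
            and re.search(r"[^A-Za-z0-9]", password) is not None)

def is_expired(last_changed, max_age_days=60):
    # Flag passwords older than the policy's maximum age
    return (datetime.date.today() - last_changed).days > max_age_days

print(is_complex_enough("Tr1cky-Passw0rd"))   # True
print(is_expired(datetime.date(2009, 3, 8)))  # True - set well over 60 days ago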
Manage Data Access Views and Permissions •
Data security management involves not just preventing inappropriate access, but also enabling valid and appropriate access to data
•
Most sets of data do not have any restricted access requirements
•
Control sensitive data access by granting permissions - opt-in
•
Access control degrades when achieved through shared or service accounts − Implemented as convenience for administrators, these accounts often come with enhanced privileges and are untraceable to any particular user or administrator − Enterprises using shared or service accounts run the risk of data security breaches − Evaluate use of such accounts carefully, and never use them frequently or by default March 8, 2010
205
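A minimal sketch of the opt-in principle above, using a relational view to expose only non-sensitive columns and rows. SQLite is used only to keep the example self-contained; the table, column and view names are invented, and in a production DBMS access would be granted on the view rather than the base table.

# Hedged sketch: a database view restricting access to sensitive columns and rows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        department TEXT,
        salary NUMERIC,          -- sensitive column
        national_id TEXT         -- sensitive column
    );
    INSERT INTO employee VALUES
        (1, 'A. Smith', 'Sales', 52000, 'X123'),
        (2, 'B. Jones', 'HR',    61000, 'Y456');

    -- The view exposes only non-sensitive columns, and only Sales rows
    CREATE VIEW v_sales_employee AS
        SELECT id, name, department
        FROM employee
        WHERE department = 'Sales';
""")

# Report writers would be granted access to the view, not to the base table
print(conn.execute("SELECT * FROM v_sales_employee").fetchall())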
Monitor User Authentication and Access Behaviour •
Monitoring authentication and access behaviour is critical because: − It provides information about who is connecting and accessing information assets, which is a basic requirement for compliance auditing − It alerts security administrators to unforeseen situations, compensating for oversights in data security planning, design, and implementation
Monitoring helps detect unusual or suspicious transactions that may warrant further investigation and issue resolution • Perform monitoring either actively or passively • Both methods are best accomplished by automated systems with human checks and balances in place
March 8, 2010
206
Classify Information Confidentiality • • •
Classify an organisation’s data and information using a simple confidentiality classification schema Most organisations classify the level of confidentiality for information found within documents, including reports A typical classification schema might include the following five confidentiality classification levels: − For General Audiences: Information available to anyone, including the general public − Internal Use Only: Information limited to employees or members, but with minimal risk if shared − Confidential: Information which should not be shared outside the organisation. Client Confidential information may not be shared with other clients − Restricted Confidential: Information limited to individuals performing certain roles with the need to know − Registered Confidential: Information so confidential that anyone accessing the information must sign a legal agreement to access the data and assume responsibility for its secrecy
March 8, 2010
207
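The five-level schema above could be represented as an ordered enumeration so that handling rules can compare levels; this is a sketch, with the clearance-check function invented for illustration.

# Hedged sketch: the five confidentiality levels as an ordered enumeration.
from enum import IntEnum

class Confidentiality(IntEnum):
    FOR_GENERAL_AUDIENCES = 1
    INTERNAL_USE_ONLY = 2
    CONFIDENTIAL = 3
    RESTRICTED_CONFIDENTIAL = 4
    REGISTERED_CONFIDENTIAL = 5

def may_view(clearance, document_level):
    # A reader may view a document classified at or below their clearance level
    return clearance >= document_level

print(may_view(Confidentiality.CONFIDENTIAL, Confidentiality.INTERNAL_USE_ONLY))        # True
print(may_view(Confidentiality.CONFIDENTIAL, Confidentiality.RESTRICTED_CONFIDENTIAL))  # False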
Audit Data Security •
Auditing data security is a recurring control activity with responsibility to analyse, validate, counsel, and recommend policies, standards, and activities related to data security management
•
Auditing is a managerial activity performed with the help of analysts working on the actual implementation and details
•
The goal of auditing is to provide management and the data governance council with objective, unbiased assessments, and rational, practical recommendations
•
Auditing data security is no substitute for effective management of data security
•
Auditing is a supportive, repeatable process, which should occur regularly, efficiently, and consistently March 8, 2010
208
Audit Data Security •
Auditing data security includes − Analysing data security policy and standards against best practices and needs − Analysing implementation procedures and actual practices to ensure consistency with data security goals, policies, standards, guidelines, and desired outcomes − Assessing whether existing standards and procedures are adequate and in alignment with business and technology requirements − Verifying the organisation is in compliance with regulatory requirements − Reviewing the reliability and accuracy of data security audit data − Evaluating escalation procedures and notification mechanisms in the event of a data security breach − Reviewing contracts, data sharing agreements, and data security obligations of outsourced and external vendors, ensuring they meet their obligations, and ensuring the organisation meets its obligations for externally sourced data − Reporting to senior management, data stewards, and other stakeholders on the state of data security within the organisation and the maturity of its practices − Recommending data security design, operational, and compliance improvements March 8, 2010
209
Data Security and Outsourcing • •
• • • •
Outsourcing IT operations introduces additional data security challenges and responsibilities Outsourcing increases the number of people who share accountability for data across organisational and geographic boundaries Previously informal roles and responsibilities must now be explicitly defined as contractual obligations Outsourcing contracts must specify the responsibilities and expectations of each role Any form of outsourcing increases risk to the organisation Data security risk is escalated to include the outsource vendor, so any data security measures and processes must look at the risk from the outsource vendor not only as an external risk, but also as an internal risk March 8, 2010
210
Data Security and Outsourcing •
Transferring control, but not accountability, requires tighter risk management and control mechanisms: − Service level agreements − Limited liability provisions in the outsourcing contract − Right-to-audit clauses in the contract − Clearly defined consequences of breaching contractual obligations − Frequent data security reports from the service vendor − Independent monitoring of vendor system activity − More frequent and thorough data security auditing − Constant communication with the service vendor
•
In an outsourced environment, it is important to maintain and track the lineage, or flow, of data across systems and individuals to maintain a chain of custody March 8, 2010
211
Reference and Master Data Management
March 8, 2010
212
Reference and Master Data Management •
Reference and Master Data Management is the ongoing reconciliation and maintenance of reference data and master data − Reference Data Management is control over defined domain values (also known as vocabularies), including control over standardised terms, code values and other unique identifiers, business definitions for each value, business relationships within and across domain value lists, and the consistent, shared use of accurate, timely and relevant reference data values to classify and categorise data − Master Data Management is control over master data values to enable consistent, shared, contextual use across systems, of the most accurate, timely, and relevant version of truth about essential business entities
•
Reference data and master data provide the context for transaction data
March 8, 2010
213
Reference and Master Data Management – Definition and Goals •
Definition − Planning, implementation, and control activities to ensure consistency with a golden version of contextual data values
•
Goals − Provide authoritative source of reconciled, high-quality master and reference data − Lower cost and complexity through reuse and leverage of standards − Support business intelligence and information integration efforts
March 8, 2010
214
Reference and Master Data Management - Overview
Inputs: •Business Drivers •Data Requirements •Policy and Regulations •Standards •Code Sets •Master Data •Transactional Data
Suppliers: •Steering Committees •Business Data Stewards •Subject Matter Experts •Data Consumers •Standards Organisations •Data Providers
Participants: •Data Stewards •Subject Matter Experts •Data Architects •Data Analysts •Application Architects •Data Governance Council •Data Providers •Other IT Professionals
Tools: •Reference Data Management Applications •Master Data Management Applications •Data Modeling Tools •Process Modeling Tools •Metadata Repositories •Data Profiling Tools •Data Cleansing Tools •Data Integration Tools •Business Process and Rule Engines •Change Management Tools
Primary Deliverables: •Master and Reference Data Requirements •Data Models and Documentation •Reliable Reference and Master Data •Golden Record Data Lineage •Data Quality Metrics and Reports •Data Cleansing Services
Consumers: •Application Users •BI and Reporting Users •Application Developers and Architects •Data Integration Developers and Architects •BI Developers and Architects •Vendors, Customers, and Partners
Metrics: •Reference and Master Data Quality •Change Activity •Issues, Costs, Volume •Use and Re-Use •Availability •Data Steward Coverage
March 8, 2010
215
Reference and Master Data Management Function, Activities and Sub-Activities Reference and Master Data Management
Reference Data
Master Data
Understand Reference and Master Data Integration Needs
Identify Reference and Master Data Sources and Contributors
Define and Maintain the Data Integration Architecture
Implement Reference and Master Data Management Solutions
Define and Maintain Match Rules
Establish Golden Records
Define and Maintain Hierarchies and Affiliations
Party Master Data
Vocabulary Management and Reference Data
Financial Master Data
Defining Golden Master Data Values
Plan and Implement Integration of New Data Sources
Replicate and Distribute Reference and Master Data
Manage Changes to Reference and Master Data
Product Master Data
Location Master Data
March 8, 2010
216
Reference and Master Data Management Principles • • •
•
• •
Shared reference and master data belongs to the organisation, not to a particular application or department Reference and master data management is an on-going data quality improvement program; its goals cannot be achieved by one project alone Business data stewards are the authorities accountable for controlling reference data values. Business data stewards work with data professionals to improve the quality of reference and master data Golden data values represent the organisation’s best efforts at determining the most accurate, current, and relevant data values for contextual use. New data may prove earlier assumptions to be false. Therefore, apply matching rules with caution, and ensure that any changes that are made are reversible Replicate master data values only from the database of record Request, communicate, and, in some cases, approve of changes to reference data values before implementation
March 8, 2010
217
Reference Data •
Reference data is data used to classify or categorise other data
•
Business rules usually dictate that reference data values conform to one of several allowed values
•
In all organisations, reference data exists in virtually every database
•
Reference tables link via foreign keys into other relational database tables, and the referential integrity functions within the database management system ensure only valid values from the reference tables are used in other tables March 8, 2010
218
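A minimal sketch of the reference-table pattern described above: a controlled list of country codes enforced by a foreign key so that other tables can only use valid values. SQLite is used only to keep the example self-contained, and the table and column names are invented.

# Hedged sketch: a reference table of allowed country codes, with referential integrity.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite requires this to be switched on
conn.executescript("""
    CREATE TABLE ref_country (
        country_code TEXT PRIMARY KEY,     -- the controlled domain value
        description  TEXT NOT NULL
    );
    INSERT INTO ref_country VALUES ('IE', 'Ireland'), ('GB', 'United Kingdom');

    CREATE TABLE customer (
        customer_id  INTEGER PRIMARY KEY,
        name         TEXT,
        country_code TEXT REFERENCES ref_country(country_code)
    );
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme Ltd', 'IE')")    # valid reference value
try:
    conn.execute("INSERT INTO customer VALUES (2, 'Bad Co', 'ZZ')")  # not in the reference table
except sqlite3.IntegrityError as err:
    print("Rejected by referential integrity:", err)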
Master Data •
Master data is data about the business entities that provide context for business transactions
•
Master data is the authoritative, most accurate data available about key business entities, used to establish the context for transactional data
•
Master data values are considered golden
•
Master Data Management is the process of defining and maintaining how master data will be created, integrated, maintained, and used throughout the enterprise
March 8, 2010
219
Master Data Challenges • • • • • •
•
• • •
What are the important roles, organisations, places, and things referenced repeatedly? What data is describing the same person, organisation, place, or thing? Where is this data stored? What is the source for the data? Which data is more accurate? Which data source is more reliable and credible? Which data is most current? What data is relevant for specific needs? How do these needs overlap or conflict? What data from multiple sources can be integrated to create a more complete view and provide a more comprehensive understanding of the person, organisation, place or thing? What business rules can be established to automate master data quality improvement by accurately matching and merging data about the same person, organisation, place, or thing? How do we identify and restore data that was inappropriately matched and merged? How do we provide our golden data values to other systems across the enterprise? How do we identify where and when data other than the golden values is used? March 8, 2010
220
Party Master Data •
Includes data about individuals, organisations, and the roles they play in business relationships
•
Customer relationship management (CRM) systems perform MDM for customer data (also called Customer Data Integration (CDI))
•
Focus is to provide the most complete and accurate information about each and every customer
•
Need to identify duplicate, redundant and conflicting data
•
Party master data issues − − − −
Complexity of roles and relationships played by individuals and organisations Difficulties in unique identification High number of data sources Business importance and potential impact of the data
March 8, 2010
221
Financial Master Data •
Includes data about business units, cost centers, profit centers, general ledger accounts, budgets, projections, and projects
•
Financial MDM solutions focus on not only creating, maintaining, and sharing information, but also simulating how changes to existing financial data may affect the organisation’s bottom line
March 8, 2010
222
Product Master Data •
Product master data can consist of information on an organisation's products and services or on the entire industry in which the organisation operates, including competitor products and services
•
Product Lifecycle Management (PLM) focuses on managing the lifecycle of a product or service from its conception (such as research), through its development, manufacturing, sale / delivery, service, and disposal
March 8, 2010
223
Location Master Data •
Provides the ability to track and share reference information about different geographies, and create hierarchical relationships or territories based on geographic information to support other processes
•
Different industries require specialised earth science data (geographic data about seismic faults, flood plains, soil, annual rainfall, and severe weather risk areas) and related sociological data (population, ethnicity, income, and terrorism risk), usually supplied from external sources
March 8, 2010
224
Understand Reference and Master Data Integration Needs Reference and master data requirements are relatively easy to discover and understand for a single application • Potentially much more difficult to develop an understanding of these needs across applications, especially across the entire organisation • Analysing the root causes of a data quality problem usually uncovers requirements for reference and master data integration • Organisations that have successfully managed reference and master data typically have focused on one subject area at a time •
− Analyse all occurrences of a few business entities, across all physical databases and for differing usage patterns March 8, 2010
225
Identify Reference and Master Data Sources and Contributors •
Successful organisations first understand the needs for reference and master data
•
Then trace the lineage of this data to identify the original and interim source databases, files, applications, organisations and the individual roles that create and maintain the data
•
Understand both the upstream sources and the downstream needs to capture quality data at its source
March 8, 2010
226
Define and Maintain the Data Integration Architecture •
•
•
Effective data integration architecture controls the shared access, replication, and flow of data to ensure data quality and consistency, particularly for reference and master data Without data integration architecture, local reference and master data management occurs in application silos, inevitably resulting in redundant and inconsistent data The selected data integration architecture should also provide common data integration services − − − − − − −
• •
Change request processing, including review and approval Data quality checks on externally acquired reference and master data Consistent application of data quality rules and matching rules Consistent patterns of processing Consistent metadata about mappings, transformations, programs and jobs Consistent audit, error resolution and performance monitoring data Consistent approach to replicating data
Establishing master data standards can be a time consuming task as it may involve multiple stakeholders. Apply the same data standards, regardless of integration technology, to enable effective standardisation, sharing, and distribution of reference and master data March 8, 2010
227
Data Integration Services Architecture (diagram) – components include: data acquisition, file management and audit; data quality management (data standardisation, cleansing and matching); replication management; source data; rules; staging; reconciled master data; archives; errors; subscriptions; and metadata management (business metadata, integration metadata, job flow and statistics)
March 8, 2010
228
Implement Reference and Master Data Management Solutions •
Reference and master data management solutions are complex
•
Given the variety, complexity, and instability of requirements, no single solution or implementation project is likely to meet all reference and master data management needs
•
Organisations should expect to implement reference and master data management solutions iteratively and incrementally through several related projects and phases
March 8, 2010
229
Define and Maintain Match Rules •
Matching, merging, and linking of data from multiple systems about the same person, group, place, or thing is a major master data management challenge
•
Matching attempts to remove redundancy, to improve data quality, and provide information that is more comprehensive
•
Data matching is performed by applying inference rules − Duplicate identification match rules focus on a specific set of fields that uniquely identify an entity and identify merge opportunities without taking automatic action − Match-merge rules match records and merge the data from these records into a single, unified, reconciled, and comprehensive record. − Match-link rules identify and cross-reference records that appear to relate to a master record without updating the content of the cross-referenced record
March 8, 2010
230
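A minimal sketch of a duplicate-identification match rule of the kind described above: records that agree on a normalised name and date of birth are flagged as merge candidates. The normalisation and the choice of matching fields are invented for illustration.

# Hedged sketch of a simple duplicate-identification match rule.
from collections import defaultdict

records = [
    {"id": "CRM-1",  "name": "Sean Murphy",  "dob": "1975-02-14"},
    {"id": "ERP-77", "name": "sean  murphy", "dob": "1975-02-14"},
    {"id": "CRM-2",  "name": "Anna Lee",     "dob": "1981-07-30"},
]

def match_key(rec):
    # Normalise the fields the match rule compares on
    name = " ".join(rec["name"].lower().split())
    return (name, rec["dob"])

candidates = defaultdict(list)
for rec in records:
    candidates[match_key(rec)].append(rec["id"])

# Groups with more than one record are merge candidates; a match-link rule would
# cross-reference them, while a match-merge rule would consolidate them into one record
for key, ids in candidates.items():
    if len(ids) > 1:
        print("Possible duplicates:", ids, "matched on", key)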
Establish Golden Records •
Establishing golden master data values requires more inference, application of matching rules, and review of the results
March 8, 2010
231
Vocabulary Management and Reference Data A vocabulary is a collection of terms / concepts and their relationships • Vocabulary management is defining, sourcing, importing, and maintaining a vocabulary and its associated reference data •
− See ANSI/NISO Z39.19 - Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies http://www.niso.org/kst/reports/standards?step=2&gid=&project_key=7cc9b583cb5a62e8c15d3099e0bb46bbae9cf38a
Vocabulary management requires the identification of the standard list of preferred terms and their synonyms • Vocabulary management requires data governance, enabling data stewards to assess stakeholder needs •
March 8, 2010
232
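A minimal sketch of a controlled vocabulary held as data: each preferred term has a set of synonyms, and incoming values are resolved to the preferred term. The terms themselves are invented for the example.

# Hedged sketch of a controlled vocabulary with preferred terms and synonyms.
vocabulary = {
    # preferred term -> synonyms and variant spellings
    "Motor Vehicle": {"car", "automobile", "auto"},
    "Bicycle": {"bike", "push bike", "cycle"},
}

# Build a reverse lookup from every known term to its preferred term
lookup = {}
for preferred, synonyms in vocabulary.items():
    lookup[preferred.lower()] = preferred
    for synonym in synonyms:
        lookup[synonym.lower()] = preferred

def preferred_term(value):
    # Resolve an incoming value to the preferred term, or None if it is unknown
    return lookup.get(value.strip().lower())

print(preferred_term("Automobile"))  # Motor Vehicle
print(preferred_term("push bike"))   # Bicycle
print(preferred_term("tricycle"))    # None - a candidate for data steward review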
Vocabulary Management and Reference Data •
Key questions to ask to enable vocabulary management − What information concepts (data attributes) will this vocabulary support? − Who is the audience for this vocabulary? What processes do they support, and what roles do they play? − Why is the vocabulary needed? Will it support applications, content management, analytics, and so on? − Who identifies and approves the preferred vocabulary and vocabulary terms? − What are the current vocabularies different groups use to classify this information? Where are they located? How were they created? Who are their subject matter experts? Are there any security or privacy concerns for any of them? − Are there existing standards that can be leveraged to fulfill this need? Are there concerns about using an external standard vs. internal? How frequently is the standard updated and what is the degree of change of each update? Are standards accessible in an easy to import / maintain format in a cost efficient manner? March 8, 2010
233
Defining Golden Master Data Values •
•
•
• •
Golden data values are the data values thought to be the most accurate, current, and relevant for shared, consistent use across applications Determine golden values by analysing data quality, applying data quality rules and matching rules, and incorporating data quality controls into the applications that acquire, create, and update data Establish data quality measurements to set expectations, measure improvements, and help identify root causes of data quality problems Assess data quality through a combination of data profiling activities and verification of adherence to business rules Once the data is standardised and cleansed, the next step is to attempt reconciliation of redundant data through application of matching rules March 8, 2010
234
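One possible survivorship rule for establishing golden values, sketched in Python: for each attribute, keep the most recently updated non-empty value across the source records. The source systems, fields and the rule itself are invented for illustration, not prescribed by the DMBOK.

# Hedged sketch of a survivorship rule for selecting golden attribute values.
source_records = [
    {"source": "CRM", "updated": "2010-01-15", "email": "s.murphy@example.com", "phone": ""},
    {"source": "ERP", "updated": "2010-02-20", "email": "", "phone": "+353 1 555 0100"},
    {"source": "Web", "updated": "2009-11-02", "email": "sean@example.com", "phone": ""},
]

def golden_record(records, attributes):
    # Sort newest first, then take the first non-empty value for each attribute
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    golden = {}
    for attr in attributes:
        golden[attr] = next((r[attr] for r in newest_first if r[attr]), None)
    return golden

print(golden_record(source_records, ["email", "phone"]))
# {'email': 's.murphy@example.com', 'phone': '+353 1 555 0100'}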
Define and Maintain Hierarchies and Affiliations •
Vocabularies and their associated reference data sets are often more than lists of preferred terms and their synonyms
•
Affiliation management is the establishment and maintenance of relationships between master data records
March 8, 2010
235
Plan and Implement Integration of New Data Sources •
Integrating new reference data sources involves − Receiving and responding to new data acquisition requests from different groups − Performing data quality assessment services using data cleansing and data profiling tools − Assessing data integration complexity and cost − Piloting the acquisition of data and its impact on match rules − Determining who will be responsible for data quality − Finalising data quality metrics
March 8, 2010
236
Replicate and Distribute Reference and Master Data •
Reference and master data may be read directly from a database of record, or may be replicated from the database of record to other application databases for transaction processing, and data warehouses for business intelligence
•
Reference data most commonly appears as pick list values in applications
•
Replication aids maintenance of referential integrity
March 8, 2010
237
Manage Changes to Reference and Master Data •
Specific individuals have the role of a business data steward with the authority to create, update, and retire reference data
•
Formally control changes to controlled vocabularies and their reference data sets
•
Carefully assess the impact of reference data changes
March 8, 2010
238
Data Warehousing and Business Intelligence Management
March 8, 2010
239
Data Warehousing and Business Intelligence Management •
A Data Warehouse is a combination of two primary components − An integrated decision support database − Related software programs used to collect, cleanse, transform, and store data from a variety of operational and external sources
Both components combine to support historical, analytical, and business intelligence (BI) requirements • A Data Warehouse may also include dependent data marts, which are subset copies of a data warehouse database • A Data Warehouse includes any data stores or extracts used to support the delivery of data for BI purposes •
March 8, 2010
240
Data Warehousing and Business Intelligence Management •
Data Warehousing means the operational extract, cleansing, transformation, and load processes and associated control processes that maintain the data contained within a data warehouse
•
Data Warehousing process focuses on enabling an integrated and historical business context on operational data by enforcing business rules and maintaining appropriate business data relationships and processes that interact with metadata repositories
•
Business Intelligence is a set of business capabilities including − Query, analysis, and reporting activity by knowledge workers to monitor and understand the financial and operational health of, and make business decisions about, the enterprise − Strategic and operational analytics and reporting on corporate operational data to support business decisions, risk management, and compliance
March 8, 2010
241
Data Warehousing and Business Intelligence Management •
Together Data Warehousing and Business Intelligence Management is the collection, integration, and presentation of data to knowledge workers for the purpose of business analysis and decision-making
•
Composed of activities supporting all phases of the decision support life cycle that provides context − Moves and transforms data from sources to a common target data store − Provides knowledge workers various means of access, manipulation and reporting of the integrated target data
March 8, 2010
242
Data Warehousing and Business Intelligence Management – Definition and Goals •
Definition − Planning, implementation, and control processes to provide decision support data and support knowledge workers engaged in reporting, query and analysis
•
Goals − To support and enable effective business analysis and decision making by knowledge workers − To build and maintain the environment / infrastructure to support business intelligence activity, specifically leveraging all the other data management functions to cost effectively deliver consistent integrated data for all BI activity
March 8, 2010
243
Data Warehousing and Business Intelligence Management - Overview
Inputs: •Business Drivers •BI Data and Access Requirements •Data Quality Requirements •Data Security Requirements •Data Architecture •Technical Architecture •Data Modeling Standards and Guidelines •Transactional Data •Master and Reference Data •Industry and External Data
Suppliers: •Executives and Managers •Subject Matter Experts •Data Governance Council •Information Consumers (Internal and External) •Data Producers •Data Architects and Analysts
Participants: •Business Executives and Managers •DM Execs and Other IT Management •BI Program Manager •SMEs and Other Information Consumers •Data Stewards •Project Managers •Data Architects and Analysts •Data Integration (ETL) Specialists •BI Specialists •Database Administrators •Data Security Administrators •Data Quality Analysts
Tools: •Database Management Systems •Data Profiling Tools •Data Integration Tools •Data Cleansing Tools •Business Intelligence Tools •Analytic Applications •Data Modeling Tools •Performance Management Tools •Metadata Repository •Data Quality Tools •Data Security Tools
Primary Deliverables: •DW/BI Architecture •Data Warehouses •Data Marts and OLAP Cubes •Dashboards and Scorecards •Analytic Applications •File Extracts (for Data Mining/Statistical Tools) •BI Tools and User Environments •Data Quality Feedback Mechanism/Loop
Consumers: •Knowledge Workers •Managers and Executives •External Customers and Systems •Internal Customers and Systems •Data Professionals •Other IT Professionals
Metrics: •Usage Metrics •Customer/User Satisfaction •Subject Area Coverage % •Response/Performance Metrics
March 8, 2010
244
Data Warehousing and Business Intelligence Management Objectives • • • • • •
• •
•
Providing integrated storage of required current and historical data, organised by subject areas Ensuring credible, quality data for all appropriate access capabilities Ensuring a stable, high-performance, reliable environment for data acquisition, data management, and data access Providing an easy-to-use, flexible, and comprehensive data access environment Delivering both content and access to the content in increments appropriate to the organisation’s objectives Leveraging, rather than duplicating, relevant data management component functions such as Reference and Master Data Management, Data Governance, Data Quality, and Metadata Providing an enterprise focal point for data delivery in support of the decisions, policies, procedures, definitions, and standards that arise from DG Defining, building, and supporting all data stores, data processes, data infrastructure, and data tools that contain integrated, post-transactional, and refined data used for information viewing, analysis, or data request fulfillment Integrating newly discovered data as a result of BI processes into the DW for further analytics and BI use. March 8, 2010
245
Data Warehousing and Business Intelligence Management Function, Activities and Sub-Activities Data Warehousing and Business Intelligence Management Understand Business Intelligence Information Needs
Define and Maintain the DW-BI Architecture
Implement Data Warehouses and Data Marts
Implement Business Intelligence Tools and User Interfaces
Process Data for Business Intelligence
Monitor and Tune Data Warehousing Processes
Query and Reporting Tools
Staging Areas
On Line Analytical Processing (OLAP) Tools
Mapping Sources and Targets
Analytic Applications
Data Cleansing and Transformations (Data Acquisition)
Monitor and Tune BI Activity and Performance
Implementing Management Dashboards and Scorecards Performance Management Tools
Predictive Analytics and Data Mining Tools
Advanced Visualisation and Discovery Tools March 8, 2010
246
Data Warehousing and Business Intelligence Management Principles • • • • • •
• •
• •
•
Obtain executive commitment and support as these projects are labour intensive Secure business SMEs as their support and high availability are necessary for getting the correct data and useful BI solution Be business focused and driven. Make sure DW / BI work is serving real priority business needs and solving burning business problems. Let the business drive the prioritisation Demonstrable data quality is essential Provide incremental value. Ideally deliver in continual 2-3 month segments Transparency and self service. The more context (metadata of all kinds) provided, the more value customers derive. Wisely exposing information about the process reduces calls and increases satisfaction. One size does not fit all. Make sure you find the right tools and products for each of your customer segments Think and architect globally, act and build locally. Let the big-picture and end- vision guide the architecture, but build and deliver incrementally, with much shorter term and more project-based focus Collaborate with and integrate all other data initiatives, especially those for data governance, data quality, and metadata Start with the end in mind. Let the business priority and scope of end-data- delivery in the BI space drive the creation of the DW content. The main purpose for the existence of the DW is to serve up data to the end business customers via the BI capabilities Summarise and optimise last, not first. Build on the atomic data and add aggregates or summaries as needed for performance, but not to replace the detail. March 8, 2010
247
Understand Business Intelligence Information Needs • • • • • • • • •
All projects start with requirements Gathering requirements for DW-BIM projects has both similarities to and differences from gathering requirements for other projects For DW-BIM projects, it is important to understand the broader business context of the business area targeted as reporting is generalised and exploratory Capturing the actual business vocabulary and terminology is a key to success Document the business context, then explore the details of the actual source data Typically, the ETL portion can consume 60%-70% of a DW-BIM project’s budget and time The DW is often the first place where the pain of poor quality data in source systems and / or data entry functions becomes apparent Creating an executive summary of the identified business intelligence needs is a best practice When starting a DW-BIM programme, a good way to decide where to start is using a simple assessment of business impact and technical feasibility − Technical feasibility will take into consideration things like complexity, availability and state of the data, and the availability of subject matter experts − Projects that have high business impact and high technical feasibility are good candidates for starting.
March 8, 2010
248
Define and Maintain the DW-BI Architecture •
Successful DW-BIM architecture requires the identification and bringing together of a number of key roles − Technical Architect - hardware, operating systems, databases and DW-BIM architecture − Data Architect - data analysis, systems of record, data modeling and data mapping − ETL Architect / Design Lead - staging and transform, data marts, and schedules − Metadata Specialist - metadata interfaces, metadata architecture and contents − BI Application Architect / Design Lead - BI tool interfaces and report design, metadata delivery, data and report navigation and delivery
• • •
Technical requirements including performance, availability, and timing needs are key drivers in developing the DW-BIM architecture The design decisions and principles for what data detail the DW contains are a key design priority for DW-BIM architecture It is important that the DW-BIM architecture integrates with the overall corporate reporting architecture March 8, 2010
249
Define and Maintain the DW-BI Architecture •
No DW-BIM effort can be successful without business acceptance of data
•
Business acceptance includes the data being understandable, having verifiable quality and having a demonstrable origin
•
Sign-off by the Business on the data should be part of the User Acceptance Testing
•
Structured random testing of the data in the BIM tool against data in the source systems over the initial load and a few update load cycles should be performed to meet sign-off criteria
•
Meeting these requirements is paramount for every DW-BIM architecture
March 8, 2010
250
Implement Data Warehouses and Data Marts •
The purpose of a data warehouse is to integrate data from multiple sources and then serve up that integrated data for BI purposes
•
Consumption is typically through data marts or other systems
•
A single data warehouse will integrate data from multiple source systems and serve data to multiple data marts
•
Purpose of data marts is to provide data for analysis to knowledge workers
•
Start with the end in mind - identify the business problem to solve, then identify the details and what would be used and continue to work back into the integrated data required and ultimately all the way back to the data sources.
March 8, 2010
251
Implement Business Intelligence Tools and User Interfaces •
Well defined set of well-proven BI tools
•
Implementing the right BI tool or User Interface (UI) is about identifying the right tools for the right user set
•
Almost all BI tools also come with their own metadata repositories to manage their internal data maps and statistics
March 8, 2010
252
Query and Reporting Tools •
Query and reporting is the process of querying a data source and then formatting it to create a report
•
With business query and reporting the data source is more often a data warehouse or data mart
•
While IT develops production reports, power users and casual business users develop their own reports with business query tools
•
Business query and reporting tools enable users who want to author their own reports or create outputs for use by others March 8, 2010
253
Query and Reporting Tools Landscape (diagram) – user groups: customers, suppliers and regulators; frontline workers; executives and managers; analysts and information workers; IT developers. Tool categories, ranging from commonly used to specialist: embedded BI, published reports, scorecards, dashboards, OLAP, business query, BI spreadsheets, interactive fixed reports, statistics tools, production reporting tools
March 8, 2010
254
On Line Analytical Processing (OLAP) Tools • •
•
OLAP provides interactive, multi-dimensional analysis with different dimensions and different levels of detail The value of OLAP tools and cubes is reduction of the chance of confusion and erroneous interpretation by aligning the data content with the analyst's mental model Common OLAP operations include slice and dice, drill down, drill up, roll up, and pivot − Slice - a slice is a subset of a multi-dimensional array corresponding to a single value for one or more members of the dimensions not in the subset − Dice - the dice operation is a slice on more than two dimensions of a data cube, or more than two consecutive slices − Drill Down / Up - drilling down or up is a specific analytical technique whereby the user navigates among levels of data, ranging from the most summarised (up) to the most detailed (down) − Roll-Up – a roll-up involves computing all of the data relationships for one or more dimensions. To do this, define a computational relationship or formula − Pivot - to change the dimensional orientation of a report or page display
March 8, 2010
255
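The common OLAP operations listed above can be illustrated on a tiny cube of sales facts, assuming the pandas library is available; the data and dimension names are invented for the example.

# Hedged sketch of OLAP-style roll up, slice and pivot on invented sales facts.
import pandas as pd

facts = pd.DataFrame({
    "year":    [2009, 2009, 2009, 2010, 2010, 2010],
    "region":  ["EU", "EU", "US", "EU", "US", "US"],
    "product": ["A", "B", "A", "A", "A", "B"],
    "sales":   [100, 150, 200, 120, 210, 90],
})

# Roll up: aggregate sales along the product dimension, by year and region
rollup = facts.groupby(["year", "region"])["sales"].sum()

# Slice: fix one dimension member (year = 2010)
slice_2010 = facts[facts["year"] == 2010]

# Pivot: change the dimensional orientation so regions become columns
pivot = facts.pivot_table(values="sales", index="year", columns="region", aggfunc="sum")

print(rollup, slice_2010, pivot, sep="\n\n")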
Analytic Applications •
Analytic applications include the logic and processes to extract data from well-known source systems, such as vendor ERP systems, a data model for the data mart, and pre-built reports and dashboards
•
Analytic applications provide businesses with a pre-built solution to optimise a functional area or industry vertical
•
Different types of analytic applications include customer, financial, supply chain, manufacturing, and human resource applications
March 8, 2010
256
Implementing Management Dashboards and Scorecards •
Dashboards and scorecards are both ways of efficiently presenting performance information
•
Dashboards are oriented more toward dynamic presentation of operational information while scorecards are more static representations of longer-term organisational, tactical, or strategic goals
•
Typically, scorecards are divided into 4 quadrants or views of the organisation such as Finance, Customer, Environment, and Employees, each with a number of metrics
March 8, 2010
257
Performance Management Tools •
Performance management applications include budgeting, planning, and financial consolidation
March 8, 2010
258
Predictive Analytics and Data Mining Tools •
Data mining is a particular kind of analysis that reveals patterns in data using various algorithms
•
A data mining tool will help users discover relationships or show patterns in more exploratory fashion
March 8, 2010
259
Advanced Visualisation and Discovery Tools •
Advanced visualisation and discovery tools allow users to interact with the data in a highly visual, interactive way
•
Patterns in a large dataset can be difficult to recognise in a numbers display
•
A pattern can be picked up visually fairly quickly when thousands of data points are loaded into a sophisticated display on a single page
March 8, 2010
260
Process Data for Business Intelligence •
Most of the work in any DW-BIM effort involves the preparation and processing of the data
March 8, 2010
261
Staging Areas •
A staging area is the intermediate data store between an original data source and the centralised data repository
•
All required cleansing, transformation, reconciliation, and relationships happen in this area
March 8, 2010
262
Mapping Sources and Targets •
Source-to-target mapping is the documentation activity that defines data type details and transformation rules for all required entities and data elements, from each individual source to each individual target
•
DW-BIM adds requirements to the source-to-target mapping process beyond those encountered in a typical data migration
•
One of the goals of the DW-BIM effort should be to provide a complete lineage for each data element available in the BI environment all the way back to its respective source(s)
•
A solid taxonomy is necessary to match the data elements in different systems into a consistent structure in the EDW
March 8, 2010
263
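A minimal sketch of a source-to-target mapping held as data, so that each target element records its source (for lineage) and its transformation rule; the system, column and rule names are invented for illustration.

# Hedged sketch of a source-to-target mapping with lineage and transformation rules.
mapping = [
    # target element, source system.column (lineage), transformation rule
    {"target": "customer_name",  "source": "crm.cust_nm", "rule": lambda v: v.strip().title()},
    {"target": "country_code",   "source": "crm.ctry",    "rule": lambda v: v.strip().upper()[:2]},
    {"target": "annual_revenue", "source": "erp.rev_amt", "rule": lambda v: round(float(v), 2)},
]

source_row = {"crm.cust_nm": "  acme industries ", "crm.ctry": "ie ", "erp.rev_amt": "125000.456"}

def apply_mapping(row):
    # Produce the target record and retain the lineage of each target element
    target, lineage = {}, {}
    for m in mapping:
        target[m["target"]] = m["rule"](row[m["source"]])
        lineage[m["target"]] = m["source"]
    return target, lineage

target_record, lineage = apply_mapping(source_row)
print(target_record)  # {'customer_name': 'Acme Industries', 'country_code': 'IE', 'annual_revenue': 125000.46}
print(lineage)        # each target element traced back to its source column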
Data Cleansing and Transformations (Data Acquisition) •
• • • • •
Data cleansing focuses on the activities that correct and enhance the domain values of individual data elements, including enforcement of standards Cleansing is particularly necessary for initial loads where significant history is involved The preferred strategy is to push data cleansing and correction activity back to the source systems whenever possible Data transformation focuses on activities that provide organisational context between data elements, entities, and subject areas Organisational context includes cross-referencing, reference and master data management and complete and correct relationships Data transformation is an essential component of being able to integrate data from multiple sources March 8, 2010
264
Monitor and Tune Data Warehousing Processes •
Processing should be monitored across the system for bottlenecks and dependencies among processes
•
Database tuning techniques should be employed where and when needed, including partitioning, tuned backup and recovery strategies
•
Archiving is a difficult subject in data warehousing
•
Users often consider the data warehouse as an active archive due to the long histories that are built, and are unwilling, particularly if the OLTP sources have dropped records, to see the data warehouse engage in archiving March 8, 2010
265
Monitor and Tune BI Activity and Performance •
A best practice for BI monitoring and tuning is to define and display a set of customer-facing satisfaction metrics
•
Average query response time and the number of users per day / week / month, are examples of useful metrics to display
•
Regular review of usage statistics and patterns is essential
•
Reports providing frequency and resource usage of data, queries, and reports allow prudent enhancement
•
Tuning BI activity is analogous to the principle of profiling applications in order to know where the bottlenecks are and where to apply optimisation efforts March 8, 2010
266
Document and Content Management
March 8, 2010
267
Document and Content Management •
•
•
Document and Content Management is the control over capture, storage, access, and use of data and information stored outside relational databases Strategic and tactical focus overlaps with other data management functions in addressing the need for data governance, architecture, security, managed metadata, and data quality for unstructured data Document and Content Management includes two sub-functions: − Document management is the storage, inventory, and control of electronic and paper documents. Document management encompasses the processes, techniques, and technologies for controlling and organising documents and records, whether stored electronically or on paper − Content management refers to the processes, techniques, and technologies for organising, categorising, and structuring access to information content, resulting in effective retrieval and reuse. Content management is particularly important in developing websites and portals, but the techniques of indexing based on keywords, and organising based on taxonomies, can be applied across technology platforms. March 8, 2010
268
Document and Content Management – Definition and Goals •
Definition − Planning, implementation, and control activities to store, protect, and access data found within electronic files and physical records (including text, graphics, images, audio, and video)
•
Goals − To safeguard and ensure the availability of data assets stored in less structured formats − To enable effective and efficient retrieval and use of data and information in unstructured formats − To comply with legal obligations and customer expectations − To ensure business continuity through retention, recovery, and conversion − To control document storage operating costs March 8, 2010
269
Document and Content Management - Overview
Inputs: •Text Documents •Reports •Spreadsheets •Email •Instant Messages •Faxes •Voicemail •Images •Video Recordings •Audio Recordings •Printed Paper Files •Microfiche/Microfilm •Graphics
Suppliers: •Employees •External Parties
Participants: •All Employees •Data Stewards •DM Professionals •Records Management Staff •Other IT Professionals •Data Management Executive •Other IT Managers •Chief Information Officer •Chief Knowledge Officer
Tools: •Stored Documents •Office Productivity Tools •Image and Workflow Management Tools •Records Management Tools •XML Development Tools •Collaboration Tools •Internet •Email Systems
Primary Deliverables: •Managed Records in Many Media Formats •E-discovery Records •Outgoing Letters and Emails •Contracts and Financial Documents •Policies and Procedures •Audit Trails and Logs •Meeting Minutes •Formal Reports •Significant Memoranda
Consumers: •Business and IT Users •Government Regulatory Agencies •Senior Management •External Customers
Metrics: •Return on Investment •Key Performance Indicators •Balanced Scorecards
March 8, 2010
270
Document and Content Management Function, Activities and Sub-Activities Document and Content Management
Document / Record Management
Content Management
Plan for Managing Documents / Records
Define and Maintain Enterprise Taxonomies (Information Content Architecture)
Implement Document / Record Management Systems for Acquisition, Storage, Access, and Security Controls
Document / Index Information Content Metadata
Backup and Recover Documents / Records
Provide Content Access and Retrieval
Retention and Disposition of Documents / Records
Govern for Quality Content
Audit Document / Records Management March 8, 2010
271
Document and Content Management - Principles Everyone in an organisation has a role to play in protecting its future. Everyone must create, use, retrieve, and dispose of records in accordance with the established policies and procedures • Experts in the handling of records and content should be fully engaged in policy and planning. Regulatory requirements and best practices can vary significantly based on industry sector and legal jurisdiction • Even if records management professionals are not available to the organisation, everyone can be trained and have an understanding of the issues. Once trained, business stewards and others can collaborate on an effective approach to records management •
March 8, 2010
272
Document and Content Management
• A document management system is an application used to track and store electronic documents and electronic images of paper documents
• Document management systems commonly provide storage, versioning, security, metadata management, content indexing, and retrieval capabilities
• A content management system is used to collect, organise, index, and retrieve information content, storing the content either as components or whole documents, while maintaining links between components
• While a document management system may provide content management functionality over the documents under its control, a content management system is essentially independent of where and how the documents are stored
March 8, 2010
273
Document / Record Management •
Document / Record Management is the lifecycle management of the designated significant documents of the organisation
• Records can be:
− Physical, such as documents, memos, contracts, reports or microfiche
− Electronic, such as email content, attachments, and instant messaging
− Content on a website
− Documents on all types of media and hardware
− Data captured in databases of all kinds
•
More than 90% of the records created today are electronic
•
Growth in email and instant messaging has made the management of electronic records critical to an organisation
March 8, 2010
274
Document / Record Management •
The lifecycle of Document / Record Management includes: − Identification of existing and newly created documents / records − Creation, Approval, and Enforcement of documents / records policies − Classification of documents / records − Documents / Records Retention Policy − Storage: Short and long term storage of physical and electronic documents / records − Retrieval and Circulation: Allowing access and circulation of documents / records in accordance with policies, security and control standards, and legal requirements − Preservation and Disposal: Archiving and destroying documents / records according to organisational needs, statutes, and regulations March 8, 2010
275
Plan for Managing Documents / Records
• Plan the document lifecycle from creation or receipt, organisation for retrieval, distribution and archiving or disposition
• Develop classification / indexing systems and taxonomies so that the retrieval of documents is easy
• Create planning and policy around documents and records based on the value of the data to the organisation and as evidence of business transactions
• Identify the organisational unit responsible and accountable for managing the documents / records
• Develop and execute a retention plan and policy to archive selected records for long-term preservation
• Records are destroyed at the end of their lifecycle according to operational needs, procedures, statutes and regulations
March 8, 2010
276
Implement Document / Record Management Systems for Acquisition, Storage, Access, and Security Controls •
Documents can be created within a document management system or captured via scanners or OCR software
•
Electronic documents must be indexed via keywords or text during the capture process so that the document can be found
•
A document repository enables check-in and check-out features, versioning, collaboration, comparison, archiving, status state(s), migration from one storage media to another and disposition
•
Document management can support different types of workflows − Manual workflows that indicate where the user sends the document − Rules-based workflow, where rules are created that dictate the flow of the document within an organisation − Dynamic rules that allow for different workflows based on content
March 8, 2010
277
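The workflow types above can be made concrete with a small sketch. The following illustrative Python fragment is not part of DMBOK; the document types, the amount threshold and the routing steps are invented for the example. It shows how a rules-based workflow, including a content-driven ("dynamic") rule, might decide where a captured document goes next:

from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str
    doc_type: str      # e.g. "invoice", "contract", "memo"
    amount: float = 0.0

def route(doc: Document) -> list[str]:
    """Return the ordered list of workflow steps dictated by the rules."""
    steps = []
    if doc.doc_type == "contract":
        steps += ["legal_review", "records_management"]
    elif doc.doc_type == "invoice":
        steps.append("accounts_payable")
        if doc.amount > 10000:            # dynamic rule based on document content
            steps.append("finance_director")
    else:
        steps.append("line_manager")      # default / manual workflow destination
    steps.append("archive")               # every record ends in the repository
    return steps

print(route(Document("D-001", "invoice", amount=25000)))
# ['accounts_payable', 'finance_director', 'archive']

A manual workflow would simply replace the rule evaluation with a destination chosen by the user.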
Backup and Recover Documents / Records •
The document / record management system needs to be included as part of the overall corporate backup and recovery activities for all data and information
•
Document / records managers should be involved in risk mitigation and management, and business continuity, especially regarding security for vital records
•
A vital records program provides the organisation with access to the records necessary to conduct its business during a disaster and to resume normal business afterward
March 8, 2010
278
Retention and Disposition of Documents / Records •
Defines the period of time during which documents / records of operational, legal, financial or historical value must be maintained
•
Specifies the processes for compliance, and the methods and schedules for the disposition of documents / records
•
Must deal with privacy and data protection issues
•
Legal and regulatory requirements must be considered when setting up document record retention schedules
March 8, 2010
279
Audit Document / Records Management •
Document / records management requires auditing on a periodic basis to ensure that the right information is getting to the right people at the right time for decision making or performing operational activities:
− Inventory - Each location in the inventory is uniquely identified
− Storage - Storage areas for physical documents / records have adequate space to accommodate growth
− Reliability and Accuracy - Spot checks are executed to confirm that the documents / records are an adequate reflection of what has been created or received
− Classification and Indexing Schemes - Metadata and document file plans are well described
− Access and Retrieval - End users find and retrieve critical information easily
− Retention Processes - Retention schedule is structured in a logical way
− Disposition Methods - Documents / records are disposed of as recommended
− Security and Confidentiality - Breaches of document / record confidentiality and loss of documents / records are recorded as security incidents and managed appropriately
− Organisational Understanding of Documents / Records Management - Appropriate training is provided to stakeholders and staff as to the roles and responsibilities related to document / records management
March 8, 2010
280
Content Management •
Organisation, categorisation, and structure of data / resources so that they can be stored, published, and reused in multiple ways
•
Includes data / information that exists in many forms and in multiple stages of completion within its lifecycle
•
Content management systems manage the content of a website or intranet through the creation, editing, storing, organising, and publishing of content
March 8, 2010
281
Define and Maintain Enterprise Taxonomies (Information Content Architecture) •
Process of creating a structure for a body of information or content
•
Contains a controlled vocabulary that can help with navigation and search systems
•
Content Architecture identifies the links and relationships between documents and content, specifies document requirements and attributes and defines the structure of content in a document or content management system
March 8, 2010
282
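As a rough illustration of an enterprise taxonomy and controlled vocabulary of the kind described above, the sketch below maps a free-text keyword onto a preferred term and its position in the taxonomy; the categories and synonyms are invented for the example:

# Illustrative sketch only: a toy taxonomy with a controlled vocabulary used to classify content.
taxonomy = {
    "Finance": {"Invoices": {}, "Contracts": {}},
    "Human Resources": {"Policies": {}, "Recruitment": {}},
}

synonyms = {"bill": "Invoices", "agreement": "Contracts"}   # controlled vocabulary

def classify(keyword: str) -> str:
    """Map a free-text keyword onto the preferred taxonomy term and its parent."""
    term = synonyms.get(keyword.lower(), keyword.title())
    for top, children in taxonomy.items():
        if term in children:
            return f"{top} > {term}"
    return "Unclassified"

print(classify("agreement"))   # Finance > Contracts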
Document / Index Information Content Metadata •
Development of metadata for unstructured data content
•
Maintenance of metadata for unstructured data becomes the maintenance of a cross-reference of various local schemes to the official set of organisation metadata
March 8, 2010
283
Provide Content Access and Retrieval •
Once the content has been described by metadata / key word tagging and classified within the appropriate Information Content Architecture, it is available for retrieval and use
•
Finding unstructured data can be eased through portal technology
March 8, 2010
284
Govern for Quality Content •
Managing unstructured data requires effective partnerships between data stewards, data professionals, and records managers
•
The focus of data governance can include document and record retention policies, electronic signature policies, reporting formats, and report distribution policies
•
High quality, accurate, and up-to-date information will aid in critical business decisions
•
Timeliness of the decision-making process with high quality information may increase competitive advantage and business effectiveness March 8, 2010
285
Metadata Management
March 8, 2010
286
Metadata Management
• Metadata is data about data
• Metadata Management is the set of processes that ensure proper creation, storage, integration, and control to support associated usage of metadata
• Leveraging metadata in an organisation can provide benefits:
− Increase the value of strategic information by providing context for the data, thus aiding analysts in making more effective decisions
− Reduce training costs and lower the impact of staff turnover through thorough documentation of data context, history, and origin
− Reduce data-oriented research time by assisting business analysts in finding the information they need in a timely manner
− Improve communication by bridging the gap between business users and IT professionals, leveraging work done by other teams, and increasing confidence in IT system data
− Increase speed of system development time-to-market by reducing system development life-cycle time
− Reduce risk of project failure through better impact analysis at various levels during change management
− Identify and reduce redundant data and processes, thereby reducing rework and use of redundant, out-of-date, or incorrect data
March 8, 2010
287
Metadata Management – Definition and Goals •
Definition − Planning, implementation, and control activities to enable easy access to high quality, integrated metadata
•
Goals − Provide organisational understanding of terms and usage − Integrate metadata from diverse sources − Provide easy, integrated access to metadata − Ensure metadata quality and security
March 8, 2010
288
Metadata
• Metadata is information about the physical data, technical and business processes, data rules and constraints, and logical and physical structures of the data, as used by an organisation
• Descriptive tags describe data, concepts and the connections between the data and concepts (see the illustrative sketch after this slide):
− Business Analytics: Data definitions, reports, users, usage, performance
− Business Architecture: Roles and organisations, goals and objectives
− Business Definitions: The business terms and explanations for a particular concept, fact, or other item found in an organisation
− Business Rules: Standard calculations and derivation methods
− Data Governance: Policies, standards, procedures, programs, roles, organisations, stewardship assignments
− Data Integration: Sources, targets, transformations, lineage, ETL workflows, EAI, EII, migration / conversion
− Data Quality: Defects, metrics, ratings
− Document Content Management: Unstructured data, documents, taxonomies, name sets, legal discovery, search engine indexes
− Information Technology Infrastructure: Platforms, networks, configurations, licenses
− Logical Data Models: Entities, attributes, relationships and rules, business names and definitions
− Physical Data Models: Files, tables, columns, views, business definitions, indexes, usage, performance, change management
− Process Models: Functions, activities, roles, inputs / outputs, workflow, business rules, timing, stores
− Systems Portfolio and IT Governance: Databases, applications, projects and programs, integration roadmap, change management
− Service-Oriented Architecture (SOA) Information: Components, services, messages, master data
− System Design and Development: Requirements, designs and test plans, impact
− Systems Management: Data security, licenses, configuration, reliability, service levels
March 8, 2010
289
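To make the descriptive-tag categories above more tangible, the following sketch shows one possible way to record business and technical metadata for a single data element. The field names and values are assumptions for illustration, not a prescribed DMBOK schema:

# Illustrative sketch only: business and technical metadata for one data element.
customer_lifetime_value = {
    "business_definition": "Projected net revenue from a customer over the relationship",
    "business_rules": ["CLV = sum of discounted future margins over 5 years"],
    "steward": "Customer Data Steward",
    "logical_model": {"entity": "Customer", "attribute": "LifetimeValue"},
    "physical_model": {"table": "DIM_CUSTOMER", "column": "CLV_AMT", "type": "DECIMAL(12,2)"},
    "lineage": ["CRM orders extract", "ETL job load_dim_customer"],
    "quality": {"last_profiled": "2010-02-28", "null_rate": 0.01},
}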
Metadata Management - Overview
Inputs: Metadata Requirements, Metadata Issues, Data Architecture, Business Metadata, Technical Metadata, Process Metadata, Operational Metadata, Data Stewardship Metadata
Suppliers: Data Stewards, Data Architects, Data Modelers, Database Administrators, Other Data Professionals, Data Brokers, Government and Industry Regulators
Participants: Metadata Specialists, Data Integration Architects, Data Stewards, Data Architects and Modelers, Database Administrators, Other DM Professionals, Other IT Professionals, DM Executive, Business Users
Tools: Metadata Repositories, Data Modeling Tools, Database Management Systems, Data Integration Tools, Business Intelligence Tools, System Management Tools, Object Modeling Tools, Process Modeling Tools, Report Generating Tools, Data Quality Tools, Data Development and Administration Tools, Reference and Master Data Management Tools
Primary Deliverables: Metadata Repositories, Quality Metadata, Metadata Models and Architecture, Metadata Management, Operational Analysis, Metadata Analysis, Data Lineage, Change Impact Analysis, Metadata Control Procedures
Consumers: Data Stewards, Data Professionals, Other IT Professionals, Knowledge Workers, Managers and Executives, Customers and Collaborators, Business Users
Metrics: Metadata Quality, Master Data Service Data Compliance, Metadata Repository Contribution, Metadata Documentation Quality, Steward Representation / Coverage, Metadata Usage / Reference, Metadata Management Maturity, Metadata Repository Availability
March 8, 2010
290
Metadata Management Function, Activities and Sub-Activities
• Understand Metadata Requirements: Business User Requirements, Technical User Requirements
• Define the Metadata Architecture: Centralised Metadata Architecture, Distributed Metadata Architecture, Hybrid Metadata Architecture
• Develop and Maintain Metadata Standards: Industry / Consensus Metadata Standards, International Metadata Standards, Standard Metadata Metrics
• Implement a Managed Metadata Environment
• Create and Maintain Metadata
• Integrate Metadata
• Manage Metadata Repositories: Metadata Repositories, Directories, Glossaries and Other Metadata Stores
• Distribute and Deliver Metadata
• Query, Report and Analyse Metadata
March 8, 2010
291
Metadata Management - Principles
• Establish and maintain a metadata strategy and appropriate policies, especially clear goals and objectives for metadata management and usage
• Secure sustained commitment, funding, and vocal support from senior management concerning metadata management for the enterprise
• Take an enterprise perspective to ensure future extensibility, but implement through iterative and incremental delivery
• Develop a metadata strategy before evaluating, purchasing, and installing metadata management products
• Create or adopt metadata standards to ensure interoperability of metadata across the enterprise
• Ensure effective metadata acquisition for both internal and external metadata
• Maximise user access, since a solution that is not accessed or is under-accessed will not show business value
• Understand and communicate the necessity of metadata and the purpose of each type of metadata; socialisation of the value of metadata will encourage business usage
• Measure content and usage
• Leverage XML, messaging, and Web services
• Establish and maintain enterprise-wide business involvement in data stewardship, assigning accountability for metadata
• Define and monitor procedures and processes to ensure correct policy implementation
• Include a focus on roles, staffing, standards, procedures, training, and metrics
• Provide dedicated metadata experts to the project and beyond
• Certify metadata quality
March 8, 2010
292
Understand Metadata Requirements
• Metadata management strategy must reflect an understanding of enterprise needs for metadata
• Gather requirements to confirm the need for a metadata management environment, to set scope and priorities, educate and communicate, to guide tool evaluation and implementation, guide metadata modeling, guide internal metadata standards, guide provided services that rely on metadata, and to estimate and justify staffing needs
• Gather requirements from business and technical users
• Summarise the requirements from an analysis of roles, responsibilities, challenges, and the information needs of selected individuals in the organisation
March 8, 2010
293
Business User Requirements •
Business users require improved understanding of the information from operational and analytical systems
•
Business users require a high level of confidence in the information obtained from corporate data warehouses, analytical applications, and operational systems
•
Need appropriate access to information delivery methods, such as reports, queries, ad-hoc, OLAP, dashboards with a high degree of quality documentation and context
•
Business users must understand the intent and purpose of metadata management March 8, 2010
294
Technical User Requirements •
Technical requirement topics include − Daily feed throughput: size and processing time − Existing metadata − Sources - known and unknown − Targets − Transformations − Architecture flow logical and physical − Non-standard metadata requirements
•
Technical users must understand the business context of the data at a sufficient level to provide the necessary support, including implementing the calculations or derived data rules March 8, 2010
295
Define the Metadata Architecture •
Metadata management solutions consist of:
− Metadata creation / sourcing
− Metadata integration
− Metadata repositories
− Metadata delivery
− Metadata usage
− Metadata control / management
March 8, 2010
296
Centralised Metadata Architecture • •
Single metadata repository that contains copies of the live metadata from the various sources Advantages − High availability, since it is independent of the source systems − Quick metadata retrieval, since the repository and the query reside together − Resolved database structures that are not affected by the proprietary nature of third party or commercial systems − Extracted metadata may be transformed or enhanced with additional metadata that may not reside in the source system, improving quality
•
Disadvantages − Complex processes are necessary to ensure that changes in source metadata quickly replicate into the repository − Maintenance of a centralised repository can be substantial − Extraction could require custom additional modules or middleware − Validation and maintenance of customised code can increase the demands on both internal IT staff and the software vendors March 8, 2010
297
Distributed Metadata Architecture • •
Metadata retrieval engine responds to user requests by retrieving data from source systems in real time with no persistent repository Advantages − Metadata is always as current and valid as possible − Queries are distributed, possibly improving response / process time − Metadata requests from proprietary systems are limited to query processing rather than requiring a detailed understanding of proprietary data structures, therefore minimising the implementation and maintenance effort required − Development of automated metadata query processing is likely simpler, requiring minimal manual intervention − Batch processing is reduced, with no metadata replication or synchronisation processes
•
Disadvantages − No enhancement or standardisation of metadata is possible between systems − Query capabilities are directly affected by the availability of the participating source systems − No ability to support user-defined or manually inserted metadata entries since there is no repository in which to place these additions March 8, 2010
298
Hybrid Metadata Architecture •
•
Hybrid architecture where metadata still moves directly from the source systems into a repository but the repository design only accounts for the user-added metadata, the critical standardised items and the additions from manual sources Advantages − Near-real-time retrieval of metadata from its source and enhanced metadata to meet user needs most effectively, when needed − Lowers the effort for manual IT intervention and custom-coded access functionality to proprietary systems.
•
Disadvantages − Source systems must be available because the distributed nature of the back-end systems handles processing of queries − Additional overhead is required to link those initial results with metadata augmentation in the central repository before presenting the result set to the end user − Design forces the metadata repository to contain the latest version of the metadata source and forces it to manage changes to the source, as well − Sets of program / process interfaces to tie the repository back to the metadata source(s) must be built and maintained
March 8, 2010
299
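A minimal sketch of the hybrid pattern, assuming an in-memory dictionary stands in for the central repository and get_source_metadata() stands in for a real adapter to a proprietary source system, might look like this:

# Illustrative sketch only: live technical metadata is retrieved at query time and
# merged with the user-added, standardised items held in the central repository.
central_repository = {
    "DIM_CUSTOMER.CLV_AMT": {"business_name": "Customer Lifetime Value",
                             "steward": "Customer Data Steward"},
}

def get_source_metadata(qualified_name: str) -> dict:
    # Placeholder: in practice this would call the source system's catalogue interface.
    return {"type": "DECIMAL(12,2)", "nullable": False}

def describe(qualified_name: str) -> dict:
    live = get_source_metadata(qualified_name)            # always current
    curated = central_repository.get(qualified_name, {})  # user-added enrichment
    return {**live, **curated}

print(describe("DIM_CUSTOMER.CLV_AMT"))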
Develop and Maintain Metadata Standards •
Check industry or consensus standards and international standards
•
International standards provide the framework from which the industry standards are developed and executed
March 8, 2010
300
Industry / Consensus Metadata Standards •
Understanding the various standards for the implementation and management of metadata in industry is essential to the appropriate selection and use of a metadata solution for an enterprise
− OMG (Object Management Group) specifications: Common Warehouse Metamodel (CWM), Information Management Metamodel (IMM), MDC Open Information Model (OIM), Extensible Markup Language (XML), Unified Modeling Language (UML), XML Metadata Interchange (XMI), Ontology Definition Metamodel (ODM)
− World Wide Web Consortium (W3C) RDF (Resource Description Framework) for describing and interchanging metadata using XML
− Dublin Core Metadata Initiative (DCMI) interoperable online metadata standard using RDF (see the illustrative sketch after this slide)
− Distributed Management Task Force (DMTF) Web-Based Enterprise Management (WBEM) and Common Information Model (CIM) standards-based management tools facilitating the exchange of data across otherwise disparate technologies and platforms
− Metadata standards for unstructured data: ISO 5964 (Guidelines for the establishment and development of multilingual thesauri), ISO 2788 (Guidelines for the establishment and development of monolingual thesauri), ANSI/NISO Z39.1 (American Standard Reference Data and Arrangement of Periodicals), ISO 704 (Terminology work - Principles and methods)
March 8, 2010
301
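As an illustration of the Dublin Core approach mentioned above, a document description using DCMI element names can be captured as simply as the following sketch; the values are invented, and a real implementation would typically serialise this as RDF/XML rather than a Python mapping:

# Illustrative sketch only: a Dublin Core style description of a document.
dublin_core_record = {
    "dc:title": "Data Management Policy",
    "dc:creator": "Data Governance Office",
    "dc:subject": "data management; records retention",
    "dc:date": "2010-03-08",
    "dc:format": "application/pdf",
    "dc:language": "en",
    "dc:rights": "Internal use only",
}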
International Metadata Standards •
ISO / IEC 11179 is an international metadata standard for standardising and registering of data elements to make data understandable and shareable
March 8, 2010
302
Standard Metadata Metrics •
Controlling the effectiveness of the metadata deployed environment requires measurements to assess user uptake, organisational commitment, and content coverage and quality − Metadata Repository Completeness − Metadata Documentation Quality − Master Data Service Data Compliance − Steward Representation / Coverage − Metadata Usage / Reference − Metadata Management Maturity − Metadata Repository Availability
March 8, 2010
303
Implement a Managed Metadata Environment •
Implement a managed metadata environment in incremental steps in order to minimise risks to the organisation and to facilitate acceptance
•
First implementation is a pilot to prove concepts and learn about managing the metadata environment
March 8, 2010
304
Create and Maintain Metadata •
Metadata creation and update facility provides for the periodic scanning and updating of the repository in addition to the manual insertion and manipulation of metadata by authorised users and programs
•
Audit process validates activities and reports exceptions
•
Metadata is the guide to the data in the organisation so its quality is critical
March 8, 2010
305
Integrate Metadata •
Integration processes gather and consolidate metadata from across the enterprise including metadata from data acquired outside the enterprise
•
Challenges will arise in integration that will require resolution through the governance process
•
Use a non-persistent metadata staging area to store temporary and backup files that supports rollback and recovery processes and provides an interim audit trail to assist repository managers when investigating metadata source or quality issues
•
ETL tools used for data warehousing and Business Intelligence applications are often used effectively in metadata integration processes
March 8, 2010
306
Manage Metadata Repositories •
Implement a number of control activities in order to manage the metadata environment
•
Control of repositories is control of metadata movement and repository updates performed by the metadata specialist
March 8, 2010
307
Metadata Repositories •
Metadata repository refers to the physical tables in which the metadata are stored
•
The repository should have a generic design, not one that merely reflects the source system database designs
•
Metadata should be as integrated as possible; this will be one of the most direct value-added elements of the repository
March 8, 2010
308
Directories, Glossaries and Other Metadata Stores •
A Directory is a type of metadata store that limits the metadata to the location or source of data in the enterprise
•
A Glossary typically provides guidance for use of terms
•
Other Metadata stores include specialised lists such as source lists or interfaces, code sets, lexicons, spatial and temporal schema, spatial reference, and distribution of digital geographic data sets, repositories of repositories and business rules
March 8, 2010
309
Distribute and Deliver Metadata •
The metadata delivery layer is responsible for the delivery of the metadata from the repository to the end users and to any applications or tools that require metadata feeds
March 8, 2010
310
Query, Report and Analyse Metadata •
Metadata guides management and use of data assets
•
A metadata repository must have a front-end application that supports the search-and-retrieval functionality required for all this guidance and management of data assets (see the illustrative sketch after this slide)
March 8, 2010
311
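A minimal sketch of such search-and-retrieval functionality, assuming the repository is represented as a simple in-memory list of metadata records, might look like this; a production repository would sit behind a proper query interface:

# Illustrative sketch only: keyword search over an in-memory metadata repository.
repository = [
    {"name": "DIM_CUSTOMER.CLV_AMT", "business_name": "Customer Lifetime Value",
     "definition": "Projected net revenue from a customer"},
    {"name": "FACT_SALES.NET_AMT", "business_name": "Net Sales Amount",
     "definition": "Invoice amount net of discounts and tax"},
]

def search(term: str) -> list[dict]:
    """Return metadata records whose name, business name or definition contains the term."""
    term = term.lower()
    return [m for m in repository
            if term in m["name"].lower()
            or term in m["business_name"].lower()
            or term in m["definition"].lower()]

print([m["name"] for m in search("customer")])   # ['DIM_CUSTOMER.CLV_AMT']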
Data Quality Management
March 8, 2010
312
Data Quality Management
• Critical support process in organisational change management
• Data quality is synonymous with information quality, since poor data quality results in inaccurate information and poor business performance
• Data cleansing may result in short-term and costly improvements that do not address the root causes of data defects
• A more rigorous data quality program is necessary to provide an economic solution to improved data quality and integrity
• Institutionalising processes for data quality oversight, management, and improvement hinges on identifying the business needs for quality data and determining the best ways to measure, monitor, control, and report on the quality of data
• A continuous process for defining the parameters for specifying acceptable levels of data quality to meet business needs, and for ensuring that data quality meets these levels
March 8, 2010
313
Data Quality Management – Definition and Goals •
Definition − Planning, implementation, and control activities that apply quality management techniques to measure, assess, improve, and ensure the fitness of data for use
•
Goals − To measurably improve the quality of data in relation to defined business expectations − To define requirements and specifications for integrating data quality control into the system development lifecycle − To provide defined processes for measuring, monitoring, and reporting conformance to acceptable levels of data quality
March 8, 2010
314
Data Quality Management •
Data quality expectations provide the inputs necessary to define the data quality framework
•
Framework includes defining the requirements, inspection policies, measures, and monitors that reflect changes in data quality and performance
•
Requirements reflect three aspects of business data expectations − Way to record the expectation in business rules − Way to measure the quality of data within that dimension − Acceptability threshold
March 8, 2010
315
Data Quality Management Approach •
Planning for the assessment of the current state and identification of key metrics for measuring data quality
•
Deploying processes for measuring and improving the quality of data
•
Monitoring and measuring the levels in relation to the defined business expectations
•
Acting to resolve any identified issues to improve data quality and better meet business expectations
March 8, 2010
316
Data Quality Management - Overview
Inputs: Business Requirements, Data Requirements, Data Quality Expectations, Data Policies and Standards, Business Metadata, Technical Metadata, Data Sources and Data Stores
Suppliers: External Sources, Regulatory Bodies, Business Subject Matter Experts, Information Consumers, Data Producers, Data Architects, Data Modelers
Participants: Data Quality Analysts, Data Analysts, Database Administrators, Data Stewards, Other Data Professionals, DRM Director, Data Stewardship Council
Tools: Data Profiling Tools, Statistical Analysis Tools, Data Cleansing Tools, Data Integration Tools, Issue and Event Management Tools
Primary Deliverables: Improved Quality Data, Data Management, Operational Analysis, Data Profiles, Data Quality Certification Reports, Data Quality Service Level Agreements
Consumers: Data Stewards, Data Professionals, Other IT Professionals, Knowledge Workers, Managers and Executives, Customers
Metrics: Data Value Statistics, Errors / Requirement Violations, Conformance to Expectations, Conformance to Service Levels
March 8, 2010
317
Data Quality Management Function, Activities and Sub-Activities
• Develop and Promote Data Quality Awareness
• Define Data Quality Requirements
• Profile, Analyse and Assess Data Quality
• Define Data Quality Metrics
• Define Data Quality Business Rules
• Test and Validate Data Quality Requirements
• Set and Evaluate Data Quality Service Levels
• Continuously Measure and Monitor Data Quality
• Manage Data Quality Issues
• Clean and Correct Data Quality Defects
• Design and Implement Operational DQM Procedures
• Monitor Operational DQM Procedures and Performance
March 8, 2010
318
Data Quality Management - Principles
• Manage data as a core organisational asset
• All data elements will have a standardised data definition, data type, and acceptable value domain
• Leverage Data Governance for the control and performance of DQM
• Use industry and international data standards whenever possible
• Downstream data consumers specify data quality expectations
• Define business rules to assert conformance to data quality expectations
• Validate data instances and data sets against defined business rules
• Business process owners will agree to and abide by data quality SLAs
• Apply data corrections at the original source, if possible
• If it is not possible to correct data at the source, forward data corrections to the owner of the original source whenever possible
• Report measured levels of data quality to appropriate data stewards, business process owners, and SLA managers
• Identify a gold record for all data elements
March 8, 2010
319
Develop and Promote Data Quality Awareness •
Promoting data quality awareness means more than ensuring that the right people in the organisation are aware of the existence of data quality issues
•
Establish a data governance framework for data quality:
− Set priorities for data quality
− Develop and maintain standards for data quality
− Report relevant measurements of enterprise-wide data quality
− Provide guidance that facilitates staff involvement
− Establish communications mechanisms for knowledge sharing
− Develop and apply certification and compliance policies
− Monitor and report on performance
− Identify opportunities for improvements and build consensus for approval
− Resolve variations and conflicts
March 8, 2010
320
Define Data Quality Requirements
• Applications are dependent on the use of data that meets specific needs associated with the successful completion of a business process
• Data quality requirements are often hidden within defined business policies:
− Identify key data components associated with business policies
− Determine how identified data assertions affect the business
− Evaluate how data errors are categorised within a set of data quality dimensions
− Specify the business rules that measure the occurrence of data errors
− Provide a means for implementing measurement processes that assess conformance to those business rules
• Dimensions of data quality (see the illustrative sketch after this slide):
− Accuracy
− Completeness
− Consistency
− Currency
− Precision
− Privacy
− Reasonableness
− Referential Integrity
− Timeliness
− Uniqueness
− Validity
March 8, 2010
321
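The illustrative sketch below measures two of the dimensions listed above (completeness and validity) for a small, invented record set; the reference list of valid countries and the sample values are assumptions for the example:

# Illustrative sketch only: measuring completeness and validity for one data set.
records = [
    {"customer_id": "C1", "email": "a.byrne@example.ie", "country": "IE"},
    {"customer_id": "C2", "email": None, "country": "IE"},
    {"customer_id": "C3", "email": "b.kelly@example.ie", "country": "XX"},
]

valid_countries = {"IE", "GB", "US"}

completeness = sum(r["email"] is not None for r in records) / len(records)
validity = sum(r["country"] in valid_countries for r in records) / len(records)

print(f"email completeness: {completeness:.0%}")   # 67%
print(f"country validity:   {validity:.0%}")       # 67%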
Profile, Analyse and Assess Data Quality •
Perform an assessment of the data using two different approaches, bottom-up and top-down
•
Bottom-up assessment of existing data quality issues involves inspection and evaluation of the data sets themselves
•
Top-down approach involves understanding how their processes consume data, and which data elements are critical to the success of the business application − Identify a data set for review − Catalog the business uses of that data set − Subject the data set to empirical analysis using data profiling tools and techniques − List all potential anomalies, review and evaluate − Prioritise criticality of important anomalies in preparation for defining data quality metrics March 8, 2010
322
Define Data Quality Metrics •
Poor data quality affects the achievement of business objectives
•
Seek and use indicators of data quality performance to report the relationship between flawed data and missed business objectives
•
Measuring data quality is similar to monitoring any other type of business performance activity
•
Data quality metrics should be reasonable and effective − − − − − −
Measurability Business Relevance Acceptability Accountability / Stewardship Controllability Trackability
March 8, 2010
323
Define Data Quality Business Rules •
Measurement of conformance to specific business rules requires definition
•
Monitoring conformance to these rules requires
•
Segregating data values, records, and collections of records that do not meet business needs from the valid ones
•
Generating a notification event alerting a data steward of a potential data quality issue
•
Establishing an automated or event driven process for aligning or possibly correcting flawed data within business expectations March 8, 2010
324
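A minimal sketch of this rule-based approach, with an invented ship-after-order rule and a print statement standing in for a real notification mechanism, might look like this:

# Illustrative sketch only: apply a defined business rule, segregate failing records
# and raise a notification for the data steward.
records = [
    {"order_id": 1, "ship_date": "2010-03-01", "order_date": "2010-02-25"},
    {"order_id": 2, "ship_date": "2010-02-20", "order_date": "2010-02-25"},
]

def rule_ship_after_order(r) -> bool:
    return r["ship_date"] >= r["order_date"]      # ISO dates compare correctly as strings

valid = [r for r in records if rule_ship_after_order(r)]
invalid = [r for r in records if not rule_ship_after_order(r)]

if invalid:
    # Stand-in for an alerting mechanism (email, incident ticket, etc.)
    print(f"Notify data steward: {len(invalid)} record(s) violate the ship-after-order rule")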
Test and Validate Data Quality Requirements •
Data profiling tools analyse data to find potential anomalies
•
Data profiling tools allow data analysts to define data rules for validation, assessing frequency distributions and corresponding measurements and then applying the defined rules against the data sets
•
Characterising data quality levels based on data rule conformance provides an objective measure of data quality
•
By using defined data rules to validate data, an organisation can distinguish those records that conform to defined data quality expectations and those that do not
•
In turn, these data rules are used to baseline the current level of data quality as compared to ongoing audits March 8, 2010
325
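As a rough illustration of the profiling described above, the sketch below computes a null rate and a value frequency distribution for one column of invented data; a profiling tool would automate this across whole data sets and apply defined data rules on top of it:

# Illustrative sketch only: basic column profiling - null rate and frequency distribution.
from collections import Counter

values = ["IE", "IE", None, "GB", "ie", None, "US", "IE"]

null_rate = values.count(None) / len(values)
frequencies = Counter(v for v in values if v is not None)

print(f"null rate: {null_rate:.0%}")            # 25%
print("most common:", frequencies.most_common(3))
# Mixed-case variants ('IE' vs 'ie') surface as a potential standardisation anomaly.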
Set and Evaluate Data Quality Service Levels
• Data quality SLAs specify the organisation's expectations for response and remediation
• Having data quality inspection and monitoring in place increases the likelihood of detection and remediation of a data quality issue before a significant business impact can occur
• Operational data quality control defined in a data quality SLA includes (see the illustrative sketch after this slide):
− The data elements covered by the agreement
− The business impacts associated with data flaws
− The data quality dimensions associated with each data element
− The expectations for quality for each data element for each of the identified dimensions in each application or system in the value chain
− The methods for measuring against those expectations
− The acceptability threshold for each measurement
− The individual(s) to be notified in case the acceptability threshold is not met
− The timelines and deadlines for expected resolution or remediation of the issue
− The escalation strategy and possible rewards and penalties when the resolution times are met
March 8, 2010
326
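The illustrative sketch below captures one SLA entry as configuration and checks a measured value against its acceptability threshold; the data element, threshold, contact address and deadline are invented for the example:

# Illustrative sketch only: a data quality SLA entry and a threshold check.
sla = {
    "data_element": "DIM_CUSTOMER.EMAIL",
    "dimension": "completeness",
    "threshold": 0.98,
    "notify": "customer.data.steward@example.com",
    "resolution_deadline_hours": 48,
}

measured_completeness = 0.95

if measured_completeness < sla["threshold"]:
    print(f"SLA breach on {sla['data_element']}: "
          f"{measured_completeness:.0%} < {sla['threshold']:.0%}; "
          f"notify {sla['notify']} (resolve within {sla['resolution_deadline_hours']}h)")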
Continuously Measure and Monitor Data Quality •
Provide continuous monitoring by incorporating control and measurement processes into the information processing flow
•
Incorporating the results of the control and measurement processes into both the operational procedures and reporting frameworks enables continuous monitoring of the levels of data quality
March 8, 2010
327
Manage Data Quality Issues
• Supporting the enforcement of the data quality SLA requires a mechanism for reporting and tracking data quality incidents and activities for researching and resolving those incidents
• A data quality incident reporting system provides this capability
• Tracking of data quality incidents provides performance reporting data, including mean-time-to-resolve issues, frequency of occurrence of issues, types of issues, sources of issues and common approaches for correcting or eliminating problems
• Data quality incident tracking also requires a focus on training staff to recognise when data issues appear and how they are to be classified, logged and tracked according to the data quality SLA
• Implementing a data quality issues tracking system provides a number of benefits:
− Information and knowledge sharing can improve performance and reduce duplication of effort
− Analysis of all the issues will help data quality team members determine any repetitive patterns, their frequency, and potentially the source of the issue
March 8, 2010
328
Clean and Correct Data Quality Defects •
Perform data correction in three general ways − Automated correction - Submit the data to data quality and data cleansing techniques using a collection of data transformations and rule-based standardisations, normalisations, and corrections − Manual directed correction - Use automated tools to cleanse and correct data but require manual review before committing the corrections to persistent storage − Manual correction: Data stewards inspect invalid records and determine the correct values, make the corrections, and commit the updated records
March 8, 2010
329
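A minimal sketch of these correction paths, using an invented country-name standardisation rule and an assumed confidence cut-off to decide between automated and manual handling, might look like this:

# Illustrative sketch only: routing corrections to automated, manual-directed or manual paths.
def standardise_country(value: str):
    mapping = {"ireland": ("IE", 1.0), "irland": ("IE", 0.8), "eire": ("IE", 0.9)}
    return mapping.get(value.strip().lower(), (value, 0.0))

def correct(value: str):
    standard, confidence = standardise_country(value)
    if confidence >= 0.9:
        return standard, "automated correction"
    if confidence > 0.0:
        return standard, "manual directed correction (review before commit)"
    return value, "manual correction (refer to data steward)"

for raw in ["Ireland", "Irland", "Atlantis"]:
    print(raw, "->", correct(raw))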
Design and Implement Operational DQM Procedures •
Using defined rules for validation of data quality provides a means of integrating data inspection into a set of operational procedures associated with active DQM
•
Design and implement detailed procedures for operationalising activities − Inspection and monitoring − Diagnosis and evaluation of remediation alternatives − Resolving the issue − Reporting
March 8, 2010
330
Monitor Operational DQM Procedures and Performance
• Accountability is critical to the governance protocols overseeing data quality control
• Issues must be assigned to some number of individuals, groups, departments, or organisations
• Tracking process should specify and document the ultimate issue accountability to prevent issues from dropping through the cracks
• Metrics can provide valuable insights into the effectiveness of the current workflow, as well as systems and resource utilisation, and are important management data points that can drive continuous operational improvement for data quality control
March 8, 2010
331
Conducting a Data Management Project
March 8, 2010
332
Conducting a Data Management Project •
Data management project depends on: − Scope of the Project – data management functions to be encompassed − Type of Project – from architecture to analysis to implementation − Scope Within the Organisation – one or more business units or the entire organisation
March 8, 2010
333
Data Management Function and Project Type
• Scope of Project (data management functions across the columns of the matrix): Data Governance, Data Architecture Management, Data Development, Data Operations Management, Data Security Management, Reference and Master Data Management, Data Warehousing and Business Intelligence Management, Document and Content Management, Metadata Management, Data Quality Management
• Type of Project (rows of the matrix): Architecture, Analysis and Design, Implementation, Operational Improvement, Management and Administration
March 8, 2010
334
Mapping the Path Through the Selected Data Management Project •
Use the framework to define the breakdown of the selected project
March 8, 2010
335
Project Elements – Data Management Functions, Type of Project, Organisational Scope
Organisational Scope of Project Data Management Functions Within Scope of Project
Type of Project
•
Select the project building blocks based on the project scope
March 8, 2010
336
Creating a Data Management Team
March 8, 2010
337
Creating a Data Management Team •
Once a data management framework has been implemented, it must be monitored, managed and constantly improved
•
Need to consolidate and coordinate data management and governance efforts to meet the challenges of − Demand for performance management data − Complexity in systems and processes − Greater regulatory and compliance requirements
•
Build a Data Management Center of Excellence (DMCOE)
March 8, 2010
338
Data Management Center of Excellence
• Separate business units within the organisation generally implement their own solutions
• Each business unit will have different IT systems, data warehouses / data marts and business intelligence tools
• Organisation-wide coordination of data resources requires a centralised dedicated structure like the DMCOE providing data services
• Leads an organisation to business benefits through continuous improvement of data management
• DMCOE functions need to focus on leveraging organisational knowledge and skills to maximise the value of data to the organisation
• Maximise technology investment while decreasing costs and increasing efficiency, centralise best practices and standards, empower knowledge workers with information and provide thought leadership to the entire company
• DMCOE does not exist in isolation from other operations and service management functions
March 8, 2010
339
DMCOE Functions
• Maximise the value of the data technology investment to the organisation by taking a portfolio approach to increase skills and leverage and to optimise the infrastructure
• Focus on project delivery and information asset creation with an emphasis on reusability and knowledge management along with solution delivery
• Ensure the integrity of the organisation's business processes and information systems
• Ensure the quality compliance effort related to the configuration, development, and documentation of enhancements
• Develop information learning and effective practices
March 8, 2010
340
Data Charter •
Create charter that lists the fundamental principles of data management the DMCOE will adhere to: − Data Strategy - Create a data blueprint, based upon business functions to facilitate data design − Data Sharing - Promote the sharing of data across the organisation and reduce data redundancy − Data Integrity - Ensure the integrity of data from design and availability perspectives − Technical Expertise - Provide the expertise for the development and support of data systems − High Availability and Optimal Performance - Ensure consistent high availability of data systems through proper design and use and optimise performance of the data systems March 8, 2010
341
DMCOE Skills •
DMCOE needs skills across three dimensions − Specific data management functions − Business management and administration − Technology and service management
March 8, 2010
342
DMCOE Skills
• Data Management Business Skills: Data Management Strategy, Data Management Portfolio Management, Personnel Management, Data Management Process Management, Data Management Design and Development
• Data Management Specific Functions: Data Governance, Data Architecture Management, Data Development, Data Operations Management, Data Security Management, Reference and Master Data Management, Data Warehousing and Business Intelligence Management, Document and Content Management, Metadata Management, Data Quality Management
• Data Management Technology and Service Functions: Environment and Infrastructure Management, Service Management and Support, Application Deployment and Data Migration, Technical Architecture
• Idealised set of DMCOE skills that needs to be customised to suit specific organisation needs; just one view of a DMCOE
March 8, 2010
343
DMCOE Business Management and Administration Skills
• Skill areas: Data Management Strategy, Data Management Portfolio Management, Personnel Management, Data Management Process Management, Data Management Design and Development
• Constituent skills across these areas include: Education and Skills Identification and Development, Creation and Enforcement of Process Standards, Requirements Definition and Management, Co-ordination of Data Management Systems and Initiatives, Resource Management and Allocation, Management of Data Processes, Analysis and Design, Creation and Enforcement of Data Principles and Standards, Vendor Management, Performance Management, Development Standards, Data Quality, Solution Development and Deployment, Strategic Planning Processes, Data Usage Strategy, Management of Portfolio of Data Management Initiatives
March 8, 2010
344
DMCOE Technology and Service Management Skills
• Environment and Infrastructure Management: Change Management and Control, Version Management and Control, Performance Monitoring and Management, Reporting
• Service Management and Support: Service Desk, Service Level Management, Security Management, System Maintenance
• Application Deployment and Data Migration: Application Deployment, Test Management (System, Integration, UAT, UAT Support), Data Migration Management
• Technical Architecture: Infrastructure Architecture, Application and Tools Architecture, Data and Content Architecture, Integration Architecture
March 8, 2010
345
Benefits of DMCOE
• Consistent infrastructure that reduces the time to analyse, design and implement new IT solutions
• Reduced data management costs through a consistent data architecture and data integration infrastructure, reduced complexity, redundancy and tool proliferation
• Centralised repository of the organisation's data knowledge
• Organisation-wide standard methodology and processes to develop and maintain data infrastructure and procedures
• Increased data availability
• Increased data quality
March 8, 2010
346
Assessing Your Data Management Maturity
March 8, 2010
347
Assessing Your Data Management Maturity
• A Data Management Maturity Model is a measure of, and a process for determining, the level of maturity that exists within an organisation's data management function
• Provides a systematic framework for improving data management capability and identifying and prioritising opportunities, reducing cost and optimising the business value of data management investments
• Measure data management maturity so that:
− It can be tracked over time to measure improvements
− It can be used to define projects for data management maturity improvements within cost, time, and return on investment constraints
•
Enables organisations to improve their data management function so that they can increase productivity, increase quality, decrease cost and decrease risk March 8, 2010
348
Data Management Maturity Model •
Assesses data management maturity on a scale of 1 to 5 across a number of data management capabilities (see the illustrative sketch after this slide):
− Level 1 (Initial): Data management is ad hoc and localised. Everybody has their own approach that is unique and not standardised except for local initiatives
− Level 2 (Repeatable and Reactive): Data management has become independent of the person or business unit administering it and is standardised
− Level 3 (Defined and Standardised): Data management is fully documented, determined by subject matter experts and validated
− Level 4 (Managed and Predictable): Data management results and outcomes are stored and proactively cross-related within and between business units. The data management function actively exploits the benefits of standardisation
− Level 5 (Optimising and Innovating): As time, resources, technology, requirements and the business landscape change, the data management function can be easily and quickly adjusted to fit new needs and environments
March 8, 2010
349
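As a rough illustration of how such an assessment might be recorded and summarised, the sketch below scores a few data management functions on the 1 to 5 scale above; the scores shown are invented:

# Illustrative sketch only: recording and summarising maturity scores per function.
scores = {
    "Data Governance": 2,
    "Data Architecture Management": 3,
    "Data Quality Management": 1,
    "Metadata Management": 2,
}

overall = sum(scores.values()) / len(scores)
weakest = min(scores, key=scores.get)

print(f"Overall maturity: {overall:.1f} / 5")
print(f"Priority for improvement: {weakest} (level {scores[weakest]})")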
Maturity Level 1 - Initial
• Data management processes are mostly disorganised and generally performed on an ad hoc or even chaotic basis
• Data is considered as general purpose and is not viewed by either business or executive management to be a problem or a priority
• Data is accessible but not always available and is not secure or auditable
• No data management group and no one owns the responsibility for ensuring the quality, accuracy or integrity of the data
• Data management (to the degree that it is done at all) is reliant on the efforts and competence of individuals
• Data proliferates without control and the quality is inconsistent across the various business and applications silos
• Data exists in unconnected databases and spreadsheets using multiple formats and inconsistent definitions
• Little data profiling or analysis, and data is not considered or understood as a component of linked processes
• No formal data quality processes, and the processes that do exist are not repeatable because they are neither well defined nor well documented
March 8, 2010
350
Maturity Level 2 - Repeatable and Reactive
• Fundamental data management practices are established, defined, documented and can be repeated
• Data policies for creation and change management exist, but still rely on individuals and are not institutionalised throughout the organisation
• Data as a valuable asset is a concept understood by some, but senior management support is lacking and there is little organisational buy-in to the importance of an enterprise-wide approach to managing data
• Data is stored locally and data quality is reactive to circumstances
• Requirements are known and managed at the business unit and application level
• Procurement is ad hoc based on individual needs and data duplication is mostly invisible
• Data quality varies among business units and data failures occur on a cross-functional basis
• Most data is integrated point-to-point and not across business units
March 8, 2010
351
Maturity Level 3 - Defined and Standardised
• Business analysts begin to control the data management process with IT playing a supporting role
• Data is recognised as a business enabler and moves from an undervalued commodity to an enterprise asset, but there are still limited controls in place
• Executive management appreciates and understands the role of data governance and commits resources to its management
• A data administrative function exists as a complement to the database administration function and data is present in both business and IT related development discussions
• Some core data has a defined policy that is documented as part of the applications development lifecycle, the policies are enforced to a limited extent and testing is performed to ensure that data quality requirements are being achieved
• Data quality is not fully defined and there are multiple views of what quality means
• A metadata repository exists and a data group maintains corporate data definitions and business rules
• A centralised platform for managing data is available at the group level and feeds analytical data marts
• Data is available to business users and can be audited
March 8, 2010
352
Maturity Level 4 - Managed and Predictable
• Data is treated as a critical corporate asset and viewed as equivalent to other enterprise-wide assets
• A unified data governance strategy exists throughout the enterprise with executive level and CEO support
• Data management objectives are reviewed by senior management
• Business process interaction is completely documented and planning is centralised
• Data quality control, integration and synchronisation are integral parts of all business processes
• Content is monitored and corrected in real time to manage the reliability of the data manufacturing process and is based on the needs of customers, end users and the organisation as a whole
• Data quality is understood in statistical terms and managed throughout the transaction lifecycle
• Root cause analysis is well established and proactive steps are taken to prevent and not just correct data inconsistencies
• A centralised metadata repository exists and all changes are synchronised
• Data consistency is expected and achieved
• The data platform is managed at the enterprise level and feeds all reference data repositories
• Advanced platform tools are used to manage the metadata repository and all data transformation processes
• Data quality and integration tools are standardised across the enterprise
March 8, 2010
353
Maturity Level 5 - Optimising and Innovating
• The organisation is in continuous improvement mode
• Process enhancements are managed through monitoring feedback and a quantitative understanding of the causes of data inconsistencies
• Enterprise-wide business intelligence is possible
• The organisation is agile enough to respond to changing circumstances and evolving business objectives
• Data is considered as the key resource for process improvement
• Data requirements for all projects are defined and agreed prior to initiation
• Development stresses the re-use of data and is synchronised with the procurement process
• The process of data management is continuously being improved
• Data quality (both monitoring and correction) is fully automated and adaptive
• Uncontrolled data duplication is eliminated and controlled duplication must be justified
• Governance is data driven and the organisation adopts a "test and learn" philosophy
March 8, 2010
354
Data Management Maturity Evaluation - Key Capabilities and Maturity Levels
• Evaluation matrix: for each data management function, a description of the capability associated with each maturity level (Level 1 to Level 5)
• Functions assessed: Data Governance, Data Architecture Management, Data Development, Data Operations Management, Data Security Management, Reference and Master Data Management, Data Warehousing and Business Intelligence Management, Document and Content Management, Metadata Management, Data Quality Management
March 8, 2010
355
More Information Alan McSweeney
[email protected]
March 8, 2010
356