May 2, 2017 | Author: Murali Devarinti | Category: N/A
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
BODS10 Data Integrator other solution
Date Training Center Instructors
Education Website
Participant Handbook Course Version: 96 Course Duration: 3 Day(s) Material Number: 50104424
An SAP course - use it to learn, reference it for work
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Copyright Copyright © 2011 SAP AG. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice. Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.
Trademarks •
Microsoft®, WINDOWS®, NT®, EXCEL®, Word®, PowerPoint® and SQL Server® are registered trademarks of Microsoft Corporation.
•
IBM®, DB2®, OS/2®, DB2/6000®, Parallel Sysplex®, MVS/ESA®, RS/6000®, AIX®, S/390®, AS/400®, OS/390®, and OS/400® are registered trademarks of IBM Corporation.
•
ORACLE® is a registered trademark of ORACLE Corporation.
•
INFORMIX®-OnLine for SAP and INFORMIX® Dynamic ServerTM are registered trademarks of Informix Software Incorporated.
•
UNIX®, X/Open®, OSF/1®, and Motif® are registered trademarks of the Open Group.
•
Citrix®, the Citrix logo, ICA®, Program Neighborhood®, MetaFrame®, WinFrame®, VideoFrame®, MultiWin® and other Citrix product names referenced herein are trademarks of Citrix Systems, Inc.
•
HTML, DHTML, XML, XHTML are trademarks or registered trademarks of W3C®, World Wide Web Consortium, Massachusetts Institute of Technology.
•
JAVA® is a registered trademark of Sun Microsystems, Inc.
•
JAVASCRIPT® is a registered trademark of Sun Microsystems, Inc., used under license for technology invented and implemented by Netscape.
•
SAP, SAP Logo, R/2, RIVA, R/3, SAP ArchiveLink, SAP Business Workflow, WebFlow, SAP EarlyWatch, BAPI, SAPPHIRE, Management Cockpit, mySAP.com Logo and mySAP.com are trademarks or registered trademarks of SAP AG in Germany and in several other countries all over the world. All other products mentioned are trademarks or registered trademarks of their respective companies.
Disclaimer THESE MATERIALS ARE PROVIDED BY SAP ON AN "AS IS" BASIS, AND SAP EXPRESSLY DISCLAIMS ANY AND ALL WARRANTIES, EXPRESS OR APPLIED, INCLUDING WITHOUT LIMITATION WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, WITH RESPECT TO THESE MATERIALS AND THE SERVICE, INFORMATION, TEXT, GRAPHICS, LINKS, OR ANY OTHER MATERIALS AND PRODUCTS CONTAINED HEREIN. IN NO EVENT SHALL SAP BE LIABLE FOR ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, CONSEQUENTIAL, OR PUNITIVE DAMAGES OF ANY KIND WHATSOEVER, INCLUDING WITHOUT LIMITATION LOST REVENUES OR LOST PROFITS, WHICH MAY RESULT FROM THE USE OF THESE MATERIALS OR INCLUDED SOFTWARE COMPONENTS.
g2011629105239
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
About This Handbook This handbook is intended to complement the instructor-led presentation of this course, and serve as a source of reference. It is not suitable for self-study.
Typographic Conventions American English is the standard used in this handbook. The following typographic conventions are also used. Type Style
Description
Example text
Words or characters that appear on the screen. These include field names, screen titles, pushbuttons as well as menu names, paths, and options. Also used for cross-references to other documentation both internal and external.
2011
Example text
Emphasized words or phrases in body text, titles of graphics, and tables
EXAMPLE TEXT
Names of elements in the system. These include report names, program names, transaction codes, table names, and individual key words of a programming language, when surrounded by body text, for example SELECT and INCLUDE.
Example text
Screen output. This includes file and directory names and their paths, messages, names of variables and parameters, and passages of the source text of a program.
Example text
Exact user entry. These are words and characters that you enter in the system exactly as they appear in the documentation.
Variable user entry. Pointed brackets indicate that you replace these words and characters with appropriate entries.
© 2011 SAP AG. All rights reserved.
iii
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
About This Handbook
BODS10
Icons in Body Text The following icons are used in this handbook. Icon
Meaning For more information, tips, or background
Note or further explanation of previous point Exception or caution Procedures
Indicates that the item is displayed in the instructor's presentation.
iv
© 2011 SAP AG. All rights reserved.
2011
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Contents Course Overview ......................................................... vii Course Goals ...........................................................vii Course Objectives .....................................................vii
Unit 1: Defining Data Services .......................................... 1 Defining Data Services .................................................2
Unit 2: Defining Source and Target Metadata ..................... 25 Defining Datastores in Data Services .............................. 26 Defining Datastores in Data Services .............................. 39 Defining Data Services System Configurations ................... 53 Defining a Data Services Flat File Format ......................... 55 Defining Datastore Excel File Formats............................. 67 Defining XML Formats ............................................... 77
Unit 3: Creating Batch Jobs ........................................... 89 Creating Batch Jobs .................................................. 90
Unit 4: Troubleshooting Batch Jobs ............................... 115 Setting Traces and Adding Annotations .......................... 116 Setting Traces and Adding Annotations ..........................130 Using the Interactive Debugger ....................................144 Setting up and Using the Auditing Feature .......................155
Unit 5: Using Functions, Scripts and Variables.................. 169 Using Built-In Functions .............................................170 Using Variables, Parameters and Scripts .........................197
Unit 6: Using Platform Transforms ................................. 219 Using Platform Transforms .........................................220 Using the Map Operation Transform ..............................224 Using the Validation Transform ....................................231 Using the Merge Transform.........................................254 Using the Case Transform ..........................................270 Using the SQL Transform ...........................................283
Unit 7: Setting Up Error Handling ................................... 293 Setting Up Error Handling...........................................294
2011
© 2011 SAP AG. All rights reserved.
v
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Contents
BODS10
Unit 8: Capturing Changes in Data ................................. 315 Capturing Changes in Data.........................................316 Using Source-Based Change Data Capture (CDC) .............324 Using Target-Based Change Data Capture (CDC) ..............342
Unit 9: Using Data Integrator Platforms ........................... 359 Using Data Integrator Platform Transforms ......................360 Using the Pivot Transform ..........................................366 Using the Data Transfer Transform and Performance Optimization.......................................................376 Using the XML Pipeline Transform ................................392 Using the Hierarchy Flattening Transform (Optional) ...........405
Unit 10: Using Text Data Processing ............................... 421 Using the Entity Extraction Transform.............................422
vi
© 2011 SAP AG. All rights reserved.
2011
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Course Overview SAP BusinessObjects™ Data Integrator 4.0 enables you to integrate disparate data sources to deliver more timely and accurate data that end users in an organization can trust. In this three–day course, you will learn about creating, executing, and troubleshooting batch jobs, using functions, scripts and transforms to change the structure and formatting of data, handling errors, and capturing changes in data. As a business benefit, by being able to create efficient data integration projects, you can use the transformed data to help improve operational and supply chain efficiencies, enhance customer relationships, create new revenue opportunities, and optimize return on investment from enterprise applications.
Target Audience This course is intended for the following audiences: • •
Solution consultants responsible for implementing data integration projects. Power users responsible for implementing, administering, and managing data integration projects.
Course Prerequisites Required Knowledge •
Basic knowledge of ETL (Extraction, Transformation, and Loading) of data processes
Course Goals This course will prepare you to: • • •
Stage data in an operational datastore, data warehouse, or data mart. Update staged data in batch mode Transform data for analysis to improver operational efficiencies
Course Objectives After completing this course, you will be able to: • • • •
2011
Integrate disparate data sources Create, execute, and troubleshoot batch jobs Use functions, scripts, and transforms to modify data structures and format data Handle errors in the extraction and transformation process
© 2011 SAP AG. All rights reserved.
vii
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Course Overview
•
viii
BODS10
Capture changes in data from data sources using different techniques
© 2011 SAP AG. All rights reserved.
2011
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Unit 1 Defining Data Services Unit Overview Data Integrator provides a graphical interface that allows you to easily create jobs that extract data from heterogeneous sources, transform that data to meet the business requirements of your organization, and load the data into a single location. The Data Services platform enables you to perform enterprise-level data integration and data quality functions. This unit describes the Data Services platform and its architecture, Data Services objects and its graphical interface, the Data Services Designer.
Unit Objectives After completing this unit, you will be able to: • •
Define Data Services objects Use the Data Services Designer interface
Unit Contents Lesson: Defining Data Services .................................................2
2011
© 2011 SAP AG. All rights reserved.
1
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Unit 1: Defining Data Services
BODS10
Lesson: Defining Data Services Lesson Overview Data Services is a graphical interface for creating and staging jobs for data integration and data quality purposes.
Lesson Objectives After completing this lesson, you will be able to: • •
Define Data Services objects Use the Data Services Designer interface
Business Example For reporting in SAP NetWeaver Business Warehouse, your company needs data from diverse data sources, such as SAP systems, non-SAP systems, the Internet and other business applications. You should therefore examine the technologies that SAP NetWeaver BW offers for data acquisition.
Describing Data Services Business Objects Data Services provides a graphical interface that allows you to easily create jobs that extract data from heterogeneous sources, transform that data to meet the business requirements of your organization, and load the data into a single location. Note: Although Data Services can be used for both real-time and batch jobs, this course covers batch jobs only. Data Services combines both batch and real-time data movement and management with intelligent caching to provide a single data integration platform for information management from any information source and for any information use.
2
© 2011 SAP AG. All rights reserved.
2011
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
BODS10
Lesson: Defining Data Services
Figure 1: Data Services Architecture–Access Server
This unique combination allows you to: • • • •
Stage data in an operational data store, data warehouse, or data mart. Update staged data in batch or real-time modes. Create a single environment for developing, testing, and deploying the entire data integration platform. Manage a single metadata repository to capture the relationships between different extraction and access methods and provide integrated lineage and impact analysis.
Data Services performs three key functions that can be combined to create a scalable, high-performance data platform. It: •
• •
Loads Enterprise Resource Planning (ERP) or enterprise application data into an operational datastore (ODS) or analytical data warehouse, and updates in batch or real-time modes. Creates routing requests to a data warehouse or ERP system using complex rules. Applies transactions against ERP systems.
Data mapping and transformation can be defined using the Data Services Designer graphical user interface. Data Services automatically generates the appropriate interface calls to access the data in the source system. For most ERP applications, Data Services generates SQL optimized for the specific target database (Oracle, DB2, SQL Server, Informix, and so on). Automatically-generated, optimized code reduces the cost of maintaining
2011
© 2011 SAP AG. All rights reserved.
3
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Unit 1: Defining Data Services
BODS10
data warehouses and enables you to build data solutions quickly, meeting user requirements faster than other methods (for example, custom-coding, direct-connect calls, or PL/SQL). Data Services can apply data changes in a variety of data formats, including any custom format using a Data Services adapter. Enterprise users can apply data changes against multiple back-office systems singularly or sequentially. By generating calls native to the system in question, Data Services makes it unnecessary to develop and maintain customized code to manage the process. You can also design access intelligence into each transaction by adding flow logic that checks values in a data warehouse or in the transaction itself before posting it to the target ERP system. Data Services Packages (Rapid Marts) Data Services provides a wide range of functionality, depending on the package and options selected: • •
•
Data Integrator packages provide platform transforms for core functionality, and Data Integrator transforms to enhance data integration projects. Data Quality packages provide platform transforms for core functionality, and Data Quality transforms to parse, standardize, cleanse, enhance, match, and consolidate data. Data Services packages provide all of the functionality of both the Data Integrator and Data Quality packages.
The process to build a reporting data mart might take approximately 6-12 months, but with Data Services Rapid Marts, this could be done in 6-12 days.
4
© 2011 SAP AG. All rights reserved.
2011
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
BODS10
Lesson: Defining Data Services
Figure 2: Steps to Building a Reporting Data Mart
The process would begin with accessing the key source tables.
Figure 3: Accessing the Key Source Tables
2011
© 2011 SAP AG. All rights reserved.
5
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Unit 1: Defining Data Services
BODS10
Once the source tables are identified and accessed, predelivered data extractors, transformations and load programs can then be tested.
Figure 4: Pre-Built ETL Jobs
Industry standards are used to build a predeveloped data model based on best practices.
6
© 2011 SAP AG. All rights reserved.
2011
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
BODS10
Lesson: Defining Data Services
Figure 5: Target Data Model
Finally to accelerate the project, prebuilt universes and reports provide are contained in each Rapid Mart.
2011
© 2011 SAP AG. All rights reserved.
7
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Unit 1: Defining Data Services
BODS10
Figure 6: Pre-Built Universes and Reports
Note: This course does not cover the subject of Rapid Marts.
The Data Services Architecture Data Services relies on several unique components to accomplish the data integration and data quality activities required to manage your corporate data. Data Services includes the standard components: • • • • • • • • • •
Designer Repository Job Server Engines Access Server Adapters Real-time Services Address Server Cleansing Packages, Dictionaries, and Directories Management Console
This diagram illustrates the relationships between these components:
8
© 2011 SAP AG. All rights reserved.
2011
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
BODS10
Lesson: Defining Data Services
Figure 7: Data Services Architecture
The Data Services Designer Data Services Designer is a Windows client application used to create, test, and manually execute jobs that transform data and populate a data warehouse. Using Designer, you create data management applications that consist of data mappings, transformations, and control logic.
2011
© 2011 SAP AG. All rights reserved.
9
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Unit 1: Defining Data Services
BODS10
Figure 8: Data Services Designer Interface
You can create objects that represent data sources, and then drag, drop, and configure them in flow diagrams. Designer allows you to manage metadata stored in a local repository. From the Designer, you can also trigger the Job Server to run your jobs for initial application testing. The Data Services Repository The Data Services repository is a set of tables that holds user-created and predefined system objects, source and target metadata, and transformation rules. It is set up on an open client/server platform to facilitate sharing metadata with other enterprise tools. Each repository is stored on an existing Relational Database Management System (RDBMS).
10
© 2011 SAP AG. All rights reserved.
2011
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
BODS10
Lesson: Defining Data Services
Figure 9: Data Services Repository
There are three types of repositories: •
•
•
A local repository (known in Designer as the Local Object Library) is used by an application designer to store definitions of source and target metadata and Data Services objects. A central repository (known in Designer as the Central Object Library) is an optional component that can be used to support multiuser development. The Central Object Library provides a shared library that allows developers to check objects in and out for development. A profiler repository is used to store information that is used to determine the quality of data.
The Data Services Job Server Each repository is associated with at least one Data Services Job Server, which retrieves the job from its associated repository and starts the data movement engine. The data movement engine integrates data from multiple heterogeneous sources, performs complex data transformations, and manages extractions and transactions from ERP systems and other sources. The Job Server can move data in batch or real-time mode and uses distributed query optimization, multithreading, in-memory caching, in-memory data transformations, and parallel processing to deliver high data throughput and scalability.
2011
© 2011 SAP AG. All rights reserved.
11
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Unit 1: Defining Data Services
BODS10
Figure 10: Data Services Architecture–Job Server
While designing a job, you can run it from the Designer. In your production environment, the Job Server runs jobs triggered by a scheduler or by a real-time service managed by the Data Services Access Server. In production environments, you can balance job loads by creating a Job Server Group (multiple Job Servers), which executes jobs according to overall system load. Data Services provides distributed processing capabilities through the Server Groups. A Server Group is a collection of Job Servers that each reside on different Data Services server computers. Each Data Services server can contribute one, and only one, Job Server to a specific Server Group. Each Job Server collects resource utilization information for its computer. This information is utilized by Data Services to determine where a job, data flow or subdata flow (depending on the distribution level specified) should be executed. The Data Services Engines When Data Services jobs are executed, the Job Server starts Data Services engine processes to perform data extraction, transformation, and movement. Data Services engine processes use parallel processing and in-memory data transformations to deliver high data throughput and scalability. The Data Services Cleansing Packages, Dictionaries, and Directories The Data Quality Cleansing Packages, dictionaries, and directories provide referential data for the Data Cleanse and Address Cleanse transforms to use when parsing, standardizing, and cleansing name and address data.
12
© 2011 SAP AG. All rights reserved.
2011
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
BODS10
Lesson: Defining Data Services
Figure 11: Data Services Architecture–Address Server
Cleansing Packages enhance the ability of Data Cleanse to accurately process various forms of global data by including language-specific reference data and parsing rules. Directories provide information on addresses from postal authorities; dictionary files are used to identify, parse, and standardize data such as names, titles, and firm data. Dictionaries also contain acronym, match standard, gender, capitalization, and address information. The Data Services Management Console
2011
© 2011 SAP AG. All rights reserved.
13
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Unit 1: Defining Data Services
BODS10
The Data Services Management Console provides access to these features: •
Administrator Administer Data Services resources, including:
•
– Scheduling, monitoring, and executing batch jobs – Configuring, starting, and stopping real-time services – Configuring Job Server, Access Server, and repository usage – Configuring and managing adapters – Managing users – Publishing batch jobs and real-time services via web services – Reporting on metadata Auto Documentation View, analyze, and print graphical representations of all objects as depicted in Data Services Designer, including their relationships, properties, and more.
•
Data Validation Evaluate the reliability of your target data based on the validation rules you create in your Data Services batch jobs to quickly review, assess, and identify potential inconsistencies or errors in source data.
•
Impact and Lineage Analysis Analyze end-to-end impact and lineage for Data Services tables and columns, and SAP BusinessObjects Business Intelligence platform objects such as universes, business views, and reports.
•
Operational Dashboard View dashboards of status and performance execution statistics of Data Services jobs for one or more repositories over a given time period.
•
Data Quality Reports Use data quality reports to view and export SAP Crystal Reports for batch and real-time jobs that include statistics-generating transforms. Report types include job summaries, transform-specific reports, and transform group reports. To generate reports for Match, US Regulatory Address Cleanse, and Global Address Cleanse transforms, you must enable the Generate report data option in the Transform Editor.
Other Data Services Tools There are also several tools to assist you in managing your Data Services installation.
14
© 2011 SAP AG. All rights reserved.
2011
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
BODS10
Lesson: Defining Data Services
The Data Services Repository Manager allows you to create, upgrade, and check the versions of local, central, and profiler repositories. The Data Services Server Manager allows you to add, delete, or edit the properties of Job Servers. It is automatically installed on each computer on which you install a Job Server. Use the Server Manager to define links between Job Servers and repositories. You can link multiple Job Servers on different machines to a single repository (for load balancing) or each Job Server to multiple repositories (with one default) to support individual repositories (for example, separating test and production environments). The License Manager displays the Data Services components for which you currently have a license. The Metadata Integrator allows Data Services to seamlessly share metadata with SAP BusinessObjects Intelligence products. Run the Metadata Integrator to collect metadata into the Data Services repository for Business Views and Universes used by SAP Crystal Reports, Desktop Intelligence documents, and Web Intelligence documents.
Defining Data Services objects Data Services provides you with a variety of objects to use when you are building your data integration and data quality applications.
Figure 12: Data Services Object Types
Data Services objects
2011
© 2011 SAP AG. All rights reserved.
15
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Unit 1: Defining Data Services
BODS10
In Data Services, all entities you add, define, modify, or work with are objects. Some of the most frequently-used objects are: • • • • • •
Projects Jobs Work flows Data flows Transforms Scripts
This diagram shows some common objects.
Figure 13: Data Services Objects
All objects have options, properties, and classes. Each can be modified to change the behavior of the object. Options control the object. For example, to set up a connection to a database, the database name is an option for the connection. Properties describe the object. For example, the name and creation date describe what the object is used for and when it became active. Attributes are properties used to locate and organize objects. Classes define how an object can be used. Every object is either reusable or single-use.
16
© 2011 SAP AG. All rights reserved.
2011
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
BODS10
Lesson: Defining Data Services
Single-use objects appear only as components of other objects. They operate only in the context in which they were created. You cannot copy single-use objects. A reusable object has a single definition and all calls to the object refer to that definition. If you change the definition of the object in one place, and then save the object, the change is reflected to all other calls to the object. Most objects created in Data Services are available for reuse. After you define and save a reusable object, Data Services stores the definition in the repository. You can then reuse the definition as necessary by creating calls to it. For example, a data flow within a project is a reusable object. Multiple jobs, such as a weekly load job and a daily load job, can call the same data flow. If this data flow is changed, both jobs call the new version of the data flow. You can edit reusable objects at any time independent of the current open project. For example, if you open a new project, you can open a data flow and edit it. However, the changes you make to the data flow are not stored until you save them. Defining Relationship between Objects Jobs are composed of work flows and/or data flows: • •
A work flow is the incorporation of several data flows into a sequence. A data flow process transforms source data into target data.
Figure 14: Data Services Object Relationships
2011
© 2011 SAP AG. All rights reserved.
17
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Unit 1: Defining Data Services
BODS10
A work flow orders data flows and the operations that support them. It also defines the interdependencies between data flows. For example, if one target table depends on values from other tables, you can use the work flow to specify the order in which you want Data Services to populate the tables. You can also use work flows to define strategies for handling errors that occur during project execution, or to define conditions for running sections of a project. A data flow defines the basic task that Data Services accomplishes, which involves moving data from one or more sources to one or more target tables or files. You define data flows by identifying the sources from which to extract data, the transformations the data should undergo, and targets. Defining projects and jobs A project is the highest-level object in Designer. Projects provide a way to organize the other objects you create in Designer. A job is the smallest unit of work that you can schedule independently for execution. A project is a single-use object that allows you to group jobs. For example, you can use a project to group jobs that have schedules that depend on one another or that you want to monitor together. Projects have these characteristics: • • •
Projects are listed in the Local Object Library. Only one project can be open at a time. Projects cannot be shared among multiple users.
The objects in a project appear hierarchically in the project area. If a plus sign (+) appears next to an object, you can expand it to view the lower-level objects contained in the object. Data Services displays the contents as both names and icons in the project area hierarchy and in the workspace. Jobs must be associated with a project before they can be executed in the project area of Designer. Using Work Flows Jobs with data flows can be developed without using work flows. However, one should consider nesting data flows inside of work flows by default. This practice can provide various benefits. Always using work flows makes jobs more adaptable to additional development and/or specification changes. For instance, if a job initially consists of four data flows that are to run sequentially, they could be set up without work flows. But what if specification changes require that they be merged into another job instead? The developer would have to replicate their sequence correctly in the other job. If these had been initially added to a work flow, the developer could then have simply copied that work flow into the correct position within the new job. There would be no need to learn, copy, and verify the previous sequence. The change can be made more quickly with greater accuracy.
18
© 2011 SAP AG. All rights reserved.
2011
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
BODS10
Lesson: Defining Data Services
Even if there is one data flow per work flow, there are benefits to adaptability. Initially, it may have been decided that recovery units are not important; the expectation being that if the job fails, the whole process could simply be rerun. However, as data volumes tend to increase, it may be determined that a full reprocessing is too time consuming. The job may then be changed to incorporate work flows to benefit from recovery units to bypass reprocessing of successful steps. However, these changes can be complex and can consume more time than allotted for in a project plan. It also opens up the possibility that units of recovery are not properly defined. Setting these up during initial development when the full analysis of the processing nature is preferred. Note: This course focuses on creating batch jobs using database Datastores and file formats.
Using the Data Services Designer The Data Services Designer interface allows you to plan and organize your data integration and data quality jobs in a visual way. Most of the components of Data Services can be programmed with this interface. Describing the Designer window The Data Services Designer interface consists of a single application window and several embedded supporting windows. The application window contains the menu bar, toolbar, Local Object Library, project area, tool palette, and workspace. Using the Local Object Library The Local Object Library gives you access to the object types listed in the table below. The table shows the tab on which the object type appears in the Local Object Library and describes the Data Services context in which you can use each type of object. You can import objects to and export objects from your Local Object Library as a file. Importing objects from a file overwrites existing objects with the same names in the destination Local Object Library. Whole repositories can be exported in either .atl or .xml format. Using the .xml file format can make repository content easier for you to read. It also allows you to export Data Services to other products. Using the Tool Palette The tool palette is a separate window that appears by default on the right edge of the Designer workspace. You can move the tool palette anywhere on your screen or dock it on any edge of the Designer window. The icons in the tool palette allow you to create new objects in the workspace. Disabled icons occur when there are invalid entries to the diagram open in the workspace.
2011
© 2011 SAP AG. All rights reserved.
19
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Unit 1: Defining Data Services
BODS10
To show the name of each icon, hold the cursor over the icon until the tool tip for the icon appears. When you create an object from the tool palette, you are creating a new definition of an object. If a new object is reusable, it is automatically available in the Local Object Library after you create it. If you select the data flow icon from the tool palette and define a new data flow called DF1, you can later drag that existing data flow from the Local Object Library and add it to another data flow called DF2. Using the Workspace When you open a job or any object within a job hierarchy, the workspace becomes active with your selection. The workspace provides a place to manipulate objects and graphically assemble data movement processes. These processes are represented by icons that you drag and drop into a workspace to create a diagram. This diagram is a visual representation of an entire data movement application or some part of a data movement application. You specify the flow of data by connecting objects in the workspace from left to right in the order you want the data to be moved.
20
© 2011 SAP AG. All rights reserved.
2011
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
BODS10
Lesson: Defining Data Services
Lesson Summary You should now be able to: • Define Data Services objects • Use the Data Services Designer interface
2011
© 2011 SAP AG. All rights reserved.
21
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Unit Summary
BODS10
Unit Summary You should now be able to: • Define Data Services objects • Use the Data Services Designer interface
22
© 2011 SAP AG. All rights reserved.
2011
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
BODS10
Test Your Knowledge
Test Your Knowledge 1.
Which of these objects is single-use? Choose the correct answer(s).
□ □ □ □ 2.
2011
A B C D
Job? Project? Data Flow? Work Flow?
Place these objects in order by their hierarchy: data flows, jobs, projects, and work flows.
© 2011 SAP AG. All rights reserved.
23
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Test Your Knowledge
BODS10
Answers 1.
Which of these objects is single-use? Answer: B Jobs, Data Flows and Work Flows are all reusable.
2.
Place these objects in order by their hierarchy: data flows, jobs, projects, and work flows. Answer: Projects, jobs, work flows and data flows.
24
© 2011 SAP AG. All rights reserved.
2011
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Unit 2 Defining Source and Target Metadata Unit Overview To define data movement requirements in Data Services, you must import source and target metadata. A datastore provides a connection or multiple connections to data sources such as a database. Through the datastore connection, Data Services can import the metadata that describes the data from the source. Data Services uses these datastores to read data from source tables or load data to target tables.
Unit Objectives After completing this unit, you will be able to: • • • • • • •
Crate various types of Datastores Crate various types of Datastores Define system configurations in Data Services Defining flat file formats as a basis for a Datastore Create a Data Services Excel file format Import data from XML documents Unnest data in XML documents
Unit Contents Lesson: Defining Datastores in Data Services............................... 26 Exercise 1: Creating Source and Target Datastores.................... 31 Lesson: Defining Datastores in Data Services............................... 39 Exercise 2: Creating Source and Target Datastores.................... 45 Lesson: Defining Data Services System Configurations ................... 53 Lesson: Defining a Data Services Flat File Format ......................... 55 Exercise 3: Creating a Flat File format ................................... 63 Lesson: Defining Datastore Excel File Formats ............................. 67 Exercise 4: Creating an Excel File Format ............................... 73 Lesson: Defining XML Formats ................................................ 77
2011
© 2011 SAP AG. All rights reserved.
25
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Unit 2: Defining Source and Target Metadata
BODS10
Lesson: Defining Datastores in Data Services Lesson Overview Using Datastores to help define data movement requirements in Data Services.
Lesson Objectives After completing this lesson, you will be able to: •
Crate various types of Datastores
Business Example You are responsible for extracting data into the company's SAP NetWeaver Business Warehouse system and want to convert to using Data Services as the new data transfer process.
Using Datastores A Datastore provides a connection or multiple connections to data sources such as a database. Using the Datastore connection, Data Services can import the metadata that describes the data from the data source.
Figure 15: Datastore
26
© 2011 SAP AG. All rights reserved.
2011
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
BODS10
Lesson: Defining Datastores in Data Services
Data Services uses these Datastores to read data from source tables or load data to target tables. Each source or target must be defined individually and the Datastore options available depend on which Relational Database Management System (RDBMS) or application is used for the Datastore. Database Datastores can be created for the sources: • • • •
IBM DB2, Microsoft SQL Server, Oracle, Sybase, and Teradata databases (using native connections) Other databases (using ODBC) A simple memory storage mechanism using a memory Datastore IMS, VSAM, and various additional legacy systems using BusinessObjects Data Services Mainframe Interfaces such as Attunity and IBM Connectors
The specific information that a Datastore contains depends on the connection. When your database or application changes, you must make corresponding changes in the Datastore information in Data Services as these structural changes are not detected automatically. There are three kinds of Datastores: • • •
Database Datastores: provide a simple way to import metadata directly from a RDBMS. Application Datastores: let users easily import metadata from most Enterprise Resource Planning (ERP) systems. Adapter Datastores: can provide access to an application’s data and metadata or just metadata. For example, if the data source is SQL-compatible, the adapter might be designed to access metadata, while Data Services extracts data from or loads data directly to the application.
Using Adapters to define Datastores Adapters provide access to a third-party application’s data and metadata. Depending on the adapter implementation, adapters can provide: • •
Application metadata browsing Application metadata importing into the Data Services repository
For batch and real-time data movement between Data Services and applications, SAP BusinessObjects offers an Adapter Software Development Kit (SDK) to develop your own custom adapters. You can also buy Data Services prepackaged adapters to access application data and metadata in any application. You can use the Data Mart Accelerator for SAP Crystal Reports adapter to import metadata from SAP BusinessObjects Business Intelligence platform. You need to create at least one Datastore for each database file system with which you are exchanging data. To create a Datastore, you must have appropriate access privileges to the database or file system that the Datastore describes.
2011
© 2011 SAP AG. All rights reserved.
27
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Unit 2: Defining Source and Target Metadata
BODS10
Creating a database Datastore 1.
On the Datastores tab of the Local Object Library, right-click the white space and select New from the menu. The Create New Datastore dialog box displays.
2. 3.
4. 5.
In the Datastore name field, enter the name of the new Datastore. The name can contain any alphanumeric characters or underscores, but not spaces. In the Datastore Type drop-down list, ensure that the default value of Database is selected. Note: The values you select for the Datastore type and database type determine the options available when you create a database Datastore. In the Database type drop-down list, select the RDBMS for the data source. Enter the other connection details, as required. Note: If you are using MySQL, any ODBC connection provides access to all of the available MySQL schemas.
6. 7.
Leave the Enable automatic data transfer check box selected. Select OK.
Changing a Datastore definition Like all Data Services objects, Datastores are defined by both options and properties: •
Options control the operation of objects. These include the database server name, database name, user name, and password for the specific database. The Edit Datastore dialog box allows you to edit all connection properties except Datastore name and Datastore type for adapter and application Datastores. For database Datastores, you can edit all connection properties except Datastore name, Datastore type, database type, and database version.
•
28
Properties document the object. For example, the name of the Datastore and the date on which it is created are Datastore properties. Properties are descriptive of the object and do not affect its operation.
© 2011 SAP AG. All rights reserved.
2011
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
BODS10
Lesson: Defining Datastores in Data Services
Properties Tab
Description
General
Contains the name and description of the Datastore, if available. The Datastore name appears on the object in the Local Object Library and in calls to the object. You cannot change the name of a Datastore after creation.
Attributes
Includes the date you created the Datastore. This value is not changeable.
Class Attributes
Includes overall Datastore information such as description and date created.
Importing metadata from data sources Data Services determines and stores a specific set of metadata information for tables. You can import metadata by name, searching, and browsing. After importing metadata, you can edit column names, descriptions, and data types. The edits are propagated to all objects that call these objects.
Figure 16: Datastore Metadata
Metadata
Description
Table name
The name of the table as it appears in the database.
Table description
The description of the table.
Column name
The name of the table column.
Column description
The description of the column.
Column data type
The data type for each column. If a column is defined as an unsupported data type (see data types listed below) Data Services converts the data type to one that is supported. In some cases, if Data Services cannot convert the data type, it ignores the column entirely. Supported data types are: BLOB, CLOB, date, datetime, decimal, double, int, interval, long, numeric, real, time, time stamp, and varchar.
2011
© 2011 SAP AG. All rights reserved.
29
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Unit 2: Defining Source and Target Metadata
Primary key column
BODS10
The column that comprises the primary key for the table. After a table has been added to a data flow diagram, these columns is indicated in the column list by a key icon next to the column name.
Table attribute
Information Data Services records about the table such as the date created and date modified if these values are available.
Owner name
Name of the table owner.
You can also import stored procedures from DB2, MS SQL Server, Oracle, and Sybase databases and stored functions and packages from Oracle. You can use these functions and procedures in the extraction specifications you give Data Services. Imported functions and procedures appear in the Function branch of each Datastore tree on the Datastores tab of the Local Object Library. Importing metadata from data sources The easiest way to import metadata is by browsing. Note that functions cannot be imported using this method. To import metadata by browsing: 1.
On the Datastores tab of the Local Object Library, right-click the Datastore and select Open from the menu. The items available to import appear in the workspace.
2.
Navigate to and select the tables for which you want to import metadata. You can hold down the Ctrl or Shift keys and select to select multiple tables.
3.
Right-click the selected items and select Import from the menu. The workspace contains columns that indicate whether the table has already been imported into Data Services (Imported) and if the table schema has changed since it was imported (Changed). To verify whether the repository contains the most recent metadata for an object, right-click the object and select Reconcile.
4. 5.
30
In the Local Object Library, expand the Datastore to display the list of imported objects, organized into Functions, Tables, and Template Tables. To view data for an imported Datastore, right-click a table and select View Data from the menu.
© 2011 SAP AG. All rights reserved.
2011
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
BODS10
Lesson: Defining Datastores in Data Services
Exercise 1: Creating Source and Target Datastores Exercise Objectives After completing this exercise, you will be able to: • Create Datastores and import metadata for the Alpha Acquisitions, Beta Businesses, Delta, HR Data Mart and Omega databases
Business Example [Enter a business example that helps the learner understand the practical business use of this exercise.]
Task 1: Start the SAP BusinessObjects Data Services Designer. 1.
Log in to the Data Services Designer.
Task 2: Create Datastores and import metadata for the Alpha Acquisitions, Beta Businesses, Delta, HR_Data Mart and Omega databases
2011
1.
In your Local Object Library, create a new source Datastore for the Alpha Acquisitions database.
2.
In your Local Object Library, create a new source Datastore for the Beta Businesses database.
3.
In your Local Object Library, create a new Datastore for the Delta staging database.
4.
In your Local Object Library, create a new target Datastore for the HR data mart.
5.
In your Local Object Library, create a new target Datastore for the Omega data warehouse.
© 2011 SAP AG. All rights reserved.
31
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Unit 2: Defining Source and Target Metadata
BODS10
Solution 1: Creating Source and Target Datastores Task 1: Start the SAP BusinessObjects Data Services Designer. 1.
Log in to the Data Services Designer. a)
From the Windows Terminal Server (WTS) training environment desktop, use the menu path Start → Programs → SAP Business Objects Data Services 4.0 SP1 → Data Services Designer.
b)
In the dialog box, enter your assigned User ID.
c)
Enter your password which is the same as your User ID.
d)
Select the Log on button.
e)
In the list of repositories, select your repository DSREPO## where ## is the number portion of your User ID.
f)
Select the OK button.
Task 2: Create Datastores and import metadata for the Alpha Acquisitions, Beta Businesses, Delta, HR Data Mart and Omega databases
1. In your Local Object Library, create a new source Datastore for the Alpha Acquisitions database.
   a) In the Local Object Library, select the Datastores tab and right-click the Databases node to select the New option.
      Note: When you select the Datastores tab, you will notice that the CD_DS_d0cafae2 datastore already exists. This is an internal datastore which Data Services uses for executing data quality jobs only. The Data Services Integrator does not use this internal datastore. Do not delete or alter the CD_DS_d0cafae2 datastore in any way.
   b) In the resulting dialog box, use the options:
      Field                  Value
      Datastore name         Alpha
      Datastore type         Database
      Database type          Microsoft SQL Server
      Database version       Microsoft SQL Server 2005
      Database server name   Supplied by the Instructor
      Database name          ALPHA
      User name              sourceuser
      Password               sourcepass
   c) Import the metadata for the Alpha Acquisitions database source tables by selecting all the tables, right-clicking them, and choosing the option Import from the menu:
      • source.category
      • source.city
      • source.country
      • source.customer
      • source.department
      • source.employee
      • source.hr_comp_details
      • source.order_details
      • source.orders
      • source.product
      • source.region
   d) View the data for the category table and confirm that there are four records by right-clicking the table in the Local Object Library and choosing the option View data.
2. In your Local Object Library, create a new source Datastore for the Beta Businesses database.
   a) In the Local Object Library, select the Datastores tab and right-click the Databases node to select the New option.
   b) In the resulting dialog box, use the options:
      Field                  Value
      Datastore name         Beta
      Datastore type         Database
      Database type          Microsoft SQL Server
      Database version       Microsoft SQL Server 2005
      Database server name   Supplied by the Instructor
      Database name          BETA
      User name              sourceuser
      Password               sourcepass
   c) Import the metadata for the Beta database source tables by selecting all the tables, right-clicking them, and choosing the option Import from the menu:
      • source.addresses
      • source.categories
      • source.country
      • source.customer
      • source.employees
      • source.order_details
      • source.orders
      • source.products
      • source.region
      • source.suppliers
      • source.usa_customers
   d) View the data for the usa_customers table and confirm that Jane Hartley from Planview Inc. is the first customer record by right-clicking the table in the Local Object Library and choosing the option View data.
3. In your Local Object Library, create a new Datastore for the Delta staging database.
   a) In the Local Object Library, select the Datastores tab and right-click the Databases node to select the New option.
   b) In the resulting dialog box, use the options:
      Field                  Value
      Datastore name         Delta
      Datastore type         Database
      Database type          Microsoft SQL Server
      Database version       Microsoft SQL Server 2005
      Database server name   Supplied by the Instructor
      Database name          DELTA## (where ## is the number from your User ID)
      User name              student##
      Password               student##
   c) You do not have to import any metadata for the Delta staging database.
4. In your Local Object Library, create a new target Datastore for the HR data mart.
   a) In the Local Object Library, select the Datastores tab and right-click the Databases node to select the New option.
   b) In the resulting dialog box, use the options:
      Field                  Value
      Datastore name         HR_datamart
      Datastore type         Database
      Database type          Microsoft SQL Server
      Database version       Microsoft SQL Server 2005
      Database server name   Supplied by the Instructor
      Database name          HR_DATAMART## (where ## is the number from your User ID)
      User name              student##
      Password               student##
   c) Import the metadata for the HR data mart target tables by selecting all the tables, right-clicking them, and choosing the option Import from the menu:
      • dbo.emp_dept
      • dbo.employee
      • dbo.hr_comp_update
      • dbo.recovery_status
5. In your Local Object Library, create a new target Datastore for the Omega data warehouse.
   a) In the Local Object Library, select the Datastores tab and right-click the Databases node to select the New option.
   b) In the resulting dialog box, use the options:
      Field                  Value
      Datastore name         Omega
      Datastore type         Database
      Database type          Microsoft SQL Server
      Database version       Microsoft SQL Server 2005
      Database server name   Supplied by the Instructor
      Database name          OMEGA## (where ## is the number from your User ID)
      User name              student##
      Password               student##
   c) Import the metadata for the Omega database target tables by selecting all the tables, right-clicking them, and choosing the option Import from the menu:
      • dbo.emp_dim
      • dbo.product_dim
      • dbo.product_target
      • dbo.time_dim
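For reference only, the options captured in these Datastores are the same connection details any database client would need. The following Python sketch shows, outside Data Services, what the Alpha settings above roughly correspond to; the ODBC driver name and the pyodbc library are assumptions made for illustration, since Data Services connects through its own engine.

import pyodbc  # illustrative only; Data Services does not use pyodbc

# Connection details as entered in the Alpha Datastore dialog above.
connection = pyodbc.connect(
    "DRIVER={SQL Server};"
    "SERVER=<database server name supplied by the instructor>;"
    "DATABASE=ALPHA;"
    "UID=sourceuser;"
    "PWD=sourcepass"
)
# The category table should contain four records, as verified in step d).
print(connection.cursor().execute("SELECT COUNT(*) FROM source.category").fetchone()[0])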
Lesson Summary
You should now be able to:
• Create various types of Datastores
Lesson: Defining Data Services System Configurations
Lesson Overview
Defining system configurations to enable different connections for a multi-use environment (development, test, and production).
Lesson Objectives
After completing this lesson, you will be able to:
• Define system configurations in Data Services
Business Example
You are responsible for extracting data into the company's SAP NetWeaver Business Warehouse system and want to convert to using Data Services as the new data transfer process. To support a multi-use environment (development, test, and production), you want to know how to create system configurations.
Using Data Services system configurations
Data Services supports multiple Datastore configurations, which allow you to change your Datastores depending on the environment in which you are working. A configuration is a property of a Datastore that refers to a set of configurable options (such as database connection name, database type, user name, password, and locale) and their values. When you create a Datastore, you specify one Datastore configuration at a time and mark one as the default. Data Services uses the default configuration to import metadata and execute jobs. You can create additional Datastore configurations using the Advanced option in the Datastore editor.
You can combine multiple configurations into a system configuration that is selectable when executing or scheduling a job. Multiple configurations and system configurations make your jobs much more portable (for example, across the different connections for development, test, and production environments). When you add a new configuration, Data Services modifies the language of data flows that contain table targets and SQL transforms in the Datastore, based on what you defined in the new configuration.
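Conceptually, a system configuration is simply a named selection of one configuration per Datastore. The following Python sketch illustrates the idea; the environment names and settings are invented for illustration, and this is not how Data Services stores configurations internally.

# Each Datastore has several named configurations; a system configuration
# picks one configuration per Datastore for a given run (for example dev or prod).
datastore_configs = {
    "Alpha": {
        "dev":  {"database_type": "Microsoft SQL Server", "server": "dev-sql",  "database": "ALPHA"},
        "prod": {"database_type": "Microsoft SQL Server", "server": "prod-sql", "database": "ALPHA"},
    },
    "Omega": {
        "dev":  {"database_type": "Microsoft SQL Server", "server": "dev-sql",  "database": "OMEGA"},
        "prod": {"database_type": "Microsoft SQL Server", "server": "prod-sql", "database": "OMEGA"},
    },
}

system_configurations = {
    "Development": {"Alpha": "dev",  "Omega": "dev"},
    "Production":  {"Alpha": "prod", "Omega": "prod"},
}

def resolve(system_config_name):
    """Return the connection settings each Datastore would use for this run."""
    selection = system_configurations[system_config_name]
    return {ds: datastore_configs[ds][cfg] for ds, cfg in selection.items()}

print(resolve("Production")["Alpha"]["server"])   # prod-sql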
Lesson Summary
You should now be able to:
• Define system configurations in Data Services
Lesson: Defining a Data Services Flat File Format
Lesson Overview
Using flat file formats to create Datastores to help define data movement requirements in Data Services.
Lesson Objectives
After completing this lesson, you will be able to:
• Define flat file formats as a basis for a Datastore
Business Example
You are responsible for extracting flat file data into the company's SAP NetWeaver Business Warehouse system and want to convert to using Data Services as the new data transfer process. You need to know how to create flat file formats as the basis for creating a Datastore.
Defining file formats for flat files
File formats are connections to flat files in the same way that Datastores are connections to databases.
Explaining file formats
A file format is a generic description that can be used to describe one file, or multiple data files if they share the same format. It is a set of properties describing the structure of a flat file (ASCII). File formats are used to connect to source or target data when the data is stored in a flat file. The Local Object Library stores file format templates that you use to define specific file formats as sources and targets in data flows.
Figure 19: File Format Editor
File format objects can describe files in:
• Delimited format: delimiter characters such as commas or tabs separate each field.
• Fixed width format: the fixed column width is specified by the user.
• SAP ERP format: used with the predefined Transport_Format or with a custom SAP ERP format.
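To make the two most common formats concrete, the following Python sketch parses one invented record first as a delimited row and then as a fixed-width row; the field layout and column widths are illustrative and not taken from the course data.

# Illustrative field values and fixed column widths (not from the course data).
values = ["11196", "Planview Inc.", "Frisco", "USA"]
widths = [6, 20, 15, 10]

# Delimited format: a delimiter character (here a semicolon) separates each field.
delimited_record = ";".join(values)
parsed_delimited = delimited_record.split(";")

# Fixed width format: each column occupies a fixed number of characters.
fixed_width_record = "".join(value.ljust(width) for value, width in zip(values, widths))
parsed_fixed, position = [], 0
for width in widths:
    parsed_fixed.append(fixed_width_record[position:position + width].rstrip())
    position += width

print(parsed_delimited)  # ['11196', 'Planview Inc.', 'Frisco', 'USA']
print(parsed_fixed)      # ['11196', 'Planview Inc.', 'Frisco', 'USA']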
Creating file formats
Use the file format editor to set properties for file format templates and source and target file formats. The file format editor has three work areas:
• Property Values: Edit file format property values. Expand and collapse the property groups by clicking the leading plus or minus.
• Column Attributes: Edit and define columns or fields in the file. Field-specific formats override the default format set in the Property Values area.
• Data Preview: View how the settings affect sample data.
The properties and appearance of the work areas vary with the format of the file.
Date formats
In the Property Values work area, you can override default date formats for files at the field level. The following data format codes can be used:
Code     Description
DD       2-digit day of the month
MM       2-digit month
MONTH    Full name of the month
MON      3-character name of the month
YY       2-digit year
YYYY     4-digit year
HH24     2-digit hour of the day (0-23)
MI       2-digit minute (0-59)
SS       2-digit second (0-59)
FF       Up to 9-digit subseconds
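As a rough illustration of what these codes mean, the following Python sketch translates a few of the codes above into strptime directives and parses sample values. The mapping and sample dates are illustrative only and are not the engine's actual implementation.

from datetime import datetime

# Data Services-style codes mapped to Python strptime directives.
# (Order matters: longer codes are replaced before shorter ones.)
CODE_MAP = [
    ("YYYY", "%Y"), ("MONTH", "%B"), ("MON", "%b"), ("HH24", "%H"),
    ("MM", "%m"), ("MI", "%M"), ("SS", "%S"), ("DD", "%d"), ("YY", "%y"),
]

def to_strptime(ds_format):
    """Convert a format such as 'DD-MON-YYYY' or 'YYYY.MM.DD' to a strptime pattern."""
    pattern = ds_format.upper()
    for code, directive in CODE_MAP:
        pattern = pattern.replace(code, directive)
    return pattern

print(datetime.strptime("2006.12.21", to_strptime("YYYY.MM.DD")).date())    # 2006-12-21
print(datetime.strptime("21-DEC-2006", to_strptime("DD-MON-YYYY")).date())  # 2006-12-21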
To create a new file format:
• On the Formats tab of the Local Object Library, right-click Flat Files and select New from the menu to open the File Format Editor. To make sure your file format definition works properly, finish entering the values for the file properties before moving on to the Column Attributes work area.
• In the Type field, specify the file type:
  – Delimited: select this file type if the file uses a character sequence to separate columns.
  – Fixed width: select this file type if the file uses specified widths for each column.
  If a fixed-width file format uses a multibyte code page, no data is displayed in the Data Preview section of the file format editor for its files.
• In the Name field, enter a name that describes this file format template. Once the name has been created, it cannot be changed. If an error is made, the file format must be deleted and a new format created.
• Specify the location information of the data file, including Location, Root directory, and File name. The Group File Read can read multiple flat files with identical formats using a single file format; by substituting a wildcard character or a list of file names for the single file name, multiple files can be read.
• Select Yes to overwrite the existing schema. This happens automatically when you open a file.
• Complete the other properties to describe the files that this template represents. Overwrite the existing schema as required.
• For source files, specify the structure of each column in the Column Attributes work area:
  Column       Description
  Field Name   Enter the name of the column.
  Data Type    Select the appropriate data type from the drop-down list.
  Field Size   For columns with a data type of varchar, specify the length of the field.
  Precision    For columns with a data type of decimal or numeric, specify the precision of the field.
  Scale        For columns with a data type of decimal or numeric, specify the scale of the field.
  Format       For columns with any data type but varchar, select a format for the field, if desired. This information overrides the default format set in the Property Values work area for that data type.
You do not need to specify columns for files used as targets. If you do specify columns and they do not match the output schema from the preceding transform, Data Services writes to the target file using the transform's output schema. For a decimal or real data type, if you only specify a source column format and the column names and data types in the target schema do not match those in the source schema, Data Services cannot use the source column format specified; instead, it defaults to the format used by the code page on the computer where the Job Server is installed.
• Select Save & Close to save the file format and close the file format editor.
• In the Local Object Library, right-click the file format and select View Data from the menu to see the data.
To create a file format from an existing file format:
1. On the Formats tab of the Local Object Library, right-click an existing file format and select Replicate. The File Format Editor opens, displaying the schema of the copied file format.
2. In the Name field, enter a unique name for the replicated file format.
Data Services does not allow you to save the replicated file with the same name as the original (or any other existing File Format object). After it is saved, you cannot modify the name again.
3. Edit the other properties as desired.
4. Select Save & Close to save the file format and close the file format editor.
To read multiple flat files with identical formats using a single file format:
1. On the Formats tab of the Local Object Library, right-click an existing file format and select Edit from the menu. The format must be based on one single file that shares the same schema as the other files.
2. In the location field of the format wizard, enter one of:
   • Root directory (optional, to avoid retyping)
   • List of file names, separated by commas
   • File name containing a wildcard character (*)
When you use the (*) to refer to several files, Data Services reads one file, closes it, and then proceeds to read the next one. For example, if you specify the file name revenue*.txt, Data Services reads all flat files starting with revenue in the file name. There are also unstructured_text and unstructured_binary file reader types for reading all files in a specific folder as long/BLOB records, as well as an option for trimming fixed-width files.
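To picture this behavior outside Data Services, here is a minimal Python sketch that reads every file matching a wildcard pattern, one after another; the file names and delimiter are illustrative assumptions.

import csv
import glob

def read_revenue_files(pattern="revenue*.txt", delimiter=";"):
    """Read every flat file matching the pattern, one after another,
    the way a single file format with a wildcard file name would."""
    rows = []
    for path in sorted(glob.glob(pattern)):   # e.g. revenue_jan.txt, revenue_feb.txt
        with open(path, newline="") as handle:
            reader = csv.reader(handle, delimiter=delimiter)
            next(reader, None)                # skip the header row in each file
            for record in reader:
                rows.append(record)
    return rows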
Figure 20: File Reader Enhancements.
Handling errors in file formats
One of the features available in the File Format Editor is error handling.
Figure 21: Flat File Error Handling
When you enable error handling for a file format, Data Services:
• Checks for the two types of flat-file source errors:
  – Data type conversion errors. For example, a field might be defined in the File Format Editor as having a data type of integer, but the data encountered is actually varchar.
  – Row-format errors. For example, in the case of a fixed-width file, Data Services identifies a row that does not match the expected width value.
• Stops processing the source file after reaching a specified number of invalid rows.
• Logs errors to the Data Services error log. You can limit the number of log entries allowed without stopping the job.
You can choose to write rows with errors to an error file, which is a semicolon-delimited text file that you create on the same machine as the Job Server. Entries in an error file have this syntax:
source file path and name; row number in source file; Data Services error; column number where the error occurred; all columns from the invalid row
To enable flat file error handling in the File Format Editor:
1. On the Formats tab of the Local Object Library, right-click the file format and select Edit from the menu.
2. Under the Error handling section, in the Capture data conversion errors drop-down list, select Yes.
3. In the Capture row format errors drop-down list, select Yes.
4. In the Write error rows to file drop-down list, select Yes. You can also specify the maximum warnings to log and the maximum errors before a job is stopped.
5. In the Error file root directory field, select the folder icon to browse to the directory in which you have stored the error handling text file you created.
6. In the Error file name field, enter the name for the text file you created to capture the flat file error logs in that directory.
7. Select Save & Close.
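For illustration only, the following Python sketch shows how such a semicolon-delimited error file could be inspected afterwards; the file name is hypothetical and the layout simply follows the entry syntax described above.

from pathlib import Path

def summarize_error_file(path="orders_errors.txt"):
    """Each line: source file; row number; error text; column number; offending columns...
    (layout per the error-file syntax described above; the file name is hypothetical)."""
    for line in Path(path).read_text().splitlines():
        parts = line.split(";")
        if len(parts) < 4:
            continue                     # skip lines that do not match the expected layout
        source_file, row_number, error, column_number = parts[0], parts[1], parts[2], parts[3]
        bad_columns = parts[4:]          # all columns from the invalid row
        print(f"{source_file} row {row_number}, column {column_number}: {error}")
        print(f"  rejected values: {bad_columns}")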
Exercise 3: Creating a Flat File Format
Exercise Objectives
After completing this exercise, you will be able to:
• Create a file format for the orders flat file so you can use it as a source object for extraction
Business Example
In addition to the main databases for source information, records for some of the orders for Alpha Acquisitions are stored in flat files. You need to extract data from these flat files, and you want to create the appropriate file format for the extraction.
Task: Create a file format for the orders flat file so you can use it as a source object for extraction.
1. Create a file format Orders_Format for the orders flat file so you can use it as a source object for extraction.
2. Adjust the data types for the columns proposed by the Designer based on their content.
Solution 3: Creating a Flat File Format
Task: Create a file format for the orders flat file so you can use it as a source object for extraction.
1. Create a file format Orders_Format for the orders flat file so you can use it as a source object for extraction.
   a) In the Local Object Library, select the File Formats tab.
   b) Right-click the Flat Files node and choose the option New.
   c) Enter Orders_Format as the format name.
   d) To select the source directory, click the folder icon and select My Documents → BODS10 → Activity_Source.
   e) To select the appropriate file, click the file icon and select the source file orders_12_21_06.txt.
   f) Change the value of the column delimiter to a semicolon by typing in a semicolon.
   g) Change the row delimiter by clicking in the value for this property and using the drop-down box to choose the value Windows new line.
   h) Change the date format by typing in the value yyyy.mm.dd.
   i) Set the value for skipping the row header to 1.
2. Adjust the data types for the columns proposed by the Designer based on their content.
   a) In the Column Attributes pane, change the following field data types:
      Column        Data type
      ORDERID       int
      EMPLOYEEID    varchar(15)
      ORDERDATE     date
      CUSTOMERID    int
      COMPANYNAME   varchar(50)
      CITY          varchar(50)
      COUNTRY       varchar(50)
   b) In the Column Attributes pane, change the format of the ORDERDATE field to dd-mon-yyyy.
   c) Click the button Save & Close.
   d) Right-click your new file format Orders_Format and choose the option View data.
   e) Verify that order 11196 was placed on December 21, 2006.
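Purely for reference, the following Python sketch reads a file laid out the way this solution describes: semicolon-delimited, one header row to skip, and dates written as yyyy.mm.dd. The column order and file contents are assumptions for illustration; Data Services itself does this work through the file format, not through code.

import csv
from datetime import datetime

def read_orders(path="orders_12_21_06.txt"):
    """Illustrative only: read the semicolon-delimited orders file described above,
    skipping the header row and parsing ORDERDATE with the yyyy.mm.dd format."""
    orders = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=";")
        next(reader, None)  # skip the row header (skip row header = 1)
        for order_id, employee_id, order_date, customer_id, company, city, country in reader:
            orders.append({
                "ORDERID": int(order_id),
                "EMPLOYEEID": employee_id,
                "ORDERDATE": datetime.strptime(order_date, "%Y.%m.%d").date(),
                "CUSTOMERID": int(customer_id),
                "COMPANYNAME": company,
                "CITY": city,
                "COUNTRY": country,
            })
    return orders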
Lesson Summary
You should now be able to:
• Define flat file formats as a basis for a Datastore
Lesson: Defining Datastore Excel File Formats
Lesson Overview
Defining Excel workbook file formats so they can be used as native sources in Data Services.
Lesson Objectives
After completing this lesson, you will be able to:
• Create a Data Services Excel file format
Business Example
It is possible to connect to Excel workbooks natively as a source, without any ODBC connection setup and configuration. You want to select specific data in the workbook using custom ranges or auto-detect. You also want to learn how to specify variables for file and sheet names for more flexibility.
Defining file formats for Excel files
You can create file formats for Excel files in the same way that you would for flat files. It is possible to connect to Excel workbooks natively as a source, with no ODBC connection setup and configuration needed. You can select specific data in the workbook using custom ranges or auto-detect, and you can specify variables for file and sheet names for more flexibility.
Figure 22: Excel File Format Editor 1
As with file formats and Datastores, these Excel formats show up as sources in impact and lineage analysis reports.
Figure 23: Excel File Format Editor 2
To import and configure an Excel source:
• On the Formats tab of the Local Object Library, right-click Excel Workbooks and select New from the menu. The Import Excel Workbook dialog box displays.
• In the Format name field, enter a name for the format. The name may contain underscores but not spaces.
• On the Format tab, select the drop-down button beside the Directory field, navigate to and select a new directory, and then select OK.
• Select the drop-down button beside the File name field, navigate to and select an Excel file, and then click Open.
• To select data in the workbook, do one of the following:
  – Select the Named range radio button and enter a value in the field provided.
  – Select the Worksheet radio button and then select the All fields radio button.
  – Select the Worksheet radio button and the Custom range radio button, select the ellipses (...) button, select the cells, and close the Excel file by clicking X in the top right corner of the worksheet.
  If required, select the Extend range checkbox. The Extend range checkbox provides a means to extend the spreadsheet when additional rows of data appear at a later time. If this checkbox is checked, at execution time Data Services searches row by row until a null value row is reached. All rows above the null value row are included.
• If applicable, select the Use first row values as column names option. If this option is selected, field names are based on the first row of the imported Excel sheet.
• Click Import schema. The schema is displayed at the top of the dialog box.
• Specify the structure of each column:
  Column        Description
  Field Name    Enter the name of the column.
  Data Type     Select the appropriate data type from the drop-down list.
  Field Size    For columns with a data type of varchar, specify the length of the field.
  Precision     For columns with a data type of decimal or numeric, specify the precision of the field.
  Scale         For columns with a data type of decimal or numeric, specify the scale of the field.
  Description   If desired, enter a description of the column.
• If required, on the Data Access tab, enter any changes that are required. The Data Access tab provides options to retrieve the file via FTP or execute a custom application (such as unzipping a file) before reading the file.
• Select OK. The newly imported file format appears in the Local Object Library with the other Excel workbooks. The sheet is now available to be selected for use as a native data source.
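To picture the Extend range and Use first row values as column names behaviors, here is a small sketch using the third-party openpyxl library. The workbook and sheet names are placeholders, and this is only an approximation of the behavior described above, not Data Services code.

from openpyxl import load_workbook  # third-party library, used here only for illustration

def read_extended_range(path="workbook.xlsx", sheet="Sheet1"):
    """Use the first row as column names and read rows until the first
    all-empty (null) row, similar to the Extend range behavior described above."""
    worksheet = load_workbook(path, read_only=True)[sheet]
    rows = worksheet.iter_rows(values_only=True)
    headers = next(rows)                      # first row values become column names
    records = []
    for row in rows:
        if all(value is None for value in row):
            break                             # stop at the first null-value row
        records.append(dict(zip(headers, row)))
    return records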
Exercise 4: Creating an Excel File Format
Exercise Objectives
After completing this exercise, you will be able to:
• Create a file format to enable you to use the compensation spreadsheet as a source object
Business Example
Compensation information in the Alpha Acquisitions database is stored in an Excel spreadsheet. To use the information in data flows, you need to create a file format for this Excel file.
Task: Create a file format to enable you to use the compensation spreadsheet as a source object.
1. In the Local Object Library, create a new file format for an Excel workbook called Comp_HR.
Solution 4: Creating an Excel File Format
Task: Create a file format to enable you to use the compensation spreadsheet as a source object.
1. In the Local Object Library, create a new file format for an Excel workbook called Comp_HR.
   a) In the Local Object Library, select the File Formats tab.
   b) Right-click the Excel node to select the option New.
   c) Use Comp_HR as the name for the Excel format.
   d) To change the source folder, click the folder icon and navigate to the folder My Documents → BOI300 → Activity_Source.
   e) To select the Excel file, click the file icon to select the file Comp_HR.xls.
   f) Select the Worksheet radio button.
   g) From the Worksheet drop-down list, select the Comp_HR worksheet.
   h) Click the Ellipses (...) button.
   i) Select all the cells that contain data, including the first row (header row), and close the spreadsheet.
      Hint: There should be approximately 286 rows.
   j) Click the check box option to Extend the range.
   k) Use the first row values for the column names.
   l) Import the schema and adjust the data types for the columns:
      Column             Data type
      EmployeeID         varchar(10)
      Emp_Salary         int
      Emp_Bonus          int
      Emp_VacationDays   int
      date_updated       datetime
   m) Save the format.
   n) Right-click your format Comp_HR and choose the option View data.
   o) Confirm that employee 2Lis5 has 16 vacation days.
Lesson Summary
You should now be able to:
• Create a Data Services Excel file format
Lesson: Defining XML Formats
Lesson Overview
Data Services allows you to import and export metadata for XML documents that you can use as sources or targets in jobs.
Lesson Objectives
After completing this lesson, you will be able to:
• Import data from XML documents
• Unnest data in XML documents
Business Example
Data included in XML files is a source of data for your data warehouse. Data Services can import XML data once an XML format has been created in the Local Object Library. You need to know how to describe this data using either a document type definition (.dtd) or an XML Schema (.xsd). Due to the hierarchical nature of XML data, you also need to know how to unnest XML data during the extraction process.
Defining file formats for XML files
Data Services allows you to import and export metadata for XML documents that you can use as sources or targets in jobs. XML documents are hierarchical, and the set of properties describing their structure is stored in separate format files. These format files describe the data contained in the XML document and the relationships among the data elements, that is, the schema. The format of an XML file or message (.xml) can be specified using either a document type definition (.dtd) or an XML Schema (.xsd). Data flows can read and write data to messages or files based on a specified DTD format or XML Schema, and you can use the same DTD format or XML Schema to describe multiple XML sources or targets. Data Services uses Nested Relational Data Modeling (NRDM) to structure imported metadata from format documents, such as .xsd or .dtd files, into an internal schema to use for hierarchical documents.
Importing metadata from a DTD file
For example, for an XML document that contains the information needed to place a sales order, such as the order header, customer, and line items, the corresponding DTD includes the order structure and the relationship between the data elements.
You can import metadata from either an existing XML file (with a reference to a DTD) or a DTD file. If you import the metadata from an XML file, Data Services automatically retrieves the DTD for that XML file. When importing a DTD format, Data Services reads the defined elements and attributes, and ignores other parts of the file definition, such as text and comments. This allows you to modify the imported XML data and edit the datatypes as needed.
Importing metadata from an XML Schema
For an XML document that contains, for example, the information to place a sales order, such as the order header, customer, and line items, the corresponding XML Schema includes the order structure and the relationship between the data as shown:
Figure 24: Importing Metadata from an XML Schema
When importing an XML Schema, Data Services reads the defined elements and attributes, and imports:
• Document structure
• Table and column names
• Datatype of each column
• Nested table and column attributes
Note: While XML Schemas make a distinction between elements and attributes, Data Services imports and converts them all to nested table and column attributes.
Nested data in XML files
Sales orders are presented using nested data. For example, the line items in a sales order are related to a single header and are represented using a nested schema. Each row of the sales order data set contains a nested line item schema as shown:
Figure 25: Example of Nested Data 1
Using the nested data method can be more concise (no repeated information), and can scale to present a deeper level of hierarchical complexity.
Figure 26: Example of Nested Data 2
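The difference between the flat and the nested representation can be pictured with a small sketch. The rows and column names below are invented for illustration; they are not the course's sample data.

    # Flat representation: the header values repeat on every line-item row.
    flat_rows = [
        {"OrderID": 9001, "CustomerID": "C042", "Material": "M-100", "Quantity": 5},
        {"OrderID": 9001, "CustomerID": "C042", "Material": "M-230", "Quantity": 2},
    ]

    # Nested (NRDM-style) representation: one row per order header, with the
    # line items held in a nested schema inside that row.
    nested_rows = [
        {"OrderID": 9001,
         "CustomerID": "C042",
         "LineItems": [
             {"Material": "M-100", "Quantity": 5},
             {"Material": "M-230", "Quantity": 2},
         ]},
    ]

The nested form stores the header information only once, which is why it stays concise even when the hierarchy becomes deeper.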
To expand on the example above, columns inside a nested schema can also contain columns. There is a unique instance of each nested schema for each row at each level of the relationship as shown. Generalizing further with nested data, each row at each level can have any number of columns containing nested schemas.
Data Services maps nested data to a separate schema implicitly related to a single row and column of the parent schema. This mechanism is called Nested Relational Data Modeling (NRDM). NRDM provides a way to view and manipulate hierarchical relationships within data flow sources, targets, and transforms. In Data Services, you can see the structure of nested data in the input and output schemas of sources, targets, and transforms in data flows.
Unnesting data
Loading a data set that contains nested schemas into a relational target requires that the nested rows be unnested. For example, a sales order may use a nested schema to define the relationship between the order header and the order line items. To load the data into relational schemas, the multiple levels of data must be unnested. Unnesting a schema produces a cross product of the top-level schema (parent) and the nested schema (child).
Figure 27: Unnesting Data 1
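A rough sketch of the unnesting step, reusing the invented nested_rows structure from the earlier sketch. This is not Data Services code; it only illustrates the cross product described above.

    def unnest(rows, nested_column):
        """Cross product: repeat each parent row once per row of its nested schema."""
        result = []
        for parent in rows:
            for child in parent.get(nested_column, []):
                flat = {k: v for k, v in parent.items() if k != nested_column}
                flat.update(child)
                result.append(flat)
        return result

    # With the nested_rows list defined in the earlier sketch,
    # unnest(nested_rows, "LineItems") yields one row per line item, each row
    # carrying the OrderID and CustomerID columns copied from its parent.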
You can also load different columns from different nesting levels into different schemas. For example, a sales order can be flattened so that the order number is maintained separately with each line-item and the header and line-item information are loaded into separate schemas.
Figure 28: Unnesting Data 2
Data Services allows you to unnest any number of nested schemas at any depth. No matter how many levels are involved, the result of unnesting schemas is a cross product of the parent and child schemas. When two or more levels of unnesting occur, the innermost child is unnested first; the result, the cross product of the parent and the innermost child, is then unnested from its own parent, and so on up to the top-level schema.
Figure 29: Unnesting Data 3
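Continuing the sketch, a two-level structure can be flattened by applying the same cross-product step twice, starting with the innermost schema. All names below are invented for illustration.

    # Hypothetical two-level nesting: orders contain line items, which contain
    # schedule lines.
    two_level = [
        {"OrderID": 9001,
         "LineItems": [
             {"Material": "M-100",
              "ScheduleLines": [{"DeliveryDate": "2011-04-01", "Qty": 3},
                                {"DeliveryDate": "2011-04-15", "Qty": 2}]}]},
    ]

    # Step 1: unnest the innermost schema (ScheduleLines) within each line item.
    for order in two_level:
        order["LineItems"] = [
            {**{k: v for k, v in item.items() if k != "ScheduleLines"}, **sched}
            for item in order["LineItems"] for sched in item["ScheduleLines"]]

    # Step 2: unnest the resulting line items from the order header.
    flat = [
        {**{k: v for k, v in order.items() if k != "LineItems"}, **item}
        for order in two_level for item in order["LineItems"]]
    # flat holds one row per schedule line, carrying the line-item and
    # order-header columns, which is the cross product described above.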
Keep in mind that unnesting all schemas to create a cross product of all data might not produce the results you intend. For example, if an order includes multiple customer values such as ship-to and bill-to addresses, flattening a sales order by unnesting customer and line-item schemas produces rows of data that might not be useful for processing the order.
Lesson Summary
You should now be able to:
• Import data from XML documents
• Unnest data in XML documents
Unit Summary
You should now be able to:
• Create various types of Datastores
• Define system configurations in Data Services
• Define flat file formats as a basis for a Datastore
• Create a Data Services Excel file format
• Import data from XML documents
• Unnest data in XML documents
Test Your Knowledge

1.
What is the difference between a Datastore and a repository?
2.
What are the two methods in which metadata can be manipulated in Data Services objects and what does each of these do?
3.
Which is not a Datastore type? Choose the correct answer(s).
□ A Database
□ B Application
□ C Adapter
□ D File Format
4.
What is the difference between a repository and a Datastore?
5.
What is the difference between a Datastore and a repository?
6.
What are the two methods in which metadata can be manipulated in Data Services objects and what does each of these do?
7.
Which is not a Datastore type? Choose the correct answer(s).
□ A Database
□ B Application
□ C Adapter
□ D File Format
8.
What is the difference between a repository and a Datastore?
Answers

1.
What is the difference between a Datastore and a repository? Answer: A Datastore is a connection to a database.
2.
What are the two methods in which metadata can be manipulated in Data Services objects and what does each of these do? Answer: You can use an object's options and properties settings to manipulate Data Services objects. Options control the operation of objects. Properties document the object.
3.
Which is not a Datastore type? Answer: D The File Format is used to create a Datastore and is not a type.
4.
What is the difference between a repository and a Datastore? Answer: A repository is a set of tables that hold system objects, source and target metadata, and transformation rules. A Datastore is an actual connection to a database that holds data.
5.
What is the difference between a Datastore and a repository? Answer: A Datastore is a connection to a database.
6.
What are the two methods in which metadata can be manipulated in Data Services objects and what does each of these do? Answer: You can use an object's options and properties settings to manipulate Data Services objects. Options control the operation of objects. Properties document the object.
7.
Which is not a Datastore type? Answer: D The File Format is used to create a Datastore and is not a type.
8.
What is the difference between a repository and a Datastore? Answer: A repository is a set of tables that hold system objects, source and target metadata, and transformation rules. A Datastore is an actual connection to a database that holds data.
Unit 3: Creating Batch Jobs

Unit Overview
A data flow defines how information is moved from source to target. These data flows are organized into executable jobs, which are grouped into projects.
Unit Objectives
After completing this unit, you will be able to:
• Create a project
• Create and execute a job
• Create a data flow with source and target tables
• Use the Query transform
Unit Contents
Lesson: Creating Batch Jobs
Exercise 5: Creating a Basic Data Flow
Lesson: Creating Batch Jobs

Lesson Overview
Once metadata has been imported for your datastores, you can create data flows to define data movement requirements. Data flows consist of a source and a target connected with a transform. Data flows can then be placed into a work flow as an optional object. Data flows must be placed in a job for execution.
Lesson Objectives
After completing this lesson, you will be able to:
• Create a project
• Create and execute a job
• Create a data flow with source and target tables
• Use the Query transform
Business Example
Your company would like to set up reporting on sales and purchasing data from your SAP source system in SAP NetWeaver Business Warehouse. The extraction of data with a Data Integrator data flow is the first step to securing this data for reporting. You need to know how to build a data flow with source and target tables and a simple Query transform. You also need to know how to execute this data flow manually prior to scheduling.
Working with Data Integrator Objects
Data flows define how information is moved from a source to a target. Data flows are organized into executable jobs, which are grouped into projects.
Figure 30: Data Services Project Area
Creating a project
A project is a single-use object that allows you to group jobs. It is the highest level of organization offered by SAP BusinessObjects Data Services. Opening a project makes one group of objects easily accessible in the user interface. Only one project can be open at a time.
Figure 31: Data Services Project
A project is used solely for organizational purposes. For example, you can use a project to group jobs that have schedules that depend on one another or that you want to monitor together. The objects in a project appear hierarchically in the project area in Designer. If a plus sign appears next to an object, you can expand it to view the lower-level objects.
Creating a job
A job is the only executable object in Data Services. When you are developing your data flows, you can manually execute and test jobs directly in Data Services. In production, you can schedule batch jobs and set up real-time jobs as services that execute a process when Data Services receives a message request.
Figure 32: Data Services Job
A job is made up of steps that are executed together. Each step is represented by an object icon that you place in the workspace to create a job diagram. A job diagram is made up of two or more objects connected together. You can include any of these objects in a job definition:
• Work flows
• Scripts
• Conditionals
• While loops
• Try/catch blocks
• Data flows
  – Source objects
  – Target objects
  – Transforms
If a job becomes complex, you can organize its content into individual work flows, and then create a single job that calls those work flows.
Hint: Follow the recommended consistent naming conventions to facilitate object identification across all systems in your enterprise.
Adding, connecting, and deleting objects in the workspace
After creating a job, you can add objects to the job workspace area using either the Local Object Library or the tool palette.
To add objects from the Local Object Library to the workspace:
1. In the Local Object Library, select the tab for the type of object you want to add.
2. Select and drag the selected object onto the workspace.
To add objects from the tool palette to the workspace, select the desired object in the tool palette, move the cursor to the workspace, and then click the workspace to add the object.
Creating a work flow
A work flow is an optional object that defines the decision-making process for executing other objects.
Figure 33: Data Services Workflow
For example, elements in a work flow can determine the path of execution based on a value set by a previous job or can indicate an alternative path if something goes wrong in the primary path. Ultimately, the purpose of a work flow is to prepare for executing data flows and to set the state of the system after the data flows are complete.
Note: Jobs are just work flows that can be executed. Almost all of the features documented for work flows also apply to jobs.
Work flows can contain data flows, conditionals, while loops, try/catch blocks, and scripts. They can also call other work flows, and you can nest calls to any depth. A work flow can even call itself.
To connect objects in the workspace area, select and drag from the triangle or square of an object to the triangle or square of the next object in the flow.
To disconnect objects in the workspace area, select the connecting line between the objects and press Delete.
Defining the order of execution in work flows
The connections you make between the icons in the workspace determine the order in which work flows execute, unless the jobs containing those work flows execute in parallel. Steps in a work flow execute in a sequence from left to right. You must connect the objects in a work flow when there is a dependency between the steps. To execute more complex work flows in parallel, you can define each sequence as a separate work flow, and then call each of the work flows from another work flow, as in this example:
You can specify that a job executes a particular work flow or data flow only once. If you specify that it should be executed only once, Data Services only executes the first occurrence of the work flow or data flow and skips subsequent occurrences in the job. You might use this feature when developing complex jobs with multiple paths, such as jobs with try/catch blocks or conditionals, and you want to ensure that Data Services executes a particular work flow or data flow only one time.
Creating a data flow
Data flows contain the source, transform, and target objects that represent the key activities in data integration and data quality processes.
Using data flows
Data flows determine how information is extracted from sources, transformed, and loaded into targets. The lines connecting objects in a data flow represent the flow of data through data integration and data quality processes.
Figure 34: Data Services Dataflow
Each icon you place in the data flow diagram becomes a step in the data flow. The objects that you can use as steps in a data flow are source and target objects and transforms. The connections you make between the icons determine the order in which Data Services completes the steps.
Using data flows as steps in work flows
Each step in a data flow, up to the target definition, produces an intermediate result. For example, the results of a SQL statement containing a WHERE clause flow to the next step in the data flow. The intermediate result consists of a set of rows from the previous operation and the schema in which the rows are arranged. This result is called a data set. This data set may, in turn, be further filtered and directed into yet another data set. A small sketch after the following list illustrates this idea.
Data flows are closed operations, even when they are steps in a work flow. Any data set created within a data flow is not available to other steps in the work flow. A work flow does not operate on data sets and cannot provide more data to a data flow; however, a work flow can:
• Call data flows to perform data movement operations.
• Define the conditions appropriate to run data flows.
• Pass parameters to and from data flows.
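The following Python sketch is not Data Services code; it only illustrates how each step produces a new intermediate data set from the previous one. The column names and values are invented.

    # Source step: extract rows (the first intermediate data set).
    source_rows = [
        {"CustomerID": 1, "Country": "DE", "Revenue": 1200},
        {"CustomerID": 2, "Country": "US", "Revenue": 800},
        {"CustomerID": 3, "Country": "DE", "Revenue": 300},
    ]

    # Next step: a filter (comparable to a WHERE clause) yields a new data set.
    filtered = [row for row in source_rows if row["Country"] == "DE"]

    # A further step can reshape the rows again before the target is loaded.
    projected = [{"CustomerID": r["CustomerID"], "Revenue": r["Revenue"]}
                 for r in filtered]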
Changing data flow properties
You can specify these advanced properties for a data flow:
Execute only once: When you specify that a data flow should only execute once, a batch job will never re-execute that data flow after the data flow completes successfully, even if the data flow is contained in a work flow that is a recovery unit that re-executes. You should not select this option if the parent work flow is a recovery unit.
Use database links: Database links are communication paths between one database server and another. Database links allow local users to access data on a remote database, which can be on the local or a remote computer of the same or different database type.
Degree of parallelism: Degree of parallelism (DOP) is a property of a data flow that defines how many times each transform within a data flow replicates to process a parallel subset of data (see the sketch after this list).
Cache type: You can cache data to improve performance of operations such as joins, groups, sorts, filtering, lookups, and table comparisons. Select one of these values: In Memory (choose this value if your data flow processes a small amount of data that can fit in the available memory) or Pageable (choose this value if you want to return only a subset of data at a time to limit the resources required; this is the default).
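The degree of parallelism can be pictured with the following sketch. It is only a conceptual illustration, not how the Data Services engine is implemented; the transform logic and row values are invented.

    from concurrent.futures import ThreadPoolExecutor

    def transform(row):
        # Stand-in for whatever the replicated transform does to one row.
        return {**row, "Revenue": row["Revenue"] * 1.1}

    rows = [{"ID": i, "Revenue": 100 * i} for i in range(9)]
    dop = 3                                       # degree of parallelism
    subsets = [rows[i::dop] for i in range(dop)]  # three parallel subsets of rows

    with ThreadPoolExecutor(max_workers=dop) as pool:
        processed = pool.map(lambda subset: [transform(r) for r in subset], subsets)
        results = [row for subset in processed for row in subset]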
Explaining source and target objects
A data flow directly reads data from source objects and loads data to target objects. Before you can add source and target objects to a data flow, you must first create the datastore and import the table metadata for any databases, or create the file format for flat files.
Table: A file formatted with columns and rows as used in relational databases. (Source and target)
Template table: A template table that has been created and saved in another data flow (used in development). (Source and target)
File: A delimited or fixed-width flat file. (Source and target)
Document: A file with an application-specific format (not readable by SQL or XML parser). (Source and target)
XML file: A file formatted with XML tags. (Source and target)
XML message: A source in real-time jobs. (Source only)
XML template file: An XML file whose format is based on the preceding transform output (used in development, primarily for debugging data flows). (Target only)
Transform: A prebuilt set of operations that can create new data, such as the Date Generation transform. (Source only)
Using the Query transform
The Query transform is the most commonly used transform, and is included in most data flows. It enables you to select data from a source and filter or reformat it as it moves to the target.
Figure 35: Query Transform
Describing the transform editor
The transform editor is a graphical interface for defining the properties of transforms. The workspace can contain these areas:
• Input schema area
• Output schema area
• Parameters area
Figure 36: Query Transform Editor
The input schema area displays the schema of the input data set. For source objects and some transforms, this area is not available. The output schema area displays the schema of the output data set, including any functions. For template tables, the output schema can be defined based on your preferences.
For any data that needs to move from source to target, a relationship must be defined between the input and output schemas. To create this relationship, you must map each input column to the corresponding output column. Below the input and output schema areas is the parameters area. The options available in this area differ based on which transform or object you are modifying.
Explaining the Query transform
The Query transform is used so frequently that it is included in the tool palette with other standard objects. It retrieves a data set that satisfies conditions that you specify, similar to a SQL SELECT statement.
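As a rough analogy only (this is not Data Services syntax, and the column names merely echo the kind of mapping used in the exercise later in this unit), the work a Query transform does can be pictured as selecting, renaming, and filtering columns on the way from the input schema to the output schema:

    input_rows = [
        {"CUSTOMERID": 1, "COMPANYNAME": "Alpha GmbH", "COUNTRYID": "DE"},
        {"CUSTOMERID": 2, "COMPANYNAME": "Beta Inc.", "COUNTRYID": "US"},
    ]

    # Column mappings: output column -> input column (what the Mapping tab records).
    mapping = {"CustomerID": "CUSTOMERID", "Firm": "COMPANYNAME", "Country": "COUNTRYID"}

    # Apply the mapping and a WHERE-style condition to produce the output data set.
    output_rows = [
        {out_col: row[in_col] for out_col, in_col in mapping.items()}
        for row in input_rows
        if row["COUNTRYID"] == "DE"
    ]
    # output_rows == [{'CustomerID': 1, 'Firm': 'Alpha GmbH', 'Country': 'DE'}]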
The Query transform can perform these operations:
• Filter the data extracted from sources.
• Join data from multiple sources.
• Map columns from input to output schemas.
• Perform transformations and functions on the data.
• Perform data nesting and unnesting.
• Add new columns, nested schemas, and function results to the output schema.
• Assign primary keys to output columns.
In the past, you needed three tabs to define joins: FROM (tables), WHERE (join conditions) and OUTER JOIN. Now all information is combined in one tab (FROM). The WHERE tab is still there and is still used for real restrictions/filters.
Figure 37: Query Transform Editor for Joins
Join Ranks and Cache settings were moved to the FROM tab. In addition, outer and inner joins can appear in the same FROM clause.
Figure 38: Query Transform: Join Ranks and Cache Settings
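For readers less familiar with join types, the difference between an inner join and a left outer join of the kind the FROM tab can now express in one place can be sketched in plain Python (invented data, not Data Services syntax):

    orders = [{"OrderID": 1, "CustomerID": 10}, {"OrderID": 2, "CustomerID": 99}]
    customers = {10: {"Name": "Alpha GmbH"}}

    # Inner join: only orders whose customer exists on the other side survive.
    inner = [{**o, **customers[o["CustomerID"]]}
             for o in orders if o["CustomerID"] in customers]

    # Left outer join: every order survives; missing customer columns become None.
    outer = [{**o, **customers.get(o["CustomerID"], {"Name": None})}
             for o in orders]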
For example, you could use the Query transform to select a subset of the data in a table to show only those records from a specific region. The next section gives a brief description of the function, data input requirements, options, and data output results for the Query transform.
Input/Output
The data input is a data set from one or more sources with rows flagged with a NORMAL operation code. The NORMAL operation code creates a new row in the target. All the rows in a data set are flagged as NORMAL when they are extracted by a source table or file. If a row is flagged as NORMAL when loaded into a target table or file, it is inserted as a new row in the target. The data output is a data set based on the conditions you specify and using the schema specified in the output schema area.
Note: When working with nested data from an XML file, you can use the Query transform to unnest the data using the right-click menu for the output schema, which provides options for unnesting.
The input schema area displays all schemas input to the Query transform as a hierarchical tree. Each input schema can contain multiple columns. The output schema area displays the schema output from the Query transform as a hierarchical tree. The output schema can contain multiple columns and functions. The parameters area of the Query transform includes these tabs:
Mapping: Specify how the selected output column is derived.
SELECT: Select only distinct rows (discarding any duplicate rows).
FROM: Specify the input schemas used in the current output schema.
OUTER JOIN: Specify an inner table and an outer table for joins that you want treated as outer joins.
WHERE: Set conditions that determine which rows are output.
GROUP BY: Specify a list of columns for which you want to combine output. For each unique set of values in the group by list, Data Services combines or aggregates the values in the remaining columns.
ORDER BY: Specify the columns you want used to sort the output data set.
Advanced: Create separate sub data flows to process any of these resource-intensive query clauses: DISTINCT, GROUP BY, JOIN, ORDER BY.
Find: Search for a specific word or item in the input schema or the output schema.
To map input columns to output columns, in the transform editor, do one of these actions:
– Drag and drop a single column from the input schema area into the output schema area.
– Drag a single input column over the corresponding output column, release the cursor, and select Remap Column from the menu.
– Select multiple input columns (using Ctrl+click or Shift+click) and drag them onto the Query output schema for automatic mapping.
– Select the output column and manually enter the mapping on the Mapping tab in the parameters area. You can either type the column name in the parameters area or select and drag the column from the input schema pane.
– Select the output column, then highlight and manually delete the mapping on the Mapping tab in the parameters area.
Using target tables
The target object for your data flow can be either a physical table or file, or a template table. When your target object is a physical table in a database, the target table editor opens in the workspace with different tabs where you can set database type properties, table loading options, and tuning techniques for loading a job.
Note: Most of the tabs in the target table editor focus on migration or performance-tuning techniques, which are outside the scope of this course.
You can set these table loading options in the Options tab of the target table editor:
Rows per commit: Specifies the transaction size in number of rows.
Column comparison: Specifies how the input columns are mapped to output columns. There are two options: Compare_by_position (disregards the column names and maps source columns to target columns by position) and Compare_by_name (maps source columns to target columns by name). Validation errors occur if the data types of the columns do not match.
Delete data from table before loading: Sends a TRUNCATE statement to clear the contents of the table before loading during batch jobs. Defaults to not selected.
Number of loaders: Specifies the number of loaders (to a maximum of five) and the number of rows per commit that each loader receives during parallel loading. For example, if you choose a Rows per commit of 1000 and set the number of loaders to three, the first 1000 rows are sent to the first loader, the second 1000 rows to the second loader, the third 1000 rows to the third loader, and the next 1000 rows back to the first loader.
Use overflow file: Writes rows that cannot be loaded to the overflow file for recovery purposes. Options are enabled for the file name and file format. The overflow format can include the data rejected and the operation being performed (write_data) or the SQL command used to produce the rejected operation (write_sql).
Ignore columns with value: Specifies a value that might appear in a source column that you do not want updated in the target table. When this value appears in the source column, the corresponding target column is not updated during auto correct loading. You can enter spaces.
Ignore columns with null: Ensures that NULL source columns are not updated in the target table during auto correct loading.
Use input keys: Enables Data Integrator to use the primary keys from the source table. By default, Data Integrator uses the primary key of the target table.
Update key columns: Updates key column values when it loads data to the target.
Auto correct load: Ensures that the same row is not duplicated in a target table, which is particularly useful for data recovery operations. When Auto correct load is selected, Data Integrator reads a row from the source and checks if a row exists in the target table with the same values in the primary key. If a matching row does not exist, it inserts the new row regardless of other options. If a matching row exists, it updates the row depending on the values of Ignore columns with value and Ignore columns with null (a rough sketch of this behavior follows this list).
Include in transaction: Indicates that this target is included in the transaction processed by a batch or real-time job. This option allows you to commit data to multiple tables as part of the same transaction. If loading fails for any one of the tables, no data is committed to any of the tables. The tables must be from the same datastore. Transactional loading can require rows to be buffered to ensure the correct load order; if the data being buffered is larger than the virtual memory available, Data Integrator reports a memory error. If you choose to enable transactional loading, these options are not available: Rows per commit, Use overflow file and overflow file specification, Number of loaders, Enable partitioning, and Delete data from table before loading. Data Integrator also does not parameterize SQL or push operations to the database if transactional loading is enabled.
Transaction order: Indicates where this table falls in the loading order of the tables being loaded. By default, there is no ordering. All loaders have a transaction order of zero. If you specify orders among the tables, the loading operations are applied according to the order. Tables with the same transaction order are loaded together. Tables with a transaction order of zero are loaded at the discretion of the data flow process.
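The interaction of Auto correct load with the two Ignore options can be sketched roughly as follows. This is only an illustration of the rule described in the list above, not the actual loader implementation, and the function and column names are invented.

    def auto_correct_load(target, source_row, key, ignore_value=None, ignore_null=True):
        """Insert the row if its key is new; otherwise update, skipping ignored columns."""
        existing = next((r for r in target if r[key] == source_row[key]), None)
        if existing is None:
            target.append(dict(source_row))       # no match on the primary key: insert
            return
        for column, value in source_row.items():  # match found: selective update
            if ignore_null and value is None:
                continue                           # Ignore columns with null
            if ignore_value is not None and value == ignore_value:
                continue                           # Ignore columns with value
            existing[column] = value

    target = [{"EmployeeID": "E1", "Emp_Salary": 50000}]
    auto_correct_load(target, {"EmployeeID": "E1", "Emp_Salary": None}, key="EmployeeID")
    # Emp_Salary stays 50000 because NULL source columns are ignored.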
Using template tables
During the initial design of an application, you might find it convenient to use template tables to represent database tables. Template tables are particularly useful in early application development when you are designing and testing a project. With template tables, you do not have to initially create a new table in your DBMS and import the metadata into Data Services. Instead, Data Services automatically creates the table in the database with the schema defined by the data flow when you execute a job.
Figure 39: Template Tables
After creating a template table as a target in one data flow, you can use it as a source in other data flows. Although a template table can be used as a source table in multiple data flows, it can be used only as a target in one data flow. You can modify the schema of the template table in the data flow where the table is used as a target. Any changes are automatically applied to any other instances of the template table. After a template table is created in the database, you can convert the template table in the repository to a regular table. You must convert template tables so that you can use the new table in expressions, functions, and transform options. After a template table is converted, you can no longer alter the schema.
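Conceptually, the schema that the data flow defines for a template table implies a table-creation statement of the following kind. This is only a hypothetical illustration; the actual DDL that Data Services issues is generated internally and depends on the target database type.

    # Hypothetical column list, loosely following the template table used later in this unit.
    schema = [("CustomerID", "int"), ("Firm", "varchar(50)"), ("City", "varchar(50)")]

    columns = ",\n  ".join(f"{name} {dtype}" for name, dtype in schema)
    ddl = f"CREATE TABLE alpha_customers (\n  {columns}\n)"
    print(ddl)
    # CREATE TABLE alpha_customers (
    #   CustomerID int,
    #   Firm varchar(50),
    #   City varchar(50)
    # )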
Executing a job
After you create your project, jobs, and associated data flows, you can execute the job in Data Services to move the data from source to target. You can run jobs in two ways:
• Immediate jobs: Data Services initiates both batch and real-time jobs and runs them immediately from within the Designer. For these jobs, both the Designer and the designated Job Server (where the job executes, usually on the same machine) must be running. You run immediate jobs only during the development cycle.
• Scheduled jobs: Batch jobs can be scheduled. To schedule a job, use the Data Services Management Console or a third-party scheduler. The Job Server must be running.
Note: If a job has syntax errors, it does not execute.
Setting execution properties
When you execute a job, the following options are available in the Execution Properties window:
Print all trace messages: Records all trace messages in the log.
Disable data validation statistics collection: Does not collect audit statistics for this specific job execution.
Enable auditing: Collects audit statistics for this specific job execution.
Enable recovery: Enables the automatic recovery feature. When enabled, Data Services saves the results from completed steps and allows you to resume failed jobs.
Recover from last failed execution: Resumes a failed job. Data Services retrieves the results from any steps that were previously executed successfully and re-executes any other steps. This option is a runtime property. It is not available when a job has not yet been executed or when recovery mode was disabled during the previous run.
Collect statistics for optimization: Collects statistics that the Data Services optimizer uses to choose an optimal cache type (in-memory or pageable).
Collect statistics for monitoring: Displays cache statistics in the Performance Monitor in Administrator.
Use collected statistics: Optimizes Data Services to use the cache statistics collected on a previous execution of the job.
System configuration: Specifies the system configuration to use when executing this job. A system configuration defines a set of datastore configurations, which define the datastore connections. If a system configuration is not specified, Data Services uses the default datastore configuration for each datastore. This option is a runtime property that is only available if there are system configurations defined in the repository.
Job Server or Server Group: Specifies the Job Server or server group to execute this job.
Distribution level: Allows a job to be distributed to multiple Job Servers for processing. The options are:
• Job - The entire job executes on one server.
• Data flow - Each data flow within the job executes on a separate server.
• Subdata flow - Each subdata flow (which can be a separate transform or function) within a data flow executes on a separate Job Server.
Exercise 5: Creating a Basic Data Flow

Exercise Objectives
After completing this exercise, you will be able to:
• Use the Query transform to change the schema of the Alpha Acquisitions Customer table
• Move the data from Alpha Acquisitions into the Delta staging database
Business Example
After analyzing the source data, you determine that the structure of the customer data for Beta Businesses is the appropriate structure for the customer data in the Omega data warehouse. You must change the structure of the Alpha Acquisitions customer data to use the same structure in preparation for merging data from both datastores. Since the target table may later be processed by a Data Quality transform, you also define Content Types for the appropriate columns in the target table.
Task 1: Use the Query transform to change the schema of the Alpha Acquisitions Customer table.
1.
Create a new project called Omega.
2.
In the Omega project, create a new batch job Alpha_Customers_Job with a new data flow called Alpha_Customers_DF.
3.
In the workspace for Alpha_Customers_DF, add the Customers table from the Alpha datastore as the source object.
4.
Create a new template table alpha_customers in the Delta datastore as the target object.
5.
Add the Query transform to the workspace between the source and target.
6.
In the transform editor for the Query transform, create output columns.
7.
Map the input columns to the output columns in the Query transform and set the output field CustomerID as the primary key.
Task 2: Execute the job with the default execution properties after saving all created objects.
1.
Execute the job with the default execution properties after saving all created objects.
Solution 5: Creating a Basic Data Flow

Task 1: Use the Query transform to change the schema of the Alpha Acquisitions Customer table.
1.
Create a new project called Omega. a)
From the Project menu, choose the option New → Project.
b)
When the Project New dialog box appears, enter Omega in the Project name field.
c)
Click Create so that the new project appears in the Project area.
2.
In the Omega project, create a new batch job Alpha_Customers_Job with a new data flow called Alpha_Customers_DF. a)
In the Project area, right click the project name and choose New Batch Job from the menu.
b)
Enter the name of the job as Alpha_Customers_Job.
c)
Press Enter to commit the change.
d)
Open the job Alpha_Customers_Job by double-clicking it.
e)
Select the Data Flow icon in the Tool Palette.
f)
Click the workspace where you want to add the data flow.
g)
Enter Alpha_Customers_DF as the name.
h)
Press Enter to commit the change.
i)
Double-click the data flow to open the data flow workspace.
3.
In the workspace for Alpha_Customers_DF, add the Customers table from the Alpha datastore as the source object. a)
In the Local Object Library, select the Datastores tab and then select the Customers table from the Alpha datastore.
b)
Click and drag the object to the data flow workspace and in the context menu, choose the option Make Source.
4.
Create a new template table alpha_customers in the Delta datastore as the target object. a)
In the Tool Palette, click the Template Table icon and click the workspace to add a new template table to the data flow.
b)
In the Create Template dialog box, enter alpha_customers as the template table name.
c)
In the In datastore drop down list, select the Delta datastore as the template table destination.
d)
Click OK.
5.
Add the Query transform to the workspace between the source and target. a)
In the Tool Palette, select the Query transform icon and click the workspace to add a Query template to the data flow.
b)
Connect the source table to the Query transform: select the source table, hold down the mouse button, drag the cursor to the Query transform, and then release the mouse button.
c)
Connect the Query transform to the target template table: select the Query transform, hold down the mouse button, drag the cursor to the target table, and then release the mouse button.
6.
In the transform editor for the Query transform, create output columns. a)
Double-click the Query transform to open the editor.
b)
In the Schema Out workspace, right click Query to choose the option New Output Item and enter the Item name CustomerID with Data Type int.
c)
In the Schema Out workspace, right click CustomerID to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name Firm with Data Type varchar(50) and Content Type Firm.
d)
In the Schema Out workspace, right click Firm to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name ContactName with Data Type varchar(50) and Content Type Name.
e)
In the Schema Out workspace, right click ContactName to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name Title with Data Type varchar(30) and Content Type Title.
f)
In the Schema Out workspace, right click Title to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name Address1 with Data Type varchar(50) and Content Type Address.
g)
In the Schema Out workspace, right click Address1 to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name City with Data Type varchar(50) and Content Type Locality.
h)
In the Schema Out workspace, right click City to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name Region with Data Type varchar(25) and Content Type Region.
i)
In the Schema Out workspace, right click Region to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name PostalCode with Data Type varchar(25) and Content Type Postcode.
j)
In the Schema Out workspace, right click PostalCode to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name Country with Data Type varchar(50) and Content Type Country.
k)
In the Schema Out workspace, right click Country to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name Phone with Data Type varchar(25) and Content Type Phone.
l)
In the Schema Out workspace, right click Phone to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name Fax with Data Type varchar(25) and Content Type Phone.
7.
Map the input columns to the output columns in the Query transform and set the output field CustomerID as the primary key. a)
Drag each field from the input schema to the corresponding field in the output schema according to the following table:
Schema In → Schema Out
CUSTOMERID → CustomerID
COMPANYNAME → Firm
CONTACTNAME → ContactName
CONTACTTITLE → Title
ADDRESS → Address1
CITY → City
REGIONID → Region
POSTALCODE → PostalCode
COUNTRYID → Country
PHONE → Phone
FAX → Fax
b)
Right click the field CustomerID and from the menu choose the option Set as primary key.
c)
Select the Back icon to close the transform editor.
Task 2: Execute the job with the default execution properties after saving all created objects.
1.
Execute the job with the default execution properties after saving all created objects. a)
In the Project area, right click the Alpha_Customers_Job and select Execute from the menu.
b)
Data Services prompts you to save any objects that have not been saved. Select OK.
c)
When the Execution Properties dialog box appears, select OK.
d)
Return to the data flow workspace and confirm that 25 records were loaded by right clicking the target table and choosing the option View Data.
Lesson Summary
You should now be able to:
• Create a project
• Create and execute a job
• Create a data flow with source and target tables
• Use the Query transform
Related Information
• For more information, see "Cache type" in the Data Services Performance Optimization Guide.
• For more information, see "Degree of parallelism" in the Data Services Performance Optimization Guide.
• For more information on the Query transform, see the "Transforms" chapter in the Data Services Reference Guide.
• For more information, see "Distributed Data Flow execution" in the Data Services Designer Guide.
• For more information, see "Database link support for push-down operations across datastores" in the Data Services Performance Optimization Guide.
Unit Summary
You should now be able to:
• Create a project
• Create and execute a job
• Create a data flow with source and target tables
• Use the Query transform
Unit 4: Troubleshooting Batch Jobs

Unit Overview
To document decisions and troubleshoot any issues that arise when executing your jobs, you can validate and add annotations to your jobs, work flows, and data flows. In addition, you can set various trace options and see the trace results in different logs. You can also use the Interactive Debugger as a method of troubleshooting. Setting up audit points, labels, and rules helps you to ensure the correct data is loaded to the target.
Unit Objectives
After completing this unit, you will be able to:
• Use descriptions and annotations
• Set traces on jobs
• Use the View Data function
• Use the Interactive Debugger
• Use auditing in data flows
Unit Contents
Lesson: Setting Traces and Adding Annotations
Exercise 6: Setting traces and annotations
Lesson: Setting Traces and Adding Annotations
Exercise 7: Setting traces and annotations
Lesson: Using the Interactive Debugger
Exercise 8: Using the Interactive Debugger
Lesson: Setting up and Using the Auditing Feature
Exercise 9: Using Auditing in a Data flow
Lesson: Setting Traces and Adding Annotations

Lesson Overview
To document decisions and troubleshoot any issues that arise when executing your jobs, you can validate and add annotations to jobs, work flows, and data flows, set trace options, and debug your jobs. You can also set up audit rules to ensure the correct data is loaded to the target.
Lesson Objectives
After completing this lesson, you will be able to:
• Use descriptions and annotations
• Set traces on jobs
Business Example
Your company has recognized how useful it can be to integrate people, information and business processes in a heterogeneous system landscape and would like to obtain this benefit. Practice has shown, though, that loading large datasets makes considerable demands on hardware and system performance. It is therefore necessary to examine if and how the data records can be loaded into SAP NetWeaver Business Warehouse with a delta process and to understand the modes of operation and the different variants of a delta loading process.
Using descriptions and annotations
Descriptions and annotations are a convenient way to add comments to objects and workspace diagrams.
Figure 40: Annotations and Descriptions
Using descriptions with objects
A description is associated with a particular object. When you import or export a repository object, you also import or export its description. Designer determines when to show object descriptions based on a system-level setting and an object-level setting. Both settings must be activated to view the description for a particular object.
Note: The system-level setting is unique to your setup.
There are three requirements for displaying descriptions:
• A description has been entered into the properties of the object.
• The description is enabled on the properties of that object.
• The global View Enabled Object Descriptions option is enabled.
To show object descriptions at the system level
• From the View menu, select Enabled Descriptions.
Note: The Enabled Descriptions option is only available if it is a viable option.
To add a description to an object
1. In the project area or the workspace, right-click an object and select Properties from the menu. The Properties dialog box displays.
2. In the Description text box, enter your comments.
3. Select OK. If you are modifying the description of a reusable object, Data Services provides a warning message that all instances of the reusable object are affected by the change.
4. Select Yes. The description for the object displays in the Local Object Library.
To display a description in the workspace
• In the workspace, right-click the object and select Enable Object Description from the menu. The description displays in the workspace under the object.
Using annotations to describe objects
An annotation is an object in the workspace that describes a flow, part of a flow, or a diagram. An annotation is associated with the object where it appears. When you import or export a job, work flow, or data flow that includes annotations, you also import or export the associated annotations.
To add an annotation to the workspace
1. In the workspace, from the tool palette, select the Annotation icon and then select the workspace. An annotation appears on the diagram.
2. Double-click the annotation.
3. Add text to the annotation.
4. Click outside of the annotation to commit the changes.
You can resize and move the annotation by clicking and dragging. You cannot hide annotations that you have added to the workspace. However, you can move them out of the way or delete them.
Validating and tracing jobs Validating jobs
It is a good idea to validate your jobs when you are ready for job execution to ensure there are no errors. You can also select and set specific trace properties, which allow you to use the various log files to help you read job execution status or troubleshoot job errors. As a best practice, you want to validate your work as you build objects so that you are not confronted with too many warnings and errors at one time. You can validate your objects as you create a job or you can automatically validate all your jobs before executing them.
Figure 41: Validating Jobs
To validate jobs automatically before job execution
1. From the Tools menu, select Options. The Options dialog box displays.
2. In the Category pane, expand the Designer branch and select General.
3. Select the Perform complete validation before job execution option.
4. Select OK.
To validate objects on demand
1. From the Validation menu, select Validate → Current View or All Objects in View. The Output dialog box displays.
2. To navigate to the object where an error occurred, right-click the validation error message and select Go To Error from the menu.
Tracing jobs Use trace properties to select the information that Data Services monitors and writes to the trace log file during a job. Data Services writes trace messages to the trace log associated with the current Job Server and writes error messages to the error log associated with the current Job Server.
Figure 42: Setting Traces in Job Execution Properties
These trace options are available:
• Row: Writes a message when a transform imports or exports a row.
• Session: Writes a message when the job description is read from the repository, when the job is optimized, and when the job runs.
• Work flow: Writes a message when the work flow description is read from the repository, when the work flow is optimized, when the work flow runs, and when the work flow ends.
• Data flow: Writes a message when the data flow starts and when the data flow successfully finishes or terminates due to error.
• Transform: Writes a message when a transform starts and completes or terminates.
• Custom Transform: Writes a message when a custom transform starts and completes successfully.
• Custom Function: Writes a message of all user invocations of the AE_LogMessage function from custom C code.
• SQL Functions: Writes data retrieved before SQL functions: every row retrieved by the named query before the SQL is submitted in the key_generation function; every row retrieved by the named query before the SQL is submitted in the lookup function (but only if PRE_LOAD_CACHE is not specified); and when mail is sent using the mail_to function. (A sketch of such calls follows this list.)
• SQL Transforms: Writes a message (using the Table Comparison transform) about whether a row exists in the target table that corresponds to an input row from the source table.
• SQL Readers: Writes the SQL query block that a script, query transform, or SQL function submits to the system, and writes the SQL results.
• SQL Loaders: Writes a message when the bulk loader starts, submits a warning message, or completes successfully or unsuccessfully.
• Memory Source: Writes a message for every row retrieved from the memory table.
• Memory Target: Writes a message for every row inserted into the memory table.
• Optimized Data Flow: For SAP BusinessObjects consulting and technical support use.
• Tables: Writes a message when a table is created or dropped.
• Scripts and Script Functions: Writes a message when a script is called, a function is called by a script, and a script successfully completes.
• Trace Parallel Execution: Writes messages describing how data in a data flow is processed in parallel.
• Access Server Communication: Writes messages exchanged between the Access Server and a service provider.
• Stored Procedure: Writes a message when a stored procedure starts and finishes, and includes key values.
• Audit Data: Writes a message that collects a statistic at an audit point and determines if an audit rule passes or fails.
To set trace options
1. From the project area, right-click the job name and do one of these actions:
   • To set trace options for a single instance of the job, select Execute from the menu.
   • To set trace options for every execution of the job, select Properties from the menu.
   Depending on which option you selected, the Execution Properties dialog box or the Properties dialog box displays.
2. Select the Trace tab.
3. Under the Name column, select a trace object name. The Value dropdown list is enabled when you select a trace object name.
4. From the Value dropdown list, select Yes to turn the trace on.
5. Select OK.
Using log files
As a job executes, Data Services produces three log files. You can view these from the project area. The log files are, by default, also set to display automatically in the workspace when you execute a job. You can select the Trace, Monitor, and Error icons to view the log files, which are created during job execution.
Examining trace logs
Use the trace logs to determine where an execution failed, whether the execution steps occur in the order you expect, and which parts of the execution are the most time consuming.
Figure 43: Using the Trace Log
Examining monitor logs
Use monitor logs to quantify the activities of the components of the job. The monitor log lists the time spent in a given component of a job and the number of data rows that streamed through the component.
Figure 44: Using the Monitor and Error Logs
Examining error logs
Use the error logs to determine how an execution failed. If the execution completed without error, the error log is blank.
Using the Monitor tab
The Monitor tab lists the trace logs of all current or most recent executions of a job. The traffic-light icons in the Monitor tab indicate:
• A green light indicates that the job is running. You can right-click and select Kill Job to stop a job that is still running.
• A red light indicates that the job has stopped. You can right-click and select Properties to add a description for a specific trace log. This description is saved with the log, which can be accessed later from the Log tab.
• A red cross indicates that the job encountered an error.
Using the Log tab You can also select the Log tab to view a job’s log history.
To view log files from the project area
1. In the project area, select the Log tab.
2. Select the job for which you want to view the logs.
3. In the workspace, in the Filter dropdown list, select the type of log you want to view.
4. In the list of logs, double-click the log to view details.
5. To copy log content from an open log, select one or more lines and use the key command [Ctrl+C].
Determining the success of the job The best measure of the success of a job is the state of the target data. Always examine your data to make sure the data movement operation produced the results you expect.
Figure 45: View Data in Data Flow
Be sure that:
• Data is not converted to incompatible types or truncated.
• Data is not duplicated in the target.
• Data is not lost between updates of the target.
• Generated keys have been properly incremented.
• Updated values were handled properly.
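One informal way to support some of these checks, sketched here in the Data Services scripting language, is to compare source and target row counts with the sql() function and print the result to the trace log. The datastore and table names are placeholders from the course scenario, $SourceCount and $TargetCount are assumed to be declared as job variables, and this is only an illustration, not a prescribed step.

    # Hypothetical post-load sanity check, placed in a script object after the data flow
    $SourceCount = sql('Alpha', 'SELECT COUNT(*) FROM CUSTOMER');
    $TargetCount = sql('Delta', 'SELECT COUNT(*) FROM ALPHA_CUSTOMERS');
    print('Source rows: [$SourceCount], target rows: [$TargetCount]');
    print(ifthenelse($SourceCount = $TargetCount,
          'Row counts match.', 'Warning: row counts differ - check the load.'));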
If a job fails to execute, check the Job Server icon in the status bar to verify that the Job Server is running. Check that the port number in Designer matches the number specified in the Server Manager. If necessary, you can use the Server Manager Resync button to reset the port number in the Local Object Library.
Exercise 6: Setting traces and annotations
Exercise Objectives
After completing this exercise, you will be able to:
• Use descriptions and annotations
• Set traces on jobs
Business Example You are sharing your jobs with other developers during the project, so you want to make sure that you identify the purpose of the job you created. You also want to ensure that the job is handling the movement of each row appropriately.
Task: You add an annotation to the data flow with an explanation of the purpose of the job.
1. Add an annotation to the workspace of the job you have already created.
2. Execute the Alpha_Customers_Job after enabling the tracing of rows.
Solution 6: Setting traces and annotations
Task: You add an annotation to the data flow with an explanation of the purpose of the job.
1. Add an annotation to the workspace of the job you have already created.
   a) Open the workspace of the Alpha_Customers_Job by selecting the job.
   b) From the Tool Palette, select the icon for an Annotation item and drag it into the workspace beside the data flow. Then click the workspace to add the annotation.
   c) Type in an explanation of the purpose of the job, such as: "The purpose of this job is to move records from the Customer table in the Alpha datastore to a template table, Alpha_Customers, in the Delta staging datastore."
   d) Save all objects you have created by using the Save All icon.
2. Execute the Alpha_Customers_Job after enabling the tracing of rows.
   a) Right-click the Alpha_Customers_Job and select the option Execute.
   b) In the Execution Properties dialog box, select the Trace tab and select the Trace rows option.
   c) Select OK in the Execution Properties dialog box.
   d) In the trace log, you should see an entry for each row added to the log, indicating how it is being handled by the data flow.
Lesson Summary
You should now be able to:
• Use descriptions and annotations
• Set traces on jobs
Lesson: Using the Interactive Debugger
Lesson Overview
To document decisions and troubleshoot any issues that arise when executing your jobs, you can set trace options and debug your jobs using the Interactive Debugger.
Lesson Objectives
After completing this lesson, you will be able to:
• Use the View Data Function
• Use the Interactive Debugger
Business Example Your company has recognized how useful it can be to integrate people, information and business processes in a heterogeneous system landscape and would like to benefit from this. Practice has shown, though, that loading large datasets makes considerable demands on hardware and system performance. It is therefore necessary to examine if and how the data records can be loaded into SAP NetWeaver Business Warehouse with a delta process. You must understand the modes of operation and the different variants of a delta loading process.
Using View Data and the Interactive Debugger
You can debug jobs in Data Services using the View Data and Interactive Debugger features. With View Data, you can view samples of source and target data for your jobs. Using the Interactive Debugger, you can examine what happens to the data after each transform or object in the flow. After completing this unit, you can:
• Use View Data with sources and targets
• Use the Interactive Debugger
• Set filters and breakpoints for a debug session
Using View Data with sources and targets With the View Data feature, you can check the status of data at any point after you import the metadata for a data source, and before or after you process your data flows. You can check the data when you design and test jobs to ensure that your design returns the results you expect.
Figure 52: View Data in Data Flow
View Data allows you to see source data before you execute a job. Using data details you can:
• Create higher quality job designs.
• Scan and analyze imported table and file data from the Local Object Library.
• See the data for those same objects within existing jobs.
• Refer back to the source data after you execute the job.
View Data also allows you to check your target data before executing your job, then look at the changed data after the job executes. In a data flow, you can use one or more View Data panels to compare data between transforms and within source and target objects. View Data displays your data in the rows and columns of a data grid. The path for the selected object displays at the top of the pane. The number of rows displayed is determined by a combination of several conditions:
• Sample size: the number of rows sampled in memory. The default sample size is 1000 rows for imported sources, targets, and transforms.
• Filtering: the filtering options that are selected. If your original data set is smaller or if you use filters, the number of returned rows could be less than the default.
Keep in mind that you can have only two View Data windows open at any time. If you already have two windows open and try to open a third, you are prompted to select which one to close.
To use View Data in source and target tables •
On the Datastore tab of the Local Object Library, right-click a table and select View Data from the menu. The View Data dialog box displays.
To open a View Data pane in a data flow workspace 1.
In the data flow workspace, select the magnifying glass button on a data flow object. A large View Data pane appears beneath the current workspace area.
2.
To compare data, select the magnifying glass button for another object. A second pane appears below the workspace area, and the first pane area shrinks to accommodate it. When both panes are filled and you select another View Data button, a small menu appears containing window placement icons. The black area in each icon indicates the pane you want to replace with a new set of data. When you select a menu option, the data from the latest selected object replaces the data in the corresponding pane.
Using the Interactive Debugger Designer includes an Interactive Debugger that allows you to troubleshoot your jobs by placing filters and breakpoints on lines in a data flow diagram. This enables you to examine and modify data row by row during a debug mode job execution. The Interactive Debugger can also be used without filters and breakpoints. Running the job in debug mode and then navigating to the data flow while remaining in debug mode enables you to drill into each step of the data flow and view the data. When you execute a job in debug mode, Designer displays several additional windows that make up the Interactive Debugger: Call stack, Trace, Variables, and View Data panes.
Figure 53: The Interactive Debugger
The left View Data pane shows the data in the source table, and the right pane shows the rows that have been passed to the query up to the breakpoint. To start the Interactive Debugger 1.
In the project area, right-click the job and select Start debug from the menu. The Debug Properties dialog box displays.
2.
Set properties for the execution. You can specify many of the same properties as you can when executing a job without debugging. In addition, you can specify the number of rows to sample in the Data sample rate field.
3.
Select OK. The debug mode begins. While in debug mode, all other Designer features are set to read-only. A Debug icon is visible in the task bar while the debug is in progress.
4. If you have set breakpoints, in the Interactive Debugger toolbar, select Get next row to move to the next breakpoint.
5. To exit the debug mode, from the Debug menu, select Stop Debug.
Setting filters and breakpoints for a debug session
You can set filters and breakpoints on lines in a data flow diagram before you start a debugging session that allow you to examine and modify data row-by-row during a debug mode job execution. A debug filter functions the same as a simple Query transform with a WHERE clause. You can use a filter if you want to reduce a data set in a debug job execution. The debug filter does not support complex expressions. A breakpoint is the location where a debug job execution pauses and returns control to you. A breakpoint can be based on a condition, or it can be set to break after a specific number of rows. You can place a filter or breakpoint on the line between a source and a transform or two transforms. If you set a filter and a breakpoint on the same line, Data Services applies the filter first, which means that the breakpoint applies to the filtered rows only.
Figure 54: Setting Filters and Breakpoints in the Data Flow
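Read as expressions, the conditions you build in the Set Filter/Breakpoint dialog amount to simple comparisons like the ones sketched below. The column names are illustrative placeholders, and in Designer you assemble the condition from the Column, Operator, and Value dropdowns rather than typing it as text.

    # Hypothetical debug conditions, written out in expression form
    CUSTOMER.COUNTRYID = 1            # filter: only rows with COUNTRYID 1 enter the debug run
    CUSTOMER.CREDIT_LIMIT >= 50000    # breakpoint: pause each time an incoming row meets this condition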
To set filters and breakpoints
1. In the data flow workspace, right-click the line that connects two objects and select Set Filter/Breakpoint from the menu.
2. In the Breakpoint window, in the Column dropdown list, select the column to which the filter or breakpoint applies.
3. In the Operator dropdown list, select the operator for the expression.
4. In the Value field, enter the value to complete the expression. The condition for filters and breakpoints does not use a delimiter for strings.
5. If you are using multiple conditions, repeat steps 2 to 4 for all conditions and select the appropriate operator from the Concatenate all conditions using dropdown list.
6. Select OK.
Exercise 8: Using the Interactive Debugger
Exercise Objectives
After completing this exercise, you will be able to:
• Use the View Data Function
• Use the Interactive Debugger
Business Example
To ensure that your job is processing the data correctly, you want to run the job in debug mode. To minimize the data you have to review in the Interactive Debugger, you set the debug process to show only records with a specific CountryID field value.
Task 1: Execute the Alpha_Customers_Job in debug mode with a subset of records.
1. In the workspace for the Alpha_Customers_Job, add a filter between the source and the Query transform to filter the records so that only customers from the USA are included in the debug session.
Task 2: Once you have confirmed that the structure appears correct, you execute another debug session with all records, breaking after every row.
1. Execute the Alpha_Customers_Job again in debug mode using a breakpoint to stop the debug process after a number of rows.
Solution 8: Using the Interactive Debugger
Task 1: Execute the Alpha_Customers_Job in debug mode with a subset of records.
1. In the workspace for the Alpha_Customers_Job, add a filter between the source and the Query transform to filter the records so that only customers from the USA are included in the debug session.
   a) Open the workspace for the Alpha_Customers_Job, right-click the connection between the source table and the Query transform, and choose Set Filter from the context menu.
   b) In the Filter window, in the Column drop-down list, select the column to which the filter applies.
   c) In the Operator drop-down list, select the "Equals (=)" operator for the expression.
   d) In the Value field, enter the value 1, representing the country USA.
   e) Select OK.
   f) Right-click the Alpha_Customers_Job and select Start debug from the menu.
   g) In the Debug Properties dialog box, set properties for the execution and then select OK. Debug mode begins and all other Designer features are set to read-only. A Debug icon is visible in the task bar while the debug is in progress. You can specify many of the same properties as you can when executing a job without debugging. In addition, you can specify the number of rows to sample in the Data sample rate field.
   h) You should see only five records returned to the template table.
   i) Exit from debug mode by using the menu option Debug → Exit.
Task 2: Once you have confirmed that the structure appears correct, you execute another debug session with all records, breaking after every row.
1. Execute the Alpha_Customers_Job again in debug mode using a breakpoint to stop the debug process after a number of rows.
   a) In the workspace for the Alpha_Customers_Job, right-click the connection between the source table and the Query transform and choose the option Remove Filter.
   b) Right-click the connection between the source table and the Query transform and, from the context menu, choose Set Breakpoint.
   c) In the Breakpoint window, select the checkbox Break after number of rows to enable breaking the debug session during processing, and enter 20 in the field Break after number of rows.
   d) Select OK.
   e) Right-click the Alpha_Customers_Job and, from the menu, choose Start debug.
   f) In the Debug Properties dialog box, set properties for the execution and then select OK.
   g) Debug mode begins and then stops after processing 20 rows. Use the menu path Debug → Step over.
   h) Discard the last row processed from the target table by selecting the last row displayed and selecting the Discard icon in the record display. You will see that the record's field values now appear as if a line has been drawn through each value.
   i) Continue processing by using the menu path Debug → Get next row; the next row is displayed. Continue using the menu path until you get a message that the job is finished.
   j) Exit from debug mode by using the menu option Debug → Exit.
   k) Remove the breakpoint from the data flow by right-clicking it and selecting Delete from the menu.
   l) Use the Save All button in the toolbar.
   m) In the data flow workspace, select the magnifying glass for the target table to view the table records. Note that only 24 of 25 rows were returned, because you rejected one record.
Lesson Summary You should now be able to: • Use the View Data Function • Use the Interactive Debugger
Lesson: Setting up and Using the Auditing Feature Lesson Overview To document decisions and troubleshoot any issues that arise when executing your jobs, you can set up audit rules to ensure the correct data is loaded to the target.
Lesson Objectives After completing this lesson, you will be able to: •
Use auditing in data flows
Business Example Your company has recognized how useful it can be to integrate people, information and business processes in a heterogeneous system landscape and would like to benefit from this. Practice has shown, though, that loading large datasets makes considerable demands on hardware and system performance. It is necessary to examine if and how the data records can be loaded into SAP NetWeaver Business Warehouse with a delta process. You must understand the modes of operation and the different variants of a delta loading process.
Setting up auditing
You can collect audit statistics on the data that flows out of any Data Services object, such as a source, transform, or target. If a transform has multiple distinct outputs (such as Validation or Case), you can audit each output independently.
When you audit data flows, you:
1. Define audit points to collect runtime statistics about the data that flows out of objects. These audit statistics are stored in the Data Services repository.
2. Define rules with these audit statistics to ensure that the data extracted from sources, processed by transforms, and loaded into targets is what you expect.
3. Generate a runtime notification that includes the audit rule that failed and the values of the audit statistics at the time of failure.
4. Display the audit statistics after the job execution to help identify the object in the data flow that might have produced incorrect data.
Defining audit points
An audit point represents the object in a data flow where you collect statistics. You can audit a source, a transform, or a target in a data flow. When you define audit points on objects in a data flow, you specify an audit function. An audit function represents the audit statistic that Data Services collects for a table, output schema, or column. You can choose from these audit functions:
• Count (table or output schema): This function collects two statistics: a good count for rows that were successfully processed, and an error count for rows that generated some type of error if you enabled error handling. The datatype for this function is integer.
• Sum (column): Sum of the numeric values in the column. This function only includes the good rows, and applies only to columns with a datatype of integer, decimal, double, and real.
• Average (column): Average of the numeric values in the column. This function only includes the good rows, and applies only to columns with a datatype of integer, decimal, double, and real.
• Checksum (column): Detect errors in the values in the column by using the checksum value. This function applies only to columns with a datatype of varchar.
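As a small invented illustration of what these functions report: suppose an audited source passes three good rows whose REVENUE values are 100, 200, and 300, and one additional row fails with error handling enabled. The collected statistics would then be:

    Count (good rows)  = 3        Count (error rows) = 1
    Sum over REVENUE   = 100 + 200 + 300 = 600   (good rows only)
    Average of REVENUE = 600 / 3 = 200           (good rows only)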
Defining audit labels An audit label represents the unique name in the data flow that Data Services generates for the audit statistics collected for each audit function that you define. You use these labels to define audit rules for the data flow.
Figure 55: Using Auditing Points, Label and Functions
If the audit point is on a table or output schema, these two labels are generated for the Count audit function:
$Count_objectname
$CountError_objectname
If the audit point is on a column, the audit label is generated with this format:
$auditfunction_objectname
Note: An audit label can become invalid if you delete or rename an object that had an audit point defined on it. Invalid labels are listed as a separate node on the Labels tab. To resolve the issue, you must re-create the labels and then delete the invalid items.
Defining audit rules
Use auditing rules if you want to compare audit statistics for one object against another object. For example, you can use an audit rule if you want to verify that the count of rows from the source table is equal to the rows in the target table. An audit rule is a Boolean expression, which consists of a left-hand-side (LHS), a Boolean operator, and a right-hand-side (RHS). The LHS can be a single audit label, multiple audit labels that form an expression with one or more mathematical operators, or a function with audit labels as parameters. In addition to these, the RHS can also be a constant. These are examples of audit rules:
$Count_CUSTOMER = $Count_CUSTDW
$Sum_ORDER_US + $Sum_ORDER_EUROPE = $Sum_ORDER_DW
round($Avg_ORDER_TOTAL) >= 10000
Defining audit actions
You can choose any combination of the actions listed for notification of an audit failure:
• E-mail to list: Data Services sends a notification of which audit rule failed to the E-mail addresses that you list in this option. Use a comma to separate the list of mail addresses. You can specify a variable for the mail list. This option uses the smtp_to function to send E-mail. You must define the server and sender for the Simple Mail Transfer Protocol (SMTP) in the Data Services Server Manager.
• Script: Data Services executes the custom script that you create in this option (a sketch of such a script follows this list).
• Raise exception: When an audit rule fails, the Error Log shows the rule that failed. The job stops at the first audit rule that fails. This is an example of a message in the Error Log: Audit rule failed <rule> for <data flow Demo_DF>. This action is the default. If you clear this action and an audit rule fails, the job completes successfully and the audit does not write messages to the job log.
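As a rough sketch of what such a custom script might do, a script action could notify an administrator and leave a marker in the trace log. The recipient address, data flow name, and message text below are placeholders, and the exact smtp_to() argument order should be verified against the Data Services function reference.

    # Hypothetical audit-failure script for the Script action
    smtp_to('etl.admin@example.com',
            'Audit rule failed for data flow Alpha_Customers_DF',
            'Source and target row counts do not match - please investigate.',
            0, 0);
    print('Audit failure notification sent.');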
If you choose all three actions, Data Services executes them in the order presented. You can see the audit status in one of these places, depending on the action on failure:
• Raise an exception: Job Error Log, Metadata Reports
• E-mail to list: E-mail message, Metadata Reports
• Script: wherever the custom script sends the audit messages, Metadata Reports
Figure 56: Defining Audit Rules and Actions
To define audit points and rules in a data flow:

1. On the Data Flow tab of the Local Object Library, right-click a data flow and select Audit from the menu. The Audit dialog box displays with a list of the objects you can audit, with any audit functions and labels for those objects.
2. On the Label tab, right-click the object you want to audit and select Properties from the menu. The Schema Properties dialog box displays.
3. In the Audit tab of the Schema Properties dialog box, in the Audit function dropdown list, select the audit function you want to use against this data object type. The audit functions displayed in the dropdown menu depend on the data object type that you have selected. Default values are assigned to the audit labels, which can be changed if required.
4. Select OK.
5. Repeat step 2 to step 4 for all audit points.
6. On the Rule tab, under Auditing Rules, select Add. The expression editor activates and the Custom options become available for use. The expression editor contains three dropdown lists where you specify the audit labels for the objects you want to audit and choose the Boolean expression to use between these labels.
7. In the left-hand-side dropdown list in the expression editor, select the audit label for the object you want to audit.
8. In the operator dropdown list in the expression editor, select a Boolean operator.
9. In the right-hand-side dropdown list in the expression editor, select the audit label for the second object you want to audit. If you want to compare audit statistics for one or more objects against statistics for multiple other objects or a constant, select the Custom radio button, and select the ellipsis button beside Functions. This opens the full-size smart editor where you can drag different functions and labels to use for auditing.
10. Repeat step 7 to step 9 for all audit rules.
11. Under Action on Failure, select the action you want.
12. Select Close.
To trace audit data:

1. In the project area, right-click the job and select Execute from the menu.
2. In the Execution Properties window, select the Trace tab.
3. Select Trace Audit Data.
4. In the Value dropdown list, select Yes.
5. Select OK. The job executes and the job log displays the Audit messages based on the audit function that is used for the audit object.
Choosing audit points
When you choose audit points, consider:

• The Data Services optimizer cannot push down operations after the audit point. Therefore, if the performance of a query that is pushed to the database server is more important than gathering audit statistics from the source, define the first audit point on the query or later in the data flow. For example, suppose your data flow has a source, a Query transform, and a target. The Query has a WHERE clause that is pushed to the database server that significantly reduces the amount of data that returns to Data Services. Define the first audit point on the Query, rather than on the source, to obtain audit statistics on the results.
• If a pushdown_sql function is after an audit point, Data Services cannot execute it.
• The auditing feature is disabled when you run a job with the debugger.
• If you use the CHECKSUM audit function in a job that executes in parallel, Data Services disables the Degree of Parallelism (DOP) for the whole data flow. The order of rows is important for the result of CHECKSUM, and DOP processes the rows in a different order than in the source.
Exercise 9: Using Auditing in a Data flow

Exercise Objectives
After completing this exercise, you will be able to:
• Create audit points, labels and rules to validate the accuracy of a data flow job

Business Example
You must ensure, using the audit logs, that all records from the Customer table in the Alpha database are being moved to the Delta staging database.

Task: In the Local Object Library, set up auditing on the data flow Alpha_Customers_DF by adding audit points to compare the total number of records in the source and target tables.
1.
Add an audit point in the Alpha_Customers_DF data flow to count the total number of records in the source table
2.
Add an audit point in the Alpha_Customers_DF data flow to count the total number of records in the target table
3.
Construct an audit rule so that an exception is entered into the log if the count from both tables is not the same.
4.
Enable auditing for the execution of the Alpha_Customers_Job .
Solution 9: Using Auditing in a Data flow

Task: In the Local Object Library, set up auditing on the data flow Alpha_Customers_DF by adding audit points to compare the total number of records in the source and target tables.

1.
Add an audit point in the Alpha_Customers_DF data flow to count the total number of records in the source table a)
In the Local Object Library, select the Data Flow tab and then right click the data flow Alpha_Customers_DF to select the option Audit.
b)
The Audit dialog box displays with a list of the objects you can audit with any audit functions and labels for those objects. On the Label tab, right click the source table “Customer” and select Properties from the menu. The Schema Properties dialog box displays.
c)
In the Audit tab of the Schema Properties dialog box, go to the field Audit function. Use the drop-down list to select the audit function Count. Then select OK.
2.
Add an audit point in the Alpha_Customers_DF data flow to count the total number of records in the target table
a)
In the Local Object Library, select the Data Flow tab and then right click the data flow Alpha_Customers_DF to select the option Audit.
b)
The Audit dialog box displays with a list of the objects you can audit with any audit functions and labels for those objects. On the Label tab, right click the target table and select Properties from the menu. The Schema Properties dialog box displays.
c)
In the Audit tab of the Schema Properties dialog box, go to the field Audit function. Use the drop-down list to select the audit function Count. Then select OK.
3.
Construct an audit rule so that an exception is entered into the log if the count from both tables is not the same.
a)
In the Rule tab, under Auditing Rules, select Add.
b)
The expression editor opens and contains three drop-down lists where you specify the audit labels for the objects you want to audit and choose the Boolean expression to use between these labels. In the left-hand side drop down list in the expression editor, select the audit label for the source table.
c)
In the operator drop down list in the expression editor, select the Boolean operator Equal (=) so that the rule verifies that the source and target counts match.
d)
In the right-hand drop down list in the expression editor, select the audit label for the target table.
e)
In the expression editor, under Action on failure, select the checkbox for the option Raise exception. Then select Close.
4.
Enable auditing for the execution of the Alpha_Customers_Job.
a)
Right-click the Alpha_Customers_Job to select Execute.
b)
In the Execution Properties dialog box, go to the Execution Options tab and select the checkbox for Enable auditing.
c)
In the Execution Properties dialog box, go to the Trace tab and enable the option Trace audit data.
d)
Then select OK and you see that the audit rule passes validation.
Lesson Summary
You should now be able to:
• Use auditing in data flows
Related Information
For more information on DOP, see "Using Parallel Execution" and "Maximizing the number of push-down operations" in the Data Services Performance Optimization Guide.
Unit Summary
You should now be able to:
• Use descriptions and annotations
• Set traces on jobs
• Use the View Data function
• Use the Interactive Debugger
• Use auditing in data flows
Unit 5: Using Functions, Scripts and Variables

Unit Overview
Data Services gives you the ability to perform complex operations using built-in functions. You can extend the flexibility and reusability of objects by writing scripts, custom functions, and expressions using the Data Services scripting language and variables.
Unit Objectives
After completing this unit, you will be able to:

• Use functions in expressions
• Use the search_replace function
• Use the lookup_ext function
• Use the decode function
• Use variables and parameters
• Use Data Services scripting language
• Create a custom function
Unit Contents
Lesson: Using Built-In Functions
  Exercise 10: Using the search_replace function
  Exercise 11: Using the lookup_ext() function
  Exercise 12: Using the decode function
Lesson: Using Variables, Parameters and Scripts
  Exercise 13: Creating a custom function
Lesson: Using Built-In Functions

Lesson Overview
Data Services gives you the ability to perform complex operations using functions and to extend the flexibility and reusability of built-in functions using other Data Services features.
Lesson Objectives
After completing this lesson, you will be able to:

• Use functions in expressions
• Use the search_replace function
• Use the lookup_ext function
• Use the decode function
Business Example
You want to load data from an external system into SAP NetWeaver Business Warehouse using flat files. You also want to consider the option of loading data by delta upload. As an alternative method, you can use the DB Connect functions for direct data extraction into BW from tables and views of a database management system that is directly connected to BW.
Using functions

Defining functions
Note: Data Services does not support functions that include tables as input or output parameters, except functions imported from SAP ERP.

Listing the types of operations for functions
Functions are grouped into different categories:

• Aggregate Functions: Performs calculations on numeric values.
• Conversion Functions: Converts values to specific data types.
• Custom Functions: Performs functions defined by the user.
• Database Functions: Performs operations specific to databases.
• Date Functions: Performs calculations and conversions on date values.
• Environment Functions: Performs operations specific to your Data Services environment.
• Look Up Functions: Looks up data in other tables.
• Math Functions: Performs complex mathematical operations on numeric values.
• Miscellaneous Functions: Performs various operations.
• String Functions: Performs operations on alphanumeric strings of data.
• System Functions: Performs system operations.
• Validation Functions: Validates specific types of values.
Other types of functions
In addition to these listed built-in functions, you can also use these functions:

• Database and application functions: These functions are specific to your RDBMS. You can import the metadata for database and application functions and use them in Data Services applications. At runtime, Data Services passes the appropriate information to the database or application from which the function was imported. The metadata for a function includes the input, the output, and their data types. If there are restrictions on data passed to the function, such as requiring uppercase values or limiting data to a specific range, you must enforce these restrictions in the input. You can either test the data before extraction or include logic in the data flow that calls the function. You can import stored procedures from DB2, Microsoft SQL Server, Oracle, and Sybase databases. You can also import stored packages from Oracle. Stored functions from SQL Server can also be imported.
• Custom functions: These are functions that you define. You can create your own functions by writing script functions using the Data Services scripting language.
• New cryptographic functions to encrypt and decrypt data using the AES algorithm: The key length used for the encryption can be specified as a parameter (128, 192, or 256). Based on the passphrase, a key with the required length is generated. The passphrase is needed to decrypt the data again. The output of the encryption function is a string of length (size_of_input_string + 16) * 1.3. The syntax is:
  encrypt_AES(input_string, passphrase, key_length)
  decrypt_AES(input_string, passphrase, key_length)
• New gen_UUID function: This is used to generate Universally Unique Identifiers that are unique across space (host, process, thread) and time. Based on RFC 4122, Version 1 (timestamp based), the generated ID is a varchar that is 32 characters long.
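As a minimal sketch of how these functions might be called from a script object (this example is not part of the course exercises; $G_Passphrase, $G_Key, and $G_Cipher are assumed global variables defined on the job):

# Generate a surrogate identifier and protect a sensitive value.
$G_Key = gen_UUID();
$G_Cipher = encrypt_AES('4111-1111-1111-1111', $G_Passphrase, 256);
print('Generated key ' || $G_Key || ', encrypted length: ' || length($G_Cipher));
# The same passphrase and key length recover the original value.
print(decrypt_AES($G_Cipher, $G_Passphrase, 256));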
Using functions in expressions
Functions can be used in expressions to map return values as new columns, which allows columns that are not in the initial input data set to be specified in the output data set.

Defining functions in expressions
Functions are typically used to add columns based on some other value (lookup function) or generated key fields. You can use functions in:

• Transforms: The Query, Case, and SQL transforms support functions.
• Scripts: These are single-use objects used to call functions and assign values to variables in a work flow.
• Conditionals: These are single-use objects used to implement branch logic in a work flow.
• Other custom functions: These are functions that you create as required.
Before you use a function, you need to know if the function’s operation makes sense in the expression you are creating. For example, the “max” function cannot be used in a script or conditional where there is no collection of values on which to operate. You can add existing functions in an expression by using the Smart Editor or the Function wizard. The Smart Editor offers you many options, including variables, data types, keyboard shortcuts, and so on.
Figure 57: Functions: Smart Editor
The Function wizard allows you to define parameters for an existing function and is recommended for defining complex functions.
Figure 58: Functions: Function Wizard
To use the Smart Editor:

1. Open the object in which you want to use an expression.
2. Select the ellipses (...) button and the Smart Editor appears.
3. Select the Functions tab and expand a function category.

To use the Function wizard:

1. Open the object in which you want to use an expression.
2. Select Functions. The Select Function dialog box opens.
3. In the Function list, select a category.
Using the lookup functions
Lookup functions allow you to look up values in other tables to populate columns.

Using lookup tables
Lookup functions allow you to use values from the source table to look up values in other tables to generate the data that populates the target table.
Figure 59: Using the Lookup Function
Lookups enable you to store reusable values in memory to speed up the process. Lookups are useful for values that rarely change. The lookup, lookup_seq, and lookup_ext functions all provide a specialized type of join, similar to an SQL outer join. While a SQL outer join may return multiple matches for a single record in the outer table, lookup functions always return exactly the same number of records that are in the source table. While all lookup functions return one row for each row in the source, they differ in how they choose which of several matching rows to return:

• Lookup does not provide additional options for the lookup expression.
• Lookup_ext allows you to specify an Order by column and a Return policy (Min, Max) to return the record with the highest/lowest value in a given field (for example, a surrogate key).
• Lookup_seq searches in matching records to return a field from the record where the sequence column (for example, effective_date) is closest to but not greater than a specified sequence value (for example, a transaction date).
The lookup_ext function is recommended for lookup operations because of its enhanced options.
Figure 60: Comparison: Lookup and Lookup_ext
You can use this function to retrieve a value in a table or file based on the values in a different source table or file. This function also extends functionality by allowing you to:

• Return multiple columns from a single lookup.
• Choose from more operators to specify a lookup condition.
• Specify a return policy for your lookup.
• Perform multiple (including recursive) lookups.
• Call lookup_ext in scripts and custom functions. This also lets you reuse the lookups packaged inside scripts.
• Define custom SQL using the SQL_override parameter to populate the lookup cache, narrowing large quantities of data for only the sections relevant for your lookup(s).
• Use lookup_ext to dynamically execute SQL.
• Call lookup_ext, using the Function wizard, in the query output mapping to return multiple columns in a Query transform.
• Design jobs to use lookup_ext without having to hard code the name of the translation file at design time.
• Use lookup_ext with memory datastore tables.

Hint: There are two ways to use the lookup_ext function in a Query output schema. The first way is to map to a single output column in the output schema. In this case, the lookup_ext is limited to returning values from a single column from the lookup (translate) table. The second way is to specify a "New Output Function Call" (right mouse select option) in the Query output schema, which opens the Function Wizard. You can then configure the lookup_ext with multiple columns being returned from the lookup (translate) table from a single lookup. This has performance benefits as well as allowing you to easily modify the function call after the initial definition.
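For orientation only, the call that the wizard builds into a mapping is an ordinary function expression. The following sketch uses hypothetical datastore, table, and column names, and the exact argument layout generated by your version of the wizard may differ:

lookup_ext([DS_ALPHA.SOURCE.COUNTRY, 'PRE_LOAD_CACHE', 'MAX'],
           [COUNTRYNAME],
           [NULL],
           [COUNTRYID, '=', CUSTOMER.COUNTRYID])

Read the argument groups left to right: the lookup table with its cache specification and return policy, the column(s) to return, the default value(s) when no match is found, and the lookup condition.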
To create a lookup_ext expression:

1. Open the Query transform. The Query transform should have at least one main source table and one lookup table, and it must be connected to a single target object.
2. Select the output schema column for which the lookup function is being performed.
3. In the Mapping tab, select Functions. The Select Function window opens.
4. In the Function list, select Lookup Functions.
5. In the Function name list, select lookup_ext.
6. Select Next. The Lookup_ext - Select Parameters dialog box displays as in the graphic below.
Figure 61: Lookup_ext Function Parameters
The Lookup_ext function sets the cache parameter to the value PRE_LOAD_CACHE by default. This affects how Data Services uses the records of the lookup table in the cache and has a direct relationship to the performance of the lookup job.
Figure 62: Cache Specification for Lookup
Data Services has cache settings at various points to affect the performance of the jobs.
Figure 63: Cache Guidelines
Using the decode function
You can use the decode function as an alternative to nested if/then/else conditions.

Explaining the decode function
You can use the decode function to return an expression based on the first condition in the specified list of conditions and expressions that evaluates as TRUE. It provides an alternate way to write nested ifthenelse functions. Use this function to apply multiple conditions when you map columns or select columns in a query. For example, you can use this function to put customers into different groupings. The syntax of the decode function uses the format:

decode(condition_and_expression_list, default_expression)

In nested ifthenelse functions, you must write nested conditions and ensure that the parentheses are in the correct places, as in this example:

ifthenelse((EMPNO = 1),'111', ifthenelse((EMPNO = 2),'222', ifthenelse((EMPNO = 3),'333', ifthenelse((EMPNO = 4),'444', 'NO_ID'))))

In the decode function, you list the conditions as in this example:

decode((EMPNO = 1),'111', (EMPNO = 2),'222', (EMPNO = 3),'333', (EMPNO = 4),'444', 'NO_ID')

The decode function is less prone to error than nested ifthenelse functions. To improve performance, Data Services pushes this function to the database server when possible so that the database server, rather than Data Services, evaluates the decode function.
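As a further illustration (the CUSTOMER.SALES column and the grouping values are invented for this example, not part of the course data), a mapping that places customers into groupings could be written as:

decode((CUSTOMER.SALES >= 100000), 'Platinum',
       (CUSTOMER.SALES >= 50000), 'Gold',
       'Standard')

The conditions are evaluated in order: the first condition that evaluates as TRUE supplies the returned value, and the last argument is the default returned when no condition is TRUE.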
Exercise 10: Using the search_replace function

Exercise Objectives
After completing this exercise, you will be able to:
• Use functions in expressions
• Use the search_replace function in an expression to change incorrect titles in your source data

Business Example
When evaluating the customer data for Alpha Acquisitions, you discover a data entry error. The contact title of "Account Manager" has been entered as "Accounting Manager". You want to correct these entries before they are moved to the data warehouse.
Task: Use the search_replace function in an expression to change the contact title from “Accounting Manager” to “Account Manager”.
1.
In the Alpha_Customers_DF workspace, delete an existing expression for the Title column in the Query transform.
2.
Using the Function wizard, create a new expression for the Title column using the search_replace function found under the category of “String” functions.
3.
Execute the Alpha_Customers_Job with the default execution properties after saving all objects you have created.
Solution 10: Using the search_replace function

Task: Use the search_replace function in an expression to change the contact title from "Accounting Manager" to "Account Manager".

1.
In the Alpha_Customers_DF workspace, delete an existing expression for the Title column in the Query transform.
a)
In the Alpha_Customers_DF workspace, open the transform editor for the Query transform by double-clicking the Query transform.
b)
In the Query transform, select the field Title in the output schema.
c)
Go to the Mapping tab for the Title field and delete the existing expression by highlighting it and using the Delete button on your keyboard.
2.
Using the Function wizard, create a new expression for the Title column using the search_replace function found under the category of "String" functions.
a)
Select the Function button and in the Select Function dialog box, open the category of “String Functions”.
b)
From the list of function names, select the search_replace function and select the Next button.
c)
In the Define Input Parameters dialog box, select the drop-down arrow next to the field Input string.
d)
In the Input Parameters dialog box, select by double-clicking the source object source.customer and column Title for the function.
e)
Type in the full string for replacement as Accounting Manager and the replacement string as Account Manager.
f)
Select the Finish button.
3.
Execute the Alpha_Customers_Job with the default execution properties after saving all objects you have created.
a)
Right click the Alpha_Customers_Job listed under the Omega project and select the option Execute.
b)
Data Services prompts you to save any objects that have not been saved by selecting the OK button in the Save all changes and execute dialog box.
c)
Use the default execution properties and select the OK button.
d)
Return to the data flow workspace and right click the target table to choose the option View data. Note that the titles for the affected contacts are changed.
Exercise 11: Using the lookup_ext() function

Exercise Objectives
After completing this exercise, you will be able to:
• Use functions in expressions
• Use the lookup_ext function

Business Example
In the Alpha Acquisitions database, the country for a customer is stored in a separate table and referenced with a code. To speed up access to information in the data warehouse, this lookup should be eliminated.
Task: Use the lookup_ext function to exchange the ID for the country name in the Customers table for Alpha Acquisitions with the actual value from the Countries table.
1.
In the Alpha_Customers_DF workspace, delete an existing expression for the Country column in the Query transform.
2.
Use the Functions wizard to create a new lookup expression using the lookup_ext function.
3.
Execute the Alpha_Customers_Job with the default execution properties after saving all objects you have created.
Solution 11: Using the lookup_ext() function

Task: Use the lookup_ext function to exchange the ID for the country name in the Customers table for Alpha Acquisitions with the actual value from the Countries table.

1.
In the Alpha_Customers_DF workspace, delete an existing expression for the Country column in the Query transform.
a)
In the Alpha_Customers_DF workspace, open the transform editor for the Query transform by double-clicking the Query transform.
b)
In the Query transform, select the field Country in the output schema.
c)
Go to the Mapping tab for the Country field and delete the existing expression by highlighting it and using the Delete button on your keyboard.
2.
Use the Functions wizard to create a new lookup expression using the lookup_ext function.
a)
Select the Function button and in the Select Function dialog box, open the category of “Database Functions”.
b)
From the list of function names, select the lookup_ext function and select the Next button.
c)
In the Lookup_ext - Select Parameters dialog box, enter the parameters:

  Lookup table: ALPHA.SOURCE.COUNTRY
  Condition - Columns in lookup table: COUNTRYID
  Condition - Op.(&): =
  Condition - Expression: customer.COUNTRYID
  Output - Column in lookup table: COUNTRYNAME

d)
Select the Back icon to close the editor.
3.
Execute the Alpha_Customers_Job with the default execution properties after saving all objects you have created.
a)
Right click the Alpha_Customers_Job listed under the Omega project and select the option Execute.
b)
Data Services prompts you to save any objects that have not been saved by selecting the OK button in the Save all changes and execute dialog box.
c)
Use the default execution properties and select the OK button.
d)
Return to the data flow workspace and right click the target table to choose the option View data. Note that the country codes are replaced by the country names.
Exercise 12: Using the decode function

Exercise Objectives
After completing this exercise, you will be able to:
• Use functions in expressions
• Use the decode function

Business Example
You need to calculate the total value of all orders, including their discounts, for reporting purposes. Currently these details are found in different tables.

Task: Use the sum and decode functions to calculate the total value of orders in the Order_Details table.

1.
Create a new batch job called Alpha_Order_Sum_Job with a data flow Alpha_Order_Sum_DF.
2.
In the transform editor for the Query transform, propose a join between the two source tables.
3.
In the Query transform, create a new output column TOTAL_VALUE which will hold the new calculation.
4.
On the Mapping tab of the new output column, construct an expression to calculate the total value of the orders using the decode and sum functions. The discount and order total can be multiplied to determine the total after discount. The decode function allows you to avoid multiplying orders with zero discount by zero. Use the Function wizard to construct the decode portion of the mapping. Then use the Smart Editor to wrap the sum function around the expression. The expression must specify that if the value in the DISCOUNT column is not zero, then the total value of the order is calculated by multiplying the QUANTITY from the order_details table by the COST from the product table. Then that product is multiplied by the value of the DISCOUNT. Otherwise the total value of the order is calculated simply by multiplying the QUANTITY from the order_details table by the COST from the product table. Once these values are calculated for each order, a sum must be calculated for the entire collection of orders.
5.
Now that the expression can calculate the total of the order values, make it possible for the Query transform to begin at the first order through the end of the records in the table by using the Group By tab.
6.
Execute the Alpha_Order_Sum_Job with the default execution properties after saving all objects you have created.
Solution 12: Using the decode function

Task: Use the sum and decode functions to calculate the total value of orders in the Order_Details table.

1.
Create a new batch job called Alpha_Order_Sum_Job with a data flow Alpha_Order_Sum_DF. a)
In the Project area, right click your Omega project and select the option New batch job and enter the name Alpha_Order_Sum_Job.
b)
Right click the Alpha_Order_Sum_Job to select the option New data flow and enter the name Alpha_Order_Sum_DF.
c)
From the Local Object Library, select the tab Datastores and locate the Alpha datastore. From the Alpha datastore, drag and drop the Order_Details table into the Alpha_Order_Sum_DF workspace. In the dialog box, choose the option Make Source.
d)
From the Local Object Library, select the tab Datastores and locate the Alpha datastore. From the Alpha datastore, drag and drop the Products table in to the Alpha_Order_Sum_DF workspace. In the dialog box, choose the option Make Source.
e)
From the tool palette, select the icon for a Template table and then click in the Alpha_Order_Sum_DF workspace to place the template table. Enter order_sum as the table name in the Delta datastore.
f)
From the tool palette, select the icon for a Query transform and then click in the Alpha_Order_Sum_DF workspace to place it.
g)
Connect the Order_Details table to the Query transform by selecting the source table while holding down the mouse button. Drag to the Query transform and release the mouse button.
h)
Connect the Products table to the Query transform by selecting the source table while holding down the mouse button. Drag to the Query transform and release the mouse button.
i)
Connect the Query transform by selecting the transform while holding down the mouse button. Drag to the order_sum table and release the mouse button.
2.
In the transform editor for the Query transform, propose a join between the two source tables. a)
Double click the Query transform to open the transform editor and select the WHERE tab.
b)
In the WHERE tab, select the Propose Join button. The Designer should enter the following code: PRODUCT.PRODUCTID = ORDER_DETAILS.PRODUCTID.
3.
In the Query transform, create a new output column TOTAL_VALUE which will hold the new calculation. a)
Map the ORDERID column from the input schema to the same field in the output schema.
b)
In the output schema, right click the ORDERID column to choose the option New output field and choose the option Below. Then enter the name TOTAL_VALUE with a data type of decimal, precision of 10 and scale of 2.
4.
On the Mapping tab of the new output column, construct an expression to calculate the total value of the orders using the decode and sum functions. The discount and order total can be multiplied to determine the total after discount. The decode function allows you to avoid multiplying orders with zero discount by zero. Use the Function wizard to construct the decode portion of the mapping. Then use the Smart Editor to wrap the sum function around the expression. The expression must specify that if the value in the DISCOUNT column is not zero, then the total value of the order is calculated by multiplying the QUANTITY from the order_details table by the COST from the product table. Then that product is multiplied by the value of the DISCOUNT. Otherwise the total value of the order is calculated simply by multiplying the QUANTITY from the order_details table by the COST from the product table. Once these values are calculated for each order, a sum must be calculated for the entire collection of orders.
a)
Select the icon for the Function wizard and in the Select Function dialog box, select the Miscellaneous Functions category and then select the decode function. Select the Next button. Note: Do not use the base64_decode function.
b)
In the field Conditional expression type in an open parenthesis ( and then select the drop down box arrow and double-click the Order_Details table. Then select the field Discount and then select OK. Continued on next page
c)
In the field Conditional expression now type in the less than symbol followed by the greater than symbol. Hint: This represents the expression "is not equal to". Finally, type in the number zero. Close this expression by typing a close parenthesis ). Note: This expression (ORDER_DETAILS.DISCOUNT <> 0) tests each record to see if the order has a non-zero discount.
d)
In the field Case expression type in two open parentheses (( and then select the drop down box arrow and double-click the Order_Details table. Then select the field Quantity and then select OK. Back in the decode function dialog box, type in an asterisk * which is the symbol for multiplication. Still in the field Case expression, select the drop down box arrow and double-click the Products table. Then select the field Cost and then select OK. Back in the decode function dialog box, type in a close parenthesis ) followed by an asterisk * which is the symbol for multiplication. Still in the field Case expression, select the drop down box arrow and double-click the Order_Details table. Then select the field Discount and then select OK. Close this expression by typing a close parenthesis ). Note: This expression ((ORDER_DETAILS.QUANTITY * PRODUCT.COST) * ORDER_DETAILS.DISCOUNT) is the expression which should be executed if the discount is not zero.
e)
In the field Default expression type in an open parenthesis ( and then select the drop down box arrow and double-click the Order_Details table. Then select the field Quantity and then select OK. Back in the decode function dialog box, type in an asterisk * which is the symbol for multiplication. Still in the field Default expression, select the drop down box arrow and double-click the Products table. Then select the field Cost and then select OK.
Close this expression by typing a close parenthesis ). Note: The expression (ORDER_DETAILS.QUANTITY * PRODUCT.COST) is the expression evaluated for each record which has a zero discount. Note: The final expression should be: sum(decode(order_details.discount <> 0, (order_details.quantity*product.cost)*order_details.discount, order_details.quantity*product.cost)) f)
Select the Finish button to return to the Mapping tab of the Query transform.
g)
In the Mapping tab of the query transform, place the cursor at the beginning of the expression and select the Smart Editor represented by the button with the ellipsis. In the Smart Editor, select the Functions tab and open the Aggregate category node by selecting the plus sign to its left. Now select the sum function and then select the OK button. Note: This will place the sum function at the beginning of the expression followed by an open parenthesis. If you scroll to the end of the expression, you will find that the Smart Editor has properly placed a close parenthesis. Hint: If you validate the expression, the validation will fail. Once you complete the next step, the validation will pass.
5.
Now that the expression can calculate the total of the order values, make it possible for the Query transform to begin at the first order through the end of the records in the table by using the Group By tab. a)
In the Query transform editor, select the Group By tab.
b)
In the Schema In column, select the ORDERID field from the ORDER_DETAILS table and drag it into the Group By tab.
c)
Select the Back icon to close the editor.
6.
Execute the Alpha_Order_Sum_Job with the default execution properties after saving all objects you have created.
a)
Right click the Alpha_Order_Sum_Job listed under the Omega project and select the option Execute.
b)
Data Services prompts you to save any objects that have not been saved by selecting the OK button in the Save all changes and execute dialog box.
c)
Use the default execution properties and select the OK button.
d)
Return to the data flow workspace and view data for the target table to confirm that order 11146 has 204000.00 as a total value.
Lesson Summary
You should now be able to:
• Use functions in expressions
• Use the search_replace function
• Use the lookup_ext function
• Use the decode function
Related Information
For more information on importing functions, see "Custom Datastores" in Chapter 5 of the Data Services Reference Guide.
Lesson: Using Variables, Parameters and Scripts

Lesson Overview
To apply decision-making and branch logic to work flows, you use a combination of scripts, variables, and parameters to calculate and pass information between the objects in your jobs.
Lesson Objectives
After completing this lesson, you will be able to:

• Use variables and parameters
• Use Data Services scripting language
• Create a custom function
Business Example
You want to load data from an external system into SAP NetWeaver Business Warehouse using flat files. You also want to consider the option of loading data by delta upload. As an alternative method, you can use the DB Connect functions for direct data extraction into BW from tables and views of a database management system that is directly connected to BW.
Using scripts, variables and parameters
With the Data Services scripting language, you can assign values to variables, call functions, and use standard string and mathematical operators to transform data and manage work flow.

Defining variables
A variable is a common component in scripts that acts as a placeholder to represent values that have the potential to change each time a job is executed. To make them easy to identify in an expression, variable names start with a dollar sign ($). They can be of any datatype supported by Data Services. You can use variables in expressions in scripts or transforms to facilitate decision making or data manipulation (using arithmetic or character substitution). A variable can be used in a LOOP or IF statement to check the variable's value and decide which step to perform.
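A minimal script sketch of this idea (not taken from the course data; $G_Start_Date is an assumed global variable defined on the job):

# Record the start of the load so that later steps and log messages can refer to it.
$G_Start_Date = sysdate();
print('Load started on ' || to_char($G_Start_Date, 'YYYY.MM.DD'));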
Note that variables can be used to enable the same expression to be used for multiple output files. Variables can be used as file names for:

• Flat file sources and targets
• XML file sources and targets
• XML message targets (executed in the Designer in test mode)
• Document file sources and targets (in an SAP ERP environment)
• Document message sources and targets (in an SAP ERP environment)
In addition to scripts, you can also use variables in a catch or a conditional. A catch is part of a serial sequence called a try/catch block. The try/catch block allows you to specify alternative work flows if errors occur while Data Services is executing a job. A conditional is a single-use object available in work flows that allows you to branch the execution logic based on the results of an expression. The conditional takes the form of an if/then/else statement.

Defining parameters
A parameter is another type of placeholder that calls a variable. This call allows the value from the variable in a job or work flow to be passed to the parameter in a dependent work flow or data flow. Parameters are most commonly used in WHERE clauses.
Figure 64: Variables compared to Parameters
Defining global versus local variables
There are two types of variables: local and global.
Local variables are restricted to the job or work flow in which they are created. You must use parameters to pass local variables to the work flows and data flows in the object. Global variables are also restricted to the job in which they are created. However, they do not require parameters to be passed to work flows and data flows in that job. Instead, you can reference the global variable directly in expressions in any object in that job. Global variables can simplify your work. You can set values for global variables in script objects or using external job, execution, or schedule properties. For example, during production, you can change values for default global variables at runtime from a job's schedule without having to open a job in the Designer. Whether you use global variables or local variables and parameters depends on how and where you need to use the variables. If you need to use the variable at multiple levels of a specific job, we recommend that you create a global variable. However, there are implications to using global variables in work flows and data flows that are reused in other jobs. A local variable is included as part of the definition of the work flow or data flow, and so it is portable between jobs. Since a global variable is part of the definition of the job to which the work flow or data flow belongs, it is not included when the object is reused. This table summarizes the type of variables and parameters you can create for each type of object. Object
Used by
Job
Global variable
Any object in the job.
Job
Local variable
A script or conditional in the job.
Local variable
This work flow or passed down to other work flows or data flows using a parameter.
Parameter
Parent objects to pass local variables. Work flows may also return variables or parameters to parent objects.
Parameter
A WHERE clause, column mapping, or function in the data flow. Data flows cannot return output values.
Work flow
Work flow
Data flow
2011
Type
To ensure consistency across projects and minimize troubleshooting errors, a best practice is to use a consistent naming convention for your variables and parameters. Keep in mind that names can include any alpha or numeric character or underscores, but cannot contain blank spaces. To differentiate between the types of objects, start all names with a dollar sign ($) and use these prefixes:

• Global variable: $G_
• Local variable: $L_
• Parameter: $P_
To define a global variable, local variable, or parameter 1.
Select the object in the project area. For a global variable, the object must be a job. For a local variable, it can be a job or a work flow. For a parameter, it can be either a work flow or a data flow.
2.
From the Tools menu, select Variables. The Variables and Parameters dialog box appears.
You can create a relationship between a local variable and the parameter by specifying the name of the local variable as the value in the properties of the parameter on the Calls tab.
To define the relationship between a local variable and a parameter
1.
Select the dependent object in the project area.
2.
From the Tools menu, select Variables to open the Variables and Parameters dialog box.
3.
Select the Calls tab. Any parameters that exist in dependent objects display on the Calls tab.
4.
Right-click the parameter and select Properties from the menu. The Parameter Value dialog box appears.
5.
In the Value field, enter the name of the local variable you want the parameter to call or a constant value. If you enter a variable, it must be of the same datatype as the parameter.
6.
Select OK.
Setting global variables using job properties
In addition to setting a variable inside a job using a script, you can also set and maintain global variable values outside a job using properties. Values set outside a job are processed the same way as those set in a script. However, if you set a value for the same variable both inside and outside a job, the value from the script overrides the value from the property. Values for global variables can be set as a job property or as an execution or schedule property. All values defined as job properties are shown in the Properties window. By setting values outside a job, you can rely on the Properties window for viewing values that have been set for global variables and easily edit values when testing or scheduling a job. To set a global variable value as a job property 1.
Right-click a job in the Local Object Library or project area and select Properties from the menu. The Properties dialog box appears.
2.
Select the Global Variable tab. All global variables for the job are listed.
3.
In the Value column for the global variable, enter a constant value or an expression, as required.
4.
Select OK.
You can also view and edit these default values in the Execution Properties dialog of the Designer. This allows you to override job property values at runtime. Data Services saves values in the repository as job properties.
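For illustration (hedged example values, not from the course), the Value column for a global variable could hold a simple constant such as 5, or an expression that is evaluated at run time, for example:

sysdate()

which would set a date-type global variable to the execution date each time the job runs.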
Defining substitution parameters Substitution parameters provide a way to define parameters that have a constant value for one environment, but that might need to change in certain situations. When a change is needed, it can be made in one location to affect all jobs. You can override the parameter for particular job executions. The typical use case is for file locations (directory files or source/target/error files) that are constant in one environment, but change when a job is migrated to another environment (for example, migrating a job from test to production). As with variables and parameters, the name can include any alpha or numeric character or underscores, but cannot contain blank spaces. Follow the same naming convention and always begin the name for a substitution parameter with double dollar signs ($$) and an S_ prefix to differentiate from out-of-the-box substitution parameters.
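As an illustration only (the parameter name and directory paths below are assumptions that follow the naming convention above), a substitution parameter can hold a source directory that differs between environments:

# Hypothetical values per substitution parameter configuration:
#   Test configuration:        $$S_Source_Dir = C:\\data\\test
#   Production configuration:  $$S_Source_Dir = C:\\data\\prod
# A script (or a file format's root directory field) can then reference the
# parameter instead of a hard-coded path; $G_Source_File is a hypothetical
# global variable used only for this sketch:
$G_Source_File = '[$$S_Source_Dir]\\orders.txt';

Migrating the job to another environment then only requires switching the active configuration, not editing every object that uses the path.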
Figure 65: Substitution Parameters and Variables
To create a substitution parameter configuration, 1.
From the Tools menu, select Substitution Parameter Configurations.
Note: When exporting a job (to a file or a repository), the substitution parameter configurations (values) are not exported with them. You need to export substitution parameters via a separate command to a text file and use this text file to import into another repository.
Using Data Services scripting language Defining scripts A script is a single-use object that is used to call functions and assign values in a work flow. Typically, a script is executed before data flows for initialization steps and used with conditionals to determine execution paths. A script may also be used after work flows or data flows to record execution information such as time, or a change in the number of rows in a data set. Use a script when you want to calculate values that are passed on to other parts of the work flow. Use scripts to assign values to variables and execute functions.
A script can contain these statements:
• Function calls
• If statements
• While statements
• Assignment statements
• Operators
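For illustration, a short script sketch that combines these statement types (the variable names are hypothetical and assumed to be declared for the job):

# Assignment statements and a function call
$G_Start_Time = sysdate();
$G_Counter = 0;

# While statement
while ($G_Counter < 3)
begin
   $G_Counter = $G_Counter + 1;
end

# If statement with a function call
if ($G_Counter >= 3)
begin
   print('Initialization finished at [$G_Start_Time]');
end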
Figure 66: Scripting Language
With Data Services scripting language, you can assign values to variables, call functions, and use standard string and mathematical operators. The syntax can be used in both expressions (such as WHERE clauses) and scripts. Using basic syntax Expressions are a combination of constants, operators, functions, and variables that evaluate to a value of a given datatype. Expressions can be used inside script statements or added to data flow objects.
Figure 67: Basic Syntax
Data Services scripting language follows these basic syntax rules when you are creating an expression:
• Each statement ends with a semicolon.
• Variable names start with a dollar sign.
• String values are enclosed in single quotation marks.
• Comments start with a pound sign.
• Function calls always specify parameters, even if they do not use parameters.
• Square brackets substitute the value of the expression. For example:
Print('The value of the start date is:[sysdate()+5]');
•
Curly brackets quote the value of the expression in single quotation marks. For example: $StartDate = sql('demo_target', 'SELECT ExtractHigh FROM Job_Execution_Status WHERE JobName = {$JobName}');
Using syntax for column and table references in expressions Since expressions can be used inside data flow objects, they can contain column names. The Data Services scripting language recognizes column and table names without special syntax. For example, you can indicate the start_date column as the input to a function in the Mapping tab of a query as: to_char(start_date, 'dd.mm.yyyy')
The column start_date must be in the input schema of the query. If there is more than one column with the same name in the input schema of a query, indicate which column is included in an expression by qualifying the column name with the table name. For example, indicate the column start_date in the table status as: status.start_date
Column and table names as part of SQL strings may require special syntax based on the RDBMS that the SQL is evaluated by. For example, select all rows from the LAST_NAME column of the CUSTOMER table as: sql('oracle_ds','select CUSTOMER.LAST_NAME from CUSTOMER')
Using operators The operators you can use in expressions are listed in this table in order of precedence. Note that when operations are pushed down to an RDBMS, the precedence is determined by the rules of the RDBMS.
Operator      Description
+             Addition
-             Subtraction
*             Multiplication
/             Division
=             Comparison, equals
<             Comparison, is less than
>             Comparison, is greater than
>=            Comparison, is greater than or equal to
!=            Comparison, is not equal to
||            Concatenate
AND           Logical AND
OR            Logical OR
NOT           Logical NOT
IS NULL       Comparison, is a NULL value
IS NOT NULL   Comparison, is not a NULL value
Using quotation marks Special care must be given to handling of strings. Quotation marks, escape characters, and trailing blanks can all have an adverse effect on your script if used incorrectly.
The type of quotation marks to use in strings depends on whether you are using identifiers or constants. An identifier is the name of the object (for example, table, column, data flow, or function). A constant is a fixed value used in computation. There are two types of constants:
• String constants (for example, 'Hello' or '2007.01.23')
• Numeric constants (for example, 2.14)
Identifiers need quotation marks if they contain special (non-alphanumeric) characters. For example, you need double quotes for the next string because it contains blanks: "compute large numbers". Use single quotes for string constants.
Using escape characters If a constant contains a single quote or backslash or another special character used by the Data Services scripting language, then those characters must be preceded by an escape character to be evaluated properly in a string. Data Services uses the backslash as the escape character.
Character          Example
Single quote (')   'World\'s Books'
Backslash (\)      'C:\\temp'
Handling nulls, empty strings, and trailing blanks To conform to the ANSI VARCHAR standard when dealing with NULLS, empty strings, and trailing blanks, Data Services:
• Treats an empty string as a zero length varchar value, instead of as a NULL value.
• Returns a value of FALSE when you use the operators Equal (=) and Not Equal (<>) to compare to a NULL value.
• Provides IS NULL and IS NOT NULL operators to test for NULL values.
• Treats trailing blanks as regular characters when reading from all sources, instead of trimming them.
• Ignores trailing blanks in comparisons in transforms (Query and Table Comparison) and functions (decode, ifthenelse, lookup, lookup_ext, lookup_seq).
NULL values To represent NULL values in expressions, type the word NULL. For example, you can check whether a column (COLX) is null or not with these expressions: COLX IS NULL
COLX IS NOT NULL
Data Services does not check for NULL values in data columns. Use the function NVL to replace NULL values with a specified value.
NULL values and empty strings Data Services uses two rules with empty strings:
•
When you assign an empty string to a variable, Data Services treats the value of the variable as a zero-length string. An error results if you assign an empty string to a variable that is not a varchar. To assign a NULL value to a variable of any type, use the NULL constant.
•
As a blank constant (' '), Data Services treats the empty string as a varchar value of zero length. Use the NULL constant for the null value.
Data Services uses these three rules with NULLS and empty strings in conditionals:
Rule 1: The Equals (=) and Is Not Equal to (<>) comparison operators against a NULL value always evaluate to FALSE. This FALSE result includes comparing a variable that has a value of NULL against a NULL constant.
Rule 2: Use the IS NULL and IS NOT NULL operators to test for the presence of NULL values. For example, assuming a variable assignment $var1 = NULL, test it with $var1 IS NULL rather than with $var1 = NULL.
Rule 3: When comparing two variables, always test for NULL. In this scenario, you are not testing a variable with a value of NULL against a NULL constant (as in the first rule). Either test each variable and branch accordingly or test in the conditional.
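As a minimal sketch of these rules (assuming $var1 has been declared as a varchar variable):

$var1 = NULL;

# Rule 1: an equality comparison against NULL always evaluates to FALSE,
# so this branch is never taken.
if ($var1 = NULL)
begin
   print('never reached');
end

# Rule 2: IS NULL correctly detects the NULL value.
if ($var1 IS NULL)
begin
   print('variable is NULL');
end

# nvl() replaces a NULL value with the second argument.
$var1 = nvl($var1, 'UNKNOWN');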
Scripting a custom function If the built-in functions that are provided by Data Services do not meet your requirements, you can create your own custom functions using the Data Services scripting language. Combining scripts, variables, and parameters To illustrate how scripts, variables, and parameters are used together, consider an example where you start with a job, work flow, and data flow. You want the data flow to update only those records that have been created since the last time the job executed. To accomplish this, you would start by creating a variable for the update time at the work flow level, and a parameter at the data flow level that calls the variable.
Next, you would create a script within the work flow that executes before the data flow runs. The script contains an expression that determines the most recent update time for the source table. The script then assigns that update time value to the variable, which identifies what that value is used for and allows it to be reused in other expressions. Finally, in the data flow, you create an expression that uses the parameter to call the variable and find out the update time. This allows the data flow to compare the update time to the creation date of the records and identify which rows to extract from the source. You can create your own functions by writing script functions in Data Services scripting language using the Smart Editor. Saved custom functions appear in the Function wizard and the Smart Editor under the Custom Functions category, and are also displayed on the Custom Functions tab of the Local Object Library. You can edit and delete custom functions from the Local Object Library. Consider these guidelines when you create your own functions: • • • • •
• Functions can call other functions.
• Functions cannot call themselves.
• Functions cannot participate in a cycle of recursive calls. For example, function A cannot call function B if function B calls function A.
• Functions return a value.
• Functions can have parameters for input, output, or both. However, data flows cannot pass parameters of type output or input/output.
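Following these guidelines, a minimal custom function body might look like this (an illustration only; the function name, parameter, and threshold are assumptions, and the function would be created through the Smart Editor as described below):

# Hypothetical custom function CF_Is_Large_Order with one input parameter
# $P_Amount. It returns 1 for amounts above a fixed threshold, otherwise 0.
if ($P_Amount > 1000)
Return 1;
Else
Return 0;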
Before creating a custom function, you must know the input, output, and return values and their data types. The return value is predefined to be Return. To create a custom function: 1.
On the Custom Functions tab of the Local Object Library, right-click the white space and select New from the menu. The Custom Function dialog box displays.
2.
In the Function name field, enter a unique name for the new function.
3.
In the Description field, enter a description.
4.
Select Next. The Smart Editor enables you to define the return type, parameter list, and any variables to be used in the function.
Importing a stored procedure as a function If you are using Microsoft SQL Server, you can use stored procedures to insert, update, and delete data in your tables. To use stored procedures in Data Services, you must import them as custom functions.
To import a stored procedure
1.
On the Datastores tab of the Local Object Library, expand the datastore that contains the stored procedure.
2.
Right-click Functions and select Import By Name from the menu. The Import By Name dialog box displays.
Exercise 13: Creating a custom function Exercise Objectives After completing this exercise, you will be able to: • Create a custom function using variables, parameters and scripts
Business Example The Marketing department would like to send special offers to customers who have placed a specified number of orders. This requires creating a custom function that must be called when a customer order is placed.
Task: Create a custom function to accept the input parameters of the Customer ID and the number of orders required to receive a special offer, check the Orders table and then create an initial list of eligible customers.
1.
In the Local Object Library, create a new custom function called CF_MarketingOffer.
2.
Create a new batch job and data flow called Alpha_Marketing_Offer_Job and Alpha_Marketing_Offer_DF respectively and a new global variable $G_Num_to_Qual.
3.
In the job workspace, define a script to define the global variable and attach the script to the data flow.
4.
Define the data flow with the Customer table from the Alpha datastore as a source, a template table as a target and two query transforms between the source and target.
5.
Execute Alpha_Marketing_Offer_Job with the default properties and view the results.
Solution 13: Creating a custom function Task: Create a custom function to accept the input parameters of the Customer ID and the number of orders required to receive a special offer, check the Orders table and then create an initial list of eligible customers. 1.
In the Local Object Library, create a new custom function called CF_MarketingOffer. a)
In the Local Object Library, select the Custom Functions tab, and in the tab, right click and select the option New.
b)
In the Custom Function dialog box, enter CF_MarketingOffer in the Function name field and select Next.
c)
In the Smart Editor, select the Variables tab and right click Parameters and select New from the menu and use $P_CustomerID as the parameter's name.
d)
Right click $P_CustomerID and select Properties from the menu.
e)
In the Return value Properties dialog box, select the drop-down list Data type and choose the data type int and a parameter type of Input from the Parameter type drop-down list. Then select OK.
f)
Right click Parameters and select New from the menu and use $P_Orders as the name for a second parameter.
g)
Right click $P_Orders and select Properties from the menu.
h)
In the Return value Properties dialog box, select the drop-down list Data type and choose the data type int and parameter type of Input from the Parameter type drop-down list. Then select OK.
i)
In the workspace of the Smart Editor, define the custom function as a conditional clause. The conditional clause should specify that, if the number of rows in the Orders table is equal to the value of the parameter $P_Orders for the Customer ID, the function should return a 1. Otherwise, it should return zero. Type in this code on three separate lines:
if ((sql('alpha', 'select count(*) from orders where customerid = [$P_CustomerID]')) >= $P_Orders)
Return 1;
Note: There should be no line break between = and [$P_CustomerID]'))
Else return 0;
Note: Do not use the ifthenelse function. Type in the if function.
j)
Validate your code by using the Validate button and make any necessary corrections. Then select OK. Note: If your function contains syntax errors, Data Services displays a list of those errors in an embedded pane below the editor. To see where the error occurs in the text, double-click an error. The Smart Editor redraws to show you the location of the error.
2.
Create a new batch job and data flow called Alpha_Marketing_Offer_Job and Alpha_Marketing_Offer_DF respectively and a new global variable $G_Num_to_Qual. a)
In the project area, right click the Omega project to select the option New batch job and enter the name Alpha_Marketing_Offer_Job.
b)
From the Tool Palette, select the data flow icon and drag it into the workspace and enter the name Alpha_Marketing_Offer_DF.
c)
In the project area, select the job Alpha_Marketing_Offer_Job and then use the menu path Tools → Variables.
d)
Right click Variables and select Insert from the menu.
e)
Right click the new variable and select Properties from the menu and enter $G_Num_to_Qual in the Global Variable Properties dialog box. In the Data type drop down list, select int for the datatype and select OK.
3.
In the job workspace, define a script to define the global variable and attach the script to the data flow. a)
In the project area, select the Alpha_Marketing_Offer_Job and then from the Tool Palette, select the script icon and drag it into the workspace. Name the script CheckOrders.
b)
Double-click the script to open it and create an expression to define the global variable as five orders to qualify for the special marketing campaign. Type in this expression: $G_Num_to_Qual = 5;
4.
c)
Close the script and return to the job workspace.
d)
Connect the script to the data flow by selecting the script, while holding down the mouse button, and dragging to the data flow. Release the button to create the connection. Double-click the data flow to open its workspace.
Define the data flow with the Customer table from the Alpha datastore as a source, a template table as a target and two query transforms between the source and target. a)
From the Local Object Library, select the tab Datastores and drag the Customer table in the Alpha datastore into the data flow workspace. From the menu, select the option Make source.
b)
From the Tool Palette, drag the icon for a template table into the dataflow workspace. Use offer_mailing_list as the template table name, select the Delta datastore and click OK.
c)
From the Tool Palette, drag the icon for the Query transform into the data flow workspace twice. Connect all the objects.
d)
Double-click the first Query transform and in the transform editor, map the columns as indicated:
Schema In      Schema Out
CONTACTNAME    CONTACTNAME
ADDRESS        ADDRESS
CITY           CITY
POSTALCODE     POSTALCODE
e)
Right click POSTALCODE and choose the option New Output Column and select Below from the menu.
f)
Right click the new output column and select Properties from the menu. Enter OFFER_STATUS as the name and from the Datatype drop down list choose int and select OK.
g)
Select OFFER_STATUS and on the Mapping tab, select the button for the Function Wizard. Select the category Custom Functions and select your custom function CF_MarketingOffer and click Next. Then in the Smart Editor use the CUSTOMERID column for the parameter $P_CustomerID and the global variable for the parameter $P_Orders. The expression should look like this: CF_MarketingOffer(customer.CUSTOMERID, $G_Num_to_Qual)
h)
Select Back to close the editor.
i)
Double-click the second Query transform and in the transform editor, map the columns as indicated:
Schema In      Schema Out
CONTACTNAME    CONTACTNAME
ADDRESS        ADDRESS
CITY           CITY
POSTALCODE     POSTALCODE
Select the WHERE tab and enter an expression to select only those records where OFFER_STATUS has a value of one. The expression should be: Query.OFFER_STATUS = 1
Select Back to close the editor.
j)
Connect the source table to the first Query transform by selecting the source table, while holding down the mouse button, and dragging to the Query transform. Release the button to create the connection.
k)
Connect the first Query transform to the second Query transform by selecting the first query transform, while holding down the mouse button, and dragging to the second Query transform. Release the button to create the connection.
l)
Connect the target table to the second Query transform by selecting the Query transform, while holding down the mouse button, and dragging to the target table. Release the button to create the connection.
5.
Execute Alpha_Marketing_Offer_Job with the default properties and view the results. a)
In the project area, select your Alpha_Marketing_Offer_Job and choose the option Execute.
b)
Select Save to save all objects you have created.
c)
In the next dialog box, accept all the default execution properties and select OK.
d)
When the job is finished, close the log and return to the dataflow workspace. Select the small magnifying glass icon in the lower right corner of the template table to use the option View data. You should have one output record for contact Lev M. Melton in Quebec.
Lesson Summary You should now be able to: • Use variables and parameters • Use Data Services scripting language • Create a custom function
Related Information •
For more information on the NVL function, see “Functions and Procedures”, Chapter 6 in the Data Services Reference Guide.
Unit Summary You should now be able to: • Use functions in expressions • Use the search_replace function • Use the lookup_ext function • Use the decode function • Use variables and parameters • Use Data Services scripting language • Create a custom function
Unit 6 Using Platform Transforms Unit Overview Transforms are optional objects in a data flow that allow you to transform your data as it moves from source to target. In data flows, transforms operate on input data sets by changing them or by generating one or more new data sets. Transforms are added as components to your data flow in the same way as source and target objects. Each transform provides different options that you can specify based on the transform's function. You can choose to edit the input data, output data, and parameters in a transform.
Unit Objectives
After completing this unit, you will be able to:
• Describe platform transforms
• Use the Map Operation transform in a data flow
• Use the Validation transform
• Use the Merge transform
• Use the Case transform
• Use the SQL transform
Unit Contents
Lesson: Using Platform Transforms ..........................................220
Lesson: Using the Map Operation Transform ...............................224
Exercise 14: Using the Map Operation transform ......................227
Lesson: Using the Validation Transform .....................................231
Exercise 15: Using the Validation transform ............................239
Lesson: Using the Merge Transform .........................................254
Exercise 16: Using the Merge transform ................................257
Lesson: Using the Case Transform ..........................................270
Exercise 17: Using the Case transform .................................275
Lesson: Using the SQL Transform ...........................................283
Exercise 18: Using the SQL transform ..................................287
Lesson: Using Platform Transforms Lesson Overview A transform enables you to control how data sets change in a data flow.
Lesson Objectives After completing this lesson, you will be able to: •
Describe platform transforms
Business Example Your company extracts data from external systems using flat files. The data volume from the various external systems has increased continually in the recent past, making management of the jobs for flat file extraction difficult. You can optimize this process by using Data Services to extract data directly from an external system.
Describing Platform transforms Transforms are optional objects in a data flow that allow you to transform your data as it moves from source to target.
Figure 68: Data Services Transforms
After completing this unit, you can:
• Explain transforms
• Describe the platform transforms available in Data Services
• Add a transform to a data flow
• Describe the Transform Editor window
Explaining transforms Transforms are objects in data flows that operate on input data sets by changing them or by generating one or more new data sets. The Query transform is the most commonly–used transform. Transforms are added as components to your data flow in the same way as source and target objects. Each transform provides different options that you can specify based on the transform's function. You can choose to edit the input data, output data, and parameters in a transform. Some transforms, such as the Date Generation and SQL transforms, can be used as source objects, in which case they do not have input options. Transforms are used in combination to create the output data set. For example, the Table Comparison, History Preserve, and Key Generation transforms are used for slowly changing dimensions. Transforms are similar to functions in that they can produce the same or similar values during processing. However, transforms and functions operate on a different scale:
• Functions operate on single values, such as values in specific columns in a data set.
• Transforms operate on data sets by creating, updating, and deleting rows of data.
Figure 69: Comparison of Transforms and Functions
Describing platform transforms The following platform transforms are available on the Transforms tab of the Local Object Library:
Transform        Description
Case             Divides the data from an input data set into multiple output data sets based on IF-THEN-ELSE branch logic.
Map Operation    Allows conversions between operation codes.
Merge            Unifies rows from two or more input data sets into a single output data set.
Query            Retrieves a data set that satisfies conditions that you specify. A query transform is similar to a SQL SELECT statement.
Row Generation   Generates a column filled with integers starting at zero and incrementing by one to the end value you specify.
SQL              Performs the indicated SQL query operation.
Validation       Allows you to specify validation criteria for an input data set. Data that fails validation can be filtered out or replaced. You can have one validation rule per column.
Lesson Summary You should now be able to: • Describe platform transforms
Lesson: Using the Map Operation Transform Lesson Overview The Map Operation transform enables you to change the operation code for records.
Lesson Objectives After completing this lesson, you will be able to: •
Use the Map Operation transform in a data flow
Business Example Your company extracts data from external systems using flat files. The data volume from the various external systems has increased continually in the recent past, making management of the jobs for flat file extraction difficult. You can optimize this process by using Data Services to extract data directly from an external system. You want to control how the data is to be loaded into the target and want to explore the capabilities of the Map Operation transform to control the target updating.
Using the Map Operation transform Transforms are optional objects in a data flow that allow you to transform your data as it moves from source to target. Transforms are objects in data flows that operate on input data sets by changing them or by generating one or more new data sets. The Query transform is the most commonly–used transform. Transforms are added as components to your data flow in the same way as source and target objects. Each transform provides different options that you can specify based on the transform's function. You can choose to edit the input data, output data, and parameters in a transform. The Map Operation transform enables you to change the operation code for records. Describing map operations Data Services maintains operation codes that describe the status of each row in each data set described by the inputs to and outputs from objects in data flows. The operation codes indicate how each row in the data set would be applied to a target table if the data set were loaded into a target. The operation codes are:
Operation Code   Description
NORMAL           Creates a new row in the target. All rows in a data set are flagged as NORMAL when they are extracted by a source table or file. If a row is flagged as NORMAL when loaded into a target table or file, it is inserted as a new row in the target. Most transforms operate only on rows flagged as NORMAL.
INSERT           Creates a new row in the target. Only History Preserving and Key Generation transforms can accept data sets with rows flagged as INSERT as input.
DELETE           Is ignored by the target. Rows flagged as DELETE are not loaded. Only the History Preserving transform, with the Preserve delete row(s) as update row(s) option selected, can accept data sets with rows flagged as DELETE.
UPDATE           Overwrites an existing row in the target table. Only History Preserving and Key Generation transforms can accept data sets with rows flagged as UPDATE as input.
Explaining the Map Operation transform The Map Operation transform allows you to change operation codes on data sets to produce the desired output. For example, if a row in the input data set has been updated in some previous operation in the data flow, you can use this transform to map the UPDATE operation to an INSERT. The result could be to convert UPDATE rows to INSERT rows to preserve the existing row in the target.
Figure 70: Introduction to the Map Operation Transform
Data Services can push Map Operation transforms to the source database. The next section gives a brief description of the function, data input requirements, options, and data output results for the Map Operation transform. Input for the Map Operation transform is a data set with rows flagged with any operation codes. It can contain hierarchical data. Use caution when using columns of datatype real in this transform, because comparison results are unpredictable for this datatype. Output for the Map Operation transform is a data set with rows flagged as specified by the mapping operations. The Map Operation transform enables you to set the Output row type option to indicate the new operations desired for the input data set. Choose from the following operation codes: INSERT, UPDATE, DELETE, NORMAL, or DISCARD.
Exercise 14: Using the Map Operation transform Exercise Objectives After completing this exercise, you will be able to: • Use the Map Operation transform in a data flow
Business Example Users of employee reports have requested that employee records in the data mart contain only records for current employees. You use the Map Operation transform to change the behavior of loading so the resulting target conforms to this business requirement.
Task: Use the Map Operation transform to remove any employee records that have a value in the discharge date column of the source data. 1.
Create a new batch job Alpha_Employees_Current_Job with a data flow Alpha_Employees_Current_DF which contains a Map Operation transform.
2.
Add the Map Operation transform to the data flow, change the output operation code of NORMAL to DELETE, save all objects and execute the job.
3.
Save all objects and execute the Alpha_Employees_Current_Job.
Result Two rows were filtered from the target table. Both of these records have discharge_date field entries.
Solution 14: Using the Map Operation transform Task: Use the Map Operation transform to remove any employee records that have a value in the discharge date column of the source data. 1.
Create a new batch job Alpha_Employees_Current_Job with a data flow Alpha_Employees_Current_DF which contains a Map Operation transform. a)
In the project area right click the Omega project and select the option New job and change the name to Alpha_Employees_Current_Job.
b)
In the workspace for the job, go to the Tool Palette, select the icon for a data flow and drag it to the workspace. Give the data flow the name Alpha_Employees_Current_DF.
c)
Double-click the data flow to open the data flow workspace and drag the Employee table from the Alpha datastore in the Local Object Library into the workspace. From the next menu, choose the option Make source.
d)
Drag the Employee table from the HR_datamart datastore in the Local Object Library into the workspace. From the next menu, choose the option Make target.
e)
From the Tool Palette, select the icon for a Query transform and drag it into the workspace. Then connect all the objects.
f)
Double-click the Query transform to access the transform editor and map all columns from the input schema to the same column in the output schema. Drag each field from the input schema to its counterpart in the output schema.
g)
On the WHERE tab of the Query transform, enter an expression to select only those rows where the discharge date field is not empty. Enter the code: employee.discharge_date is not null
h)
Select the Back icon to close the editor.
2.
3.
Add the Map Operation transform to the data flow, change the output operation code of NORMAL to DELETE, save all objects and execute the job. a)
In the data flow workspace, disconnect the Query transform from the target table by right clicking the connection to select the option Delete.
b)
In the Local Object Library, select the Transform tab. Open the node Data Transforms. Select the Map Operation transform and drag it into the data flow workspace.
c)
Double-click the Map Operation transform to open the transform editor and change the settings so that rows with an input operation code of NORMAL have an output operation code of DELETE. Select OK.
d)
Select the Back icon to close the editor.
Save all objects and execute the Alpha_Employees_Current_Job. a)
In the project area, right click the job Alpha_Employees_Current_Job and select the option Execute job.
b)
In the next dialog box, select OK to save all the objects.
c)
In the Execution Properties dialog box, accept all the default settings and select OK.
d)
Once the job executes successfully, in the data flow workspace, select the magnifying glass button on the source table. A large View Data pane appears beneath the current workspace area.
e)
To compare the data, select the magnifying glass button on the target table.
Result Two rows were filtered from the target table. Both of these records have discharge_date field entries.
Lesson Summary You should now be able to: • Use the Map Operation transform in a data flow
Related Information •
For more information on the Map Operation transform see “Transforms” Chapter 5 in the Data Services Reference Guide
Lesson: Using the Validation Transform Lesson Overview The Validation transform enables you to create validation rules and move data into target objects based on whether they pass or fail validations.
Lesson Objectives After completing this lesson, you will be able to: •
Use the Validation transform
Business Example Your company extracts data from external systems using flat files. The data volume from the various external systems has increased continually in the recent past, making management of the jobs for flat file extraction difficult. You can optimize this process by using Data Services to extract data directly from an external system. Order data is stored in multiple formats with different structures and different information. You want to know how to use the Validation transform to validate order data from flat file sources and the database tables before merging it.
Using the Validation transform The Validation transform enables you to create validation rules and move data into target objects based on whether they pass or fail validation. Explaining the Validation transform Use the Validation transform in your data flows when you want to ensure that the data at any stage in the data flow meets your criteria. For example, you can set the transform to ensure that all values:
• Are within a specific range
• Have the same format
• Do not contain NULL values
Figure 71: Introduction to the Validation Transform
The Validation transform allows you to define a reusable business rule to validate each record and column. The Validation transform qualifies a data set based on rules for input schema columns. It filters out or replaces data that fails your criteria. The available outputs are pass and fail. You can have one validation rule per column. For example, if you want to load only sales records for October 2010, you would set up a validation rule that states: Sales Date is between 10/1/2010 and 10/31/2010. Data Services looks at this date field in each record to validate if the data meets this requirement. If it does not, you can choose to pass the record into a Fail table, correct it in the Pass table, or do both.
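For illustration only, such a rule's condition might be entered in the Validation transform editor for a hypothetical SALES_DATE column as follows (the column name and date format are assumptions, not from the course data):

SALES_DATE >= to_date('2010.10.01', 'yyyy.mm.dd') AND SALES_DATE <= to_date('2010.10.31', 'yyyy.mm.dd')

Records that fail the condition then follow whatever Action on Failure you configure, such as Send to Fail.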
Figure 72: Validation Transform Editor
Your validation rule consists of a condition and an action on failure: •
Use the condition to describe what you want for your valid data. For example, specify the condition “IS NOT NULL” if you do not want any “NULLS” in data passed to the specified target.
•
Use the Action on Failure area to describe what happens to invalid or failed data. Continuing with the example above, for any NULL values, you may want to select the Send to Fail option to send all “NULL” values to a specified “FAILED” target table.
Figure 73: Add/Edit Rule Editor
Figure 74: Conflict of Validation Rules
You can also create a custom Validation function and select it when you create a validation rule. The next section gives a brief description of the function, data input requirements, options, and data output results for the Validation transform.
Only one source is allowed as a data input for the Validation transform. The Validation transform outputs up to two different data sets based on whether the records pass or fail the validation condition you specify. You can load pass and fail data into multiple targets. The Pass output schema is identical to the input schema. Data Services adds the two columns to the Fail output schemas: •
The DI_ERRORACTION column indicates where failed data was sent in this way:
– The letter B is used for sent to both Pass and Fail outputs.
– The letter F is used for sent only to the Fail output.
If you choose to send failed data to the Pass output, Data Services does not track the results. You may want to substitute a value for failed data that you send to the Pass output because Data Services does not add columns to the Pass output. •
The DI_ERRORCOLUMNS column displays all error messages for columns with failed rules. The names of input columns associated with each message are separated by colons. For example, “ failed rule(s): c1:c2” . If a row has conditions set for multiple columns and the Pass action, Fail action , and Both actions are specified for the row, then the precedence order is Fail, Both, Pass. For example, if one column’s action is Send to Fail and the column fails, then the whole row is sent only to the Fail output. Other actions for other validation columns in the row are ignored.
When you use the Validation transform, you select a column in the input schema and create a validation rule in the Validation transform editor. The Validation transform offers several options for creating this validation rule:
Option                      Description
Enable Validation           Turn the validation rule on and off for the column.
Do not validate when NULL   Send all NULL values to the Pass output automatically. Data Services does not apply the validation rule on this column when an incoming value for it is NULL.
Condition                   Define the condition for the validation rule.
Action on Fail              Define where a record is loaded if it fails the validation rule: Send to Fail, Send to Pass, or Send to both.
If you choose Send to Pass or Send to Both, you can choose to substitute a value or expression for the failed values that are sent to the Pass output. The Rule Violation table lists, for each record, all of the rules and columns that failed. The field Row_ID, which is also added to the Fail table, allows you to make the link back to the original data. In this example, rows 1 and 2 each failed one validation rule (validZIP and validPhone). Row 3 failed both rules. With the Rule Violation table you can now easily create queries and reports to show all rows that failed for a particular rule and count the number of failures per rule.
Figure 75: Rule Violation Statistics
To create a validation rule:
1.
Open the data flow workspace.
2.
Add your source object to the workspace.
3.
On the Transforms tab of the Local Object Library, select and drag the Validation transform to the workspace to the right of your source object.
4.
Add your target objects to the workspace. You require one target object for records that pass validation, and an optional target object for records that fail validation, depending on the options you select.
5.
Connect the source object to the transform.
6.
Double-click the Validation transform to open the transform editor and configure the validation rules.
Figure 76: Validation Reminders
Exercise 15: Using the Validation transform Exercise Objectives After completing this exercise, you will be able to: • Use the Validation transform
Business Example Order date is stored in multiple formats with different structures and different information. You want to learn how to use the Validation transform to validate order data from flat file sources and the alpha orders table before merging it.
Task 1: Create a flat file format called Order_Shippers_Format for flat files containing order delivery information.
1. Create a flat file format called Order_Shippers_Format.
2. Adjust the datatypes for the columns proposed by the Designer based on their content.
Task 2: In the Omega project, create a new batch job called Alpha_Orders_Validated_Job with two data flows, Alpha_Orders_Files_DF and Alpha_Orders_DB_DF.
1. In the Omega project, create a new batch job Alpha_Orders_Validated_Job with a new data flow called Alpha_Orders_Files_DF.
2. In the job Alpha_Orders_Validated_Job, create a second data flow called Alpha_Orders_DB_DF.
Task 3: Design the data flow Alpha_Orders_Files_DF with file formats, a Query transform, a Validation transform and target template tables.
1. In the workspace for Alpha_Orders_Files_DF, add the file formats Orders_Format and Order_Shippers_Format as source objects.
2. Create a new template table Orders_Files_Work in the Delta datastore as the target object.
3. Create a new template table Orders_Files_No_Fax in the Delta datastore as the target object.
4. Add the Query transform to the workspace and connect both sources to it.
5. Add the Validation transform to the workspace to the right of the Query transform and connect them.
6. Edit the Orders_Format source file format in the data flow to use all three related orders flat files.
7. Edit the Order_Shippers_Format source file format in the data flow to use all three related order shippers flat files.
8. Complete the data flow Alpha_Orders_Files_DF by connecting the pass and fail outputs from the Validation transform to the target template tables.
Task 4: Design the data flow Alpha_Orders_DB_DF with the Orders table from the Alpha datastore, a Query transform, a Validation transform and target template tables.
1. In the workspace for Alpha_Orders_DB_DF, add the Orders table from the Alpha datastore as a source object.
2. Create a new template table Orders_DB_Work in the Delta datastore as the target object.
3. Create a new template table Orders_DB_No_Fax in the Delta datastore as the target object.
4. Add the Query transform to the workspace and connect the source to it.
5. Add the Validation transform to the workspace to the right of the Query transform and connect them.
6. Complete the data flow Alpha_Orders_DB_DF by connecting the pass and fail outputs from the Validation transform to the target template tables.
7. Execute the Alpha_Orders_Validated_Job and view the differences between passing and failing records.
Solution 15: Using the Validation transform

Task 1: Create a flat file format called Order_Shippers_Format for flat files containing order delivery information.
1. Create a flat file format called Order_Shippers_Format.
a)
In the Local Object Library, select the tab Formats and right-click Flat Files and select New from the menu to open the File Format Editor.
b)
In the Type field, specify the type Delimited.
c)
In the Name field, enter the name Order_Shippers_Format.
d)
To select the source directory, select the folder icon to select My Documents → BODS10 → Activity_Source.
e)
To select the appropriate file, select the file icon to select the source file Order_Shippers_01_20_07.txt.
f)
Change the value of the column delimiter to a semicolon by typing in a semicolon.
g)
Change the row delimiter by clicking in the value for this property and using the drop-down box to choose the value Windows new line.
h)
Set the value for skipping the row header to 1.
2. Adjust the datatypes for the columns proposed by the Designer based on their content.
a) In the Column Attributes pane, change these field datatypes:

   Column              Datatype
   ORDERID             int
   SHIPPERNAME         varchar(50)
   SHIPPERADDRESS      varchar(50)
   SHIPPERCITY         varchar(50)
   SHIPPERCOUNTRY      int
   SHIPPERPHONE        varchar(20)
   SHIPPERFAX          varchar(20)
   SHIPPERREGION       int
   SHIPPERPOSTALCODE   varchar(15)

b) Select the button Save and close.
Task 2: In the Omega project, create a new batch job called Alpha_Orders_Validated_Job with two data flows, Alpha_Orders_Files_DF and Alpha_Orders_DB_DF.
1. In the Omega project, create a new batch job Alpha_Orders_Validated_Job with a new data flow called Alpha_Orders_Files_DF.
a)
In the Project area, right click the project name and choose New Batch Job from the menu.
b)
Enter the name of the job as Alpha_Orders_Validated_Job.
c)
Press Enter to commit the change.
d)
Open the job Alpha_Orders_Validated_Job by double-clicking it.
e)
Select the Data Flow icon in the Tool Palette.
f)
Select the workspace where you want to add the data flow.
g)
Enter Alpha_Orders_Files_DF as the name.
h)
Press Enter to commit the change.
2. In the job Alpha_Orders_Validated_Job, create a second data flow called Alpha_Orders_DB_DF.
a)
Open the job Alpha_Orders_Validated_Job by double-clicking it.
b)
Select the Data Flow icon in the Tool Palette.
c)
Select the workspace where you want to add the data flow.
d)
Enter Alpha_Orders_DB_DF as the name.
e)
Press Enter to commit the change.
Task 3: Design the data flow Alpha_Orders_Files_DF with file formats, a Query transform, a Validation transform and target template tables.
1. In the workspace for Alpha_Orders_Files_DF, add the file formats Orders_Format and Order_Shippers_Format as source objects.
a)
In the Local Object Library, select the Formats tab and then select the file format Orders_Format.
b)
Select and drag the object to the data flow workspace and in the context menu, choose the option Make Source.
c)
In the Local Object Library, select the Formats tab and then select the file format Orders_Shippers_Format.
d)
Select and drag the object to the data flow workspace and in the context menu, choose the option Make Source.
2. Create a new template table Orders_Files_Work in the Delta datastore as the target object.
a)
In the Tool Palette, select the Template Table icon and select the workspace to add a new template table to the data flow.
b)
In the Create Template dialog box, enter Orders_Files_Work as the template table name.
c)
In the In datastore drop-down list, select the Delta datastore as the template table destination.
d)
Select OK.
3. Create a new template table Orders_Files_No_Fax in the Delta datastore as the target object.
a)
In the Tool Palette, select the Template Table icon and select the workspace to add a new template table to the data flow.
b)
In the Create Template dialog box, enter Orders_Files_No_Fax as the template table name.
c)
In the In datastore drop-down list, select the Delta datastore as the template table destination.
d)
Select OK.
4. Add the Query transform to the workspace and connect both sources to it.
a)
In the Tool Palette, select the Query transform icon and select the workspace to add a Query template to the data flow.
b)
Connect the source file formats Orders_Format and Orders_Shippers_Format to the Query transform by selecting the sources and holding down the mouse button, drag the cursor to the Query transform. Then release the mouse button.
c)
Double-click the Query transform to open the editor.
d)
In the transform editor for the Query transform, select the WHERE tab and select in the workspace to enter the expression Orders_Shippers_Format.ORDERID = Orders_Format.ORDERID to join the data in the formats on the OrderID values.
e)
In the Query transform, select these input schema fields and drag them to the output schema. This creates the necessary mapping.

   Input Schema            Field               Output Schema
   Orders_Format           ORDERID             ORDERID
   Orders_Format           CUSTOMERID          CUSTOMERID
   Orders_Format           ORDERDATE           ORDERDATE
   Order_Shippers_Format   SHIPPERNAME         SHIPPERNAME
   Order_Shippers_Format   SHIPPERADDRESS      SHIPPERADDRESS
   Order_Shippers_Format   SHIPPERCITY         SHIPPERCITY
   Order_Shippers_Format   SHIPPERCOUNTRY      SHIPPERCOUNTRY
   Order_Shippers_Format   SHIPPERPHONE        SHIPPERPHONE
   Order_Shippers_Format   SHIPPERFAX          SHIPPERFAX
   Order_Shippers_Format   SHIPPERREGION       SHIPPERREGION
   Order_Shippers_Format   SHIPPERPOSTALCODE   SHIPPERPOSTALCODE

f)
In the output schema of the Query transform, right-click the field ORDERDATE and from the menu, select the option, New Output Column. From the next context menu, select Above. Name the new field ORDER_TAKEN_BY with a datatype of varchar and a length of 15. Map ORDER_TAKEN_BY to Orders_Format.EMPLOYEEID by selecting Orders_Format.EMPLOYEEID in the input schema and dragging it to the field in the output schema.
g)
In the output schema of the Query transform, right-click the field ORDERDATE and from the menu, select the option, New Output Column. From the next context menu, select Above. Name the new field ORDER_ASSIGNED_TO with a datatype of varchar and a length of 15. Map ORDER_ASSIGNED_TO to Orders_Format.EMPLOYEEID by selecting Orders_Format.EMPLOYEEID in the input schema and dragging it to the field in the output schema.
h) Select the Back icon to close the editor.
5. Add the Validation transform to the workspace to the right of the Query transform and connect them.
a)
In the Local Object Library, select the Transforms tab. Then select and drag the Validation transform to the data flow workspace to the right of the Query transform.
b)
Connect the Query transform to the Validation transform by selecting the Query transform and holding down the mouse button. Then drag the cursor to the Validation transform and release the mouse button.
c)
Double-click the Validation transform to open the transform editor. In the input schema area, select the field ORDER_ASSIGNED_TO.
d)
In the Validation Rules area, select the button Add.
e) In the Rules area, select the check box button for the field Enabled. From the drop-down list in the Rule field, select the rule Exists in table.
f)
In the Action on Fail field, set the action to Send to Both so that failed rows are sent to both the Pass and Fail outputs. In the Name field, use the drop-down list to select the EMPLOYEEID field from the Employee table in the HR_datamart datastore. The resulting expression should be HR_DATAMART.DBO.EMPLOYEE.EMPLOYEEID.
g)
In the If any rule fails and Send to Pass, substitute with: section, select the check box button for the field Enabled. In the Column field, use the drop-down list to select EMPLOYEEID. In the Expression field, select the ellipsis (...) icon and in the Smart Editor, enter the expression '3Cla5'.
h)
In the input schema area, select the field SHIPPERFAX.
i)
In the Validation Rules area, select the button Add.
j)
In the Rules area, select the check box button for the field Enabled. From the drop-down list in the Rule field, select the rule IS NOT NULL.
k)
In the Action on Fail field, set the action to Send to Both so that failed rows are sent to both the Pass and Fail outputs. In the Name field, use the drop-down list to select the SHIPPERFAX field.
l)
In the If any rule fails and Send to Pass, substitute with: section, select the check box button for the field Enabled. In the Column field, use the drop-down list to select SHIPPERFAX. In the Expression field, select the ellipsis (...) icon and in the Smart Editor, enter the expression 'No Fax'.
m)
Select the Back icon to close the editor.
6. Edit the Orders_Format source file format in the data flow to use all three related orders flat files.
a)
Double-click the Orders_Format source object to open the format editor and change the file name to orders_*.txt. Note: The asterisk character acts as a wildcard.
b)
Edit the source object to point to the file on the Job Server. Change Location by selecting Job Server from the drop-down list and change the Root directory to D:\CourseFiles\DataServices\Activity_Source. Note: The above file path is case-sensitive and must be typed as shown. Check with your instructor to verify that the path is correct before proceeding.
c) In the format editor for the Orders_Format, change the Capture Data Conversion Errors option to Yes.
7. Edit the Order_Shippers_Format source file format in the data flow to use all three related order shippers flat files.
a)
Double-click the Orders_Shippers_Format source object to open the format editor and change the file name to Order_Shippers_*.txt. Note: The asterisk character acts as a wildcard.
b)
Edit the source object to point to the file on the Job Server. Change Location by selecting Job Server from the drop-down list and change the Root directory to D:\CourseFiles\DataServices\Activity_Source. Note: The above file path is case-sensitive and must be typed as shown. Check with your instructor to verify that the path is correct before proceeding.
c)
In the format editor for the Orders_Shippers_Format, change the Capture Data Conversion Errors option to Yes.
8. Complete the data flow Alpha_Orders_Files_DF by connecting the pass and fail outputs from the Validation transform to the target template tables.
a)
Select Back to return to the data flow workspace.
b)
Select and drag from the Validation Transform to the target template table Orders_Files_Work. Release the mouse and select the label Pass for that object from the context menu.
c)
Select and drag from the Validation transform to the target template table Orders_Files_No_Fax. Release the mouse and select the label Fail for that object from the context menu.
Task 4: Design the data flow Alpha_Orders_DB_DF with the Orders table from the Alpha datastore, a Query transform, a Validation transform and target template tables.
1. In the workspace for Alpha_Orders_DB_DF, add the Orders table from the Alpha datastore as a source object.
a)
In the Local Object Library, select the Datastores tab and then select the Orders table from the Alpha datastore.
b)
Select and drag the object to the data flow workspace and in the context menu, choose the option Make Source.
2. Create a new template table Orders_DB_Work in the Delta datastore as the target object.
a)
In the Tool Palette, select the Template Table icon and select the workspace to add a new template table to the data flow.
b)
In the Create Template dialog box, enter Orders_DB_Work as the template table name.
c)
In the In datastore drop-down list, select the Delta datastore as the template table destination.
d)
Select OK.
3. Create a new template table Orders_DB_No_Fax in the Delta datastore as the target object.
a)
In the Tool Palette, select the Template Table icon and select the workspace to add a new template table to the data flow.
b)
In the Create Template dialog box, enter Orders_DB_No_Fax as the template table name.
c)
In the In datastore drop-down list, select the Delta datastore as the template table destination.
d)
Select OK.
4. Add the Query transform to the workspace and connect the source to it.
a)
In the Tool Palette, select the Query transform icon and select the workspace to add a Query template to the data flow.
b)
Connect the source to the Query transform by selecting the source and holding down the mouse button, drag the cursor to the Query transform. Then release the mouse button.
c)
Double-click the Query transform to open the editor.
d)
In the Query transform, map all of the columns, except for EMPLOYEEID, from the input schema to the output schema by dragging the input schema field to the corresponding output schema field
e)
In the Query transform, change the names of these output schema columns:

   Old column name     New output name
   SHIPPERCITYID       SHIPPERCITY
   SHIPPERCOUNTRYID    SHIPPERCOUNTRY
   SHIPPERREGIONID     SHIPPERREGION

f) In the output schema of the Query transform, right-click the field ORDERDATE and from the menu, select the option, New Output Column. From the next context menu, select Above. Name the new field ORDER_TAKEN_BY with a datatype of varchar and a length of 15. Map ORDER_TAKEN_BY to Orders.EMPLOYEEID by selecting Orders.EMPLOYEEID in the input schema and dragging it to the field in the output schema.
g)
In the output schema of the Query transform, right-click the field ORDERDATE and from the menu, select the option, New Output Column. From the next context menu, select Above. Name the new field ORDER_ASSIGNED_TO with a datatype of varchar and a length of 15. Map ORDER_ASSIGNED_TO to Orders.EMPLOYEEID by selecting Orders.EMPLOYEEID in the input schema and dragging it to the field in the output schema.
h)
Select the Back icon to close the editor.
5. Add the Validation transform to the workspace to the right of the Query transform and connect them.
a)
In the Local Object Library, select the Transforms tab. Then select and drag the Validation transform to the data flow workspace to the right of the Query transform.
b)
Connect the Query transform to the Validation transform by selecting the Query transform and holding down the mouse button. Then drag the cursor to the Validation transform and release the mouse button.
c)
Double-click the Validation transform to open the transform editor. In the input schema area, select the field ORDER_ASSIGNED_TO.
d)
In the parameters area, select the checkbox Enable Validation option.
e)
In the Condition area, select the radio button for the option Exists in table. From the drop-down list in this option, select the HR_datamart datastore, then the Employee table, and then the EMPLOYEEID field. The resulting expression should be HR_DATAMART.DBO.EMPLOYEE.EMPLOYEEID.
f)
Select the Action on Failure tab to set the action for the ORDER_ASSIGNED_TO field to send to both Pass and Fail.
g)
Select the For pass, substitute with option and enter the substitute value 3Cla5.
h)
In the input schema area, select the field SHIPPERFAX.
i)
In the parameters area, select the checkbox Enable Validation option.
j)
In the Condition area, select the radio button for the first option. This should be the default selection. From the drop-down list in this option, select IS NOT operator, then enter the value NULL in the next field after the operator.
k)
Select the Action on Failure tab to set the action for the SHIPPERFAX field to send to both Pass and Fail.
l)
Select the For pass, substitute with option and enter the substitute value No Fax.
m)
Select the Back icon to close the editor.
6. Complete the data flow Alpha_Orders_DB_DF by connecting the pass and fail outputs from the Validation transform to the target template tables.
a)
Select Back to return to the data flow workspace.
b)
Select and drag from the Validation Transform to the target template table Orders_DB_Work. Release the mouse and select the label Pass for that object from the context menu.
c)
Select and drag from the Validation transform to the target template table Orders_DB_No_Fax. Release the mouse and select the label Fail for that object from the context menu.
7. Execute the Alpha_Orders_Validated_Job and view the differences between passing and failing records.
a)
In the Omega project area, right-click on the Alpha_Orders_Validated_Job and select the option Execute.
b)
Data Services prompts you to save any objects that have not been saved. Select OK.
c)
When the Execution Properties dialog box appears, select OK.
d)
Return to the data flow workspace and view the data in the target tables to see the differences between passing and failing records.
Lesson Summary
You should now be able to:
• Use the Validation transform

Related Information
• For more information on the Validation transform, see "Transforms", Chapter 5 in the Data Services Reference Guide.
• For more information on creating custom Validation functions, see "Validation Transform", Chapter 12 in the Data Services Reference Guide.
Lesson: Using the Merge Transform

Lesson Overview
You want to use the Merge transform to combine incoming data sets with the same schema structure to produce a single output data set with the same schema as the input data sets.
Lesson Objectives
After completing this lesson, you will be able to:
• Use the Merge transform
Business Example Your company extracts data from external systems using flat files. The data volume from the various external systems has increased continually in the recent past, making management of the jobs for flat file extraction difficult. You can optimize this process by using Data Services to extract data directly from an external system. You want to use the Merge transform to combine incoming data sets with the same schema structure to produce a single output data set with the same schema as the input data sets.
Using the Merge transform
The Merge transform allows you to combine multiple sources with the same schema into a single target.

Explaining the Merge transform
The Merge transform combines incoming data sets with the same schema structure to produce a single output data set with the same schema as the input data sets. For example, you could use the Merge transform to combine two sets of address data:
Figure 77: The Merge Transform
The next section gives a brief description of the function, data input requirements, options, and data output results for the Merge transform.

Input/Output
The Merge transform performs a union of the sources. All sources must have the same schema, including:
• Number of columns
• Column names
• Column data types
If the input data set contains hierarchical data, the names and datatypes must match at every level of the hierarchy. The output data has the same schema as the source data. The output data set contains a row for every row in the source data sets. The transform does not strip out duplicate rows. If columns in the input set contain nested schemas, the nested data is passed without change.

Hint: If you want to merge tables that do not have the same schema, you can add the Query transform to one of the tables before the Merge transform to redefine the schema to match the other table.

The Merge transform does not offer any options.
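In relational terms, the Merge transform behaves like a SQL UNION ALL: every input row appears in the output and duplicates are not removed. The sketch below only illustrates that behavior using two hypothetical address tables with identical columns; in the Designer the merge itself is defined graphically, not with SQL.

    -- Equivalent set operation for two sources with identical schemas
    SELECT CUSTOMERID, ADDRESS, CITY, REGION FROM ADDRESS_NORTH
    UNION ALL
    SELECT CUSTOMERID, ADDRESS, CITY, REGION FROM ADDRESS_SOUTH;
    -- UNION ALL keeps duplicate rows, matching the Merge transform's behavior;
    -- a plain UNION, which removes duplicates, would not be equivalent.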
Exercise 16: Using the Merge transform

Exercise Objectives
After completing this exercise, you will be able to:
• Use the Merge transform
Business Example
Your company extracts data from external systems using flat files. The data volume from the various external systems has increased continually in the recent past, making management of the jobs for flat file extraction difficult. You can optimize this process by using Data Services to extract data directly from an external system. You want to use the Merge transform to combine incoming data sets with the same schema structure to produce a single output data set with the same schema as the input data sets. The Orders data has now been validated, but the output comes from two different sources, flat files and database tables. The next step in the process is to modify the structure of those data sets so they match and then merge them into a single data set for further processing. You want to explore using the Merge transform for this task.
Task 1: Use the Query transforms to modify any column names and data types and to perform lookups for any columns that reference other tables. Use the Merge transform to merge the validated orders data.
1. In the Omega project, create a new batch job called Alpha_Orders_Merged_Job containing a data flow called Alpha_Orders_Merged_DF.
2. In the workspace for Alpha_Orders_Merged_DF, add the orders_file_work and orders_db_work tables from the Delta datastore as the source objects.
3. Add two Query transforms to the workspace, connecting each source object to its own Query transform.
4. In the transform editor for the Query transform connected to orders_file_work, create output columns and map input columns to output columns by dragging all columns from the input schema to the output schema.
5. For the SHIPPERCOUNTRY output column, change the mapping to perform a lookup of COUNTRYNAME from the Country table in the Alpha datastore.
6. For the SHIPPERREGION output column, change the mapping to perform a lookup of REGIONNAME from the Region table in the Alpha datastore.
7. In the transform editor for the Query transform connected to orders_db_work, create output columns and map input columns to output columns by dragging all columns from the input schema to the output schema.
8. For the SHIPPERCITY output column, change the mapping to perform a lookup of CITYNAME from the City table in the Alpha datastore.
9. For the SHIPPERCOUNTRY output column, change the mapping to perform a lookup of COUNTRYNAME from the Country table in the Alpha datastore.
10. For the SHIPPERREGION output column, change the mapping to perform a lookup of REGIONNAME from the Region table in the Alpha datastore.
Task 2: Merge the data from the Query transforms into a template table called Orders_Merged from the Delta datastore using a Merge transform.
1. Add a Merge transform to the data flow and connect both Query transforms to the Merge transform.
2. Execute the Alpha_Orders_Merged_Job with the default execution properties.
Solution 16: Using the Merge transform

Task 1: Use the Query transforms to modify any column names and data types and to perform lookups for any columns that reference other tables. Use the Merge transform to merge the validated orders data.
1. In the Omega project, create a new batch job called Alpha_Orders_Merged_Job containing a data flow called Alpha_Orders_Merged_DF.
a)
In the Project area, right-click the project name and choose New Batch Job from the menu.
b)
Enter the name of the job as Alpha_Orders_Merged_Job.
c)
Press Enter to commit the change.
d)
Open the job Alpha_Orders_Merged_Job by double-clicking it.
e)
Select the Data Flow icon in the Tool Palette.
f)
Select the workspace where you want to add the data flow.
g)
Enter Alpha_Orders_Merged_DF as the name.
h)
Press Enter to commit the change.
i)
Double-click the data flow to open the data flow workspace.
2. In the workspace for Alpha_Orders_Merged_DF, add the orders_file_work and orders_db_work tables from the Delta datastore as the source objects.
a)
In the Local Object Library, select the Datastores tab and then select the orders_file_work and orders_db_work tables from the Delta datastore.
b)
Select and drag the objects to the data flow workspace and in the context menu, choose the option Make Source for each source table.
3. Add two Query transforms to the workspace, connecting each source object to its own Query transform.
a)
In the Tool Palette, select the Query transform icon and select the workspace to add a Query template to the data flow. Then add another Query transform to the workspace.
b)
Connect the source table orders_file_work to a Query transform by selecting the source table and holding down the mouse button, drag the cursor to the Query transform. Then release the mouse button to create the connection.
c)
Connect the source table orders_db_work to a Query transform by selecting the source table and holding down the mouse button, drag the cursor to the Query transform. Then release the mouse button to create the connection.
4. In the transform editor for the Query transform connected to the orders_file_work, create output columns and map input columns to output columns by dragging all columns from the input schema to the output schema.
a)
Double-click the Query transform to open the editor.
b)
In the Schema In workspace, select and drag each field to the Schema Out workspace. This not only creates output columns, but maps input schema columns to output schema columns.
c)
Change the datatype for these Schema Out columns:

   Column              Type
   ORDERDATE           datetime
   SHIPPERADDRESS      varchar(100)
   SHIPPERCOUNTRY      varchar(50)
   SHIPPERREGION       varchar(50)
   SHIPPERPOSTALCODE   varchar(50)
5. For the SHIPPERCOUNTRY output column, change the mapping to perform a lookup of COUNTRYNAME from the Country table in the Alpha datastore.
a)
Go to the Mapping tab for the output schema field SHIPPERCOUNTRY and delete the existing expression by highlighting it and using the Delete button on your keyboard.
b)
Select the Function button and in the Select Function dialog box, open the category of “Database Functions”.
c)
From the list of function names, select the lookup_ext function and select the Next button.
d)
In the Lookup_ext - Select Parameters dialog box, enter the parameters:

   Field/Option                         Value
   Lookup table                         ALPHA.SOURCE.COUNTRY
   Condition: Columns in lookup table   COUNTRYID
   Condition: Op.(&)                    =
   Condition: Expression                ORDERS_FILE_WORK.SHIPPERCOUNTRY
   Output: Column in lookup table       COUNTRYNAME

e) Select the Finish button.
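For reference, the wizard translates these parameters into a lookup_ext() call on the Mapping tab. The following is only a sketch of roughly what that generated expression looks like; the exact argument list and the cache and return-policy values (shown here as 'PRE_LOAD_CACHE' and 'MAX' as assumptions) depend on your Data Services version, so compare it with what the Smart Editor actually produces.

    # Illustrative sketch of the generated mapping expression (not exact syntax)
    lookup_ext(
        [ALPHA.SOURCE.COUNTRY, 'PRE_LOAD_CACHE', 'MAX'],      # lookup table, cache spec, return policy (assumed values)
        [COUNTRYNAME],                                         # column returned from the lookup table
        [NULL],                                                # default value when no match is found
        [COUNTRYID, '=', ORDERS_FILE_WORK.SHIPPERCOUNTRY])     # lookup condition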
6. For the SHIPPERREGION output column, change the mapping to perform a lookup of REGIONNAME from the Region table in the Alpha datastore.
a)
Go to the Mapping tab for the output schema field SHIPPERREGION and delete the existing expression by highlighting it and using the Delete button on your keyboard.
b)
Select the Function button and in the Select Function dialog box, open the category of “Database Functions”.
c)
From the list of function names, select the lookup_ext function and select the Next button.
d)
In the Lookup_ext - Select Parameters dialog box, enter the parameters:

   Field/Option                         Value
   Lookup table                         ALPHA.SOURCE.REGION
   Condition: Columns in lookup table   REGIONID
   Condition: Op.(&)                    =
   Condition: Expression                ORDERS_FILE_WORK.SHIPPERREGION
   Output: Column in lookup table       REGIONNAME
e)
Select the Finish button.
f)
Select the Back icon to close the editor.
7. In the transform editor for the Query transform connected to the orders_db_work, create output columns and map input columns to output columns by dragging all columns from the input schema to the output schema.
a)
Double-click the Query transform to open the editor.
b)
In the Schema In workspace, select and drag each field to the Schema Out workspace. This not only creates output columns, but maps input schema columns to output schema columns.
c)
Change the datatype for these Schema Out columns:

   Column              Type
   ORDER_TAKEN_BY      varchar(15)
   ORDER_ASSIGNED_TO   varchar(15)
   SHIPPERCITY         varchar(50)
   SHIPPERCOUNTRY      varchar(50)
   SHIPPERREGION       varchar(50)
8. For the SHIPPERCITY output column, change the mapping to perform a lookup of CITYNAME from the City table in the Alpha datastore.
a)
Go to the Mapping tab for the output schema field SHIPPERCITY and delete the existing expression by highlighting it and using the Delete button on your keyboard.
b)
Select the Function button and in the Select Function dialog box, open the category of “Database Functions”.
c)
From the list of function names, select the lookup_ext function and select the Next button.
d)
In the Lookup_ext - Select Parameters dialog box, enter the parameters:

   Field/Option                         Value
   Lookup table                         ALPHA.SOURCE.CITY
   Condition: Columns in lookup table   CITYID
   Condition: Op.(&)                    =
   Condition: Expression                ORDERS_DB_WORK.SHIPPERCITY
   Output: Column in lookup table       CITYNAME

e) Select the Finish button.
9. For the SHIPPERCOUNTRY output column, change the mapping to perform a lookup of COUNTRYNAME from the Country table in the Alpha datastore.
a)
Go to the Mapping tab for the output schema field SHIPPERCOUNTRY and delete the existing expression by highlighting it and using the Delete button on your keyboard.
b)
Select the Function button and in the Select Function dialog box, open the category of “Database Functions”.
c)
From the list of function names, select the lookup_ext function and select the Next button.
d)
In the Lookup_ext - Select Parameters dialog box, enter the parameters:

   Field/Option                         Value
   Lookup table                         ALPHA.SOURCE.COUNTRY
   Condition: Columns in lookup table   COUNTRYID
   Condition: Op.(&)                    =
   Condition: Expression                ORDERS_DB_WORK.SHIPPERCOUNTRY
   Output: Column in lookup table       COUNTRYNAME

e) Select the Finish button.
10. For the SHIPPERREGION output column, change the mapping to perform a lookup of REGIONNAME from the Region table in the Alpha datastore. a)
Go to the Mapping tab for the output schema field SHIPPERREGION and delete the existing expression by highlighting it and using the Delete button on your keyboard.
b)
Select the Function button and in the Select Function dialog box, open the category of “Database Functions”.
c)
From the list of function names, select the lookup_ext function and select the Next button.
d)
In the Lookup_ext - Select Parameters dialog box, enter the parameters:

   Field/Option                         Value
   Lookup table                         ALPHA.SOURCE.REGION
   Condition: Columns in lookup table   REGIONID
   Condition: Op.(&)                    =
   Condition: Expression                ORDERS_DB_WORK.SHIPPERREGION
   Output: Column in lookup table       REGIONNAME
e)
Select the Finish button.
f)
Select the Back icon to close the editor.
Task 2: Merge the data from the Query transforms into a template table called Orders_Merged from the Delta datastore using a Merge transform.
1. Add a Merge transform to the data flow and connect both Query transforms to the Merge transform.
a)
In the Local Object Library, select the Transforms tab. Then select and drag the Merge transform to the data flow workspace to the right of the Query transforms.
b)
Connect both Query transforms to the Merge transform by selecting each Query transform and holding down the mouse button. Then drag the cursor to the Merge transform and release the mouse button to create the connection.
c)
Double-click the Merge transform to open the editor. Note: At this point, check to make sure that the order of fields in both input schemas is identical. This is a prerequisite for the Merge transform to merge the schemas.
d)
Select the Back icon to close the editor.
e)
In the Tool Palette, select the Template Table icon and select the workspace to add a new template table to the data flow.
f)
In the Create Template dialog box, enter Orders_Merged as the template table name.
g)
In the In datastore drop-down list, select the Delta datastore as the template table destination target.
h)
Select OK.
i)
Connect the Merge transform to the target template table Orders_Merged by selecting the Merge transform and holding down the mouse button. Then drag the cursor to the template table and release the mouse button.
2. Execute the Alpha_Orders_Merged_Job with the default execution properties.
a)
In the Omega project area, right-click on the Alpha_Orders_Merged_Job and select the option Execute.
b)
Data Services prompts you to save any objects that have not been saved. Select OK.
c)
When the Execution Properties dialog box appears, select OK.
d)
Return to the data flow workspace and view the data in the target table to see that the SHIPPERCITY, SHIPPERCOUNTRY and SHIPPERREGION columns for the 363 records in the template table have names rather than ID values.
Lesson Summary
You should now be able to:
• Use the Merge transform

Related Information
• For more information on the Merge transform, see "Transforms", Chapter 5 in the Data Services Reference Guide.
Lesson: Using the Case Transform

Lesson Overview
You want to use the Case transform to simplify branch logic in data flows by consolidating case or decision-making logic into one transform. The transform allows you to divide a data set into smaller sets based on logical branches.
Lesson Objectives
After completing this lesson, you will be able to:
• Use the Case transform
Business Example Your company extracts data from external systems using flat files. The data volume from the various external systems has increased continually in the recent past, making management of the jobs for flat file extraction difficult. You can optimize this process by using Data Services to extract data directly from an external system. You want to use the Case transform to simplify branch logic in data flows by consolidating case or decision-making logic into one transform. The transform allows you to divide a data set into smaller sets based on logical branches.
Using the Case transform
The Case transform supports separating data from a source into multiple targets based on branch logic.

Explaining the Case transform
You use the Case transform to simplify branch logic in data flows by consolidating case or decision-making logic into one transform. The transform allows you to divide a data set into smaller sets based on logical branches. For example, you can use the Case transform to read a table that contains sales revenue facts for different regions and separate the regions into their own tables for more efficient data access:
Figure 78: Introduction to the Case Transform
The next section gives a brief description of the function, data input requirements, options, and data output results for the Case transform.

Only one data flow source is allowed as a data input for the Case transform. Depending on the data, only one of multiple branches is executed per row. The input and output schemas are identical when using the Case transform.

The connections between the Case transform and the objects used for a particular case must be labeled. Each output label in the Case transform must be used at least once. You connect the output of the Case transform with another object in the workspace. Each label represents a case expression (WHERE clause).
Figure 79: Comparison: Case and Validation
Options
The Case transform offers several options:
• Label: Define the name of the connection that describes the path for data if the corresponding Case condition is true.
• Expression: Define the Case expression for the corresponding label.
• Produce default option with label: Specify that the transform must use the expression in this label when all other Case expressions evaluate to false.
• Row can be TRUE for one case only: Specify that the transform passes each row to the first case whose expression returns true.
Figure 80: Case Transform Editor
To create a case statement:
1. Drag the Case transform to the workspace to the right of your source object.
2. Add your target objects to the workspace. One target object is required for each possible condition in the case statement.
3. Connect the source object to the transform.
4. In the parameters area of the transform editor, select Add to add a new expression.
5. In the Label field, enter a label for the expression.
6. Select and drag an input schema column to the Expression pane at the bottom of the window.
7. Define the expression of the condition.
8. To direct records that do not meet any defined conditions to a separate target object, select the Produce default option with label option and enter the label name in the associated field.
9. To direct records that meet multiple conditions to only one target, select the Row can be TRUE for one case only option. In this case, records placed in the target are associated with the first condition that evaluates as true.
Figure 81: Case Transform Reminders
Exercise 17: Using the Case transform

Exercise Objectives
After completing this exercise, you will be able to:
• Use the Case transform
Business Example
The Orders data has been validated and merged from two different sources, flat files and database tables. Now the resulting data set must be partitioned by quarter for reporting purposes. You must use the Case transform to set up the various conditions to partition the merged data into the appropriate quarterly partitions.
Task: The Orders data has been validated and merged from two different sources, flat files and database tables. Now the resulting data set must be partitioned by quarter for reporting purposes. You must use the Case transform to set up the various conditions to create separate tables for orders occurring in fiscal quarter 4 for the year 2006 and quarters 1-4 for the year 2007.
1. In the Omega project, create a new batch job Alpha_Orders_By_Quarter_Job with a new data flow called Alpha_Orders_By_Quarter_DF.
2. In the workspace for Alpha_Orders_By_Quarter_DF, add the Orders_Merged table from the Delta datastore as the source object.
3. Add the Query transform to the workspace between the source and target.
4. In the transform editor for the Query transform, create output columns and map all columns from input to output.
5. Add the Case transform to the workspace to the right of the Query transform and connect them.
6. In the transform editor for the Case transform, create the labels and associated expressions for fiscal quarter 4 of 2006 and quarters 1-4 of 2007.
7. Add five template tables Orders_Q4_2006, Orders_Q1_2007, Orders_Q2_2007, Orders_Q3_2007, and Orders_Q4_2007 in the Delta datastore as output tables for the Case transform and connect them to the Case transform.
8. Execute the Alpha_Orders_By_Quarter_Job with the default execution properties.
Solution 17: Using the Case transform

Task: The Orders data has been validated and merged from two different sources, flat files and database tables. Now the resulting data set must be partitioned by quarter for reporting purposes. You must use the Case transform to set up the various conditions to create separate tables for orders occurring in fiscal quarter 4 for the year 2006 and quarters 1-4 for the year 2007.
1. In the Omega project, create a new batch job Alpha_Orders_By_Quarter_Job with a new data flow called Alpha_Orders_By_Quarter_DF.
a)
In the Project area, right-click the project name and choose New Batch Job from the menu.
b)
Enter the name of the job as Alpha_Orders_By_Quarter_Job.
c)
Press Enter to commit the change.
d)
Open the job Alpha_Orders_By_Quarter_Job by double-clicking it.
e)
Select the Data Flow icon in the Tool Palette.
f)
Select the workspace where you want to add the data flow.
g)
Enter Alpha_Orders_By_Quarter_DF as the name.
h)
Press Enter to commit the change.
i)
Double-click the data flow to open the data flow workspace.
2. In the workspace for Alpha_Orders_By_Quarter_DF, add the Orders_Merged table from the Delta datastore as the source object.
a)
In the Local Object Library, select the Datastores tab and then select the Orders_Merged table from the Delta datastore.
b)
Select and drag the object to the data flow workspace and in the context menu, choose the option Make Source.
3. Add the Query transform to the workspace between the source and target.
a)
In the Tool Palette, select the Query transform icon and select the workspace to add a Query template to the data flow.
b)
Connect the source table to the Query transform by selecting the source table and holding down the mouse button, drag the cursor to the Query transform. Then release the mouse button.
4. In the transform editor for the Query transform, create output columns and map all columns from input to output.
a)
Double-click the Query transform to open the editor.
b)
In the Schema In workspace, select all the fields by selecting the first and last fields in the list while holding down the Shift key.
c)
Drag all the selected fields in the input schema to the Schema Out workspace.
d)
In the Schema Out workspace, right-click the last column field to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name ORDERQUARTER with Data Type int.
e)
In the Schema Out workspace, right-click ORDERQUARTER to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name ORDERYEAR with Data Type varchar(4).
f)
Go to the Mapping tab for the output schema field ORDERQUARTER and select the Function button. In the Select Function dialog box, open the category of "Date Functions". From the list of function names, select the Quarter function and select the Next button. In the field Input string, select the drop-down arrow to select the table Orders_Merged from the Delta datastore. From the table Orders_Merged, select the field ORDERDATE, select the OK button, and in the next dialog box, select the Finish button.
g)
Go to the Mapping tab for the output schema field ORDERQUARTER and Select the Function button and in the Select Function dialog box, open the category of “Conversion Functions”. From the list of function names, select the to_char function and select the Next button. In the field Input string, select the drop-down arrow to select the table Orders_Merged from the Delta datastore. From the table Orders_Merged, select the field ORDERDATE. Select the format YYYY and select the OK button. In the next dialog box, select the Finish button.
h)
Select the Back icon to close the editor.
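The resulting mapping expressions should look approximately like the following (assuming the input schema keeps the source table name Orders_Merged; adjust the qualifier if your Query transform shows a different input name):

   ORDERQUARTER:  quarter(Orders_Merged.ORDERDATE)
   ORDERYEAR:     to_char(Orders_Merged.ORDERDATE, 'YYYY')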
5. Add the Case transform to the workspace to the right of the Query transform and connect them.
   a) In the Local Object Library, select the Transforms tab, then select and drag the Case transform into the data flow workspace.
   b) Connect the Query transform to the Case transform by selecting the Query transform, holding down the mouse button, and dragging the cursor to the Case transform. Then release the mouse button.
6. In the transform editor for the Case transform, create the labels and associated expressions for fiscal quarter 4 of 2006 and quarters 1-4 of 2007 (the completed expressions are shown after this step).
   a) Double-click the Case transform to open the transform editor.
   b) In the parameters area of the transform editor, select Add to add a new condition. In the Label field, enter the label Q42006. In the Expression workspace at the bottom of the window, drag in the input schema column ORDERYEAR and type ='2006', type the keyword and, then drag in the input schema column ORDERQUARTER and type ='4' to complete the expression for the first condition.
   c) Select Add to add a new condition. In the Label field, enter the label Q12007. In the Expression workspace, drag in the input schema column ORDERYEAR and type ='2007', type the keyword and, then drag in the input schema column ORDERQUARTER and type ='1' to complete the expression for the second condition.
   d) Select Add to add a new condition. In the Label field, enter the label Q22007. In the Expression workspace, drag in the input schema column ORDERYEAR and type ='2007', type the keyword and, then drag in the input schema column ORDERQUARTER and type ='2' to complete the expression for the third condition.
   e) Select Add to add a new condition. In the Label field, enter the label Q32007. In the Expression workspace, drag in the input schema column ORDERYEAR and type ='2007', type the keyword and, then drag in the input schema column ORDERQUARTER and type ='3' to complete the expression for the fourth condition.
   f) Select Add to add a new condition. In the Label field, enter the label Q42007. In the Expression workspace, drag in the input schema column ORDERYEAR and type ='2007', type the keyword and, then drag in the input schema column ORDERQUARTER and type ='4' to complete the expression for the fifth condition.
   g) To direct records that do not meet any defined conditions to a separate target object, select the check box Produce default output with label and enter the label name default in the associated field.
   h) To direct records that might meet multiple conditions to only one target, select the check box Row can be TRUE for one case only. In this case, records are placed in the target associated with the first condition that evaluates as true.
   i) Select Back to return to the data flow workspace.
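The five completed Case expressions should look approximately like this (assuming the input schema is named Query, as produced by the Query transform above; since ORDERQUARTER is an int, the quarter value can equally be compared without quotes):

   Q42006: Query.ORDERYEAR = '2006' and Query.ORDERQUARTER = '4'
   Q12007: Query.ORDERYEAR = '2007' and Query.ORDERQUARTER = '1'
   Q22007: Query.ORDERYEAR = '2007' and Query.ORDERQUARTER = '2'
   Q32007: Query.ORDERYEAR = '2007' and Query.ORDERQUARTER = '3'
   Q42007: Query.ORDERYEAR = '2007' and Query.ORDERQUARTER = '4'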
7. Add five template tables Orders_Q4_2006, Orders_Q1_2007, Orders_Q2_2007, Orders_Q3_2007, and Orders_Q4_2007 in the Delta datastore as output tables for the Case transform and connect them to the Case transform.
   a) In the Tool Palette, select the Template Table icon and select the workspace to add a new template table to the data flow. In the Create Template dialog box, enter Orders_Q4_2006 as the template table name. In the In datastore drop-down list, select the Delta datastore as the template table destination target and select OK.
   b) In the Tool Palette, select the Template Table icon and select the workspace to add a new template table to the data flow. In the Create Template dialog box, enter Orders_Q1_2007 as the template table name. In the In datastore drop-down list, select the Delta datastore as the template table destination target and select OK.
   c) In the Tool Palette, select the Template Table icon and select the workspace to add a new template table to the data flow. In the Create Template dialog box, enter Orders_Q2_2007 as the template table name. In the In datastore drop-down list, select the Delta datastore as the template table destination target and select OK.
   d) In the Tool Palette, select the Template Table icon and select the workspace to add a new template table to the data flow. In the Create Template dialog box, enter Orders_Q3_2007 as the template table name. In the In datastore drop-down list, select the Delta datastore as the template table destination target and select OK.
   e) In the Tool Palette, select the Template Table icon and select the workspace to add a new template table to the data flow. In the Create Template dialog box, enter Orders_Q4_2007 as the template table name. In the In datastore drop-down list, select the Delta datastore as the template table destination target and select OK.
   f) To connect the output from the Case transform to the target template tables, select the Case transform, hold down the mouse button, and drag to a template table. From the pop-up menu, select the label that corresponds to the table name. Repeat this step for each of the five template tables.
8. Execute the Alpha_Orders_By_Quarter_Job with the default execution properties.
   a) In the Omega project area, right-click the Alpha_Orders_By_Quarter_Job and select the option Execute.
   b) Data Services prompts you to save any objects that have not been saved. Select OK.
   c) When the Execution Properties dialog box appears, select OK.
   d) View the data in the target tables and confirm that there are 103 orders that were placed in fiscal quarter 1 of 2007.
Lesson Summary
You should now be able to:
• Use the Case transform

Related Information
• For more information on the Case transform, see "Transforms", Chapter 5 in the Data Services Reference Guide.
Lesson: Using the SQL Transform

Lesson Overview
You want to use the SQL transform to submit SQL commands to generate data to be moved into target objects.

Lesson Objectives
After completing this lesson, you will be able to:
• Use the SQL transform

Business Example
Your company extracts data from external systems using flat files. The data volume from the various external systems has increased continually in the recent past, making management of the jobs for flat file extraction difficult. You can optimize this process by using Data Services to extract data directly from an external system. You want to use the SQL transform to submit SQL commands to generate data to be moved into target objects where other transforms do not meet business requirements.

Using the SQL transform
The SQL transform allows you to submit SQL commands to generate data to be moved into target objects.

Explaining the SQL transform
Use this transform to perform standard SQL operations when other built-in transforms do not perform them.
Figure 82: The SQL Transform Editor
The SQL transform can be used to run general SELECT statements as well as to call stored procedures and query views. You can use the SQL transform as a replacement for the Merge transform when you are dealing with database tables only; the SQL transform performs more efficiently because the merge is pushed down to the database (an example appears after the procedure below). However, you cannot use this technique if your source objects include file formats. The next section gives a brief description of the function, data input requirements, options, and data output results for the SQL transform.

Inputs/Outputs
There is no input data set for the SQL transform. There are two ways of defining the output schema for a SQL transform if the SQL submitted is expected to return a result set:
• Automatic: After you type the SQL statement, select Update schema to execute a select statement against the database that obtains the column information returned by the select statement and populates the output schema.
• Manual: Output columns must be defined in the output portion of the SQL transform if the SQL operation is returning a data set. The number of columns defined in the output of the SQL transform must equal the number of columns returned by the SQL query. The column names and data types of the output columns do not need to match the column names or data types in the SQL query.
The SQL transform has these options:
• Datastore: Specify the datastore for the tables referred to in the SQL statement.
• Database type: Specify the type of database for the datastore where there are multiple datastore configurations.
• Join rank: Indicate the weight of the output data set if the data set is used in a join. The highest ranked source is accessed first to construct the join.
• Array fetch size: Indicate the number of rows retrieved in a single request to a source database. The default value is 1000.
• Cache: Hold the output from this transform in memory for use in subsequent transforms. Use this option only if the data set is small enough to fit in memory.
• SQL text: Enter the text of the SQL query.
To create a SQL statement:
1. On the Transforms tab of the Local Object Library, select and drag the SQL transform to the workspace.
2. Add your target object to the workspace.
3. Connect the transform to the target object.
4. Double-click the SQL transform to open the transform editor.
5. In the parameters area, select the source datastore from the Datastore drop-down list.
6. In the SQL text area, enter the SQL statement. For example, to copy the entire contents of a table into the target object, you would use the statement: Select * from Customers.
7. Select Update Schema to update the output schema with the appropriate values.
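As a further illustration, when both sources are database tables the SQL transform can take the place of a Merge transform because the union is pushed down to the database. A minimal sketch of such SQL text, with purely illustrative table names:

   select * from ORDERS_NORTH
   union all
   select * from ORDERS_SOUTH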
Exercise 18: Using the SQL transform

Exercise Objectives
After completing this exercise, you will be able to:
• Use the SQL transform

Business Example
Your company extracts data from external systems using flat files. The data volume from the various external systems has increased continually in the recent past, making management of the jobs for flat file extraction difficult. You can optimize this process by using Data Services to extract data directly from an external system. You want to use the SQL transform to submit SQL commands to generate data to be moved into target objects where other transforms do not meet business requirements.
Task: The contents of the Employee and Department tables must be merged; this can be accomplished with the SQL transform.
1. In the Omega project, create a new batch job called Alpha_Employees_Dept_Job containing a data flow called Alpha_Employees_Dept_DF.
2. Add a SQL transform to the data flow and connect it to the Emp_Dept table from the HR_datamart datastore as the target object.
3. In the transform editor for the SQL transform, specify the source datastore and tables.
4. Execute the Alpha_Employees_Dept_Job with the default execution properties.

Result
You should have 40 rows in your target table, because there were 8 employees in the Employee table with department IDs that were not defined in the Department table; those rows are excluded by the join.
Solution 18: Using the SQL transform

Task: The contents of the Employee and Department tables must be merged; this can be accomplished with the SQL transform.
1. In the Omega project, create a new batch job called Alpha_Employees_Dept_Job containing a data flow called Alpha_Employees_Dept_DF.
   a) In the Project area, right-click the project name and choose New Batch Job from the menu.
   b) Enter the name of the job as Alpha_Employees_Dept_Job.
   c) Press Enter to commit the change.
   d) Open the job Alpha_Employees_Dept_Job by double-clicking it.
   e) Select the Data Flow icon in the Tool Palette.
   f) Select the workspace where you want to add the data flow.
   g) Enter Alpha_Employees_Dept_DF as the name.
   h) Press Enter to commit the change.
   i) Double-click the data flow to open the data flow workspace.
2. Add a SQL transform to the data flow and connect it to the Emp_Dept table from the HR_datamart datastore as the target object.
   a) In the Local Object Library, select the Transforms tab. Then select and drag the SQL transform to the data flow workspace.
   b) In the Local Object Library, select the Datastores tab, select the Emp_Dept table from the HR_datamart datastore, drag the object to the data flow workspace, and in the context menu choose the option Make Target.
3. In the transform editor for the SQL transform, specify the source datastore and tables.
   a) Double-click the SQL transform to open the transform editor.
   b) For the field Datastore, use the drop-down list to select the Alpha datastore.
   c) For the field Database type, use the drop-down list to select SQL Server 2005.
   d) Create a SQL statement to select the employee ID, first name, and last name from the Employee table and the name of the department to which the employee belongs, looking up the value in the Department table based on the department ID. Enter the expression:
      SELECT EMPLOYEE.EMPLOYEEID, EMPLOYEE.FIRSTNAME, EMPLOYEE.LASTNAME, DEPARTMENT.DEPARTMENTNAME
      FROM ALPHA.SOURCE.EMPLOYEE, ALPHA.SOURCE.DEPARTMENT
      WHERE EMPLOYEE.DEPARTMENTID = DEPARTMENT.DEPARTMENTID
   e) To create the output schema, select the button Update schema; this creates the output column fields.
   f) Right-click the EMPLOYEEID column and select the option Set as primary key.
   g) Select the Back icon to close the editor.
   h) Connect the SQL transform to the target table by selecting the SQL transform and, while holding down the mouse button, dragging to the target table. Release the button to create the link.
4. Execute the Alpha_Employees_Dept_Job with the default execution properties.
   a) In the Omega project area, right-click the Alpha_Employees_Dept_Job and select the option Execute.
   b) Data Services prompts you to save any objects that have not been saved. Select OK.
   c) When the Execution Properties dialog box appears, select OK.
   d) Return to the data flow workspace and view the data in the target table.

Result
You should have 40 rows in your target table, because there were 8 employees in the Employee table with department IDs that were not defined in the Department table; those rows are excluded by the join.
Lesson Summary
You should now be able to:
• Use the SQL transform

Related Information
• For more information on the SQL transform, see "Transforms", Chapter 5 in the Data Services Reference Guide.
Unit Summary
You should now be able to:
• Describe platform transforms
• Use the Map Operation transform in a data flow
• Use the Validation transform
• Use the Merge transform
• Use the Case transform
• Use the SQL transform
Unit 7: Setting Up Error Handling

Unit Overview
If a Data Services job does not complete properly, you must resolve the problems that prevented the successful execution of the job. The best solution to data recovery situations is obviously not to get into them in the first place. Some of those situations are unavoidable, such as server failures. Others, however, can easily be sidestepped by constructing your jobs so that they take into account the issues that frequently cause them to fail.

Unit Objectives
After completing this unit, you will be able to:
• Explain the levels of data recovery strategies
• Use recoverable work flows using a try/catch block with a conditional

Unit Contents
Lesson: Setting Up Error Handling
Exercise 19: Creating an Alternative Work Flow
Lesson: Setting Up Error Handling

Lesson Overview
For sophisticated error handling, you can use recoverable work flows and try/catch blocks to recover data.

Lesson Objectives
After completing this lesson, you will be able to:
• Explain the levels of data recovery strategies
• Use recoverable work flows using a try/catch block with a conditional

Business Example
Your company would like to establish a direct open connection to your Microsoft Access database. In considering DB Connect technology, you have decided that this technology is not a suitable option for your particular database. You need a data staging option that can support a wider range of source systems. The options provided by the SAP NetWeaver Business Warehouse connection are considerably enhanced by universal data integration so that almost all data sources can be directly connected to SAP BW.

Using recovery mechanisms
If a Data Services job does not complete properly, you must resolve the problems that prevented the successful execution of the job.

Avoiding data recovery situations
The best solution to data recovery situations is obviously not to get into them in the first place. Some of those situations are unavoidable, such as server failures. Others, however, can easily be sidestepped by constructing your jobs so that they take into account the issues that frequently cause them to fail. One example is when an external file is required to run a job. In this situation, you could use the wait_for_file function or a while loop and the file_exists function to check that the file exists in a specified location before executing the job.

The while loop is a single-use object that you can use in a work flow. The while loop repeats a sequence of steps as long as a condition is true. Typically, the steps done during the while loop result in a change in the condition so that the condition is eventually no longer satisfied and the work flow exits from the while loop. If the condition does not change, the while loop does not end.
For example, you might want a work flow to wait until the system writes a particular file. You can use a while loop to check for the existence of the file using the file_exists function. As long as the file does not exist, you can have the work flow go into sleep mode for a particular length of time before checking again. Because the system might never write the file, you must add another check to the loop, such as a counter, to ensure that the while loop eventually exits. In other words, change the while loop to check for the existence of the file and the value of the counter. As long as the file does not exist and the counter is less than a particular value, repeat the while loop. In each iteration of the loop, put the work flow in sleep mode and then increment the counter.
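A minimal script sketch of this wait-and-retry pattern in the Data Services scripting language (the file path, the one-minute sleep interval, the ten-attempt limit, and the variable name are illustrative assumptions, not values from this course):

   # Wait for an external file, but give up after 10 checks so the loop cannot run forever.
   $G_Attempts = 0;
   while (file_exists('C:/incoming/orders.txt') = 0 and $G_Attempts < 10)
   begin
      sleep(60000);                       # pause for 60,000 milliseconds (one minute)
      $G_Attempts = $G_Attempts + 1;      # increment the counter so the loop eventually exits
   end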
Describing levels of data recovery strategies
When a job fails to complete successfully during execution, some data flows may not have completed. When this happens, some tables may have been loaded, partially loaded, or altered.

Figure 83: Recovery Mechanisms
You need to design your data movement jobs so that you can recover your data by rerunning the job and retrieving all the data without introducing duplicate or missing data.
There are different levels of data recovery and recovery strategies. You can: •
• •
•
•
Recover your entire database: Use your standard RDBMS services to restore crashed data cache to an entire database. This option is outside of the scope of this course. Recover a partially-loaded job: Use automatic recovery. Recover from partially-loaded tables: Use the Table Comparison transform, do a full replacement of the target, use the auto-correct load feature, include a preload SQL command to avoid duplicate loading of rows when recovering from partially loaded tables. Recover missing values or rows: Use the Validation transform or the Query transform with WHERE clauses to identify missing values, and use overflow files to manage rows that could not be inserted. Define alternative work flows: Use conditionals, try/catch blocks, and scripts to ensure all exceptions are managed in a work flow.
Depending on the relationships between data flows in your application, you may use a combination of these techniques to recover from exceptions. Note: Some recovery mechanisms are for use in production systems and are not supported in development environments Configuring work flows and data flows In some cases, steps in a work flow depend on each other and must be executed together. When there is a dependency like this, you should designate the work flow as a recovery unit. This requires the entire work flow to complete successfully. If the work flow does not complete successfully, Data Services executes the entire work flow during recovery, including the steps that executed successfully in prior work flow runs. Conversely, you may need to specify that a work flow or data flow should only execute once. When this setting is enabled, the job never re–executes that object. We do not recommend marking a work flow or data flow as “Execute only once” if the parent work flow is a recovery unit. To specify a work flow as a recovery unit 1.
In the project area or on the Work Flows tab of the Local Object Library, right–click the work flow and select Properties from the menu. The Properties dialog box displays.
2. 3.
296
On the General tab, select the Recover as a unit check box. Select OK.
© 2011 SAP AG. All rights reserved.
2011
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
BODS10
Lesson: Setting Up Error Handling
To specify that an object executes only once 1.
In the project area or on the appropriate tab of the Local Object Library, right–click the work flow or data flow and select Properties from the menu. The Properties dialog box displays.
2. 3.
On the General tab, select the Execute only once check box. Select OK.
Using recovery mode
If a job with automated recovery enabled fails during execution, you can execute the job again in recovery mode. During recovery mode, Data Services retrieves the results for successfully completed steps and reruns incomplete or failed steps under the same conditions as the original job. In recovery mode, Data Services executes the steps or recovery units that did not complete successfully in a previous execution. This includes steps that failed and steps that generated an exception but completed successfully, such as those in a try/catch block. As in normal job execution, Data Services executes the steps in parallel if they are not connected in the work flow diagrams and in serial if they are connected.

For example, suppose a daily update job running overnight successfully loads dimension tables in a warehouse. However, while the job is running, the database log overflows and stops the job from loading fact tables. The next day, you truncate the log file and run the job again in recovery mode. The recovery job does not reload the dimension tables because the original job, even though it failed, successfully loaded them. To ensure that the fact tables are loaded with data that corresponds properly to the data already loaded in the dimension tables, ensure that:
Your recovery job must use the same extraction criteria that your original job used when loading the dimension tables. If your recovery job uses new extraction criteria, such as basing data extraction on the current system date, the data in the fact tables will not correspond to the data previously extracted into the dimension tables. If your recovery job uses new values, the job execution may follow a completely different path with conditional steps or try/catch blocks.
Your recovery job must follow the exact execution path that the original job followed. Data Services records any external inputs to the original job so that your recovery job can use these stored values and follow the same execution path.
To enable automatic recovery in a job:
1. In the project area, right-click the job and select Execute from the menu. The Execution Properties dialog box displays.
2. On the Parameters tab, select the Enable recovery check box. If this check box is not selected, Data Services does not record the results from the steps during the job and cannot recover the job if it fails.
3. Select OK.

To recover from the last execution:
1. In the project area, right-click the job that failed and select Execute from the menu. The Execution Properties dialog box displays.
2. On the Parameters tab, select the Recover from last execution check box. This option is not available when a job has not yet been executed, the previous job run succeeded, or recovery mode was disabled during the previous run.
3. Select OK.
Recovering from partially loaded data
Executing a failed job again may result in duplication of rows that were loaded successfully during the first job run.
Within your recoverable work flow, you can use several methods to ensure that you do not insert duplicate rows:
• Include the Table Comparison transform (available in Data Integrator packages only) in your data flow when you have tables with more rows and fewer fields, such as fact tables.
• Change the target table options to completely replace the target table during each execution. This technique can be optimal when the changes to the target table are numerous compared to the size of the table.
• Change the target table options to use the auto-correct load feature when you have tables with fewer rows and more fields, such as dimension tables. The auto-correct load checks the target table for existing rows before adding new rows to the table. Using the auto-correct load option, however, can slow jobs executed in nonrecovery mode. Consider this technique when the target table is large and the changes to the table are relatively few.
• Include a SQL command to execute before the table loads (see the sketch after this list). Preload SQL commands can remove partial database updates that occur during incomplete execution of a step in a job. Typically, the preload SQL command deletes rows based on a variable that is set before the partial insertion step began. For more information on preloading SQL commands, see "Using preload SQL to allow re-executable Data Flows", Chapter 18 in the Data Services Designer Guide.
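A minimal sketch of the preload SQL approach, assuming a hypothetical fact table ORDERS_FACT with a JOB_RUN_ID column, where the run identifier was captured in a variable before the load step began and 1042 stands in for that value:

   -- Remove any rows left behind by the interrupted run before reloading them
   delete from ORDERS_FACT where JOB_RUN_ID = 1042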
Recovering missing values or rows
Missing values that are introduced into the target data during data integration and data quality processes can be managed using the Validation or Query transforms. Missing rows are rows that cannot be inserted into the target table. For example, rows may be missing in instances where a primary key constraint is violated. Overflow files help you process this type of data problem. When you specify an overflow file and Data Services cannot load a row into a table, Data Services writes the row to the overflow file instead. The trace log indicates the data flow in which the load failed and the location of the file. You can use the overflow information to identify invalid data in your source or problems introduced in the data movement. Every new run will overwrite the existing overflow file.
To use an overflow file in a job:
1. Open the target table editor for the target table in your data flow.
2. On the Options tab, under Error handling, select the Use overflow file check box.
3. In the File name field, enter or browse to the full path and file name for the file. When you specify an overflow file, give a full path name to ensure that Data Services creates a unique file when more than one file is created in the same job.
4. In the File format drop-down list, select what you want Data Services to write to the file about the rows that failed to load:
   • If you select Write data, you can use Data Services to specify the format of the error-causing records in the overflow file.
   • If you select Write sql, you can use the commands to load the target manually when the target is accessible.
Defining alternative work flows
You can set up your jobs to use alternative work flows that cover all possible exceptions and have recovery mechanisms built in. This technique allows you to automate the process of recovering your results.
Figure 84: Alternative Workflow with Try/Catch Blocks
Alternative work flows consist of several components:
1. A script to determine if recovery is required. This script reads the value in a status table and populates a global variable with the same value (see the example after this list). The initial value in the table is set to indicate that recovery is not required.
2. A conditional that calls the appropriate work flow based on whether recovery is required. The conditional contains an If/Then/Else statement to specify that work flows that do not require recovery are processed one way, and those that do require recovery are processed another way.
3. A work flow with a try/catch block to execute a data flow without recovery. The data flow where recovery is not required is set up without the auto correct load option set. This ensures that, wherever possible, the data flow is executed in a less resource-intensive mode.
4. A script in the catch object to update the status table. The script specifies that recovery is required if any exceptions are generated.
5. A work flow to execute a data flow with recovery and a script to update the status table. The data flow is set up for more resource-intensive processing that will resolve the exceptions. The script updates the status table to indicate that recovery is not required.
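For example, the status-check script (component 1) can read the flag into a global variable with a single lookup; this is the same expression used in the exercise later in this lesson, and the datastore, table, and column names follow that exercise:

   $G_Recovery_Needed = sql('hr_datamart', 'select recovery_flag from recovery_status');

Similarly, the catch script (component 4) can mark that recovery is required:

   sql('hr_datamart', 'update recovery_status set recovery_flag = 1');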
Conditionals
Conditionals are single-use objects used to implement conditional logic in a work flow.
Figure 85: Workflow with Conditional Decision
When you define a conditional, you must specify a condition and two logical branches:
• If: A Boolean expression that evaluates to TRUE or FALSE. You can use functions, variables, and standard operators to construct the expression.
• Then: The work flow element to execute if the If expression evaluates to TRUE.
• Else: The work flow element to execute if the If expression evaluates to FALSE.
Both the Then and Else branches of the conditional can contain any object that you can have in a work flow, including other work flows, data flows, nested conditionals, try/catch blocks, scripts, and so on.

Try/Catch Blocks
A try/catch block allows you to specify alternative work flows if errors occur during job execution. Try/catch blocks catch classes of errors, apply solutions that you provide, and continue execution. For each catch in the try/catch block, you can specify:
• One exception or group of exceptions handled by the catch. To handle more than one exception or group of exceptions, add more catches to the try/catch block.
• The work flow to execute if the indicated exception occurs. Use an existing work flow or define a work flow in the catch editor.
If an exception is thrown during the execution of a try/catch block, and if no catch is looking for that exception, then the exception is handled by normal error logic.

Using try/catch blocks and automatic recovery
Data Services does not save the result of a try/catch block for reuse during recovery. If an exception is thrown inside a try/catch block, during recovery Data Services executes the step that threw the exception and subsequent steps. Since the execution path within the try/catch block might be different in the recovered job, using variables set in the try/catch block could alter the results during automatic recovery. For example, suppose you create a job that defines the value of variable $I within a try/catch block. If an exception occurs, you set an alternate value for $I. Subsequent steps are based on the new value of $I.
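A compact sketch of the $I scenario in the Data Services scripting language (the values 1 and 2 are placeholders; the point is only that the catch branch assigns a different value than the try branch):

   # script placed in the try branch of the work flow
   $I = 1;

   # script placed in the catch branch, executed only if an exception is raised
   $I = 2;

   # a later conditional branches on $I, so a recovery run that no longer raises
   # the exception can follow a different path than the original run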
During the first job execution, the first work flow contains an error that generates an exception, which is caught. However, the job fails in the subsequent work flow.
Figure 86: Workflow First Execution Captures Error
You fix the error and run the job in recovery mode. During the recovery execution, the first work flow no longer generates the exception. Thus the value of variable $I is different, and the job selects a different subsequent work flow, producing different results.
Figure 87: Conditional Changes Execution Path
To ensure proper results with automatic recovery when a job contains a try/catch block, do not use values set inside the try/catch block or reference output variables from a try/catch block in any subsequent steps.
Exercise 19: Creating an Alternative Work Flow

Exercise Objectives
After completing this exercise, you will be able to:
• Use recoverable work flows using a try/catch block with a conditional to catch exceptions

Business Example
With the influx of new employees resulting from Alpha's acquisition of new companies, the Employee Department information needs to be updated regularly. Since this information is used for payroll, it is critical that there is no loss of records if a job is interrupted. You need to set up the job in a way that exceptions are always managed. This involves setting up a conditional that executes a less resource-intensive update of the table first. If that generates an exception, the conditional then tries a version of the same data flow that is configured to auto correct the load.
Task: Set up a job Alpha_Employees_Dept_Recovery_Job with a try/catch block and conditional to catch exceptions in the execution of a data flow Alpha_Employees_Dept_DF. Exceptions cause the conditional to execute a different version of the same data flow, Alpha_Employees_Dept_AC_DF, configured with auto correction.
1. Replicate the data flow Alpha_Employees_Dept_DF as Alpha_Employees_Dept_AC_DF in the Local Object Library and reconfigure the target tables in both data flows for auto correction.
2. In the Omega project, create a new batch job and data flow called Alpha_Employees_Dept_Recovery_Job and a new global variable $G_Recovery_Needed.
3. In the workspace of the Alpha_Employees_Dept_Recovery_Job, add a work flow called Alpha_Employees_Dept_Recovery_WF.
4. In the work flow Alpha_Employees_Dept_Recovery_WF workspace, add a script called GetStatus and construct an expression to update the value of the global variable $G_Recovery_Needed to the same value as in the recovery_flag column in the recovery_status table in the HR_datamart.
5. In the work flow workspace, add a Conditional called Alpha_Employees_Dept_Con connected to the script.
6. Configure the Conditional as an "if" statement, which determines which data flow to execute based upon the value of the global variable $G_Recovery_Needed.
7. Execute Alpha_Employees_Dept_Recovery_Job with the default properties.
Solution 19: Creating an Alternative Work Flow

Task: Set up a job Alpha_Employees_Dept_Recovery_Job with a try/catch block and conditional to catch exceptions in the execution of a data flow Alpha_Employees_Dept_DF. Exceptions cause the conditional to execute a different version of the same data flow, Alpha_Employees_Dept_AC_DF, configured with auto correction.
1. Replicate the data flow Alpha_Employees_Dept_DF as Alpha_Employees_Dept_AC_DF in the Local Object Library and reconfigure the target tables in both data flows for auto correction.
   a) In the Local Object Library, select the Dataflows tab and right-click the Alpha_Employees_Dept_DF data flow to choose the option Replicate.
   b) Change the name of the replicated data flow to Alpha_Employees_Dept_AC_DF by double-clicking the name to enter editor mode. After entering the new name, press Enter to save the name.
   c) Double-click the data flow Alpha_Employees_Dept_DF to open its workspace and double-click the target table Emp_Dept.
   d) In the workspace for the target table editor, deselect the check boxes for the options Delete data from table before loading and Auto correct load. Select Back to return to the data flow workspace.
   e) In the Local Object Library, select the Dataflows tab, double-click the Alpha_Employees_Dept_AC_DF data flow to open its workspace, and double-click the target table Emp_Dept.
   f) In the workspace for the target table editor, deselect the check box for the option Delete data from table before loading.
   g) In the workspace for the target table editor, select the check box for the option Auto correct load. Select Back to return to the data flow workspace.
2. In the Omega project, create a new batch job and data flow called Alpha_Employees_Dept_Recovery_Job and a new global variable $G_Recovery_Needed.
   a) In the project area, right-click the Omega project to select the option New batch job and enter the name Alpha_Employees_Dept_Recovery_Job.
   b) In the project area, select the job Alpha_Employees_Dept_Recovery_Job and then use the menu path Tools → Variables.
   c) Right-click Variables and select Insert from the menu.
   d) Right-click the new variable, select Properties from the menu, and enter $G_Recovery_Needed in the Global Variable Properties dialog box. In the Data type drop-down list, select int for the data type and select OK.
3. In the workspace of the Alpha_Employees_Dept_Recovery_Job, add a work flow called Alpha_Employees_Dept_Recovery_WF.
   a) Double-click the job Alpha_Employees_Dept_Recovery_Job to open its workspace.
   b) From the Tool Palette, select the work flow icon, drag it into the workspace, and enter the name Alpha_Employees_Dept_Recovery_WF.
   c) Double-click the work flow Alpha_Employees_Dept_Recovery_WF to open its workspace.
4. In the work flow Alpha_Employees_Dept_Recovery_WF workspace, add a script called GetStatus and construct an expression to update the value of the global variable $G_Recovery_Needed to the same value as in the recovery_flag column in the recovery_status table in the HR_datamart.
   a) From the Tool Palette, select the Script icon and then double-click in the work flow workspace to insert the script.
   b) Name the script GetStatus.
   c) Double-click the script to open it and create an expression to update the value of the global variable to the value in the recovery_flag column in the recovery_status table in the HR_datamart. Type in this expression:
      $G_Recovery_Needed = sql('hr_datamart', 'select recovery_flag from recovery_status');
   d) Close the script and return to the work flow workspace.
5. In the work flow workspace, add a Conditional called Alpha_Employees_Dept_Con connected to the script.
   a) From the Tool Palette, select and drag the icon for a Conditional into the work flow workspace.
   b) Select the script and, holding down the mouse button, drag to the Conditional. Release the mouse button to create the connection between the script and the conditional.
   c) Double-click the Conditional to open its workspace.
6. Configure the Conditional as an "if" statement, which determines which data flow to execute based upon the value of the global variable $G_Recovery_Needed.
   a) In the editor for the conditional, enter an "If" statement that states that recovery is not required. Enter the expression: $G_Recovery_Needed = 0.
   b) From the Tool Palette, select and drag the icon for the Try object and double-click in the "Then" pane of the Conditional editor to insert it. Give the Try object the name Alpha_Employees_Dept_Try.
   c) In the Local Object Library, select and drag the data flow Alpha_Employees_Dept_DF into the "Then" pane of the Conditional editor.
   d) Connect the Alpha_Employees_Dept_Try Try object to the data flow Alpha_Employees_Dept_DF by right-clicking the Try object and dragging to the data flow while holding down the mouse button. Release the mouse button to create the connection.
   e) From the Tool Palette, select and drag the icon for the Catch object and double-click in the "Then" pane of the Conditional editor to insert it. Give the Catch object the name Alpha_Employees_Dept_Catch.
   f) Connect the data flow Alpha_Employees_Dept_DF to the Alpha_Employees_Dept_Catch Catch object by selecting the data flow and dragging to the Catch object while holding down the mouse button. Release the mouse button to create the connection.
   g) Double-click the Catch object Alpha_Employees_Dept_Catch to open its editor.
   h) From the Tool Palette, select the Script icon and then double-click in the Catch object editor to insert the script into the lower pane.
   i) Double-click the script to open it and create an expression to update the flag in the recovery status table to 1, indicating that recovery is needed. Type in this expression, then close the script:
      sql('hr_datamart', 'update recovery_status set recovery_flag = 1');
   j) In the Local Object Library, select the Dataflows tab and drag the data flow Alpha_Employees_Dept_AC_DF into the "Else" pane of the Conditional workspace.
   k) From the Tool Palette, select the Script icon and then double-click in the "Else" pane of the Conditional editor. Enter Recovery_Pass as the name of the script.
   l) Double-click the script to open it and create an expression to update the flag in the recovery status table to 0, indicating that recovery is not needed. Type in this expression, then close the script:
      sql('hr_datamart', 'update recovery_status set recovery_flag = 0');
   m) Connect the data flow Alpha_Employees_Dept_AC_DF to the script Recovery_Pass by selecting the data flow and dragging to the script.
7. Execute Alpha_Employees_Dept_Recovery_Job with the default properties.
   a) In the project area, select the Alpha_Employees_Dept_Recovery_Job and choose the option Execute.
   b) Select Save to save all objects you have created.
   c) In the next dialog box, accept all the default execution properties and select OK.
      Note: The trace log indicates that the data flow generated an error, but the job completed successfully due to the try/catch block. An error log was generated, which indicates a primary key conflict in the target table.
   d) Execute the Alpha_Employees_Dept_Recovery_Job a second time.
      Note: In the log, note that the job succeeds and that the data flow used was Alpha_Employees_Dept_AC_DF.
Lesson Summary
You should now be able to:
• Explain the levels of data recovery strategies
• Use recoverable work flows using a try/catch block with a conditional
Unit Summary
You should now be able to:
• Explain the levels of data recovery strategies
• Use recoverable work flows using a try/catch block with a conditional
Unit 8 Capturing Changes in Data Unit Overview The design of your data warehouse must take into account how you are going to handle changes in your target system the respective data in your source system changes. Data Integrator transforms provides you with a mechanism to do this. Slow Changing Dimensions (SCD) are dimensions, prevalent in data warehouses,, that have data that changes over time. Three methods of handling these SCDs: no history preservation, unlimited history preservation and new rows, and limited history preservation.
Unit Objectives
After completing this unit, you will be able to:
• Update data which changes slowly over time
• Use source-based CDC (Change Data Capture)
• Use time stamps in source-based CDC
• Manage issues related to using time stamps for source-based CDC
• Use target-based CDC
Unit Contents
Lesson: Capturing Changes in Data
Lesson: Using Source-Based Change Data Capture (CDC)
  Exercise 20: Using Source-Based Change Data Capture (CDC)
Lesson: Using Target-Based Change Data Capture (CDC)
  Exercise 21: Using Target-Based Change Data Capture (CDC)
Lesson: Capturing Changes in Data

Lesson Overview
The design of your data warehouse must take into account how you are going to handle changes in your target system when the respective data in your source system changes. Data Integrator transforms provide you with a mechanism to do this.
Lesson Objectives
After completing this lesson, you will be able to:
• Update data which changes slowly over time
Business Example The current business environment demands a more open data interchange with your customers, subsidiaries and other business partners. There is an increasing need to incorporate data of various formats and source systems. XML has proven to be a reliable, stable standard for the transfer of data. In contrast to the general Pull extraction strategy, the transfer of data is to be initiated by the source systems (Push-Mechanism).
Updating data over time

Introduction
Data Integrator transforms provide support for updating changing data in your data warehouse. After completing this unit, you will be able to:
• Describe the options for updating changes to data
• Explain the purpose of Changed Data Capture (CDC)
• Explain the role of surrogate keys in managing changes to data
• Define the differences between source-based and target-based CDC
Explaining Slowly Changing Dimensions (SCD)
Slowly Changing Dimensions are dimensions that have data that changes over time. Three methods of handling Slowly Changing Dimensions are available:
Type 1: No history preservation
• Natural consequence of normalization.

Type 2: Unlimited history preservation and new rows
• New rows generated for significant changes.
• Requires use of a unique key. The key relates to facts/time.
• Optional Effective_Date field.

Type 3: Limited history preservation
• Two states of data are preserved: current and old.
• New fields are generated to store history data.
• Requires an Effective_Date field.
Figure 88: Slowly Changing Dimensions
Since SCD Type 2 resolves most of the issues related to slowly changing dimensions, it is explored last.

SCD Type 1
For a SCD Type 1 change, you find and update the appropriate attributes on a specific dimensional record. For example, to update a record in the SALES_PERSON_DIMENSION table to show a change to an individual's SALES_PERSON_NAME field, you simply update one record in the SALES_PERSON_DIMENSION table. This action would update or correct that record for all fact records across time. In a dimensional model, facts have no meaning until you link them with their dimensions. If you change a dimensional attribute without appropriately accounting for the time dimension, the change becomes global across all fact records.

This is the data before the change:

SALES_PERSON_KEY  SALES_PERSON_ID  NAME           SALES_TEAM
15                000120           Doe, John B    Northwest

This is the same table after the salesperson's name has been changed:

SALES_PERSON_KEY  SALES_PERSON_ID  NAME           SALES_TEAM
15                000120           Smith, John B  Northwest
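A minimal SQL sketch of this Type 1 overwrite, using the table and columns shown above (the statement is generic and may need adjusting for your database):

    UPDATE SALES_PERSON_DIMENSION
    SET    NAME = 'Smith, John B'          -- overwrite in place; no history is kept
    WHERE  SALES_PERSON_ID = '000120';

Because the dimension row itself is overwritten, every fact row that joins to SALES_PERSON_KEY 15 now reports the new name for all time periods.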
However, suppose a salesperson transfers to a new sales team. Updating the salesperson's dimensional record would update all previous facts so that the salesperson would appear to have always belonged to the new sales team. This may cause issues in terms of reporting sales numbers for both teams. If you want to preserve an accurate history of who was on which sales team, Type 1 is not appropriate.

SCD Type 3
To implement a Type 3 change, you change the dimension structure so that it renames the existing attribute and adds two attributes, one to record the new value and one to record the date of the change. A Type 3 implementation has three disadvantages:
• You can preserve only one change per attribute, such as old and new or first and last.
• Each Type 3 change requires a minimum of one additional field per attribute and another additional field if you want to record the date of the change.
• Although the dimension's structure contains all the data needed, the SQL code required to extract the information can be complex. Extracting a specific value is not difficult, but if you want to obtain a value for a specific point in time or multiple attributes with separate old and new values, the SQL statements become long and have multiple conditions.
In summary, SCD Type 3 can store a change in data, but can neither accommodate multiple changes nor adequately serve the need for summary reporting.

This is the data before the change:

SALES_PERSON_KEY  SALES_PERSON_ID  NAME         SALES_TEAM
15                000120           Doe, John B  Northwest

This is the same table after the new dimensions have been added and the salesperson's sales team has been changed:

SALES_PERSON_NAME  OLD_TEAM   NEW_TEAM   EFF_TO_DATE  SALES_PERSON_ID
Doe, John B        Northwest  Southeast  Oct_31_2004  000120
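A hedged SQL sketch of the Type 3 structural change described above; the RENAME/ADD column syntax varies by database, and the column sizes are assumptions:

    -- one-time structural change: keep the old value, add fields for the new value and the change date
    ALTER TABLE SALES_PERSON_DIMENSION RENAME COLUMN SALES_TEAM TO OLD_TEAM;
    ALTER TABLE SALES_PERSON_DIMENSION ADD NEW_TEAM VARCHAR(50);
    ALTER TABLE SALES_PERSON_DIMENSION ADD EFF_TO_DATE DATE;

    -- record the single preserved change for this salesperson
    UPDATE SALES_PERSON_DIMENSION
    SET    NEW_TEAM    = 'Southeast',
           EFF_TO_DATE = DATE '2004-10-31'
    WHERE  SALES_PERSON_ID = '000120';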
SCD Type 2
With a Type 2 change, you do not need to make structural changes to the SALES_PERSON_DIMENSION table. Instead, you add a record.

This is the data before the change:

SALES_PERSON_KEY  SALES_PERSON_ID  NAME         SALES_TEAM
15                000120           Doe, John B  Northwest

After you implement the Type 2 change, two records appear, as in this table:

SALES_PERSON_KEY  SALES_PERSON_ID  NAME         SALES_TEAM
15                000120           Doe, John B  Northwest
133               000120           Doe, John B  Southeast
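As a rough SQL illustration of the same Type 2 change: the existing row is left untouched and a new row is inserted under a new surrogate key (133 here, however your warehouse actually generates it):

    INSERT INTO SALES_PERSON_DIMENSION
           (SALES_PERSON_KEY, SALES_PERSON_ID, NAME, SALES_TEAM)
    VALUES (133, '000120', 'Doe, John B', 'Southeast');

Facts recorded before the move keep pointing at key 15 (Northwest), while facts recorded after the move point at key 133 (Southeast), so reporting by team stays accurate over time.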
Updating changes to data
Many times you have a large amount of data to update regularly and a small amount of system down time for scheduled maintenance on a data warehouse. You must choose the most appropriate method for updating your data over time, also known as a "delta load". You can choose to do a full refresh of your data or you can choose to extract only new or modified data and update the target system:
Figure 89: Introducing Changing Data Capture (CDC)
• Full refresh: Full refresh is easy to implement and easy to manage. This method ensures that no data is overlooked or left out due to technical or programming errors. For an environment with a manageable amount of source data, full refresh is an easy method you can use to perform a delta load to a target system.
• Capturing only changes: After an initial load is complete, you can choose to extract only new or modified data and update the target system. Identifying and loading only changed data is called Changed Data Capture (CDC). CDC is recommended for large tables. If the tables that you are working with are small, you may want to consider reloading the entire table instead. The benefits of using CDC instead of doing a full refresh are that it (see the sketch after this list):
  – Improves performance, because the job takes less time to process with less data to extract, transform, and load.
  – Allows change history to be tracked by the target system so that data can be correctly analyzed over time. For example, if a salesperson is assigned a new sales region, simply updating the customer record to reflect the new region negatively affects any analysis by region over time. The purchases made by that customer before the move are attributed to the new region.
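A rough SQL sketch contrasting the two approaches; Source, Target, and Last_Update are placeholder names, and a real CDC load must additionally distinguish new rows from changed rows:

    -- full refresh: wipe the target and reload everything on every run
    DELETE FROM Target;
    INSERT INTO Target SELECT * FROM Source;

    -- changed-data capture: after the initial load, move only rows changed since the last load
    INSERT INTO Target
    SELECT *
    FROM   Source
    WHERE  Last_Update > (SELECT MAX(Last_Update) FROM Target);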
Explaining history preservation and surrogate keys
History preservation allows the data warehouse or data mart to maintain the history of data in dimension tables so you can analyze it over time. For example, if a customer moves from one sales region to another, simply updating the customer record to reflect the new region would give you misleading results in an analysis by region over time. All purchases made by the customer before the move would incorrectly be attributed to the new region.
The solution to this involves introducing a new record for the same customer that reflects the new sales region so that you can preserve the previous record. In this way, accurate reporting is available for both sales regions. To support this, Data Services is set up to treat all changes to records as INSERT rows by default.

However, you also need to manage the primary key constraint issues in your target tables that arise when you have more than one record in your dimension tables for a single entity, such as a customer or an employee. For example, with your sales records, the Sales Representative ID is the primary key and is used to link that record to all of the representative's sales orders. If you try to add a new record with the same primary key, it causes an exception. On the other hand, if you assign a new Sales Representative ID to the new record for that representative, you compromise your ability to report accurately on the representative's total sales.

To address this issue, you create a surrogate key, which is a new column in the target table that becomes the new primary key for the records. At the same time, you change the properties of the former primary key so that it is simply a data column. When a new record is inserted for the same representative, a unique surrogate key is assigned, allowing you to continue to use the Sales Representative ID to maintain the link to the representative's orders.
Figure 90: Source-based Change Data Capture using Surrogate Keys
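A hedged sketch of what such a target table might look like; the table name SALES_REP_DIM and the exact data types are placeholders, but the pattern (SURR_KEY as the new primary key, the former key kept as a plain column) follows the explanation above:

    CREATE TABLE SALES_REP_DIM (
        SURR_KEY      INTEGER PRIMARY KEY,  -- new surrogate key, now the primary key
        SALES_REP_ID  VARCHAR(10),          -- former primary key, kept as a plain data column
        SALES_TEAM    VARCHAR(50),
        LAST_UPDATE   DATE
    );

    -- two rows can now describe the same representative without a key conflict,
    -- while SALES_REP_ID still links both rows to the representative's orders
    INSERT INTO SALES_REP_DIM VALUES (15,  '000120', 'Northwest', DATE '2004-01-01');
    INSERT INTO SALES_REP_DIM VALUES (133, '000120', 'Southeast', DATE '2004-10-31');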
You can create surrogate keys either by using the gen_row_num or key_generation function in the Query transform to create a new output column that automatically increments whenever a new record is inserted, or by using the Key Generation transform, which serves the same purpose.

Comparing source-based and target-based CDC
Setting up a full CDC solution within Data Services may not be required. Many databases now have CDC support built into them, such as Oracle, SQL Server, and DB2. Alternatively, you could combine surrogate keys with the Map Operation transform to change all UPDATE row types to INSERT row types to capture changes. However, if you do want to set up a full CDC solution, there are two general incremental CDC methods to choose from: source-based and target-based CDC. Source-based CDC evaluates the source tables to determine what has changed and only extracts changed rows to load into the target tables. Target-based CDC extracts all the data from the source, compares the source and target rows using table comparison, and then loads only the changed rows into the target. Source-based CDC is almost always preferable to target-based CDC for performance reasons. However, some source systems do not provide enough information to make use of the source-based CDC techniques. You can use a combination of the two techniques.
Lesson Summary
You should now be able to:
• Update data which changes slowly over time
Lesson: Using Source-Based Change Data Capture (CDC)

Lesson Overview
To reduce the amount of data that must be moved, you want to use source-based CDC to provide a delta load of data.
Lesson Objectives
After completing this lesson, you will be able to:
• Use source-based CDC (Change Data Capture)
• Use time stamps in source-based CDC
• Manage issues related to using time stamps for source-based CDC
Business Example
The current business environment demands a more open data interchange with your customers, subsidiaries and other business partners. There is an increasing need to incorporate data of various formats and source systems. XML has proven to be a reliable, stable standard for the transfer of data. In contrast to the general Pull extraction strategy, the transfer of data is to be initiated by the source systems (Push-Mechanism). To reduce the amount of data that must be moved, you want to use source-based CDC to provide a delta load of data.
Using source-based CDC
Source-based CDC is the preferred method because it improves performance by extracting the fewest rows.

Using source tables to identify changed data
Source-based CDC, sometimes also referred to as incremental extraction, extracts only the changed rows from the source. To use source-based CDC, your source data must have some indication of the change. There are two methods:
• Time stamps: You can use the time stamps in your source data to determine what rows have been added or changed since the last time data was extracted from the source. To support this type of source-based CDC, your database tables must have at least an update time stamp; it is preferable to include a create time stamp as well.
• Change logs: You can also use the information captured by the RDBMS in the log files for the audit trail to determine what data has been changed.
  Note: Log-based data is more complex and is outside the scope of this course.
Using CDC with time stamps
Timestamp-based CDC is an ideal solution to track changes if:
• There are date and time fields in the tables being updated.
• You are updating a large table that has a small percentage of changes between extracts and an index on the date and time fields.
• You are not concerned about capturing intermediate results of each transaction between extracts (for example, if a customer changes regions twice in the same day).

We do not recommend that you use timestamp-based CDC if:
• You have a large table, a large percentage of which changes between extracts, and there is no index on the time stamps.
• You need to capture physical row deletes.
• You need to capture multiple events occurring on the same row between extracts.
Some systems have time stamps with dates and times, some with just the dates, and some with monotonically-generated increasing numbers. You can treat dates and generated numbers in the same manner. Note that time zones can become important for time stamps based on real time. You can keep track of time stamps using the nomenclature of the source system (that is, using the source time or source-generated number). Then treat both temporal (specific time) and logical (time relative to another time or event) time stamps in the same way.

The basic technique for using time stamps is to add a column to your source and target tables that tracks the time stamps of rows loaded in a job. When the job executes, this column is updated along with the rest of the data. The next job then reads the latest time stamp from the target table and selects only the rows in the source table for which the time stamp is later.
This example illustrates the technique. Assume that the last load occurred at 2:00 PM on January 1, 2008. At that time, the source table had only one row (key=1), with a time stamp earlier than the previous load. Data Services loads this row into the target table with the original time stamp of 1:10 PM on January 1, 2008. After 2:00 PM, more rows are added to the source table.

At 3:00 PM on January 1, 2008, the job runs again. The job:
1. Reads the Last_Update field from the target table (01/01/2008 01:10 PM).
2. Selects rows from the source table that have time stamps that are later than the value of Last_Update. The SQL command to select these rows is:
   SELECT * FROM Source WHERE Last_Update > '01/01/2008 01:10 pm'
   This operation returns the second and third rows (key=2 and key=3).
3. Loads these new rows into the target table.
For time-stamped CDC, you must create a work flow that contains:
• A script that reads the target table and sets the value of a global variable to the latest time stamp.
• A data flow that uses the global variable in a "WHERE" clause to filter the data.
The data flow contains a source table, a query, and a target table. The query extracts only those rows that have time stamps later than the last update.
To set up a timestamp-based CDC delta job:
1. In the Variables and Parameters dialog box, add a global variable with a datatype of datetime to your job. The purpose of this global variable is to store the time stamp for the last time the job executed.
2. In the job workspace, add a script.
3. In the script workspace, construct an expression to:
   • Select the last time the job was executed from the last update column in the table.
   • Assign the actual time stamp value to the global variable.
4. Add a data flow to the right of the script using the tool palette.
5. In the data flow workspace, add the source, Query transform, and target objects and connect them. The target table for CDC cannot be a template table.
6. Right-click the surrogate key column and select the Primary Key option in the menu.
7. On the Mapping tab for the surrogate key column, construct an expression that uses the key_generation function to generate new keys based on that column in the target table, incrementing by 1.
8. On the "WHERE" tab, construct an expression to select only those records with a time stamp that is later than the global variable.
9. Connect the script to the data flow.
Managing overlaps
Without rigorously isolating source data during the extraction process (which typically is not practical), there is a window of time when changes can be lost between two extraction runs. This overlap period affects source-based CDC because this capture relies on a static time stamp to determine changed data. For example, suppose a table has 10,000 rows. If a change is made to one of the rows after it was loaded but before the job ends, the second update can be lost. There are three techniques for handling this situation:
• Overlap avoidance
• Overlap reconciliation
• Presampling
Overlap avoidance
In some cases, it is possible to set up a system where there is no possibility of an overlap. You can avoid overlaps if there is a processing interval where no updates are occurring on the target system.
For example, if you can guarantee that the data extraction from the source system does not last more than one hour, you can run a job at 1:00 AM every night that selects only the data updated the previous day until midnight. While this regular job does not give you up-to-the-minute updates, it guarantees that you never have an overlap and greatly simplifies time stamp management.

Overlap reconciliation
Overlap reconciliation requires a special extraction process that re-applies changes that could have occurred during the overlap period. This extraction can be executed separately from the regular extraction. For example, if the highest time stamp loaded from the previous job was 01/01/2008 10:30 PM and the overlap period is one hour, overlap reconciliation re-applies the data updated between 9:30 PM and 10:30 PM on January 1, 2008. The overlap period is equal to the maximum possible extraction time. If it can take up to N hours to extract the data from the source system, an overlap period of N (or N plus a small increment) hours is recommended. For example, if it takes at most two hours to run the job, an overlap period of at least two hours is recommended.

Presampling
Presampling is an extension of the basic time stamp processing technique. The main difference is that the status table contains both a start and an end time stamp, instead of the last update time stamp. The start time stamp for presampling is the same as the end time stamp of the previous job. The end time stamp for presampling is established at the beginning of the job. It is the most recent time stamp from the source table, commonly set as the system date.
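A small SQL sketch of the overlap reconciliation idea under the assumptions above (a one-hour overlap and a highest loaded time stamp of 01/01/2008 10:30 PM); table and column names are placeholders:

    -- regular delta: rows later than the last loaded time stamp
    SELECT * FROM Source
    WHERE  Last_Update > '2008-01-01 22:30:00';

    -- overlap reconciliation: separately re-apply rows from the one-hour window
    -- before that time stamp, in case they changed while the previous job was running
    SELECT * FROM Source
    WHERE  Last_Update >  '2008-01-01 21:30:00'
      AND  Last_Update <= '2008-01-01 22:30:00';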
Exercise 20: Using Source-Based Change Data Capture (CDC)

Exercise Objectives
After completing this exercise, you will be able to:
• Use source-based Change Data Capture (CDC)
• Use time stamps in source-based CDC
Business Example You need to set up a job to update employee records in the Omega data warehouse whenever they change. The employee records include time stamps to indicate when they were last updated, so you can use source-based CDC.
Task: Construct and configure a batch job Alpha_Employees_Dim_Job, which updates employee table columns based on whether records are new or have been changed since the last time data was updated.
1. In the Omega project, create a new batch job called Alpha_Employees_Dim_Job and a new global variable $G_LastUpdate.
2.
In the job Alpha_Employees_Dim_Job workspace, add a script called GetTimeStamp and construct an expression that selects the last time the job executed. If the time stamp is NULL, all records are processed; if it is not NULL, assign the value to the global variable $G_LastUpdate.
3.
In the job Alpha_Employees_Dim_Job workspace, add a data flow Alpha_Employees_Dim_DF to the right of the script and connect it to the script.
4.
Add the Employee table from the Alpha datastore as the source object and the Emp_Dim table from the Omega datastore as the target object of the data flow Alpha_Employees_Dim_DF. Connect them with a Query transform.
5.
Map the Schema In fields of the Query transform to the Schema Out fields.

Schema In       Schema Out
EMPLOYEEID      EMPLOYEEID
LASTNAME        LASTNAME
FIRSTNAME       FIRSTNAME
BIRTHDATE       BIRTHDATE
HIREDATE        HIREDATE
ADDRESS         ADDRESS
PHONE           PHONE
EMAIL           EMAIL
REPORTSTO       REPORTSTO
LastUpdate      LAST_UPDATE
discharge_date  DISCHARGE_DATE
6.
Create a mapping expression for the SURR_KEY column that generates new keys based on the Emp_Dim target table incrementing by 1 by using the Functions wizard.
7.
For the CITY output column, change the mapping to perform a lookup of CITYNAME from the City table in the Alpha datastore based on the city ID.
8.
For the REGION output column, change the mapping to perform a lookup of REGIONNAME from the Region table in the Alpha datastore based on the region ID.
9.
For the COUNTRY output column, change the mapping to perform a lookup of COUNTRYNAME from the Country table in the Alpha datastore based on the country ID.
10. For the DEPARTMENT output column, change the mapping to perform a lookup of DEPARTMENTNAME from the Department table in the Alpha datastore based on the department ID.
11. On the WHERE tab, construct an expression to select only those records with a time stamp that is later than the value of the global variable $G_LastUpdate.
12. Execute Alpha_Employees_Dim_Job with the default properties.
Solution 20: Using Source-Based Change Data Capture (CDC)

Task: Construct and configure a batch job Alpha_Employees_Dim_Job, which updates employee table columns based on whether records are new or have been changed since the last time data was updated.
1. In the Omega project, create a new batch job called Alpha_Employees_Dim_Job and a new global variable $G_LastUpdate.
a)
In the project area, right-click the Omega project to select the option New batch job and enter the name Alpha_Employees_Dim_Job.
b)
In the project area, select the job Alpha_Employees_Dim_Job and then use the menu path Tools → Variables.
c)
Right-click Variables and select Insert from the menu.
d)
Right-click the new variable and select Properties from the menu and enter $G_LastUpdate in the Global Variable Properties dialog box. In the Data type drop-down list, select datetime for the datatype and select OK.
2.
In the job Alpha_Employees_Dim_Job workspace, add a script called GetTimeStamp and construct an expression that selects the last time the job executed. If the time stamp is NULL, all records are processed; if it is not NULL, assign the value to the global variable $G_LastUpdate. a)
From the Tool Palette, select the Script icon and then double-click in the work flow workspace to insert the script. Name it GetTimeStamp
b)
Double-click the script GetTimeStamp to open it and create an expression to update the value of the global variable to the value of the last update column in the employee dimension table. Type in this expression:

$G_LastUpdate = sql('omega', 'select max(LAST_UPDATE) from emp_dim');
if ($G_LastUpdate is null)
    $G_LastUpdate = to_date('1901.01.01', 'YYYY.MM.DD');
else
    print('Last update was ' || $G_LastUpdate);

In this script:
1. Select the last time the job was executed from the last update column in the employee dimension table.
2. If the last update column is NULL, assign a value of January 1, 1901 to the $G_LastUpdate global variable.
3. When the job executes for the initial load, this ensures that all records are processed.
4. If the last update column is not NULL, assign the actual time stamp value to the $G_LastUpdate global variable.

Note: The last two lines of the script are not necessary, but should be included for robustness in case the time stamp is null.
c) Close the script and return to the job workspace.
3. In the job Alpha_Employees_Dim_Job workspace, add a data flow Alpha_Employees_Dim_DF to the right of the script and connect it to the script.
a)
From the Tool Palette, select the Data Flow icon. Double-click in the job workspace to insert the data flow and name it Alpha_Employees_Dim_DF.
b)
Holding down the mouse button, select the script GetTimeStamp and drag to the data flow. Release the mouse button to create the connection.
c)
Double-click the data flow to open its workspace.
4.
Add the Employee table from the Alpha datastore as the source object and the Emp_Dim table from the Omega datastore as the target object of the data flow Alpha_Employees_Dim_DF. Connect them with a Query transform. a)
In the Local Object Library, select the tab Datastores and select the Employee table from the Alpha datastore and drag it into the data flow workspace. From the menu, select the option Make Source.
b)
In the Local Object Library, select the tab Datastores and select the Emp_Dim table from the Omega datastore and drag it into the data flow workspace. From the menu, select the option Make Target.
c) From the Tool Palette, select the Query transform icon and double-click the data flow workspace to insert it.
d) Connect the source table to the Query transform and connect the Query transform to the target table.
5. Map the Schema In fields of the Query transform to the Schema Out fields.

Schema In       Schema Out
EMPLOYEEID      EMPLOYEEID
LASTNAME        LASTNAME
FIRSTNAME       FIRSTNAME
BIRTHDATE       BIRTHDATE
HIREDATE        HIREDATE
ADDRESS         ADDRESS
PHONE           PHONE
EMAIL           EMAIL
REPORTSTO       REPORTSTO
LastUpdate      LAST_UPDATE
discharge_date  DISCHARGE_DATE
a)
Double-click the Query transform to open the editor.
b)
Map the columns in the Schema In pane to the columns in the Schema Out pane by selecting and dragging each column from Schema In to Schema Out for the fields:

Schema In       Schema Out
EMPLOYEEID      EMPLOYEEID
LASTNAME        LASTNAME
FIRSTNAME       FIRSTNAME
BIRTHDATE       BIRTHDATE
HIREDATE        HIREDATE
ADDRESS         ADDRESS
PHONE           PHONE
EMAIL           EMAIL
REPORTSTO       REPORTSTO
LastUpdate      LAST_UPDATE
discharge_date  DISCHARGE_DATE
6.
Create a mapping expression for the SURR_KEY column that generates new keys based on the Emp_Dim target table incrementing by 1 by using the Functions wizard. a)
In the Schema Out pane, select the output column SURR_KEY and go to the Mapping tab.
b)
Select the Function button and in the Select Function dialog box, open the category of “Database Functions”.
c)
From the list of function names, select the key_generation function and select the Next button.
d)
In the key_generation - Select Parameters dialog box, enter the parameters:

Field/Option   Value
Table          OMEGA.DBO.EMP_DIM
Key_column     SURR_KEY
Key_increment  1

Note: The resulting expression should be:
key_generation('Omega.dbo.emp_dim', 'SURR_KEY', 1)

e) Select the Finish button.
7.
For the CITY output column, change the mapping to perform a lookup of CITYNAME from the City table in the Alpha datastore based on the city ID. a)
Go to the Mapping tab for the output schema field CITY and delete the existing expression by highlighting it and using the Delete button on your keyboard.
b)
Select the Function button and in the Select Function dialog box, open the category of “Database Functions”.
c)
From the list of function names, select the lookup_ext function and select the Next button.
d)
In the Lookup_ext - Select Parameters dialog box, enter the parameters:

Field/Option                        Value
Lookup table                        ALPHA.SOURCE.CITY
Condition: Columns in lookup table  CITYID
Condition: Op.(&)                   =
Condition: Expression               OMEGA.EMP_DIM.CITYID
Output: Column in lookup table      CITYNAME

e) Select the Finish button.
8.
For the REGION output column, change the mapping to perform a lookup of REGIONNAME from the Region table in the Alpha datastore based on the region ID. a)
Go to the Mapping tab for the output schema field REGION and delete the existing expression by highlighting it and using the Delete button on your keyboard.
b)
Select the Function button and in the Select Function dialog box, open the category of “Database Functions”.
c)
From the list of function names, select the lookup_ext function and select the Next button.
d)
In the Lookup_ext - Select Parameters dialog box, enter the parameters:

Field/Option                        Value
Lookup table                        ALPHA.SOURCE.REGION
Condition: Columns in lookup table  REGIONID
Condition: Op.(&)                   =
Condition: Expression               OMEGA.EMP_DIM.REGIONID
Output: Column in lookup table      REGIONNAME

e) Select the Finish button.
9.
For the COUNTRY output column, change the mapping to perform a lookup of COUNTRYNAME from the Country table in the Alpha datastore based on the country ID. a)
Go to the Mapping tab for the output schema field COUNTRY and delete the existing expression by highlighting it and using the Delete button on your keyboard.
b)
Select the Function button and in the Select Function dialog box, open the category of “Database Functions”.
c)
From the list of function names, select the lookup_ext function and select the Next button.
d)
In the Lookup_ext - Select Parameters dialog box, enter the parameters:

Field/Option                        Value
Lookup table                        ALPHA.SOURCE.COUNTRY
Condition: Columns in lookup table  COUNTRYID
Condition: Op.(&)                   =
Condition: Expression               OMEGA.EMP_DIM.COUNTRYID
Output: Column in lookup table      COUNTRYNAME

e) Select the Finish button.
10. For the DEPARTMENT output column, change the mapping to perform a lookup of DEPARTMENTNAME from the Department table in the Alpha datastore based on the department ID. a)
Go to the Mapping tab for the output schema field DEPARTMENT and delete the existing expression by highlighting it and using the Delete button on your keyboard.
b)
Select the Function button and in the Select Function dialog box, open the category of “Database Functions”.
c)
From the list of function names, select the lookup_ext function and select the Next button.
d)
In the Lookup_ext - Select Parameters dialog box, enter the parameters:

Field/Option                        Value
Lookup table                        ALPHA.SOURCE.DEPARTMENT
Condition: Columns in lookup table  DEPARTMENTID
Condition: Op.(&)                   =
Condition: Expression               OMEGA.EMP_DIM.DEPARTMENTID
Output: Column in lookup table      DEPARTMENTNAME

e) Select the Finish button.
11. On the WHERE tab, construct an expression to select only those records with a time stamp that is later than the value of the global variable $G_LastUpdate. a)
In the transform editor of the Query transform, select the WHERE tab.
b)
In the workspace, enter the expression: employee.LastUpdate > $G_LastUpdate
c)
Select the Back icon to close the editor.
12. Execute Alpha_Employees_Dim_Job with the default properties.
a)
In the project area, select your Alpha_Employees_Dim_Job and choose the option Execute.
b)
Select Save to save all objects you have created.
c)
In the next dialog box, accept all the default execution properties and select OK.
d)
According to the log, the last update for the table was on “2007.10.04”.
e)
Return to the data flow workspace and view the data for the target table. Sort the records by the LAST_UPDATE column.
Lesson Summary
You should now be able to:
• Use source-based CDC (Change Data Capture)
• Use time stamps in source-based CDC
• Manage issues related to using time stamps for source-based CDC
Related Information
• For more information on using logs for CDC, see "Techniques for Capturing Data" in the Data Services Designer Guide.
• For more information, see "Source-based and target-based CDC" in "Techniques for Capturing Changed Data" in the Data Services Designer Guide.
Lesson: Using Target-Based Change Data Capture (CDC)

Lesson Overview
You find that some of your data does not provide any time stamps or logs to support source-based CDC. You want to investigate using target-based CDC to compare the source to the target to determine which records have changed.
Lesson Objectives
After completing this lesson, you will be able to:
• Use target-based CDC
Business Example The current business environment demands a more open data interchange with your customers, subsidiaries and other business partners. There is an increasing need to incorporate data of various formats and source systems. XML has proven to be a reliable, stable standard for the transfer of data. In contrast to the general Pull extraction strategy, the transfer of data is to be initiated by the source systems (Push-Mechanism). You find that some of your data does not provide any time stamps or logs to provide a source-based CDC. You want to investigate using target–based CDC to compare the source to the target to determine which records have changed.
Using target-based CDC
Target-based CDC compares the source to the target to determine which records have changed.

Using target tables to identify changed data
Source-based CDC evaluates the source tables to determine what has changed and only extracts changed rows to load into the target tables. Target-based CDC, by contrast, extracts all the data from the source, compares the source and target rows, and then loads only the changed rows into the target with new surrogate keys. Source-based changed-data capture is almost always preferable to target-based capture for performance reasons; however, some source systems do not provide enough information to make use of the source-based CDC techniques. Target-based CDC allows you to use the technique when source-based change information is limited.
You can preserve history by creating a data flow that contains:
• A source table that contains the rows to be evaluated.
• A Query transform that maps columns from the source.
• A Table Comparison transform that compares the data in the source table with the data in the target table to determine what has changed. It generates a list of "INSERT" and "UPDATE" rows based on those changes. This circumvents the default behavior in Data Services of treating all changes as INSERT rows.
• A History Preserving transform that converts certain "UPDATE" rows to "INSERT" rows based on the columns in which values have changed. This produces a second row in the target instead of overwriting the first row.
• A Key Generation transform that generates new keys for the updated rows that are now flagged as INSERT.
• A target table that receives the rows. The target table cannot be a template table.
Figure 91: Target-Based Change Data Capture
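Conceptually, the comparison step works roughly like the following SQL sketch; CUSTOMER_ID and REGION are hypothetical columns, and in practice the Table Comparison transform performs this comparison for you and also handles surrogate keys and row flags:

    -- new rows: the natural key exists in the source but not yet in the target
    SELECT s.*
    FROM   Source s
    LEFT JOIN Target t ON t.CUSTOMER_ID = s.CUSTOMER_ID
    WHERE  t.CUSTOMER_ID IS NULL;

    -- changed rows: the natural key exists in both, but a tracked column differs
    SELECT s.*
    FROM   Source s
    JOIN   Target t ON t.CUSTOMER_ID = s.CUSTOMER_ID
    WHERE  t.REGION <> s.REGION;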
Identifying history preserving transforms
Data Services supports history preservation with three Data Integrator transforms:
• History Preserving: Converts rows flagged as "UPDATE" to "UPDATE" plus "INSERT", so that the original values are preserved in the target. You specify the column in which to look for updated data.
• Key Generation: Generates new keys for source data, starting from a value based on existing keys in the table you specify.
• Table Comparison: Compares two data sets and produces the difference between them as a data set with rows flagged as "INSERT" and "UPDATE".
Explaining the Table Comparison transform
The Table Comparison transform allows you to detect and forward changes that have occurred since the last time a target was updated. This transform compares two data sets and produces the difference between them as a data set with rows flagged as "INSERT" or "UPDATE". For example, the transform compares the input and comparison tables and determines that row 10 has a new address, row 40 has a name change, and row 50 is a new record. The output includes all three records, flagged as appropriate:
Figure 92: Table Comparison Transform
The next section gives a brief description of the function, data input requirements, options, and data output results for the Table Comparison transform.

Input/output
The transform compares two data sets, one set from the input to the Table Comparison transform (the input data set), and one set from a database table specified in the Table Comparison transform (the comparison table). The transform selects rows from the comparison table based on the primary key values from the input data set. The transform compares columns that exist in the schemas for both inputs. The input data set must be flagged as "NORMAL".

The output data set contains only the rows that make up the difference between the tables. The schema of the output data set is the same as the schema of the comparison table. No "DELETE" operations are produced. If a column has a date datatype in one table and a datetime datatype in the other, the transform compares only the date section of the data. The columns can also be time and datetime data types, in which case Data Integrator only compares the time section of the data.

For each row in the input data set, there are three possible outcomes from the transform:
• An "INSERT" row is added: The primary key value from the input data set does not match a value in the comparison table. The transform produces an "INSERT" row with the values from the input data set row. If there are columns in the comparison table that are not present in the input data set, the transform adds these columns to the output schema and fills them with NULL values.
• An "UPDATE" row is added: The primary key value from the input data set matches a value in the comparison table, and values in the non-key compare columns differ in the corresponding rows from the input data set and the comparison table. The transform produces an "UPDATE" row with the values from the input data set row. If there are columns in the comparison table that are not present in the input data set, the transform adds these columns to the output schema and fills them with values from the comparison table.
• The row is ignored: The primary key value from the input data set matches a value in the comparison table, but the comparison does not indicate any changes to the row values.
Options
The Table Comparison transform offers several options:
• Table name: Specifies the fully qualified name of the comparison table. This table must already be imported into the repository. Table name is represented as datastore.owner.table, where datastore is the name of the datastore Data Services uses to access the comparison table and owner depends on the database type associated with the table.
• Generated key column: Specifies a column in the comparison table. When there is more than one row in the comparison table with a given primary key value, this transform compares the row with the largest generated key value of these rows and ignores the other rows. This is optional.
• Input contains duplicate keys: Provides support for input rows with duplicate primary key values.
• Detect deleted row(s) from comparison table: Flags the transform to identify rows that have been deleted from the source.
• Comparison method: Allows you to select the method for accessing the comparison table. You can select from Row-by-row select, Cached comparison table, and Sorted input.
• Input primary key column(s): Specifies the columns in the input data set that uniquely identify each row. These columns must be present in the comparison table with the same column names and data types.
• Compare columns: Improves performance by comparing only the subset of columns you drag into this box from the input schema. If no columns are listed, all columns in the input data set that are also in the comparison table are used as compare columns. This is optional.
Explaining the History Preserving transform
The History Preserving transform ignores everything but rows flagged as "UPDATE". For these rows, it compares the values of specified columns and, if the values have changed, flags the row as "INSERT". This produces a second row in the target instead of overwriting the first row.

For example, a target table that contains employee information is updated periodically from a source table. In this case, the Table Comparison transform has flagged the name change for row 40 as an update. However, the History Preserving transform is set up to preserve history on the LastName column, so the output changes the operation code for that record from "UPDATE" to "INSERT".
Figure 93: History Preserving Transform
The next section gives a brief description of the function, data input requirements, options, and data output results for the History Preserving transform.

Input/output
The input data set is the result of a comparison between two versions of the same data. Rows with changed data from the newer version are flagged as "UPDATE" rows and new data from the newer version are flagged as "INSERT" rows. The output data set contains rows flagged as "INSERT" or "UPDATE".

Options
The History Preserving transform offers these options:
• Valid from: Specifies a date or datetime column from the source schema. Specify a Valid from date column if the target uses an effective date to track changes in data.
• Valid to: Specifies a date value in the format "YYYY.MM.DD". The Valid to date cannot be the same as the Valid from date.
• Column: Specifies a column from the source schema that identifies the current valid row from a set of rows with the same primary key. The flag column indicates whether a row is the most current data in the target for a given primary key.
• Set value: Defines an expression that outputs a value with the same datatype as the value in the Set flag column. This value is used to update the current flag column in the new row added to the target to preserve the history of an existing row.
• Reset value: Defines an expression that outputs a value with the same datatype as the value in the Reset flag column. This value is used to update the current flag column in an existing row in the target that included changes in one or more compare columns.
• Preserve delete row(s) as update row(s): Converts "DELETE" rows to "UPDATE" rows in the target. If you previously set effective date values (Valid from and Valid to), sets the Valid to value to the execution date. This option is used to maintain slowly changing dimensions by feeding a complete data set first through the Table Comparison transform with its Detect deleted row(s) from comparison table option selected.
• Compare columns: Lists the column or columns in the input data set that are to be compared for changes.
  – If the values in the specified compare columns in each version match, the transform flags the row as "UPDATE". The row from the "before" version is updated. The date and flag information is also updated.
  – If the values in each version do not match, the row from the latest version is flagged as "INSERT" when output from the transform. This adds a new row to the warehouse with the values from the new row.

Updates to non-history preserving columns update all versions of the row if the update is performed on the natural key (for example, Customer), but only update the latest version if the update is on the generated key.
Explaining the Key Generation transform
The Key Generation transform generates new keys before inserting the data set into the target in the same way as the "key_generation" function does. When it is necessary to generate artificial keys in a table, this transform looks up the
maximum existing key value from a table and uses it as the starting value to generate new keys. The transform expects the generated key column to be part of the input schema. For example, suppose the History Preserving transform produces rows to add to a warehouse, and these rows have the same primary key as rows that already exist in the warehouse. In this case, you can add a generated key to the warehouse table to distinguish these two rows that have the same primary key. The next section gives a brief description of the function, data input requirements, options, and data output results for the Key Generation transform. Input/output The input data set is the result of a comparison between two versions of the same data. Changed data from the newer version are flagged as “UPDATE” rows and new data from the newer version are flagged as INSERT rows. The output data set is a duplicate of the input data set, with the addition of key values in the generated key column for input rows flagged as “INSERT”. Options The Key Generation transform offers these options: Option
Description
Table name
Specifies the fully qualified name of the source table from which the maximum existing key is determined (key source table). This table must be already imported into the repository. Table name is represented as datastore.owner.table where datastore is the name of the datastore Data Services uses to access the key source table and owner depends on the database type associated with the table.
Generated key column
Specifies the column in the key source table containing the existing key values. A column with the same name must exist in the input data set; the new key is inserted in this column.
Increment values
Indicates the interval between generated key values.
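To make this concrete, the logic is roughly equivalent to the following SQL sketch; the table and column names (Prod_Dim, SURR_KEY) are borrowed from the exercise that follows, and this is not the literal SQL that Data Services generates:

    -- Look up the highest existing key value in the key source table
    SELECT MAX(SURR_KEY) FROM Prod_Dim;

    -- Assume the maximum is 250 and the increment is 1.
    -- Each input row flagged as INSERT then receives the next value:
    --   first INSERT row  -> SURR_KEY = 251
    --   second INSERT row -> SURR_KEY = 252
    --   and so on; rows that are not inserts keep their existing key values.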
Exercise 21: Using Target-Based Change Data Capture (CDC) Exercise Objectives After completing this exercise, you will be able to: • Use target-based change data capture
Business Example The current business environment demands a more open data interchange with your customers, subsidiaries and other business partners. There is an increasing need to incorporate data of various formats and source systems. XML has proven to be a reliable, stable standard for the transfer of data. In contrast to the general Pull extraction strategy, the transfer of data is to be initiated by the source systems (Push-Mechanism). You find that some of your data does not provide any time stamps or logs to support source-based CDC. You want to investigate using target-based CDC to compare the source to the target to determine which records have changed.
Task: You need to set up a job to update product records in the Omega data warehouse whenever they change. The product records do not include time stamps to indicate when they were last updated. You must use target-based change data capture to extract all records from the source and compare them to the target. 1.
In the Omega project, create a new batch job called Alpha_Product_Dim_Job containing a data flow called Alpha_Product_Dim_DF.
2.
In the workspace for Alpha_Product_Dim_DF, add the Product table from the Alpha datastore as the source object and the Prod_Dim table from the Omega datastore as the target object.
3.
Add a Query transform to the workspace connecting it to the source and target objects. In addition, add the Table Comparison, History Preserving and Key Generation transforms to the workspace.
4.
In the transform editor for the Query transform, map input columns to output columns by dragging corresponding columns from the input schema to the output schema. After deleting the link between the Query transform and the target table, complete the connection of the remaining objects in the data flow workspace.
5.
In the transform editor for the Table Comparison transform, use the Prod_Dim table in the Omega datastore as the comparison table and set the field SURR_KEY as the generated key column.
6.
In the transform editor for the Key Generation transform, set up key generation based on the SURR_KEY column of the Prod_Dim table and increment the key by a value of 1. In addition, do not configure the History Preserving transform.
7.
In the data flow workspace, before executing the job, display the data in both the source and target tables.
8.
Execute the Alpha_Product_Dim_Job with the default execution properties.
Solution 21: Using Target-Based Change Data Capture (CDC) Task: You need to set up a job to update product records in the Omega data warehouse whenever they change. The product records do not include time stamps to indicate when they were last updated. You must use target-based change data capture to extract all records from the source and compare them to the target. 1.
In the Omega project, create a new batch job called Alpha_Product_Dim_Job containing a data flow called Alpha_Product_Dim_DF. a)
In the Project area, right-click the project name and choose New Batch Job from the menu.
b)
Enter the name of the job as Alpha_Product_Dim_Job.
c)
Press Enter to commit the change.
d)
Open the job Alpha_Product_Dim_Job by double-clicking it.
e)
Select the Data Flow icon in the Tool Palette.
f)
Select the workspace where you want to add the data flow.
g)
Enter Alpha_Product_Dim_DF as the name.
h)
Press Enter to commit the change.
i)
Double-click the data flow to open the data flow workspace.
2. In the workspace for Alpha_Product_Dim_DF, add the Product table from the Alpha datastore as the source object and the Prod_Dim table from the Omega datastore as the target object. a)
In the Local Object Library, select the Datastores tab and then select the Product table from the Alpha datastore.
b)
Select and drag the object to the data flow workspace and in the context menu, choose the option Make Source.
c)
In the Local Object Library, select the Datastores tab and then select the Prod_Dim table from the Omega datastore.
d)
Select and drag the object to the data flow workspace and in the context menu, choose the option Make Target.
3.
Add a Query transform to the workspace connecting it to the source and target objects. In addition, add the Table Comparison, History Preserving and Key Generation transforms to the workspace. a)
In the Tool Palette, select the Query transform icon and select the workspace to add a Query template to the data flow.
b)
Connect the source table Product to the Query transform by selecting the source table and holding down the mouse button, drag the cursor to the Query transform. Then release the mouse button to create the connection.
c)
Connect the target table Prod_Dim to the Query transform by selecting the Query transform and holding down the mouse button, drag the cursor to the target table. Then release the mouse button to create the connection.
d)
In the Local Object Library, select the Transforms tab. Then select and drag the Table Comparison transform to the data flow workspace to the right of the Query transform.
e)
In the Local Object Library, select the Transforms tab. Then select and drag the History Preserving transform to the data flow workspace to the right of the Table Comparison transform.
f)
In the Local Object Library, select the Transforms tab. Then select and drag the Key Generation transform to the data flow workspace to the right of the History Preserving transform.
4.
In the transform editor for the Query transform, map input columns to output columns by dragging corresponding columns from the input schema to the output schema. After deleting the link between the Query transform and the target table, complete the connection of the remaining objects in the data flow workspace. a)
Double-click the Query transform to open the editor.
b)
In the Schema In workspace, select and drag corresponding fields to the Schema Out workspace.
Schema In -> Schema Out
PRODUCTID -> PRODUCTID
PRODUCTNAME -> PRODUCTNAME
CATEGORYID -> CATEGORYID
COST -> COST
c)
Go to the Mapping tab for the output schema field SURR_KEY and enter the value NULL. This provides a value until a key can be generated.
d)
Go to the Mapping tab for the output schema field EFFECTIVE_DATE and enter the value sysdate( ). This provides the system current date as the effective date.
e)
Select the Back icon to close the editor.
f)
Delete the link between the Query transform and the target table by right-clicking the link and selecting the option Delete.
g)
Connect the Query transform to the Table Comparison transform by clicking on the Query transform and holding down the mouse button, drag the cursor to the Table Comparison transform. Then release the mouse button to create the connection.
h)
Connect the Table Comparison transform to the History Preserving transform by selecting the Table Comparison transform and holding down the mouse button, drag the cursor to the History Preserving transform. Then release the mouse button to create the connection.
i)
Connect the History Preserving transform to the Key Generation transform by selecting the History Preserving transform and holding down the mouse button, drag the cursor to the Key Generation transform. Then release the mouse button to create the connection.
j)
Connect the Key Generation transform to the target table by selecting the Key Generation transform and holding down the mouse button, drag the cursor to the target table. Then release the mouse button to create the connection.
5. In the transform editor for the Table Comparison transform, use the Prod_Dim table in the Omega datastore as the comparison table and set the field SURR_KEY as the generated key column. a)
Double-click the Table Comparison transform to open the editor.
b)
Use the drop-down list for the field Table name and select Prod_Dim in the Omega datastore as the comparison table.
c)
Use the drop-down list for the field Generated key column and select SURR_KEY as the generated key column.
d)
Select the fields PRODUCTNAME, CATEGORYID and COST as the comparison columns for the field Compare columns.
e)
Use the drop-down list for the field Input primary key column(s) and select PRODUCTID as the primary key column.
f)
Select the Back icon to close the editor.
6.
In the transform editor for the Key Generation transform, set up key generation based on the SURR_KEY column of the Prod_Dim table and increment the key by a value of 1. In addition, do not configure the History Preserving transform. a)
Do not configure the History Preserving transform
b)
Double-click the Key Generation transform to open the editor.
c)
Use the drop-down list for the field Table name and select Prod_Dim in the Omega datastore as the table from which the maximum existing key is determined.
d)
Use the drop-down list for the field Generated key column and select SURR_KEY as the generated key column.
e)
Enter the value 1 for the field Increment value.
f)
Select the Back icon to close the editor.
7. In the data flow workspace, before executing the job, display the data in both the source and target tables. a)
In the data flow workspace, select the magnifying glass button on the source table. A large View Data pane appears beneath the current workspace area.
b)
In the data flow workspace, select the magnifying glass button on the target table. A large View Data pane appears beneath the current workspace area.
c)
Note that the “OmegaSoft” product has been added in the source, but has not yet been updated in the target.
8. Execute the Alpha_Product_Dim_Job with the default execution properties. a)
In the Omega project area, right-click on the Alpha_Product_Dim_Job and select the option Execute.
b)
Data Services prompts you to save any objects that have not been saved. Select OK.
c)
When the Execution Properties dialog box appears, select OK.
d)
Return to the data flow workspace and view the data in the target table to see that new records have been added for product IDs 2, 3, 6, 8, and 13 and that “OmegaSoft” has been added to the target.
Lesson Summary You should now be able to: • Use target-based CDC
Related Information
• For more information on the Table Comparison transform, see “Transforms”, Chapter 5 in the Data Services Reference Guide.
• For more information on the History Preserving transform, see “Transforms”, Chapter 5 in the Data Services Reference Guide.
• For more information on the Key Generation transform, see “Transforms”, Chapter 5 in the Data Services Reference Guide.
Unit Summary You should now be able to: • Update data which changes slowly over time • Use source-based CDC (Change Data Capture) • Use time stamps in source-based CDC • Manage issues related to using time stamps for source-based CDC • Use target-based CDC
Unit 9 Using Data Integrator Platforms Unit Overview Data Integrator transforms are used to enhance your data integration projects beyond the core functionality of the platform transforms. These specific transforms perform key operations on data sets to manipulate their structure as they are passed from source to target.
Unit Objectives
After completing this unit, you will be able to:
• Use the Data Integrator transforms
• Use the Pivot transform
• Describe performance optimization
• Use the Data Transfer transform
• View SQL generated by a data flow
• Use the XML Pipeline transform
• Use the Hierarchy Flattening transform
Unit Contents
Lesson: Using Data Integrator Platform Transforms .......................360
Lesson: Using the Pivot Transform ...........................................366
Exercise 22: Using the Pivot transform ..................................371
Lesson: Using the Data Transfer Transform and Performance Optimization ........376
Exercise 23: Using the Data Transfer platform .........................383
Lesson: Using the XML Pipeline Transform .................................392
Exercise 24: Using the XML Pipeline transform ........................397
Lesson: Using the Hierarchy Flattening Transform (Optional) ............405
Exercise 25: Using the Hierarchy Flattening transform (Optional) ...409
Lesson: Using Data Integrator Platform Transforms Lesson Overview Data Integrator transforms are used to enhance your data integration projects beyond the core functionality of the platform transforms.
Lesson Objectives After completing this lesson, you will be able to: •
Use the Data Integrator transforms
Business Example Data Integrator transforms are used to enhance your data integration projects beyond the core functionality of the platform transforms. In your projects, you encounter XML data with repeated nodes, hierarchy data, or sources of data where there are either too many fields or not enough fields. You find that the platform transforms do not provide enough flexibility and so you turn to the Data Integrator-specific transforms for assistance.
Describing Data Integrator Transforms Data Integrator transforms perform key operations on data sets to manipulate their structure as they are passed from source to target.
Figure 94: Data Integrator Transforms
Defining Data Integrator transforms These transforms are available in the Data Integrator branch of the Transforms tab in the Local Object Library: Transform
Description
Data Transfer
Allows a data flow to split its processing into two subdata flows and push down resource-consuming operations to the database server.
Date Generation
Generates a column filled with date values based on the start and end dates and increment you specify.
Effective Date
Generates an additional effective to column based on the primary key’s effective date.
Hierarchy Flattening
Flattens hierarchical data into relational tables so that it can participate in a star schema. Hierarchy flattening can be both vertical and horizontal.
Text Data Processing
Map CDC Operation
Sorts input data, maps output data, and resolves before and after versions for UPDATE rows. While commonly used to support Oracle or mainframe changed data capture, this transform supports any data stream if its input requirements are met.
Pivot
Rotates the values in specified columns to rows.
Reverse Pivot
Rotates the values in specified rows to columns.
XML Pipeline
Processes large XML inputs in small batches.
Date Generation Transform Use this transform to produce the key values for a time dimension target. From this generated sequence, you can populate other fields in the time dimension (such as day_of_week) using functions in a query. Example: To create a time dimension target with dates from the beginning of the year 1997 to the end of the year 2000, place a Date_Generation transform, a query, and a target in a data flow. Connect the output of the Date_Generation transform to the query, and the output of the query to the target. The text below describes three transforms, which are not discussed in their own lessons due to time constraints.
Figure 95: Date Generation Transform Editor
Inside the Date_Generation transform, specify these options (a variable can also be used for each):
• Start date: 1997.01.01
• End date: 2000.12.31
• Increment: Daily
Inside the query, create two target columns, specify the field names, and define a mapping for these time dimension values:
• Business quarter: BusQuarter, Function: quarter(Generated_date)
• Date number from start: DateNum, Function: julian(generated_date) - julian(1997.01.01)
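Assuming DateNum is the difference of the two julian values (so the start date maps to 0), the first few generated rows would look like this:

    Generated_date   BusQuarter   DateNum
    1997.01.01       1            0
    1997.01.02       1            1
    1997.04.01       2            90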
Effective Date Transform Calculates an “effective-to” value for data that contains an effective date. The calculated effective-to date and an existing effective date produce a date range that allows queries based on effective dates to produce meaningful results.
Figure 96: Effective Date Transform Editor
Map CDC Operation Transform
Using its input requirements (values for the Sequencing column and a Row operation column), the transform performs three functions:
1. Sorts input data based on values in the Sequencing column box and (optionally) the Additional Grouping Columns box.
2. Maps output data based on values in the Row Operation Column box. Source table rows are mapped to INSERT, UPDATE, or DELETE operations before passing them on to the target.
3. Resolves missing, separated, or multiple before- and after-images for UPDATE rows.
Figure 97: Map CDC Operation Transform Editor
While commonly used to support relational or mainframe changed-data capture (CDC), this transform supports any data stream as long as its input requirements are met. Relational CDC sources include Oracle and SQL Server. This transform is typically the last object before the target in a data flow because it produces INSERT, UPDATE, and DELETE operation codes. Data Services produces a warning if other objects are used.
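As a simplified sketch of the input the transform expects, assume a sequencing column DI_SEQUENCE_NUMBER and a row operation column DI_OPERATION_TYPE carrying single-letter codes (I = insert, B = before-image of an update, U = after-image of an update, D = delete); your CDC source may use different column names and code values:

    DI_SEQUENCE_NUMBER   DI_OPERATION_TYPE   CUSTOMER_ID   CITY
    1                    I                   1001          Boston     -> output as INSERT
    2                    B                   1002          Chicago    -> before-image of an update
    3                    U                   1002          Denver     -> after-image, output as UPDATE
    4                    D                   1003          Seattle    -> output as DELETE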
Lesson Summary You should now be able to: • Use the Data Integrator transforms
Lesson: Using the Pivot Transform Lesson Overview Data Integrator transforms are used to enhance your data integration projects beyond the core functionality of the platform transforms. The Pivot and Reverse Pivot transforms let you convert columns to row and rows back into columns.
Lesson Objectives After completing this lesson, you will be able to: •
Use the Pivot transform
Business Example Data Integrator transforms are used to enhance your data integration projects beyond the core functionality of the platform transforms. In your projects, you encounter XML data with repeated nodes, hierarchy data, or sources of data where there are either too many fields or not enough fields. You find that the platform transforms do not provide enough flexibility and so you turn to the Data Integrator-specific transforms for assistance. Here you encounter data that has too many fields and you use the Pivot transform to convert these multiple fields to values of a single field, thereby creating a structure that is easier to load.
Using the Pivot transform The Pivot transform creates a new row for each value in a column that you identify as a pivot column. It allows you to change how the relationship between rows is displayed. For each value in each pivot column, Data Services produces a row in the output data set. You can create pivot sets to specify more than one pivot column. For example, you could produce a list of discounts by quantity for certain payment terms so that each type of discount is listed as a separate record, rather than each being displayed in a unique column.
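Conceptually, the output is what the following SQL sketch would produce; the column names (EMPLOYEEID, EMP_SALARY, and so on) are borrowed from the exercise later in this lesson, and Data Services builds the rows internally rather than by running SQL like this:

    SELECT EMPLOYEEID, 1 AS PIVOT_SEQ, 'EMP_SALARY' AS COMP_TYPE, EMP_SALARY AS COMP
    FROM HR_COMP_UPDATE
    UNION ALL
    SELECT EMPLOYEEID, 2 AS PIVOT_SEQ, 'EMP_BONUS' AS COMP_TYPE, EMP_BONUS AS COMP
    FROM HR_COMP_UPDATE
    UNION ALL
    SELECT EMPLOYEEID, 3 AS PIVOT_SEQ, 'EMP_VACATIONDAYS' AS COMP_TYPE, EMP_VACATIONDAYS AS COMP
    FROM HR_COMP_UPDATE;

Each source row therefore produces one output row per pivot column, with the original column name carried in the header column and its value in the data field column.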
Figure 98: Pivot Transform Concept
The Reverse Pivot transform reverses the process, converting rows into columns. The next section gives a brief description of the function, data input requirements, options, and data output results for the Pivot transform. Data inputs include a data set with rows flagged as NORMAL. Data outputs include a data set with rows flagged as NORMAL. This target includes the non–pivoted columns, a column for the sequence number, the data field column, and the pivot header column. Options The Pivot transform offers several options: Option
Description
Pivot sequence column
Assign a name to the sequence number column. For each row created from a pivot column, Data Services increments and stores a sequence number.
Non-pivot columns
Select the columns in the source that are to appear in the target without modification.
Pivot set
Identify a number for the pivot set. For each pivot set, you define a group of pivot columns, a pivot data field, and a pivot header name.
Data column field
Specify the column that contains the pivoted data. This column contains all of the Pivot columns values.
Header column
Specify the name of the column that contains the pivoted column names. This column lists the names of the columns where the corresponding data originated.
Pivot columns
Select the columns to be rotated into rows. Describe these columns in the Header column. Describe the data in these columns in the Data field column.
To pivot a table
1. Open the data flow workspace.
2. Add your source object to the workspace.
3. On the Transforms tab of the Local Object Library, select and drag the Pivot or Reverse Pivot transform to the workspace to the right of your source object.
4. Add your target object to the workspace.
5. Connect the source object to the transform.
6. Connect the transform to the target object.
7. Double-click the Pivot transform to open the transform editor.
8. Select and drag any columns that will not be changed by the transform from the input schema area to the Non–Pivot Columns area.
9. Select and drag any columns that are to be pivoted from the input schema area to the Pivot Columns area. If required, you can create more than one pivot set by clicking Add.
10. If desired, change the values in the Pivot sequence column, Data field column, and Header column fields. These are the new columns that will be added to the target object by the transform.
11. Select Back to return to the data flow workspace.
Figure 99: Pivot Transform Editor
Exercise 22: Using the Pivot transform Exercise Objectives After completing this exercise, you will be able to: • Use the Pivot transform
Business Example Currently, employee compensation information is loaded into a table with separate columns for salary, bonus, and vacation days. For reporting purposes, you need each of these items to be a separate record in the HR_datamart.
Task: Use the Pivot transform to create a separate row for each entry in a new employee compensation table.
1.
In the Omega project, create a new batch job called Alpha_HR_Comp_Job containing a data flow called Alpha_HR_Comp_DF.
2.
In the workspace for Alpha_HR_Comp_DF, add the HR_Comp_Update table from the Alpha datastore as the source object.
3.
Add a Pivot transform to the data flow and connect it to the source table.
4.
Add a Query transform to the data flow and connect it to the Pivot transform. Create a target template table Employee_Comp in the Delta datastore.
5.
Specify in the Pivot transform that the fields EmployeeID and date_updated are nonpivot columns. Specify that the fields Emp_Salary, Emp_Bonus, and Emp_VacationDays are pivot columns.
6.
In the transform editor for the Query transform, map all fields from the input schema to the output schema and add an expression in the WHERE tab to filter out NULL values for the Comp. column.
7.
Execute the Alpha_HR_Comp_Job with the default execution properties.
Solution 22: Using the Pivot transform Task: Use the Pivot transform to create a separate row for each entry in a new employee compensation table. 1.
In the Omega project, create a new batch job called Alpha_HR_Comp_Job containing a data flow called Alpha_HR_Comp_DF. a)
In the Project area, right-click the project name and choose New Batch Job from the menu.
b)
Enter the name of the job as Alpha_HR_Comp_Job.
c)
Press Enter to commit the change.
d)
Open the job Alpha_HR_Comp_Job by double-clicking it.
e)
Select the Data Flow icon in the Tool Palette.
f)
Select the workspace where you want to add the data flow.
g)
Enter Alpha_HR_Comp_DF as the name.
h)
Press Enter to commit the change.
i)
Double-click the data flow to open the data flow workspace.
2. In the workspace for Alpha_HR_Comp_DF, add the HR_Comp_Update table from the Alpha datastore as the source object. a)
In the Local Object Library, select the Datastores tab and then select the HR_Comp_Update table from the Alpha datastore.
b)
Select and drag the object to the data flow workspace and in the context menu, choose the option Make Source.
3. Add a Pivot transform to the data flow and connect it to the source table. a)
In the Local Object Library, select the Transforms tab. Then select and drag the Pivot transform to the data flow workspace to the right of the source table.
b)
Connect the source table to the Pivot transform by selecting the source table and holding down the mouse button. Then drag the cursor to the Pivot transform and release the mouse button to create the connection.
4.
Add a Query transform to the data flow and connect it to the Pivot transform. Create a target template table Employee_Comp in the Delta datastore. a)
In the Local Object Library, select the Transforms tab. Then select and drag the Query transform to the data flow workspace to the right of the Pivot transform.
b)
Connect the Pivot transform to the Query transform by selecting the Pivot transform and holding down the mouse button. Then drag the cursor to the Query transform and release the mouse button to create the connection.
c)
In the Tool Palette, select the Template Table icon and select the workspace to add a new template table to the data flow.
d)
In the Create Template dialog box, enter Employee_Comp as the template table name.
e)
In the In datastore drop-down list, select the Delta datastore as the template table destination target.
f)
Select OK.
g)
Connect the Query transform to the target template table Employee_Comp by clicking on the Query transform and holding down the mouse button. Then drag the cursor to the template table and release the mouse button.
5. Specify in the Pivot transform that the fields EmployeeID and date_updated are nonpivot columns. Specify that the fields Emp_Salary, Emp_Bonus, and Emp_VacationDays are pivot columns. a)
Double-click the Pivot transform to open the transform editor.
b)
Drag and drop the fields EmployeeID and date_updated into the Non-Pivot Columns workspace.
c)
Drag and drop the fields Emp_Salary, Emp_Bonus, and Emp_VacationDays into the Pivot Columns workspace.
d)
In the field Data field column, enter the value Comp..
e)
In the field Header column, enter the value Comp_Type.
f)
Select the Back icon to close the editor.
6.
In the transform editor for the Query transform, map all fields from the input schema to the output schema and add an expression in the WHERE tab to filter out NULL values for the Comp. column. a)
Double-click the Query transform to open the transform editor.
b)
Select fields from the Schema In and drag them to the corresponding field in the Schema Out to create the mapping.
c)
Select the WHERE tab and enter the expression: Pivot.Comp is not null.
d)
Select the Back icon to close the editor.
7. Execute the Alpha_HR_Comp_Job with the default execution properties. a)
In the Omega project area, right-click on the Alpha_HR_Comp_Job and select the option Execute.
b)
Data Services prompts you to save any objects that have not been saved. Select OK.
c)
When the Execution Properties dialog box appears, select OK.
d)
Return to the data flow workspace and view the data for the target table.
Lesson Summary You should now be able to: • Use the Pivot transform
Related Information •
For more information on the Pivot transform see “Transforms” Chapter 5 in the Data Services Reference Guide
Lesson: Using the Data Transfer Transform and Performance Optimization Lesson Overview Data Integrator transforms are used to enhance your data integration projects beyond the core functionality of the platform transforms.
Lesson Objectives
After completing this lesson, you will be able to:
• Describe performance optimization
• Use the Data Transfer transform
• View SQL generated by a data flow
Business Example Data Integrator transforms are used to enhance your data integration projects beyond the core functionality of the platform transforms. In your projects, you encounter XML data with repeated nodes, hierarchy data, or sources of data where there are either too many fields or not enough fields. You find that the platform transforms do not provide enough flexibility and so you turn to the Data Integrator-specific transforms for assistance. Here you want to explore the options for performance optimization in your jobs. You would like to push some operations down to the database server and you use the Data Transfer transform to push a WHERE clause containing a table join down to the database server.
Describing the performance optimization You can improve the performance of your jobs by pushing down operations to the source or target database to reduce the number of rows and operations that the engine must retrieve and process.
Figure 100: Performance Optimization Overview
Describing push–down operations
Data Services examines the database and its environment when determining which operations to push down to the database:
• Full push–down operations
The Data Services optimizer always tries to do a full push–down operation. Full push–down operations are operations that can be pushed down to the databases and the data streams directly from the source database to the target database. For example, Data Services sends SQL INSERT INTO... SELECT statements to the target database and it sends SELECT to retrieve data from the source. Data Services can only do full push–down operations to the source and target databases when these conditions are met:
– All of the operations between the source table and target table can be pushed down
– The source and target tables are from the same datastore or they are in datastores that have a database link defined between them
• Partial push–down operations
When a full push–down operation is not possible, Data Services tries to push down the SELECT statement to the source database. Operations within the SELECT statement that can be pushed to the database include:
Operation
Description
Aggregations
Aggregate functions, typically used with a Group by statement, always produce a data set smaller than or the same size as the original data set.
Distinct rows
Data Services only outputs unique rows when you use distinct rows.
Filtering
Filtering can produce a data set smaller than or equal to the original data set.
Joins
Joins typically produce a data set smaller than or similar in size to the original tables.
Ordering
Ordering does not affect data set size. Data Services can efficiently sort data sets that fit in memory. Since Data Services does not perform paging (writing out intermediate results to disk), we recommend that you use a dedicated disk-sorting program such as SyncSort or the DBMS itself to order large data sets.
Projections
A projection produces a smaller data set because it only returns columns referenced by a data flow.
Functions
Most Data Services functions that have equivalents in the underlying database are appropriately translated.
Data Services cannot push some transform operations to the database. For example:
• Expressions that include Data Services functions that do not have database correspondents.
• Load operations that contain triggers.
• Transforms other than Query.
• Joins between sources that are on different database servers that do not have database links defined between them.
Similarly, not all operations can be combined into single requests. For example, when a stored procedure contains a COMMIT statement or does not return a value, you cannot combine the stored procedure SQL with the SQL for other operations in a query. You can only push operations supported by the RDBMS down to that RDBMS. Note: You cannot push built–in functions or transforms to the source database. For best performance, do not intersperse built–in transforms among operations that can be pushed down to the database. Database–specific functions can only be used in situations where they are pushed down to the database for execution. Viewing SQL generated by a data flow
Before running a job, you can view the SQL generated by the data flow and adjust your design to maximize the SQL that is pushed down to improve performance. Alter your design to improve the data flow when necessary. Keep in mind that Data Services only shows the SQL generated for table sources. Data Services does not show the SQL generated for SQL sources that are not table sources, such as the lookup function, the Key Generation transform, the key_generation function, the Table Comparison transform, and target tables. To view SQL 1.
In the Data Flows tab of the Local Object Library, right–click the data flow and select Display Optimized SQL from the menu. The Optimized SQL dialog box displays.
2.
In the left pane, select the datastore for the data flow. The optimized SQL for the datastore displays in the right pane.
Figure 101: View Optimized SQL
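For instance, when a filter and a projection can both be pushed down, the right pane might show a single SELECT along these lines (a hypothetical illustration; the column names, quoting, and owner names depend on your tables and database):

    SELECT "EMPLOYEEID", "LASTNAME", "BIRTHDATE"
    FROM "ALPHA"."EMPLOYEE"
    WHERE "DEPARTMENTID" = 5

When a full push-down is possible, the statement instead takes the form INSERT INTO <target table> SELECT ... FROM <source table>, and the data streams directly between the databases rather than through the Job Server.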
Caching data You can improve the performance of data transformations that occur in memory by caching as much data as possible. By caching data, you limit the number of times the system must access the database. Cached data must fit into available memory. Data Services allows administrators to select a pageable cache location to save content over the 2 GB RAM limit. The pageable cache location is set up in Server Manager and the option to use pageable cache is selected on the Dataflow Properties dialog box.
Persistent cache datastores can be created with the Create New Datastore dialog box by selecting Persistent Cache as the database type. The newly–created persistent cache datastore appears in the list of datastores, and can be used as a source in jobs. For more information about advanced caching features, see the Data Services Performance Optimization Guide.
Slicing processes
You can also optimize your jobs with process slicing, which involves splitting data flows into subdata flows. Subdata flows work on smaller data sets and/or fewer transforms so there is less virtual memory to consume per process. This way, you can leverage more physical memory per data flow as each subdata flow can access 2 GB of memory. This functionality is available with the Advanced tab for the Query transform. You can run each memory–intensive operation as a separate process.
Figure 102: Performance–Process Slicing
For more information on process slicing, see the Data Services Performance Optimization Guide.
Using the Data Transfer platform Introduction The Data Transfer transform allows a data flow to split its processing into two subdata flows and push down resource–consuming operations to the database server. Explaining the Data Transfer transform
The Data Transfer transform moves data from a source or the output from another transform into a transfer object and subsequently reads data from the transfer object. You can use the Data Transfer transform to push down resource–intensive database operations that occur anywhere within the data flow. The transfer type can be a relational database table, persistent cache table, file, or pipeline.
Figure 103: Data Transfer Transform Editor
Use the Data Transfer transform to:
• Push down operations to the database server when the transfer type is a database table. You can push down resource-consuming operations such as joins, GROUP BY, and sorts.
• Define points in your data flow where you want to split processing into multiple subdata flows that each process part of the data. Data Services does not need to process the entire input data in memory at one time. Instead, the Data Transfer transform splits the processing among multiple subdata flows that each use a portion of memory.
The next section gives a brief description of the function, data input requirements, options, and data output results for the Data Transfer transform. When the input data set for the Data Transfer transform is a table or file transfer type, the rows must be flagged with the NORMAL operation code. When the input data set is a pipeline transfer type, the rows can be flagged with any operation code. The input data set must not contain hierarchical (nested) data. Output data sets have the same schema and the same operation code as the input data sets. In the push-down scenario, the output rows are in the sort or GROUP BY order.
The subdata flow names use the format dataflowname_n, where n is the number of the subdata flow. The execution of the output depends on the temporary transfer type: for Table or File temporary transfer types, Data Services automatically splits the data flow into subdata flows and executes them serially. For Pipeline transfer types, Data Services splits the data flow into subdata flows if you specify the Run as a separate process option in another operation in the data flow. Data Services executes these subdata flows that use pipeline in parallel.
Exercise 23: Using the Data Transfer platform Exercise Objectives After completing this exercise, you will be able to: • Use the Data Transfer transform • View SQL generated by a data flow
Business Example You want to join two database schemas and have the join performed on the database server. You create two data flows, one with a Data Transfer transform and one without it. In viewing the optimized SQL for each data flow, you see that only the data flow with the Data Transfer transform pushes the join condition down to the database server.
Task 1: The Data Transfer transform can be used to push data down to a database table so that it can be processed by the database server rather than the Data Services Job Server. In this activity, you join two database schemas. When the Data Transfer transform is not used, the join occurs on the Data Services Job Server. When the Data Transfer transform is added to the data flow, the join can be seen in the SQL Query by displaying the optimized SQL for the data flow. 1.
In the Omega project, create a new batch job called No_Data_Transfer_Job containing a data flow called No_Data_Transfer_DF.
2.
In the workspace for No_Data_Transfer_DF, add the Employee_Comp table from the Delta datastore and the Employee table from the Alpha datastore as source objects.
3.
Add a Query transform to the workspace, connecting each source object to it.
4.
In the transform editor for the Query transform, add the LastName and BirthDate columns from the Employee table and the Comp_Type and Comp columns from the Employee_Comp table to the output schema. Join the two tables on the EmployeeID columns. Caution: Create a target template table Employee_Temp in the Delta datastore. Then save the batch job.
5.
View the optimized SQL for the data flow No_Data_Transfer_DF.
Task 2: Create a copy of the data flow No_Data_Transfer_DF and in the copy use the Data Transfer transform in addition to the Query transform. Then view the optimized SQL to see the presence of the “WHERE” clause.
1.
Create a new batch job Data_Transfer_Job in the Omega project with a replica of the data flow No_Data_Transfer_DF called Data_Transfer_DF.
2.
Add a Data Transfer transform to the workspace and place it between the source table Employee_Comp and the Query transform.
3.
Configure the Data Transfer transform to push the join of data to the database server.
4.
Configure the Query transform to join the Data Transfer transform output to the target table input. Save the objects and display the optimized SQL.
5.
View the optimized SQL for the data flow Data_Transfer_DF.
Solution 23: Using the Data Transfer platform Task 1: The Data Transfer transform can be used to push data down to a database table so that it can be processed by the database server rather than the Data Services Job Server. In this activity, you join two database schemas. When the Data Transfer transform is not used, the join occurs on the Data Services Job Server. When the Data Transfer transform is added to the data flow, the join can be seen in the SQL Query by displaying the optimized SQL for the data flow. 1.
In the Omega project, create a new batch job called No_Data_Transfer_Job containing a data flow called No_Data_Transfer_DF. a)
In the Project area, right-click the project name and choose New Batch Job from the menu.
b)
Enter the name of the job as No_Data_Transfer_Job.
c)
Press Enter to commit the change.
d)
Open the job No_Data_Transfer_Job by double-clicking it.
e)
Select the Data Flow icon in the Tool Palette.
f)
Select the workspace where you want to add the data flow.
g)
Enter No_Data_Transfer_DF as the name.
h)
Press Enter to commit the change.
i)
Double-click the data flow to open the data flow workspace.
2. In the workspace for No_Data_Transfer_DF, add the Employee_Comp table from the Delta datastore and the Employee table from the Alpha datastore as source objects. a)
In the Local Object Library, select the Datastores tab and then select the Employee_Comp table from the Delta datastore.
b)
Select and drag the objects to the data flow workspace and in the context menu, choose the option Make Source.
c)
In the Local Object Library, select the Datastores tab and then select the Employee table from the Alpha datastore.
d)
Select and drag the objects to the data flow workspace and in the context menu, choose the option Make Source.
3.
Add a Query transform to the workspace connecting each source object to it a)
In the Tool Palette, select the Query transform icon and select the workspace to add a Query template to the data flow.
b)
Connect the source table Employee_Comp to the Query transform by selecting the source table and holding down the mouse button, drag the cursor to the Query transform. Then release the mouse button to create the connection.
c)
Connect the source table Employee to the Query transform by selecting the source table and holding down the mouse button, drag the cursor to the Query transform. Then release the mouse button to create the connection.
4. In the transform editor for the Query transform, add the LastName and BirthDate columns from the Employee table and the Comp_Type and Comp columns from the Employee_Comp table to the output schema. Join the two tables on the EmployeeID columns.
Caution: Create a target template table Employee_Temp in the Delta datastore. Then save the batch job. a)
Double-click the Query transform to open its editor.
b)
From the Schema In workspace, drag the LastName and BirthDate columns from the Employee table to the Schema Out workspace.
c)
From the Schema In workspace, drag the Comp_Type and Comp columns from the Employee_Comp table to the Schema Out workspace.
d)
Select the WHERE tab and enter the expression: Employee_Comp.EmployeeID = Employee.EmployeeID Note: If the field EmployeeID is not available, then use the field LastName. The expression would then be: Employee_Comp.LastName = Employee.LastName
e)
Select the Back icon to close the editor.
f)
In the Tool Palette, select the Template Table icon and select the workspace to add a new template table to the data flow.
g)
In the Create Template dialog box, enter Employee_Temp as the template table name.
h)
In the In datastore drop-down list, select the Delta datastore as the template table destination target.
i)
Select OK.
j)
Connect the Query transform to the target template table Employee_Temp by selecting the Query transform and holding down the mouse button. Then drag the cursor to the template table and release the mouse button.
k)
Go to the Designer tool bar and select the Save All button to save all objects you have created. Do not execute the job.
5. View the optimized SQL for the data flow No_Data_Transfer_DF. a)
In the Local Object Library, select the Data Flows tab and right-click the data flow No_Data_Transfer_DF and choose the option Display Optimized SQL.
b)
In the Optimized SQL dialog box, note that the “WHERE” clause does not appear in the SQL statement.
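Note: For orientation only, with no Data Transfer transform the optimized SQL typically shows two separate SELECT statements, one per datastore, because the join itself still runs on the Job Server. The sketch below illustrates the shape of that output; the table owners shown are assumptions and the exact SQL Data Services generates depends on your database and release:

    SELECT EMPLOYEEID, COMP_TYPE, COMP
    FROM DELTA.EMPLOYEE_COMP;

    SELECT EMPLOYEEID, LASTNAME, BIRTHDATE
    FROM SOURCE.EMPLOYEE;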
Task 2: Create a copy of the data flow No_Data_Transfer_DF and in the copy use the Data Transfer transform in addition to the Query transform. Then view the optimized SQL to see the presence of the “WHERE” clause.
1. Create a new batch job Data_Transfer_Job in the Omega project with a replica of the data flow No_Data_Transfer_DF called Data_Transfer_DF. a)
In the Project area, right-click the project name and choose New Batch Job from the menu.
b)
Enter the name of the job as Data_Transfer_Job.
c)
Press Enter to commit the change.
d)
Open the job Data_Transfer_Job by double-clicking it.
e)
In the Local Object Library, select the Data Flows tab and right-click on the data flow No_Data_Transfer_DF and choose the option Replicate.
f)
Right-click the data flow Copy_of_No_Data_Transfer_DF and choose the option Rename. Enter the name Data_Transfer_DF.
g)
Select and drag the data flow Data_Transfer_DF into the Data_Transfer_Job workspace.
h)
Double-click the data flow Data_Transfer_DF to open the workspace.
2. Add a Data Transfer transform to the workspace and place it between the source table Employee_Comp and the Query transform. a)
Right-click the link between the source table Employee_Comp and the Query transform to choose the option Delete.
b)
In the Local Object Library, select the Transforms tab, select the Data Transfer transform and drag it to the data flow workspace.
c)
Connect the source table Employee_Comp to the Data Transfer transform by selecting the source table and holding down the mouse button, drag the cursor to the Data Transfer transform. Then release the mouse button to create the connection.
d)
Connect the Data Transfer transform to the Query transform by selecting the Data Transfer transform and holding down the mouse button, drag the cursor to the Query transform. Then release the mouse button to create the connection.
3. Configure the Data Transfer transform to push the join of data down to the database server. a)
Double-click the Data Transfer transform to open its editor.
b)
For the field Transfer Type, select the option Table.
c)
In the Table Options section of the transform editor, select the ellipsis (...) button. Select the Alpha datastore and select Table Name.
d)
In the field Table Name enter PUSHDOWN_DATA with SOURCE in the Owner field.
e)
Select the Back icon to close the editor.
4. Configure the Query transform to join the Data Transfer transform output to the target table input. Save the objects and display the optimized SQL. a)
Double-click the Query transform to open its editor.
b)
Go to the WHERE tab and update the expression to join on the EmployeeID fields in the Employee and Data_Transfer tables. The expression in the WHERE tab should look like this: Data_Transfer.EmployeeID = Employee.EmployeeID
c)
Verify that the Comp_Type and Comp columns from the Data Transfer transform are mapped to the output schema.
d)
Select the Back icon to close the editor.
e)
Go to the Designer tool bar and select the Save All button to save all objects you have created. Do not execute the job.
5. View the optimized SQL for the data flow Data_Transfer_DF. a)
In the Local Object Library, select the Data Flows tab and right-click the data flow Data_Transfer_DF and choose the option Display Optimized SQL.
b)
In the Optimized SQL dialog box, note that the “WHERE” clause does appear in the SQL statement. Note: If you were to execute the batch job Data_Transfer_Job, the job would result in an error because the Alpha datastore is read-only. The point of the exercise is to show that the Data Transfer transform pushes the table join to the database server. Once this job executes, the WHERE clause no longer appears in the optimized SQL display: the selection is pushed down to the database server, and the Job Server processes only what is sent from the database. Once Data_Transfer_Job has been executed, the optimized SQL is the same as the SQL of No_Data_Transfer_Job.
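For comparison, once the Data Transfer transform stages Employee_Comp into the PUSHDOWN_DATA table inside the Alpha datastore, both join inputs live in the same database, so the join condition can be pushed down. Conceptually the optimized SQL collapses into a single statement along these lines; this is an illustrative sketch using the names from this exercise, not the exact statement Data Services generates:

    SELECT e.LASTNAME, e.BIRTHDATE, p.COMP_TYPE, p.COMP
    FROM SOURCE.PUSHDOWN_DATA p, SOURCE.EMPLOYEE e
    WHERE p.EMPLOYEEID = e.EMPLOYEEID;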
Lesson Summary
You should now be able to:
• Describe performance optimization
• Use the Data Transfer transform
• View SQL generated by a data flow
Related Information
• For more information on the Data Transfer transform, see “Transforms”, Chapter 5 in the Data Services Reference Guide.
• For more information about advanced caching features, see the Data Services Performance Optimization Guide.
• For more information on process slicing, see the Data Services Performance Optimization Guide.
Lesson: Using the XML Pipeline Transform
Lesson Overview
Data Integrator transforms are used to enhance your data integration projects beyond the core functionality of the platform transforms.
Lesson Objectives
After completing this lesson, you will be able to:
• Use the XML Pipeline transform
Business Example
Data Integrator transforms are used to enhance your data integration projects beyond the core functionality of the platform transforms. In your projects, you encounter XML data with repeated nodes, hierarchy data, or sources of data where there are either too many fields or not enough fields. You find that the platform transforms do not provide enough flexibility, so you turn to the Data Integrator-specific transforms for assistance. Here the data source consists of documents containing data in XML format. While non-repeatable nodes are not a problem, the repeatable ones are difficult to load into a relational database without using complex WHERE expressions. You use the XML Pipeline transform to process one instance of a repeatable structure at a time, thereby pushing the operation of the streaming transform to the XML source.
Using the XML Pipeline transform
Performance Considerations
• The XML Pipeline transform is used to process large XML files more efficiently by separating them into small batches.
• The XML Pipeline transform is used to process large XML files, one instance of a specified repeatable structure at a time.
• With this transform, Data Services does not need to read the entire XML input into memory and build an internal data structure before performing the transformation.
In a Nested Relational Data Model (NRDM), the only difference between relational data and hierarchical data is one additional datatype. A column can not only be of type number or character, it can also be of type schema. In order to read from a table, a “where” clause is used. However, a “where” clause used with a column of type schema would have to be complex. To avoid this, the data is first parsed and decoded into a set of tables, operations are performed on them, and then the data is reassembled into the schema structure.
Using the nested data method can be more concise (no repeated information), and can scale to present a deeper level of hierarchical complexity.
Figure 104: Example of Nested Data
To expand on the example above, columns inside a nested schema can also contain columns. There is a unique instance of each nested schema for each row at each level of the relationship. Generalizing further with nested data, each row at each level can have any number of columns containing nested schemas. Loading a data set that contains nested schemas into a relational target requires that the nested rows be unnested.
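As a relational analogy, unnesting pairs each parent row with each of its own nested rows, so the flat result has one row per parent/child combination. In SQL terms this resembles joining a header table to a detail table; the table and column names below are hypothetical and only illustrate the shape of the unnested output:

    -- One output row per order line; the header columns repeat for every nested item.
    SELECT o.CustomerName, o.OrderDate, i.ItemID, i.Quantity
    FROM OrderHeader o
    JOIN OrderItem i ON i.OrderID = o.OrderID;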
Figure 105: Unnesting Data
Data Services allows you to unnest any number of nested schemas at any depth. No matter how many levels are involved, the result of unnesting schemas is a cross product of the parent and child schemas. When two or more levels of unnesting occur, the inner-most child is unnested first; the result, the cross product of the parent and the inner-most child, is then unnested from its own parent, and so on up to the top-level schema.
An NRDM structure is not required to represent the entire XML data input. Instead, the XML Pipeline transform uses a portion of memory to process each instance of a repeatable structure, then continually releases and reuses the memory to continuously flow XML data through the transform. During execution, Data Services pushes operations of the streaming transform to the XML source. Therefore, you cannot use a breakpoint between your XML source and an XML Pipeline transform.
Note: You can use the XML Pipeline transform to load into a relational or nested schema target. This course focuses on loading XML data into a relational target. You can use an XML file or XML message.
You can also connect more than one XML Pipeline transform to an XML source. When connected to an XML source, the transform editor shows the input and output schema structures as a root schema containing repeating and non-repeating subschemas, represented by icons as seen in the XML Pipeline transform editor:
Figure 106: XML Pipeline Transform Editor
Keep in mind these rules when using the XML Pipeline transform: • •
•
•
You cannot drag and drop the root level schema. You can drag and drop the same child object repeated times to the output schema, but only if you give each instance of that object a unique name. Rename the mapped instance before attempting to drag and drop the same object to the output again. When you drag and drop a column or subschema to the output schema, you cannot then map the parent schema for that column or subschema. Similarly, when you drag and drop a parent schema, you cannot then map an individual column or subschema from under that parent. You cannot map items from two sibling repeating subschemas because the XML Pipeline transform does not support Cartesian products (combining every row from one table with every row in another table) of two repeatable schemas.
To take advantage of the XML Pipeline transform’s performance, always select a repeatable column to be mapped. For example, if you map a repeatable schema column, the XML source produces one row after parsing one item.
Avoid selecting non-repeatable columns that occur structurally after the repeatable schema, because the XML source must then assemble the entire structure of items in memory before processing, which increases memory consumption when processing the output into your target. To map both the repeatable schema and a non-repeatable column that occurs after the repeatable one, use two XML Pipeline transforms, and use the Query transform to combine the outputs of the two XML Pipeline transforms and map the columns into one single target.
Options
The XML Pipeline transform is streamlined to support massive throughput of XML data; therefore, it does not contain additional options other than input and output schemas and the Mapping tab.
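Relationally, in the two-transform pattern described above, the Query transform behaves like a join of the two XML Pipeline result sets on a shared key, so each unnested item row also carries the non-repeatable column. The sketch below uses the column names from the exercise that follows, with a hypothetical itemName standing in for the columns of the repeatable item schema; it is an analogy, not code that Data Services produces:

    SELECT p1.customerName, p1.orderDate, p1.itemName, p2.totalPOs
    FROM xml_pipeline_1_output p1
    JOIN xml_pipeline_2_output p2 ON p2.customerName = p1.customerName;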
Exercise 24: Using the XML Pipeline transform
Exercise Objectives
After completing this exercise, you will be able to:
• Use the XML Pipeline transform
Business Example
Purchase order information is stored in XML files that have repeatable purchase orders and items, and a nonrepeated Total Purchase Orders column. You must combine the customer name, order date, order items, and the totals into a single relational target table, with one row per customer per item.
Task: Use the XML Pipeline transform to unnest purchase order information stored in XML files with repeatable and nonrepeatable elements. 1.
On the Formats tab of the Local Object Library, create a file format for an XML schema called PurchaseOrders_Format based on the purchaseOrders.xsd file in the Activity_Source folder. Use a root element of PurchaseOrders.
2.
In the Omega project, create a new batch job called Alpha_Purchase_Orders_Job containing a data flow called Alpha_Purchase_Orders_DF.
3.
In the workspace for Alpha_Purchase_Orders_DF, add the PurchaseOrders_Format as the XML file source object and configure the file format in the workspace to point to the pos.xml file in the My Documents → BODS10 → Activity_Source folder.
4.
Add two XML Pipeline transforms to the workspace connecting source object to each transform.
5.
In the transform editor of the first XML Pipeline, map these columns, plus the entire repeatable schema, from the input schema to the output schema:
Schema In → Schema Out
customerName → customerName
orderDate → orderDate
6. In the transform editor of the second XML Pipeline, map these columns:
Schema In → Schema Out
customerName → customerName
orderDate → orderDate
totalPOs → totalPOs
7.
Add a Query transform to the data flow and connect both XML Pipeline transforms to the Query transform. In the Query transform, map both columns and the repeatable schema from the first XML Pipeline transform from the input schema to the output schema. In addition, map the totalPOs column from the second XML Pipeline transform.
8.
Add a target template table Item_POs in the Delta datastore and connect it to the Query Transform. Execute the Alpha_Purchase_Orders_Job with the default execution properties.
Solution 24: Using the XML Pipeline transform
Task: Use the XML Pipeline transform to unnest purchase order information stored in XML files with repeatable and nonrepeatable elements.
1. On the Formats tab of the Local Object Library, create a file format for an XML schema called PurchaseOrders_Format based on the purchaseOrders.xsd file in the Activity_Source folder. Use a root element of PurchaseOrders. a)
On the Formats tab of the Local Object Library, right-click XML Schemas and select the option New. The Import XML Schema Format editor appears.
b)
In the Format name, enter the value PurchaseOrders_Format.
c)
In the File name/URL field, select the Browse button and navigate to the folder My Documents → BODS10 → Activity_Source and select the file purchaseOrders.xsd. Then select OK to return to the editor.
d)
In the field Root element name drop-down list, select the element PurchaseOrders.
e)
Accepting all other default values, select OK to save your XML format.
2. In the Omega project, create a new batch job called Alpha_Purchase_Orders_Job containing a data flow called Alpha_Purchase_Orders_DF. a)
In the Project area, right-click the project name and choose New Batch Job from the menu.
b)
Enter the name of the job as Alpha_Purchase_Orders_Job.
c)
Press Enter to commit the change.
d)
Open the job Alpha_Purchase_Orders_Job by double-clicking it.
e)
Select the Data Flow icon in the Tool Palette.
f)
Select the workspace where you want to add the data flow.
g)
Enter Alpha_Purchase_Orders_DF as the name.
h)
Press Enter to commit the change.
i)
Double-click the data flow to open the data flow workspace.
3. In the workspace for Alpha_Purchase_Orders_DF, add the PurchaseOrders_Format as the XML file source object and configure the file format in the workspace to point to the pos.xml file in the My Documents → BODS10 → Activity_Source folder. a)
In the Local Object Library, select the XML Formats tab and then select the PurchaseOrders_Format.
b)
Select and drag the object to the data flow workspace and in the context menu, choose the option Make Source.
c)
Double-click the XML file format PurchaseOrders_Format to open the editor.
d)
Change the file path name to D:\CourseFiles\DataSources\Activity_Source. Note: The path name must be typed in exactly. Check with your instructor to verify that the path is correct before proceeding.
e) Select OK to close the editor.
4. Add two XML Pipeline transforms to the workspace, connecting the source object to each transform. a)
In the Local Object Library, select the Transforms tab, select the XML Pipeline transform icon and drag it to the data flow workspace.
b)
In the Local Object Library, select the Transforms tab, select the XML Pipeline transform icon a second time and drag it to the data flow workspace.
c)
Connect the source format PurchaseOrders_Format to the first XML Pipeline transform by selecting the source format and holding down the mouse button, drag the cursor to the first XML Pipeline transform. Then release the mouse button to create the connection. Note: This XML Pipeline transform has the name XML_Pipeline.
d)
Connect the source format PurchaseOrders_Format to the second XML Pipeline transform by selecting the source format and holding down the mouse button, drag the cursor to the second XML Pipeline transform. Then release the mouse button to create the connection. Note: This XML Pipeline transform has the name XML_Pipeline_1.
5. In the transform editor of the first XML Pipeline, map these columns, plus the entire repeatable schema, from the input schema to the output schema:
Schema In → Schema Out
customerName → customerName
orderDate → orderDate
a)
Double-click the first XML Pipeline transform to open the editor.
b)
From the Schema In workspace, select and drag the fields below to the Schema Out workspace:
Schema In → Schema Out
customerName → customerName
orderDate → orderDate
c)
In the Schema In workspace, locate, select and drag the item repeatable schema to the Schema Out workspace.
d)
Select Back to close the editor.
6. In the transform editor of the second XML Pipeline, map these columns:
Schema In → Schema Out
customerName → customerName
orderDate → orderDate
totalPOs → totalPOs
a)
Double-click the second XML Pipeline transform to open the editor.
b)
From the Schema In workspace, select and drag the fields below to the Schema Out workspace.
Schema In → Schema Out
customerName → customerName
orderDate → orderDate
totalPOs → totalPOs
c) Select Back to close the editor.
7. Add a Query transform to the data flow and connect both XML Pipeline transforms to the Query transform. In the Query transform, map both columns and the repeatable schema from the first XML Pipeline transform from the input schema to the output schema. In addition, map the totalPOs column from the second XML Pipeline transform. a)
In the Tool Palette, select the Query transform icon and select the workspace to add a Query template to the data flow.
b)
Connect both XML Pipeline transforms to the Query transform by selecting each XML Pipeline transform and holding down the mouse button. Then drag the cursor to the Query transform and release the mouse button to create the connection.
c)
Double-click the Query transform to open its editor.
d)
From the Schema In workspace, select and drag the fields from the first XML Pipeline transform to the Schema Out workspace:
Schema In → Schema Out
customerName → customerName
orderDate → orderDate
item (repeatable schema*) → item (repeatable schema*)
Hint: *The term “repeatable schema” does not appear in either the Schema In or the Schema Out workspaces. The column item is identified as repeatable from the icon associated with it. e)
From the Schema In workspace, select and drag this field from the second XML Pipeline transform to the Schema Out workspace:
Schema In → Schema Out
totalPOs → totalPOs
f) In the Schema Out workspace, right-click the item (repeatable schema*) and choose the option Unnest. Hint: *The term “repeatable schema” does not appear in either the Schema In or the Schema Out workspaces. The column item is identified as repeatable from the icon associated with it.
g)
In the transform editor of the Query transform, select the WHERE tab and insert a clause to join the outputs from the two XML Pipeline transforms on the customerName column XML_Pipeline.customerName = XML_Pipeline_1.customerName.
h)
Select the Back icon to close the editor.
8. Add a target template table Item_POs in the Delta datastore and connect it to the Query transform. Execute the Alpha_Purchase_Orders_Job with the default execution properties. a)
In the Tool Palette, select the Template Table icon and select the workspace to add a new template table to the data flow.
b)
In the Create Template dialog box, enter Item_POs as the template table name.
c)
In the In datastore drop-down list, select the Delta datastore as the template table destination target.
d)
Select OK.
e)
Connect the Query transform to the target table by selecting the Query transform and, while holding down the mouse button, dragging the cursor to the target table. Release the button to create the link.
f)
In the Omega project area, right-click on the Alpha_Purchase_Orders_Job and select the option Execute.
g)
Data Services prompts you to save any objects that have not been saved. Select OK.
h)
The Execution Properties dialog box appears and select OK.
i)
Return to the data flow workspace and view the data in the target table by selecting the magnifying glass button on the target table. A large View Data pane appears beneath the current workspace area.
Lesson Summary
You should now be able to:
• Use the XML Pipeline transform
Related Information
• For more information on constructing nested schemas for your target, refer to the Data Services Designer Guide.
Lesson: Using the Hierarchy Flattening Transform (Optional)
Lesson Overview
Data Integrator transforms are used to enhance your data integration projects beyond the core functionality of the platform transforms.
Lesson Objectives
After completing this lesson, you will be able to:
• Use the Hierarchy Flattening transform
Business Example
Data Integrator transforms are used to enhance your data integration projects beyond the core functionality of the platform transforms. In your projects, you encounter XML data with repeated nodes, hierarchy data, or sources of data where there are either too many fields or not enough fields. You find that the platform transforms do not provide enough flexibility, so you turn to the Data Integrator-specific transforms for assistance. Here you encounter a data source that structures its data in a hierarchy, and you need to load it into a flat structure. You use the Hierarchy Flattening transform to produce a horizontally- or vertically-flattened structure.
Using the Hierarchy Flattening transform
The Hierarchy Flattening transform enables you to break down hierarchical table structures into a single table to speed up data access.
Explaining the Hierarchy Flattening transform
The Hierarchy Flattening transform constructs a complete hierarchy from parent/child relationships, and then produces a description of the hierarchy in horizontally- or vertically-flattened format. For horizontally-flattened hierarchies, each row of the output describes a single node in the hierarchy and the path to that node from the root.
Figure 107: Hierarchy Flattening Transform 1
For vertically–flattened hierarchies, each row of the output describes a single relationship between ancestor and descendent and the number of nodes the relationship includes. There is a row in the output for each node and all of the descendants of that node. Each node is considered its own descendent and, therefore, is listed one time as both ancestor and descendent.
Figure 108: Hierarchy Flattening Transform 2
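To make the vertically-flattened output concrete, assume a parent/child source such as the Employee table used later in this lesson, with REPORTSTO as the parent key and EMPLOYEEID as the child key. The output is essentially the ancestor/descendant closure of that table together with a depth count, which a recursive SQL query can reproduce. This is only an analogy for understanding the output shape, not what Data Services executes internally:

    WITH RECURSIVE flat (ANCESTOR, DESCENDENT, DEPTH) AS (
        -- every node is listed once as its own descendent at depth 0
        SELECT EMPLOYEEID, EMPLOYEEID, 0 FROM EMPLOYEE
        UNION ALL
        -- extend each known ancestor/descendant pair by one reporting level
        SELECT f.ANCESTOR, e.EMPLOYEEID, f.DEPTH + 1
        FROM flat f
        JOIN EMPLOYEE e ON e.REPORTSTO = f.DESCENDENT
    )
    SELECT ANCESTOR, DESCENDENT, DEPTH FROM flat;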
The next section gives a brief description of the function, data input requirements, options, and data output results for the Hierarchy Flattening transform.
Inputs/Outputs
Data input includes rows describing individual parent-child relationships. Each row must contain two columns that function as the keys of the parent and child in the relationship. The input can also include columns containing attributes describing the parent and/or child. The input data set cannot include rows with operations other than NORMAL, but can contain hierarchical data. For a listing of the target columns, consult the Data Services Reference Guide.
Options
The Hierarchy Flattening transform offers several options:
Figure 109: Hierarchy Flattening Transform Editor
Parent column
Identifies the column of the source data that contains the parent identifier in each parent-child relationship.
Child column
Identifies the column in the source data that contains the child identifier in each parent-child relationship.
Flattening type
Indicates how the hierarchical relationships are described in the output.
Use maximum length paths
Indicates whether longest or shortest paths are used to describe relationships between descendants and ancestors when the descendent has more than one parent.
Maximum depth
Indicates the maximum depth of the hierarchy.
Parent attribute list
Identifies a column or columns that are associated with the parent column.
Child attribute list
Identifies a column or columns that are associated with the child column.
Run as a separate process
Creates a separate subdata flow process for the Hierarchy Flattening transform when Data Services executes the data flow.
The Hierarchy Flattening transform can also deal with circular dependencies.
Exercise 25: Using the Hierarchy Flattening transform (Optional)
Exercise Objectives
After completing this exercise, you will be able to:
• Use the Hierarchy Flattening transform
Business Example
The employee table in the Alpha datastore contains employee data in a recursive hierarchy. To determine all reports, direct or indirect, to a given executive or manager would require complex SQL statements to traverse the hierarchy.
Task: Flatten the Employee table data to allow more effective reporting on data containing a recursive hierarchy for all reports, direct or indirect, to a given executive or manager. 1.
In the Omega project, create a new batch job called Alpha_Employees_Report_Job containing a data flow called Alpha_Employees_Report_DF.
2.
In the workspace for Alpha_Employees_Report_DF, add the Employee table from the Alpha datastore as the source object. Create a target template table Manager_Emps in the HR_datamart.
3.
Add a Hierarchy Flattening transform in the workspace to the right of the source table and connect the source table to the transform. In the transform editor, select the options:
Option → Value
Flattening Type → Vertical
Parent Column → REPORTSTO
Child Column → EMPLOYEEID
Child Attribute List → LASTNAME, FIRSTNAME, BIRTHDATE, HIREDATE, ADDRESS, CITYID, REGIONID, COUNTRYID, PHONE, EMAIL, DEPARTMENTID, LastUpdate, discharge_date
4.
Add a Query transform in the workspace to the right of the Hierarchy Flattening transform and connect the transforms.
5.
In the transform editor for the Query transform, create output columns.
6.
In the Query transform, map input columns to output columns. Create lookups for the output fields MANAGER_NAME and DEPARTMENT. Concatenate the employee's last name and first name separated by a comma for the output column EMPLOYEE_NAME. Add a WHERE clause to return only rows with a depth greater than zero.
7.
Execute the Alpha_Employees_Report_Job with the default execution properties.
Solution 25: Using the Hierarchy Flattening transform (Optional)
Task: Flatten the Employee table data to allow more effective reporting on data containing a recursive hierarchy for all reports, direct or indirect, to a given executive or manager.
1. In the Omega project, create a new batch job called Alpha_Employees_Report_Job containing a data flow called Alpha_Employees_Report_DF. a)
In the Project area, right-click the project name and choose New Batch Job from the menu.
b)
Enter the name of the job as Alpha_Employees_Report_Job.
c)
Press Enter to commit the change.
d)
Open the job Alpha_Employees_Report_Job by double-clicking it.
e)
Select the Data Flow icon in the Tool Palette.
f)
Select the workspace where you want to add the data flow.
g)
Enter Alpha_Employees_Report_DF as the name.
h)
Press Enter to commit the change.
i)
Double-click the data flow to open the data flow workspace.
2. In the workspace for Alpha_Employees_Report_DF, add the Employee table from the Alpha datastore as the source object. Create a target template table Manager_Emps in the HR_datamart. a)
In the Local Object Library, select the Datastores tab and then select the Employee table from the Alpha datastore.
b)
Select and drag the object to the data flow workspace and in the context menu, choose the option Make Source.
c)
In the Tool Palette, select the Template Table icon and select the workspace to add a new template table to the data flow.
d)
In the Create Template dialog box, enter Manager_Emps as the template table name.
e)
In the In datastore drop-down list, select the HR_datamart datastore as the template table destination target.
3.
Add a Hierarchy Flattening transform in the workspace to the right of the source table and connect the source table to the transform. In the transform editor, select the options: Options
Value
Flattening Type
Vertical
Parent Column
REPORTSTO
Child Column
EMPLOYEEID
Child Attribute List
LASTNAME FIRSTNAME BIRTHDATE HIREDATE ADDRESS CITYID REGIONID COUNTRYID PHONE EMAIL DEPARTMENTID LastUpdate discharge_date
a)
In the Local Object Library, select the Transforms tab. Then select and drag the Hierarchy Flattening transform to the data flow workspace to the right of the Query transforms.
b)
Connect the Hierarchy Flattening transform to the source table by selecting the source table and holding down the mouse button. Then drag the cursor to the Hierarchy Flattening transform and release the mouse button to create the connection. Double-click the Hierarchy Flattening transform to open the editor.
c)
In the transform editor for the Hierarchy Flattening transform, select the options by dragging columns from the input schema into the appropriate workspace. Options
Value
Flattening Type
Vertical (select with dropdown list)
Parent Column
REPORTSTO
Child Column
EMPLOYEEID
Continued on next page
412
© 2011 SAP AG. All rights reserved.
2011
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
BODS10
Lesson: Using the Hierarchy Flattening Transform (Optional)
Child Attribute List
d) 4.
5.
LASTNAME FIRSTNAME BIRTHDATE HIREDATE ADDRESS CITYID REGIONID COUNTRYID PHONE EMAIL DEPARTMENTID LastUpdate discharge_date
Select the Back icon to close the editor.
Add a Query transform in the workspace to the right of the Hierarchy Flattening transform and connect the transforms. a)
In the Tool Palette, select the Query transform icon and select the workspace to add a Query template to the data flow.
b)
Connect the Hierarchy Flattening transform to the Query transform by selecting the Hierarchy Flattening transform and holding down the mouse button, drag the cursor to the Query transform. Then release the mouse button to create the connection.
c)
Double-click the Query transform to open the transform editor.
In the transform editor for the Query transform, create output columns. a)
In the Schema Out workspace, right-click Query to choose the option New Output Item and enter the Item name MANAGERID with Data Type varchar(10).
b)
In the Schema Out workspace, right-click MANAGERID to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name MANAGERNAME with Data Type varchar(50).
c)
In the Schema Out workspace, right-click MANAGERNAME to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name EMPLOYEEID with Data Type varchar(10).
d)
In the Schema Out workspace, right-click EMPLOYEEID to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name EMPLOYEE_NAME with Data Type varchar(102). Continued on next page
2011
© 2011 SAP AG. All rights reserved.
413
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Unit 9: Using Data Integrator Platforms
BODS10
e)
In the Schema Out workspace, right-click EMPLOYEE_NAME to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name DEPARTMENT with Data Type varchar(50).
f)
In the Schema Out workspace, right-click DEPARTMENT to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name HIREDATE with Data Type datetime.
g)
In the Schema Out workspace, right-click HIREDATE to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name LASTUPDATE with Data Type datetime.
h)
In the Schema Out workspace, right-click LASTUPDATE to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name PHONE with Data Type varchar(20).
i)
In the Schema Out workspace, right-click PHONE to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name EMAIL with Data Type varchar(50).
j)
In the Schema Out workspace, right-click EMAIL to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name ADDRESS with Data Type varchar(200).
k)
In the Schema Out workspace, right-click ADDRESS to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name CITY with Data Type varchar(50).
l)
In the Schema Out workspace, right-click CITY to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name REGION with Data Type varchar(50).
m)
In the Schema Out workspace, right-click REGION to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name COUNTRY with Data Type varchar(50).
n)
In the Schema Out workspace, right-click COUNTRY to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name DISCHARGE_DATE with Data Type datetime.
o)
In the Schema Out workspace, right-click DISCHARGE_DATE to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name DEPTH with Data Type int.
Continued on next page
414
© 2011 SAP AG. All rights reserved.
2011
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
BODS10
Lesson: Using the Hierarchy Flattening Transform (Optional)
6.
p)
In the Schema Out workspace, right-click DEPTH to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name ROOT_FLAG with Data Type int.
q)
In the Schema Out workspace, right-click ROOT_FLAG to choose the option New Output Item. In the next dialog box, choose the option Below and enter the Item name LEAF_FLAG with Data Type int.
In the Query transform, map input columns to output columns. Create lookups for the output fields MANAGER_NAME and DEPARTMENT. Concatenate the employee's last name and first name separated by a comma for the output column EMPLOYEE_NAME. Add a WHERE clause to return only rows with a depth greater than zero. a)
Map input columns to output columns by dragging an output column and dropping it on the corresponding input column:
Schema In → Schema Out
ANCESTOR → MANAGERID
DESCENDANT → EMPLOYEEID
DEPTH → DEPTH
ROOT_FLAG → ROOT_FLAG
LEAF_FLAG → LEAF_FLAG
C_ADDRESS → ADDRESS
C_discharge_date → DISCHARGE_DATE
C_EMAIL → EMAIL
C_HIREDATE → HIREDATE
C_LastUpdate → LASTUPDATE
C_PHONE → PHONE
b) In the MAPPING tab of the MANAGER_NAME output field, select the Function button and in the Select Function dialog box, open the category of “Database Functions”. From the list of function names, select the lookup_ext function and select the Next button. In the Lookup_ext - Select Parameters dialog box, enter the parameters:
Field/Option → Value
Lookup table → ALPHA.SOURCE.EMPLOYEE
Condition: Column in lookup table → EMPLOYEEID
Condition: Op.(&) → =
Condition: Expression → HIERARCHY_FLATTENING.ANCESTOR
Output: Column in lookup table → LASTNAME
c)
In the MAPPING tab of the DEPARTMENT output field, select the Function button and in the Select Function dialog box, open the category of “Database Functions”. From the list of function names, select the lookup_ext function and select the Next button. In the Lookup_ext - Select Parameters dialog box, enter the parameters:
Field/Option → Value
Lookup table → ALPHA.SOURCE.DEPARTMENT
Condition: Column in lookup table → DEPARTMENTID
Condition: Op.(&) → =
Condition: Expression → HIERARCHY_FLATTENING.C_DEPARTMENTID
Output: Column in lookup table → DEPARTMENTNAME
d)
Select the output column EMPLOYEE_NAME and select the MAPPING tab and enter the expression: Hierarchy_Flattening.C_LASTNAME || ',' || Hierarchy_Flattening.C_FIRSTNAME
e)
In the Query transform, select the WHERE tab and enter the expression: Hierarchy_Flattening.DEPTH > 0
f)
Select the Back icon to close the editor.
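For readers who prefer SQL, the two lookup_ext calls above behave roughly like outer-join lookups against the Alpha tables: the flattened ANCESTOR value is matched against the employee key to fetch the manager's last name, and the descendant's department ID is matched to fetch the department name. The sketch below is only an analogy (flattened_hierarchy is a hypothetical table standing in for the Hierarchy_Flattening output); in the job itself this work is done by lookup_ext inside the Query transform:

    SELECT f.*,
           mgr.LASTNAME        AS MANAGER_NAME,
           dept.DEPARTMENTNAME AS DEPARTMENT
    FROM flattened_hierarchy f
    LEFT JOIN SOURCE.EMPLOYEE   mgr  ON mgr.EMPLOYEEID    = f.ANCESTOR
    LEFT JOIN SOURCE.DEPARTMENT dept ON dept.DEPARTMENTID = f.C_DEPARTMENTID;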
7.
Execute the Alpha_Employees_Report_Job with the default execution properties. a)
In the Omega project area, right-click on the Alpha_Employees_Report_Job and select the option Execute.
b)
Data Services prompts you to save any objects that have not been saved. Select OK.
c)
The Execution Properties dialog box appears and select OK.
d)
Return to the data flow workspace and view the data in the target table by selecting the Magnifying Glass icon on the target table. Note: 179 rows are written to the target table.
Lesson Summary
You should now be able to:
• Use the Hierarchy Flattening transform
Related Information
• For more information on the Hierarchy Flattening transform, see “Transforms”, Chapter 5 in the Data Services Reference Guide.
Unit Summary
You should now be able to:
• Use the Data Integrator transforms
• Use the Pivot transform
• Describe performance optimization
• Use the Data Transfer transform
• View SQL generated by a data flow
• Use the XML Pipeline transform
• Use the Hierarchy Flattening transform
Unit 10: Using Text Data Processing
Unit Overview
In this Information Technology age, we are all familiar with the massive explosion of digital data that we have seen in the last decades. In 2003, there were 5 exabytes of data, twice the amount from three years earlier (UC Berkeley). Digital information created, captured and replicated worldwide has grown tenfold in five years (IDC 2008). 95% of digital data is unstructured (IDC 2007). The Entity Extraction transform is a new feature of Data Services for bringing text data onto the platform and preparing it for query, analytics, and reporting; it is the native integration of the text analytics technology acquired in 2007.
Unit Objectives
After completing this unit, you will be able to:
• Use the Entity Extraction transform
Unit Contents
Lesson: Using the Entity Extraction Transform
Exercise 26: Using the Entity Extraction Transform
Lesson: Using the Entity Extraction Transform
Lesson Overview
Text Data Processing can parse digital unstructured data to extract meaning from it and transform it into structured data that can be integrated into a database. Once in the database, other Business Intelligence tools can be used to support query, analysis, and reporting on that text data.
Lesson Objectives
After completing this lesson, you will be able to:
• Use the Entity Extraction transform
Business Example
Your company wants to know what is being published about it. Along with the general explosion of digital data in the Information Technology age, many individual pieces of digital data are being published about your company. To gain business insight, and to derive increased governance and productivity from this unstructured data, you want to examine, parse, and store it, and prepare strategic reports for your company. You must create a new batch job using the new Entity Extraction transform in Data Services and apply it to this data.
In this Information Technology age, we are all familiar with the massive explosion of digital data that we have seen in the last decades. Some facts about this explosion:
• In 2003, there were 5 exabytes of data, twice the amount from three years earlier (UC Berkeley)
• Digital information created, captured and replicated worldwide has grown tenfold in five years (IDC 2008)
• 95% of digital data is unstructured (IDC 2007)
A large percentage of the digital data is unstructured, and IDC estimates that 95% of digital data is unstructured. Additional estimates:
• 70-95 percent of all data stored is in unstructured format (Butler Group)
• 80 percent of business is conducted on unstructured information (Gartner Group)
• Unstructured data doubles every three months (Gartner Group)
• 7 million Web pages are added every day (Gartner Group)
The emergence of open source platforms for unstructured content also point to the general recognition that unstructured data is becoming more prominent.
This new feature of Data Services brings text data onto the platform and prepares it for query, analytics, and reporting. It is the native integration of the text analytics technology acquired in 2007. Benefits include:
• •
Enhanced Business Insights: text data processing extends Enterprise Information Management and Business Intelligence initiatives to unstructured text content, for improved quality of analysis and reporting, and competitive advantage. Improved Governance: text data processing can be deployed as part of an initiative for improved transparency and oversight by monitoring. Increased Productivity: text data processing automates tedious manual tasks leading to improved efficiency and cost reduction.
This slide shows a conceptual overview of how text data processing works. Content derived from sources such as notes fields, file systems, spreadsheets, or other repositories. The current release supports text documents that are html, txt, or XML. Text data processing then parses the text to extract meaning from it and transform it into structured data that can be integrated into a database. Once in the database, our BI tools can be used to support query, analysis, and reporting on that text data
Figure 110: Text Data Processing in Data Services
2011
© 2011 SAP AG. All rights reserved.
423
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
Unit 10: Using Text Data Processing
BODS10
In this example, we see highlighted some of the entities and facts that Text Data Processing can automatically identify, and output as structured information. For example, we have a person name, dates, titles, organization names, and concepts. We also see larger matches, also known as facts or relations, underlined here. For instance, we have an executive job change and a merger and acquisition fact.
Figure 111: Text Data Processing Capabilities
These extractions become metadata or tags about the content that is stored to a repository and tells us the meaning of the content, without manual processing.
424
© 2011 SAP AG. All rights reserved.
2011
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
For Any SAP / IBM / Oracle - Materials Purchase Visit : www.sapcertified.com OR Contact Via Email Directly At :
[email protected]
BODS10
Lesson: Using the Entity Extraction Transform
Figure 112: What is new in Text Data Processing
With the native text data processing capability of Data Services:
• The Entity Extraction transform supports a set of predefined entity types, as well as customer entities, sentiments, and other packaged rules.
• Entity extraction can be customized using a dictionary.
• Because Text Data Processing is part of Data Services, the administration happens via the Data Services Designer, so we have unified installation, administration, and output.
For customers interested in text analytics, this native integration provides a number of benefits. One major area is server features such as security and batch processing, heterogeneous database support, and connectivity to many different sources, including SAP systems. Another is the ease of creating combined workflows with Data Integrator and Data Quality Services.
Figure 113: Key Concepts Defined
This slide shows the architecture for text data processing.
Figure 114: Text Data Processing Architecture
1. The Extraction, Transformation, and Loading (ETL) designer sets up Text Data Processing jobs using the Data Services Designer.
2. Data Services accesses text content from sources such as surveys, notes fields in databases, or text files directly. Connectors to e-mails or internet sources can also be built. As long as the content is in HTML, XML, or TXT format, it can be processed. Conversion from binary formats such as Word and PDF is planned for a future release.
3. Optionally, the results of Text Data Processing can be passed to Data Quality Services for cleansing and normalization before being stored in a repository. From there, the results can be consumed either directly by an application or dashboard or via the BI semantic layer.
This slide shows the predefined entity types that are extracted, or matched, in various languages.
Figure 115: Supported Entity Types
There are two methods to customize entity extraction:
1. A dictionary is a list of known entity names and their variants. This is the recommended method for making entity extraction more relevant to a specific business domain.
a) Standard form and variants are matched and normalized
b) The source can be an XLS spreadsheet, XML file, or table
c) The package includes an XSD with the correct dictionary format
d) Supported for all languages
e) A dictionary supports multiple languages
2. A rule is a pattern written using a proprietary language in a compiled format. This is an advanced feature, which should be used by specialized consultants or trained partners only.
a) Pattern matching language based on regular expressions, enhanced with natural language operators
b) Command line compilation
c) Rule customization supported in all languages
d) Packaged rule sets for Voice of Customer (sentiment, request), enterprise events, and public security events
e) Engage consulting or a partner for additional customization

Note: In this course we consider only dictionary-based entity extraction.
Dictionaries can be used to customize extraction for a customer or domain. A dictionary is a list of entities that should always be extracted if one of their forms appears in the input. A dictionary is represented in an XML format and compiled into a binary representation for runtime processing. Dictionaries can be used for name variation management, disambiguation of unknown entities, and control over entity recognition.
Things are often referred to by multiple names. Humans naturally associate these names and use them interchangeably. You can help the extraction process understand these variations by specifying them in a dictionary. Doing so improves the usefulness and accuracy of the results, because knowing that one entity refers to the same thing as another helps with analysis. Take, for example, the occurrence of:
• IBM, International Business Machines, and Big Blue are all names for the same company.
• Pick a standard form, such as IBM, and make the other names variants of this form.
• Post-extraction, all of the entities that have the same standard form – IBM – can then be grouped together, even if the input text used another form.
While each supported language provides built-in system dictionaries that identify the types of some extractions, such as SAP being a company, sometimes an entity is extracted as a PROP_MISC (proper miscellaneous name). This indicates that the extraction process knows the entity is meaningful, but does not know to what type it belongs. You can improve the accuracy of your results by disambiguating the type of these entities in a dictionary.
• Example: Processing the text The World Cup was hosted by South Africa in 2010 would not identify World Cup as a sporting event by default.
• Adding World Cup, and any variations, to a dictionary as type SPORTING_EVENT would resolve this during extraction.
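As an illustration only, a dictionary source covering both examples above might look roughly like the following XML sketch. The element and attribute names (dictionary, entity_category, entity_name, standard_form, variant) and the category names are an assumption based on the dictionary XSD delivered with the product; always check the packaged XSD for the exact format before compiling a dictionary.

<dictionary>
   <entity_category name="ORGANIZATION/COMMERCIAL">
      <entity_name standard_form="IBM">
         <variant name="International Business Machines"/>
         <variant name="Big Blue"/>
      </entity_name>
   </entity_category>
   <entity_category name="SPORTING_EVENT">
      <entity_name standard_form="World Cup">
         <variant name="FIFA World Cup"/>
      </entity_name>
   </entity_category>
</dictionary>

With a dictionary like this compiled, occurrences of International Business Machines or Big Blue in the input would be extracted with the standard form IBM, and World Cup would be typed as SPORTING_EVENT instead of PROP_MISC.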
The Entity Extraction transform is located within the newly added Text Data Processing category on the Transforms tab of the Local Object Library in the Designer. It includes a single configuration that can be used for extraction.
Figure 116: Text Data Processing: The Entity Extraction Transform
The Text Data Processing Entity Extraction transform supports processing text, HTML, or XML content as a varchar, LONG, or BLOB data type. This content can come from multiple sources such as a flat file, a column of an Excel spreadsheet, or a database table column.
Figure 117: Entity Extraction: Input Field
The Entity Extraction transform has a single mandatory field, Language. Always select the language of the content being processed before validating the transform or executing a job containing it. The Entity Extraction transform can process text in these languages: English, French, German, Japanese, Simplified Chinese, and Spanish. However, if words or sections of the content are in a language different from the one selected in the transform, unexpected or incorrect results may be output.
Figure 118: Entity Extraction: Options
Reduce noise from too many extractions by specifying an entity type filter for each language. Only entities of the selected types are output, if they exist in the processed content; entities of types that are not selected are discarded.
Figure 119: Options: Filtering Output by Entity Types in Selected Languages
Advanced Parsing enriches noun phrase extraction with pronouns, numbers, and determiners that can be used when writing custom extraction rules. It is supported only for the English language. Enabling advanced parsing does not enable coreference resolution – the ability to relate pronouns to named entities.
Figure 120: Options: Enabling Advanced Parsing
The Text Data Processing Entity Extraction transform outputs entities and facts in a flat structure for easy consumption by other transforms. However, there are inherent relationships between output rows. For instance, the entity Peter Rezle/PERSON can be broken down into Peter/PERSON_GIV and Rezle/PERSON_FAM subentities – each output as a different row. The ID and PARENT_ID columns maintain any relationship between the rows output for a piece of text. The STANDARD_FORM column value represents the longest, most precise or official name associated with the value of the corresponding TYPE column – Peter Rezle in the previous example versus Pete Rezle mentioned elsewhere in the content. The CONVERTED_TEXT column value represents the possibly transcoded input text. This can be used to refer to the location of an extraction using the character OFFSET and LENGTH column values for any entity or fact.
Figure 121: Entity Extraction: Output Fields
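To make the flat output structure concrete, the rows below show how the person entity and its subentities from the example above might appear in the output fields ID, PARENT_ID, STANDARD_FORM, TYPE, and SOURCE_FORM (the values are purely illustrative):

ID   PARENT_ID   STANDARD_FORM   TYPE         SOURCE_FORM
1    (none)      Peter Rezle     PERSON       Pete Rezle
2    1           Peter           PERSON_GIV   Pete
3    1           Rezle           PERSON_FAM   Rezle

The PARENT_ID of the given-name and family-name rows points back to the ID of the full PERSON row, which is how the relationship between rows is preserved in the flat structure.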
Exercise 26: Using the Entity Extraction Transform

Exercise Objectives
After completing this exercise, you will be able to:
• Use the Entity Extraction transform
Business Example
Your company wants to understand what is being published about it. With the general explosion of digital data in the information technology age, many individual pieces of digital data are being published about your company. To gain business insight, and to derive increased governance and productivity from this unstructured data, you want to examine, parse, and store it, and to prepare strategic reports for your company. You must create a new batch job using the new Entity Extraction transform in Data Services and apply it to this data.
Task: You need to process a set of Enron emails to identify the different entities contained in them, filtering the output to include only PERSON and ORGANIZATION/COMMERCIAL entities.
1. In the Omega project, create a new batch job called Alpha_TDP_Job containing a data flow called Alpha_TDP_DF.
2. Create an unstructured text file format to extract Enron emails from a directory and place the format in the data flow workspace.
3. Configure an Entity Extraction transform to process the Enron emails.
4. Create a delimited file format TDP_Enron_output to load the extraction output.
5. Execute the job Alpha_TDP_Job, inspect the output, and configure an entity type filter.
6. Execute the job Alpha_TDP_Job again and inspect the filtered output.
Solution 26: Using the Entity Extraction Transform

Task: You need to process a set of Enron emails to identify the different entities contained in them, filtering the output to include only PERSON and ORGANIZATION/COMMERCIAL entities.

1. In the Omega project, create a new batch job called Alpha_TDP_Job containing a data flow called Alpha_TDP_DF.
a) In the Project area, right-click the project name and choose New Batch Job from the menu.
b) Enter the name of the job as Alpha_TDP_Job.
c) Press Enter to commit the change.
d) Open the job Alpha_TDP_Job by double-clicking it.
e) Select the Data Flow icon in the Tool Palette.
f) Select the workspace where you want to add the data flow.
g) Enter Alpha_TDP_DF as the name.
h) Press Enter to commit the change.
i) Double-click the data flow to open the data flow workspace.
2. Create an unstructured text file format to extract Enron emails from a directory and place the format in the data flow workspace.
a) In the Local Object Library, select the Flat Files tab. Right-click Flat Files and choose the option New.
b) Select unstructured text from the Type option value drop-down list and select Yes to overwrite the schema.
c) Specify the Name value as TDP_Enron_emails.
d) Select the folder icon next to the Root directory option and browse to N:\My Documents → BODS10 → Activity_Source → Enron_emails.
Note: All files will be retrieved, as the File name(s) filter specifies *.* by default. You can change the filter value to limit retrieval to a particular type of file, such as *.txt or *.html, or even specify the name of a single file.
e) Select Save & Close.
f) Double-click the data flow Alpha_TDP_DF to open the workspace.
g) Drag the TDP_Enron_emails unstructured text file format into the data flow workspace.
3. Configure an Entity Extraction transform to process the Enron emails.
a) Select the Transforms tab in the Local Object Library, expand the Text Data Processing category, and expand the Entity_Extraction transform. Drag the Base_EntityExtraction transform configuration into the data flow workspace.
b) Connect the TDP_Enron_emails file format to the Base_EntityExtraction transform in the data flow.
c) Double-click the Base_EntityExtraction transform in the data flow to open its editor.
d) Select the Input tab and drag the Data column from the Schema In pane onto the TEXT column.
e) Select the Options tab and select English as the Language option value.
f) Select the Output tab and check these field names to map them to the Schema Out pane: ID, PARENT_ID, STANDARD_FORM, TYPE, and SOURCE_FORM.
g) Drag the FileName column from the Schema In pane onto the Schema Out pane.
4. Create a delimited file format TDP_Enron_output to load the extraction output.
a) Right-click the Base_EntityExtraction label in the Schema Out pane and select Create File Format.
b) Specify the name as TDP_Enron_output.
c) Enter N:\My Documents\BODS10\Activity_Source\Enron_emails in the Root directory option value.
d) Enter enron_emails_out.txt in the File name(s) option value.
e) Select Save & Close.
f) Select Back to close the Base_EntityExtraction transform editor.
g) Drag the TDP_Enron_output file format into the Alpha_TDP_DF data flow workspace and select Make Target.
h) Connect the Base_EntityExtraction transform to the TDP_Enron_output file format in the data flow by selecting the Entity Extraction transform and, while holding down the mouse button, dragging to the target file format. Release the mouse button to create the link.
5. Execute the job Alpha_TDP_Job, inspect the output, and configure an entity type filter.
a) In the Omega project, right-click the job Alpha_TDP_Job and select the option Execute.
b) Data Services prompts you to save any objects that have not been saved. Select OK.
c) The Execution Properties dialog box appears. Select OK.
d) Once the job completes, close and then reopen the Alpha_TDP_DF data flow.
e) Right-click the TDP_Enron_output file format in the data flow and select View Data to inspect the extracted entities.
f) Close the data view and double-click the Base_EntityExtraction transform instance in the data flow to open its editor.
g) Select the Options tab, select the Filter by Entity Types option within the Languages option group, and select the ellipses (...) button.
h) In the Available values pane, select the ORGANIZATION/COMMERCIAL and PERSON entity types as filter values. Then select the Add button to move them to the Selected values pane.
i) Select Back to close the editor.
6. Execute the job Alpha_TDP_Job again and inspect the filtered output.
a) In the Omega project, right-click the job Alpha_TDP_Job and select the option Execute.
b) Data Services prompts you to save any objects that have not been saved. Select OK.
c) The Execution Properties dialog box appears. Select OK.
d) Once the job completes, close and then reopen the Alpha_TDP_DF data flow.
e) Right-click the TDP_Enron_output file format in the data flow and select View Data to inspect the extracted ORGANIZATION/COMMERCIAL and PERSON entities.
Lesson Summary
You should now be able to:
• Use the Entity Extraction transform
Unit Summary
You should now be able to:
• Use the Entity Extraction transform
Course Summary
You should now be able to:
• Integrate disparate data sources
• Create, execute, and troubleshoot batch jobs
• Use functions, scripts, and transforms to modify data structures and format data
• Handle errors in the extraction and transformation process
• Capture changes in data from data sources using different techniques
Feedback
SAP AG has made every effort in the preparation of this course to ensure the accuracy and completeness of the materials. If you have any corrections or suggestions for improvement, please record them in the appropriate place in the course evaluation.