OLAP
Short Description
Download OLAP...
Description
ONLINE ANALYTICAL PROCESSING A SEMINAR REPORT Submitted by
TARA JOHN in partial fulfillment for the award of the degree of
BACHELOR OF TECHNOLOGY in COMPUTER SCIENCE AND ENGINEERING SCHOOL OF ENGINEERING
COCHIN UNIVERSITY OF SCIENCE &TECHNOLOGY, KOCHI-682022
AUGUST 2008
DIVISION OF COMPUTER ENGINEERING SCHOOL OF ENGINEERING COCHIN UNIVERSITY OF SCIENCE AND TECHNOLOGY
KOCHI-682022
Certificate Certified that this is a bonafide record of the seminar entitled
“ ONLINE ANALYTICAL PROCESSING “ done by the following student TARA JOHN of the VIIth semester,Computer Science and Engineering in the year 2008 in partial fulfillment of the requirements to the award of Degree of Bachelor of Technology in Computer Science and Engineering of Cochin University of Science and Technology.
Mr.DAMODARAN
Dr David Peter S
Seminar Guide
Head of the Department
Date:
ACKNOWLEDGEMENT At the outset, I thank the Lord Almighty for the grace, strength and hope to make my endeavor a success. I also express my gratitude to Dr. David Peter, Head of the Department and my Seminar Guide for providing me with adequate facilities, ways and means by which I was able to complete this seminar. I express my sincere gratitude to him for his constant support and valuable suggestions without which the successful completion of this seminar would not have been possible. I thank Mr.Damodaran, my seminar guide for her boundless cooperation and helps extended for this seminar. I express my immense pleasure and thankfulness to all the teachers and staff of the Department of Computer Science and Engineering, CUSAT for their cooperation and support. Last but not the least, I thank all others, and especially my classmates and my family members who in one way or another helped me in the successful completion of this work. TARA JOHN
ABSTRACT OLAP
refers to a type of application that allows a user to
interactively analyze data. An OLAP system is often contrasted to an OLTP (On-Line Transaction Processing) system that focuses on processing transaction such as orders, invoices or general ledger transactions.It describes a class of applications that require multidimensional analysis of business data. OLAP systems enable managers and analysts to rapidly and easily examine key performance data and perform powerful comparison and trend analyses, even on very large data volumes. They can be used in a wide variety of business areas, including sales and marketing analysis, financial reporting, quality tracking, profitability analysis, manpower and pricing applications, and many others.OLAP technology is being used in an increasingly wide range of applications. The most common are sales and marketing analysis; financial reporting and consolidation; and budgeting and planning; quality analysis, in fact for any management system that requires a flexible, top down view of an organization.Online Analytical Processing (OLAP) is also a method of analyzing data in a multidimensional format, often across multiple time periods, with the aim of uncovering the business information concealed within the data'- OLAP enables business users to gain an insight into the business through interactive analysis of different views of
the business data that have been built up from the operational systems. This approach facilitates a more intuitive and meaningful analysis of business information and assists in identifying important business trends.
TABLE OF CONTENTS CHAPTER NO
TITLE ABSTRACT
PAGE NO iv
LIST OF FIGURES
1.
INTRODUCTION
1
1.1 Olap Cube
2
1.2 Olap Operations
3
1.3 History
4
2.
OLAP MILESTONES
9
3.
DIFFERENCES BETWEEN OLTP
14
AND OLAP 4.
DATAWAREHOUSE
15
4.1 Datawarehouse architecture
16
ETL TOOLS
20
5.1 ETL Concepts
21
6.
CLASSIFICATIONS OF OLAP
22
7.
APPLICATIONS OF OLAP
25
8.
FUTURE GAZING:
34
5.
Possibility of convergence of OLAP and OLTP REFERENCES
35
LIST OF FIGURES
NO
NAME
PAGE NO
1.
OLAP CUBE
2
2.
OLAP MILESTONES
9
3.
STAR SCHEMA
17
4.
SNOWFLAKE SCHEMA
18
5.
ETL
22
6.
EIS
31
Online Analytical Processing
1. INTRODUCTION On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user. OLAP functionality is characterized by dynamic multi-dimensional analysis of consolidated enterprise data supporting end user analytical and navigational activities including
trend analysis over sequential time periods
slicing subsets for on-screen viewing
drill-down to deeper levels of consolidation
reach-through to underlying detail data
rotation to new dimensional comparisons in the viewing area
OLAP is implemented in a multi-user client/server mode and offers
consistently rapid response to queries, regardless of database size and complexity .
OLAP helps the user synthesize enterprise information through
comparative, personalized viewing, as well as through analysis of historical and projected data in various "what-if" data model scenarios. This is achieved through use of an OLAP Server An OLAP server is a high-capacity, multi-user data manipulation engine specifically designed to support and operate on multi-dimensional data structures. A multi-dimensional structure is arranged so that every data item is located and accessed based on the intersection of the dimension members which define that item. The OLAP term dates back to 1993, but the ideas, technology and even some of the products have origins long before then.
Division Of Computer Engineering, SOE, CUSAT
1
Online Analytical Processing
1.1OLAP CUBE An OLAP cube as shown in fig(1.1.1) is a data structure that allows fast analysis of data. The arrangement of data into cubes overcomes a limitation of relational databases. Relational databases are not well suited for near instantaneous analysis and display of large amounts of data. Instead, they are better suited for creating records from a series of transactions known as OLTP or On-Line Transaction Processing. Although many report-writing tools exist for relational databases, these are slow when the whole database must be summarized.
Fig(1.1.1) Olap Cube
The OLAP cube consists of numeric facts called measures which are categorized by dimensions. The cube metadata is typically created from a star schema or snowflake schema of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables.
Division Of Computer Engineering, SOE, CUSAT
2
Online Analytical Processing
1.2 OLAP Operations The analyst can understand the meaning contained in the databases using multidimensional analysis. By aligning the data content with the analyst's mental model, the chances of confusion and erroneous interpretations are reduced. The analyst can navigate through the database and screen for a particular subset of the data, changing the data's orientations and defining analytical calculations.The user-initiated process of navigating by calling for page displays interactively, through the specification of slices via rotations and drill down/up is sometimes called "slice and dice". Common operations include slice and dice, drill down, roll up, and pivot. Slice: A slice is a subset of a multi-dimensional array corresponding to a single value for one or more members of the dimensions not in the subset. Dice: The dice operation is a slice on more than two dimensions of a data cube (or more than two consecutive slices). Drill Down/Up: Drilling down or up is a specific analytical technique whereby the user navigates among levels of data ranging from the most summarized (up) to the most detailed (down). Roll-up: A roll-up involves computing all of the data relationships for one or more dimensions. To do this, a computational relationship or formula might be defined. Pivot: To change the dimensional orientation of a report or page display.
Division Of Computer Engineering, SOE, CUSAT
3
Online Analytical Processing
1.3 History Online Analytical Processing, or OLAP is an approach to quickly provide answers to analytical queries that are multi-dimensional in nature. OLAP is part of the broader category business intelligence, which also encompasses relational reporting and data mining. The typical applications of OLAP are in business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas. The term OLAP was created as a slight modification of the traditional database term OLTP (Online Transaction Processing). Databases configured for OLAP employ a multidimensional data model, allowing for complex analytical and ad-hoc queries with a rapid execution time. They borrow aspects of navigational databases and hierarchical databases that are speedier than their relational kin. Nigel Pendse has suggested that an alternative and perhaps more descriptive term to describe the concept of OLAP is Fast Analysis of Shared Multidimensional Information (FASMI). The output of an OLAP query is typically displayed in a matrix (or pivot) format. The dimensions form the row and column of the matrix; the measures, the values. The first product that performed OLAP queries was Express, which was released in 1970 (and acquired by Oracle in 1995 from Information Resources). However, the term did not appear until 1993 when it was coined by Ted Codd, who has been described as "the father of the relational database". Codd's paper resulted from a short consulting assignment which Codd undertook for former Arbor Software (later Hyperion Solutions, and in 2007 acquired by Oracle), as a sort of marketing coup. The company had released its own OLAP product, Essbase, a year earlier. As a result Codd's "twelve laws of online analytical processing" were explicit in their reference to Essbase. There was some ensuing controversy and when Computerworld learned that Codd was paid by Arbor, it retracted the article. OLAP market experienced strong growth in late 90s with dozens of commercial products going into market. In 1998, Microsoft released its first OLAP Division Of Computer Engineering, SOE, CUSAT
4
Online Analytical Processing Server - Microsoft Analysis Services, which drove wide adoption of OLAP technology and moved it into mainstream.
APL Multidimensional analysis, the basis for OLAP, is not new. In fact, it goes back to 1962, with the publication of Ken Iverson’s book, A Programming Language. The first computer implementation of the APL language was in the late 1960s, by IBM. APL is a mathematically defined language with multidimensional variables and elegant, if rather abstract, processing operators. It was originally intended more as a way of defining multidimensional transformations than as a practical programming language, so it did not pay attention to mundane concepts like files and printers. In the interests of a succinct notation, the operators were Greek symbols. In fact, the resulting programs were so succinct that few could predict what an APL program would do. It became known as a ‘Write Only Language’ (WOL), because it was easier to rewrite a program that needed maintenance than to fix it. In spite of inauspicious beginnings, APL did not go away. It was used in many 1970s and 1980s business applications that had similar functions to today’s OLAP systems. Indeed, IBM developed an entire mainframe operating system for APL, called VSPC, and some people regarded it as the personal productivity environment of choice long before the spreadsheet made an appearance. One of these APL-based mainframe products from the 1980s was originally called Frango, and later Fregi. It was developed by IBM in the UK, and was used for interactive, top-down planning. A PC-based descendant of Frango surfaced in the early 1990s as KPS, and the product remains on sale today as the Analyst module in Cognos Planning. This is one of several APL-based products that Cognos has built or acquired since 1999. Ironically, this APL-based module has returned home, now that IBM owns Cognos. Even today, more than 40 years later, APL continues to be enhanced and used in new applications. It is used behind the scenes in many business applications, and has
Division Of Computer Engineering, SOE, CUSAT
5
Online Analytical Processing even entered the worlds of unicode, object-oriented programming and Vista. Few, if any, other 1960s computer languages have shown such longevity.
Express By 1970, a more application-oriented multidimensional product, with academic origins, had made its first appearance: Express. This, in a completely rewritten form and with a modern code-base, became a widely used contemporary OLAP offering, but the original 1970’s concepts still lie just below the surface. Even after 30 years, Express remains one of the major OLAP technologies, although Oracle struggled and failed to keep it up-to-date with the many newer client/server products. Oracle announced in late 2000 that it would build OLAP server capabilities into Oracle9i starting in mid 2001. The second release of the Oracle9i OLAP Option included both a version of the Express engine, called the Analytic Workspace, and a new ROLAP engine. In fact, delays with the new OLAP technology and applications means that Oracle is still selling Express and applications based on it in 2005, some 35 years after Express was first released. This means that the Express engine has lived on well into its fourth decade, and if the OLAP Option eventually becomes successful (particularly in its new guise as a super-charged materialized views engine), the renamed Express engine could be on sale into its fifth decade, making it possibly one of the longest-lived software products ever.
EIS By the mid 1980s, the term EIS (Executive Information System) had been born. The idea was to provide relevant and timely management information using a new, much simpler user interface than had previously been available. This used what was then a revolutionary concept, a graphical user interface running on DOS PCs, and using touch screens or mice. For executive use, the PCs were often hidden away in custom cabinets and desks, as few senior executives of the day wanted to be seen as nerdy PC users. The first explicit EIS product was Pilot’s Command Center though there had been EIS applications implemented by IRI and Comshare earlier in the decade. This was a
Division Of Computer Engineering, SOE, CUSAT
6
Online Analytical Processing cooperative processing product, an architecture that would now be called client/server. Because of the limited power of mid 1980s PCs, it was very server-centric, but that approach came back into fashion again with products like EUREKA:Strategy and Holos and the Web.
Spreadsheets and OLAP By the late 1980s, the spreadsheet was already becoming dominant in end-user analysis, so the first multidimensional spreadsheet appeared in the form of Compete. This was originally marketed as a very expensive specialist tool, but the vendor could not generate the volumes to stay in business, and Computer Associates acquired it, along with a number of other spreadsheet products including SuperCalc and 20/20. The main effect of CA’s acquisition of Compete was that the price was slashed, the copy protection removed and the product was heavily promoted. However, it was still not a success, a trend that was to be repeated with CA’s other OLAP acquisitions. For a few years, the old Compete was still occasionally found, bundled into a heavily discounted bargain pack. Later, Compete formed the basis for CA’s version 5 of SuperCalc, but the multidimensionality aspect of it was not promoted. By the late 1980s, Sinper had entered the multidimensional spreadsheet world, originally with a proprietary DOS spreadsheet, and then by linking to DOS 1-2-3. It entered the Windows era by turning its (then named) TM/1 product into a multidimensional back-end server for standard Excel and 1-2-3. Slightly later, Arbor did the same thing, although its new Essbase product could then only work in client/server mode, whereas Sinper’s could also work on a stand-alone PC. This approach to bringing multidimensionality to spreadsheet users has been far more popular with users. So much so, in fact, that traditional vendors of proprietary front-ends have been forced to follow suit, and products like Express, Holos, Gentia, MineShare, PowerPlay, MetaCube and WhiteLight all proudly offered highly integrated spreadsheet access to their OLAP servers. Ironically, for its first six months, Microsoft OLAP Services was one of the few OLAP servers not to have a vendor-developed spreadsheet client, as Microsoft’s (very basic) offering only appeared in June 1999 in Excel 2000. However, the (then)
Division Of Computer Engineering, SOE, CUSAT
7
Online Analytical Processing OLAP@Work. Excel add-in filled the gap, and still (under its new snappy name, BusinessQuery MD for Excel) provided much better exploitation of the server than did Microsoft’s own Excel interface. Since then there have been at least ten other third party Excel add-ins developed for Microsoft Analysis Services, all offering capabilities not available even in Excel 2003. However, Business Objects’ acquisition of Crystal Decisions has led to the phasing out of BusinessQuery MD for Excel, to be replaced by technology from Crystal. There was a rush of new OLAP Excel add-ins in 2004 from Business Objects, Cognos, Microsoft, MicroStrategy and Oracle. Perhaps with users disillusioned by disappointing Web capabilities, the vendors rediscovered that many numerate users would rather have their BI data displayed via a flexible Excel-based interface rather than in a dumb Web page or PDF. Microsoft is taking this further with PerformancePoint, whose main user interface for data entry and reporting is via Excel.
Division Of Computer Engineering, SOE, CUSAT
8
Online Analytical Processing
2.OLAP MILESTONES OLAP milestones Year Event
Comment
1962 Publication of A Programming Language by Ken Iverson
First multidimensional language; used Greek symbols for operators. Became available on IBM mainframes in the late 1960s and still used to a limited extent today. APL would not count as a modern OLAP tool, but many of its ideas live on in today’s altogether less elitist products, and some applications (eg Cognos Planning Analyst and Cognos Consolidation, the former Lex 2000) still use APL internally.
1970 Express available on timesharing (followed by inhouse versions later in the 1970s)
First multidimensional tool aimed at marketing applications; now owned by Oracle, and still one of the market leaders (after several rewrites and two changes of ownership). Although the code is much changed, the concepts and the data model are not. The modern version of this engine is now shipping as the MOLAP engine in Oracle9i Release 2 OLAP Option.
1982 Comshare System W available on timesharing (and in-house mainframes the following year)
First OLAP tool aimed at financial applications. No longer marketed, but still in limited use on IBM mainframes; its Windows descendent is marketed as the planning component of Comshare MPC. The later Essbase product used many of the same concepts, and like System W, suffers from database explosion.
1984 Metaphor launched
First ROLAP. Sales of this Mac cousin were disappointing, partly because of proprietary hardware and high prices (the start-up cost for an eight-workstation system, including 72Mb file server, database server and software was $64,000). But, like Mac users, Metaphor users remained fiercely loyal.
1985 Pilot Command Center launched
First client/server EIS style OLAP; used a time-series approach running on VAX servers and standard PCs as clients.
1990 Cognos PowerPlay launched
This became both the first desktop and the first Windows OLAP and now leads the “desktop” sector. Though we still class this as a desktop OLAP on functional grounds, most customers now implement the much more scalable client/server and Web versions.
1991 IBM acquired Metaphor
The first of many OLAP products to change hands. Metaphor became part of the doomed Apple-IBM Taligent joint venture and was renamed IDS, but there are unlikely to be any
Division Of Computer Engineering, SOE, CUSAT
9
Online Analytical Processing remaining sites. 1992 Essbase launched
First well-marketed OLAP product, which went on to become the market leading OLAP server by 1997.
1993 Codd white paper This white paper, commissioned by Arbor Software, brought coined the OLAP multidimensional analysis to the attention of many more term people than ever before. However, the Codd OLAP rules were soon forgotten (unlike his influential and respected relational rules). 1994 MicroStrategy DSS Agent launched
First ROLAP to do without a multidimensional engine, with almost all processing being performed by multi-pass SQL — an appropriate approach for very large databases, or those with very large dimensions, but suffers from a severe performance penalty. The modern MicroStrategy 7i has a more conventional three-tier hybrid OLAP architecture.
1995 Holos 4.0 released First hybrid OLAP, allowing a single application to access both relational and multidimensional databases simultaneously. Many other OLAP tools are now using this approach. Holos was acquired by Crystal Decisions in 1996, but has now been discontinued. 1995 Oracle acquired Express
First important OLAP takeover. Arguably, it was this event that put OLAP on the map, and it almost certainly triggered the entry of the other database vendors. Express has now become a hybrid OLAP and competes with both multidimensional and relational OLAP tools. Oracle soon promised that Express would be fully integrated into the rest of its product line but, almost ten years later, has still failed to deliver on this promise.
1996 BusinessObjects 4.0 launched
First tool to provide seamless multidimensional and relational reporting from desktop cubes dynamically built from relational data. Early releases had problems, now largely resolved, but Business Objects has always struggled to deliver a true Web version of this desktop OLAP architecture. It is expected finally to achieve this by using the former Crystal Enterprise as the base.
1997 Microsoft announced OLE DB for OLAP
This project was code-named Tensor, and became the ‘industry standard’ OLAP API before even a single product supporting it shipped. Many third-party products now support this API, which is evolving into the more modern XML for Analysis.
1998 IBM DB2 OLAP Server released
This version of Essbase stored all data in a form of relational star schema, in DB2 or other relational databases, but it was more like a slow MOLAP than a scalable ROLAP. IBM later
Division Of Computer Engineering, SOE, CUSAT
10
Online Analytical Processing abandoned its “enhancements”, and now ships the standard version of Essbase as DB2 OLAP Server. Despite the name, it remains non-integrated with DB2. 1998 Hyperion Solutions formed
Arbor and Hyperion Software ‘merged’ in the first large consolidation in the OLAP market. Despite the name, this was more of a takeover of Hyperion by Arbor than a merger, and was probably initiated by fears of Microsoft’s entry to the OLAP market. Like most other OLAP acquisitions, this went badly. Not until 2002 did the merged company begin to perform competitively.
1999 Microsoft OLAP Services shipped
This project was code-named Plato and then named Decision Support Services in early pre-release versions, before being renamed OLAP Services on release. It used technology acquired from Panorama Software Systems in 1996. This soon became the OLAP server volume market leader through ease of deployment, sophisticated storage architecture (ROLAP/MOLAP/Hybrid), huge third-party support, low prices and the Microsoft marketing machine.
1999 CA starts CA acquired the former Prodea Beacon, via Platinum, in 1999 mopping up failed and renamed it DecisionBase. In 2000 it also acquired the OLAP servers former IA Eureka, via Sterling. This collection of failed OLAPs seems destined to grow, though the products are soon snuffed-out under CA’s hard-nosed ownership, and as The OLAP Survey 2, the remaining CA Cleverpath OLAP customers were very unhappy by 2002. By the The OLAP Survey 4 in 2004, there seemed to be no remaining users of the product. 2000 Microsoft renames Microsoft renamed the second release of its OLAP server for OLAP Services to no good reason, thus confusing much of the market. Of Analysis Services course, many references to the two previous names remain within the product. 2000 XML for Analysis This initiative for a multi-vendor, cross-platform, XML-based announced OLAP API is led by Microsoft (later joined by Hyperion and then SAS Institute). It is, in effect, an XML implementation of OLE DB for OLAP. XML for Analysis is the native API for Analysis Services 2005, but most client tools still use OLE DB for OLAP. 2001 Oracle begins shipping the successor to Express
Six years after acquiring Express, which has been in use since 1970, Oracle began shipping Oracle9i OLAP, expected eventually to succeed Express. However, the early releases of the new generation Oracle OLAP were incomplete and unusable. The full replacement of the technology and applications had still not occurred by 2006, some 11 years
Division Of Computer Engineering, SOE, CUSAT
11
Online Analytical Processing after Oracle acquired Express, and there are still very few users of the Oracle OLAP Option. 2001 MicroStrategy abandons Strategy.com
Strategy.com was part of MicroStrategy’s grand strategy to become the next Microsoft. Instead, it very nearly bankrupted the company, which finally shut the subsidiary down in late 2001.
2001 Siebel acquires nQuire
Siebel was surprisingly successful with what became Siebel Analytics, which now seems destined to become the core of Oracle’s future BI strategy.
2002 Oracle ships integrated OLAP server
Oracle9i Release 2 OLAP Option shipped in mid 2002, with a MOLAP server (a modernized Express), called the Analytical Workspace, integrated within the database. This was the closest integration yet between a MOLAP server and an RDBMS. But it is still not a complete solution, lacking competitive front-end tools and applications.
2003 The year of consolidation
Business Objects purchases Crystal Decisions, Hyperion Solutions Brio Software, Cognos Adaytum, and Geac Comshare.
2004 Excel add-ins go mainstream
Business Objects, Cognos, Microsoft, MicroStrategy and Oracle all release new Excel add-ins for accessing OLAP data, while Sage buys one of the leading Analysis Services Excel add-in vendors, IntelligentApps.
2004 Essbase database explosion curbed
Hyperion releases Essbase 7X which included the results of Project Ukraine: the Aggregate Storage Option. This finally cured Essbase’s notorious database explosion syndrome, making the product suitable for marketing, as well as financial, applications.
2004 Cognos buys its second Frango
Cognos buys Frango, the Swedish consolidation system. Less well known is the fact that Adaytum, which Cognos bought in the previous year, had its origins in IBM’s Frango project from the early 1980s.
2005 Microsoft finally ships the muchdelayed SQL Server 2005
Originally planned for release in 2003, Microsoft managed to ship the major ‘Yukon’ version just before the end of 2005.
2005 Pentaho buys Mondrian
Pentaho acquires Mondrian, as part of the process of assembling a full-blown open source BI suite.
2006 Palo launched
The first open source MOLAP server.
2006 Microsoft buys ProClarity
Microsoft underlines its BI ambitions by buying the leading front-end suite for Analysis Services.
Division Of Computer Engineering, SOE, CUSAT
12
Online Analytical Processing 2006 Microsoft Within just a few months of the first beta release of announces PerformancePoint Server, three leading CPM vendors PerformancePoint changed hands. 2007 Oracle buys Hyperion
In the largest-ever BI consolidation, Oracle purchased Hyperion Solutions, bringing together multiple OLAP products originating in over a dozen companies.
2007 Oracle delivers embedded OLAP
After years of promise, Oracle finally delivered genuine embedded OLAP capabilities in the 11g database, which can deliver performance benefits to unchanged relational applications via cube-based materialized views.
2008 IBM buys Cognos APL returns home, as IBM acquires Cognos Planning, which had started life as Frango within IBM more than 25 years earlier. Ironically, IBM now has to license the APL technology still used in the Analyst module of Cognos Planning. Fig(2.1) Olap Milestones
Division Of Computer Engineering, SOE, CUSAT
13
Online Analytical Processing
3.DIFFERENCES BETWEEN OLTP AND OLAP Main Differences between OLTP and OLAP are:1. User and System Orientation OLTP: customer-oriented, used for data analysis and querying by clerks, clients and IT professionals. OLAP: market-oriented, used for data analysis by knowledge workers( managers, executives, analysis). 2. Data Contents OLTP: manages current data, very detail-oriented. OLAP: manages large amounts of historical data, provides facilities for summarization and aggregation, stores information at different levels of granularity to support decision making process. 3. Database Design OLTP: adopts an entity relationship(ER) model and an application-oriented database design. OLAP: adopts star, snowflake or fact constellation model and a subject-oriented database design. 4. View OLTP: focuses on the current data within an enterprise or department. OLAP: spans multiple versions of a database schema due to the evolutionary process of an organization; integrates information from many organizational locations and data stores
Division Of Computer Engineering, SOE, CUSAT
14
Online Analytical Processing
4.DATAWAREHOUSE A data warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis This classic definition of the data warehouse focuses on data storage. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform, and load data into the repository, and tools to manage and retrieve metadata. In contrast to data warehouses are operational systems which perform day-to-day transaction processing Some of the benefits that a data warehouse provides are as follows: A data warehouse provides a common data model for all data of interest regardless of the data's source. This makes it easier to report and analyze information than it would be if multiple data models were used to retrieve information such as sales invoices, order receipts, general ledger charges, etc. •
Prior to loading data into the data warehouse, inconsistencies are identified and
resolved. This greatly simplifies reporting and analysis. •
Information in the data warehouse is under the control of data warehouse users so
that, even if the source system data is purged over time, the information in the warehouse can be stored safely for extended periods of time. •
Because they are separate from operational systems, data warehouses provide
retrieval of data without slowing down operational systems.
Division Of Computer Engineering, SOE, CUSAT
15
Online Analytical Processing Data warehouses facilitate decision support system applications such as trend reports (e.g., the items with the most sales in a particular area within the last two years), exception reports, and reports that show actual performance versus goals. Data warehouses can work in conjunction with and, hence, enhance the value of operational business applications, notably customer relationship management (CRM) systems
4.1Data warehouse architecture Architecture, in the context of an organization's data warehousing efforts, is a conceptualization of how the data warehouse is built. There is no right or wrong architecture. The worthiness of the architecture can be judged in how the conceptualization aids in the building, maintenance, and usage of the data warehouse. One possible simple conceptualization of a data warehouse architecture consists of the following interconnected layers: Operational database layer The source data for the data warehouse - An organization's ERP systems fall into this layer. Informational access layer The data accessed for reporting and analyzing and the tools for reporting and analyzing data - Business intelligence tools fall into this layer. And the Inmon-Kimball differences about design methodology, discussed later in this article, have to do with this layer. Data access layer The interface between the operational and informational access layer - Tools to extract, transform, load data into the warehouse fall into this layer. Metadata layer The data directory - This is often usually more detailed than an operational system data directory. There are dictionaries for the entire warehouse and sometimes dictionaries for the data that can be accessed by a particular reporting and analysis tool.
Division Of Computer Engineering, SOE, CUSAT
16
Online Analytical Processing The star schema (sometimes referenced as star join schema) as shown in fig(4.1.1) is the simplest style of data warehouse schema. The star schema consists of a few "fact tables" (possibly only one, justifying the name) referencing any number of "dimension tables". The star schema is considered an important special case of the snowflake schema
Fig (4.1.1) Star Schema A snowflake schema as shown in fig(4.1.2) is a logical arrangement of tables in a relational database such that the entity relationship diagram resembles a snowflake in shape. Closely related to the star schema, the snowflake schema is represented by centralized fact tables which are connected to multiple dimensions. In the snowflake schema, however, dimensions are normalized into multiple related tables whereas the star schema's dimensions are denormalized with each dimension being represented by a single table. When the dimensions of a snowflake schema are elaborate, having multiple levels of relationships, and where child tables have multiple parent tables ("forks in the road"), a
Division Of Computer Engineering, SOE, CUSAT
17
Online Analytical Processing complex snowflake shape starts to emerge. The "snowflaking" effect only affects the dimension tables and not the fact tables
Fig(4.1.2) Snow flake Schema So how is a data warehouse different from you regular database? After all, both are databases, and both have some tables containing data. If we look deeper, we'd find that both have indexes, keys, views, and the regular jing-bang. So is that 'Data warehouse' really different from the tables in you application? And if the two aren't really different, maybe we can just run your queries and reports directly from your application
Division Of Computer Engineering, SOE, CUSAT
18
Online Analytical Processing databases! The primary difference between you application database and a data warehouse is that while the former is designed (and optimized) to record , the latter has to be designed (and optimized) to respond to analysis questions that are critical for your business. Application databases are OLTP (On-Line Transaction Processing) systems where every transaction has to be recorded, and super-fast at that. Consider the scenario where a bank ATM has disbursed cash to a customer but was unable to record this event in the bank records. If this started happening frequently, the bank wouldn't stay in business for too long. So the banking system is designed to make sure that every trasaction gets recorded within the time you stand before the ATM machine. This system is write-optimized, and you shouldn't crib if your analysis query (read operation) takes a lot of time on such a system.
A Data Warehouse (DW) on the other end, is a database (yes, you are right, it's a database) that is designed for facilitating querying and analysis. Often designed as OLAP (On-Line Analytical Processing) systems, these databases contain read-only data that can be queried and analysed far more efficiently as compared to your regular OLTP application databases. In this sense an OLAP system is designed to be read-optimized.
Division Of Computer Engineering, SOE, CUSAT
19
Online Analytical Processing
5.ETL TOOLS ETL Tools are meant to extract, transform and load the data into Data Warehouse for decision making. Before the evolution of ETL Tools, the above mentioned ETL process was done manually by using SQL code created by programmers. This task was tedious and cumbersome in many cases since it involved many resources, complex coding and more work hours. On top of it, maintaining the code placed a great challenge among the programmers. These difficulties are eliminated by ETL Tools since they are very powerful and they offer many advantages in all stages of ETL process starting from extraction, data cleansing, data profiling, transformation, debuggging and loading into data warehouse when compared to the old method. There are a number of ETL tools available in the market to do ETL process the data according to business/technical requirements.
Fig(5.1) ETL
Division Of Computer Engineering, SOE, CUSAT
20
Online Analytical Processing 5.1 ETL Concepts
Extraction, transformation, and loading. ETL refers to the methods involved in accessing and manipulating source data and loading it into target database. The first step in ETL process is mapping the data between source systems and target database(data warehouse or data mart). The second step is cleansing of source data in staging area. The third step is transforming cleansed source data and then loading into the target system. Commonly used ETL tools •
Informatica
•
Datastage
Division Of Computer Engineering, SOE, CUSAT
21
Online Analytical Processing
6.CLASSIFICATIONS OF OLAP In the OLAP world, there are mainly two different types: Multidimensional OLAP (MOLAP) and Relational OLAP (ROLAP). MOLAP This is the more traditional way of OLAP analysis. In MOLAP, data is stored in a multidimensional cube. The storage is not in the relational database, but in proprietary formats. Multidimensional OLAP is one of the oldest segments of the OLAP market. The business problem MOLAP addresses is the need to compare, track, analyze and forecast high level budgets based on allocation scenarios derived from actual numbers. The first forays into data warehousing were led by the MOLAP vendors who created special purpose databases that provided a cube-like structure for performing data analysis. MOLAP tools restructure the source data so that it can be accessed, summarized, filtered and retrieved almost instantaneously. As a general rule, MOLAP tools provide a robust solution to data warehousing problems. Administration, distribution, meta data creation and deployment are all controlled from a central point. Deployment and distribution can be achieved over the Web and with client/server models. When MOLAP Tools Bring Value •
Need to process information with consistent response time regardless of
level of summarization or calculations selected. •
Need to avoid many of the complexities of creating a relational database to
store data for analysis. •
Need fastest possible performance.
Who Should Have Access to MOLAP Tools
Division Of Computer Engineering, SOE, CUSAT
22
Online Analytical Processing •
Users who are connected to a network and need to analyze larger, less defined
data. •
Users who want to access predefined reports, but need to have the ability to
perform additional analysis on information that may not be contained in the report. Advantages: •
Excellent performance: MOLAP cubes are built for fast data retrieval, and is
optimal for slicing and dicing operations. •
Can perform complex calculations: All calculations have been pre-generated
when the cube is created. Hence, complex calculations are not only doable, but they return quickly. Disadvantages: •
Limited in the amount of data it can handle: Because all calculations are
performed when the cube is built, it is not possible to include a large amount of data in the cube itself. This is not to say that the data in the cube cannot be derived from a large amount of data. Indeed, this is possible. But in this case, only summary-level information will be included in the cube itself. •
Requires additional investment: Cube technology are often proprietary and do not
already exist in the organization. Therefore, to adopt MOLAP technology, chances are additional investments in human and capital resources are needed. ROLAP This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a "WHERE" clause in the SQL statement.. data warehouse sizes, users have come to realize that they cannot store all of the information that they need in MOLAP databases. The business problem that ROLAP addresses is the need to analyze
Division Of Computer Engineering, SOE, CUSAT
23
Online Analytical Processing massive volumes of data without having to be a systems or software expert. Relational OLAP databases seek to resolve this dilemma by providing a multidimensional front end that creates queries to process information in a relational format. These tools provide the ability to transform two-dimensional relational data into multidimensional information. Due to the complexity and size of ROLAP implementations, the tools provide a robust set of functions for meta data creation, administration and deployment. The focus of these tools is to provide administrators with the ability to optimize system performance and generate maximum analytical throughput and performance for users. All of the ROLAP vendors provide the ability to deploy their solutions via the Web or within a multitier client/server environment. Advantages: •
Can handle large amounts of data: The data size limitation of ROLAP technology
is the limitation on data size of the underlying relational database. In other words, ROLAP itself places no limitation on data amount. •
Can leverage functionalities inherent in the relational database: Often, relational
database already comes with a host of functionalities. ROLAP technologies, since they sit on top of the relational database, can therefore leverage these functionalities. Disadvantages: •
Performance can be slow: Because each ROLAP report is essentially a SQL query
(or multiple SQL queries) in the relational database, the query time can be long if the underlying data size is large. •
Because ROLAP technology mainly relies on generating SQL statements to
query the relational database, and SQL statements do not fit all needs (for example, it is difficult to perform complex calculations using SQL), ROLAP technologies are therefore traditionally limited by what SQL can do. ROLAP vendors have mitigated this risk by building into the tool out-of-the-box complex functions as well as the ability to allow users to define their own functions.
Division Of Computer Engineering, SOE, CUSAT
24
Online Analytical Processing
7.APPLICATIONS OF OLAP OLAP is Fast Analysis of Shared Multidimensional Information — FASMI. There are many applications where this approach is relevant, and this section describes the characteristics of some of them. In an increasing number of cases, specialist OLAP applications have been pre-built and you can buy a solution that only needs limited customizing; in others, a general-purpose OLAP tool can be used. A general-purpose tool will usually be versatile enough to be used for many applications, but there may be much more application development required for each. The overall software costs should be lower, and skills are transferable, but implementation costs may rise and end-users may get less ad hoc flexibility if a more technical product is used. In general, it is probably better to have a general-purpose product which can be used for multiple applications, but some applications, such as financial reporting, are sufficiently complex that it may be better to use a pre-built application, and there are several available. OLAP applications have been most commonly used in the financial and marketing areas, but as we show here, their uses do extend to other functions. Data rich industries have been the most typical users (consumer goods, retail, financial services and transport) for the obvious reason that they had large quantities of good quality internal and external data available, to which they needed to add value. However, there is also scope to use OLAP technology in other industries. The applications will often be smaller, because of the lower volumes of data available, which can open up a wider choice of products (because some products cannot cope with very large data volumes).
Marketing and sales analysis Most commercial companies require this application, and most products are capable of handling it to some degree. However, large-scale versions of this application occur in three industries, each with its own peculiarities: Consumer goods industries often have large numbers of products and outlets, and a high rate of change of both. They usually analyze data monthly, but sometimes it may
Division Of Computer Engineering, SOE, CUSAT
25
Online Analytical Processing go down to weekly or, very occasionally, daily. There are usually a number of dimensions, none especially large (rarely over 100,000). Data is often very sparse because of the number of dimensions. Because of the competitiveness of these industries, data is often analyzed using more sophisticated calculations than in other industries. Often, the most suitable technology for these applications is one of the hybrid OLAPs, which combine high analytical functionality with reasonably large data capacity. Retailers, thanks to EPOS data and loyalty cards, now have the potential to analyze huge amounts of data. Large retailers could have over 100,000 products (SKUs) and hundreds of branches. They often go down to weekly or daily level, and may sometimes track spending by individual customers. They may even track sales by time of day. The data is not usually very sparse, unless customer level detail is tracked. Relatively low analytical functionality is usually needed. Sometimes, the volumes are so large that a ROLAP solution is required, and this is certainly true of applications where individual private consumers are tracked. The financial services industry (insurance, banks etc) is a relatively new user of OLAP technology for sales analysis. With an increasing need for product and customer profitability, these companies are now sometimes analyzing data down to individual customer level, which means that the largest dimension may have millions of members. Because of the need to monitor a wide variety of risk factors, there may be large numbers of attributes and dimensions, often with very flat hierarchies.
Clickstream analysis This is one of the latest OLAP applications. Commercial Web sites generate gigabytes of data a day that describe every action made by every visitor to the site. No bricks and mortar retailer has the same level of detail available about how visitors browse the offerings, the route they take and even where they abandon transactions. A large site has an almost impossible volume of data to analyze, and a multidimensional framework is possibly the best way of making sense of it. There are many dimensions to this
Division Of Computer Engineering, SOE, CUSAT
26
Online Analytical Processing analysis, including where the visitors came from, the time of day, the route they take through the site, whether or not they started/completed a transaction, and any demographic data about customer visitors. The Web site should not be viewed in isolation. It is only one facet of an organization’s business, and ideally, the Web statistics should be combined with other business data, including product profitability, customer history and financial information. OLAP is an ideal way of bringing these conventional and new forms of data together. This would allow, for instance, Web sites to be targeted not simply to maximize transactions, but to generate profitable business and to appeal to customers likely to create such business. OLAP can also be used to assist in personalizing Web sites. Many of the issues with clickstream analysis come long before the OLAP tool. The biggest issue is to correctly identify real user sessions, as opposed to hits. This means eliminating the many crawler bots that are constantly searching and indexing the Web, and then grouping sets of hits that constitute a session. This cannot be done by IP address alone, as Web proxies and NAT (network address translation) mask the true client IP address, so techniques such as session cookies must be used in the many cases where surfers do not identify themselves by other means. Indeed, vendors such as Visual Insights charge much more for upgrades to the data capture and conversion features of their products than they do for the reporting and analysis components, even though the latter are much more visible.
Database marketing This is a specialized marketing application that is not normally thought of as an OLAP application, but is now taking advantage of multidimensional analysis, combined with other statistical and data mining technologies. The purpose of the application is to determine who are the best customers for targeted promotions for particular products or services based on the disparate information from various different systems. Database marketing professionals aim to:
Division Of Computer Engineering, SOE, CUSAT
27
Online Analytical Processing 1.
Determine who the preferred customers are, based on their purchase of profitable
products. This can be done with brute force data mining techniques (which are slow and can be hard to interpret), or by experienced business users investigating hunches using OLAP cubes (which is quicker and easier). 2.
Work to build loyalty packages for preferred customers via correct offerings.
Once the preferred customers have been identified, look at their product mix and buying profile to see if there are denser clusters of product purchases over particular time periods. Again, this is much easier in a multidimensional environment. These can then form the basis for special offers to increase the loyalty of profitable customers If these goals are met, both parties profit. The customers will have a company that knows what they want and provides it. The company will have loyal customers that generate sufficient revenue and profits to continue a viable business. Database marketing specialists try to model (using statistical or data mining techniques) which pieces of information are most relevant for determining likelihood of subsequent purchases, and how to weight their importance. In the past, pure marketers have looked for triggers, which works, but only in one dimension. But a well established company may have hundreds of pieces of information about customers, plus years of transaction data, so multidimensional structures are a great way to investigate relationships quickly, and narrow down the data which should be considered for modeling. Once this is done, the customers can be scored using the weighted combination of variables which compose the model. A measure can then be created, and cubes set up which mix and match across multidimensional variables to determine optimal product mix for customers. The users can determine the best product mix to market to the right customers based on segments created from a combination of the product scores, the several demographic dimensions, and the transactional data in aggregate. Finally, in a more simplistic setting, the users can break the world into segments based on combinations of dimensions that are relevant to targeting. They can then calculate a return on investment on these combinations to determine which segments
Division Of Computer Engineering, SOE, CUSAT
28
Online Analytical Processing have been profitable in the past, and which have not. Mailings can then be made only to those profitable segments. Products like Express allows the users to fine tune the dimensions quickly to build one-off promotions, determine how to structure profitable combinations of dimensions into segments, and rank them in order of desirability.
Financial reporting and consolidation Every medium and large organization has onerous responsibilities for producing financial reports for internal (management) consumption. Publicly quoted companies or public sector bodies also have to produce other, legally required, reports. Accountants and financial analysts were early adopters of multidimensional software. Even the simplest financial consolidation consists of at least three dimensions. It must have a chart of accounts (general OLAP tools often refer to these as facts or measures, but to accountants they will always be accounts), at least one organization structure plus time. Usually it is necessary to compare different versions of data, such as actual, budget or forecast. This makes the model four dimensional. Often line of business segmentation or product line analysis can add fifth or sixth dimensions. Even back in the 1970s, when consolidation systems were typically run on time-sharing mainframe computers, the dedicated consolidation products typically had a four or more dimensional feel to them. Several of today’s OLAP tools can trace their roots directly back to this ancestry and many more have inherited design attributes from these early products. To address this specific market, certain vendors have developed specialist products. The market leader in this segment is Hyperion Solutions. Other players include Cartesis, Geac, Longview Khalix, and SAS Institute, but many smaller players also supply pre-built applications for financial reporting. In addition to these specialist products, there is a school of thought that says that these types of problems can be solved by building extra functionality on top of a generic OLAP tool.
Management reporting In most organizations, management reporting is quite distinct from formal financial reporting. It will usually have more emphasis on the P&L and possible cash Division Of Computer Engineering, SOE, CUSAT
29
Online Analytical Processing flow, and less on the balance sheet. It will probably be done more often — usually monthly, rather than annually and quarterly. There will be less detail but more analysis. More users will be interested in viewing and analyzing the results. The emphasis is on faster rather than more accurate reporting and there may be regular changes to the reporting requirements. Users of OLAP based systems consistently.. Management reporting usually involves the calculation of numerous business ratios, comparing performance against history and budget. There is also advantage to be gained from comparing product groups or channels or markets against each other. Sophisticated exception detection is important here, because the whole point of management reporting is to manage the business by taking decisions. The new Microsoft OLAP Services product and the many new client tools and applications being developed for it will certainly drive down ‘per seat’ prices for generalpurpose management reporting applications, so that it will be economically possible to deploy good solutions to many more users.
EIS EIS is one branch of management reporting. The term became popular in the mid 1980s, when it was defined to mean Executive Information Systems; some people also used the term ESS (Executive Support System). Since then, the original concept has been discredited, as the early systems were very proprietary, expensive and hard to maintain.
Fig(7.1) EIS
Division Of Computer Engineering, SOE, CUSAT
30
Online Analytical Processing With this proliferation of descriptions, the meaning of the term is now irretrievably blurred. In essence, an EIS is a more highly customized, easier to use management reporting system, but it is probably now better recognized as an attempt to provide intuitive ease of use to those managers who do not have either a computer background or much patience. There is no reason why all users of an OLAP based management reporting system should not get consistently fast performance, great ease of use, reliability and flexibility — not just top executives, who will probably use it much less than mid level managers. The basic philosophy of EIS was that “what gets reported gets managed,” so if executives could have fast, easy access to a number of key performance indicators (KPIs) and critical success factors (CSFs), they would be able to manage their organizations better. But there is little evidence that this worked for the buyers, and it certainly did not work for the software vendors who specialized in this field, most of which suffered from a very poor financial performance.
Profitability analysis This is an application which is growing in importance. Even highly profitable organizations ought to know where the profits are coming from; less profitable organizations have to know where to cut back. Profitability analysis is important in setting prices (and discounts), deciding on promotional activities, selecting areas for investment or divestment and anticipating competitive pressures. Decisions in these areas are made every day by many individuals in large organizations, and their decisions will be less effective if they are not well informed about the differing levels of profitability of the company’s products and customers. Profitability figures may be used to bias actions, by basing remuneration on profitability goals rather than revenue or volume. One popular way to assign costs to the right products or services is to use activity based costing. This is much more scientific than simply allocating overhead costs in proportion to revenues or floor space. It attempts to measure resources that are consumed
Division Of Computer Engineering, SOE, CUSAT
31
Online Analytical Processing by activities, in terms of cost drivers. Typically costs are grouped into cost pools which are then applied to products or customers using cost drivers, which must be measured. Some cost drivers may be clearly based on the volume of activities, others may not be so obvious. They may, for example, be connected with the introduction of new products or suppliers. Others may be connected with the complexity of the organization (the variety of customers, products, suppliers, production facilities, markets etc). There are also infrastructure-sustaining costs that cannot realistically be applied to activities. Even ignoring these, it is likely that the costs of supplying the least profitable customers or products exceeds the revenues they generate. If these are known, the company can make changes to prices or other factors to remedy the situation — possibly by withdrawing from some markets, dropping some products or declining to bid for certain contracts. There are specialist ABC products on the market and these have many FASMI characteristics. It is also possible to build ABC applications in OLAP tools, although the application functionality may be less than could be achieved through the use of a good specialist tool.
Quality analysis Although quality improvement programs are less in vogue than they were in the early 1990s, the need for consistent quality and reliability in goods and services is as important as ever. The measures should be objective and customer rather than producer focused. The systems are just as relevant in service organizations and the public sector. Indeed, many public sector service organizations have specific service targets. These systems are used not just to monitor an organization’s own output, but also that of its suppliers. There may, for example, be service level agreements that affect contract extensions and payments. Quality systems can often involve multidimensional data if they monitor numeric measures across different production facilities, products or services, time, locations and customers. Many of the measures will be non-financial, but they may be just as important as traditional financial measures in forming a balanced view of the organization. As with
Division Of Computer Engineering, SOE, CUSAT
32
Online Analytical Processing financial measures, they may need analyzing over time and across the functions of the organization; many organizations are committed to continuous improvement, which requires that there be formal measures that are quantifiable and tracked over long periods; OLAP tools provide an excellent way of doing this, and of spotting disturbing trends before they become too serious.
Division Of Computer Engineering, SOE, CUSAT
33
Online Analytical Processing
8.Future Gazing: Possibility of convergence of OLAP and OLTP
What prevents the convergence of functionality of OLAP AND OLTP tools, both being business intelligence tools with data retrieval and analysis being common areas. One significant difference being the function of time; in case of an OLTP tool the transactions are being logged in real time due to the necessity of business to monitor certain parameters regularly like inventory, production processes, whereas OLAP tools are more concerned with historical data not only within a business process but also combining the effect of the total business environment for example like promotions, special external events such as holidays and weather etc. Because of their different nature in the present form the service that is required is significantly different in each case. In case of OLTP the service being online and ability to store large amount of data is critical to its success, whereas in the case of an OLAP tool, the capability to do multidimensional analysis with historical data is important. The first step in this direction might have been taken by Microsoft by integrating reporting services with its SQL Server has taken the lead in integrating functionalities in its tools.
Division Of Computer Engineering, SOE, CUSAT
34
Online Analytical Processing
REFERENCES 1. http://en.wikipedia.org/wiki/Online_analytical_processing 2. http://www.dmreview.com/issues/19971101/964-1.html 3. http://en.wikipedia.org/wiki/Extract,_transform,_load 4. http://www.olapreport.com/Applications.html
Division Of Computer Engineering, SOE, CUSAT
35
View more...
Comments