DWH Slowly Changing Dimension

July 18, 2017 | Author: suri1221 | Category: Data Warehouse, Computer Data, Data Management, Business, Data

Share Embed Donate

Report this link

Short Description

Download DWH Slowly Changing Dimension...

Description

The definition of slowly changing dimension is in its name only. The dimension whichÂ changes slowly with time. A customer dimension table represents customer. When creatingÂ a customer, normal assumption is it is independent of time. But what if address of customer changes or name of customer changes? This change can be implemented in three ways... Type I: Replace the old record with a new record with updated data, there by we lose the history. But data warehouse has a responsibility to track the history effectively, where Type I implementation fails. Type II: Create a new additional dimension table record with new value. By this way we can keep the history. We can determine which dimension is current by adding a current record flag or by time stamp on the dimensional row. Type III: In this type of implementation we create a new field in the dimension table, which stores the old value of the dimension. When an attribute of the dimension changes then, we push the updated value to the current field and old value to the old field.

--------------------------------------------------------------------------------------------------------------------

Slowly Changing Dimensions Entities change over time. Customer demographics, product characteristics, classification rules, status of customers etc. lead to changes in the attributes of dimensions. In a transaction system, many a times the change is overwritten and track of change is lost. For example a source system may have only the latest customer PIN Code, as it is needed to send the marketing and billing statements. However, a data warehouse needs to maintain all the previous PIN Codes as well, because we need to track on how many customers move to new locations over what frequency. A key benefit for Data Warehouse is to provide historical information, which is typically overwritten (and thus lost)in the transaction systems. How to handle slowly changing dimensions in a Dimensional Model is a key determinant to that benefit.

There are three ways to handle the same: Slowly Changing Dimension method 1 (In short SCD 1)

The way most of the source systems will handle it- Overwrite the attribute value. For example if a customer’s marital status has moved from 'Unmarried' to 'Married', we over-write 'unmarried' to 'Married'. Similarly, if an insurance policy status has moved from 'Lapsed' to 'Re-instated' the new status is over written on the old status. This is obviously done, when we are not analyzing the historical information.

Slowly Changing Dimension Method 2 (in short SCD 2) This is the true-blue technique to deliver precise historical analysis. This is used, when there is more than one change in the attributes of an entity, and we need to track the date of change of the attribute. In this method, a new record is added whereby the new record is given a separate identifier as the primary key. We cannot use the production key as the primary key here as it has not changed (Customer ID has remained the same, while the value of its attribute 'marital status' has changed). This new identifier is called the surrogate key. Apart from adding a new record and providing a new primary (surrogate) key, the validity period for this new record is also added. For example- You have a dimensional table with customer_ID '110002' with marital status as 'single'. Overtime, customer gets married and also moved to a new location. The customer dimension record will be: Surrogate Key 1100021 1100022 1100023

Marital Status Sept 23, 2004 Single Oct 25, 2005 Married Nov 23, 2005 Married

Customer ID Date Valid 110002 110002 110002

Date of Birth Jan8, 1982 Jan8, 1982 Jan8, 1982

City Palo Alto Palo Alto San Francisco

Slowly changing dimension method 3 (SCD 3) This is a mid-way between method 1 and method 2. Here we don’t add an additional record, but add a new field 'old attribute value'. However, this has limitations. This method has to know from the beginning on what attributes will change. This is because a new field/attribute has to be added in the design for every attribute, which can change. Secondly, attribute can change maximum once in the lifetime of the entity OR at least the lifetime of the data warehouse. Surrogate Key

Customer ID

Marital Date of Status Birth

1100021

110002

Married Jan8, 1982 San Francisco

City

Marital Status Old Single

City Old Palo Alto

NOTE – The term of 'Slowly changing dimension' is used because of it being a universally acknowledged term. However, the same methods will apply to fast changing dimensions as well.

Surrogate Keys as Primary keys of dimension tables There is a best practice in dimensional model design to not to use the production primary key as the primary key for the dimension table. This goes against conventional logic, but has a reason. Data Warehouse has a core need for maintaining historical information and how an entity has moved and changed shape through the passage of time. Typically Source Systems need for this kind of information is quite less. In case of historical tracking in source systems, these systems can have the luxury of using multiple-field primary key (including the key identifier of the entity plus date stamp). For example if an insurance policy is lapsed and after two months it becomes reinstated, one can use the primary key as a combination of Policy number+ date/time +the status in the 'policy history table'. However, Data Warehouse doesn’t recommend the luxury of using multiple field primary key in dimension table. Therefore, the concept 'surrogate Key' comes into play where the primary key is not the production key, but a key generated by the system. The production key is also used as an attribute within the same dimension table.

The situations/reasons on when a surrogate key is used: • • •

'Slowly changing dimensions' When the primary key itself is repeated. When there is a multiple field primary key. Dimension model typically does not use multiple field primary key to link to the fact table.

Therefore it is always recommended to use surrogate keys. it is difficult to find the organizations, which will not face the situations as highlighted above. If there are, they could as well manage their needs using excel and pivot tables.

Acquisition Phase 1)Access lock 2)Acquisition lock which sets a flag in table header rejecting any DDL except DROP. How these 2 locks differ with each other.....basically access lock doesnt allow any exclusive lock but acquisition lock allows DROP....does that mean it overrides access lock???? Application Phase 1)Write lock 2)Application phase which sets a flag in table header rejecting any DDL except DROP and allowing DML with access lock only(select)Same ques here also......how these 2 differs

RELEASE MLOAD; Will release access lock on target table and acquisition lock or both....? Also please specify what causes these errors and what shud we do in each case a)error 2571 ....mload can't be released and what is the raeson for such error b)error 2652 Error Code 2652. Operation Not allowed.tableis being mloaded

----------------------------------------------------------------------------

DWH Slowly Changing Dimension

Short Description

Description

Comments

We need your help!