Satish Dwhconcept[1]
1) What are the advantages data mining over tradition...
A) Data mining is used to estimate the future. For example, for a company or business organization, data mining can be used to predict the future of the business in terms of revenue, employees, customers, orders, etc. Traditional approaches use simpler algorithms for estimating the future, and their results are less accurate than those of data mining.
2) What is the difference between views and materialized views?
2A) View: stores the SQL statement in the database and lets you use it as a table. Every time you access the view, the SQL statement executes. Materialized view: stores the results of the SQL in table form in the database. The SQL statement executes only when the materialized view is created or refreshed; in between, every query against it reads the stored result set. The main advantage is quick query results.
2B) View: a pseudo-table that is not stored in the database; it is just a query. Materialized view: similar to a view, but its results are permanently stored in the database and refreshed periodically. It is used to optimize data retrieval and is useful for aggregation and summarization of data.
3) What is the main difference between Inmon and Kimball...?
3A) Basically speaking, Inmon professes the snowflake schema while Kimball relies on the star schema.
3B) Both differed in the concept of building the data warehouse. Kimball views data warehousing as a constituency of data marts. Data marts are focused on delivering business objectives for departments in the organization, and the data warehouse is formed from the data marts through conformed dimensions. Hence a unified view of the enterprise can be obtained from dimensional modeling at the local departmental level. Inmon believes in creating a data warehouse on a subject-by-subject-area basis. Hence the development of the data warehouse can start with data from the online store. Other subject areas can be added to the data warehouse as the need arises. Point-of-sale (POS) data can be added later if management decides it is necessary.
I.e., Kimball: data marts first, then combined into the data warehouse. Inmon: data warehouse first, then data marts.
3C) The main difference between the Kimball and Inmon approaches is: Kimball creates data marts first and then combines them to form a data warehouse; Inmon creates the data warehouse first and then the data marts.
3D) Actually, the main difference is: Kimball follows dimensional modeling; Inmon follows ER modeling.
3E) Ralph Kimball follows the bottom-up approach, i.e., first create individual data marts from the existing sources and then create the data warehouse. Bill Inmon follows the top-down approach, i.e., first create the data warehouse from the existing sources and then create individual data marts.
4) What is junk dimension? What is the difference between junk dimension and degenerate dimension?
4A) A junk dimension is a collection of random transactional codes, flags and text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes.
4B) A "junk" dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes. A degenerate dimension, by contrast, is data that is dimensional in nature but stored in a fact table.
4C) Junk dimension: columns that are rarely or never used are grouped together to form a dimension, which is called a junk dimension. Degenerate dimension: a column that we use in a dimension is a degenerate dimension. Example: the EMP table has empno, ename, sal, job, deptno, but we take only the columns empno and ename from the EMP table and form a dimension; this is called a degenerate dimension.
5) What is the definition of normalized and denormalization?
5A) Normalization is the process of removing redundancies. Denormalization is the process of allowing redundancies. OLTP uses normalization, and OLAP/DW uses denormalization to capture a greater level of detailed data (each and every transaction).
6) Why fact table is in normal form?
6A) A fact table consists of measurements of business requirements and foreign keys of dimension tables as per business rules.
6B) Basically the fact table consists of the index keys of the dimension/lookup tables and the measures. So whenever we have the keys in a table, that itself implies that the table is in normal form.
7) What is the difference between E-R modeling and dimensional modeling?
7A) The basic difference is that E-R modeling has a logical and a physical model, whereas a dimensional model has only a physical model. E-R modeling is used for normalizing the OLTP database design. Dimensional modeling is used for denormalizing the ROLAP/MOLAP design.
7B) E-R modeling revolves around entities and their relationships to capture the overall process of the system. Dimensional (multi-dimensional) modeling revolves around dimensions (points of analysis) for decision making, not around capturing the process.
7C) In ER modeling the data is in normalized form, so queries need more joins, which may adversely affect system performance. In dimensional modeling the data is denormalized, so fewer joins are needed, which improves system performance.
8) What is conformed fact?
8A) Conformed dimensions are dimensions that can be used across multiple data marts in combination with multiple fact tables.
8B) Conformed facts are allowed to have the same name in separate tables and can be combined and compared mathematically.
8C) When the relationships between the facts and dimensions are in 3NF and work with any type of join, the schema is called a conformed schema, and the members of that schema are called conformed.
8D) Conformed dimensions are tables that have a fixed structure. There is no need to change the metadata of these tables, and they can go along with any number of facts in that application without any changes.
8E) A dimension table which is used by more than one fact table is known as a conformed dimension.
9) What are the methodologies of data warehousing?
9A) Every company has a methodology of its own. To name a few, the SDLC methodology and the AIM methodology are commonly used. Other methodologies are AMM, World Class methodology and many more.
9B) Most of the time, we use Ralph Kimball's methodologies for data warehouse design. There are two kinds of schema: star and snowflake.
9C) Most probably everyone follows either the star schema or the snowflake schema.
9D) There are 2 methodologies: 1) Kimball: first data marts, then the EDWH. 2) Inmon: first the EDWH, then data marts from the EDWH.
9C) Regarding the methodologies in data warehousing, there are mainly 2 methods: 1. the Ralph Kimball model, 2. the Inmon model. The Kimball model is always structured as a denormalized structure. The Inmon model is structured as a normalized structure. Depending on its requirements, a company can choose either of the above models for its DWH.
9D) Data warehousing has two methods: 1) top-down, 2) bottom-up. The top-down method first loads the data warehouse and then loads the data marts. The bottom-up method first loads the data marts and then loads the data warehouse.
9E) The top-down approach is first data warehouse, then data marts. The bottom-up approach is first data marts, then data warehouse.
9F) There are 2 methodologies: 1. Kimball, 2. Inmon. Likewise there are 2 schemas: 1. star, 2. snowflake.
9G) There are two approaches in data warehousing: the top-down approach and the bottom-up approach. The top-down approach means preparing individual departments' data (data marts) from the enterprise data warehouse. The bottom-up approach means first gathering all the departments' data, then cleansing and transforming it, and then loading all the individual departments' data into the enterprise data warehouse.
10) What is BUS schema?
10A) A BUS schema is composed of a master suite of conformed dimensions and standardized definitions of facts.
10B) A BUS schema or a BUS matrix? A BUS matrix (in the Kimball approach) is used to identify common dimensions across business processes, i.e., it is a way of identifying conformed dimensions.
11) What is data warehousing hierarchy?
11A) Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a family structure.
Within a hierarchy, each level is logically connected to the levels above and below it. Data values at lower levels aggregate into the data values at higher levels. A dimension can be composed of more than one hierarchy. For example, in the product dimension, there might be two hierarchies: one for product categories and one for product suppliers.
Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to enable you to drill down into your data to view different levels of granularity. This is one of the key benefits of a data warehouse.
When designing hierarchies, you must consider the relationships in business structures, for example, a divisional multilevel sales organization.
Hierarchies impose a family structure on dimension values. For a particular level value, a value at the next higher level is its parent, and values at the next lower level are its children. These familial relationships enable analysts to access data quickly.
Levels: A level represents a position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the month, quarter, and year levels. Levels range from general to specific, with the root level as the highest or most general level. The levels in a dimension are organized into one or more hierarchies.
Level relationships: Level relationships specify the top-to-bottom ordering of levels from the most general (the root) to the most specific information. They define the parent-child relationship between the levels in a hierarchy.
Hierarchies are also essential components in enabling more complex query rewrites. For example, the database can aggregate existing sales revenue on a quarterly basis to a yearly aggregation when the dimensional dependencies between quarter and year are known.
12) What are data validation strategies for data mart v...?
12A) Data validation makes sure that the loaded data is accurate and meets the business requirements. Strategies are the different methods followed to meet the validation requirements.
13) What are the data types present in BO? And what happens I...
13A) There are three different data types: dimension, measure and detail. A view is nothing but an alias, and it can be used to resolve loops in the universe.
13B) To my knowledge, these are called object types in Business Objects. And an alias is different from a view in the universe. A view is at the database level, but an alias is a different name given to the same table to resolve loops in the universe.
13C) The different data types in Business Objects are: 1. Character. 2. Date. 3. Long text. 4. Number.
13D) Dimension, measure and detail are object types. Data types are character, date and numeric.
14) What is surrogate key? Where we use it explain WI...
14A) A surrogate key is the primary key for the dimension table.
14B) A surrogate key is a substitute for the natural primary key. It is just a unique identifier or number for each row that can be used as the primary key of the table. The only requirement for a surrogate primary key is that it is unique for each row in the table.
Data warehouses typically use a surrogate key (also known as an artificial or identity key) for the dimension tables' primary keys. They can use a sequence generator, an Oracle sequence, or SQL Server identity values for the surrogate key.
It is useful because the natural primary key (e.g. Customer Number in the Customer table) can change, and this makes updates more difficult.
Some tables have columns such as AIRPORT_NAME or CITY_NAME which are stated as the primary keys (according to the business users). Not only can these change, but indexing on a numerical value is probably better, so you could consider creating a surrogate key called, say, AIRPORT_ID. This would be internal to the system, and as far as the client is concerned you may display only the AIRPORT_NAME.
2. Adapted from response by Vincent on Thursday, March 13, 2003
Another benefit you can get from surrogate keys (SID) is tracking the SCD (Slowly Changing Dimension).
Let me give you a simple, classical example: On the 1st of January 2002, Employee 'E1' belongs to Business Unit 'BU1' (that's what would be in your Employee dimension). This employee has a turnover allocated to him on Business Unit 'BU1', but on the 2nd of June Employee 'E1' is moved from Business Unit 'BU1' to Business Unit 'BU2'. All the new turnover has to belong to the new Business Unit 'BU2', but the old turnover should still belong to Business Unit 'BU1'.
If you used the natural business key 'E1' for your employee within your data warehouse, everything would be allocated to Business Unit 'BU2', even what actually belongs to 'BU1'.
If you use surrogate keys, you could create on the 2nd of June a new record for Employee 'E1' in your Employee dimension with a new surrogate key. This way, in your fact table, your old data (before the 2nd of June) carries the SID of Employee 'E1' + 'BU1'. All new data (after the 2nd of June) takes the SID of Employee 'E1' + 'BU2'.
You could consider a Slowly Changing Dimension as an enlargement of your natural key: the natural key of the employee was Employee Code 'E1', but for you it becomes Employee Code + Business Unit: 'E1' + 'BU1' or 'E1' + 'BU2'. The difference from the natural key enlargement process is that you might not have all parts of your new key within your fact table, so you might not be able to do the join on the new enlarged key, and so you need another id.
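The E1/BU1/BU2 example above can be sketched in a few lines. This is only a minimal illustration using Python's built-in sqlite3 module, not a definitive implementation: the table names (dim_employee, fact_turnover), the valid_from/valid_to columns, the helper functions and the turnover amounts are all assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_employee (
    employee_sid  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key (SID)
    employee_code TEXT NOT NULL,                      -- natural key, e.g. 'E1'
    business_unit TEXT NOT NULL,
    valid_from    TEXT NOT NULL,
    valid_to      TEXT                                -- NULL marks the current row
);
CREATE TABLE fact_turnover (
    employee_sid INTEGER NOT NULL REFERENCES dim_employee(employee_sid),
    sale_date    TEXT NOT NULL,
    amount       REAL NOT NULL
);
""")

def current_sid(code):
    """Return the surrogate key of the employee's current dimension row."""
    row = conn.execute(
        "SELECT employee_sid FROM dim_employee "
        "WHERE employee_code = ? AND valid_to IS NULL", (code,)).fetchone()
    return row[0] if row else None

def assign_unit(code, unit, as_of):
    """Close the current row (if any) and open a new row with a new SID."""
    conn.execute(
        "UPDATE dim_employee SET valid_to = ? "
        "WHERE employee_code = ? AND valid_to IS NULL", (as_of, code))
    conn.execute(
        "INSERT INTO dim_employee (employee_code, business_unit, valid_from) "
        "VALUES (?, ?, ?)", (code, unit, as_of))

def record_turnover(code, sale_date, amount):
    """Store a fact row under the SID that is current at load time (a simplification)."""
    conn.execute("INSERT INTO fact_turnover VALUES (?, ?, ?)",
                 (current_sid(code), sale_date, amount))

assign_unit("E1", "BU1", "2002-01-01")
record_turnover("E1", "2002-03-15", 100.0)   # hypothetical amount
assign_unit("E1", "BU2", "2002-06-02")       # E1 moves: new row, new SID
record_turnover("E1", "2002-07-01", 250.0)   # hypothetical amount

turnover_by_unit = dict(conn.execute("""
    SELECT d.business_unit, SUM(f.amount)
    FROM fact_turnover f JOIN dim_employee d USING (employee_sid)
    GROUP BY d.business_unit
""").fetchall())
print(turnover_by_unit)
```

Because each fact row points at the SID that was current when it was loaded, the old turnover stays with 'BU1' and the new turnover goes to 'BU2', even though the natural key 'E1' never changed.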
14C) When creating a dimension table in a data warehouse, we generally create the table with a system-generated key to uniquely identify a row in the dimension. This key is also known as a surrogate key. The surrogate key is used as the primary key in the dimension table. The surrogate key will also be placed in the fact table, and a foreign key will be defined between the two tables. When you ultimately join the data, it will join just as any other join within the database.
14D) A surrogate key is to a data warehouse what a primary key is to an OLTP source. It is used to uniquely identify a record in dimension tables. It provides the solution for the critical column problem.
14E) A surrogate key is a system-generated artificial primary key value; e.g., any candidate key can be considered as a surrogate key.
14F) A surrogate key is a unique identification key. It is like an artificial or alternative key to the production key: the production key may be an alphanumeric or composite key, but the surrogate key is always a single numeric key. Assume the production key is an alphanumeric field. If you create an index on this field it will occupy more space, so it is not advisable to join or index on it, because data warehouse fact tables generally hold historical data and are linked with many dimension tables. If the key is a numerical field, performance is high.
14G) A surrogate key in a data warehouse is more than just a substitute for a natural key. In a data warehouse, a surrogate key is a necessary generalization of the natural production key and is one of the basic elements of data warehouse design.
14F) A surrogate key is a system-generated key. It is mainly used for critical columns in the DWH. Here a critical column means a column that is updated in the OLTP systems feeding the DWH.
14G) Surrogate keys are the keys that join dimension tables and the fact table.
14H) A surrogate key is the solution for critical column problems. For example, when a customer purchases different items in different locations, we have to maintain historical data. By using a surrogate key we can introduce a new row in the data warehouse to maintain historical data.
15) What is a linked cube?
15A) A cube can be partitioned in 3 ways: replicated, transparent and linked. In a linked cube the data cells can be linked to another analytical database. If an end user clicks on a data cell, they are actually linking through to another analytic database.
15B) A linked cube is one in which a subset of the data can be analyzed in great detail. The linking ensures that the data in the cubes remains consistent.
16) Partitioning a cube?
16A) Partitioning a cube is mainly used for optimization. For example, you may have 5 GB of data for a report; you can specify a size of 2 GB for a cube, so that if the cube exceeds 2 GB a second cube is automatically created to store the data.
17) What is meant by metadata in context of a data warehouse?
17A) In the context of a data warehouse, metadata means information about the data. This information is stored in the designer repository.
17B) Metadata is data about data. A business analyst or data modeler usually captures information about data in a data dictionary, a.k.a. metadata: the source (where and how the data originated), the nature of the data (char, varchar, nullable, existence, valid values, etc.) and the behavior of the data (how it is modified or derived, and its life cycle). Metadata is also present at the data mart level and for subsets, facts and dimensions, the ODS, etc. For a DW user, metadata provides vital information for analysis / DSS.
17C) Metadata is data about data, including things such as name, location and length. We can store metadata in the data warehouse.
18) What is incremental loading? 2. What i...
18A) Incremental loading means loading the ongoing changes in the OLTP. An aggregate table contains the [measure] values, aggregated/grouped/summed up to some level of hierarchy.
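As a rough illustration of incremental loading, the sketch below copies only rows that changed since the last load, using the newest updated_at value already in the warehouse as a watermark. It uses Python's built-in sqlite3 module; the orders and orders_copy tables, the updated_at column and the sample rows are assumptions for the sketch, and real ETL tools usually offer their own change-capture mechanisms.

```python
import sqlite3

src = sqlite3.connect(":memory:")   # stands in for the OLTP source
dwh = sqlite3.connect(":memory:")   # stands in for the warehouse

src.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
dwh.execute("CREATE TABLE orders_copy (order_id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")

def incremental_load():
    """Copy only the source rows changed since the newest row already loaded."""
    watermark = dwh.execute(
        "SELECT COALESCE(MAX(updated_at), '') FROM orders_copy").fetchone()[0]
    changed = src.execute(
        "SELECT order_id, amount, updated_at FROM orders WHERE updated_at > ?",
        (watermark,)).fetchall()
    # INSERT OR REPLACE overwrites rows whose order_id already exists.
    dwh.executemany("INSERT OR REPLACE INTO orders_copy VALUES (?, ?, ?)", changed)
    return len(changed)

src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 10.0, "2024-01-01"), (2, 20.0, "2024-01-02")])
first_run = incremental_load()      # initial load: both rows

src.execute("UPDATE orders SET amount = 25.0, updated_at = '2024-01-03' WHERE order_id = 2")
src.execute("INSERT INTO orders VALUES (3, 30.0, '2024-01-03')")
second_run = incremental_load()     # only the updated row and the new row

print(first_run, second_run)
```

The watermark comparison relies on ISO-formatted date strings sorting in chronological order; a source without a reliable change timestamp would need a different way to detect changes.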
18B) Please learn to spell incremental and cross reference first! Or at least use a spell check!
19) What are the possible data marts in Retail sales....?
19A) Product information, sales information.
20) What is the main difference between schema in RDBMS and schemas in Data Warehouse....?
20A) RDBMS schema:
* Used for OLTP systems
* Traditional and old schema
* Normalized
* Difficult to understand and navigate
* Cannot solve extract and complex problems
* Poorly modeled
DWH schema:
* Used for OLAP systems
* New generation schema
* Denormalized
* Easy to understand and navigate
* Extract and complex problems can be easily solved
* Very good model
20B) A schema is nothing but the systematic arrangement of tables. In OLTP it will be normalized. In a data warehouse it will be denormalized.
20C) The difference depends on the context. Technically, if Oracle is used, a schema is a "user". In that context there is no difference between the schemas in OLTP or ROLAP. Although denormalized/normalized tables are given as examples above, that is not the difference.
20D) RDBMS: normalized. Data warehouse: denormalized.
20E) Difference between OLTP and OLAP:
OLTP schema:
* Normalized
* More transactions
* Less time for query execution
* More users
* Has insert, delete and update transactions
OLAP (DWH) schema:
* Denormalized
* Fewer transactions
* Fewer users
* More time for query execution
* Will not have many inserts, deletes and updates
21) What are the various ETL tools in the market?
21A) Informatica, Ascential DataStage, Ab Initio.
21B) Various ETL tools used in the market are: Informatica, DataStage, Oracle Warehouse Builder, Ab Initio, Data Junction.
21C) 1. Informatica PowerCenter 2. Ascential DataStage 3. ESS Base Hyperion 4. Ab Initio 5. BO Data Integrator 6. SAS ETL 7. MS DTS 8. Oracle OWB 9. Pervasive Data Junction 10. Cognos Decision Stream
21D) ETL tools by different vendors: Informatica, Ascential DataStage, Ab Initio.
21E) Informatica, DataStage, MS-SQL DTS (Integration Services 2005), Ab Initio, SQL Loader, Sunopsis, Oracle Warehouse Builder, Data Junction, Data Integrator (Business Objects).
21F) Has anyone come across the ETL tool "Sunopsis"? If not, please check this URL. It is amazing: http://www.sunopsis.com
22) What is Dimensional Modeling?
22A) In dimensional modeling, data is stored in two kinds of tables: fact tables and dimension tables. A fact table contains fact data, e.g. sales, revenue, profit, etc. A dimension table contains dimensional data such as product id, product name, product description, etc.
22B) Dimensional modeling is a design concept used by many data warehouse designers to build their data warehouse. In this design model all the data is stored in two types of tables: fact tables and dimension tables. The fact table contains the facts/measurements of the business, and the dimension table contains the context of the measurements, i.e., the dimensions on which the facts are calculated.
23) Why is Data Modeling Important?
Data modeling is probably the most labor-intensive and time-consuming part of the development process. Why bother, especially if you are pressed for time? A common response by practitioners who write on the subject is that you should no more build a database without a model than you should build a house without blueprints.
The goal of the data model is to make sure that all the data objects required by the database are completely and accurately represented. Because the data model uses easily understood notations and natural language, it can be reviewed and verified as correct by the end users.
The data model is also detailed enough to be used by the database developers as a "blueprint" for building the physical database. The information contained in the data model will be used to define the relational tables, primary and foreign keys, stored procedures, and triggers. A poorly designed database will require more time in the long term. Without careful planning you may create a database that omits data required to create critical reports, produces results that are incorrect or inconsistent, and is unable to accommodate changes in the user's requirements.
22C) Steps In Building the Data Model
While the ER model lists and defines the constructs required to build a data model, there is no standard process for doing so. Some methodologies, such as IDEF1X, specify a bottom-up development process where the model is built in stages. Typically, the entities and relationships are modeled first, followed by key attributes, and then the model is finished by adding non-key attributes. Other experts argue that in practice a phased approach is impractical because it requires too many meetings with the end users. The sequence used for this document is:
Identifying data objects and relationships
Drafting the initial ER diagram with entities and relationships
Refining the ER diagram
Adding key attributes to the diagram
Adding non-key attributes
Diagramming generalization hierarchies
Validating the model through normalization
Adding business and integrity rules to the model
22D) Dimensional modeling is a logical design technique that seeks to present the data in a standard, intuitive framework that allows for high-performance access. It is inherently dimensional, and it adheres to a discipline that uses the relational model with some important restrictions. Every dimensional model is composed of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multipart key in the fact table.
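The fact-table-with-multipart-key layout described in 22D can be sketched as follows. This is a minimal illustration with Python's built-in sqlite3 module; the table names (fact_sales, dim_product, dim_date), columns and sample rows are assumptions for the sketch and do not come from any particular warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each dimension table has a single-part primary key.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);

-- The fact table's multipart key is made of the dimension keys.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    revenue     REAL,
    PRIMARY KEY (product_key, date_key)
);

INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_date    VALUES (20240115, 2024, 'Q1'), (20240420, 2024, 'Q2');
INSERT INTO fact_sales  VALUES (1, 20240115, 100.0), (1, 20240420, 150.0), (2, 20240115, 70.0);
""")

# A typical star-schema query: join the fact table to a dimension and aggregate.
revenue_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(revenue_by_product)
```

Each query touches the fact table once and joins outward to whichever dimensions it needs, which is why this arrangement is often drawn as a star.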
22E) It's a process or technique of designing a database model.
22F) A centralized table, called the fact table, connected to multiple dimension tables is called dimensional modeling or a star schema.
22G) A systematic arrangement of fact and dimension tables is called a schema; designing a schema in a data warehouse / data mart is known as dimensional modeling.
22H) Dimensional modeling is a modeling technique used in OLAP systems. Here there is one fact table surrounded by different dimensions.
23) What is VLDB?
23A) Very large database.
23B) The perception of what constitutes a VLDB continues to grow. A one-terabyte database would normally be considered to be a VLDB.
23C) If a database is too large to back up within the available time frame, then it's a VLDB.
23D) VLDB stands for Very Large Database; any database that is too large (normally more than 1 TB) is considered a VLDB.
23E) Very Large Database (VLDB): the term is sometimes used to describe databases occupying magnetic storage in the terabyte range and containing billions of table rows. Typically, these are decision support systems or transaction processing applications serving large numbers of users.
24) What is real time data-warehousing?
24A) Real-time data warehousing is a combination of two things: 1) real-time activity and 2) data warehousing. Real-time activity is activity that is happening right now. The activity could be anything, such as the sale of widgets. Once the activity is complete, there is data about it. Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly. In other words, real-time data warehousing is a framework for deriving information from data as the data becomes available.
24B) A real-time data warehouse provides live data for DSS (it may not be 100% up to the moment; some latency will be there). The data warehouse has access to the OLTP sources, and data is loaded from the source to the target not daily or weekly, but perhaps every 10 minutes through replication, log shipping or something like that. SAP BW provides a real-time DW; with the help of the extended star schema, source data is shared.
24C) In real-time data warehousing, your warehouse contains completely up-to-date data and is synchronized with the source systems that provide the source data. In near-real-time data warehousing, there is a minimal delay between source data being generated and being available in the data warehouse. Therefore, if you want to achieve real-time or near-real-time updates to your data warehouse, you'll need to do three things:
Reduce or eliminate the time taken to get new and changed data out of your source systems.
Eliminate, or reduce as much as possible, the time required to cleanse, transform and load your data.
Reduce as much as possible the time required to update your aggregates.
Starting with version 9i, and continuing with the latest 10g release, Oracle has gradually introduced features into the database to support real-time and near-real-time data warehousing. These features include:
Change Data Capture
External tables, table functions, pipelining, and the MERGE command
Fast refresh materialized views
24D) Real-time data warehousing means a combination of heterogeneous databases used for query and analysis, decision making and reporting.
25) What is a lookup table?
25A) When a table is used to check for the presence of some data prior to loading some other data, or the same data, into another table, the table is called a lookup table.
25B) When a value for a column in the target table is looked up from another table apart from the source tables, that table is called the lookup table.
25C) A lookup is used when we want to get a related value from some other table based on a particular value. Suppose table A has two columns, emp_id and name, and table B has emp_id and address. In the target table we want emp_id, name and address. We take table A as the source and table B as the lookup table; by matching on emp_id we get the result with three columns: emp_id, name, address.
25D) A lookup table is nothing but a "lookup": it gives values to a referencing table (it is a reference). It is used at run time, and it saves joins and space in terms of transformations. For example, a lookup table called states provides the actual state name ('Texas') in place of TX in the output.
25E) Based on responsibility, how to protect/secure/hide even lookup values such as meaning?
25G) A reference table can otherwise be called a lookup table.
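The emp_id example from 25C can be sketched in plain Python. The sample rows and the name lookup_b are made up for illustration; an ETL tool would normally do the same matching with its own lookup transformation.

```python
# Source table A: emp_id, name (hypothetical sample rows)
source_a = [
    {"emp_id": 1, "name": "Asha"},
    {"emp_id": 2, "name": "Ravi"},
    {"emp_id": 3, "name": "Mei"},
]

# Lookup table B: emp_id, address (hypothetical sample rows)
lookup_b_rows = [
    {"emp_id": 1, "address": "12 Hill Rd"},
    {"emp_id": 2, "address": "9 Lake St"},
]

# Index the lookup table by emp_id so each source row needs one dictionary probe.
lookup_b = {row["emp_id"]: row["address"] for row in lookup_b_rows}

# Build the target rows: emp_id, name, address.
# A source row with no match in B gets address None.
target = [
    {"emp_id": row["emp_id"], "name": row["name"], "address": lookup_b.get(row["emp_id"])}
    for row in source_a
]

for row in target:
    print(row)
```

Employee 3 has no row in the lookup table, so its address stays empty; whether to reject such rows or load them with a default is a design decision for the load.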
25H) In DW terminology the dimension table is also called a lookup table (specifically in Business Objects). Since the index key in the fact table references the particular dimension table, it is also called a lookup table.
25I) The lookup table provides detailed information about the attributes. For example, the lookup table for the quarter attribute would include a list of all the quarters available in the data warehouse, i.e., the first quarter of 2001 may be represented as "Q1 2001" or "2001 Q1".
26) What is a general purpose scheduling tool?
26A) The general purpose of a scheduling tool may be cleansing and loading data at a specific given time.
26B) The basic purpose of the scheduling tool in a DW application is to streamline the flow of data from source to target at a specific time or based on some condition.
27) What type of indexing mechanism do we need to use for a typical data warehouse?
27A) Bitmap index.
27B) Function index, B-tree index, partition index, hash index, etc.
27C) On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap and/or the other types of clustered/non-clustered, unique/non-unique indexes. To my knowledge, SQL Server does not support bitmap indexes; Oracle does.
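To show why bitmap indexes suit columns with few distinct values, here is a toy sketch in plain Python. It is only an illustration of the idea, not how any database implements it: each distinct value maps to an integer used as a bit string, with bit i set when row i holds that value. The function names and sample rows are assumptions for the sketch.

```python
from collections import defaultdict

def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int whose bit i is set when row i has that value."""
    index = defaultdict(int)
    for i, row in enumerate(rows):
        index[row[column]] |= 1 << i
    return index

def matching_rows(bitmap):
    """Return the row numbers whose bits are set in `bitmap`."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical fact rows with two low-cardinality columns.
rows = [
    {"region": "EAST", "channel": "WEB"},
    {"region": "WEST", "channel": "STORE"},
    {"region": "EAST", "channel": "STORE"},
    {"region": "EAST", "channel": "WEB"},
]

by_region = build_bitmap_index(rows, "region")
by_channel = build_bitmap_index(rows, "channel")

# WHERE region = 'EAST' AND channel = 'WEB' becomes a single bitwise AND.
east_web = by_region["EAST"] & by_channel["WEB"]
hits = matching_rows(east_web)
print(hits)
```

Combining predicates is a bitwise AND or OR over the bitmaps, which is cheap when each column has only a handful of distinct values and the queries filter on several such columns at once.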
27D) It generally depends on the data you have in the table. If a particular column has few distinct values, it is always better to build a bitmap index on it rather than another type. On dimension tables we generally have the other kinds of indexes.
27E) That is based on the requirements and the size of your data mart / data warehouse. Most data warehouses use bitmap indexes.
28) Explain the advantages of RAID 1, 1/0, and 5. What type of RAID setup would you put your TX logs
28A) RAID 0: makes several physical hard drives look like one hard drive. No redundancy but very fast. May be used for temporary spaces where loss of the files will not result in loss of committed data.
RAID 1: mirroring. Each hard drive in the drive array has a twin. Each twin has an exact copy of the other twin's data, so if one hard drive fails, the other is used to pull the data. RAID 1 is half the speed of RAID 0, and the read and write performance is good.
RAID 1/0: striped (RAID 0), then mirrored (RAID 1). Similar to RAID 1. Sometimes faster than RAID 1. Depends on vendor implementation.
RAID 5: great for read-only systems. Write performance is 1/3rd that of RAID 1, but read performance is the same as RAID 1. RAID 5 is great for DW but not good for OLTP.
Hard drives are cheap now, so I always recommend RAID 1.
29) What is a Data Warehousing?
29A) Data warehousing is the process of creating, populating and querying a data warehouse. It includes a number of discrete technologies, like identifying sources, the process of ECCD, and ETL, which includes data cleansing, data transformation and data loading to targets.
29B) A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data to enable decision making across disparate groups of users.
29C) A data warehouse is a repository containing a subject-oriented, integrated, time-variant and non-volatile collection of data, used for a company's decision support system requirements.
29D) Data warehousing is a subject-oriented, authoritative, integrated historical database reflective of changes over meaningful time periods in order to facilitate query and analysis for useful management decision making.
29F) A data warehouse contains a collection of historical, integrated, non-volatile data, which is used for analysis and for developing forecasting reports.
30) What does the level of granularity of a fact table signify?
30A) It describes the level of detail held in each fact row, which in turn drives the amount of space the database requires.
30B) The level of granularity indicates the extent of aggregation that can take place on the fact data. Finer granularity implies more aggregation potential, and vice versa.
30C) In simple terms, the level of granularity defines the extent of detail. As an example, consider geographical granularity. We may analyze data at the levels of COUNTRY, REGION, TERRITORY, CITY, and STREET. In this case, the highest (finest) level of granularity is STREET.
30D) The level of granularity is the upper or lower level of the hierarchy down to which we can see or drill the data in the fact table.
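The geographical example in 30C can be sketched in a few lines of Python. This is only an illustration under assumed data: the rows, the column order, and the `roll_up` helper are hypothetical and not part of any tool mentioned here. Facts are stored at the finest grain (street) and can be summed up to any coarser level, but never split below the stored grain.

```python
from collections import defaultdict

# Hypothetical fact rows stored at the finest grain (one row per street).
# Columns: country, region, city, street, sales_amount
FACT_ROWS = [
    ("IN", "South", "Chennai", "Street A", 120),
    ("IN", "South", "Chennai", "Street B", 80),
    ("IN", "South", "Bangalore", "Street C", 200),
    ("IN", "North", "Delhi", "Street D", 150),
]

LEVELS = ["country", "region", "city", "street"]


def roll_up(rows, level):
    """Sum sales_amount at the given hierarchy level (a coarser grain)."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(int)
    for row in rows:
        key = row[:depth]        # keep only the columns down to `level`
        totals[key] += row[-1]   # the last column is the measure
    return dict(totals)


by_city = roll_up(FACT_ROWS, "city")
by_region = roll_up(FACT_ROWS, "region")
by_country = roll_up(FACT_ROWS, "country")
```

If the fact table had been stored at CITY level only, the street-level totals could not be recovered, which is why the grain is chosen before anything else.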
30E) Granularity is the level at which measures and metrics are represented. The lowest level is called detailed data and the highest level is called summary data. The appropriate level for a fact table depends on the project.
31) What is data mining?
31A) Data mining is a process of extracting hidden trends from a data warehouse. For example, an insurance data warehouse can be mined to find the highest-risk people to insure in a certain geographical area.
31B) In its simplest definition, data mining is a way to discover new meaning in data.
31C) Data mining is the concept of deriving or discovering hidden, unexpected information from existing data.
31D) Data mining is the non-trivial process of identifying valid, potentially useful, and ultimately understandable patterns in data.
31E) A data warehouse typically supplies the answer to a question like "Who is buying our products?". A data mining approach would seek answers to questions like "Who is NOT buying our products?"
32) What is a degenerate dimension table?
32A) Dimension values that are stored in the fact table are called degenerate dimensions. These dimensions do not have dimension tables of their own.
32B) An attribute in the fact table that is neither a fact nor a key value.
32C) In simple terms, a column in a fact table that does not map to any dimension and is not a measure column. For example, Invoice_no and Invoice_line_no in a fact table are degenerate dimension columns, provided you do not have a dimension called Invoice.
32D) Degenerate dimensions: values in a fact table that are neither dimension keys nor measures are called degenerate dimensions. Ex: invoice id, empno.
33) How do you load the time dimension?
33A) In a data warehouse we load the time dimension manually.
33B) Every data warehouse maintains a time dimension. It is kept at the most granular level at which the business runs (ex: week day, day of the month, and so on). The time dimension is updated according to the data loads: a weekly process updates it every week, and a monthly process every month.
33C) The time dimension in a DWH must be loaded manually. We load data into the time dimension using PL/SQL scripts.
33D) Generally we load the time dimension using a sequential file as the source stage together with one passive stage. In the Transformer stage we manually write functions such as the Month and Year functions; for the lower level, Day, there is also a function that can be used to load the time dimension.
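The answers above load the time dimension with PL/SQL scripts or ETL-tool functions. As a rough sketch of the same idea in Python, the time dimension can simply be generated, one row per calendar day. The column names (`date_key`, `quarter`, and so on) and the function name are assumptions chosen for illustration, not a standard layout.

```python
from datetime import date, timedelta


def build_time_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": int(day.strftime("%Y%m%d")),  # e.g. 20240131
            "full_date": day.isoformat(),
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "day_of_month": day.day,
            "day_of_week": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
        })
        day += timedelta(days=1)
    return rows


rows = build_time_dimension(date(2024, 1, 1), date(2024, 12, 31))
```

Because the rows depend only on the calendar, they can be generated years ahead in a single run rather than being extracted from a source system.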
34) What is an ER diagram?
34A) ER stands for entity relationship. An ER diagram is the first step in the design of a data model, which later leads to the physical database design of either an OLTP or an OLAP database.
34B) The Entity-Relationship (ER) model was originally proposed by Peter Chen in 1976 [Chen76] as a way to unify the network and relational database views. Simply stated, the ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the Entity-Relationship diagram, which is used to visually represent data objects. Since Chen wrote his paper the model has been extended, and today it is commonly used for database design. For the database designer, the utility of the ER model is:
- It maps well to the relational model. The constructs used in the ER model can easily be transformed into relational tables.
- It is simple and easy to understand with a minimum of training. Therefore, the model can be used by the database designer to communicate the design to the end user.
- In addition, the model can be used as a design plan by the database developer to implement a data model in specific database management software.
34C) An ER diagram is an entity relationship diagram that shows the entities along with their attributes.
34D) An E-R diagram (entity relationship diagram) shows how the different database tables are related to each other, what the primary and foreign keys are, and how they relate. Building the E-R diagram is the first step of any database project.
34E) The physical and logical arrangement of the database tables and their relationships is explained by a diagram; that diagram is known as an ER diagram.
34F) ER diagramming is a modeling technique suited to OLTP systems. It captures one-to-one and many-to-many relationships.
35) Difference between Snowflake and Star Schema. In what situations is a Snowflake Schema better?
35A) Star schema and snowflake schema both serve the purpose of dimensional modeling for data warehouses. A star schema is a dimensional model with a (large) fact table and a set of (small) dimension tables. The whole set-up is totally denormalized. However, where the dimension tables are split into many tables, the schema leans slightly towards normalization (reducing redundancy and dependency), and that is the snowflake schema. The nature and purpose of the data to be fed to the model is the key to the question of which is better.
35B) A star schema contains dimension tables mapped around one or more fact tables. It is a denormalized model. There is no need for complicated joins, and queries return results quickly.
Snowflake schema: it is the normalized form of a star schema. It contains deeper joins, because the tables are split into many pieces. We can easily make modifications directly in the tables. We have to use complicated joins, since there are more tables, so there will be some delay in processing queries.
35C) Star schema means a centralized fact table surrounded by different dimensions. Snowflake means that, in the same star schema, dimensions are split into further dimensions. A star schema contains highly denormalized data; a snowflake schema contains partially normalized data. A star schema cannot have parent tables, but a snowflake schema contains parent tables.
Why go for a star schema: 1) it needs fewer joins 2) it is a simpler database 3) it supports drilling up.
Why go for a snowflake schema:
Sometimes we need to provide separate dimensions from existing dimensions; in that case we go for a snowflake schema. Disadvantage of snowflake: query performance is lower because more joins are involved.
35D) Star schema and snowflake schema both serve the purpose of dimensional modeling for data warehouses. A star schema is a dimensional model with a (large) fact table and a set of (small) dimension tables. The whole set-up is totally denormalized. However, where the dimension tables are split into many tables, the schema leans slightly towards normalization (reducing redundancy and dependency), and that is the snowflake schema. The nature and purpose of the data to be fed to the model is the key to the question of which is better.
36) What is a CUBE in data warehousing concept?
36A) Cubes are logical representations of multidimensional data. The edges of the cube contain dimension members and the body of the cube contains data values.
36B) A cube is a logical schema which contains facts and dimensions.
36C) Cubes are multi-dimensional views of a DW or of data marts. They are designed in a logical way to drill and to slice and dice. Every part of the cube is a logical representation of a combination of fact and dimension attributes.
37) What is ODS?
37A) ODS stands for Operational Data Store (not Online Data Storage). It is used to maintain and store the current, up-to-date information and the transactions of the source databases, taken from the OLTP system. It is directly connected to the source database systems instead of to the staging area. It is further connected to the data warehouse and can moreover be treated as part of the data warehouse database.
37B) ODS stands for Operational Data Store. It is the final integration point in the ETL process before loading the data into the Data Warehouse.
37C) ODS stands for Operational Data Store. It contains near-real-time data. In a typical data warehouse architecture, an ODS is sometimes used for analytical reporting as well as a source for the data warehouse.
37D) An Operational Data Store is a hybrid structure that has some aspects of a data warehouse and other aspects of an operational system. It contains integrated data. It can support DSS processing and can also support high transaction processing. It is placed between the warehouse and the web to support web users.
37E) An ODS is the form that a data warehouse takes in the operational environment. Operational data stores can be updated, provide rapid and consistent response times, and contain only a limited amount of historical data.
37F) An Operational Data Store presents a consistent picture of the current data stored and managed by a transaction processing system. As data is modified in the source system, a copy of the changed data is moved into the ODS. Existing data in the ODS is updated to reflect the current status of the source system.
37G) ODS means Operational Data Store. It is used to store current data arriving through transactional web applications, SAP, and MQ Series. Current data means data within a particular date range; an ODS typically holds 30 to 90 days of data.
37H) An Operational Data Store is a collection of data in support of an organization's need for up-to-the-second, operational, integrated, collective information. An ODS is a purely operational construct to address the operational needs of a corporation. While loading data from staging to the ODS we perform data scrubbing and data validation.
38) What are conformed dimensions?
38A) They are dimension tables in a star schema data mart that adhere to a common structure, and therefore allow queries to be executed across star schemas. For example, the Calendar dimension is commonly needed in most data marts. By making this Calendar dimension adhere to a single structure, regardless of which data mart in your organization uses it, you can query by date/time from one data mart to another.
38B) Conformed dimensions are dimensions which are common to several cubes. (Cubes are the schemas that contain fact and dimension tables.) Suppose Cube-1 contains F1, D1, D2, D3 and Cube-2 contains F2, D1, D2, D4 as their facts and dimensions.
Here D1, D2 are the Conformed Dimensions
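The Cube-1/Cube-2 example in 38B amounts to a set intersection. A minimal sketch in Python, using the names from the answer (the `cubes` dictionary and `shared_dimensions` helper are made up for illustration):

```python
# Each cube (star schema) is described by its fact and the dimensions it uses.
cubes = {
    "CUBE-1": {"fact": "F1", "dimensions": {"D1", "D2", "D3"}},
    "CUBE-2": {"fact": "F2", "dimensions": {"D1", "D2", "D4"}},
}


def shared_dimensions(cubes):
    """Return the dimensions that appear in every cube."""
    dimension_sets = [cube["dimensions"] for cube in cubes.values()]
    return set.intersection(*dimension_sets)


conformed = shared_dimensions(cubes)
```

A shared name is only a first check. As 38A and 38E point out, the shared dimensions must also have the same structure and draw their values from the same domains to count as conformed.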
38C) If a table is used as a dimension table for more than one fact table, then that dimension table is called a conformed dimension.
38D) Conformed dimensions are dimensions which can be used in multiple star schemas. Correct me if I am wrong.
38E) Dimensions are conformed if they share one or more attributes whose values are drawn from the same domains.
38F) A dimension which is used by more than one fact table is called a conformed dimension.
38G) A conformed dimension is a single, coherent view of the same piece of data throughout the organization. The same dimension is used in all subsequent star schemas defined. This enables reporting across the complete data warehouse in a simple format.
38H) Conformed dimensions are dimensions which are common to two cubes. Say CUBE-1 contains F1, D1, D2, D3 and CUBE-2 contains F2, D1, D2, D4 as their facts and dimensions; here D1 and D2 are the conformed dimensions.
38I) If a dimension is 100% sharable across star schemas, then it is called a conformed dimension.
39) What are SCD1, SCD2, and SCD3?
39A) SCD 1: Complete overwrite.
SCD 2: Preserve all history. Add a row.
SCD 3: Preserve some history. Add an additional column for old/new.
39B) SCD Type 1: the attribute value is overwritten with the new value, obliterating the historical attribute values. For example, when the product roll-up changes for a given product, the roll-up attribute is merely updated with the current value.
SCD Type 2: a new record with the new attributes is added to the dimension table. Historical fact table rows continue to reference the old dimension key with the old roll-up attribute; going forward, the fact table rows will reference the new surrogate key with the new roll-up, thereby perfectly partitioning history.
SCD Type 3: attributes are added to the dimension table to support two simultaneous roll-ups, perhaps the current product roll-up as well as "current version minus one", or the current version and the original.
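The three SCD types described in 39A and 39B can be sketched on an in-memory product dimension. This is only an illustration: the column names (`product_key`, `is_current`, `previous_category`, and so on) and the function names are assumptions, and a real load would run against database tables instead of Python lists.

```python
from datetime import date


def scd_type1(rows, product_id, new_category):
    """Type 1: overwrite the attribute in place; history is lost."""
    for row in rows:
        if row["product_id"] == product_id:
            row["category"] = new_category


def scd_type2(rows, product_id, new_category, change_date):
    """Type 2: expire the current row and add a row with a new surrogate key."""
    next_key = max(row["product_key"] for row in rows) + 1
    for row in rows:
        if row["product_id"] == product_id and row["is_current"]:
            row["is_current"] = False
            row["end_date"] = change_date
    rows.append({
        "product_key": next_key,        # new surrogate key
        "product_id": product_id,       # business key stays the same
        "category": new_category,
        "start_date": change_date,
        "end_date": None,
        "is_current": True,
    })


def scd_type3(rows, product_id, new_category):
    """Type 3: keep only the previous value, in an extra column."""
    for row in rows:
        if row["product_id"] == product_id:
            row["previous_category"] = row["category"]
            row["category"] = new_category


t1 = [{"product_id": "P1", "category": "Snacks"}]
scd_type1(t1, "P1", "Health Food")

t2 = [{"product_key": 1, "product_id": "P1", "category": "Snacks",
       "start_date": date(2023, 1, 1), "end_date": None, "is_current": True}]
scd_type2(t2, "P1", "Health Food", date(2024, 6, 1))

t3 = [{"product_id": "P1", "category": "Snacks", "previous_category": None}]
scd_type3(t3, "P1", "Health Food")
```

After the Type 2 change, old fact rows still point at `product_key` 1 and new fact rows would point at `product_key` 2, which is how history stays partitioned.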
39C) SCD: dimensions whose values change only rarely are called slowly changing dimensions. There are mainly three types:
1) SCD1: the old values are overwritten by the new values
2) SCD2: additional records are created
3) SCD3: only the previous and the most recent values are maintained
SCD2 in turn has three variants: 1) Versioning 2) Flag value 3) Effective date range
Versioning: updated dimension rows are inserted into the target along with a version number; new dimension rows are inserted into the target along with a primary key.
Flag value: updated dimension rows are inserted into the target with a flag of 0, and new dimension rows are inserted with a flag of 1.
40) What is Normalization, First Normal Form, Second Normal Form, and Third Normal Form?
40A) Normalization: the process of decomposing tables to eliminate data redundancy is called normalization.
1NF: the table should contain scalar (atomic) values.
2NF: the table should be in 1NF + no partial functional dependencies.
3NF: the table should be in 2NF + no transitive dependencies.
40B) 2NF: the table should be in 1NF + no non-key attribute should depend on a subset of the key (example: {part, supplier}, sup address).
3NF: the table should be in 2NF + no non-key attribute should depend on another non-key attribute (example: {part}, warehouse name, warehouse addr). The attributes in braces are the primary key.
4NF, 5NF: for multi-valued dependencies (essentially to describe many-to-many relations).
40C) Normalization can be defined as segregating a table into two different tables so as to avoid duplication of values. Normalization is a step-by-step process of removing redundancies and dependencies of attributes in a data structure. The condition of the data at the completion of each step is described as a “normal form”. Needs for normalization: Improves database design.
Ensures minimum redundancy of data. Reduces the need to reorganize data when the design is modified or enhanced. Removes anomalies for database activities.
First normal form:
· A table is in first normal form when it contains no repeating groups.
· The repeating columns or fields in an unnormalized table are removed from the table and put into tables of their own.
· Such a table becomes dependent on the parent table from which it is derived.
· The key to this table is called a concatenated key, with the key of the parent table forming a part of it.
Second normal form:
· A table is in second normal form if all its non-key fields are fully dependent on the whole key.
· This means that each field in a table must depend on the entire key.
· Those that do not depend upon the combination key are moved to another table on whose key they depend.
· Structures which do not contain combination keys are automatically in second normal form.
Third normal form:
· A table is said to be in third normal form if all the non-key fields of the table are independent of all other non-key fields of the same table.
40D) Normalization is the process of removing redundancy and inconsistency. There are mainly three normal forms. First normal form: contains only atomic values. Second normal form: the non-key values must depend on the whole primary key. Third normal form: no transitive dependencies.
40E) Normalization: the process of efficiently organizing data in a database. There are two goals of the normalization process: 1. Eliminate redundant data. 2. Ensure data dependencies make sense (only storing related data in a table).
First Normal Form: sets the very basic rules for an organized database. 1. Eliminate duplicate columns from the same table. 2. Create separate tables for each group of related data and identify each row with a unique column or set of columns.
Second Normal Form: further addresses the concept of removing duplicative data. 1. Remove subsets of data that apply to multiple rows of a table and place them in separate tables. 2. Create relationships between these new tables and their predecessors through the use of foreign keys.
Third Normal Form: 1. Remove columns that are not dependent upon the primary key.
Fourth Normal Form: 1. A relation is in 4NF if it has no multi-valued dependencies.
These normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database.
41) What is ETL?
41A) ETL is extraction, transformation, and loading. ETL technology is used to extract information from the source database and load it into the target database, with the necessary transformations done in between.
41B) ETL is short for Extract, Transform and Load. It is a data integration function that involves extracting data from outside sources, transforming it to fit business needs, and ultimately loading it into a data warehouse.
41C) ETL is an abbreviation for "Extract, Transform and Load". This is the process of extracting data from operational or external data sources, transforming the data (which includes cleansing, aggregation, summarization, integration, and basic transformation), and loading the data into some form of data warehouse.
41D) Extraction, Transformation, Loading.
41E) E: Extraction of data from homogeneous/heterogeneous sources.
T: Transforming/modifying the source data by applying transformations such as Filter, Expression, Router, Joiner, Union, or Lookup.
L: Loading the transformed data into the corresponding target tables.
42) What are non-additive facts?
42A) Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table. Example: temperature, bill number, etc.
42B) A fact table typically has two types of columns: those that contain numeric facts (often called measurements), and those that are foreign keys to dimension tables. A fact table contains either detail-level facts or facts that have been aggregated. Fact tables that contain aggregated facts are often called summary tables. A fact table usually contains facts with the same level of aggregation. Though most facts are additive, they can also be semi-additive or non-additive. Additive facts can be aggregated by simple arithmetical addition. A common example of this is sales. Non-additive facts cannot be added at all. An example of this is averages. Semi-additive facts can be aggregated along some of the dimensions and not along others. An example of this is inventory levels, where you cannot tell what a level means simply by looking at it.
42C) If the columns of a fact table cannot be aggregated, they are called non-additive facts.
42D) Non-additive: non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
43) How are the Dimension tables designed?
43A) Most dimension tables are designed using normalization principles up to 2NF. In some instances they are further normalized to 3NF.
43B) Find where data for this dimension are located. Figure out how to extract this data. Determine how to maintain changes to this dimension (see more on this in the next section). Change fact table and DW population routines.
44) Why should you put your data warehouse on a different system than your OLTP system?
44A) OLTP stands for on-line transaction processing. OLTP systems are used to store only daily transactions, as changes have to be made in as few places as possible. An OLTP system does not hold the historical data of the organization; the data warehouse contains the historical information about the organization.
44B) A data warehouse is a part of OLAP (On-Line Analytical Processing). It is the source from which BI tools fetch data for analytical, reporting, or data mining purposes. It generally contains data covering the whole life cycle of the company or product. A DWH contains historical, integrated, denormalized, subject-oriented data. An OLTP system, on the other hand, contains data that is generally limited to the last couple of months, or a year at most. The nature of data in OLTP is current, volatile, and highly normalized. Since the two systems are different in nature and functionality, we should always keep them on different systems.
44C) A DW is typically used for intensive querying. Since the primary responsibility of an OLTP system is to faithfully record ongoing transactions (inserts/updates/deletes), these operations would be considerably slowed down by the heavy querying that the DW is subjected to.
45) What is a Fact Table?
45A) A fact table is the table in a data warehouse that holds the measurements of a business process. Its entries are described by dimension tables, which contain the data from which dimensions are created.
45B) A fact table in a data warehouse describes the transaction data. It contains characteristics and key figures.
45C) A fact table is a collection of facts and foreign key relations to the dimensions.
45D) A fact table contains the measurements, metrics, or facts of a business process. If your business process is "Sales", then a measurement of this business process such as "monthly sales number" is captured in the fact table. The fact table also contains the foreign keys for the dimension tables.
45E) A fact table contains the transaction data; it typically has fewer columns and many more rows than the dimension tables. Along with the data it also includes the foreign keys of the dimension tables attached to it.
45F) A fact table contains the keys (primary key, foreign keys) of the related dimension tables and the measures that are based on those keys.
45G) A fact table has numeric (measure) columns, plus values that refer to columns in the dimension tables.
45H) A fact table represents the measurements as well as the foreign keys of the dimension tables. If I am wrong, please inform me.
46) What are Semi-additive and factless facts and in which scenario will you use such kinds of fact table?
46A) Semi-additive: semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others. For example, suppose Current_Balance and Profit_Margin are the facts. Current_Balance is a semi-additive fact: it makes sense to add balances up across all accounts (what is the total current balance for all accounts in the bank?), but it does not make sense to add them up through time (adding up all current balances for a given account for each day of the month does not give us any useful information).
46B) A factless fact table captures the many-to-many relationships between dimensions, but contains no numeric or textual facts. They are often used to record events or coverage information. Common examples of factless fact tables include:
- Identifying product promotion events (to determine promoted products that didn't sell)
- Tracking student attendance or registration events
- Tracking insurance-related accident events
- Identifying building, facility, and equipment schedules for a hospital or university
47) What is a level of Granularity of a fact table?
47A) Level of granularity means the level of detail that you put into the fact table in a data warehouse. For example, based on the design you can decide to store sales data for each transaction. The level of granularity then means how much detail you are willing to keep for each transactional fact: product sales for each individual transaction, or sales aggregated up to the minute with only that aggregate stored.
47B) It also means that we can have, for example, data aggregated for a year for a given product, and the data can be drilled down to a monthly, weekly, and daily basis. The lowest level is known as the grain. Going down to the details is granularity. The aggregated or calculated value columns go to the fact table and the detailed information goes to the dimension tables.
48) Which columns go to the fact table and which columns go the dimension table?
48A) To add on: foreign key elements along with business measures are stored in the fact table. Business measures include sales in $ amount and units (quantity sold); a date may also be a business measure in some cases. It also depends on the granularity at which the data is stored.
48B) Before the data is broken into columns it goes to the fact table; after it is broken down it goes to the dimensions.
49) What are the different methods of loading dimension tables?
49A) There are two types: insert, if the row is not yet in the dimension, and update, if it already exists.
49B) Conventional load: before loading the data, all the table constraints are checked against the data.
Direct load (faster loading): all the constraints are disabled and the data is loaded directly. Later the data is checked against the table constraints, and the bad data is not indexed.
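The insert-or-update logic from 49A can be sketched with Python's built-in `sqlite3` module. The table `dim_customer`, its columns, and the helper `load_dimension_row` are hypothetical names used only for this example; the conventional and direct load paths in 49B are Oracle loader features and are not modeled here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer ("
    "customer_id TEXT PRIMARY KEY, name TEXT, city TEXT)"
)


def load_dimension_row(conn, customer_id, name, city):
    """Update the row if the key already exists, otherwise insert it."""
    cur = conn.execute(
        "UPDATE dim_customer SET name = ?, city = ? WHERE customer_id = ?",
        (name, city, customer_id),
    )
    if cur.rowcount == 0:  # no existing row matched the key: insert it
        conn.execute(
            "INSERT INTO dim_customer (customer_id, name, city) "
            "VALUES (?, ?, ?)",
            (customer_id, name, city),
        )
        return "inserted"
    return "updated"


first = load_dimension_row(conn, "C1", "Asha", "Pune")
second = load_dimension_row(conn, "C1", "Asha", "Mumbai")
rows = conn.execute(
    "SELECT customer_id, name, city FROM dim_customer"
).fetchall()
```

The update branch overwrites the old city, so this sketch behaves like an SCD Type 1 load; keeping history would need a Type 2 style insert instead.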
49C) Conventional and direct load methods apply only to Oracle. The naming convention is not a general one applicable to other RDBMSs such as DB2 or SQL Server.
50) What are Aggregate Tables?
50A) Aggregate tables contain redundant data that is summarized from other data in the warehouse.
50B) These are tables which contain aggregated/summarized data, e.g. yearly or monthly sales information. These tables are used to reduce query execution time.
50C) An aggregate table contains a summary of existing warehouse data, grouped to certain levels of the dimensions. Retrieving the required data from the actual table, which may have millions of records, takes more time and also affects server performance. To avoid this we can aggregate the table to the required level and use that instead. This reduces the load on the database server, increases query performance, and returns results very quickly.
51) What is a dimension table?
51A) A dimension table in a data warehouse is one which contains a primary key and attributes. We call the primary keys DIMIDs (dimension IDs).
51B) A dimension table is a collection of hierarchies and categories along which the user can drill down and drill up. It contains only the textual attributes.
51C) Dimension tables are master tables through which you can extract the actual transactions. A dimension table typically has more columns and fewer rows than a fact table.
51D) A dimension table is a table which contains the business dimensions through which we analyze the business metrics.
52) What are the various reporting tools in the market?
52A) Cognos, Business Objects, MicroStrategy, Actuate
52B) 1. MS-Excel 2. Business Objects (Crystal Reports) 3. Cognos (Impromptu, Power Play) 4. MicroStrategy 5. MS Reporting Services 6. Informatica Power Analyzer 7. Actuate 8. Hyperion (BRIO) 9. Oracle Express OLAP 10. ProClarity
52C) INEA, MS-Excel, Business Objects (Crystal Reports), Cognos (Impromptu, Power Play), MicroStrategy, MS Reporting Services, Informatica Power Analyzer, Actuate, Hyperion (BRIO), Oracle Express OLAP, ProClarity, SAS
52D) Reporting tools are entirely different from OLAP tools.
OLAP tools are: 1. Cognos 2. Business Objects 3. SAS 4. Microsoft Source analyzer 5. MSTR 6. Hyperion (BRIO)
53) What is the difference between OLTP and OLAP?
53A)
OLTP                                 OLAP
====                                 ====
Current data                         Current and historical data
Short database transactions          Long database transactions
Online update/insert/delete          Batch update/insert/delete
Normalization is promoted            Denormalization is promoted
High volume transactions             Low volume transactions
Transaction recovery is necessary    Transaction recovery is not necessary
53B) OLTP is Online Transaction Processing, which uses normalized tables and online data with frequent inserts, updates, and deletes.
OLAP (Online Analytical Processing), by contrast, contains the history of the OLTP data, which is non-volatile, acts as a decision support system, and is used for creating forecasting reports.
53C) OLTP: few; OLAP: many. Joins: OLTP: many; OLAP: few.
54) What is a Star Schema?
54A) A relational database schema organized around a central table (fact table) joined to a few smaller tables (dimension tables) using foreign key references. The fact table contains raw numeric items that represent relevant business facts (price, discount values, number of units sold, dollar value, etc.).
54B) A star schema is a way of organizing the tables such that we can retrieve results from the database easily and quickly in the warehouse environment. Usually a star schema consists of one or more dimension tables around a fact table, which looks like a star; that is how it got its name.
54C) It is a way of organizing the entities such that you can retrieve results from the database easily and very quickly. Usually a star schema has one or more dimension tables linked around a fact table and looks like a star, hence the name.
54D) A single fact table with N dimension tables.
55) Why are OLTP database designs not generally a good idea for a Data Warehouse?
55A) An OLTP design is not meant to store historical information about the organization. It is used for storing the details of daily transactions, while a data warehouse is a huge store of historical information, obtained from different data marts, for making intelligent decisions about the organization.
56) Differences between star and snowflake schemas?
56A) The star schema is created when all the dimension tables link directly to the fact table. Since the graphical representation resembles a star, it is called a star schema. Note that the foreign keys in the fact table link to the primary keys of the dimension tables. This sample provides the star schema for a sales_fact for the year 1998. The dimensions created are Store, Customer, product_class, and time_by_day. The Product table links to the product_class table through the primary key and indirectly to the fact table. The fact table contains foreign keys that link to the dimension tables.
56B) The snowflake schema is a schema in which the fact table is indirectly linked to a number of dimension tables. The dimension tables are normalized to remove redundant data and partitioned into a number of dimension tables for ease of maintenance. An example of the snowflake schema is the splitting of the Product dimension into the product_category dimension and the product_manufacturer dimension.
Read more on this here: http://www.exforsys.com/content/view/1301/332/ This tutorial covers designing the dimensional model, dimensional model schemas like star schema and snowflake schema, optimizing the star schema and design of the relational database, OLAP cubes and data mining tools, security considerations, metadata, and backup and recovery plans.
56C) A star schema uses denormalized dimension tables, whereas a snowflake schema uses normalized dimensions to avoid redundancy.
56D) Star schema: a single fact table with N dimensions. Snowflake schema: any dimensions with extended dimensions are known as a snowflake schema.
56E)
Star schema                               Snowflake schema
===========                               ================
Denormalized                              Normalized
Easy to use and understand                End users may get confused
Needs a little more maintenance effort    Easier to maintain
Fast execution of queries                 Slower execution because of more joins
57) What is a snowflake schema?
57A) Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data is grouped into multiple tables instead of one large table. For example, a product dimension table in a star schema might be normalized into a products table, a product_category table, and a product_manufacturer table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance.
57B) A normalized form of the star schema is called a snowflake schema.
57C) The snowflake schema is an extension of the star schema in which each point of the star explodes into more points. The main advantage claimed for the snowflake schema is reduced disk storage and joins against smaller lookup tables, which can help some queries, although the additional joins often slow queries overall (compare 57A). The main disadvantage of the snowflake schema is the additional maintenance effort needed because of the increased number of lookup tables.
57D) Some people regard the snowflake schema as a normalized star schema, but it is really a partially normalized star schema. Partial normalization may save some disk space.
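The product example from 57A can be sketched in the same way. The following uses Python's built-in sqlite3 module; the column names, the sample rows and the `sales_by_category` helper are assumptions made for illustration, not part of any real sample database.

```python
import sqlite3

# Hypothetical snowflaked product dimension: category and manufacturer
# are moved out of the products table into their own lookup tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_category     (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product_manufacturer (manufacturer_id INTEGER PRIMARY KEY, manufacturer_name TEXT);
CREATE TABLE products (
    product_id      INTEGER PRIMARY KEY,
    product_name    TEXT,
    category_id     INTEGER REFERENCES product_category(category_id),
    manufacturer_id INTEGER REFERENCES product_manufacturer(manufacturer_id)
);
CREATE TABLE sales_fact (
    product_id INTEGER REFERENCES products(product_id),
    amount     REAL
);
INSERT INTO product_category VALUES (1, 'Snacks'), (2, 'Drinks');
INSERT INTO product_manufacturer VALUES (1, 'Acme');
INSERT INTO products VALUES (100, 'Chips', 1, 1), (101, 'Cola', 2, 1), (102, 'Nuts', 1, 1);
INSERT INTO sales_fact VALUES (100, 3.0), (101, 2.0), (102, 4.0), (100, 1.0);
""")


def sales_by_category(connection):
    """Sales per category: the fact table reaches the category through products."""
    return connection.execute("""
        SELECT c.category_name, SUM(f.amount)
        FROM sales_fact AS f
        JOIN products AS p ON p.product_id = f.product_id
        JOIN product_category AS c ON c.category_id = p.category_id
        GROUP BY c.category_name
        ORDER BY c.category_name
    """).fetchall()


print(sales_by_category(conn))
```

Each category name is stored once, which is the space saving; the price is the second join needed to reach it from the fact table.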
57E) Star schema: a single fact table with N dimensions. Snowflake schema: a schema in which any dimension has extended dimensions is known as a snowflake schema. Multiple star (galaxy): if the schema has more than one fact table, it is said to be a multiple star schema.
58) What modeling tools are available in the market?
58A)
Modeling Tool      Vendor
=============      ======
Erwin              Computer Associates
ER/Studio          Embarcadero
Power Designer     Sybase
Oracle Designer    Oracle
58B) These tools are used for data/dimension modeling: Oracle Designer, Erwin (Entity Relationship for Windows), Informatica (Cubes/Dimensions), Embarcadero, Power Designer (Sybase).
59) What are slowly changing dimensions?
59A) Dimensions that change over time are called slowly changing dimensions. For instance, a product's price changes over time, people change their names for various reasons, and country and state names may change over time. These are a few examples of slowly changing dimensions, since changes happen to them over a period of time.
59B) If the data in a dimension table changes only rarely, it is called a slowly changing dimension.
60) What are data marts?
60A) A data mart is a small subset of the data warehouse. It holds data at the business division and department level.
60B) A data mart is a focused subset of a data warehouse that deals with a single area of data (such as one department) and is organized for quick analysis.
60C) Data marts: a subset of data warehouse data used for a specific business function, whose format may be a star schema, a hypercube or a statistical sample.
60D) A data mart is a subset of the data warehouse that analyzes the data of one particular department from one particular point of view.
60E) A data mart is a small data warehouse. In general, a data warehouse is divided into small units according to business requirements. For example, an organization's data warehouse may be divided into individual data marts such as a Sales data mart, a Finance data mart, a Marketing data mart and an HR data mart. Data marts are used to improve performance during data retrieval.
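Returning to the slowly changing dimensions of question 59, the changing product price mentioned in 59A is usually handled in one of three ways, commonly called Type 1 (overwrite), Type 2 (full history) and Type 3 (current plus one previous value). The sketch below is a minimal illustration in plain Python; the row layout, field names and function names are assumptions made for the example and do not come from any particular ETL tool.

```python
from datetime import date


def apply_type1(row, new_price):
    """Type 1: overwrite the attribute; the old value is lost."""
    row["price"] = new_price
    return row


def apply_type2(rows, product_id, new_price, effective):
    """Type 2: keep full history by closing the current row and adding a new version."""
    current = next(
        r for r in rows if r["product_id"] == product_id and r["is_current"]
    )
    current["is_current"] = False      # flag: the old row is no longer current
    current["end_date"] = effective    # date range: the old row ends here
    rows.append({
        "product_id": product_id,
        "price": new_price,
        "version": current["version"] + 1,   # version number
        "start_date": effective,
        "end_date": None,
        "is_current": True,
    })
    return rows


def apply_type3(row, new_price):
    """Type 3: keep only the current value and one previous value."""
    row["previous_price"] = row["price"]
    row["price"] = new_price
    return row


history = [{
    "product_id": 1, "price": 9.99, "version": 1,
    "start_date": date(2020, 1, 1), "end_date": None, "is_current": True,
}]
apply_type2(history, 1, 12.49, date(2021, 6, 1))
for row in history:
    print(row["version"], row["price"], row["is_current"])
```

The Type 2 function carries a version number, a current flag and a date range together only to show all three bookkeeping styles side by side; a real design would normally pick one or two of them.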
61) What is the difference between data warehousing and business intelligence?
61A) Data warehousing deals with all aspects of managing the development, implementation and operation of a data warehouse or data mart, including metadata management, data acquisition, data cleansing, data transformation, storage management, data distribution, data archiving, operational reporting, analytical reporting, security management, backup/recovery planning, etc. Business intelligence, on the other hand, is a set of software tools that enable an organization to analyze measurable aspects of its business such as sales performance, profitability, operational efficiency, effectiveness of marketing campaigns, market penetration among certain customer groups, cost trends, anomalies and exceptions, etc. Typically, the term “business intelligence” is used to encompass OLAP, data visualization, data mining and query/reporting tools. Think of the data warehouse as the back office and business intelligence as the entire business including the back office. The business needs the back office in order to function, but a back office without a business to support makes no sense.
61B) A data warehouse is a relational database designed for analysis rather than for transaction processing. It is a subject-oriented, integrated, time-variant and nonvolatile collection of data in support of management's decision-making process. Business intelligence is the combination of data warehousing, data marts and knowledge.
62) What is a snapshot?
62A) You can disconnect a report from the catalog to which it is attached by saving the report with a snapshot of the data. However, you must reconnect to the catalog if you want to refresh the data.
63) Are OLAP databases called decision support systems?
63A) True.
64) What is active data warehousing?
64A) An active data warehouse provides information that enables decision makers within an organization to manage customer relationships nimbly, efficiently and proactively. Active data warehousing is all about integrating advanced decision support with day-to-day, even minute-to-minute, decision making in a way that increases the quality of those customer touches, which encourages customer loyalty and thus secures an organization's bottom line. The marketplace is coming of age as we progress from first-generation "passive" decision-support systems to current- and next-generation "active" data warehouse implementations.
64B) An active data warehouse means that every user can access the database at any time, 24/7.
64C) An active transformation is one in which the data can change as it passes through.
65) Why is denormalization promoted in Universe Design?
65A) In a relational data model, some lookup tables are kept separate rather than merged into a single table, for normalization purposes. In dimensional data modeling (star schema), these tables are merged into a single table, called a DIMENSION table, for performance and for slicing data. Merging the tables into one large dimension table removes the complex intermediate joins, because dimension tables are joined directly to fact tables. Although this introduces redundant data in the dimension table, a dimension table is only about 15% of the size of the fact table. That is why denormalization is promoted in Universe Design.
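The merge of lookup tables into one dimension table can be sketched in plain Python. The city, state and country lookups, their field names and the `build_geography_dimension` helper below are all hypothetical, chosen only to show the idea.

```python
# Hypothetical normalized lookup tables, keyed by their primary keys.
cities = {
    1: {"city": "Pune", "state_id": 10},
    2: {"city": "Austin", "state_id": 20},
}
states = {
    10: {"state": "Maharashtra", "country_id": 100},
    20: {"state": "Texas", "country_id": 200},
}
countries = {
    100: {"country": "India"},
    200: {"country": "USA"},
}


def build_geography_dimension(cities, states, countries):
    """Flatten the three lookups into one denormalized dimension table."""
    dimension = []
    for city_id, city in sorted(cities.items()):
        state = states[city["state_id"]]
        country = countries[state["country_id"]]
        dimension.append({
            "geo_key": city_id,
            "city": city["city"],
            "state": state["state"],        # repeated for every city in the state
            "country": country["country"],  # repeated for every city in the country
        })
    return dimension


geography_dim = build_geography_dimension(cities, states, countries)
for row in geography_dim:
    print(row)
```

State and country names are now repeated on every row, which is the redundancy the answer refers to, but a fact table can reach all three levels through a single join on `geo_key`.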
66) Explain in detail about type 1, type...
66A) Type 1: most recent value. Type 2 (full history), tracked by i) version number, ii) flag or iii) date. Type 3: current and one previous value.
66B) SCD Type 1: the data is overwritten. Type 2: current, recent and history data are kept. Type 3: current and recent data are kept.
66D) A slowly changing dimension (SCD) is a dimension whose data changes only rarely. There are three main SCD types:
1) SCD-1: the previous data is overwritten by the current data, so only current data is maintained.
2) SCD-2: changes are added as additional records. SCD-2 has three variants: versioning, flag value and effective date range.
Versioning: updated records are sent to the target along with a version number; new records are sent to the target along with a primary key.
Flag value: updated records are sent to the target with a flag of 0 and the most recent records with a flag of 1.
Effective date range: tracks both previous and current data.
3) SCD-3: only the previous and current data are maintained.
67) What are non-additive facts in detail?
67A) A fact may be a measure, a metric or a dollar value. Measures and metrics are non-additive facts.
A dollar value is an additive fact: to find the amount for a particular place over a particular period of time, we can add the dollar amounts and come up with the total. An example of a non-additive fact is the measured height of citizens by geographical location: when we roll up city data to state level we should not add the heights of the citizens; rather, we may want to use them to derive a count.
67B) Types of facts. There are three types of facts:
Additive: additive facts can be summed up through all of the dimensions in the fact table.
Semi-additive: semi-additive facts can be summed up for some of the dimensions in the fact table, but not the others.
Non-additive: non-additive facts cannot be summed up for any of the dimensions present in the fact table.
67C) A factless fact table is a related but distinct idea: it holds no measures, so its rows can be counted but not measured directly.
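The three kinds of fact in 67B can be made concrete with a small Python sketch. The bank-account rows, the city heights and the function names are invented for illustration; the heights echo the example in 67A.

```python
# Hypothetical daily account facts: deposits are additive,
# balances are semi-additive (summable across accounts, not across days).
account_facts = [
    {"day": 1, "account": "A", "deposits": 100.0, "balance": 500.0},
    {"day": 1, "account": "B", "deposits": 50.0, "balance": 300.0},
    {"day": 2, "account": "A", "deposits": 20.0, "balance": 520.0},
    {"day": 2, "account": "B", "deposits": 0.0, "balance": 300.0},
]

# Hypothetical citizen heights in centimetres, grouped by city.
heights_by_city = {"Pune": [170.0, 180.0], "Nagpur": [160.0]}


def total_deposits(rows):
    """Additive: summing across every dimension (day and account) is meaningful."""
    return sum(r["deposits"] for r in rows)


def closing_balance(rows, day):
    """Semi-additive: sum balances across accounts, but only within one day."""
    return sum(r["balance"] for r in rows if r["day"] == day)


def rollup_heights(by_city):
    """Non-additive: roll up to state level with a count and an average, never a sum."""
    all_heights = [h for heights in by_city.values() for h in heights]
    return len(all_heights), sum(all_heights) / len(all_heights)


print(total_deposits(account_facts))
print(closing_balance(account_facts, day=2))
print(rollup_heights(heights_by_city))
```

Adding the balances of day 1 and day 2 together would count the same money twice, which is why the balance is summed within a single day only.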