Data wareHouse Questions

1.     What is data warehouse?
·     Data warehouse is the main repository of an organization’s historical data.
·     It contains the raw material for management’s decision taking system.
·     The main use of Data warehouse is that the data analyst can perform the complex queries without disturbing the daily operational systems.
·     Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time.

2.     What is Business Intelligence (BI)?
·     BI refers to technologies, applications and practices for the collection, integration, analysis and presentation of business information.
·     The purpose of BI is to support better business decision making. Thus, BI is described as decision support system.

3.     What is dimension table?
·     Dimension table contains attributes of measurements stored in the fact tables.
·     Dimensional tables are the collection of hierarchies, categories and logic which can be used for user to traverse in hierarchy nodes.

4.     What is dimensional modeling?
·     Dimensional data model concept involves two types of tables and it is different from 3rd normal form. This concept uses Fact tables which contain the measurements of the business and Dimension table which contain the attribute of the measurements.

5.     What is Fact Table?
·     Fact table contains measurements of business process. Fact table contains the foreign keys for the dimension tables.

6.     What are additive, semi-additive and non-additive measures?
a.     Non-additive Measures
Non-additive measures are those which can not be used inside any numeric aggregation function (e.g. SUM(), AVG() etc.). One example of non-additive fact is any kind of ratio or percentage. Example, 5% profit margin, revenue to asset ratio etc. A non-numerical data can also be a non-additive measure when that data is stored in fact tables, e.g. some kind of varchar flags in the fact table.
b.     Semi Additive Measures
Semi-additive measures are those where only a subset of aggregation function can be applied. Let’s say account balance. A sum() function on balance does not give a useful result but max() or min() balance might be useful. Consider price rate or currency rate. Sum is meaningless on rate; however, average function might be useful.
c.     Additive Measures

Additive measures can be used with any aggregation function like Sum(), Avg() etc. Example is Sales Quantity etc.

7.     What is a hierarchy?
A hierarchy is the specification of levels that represent relationships between different attributes within a dimension. For example, one possible hierarchy in the Time dimension is YearàQuarteràMonthàDay

8.     What is ODS?
·     ODS – Operation Data Store.
·     A database structure that is a repository for near real –time operational data rather than long term trend data.

9.     What is ETL?
·     ETL is abbreviation of extract, transform and load.
·     ETL is software that enables businesses to consolidate their disparate data irrespective of their different formats in various databases.

10.     If de-normalized improves data warehouse processes, then why fact tables are in normal form?
·     Foreign keys of Fact tables are the primary keys of dimension tables. It is clear that fact table contains the columns which are primary keys of other table that itself makes them normal form tables.

11.     What are lookup tables?
·     A lookup table is the table placed on the target table based upon the primary keys of the target; it just updates the table by allowing only modified records based on the lookup condition.

12.     What are Conformed dimensions?
A conformed dimension is the dimension that is shared across multiple subject area. Consider 'Customer' dimension. Both marketing and sales department may use the same customer dimension table in their reports. Similarly, a 'Time' or 'Date' dimension will be shared by different subject areas. These dimensions are conformed dimension.
·     Conformed dimensions mean the exact same thing with every possible fact table to which they are joined.
·     Conformed dimensions are common to the cubes.

13.     What is degenerated dimensions?
A degenerated dimension is a dimension that is derived from fact table and does not have its own dimension table.
A dimension key, such as transaction number, receipt number, Invoice number etc. does not have any more associated attributes and hence can not be designed as a dimension table.

14.     What is junk dimension?
A junk dimension is a grouping of typically low-cardinality attributes (flags, indicators etc.) so that those can be removed from other tables and can be junked into an abstract dimension table.

These junk dimension attributes might not be related. The only purpose of this table is to store all the combinations of the dimensional attributes which you could not fit into the different dimension tables otherwise. Junk dimensions are often used to implement Rapidly Changing Dimensions in data warehouse

15.     What is a role-playing dimension?
Dimensions are often reused for multiple applications within the same database with different contextual meaning. For instance, a "Date" dimension can be used for "Date of Sale", as well as "Date of Delivery", or "Date of Hire". This is often referred to as a 'role-playing dimension'

16.     What is Conformed Fact?
·     Conformed dimensions are the dimensions which can be used across multiple Data Marts in combination with multiple fact tables accordingly.

17.     What is a level of granularity of a fact table?
·     Level of granularity means level of detail that you put into the fact table in a data warehouse. Level of granularity would mean what detail are you willing to put for each transactional fact.

18.     What are non-additive facts?
·     Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table. However they are considered to be useless facts. If there are changes in dimensions these facts can be useful.
Example: A fact may be measure, metric or a dollar value. Measure and metric are non additive facts.
Dollar value is additive fact. If we want to find out the amount for a particular place for a particular period of time, we can add the dollar amounts and come up with the total amount.
A non additive fact, for eg measure height(s) for 'citizens by geographical location' , when we rollup 'city' data to 'state' level data we should not add heights of the citizens rather we may want to use it to derive 'count'.

19.     What is factless fact table?
·     A fact table which does not contain any numeric fact columns, it is called the factless fact tables.
Example: suppose, we want to get details about the promotion of a particular product like when it is launched. We will get this information from promotion dimension and time dimension, so in this scenario there is no need of any fact information but we need data from these two dimensions then we go for factless facts.

20.     What are surrogate keys?
·     Surrogate key is substitute of natural primary key.
·     Data Warehouses typically uses surrogate key instead of primary keys.
·     Surrogate keys are helpful in handling the Slowly Changing Dimensions.

21.     What are Slowly Changing Dimensions (SCD)?
SCD applies to the cases where the attribute of a record varies over the time. There are three different types of SCDs:
·     SCD 1: The newer record replaces the original record. Only one record exists in database – current data.
·     SCD 2: A new record is added into the dimension table. Two records exist in database – current data and previous history data.
·     SCD 3: The original record is modified to include new data. One record exists in database – new information is attached with old information in same row.

22.     What is hybrid SCD?
Hybrid SCD is the combination of SCD 1 & SCD 2. It may happen that in a table, some columns are important and we need to track changes for them, i.e. capture the historical data for them (insert new records in such cases), whereas in some columns, even data changes, we do not care.

No comments:

Post a Comment