Monday 8 September 2014

08:: Dimensional Modeling

Dimensional Modeling
• Used by most contemporary BI
solutions
– “Right” mix of normalization and
denormalization often called Dimensional
Normalization
– Some use for full data warehouse design
– Others use for data mart designs
• Consists of two primary types of tables

– 1) Dimension tables
– 2) Fact tables

Dimensional normalization
– Logical design technique that presents data in an intuitive
way allowing high-performance access
– Targets decision support information
– Focused on easy user navigation and high performance
design

(vs)

•  Transactional normalization
– Logical design technique to eliminate data redundancy, to
keep data consistency, and storage efficiency
– Makes transactions simple and deterministic
– ER models for enterprise are usually complex often

containing hundreds, or even thousands, of entities/tables


Fact & Dimension tables example


Dimension tables contains Surrogate keys and they are more of descriptive tables. they describe entities in particular. Where as Fact table is a centralized table which contains Foreign-keys of dimension table. Below image depicts the all.
In above pic FactInternetSales table is Fact table & rest are Dimension tables.

Fact and dimension table key points


Details of Dimension table

What does a Dimension Table capture??

Keys in dimension table?



Types of Dimension table?



1) Slowly Changing Dimensions
2) Conformed Dimension
3) Role Playing Dimension
4) De-Generate Dimension
5) Junk Dimension


1)Slowly Changing Dimensions:
Go through below link for understanding Slowly changing dimensions.
https://www.youtube.com/watch?v=mUGvYgYX13U

2) Conformed Dimension
The dimensions which is used more than one fact table is called conformed dimensions.
Ex-Product Dimension related to Order fact, Sles fact
  • Same Dimension joins to multiple Fact Tables or is used across multiple Data Marts.
  • It is a master dimension and is used across multiple dimensional models.

Time,Product & Staff are conformed dimensions.

3) Role Playing Dimension:

In this type of dimension, single dimension plays dual role for fact table. Refer example below.
In above example Dim_Date(date dimension) playing role of both Order_Date and Ship_Date.


4) De-Generate Dimension:

Degenerate dimension: A column of the key section of the fact table that does not have the associated dimension table but used for reporting and analysis, such column is called degenerate dimension or line item dimension.

For ex, we have a fact table with customer_id, product_id, branch_id, employee_id, bill_no, date in key section and price, quantity, amount in measure section. In this fact table, bill_no from key section is a single value, it has no associated dimension table. Instead of cteating a separate dimension table for that single value, we can include it in fact table to improve performance.SO here the column, bill_no is a degenerate dimension or line item dimension.

5) Junk Dimension

consider a trade company which consists of fact about trading that take places in a share trading firm.in these there may be some facts like mode of trade(which indicates whether the user is trading through phone or online)which will be not related to any of the dimensions such as account,date,indices,amount of share etc.
so these unrelated facts are removed from the fact table and stored as a separate dimension as a junk dimension which will be useful to provide extra information
 

Details of Fact Tables

What does a Fact table capture?
Fact Table Granularity
– The level of detail of data contained in the fact table
– The description of a single instance (a record) of the fact
Typically includes a time level and a distinct
combinations of other dimensions
• e.g. Daily item totals by product, by store,
Weekly snapshot of store inventory by product

Additive Nature
• Additive: Facts that can be summed
up/aggregated across all of the dimensions in the
fact table (e.g., discrete numerical measures of
activity, i.e., quantity sold, dollars sold)
• Semi-Additive: Facts that can be summed up for
some of the dimensions in the fact table, but not
the others (e.g., account balances, inventory
level, distinct counts)
• Non-Additive: Facts that cannot be summed up
for any of the dimensions present in the fact table
(e.g., measurement of room temperature)

Types of Fact Tables??

1) Transactional
2) Snapshot or inventory
3) Accumulating Fact table


1) Transactional fact table

Transaction fact table is one that holds data at the grain of per transaction.
e.g. If a customer buys 3 different products from a point of sale. then fact table will have three records for each transaction indicating 3 different type of product sale. Basically if three transactions are displayed on customer receipt then we have to store three records in fact table as granularity of fact table is at transaction level.
e.g. Customer Bank transaction
CustomerTransaction TypeAmountDate
Customer1Credit1000001-01-2012
Customer1Debit500002-01-2012
Customer1Credit100003-01-2012
Grain of above transaction table is per transaction
Transaction fact table are at the most detailed level and generally have large number of dimensions associated with it

2) Periodic snapshot fact table

As its name suggests periodic snapshot fact table are used to store a snapshot of data taken at particular point of time. Periodic fact tables stores one row for particular period of time.
e.g. let’s take an example of credit/debit transaction made by a customer.
CustomerTransaction TypeAmountDate
Customer1Credit1000001-01-2012
Customer1Debit500002-01-2012
Customer1Credit100003-01-2012
Above table is a transaction fact table suppose we need to create a periodic fact table whose grain is month which stores customer balance at the end of month then it should look like as below

CustomerMonthAmount
Customer1Jan-20126000

3) Accumulating Fact Table

Accumulating fact table stores one row for a entire life time of an event. To understand it better let’s take an example of order processing process.
DateOrder Status
01-01-2012Customer Ordered Product
02-01-2012Order Product Despatched from Warehouse
03-01-2012Handed Over to Courier Company
04-01-2012Delivered to Customer
If you look at the above events you could see that each date has its own name
e.g. Customer Order Date , Warehouse despatch date etc.
In accumulating fact table at each stage , dates will be updated with relevant facts

1 comment: