There are two version of definitions for Data Warehouse.
1) Definition of R. Kimbal 2) Definition of Inmon
According to R. Kimbal
"A data warehouse is a copy of transactional data specifically structured for querying and analysis".
According to this definition:
The form of the stored data (RDBMS, flat file) has nothing to do with whether something is a data warehouse.
Data warehousing is not necessarily for the needs of "decision makers" or used in the process of decision making.
According to Inmon
A data warehouse is a
subject-oriented,
integrated,
nonvolatile, and
time-variant
collection of data in support of management’s decisions. The data
warehouse contains granular corporate data.
Subject-Oriented?
Classical operations systems are organized around the applications
of the company. For an insurance company, the applications may be auto, health,
life, and casualty. The major subject areas of the insurance corporation might
be customer, policy, premium, and claim. For a manufacturer, the major subject
areas might be product, order, vendor, bill of material, and raw goods. For a
retailer, the major subject areas may be product, SKU, sale, vendor, and so
forth. Each
type of company has its own unique set of subjects.
Integrated
Of all the aspects of a data warehouse, integration is the most
important. Data is fed from multiple disparate sources into the data
warehouse. As the data is fed it is
converted, reformatted, resequenced, summarized, and so forth. The result is that
data—once it resides in the data warehouse—has a single physical corporate
image.
Nonvolatile
Data is updated in the operational environment as a regular matter
of course, but warehouse data exhibits a very different set of characteristics.
Data warehouse data is loaded (usually en masse) and accessed, but it is not
updated (in the general sense). Instead, when data in the data warehouse is
loaded, it is loaded in a snapshot, static format. When subsequent changes
occur, a new snapshot record is written. In doing so a history of data is kept
in the data warehouse.
Time-variant
Time variancy implies that
every unit of data in the data warehouse is accurate as of some one moment in
time. In some cases, a record is time stamped. In other cases, a record has a
date of transaction. But in every case, there is some form of time marking to
show the moment in time during which the record is accurate. A 1 to 2 year time
horizon is normal for operational systems; a 5-to-15-year time
horizon is normal for the data warehouse. As a result of this difference in
time horizons, the data warehouse contains much more history than any other environment
Summary:
While Creating a Data WareHouse Data will be isolated at department
level. Each Department deals with particular subject. For instance consider bank,
it has Insurance, Customer, Transnational etc details. While
creating a WareHouse each
department will be treated as a subject, further diluting the concept of
subject, each department will have its own source file to be processed.
When we Receive files from different departments then, each dept has its own
standards to be followed. for instance,
Encoding: (Gender field)
Dept A: Male/ Female
Dept B: M/F
Dept c: 0/1
Dept A,Dept B,Dept C Gender field will be converted into one unit or same
format. i.e Integrated.
Non Volatile data will not under go for any changes or not frequent
update or alter statements will be run against Data WareHouse.
Time-Variant is vital feature of DWH. When any source is processed, the
Timestamp of source, Process date & time and Period will be recorded.