A data warehouse is an analytical database that is used as the foundation of a decision support system. It is designed for large volumes of read-only data, providing intuitive access to information that will be used in making decisions.
A data warehouse is created as ongoing commitment by the organization to ensure the appropriate data is available to the appropriate end user at the appropriate time”
Very large databases and systems require special capabilities, tools
- To analyse large quantities of data
- To access data from multiple systems
Three key techniques
3.Tools for accessing internal databases through the Web
Stores current and historical data from many core operational transaction systems
Consolidates and standardizes information for use across enterprise, but data cannot be altered
Data warehouse system will provide query, analysis, and reporting tools
Subset of data warehouse
Summarized or highly focused portion of firm’s data for use by specific population of users
Typically focuses on single subject or line of business
Online Transactions Processing (OLTP)
OLTP – Online Transactions Processing is the gathering of input data, processing that data and updating existing information to reflect the gathered & processed information.
Databases which support these functions are often called operational databases (e.g. customer db, product db).
Systems that support the running activities of the organization
- Point of sale in stores;
- ATM and Bank operations
- e-commerce (amazon, iTunes, etc)
- Thousand of operations per second
- Repeated operations dealing with small amounts of data (insert, update, remove)
- Real Time
Online Analytical Processing (OLAP)
This is the manipulation of information to support decision making.
Used in conjunction with data warehouses.
Data warehouse is special form of a databases that contains information gathered from operational databases for the purpose of supporting decision making.
Data warehouses do not support OLTP.
Supports multidimensional data analysis
Viewing data using multiple dimensions
Each aspect of information (product, pricing, cost, region, time period) is different dimension
E.g., how many washers sold in the East in June compared with other regions?
OLAP enables rapid, online answers to ad hoc queries
Systems that provide the users the necessary capabilities to analyze many and different aspects of organization activities and its performance.
- How well certain product is selling in different regions?
- How well is the evolution in the market from its introduction?
- Which are the top ten selling product in each region? and globally?
- Small number of queries (per day), when compared with OLTP systems
- Large amount of data processed in each query, in order to obtain a small output.
- It is hard to predict the queries and in general they are much more diverse, when compared with OLTP systems
Data warehouses are optimised for analytic access patterns. Analytic access patterns generally involve selecting specific fields and rarely if ever ‘select *’ as is more common in operational databases. Because of these differences in access patterns, operational databases (loosely, OLTP) benefit from the use of a row-oriented DBMS whereas analytics databases (loosely, OLAP) benefit from the use of a column oriented DBMS. Unlike operational systems which maintain a snapshot of the business, data warehouses generally maintain an infinite history which is implemented through ETL processes that periodically migrate data from the operational systems over to the data warehouse.