A data warehouse is a database that is maintained separately from an organization's operational database. It holds raw (or minimally processed) data, metadata, and summary data. The data warehouse supports information processing by providing a solid platform of consolidated, historical data for analysis.
William H. Inmon originally defined a data warehouse as "a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process."
This definition identifies the four key properties of a data warehouse:
- **subject-oriented**: organized around major subjects (e.g., users, products) rather than ongoing operations.
- **integrated**: combines multiple heterogeneous data sources, using data cleaning and integration to keep them consistent (e.g., naming, encoding, attribute measures).
- **time-variant**: provides a historical perspective by retaining past data rather than only the current state; time is included explicitly or implicitly.
- **nonvolatile**: updated infrequently (relative to the operational database), so it does not need transaction processing, recovery, or concurrency control mechanisms.
Inmon's definition also sets a boundary on what belongs in a data warehouse: the data should be useful for decision making.
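As a rough illustration of the four properties, the sketch below integrates records from two hypothetical operational systems into one customer-oriented table and appends a dated snapshot on each load. The source names, field names, and the gender encoding are assumptions made up for this example, not part of any particular warehouse product.

```python
from datetime import date

# Hypothetical rows from two operational systems with different conventions.
crm_rows = [{"cust_id": 1, "sex": "F", "revenue_usd": 120.0}]
billing_rows = [{"customerId": 2, "gender": 1, "revenue_cents": 4550}]  # assumed: 1 = male, 0 = female


def from_crm(row):
    """Map a CRM row onto the shared warehouse naming and encoding."""
    return {
        "customer_id": row["cust_id"],
        "gender": row["sex"],
        "revenue_usd": row["revenue_usd"],
    }


def from_billing(row):
    """Map a billing row: rename keys, decode gender, convert cents to dollars."""
    return {
        "customer_id": row["customerId"],
        "gender": "M" if row["gender"] == 1 else "F",
        "revenue_usd": row["revenue_cents"] / 100,
    }


def load_snapshot(warehouse, snapshot_date):
    """Append an integrated, dated snapshot; existing rows are never modified."""
    integrated = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
    for record in integrated:
        warehouse.append({**record, "snapshot_date": snapshot_date})
    return warehouse


customer_facts = []  # subject-oriented: everything here is about customers
load_snapshot(customer_facts, date(2024, 1, 31))
load_snapshot(customer_facts, date(2024, 2, 29))
```

Each load only appends rows (nonvolatile), every row carries its `snapshot_date` (time-variant), and both sources end up with the same field names and encodings (integrated).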
When used informally, data warehousing can describe any process of storing data in useful formats for later processing, up to and including formal data modeling and database storage (although warehousing typically indicates a less structured approach).
Subsets of a data warehouse that focus on a specific subject or group of users are called **data marts**.
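A minimal sketch of the idea, assuming a hypothetical `sales` table and modelling the mart as a SQL view in an in-memory SQLite database. In practice a data mart is often a separate physical store; the view here is only meant to show the "focused subset" relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table covering several departments.
conn.execute("CREATE TABLE sales (region TEXT, department TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("EU", "marketing", 100.0),
        ("US", "finance", 250.0),
        ("US", "marketing", 75.0),
    ],
)

# The "data mart": only the rows and columns relevant to marketing analysts.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT region, amount FROM sales WHERE department = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT region, amount FROM marketing_mart ORDER BY region"
).fetchall()
```

Queries against `marketing_mart` see only the marketing rows, while the full `sales` table remains available to the rest of the warehouse.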
The [[data warehouse schema]] describes the organizational approach of the data warehouse.
> [!Tip]- Additional Resources
> - [Data Warehouse Concepts] | Amazon AWS
> - [Data Warehouse Topics] | Oracle Cloud
> - [Data Warehouse] | IBM Cloud
> - [Data Warehouse Explained] | Google Cloud