Over time, data warehouse architecture has evolved into what is known as Data Warehouse (DW) 2.0. DW 2.0 brings several advances, including the incorporation of unstructured data into the data warehouse and the formal, enterprise-wide inclusion of corporate metadata. This course provides an overview of DW 2.0, including:
An introduction to DW 2.0. We explore the traditional definition of a data warehouse as subject-oriented, integrated, non-volatile, and time-variant. We also explore the demands that unstructured data places on the data warehouse and what makes the DW 2.0 architecture both unique and powerful.
The DW 2.0 Lifecycle. Data starts in the interactive sector, where it is very current (up to the second). It then moves to the integrated sector (current, from hours to five years old), then to near line (less than current to over five years old), and finally to archival (older than five years).
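As a rough illustration of the lifecycle above, here is a minimal Python sketch that assigns a record to a sector by age. The one-hour interactive cut-off, the `rarely_accessed` flag used to select near line, and the rule that everything older than five years goes straight to archival are all simplifying assumptions for this example; the outline gives only approximate, overlapping ranges.

```python
from datetime import datetime, timedelta
from enum import Enum


class Sector(Enum):
    INTERACTIVE = "interactive"
    INTEGRATED = "integrated"
    NEAR_LINE = "near line"
    ARCHIVAL = "archival"


# Assumed cut-offs; the course outline only gives rough ranges.
INTERACTIVE_MAX_AGE = timedelta(hours=1)
ARCHIVAL_MIN_AGE = timedelta(days=5 * 365)


def classify_sector(created_at, now, rarely_accessed=False):
    """Return the sector a record would fall into under the assumed rules."""
    age = now - created_at
    if age < INTERACTIVE_MAX_AGE:
        return Sector.INTERACTIVE
    if age > ARCHIVAL_MIN_AGE:
        return Sector.ARCHIVAL
    # Between the two cut-offs: near line is treated here as the home
    # for data that is no longer accessed often (an assumption).
    if rarely_accessed:
        return Sector.NEAR_LINE
    return Sector.INTEGRATED
```

For example, a record created five seconds ago would be classified as interactive, while a record that is six years old would be classified as archival.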
Archival within DW 2.0. We cover archival, the stage at which primary use of the data has ended (that is, the probability of access is low) yet the organization still needs to retain the data. Data stored in the archive can originate from the big data arena and can contain both structured and unstructured data. Metadata is physically and tightly coupled with the data that resides in the archival sector. Data may be periodically retrieved from the archive on a project basis for deeper analysis.
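One way to picture metadata that is physically coupled with archived data is a self-describing archive record, where the description travels in the same stored unit as the rows it describes. The sketch below is a hypothetical illustration using JSON; the function names and the `metadata`/`data` layout are assumptions, not a DW 2.0 format.

```python
import json


def make_archive_record(rows, metadata):
    """Bundle the rows and their describing metadata into one stored unit."""
    return json.dumps({"metadata": metadata, "data": rows}, sort_keys=True)


def read_archive_record(blob):
    """Recover the metadata and rows from a stored unit."""
    record = json.loads(blob)
    return record["metadata"], record["data"]


# Hypothetical example content.
blob = make_archive_record(
    rows=[{"customer_id": "C001", "balance": 1200}],
    metadata={"subject_area": "Customer", "columns": {"balance": "integer"}},
)
meta, rows = read_archive_record(blob)
```

Because the metadata is stored with the data, a project that retrieves the record years later can interpret it without consulting a separate catalog.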
DW 2.0 Components. We explore each component of the data warehouse architecture including applications, procedures, programs, databases, and transactions. The structures within DW 2.0 are organized by subject area such as Customer and Product. We will also discuss the Operational Data Store (ODS).
DW 2.0 Database Design. DW 2.0 contains different types of data, so each sector calls for a different approach to database design; this video segment covers those approaches. We discuss the Interactive Sector, which demands a two- to three-second response time and 24x7 availability. The Integrated Sector holds large volumes of data that are used for many different purposes, and it is heavily indexed. We also explore data mining within the integrated sector. With data mining, the requirements are not known in advance, and the design usually resembles a spreadsheet in the form of flat records. We also discuss exploration processing and the role of historical data.
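To make the idea of flat, spreadsheet-like records for data mining concrete, the following minimal Python sketch flattens two related structures into one row per customer. The table contents, field names, and the `flatten_for_mining` helper are hypothetical and chosen only for illustration.

```python
def flatten_for_mining(customers, orders):
    """Produce one flat record per customer, like a spreadsheet row."""
    flat = []
    for cid, cust in customers.items():
        own = [o for o in orders if o["customer_id"] == cid]
        flat.append({
            "customer_id": cid,
            "region": cust["region"],
            "order_count": len(own),
            "total_spend": sum(o["amount"] for o in own),
        })
    return flat


# Hypothetical sample data.
customers = {"C001": {"region": "East"}, "C002": {"region": "West"}}
orders = [
    {"customer_id": "C001", "amount": 100},
    {"customer_id": "C001", "amount": 50},
    {"customer_id": "C002", "amount": 70},
]
flat_records = flatten_for_mining(customers, orders)
```

Each flat record carries everything about one customer in a single row, which suits exploratory analysis where the questions are not known ahead of time.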
DW 2.0 Integrated Design. We cover the integrated sector of DW 2.0 in detail. This part of the architecture is granular, non-redundant, historical, and represents a single version of truth. This sector presents a corporate perspective of information.
DW 2.0 Interactive Sector. We cover the interactive sector of DW 2.0 in detail. Direct updating and loading of data occur within this part of the architecture. The interactive sector handles queries that demand high performance, such as real-time data warehousing. No historical integrity of data is maintained, and data may not be integrated inside the interactive sector.
DW 2.0 Linkage. In this segment we cover the linkage between structured and unstructured data. Some linkages are strong while others are weak. We cover external keys, which are a powerful form of linkage. There is no right or wrong way to implement links. Data can be linked on the fly or ahead of time through static linkages.
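The difference between linking on the fly and linking ahead of time can be sketched in a few lines of Python. In this hypothetical example, an email address plays the role of the key that ties structured customer records to unstructured documents; the data, the choice of key, and both function names are assumptions made for illustration.

```python
# Structured data: customer records keyed by customer ID.
customers = {
    "C001": {"name": "Acme Corp", "email": "ops@acme.example"},
    "C002": {"name": "Globex", "email": "it@globex.example"},
}

# Unstructured data: free-text documents.
documents = [
    {"doc_id": "D1", "text": "Complaint received from ops@acme.example about billing."},
    {"doc_id": "D2", "text": "Renewal note: it@globex.example confirmed the contract."},
    {"doc_id": "D3", "text": "Follow-up sent to ops@acme.example."},
]


def link_on_the_fly(customer_id):
    """Scan the documents at query time for the customer's key."""
    email = customers[customer_id]["email"]
    return [d["doc_id"] for d in documents if email in d["text"]]


def build_static_links():
    """Precompute a link table once, so later lookups need no scan."""
    links = {cid: [] for cid in customers}
    for doc in documents:
        for cid, cust in customers.items():
            if cust["email"] in doc["text"]:
                links[cid].append(doc["doc_id"])
    return links


static_links = build_static_links()
```

Both approaches return the same documents for a customer. The on-the-fly version always reflects the latest documents at the cost of a scan per query, while the static table answers quickly but must be rebuilt when documents change.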