Data Load in a Data Warehouse

Filed under: General — admin at 5:18 pm on Monday, August 18, 2008

The ultimate goal and the responsibility of the load manager is to get data loaded into the data warehouse database. There are a number of different methods for getting the data from the source into the data warehouse. Some of the more commonly used approaches are: loading from flat files, 3GL programs for extracting and loading data, gateway products for connecting different RDBMSs, and copy management tools.

Some of these tools, such as the gateway products and some of the third-party tools, are designed to select data directly from the source system itself. Others rely on the data being transferred from the source system as a file. If large amounts of data need to be loaded, then it is likely that you will need to use the RDBMS’s own loader software. These loader programs will sometimes have direct path load mechanisms, which bypass the SQL layer of the RDBMS and build whole database blocks directly. This will greatly improve the performance of the load.

Generally, the load performance can be further improved if there is.a no-logging option. This option switches off journaling for the load operation. Note, however, that if this option is used it needs to be taken into consideration in your backup recovery strategy. If no-logging is to be used you also need to design the load jobs to be re-startable.

Data Movement in a Data Warehouse

Filed under: General — admin at 5:22 pm on Wednesday, August 13, 2008

The warehouse manager is responsible for any data movement within the warehouse, such as aggregation creation and maintenance. Any tables, indexes or other objects required will also be created and maintained by the warehouse manager.
Aggregation creation and maintenance is a process that should be automated.

The warehouse manager needs an interface that allows new aggregations to be requested, and old aggregations to be removed. This mechanism can be controlled via tables in the database. Database procedures can be used to insert a row into a table for each new aggregation to be created. The warehouse manager will also need to know what fields are required in the aggregation and what fields to aggregate on. From this information it can build the aggregation and the metadata to describe it.

Most aggregations can be created by a single query. However, the query is often complex and not necessarily the most efficient method of generating the aggregation. It may be possible to generate the aggregation from other aggregations, or by a multiple-stage process with interim result tables. These interim results can often be used as the basis of multiple aggregations, thereby saving time and resource overall. The warehouse manager needs to be capable of taking advantage of these optimizations.

Data Warehouse Manager

Filed under: General — admin at 5:18 pm on Monday, August 4, 2008

The warehouse manager is responsible for maintaining the data while it is in the data warehouse. This layer of software also creates and maintains the metadata that describes where everything resides within the data warehouse. The responsibilities of the warehouse manager are data movement, metadata management, performance monitoring and tuning and data archiving.

Given the size and complexity of most data warehouses it is important to automate as many data management operations as possible. The warehouse manager should be designed to deal with all the tasks mentioned above automatically. Extensive reporting capabilities are a must for the warehouse manager; in particular, it should be able to report any failures of data movement, such as aggregation creation failures. These facts need to be taken into account by the data warehouse manager and proper attention given to the main tasks so that aggregation creation failures are minimized to the best possible extent.