-->

Monday, 29 January 2007

A complete fact capture job

Hi all,
Here is a job used to capture fact data on a daily basis.
Data is coming from a source table where :
- time is stored in columns, 1 column for 1 day = 31 columns for a month, from 1 to 31,
- indicators are stored on rows : an indicator code for each indicator type.
This source structure is not mine, the challenge to integrate and manage, is ;)

Here is the DDL of source table :
HOTEL_CODE,
INDICATOR_CODE,
YEAR,
MONTH,
ACTIVITY_TYPE,
CONCEPT_CODE,
DAY1
...
DAY 31

Here is the DDL of my destination table :
ID,
HOTEL_CODE,
TIME_KEY,
YEAR,
MONTH,
INDICATOR1
...
INDICATOR12
TIMESTAMP

The job begins with a GetVariable job to capture new timestamp for new data from a previous job used to calculate the gap.
Then a first normalization puts data from "31 columns for a month" to one column named "Duration" (coz we deal duration data). At this moment, I chose to store an intermediate level of data in a temporary table, for performance purpose. Then, you can see a fork to load two types of tables :
- an agregated one, with denormalized data for indicators (each code indicator, we have 12, becomes a column),
- normalized one, used to print very specific reporting, back on the intranet application.

Job is working great in prod environment. The SORT operation is a bit slow, but nothing really dangerous for our timing.

For informations, please feel free to contact me : mailto:vteyssier@decisionsystems-studio.fr








No comments: