Datawarehousing consists of three main area :
1. ETL(data migration, data cleansing, data scrubbing, data loading )
2. Datawarehouse design
3. Business Intelligence (BI) Reporting infrastructure.
Read These Two part article for BI
Design : Now Coming to part 2 (is generally work of Datawarehouse architect)
Read Some details More would be covered in future articles
Part 1: ETL Engineer:
Most common task of ETL (Extract data- Transform data -Load to target).
Most Common ETL Tool being
Independent Tool: Informatica, IBM data stage, Ab-initio, Terradata ETL utilities,
Tool within ERP: SAP BIW ABAP based transformations , LSMW, peoplesoft EPM (internally uses other tools though).
Tool within Databases: Oracle SQL loader, Teradata ETL utilities,(Tricle pump, multi-load, fast load),
Microsoft BI Stack with SQL server had : SSIS SQL server integration services.
Cloud based Tool: Apache Hadoop Hive datawarehouse ( here requirement is different from un-structured realtime data analysis.
First data modelling have to completed to have level of granularity to represent requirements of business key drivers.
Once datawarehouse Structure in completed to ascertain level of granularity required.
The data loading Cycle Starts With:
Extraction from desperate data sources in Stagging area
On Stagging area data is cleansed.
Then data Transformation are applied for
Example in Informatica and SSIS:
Two Sets of documents are There LLD and HLD to look at what needs transformation to be applied.
Like in Informatica Transformation Types are :
Look at all transformations available in Informatica version 9
These can be customized according to logic required.
Next step is Loading to datawarehouse dimension tables and then to Fact table.