Pentaho Tools :

Pentaho C-Tools(CDE,CDF,CDA),Pentaho CE & EE Server,OLAP-Cubes,Analysis using Pivot4J, Saiku Analytics, Saiku Reporting, Ad-hoc Reporting using Interactive Reporting Tool,Dashboards,Reports using PRD, PDD,Data Integration using Kettle ETL,Data Mining usign WEKA,Integration of Servers with Databases,Mobile/iPad compatible Dashboards using Bootstrap Css,Drilldown dashboards,Interactive Dashboards

Thursday 28 December 2017

Pentaho DWH BI POC - UNECE Analytics

Hi folks,

This is a document based POC article developed in early 2017 using Pentaho open source tools. The source data is taken from unece website for demonstration purpose. It can be found at http://w3.unece.org/PXWeb/en


Disclaimer: This article is strictly not a production ready one instead an experiment. It is to be used for educational purposes only, therefore following this approach may not suitable or work in your environment and the author or reviewers are not responsible for correctness, completeness or quality of the information provided. However, re-distributing the same content in any other sites is offensive and all rights reserved.

                                           

The sample architecture DWH life cycle from the POC is as follows

Tool set

ETL
Pentaho Data Integration (PDI)
pdi-ce-7.0.0.0-25
DWH
Kimball Star Schema
Kimball Star Schema
OLAP Analysis
Pentaho Schema Workbench(PSW)
 and Saiku Analytics
psw-ce-3.12.0.1-196
BA Server
Pentaho BI Server
7
Reporting
Pentaho Report Designer(PRD)
prd-ce-6.1.0.1-196
Dashboards
Pentaho Ctools
7
Source of Data
Excel files
MS-Office 2016
Source database
MySQL
5.6.25
Target database
MySQL
5.6.25
OS
Windows
10

Below is the architectural approach for the POC

1)Prepare source database from downloaded Excel files
    (staging of data is prepared from downloaded   Excel files and in the process of creating source     data base, the data will be profiled and cleansed)
2) Populate data mart by identifying dimensions and facts from source database.
3) Populate warehouse using PDI tool using incremental data load approach.
4) Create OLAP cubes for data analysis.
5) Data visualization using report and dashboard tools.

Download the full document and source code here : Click Me

I hope this helps someone in community who are beginners or who are in intermediate stage in DWH.

Thank you
Sadakar Pochampalli



3 comments:

  1. Great blog !It is best institute.Top Training institute In chennai
    http://chennaitraining.in/big-data-training-in-chennai/
    http://chennaitraining.in/ccna-training-in-chennai/
    http://chennaitraining.in/dot-net-training-in-chennai/
    http://chennaitraining.in/hadoop-training-in-chennai/
    http://chennaitraining.in/informatica-training-in-chennai/
    http://chennaitraining.in/pmp-training-in-chennai/

    ReplyDelete
  2. Such a nice blog with the attractive reference links which give the basic ideas on the topic.
    Data Science and Data Analytics
    Statistics For Business Analytics

    ReplyDelete
  3. Good Post. I like your blog. Thanks for Sharing

    machine learning course

    ReplyDelete