Hitachi Vantara Pentaho - BI Suite Tutorials: August 2014

Monday, 11 August 2014

My first learning example in Kettle - An example is reproduced from Pentaho Tutorials Topic : Filtering Rows

Hi Guys,

Here is my first learning experience with Kettle Community ETL.

Download Kettle from : Click Me

Sources to get start with Kettle :

1) http://wiki.pentaho.com/display/EAI/Getting+Started
2) http://localhost:8080/pentaho/docs/InformationMap.html
3) http://wiki.pentaho.com/display/EAI/Pentaho+Data+Integration+%28Kettle%29+Tutorial

You should have installed java in your local machine and path, java_home, jre_home set for it.

Below topic is a re-production of Kettle Transformation & Job example from Pentaho Tutorials and the description is slightly differ from the actual description.

Let's look at simple basics and then we'll jump into 1st example.

# What is Transformation in Kettle ?
# What are jobs in Kettle ?
# Core difference b/w Transformation & Jobs in Kettle ?
# Extensions for transformations & jobs in Kettle ?
# What are steps & hops ?

1) What is the usage of Transformation ?

Transformations are used to describe the data flows for ETL such as reading from a source, transforming data and
loading it into a target location.

2) Jobs

Jobs are used to coordinate ETL activities such as

Defining the flow and dependencies for what order transformations should be run
Preparing for execution by checking conditions such as, "Is my source file available?," or "Does a table exist?"
Performing bulk load database operations
File Management such as posting or retrieving files using FTP, copying files and deleting files
Sending success or failure notifications through email

3)What's the difference between transformations and jobs?

Transformations are about moving and transforming rows from source to target. Jobs are more about high level flow control: executing transformations, sending mails on failure, ftp'ing files

4) Extensions for transformations & jobs in Kettle ?
Transformation : .ktr
Jobs : .kjb

5) What are steps & hops ?

A transformation is a network of logical tasks called steps.
Transformations are essentially data flows.
The transformation is, in essence, a directed graph of a logical set of data transformation configurations.

Steps are the building blocks of a transformation,
for example a text file input or a table output.
There are over 140 steps available in Pentaho Data Integration and they are grouped according to function; for example, input, output, scripting, and so on.
Each step in a transformation is designed to perform a specific task, such as reading data from a flat file, filtering rows, and logging to a database
Steps can be configured to perform the tasks you require.
Hops are data pathways that connect steps together and allow schema metadata to pass from one step to another.
Hops determine the flow of data through the steps not necessarily the sequence in which they run.
When you run a transformation, each step starts up in its own thread and pushes and passes data.

More info at : Click Me

TOPIC :
Load sales data into a database. Several of the customer records are missing postal codes(zip codes) that must be resolved before loading into the database.

You will be given two .csv files to load the data into database.
The first file is : sales_data.csv and later one is Zipssortedbycitystate.csv

( Location of the files : /home/sada/softwares installed/Pentaho/data-integration/samples/transformations/files
)

In short , there is a column called "postalcode" in sales_data.csv with missing postal codes(i.e., the values are null) and need to fill the missed the missed postal codes using "postalcode" column of Zipsortedbycitystate.csv file.

Explanation :
The Final Transformation looks as shown in below :

The final job looks as shown in below(doing job is optional for this topic)

Transformation is divided into below sub tasks.
1) Retrieve Data From Flat File ( i.e., Retrieve Data From "sales_data.csv")
2) Filtering the records ( from the source file get only not null valued rows for "postal code").
3) Load to into Relational database ( Total Number of rows in this 2747)
4) Retrieve Data From your look-up file.
5) Resolve Missing zip information.

Download the example : Click Me

NOTE : While running the .ktr file (or .kjb file) in your environment you should specify the flat files(.csv files) location as per your folder structure.

#: I've taken postgresql as output table.

Thank you.

Sadakar
BI developer.
(Pentaho/Jasper/Talend/Kettle).

:-)

Friday, 8 August 2014

OLTP Vs OLAP , differences

Best source I found for quicker understanding..

http://datawarehouse4u.info/OLTP-vs-OLAP.html

:-)

Thursday, 7 August 2014

RICH Bar Chart ( Stacked Bar + Moving Average Trend Lines + Line Chart) Example in Pentaho CDE - Submitt Button Component Explained on CDE dashboard

Hi Guys,

Trend Lines: WIKI
# A trend line is formed when a diagonal line can be drawn between two or more price pivot points.

# They are commonly used to judge entry and exit investment timing when trading securities.It can also be referred to as a dutch line as it was first used in Holland.

# More info at : http://en.wikipedia.org/wiki/Trend_line_%28technical_analysis%29

I have tested 3 kinds of trend lines support with Community Dashboard Editor.They are
i) linear :
ii) moving-average and
iii) weighted-moving-average.

For more info about types refere this : http://www.ehow.com/list_7255661_types-trend-lines.html

Example Developed on :
C-Tools of 14.06.18 version, foodmart database of postgresql(A jasper server sample database which follows star schema model).

Focused on Core part of the Example in this post.

Sample Query:

SELECT
    to_char(t.the_date,'mm-DD-YYYY') Date,
    SUM(sf7.unit_sales) UnitSales,
    SUM(sf7.store_sales) StoreSales,
    SUM(sf7.store_cost) StoreCost
FROM time_by_day t INNER JOIN sales_fact_1997 sf7 ON t.time_id=sf7.time_id
WHERE
    to_char(t.the_date,'YYYY-MM-DD')>='2012-01-01'
    AND
    to_char(t.the_date,'YYYY-MM-DD')<='2012-01-07'
GROUP BY t.the_date
ORDER BY t.the_date

Sample output:

date           unitsales    storesales    storecost
01-01-2012    348.0000    706.3400    280.4990
01-02-2012    635.0000    1304.5300    525.8396
01-03-2012    589.0000    1294.1200    515.2609
01-04-2012    20.0000            42.8700            17.6873
01-05-2012    966.0000    1987.1900    809.6743
01-06-2012    993.0000    2162.3400    864.9502
01-07-2012    1265.0000    2696.6100    1078.2984

Note that column indexes start from 0 and ends with n-1 i.e., here it takes 0,1,2,3
Converting above resultsetinto below points on the chart(RICH BAR CHART)
X-axis : data values
Left Side Y-axis : Stacked bars with UnitSales & StoreCost
Right Side Y-axis : StoreSales Line Chart.
Moving average Trend lines for : Left Side Y-axis measures(i.e., to UnitSales & StoreCost).

On the chart component you need to give below properties.

For Line Chart :
Plot2 : True
Plot2ColorAxis:2
Plot2Series:storesales (making 3rd column as line chart - i.e, 2nd indexed column)
PlotFrameVisible: false

For Stacked :
stacked :true

For TrendLines of unitsales & storecost:
trendType: moving-average
NOTE : other trendTypes are linear and weighted-moving-average

(NOTE :
1) I've focused on major properties in this post.. download the example and debugg for other properties set. for instance, extension points and colors.
2) Also, I have not discussed about parameterisation of this example in this post.. you can find how from_date & to_date implemented on example by downloading.
)

How submit button works ?

NOTES:
1. For the 1st time when dashboard loads, all the components on the dashboard should take default values.

2. Next time, when you select other inputs on the dashobard, the components have to stop it's loading for each of the input selection.(the blinking).

3. After selecting new inputs & after clicking "Submit" button only you the components on the dashboard should load.

How one can accomplish this on CDE Dashboards....
1) Let's assume you have two date parameters (of Generic -> Date Parameters).
    param1_FromDate & param2_ToDate

2) Let's say the corresponding Date Picker Selects (of selects -> Date input components).
    select1_FromDate & select2_ToDate

3) Usually we set listeners for these Date selects.. Do not do this (i.e., don't make your Date selects to listen the param1_FromDate & param2_ToDate parameters). But give the Parameters to the Date selects.

4) Let us take two more simple parameters .. Let's say param1 & param2 ( Generic -> Simple parameter).

5) Now on your chart component
   Give parameters as : param1_FromDate & param2_ToDate
   Give Listenrs as : param1 and param2

6) Take a button component from Others of Components section and give place holder for it(i.e., htmlObject to keep this button).
   and give Label name as : Submit

7) Write below code in the Expression area of button component.

Code:
function f(){

    Dashboards.fireChange('param1',param1_FromDate);
    Dashboards.fireChange('param2',param2_ToDate);

    alert("param1_FromDate="+param1_FromDate);
    alert("param1_ToDate="+param2_ToDate);
    alert("param1="+param1)
    alert("param2="+param2)


}

8) Alerts for testing purpose. In the above code capturing the acual date parameters values into general parameters.

9) 3 & 5 points after this code will work (i.e., when previewing the dashboard)..

Download a sample Example here :

Click Me

Sample output

Sadakar Pochampalli
BI developer.

Navigation Menu Component Explained in Pentaho CDE

Hi Guys,

The below points explains you how Navigation Menu Component is useful for Dashboards.

NOTES from Plugins Documentation :

1) Navigation Menu component generate a menu.
2) It allows the user to navigate through the solution folders.
3) When a template is not present in the selected folder, the default-dashboard-template is presented. (see template documentation)
4) The menu items are build dynamically using the index.properties present in each solution folders.
5) To hide undesired folders edit index.properties and set the property "visible" to false.
Options

All the time client may not wish to go back to the Repository for dashboard solutions & check out.. instead this component makes him/her job easy.

How to implement ?

1) Layout section : Row ->Column(Col1[htmlObject]).
2) Components sections : Others-> Navigation Menu Component.
3) Give the basic properties to it.

Name : NavigationMenuComponent
htmlObject : Col1

Sample output:

Observations :
1) This output is with blueprint mode of dashobard.
2) Observed a small issue with bluepint mode. i.e., there is slight less width each folder name.

:-)

Dual Level Pie Chart in Pentaho CDE - Example Explained

Hi Guys,

This post will tech you how to create Dual level pie chart.

Example developed Environment :

1) C-Tools of 14.07.29 version on 5.1.0 Server.
2) Foodmart database of Postgresql(A jasper server example database).

Step 1 : LayOut section

1) Row ->Column("Col1" is the htmlObject-> Bootstrap Panel
2)
Bootstrap Panel:
Name : Panel2, Corners: Simple & Panel Style : Primary
Panel Header :
Name: Panel2_Header, Corners: Simple & Text Align: Center
Add html : HTML: <b>Dual Level Pie Chart Example in Pentaho CDE</b> , Font size: 18
Panel Footer : Remove it.

Step 2 : Data sources section
1) Give all the properties
Name : query2
Driver : org.postgresql.Driver
UserName/Password: postgres/postgres
URL : jdbc:postgresql://localhost:5432/foodmart
Query :
SELECT
        distinct
                gender,
                member_card AS card,
                sum(total_children) AS TotalChildern,
                sum(num_children_at_home) AS childernathome
FROm customer
group by gender,member_card
order by gender,member_card

Query Sample Output:
gender    card    totalchildern    childernathome
F    Bronze    6626        1278
F    Golden    2012        1529
F    Normal    3025        1048
F    Silver    1196        377
M    Bronze    6570        1288
M    Golden    2029        1511
M    Normal    3062        945
M    Silver    1210        399

3) Components Section
1) From Charts ->Select Pie Chart
2) Give all the required properties
Few of the major properties are:
Name : DualPieChart2
Title: Total Children Vs Children At Home
Datasource : query2
Colors : #5F9EA0,#6495ED,#006400,#483D8B
Html Object : Panel2_Body
PreExecution:

function f() {
    $.extend(this.chartDefinition, {

     // Data source
    crosstabMode: false,
    readers: ['gender, card, childernathome, TotalChildern'],

    // Data
    dimensions: {
        // Dimension bound to "dataPart" is hidden by default
        gender: {isHidden: false},
        // Sort brands
        card: {comparer: def.ascending},
        // Notice the currency sign and the /1000 scale factor (the comma beside the dot).
        //sales: {valueType: Number, format: "¤#,0,.0K"}
    },

    // Visual Roles
    visualRoles: {
        // Chart
        dataPart: 'gender',

        // Main pLot
        value:    'TotalChildern',
        category: 'card'
    },

    // Plots
    plots: [
        {
            // Main plot (outer)
            name: 'main',
            dataPart: 'F',
            valuesLabelStyle: 'inside',
            valuesOptimizeLegibility: true,
            slice_innerRadiusEx: '60%',
            slice_strokeStyle:   'white'
        },
        {
            // Second plot (inner)
            name: 'inner',
            type: 'pie',
            dataPart: 'M',
            valuesLabelStyle: 'inside',
            valuesOptimizeLegibility: true,
            slice_strokeStyle: 'white',
            slice_outerRadius: function() {
                return 0.5 * this.delegate(); // 50%
            }
        }


        ]
    });
}
Save your dashobard and preview it in New Window.

Core Part :
Explained at this thread in forum :
http://forums.pentaho.com/showthread.php?167447-CCC-Charts-Outer-Ring-to-donut-pie-%28Inner-ring-to-donut-pie%29

NOTES :
1) For Dual level pie chart your query result set should be categorised as shown in sample result set.
2) In the PreExecution code, in plots sections you are dividing it to two plots.
3) In the same way, we can also implement tri level pie charts, double dual level pie charts by dividing the plots(Better option
is going with Sunbrust chart in this case).

Reference :
http://www.webdetails.pt/ctools/ccc.html

Why to wait ??? Download Example here : Click Me (Note that to run all my blog examples you should have foodmart database running on postgresql server).

Sample output 1:

Sample output 2:

Mobile Output Sample :

:-) For demo's on free open source CDE dashboards contact me at my mail ID:-)

Tuesday, 5 August 2014

New Mobile based Product Sales Dashboard 2 Example in Pentaho CDE

Hi Guys,

New Mobile Product Sales Dashboard Example in Pentaho CDE

High lights of the Dashboard:
1) Mobile Compatability.
2) Visualize high level decision making numeric values.
3) Visualize single percentage value using ivy gauge components.
4) Navigation to Paranet Dashobard & Next Dashboard.
5) Dynamic with start date & end date parameters.
6) Export chart to image.
7) HTML symbols for Panel headings (Quick glance instead textual info).
8) Dashobard background.

Contact me for demo's :-)

Download this New Mobile Product Sales Dashboard-1
New Mobile Product Sales Dashboard-2
New Mobile Product Sales Dashboard-3

Thank you.
Sadakar
BI developer.

New Features in 14.06.18 & 14.07.29 - Quick Introduction of Bootstrap Column Properties & Bootstrap Panlel in LayOut section of Pentaho CDE

Hi Guys,

Guys, who work on Pentaho C-Tools should be up to date yourself on it's new releases. The latest stable version of C-Tools in this week is : 14.07.29. (blog written date : 4th Aug 2014)

I don't remember exactly, but Pedro somewhere on social networking has informed the community that the C-Tools are up-datable over the time for every 6 weeks.

This post will cover my sample work out on 14.06 & 14.07 releases especially about the column properties & built in bootstrap panel.

These new features will reduce writing Bootstrap html code in HTML section either for rows or columns & increase the development speed of project.

We get information like below for columns with bootstrap latest releases..

Features Bootstrap 3 Grid System	Extra small devices Phones (<768px)	Small devices Tablets (≥768px)	Medium devices Desktops (≥992px)	Large devices Desktops (≥1200px)
Max container width	None (auto)	`750px`	`970px`	`1170px`
Class prefix	`.col-xs-`	`.col-sm-`	`.col-md-`	`.col-lg-`
Max column width	`Auto`	`~62px`	`~81px`	`~97px`
Gutter width	`15px` on each side of a column (i.e. `30px`)

Bootstrap is of 12 columns based grid system ..and the above table implementation in CDE is similar to below
i.e.,
Large Devices =3
Medium Devices=6
Small Devices=9 or 12
Extra Small Devices=12

14.07 added bootstrap panel concept in it's Layout section & it can be implemented as shown in below image.

Notes:
# Find Bootstrap Panel icon on the Layout Structure tool bar.
# Click on it (for a column) and it'll be having 3 sections. They are Header, Body & Footer.
# Give all the properties(for instance, name, height, corners and etc).
# Similarly for the Body & Footer.
# You can remove any of the 3 sections for Bootstrap Panel.. highlight the section & click on X mark on the tool bar of Layout structure.

Sample Layout Design Download : Click Me
Deployment :
1) Your environment should be of 14.07.29 version of C-Tools or later before deploying this example.
2) Pentaho Server version : 5.x version.
3) Upload example using Uploader utility of server.

Sample output:
Extra Large Devices :