Pentaho Tools :

Pentaho C-Tools(CDE,CDF,CDA),Pentaho CE & EE Server,OLAP-Cubes,Analysis using Pivot4J, Saiku Analytics, Saiku Reporting, Ad-hoc Reporting using Interactive Reporting Tool,Dashboards,Reports using PRD, PDD,Data Integration using Kettle ETL,Data Mining usign WEKA,Integration of Servers with Databases,Mobile/iPad compatible Dashboards using Bootstrap Css,Drilldown dashboards,Interactive Dashboards

Thursday 25 September 2014

Error while saving & running the transformations/jobs in Kettle : Unexpected problem reading shared objects from XML file : null


What are shared objects in Kettle ?

The below pentaho wiki & info center gives the brief of the Shared object in Kettle. 

http://infocenter.pentaho.com/help/index.jsp?topic=%2Fpdi_user_guide%2Fconcept_pdi_usr_changing_the_pdi_home_dir.html

http://wiki.pentaho.com/display/EAI/.17+Shared+Objects



From the Menu bar


1) Click on Edit
2) Click on "Edit the kettle.properties file" to open it's properties.
3) Give value=50 for KETTLE_SHARED_OBJECTS variable
   i.e., KETTLE_SHARED_OBJECTS=50
4) Go back to the console and try saving the job and run it, now you will not get the error. 

Problem Image :



Solution Image :


Reference : 

http://forums.pentaho.com/showthread.php?93473-Unexpected-problem-reading-shared-objects-from-XML-file-null

Kettle : Stream Lookup Step explained with a Sample Transformation in Pentaho Kettle - Pentaho Data Integration

Hi Guys,

This simple transformation explains the "Stream Look Up" step example in Pentaho Kettle as part of my learning & documenting here for community developers.

In next articles I'll be sharing more interesting topics with End to End examples.

Example developed on : 
5.1.0 Kettle
Reference :
 D:\pdi-ce-5.1.0.0-752\data-integration\samples\transformations\Stream lookup - basics.ktr

Scenario : 
There are 2 tables lets say Employee and Department and the data is as follows in both of the tables.
There is a common field lets say DeptID_Emp and DeptID_Dept in both the tables. 

Lookup the DeptID_Emp field with DeptID_Dept and get the records of all employees with department names.

Employee Table:
EmpID    Name    DeptID_Emp
110    Sadakar    10
111    Hasini    10
112    Dolly    20
113    Kutti    20
114    Jikky    30

Department Table:
DeptID_Dept    DeptName
 10    Mathematics
 20    Computers

1. Drag and drop Data Grid step (Input->Data Grid) and Meta(Column names) and Data(Insert the employee data shown above) name it as "Employee".

2. Drag and drop Data Grid step (Input->Data Grid) and double click the step to open it's properties.
    Give meta data (i.e., column names with length) Insert the data shown in Department table.

3. Drag and drop Stream Lookup step (Look Up -> Stream Lookup).
   Connect Employee -> Stream Lookup and Department -> Stream Lookup
and open the properties of Stream Value Lookup.
 As shown in below figure set the configuration.

Lookup Step : Department
Filed : DeptID_Emp ( Lookup field from the source stream).
LookupFiled : DeptID_Dept (lookup the DeptID_Emp field with DeptID_Dept).

i.e., matching the values of the fields ( Internally it'll compared with = operator).

Get Fields : If you click on it, it will fetch all the fields from source stream (here it is Employee table) on.
Get Lookup fields: It'll fetch all the rows from lookup file(here it is Departmet table).

We can also provide default values to the fields that are coming from lookup file/table. 


4. Connect "Stream lookup" to a dummy step (Flow->Dummy) and have a preview.(right click on the dummy step and see the preview).

What the Stream Lookup do here in the transformation is : It'll look for a match from the source stream. If the values matches from the lookup file then associated department name is added to the dummy step else it'll take NULL value.

If we add select values step we can get the desired fields with new names.

Output is shown below image.

The sample transformation shown in below image :



Download the .ktr file here : Click me

:-)






Wednesday 17 September 2014

Migrating data from Oracle database to Postgresql database using Pentaho Kettle


This post will talk about migrating data from Oracle database to Postgresql database.

The same can be doable from one database to another database.(i.e., for example MS-SQL server to postgreSQL).

Pentaho has it's in built wizard tools to Migrate data from one database to another another database.

Find the below steps how we can use this in-buit tool in migrating.

1) Create source database connection and target database connection.

i.e., for example oralce connection as source and postgresql connection as target.

Go to Tools -> Wizard -> Create database connection.

Repeat the same for postgresql connection.

2) Go to Tools -> Wizard ->Copy Tables . Find the below images.





3) It'll create a job and “N” number of transformations based upon the number of tables in oracle database.

4) Run your job.. That's all we have done.

Check the description & content of the tables in postgres.


References : 

https://wiki.postgresql.org/wiki/Migrating_from_one_database_to_another_with_Pentaho_ETL


http://wiki.pentaho.com/display/COM/Using+the+Copy+Table+Wizard

Tuesday 16 September 2014

Toggle between two Divs for Pentaho CDE Charts

This post will cover the toggling among the charts.

i.e., Share one place holder for multiple charts by providing a link or a button.

Example developed on :

C-Tools of 14.07.29, foodmart of postgresql, pentaho 5.0.1 CE stable.

Step 1 :  Layout section.
1) Save the dashboard in bootstrap mode.
2) Row ->Column->Html

In Html write below code :

<p><a href="#" id="link">Show B</a></p>
<div id="a"></div>
<div id="b"></div>


<script type="text/javascript">
var $divA = $('#a'),
    $divB = $('#b'),
    $link = $('#link');

// Initialize everything
$link.text( 'Pie' );
$divA.hide();

$link.click(function(){

  // If A is visible when the link is clicked
  // you need to hide A and show B
  if( $divA.is( ':visible' ) ){
    $link.text( 'Pie' );
    $divA.hide();
    $divB.show();
  } else {
    $link.text( 'Bar' );
    $divA.show();
    $divB.hide();
  }

  return false;
});
</script>
 

Step 2: Data Sources section

1) Give all the connection details.

Name : query1 
 URL : jdbc:postgresql://localhost:5432/foodmart
 Driver : org.postgresql.Driver
Username/Password : postgres/postgres
Query :

SELECT * FROM
(
SELECT
        DISTINCT brand_name AS "Brand Name",
       SUM(unit_sales) AS "Sales"
FROM product p
INNER JOIN sales_fact_1997 sf7
ON p.product_id=sf7.product_id
INNER JOIN time_by_day t
ON sf7.time_id=t.time_id
WHERE
(
to_char(t.the_date,'YYYY-MM-DD')>='2012-01-01'
AND
to_char(t.the_date,'YYYY-MM-DD')<='2012-01-07'
)
GROUP BY "Brand Name"
ORDER BY SUM(unit_sales) DESC
limit 5
)table1

Step 3 : Components section
1) Take  pie  chart & bar chart.
2) Set all the properties like name, htmlObject (for pie take "a" as htmlObject and for bar chart "b" as htmlObject).
3) Save your dashboard and see the preview.


Sample output:

Image 1 : Pie Chart.  



 Image 2 : Perform click action on Pie you will get bar chart in place of pie chart that means you are toggling b/w two chart (i.e., Single place holder is sharing by two charts and the logic of two divs applied for CDE charts).



Download example here : Click Me

References :
1) http://jsfiddle.net/QAxgD/
2) http://stackoverflow.com/questions/18110320/toggle-between-two-divs
3) http://forums.pentaho.com/showthread.php?170449-How-To-Toaggle-Between-Chart-amp-Grid



Friday 12 September 2014

Show pecentage of stacks along with Value of a Stacked Bar Chart in Pentaho CDE

This is a useful tip from Leo on Pentaho Forum,

For your Bar Chart set below properties.

Stacked =True
valuesVisible= True
valueMask = {value}{(value.percent)}

To popup the value of a stack or percentage of a stack you need to write below code in clickAction and make clickable is true.

function(scene) {
    var pctVar = scene.vars.value.percent;
    
    alert(pctVar.label);
}

 

Sample output Tested :



Query output should be like this for a stacked bar chart : 


Refer below link for more information : 

http://forums.pentaho.com/showthread.php?170389-Show-values-as-percentage-on-stacked-bar-chart

http://jsfiddle.net/duarteleao/e2Qfd/



Thursday 11 September 2014

Any chart responsive code in PreExecution in Pentaho CDE



function f(){
var myself = this;
  // Set initial width to match the placeholder
  myself.chartDefinition.width = myself.placeholder().width();

  // Attach the resize handler only on the first execution of the chart component
  if (!this.resizeHandlerAttached){

    // Ensure render is only triggered after resize events have stopped
    var debouncedResize = _.debounce(function(){

      // Show chart again.
      myself.placeholder().children().css('visibility','visible');

      // Change chart width
      myself.chartDefinition.width = myself.placeholder().width();
      myself.render( myself.query.lastResults() );
    }, 200);

    // Attach resize handler
    $(window).resize(function(){

      // Only trigger resize if the container has changed width
      if ( myself.chartDefinition.width != myself.placeholder().width()){

        // Temporarily hide chart so that overflow does not happen
        myself.placeholder().children().css('visibility','hidden');

        // Trigger the resize with debounce
        debouncedResize();
      }    
    });

    this.resizeHandlerAttached = true;
  }
  
}

Tuesday 9 September 2014

Dynamic Dashboard Example using MongoDB - Parameterised Dashboard Example in pentaho CDE with mongoDB


This post will cover below topics

MongoDB:

1. Creating mongoDB collection from flat file data.
2. Test the Collection Content

PDI(Kettle)

3. Querying the mongoDB collection in Kettle to fetch the data.
   ( Designing Transformation to get the required result set).

Pentaho CDE:

4. Creating a dynamic Pie Chart using in CDE with Kettle transformations as data sources.


Software Ready :

1) MongoDB 2.6 CE
2) PDI (Kettle) - 5.0.1- stable CE
3) Pentaho BA Server - 5.0.01- stable  CE
4) C-Tools -14.07.29 - Bootstrap supported .
5) Browser - Google Chrome / Mozilla Firefox


Example : 

Show the top counts of NextURL field for the given URL's.
count, NextURL and URL are fields in the collection.

Categories : NextURL, Measure: count and parameter is : URL.


I've written basic of this article in my previous post which is a static report. Click here for the similar article without parameters.


MongoDB:

1) Download and install mongo DB : Click here
2) Create a Collection called "PageSuccession"  in Demo Database : Click here for collection creation with mongoDB
3) Test the collection whether the content exists or not. 
  >db.PageSuccession.find().pretty() 

PDI(Kettle)

Write the transformation in Kettle as shown in below figure. 

 Let's say the transformation name is : DashboardWithMongoDBParameters.ktr
This transformation will give you the below result set as per the example requirement.
(Top 10 Next URL's with count and url as parameter in transformation).
NextURL    Count
/demo    26205
/feeds/press    19601
/home    7369
/download    6419
/products    5164
/product/product2    4098
/product/product3    4047
/product/product4    3607
/product/product6    2759
/product/product1    2270
NOTE : default parameter is given to get the above result set. (Default parameter value is : --firstpage--)
At the time of job/step execution, if we give different value for parameter we will get data related to that parameter..
Let us say our parameter(Described in next steps how to create) is "param_url" and now the value given is : /about and the output will be differ.

NextURL    Count
/about    593
/team    111
/contact    108
/about/customers    55
/product/product2    47
/product/product12    35
/product/product3    33
/news    32
/products    26
/download    24

Quick Image for better understanding:

 MongoDB Input:
Configure Connection :  Host : localhost or give the ip address of mongoDB installed  
                                            machine. 27017 is the port number for mongoDB.
 Query                            :  { url : "${param_url}" } where param_url is the parameter defined.
Fields                              :  check the single output jSON field. 

JSON input:
Always use Json input with mongoDB input. B'z we write json expressions on mongoDB documents. Direct way of mongoDB querying is not allowed.
Sample Rows: This will limit the number of rows from the previous step. It'll take range values. For example : 1..10 or 1..20

Click the get fields

Creating parameter for transformation
1) Double click on the canvas to get the transformation properties. 
2) Click on Parameters Tab and define a parameter called "param_url" with --firstpage-- as it's default value. 
3) On run time you can give different value for param_url parameter.
4) Value of the parameter will be replaced in the Query written in MongoDB input and gives you the required result set. 

Pentaho CDE

 Now let's jump into Dashboard creation  with Pentaho CDE.

1) Prepare layout to keep parameters and pie chart. 
2) Data sources section.

Parameter is url field values from collection. So to feed the selection we again need to write a simple transformation which fetch the field values of url.

Sample transformation can be designed as shown below figure to get only url field.

Let's say the transformation name is : DashboardWithMongoDBParameters2.ktr
The above transformation will give the below output( part of the output is shown in next lines).
URL
--firstpage--
/about
/about/awards
/about/customers
/ad/easy
/ad/lead
/ad/save
/ad/survey
/ad/training
/analyst_perspective
/buy
/careers
/careers/account
/careers/capetown
/careers/engineer
/careers/frankfurt
/careers/london
/careers/manager
/careers/munich
/careers/paris
/careers/sales
/careers/syndey
/confirmed/consulting
/confirmed/contact
/confirmed/demo
/confirmed/sales
/confirmed/thankyou
/contact
/customer
/demo
/docs
/docs/doc1.pdf
/docs/doc10.pdf
/docs/doc11.pdf
/docs/doc12.pdf
crate a simple parameter in CDE and name it as param_url and also create a selection and give parameter & listener as param_url for it and give place holder for it (i.e., html Object).

Go to Data sources  in CDE and create two Kettel Queries 
one for getting result to plot the data on pie chart 
another to feed the input selection. 
For the pie chart query give 
locate the transformation uploaded to the folder in the server and then 
Variables : param_url , param_url as arg and value.
Parameters : param_url & param_url as arg and value. 
Kettle Step Name : Sort rows 2 (the step which gives the result set for chart). 


For input query give : locate the transformation file uploaded in folder and give the step name ( for this : Select values)
Check the output of queries using CDA editor.. 
Save the dashboard and preview it..
Preview 1 : With --firstpage-- input value for URL

Preview 2 : With /carres/paris input value for URL
Download the example : Click Me..!!!


Readers of this post encouraged to add your suggestions , feed back & additions to this post. drop your comments 
Sadakar
BI developer 

 

 











Wednesday 3 September 2014

BarChart Example in Pentaho CDE using mongoDB(NoSQL database)


This articles is for beginners in MongoDB+Pentaho Kettle+Pentaho CDE.

Aim of this post is : 
Creating a Bar Chart using MongoDB in Pentaho CDE (A static bar chart, not parametrized) 

  • There is no direct way to connect to mongoDB in pentaho CDE.
  • Using “Kettle Queries” option in pentaho CDE we can get required result sets for visualization.
Let's assume you have a collection called “PageSuccessions” in “Demo” database in mongoDB server.

You can refer below articles from pentaho WIKI to Write, Read, creating Reports in PRD using mongoDB.

http://wiki.pentaho.com/display/BAD/MongoDB

Let's assume you have below documents in “PageSuccessions” collection.

MongoDB Preparation for this example is from :
http://wiki.pentaho.com/display/BAD/Write+Data+To+MongoDB

In mongo shell let's have a look at what is there in “PageSuccessions” by giving below command.

> db.PageSuccessions.find().pretty();
{
    "_id" : ObjectId("5404609c5ae882923576f854"),
    "key" : "--firstpage--~^~/about",
    "url" : "--firstpage--",
    "nextUrl" : "/about",
    "Count" : NumberLong(504)
}
{
    "_id" : ObjectId("5404609c5ae882923576f855"),
    "key" : "--firstpage--~^~/about/awards",
    "url" : "--firstpage--",
    "nextUrl" : "/about/awards",
    "Count" : NumberLong(80)
}
{
    "_id" : ObjectId("5404609c5ae882923576f856"),
    "key" : "--firstpage--~^~/about/customers",
    "url" : "--firstpage--",
    "nextUrl" : "/about/customers",
    "Count" : NumberLong(667)
}


Creating Bar Chart – Top 10 url pages

Getting top 10 documents from Kettle.

As shown in figure-1 in you need to write kettle transformation to fetch the top 10 urls.

  • As of 5.0.1 Pentaho kettle release, we can not directly query on mongoDB.
  • We'll take JSON step to get the fields. 
  • MongoDB input component query area supports only JSON query expressions which you can find at http://wiki.pentaho.com/display/EAI/MongoDB+Input
  • From the ETL design at final step you should get the result set as shown below.
Download the ktr file and try to open each step.

URL            Count  
/demo            114747  
--firstpage--            108143  
/download            21583  
/feeds/press            15378  
/products            14686  
/product/product3            14380  
/partners/resell            14316  
/download/download3.zip    13939  
/product/product4            13334  
/buy            10293

Creating Dashboard in Pentaho CDE( A simple bar chart using above query result set)

1) Prepare your layout in bootstrap mode of Dashboard. (Find it in settings).
2) Take a BarChaart component from the Components section.
3) In Data sources section take “KETTLE Queries”.(Upload the ktr file to your dashboard folder using Pentaho User Console).
4) In “Kettle Tran formation file” property locate the uploated ktr file.(for eg: DashboardWithMongoDB.ktr).
5) Give the step name which is giving the top 10 rows(for example : Select values).
6) Check the output of this query usign CDA editor.
7) Go back to the Bar chart and set height, width, data source (the name of data source created in above steps.. for ex: query1 is the name of the kettle data source).

8) Save the dashboard & Preview it. Sample output is shown in figure-2.

IMP NOTE : As of 5.0.1 pentaho release JSON query expressions will give you the result sets  from monogoDB. (Which you can find at mongoDB Input step from BigData Node).



Transformation Sample Design Image:


Bar Chart Sample output Image :

Download Transformation : Click-1
Download Sample Bar Char : Click-2

References : 

1) http://wiki.pentaho.com/display/BAD/MongoDB
2)  http://wiki.pentaho.com/display/EAI/MongoDB+Input#MongoDBInput-queryexamples

Next post : Parametrized dashboard using MongoDB.


Sadakar Pochampalli
:-)