Pentaho Tools :

Pentaho C-Tools(CDE,CDF,CDA),Pentaho CE & EE Server,OLAP-Cubes,Analysis using Pivot4J, Saiku Analytics, Saiku Reporting, Ad-hoc Reporting using Interactive Reporting Tool,Dashboards,Reports using PRD, PDD,Data Integration using Kettle ETL,Data Mining usign WEKA,Integration of Servers with Databases,Mobile/iPad compatible Dashboards using Bootstrap Css,Drilldown dashboards,Interactive Dashboards

Tuesday, 17 March 2015

Pentaho Data Integration(CE): Basics-4 : Variables in Pentaho Data Integration (a.k.a Kettle ETL)

In this post you will find the basic understanding of variables in pentaho kettle ETL.

Ways to define variables in Kettle

  • Variables can be used throughout Pentaho Data Integration, including in transformation steps and job entries.
1) Variable can be defined as parameters(Named parameters)
2) Variables can be defined in Kettle properties file with key value paired syntax
3) Also in ,Previous job or transformation (if  transformation is going to be called from a job)
4) Variables section of the "Execute transformation window" or "Execute job" window when click on Run button.

How to use variables ?

    • Grabbing variables using the Get Variable step  OR
    • By specifying meta-data strings like 
      • ${VARIABLE} - Windows Style (OR)
      •  %%VARIABLE%% - Linux Style

 Refer below links for more information on variables 
1) http://wiki.pentaho.com/display/COM/Using+Variables+in+Kettle
2) http://wiki.pentaho.com/display/EAI/.07+Variables

In addition to above links, below information will be helpful to understand.

Variables in kettle.properties file with an example:

1) Define variables as key value pairs in kettle.properties file.
2) Location of the file in windows is : C:\Users\sadakar.p.SADAKAR_HYD\.kettle & the file name is kettle.properties
 Example : 

3) For example, You want to provide database connection details through variables. Define the configuration details as shown in above image in kettle.properties file.
Connecting to foodmart database of postgresql server
i.e., HOST=localhost
       DATABASE=foodmart
     USER_NAME=postgres

4) Now take a new transformation & get "Table Input" step from Input section category to the canvas
5) Double click to get it's properties and provide the above created variables in the respective placeholders.
6) Use Ctrl+Space button to get the list of all variables. Find the variables coming from kettle.properties and provide them in the text boxes.
6) Test the connection & see the magic..

Variables as parameters :

1) You can also use variables as parameters.
2) Example : select all the employees belonging to a particular country.. i.e., every time you pass a country dynamically you have to the results from employee table.
3) Take a new transformation and save it.. Press control+T to get the properties of transformation and define a parameter. For example : Define "VAR_COUNTRY" as a parameter and provide default value as "Mexico"
4) Now take a "Table input component" and give all the connections of postgres foodmart database and test it.
5) In query area write this select statement :
SELECT * FROMcustomer WHERE country like '${VAR_COUNTRY}'

NOTE : You should check "Replace Variables in Script"

6) Complete the transformation by adding a dummy step to the input step and have a preview.
7) select the dummy step to view it's output in "Execution Results" step. You will find the only "Mexico results "



8) If you pass other parameter value like "USA" when you run the transformation, you will all the result set for "USA" country. 

Variables section of "Execute transformation window" or "Execute job window" when click on Run button. 

Take the same example of above scenario with a small change.. 
Remove parameter from transformation settings and save it. Execute the transformation by clicking "Run button". In "Execute a transformation" window you can find the parameter used in the query. 
Provide the value for that parameter and test the output. 

the scenario is shown in below image.