Pentaho Tools :

Pentaho C-Tools(CDE,CDF,CDA),Pentaho CE & EE Server,OLAP-Cubes,Analysis using Pivot4J, Saiku Analytics, Saiku Reporting, Ad-hoc Reporting using Interactive Reporting Tool,Dashboards,Reports using PRD, PDD,Data Integration using Kettle ETL,Data Mining usign WEKA,Integration of Servers with Databases,Mobile/iPad compatible Dashboards using Bootstrap Css,Drilldown dashboards,Interactive Dashboards

Monday, 2 March 2015

Pentaho Data Integration : Basics-3 : Parameters in Pentaho Kettle ETL

In this post we will see the basics of parameters in Pentaho Data Integration(a.k.a Kettle ETL)

Software Setup for this tutorial is :

Kettle ETL - Spoon : 5.2.0.0 CE


Quick navigation steps : 

   Parameters
    1. How to define ?
      • In transformation ?
      • In Job ?
    2.  Passing values to parameters ?
      • From GUI of transformation or job.
        • How to assign values to parameters ?
        • How to access parameters ? Syntax.
      • From Command Line
    3. Example 
      1. Parameters usage in a Transformation (Explained GUI and Command line in detail). 
      2. Parameters usage in a Job.

Parameters:

  • Parameters are local variables to the transformation or to the job.
  • How to Define ?

    •  In Transformation :  
    •  Right-click on the transformation workspace and select Transformation settings from the context menu (or just press Ctrl-T), then click on the Parameters tab.
                               Image-1: Setting parameters in Transformation 
           
    • In Job: 
    •  Right-click on the Job workspace and select Job settings from the context menu (or just press Ctrl-J), then click on the Parameters tab.
                               Image-2: Setting parameters in Job 
                              

Note:

  • If the name of the parameter(named parameter) and  variable is same then parameter executes first  
  • B'z parameters are local to the transformation or job whereas variables are like global to transformations or jobs.
  • Parameters are variables(but local to transformation or  job)

Passing Values to Parameters : 

There are two ways to pass values for parameters

1) FROM GUI  of transformation [Image1 & Image 2 gives GUI way of passing values from parameters]

  • How to assign values to parameters ?

    • It offers setting of default value & description for the parameter
                      Parameter                                  Default Value                            Description
                       INPUT_FILE                           E:\Test\HOST\file1.txt                     Input File 1
                       MASTER_HOST                192.168.1.3                            Master host name

  • How to Access parameters defined & assigned in transformations or jobs? 
    • Using ${Parameter_Name}  
      • Example :
        ${ MASTER_HOST }

 2) FROM command line 


Pan is for 
Transformations
Kitchen is for
Jobs
NOTE :
Windows requires you to use quotes around the parameter otherwise the equals sign is
 treated as a space by the command interpreter

Listing parameters on command line for Transformation 
Win
>pan.bat /file:<directory>\<transformation name>  -listparam
>pan.bat /file:”D:\tmp\foo.ktr” -listparam
Linux
sh.pan.sh -file=<directory>/<transformation name> -listparam
user@host:$ sh pan.sh -file=/tmp/foo.ktr -listparam

OUTPUT
Parameter: MASTER_HOST=, default=localhost : The master slave server hostname to connect to
Parameter: MASTER_PORT=, default=8080 : The master slave server HTTP control port

Listing parameters on command line for Job
Win
>Kitchen.bat /file:<directory>\<transformation name>  -listparam
>Kitchen.bat /file:”D:\tmp\foo.kjb” -listparam
Linux
sh.Kichen.sh -file=<directory>\<transformation name> -listparam
user@host:$ sh Kitchen.sh -file=/tmp/foo.kjb -listparam

OUTPUT
Parameter: MASTER_HOST=, default=localhost : The master slave server hostname to connect to
Parameter: MASTER_PORT=, default=8080 : The master slave server HTTP control port


Passing parameter values from command line for Transformation 
Win
c:\> pan.bat -file:”D:\tmp\foo.ktr” “-param:MASTER_HOST=192.168.3.7 
 -param:MASTER_PORT=5432
Linux
c:\> pan.sh -file=/tmp/foo.ktr -param:MASTER_HOST=192.168.3.7 -param:MASTER_PORT=5432

Passing parameter values from command line for Transformation 
Win
c:\> Kitchen.bat -file:D:\tmp\foo.ktr -param:MASTER_HOST=192.168.3.7 
 -param:MASTER_PORT=5432
Linux
user@host:$ sh Kitchen.sh -file:/tmp/foo.ktr -param:MASTER_HOST=192.168.3.7 -param:MASTER_PORT=5432

NOTE :

  •   In above syntax's tested only for "Windows" machine and assuming the linux commands will work. 
  • Use double quotes when there is a space in the folder names.

 Example :1) Parameters usage in a Transformation


Problem statement :
Connect to "foodmart" database of postgresql server by passing HOST and PORT as parameters.

First we will be seeing the usage of parameters in a transformation. 
1. Defining parameters
a) Ctrl+N to design a new transformation and save it then Press Ctrl+T to get transformation properties to define parameters.
b) Give the parameter names ( eg : DEV_DB_HOST and DEV_DB_PORT)
c) Drag and drop Table input step from "Input" category.
BEST PRACTICE :
a) Always give parameter or variable names in CAPITAL letters.
b) By default Kettle converts all variables in to capital letters 

2 Accessing parameters in "Table input" step as variables(b'z parameters are local variables to.
a) Double click the "Table input step" and click on New button to connect postgresql foodmart database.
b) In place of "HOST Name" and  "Database Name" provide parameters with
     ${DEV_DB_HOST} and ${DEV_DB_PORT}
c) Test the connection by clicking "Test" button.
d) Write your queryin "SQL" area.

 HINTS:
1) Once the parameters creation is done, those parameters will be available in the drop down of variable selection.
2) You can use Ctrl+Space button to select from the drop down instead typing manually.

 3)Executing from GUI
a) Save your transformation and click on "Run" button which will open "Execute a transformation" window.
b) In window you can find "Parameters" section and it'll display the already set parameters list with it's default value.
c) You can specify the parameters values here if you want to pass any other. If not the default values will be assigned. 
d) Check the log of transformation for success information or for any error messages in "Execution Results" tab.  

4) Command Line - Listing parameters
a) Refer the syntax's and examples explained in this post (at top).
b) Let us see what are the parameters we are taking in the transformation from command prompt.
c) We use "-listparam" to list all the parameters that is taking by transformation with it's name, default value and description.
d) Example of this transformation : 
 i) Navigate to the Pentaho Data integration installation folder and find Pan.bat file. (Pan is a transformation tool to run it from command line).
ii) Now issue this command : >Pan.bat /file:E:\ParametersDemo.ktr -listparam
iii) Check the output on the command which looks like below
Parameter:DEV_DB_HOST=,default=localhost:Postgres sql installed on this host
Parameter: DEV_DB_PORT=,default=5432:Postgresql port number

NOTE :
If there is any space in your folder names , enclose the file path in double quotes.
Example : >Pan.bat /file:”E:\tmp folder\foo.ktr” -listparam  ( there is a space b/w tmp and folder)
5) Command Line - Executing transformations from command line by passing parameters
a) Once the list parameters test is completed let's pass the parameters from command line.
b) Command : 
>Pan.bat /file:E:\ParametersDemo.ktr  "-param:DEV_DB_HOST=localhost"  "-param:DEV_DB_PORT=8080" 
c) NOTE : Enclose parameters in double quotes for windows command line execution. 

Example : 2 : Parameters usage in a Job ( Pass parameters from job to transformation)

GUI Execution : 

1) In this example we'll see how to pass parameters from a job to transformation. 
2) Press Ctrl+Alt+N to create a new job and save it 
3) Drag start, tansforamtion, success job entries and connect them as shown in below image.

4) Press Ctrl+J to open job settings and go to Parameters tab and give parameter names 
5) Now, double click the transformation entry to locate the transformation(the one we developed in example-1)
NOTE : Remove the parameters in transformation in its setting tab - This is optional. 

6) Keep on the same window opened and go to "parameters section"  and provide parameters that need to pass down to the job. 


NOTE : 
1) You can create N number of parameters in job but can send limited (As per your requirement). 
2) Will update on "Stream column name" and value  in soon. 

7) Now save your job and run the the job. 

Command Line Execution : 

Listing parameters

Passing parameters from command line to the job.

Download Section : 

References :

http://wiki.pentaho.com/display/EAI/Named+Parameters