Pentaho Tools :

Pentaho C-Tools (CDE, CDF, CDA), Pentaho CE & EE Server, OLAP Cubes, Analysis using Pivot4J, Saiku Analytics, Saiku Reporting, Ad-hoc Reporting using Interactive Reporting Tool, Dashboards, Reports using PRD, PDD, Data Integration using Kettle ETL, Data Mining using WEKA, Integration of Servers with Databases, Mobile/iPad compatible Dashboards using Bootstrap CSS, Drilldown Dashboards, Interactive Dashboards

Tuesday, 3 March 2015

Pentaho Data Integration (CE): Basics-5 : Scheduling Community Kettle ETL Jobs in Windows 7 or 8

This post covers the basics of scheduling a Kettle job in a Windows environment using the Windows "Task Scheduler".

Before scheduling jobs or transformations on either Linux or Windows, you should know how to execute them from the command line while passing arguments, parameters, or variables. The overall procedure is:

1) Design and develop the transformation or job.
2) Make the transformation or job dynamic using parameters or variables.
3) Execute the transformation or job from the command line, passing the parameters or variables.
4) Prepare the list of commands in a batch (.bat) file.
5) Schedule the .bat file in Windows using the "Task Scheduler" tool.

Transformations process the rows of a data flow, whereas jobs provide high-level flow control. We will therefore design a transformation to process some data and then call that transformation from a job.
Finally, we will see how to schedule the job in Windows.

Design & develop the transformation and job :

A simple scenario : select the contents of a table from a database and store them in a file. Each run should fetch the contents of a different table, with the table name passed in as a parameter.

Transformation Design : Image-1
Job Design : Image-2

Transformation :
1) Press Ctrl+N to create a new transformation, save it as "Parameters.ktr", and then press Ctrl+T to open its settings.
2) Define a parameter "TABLE_NAME" in the Parameters section and provide a description and a default value.
3) Add a "Table input" step and connect it to the foodmart database by providing the connection details.
4) Write a simple query, for example : SELECT * FROM ${TABLE_NAME}
5) Check "Replace Variables in Script" so that ${TABLE_NAME} is substituted with the parameter value at run time.
6) Add the Dummy and Text file steps as shown in image-1.
7) Execute it from the GUI and from the command prompt, passing different values for the table name.
For example, on the command prompt :

D:\Softwares Archive Installed\Pentaho\pdi-ce-\data-integration>Pan.bat /file:C:\Users\sadakar.p\Desktop\Parameters.ktr "-param:TABLE_NAME=sales_fact


8) Repeat step 7 with a different value (i.e., pass a different table name) on the command prompt.
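
To make the effect of "Replace Variables in Script" more concrete, here is a minimal Python sketch of what the substitution amounts to. It only imitates the idea of replacing ${TABLE_NAME} in the query text; it is not Kettle's own implementation, the function name replace_variables is made up for this illustration, and "store" is an assumed table name.

```python
import re

# Matches ${NAME} placeholders such as ${TABLE_NAME}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def replace_variables(text, variables):
    """Return text with every ${NAME} replaced by variables[NAME].

    Placeholders whose name is not in `variables` are left unchanged.
    Hypothetical helper: it imitates the idea only, not Kettle's code.
    """
    def lookup(match):
        return str(variables.get(match.group(1), match.group(0)))

    return _PLACEHOLDER.sub(lookup, text)


query = "SELECT * FROM ${TABLE_NAME}"

# "region" is the table used in the batch file of this post;
# "store" is an assumed table name used only for illustration.
for table in ("region", "store"):
    print(replace_variables(query, {"TABLE_NAME": table}))
```

Because the value is pasted into the SQL text as-is, each run with a different TABLE_NAME effectively executes a different query.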

Job :
1) Press Ctrl+Alt+N to create a new job and save it as "Parameters_job.kjb", as shown in image-2.
2) Press Ctrl+J to open the job settings and define a parameter named "TABLE_NAME".
3) Connect the START entry -> Transformation entry -> Success entry, as shown in image-2.
4) Double-click the "Transformation" entry and provide the path of the transformation, then go to the Parameters tab and click the "Get parameters" button.
5) Save the job and run it from the GUI and from the command prompt.
6) For example :
D:\Softwares Archive Installed\Pentaho\pdi-ce-\data-integration>Kitchen.bat /file:C:\Users\sadakar.p\Desktop\Parameters_job.kjb "-param:TABLE_NAME=sa

7) If these runs succeed, the job and the transformation are working correctly and are ready to be scheduled.

Preparing a batch file :

cd /d "D:\Softwares Archive Installed\Pentaho\pdi-ce-\data-integration\"
call Kitchen.bat /file:E:\Explore\Kettle\2_WindowsScheduling\Parameters_job.kjb "-param:TABLE_NAME=region" -logfile=E:\Explore\Kettle\2_WindowsScheduling\log.txt
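
If several tables need to be exported, the batch file can hold one "call Kitchen.bat" line per table. The following Python sketch generates such a file's text from a list of table names. It is only an illustration under a few assumptions: the helper name build_batch_text, the per-table log file naming (log_<table>.txt), and the table name "store" are not from the original setup, and the paths are simply the ones used above.

```python
def build_batch_text(pdi_dir, job_file, log_dir, table_names):
    """Build the text of a .bat file with one Kitchen call per table.

    Hypothetical helper for illustration; it only assembles strings
    and does not run Kitchen itself.
    """
    # Switch drive and directory to the data-integration folder first.
    lines = ['cd /d "{}"'.format(pdi_dir)]
    for table in table_names:
        # Assumed naming scheme: a separate log file for each table.
        log_file = "{}\\log_{}.txt".format(log_dir, table)
        lines.append(
            'call Kitchen.bat /file:{} "-param:TABLE_NAME={}" -logfile={}'.format(
                job_file, table, log_file
            )
        )
    # Windows batch files conventionally use CRLF line endings.
    return "\r\n".join(lines) + "\r\n"


batch_text = build_batch_text(
    pdi_dir=r"D:\Softwares Archive Installed\Pentaho\pdi-ce-\data-integration",
    job_file=r"E:\Explore\Kettle\2_WindowsScheduling\Parameters_job.kjb",
    log_dir=r"E:\Explore\Kettle\2_WindowsScheduling",
    table_names=["region", "store"],  # "store" is an assumed table name
)
print(batch_text)
```

The printed text can be saved with a .bat extension and scheduled in the same way as the hand-written file.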


Windows Scheduling :
1) Open "Task Scheduler" in Windows (simply type its name in the search box) and click "Create Basic Task".





7) Finally, click "Finish".
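
As an alternative to the wizard, a task can also be registered with the built-in Windows "schtasks" command. The Python sketch below only assembles and prints such a command. The task name "KettleParametersJob", the batch file name "run_job.bat", the daily trigger, and the 09:00 start time are all assumptions chosen for illustration; none of them come from the setup described above.

```python
import subprocess


def schtasks_create_args(task_name, batch_file, start_time):
    """Return the argument list for a daily 'schtasks /Create' call.

    /TN = task name, /TR = program to run, /SC = schedule type,
    /ST = start time in HH:mm (24-hour) format.
    """
    return [
        "schtasks", "/Create",
        "/TN", task_name,
        "/TR", batch_file,
        "/SC", "DAILY",
        "/ST", start_time,
    ]


# All three values below are hypothetical examples.
args = schtasks_create_args(
    "KettleParametersJob",
    r"E:\Explore\Kettle\2_WindowsScheduling\run_job.bat",
    "09:00",
)

# Show the command as it would be typed at the command prompt.
print(subprocess.list2cmdline(args))
```

On a Windows machine the same argument list could be passed to subprocess.run(args, check=True) to actually create the task. If the path given to /TR contains spaces, it needs additional quoting, so a path without spaces is the simpler choice here.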


At the scheduled time, a command window pops up to show that the execution has started.

In this way, Kettle community edition jobs can be scheduled in Windows 7 or 8.

:-) comments & suggestions are most welcome to improve this article :-)

Thank you for spending your time on this page.