This post will cover below topics
MongoDB:
1. Creating mongoDB collection from flat file data.2. Test the Collection Content
PDI(Kettle)
3. Querying the mongoDB collection in Kettle to fetch the data.( Designing Transformation to get the required result set).
Pentaho CDE:
4. Creating a dynamic Pie Chart using in CDE with Kettle transformations as data sources.Software Ready :
1) MongoDB 2.6 CE2) PDI (Kettle) - 5.0.1- stable CE
3) Pentaho BA Server - 5.0.01- stable CE
4) C-Tools -14.07.29 - Bootstrap supported .
5) Browser - Google Chrome / Mozilla Firefox
Example :
Show the top counts of NextURL field for the given URL's.count, NextURL and URL are fields in the collection.
Categories : NextURL, Measure: count and parameter is : URL.
I've written basic of this article in my previous post which is a static report. Click here for the similar article without parameters.
MongoDB:
1) Download and install mongo DB : Click here
2) Create a Collection called "PageSuccession" in Demo Database : Click here for collection creation with mongoDB
3) Test the collection whether the content exists or not.
>db.PageSuccession.find().pretty()
3) Test the collection whether the content exists or not.
>db.PageSuccession.find().pretty()
PDI(Kettle)
Write the transformation in Kettle as shown in below figure.
Let's say the transformation name is : DashboardWithMongoDBParameters.ktr
This transformation will give you the below result set as per the example requirement.
(Top 10 Next URL's with count and url as parameter in transformation).
NextURL Count
/demo 26205
/feeds/press 19601
/home 7369
/download 6419
/products 5164
/product/product2 4098
/product/product3 4047
/product/product4 3607
/product/product6 2759
/product/product1 2270
/demo 26205
/feeds/press 19601
/home 7369
/download 6419
/products 5164
/product/product2 4098
/product/product3 4047
/product/product4 3607
/product/product6 2759
/product/product1 2270
NOTE : default parameter is given to get the above result set. (Default parameter value is : --firstpage--)
At the time of job/step execution, if we give different value for parameter we will get data related to that parameter..
Let us say our parameter(Described in next steps how to create) is "param_url" and now the value given is : /about and the output will be differ.
NextURL Count
/about 593
/team 111
/contact 108
/about/customers 55
/product/product2 47
/product/product12 35
/product/product3 33
/news 32
/products 26
/download 24
/about 593
/team 111
/contact 108
/about/customers 55
/product/product2 47
/product/product12 35
/product/product3 33
/news 32
/products 26
/download 24
Quick Image for better understanding:
MongoDB Input:
Configure Connection : Host : localhost or give the ip address of mongoDB installed
machine. 27017 is the port number for mongoDB.
Query : { url : "${param_url}" } where param_url is the parameter defined.
Fields : check the single output jSON field.
JSON input:
Always use Json input with mongoDB input. B'z we write json expressions on mongoDB documents. Direct way of mongoDB querying is not allowed.
Sample Rows: This will limit the number of rows from the previous step. It'll take range values. For example : 1..10 or 1..20
Click the get fields
Creating parameter for transformation
1) Double click on the canvas to get the transformation properties.
2) Click on Parameters Tab and define a parameter called "param_url" with --firstpage-- as it's default value.
3) On run time you can give different value for param_url parameter.
4) Value of the parameter will be replaced in the Query written in MongoDB input and gives you the required result set.
Pentaho CDE
Now let's jump into Dashboard creation with Pentaho CDE.
1) Prepare layout to keep parameters and pie chart.
2) Data sources section.
Parameter is url field values from collection. So to feed the selection we again need to write a simple transformation which fetch the field values of url.
Sample transformation can be designed as shown below figure to get only url field.
Let's say the transformation name is : DashboardWithMongoDBParameters2.ktr
The above transformation will give the below output( part of the output is shown in next lines).
URL
--firstpage--
/about
/about/awards
/about/customers
/ad/easy
/ad/lead
/ad/save
/ad/survey
/ad/training
/analyst_perspective
/buy
/careers
/careers/account
/careers/capetown
/careers/engineer
/careers/frankfurt
/careers/london
/careers/manager
/careers/munich
/careers/paris
/careers/sales
/careers/syndey
/confirmed/consulting
/confirmed/contact
/confirmed/demo
/confirmed/sales
/confirmed/thankyou
/contact
/customer
/demo
/docs
/docs/doc1.pdf
/docs/doc10.pdf
/docs/doc11.pdf
/docs/doc12.pdf
--firstpage--
/about
/about/awards
/about/customers
/ad/easy
/ad/lead
/ad/save
/ad/survey
/ad/training
/analyst_perspective
/buy
/careers
/careers/account
/careers/capetown
/careers/engineer
/careers/frankfurt
/careers/london
/careers/manager
/careers/munich
/careers/paris
/careers/sales
/careers/syndey
/confirmed/consulting
/confirmed/contact
/confirmed/demo
/confirmed/sales
/confirmed/thankyou
/contact
/customer
/demo
/docs
/docs/doc1.pdf
/docs/doc10.pdf
/docs/doc11.pdf
/docs/doc12.pdf
crate a simple parameter in CDE and name it as param_url and also create a selection and give parameter & listener as param_url for it and give place holder for it (i.e., html Object).
Go to Data sources in CDE and create two Kettel Queries
one for getting result to plot the data on pie chart
another to feed the input selection.
For the pie chart query give
locate the transformation uploaded to the folder in the server and then
Variables : param_url , param_url as arg and value.
Parameters : param_url & param_url as arg and value.
Kettle Step Name : Sort rows 2 (the step which gives the result set for chart).
For input query give : locate the transformation file uploaded in folder and give the step name ( for this : Select values)
Check the output of queries using CDA editor..
Save the dashboard and preview it..
Preview 1 : With --firstpage-- input value for URL
Preview 2 : With /carres/paris input value for URL
Readers of this post encouraged to add your suggestions , feed back & additions to this post. drop your comments
Sadakar
BI developer
HI how to made referencing with another collection in mongo db
ReplyDeleteIn shell i will execute query like this
db.Test.insertOne({ ptid:1, "aboutme": { "$ref":"about_me", "$id": ObjectId("5e8c9af3754853a035fcdd5b") } })
using stream lookup i will get objectid of inserted row id in another collection and form asting like aboutme field but it is inserted as string not as dbref
any solution in PDI