Hi Folks,
A good question makes us dig deeper into a subject, doesn't it? Recently, one of my community folks asked about the differences between the Text File Input and CSV File Input steps: which one would you prefer if you are given a CSV file?
Hmmm..!!! I tried to explain it off the top of my head, but later in the evening I did a quick investigation to find the actual differences. (An example explains more than theory when you are trying to understand a concept.)
Here we go! The Pentaho wiki has detailed information about these two steps:
Text File Input : http://wiki.pentaho.com/display/EAI/Text+File+Input
CSV File Input : http://wiki.pentaho.com/display/EAI/CSV+Input
Here is my quick understanding of the concepts, which you may find useful.
1) The "Text File Input" step can read a variety of file layouts such as plain text, CSV, and fixed length, whereas the "CSV File Input" step can only read CSV files.
2) The "Text File Input" step can add file information fields to the output, such as File Name, Extension, and File URI (file directory, etc.), which you can't do with the "CSV File Input" step.
3) What if you are given multiple CSV files to read at a time? The "Text File Input" step is capable of reading multiple files, whereas the "CSV File Input" step takes only one file.
4) The "CSV File Input" step is essentially a simplified, specialized subset of the "Text File Input" step.
5) That said, the "CSV File Input" step has a few advantages over "Text File Input":
* Use the CSV File Input step if you are given a single file to process at a time (for instance, lookup files).
* The CSV File Input step speeds up processing and hence increases performance.
* It supports NIO (non-blocking I/O via native system calls) with a configurable read buffer. The NIO buffer size represents the number of bytes read from disk in one call; a minimal Java sketch of the idea follows this list.
* Parallel running: if you configure this step to run in multiple copies or in clustered mode and you enable parallel running, each copy will read a separate block of the single file, allowing you to distribute the file reading over several threads or even several slave nodes in a clustered transformation (see the second sketch after this list).
* Lazy conversion: if you will be reading many fields from the file and many of those fields will not be manipulated, but merely passed through the transformation to land in some other text file or a database, lazy conversion can prevent Kettle from performing unnecessary work on those fields, such as converting them into objects like strings, dates, or numbers (see the third sketch after this list).
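To make the "NIO buffer size" idea concrete, here is a minimal Java sketch (not Kettle's own code) that reads a file through java.nio with a fixed-size buffer; the file name sample.csv and the 50000-byte buffer are just assumptions for illustration.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class NioBufferDemo {
    public static void main(String[] args) throws IOException {
        int nioBufferSize = 50000; // bytes pulled from disk in one read call (illustrative value)
        try (FileChannel channel = FileChannel.open(Paths.get("sample.csv"), StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocateDirect(nioBufferSize);
            long totalBytes = 0;
            // Each channel.read() fills the buffer with up to nioBufferSize bytes in one native call.
            while (channel.read(buffer) != -1) {
                buffer.flip();                    // switch the buffer to "drain" mode
                totalBytes += buffer.remaining(); // a real reader would scan for delimiters here
                buffer.clear();                   // ready for the next block
            }
            System.out.println("Read " + totalBytes + " bytes in blocks of up to " + nioBufferSize);
        }
    }
}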
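And here is a conceptual Java sketch of the parallel-running idea, assuming 4 copies and a file called sample.csv; it only shows the byte-range splitting, not Kettle's actual implementation.

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class ParallelBlockReadDemo {
    public static void main(String[] args) throws InterruptedException {
        String file = "sample.csv"; // illustrative file name
        int copies = 4;             // "number of copies" of the step
        long size = new File(file).length();
        long block = size / copies;

        Thread[] readers = new Thread[copies];
        for (int i = 0; i < copies; i++) {
            final long start = i * block;
            final long end = (i == copies - 1) ? size : (i + 1) * block;
            readers[i] = new Thread(() -> readBlock(file, start, end));
            readers[i].start();
        }
        for (Thread t : readers) {
            t.join();
        }
    }

    // Each copy reads only its own [start, end) byte range of the single file.
    static void readBlock(String file, long start, long end) {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(start);
            if (start > 0) {
                raf.readLine(); // skip the partial row so this copy starts on a whole line
            }
            while (raf.getFilePointer() < end) {
                String row = raf.readLine();
                if (row == null) {
                    break;
                }
                // process the row here (parse fields, pass them downstream, ...)
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}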
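Finally, a tiny Java sketch of what lazy conversion means in principle (again, not Kettle's actual classes): keep the raw bytes and convert only when a typed value is really needed.

import java.nio.charset.StandardCharsets;

public class LazyFieldDemo {
    private final byte[] raw; // bytes exactly as they were read from the file
    private Long parsed;      // typed value, filled only on demand

    public LazyFieldDemo(byte[] raw) {
        this.raw = raw;
    }

    // Conversion work happens only when somebody actually needs the typed value.
    public long asLong() {
        if (parsed == null) {
            parsed = Long.parseLong(new String(raw, StandardCharsets.US_ASCII).trim());
        }
        return parsed;
    }

    // A pass-through field never gets converted; the raw bytes go straight to the output.
    public byte[] asRawBytes() {
        return raw;
    }

    public static void main(String[] args) {
        LazyFieldDemo passThrough = new LazyFieldDemo("12345".getBytes(StandardCharsets.US_ASCII));
        System.out.println(passThrough.asRawBytes().length + " bytes passed through unconverted");

        LazyFieldDemo used = new LazyFieldDemo("678".getBytes(StandardCharsets.US_ASCII));
        System.out.println("Converted on demand: " + (used.asLong() + 1));
    }
}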
Example:
Download the example here: Click me!
Change the file paths to your system paths.
More information on NIO:
https://en.wikipedia.org/wiki/Non-blocking_I/O_%28Java%29
How do you create a sample fixed-length file using Excel? Search for "create a fixed length text file from excel":
https://www.google.co.in/?gws_rd=ssl#q=create+a+fixed+length+text+file+from+excel
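If you would rather script a small fixed-length sample file than build it in Excel, here is a quick Java sketch; the column widths, sample rows, and output file name are just assumptions.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class FixedLengthSampleFile {
    public static void main(String[] args) throws IOException {
        // Illustrative layout: id = 5 chars, name = 10 chars, city = 12 chars.
        List<String[]> rows = Arrays.asList(
                new String[]{"1", "Alice", "Hyderabad"},
                new String[]{"2", "Bob", "Bangalore"});

        StringBuilder out = new StringBuilder();
        for (String[] r : rows) {
            // %-5s / %-10s / %-12s left-justify each value and pad it with spaces to a fixed width.
            out.append(String.format("%-5s%-10s%-12s%n", r[0], r[1], r[2]));
        }
        Files.write(Paths.get("fixed_length_sample.txt"), out.toString().getBytes(StandardCharsets.UTF_8));
    }
}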
References :
http://stackoverflow.com/questions/21803856/pentaho-data-integration-csv-input-add-filename-column