CSVTable
Overview
The CSVTable provides the central functionality for data verification and cleansing. With a simple script, you can quickly get the value from any row/column in the file using one of the following simple table patterns.
table = new CSVTable( "c:/fileView/simpleJoin/address.csv" )
// Iterate over each row in the file
table.eachRow() { rowIndex, table ->
println table.getValue( rowIndex, 'state_cd' )
}
// Direct access to specific rows/columns
city = table.getValue(5000, 'city')
state = table.getValue(5000, 'state_cd')
// Simple forward navigation
for (i = 0;i < table.getRowCount();i++) {
println table.getValue( i, 'state_cd' )
}
// Navigate the file in reverse almost as fast as going forward
for (i = table.getRowCount();i > -1;i--) {
println table.getValue( i, 'state_cd' )
}Constructors
Constructors provide a way to create a CSVTable instance by passing parameters. The following constructors are available for use and provide access to a few of the available options.
CSVTable( String fullyQualifiedFilePath )
CSVTable( String fullyQualifiedFilePath, String columnDelimiter )
CSVTable( String fullyQualifiedFilePath, Map<String, Object> processors)
CSVTable( String fullyQualifiedFilePath, Map<String, Object> processors,
String columnDelimiter )
CSVTable( String fullyQualifiedFilePath, List<String> columnNameOverride,
Map<String, Object> processors, String columnDelimiter )
Note: By default, the first line of a CSV file must contain the delimited column names. In a case where the column names are not provided, use the columnNameOverride to provide the column names to use for processing the file.
Note: To use an alternate file system for temporary files which can be large depending on the size of the input file, use the CSVTable.setTempDirectory( new File( "d:/andi/tmp" ) ) static method.
Builder Pattern
The builder allow setting each of the available options using a builder pattern which consists of the following:
create the builder: builder = CSVTable.newBuilder()
set each of the fields as shown in the methods below. You can set them individually or daisy chain as shown:
builder.setFile( "myFilePath" ).setProcessors([<processors>])
after all fields are set: csvTable = builder.build()
Method | Description | Default | Required |
setFile( String fullyQualifiedPath) | The CSV file to load |
| true |
setProcessors(Map<String, Object>) | Refer to Column Value Processors below |
|
|
setReplaceNonAsciiChars(String replacement) | Replace non-ascii characters before processing occurs. |
|
|
setAlertIfNonAscii(boolean alert) | true to write verification errors to the log if non-ascii characters are found |
|
|
setTempDirectory(String dirPath) | Temporary directory for internal files that will be deleted after completion. |
|
|
setEscapeSingleQuotes( boolean escape ) | true if single quotes should be escaped. This is useful for preparing a value for insertion into a database. |
| true |
setVerificationMode(String mode) | Refer to Verification Modes below |
| fail |
setVerificationLogFile(String filePath) | Refer to Verification Modes below |
|
|
setColumnNameOverrides(List<String> columnNameOverrides) | Use column names for getting values |
|
|
setReverseScanRowCacheSize(int size) | The number of bytes to increase speed of get row values in reverse order |
| 15,000 |
The following are formatting options.
Method | Description | Default | Required |
setCsvType(String csvType) | The following formats are available that pre-configure the other parameters. You can choose a default and override specific parameters.
| DEFAULT |
|
setDelimiter(String delimiter) | The delimiter between columns | , |
|
setQuoteMode(String quoteMode) |
| depends on csvType |
|
setQuoteCharacter(char quoteChar) | The character to use for double quotes | depends on csvType |
|
setEscapeCharacter(char escapeChar) | Escape character for escaping double quotes | depends on csvType |
|
setCommentMarker(char commentMarker) | The comment marker |
|
|
See Also
Summary
Column names are case insensitive
context.testExecutionCancelled() - Add inside loop to allow you to stop the script prematurely
Multiple processors can be defined
ExtensionParam - "daisy chain" multiple processors for a single column using an array [<extension>,<extension>]
delimiters - must only be one character
delimiters - delimiters can be embedded in a column value but must be double quoted