CSVTable

CSVTable

Overview

The CSVTable provides the central functionality for data verification and cleansing. With a simple script, you can quickly get the value from any row/column in the file using one of the following simple table patterns.

table = new CSVTable( "c:/fileView/simpleJoin/address.csv" ) // Iterate over each row in the file table.eachRow() { rowIndex, table -> println table.getValue( rowIndex, 'state_cd' ) } // Direct access to specific rows/columns city = table.getValue(5000, 'city') state = table.getValue(5000, 'state_cd') // Simple forward navigation for (i = 0;i < table.getRowCount();i++) { println table.getValue( i, 'state_cd' ) } // Navigate the file in reverse almost as fast as going forward for (i = table.getRowCount();i > -1;i--) { println table.getValue( i, 'state_cd' ) }

Constructors

Constructors provide a way to create a CSVTable instance by passing parameters. The following constructors are available for use and provide access to a few of the available options.

CSVTable( String fullyQualifiedFilePath ) CSVTable( String fullyQualifiedFilePath, String columnDelimiter ) CSVTable( String fullyQualifiedFilePath, Map<String, Object> processors) CSVTable( String fullyQualifiedFilePath, Map<String, Object> processors, String columnDelimiter ) CSVTable( String fullyQualifiedFilePath, List<String> columnNameOverride, Map<String, Object> processors, String columnDelimiter )

Note: By default, the first line of a CSV file must contain the delimited column names. In a case where the column names are not provided, use the columnNameOverride to provide the column names to use for  processing the file.

Note: To use an alternate file system for temporary files which can be large depending on the size of the input file, use the CSVTable.setTempDirectory( new File( "d:/andi/tmp" ) ) static method.

Builder Pattern

The builder allow setting each of the available options using a builder pattern which consists of the following:

  • create the builder: builder = CSVTable.newBuilder()

  • set each of the fields as shown in the methods below. You can set them individually or daisy chain as shown:

    • builder.setFile( "myFilePath" ).setProcessors([<processors>])

  • after all fields are set: csvTable = builder.build()

Method

Description

Default

Required

setFile( String fullyQualifiedPath)

The CSV file to load

 

true

setProcessors(Map<String, Object>)

Refer to Column Value Processors below

 

 

setReplaceNonAsciiChars(String replacement)

Replace non-ascii characters before processing occurs.

 

 

setAlertIfNonAscii(boolean alert)

true to write verification errors to the log if non-ascii characters are found

 

 

setTempDirectory(String dirPath)

Temporary directory for internal files that will be deleted after completion.

 

 

setEscapeSingleQuotes( boolean escape )

true if single quotes should be escaped. This is useful for preparing a value for insertion into a database.

 

true

setVerificationMode(String mode)

Refer to Verification Modes below

 

fail

setVerificationLogFile(String filePath)

 Refer to Verification Modes below

 

 

setColumnNameOverrides(List<String> columnNameOverrides)

Use column names for getting values

 

 

setReverseScanRowCacheSize(int size)

The number of bytes to increase speed of get row values in reverse order

 

15,000

The following are formatting options.

Method

Description

Default

Required

setCsvType(String csvType)

The following formats are available that pre-configure the other parameters. You can choose a default and override specific parameters.

  • DEFAULT

  • EXCEL

  • INFORMIX_UNLOAD

  • INFORMIX_UNLOAD_CSV

  • MONGODB_CSV

  • MONGODB_TSV

  • MYSQL

  • ORACLE

  • POSTGRE_SQL_CSV

  • POSTGRESQL_TEXT

  • RFC_4180

  • TDF

DEFAULT

 

setDelimiter(String delimiter)

The delimiter between columns

,

 

setQuoteMode(String quoteMode)

  • ALL

  • ALL_NON_NULL

  • MINIMAL

  • NON_NUMERIC

  • NONE

depends on csvType

 

setQuoteCharacter(char quoteChar)

The character to use for double quotes

depends on csvType

 

setEscapeCharacter(char escapeChar)

Escape character for escaping double quotes

depends on csvType

 

setCommentMarker(char commentMarker)

The comment marker

 

 

See Also

Summary

  • Column names are case insensitive

  • context.testExecutionCancelled() - Add inside loop to allow you to stop the script prematurely

  • Multiple processors can be defined

  • ExtensionParam - "daisy chain" multiple processors for a single column using an array [<extension>,<extension>]

  • delimiters - must only be one character

  • delimiters - delimiters can be embedded in a column value but must be double quoted