Putting it all Together
Using Extensions
The following example uses extensions to reformat column data to write to a new file:
Decode state (CA to California) with mixedCase. Note that another function to convert to a code is available.
Convert city and street address to mixed case
Address.csv Master Table Row
ADDRESS_ID,STREET_ADDRESS,CITY,STATE_CD,ZIP_CODE
95420,53059 EAST ZIMBABWE ST,POTTSVILLE,NV,22587
Generated Output File Row
myConstant*53059 East Zimbabwe St*Pottsville*Nevada*22587
processors = [
'state_cd' : new ExtensionParam( "andi:included", "stateDecode", ["mixedCase"], "log" ),
'city' : new ExtensionParam( "andi:included", "mixedCase" ), \
'street_address' : new ExtensionParam( "andi:included", "mixedCase" )
]
table = new CSVTable( "c:/fileView/simpleJoin/address.csv", processors )
// Write all verification errors to allow review
table.setVerificationMode( "log", "d:/andi/verification/verificationErrors.txt" )
remap = new File( "c:/fileView/simpleJoin/address.remap" )
remap.delete()
// newWriter() is a Groovy extension to File and allows simple methods for writing to the file
out = remap.newWriter()
try {
table.eachRow() { rowIndex, table ->
// In loops, always include the following testExecutionCancelled as it allows an exception
// to be thrown if you click the Scripting dialog Stop button.
context.testExecutionCancelled()
// Create a * delimited row and write to the output file. The column values will have
// all processors run when the column value is selected. Values that don't match a column
// will be written as a constant.
out.write( table.createDelimitedRow( rowIndex, ['myConstant','street_address','city','state','zip_code'], "*" ) )
}
} finally {
// Always close your files or they will remain open and remain read-only until Andi is
// restarted
out.close()
}Using Joins
This example will use a processor to convert a state code, for example NV to the spelled out state name such as Nevada.
Address.csv Master Table Row
ADDRESS_ID,STREET_ADDRESS,CITY,STATE_CD,ZIP_CODE
3636,28345 NORTH AMERICAN SAMOA AVE,ZANESVILLE,NV,77132
state_cd.csv Join Row
STATE_CD,STATE
NV,NEVADA
Generated Output File Row
53059 EAST ZIMBABWE ST*POTTSVILLE*Nevada*22587
processors = [
'state_cd' : new ExtensionParam( "andi:included", "stateDecode", ["mixedCase"], "log" ),
'city' : new ExtensionParam( "andi:included", "mixedCase" ), \
'street_address' : new ExtensionParam( "andi:included", "mixedCase" )
]
table = new CSVTable( "c:/fileView/simpleJoin/address.csv", processors )
// Write all verification errors to allow review
table.setVerificationMode( "log", "d:/andi/verification/verificationErrors.txt" )
remap = new File( "c:/fileView/simpleJoin/address.remap" )
remap.delete()
// newWriter() is a Groovy extension to File and allows simple methods for writing to the file
out = remap.newWriter()
try {
table.eachRow() { rowIndex, table ->
// In loops, always include the following testExecutionCancelled as it allows an exception
// to be thrown if you click the Scripting dialog Stop button.
context.testExecutionCancelled()
// Create a * delimited row and write to the output file. The column values will have
// all processors run when the column value is selected. Values that don't match a column
// will be written as a constant.
out.write( table.createDelimitedRow( rowIndex, ['myConstant','street_address','city','state_cd','zip_code'], "*" ) )
}
} finally {
// Always close your files or they will remain open and remain read-only until Andi is
// restarted
out.close()
}
Handling Embedded Special Characters
Non-Ascii Characters
Finally you can now deal with non-ascii characters before the data is inserted into your database. There are two general approaches you can use in resolving the issues.
setReplaceNonAsciiChars( String replacement ) - Automatically change all non-ascii characters to another value. This also avoids the verification exceptions and the messages in the log
setAlertIfNonAscii method to automatically verify all String data when data is retrieved. If true and a non-ascii value is found, a verification exception will be thrown. If verification mode is set to log, the exception will be written to the logfile.
table = new CSVTable( "c:/fileView/columnNameOverride/address.csv",
["ADDRESS_ID","STREET_ADDRESS","CITY","STATE_CD","ZIP_CODE"], masterProcessors )
table.setAlertIfNonAscii( true )
Use the new ExtensionParam( "andi:included", "replaceNonAscii", ["<replaceWith>"] ) extension to replace all non-ascii characters with the value passed in the parameter. You can replace with "" (blank string) or one or more characters "!!!!!" In the following example, nested ExtensionParam are used to replace non-ascii characters for both street address and city.
processors = [
'city' : [
new ExtensionParam( "andi:included", "mixedCase" ),
new ExtensionParam( "andi:included", "replaceNonAscii", [""] )
],
'street_address' : [
new ExtensionParam( "andi:included", "mixedCase" ),
new ExtensionParam( "andi:included", "replaceNonAscii", [""] )
],
'zip_code' : new ExtensionParam( "andi:included", "zipCode" )
]