Putting it all Together

Putting it all Together

Using Extensions

The following example uses extensions to reformat column data to write to a new file:

  • Decode state (CA to California) with mixedCase. Note that another function to convert to a code is available.

  • Convert city and street address to mixed case

Address.csv Master Table Row ADDRESS_ID,STREET_ADDRESS,CITY,STATE_CD,ZIP_CODE 95420,53059 EAST ZIMBABWE ST,POTTSVILLE,NV,22587 Generated Output File Row myConstant*53059 East Zimbabwe St*Pottsville*Nevada*22587
processors = [ 'state_cd' : new ExtensionParam( "andi:included", "stateDecode", ["mixedCase"], "log" ), 'city' : new ExtensionParam( "andi:included", "mixedCase" ), \ 'street_address' : new ExtensionParam( "andi:included", "mixedCase" ) ] table = new CSVTable( "c:/fileView/simpleJoin/address.csv", processors ) // Write all verification errors to allow review table.setVerificationMode( "log", "d:/andi/verification/verificationErrors.txt" ) remap = new File( "c:/fileView/simpleJoin/address.remap" ) remap.delete() // newWriter() is a Groovy extension to File and allows simple methods for writing to the file out = remap.newWriter() try { table.eachRow() { rowIndex, table -> // In loops, always include the following testExecutionCancelled as it allows an exception // to be thrown if you click the Scripting dialog Stop button. context.testExecutionCancelled() // Create a * delimited row and write to the output file. The column values will have // all processors run when the column value is selected. Values that don't match a column // will be written as a constant. out.write( table.createDelimitedRow( rowIndex, ['myConstant','street_address','city','state','zip_code'], "*" ) ) } } finally { // Always close your files or they will remain open and remain read-only until Andi is // restarted out.close() }

Using Joins

This example will use a processor to convert a state code, for example NV to the spelled out state name such as Nevada.

Address.csv Master Table Row ADDRESS_ID,STREET_ADDRESS,CITY,STATE_CD,ZIP_CODE 3636,28345 NORTH AMERICAN SAMOA AVE,ZANESVILLE,NV,77132 state_cd.csv Join Row STATE_CD,STATE NV,NEVADA Generated Output File Row 53059 EAST ZIMBABWE ST*POTTSVILLE*Nevada*22587

 

processors = [ 'state_cd' : new ExtensionParam( "andi:included", "stateDecode", ["mixedCase"], "log" ), 'city' : new ExtensionParam( "andi:included", "mixedCase" ), \ 'street_address' : new ExtensionParam( "andi:included", "mixedCase" ) ] table = new CSVTable( "c:/fileView/simpleJoin/address.csv", processors ) // Write all verification errors to allow review table.setVerificationMode( "log", "d:/andi/verification/verificationErrors.txt" ) remap = new File( "c:/fileView/simpleJoin/address.remap" ) remap.delete() // newWriter() is a Groovy extension to File and allows simple methods for writing to the file out = remap.newWriter() try { table.eachRow() { rowIndex, table -> // In loops, always include the following testExecutionCancelled as it allows an exception // to be thrown if you click the Scripting dialog Stop button. context.testExecutionCancelled() // Create a * delimited row and write to the output file. The column values will have // all processors run when the column value is selected. Values that don't match a column // will be written as a constant. out.write( table.createDelimitedRow( rowIndex, ['myConstant','street_address','city','state_cd','zip_code'], "*" ) ) } } finally { // Always close your files or they will remain open and remain read-only until Andi is // restarted out.close() }

Handling Embedded Special Characters

Non-Ascii Characters

Finally you can now deal with non-ascii characters before the data is inserted into your database. There are two general approaches you can use in resolving the issues.

  • setReplaceNonAsciiChars( String replacement ) - Automatically change all non-ascii characters to another value. This also avoids the verification exceptions and the messages in the log

  • setAlertIfNonAscii method to automatically verify all String data when data is retrieved. If true and a non-ascii value is found, a verification exception will be thrown. If verification mode is set to log, the exception will be written to the logfile.

table = new CSVTable( "c:/fileView/columnNameOverride/address.csv", ["ADDRESS_ID","STREET_ADDRESS","CITY","STATE_CD","ZIP_CODE"], masterProcessors ) table.setAlertIfNonAscii( true )
  • Use the new ExtensionParam( "andi:included", "replaceNonAscii", ["<replaceWith>"] ) extension to replace all non-ascii characters with the value passed in the parameter. You can replace with "" (blank string) or one or more characters "!!!!!" In the following example, nested ExtensionParam are used to replace non-ascii characters for both street address and city.

processors = [ 'city' : [ new ExtensionParam( "andi:included", "mixedCase" ), new ExtensionParam( "andi:included", "replaceNonAscii", [""] ) ], 'street_address' : [ new ExtensionParam( "andi:included", "mixedCase" ), new ExtensionParam( "andi:included", "replaceNonAscii", [""] ) ], 'zip_code' : new ExtensionParam( "andi:included", "zipCode" ) ]