Andi for Data Engineers

Problem

Data engineering often involves extracting data from one or more sources, cleansing it, and then importing it into a target database. This calls for a toolset that makes you productive out of the box and that you can extend over time to analyze, verify and cleanse your data more efficiently.

Large files with incomplete or incorrect data require many extra hours or days of manual analysis and manipulation before the data can be loaded. Because such files can range from megabytes to gigabytes or more in size, tools such as text editors and Excel can no longer load them, leaving you with limited options such as writing your own custom code.

Worse still, when you move from company to company you may not be able to take the scripts and programs you have written with you. Even if you can, the business rules change with each customer, so more scripting is required, often amounting to a rewrite.

Solution

What if there were a new solution that could take away the pain of working with CSV files?

  • Quickly scan forward and backward through a file in seconds, even if the file has millions of rows or more

  • CSV file access is "table" based using the familiar rows/column pattern, with indexed joins reducing "table scans"

  • Column validations and cleansing can be performed automatically just by accessing a column with a configured processor. 

    • The processor can dynamically change a column value

    • Use pre-defined andi:included extensions for state code -> state name and state name -> state code conversion, changing the case of names, and more.

    • Create one or more Andi Extension jars, each supporting 1-n functions. This lets you standardize your validation and cleansing operations and quickly extend Andi Integrated Scripting with custom functions that support your customers' business rules.

  • Multiple modes of operation:

    • Validation mode allows validations to be performed against all of the rows with errors being logged.

    • Fail mode will automatically fail the script if an exception is thrown.

  • Join additional lookup CSV files to cleanse a file.

    • Indexing a join keeps lookups fast, even if the files are very large
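
To make the ideas above more concrete, here is a minimal Python sketch of column processors, the two modes of operation, and an indexed join against a lookup file. It is not Andi's scripting syntax or API. Every name in it (build_index, cleanse, title_case, make_state_processor, the sample data) is hypothetical and chosen only for illustration, and the in-memory strings stand in for CSV files on disk.

```python
import csv
import io

# Hypothetical sample data; in practice these would be two CSV files.
STATES_CSV = "code,name\nCA,California\nNY,New York\n"
CUSTOMERS_CSV = (
    "id,name,state\n"
    "1,alice smith,CA\n"
    "2,BOB JONES,ZZ\n"
    "3,carol white,NY\n"
)


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def build_index(rows, key):
    """Index lookup rows by key so each join is a dict lookup, not a scan."""
    return {row[key]: row for row in rows}


def title_case(value, row_number):
    """Processor that normalizes the case of a name."""
    return value.title()


def make_state_processor(index):
    """Build a processor that maps a state code to its name via the index."""
    def state_name(value, row_number):
        try:
            return index[value]["name"]
        except KeyError:
            raise ValueError(f"row {row_number}: unknown state code {value!r}")
    return state_name


def cleanse(rows, processors, mode="validation"):
    """Apply one processor per configured column.

    mode="validation": log every error and keep going.
    mode="fail": re-raise the first error and stop.
    """
    cleaned, errors = [], []
    for row_number, row in enumerate(rows, start=1):
        out = dict(row)
        row_ok = True
        for column, processor in processors.items():
            try:
                out[column] = processor(row[column], row_number)
            except ValueError as exc:
                if mode == "fail":
                    raise
                errors.append(str(exc))
                row_ok = False
        if row_ok:
            cleaned.append(out)
    return cleaned, errors


state_index = build_index(read_rows(STATES_CSV), "code")
processors = {"name": title_case, "state": make_state_processor(state_index)}
cleaned, errors = cleanse(read_rows(CUSTOMERS_CSV), processors)
```

In validation mode the sketch returns the rows that passed along with a list of logged errors. Calling cleanse with mode="fail" stops at the first bad row instead. Building the index once means each joined row costs a single dictionary lookup rather than a scan of the lookup file.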

For more detailed information, take a look at the technical details and script samples.

Quickly verify and cleanse very large files
Andi Integrated Scripting Extensions