3rd party Data Quality Issues

  • Every month a 3rd party sends us a text file that we import into a custom app and finally our database. This month the import file was rejected by the app and it took us over 4 hours (including hex dumps looking for special characters) to find the bad data. When we do, we send our findings back to the 3rd party so they can correct their data. (Actually, we get over 20 such files from different 3rd parties every month. Drives us crazy).

    Using Sql, or otherwise, what can be done to "pre-screen" the data for errors, either in format or content. It would be nice if the “pre-screen” app could tell us precisely where the bad data is.

    TIA,

    Barkindog

  • This month the import file was rejected by the app

    1, What language is your app written in?

    In any case does your app count each line 0f the file, and count each field in the line being read by the app. If it does, then modify your app to report the number of the last good line and last good field read from the file. That line +1 is then the offending line and that last good field +1 is the offending field with the unwanted character(s).

    Now you may have to search that line, correct it as required and begin process with the app a second time, and of course continue correcting one line at a pass until all errors are corrected.

    Or have the app check each character in each line using the ASCII equivalents for each character. You could create a table of "unwanted" characters, and to identify the hex code for each character refer to

    http://www.ascitable.com/

    and with this information write code to either correct the line and write the corrected line back to the file. Or write correct lines to a new file and then write the corrected line(s) to that new file and let the app continue until all lines have passed inspection.

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • Bitbucket,

    Truth be told we actually have many apps (written in different languaes) that import differerent format files and apply their own processes to each file. So it's not really as simply as modifying the "one" app. I was hoping there was some third party software that can deal with data quality in a general, largely all embracing, manner.

    Still you idea of number counting may be the most practical way to go as it relies on the app, which already has the business rules we want in place.

    Barkingdog

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply