Best design for multiple validation checking of data

  • Hi all,

    I'm new to SQL and have done some simple reports before. Now I want to do something to help with error checking in my company. For this we have a selection of different tables and the data needs to meet various validation rules else it is classed as an error.

    To deal with this I'm currently thinking of this approach:

    1. Create a view pulling all of the various data together from the multiple tables.

    2. Create an empty 'errors' data table.

    3. Create an Excel file with a button to call a Check for Errors Script

    Then in the the script:

    1. Clear the 'errors' data table

    2. Call multiple scripts, each of which uses the new view, applies the checks for that specific error and writes any erroring data into the 'errors' data table (along with a text string with the unique error code for filtering / sorting purposes).

    3. After calling all the scripts, the table can be refreshed in excel when when used with a pivot table can show the various errors, and let us drill down into all the data so we can fix them.

    Also.. Ideally, I'd like some way to write comments in an excel column for each entry and error code and be able to write that back into a comment table.

    Does this sound a ) doable, b) the right approach?

    I've very new to SQL, so want to make sure I approach it in the best way..

  • Sounds fine to me, we do something similar, only thing I feel strongly about is:

    thinking_machine (9/17/2015)


    1. Clear the 'errors' data table

    We don't do that. We have Create and Update date/time columns; it is helpful to see errors that are New. We also have a Status which allows a user to set "Won't fix" and those can be ignored on [future] reports. They are still there for statistical analysis, but to get the data clean the users want to know "What's new" and beyond that "What haven't we got around to fixing yet, but have not marked 'Won't Fix' "

    We have an IDENTITY column in every table (even if not required by the APP and/or not the Pkey / etc.). That means that we have a single, unique, INT identifier for any row in the database (given the table name as well).

    So our error table is:

    CreatedOn & UpdatedOn Date/Time

    Table name (you could use an ID for this to save space and normalise it)

    Linked ID - the ID of the record within that table. If you don't have a unique INT in every table then perhaps have an ID for those where you do, and a second VARCHAR(1000) column where you don't; you could concatenate the key fields using a "~" tilde delimiter (or something else that is unlikely to appear in the data)

    IsActive - if the error disappears we set this to 0, rather than deleting the row. If it reappears tomorrow we just set the Update Date, rather than creating a new row with today's date as the Create Date. YMMV.

    Status - Allow various user assigned status codes. Initially set to "1" = New. Examples of things user might change to: Read/Seen, High/Low priority to fix, Won't fix, Bug (i.e. you need to improve the Rules to fix that one!!)

    ErrNo - for us this is a unique number within that table. But could be unique for the whole database.

    Text message - typically including the actual fault data (in the relevant column(s) )

    We collect all the duff data into a #ERRORS temporary table. It takes several passes of the actual table with various INSERT statements into #ERRORS. We may suppress some errors if there is already a "high level" error in #ERRORS

    Then we UpSert #ERRORS against the actual Errors Table - setting IsActive=1 where zero, updating the UpdatedOn date/time and so on. (And INSERTing any that are not found)

    We also update IsActive=0 for any error, for the current Table, which is NOT in #ERRORS [any more] (and WHERE the row in Errors Table is IsActive=1)

    In our APP on all the data entry screens etc (i.e. things that display a Record Card for a single record, rather than aggregate data reports and the like) we display any (what we call) Data Validation Errors at the top of the page - i.e. SELECT * FROM ErrorsTable WHERE TableName = 'CurrentTable' AND LinkedID = @IDofCurrentRecord - basically to make a nuisance of ourselves if the errors are not getting fixed 😎 , and also so that anyone looking at the data knows that it is not "safe" / "clean" (We do sometimes include that information on reports too, perhaps just "Validation errors = 999" on ay row wither errors, rather than the actual text)

    We find that generally senior management think that Data Validation is critical - they don't want to take decisions about the future of the company based on duff-data, but frequently the worker-bees are sloppy about cleaning stuff up. In particular where the data in our APP is imported, overnight, from something else - 'coz they have to fix the data in the other app, and then "tomorrow" it will be fixed in our APP. I would like to think that there are no data errors in our APP because insofar as we have business rules they are enforced at data entry; so for us this is much more about imported data. (We do have some in our own APP though - a date-abased event can become an issue after a Due Date passes, for example)

    For us imported data is "overnight, by batch" and thus because it is not interactive with the user we can't report an error direct to the user, so using business rules to populate the Errors Table means that we do, at least, get to put the errors in front of the users when they look at a record etc. I can't see them bothering to print an Errors Report and do anything useful with it ...

    ... so where we DO have senior management buy-in then we like to include some indication of the errors on any report for them 😎 ("This report includes 22 records that contain data validation errors" will usually do the trick 🙂 )

    We have clients with literally tens of thousands of data validation errors and they make zero effort to clean them up. Reluctantly they will clean up the recent ones when we prod them .. but I do wonder why we bother ... I also wonder why the Big Megabucks Accounting APP they use cannot keep its house more tidy. Country Code in Zip Code field, and vice versa, Supplier VAT [i.e. Sales Tax] registration number's checksum error ... all sorts of junk like that which WE catch because their 3rd party data is such rubbish.

  • Wow!

    Kristen, firstly thank you very much for your detailed and helpful reply. You have obviously taken this higher than I was originally considering with tracking the old errors too. Especially as you have already put it into practise in a large company.

    I will have to reread your email several times to grasp everything you said as I am still very new to SQL and this is stretching my knowledge. But in a good way.. 🙂

    Do you offer the ability to add text comments as to why an error cannot be fixed? If so, is that pushed out into a separate table, or as you don't delete old errors is this just added into that table as a comments data field?

    Thank you,

    Richard

  • thinking_machine (9/17/2015)


    Do you offer the ability to add text comments as to why an error cannot be fixed? If so, is that pushed out into a separate table, or as you don't delete old errors is this just added into that table as a comments data field?

    We just have a large VARCHAR column that the user can use as a notepad ... if they choose to delete the contents that's up to them (but we do have an Archive/Audit in pretty much every table, so the "previous versions" are available, but someone would have to remember that something was deleted for anyone to bother looking in there)

    If you would prefer to have "Add a new comment", where each new comment had a UserID and a Date/Time, then a separate table, many-comments linked to one-parent-record, would be better. Depends on how much you need to provide workflow (you could have a Supervisor "assign" the fixing of the errors to specific people, and then after they do that it comes back to the original supervisor, or a specific QA-person, for testing. Might be overkill, it probably depends whether your Business Rules are uncovering Life-and-Death issues, or just rubbish-data-entry issues like mine are 🙂 )

    If you have any questions ask away. If I don't spot the thread / answer you are welcome to send me a nudge Private Message.

  • P.S. We have a "Lookup View" on each (main) table. This is basically all the regular columns from the table, and then any simple Code Lookup Descriptions etc. just makes it easier not to have to do all the JOINs every time you want a Name Record and also the Gender Codes's Description and the Category Code's Description etc. etc.

    Anyway, we also put an ErrorCount in that VIEW so that it is easy to display "There are 999 errors on this record" and then the user can click on that (link) to an actual list of what they all are.

    Something like this

    SELECTT.Col1,

    T.Col2,

    ...

    [ErrorCount] = E.ErrorCount

    FROMdbo.MyTable AS T

    OUTER APPLY (

    SELECT[ErrorCount] = COUNT(*)

    FROMdbo.MyRecordValidationErrorsTable AS E

    WHERE E.LinkID = T.SomeID

    AND E.TableName = 'MyTable'

    AND E.IsActve = 1

    ) AS E

  • Kristen-173977 (9/17/2015)


    P.S. We have a "Lookup View" on each (main) table. This is basically all the regular columns from the table, and then any simple Code Lookup Descriptions etc. just makes it easier not to have to do all the JOINs every time you want a Name Record and also the Gender Codes's Description and the Category Code's Description etc. etc.

    Anyway, we also put an ErrorCount in that VIEW so that it is easy to display "There are 999 errors on this record" and then the user can click on that (link) to an actual list of what they all are.

    Something like this

    CREATE VIEW MyTable_View

    AS

    SELECTT.Col1,

    T.Col2,

    ...

    [ErrorCount] = E.ErrorCount

    FROMdbo.MyTable AS T

    OUTER APPLY (

    SELECT[ErrorCount] = COUNT(*)

    FROMdbo.MyRecordValidationErrorsTable AS E

    WHERE E.LinkID = T.SomeID

    AND E.TableName = 'MyTable'

    AND E.IsActve = 1

    ) AS E

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply