Reconciliation of Source and Target data

  • Hi All,

    I need to implement an SSIS package wherein, during the data load, the package checks/matches the fact data between the source and target systems. For example, if it's a Sales measure in one of the fact tables, then it should reconcile this value between the two systems. If the sales value doesn't match, then it should send a mail notifying about the failed reconciliation. I surfed through some articles, but no luck.

    Can anybody provide a solution to this problem? Or point me to a link where I can find the details/solution?

    Best Regards,

    Bhushan

  • Can anyone please reply to this?

  • Can anyone please help?

  • Bhushan,

    What version of SSIS? Here's a link, but it only applies to the 2008 Data Profiling Task. Not sure if this is exactly what you want to do, but it's comparing source and target using a task.

    http://www.simple-talk.com/sql/learn-sql-server/sql-server-2008--ssis-data-profiling-task/

    Off the top of my head, it seems you could have an Execute SQL task that joins the source and target tables once the load is done and writes a row to a reconciliation table (or something) for mismatched values, which you could then report on with another SQL task if there are any results. Manual profiling, if you will. Just brainstorming; I'm sure there's a simpler way to do it.
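    Just to make that concrete, here's a rough sketch of the comparison query I have in mind. All the names (SourceDB, TargetDW, FactSales, GrossSales, ReconciliationLog) are placeholders for your actual schema, and if source and target sit on different servers you'd go through a linked server or stage the source rows locally first:

        -- Row-level reconciliation: log any key whose measure differs between
        -- source and target, plus keys that exist on only one side.
        -- All object names are placeholders.
        INSERT INTO dbo.ReconciliationLog (SalesKey, SourceValue, TargetValue, CheckedAt)
        SELECT COALESCE(s.SalesKey, t.SalesKey),
               s.GrossSales,
               t.GrossSales,
               GETDATE()
        FROM   SourceDB.dbo.FactSales AS s
               FULL OUTER JOIN TargetDW.dbo.FactSales AS t
                   ON t.SalesKey = s.SalesKey
        WHERE  s.GrossSales <> t.GrossSales   -- values differ
           OR  s.SalesKey IS NULL             -- row only in target
           OR  t.SalesKey IS NULL;            -- row only in source

    A follow-up Execute SQL task could just count the rows in dbo.ReconciliationLog and drive the notification from that.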

    Matt

  • Matt,

    Thanks for your answer. I'm using SSIS 2008. I tried the example from the article you linked, and to an extent it can be used to solve my issue. However, there are some concerns regarding this example:

    1. After checking for mismatched values, the package writes them to an XML file containing the non-matching values. Is there any way to write these values to a database table along with the primary key of the non-matching records, so that it would be easy to spot the non-matching values and easier to re-run the package?

    2. Secondly, when you edit the Data Profiling Task, under the Profile Requests tab, the Value Inclusion request has a property named MaxNumberOfViolations. In the given sample, it has been set to 100 (which means the XML can hold up to 100 mismatched values). Do you know what the limit for this property is, i.e., how many non-matching values it can log? If we don't set this property, will the Data Profiling Task log an unlimited number of mismatched values to the XML?

    Awaiting your reply.

    Regards,

    Bhushan

    Bhushan,

    Forgive me if I came across as a 2008 expert; we're currently using SSIS 2005, so I don't have the Data Profiling Task to test with and am theorizing at this point. We're on the verge of 2008, however, and that's why I'm exploring this option. I'd ask anyone out there who is doing this to please respond as well. I had the same experience you did searching for information.

    After a bit more thought about the process you describe, I wonder if, instead of profiling, your package could end with an Execute SQL task that runs a stored procedure to do the comparison between source and target (along the lines of the sketch in my earlier post) and write the mismatched primary keys to a table. From there, you could read the table with another Data Flow task and generate emails. It doesn't compare row by row as the package is executing, but does the reporting at the end. Would that work for you? Or are you attempting to correct each mismatch as it occurs?

    If you choose the profiling route, here is what I would try:

    1. The second half of the article, regarding generating the full inclusion profile, looks like it's capturing the XML output of the first Data Profiling Task and writing it to a file. Since the SubsetColumns element of the XML output includes the ProductID in his example (specified in the InclusionColumns property), you should be able to include whatever primary key columns you want there and pass it to another Data Flow Task and out through an OLE DB Destination. (See the sketch after this list for one way to get the XML into a table.)

    2. The article would also suggest that there is no violation limit, so you'd have to experiment with values for MaxNumberOfViolations. Some parameters remove the limit when set to zero, but I don't know if that's the case here; the discussion of the first Data Profiling Task would suggest zero means zero. You might try blanking it, but it may error on that. If there is a practical upper limit, you might have to go at the cleanup of mismatches in stages: generate a list for cleanup, take care of those, then re-run. If the mismatch is recurring, then perhaps there are issues with the ETL process.
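    If you do want the violations in a table rather than a file, one thing to try is shredding the task's XML output with T-SQL's XML methods. Fair warning: I don't have 2008 here to check, so the namespace URI and element path below are guesses; open a real output file from the task and adjust them before relying on this:

        -- Hypothetical sketch: load the Data Profiling Task output file and
        -- shred the violation values into a table. The namespace URI and the
        -- '//InclusionViolation' path are assumptions to verify against a
        -- real output file.
        DECLARE @profile XML;

        SELECT @profile = BulkColumn
        FROM   OPENROWSET(BULK 'C:\Profiles\InclusionProfile.xml', SINGLE_BLOB) AS f;

        WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2008/DataDebugger/')
        INSERT INTO dbo.ProfileViolations (ViolatedValue)
        SELECT v.value('.', 'NVARCHAR(200)')
        FROM   @profile.nodes('//InclusionViolation') AS x(v);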

    Also, I would log into the article's forum and post these questions, or contact the author directly.

    Regards,

    Matt

  • Hello Matt,

    You're forgiven 😉 yet you still seem to be an expert 🙂

    Coming to the topic... below I've tried to explain the issue as briefly as I can...

    There are two systems: the source is a SQL Server 2005 database, and the target is a SQL Server 2008 database.

    Let's say one of the source tables holds a decimal value, say Gross Sales. Before loading the data into the warehouse, I want to check that this sales value is consistent between the source and the warehouse. For this consistency check, it's not enough to compare the row counts against the primary keys: even when the row counts of the two tables match, we can't be sure the sales values are the same across the tables. Row count can serve as a secondary check, but the primary check should be on the sales value itself in the source and target systems. So I can probably use the Aggregate transformation, i.e., sum the values in this column, compare the totals for consistency, and then use the mail generation task.

    Hence, what you suggested about using the Execute SQL task and a Data Flow Task can be considered a solution.
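    In fact, the total-level check could even collapse into a single Execute SQL task. Here's a rough sketch of what I have in mind; the object names are placeholders, and I've added a small tolerance since exact equality on float sums is fragile:

        -- Compare the summed measure between source and staging.
        -- Object names are placeholders; @Tolerance absorbs float rounding noise.
        DECLARE @SourceTotal  FLOAT,
                @StagingTotal FLOAT,
                @Tolerance    FLOAT;
        SET @Tolerance = 0.01;

        SELECT @SourceTotal  = SUM(GrossSales) FROM SourceDB.dbo.FactSales;
        SELECT @StagingTotal = SUM(GrossSales) FROM StagingDB.dbo.FactSales;

        -- One result row the package can inspect to decide load vs. mail.
        SELECT CASE WHEN ABS(@SourceTotal - @StagingTotal) <= @Tolerance
                    THEN 'OK' ELSE 'MISMATCH' END AS ReconStatus,
               @SourceTotal  AS SourceTotal,
               @StagingTotal AS StagingTotal;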

    Coming back to the Data Profiling task.

    I presume it can't be used to sort out this issue; it serves a different purpose from what I intend. The example in the article is based on the primary key column, which is an integer. In my case, there's a need for computations on decimal values, and upon studying this task, I found that it doesn't really work with decimal fields. If it's an integer value, then it gives a perfect comparison; in the case of floats, it throws an error.

    So it's better that I don't use this task. Instead, I can try the other solution using the Execute SQL task, etc.

    Thoughts?

    Regards,

    Bhushan

  • Hello Bhushan,

    I'm glad to know about the issue with floats and the Data Profiling Task; thank you for that. I agree with your plan; it's what I would try first.

    Matt

  • Hi Matt,

    I've implemented this as follows:

    In the Data Flow Task, I've used two Aggregate transforms to compute the sum of the floating/decimal field (say, gross sales). These values are passed to respective Script Components in order to assign the two totals (source and staging) to two variables. After this, in the Control Flow, I check the values of these variables in a Script Task. If the values match, the data gets loaded into the warehouse; otherwise, an email is sent stating that the fact has inconsistent values.

    Now, my concern is that the notification I send only contains text mentioning the failure of the package. I want the mail to contain the details of the non-matching records, i.e., which particular records between the source and staging areas don't match. I suspect the Lookup transformation can't be used to match values for decimal/real/float fields. If it can, then I can redirect the non-matching rows to a table and then dump that table to a flat file. So, can I use the Lookup to achieve this?

    Thoughts?

    Thanks!

    Bhushan

  • Hi Bhushan,

    I would test to verify that the Lookup can't compare those types; nevertheless, I think you're on the right track. The Lookup can be set to various cache levels (Professional SQL Server 2005 Integration Services, Brian Knight et al., Wrox, 2006) to achieve more of a streaming lookup that won't slow processing too much. It might work in your case.

    You may have this reference already, but if not, I highly recommend this book:

    http://www.wrox.com/WileyCDA/WroxTitle/Professional-Microsoft-SQL-Server-2008-Integration-Services.productCd-0470247959.html

    As far as the detail in the email goes, depending on how much information you have in your variables, you might include a variable in the Subject of the Send Mail Task.
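    One more thought: if Database Mail is configured on the 2008 side, you could let msdb.dbo.sp_send_dbmail build the detail mail for you instead of the Send Mail Task, by attaching the mismatch list straight from your reconciliation table. The profile name, recipients, and table below are placeholders:

        -- Hypothetical sketch: mail the mismatch details from the database side.
        -- Requires Database Mail to be configured; names are placeholders.
        EXEC msdb.dbo.sp_send_dbmail
            @profile_name = 'ETLMailProfile',
            @recipients   = 'etl-team@example.com',
            @subject      = 'Sales reconciliation failed',
            @body         = 'Source/staging mismatches are attached.',
            @query        = 'SELECT SalesKey, SourceValue, TargetValue
                             FROM dbo.ReconciliationLog',
            @attach_query_result_as_file = 1;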

    Regards,

    Matt
