Remove/mark duplicate entries, SELECT remaining unmatched--Accounting-based problem

Question

Remove/mark duplicate entries, SELECT remaining unmatched--Accounting-based problem

mfagan12

Default port

Points: 1432
More actions
December 13, 2012 at 5:52 pm

#266103

I have a table with a month's worth of accounting data that I need to compare to another table with accounting data from another source. But before I can even get to that point, I need to remove what the accounting department calls the "back out" entries.
My data can look like this for a particular TransactionID/AcctNo/FundNo combination:
AcctNo FundNo Debit Credit TransactionID
xyz 123 0 25 abc123
xyz 123 0 25 abc123
xyz 123 25 0 abc123BO
Or due to multiple mistakes, it can look like this:
AcctNo FundNo Debit Credit TransactionID
xyz 123 0 25 abc123
xyz 123 0 25 abc123
xyz 123 25 0 abc123BO
xyz 123 0 25 abc123
xyz 123 25 0 abc123BO
The TransactionID is ultimately the big identifying key but I also have data that may not be "back outs" that look like this:
AcctNo FundNo Debit Credit TransactionID
abc 456 0 25 def123
abc 456 25 0 def123
This data is fine to keep (for the time being--right now I'm working on a quick slap together job and will go back and refine--dealing with the "back outs" is the main issue now because I'm looking a about 100K records that have the "BO" at the end of the TransactionID).
My question is how I would mark (I did add a field to my table to flag "back outs") the records that are part of the "back out?" For example:
AcctNo FundNo Debit Credit TransactionID
xyz 123 0 25 abc123
xyz 123 0 25 abc123 **this is the corresponding "back out" transaction
xyz 123 25 0 abc123BO
Or
AcctNo FundNo Debit Credit TransactionID
xyz 123 0 25 abc123
xyz 123 0 25 abc123 **
xyz 123 25 0 abc123BO
xyz 123 0 25 abc123 **
xyz 123 25 0 abc123BO
Any suggestions would be greatly appreciated. I've never run into this type of scenario during my entire career so this is most definitely a first--and of course, they needed it yesterday:-)
Thanks.
--Molly

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply

Dwain Camps SSC Guru Points: 86948 More actions · Answer 1

I have two suggestions:

1. Firstly to get quick help you need to post your data with DDL in a readily consumable form. You can consult this article for the how-tos: http://www.sqlservercentral.com/articles/Best+Practices/61537/. Include your "flag" column in the DDL.

2. Second, accounting transactions of this sort *always* have a date/time stamp associated with them. This is necessary to help you pair the corresponding incorrect entry with the BO entry. Presumably the BO entry occurs after the incorrect one and the date/time stamp should show this. So when you post your sample data according to forum etiqutte, include a date/time stamp to help your helper.

Remember that data in SQL tables has no inherent row ordering. That is why the row date is necessary.

With that, this solution I believe will be pretty straightforward and there are several approaches to it.

With 100K records in your table, you'll be wanting to have a good performing solution and among the several available there are some that stand out more than others.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

mfagan12 Default port Points: 1432 More actions · Answer 2

Sorry about posting the data in the wrong format. I posted this from my home computer and can't actually post anything close to the real data.

For the datetimestamp--that is not an option because there is no time on the entry--I just have a date.