September 30, 2013 at 9:26 am
Hello All,
I am currently working on comparing two versions of data for a comparison report. I have a delimited string of characters and I am trying to figure out if the data is the same but in a different order, I should not mark the record as different. I eliminated all data with different lengths. Edited...with data creation scripts
The approach I am using is to split the values using a cross apply to a "Split by delimiter function" into a new table and doing left outer join on part number between the
CURRENT AND PREVIOUS record sets and looking for where records are missing in the second table/record set.
For example:
-- Original Data
CREATE TABLE #Table_Orig (PartNumber int, FILEVERSION varchar(100), VALUE varchar(100))
INSERT INTO #Table_Orig VALUES (1234, 'CURRENT', 'A B')
INSERT INTO #Table_Orig VALUES (1234, 'PREVIOUS', 'B A')
INSERT INTO #Table_Orig VALUES (456, 'CURRENT', 'A B')
INSERT INTO #Table_Orig VALUES (456, 'PREVIOUS', 'A D')
-- Split Table - Once run through a splitter function
CREATE TABLE #Table_A (PartNumber int, FILEVERSION varchar(100), SplitVALUE varchar(100))
INSERT INTO #Table_A VALUES (1234, 'CURRENT', 'A')
INSERT INTO #Table_A VALUES (1234, 'CURRENT', 'B')
INSERT INTO #Table_A VALUES (1234, 'PREVIOUS', 'B')
INSERT INTO #Table_A VALUES (1234, 'PREVIOUS', 'A')
INSERT INTO #Table_A VALUES (456, 'CURRENT', 'A')
INSERT INTO #Table_A VALUES (456, 'CURRENT', 'B')
INSERT INTO #Table_A VALUES (456, 'PREVIOUS', 'A')
INSERT INTO #Table_A VALUES (456, 'PREVIOUS', 'D')
-- Query
select *
from #Table_A prev
LEFT OUTER JOIN #Table_A curr on prev.PartNumber = curr.PartNumber and curr.FileVersion = 'CURRENT' AND prev.SplitValue = curr.SplitValue
WHERE prev.FileVersion = 'PREVIOUS'
AND curr.PartNumber IS NULL
-- Expected Output
PartNumber: 456
SplitValue: D
This seems to work fine, but is a bit cumbersome. I did some searching on Stackoverflow and here with no real matches. Does anyone have any other "better" suggestions?
Thanks,
Anton
September 30, 2013 at 10:36 am
You have been around long enough to know what to expect from a post with this little detail...
In order to help we will need a few things:
1. Sample DDL in the form of CREATE TABLE statements
2. Sample data in the form of INSERT INTO statements
3. Expected results based on the sample data
Please take a few minutes and read the first article in my signature for best practices when posting questions.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 30, 2013 at 10:38 am
To help you, we would need some consumable DDL, sample data and expected results based on the sample data provided as noted in the article linked on my signature.
I know you posted something but it becomes confusing for me when it's something that might not be the actual input/output.
Even if you give us the original delimited strings, we could work on them.
Be sure to avoid "Split by delimiter functions" that use while loops or are not inline table functions.
September 30, 2013 at 12:42 pm
As requested...Sorry about the lousy format...
Anton
-- Original Data
CREATE TABLE #Table_Orig (PartNumber int, FILEVERSION varchar(100), VALUE varchar(100))
INSERT INTO #Table_Orig VALUES (1234, 'CURRENT', 'A B')
INSERT INTO #Table_Orig VALUES (1234, 'PREVIOUS', 'B A')
INSERT INTO #Table_Orig VALUES (456, 'CURRENT', 'A B')
INSERT INTO #Table_Orig VALUES (456, 'PREVIOUS', 'A D')
-- Split Table
CREATE TABLE #Table_A (PartNumber int, FILEVERSION varchar(100), SplitVALUE varchar(100))
INSERT INTO #Table_A VALUES (1234, 'CURRENT', 'A')
INSERT INTO #Table_A VALUES (1234, 'CURRENT', 'B')
INSERT INTO #Table_A VALUES (1234, 'PREVIOUS', 'B')
INSERT INTO #Table_A VALUES (1234, 'PREVIOUS', 'A')
INSERT INTO #Table_A VALUES (456, 'CURRENT', 'A')
INSERT INTO #Table_A VALUES (456, 'CURRENT', 'B')
INSERT INTO #Table_A VALUES (456, 'PREVIOUS', 'A')
INSERT INTO #Table_A VALUES (456, 'PREVIOUS', 'D')
-- Query
select *
from #Table_A prev
LEFT OUTER JOIN #Table_A curr on prev.PartNumber = curr.PartNumber and curr.FileVersion = 'CURRENT' AND prev.SplitValue = curr.SplitValue
WHERE prev.FileVersion = 'PREVIOUS'
AND curr.PartNumber IS NULL
-- Expected Output
PartNumber: 456
SplitValue: D
September 30, 2013 at 1:12 pm
So, you have a current solution but would like a better approach?
You could try with EXCEPT or NOT EXISTS.
Here are the examples.
select PArtNumber,
Item
from #Table_Orig prev
CROSS APPLY dbo.DelimitedSplit8K(VALUE, ' ')
WHERE prev.FileVersion = 'PREVIOUS'
EXCEPT
select PArtNumber,
Item
from #Table_Orig prev
CROSS APPLY dbo.DelimitedSplit8K(VALUE, ' ')
WHERE prev.FileVersion = 'CURRENT'
select *
from #Table_Orig prev
CROSS APPLY dbo.DelimitedSplit8K(VALUE, ' ') splitprev
WHERE prev.FileVersion = 'PREVIOUS'
AND NOT EXISTS( select 1
from #Table_Orig cur
CROSS APPLY dbo.DelimitedSplit8K(VALUE, ' ') splitcur
WHERE cur.FileVersion = 'CURRENT'
AND cur.PArtNumber = prev.PArtNumber
AND splitcur.Item = splitprev.Item)
I'm using a splitter function that will be very fast compared with most methods. You can read about it and get the code from this article. http://www.sqlservercentral.com/articles/Tally+Table/72993/
September 30, 2013 at 1:39 pm
Here's my version to do only a single pass of the table.
But I'm not sure why you want to list only "D" in the result, since both "B" and "D" did not have a match. I list both -- if you can explain the reasoning behind how to select only "D", I'd be happy to adjust the query accordingly:
SELECT PartNumber, SplitVALUE
FROM #Table_A
--WHERE FILEVERSION IN ('CURRENT', 'PREVIOUS') --uncomment if needed
GROUP BY PartNumber, SplitVALUE
HAVING COUNT(*) <> 2
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
September 30, 2013 at 1:52 pm
Luis Cazares (9/30/2013)
So, you have a current solution but would like a better approach?You could try with EXCEPT or NOT EXISTS.
Luis,
Yes, I already have the splitter and comparisons working using a LEFT OUTER JOIN. The EXCEPT works better than the left outer join and is a lot easier to comprehend.
I was making sure that I haven't missed anything that might be new in the 2008R2 or 2012 or totally didn't think outside of the box enough.
Thanks,
Anton
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply