Avoiding cursor: Help with getting only first match after previous match

Question

Avoiding cursor: Help with getting only first match after previous match

t.brown 89142

SSCommitted

Points: 1530
More actions
November 15, 2012 at 4:38 am

#266966

Hi I'm sure there is a better name for this problem, and its probably quite a common one with a hopefully easy solution. I can do this with a cursor - but I've learned from this site that its probably bad form.
I have 2 tables, with a matching text field - but the 2nd table contains many unwanted matches. and the matching has to go in sequence, so if a match is used, then future matches don't look prior to that for the next match.
Perhaps easier to explain with the examples data below:
CREATE TABLE #Table1 (
ID INT IDENTITY(1,1) PRIMARY KEY,
TEXT1 varchar(100) );
CREATE TABLE #Table2 (
aDate DATETIME NOT NULL,
TEXT2 varchar(100) );
CREATE CLUSTERED INDEX IX_aDate ON #Table2(aDate);
INSERT INTO #Table1 (TEXT1)
SELECT 'AAAA' UNION ALL
SELECT 'BBBB' UNION ALL
SELECT 'AAAA' UNION ALL
SELECT 'BBBB' UNION ALL
SELECT 'CCCC' UNION ALL
SELECT 'BBBB' UNION ALL
SELECT 'BBBB' UNION ALL
SELECT 'AAAA' UNION ALL
SELECT 'CCCC' UNION ALL
SELECT 'BBBB' UNION ALL
SELECT 'CCCC' ;
SELECT * FROM #Table1;
INSERT INTO #Table2 (aDate, TEXT2)
SELECT '2012-01-01 12:00:00', 'BBBB' UNION ALL
SELECT '2012-01-01 12:01:00', 'AAAA' UNION ALL
SELECT '2012-01-01 12:02:00', 'BBBB' UNION ALL
SELECT '2012-01-01 12:02:00', 'BBBB' UNION ALL
SELECT '2012-01-01 12:02:00', 'CCCC' UNION ALL
SELECT '2012-01-01 12:03:00', 'AAAA' UNION ALL
SELECT '2012-01-01 12:03:00', 'AAAA' UNION ALL
SELECT '2012-01-01 12:04:00', 'BBBB' UNION ALL
SELECT '2012-01-01 12:05:00', 'BBBB' UNION ALL
SELECT '2012-01-01 12:06:00', 'CCCC' UNION ALL
SELECT '2012-01-01 12:07:00', 'BBBB' UNION ALL
SELECT '2012-01-01 12:08:00', 'BBBB' UNION ALL
SELECT '2012-01-01 12:08:00', 'CCCC' UNION ALL
SELECT '2012-01-01 12:08:00', 'BBBB' UNION ALL
SELECT '2012-01-01 12:10:00', 'AAAA' UNION ALL
SELECT '2012-01-01 12:11:00', 'BBBB' UNION ALL
SELECT '2012-01-01 12:12:00', 'CCCC' UNION ALL
SELECT '2012-01-01 12:12:00', 'BBBB' UNION ALL
SELECT '2012-01-01 12:14:00', 'BBBB' UNION ALL
SELECT '2012-01-01 12:14:00', 'CCCC' UNION ALL
SELECT '2012-01-01 12:14:00', 'BBBB';
And the desired output is All of the Table1 records with the first match from table 2 where the match comes after the previous match.
-- Desired Result
ID Text1 aDate Text2
1'AAAA','2012-01-01 12:01:00', 'AAAA'
2'BBBB','2012-01-01 12:02:00', 'BBBB'
3'AAAA','2012-01-01 12:03:00', 'AAAA'
4'BBBB','2012-01-01 12:04:00', 'BBBB'
5'CCCC','2012-01-01 12:06:00', 'CCCC'
6'BBBB','2012-01-01 12:07:00', 'BBBB'
7'BBBB','2012-01-01 12:08:00', 'BBBB'
8'AAAA','2012-01-01 12:10:00', 'AAAA'
9'CCCC','2012-01-01 12:12:00', 'CCCC'
10'BBBB','2012-01-01 12:12:00', 'BBBB'
11'CCCC','2012-01-01 12:14:00', 'CCCC'
Here is my best effort so far. I know I have to do something with ranking or partitioning each subgroup of #Table2 and only selecting the first match - but the syntax eludes me.
-- First Add a ROW_Number to Table2
ALTER TABLE #Table2
ADD RowNum INT;
WITH SetRows AS (
SELECT ROW_NUMBER() OVER(ORDER BY aDate, TEXT2) As RN, aDate, Text2
FROM #Table2 )
UPDATE T2
SET RowNum = RN
FROM SetRows AS S
JOIN #Table2 AS T2 ON S.aDate = T2.aDate AND S.TEXT2 = T2.TEXT2
-- Gets too many rows but does include the required results
;WITH Numbered AS (
SELECT ROW_NUMBER() OVER(ORDER BY ID) AS Row1, ID, TEXT1
FROM #Table1
), Numbered2 AS (
SELECT ROW_NUMBER() OVER(ORDER BY ID, aDate) AS RNMatch,
ROW_NUMBER() OVER( PARTITION BY aDate ORDER BY ID, aDate) AS PartNo, Row1, ID, TEXT1, T2.RowNum, T2.aDate, T2.TEXT2
FROM Numbered AS T1
JOIN #Table2 AS T2 ON T1.TEXT1 = T2.TEXT2
AND T2.RowNum >= Row1
)
SELECT N1.PartNo, N1.Row1, N1.ID, N1.TEXT1, N1.RowNum, N1.aDate, N1.TEXT2
FROM Numbered2 AS N1
JOIN Numbered2 AS N2 ON N1.RNMatch = N2.RNMatch AND N2.PartNo >= N1.PartNo
ORDER BY N1.ID, N2.aDate

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply

Sean Lange SSC Guru Points: 286609 More actions · Answer 1

Excellent job posting ddl and sample data!!! I am a bit confused on what you are trying to do here though. I know you posted your desired output (another huge kudos!!!). However I can't figure out the logic here at all.

If we look just at AAA you have 3 rows in your results but I can't figure out what the logic of the value of the date is.

This query is what I was using to try to understand what you are looking for but it just doesn't quite match up.

SELECT *, ROW_NUMBER() over (partition by Text1 order by Text1) as RowNum FROM #Table1 order by TEXT1

select *, ROW_NUMBER() over (partition by Text2 order by aDate, Text2) as RowNum from #Table2 order by TEXT2

For AAA RowNum 1 = RowNum 1 from table2, so far so good.

AAA RowNum 2 = RowNum 2 from table2 ...still makes sense

AAA RowNum 3 = RowNum 4 from table2 ??? Why is this????

Then when you get to BBB the first one starts with row 2 from table 2 and then doesn't make any sense at all to me. You are getting rows 2,4,5 from table 2.

I suspect there must be some sort of reasoning here but I can't figure it out.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

drew.allen SSC Guru Points: 77014 More actions · Answer 2

The problem is that your desired results require that your TABLE1 be sorted in a specific order, but the data provided is insufficient to enforce that order. You have to remember that there is no default order by in a set.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

t.brown 89142 SSCommitted Points: 1530 More actions · Answer 3

I'll try to express it better - but I'm beginning to think its a procedural not a set-based problem.

Take a row from #Table1

Match the first row (Minimum Date) in #Table2 on the text field call this Match1.

Take the next row from #table1

Match the first row in #Table2 on text field where (minimum date that comes after Match1) - call this Match2 on text field

Take Nth row from #Table1

Match on text field where date = minimum date that comes after matchN

All #Table1 rows should have a match.

So the rule is when a #Table2 entry has now been matched THE MATCHED ROW AND ALL PRECEEDING ROWS on #Table2 should be excluded from future matches.

t.brown 89142 SSCommitted Points: 1530 More actions · Answer 4

drew.allen (11/15/2012)
The problem is that your desired results require that your TABLE1 be sorted in a specific order, but the data provided is insufficient to enforce that order. You have to remember that there is no default order by in a set.
Drew

Hence all the ROW_NUMBER() functions in my efforts so far.

But it doesn't matter if two #Table2 rows have the same Date and Text - as the outcome is the same whichever is chosen as the match.

t.brown 89142 SSCommitted Points: 1530 More actions · Answer 5

OK so this may not be possible to get exact results in T-SQL, because, as pointed out above, there is no default sort, but here is the solution I'm going with as a 'best fit' so far.

*** Double Cursor Evil RBAR monstrosity warning ***

ALTER TABLE #Table2

ADD RowNum INT;

WITH SetRows AS (

SELECT ROW_NUMBER() OVER(ORDER BY aDate) As RN, aDate, Text2

FROM #Table2 )

UPDATE T2

SET RowNum = RN

FROM SetRows AS S

JOIN #Table2 AS T2 ON S.aDate = T2.aDate AND S.TEXT2 = T2.TEXT2

--- Sod it - use a cursor

DECLARE @ROW1 INT, @ID INT, @TEXT1 VARCHAR(100);

DECLARE @MatchDate DATETIME, @TEXT2 VARCHAR(100);

DECLARE @MatchPos INT;

DECLARE @RESULTS TABLE (ID INT PRIMARY KEY, TEXT1 varchar(100), aDate DATETIME, TEXT2 varchar(100), aDate2 DATETIME );

INSERT INTO @RESULTS (ID, TEXT1)

SELECT ID, TEXT1

FROM #Table1;

DECLARE SODIT CURSOR FAST_FORWARD FOR

SELECT ROW_NUMBER() OVER(ORDER BY T1.ID) AS Row1, T1.ID, T1.TEXT1

FROM #Table1 AS T1

ORDER BY T1.ID

OPEN SODIT;

SET @MatchDate = '2000-01-01'; -- just a minimum date

SET @MatchPos = -1;

FETCH NEXT FROM SODIT INTO @ROW1, @ID, @TEXT1

WHILE(@@FETCH_STATUS = 0)

BEGIN

SELECT @MatchPos = MIN(T2.RowNum)

FROM #Table2 AS T2

WHERE T2.TEXT2 = @TEXT1

AND T2.RowNum > @MatchPos;

SELECT @MatchDate = T2.aDate

FROM #Table2 AS T2

WHERE T2.RowNum = @MatchPos;

UPDATE @RESULTS

SET TEXT1= @TEXT1,

aDate = @MatchDate

WHERE ID = @ID;

FETCH NEXT FROM SODIT INTO @ROW1, @ID, @TEXT1

END

CLOSE SODIT;

DEALLOCATE SODIT;

DECLARE SODIT CURSOR FAST_FORWARD FOR

SELECT ROW_NUMBER() OVER(ORDER BY T1.ID) AS Row1, T1.ID, T1.TEXT1

FROM #Table1 AS T1

ORDER BY T1.ID

OPEN SODIT;

SET @MatchDate = '2000-01-01'; -- just a minimum date

SET @MatchPos = -1;

FETCH NEXT FROM SODIT INTO @ROW1, @ID, @TEXT1

WHILE(@@FETCH_STATUS = 0)

BEGIN

SELECT @MatchDate = MIN(T2.aDate)

FROM #Table2 AS T2

WHERE T2.TEXT2 = @TEXT1

AND T2.aDate >= @MatchDate

UPDATE @RESULTS

SET TEXT2 = @TEXT1,

aDate2 = @MatchDate

WHERE ID = @ID;

FETCH NEXT FROM SODIT INTO @ROW1, @ID, @TEXT1

END

CLOSE SODIT;

DEALLOCATE SODIT;

SELECT * FROM @RESULTS ORDER BY ID;

Can't wait to run it on the live tables where #table1 has 105,000 rows and Table2 has 520,000 rows!

🙁

t.brown 89142 SSCommitted Points: 1530 More actions · Answer 6

And posting that CURSOR just got me promoted to 'SSC-Enthusiastic'

The Irony :rolleyes:

Dwain Camps SSC Guru Points: 86978 More actions · Answer 7

I'm not getting exactly the desired results you posted but maybe this will give you a hint.

;WITH CTE AS (

SELECT TOP 1 ID, TEXT1, aDate, TEXT2

FROM #Table1

JOIN #Table2 ON TEXT1 = TEXT2

UNION ALL

SELECT a.ID, a.TEXT1, b.aDate, b.TEXT2

FROM #Table1 a

JOIN CTE c ON a.ID = c.ID + 1

JOIN #Table2 b ON a.TEXT1 = b.TEXT2 AND b.aDate >= c.aDate

)

SELECT TEXT1, aDate, TEXT2

FROM (

SELECT TEXT1, aDate, TEXT2

,n=ROW_NUMBER() OVER (PARTITION BY ID ORDER BY aDate)

FROM CTE) a

WHERE n = 1

OPTION (MAXRECURSION 0)

Where the code says AND b.aDate >= c.aDate, try making that AND b.aDate > c.aDate and run it both ways. I think this is close and that the answer may lie in reinterpreting the expected results from the sample data.

Unfortunately this is probably going to be quite slow depending on the number of rows in your actual data.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

t.brown 89142 SSCommitted Points: 1530 More actions · Answer 8

Thanks Dwain and very useful - quicker than my cursor, and it gets the same results.

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 9

drew.allen (11/15/2012)
The problem is that your desired results require that your TABLE1 be sorted in a specific order, but the data provided is insufficient to enforce that order. You have to remember that there is no default order by in a set.
Drew

Take another look, Drew. He has IDENTITY columns on both tables to preserve the desired order.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 10

t.brown 89142 (11/15/2012)
drew.allen (11/15/2012)
The problem is that your desired results require that your TABLE1 be sorted in a specific order, but the data provided is insufficient to enforce that order. You have to remember that there is no default order by in a set.
Drew
Hence all the ROW_NUMBER() functions in my efforts so far.
But it doesn't matter if two #Table2 rows have the same Date and Text - as the outcome is the same whichever is chosen as the match.

But you DO have something to guarantee the order you want in the form of IDENTITY columns.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

t.brown 89142 SSCommitted Points: 1530 More actions · Answer 11

Actually #Table2 has the date, but as its only to the minute, there are duplicates - and the order of TEXT2 does not help - see the last few lines of the sample data - and my added column RowNum has duplicates. I think it originally comes from a text file somewhere, I could ask the programmer to re-import it for me, adding an identity column - trouble is he's run off his feet.

But the 'best fit' solution devised by my cursor and replicated by Dwain's recursive cte is good enough for the end user.

I had assumed (incorrectly as it turns out) that this problem would have a standard SQL solution as its a classic two file problem read file A, go down file B looking for a match. Save position in file B, read next from A, continue down file B from saved position. I've done this kind of thing in procedural languages many times.

Dwain Camps SSC Guru Points: 86978 More actions · Answer 12

t.brown 89142 (11/16/2012)
I had assumed (incorrectly as it turns out) that this problem would have a standard SQL solution as its a classic two file problem read file A, go down file B looking for a match. Save position in file B, read next from A, continue down file B from saved position. I've done this kind of thing in procedural languages many times.

Probably the reason there is no "standard" SQL solution for this is that SQL isn't procedural by nature (it is declarative) but you can do many procedural things with it.

I'm glad it worked for you even though I had my concerns on how it might perform over larger row sets.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

thava SSCrazy Points: 2275 More actions · Answer 13

dwain.c (11/15/2012)
I'm not getting exactly the desired results you posted but maybe this will give you a hint.
;WITH CTE AS (
SELECT TOP 1 ID, TEXT1, aDate, TEXT2
FROM #Table1
JOIN #Table2 ON TEXT1 = TEXT2
UNION ALL
SELECT a.ID, a.TEXT1, b.aDate, b.TEXT2
FROM #Table1 a
JOIN CTE c ON a.ID = c.ID + 1
JOIN #Table2 b ON a.TEXT1 = b.TEXT2 AND b.aDate >= c.aDate
)
SELECT TEXT1, aDate, TEXT2
FROM (
SELECT TEXT1, aDate, TEXT2
,n=ROW_NUMBER() OVER (PARTITION BY ID ORDER BY aDate)
FROM CTE) a
WHERE n = 1
OPTION (MAXRECURSION 0)
Where the code says AND b.aDate >= c.aDate, try making that AND b.aDate > c.aDate and run it both ways. I think this is close and that the answer may lie in reinterpreting the expected results from the sample data.
Unfortunately this is probably going to be quite slow depending on the number of rows in your actual data.

just by curiosity the last row of the result is not matched so i think adding an identity to the #table2 and using that will solve this issue may i right or we will need to work more

Every rule in a world of bits and bytes, can be bend or eventually be broken
MyBlog About Common dialog control
A Visualizer for viewing SqlCommand object script [/url]

t.brown 89142 SSCommitted Points: 1530 More actions · Answer 14

thava (11/19/2012)
dwain.c (11/15/2012)
... SNIP
Where the code says AND b.aDate >= c.aDate, try making that AND b.aDate > c.aDate and run it both ways.
... SNIP
just by curiosity the last row of the result is not matched so i think adding an identity to the #table2 and using that will solve this issue may i right or we will need to work more

For each item ONE of the two runs gets the correct result - where the differences occur required manual inspection to choose the right one. - But it happened a lot less than you'd think for the real data - less than 200 manual checks out of around 80,000 matches.