May 31, 2012 at 9:50 am
Hi Guys,
I know this should be simple to code, but I've hit a wall trying to produce something simple that will do the following:
I need to traverse through a table to find which rows have a column value that is less than the previous row's column value
e.g.
ColValue
23
123
233
32 ----- Bingo! This value is less than the previous value
45
124
32 ----- Bingo! This value is less than the previous value
and so on
Any ideas?
Thanks,
Pianoman
May 31, 2012 at 10:07 am
How about this?
DECLARE @tab TABLE
( ID INT , ColValue INT)
;
INSERT INTO @tab (ID, ColValue)
SELECT 1, 23
UNION ALL SELECT 2, 123
UNION ALL SELECT 3, 233
UNION ALL SELECT 4, 32
UNION ALL SELECT 5, 45
UNION ALL SELECT 6, 124
UNION ALL SELECT 7, 32
;
WITH rCTE AS
(
-- Anchor: the first row has no previous row, so its Indicator is 0
SELECT T.ID , T.ColValue
, 0 AS Indicator
FROM @tab T
WHERE T.ID = 1
UNION ALL
-- Recurse: step to the next ID and flag rows smaller than their predecessor
SELECT Base.ID , Base.ColValue
, CASE WHEN Base.ColValue < R.ColValue THEN 1
ELSE 0
END
FROM @tab Base
JOIN rCTE R
ON R.ID + 1 = Base.ID
)
SELECT *
FROM rCTE
WHERE Indicator = 1
May 31, 2012 at 10:22 am
If you want something simpler than ColdCoffee's solution (though not nearly as efficient according to the estimated execution plan, unless you index the join columns and include the return values), here's a self-join using the same @tab table variable...
SELECT *
FROM @tab A
INNER JOIN @tab B
ON A.ID = B.ID - 1
WHERE A.ColValue > B.ColValue
May 31, 2012 at 11:41 am
Thanks ColdCoffee and jeffem - appreciate the replies. Will give these suggestions a try.
My original attempt to code this problem ended up being an ugly cursor - I knew there would be a more elegant way.
Hope to return the favour (SQL engineering is more my forte.)
Cheers!
May 31, 2012 at 3:27 pm
There can be a small problem with jeffem's and ColdCoffee's solutions: they break if an ID value is missing (the most common scenario is when you delete a record, since the identity value is never reused).
For example:
INSERT INTO @tab (ID, ColValue)
SELECT 1, 23
UNION ALL SELECT 2, 123
UNION ALL SELECT 3, 233
UNION ALL SELECT 5, 32
UNION ALL SELECT 6, 45
UNION ALL SELECT 7, 124
UNION ALL SELECT 8, 32
;
ColdCoffee's solution won't return any rows, and jeffem's solution will return only one of the two expected values.
I have the same problem and used something like this, but I'm concerned about the performance.
DECLARE @tab TABLE
( ID INT , ColValue INT)
;
INSERT INTO @tab (ID, ColValue)
SELECT 1, 23
UNION ALL SELECT 2, 123
UNION ALL SELECT 3, 233
UNION ALL SELECT 5, 32
UNION ALL SELECT 6, 45
UNION ALL SELECT 7, 124
UNION ALL SELECT 8, 32
;
WITH rCTE AS
(
-- For each row, find the highest ID that comes before it (works even when IDs have gaps)
SELECT A.ID, A.ColValue, MAX( B.ID) AS prev_ID
FROM @tab A
INNER JOIN @tab B
ON A.ID > B.ID
GROUP BY A.ColValue, A.ID
)
-- Join back to the table to compare each row against its actual predecessor
SELECT *
FROM rCTE A
INNER JOIN @tab B
ON A.prev_ID = B.ID
WHERE A.ColValue < B.ColValue
May 31, 2012 at 9:25 pm
A good point! It seems to me the best way to handle it might be to generate a ROW_NUMBER() to get perfectly sequential IDs, then use the solutions presented before. But that requires the data to be sortable exactly as the comparison is desired.
I mainly wanted to supply the logic without getting into those details, which was a bit irresponsible of me. But I think ROW_NUMBER() is the way to go.
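For illustration, a minimal sketch of that idea, assuming the gapped @tab data from the post above (the sequential rn takes the place of ID in the join):
;WITH Numbered AS
(
-- Number the rows in ID order so the sequence has no gaps
SELECT ID, ColValue
, ROW_NUMBER() OVER (ORDER BY ID) AS rn
FROM @tab
)
SELECT Cur.ID, Cur.ColValue
FROM Numbered Cur
INNER JOIN Numbered Prev
ON Cur.rn = Prev.rn + 1
WHERE Cur.ColValue < Prev.ColValue
;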
June 1, 2012 at 12:02 am
You can also do it with a subquery like this:
DECLARE @tab TABLE
( ID INT , ColValue INT)
;
INSERT INTO @tab (ID, ColValue)
SELECT 1, 23
UNION ALL SELECT 2, 123
UNION ALL SELECT 3, 233
UNION ALL SELECT 5, 32
UNION ALL SELECT 6, 45
UNION ALL SELECT 7, 124
UNION ALL SELECT 8, 32
;
;WITH CTE AS (
SELECT ID, ColValue
-- Correlated subquery: fetch the ColValue of the nearest lower ID (handles gaps)
,(SELECT TOP 1 t2.ColValue
FROM @tab t2
WHERE t1.ID > t2.ID
ORDER BY t2.ID DESC) PrevRow
FROM @tab t1
)
SELECT ID, ColValue
FROM CTE
-- Keep only rows strictly smaller than their predecessor; the first row has none
WHERE 1=CASE WHEN PrevRow IS NULL THEN 0 WHEN PrevRow<=ColValue THEN 0 ELSE 1 END
Just guessing, mind you, but because of the TOP clause in the subquery this should perform better than joining the table onto itself.
Note that this solution also handles the case of missing IDs.
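On SQL Server 2012 or later, the LAG window function sidesteps the ID-gap issue entirely. A minimal sketch, assuming ID defines the row order:
SELECT ID, ColValue
FROM (
-- LAG returns the previous row's ColValue in ID order, regardless of gaps in ID
SELECT ID, ColValue
, LAG(ColValue) OVER (ORDER BY ID) AS PrevValue
FROM @tab
) s
WHERE ColValue < PrevValue
;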
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
June 1, 2012 at 2:30 am
Cheers Guys.
Jeff - Yup, I thought about adding ROW_NUMBER() too, for the same reason Luis gave about missing IDs (thanks Luis).
Dwain - Cheers for your feedback. Nice to see lovely bits of code with no cursor in sight 🙂
thanks
Pianoman
June 1, 2012 at 7:38 am
Dwain, good job on accounting for the missing IDs! 🙂
Just a note about the performance: I did a comparison, and using a table variable will give a bit of a performance advantage (15% or so, for reasonable numbers of records) to your method. If this were in an actual table (temp, or otherwise) with a good, covering index, then the execution plan between our two solutions is essentially identical.
June 1, 2012 at 10:56 am
Try this too:
DECLARE @tab TABLE
( ID INT , ColValue INT)
;
INSERT INTO @tab (ID, ColValue)
SELECT 1, 23
UNION ALL SELECT 2, 123
UNION ALL SELECT 3, 233
UNION ALL SELECT 4, 32
UNION ALL SELECT 5, 45
UNION ALL SELECT 6, 124
UNION ALL SELECT 7, 32
;
SELECT t1.id, t1.colvalue
FROM @tab t1
LEFT JOIN @tab t2
ON t1.id = t2.id + 1
AND t1.colvalue < t2.colvalue
WHERE t2.id IS NOT NULL
Should be pretty good on performance. For some reason I'm currently in love with LEFT JOINS.
June 1, 2012 at 11:20 am
It doesn't hurt to do it with a left join, but if you use an inner join there, you don't need the WHERE t2.id IS NOT NULL filter.
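That is, a minimal sketch of the equivalent, assuming the same @tab as above:
-- The join predicate itself filters out unmatched rows, so no NULL check is needed
SELECT t1.id, t1.colvalue
FROM @tab t1
INNER JOIN @tab t2
ON t1.id = t2.id + 1
AND t1.colvalue < t2.colvalue
;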
June 1, 2012 at 11:49 am
I'm surprised it hasn't been mentioned before, but I do hope you have some way to actually define the ordering within the table. A common stumbling block is to think there is some physical order to the table (there isn't). Assuming you DO, a CTE with ROW_NUMBER() will perform fairly well, even on large sets. I'd think it would outperform something with a correlated subquery, but that will depend on existing indexes on the correlated table.
As an aside - if you have a healthy table (i.e., anything over a couple hundred records), you will want to avoid table variables. Table variables tend to play nasty tricks on execution plans (the estimated number of rows in a table variable stays pegged at 1), so the engine can't optimize for "large" table variables.
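A minimal sketch of the temp-table alternative (table name and sample data are illustrative only):
-- Temp tables get real cardinality estimates and statistics, unlike table variables
CREATE TABLE #tab ( ID INT PRIMARY KEY, ColValue INT );
INSERT INTO #tab (ID, ColValue)
VALUES (1, 23), (2, 123), (3, 233), (4, 32), (5, 45), (6, 124), (7, 32);
-- The self-join can now seek on the clustered index
SELECT t1.ID, t1.ColValue
FROM #tab t1
INNER JOIN #tab t2
ON t1.ID = t2.ID + 1
WHERE t1.ColValue < t2.ColValue;
DROP TABLE #tab;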
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user. All right - what was my emergency again?
June 1, 2012 at 3:34 pm
jeffem (5/31/2012)
But that requires the data to be sortable exactly as the comparison is desired
Heh.... if you can't sort exactly as the comparison is desired, you're dead no matter what. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
June 1, 2012 at 9:18 pm
jeffem (6/1/2012)
Dwain, good job on accounting for the missing IDs! 🙂
Just a note about the performance: I did a comparison, and using a table variable will give a bit of a performance advantage (15% or so, for reasonable numbers of records) to your method. If this were in an actual table (temp, or otherwise) with a good, covering index, then the execution plan between our two solutions is essentially identical.
Can you post your comparison? I'm not clear whether you're comparing actual run results (e.g., with Jeff's 1M row test harness) or just comparing execution plans. I'd do it myself but I'm kinda tied up on something at the moment; if I don't see something posted, maybe I'll come back to it.
I'm really just curious, because I once saw a subquery version like mine blast the heck out of an alternative something like yours (no comparison here intended, of course), even though the execution plans rated them about the same.
I use table variables for convenience, so I don't have temp tables littering my sandbox if I forget to drop them or something. I agree they don't always end up rating the same as an actual table, and with good indexing, it could completely reverse the results.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
June 1, 2012 at 9:29 pm
Matt Miller (#4) (6/1/2012)
A common stumbling block is to think there is some physical order to the table (there isn't).
Of course there is. You just can't depend on results coming back in that physical order 🙂
DECLARE @Example AS TABLE
(
ColValue integer NOT NULL
);
INSERT INTO @Example
(ColValue)
VALUES
(23),
(123),
(233),
(32),
(45),
(124),
(32);
SELECT
e.*,
-- %%PhysLoc%% exposes each row's physical location; decode it as file:page:slot
file_num = CONVERT(integer,CONVERT(binary(2),REVERSE(SUBSTRING(%%PhysLoc%%,5,2)))),
page_num = CONVERT(integer,CONVERT(binary(4),REVERSE(SUBSTRING(%%PhysLoc%%,1,4)))),
slot_num = CONVERT(integer,CONVERT(binary(2),REVERSE(SUBSTRING(%%PhysLoc%%,7,2))))
FROM @Example AS e
ORDER BY
file_num,
page_num,
slot_num
OPTION (RECOMPILE);
As an aside - if you have a healthy table (i.e., anything over a couple hundred records), you will want to avoid table variables. Table variables tend to play nasty tricks on execution plans (the estimated number of rows in a table variable stays pegged at 1), so the engine can't optimize for "large" table variables.
Well it can a bit. See the example above (the number of estimated rows from the table variable in the actual plan is 7). If a statement-level recompile occurs (for whatever reason) the optimizer will see the run-time cardinality of the table variable. Table variables do not, however, currently support statistics (even on indexes). Where a logical query has cost-based physical execution choices that benefit from statistical information, temporary tables are a better choice. If a trivial plan is possible, creating statistics on the temporary table just adds overhead.
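A minimal sketch of that recompile effect, assuming a populated @tab as in the earlier scripts:
-- Without RECOMPILE the optimizer guesses 1 row for @tab;
-- with it, the actual row count is visible when the statement is compiled
SELECT t1.ID, t1.ColValue
FROM @tab t1
INNER JOIN @tab t2
ON t1.ID = t2.ID + 1
WHERE t1.ColValue < t2.ColValue
OPTION (RECOMPILE);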
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi