Grouping by min/max in a series

Question

Post reply

Grouping by min/max in a series

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 1

Yeah. I know.

I wasn't sure about the amount of records to start with. I thought there might be only a few hundred of them.

I tested your code on SQL Server 2005, and there was no need to go dynamic. It run equally fast!

Good algorithm Jeff!

But... MAX portion does not seem to work on SQL Server 2005.

I had to drop the clustered index and recreate it descending.

USE TempDB

SET NOCOUNT ON

IF OBJECT_ID('MyTable') IS NOT NULL

DROP TABLE MyTable

CREATE TABLE MyTable

(

Code INT IDENTITY(1,1),

Status CHAR(1) NOT NULL

)

INSERT MyTable

(

Status

)

SELECT TOP 100000

CAST(RAND(CAST(NEWID() AS VARBINARY)) * 3 AS INT) AS Status

FROM Master.dbo.SYSCOLUMNS AS sc1

CROSS JOIN Master.dbo.SYSCOLUMNS AS sc2

UPDATE MyTable

SET Status = CASE STATUS

WHEN '0' THEN 'A'

WHEN '1' THEN 'S'

WHEN '2' THEN 'T'

END

SELECT TOP 100 *

FROM MyTable

ORDER BY Code

DECLARE @StartTime DATETIME

SET @StartTime = CURRENT_TIMESTAMP

SELECT Code,

Status,

CAST(NULL AS INT) AS MyMin,

CAST(NULL AS INT) AS MyMax

INTO #MyHead

FROM MyTable

ORDER BY Code,

Status

ALTER TABLE #MyHead

ADD CONSTRAINT PK_MyHead_Min PRIMARY KEY CLUSTERED (Code, Status)

DECLARE @PrevStatus CHAR(1),

@PrevCode INT

SELECT @PrevStatus = ' ',

@PrevCode = 0

UPDATE #MyHead

SET @PrevCode = MyMin = CASE Status

WHEN @PrevStatus THEN @PrevCode

ELSE Code

END,

@PrevStatus = Status

FROM #MyHead

ALTER TABLE #MyHead

DROP PK_MyHead_Min

ALTER TABLE #MyHead

ADD CONSTRAINT PK_MyHead_Max PRIMARY KEY CLUSTERED (Code DESC, Status DESC)

SELECT @PrevStatus = ' ',

@PrevCode = 0

UPDATE #MyHead

SET @PrevCode = MyMax = CASE Status

WHEN @PrevStatus THEN @PrevCode

ELSE Code

END,

@PrevStatus = Status

FROM #MyHead

SELECT DISTINCT MyMin AS Min,

MyMax AS Max,

Status

FROM #MyHead

ORDER BY MyMin

DROP TABLE #MyHead

PRINT DATEDIFF(ms,@StartTime,GETDATE())

N 56°04'39.16"
E 12°55'05.25"

Ken McKelvey SSCoach Points: 18843 More actions · Answer 2

Jeff,

Thanks for taking the time to explain your algorithm.

rmcknight SSC Eights! Points: 918 More actions · Answer 3

Yes, the source table only has little over 400 rows, and the retructuring only needs to be done once a year. The table was also supposed to have unique 'codes', but I had to clean that up.

Thank you ALL for goin at this.

This truely is an addiction as I explained to my new intern. Once challenged, it's amazing the response, which proves it. Guess theres a little McGuiver in all of us.

Jeff Moden SSC Guru Points: 1004360 More actions · Answer 4

Randy and Ken,

Yeah, it's pretty much an addiction to me. Thanks for the feedback... it's part of the addiction

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004360 More actions · Answer 5

Peter,

Ol' friend, thanks for the feedback. I haven't been "blessed" with 2k5 yet, so I always wonder how these things work out there.

Yes, you're correct. Randy originally said that this was for something a bit less than 900 rows... you and I have been in this business long enough to know that that will change... so I always try to write with the idea that it could be a million rows a year from now. It's part of that addiction Randy was talking about.

Hadn't thought about dropping the clustered index and reforming it to eliminate the dynamic SQL although it didn't seem to add much overhead... kinda surprised that 2k5 didn't like the index for Max. Really appreciate the rewrite you posted.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 6

The ~~bug~~ feature with clustered index versus non-clustered index and the technique with

SELECT @t = @t + ';' + Name

FROM Table1

was discussed some time ago on the other forum we usually meet

Too bad UPDATE does not come with ORDER BY...

I like the algorithm, it is VERY powerful and speedy.

But must be used with caution, as seen with SQL Server 2005. Who knows when MS change the specs again?

Until next time, Jeff!

Kudos

N 56°04'39.16"
E 12°55'05.25"

Jeff Moden SSC Guru Points: 1004360 More actions · Answer 7

Heh... yeah... I remember that discussion... It's a bloody shame that Microsoft keeps changing features to bugs especially on things like this... no wonder lot's of folks are hesitant to make the switch... they would have to do full regression testing and validation to check for junk like this... some new fault with TOP is another of those nasty bugs they introduced in 2k5... lot's of powerful code is suddenly making flushing sounds...

Hmmmm.... UPDATE with an ORDER BY... wonder if a derived table would do the trick there... probably requires a TOP 100 PERCENT which, I understand, has one of the new ~~bugs~~ features in 2k5.

[Edit]... You're usually pretty thorough when it comes to testing but I've gotta be sure... Did you try the WITH (INDEX) in 2k5, as well?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 8

Yes. I did test with INDEX hint.

In the code posted above, I omitted them becuase the ~~bug~~ feature of only following the clustered index anyway.

At least in the UPDATE. For the SELECT part

DECLARE @t VARCHAR

SELECT @t = ISNULL(@t + ',', '') + CAST(c.Number AS VARCHAR)

FROM (SELECT TOP 100 PERCENT Number FROM master..spt_values WHERE Name IS NULL AND Number BETWEEN 1 AND 9 ORDER BY Number) AS c

works, regardless of clustered index.

N 56°04'39.16"
E 12°55'05.25"