Select And Count Consecutive Numbers

  • Probably, Serqiy.... just I think they did it at a machine language level which would be a lot faster the at the "T-SQL Level". 

    Heh... I agree about the OO hints... kinda like that one fellow says as part of his sig... something like "DTS and Cursors are for those who don't understand T-SQL"   Still, I haven't been able to beat a properly written CTE, yet... was hoping it was because my home machine is a bit slower than servers which is why I keep asking folks to give us timing on a million row example. 

    I finally got a copy of the Developer's Edition of 2K5... will install sometime this weekend... right after I backup my machine

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • quoteJeff, I got to understand your code, but found a bug in it.

    Using your short test table, if you change the following:

    Thanks Antonio... was pretty sure that was covered even without the extra variable but I'll double check...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Antonio,

    Heh... had to do a double take... but no... it's working correctly because of the DISTINCT transfer and the GROUP BY... here's the results with the original data and your data modification...

    Results BEFORE Data Change

    Results AFTER Data Change

    EmployeeID

    Freight

    Items

    EmployeeID

    Freight

    Items

    1

    1.35

    2

    1

    1.35

    2

    1

    4.98

    2

    1

    4.98

    2

    1

    8

    2

    1

    8

    2

    1

    9.01

    4

    1

    9.01

    4

    2

    4.98

    3

    2

    4.98

    3

    2

    8

    3

    2

    8

    3

    2

    9.01

    4

    2

    9.01

    4

    3

    9.07

    2

    Because the 3,9.07 was changed to 3,9.08 and there already was a 3,9.08, then according to the rules the OP posted, the two 3,9.08's are disqualified and the listing for EmployeeID #3 simply goes away...

    Way to keep me on my toes, though

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, thanks for your patience. I am learning a LOT on this thread.

    Here is a variation on your code that will run faster than the original:

    --===== If the table that holds the test data already exists drop it

    IF OBJECT_ID('TempDB..#MyHead') IS NOT NULL

    DROP TABLE #MyHead

    --===== Recreate the test table and populate it with test data

    CREATE TABLE #MyHead (EmployeeID INT,Freight DECIMAL(6,2))

    INSERT INTO #MyHead (EmployeeID,Freight)

    SELECT 1,0.21 UNION ALL

    SELECT 1,0.45 UNION ALL

    SELECT 1,0.93 UNION ALL

    SELECT 1,1.27 UNION ALL

    SELECT 1,1.35 UNION ALL --should be a "hit" with count of "2"

    SELECT 1,1.36 UNION ALL

    SELECT 1,1.51 UNION ALL

    SELECT 1,1.66 UNION ALL

    SELECT 1,2.50 UNION ALL

    SELECT 1,3.94 UNION ALL

    SELECT 1,4.03 UNION ALL

    SELECT 1,4.27 UNION ALL

    SELECT 1,4.41 UNION ALL

    SELECT 1,4.98 UNION ALL --should be a "hit" with count of "2"

    SELECT 1,4.99 UNION ALL

    SELECT 1,4.99 UNION ALL --dupe (not included according to David's specs)

    SELECT 1,7.46 UNION ALL

    SELECT 1,8.00 UNION ALL --should be a "hit" with count of "2"

    SELECT 1,8.01 UNION ALL

    SELECT 1,9.01 UNION ALL --should be a "hit" with count of "4"

    SELECT 1,9.02 UNION ALL

    SELECT 1,9.03 UNION ALL

    SELECT 1,9.04 UNION ALL

    SELECT 2,4.27 UNION ALL --different employee id

    SELECT 2,4.41 UNION ALL

    SELECT 2,4.98 UNION ALL --should be a "hit" with count of "3"

    SELECT 2,4.99 UNION ALL

    SELECT 2,4.99 UNION ALL --dupe (not included according to David's specs)

    SELECT 2,5.00 UNION ALL

    SELECT 2,7.46 UNION ALL

    SELECT 2,8.00 UNION ALL --should be a "hit" with count of "3"

    SELECT 2,8.01 UNION ALL

    SELECT 2,8.02 UNION ALL

    SELECT 2,9.01 UNION ALL --should be a "hit" with count of "4"

    SELECT 2,9.02 UNION ALL

    SELECT 2,9.02 UNION ALL --dupe (not included according to David's specs)

    SELECT 2,9.02 UNION ALL --dupe (not included according to David's specs)

    SELECT 2,9.02 UNION ALL --dupe (not included according to David's specs)

    SELECT 2,9.03 UNION ALL

    SELECT 2,9.04 UNION ALL

    SELECT 2,9.07 UNION ALL

    SELECT 3,9.07 UNION ALL --just to show one EmployeeID won't bleed into the next

    SELECT 3,9.08

    --===== If the table that holds the isolated data already exists drop it

    IF OBJECT_ID('TempDB..#Work') IS NOT NULL

    DROP TABLE #Work

    --===== Create and populate the work table with a clustered primary key and a special

    CREATE TABLE #Work

    (

    EmployeeID

    INT NOT NULL,

    Freight

    DECIMAL(6,2) NOT NULL,

    MyGroup

    INT, --This is the "special" column

    PRIMARY KEY CLUSTERED (EmployeeID,Freight)

    )

    INSERT INTO #Work (EmployeeID,Freight)

    SELECT DISTINCT

    t1

    .EmployeeID,t1.Freight

    FROM #MyHead t1,

    #MyHead t2

    WHERE (t1.Freight+.01 = t2.Freight)

    AND t1.EmployeeID = t2.EmployeeID

    --***************************************************

    --*** Here I removed the second part of the union ***

    --***************************************************

    --=======================================================================================

    -- This bit of "trick" code relies on the order of the data which is

    -- forced by the Clustered Primary Key

    --===== Declare a couple of local variables to support the "trick" code

    DECLARE

    @MyGroup INT

    SET @MyGroup = 0

    DECLARE

    @LastFreight DECIMAL(6,2)

    SET @LastFreight = -1

    --===== Here's the "trick"... this "groups" rows that are .01 apart in Freight using the

    -- proprietary UPDATE clause of SQL Server... magic... and blows cursor speed away!!!

    UPDATE w

    SET @MyGroup = MyGroup = CASE

    WHEN Freight = @LastFreight + .01

    THEN @MyGroup

    ELSE @MyGroup+1

    END,

    @LastFreight

    = Freight

    FROM #Work w --Order of processing is by the Clustered Primary key

    --=======================================================================================

    -- So, with that having been done, it's now easy to get the results you want.

    --=======================================================================================

    --*********************************************************************************

    --*** Here I add 1 to the count to compensate for the "missing" trailing record ***

    --*********************************************************************************

    SELECT

    EmployeeID, MIN(Freight) AS Freight,COUNT(*) + 1 AS Items

    FROM #Work

    GROUP BY EmployeeID,MyGroup

     

    Antonio Macedo

  • Very cool... nice job, Antonio... takes only 28 seconds on the million row test...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here is a solution that only takes about 12 seconds to do the calculation and present the data...

     

    -- Create sample data

    SELECT

    TOP 1000000

    CAST(RAND(CAST(NEWID() AS VARBINARY)) * 1000 + 1 AS INT) AS EmployeeID,

    CAST(RAND(CAST(NEWID() AS VARBINARY)) * 10 AS DECIMAL(6,2)) AS Freight

    INTO

    #MyHead

    FROM

    Master.dbo.SysColumns as sc1 WITH (NOLOCK),

    Master

    .dbo.SysColumns as sc2 WITH (NOLOCK)

    -- Create a clustered index over Freight

    CREATE

    CLUSTERED INDEX IX_Freight ON #MyHead (EmployeeID, Freight)

    -- Add an auxiliary column

    ALTER

    TABLE #MyHead

    ADD

    Grp INT

    -- Do the grouping thingy

    DECLARE

    @Freight DECIMAL(6, 2),

    @Grp

    INT,

    @Now

    DATETIME

    -- Start the clock

    SELECT

    @Freight = MIN(Freight) - 1,

    @Grp

    = 0,

    @Now

    = CURRENT_TIMESTAMP

    FROM

    #MyHead

    UPDATE

    #MyHead

    SET

    @Grp = Grp = CASE WHEN Freight > @Freight + 0.01 THEN @Grp + 1 ELSE @Grp END,

    @Freight

    = Freight

    -- Show the result

    SELECT

    EmployeeID,

    MIN(Freight) AS Freight,

    COUNT(DISTINCT Freight) AS [Count]

    FROM

    #MyHead

    GROUP

    BY Grp,

    EmployeeID

    HAVING

    COUNT(DISTINCT Freight) > 1

    ORDER

    BY EmployeeID,

    MIN(Freight)

    SELECT

    DATEDIFF(MILLISECOND, @Now, CURRENT_TIMESTAMP) AS Milliseconds

    -- Clean up

    drop

    table #myhead

     


    N 56°04'39.16"
    E 12°55'05.25"

  • You know what they say... "Tuning makes perfect"   Keeps getting better and better...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • <proprietary UPDATE clause of SQL Server... magic... and blows cursor speed away>

    This solution is really cute, and timely too - there are several current threads looking to a sequential cumulative component. But this statement type doesn't unfortunately work in a SELECT ("A SELECT statement that assigns a value to a variable must not be combined with data-retrieval operations."). Does it work with 2k5?

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Here is a set-based method that will work on both SQL 2000 and SQL 2005.

     

     

     

    -- Prepare sample data

    DECLARE

    @Sample TABLE (EmployeeID INT, Freight SMALLMONEY)

    INSERT

    @Sample

    SELECT

    1, 4.99 UNION ALL

    SELECT

    1, 0.21 UNION ALL

    SELECT

    1, 4.03 UNION ALL

    SELECT

    1, 1.36 UNION ALL

    SELECT

    1, 0.93 UNION ALL

    SELECT

    1, 4.98 UNION ALL

    SELECT

    1, 1.35 UNION ALL

    SELECT

    1, 0.45 UNION ALL

    SELECT

    1, 1.51 UNION ALL

    SELECT

    1, 4.99 UNION ALL

    SELECT

    1, 1.66 UNION ALL

    SELECT

    1, 2.50 UNION ALL

    SELECT

    1, 4.27 UNION ALL

    SELECT

    1, 3.94 UNION ALL

    SELECT

    1, 4.41 UNION ALL

    SELECT

    1, 1.27 UNION ALL

    SELECT

    1, 7.46

    -- Show the expected output

    SELECT

    d1.EmployeeID,

    d1

    .Freight,

    CAST(1 + 100 * MIN(d2.Freight) - 100 * d1.Freight AS INT) AS [Count]

    FROM

    (

    SELECT EmployeeID,

    Freight

    FROM @Sample AS t1

    WHERE NOT EXISTS (

    SELECT *

    FROM @Sample AS t2

    WHERE t1.Freight - t2.Freight = 0.01

    AND t1.EmployeeID = t2.EmployeeID

    )

    ) AS d1

    INNER

    JOIN (

    SELECT EmployeeID,

    Freight

    FROM @Sample AS t1

    WHERE NOT EXISTS (

    SELECT *

    FROM @Sample AS t2

    WHERE t2.Freight - t1.Freight = 0.01

    AND t2.EmployeeID = t1.EmployeeID

    )

    ) AS d2 ON d1.Freight <= d2.Freight AND d1.EmployeeID = d2.EmployeeID

    GROUP

    BY d1.EmployeeID,

    d1

    .Freight

    HAVING

    MIN(d2.Freight) > d1.Freight


    N 56°04'39.16"
    E 12°55'05.25"

  • Sorry, Chris... lost track of this thread...

    Specifically, what statement gave you that error?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yeah me too!

    I saw a use for ...

    UPDATE #MyHead

    SET

    @Grp = Grp = CASE WHEN Freight > @Freight + 0.01 THEN @Grp + 1 ELSE @Grp END,

    @Freight

    = Freight

    ... in the work I'm currently involved with and tried this syntax in a SELECT, but it only works with an UPDATE.

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Yeah... that's the bugger about it... @variable=column=formula ONLY works for Update and only with certain other "hooks" present....

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • and

    @variable=column=formula

    is equivalent to:

    SET @variable=formula, column = formula

    That is why it works for "Update" statements only

    Cheers,


    * Noel

  • My friend gave me an interesting problem yesterday. He works for a hospital AND needs TO COUNT patient visits, IF a patient visits the doctor ON consecutive DAYS, it needs TO be counted AS 1 visit.

    Here is how i solved it. This is just one OF many ways TO do it.

    DECLARE @tempTable TABLE(row_id INT IDENTITY(1,1),

    patient_id INT,

    er_date DATETIME)

    -- Test Data

    INSERT INTO @tempTable(patient_id,er_date)

    SELECT 1,'1/1/2012'

    UNION ALL

    SELECT 1,'1/2/2012'

    UNION ALL

    SELECT 1,'1/3/2012'

    UNION ALL

    SELECT 2,'1/5/2012'

    UNION ALL

    SELECT 2,'1/6/2012'

    UNION ALL

    SELECT 3,'1/8/2012'

    UNION ALL

    SELECT 3,'1/9/2012'

    UNION ALL

    SELECT 1,'2/1/2012'

    UNION ALL

    SELECT 1,'2/2/2012'

    UNION ALL

    SELECT 2,'2/3/2012'

    UNION ALL

    SELECT 2,'3/3/3012'

    UNION ALL

    SELECT 3,'1/10/2012'

    UNION ALL

    SELECT 3,'1/13/2012'

    UNION ALL

    SELECT 5,'1/5/2012'

    --If datediff is only 1 , it is treated as same visit

    SELECT

    tbl1.patient_id,

    SUM( CASE WHEN tbl1.er_date IS NOT NULL AND tbl2.er_date IS NULL THEN 1 ELSE 0 END ) [number_of_visits]

    FROM

    @tempTable tbl1

    LEFT JOIN @tempTable tbl2

    ON tbl1.patient_id = tbl2.patient_id

    AND tbl1.er_date = DATEADD(dd,1,tbl2.er_date)

    GROUP BY

    tbl1.patient_id;

  • Some people are entertained by fast cars...others by fast code...GO GEEKS, I say. 🙂

Viewing 15 posts - 31 through 45 (of 45 total)

You must be logged in to reply to this topic. Login to reply