grouping rows into sets

  • wish to group rows into "sets" based on t_id and t_type.

    -- set up, notes, and expected results below.

    Thanks JLS

    CREATE TABLE #l(
    t_id INT NOT NULL
    ,t_type VARCHAR(1) NOT NULL
    ,t_date DATE NOT NULL
    );
    INSERT INTO #l(t_id,t_type,t_date) VALUES
    (356778,'R','2012-01-01'),(356778,'R','2012-01-02'),(356778,'C','2015-01-07')
    ,(356778,'C','2015-01-08'),(356778,'C','2015-01-11'),(356778,'R','2019-01-01')
    ,(356778,'R','2019-01-02'),(356778,'R','2019-01-11'),(356778,'R','2019-01-12')
    ,(356778,'R','2020-01-01'),(435677,'R','2012-01-04'),(435677,'R','2012-01-05')
    ,(435677,'C','2015-01-07'),(435677,'C','2015-01-08'),(435677,'C','2015-01-09')
    ,(435677,'C','2015-01-10'),(435677,'C','2015-01-11'),(435677,'R','2019-01-01')
    ,(435677,'R','2019-01-02');

    SELECT * FROM #l

    -- want to incrementaly group by t_id and t_type
    -- with an id for each row per incremental group

    -- expected results as follows

    CREATE TABLE #results(
    t_id INT NOT NULL
    ,t_type VARCHAR(1) NOT NULL
    ,t_date DATE NOT NULL
    ,t_Group INT NOT NULL
    ,t_no INT NOT NULL
    );
    INSERT INTO #results(t_id,t_type,t_date,t_Group,t_no) VALUES
    (356778,'R','2012-01-01',1,1),(356778,'R','2012-01-02',1,2),(356778,'C','2015-01-07',2,1)
    ,(356778,'C','2015-01-08',2,2),(356778,'C','2015-01-11',2,3),(356778,'R','2019-01-01',3,1)
    ,(356778,'R','2019-01-02',3,2),(356778,'R','2019-01-11',3,3),(356778,'R','2019-01-12',3,4)
    ,(356778,'R','2020-01-01',3,5),(435677,'R','2012-01-04',1,1),(435677,'R','2012-01-05',1,2)
    ,(435677,'C','2015-01-07',2,1),(435677,'C','2015-01-08',2,2),(435677,'C','2015-01-09',2,3)
    ,(435677,'C','2015-01-10',2,4),(435677,'C','2015-01-11',2,5),(435677,'R','2019-01-01',3,1)
    ,(435677,'R','2019-01-02',3,2);

    SELECT * FROM #results

    DROP table #l
    DROP TABLE #results


    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • WITH CTE1 AS (
    SELECT *,
    CASE WHEN LAG(t_type) OVER(PARTITION BY t_id ORDER BY t_date) = t_type THEN 0 ELSE 1 END AS IsStart
    FROM #l
    ),
    CTE2 AS (
    SELECT *,
    SUM(IsStart) OVER(PARTITION BY t_id ORDER BY t_date) AS t_Group
    FROM CTE1
    )
    SELECT t_id,t_type,t_date,t_Group,
    ROW_NUMBER() OVER(PARTITION BY t_id,t_Group ORDER BY t_date) AS t_no
    FROM CTE2;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Thanks Mark

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • This might be helpful reading.

    https://dotnettutorials.net/lesson/over-clause-sql-server/

     

    • This reply was modified 3 years, 3 months ago by  homebrew01.
  • Thanks Homebrew....

    it was the "trick" CASE WHEN LAG(t_type) OVER(PARTITION BY t_id ORDER BY t_date) = t_type THEN 0 ELSE 1 END AS IsStart that Mark used was the bit I had forgotten about.

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • >> wish to group rows into "sets" based on t_id and t_type. <<

    This makes little sense to me. You already have your sets defined by the t_id, So there’s no need to redundantly add some kind of group number. Within the groups, we have no rule given for ordering them. Therefore, I can assume you don’t want any ordering? In fact, since the t_type has duplicates within each t_id, we are stuck. It looks like you might be ordering by the date, but since you didn’t tell us that it’s not part of our specifications.

    CREATE TABLE Foobar

    (t_id CHAR(6) NOT NULL,

    t_type CHAR(1) NOT NULL CHECK(t_type IN (‘R’, ‘C’)),

    PRIMARY KEY (t_id, t_type, t_date),

    t_date DATE NOT NULL);

    Some minor things about your DDL; by definition, not as an option, the table must have a key. What you’ve done is put up a deck of punch cards, written in SQL. . In fact, I love to see the way you used the leading comma at the start of a punch card. We did that back in the 1960s so we can more easily rearrange the deck of punch cards. Ever since about 1970, programmers have simply hit a key and called a pretty printing routine to format their code. Since an identifier has to be on a nominal scale, again, by definition, not an option, it should be a character string. Also, VARCHAR(1) makes no sense.

    Finally, why are you materializing the results? We had to do this with punchcards, not with SQL. Also, did you know that the word “group” is a reserved word in SQL?

    Remember that one of our goals in RDBMS is to reduce redundancy, not increase it. Another goal is not to do display formatting in the database. That's done in a presentation layer. Just keep it simple.

    .

    CREATE VIEW Labeled_Foobar

    AS

    SELECT t_id, t_type, t_date,

    ROW_NUMBER() OVER (PARTITION BY t_id ORDER BY t_date) AS group_nbr

    FROM Foobar;

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  •  

    JCelko

    Thanks for your reply....but your coded response does not meet my expected results  (which I posted)

    Other posters have resolved the problem for me.

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Heh... Welcome back again, JLS.  As you can see, some things haven't changed. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff.....had to smile when I saw his post <grin>

    Wonder if he will follow it up?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Shifting gears back to your good question, I'm curious...

    1. What are you going to do with the t_Group and t_no columns once they've been generated?
    2. How often will you need to generate them?
    3. How many distinct values of t_ID are there?
    4. What are the min and max dates for t_date?
    5. How many distinct t_types are there?
    6. Do you ever have a t_ID for a given date that contains more than 1 t_type?
    7. How many dates does each t_ID usually have?
    8. Are there ever any missing or duplicate dates for a given t_ID?

    Depending on those answers to those questions, I may have an alternate suggestion for all of this.  Of course, some of the questions are for info I'd need to generate a large test table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Shifting gears back to your good question, I'm curious...

    1. What are you going to do with the t_Group and t_no columns once they've been generated?
    2. How often will you need to generate them?
    3. How many distinct values of t_ID are there?
    4. What are the min and max dates for t_date?
    5. How many distinct t_types are there?
    6. Do you ever have a t_ID for a given date that contains more than 1 t_type?
    7. How many dates does each t_ID usually have?
    8. Are there ever any missing or duplicate dates for a given t_ID?

    Depending on those answers to those questions, I may have an alternate suggestion for all of this.  Of course, some of the questions are for info I'd need to generate a large test table.

    Hi Jeff....my question only really arose because of a conversation with a person I worked with a few years back and they mentioned the report that I had provided for them......I vaguely remembered it but when I saw a similar question elsewhere it piqued my interest and for the life of me I could not remember how I had got the groups....I am retired now , but have recently decided to re-engage with the SQL community to keep the grey cells working...and to be honest I enjoy the problem solving.

    So in answering your q's please forgive me if the answers are deliberately vague (confidential) or just vague cos I cant remember !

    What are you going to do with the t_Group and t_no columns once they have been generated?
    IIRC they were used as the basis for further processes to create a final report.

    How often will you need to generate them?
    they were run weekly

    How many distinct values of t_ID are there?
    max c.10K

    What are the min and max dates for t_date?
    a range as requested by user...........but generally last 12mths

    How many distinct t_types are there?
    max of 4

    Do you ever have a t_ID for a given date that contains more than 1 t_type?
    Yes....t-date was a datetime

    How many dates does each t_ID usually have?
    for a 12mth period maybe 100

    Are there ever any missing or duplicate dates for a given t_ID?
    date/time per row is ""as and when"" recorded
    duplicate dates but datetime is unique

    Have fun <grin>

    Will be interested in your thoughts......but tis for interest only so if you decide to pass I will understand

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Thanks, JLS... your answers are great... especially the implied answer of "you're all set".  I can "play" at my leisure, which I have disturbingly little time to do lately because I'm not yet retired.

    As a bit of a sidebar, if you want to keep your mind active, check out the following video where I destroy a myth long held for decades to be the true and, in the process, lay waste to supposed "Best Practice" index maintenance that the whole world has made the serious mistake of also using for decades.  It also introduces IndexDNA™, which is a proc and a spreadsheet I wrote to allow you to actually, literally, and graphically see what an index actually looks like.

    https://www.youtube.com/watch?v=qfQtY17bPQ4&list=PLr9ab4Dj3ObuaHAUA9JJz-GUbfbwXEyS5&index=4

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply