August 23, 2021 at 7:11 pm
wish to group rows into "sets" based on t_id and t_type.
-- set up, notes, and expected results below.
Thanks JLS
CREATE TABLE #l(
t_id INT NOT NULL
,t_type VARCHAR(1) NOT NULL
,t_date DATE NOT NULL
);
INSERT INTO #l(t_id,t_type,t_date) VALUES
(356778,'R','2012-01-01'),(356778,'R','2012-01-02'),(356778,'C','2015-01-07')
,(356778,'C','2015-01-08'),(356778,'C','2015-01-11'),(356778,'R','2019-01-01')
,(356778,'R','2019-01-02'),(356778,'R','2019-01-11'),(356778,'R','2019-01-12')
,(356778,'R','2020-01-01'),(435677,'R','2012-01-04'),(435677,'R','2012-01-05')
,(435677,'C','2015-01-07'),(435677,'C','2015-01-08'),(435677,'C','2015-01-09')
,(435677,'C','2015-01-10'),(435677,'C','2015-01-11'),(435677,'R','2019-01-01')
,(435677,'R','2019-01-02');
SELECT * FROM #l
-- want to incrementaly group by t_id and t_type
-- with an id for each row per incremental group
-- expected results as follows
CREATE TABLE #results(
t_id INT NOT NULL
,t_type VARCHAR(1) NOT NULL
,t_date DATE NOT NULL
,t_Group INT NOT NULL
,t_no INT NOT NULL
);
INSERT INTO #results(t_id,t_type,t_date,t_Group,t_no) VALUES
(356778,'R','2012-01-01',1,1),(356778,'R','2012-01-02',1,2),(356778,'C','2015-01-07',2,1)
,(356778,'C','2015-01-08',2,2),(356778,'C','2015-01-11',2,3),(356778,'R','2019-01-01',3,1)
,(356778,'R','2019-01-02',3,2),(356778,'R','2019-01-11',3,3),(356778,'R','2019-01-12',3,4)
,(356778,'R','2020-01-01',3,5),(435677,'R','2012-01-04',1,1),(435677,'R','2012-01-05',1,2)
,(435677,'C','2015-01-07',2,1),(435677,'C','2015-01-08',2,2),(435677,'C','2015-01-09',2,3)
,(435677,'C','2015-01-10',2,4),(435677,'C','2015-01-11',2,5),(435677,'R','2019-01-01',3,1)
,(435677,'R','2019-01-02',3,2);
SELECT * FROM #results
DROP table #l
DROP TABLE #results
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
August 23, 2021 at 7:55 pm
WITH CTE1 AS (
SELECT *,
CASE WHEN LAG(t_type) OVER(PARTITION BY t_id ORDER BY t_date) = t_type THEN 0 ELSE 1 END AS IsStart
FROM #l
),
CTE2 AS (
SELECT *,
SUM(IsStart) OVER(PARTITION BY t_id ORDER BY t_date) AS t_Group
FROM CTE1
)
SELECT t_id,t_type,t_date,t_Group,
ROW_NUMBER() OVER(PARTITION BY t_id,t_Group ORDER BY t_date) AS t_no
FROM CTE2;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537August 24, 2021 at 7:12 am
Thanks Mark
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
August 24, 2021 at 12:46 pm
This might be helpful reading.
https://dotnettutorials.net/lesson/over-clause-sql-server/
August 24, 2021 at 1:42 pm
Thanks Homebrew....
it was the "trick" CASE WHEN LAG(t_type) OVER(PARTITION BY t_id ORDER BY t_date) = t_type THEN 0 ELSE 1 END AS IsStart that Mark used was the bit I had forgotten about.
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
August 24, 2021 at 5:50 pm
>> wish to group rows into "sets" based on t_id and t_type. <<
This makes little sense to me. You already have your sets defined by the t_id, So there’s no need to redundantly add some kind of group number. Within the groups, we have no rule given for ordering them. Therefore, I can assume you don’t want any ordering? In fact, since the t_type has duplicates within each t_id, we are stuck. It looks like you might be ordering by the date, but since you didn’t tell us that it’s not part of our specifications.
CREATE TABLE Foobar
(t_id CHAR(6) NOT NULL,
t_type CHAR(1) NOT NULL CHECK(t_type IN (‘R’, ‘C’)),
PRIMARY KEY (t_id, t_type, t_date),
t_date DATE NOT NULL);
Some minor things about your DDL; by definition, not as an option, the table must have a key. What you’ve done is put up a deck of punch cards, written in SQL. . In fact, I love to see the way you used the leading comma at the start of a punch card. We did that back in the 1960s so we can more easily rearrange the deck of punch cards. Ever since about 1970, programmers have simply hit a key and called a pretty printing routine to format their code. Since an identifier has to be on a nominal scale, again, by definition, not an option, it should be a character string. Also, VARCHAR(1) makes no sense.
Finally, why are you materializing the results? We had to do this with punchcards, not with SQL. Also, did you know that the word “group” is a reserved word in SQL?
Remember that one of our goals in RDBMS is to reduce redundancy, not increase it. Another goal is not to do display formatting in the database. That's done in a presentation layer. Just keep it simple.
.
CREATE VIEW Labeled_Foobar
AS
SELECT t_id, t_type, t_date,
ROW_NUMBER() OVER (PARTITION BY t_id ORDER BY t_date) AS group_nbr
FROM Foobar;
Please post DDL and follow ANSI/ISO standards when asking for help.
August 24, 2021 at 6:00 pm
JCelko
Thanks for your reply....but your coded response does not meet my expected results (which I posted)
Other posters have resolved the problem for me.
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
August 25, 2021 at 5:21 pm
Heh... Welcome back again, JLS. As you can see, some things haven't changed. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
August 25, 2021 at 5:35 pm
Jeff.....had to smile when I saw his post <grin>
Wonder if he will follow it up?
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
August 26, 2021 at 5:18 am
Shifting gears back to your good question, I'm curious...
Depending on those answers to those questions, I may have an alternate suggestion for all of this. Of course, some of the questions are for info I'd need to generate a large test table.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 26, 2021 at 1:27 pm
Shifting gears back to your good question, I'm curious...
- What are you going to do with the t_Group and t_no columns once they've been generated?
- How often will you need to generate them?
- How many distinct values of t_ID are there?
- What are the min and max dates for t_date?
- How many distinct t_types are there?
- Do you ever have a t_ID for a given date that contains more than 1 t_type?
- How many dates does each t_ID usually have?
- Are there ever any missing or duplicate dates for a given t_ID?
Depending on those answers to those questions, I may have an alternate suggestion for all of this. Of course, some of the questions are for info I'd need to generate a large test table.
Hi Jeff....my question only really arose because of a conversation with a person I worked with a few years back and they mentioned the report that I had provided for them......I vaguely remembered it but when I saw a similar question elsewhere it piqued my interest and for the life of me I could not remember how I had got the groups....I am retired now , but have recently decided to re-engage with the SQL community to keep the grey cells working...and to be honest I enjoy the problem solving.
So in answering your q's please forgive me if the answers are deliberately vague (confidential) or just vague cos I cant remember !
What are you going to do with the t_Group and t_no columns once they have been generated?
IIRC they were used as the basis for further processes to create a final report.
How often will you need to generate them?
they were run weekly
How many distinct values of t_ID are there?
max c.10K
What are the min and max dates for t_date?
a range as requested by user...........but generally last 12mths
How many distinct t_types are there?
max of 4
Do you ever have a t_ID for a given date that contains more than 1 t_type?
Yes....t-date was a datetime
How many dates does each t_ID usually have?
for a 12mth period maybe 100
Are there ever any missing or duplicate dates for a given t_ID?
date/time per row is ""as and when"" recorded
duplicate dates but datetime is unique
Have fun <grin>
Will be interested in your thoughts......but tis for interest only so if you decide to pass I will understand
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
August 26, 2021 at 2:26 pm
Thanks, JLS... your answers are great... especially the implied answer of "you're all set". I can "play" at my leisure, which I have disturbingly little time to do lately because I'm not yet retired.
As a bit of a sidebar, if you want to keep your mind active, check out the following video where I destroy a myth long held for decades to be the true and, in the process, lay waste to supposed "Best Practice" index maintenance that the whole world has made the serious mistake of also using for decades. It also introduces IndexDNA™, which is a proc and a spreadsheet I wrote to allow you to actually, literally, and graphically see what an index actually looks like.
https://www.youtube.com/watch?v=qfQtY17bPQ4&list=PLr9ab4Dj3ObuaHAUA9JJz-GUbfbwXEyS5&index=4
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply