Is there a way to show the next date that is greater but not have them duplicate like the highlighted below? I am using sql server and doing a select from the columns below where the service date is greater than da discharge date.
This is what I want it to show. It has the highlighted removed. I've tried connect by and lag but can't seem to get it.
You could do with reading this: How to post code problems
DROP TABLE IF EXISTS dbo.PATIENTS;
CREATE TABLE dbo.PATIENTS
(
FIRST_NAME varchar(50),
LAST_NAME varchar(50),
DA_DISCHARGE_DATE date,
SERVICE_DATE date,
SVCCOD varchar(10)
);
INSERT INTO dbo.PATIENTS (FIRST_NAME, LAST_NAME, DA_DISCHARGE_DATE, SERVICE_DATE, SVCCOD)
VALUES
('Bill', 'Smith', '2023-01-22', '2023-01-23', 'H2036'),
('Bill', 'Smith', '2023-01-22', '2023-01-24', 'H2036'),
('Bill', 'Smith', '2023-01-22', '2023-01-25', 'H2036'),
('Bill', 'Smith', '2023-01-22', '2023-01-26', 'H2036'),
('Bill', 'Smith', '2023-01-22', '2023-01-27', 'H2036'),
('Bill', 'Smith', '2023-01-22', '2023-01-28', 'H2036'),
('Bill', 'Smith', '2023-01-22', '2023-01-29', 'H2036'),
('Bill', 'Smith', '2023-01-22', '2023-01-30', 'H2036'),
('Bill', 'Smith', '2023-01-22', '2023-01-31', 'H2036'),
('Bill', 'Smith', '2023-01-22', '2023-02-20', 'H2013'),
('Bill', 'Smith', '2023-01-22', '2023-02-27', 'H2036'),
('Bill', 'Smith', '2023-01-22', '2023-03-15', 'H2034'),
('Bill', 'Smith', '2023-01-22', '2023-04-01', 'H2034'),
('Bill', 'Smith', '2023-02-09', '2023-02-20', 'H2013'),
('Bill', 'Smith', '2023-02-09', '2023-02-27', 'H2036'),
('Bill', 'Smith', '2023-02-09', '2023-03-15', 'H2034'),
('Bill', 'Smith', '2023-02-09', '2023-04-01', 'H2034')
;
And now the SQL you need:
SELECT p.FIRST_NAME,
p.LAST_NAME,
p.DA_DISCHARGE_DATE,
p.SERVICE_DATE,
p.SVCCOD
FROM dbo.PATIENTS p
WHERE NOT EXISTS(SELECT 1
FROM dbo.PATIENTS p2
WHERE p2.FIRST_NAME = p.FIRST_NAME
AND p2.LAST_NAME = p.LAST_NAME
AND p2.SERVICE_DATE = p.SERVICE_DATE
AND p2.SVCCOD = p.SVCCOD
AND p2.DA_DISCHARGE_DATE > p.DA_DISCHARGE_DATE)
;
July 9, 2023 at 10:25 am
This was removed by the editor as SPAM
July 13, 2023 at 4:20 pm
Thank you.
July 14, 2023 at 12:39 am
Just to add my 2 cents 2 late, for the given test data, the WHERE EXISTS code does 18 page reads. The following does 1.
WITH cteDupeCheck AS
(
SELECT FIRST_NAME,LAST_NAME,DA_DISCHARGE_DATE,SERVICE_DATE,SVCCOD
,DupeCheck = ROW_NUMBER() OVER (PARTITION BY FIRST_NAME,LAST_NAME,SERVICE_DATE,SVCCOD ORDER BY DA_DISCHARGE_DATE DESC)
FROM dbo.PATIENTS
)
SELECT FIRST_NAME,LAST_NAME,DA_DISCHARGE_DATE,SERVICE_DATE,SVCCOD
FROM cteDupeCheck
WHERE DupeCheck = 1
ORDER BY FIRST_NAME,LAST_NAME,DA_DISCHARGE_DATE,SERVICE_DATE,SVCCOD
;
--Jeff Moden
Change is inevitable... Change for the better is not.
July 14, 2023 at 12:58 am
Set up test data
set statistics io, time off
DROP TABLE IF EXISTS dbo.PATIENTS;
CREATE TABLE dbo.PATIENTS
(
FIRST_NAME varchar(50),
LAST_NAME varchar(50),
DA_DISCHARGE_DATE date,
SERVICE_DATE date,
SVCCOD varchar(10)
);
INSERT INTO dbo.PATIENTS (FIRST_NAME, LAST_NAME, DA_DISCHARGE_DATE, SERVICE_DATE, SVCCOD)
VALUES
('Bill', 'Smith', '2023-01-22', '2023-01-23', 'H2036'),
('Bill', 'Smith', '2023-01-22', '2023-01-24', 'H2036'),
('Bill', 'Smith', '2023-01-22', '2023-01-25', 'H2036'),
('Bill', 'Smith', '2023-01-22', '2023-01-26', 'H2036'),
('Bill', 'Smith', '2023-01-22', '2023-01-27', 'H2036'),
('Bill', 'Smith', '2023-01-22', '2023-01-28', 'H2036'),
('Bill', 'Smith', '2023-01-22', '2023-01-29', 'H2036'),
('Bill', 'Smith', '2023-01-22', '2023-01-30', 'H2036'),
('Bill', 'Smith', '2023-01-22', '2023-01-31', 'H2036'),
('Bill', 'Smith', '2023-01-22', '2023-02-20', 'H2013'),
('Bill', 'Smith', '2023-01-22', '2023-02-27', 'H2036'),
('Bill', 'Smith', '2023-01-22', '2023-03-15', 'H2034'),
('Bill', 'Smith', '2023-01-22', '2023-04-01', 'H2034'),
('Bill', 'Smith', '2023-02-09', '2023-02-20', 'H2013'),
('Bill', 'Smith', '2023-02-09', '2023-02-27', 'H2036'),
('Bill', 'Smith', '2023-02-09', '2023-03-15', 'H2034'),
('Bill', 'Smith', '2023-02-09', '2023-04-01', 'H2034')
;
INSERT INTO dbo.PATIENTS
SELECT FIRST_NAME + CONVERT(varchar, g.value), LAST_NAME, DA_DISCHARGE_DATE, SERVICE_DATE, SVCCOD
FROM dbo.PATIENTS p
CROSS JOIN generate_series(0,20000,1) g
WHERE p.FIRST_NAME = 'BILL'
go
Run test
drop table if exists #t1
drop table if exists #t2
go
set statistics io, time on
go
SELECT p.FIRST_NAME,
p.LAST_NAME,
p.DA_DISCHARGE_DATE,
p.SERVICE_DATE,
p.SVCCOD
into #t1
FROM dbo.PATIENTS p
WHERE NOT EXISTS(SELECT 1
FROM dbo.PATIENTS p2
WHERE p2.FIRST_NAME = p.FIRST_NAME
AND p2.LAST_NAME = p.LAST_NAME
AND p2.SERVICE_DATE = p.SERVICE_DATE
AND p2.SVCCOD = p.SVCCOD
AND p2.DA_DISCHARGE_DATE > p.DA_DISCHARGE_DATE)
;
go
WITH cteDupeCheck AS
(
SELECT FIRST_NAME,
LAST_NAME,
DA_DISCHARGE_DATE,
SERVICE_DATE,
SVCCOD,
DupeCheck = ROW_NUMBER() OVER (PARTITION BY FIRST_NAME,LAST_NAME,SERVICE_DATE,SVCCOD ORDER BY DA_DISCHARGE_DATE DESC)
FROM dbo.PATIENTS
)
SELECT FIRST_NAME,
LAST_NAME,
DA_DISCHARGE_DATE,
SERVICE_DATE,
SVCCOD
into #t2
FROM cteDupeCheck
WHERE DupeCheck = 1
ORDER BY FIRST_NAME,LAST_NAME,DA_DISCHARGE_DATE,SERVICE_DATE,SVCCOD
;
Although the second statement has fewer scans and logical reads takes twice as long to execute and uses double the CPU (on my machine)
Table 'PATIENTS'. Scan count 10, logical reads 3494, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
SQL Server Execution Times:
CPU time = 579 ms, elapsed time = 165 ms.
(260026 rows affected)
Table 'PATIENTS'. Scan count 5, logical reads 1747, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1139 ms, elapsed time = 324 ms.
(260026 rows affected)
July 14, 2023 at 1:41 am
Wow! Those times are about twice as fast as what I'm getting. What kind of machine are you working on?
I'm on an Alienware R17 (about 4 years old now) with 6 hyper-threaded I7s, 32GB of RAM, and NVME SSDs running at SQL Server 2022 Dev at 4GHz.
Anyway, I got the same reads and same ratios for CPU. Thanks for scaling this pitwooffie out a bit.
It a shame to be left this way... either use twice the memory I/O or twice the CPU. I will admit that if memory is plentiful, then I'd likely take your option.
As a bit of a sidebar, there's been some posts out on linked in where people are suggesting that you don't need to test for scalability. This is additional proof that they're wrong.
Thanks for the effort, Jonathan.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 14, 2023 at 2:54 pm
I've got a (8 years old) Dell XPS 8700 desktop with a i7-4790 and 28GB RAM, and a SanDisk Ultra II 960GB SSD with the tempdb on an even older Fusion-io 160GB card. Disabling hyperthreading is a step I took that resulted in improved performance.
Amazing. Thanks, Jonathan. I'm going to have to try disabling hyperthreading and see what happens.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 14, 2023 at 3:20 pm
The timings on my laptop
-- Jonathan
Table 'PATIENTS'. Scan count 18, logical reads 3494, physical reads 0, page server reads 0, read-ahead reads 1754.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1012 ms, elapsed time = 474 ms.
-- Jeff
Table 'PATIENTS'. Scan count 9, logical reads 1747, physical reads 0, page server reads 0, read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2954 ms, elapsed time = 763 ms.
-- swePeso
Table 'PATIENTS'. Scan count 1, logical reads 1747, physical reads 0, page server reads 0, read-ahead reads 0.
SQL Server Execution Times:
CPU time = 296 ms, elapsed time = 427 ms.
N 56°04'39.16"
E 12°55'05.25"
July 14, 2023 at 3:45 pm
I just realized that both Jonathans and Jeffs queries used all 8 cores and mine just used 1 core.
So here is the same query with parallell plan
SELECT FIRST_NAME,
LAST_NAME,
MAX(DA_DISCHARGE_DATE) AS DA_DISCHARGE_DATE,
SERVICE_DATE,
SVCCOD
FROM dbo.PATIENTS
GROUP BY FIRST_NAME,
LAST_NAME,
SERVICE_DATE,
SVCCOD
OPTION (QUERYTRACEON 8649);
And the statistics look like this
-- Jonathan
Table 'PATIENTS'. Scan count 18, logical reads 3494, physical reads 0, page server reads 0, read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1169 ms, elapsed time = 475 ms.
-- Jeff
Table 'PATIENTS'. Scan count 9, logical reads 1747, physical reads 0, page server reads 0, read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0.
SQL Server Execution Times:
CPU time = 3236 ms, elapsed time = 875 ms.
-- swePeso
Table 'PATIENTS'. Scan count 9, logical reads 1747, physical reads 0, page server reads 0, read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0.
SQL Server Execution Times:
CPU time = 969 ms, elapsed time = 326 ms.
N 56°04'39.16"
E 12°55'05.25"
July 14, 2023 at 4:07 pm
drop table if exists #t1
drop table if exists #t2
drop table if exists #t3
go
set statistics io, time on
go
SELECT p.FIRST_NAME,
p.LAST_NAME,
p.DA_DISCHARGE_DATE,
p.SERVICE_DATE,
p.SVCCOD
into #t1
FROM dbo.PATIENTS p
WHERE NOT EXISTS(SELECT 1
FROM dbo.PATIENTS p2
WHERE p2.FIRST_NAME = p.FIRST_NAME
AND p2.LAST_NAME = p.LAST_NAME
AND p2.SERVICE_DATE = p.SERVICE_DATE
AND p2.SVCCOD = p.SVCCOD
AND p2.DA_DISCHARGE_DATE > p.DA_DISCHARGE_DATE)
;
go
WITH cteDupeCheck AS
(
SELECT FIRST_NAME,
LAST_NAME,
DA_DISCHARGE_DATE,
SERVICE_DATE,
SVCCOD,
DupeCheck = ROW_NUMBER() OVER (PARTITION BY FIRST_NAME,LAST_NAME,SERVICE_DATE,SVCCOD ORDER BY DA_DISCHARGE_DATE DESC)
FROM dbo.PATIENTS
)
SELECT FIRST_NAME,
LAST_NAME,
DA_DISCHARGE_DATE,
SERVICE_DATE,
SVCCOD
into #t2
FROM cteDupeCheck
WHERE DupeCheck = 1
ORDER BY FIRST_NAME,LAST_NAME,DA_DISCHARGE_DATE,SERVICE_DATE,SVCCOD
;
GO
SELECT FIRST_NAME,
LAST_NAME,
MAX(DA_DISCHARGE_DATE) AS DA_DISCHARGE_DATE,
SERVICE_DATE,
SVCCOD
into #t3
FROM dbo.PATIENTS
GROUP BY FIRST_NAME,
LAST_NAME,
SERVICE_DATE,
SVCCOD
;
I tried them with hyperthreading switched on:
-- Jonathan's
Table 'PATIENTS'. Scan count 10, logical reads 3492, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
SQL Server Execution Times:
CPU time = 749 ms, elapsed time = 193 ms.
-- Jeff's
Table 'PATIENTS'. Scan count 5, logical reads 1746, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1471 ms, elapsed time = 411 ms.
-- SwePeso's
Table 'PATIENTS'. Scan count 5, logical reads 1746, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
SQL Server Execution Times:
CPU time = 437 ms, elapsed time = 126 ms.
and with hyperthreading switched off:
-- Jonathan's
Table 'PATIENTS'. Scan count 10, logical reads 3494, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
SQL Server Execution Times:
CPU time = 592 ms, elapsed time = 178 ms.
-- Jeff's
Table 'PATIENTS'. Scan count 5, logical reads 1747, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1220 ms, elapsed time = 336 ms.
-- SwePeso's
Table 'PATIENTS'. Scan count 5, logical reads 1747, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
SQL Server Execution Times:
CPU time = 311 ms, elapsed time = 108 ms.
So a minor improvement for all queries with hyperthreading switched off for this query.
I then tried it on my work laptop Dell Latitude 5510 i7-10610U with 32GB RAM:
-- Jonathan's
Table 'PATIENTS'. Scan count 18, logical reads 4672, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1265 ms, elapsed time = 206 ms.
-- Jeff's
Table 'PATIENTS'. Scan count 9, logical reads 2336, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1015 ms, elapsed time = 165 ms.
-- SwePeso's
Table 'PATIENTS'. Scan count 9, logical reads 2336, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
SQL Server Execution Times:
CPU time = 580 ms, elapsed time = 115 ms.
So the performance has switched when I use my worktop laptop so Jeff's query is faster than my query.
July 16, 2023 at 3:02 pm
What's more interesting than any of that is how we missed the forest for the trees. 🙁 The MAX solution that Peter provided is all that's needed here. [headdesk].
--Jeff Moden
Change is inevitable... Change for the better is not.
July 16, 2023 at 3:51 pm
What's more interesting than any of that is how we missed the forest for the trees. 🙁 The MAX solution that Peter provided is all that's needed here. [headdesk].
Yes, we both should have noticed that, but our methods are more general in that they will return the entire row that contains the max date even if some of the other columns were not duplicated.
btw did you try disabling hyperthreading?
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply