July 23, 2018 at 8:24 pm
So, I have a table where data is being saved.
My table (CustomerSourceDE) looks something like this:
---------------------------------------
Id (text) NULLABLE
Email (Email) NULLABLE
Customer_FirstName (Text) NULLABLE
Status (Text) NULLABLE
RecordTimestamp (Date) default value - CURRENT DATETIME
---------------------------------------
Records in this table look like this (one customer MAY have multiple rows, which is why no primary key is defined):
---------------------------------------
1 | test1@test.com | Michael | New | Sunday, July 15, 2018 10:08 PM
2 | test2@test.com | Jackson | New | Sunday, July 15, 2018 10:10 PM
3 | test3@test.com | JordanAB | Accepted | Sunday, July 15, 2018 10:15 PM
1 | test1@test.com | Michael | Rejected | Sunday, July 15, 2018 10:25 PM
7 | test7@test.com | RobertMM | Rejected | Sunday, July 15, 2018 10:31 PM
2 | test2@test.com | Jackson | Disabled | Sunday, July 15, 2018 10:38 PM
---------------------------------------
I want DISTINCT records only, and if there are multiple rows with the same Id then I want only the most recent row, based on the RecordTimestamp field.
I have created this query but am not getting the desired result (for some reason, I am getting a single record).
My desired result is:
---------------------------------------
3 | test3@test.com | JordanAB | Accepted | Sunday, July 15, 2018 10:15 PM
1 | test1@test.com | Michael | Rejected | Sunday, July 15, 2018 10:25 PM
7 | test7@test.com | RobertMM | Rejected | Sunday, July 15, 2018 10:31 PM
2 | test2@test.com | Jackson | Disabled | Sunday, July 15, 2018 10:38 PM
---------------------------------------
My SQL query, which is NOT working and which I need help with:
SELECT * FROM CustomerSourceDE
WHERE
Id IN (SELECT DISTINCT Id from CustomerSourceDE where AddRecordTimestamp = (SELECT MAX(RecordTimestamp) FROM CustomerSourceDE as t WHERE Id = t.Id))
July 24, 2018 at 2:00 am
mikewill - Monday, July 23, 2018 8:24 PM
Kindly post DDL with sample data in a consumable format. Looks like this to me:
CREATE TABLE latestrecords
(
Id int,
Email varchar(200),
Customer_FirstName varchar(200),
Status varchar(200),
RecordTimestamp DATETIME
)
INSERT INTO latestrecords VALUES (1,'test1@test.com','Michael','New','July 15, 2018 10:08 PM');
INSERT INTO latestrecords VALUES (2,'test2@test.com','Jackson','New','July 15, 2018 10:10 PM ');
INSERT INTO latestrecords VALUES (2,'test2@test.com','Jackson','Disabled','July 15, 2018 10:38 PM');
INSERT INTO latestrecords VALUES (3,'test3@test.com','JordanAB','Accepted','July 15, 2018 10:15 PM');
INSERT INTO latestrecords VALUES (1,'test1@test.com','Michael','Rejected','July 15, 2018 10:25 PM');
select Id,Email,Customer_FirstName,Status,RecordTimestamp
from
(
select *,
row_number()over(partition by Id,Email,Customer_FirstName order by RecordTimestamp desc) as rnk
from latestrecords
)ranked_records
where rnk=1
Id | Email | Customer_FirstName | Status | RecordTimestamp
1 | test1@test.com | Michael | Rejected | 15/07/2018 22:25:00
2 | test2@test.com | Jackson | Disabled | 15/07/2018 22:38:00
3 | test3@test.com | JordanAB | Accepted | 15/07/2018 22:15:00
3 rows (showing 1 to 3)
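The ROW_NUMBER() pattern above is portable beyond SQL Server. A minimal sketch using Python's bundled sqlite3 module (SQLite 3.25+ ships window functions); the table and sample rows follow the thread, but the timestamps are rewritten in sortable ISO form, which is an assumption not in the original data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE latestrecords (
        Id INTEGER, Email TEXT, Customer_FirstName TEXT,
        Status TEXT, RecordTimestamp TEXT
    )""")
rows = [
    (1, 'test1@test.com', 'Michael',  'New',      '2018-07-15 22:08'),
    (2, 'test2@test.com', 'Jackson',  'New',      '2018-07-15 22:10'),
    (3, 'test3@test.com', 'JordanAB', 'Accepted', '2018-07-15 22:15'),
    (1, 'test1@test.com', 'Michael',  'Rejected', '2018-07-15 22:25'),
    (7, 'test7@test.com', 'RobertMM', 'Rejected', '2018-07-15 22:31'),
    (2, 'test2@test.com', 'Jackson',  'Disabled', '2018-07-15 22:38'),
]
conn.executemany("INSERT INTO latestrecords VALUES (?,?,?,?,?)", rows)

# Number the rows within each Id, newest first, then keep only rank 1:
# that is exactly one (the latest) row per Id.
latest = conn.execute("""
    SELECT Id, Email, Customer_FirstName, Status, RecordTimestamp
    FROM (
        SELECT *, ROW_NUMBER() OVER (
                      PARTITION BY Id
                      ORDER BY RecordTimestamp DESC) AS rnk
        FROM latestrecords
    ) AS ranked
    WHERE rnk = 1
    ORDER BY RecordTimestamp
""").fetchall()
for r in latest:
    print(r)
```

The partition key is just Id here (rather than Id, Email, Customer_FirstName as in the post above); with this data the two give the same answer, since those columns are constant per Id.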
Saravanan
July 24, 2018 at 4:27 am
Another method of getting the result you want is to use CROSS APPLY with TOP(1):
;WITH CTE AS (select distinct Id from CustomerSourceDE)
SELECT T.* FROM CTE
CROSS APPLY(SELECT TOP(1) *
FROM CustomerSourceDE X
WHERE X.Id =CTE.Id
ORDER BY X.RecordTimestamp desc) as T
If you have an index on CustomerSourceDE(Id, RecordTimestamp) this will most likely be the fastest query too.
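SQLite has no CROSS APPLY, but the same per-Id "latest row" lookup can be approximated with a correlated MAX() subquery, which is also roughly what the original question was reaching for. A rough sketch via Python's sqlite3 (columns trimmed for brevity; timestamps rewritten in sortable ISO form, an assumption not in the original data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE CustomerSourceDE
                (Id INTEGER, Email TEXT, Status TEXT, RecordTimestamp TEXT)""")
conn.executemany(
    "INSERT INTO CustomerSourceDE VALUES (?,?,?,?)",
    [(1, 'test1@test.com', 'New',      '2018-07-15 22:08'),
     (2, 'test2@test.com', 'New',      '2018-07-15 22:10'),
     (1, 'test1@test.com', 'Rejected', '2018-07-15 22:25'),
     (2, 'test2@test.com', 'Disabled', '2018-07-15 22:38')])

# Keep each row whose timestamp equals the MAX timestamp for its own Id.
# Note the two aliases (c for the outer row, x for the inner scan):
# correlating a table to itself without distinct aliases, as in the
# original question, compares a row against itself and filters nothing.
latest = conn.execute("""
    SELECT c.Id, c.Email, c.Status, c.RecordTimestamp
    FROM CustomerSourceDE c
    WHERE c.RecordTimestamp = (SELECT MAX(x.RecordTimestamp)
                               FROM CustomerSourceDE x
                               WHERE x.Id = c.Id)
    ORDER BY c.Id
""").fetchall()
print(latest)
```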
July 24, 2018 at 7:31 am
Jonathan AC Roberts - Tuesday, July 24, 2018 4:27 AM
The CROSS APPLY with the TOP(1) requires multiple scans of the table, whereas the ROW_NUMBER method only requires one.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
July 24, 2018 at 8:08 am
drew.allen - Tuesday, July 24, 2018 7:31 AM
The CROSS APPLY with the TOP(1) requires multiple scans of the table, whereas the ROW_NUMBER method only requires one.
Ok, a quick test of the two queries:
set nocount on
set statistics io,time off
GO
DROP TABLE latestrecords;
GO
CREATE TABLE latestrecords
(
Id int,
Email varchar(200),
Customer_FirstName varchar(200),
Status varchar(200),
RecordTimestamp DATETIME
)
GO
INSERT INTO latestrecords VALUES (1,'test1@test.com','Michael','New','July 15, 2018 10:08 PM');
INSERT INTO latestrecords VALUES (2,'test2@test.com','Jackson','New','July 15, 2018 10:10 PM ');
INSERT INTO latestrecords VALUES (2,'test2@test.com','Jackson','Disabled','July 15, 2018 10:38 PM');
INSERT INTO latestrecords VALUES (3,'test3@test.com','JordanAB','Accepted','July 15, 2018 10:15 PM');
INSERT INTO latestrecords VALUES (1,'test1@test.com','Michael','Rejected ','July 15, 2018 10:25 PM');
GO 10000
CREATE INDEX IX_latestrecords_1 ON latestrecords(Id,RecordTimestamp);
select count(*) FROM latestrecords;
set statistics io,time on
PRINT '************* row_number *************'
select Id,Email,Customer_FirstName,Status,RecordTimestamp
from
(
select *,
row_number()over(partition by Id,Email,Customer_FirstName order by RecordTimestamp desc) as rnk
from latestrecords
)ranked_records
where rnk=1
PRINT '************* CTE *************'
;WITH CTE AS (select distinct Id from latestrecords)
SELECT T.* FROM CTE
CROSS APPLY(SELECT TOP(1) *
FROM latestrecords X
WHERE X.Id =CTE.Id
ORDER BY X.RecordTimestamp desc) as T
Results:
************* row_number *************
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'latestrecords'. Scan count 1, logical reads 365, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 110 ms, elapsed time = 133 ms.
************* CTE *************
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
Table 'latestrecords'. Scan count 4, logical reads 174, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 4 ms.
So on my computer it's faster.
The first query has to do a large sort.
July 24, 2018 at 8:33 am
Jonathan AC Roberts - Tuesday, July 24, 2018 8:08 AM
Test, test, and test again. Good to see the differences. Just remember that may not always be the case, which is why I made my initial statement.
July 24, 2018 at 8:48 am
Jonathan AC Roberts - Tuesday, July 24, 2018 8:08 AM
You only tested on four rows (hence the scan count 4). As the number of rows increases, the cost of the CROSS APPLY will increase at roughly O(n log n), whereas the cost of the ROW_NUMBER() method will increase at roughly O(n). So while your method may well "win" at small row counts, it will lose big time on row counts typical of a normal database, in the same way that a human can win a race against a horse if the distance is short enough, but will lose when the distance is longer.
Drew
July 24, 2018 at 9:00 am
drew.allen - Tuesday, July 24, 2018 8:48 AM
I think my method will win when there are a lot of rows with the same Id; it doesn't matter how many distinct Ids there are. When there are fewer than about 4 rows per Id then the row_number method might win. My query only does one table scan to get the distinct Ids. The query it does to get the TOP(1) uses a seek (so it is fast). The only way to tell is to test it on different data.
Here's a script where you can change the total number of rows and the number of rows per Id. (You'll need a tally table with at least 1 million rows; see http://www.sqlservercentral.com/articles/T-SQL/62867/)
set statistics io,time off
DROP TABLE latestrecords ;
GO
CREATE TABLE latestrecords
(
Id int,
Email varchar(200),
Customer_FirstName varchar(200),
Status varchar(200),
RecordTimestamp DATETIME
)
GO
DECLARE @RowsToInsert as int = 1000000
DECLARE @RowsPerId as int = 100
DECLARE @DistinctIds as int =@RowsToInsert /@RowsPerId
INSERT INTO latestrecords
SELECT t.N%@DistinctIds ,CONCAT('test', t.N%@DistinctIds,'@test.com'),CONCAT('Name',t.N%@DistinctIds),'New',DATEADD(minute,t.n,'July 15, 2018 10:08 PM')
from tally t
where t.N <= @RowsToInsert
CREATE INDEX IX_latestrecords_1 ON latestrecords(Id,RecordTimestamp) INCLUDE (Email,Customer_FirstName,Status) ;
SELECT COUNT(*) Count FROM latestrecords
SELECT COUNT(distinct id) [distinct id] FROM latestrecords
IF OBJECT_ID('tempdb..#t1') IS NOT NULL
DROP TABLE #t1
IF OBJECT_ID('tempdb..#t2') IS NOT NULL
DROP TABLE #t2
set statistics io,time on
PRINT '************* row_number *************'
select Id,Email,Customer_FirstName,Status,RecordTimestamp
into #t1
from
(
select *,
row_number()over(partition by Id,Email,Customer_FirstName order by RecordTimestamp desc) as rnk
from latestrecords
)ranked_records
where rnk=1
PRINT '************* CTE *************'
;WITH CTE AS (select distinct Id from latestrecords)
SELECT T.*
into #t2
FROM CTE
CROSS APPLY(SELECT TOP(1) *
FROM latestrecords X
WHERE X.Id =CTE.Id
ORDER BY X.RecordTimestamp desc) as T
set statistics io,time off
With the values I've put in (1 million rows total with 10,000 different Ids), I get the following results on my machine:
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 511 ms, elapsed time = 511 ms.
Table 'latestrecords'. Scan count 5, logical reads 7017, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1607 ms, elapsed time = 750 ms.
(10000 rows affected)
************* CTE *************
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
Table 'latestrecords'. Scan count 10005, logical reads 39809, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 187 ms, elapsed time = 35 ms.
(10000 rows affected)
July 24, 2018 at 9:23 am
Jonathan AC Roberts - Tuesday, July 24, 2018 9:00 AM
Do a million row test. That will demonstrate what you are looking for here.
July 24, 2018 at 9:45 am
Jonathan AC Roberts - Tuesday, July 24, 2018 9:00 AM
Your method will win when there are few distinct Ids, which is an unrealistic expectation for a production database. What kills your query is the nested loops. The more distinct ids you have, the more times the nested loop is called. Instead of testing only 3 distinct ids, try 100 or 1000 or 10,000.
Drew
July 24, 2018 at 9:52 am
drew.allen - Tuesday, July 24, 2018 9:45 AM
I've edited my post above to do a 1 million row test with 10,000 different Ids.
July 24, 2018 at 10:50 am
The imperfect index definition is a drag on the first query, requiring a sort of all rows. Correcting the index definition:
DROP INDEX IX_latestrecords_1 ON latestrecords;
CREATE UNIQUE INDEX IX_latestrecords_1 ON latestrecords(Id,RecordTimestamp DESC) INCLUDE (Email,Customer_FirstName,Status) ;
eliminates the sort, making the original queries much closer in CPU time/usage, although it does seem that the Segment operator takes some additional CPU. Still, I'm happy to trade that for the drastically reduced I/O of the first query, since typically I/O is the critical performance issue, not CPU. [For example, on my main prod instances, I've got I/O latency on many assorted files, with CPU typically between 8% and 16% and rarely exceeding 25% (except for backups, which seem to push it around 50%).]
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
July 24, 2018 at 11:02 am
ScottPletcher - Tuesday, July 24, 2018 10:50 AM
I don't think creating an index with that column desc would help performance, at least not much anyway. The main reason you would want to put desc on an index is if rows are being inserted in that order; that way the index would get less fragmented. As the dates would be inserted in ascending order, I would still recommend having the index with RecordTimestamp ASC.
What actually does improve performance is to leave off the INCLUDE columns, as it makes the index smaller.
July 24, 2018 at 11:17 am
Jonathan AC Roberts - Tuesday, July 24, 2018 11:02 AM
I don't think creating an index with that column desc would help performance, at least not much anyway.
I've already stated why in fact it does: it eliminates the need for a sort to satisfy the ROW_NUMBER().
The main reason you would want to put desc on an index is if rows are being inserted in that order. That way the index would get less fragmented. As the dates would be inserted in ascending order I would still recommend having the index with RecordTimestamp ASC.
Generally, for most index uses, yes. But, since by definition you're inserting rows all over the place in this table anyway, you'll get almost the same fragmentation either way, might as well save lots of resources on the SELECTs.
What actually does improve performance is to leave off the INCLUDE columns as it makes the index smaller.
Not here, as it will force SQL to revert to a full table scan. Without the INCLUDE columns, there's no point in having this index at all.
Typically you should cluster such a table by RecordTimestamp first anyway, so you'd either need a covering index or just accept a sort for this query. If this query were reasonably rare, I'd just allow SQL to sort, and hopefully be able to limit the RecordTimestamp to a reasonable time range.
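The covering-index point can be made concrete even in SQLite, which has no INCLUDE clause, so the "included" columns simply become trailing key columns; EXPLAIN QUERY PLAN then reports whether the query is satisfied from the index alone. A sketch via Python's sqlite3 (index and column names here are illustrative, not from any particular product's syntax):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE latestrecords
                (Id INTEGER, Email TEXT, Status TEXT, RecordTimestamp TEXT)""")
# No INCLUDE in SQLite: the extra columns go at the end of the key,
# which still lets the index cover the query below.
conn.execute("""CREATE INDEX IX_latestrecords_1
                ON latestrecords(Id, RecordTimestamp DESC, Email, Status)""")
conn.execute("INSERT INTO latestrecords VALUES "
             "(1, 'test1@test.com', 'New', '2018-07-15 22:08')")

# The ORDER BY matches the index order, and every selected column is in
# the index, so no sort and no table lookup should be needed.
plan = conn.execute("""EXPLAIN QUERY PLAN
    SELECT Id, Email, Status, RecordTimestamp
    FROM latestrecords
    ORDER BY Id, RecordTimestamp DESC""").fetchall()
for row in plan:
    print(row)
```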