April 30, 2015 at 5:43 am
Afternoon ladies and gents,
I'm trying to find an alternative, more efficient way of processing some code to select the top row of a specific order.
At the moment we use row_number with a partition. Then outside the statement in the where clause state where [rownum] =1.
Kinda like this:
Select *
From
(
Select ColA,ColB,ColC,
row_number() OVER ( PARTITION BY [ColA] ,[ColB] ORDER BY [ColA] ASC, Col[C]DESC) as rownum
)A
Where rownum =1
This seems like a really horrible way to do things, especially with relatively large datasets of 500k+ so I'm looking into a new solution.
Although this seems the simplest, is it the most efficient?
April 30, 2015 at 6:01 am
Hi, welcome to the forum.
You've not given us a massive amount to go on there. If you can post some sample data it will give us a much better idea of what you mean.
At a rough guess, have you considered SELECT TOP 1 * FROM YOURTABLE ORDER BY [your order criteria]
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
April 30, 2015 at 6:21 am
Without anything else to go on, I would pick the TOP 1 as well. If you post the DDL for the table and include any indexes you have already, we can probably give better advice.
April 30, 2015 at 6:37 am
Have you tried a bog-standard aggregate?
SELECT ColA, ColB, ColC = MAX(ColC)
FROM …
GROUP BY ColA, ColB
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
April 30, 2015 at 7:51 am
Unfortunately I cannot post the table or data due to confidentiality reasons, however an extreme simplification of the problem:
CREATE TABLE [dbo].[Team]
(
[FirstName] [varchar](20) NULL,
[LastName] [varchar] (20) NULL,
[Team] Varchar (20) NULL,
[TransactionID] int,
Price float,
NoofTickets int
) ON [PRIMARY]
INSERT INTO [dbo].[Team]
([FirstName]
,[LastName]
,[Team]
,[TransactionID]
,[Price]
, NoofTickets)
Values
('John', 'Smith', 'Team A', 65, 69.99,2),
('John', 'Smith', 'Team A', 65, NULL,NULL),
('Davie', 'Jones', 'Team A', 89, £349.99,5),
('Davie', 'Jones', 'Team A', 89, NULL,NULL),
('Davie', 'Jones', 'Team A', 89, NULL,NULL),
('Davie', 'Jones', 'Team A', 89, NULL,NULL),
('Davie', 'Jones', 'Team A', 89, NULL,NULL),
('Jack','Wilson','Team B',55,NULL,NULL),
('Jack','Wilson','Team B',55,NULL,NULL),
('Jack','Wilson','Team B',55,£299.99,5),
('Jack','Wilson','Team B',55,NULL,NULL),
('Jack','Wilson','Team B',55,NULL,NULL)
And the query:
Select *
FROM
(
SELECT [FirstName]
,[LastName]
,[Team]
,[TransactionID]
,[Price],
Row_Number() over (partition by firstname, Transactionid order by transactionid,price desc) rownum
FROM [Test].[dbo].[Team]
)A
Where rownum = 1
Unfortunately aggregations wont work as the maximum value wont always be the value returned (I know in the really bad example that this is the case and I can put a where is not null, but its the rownumber part to focus on).
So in a nutshell, row_number() is being used to de-duplicate the data.
The solution we have works fine, me being me, i'm just want to know if there's a better way of doing things
April 30, 2015 at 8:40 am
I'm not sure if it's a better solution but it's an alternative that you could test.
SELECTDISTINCT
[FirstName]
,[LastName]
,[Team]
,[TransactionID]
,FIRST_VALUE([Price]) over (partition by firstname, Transactionid order by price desc) Price
FROM [Test].[dbo].[Team]
April 30, 2015 at 8:51 am
DROP TABLE #Team
CREATE TABLE #Team
(
[FirstName] [varchar](20) NULL,
[LastName] [varchar] (20) NULL,
[Team] Varchar (20) NULL,
[TransactionID] int,
Price float,
NoofTickets int
) ON [PRIMARY]
INSERT INTO #Team
([FirstName]
,[LastName]
,[Team]
,[TransactionID]
,[Price]
, NoofTickets)
VALUES
('John', 'Smith', 'Team A', 65, 69.99,2),
('John', 'Smith', 'Team A', 65, NULL,NULL),
('Davie', 'Jones', 'Team A', 89, £349.99,5),
('Davie', 'Jones', 'Team A', 89, NULL,NULL),
('Davie', 'Jones', 'Team A', 89, NULL,NULL),
('Davie', 'Jones', 'Team A', 89, NULL,NULL),
('Davie', 'Jones', 'Team A', 89, NULL,NULL),
('Jack','Wilson','Team B',55,NULL,NULL),
('Jack','Wilson','Team B',55,NULL,NULL),
('Jack','Wilson','Team B',55,£299.99,5),
('Jack','Wilson','Team B',55,NULL,NULL),
('Jack','Wilson','Team B',55,NULL,NULL)
-- This type of query (Current method) is critically dependant upon correct indexing:
CREATE INDEX ix_helper on #Team (firstname, Transactionid, price DESC) INCLUDE ([LastName],[Team])
----------------------------------------------------------------------------------------------------------
--------------------------------------------------------------
-- Current method
SELECT
[FirstName]
,[LastName]
,[Team]
,[TransactionID]
,[Price]
FROM
(
SELECT [FirstName]
,[LastName]
,[Team]
,[TransactionID]
,[Price],
Row_Number() over (partition by firstname, Transactionid order by price desc) rownum
FROM #Team
)A
WHERE rownum = 1
ORDER BY firstname
-- One alternative method
SELECT
d.firstname,
x.LastName,
x.Team,
d.Transactionid,
d.price
FROM (
SELECT firstname, Transactionid, price = MAX(price)
FROM #Team
GROUP BY firstname, Transactionid
) d
CROSS APPLY (
SELECT TOP 1 [LastName],[Team]
FROM #Team t
WHERE t.FirstName = d.FirstName
AND t.TransactionID = d.TransactionID
) x
ORDER BY d.firstname
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
April 30, 2015 at 9:39 am
w.zia.2015 (4/30/2015)
Afternoon ladies and gents,I'm trying to find an alternative, more efficient way of processing some code to select the top row of a specific order.
At the moment we use row_number with a partition. Then outside the statement in the where clause state where [rownum] =1.
Kinda like this:
Select *
From
(
Select ColA,ColB,ColC,
row_number() OVER ( PARTITION BY [ColA] ,[ColB] ORDER BY [ColA] ASC, Col[C]DESC) as rownum
)A
Where rownum =1
This seems like a really horrible way to do things, especially with relatively large datasets of 500k+ so I'm looking into a new solution.
Although this seems the simplest, is it the most efficient?
I don't see anything wrong with this provided that you have the right index in place. Take a look at this article:
SQL Server 2012: How to Write T-SQL Window Functions, Part 3. In there Ben-Gan discusses the POC Index. If you have the right POC index in place your query should be just fine. Even, if you don't use Windows functions to solve your problem a POC index will still come in handy.
The index that Chris creates in his solution is a great example of a POC index. It improves the performance in both example queries that he provided by eliminating a very costly sort operator.
Ben-Gan talks about this concept and provides many more examples in his book, Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions
-- Itzik Ben-Gan 2001
April 30, 2015 at 9:43 am
Alan.B (4/30/2015)
Ben-Gan's talks about this concept and provides many more examples in his book, Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions
If you don't have this book, I'd highly recommend it.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply