October 24, 2012 at 10:59 am
The Dixie Flatline (10/23/2012)
p.s. I'm pretty sure that the presence of any NULL dates are going to throw a wrench into the gears.Agreed. One fix would be to add COALESCE or ISNULL expressions to work with the date parameters in order to force NULLS to high dates when calculating minimums, and low when calculating maximums.
P.S. Thanks for running the time trials, Luis. Would you run it again on your machine using Jeff's function? I wouldn't expect any overhead from CROSS APPLYing an itvf, but it would still be nice to see the final results. 🙂
OK, I've run it again and added NULLs to the test. As expected, my query that was later transformed into a function had to be fixed to show the correct results. With that fix, there was an important change in the time results. However, I wasn't sure on the best way to fix Dixie's V2 so I didn't include it in the test.
I'm attaching the functions scripts.
The results on miliseconds for the million row test is here.
Dixie's V1 and Eugenes are basically the same but they differ on how they construct the table.
--Average after 50 runs
Dixie_V1 Luis Eugene
----------- ----------- -----------
570 638 572
--Detail of the 50 runs
Dixie_V1 Luis Eugene
----------- ----------- -----------
573 647 590
571 647 581
561 643 559
559 643 599
558 634 563
563 638 559
554 631 561
562 631 567
582 631 559
556 630 566
563 633 557
558 629 559
576 630 561
560 634 575
562 634 567
559 636 588
565 656 565
554 637 562
567 634 580
566 630 561
565 654 618
574 658 588
619 649 626
603 651 569
577 637 569
605 646 588
567 636 560
580 642 559
600 659 618
637 642 582
568 637 573
571 640 570
565 637 564
568 660 566
561 631 563
565 632 562
567 633 554
565 634 617
566 632 587
562 631 565
567 630 568
564 637 571
586 630 565
563 642 561
563 631 566
571 636 561
570 634 573
571 633 564
560 630 560
564 634 567
October 24, 2012 at 12:37 pm
Another possible solution:
USE ProofOfConcept;
SET NOCOUNT ON;
IF OBJECT_ID(N'tempdb..#T') IS NOT NULL
DROP TABLE #T;
CREATE TABLE #T
(ID INT IDENTITY
PRIMARY KEY,
Col1 DATE,
Col2 DATE,
Col3 DATE,
Col4 DATE);
WITH Seeds(Seed)
AS (SELECT *
FROM ( VALUES ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1),
( 1), ( 1), ( 1) ) AS TVC (Seed)),
Numbers(Number)
AS (SELECT ROW_NUMBER() OVER (ORDER BY S1.Seed)
FROM Seeds AS S1
CROSS JOIN Seeds AS S2
CROSS JOIN Seeds AS S3
CROSS JOIN Seeds AS S4
CROSS JOIN Seeds AS S5
CROSS JOIN Seeds AS S6)
INSERT INTO #T
(Col1,
Col2,
Col3,
Col4)
SELECT DATEADD(DAY, CHECKSUM(NEWID()) % 10000, 0),
DATEADD(DAY, CHECKSUM(NEWID()) % 10000, 0),
DATEADD(DAY, CHECKSUM(NEWID()) % 10000, 0),
DATEADD(DAY, CHECKSUM(NEWID()) % 10000, 0)
FROM Numbers;
WITH CTE
AS (SELECT TOP 10000
*
FROM #T
ORDER BY NEWID())
UPDATE CTE
SET Col1 = NULL;
WITH CTE
AS (SELECT TOP 10000
*
FROM #T
ORDER BY NEWID())
UPDATE CTE
SET Col2 = NULL;
WITH CTE
AS (SELECT TOP 10000
*
FROM #T
ORDER BY NEWID())
UPDATE CTE
SET Col3 = NULL;
WITH CTE
AS (SELECT TOP 10000
*
FROM #T
ORDER BY NEWID())
UPDATE CTE
SET Col4 = NULL;
GO
IF OBJECT_ID(N'tempdb..#T2') IS NOT NULL
DROP TABLE #T2;
IF OBJECT_ID(N'tempdb..#T3') IS NOT NULL
DROP TABLE #T3;
DECLARE @Start DATETIME = GETDATE();
SELECT ID,
MAX([Date]) AS MaxDate
INTO #T2
FROM (SELECT *
FROM #T UNPIVOT ( [Date] FOR Col IN (Col1, Col2, Col3, Col4) ) AS unpvt)
AS Src
GROUP BY ID;
SELECT DATEDIFF(millisecond, @Start, GETDATE());
SET @Start = GETDATE();
SELECT ID,
MaxDate
INTO #T3
FROM #T
CROSS APPLY (SELECT MaxDate
FROM dbo.FindMinMaxDate4(Col1, Col2, Col3, Col4)) AS MinMax;
SELECT DATEDIFF(millisecond, @Start, GETDATE());
GO 5
(Includes full test harness and test data.)
UDF modified as follows:
USE [ProofOfConcept]
GO
/****** Object: UserDefinedFunction [dbo].[FindMinMaxDate4] Script Date: 10/24/2012 14:24:13 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[FindMinMaxDate4]
(@pDateTime1 DATETIME,
@pDateTime2 DATETIME,
@pDateTime3 DATETIME,
@pDateTime4 DATETIME)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
SELECT MinDate = CASE WHEN @pDateTime1 <= COALESCE(@pDateTime2, @pDateTime3, @pDateTime4, @pDateTime1)
AND @pDateTime1 <= COALESCE(@pDateTime3, @pDateTime4, @pDateTime1)
AND @pDateTime1 <= COALESCE(@pDateTime4, @pDateTime1)
THEN @pDateTime1
WHEN @pDateTime2 <= COALESCE(@pDateTime3, @pDateTime4, @pDateTime2)
AND @pDateTime2 <= COALESCE(@pDateTime4, @pDateTime2)
THEN @pDateTime2
WHEN @pDateTime3 <= COALESCE(@pDateTime4, @pDateTime3) THEN @pDateTime3
ELSE @pDateTime4
END,
MaxDate = CASE WHEN @pDateTime1 >= COALESCE(@pDateTime2, @pDateTime3, @pDateTime4, @pDateTime1)
AND @pDateTime1 >= COALESCE(@pDateTime3, @pDateTime4, @pDateTime1)
AND @pDateTime1 >= COALESCE(@pDateTime4, @pDateTime1)
THEN @pDateTime1
WHEN @pDateTime2 >= COALESCE(@pDateTime3, @pDateTime4, @pDateTime2)
AND @pDateTime2 >= COALESCE(@pDateTime4, @pDateTime2)
THEN @pDateTime2
WHEN @pDateTime3 >= COALESCE(@pDateTime4, @pDateTime3) THEN @pDateTime3
ELSE @pDateTime4
END;
GO
Results on my computer (high-end workstation):
Unpivot vs UDF
993 vs 1326
1006 vs 1016
946 vs 910
1006 vs 1020
1040 vs 1043
The UDF was slower 3 out of 5, but marginally so.
When I added (to the bottom of the test, before "GO 5"):
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
Unpivot vs UDF
910 vs 990
980 vs 1093
983 vs 1100
953 vs 1080
986 vs 1120
So, if you can't count on the data being in the buffer already, then the Unpivot option is faster 5 for 5. Again, that's a fraction of a second for 1-million rows of data, which is marginal at best.
All my tests were run on a quad-core CPU with 16 Gig RAM, @@VERSION = Microsoft SQL Server 2008 R2 (SP1) - 10.50.2550.0 (X64) Jun 11 2012 16:41:53 Copyright (c) Microsoft Corporation Developer Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1).
I used this to determine whether both were returning the same results:
SELECT COUNT(*)
FROM #T2 AS T2
FULL OUTER JOIN #T3 AS T3
ON T2.ID = T3.ID
WHERE T2.ID IS NULL
OR T3.ID IS NULL
OR T2.MaxDate != T3.MaxDate;
Count = 0
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
October 24, 2012 at 2:16 pm
Thanks for all the replies. In case it make a doifference heres the tables DDL . Its a very simple table.
The thing is a queue of reports called requests. There are 4 date/Time fields indicating the date/Time that the request first entered the queue, when it is scheduled to be processed when it actually started and when its done being processed. Sometimes the Scheduled column is equal to EnteredQueue while at other times its later then Enteredqueue and in some cases its before. I don’t know why it sometimes does this only that it does.
When the Scheduled date/Time is earlier then the EnteredQueue date/time it messes up my calculations I run on the queue that determine how long on average a request waited in the queue and how long it took to process. I was looking for an easy way to return the lowest or highest value of 2 or more columns in the same row. I can Case it out but I try to avoid Case when possible just out of habit.
Thanks again
CREATE TABLE dbo.LE_QUEUE
(
hID NUMERIC(18,0) IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
RequestedBy VARCHAR(50) NOT NULL,
Status INT NOT NULL,
EnteredQueue DATETIME NOT NULL,
Scheduled DATETIME NOT NULL,
RequestStart DATETIME NULL,
RequestEnd DATETIME NULL,
hRequestDetail NUMERIC(18,0) NOT NULL
CONSTRAINT PK_LE_QUEUE PRIMARY KEY CLUSTERED ( hID ASC )
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
)
ON [PRIMARY]
Kindest Regards,
Just say No to Facebook!October 26, 2012 at 8:21 am
GSquared (10/24/2012)
Another possible solution:
USE ProofOfConcept;
SET NOCOUNT ON;.... OR T2.MaxDate != T3.MaxDate;
Count = 0
I just went to test out the ProofOfConcept UDf you provided and I realized that I don't see how I can use it so I'm hoping you can point out what it is that I'm not seeing.
In my prior post I listed the DDL for the table I'm working with which has 4 date columns. What I'm trying to do is to return the Max date of 2 dateTime columns in a table. If I use the UDF I don't see how it can work.
Its not that I'm not appreciative for the help but I just don't see hwo it can bd used. What is it I'm not seeing with the UDF you provded?
Thanks
Kindest Regards,
Just say No to Facebook!October 26, 2012 at 9:06 am
You originally were talking about four dates, so that's how we wrote the functions.
Inline table valued functions can be implemented with a CROSS APPLY, like so.
declare @datetimes table (ID int identity(1,1), date1 datetime, date2 datetime, date3 datetime, date4 datetime)
insert into @datetimes
select '11/03/2011','10/01/2012','01/01/2012','12/22/2011'
union all
select '11/03/2011','10/01/2012','01/01/2011','12/22/2012'
select ID,ca.*
from @datetimes
cross apply dbo.HiLoDate2(date1,date2,date3,date4) ca
Edited to add_ a second row of data with an ID column to the example.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
October 26, 2012 at 9:25 am
Or did you mean to do something like this?
declare @datetimes table (ID int identity(1,1), date1 datetime, date2 datetime)
insert into @datetimes
select '11/03/2011','12/01/2012'
union all
select '11/03/2011','10/01/2011'
select MAX(case when Date1 >= Date2 then date1 else date2 end) as LatestDate
from @datetimes
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
October 29, 2012 at 6:44 am
YSLGuru (10/26/2012)
GSquared (10/24/2012)
Another possible solution:
USE ProofOfConcept;
SET NOCOUNT ON;.... OR T2.MaxDate != T3.MaxDate;
Count = 0
I just went to test out the ProofOfConcept UDf you provided and I realized that I don't see how I can use it so I'm hoping you can point out what it is that I'm not seeing.
In my prior post I listed the DDL for the table I'm working with which has 4 date columns. What I'm trying to do is to return the Max date of 2 dateTime columns in a table. If I use the UDF I don't see how it can work.
Its not that I'm not appreciative for the help but I just don't see hwo it can bd used. What is it I'm not seeing with the UDF you provded?
Thanks
I didn't provide the UDF. I just modified it so it would work with NULL inputs.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
October 29, 2012 at 6:46 am
YSLGuru (10/24/2012)
Thanks for all the replies. In case it make a doifference heres the tables DDL . Its a very simple table.The thing is a queue of reports called requests. There are 4 date/Time fields indicating the date/Time that the request first entered the queue, when it is scheduled to be processed when it actually started and when its done being processed. Sometimes the Scheduled column is equal to EnteredQueue while at other times its later then Enteredqueue and in some cases its before. I don’t know why it sometimes does this only that it does.
When the Scheduled date/Time is earlier then the EnteredQueue date/time it messes up my calculations I run on the queue that determine how long on average a request waited in the queue and how long it took to process. I was looking for an easy way to return the lowest or highest value of 2 or more columns in the same row. I can Case it out but I try to avoid Case when possible just out of habit.
Thanks again
CREATE TABLE dbo.LE_QUEUE
(
hID NUMERIC(18,0) IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
RequestedBy VARCHAR(50) NOT NULL,
Status INT NOT NULL,
EnteredQueue DATETIME NOT NULL,
Scheduled DATETIME NOT NULL,
RequestStart DATETIME NULL,
RequestEnd DATETIME NULL,
hRequestDetail NUMERIC(18,0) NOT NULL
CONSTRAINT PK_LE_QUEUE PRIMARY KEY CLUSTERED ( hID ASC )
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
)
ON [PRIMARY]
You're just comparing two columns to see which is earliest? Just use a Case statement and get it done. No reason to avoid them, Case isn't like cursors.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
October 29, 2012 at 3:24 pm
GSquared (10/29/2012)
YSLGuru (10/24/2012)
Thanks for all the replies. In case it make a doifference heres the tables DDL . Its a very simple table.The thing is a queue of reports called requests. There are 4 date/Time fields indicating the date/Time that the request first entered the queue, when it is scheduled to be processed when it actually started and when its done being processed. Sometimes the Scheduled column is equal to EnteredQueue while at other times its later then Enteredqueue and in some cases its before. I don’t know why it sometimes does this only that it does.
When the Scheduled date/Time is earlier then the EnteredQueue date/time it messes up my calculations I run on the queue that determine how long on average a request waited in the queue and how long it took to process. I was looking for an easy way to return the lowest or highest value of 2 or more columns in the same row. I can Case it out but I try to avoid Case when possible just out of habit.
Thanks again
CREATE TABLE dbo.LE_QUEUE
(
hID NUMERIC(18,0) IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
RequestedBy VARCHAR(50) NOT NULL,
Status INT NOT NULL,
EnteredQueue DATETIME NOT NULL,
Scheduled DATETIME NOT NULL,
RequestStart DATETIME NULL,
RequestEnd DATETIME NULL,
hRequestDetail NUMERIC(18,0) NOT NULL
CONSTRAINT PK_LE_QUEUE PRIMARY KEY CLUSTERED ( hID ASC )
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
)
ON [PRIMARY]
You're just comparing two columns to see which is earliest? Just use a Case statement and get it done. No reason to avoid them, Case isn't like cursors.
Thats exactly what I was going for. I originally mentioned 4 date columns as thats how many date/time columns the table has and I was orginally told I would need to look at all 4. As things proressed we eliminated the need to check all 4, narrowing it down to just 2 columns. I have a Case statemtn now but Iwas checking to see if there was something better then CASE. While it may be just 2 columns its millions of rows of data in a top heavy table.
Thanks
Kindest Regards,
Just say No to Facebook!Viewing 9 posts - 16 through 23 (of 23 total)
You must be logged in to reply to this topic. Login to reply