November 30, 2012 at 5:45 pm
ScottPletcher (11/30/2012)
Jeff Moden (11/29/2012)
capnhector (11/29/2012)
EDIT: This is also for me to play with a couple of things and get more experience with them. so it may not be fast or work in your situation but does use some things i have been studying.No.... you're absolutely on the correct path. The MIN=MAX method the others used is twice as slow as the original function. You can make it a bit faster still by turning it into an iTVF instead of a scalar function. And lose the join. If you rework the GenericTable TYPE just a bit, you won't need it. Give it a shot. You've got this, Cap'n!
Really??
Doesn't the DISTINCT require an (expensive) sort but min/max don't?
Correct but the min/max still turns out slower. I'll try to remember to post my test code when I get home from work.
I also have an idea for solving the "base" problem here that might make both solutions seem slow.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 30, 2012 at 10:35 pm
Actually, I spoke a bit too soon. I was testing with 10 columns per row to be compared. There’s a cross-over point where the MIN/MAX method wins for 1 to 4 comparison columns and the COUNT(DISTINCT) wins for 5 and above.
Here’s the code I’ve been using to test with. To make things faster, I did like CapnHector did and converted both functions to iTVFs instead of leaving them as mTVFs.
--=====================================================================================================================
-- This section conditionally drops all of the objects and rebuilds them to make reruns in SSMS easier.
--=====================================================================================================================
--===== Do this in a nice safe place that everyone has.
USE tempdb
;
--===== Conditionally drop objects in dependency order to make reruns easier in SSMS
IF OBJECT_ID('dbo.AreEqualCount' ,'IF') IS NOT NULL DROP FUNCTION dbo.AreEqualCount;
IF OBJECT_ID('dbo.AreEqualMinMax','IF') IS NOT NULL DROP FUNCTION dbo.AreEqualMinMax;
IF OBJECT_ID('tempdb..#TestTable','U' ) IS NOT NULL DROP TABLE #TestTable;
IF EXISTS (SELECT * FROM sys.types WHERE name = 'GenericTable') DROP TYPE dbo.GenericTable;
GO
--===== First, rework the GenericTable just a bit
CREATE TYPE dbo.GenericTable AS TABLE (I SQL_Variant, D SQL_VARIANT);
GO
--===== Create the COUNT function with column "I" added
CREATE FUNCTION dbo.AreEqualCount
(@T AS dbo.GenericTable READONLY)
RETURNS TABLE WITH SCHEMABINDING
RETURN
SELECT I,
AllEqual = CASE WHEN COUNT(DISTINCT D) = 1 THEN 1 ELSE 0 END
FROM @T
GROUP BY I
;
GO
--===== Create the Min/Max version of the function with column "I" added
CREATE FUNCTION AreEqualMinMax
(@T AS dbo.GenericTable READONLY)
RETURNS TABLE WITH SCHEMABINDING
RETURN
SELECT I,
AllEqual = CASE WHEN MIN(D) = MAX(D) THEN 1 ELSE 0 END
FROM @T
GROUP BY I
;
GO
--===== Create the Test Table and populate it,
-- Note that it has some "L" columns that we don't want to work with.
SELECT TOP 100000
I = IDENTITY(INT,1,1),
D1 = ABS(CHECKSUM(NEWID()))%3+1,
D2 = ABS(CHECKSUM(NEWID()))%3+1,
D3 = ABS(CHECKSUM(NEWID()))%3+1,
D4 = ABS(CHECKSUM(NEWID()))%3+1,
D5 = ABS(CHECKSUM(NEWID()))%3+1,
D6 = ABS(CHECKSUM(NEWID()))%3+1,
D7 = ABS(CHECKSUM(NEWID()))%3+1,
D8 = ABS(CHECKSUM(NEWID()))%3+1,
D9 = ABS(CHECKSUM(NEWID()))%3+1,
D10 = ABS(CHECKSUM(NEWID()))%3+1,
L1 = ABS(CHECKSUM(NEWID()))%3+1,
L2 = ABS(CHECKSUM(NEWID()))%3+1,
L3 = ABS(CHECKSUM(NEWID()))%3+1
INTO #TestTable
FROM master.sys.all_columns ac1
CROSS APPLY master.sys.all_columns ac2
;
--===== Set every 5th row to have all equal values so we can see that things work correctly.
UPDATE #TestTable
SET D1 = I,
D2 = I,
D3 = I,
D4 = I,
D5 = I,
D6 = I,
D7 = I,
D8 = I,
D9 = I,
D10 = I
WHERE I%5 = 0
;
--=====================================================================================================================
-- This is the test section for the objects that we know of, so far.
--=====================================================================================================================
--===== Declare some obviously name variables
DECLARE @T GenericTable,
@StartTime DATETIME,
@Bitbucket SQL_VARIANT
;
--===== This unpivots the data and inserts it into the generic table variable.
-- Notice that we have to enter all of the desired column names
INSERT INTO @T
(I,D)
SELECT t.I, cav.D
FROM #TestTable t
CROSS APPLY (VALUES (D1),(D2),(D3),(D4),(D5),(D6),(D7),(D8),(D9),(D10)) cav (D) --Change here to check for cross-over point
;
--===== Test the COUNT function
SELECT @StartTime = GETDATE()
;
SELECT @Bitbucket = I,
@Bitbucket = AllEqual
FROM dbo.AreEqualCount(@T)
;
SELECT CountDur = DATEDIFF(ms,@StartTime, GETDATE())
;
--===== Test the MinMax function
SELECT @StartTime = GETDATE()
;
SELECT @Bitbucket = I,
@Bitbucket = AllEqual
FROM dbo.AreEqualMinMax(@T)
;
SELECT MinMaxDur = DATEDIFF(ms,@StartTime, GETDATE())
;
Here are the durations from the run.
CountDur
-----------
3650
MinMaxDur
-----------
4373
Now, my question is, what is the actual functionality that the OP is asking for in the original post? Although it looks like “using VALUES” in a function, I think the actual goal is to be able to point the code at virtually any table and with as little fuss as possible, identify the columns to compare to see if they’re equal. The number of columns can vary, according to the OP. I’ll also assume that the actual columns names will all be similar to each other (i.e. “follow a pattern”).
I’ll also assume that the OP not only wants this to be easy to use and, of course, accurate, but would also like it to be as fast as possible.
Enter the “Dark Horse” in the form of a very old but tried and true method that avoids RBAR, avoids having to unpivot the rows, is easy to understand, can be made to work in virtually any version of SQL, and runs fast as the wind. Here’s the code and the test.
--=====================================================================================================================
-- If the purpose of this exercise is to make it so that we can generically test if multiple similarly named
-- columns in are identical on any table, then does it matter which method we use? Why can't it be a stored
-- procedure that takes the table name and a pattern of the columns we're looking for? And, why don't we use
-- an old trick with "CASE" to really speed up the works?
--=====================================================================================================================
--===== Conditionally drop the stored procedure to make reruns easier in SSMS.
IF OBJECT_ID('tempdb.dbo.AreEqualDynamic','P') IS NOT NULL DROP PROCEDURE dbo.AreEqualDynamic;
GO
--===== This creates the permanent stored procedure that will do all the work directly on the given table
CREATE PROCEDURE dbo.AreEqualDynamic
@pTableName VARCHAR(128),
@pPkColName VARCHAR(128),
@pColNamePattern VARCHAR(128)
AS
--===== Declare the obviously named local variable(s)
DECLARE @sql VARCHAR(MAX);
;
--===== Build the dynamic SQL. No SQL Injection possible here
WITH
cteEnumerate AS
( --=== Enumerate the column names with the assumption that the first column is the key
SELECT ColNum = ROW_NUMBER() OVER (ORDER BY column_id),
ColName = name
FROM sys.columns
WHERE object_id = OBJECT_ID(@pTableName)
AND name LIKE @pColNamePattern
) --=== Build the comparisons for dynamic SQL
SELECT @sql = ISNULL(@SQL + ' AND ','') + QUOTENAME(e1.ColName) + '=' + QUOTENAME(e2.ColName)
FROM cteEnumerate e1
JOIN cteEnumerate e2
ON e1.ColNum +1 = e2.ColNum
;
--===== Build the rest of the dynamic SQL and execute it
SELECT @sql = '
SELECT PK = ' + (SELECT QUOTENAME(name) FROM sys.columns WHERE object_id = OBJECT_ID(@pTableName) AND name = @pPkColName) + ',
AreEqual = CASE
WHEN ' + @sql + '
THEN 1
ELSE 0
END
FROM #TestTable
;
'
EXEC (@SQL)
print @sql
;
GO
;
--=====================================================================================================================
-- Now, test the code like we did with the others.
--=====================================================================================================================
--===== Test the dynamic CASE stored proc method
DECLARE @StartTime DATETIME
;
SELECT @StartTime = GETDATE()
;
EXEC dbo.AreEqualDynamic 'tempdb..#TestTable', 'I', 'D%'
;
SELECT CaseDur = DATEDIFF(ms,@StartTime, GETDATE())
;
Here’s the duration from that run on the very same data as the other two runs.
CaseDur
-----------
583
Other than the obvious advantage of it blowing the doors off the other two methods even though it didn’t dump its output to variables, there’s a huge hidden advantage. It doesn’t use the memory that a Table Variable requires. Instead, it uses the data from the table directly and it does so in an SQL Injection free manner.
BTW... here's the code that the dynamic SQL came up as...
SELECT PK = ,
AreEqual = CASE
WHEN [D1]=[D2] AND [D2]=[D3] AND [D3]=[D4] AND [D4]=[D5] AND [D5]=[D6] AND [D6]=[D7] AND [D7]=[D8] AND [D8]=[D9] AND [D9]=[D10]
THEN 1
ELSE 0
END
FROM #TestTable
;
Simple, easy to understand, and nasty fast.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 4, 2012 at 4:52 am
@jeff Moden:
Now, my question is, what is the actual functionality that the OP is asking for in the original post? Although it looks like “using VALUES” in a function, I think the actual goal is to be able to point the code at virtually any table and with as little fuss as possible, identify the columns to compare to see if they’re equal. The number of columns can vary, according to the OP. I’ll also assume that the actual columns names will all be similar to each other (i.e. “follow a pattern”).
Why I need this: our company is preparing and printing lottery games. Every game is created as a sql table and I am responsible for control of the data.
An example: Lottery ticket contains 9 fields (3x3 matrix). If there are 3 equal integers (or chars or bits or strings,...) in each horizontal, vertical or diagonal rows, then the ticket contains a win. The easyest way to test this is to have a function which accepts 3 parameters (of any type) and return 1 if all parameters are true and false otherwise. I could use something like where T1 = T2 and T1 = T3
which isn't that bad really. But what if I need to test inequality, then I would need to write something like this:
where T1 <> T2 and T1 <> T3 and T2 <> T3
As the number of parameters grows the query gets more and more complex. So a function like NotEquals(... random number parameters of same type ...)
would reduce complexity of my queries dramaticaly.
As you can imagine, there is a vast amout of possible games with complex logic behind it, so a function which accepts any number of generic parameters and returns a single value is very handy for my job. And it has to be a scalar function (and not procedure), so that I can use it in selects and/or where clause, for example:
select TicketId, Equals(T1, T2, T3), Equals(T1, T5, T9)...
select * from MyTable where Equals(T1, T2, T3)
...
I got a lot of interesting solutions in this thread, but for now I just created functions that I need in a seperate database and wrote a set of tests for them. Then I add a synonym to all other databases with a stored procedure which uses sp_msforeachdb. This way the functions are in one place and thoroughly tested. The good: simple syntax, the bad: I have to write and test each function by it self. Example functions:
create function EqualI2(@i1 int, @i2 int)...
create function EqualI3(@i1 int, @i2 int, @i3 int)...
...
create function EqualS2(@s1 varchar(max), @s2 varchar(max)...
create function EqualS3(@s1 varchar(max), @s2 varchar(max), @s3 varchar(max))...
...
create function EqualB2(@b1 bit, @b2 bit)...
create function EqualB3(@b1 bit, @b2 bit, @b3 bit,...)...
...
Edit:
- yes, this kind of function is easy to write in C# for example (with the 'params' keyword, generic parameters and EqualityComparer), but I like to use sql for set-based problems.
- I don't like the tally table approach, because then you have to use something like 'Equal(convert(varchar, I1) + ', ' + convert(varchar, I2) + ', ' ...
- the basic goal is simple syntax, because the final file of a single control could be hunderts of rows long and I really don't like to make it any longer...
December 4, 2012 at 7:33 am
_simon_ (12/4/2012)
But what if I need to test inequality, then I would need to write something like this:
where T1 <> T2 and T1 <> T3 and T2 <> T3
I'm on my way to work and will have to do a deep dive on all that you wrote tonight but thought I'd quickly answer this...
I would suggest that you never test for such an inequality in such a manner. Always test for the equality and if that test "fails", then it must be an inequality, right?. Testing for equality is almost always faster than testing for inequalities.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 4, 2012 at 11:35 pm
I would suggest that you never test for such an inequality in such a manner. Always test for the equality and if that test "fails", then it must be an inequality, right?. Testing for equality is almost always faster than testing for inequalities.
Well, this is embarrassing... :hehe:
Viewing 5 posts - 16 through 19 (of 19 total)
You must be logged in to reply to this topic. Login to reply