April 11, 2013 at 2:45 pm
Hi,
I have couple of questions related to sql server, though I googled it, did not get any definite answer. I am sure one of you SSC guru would help me to understand SQL Server in a better way :-).
Why the joining between integer columns are faster?
How the sql server joins strings internally, does it checks character by character or uses ASCII to compare strings?
Does the database engine performs an order by on the group by column first to group the data easily?
Thanks in advance.
Regards,
Pravasis
April 11, 2013 at 2:56 pm
1. Joining integer columns (provided that they are correctly indexed) is faster than joining other data types because it is relatively straightforward (see below)
2. It depends. Varchar and nvarchar columns are joined according to rules specified by the columns' collation.
3. It depends. Run the specific query and check the execution plan. If the group by column is the clustered index I would expect SQL Server exploit the natural sort order.
April 14, 2013 at 1:31 am
Mansfield (4/11/2013)
1. Joining integer columns (provided that they are correctly indexed) is faster than joining other data types because it is relatively straightforward (see below)2. It depends. Varchar and nvarchar columns are joined according to rules specified by the columns' collation.
3. It depends. Run the specific query and check the execution plan. If the group by column is the clustered index I would expect SQL Server exploit the natural sort order.
Thanks for your reply. I am still not sure why the joining of int is faster.
If I have an identity column(bigint) and a 4 character character code. Which I should be given preference while joining. Lets assume none of them have index.
Regards,
Pravasis
April 14, 2013 at 12:42 pm
Pravasis (4/14/2013)
Mansfield (4/11/2013)
1. Joining integer columns (provided that they are correctly indexed) is faster than joining other data types because it is relatively straightforward (see below)2. It depends. Varchar and nvarchar columns are joined according to rules specified by the columns' collation.
3. It depends. Run the specific query and check the execution plan. If the group by column is the clustered index I would expect SQL Server exploit the natural sort order.
Thanks for your reply. I am still not sure why the joining of int is faster.
If I have an identity column(bigint) and a 4 character character code. Which I should be given preference while joining. Lets assume none of them have index.
Let's NOT assume that "none of them have index" because proper indexing is one of the most important things there is for performance.
That, not withstanding, a 4 character character-code will join faster than a BIGINT simply because BIGINT has 8 bytes and, even though it has to go through collation checks, the 4 character character_code is half that width.
If the comparison is done with an INT instead of a BIGINT, there will be a virtual tie if you use Latin1_General_Bin collation on the character-code.
Here's some test data. Change the "CAST" in each table build to build columns with different types.
-- DROP TABLE dbo.Test, dbo.IntLookup, VarCharLookup
GO
SELECT TOP 1000000
PKCol = IDENTITY(INT,1,1)
, IntCol = CAST(ABS(CHECKSUM(NEWID()))%9+1 AS INT)
, VarCharCol = CAST(REPLICATE(SUBSTRING('ABCDEFGHIJ',ABS(CHECKSUM(NEWID()))%9+1,1),4) AS CHAR(4)) COLLATE LATIN1_General_Bin
INTO dbo.Test
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;
ALTER TABLE dbo.Test ADD PRIMARY KEY CLUSTERED (PKCol);
CREATE INDEX IX_Test_IntCol ON dbo.Test (IntCol);
CREATE INDEX IX_Test_VarCharCol ON dbo.Test (VarCharCol);
SELECT PkCol = ISNULL(CAST(PkCol AS INT),0)
, DescCol
INTO dbo.IntLookup
FROM (
SELECT 1,'First' UNION ALL
SELECT 2,'Second' UNION ALL
SELECT 3,'Third' UNION ALL
SELECT 4,'Fourth' UNION ALL
SELECT 5,'Fifth' UNION ALL
SELECT 6,'Sixth' UNION ALL
SELECT 7,'Seventh' UNION ALL
SELECT 8,'Eighth' UNION ALL
SELECT 9,'Nineth'
) d (PkCol, DescCol)
;
ALTER TABLE dbo.IntLookup ADD PRIMARY KEY CLUSTERED (PKCol);
SELECT PkCol = ISNULL(CAST(PkCol AS CHAR(4)),'') COLLATE LATIN1_GENERAL_BIN
, DescCol
INTO dbo.VarCharLookup
FROM (
SELECT 'AAAA','First' UNION ALL
SELECT 'BBBB','Second' UNION ALL
SELECT 'CCCC','Third' UNION ALL
SELECT 'DDDD','Fourth' UNION ALL
SELECT 'EEEE','Fifth' UNION ALL
SELECT 'FFFF','Sixth' UNION ALL
SELECT 'GGGG','Seventh' UNION ALL
SELECT 'HHHH','Eighth' UNION ALL
SELECT 'IIII','Nineth'
) d (PkCol, DescCol)
;
ALTER TABLE dbo.VarCharLookup ADD PRIMARY KEY CLUSTERED (PKCol);
Here's the test code which dumps the results to a variable to take display times out of the picture.
SET STATISTICS TIME ON;
DECLARE @Bitbucket VARCHAR(10);
SELECT @Bitbucket = lu.DescCol
FROM dbo.Test t
JOIN dbo.IntLookup lu
ON t.IntCol = lu.PKCol;
SET STATISTICS TIME OFF;
GO
SET STATISTICS TIME ON;
DECLARE @Bitbucket VARCHAR(10);
SELECT @Bitbucket = lu.DescCol
FROM dbo.Test t
JOIN dbo.VarCharLookup lu
ON t.VarCharCol = lu.PKCol COLLATE LATIN1_General_Bin;
SET STATISTICS TIME OFF;
Here are the results from my older/slower machine for the code in the condition it is posted.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 1125 ms, elapsed time = 1218 ms.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 1125 ms, elapsed time = 1201 ms.
The bottom line here is to make sure the data in the join columns is "right sized" for maximum efficiency.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 15, 2013 at 4:57 am
Thanks Jeff.
I was exactly looking for something like
The bottom line here is to make sure the data in the join columns is "right sized" for maximum efficiency.
.
Thanks for the detailed clarification. Appreciate it.
Regards,
Pravasis
April 15, 2013 at 5:59 am
great explanation Jeff 🙂
_______________________________________________________________
To get quick answer follow this link:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
April 15, 2013 at 3:26 pm
Pravasis (4/14/2013)
Mansfield (4/11/2013)
1. Joining integer columns (provided that they are correctly indexed) is faster than joining other data types because it is relatively straightforward (see below)2. It depends. Varchar and nvarchar columns are joined according to rules specified by the columns' collation.
3. It depends. Run the specific query and check the execution plan. If the group by column is the clustered index I would expect SQL Server exploit the natural sort order.
Thanks for your reply. I am still not sure why the joining of int is faster.
If I have an identity column(bigint) and a 4 character character code. Which I should be given preference while joining. Lets assume none of them have index.
A BigInt has a width of 8 bytes, which is twice the width of the Char(4) column. So the Char identifier column may prove faster, especially if it's a large table and SQL Server has to build a hash table in background to facilitate the join.
Either way, the joined column needs to be indexed, which will take advantage of a B-tree style search and record exclusion. Without an index on the joined column, SQL Server will be forced to perform a full table scan and read every record.
It's also important that you join between columns of the same data type. For example, join int to int and char(4) to char(4), otherwise you will end up with type casting at runtime, which is a performace hit.
Also, avoid joining on concatentated columns or functions, because that will most likely result in a non-indexable expression.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
April 15, 2013 at 4:15 pm
Pravasis (4/15/2013)
Thanks Jeff.I was exactly looking for something like
The bottom line here is to make sure the data in the join columns is "right sized" for maximum efficiency.
.
Thanks for the detailed clarification. Appreciate it.
kapil_kk (4/15/2013)
great explanation Jeff 🙂
Thank you both. Glad I could help.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply