August 4, 2016 at 10:39 am
Hello,
Working with existing SQL (and new to SQL myself), I'm trying to figure out how my version of a query (with JOINs) differs from
the original query (without JOINs, just multiple tables in the FROM clause).
This is the existing query:
SELECT
yada
yada
FROM tbA, tbB, tbC
WHERE
tbA.X = tbB.X
AND tbB.Y = tbC.Y
AND tbA.Z = tbC.Z
and after reading about JOINs, I figured this was more correct:
SELECT
yada
yada
FROM tbA
JOIN tbB
ON tbA.X = tbB.X
JOIN tbC
ON tbB.Y = tbC.Y
AND tbA.Z = tbC.Z
but it created an out of memory error on the system.
So obviously I don't understand the joining process well enough.
Originally I had LEFT JOINs in it - but didn't want the whole set from the one table. And I know now that I should have "INNER JOIN" in place of "JOIN" for the sake of clarity.
But I can't find examples that explain a FROM clause with multiple tables,
or how that compares to the JOINed version of the same query.
I can't post the actual code. But I can describe the tables.
There are no indexes and no primary or foreign keys.
The columns are mostly all strings, and the equality comparisons are just string matching.
Also, there is a one-to-many relationship between the matching fields in tbA and tbB.
And a one-to-one relationship between the matching fields in tbB and tbC.
And the final relationship between tbA and tbC is one-to-one, given that the first two conditions are already met.
Hope that makes sense.
Size wise:
tbA has 1.2 million records
tbB has 427 million records
tbC has 1,600 records
I don't know how knowledgeable the original creator of these tables and SQL was, but I can't throw stones because at least his query eventually shows results. Whereas my JOIN version just choked.
So I'm asking if this query could easily be improved, but more importantly I'm trying to understand why the JOIN version sucked big time!
Thanks,
skippyV
August 4, 2016 at 10:50 am
The out-of-memory error is most likely returned by SSMS or the client application that you were using. That happens when dealing with large result sets.
Your rewrite is correct, except for the comma that shouldn't be there, which I suppose was left over from the copy-paste process.
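If you only need to sanity-check the join logic in SSMS without pulling the whole result set back to the client, something like this (just a sketch, reusing the placeholder table and column names from your post) keeps the output small enough to avoid the memory error:
SELECT TOP (1000)
       tbA.X, tbB.Y, tbC.Z -- substitute whatever the real "yada" columns are
FROM tbA
JOIN tbB
ON tbA.X = tbB.X
JOIN tbC
ON tbB.Y = tbC.Y
AND tbA.Z = tbC.Z;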
August 4, 2016 at 10:55 am
yeah the comma was just a paste error - I'll edit that.
So are the 2 queries functionally equivalent? And therefore no performance difference?
August 4, 2016 at 10:58 am
skippyV (8/4/2016)
I can't post the actual code. But I can describe the tables. There are no indexes and no primary or foreign keys.
...
Size wise:
tbA has 1.2 million records
tbB has 427 million records
tbC has 1,600 records
...
skippyV
With 1.2 million and 427 million rows, you really should add indexes to those tables. Start by defining the clustered index and add non-clustered indexes as needed. The clustered index creation might take a while and the table will be unavailable during it, so take that into consideration.
If you can't share the table structures (you could post altered names if needed) and don't feel confident defining the correct indexes, you should hire a consultant to fix this.
August 4, 2016 at 10:59 am
skippyV (8/4/2016)
yeah the comma was just a paste error - I'll edit that. So are the 2 queries functionally equivalent? And therefore no performance difference?
No difference other than in syntax. What actually gets executed will be the same.
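If you want to see that for yourself, you can compare the plans the two forms produce. A quick sketch, again with the placeholder names from the original post (SHOWPLAN returns the estimated plan without executing anything):
SET SHOWPLAN_TEXT ON;
GO
-- old comma-join form
SELECT tbA.X
FROM tbA, tbB, tbC
WHERE tbA.X = tbB.X
AND tbB.Y = tbC.Y
AND tbA.Z = tbC.Z;
GO
-- explicit JOIN form
SELECT tbA.X
FROM tbA
JOIN tbB ON tbA.X = tbB.X
JOIN tbC ON tbB.Y = tbC.Y
AND tbA.Z = tbC.Z;
GO
SET SHOWPLAN_TEXT OFF;
GO
Both batches should come back with the same plan text.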
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
August 4, 2016 at 11:06 am
Luis Cazares (8/4/2016)
With 1.2 million and 427 million rows, you really should add indexes to those tables. Start by defining the clustered index and add non-clustered indexes as needed. The clustered index creation might take a while and the table will be unavailable during it, so take that into consideration. If you can't share the table structures (you could post altered names if needed) and don't feel confident defining the correct indexes, you should hire a consultant to fix this.
Thanks for the info. Fortunately this is for an archive-type need and isn't something that will be used much (if ever).
So the cost/benefit just doesn't justify spending more money to optimize these tables.
August 4, 2016 at 11:12 am
Phil Parkin (8/4/2016)
No difference other than in syntax. What actually gets executed will be the same.
Thanks for the clarification. Luis pretty much said that but I wanted to make sure.
August 4, 2016 at 11:18 am
skippyV (8/4/2016)
Luis Cazares (8/4/2016)
With 1.2 million and 427 million rows, you really should add indexes to those tables. Start by defining the clustered index and add non-clustered indexes as needed. The clustered index creation might take a while and the table will be unavailable during it, so take that into consideration. If you can't share the table structures (you could post altered names if needed) and don't feel confident defining the correct indexes, you should hire a consultant to fix this.
Thanks for the info. Fortunately this is for an archive-type need and isn't something that will be used much (if ever).
So the cost/benefit just doesn't justify spending more money to optimize these tables.
At the very least, consider creating the clustered index on X for both tables, as that's probably a good option even if it's not the best. Trust me: even if it's not going to be used much, you'll prefer that this completes in a few minutes instead of taking hours.
August 4, 2016 at 3:56 pm
Given that there is no clustered index now, you might as well cluster to match this query, something like below. If you have DATA_COMPRESSION available, be sure to use it for tbB at least!
CREATE CLUSTERED INDEX tbA__CL ON dbo.tbA ( X )
WITH ( /*DATA_COMPRESSION = ROW,*/ FILLFACTOR = 98 );
CREATE CLUSTERED INDEX tbB__CL ON dbo.tbB ( X )
WITH ( /*DATA_COMPRESSION = PAGE,*/ FILLFACTOR = 98, SORT_IN_TEMPDB = ON );
CREATE CLUSTERED INDEX tbC__CL ON dbo.tbC ( Y, Z )
WITH ( /*DATA_COMPRESSION = ROW,*/ FILLFACTOR = 98 );
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
August 5, 2016 at 3:20 pm
ScottPletcher (8/4/2016)
Given that there is no clustered index now, you might as well cluster to match this query, something like below. If you have DATA_COMPRESSION available, be sure to use it for tbB at least!
All these fields are nvarchar(50). So when I attempted the index you provided (thank you btw), an error popped up saying 'X' is of a type that is invalid for an index.
So would I have to add checksum columns for each of those fields and then change the query to use the checksums?
August 5, 2016 at 3:51 pm
skippyV (8/5/2016)
ScottPletcher (8/4/2016)
Given that there is no clustered index now, you might as well cluster to match this query, something like below. If you have DATA_COMPRESSION available, be sure to use it for tbB at least!
All these fields are nvarchar(50). So when I attempted the index you provided (thank you btw), an error popped up saying 'X' is of a type that is invalid for an index.
So would I have to add checksum columns for each of those fields and then change the query to use the checksums?
AFAIK, there should be no problem at all using an nvarchar(50) for an index. Are you using commands such as I wrote, or some type of GUI tool to create the indexes?
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
August 6, 2016 at 8:00 am
This is the query I used. I extrapolated because I thought I understood the model your query involved.
Basically, the index must cover each of the fields being compared on.
So tables A and B each have two fields being compared on.
CREATE CLUSTERED INDEX indxCL_tblA_apples_pears ON [MyDB].[dbo].[tblA] ( apples, pears )
WITH ( /*DATA_COMPRESSION = ROW,*/ FILLFACTOR = 98 );
CREATE CLUSTERED INDEX indxCL_tblB_apples_oranges ON [MyDB].[dbo].[tblB] ( apples, oranges )
WITH ( /*DATA_COMPRESSION = PAGE,*/ FILLFACTOR = 98, SORT_IN_TEMPDB = ON );
CREATE CLUSTERED INDEX indxCL_tblC_oranges_pears ON [MyDB].[dbo].[tblC] ( oranges, pears )
WITH ( /*DATA_COMPRESSION = ROW,*/ FILLFACTOR = 98 );
In SQL Server Management Studio, apples and oranges in the 2nd CREATE INDEX - the one for tblB - are what had the red-underline errors.
Did I miss the mark?
August 8, 2016 at 8:57 am
Again, there should be no issue at all indexing on an nvarchar column (although I can't imagine why you'd actually need Unicode for that type of data).
Can you post the actual column definition? Is it a computed column?
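If it's easier than posting the CREATE TABLE, something along these lines (a sketch; swap in your real table name) will pull the column definitions straight from the catalog:
SELECT c.name AS column_name,
       t.name AS type_name,
       c.max_length, -- -1 here means (max)
       c.is_nullable,
       c.is_computed
FROM sys.columns AS c
JOIN sys.types AS t
ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID(N'dbo.tblB');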
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
August 8, 2016 at 10:29 am
The error message from the tooltip over either of the red-underlined column names is: "... is of a type that is invalid for use as a key column in an index."
None of the columns used in the index commands are computed columns.
And their properties are identical to this http://inky.ws/g/3ii
Perhaps it's because the second table's column for 'apples' is (nvarchar(max), not null)?
Instead of (nvarchar(50), not null) as is the 'apples' column for table A.
August 8, 2016 at 10:40 am
skippyV (8/8/2016)
The error message from the tooltip over either of the red-underlined column names is: "... is of a type that is invalid for use as a key column in an index."
None of the columns used in the index commands are computed columns.
And their properties are identical to this http://inky.ws/g/3ii
Perhaps it's because the second table's column for 'apples' is (nvarchar(max), not null)?
Instead of (nvarchar(50), not null) as is the 'apples' column for table A.
That's exactly the problem. nvarchar(max), varchar(max) and varbinary(max) are invalid as key columns in indexes. Defining all columns as long strings is lazy design and will cause problems like the one you're facing now. It should be changed to a properly typed design.
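If changing the design is an option, the fix is roughly this. Just a sketch, using the column and table names from the earlier posts, and on a 427-million-row table expect the ALTER to run for a while and generate a lot of log:
-- make sure nothing would be truncated first
SELECT COUNT(*) AS too_long
FROM dbo.tblB
WHERE LEN(apples) > 50;
-- if that returns 0, shrink the column so it can be used as an index key
ALTER TABLE dbo.tblB
ALTER COLUMN apples nvarchar(50) NOT NULL;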