August 4, 2016 at 10:39 am
Hello,
Working with existing SQL (and new to SQL myself), I'm trying to figure out how my version of a query (with JOINs) differs from
the original query (without JOINs, just multiple tables in the FROM clause).
This is the existing query:
SELECT
yada
yada
FROM tbA, tbB, tbC
WHERE
tbA.X = tbB.X
AND tbB.Y = tbC.Y
AND tbA.Z = tbC.Z
and after reading about JOINs, I figured this was more correct:
SELECT
yada
yada
FROM tbA
JOIN tbB
ON tbA.X = tbB.X
JOIN tbC
ON tbB.Y = tbC.Y
AND tbA.Z = tbC.Z
but it created an out of memory error on the system.
So obviously I don't understand the joining process well enough.
Originally I had LEFT JOINs in it - but didn't want the whole set from the one table. And I know now that I should have "INNER JOIN" in place of "JOIN" for the sake of clarity.
But I can't find examples that explain a FROM clause with multiple tables,
or how that compares to the JOINed version of the same query.
I can't post the actual code. But I can describe the tables.
There are no indexes and no primary or foreign keys.
The columns are mostly all strings, and the equality comparisons are just string matching.
Also, there is a one-to-many relationship between the matching fields in tbA and tbB.
And a one-to-one relationship between the matching fields in tbB and tbC.
And the final relationship between tbA and tbC is one-to-one, given that the first two conditions are already met.
Hope that makes sense.
Size wise:
tbA has 1.2 million records
tbB has 427 million records
tbC has 1,600 records
I don't know how knowledgeable the original creator of these tables and SQL was, but I can't throw stones because at least his query eventually shows results. Whereas my JOIN version just choked.
So I'm asking if this query could easily be improved, but more importantly I'm trying to understand why the JOIN version sucked big time!
Thanks,
skippyV
August 4, 2016 at 10:50 am
The out-of-memory error is most likely returned by SSMS or the client application that you were using. That happens when dealing with large result sets.
Your rewrite is correct, except for the comma that shouldn't be there, which I suppose was left over from the copy-paste process.
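If you only need to sanity-check the join logic in SSMS without pulling the whole result set back to the client, something like this (just a sketch, reusing the placeholder table and column names from your post) keeps the output small enough to avoid the memory error:
SELECT TOP (1000)
       tbA.X, tbB.Y, tbC.Z -- substitute whatever the real "yada" columns are
FROM tbA
JOIN tbB
ON tbA.X = tbB.X
JOIN tbC
ON tbB.Y = tbC.Y
AND tbA.Z = tbC.Z;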
August 4, 2016 at 10:55 am
yeah the comma was just a paste error - I'll edit that.
So are the 2 queries functionally equivalent? And therefore no performance difference?
August 4, 2016 at 10:58 am
skippyV (8/4/2016)
I can't post the actual code. But I can describe the tables. There are no indexes and no primary or foreign keys.
...
Size wise:
tbA has 1.2 million records
tbB has 427 million records
tbC has 1,600 records
...
skippyV
With 1.2 million and 427 million rows, you really should add indexes to those tables. Start by defining the clustered index and add non-clustered indexes as needed. The clustered index creation might take a while and the table will be unavailable during it, so take that into consideration.
If you can't share the table structures (you could post altered names if needed) and don't feel confident defining the correct indexes, you should hire a consultant to fix this.
August 4, 2016 at 10:59 am
skippyV (8/4/2016)
yeah the comma was just a paste error - I'll edit that. So are the 2 queries functionally equivalent? And therefore no performance difference?
No difference other than in syntax. What actually gets executed will be the same.
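If you want to see that for yourself, you can compare the plans the two forms produce. A quick sketch, again with the placeholder names from the original post (SHOWPLAN returns the estimated plan without executing anything):
SET SHOWPLAN_TEXT ON;
GO
-- old comma-join form
SELECT tbA.X
FROM tbA, tbB, tbC
WHERE tbA.X = tbB.X
AND tbB.Y = tbC.Y
AND tbA.Z = tbC.Z;
GO
-- explicit JOIN form
SELECT tbA.X
FROM tbA
JOIN tbB ON tbA.X = tbB.X
JOIN tbC ON tbB.Y = tbC.Y
AND tbA.Z = tbC.Z;
GO
SET SHOWPLAN_TEXT OFF;
GO
Both batches should come back with the same plan text.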
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
August 4, 2016 at 11:06 am
Luis Cazares (8/4/2016)
With 1.2 million and 427 million rows, you really should add indexes to those tables. Start by defining the clustered index and add non-clustered indexes as needed. The clustered index creation might take a while and the table will be unavailable during it, so take that into consideration. If you can't share the table structures (you could post altered names if needed) and don't feel confident defining the correct indexes, you should hire a consultant to fix this.
Thanks for the info. Fortunately this is for an archive-type need and isn't something that will be used much (if ever).
So the cost/benefit just doesn't justify spending more money to optimize these tables.
August 4, 2016 at 11:12 am
Phil Parkin (8/4/2016)
No difference other than in syntax. What actually gets executed will be the same.
Thanks for the clarification. Luis pretty much said that but I wanted to make sure.
August 4, 2016 at 11:18 am
skippyV (8/4/2016)
Luis Cazares (8/4/2016)
With 1.2 million and 427 million rows, you really should add indexes to those tables. Start by defining the clustered index and add non-clustered indexes as needed. The clustered index creation might take a while and the table will be unavailable during it, so take that into consideration. If you can't share the table structures (you could post altered names if needed) and don't feel confident defining the correct indexes, you should hire a consultant to fix this.
Thanks for the info. Fortunately this is for an archive-type need and isn't something that will be used much (if ever).
So the cost/benefit just doesn't justify spending more money to optimize these tables.
At the very least, consider creating the clustered index on X for both tables, as that's probably a good option even if it's not the best. Trust me: even if it's not going to be used much, you'll prefer that this completes in a few minutes instead of taking hours.
August 4, 2016 at 3:56 pm
Given that there is no clustered index now, you might as well cluster to match this query, something like below. If you have DATA_COMPRESSION available, be sure to use it for tbB at least!
CREATE CLUSTERED INDEX tbA__CL ON dbo.tbA ( X )
WITH ( /*DATA_COMPRESSION = ROW,*/ FILLFACTOR = 98 );
CREATE CLUSTERED INDEX tbB__CL ON dbo.tbB ( X )
WITH ( /*DATA_COMPRESSION = PAGE,*/ FILLFACTOR = 98, SORT_IN_TEMPDB = ON );
CREATE CLUSTERED INDEX tbC__CL ON dbo.tbC ( Y, Z )
WITH ( /*DATA_COMPRESSION = ROW,*/ FILLFACTOR = 98 );
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
August 5, 2016 at 3:20 pm
ScottPletcher (8/4/2016)
Given that there is no clustered index now, you might as well cluster to match this query, something like below. If you have DATA_COMPRESSION available, be sure to use it for tbB at least!
All these fields are nvarchar(50). So when I attempted the index you provided (thank you btw), an error popped up saying 'X' is of a type that is invalid for an index.
So would I have to add checksum columns for each of those fields and then change the query to use the checksums?
August 5, 2016 at 3:51 pm
skippyV (8/5/2016)
ScottPletcher (8/4/2016)
Given that there is no clustered index now, you might as well cluster to match this query, something like below. If you have DATA_COMPRESSION available, be sure to use it for tbB at least!
All these fields are nvarchar(50). So when I attempted the index you provided (thank you btw), an error popped up saying 'X' is of a type that is invalid for an index.
So would I have to add checksum columns for each of those fields and then change the query to use the checksums?
AFAIK, there should be no problem at all using an nvarchar(50) for an index. Are you using commands such as I wrote, or some type of GUI tool to create the indexes?
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
August 6, 2016 at 8:00 am
This is the query I used. I extrapolated because I thought I understood the model your query involved.
Basically, the index must cover each of the fields being compared on.
So tables A and B each have two fields being compared on.
CREATE CLUSTERED INDEX indxCL_tblA_apples_pears ON [MyDB].[dbo].[tblA] ( apples, pears )
WITH ( /*DATA_COMPRESSION = ROW,*/ FILLFACTOR = 98 );
CREATE CLUSTERED INDEX indxCL_tblB_apples_oranges ON [MyDB].[dbo].[tblB] ( apples, oranges )
WITH ( /*DATA_COMPRESSION = PAGE,*/ FILLFACTOR = 98, SORT_IN_TEMPDB = ON );
CREATE CLUSTERED INDEX indxCL_tblC_oranges_pears ON [MyDB].[dbo].[tblC] ( oranges, pears )
WITH ( /*DATA_COMPRESSION = ROW,*/ FILLFACTOR = 98 );
In SQL Server Management Studio, apples and oranges in the 2nd CREATE INDEX - the one for tblB - are what had the red-underline errors.
Did I miss the mark?
August 8, 2016 at 8:57 am
Again, there should be no issue at all indexing on an nvarchar column (although I can't imagine why you'd actually need Unicode for that type of data).
Can you post the actual column definition? Is it a computed column?
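If it's easier than posting the CREATE TABLE, something along these lines (a sketch; swap in your real table name) will pull the column definitions straight from the catalog:
SELECT c.name AS column_name,
       t.name AS type_name,
       c.max_length, -- -1 here means (max)
       c.is_nullable,
       c.is_computed
FROM sys.columns AS c
JOIN sys.types AS t
ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID(N'dbo.tblB');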
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
August 8, 2016 at 10:29 am
The error message from the tooltip over either of the red-underlined column names is: "... is of a type that is invalid for use as a key column in an index."
None of the columns used in the index commands are computed columns.
And their properties are identical to this http://inky.ws/g/3ii
Perhaps it's because the second table's column for 'apples' is (nvarchar(max), not null)?
Instead of (nvarchar(50), not null) as is the 'apples' column for table A.
August 8, 2016 at 10:40 am
skippyV (8/8/2016)
The error message from the tooltip over either of the red-underlined column names is: "... is of a type that is invalid for use as a key column in an index."
None of the columns used in the index commands are computed columns.
And their properties are identical to this http://inky.ws/g/3ii
Perhaps it's because the second table's column for 'apples' is (nvarchar(max), not null)?
Instead of (nvarchar(50), not null) as is the 'apples' column for table A.
That's exactly the problem. nvarchar(max), varchar(max) and varbinary(max) are invalid as key columns in indexes. Defining all columns as long strings is lazy design and will cause problems like the one you're facing now. It should be changed to a properly typed design.
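If changing the design is an option, the fix is roughly this. Just a sketch, using the column and table names from the earlier posts, and on a 427-million-row table expect the ALTER to run for a while and generate a lot of log:
-- make sure nothing would be truncated first
SELECT COUNT(*) AS too_long
FROM dbo.tblB
WHERE LEN(apples) > 50;
-- if that returns 0, shrink the column so it can be used as an index key
ALTER TABLE dbo.tblB
ALTER COLUMN apples nvarchar(50) NOT NULL;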