joining multiple tables without a join

  • Hello,

    Working with existing SQL (and new to SQL myself), and I'm trying to figure out how my version of a query (with JOINS) differs from

    the original query (without JOINS and with a multiple FROM statement).

    This is the existing query:

    SELECT

    yada

    yada

    FROM tbA, tbB, tbC

    WHERE

    tbA.X = tbB.X

    AND tbB.Y = tbC.Y

    AND tbA.Z = tbC.Z

    and after reading about JOINs, I figured this was more correct:

    SELECT

    yada

    yada

    FROM tbA

    JOIN tbB

    ON tbA.X = tbB.X

    JOIN tbC

    ON tbB.Y = tbC.Y

    AND tbA.Z = tbC.Z

    but it created an out of memory error on the system.

    So obviously I don't understand the joining process well enough.

    Originally I had LEFT JOINs in it - but didn't want the whole set from the one table. And I know now that I should have "INNER JOIN" in place of "JOIN" for the sake of clarity.

    But I can't find examples that explain the FROM with multiple tables.

    And how that compares to JOINed version of the same.

    I can't post the actual code. But I can describe the tables.

    There are no indexes and no primary or foreign keys.

    Mostly all strings and the equality relations are done on just string matching.

    Also there is a one-to-many relationship between tbA matching fields and tbB.

    And a one-one relationship between tbB matching fields and tbC.

    And the final relationship between tbA and tbC is one to one with respect to the first two conditions already met.

    Hope that makes sense.

    Size wise:

    tbA has 1.2 million records

    tbB has 427 million records

    tbC has 16 hundred records

    I don't know how knowledgeable the original creator of these tables and SQL was, but I can't throw stones because at least his query eventually shows results. Whereas my JOIN version just choked.

    So I'm asking if this query could easily be improved, but more importantly I'm trying to understand why the JOIN version sucked big time!

    Thanks,

    skippyV

  • The out of memory error is most likely returned by SSMS or the client application that you were using. That happens when dealing with large results sets.

    Your rewrite is correct, except for the comma that shouldn't be there which I suppose was left after the copy-paste process.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • yeah the comma was just a paste error - I'll edit that.

    So are the 2 queries functionally equivalent? And therefore no performance difference?

  • skippyV (8/4/2016)


    I can't post the actual code. But I can describe the tables.

    There are no indexes and no primary or foreign keys.

    ...

    Size wise:

    tbA has 1.2 million records

    tbB has 427 million records

    tbC has 16 hundred records

    ...

    skippyV

    With 1.2 and 427 million rows, you really should add an index to those tables. You should start by defining the clustered index and add non-clustered indexes as needed. The clustered index creation might take a while and the table will become unavailable, so take that into consideration.

    If you can't share the tables structure (you could post altered names if possible) and don't feel confident on defining the correct indexes, you should hire a consultant to fix this.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • skippyV (8/4/2016)


    yeah the comma was just a paste error - I'll edit that.

    So are the 2 queries functionally equivalent? And therefore no performance difference?

    No difference other than in syntax. What actually gets executed will be the same.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Luis Cazares (8/4/2016)


    With 1.2 and 427 million rows, you really should add an index to those tables. You should start by defining the clustered index and add non-clustered indexes as needed. The clustered index creation might take a while and the table will become unavailable, so take that into consideration.

    If you can't share the tables structure (you could post altered names if possible) and don't feel confident on defining the correct indexes, you should hire a consultant to fix this.

    Thanks for the info. Fortunately this is for an archive type need and isn't something that will be used much (if ever).

    So cost/benefit considerations just wouldn't result in spending more money to optimize these tables.

  • Phil Parkin (8/4/2016)


    No difference other than in syntax. What actually gets executed will be the same.

    Thanks for the clarification. Luis pretty much said that but I wanted to make sure..

  • skippyV (8/4/2016)


    Luis Cazares (8/4/2016)


    With 1.2 and 427 million rows, you really should add an index to those tables. You should start by defining the clustered index and add non-clustered indexes as needed. The clustered index creation might take a while and the table will become unavailable, so take that into consideration.

    If you can't share the tables structure (you could post altered names if possible) and don't feel confident on defining the correct indexes, you should hire a consultant to fix this.

    Thanks for the info. Fortunately this is for an archive type need and isn't something that will be used much (if ever).

    So cost/benefit considerations just wouldn't result in spending more money to optimize these tables.

    At least, consider creating the clustered index on X for both tables, as that's probably a good option even if it's not the best. Trust me, even if it's not going to be used much, you'll prefer that this completes in a few minutes instead of taking hours.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Given that there is no clustered index now, you might as well clus to match this query, something like below. If you have DATA_COMPRESSION available, be sure to use it for tbB at least!

    CREATE CLUSTERED INDEX tbA__CL ON dbo.tbA ( X )

    WITH ( /*DATA_COMPRESSION = ROW,*/ FILLFACTOR = 98 );

    CREATE CLUSTERED INDEX tbB__CL ON dbo.tbB ( X )

    WITH ( /*DATA_COMPRESSION = PAGE,*/ FILLFACTOR = 98, SORT_IN_TEMPDB = ON );

    CREATE CLUSTERED INDEX tbC__CL ON dbo.tbC ( Y, Z )

    WITH ( /*DATA_COMPRESSION = ROW,*/ FILLFACTOR = 98 );

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (8/4/2016)


    Given that there is no clustered index now, you might as well clus to match this query, something like below. If you have DATA_COMPRESSION available, be sure to use it for tbB at least!

    All these fields are nvarchar(50). So when I attempted the index you provided (thank you btw), an error popped up saying 'X' is of a type that is invalid for an index.

    So would I have to add checksum columns for each of those fields and then change the query to use the checksums?

  • skippyV (8/5/2016)


    ScottPletcher (8/4/2016)


    Given that there is no clustered index now, you might as well clus to match this query, something like below. If you have DATA_COMPRESSION available, be sure to use it for tbB at least!

    All these fields are nvarchar(50). So when I attempted the index you provided (thank you btw), an error popped up saying 'X' is of a type that is invalid for an index.

    So would I have to add checksum columns for each of those fields and then change the query to use the checksums?

    Afaik, there should be no problem at all using an nvarchar(50) for an index. Are you using commands, such as I wrote, or some type of gui interface or tool to create the indexes?

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • This is the query I used. I extrapolated because I thought I understand the model your query involved.

    Basically the index must cover each of the fields being compared on.

    So table A and B both have 2 fields being compared on.

    CREATE CLUSTERED INDEX indxCL_tblA_apples_pears ON [MyDB].[dbo].[tblA] ( apples, pears )

    WITH ( /*DATA_COMPRESSION = ROW,*/ FILLFACTOR = 98 );

    CREATE CLUSTERED INDEX indxCL_tblB_apples_oranges ON [MyDB].[dbo].[tblB] ( apples, oranges )

    WITH ( /*DATA_COMPRESSION = PAGE,*/ FILLFACTOR = 98, SORT_IN_TEMPDB = ON );

    CREATE CLUSTERED INDEX indxCL_tblC_oranges_pears ON [MyDB].[dbo].[tblC] ( oranges, pears )

    WITH ( /*DATA_COMPRESSION = ROW,*/ FILLFACTOR = 98 );

    In SQL Server Management Studio, apples and oranges on the 2nd create index - for tblB - is what had the red underline error codes.

    Did I miss the mark?

  • Again, there should be no issue at all indexing on an nvarchar column (although I can't imagine why you'd actually need unicode for that type of data).

    Can you post the actual column definition? Is it a computed column??

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • The error message from the tooltip over either of the red-underlined column names is: "... is of a type that is invalid for use as a key column in a index."

    None of the columns used in the index commands are computed columns.

    And their properties are identical to this http://inky.ws/g/3ii

    Perhaps it's because the second table's column for 'apples' is (nvarchar(max), not null)?

    Instead of (nvarchar(50), not null) as is the 'apples' column for table A.

  • skippyV (8/8/2016)


    The error message from the tooltip over either of the red-underlined column names is: "... is of a type that is invalid for use as a key column in a index."

    None of the columns used in the index commands are computed columns.

    And their properties are identical to this http://inky.ws/g/3ii

    Perhaps it's because the second table's column for 'apples' is (nvarchar(max), not null)?

    Instead of (nvarchar(50), not null) as is the 'apples' column for table A.

    That's exactly the problem. nvarchar(max), varchar(max) and varbinary(max) are invalid as key columns in indexes. Defining all columns as long strings is lazy designing and will cause problems like the one you're facing now. It should be changed to a proper typed design.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply