Joins clarification

Question

Joins clarification

NikosV

SSCrazy

Points: 2111
More actions
August 13, 2018 at 6:47 am

#371085

Let's say I have Table A and Table B.
Table A can join onto Table B because both tables have a common column. Let's call this column PolicyNumber. However, Table B has multiple records of the same PolicyNumber, this is because each record with X PolicyNumber may differ on other details in Table B so basically each row is distinct.
Why is it when I join Table A onto Table B on the PolicyNumber, my overall result comes back wrong?
So what I'm asking is, when I perform a join onto a common column and TableB has multiple occurrences of the same PolicyNumber, will this affect my result and why?
Since I'm joining two common columns, I can't yet see why this ruins my overall result because of the actual values under PolicyNumber.I obviously haven't understood exactly how joins work yet but I have looked up many tutorials and they just explain what each join does. What I need is a clarification on what I asked if possible and also a recommendation on how to approach understanding how joins behave.
Thanks

Viewing 15 posts - 1 through 15 (of 31 total)

You must be logged in to reply to this topic. Login to reply

laurie-789651 SSCertifiable Points: 7680 More actions · Answer 1

NikosV - Monday, August 13, 2018 6:47 AM
Why is it when I join Table A onto Table B on the PolicyNumber, my overall result comes back wrong?

When you say it's wrong what do you mean? Also, what type of join is it?

If you gave an example with what you expect the results to be, that would be useful.

Eirikur Eiriksson SSC Guru Points: 182912 More actions · Answer 2

NikosV - Monday, August 13, 2018 6:47 AM
Let's say I have Table A and Table B.
Table A can join onto Table B because both tables have a common column. Let's call this column PolicyNumber. However, Table B has multiple records of the same PolicyNumber, this is because each record with X PolicyNumber may differ on other details in Table B so basically each row is distinct.
Why is it when I join Table A onto Table B on the PolicyNumber, my overall result comes back wrong?
So what I'm asking is, when I perform a join onto a common column and TableB has multiple occurrences of the same PolicyNumber, will this affect my result and why?
Since I'm joining two common columns, I can't yet see why this ruins my overall result because of the actual values under PolicyNumber.I obviously haven't understood exactly how joins work yet but I have looked up many tutorials and they just explain what each join does. What I need is a clarification on what I asked if possible and also a recommendation on how to approach understanding how joins behave.
Thanks

In simple terms, if the cardinality of the common columns in the two tables is different then you will get a multiplication of the two and the outcome will be SUM(A(N) x B(N)) where N is the instance count of each distinct value in each table.
😎

This will of course offset any calculation on the table wich only has distinct values on the common column.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 3

NikosV - Monday, August 13, 2018 6:47 AM
Let's say I have Table A and Table B.
Table A can join onto Table B because both tables have a common column. Let's call this column PolicyNumber. However, Table B has multiple records of the same PolicyNumber, this is because each record with X PolicyNumber may differ on other details in Table B so basically each row is distinct.
Why is it when I join Table A onto Table B on the PolicyNumber, my overall result comes back wrong?
So what I'm asking is, when I perform a join onto a common column and TableB has multiple occurrences of the same PolicyNumber, will this affect my result and why?
Since I'm joining two common columns, I can't yet see why this ruins my overall result because of the actual values under PolicyNumber.I obviously haven't understood exactly how joins work yet but I have looked up many tutorials and they just explain what each join does. What I need is a clarification on what I asked if possible and also a recommendation on how to approach understanding how joins behave.
Thanks

When you say it comes back wrong, what do you mean? We can't see what you see so it makes it harder to know what is going on.

NikosV SSCrazy Points: 2111 More actions · Answer 4

Thanks for that. Do you know of any websites where I can learn a bit more than the basics about joins? Also, what you described is what my problem is. If I wanted one row from Table I would probably have to use ROW_NUMBER with a WHERE clause or something. This makes most sense I've read up on Joins but it doesn't look it'll work with a standard join.

sgmunson SSC Guru Points: 110640 More actions · Answer 5

NikosV - Monday, August 13, 2018 6:47 AM
Let's say I have Table A and Table B.
Table A can join onto Table B because both tables have a common column. Let's call this column PolicyNumber. However, Table B has multiple records of the same PolicyNumber, this is because each record with X PolicyNumber may differ on other details in Table B so basically each row is distinct.
Why is it when I join Table A onto Table B on the PolicyNumber, my overall result comes back wrong?
So what I'm asking is, when I perform a join onto a common column and TableB has multiple occurrences of the same PolicyNumber, will this affect my result and why?
Since I'm joining two common columns, I can't yet see why this ruins my overall result because of the actual values under PolicyNumber.I obviously haven't understood exactly how joins work yet but I have looked up many tutorials and they just explain what each join does. What I need is a clarification on what I asked if possible and also a recommendation on how to approach understanding how joins behave.
Thanks

In simple terms, if Table A has exactly one row for a given Policy Number, and Table B has, say 5 rows, then any of the JOIN types will give you 5 rows, and the values for Table A will be repeated in the result set 5 times, while values from Table B will be the distinct values from it. If Table A instead had 2 rows for the same Policy Number, then the total number or resulting rows would be 10. Thus, if you want a single row result for a given policy number, then you need to be "selective" by including some form of restriction on the JOIN. As an example, let's say that the values in Table B require that you get the most recently dated row, based on the UpdatedDate column. Then you would need to either pre-group the rows by policy number and then take the max UpdatedDate value, and then use that CTE to join to Table A. On the other hand, if you just needed the one row from Table B where a given Status column had some specific value, and you know that only 1 such row for any given policy number can exist with that specific status value, then you could just include an additional condition on the JOIN to Table B that requires the Status column in Table B to have that value. The idea, of course, in both cases, is to restrict the total number of rows to just one for a given policy number. If you use an INNER JOIN, then to get a resulting row, then both tables MUST have a qualifying record. If you use a LEFT OUTER JOIN, then Table B will not need to have a matching value in order for rows from Table A to return values. That's the basics, for the most part, for INNER and OUTER joins...

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 6

Why not post the DDL (CREATE TABLE statement) for the tables, some sample date (not production data) for the tables as INSERT INTO statements, and the code you are running that shows the problem with the results based on that set of sample data.

laurie-789651 SSCertifiable Points: 7680 More actions · Answer 7

NikosV - Monday, August 13, 2018 7:49 AM
Thanks for that. Do you know of any websites where I can learn a bit more than the basics about joins?

If you search on this website using "joins" & select "Articles" there is everything from the basics upwards.

Eirikur Eiriksson SSC Guru Points: 182912 More actions · Answer 8

laurie-789651 - Monday, August 13, 2018 8:07 AM
NikosV - Monday, August 13, 2018 7:49 AM
Thanks for that. Do you know of any websites where I can learn a bit more than the basics about joins?
If you search on this website using "joins" & select "Articles" there is everything from the basics upwards.

Further on this, I would suggest researching into the "NaÃ¯ve set theory", get a grasp of that and it makes it easier to understand the problem.
😎

Michael L John One Orange Chip Points: 27056 More actions · Answer 9

NikosV - Monday, August 13, 2018 7:49 AM
Thanks for that. Do you know of any websites where I can learn a bit more than the basics about joins? Also, what you described is what my problem is. If I wanted one row from Table I would probably have to use ROW_NUMBER with a WHERE clause or something. This makes most sense I've read up on Joins but it doesn't look it'll work with a standard join.

Yes, this site.
http://www.sqlservercentral.com/stairway/

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/

sgmunson SSC Guru Points: 110640 More actions · Answer 10

One additional note on JOINs... Once you grasp the basics, you can start using GROUP BY to restrict the rows that are returned, and that will then allow you to create groups and subgroups in your data, and perform "aggregate" operations, like SUM, COUNT, MIN, MAX, etc. In some cases, you may want to restrict which groups you get back, so you can use a HAVING clause, which can operate very much like a WHERE clause, just at the group level instead. Typically, aggregates are used in the HAVING clause, but not always. It's usually so that one can be selective about the aggregates that get returned. There is also a TOP clause you can add to your select to limit the total number of rows returned to a specific number. You usually specify an ORDER BY clause at the end of a query that uses TOP so as to guarantee order, so that perhaps you could select the top 3 regions by sales dollar volume, for example.

Eventually, you can start using queries as subqueries or as CTEs, and start to put together a total query as you might put together Lego blocks...

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

NikosV SSCrazy Points: 2111 More actions · Answer 11

sgmunson - Monday, August 13, 2018 7:54 AM
NikosV - Monday, August 13, 2018 6:47 AM
Let's say I have Table A and Table B.
Table A can join onto Table B because both tables have a common column. Let's call this column PolicyNumber. However, Table B has multiple records of the same PolicyNumber, this is because each record with X PolicyNumber may differ on other details in Table B so basically each row is distinct.
Why is it when I join Table A onto Table B on the PolicyNumber, my overall result comes back wrong?
So what I'm asking is, when I perform a join onto a common column and TableB has multiple occurrences of the same PolicyNumber, will this affect my result and why?
Since I'm joining two common columns, I can't yet see why this ruins my overall result because of the actual values under PolicyNumber.I obviously haven't understood exactly how joins work yet but I have looked up many tutorials and they just explain what each join does. What I need is a clarification on what I asked if possible and also a recommendation on how to approach understanding how joins behave.
Thanks
In simple terms, if Table A has exactly one row for a given Policy Number, and Table B has, say 5 rows, then any of the JOIN types will give you 5 rows, and the values for Table A will be repeated in the result set 5 times, while values from Table B will be the distinct values from it. If Table A instead had 2 rows for the same Policy Number, then the total number or resulting rows would be 10. Thus, if you want a single row result for a given policy number, then you need to be "selective" by including some form of restriction on the JOIN. As an example, let's say that the values in Table B require that you get the most recently dated row, based on the UpdatedDate column. Then you would need to either pre-group the rows by policy number and then take the max UpdatedDate value, and then use that CTE to join to Table A. On the other hand, if you just needed the one row from Table B where a given Status column had some specific value, and you know that only 1 such row for any given policy number can exist with that specific status value, then you could just include an additional condition on the JOIN to Table B that requires the Status column in Table B to have that value. The idea, of course, in both cases, is to restrict the total number of rows to just one for a given policy number. If you use an INNER JOIN, then to get a resulting row, then both tables MUST have a qualifying record. If you use a LEFT OUTER JOIN, then Table B will not need to have a matching value in order for rows from Table A to return values. That's the basics, for the most part, for INNER and OUTER joins...

This is actually a brilliant explanation. I believe I've understood exactly what you're trying to tell me. It makes perfect sense.
Let me just make sure I've got the hang of it because 95% of my work is SQL and it's quite a steep learning curve.

I've understood that in cases where I have multiple records of the same value, in order to perform a Join I have to narrow down or as you said, be selective in order to 'force' the system as it were, to bring me one record back. In other words, a working Join presumes that we are joining ONE value (PolicyNumber from Table A) to ONE value (PolicyNumber in Table 2). I'm not sure if I'm making any sense but I really need to get a hang of this.

Say I say this:

SELECT * FROM Server.Schema.TableA A LEFT OUTER JOIN Server.Schema.TableB B ON A.PolicyNumber = B.PolicyNumber

A has one record with PolicyNumber 012345 and B has 5 records of 012345. We said this would return 5 records, but I want one. So I would HAVE TO find a value in TableB like for example:

SELECT * FROM Server.Schema.TableA A LEFT OUTER JOIN Server.Schema.TableB B ON A.PolicyNumber = B.PolicyNumber

WHERE B.LayerNumber = 0 GROUP BY A.PolicyNumber

OR

SELECT MIN(B.LayerNumber), PolicyNumber FROM Server.Schema.TableA A LEFT OUTER JOIN Server.Schema.TableB B ON A.PolicyNumber = B.PolicyNumber GROUP BY A.PolicyNumber

that would fetch the PolicyNumber WITH a LayerNumber of say 0, for which Table B has one record only.

Would this approach correct my problem?

The next thing I want to make sure of is this:

When I perform a Join between TableA and TableB on the PolicyNumber, what happens under the hood? Apart from the database engine doing what it normally does, when I say ON A.PolicyNumber = B.PolicyNumber is the systems approach sort of like "OK, I have a PolicyNumber 012345 from A and I'm going to look for PolicyNumber 012345 in B and match them" or does it say, "I've got a column in A called PolicyNumber and I'm going to look for a column in B called PolicyNumber and marry the two columns". In other words, does it have a ByValue behaviour or a ByColumn behaviour?

Just some guidance on the above and I'll give it a jab at work tomorrow. Gawd I really hope my explanations are clear.

andycadley SSCertifiable Points: 5296 More actions · Answer 12

NikosV - Monday, August 13, 2018 1:55 PM
In other words, a working Join presumes that we are joining ONE value (PolicyNumber from Table A) to ONE value (PolicyNumber in Table 2). I'm not sure if I'm making any sense but I really need to get a hang of this.

A "working" join doesn't necessarily need to end up with a 1:1 relationship between rows of joined tables. Sometimes that's what you want, in which case you arrange the joining conditions to satisfy that requirement, but a lot of times you want to multiply up rows because that allows you to accomplish things that a purely 1:1 relationship would not.

sgmunson SSC Guru Points: 110640 More actions · Answer 13

NikosV - Monday, August 13, 2018 1:55 PM
sgmunson - Monday, August 13, 2018 7:54 AM
NikosV - Monday, August 13, 2018 6:47 AM
Let's say I have Table A and Table B.
Table A can join onto Table B because both tables have a common column. Let's call this column PolicyNumber. However, Table B has multiple records of the same PolicyNumber, this is because each record with X PolicyNumber may differ on other details in Table B so basically each row is distinct.
Why is it when I join Table A onto Table B on the PolicyNumber, my overall result comes back wrong?
So what I'm asking is, when I perform a join onto a common column and TableB has multiple occurrences of the same PolicyNumber, will this affect my result and why?
Since I'm joining two common columns, I can't yet see why this ruins my overall result because of the actual values under PolicyNumber.I obviously haven't understood exactly how joins work yet but I have looked up many tutorials and they just explain what each join does. What I need is a clarification on what I asked if possible and also a recommendation on how to approach understanding how joins behave.
Thanks
In simple terms, if Table A has exactly one row for a given Policy Number, and Table B has, say 5 rows, then any of the JOIN types will give you 5 rows, and the values for Table A will be repeated in the result set 5 times, while values from Table B will be the distinct values from it. If Table A instead had 2 rows for the same Policy Number, then the total number or resulting rows would be 10. Thus, if you want a single row result for a given policy number, then you need to be "selective" by including some form of restriction on the JOIN. As an example, let's say that the values in Table B require that you get the most recently dated row, based on the UpdatedDate column. Then you would need to either pre-group the rows by policy number and then take the max UpdatedDate value, and then use that CTE to join to Table A. On the other hand, if you just needed the one row from Table B where a given Status column had some specific value, and you know that only 1 such row for any given policy number can exist with that specific status value, then you could just include an additional condition on the JOIN to Table B that requires the Status column in Table B to have that value. The idea, of course, in both cases, is to restrict the total number of rows to just one for a given policy number. If you use an INNER JOIN, then to get a resulting row, then both tables MUST have a qualifying record. If you use a LEFT OUTER JOIN, then Table B will not need to have a matching value in order for rows from Table A to return values. That's the basics, for the most part, for INNER and OUTER joins...
This is actually a brilliant explanation. I believe I've understood exactly what you're trying to tell me. It makes perfect sense.
Let me just make sure I've got the hang of it because 95% of my work is SQL and it's quite a steep learning curve.
I've understood that in cases where I have multiple records of the same value, in order to perform a Join I have to narrow down or as you said, be selective in order to 'force' the system as it were, to bring me one record back. In other words, a working Join presumes that we are joining ONE value (PolicyNumber from Table A) to ONE value (PolicyNumber in Table 2). I'm not sure if I'm making any sense but I really need to get a hang of this.
Say I say this:
SELECT * FROM Server.Schema.TableA A LEFT OUTER JOIN Server.Schema.TableB B ON A.PolicyNumber = B.PolicyNumber
A has one record with PolicyNumber 012345 and B has 5 records of 012345. We said this would return 5 records, but I want one. So I would HAVE TO find a value in TableB like for example:
SELECT * FROM Server.Schema.TableA A LEFT OUTER JOIN Server.Schema.TableB B ON A.PolicyNumber = B.PolicyNumber
WHERE B.LayerNumber = 0 GROUP BY A.PolicyNumber
OR
SELECT MIN(B.LayerNumber), PolicyNumber FROM Server.Schema.TableA A LEFT OUTER JOIN Server.Schema.TableB B ON A.PolicyNumber = B.PolicyNumber GROUP BY A.PolicyNumber
that would fetch the PolicyNumber WITH a LayerNumber of say 0, for which Table B has one record only.
Would this approach correct my problem?
The next thing I want to make sure of is this:
When I perform a Join between TableA and TableB on the PolicyNumber, what happens under the hood? Apart from the database engine doing what it normally does, when I say ON A.PolicyNumber = B.PolicyNumber is the systems approach sort of like "OK, I have a PolicyNumber 012345 from A and I'm going to look for PolicyNumber 012345 in B and match them" or does it say, "I've got a column in A called PolicyNumber and I'm going to look for a column in B called PolicyNumber and marry the two columns". In other words, does it have a ByValue behaviour or a ByColumn behaviour?
Just some guidance on the above and I'll give it a jab at work tomorrow. Gawd I really hope my explanations are clear.

Let's first worry about the JOIN condition you specified. You put the Table B column in the WHERE clause, which just turned it into an INNER JOIN... As you had specified a LEFT OUTER JOIN, that indicated you wanted all rows from Table A, and only the matching rows from Table B, based on a Policy Number match, but you're actual matching condition is to have a Layer of 0, so that needs to be part of the JOIN condition as opposed to the WHERE clause. What you would end up with is all rows in Table A and only those with matching rows in B where the layer is 0. You may actually want that, but what you were trying to ask for was all the rows in A, AND bring in the info from Table B only on those rows that have a matching policy number AND a layer of 0. Did that make sense?

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

Eirikur Eiriksson SSC Guru Points: 182912 More actions · Answer 14

andycadley - Monday, August 13, 2018 2:05 PM
NikosV - Monday, August 13, 2018 1:55 PM
In other words, a working Join presumes that we are joining ONE value (PolicyNumber from Table A) to ONE value (PolicyNumber in Table 2). I'm not sure if I'm making any sense but I really need to get a hang of this.
A "working" join doesn't necessarily need to end up with a 1:1 relationship between rows of joined tables. Sometimes that's what you want, in which case you arrange the joining conditions to satisfy that requirement, but a lot of times you want to multiply up rows because that allows you to accomplish things that a purely 1:1 relationship would not.

He he, think of dbo.Me joining dbo.Girlfriends, any kind of aggregation when joining the two would end up in an explosive situation😀
😎