Slight mystery....

Question

Slight mystery....

Jake Shelton

Hall of Fame

Points: 3109
More actions
March 4, 2012 at 7:40 am

#251885

I want to join a couple of tables, Logs with 122,000 rows, and AsignedUsers which has 3,000.
If I run this statement,
Select Count (*)
From Logs L
the table's rowcount is 122,622.
But if I run this...
Select Count (*)
From Logs L
Inner join AssignedUsers A
on A.AssignedUsers_UserID = L.UserID
I get 6.84 million! Now, there are no foreign keys defined in this database. Shouldn't an inner join return ONLY the count of matching data rows?
I've also tried changing to a right/left join, with the same result?? Have I overlooked something?

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004414 More actions · Answer 1

Check your AssignedUsers table with the following code. I'm betting that the AssignedUsers_UserID has duplicate values in it. If there are any dupes, the following code will find them. If there are no dupes, nothing will be returned.

SELECT AssignedUsers_UserID, COUNT(*)

FROM AssignedUsers

GROUP BY AssignedUsers_UserID

HAVING COUNT(*) > 1

;

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jake Shelton Hall of Fame Points: 3109 More actions · Answer 2

You were right...

AssignedUsers_UserID

291243

271011

30688

31293

Still not sure why it's coming up with 6.84 million rows...

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 3

What you have there is a partial cross join. If there are duplicate (or more) userIDs on both sides, the resultant row count is much larger than you might expect.

If we try this simple example...

CREATE TABLE t1 (UserID INT)

CREATE TABLE t2 (UserID INT)

INSERT INTO t1 VALUES (1),(1),(2),(2),(3),(3)

INSERT INTO t2 VALUES (1),(1),(1),(1),(2),(2),(2),(2),(3),(3)

and query them

SELECT * FROM t1 INNER JOIN t2 on t1.UserID = t2.UserID

you'll get 20 rows. That's because each of the rows in table 1 matches to each of the rows with the same UserID in table 2. So for UserID 1 there are 8 rows returned (2x4), for UserID 2 there are 8 rows (2x4) and for UserID 3 there are 4 rows (2x2)

Read up on Cartesian Products if you want more info on why this is the correct results.

Now, as for your query, do you have a table that has unique users in it? Perhaps a users table?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jake Shelton Hall of Fame Points: 3109 More actions · Answer 4

Hi Gail,

Indeed I do, and here are the first three columns of all 5 rows (the names have been changed to protect the guilty)

idnameusername

27Mick

29Mack

30Mock

31Muck

32Meckie

The ID values correspond exactly to those repeating throughout the AssignedUsers_UserID column. Now, the good stalkers amongst you will know I posted this on SQLTeam, about a standalone DB whose FK's did not survive the import, now I'm trying to inventorise what info is stored where, by analysing the tables in descending order of rowcount, eventually to get some actionable BI. This is also an excellent intro to database design

(next time I'll know to got STRAIGHT to the Users table!) for me.

I guess the next step is to focus on all the '%ID' columns that look like PK's and determine what relationships existed. I would appreciate any critique of my approach, which I am sure can be refined by more experienced minds.

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 5

Then that's probably the table that you want to be joining to. Any time you have a situation where you join duplicate values to duplicate values you'll get the 'row multiplication' happening.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass