March 4, 2012 at 7:40 am
I want to join a couple of tables, Logs with 122,000 rows, and AsignedUsers which has 3,000.
If I run this statement,
Select Count (*)
From Logs L
the table's rowcount is 122,622.
But if I run this...
Select Count (*)
From Logs L
Inner join AssignedUsers A
on A.AssignedUsers_UserID = L.UserID
I get 6.84 million! Now, there are no foreign keys defined in this database. Shouldn't an inner join return ONLY the count of matching data rows?
I've also tried changing to a right/left join, with the same result?? Have I overlooked something?
March 4, 2012 at 8:54 am
Check your AssignedUsers table with the following code. I'm betting that the AssignedUsers_UserID has duplicate values in it. If there are any dupes, the following code will find them. If there are no dupes, nothing will be returned.
SELECT AssignedUsers_UserID, COUNT(*)
FROM AssignedUsers
GROUP BY AssignedUsers_UserID
HAVING COUNT(*) > 1
;
--Jeff Moden
Change is inevitable... Change for the better is not.
March 4, 2012 at 2:32 pm
You were right...
AssignedUsers_UserID
291243
271011
30688
31293
Still not sure why it's coming up with 6.84 million rows...
March 4, 2012 at 4:06 pm
What you have there is a partial cross join. If there are duplicate (or more) userIDs on both sides, the resultant row count is much larger than you might expect.
If we try this simple example...
CREATE TABLE t1 (UserID INT)
CREATE TABLE t2 (UserID INT)
INSERT INTO t1 VALUES (1),(1),(2),(2),(3),(3)
INSERT INTO t2 VALUES (1),(1),(1),(1),(2),(2),(2),(2),(3),(3)
and query them
SELECT * FROM t1 INNER JOIN t2 on t1.UserID = t2.UserID
you'll get 20 rows. That's because each of the rows in table 1 matches to each of the rows with the same UserID in table 2. So for UserID 1 there are 8 rows returned (2x4), for UserID 2 there are 8 rows (2x4) and for UserID 3 there are 4 rows (2x2)
Read up on Cartesian Products if you want more info on why this is the correct results.
Now, as for your query, do you have a table that has unique users in it? Perhaps a users table?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 5, 2012 at 9:07 am
Hi Gail,
Indeed I do, and here are the first three columns of all 5 rows (the names have been changed to protect the guilty)
idnameusername
27Mick
29Mack
30Mock
31Muck
32Meckie
The ID values correspond exactly to those repeating throughout the AssignedUsers_UserID column. Now, the good stalkers amongst you will know I posted this on SQLTeam, about a standalone DB whose FK's did not survive the import, now I'm trying to inventorise what info is stored where, by analysing the tables in descending order of rowcount, eventually to get some actionable BI. This is also an excellent intro to database design
(next time I'll know to got STRAIGHT to the Users table!) for me.
I guess the next step is to focus on all the '%ID' columns that look like PK's and determine what relationships existed. I would appreciate any critique of my approach, which I am sure can be refined by more experienced minds.
March 5, 2012 at 9:11 am
Then that's probably the table that you want to be joining to. Any time you have a situation where you join duplicate values to duplicate values you'll get the 'row multiplication' happening.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply