August 22, 2014 at 7:42 am
Hi all,
Can someone help explain to me how when I have 12 rows in table A and 10 in B and I do an inner join , I would get more rows than
in both A and B ?
Same with left and right joins...
This is just a simplified example. Let me share one of my issues with you
I have 2 views ; which was originally SQL on 2 base tables Culture and Trials.
And then when attempting to add another table Culture Steps, one of the team members separated the SQL into 2 views
Since this produces an error when updating(modification cannot be done as it affects multiple base tables), I would like to get
back to changing the SQL such that I no longer use the views but achieve the same results.
One of the views has
SELECT some columns
FROM dbo.Culture RIGHT JOIN
dbo.Trial ON dbo.Culture.cultureID = dbo.Trial.CultureID LEFT OUTER JOIN
dbo.TrialCultureSteps_view_part1 ON dbo.Culture.cultureID = dbo.TrialCultureSteps_view_part1.cultureID
The other TrialCultureSteps_view_part1 view
SELECT DISTINCT dbo.Culture.cultureID,
(SELECT TOP (1) WeekNr
FROM dbo.CultureStep
WHERE (CultureID = dbo.Culture.cultureID)
ORDER BY CultureStepID) AS normalstartweek
FROM dbo.Culture INNER JOIN
dbo.CultureStep AS CultureStep_1 ON dbo.Culture.cultureID = CultureStep_1.CultureID
So how can I combine the joins the achieve the same results using SQL only on tables without the need for views?
August 22, 2014 at 8:26 am
Quick thought, could this be caused by Cartesian Product, see the example?
π
USE tempdb;
GO
DECLARE @TBL01 TABLE
(
T_ID INT NOT NULL
);
DECLARE @TBL02 TABLE
(
T_ID INT NOT NULL
);
INSERT INTO @TBL01 (T_ID)
VALUES (1),(2),(3),(4),(5),(6),(5),(6),(7),(8);
INSERT INTO @TBL02 (T_ID)
VALUES (1),(2),(3),(4),(5),(5),(6),(6),(7),(8),(9),(10);
SELECT
*
FROM @TBL01 T1
INNER JOIN @TBL02 T2
ON T1.T_ID = T2.T_ID;
Results
T_ID T_ID
----------- ------
1 1
2 2
3 3
4 4
5 5
5 5
6 6
6 6
5 5
5 5
6 6
6 6
7 7
8 8
August 22, 2014 at 8:32 am
I don't think that's a Cartesian product. If it were, you would have 89 (8*10) rows of ouput.
A CROSS JOIN would give you a Cartesian Product
August 22, 2014 at 8:33 am
Please post CREATE TABLE statements for all the tables involved, some sample data for each as INSERT INTO statements, and the output you want to see from your join.
August 22, 2014 at 8:57 am
Do you have enforced referential constraints on the columns you're joining on? Does the data accurately map between the two tables, or is it possible that you're getting multiple matches for your JOIN criteria? That would be the most likely reason for more rows is that more values match the JOIN criteria for any given row.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
August 22, 2014 at 9:11 am
gbritton1 (8/22/2014)
I don't think that's a Cartesian product. If it were, you would have 89 (8*10) rows of ouput.A CROSS JOIN would give you a Cartesian Product
A CROSS JOIN would produce a full Cartesian Product, equal duplicate values will produce partial Cartesian Product, regardless of the join type.
π
August 22, 2014 at 9:38 am
Eirikur Eiriksson (8/22/2014)
gbritton1 (8/22/2014)
I don't think that's a Cartesian product. If it were, you would have 89 (8*10) rows of ouput.A CROSS JOIN would give you a Cartesian Product
A CROSS JOIN would produce a full Cartesian Product, equal duplicate values will produce partial Cartesian Product, regardless of the join type.
π
Another of Jeff's articles to the rescue:
http://www.sqlservercentral.com/articles/T-SQL/61539/[/url]
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 22, 2014 at 10:16 am
Thank you all for your replies π
Before copying any create or insert data, here is an example
Select * from Culture --2327 rows
Select * from Trial --6793 rows
Select * from CultureStep --13957 rows
Select * from Culture inner join Trial on Culture.CultureID = Trial.CultureID --6785
Select * from (Culture inner join Trial on Culture.CultureID = Trial.CultureID) inner join CultureStep
on Culture.CultureID=CultureStep.CultureID --39661
How come when the first join yields 6785 rows when inner joined with culturestep which has 13957 rows , this would yield row number greater than both (39661)?!
I am not very good at this , kindly explain with details and/or refer me to a good example to clear all kinds of joins and help me understand what results to expect π
August 22, 2014 at 10:32 am
shaimaa.tarekelshoeiby (8/22/2014)
Thank you all for your replies πBefore copying any create or insert data, here is an example
Select * from Culture --2327 rows
Select * from Trial --6793 rows
Select * from CultureStep --13957 rows
Select * from Culture inner join Trial on Culture.CultureID = Trial.CultureID --6785
Select * from (Culture inner join Trial on Culture.CultureID = Trial.CultureID) inner join CultureStep
on Culture.CultureID=CultureStep.CultureID --39661
How come when the first join yields 6785 rows when inner joined with culturestep which has 13957 rows , this would yield row number greater than both (39661)?!
I am not very good at this , kindly explain with details and/or refer me to a good example to clear all kinds of joins and help me understand what results to expect π
To me it looks like a many-to-many table without a filter, don't think it is a coincidence that the cardinality of Trial = CultureStep / 2 and CultureStep = Culture x 6
π
Quick question, can you provide either an ERD or a create table script for the three tables?
August 23, 2014 at 2:55 am
Culture
CREATE TABLE [dbo].[Culture](
[cultureID] [int] IDENTITY(1,1) NOT NULL,
[CultureName] [nvarchar](50) NOT NULL,
[Process] [nvarchar](50) NULL,
[Cropspecialist] [nvarchar](50) NULL,
[Requestor] [nvarchar](50) NULL,
[CuttingsOrSeed] [nvarchar](50) NULL,
[EndLocation] [nvarchar](50) NULL,
[StartWeek] [int] NULL,
[StartWeekDeviation] [int] NULL,
[DeliveryWeek] [int] NULL,
[DeliveryWeekDeviation] [int] NULL,
[Purpose] [nvarchar](50) NULL,
[Crop] [nvarchar](50) NULL,
[EntryVariation?] [nvarchar](50) NULL,
[CultureRemarks1] [nvarchar](255) NULL,
[CultureRemarks2] [nvarchar](255) NULL,
[extra] [int] NULL,
[planningGroupID1] [int] NULL,
[extra2] [int] NULL,
[extra3] [int] NULL,
[croptype1] [nvarchar](255) NULL,
[FLS/VEG] [nvarchar](255) NULL,
[Department] [nvarchar](255) NULL,
[subProcess] [nvarchar](255) NULL,
[timeregCode] [nvarchar](255) NULL,
[createdBy] [nvarchar](255) NULL,
[dateCreated] [nvarchar](255) NULL,
CONSTRAINT [aaaaaCulture_PK] PRIMARY KEY NONCLUSTERED
Culture Step:
CREATE TABLE [dbo].[CultureStep](
[CultureStepID] [int] IDENTITY(1,1) NOT NULL,
[CultureID] [int] NULL,
[EntryVarNr] [int] NULL,
[WeekNr] [int] NULL,
[stepordernr] [int] NULL,
[days] [int] NULL,
[plants per entry] [int] NULL,
[plants per fust] [decimal](18, 1) NULL,
[extraM2] [int] NULL,
[fustPerM2(Manual)] [float] NULL,
[SoilType] [nvarchar](255) NULL,
[covering soil] [nvarchar](255) NULL,
[watering] [nvarchar](255) NULL,
[fungicide treatm after sowing] [nvarchar](255) NULL,
[name Fungicide] [nvarchar](255) NULL,
[conc Fungiside] [nvarchar](255) NULL,
[type of Hood] [nvarchar](255) NULL,
[Pincing?] [nvarchar](50) NULL,
[temperature dayNight C] [nvarchar](255) NULL,
[Neg diff] [nvarchar](255) NULL,
[Cold drop] [nvarchar](255) NULL,
[artificial light (y/n)] [nvarchar](255) NULL,
[daylength neutral (yes/no)] [nvarchar](255) NULL,
[daylength] [nvarchar](255) NULL,
[standard PGR treatment?] [nvarchar](255) NULL,
[name PGR] [nvarchar](255) NULL,
[conc PGR] [nvarchar](255) NULL,
[Spraying Schedule PGR] [nvarchar](255) NULL,
[PH] [nvarchar](255) NULL,
[EC] [nvarchar](255) NULL,
[EC in drench water] [nvarchar](255) NULL,
[fertilisation schedule vegetative] [nvarchar](255) NULL,
[fertilisation schedule generative] [nvarchar](255) NULL,
[remarks fertilisation] [nvarchar](255) NULL,
[Common diseases] [nvarchar](255) NULL,
[remarks herbicides] [nvarchar](255) NULL,
[remarks fungicides] [nvarchar](255) NULL,
[prefLocation] [nvarchar](255) NULL,
[year] [int] NULL,
[dummy] [nvarchar](50) NULL,
[plants/hour] [int] NULL,
[EntryFactor] [decimal](18, 2) NULL,
[fustID1] [int] NULL,
[Actionid1] [int] NULL,
[CultureStepNameID1] [int] NULL,
[daynr] [int] NULL,
[endweekmanual] [int] NULL,
[CultureStepRemarks] [nvarchar](255) NULL,
CONSTRAINT [aaaaaCultureStep_PK] PRIMARY KEY NONCLUSTERED
Trial:
CREATE TABLE [dbo].[Trial](
[TrialID] [int] IDENTITY(1,1) NOT NULL,
[TrialCode] [nvarchar](50) NOT NULL,
[ActualNrOfTotalEntries] [int] NULL,
[ActualNrOfEntriesVarNr1] [int] NULL,
[ActualNrOfEntriesVarNr2] [int] NULL,
[ActualNrOfEntriesVarNr3] [int] NULL,
[ActualNrOfEntriesVarNr4] [int] NULL,
[AtualStartweek] [int] NULL,
[PlannedStartWeek] [int] NULL,
[ActualStartYear] [nvarchar](50) NULL,
[CultureID] [int] NULL,
[weekOfTrialrequest] [int] NULL,
[yearOfTrialrequest] [nvarchar](50) NULL,
[remarks1] [nvarchar](200) NULL,
[remarks2] [nvarchar](200) NULL,
[RequestedNrOfTotalEntries] [int] NULL,
[RequestedStartweek] [int] NULL,
[RequestedNrOfEntriesVarNr1] [int] NULL,
[RequestedNrOfEntriesVarNr2] [int] NULL,
[RequestedNrOfEntriesVarNr3] [int] NULL,
[RequestedNrOfEntriesVarNr4] [int] NULL,
[plannedRequest] [nvarchar](255) NULL,
[trialCodeGroup] [nvarchar](255) NULL,
[OrigRequestedNrOfTotalEntries] [int] NULL,
[OrigRequestedStartweek] [int] NULL,
[OrigRequestedNrOfEntriesVarNr1] [int] NULL,
[OrigRequestedNrOfEntriesVarNr2] [int] NULL,
[OrigRequestedNrOfEntriesVarNr3] [int] NULL,
[OrigRequestedNrOfEntriesVarNr4] [int] NULL,
[PlanningAproved] [nvarchar](255) NULL,
[EntryFactorTrial] [float] NULL,
[FirstStepWithEntryFactor] [float] NULL,
[ReasonEntryFactor] [nvarchar](100) NULL,
[RequestedEntryFacorTrial] [float] NULL,
[RequestedFirstStepWithEntryFactor] [float] NULL,
[weekCorrection] [int] NULL,
[StepWCorr] [int] NULL,
[M2Correction] [int] NULL,
[StepM2Corr] [int] NULL,
[fixedDeliveryWeek] [int] NULL,
[createdBy] [nvarchar](255) NULL,
[dateCreated] [nvarchar](255) NULL,
[trialevaluation] [nvarchar](255) NULL,
CONSTRAINT [aaaaaTrial_PK] PRIMARY KEY NONCLUSTERED
August 23, 2014 at 4:29 am
Quick analysis
π
First a simplified ERD
+--------------+ +--------------+ +-----------------+
| Trial | | Culture | | CultureStep |
+--------------+ +--------------+ +-----------------+
| TrialID | ,---|-| CultureID |-|--, | CultureStepID |
| TrialCode | | | CultureName | '---|<| CultureID |
| CultureID |>|--' | (Culture ) | | ( CultureStep ) |
| (Trial ) | | (attributes) | | ( attributes ) |
| (attributes) | +--------------+ +-----------------+
+--------------+
Looking at the ERD, we can tell that:
Each Trial has one and only one Culture.
One or more Trials can share the same Culture.
And
Each Culture has one or more CultureSteps.
One or more Cultures can share one or more CultureSteps.
Inspecting the Cardinality
Table | Row Count
------------|-----------
Trial | 6793
Culture | 2327
CultureStep | 13957
As the query "Select * from Culture inner join Trial on Culture.CultureID = Trial.CultureID" returns 6785 rows, we know that there are 8 (6793 - 6785) Trials sharing the same Culture. We can also tell that each Culture has the average of 6 CultureSteps (13957/2327 = 5.998).
Since we do not know the distribution of CultureID in the CultureStep table, the exact numbers cannot be produced but here is an approxymation:
Select * from (Culture inner join Trial on Culture.CultureID = Trial.CultureID) inner join CultureStep on Culture.CultureID=CultureStep.CultureID
Given that each CultureID appears aprox. 6 times in the CultureStep and the set Culture-Trial has 6785 entries, the expected result should have close to 6 x 6785 entries.
August 25, 2014 at 7:03 am
Thank you for the excellent post .
Can you please help me become capable of doing such analysis?
How were you able to make the simplified ERD and come up with the conclusions under it
I want to learn this and be good at it ...
Also , need to be good with joins and although I did read out several links with examples ...I still seem to find issues like when my brain cannot digest how an inner join would get more rows than the individual selects and so on.
August 25, 2014 at 7:25 am
shaimaa.tarekelshoeiby (8/25/2014)
Thank you for the excellent post .Can you please help me become capable of doing such analysis?
How were you able to make the simplified ERD and come up with the conclusions under it
I want to learn this and be good at it ...
Also , need to be good with joins and although I did read out several links with examples ...I still seem to find issues like when my brain cannot digest how an inner join would get more rows than the individual selects and so on.
You are very welcome.
My suggestion is to grab a good book on the subject, i.e. "Database Modeling and Design" or "Data Modeling for the Business" and while reading through it, use only simple tools such as pen/paper or a text editor to work through exercises and examples. Also read up on the "Relational Database Theory", "Naive Set Theory" and "Relational Algebra". This should get you well on the way.
Another thing I find helpful, especially when tackling very complex problems, is to practice solving puzzles like Soduku by memory (not using pen and paper), keeps the grey matter functional;-)
π
August 25, 2014 at 7:55 am
Thank you ...
But with so limited time , what can I do to help me go through until I can have more time to read the book(s) you suggested and use pen/paper?
Any recommended set of articles or a small daily exercise ; something of the sort? π
August 25, 2014 at 8:00 am
There's no instant or magic way to build this kind of knowledge. It takes a long time. I'd add getting a copy of the book T-SQL Querying Fundamentals by Itzik Ben Gan. It's a great way to learn.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
Viewing 15 posts - 1 through 15 (of 22 total)
You must be logged in to reply to this topic. Login to reply