August 10, 2010 at 10:04 am
Hi,
I have taken over the role of managing a database and there are some queries being executed that I need to understand before I make any changes.
Could someone with some experience take a look at this query and confirm what it actually does?
What I want it to do is to check the data in the wce_contact table against the data in the wce_ilr table and where a stu_id that does not exist in the contact table but does in the wce_ilr table, I want to copy those rows over to the contact table. It’s a duplication checker only new stu_id’s to be moved over.
I’m not sure that this query does that correctly or not. Thanks for looking.
with cteDelegates(stu_id, Student_Forename, middlename, Surname, Date_of_Birth, Sex, NI_Number, Learning_Difficulty, Learning_Diff, Disability, Ethnicity, Student_Address_1, Student_Address_2, Student_Address_3, Student_Address_4,
Student_Postcode, Telephone, Email_Address, Mobile_Number, STEN_Employer_Identifier, Date_Created, rown) as(Select wce_ilr.stu_id, wce_ilr.Student_Forename, wce_ilr.middlename, wce_ilr.Surname, wce_ilr.Date_of_Birth, wce_ilr.Sex, wce_ilr.NI_Number, wce_ilr.Learning_Difficulty, wce_ilr.Learning_Diff, wce_ilr.Disability, wce_ilr.Ethnicity, wce_ilr.Student_Address_1, wce_ilr.Student_Address_2, wce_ilr.Student_Address_3, wce_ilr.Student_Address_4,
wce_ilr.Student_Postcode, wce_ilr.Telephone, wce_ilr.Email_Address, wce_ilr.Mobile_Number, wce_ilr.STEN_Employer_Identifier,wce_ilr.Date_Created, ROW_NUMBER() OVER (PARTITION BY wce_ilr.stu_id ORDER BY wce_ilr.stu_id) from wce_ilr left join wce_contact
on WCE_ILR.stu_id =wce_contact.stu_id
where WCE_contact.stu_id is null)
INSERT INTO wce_contact (UNIQUEID,contact,firstname,middle_name,lastname,stu_id,dob,
gender,national_insurance_no,Learning_Difficulties,Learning_Diff,Disability,ethnicity,
address1,address2,address3,city,postalcode,phone,emailaddress,mobilephone,STEN_Employer_Identifier,record_type, Date_Created)
Select stu_id, Student_Forename + ' ' + Surname, Student_Forename, middlename,
Surname, stu_id, Date_of_Birth, Sex, NI_Number, Learning_Difficulty, Learning_Diff, Disability, Ethnicity, Student_Address_1, Student_Address_2, Student_Address_3, Student_Address_4, Student_Postcode, Telephone, Email_Address, Mobile_Number, STEN_Employer_Identifier, 'Learner', Date_Created from cteDelegates where rown = 1 order by Student_Forename
August 10, 2010 at 2:01 pm
It is, but it could be a lot simpler:-
INSERT INTO wce_contact (UNIQUEID,contact,firstname,middle_name,lastname,stu_id,dob,
gender,national_insurance_no,Learning_Difficulties,Learning_Diff,Disability,ethnicity,
address1,address2,address3,city,postalcode,phone,emailaddress,mobilephone,
STEN_Employer_Identifier,record_type, Date_Created)
SELECT S.stu_id, S.Student_Forename + ' ' + S.Surname, S.Student_Forename, S.middlename,
S.Surname, S.stu_id, S.Date_of_Birth, S.Sex, S.NI_Number, S.Learning_Difficulty, S.Learning_Diff, S.Disability, S.Ethnicity, S.Student_Address_1, S.Student_Address_2, S.Student_Address_3, S.Student_Address_4, S.Student_Postcode, S.Telephone, S.Email_Address, S.Mobile_Number, S.STEN_Employer_Identifier, 'Learner', S.Date_Created
FROM wce_ilr AS S LEFT OUTER JOIN wce_contact AS C on S.stu_id = C.stu_id
WHERE C.stu_id IS NULL
The combination of the LEFT OUTER JOIN and the IS NULL in the SELECT query selects all the records from wce_ilr that are not already present.
August 11, 2010 at 2:58 am
Thank you for your reply. I ran the select part of your script and it looks good but when i run the select on the script i posted the results are slightly different and before i go live with yours i would like to understand why. Thanks for looking.
This is my current script and it pulls 128 rows from the wce_ilr table where it can't find them in the wce_contact table.
with cteDelegates(stu_id, Student_Forename, middlename, Surname, Date_of_Birth, Sex, NI_Number, Learning_Difficulty, Learning_Diff, Disability, Ethnicity, Student_Address_1, Student_Address_2, Student_Address_3, Student_Address_4,
Student_Postcode, Telephone, Email_Address, Mobile_Number, STEN_Employer_Identifier, Date_Created, rown) as(Select wce_ilr.stu_id, wce_ilr.Student_Forename, wce_ilr.middlename, wce_ilr.Surname, wce_ilr.Date_of_Birth, wce_ilr.Sex, wce_ilr.NI_Number, wce_ilr.Learning_Difficulty, wce_ilr.Learning_Diff, wce_ilr.Disability, wce_ilr.Ethnicity, wce_ilr.Student_Address_1, wce_ilr.Student_Address_2, wce_ilr.Student_Address_3, wce_ilr.Student_Address_4,
wce_ilr.Student_Postcode, wce_ilr.Telephone, wce_ilr.Email_Address, wce_ilr.Mobile_Number, wce_ilr.STEN_Employer_Identifier,wce_ilr.Date_Created, ROW_NUMBER() OVER (PARTITION BY wce_ilr.stu_id ORDER BY wce_ilr.stu_id) from wce_ilr left join wce_contact
on WCE_ILR.stu_id =wce_contact.stu_id
where WCE_contact.stu_id is null)
Select stu_id, Student_Forename + ' ' + Surname, Student_Forename, middlename,
Surname, stu_id, Date_of_Birth, Sex, NI_Number, Learning_Difficulty, Learning_Diff,
Disability, Ethnicity, Student_Address_1, Student_Address_2, Student_Address_3,
Student_Address_4, Student_Postcode, Telephone, Email_Address, Mobile_Number,
STEN_Employer_Identifier, 'Learner', Date_Created from cteDelegates
where rown = 1
This is your script and it pulls 154 rows from the wce_ilr table where it can't find them in the wce_contact table.
SELECT S.stu_id, S.Student_Forename + ' ' + S.Surname, S.Student_Forename, S.middlename,
S.Surname, S.stu_id, S.Date_of_Birth, S.Sex, S.NI_Number, S.Learning_Difficulty, S.Learning_Diff, S.Disability, S.Ethnicity, S.Student_Address_1, S.Student_Address_2, S.Student_Address_3, S.Student_Address_4, S.Student_Postcode, S.Telephone, S.Email_Address, S.Mobile_Number, S.STEN_Employer_Identifier, 'Learner', S.Date_Created
FROM wce_ilr AS S LEFT OUTER JOIN wce_contact AS C on S.stu_id = C.stu_id
WHERE C.stu_id IS NULL
August 11, 2010 at 7:14 am
I have just been checking the data and can see the query provided by malcolm.o-nions has duplicate rows, i think this is because the wce_ilr table can have multiple rows per stu_id so therefore this query will not do what i need.
Anyone have any thoughts on this?
August 11, 2010 at 7:22 am
The order by in the window function in the CTE in your original query will return a random row from the available dupes.
Is this intentional?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 11, 2010 at 7:38 am
Well this does not matter in this instance i know they are unique based on their stu_id and i just want their contact deatils to populate the wce_contact tables.
So the answer is my existing query is the best for achieveing this?
August 11, 2010 at 7:49 am
sc-w (8/11/2010)
Well this does not matter in this instance i know they are unique based on their stu_id and i just want their contact deatils to populate the wce_contact tables.So the answer is my existing query is the best for achieveing this?
It's good enough and should give the correct results. The method is standard and straightforward. You might get significantly better performance from using a temp table instead of a CTE but that would depend upon the number of rows in the source table and how many are pulled on each pass of the query.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 11, 2010 at 8:23 am
Thanks for this.
Could you do me a favour? I'm not getting my head round how this works.
The part of my query below is the bit that works out if there are any duplicate records right?
Now please correct me when i go wrong, ROW_NUMBER() generates a unique number in increments of 1 for each stu_id it finds in the wce_ilr table, then PARTITION BY tells it to stop when it finds the first unique stu_id, then we are LEFT JOINING the wce_contact table to check the PARTIONIONED records against the wce_contact table on the wce_ilr.stu_id = wce_contact.stu_id
Up to this point if my assumption above is correct I’m happy and please let me know if my assumption is wrong so i learn.
Now here is where I’m not getting it, the WHERE wce_contact.stu_id IS NULL surely only checks for NULLS in the wce_contact.stu_id field. I would have thought you would need something like
WHERE NOT wce_ilr.stu_id = wce_contact.stu_id
Why the NULL? Look forward to getting this one clear in my head.
, ROW_NUMBER() OVER (PARTITION BY wce_ilr.stu_id ORDER BY wce_ilr.stu_id) from wce_ilr left join wce_contact
on WCE_ILR.stu_id =wce_contact.stu_id
where WCE_contact.stu_id is null)
August 11, 2010 at 8:34 am
sc-w (8/11/2010)
Thanks for this.Could you do me a favour? I'm not getting my head round how this works.
The part of my query below is the bit that works out if there are any duplicate records right?
Now please correct me when i go wrong, ROW_NUMBER() generates a unique number in increments of 1 for each stu_id it finds in the wce_ilr table, then PARTITION BY tells it to stop when it finds the first unique stu_id, then we are LEFT JOINING the wce_contact table to check the PARTIONIONED records against the wce_contact table on the wce_ilr.stu_id = wce_contact.stu_id
Up to this point if my assumption above is correct I’m happy and please let me know if my assumption is wrong so i learn.
Now here is where I’m not getting it, the WHERE wce_contact.stu_id IS NULL surely only checks for NULLS in the wce_contact.stu_id field. I would have thought you would need something like
WHERE NOT wce_ilr.stu_id = wce_contact.stu_id
Why the NULL? Look forward to getting this one clear in my head.
, ROW_NUMBER() OVER (PARTITION BY wce_ilr.stu_id ORDER BY wce_ilr.stu_id) from wce_ilr left join wce_contact
on WCE_ILR.stu_id =wce_contact.stu_id
where WCE_contact.stu_id is null)
PARTITION BY restarts at 1 when wce_ilr.stu_id changes, so if say there are three rows with wce_ilr.stu_i = 257, then the [rown] of the three rows will be 1, 2 and 3. So picking rows with [rown] of only 1 will ensure that your rows have unique values of stu_id.
WHERE NOT wce_ilr.stu_id = wce_contact.stu_id
- for wce_ilr.stu_i = 257, this would attempt to match (i.e. output) every row in wce_contact except wce_contact.stu_id = 257, and so on. A cartesian product, more or less.
WHERE wce_contact.stu_id IS NULL
Get rows from wce_ilr where there's no matching row (on stu_id) in wce_contact.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 11, 2010 at 9:02 am
Great feed back thank you.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply