Query producing unexpected results.

  • Hi,

    I have an query issue that I thought was resolved but now I have a problem with the results I am getting.

    The below query should check records in both wce_contact and wce_ilr tables based on edrs number and company name with this part of the query “wce_contact ON wce_ilr.edrs_no = wce_contact.edrs OR wce_ilr.company_name = wce_contact.company” If it finds an edrs number or a company that does not exists in the wce_contact it displays the record.

    When I run this query it takes 1 hour to complete and the results show 77 records. The reason behind the query taking so long is there are 43,603 rows in the wce_contact table and 110,083 rows in wce_ilr.

    When I check the wce_contact table (where I do not expect to see any of the 77 rows) they already exist in the table. You would have thought it would show all the rows that exist in both not just 77…..

    Any advise would be appreciated. Thanks in advance

    SELECT edrs_no, company_name, company_code, Employer_address_1, Employer_address_2, Employer_address_3, Employer_address_4, Employer_POSTCODE, 'employer', 'Company'

    FROM (SELECT ROW_NUMBER() OVER (PARTITION BY wce_ilr.company_name

    ORDER BY wce_ilr.company_name) r, wce_ilr.company_name,

    wce_ilr.company_code, wce_ilr.edrs_no,

    wce_ilr.Employer_address_1, wce_ilr.Employer_address_2, wce_ilr.Employer_address_3,

    wce_ilr.Employer_address_4, wce_ilr.Employer_POSTCODE

    FROM wce_ilr LEFT OUTER JOIN

    wce_contact ON wce_ilr.edrs_no = wce_contact.edrs OR wce_ilr.company_name = wce_contact.company COLLATE database_default

    WHERE wce_contact.edrs IS NULL AND wce_ilr.edrs_no IS NOT NULL

    ) A

    WHERE r = 1 order by edrs_no desc

    Some Results from above query:

    Edsr_noCompany_name

    901538027Careforce Nottingham

    900455489Field House Care Home

    188143122Brush Works

    187092397Best Quote/Connect Joinery

    Here are the records in the wce_contact table that It should be checking against. You can see the same company name and edrs number:

    EDRSCompany

    901538027Careforce Nottingham

    900455489Field House Care Home

    188143122Brush Works

    187092397Best Quote/Connect Joinery

  • It looks like it's an issue with your OR condition in your left outer join.

    Check if your suspicious results have rows in wce_contact with the same company name, but none or different edrs.

    Since you didn't include wce_contact.edrs in your output list, you wouldn't notice....

    How did you select the results for wce_contact in your post? I'd expect selecting by EDRS?

    Other than that it's hard to tell without any sample data that support your statement. If you'd like to post sample data so we can look a little deeper into it, please follow the link in my signature to get the sample format right at the first time. 😉



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Hi,

    Here is the sample you requested showing the two fields from both tables I am matching against. It doest matter which of these finds a match I just want results that do not exist in the wce_contact table.

    I have checked through lots of the results and can’t find one without or a different edrs. The example results were selected by the edrs.

    Thanks for the help, I’ll be checking all day to reply to any posts, be nice to get this fix.

    USE [database_name] –YOU WILL NEED TO CREATE THIS.

    --===== If the test table already exists, drop it

    IF OBJECT_ID('mytable','U') IS NOT NULL

    DROP TABLE mytable

    --===== Create the test table with

    CREATE TABLE [dbo].[mytable]

    (

    uniqueid varchar(16),

    wce_ilr_edrs_no varchar(50),

    wce_contact_edrs varchar(50),

    wce_ilr_company_name varchar(50),

    wce_contact_company varchar(50)

    )

    --===== Insert the test data into the test table

    INSERT INTO mytable

    (uniqueid, wce_ilr_edrs_no, wce_contact_edrs, wce_ilr_company_name, wce_contact_company)

    SELECT 'BOO005','168660105','168660105','Boomerangs Day Nursery','Boomerangs Day Nursery' UNION ALL

    SELECT 'BES009','187092397','187092397','Best Quote/Connect Joinery','Best Quote/Connect Joinery' UNION ALL

    SELECT 'BRU004','188143122','188143122','Brush Works','Brush Works' UNION ALL

    SELECT 'CAR039','901538027','901538027','Careforce Nottingham','Careforce Nottingham' UNION ALL

    SELECT 'FIE004','900455489','900455489','Field House Care Home','Field House Care Home'

  • Hi,

    could you please provide table structure and sample data the same way you just did but for table wce_ilr and wce_contact together with some sample data that show the effect you're struggling with? That way we can use your query from above simulating exactly the scenario you're facing.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • I will do the same for wce_ilr with just the columns i am working with and 5 rows of data and the same with the wce_contact table but it might work fine. I couldn't possibly create an examplewith the 110,000 rows from one table and 45,000 from the other as i am manually entering the data into the example i am posting. Shouldn't take me too long.

    Thanks

  • Well here it is these are 5 off the 77 rows that i am getting returned on the live system from the wce_ilr table and 5 rows selected by uniqueid from teh wce_contact table.

    As suspected the query is working and no results are returned with this example. Maybe a real test can only be done with all the data...

    Any thoughts would be appreciated

    WCE_ILR TABLE

    USE [example_db]

    --===== If the test table already exists, drop it

    --IF OBJECT_ID('wce_ilr','U') IS NOT NULL

    --DROP TABLE mytable

    --===== Create the test table with

    CREATE TABLE [dbo].[wce_ilr]

    (

    uniqueid varchar(16),

    edrs_no varchar(50),

    company_name varchar(50),

    company_code varchar(50),

    Employer_address_1 varchar(50),

    Employer_address_2 varchar(50),

    Employer_address_3 varchar(50),

    Employer_address_4 varchar(50),

    Employer_POSTCODE varchar(50)

    )

    --===== Insert the test data into the test table

    INSERT INTO wce_ilr

    (uniqueid, edrs_no, company_name, company_code, Employer_address_1, Employer_address_2, Employer_address_3, Employer_address_4, Employer_POSTCODE)

    SELECT 'BOO005','168660105','Boomerangs Day Nursery','BOO005','Birch Avenue','Newhall','','Swadlincote','DE11 0NG' UNION ALL

    SELECT 'BES009','187092397','Best Quote/Connect Joinery','BES009','22 Wyrale Drive','','','Nottingham','NG8 6RN' UNION ALL

    SELECT 'BRU004','188143122','Brush Works','BRU004','42 Waterside','','','Ely','CB7 4AZ' UNION ALL

    SELECT 'CAR039','901538027','Careforce Nottingham','CAR039','515a Mansfield Road' ,'','','NOTTINGHAM','NG5 2JL'UNION ALL

    SELECT 'FIE004','900455489','Field House Care Home','FIE004','Foxhall Road' ,'','','Nottingham','NG7 6LH'

    WCE_CONTACT TABLE

    USE [example_db]

    --===== If the test table already exists, drop it

    --IF OBJECT_ID('wce_ilr','U') IS NOT NULL

    --DROP TABLE mytable

    --===== Create the test table with

    CREATE TABLE [dbo].[wce_contact]

    (

    uniqueid varchar(16),

    EDRS varchar(50),

    COMPANY varchar(50),

    company_code varchar(50),

    ADDRESS1 varchar(50),

    ADDRESS2 varchar(50),

    ADDRESS3 varchar(50),

    city varchar(50),

    postalcode varchar(50)

    )

    --===== Insert the test data into the test table

    INSERT INTO wce_contact

    (UNIQUEID, EDRS, COMPANY, Company_Code, ADDRESS1, ADDRESS2, ADDRESS3, CITY, POSTALCODE)

    SELECT 'BES009','187092397','Best Quote/Connect Joinery','BES009','22 Wyrale Drive','','','Nottingham','NG8 6RN' UNION ALL

    SELECT 'BOO005','168660105','Boomerangs Day Nursery','BOO005','Birch Avenue','Newhall','','Swadlincote','DE11 0NG' UNION ALL

    SELECT 'BRU004','188143122','Brush Works','BRU004','42 Waterside','','','Ely','CB7 4AZ' UNION ALL

    SELECT 'CAR039','901538027','Careforce Nottingham','CAR039','515a Mansfield Road','','','NOTTINGHAM','NG5 2JL' UNION ALL

    SELECT 'FIE004','900455489','Field House Care Home','FIE004','Foxhall Road','','','Nottingham','NG7 6LH'

    CODE TO CROSS REF TABLES

    SELECT edrs_no, company_name, company_code, Employer_address_1, Employer_address_2, Employer_address_3, Employer_address_4, Employer_POSTCODE, 'employer', 'Company'

    FROM (SELECT ROW_NUMBER() OVER (PARTITION BY wce_ilr.company_name

    ORDER BY wce_ilr.company_name) r, wce_ilr.company_name,

    wce_ilr.company_code, wce_ilr.edrs_no,

    wce_ilr.Employer_address_1, wce_ilr.Employer_address_2, wce_ilr.Employer_address_3,

    wce_ilr.Employer_address_4, wce_ilr.Employer_POSTCODE

    FROM wce_ilr LEFT OUTER JOIN wce_contact on wce_ilr.company_name = wce_contact.company OR wce_ilr.edrs_no = wce_contact.edrs COLLATE database_default

    WHERE wce_contact.edrs IS NULL AND wce_ilr.edrs_no IS NOT NULL

    ) A

    WHERE r = 1 order by edrs_no desc

  • Obviously, the problem cannot be reproduced easily... 🙁

    Let's try it another way: Would you please run the following code on your system (after reviewing what it does, of course):

    SELECT edrs_no, ref_edrs, company_name, ref_company, r, company_code, Employer_address_1, Employer_address_2, Employer_address_3, Employer_address_4, Employer_POSTCODE, 'employer', 'Company'

    FROM (SELECT ROW_NUMBER() OVER (PARTITION BY wce_ilr.company_name

    ORDER BY wce_ilr.company_name) r, wce_ilr.company_name,

    wce_ilr.company_code, wce_ilr.edrs_no,

    wce_ilr.Employer_address_1, wce_ilr.Employer_address_2, wce_ilr.Employer_address_3,

    wce_ilr.Employer_address_4, wce_ilr.Employer_POSTCODE,

    wce_contact.company ref_company,

    wce_contact.edrs ref_edrs

    FROM wce_ilr LEFT OUTER JOIN wce_contact on wce_ilr.company_name = wce_contact.company OR wce_ilr.edrs_no = wce_contact.edrs COLLATE database_default

    --WHERE wce_contact.edrs IS NULL AND wce_ilr.edrs_no IS NOT NULL

    WHERE wce_ilr.edrs_no = '168660105'

    OR wce_contact.edrs = '168660105'

    OR wce_contact.company = 'Boomerangs Day Nursery'

    OR wce_ilr.company_name = 'Boomerangs Day Nursery'

    ) A

    -- WHERE r = 1

    order by r

    Reason:

    I'd like to see the join result for one of the failing edrsNo without any other condition applied but the plain join. So it left out the where condtions for edrsno and the rownumber.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • It's running but might take another 30 -40 minutes to get the result. As soon as it completes i will post the results. I would have thought that would have been quicker. Thanks

  • Ok it's done. Here are the results.

    USE [example_db]

    --===== If the test table already exists, drop it

    --IF OBJECT_ID('wce_ilr','U') IS NOT NULL

    --DROP TABLE mytable

    --===== Create the test table with

    CREATE TABLE [dbo].[wce_FORUMRESULT]

    (

    edrs_no varchar(16),

    ref_edrs varchar(50),

    company_name varchar(50),

    ref_company varchar(50),

    R varchar(2),

    company_code varchar(50),

    Employer_address_1 varchar(50),

    Employer_address_2 varchar(50),

    Employer_address_3 varchar(50),

    Employer_address_4 varchar(50),

    Employer_POSTCODE varchar(50)

    )

    --===== Insert the test data into the test table

    INSERT INTO wce_FORUMRESULT

    (edrs_no, ref_edrs, company_name, ref_company, r, company_code, Employer_address_1, Employer_address_2, Employer_address_3, Employer_address_4, Employer_POSTCODE)

    SELECT '168660105','168660105','Boomerangs Day Nursery','Boomerangs Day Nursery','1','BOO005','Birch Avenue','Newhall','','Swadlincote','DE11 0NG' UNION ALL

    SELECT '168660105',NULL,'Boomerangs Day Nursery','Boomerangs Day Nursery','2','BOO005','Birch Avenue','Newhall','','Swadlincote','DE11 0NG' UNION ALL

    SELECT '168660105','168660105','Boomerangs Day Nursery','Boomerangs Day Nursery','3','BOO005','Birch Avenue','Newhall','','Swadlincote','DE11 0NG' UNION ALL

    SELECT '168660105',NULL,'Boomerangs Day Nursery','Boomerangs Day Nursery','4','BOO005','Birch Avenue','Newhall','','Swadlincote','DE11 0NG'

  • We're getting closer...

    As expected in my first post ("Check if your suspicious results have rows in wce_contact with the same company name, but none or different edrs."):

    Rows numbers 2 and 4 of the result set do meet the original WHERE condition (WHERE wce_contact.edrs IS NULL AND wce_ilr.edrs_no IS NOT NULL). Therefore, those two rows get selected, even though there are rows in the contact table with matching edrs.

    You should easily be able to verify this by running:

    SELECT * FROM wce_contact WHERE company='Boomerangs Day Nursery'

    Lets have a look at the business rules:

    You do have a wce_contact table and a table wce_ilr. How are those two related to each other?

    How can it happen that you have rows in your contact table with identical company_name but some have an edrs_no while others don't?

    What rows do you actually want to select with your query under the given data constellation?

    Regarding performance:

    Neither your data volume nor the query itself can count as a reason for taking that long (over an hour?).

    I'd expect there is a "missing index issue"...

    I assume you have an index on each of the following columns:

    wce_ilr.company_name,

    wce_ilr.edrs_no,

    wce_contact.edrs, and

    wce_contact.company.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Ok, my head hurts but i like it cause we are getting somewhere!

    I am running my query that i first wrote about here that gives me the 77 rows, then i will do your simple select query to see iff there are two rows with company and edrs and without edrs.

    There is a reason for this being odd and i don't want to confuse the situation but when i first ran this query a week ago i got 700 rows that apparently did not exist in the wce_contact table so i added my insert statement above the select query and ran it. An hour later the 700 rows were inserted in the wce_contact table and i thought i was there. Before i could rest i thought i would run the select query after inserting those 700 to make sure that the select results were 0.... This was not the case i had the 77 rows we are dealing with now. Thought i should just mention that.

    Ok now i will answer your questions.

    You do have a wce_contact table and a table wce_ilr. How are those two related to each other?

    The wce_ilr table is a stand alone table that at 12pm every night gets populated with college records, each row captures a learner, the company they work for and the course details for that row. A learner might be on many courses therefore the learner, company and even some courses appear as if duplicated. However, some learners have a person_code, some companies have an edrs_no and courses have a course_code but not all, so these act as uniqueid's. if every row had these id's it would be easy to sort this out.

    Now the wce_contact is not stand alone table it is part of a CRM system and in that table i need to insert all the companies, learners and courses from wce_ilr and mark a field called record_type with the relevant type, so a company has a record_type = company and so on.

    Because the wce_ilr table is so badly setup and i have no control over it or the data i am left with no option but to try and work a near perfect logic to make sure that i can take each instance of a company, learner and course and create their own records so there are no duplicates.

    Now the other little issue is that the wce_contact table already has x thousand records of each type in it already! Hope you are still awake. So my logic behind it was if i can select the data for each type of record i need .i.e. companies, and where an edrs number exists select them using my PARTITION OVER() to only give one instance of that record then cross reference the wce_contact table on either edrs or company name as the initial data in this table is not perfect then insert them. Then every night use the same query to check and in thoery only a few records every day will need adding. Long day hope that makes sense?

    How can it happen that you have rows in your contact table with identical company_name but some have an edrs_no while others don't?

    The client imported the initial data into the wce_contact table knowing that some data already existed in the wce_ilr table so some has company names and some edrs which means i have to check eiether before inserting as we do not want duplicates.

    What rows do you actually want to select with your query under the given data constellation?

    I thinnk i have made it clear above,please let me know if it isn't.

    Regarding performance:

    Neither your data volume nor the query itself can count as a reason for taking that long (over an hour?).

    I'd expect there is a "missing index issue"...

    I assume you have an index on each of the following columns:

    wce_ilr.company_name,

    wce_ilr.edrs_no,

    wce_contact.edrs, and

    wce_contact.company.

    The anser to this is i don't think so,never done indexing before and never had an issue before it is just this query. If however i remove "wce_ilr.company_name = wce_contact.company OR " it executes in seconds, with the second join it takes over an hour.

    Thanks for your help hope you can see something i can't.

  • So, one of the business rules is "some companies have an edrs_no, but not all"?

    In that case, either all rows per specific company should have an edrs_no, or none. Also correct?

    If the same company name can have entries in wce_contact with and without edrs, the edrs column basically becomes irrelevant or is always part of the key to identify a specific company, making the company name not unique.

    Example:

    EDRS Company

    901538027 Careforce Nottingham

    NULL Careforce Nottingham

    Either this is one company or those two companies have to be treated separately.

    If the former, you should change all the NULL values in edrs column to the corresponding value, if one exists.

    I recommend you check wce_contact for data consistency.

    What you could do, is to find company names, that have more than one EDRS code assigned (different edrs codes or edrs code and NULL):

    SELECT company, COUNT(*) as EDRS_cnt

    FROM (

    SELECT company, edrs

    FROM wce_contact

    GROUP BY company, edrs

    ) sub

    GROUP BY company

    HAVING COUNT(*) > 1

    Check the resulting companies regarding the differences for the edrs code and change the edrs if required.

    After that you should do the same test with your wce_ilr table: search for inconsistent company names /edrs codes and correct the data.

    My next step would be to verify if the combination of edrs code and company name is identical between wce_contact and wce_ilr. If there are differences, get a decision which one's correct and change the other.

    Continue to get "clean" data would bring me to the remaining company names with edrs NULL in both tables: "convince" the client to get the mess corrected, resulting in one edrs for every single company name. Change wce_contact and wce_ilr to reflect this change.

    Finally, you could join wce_contact and wce_ilr only on edrs. What you also could add prior to the insert into wce_contact is a check whether all data populated into wce_ilr will have an edrs code and that this code exists in wce_contact and the company names will match. I'd do this by using a lookup table based on wce_contact containing one row per edrs with the one corresponding company. I would add some more consistency and integrity checks if I'd be faced with a data structure as described...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Thanks Lutz, I will go through what you have said and reply back if i have anything else to pick your brain with. You have been very helpful thanks again.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply