Problem with a join.....

  • Hi there

    Trying to join (INNER) 3 tables together and the only key they have in common is a field called Environment_Key.

    Seem to be able to get the expected results for the first table join - however when the second is added it seems to generate a Cartesian results set!!! Can some one advise what I'm doing wrong:

    Here's an example of the FULL query that returns the incorrect results:

    Select top 5000

    mv.Subscriber_No AS SUBNO,

    mv.Person_No AS PERNO,

    -- Splitting the Member_Name which is in the format SURNAME,FORENAME and a single field in SQL Server

    -- into the format FORENAME, SURNAME, INITIAL the same as the Oracle table for importing.

    SUBSTRING(mv.Member_Name, CHARINDEX(',', mv.Member_Name) + 1, LEN(Mv.Member_Name)) AS FORNAM, --Forename

    SUBSTRING(mv.Member_Name, 1, NULLIF(CHARINDEX(',', mv.Member_Name) - 1, -1)) AS SURNAM, -- Surname

    LEFT(SUBSTRING(mv.Member_Name, CHARINDEX(',', mv.Member_Name) + 1, LEN(Mv.Member_Name)),1) AS [INIT], -- Initial

    -- End of name split routine

    mv.Date_of_Birth AS MBIT,

    mv.Gender as MBSEX,

    mv.Relationship_Code as RELCD,

    mv.Commencement_Date as COMCDAT,

    /*ecv.Eligibility_From_Date as CESTDT,

    ecv.Eligibility_to_Date as CDENDT,*/

    amv.Address_Line_1 as MADD1,

    amv.Address_Line_2 as MADD2,

    amv.Address_Line_3 as MADD3,

    amv.City as MCITY,

    amv.Area as MCNTY,

    amv.State_Key as MSTAT,

    amv.Post_Code as MZIP,

    amv.Country_Code as CCODE

    from Member_View mv

    inner join address_member_view amv

    on amv.environment_key = mv.environment_key

    join eligibility_current_view ecv

    ON amv.environment_key = ecv.environment_key

    WHERE mv.environment_Key = 'DL'

    AND mv.Commencement_Date > '20100101' [/b]

    Any help would be appreciated. Believe I need to do 2 inner joins as I need only the entries from each of the 3 tables that match the WHERE / AND parameters at the end.

  • 'Seems' to generate a cartesian results set? If you arent sure, my first suggestion would be to add the environment key from each 'table' to the select and try to verify.

    Maybe if you could give us an example with ddl and some sample data, we could see what you're getting and you could show what you're expecting instead.

    If you're trying to diagnose something weird, you could also try joining each combination of 2 of the 3 tables and making sure they all get expected results.

    Also worth noting that it looks from the naming that these are views, not tables, and that they might be views based on the same underlying tables. That won't necessarily cause the problem you are complaining of, but is also probably not the best way to get the data you're after.

  • Unfortunately to really say what's going on we would need the table structures as well. At the very least the primary key/indexes, and preferably some sample data. Not to mention if you are really wanting some good help include the code to create those tables and code to insert the test data into them :). It makes life so much easier on the rest of us.

    Now without that information, if I had to make an educated guess, I would say that Member_View is a 1:many on both Address_Member_View and Eligibility_Current_View. In which case yes you are going to get a Cartesian result.

    For example lets say that Member_View has 1 row for Environment_Key 'DL', Address_Member_View has 2 and Eligibility_Current_View has 3. You are going to get 6 rows back. See below.

    Member_View

    Environmental_KeyMV_PK

    DL1

    Address_Member_View

    Environmental_KeyAMV_PK

    DL1

    DL2

    Eligibility_Current_View

    Environmental_KeyECV_PK

    DL1

    DL2

    DL3

    Your end result looks like this

    Environmental_KeyMV_PKAMV_PKECV_PK

    DL111

    DL112

    DL113

    DL121

    DL122

    DL123

    End result assuming I'm correct you either need to find some additional criteria, for example the most recent row in Eligibility_Current_View or give us some additional information about what you are trying to get out of your query.

    Kenneth

    Kenneth FisherI was once offered a wizards hat but it got in the way of my dunce cap.--------------------------------------------------------------------------------For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/[/url]Link to my Blog Post --> www.SQLStudies.com[/url]

  • Ran the following code to give me an idea how many rows on each table have the environment_key 'DL' which is used for joining the 3 tables together to get an idea of data volumes.

    select count(*) as [Members-Matching_Rows]

    from member_view

    where environment_key = 'DL'

    and Commencement_Date = '20100601'

    select count(*) as [Eligibility-Matching_Rows]

    from Eligibility_current_view

    where environment_key = 'DL'

    select count(*) as [Address-Matching_Rows]

    from address_member_view

    where environment_key = 'DL'

    Produced the follwing results:

    Members-Matching_Rows

    ---------------------

    2451

    Eligibility-Matching_Rows

    -------------------------

    32286

    Address-Matching_Rows

    ---------------------

    117041

    select count(*) as [Total-Matching-Records]

    from member_view mv , eligibility_current_view ecv

    where ecv.environment_key = 'DL'

    and mv.environment_key = 'DL'

    and mv.commencement_date = '20100601'

    Results:

    Total-Matching-Records

    ----------------------

    79132986

    This is the exact sum of the total # of records in member_view x eligibility. I thought there would be LESS or am I wrong if I am attempting to get ONLY THOSE RECORDS that have an environment_key of 'DL' and a commencement_date of '20100601'

  • As i mentioned in my initial posting there is only 1 common field I can join these 3 tables on "Environment_Key".

  • select count(*) as [Total-Matching-Records]

    from member_view mv , eligibility_current_view ecv

    where ecv.environment_key = 'DL'

    and mv.environment_key = 'DL'

    and mv.commencement_date = '20100601'

    No that sounds about right. mv.commencement_date = '20100601' affects only the number of rows from the member_view. If you get 100 rows from table A and cross join it to table B which also gets you 100 row then you get 10000 rows. And basically what you have above is a cross join. Based on the number of rows, if those 2 tables are supposed to join at all then you should have more columns to join them on. A member id or something like it. I would look over the primary key on member_view. Realizing of course that by the name its a view and probably doesn't have an actual PK, but it should have some combiniation of fields that makes each row unique. Then see if you can find that combination of fields in your Eligibility view.

    Kenneth FisherI was once offered a wizards hat but it got in the way of my dunce cap.--------------------------------------------------------------------------------For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/[/url]Link to my Blog Post --> www.SQLStudies.com[/url]

  • Just because environment_key is the only field that the three have in common doesn't mean that's the field that you want to join on. All of my tables have Inserted_By, but I wouldn't want to join on them. In your case, it looks like environment_key = "DL" stands for download.

    People have asked for your table structure in order to help. If you refuse to provide it, you're just shooting yourself in the foot.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • BravehearT1326 (8/20/2010)


    Ran the following code to give me an idea how many rows on each table have the environment_key 'DL' which is used for joining the 3 tables together to get an idea of data volumes.

    select count(*) as [Members-Matching_Rows]

    from member_view

    where environment_key = 'DL'

    and Commencement_Date = '20100601'

    select count(*) as [Eligibility-Matching_Rows]

    from Eligibility_current_view

    where environment_key = 'DL'

    select count(*) as [Address-Matching_Rows]

    from address_member_view

    where environment_key = 'DL'

    Produced the follwing results:

    Members-Matching_Rows

    ---------------------

    2451

    Eligibility-Matching_Rows

    -------------------------

    32286

    Address-Matching_Rows

    ---------------------

    117041

    select count(*) as [Total-Matching-Records]

    from member_view mv , eligibility_current_view ecv

    where ecv.environment_key = 'DL'

    and mv.environment_key = 'DL'

    and mv.commencement_date = '20100601'

    Results:

    Total-Matching-Records

    ----------------------

    79132986

    This is the exact sum of the total # of records in member_view x eligibility. I thought there would be LESS or am I wrong if I am attempting to get ONLY THOSE RECORDS that have an environment_key of 'DL' and a commencement_date of '20100601'

    An exact sum is what you should expect.

    There is a many to many match when joining on environment key (it is not unique in either table)

    think of it like this

    TABLE_1

    row_id row_in_common

    1 1

    2 1

    TABLE_2

    t2tow_id row_in_common

    1 1

    2 1

    3 1

    4 2

    SELECT * from table_1 INNER JOIN table_2 on table_1.row_in_common = table2.row_in_common where table_1.row_in_common = 1

    will give you records

    1 1 1 1

    1 1 2 1

    1 1 3 1

    2 1 1 1

    2 1 2 1

    2 1 3 1

    six results (2x3). Each hit in one table matches each hit in the other.

    Long story short, you're probably linking by the wrong thing.

    Again, these look like views and im guessing you aren't seeing matching fields because the views arent bringing in the underlying key fields. And that means you should probably be querying the underlying tables instead. Look at the view definitions and find out what you can.

    Or better yet, post the view and table definitions and some data and people here can help.

  • TABLE STRUCTURES that make up the VIEWS:

    -- ELIGIBILITY TABLE:

    CREATE TABLE [dbo].[Eligibility](

    [Group_Code] [varchar](6) NOT NULL,

    [Product_Key] [varchar](4) NOT NULL,

    [Subscriber_No] [varchar](9) NOT NULL,

    [Person_No] [varchar](2) NOT NULL,

    [Eligibility_From_Date] [datetime] NOT NULL,

    [Eligibility_To_Date] [datetime] NULL,

    [Plan_Code] [varchar](3) NOT NULL,

    [Plan_Type_Code] [varchar](1) NOT NULL,

    [Product_Tree_Key] [varchar](4) NOT NULL,

    [Rider_Code_1] [varchar](1) NOT NULL,

    [Rider_Code_2] [varchar](1) NOT NULL,

    [Rider_Code_3] [varchar](1) NOT NULL,

    [Rider_Code_4] [varchar](1) NOT NULL,

    [Rider_Code_5] [varchar](1) NOT NULL,

    [SubGroup_Code] [varchar](3) NOT NULL,

    [Subscriber_Type] [varchar](15) NOT NULL,

    [Termination_Reason_Code] [varchar](1) NOT NULL,

    [Environment_Key] [varchar](2) NOT NULL,

    [Date_Record_Loaded] [datetime] NOT NULL,

    [Deleted_From_Eurocare] [varchar](1) NOT NULL,

    [Journal_Sequence] [int] NOT NULL,

    [Date_Modified] [datetime] NOT NULL,

    [Member_Key] [varchar](15) NULL,

    [Member_Commencement_Date] [datetime] NULL,

    [Client_Key] [varchar](30) NULL,

    CONSTRAINT [PK__Eligibility] PRIMARY KEY CLUSTERED

    (

    [Subscriber_No] ASC,

    [Person_No] ASC,

    [Eligibility_From_Date] ASC,

    [Product_Key] ASC,

    [Environment_Key] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    --- MEMBER TABLE

    CREATE TABLE [dbo].[Member](

    [Member_Key] [varchar](15) NOT NULL,

    [Environment_Key] [varchar](2) NOT NULL,

    [Subscriber_No] [varchar](9) NOT NULL,

    [Person_No] [varchar](2) NOT NULL,

    [Relationship_Code] [varchar](1) NOT NULL,

    [Member_Name] [varchar](40) NOT NULL,

    [Member_Add_Month] [smallint] NOT NULL,

    [Member_Add_Year] [int] NOT NULL,

    [Gender] [varchar](1) NOT NULL,

    [Smoker_flag] [varchar](1) NOT NULL,

    [Subscriber_Flag] [varchar](1) NOT NULL,

    [Country_code] [varchar](4) NOT NULL,

    [Country_Name] [varchar](20) NOT NULL,

    [Nationality] [varchar](4) NOT NULL,

    [Nationality_Country_Name] [varchar](20) NOT NULL,

    [Country_of_Residency] [varchar](4) NOT NULL,

    [Country_of_Residency_Name] [varchar](20) NOT NULL,

    [Date_of_Birth] [datetime] NOT NULL,

    [Date_Modified] [datetime] NOT NULL,

    [Date_Record_Loaded] [datetime] NOT NULL,

    [Termination_Date] [datetime] NOT NULL,

    [Termination_Reason_Code] [varchar](1) NOT NULL,

    [Dentist_Surgical] [varchar](6) NOT NULL,

    [Dentist_Restorative] [varchar](6) NOT NULL,

    [Address_Key] [varchar](40) NOT NULL,

    [Address_PPO_Key] [varchar](40) NOT NULL,

    [CIGNA_Member] [char](1) NOT NULL,

    [Commencement_Date] [datetime] NOT NULL,

    [VIP_Status] [char](1) NOT NULL,

    [Date_Member_Added] [datetime] NOT NULL,

    [Date_Member_Terminated] [datetime] NOT NULL,

    [Client_Location] [varchar](50) NULL,

    [Client_Employee_No] [varchar](50) NULL,

    [Client_Cost_Centre] [varchar](50) NULL,

    [Client_Payroll_No] [varchar](50) NULL,

    [Client_Department] [varchar](50) NULL,

    CONSTRAINT [PK_Member] PRIMARY KEY CLUSTERED

    (

    [Member_Key] ASC,

    [Environment_Key] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    --ADDRESS

    USE [RDR]

    GO

    /****** Object: Table [dbo].[Address] Script Date: 08/20/2010 16:10:48 ******/

    CREATE TABLE [dbo].[Address](

    [Address_Key] [varchar](40) NOT NULL,

    [Post_Code] [varchar](9) NOT NULL,

    [City] [varchar](35) NOT NULL,

    [Country] [varchar](40) NOT NULL,

    [Country_Code] [varchar](4) NOT NULL,

    [Country_Desc] [varchar](35) NOT NULL,

    [First_Part_Postcode] [varchar](4) NOT NULL,

    [Second_Part_Postcode] [varchar](3) NOT NULL,

    [Area] [varchar](25) NOT NULL,

    [Region] [varchar](20) NOT NULL,

    [Environment_Key] [char](2) NOT NULL,

    [Address_Provider] [varchar](12) NULL,

    [Address_Line_1] [varchar](50) NULL,

    [Address_Line_2] [varchar](50) NULL,

    [Address_Line_3] [varchar](50) NULL,

    [EMail] [varchar](50) NULL,

    [State_Key] [char](2) NULL,

    CONSTRAINT [PK_Address] PRIMARY KEY CLUSTERED

    (

    [Address_Key] ASC,

    [Environment_Key] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    Looks like I may have to:

    Join member to eligibility using Environment_key & Member_Key

    Join member to Address using Environment_key & Address_key

  • Could you post the definitions of each view too? That will probably be revealing as to how these tables link.

    My guess is that envirnoment key is coincidental and not needed to join at all. Could be wrong on that though (do you know what that field represents? is there an environment table?) member probably just joins to address through address key (although the table design there is strange as it can link many members to one address but cant link many addresses to a member). Eligibility is trickier but i'd guess its subscriber number and person number that links them.

    Getting the joins right is only half the battle. Im assuming you only want one record per member based on your original query, but it looks like a member can have multiple eligibilities at once, in which case you'll have to use a grouping or derived table/CTE to ensure that you're only creating one row no matter how many matches there are.

  • Views defined as follows:

    --MEMBER_VIEW

    CREATEVIEW [dbo].[Member_View]

    AS

    SELECT

    Member_key,

    date_modified,

    Person_No,

    CASE WHEN CIGNA_Member = 'Y' THEN 'Confidential'

    ELSE Member_Name

    ENDAS Member_Name,

    Subscriber_No,

    Member_Add_Month,

    Member_Add_Year,

    Country_code,

    Country_Name,

    Relationship_Code,

    Date_of_Birth,

    Gender,

    Smoker_flag,

    Nationality,

    Nationality_Country_Name,

    Country_of_Residency,

    Country_of_Residency_Name,

    Subscriber_Flag,

    Date_Record_Loaded,

    Termination_Date,

    Termination_Reason_Code,

    Environment_Key,

    Dentist_Restorative,

    Dentist_Surgical,

    Address_Key,

    Address_PPO_Key,

    Commencement_Date,

    VIP_Status,

    Date_Member_Added,

    Date_Member_Terminated,

    Client_Location,

    Client_Employee_No,

    Client_Cost_Centre,

    Client_Payroll_No,

    Client_Department

    fromdbo.Member

    --ELIGIBILITY_CURRENT_VIEW

    CREATE view [dbo].[Eligibility_Current_View]

    AS

    SELECT *

    FROM dbo.Eligibility

    WHERE (Eligibility_From_Date <= getdate()

    ANDEligibility_To_Date >= getdate())

    OR(Eligibility_From_Date <= getdate()

    ANDEligibility_To_Date is null)

    --ADDRESS_MEMBER_VIEW

    USE [RDR]

    GO

    /****** Object: View [dbo].[Address_Member_View] Script Date: 08/20/2010 16:41:51 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE VIEW [dbo].[Address_Member_View]

    AS

    SELECT * from address

    --WHERE address_key LIKE '%Member'

    WHERE Address_Provider = 'Member'

    Having joined on the multiple keys I listed above seems to have done the trick.......I hope...still checking results.

  • --#1

    CREATE VIEW [dbo].[Address_Member_View]

    AS

    SELECT * from address

    --WHERE address_key LIKE '%Member'

    WHERE Address_Provider = 'Member'

    --#2

    CREATE view [dbo].[Eligibility_Current_View]

    AS

    SELECT *

    FROM dbo.Eligibility

    WHERE (Eligibility_From_Date <= getdate()

    ANDEligibility_To_Date >= getdate())

    OR(Eligibility_From_Date <= getdate()

    ANDEligibility_To_Date is null)

    --#3

    CREATEVIEW [dbo].[Member_View]

    AS

    SELECT

    Member_key,

    date_modified,

    Person_No,

    CASE WHEN CIGNA_Member = 'Y' THEN 'Confidential'

    ELSE Member_Name

    ENDAS Member_Name,

    Subscriber_No,

    Member_Add_Month,

    Member_Add_Year,

    Country_code,

    Country_Name,

    Relationship_Code,

    Date_of_Birth,

    Gender,

    Smoker_flag,

    Nationality,

    Nationality_Country_Name,

    Country_of_Residency,

    Country_of_Residency_Name,

    Subscriber_Flag,

    Date_Record_Loaded,

    Termination_Date,

    Termination_Reason_Code,

    Environment_Key,

    Dentist_Restorative,

    Dentist_Surgical,

    Address_Key,

    Address_PPO_Key,

    Commencement_Date,

    VIP_Status,

    Date_Member_Added,

    Date_Member_Terminated,

    Client_Location,

    Client_Employee_No,

    Client_Cost_Centre,

    Client_Payroll_No,

    Client_Department

    fromdbo.Member

  • Well, that didnt tell us much, did it? I was assuming that member address view at least would link between the two.

    It would be nice if the design was documented so you'd know how the tables joined, but based on the field names and primary keys I think you have it right.

    One thing to still watch out for is the potential many to one between member and eligibility. If you only want one row per member you might want to do a derived table that selects distinct member_key/environment key pairs from eligibility and then use that table in the join to member and address, since you dont need any fields from that table, you just need to check that there's a matching record.

    If you actually want multiple rows if a member has multiple eligibilities, ignore the above.

  • Everyone thanks for your help / guidance on this one. Looks like I have managed to get the desired results.

    Unfortunately the design is not the best and as you can see from the view defenitions there was not much to work with 😉 but looks like we're there.

    Again - all help appreciated

  • fyi, you might want to remove your company name from your script before you post it next time. You should be able to edit your own post if you want to take it out, you'll have to ask others to edit for you if you need it removed from the quotes.

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply