Query to join three tables but filter out subset of the data

  • I'm struggling to develop a query to pull data from a few tables while excluding some portion of the data. For simplicity sake, say we have the following tables:

    ROLES

    Role_IDRole_Name

    1User

    2Manager

    3Engineer

    4Controller

    PERSONS

    Person_IDName

    111Dave

    222Mike

    333Jeff

    ROLEPERSONS (Many to Many relationship with both roles/persons)

    Role_IDPerson_ID

    1111

    1222

    2111

    3222

    3333

    4111

    Now I want to write a query to join the tables but exclude all data for persons that have Role_Name = Controller. Note: I don't just want to exclude those specific rows where the role_id = controller, but I want to exclude ALL the joined rows for any person that is assigned to the Controller role (as well as any other roles).

    So in above case, I need to create a query to extract the following data:

    Role IdRole_NamePerson_IDName

    1 User222Mike

    3 Engineer222Mike

    3 Engineer333Jeff

    In other words, I want to filter out all rows for Dave because he is assigned to the controller role, regardless of the fact that he's also assigned to other roles as well. It seems like this should be easy enough to do, but I'm struggling to find a way. Any help is appreciated.

  • schoenbeck16 (12/8/2016)


    I'm struggling to develop a query to pull data from a few tables while excluding some portion of the data. For simplicity sake, say we have the following tables:

    ROLES

    Role_IDRole_Name

    1User

    2Manager

    3Engineer

    4Controller

    PERSONS

    Person_IDName

    111Dave

    222Mike

    333Jeff

    ROLEPERSONS (Many to Many relationship with both roles/persons)

    Role_IDPerson_ID

    1111

    1222

    2111

    3222

    3333

    4111

    Now I want to write a query to join the tables but exclude all data for persons that have Role_Name = Controller. Note: I don't just want to exclude those specific rows where the role_id = controller, but I want to exclude ALL the joined rows for any person that is assigned to the Controller role (as well as any other roles).

    So in above case, I need to create a query to extract the following data:

    Role IdRole_NamePerson_IDName

    1 User222Mike

    3 Engineer222Mike

    3 Engineer333Jeff

    In other words, I want to filter out all rows for Dave because he is assigned to the controller role, regardless of the fact that he's also assigned to other roles as well. It seems like this should be easy enough to do, but I'm struggling to find a way. Any help is appreciated.

    How about this:

    CREATE TABLE #ROLES (

    Role_ID int,

    Role_Name varchar(15)

    );

    INSERT INTO #ROLES (Role_ID, Role_Name)

    VALUES(1, 'User'),

    (2, 'Manager'),

    (3, 'Engineer'),

    (4, 'Controller');

    CREATE TABLE #PERSONS (

    Person_ID int,

    Name varchar(15)

    );

    INSERT INTO #PERSONS (Person_ID, Name)

    VALUES(111, 'Dave'),

    (222, 'Mike'),

    (333, 'Jeff');

    CREATE TABLE #ROLEPERSONS (

    Role_ID int,

    Person_ID int

    );

    INSERT INTO #ROLEPERSONS (Role_ID, Person_ID)

    VALUES(1, 111),

    (1, 222),

    (2, 111),

    (3, 222),

    (3, 333),

    (4, 111);

    WITH VALID_PERSONS AS (

    SELECT DISTINCT P.Person_ID

    FROM #PERSONS AS P

    WHERE NOT EXISTS (

    SELECT 1

    FROM #ROLES AS R

    INNER JOIN #ROLEPERSONS AS RP

    ON R.Role_ID = RP.Role_ID

    WHERE R.Role_Name = 'Controller'

    AND RP.Person_ID = P.Person_ID

    )

    )

    SELECT R.Role_ID, R.Role_Name, P.Person_ID, P.Name

    FROM #ROLES AS R

    INNER JOIN #ROLEPERSONS AS RP

    ON R.Role_ID = RP.Role_ID

    AND R.Role_Name <> 'Controller'

    INNER JOIN #PERSONS AS P

    ON RP.Person_ID = P.Person_ID

    INNER JOIN VALID_PERSONS AS VP

    ON P.Person_ID = VP.Person_ID

    WHERE R.Role_Name <> 'Controller'

    ORDER BY P.Person_ID;

    DROP TABLE #ROLES;

    DROP TABLE #PERSONS;

    DROP TABLE #ROLEPERSONS;

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • It's difficult to tell without knowing the indexes on these tables, but I'm thinking Steve's query is more expensive since it's hitting all 3 base tables twice, and the only one that would really need to be hit twice is #ROLEPERSONS, and just one extra seek to #ROLES. So maybe something like this would be quicker and easier to read:

    DECLARE @ControllerID INT;

    SELECT @ControllerID = Role_ID FROM #ROLES WHERE Role_Name = 'Controller';

    SELECT R.Role_ID, R.Role_Name, P.Person_ID, P.Name

    FROM #ROLES AS R

    INNER JOIN #ROLEPERSONS AS RP ON R.Role_ID = RP.Role_ID

    INNER JOIN #PERSONS AS P ON RP.Person_ID = P.Person_ID

    WHERE NOT EXISTS (SELECT NULL FROM #ROLEPERSONS rp2 WHERE rp2.Person_ID = p.Person_ID AND rp2.Role_ID = @ControllerID)

    ORDER BY P.Person_ID;

  • Thank you sir, this works great!

  • Chris Harshman (12/9/2016)


    It's difficult to tell without knowing the indexes on these tables, but I'm thinking Steve's query is more expensive since it's hitting all 3 base tables twice, and the only one that would really need to be hit twice is #ROLEPERSONS, and just one extra seek to #ROLES. So maybe something like this would be quicker and easier to read:

    DECLARE @ControllerID INT;

    SELECT @ControllerID = Role_ID FROM #ROLES WHERE Role_Name = 'Controller';

    SELECT R.Role_ID, R.Role_Name, P.Person_ID, P.Name

    FROM #ROLES AS R

    INNER JOIN #ROLEPERSONS AS RP ON R.Role_ID = RP.Role_ID

    INNER JOIN #PERSONS AS P ON RP.Person_ID = P.Person_ID

    WHERE NOT EXISTS (SELECT NULL FROM #ROLEPERSONS rp2 WHERE rp2.Person_ID = p.Person_ID AND rp2.Role_ID = @ControllerID)

    ORDER BY P.Person_ID;

    I don't think this one will work as is, since the variable will only hold the last value selected.

  • schoenbeck16 (12/9/2016)


    ...

    I don't think this one will work as is, since the variable will only hold the last value selected.

    OK, I thought there was only 1 row in the ROLES table for the role_name Controller. I broke the problem down into figuring out that ID, then excluding the people (WHERE NOT EXISTS) that have that Controller role. How many roles are Controller?

  • Chris Harshman (12/9/2016)


    schoenbeck16 (12/9/2016)


    ...

    I don't think this one will work as is, since the variable will only hold the last value selected.

    OK, I thought there was only 1 row in the ROLES table for the role_name Controller. I broke the problem down into figuring out that ID, then excluding the people (WHERE NOT EXISTS) that have that Controller role. How many roles are Controller?

    Yeah, I tried to simplify it for the sake of posting. We have different controllers for different sites so there are actually around 20 in total.

  • >> For simplicity sake, say we have the following tables:<<

    It would have been simpler if you posted real DDL, complete with constraints keys and references. What we have here is a real mess. Let us try and fix it okay?

    There are no such data elements as “role” or “person” in a valid model. These terms are too vague and general (and role has a specific meaning and metadata) to be valid column names. Your first table seems to be jobs and the job id looks to be redundant. I know that when you are first learning RDBMS, you want to have a record number on all your tables. That is not what a key is. The real database you might have had a DOT (dictionary of occupational titles) code here, but for our tutorial. Let us go ahead and get rid of the redundancy here.

    CREATE TABLE Jobs

    (job_title CHAR(15) NOT NULL PRIMARY KEY);

    INSERT INTO Jobs

    VALUES ('user'), ('Manager'), ('Engineer'), ('Controller');

    But then we have to ask why are these job titles in their own table? We might want to have considered putting them into a CHECK (job_title IN (..))Clause depending on how static or dynamic the data is and how big it is.

    CREATE TABLE Personnel

    (emp_id CHAR(3) NOT NULL PRIMARY KEY,

    emp_name VARCHAR(10) NOT NULL);

    Do you remember in elementary school when you are given the definition of a noun is a word that names a person place or thing? That is why you can never have a column simply named “persons” ; It is too generic to be meaningful (if you ever have a course in basic logic? One of the fundamental principles is “the law of identity”; it states that “to be is to be something in particular to be something in general or nothing in particular at all is to be nothing.” This actually predates Aristotle!).

    INSERT INTO Personnel

    VALUES

    ('111', 'Dave'),

    ('222', 'Mike'),

    ('333', 'Jeff');

    Obviously in a real schema, the identifier of an employee would be some sort of industry-standard encoding (Social Security number in the United States, social insurance number in Canada, etc.)

    CREATE TABLE Job_Assignments

    (job_title CHAR(12) NOT NULL REFERENCES Jobs,

    emp_id CHAR(3) NOT NULL REFERENCES Personnel,

    PRIMARY KEY (job_title, emp_id));

    Did you know that by definition – let me repeat, that by definition –Has to have a key? Did you see the references clauses in this DDL? That is why your narrative is useless; we need to know how you expressed your 1:m or m:n relationships in your DDL. In order to help you we have to do everything you fail to do for us; When someone is this rude, do you are you inclined to help them?

    INSERT INTO Job_Assignments

    VALUES

    ('user', '111'),

    ('user', '222'),

    ('Manger', '111'),

    ('Engineer', '222'),

    ('Engineer', '333'),

    ('Controller', '111');

    >> Now I want to write a query to join the tables but exclude all data for Personnel that have job_title = ‘Controller’. Note: I don't just want to exclude those specific rows where the job_id = controller, but I want to exclude ALL the joined rows for any person that is assigned to the Controller role (as well as any other jobs). <<

    Let’s do this with CTE’s so you can see the steps:

    WITH

    Non_Controllers

    AS

    (SELECT emp_id

    FROM Job_Assignments

    WHERE job_title <> ‘Controller')

    SELECT *

    FROM Job_Assignments

    WHERE emp_id IN (SELECT * FROM Non_Controllers));

    You can fold the CTE into the from clause of the query, if you wish. But this is just a little easier to read.

    Books in Celko Series for Morgan-Kaufmann Publishing
    Analytics and OLAP in SQL
    Data and Databases: Concepts in Practice
    Data, Measurements and Standards in SQL
    SQL for Smarties
    SQL Programming Style
    SQL Puzzles and Answers
    Thinking in Sets
    Trees and Hierarchies in SQL

  • CELKO (12/13/2016)

    Obviously in a real schema, the identifier of an employee would be some sort of industry-standard encoding (Social Security number in the United States, social insurance number in Canada, etc.)

    It wouldn't be SSN, because (1) that can change and (2) it might not technically be legal in certain use cases for a private company to use that.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • It wouldn't be SSN, because (1) that can change and (2) it might not technically be legal in certain use cases for a private company to use that.

    Yes, but it is also required for filing taxes and other things. Our Social Security number has just gotten much worse for other reasons; it used to come in three distinct parts. The could be checked for the regular expression or parser. But a few years ago it was converted into just a random nine digit number thanks to all of the illegals in the United States.

    Books in Celko Series for Morgan-Kaufmann Publishing
    Analytics and OLAP in SQL
    Data and Databases: Concepts in Practice
    Data, Measurements and Standards in SQL
    SQL for Smarties
    SQL Programming Style
    SQL Puzzles and Answers
    Thinking in Sets
    Trees and Hierarchies in SQL

  • CELKO (12/13/2016)


    It wouldn't be SSN, because (1) that can change and (2) it might not technically be legal in certain use cases for a private company to use that.

    Yes, but it is also required for filing taxes and other things. Our Social Security number has just gotten much worse for other reasons; it used to come in three distinct parts. The could be checked for the regular expression or parser. But a few years ago it was converted into just a random nine digit number thanks to all of the illegals in the United States.

    I don't want to have to give my ssn to anyone else, even inside my own company, just to do some company activity. Yes, the company needs to store my ssn (encrypted, of course) but they don't need to use it as any type of "identifier" except where it's legally required to be used (IRS filings, etc.). As you yourself well know, most companies use a unique employee number as identification.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • CELKO (12/13/2016)


    >> For simplicity sake, say we have the following tables:<<

    It would have been simpler if you posted real DDL, complete with constraints keys and references. What we have here is a real mess. Let us try and fix it okay?

    There are no such data elements as “role” or “person” in a valid model. These terms are too vague and general (and role has a specific meaning and metadata) to be valid column names. Your first table seems to be jobs and the job id looks to be redundant. I know that when you are first learning RDBMS, you want to have a record number on all your tables. That is not what a key is. The real database you might have had a DOT (dictionary of occupational titles) code here, but for our tutorial. Let us go ahead and get rid of the redundancy here.

    CREATE TABLE Jobs

    (job_title CHAR(15) NOT NULL PRIMARY KEY);

    INSERT INTO Jobs

    VALUES ('user'), ('Manager'), ('Engineer'), ('Controller');

    But then we have to ask why are these job titles in their own table? We might want to have considered putting them into a CHECK (job_title IN (..))Clause depending on how static or dynamic the data is and how big it is.

    CREATE TABLE Personnel

    (emp_id CHAR(3) NOT NULL PRIMARY KEY,

    emp_name VARCHAR(10) NOT NULL);

    Do you remember in elementary school when you are given the definition of a noun is a word that names a person place or thing? That is why you can never have a column simply named “persons” ; It is too generic to be meaningful (if you ever have a course in basic logic? One of the fundamental principles is “the law of identity”; it states that “to be is to be something in particular to be something in general or nothing in particular at all is to be nothing.” This actually predates Aristotle!).

    INSERT INTO Personnel

    VALUES

    ('111', 'Dave'),

    ('222', 'Mike'),

    ('333', 'Jeff');

    Obviously in a real schema, the identifier of an employee would be some sort of industry-standard encoding (Social Security number in the United States, social insurance number in Canada, etc.)

    CREATE TABLE Job_Assignments

    (job_title CHAR(12) NOT NULL REFERENCES Jobs,

    emp_id CHAR(3) NOT NULL REFERENCES Personnel,

    PRIMARY KEY (job_title, emp_id));

    Did you know that by definition – let me repeat, that by definition –Has to have a key? Did you see the references clauses in this DDL? That is why your narrative is useless; we need to know how you expressed your 1:m or m:n relationships in your DDL. In order to help you we have to do everything you fail to do for us; When someone is this rude, do you are you inclined to help them?

    INSERT INTO Job_Assignments

    VALUES

    ('user', '111'),

    ('user', '222'),

    ('Manger', '111'),

    ('Engineer', '222'),

    ('Engineer', '333'),

    ('Controller', '111');

    >> Now I want to write a query to join the tables but exclude all data for Personnel that have job_title = ‘Controller’. Note: I don't just want to exclude those specific rows where the job_id = controller, but I want to exclude ALL the joined rows for any person that is assigned to the Controller role (as well as any other jobs). <<

    Let’s do this with CTE’s so you can see the steps:

    WITH

    Non_Controllers

    AS

    (SELECT emp_id

    FROM Job_Assignments

    WHERE job_title <> ‘Controller')

    SELECT *

    FROM Job_Assignments

    WHERE emp_id IN (SELECT * FROM Non_Controllers));

    You can fold the CTE into the from clause of the query, if you wish. But this is just a little easier to read.

    I don't disagree with what you are saying, but this DB and all of its related tables are part of a vendor out of the box software - we did not design it this way. That said, next time I will remember to include keys/references/constraints to assist. Thanks for your reply.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply