Self Join Query

  • Hello Peers,

    I need some help in framing a self join query. The requirement is such that I need to port the data from one data model which doesn't have any foreign key constraint on the table but the other data model has a foreign key constraint. My aim is to eliminate those records which doesn't have the proper relation.

    Table 1 structure in data model 1 is as follows:

    ==================================

    CREATE TABLE dept

    (

    dept_id int Primary Key Not Null,

    dept_name varchar(100) Not Null,

    parent_id int Null

    )

    Data insertion script for Table 1

    ========================

    INSERT INTO dbo.dept VALUES(1, 'Test', 0)

    INSERT INTO dbo.dept VALUES(2, 'Test1', 1)

    INSERT INTO dbo.dept VALUES(3, 'Test2', 1)

    INSERT INTO dbo.dept VALUES(4, 'Test3', 2)

    INSERT INTO dbo.dept VALUES(5, 'Test4', 4)

    INSERT INTO dbo.dept VALUES(6, 'Test5', 0)

    INSERT INTO dbo.dept VALUES(7, 'Test6', 8)

    INSERT INTO dbo.dept VALUES(9, 'Test7', 6)

    INSERT INTO dbo.dept VALUES(10, 'Test8', 7)

    INSERT INTO dbo.dept VALUES(11, 'Test9', 100)

    INSERT INTO dbo.dept VALUES(12, 'Test10', 14)

    INSERT INTO dbo.dept VALUES(13, 'Test11', 4)

    INSERT INTO dbo.dept VALUES(14, 'Test12', 0)

    Table 2 structure in data model 2 is as follows:

    ==================================

    CREATE TABLE dept_hierarchy

    (

    dept_id int Primary Key Not Null,

    dept_name varchar(100) Not Null,

    parent_id int Null

    )

    ALTER TABLE dbo.dept_hierarchy WITH CHECK CONSTRAINT [FK_ParentId_DeptId] FOREIGN KEY (parent_id) REFERENCES dbo.dept_hierarchy(dept_id)

    I need to insert the records where table1.dept_id in (1, 2, 3, 4, 5, 6, 9, 12, 13, 14) into table2.dept_hierarchy.

    I would appreciate for a quick reply.

    Thanks in advance.


    Lucky

  • First, I would like to know why would you need to have a separate tables for the same entity? From this, may be we can help you eliminate incorrect data in the first place?

    --Ramesh


  • Ramesh (2/15/2009)


    First, I would like to know why would you need to have a separate tables for the same entity? From this, may be we can help you eliminate incorrect data in the first place?

    Hello Ramesh,

    As I quoted in my initial post, we are moving from one data model to another data model. In order to port the existing data from the base model to the new model, I need to first eliminate the improper relations from the existing data due to "WITH CHECK FOREIGN KEY" constraint on the same table in the new model. I think you are confused with seperate table (entity) names. I provided for easy understanding and they are not in the same database.

    Hope I'm clear this time.

    Thanks in advance.


    Lucky

  • I believe you are looking for a recursive way to find all those dept that have a valid root dept by eliminating those that have in invalid root.

    i.e:

    (7, 'Test6', 8) and (10, 'Test8', 7)

    dept 7 is not valid because 8 does not exists.

    dept 10 is also not valid because the parent of 7 does not exists.

    And this condition can continue in even more levels like:

    (20,'',10) None of these are valid because the lack of dept 8 creates a tree of invalid depts.

    (30,'',20)

    (40,'',30)

    A solution to this is to create a recursive CTE. You start by getting a baseline of root dept_ids and recursively selecting the children of those departments. You can then join the CTE with the table to get the children dept. This will continue to return rows with valid child-parent links.

    WITH DEPT_TREE (dept_id, parent_id) as

    (

    --select all the root depts parent_id = 0

    select dept_id, parent_id from dept(nolock)

    where parent_id=0

    union all

    --select all the children depts associated to the root

    --and recursively move to the next levels until there

    --are no more dept.

    select a.dept_id, a.parent_id from dept(nolock) a

    inner join DEPT_TREE b on a.parent_id = b.dept_id

    )

    select * from DEPT_TREE

    order by dept_id

    I hope this helps.

  • Hello ozkary,

    Thanks for the reply.

    This is exactly what I've done before posting to the forum but somehow it is giving odd results. Every line matches to the one you posted.

    I'll debug and will post an update if anything works out.


    Lucky

  • Perhaps null values are giving you different results. You should check for null values on either side of the join (parent_id or dept_id). You can add a isnull(col,-1) around the values.

    I hope that helps

  • Part of the problem is that you don't have nulls where you do need them. You want to have a self-referential chain of departments and sub-departments. That is fine, but you have to have at least one root or super department. This means departments with NULL in the parent_id field. I submit the following having assumed that those are the ones where you have zero in that field.

    create table dept (

    dept_id int Primary Key Not Null,

    dept_name varchar(100) Not Null,

    parent_id int Null

    );

    go

    -- There must be at least one dept at the top of the

    -- reference chain. Assuming those are the ones with

    -- parent_id = 0, you have to change the 0 to NULL.

    -- In your case, with existing data, this would be

    -- an update statement:

    -- update dept

    -- set parent_id = null

    -- where parent_id = 0;

    insert into dbo.dept

    (dept_id, dept_name, parent_id)

    select 1, 'Test', null union all --<< 0 value changed to NULL

    select 2, 'Test1', 1 union all

    select 3, 'Test2', 1 union all

    select 4, 'Test3', 2 union all

    select 5, 'Test4', 4 union all

    select 6, 'Test5', null union all --<< 0 value changed to NULL

    select 7, 'Test6', 8 union all

    select 9, 'Test7', 6 union all

    select 10, 'Test8', 7 union all

    select 11, 'Test9', 100 union all

    select 12, 'Test10', 14 union all

    select 13, 'Test11', 4 union all

    select 14, 'Test12', null; --<< 0 value changed to NULL

    create table dept_hierarchy (

    dept_id int Primary Key Not Null,

    dept_name varchar(100) Not Null,

    parent_id int Null

    );

    go

    -- You may want to add the FK /after/ the data transfer if

    -- there are many rows to transfer.

    ALTER TABLE dbo.dept_hierarchy WITH CHECK

    add CONSTRAINT [FK_ParentId_DeptId]

    FOREIGN KEY (parent_id)

    REFERENCES dbo.dept_hierarchy(dept_id);

    go

    -- Execute this step once to start the chain

    insert dbo.dept_hierarchy

    (dept_id, dept_name, parent_id)

    select dept_id, dept_name, parent_id

    from dbo.dept

    where parent_id is null;

    -- The following step must be executed once for every sublevel

    -- in the hierarchy. You know you are finished when there are

    -- no more rows to insert.

    while 1 = 1 begin

    insert dbo.dept_hierarchy

    (dept_id, dept_name, parent_id)

    select d1.dept_id, d1.dept_name, d1.parent_id

    from dbo.dept d1

    join dbo.dept_hierarchy d2

    on d1.parent_id = d2.dept_id

    where d1.dept_id not in(

    select dept_id

    from dbo.dept_hierarchy);

    if @@RowCount = 0

    break;

    end;

    -- Show the result

    select *

    from dbo.dept_hierarchy;

    There may be a way to do this with a recursive CTE, but for a one-shot operation, it's just not worth the effort.

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply