number the Group on data from two columns duplicated

Question

Post reply

number the Group on data from two columns duplicated

shaini.binu

SSC Enthusiast

Points: 187
More actions
January 14, 2010 at 11:07 am

#229155

Hi All,
i have a table LinkTable which contains two intger columns ID1 and ID2 which may contain duplicate data interchanged in different rows
Sample data of the table,
SID(primary key)| ID1 | ID2
=================================
1 | 101 | 102
2 | 102 | 101
3 | 103 | 104
4 | 103 | 105
5 | 104 | 105
6 | 106 | 104
7 | 107 | 108
I need a sql query which will form groups based on values of ID1 and ID2 and need to number the group starting from 1
ID1 | ID2
================================================
101 | 102 => 1 and 2 will go to group 1
102 | 101 => Already 1 and 2 is added so ignore
103 | 104 => 3 and 4 will make group 2
103 | 105 => 5 will go to group 2 as its ID1 is 3, which in turn is part of group 1
104 | 105 => ignore since 4 and 5 already added
106 | 104 => 6 will go to group 2 as its ID2 4 is of group 2
107 | 108 => 7 and 8 will make group 3
so the result will look like this :
Group | Member
=======|============
1 | 101
1 | 102
2 | 103
2 | 104
2 | 105
2 | 106
3 | 107
3 | 108
Please help me to solve this problem,
Thanks
Sh.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

LutzM SSC Guru Points: 107049 More actions · Answer 1

Would you mind sharing the business case that would use such a strange data structure?

Regarding your requirement:

Am I right that you'd need to start a new group as soon as neither one of the numbers in one row has been previously assigned to a group?

If so, how would you handle row 6 to be "6 | 101 | 104" instead of "6 | 106 | 104" (both values in a row are preassigned, but to different groups)?

Are the groups supposed to be consecutive or can you "jump back" to let's say group 1 if row 8 would be 8 | 101 | 109?

Again: really strange requirement...

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

shaini.binu SSC Enthusiast Points: 187 More actions · Answer 2

Hi Lutz,

Thanks for the quick reply,

Regarding the requirement ,

In LinkTable , ID1 contains the uniqueid of a transaction and ID2 contans the uniqueid of the transaction related to it,

One transaction can be related to 0 or many transactions,

We are in the process of creating specific groups for all related transaction.

Quote: Am I right that you'd need to start a new group as soon as neither one of the numbers in one row has been previously assigned to a group?

Ans: Yes, you are correct ,

Quote:If so, how would you handle row 6 to be "6 | 101 | 104" instead of "6 | 106 | 104" (both values in a row are preassigned, but to different groups)?

Are the groups supposed to be consecutive or can you "jump back" to let's say group 1 if row 8 would be 8 | 101 | 109?

Ans :

If the linktable contains data :

SID(primary key)|ID1 |ID2

====================

1| 101 | 102

2| 102 | 101

3| 103 | 104

4| 103 | 105

5| 104 | 105

6| 101 | 104

7| 107 | 108

8| 101 | 109

The expected output is:

GroupNum| ID

==========

1| 101

1| 102

1| 103

1| 104

1| 105

2| 107

2| 108

1| 109

Thanks & Regards

Sh.

LutzM SSC Guru Points: 107049 More actions · Answer 3

I thought about applying the concept of a hierarchical CTE (aka recursive CTE, see BOL for details). Basically, I would have used the smaller ID per row as "manager" and the other number as "employee" (this would actually take care of (101,102) and (102,101) not to cause an infinite loop.

But after spending quite some time with it, I have to give up.

The sample data you provided do not show any consistent logical pattern to apply a hierarchy structure to.

Examples:

In your second set of sample "employee" 105 would have to report to "manager" 104 and both have to report to 103 at the same time.

As far as I can see, the only way to solve that puzzle would be using a *cough* cursor *cough**cough* since it would handle one row at a time, ignoring inconsistent data. (Un)fortunately, my SQL Server version is not capable of programming a cursor (or maybe I'm not. Can't tell.) 😉

But the much more recommended way would be to get normalized data. Those could be transformed using hierarchy tools like recursive CTEs.

Btw: I can't think of a business case using transaction IDs that are related in a many to many relationship including circular reference across several levels...

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Jeff Moden SSC Guru Points: 1004401 More actions · Answer 4

shaini.binu (1/14/2010)
Hi Lutz,
Thanks for the quick reply,
Regarding the requirement ,
In LinkTable , ID1 contains the uniqueid of a transaction and ID2 contans the uniqueid of the transaction related to it,
One transaction can be related to 0 or many transactions,
We are in the process of creating specific groups for all related transaction.
[font="Arial Black"]Quote: Am I right that you'd need to start a new group as soon as neither one of the numbers in one row has been previously assigned to a group?
Ans: Yes, you are correct ,[/font]
Quote:If so, how would you handle row 6 to be "6 | 101 | 104" instead of "6 | 106 | 104" (both values in a row are preassigned, but to different groups)?
Are the groups supposed to be consecutive or can you "jump back" to let's say group 1 if row 8 would be 8 | 101 | 109?
Ans :
If the linktable contains data :
SID(primary key)|ID1 |ID2
====================
1| 101 | 102
2| 102 | 101
3| 103 | 104
4| 103 | 105
5| 104 | 105
6| 101 | 104
7| 107 | 108
8| 101 | 109
The expected output is:
GroupNum| ID
==========
1| 101
1| 102
1| 103
1| 104
1| 105
2| 107
2| 108
1| 109
Thanks & Regards
Sh.

Then why doesn't 103, 104, and 105 live in a group 2 by themselves?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

LutzM SSC Guru Points: 107049 More actions · Answer 5

Jeff Moden (1/14/2010)
shaini.binu (1/14/2010)
[font="Arial Black"]Quote: Am I right that you'd need to start a new group as soon as neither one of the numbers in one row has been previously assigned to a group?
Ans: Yes, you are correct ,[/font]
Ans :
If the linktable contains data :
SID(primary key)|ID1 |ID2
====================
1| 101 | 102
2| 102 | 101
3| 103 | 104
4| 103 | 105
5| 104 | 105
6| 101 | 104
7| 107 | 108
8| 101 | 109
The expected output is:
GroupNum| ID
==========
1| 101
1| 102
1| 103
1| 104
1| 105
2| 107
2| 108
1| 109
Thanks & Regards
Sh.
Then why doesn't 103, 104, and 105 live in a group 2 by themselves?

As far as I would read it, row #6 makes 103 to 105 belonging to group 1.

Instead of "previously assigned" the term probably should be "assigned anywhere in the table".

I think to solve this puzzle you'd need to order the rows asc by the value of either id1 or id2 (per row), depending on which one is less than the other...

Using the example from above you'd end up with

SID(primary key)|ID1 |ID2

====================

1| 101 | 102

2| 101 | 102 (ID1 and ID2 switched)

6| 101 | 104

8| 101 | 109

3| 103 | 104

4| 103 | 105

5| 104 | 105

7| 107 | 108

... at least that's what I guess should be done to get the expected result...

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

shaini.binu SSC Enthusiast Points: 187 More actions · Answer 6

Hi,

I have solved the puzzle with the help of cursurs, Thanks for all the help,

Thanks

Sh.

Jeff Moden SSC Guru Points: 1004401 More actions · Answer 7

shaini.binu (1/15/2010)
Hi,
I have solved the puzzle with the help of cursurs, Thanks for all the help,
Thanks
Sh.

Two way street here, shaini... please post the code for your solution.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

aaron.reese SSChampion Points: 13498 More actions · Answer 8

Would I be correct in thinking that these are reconciliation groups?

If you have 3 receipts and 10 invoices with cross associations in the reconciliation they all need to be in the same reconciliation group....

not sure exactly how to solve the problem yet, but cursors is NOT a nice solution.

Which version on MSSQL are you on.

Jeff Moden SSC Guru Points: 1004401 More actions · Answer 9

Jeff Moden (1/15/2010)
shaini.binu (1/15/2010)
Hi,
I have solved the puzzle with the help of cursurs, Thanks for all the help,
Thanks
Sh.
Two way street here, shaini... please post the code for your solution.

Heh... so much for Shaini... 😉 Stuck with cursors and doesn't even know he has a problem... yet.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

aaron.reese SSChampion Points: 13498 More actions · Answer 10

This kept me awake half the night - I knew it shouldn't be too difficult. (would have been a nightmare in SQL2000)

If you were to put this in a production environment or an sp, you could wrap the repeat block in a do while...loop

I think the current data has to go through 4 iterations to get the required results

begin try

drop table #x1

end try

begin catch

-- no table to drop

end catch

begin try

drop table #x2

end try

begin catch

-- no table to drop

end catch

-- bulid the base data

create table #x1

(

IDint,

value1 int,

value2 int

)

insert into #x1 (ID,value1,value2) values (1, 101 , 102)

insert into #x1 (ID,value1,value2) values (2, 102 , 101 )

insert into #x1 (ID,value1,value2) values (3, 103 , 104 )

insert into #x1 (ID,value1,value2) values (4, 103 , 105 )

insert into #x1 (ID,value1,value2) values (5, 104 , 105 )

insert into #x1 (ID,value1,value2) values (6, 101 , 104 )

insert into #x1 (ID,value1,value2) values (7, 107 , 108 )

insert into #x1 (ID,value1,value2) values (8, 101 , 109)

-- normalise the data

Create table #x2

(

ID int identity(1,1),

PairID int,

Value int,

GroupID int

)

insert into #x2 select ID,value1,ID from #X1

insert into #x2 select ID,value2,ID from #X1

/* repeat the bit below until you get 0 rows affected */

-- get the lowest grouping for each value

with mingroup as

(select value,min(groupID) as 'mingroupID' from #x2 group by value

)

,

x3 as

(

select

x.PairID,

x.value,

x.GroupID,

mg.mingroupID

from

#x2 x

join

mingroup mg on x.value = mg.value

)

,

x4 as(

select pairID,min(mingroupID) as mingroupID from x3 group by PairID

)

update #x2

set #x2.GroupID = x4.mingroupID

from #x2 inner join x4 on #x2.pairID = x4.pairID

where #x2.GroupID <> x4.mingroupID

/* End Repeat */

-- show the full results

select distinct groupID,Value from #X2 order by GroupID,Value