Porcess Range IDs

Question

Porcess Range IDs

Dan Segalles

SSC-Addicted

Points: 412
More actions
September 2, 2008 at 12:11 pm

#192222

Hello everyone,
DECLARE @MyTable TABLE
(
RowID bigint,
RowKey int,
Locale varchar(10),
Descr nvarchar(1000)
)
INSERT INTO @MyTable (RowID,RowKey,Locale,Descr)
SELECT '1','5','en','Sample Description 1' UNION ALL
SELECT '2','5','de','Sample Description 2' UNION ALL
SELECT '3','4','en','Sample Description 3' UNION ALL
SELECT '4','3','en','Sample Description 4' UNION ALL
SELECT '5','2','en','Sample Description 5' UNION ALL
SELECT '6','1','en','Sample Description 6' UNION ALL
SELECT '7','1','de','Sample Description 7' UNION ALL
SELECT '8','2','en','Sample Description 8' UNION ALL
SELECT '9','3','en','Sample Description 9' UNION ALL
SELECT '10','5','en','Sample Description 10' UNION ALL
SELECT '11','4','en','Sample Description 11' UNION ALL
SELECT '12','3','en','Sample Description 12' UNION ALL
SELECT '13','2','en','Sample Description 13' UNION ALL
SELECT '14','1','en','Sample Description 14' UNION ALL
SELECT '15','3','de','Sample Description 15'
SELECT * FROM @MyTable
In the above data, How do I get the range of RowIDs in such a way that there will be no duplicate keys(RowKey) in a certain RowID range?
I need to check both the [RowKey] and [Locale] columns combination in getting the range.
So the above data shall output/result the following range:
RowID
8
12
16
So the range from 1-8 will cover only the following range of data with no duplicate based on the column
combination [RowKey] & [Locale]:
RowID RowKey Locale Descr
15enSample Description 1
2 5deSample Description 2
34enSample Description 3
43enSample Description 4
52enSample Description 5
61enSample Description 6
71deSample Description 7
then the range of 8-12 will cover only the following range of data:
RowID RowKey Locale Descr
8 2enSample Description 8
9 3enSample Description 9
105enSample Description 10
114enSample Description 11
then the range of 12-16 will cover only the following range of data:
RowID RowKey Locale Descr
123enSample Description 12
13 2enSample Description 13
141enSample Description 14
153deSample Description 15
You may be wondering why there is RowID 16 in the output, the reason is that I need to get the MAX(RowID)+1 to cover all the range needed.
Anyone who knows how to get these range ids?
thank you very much for your help.
- dans

Viewing 15 posts - 1 through 15 (of 31 total)

You must be logged in to reply to this topic. Login to reply

GSquared SSC Guru Points: 260824 More actions · Answer 1

I'm afraid I don't understand the relationship between RowID and RowKey. How do you determine the range in which no RowKey can be duplicate?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Dan Segalles SSC-Addicted Points: 412 More actions · Answer 2

RowIDs are just for row counters on which to return the range result. What I need to look at is the combination of the two columns [RowKey] and [Locale] as what I've mentioned above. If there is no RowKey & Locale can be duplicate then the range result is

RowID

16

which means that the range is from 1-16 since there's no duplicates are found within this range.

thanks a lot for your time and help.

GSquared SSC Guru Points: 260824 More actions · Answer 3

So, what you're trying to find is how many in a row have no duplicates? Is that correct?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Dan Segalles SSC-Addicted Points: 412 More actions · Answer 4

Actually no, I'm trying to find the interleaving rows of unique data (trying to look at the combination of RowKey and Locale column). In the above sample, you can see that the range result doesn't have duplicates for every range of RowID's.

So from 1-8 range of RowID's which has the data below:

First Range

RowID RowKey Locale Descr

15enSample Description 1

2 5deSample Description 2

34enSample Description 3

43enSample Description 4

52enSample Description 5

61enSample Description 6

71deSample Description 7

each of RowKey & Locale column data combination are unique(meaning no duplicates), RowID 8 is not included on the first range since it has the same (RowKey & Locale) data with RowID 5 which is within the first range already.

So, RowID 8 will belong to the second range, as shown below:

Second Range

RowID RowKey Locale Descr

8 2enSample Description 8

9 3enSample Description 9

105enSample Description 10

114enSample Description 11

Since RowID 12 (RowKey & Locale) data is the same with RowID 9, it won't be included in the second range 8-12 instead on the third range which is 12-16:

Third Range

RowID RowKey Locale Descr

123enSample Description 12

13 2enSample Description 13

141enSample Description 14

153deSample Description 15

Hope this will help enlighten on what I am about to achieve.

Thank you for your help.

-dans

parackson Mr or Mrs. 500 Points: 532 More actions · Answer 5

dsegalles80 (9/2/2008)
Actually no, I'm trying to find the interleaving rows of unique data

Are you trying to receive multiple resultsets....

RowID RowKey Locale Descr
153deSample Description 15

Also from reading this wouldn't RowID 15 be part of the first set.

GSquared SSC Guru Points: 260824 More actions · Answer 6

Use the Row_Number() function, partition by location and row key, to find the duplicates. Then select from one to the next, not including the next duplicate. That should give you what you want.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Dan Segalles SSC-Addicted Points: 412 More actions · Answer 7

To parackson:

RowID 15 shouldn't be part of first set since this should be in sequence and this is a stored procedure which returns directly the result (range of row ids)

To GSquared:

Thank you for the idea about the partition but if can give me the sample source code in where it can output the correct result, then I would really appreciate it.

It's something like this:

SELECT RowID,RowKey,

row_number() OVER (PARTITION BY RowKey,Locale ORDER BY RowID) AS DupRank

FROM @MyTable

which will give you the result with ranking for every duplicate, but how would I select/traverse from one to the next, not including the next duplicate?

thanks so much for the help.

- dans

GSquared SSC Guru Points: 260824 More actions · Answer 8

Yes, that use of Row_Number will give you the ones that are duplicates.

What you have to do at that point is find the first one where the row number is higher than 1, and end your set there. Well, one row before that (you can use your row ID for that part).

Then start at that one, and re-run the duplicate check.

This will have to be a recursive function of some sort.

It's not going to perform well, but SQL isn't designed to deal with rows with an inherent, complex sequentiality since relational databases are inherently non-ordered.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Dan Segalles SSC-Addicted Points: 412 More actions · Answer 9

I see. What we are dealing here is a huge data like about 2 Million or more with possible pattern as to my sample. Aside from this, I'm also looking for the optimize possible solution to generate the exact result on this. Thank you for your inputs GSquared though it is much appreciated if code samples are displayed here to have a clearer view on what you mean.

I really appreciate your help.

-dans

Christian Buettner-167247 SSChampion Points: 13729 More actions · Answer 10

Hi dans,

You might try something like in the attached example.

I have not spent too much time verifying it, so it might contain a few errors.

But the concept might be helpful.

I have tested it with 5000 copies of the original data you posted, and the query ran for approximately 1:30 minutes for 75000 RowIDs and 15000 Ranges.

Since the ranges were very small in the example, you might be able to achieve better results

with your real table if there alre also bigger ranges.

You might also try to play around with the amount by which the OptRows is increased and decreased on each test.

Best Regards,

Chris Büttner

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 11

I gotta ask... why do you need to do this? What is the purpose of this?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

jj011185 Old Hand Points: 324 More actions · Answer 12

hmm... interesting, how about having a counter that resets to 1 when it encounters a duplicate... i'm not sure if this is applicable/achievable using partitioning/ranking,

my thought would be

RowIDRowKeyCounter

151

251

342

433

524

611

711

Then selecting all Counter with value of 1

RowIDRowKeyCounter

151

251

611

711

GSquared SSC Guru Points: 260824 More actions · Answer 13

jj, the problem with that is it has to be a duplicate, since the last duplicate.

Getting all the duplicates is easy. Getting the duplicates since the last duplicate is a pain.

It means that, if row 8 is a duplicate of row 3, and row 10 is a duplicate of row 5, then row 10 doesn't count as a duplicate, because 5 is before 8.

I don't see a way around recursive processing. A function could take a "start row" input parameter, finds the first duplicate, return the ID, and then call itself with that as the new start point. That could give you a list of the start and end of each set.

Of course, recursion is RBAR's slightly less ugly cousin, but I don't see a way around it in this case.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

jj011185 Old Hand Points: 324 More actions · Answer 14

GSquared (9/4/2008)
jj, the problem with that is it has to be a duplicate, since the last duplicate.
Getting all the duplicates is easy. Getting the duplicates since the last duplicate is a pain.
It means that, if row 8 is a duplicate of row 3, and row 10 is a duplicate of row 5, then row 10 doesn't count as a duplicate, because 5 is before 8.
I don't see a way around recursive processing. A function could take a "start row" input parameter, finds the first duplicate, return the ID, and then call itself with that as the new start point. That could give you a list of the start and end of each set.
Of course, recursion is RBAR's slightly less ugly cousin, but I don't see a way around it in this case.

hmm, sounds like its going to be messy.

lets try not to go into cursors/loops/recursion.. this is indeed challenging. if not impossible.

How about, Counter only Resets to 1 for every duplicate key encountered in a range

RowID RowKey Counter

1 1 1

2 2 2

3 3 3

4 3 1

5 2 2

6 1 3

7 3 1

8 2 2

9 2 1

10 4 2

and get all the 1

RowID RowKey Counter

1 1 1

4 3 1

7 3 1

9 2 1