September 2, 2008 at 12:11 pm
Hello everyone,
DECLARE @MyTable TABLE
(
RowID bigint,
RowKey int,
Locale varchar(10),
Descr nvarchar(1000)
)
INSERT INTO @MyTable (RowID,RowKey,Locale,Descr)
SELECT '1','5','en','Sample Description 1' UNION ALL
SELECT '2','5','de','Sample Description 2' UNION ALL
SELECT '3','4','en','Sample Description 3' UNION ALL
SELECT '4','3','en','Sample Description 4' UNION ALL
SELECT '5','2','en','Sample Description 5' UNION ALL
SELECT '6','1','en','Sample Description 6' UNION ALL
SELECT '7','1','de','Sample Description 7' UNION ALL
SELECT '8','2','en','Sample Description 8' UNION ALL
SELECT '9','3','en','Sample Description 9' UNION ALL
SELECT '10','5','en','Sample Description 10' UNION ALL
SELECT '11','4','en','Sample Description 11' UNION ALL
SELECT '12','3','en','Sample Description 12' UNION ALL
SELECT '13','2','en','Sample Description 13' UNION ALL
SELECT '14','1','en','Sample Description 14' UNION ALL
SELECT '15','3','de','Sample Description 15'
SELECT * FROM @MyTable
In the above data, How do I get the range of RowIDs in such a way that there will be no duplicate keys(RowKey) in a certain RowID range?
I need to check both the [RowKey] and [Locale] columns combination in getting the range.
So the above data shall output/result the following range:
RowID
8
12
16
So the range from 1-8 will cover only the following range of data with no duplicate based on the column
combination [RowKey] & [Locale]:
RowID RowKey Locale Descr
15enSample Description 1
2 5deSample Description 2
34enSample Description 3
43enSample Description 4
52enSample Description 5
61enSample Description 6
71deSample Description 7
then the range of 8-12 will cover only the following range of data:
RowID RowKey Locale Descr
8 2enSample Description 8
9 3enSample Description 9
105enSample Description 10
114enSample Description 11
then the range of 12-16 will cover only the following range of data:
RowID RowKey Locale Descr
123enSample Description 12
13 2enSample Description 13
141enSample Description 14
153deSample Description 15
You may be wondering why there is RowID 16 in the output, the reason is that I need to get the MAX(RowID)+1 to cover all the range needed.
Anyone who knows how to get these range ids?
thank you very much for your help.
- dans
September 2, 2008 at 1:12 pm
I'm afraid I don't understand the relationship between RowID and RowKey. How do you determine the range in which no RowKey can be duplicate?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
September 2, 2008 at 1:31 pm
RowIDs are just for row counters on which to return the range result. What I need to look at is the combination of the two columns [RowKey] and [Locale] as what I've mentioned above. If there is no RowKey & Locale can be duplicate then the range result is
RowID
16
which means that the range is from 1-16 since there's no duplicates are found within this range.
thanks a lot for your time and help.
September 2, 2008 at 2:46 pm
So, what you're trying to find is how many in a row have no duplicates? Is that correct?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
September 2, 2008 at 4:04 pm
Actually no, I'm trying to find the interleaving rows of unique data (trying to look at the combination of RowKey and Locale column). In the above sample, you can see that the range result doesn't have duplicates for every range of RowID's.
So from 1-8 range of RowID's which has the data below:
First Range
RowID RowKey Locale Descr
15enSample Description 1
2 5deSample Description 2
34enSample Description 3
43enSample Description 4
52enSample Description 5
61enSample Description 6
71deSample Description 7
each of RowKey & Locale column data combination are unique(meaning no duplicates), RowID 8 is not included on the first range since it has the same (RowKey & Locale) data with RowID 5 which is within the first range already.
So, RowID 8 will belong to the second range, as shown below:
Second Range
RowID RowKey Locale Descr
8 2enSample Description 8
9 3enSample Description 9
105enSample Description 10
114enSample Description 11
Since RowID 12 (RowKey & Locale) data is the same with RowID 9, it won't be included in the second range 8-12 instead on the third range which is 12-16:
Third Range
RowID RowKey Locale Descr
123enSample Description 12
13 2enSample Description 13
141enSample Description 14
153deSample Description 15
Hope this will help enlighten on what I am about to achieve.
Thank you for your help.
-dans
September 2, 2008 at 4:34 pm
dsegalles80 (9/2/2008)
Actually no, I'm trying to find the interleaving rows of unique data
Are you trying to receive multiple resultsets....
RowID RowKey Locale Descr
153deSample Description 15
Also from reading this wouldn't RowID 15 be part of the first set.
September 3, 2008 at 7:32 am
Use the Row_Number() function, partition by location and row key, to find the duplicates. Then select from one to the next, not including the next duplicate. That should give you what you want.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
September 3, 2008 at 8:28 am
To parackson:
RowID 15 shouldn't be part of first set since this should be in sequence and this is a stored procedure which returns directly the result (range of row ids)
To GSquared:
Thank you for the idea about the partition but if can give me the sample source code in where it can output the correct result, then I would really appreciate it.
It's something like this:
SELECT RowID,RowKey,
row_number() OVER (PARTITION BY RowKey,Locale ORDER BY RowID) AS DupRank
FROM @MyTable
which will give you the result with ranking for every duplicate, but how would I select/traverse from one to the next, not including the next duplicate?
thanks so much for the help.
- dans
September 3, 2008 at 1:31 pm
Yes, that use of Row_Number will give you the ones that are duplicates.
What you have to do at that point is find the first one where the row number is higher than 1, and end your set there. Well, one row before that (you can use your row ID for that part).
Then start at that one, and re-run the duplicate check.
This will have to be a recursive function of some sort.
It's not going to perform well, but SQL isn't designed to deal with rows with an inherent, complex sequentiality since relational databases are inherently non-ordered.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
September 3, 2008 at 2:44 pm
I see. What we are dealing here is a huge data like about 2 Million or more with possible pattern as to my sample. Aside from this, I'm also looking for the optimize possible solution to generate the exact result on this. Thank you for your inputs GSquared though it is much appreciated if code samples are displayed here to have a clearer view on what you mean.
I really appreciate your help.
-dans
September 3, 2008 at 5:17 pm
Hi dans,
You might try something like in the attached example.
I have not spent too much time verifying it, so it might contain a few errors.
But the concept might be helpful.
I have tested it with 5000 copies of the original data you posted, and the query ran for approximately 1:30 minutes for 75000 RowIDs and 15000 Ranges.
Since the ranges were very small in the example, you might be able to achieve better results
with your real table if there alre also bigger ranges.
You might also try to play around with the amount by which the OptRows is increased and decreased on each test.
Best Regards,
Chris Büttner
September 3, 2008 at 5:41 pm
I gotta ask... why do you need to do this? What is the purpose of this?
--Jeff Moden
Change is inevitable... Change for the better is not.
September 4, 2008 at 7:51 am
hmm... interesting, how about having a counter that resets to 1 when it encounters a duplicate... i'm not sure if this is applicable/achievable using partitioning/ranking,
my thought would be
RowIDRowKeyCounter
151
251
342
433
524
611
711
Then selecting all Counter with value of 1
RowIDRowKeyCounter
151
251
611
711
September 4, 2008 at 8:45 am
jj, the problem with that is it has to be a duplicate, since the last duplicate.
Getting all the duplicates is easy. Getting the duplicates since the last duplicate is a pain.
It means that, if row 8 is a duplicate of row 3, and row 10 is a duplicate of row 5, then row 10 doesn't count as a duplicate, because 5 is before 8.
I don't see a way around recursive processing. A function could take a "start row" input parameter, finds the first duplicate, return the ID, and then call itself with that as the new start point. That could give you a list of the start and end of each set.
Of course, recursion is RBAR's slightly less ugly cousin, but I don't see a way around it in this case.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
September 4, 2008 at 9:16 am
GSquared (9/4/2008)
jj, the problem with that is it has to be a duplicate, since the last duplicate.Getting all the duplicates is easy. Getting the duplicates since the last duplicate is a pain.
It means that, if row 8 is a duplicate of row 3, and row 10 is a duplicate of row 5, then row 10 doesn't count as a duplicate, because 5 is before 8.
I don't see a way around recursive processing. A function could take a "start row" input parameter, finds the first duplicate, return the ID, and then call itself with that as the new start point. That could give you a list of the start and end of each set.
Of course, recursion is RBAR's slightly less ugly cousin, but I don't see a way around it in this case.
hmm, sounds like its going to be messy.
lets try not to go into cursors/loops/recursion.. this is indeed challenging. if not impossible.
How about, Counter only Resets to 1 for every duplicate key encountered in a range
RowID RowKey Counter
1 1 1
2 2 2
3 3 3
4 3 1
5 2 2
6 1 3
7 3 1
8 2 2
9 2 1
10 4 2
and get all the 1
RowID RowKey Counter
1 1 1
4 3 1
7 3 1
9 2 1
Viewing 15 posts - 1 through 15 (of 31 total)
You must be logged in to reply to this topic. Login to reply