October 27, 2008 at 7:21 am
I continue to think that a complete statement of the problem coming from the OP has yet to occur. There is no written description of the EXACT process to follow for any given record within a group of A's. I'm not even sure the OP knows what he wants.
Dan - PLEASE provide a COMPLETE description of EXACTLY how to handle ANY GIVEN RECORD within a set of A's, that will give the CORRECT RESULT, regardless of the input, and TAKES INTO ACCOUNT ALL POSSIBLE INPUTS. The description needs to be written, and it needs to leave absolutely NO ROOM for doubt about what the words mean. I'm thoroughly convinced at this point that this problem will not be solvable until we understand what you want in the EXACT SAME way that YOU understand it. We need to be able to read the description and come away without any questions. So far, every description you've provided has left any of a number of questions, and some of the behavior you're asking for in producing the correct result could only be a result of an inconsistent set of rules or some other undisclosed rule that we haven't seen you post as yet. Make NO assumptions, ok?
Steve
(aka smunson)
:):):)
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
October 27, 2008 at 7:23 am
Or... the even more elusive question... what on earth the end goal of this thing really is. Part of the reason we are unable to come up with any good set based approach is because I don't think anyone (with the possible exception of yourself) really understands what you're trying to do here.
October 27, 2008 at 4:28 pm
This thread is still going !!?!! I'm sorry to have missed all the fun, but I'm feeling lucky, so I'll try again. If the results are correct, then someone else can make it go faster. From what I've read, there are only two things to identify:
1. The rows where the Code column "breaks" to a new code, which doesn't include the first row in the table.
2. Any duplicates (same Key, Locale and Code) within the range between breaks, OTHER than the first row in each set of duplicates.
So anyway, here goes.
============================================
CREATE TABLE #TempTable
( SeqID int identity(1,1) primary key
,RowID int
,RowKey varchar(1000)
,Locale varchar(10)
,Code char(1)
)
INSERT INTO #TempTable
SELECT 1 as rowID,'1','en','A' UNION ALL
SELECT 2,'1','en','A' UNION ALL
SELECT 5,'1','en','C' UNION ALL
SELECT 7,'1','en','A' UNION ALL
SELECT 8,'1','zh','A' UNION ALL
SELECT 10,'1','en','A' UNION ALL
SELECT 11,'1','zh','A' UNION ALL
SELECT 20,'1','en','D' UNION ALL
SELECT 21,'1','en','D'
order by rowID -- be sure to order by rowID
select * from #temptable
;with cte1 as -- identify breaks
(select t1.code,t1.rowid
from #temptable t1
left join #temptable t2 on t1.seqID = t2.seqID+1
where t1.code <> t2.code
)
,cte1a as -- include first row for ranging purposes
(select top 1 code,rowID
from #temptable
union all
select * from cte1)
,cte2 as -- identify ranges for duplicates
(select c1.code,c1.rowid as minRow,isnull(min(c1a.rowID)-1,c1.rowid+1) as maxRow
from cte1a c1
left join cte1a c1a on c1a.code <> c1.code and c1a.rowID > c1.rowID
-- where c1.code = 'A'
group by c1.code,c1.rowID
)
,cte3 as -- identify duplicates
(select c.minrow,max(rowid) as dupID,t.rowKey,t.locale,c.code
from #temptable t
cross join cte2 c
where t.rowid >=minRow and t.rowID <= maxRow
group by c.minRow,t.rowKey,t.locale,c.code
having count(*) > 1
)
,cte4 as
(select dupID as rowID
from cte3
union all
select rowID
from cte1
)
select distinct c.rowID,t.rowkey,t.locale,t.code
from cte4 c
join #temptable t on c.rowID = t.rowID
order by c.rowID
drop table #temptable
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
October 27, 2008 at 5:42 pm
bhovious (10/27/2008)
1. The rows where the Code column "breaks" to a new code, which doesn't include the first row in the table.2. Any duplicates (same Key, Locale and Code) within the range between breaks, OTHER than the first row in each set of duplicates.
Unfortunately it's considerably more complicated than that. He wants it done iteratively, meaning you can't just look for duplicates, you have to go through it line by line taking into account where the last duplicate was within the code group, and restarting a duplicate match search from the last one. However, if there is a copy of a dupe immediately following the exact same dupe, that one is counted as well, but not if there is a different key/locale between them. Beyond that, there's discrepancies over whether the first or second dupe in an iterative series is kept. That's only my current understanding of it, and I don't think that quite encompasses everything. I've tried to do this using a clustered index update (ie. running total technique) about 30 different ways now, but it simply will not work with any local variable inside of the update statement. Nor will it look at the last row updated. Beyond that, with the row by row logic being applied, I simply do not see any set based logic working out here.
My only thoughts for optimization of this currently go towards breaking up your huge table of data into smaller chunks with a second outer loop (with each chunk also taking 100 rows of the next chunk to satisfy your @optrows for scanning) that lets you perform this operation on smaller data sets in hopes of improving the speed of the scans. This may just as easily hurt the performance as help it, but it's something to try.
October 27, 2008 at 5:49 pm
Hey Seth,
Could you give me one example of the iterative situation you describe. I didn't see that in all of the examples I looked at. If you could even point me to the example data he used, I'd be grateful. (Is this what he means back on page 1 when he says row 19 is already covered by the range 14-18?)
Bob
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
October 27, 2008 at 6:11 pm
Basically, this:
IDRowKeyLocaleCode
11ENA
22ENA
32DUA
41ENA
52DUA
You see that Rows 3 and 5 are duplicates. However, because Row 4 is a duplicate of Row 1, the matching starts over at row 4. There is no duplicate of row 5 that is greater than row 4. The proper output in this scenario would be 1 and 4. (as far as I understand it)
Here's where it gets really hairy.
If you have this:
IDRowKeyLocaleCode
11ENA
22ENA
32DUA
41ENA
51ENA
61ENA
72DUA
81ENA
92DUA
My understanding is that he wants 1,4,5,6,9 returned. 5 and 6 get added in because it's a "group" of duplicates. 7 still gets skipped because there is nothing above the last dupe (6) which matches it, and 8 gets skipped because the "group" of 1 EN A has now been broken. 9 gets returned because it is a duplicate of 7 which was the next candidate after the last dupe of 6.
October 27, 2008 at 6:16 pm
bhovious (10/27/2008)
Hey Seth,Could you give me one example of the iterative situation you describe. I didn't see that in all of the examples I looked at. If you could even point me to the example data he used, I'd be grateful. (Is this what he means back on page 1 when he says row 19 is already covered by the range 14-18?)
Bob
Seth's solution is actually iterative. He goes through each row in the table in order and flags duplicates. Take a look at his second post on this thread.
Read this article for further explanation on how this approach works http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
October 27, 2008 at 6:22 pm
ggraber (10/27/2008)
bhovious (10/27/2008)
Hey Seth,Could you give me one example of the iterative situation you describe. I didn't see that in all of the examples I looked at. If you could even point me to the example data he used, I'd be grateful. (Is this what he means back on page 1 when he says row 19 is already covered by the range 14-18?)
Bob
Seth's solution is actually iterative. He goes through each row in the table in order and flags duplicates. Take a look at his second post on this thread.
Read this article for further explanation on how this approach works http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
Or more aptly in this case, doesn't work. Because I need to start the searching over after the last row that matched the criteria, it completely breaks that method. I would dearly love to see someone get that to work, but no matter how much I change the thing around, the optimizer will not recursively:
A. Look for the highest row that has been flagged as a duplicate or
B. Use a local variable inside the search.
I've written that statement in a dozen different ways that were logically correct and should have solved it... but the problem is, the optimizer converts them all to derived tables that it generates before the update statement runs either off the local variable being 1 or the max duplicated rownumber in the table being NULL. It refuses to recursively run anything to take into account the last record that was a duplicate and search from there. It's actually driving me nuts.
October 27, 2008 at 7:27 pm
smunson (10/27/2008)
I continue to think that a complete statement of the problem coming from the OP has yet to occur. There is no written description of the EXACT process to follow for any given record within a group of A's. I'm not even sure the OP knows what he wants.Dan - PLEASE provide a COMPLETE description of EXACTLY how to handle ANY GIVEN RECORD within a set of A's, that will give the CORRECT RESULT, regardless of the input, and TAKES INTO ACCOUNT ALL POSSIBLE INPUTS. The description needs to be written, and it needs to leave absolutely NO ROOM for doubt about what the words mean. I'm thoroughly convinced at this point that this problem will not be solvable until we understand what you want in the EXACT SAME way that YOU understand it. We need to be able to read the description and come away without any questions. So far, every description you've provided has left any of a number of questions, and some of the behavior you're asking for in producing the correct result could only be a result of an inconsistent set of rules or some other undisclosed rule that we haven't seen you post as yet. Make NO assumptions, ok?
Steve
(aka smunson)
:):):)
Hi Steve,
I just thought that the images that I provided was clear enough to achieve my goal here but I guess i'm wrong. I just need two things on this:
A. Get the RowIDs of the interleaving rows by Code, meaning the rows where the Code column "breaks" to a new code, which doesn't include the first row in the table as bhovious have said.
B. And second, is to look into every group of Code 'A'. Get the interleaving rows of duplicates within that group but should start from every previous duplicate found. This is to separate the duplicate so that for every range of rowids within A, no duplicates would be present.
Lets say for example, I have this data inside Code 'A'.
SELECT 1 as RowID,'1' as RowKey,'en' as Locale,'A' as Code UNION ALL
SELECT 2,'1','zh','A' UNION ALL
SELECT 3,'1','zh','A' UNION ALL
SELECT 4,'2','en','A' UNION ALL
SELECT 5,'2','en','A' UNION ALL
SELECT 10,'1','zh','A' UNION ALL
SELECT 11,'1','zh','A'
The result of this would be:
RowIDs 3,5, and 11.
Notice that the range result of this example has unique data inside it, duplicates were separated.
The ranges produced in the result was RowIDs 1-2, 3-4, 5-10, and 11 in which for every range, there
are no duplicates since it was already divided/separated and that's the reason why I wan't to iteratively locate
the duplicates and always start from the previous duplicate found to find another one.
After that, I need to combined the result of A & B.
October 27, 2008 at 7:41 pm
What if there were an additional row in your example?
SELECT 6,'2','en','A' UNION ALL
Include 5? Include 6? Include 5 and 6 both?
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
October 27, 2008 at 8:04 pm
bhovious (10/27/2008)
What if there were an additional row in your example?SELECT 6,'2','en','A' UNION ALL
Include 5? Include 6? Include 5 and 6 both?
It should include both 5 & 6 since 5 is a dupe on 4 and 6 is a dupe on 5. Always start looking for another duplicate from the previous duplicate found.
October 28, 2008 at 12:09 am
Hi Dan
If you Check my last post your should be getting the correct results But need to change the last query before viewing results to below
INSERT #TempTable
SELECT 0,'1','en','C' UNION ALL
SELECT 1,'1','en','A' UNION ALL
SELECT 2,'1','en','A' UNION ALL
SELECT 5,'1','en','C' UNION ALL
SELECT 7,'1','en','A' UNION ALL
SELECT 8,'1','zh','A' UNION ALL
SELECT 10,'1','en','A' UNION ALL
SELECT 11,'1','zh','A' UNION ALL
SELECT 12,'1','zh','A' UNION ALL
SELECT 13,'1','zh','A' UNION ALL
SELECT 14,'1','zh','A' UNION ALL
SELECT 20,'1','en','D' UNION ALL
SELECT 21,'1','en','D'
GO
--Remove First Record in the table---------
Delete from tgt
From #Result as tgt
Inner Join(
Select Min(RowID) as RowID
From #Seq
)as seq
on seq.RowID = Tgt.RowID
------------------------------
Select * From #Result
]/code
October 28, 2008 at 12:29 am
The above change produces the following results
RowIDRowKeyLocaleCode
11enA
21enA
51enC
71enA
101enA
121zhA
131zhA
141zhA
201enD
Note
the first row in the table is removed RowId= 0
Rows 12, 13, 14 are all in the result set as required
Am I still missing something in the solution?
October 28, 2008 at 6:39 am
Thanks for the reply, I believe I understand now. Regardless of the "why", this was an interesting problem to look at. If Anziobake's solution still doesn't get it right, I'll give it another shot this evening.
It seems like a set-based solution should be possible, but no guarantees about the speed.
The final set should include the breaks from the Code Column and a subset of the "A" duplicates. If I'm on the right track, the definition of this subset eliminates any duplicates which were matched to a rowID which is less than the rowID of any other duplicate. Although this seems iterative, I still think someone can get there. I'm posting this thought because work has gotten busy and I may not be able to follow up tonight. Good hunting, Anzio and Garadin!!
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
October 28, 2008 at 6:41 am
Ok, so I've given up on the old method, and come up with this. Not set based :crying:. Hopefully I didn't completely bugger up my cursor syntax, I had to look it up several times. Dan, if you could, let us know how both of these solutions work time and consistency wise.
Warning to sensitive eyes, RBAR follows :hehe:.
[font="Courier New"]USE tempdb;
SET NOCOUNT ON;
GO
IF OBJECT_ID ('#TempTable','U') IS NOT NULL
BEGIN
DROP TABLE #TempTable
END
IF OBJECT_ID ('#Deleted','U') IS NOT NULL
BEGIN
DROP TABLE #Deleted
END
CREATE TABLE #TempTable
(
RowID INT PRIMARY KEY CLUSTERED,
RowKey VARCHAR(1000),
Locale VARCHAR(10),
Code CHAR(1),
GroupID INT NULL
)
GO
INSERT #TempTable(RowID, RowKey, Locale, Code)
SELECT 1,'1','en','C' UNION ALL
SELECT 2,'1','de','C' UNION ALL
SELECT 3,'1','zh','A' UNION ALL
SELECT 4,'2','en','A' UNION ALL
SELECT 5,'2','en','A' UNION ALL
SELECT 10,'1','zh','A' UNION ALL
SELECT 11,'1','zh','A' UNION ALL
SELECT 15,'1','zh','C' UNION ALL
SELECT 16,'1','de','C' UNION ALL
SELECT 20,'1','en','A' UNION ALL
SELECT 21,'3','en','A' UNION ALL
SELECT 22,'3','de','A' UNION ALL
SELECT 23,'3','en','A' UNION ALL
SELECT 24,'3','de','A' UNION ALL
SELECT 30,'1','en','C' UNION ALL
SELECT 31,'1','en','C'
DECLARE @GroupID INT,
@PrevCode CHAR(5),
@RowID INT,
@Locale VARCHAR(5),
@RowKey VARCHAR(1000),
@PrevGroupID INT
SET @PrevCode = (SELECT Code FROM #TempTable WHERE RowID = 1)
SET @GroupID = 1
-----------------------------------------------------------
-- Breaks your data out into groups so it can be scanned --
UPDATE #TempTable
SET @GROUPID = CASE WHEN Code <> @PrevCode THEN @GroupID + 1 ELSE @GroupID END,
GroupID = @GroupID,
@PrevCode = Code
FROM #TempTable WITH (INDEX(0))
-----------------------------------------------------------
DECLARE @CurRows TABLE (
RowKey VARCHAR(1000),
Locale VARCHAR(10))
CREATE TABLE #Deleted(
RowID INT)
DECLARE TTCursor CURSOR FORWARD_ONLY FOR
SELECT RowID, GroupID, RowKey, Locale
FROM #TempTable
WHERE Code = 'A'
ORDER BY RowID
OPEN TTCursor
FETCH NEXT FROM TTCursor INTO @RowID, @GroupID, @RowKey, @Locale
WHILE @@FETCH_STATUS > -1
BEGIN
IF EXISTS (SELECT * FROM @CurRows WHERE RowKey = @RowKey AND Locale = @Locale)
BEGIN
INSERT INTO #Deleted(RowID)
VALUES(@RowID)
DELETE FROM @CurRows
END
INSERT INTO @CurRows(RowKey, Locale) -- Delete all the rows from the table)
VALUES(@RowKey, @Locale) -- Repopulate the table with the current row (this
IF @GroupID <> @PrevGroupID
DELETE FROM @CurRows
SET @PrevGroupID = @GroupID
FETCH NEXT FROM TTCursor INTO @RowID, @GroupID, @RowKey, @Locale
END
CLOSE TTCursor
DEALLOCATE TTCursor
SELECT * FROM #Deleted -- See just the ones to be deleted from the cursor.
SELECT * FROM #TempTable T -- See all rows matching criteria.
LEFT JOIN #Deleted D ON T.RowID = D.RowID
LEFT JOIN (SELECT GroupID, MIN(RowID) FirstGroupRow FROM #TempTable WHERE GROUPID > 1 GROUP BY GroupID) T2 ON T.RowID = T2.FirstGroupRow
WHERE D.RowID IS NOT NULL OR T2.FirstGroupRow IS NOT NULL[/font]
Viewing 15 posts - 46 through 60 (of 115 total)
You must be logged in to reply to this topic. Login to reply