November 3, 2008 at 3:19 pm
smunson (11/3/2008)
Just a thought, but why would a set of data that starts with rows 1 thru 3 as duplicates, having only rows 2 and 3 be marked, and yet, if row 4 were C and then rows 5 thru 7 were 'A' dupes, all 3 of those dupes would be marked? I'm still trying to concoct ANY kind of scenario where that kind of lack of consistency is an essential part of anything other than some bizarre logic puzzle. At this point, I can't even remember if you have to identify a break between a C and a D, and your example didn't specify on that either.
Steve, row 5 would be marked as a break from C back to A, not as a dup.
I was surprised to see the sudden burst of traffic on this. It's very tempting to unsubscribe here, but I'm going to stick around a while longer hoping to hear solid confirmation that one of the solutions was correct. I don't think we'll ever get a solid explanation of the "why" behind this pretzel logic. However, we could at least settle the question of performance by testing the solutions against the million-row data generator.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
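For anyone trying to follow the rule being debated, here is a hypothetical Python sketch of it. This is only my reading of the thread (the OP never confirmed the exact spec), and the function name is mine:

```python
def mark_rows(codes):
    """Mark rows per the rule debated above (my reading; unconfirmed by the OP):
    the first row of a run of equal codes is unmarked, later rows in the run
    are 'dup', and a row where the code changes from 'C' back to 'A' is 'break'."""
    marks = []
    prev = None
    for code in codes:
        if prev is None:
            marks.append(None)            # very first row: unmarked
        elif code == prev:
            marks.append('dup')           # repeat of the previous row's code
        elif prev == 'C' and code == 'A':
            marks.append('break')         # C -> A transition
        else:
            marks.append(None)            # any other transition: unmarked
        prev = code
    return marks

# Steve's example: rows 1-3 = 'A', row 4 = 'C', rows 5-7 = 'A'
print(mark_rows(['A', 'A', 'A', 'C', 'A', 'A', 'A']))
# [None, 'dup', 'dup', None, 'break', 'dup', 'dup']
```

Under this reading, rows 2 and 3 are dups, row 5 is a break rather than a dup, and rows 6 and 7 are dups again, which matches the behavior described in the posts above.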
November 3, 2008 at 5:46 pm
Heh... I wanna know who ever said that "set based" meant "everything done in one query"... 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
November 3, 2008 at 11:23 pm
Hi.
I too would love to know why you would do this, but I had more fun writing the solution that performs reasonably well on a million rows. Maybe it is just a "problem" with sites like these... there are people on here because SQL is the bigger part of their daily jobs, and then
there are those who have a problem.
I get the impression the OP was using non-DB logic to solve a problem and got stuck in that, and therefore did not know how to explain the overall problem, just the step he got stuck at.
my 0.02 ZAR
November 3, 2008 at 11:24 pm
BTW Jeff, were you able to use the random data generator?
my 0.02 ZAR
November 4, 2008 at 12:43 am
AnzioBake (11/3/2008)
BTW Jeff, were you able to use the random data generator?
Yep... made a couple of changes to it to make it run just a little faster... give it a try...
SELECT TOP 1000000
       IDENTITY(INT, 1, 1) AS RowID,
       ABS(CHECKSUM(NEWID())) % 2 + 1 AS RowKey,          -- 1 or 2
       CASE WHEN ABS(CHECKSUM(NEWID())) % 30 < 16 THEN 'EN'
            WHEN ABS(CHECKSUM(NEWID())) % 30 > 24 THEN 'ZH'
            ELSE 'DE'
       END AS Locale,
       CASE WHEN ABS(CHECKSUM(NEWID())) % 10 < 7 THEN 'A' -- ~70% A
            ELSE 'C'
       END AS Code
  INTO dbo.SSCData
  FROM Master.dbo.SysComments sc1
 CROSS JOIN Master.dbo.SysComments sc2
Sorry I haven't spent much time on this post, yet... I'll be back. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
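For readers who want to sanity-check the generator's thresholds outside of SQL Server, here is a hypothetical Python restatement of the per-column distributions (the function name and seed are mine). One simplification to note: the T-SQL draws a fresh NEWID() in each WHEN branch, while this sketch uses a single draw per column, so the branch probabilities are not identical to the SQL's.

```python
import random

def random_row(rng):
    """One row with distributions modeled on the generator above:
    RowKey in {1, 2}, Locale split on a 0-29 draw, Code ~70% 'A'."""
    r = rng.randrange(30)
    return {
        'RowKey': rng.randrange(2) + 1,                 # %2+1 -> 1 or 2
        'Locale': 'EN' if r < 16 else ('ZH' if r > 24 else 'DE'),
        'Code': 'A' if rng.randrange(10) < 7 else 'C',  # ~70% A, ~30% C
    }

rng = random.Random(2008)
rows = [random_row(rng) for _ in range(100000)]
share_en = sum(r['Locale'] == 'EN' for r in rows) / len(rows)
share_a = sum(r['Code'] == 'A' for r in rows) / len(rows)
print(round(share_en, 2), round(share_a, 2))  # near 16/30 and 7/10
```

With 100,000 rows the sample proportions land close to the nominal 16/30 for 'EN' and 7/10 for 'A', which is the kind of quick check worth doing before trusting a test harness built on generated data.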
November 4, 2008 at 1:02 am
Nice... that is an interesting way to get a random number, i.e. NEWID() combined with CHECKSUM()...
I like it
November 4, 2008 at 5:06 am
You can thank Matt Miller for that little gem. I used to use RAND(CHECKSUM(NEWID())), but CHECKSUM alone is quite a bit faster. Either way, you don't need a loop because NEWID() provides a random value; you just use CHECKSUM or RAND to convert it to a number. CHECKSUM just happens to return an INT, which makes it scream.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 4, 2008 at 10:04 am
Just beware that CHECKSUM can return a negative value.
Use ABS(CHECKSUM(NEWID())) to get positive random values.
N 56°04'39.16"
E 12°55'05.25"
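To see concretely why the ABS() matters: T-SQL's % operator keeps the sign of the dividend, so a negative CHECKSUM produces a negative remainder. A hypothetical Python sketch of that behavior (Python's own % always returns a non-negative result for a positive modulus, so truncated division has to be spelled out; tsql_mod is an illustrative name, not a real function):

```python
def tsql_mod(a, n):
    """Modulo that keeps the dividend's sign (truncated division),
    mimicking how T-SQL's % operator behaves. Python's own % would
    return a non-negative result here for a positive n."""
    return a - int(a / n) * n

# A negative CHECKSUM-style value survives the modulo with its sign intact:
print(tsql_mod(-7, 10))   # -7 under T-SQL semantics
print(abs(-7) % 10)       # 7 once wrapped in ABS()
```

So without the ABS() wrapper, roughly half the CHECKSUM(NEWID()) values would fall into negative buckets and the CASE thresholds in the generator would never match them.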
November 4, 2008 at 5:32 pm
Heh... thanks, Peter... I put that in my code but forgot to mention it.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 6, 2008 at 8:18 am
If anyone does do any testing of these methods, please post the results to this thread. I'm curious as to the outcome of this. Otherwise, I'll try to get something together this weekend.
November 6, 2008 at 12:16 pm
Hey Seth,
Work got busy so I had to leave this behind. Please let us know if you pursue it. I don't expect my cte-on-cte solution to be the winner, but the update approach should run quicker than the while loop.
Let me ask a question about set-based solutions based on a comment Jeff made earlier. Although doing everything in one query may not be the fastest solution to a problem, isn't it theoretically possible to use a single query (with subqueries and/or ctes) to solve a problem that has a set-based solution?
Bob
P.S. Thanks to all for ABS(CHECKSUM(NEWID())). That one went immediately to the utility belt.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills