Finding Sequences (Need help ASAP)

  • Paul White NZ (10/13/2010)


    Craig Farrell (10/8/2010)


    Craig,

    You can't use joins in a Quirky Update.

    Paul

Can't, or shouldn't? I did get the correct results here when I tested the code, with no errors. What are the possible pitfalls of using the join, besides having clustered index sorting problems?


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Sorry Paul, that was a bit abrupt and you deserve better than that; your explanations are always concise and instructive.

    I would agree in general that quirky update should not be joined under normal circumstances, but in this case it's a self join with only a single clustered index in control and the second join is a scan by nature. By using the self-join against a completely controlled object, the #tmp, the ordering doesn't falter as you might expect from other methods.

    I've played with this technique for creating data islands (to steal Jeff's term, it's better than mine) in 2k and 2k5 (I admit to having little practical experience in 2k8 besides goofing off at home), and have never had an issue as long as I control the entire table environment locally. If I needed to do more selective row computations, or external tables, I would usually include more columns in the #tmp to pre-select the data before igniting the quirky update.

    Now, I haven't done the deep root level analysis this would deserve as a dissertation, but my understanding is that the scan, and thus the merge join off the clustered scan, will not modify the ordering because the clustered scan from the primary input won't alter. However, if I had left in a second index with a flipped ordering and it decided to use that instead, I'd have been in trouble.
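
    To make the discussion concrete, here is a hypothetical sketch of the kind of self-joined variant being described. All table and column names are invented for illustration; this is not a recommendation, since it deliberately breaks the no-join rule under discussion:

    -- HYPOTHETICAL sketch only: a quirky-update variant that self-joins a
    -- fully controlled #tmp table to number data islands. Joining is risky;
    -- the standard method forbids it.
    CREATE TABLE #tmp
    (
        SeqNo      INT NOT NULL PRIMARY KEY CLUSTERED, -- the ONLY index: 1..N, no gaps
        PartNumber INT NOT NULL,
        IslandNo   INT NULL
    );

    DECLARE @Island INT = 0;

    UPDATE  t
    SET     @Island  = CASE WHEN p.SeqNo IS NULL              -- first row
                              OR t.PartNumber <> p.PartNumber + 1 -- gap found
                            THEN @Island + 1                  -- start a new island
                            ELSE @Island                      -- same island
                       END,
            IslandNo = @Island
    FROM    #tmp AS t
    LEFT JOIN #tmp AS p                -- self-join to the previous row
            ON p.SeqNo = t.SeqNo - 1
    OPTION  (MAXDOP 1);                -- a serial plan is essential to the technique

    The point being debated is that the optimizer is free to choose a plan (or a second index) that breaks the clustered-scan ordering the technique silently depends on.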

    But, Jeff's come up with something I haven't thought of before and haven't had time to puzzle through and grok it, and it seems more efficient for creating data islands. I may have to switch my methodology. 🙂


    - Craig Farrell


  • Craig Farrell (10/14/2010)


    Sorry Paul, that was a bit abrupt and you deserve better than that; your explanations are always concise and instructive.

    No need to apologise. My reply was shorter than usual because I was pressed for time and just scanned this thread while responding to another of Craig's threads.

    The Quirky Update has a number of rules (as you know) and one of the main ones is that there cannot be a join operation. It is frequently possible to achieve the desired 'running total' results while breaking one or more QU 'rules', but I think it puts Jeff in an awkward position: Someone else may see your code, adapt it for their own use, and then blame the QU when it all blows up horribly next week.

    I think it's only fair that if one uses the QU method, one should stick to the rules. If you do stray from those, or otherwise modify the method, it might be best not to call it Quirky Update, or at least put a very clear disclaimer on it.

    My personal preference is not to use QU, but if I did, I would use the 'safety mechanism' I suggested (see the discussion on Jeff's QU article) to ensure that if the QU assumptions do not hold, a runtime error occurs and no data is changed.
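
    The linked post has the exact code; purely as an illustration, the general shape of such a safety mechanism might look like this (column and table names invented):

    -- Sketch of the general idea only; see the linked discussion for the real code.
    -- SeqNo is assumed to be the clustered key, numbered 1..N with no gaps.
    DECLARE @Expected INT = 0,
            @Total    INT = 0;

    UPDATE  t
    SET     @Expected    = CASE WHEN SeqNo = @Expected + 1
                                THEN @Expected + 1
                                ELSE 1/0   -- rows arrived out of order: force a
                           END,            -- runtime error so no data is changed
            @Total       = @Total + Amount,
            RunningTotal = @Total
    FROM    #Work AS t WITH (TABLOCKX)
    OPTION  (MAXDOP 1);

    If the engine ever processes a row out of clustered-key order, the CASE falls through to a divide-by-zero, the statement raises an error, and the whole update rolls back.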

    Paul

  • Paul White NZ (10/14/2010)


    It is frequently possible to achieve the desired 'running total' results while breaking one or more QU 'rules', but I think it puts Jeff in an awkward position: Someone else may see your code, adapt it for their own use, and then blame the QU when it all blows up horribly next week.

    I think it's only fair that if one uses the QU method, one should stick to the rules. If you do stray from those, or otherwise modify the method, it might be best not to call it Quirky Update, or at least put a very clear disclaimer on it.

    Part of the issue, I guess, is that I don't see the QU as being Jeff's method, as I was using it long before I ever saw his article, so I'm not sure how that puts Jeff in an awkward position. Mind, I didn't know it under the name "Quirky Update", and that's a simple thing to fix by not using that descriptive name again so Jeff doesn't get associated with some of my wilder, not-so-safe-if-not-done-just-so, coding mechanics. 🙂

    I still do a lot of coding back against 2k, due to politics, cheap managers, and any other number of reasons, so I still use the serial update (Another name I know it as. :w00t: ) a lot when there's no other easy outs besides while or cursor loops.


    - Craig Farrell


  • Craig Farrell (10/14/2010)


    Part of the issue, I guess, is that I don't see the QU as being Jeff's method, as I was using it long before I ever saw his article, so I'm not sure how that puts Jeff in an awkward position. Mind, I didn't know it under the name "Quirky Update", and that's a simple thing to fix by not using that descriptive name again so Jeff doesn't get associated with some of my wilder, not-so-safe-if-not-done-just-so, coding mechanics. 🙂

    Ah well that's different then - sorry, I didn't know the history there.

    I still do a lot of coding back against 2k, due to politics, cheap managers, and any other number of reasons, so I still use the serial update (Another name I know it as. :w00t: ) a lot when there's no other easy outs besides while or cursor loops.

    I see. Yes SQL Server 2000 is a bit tricky - no SQLCLR (my favourite running-totals tool) and Hugo Kornelis' fine Set-Based Iteration method uses a ranking function, so that's out too.

    For 2005 onward, consider adding the 'safety code' I mentioned - you can find it here: http://www.sqlservercentral.com/Forums/FindPost980118.aspx

    One more interesting thing about the 'serial update': if you do a nested loops join to the inserted or deleted pseudo-tables (inside a trigger, obviously) you'll find the running-total runs exactly backwards. There are other instances like this where side-effects of engine implementation details can trip the method up when a join is involved. Just FYI.

  • Paul White NZ (10/14/2010)


    For 2005 onward, consider adding the 'safety code' I mentioned - you can find it here: http://www.sqlservercentral.com/Forums/FindPost980118.aspx

    Well, far be it from me to turn down a damned good idea. 🙂 I'll do my best to incorporate that going forward. It's a nice safety test to make sure nothing goes sideways.

    One more interesting thing about the 'serial update': if you do a nested loops join to the inserted or deleted pseudo-tables (inside a trigger, obviously) you'll find the running-total runs exactly backwards. There are other instances like this where side-effects of engine implementation details can trip the method up when a join is involved. Just FYI.

    I've never actually needed to do that in a trigger, so I've never run into that. That's... odd. Do you know if it still occurs if you force a merge join instead between the inserted/deleted, or is that only when you're running against exterior tables to the trigger?


    - Craig Farrell


  • Craig Farrell (10/14/2010)


    I've never actually needed to do that in a trigger, so I've never run into that. That's... odd. Do you know if it still occurs if you force a merge join instead between the inserted/deleted, or is that only when you're running against exterior tables to the trigger?

    As I recall, it was a join to the 'external' table. It was a while ago so I can't pretend to remember all the details.

    BTW I'm finishing up a SQLCLR solution to the problem on this thread (original and Jeff's test rig), which is astonishingly fast. I'll post it up later today for anyone that's interested.

  • Is your range static (i.e. %%%%%%%%%%0001-%%%%%%%%%%003), or will it change on some condition?

  • rishabh.upkr (10/14/2010)


    Is your range static (i.e. %%%%%%%%%%0001-%%%%%%%%%%003), or will it change on some condition?

    Can you explain what you're talking about just a bit more?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Craig Farrell (10/14/2010)


    Paul White NZ (10/14/2010)


    For 2005 onward, consider adding the 'safety code' I mentioned - you can find it here: http://www.sqlservercentral.com/Forums/FindPost980118.aspx

    Well, far be it from me to turn down a damned good idea. 🙂 I'll do my best to incorporate that going forward. It's a nice safety test to make sure nothing goes sideways.

    It's not only a damned fine safety test that's more efficient than any other way of checking that I've seen, it also forces runs that might otherwise fail into running, because of the requirements it places on the clustered index. It's a bit like a super index hint without the slowdown of using one of the two clustered index hints (Index 0 or 1).

    --Jeff Moden



  • Craig Farrell (10/14/2010)


    Paul White NZ (10/14/2010)


    It is frequently possible to achieve the desired 'running total' results while breaking one or more QU 'rules', but I think it puts Jeff in an awkward position: Someone else may see your code, adapt it for their own use, and then blame the QU when it all blows up horribly next week.

    I think it's only fair that if one uses the QU method, one should stick to the rules. If you do stray from those, or otherwise modify the method, it might be best not to call it Quirky Update, or at least put a very clear disclaimer on it.

    Part of the issue, I guess, is that I don't see the QU as being Jeff's method, as I was using it long before I ever saw his article, so I'm not sure how that puts Jeff in an awkward position. Mind, I didn't know it under the name "Quirky Update", and that's a simple thing to fix by not using that descriptive name again so Jeff doesn't get associated with some of my wilder, not-so-safe-if-not-done-just-so, coding mechanics. 🙂

    I still do a lot of coding back against 2k, due to politics, cheap managers, and any other number of reasons, so I still use the serial update (Another name I know it as. :w00t: ) a lot when there's no other easy outs besides while or cursor loops.

    Heh, heh.... Man, I love it and I'm right there with you. Like I said in the article, the QU isn't my idea. People have been using it in Sybase since before SQL Server was a gleam in MS's eye. I just ended up writing about it in a very pro QU way and that's why I take a lot of heat about it.

    Can I take it that some of your "wilder, not-so-safe-if-not-done-just-so, coding mechanics" still haven't caused you a problem with the QU? It's amazing how well things work when you're trying to make something work instead of writing code to make something break. 😀

    Mind, I didn't know it under the name "Quirky Update"...

    Me either. I didn't coin the phrase "Quirky Update". Robyn Page did that on the "Simple Talk" forum. I used to just call it a "running total update" but it confused people when I used it to describe a process that wasn't actually a running total so I started using the term "Quirky Update".

    --Jeff Moden



  • CELKO (10/14/2010)


    Jeff Moden (10/13/2010)


    Very cool, Joe. I'm thinking that you didn't test it...

    Msg 207, Level 16, State 1, Line 18
    Invalid column name 'part_id'.
    Msg 245, Level 16, State 1, Line 2
    Conversion failed when converting the varchar value 'NULL' to data type int.

    ... but it runs quite fast. The other problem is that your code misses the 50,000 to 60,000 single entry because you only check the first column of the data, but I think there might be a workaround for that.

    Here is my sample data set. It runs fine for me:

    CREATE TABLE Gaps (part_id VARCHAR(20));

    INSERT INTO Gaps
    VALUES
        ('PPCI000000001'), ('PPCI000000002'), ('PPCI000000003'), ('PPCI000000005'),
        ('PPCI000000010'), ('PPCI000000011'), ('PPCI000000012'), ('PPCI000000013'),
        ('PPCI000000014'), ('PPCI000000015'), ('PPCI000000016'), ('PPCI000000017'),
        ('PPCI000000100'), ('PPCI000000101'), ('PPCI000000102'), ('PPCI000000103'),
        ('PPCI000000104'), ('PPCI000000105'), ('PPCI000000106'), ('PPCI000000167'),
        ('PPCI000000197'),
        ('XXX000000001'),  ('XXX000000002'),  ('XXX000000003'),  ('XXX000000005'),
        ('XXX000000010'),  ('XXX000000011'),  ('XXX000000012'),  ('XXX000000013'),
        ('XXX000000014'),  ('XXX000000015'),  ('XXX000000016'),  ('XXX000000017'),
        ('XXX000000100'),  ('XXX000000101'),  ('XXX000000102'),  ('XXX000000103'),
        ('XXX000000104'),  ('XXX000000105'),  ('XXX000000106'),  ('XXX000000167'),
        ('XXX000000197');

    WITH Splits AS
    (SELECT part_id, -- nest REPLACEs inside each other to strip the digits
            REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
                REPLACE(part_id, '0', ''),
                '1', ''), '2', ''), '3', ''), '4', ''), '5', ''),
                '6', ''), '7', ''), '8', ''), '9', '') AS alpha_prefix,
            CAST(REPLACE(REPLACE(part_id, 'XXX', ''), 'PPCI', '') AS INTEGER) AS num_postfix
       FROM Gaps),
    Sequenced AS
    (SELECT part_id, alpha_prefix, num_postfix,
            num_postfix
            - ROW_NUMBER() OVER (PARTITION BY alpha_prefix ORDER BY num_postfix)
              AS seq_in_part_grp
       FROM Splits)
    SELECT alpha_prefix, MIN(num_postfix) AS run_start, MAX(num_postfix) AS run_end
      FROM Sequenced
     GROUP BY alpha_prefix, seq_in_part_grp
     ORDER BY alpha_prefix, seq_in_part_grp;

    Yep.... runs fine if you have the right column names. The original test data folks provided didn't use the same column names you did and you didn't provide any test data until now.

    Your example also doesn't solve the original problem, although I think the way it does solve the problem it solves is very, very cool.

    Thanks for the feedback, Joe.

    --Jeff Moden



  • The SQLCLR test results are in:

    Craig's test data (267,150 rows): 360ms

    Jeff's test data (889,892 rows): 1386ms

    Test rig and csv data for both runs attached, along with C# source code.

    This SQLCLR solution uses Adam Machanic's Query Parallelizer engine, which you can find here: http://sqlblog.com/files/folders/beta/entry29021.aspx

    The solution is 2008-only in this version.

    Paul

  • Paul White NZ (10/17/2010)


    The SQLCLR test results are in:

    Craig's test data (267,150 rows): 360ms

    Jeff's test data (889,892 rows): 1386ms

    Test rig and csv data for both runs attached, along with C# source code.

    This SQLCLR solution uses Adam Machanic's Query Parallelizer engine, which you can find here: http://sqlblog.com/files/folders/beta/entry29021.aspx

    The solution is 2008-only in this version.

    Paul

    Paul, you must be slipping... those times seem a bit slow for your CLR code. :-D :w00t: :-P

    What makes this CLR code 2008 specific?

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS (10/17/2010)


    Paul, you must be slipping... those times seem a bit slow for your clr code...

    To put the timings in context, the T-SQL solutions ran for about 30-45 seconds on the same hardware. The major part of the time taken by the SQLCLR code is actually taken by T-SQL: reading the data from the source tables 😛

    WayneS (10/17/2010)


    What makes this CLR code 2008 specific?

    It's a side effect of my decision to code the test rig to run in tempdb. If anyone has a burning need to try it out on 2005, please let me know, and I'll see what I can do. The client assembly will also need to be catalogued as UNSAFE for 2005 though, which isn't ideal, but hey who's still using 2005 anyway :laugh:
