May 7, 2009 at 8:44 pm
I'm back. I have gone off and done my homework, hopefully, so let's restart this discussion, and I'll post my code and some test data. I have learned much while reading about how to post here!
I have a stored proc that was originally used to score one person's responses from a quiz against a collection of scoring items, sometimes more than one collection.
It was used sporadically and performance was not a concern. Now the client wants to be able to run dozens, maybe hundreds, of persons against dozens of scoring items. So my original techniques of using cursors and of passing data in by text strings are now sorely in question.
As an example, the following call to my stored proc performs scoring:
EXEC [GetKTAnalysis_Summary] '5,48,61', '1698,1,BUS,00,41,1698,1,BUS,01,49,1698,1,BUS,02,19,1698,1,BUS,03,33,1698,1,BUS,04,51,1698,1,BUS,05,59,1698,1,BUS,06,33,1698,1,BUS,07,45,1698,1,CPS,00,55,1698,1,CPS,01,65,1698,1,CPS,02,45,1698,1,CPS,03,80,1698,1,CPS,04,80,1698,1,CPS,05,55,1698,1,CPS,06,65,1698,1,CPS,07,55,1698,1,CPS,08,10,1698,1,CPS,09,65,1698,1,CPS,10,40,1698,1,CPS,11,50,1698,1,CPS,12,55,1698,1,EXP,00,53,1698,1,EXP,01,58,1698,1,EXP,02,45,1698,1,EXP,03,61,1698,1,EXP,04,49,1698,1,HRL,00,50,1698,1,HRL,01,35,1698,1,HRL,02,70,1698,1,HRL,03,80,1698,1,HRL,04,65,1698,1,HRL,05,45,1698,1,HRL,06,40,1698,1,HRL,07,55,1698,1,HRL,08,35,1698,1,HRL,09,65,1698,1,HRL,10,50,1698,1,HRL,11,55,1698,1,HRL,12,10,1698,1,INA,00,90,1698,1,INA,01,90,1698,1,INV,00,38,1698,1,INV,01,35,1698,1,INV,02,27,1698,1,INV,03,49,1698,1,INV,04,27,1698,1,INV,05,51,1698,1,LFS,00,54,1698,1,LFS,01,70,1698,1,LFS,02,45,1698,1,LFS,03,70,1698,1,LFS,04,45,1698,1,LFS,05,20,1698,1,LFS,06,45,1698,1,LFS,07,60,1698,1,LFS,08,55,1698,1,LFS,09,55,1698,1,LFS,10,80,1698,1,MVF,00,57,1698,1,MVF,01,75,1698,1,MVF,02,40,1698,1,MVF,03,35,1698,1,MVF,04,70,1698,1,MVF,05,25,1698,1,MVF,06,90,1698,1,MVF,07,65,1698,1,MVF,08,35,1698,1,MVF,09,80,1698,1,PPL,00,57,1698,1,PPL,01,57,1698,1,PPL,02,54,1698,1,PPL,03,53,1698,1,PPL,04,66,1698,1,PPL,05,63,1698,1,PPL,06,51,1698,1,PSS,00,52,1698,1,PSS,01,35,1698,1,PSS,02,45,1698,1,PSS,03,45,1698,1,PSS,04,65,1698,1,PSS,05,45,1698,1,PSS,06,90,1698,1,PSS,07,70,1698,1,PSS,08,75,1698,1,PSS,09,5,1698,1,PSS,10,45,1698,1,SAS,00,56,1698,1,SAS,01,55,1698,1,SAS,02,65,1698,1,SAS,03,40,1698,1,SAS,04,25,1698,1,SAS,05,55,1698,1,SAS,06,55,1698,1,SAS,07,55,1698,1,SAS,08,70,1698,1,SAS,09,70,1698,1,SAS,10,70,1698,1,SVC,00,50,1698,1,SVC,01,65,1698,1,SVC,02,65,1698,1,SVC,03,55,1698,1,SVC,04,61,1698,1,SVC,05,18,1698,1,SVC,06,40,1698,1,SVC,07,47,1698,1,SVC,08,45,1698,1,TRD,00,45,1698,1,TRD,01,61,1698,1,TRD,02,40,1698,1,TRD,03,42,1698,1,TRD,04,28,1698,1,TRD,05,53,1698,1,VCI,00,56,1698,1,VCI,01,35,1698,1,VCI,02,55,1698,1,VCI,03,70,1698,1,VCI,04,45,1698,1,VCI,05,65,1698,1,VCI,06,65,1698,1,VCI,07,35,1698,1,VCI,08,99,1698,1,VCI,09,35,1698,1,VCI,10,55,1698,1,VLS,00,60,1698,1,VLS,01,65,1698,1,VLS,02,70,1698,1,VLS,03,45,1698,1,WKH,00,57,1698,1,WKH,01,65,1698,1,WKH,02,65,1698,1,WKH,03,45,1698,1,WKH,04,45,1698,1,WKH,05,70,1698,1,WKH,06,65,1698,1,WKH,07,50,1698,1,WKH,08,25,1698,1,WKH,09,80'
And returns the scoring results:
Respondent  Key_Trait_Number  Key_Trait_Name              SubScore  Discriminator  CareerFit  Predictor
1698        61                R07 Social Worker           59        0              -9         50
1698        48                05 Sales Rep, Consumer      54        -5             -16        33
1698        5                 33 Franchise, Retail Owner  47        -10            -20        17
Notice that the scoring groups I wanted to score against are 5, 48 and 61, and that I am scoring for respondent 1698. All respondents have exactly 137 scores.
I want the same results for 1698 and similar results for any number of others I pass in scores for, but I think my lame approach in the stored proc I will post next could be much improved into a set-based approach.
The scoring has a specific set of rules, as you might see in the stored proc, and if further explanation is needed we might be into this too far, but I'm willing.
In my next post I will post some test data and the existing stored proc.
Thanks in advance.
May 7, 2009 at 8:50 pm
Here is the test data setup using an empty db called testing, and the stored proc.
(Data moved to attached files in another post)
May 7, 2009 at 9:49 pm
Dude. This code post is really big and is causing the page to hang for me (and others, I presume).
Could you edit your post and move the code into an attached text file?
Thanks,
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
May 8, 2009 at 6:28 am
Here are the data as attached files...
May 8, 2009 at 10:32 am
Thanks Phil.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
May 8, 2009 at 11:18 am
I'm continuing to work on this while awaiting any performance or other advice I might receive, and I now have some specific questions and observations:
-I am now in a position to retrieve the scores for the respondents by reaching out to another database, so that solves one of the big parsing issues (I can replace parameter @Allscores with a parameter merely listing the respondents I wish to process). Question: is it reasonable to create a table variable in code (ASP.NET) and pass it in to the stored proc? More generally, I have a list of integers that I want to pass to the stored proc; what's the best way? (See the sketch after this list.)
--As a test I ran around 2900 respondents against 50 traits. Using cursors in the stored proc, it took 27 minutes on my desktop machine (2.6 GHz, 2 GB, XP). I don't need subsecond response time... maybe this is good enough, I dunno; we're going to try to batch the analyses up and run them overnight on the web server/database server.
--In the cursors, because of the way I wrote it, at line 472 there is a statement SELECT TOP 400000 * FROM --was 3000
which selects from a derived table. SQL requires that I put the TOP part in, but the truth is I want all of the rows. In my test, they were cut off at 400000. Is there another way to write this? I guess that's the question about the whole thing.
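On the integer-list question above: if the server is SQL Server 2008, I imagine a table-valued parameter would look something like this (the type, proc, and column names here are hypothetical, just to show the shape):

-- Hypothetical names throughout; requires SQL Server 2008 or later
CREATE TYPE dbo.RespondentList AS TABLE (RespondentID INT NOT NULL PRIMARY KEY)
GO
CREATE PROCEDURE dbo.GetKTAnalysis_Summary_ByList
    @KeyTraits   VARCHAR(100),               -- e.g. '5,48,61'
    @Respondents dbo.RespondentList READONLY
AS
BEGIN
    -- join the respondent list to the scores table instead of parsing a CSV string
    SELECT r.RespondentID
    FROM @Respondents AS r
END
GO

From ADO.NET the parameter would be passed as a DataTable with SqlDbType.Structured.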
Thanks, people. BTW, I know my code is probably amateurish, so please be kind!
May 8, 2009 at 3:03 pm
SELECT TOP 100 PERCENT * FROM --was 3000
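That is, where the derived table carries an ORDER BY (which is presumably what forces the TOP clause in the first place), the pattern looks something like this (table and column names made up for illustration):

SELECT d.RespondentID, d.Score
FROM (
    SELECT TOP 100 PERCENT RespondentID, Score
    FROM dbo.SomeScores        -- hypothetical table
    ORDER BY RespondentID      -- the ORDER BY is what requires a TOP clause here
) AS d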
--Jeff Moden
Change is inevitable... Change for the better is not.
May 8, 2009 at 3:24 pm
PhilM99 (5/8/2009)
Here are the data as attached files...
Those files look pretty good. Instead of me trying to weed through this whole post, care to "nut shell" the problem you're trying to solve here with references to the data file? Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 8, 2009 at 4:51 pm
D'oh, as Homer Simpson would say. I did not know about TOP 100 PERCENT. Seems redundant, but if it works...
In a nutshell, I am tasked with rewriting a stored proc I wrote a few years ago. When it was used sporadically online, it was used to measure one respondent against typically one or two scoring tables. Performance was not a concern, and at my then-level of skill I freely used cursors and looped through them. Now I need performance, since they may throw hundreds of respondents against dozens of scoring tables, and I understand cursors really suck for performance.
Preparing to post this problem has been revealing to me. For one thing, I now understand the problem much better.
I may not need the help I asked for, but I do have questions, please read on.
First, I fixed the awkward parsing of a CSV parameter list by reading it in directly from another database. A few years ago I did not know you could do that.
Secondly, I noted that the original proc was multi-purpose and produced intermediate output that went into a cursor. The cursor could be read multiple times, or in multiple ways, depending on the level of detail required... but now I only need summary data.
So my current strategy is to eliminate all cursors by creating nested derived tables. The scoring logic for a particular score and respondent is at the centre, with a joining SELECT. I eliminate all the columns that aren't needed, to keep performance up.
Then I use that as a derived table and sum the logic for a particular respondent and score, grouping as required.
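Roughly, the shape I have in mind is something like this (all the table names and the scoring expression here are invented for illustration, not my actual logic):

SELECT s.RespondentID, s.KeyTraitNumber, SUM(s.ItemScore) AS SubScore
FROM (
    -- innermost level: score one item for one respondent
    SELECT r.RespondentID, k.KeyTraitNumber,
           CASE WHEN r.Score >= k.Threshold THEN k.Weight ELSE 0 END AS ItemScore
    FROM dbo.RespondentScores AS r     -- hypothetical
    JOIN dbo.KeyTraitItems    AS k     -- hypothetical
      ON k.TraitCode  = r.TraitCode
     AND k.ItemNumber = r.ItemNumber
) AS s
GROUP BY s.RespondentID, s.KeyTraitNumber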
But is using a derived table more efficient than looping through a cursor? Or is that too general a question?
Thirdly, I still want to use a table variable to stash intermediate results. Is that efficient, or should I nest one more level of derived tables?
Fourthly, what's an efficient way of handing a stored proc a long list of integers (think array in programming terms)?
I'm still cranking on this....
Thanks
May 8, 2009 at 7:37 pm
Yep... the TOP 100 PERCENT thing works just fine, although a whole lot of people will tell you that order shouldn't matter in most processing. Except for things like running totals and other previous-row drills, order shouldn't matter at all.
You speak a lot of "derived tables" and, without knowing the business logic behind what you're trying to do, I have to admit just a bit of concern. "Set based" doesn't necessarily mean "doing it all in one query". I think you understand that a bit because of your reference to a table variable (heh, don't really like those for a multitude of reasons), but let me just say that "Divide'n'Conquer" goes a really long way to improving performance over a complicated task. I prefer to use Temp Tables over Table Variables because they "persist" in QA and SSMS. In other words, you don't have to run the whole proc to test "the next section" of code you write, because the Temp Table will stay active so long as you keep the QA or SSMS editor window open.
And, no... Table Variables are not "memory only" and Temp Tables are not "disk only". Both start out in memory and will stay there if they fit; both will spool onto TempDB on disk if they don't. Temp Tables will usually have the advantage because stats will develop for them. Stats will never develop for Table Variables, and execution plans will show only a single row for table variables.
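For example (hypothetical names; run each batch separately in the same window):

SELECT RespondentID, TraitCode, Score
INTO #Intermediate               -- intermediate results in a Temp Table
FROM dbo.RespondentScores        -- hypothetical source table
GO
-- the Temp Table is still alive in this window, so the "next section"
-- can be written and tested against it without rerunning the step above
SELECT TraitCode, AVG(Score) AS AvgScore
FROM #Intermediate
GROUP BY TraitCode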
Fourthly, what's an efficient way of handing a stored proc a long list of integers (think array in programming terms)?
There're a couple of ways. But let me ask: where is the long list of integers coming from? It could make a huge difference.
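Just to give you the flavor of one of those ways, though: a Tally (numbers) table will split a CSV list without any loop at all. A minimal sketch, assuming a dbo.Tally table of sequential integers already exists:

DECLARE @List VARCHAR(8000)
SET @List = ',' + '1698,1699,1700' + ','   -- delimit both ends of the list

SELECT CAST(SUBSTRING(@List, t.N + 1,
            CHARINDEX(',', @List, t.N + 1) - t.N - 1) AS INT) AS RespondentID
FROM dbo.Tally AS t
WHERE t.N < LEN(@List)
  AND SUBSTRING(@List, t.N, 1) = ','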
I'm still cranking on this....
I've noticed... and that's why I don't mind helping if I can. I admire this type of tenacity, and you've not asked for someone to rewrite all your code for you. You keep hammering away at it yourself, and that's going to make you an even stronger SQL developer.
If you get stuck with how to resolve a particular business rule or task, please don't hesitate to ask.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 9, 2009 at 11:27 am
Thanks for the encouragement.
I have attached a greatly improved version of the stored proc.
I think as you develop technical skills in a product, there are typical stages: What is it? How can I make this work? Aha, now I get it! Wow, look at all the things I can do! Oh boy, I know nothing about performance. And finally: I can do a lot of things in a performant way. I'm still in the middle.
I get my integers from an ASP.Net application where an administrative user is selecting people from a long list of projects (they might select an entire project full of people, or individually add some from a project). The integers identify the persons (rows in a respondents table), which allows me to look up their scores in a related table in another db.
In the past I have passed them in using a CSV list (the attached proc still does that). I've been reluctant to try to devise a 'request list' or something like that because of concurrency (there can be more than one administrator), but I think I need to do this now, because I have to collect the requests and run them as a batch later (where do I ask about that?).
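For the request list, I'm picturing something along these lines (purely a sketch; every name here is invented):

CREATE TABLE dbo.AnalysisRequest (
    RequestID    INT IDENTITY(1,1) PRIMARY KEY,
    BatchID      UNIQUEIDENTIFIER NOT NULL,  -- groups one administrator's selection
    RespondentID INT NOT NULL,
    RequestedBy  VARCHAR(50) NOT NULL,
    RequestedAt  DATETIME NOT NULL DEFAULT GETDATE(),
    ProcessedAt  DATETIME NULL               -- stamped by the overnight batch job
)

Each administrator would insert rows under a fresh BatchID, so concurrent administrators never collide, and the overnight job would pick up the rows where ProcessedAt IS NULL.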
Please look at the simplified proc attached and let me know if there's anything dramatically wrong or improvable.
Some minor performance info:
Unimproved proc: 9700 respondents, 50 traits, 489150 rows returned, 29 minutes. (34.8 sec /KT)
Improved proc: 9700 respondents, 50 traits, 489150 rows returned, 17 minutes. (20.4 sec /KT)
Improved proc: 9700 respondents, 663 traits, 6486129 rows returned, 2hr 42 min. (14.7 sec /KT)
The last one is way more than I will ever need to do at once...
So it's better, but not terribly fast on my dev machine.
May 9, 2009 at 7:44 pm
Sorry, Phil... the proc you attached came out as one huge line. I won't even start to look at something like that.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 9, 2009 at 8:24 pm
It formatted fine for me?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
May 9, 2009 at 8:28 pm
OK, I tried downloading it and then opening it in notepad (in addition to my usual of opening it in another Firefox tab). Both of these came out fine for me, Jeff.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
May 9, 2009 at 10:31 pm
PhilM99 (5/7/2009)
I'm back. I have gone off and done my homework, hopefully, so let's restart this discussion, and I'll post my code and some test data. I have learned much while reading about how to post here!
I have a stored proc that was originally used to score one person's responses from a quiz against a collection of scoring items, sometimes more than one collection.
It was used sporadically and performance was not a concern. Now the client wants to be able to run dozens, maybe hundreds, of persons against dozens of scoring items. So my original techniques of using cursors and of passing data in by text strings are now sorely in question.
As an example, the following call to my stored proc performs scoring:
EXEC [GetKTAnalysis_Summary] '5,48,61', '1698,1,BUS,00,41,1698,1,BUS,01,49,1698,1,BUS,02,19,1698,1,BUS,03,33,...'
Phil, I have been going through this today, trying to understand it, but it is very complex and there are a lot of mysterious things.
One of them is this command line, which seems entirely too long to be practical as a command statement.
How are you executing this command line? Through what facility? Do you have a client application that actually executes it for you, or are you actually typing this in through SSMS?
Also, the final parameter seems to be a (large) table of data. Where is this data coming from and why isn't it already in a table somewhere? And how do you get it and get it into this form if it is not already in a table?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]