Matrix multiplication without cursors?

  • I'm back. I have gone off and done my homework, hopefully, so let's restart this discussion, and I'll post my code and some test data. I have learned much while reading about how to post here!

    I have a stored proc that was originally used to score one person's responses from a quiz against a collection of scoring items, sometimes more than one collection.

    It was used sporadically and performance was not a concern. Now the client wants to be able to run dozens, maybe hundreds, of persons against dozens of scoring items. So my original techniques of using cursors and of passing data in by text strings are now sorely in question.

    As an example, the following call to my stored proc performs scoring:

    EXEC [GetKTAnalysis_Summary] '5,48,61', '1698,1,BUS,00,41,1698,1,BUS,01,49,1698,1,BUS,02,19,1698,1,BUS,03,33,1698,1,BUS,04,51,1698,1,BUS,05,59,1698,1,BUS,06,33,1698,1,BUS,07,45,1698,1,CPS,00,55,1698,1,CPS,01,65,1698,1,CPS,02,45,1698,1,CPS,03,80,1698,1,CPS,04,80,1698,1,CPS,05,55,1698,1,CPS,06,65,1698,1,CPS,07,55,1698,1,CPS,08,10,1698,1,CPS,09,65,1698,1,CPS,10,40,1698,1,CPS,11,50,1698,1,CPS,12,55,1698,1,EXP,00,53,1698,1,EXP,01,58,1698,1,EXP,02,45,1698,1,EXP,03,61,1698,1,EXP,04,49,1698,1,HRL,00,50,1698,1,HRL,01,35,1698,1,HRL,02,70,1698,1,HRL,03,80,1698,1,HRL,04,65,1698,1,HRL,05,45,1698,1,HRL,06,40,1698,1,HRL,07,55,1698,1,HRL,08,35,1698,1,HRL,09,65,1698,1,HRL,10,50,1698,1,HRL,11,55,1698,1,HRL,12,10,1698,1,INA,00,90,1698,1,INA,01,90,1698,1,INV,00,38,1698,1,INV,01,35,1698,1,INV,02,27,1698,1,INV,03,49,1698,1,INV,04,27,1698,1,INV,05,51,1698,1,LFS,00,54,1698,1,LFS,01,70,1698,1,LFS,02,45,1698,1,LFS,03,70,1698,1,LFS,04,45,1698,1,LFS,05,20,1698,1,LFS,06,45,1698,1,LFS,07,60,1698,1,LFS,08,55,1698,1,LFS,09,55,1698,1,LFS,10,80,1698,1,MVF,00,57,1698,1,MVF,01,75,1698,1,MVF,02,40,1698,1,MVF,03,35,1698,1,MVF,04,70,1698,1,MVF,05,25,1698,1,MVF,06,90,1698,1,MVF,07,65,1698,1,MVF,08,35,1698,1,MVF,09,80,1698,1,PPL,00,57,1698,1,PPL,01,57,1698,1,PPL,02,54,1698,1,PPL,03,53,1698,1,PPL,04,66,1698,1,PPL,05,63,1698,1,PPL,06,51,1698,1,PSS,00,52,1698,1,PSS,01,35,1698,1,PSS,02,45,1698,1,PSS,03,45,1698,1,PSS,04,65,1698,1,PSS,05,45,1698,1,PSS,06,90,1698,1,PSS,07,70,1698,1,PSS,08,75,1698,1,PSS,09,5,1698,1,PSS,10,45,1698,1,SAS,00,56,1698,1,SAS,01,55,1698,1,SAS,02,65,1698,1,SAS,03,40,1698,1,SAS,04,25,1698,1,SAS,05,55,1698,1,SAS,06,55,1698,1,SAS,07,55,1698,1,SAS,08,70,1698,1,SAS,09,70,1698,1,SAS,10,70,1698,1,SVC,00,50,1698,1,SVC,01,65,1698,1,SVC,02,65,1698,1,SVC,03,55,1698,1,SVC,04,61,1698,1,SVC,05,18,1698,1,SVC,06,40,1698,1,SVC,07,47,1698,1,SVC,08,45,1698,1,TRD,00,45,1698,1,TRD,01,61,1698,1,TRD,02,40,1698,1,TRD,03,42,1698,1,TRD,04,28,1698,1,TRD,05,53,1698,1,VCI,00,56,1698,1,VCI,01,35,1698,1,VCI,02,55,1698,1,VCI,03,70,1698,1,VCI,04,45,1698,1,VCI,05,65,1698,1,VCI,06,65,1698,1,VCI,07,35,1698,1,VCI,08,99,1698,1,VCI,09,35,1698,1,VCI,10,55,1698,1,VLS,00,60,1698,1,VLS,01,65,1698,1,VLS,02,70,1698,1,VLS,03,45,1698,1,WKH,00,57,1698,1,WKH,01,65,1698,1,WKH,02,65,1698,1,WKH,03,45,1698,1,WKH,04,45,1698,1,WKH,05,70,1698,1,WKH,06,65,1698,1,WKH,07,50,1698,1,WKH,08,25,1698,1,WKH,09,80'

    And returns the scoring results:

    RespondentNumber  Key_Trait_Number  Key_Trait_Name              SubScore  Discriminator  CareerFit  Predictor
    1698              61                R07 Social Worker           59        0              -9         50
    1698              48                05 Sales Rep, Consumer      54        -5             -16        33
    1698              5                 33 Franchise, Retail Owner  47        -10            -20        17

    Notice that the scoring groups I wanted to score against are 5, 48 and 61, and that I am scoring for respondent 1698. All respondents have exactly 137 scores.
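
    For anyone following the flat second parameter, here is a minimal sketch of the row shape it appears to encode; the column meanings are my reading of the sample call, not something stated in the post:

    -- Each group of five values seems to be one score row:
    --   1698, 1, BUS, 00, 41  ->  respondent, quiz/attempt (assumed), scale code, item number, score
    CREATE TABLE #RespondentScores
    (
        RespondentNumber INT      NOT NULL,   -- e.g. 1698
        QuizNumber       INT      NOT NULL,   -- the "1"; meaning assumed
        ScaleCode        CHAR(3)  NOT NULL,   -- e.g. BUS, CPS, WKH
        ItemNumber       CHAR(2)  NOT NULL,   -- e.g. 00 through 12
        Score            INT      NOT NULL    -- e.g. 41; 137 rows per respondent
    );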

    I want the same results for 1698 and similar results for any number of others I pass in scores for, but I think my lame approach in the stored proc I will post next could be much improved into a set-based approach.

    The scoring has a specific set of rules, as you might see in the stored proc, and if further explanation is needed we might be into this too far, but I'm willing.

    In my next post I will post some test data and the existing stored proc.

    Thanks in advance.

  • Here is the test data setup using an empty db called testing, and the stored proc.

    (Data moved to attached files in another post)

  • Dude. This code post is really big and is causing the page to hang for me (and others, I presume).

    Could you edit your post and move the code into an attached text file?

    Thanks,

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Here are the data as attached files...

  • Thanks Phil.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I'm continuing to work on this while awaiting any performance or other advice I might receive, and I now have some specific questions and observations:

    --I am now in a position to retrieve the scores for the respondents by reaching out to another database, which solves one of the big parsing issues (I can replace the @Allscores parameter with a parameter that merely lists the respondents I wish to process). Question: is it reasonable to create a table variable in code (ASP.Net) and pass it in to the stored proc? In other words, I have a list of integers that I want to pass to the stored proc; what's the best way?

    --As a test I ran around 2,900 respondents against 50 traits. Using cursors in the stored proc, it took 27 minutes on my desktop machine (2.6 GHz, 2 GB, XP). I don't need sub-second response time... maybe this is good enough, I don't know; we're going to try to batch the analyses up and run them overnight on the web server/database server.

    --In the cursors, for reasons of the way I wrote it, line 472 has the statement SELECT TOP 400000 * FROM --was 3000, which selects from a derived table. SQL requires the TOP clause there, but the truth is I want all of the rows. In my test they were cut off at 400,000. Is there another way to write this? I guess that's the question about the whole thing.

    Thanks, people. BTW, I know my code is probably amateurish, so please be kind!

  • SELECT TOP 100 PERCENT * FROM --was 3000
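
    A minimal sketch of how that fits the pattern in question, assuming a hypothetical dbo.Scores table and an ORDER BY inside the derived table (which is what forces the TOP clause in the first place):

    SELECT d.RespondentNumber, d.TraitCode, d.Score
    FROM (
        SELECT TOP 100 PERCENT            -- was TOP 400000; no longer caps the rows
               s.RespondentNumber, s.TraitCode, s.Score
        FROM dbo.Scores AS s              -- hypothetical table name
        ORDER BY s.RespondentNumber, s.TraitCode
    ) AS d;

    Note that SQL Server only honours a derived table's ORDER BY for the TOP operation itself; if the final output must be ordered, the ORDER BY belongs on the outer query instead.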

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • PhilM99 (5/8/2009)


    Here are the data as attached files...

    Those files look pretty good. Instead of me trying to weed through this whole post, care to "nut shell" the problem you're trying to solve here with references to the data file? Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • D'oh, as Homer Simpson would say. I did not know about the Top 100%. Seems redundant, but if it works...

    In a nutshell, I am tasked with rewriting a stored proc I wrote a few years ago. When it was used sporadically online, it was used to measure one respondent against typically one or two scoring tables. Performance was not a concern, and at my then-level of skill I freely used cursors and looped through them. Now I need performance, since they may throw hundreds of respondents against dozens of scoring tables, and I understand cursors really suck for performance.

    Preparing this problem for posting has been revealing: I now understand the problem much better.

    I may not need the help I asked for, but I do have questions, please read on.

    First, I fixed the awkward parsing of a CSV parameter list by reading it in directly from another database. A few years ago I did not know you could do that.
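
    A minimal sketch of that kind of cross-database read, with purely hypothetical database, table, and column names:

    SELECT s.RespondentNumber, s.ScaleCode, s.ItemNumber, s.Score
    FROM   OtherDb.dbo.RespondentScores AS s     -- three-part name reaches into the other database
    WHERE  s.RespondentNumber IN (1698, 1699);   -- whatever respondents were requested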

    Secondly, I noted that the original proc was multi-purpose and produced intermediate output that went into a cursor. The cursor could be read multiple times or in multiple ways depending on the level of detail required... but now I only need summary data.

    So my current strategy is to eliminate all cursors by creating nested derived tables. The scoring logic for a particular score and respondent sits at the centre, with a joining SELECT, and I eliminate all the extraneous columns that aren't needed, to keep performance up.

    Then I use that as a derived table and sum the logic for a particular respondent and score, grouping as required (see the sketch below).
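
    A minimal sketch of that nesting, with hypothetical table and column names standing in for the real scoring rules:

    SELECT x.RespondentNumber,
           x.Key_Trait_Number,
           SUM(x.ItemScore) AS SubScore                -- outer level: sum per respondent and trait
    FROM (
        SELECT r.RespondentNumber,
               k.Key_Trait_Number,
               r.Score * k.Weight AS ItemScore         -- inner level: per-item scoring logic only
        FROM OtherDb.dbo.RespondentScores AS r         -- hypothetical names throughout
        JOIN dbo.KeyTraitItems AS k
          ON k.ScaleCode  = r.ScaleCode
         AND k.ItemNumber = r.ItemNumber
    ) AS x
    GROUP BY x.RespondentNumber, x.Key_Trait_Number;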

    But is using a derived table more efficient than looping through a cursor? Or is that too general a question?

    Thirdly, I still want to use a table variable to stash intermediate results. Is that efficient, or should I nest one more level of derived tables?

    Fourthly, what's an efficient way of handing a stored proc a long list of integers (think array, in programming terms)?

    I'm still cranking on this....

    Thanks

  • Yep... the TOP 100 PERCENT thing works just fine, although a whole lot of people will tell you that order shouldn't matter in most processing. Except for things like running totals and other previous-row drills, order shouldn't matter at all.

    You speak a lot of "derived tables" and, without knowing the business logic behind what you're trying to do, I have to admit just a bit of concern. "Set Based" doesn't necessarily mean "doing it all in one query". I think you understand that a bit because of your reference to a table variable (heh, don't really like those for a multitude of reasons) but, let me just say that "Divide'n'Conquer" goes a really long way to improving performance over a complicated task. I prefer to use Temp Tables over Table Variables because they "persist" in QA and SSMS. In other words, you don't have to run the whole proc to test "the next section" of code you write, because the Temp Table will stay active so long as you keep the QA or SSMS editor window open.

    And, no... Table Variables are not "memory only" and Temp Tables are not "disk only". Both start out in memory and will stay there if they fit. Both will spool into TempDB on disk if they don't. Temp Tables will usually have the advantage because stats will develop for them. Stats will never develop for Table Variables, and execution plans will show only a single row for table variables.
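
    A minimal sketch of that divide-and-conquer staging, again with hypothetical names; the intermediate rows land in a temp table instead of a table variable:

    -- Step 1: stage the per-item rows once. #ScoreStage persists for the session,
    -- so later sections of the proc (or ad hoc tests in SSMS) can reuse it.
    SELECT r.RespondentNumber, k.Key_Trait_Number, r.Score * k.Weight AS ItemScore
    INTO   #ScoreStage
    FROM   OtherDb.dbo.RespondentScores AS r
    JOIN   dbo.KeyTraitItems AS k
      ON   k.ScaleCode  = r.ScaleCode
     AND   k.ItemNumber = r.ItemNumber;

    CREATE INDEX IX_ScoreStage ON #ScoreStage (RespondentNumber, Key_Trait_Number);

    -- Step 2: summarize from the staged rows.
    SELECT RespondentNumber, Key_Trait_Number, SUM(ItemScore) AS SubScore
    FROM   #ScoreStage
    GROUP BY RespondentNumber, Key_Trait_Number;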

    Fourthly, what's an efficient way of handing a stored proc a long list of integers (think array, in programming terms)?

    There're a couple of ways. But, let me ask, where is the long list of integers coming from? It could make a huge difference.
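
    One common way (a sketch only, not necessarily what Jeff has in mind) is to split a comma-delimited string of integers into a table using a Tally table of sequential numbers, then join to the result; on SQL Server 2008 and later, a table-valued parameter is another option.

    DECLARE @List VARCHAR(MAX);
    SET @List = ',' + '1698,1702,1815' + ',';    -- hypothetical respondent list, wrapped in commas

    SELECT RespondentNumber =
           CAST(SUBSTRING(@List, t.N + 1,
                CHARINDEX(',', @List, t.N + 1) - t.N - 1) AS INT)
    FROM   dbo.Tally AS t                        -- assumes a Tally table of integers 1..n
    WHERE  t.N < LEN(@List)
      AND  SUBSTRING(@List, t.N, 1) = ',';

    The result set can then be joined straight into the scoring query, or dumped into a temp table first.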

    I'm still cranking on this....

    I've noticed... and that's why I don't mind helping if I can. I admire this type of tenacity, and you've not asked for someone to rewrite all your code for you. You keep hammering away at it yourself, and that's going to make you an even stronger SQL developer.

    If you get stuck with how to resolve a particular business rule or task, please don't hesitate to ask.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for the encouragement.

    I have attached a greatly improved version of the stored proc.

    I think as you develop technical skills in a product there are typical stages: what is it; how can I make this work; aha, now I get it; wow, look at all the things I can do; oh boy, I know nothing about performance; and finally, I can do a lot of things in a performant way. I'm still in the middle.

    I get my integers from an ASP.Net application where an administrative user is selecting people from a long list of projects (they might select an entire project full of people, or individually add some from a project). The integers represent the person (the respondents table), which allows me to look up their scores in a related table in another db.

    In the past I have passed them in using a CSV list (the attached proc still does that). I've been reluctant to try to devise a 'request list' or something like that because of concurrency (there can be more than one administrator), but I think I need to do this now, because I have to collect the requests and run them as a batch later (where do I ask about that?).

    Please look at the simplified proc attached and let me know if there's anything dramatically wrong or improvable.

    Some minor performance info:

    Unimproved proc: 9700 respondents, 50 traits, 489150 rows returned, 29 minutes. (34.8 sec /KT)

    Improved proc: 9700 respondents, 50 traits, 489150 rows returned, 17 minutes. (20.4 sec /KT)

    Improved proc: 9700 respondents, 663 traits, 6486129 rows returned, 2hr 42 min. (14.7 sec /KT)

    The last one is way more than I will ever need to do at once...

    So it's better, but not terribly fast on my dev machine.

  • Sorry, Phil... the proc you attached came out as one huge line. I won't even start to look at something like that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It formatted fine for me?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • OK, I tried downloading it and then opening it in Notepad (in addition to my usual method of opening it in another Firefox tab). Both came out fine for me, Jeff.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • PhilM99 (5/7/2009)


    I'm back. I have gone off and done my homework, hopefully, so let's restart this discussion, and I'll post my code and some test data. I have learned much while reading about how to post here!

    I have a stored proc that was originally used to score one person's responses from a quiz against a collection of scoring items, sometimes more than one collection.

    It was used sporadically and performance was not a concern. Now the client wants to be able to run dozens, maybe hundreds, of persons against dozens of scoring items. So my original techniques of using cursors and of passing data in by text strings are now sorely in question.

    As an example, the following call to my stored proc performs scoring:

    EXEC [GetKTAnalysis_Summary] '5,48,61', '1698,1,BUS,00,41,1698,1,BUS,01,49,1698,1,BUS,02,19,1698,1,BUS,03,33,...'

    Phil, I have been going through this today, trying to understand it, but it is very complex and there are a lot of mysterious things.

    One of them is this command line, which seems entirely too long to be practical as a command statement.

    How are you executing this command line? Through what facility? Do you have a client application that actually executes it for you, or are you actually typing this in through SSMS?

    Also, the final parameter seems to be a (large) table of data. Where is this data coming from, and why isn't it already in a table somewhere? And how do you get it and get it into this form if it is not already in a table?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]
