Why do people need to do this?

  • Hey Jeff,

    Not sure if you're including "running sum" or similar problems in your question...? If so, a common reason for needing that in my experience has been detecting the date a customer meets a lifetime purchase threshold, or a salesperson meets a quota target.

    Regards,

    Jacob

  • I've had to present a "best guess" at matching events from two data sources. Each source provides a subject identifier that tells where a package came from, but there's no unique package identifier (would have required expensive development by disparate entities). One source gives package dispatch data with descriptions of the contents and the other source just tells when the package arrived. The idea is that we can then report on the contents of a just-arrived package without the receiving party having to do any data entry. The trouble with this is that if we get two or more "dispatch" events before either package is received, we have to group and rank using row_number() the possible dispatched-received pairings, and allow for manual override if the first pair is found to be a mismatch upon examination of the package contents.

    I'm not sure this is what you're looking for, but at least is an example of how exact solutions aren't always possible, so we do what we can.

  • Jacob Luebbers (1/19/2009)


    Hey Jeff,

    Not sure if you're including "running sum" or similar problems in your question...? If so, a common reason for needing that in my experience has been detecting the date a customer meets a lifetime purchase threshold, or a salesperson meets a quota target.

    Regards,

    Jacob

    Excellent idea, Jacob... and yes, you would certainly use a running total for what you described... but would you need to restart the numbering each time a customer changes (or other thing) within the same day as in the original data I posted? I'm thinking that you wouldn't.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • john.arnott (1/19/2009)


    I've had to present a "best guess" at matching events from two data sources. Each source provides a subject identifier that tells where a package came from, but there's no unique package identifier (would have required expensive development by disparate entities). One source gives package dispatch data with descriptions of the contents and the other source just tells when the package arrived. The idea is that we can then report on the contents of a just-arrived package without the receiving party having to do any data entry. The trouble with this is that if we get two or more "dispatch" events before either package is received, we have to group and rank using row_number() the possible dispatched-received pairings, and allow for manual override if the first pair is found to be a mismatch upon examination of the package contents.

    I'm not sure this is what you're looking for, but at least is an example of how exact solutions aren't always possible, so we do what we can.

    Aye... thanks John. Not sure I can shoehorn that explanation in, but you're correct... good example of how exact solutions aren't always possible.

    What I may have to do is just explain that I'd seen the problem before and have no real idea as to why someone would need to do the exact thing I'm getting ready to demonstrate. I like the idea that someone may need that for a single file instead of multiple files as has been suggested, but I don't really know.

    Anyway, thanks again.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    After 30 years of having to find ways to get data in or out of computer systems for almost every conceivable purpose, what I've found is that in every single case, after asking enough questions, it's a failure to properly capture the data, often combined with a structural design that prevents any possible shot at permanently rectifying the problem at anything remotely resembling a reasonable cost. If you bottom-line it, it ultimately comes down to a failure to think the "system" all the way through. Of course, trying to save money is almost always some part of that root cause, by virtue of leading to what I'll call "allegedly simpler systems". The fact that such systems all too often don't actually do what is ultimately desired without having to add cost/time/resources well beyond the initial savings, doesn't seem to impact the decision makers because of the short-term thinking so typical of "corporate America" these days. The current economic crisis is almost sure to spawn even more problems of this nature.

    As usual, we, the lowly employee, often have little to no control over anything, and input contrary to management direction is almost always not looked upon favorably, no matter how right we might be, nor how poor it will be for the company by virtue of just blindly following management direction.

    This may not help your article, but perhaps you can offer up that properly designed systems simply do not need those kinds of solutions, and that perhaps a good test for any system design is to have it determined that the system doesn't need any of that kind of reporting solution...

    Steve

    (aka smunson)

    :):):)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • smunson (1/19/2009)


    Jeff,

    After 30 years of having to find ways to get data in or out of computer systems for almost every conceivable purpose, what I've found is that in every single case, after asking enough questions, it's a failure to properly capture the data, often combined with a structural design that prevents any possible shot at permanently rectifying the problem at anything remotely resembling a reasonable cost. If you bottom-line it, it ultimately comes down to a failure to think the "system" all the way through. Of course, trying to save money is almost always some part of that root cause, by virtue of leading to what I'll call "allegedly simpler systems". The fact that such systems all too often don't actually do what is ultimately desired without having to add cost/time/resources well beyond the initial savings, doesn't seem to impact the decision makers because of the short-term thinking so typical of "corporate America" these days. The current economic crisis is almost sure to spawn even more problems of this nature.

    As usual, we, the lowly employee, often have little to no control over anything, and input contrary to management direction is almost always not looked upon favorably, no matter how right we might be, nor how poor it will be for the company by virtue of just blindly following management direction.

    This may not help your article, but perhaps you can offer up that properly designed systems simply do not need those kinds of solutions, and that perhaps a good test for any system design is to have it determined that the system doesn't need any of that kind of reporting solution...

    Steve

    (aka smunson)

    :):):)

    Thanks for the feedback, but I believe you may have misunderstood the intent of the article. Agreed that we all have to do strange things and this is one of those strange things that people have to do. I just want to know what it's actually for so that I can explain "When you try to do such-and-such", I can put in the correct words for "such-and-such". I'm not bitching that the task has to be done, I just want to know why people need to do the task so I can put a label on it. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/19/2009)


    ...but would you need to restart the numbering each time a customer changes (or other thing) within the same day as in the original data I posted?

    In the cases I'm talking about the whole (calendar) day is irrelevant - we just want to capture the exact time that a customer's purchase caused them to cross a threshold, or a saleperson's sale caused them to meet quota. So yes, we'd have the running sum restart when the customer/saleperson changes.

    Regards,

    Jacob

  • I just remembered another business case for this methodology: implementing a state machine that mimics a real-world transactional process. I've had occasion in the past to do this to represent a process that involves different kinds of interdependent transactions occuring on a timeline. Each type of transaction had a dependency on the transactions that preceeded it, and the relationship varied depending on the type of transactions before it in the sequence. In essence it involved maintaining multiple running sums, with nested CASE statements controlling the exact calculation of these sums (eg which transactions affect which sum and when, etc.)

    And yes, it was probably a case of "wrong tool for the job" 🙂

    Regards,

    Jacob

  • Jeff Moden (1/19/2009)


    Ramesh (1/18/2009)


    I would like to share my thoughts on the need of such rules.

    I have the punch/swipe card data imported from some other system in the following format...

    EmployeeNo, CardReaderNo, CardReaderType, PunchDateTime

    E1, C1, IN, 21-Jan-2008 11:01:21 134

    E1, C1, OUT, 21-Jan-2008 11:10:01 449

    E1, C1, IN, 21-Jan-2008 12:04:35 121

    E1, C1, OUT, 21-Jan-2008 15:34:05 154

    Now, here the business requirement is, a report that shows the employee day wise in/out history in the office...

    Employee, Date, InTime, OutTime

    E1, 21-Jan-2008, 11:01:21 134, 11:10:01 449

    E1, 21-Jan-2008, 12:04:35 121, 15:34:05 154

    Now, the business need is to track an employee's productive time in the office.

    That's pretty good... thanks Ramesh... but look at the original data I posted again... there's absolutely no hint of such row pairing in that data.

    Lot's of good suggestions, folks. Thanks.

    Aahh:crazy:, does I misread the post?? Ohhh...:cool:, probably YESSS.

    Actually, Jeff, I never had to do such a weird stuff...

    --Ramesh


  • Jacob Luebbers (1/19/2009)


    Jeff Moden (1/19/2009)


    ...but would you need to restart the numbering each time a customer changes (or other thing) within the same day as in the original data I posted?

    In the cases I'm talking about the whole (calendar) day is irrelevant - we just want to capture the exact time that a customer's purchase caused them to cross a threshold, or a saleperson's sale caused them to meet quota. So yes, we'd have the running sum restart when the customer/saleperson changes.

    Regards,

    Jacob

    But it wouldn't reset to 1 for the same person, would it?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, I think the basic point of such questions is that data sequentiality matters in some cases, and it's something that SQL (and relational databases in general) are supposed to ignore.

    I'd just call it a sequentiality issue, if you're looking for a name for it.

    Depending on the specific situation, it might have different rules from one time to the next, and it can often be modeled using running totals type calculations, but it can vary for specific rules.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Jeff,

    I think there's an inherent problem with "slapping a label on it", which does little more than both condone AND support the very worst way to go about things. The mere presence of a label isn't going to solve the problem, and since we're almost exclusively dealing with what GSquared referred to as sequencing issues, that's probably as good a label as you're likely to get.

    I realize that we all are made to do these things, but to borrow from a now (in?)famous phrase, putting lipstick on a pig doesn't stop it from being a pig. I guess I just consider it a top priority to educate those who make these demands about the true cost of continuing down that same road.

    Steve

    (aka smunson)

    :):):)

    Jeff Moden (1/19/2009)


    smunson (1/19/2009)


    Jeff,

    After 30 years of having to find ways to get data in or out of computer systems for almost every conceivable purpose, what I've found is that in every single case, after asking enough questions, it's a failure to properly capture the data, often combined with a structural design that prevents any possible shot at permanently rectifying the problem at anything remotely resembling a reasonable cost. If you bottom-line it, it ultimately comes down to a failure to think the "system" all the way through. Of course, trying to save money is almost always some part of that root cause, by virtue of leading to what I'll call "allegedly simpler systems". The fact that such systems all too often don't actually do what is ultimately desired without having to add cost/time/resources well beyond the initial savings, doesn't seem to impact the decision makers because of the short-term thinking so typical of "corporate America" these days. The current economic crisis is almost sure to spawn even more problems of this nature.

    As usual, we, the lowly employee, often have little to no control over anything, and input contrary to management direction is almost always not looked upon favorably, no matter how right we might be, nor how poor it will be for the company by virtue of just blindly following management direction.

    This may not help your article, but perhaps you can offer up that properly designed systems simply do not need those kinds of solutions, and that perhaps a good test for any system design is to have it determined that the system doesn't need any of that kind of reporting solution...

    Steve

    (aka smunson)

    :):):)

    Thanks for the feedback, but I believe you may have misunderstood the intent of the article. Agreed that we all have to do strange things and this is one of those strange things that people have to do. I just want to know what it's actually for so that I can explain "When you try to do such-and-such", I can put in the correct words for "such-and-such". I'm not bitching that the task has to be done, I just want to know why people need to do the task so I can put a label on it. 😉

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Jeff Moden (1/19/2009)


    Jacob Luebbers (1/19/2009)


    Hey Jeff,

    Not sure if you're including "running sum" or similar problems in your question...? If so, a common reason for needing that in my experience has been detecting the date a customer meets a lifetime purchase threshold, or a salesperson meets a quota target.

    Regards,

    Jacob

    Excellent idea, Jacob... and yes, you would certainly use a running total for what you described... but would you need to restart the numbering each time a customer changes (or other thing) within the same day as in the original data I posted? I'm thinking that you wouldn't.

    I think this may relate to what Smunson and others are alluding to, missing requirements when the database was modeled, and I realize that your example was just that, but your customer sequence changes on the change of the hour? Could this be due to some ranking that needs to take place at a subdivision of a date, which is not captured as a separate column?

    For instance, in a call center environment, if I needed to know the top 5 clients for each hour of the day, or some such. Making that up, top 5 entry points for requests (phone/fax/web), request categories (complaint/general service/compliment - that one's always blank), something like that.

    Uses for this being operational crisis management. "Where are our chokepoints?"

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • Jeff Moden (1/20/2009)


    But it wouldn't reset to 1 for the same person, would it?

    Ah, I missed that key point 😀

    In that case no, I've never seen a business case to do this.

    Regards,

    Jacob

  • Jacob Luebbers (1/20/2009)


    Jeff Moden (1/20/2009)


    But it wouldn't reset to 1 for the same person, would it?

    Ah, I missed that key point 😀

    In that case no, I've never seen a business case to do this.

    Regards,

    Jacob

    BWAA-HAA! Me neither... and now you know why I'm asking! Thanks for the feedback, Jacob.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 31 through 45 (of 94 total)

You must be logged in to reply to this topic. Login to reply