how to remove numbers from strings?

  • Hmm... First there's a whole lot of flak about not being 'set oriented', not knowing SQL etc etc, and now you're posting something that loops, dealing with one row at a time..?

    If you deal with sets that have no more than one row, then the order becomes redundant, eh?

    How many different ways can you sort a single row? 😉

    /Kenneth

    I used a loop to guarantee the order of INSERTING the rows.

    Do you agree that the ORDER OF INSERTING was the same for both tables?

    So, what does the SELECT prove in this case?

    P.S. Kenneth, are you really that dumb?

    _____________
    Code for TallyGenerator

  • Must be 🙂

    And your fine example served to demonstrate exactly what?

    How to write

    INSERT t1 (a) SELECT 'a'

    INSERT t1 (a) SELECT 'b'

    INSERT t1 (a) SELECT 'c'

    INSERT t1 (a) SELECT 'd'

    ...but with fewer keystrokes?

    I don't see the point of that demonstration of yours; it seems to be about something that hasn't yet come up anywhere in this thread?

    Would you care to explain what you mean to prove by it?

    /Kenneth

  • Kenneth Wilhelmsson (11/1/2007)


    I don't see the point of that demonstration of yours; it seems to be about something that hasn't yet come up anywhere in this thread?

    Would you care to explain what you mean to prove by it?

    OK, for the dumb ones - explaining for the 3rd time.

    Does this script guarantee order of insertion?

    Were the rows inserted into both tables in the same order?

    Is the order displayed by the following SELECTs the same for both tables?

    After you answer those questions, go to the page you referenced, read it once again, and answer an additional question:

    Do those MS "professionals" have any clue about how SQL Server works?

    _____________
    Code for TallyGenerator

  • Sergiy (11/1/2007)


    OK, for the dumb ones - explaining for the 3rd time.

    Does this script guarantee order of insertion?

    Were the rows inserted into both tables in the same order?

    Is the order displayed by the following SELECTs the same for both tables?

    Yes.

    Yes.

    No. You have different primary keys. So what is your mission with this? Leaving ORDER BY out of a SELECT statement means the order is never guaranteed to be the same between executions.


    N 56°04'39.16"
    E 12°55'05.25"

  • Peter Larsson (11/1/2007)


    No. You have different primary keys. So what is your mission with this? Leaving ORDER BY out of a SELECT statement means the order is never guaranteed to be the same between executions.

    Exactly.

    But those "professionals" from MS judge the order of INSERTION by the order of SELECTION performed after insertion.

    And even when the error is pointed out to them, they cannot get it.

    They don't have a clue that a table is not just a set of sectors on disk but the CODE, the PIECE OF FUNCTIONALITY operating on the data supplied to the table.

    And the order of a SELECT depends not on the order of INSERT (OK, not only on the order of INSERT), but on that functionality we name "a table".

    My example clearly shows this. The only thing that is different is the table definitions. And it brings different results.
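
    The script itself isn't quoted on this page of the thread, but a minimal sketch of the kind of demonstration being described, assuming two tables that differ only in their clustered primary key (all names illustrative), looks like this:

    -- Hypothetical reconstruction: same rows, same insertion order,
    -- different clustered primary keys.
    CREATE TABLE t1 (a CHAR(1) NOT NULL PRIMARY KEY CLUSTERED)
    CREATE TABLE t2 (id INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED, a CHAR(1) NOT NULL)

    INSERT t1 (a) SELECT 'd'
    INSERT t1 (a) SELECT 'c'
    INSERT t1 (a) SELECT 'b'
    INSERT t1 (a) SELECT 'a'

    INSERT t2 (a) SELECT 'd'
    INSERT t2 (a) SELECT 'c'
    INSERT t2 (a) SELECT 'b'
    INSERT t2 (a) SELECT 'a'

    -- Neither SELECT below has an ORDER BY, so neither order is guaranteed.
    -- In practice t1 tends to come back a, b, c, d (the scan follows the
    -- clustered key on a), while t2 tends to come back d, c, b, a (the
    -- IDENTITY clustered key follows insertion order).
    SELECT a FROM t1
    SELECT a FROM t2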

    _____________
    Code for TallyGenerator

  • I was away for a while, but finally had a chance to log on, so time to clear up some misconceptions that seem to be out there.

    Jeff Moden (10/22/2007)


    Ya know... for someone who bitched as much as you did about personal attacks, you're really pretty good at it. :Whistling:

    I'm VERY good at it if I try. My bitch is about launching personal attacks, not responding to them. If I call you an idiot, I don't expect you to ignore it. What I expect in this forum is for me to not call you (or anyone else) an idiot (or anything else derogatory) in the first place, when completely unprovoked. I have zero intention of letting Sergiy continue trolling and calling every other new poster, or any poster he disagrees with, an idiot (especially when he's wrong, both with his response and about the new poster, which he frequently is), without calling him out.

    Remember the post to my friend Greg? You could have said nothing... instead, it was you who attacked.

    I don't have the slightest idea what you're talking about here. Care to link to the post(s) in question?

    And there's a strong bit of innuendo in your statement above... are YOU questioning my integrity? Or just insinuating? And all because I happen to not be arbitrating for your position? Sheesh!

    Neither. I was asking, quite seriously in fact, as your response made absolutely zero sense in any other context I could come up with. The reason I asked is that I produced two identical actual execution plans, one of which was less code, and which also proved Sergiy wrong, and you stated that the INNER JOIN method was the preferable one. On what basis?

    Let's get back to the code because that's all that really matters...

    You've brought a good point to the table, David... the OVER(PARTITION) does appear to do an inherent sort (I can't verify it since I don't have 2k5, so I take your word for it)... and that's great for reports of all kinds...

    That's not completely true. It does do sorting, but there is no guarantee that the final output will be truly sorted. What it does is keep the members of each group together, and in some cases (depending on caching issues) it happens to also sort the groups themselves. This is not at all guaranteed.

    ...but... who's going to create a report with a million rows? The purpose of the testing I'm interested in is not how fast you can create a million row report... the purpose is how fast each method can process a million rows of data.

    I didn't create the million row test set, so I'm not sure why this question is directed at me. I simply stated that in a typical reporting environment, OVER performs as well as the INNER JOIN method, but with less code, and easier ongoing maintenance.

    I agree... if you don't want to even think about tuning, go ahead and use OVER(PARTITION). If you intend to build reports, go ahead, use OVER(PARTITION). If you want to process millions of rows of data about twice as fast, you just might want to consider exercising a little "old" T-SQL by using the Inner Join method with the covering index.

    Again, I'm confused as to what this has to do with tuning. For reporting purposes, you're just not likely to see covering indexes tossed all over the place. Without the covering index, OVER and INNER JOIN have the same execution plan.
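
    For reference, the two shapes being compared look something like this (the million-row test table isn't reproduced on this page, so the table and column names below are illustrative); without the covering index, both typically compile to the same actual execution plan:

    -- Windowed-aggregate form: the group aggregate alongside the detail rows
    SELECT SomeID, SomeValue,
           MAX(SomeValue) OVER (PARTITION BY SomeID) AS MaxValue
      FROM dbo.TestTable

    -- INNER JOIN form: pre-aggregate in a derived table, then join back
    SELECT t.SomeID, t.SomeValue, d.MaxValue
      FROM dbo.TestTable t
     INNER JOIN (SELECT SomeID, MAX(SomeValue) AS MaxValue
                   FROM dbo.TestTable
                  GROUP BY SomeID) d
        ON d.SomeID = t.SomeID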

    You don't believe that covering indexes are a representation of what a real production system would have.

    I said no such thing. I said they're not a representation of what performance tuning measures would typically be taken to accommodate reports.

    We're not arguing the fact that you don't use covering indexes in production, but, contrary to your beliefs, many of us do use such indexes in production when we want to double the performance of mega row code... it's part of "tuning" and you know that.

    I use them all of the time, as I work with terabytes of data, some of which is in tables that are far wider than one would hope.

    Now, instead of questioning my personal integrity for doing tests that don't happen to meet your particular needs, embrace the spirit of the tests... Show me the code, the indexes, or any combination of methods where you can process a million rows of data for the current given problem (which I believe you originally posted, by the way) using OVER(PARTITION) that can beat Sergiy's Inner Join method in the presence of the tuning/covering index that's been listed. THEN we can test the best way to do (gasp) million row reports...

    Naw, I made my point. OVER has uses. I stated it, I demonstrated it, I proved it, and I stand by it. Does that mean that I no longer write a single batch without OVER in it? Of course not. It just means that in cases where it makes sense (and those cases do exist), I'm going to use the tools in 2005. I couldn't care less if others would prefer not to. It's their loss, not mine. There are a ton of people who don't like change. Hell, I know people who still refuse to use ANSI joins, and wouldn't know an INFORMATION_SCHEMA view if it slapped them upside the head. As long as I don't have to mess with their code, it doesn't bother me a bit, but they'll be the ones that suffer down the road.

  • David, what did you prove?

    My statement was exactly this:

    Probably 2k5 is more comfortable for procedural language programmers, but for those who understand the relational model and are used to operating with data sets, it's useless.

    You stated that OVER() is such an advantage in SQL 2005 that you cannot return to SQL 2000 even for a 5K raise.

    Did you prove ANY advantage of OVER() except the one I stated: that it is more comfortable for procedural language programmers?

    _____________
    Code for TallyGenerator

  • David McFarland (11/5/2007)


    You don't believe that covering indexes are a representation of what a real production system would have.

    I said no such thing. I said they're not a representation of what performance tuning measures would typically be taken to accommodate reports.

    David McFarland (10/22/2007)


    I simply stated that people aren't going to put such a covering index on a query for a single report. That's a fact. We call that "gaming the system" where I come from.

    Proven a liar.

    At least.

    _____________
    Code for TallyGenerator

  • The reason I asked is that I produced two identical actual execution plans, one of which was less code, and which also proved Sergiy wrong, and you stated that the INNER JOIN method was the preferable one. On what basis?

    Because, in the presence of the index, which is also a part of T-SQL and optimization, the Inner Join is more than twice as fast as the Over. Here's Lowell's test report that included the index and the proper "subtrahend" after my terrible copy and paste error... the Inner Join method is 2-3 times faster than the OVER method when proper indexes (again, all part of knowing T-SQL) are deployed and available...

    SQL2005/QA
    Creating test table...
    00:00:09:030 Duration (hh:mi:ss:mmm)
    ======================================
    INNER JOIN method...
    00:00:03:533 Duration (hh:mi:ss:mmm)
    ======================================
    OverPartition method...
    00:00:08:907 Duration (hh:mi:ss:mmm)
    ======================================

    SQL2005/QA Second Pass
    Creating test table...
    00:00:07:953 Duration (hh:mi:ss:mmm)
    ======================================
    INNER JOIN method...
    00:00:02:437 Duration (hh:mi:ss:mmm)
    ======================================
    OverPartition method...
    00:00:07:670 Duration (hh:mi:ss:mmm)
    ======================================

    SQL2005/SSMS
    Creating test table...
    00:00:48:500 Duration (hh:mi:ss:mmm)
    ======================================
    INNER JOIN method...
    00:00:03:360 Duration (hh:mi:ss:mmm)
    ======================================
    OverPartition method...
    00:00:08:797 Duration (hh:mi:ss:mmm)
    ======================================

    SQL2005/SSMS repeated, because the temp table took so long:
    Creating Temp Table
    00:00:09:017 Duration (hh:mi:ss:mmm)
    ======================================
    INNER JOIN method...
    00:00:02:437 Duration (hh:mi:ss:mmm)
    ======================================
    OverPartition method...
    00:00:07:593 Duration (hh:mi:ss:mmm)
    ======================================

    I would be impressed and convinced if you (or anyone) could write the OVER method so that it beats the indexed version of the Inner Join method. So far, all you've proven is that if you pull the wings off T-SQL, the new methods will allow people who don't really know T-SQL to write code.
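
    For concreteness: the "proper index" referred to above is a covering index over the grouping and aggregated columns. The exact definition isn't reproduced on this page; assuming the illustrative names from the earlier sketch, it would be something like:

    -- Hypothetical covering index: the derived table's GROUP BY can be
    -- satisfied entirely from this index, with no base-table lookups.
    CREATE NONCLUSTERED INDEX IX_TestTable_SomeID_SomeValue
        ON dbo.TestTable (SomeID, SomeValue)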

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sergiy (11/5/2007)


    David, what did you prove?

    That you were wrong. No big deal, as I've done it before, and I'll do it again (based on your "maths" skills, probably sooner rather than later).

    You stated that OVER() is such an advantage in SQL 2005 that you cannot return to SQL 2000 even for a 5K raise.

    No, I stated no such thing, but I won't call you a liar (at least not in this case). I stated that you'll take Over() away from me over my dead body. Regarding the 5K raise, that was returning to SQL 2000 from 2005 in general, not because of Over. Here, let me refresh your short-term memory.

    David McFarland


    I wouldn't go back to 2000 for a $5000 raise.

    That's all by itself as a single-sentence paragraph, in a post referring to "tons" of nice things about 2005, one of which is indeed Over().

    Did you prove ANY advantage of OVER() except the one I stated: that it is more comfortable for procedural language programmers?

    Of course I did. I discussed the advantages several times. In fact, I don't see why it would be more comfortable for procedural language programmers, as it's a SQL (and set-based) construct. For a procedural-type person, a series of temp table creations would be far more logical, as would a cursor with accumulator variables. There is nothing hard about subqueries, as I have entry level SQL folks here who can nest them until the cows come home. Sometimes, there are better options. In those cases, I use the better option. Pretty simple, really.

  • Sergiy (11/5/2007)


    Proven a liar.

    At least.

    That's pretty funny. Not even close to true, but funny.

    Um, in case you're not aware, the two statements say basically the same thing. If by "liar", you mean "correct", then sure, I can work with that.

  • Jeff Moden (11/5/2007)


    Because, in the presence of the index, which is also a part of T-SQL and optimization, the Inner Join is more than twice as fast as the Over. Here's Lowell's test report that included the index and the proper "subtrahend" after my terrible copy and paste error... the Inner Join method is 2-3 times faster than the OVER method when proper indexes (again, all part of knowing T-SQL) are deployed and available...

    Do you use covering indexes to optimize all of your reports? I don't, as I don't have the terabytes of extra storage I'd need to do so. If you do, then sure, tweak to your heart's desire. You'll be creating new ones every time the requirements change a bit, but if that's acceptable to your business, then I'd consider doing it as well.

    I would be impressed and convinced if you (or anyone) could write the OVER method so that it beats the indexed version of the Inner Join method.

    Not really the goal. The goal is less code, with similar (or identical) performance. Without the covering index, it gives that.

    So far, all you've proven is that if you pull the wings off T-SQL, the new methods will allow people who don't really know T-SQL to write code.

    I didn't pull any wings off of T-SQL. That's a part of T-SQL now (note that I didn't say part of SQL Server, but specifically T-SQL. It's in the actual language), and since it's also a part of ANSI SQL 1999, likely isn't going anywhere anytime soon. Using it doesn't make one a good (or bad) SQL developer. Using it (or refusing to use it) in situations where you should do otherwise, does.

  • David McFarland (11/6/2007)


    Do you use covering indexes to optimize all of your reports? I don't, as I don't have the terabytes of extra storage I'd need to do so. If you do, then sure, tweak to your heart's desire. You'll be creating new ones every time the requirements change a bit, but if that's acceptable to your business, then I'd consider doing it as well.

    Yup... and what's really great about it is that the managers that used to wait hours for reports can now get them in just a couple of minutes.

    So far as the "terabytes of extra storage" you'd need... think outside the box... that's what expendable working tables are for... build 'em, index 'em, use 'em, drop 'em. And, disk space is relatively cheap compared to ticking off the big dogs with late reports, especially mission critical reports that are really nothing more than files that get uploaded to places like the PUC, FCC, Federal and State Tax agencies, and ACHs. There's only a certain amount of time at month end to get those things done... if you're late, the company gets fined, period... and it's real tough to get another job with the boss' teeth marks on your butt. 😛 The covering indexes and the inner join methods we've tested are well worth the extra effort to avoid getting fined. In the past, several 24 hour jobs would fail just because of interference... how many times do you think you can do a rerun of something like that when the deadline is only 36 hours from the git-go? Using those seemingly archaic Inner Join methods and covering indexes has allowed me to meet all of those incredible "reporting" deadlines and save the company relatively large amounts of money.
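
    A sketch of that "build 'em, index 'em, use 'em, drop 'em" pattern, with all names (table, columns, filter) purely illustrative:

    -- Expendable working table: materialize the reporting period's data once,
    -- give it the covering index for the report run, then throw it away.
    SELECT SomeID, SomeValue
      INTO #Work
      FROM dbo.TestTable
     WHERE SomeDate >= '20071001'   -- hypothetical period filter

    CREATE INDEX IX_Work ON #Work (SomeID, SomeValue)

    -- ... run the month-end report queries against #Work ...

    DROP TABLE #Work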

    Not really the goal. The goal is less code, with similar (or identical) performance. Without the covering index, it gives that.

    You are correct... getting every ounce of performance may not be your goal... but it was and is my goal and it was and is the purpose of the testing we did.

    I didn't pull any wings off of T-SQL. That's a part of T-SQL now (note that I didn't say part of SQL Server, but specifically T-SQL. It's in the actual language), and since it's also a part of ANSI SQL 1999, likely isn't going anywhere anytime soon. Using it doesn't make one a good (or bad) SQL developer. Using it (or refusing to use it) in situations where you should do otherwise, does.

    And that's the whole point of all of this, David... the new functionality is there to make it easier on people who can't think of or don't want to type the extra handful of characters it takes to render the inner join/index solution. And, it allows people who don't really know how to use all the features of T-SQL to do their jobs at the expense of some performance without taking the full performance hit when RBAR/ISAM programmers simply give up and use a Cursor or While Loop or (gasp) things like DTS. The really bad thing about OVER, in this particular test, is that it can't be tuned for additional performance (or at least no one has yet demonstrated how it can be)... what you see is what you get and it will never beat the Inner Join method in the presence of the proper index for sheer performance for things like what we tested.

    If you want nice short code, go right ahead and use OVER... if you need 2 to 3 times the performance for things like this, use the slightly longer Inner Join method with an index.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/6/2007)


    Yup... and what's really great about it is that the managers that used to wait hours for reports can now get them in just a couple of minutes.

    We don't have a single report that takes longer than 30 seconds to retrieve (typically they're in the 0-2 second range), thanks to some careful marting. We also don't have a single report that takes advantage of a covering index, although we use them extensively to speed up the marting processes themselves. Under this environment, Over() is working like a champ.

    So far as the "terabytes of extra storage" you'd need... think outside the box... that's what expendable working tables are for... build 'em, index 'em, use 'em, drop 'em.

    As you've probably noticed in another thread from yesterday, I'm quite familiar with these. They'd just slow us down in our particular case. Again, I don't have performance issues with reports.

    And, disk space is relatively cheap compared to ticking off the big dogs with late reports, especially mission critical reports that are really nothing more than files that get uploaded to places like the PUC, FCC, Federal and State Tax agencies, and ACHs. There's only a certain amount of time at month end to get those things done... if you're late, the company gets fined, period... and it's real tough to get another job with the boss' teeth marks on your butt. 😛 The covering indexes and the inner join methods we've tested are well worth the extra effort to avoid getting fined. In the past, several 24 hour jobs would fail just because of interference... how many times do you think you can do a rerun of something like that when the deadline is only 36 hours from the git-go? Using those seemingly archaic Inner Join methods and covering indexes has allowed me to meet all of those incredible "reporting" deadlines and save the company relatively large amounts of money.

    As I noted, if they are both necessary and worthwhile in your operation, I'd also recommend you use them. In our particular environment, covering indexes are not needed for direct report generation (although again, as noted, they are used to create the tables behind report generation). Most of our data comes from two tables, a header and detail table, the latter of which can get close to a billion rows a day on busy days (typically 300 million or so). Once I've gotten the data out of those into various marted tables, everything down the line is a piece of cake.

    You are correct... getting every ounce of performance may not be your goal... but it was and is my goal and it was and is the purpose of the testing we did.

    But it wasn't the purpose behind sometimes using Over() in our environment. Less, more easily (and quickly) maintainable code, was. Had it had a noticeably negative impact on performance, it would have mattered. It didn't (as noted, it had zero impact in our environment), so it doesn't.

    And that's the whole point of all of this, David... the new functionality is there to make it easier on people who can't think of or don't want to type the extra handful of characters it takes to render the inner join/index solution. And, it allows people who don't really know how to use all the features of T-SQL to do their jobs at the expense of some performance without taking the full performance hit when RBAR/ISAM programmers simply give up and use a Cursor or While Loop or (gasp) things like DTS. The really bad thing about OVER, in this particular test, is that it can't be tuned for additional performance (or at least no one has yet demonstrated how it can be)... what you see is what you get and it will never beat the Inner Join method in the presence of the proper index for sheer performance for things like what we tested.

    I don't find joins and subqueries particularly difficult concepts to teach people, so I don't agree. I also wouldn't recommend that anyone who doesn't understand ol' Ted's philosophies use Over() any more than I'd tell them to use joins, or write any T-SQL for that matter. The CLR is a totally different beast, as it does allow procedural programmers to write code that they are familiar with in our world. I haven't personally found a single use for it, although I am about to do some testing of various parsing methods, including that. If it happens to work faster (which I'm doubting in our case), I'll use it. If not, I won't. I have no clue why you brought up RBAR, by the way, as Over() isn't even closely related to that.

    As an aside, the same arguments you have made could be made in defense of assembly language over C, for example. There's nothing I can do in C that can't be done in assembly language. It's a hell of a lot faster to get things done in C, while optimized ASM apps are going to be faster. The reality is that C is in 99% of the cases far faster than it needs to be, so you don't lose anything by going that route, yet you gain in many areas. In that 1% of the cases, hell yeah, use ASM.

    If you want nice short code, go right ahead and use OVER... if you need 2 to 3 times the performance for things like this, use the slightly longer Inner Join method with an index.

    Again, in our environment, there is zero gain by using the join method. Also, I think you meant to say "covering index", not "index", correct? Those are very different beasts in this conversation.
