Help writing a SQL

  • RBarryYoung (9/22/2009)


    Anyhow, here it is, note it probably still needs that Clustered Key thing for optimal performance

    Are you SUUUUURE? I wouldn't want to play any favorites!! 😉

    Thanks for posting this Barry. I'm gonna take a look at this in depth tomorrow.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Seth: any updates on this?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Sorry, I just got back from a cruise to the Bahamas, have been completely unplugged from the world for 4 days now. May take today to 'recover' from my 'vacation', but tomorrow I'll definitely have some new updates.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Garadin (9/27/2009)


    Sorry, I just got back from a cruise to the Bahamas, have been completely unplugged from the world for 4 days now. May take today to 'recover' from my 'vacation', but tomorrow I'll definitely have some new updates.

    Hrrm. I can't honestly say that you're eliciting a lot of sympathy from me here ...

    😛

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I concede that my guts can be wrong 😀

    But just to add to the debate , this goes some way to resolving quirky update ordering issues.

    I havent seen anything quite like this on the web.

    Seems ok on 2005, further testing is appreciated.

    Drop table mytab

    go

    create table mytab

    (

    Idx char(1),

    Col1 integer,

    Col2 integer

    )

    go

    Create clustered index idx1 on mytab(idx)

    go

    drop view vw1

    go

    create view vw1

    as

    Select top 99999999999 * from mytab

    order by col1

    go

    delete from mytab

    go

    insert into mytab(Idx,Col1) values('Z',1)

    insert into mytab(Idx,Col1) values('Y',2)

    insert into mytab(Idx,Col1) values('X',4)

    insert into mytab(Idx,Col1) values('W',3)

    insert into mytab(Idx,Col1) values('V',6)

    go

    declare @roll integer

    Select @roll =0

    update vw1

    set @roll = col1+@Roll,

    col2 =@roll

    go

    select * from vw1

    select * from mytab



    Clear Sky SQL
    My Blog[/url]

  • Or even....

    declare @roll integer

    Select @roll =0

    ;with cte(Idx , Col1 , Col2)

    as

    (

    Select top 9999999 Idx , Col1 , Col2

    from mytab order by col1

    )

    update cte

    set @roll = col1+@Roll,

    col2 =@roll

    go

    select * From mytab

    order by col1



    Clear Sky SQL
    My Blog[/url]

  • Or how about this if you cant add a column to the original table.

    drop table #mytab2

    go

    create table #mytab2

    (

    Idx char(1),

    roll integer)

    go

    insert into #mytab2 (idx,roll)

    select idx,NULL

    from mytab

    go

    declare @roll integer

    Select @roll =0

    ;with cte(Idx , Col1 , Col2 , idxupd)

    as

    (

    Select top 9999999 mytab.Idx , Col1 , roll,#mytab2.idx

    from mytab join #mytab2

    on mytab.idx = #mytab2.idx

    order by col1

    )

    update cte

    set @roll = col1+@Roll,

    col2 =@roll,

    Idxupd = idx

    output inserted.Idxupd,inserted.col2



    Clear Sky SQL
    My Blog[/url]

  • Dave Ballantyne (9/28/2009)


    this goes some way to resolving quirky update ordering issues.

    Heh... what ordering issues?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/28/2009)

    Heh... what ordering issues?

    Well , just that on your base table , you dont ( disclaimer More testing needed) need a clustered index in the order you want to update the data.



    Clear Sky SQL
    My Blog[/url]

  • Dave Ballantyne (9/28/2009)


    Jeff Moden (9/28/2009)

    Heh... what ordering issues?

    Well , just that on your base table , you dont ( disclaimer More testing needed) need a clustered index in the order you want to update the data.

    You have to trust me on this... if you don't have a clustered index in the order you want to update the data, it will fail in a most unpredictable manner. Further, if the clustered index engages during the update (and it frequently does despite your best attempts), it will still update in clustered index order.

    If you don't have the correct clustered index on the original table, the only safe way to use the quirky update is to copy the data to a Temp Table and put the correct one on that.

    I haven't quite finished the article yet (trying to trim it down from 26 MS Word pages), but I have the proof of the failures I'm talking about in the form of demonstrable code in the article.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Dave Ballantyne (9/28/2009)


    Jeff Moden (9/28/2009)

    Heh... what ordering issues?

    Well , just that on your base table , you dont ( disclaimer More testing needed) need a clustered index in the order you want to update the data.

    Also, if you decide not to take Jeff's word for it and want to test this yourself, make sure you don't insert the data in the exact order it needs to be in.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Ok, so I had some time back at my machine to test your solution Barry. It comes very close to tying the quirky update, but doesn't *quite* manage it. (Once I slightly adjusted your script to match this scenario) I ran the time trials without the execution plan then re-ran to generate it. It is attached. One interesting thing I found is that yours improves slightly when re-running it, where as the quirky update stays approximately the same on repeated runs. It also returns the exact same number of rows. Very close second for those who do not "trust" the 3 part update. Thanks for generating this Barry, I'm going to see if I can get that varbinary code I created the other day working to see if it comes close to this.

    Barry's Method

    XML Built . Time Elapsed: 16 seconds

    (64202 row(s) affected)

    Select to Table Done. Total Time Elapsed: 18 seconds

    Final Row Count, Barry Pseudo-XML Method:64202

    Modified Code:

    DECLARE @timerdatetime,

    @FRCint

    SET @timer = GETDATE()

    --create clustered index ix_acct_ndt on MR (acct, ndt)

    declare @xml varchar(MAX), @acct int, @nextdate DATETIME, @datstr as varchar(80)

    select @xml = '', @acct = -1, @nextdate = '1900-01-01', @datstr = ''

    Declare @i int, @z int, @rowstate int

    Declare @S16 Varchar(MAX), @S14 Varchar(MAX), @S12 Varchar(MAX), @s10 Varchar(MAX)

    Select @i=0, @S16='', @S14='', @S12='', @s10=''

    select

    @z = ROW_NUMBER() OVER(ORDER BY acct,ndt),

    @rowstate = CASE WHEN @acct <> acct THEN 2 WHEN ndt >= @nextdate THEN 1 ELSE 0 END,

    @nextdate = CASE @rowstate WHEN 0 THEN @nextdate ELSE DATEADD(day, 150, ndt) END,

    @i = CASE @rowstate WHEN 0 THEN @i ELSE @i+1 END,

    @datstr = CASE @rowstate WHEN 0 THEN '' ELSE '<r a="' + convert(varchar(20), acct) + '" t="' + convert(varchar(10), ndt, 121) + '"/>' END,

    @S16 = CASE WHEN @rowstate=0 THEN @S16 WHEN @i%65536=0 THEN @S16+@S14+@S12+@S10+@xml+@datstr Else @S16 END,

    @S14 = CASE WHEN @rowstate=0 THEN @S14 WHEN @i%65536=0 THEN '' WHEN @i%16384=0 THEN @S14+@S12+@S10+@xml+@datstr Else @S14 END,

    @S12 = CASE WHEN @rowstate=0 THEN @S12 WHEN @i%16384=0 Then '' WHEN @i%4096=0 THEN @S12+@S10+@xml+@datstr Else @S12 END,

    @s10 = CASE WHEN @rowstate=0 THEN @s10 WHEN @i%4096=0 Then '' WHEN @i%1024=0 THEN @s10+@xml+@datstr Else @s10 END,

    @xml = CASE WHEN @rowstate=0 THEN @xml WHEN @i%1024=0 THEN '' ELSE @xml+@datstr END,

    @acct = acct

    from MR

    ORDER BY acct,ndt

    SELECT @xml = @S16+@S14+@S12+@S10+@xml

    Declare @x as XML

    select @x = '<d>'+@xml+'</d>'

    PRINT 'XML Built . Time Elapsed: ' + CAST(DATEDIFF(s,@Timer,GETDATE()) AS varchar(10)) + ' seconds'

    SELECT acct, ndt38

    INTO #3

    FROM (

    SELECT

    T.c.value('@a[1]', 'int') AS [acct],

    T.c.value('@t[1]', 'varchar(24)') AS [ndt38]

    FROM @x.nodes('/d/r') AS T(c)

    ) T

    --where RIGHT(ndt38,1)='2'

    PRINT 'Select to Table Done. Total Time Elapsed: ' + CAST(DATEDIFF(s,@Timer,GETDATE()) AS varchar(10)) + ' seconds'

    SET @FRC = (SELECT COUNT(*) FROM #3)

    PRINT 'Final Row Count, Barry Pseudo-XML Method:' + CAST(@FRC AS varchar(10))

    [Edit] Issue posting the .sqlplan, will post when able.

    [Edit2] Appears it is an issue with Chrome.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • I realized after the fact that it might have been a little unclear as to why I didn't believe Barry's solution didn't *beat* the quirky update, as the quirky takes 19 seconds when it has to create the temp table vs. Barry's 18.

    The reason for this is that there are only 2 reasons why you need to make the temp table in the first place.

    1. The big one. The clustered index. It's not realistic to assume that the clustered index on your table will always be the one needed for the quirky update.

    2. The inability to add any fields to the base table. This is (in my opinion) a much smaller reason and one that won't come into play in most situations. The field required by the quirky update is a bit field which would either be a very minimal size increase or (depending on the other fields in the table) might not actually increase the size of the table at all.

    The first reason seemed moot, as Barry's solution used the same clustered index to achieve those performance numbers. The second is minor, but it is a valid reason in some circumstances and his method would win by a few seconds in the situation if the clustered index was correct on the table but you were not allowed to add the field. When I said it almost beat it, I was comparing it to the quirky method sans temp table, where it was only 2-3 seconds behind.

    Now, I decided to take my own advice and completely re-create test data in a new table and never index it. The results were... surprising. Note that because the data is now completely unindexed, a temp table is *required* for the quirky update. I'm honestly shocked at how fast Barry's method runs without any index available.

    Quirky Method (w/Temp Table)

    (2000000 row(s) affected)

    Temp Table Insert Done. Time Elapsed: 12 seconds

    (2000000 row(s) affected)

    Update Done. Time Elapsed: 40 seconds

    (64425 row(s) affected)

    Select to Table Done. Total Time Elapsed: 42 seconds

    Final Row Count, Quirky Method:64425

    Barry Method: No Clustered Index.

    (1 row(s) affected)

    XML Built . Time Elapsed: 20 seconds

    (64425 row(s) affected)

    (1 row(s) affected)

    Select to Table Done. Total Time Elapsed: 23 seconds

    (1 row(s) affected)

    Final Row Count, Barry Pseudo-XML Method:64425

    Original CTE Method: No Clustered Index.

    I re-ran this as well but cut it off at 2m 30s as it still wasn't a contender.

    Barry's method appears to win hands down when the data is not ordered to begin with. What is confusing me (and why I can't figure out if I did something wrong or not) is that the quirky update itself seems to take significantly longer even after I've inserted the data into the temp table. (30 seconds for 2M rows? It seems like I did something wrong, I'm just not seeing it.)

    Execution Plans and code attached. If anyone spots a mistake, please let me know.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Garadin (9/28/2009)


    Ok, so I had some time back at my machine to test your solution Barry. It comes very close to tying the quirky update, but doesn't *quite* manage it. (Once I slightly adjusted your script to match this scenario) I ran the time trials without the execution plan then re-ran to generate it. It is attached. One interesting thing I found is that yours improves slightly when re-running it, where as the quirky update stays approximately the same on repeated runs. It also returns the exact same number of rows. Very close second for those who do not "trust" the 3 part update. Thanks for generating this Barry, I'm going to see if I can get that varbinary code I created the other day working to see if it comes close to this...

    I am actually quite surprised that it did as well as it did, and I have to give credit to Richard Fryar for his "string to XML" idea which seems to have fixed one of the major problems that my previous varbinary version had.

    However, even though it may have worked well in this case, I still do not believe that it will scale as well as Jeff's solution. Even with all of the tricks that I employ in this approach, it is still basically filling up memory with a gigantic string, which works great, Until it runs out of memory. At that point it's performance will very quickly get 10x or even 100x worse, Not Good.

    That's why I originally went from a Varchar(MAX)/NVarchar(MAX) approach to a Varbinary(MAX) one, it's probably the most efficient way to pack together are boatload of concatenated data. Unfortunately, the part of that that I though would be super-fast: extracting out the arrayed data using SUBSTRING and fixed-size matrix offset calculations, turned out to be terribly slow (I still don't know for sure why, but apparently big MAX-type strings cannot just be accessed with simple direct-memory offsets, duh (I should have realized that)).

    Anyway, apparently, putting it into an XML-structured variable fixes that (so far) as the extracts times have been really good.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Garadin (9/28/2009)

    Also, if you decide not to take Jeff's word for it and want to test this yourself, make sure you don't insert the data in the exact order it needs to be in.

    In my little test scripts i did post the data in a 'random' order and then created the clustered index so it was in a different order. 🙂

    Anyway Jeff , im sure you've been there done it and got the T-Shirt 😀 and look forward to your completed document , but , i really am struggling to get incorrect results. Ive scaled up by test to be on adventureworks and no matter what order or filtering i use then my Cte-Quirky update returns the same as my 'control' cursor. Must get on with some 'real' work now 😉

    Drop table #Balance

    go

    drop table #CurBalance

    go

    drop index Sales.SalesOrderHeader.idxOrderDate

    go

    create index idxOrderDate on Sales.SalesOrderHeader(Orderdate) include(SalesOrderId,SalesPersonId,subtotal,taxamt,freight)

    go

    create index idxSalesPerson on Sales.SalesOrderHeader(SalesPersonId) include(SalesOrderId,OrderDate,subtotal,taxamt,freight)

    go

    Create Table #Balance(

    SalesOrderId integer,

    RollingBalance money

    )

    Create Table #CurBalance(

    SalesOrderId integer,

    RollingBalance money

    )

    insert into #Balance(SalesOrderId,RollingBalance)

    select SalesOrderID,NULL

    from Sales.SalesOrderHeader SOH

    join Sales.SalesPerson SP

    on SP.SalesPersonID = SOH.SalesPersonID

    join Sales.SalesTerritory ST

    on St.TerritoryID = SP.TerritoryID

    where OrderDate between '01jan03' and '01may03'

    -- and SOH.SalesPersonID in(276 ,277)

    -- CountryRegionCode = 'CA'

    order by OrderDate, SOH.SalesPersonID

    --order by SOH.SalesPersonID ,OrderDate

    --Order by st.TerritoryID,OrderDate

    --Order by st.CountryRegionCode,OrderDate

    Declare @OrderYear integer,

    @OrderMonth integer,

    @SalesPersonId integer,

    @TerritoryId integer,

    @CountryRegionCode char(2),

    @RollingBalance money

    Select @OrderYear = 0

    Select @OrderMonth = 0

    Select @RollingBalance = 0

    Select @SalesPersonId =0

    ;with cteValue(SalesOrderId,OrderDate,SalesPersonId, TotalDue ,RollingBalance,TerritoryID , CountryRegionCode)

    as(

    select top 99999999999 SOH.SalesOrderID,OrderDate,SOH.SalesPersonID,TotalDue,RollingBalance, SP.TerritoryID,CountryRegionCode

    from Sales.SalesOrderHeader SOH

    join Sales.SalesPerson SP

    on SP.SalesPersonID = SOH.SalesPersonID

    join Sales.SalesTerritory ST

    on St.TerritoryID = SP.TerritoryID

    join #Balance

    on #Balance.SalesOrderId = SOH.SalesOrderID

    where OrderDate between '01jan03' and '01may03'

    -- and SOH.SalesPersonID in(276 ,277)

    -- CountryRegionCode ='CA'

    --order by OrderDate, SOH.SalesPersonID

    order by SOH.SalesPersonID ,OrderDate

    --Order by st.TerritoryID,OrderDate

    -- Order by st.CountryRegionCode,OrderDate

    )

    update cteValue

    set @RollingBalance = case when @SalesPersonId <> cteValue.SalesPersonId or

    @OrderMonth <> DATEPART(mm,OrderDate) or

    @OrderYear <> DATEPART(yy,OrderDate) or

    @TerritoryId <> TerritoryId

    then TotalDue

    else @RollingBalance +TotalDue end,

    RollingBalance = @RollingBalance,

    @OrderYear = DATEPART(yy,OrderDate),

    @OrderMonth = DATEPART(mm,OrderDate),

    @SalesPersonId = cteValue.SalesPersonId,

    @TerritoryId = cteValue.TerritoryID,

    @CountryRegionCode = cteValue.CountryRegionCode

    go

    Declare @OrderYear integer,

    @OrderMonth integer,

    @SalesPersonId integer,

    @TerritoryId integer,

    @CountryRegionCode char(2),

    @RollingBalance money,

    @PrevOrderYear integer,

    @PrevOrderMonth integer,

    @PrevSalesPersonId integer,

    @PrevTerritoryId integer,

    @PrevCountryRegionCode char(2),

    @SalesOrderId integer,

    @OrderDate datetime,

    @TotalDue money

    Select @OrderYear = 0,

    @OrderMonth = 0,

    @RollingBalance = 0,

    @SalesPersonId =0,

    @PrevOrderYear = 0,

    @PrevOrderMonth = 0,

    @PrevSalesPersonId =0,

    @PrevTerritoryId =0,

    @PrevCountryRegionCode='XX'

    declare balancecur cursor for

    select SOH.SalesOrderID,OrderDate,SOH.SalesPersonID,TotalDue, SP.TerritoryID,CountryRegionCode

    from Sales.SalesOrderHeader SOH

    join Sales.SalesPerson SP

    on SP.SalesPersonID = SOH.SalesPersonID

    join Sales.SalesTerritory ST

    on St.TerritoryID = SP.TerritoryID

    where OrderDate between '01jan03' and '01may03'

    --and SOH.SalesPersonID in(276 ,277)

    --CountryRegionCode ='CA'

    --order by OrderDate, SOH.SalesPersonID

    order by SOH.SalesPersonID ,OrderDate

    --Order by st.TerritoryID,OrderDate

    --Order by st.CountryRegionCode,OrderDate

    open balancecur

    while(0=0) begin

    fetch next from balancecur into @SalesOrderId,@OrderDate,@SalesPersonId,@TotalDue,@TerritoryId,@CountryRegionCode

    if(@@FETCH_STATUS<>0) break

    Select @OrderMonth = DATEPART(mm,@OrderDate),

    @OrderYear = DATEPART(yy,@OrderDate)

    if( @OrderMonth <> @PrevOrderMonth or

    @OrderYear <> @PrevOrderYear or

    @TerritoryId <> @PrevTerritoryId or

    @CountryRegionCode <> @PrevCountryRegionCode or

    @SalesPersonId <> @PrevSalesPersonId

    ) begin

    Select @RollingBalance = 0,

    @PrevOrderMonth = @OrderMonth,

    @PrevOrderYear = @OrderYear,

    @PrevSalesPersonId = @SalesPersonId,

    @PrevCountryRegionCode = @CountryRegionCode ,

    @PrevTerritoryId = @TerritoryId

    end

    Select @RollingBalance = @RollingBalance + @TotalDue

    insert into #CurBalance(SalesOrderId,RollingBalance)

    values(@SalesOrderId,@RollingBalance)

    end

    close balancecur

    deallocate balancecur

    go

    select COUNT(*) from #CurBalance

    go

    select COUNT(*) from #Balance

    go

    select COUNT(*) from #CurBalance join #Balance

    on #CurBalance.SalesOrderId = #Balance.SalesOrderId

    and #CurBalance.RollingBalance= #Balance.RollingBalance



    Clear Sky SQL
    My Blog[/url]

Viewing 15 posts - 46 through 60 (of 70 total)

You must be logged in to reply to this topic. Login to reply