T-sql question

  • The follow is the data I have in a single table:

    createdateIDcolumnCustnumber

    12/1/20052100

    5/1/2007 1100

    5/29/20072100

    5/30/20073100

    6/1/120074100

    6/28/20075100

    2/1/2006 2200

    6/1/200071200

    6/5/2007 2200

    6/7/2007 3200

    2/5/2005 4300

    6/10/20081300

    For the above data, if the first row's createdate is less than second row's createdate for the same custnumber 100,

    then the IDcolumn of the first row should be updated to 1, and accordingly if the second row's createdate is less than

    3rd row, then second row's IDcolumn should be 2 etc., the following is the output I am expecting:

    createdateIDcolumnCustnumber

    12/1/20051100

    5/1/2007 2100

    5/29/20073100

    5/30/20074100

    6/1/120075100

    6/28/20076100

    2/1/2006 1200

    6/1/200072200

    6/5/2007 3200

    6/7/2007 4200

    2/5/2005 1300

    6/10/20082300

    Any ideas are greatly appreciated, Thanks!!

  • I happened to have one: :w00t:

    declare @temp table

    (

    Createdate varchar(10),

    IDColumn smallint,

    CustNumber int

    )

    -- INSERT DATA for testing

    insert into @temp values('12/1/2005', 2, 100)

    insert into @temp values('5/1/2007', 1, 100)

    insert into @temp values('5/29/2007', 2, 100)

    insert into @temp values('5/30/2007', 3, 100)

    insert into @temp values('6/1/12007', 4, 100)

    insert into @temp values('6/28/2007', 5, 100)

    insert into @temp values('2/1/2006', 2, 200)

    insert into @temp values('6/1/20007', 1, 200)

    insert into @temp values('6/5/2007', 2, 200)

    insert into @temp values('6/7/2007', 3, 200)

    insert into @temp values('2/5/2005', 4, 300)

    insert into @temp values('6/10/2008', 1, 300)

    -- CREATE A TEMP TABLE holding the sorted data

    create table #temp

    (

    ID int identity(1, 1),

    Createdate varchar(10),

    IDColumn smallint,

    CustNumber int

    )

    INSERT INTO #temp(CREATEDATE, IDColumn, CustNumber)

    SELECT *

    FROM @temp

    ORDER BY CustNumber, CreateDate, IDColumn

    UPDATE A -- UPDATE every first record of CustNumber

    SET IDColumn = 1

    FROM #temp A

    INNER JOIN

    (SELECT CustNumber, MIN(CreateDate) AS CreateDAte

    FROM @temp

    GROUP BY CustNumber

    ) B

    ON A.CustNumber = B.CustNumber AND A.CreateDate = B.CreateDate

    declare @row int

    SELECT @row = MAX(X.CT)

    FROM (SELECT CustNumber, COUNT(IDColumn) AS CT

    FROM @temp

    GROUP BY CustNumber) X

    WHILE @row > 0

    BEGIN

    UPDATE A

    SET A.IDColumn = B.IDColumn + 1

    FROM #temp A

    INNER JOIN #temp B

    ON A.ID = B.ID + 1 AND A.CustNumber = B.CustNumber

    SET @row = @row - 1

    END

    UPDATE A

    SET A.IDColumn = B.IDColumn

    FROM @temp A

    INNER JOIN #temp B

    ON A.CustNumber = B.CustNumber AND A.CreateDate = B.CreateDate

    ---- CHECKING

    select * from @temp

    drop table #temp

    go

    --******** RESULT*********--

    Createdate IDColumn CustNumber

    ---------- -------- -----------

    12/1/2005 1 100

    5/1/2007 2 100

    5/29/2007 3 100

    5/30/2007 4 100

    6/1/12007 5 100

    6/28/2007 6 100

    2/1/2006 1 200

    6/1/20007 2 200

    6/5/2007 3 200

    6/7/2007 4 200

    2/5/2005 1 300

    6/10/2008 2 300

    (12 row(s) affected)

  • Try this:

    UPDATE#temp

    setIDColumn

    = (Select count(*)

    from #temp as CustPrior

    whereCustPrior.CustNumber = #temp.CustNumber

    and (CustPrior.Createdate < #temp.Createdate

    or ( CustPrior.Createdate = #temp.Createdate

    andCustPrior.IDColumn<= #temp.IDColumn

    )))

    SQL = Scarcely Qualifies as a Language

  • Good code... but be careful when you get a lot of rows or the groups of rows become large. Take a look at the following article...

    http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Nice article, Jeff.:D

  • Aye. Thanks for the feedback... I sometimes wonder if posts like that do any good. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff is quite correct on the performance affect of a triangular join so I decided to compare using the RANK window function available under SQL Server 2005 and 2008.

    For 100 Customers and 1,000 rows for each customer, the resoures used are:

    [font="System"]

    TableScanslogical reads

    TriangleRankTriangleRank

    #temp1000012 805064 574

    WorkTable1 2203266405932

    MillisecondsCPU msElapsed

    Triangle35,95136,734

    rank 1,462 1,508

    [/font]

    In short, using RANK is 36 times better than using a Triangle Join.

    set statistics io off

    set statistics time off

    go

    IF object_id('tempdb..#temp') is not null drop table #temp

    create table #temp

    (CustNumberintegernot null

    ,CreateDatedatetimenot null

    ,IDColumnsmallintnot null

    , constraint temp_p primary key (CustNumber,Createdate))

    INSERT into #temp

    (CustNumber,CreateDate, IDColumn )

    selectCustomer.N

    ,CAST( '2000-01-01' as datetime) + DayAdd.N

    ,0 as IDColumn

    frommaster.dbo.TALLY as Customer

    cross join

    master.dbo.TALLY as DayAdd

    whereCustomer.N between 1 and 100

    andDayAdd.N between 1 and 1000

    update statistics #temp

    go

    set statistics io on

    set statistics time on

    -- SQL Server 2000

    UPDATE #temp

    set IDColumn

    = (Select count(*)

    from #temp as CustPrior

    where CustPrior.CustNumber = #temp.CustNumber

    and ( CustPrior.Createdate < #temp.Createdate

    or ( CustPrior.Createdate = #temp.Createdate

    and CustPrior.IDColumn <= #temp.IDColumn

    )))

    -- SQL Server 2005 or 2008

    UPDATE#temp

    SETIDColumn = New.NewRank

    from#tempas Old

    JOIN(select Ranker.CustNumber

    ,Ranker.CreateDate

    ,RANK ( ) OVER (PARTITION BY Ranker.CustNumber order by Ranker.CreateDate) as NewRank

    from#temp as Ranker

    ) as New

    onNew.CustNumber= Old.CustNumber

    andNew.CreateDate= Old.CreateDate

    andNew.NewRank<> Old.IDColumn

    SQL = Scarcely Qualifies as a Language

  • Just as a follow-up, here are the resource usages when performing a SELECT instead of an UPDATE

    [font="System"]

    ResourceScanlogical readsCPU MSElapsed MS

    Triangle Join:100001 455,648 23,103 24,798

    Rank 1 287 100 899

    [/font]

    set statistics io on

    set statistics time on

    -- Triangle Join

    SelectCustomer.CustNumber

    ,Customer.Createdate

    ,count(*) as NewRank

    from#temp as Customer

    join#temp as CustomerPrior

    on CustomerPrior.CustNumber = Customer.CustNumber

    and CustomerPrior.Createdate <= Customer.Createdate

    group by Customer.CustNumber

    ,Customer.Createdate

    -- Rank Window for SQL Server 2005 or 2008

    selectCustomer.CustNumber

    ,Customer.CreateDate

    ,RANK ( ) OVER (PARTITION BY Customer.CustNumber order by Customer.CreateDate) as NewRank

    from#temp as Customer

    SQL = Scarcely Qualifies as a Language

  • If you change Carl's code to 2000 per customer, you see where Triangular Joins really start to fall apart... time goes up by a factor of 4 for the Triangular join... time for the RANK stays just about the same. And that's the way it is with Triangular joins... double the number of rows, the amount of time goes up by a factor of about 4.

    If you need to do ranking type functions in SQL Server 2000, please see the following article. The neat thing is that the code in the article is nearly as fast as that of the Rank function and is directly portable to SQL Server 2005...

    http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply