Data Conversion question - To Cursor or not to Cursor, that is the question

  • I'm studiously avoiding the cursors in my data conversion project as much as possible, but I'm just stuck on this one.

    For some background, I have a table with two pieces of information I am concerned with, the ticketnumber and the date. (and 350 or so columns that don't matter here)

    I need to manipulate the ticket numbers into a new format which is going to be as follows:

    If the original ticket number is Non-Null and "Good"

    A. Last two digits of the year

    B. The original ticket number

    C. Nothing if A-B is a unique combination, or an incrementing letter 'A', 'B', 'C', 'D'

    There are a large number which have either a NULL or garbage ticket number, for these the new number will be

    A. Last two digits of the year

    B. 'XX' - Actual text to be determined but 'XX' works for now

    C. 5 digit number padded with 0's (00001, 00002, 00003....), sequentially numbered for each year

    Create table #TempTickets

    (

    TicketID INT Identity,

    TicketNum Varchar(8),

    TicketDate DateTime,

    TicketNumFixed VarChar(10) NULL

    )

    Insert into #TempTickets (TicketNum, TicketDate)

    Select 'BADNUM', '1/1/2008' UNION ALL -- '08XX00001'

    Select NULL,'1/1/2008' UNION ALL-- '08XX00002'

    Select NULL,'1/2/2007' UNION ALL-- '07XX00001'

    Select '1234','1/1/2008' UNION ALL-- '081234'

    Select '1234','1/1/2007' UNION ALL-- '071234A'

    Select '1234','1/2/2007' -- '071234B'

    Update #TempTickets set TicketNumFixed = [Insert magic here]

    Drop Table #TempTickets

    I'm all for avoiding the cursor, but I just don't see a way around using one on this example. I'm ok with adding extra columns to hold partial results to derive the final one, I'm even OK with using the cursor if I have to, because performance is not as much an issue as getting it done.

    I got this far experimenting on my own. I used an intermediate step to set the year and the number/filler. Then I tried to use a function call to try to find the maximum used so far and increment it, but that doesnt work from within an update. (I know, I didn't finish the formatting into letters or 0 padded 5 digit numbers, if I get good numbers that can happen tomorrow)

    Do I just need to break down and use a cursor to step through the table to fix this?

    Use TempDB

    Create table TempTickets

    (

    TicketID INT Identity,

    TicketNum Varchar(8),

    TicketDate DateTime,

    TicketNumWork VarChar(10) NULL,

    TicketNumFixed VarChar(10) NULL

    )

    Go

    Create Function dbo.fn_FixTicket (@TicketNum as varchar(10), @TicketNumWork as Varchar(10))

    Returns varchar(10) AS

    BEGIN

    Declare @TicketNumFixed as Varchar(10)

    Declare @MaxTicketNum as Varchar(10)

    Declare @TicketType as Int

    Declare @Count as Int

    Set @TicketType =

    (Case

    When (@TicketNum IS NULL) Then 1

    When (@TicketNum IN ('BADNUM', 'BADNUM2','BADNUM3')) Then 1

    ELSE 2

    END)

    Set @Count = (Select count(*) from TempTickets where TicketNumWork = @TicketNumWork)

    Set @TicketNumFixed =@TicketNumWork

    If @Count > 1

    BEGIN

    Set @MaxTicketNum = (Select Count(TicketNumFixed) from TempTickets where TicketNumWork = @TicketNumWork)+1

    Set @TicketNumFixed = @TicketNumWork + @MaxTicketNum

    END

    Return @TicketNumFixed

    END

    GO

    Insert into TempTickets (TicketNum, TicketDate)

    Select 'BADNUM', '1/1/2008' UNION ALL -- '08XX00001'

    Select NULL,'1/1/2008' UNION ALL-- '08XX00002'

    Select NULL,'1/2/2007' UNION ALL-- '07XX00001'

    Select '1234','1/1/2008' UNION ALL-- '081234'

    Select '1234','1/1/2007' UNION ALL-- '071234A'

    Select '1234','1/2/2007' -- '071234B'

    Update TempTickets Set TicketNumWork = Right(DatePart(yy, TicketDate),2) +

    (Case

    When (TicketNum IS NULL) Then 'XX'

    When (TicketNum IN ('BADNUM', 'BADNUM2','BADNUM3')) Then 'XX'

    ELSE TicketNum

    END)

    Update TempTickets Set TicketNumFixed = (Select dbo.fn_FixTicket(TicketNum, TicketNumWork))

    select * from TempTickets -- All the sequence numbers are 1 - Not good

    select dbo.fn_FixTicket(NULL, '08XX') -- This works

    Drop Function fn_FixTicket

    Drop Table TempTickets

    Any suggestions would be appreciated. More sample data can supplied, but I didn't think you wanted all 140,000 rows.

    All I know is after a nice 15 hour day at work...

    Insert Wayne

    Values (Beer, Sleep)

  • Wayne,

    Take a look at the following URL and warp the technology to do exactly what you need...

    http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

    It will be nasty fast... methods used will take care of a million rows in less than 7 seconds. You may have to make two passes... 1 for good tickets, one for bad... both passes will be very fast and beat even the slickest cursor.

    If you can't quite get it using that method, post back and we'll show you how.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks...I actually looked at that last night after I got home, and it looked like it might work, but the brain was too fried to make sense of it.

    I have it working in my sample code now, I'm sure it could use a little further polishing but it works, and it works relatively fast, and that matters the most to the boss at the moment.

    Use TempDB

    Create table TempTickets

    (

    TicketID INT Identity,

    TicketNum Varchar(8),

    TicketDate DateTime,

    TicketNumWork VarChar(10) NULL,

    TicketType int NULL,

    TicketNumCount int NULL,

    TicketNumFixed VarChar(10) NULL

    )

    Go

    Insert into TempTickets (TicketNum, TicketDate)

    Select 'BADNUM', '1/1/2008' UNION ALL -- '08XX00001'

    Select NULL,'1/1/2008' UNION ALL -- '08XX00002'

    Select NULL,'1/2/2007' UNION ALL -- '07XX00001'

    Select '1234','1/1/2008' UNION ALL -- '081234'

    Select '1234','1/1/2007' UNION ALL -- '071234A'

    Select '1234','1/2/2007' -- '071234B'

    Update TempTickets Set TicketNumWork = Right(DatePart(yy, TicketDate),2) +

    (Case

    When (TicketNum IS NULL) Then 'XX'

    When (TicketNum IN ('BADNUM', 'BADNUM2','BADNUM3')) Then 'XX'

    ELSE TicketNum

    END),

    TicketType = (Case

    When (TicketNum IS NULL) Then 1

    When (TicketNum IN ('BADNUM', 'BADNUM2','BADNUM3')) Then 1

    ELSE 2

    END)

    Create Clustered Index IX_TempTicket_TypeNum on TempTickets (TicketType, TicketNumWork)

    GO

    Declare @TicketCount Int

    Set @TicketCount = 0

    Declare @PrevTicketNum as varchar(10)

    Set @PrevTicketNum = ''

    Update TempTickets

    Set @TicketCount = TicketNumCount = CASE

    When TicketNumWork = @PrevTicketNum

    THEN @TicketCount +1

    ELSE 1

    END,

    @PrevTicketNum = TicketNumWork

    From TempTickets WITH (Index(IX_TempTicket_TypeNum),TABLOCKX)

    /* Update Type 1 tickets - generated ticket numbers */

    Update TempTickets

    Set TicketNumFixed =TicketNumWork + Right('00000' + cast(TicketNumCount as varchar),5)

    From TempTickets WITH (Index(IX_TempTicket_TypeNum),TABLOCKX)

    where TicketType = 1

    --select * from TempTickets

    /* Update Type 2 tickets where there are no duplicates - final number is the work number */

    /* I know this one is probably sub optimal */

    Update TempTickets

    Set TicketNumFixed = TicketNumWork

    from TempTickets where TicketType = 2 and TicketNumWork in (Select TicketNumwork from TempTickets group by TicketNumWork having count(*)=1)

    --select * from TempTickets

    /* Update Type 2 tickets where there are duplicates - add A,B,C,D...to the end. */

    Update TempTickets

    Set TicketNumFixed =TicketNumWork + CHAR(64+TicketNumCount)

    From TempTickets WITH (Index(IX_TempTicket_TypeNum),TABLOCKX)

    where TicketType = 2 and TicketNumFixed is NULL

    select * from TempTickets

    Drop Table TempTickets

    I can tell the difference between the code I wrote for the first phases of this and where I am now, I wish there was time to go back and fix some of that. At least each piece gets easier and better.

    Thanks for all the help!

  • Looks like you got it... Thanks for the feedback and for posting your solution.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply