Help needed- Generate 8 char Alphanumeric Unique Sequence

  • WITH Alphas AS

    (

    SELECT a

    FROM (VALUES('A'),('B'),('C'),('D')) a (a) -- use 22 letters

    ),

    Numbers AS

    (

    SELECT n

    FROM (VALUES('0'),('1'),('2')) a (n) -- use 10 digits

    )

    -- For my case 4*4*3*3*4*4*3*3 = 20,736 rows

    SELECT a.a + b.a + c.n + d.n + e.a + f.a + g.n + h.n

    FROM Alphas a

    CROSS JOIN Alphas b

    CROSS JOIN Numbers c

    CROSS JOIN Numbers d

    CROSS JOIN Alphas e

    CROSS JOIN Alphas f

    CROSS JOIN Numbers g

    CROSS JOIN Numbers h;

    I wouldn't dare run this with 22 letters and 10 digits but there you go.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (4/21/2015)


    WITH Alphas AS

    (

    SELECT a

    FROM (VALUES('A'),('B'),('C'),('D')) a (a) -- use 22 letters

    ),

    Numbers AS

    (

    SELECT n

    FROM (VALUES('0'),('1'),('2')) a (n) -- use 10 digits

    )

    -- For my case 4*4*3*3*4*4*3*3 = 20,736 rows

    SELECT a.a + b.a + c.n + d.n + e.a + f.a + g.n + h.n

    FROM Alphas a

    CROSS JOIN Alphas b

    CROSS JOIN Numbers c

    CROSS JOIN Numbers d

    CROSS JOIN Alphas e

    CROSS JOIN Alphas f

    CROSS JOIN Numbers g

    CROSS JOIN Numbers h;

    I wouldn't dare run this with 22 letters and 10 digits but there you go.

    Yep. That is exactly what I was thinking too Dwain.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • dwain.c (4/21/2015)


    WITH Alphas AS

    (

    SELECT a

    FROM (VALUES('A'),('B'),('C'),('D')) a (a) -- use 22 letters

    ),

    Numbers AS

    (

    SELECT n

    FROM (VALUES('0'),('1'),('2')) a (n) -- use 10 digits

    )

    -- For my case 4*4*3*3*4*4*3*3 = 20,736 rows

    SELECT a.a + b.a + c.n + d.n + e.a + f.a + g.n + h.n

    FROM Alphas a

    CROSS JOIN Alphas b

    CROSS JOIN Numbers c

    CROSS JOIN Numbers d

    CROSS JOIN Alphas e

    CROSS JOIN Alphas f

    CROSS JOIN Numbers g

    CROSS JOIN Numbers h;

    I wouldn't dare run this with 22 letters and 10 digits but there you go.

    If it's important to make the sequence run in (what I'd consider) the "natural" order, you might want to add an OPTION(FORCE ORDER).

    Edit:

    This is of course utter tosh: if you want to guarantee a particular order, you should use ORDER BY.

    The hint only had the described effect when I was experimenting with the query in an unordered "firehose" form.

    Luckily, Jeff has since moved the discussion on to a more efficient solution.

  • @ditha,

    Tonight after work, I'm going to try to save your professional life. If you run any code that generates all 3+ billion rows, the table just for that one sequence column alone will be more than 68GB. If you force it into CHAR(8) as a NOT NULL column, it will still be over 55GB. And, it will lock up your server for 1 hour, 41 minutes, and 37 seconds. That's not a guess. That's the actual time on a 4 core machine. I ran it because I was curious as to how long it would take and what the actual effects on system resources would be.

    If you do such a thing, your DBA, NetOps, the backup team, your managers, and the users of the machine will all invite you to the nearest woodshed for a serious "come to Jesus" talk about what not to do.

    I have a function that will make your life a whole lot easier and I'll demonstrate it tonight.

    Whatever you do, do NOT run Dwain's code or anyone else's code that produces the entire sequence! It's not that he's done anything wrong. It's just the volume of data that his code will correctly produce.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Curious problem… the first thing I thought of was to think of the format (AA00ZZ99) as an IP address: split up the value into 4 distinct buckets, with a numeric range in each section. The 2 letter section would translate to a value between 0 and 575 (24*24), and the numeric part is 0 to 99.

    The numeric value for an IP address is a 4 byte unsigned INT (256*256*256*256). In this case, the numeric value will range from 0 to 3,306,250,000 (still within an unsigned INT value). Since SQL Server doesn’t use unsigned INT, a BIGINT field set to 0 with an IDENTITY increment on it would give you the incrementing surrogate key, and then your “key” field would require a translation from BIGINT to your format.

    If you make the following a table valued function and used CROSS APPLY to it, you should do OK.

    DECLARE @pSurKey BIGINT

    SET@pSurKey = 18574935 -- add error checking

    ;

    WITH

    src AS(

    SELECTUpr = @pSurKey / 57600, -- first half, or Upper part

    Lwr = @pSurKey % 57600-- second half, or Lower part

    ),

    a AS(-- character map to numeric value

    SELECTNbr, Chr

    FROM(

    VALUES (CAST(0 AS SMALLINT),

    CAST( 'A' AS CHAR(1))),

    ( 1, 'B' ),( 2, 'C' ),( 3, 'D' ),

    ( 4, 'E' ),( 5, 'F' ),( 6, 'G' ),

    ( 7, 'H' ),( 8, 'J' ),( 9, 'K' ),

    ( 10, 'L' ),( 11, 'M' ),( 12, 'N' ),

    ( 13, 'P' ),( 14, 'Q' ),( 15, 'R' ),

    ( 16, 'S' ),( 17, 'T' ),( 18, 'U' ),

    ( 19, 'V' ),( 20, 'W' ),( 21, 'X' ),

    ( 22, 'Y' ),( 23, 'Z' )

    ) z (Nbr, Chr)

    ),

    cc AS(-- full range of characters, numeric values

    SELECTNbr = (b.Nbr*24)+ a.Nbr,

    Chr = b.Chr + a.Chr

    FROMa

    CROSS

    JOINa AS b

    ),

    Upr AS(-- Upper part lookup / interpretation

    SELECTPt4 = Chr,

    Pt3 = RIGHT('0' + CAST( Upr % 100 AS VARCHAR(3)), 2)

    FROMcc, src

    WHERENbr = Upr / 100

    ),

    Lwr AS(-- Lower part lookup / interpretation

    SELECTPt2 = Chr,

    Pt1 = RIGHT('0' + CAST( Lwr % 100 AS VARCHAR(3)), 2)

    FROMcc, src

    WHERENbr = Lwr / 100

    )

    SELECTSurKey = @pSurKey,

    AlNbrKey = Pt4 + Pt3 + Pt2 + Pt1

    FROMUpr, Lwr

    Beer's Law: Absolutum obsoletum
    "if it works it's out-of-date"

  • I'd do this in a C# CLR. Or in a simple program to generate the sequence and write it to DB table.

    The code would be straightforward.

  • grant-665832 (4/22/2015)


    I'd do this in a C# CLR. Or in a simple program to generate the sequence and write it to DB table.

    The code would be straightforward.

    Why? In this case, the equally straight-forward T-SQL would likely be as fast or faster and then you also wouldn't have to worry about more managed code not to mention the fact that you'd still end up with a 55-69GB table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/22/2015)


    grant-665832 (4/22/2015)


    I'd do this in a C# CLR. Or in a simple program to generate the sequence and write it to DB table.

    The code would be straightforward.

    Why? In this case, the equally straight-forward T-SQL would likely be as fast or faster and then you also wouldn't have to worry about more managed code not to mention the fact that you'd still end up with a 55-69GB table.

    Agreed - At very least you'd have to stream it back unless you want to put a serious kink in your server's memory when that sucker returns in one big 60GB memory object.

    if you're going to persist, set-based would still be fastest.

    Will be curious to see what you come back with - I can get past the "server going catatonic" by chunking this up, but not sure I'm making any major gains on the overall perf (still coming in around 2 hours to store all 3.3 Billion sequences).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • @ditha,

    Ok, here we go.

    As I previously stated and most on this thread implied, generating and storing all 3+ billion values of the complete sequence isn't just going to cause a load on the server (even if you use C#, the data still has to hit the disk) but it's also going to take somewhere between 55 and 68 GB and a large group of people in your organization would probably try to get a piece of your hide of you did such a thing. You should also be aware that if you add just 2 INT columns, you will double the size of that even if those columns contain no data. A clustered index wouldn't be so bad to add but if you were to add a non-clustered index to that column, that would also cause the size to double.

    So, considering that you probably don't currently have a 3+ billion row table to add this sequence to, generating and storing all 3+ billion values in the full sequence isn't a good idea.

    None the less, I also understand why someone might want to do such a thing because they think they can figure out an "easy" way to automatically impart such a precalculated numbering system to a table as it grows. Of course, they also want it stored in the table so that they don't have to go through the throws of reversing the alphanumeric sequence to a integer, which would actually be the best idea but we'll skip that for now with the idea of making all of this drop-dead simple.

    DEK46656 has the right idea and he also has the right idea of splitting this thing into obvious paired parts but most people don't understand even the straight forward method never mind the paired optimizations that he (apologies for the generic term but don't know the gender) used. With that thought in mind, we're going to use the straight forward method that I'm pretty sure you'll have no problem grasping.

    You already have a handle on the values of each position in a sequence so let's just create a scalar function to do the conversion from a BIGINT to the sequence value. And, yeah... it has to be scalar function because of the way we're going to use it.

    USE tempdb; --Safe place to experiment

    GO

    CREATE FUNCTION dbo.AA99AA99

    /*******************************************************************************

    Pupose:

    Give an integer value from 0 to 3,317,759,999 for a total of 3,317,760,000

    different values (about a billion more than the positive range of an INT),

    convert the integer to the AA99AA99 format where "A" is a letter of the

    English Alphabet not including "I" or "O" and "9" is a digit from 0 to 9 in

    the decimal numbering system.

    Usage:

    --===== Basic Syntax

    SELECT dbo.AA99AA99(@N)

    ;

    Revision History

    Rev 00 - 22 Apr 2015 - Jeff Moden

    - Initial creation and unit test.

    - Ref: http://www.sqlservercentral.com/Forums/Topic1678735-391-3.aspx

    *******************************************************************************/

    --===== Declare the I/O for this function

    (@N BIGINT)

    RETURNS CHAR(8) WITH SCHEMABINDING AS

    BEGIN

    RETURN

    (

    SELECT Seq_AA99AA99 =

    SUBSTRING(Str2,(@N/138240000)%LEN(Str2)+1,1) --LeftMost

    + SUBSTRING(Str2,(@N/5760000)%LEN(Str2)+1,1)

    + SUBSTRING(Str1,(@N/576000)%LEN(Str1)+1,1)

    + SUBSTRING(Str1,(@N/57600)%LEN(Str1)+1,1)

    + SUBSTRING(Str2,(@N/2400)%LEN(Str2)+1,1)

    + SUBSTRING(Str2,(@N/100)%LEN(Str2)+1,1)

    + SUBSTRING(Str1,(@N/10)%LEN(Str1)+1,1)

    + SUBSTRING(Str1,(@N/1)%LEN(Str1)+1,1) --RightMost

    FROM (SELECT '0123456789','ABCDEFGHJKLMNPQRSTUVWXYZ')d(Str1,Str2)

    );

    END

    ;

    Now, you can use that directly or you can use it as a computed column in a table that will automatically figure things out for you.

    For example, let's create the following table...

    USE tempdb; --Safe place to experiment

    GO

    CREATE TABLE dbo.TestTable

    (

    TestTableID BIGINT IDENTITY(0,1)

    ,SomeDate DATE

    ,SomethingElse VARCHAR(50)

    ,Seq_AA99AA99 AS ISNULL(dbo.AA99AA99(TestTableID),'') PERSISTED --Calulated column

    )

    ;

    Now, let's add a million rows in that table to simulate some usage...

    INSERT INTO dbo.TestTable

    (SomeDate,SomethingElse)

    SELECT TOP 1000000

    SomeDate = GETDATE()

    ,SomethingElse = NEWID()

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    ;

    Then do the following and see what you get.

    SELECT * FROM dbo.TestTable;

    To finish it all up, since the column is PERSITED and has been rendered as a NOT NULL column by the ISNULL, it could actually be used as the Clustered PK of the table as if it were a non-computed column.

    If you study the function carefully, you also now know how to create virtually any kind of sequence you want. Just remember that such sequences with 3 or more adjacent alpha columns are absolutely guaranteed to eventually start spelling some pretty nasty words. That can certainly be made a bit less offensive by eliminating all vowels but that won't protect you on an international basis.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff - that is extremely clever!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Matt Miller (#4) (4/22/2015)


    Jeff Moden (4/22/2015)


    grant-665832 (4/22/2015)


    I'd do this in a C# CLR. Or in a simple program to generate the sequence and write it to DB table.

    The code would be straightforward.

    Why? In this case, the equally straight-forward T-SQL would likely be as fast or faster and then you also wouldn't have to worry about more managed code not to mention the fact that you'd still end up with a 55-69GB table.

    Agreed - At very least you'd have to stream it back unless you want to put a serious kink in your server's memory when that sucker returns in one big 60GB memory object.

    if you're going to persist, set-based would still be fastest.

    Will be curious to see what you come back with - I can get past the "server going catatonic" by chunking this up, but not sure I'm making any major gains on the overall perf (still coming in around 2 hours to store all 3.3 Billion sequences).

    Thanks, Matt. At an hour and 41 minutes, I didn't do much better. The difference might be as subtle as me using a SELECT/INTO to create the table on the fly. I don't want to post my experimental code that produced all 3.3 billion rows because I truly fear that some neophyte might try it on a company server and lose his or her job or at least suffer a serious trip to the woodshed. 😉

    I can send it to you on a PM if you're really interested but I can't imagine it being much different than anything you might have come up with.

    {Edit}... and it should be pretty obvious that I didn't use a scalar function in my experiments to do it in that time. :hehe:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • dwain.c (4/22/2015)


    Jeff - that is extremely clever!

    Heh... thanks Dwain. As you can immediately tell, I've never had to do such a thing before. 😀 I've had to answer questions like this enough times over the years that I think it's about time for me to write a Spackle article on it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/22/2015)


    dwain.c (4/22/2015)


    Jeff - that is extremely clever!

    Heh... thanks Dwain. As you can immediately tell, I've never had to do such a thing before. 😀 I've had to answer questions like this enough times over the years that I think it's about time for me to write a Spackle article on it.

    It also gave me an idea for something really exotic that I've worked on before. Too bad I can't talk to you in person about it.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (4/22/2015)


    Jeff Moden (4/22/2015)


    dwain.c (4/22/2015)


    Jeff - that is extremely clever!

    Heh... thanks Dwain. As you can immediately tell, I've never had to do such a thing before. 😀 I've had to answer questions like this enough times over the years that I think it's about time for me to write a Spackle article on it.

    It also gave me an idea for something really exotic that I've worked on before. Too bad I can't talk to you in person about it.

    You can do the next best thing... you've got my email. I'd read whatever you've got on this or a similar subject with great relish and respond in kind.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hmmm... depending on what it is, joint article?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 32 total)

You must be logged in to reply to this topic. Login to reply