Help needed- Generate 8 char Alphanumeric Unique Sequence

Question

Post reply

Help needed- Generate 8 char Alphanumeric Unique Sequence

Viewing 15 posts - 16 through 30 (of 32 total)

You must be logged in to reply to this topic. Login to reply

Dwain Camps SSC Guru Points: 86978 More actions · Answer 1

WITH Alphas AS

(

SELECT a

FROM (VALUES('A'),('B'),('C'),('D')) a (a) -- use 22 letters

),

Numbers AS

(

SELECT n

FROM (VALUES('0'),('1'),('2')) a (n) -- use 10 digits

)

-- For my case 4*4*3*3*4*4*3*3 = 20,736 rows

SELECT a.a + b.a + c.n + d.n + e.a + f.a + g.n + h.n

FROM Alphas a

CROSS JOIN Alphas b

CROSS JOIN Numbers c

CROSS JOIN Numbers d

CROSS JOIN Alphas e

CROSS JOIN Alphas f

CROSS JOIN Numbers g

CROSS JOIN Numbers h;

I wouldn't dare run this with 22 letters and 10 digits but there you go.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Sean Lange SSC Guru Points: 286590 More actions · Answer 2

dwain.c (4/21/2015)
WITH Alphas AS ( SELECT a FROM (VALUES('A'),('B'),('C'),('D')) a (a) -- use 22 letters ), Numbers AS ( SELECT n FROM (VALUES('0'),('1'),('2')) a (n) -- use 10 digits ) -- For my case 4*4*3*3*4*4*3*3 = 20,736 rows SELECT a.a + b.a + c.n + d.n + e.a + f.a + g.n + h.n FROM Alphas a CROSS JOIN Alphas b CROSS JOIN Numbers c CROSS JOIN Numbers d CROSS JOIN Alphas e CROSS JOIN Alphas f CROSS JOIN Numbers g CROSS JOIN Numbers h;
I wouldn't dare run this with 22 letters and 10 digits but there you go.

Yep. That is exactly what I was thinking too Dwain.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Sporidium SSCommitted Points: 1550 More actions · Answer 3

dwain.c (4/21/2015)
WITH Alphas AS ( SELECT a FROM (VALUES('A'),('B'),('C'),('D')) a (a) -- use 22 letters ), Numbers AS ( SELECT n FROM (VALUES('0'),('1'),('2')) a (n) -- use 10 digits ) -- For my case 4*4*3*3*4*4*3*3 = 20,736 rows SELECT a.a + b.a + c.n + d.n + e.a + f.a + g.n + h.n FROM Alphas a CROSS JOIN Alphas b CROSS JOIN Numbers c CROSS JOIN Numbers d CROSS JOIN Alphas e CROSS JOIN Alphas f CROSS JOIN Numbers g CROSS JOIN Numbers h;
I wouldn't dare run this with 22 letters and 10 digits but there you go.

If it's important to make the sequence run in (what I'd consider) the "natural" order, you might want to add an OPTION(FORCE ORDER).

Edit:

This is of course utter tosh: if you want to guarantee a particular order, you should use ORDER BY.

The hint only had the described effect when I was experimenting with the query in an unordered "firehose" form.

Luckily, Jeff has since moved the discussion on to a more efficient solution.

Jeff Moden SSC Guru Points: 1004415 More actions · Answer 4

@ditha,

Tonight after work, I'm going to try to save your professional life. If you run any code that generates all 3+ billion rows, the table just for that one sequence column alone will be more than 68GB. If you force it into CHAR(8) as a NOT NULL column, it will still be over 55GB. And, it will lock up your server for 1 hour, 41 minutes, and 37 seconds. That's not a guess. That's the actual time on a 4 core machine. I ran it because I was curious as to how long it would take and what the actual effects on system resources would be.

If you do such a thing, your DBA, NetOps, the backup team, your managers, and the users of the machine will all invite you to the nearest woodshed for a serious "come to Jesus" talk about what not to do.

I have a function that will make your life a whole lot easier and I'll demonstrate it tonight.

Whatever you do, do NOT run Dwain's code or anyone else's code that produces the entire sequence! It's not that he's done anything wrong. It's just the volume of data that his code will correctly produce.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

DEK46656 SSCrazy Points: 2111 More actions · Answer 5

Curious problem… the first thing I thought of was to think of the format (AA00ZZ99) as an IP address: split up the value into 4 distinct buckets, with a numeric range in each section. The 2 letter section would translate to a value between 0 and 575 (24*24), and the numeric part is 0 to 99.

The numeric value for an IP address is a 4 byte unsigned INT (256*256*256*256). In this case, the numeric value will range from 0 to 3,306,250,000 (still within an unsigned INT value). Since SQL Server doesn’t use unsigned INT, a BIGINT field set to 0 with an IDENTITY increment on it would give you the incrementing surrogate key, and then your “key” field would require a translation from BIGINT to your format.

If you make the following a table valued function and used CROSS APPLY to it, you should do OK.

DECLARE @pSurKey BIGINT

SET@pSurKey = 18574935 -- add error checking

;

WITH

src AS(

SELECTUpr = @pSurKey / 57600, -- first half, or Upper part

Lwr = @pSurKey % 57600-- second half, or Lower part

),

a AS(-- character map to numeric value

SELECTNbr, Chr

FROM(

VALUES (CAST(0 AS SMALLINT),

CAST( 'A' AS CHAR(1))),

( 1, 'B' ),( 2, 'C' ),( 3, 'D' ),

( 4, 'E' ),( 5, 'F' ),( 6, 'G' ),

( 7, 'H' ),( 8, 'J' ),( 9, 'K' ),

( 10, 'L' ),( 11, 'M' ),( 12, 'N' ),

( 13, 'P' ),( 14, 'Q' ),( 15, 'R' ),

( 16, 'S' ),( 17, 'T' ),( 18, 'U' ),

( 19, 'V' ),( 20, 'W' ),( 21, 'X' ),

( 22, 'Y' ),( 23, 'Z' )

) z (Nbr, Chr)

),

cc AS(-- full range of characters, numeric values

SELECTNbr = (b.Nbr*24)+ a.Nbr,

Chr = b.Chr + a.Chr

FROMa

CROSS

JOINa AS b

),

Upr AS(-- Upper part lookup / interpretation

SELECTPt4 = Chr,

Pt3 = RIGHT('0' + CAST( Upr % 100 AS VARCHAR(3)), 2)

FROMcc, src

WHERENbr = Upr / 100

),

Lwr AS(-- Lower part lookup / interpretation

SELECTPt2 = Chr,

Pt1 = RIGHT('0' + CAST( Lwr % 100 AS VARCHAR(3)), 2)

FROMcc, src

WHERENbr = Lwr / 100

)

SELECTSurKey = @pSurKey,

AlNbrKey = Pt4 + Pt3 + Pt2 + Pt1

FROMUpr, Lwr

Beer's Law: Absolutum obsoletum
"if it works it's out-of-date"

grant-665832 Valued Member Points: 70 More actions · Answer 6

I'd do this in a C# CLR. Or in a simple program to generate the sequence and write it to DB table.

The code would be straightforward.

Jeff Moden SSC Guru Points: 1004415 More actions · Answer 7

grant-665832 (4/22/2015)
I'd do this in a C# CLR. Or in a simple program to generate the sequence and write it to DB table.
The code would be straightforward.

Why? In this case, the equally straight-forward T-SQL would likely be as fast or faster and then you also wouldn't have to worry about more managed code not to mention the fact that you'd still end up with a 55-69GB table.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 8

Jeff Moden (4/22/2015)
grant-665832 (4/22/2015)
I'd do this in a C# CLR. Or in a simple program to generate the sequence and write it to DB table.
The code would be straightforward.
Why? In this case, the equally straight-forward T-SQL would likely be as fast or faster and then you also wouldn't have to worry about more managed code not to mention the fact that you'd still end up with a 55-69GB table.

Agreed - At very least you'd have to stream it back unless you want to put a serious kink in your server's memory when that sucker returns in one big 60GB memory object.

if you're going to persist, set-based would still be fastest.

Will be curious to see what you come back with - I can get past the "server going catatonic" by chunking this up, but not sure I'm making any major gains on the overall perf (still coming in around 2 hours to store all 3.3 Billion sequences).

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Jeff Moden SSC Guru Points: 1004415 More actions · Answer 9

@ditha,

Ok, here we go.

As I previously stated and most on this thread implied, generating and storing all 3+ billion values of the complete sequence isn't just going to cause a load on the server (even if you use C#, the data still has to hit the disk) but it's also going to take somewhere between 55 and 68 GB and a large group of people in your organization would probably try to get a piece of your hide of you did such a thing. You should also be aware that if you add just 2 INT columns, you will double the size of that even if those columns contain no data. A clustered index wouldn't be so bad to add but if you were to add a non-clustered index to that column, that would also cause the size to double.

So, considering that you probably don't currently have a 3+ billion row table to add this sequence to, generating and storing all 3+ billion values in the full sequence isn't a good idea.

None the less, I also understand why someone might want to do such a thing because they think they can figure out an "easy" way to automatically impart such a precalculated numbering system to a table as it grows. Of course, they also want it stored in the table so that they don't have to go through the throws of reversing the alphanumeric sequence to a integer, which would actually be the best idea but we'll skip that for now with the idea of making all of this drop-dead simple.

DEK46656 has the right idea and he also has the right idea of splitting this thing into obvious paired parts but most people don't understand even the straight forward method never mind the paired optimizations that he (apologies for the generic term but don't know the gender) used. With that thought in mind, we're going to use the straight forward method that I'm pretty sure you'll have no problem grasping.

You already have a handle on the values of each position in a sequence so let's just create a scalar function to do the conversion from a BIGINT to the sequence value. And, yeah... it has to be scalar function because of the way we're going to use it.

USE tempdb; --Safe place to experiment

GO

CREATE FUNCTION dbo.AA99AA99

/*******************************************************************************

Pupose:

Give an integer value from 0 to 3,317,759,999 for a total of 3,317,760,000

different values (about a billion more than the positive range of an INT),

convert the integer to the AA99AA99 format where "A" is a letter of the

English Alphabet not including "I" or "O" and "9" is a digit from 0 to 9 in

the decimal numbering system.

Usage:

--===== Basic Syntax

SELECT dbo.AA99AA99(@N)

;

Revision History

Rev 00 - 22 Apr 2015 - Jeff Moden

- Initial creation and unit test.

- Ref: http://www.sqlservercentral.com/Forums/Topic1678735-391-3.aspx

*******************************************************************************/

--===== Declare the I/O for this function

(@N BIGINT)

RETURNS CHAR(8) WITH SCHEMABINDING AS

BEGIN

RETURN

(

SELECT Seq_AA99AA99 =

SUBSTRING(Str2,(@N/138240000)%LEN(Str2)+1,1) --LeftMost

+ SUBSTRING(Str2,(@N/5760000)%LEN(Str2)+1,1)

+ SUBSTRING(Str1,(@N/576000)%LEN(Str1)+1,1)

+ SUBSTRING(Str1,(@N/57600)%LEN(Str1)+1,1)

+ SUBSTRING(Str2,(@N/2400)%LEN(Str2)+1,1)

+ SUBSTRING(Str2,(@N/100)%LEN(Str2)+1,1)

+ SUBSTRING(Str1,(@N/10)%LEN(Str1)+1,1)

+ SUBSTRING(Str1,(@N/1)%LEN(Str1)+1,1) --RightMost

FROM (SELECT '0123456789','ABCDEFGHJKLMNPQRSTUVWXYZ')d(Str1,Str2)

);

END

;

Now, you can use that directly or you can use it as a computed column in a table that will automatically figure things out for you.

For example, let's create the following table...

USE tempdb; --Safe place to experiment

GO

CREATE TABLE dbo.TestTable

(

TestTableID BIGINT IDENTITY(0,1)

,SomeDate DATE

,SomethingElse VARCHAR(50)

,Seq_AA99AA99 AS ISNULL(dbo.AA99AA99(TestTableID),'') PERSISTED --Calulated column

)

;

Now, let's add a million rows in that table to simulate some usage...

INSERT INTO dbo.TestTable

(SomeDate,SomethingElse)

SELECT TOP 1000000

SomeDate = GETDATE()

,SomethingElse = NEWID()

FROM sys.all_columns ac1

CROSS JOIN sys.all_columns ac2

;

Then do the following and see what you get.

SELECT * FROM dbo.TestTable;

To finish it all up, since the column is PERSITED and has been rendered as a NOT NULL column by the ISNULL, it could actually be used as the Clustered PK of the table as if it were a non-computed column.

If you study the function carefully, you also now know how to create virtually any kind of sequence you want. Just remember that such sequences with 3 or more adjacent alpha columns are absolutely guaranteed to eventually start spelling some pretty nasty words. That can certainly be made a bit less offensive by eliminating all vowels but that won't protect you on an international basis.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dwain Camps SSC Guru Points: 86978 More actions · Answer 10

Jeff - that is extremely clever!

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Jeff Moden SSC Guru Points: 1004415 More actions · Answer 11

Matt Miller (#4) (4/22/2015)
Jeff Moden (4/22/2015)
grant-665832 (4/22/2015)
I'd do this in a C# CLR. Or in a simple program to generate the sequence and write it to DB table.
The code would be straightforward.
Why? In this case, the equally straight-forward T-SQL would likely be as fast or faster and then you also wouldn't have to worry about more managed code not to mention the fact that you'd still end up with a 55-69GB table.
Agreed - At very least you'd have to stream it back unless you want to put a serious kink in your server's memory when that sucker returns in one big 60GB memory object.
if you're going to persist, set-based would still be fastest.
Will be curious to see what you come back with - I can get past the "server going catatonic" by chunking this up, but not sure I'm making any major gains on the overall perf (still coming in around 2 hours to store all 3.3 Billion sequences).

Thanks, Matt. At an hour and 41 minutes, I didn't do much better. The difference might be as subtle as me using a SELECT/INTO to create the table on the fly. I don't want to post my experimental code that produced all 3.3 billion rows because I truly fear that some neophyte might try it on a company server and lose his or her job or at least suffer a serious trip to the woodshed. 😉

I can send it to you on a PM if you're really interested but I can't imagine it being much different than anything you might have come up with.

{Edit}... and it should be pretty obvious that I didn't use a scalar function in my experiments to do it in that time. :hehe:

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004415 More actions · Answer 12

dwain.c (4/22/2015)
Jeff - that is extremely clever!

Heh... thanks Dwain. As you can immediately tell, I've never had to do such a thing before. 😀 I've had to answer questions like this enough times over the years that I think it's about time for me to write a Spackle article on it.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dwain Camps SSC Guru Points: 86978 More actions · Answer 13

Jeff Moden (4/22/2015)
dwain.c (4/22/2015)
Jeff - that is extremely clever!
Heh... thanks Dwain. As you can immediately tell, I've never had to do such a thing before. 😀 I've had to answer questions like this enough times over the years that I think it's about time for me to write a Spackle article on it.

It also gave me an idea for something really exotic that I've worked on before. Too bad I can't talk to you in person about it.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Jeff Moden SSC Guru Points: 1004415 More actions · Answer 14

dwain.c (4/22/2015)
Jeff Moden (4/22/2015)
dwain.c (4/22/2015)
Jeff - that is extremely clever!
Heh... thanks Dwain. As you can immediately tell, I've never had to do such a thing before. 😀 I've had to answer questions like this enough times over the years that I think it's about time for me to write a Spackle article on it.
It also gave me an idea for something really exotic that I've worked on before. Too bad I can't talk to you in person about it.

You can do the next best thing... you've got my email. I'd read whatever you've got on this or a similar subject with great relish and respond in kind.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004415 More actions · Answer 15

Hmmm... depending on what it is, joint article?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)