Want to create random alphanumeric characters for primary key values

  • for a new project

    i need to create random alphanumeric characters as primary key values when inserting a record.

    eg: cmSbXsFE3l8

    it can start from 4 digit characters and can grow to 6, 7 as required

    the database will involve more than 10000 concurrent users.

    i don't want to use guid or auto increment integer or sequence

    any idea

    thanks

  • Why? What does a sequence, identity, or guid not do for you?

    What problem are you trying to solve with this approach?

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • carrieathomer (2/11/2015)


    for a new project

    i need to create random alphanumeric characters as primary key values when inserting a record.

    eg: cmSbXsFE3l8

    it can start from 4 digit characters and can grow to 6, 7 as required

    the database will involve more than 10000 concurrent users.

    i don't want to use guid or auto increment integer or sequence

    any idea

    thanks

    the default answer is don't do this, for several reasons.

    a primary key should be small in size, and use an incremental value in order to make storage and searching effective and maximize values per page.

    having a random string, like a guid, will result in massive amount of page splits every time a new user is added.

    a string like that is prone to spelling bad words and offensive phrases. try running the explanation that an offensive word was "automatically, randomly generated" and then tied to a customers account through the legal department, and see if they think that's a good idea.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell (2/11/2015)


    carrieathomer (2/11/2015)


    for a new project

    i need to create random alphanumeric characters as primary key values when inserting a record.

    eg: cmSbXsFE3l8

    it can start from 4 digit characters and can grow to 6, 7 as required

    the database will involve more than 10000 concurrent users.

    i don't want to use guid or auto increment integer or sequence

    any idea

    thanks

    the default answer is don't do this, for several reasons.

    a primary key should be small in size, and use an incremental value in order to make storage and searching effective and maximize values per page.

    having a random string, like a guid, will result in massive amount of page splits every time a new user is added.

    a string like that is prone to spelling bad words and offensive phrases. try running the explanation that an offensive word was "automatically, randomly generated" and then tied to a customers account through the legal department, and see if they think that's a good idea.

    I can understand what you are trying to say. Then how come all the youtube videos are using random alphanumeric string as id values to identify the videos

    eg: cmSbXsFE3l8

  • Good point, but the name of the video is likely not the primary key of the table.

    For starters, as Lowell said, having a key that could be changed is not a best practice.

    If your new application is actually going to be storing documents, or videos, how do you handle the names? I am a user. I create a document called "Michaels document.docx".

    The system would likely generate an artificial key, preserving the rest of the data.

    When you search for a video on YouTube, do you search for mf78eysj ???

    Or, do you search for a subject, key words, author (or person who uploaded the video), and so forth?

    Also, that random set of letters you are seeing may not be so random. Having worked with a few document management systems, there were typically a finite set of values that were coupled with other values to provide an alternate key on the table.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • carrieathomer (2/11/2015)


    i need to create random alphanumeric characters as primary key values when inserting a record.

    eg: cmSbXsFE3l8

    it can start from 4 digit characters and can grow to 6, 7 as required

    the database will involve more than 10000 concurrent users.

    Six-figure hardware might just cope with this, if you have a SQL Server architecture expert set it up and a TSQL expert assigned full time to the project. You will need one.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • let us assume i am going to store documents.

    for every document i am going to set a primary key that will be an alphanumeric value (may be 4 digits)

    n users can upload n documents with same names, but i am going to differentiate only with primary key value. document name is not my identifier, only the primary key value is my identifier.

    so for that i want to use alphanumeric value, that is my requirement

  • carrieathomer (2/11/2015)


    let us assume i am going to store documents.

    for every document i am going to set a primary key that will be an alphanumeric value (may be 4 digits)

    n users can upload n documents with same names, but i am going to differentiate only with primary key value. document name is not my identifier, only the primary key value is my identifier.

    so for that i want to use alphanumeric value, that is my requirement

    Who made the decision to use the alpha-numeric value as the primary key? We (myself, and speaking for everyone else on this thread) strongly recommend that you try to change this requirement.

    It's extremely difficult to implement, it breaks relational design rules, and it certainly is not a best practice.

    I am very curious why an identity, sequence, or guid will not work.

    I would recommend using an identity or sequence as the artificial key, and then create a composite alternate key out of the combination of other fields.

    The document name will likely not be a good candidate for this.

    Also, is there any plans for versioning, if this is a document management system? If a user creates a doc, the system should keep track of versions. Each of these would need a unique id also.

    But, if you insist, here is a link from Jeff Moden that is very close to what you are trying to do.

    http://www.sqlservercentral.com/Forums/Topic1011181-1292-1.aspx

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • carrieathomer (2/11/2015)


    let us assume i am going to store documents.

    for every document i am going to set a primary key that will be an alphanumeric value (may be 4 digits)

    n users can upload n documents with same names, but i am going to differentiate only with primary key value. document name is not my identifier, only the primary key value is my identifier.

    so for that i want to use alphanumeric value, that is my requirement

    I think you are confusing the primary key of the table (used by the database and if circumstances permit, by the business) with a unique identifier for a document. In your case they are probably two different entities. Use a sequential number as the primary key of the table, and your string as the document identifier. Using strings as identifiers has risks as others have pointed out - you may need a profanity checker in your string generator.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • I have to agree with what everybody else is telling you. I believe you are establishing an "artificlal" requirement here for some reason. I cannot imagine a case where such a requirement would come from a business user, and if perchance it were it is incumbent on the software professional to rationally argue the case where that is not advisable and provide alternatives.

    When choosing a primary key, normally you do one of two things:

    - Use the natural key, which is a combination of attributes from a row that will result in uniqueness. (Yes I know there's a bit more to it than this, but for now this should suffice)

    - Use an artificial key, as with IDENTITY. I'm not big on guids as primary keys due to the horror stories I've read about performance, but that is ultimately the option that is closest to your "requirement."

    There are ways that you could randomly generate an alphanumeric key as you're suggesting you want, but the problem with them is that on insert you want to make sure each new one has never been selected before. There are ways to do even that, but they usually involve generating all of the possible random keys first and then selecting one from them (this is "generating a random number without duplicates"). For example, here's a case study by SQL MVP Aaron Bertrand: Generate random integers without collisions. I actually have another way to do this, but I won't go into it here.

    Note that if this is just for testing purposes and isn't the way you plan to do it in your Prod database, there may be some hope here.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Also, keep in mind that what is displayed to the user might not be what is stored in the db.

    Using the you tube example, the random value likely is a hash of some sort. The database may be storing "michaels video" and they are applying a scrambling algorithm for display.

    This is a common practice when embedding a file name, etc into a public URL. You never want to display these types of values in clear text.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • This is the nearest solutions i may use, using the following article.

    http://sqlperformance.com/2013/09/t-sql-queries/random-collisions

    my scenario,

    More 1000000 users registers with site and uploads documents minimum 10. Even though i can use guid, sequence for primary key values, I don't want to show them in the public site to identify then. So may include another unique column to store a unique alphanumeric values

    so the public url will identify the document as site.com/doc.php?id=x3e9yh

    which in fact x3e9yh will point to docid = 487 (eg)

    from the article above i will create million alphanumeric values which i will take and add in the document table to identify. After using it i will mark as used. I will write a trigger to know when i am running short of alphanumeric values.

    is this a good idea.

  • carrieathomer (2/12/2015)


    This is the nearest solutions i may use, using the following article.

    http://sqlperformance.com/2013/09/t-sql-queries/random-collisions

    my scenario,

    More 1000000 users registers with site and uploads documents minimum 10. Even though i can use guid, sequence for primary key values, I don't want to show them in the public site to identify then. So may include another unique column to store a unique alphanumeric values

    so the public url will identify the document as site.com/doc.php?id=x3e9yh

    which in fact x3e9yh will point to docid = 487 (eg)

    from the article above i will create million alphanumeric values which i will take and add in the document table to identify. After using it i will mark as used. I will write a trigger to know when i am running short of alphanumeric values.

    is this a good idea.

    That's the point of my last post. What's stored is typically not displayed.

    Aaron's article is great, it will work for you, but you do not have to do all of that work.

    Find, or write, a hashing function that works on a GUID or an integer. Scramble when you display the value, unscramble when you store the value.

    Read this article. It's probably easier, and fits what you are trying to do.

    http://sqlmag.com/t-sql/simple-string-encryption-and-decryption

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • which is better for performance GUID or sequence

    thanks

  • carrieathomer (2/12/2015)


    which is better for performance GUID or sequence

    thanks

    Since a GUID (UniqueIdentifier/NEWID()) is based on a pseudo-random number and is 16 bytes long where a sequence is typically narrow (8, 4, 2, or 1 bytes in width) and ever increasing, which do you suppose will usually be better for performance?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply