T-SQL Help Needed

  • Sorry, Joe, but your answers are not only inappropriate, but also will introduce bad limitations in design, I have very strong words to say about your advices, but I will try to be extremely polite:

    Short answers: for businesses, I use the Data Universal Numbering System(DUNS). This “limits” me to ~180 million companies right now. It is now so universal, that I have to have a DUNS to do business with DELL as a consultant. The advantages are that D&B maintains it for me (validation and verification), it is free and it is a legal requirement in many countries.

    1a.) Not fit for the purpose: That system is supported by few governments and only some companies in the world will have this number. I have few my own companies (eg. in Australia and UK, also in East Europe) but none - NONE has this number.

    1b.) Design problems: Using such identifiers as PK (they do represent random number generated by external DUNS) may have significant impact on performance and usability. You will need to think twice before making PK to be clustered index in this case (numbers are random). WHat happens if you need to load the list of 1000 new companies which doesn't have DUNS allocated as yet?

    If you request this number electronically, according to the wiki:

    When obtaining a DUNS number online, the wait can be as long as 30 days

    , but you can call them to have immediate response.

    In the database, I'm working right now, there are around 3,000,000 companies registered world-wide, I'm pretty much sure that most of them don't have this "useful" identifier...

    For people, I use their email address, like most internet users these days. I validate it with a regular expression (from hell) and verify it with a ping.

    I've never heard more stupid idea than that, sorry Professor.

    2a.) Not fit for the purpose: Where are you from? Do you really think that every person on the planet have an email? Is it issued by some Archangel at birth? What about children? Nothing can stop email be reused for different people.

    2b.) Design problems: Same as for DUNS idea. If used for PK, will have issue with being clustered

    index. Will have issue with obtaining for the large list of new entries. If taken from phone call, it can be entered and saved incorrectly. Will you advise then updating it?

    Again, for the author of so many books about SQL, your answers are very poor. I will feel very sorry for people who will take such advices.

    Lets now see your suggestions about choosing the candidates for a natural keys:

    1) A natural key is a subset of attributes which occur in a table and act as a unique identifier. They are seen by the user. You can go to the external reality and verify them. you would also like to have some validation rule. Example: bar codes on consumer goodsand validate them with a check digit or a manufacturer's website, geographical co-ordinates (get a GPS).

    As you can see from examples for Person and Companies, there are cases where there are none. Eg. you simply have no natural unique identifier for Person. (Bar-codes are not guaranteed to be globally unique and never re-used).

    2) An artificial key is an extra attribute added to the table which is seen by the user. It does not exist in the external reality, but can be verified for syntax or check digits inside itself. It is up to the DBA to maintain a trusted source for them. Example: the open codes in the UPC/EAN scheme which a user can assign to his own stuff. The check digits still work, but you have to verify them inside your own enterprise

    If you have to construct a key yourself, it takes time to design them, to invent a validation rule, set up audit trails, etc.

    .

    What do you mean by "external reality"? Is it something extraterrestrial? IDENTITY is easily can play this role.

    3) An "exposed physical locator" is not based on attributes in the data model and is exposed to user. There is no reasonable way to predict it or verify it, since it usually comes from the physical state of the hardware at the time of data insertion. The system obtains a value thru some physical process in the storage hardware totally unrelated to the logical data model. Example: IDENTITY columns, other proprietary, non-relational auto-numbering devices.

    Technically, these are not really keys at all, since they are attributes of the PHYSICAL storage and are not even part of the LOGICAL data model. But they are handy for lazy, non-RDBMS rogrammers who don't want to research or think! This is the worst way to program in SQL.

    I think that is what most often used by most of professional RDBMS programmers including SQL developers. I don't believe that your opinion in this regards (eg. lazy and non-RDBMS ...) will be supported by most of specialists and architects. Instead they can call you plain-stupid and arrogant for such statements.

    4) A surrogate key is system generated to replace the actual key behind the covers where the user never sees it. It is based on attributes in the table. Example: Teradata hashing algorithms, indexes, pointer chains, ADABASE numbers, etc.

    The fact that you can never see it or use it for DELETE and UPDATE or create it for INSERT is vital. When users can get to them, they will screw up the data integrity by getting the real keys and these physical locators out of synch. The system must maintain them.

    Surrogate key doesn't need to be based on attributes in the table. IDENTITY is a perfect for this purpose.

    ** Notice that people get "exposed physical locator" and surrogate mixed up; they are totally different concepts. **

    They are! Theoretically! But practically it can be the same.

    Honestly, I'm really tired to read the crap you have put here. Most of things you say are fine theoretically, but often have no much practical use.

    Whatever you have said here is "ok" for "Theoretical aspects of RDBMS for beginners" subject in college or school - MAXIMUM, but it is mainly irrelevant to practical applications.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (9/12/2012)


    ...Even DNA is not 100% unique ...

    For us humans, and you too Eugene, it is - even for so-called "identical twins".

    Eugene Elutin (9/12/2012)


    ...(that is why forensic DNA-result doesn't confirm "match" or "no-match", it does only gives the probability of them).

    Miles out! Forensic DNA analysis works by chopping up DNA samples using special tools (restriction enzymes) and measuring the number and relative (no pun intended) size of fragments, not by sequencing it. That's why it can only yield a match probability.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • aaron.reese (9/13/2012)


    [

    If I create a table to look up state codes ('AL', 'AK', etc.), I use state code as the key, not a meaningless identity value.

    fine, but if you are creating a table of product groups (<'BK','Books'>,<'SH','Shoes'>,<'BG','Bags'>) etc.. I would put an ID field on that and use that instead, otherwise if someone decides they need to re-use or change the product group code you're stuffed.

    A real world example.

    A company I worked for produces clothing catalogues. Each catalogue has a 2 character reference. They produce about 130 catalogues/adverts a year, but not all letters get used (S,Q,I etc) This means they cycle through all the possible combinations within just under 3 years. They did not use an ID on the table and so could not use it as a FK. They now have multi-part keys into almost all their other tables (Brand, Year and catalogue code) which slows down development, testing and reporting as all table joins will require at least three join criteria. The same principle of no ID field is in most of the application and so the same issue arises. As a result, they have some data that it is impossible to correct (e.g. the same product code represents a different colour in two consecutive seasons because the colour table did not use IDs, but a multi-part key based on department and colour and the product changed departements and the same colour had a different colour code in the new departement)

    It's quite common for a product to "evolve" over time without its' stock code changing - it's rampant in the fashion world. Clients, agents, reps all know an item by the stock code. How would you use it as a natural key? It would have to incorporate something temporal or iterative, like "Season". Can you imagine joining this to a table of order lines or sales lines? And any other table which refers to products?

    The best solution I've worked with uses incrementing ID's as surrogate keys. Each iteration of the product will have a separate StockID or ProductID. The "stock code" becomes something more like a description or name - which it is to the agents etc - and is of no significance to the db. Likewise, users are oblivious to the ProductID.

    Clustering the Products table on this surrogate key isn't a problem. The most recently added products are more likely to be ordered or sold than older products, other than that the set (of items in an order/sale) is quite random.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • CELKO (9/13/2012)


    I really would love to listen to his answers regarding choosing natural keys for Persons and Companies entities (where the system is international and people/companies are not just from USA, you know there some other countries around the globe)

    Short answers: for businesses, I use the Data Universal Numbering System(DUNS). This “limits” me to ~180 million companies right now. It is now so universal, that I have to have a DUNS to do business with DELL as a consultant. The advantages are that D&B maintains it for me (validation and verification), it is free and it is a legal requirement in many countries.

    For people, I use their email address, like most internet users these days. I validate it with a regular expression (from hell) and verify it with a ping.

    Thank you for proving (once more) that you can't take the theory to the practice in a correct way.

    How can you suggest to use a random number controlled by someone else that you can't validate as a valid key for companies? That's even worse than an IDENTITY column which, by the way, you can control and predict. How are DUNS part of the LOGICAL data model? and how do they differ from another proprietary, non-relational auto-numbering system?

    I'm sure you are aware the problems on using emails as a key for people. For instance, at the moment I have 5 different email accounts, all valid. Would I be 5 different people? What about people with no email address? or even better, what about services as 10 minute mail? Your design is a total failure and you have to do it again.

    Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are.

    There's no real natural key that can be used internationally for these examples without real problems. You might know as well that Primary Keys are different from Clustered Indexes, the way to chose them will be different even if they tend to be the same.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • I have to agree that DUNS and email are (a) not "natural" keys, (b) very poor choices as primary keys (well, the DUNS numbers would be a good PK for the DUNS databases; would fail anywhere else).

    First, email is so obviously not a natural key for a human being that I'm always shocked when people try to use it that way. It's fine as a login string (username), in many cases, but for anything else it's worse than using names. And the problem with using them as logins is that this helps promote security-reuse, which is one of the biggest keys to identity theft. So it's not just bad database design, it's also a security hole, and assists criminals.

    Second, DUNS numbers are no more a natural key for a business than an incremented ID is. It's just a random number assigned by someone else instead of an incremental number assigned by your database engine. Arguing that this makes it superior somehow, makes zero sense. That's even ignoring the fact that millions of businesses don't have or need DUNS numbers.

    Both concepts also assume that all data will be for current-era entities. DUNS numbers only go back to 1963. E-mail for common use, even in tech-heavy countries like the US or Japan, only goes back about 10 years, maybe 15. Human activity that might need to be tracked in a database goes back considerably further than that.

    And e-mail is being replaced for many people by SMS or SMS-like services (Twitter, etc.). So nobody knows how long it will be before email stops being even as valid as it currently is. Tying your database design to something that has that kind of limited time-dimension, when tracking human affairs, is very, very short-sighted.

    I could list hundreds of more reasons those are poorly chosen PKs. They have their uses, but claiming universality to them is just ignorant.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Luis Cazares (9/14/2012)


    CELKO (9/13/2012)


    I really would love to listen to his answers regarding choosing natural keys for Persons and Companies entities (where the system is international and people/companies are not just from USA, you know there some other countries around the globe)

    Short answers: for businesses, I use the Data Universal Numbering System(DUNS). This “limits” me to ~180 million companies right now. It is now so universal, that I have to have a DUNS to do business with DELL as a consultant. The advantages are that D&B maintains it for me (validation and verification), it is free and it is a legal requirement in many countries.

    For people, I use their email address, like most internet users these days. I validate it with a regular expression (from hell) and verify it with a ping.

    Thank you for proving (once more) that you can't take the theory to the practice in a correct way.

    How can you suggest to use a random number controlled by someone else that you can't validate as a valid key for companies? That's even worse than an IDENTITY column which, by the way, you can control and predict. How are DUNS part of the LOGICAL data model? and how do they differ from another proprietary, non-relational auto-numbering system?

    I'm sure you are aware the problems on using emails as a key for people. For instance, at the moment I have 5 different email accounts, all valid. Would I be 5 different people? What about people with no email address? or even better, what about services as 10 minute mail? Your design is a total failure and you have to do it again.

    Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are.

    There's no real natural key that can be used internationally for these examples without real problems. You might know as well that Primary Keys are different from Clustered Indexes, the way to chose them will be different even if they tend to be the same.

    I worked for a company that does business internationally. We stored the DUNS numbers for those businesses that we sent to D&B and were returned that information. Problem is, our customers were of all sizes and shapes and many did not have DUNS numbers, especially those overseas. Using the DUNS number as a key simply would not work for this business.

  • Lynn Pettis (9/14/2012)


    I worked for a company that does business internationally. We stored the DUNS numbers for those businesses that we sent to D&B and were returned that information. Problem is, our customers were of all sizes and shapes and many did not have DUNS numbers, especially those overseas. Using the DUNS number as a key simply would not work for this business.

    That's the point, you can store it and use it as an attribute, but it can hardly be considered as a key (just for DUNS databases as Gus said)

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • ChrisM@Work (9/14/2012)

    It's quite common for a product to "evolve" over time without its' stock code changing - it's rampant in the fashion world. Clients, agents, reps all know an item by the stock code. How would you use it as a natural key? It would have to incorporate something temporal or iterative, like "Season". Can you imagine joining this to a table of order lines or sales lines? And any other table which refers to products?

    The best solution I've worked with uses incrementing ID's as surrogate keys. Each iteration of the product will have a separate StockID or ProductID.

    Clustering the Products table on this surrogate key isn't a problem. The most recently added products are more likely to be ordered or sold than older products, other than that the set (of items in an order/sale) is quite random.

    Sounds like a nightmare. How would you keep pricing straight? Does each separate color get its own unique id too??

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (9/14/2012)


    ChrisM@Work (9/14/2012)

    It's quite common for a product to "evolve" over time without its' stock code changing - it's rampant in the fashion world. Clients, agents, reps all know an item by the stock code. How would you use it as a natural key? It would have to incorporate something temporal or iterative, like "Season". Can you imagine joining this to a table of order lines or sales lines? And any other table which refers to products?

    The best solution I've worked with uses incrementing ID's as surrogate keys. Each iteration of the product will have a separate StockID or ProductID.

    Clustering the Products table on this surrogate key isn't a problem. The most recently added products are more likely to be ordered or sold than older products, other than that the set (of items in an order/sale) is quite random.

    Sounds like a nightmare. How would you keep pricing straight? Does each separate color get its own unique id too??

    One fashion company I worked for some years ago had problems with both. Most garments were calf suede, leather, fetal lambskin, sheerling etc. Different colours of the same skin can be different cost prices which of course was reflected in the selling price. Cost prices are seasonal too, with significant swings in either direction. So yes, a separate surrogate key per colour. The db handled this just Jim Dandy, but there was no end of confusion within sales. Agents would ask for a style so-and-so from three seasons ago, in a particular colour, and that request had to be matched by sales to the correct product in the warehouse.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I don't think we will see J.C. back to this discussion. He is out training people and writing books to add to his signature

    In my previous post I've tried to respond to each of his absolute rubbish suggestions, but I became tired too quickly. Not only the ideas about DUNS and Emails as PKs for Company and Person entities are plainly stupid and not-workable, it's terrible that such crap ideas were voiced by one of the very well known RDBMS "expert" and book writer.

    It is really quite shocking and I am very disappointed.:(

    I guess, it would be better if he just left my questions without answers...

    Mr Celko, honestly, I think you need to get some training yourself...

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • CELKO (9/14/2012)


    One article from the New York Times on the differences in the DNA of twins.

    http://www.nytimes.com/2008/03/11/health/11real.html

    One of the Japanese companies is working on a single chip that does DNA analysis as a security device. I used to work for a state prison system, and we used "ten cards" for finger prints; ink and one finger at a time. Today, I got a clearance for volunteering as a math coach at a trouble middle school. It was done with a scanner plate in a second.

    Really great! DNA sequence for PK!

    Call Centre staff creating new customer record:

    - Could you please tell me your Fist Name?

    - Joe

    - Surname?

    - Celko

    - Your DNA sequence?

    - :w00t:

    What do you smoke? I want the same one!

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • A company I worked for produces clothing catalogs... They now have multi-part keys into almost all their other tables (Brand, Year and catalogue code) which slows down development, testing and reporting as all table joins will require at least three join criteria.

    They've got some other issue if they're slow. I've got tables with 5-column clustered keys that are lightning fast.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • [quote-0It's quite common for a product to "evolve" over time without its' stock code changing - it's rampant in the fashion world. Clients, agents, reps all know an item by the stock code. How would you use it as a natural key? It would have to incorporate something temporal or iterative, like "Season". Can you imagine joining this to a table of order lines or sales lines? And any other table which refers to products?[/quote-0]

    I havn't read the remainder of the thread so apologies if this crosses with another post...

    I too would use Identity as a surrogate key. You could have two items with ABCXXX as a SKU code (what the user sees) but had completely different identity keys and use the identity in the foreign tables. MUCH more tidy than having a 4 part key in every table, or even worse not having all four parts so that you cannot use rely on the key to find the correct record which is in fact what happened. Programmers kept having to bung extra logic in to make sure the right record had been found. - and ad hoc reports completely failed

  • They've got some other issue if they're slow. I've got tables with 5-column clustered keys that are lightning fast.

    They did - they were working in Progress :-P. It was the development that was slow, not the database - at least not because of the poor relational design.

  • Celko, I'm really not sure where to start....

    I fear for the next generation of business analysts and DBAs if this stuff is included in your books..

    "It is recognised, recommended and/or required by more than 50 global, industry and trade associations, including the United Nations, the U.S. Federal Government, the Australian Government and the European Commission"

    I'm a UK ltd co and I have no plans to work with ANY of these organisations. I had not even heard of DUNS until this thread. I do have a VAT number (not all business do), I have a registration number at Companies House (but not all businesses are PLC or LTD). a DUNS number may help you match the company record across systems, but it is only an attribute of the company record

    That concept is "trusted source" which appears in security and data integrity. Can I validate a DUNS ? Yes, if I can get to a D&B office. Can I validate a GUID? Not without going to the one machine on earth that created it.

    I create a dummy place holder with alpha characters, and give them X days to get a DUNS. That is what DELL did with me.

    This one, above all others scares the hell out of me. One one hand you say you are going to use DUNS as the PK (and therefore the record pointer - Foreign Key - in related tables; THEN you say you are going to insert a temporary RANDOM value and replace it later. you now have to make sure you update set DUNS = newDUNS where DUNS = OldDuns in who knows how many tables. The only thing I can think of is we have different meanings for primary and foreign keys - I certainly hope so.

    I've never heard more stupid idea (email address for peopel) than that, sorry Professor.

    Damn! Better tell Amazon, EBay, and the rest of internet commerce about this. As it works out, more people have emails than telephones today and they keep them longer.

    Sorry. I am pretty sure that an organisation the size of Amazon does not use my email address as a PK to my account. I have several email addresses registered with them and can change any of the at any time. Using an email address as the PK on my record would be as stupid a design as having fields email1, email2, email3. emails will be in a separate table (probably with its own Identifier field, a foreign key that holds my accountID, the email address and a field to flag it as my preferred address (at a miniumum, I expect there are others)

    (Bar-codes are not guaranteed to be globally unique and never re-used).

    You might want to tell ISO about this revelation.

    A bar code is only an encoding of a string to make it readable by a barcode reading device. EPCs and EANs are registered and yes I believe that they are unique. I know for a fact - cos I had to try and stock take them - that NIKE golf caps in different colours have the same EAN barcode, so we had to create our own barcodes for use at the EPoS system for stock keeping.

    The why I have been saving bad databases for the past 25+ years? Let me answer that; because of bad programmers, who have no idea how to do a simple schema design. I am so glad that you are not a doctor or an engineer with this attitude.

    Sort of with you on that one.. Programmers (as a huge whole sweeping generalisation) should NOT be allowed to design the database schema, that is the job of the systems/business analyst who has worked closely with the business to model the real-world processes, entities and data flows using UML, BPMN or some other modelling process.

Viewing 15 posts - 31 through 45 (of 125 total)

You must be logged in to reply to this topic. Login to reply