February 6, 2023 at 4:17 pm
CreateIndexNonclustered wrote:I agree that you should only store attributes as numeric types if you plan on doing math on them, but if you are using only surrogate keys as you should be, no one should ever see the primary key.
I guess the question is whether the professorID is the ID number of the professor or if it is only the identity of the record in the table.
If it is the natural ID of the professor, use a string.
If it is only a surrogate key identifying the record, there is a reason to use integer, it is 8 bits smaller than char(10), will query faster and consume less memory. Granted char(10) has more possible values, but as a DBA do you really want to have to determine whether an index value is a tab or a space, or figure out if your key is ` or ' ?
Again, it's just impractical to use a string rather than a numeric, whether you do math on the value or not.
The professorID is likely both the natural ID for the professor and the identity of the row. Suppose you do use a string: how, then, do you determine the next value to be used?
Also, the overall performance of an int is vastly better. An int is at least 6 bytes smaller than a char(10) (with no compression, even less with compression).
And, to prevent non-numeric chars, you'll have to add CHECK conditions for a string ID that will even more complexity and overhead.
Sometimes a generic rules make sense in and of itself, but not in practical usage.
The practicality you speak of is based on bad premise.
IF you were to use a natural key, SQL server should not be generating the key, it would be getting inserted from somewhere else. The value of the key is important and significant in a way that SQL server could not understand.
Using an integer for an employee ID that is text because it is more efficient for joins, harms query ability in several scenarios.
(I did my math wrong earlier, an integer is 32 bits, char(10) is 80 bits)
If you use an integer because it is easier to increment and more efficient to join, two things are sacrificed for the convenience of the developer and database engine at the cost of non-optional business requirements of creating employee/professor Ids that are non-arbitrary, arbitrarily, and making all necessary string functions impossible without type conversions.
The real take away is, never use natural keys for table relations.
February 6, 2023 at 7:31 pm
>> The professorID is likely both the natural ID for the professor and the identity of the row. <<
We used to call people who used the identity table property as an identifier "ID-iots" because it missed some of the basic parts of RDBMS. The rows in a table have no ordering. They are not located by position inside the database schema. They are located by key. The key was defined as a subset of columns in each row that is unique and has no NULLs. This is straight Dr. Codd RDBMS and is usually covered in the first week of a decent class.
The professor_id is probably a tax identification number of some kind. This would give it the two things we want an identifier; validation and verification. We will know the identifier is valid if it is string by using a regular expression. This is why every standardized ISO and ANSI encoding I've see over the last 30+ years follows the Unicode convention of using Latin alphabet, digits and a limited range of punctuation. Thanks to punchcards and fixed length data fields in COBOL filesystems, the vast majority of these standards are fixed length.
Validation is done either by a trusted authority, or a process. An example of a trusted authority would be the taxing authority, to whom we are paying part of this professor's salary. His tax identification would be done by table lookup or a combination of table lookup and a process to see that his tax id is correctly assigned to this guy.
The use of an identity table property is wrong because it's not an attribute of the entity being modeled. Again, this is straight Dr. Codd.
Did you ever get into Jewish mysticism? A Kabbalah number is that which is assigned by God to everything in his creation and it's written out in Hebrew characters (which are also numbers). If you know the true name of God, then you can invoke him with his Kabbalah number and a proper ceremony and since God is one thing, not divided into many roles like his creations, He will come!
Unfortunately, in the rule of RDBMS, if you try to represent the same entity in different SQL servers, you get a different value for the identity property on each machine. The analogy I prefer is that of a parking garage. Each car has a unique identifier or key in the form of its VIN. But if you really want to save space, just use the parking space number in the garage. It works great locally! You've got a ticket with that number on it. You verified that it goes to the particular space that you left your car in. But did you notice if you try to use that parking space number on your insurance forms, a bill of sale, a traffic ticket, or anything, meaningful, it is useless.
And after all of this messing around, did you notice that you still have to have the real key in the database? TheID-iots are usually using identity or GUID to mimic a pointer chain in a non-RDBMS data model.
>> Suppose you do use a string for the ID: how, then, do you determine the next value to be used?<<
You said something that makes no sense in a relational model. Remember that this is a set oriented language, not one based on sequential file structures. The concept of "next" doesn't exist. If you want to do a sequence of some sort, you should be using a CREAT SEQUENCE statement and have designed that particular data element properly, instead of depending on the current hardware to do it for you.
Please post DDL and follow ANSI/ISO standards when asking for help.
February 6, 2023 at 8:10 pm
>> The professorID is likely both the natural ID for the professor and the identity of the row. <<
The professor_id is probably a tax identification number of some kind. ... Validation is done either by a trusted authority, or a process. An example of a trusted authority would be the taxing authority, to whom we are paying part of this professor's salary. His tax identification would be done by table lookup or a combination of table lookup and a process to see that his tax id is correctly assigned to this guy.
The use of an identity table property is wrong because it's not an attribute of the entity being modeled. Again, this is straight Dr. Codd.
And after all of this messing around, did you notice that you still have to have the real key in the database? TheID-iots are usually using identity or GUID to mimic a pointer chain in a non-RDBMS data model.
>> Suppose you do use a string for the ID: how, then, do you determine the next value to be used?<<
You said something that makes no sense in a relational model. Remember that this is a set oriented language, not one based on sequential file structures. The concept of "next" doesn't exist. If you want to do a sequence of some sort, you should be using a CREAT SEQUENCE statement and have designed that particular data element properly, instead of depending on the current hardware to do it for you.
I'd say ZERO chance that the professor id is some type of tax number. The security implications of that alone would prevent it. For many, their tax number would be their Soc Sec Number -- only an IDiot would want to use that a key that the entire IT staff would be privy to. Unless they were issued a fed tax number (EIN), but I don't see why most university profs would need those.
If you imagine that there is some magic group issuing some global number for university professors, I guess you could claim to use that as a key. But in the real world, that just doesn't exist.
No one is trying to mimic a pointer chain. No modern database person even thinks of those any more. You need to stop making that ancient claim. It's just silly.
The professor id is almost certainly just a sequential number assigned by the university. It's issued at the state level, at highest, since most universities are controlled at the state level. I can't imagine there's any U.S.-wide authority issuing numbers (other than a fed tax number, which was dealt with above).
Again, it's the theoretical vs. the practical. If you waited until there was a universal, verified id / number for all professors at all universities, you'd never be able to create a db, since such an id simply does not exist.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
February 6, 2023 at 10:34 pm
>> I agree that you should only store attributes as numeric types if you plan on doing math on them, .. <<
I agree and would add a caveat. Make sure you use the right kind of numeric data type for the data. Decades ago, when I was a FORTRAN programmer, we spent the better part of two weeks learning about floating-point arithmetic. It's a lot trickier than younger programmers think. (a + b) + c <> a + (b + c), the library functions were weird from compiler to compiler, and so forth. IBM and Univac compilers could have some really different results. There is a series of calculations known as the Gibson mix that was used to test floating-point math.
Today, I tell people to use DECIMAL(s,p) in SQL because you have better control from product to product and a lot fewer surprises. I have one job where the idiot had used floating-point for all of his numerics, including things like ZIP Codes! perhaps this was like the train station platform numbers in the Harry Potter novels.
>> .. but if you are using only surrogate keys as you should be, no one should ever see the primary key. <<
Hidden from you? Like pointer chains in non-relational databases, like Total, Image, IDMS, IMS, etc.? Did you ever work with C? One of the first rules was never do pointer arithmetic. It never ended well. The same principle applies to network databases, like the ones you're trying to build with SQL
Since the key is actually data, having datatypes, constraints, and references and it is part of your basic data model, makes a little hard to do validation and verification on it, or even to think about a join.
You need to have that data to have a valid data model, and what you seem to have called a "surrogate" is what I would call "needless unverifiable redundancy"
>> I guess the question is whether the professorID is the ID number of the professor or if it is only the identity of the record [sic] in the table. <<
I have a mania that got put on T-shirts at one point, which says, "tables are not files, rows are not records, columns are not fields" to stress the differences between a file system and SQL. Since the professors are entities in the data model, setting up pointers to their physical storage is just not RDBMS. Again, this is straight from Dr. Codd.
>> there is a reason to use integer, it is 8 bits smaller than char(10), will query faster, and consume less memory. Granted char(10) has more possible values, but as a DBA do you really want to have to determine whether an index value is a tab or a space, or figure out if your key is .. <<
Storage has not really been a problem for most applications. Since the 1970s, disk and other bits of sand are cheap and plentiful. The real problem is the access time, and we've gotten really good at avoiding indexes and other linear storage methods in favor of hashing, parallel processing, etc. for most of us, frankly, the traditional indexes (ISAM, B-Tree, B+Tree Red-Black. etc) that we inherited from filesystems are good enough.
>> Again, it's just impractical to use a string rather than a numeric, whether you do math on the value or not. <<
Wrong.
>>Suppose you do use a string: how, then, do you determine the next value to be used? <<
Again in RDBMS. There is no concept of linear ordering, so the concept of next doesn't apply.
>> Also, the overall performance of an int is vastly better. An int is at least 6 bytes smaller than a char(10) (with no compression, even less with compression). <<
Vastly? No it isn't. In modern hardware, I'm grabbing 64 to 128 bits at a time. I'm also buffering data and doing all kinds of hardware tricks to push lots of bits through my machine. I see you still use "int" instead of saying "INTEGER NOT NULL" which usually means your last procedural programming language was in the C family. He asked
>> And, to prevent non-numeric chars, you'll have to add CHECK conditions [sic: a constraint is not a condition] for a string ID that will have even more complexity and overhead. <<
Don't you try to put as many CHECK() constraints into your DDL as you possibly can? This means you have to do those checks one way, one time and have a common single point of validation for the hundreds of routines that use your data. The constraint will be validated when data enters the database; that's one time for each transaction that is involved in at most.
The most complex constraint I've ever had to write was a dihedral five check digit.. It's an ugly thing that involves its own table lookup, but it catches virtually every possible input error. I'm usually happy with just a Luhn check digit, but if you're working with data that absolutely, positively has to be right, or it will kill people, then it's worth going the extra nanoseconds for a little complexity.
Please post DDL and follow ANSI/ISO standards when asking for help.
February 6, 2023 at 11:37 pm
<<
Hidden from you? Like pointer chains in non-relational databases, like Total, Image, IDMS, IMS, etc.? Did you ever work with C? One of the first rules was never do pointer arithmetic. It never ended well. The same principle applies to network databases, like the ones you're trying to build with SQL
Since the key is actually data, having datatypes, constraints, and references and it is part of your basic data model, makes a little hard to do validation and verification on it, or even to think about a join.
You need to have that data to have a valid data model, and what you seem to have called a "surrogate" is what I would call "needless unverifiable redundancy"
>
No one as in, the application or user. The value of the key doesn't matter because it is not a property of the entity. Only SQL needs to use it to relate, and perhaps occasionally a DBA doing one-off maintenance.
<<
>> Also, the overall performance of an int is vastly better. An int is at least 6 bytes smaller than a char(10) (with no compression, even less with compression). <<
Vastly? No it isn't. In modern hardware, I'm grabbing 64 to 128 bits at a time. I'm also buffering data and doing all kinds of hardware tricks to push lots of bits through my machine. I see you still use "int" instead of saying "INTEGER NOT NULL" which usually means your last procedural programming language was in the C family.
>
It can be. Especially joining hundreds of millions or billions of rows, dealing with big cartesian products or when involving things like column store indexes. I've experienced it myself extracting data from a database that used GUIDs for primary keys, artificially generating an integer to replace the guid keys, then extracted into another table greatly sped up queries with many joins.
February 7, 2023 at 8:03 pm
I used the script you provided and it was very helpful, thank you. One issue I am running into is that the professor_id and student_id values won't appear in the contacts table despite them existing in their respective tables. Also, I will try to run the alter/ foreign key statements and this error appears:
Msg 1785, Level 16, State 0, Line 22
Introducing FOREIGN KEY constraint 'FK__ContactsN__profe__14270015' on table 'ContactsNew' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints.
Msg 1750, Level 16, State 1, Line 22
Could not create constraint or index. See previous errors.
What can I do to navigate this error message and ultimately have both professor and student ID's appear when I select * from Contacts?
February 7, 2023 at 11:05 pm
This was removed by the editor as SPAM
February 8, 2023 at 1:24 pm
The use of the prefixed "TBL_" is a design error so bad it has a name; it's called the tibble.
This is not a design error. It does not violate the normal forms. I might have preferred tblNameOfTable rather than TBL_NameOfTable. So-called tibbling helps avoid reserved words and keeps from having to identify a table as a lookup or some such to avoid another table that would otherwise have the same name but a different function.
February 8, 2023 at 2:15 pm
I would not prefix a view with tbl. In any case I rarely use views in an OLTP environment. I do make extensive use of them in my OLAP designs per Kimball's suggestion from his 3rd edition book.
February 8, 2023 at 2:57 pm
I would not prefix a view with tbl. In any case I rarely use views in an OLTP environment. I do make extensive use of them in my OLAP designs per Kimball's suggestion from his 3rd edition book.
No, of course you don't deliberately prefix a view with "tbl". You create a table, wrongly prefix it with "tbl", and then later that table needs to become a view instead. No way you're going to rename everything, so now the name is misleading from then on.
Or vice versa: a "view_whatever" needs to become a table instead.
It's a terrible idea to put "tbl" or "view" in the table name (ditto for "proc" on general principles). It is a very poor naming practice.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
February 8, 2023 at 3:05 pm
RonKyle wrote:I would not prefix a view with tbl. In any case I rarely use views in an OLTP environment. I do make extensive use of them in my OLAP designs per Kimball's suggestion from his 3rd edition book.
No, of course you don't deliberately prefix a view with "tbl". You create a table, wrongly prefix it with "tbl", and then later that table needs to become a view instead. No way you're going to rename everything, so now the name is misleading from then on.
Or vice versa: a "view_whatever" needs to become a table instead.
It's a terrible idea to put "tbl" or "view" in the table name (ditto for "proc" on general principles). It is a very poor naming practice.
I agree about not prefixing table names with anything but sometimes it could be useful to prefix views with something, maybe "v_". Then if you are looking at a complex query someone else has written you would know where the data is coming from. Microsoft prefix most of their system stored procedures with "sp_".
February 8, 2023 at 3:17 pm
ScottPletcher wrote:RonKyle wrote:I would not prefix a view with tbl. In any case I rarely use views in an OLTP environment. I do make extensive use of them in my OLAP designs per Kimball's suggestion from his 3rd edition book.
No, of course you don't deliberately prefix a view with "tbl". You create a table, wrongly prefix it with "tbl", and then later that table needs to become a view instead. No way you're going to rename everything, so now the name is misleading from then on.
Or vice versa: a "view_whatever" needs to become a table instead.
It's a terrible idea to put "tbl" or "view" in the table name (ditto for "proc" on general principles). It is a very poor naming practice.
I agree about not prefixing table names with anything but sometimes it could be useful to prefix views with something, maybe "v_". Then if you are looking at a complex query someone else has written you would know where the data is coming from. Microsoft prefix most of their system stored procedures with "sp_".
"V_" still a terrible idea.
I prefix procs I need to put in the master db with "sp_", because otherwise they don't have the "special" property that the "sp_" stands for. "sp_" does not stand for "stored procedure" as everyone assumes. It allows procs in the master db to by default run in the current db context, for example, sp_help.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
February 8, 2023 at 3:17 pm
I've been doing this a long time and I haven't found anything "terrible" about it. When I'm working with other naming conventions it can sometimes be hard to otherwise avoid a wordy table name. I don't prefix views, nor have I had a case of turning a table into a view. Not sure how that would come about. I also don't prefix stored procedures. When I've been in environments that prefix them "sp_" I let them know that they should leave that prefix to Microsoft. If stored procs must be prefixed then use usp_.
February 8, 2023 at 3:27 pm
nor have I had a case of turning a table into a view. Not sure how that would come about.
We have a few old third party systems that use the tbl convention. All of them have at least one view which starts with tbl. ie When a system evolves over a thirty year period it is quite likely that a table will be turned into a view.
Viewing 15 posts - 16 through 30 (of 35 total)
You must be logged in to reply to this topic. Login to reply