Sort already comma separated list

Question

Sort already comma separated list

Viewing 15 posts - 31 through 45 (of 69 total)

You must be logged in to reply to this topic. Login to reply

ScottPletcher SSC Guru Points: 101099 More actions · Answer 1

jcelko212 32090 - Wednesday, February 6, 2019 9:11 AM
ScottPletcher - Wednesday, January 30, 2019 11:38 AM
This topic comes up so much, I think MS should "fudge" a little on the relational model and create a type of "char array". They include not just selection against it, but order by also.
Having spent some time working on standards for languages, it's not as easy as you might think. The SQL standard string is case-sensitive, has an explicit correlation that can be changed (and can be more complicated than just single characters -- double L used to be treated as a separate letter in Spanish, etc.). In SQL, string comparisons pad out the shorter string with blanks (watch out for Arabic and Hebrew!), But the xBase languages truncate the longer string. Some languages are not case-sensitive and some are. Some have only one correlation, and others allow it to be set.
Now given a list of character strings, do we treat the string as a permutation or a combination? Can I nest lists inside lists (like we do in LISP)? What about list concatenation? Can the elements within the list have different correlations? How about different datatypes? What library functions do we need? Can I locate a sublist within the list?

Fair enough. MS could still create their own version, and explicitly specify the rules for their particular custom data type. With the advent of STRING_SPLIT and STRING_AGG, a built-in "string_array" data type doesn't seem too far-fetched to me.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

x SSC-Insane Points: 23660 More actions · Answer 2

Jeff Moden - Tuesday, February 5, 2019 7:39 PM
Almost forgot, Joe. If you want to see how what you're calling "God Numbers" are being used and why they're important, then just click on the following Microsoft article. Once there, search for "_id" (without the quotes).
https://docs.microsoft.com/en-us/sql/relational-databases/system-catalog-views/sys-objects-transact-sql?view=sql-server-2017
Heh... according to you, you helped design all this. 😉

thats a materially bad example however. When we reference database objects, we use the names, these are the "keys" as Joe describes them. For instance, if you took two separate servers and created objects in the databases using the same names, the names will match regardless of the order they are created, however, the internal "id's" used could vary in their ordinal values, there is no guarantee that the "company" table will be assigned any particular integer id value. Microsoft gets to do this, because they understand that they're not producing natural keys, but instead they have the development experience and testing expertise to assure that these numbers are internally consistant, and given the wide range of installations they support, they can't skimp on performance and for that matter they don't have to, because after all they aren't ditching the "natural" keys, ie., the object names that are subsequenly "linked" internally. Using natural keys is inherently more correct in most cases, but with all programming efficiencies they do come with a cost, they are typicall bigger than ints after all.

Gotta agree with Joe on this, its cool if nobody agrees with me, it would just reinforce how bad the database profession has become, heck if the "professionals" had the humility to produce a genuinely valid set of "best practices" based on consensus and analysis, we wouldn't be seeing these repetitive and tiresome arguments about this sort of stuff.

edit: I just thought of a great example, take a programming language with subroutines / functions. Internally, these subroutine names resolve into actual targets, and these targets will be arbitrary addresses based on what the linker has calculated. However, the "natural" key for a subroutine name is just that, the "name", in whatever namespace its a member of. The internal id's are just the addresses where the particular load happens to land the code.

So if you take the same paradigm, the internal object "ids" are just the internal representation that the actual "name" of the object "resolves" to. Think of the primary key example "product 0001", this natural looking key actually resolves into an internal row id in storage, but we would never think it wise to refer to any database row by its internal id unless we were bypassing the usual database mechanisms and going right to physical storage, like "dump page" or whatever its called.

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 3

patrickmcginnis59 10839 - Thursday, February 7, 2019 4:12 PM
Jeff Moden - Tuesday, February 5, 2019 7:39 PM
Almost forgot, Joe. If you want to see how what you're calling "God Numbers" are being used and why they're important, then just click on the following Microsoft article. Once there, search for "_id" (without the quotes).
https://docs.microsoft.com/en-us/sql/relational-databases/system-catalog-views/sys-objects-transact-sql?view=sql-server-2017
Heh... according to you, you helped design all this. 😉
thats a materially bad example however. When we reference database objects, we use the names, these are the "keys" as Joe describes them. For instance, if you took two separate servers and created objects in the databases using the same names, the names will match regardless of the order they are created, however, the internal "id's" used could vary in their ordinal values, there is no guarantee that the "company" table will be assigned any particular integer id value. Microsoft gets to do this, because they understand that they're not producing natural keys, but instead they have the development experience and testing expertise to assure that these numbers are internally consistant, and given the wide range of installations they support, they can't skimp on performance and for that matter they don't have to, because after all they aren't ditching the "natural" keys, ie., the object names that are subsequenly "linked" internally. Using natural keys is inherently more correct in most cases, but with all programming efficiencies they do come with a cost, they are typicall bigger than ints after all.
Gotta agree with Joe on this, its cool if nobody agrees with me, it would just reinforce how bad the database profession has become, heck if the "professionals" had the humility to produce a genuinely valid set of "best practices" based on consensus and analysis, we wouldn't be seeing these repetitive and tiresome arguments about this sort of stuff.
edit: I just thought of a great example, take a programming language with subroutines / functions. Internally, these subroutine names resolve into actual targets, and these targets will be arbitrary addresses based on what the linker has calculated. However, the "natural" key for a subroutine name is just that, the "name", in whatever namespace its a member of. The internal id's are just the addresses where the particular load happens to land the code.
So if you take the same paradigm, the internal object "ids" are just the internal representation that the actual "name" of the object "resolves" to. Think of the primary key example "product 0001", this natural looking key actually resolves into an internal row id in storage, but we would never think it wise to refer to any database row by its internal id unless we were bypassing the usual database mechanisms and going right to physical storage, like "dump page" or whatever its called.

Understood. So let me ask you, what is an SSN? It's a "God" number like what Joe is talking about. We just didn't happen to be the ones that created it. Same thing with DUNs. It's a "God" number just like the ones Joe is talking about.

Joe keeps ranting about how bad things like Identity columns are but has never responded with anything that would meet the requirements of a PK. I just want him to tell us what he would consider to be a good and proper candidate for a PK column (or columns) for a Customer table without using SSNs or DUNs and that would meet the requirements of being a PK column.

The example of object_id relates to what I'm talking about and another related subject. As you know, the object_id must be unique in the table and it doesn't actually matter what the numeric value is until it's assigned, just like in a Customer table. Further, the name of the object must also be unique, just like it should be in a Customer table. And, we get a warning if we try to change that name key using the likes of sp_rename because it could break things, just like it will in a Customer table. The object_id meets the requirements for being a PK just as a Customer_ID would. The name of the object does not, just like it wouldn't in a Customer table. Of course, you already know that and so I'm only saying it for Joe and anyone else that may be reading this.

For all the same reasons you talked about for "internal numbers", a Customer_ID would fit the bill as an "internal number" and makes an excellent PK.

As you say, it's cool if no one agrees with me. I just want them to explain their position especially when they accompany their argument with unrelated reasoning such as it can't be a number because it will never be used for math (which is actually incorrect, as well).

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

x SSC-Insane Points: 23660 More actions · Answer 4

Jeff Moden - Thursday, February 7, 2019 7:34 PM
patrickmcginnis59 10839 - Thursday, February 7, 2019 4:12 PM
Jeff Moden - Tuesday, February 5, 2019 7:39 PM
Almost forgot, Joe. If you want to see how what you're calling "God Numbers" are being used and why they're important, then just click on the following Microsoft article. Once there, search for "_id" (without the quotes).
https://docs.microsoft.com/en-us/sql/relational-databases/system-catalog-views/sys-objects-transact-sql?view=sql-server-2017
Heh... according to you, you helped design all this. 😉
thats a materially bad example however. When we reference database objects, we use the names, these are the "keys" as Joe describes them. For instance, if you took two separate servers and created objects in the databases using the same names, the names will match regardless of the order they are created, however, the internal "id's" used could vary in their ordinal values, there is no guarantee that the "company" table will be assigned any particular integer id value. Microsoft gets to do this, because they understand that they're not producing natural keys, but instead they have the development experience and testing expertise to assure that these numbers are internally consistant, and given the wide range of installations they support, they can't skimp on performance and for that matter they don't have to, because after all they aren't ditching the "natural" keys, ie., the object names that are subsequenly "linked" internally. Using natural keys is inherently more correct in most cases, but with all programming efficiencies they do come with a cost, they are typicall bigger than ints after all.
Gotta agree with Joe on this, its cool if nobody agrees with me, it would just reinforce how bad the database profession has become, heck if the "professionals" had the humility to produce a genuinely valid set of "best practices" based on consensus and analysis, we wouldn't be seeing these repetitive and tiresome arguments about this sort of stuff.
edit: I just thought of a great example, take a programming language with subroutines / functions. Internally, these subroutine names resolve into actual targets, and these targets will be arbitrary addresses based on what the linker has calculated. However, the "natural" key for a subroutine name is just that, the "name", in whatever namespace its a member of. The internal id's are just the addresses where the particular load happens to land the code.
So if you take the same paradigm, the internal object "ids" are just the internal representation that the actual "name" of the object "resolves" to. Think of the primary key example "product 0001", this natural looking key actually resolves into an internal row id in storage, but we would never think it wise to refer to any database row by its internal id unless we were bypassing the usual database mechanisms and going right to physical storage, like "dump page" or whatever its called.
Understood. So let me ask you, what is an SSN? It's a "God" number like what Joe is talking about. We just didn't happen to be the ones that created it. Same thing with DUNs. It's a "God" number just like the ones Joe is talking about.
Joe keeps ranting about how bad things like Identity columns are but has never responded with anything that would meet the requirements of a PK. I just want him to tell us what he would consider to be a good and proper candidate for a PK column (or columns) for a Customer table without using SSNs or DUNs and that would meet the requirements of being a PK column.
The example of object_id relates to what I'm talking about and another related subject. As you know, the object_id must be unique in the table and it doesn't actually matter what the numeric value is until it's assigned, just like in a Customer table. Further, the name of the object must also be unique, just like it should be in a Customer table. And, we get a warning if we try to change that name key using the likes of sp_rename because it could break things, just like it will in a Customer table. The object_id meets the requirements for being a PK just as a Customer_ID would. The name of the object does not, just like it wouldn't in a Customer table. Of course, you already know that and so I'm only saying it for Joe and anyone else that may be reading this.
For all the same reasons you talked about for "internal numbers", a Customer_ID would fit the bill as an "internal number" and makes an excellent PK.
As you say, it's cool if no one agrees with me. I just want them to explain their position especially when they accompany their argument with unrelated reasoning such as it can't be a number because it will never be used for math (which is actually incorrect, as well).

I've seen some of those concerns with purchase order number. Sometimes you have to accept what you get, and with the customer id, they form a key that requires uniqueness. The closest you can get is to allow enough characters for the wide varieties of values you're going to get.

This sort of requirement I bet is pretty common when interchanging data. This doesn't necessarily preclude an integer as the primary key of a clustered table, heck it can be arbitrary and increasing, and narrow the lookup column for the nonclustered indexes. Using it as a business key just isn't going to be a universal thing. If you can get away with it, thats fine. Heck, time and again I've even seen end users take over a column or screen field meant for one thing and used it for another. Obviously a set back when we subsequently want to use it for the original purpose, but thats what you get.

If you want to communicate however, you probably have to share these keys, and if it isn't purely numeric then you have to do whats required to store and communicate the information.

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 5

patrickmcginnis59 10839 - Thursday, February 7, 2019 8:25 PM
I've seen some of those concerns with purchase order number. Sometimes you have to accept what you get, and with the customer id, they form a key that requires uniqueness. The closest you can get is to allow enough characters for the wide varieties of values you're going to get.
This sort of requirement I bet is pretty common when interchanging data. This doesn't necessarily preclude an integer as the primary key of a clustered table, heck it can be arbitrary and increasing, and narrow the lookup column for the nonclustered indexes. Using it as a business key just isn't going to be a universal thing. If you can get away with it, thats fine. Heck, time and again I've even seen end users take over a column or screen field meant for one thing and used it for another. Obviously a set back when we subsequently want to use it for the original purpose, but thats what you get.
If you want to communicate however, you probably have to share these keys, and if it isn't purely numeric then you have to do whats required to store and communicate the information.

Yep... totally agreed. That's why I said "internally". Sometimes it does go external but, for internal use, it's ok to use whether you're doing math on it or not.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

sgmunson SSC Guru Points: 110640 More actions · Answer 6

paul s-306273 - Tuesday, February 5, 2019 1:21 AM
Just an observation - Jeff and Lynn frequently respond to Joe's posts.
Anybody else think they 'follow' Joe?

Wouldn't waste my time without getting paid really well, to do something that would waste my time, pretty much always.
He's almost always very short on facts and very obstinate and haughty defending an awful lot of stuff that when it comes
to getting stuff done, is usually quite practical as well as appropriate, but heaven forbid it not fit his standards 100% and
he has to have a few cows and a half every time. Why anyone wastes the time responding to anything he says is above
my pay grade. I'll never understand it. He's just not going to change.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

jcelko212 32090 SSCrazy Eights Points: 9303 More actions · Answer 7

Jeff Moden - Tuesday, February 5, 2019 6:04 PM

jcelko212 32090 - Tuesday, February 5, 2019 2:53 PM

Jeff Moden - Monday, February 4, 2019 2:28 PM
jcelko212 32090 - Monday, February 4, 2019 11:16 AM
Jeff Moden - Sunday, February 3, 2019 8:08 PM
jcelko212 32090 - Friday, February 1, 2019 1:46 PM
No problem, Joe. All you have to do is tell us what the "unique identifier" (the thing that people are currently using that you're claiming no math will ever be used on) should be for a Customer table? Once you've told us that, please explain the impact that will have not only on the clustered index (which is what I presume you would assign such a key to) but every non-clustered index. Once you're done with that, do the same for the related address and phone number tables. Then I'll do the same for the thing you think has no calculations done against it. 😉
As you say, no vague unsupported statements on your part please because, so far, that's mostly all you've ever done on this subject.

I think I've answered this several times in the past. I do not believe in Kabbalah number magic. There is no such thing as a universal identifier that God put into everything in the universe. There is no such thing as a generic all-purpose "customer"; one enterprise might work with individuals, another only with corporations, and an enterprise might be a cash business which has sales and no customers identified whatsoever. If I'm working with corporations, my customer identifier might be the DUNS (actually required by law. In some places). Or it might be a tax identification number of both individuals, lawful persons, and corporations.

In short, your question makes no sense. The customer exists only within a logical data model of the enterprise, not as some Kabbalah number assigned by God or dependent on physical storage on one machine.

For the phone numbers. I've got the international standard E.164 family. For addresses within the United States, I would use CASS. I can buy software to validate both of these standards and I can verify them against third-party databases. If I need to get addresses from overseas, then I need to find out their countries standards are.

Heh... yes you have. Pretty much the same thing you've answered above, which is the usual rhetoric . Using individual Tax Ids is wrong at so many levels not to mention that they can be changed on request. Duns might be right but only if they have a DUNs and they actually know what it is. Have you actually ever created a real database that used Duns?

So, if you can't use a Tax ID and you can't use a DUNs, what do YOU use?

And phone numbers are absolutely stupid to use as a customer identifier because they violate the first and most important rule for Primary Keys... they must be immutable and phone numbers are anything but immutable. The same holds true for addresses. They cannot be used as a PK because they are not immutable. They're also not necessarily unique.

To this date and with the possible exception of a DUNs number, which can't be used for all customer types, you've failed to produce a viable recommendation for Primary Keys that will meet all of the requirements of Primary Keys and stand the test of time. And I don't need to produce documentation on what those requirements are because, of all the people in the world, you should know already.

>> using individual tax ids is wrong at so many levels not to mention that they can be changed on request. <<

I guess thatâ€™s why the IRS, state taxing agencies, etc. have something else. What is it? Is a tax identifier required by law? I keep stressing validation and verification as properties that we want an identifier. Verification can be done by having a trusted source, such as the taxing agency. This is how you do it with your local DMV and VIN numbers.

As an aside, the GTIN used to be reusable within a manufacturerâ€™s product line. They just change the rules this year. It says that a trade item will be associated permanently with its GTIN.

>> DUNS might be right but only if they have a DUNS and they actually know what it is. <<

You can get a DUNS for free, or you can pay a little bit to get it expedited. I know I had to have one to do consulting work at Dell Computers here in Austin.

>> have you actually ever created a real database that used duns? <<

Yes. It was a consulting job for a company that makes small aircraft parts. They had to be able to supply a complete providence of the parts, and all the suppliers were required by regulation to have such a number.

>> So, if you can't use a Tax ID and you can't use a DUNs, what do YOU use? <<

youâ€™re trying to play the â€œyes, but..â€ game. This was an avoidance procedure. We learned in our Transaction Analysis (ever read â€œGames People Playâ€?) classes.

>> And phone numbers are absolutely stupid to use as a customer identifier because they violate the first and most important rule for Primary Keys... they must be immutable and phone numbers are anything but immutable. <<

I never said immutable. I said verifiable. Getting into another desirable property of data in general; stability, with immutability being the extreme of this property. At the other extreme, I have a friend who used to work for Wall Street firm whose identifiers were never stable, from minute to minute. He had to plug a dongle into his computer to keep up with the constantly changing identifiers on their system. It was something no human being could do so they had to have software and hardware. This made a very secure system, which was the idea.
Actually, phone numbers have gotten much more stable than they used to be. I can move my phone number from one mobile company to another, something I couldnâ€™t do we only had copper wires. Weâ€™ve also found out that email addresses are remarkably stable. My wife works for a company that provides online access to television shows. Their customers are identified by their email. Validation for an email address is very easy with a horrible looking regular expression. Verification is done by sending a message to the email itself. The email also shows up in the person since the company a message.

>> The same holds true for addresses. They cannot be used as a PK because they are not immutable. They're also not necessarily unique. <<

the rule in the direct mail industry is that a mailing list of any size will have a minimum of 10% incorrect addresses. People move. Street names change. Etc. it also depends what your audience is like. Senior citizens tend not to move, while younger people do. However, if you want to get to the lot and not a person, then this works pretty good. We also have the CASS standards for validation.it was never intended as an identifier for a person
.
>> And I don't need to produce documentation on what those requirements are because, of all the people in the world, you should know already. <<

humor me and quit playing the â€œyes, butâ€ game. Or better yet, tell me what the right answer is. With the thing that will identify all customers of all types for all time. The Kabbalah number again!

Please post DDL and follow ANSI/ISO standards when asking for help.

jcelko212 32090 SSCrazy Eights Points: 9303 More actions · Answer 8

>> But to create a primary key with an int or bigint all you need to do is define the table like this:

CREATE TABLE My_Table (my_table_id INTEGER IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED, col1 NVARCHAR(20) NOT NULL, col2 NVARCHAR(20) NOT NULL );
Then you can forget about it, no need to invent some text column which needs a complicated algorithm to work out the value. <<

Your solution of using an IDENTITY property (not a column!) has all kinds of problems. It is a table property that belongs to only one table, and only one implementation of one particular SQL product on only this machine. It has nothing whatsoever to do with the entities being modeled. It's a physical locator, that Sybase implemented because their first products were on UNIX based machines. The file system was based on a magnetic tape which was mapped over to a sequential file on the early disk drives.

Think about a VIN number. I can validate it with a regular expression. I can verify it by going to my local DMV, Carmax, the manufacturer, and any other number of trusted sources. That's why my auto insurance company uses it. This is why the same car can be referenced in so many different databases. If worse comes to worst, I can go over and physically take the VIN number off the dashboard manually.

What you're doing is using a physical locator and building pointer chains. Essentially, your IDENTITY is like the parking space number in a single garage building. Oh, since it is a table level property and cannot be a valid column and it is not an attribute of whatever the table is modeling. Therefore, by definition, it can never be part of a key.

>> You also don't usually need to display the primary key anywhere, you just use it for joins and uniqueness. <<

Actually, I found it to be quite the opposite. You're not displaying your locator because it has nothing to the data model. I'm constantly displaying the keys because keys are part of the data. A very important attribute of each entity. Can you really imagine not having a VIN an automobile database or failing to show it?

>> If you had a database with 400 tables in it each one having a text column for the primary key it would become a total nightmare to maintain. <<

First of all, after 35 years of doing databases, I have seldom had any schema with that many tables. It's usually a sign of a bad design (lots of attribute splitting, mimicking individual tape files with tables, etc.). And no, it isn't.

>> There also is some maths done on these columns: 1 is added to it to get the next value.<<

The next value of what? It's an identifier, which is on a nominal scale by definition. Please tell me you don't believe that the square root of your credit card number has some meaning. Oh,wait a minute! You're still thinking of pointers and sequential file allocations on magnetic tape, not RDBMS.

Please post DDL and follow ANSI/ISO standards when asking for help.

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 9

jcelko212 32090 - Friday, February 8, 2019 12:42 PM

Jeff Moden - Tuesday, February 5, 2019 6:04 PM

jcelko212 32090 - Tuesday, February 5, 2019 2:53 PM
Jeff Moden - Monday, February 4, 2019 2:28 PM
jcelko212 32090 - Monday, February 4, 2019 11:16 AM
Jeff Moden - Sunday, February 3, 2019 8:08 PM
jcelko212 32090 - Friday, February 1, 2019 1:46 PM
No problem, Joe. All you have to do is tell us what the "unique identifier" (the thing that people are currently using that you're claiming no math will ever be used on) should be for a Customer table? Once you've told us that, please explain the impact that will have not only on the clustered index (which is what I presume you would assign such a key to) but every non-clustered index. Once you're done with that, do the same for the related address and phone number tables. Then I'll do the same for the thing you think has no calculations done against it. 😉
As you say, no vague unsupported statements on your part please because, so far, that's mostly all you've ever done on this subject.
I think I've answered this several times in the past. I do not believe in Kabbalah number magic. There is no such thing as a universal identifier that God put into everything in the universe. There is no such thing as a generic all-purpose "customer"; one enterprise might work with individuals, another only with corporations, and an enterprise might be a cash business which has sales and no customers identified whatsoever. If I'm working with corporations, my customer identifier might be the DUNS (actually required by law. In some places). Or it might be a tax identification number of both individuals, lawful persons, and corporations.
In short, your question makes no sense. The customer exists only within a logical data model of the enterprise, not as some Kabbalah number assigned by God or dependent on physical storage on one machine.
For the phone numbers. I've got the international standard E.164 family. For addresses within the United States, I would use CASS. I can buy software to validate both of these standards and I can verify them against third-party databases. If I need to get addresses from overseas, then I need to find out their countries standards are.