February 4, 2019 at 11:16 am
Jeff Moden - Sunday, February 3, 2019 8:08 PM
jcelko212 32090 - Friday, February 1, 2019 1:46 PM: And identifiers can never be numeric because you don't do any math on them.
Good lord. I can't believe that you're still on that kick, Joe. I also can't believe that an intelligent person would give that as a reason to not use numerics as identifiers.
Why don't we just go back to basics? How much of your first data modeling class do you remember?
An identifier is a nominal scale. That means each of its values is distinct, it has no ordering, no origin and it matches one identifier to one entity. I'm going to assume that you don't think that identifiers are a ratio scale!
Since it is a nominal scale, we don't do math on it. I'll go ahead and repeat my old joke about the kid telling the teacher that 5×6 = 30, and when asked how he arrived at that conclusion, he tells her that he took the two previous wrong answers from other students, divided Thursday by red, and got 30. I use this joke a lot when I'm teaching classes to explain datatypes and the operations that are valid on them.
SQL is a strongly typed language with orthogonality. That means anywhere I could use a constant of some data type, I can replace it with an expression of that same data type. Thus, in your world, an employee identifier of 4 could be replaced with (2+2), and it would make sense and cause no problems whatsoever. But it doesn't. That's why we want to see a string "004" instead. It also means that each type has a set of valid operations that can be done on it. Thus, in your world, the square root of your credit card number would make sense. That is the nature of a numeric data type and its allowed computations.
Now, from a practical viewpoint, setting aside mathematical foundations and proper design, a string lets us add regular expressions and check digits to validate an identifier. Numerics do not. Okay, Jeff, it's your turn to come back to the logical argument with a good theoretical base and some practical suggestions. No vague unsupported statements, please.
Please post DDL and follow ANSI/ISO standards when asking for help.
February 4, 2019 at 12:37 pm
jcelko212 32090 - Monday, February 4, 2019 11:16 AM
It's stored as numeric because it's more efficient, both in terms of size and in terms of built-in validity checking. Some of our identifiers require a bigint value. That could take 16+ bytes of char data, but only 8 bytes of numeric data. Also, if we make the column varchar, then we need complex CHECK(s) to make sure that only valid digits are entered into the column. Finally, it's also vastly easier to assign the next value to a numeric value than to a string one.
Just because a column is numeric doesn't mean that every arithmetic calc on it will yield a practical, usable result. Item_quantity would be numeric, but a square root of it is still meaningless.
The check-digit validation will just require an additional step for numeric values: first convert the entire value to a string. The rest of the process should be identical to what it would be for a value stored as char.
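As a concrete illustration of that extra step, here's a minimal sketch assuming a Luhn-style check digit (just one common scheme; the real validation would depend on the identifier): cast the numeric value to a string, then run the digit check against the string.
-- Hypothetical example: validate a BIGINT identifier with a Luhn check digit.
DECLARE @id BIGINT = 79927398713;                  -- sample Luhn-valid number
DECLARE @s VARCHAR(20) = CAST(@id AS VARCHAR(20)); -- the "convert to string" step
DECLARE @i INT = LEN(@s), @pos INT = 0, @sum INT = 0, @d INT;
WHILE @i >= 1
BEGIN
    SET @d = CAST(SUBSTRING(@s, @i, 1) AS INT);
    IF @pos % 2 = 1                                -- double every second digit from the right
        SET @d = @d * 2 - CASE WHEN @d >= 5 THEN 9 ELSE 0 END;
    SET @sum = @sum + @d;
    SET @i = @i - 1;
    SET @pos = @pos + 1;
END;
SELECT CASE WHEN @sum % 10 = 0 THEN 'valid' ELSE 'invalid' END AS LuhnCheck;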
Can't we be practical in these situations rather than slavishly adhering to theoretical rules without any usefulness in context?
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
February 4, 2019 at 12:44 pm
PSB - Wednesday, January 30, 2019 7:44 AM: Hi, I have values like this in a column which are basically the exact same thing if sorted:
ORM;PS;SUP
ORM;SUP;PS
I want to have it as follows: ORM;PS;SUP
CREATE TABLE #TEST (ID INT, Column1 VARCHAR(20))
INSERT INTO #TEST ( ID,Column1)
SELECT 1,'ORM;PS;SUP' UNION
SELECT 2,'ORM;SUP;PS'
SELECT * FROM #TEST
DROP TABLE #TEST
If a correctly normalized table is not an option then I suggest you derive one using an indexed view that splits the string for you. Using this sample data and DDL to create a persisted tally table (1-based):
-- Sample data into persisted table
CREATE TABLE dbo.Test (ID INT, Column1 VARCHAR(20));
INSERT INTO dbo.Test (ID,Column1) VALUES (1,'ORM;PS;SUP'),(2,'ORM;SUP;PS');
-- Tally Table Code (if you don't have one)
IF OBJECT_ID('dbo.tally') IS NOT NULL DROP TABLE dbo.tally;
CREATE TABLE dbo.tally (N INT NOT NULL);
INSERT dbo.tally(N)
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT 1))
FROM sys.all_columns a, sys.all_columns b;
ALTER TABLE dbo.tally
ADD CONSTRAINT pk_cl_tally PRIMARY KEY CLUSTERED(N)
WITH FILLFACTOR=100;
ALTER TABLE dbo.tally
ADD CONSTRAINT uq_dbo_tally UNIQUE NONCLUSTERED(N);
... you could then create an indexed view like this:
CREATE VIEW dbo.TestSplit WITH SCHEMABINDING AS
SELECT t.ID, item =
SUBSTRING
(
t.Column1,
tt.N+SIGN(tt.N-1),
ISNULL(NULLIF((CHARINDEX(';',t.Column1,tt.N+1)),0),LEN(t.Column1)+1)-(tt.N)-SIGN(tt.N-1)
)
FROM dbo.Test AS t
CROSS JOIN dbo.tally AS tt
WHERE tt.N <= LEN(t.Column1)
AND (tt.N = 1 OR SUBSTRING(t.column1,tt.N,1) = ';');
GO
CREATE UNIQUE CLUSTERED INDEX uq_cl__testSplit ON dbo.TestSplit(Id,Item)
GO
Now you don't need to split the string each time you do what you're doing. I skipped how you would concatenate this as others have already demonstrated how to do so. The key here is the "split" is pre-aggregated.
Updated to Include Concatenation:
WITH base AS (
SELECT ID
FROM dbo.testSplit AS t WITH (NOEXPAND)
GROUP BY t.ID)
SELECT
base.Id,
STUFF((
SELECT ';'+s2.item
FROM dbo.testSplit AS s2 WITH (NOEXPAND)
WHERE base.Id = s2.Id
ORDER BY s2.item FOR XML PATH('')
),1,1,'')
FROM base;
-- Itzik Ben-Gan 2001
February 4, 2019 at 2:28 pm
jcelko212 32090 - Monday, February 4, 2019 11:16 AM
Jeff Moden - Sunday, February 3, 2019 8:08 PM
jcelko212 32090 - Friday, February 1, 2019 1:46 PM: And identifiers can never be numeric because you don't do any math on them.
Good lord. I can't believe that you're still on that kick, Joe. I also can't believe that an intelligent person would give that as a reason to not use numerics as identifiers.
Okay, Jeff, it's your turn to come back to the logical argument with a good theoretical base and some practical suggestions. No vague unsupported statements, please.
No problem, Joe. All you have to do is tell us what the "unique identifier" (the thing that people are currently using that you're claiming no math will ever be used on) should be for a Customer table? Once you've told us that, please explain the impact that will have not only on the clustered index (which is what I presume you would assign such a key to) but every non-clustered index. Once you're done with that, do the same for the related address and phone number tables. Then I'll do the same for the thing you think has no calculations done against it. 😉
As you say, no vague unsupported statements on your part please because, so far, that's mostly all you've ever done on this subject.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 5, 2019 at 1:21 am
Just an observation - Jeff and Lynn frequently respond to Joe's posts.
Anybody else think they 'follow' Joe?
February 5, 2019 at 12:32 pm
Jonathan AC Roberts - Wednesday, January 30, 2019 12:50 PM
Luis Cazares - Wednesday, January 30, 2019 11:18 AM: This could be a faster alternative if we consider the measurements used in the performance tests for [DelimitedSplit8K] and [DelimitedSplit8K_Lead].
CREATE FUNCTION [dbo].[SortDelimitedStringN4K]
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT 0 UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT t.N+1
FROM cteTally t
WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)
)
--===== Do the actual split, sort and concatenate again. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT STUFF( (SELECT @pDelimiter + SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - 1),0)-s.N1,4000))
FROM cteStart s
ORDER BY 1
FOR XML PATH(''), TYPE).value('./text()[1]', 'varchar(max)'), 1, 1, '') AS SortedString
;
GO
Note that the way of calling it is a bit different
SELECT sds.SortedString, *
FROM #TEST
CROSS APPLY [dbo].[SortDelimitedStringN4K](Column1,';') sds
Yes, an Inline Table-valued Function is much faster than a Scalar-valued Function for this.
As of SQL Server 2017 - there is the STRING_AGG built-in function: https://docs.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql?view=sql-server-2017
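For the original sorted-string requirement in this thread, a minimal sketch on SQL Server 2017+ (reusing the #TEST sample table posted earlier): split each value with STRING_SPLIT, then rebuild it in sorted order with STRING_AGG.
SELECT t.ID,
       STRING_AGG(s.value, ';') WITHIN GROUP (ORDER BY s.value) AS SortedString
FROM #TEST AS t
CROSS APPLY STRING_SPLIT(t.Column1, ';') AS s
GROUP BY t.ID;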
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
February 5, 2019 at 12:57 pm
>> It's stored as numeric because it's more efficient, both in terms of size and in terms of built-in validity checking. <<
Saving storage space today is not the issue that it was back when we were making the Y2K mistakes. And I disagree with you about the validity checking. The ISO standards all use a subset of Latin characters, digits, and punctuation. This was done so that Unicode could carry them into every written language on earth. It also means that we can use very simple regular expressions for validation.
>> Some of our identifiers require a bigint value. That could take 16+ bytes of char data, but only 8 bytes of numeric data. <<
And then you have to take your bigint value, unpack it, convert it to characters, and then display it. I wish I could remember who said this, but: "store data the way you use it; use data the way it's stored".
>> Also, if we make the column VARCHAR(n), then we need complex CHECK(s) to make sure that only valid digits are entered into the column. <<
Perhaps this is why most of the ANSI/ISO standards are fixed length? Perhaps this is why numeric encoding schemes often come padded with leading zeros? I also think that from a human viewpoint, "0004" looks more like an encoding than just plain old integer 4.
The worst VARCHAR(n) encoding I can think of is British postal codes. They are an alphanumeric mix based on physical post offices that no longer exist. In fact, they are so bad that they've been replaced for bulk mailers with the five-digit Mailsort code.
I agree that badly designed encoding schemes add complexity. The problem is not the storage or datatypes; it's a fundamental design problem.
>> Finally, it's also vastly easier to assign the next value to a numeric value rather than a string one. <<
You've just snuck in something that you probably don't even think about. Not all identifiers are sequences! In fact, the Social Security Number, which used to have a sequence for its last four positions, stopped doing that several years ago; the SSA doesn't do that anymore because it's too dangerous. But that nasty old measurement theory says nominal scales have no ordering, no origin, and no meaningful computations, so you wouldn't even have thought of this.
>> Just because a column is numeric doesn't mean that every arithmetic calc on it will yield a practical, usable result. Item_quantity would be numeric, but a square root of it is still meaningless. <<
>> The check-digit validation will just require an additional step for numeric values: first, convert the entire value to a string. The rest of the process should be identical to what it would be for a value stored as char. <<
You can't argue one minute that your numerics are making computations faster and then argue for an extra step that would never have been needed if they had been strings in the first place.
My Texas driver's license identifier is eight digits. If I store that license number as an INTEGER, I will use 4 bytes. That's not much of a saving, and it's not worth the extra burden. Every time I read that integer, I have to put it into a string and pad that string with leading zeros. That's a little module of code. Then I need a DDL check constraint to make sure nobody puts in a zero, an out-of-range value, or a negative number. I love checking for that when someone's used a numeric identifier; the data is almost always messed up because people were lazy. My character-string license identifier has another advantage: I can change the regular expression I use to validate my string when Texas changes the layout of the license numbers to include alphabetic characters or punctuation, or adds a new validation rule. Look at the difference between ISBN-10 and ISBN-13.
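A minimal sketch of that approach, with hypothetical table and constraint names: the format rule lives in one named CHECK constraint, so only the constraint changes if the layout ever changes.
CREATE TABLE dbo.Drivers
(
    license_nbr CHAR(8) NOT NULL PRIMARY KEY,
    CONSTRAINT ck_license_nbr_format
        CHECK (license_nbr LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]')
);
-- If the state later allows a leading letter, swap the rule without touching the column:
ALTER TABLE dbo.Drivers DROP CONSTRAINT ck_license_nbr_format;
ALTER TABLE dbo.Drivers ADD CONSTRAINT ck_license_nbr_format
    CHECK (license_nbr LIKE '[A-Z0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]');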
Please post DDL and follow ANSI/ISO standards when asking for help.
February 5, 2019 at 1:29 pm
jcelko212 32090 - Tuesday, February 5, 2019 12:57 PM
The DDL check for a numeric value is extremely simple: CHECK(value > 0). The check DDL for your char(8) will have to be at least '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'. That's on top of doubling the column size ... for every id in every table in every db. No, it's just not worth it.
Ok, maybe not every single id is sequential -- nice dodge there -- but the vast majority of them are. How else do you quickly and conveniently get an unused number? SEQUENCEs work well for that. You'd need a separate table to get the next value(s) for a char column. Ugh, what overhead and hassle.
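For the numeric side, a minimal sketch with hypothetical object names (SQL Server 2012+ SEQUENCE syntax) of how the next key value is handed out with no helper table and a one-line validity check:
CREATE SEQUENCE dbo.CustomerIdSeq AS BIGINT START WITH 1 INCREMENT BY 1;
CREATE TABLE dbo.Customer
(
    customer_id   BIGINT NOT NULL
        CONSTRAINT df_customer_id DEFAULT (NEXT VALUE FOR dbo.CustomerIdSeq)
        CONSTRAINT pk_customer PRIMARY KEY,
    customer_name NVARCHAR(100) NOT NULL,
    CONSTRAINT ck_customer_id CHECK (customer_id > 0)  -- the entire validity check
);
-- The next unused id is assigned automatically:
INSERT INTO dbo.Customer (customer_name) VALUES (N'Acme Corp');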
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
February 5, 2019 at 1:30 pm
jcelko212 32090 - Tuesday, February 5, 2019 12:57 PM
But to create a primary key with an int or bigint, all you need to do is define the table like this:
CREATE TABLE myTable
(
myTableId int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
Col1 nvarchar(20) NOT NULL,
Col2 nvarchar(20) NOT NULL
);
Then you can forget about it; there's no need to invent some text column that needs a complicated algorithm to work out its value. You also don't usually need to display the primary key anywhere; you just use it for joins and uniqueness.
If you had a database with 400 tables in it each one having a text column for the primary key it would become a total nightmare to maintain.
There is also some maths done on these columns: 1 is added to get the next value.
February 5, 2019 at 2:51 pm
Jeffrey Williams 3188 - Tuesday, February 5, 2019 12:32 PM
As of SQL Server 2017 - there is the STRING_AGG built-in function: https://docs.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql?view=sql-server-2017
+1 (and a like) - STRING_AGG is wonderful.
-- Itzik Ben-Gan 2001
February 5, 2019 at 2:53 pm
Jeff Moden - Sunday, February 3, 2019 8:08 PM
I think I've answered this several times in the past. I do not believe in Kabbalah number magic. There is no such thing as a universal identifier that God put into everything in the universe. There is no such thing as a generic all-purpose "customer"; one enterprise might work with individuals, another only with corporations, and an enterprise might be a cash business that has sales and no identified customers whatsoever. If I'm working with corporations, my customer identifier might be the DUNS (actually required by law in some places). Or it might be a tax identification number, which covers individuals, legal persons, and corporations.
In short, your question makes no sense. The customer exists only within a logical data model of the enterprise, not as some Kabbalah number assigned by God or dependent on physical storage on one machine.
For phone numbers, I've got the international standard E.164 family. For addresses within the United States, I would use CASS. I can buy software to validate both of these standards, and I can verify them against third-party databases. If I need to get addresses from overseas, then I need to find out what those countries' standards are.
Please post DDL and follow ANSI/ISO standards when asking for help.
February 5, 2019 at 3:03 pm
jcelko212 32090 - Monday, February 4, 2019 11:16 AM
So the day you decide to expand your business beyond corporations, your entire existing key structure has to be re-done from scratch? That sounds like an awful idea. That's why one uses a key that has no preset meaning or value. Do you really use tax numbers for some of your keys? Is that still legal??
And while knowing the DUNS may be required by law for certain things, the feds obviously don't require you to use DUNS as your internal, identifying key.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
February 5, 2019 at 6:05 pm
jcelko212 32090 - Monday, February 4, 2019 11:16 AM
Heh... yes, you have. Pretty much the same thing you've answered above, which is the usual rhetoric. Using individual Tax IDs is wrong on so many levels, not to mention that they can be changed on request. DUNS might be right, but only if they have a DUNS and they actually know what it is. Have you actually ever created a real database that used DUNS?
So, if you can't use a Tax ID and you can't use a DUNS, what do YOU use?
And phone numbers are absolutely stupid to use as a customer identifier because they violate the first and most important rule for Primary Keys... they must be immutable and phone numbers are anything but immutable. The same holds true for addresses. They cannot be used as a PK because they are not immutable. They're also not necessarily unique.
To this date and with the possible exception of a DUNS number, which can't be used for all customer types, you've failed to produce a viable recommendation for Primary Keys that will meet all of the requirements of Primary Keys and stand the test of time. And I don't need to produce documentation on what those requirements are because, of all the people in the world, you should know already.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 5, 2019 at 7:39 pm
Almost forgot, Joe. If you want to see how what you're calling "God Numbers" are being used and why they're important, then just click on the following Microsoft article. Once there, search for "_id" (without the quotes).
https://docs.microsoft.com/en-us/sql/relational-databases/system-catalog-views/sys-objects-transact-sql?view=sql-server-2017
Heh... according to you, you helped design all this. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
February 6, 2019 at 9:11 am
ScottPletcher - Wednesday, January 30, 2019 11:38 AM: This topic comes up so much, I think MS should "fudge" a little on the relational model and create a type of "char array". It would include not just selection against it, but ORDER BY as well.
Having spent some time working on standards for languages, it's not as easy as you might think. The SQL standard string is case-sensitive and has an explicit collation that can be changed (and collations can be more complicated than just single characters -- double L used to be treated as a separate letter in Spanish, etc.). In SQL, string comparisons pad out the shorter string with blanks (watch out for Arabic and Hebrew!), but the xBase languages truncate the longer string. Some languages are case-sensitive and some are not. Some have only one collation, and others allow it to be set.
Now, given a list of character strings, do we treat the string as a permutation or a combination? Can I nest lists inside lists (like we do in LISP)? What about list concatenation? Can the elements within the list have different collations? How about different datatypes? What library functions do we need? Can I locate a sublist within the list?
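Two of those quirks are easy to see directly in T-SQL; a quick sketch, assuming SQL Server's default case-insensitive collation:
-- Trailing blanks are ignored in comparisons (the padding rule),
-- and case sensitivity depends entirely on the collation in effect.
SELECT CASE WHEN 'abc' = 'abc   ' THEN 'equal' ELSE 'not equal' END AS padding_rule,
       CASE WHEN 'ABC' = 'abc' COLLATE Latin1_General_CS_AS
            THEN 'equal' ELSE 'not equal' END AS case_sensitive_collation;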
Please post DDL and follow ANSI/ISO standards when asking for help.