is there any difference between != and <>

Question

is there any difference between != and <>

Viewing 15 posts - 46 through 60 (of 62 total)

You must be logged in to reply to this topic. Login to reply

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 1

Jeff Moden (11/15/2007)
Me too! Anyone know where there might be a posting for it? Or, maybe post it here so we can do some testing on it?

http://omnibuzz-sql.blogspot.com/2006/06/concatenate-values-in-column-in-sql.html

Bottom of the page. It's actually a CSQ, but it seems to run like the wind with proper indexing (8 seconds re-concatenating the 3.54M phone numbers I extracted previously). Those aren't really long sequences to concatenate however (which was what some were reporting as tripping the thing up).

if you're interested - the code:

set @StartTime=getdate()

create index ix_mattPhoneExtTest2 on #mattPhoneExtTest(rid) include (val)

PRINT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration (hh:mi:ss:mmm)- IDX creation'

set @StartTime=getdate()

select @dummy=rID, @dummy2=stuff( ( select ','+ val

from #mattPhoneExtTest t1

where t2.rID = t1.rID

for xml path('')

),1,1,'')

from #mattPhoneExtTest t2

group by rID

order by rID

PRINT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration (hh:mi:ss:mmm) - XML path() with IDX'

Results:

00:00:07:827 Duration (hh:mi:ss:mmm)- IDX creation

00:00:08:717 Duration (hh:mi:ss:mmm) - XML path() with IDX

00:00:21:590 Duration (hh:mi:ss:mmm) - XML path() no IDX

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Jeff Moden SSC Guru Points: 1004441 More actions · Answer 2

Ok... first things first... after a (finally) decent night's sleep, I did the Phase II code for the phone number problem. Assuming you still have the Tally table active, here's the code in whole... don't need to run any of it separately...

Like I said, this is where SQL Server and I are gonna get our ears folded back by a CLR... SQL Server pretty much sucks at true ReGex...

--===========================================================================================================

PRINT REPLICATE('=',78)

PRINT SPACE(4)+'Jeff Moden''s Tally Table Solution "Phone test - Phase 1 (Extraction)".'

PRINT REPLICATE('-',78)

--===========================================================================================================

--===== Declare and start a time to measure duration with

DECLARE @StartTime DATETIME

SET @StartTime = GETDATE()

--===== Suppress the auto-display of rowcounts for appearances

SET NOCOUNT ON

--===== If the target table already exists, drop it

IF OBJECT_ID('TempDB.dbo.MyHead','U') IS NOT NULL

DROP TABLE TempDB.dbo.MyHead

--===== Declare a local variable for some dynamic SQL

DECLARE @sql VARCHAR(300)

--===== Get the required data and and reserve a column for an index, as well

-- Index will be used to get rid of "3rd" telephone number for each row.

SELECT m.RID,

RTRIM(SUBSTRING(m.Doc,t.N,14)) AS TelNum,

StartPos = t.N,

YPos = CASE

WHEN RIGHT(SUBSTRING(m.Doc,t.N,14),1)=' ' --No space pattern

THEN 6 --No space pattern

ELSE 7 --Has space pattern

END,

0 AS [Index]

INTO TempDB.dbo.MyHead

FROM dbo.MattTestText m

LEFT OUTER JOIN dbo.Tally t WITH (NOLOCK)

ON SUBSTRING(m.Doc,t.N,1) = '(' --This check doubles the performance (halves the duration)

AND ( SUBSTRING(m.Doc,t.N,15) LIKE '%([0-9][0-9][0-9]) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9] %'

OR SUBSTRING(m.Doc,t.N,14) LIKE '%([0-9][0-9][0-9])[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9] %')

AND t.N < LEN(m.Doc)-13

-- Display the rowcount

PRINT STR(@@ROWCOUNT,12) + ' rows created in the Temp table...'

--===== Create the required index

CREATE UNIQUE INDEX IX_tmpMyHead_RID_StartPos

ON TempDB.dbo.MyHead (RID,StartPos,Ypos)

--===== Create the [Index] column which identifies which number telephone number in a row is represented

-- This has to be dynamic because the index used in WITH(INDEX()) doesn't exist at compile time

SET @sql = '

DECLARE @PrevRID INT, @PrevIndex INT

SELECT @PrevRID = 0, @PrevIndex = 0

UPDATE TempDB.dbo.MyHead WITH (TABLOCK)

SET @PrevIndex = [Index] = CASE WHEN RID = @PrevRID THEN @PrevIndex+1 ELSE 1 END,

@PrevRID = RID

FROM TempDB.dbo.MyHead WITH (INDEX(IX_tmpMyHead_RID_StartPos))

WHERE StartPos > 0'

EXEC (@SQL)

--===== Get rid of any "3rd" telephone numbers

DELETE TempDB.dbo.MyHead WHERE [INDEX] = 3

--===== Display the duration

PRINT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration (hh:mi:ss:mmm)'

PRINT REPLICATE('=',78)

--===== End of test... display the first 100 rows of each table for visual check...

SELECT TOP 100 * FROM TempDB.dbo.MyHead ORDER BY RID,StartPos

GO

--===========================================================================================================

-- Create a function to Redaction the Doc column for Phase II of the test

--===========================================================================================================

--===== If the function already exists, drop it

IF OBJECT_ID('dbo.fnJBMRedaction') IS NOT NULL

DROP FUNCTION dbo.fnJBMRedaction

GO

--===== Create a UDF to Redaction a row

CREATE FUNCTION dbo.fnJBMRedaction (@RID INT, @Doc VARCHAR(850))

RETURNS VARCHAR(850)

AS

BEGIN

SELECT @Doc =

STUFF(

@Doc,

StartPos+Ypos-1,3,'YYY'),

StartPos+1,3,'XXX')

FROM TempDB.dbo.MyHead

WHERE RID = @RID

RETURN @Doc

END

GO

--===========================================================================================================

PRINT REPLICATE('=',78)

PRINT SPACE(12)+'Jeff Moden''s Non-RegEx Solution "Phone test - Phase 2 (Redaction)".'

PRINT REPLICATE('-',78)

--===========================================================================================================

--===== Declare and start a time to measure duration with

DECLARE @StartTime DATETIME

SET @StartTime = GETDATE()

--===== Copy the original data to the column to be Redactioned (DocRed)

UPDATE dbo.MattTestText WITH (TABLOCKX)

SET DocRed = dbo.fnJBMRedaction(RID,Doc)

-- Display the rowcount

PRINT STR(@@ROWCOUNT,12) + ' rows Redactioned...'

--===== Display the duration

PRINT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration (hh:mi:ss:mmm)'

PRINT REPLICATE('=',78)

--===== End of test... display the first 100 rows of each table for visual check...

SELECT TOP 100 * FROM dbo.MattTestText ORDER BY RID

--===========================================================================================================

-- Houskeeping... cleanup the stuff we put on Matt's drive

--===========================================================================================================

--===== If the target table already exists, drop it

IF OBJECT_ID('TempDB.dbo.MyHead','U') IS NOT NULL

DROP TABLE TempDB.dbo.MyHead

--===== If the function already exists, drop it

IF OBJECT_ID('dbo.fnJBMRedaction') IS NOT NULL

DROP FUNCTION dbo.fnJBMRedaction

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004441 More actions · Answer 3

Bottom of the page. It's actually a CSQ, but it seems to run like the wind with proper indexing (8 seconds re-concatenating the 3.54M phone numbers I extracted previously). Those aren't really long sequences to concatenate however (which was what some were reporting as tripping the thing up).

Matt,

What does the orginal data look like and what does it end up looking like after your concatenation?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 4

First - test results. Took a while to get the code realigned correctly after the copy/paste operation. I made one change to your code (making the @sql variable 1000 instead of 300 - it seemed to be truncating the statement)

==============================================================================

Jeff Moden's Tally Table Solution "Phone test - Phase 1 (Extraction)".

------------------------------------------------------------------------------

4026650 rows created in the Temp table...

00:02:53:357 Duration (hh:mi:ss:mmm)

==============================================================================

Jeff Moden's Non-RegEx Solution "Phone test - Phase 2 (Redaction)".

------------------------------------------------------------------------------

500000 rows Redactioned...

00:02:19:417 Duration (hh:mi:ss:mmm)

==============================================================================

There's one unknown left. Your extraction process failed to pick up 13K phone numbers, usually the last one in a DOC. Not sure why yet.

So - not as fast, but personally I'd still say both methods are "in range" (cleaning up 3.5M phone numbers in less than 3 minutes still ain't so shabby, and that's with a "single drive installation" of SQL server.)

The update phase - well, that's one thing regex is really good at.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 5

Jeff Moden (11/15/2007)
Matt,
What does the orginal data look like and what does it end up looking like after your concatenation?

The original data is the output of Phone Test #1. In other words a table with RID (which repeats) and VAL (a varchar(20) to concatenate).

sample output

1(205)267-1886 ,(118)968-6573 ,(136) 232-1351,(192) 660-4222,(856) 398-8842

2(126) 151-6314,(303)449-1778 ,(202)100-1299 ,(108) 147-1487,(232) 124-8413,(821) 213-2451

3(246) 120-2076,(386) 752-1275,(114)134-1698 ,(326)173-1150 ,(110)190-1003 ,(173) 181-1402,(132) 110-1908

4(109)268-1933 ,(710) 207-2316,(449)249-1302 ,(184)113-1468 ,(217) 198-6872,(187)138-7872 ,(184)210-1855 ,(527) 208-1880,(212)148-1957

5(118) 162-1347,(167)198-1976 ,(566) 105-1552,(384) 219-6923,(192)203-1105 ,(639)192-8775 ,(328) 204-2221,(623)124-1244 ,(120) 162-8677

6(195)108-2059 ,(119) 115-2446,(621) 181-1370,(644) 193-1051,(178) 176-8503,(140) 550-1360

7(417) 132-9134,(718) 112-1727,(101) 110-1062,(188) 206-1340,(106) 909-1192,(238) 744-2143,(120) 128-1164,(104) 165-9994

8(183)449-1619 ,(173)172-2507 ,(177)134-3391 ,(734)125-1635 ,(227) 612-2049,(295) 140-1867,(108)374-4209

9(687) 162-3682,(144)445-7692 ,(166) 984-6919,(640)487-2014 ,(550) 521-1211,(162) 147-1325,(156) 934-1061

10(156) 194-2554,(137)148-1946 ,(120)782-4540 ,(162)179-1139 ,(205)712-1560 ,(214) 994-1809,(326) 734-1048

11(143)988-7749 ,(359) 141-1983,(189) 488-8303,(808) 528-2132,(146) 154-1435,(114)142-9461 ,(212)687-1136 ,(751)121-1204

12(101)904-6971 ,(517) 106-1043,(945) 565-7758,(442)892-6715 ,(423) 311-5400,(621) 157-1308

13(127)378-7069 ,(692)833-3567 ,(472) 541-2195,(203) 156-1671,(164)137-9184 ,(153) 128-2106,(231) 153-3532,(103)265-8522

14(206) 611-1905,(156) 694-6223,(736)103-1067 ,(137) 143-1634

15(971)171-1079 ,(220) 640-1698,(204)432-1451 ,(123) 140-5961,(173)166-9052 ,(104) 125-3202,(163) 128-9482,(207) 176-1887

16(212)894-6784 ,(115)592-9209 ,(352)209-2115 ,(140)193-1376 ,(666)749-1685

17(293)793-2040 ,(139) 877-1467,(211) 553-1489,(169) 152-1267,(148)129-6236 ,(258) 641-1745,(201)106-1371 ,(115) 844-1268

18(107) 763-8238,(145) 187-4945,(857)837-6324 ,(203) 873-1633,(192)145-8031 ,(563)144-1590 ,(173)385-9178 ,(171)202-3628

19(201) 178-7597,(111) 976-1995,(987)586-1023 ,(847)184-8850 ,(155)710-6013 ,(178)128-1641

20(178)180-1055 ,(100)151-1342 ,(121) 301-4615,(118) 445-7220,(898)628-2009 ,(178)141-3891 ,(166)775-1602

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Jeff Moden SSC Guru Points: 1004441 More actions · Answer 6

Oh, I will add in an additional challenge if you two feel like it.
Take data of the following form, with a lot, lot more data (say 100 000 rows and 250 categories)
CREATE TABLE #Test
Category INT,
Code VARCHAR(5)
)
INSERT INTO #Test VALUES (1,'a')
INSERT INTO #Test VALUES (1,'t')
INSERT INTO #Test VALUES (1,'e')
INSERT INTO #Test VALUES (1,'s')
INSERT INTO #Test VALUES (1,'u')
INSERT INTO #Test VALUES (2,'u')
INSERT INTO #Test VALUES (2,'e')
INSERT INTO #Test VALUES (2,'n')
INSERT INTO #Test VALUES (2,'a')
INSERT INTO #Test VALUES (3,'i')
INSERT INTO #Test VALUES (3,'z')
and output data of the form. (order of the characters isn't important, since there's nothing to order by)
1, 'a,t,e,s,u'
2, 'u,e,n,a'
3, 'i,z'
My gut feel is that a user-defined aggregate will out perform any T-SQL by quite a way.

Heh... "Gut feel"... time for some "liposuction" courtesy of some "real" T-SQL to slim down the times with :hehe:

http://omnibuzz-sql.blogspot.com/2006/06/concatenate-values-in-column-in-sql.html

That's the URL that Matt posted explaining how poorly the "Concatenation Function" that everyone has come to know and hate runs so terribly slow. Well, it's not SQL Server 2000, T-SQL, or the UDF that's slow...

In every concatenation example code I've seen (including a couple that yours-truly has made the mistake of writing :blush: ), they do it absolutely the wrong way... first, they make a UDF that looks something like this... (format from the the "Omnibuzz" URL that Matt posted...--=======================================================

-- Written by someone who doesn't really know how to use

-- the COALESCE function. Requires the STUFF function

-- to remove the leading comma the code produces.

--=======================================================

CREATE FUNCTION dbo.fnConcat (@Category INT)

RETURNS VARCHAR(8000)

AS

BEGIN

DECLARE @Result VARCHAR(1000)

SELECT @Result = COALESCE(@Result,'')+','+Code

FROM dbo.JBMTest

WHERE Category = @Category

RETURN STUFF(@Result,1,1,'')

END

Anyone care to identify what's wrong with that picture? :blink: Well, whoever wrote that bloody thing, doesn't really know how to use COALESCE here... if s(he) did, then they'd know that the extra STUFF function to remove the superfluous leading comma is absolutely NOT necessary if the COALESCE function is used correctly... like this......--=======================================================

-- Better function uses the power of COALESCE to

-- elimanate the need for the STUFF function.

--=======================================================

CREATE FUNCTION dbo.fnConcat (@Category INT)

RETURNS VARCHAR(8000)

AS

BEGIN

DECLARE @Result VARCHAR(1000)

SELECT @Result = COALESCE(@Result+',','')+Code

FROM dbo.JBMTest

WHERE Category = @Category

RETURN @Result

ENDSee the difference??? Also, the problem with that code is that there is absolutely no guarantee as to the order of the parts of the result string. Ya gotta have an ORDER BY somewhere... Since I usually create tables with an IDENTITY column just for this reason, my final UDF code to solve Gails good problem definition is as follows:--=======================================================

-- Correctly written function uses the power of the

-- COALESCE function to eliminate the need for the STUFF

-- function AND the ORDER BY of a "positional" column

-- (could be an IDENTITY or DATETIME column) guarantees

-- the parts of the result string will be in the correct

-- order.

--=======================================================

CREATE FUNCTION dbo.fnConcat(@Category INT)

RETURNS VARCHAR(8000)

AS

BEGIN

DECLARE @Return VARCHAR(8000)

SELECT @Return = COALESCE(@Return+',','')+Code

FROM dbo.JBMTest

WHERE Category = @Category

ORDER BY RowNum

RETURN @Return

ENDNow... someone goes through all that trouble to guarantee that the function works fast and correctly, like in the above, and then they write some really terrible code to use it... probably because they don't know how GROUP BY or DISINCT works anymore than they know how COALESCE works. 😛 PLEASE don't bother running this code... I stopped it after 9 minutes...

--=======================================================

-- This is some really bad code that will take

-- comparitively forever to run. Can YOU see why?

--=======================================================

DECLARE @Dummy1 INT, @Dummy2 VARCHAR(8000)

SELECT @Dummy1 = Category,

@Dummy2 = dbo.fnConcat(Category)

FROM dbo.JBMTest

GROUP BY Category

So, why am I being so tough on that simple bit of seeming set-based code? The answer is that the author of that code didn't account for what's going to happen behind the scenes... if there are 100,000 rows of 250 categories, that means each category will have an (approximate) average of 400 "Codes".

The code above will do a GROUP BY on 100,000 rows... the function will process an average of 400 rows per row... 100,000 x 400 is a whopping big 40 MILLION rows that need to not only be calculated, but then also need to be grouped as well. (Another way to state this is that the correct answer for each Category will be calculated 400 times!) And then folks have the nerve to say that T-SQL sucks at concatenation! :hehe: It's not T-SQL, it's the code writer that doesn't know how to correctly use T-SQL or what it does behind the scenes! 😀

I know, I know... "Put your money where your mouth is, Jeff." OK... Let's get started...

First, here's the code to setup a test table as Gail suggested... 100,000 rows, 250 "Categories", 26 different single letter "Codes"... this takes about 1 second to generate...--===== Create and populate a 100,000 row test table.

-- Column "RowNum" has a range of 1 to 1,000,000 unique numbers

-- Column "Category" has a range of 1 to 250 non-unique numbers

-- Column "Code" has a range of "A-Z"

-- Takes just over a second to execute including creation of the PK.

SELECT TOP 100000

RowNum = IDENTITY(INT,1,1),

Category = ABS(CHECKSUM(NEWID()))%250+1,

Code = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

INTO dbo.JBMTest

FROM Master.dbo.SysColumns t1,

Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

--===== A table is not properly formed unless a Primary Key has been assigned

ALTER TABLE dbo.JBMTest

ADD PRIMARY KEY CLUSTERED (RowNum)

As a reminder... here's the correct function to use for concatenation of this table as Gail described...--=======================================================

-- Correctly written function uses the power of the

-- COALESCE function to eliminate the need for the STUFF

-- function AND the ORDER BY of a "positional" column

-- (could be an IDENTITY or DATETIME column) guarantees

-- the parts of the result string will be in the correct

-- order.

--=======================================================

CREATE FUNCTION dbo.fnConcat(@Category INT)

RETURNS VARCHAR(8000)

AS

BEGIN

DECLARE @Return VARCHAR(8000)

SELECT @Return = COALESCE(@Return+',','')+Code

FROM dbo.JBMTest

WHERE Category = @Category

ORDER BY RowNum

RETURN @Return

END

...and, here's the CORRECT way to use the function to concatenate as Gail described...--=======================================================

-- The right way to make a Comedy Delimited column...

-- elimate the duplication for the Category column up

-- front instead of recalculating the same answer over

-- and over and over as done with the GROUP BY on both

-- the Category and function.

--=======================================================

DECLARE @Dummy1 INT, @Dummy2 VARCHAR(8000)

SELECT @Dummy1 = d.Category,

@Dummy2 = dbo.fnConcat(d.Category)

FROM (--==== Derived table "d" finds distinct Category

SELECT DISTINCT Category

FROM dbo.JBMTest

)dTHAT code returns in even less time (6 seconds on my slow 6year old desktop... can't wait for Matt to run it on his box!) than the XML PATH method that everyone has become so enamoured with. And, THAT's without any special indexing... you should see how fast it runs with the correctly designed index...

--===== Create an index 6 times the performance

CREATE INDEX IX_JBMTest_Category_Code

ON dbo.JBMTest (Category,Code)

Matt, if you'd like to do the honors on the same box that you tested the XML Path code on, here's the whole shootin' match in one script...

--===== Use a safe database that everyone has

USE TempDB

GO

--===== If the test table exists, drop it

IF OBJECT_ID('dbo.JBMTest','U') IS NOT NULL

DROP TABLE dbo.JBMTest

GO

PRINT 'Creating the test table in TempDB, where it''s safe...'

--===== Create and populate a 100,000 row test table.

-- Column "RowNum" has a range of 1 to 1,000,000 unique numbers

-- Column "Category" has a range of 1 to 250 non-unique numbers

-- Column "Code" has a range of "A-Z"

-- Takes just over a second to execute including creation of the PK.

SELECT TOP 100000

RowNum = IDENTITY(INT,1,1),

Category = ABS(CHECKSUM(NEWID()))%250+1,

Code = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

INTO dbo.JBMTest

FROM Master.dbo.SysColumns t1,

Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

--===== A table is not properly formed unless a Primary Key has been assigned

ALTER TABLE dbo.JBMTest

ADD PRIMARY KEY CLUSTERED (RowNum)

--===== Create an index for additional speed

CREATE INDEX IX_JBMTest_Category_Code

ON dbo.JBMTest (Category,Code)

GO

--========================================================================================

--===== If the function already exists, drop it

IF OBJECT_ID('dbo.fnConcat') IS NOT NULL

DROP FUNCTION dbo.fnConcat

GO

--===== Creat the function for the test

CREATE FUNCTION dbo.fnConcat(@Category INT)

RETURNS VARCHAR(8000)

AS

BEGIN

DECLARE @Return VARCHAR(8000)

SELECT @Return = COALESCE(@Return+',','')+Code

FROM dbo.JBMTest

WHERE Category = @Category

ORDER BY RowNum

RETURN @Return

END

GO

--=======================================================================================

PRINT REPLICATE('=',78)

PRINT SPACE(16)+'Jeff Moden''s "Improved" Concatenation Method".'

PRINT REPLICATE('-',78)

--=======================================================================================

--===== Declare and start a time to measure duration with

DECLARE @StartTime DATETIME

SET @StartTime = GETDATE()

--===== Run the concatenation code into some variable to eliminate any disparity between

-- display times on different machines

DECLARE @Dummy1 INT, @Dummy2 VARCHAR(8000)

SELECT @Dummy1 = d.Category,

@Dummy2 = dbo.fnConcat(d.Category)

FROM (--==== Derived table "d" finds distinct Category

SELECT DISTINCT Category

FROM dbo.JBMTest

)d

--===== Display the duration

PRINT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration (hh:mi:ss:mmm)'

PRINT REPLICATE('=',78)

--===========================================================================================================

-- Houskeeping... cleanup the stuff we put on Matt's drive

--===========================================================================================================

--===== If the target table already exists, drop it

IF OBJECT_ID('TempDB.dbo.JBMTest','U') IS NOT NULL

DROP TABLE TempDB.dbo.JBMTest

--===== If the function already exists, drop it

IF OBJECT_ID('dbo.fnConcat') IS NOT NULL

DROP FUNCTION dbo.fnConcat

Here's what I get for 100,000 rows with 250 categories (avg of 400 codes per category)...

Creating the test table in TempDB, where it's safe...

(100000 row(s) affected)

==============================================================================

Jeff Moden's "Improved" Concatenation Method".

------------------------------------------------------------------------------

00:00:01:110 Duration (hh:mi:ss:mmm)

==============================================================================

... and here's what I get with 1,000,000 rows with 500 categories (avg of 2000 codes per category).

Creating the test table in TempDB, where it's safe...

(1000000 row(s) affected)

==============================================================================

Jeff Moden's "Improved" Concatenation Method".

------------------------------------------------------------------------------

00:00:10:903 Duration (hh:mi:ss:mmm)

==============================================================================

Fast enough for ya? 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 7

Jeff - one last update on Phone Test #1.

I got the results to match. two smallish updates:

- changed the AND t.N < LEN(m.Doc)-14 to AND t.N < LEN(m.Doc)-11

and that got rid of MOST of the missing ones.

- the other one involves dealing with a difference in behavior between 2000 and 2005 with regards to

the "inner loop" process (the dynamic SQL statement). It does some funky things unless you add some dummy values to

make sure the @ variables are updating correctly.

So the updated dynamic looks like:

SET @sql = 'DECLARE @PrevRID INT, @PrevIndex INT , @dummy INT,@dummy2 INT

SELECT @PrevRID = 0,@PrevIndex = 0

UPDATE TempDB.dbo.MyHead WITH (TABLOCK)

SET @PrevIndex = [Index] = CASE WHEN RID = @PrevRID THEN @PrevIndex+1 ELSE 1 END,

@dummy=@previndex,

@PrevRID = RID,

@dummy2=@prevrid

FROM TempDB.dbo.MyHead

WITH (INDEX(IX_tmpMyHead_RID_StartPos)) WHERE StartPos > 0'

EXEC (@SQL)

No noticeable difference in processing time.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 8

Jeff - I was going to quote you...but I didn't know what to pick up. You're right - most of the examples in there are fairly disgusting SQL.

That being said - I took your script and added two more contenders, and came out with this. The 100,000 row test was TOO fast. As Tim "the Toolman" Taylor would say - "your table lacked power...so I reworked it". (It's okay, grunting is allowed). Only change I made was to bop the rowcount to 10,000,000 and the category count to 25,000 (so we don't blow out the varchar limit).

Results...

(10000000 row(s) affected)

==============================================================================

Jeff Moden's "Improved" Concatenation Method".

------------------------------------------------------------------------------

00:00:21:547 Duration (hh:mi:ss:mmm)

==============================================================================

The XML path() method

------------------------------------------------------------------------------

00:00:08:423 Duration (hh:mi:ss:mmm)

==============================================================================

The CLR UDA version

------------------------------------------------------------------------------

00:00:20:517 Duration (hh:mi:ss:mmm)

==============================================================================

As I said earlier - I was very surprised by this result as well... The User-defined aggregate dbo.concatenate is a copy and paste out of the BOL example, and it definitely held its own. The XML Path just tore it up....

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Jeff Moden SSC Guru Points: 1004441 More actions · Answer 9

No noticeable difference in processing time.

You mean on the dynamic SQL by itself or the whole Phone Test #1?

Thanks for the great tip (we call them "oolies" in the US Navy) on the dynamic SQL in 2k5...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004441 More actions · Answer 10

==============================================================================
The XML path() method
------------------------------------------------------------------------------
00:00:08:423 Duration (hh:mi:ss:mmm)
==============================================================================

Now... THAT's impressive! It reduced the processing time by almost ²/₃rds. Rare thing when you find a gem like that. Thanks!

EDIT: What I REALLY like about it, is it make the CLR blow chunks! Of course, my T-SQL didn't fair any better, but it was a close race 😛

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 11

Jeff Moden (11/15/2007)
No noticeable difference in processing time.
You mean on the dynamic SQL by itself or the whole Phone Test #1?
Thanks for the great tip (we call them "oolies" in the US Navy) on the dynamic SQL in 2k5...

No noticeable time difference processing phone Test 1 once I made the adjustments. Just got the match counts equal on both methods.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Jeff Moden SSC Guru Points: 1004441 More actions · Answer 12

Sorry... must still have the "ranger eye's" going on from lack of sleep the other day... I just say the results you posted for the run.

Thanks for fixing my boo-boo. I also gotta say that was a hell of a lot of fun! All of it! Learned a heck of a lot, today!

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004441 More actions · Answer 13

Took a while to get the code realigned correctly after the copy/paste operation

Matt,

Reformatting code that you've copied from those "Code" windows is actually pretty easy (although I wish they'd just fix it so it works right!!!).... put the mouse pointer just above a code window (must be outside the code window). Click and hold while you select down past the code window (again, must be outside the code window).

Copy and paste the select into word... leading spaces will be preserved. Then, do a Search'n'Replace... replace ^l (circumflex small "L", manual line break in Word) with ^p (circumflex small "P", paragraph mark in Word).

Then, copy from the Word document into Query Analyzer (or that "thing" for 2k5 😛 ) and all is pretty... 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 14

Jeff Moden (11/15/2007)
Took a while to get the code realigned correctly after the copy/paste operation
Matt,
Reformatting code that you've copied from those "Code" windows is actually pretty easy (although I wish they'd just fix it so it works right!!!).... put the mouse pointer just above a code window (must be outside the code window). Click and hold while you select do past the code window (again, must be outside the code window).
Copy and paste the select into word... leading spaces will be preserved. Then, do a Search'n'Replace... replace ^l (circumflex small "L", manual line break in Word) with ^p (circumflex small "P", paragraph mark in Word).
Then, copy from the Word document into Query Analyzer (or that "thing" for 2k5 😛 ) and all is pretty... 😉

I umm...know... I've been avoiding that method ever since I fat-fingered some SSC code into the middle of several very large technical documents.....:blink:

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 15

Jeff Moden (11/15/2007)
Sorry... must still have the "ranger eye's" going on from lack of sleep the other day... I just say the results you posted for the run.
Thanks for fixing my boo-boo. I also gotta say that was a hell of a lot of fun! All of it! Learned a heck of a lot, today!

So did I - it turned out to be VERY useful. I'm still mulling over these results, because I think the best solution might be a combination of the two processes. There seems to be a giant suction noise coming from the performance factor of a CROSS APPLY. The RegexMatches TVF seems to kick some butt, but using it in a cross apply seems to even the game out. I'm wondering how a (scalar) regexMatch function hooked up to a Tally Table might do.

...

Where RegexMatch would take in string to check, pattern to look for, and the occurence number, and would return the nth match in a string.

...

Let me think on it this weekend. I have to be on a plane by 6AM, so tonight's not an option.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?