November 13, 2007 at 1:52 am
Both "not equal" operators can be used in SA 5.5 but only != works in ASE 11.5. [Top]
!= not SQL-92 standard
November 13, 2007 at 2:04 am
As neither is 'wrong', they both work then its a matter of personal choice/practice.
Just be consistent and be prepared to do translation when porting your SQL to other databases if you don't use <> 🙂
Hiding under a desk from SSIS Implemenation Work :crazy:
November 13, 2007 at 4:10 am
Jeff Moden (11/13/2007)
...and here's your advantage... I don't have 2k5 to play with... I have to do it using only T-SQL 'cause I only have 2k. 😉
Is not fair then. 😀 I prefer level playing fields
Seriously though, the last demo I saw was some fairly simple piece of string manip on a large table (100 000+ rows) and the CLR function came in way faster than the T-SQL function, and they were both well-written. It wasn't a case of good CLR vs bad T-SQL.
When I find 5 min at work I'll give it a try. Don't have visual studio at home, and don't fancy using the command line compiler. I'll try a few simple things and post the code and times for both.
I do believe we're getting seriously off-topic here though. Will start a thread when I have some numbers
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 13, 2007 at 5:25 am
Sure... a little off topic... but I love it! Looking forwards to your post(s), Gail.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 13, 2007 at 5:29 am
Shaun McGuile (11/13/2007)
Jeff you rascal 😀
Heh... You know me, Shaun... I even challenge myself if I think I'm wrong 😀 Good man once told me "A Developer must not guess, a Developer must know". So when I see a challenge like that, I gotta find out... since I don't have 2k5, I'm hoping the other person will agree to a "race".
--Jeff Moden
Change is inevitable... Change for the better is not.
November 13, 2007 at 5:34 am
Shaun McGuile (11/13/2007)
As neither is 'wrong', they both work then its a matter of personal choice/practice.Just be consistent and be prepared to do translation when porting your SQL to other databases if you don't use <> 🙂
Heh... let's see someone port an Oracle trigger to SQL Server without deploying RBAR. 😉 And, Yes, it can be done but demonstrates that writing trully portable SQL is pretty much a myth. 😉 I know, I know... I'm off subject again 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
November 13, 2007 at 5:40 am
What will be the prize?
and
What will the loser have to do in forfeiture?
and
Will everyone win as the knowledge will be most useful?
Regards
Shaun
Hiding under a desk from SSIS Implemenation Work :crazy:
November 13, 2007 at 5:49 am
Took the words right out of my mouth... prize for participants and casual observers alike will be more knowledge. "Some of the best 'Things' in life are not 'Things'" :hehe:
--Jeff Moden
Change is inevitable... Change for the better is not.
November 13, 2007 at 10:38 am
Jeff Moden (11/13/2007)
Sting manipulation? Complex mathematical calcs? Heh... Let's have a race... let's start with string manipulation... How about splitting a comma seperated column and returning a table with the PK of each row and the split value... 1 row for every split on a 100k row table? You write a CLR to solve, I'll write the T-SQL to solve. I'll even be happy to provide the test data if you want.Or, how about doing Init Caps on a column in a million row table?
Or, how about removing all characters except digits and letters from a column in a million row table?
Or, name the problem...
...and here's your advantage... I don't have 2k5 to play with... I have to do it using only T-SQL 'cause I only have 2k. 😉
I'll pick up that gauntlet:).. How about we up the ante a little. Split the string, return the PK, the split value AND the index in the original string....
or - pick out every "phone number" in a char(500) field based on format? how about just the 7th instance?
come on - you know I can't just walk away from that one...... and I don't need an advantage - I will be happy to run said test on the same hardware...
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
November 13, 2007 at 8:56 pm
I'll pick up that gauntlet.. How about we up the ante a little. Split the string, return the PK, the split value AND the index in the original string....
Heh... Good Man! I just knew you'd be the one to bite, Matt... we've been having a lot of fun at this on other threads... 🙂
Ok... here's the rules...
1. You're going to use a CLR... you may pick ANY language to write the CLR but you must tell us what you used.
2. I'm going to use my ol' favorite... Tally Table. Here's the code for it just in case you don't have it for testing...
--===== Create and populate the Tally table on the fly
SELECT TOP 11000 --equates to more than 30 years of dates
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
--===== Add a Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
--===== Allow the general public to use it
GRANT SELECT ON dbo.Tally TO PUBLIC
3. We both use the same data. With that in mind, here's the data we'll use. We may not add, remove, or alter any columns or calculated columns. We may add/remove/change any indexes on the column we see fit but we must reveal what they are.
DROP TABLE JBMTest
GO
--===== Create and populate a 100,000 row test table.
-- Column "RowNum" has a range of 1 to 100,000 unique numbers
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
-- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
-- for all rows.
-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
-- Takes about 2 seconds to execute.
SELECT TOP 100000
SomeID = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
SomeHex12 = RIGHT(NEWID(),12)
INTO dbo.JBMTest
FROM Master.dbo.SysColumns t1,
Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN
--===== A table is not properly formed unless a Primary Key has been assigned
-- Takes about 1 second to execute.
ALTER TABLE dbo.JBMTest
ADD PRIMARY KEY CLUSTERED (SomeID)
--===== Change a couple of the SomeCSV values to produce special situations
UPDATE dbo.JBMTest
SET SomeCSV = NULL
WHERE SomeID = 1
UPDATE dbo.JBMTest
SET SomeCSV = ''
WHERE SomeID = 2
UPDATE dbo.JBMTest
SET SomeCSV = ','
WHERE SomeID = 3
UPDATE dbo.JBMTest
SET SomeCSV = ',Part02,Part03,Part04,,,Part07,Part08,Part09,'
WHERE SomeID = 4
UPDATE dbo.JBMTest
SET SomeCSV = ',Part02,Part03,Part04, , ,Part07,Part08,Part09,'
WHERE SomeID = 5
UPDATE dbo.JBMTest
SET SomeCSV = 'Part01'
WHERE SomeID = 6
UPDATE dbo.JBMTest
SET SomeCSV = NULL
WHERE SomeID = 7
4. The result of the split will be a table containing 1 row for each "split value". Although the table may contain any number of columns or indexes, it must contain at least the Primary Key column (SomeID) from the test table, the SplitValue, and the [Index] of that SplitValue. The end result of the new table must have a Unique Clustered Index at completion.
5. "Missing" values in the original test table must be single blanks in the new table as verified by DATALENGTH.
6. Fully NULL string in the test table must return a NULL for the SplitValue. Index in that case may be NULL or zero, your choice.
7. SplitValue column must be able to handle max string width of 8000 characters (obviously NOT NVarChar).
8. All code, except for SELECT TOP 100 * for verification purposes and duration measurment code, must be included in duration measurements.
With all of that in mind... here's my submittal including duration code...
--===== Identify the run
PRINT REPLICATE('=',78)
PRINT SPACE(12)+'Jeff Moden''s Tally Table Solution for Indexed Parsing.'
PRINT REPLICATE('-',78)
--===== Declare and start a time to measure duration with
DECLARE @StartTime DATETIME
SET @StartTime = GETDATE()
--===== Make sure the scratchpad table doesn't already exist
IF OBJECT_ID('TempDB..#MyHead','U') IS NOT NULL
DROP TABLE #MyHead
--===== Do the split including an index for each part
-- Empty or blank strings/split values return BLANKS with the correct Index.
-- Totally Null string returns NULL with a "0" index, saves about 2 seconds if NULL returned instead.
-- Leading and trailing spaces in each split value are removed.
SELECT h.SomeID,
SplitValue = LTRIM(RTRIM(SUBSTRING(h.SomeCsv, t.N+1, CHARINDEX(',', h.SomeCsv, t.N+1)-t.N-1))),
[Index] = ISNULL(t.N-DATALENGTH(REPLACE(LEFT(h.SomeCsv,t.N),',','')),0)
INTO #MyHead
FROM dbo.Tally t
RIGHT OUTER JOIN --Necessary in case SomeCsv is NULL
(SELECT SomeID, SomeCsv = ','+SomeCsv+',', Length = LEN(SomeCSV)+2 FROM dbo.jbmTest) h
ON SUBSTRING(h.SomeCsv, t.N, 1) = ','
AND t.N < h.Length
--===== Add the Unique Clustered Index
CREATE UNIQUE INDEX UCIX_tmpMyHead_SomeID_Index
ON #MyHead (SomeID,[Index])
--===== Display the duration
PRINT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration (hh:mi:ss:mmm)'
PRINT REPLICATE('=',78)
--===== Check the first 100 rows...
SELECT TOP 100 *
FROM #MyHead
ORDER BY SomeID,[Index]
For the phone number challenge... I'll let you produce the test data...
--Jeff Moden
Change is inevitable... Change for the better is not.
November 13, 2007 at 10:53 pm
I will tackle the split challenge tomorrow - I will not get a chance to set up both of the test scenarios etc.
Phone numbers challenge... This is 500,000 records, but that can get bumped up at any time if you'd prefer.
Here's the data:
use test
go
DROP TABLE mattTestText
create table mattTestText
(rid int identity (1,1) not null,
fun char(100),
doc char(850), docred char(850))
insert matttestText (fun,doc)
select top 500000 cast(newid() as char(100)),
--1
left('('+left(abs(checksum(cast(newid() as
varchar(100)))),3)+')'+ISNULL(REPLICATE(' ',abs(checksum(cast(newid() as
varchar(100))))%2),'')
+left(abs(checksum(cast(newid() as varchar(100)))),3)
+'-'+left(abs(checksum(cast(newid() as varchar(100)))),4)+ '
'+cast(newid() as varchar(100)),abs(checksum(cast(newid() as
varchar(100))))%75)
--2
+
left('('+left(abs(checksum(cast(newid() as
varchar(100)))),3)+')'+ISNULL(REPLICATE(' ',abs(checksum(cast(newid() as
varchar(100))))%2),'')
+left(abs(checksum(cast(newid() as varchar(100)))),3)
+'-'+left(abs(checksum(cast(newid() as varchar(100)))),4)+ '
'+cast(newid() as varchar(100)),abs(checksum(cast(newid() as
varchar(100))))%75)
--3
+
left('('+left(abs(checksum(cast(newid() as
varchar(100)))),3)+')'+ISNULL(REPLICATE(' ',abs(checksum(cast(newid() as
varchar(100))))%2),'')
+left(abs(checksum(cast(newid() as varchar(100)))),3)
+'-'+left(abs(checksum(cast(newid() as varchar(100)))),4)+ '
'+cast(newid() as varchar(100)),abs(checksum(cast(newid() as
varchar(100))))%75)
--4
+
left('('+left(abs(checksum(cast(newid() as
varchar(100)))),3)+')'+ISNULL(REPLICATE(' ',abs(checksum(cast(newid() as
varchar(100))))%2),'')
+left(abs(checksum(cast(newid() as varchar(100)))),3)
+'-'+left(abs(checksum(cast(newid() as varchar(100)))),4)+ '
'+cast(newid() as varchar(100)),abs(checksum(cast(newid() as
varchar(100))))%75)
--5
+
left('('+left(abs(checksum(cast(newid() as
varchar(100)))),3)+')'+ISNULL(REPLICATE(' ',abs(checksum(cast(newid() as
varchar(100))))%2),'')
+left(abs(checksum(cast(newid() as varchar(100)))),3)
+'-'+left(abs(checksum(cast(newid() as varchar(100)))),4)+ '
'+cast(newid() as varchar(100)),abs(checksum(cast(newid() as
varchar(100))))%75)
--6
+
left('('+left(abs(checksum(cast(newid() as
varchar(100)))),3)+')'+ISNULL(REPLICATE(' ',abs(checksum(cast(newid() as
varchar(100))))%2),'')
+left(abs(checksum(cast(newid() as varchar(100)))),3)
+'-'+left(abs(checksum(cast(newid() as varchar(100)))),4)+ '
'+cast(newid() as varchar(100)),abs(checksum(cast(newid() as
varchar(100))))%75)
--7
+
left('('+left(abs(checksum(cast(newid() as
varchar(100)))),3)+')'+ISNULL(REPLICATE(' ',abs(checksum(cast(newid() as
varchar(100))))%2),'')
+left(abs(checksum(cast(newid() as varchar(100)))),3)
+'-'+left(abs(checksum(cast(newid() as varchar(100)))),4)+ '
'+cast(newid() as varchar(100)),abs(checksum(cast(newid() as
varchar(100))))%75)
--8
+
left('('+left(abs(checksum(cast(newid() as
varchar(100)))),3)+')'+ISNULL(REPLICATE(' ',abs(checksum(cast(newid() as
varchar(100))))%2),'')
+left(abs(checksum(cast(newid() as varchar(100)))),3)
+'-'+left(abs(checksum(cast(newid() as varchar(100)))),4)+ '
'+cast(newid() as varchar(100)),abs(checksum(cast(newid() as
varchar(100))))%75)
--9
+
left('('+left(abs(checksum(cast(newid() as
varchar(100)))),3)+')'+ISNULL(REPLICATE(' ',abs(checksum(cast(newid() as
varchar(100))))%2),'')
+left(abs(checksum(cast(newid() as varchar(100)))),3)
+'-'+left(abs(checksum(cast(newid() as varchar(100)))),4)+ '
'+cast(newid() as varchar(100)),abs(checksum(cast(newid() as
varchar(100))))%75)
--10
+
left('('+left(abs(checksum(cast(newid() as
varchar(100)))),3)+')'+ISNULL(REPLICATE(' ',abs(checksum(cast(newid() as
varchar(100))))%2),'')
+left(abs(checksum(cast(newid() as varchar(100)))),3)
+'-'+left(abs(checksum(cast(newid() as varchar(100)))),4)+ '
'+cast(newid() as varchar(100)),abs(checksum(cast(newid() as
varchar(100))))%75)
from Master.dbo.SysColumns sc1, Master.dbo.SysColumns sc2
create unique clustered index pk_matttestText on mattTestText(rid)
- There are UP TO 10 phone numbers in each row.
- phone numbers will be formatted as (999)999-9999. They will ALWAYS be followed by a space, and SOMETIMES will include a space after the ). The space in the middle is to be preserved, the end one should not
Phase I:
- return a table with clustered index, sporting RID (the PK from test table), the phone number, position within initial string. display top 100 rows.
- do NOT include the 3rd phone found in each row.
- same rules as to nulls and blank values, return values, duration code as you advanced.
- you may not alter the DDL or the data of the test table, except to add any non-clustered indexes you see fit. Full disclosure is in effect (any indexes to be added need to be disclosed.
Phase II:
- update the INITIAL TEST DATA with "redacted" versions of all phone number. the redacted format is (XXX)YYY-9999 where XXX and YYY are literals, and 9999 are the original last 4 numbers. You MUST preserve the format, so if the middle space as described previously is present in the original, it must also be present in the redacted version.
- put the redacted contents of DOC in the DOCRED field listed above.
- same rules as previously described otherwise.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
November 14, 2007 at 3:10 am
Heh... you've hit SQL Server in it's weak spot... RegEx... I know that's what you'll build into your CLR and I've got a funny feeling I'm going to get my ears folded back on these. If I do, ya gotta agree to one more test... combination of Running Balance, Running Count, Grouped Running Balance, and Grouped Running count.
I've got phase 1 ready and I'm going to get some shut-eye before I jump into phase two...
--Jeff Moden
Change is inevitable... Change for the better is not.
November 14, 2007 at 3:22 am
Wow. You guys have been busy overnight. I think I'll bow out of this race while I'm behind... 😀
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 14, 2007 at 3:25 am
Good idea Gail, I'm not gonna get in the way of battling giants!:D
Hiding under a desk from SSIS Implemenation Work :crazy:
November 14, 2007 at 6:34 am
Giants? Nah - I aint that tall....
More like - boys with their toys:)
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Viewing 15 posts - 16 through 30 (of 62 total)
You must be logged in to reply to this topic. Login to reply