Using the DelimitedSplit8k function - Help Jeff!

Question

Post reply

Using the DelimitedSplit8k function - Help Jeff!

Viewing 4 posts - 16 through 18 (of 18 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 1

Here's a functional test of the latest CLR code that Paul posted.

SELECT Extent = CONVERT(NVARCHAR(4000),d.Extent)

INTO #TestTable

FROM (

SELECT N'A;B;C;D;E;F;G' UNION ALL

SELECT N'A;B;C;;E;F;G' UNION ALL

SELECT N'A;B;C;E;F;G' UNION ALL

SELECT N'A;B' UNION ALL

SELECT N'A;' UNION ALL

SELECT N';A' UNION ALL

SELECT N'A' UNION ALL

SELECT N';;' UNION ALL

SELECT N';' UNION ALL

SELECT N'' UNION ALL

SELECT NULL

) d (Extent)

;

SELECT

Col01 = dbo.SplitElement(Extent, N';', 1),

Col02 = dbo.SplitElement(Extent, N';', 2),

Col03 = dbo.SplitElement(Extent, N';', 3),

Col04 = dbo.SplitElement(Extent, N';', 4),

Col05 = dbo.SplitElement(Extent, N';', 5),

Col06 = dbo.SplitElement(Extent, N';', 6),

Col07 = dbo.SplitElement(Extent, N';', 7)

FROM #TestTable

;

Col01Col02Col03Col04Col05Col06Col07

ABCDEFG

ABCEFG

ABCEFGNULL

ABNULLNULLNULLNULLNULL

ANULLNULLNULLNULLNULL

ANULLNULLNULLNULLNULLNULL

NULLNULLNULLNULL

NULLNULLNULLNULLNULL

NULLNULLNULLNULLNULLNULL

NULLNULLNULLNULLNULLNULLNULL

It certainly works the way I would expect it to. I haven't checked it for performance yet because the OP now has a critical decision to make first. There are 3 possibilities that I see here.

1. Continue to use the old code where the source code has been lost. The code can be transfered to another computer using the method I cited before. The problem with this method is that because the original source code has been lost, it's "black box" code that's just going to have to be trusted for no future changes. Of course, people do such things everyday when they call some of the COM oriented extended stored procedures that Microsoft built into SQL Server so I'm not real sure I see a problem with that. One HUGE advantage here is that NO CODE CHANGES anywhere else need to be done which is going to save a HUGE amount on regression testing.

2. Use Paul's well documented CLR after you rename it to be the same as the old CLR. This has all the advantages of (1) above as well as now having some well documented source code. A possible disadvantage is the source code could be lost again as it was in (1) above.

3. At this point, this 3rd possibility has some HUGE disadvantages. That is, rework all affected code to use the T-SQL solution. The disadvantages, of course, are that there could be quite a bit of rework depending on how many objects the CLR was used in. That means development time, unit test time, and regression tesing once the new objects have been integrated. The advantage, of course is, you'll never lose the source code again because it's all T-SQL. If the functionality is available, you can see the source code.

Of course, we don't yet know about performance differences between any of the 3 methods. Just as an example, if (3) turns out to be even just a little slower (or faster... don't know yet), what is the value of not having to manage external source code? If it does turn out to be faster, is it worth the regression testing that will be necessary to implement it? And what of the future? Will the code ever have to go to a shop where CLR is not allowed? Will the scale change enough where performance becomes an important factor?

Like I said, if the OP would post the readily consumable code to build the old CLR as Paul has done with his new CLR, I'd be happy to do a performance test across a large sample for all 3 possible methods. Such a test might make this a no-brainer if it isn't already.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

MyDoggieJessie SSC-Forever Points: 44486 More actions · Answer 2

Thanks so much it's for the additional feedback in this, I will test out the CLR code to see if I'm gaining any advantage over the tally table method! In my testing so far everything is good...but it never hurts to make things more efficient! 😀

______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 3

MyDoggieJessie (6/9/2012)
Thanks so much it's for the additional feedback in this, I will test out the CLR code to see if I'm gaining any advantage over the tally table method! In my testing so far everything is good...but it never hurts to make things more efficient! 😀

If the "Tally" table method you posted ends up being quite a bit slower, it may be because you're splitting all of the elements out of the string for each individual element you're looking for. If you want to get a bit more performance out of it, using the splitter function as it was inteneded to be used would give you (I believe, I still haven't tested it) much better performance. Of course, that would mean doing the rewrite I suggested along with all of the regression testing.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 4

My tests just finished (I'll post the code after I clean it up a bit). If you use the DelimitedSplit8k function as it was intended to be used and you mix that with CROSS TAB functionality, it turns out to be about 15% faster than the CLR. If you use it the way you posted it, the CLR blows it away (as I suspected).

My recommendation would be to tweek the Name and the inputs on Paul's CLR to match those of the original CLR and be done with it all. No regression testing required.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)