June 16, 2016 at 7:48 pm
I'm trying to create what I believe should be a very simple Table Valued Function CLR. Here's a T-SQL version of what I'm trying to create:
CREATE FUNCTION dbo.NGLoop(@string varchar(8000), @N int)
RETURNS @NG TABLE (Position int, Token varchar(8000)) AS
BEGIN
DECLARE @token varchar(8000), @position int = 0;
WHILE @position < LEN(@string) - (@N-1) AND (@N > 0 AND @N <= LEN(@string))
BEGIN
SELECT @token = SUBSTRING(@string,@position+1,@N),
@position = @position+1;
INSERT @NG VALUES(@position, @token)
END
RETURN;
END
GO
Here's some examples of what the results should look like (note the comments):
DECLARE @string varchar(8000) = 'abc123';
SELECT * FROM dbo.NGLoop(@string,1); -- should return 6 rows: (1,a), (2,b), (3,c), etc...
SELECT * FROM dbo.NGLoop(@string,2); -- should return 5 rows: (1,ab), (2,bc), (3,c1), etc...
SELECT * FROM dbo.NGLoop(@string,3); -- should return 4 rows: (1,abc), (2,bc1), (3,c12), etc...
SELECT * FROM dbo.NGLoop(NULL,1); -- should return nothing
SELECT * FROM dbo.NGLoop('xxx',-1); -- should return nothing
SELECT * FROM dbo.NGLoop('xxx',NULL); -- should return nothing
SELECT * FROM dbo.NGLoop('xxx',10); -- should return nothing
Hopefully this is simple and self-explanatory enough to understand what I'm trying to do. a C# or VB.Net version should be fine. I've been racking my brain here, any help would be greatly appreciated. Thanks!
-- Itzik Ben-Gan 2001
June 16, 2016 at 11:07 pm
Shouldn't be too hard using Linq...
using System.Linq;
[Microsoft.SqlServer.Server.SqlFunction(FillRowMethodName = "clrNGLoop_FillRow", TableDefinition = "Position int, Token nvarchar(4000)")]
public static IEnumerable clrNGLoop(SqlString value, SqlInt32 n)
{
int idx = 1;
Lookup<int, char> l = (Lookup < int, char> )value.Value.ToLookup(x => idx++, y => y);
return l.Select(x => new Item() { Position = x.Key, Token = new string(l.Where(y => y.Key >= x.Key && y.Key < (x.Key + n.Value)).Select(z => z.First()).ToArray()) }).Where(v=>v.Token.Length == n.Value);
}
public static void clrNGLoop_FillRow(Object obj, out SqlInt32 position, out SqlString token)
{
Item item = (Item)obj;
position = new SqlInt32(item.Position);
token = new SqlString(item.Token);
}
public class Item
{
public int Position;
public string Token;
}
June 17, 2016 at 8:25 am
Just curious on why you want to use CLR for this. I hope that you made that function just to make it easy to translate to procedural programming.
If anyone else tries to do something like this in T-SQL, this should be a lot better.
CREATE FUNCTION dbo.NGLoop(@string varchar(8000), @N int)
RETURNS TABLE AS
RETURN
WITH
E(n) AS(
SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)
),
E2(n) AS(
SELECT a.n FROM E a, E b
),
E4(n) AS(
SELECT a.n FROM E2 a, E2 b
),
cteTally(n) AS(
SELECT TOP(LEN(@string)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) n
FROM E4
)
SELECT n, SUBSTRING( @string, n, @N) Token
FROM cteTally
WHERE n <= LEN(@string) - @N + 1
AND @N > 0;
GO
June 17, 2016 at 8:49 am
Luis Cazares (6/17/2016)
Just curious on why you want to use CLR for this.
+1000
--Jeff Moden
Change is inevitable... Change for the better is not.
June 18, 2016 at 12:17 pm
Con Alexis thank you sir. That's what I was looking for. Sorry for the late reply, yesterday was my birthday.
Luis. It's for an article I'm working on. I have a tally table based version and wanted to compare the performance to a CLR. I figured that a more people who could write CLRs would understand the procedural code.
-- Itzik Ben-Gan 2001
June 18, 2016 at 12:51 pm
Alan.B (6/18/2016)
Con Alexis thank you sir. That's what I was looking for. Sorry for the late reply, yesterday was my birthday.Luis. It's for an article I'm working on. I have a tally table based version and wanted to compare the performance to a CLR. I figured that a more people who could write CLRs would understand the procedural code.
I hope you had a nice birthday!
Seems like I wasn't too far away from the truth :-). I just got puzzled because the code didn't seem like your usual code. And I just thought on leaving the alternative for someone else.
June 26, 2016 at 4:04 pm
Hey Alan. Interesting request. I do have a suggestion for the code, but first I would like to mention that it is probably best to state at the beginning that the desire for the code is to do performance comparisons as it will help inform / guide responders. For example, it would help when deciding between expediency of writing the code and efficiency of processing the code. Case in point: LINQ is very powerful, flexible, and expressive. However, that does come at a cost at runtime.
Also, when comparing performance between T-SQL and SQLCLR code, it is important to test both a VARCHAR(4000) and NVARCHAR(4000) versions of the T-SQL code since SQLCLR can only do NVARCHAR. So testing both using NVARCHAR(4000) will get the truest sense of performance difference, and testing with VARCHAR(4000) will provide a pragmatic difference since if one knows that their data will only ever be VARCHAR, then that reality won't be constrained with trying to be fair in testing ;-). Please note that I am using 4000 instead of 8000 for the VARCHAR datatypes since it doesn't help to test 8000 characters against SQLCLR using NVARCHAR(4000) as the extra 4000 characters will be silently truncated, resulting in a different input string given to the SQLCLR object.
Here is the iTVF with Tally Table I used (I created the VARCHAR(4000) version by copying and pasting it, changing the name, and changing "@String" to be VARCHAR(4000)):
CREATE FUNCTION dbo.NGramITVF(@String NVARCHAR(4000), @N INT)
RETURNS TABLE
--WITH SCHEMABINDING
AS RETURN
WITH nums AS
(
SELECT TOP (LEN(@string) - (@N - 1))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS [Num]
FROM [sys].[all_columns]
)
SELECT nums.[Num] AS [Position],
SUBSTRING(@string, nums.[Num], @N) AS [Token]
FROM nums;
And here is the .NET / C# code that I used for the SQLCLR streaming TVF:
[Microsoft.SqlServer.Server.SqlFunction(FillRowMethodName = "NGramSolomon_FillRow",
TableDefinition = "Position INT, Token NVARCHAR(4000)")]
public static IEnumerable NGramSolomon(
[SqlFacet(MaxSize = 4000)] SqlChars InputString, SqlInt32 TokenSize)
{
if (InputString.IsNull || TokenSize.IsNull)
{
yield break;
}
int _Take = TokenSize.Value;
int _Limit = ((int)InputString.Length - _Take);
if (_Take < 1 || _Limit < 0)
{
yield break;
}
ResultRow _Result = new ResultRow();
char[] _Chars = InputString.Value;
char[] _Temp = new char[_Take];
for (int _Index = 0; _Index <= _Limit; _Index++)
{
Array.Copy(_Chars, _Index, _Temp, 0, _Take);
_Result.Position = (_Index + 1);
_Result.Token = new SqlChars(_Temp);
yield return _Result;
}
}
public class ResultRow
{
public int Position;
public SqlChars Token;
}
public static void NGramSolomon_FillRow(Object obj,
out SqlInt32 position, out SqlChars token)
{
ResultRow _Item = (ResultRow)obj;
position = new SqlInt32(_Item.Position);
token = _Item.Token;
}
Test results:
NVARCHAR(4000) Tests (4000 characters):
NGLoop : 60 - 80 ms
SQLCLR_ConAlexis : 375 - 400 ms
iTVF_TallyTable : 2 - 6 ms
iTVF_TallyTableVC : 2 - 6 ms
SQLCLR_Solomon : 2 - 9 ms
NVARCHAR(MAX) Tests (40,000 characters):
iTVF_TallyTable : 34 - 50 ms
SQLCLR_ConAlexis : 39,000 ms
SQLCLR_Solomon : 34 - 56 ms
As you can see, the T-SQL version is only a few milliseconds faster than the SQLCLR. Of course, perhaps Alan's version using his "numbersAB" iTVF will be slightly faster, or maybe had I done an inline Tally Table then the T-SQL version might have been slightly faster. But that still wouldn't be much of a difference.
Still, it seems that there is no compelling reason to use SQLCLR for this particular task. HOWEVER, in so far as the NGram stuff might be used for string searches, it should be noted that depending on what those strings are, there could be a case for using SQLCLR. That case would be if the input strings use combining characters. Combining characters are accents and other marks that can be added to many letters and will display in the same position as the base character, appearing as a single character. And multiple combining characters can be used. When present, the base character and any number of combining characters that follow it need to be treated as a single unit. T-SQL has no facility for handling combining sequences, but .NET does. And no, combining characters are not the same as Supplementary Characters, which can be handled correctly in T-SQL when using a Collation ending in _SC, which were introduced in SQL Server 2012.
Take care,
Solomon..
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
June 28, 2016 at 12:50 pm
Solomon Rutzky (6/26/2016)
Hey Alan. Interesting request. I do have a suggestion for the code, but first I would like to mention that it is probably best to state at the beginning that the desire for the code is to do performance comparisons as it will help inform / guide responders. For example, it would help when deciding between expediency of writing the code and efficiency of processing the code. Case in point: LINQ is very powerful, flexible, and expressive. However, that does come at a cost at runtime.Also, when comparing performance between T-SQL and SQLCLR code, it is important to test both a VARCHAR(4000) and NVARCHAR(4000) versions of the T-SQL code since SQLCLR can only do NVARCHAR. So testing both using NVARCHAR(4000) will get the truest sense of performance difference, and testing with VARCHAR(4000) will provide a pragmatic difference since if one knows that their data will only ever be VARCHAR, then that reality won't be constrained with trying to be fair in testing ;-). Please note that I am using 4000 instead of 8000 for the VARCHAR datatypes since it doesn't help to test 8000 characters against SQLCLR using NVARCHAR(4000) as the extra 4000 characters will be silently truncated, resulting in a different input string given to the SQLCLR object.
Here is the iTVF with Tally Table I used (I created the VARCHAR(4000) version by copying and pasting it, changing the name, and changing "@String" to be VARCHAR(4000)):
CREATE FUNCTION dbo.NGramITVF(@String NVARCHAR(4000), @N INT)
RETURNS TABLE
--WITH SCHEMABINDING
AS RETURN
WITH nums AS
(
SELECT TOP (LEN(@string) - (@N - 1))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS [Num]
FROM [sys].[all_columns]
)
SELECT nums.[Num] AS [Position],
SUBSTRING(@string, nums.[Num], @N) AS [Token]
FROM nums;
And here is the .NET / C# code that I used for the SQLCLR streaming TVF:
[Microsoft.SqlServer.Server.SqlFunction(FillRowMethodName = "NGramSolomon_FillRow",
TableDefinition = "Position INT, Token NVARCHAR(4000)")]
public static IEnumerable NGramSolomon(
[SqlFacet(MaxSize = 4000)] SqlChars InputString, SqlInt32 TokenSize)
{
if (InputString.IsNull || TokenSize.IsNull)
{
yield break;
}
int _Take = TokenSize.Value;
int _Limit = ((int)InputString.Length - _Take);
if (_Take < 1 || _Limit < 0)
{
yield break;
}
ResultRow _Result = new ResultRow();
char[] _Chars = InputString.Value;
char[] _Temp = new char[_Take];
for (int _Index = 0; _Index <= _Limit; _Index++)
{
Array.Copy(_Chars, _Index, _Temp, 0, _Take);
_Result.Position = (_Index + 1);
_Result.Token = new SqlChars(_Temp);
yield return _Result;
}
}
public class ResultRow
{
public int Position;
public SqlChars Token;
}
public static void NGramSolomon_FillRow(Object obj,
out SqlInt32 position, out SqlChars token)
{
ResultRow _Item = (ResultRow)obj;
position = new SqlInt32(_Item.Position);
token = _Item.Token;
}
Test results:
NVARCHAR(4000) Tests (4000 characters):
NGLoop : 60 - 80 ms
SQLCLR_ConAlexis : 375 - 400 ms
iTVF_TallyTable : 2 - 6 ms
iTVF_TallyTableVC : 2 - 6 ms
SQLCLR_Solomon : 2 - 9 ms
NVARCHAR(MAX) Tests (40,000 characters):
iTVF_TallyTable : 34 - 50 ms
SQLCLR_ConAlexis : 39,000 ms
SQLCLR_Solomon : 34 - 56 ms
As you can see, the T-SQL version is only a few milliseconds faster than the SQLCLR. Of course, perhaps Alan's version using his "numbersAB" iTVF will be slightly faster, or maybe had I done an inline Tally Table then the T-SQL version might have been slightly faster. But that still wouldn't be much of a difference.
Still, it seems that there is no compelling reason to use SQLCLR for this particular task. HOWEVER, in so far as the NGram stuff might be used for string searches, it should be noted that depending on what those strings are, there could be a case for using SQLCLR. That case would be if the input strings use combining characters. Combining characters are accents and other marks that can be added to many letters and will display in the same position as the base character, appearing as a single character. And multiple combining characters can be used. When present, the base character and any number of combining characters that follow it need to be treated as a single unit. T-SQL has no facility for handling combining sequences, but .NET does. And no, combining characters are not the same as Supplementary Characters, which can be handled correctly in T-SQL when using a Collation ending in _SC, which were introduced in SQL Server 2012.
Take care,
Solomon..
Great stuff Solomon! Its been a busy couple of weeks. I'm going to compile and play around with this code later today and post back. Thanks again sir.
-- Itzik Ben-Gan 2001
July 7, 2016 at 8:26 am
Ok Solomon, I tried to figure this out and failed. Forgive my CLR noobness (I create about 1 CLR/year) and I'm sure this is simple but I'm having a hard time compiling your code. I've been playing around with this on/off for the past week.
I've been trying with the csc.exe in .NET 3.5 and 2.0 I've added (then removed when it did not help) this to the top using <namespace> statements...
using System;
using System.Data;
using Microsoft.SqlServer.Server;
I saved your code as NGramsNV4000.cs and have been attempting to compile the code using this command:
C:\Windows\Microsoft.NET\Framework\v2.0.50727>csc.exe /target:library NGramsNV4000.cs
This is what I get in v2.0.50727:
C:\Windows\Microsoft.NET\Framework\v2.0.50727>csc.exe /target:library NGramsNV4000.cs
Microsoft (R) Visual C# 2005 Compiler version 8.00.50727.5483
for Microsoft (R) Windows (R) 2005 Framework version 2.0.50727
Copyright (C) Microsoft Corporation 2001-2005. All rights reserved.
NGramsNV4000.cs(5,15): error CS1518: Expected class, delegate, enum, interface,
or struct
NGramsNV4000.cs(6,29): error CS1518: Expected class, delegate, enum, interface,
or struct
NGramsNV4000.cs(21,26): error CS1518: Expected class, delegate, enum, interface,
or struct
NGramsNV4000.cs(23,7): error CS1001: Identifier expected
NGramsNV4000.cs(23,9): error CS1518: Expected class, delegate, enum, interface,
or struct
NGramsNV4000.cs(24,7): error CS1001: Identifier expected
NGramsNV4000.cs(24,9): error CS1518: Expected class, delegate, enum, interface,
or struct
NGramsNV4000.cs(24,21): error CS1518: Expected class, delegate, enum, interface,
or struct
NGramsNV4000.cs(24,32): error CS1518: Expected class, delegate, enum, interface,
or struct
NGramsNV4000.cs(31,23): error CS1518: Expected class, delegate, enum, interface,
or struct
NGramsNV4000.cs(34,2): error CS1022: Type or namespace definition, or
end-of-file expected
NGramsNV4000.cs(6,27): error CS0101: The namespace '<global namespace>' already
contains a definition for '?'
NGramsNV4000.cs(4,57): (Location of symbol related to previous error)
NGramsNV4000.cs(23,7): error CS0101: The namespace '<global namespace>' already
contains a definition for '?'
NGramsNV4000.cs(4,57): (Location of symbol related to previous error)
NGramsNV4000.cs(24,7): error CS0101: The namespace '<global namespace>' already
contains a definition for '?'
NGramsNV4000.cs(4,57): (Location of symbol related to previous error)
NGramsNV4000.cs(24,31): error CS0101: The namespace '<global namespace>' already
contains a definition for '?'
NGramsNV4000.cs(4,57): (Location of symbol related to previous error)
This is what I get in v3.5:
C:\Windows\Microsoft.NET\Framework\v3.5>csc.exe /target:library NGramsNV4000.cs
Microsoft (R) Visual C# 2008 Compiler version 3.5.30729.5420
for Microsoft (R) .NET Framework version 3.5
Copyright (C) Microsoft Corporation. All rights reserved.
NGramsNV4000.cs(8,15): error CS1518: Expected class, delegate, enum, interface,
or struct
NGramsNV4000.cs(9,29): error CS1518: Expected class, delegate, enum, interface,
or struct
NGramsNV4000.cs(24,26): error CS1518: Expected class, delegate, enum, interface,
or struct
NGramsNV4000.cs(26,7): error CS1001: Identifier expected
NGramsNV4000.cs(26,9): error CS1518: Expected class, delegate, enum, interface,
or struct
NGramsNV4000.cs(27,7): error CS1001: Identifier expected
NGramsNV4000.cs(27,9): error CS1518: Expected class, delegate, enum, interface,
or struct
NGramsNV4000.cs(27,21): error CS1518: Expected class, delegate, enum, interface,
or struct
NGramsNV4000.cs(27,32): error CS1518: Expected class, delegate, enum, interface,
or struct
NGramsNV4000.cs(34,23): error CS1518: Expected class, delegate, enum, interface,
or struct
NGramsNV4000.cs(37,2): error CS1022: Type or namespace definition, or
end-of-file expected
-- Itzik Ben-Gan 2001
July 7, 2016 at 8:43 am
Alan.B (7/7/2016)
Forgive my CLR noobness (I create about 1 CLR/year)...
You're ahead of me, there. My record has been an average of about -1 per year. That's not a mistake. I've been replacing SQLCLR with T-SQL. 😀 The best replacement, so far, has been for SQLCLR generic audit triggers at work, which were crushing performance. It was taking 4 minutes to update just 4 columns of just 10,000 rows of some 100+ column tables (not my design, for sure). The rewrite got that down to less than 800ms (yeah, I know... I'm slipping :-D) on the widest of the tables.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 9, 2016 at 2:26 pm
Alan.B (7/7/2016)
Ok Solomon, I tried to figure this out and failed. Forgive my CLR noobness (I create about 1 CLR/year) and I'm sure this is simple but I'm having a hard time compiling your code. I've been playing around with this on/off for the past week.I've been trying with the csc.exe in .NET 3.5 and 2.0 I've added (then removed when it did not help) this to the top using <namespace> statements...
using System;
using System.Data;
using Microsoft.SqlServer.Server;
I saved your code as NGramsNV4000.cs and have been attempting to compile the code using this command:
C:\Windows\Microsoft.NET\Framework\v2.0.50727>csc.exe /target:library NGramsNV4000.cs
Hey there, Alan. No need to apologize for anything.
Is there a reason why you aren't using Visual Studio? It makes working with .NET soooo much easier. The "community" edition is free. I can't think of a reason to not use it. Please try that and let me know how it goes.
Take care,
Solomon...
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
July 9, 2016 at 10:56 pm
Alan, if it helps, I just posted the full C# file on PasteBin at:
SQLCLR UDF to parse a string into NGrams
Take care,
Solomon...
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
July 12, 2016 at 7:18 pm
-1 is a wonderful answer. Mixed systems (those with more than one language embedded in each other) are always in efficient and a nightmare for anyone to read or maintain. Converting datatypes alone between languages is awful, but perhaps the biggest hit is finding someone that reads and writes fluently in two or more programming languages (forget about natural languages, we are we are worse). Since 80 to 90% of the cost of a system is in maintaining it, all this does is add to the overhead and total expense of the system the highest level.
Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
July 12, 2016 at 8:09 pm
Solomon Rutzky (7/9/2016)
Alan.B (7/7/2016)
Ok Solomon, I tried to figure this out and failed. Forgive my CLR noobness (I create about 1 CLR/year) and I'm sure this is simple but I'm having a hard time compiling your code. I've been playing around with this on/off for the past week.I've been trying with the csc.exe in .NET 3.5 and 2.0 I've added (then removed when it did not help) this to the top using <namespace> statements...
using System;
using System.Data;
using Microsoft.SqlServer.Server;
I saved your code as NGramsNV4000.cs and have been attempting to compile the code using this command:
C:\Windows\Microsoft.NET\Framework\v2.0.50727>csc.exe /target:library NGramsNV4000.cs
Hey there, Alan. No need to apologize for anything.
Is there a reason why you aren't using Visual Studio? It makes working with .NET soooo much easier. The "community" edition is free. I can't think of a reason to not use it. Please try that and let me know how it goes.
Take care,
Solomon...
Hey Solomon - I just wanted to post a quick update. I recently got a new PC and have not installed Visual Studio yet. I installed VS 2012 and SSDT today and plan on revisiting this - this week. I appreciate your help so much sir. Thanks!
-- Itzik Ben-Gan 2001
July 12, 2016 at 8:31 pm
CELKO (7/12/2016)
-1 is a wonderful answer.
To what? Who are you trying to communicate with here?
Mixed systems (those with more than one language embedded in each other) are always in efficient and a nightmare for anyone to read or maintain.
First, with all due respect, this statement is absolute nonsense. Second, who are you trying to communicate with here? Me? Are you trying to discourage me from using a CLR? Re-read this thread, particularly my reply to Luis' question.
...but perhaps the biggest hit is finding someone that reads and writes fluently in two or more programming languages (forget about natural languages, we are we are worse).
Also nonsense. Complete and absolute meaningless blather.
Welcome back Joe. I missed you.
-- Itzik Ben-Gan 2001
Viewing 15 posts - 1 through 15 (of 23 total)
You must be logged in to reply to this topic. Login to reply