August 25, 2016 at 3:57 pm
Matt Miller (#4) (8/25/2016)
I'm sorry if I am taking the fun out of building a SQL only solution, but there are lots of basic samples of SQLCLR showing how to expose the Regex.Matches() method from .NET CLR.A good evrsion is located here courtesy of Phil Factor:
https://www.simple-talk.com/sql/t-sql-programming/clr-assembly-regex-functions-for-sql-server-by-example/[/url]
NO, NO, NO!
Use NGrams8K[/url]. It will perform better than any CLR or regex solution Ive ever seen and you dont need to install anything, learn any new programming languages, change any SQL Server settings or add any new assemblies.
I'm on a train without SQL Server access at the moment but I'll post something later tonight unless someone beats me to it.
-- Itzik Ben-Gan 2001
August 25, 2016 at 4:52 pm
The content column of the catalog is xml data, so why not query it as such?
I don't have a sql 2012 RS database to grab the correct namespace and I don't have any DataSourceReference tags in my 2016 data, but the general format of the query would be this for 2016.
WITH XMLNAMESPACES( default 'http://schemas.microsoft.com/sqlserver/reporting/2016/01/reportdefinition')
SELECT
cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[1]','varchar(100)') as FirstDataSource
, cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[2]','varchar(100)') as SecondDataSource
FROM [ReportServer].[dbo].[Catalog]
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
August 25, 2016 at 8:29 pm
Thx but that is not really the challenge...
What I need is an expression that will give me the 3 letters that follow the nth occurrence of the key word if it exists..
I mean see the code below, I need the 3 letters that follow the 2nd occurrence of the word <ConnectString>
...
Sorry forgot to paste the code in my previous post, so here it is. What I meant was after the word <DataReference>
Ok, to get 3 letters that follow the nth occurrence of the key word if it exists using NGrams8K[/url]:
-- User supplied variables
DECLARE
@searchString varchar(100) = '<DataReference>', -- string to search for
@N tinyint = 2; -- occurance
SELECT [str], newstring = SUBSTRING([str], position+LEN(@searchstring), 3)
FROM
(
SELECT
tokenNumber = ROW_NUMBER() OVER (PARTITION BY [str] ORDER BY position),
[str],
position
FROM #t AS t
CROSS APPLY dbo.NGrams8k([str], LEN(@searchString))
WHERE token = @searchString
) ng
WHERE tokenNumber = @N;
As Magoo suggested, because you're working with XML you can just shred it using XQuery. First some sample data:
-- create sample XML data
if object_id('tempdb..#t2') is not null DROP TABLE #t2;
WITH iTally(N) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM (VALUES (1),(1),(1),(1),(1),(1)) x(a)
),
D AS
(
SELECT t2.N, LEFT(newid(),5) AS DataReference
FROM iTally t1, iTally t2
WHERE t2.N < 6
)
SELECT SomeID = N, SampleXML =
(
SELECT DataReference
FROM D D2
WHERE D.N = D2.N
FOR XML PATH(''), ROOT('References'), TYPE
)
INTO #t2
FROM D
GROUP BY N;
... and the solution
DECLARE @N tinyint = 2;
SELECT SomeId, txt --, SampleXML -- uncomment for testing
FROM
(
SELECT
SomeId,
txt = LEFT(x.value('(./text())[1]', 'varchar(100)'),3),
SampleXML,
rn =
ROW_NUMBER() OVER
(
PARTITION BY SomeId
ORDER BY x.value('(../DataReference)[1]', 'varchar(100)')
)
FROM #t2 t
CROSS APPLY t.SampleXML.nodes('*:References/DataReference') tx(x)
) AS splitNodes
WHERE rn = @N;
-- Itzik Ben-Gan 2001
August 26, 2016 at 12:47 am
Matt Miller (#4) (8/25/2016)
I'm sorry if I am taking the fun out of building a SQL only solution, but there are lots of basic samples of SQLCLR showing how to expose the Regex.Matches() method from .NET CLR.A good evrsion is located here courtesy of Phil Factor:
https://www.simple-talk.com/sql/t-sql-programming/clr-assembly-regex-functions-for-sql-server-by-example/[/url]
Hey there ol' friend... you and I went through something similar years ago where regex didn't do so well performance-wise. Have you tried the performance for this solution?
--Jeff Moden
Change is inevitable... Change for the better is not.
August 26, 2016 at 8:01 am
This seems like a close solution to what I want. However I am getting a syntax error ( listed below ) Can you help me with the syntax.
WITH XMLNAMESPACES( 'http://schemas.microsoft.com/sqlserver/reporting/2008/01/reportdefinition' as pd )
SELECT
cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[1]','varchar(100)') as FirstDataSource
, cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[2]','varchar(100)') as SecondDataSource
FROM [ReportServer].[dbo].[Catalog]
--------
Msg 9420, Level 16, State 1, Line 1
XML parsing: line 1, character 3, illegal xml character
August 26, 2016 at 8:08 am
The problem with the proposed xml solutions, is that the sample data is not xml. It has to be formatted to be xml and that would be more work.
That's if the sample data actually resembles the real data.
August 26, 2016 at 8:18 am
mw112009 (8/26/2016)
This seems like a close solution to what I want. However I am getting a syntax error ( listed below ) Can you help me with the syntax.WITH XMLNAMESPACES( 'http://schemas.microsoft.com/sqlserver/reporting/2008/01/reportdefinition' as pd )
SELECT
cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[1]','varchar(100)') as FirstDataSource
, cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[2]','varchar(100)') as SecondDataSource
FROM [ReportServer].[dbo].[Catalog]
--------
Msg 9420, Level 16, State 1, Line 1
XML parsing: line 1, character 3, illegal xml character
That looks like you might have some invalid xml in the Content of your Catalog.
Perhaps try it just for one row first, either using TOP(1) or by selecting a specific report.
Alternatively, you could try this
SELECT
cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[1]','varchar(100)') as FirstDataSource
, cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[2]','varchar(100)') as SecondDataSource
FROM (
SELECT * --Sorry, don't have the actual column names to hand
FROM [ReportServer].[dbo].[Catalog]
WHERE TRY_CONVERT(XML, cast([Content] as varbinary(max)) ) IS NOT NULL
) Catalog
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
August 26, 2016 at 8:19 am
All set for now ... Just by adding a line to the where clause I managed to get rid of that error..
WORKS fine.. but I do have a second question will post it in a few minutes.
So the following query works fine.
WITH XMLNAMESPACES( DEFAULT 'http://schemas.microsoft.com/sqlserver/reporting/2008/01/reportdefinition' )
SELECT
name
, cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[1]','varchar(100)') as FirstDataSource
, cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[2]','varchar(100)') as SecondDataSource
FROM [ReportServer].[dbo].[Catalog]
WHERE
type = 2
August 26, 2016 at 8:56 am
Question
Consider the segment of xml pasted below
<DataSources>
<DataSource Name="EDW">
<DataSourceReference>EDW</DataSourceReference>
<rd:SecurityType xmlns:rd="http://schemas.microsoft.com/SQLServer/reporting/reportdesigner">None</rd:SecurityType>
<rd:DataSourceID xmlns:rd="http://schemas.microsoft.com/SQLServer/reporting/reportdesigner">91905df0-3a05-4d1b-a59b-568a018c7ad1
</rd:DataSourceID>
</DataSource>
</DataSources>
The Query works fine, except i am not getting a value for the <DataSourceReference> ,
So the last item in my SELECT STMT needs some syntax correction.
WITH XMLNAMESPACES( DEFAULT 'http://schemas.microsoft.com/sqlserver/reporting/2008/01/reportdefinition' )
SELECT
name
, cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[1]','varchar(100)') as FirstDataSource
, cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[2]','varchar(100)') as SecondDataSource
, cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[3]','varchar(100)') as ThirdDataSource
, cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[4]','varchar(100)') as FourthDataSource
, cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[5]','varchar(100)') as FifthDataSource
, cast(cast([Content] as varbinary(max)) as xml).value('(DataSourceReference)[1]','VARCHAR(250)') as DataSourceReference1
FROM [ReportServer].[dbo].[Catalog]
WHERE
type = 2
August 26, 2016 at 9:21 am
mw112009 (8/26/2016)
QuestionConsider the segment of xml pasted below
<DataSources>
<DataSource Name="EDW">
<DataSourceReference>EDW</DataSourceReference>
<rd:SecurityType xmlns:rd="http://schemas.microsoft.com/SQLServer/reporting/reportdesigner">None</rd:SecurityType>
<rd:DataSourceID xmlns:rd="http://schemas.microsoft.com/SQLServer/reporting/reportdesigner">91905df0-3a05-4d1b-a59b-568a018c7ad1
</rd:DataSourceID>
</DataSource>
</DataSources>
The Query works fine, except i am not getting a value for the <DataSourceReference> ,
So the last item in my SELECT STMT needs some syntax correction.
WITH XMLNAMESPACES( DEFAULT 'http://schemas.microsoft.com/sqlserver/reporting/2008/01/reportdefinition' )
SELECT
name
, cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[1]','varchar(100)') as FirstDataSource
, cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[2]','varchar(100)') as SecondDataSource
, cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[3]','varchar(100)') as ThirdDataSource
, cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[4]','varchar(100)') as FourthDataSource
, cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[5]','varchar(100)') as FifthDataSource
, cast(cast([Content] as varbinary(max)) as xml).value('(DataSourceReference)[1]','VARCHAR(250)') as DataSourceReference1
FROM [ReportServer].[dbo].[Catalog]
WHERE
type = 2
You have the wrong XPATH and need to grab the text() node
WITH XMLNAMESPACES( DEFAULT 'http://schemas.microsoft.com/sqlserver/reporting/2008/01/reportdefinition' )
SELECT
name
, cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[1]','varchar(100)') as FirstDataSource
, cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[2]','varchar(100)') as SecondDataSource
, cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[3]','varchar(100)') as ThirdDataSource
, cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[4]','varchar(100)') as FourthDataSource
, cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/@Name)[5]','varchar(100)') as FifthDataSource
, cast(cast([Content] as varbinary(max)) as xml).value('(//DataSources/DataSource/DataSourceReference/text())[1]','VARCHAR(250)') as DataSourceReference1
FROM [ReportServer].[dbo].[Catalog]
WHERE
type = 2
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
August 26, 2016 at 11:03 am
Thanks a lot , I need some syntax help here ....
The syntax is wrong below,
How would I extract the INITIAL CATALOG and the [Data Source] from the <ConnectString> node ?
Help is appreciated
/*
<?xml version="1.0" encoding="utf-8"?>
<DataSourceDefinition xmlns="http://schemas.microsoft.com/sqlserver/reporting/2006/03/reportdatasource">
<Extension>SQL</Extension>
<ConnectString>Data Source=MHPDATA2;Initial Catalog=CCMSPROD</ConnectString>
<CredentialRetrieval>Integrated</CredentialRetrieval>
<Enabled>True</Enabled>
</DataSourceDefinition>
*/
If object_id('tempdb..#DataSources') IS NOT NULL DROP TABLE #DataSources;
create table #DataSources( name varchar(500) , DataSource varchar(500), InitialCatalog varchar(500) );
WITH XMLNAMESPACES( DEFAULT 'http://schemas.microsoft.com/sqlserver/reporting/2008/01/reportdefinition' )
SELECT
name
, cast(cast([Content] as varbinary(max)) as xml).value('(//ConnectString/text())[1]','varchar(100)') as [Data Source]
, cast(cast([Content] as varbinary(max)) as xml).value('(//ConnectString/text())[1]','varchar(100)') as [Initial Catalog]
FROM [ReportServer].[dbo].[Catalog]
WHERE
type = 5
August 26, 2016 at 11:15 am
You need to extract the <ConnectString> text() node as shown previously, then use something like DelimitedSplit8K[/url] to split that string into pieces.
I'm sorry but I don't have time right now to code that, maybe someone else can help?
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
August 26, 2016 at 11:22 am
Thanks.. I think I can handle from here.. Shall post the solution later..
August 26, 2016 at 7:45 pm
Jeff Moden (8/26/2016)
Matt Miller (#4) (8/25/2016)
I'm sorry if I am taking the fun out of building a SQL only solution, but there are lots of basic samples of SQLCLR showing how to expose the Regex.Matches() method from .NET CLR.A good evrsion is located here courtesy of Phil Factor:
https://www.simple-talk.com/sql/t-sql-programming/clr-assembly-regex-functions-for-sql-server-by-example/[/url]
Hey there ol' friend... you and I went through something similar years ago where regex didn't do so well performance-wise. Have you tried the performance for this solution?
Actually I just did. It's actually impressive, and having revalidated against the CLR solution, at least from my test - Alan's solution does beat CLR. But the interesting thing is that while running it on 2014, on large datasets (20M rows, 2-3k strings with avg 7 matches), on my machine - they are pretty darned close (total run took 8:30):
Ngrams8k came in at 4:08, CLR @ 4:22.
Which was about the same gap got on "small" datasets (100K rows, 250 char length, 3-4 matches). About 4 secs for Ngrams, 19 secs on CLR.
So - NGRAMS8k clearly does seem to perform a bit better, just not the blowout I was half expecting to see. Still it was the first time I browsed through Alan's article on it, so will kick the tires a bit more.
Thanks for the heads up.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
August 29, 2016 at 6:16 am
Thanks, Matt. I sure do appreciate the testing and the feedback.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 16 through 29 (of 29 total)
You must be logged in to reply to this topic. Login to reply