How to use a CSV as parameter in a query

Question

How to use a CSV as parameter in a query

Viewing 5 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 1

Henk Schreij (11/10/2016)
Are you able to change the parameter to a table-parameter?
I wish 🙂
I get the parameters from a outsite source: the people who build the website.
I have some influence on what they pass on, but they came with the solution of a comma separated list of integers (or a space separated list, xml list, etc.).
I'll give it a try again.
@celko/jeff:
I tried the xml solution (I transformed the csv in xml), the performance is weird:
98 % is for the xml reader, 1% for the xml path filter and 1% for the sql excecution (a join of 10 small tables).
The total execution time is 660 msec (the first time with a new combination of parameters) or 15 msec (when parameters used again).
Not so happy with it.
It's the company policy that the DB is used for storing the tables (with FK's including cascades, defaults). But no stored procedures or functions.
I'll have a look, if they will make an exception for the SplitDelimiter function: much better performance I expect.
@celko: I have read your articles. Very good at explaining and well readable, but ...
I feel a bit bullied about the way you describe people who try to unfold a csv string.
I'm not so stupid as you write there, know something about normalization.

If you are never getting more than 5-6 values, then I recommend not to use XML but rather the DelimiterSplit8K or the logic from that function as Alan B demonstrated previously. The main reason for using XML is that it can handle sizes much greater than the DelimitedSplit8K function.

😎

There is another option , which is to use 0 padded fixed length entries, i.e.

0000100002000030000400005

which requires far less iterations for splitting and therefore are much much faster than any other splitting methods, would that be an option?

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 2

Further on the fixed width method, here is an example, 100 x more efficient than XML and roughly 4-8 x more efficient than DelimitedSplit8K.

😎

I have proven many times that generic functions and algorithms are slower than purposely build ones, this scenario is a good example of that rule.

DECLARE @PARAMSTR VARCHAR(8000) = '0001000200030004000500060007000800090010';

;WITH T(N) AS (SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0)) X(N))

, NUMS(N) AS

( SELECT 0 UNION ALL

SELECT TOP((LEN(@PARAMSTR) / 4) - 1) ROW_NUMBER() OVER (ORDER BY @@VERSION) * 4 FROM T T1,T T2,T T3

)

SELECT

CONVERT(INT,SUBSTRING(@PARAMSTR,NM.N + 1,4),0)

FROM NUMS NM;

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 3

Henk Schreij (11/10/2016)
It's the company policy that the DB is used for storing the tables (with FK's including cascades, defaults). But no stored procedures or functions.

Yowch!. I did some part time work for a company like that once. They were plagued by performance problems and every time I told them which query it was coming from the GUI, they said it was the ORM and couldn't do anything about it. They wallowed for a couple of years of customer complaints about having to buy monster hardware to support their app before finally seeing the light when it comes to stored procedures and the like.

I tried the xml solution (I transformed the csv in xml), the performance is weird:
98 % is for the xml reader, 1% for the xml path filter and 1% for the sql excecution (a join of 10 small tables).
The total execution time is 660 msec (the first time with a new combination of parameters) or 15 msec (when parameters used again).
Not so happy with it.

That's been a lot like my experience with XML in general, as well.

I'll have a look, if they will make an exception for the SplitDelimiter function: much better performance I expect.

In most cases, it blows the doors off he XML method. Be aware that if you modify it to use one of the MAX datatypes, it will run twice as slow. If you need to split one of the MAX datatypes on a regular basis, I recommend that someone write a well written splitter in one of the .NET languages and turn it into a CLR function. Be advised, though, that a lot of well-meaning folks flub that pretty badly. They miss things like what leading, trailing, and adjacent delimiters are supposed to mean. To me, leading/trailing means starting/ending with an empty element and adjacent delimiters mean the element between them is also empty. A couple of ready-to-use functionality tests are included in the header of the DelimitedSplit8K function.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 4

Eirikur Eiriksson (11/10/2016)
Further on the fixed width method, here is an example, 100 x more efficient than XML and roughly 4-8 x more efficient than DelimitedSplit8K.
😎
I have proven many times that generic functions and algorithms are slower than purposely build ones, this scenario is a good example of that rule.

Have regularly done both myself and 100% agree with both notions.

I'll add that the fixed field format for files also forces people to do it right and, except for truncation on their end, is really easy to detect formation and transmission errors not to mention being nasty fast.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 5

Jeff Moden (11/10/2016)
Eirikur Eiriksson (11/10/2016)
Further on the fixed width method, here is an example, 100 x more efficient than XML and roughly 4-8 x more efficient than DelimitedSplit8K.
😎
I have proven many times that generic functions and algorithms are slower than purposely build ones, this scenario is a good example of that rule.
Have regularly done both myself and 100% agree with both notions.
I'll add that the fixed field format for files also forces people to do it right and, except for truncation on their end, is really easy to detect formation and transmission errors not to mention being nasty fast.

Now I wonder where I got this habit from....;-)

😎

As always, the proof is in the pudding

USE TEEST;

GO

SET NOCOUNT ON;

DECLARE @PARAMSTR VARCHAR(8000) = REPLICATE(CONVERT(VARCHAR(8000),'0001000200030004000500060007000800090010',0),800);

DECLARE @PARAMSTX VARCHAR(8000) = REPLICATE(CONVERT(VARCHAR(8000),'1,2,3,4,5,6,7,8,9,10,',0),800);

DECLARE @PARAXML XML = REPLICATE(CONVERT(VARCHAR(8000),'<X>1</X><X>2</X><X>3</X><X>4</X><X>5</X>',0),800);

DECLARE @INT_BUCKET INT = 0;

DECLARE @CHR_BUCKET VARCHAR(8000) = '';

DECLARE @TIMER TABLE (T_TEXT VARCHAR(100) NOT NULL, T_TS DATETIME2(7) NOT NULL DEFAULT (SYSDATETIME()));

INSERT INTO @TIMER(T_TEXT) VALUES ('FIXED');

;WITH T(N) AS (SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0)) X(N))

, NUMS(N) AS

( SELECT 0 UNION ALL

SELECT TOP((LEN(@PARAMSTR) / 4) - 1) ROW_NUMBER() OVER (ORDER BY @@VERSION) * 4 FROM T T1,T T2,T T3

)

SELECT

@INT_BUCKET = CONVERT(INT,SUBSTRING(@PARAMSTR,NM.N + 1,4),0)

FROM NUMS NM;

INSERT INTO @TIMER(T_TEXT) VALUES ('FIXED');

INSERT INTO @TIMER(T_TEXT) VALUES ('DELIMITED');

SELECT

@CHR_BUCKET = X.Item

FROM dbo.DelimitedSplit8K(@PARAMSTX,CHAR(44)) X;

INSERT INTO @TIMER(T_TEXT) VALUES ('DELIMITED');

INSERT INTO @TIMER(T_TEXT) VALUES ('XML');

SELECT

@INT_BUCKET = P.DATA.value('(./text())[1]','INT')

FROM @PARAXML.nodes('/X') P(DATA);

INSERT INTO @TIMER(T_TEXT) VALUES ('XML');

SELECT

T.T_TEXT

,DATEDIFF(MICROSECOND,MIN(T_TS),MAX(T.T_TS)) AS DURATION

FROM @TIMER T

GROUP BY T.T_TEXT

ORDER BY DURATION;

Results on an W10 Tablet (4 core Atom)

T_TEXT DURATION

----------- ---------

FIXED 1004

XML 15629

DELIMITED 108780

Only have access to my tablet toy at the moment, would like to see the results of this test from a proper PC/Laptop as the toy is a cpu starved SD card storage thingy 🙂