June 29, 2017 at 2:47 am
Sergiy - Thursday, June 29, 2017 2:09 AMJust tried to load CSV file with freaky quoted values into LibreOffice Calc - works perfectly.
MS could contract those programmers...
I only know of one RFC-4180 compliant parser function in T-SQL and that is the one I wrote few years back, always found it strange that this essential functionality is missing in SQL Server.
😎
The functionCREATE OR ALTER FUNCTION dbo.ITVFN_CSV_SPLITTER_4180_8K
(
@pString VARCHAR(8000)
,@pDelimiter CHAR(1)
,@pTxtQualifier CHAR(1)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
-- *********************************************************************
-- RFC-4180 compliant CSV splitter
-- Eirikur Eiriksson
--
-- This function is used for Comma-Separated Values (CSV) of MIME
-- type "text/csv".
--
-- Common Format and MIME Type for Comma-Separated Values (CSV) Files
-- https://tools.ietf.org/html/rfc4180
--
-- First published 2014/07/24
-- http://www.sqlservercentral.com/articles/SQL+Server+2012/106903/
--
-- *********************************************************************
-- USAGE
--
-- DECLARE @pString VARCHAR(8000) = ''
-- DECLARE @pDelimiter CHAR(1) = ',';
-- DECLARE @pTxtQualifier CHAR(1) = '"';
-- SELECT @pString = '100123,"Blues","West Point, Mississippi","Burnett,'
-- + '""Howlin'' Wolf"" Arthur Chester",1910-06-10,';
-- SELECT
-- CSV.ItemNumber
-- ,csv.Item
-- FROM dbo.ITVFN_CSV_SPLITTER_4180_8K(@pString,@pDelimiter,@pTxtQualifier) CSV
-- *********************************************************************
-- Inline Tally table returning a number sequence equivalent
-- to the length of the input string. The Cast of the Len() to a
-- Bigint prevents an implicit cast.
-- *********************************************************************
WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
, NUMS(N) AS (SELECT TOP(CONVERT(BIGINT,LEN(@pString),0)) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
FROM T T1,T T2,T T3,T T4)
-- ********************************************************************
-- Retrieve the position (N) and the character code (chrCode)
-- for all delimiters (@pDelimiter) and text qualifiers
-- (@pTxtQualifier)
-- ********************************************************************
,ctePrimer(N,chrCode) AS
(
SELECT
t.N
,UNICODE(SUBSTRING(@pString,t.N,1)) AS chrCode
FROM NUMS t
WHERE SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN
= @pDelimiter COLLATE Latin1_General_BIN
OR SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN
= @pTxtQualifier COLLATE Latin1_General_BIN
)
-- ********************************************************************
-- The cteStart encloses the string in virtual delimiters using
-- Union All at the beginning and the end. The main body sets the
-- IsDelim and IsTxQf flags.
-- ********************************************************************
,cteStart(N,IsDelim,IsTQA) AS
(
SELECT
0 AS N
,1 AS IsDelim
,0 AS IsTxQf
UNION ALL
SELECT
t.N
,(1 - SIGN(ABS(t.chrCode - UNICODE(@pDelimiter)))) AS IsDelim
,(1 - SIGN(ABS(t.chrCode - UNICODE(@pTxtQualifier)))) AS IsTxQf
FROM ctePrimer t
UNION ALL
SELECT
LEN(@pString) + 1 AS N
,1 AS IsDelim
,0 AS IsTxQf
)
-- ********************************************************************
-- Position (N), Delimiter flag (IsDelim), Text Qualifier flag
-- (IsTQA) and the running total of the number of appearance of
-- Text Qualifiers
-- ********************************************************************
,cteWorkSet AS
(
SELECT
cST.N
,cST.IsDelim
,cST.IsTQA
,SUM(cST.IsTQA)
OVER (ORDER BY cST.N
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS TQRT
,CASE
WHEN SUM(cST.IsTQA)
OVER (ORDER BY cST.N
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) % 2 = 1 THEN 'X'
ELSE ''
END AS Active
FROM cteStart cST
)
-- ********************************************************************
-- Determine the token type and the types of the preceding and
-- following tokens.
-- ********************************************************************
,cteWSTQ(P_START,IsDelim,NEXT_IsTQA,LAG_IsTQA) AS
(
SELECT
cWS.N AS P_START
,cWS.IsDelim AS IsDelim
,LEAD(cWS.IsTQA,1,0) OVER (ORDER BY cWS.N) AS NEXT_IsTQA
,LAG(cWS.IsTQA,1,0) OVER (ORDER BY cWS.N) AS LAG_IsTQA
FROM cteWorkSet cWS
WHERE cWS.IsDelim = 1
OR cWS.IsTQA = 1
)
-- ********************************************************************
-- Calculate the start position and the length of each element
-- ********************************************************************
,cteWSLEN(P_START,P_LEN) AS
(
SELECT
(X.P_START + X.NEXT_IsTQA + SIGN(X.P_START)) AS P_START
,(LEAD(X.P_START,1,0) OVER (ORDER BY X.P_START) -
((X.P_START + X.NEXT_IsTQA) + SIGN(X.P_START) +
LEAD(X.LAG_IsTQA,1,0) OVER (ORDER BY X.P_START))) AS P_LEN
FROM cteWSTQ X WHERE X.IsDelim = 1
)
-- ********************************************************************
-- Splitting the string using the output of the cteWSLEN, filtering
-- it by the length being non-negative value. The NULLIF returns NULL
-- if the field is empty.
--
-- ********************************************************************
SELECT
ROW_NUMBER() OVER (ORDER BY @@VERSION) AS ItemNumber
,NULLIF(REPLACE(SUBSTRING(@pString,cWL.P_START,cWL.P_LEN),@pTxtQualifier+@pTxtQualifier,@pTxtQualifier),'') AS Item
FROM cteWSLEN cWL
WHERE cWL.P_LEN > -1;
GO
July 8, 2017 at 7:34 pm
OMG... after working with SQL Server for decades, MS has finally done something about one of my decades-old wishes. They've made OPENROWSET handle "true CSV" (RFC 4180 compliant) files. Too bad it's only for 2017. Look for "FORMAT = 'CSV'" without the double quotes in the following article.
https://docs.microsoft.com/en-us/sql/t-sql/functions/openrowset-transact-sql
Now if we could just get them to do the same thing with the much faster BCP and BULK INSERT functionality along with allowing you to pass a variable for things like the file path, etc. It would be a big help if they also did such a thing for OPENROWSET.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 8, 2017 at 10:51 pm
Jeff Moden - Saturday, July 8, 2017 7:34 PMOMG... after working with SQL Server for decades, MS has finally done something about one of my decades-old wishes. They've made OPENROWSET handle "true CSV" (RFC 4180 compliant) files. Too bad it's only for 2017. Look for "FORMAT = 'CSV'" without the double quotes in the following article.
https://docs.microsoft.com/en-us/sql/t-sql/functions/openrowset-transact-sqlNow if we could just get them to do the same thing with the much faster BCP and BULK INSERT functionality along with allowing you to pass a variable for things like the file path, etc. It would be a big help if they also did such a thing for OPENROWSET.
Heh - they already let you use a variable for the filename in BULK INSERT. Assign the statement to a variable. 😉 It's amazing what we've gotten use to doing to get around the limitations imposed on us, isn't it?
July 9, 2017 at 6:32 am
Jeff Moden - Saturday, July 8, 2017 7:34 PMOMG... after working with SQL Server for decades, MS has finally done something about one of my decades-old wishes. They've made OPENROWSET handle "true CSV" (RFC 4180 compliant) files. Too bad it's only for 2017. Look for "FORMAT = 'CSV'" without the double quotes in the following article.
https://docs.microsoft.com/en-us/sql/t-sql/functions/openrowset-transact-sqlNow if we could just get them to do the same thing with the much faster BCP and BULK INSERT functionality along with allowing you to pass a variable for things like the file path, etc. It would be a big help if they also did such a thing for OPENROWSET.
Thanks for the heads up Jeff!
😎
July 10, 2017 at 5:20 am
Ed Wagner - Saturday, July 8, 2017 10:51 PMJeff Moden - Saturday, July 8, 2017 7:34 PMOMG... after working with SQL Server for decades, MS has finally done something about one of my decades-old wishes. They've made OPENROWSET handle "true CSV" (RFC 4180 compliant) files. Too bad it's only for 2017. Look for "FORMAT = 'CSV'" without the double quotes in the following article.
https://docs.microsoft.com/en-us/sql/t-sql/functions/openrowset-transact-sqlNow if we could just get them to do the same thing with the much faster BCP and BULK INSERT functionality along with allowing you to pass a variable for things like the file path, etc. It would be a big help if they also did such a thing for OPENROWSET.
Heh - they already let you use a variable for the filename in BULK INSERT. Assign the statement to a variable. 😉 It's amazing what we've gotten use to doing to get around the limitations imposed on us, isn't it?
😉 There is no spoon. There are no limitations. It's all a workaround. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 5 posts - 31 through 34 (of 34 total)
You must be logged in to reply to this topic. Login to reply