November 6, 2009 at 9:51 pm
Please also consider using an SSIS package to import data from files to a database - that is its primary purpose in life, and I don't like to see a good tool get upset. :-)
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
November 10, 2009 at 6:39 am
Paul White (11/6/2009)
Please also consider using an SSIS package to import data from files to a database - that is its primary purpose in life, and I don't like to see a good tool get upset. :-)
SSIS requires a fixed number of columns in the import definition.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
November 10, 2009 at 3:33 pm
GSquared (11/10/2009)
SSIS requires a fixed number of columns in the import definition.
Well it certainly prefers to deal with stable metadata, that's for sure. But, unless the requirement is actually to import a file with truly 'any structure' into an unspecified table, there's often a way around it.
Most frequently I've come across this as different files being subsets of a larger structure, or something like that. If there's some structure or logic to work with, it's often possible to do.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
November 10, 2009 at 3:53 pm
Jeff Moden (11/6/2009)
Simon Parry (11/6/2009)
Hi, I have tried using BULK INSERT
using
set @string ='BULK INSERT CSVTest FROM ''\\10.1.2.107\rp_uploaded_files\'+@file+''' WITH (FIELDTERMINATOR = '','',ROWTERMINATOR = ''\n'')'
exec (@string)
but this requires the destination table to have the same number of columns as the import file.
The files I want to import have an unknown number of columns.
Can you help at all?
Simon
Yep... attach a copy of one of the files (unless it has private info in it) to your next post and tell me what you think you'd like to do with it.
Simon?
--Jeff Moden
Change is inevitable... Change for the better is not.
November 10, 2009 at 7:35 pm
Hi
Sorry for the delay
I have attached 3 sample files; it's possible any one of these may be used.
If the files were the same size each time I could do it, but because they have a varying number of columns I'm stuck.
Many thanks for all your help :-)
Simon
November 10, 2009 at 11:18 pm
That's the other piece of the puzzle I need, Simon... what do you want to do with the data when it's brought in? Separate tables? Common Table? If a common table, what do you want it to look like?
...ooooorrrrrrr....
... do you just want to be able to use the data short term as if it were in a table?
--Jeff Moden
Change is inevitable... Change for the better is not.
November 11, 2009 at 1:52 am
Hi Jeff
Thanks for your reply.
I would want the data stored in separate tables.
Thanks,
Simon
April 23, 2010 at 11:03 am
Using BULK INSERT won't work if there is a variable number of columns in the data file.
Something like:
fld1A,fld2A,fld3A
fld1B,fld2B,fld3B
fld1C,fld2C,fld3C,fld4C
Oddly, the old DTS data pump/bulk insert tasks were fine with the above.
They must have used the MSDASQL driver for the insertion process.
:-)
April 23, 2010 at 12:58 pm
Wow... sorry about that, folks. I really lost track of this post.
Simon, after only 5 months of delay :blush:, do you still need any help on this?
--Jeff Moden
Change is inevitable... Change for the better is not.
April 24, 2010 at 12:25 am
Jeff Moden (4/23/2010)
Wow... sorry about that, folks. I really lost track of this post. Simon, after only 5 months of delay :blush:, do you still need any help on this?
Jeff, I would like to see a solution for this. I suspect I might run into this problem in a short while; mine is exactly what Simon Parry was talking about. I am going to get inconsistent CSV files (meaning the number of columns will differ every day) and I need to import them into a table. So if you have a solution - and I am sure it will be a good one :-) - please share it with us!
Thanks in advance, Jeff!
April 24, 2010 at 12:35 am
I have 2 solutions in mind.
Solution 1:
1. Ask the CSV team to tell us whether extra columns will be added in today's file.
2. Create several format (*.fmt) files and use the appropriate one based on the CSV team's answer.
3. Use BCP/BULK INSERT along with the format files to import the data (a rough sketch follows).
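For example - purely as a sketch, with hypothetical file names, paths, table names, and column names - a non-XML format file for a 3-column day and the BULK INSERT that references it might look like this:
-- ThreeCol.fmt (hypothetical non-XML format file: one row per column,
-- every column loaded as character data, last column terminated by CR/LF)
-- 10.0
-- 3
-- 1  SQLCHAR  0  8000  ","     1  Col1  ""
-- 2  SQLCHAR  0  8000  ","     2  Col2  ""
-- 3  SQLCHAR  0  8000  "\r\n"  3  Col3  ""
BULK INSERT dbo.ThreeColStaging                   --hypothetical staging table for the 3-column layout
FROM '\\10.1.2.107\rp_uploaded_files\Sample1.csv' --hypothetical file on the OP's UNC share
WITH (
      FORMATFILE = 'C:\Temp\ThreeCol.fmt',
      FIRSTROW   = 2                              --skip the header row
     );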
Solution 2:
1. Use OPENROWSET to capture the column names from the CSV file.
2. Create a table with those column names (using NVARCHAR for all the columns, of course).
3. Create a BULK INSERT statement using the above table name, dynamically.
4. EXEC the dynamic statement (see the sketch below).
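Something along these lines - a minimal sketch only, with hypothetical paths and table names, assuming the first row of the file holds the column names, the account has the necessary bulk-load permissions, and a CSV splitter such as the DelimitedSplit8K function shown later in this thread is available:
DECLARE @Header    VARCHAR(8000),
        @SQLCreate VARCHAR(MAX),
        @SQLBulk   VARCHAR(MAX);
--===== 1. Read the whole file as a single blob and keep just the first (header) line
SELECT @Header = BulkColumn
  FROM OPENROWSET(BULK 'C:\Temp\Sample1.csv', SINGLE_CLOB) f; --hypothetical path
SELECT @Header = REPLACE(LEFT(@Header, CHARINDEX(CHAR(10), @Header + CHAR(10)) - 1), CHAR(13), '');
--===== 2. Build and run a CREATE TABLE from the header names, all columns NVARCHAR
SELECT @SQLCreate = ISNULL(@SQLCreate + ',' + CHAR(10), '')
     + QUOTENAME(LTRIM(RTRIM(Item))) + ' NVARCHAR(4000)'
  FROM dbo.DelimitedSplit8K(@Header, ',');
SELECT @SQLCreate = 'CREATE TABLE dbo.TodaysImport' + CHAR(10) + '(' + CHAR(10) --hypothetical table name
     + @SQLCreate + CHAR(10) + ');';
EXEC (@SQLCreate);
--===== 3 & 4. Build and run the BULK INSERT against the new table, skipping the header row
SELECT @SQLBulk = 'BULK INSERT dbo.TodaysImport
FROM ''C:\Temp\Sample1.csv''
WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'', FIRSTROW = 2);';
EXEC (@SQLBulk);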
Hmmm... hope these 2 solutions are feasible. I did a POC of Solution 1 and it worked fine. For Solution 2, I am yet to start! I am also quite interested in GSquared's import-to-temp-table solution; I will give it a shot in some time!
Now, Jeff, waiting for your input on this "tricky" problem. :-)
Cheers!!
April 24, 2010 at 3:42 pm
ColdCoffee (4/24/2010)
I am also quite interested in GSquared's import-to-temp-table solution; I will give it a shot in some time!
BWAA-HAA!!! Try it on the Sample2.CSV file the OP provided in his original ZIP file... Surprise, surprise, surprise! :-)
There is a way to handle it, though...
--Jeff Moden
Change is inevitable... Change for the better is not.
April 24, 2010 at 5:38 pm
I put the OP's files in C:\Temp to have the following code run...
--DROP TABLE #Staging,#Header,#Data
--GO
--===== Suppress auto display of rowcounts
SET NOCOUNT ON;
--===== Declare local variables
DECLARE @pFullPath VARCHAR(256), --Full path to the file. May be a UNC. Would be a parameter in a stored proc
@SQLBulk VARCHAR(MAX), --Dynamic Bulk Insert command
@SQLSelect VARCHAR(MAX) --Dynamic Select command
;
SELECT @pFullPath = 'C:\Temp\Sample2.csv' --Just for testing purposes.
;
--===== Create the staging table. The others will be created on the fly.
CREATE TABLE #Staging
(LineData VARCHAR(8000))
;
--===== Bulk Insert the dynamically named file
SELECT @SQLBulk =
' BULK INSERT #Staging
FROM ' + QUOTENAME(@pFullPath,'''')+'
WITH (
FIELDTERMINATOR = ''\t'', --assumes no tab characters in the files so each whole line lands in the single LineData column
ROWTERMINATOR = ''\n''
);'
;
EXEC (@SQLBulk)
;
--===== Split and Isolate the header data in a table and cover
-- for missing header names.
WITH
cteNumber AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1 AS LineNumber,
LineData
FROM #Staging
)
SELECT split.ItemNumber,
QUOTENAME(ISNULL(NULLIF(split.Item,''),split.ItemNumber)) AS Data
INTO #Header
FROM cteNumber stage
CROSS APPLY (SELECT * FROM dbo.DelimitedSplit8K(LineData,',')) split
WHERE stage.LineNumber = 0
;
--===== Split and isolate the data
WITH
cteNumber AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1 AS LineNumber,
LineData
FROM #Staging
)
SELECT stage.LineNumber,
split.ItemNumber,
split.Item AS Data
INTO #Data
FROM cteNumber stage
CROSS APPLY (SELECT * FROM dbo.DelimitedSplit8K(LineData,',')) split
WHERE LineNumber > 0
;
--===== Put everything together as a single result set with the proper column names and all
-- Create the dynamic part of the SELECT list first
SELECT @SQLSelect = ISNULL(@SQLSelect+','+CHAR(10),'')
+ 'MAX(CASE WHEN d.ItemNumber =' + STR(h.ItemNumber,3) + ' THEN Data ELSE '''' END) AS ' + h.Data
FROM #Header h
;
-- Add the static part of the query to the dynamic part
SELECT @SQLSelect = 'SELECT d.LineNumber,' + CHAR(10)
+ @SQLSelect + CHAR(10)
+ 'FROM #Data d GROUP BY d.LineNumber'
;
--===== We're ready to rock... return the result set
EXEC (@SQLSelect)
;
... and here's my splitter function...
CREATE FUNCTION dbo.DelimitedSplit8K
/***************************************************************************************************
Purpose:
Split a given string at a given delimiter and return a list of the split elements (items).
Returns:
iTVF containing the following:
ItemNumber = Element position of Item as a BIGINT (not converted to INT to eliminate a CAST)
Item = Element value as a VARCHAR(8000)
CROSS APPLY Usage Example:
---------------------------------------------------------------------------------------------------
--===== Conditionally drop the test tables to make reruns easier for testing.
-- (this is NOT a part of the solution)
IF OBJECT_ID('tempdb..#JBMTest') IS NOT NULL
DROP TABLE #JBMTest
;
--===== Create and populate a test table on the fly (this is NOT a part of the solution).
SELECT *
INTO #JBMTest
FROM (
SELECT 1,'1,10,100,1000,10000,100000,1000000' UNION ALL
SELECT 2,'2000000,200000,20000,2000,200,20,2' UNION ALL
SELECT 3, 'This,is,a,test' UNION ALL
SELECT 4, 'and so is this' UNION ALL
SELECT 5, 'This, too (no pun intended)'
) d (SomeID,SomeValue)
;
GO
--===== Split the CSV column for the whole table using CROSS APPLY (this is the solution)
SELECT test.SomeID, split.ItemNumber, split.Item
FROM #JBMTest test
CROSS APPLY
(
SELECT ItemNumber, Item
FROM dbo.DelimitedSplit8k(test.SomeValue,',')
) split
;
---------------------------------------------------------------------------------------------------
Notes:
1. Optimized for VARCHAR(8000) or less.
2. Optimized for single character delimiter.
3. Optimized for use with CROSS APPLY.
4. Does not "trim" elements just in case leading or trailing blanks are intended.
5. If you don't know how a Tally table can be used to replace loops, please see the following...
http://www.sqlservercentral.com/articles/T-SQL/62867/
6. Changing this function to use VARCHAR(MAX) will cause it to run twice as slow. It's just the
nature of VARCHAR(MAX) whether it fits in-row or not.
7. Multi-machine testing for the method of using UNPIVOT instead of 10 SELECT/UNION ALLs shows
that the UNPIVOT method is quite machine dependent and can slow things down quite a bit.
8. Performance testing shows using "TOP" for the limiting criteria of "N" is actually
slower and slightly more CPU intensive than the traditional WHERE N < LEN(@pString) + 2.
9. Performance testing shows using ORDER BY (SELECT x) where "x" is anything is actually
slower and slightly more CPU intensive than the traditional ORDER BY (SELECT N).
Credits:
This code is the product of many people's efforts including but not limited to the following:
cteTally concept originally by Itzik Ben-Gan and "decimalized" by Lynn Pettis (and others) for a
bit of extra speed and finally redacted by Jeff Moden for a different slant on readability and
compactness. Hats off to Paul White for his simple explanations of CROSS APPLY. Finally,
special thanks to Erland Sommarskog for his tireless efforts to help people understand
what you can actually do with T-SQL. I also thank whoever wrote the first article I ever saw
on "numbers tables" which is located at the following URL ...
Revision History:
Rev 00 - 20 Jan 2010 - Concept: Lynn Pettis and others.
Redaction/Implementation: Jeff Moden
- Base 10 redaction and reduction for CTE. (Total rewrite)
Rev 01 - 13 Mar 2010 - Jeff Moden
- Removed one additional concatenation and one subtraction from the SUBSTRING in the
SELECT List for that tiny bit of extra speed.
Rev 02 - 14 Apr 2010 - Jeff Moden
- No code changes. Added CROSS APPLY usage example to the header, some additional credits,
and extra documentation.
Rev 03 - 18 Apr 2010 - Jeff Moden
- No code changes. Added notes 7, 8, and 9 about certain "optimizations" that don't
actually work for this type of function.
***************************************************************************************************/
--===== Define I/O parameters
(
@pString VARCHAR(8000),
@pDelimiter CHAR(1)
)
RETURNS TABLE
AS
RETURN
--===== "Inline" CTE Driven "Tally Tableβ produces values up to
-- 10,000... enough to cover VARCHAR(8000)
WITH
E1(N) AS ( --=== Create Ten 1's
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 --10
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM E4)
--===== Do the split
SELECT ROW_NUMBER() OVER (ORDER BY N) AS ItemNumber,
N AS StartPosition,
SUBSTRING(@pString, N, CHARINDEX(@pDelimiter, @pString + @pDelimiter, N) - N) AS Item
FROM cteTally
WHERE N < LEN(@pString) + 2
AND SUBSTRING(@pDelimiter + @pString, N, 1) = @pDelimiter
;
--Jeff Moden
Change is inevitable... Change for the better is not.
April 24, 2010 at 8:45 pm
Thanks, Jeff, for posting your approach. As I don't have SSMS at home, I will check out your code tomorrow and let you know.
But I have a doubt here. From reading your code, it looks like you are using 2 tables for the CSV data, whereas the Solution 2 I proposed seems to use only one table for the CSV data. Anyway, I will try out both options and give you the results on Monday.
Thanks again, Jeff! :-)
Cheers!!
C'est Pras!
April 25, 2010 at 10:48 am
ColdCoffee (4/24/2010)
Thanks, Jeff, for posting your approach. As I don't have SSMS at home, I will check out your code tomorrow and let you know. But I have a doubt here. From reading your code, it looks like you are using 2 tables for the CSV data, whereas the Solution 2 I proposed seems to use only one table for the CSV data. Anyway, I will try out both options and give you the results on Monday.
Thanks again, Jeff! :-)
Cheers!!
C'est Pras!
The problem with your first suggestion is that you can't anticipate what humans will add, nor what they'll call it, which is exactly the current problem... unknown things being added without warning.
I like your second solution, but the problem there is that neither OPENROWSET nor a text-based linked server will open the Sample2.csv file. You could, however, use the method I used to load the header line and parse it to build the column list, and you might then be able to use a delimited BULK INSERT, but that would also require creating the table with dynamic SQL, which adds complexity to the problem (although it would probably be faster than doing a Tally split). A rough sketch of that dynamic CREATE TABLE follows.
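For instance - just a sketch, with a hypothetical target table name - the #Header table built in the earlier script already holds the quoted column names, so the dynamic CREATE TABLE could look something like this:
--===== Build and run a CREATE TABLE from the parsed header names (target table name is hypothetical)
DECLARE @SQLCreate VARCHAR(MAX);
SELECT @SQLCreate = ISNULL(@SQLCreate + ',' + CHAR(10), '')
     + h.Data + ' VARCHAR(8000)' --h.Data is already QUOTENAMEd in the earlier script
  FROM #Header h;
SELECT @SQLCreate = 'CREATE TABLE dbo.Sample2Import' + CHAR(10)
     + '(' + CHAR(10) + @SQLCreate + CHAR(10) + ');';
EXEC (@SQLCreate);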
--Jeff Moden
Change is inevitable... Change for the better is not.