Import many column CSV into 2-column sql table

Question

Import many column CSV into 2-column sql table

elliottcoyne

SSC Enthusiast

Points: 146
More actions
December 14, 2013 at 10:35 pm

#287484

Hi there,
I have a source file that has many columns (25) and want to find the most efficient way to load it into a table with just two columns.
I.e I always want be first column to appear so I would want columns 1+2, then 1+3, then 1+4, then 1+5..... Up to 1+25
I was thinking of an insert command that pulls data from a staging table - however bit every row uses the full 25 columns- some are only 3 or 4 and I don't want rows in the final table that just have data in first column 1.
Any help would be appreciated!!

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

pietlinden SSC Guru Points: 63503 More actions · Answer 1

Given that I'm not good at SSIS, I did this in Access, because it's what I know, and it works. (So it is good for some things!)

In my source table "srcTable", the structure was like this:

CREATE TABLE srcTable (

ID INT IDENTITY(1,1) PRIMARY KEY,

FieldA VARCHAR(10),

FieldB VARCHAR(10),

FieldC VARCHAR(10),

FieldD VARCHAR(10)

);

Then my destination table was like this:

CREATE TABLE destTable (

ParentID INT NOT NULL,

TheValue VARCHAR(10) NOT NULL);

Then I created a module to do the dynamic SQL dirty work...

Option Compare Database

Option Explicit

Public Sub NormalizeData()

Dim tdf As DAO.TableDef

Dim i As Integer

Dim strSQL As String

DBEngine(0)(0).TableDefs("srcTable").Fields.Refresh

Set tdf = DBEngine(0)(0).TableDefs("srcTable")

For i = 1 To tdf.Fields.Count - 1

strSQL = "INSERT INTO destTable ( ParentID, TheValue ) SELECT srcTable.ID, srcTable.[" & tdf.Fields(i).Name & "] FROM srcTable WHERE srcTable.[" & tdf.Fields(i).Name & "] IS NOT NULL;"

Debug.Print strSQL

DBEngine(0)(0).Execute strSQL

Next i

Set tdf = Nothing

End Sub

Basically, it grabs the first column name (tdf.fields(0).Name) and the nth column (tdf.fields(i).Name) and inserts the non-null values into the destination table. then it processes the next column until there are no more columns to process.

For the SQL Smarties, I would love to know how to do this in T-SQL... I know about the sys.columns stuff, but not sure how to use it... so I resorted to something I know and that works. =)

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 2

Being a visual oriented individual I have no idea even where to start. It would help if you provided sample and the expected results based on the sample data. The sampe data should be representative of your data, not actual production data.

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 3

elliottcoyne (12/14/2013)
Hi there,
I have a source file that has many columns (25) and want to find the most efficient way to load it into a table with just two columns.
I.e I always want be first column to appear so I would want columns 1+2, then 1+3, then 1+4, then 1+5..... Up to 1+25
I was thinking of an insert command that pulls data from a staging table - however bit every row uses the full 25 columns- some are only 3 or 4 and I don't want rows in the final table that just have data in first column 1.
Any help would be appreciated!!

It's real easy. Just load the data into a staging table table (as you've identified) and then do qualified CROSS APPLYs to unpivot the data which will also allow you to ignore any rows where the 2+ column have no data.

If you'd like a coded example, please refer to the first link in my signature line below "Helpful Links" for the right was to post the example data.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

elliottcoyne SSC Enthusiast Points: 146 More actions · Answer 4

Hi Jeff - thanks for the offer of help. I've followed your guide (as far as I was able to) and created the following to mimic my data source:

--truncate table tstStreamlineCode

--===== Create the test table with

CREATE TABLE tstStreamlineCodes

(ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

NHScode varchar(7),

sl1 varchar(5),

sl2 varchar(5),

sl3 varchar(5),

sl4 varchar(5),

sl5 varchar(5),

sl6 varchar(5),

sl7 varchar(5),

sl8 varchar(5),

sl9 varchar(5)

)

--===== All Inserts into the IDENTITY column

SET IDENTITY_INSERT tstStreamlineCodes ON

--===== Insert the test data into the test table

INSERT INTO tstStreamlineCodes

(ID, NHScode, sl1 ,sl2 ,sl3 ,sl4 ,sl5 ,sl6 ,sl7 ,sl8 ,sl9)

SELECT '1','10002K','4068','4065','4086','4069','4096','4120','4121','4097','4353' UNION ALL

SELECT '2','10006P','4068','4065','4086','4069','4096','4120','4121','4097','' UNION ALL

SELECT '3','1003T', '3632','','','','','','','','' UNION ALL

SELECT '4','1007B','3633','','','','','','','','' UNION ALL

SELECT '5','1024X','1589','2044','','','','','','','' UNION ALL

SELECT '6','1037N','1589','2044','','','','','','',''

--===== Set the identity insert back to normal

SET IDENTITY_INSERT tstStreamlineCodes OFF

Jeff Moden SSC Guru Points: 1004363 More actions · Answer 5

elliottcoyne (12/17/2013)
Hi Jeff - thanks for the offer of help. I've followed your guide (as far as I was able to) and created the following to mimic my data source:
--truncate table tstStreamlineCode
--===== Create the test table with
CREATE TABLE tstStreamlineCodes
(ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
NHScode varchar(7),
sl1 varchar(5),
sl2 varchar(5),
sl3 varchar(5),
sl4 varchar(5),
sl5 varchar(5),
sl6 varchar(5),
sl7 varchar(5),
sl8 varchar(5),
sl9 varchar(5)
)
--===== All Inserts into the IDENTITY column
SET IDENTITY_INSERT tstStreamlineCodes ON
--===== Insert the test data into the test table
INSERT INTO tstStreamlineCodes
(ID, NHScode, sl1 ,sl2 ,sl3 ,sl4 ,sl5 ,sl6 ,sl7 ,sl8 ,sl9)
SELECT '1','10002K','4068','4065','4086','4069','4096','4120','4121','4097','4353' UNION ALL
SELECT '2','10006P','4068','4065','4086','4069','4096','4120','4121','4097','' UNION ALL
SELECT '3','1003T', '3632','','','','','','','','' UNION ALL
SELECT '4','1007B','3633','','','','','','','','' UNION ALL
SELECT '5','1024X','1589','2044','','','','','','','' UNION ALL
SELECT '6','1037N','1589','2044','','','','','','',''
--===== Set the identity insert back to normal
SET IDENTITY_INSERT tstStreamlineCodes OFF

The following should do it (assuming you don't want to include the ID column which could easily be added in):

SELECT ca.NHSCode,ca.SLCode

FROM dbo.tstStreamLineCodes

CROSS APPLY

(

SELECT NHSCode,sl1 UNION ALL

SELECT NHSCode,sl2 UNION ALL

SELECT NHSCode,sl3 UNION ALL

SELECT NHSCode,sl4 UNION ALL

SELECT NHSCode,sl5 UNION ALL

SELECT NHSCode,sl6 UNION ALL

SELECT NHSCode,sl7 UNION ALL

SELECT NHSCode,sl8 UNION ALL

SELECT NHSCode,sl9

)ca(NHSCode,SLCode)

WHERE ca.SLCode > ''

;

Results:

NHSCode SLCode

------- ------

10002K 4068

10002K 4065

10002K 4086

10002K 4069

10002K 4096

10002K 4120

10002K 4121

10002K 4097

10002K 4353

10006P 4068

10006P 4065

10006P 4086

10006P 4069

10006P 4096

10006P 4120

10006P 4121

10006P 4097

1003T 3632

1007B 3633

1024X 1589

1024X 2044

1037N 1589

1037N 2044

(23 row(s) affected)

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)