December 19, 2005 at 9:50 am
I'm going to try to simplify this a little. One of our business users has come to me wanting me to set something up to help them analyze information they will be getting on a regular basis. What they have is a flat file with some general information followed by 120 columns of survey-type data.
So the structure of their flat file looks something like this:
ID
Name
City
State
Q1
Q2
Q3
Q4
Q5
Q6
And so on through Q120
What I want to end up with is 2 tables that resemble the following:
Table 1
ID
Name
City
State
Table 2
ID
Q
These would obviously have a one-to-many relationship, but I think having Q1 - Q120 in a separate table like this will make the reporting I need to do much easier.
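To make it concrete, here is roughly the DDL I have in mind (column names and types are just a first guess, and I suspect Table 2 will also need a question-number column so I can tell the answers apart):
CREATE TABLE SurveyContact (
    ID int NOT NULL PRIMARY KEY,
    Name varchar(50),
    City varchar(50),
    State varchar(30)
)
CREATE TABLE SurveyAnswer (
    ID int NOT NULL REFERENCES SurveyContact (ID),
    Q_id int NOT NULL,   -- which question (1 - 120)
    Q varchar(10)        -- the answer itself
)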
Any suggestions on how to attack this?
Thanks
December 19, 2005 at 10:35 am
Well, the one-to-many design looks good, and you've already suggested it yourself.
What other info do you need on this?
December 19, 2005 at 10:39 am
What I'm looking for is the SQL code to actually transpose the data like this. I've tried a couple of things this morning without much luck, so I thought that if anyone has done this before, they could point me in the right direction.
December 19, 2005 at 11:37 am
Hi,
So, let's see if I understand: you need a query that does this transformation for you, so that you can then execute an INSERT INTO ... SELECT ... FROM statement?
As far as I know, there is no single statement in standard SQL that will do this generically; no plain query can create the view you need to insert into the target table.
I know of some ways to do it, maybe I can help if you provide us some details:
- How many rows are there in the source table?
- Do you plan to load the flat file into a database table?
- Is this a run-once problem or will you need to convert this structure continuously?
cheers,
paulo.
December 19, 2005 at 12:46 pm
Paulo,
You are correct; I'm looking for a query to help me get this into relational tables.
As for your other questions: the number of rows in the source table could range from a handful to hundreds. I can load the flat file into a database table first (a temporary holding area, if you will). Finally, this is something we will have to do on a continual basis; it is a file we receive from a survey company on a regular basis.
Thanks,
Curt
December 19, 2005 at 1:41 pm
You need to "unpivot" the data. This can be done in T-SQL using a UNION ALL query that converts columns to rows.
eg:
SELECT ID, 1 AS Q_id, Q1 AS Q
FROM YourTable
UNION ALL
SELECT ID, 2 AS Q_id, Q2 AS Q
FROM YourTable
UNION ALL
...
SELECT ID, 120 AS Q_id, Q120 AS Q
FROM YourTable
This produces 120 rows (one per question) for each ID in the source table. If the columns are named as you suggest (Q1 - Q120), you don't have to hand-code all of this; you can write T-SQL against the INFORMATION_SCHEMA views (or syscolumns) to generate the SQL for you.
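For example, something along these lines will build and run the whole statement (an untested sketch; it assumes the staging table is named YourTable, the target is Table2 (ID, Q_id, Q), and the question columns are the only ones starting with 'Q' followed by a digit):
DECLARE @sql varchar(8000)
SET @sql = ''
-- concatenate one SELECT per Q column, gluing them together with UNION ALL
SELECT @sql = @sql
    + CASE WHEN @sql = '' THEN '' ELSE ' UNION ALL ' END
    + 'SELECT ID, ' + SUBSTRING(COLUMN_NAME, 2, 10) + ' AS Q_id, '
    + COLUMN_NAME + ' AS Q FROM YourTable'
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'YourTable'
  AND COLUMN_NAME LIKE 'Q[0-9]%'
SET @sql = 'INSERT INTO Table2 (ID, Q_id, Q) ' + @sql
EXEC (@sql)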
December 19, 2005 at 2:32 pm
Thanks. Can you get me started on how to do it by writing the T-SQL against the schema?
December 19, 2005 at 5:46 pm
You already got the UNION ALL solution. I presume that if the table does not fit in memory, it can be slow; this is why I asked about the size.
Another way to do it, which by the way has the advantage of being more concise and in principle faster to execute, is to declare a cursor over the source table and insert the multiple records into the target table. You can do this in two ways: either code the 120 inserts by hand (as sketched below) or use a nested cursor over syscolumns (usually not a good idea, because you would be locking system tables).
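A rough sketch of the first variant, with the inserts coded out (only the first two question columns shown; the table and column names are taken from the earlier posts, so adjust to suit):
DECLARE @id int, @q1 char(1), @q2 char(1) -- ... and so on through @q120
DECLARE survey_cur CURSOR FAST_FORWARD FOR
    SELECT ID, Q1, Q2 -- ..., Q120
    FROM YourTable
OPEN survey_cur
FETCH NEXT FROM survey_cur INTO @id, @q1, @q2
WHILE @@FETCH_STATUS = 0
BEGIN
    -- one INSERT per question column, 120 in total
    INSERT INTO Table2 (ID, Q_id, Q) VALUES (@id, 1, @q1)
    INSERT INTO Table2 (ID, Q_id, Q) VALUES (@id, 2, @q2)
    -- ...
    FETCH NEXT FROM survey_cur INTO @id, @q1, @q2
END
CLOSE survey_cur
DEALLOCATE survey_cur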
hope this helps.
paulo.
December 19, 2005 at 9:12 pm
Ehlinger,
As strange as it sounds, if the flat file is VERY consistent, you can actually use BCP or BULK INSERT to transpose the input from the flat file directly into a table. You will, however, have to make a BCP format file to deal with the unusual layout... it would start out looking like this...
8.0
125
1      SQLCHAR      0      500      "\r\n"      1      ID         ""
2      SQLCHAR      0      500      "\r\n"      2      Name       ""
3      SQLCHAR      0      500      "\r\n"      3      City       ""
4      SQLCHAR      0      500      "\r\n"      4      State      ""
5      SQLCHAR      0      500      "\r\n"      5      Email      ""
6      SQLCHAR      0      500      "\r\n"      6      Q1         ""
7      SQLCHAR      0      500      "\r\n"      7      Q2         ""
8      SQLCHAR      0      500      "\r\n"      8      Q3         ""
9      SQLCHAR      0      500      "\r\n"      9      Q4         ""
10     SQLCHAR      0      500      "\r\n"      10     Q5         ""
11     SQLCHAR      0      500      "\r\n"      11     Q6         ""
...
125    SQLCHAR      0      500      "\r\n"      125    Q120       ""
Notice that the "column delimiters" are all CrLf's...
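Once you've saved the format file (say, as C:\survey.fmt), the load itself is a one-liner... something like this, with the table and file names changed to match yours:
BULK INSERT YourStagingTable
FROM 'C:\survey.txt'
WITH (FORMATFILE = 'C:\survey.fmt')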
--Jeff Moden
Change is inevitable... Change for the better is not.
December 20, 2005 at 3:33 am
If the transformation during import suggested by Jeff works with your files, it will be the best way to do this. If you have problems and need to import the data into a staging table first, here is what you could do with it using dynamic SQL.
Disclaimer: Please understand that most of the time it is best to avoid dynamic SQL and use standard, set-based solutions (like the one with UNION ALL). If you know that you will always be working with a table that has 120 "Q" columns, I would suggest taking the time to write the query using 119 UNIONs. However, if you know that the table will always have the same general structure but the number of "Q" columns may vary, it is handy to have SQL that can be quickly modified or parameterized to run against any number of columns.
I cannot guarantee that this SQL will perform correctly on your data - it is just an example of how it could be done. It does not take into account any previously inserted rows and assumes that all tables are freshly created and empty. I only used 4 Q columns to keep it simple, but you can use as many as you need (or rather, as many as SQL Server allows). Also, I understand that this action will be performed on a staging table with at most hundreds of rows; for larger amounts of data, I'm not sure what the performance would be.
--staging table to which you import data from flat file
CREATE TABLE #import (imp_ID int identity, cust_name varchar(50), City varchar(50), State varchar(30),
Email varchar(50), Q1 char(1), Q2 char(1), Q3 char(1), Q4 char(1))
--test values
INSERT INTO #import VALUES ('Bill','Houston','Texas','bill@company.com', 'Y','N','1','2')
--tables to hold results
CREATE TABLE #contacts (id_contact int, cust_name varchar(50), City varchar(50), State varchar(30),
Email varchar(50))
CREATE TABLE #questions (id_contact int, Q_id int, Q char(1))
--customers in survey and their contact data
INSERT INTO #contacts (id_contact, cust_name, city, state, email)
SELECT imp_id, cust_name, city, state, email
FROM #import
--bring all Q columns into one to normalize data
DECLARE @counter int, @sql nvarchar(1024)
SET @counter = 0
WHILE @counter < 4
BEGIN
SET @counter = @counter+1
SET @sql = 'INSERT INTO #questions (id_contact, Q_id, Q)
SELECT imp_ID,' + CAST(@counter as varchar(3))+ ',Q'+ CAST(@counter as varchar(3))
+ ' FROM #import'
EXEC sp_executesql @sql
END
drop table #import
drop table #questions
drop table #contacts
HTH, Vladan
EDIT: P.S. To determine the number of Q columns dynamically (assuming no other column begins with 'Q'!), you can use the query below. It can easily be modified to work with non-temporary tables:
SELECT count(*) from tempdb..syscolumns
where [id] = object_id('tempdb..#import')
and LEFT([name],1) = 'Q'
That means, if you declare a variable, fill it with the count of Q columns, and use it instead of the fixed value (4 in my example) in the WHILE condition, you can use the same SQL to transform a table with any number of Q columns, as in the sketch below.
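Something like this (same assumptions as above - only the question columns start with 'Q'):
DECLARE @q_count int
SELECT @q_count = count(*)
FROM tempdb..syscolumns
WHERE [id] = object_id('tempdb..#import')
  AND LEFT([name],1) = 'Q'
-- then in the loop from my example above, use:
-- WHILE @counter < @q_count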