June 24, 2014 at 5:53 pm
Hi
I have a master table containing details of over 800000 surveys made up of approximately 400 distinct document names and versions. Each document can have as few as 10 questions but as many as 150. Each question represents one row.
My challenge is to create a separate spreadsheet for each of the 400 distinct document names and versions containing all the rows and columns present in the master table. The largest number of rows would be around 150 and therefore each spreadsheet will not be very big.
e.g. in my sample data below, i will need to create individual Excel files named as follows . . .
"Document1Version1.xlsx" containing all the column names and 6 rows for the 6 questions relating to Document 1 version 1
"Document1Version2.xlsx" containing all the column names and 8 rows for the 8 questions relating to Document 1 version 2
"Document2Version1.xlsx" containing all the column names and 4 rows for the 4 questions relating to Document 2 version 1
I assume that one of the first things is to create a lookup of the distinct document names and versions assign some variables and then use this lookup to loop through and sequentially filter the master table data ready for creating the individual Excel files.
Can anyone help please?
--CREATE TEMP TABLE FOR EXAMPLE
IF OBJECT_ID('tempdb..#excelTest') IS NOT NULL DROP TABLE #excelTest
CREATE TABLE #excelTest (
[rowID] [nvarchar](10) NULL,
[docName] [nvarchar](50) NULL,
[docVersion] [nvarchar](2) NULL,
[question] nvarchar(2),
[blankField] [nvarchar](50) NULL,
) ON [PRIMARY]
GO
--Add sample records
INSERT INTO #excelTest (rowID,docName,docVersion,question,blankField) VALUES
('1','document1','1','q1',null),
('2','document1','1','q2',null),
('3','document1','1','q3',null),
('4','document1','1','q4',null),
('5','document1','1','q5',null),
('6','document1','1','q6',null),
('7','document1','2','q1',null),
('8','document1','2','q2',null),
('9','document1','2','q3',null),
('10','document1','2','q4',null),
('11','document1','2','q5',null),
('12','document1','2','q6',null),
('13','document1','2','q7',null),
('14','document1','2','q8',null),
('15','document2','1','q1',null),
('16','document2','1','q2',null),
('17','document2','1','q3',null),
('18','document2','1','q4',null)
--Output
rowIDdocNamedocVersionquestionblankField
1document11q1NULL
2document11q2NULL
3document11q3NULL
4document11q4NULL
5document11q5NULL
6document11q6NULL
7document12q1NULL
8document12q2NULL
9document12q3NULL
10document12q4NULL
11document12q5NULL
12document12q6NULL
13document12q7NULL
14document12q8NULL
15document21q1NULL
16document21q2NULL
17document21q3NULL
18document21q4NULL
June 24, 2014 at 10:14 pm
Nice setup... makes sorting this stuff out not to hard! I would probably use a cursor because you have to process each combination of Version and Doc separately... something like:
-- CREATE CURSOR TO LOOP OVER UNIQUE VALUES
DECLARE @docName NVARCHAR(50);
DECLARE @docVersion NVARCHAR(2);
DECLARE docsCursor CURSOR FAST_FORWARD FOR
SELECT DISTINCT docName, docVersion
FROM #excelTest
ORDER BY docName, docVersion;
OPEN docsCursor
FETCH NEXT FROM docsCursor INTO @docName, @docVersion
WHILE @@FETCH_STATUS = 0
BEGIN
PRINT @docName + ' ' + @docVersion;
-- build full BCP query
select @sql = 'bcp "' + @dbName + ' SELECT rowID
, docName
, docVersion
, question
, blankField
FROM #excelTest
WHERE docName = @docName
AND docVersion = @docVersion" queryout "' + @fullFileName + '" -c -t"," -CRAW'
-- execute BCP here
EXEC (@sql)
FETCH NEXT FROM docsCursor INTO @docName, @docVersion
END
CLOSE docsCursor
DEALLOCATE docsCursor
The bcp part is a bit of a cheat, because I'm not very good at it... (something more to learn!), but hopefully this points you in the right direction. Of course, someone's going to jump in I'm sure and correct me... if you dig around on the internet, there are examples of using BCP all over the place.
June 25, 2014 at 9:43 am
Ah, many thanks however I've read that cursors aren't the most efficient way of looping through large data sets. As mentioned my routine will need to loop through over 400 times and each time it's selecting data from over 800000 tows in my master table. If the cursor option works in reasonable time then I still need the best way to output each sheet individually and is BCP the most efficient way to achieve this?
June 25, 2014 at 11:09 am
If you're going to output the results to Excel, you have to do it a single dataset at a time, so I don't know that you have any other option than to use a cursor. Cursors aren't a good idea if you're doing set-based operations, but in this case, you would be processing each individual value (the result of the DISTINCT query) one at a time.
If I'm wrong, I would be more than happy for someone here to set me straight.
June 25, 2014 at 4:38 pm
OK, I see your point and I'm happy to use the cursor method to get individual outputs. My problem is that I've tried BCP within the cursor after searching this forum for BCP with queryout examples and cannot get it to output to excel or any other file format (txt, csv). Instead of building up the @SQL string I've run the statement below within the cursor and all I get is the BCP help in the SQL Results window. Could it be its failing because of the use of variables for docName, docVersion and fullFileName? Or is it that I need to use different switches?
I am system admin on my SQL server 2012 and used the -T for trusted connection.
Is there an easier alternative to using the BCP option within the cursor?
EXEC master..xp_cmdshell 'bcp "SELECT * FROM #excelTest
WHERE docName = @docName
AND docVersion = @docVersion"
queryout @fullFileName -T -c -t'
output
usage: bcp {dbtable | query} {in | out | queryout | format} datafile
[-m maxerrors] [-f formatfile] [-e errfile]
[-F firstrow] [-L lastrow] [-b batchsize]
[-n native type] [-c character type] [-w wide character type]
[-N keep non-text native] [-V file format version] [-q quoted identifier]
[-C code page specifier] [-t field terminator] [-r row terminator]
[-i inputfile] [-o outfile] [-a packetsize]
[-S server name] [-U username] [-P password]
[-T trusted connection] [-v version] [-R regional enable]
[-k keep null values] [-E keep identity values]
[-h "load hints"] [-x generate xml format file]
[-d database name] [-K application intent]
NULL
June 25, 2014 at 4:44 pm
I'd approach this differently than T-SQL.
First, I'd be doing this in SSIS. Drop your distinct list of items off to a recordset object and use that as a feeder for a For Each Loop component.
Inside there, using variables to trap the output I'd have another data flow that aimed at your files using expressions on the target and parameters to a proc on the source. You'll have to adjust the delivery to excel but it's relatively easy as you're able to set it to be a variable which you already fed in via the outer for each loop.
The real pain in the arse that comes with this is the ODBC driver to Excel is a piece of crap for datatypes. However, for something that's all text, all the time, you should be fine.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
June 25, 2014 at 8:16 pm
Might be a case of "when all you [know] is a hammer, everything looks like a nail."
June 26, 2014 at 3:56 am
Yes however if you have no hammer and no nails what do you do then?
Maybe I've asked the wrong questions in the correct forum or the correct questions in the wrong forum (should BCP work any differently in SQL 2008 compared with SQL 2012?).
My focus probably should have been BCP and how to get this to work since the cursor looping works fine - thanks for this.
I just cannot get the BCP syntax to produce an output. I have searched the article in these forums for tips on BCP without any success.
I would rather persevere with code and avoid Integration Services at this stage but thanks for the suggestion.
I've enabled xp_cmdshell,
I'm sa within SQL Server 2012
I'm using Trusted connection
I've tried drive letters and full UNC paths in the output filename
Is BCP meant to work with declared variables as it moves through the loop?
Is there an alternative to BCP within the cursor?
June 26, 2014 at 7:54 am
lenrigby (6/25/2014)
OK, I see your point and I'm happy to use the cursor method to get individual outputs. My problem is that I've tried BCP within the cursor after searching this forum for BCP with queryout examples and cannot get it to output to excel or any other file format (txt, csv). Instead of building up the @SQL string I've run the statement below within the cursor and all I get is the BCP help in the SQL Results window. Could it be its failing because of the use of variables for docName, docVersion and fullFileName? Or is it that I need to use different switches?I am system admin on my SQL server 2012 and used the -T for trusted connection.
Is there an easier alternative to using the BCP option within the cursor?
EXEC master..xp_cmdshell 'bcp "SELECT * FROM #excelTest
WHERE docName = @docName
AND docVersion = @docVersion"
queryout @fullFileName -T -c -t'
At the time bcp starts running, "@docName", "@docVersion" and "@fullFileName" isn't even defined, and bcp will try to pass it to SQL and then SQL will return to bcp an indication of failure because these three are undefined. Remember, bcp uses a brand new connection. Additionally, since #excelTest is connection specific it will also be undefined. Also if you plan to use a trusted connection, it needs to be the trusted connection for the userid xp_cmdshell uses at the time its creating the process that ultimately runs the bcp command. I skip that noise and use an sql account, yeah password in program text, I know, but folks don't let me do windows active directory stuff as I'm really only qualified as a convenience store clerk.
When I've done something similar, I used a global temp (##excelTest) because although it is still temporary, it is visible to all database connections / sessions. As for the variables, replace them with actual values before calling bcp. Stuff your ultimate bcp command into a variable, that way you can select it for troubleshooting.
Something like:
if OBJECT_ID('tempdb..##exceltest') is not null
begin
drop table ##exceltest
end
create table ##exceltest
(
docName varchar(100),
docVersion int,
hithar varchar(100)
)
insert into ##exceltest select 'testdoc',1,'well hellooooo'
declare @docName varchar(100)
declare @docVersion int
declare @cmdstr varchar(500)
set @docName = 'testdoc'
set @docVersion = 1
set @cmdstr = 'bcp "select * from ##exceltest where docName = '''
+ @docName + ''' and docVersion = ' + convert(varchar(10),@docVersion)
+ '" queryout d:\transferdir\transferfile.txt -S "myserver\myinstance" -c -U mysqlusername -P mysqlpassword'
select @cmdstr
exec master..xp_cmdshell @cmdstr
Caution, there might be a mixture of single quotes and double quotes and even doubled up single quotes.
(not addressing the cursor part of the equation.)
June 26, 2014 at 9:33 am
Added in the "full file name" part:
if OBJECT_ID('tempdb..##exceltest') is not null
begin
drop table ##exceltest
end
create table ##exceltest
(
docName varchar(100),
docVersion int,
hithar varchar(100)
)
insert into ##exceltest select 'testdoc',1,'well hellooooo'
declare @docName varchar(100)
declare @docVersion int
declare @fullFileName varchar(100)
declare @cmdstr varchar(500)
set @docName = 'testdoc'
set @docVersion = 1
-- assuming d drive on server
set @fullFileName = 'd:\transferdir\transferfile.txt'
set @cmdstr = 'bcp "select * from ##exceltest where docName = '''
+ @docName + ''' and docVersion = ' + convert(varchar(10),@docVersion)
+ '" queryout ' + @fullFileName + ' -S "mysqlserver\mysqlinstance" -c -U mysqluser -P msqlpassword'
select @cmdstr
exec master..xp_cmdshell @cmdstr
June 26, 2014 at 11:14 am
Superb - this BCP works. I will incorporate with the cursor later and report back.
Many thanks
June 26, 2014 at 12:46 pm
awesome! you can now join the SSIS haters club!
(LOL just kidding guys, I'm sure SSIS is great)
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply