Create separate MS Excel files by looping through large table

  • Hi

    I have a master table containing details of over 800000 surveys made up of approximately 400 distinct document names and versions. Each document can have as few as 10 questions but as many as 150. Each question represents one row.

    My challenge is to create a separate spreadsheet for each of the 400 distinct document names and versions containing all the rows and columns present in the master table. The largest number of rows would be around 150 and therefore each spreadsheet will not be very big.

    e.g. in my sample data below, i will need to create individual Excel files named as follows . . .

    "Document1Version1.xlsx" containing all the column names and 6 rows for the 6 questions relating to Document 1 version 1

    "Document1Version2.xlsx" containing all the column names and 8 rows for the 8 questions relating to Document 1 version 2

    "Document2Version1.xlsx" containing all the column names and 4 rows for the 4 questions relating to Document 2 version 1

    I assume that one of the first things is to create a lookup of the distinct document names and versions assign some variables and then use this lookup to loop through and sequentially filter the master table data ready for creating the individual Excel files.

    Can anyone help please?

    --CREATE TEMP TABLE FOR EXAMPLE

    IF OBJECT_ID('tempdb..#excelTest') IS NOT NULL DROP TABLE #excelTest

    CREATE TABLE #excelTest (

    [rowID] [nvarchar](10) NULL,

    [docName] [nvarchar](50) NULL,

    [docVersion] [nvarchar](2) NULL,

    [question] nvarchar(2),

    [blankField] [nvarchar](50) NULL,

    ) ON [PRIMARY]

    GO

    --Add sample records

    INSERT INTO #excelTest (rowID,docName,docVersion,question,blankField) VALUES

    ('1','document1','1','q1',null),

    ('2','document1','1','q2',null),

    ('3','document1','1','q3',null),

    ('4','document1','1','q4',null),

    ('5','document1','1','q5',null),

    ('6','document1','1','q6',null),

    ('7','document1','2','q1',null),

    ('8','document1','2','q2',null),

    ('9','document1','2','q3',null),

    ('10','document1','2','q4',null),

    ('11','document1','2','q5',null),

    ('12','document1','2','q6',null),

    ('13','document1','2','q7',null),

    ('14','document1','2','q8',null),

    ('15','document2','1','q1',null),

    ('16','document2','1','q2',null),

    ('17','document2','1','q3',null),

    ('18','document2','1','q4',null)

    --Output

    rowIDdocNamedocVersionquestionblankField

    1document11q1NULL

    2document11q2NULL

    3document11q3NULL

    4document11q4NULL

    5document11q5NULL

    6document11q6NULL

    7document12q1NULL

    8document12q2NULL

    9document12q3NULL

    10document12q4NULL

    11document12q5NULL

    12document12q6NULL

    13document12q7NULL

    14document12q8NULL

    15document21q1NULL

    16document21q2NULL

    17document21q3NULL

    18document21q4NULL

  • Nice setup... makes sorting this stuff out not to hard! I would probably use a cursor because you have to process each combination of Version and Doc separately... something like:

    -- CREATE CURSOR TO LOOP OVER UNIQUE VALUES

    DECLARE @docName NVARCHAR(50);

    DECLARE @docVersion NVARCHAR(2);

    DECLARE docsCursor CURSOR FAST_FORWARD FOR

    SELECT DISTINCT docName, docVersion

    FROM #excelTest

    ORDER BY docName, docVersion;

    OPEN docsCursor

    FETCH NEXT FROM docsCursor INTO @docName, @docVersion

    WHILE @@FETCH_STATUS = 0

    BEGIN

    PRINT @docName + ' ' + @docVersion;

    -- build full BCP query

    select @sql = 'bcp "' + @dbName + ' SELECT rowID

    , docName

    , docVersion

    , question

    , blankField

    FROM #excelTest

    WHERE docName = @docName

    AND docVersion = @docVersion" queryout "' + @fullFileName + '" -c -t"," -CRAW'

    -- execute BCP here

    EXEC (@sql)

    FETCH NEXT FROM docsCursor INTO @docName, @docVersion

    END

    CLOSE docsCursor

    DEALLOCATE docsCursor

    The bcp part is a bit of a cheat, because I'm not very good at it... (something more to learn!), but hopefully this points you in the right direction. Of course, someone's going to jump in I'm sure and correct me... if you dig around on the internet, there are examples of using BCP all over the place.

  • Ah, many thanks however I've read that cursors aren't the most efficient way of looping through large data sets. As mentioned my routine will need to loop through over 400 times and each time it's selecting data from over 800000 tows in my master table. If the cursor option works in reasonable time then I still need the best way to output each sheet individually and is BCP the most efficient way to achieve this?

  • If you're going to output the results to Excel, you have to do it a single dataset at a time, so I don't know that you have any other option than to use a cursor. Cursors aren't a good idea if you're doing set-based operations, but in this case, you would be processing each individual value (the result of the DISTINCT query) one at a time.

    If I'm wrong, I would be more than happy for someone here to set me straight.

  • OK, I see your point and I'm happy to use the cursor method to get individual outputs. My problem is that I've tried BCP within the cursor after searching this forum for BCP with queryout examples and cannot get it to output to excel or any other file format (txt, csv). Instead of building up the @SQL string I've run the statement below within the cursor and all I get is the BCP help in the SQL Results window. Could it be its failing because of the use of variables for docName, docVersion and fullFileName? Or is it that I need to use different switches?

    I am system admin on my SQL server 2012 and used the -T for trusted connection.

    Is there an easier alternative to using the BCP option within the cursor?

    EXEC master..xp_cmdshell 'bcp "SELECT * FROM #excelTest

    WHERE docName = @docName

    AND docVersion = @docVersion"

    queryout @fullFileName -T -c -t'

    output

    usage: bcp {dbtable | query} {in | out | queryout | format} datafile

    [-m maxerrors] [-f formatfile] [-e errfile]

    [-F firstrow] [-L lastrow] [-b batchsize]

    [-n native type] [-c character type] [-w wide character type]

    [-N keep non-text native] [-V file format version] [-q quoted identifier]

    [-C code page specifier] [-t field terminator] [-r row terminator]

    [-i inputfile] [-o outfile] [-a packetsize]

    [-S server name] [-U username] [-P password]

    [-T trusted connection] [-v version] [-R regional enable]

    [-k keep null values] [-E keep identity values]

    [-h "load hints"] [-x generate xml format file]

    [-d database name] [-K application intent]

    NULL

  • I'd approach this differently than T-SQL.

    First, I'd be doing this in SSIS. Drop your distinct list of items off to a recordset object and use that as a feeder for a For Each Loop component.

    Inside there, using variables to trap the output I'd have another data flow that aimed at your files using expressions on the target and parameters to a proc on the source. You'll have to adjust the delivery to excel but it's relatively easy as you're able to set it to be a variable which you already fed in via the outer for each loop.

    The real pain in the arse that comes with this is the ODBC driver to Excel is a piece of crap for datatypes. However, for something that's all text, all the time, you should be fine.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Might be a case of "when all you [know] is a hammer, everything looks like a nail."

  • Yes however if you have no hammer and no nails what do you do then?

    Maybe I've asked the wrong questions in the correct forum or the correct questions in the wrong forum (should BCP work any differently in SQL 2008 compared with SQL 2012?).

    My focus probably should have been BCP and how to get this to work since the cursor looping works fine - thanks for this.

    I just cannot get the BCP syntax to produce an output. I have searched the article in these forums for tips on BCP without any success.

    I would rather persevere with code and avoid Integration Services at this stage but thanks for the suggestion.

    I've enabled xp_cmdshell,

    I'm sa within SQL Server 2012

    I'm using Trusted connection

    I've tried drive letters and full UNC paths in the output filename

    Is BCP meant to work with declared variables as it moves through the loop?

    Is there an alternative to BCP within the cursor?

  • lenrigby (6/25/2014)


    OK, I see your point and I'm happy to use the cursor method to get individual outputs. My problem is that I've tried BCP within the cursor after searching this forum for BCP with queryout examples and cannot get it to output to excel or any other file format (txt, csv). Instead of building up the @SQL string I've run the statement below within the cursor and all I get is the BCP help in the SQL Results window. Could it be its failing because of the use of variables for docName, docVersion and fullFileName? Or is it that I need to use different switches?

    I am system admin on my SQL server 2012 and used the -T for trusted connection.

    Is there an easier alternative to using the BCP option within the cursor?

    EXEC master..xp_cmdshell 'bcp "SELECT * FROM #excelTest

    WHERE docName = @docName

    AND docVersion = @docVersion"

    queryout @fullFileName -T -c -t'

    At the time bcp starts running, "@docName", "@docVersion" and "@fullFileName" isn't even defined, and bcp will try to pass it to SQL and then SQL will return to bcp an indication of failure because these three are undefined. Remember, bcp uses a brand new connection. Additionally, since #excelTest is connection specific it will also be undefined. Also if you plan to use a trusted connection, it needs to be the trusted connection for the userid xp_cmdshell uses at the time its creating the process that ultimately runs the bcp command. I skip that noise and use an sql account, yeah password in program text, I know, but folks don't let me do windows active directory stuff as I'm really only qualified as a convenience store clerk.

    When I've done something similar, I used a global temp (##excelTest) because although it is still temporary, it is visible to all database connections / sessions. As for the variables, replace them with actual values before calling bcp. Stuff your ultimate bcp command into a variable, that way you can select it for troubleshooting.

    Something like:

    if OBJECT_ID('tempdb..##exceltest') is not null

    begin

    drop table ##exceltest

    end

    create table ##exceltest

    (

    docName varchar(100),

    docVersion int,

    hithar varchar(100)

    )

    insert into ##exceltest select 'testdoc',1,'well hellooooo'

    declare @docName varchar(100)

    declare @docVersion int

    declare @cmdstr varchar(500)

    set @docName = 'testdoc'

    set @docVersion = 1

    set @cmdstr = 'bcp "select * from ##exceltest where docName = '''

    + @docName + ''' and docVersion = ' + convert(varchar(10),@docVersion)

    + '" queryout d:\transferdir\transferfile.txt -S "myserver\myinstance" -c -U mysqlusername -P mysqlpassword'

    select @cmdstr

    exec master..xp_cmdshell @cmdstr

    Caution, there might be a mixture of single quotes and double quotes and even doubled up single quotes.

    (not addressing the cursor part of the equation.)

  • Added in the "full file name" part:

    if OBJECT_ID('tempdb..##exceltest') is not null

    begin

    drop table ##exceltest

    end

    create table ##exceltest

    (

    docName varchar(100),

    docVersion int,

    hithar varchar(100)

    )

    insert into ##exceltest select 'testdoc',1,'well hellooooo'

    declare @docName varchar(100)

    declare @docVersion int

    declare @fullFileName varchar(100)

    declare @cmdstr varchar(500)

    set @docName = 'testdoc'

    set @docVersion = 1

    -- assuming d drive on server

    set @fullFileName = 'd:\transferdir\transferfile.txt'

    set @cmdstr = 'bcp "select * from ##exceltest where docName = '''

    + @docName + ''' and docVersion = ' + convert(varchar(10),@docVersion)

    + '" queryout ' + @fullFileName + ' -S "mysqlserver\mysqlinstance" -c -U mysqluser -P msqlpassword'

    select @cmdstr

    exec master..xp_cmdshell @cmdstr

  • Superb - this BCP works. I will incorporate with the cursor later and report back.

    Many thanks

  • awesome! you can now join the SSIS haters club!

    (LOL just kidding guys, I'm sure SSIS is great)

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply