Bulk Export Best Practices

  • I'm having a discussion with an internal administrator who does not wish to enable xp_cmdshell on a server that is inside the firewall. My argument is that I want it enabled so I can use BCP queryout to export data that needs to be moved beyond the firewall to a server at a co-lo.

    Several tables need to be exported; some are large, and I will need to do a nightly pull of all the data to update the server at the co-lo, which supports the production web app for a portion of the company.

    Data schlepping will look like this:

    SourceDB -> File -> xfer to location accessible by co-lo -> read in to co-lo DB

    I'm not getting sysadmin on the source server, but I'm asking the administrator to create a proxy account that will let me run bcp via a scheduled SQL Agent job (a stored procedure call). I'm even offering to turn xp_cmdshell on, export, and turn it back off.
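    The "turn it on, export, turn it off" part would look roughly like this; it's only a sketch, and the table, file path and server name are placeholders rather than the real objects:

    -- Enable xp_cmdshell only for the duration of the nightly export.
    EXEC sp_configure 'show advanced options', 1; RECONFIGURE;
    EXEC sp_configure 'xp_cmdshell', 1; RECONFIGURE;

    DECLARE @cmd VARCHAR(2000);
    SET @cmd = 'bcp "SELECT * FROM SourceDB.dbo.SomeLargeTable" queryout "D:\Exports\SomeLargeTable.dat" -c -t"|" -T -S SourceServer';
    EXEC xp_cmdshell @cmd;

    -- Turn it back off as soon as the pull is done.
    EXEC sp_configure 'xp_cmdshell', 0; RECONFIGURE;
    EXEC sp_configure 'show advanced options', 0; RECONFIGURE;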

    Advice??? I find tons of stuff about data load best practices that include using BCP, but I'm not finding MSDN documentation on data export best practices.

  • What's the reason this needs to be xp_cmdshell rather than just an invocation of bcp from the SQL Agent job via the command line? If there is work to be done on the DB side, can that be done in one job step, with the job's next step being the bulk copy?


    And then again, I might be wrong ...
    David Webb

  • I'm not sure if you're looking for ammunition to get this change enacted in your environment (been there) or looking for alternatives, but I'll quickly speak to both.

    The problem many security-minded people have with enabling xp_cmdshell is that it implies more than the ability to bcp data. Proxy accounts are better than nothing in this regard but are not foolproof; the default in most environments is to keep the exposed surface area to/from your SQL Server as small as possible. The question they're going to ask (and that you need to be prepared to answer) is: how else can the same task be accomplished?

    I'd suggest looking into building out some simple SSIS packages if all you need is a data export mechanism. The upfront development time is greater, but in terms of a long-lasting, scalable solution I prefer (and have had great success with) SSIS compared to bcp.

    I've been on both sides of this argument and I understand the frustration. It's always important to keep your options open and understand where the other side is coming from.

  • Hmmm, David, here's how I run BCP:

    DECLARE @SQL NVARCHAR(2000)

    SET @SQL = 'BCP Work.dbo.tblAccount_Raw IN "C:\DataFiles\AccountFile129814978600644674.txt" -c -t"|" -F"2" -T'

    EXEC xp_cmdshell @SQL

    Am I missing something?

    Sean,

    SSIS would be the alternative, and I've put in place many SSIS projects to shuffle data in and out of SQL Server. While the initial development is OK, I find that they generally, er, well, you know, create a vacuum for the production life cycle. This company relies on consultants for this kind of work, and I've yet to find a tool that allows me to search SSIS XML for the objects involved in the same way I can search the definitions in sys.sql_modules. Bumbling around in variables, properties, sub-menus and config files trying to fully understand what an SSIS package is doing is not my favorite hobby. In a T-SQL process, it's all right there in front of you.
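    The kind of sys.sql_modules search I mean is just a one-liner (the object name here is only an example):

    -- Find every proc, view or function whose definition mentions a given object.
    SELECT OBJECT_SCHEMA_NAME(m.object_id) AS SchemaName,
           OBJECT_NAME(m.object_id) AS ObjectName
    FROM sys.sql_modules AS m
    WHERE m.definition LIKE '%tblAccount_Raw%';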

    The data source is an internal production web server, and there is no other SQL Server linked to this box that could run SSIS, so SSIS would have to run on the same box, which they consider very sensitive to additional load. As I mentioned before, it is entirely behind the firewall, so I really prefer just BCP queryout.

    What I have been looking for is some kind of Microsoft documentation that discusses the options, but in half an hour of Googling I haven't found anything that does the trick.

    Am I winning the argument? Thanks for the debate, it is helpful!

  • Do you build the bcp command line in T-SQL, or are all the values always the same? If they remain constant, then invoking bcp from a step in a job would get around the problem of having to use xp_cmdshell and wouldn't require SSIS. Even if there was some manipulation of the command line, you might still be able to do that with batch commands and just run bcp without having to go through T-SQL. That would eliminate the xp_cmdshell security difficulties and still get you the flexibility you need.
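    Something like the following is what I have in mind; it's only a rough sketch, and the job name, query, file path and server name are all placeholders:

    -- Add an operating-system (CmdExec) step to an existing Agent job so bcp
    -- runs directly, with no xp_cmdshell involved. The step runs under the
    -- Agent service account, or under a credential-backed CmdExec proxy if
    -- the DBA prefers to limit its rights.
    EXEC msdb.dbo.sp_add_jobstep
         @job_name = N'Nightly co-lo export',
         @step_name = N'bcp queryout tblAccount',
         @subsystem = N'CmdExec',
         @command = N'bcp "SELECT * FROM SourceDB.dbo.tblAccount" queryout "D:\Exports\tblAccount.dat" -c -t"|" -T -S SourceServer',
         @on_success_action = 1;  -- quit the job reporting success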


    And then again, I might be wrong ...
    David Webb

  • I won't argue with you on the complexities of "doing SSIS right", but from extensive experience using both sets of technologies, I find SSIS infinitely more scalable and robust in the long run.

    That being said you can query the properties of SSIS packages by shredding the XML definition. Below is a very simple example (modified from a more complex script I poached online some time back) of how to get the checkpoint file location for an input package:

    USE DBA;
    GO

    DECLARE @sql NVARCHAR(MAX);
    DECLARE @PackagePath VARCHAR(2000);

    SET @PackagePath = 'C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Packages\MyPackage.dtsx';

    CREATE TABLE ##packageXML
    (
        PackagePath VARCHAR(2000),
        PackageXML XML
    );

    SET @sql = '
    INSERT ##packageXML
    SELECT ''@PackagePath'' AS PackagePath,
           CAST(BulkColumn AS XML) AS PackageXML
    FROM OPENROWSET(BULK ''@PackagePath'', SINGLE_BLOB) a
    ';

    SET @sql = REPLACE(@sql, '@PackagePath', @PackagePath);

    EXEC sp_executesql @sql;

    SELECT PackagePath,
           PackageXML.value('declare namespace DTS="www.microsoft.com/SqlServer/Dts";
               /DTS:Executable[1]/DTS:Property[@DTS:Name=''CheckpointFileName''][1]', 'nvarchar(500)') AS CheckpointFileName
    FROM ##packageXML;

    DROP TABLE ##packageXML;

    Definitely off-topic but hope it helps.

  • Thanks for the code to shred the XML. That will come in handy.

    However, from my experience, the best high-volume ETL solution I've used was an end-to-end T-SQL solution: fully data driven, highly configurable, and able to move 750MM rows through some heavy processing over a weekend. Multi-threaded processing spawned Agent jobs that parceled out work across all CPUs. It was way easier to understand and work on than SSIS processes that did half of what this process did.
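    As a rough illustration of the spawning idea only (this isn't the actual system described above, and the job names are made up), kicking off parallel Agent worker jobs from T-SQL can be as simple as:

    -- Start several pre-created worker jobs; each one pulls its own slice of
    -- work from a control table. sp_start_job returns immediately, so the
    -- workers run side by side.
    DECLARE @i INT = 1, @job SYSNAME;
    WHILE @i <= 4
    BEGIN
        SET @job = N'ETL Worker ' + CAST(@i AS NVARCHAR(10));
        EXEC msdb.dbo.sp_start_job @job_name = @job;
        SET @i = @i + 1;
    END;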

  • Fly Girl (5/25/2012)


    Thanks for the code to shred the XML. That will come in handy.

    However, from my experience, the best high-volume ETL solution I've used was an end-to-end T-SQL solution: fully data driven, highly configurable, and able to move 750MM rows through some heavy processing over a weekend. Multi-threaded processing spawned Agent jobs that parceled out work across all CPUs. It was way easier to understand and work on than SSIS processes that did half of what this process did.

    It's nice to find a kindred spirit on this subject. Thanks for making my day! 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • A bit off topic, for sure, but I find the accepted and common (but incorrect) use of the proxy for xp_CmdShell to be a bit of a security oxymoron rather than a desired effect. The common use I'm talking about is to use it to allow users with other than "SA" privs to execute xp_CmdShell directly, and I can't think of a worse security risk.

    Using xp_CmdShell doesn't have to be such a risk when done correctly using such a proxy. The whole key to security is for users to have only PUBLIC privs (or at least not "SA" privs as so many apps seem to improperly require) and the privs to execute certain stored procedures. It's actually quite easy to set things up for a user to be able to execute a stored procedure that uses xp_CmdShell without the user being able to see the code for the proc, without the user being able to read from the same tables the proc uses, and without the user being able to execute xp_CmdShell directly.
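    Just to make the pattern concrete, here's a minimal sketch (not my exact setup; every object name and the bcp command below are hypothetical). The callers get EXECUTE on the wrapper proc and nothing else; the elevation that lets the proc reach xp_CmdShell is assumed to come from EXECUTE AS OWNER in a database whose owner's login has the necessary rights and which the DBA has marked TRUSTWORTHY, with certificate signing being the tighter way to achieve the same thing:

    -- Hypothetical wrapper proc: callers never get rights on xp_CmdShell itself.
    -- Assumes the database owner's login can run xp_CmdShell and that the DBA has
    -- accepted TRUSTWORTHY ON for this database (module signing avoids TRUSTWORTHY
    -- and is the stricter alternative).
    CREATE PROCEDURE dbo.ExportAccounts
    WITH EXECUTE AS OWNER
    AS
    BEGIN
        DECLARE @cmd VARCHAR(2000);
        SET @cmd = 'bcp "SELECT * FROM SourceDB.dbo.tblAccount" queryout "D:\Exports\tblAccount.dat" -c -t"|" -T -S SourceServer';
        EXEC master.dbo.xp_cmdshell @cmd;
    END;
    GO

    -- The only privilege the operators need (ExportOperators is a hypothetical role):
    GRANT EXECUTE ON dbo.ExportAccounts TO ExportOperators;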

    My opinion is that xp_CmdShell has unfairly been given a bad rap because a lot of people don't actually know how to set it up for proper security. Are there other secure methods to replace xp_CmdShell functionality? Absolutely. But, I'm like the previous poster, "Fly Girl". I don't see the need to involve multiple disciplines to do the things that xp_CmdShell does so easily and so well. For those things that might require a bit more ummmph than simple DOS commands can provide, you can always use xp_CmdShell to call modern marvels such as PowerShell, all from the comfort of a nice, secure, well-written T-SQL stored procedure.
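    That kind of call amounts to nothing more than this (the script path is made up for the example):

    -- Shelling out to PowerShell from T-SQL; the script path is hypothetical.
    EXEC master.dbo.xp_cmdshell 'powershell.exe -NoProfile -ExecutionPolicy Bypass -File "D:\Scripts\MoveExportFiles.ps1"';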

    Ironically, unless your system is set up to use xp_CmdShell without risk, it would appear that there's nearly a 100% chance that your system isn't set up correctly security-wise. Except for DBAs and the occasional server login, there should be no users (including application users/logins) with more than PUBLIC privs and the right to execute certain stored procedures.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Fly Girl (5/25/2012)


    I'm having a discussion with an internal administrator who does not wish to enable xp_cmdshell on a server...

    If your target workplace is in a corporate environment, which it sounds like it is given the mention of a firewall and a formal separation between the system administrators and the database developers, then you may be blocked from using xp_cmdshell more and more going forward. xp_cmdshell is disabled in SQL Server by default, as you are certainly aware, and due to its blatant misconfigurations, misuse and abuse over the years the barriers to adoption have grown more numerous and more formalized (auditors are trained to look for its use, as are lots of security monitoring tools). At my current stop we had to "apply" for an "environment security exception" to allow some legacy apps to continue to function through a migration from 2005 to a new 2008R2 environment where data center monitoring looks for "breaches" known as "cmd shell sessions spawned from a SQL Server". Putting aside whether or not the reputation xp_cmdshell has is deserved (I will stay out of that one since Jeff and I have been down that road too many times), this example is precisely why you should consider using this project to cut your teeth on SSIS.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Interesting. What is apparent is that no one here has the understanding to create a read-only proxy account with limited domain permissions. Your point is well taken that once there is a trend it has its own weight and may be impossible to buck.
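    For reference, the SQL Server side of that proxy setup is a one-liner; the real lockdown work happens on the Windows/domain side of the account (the account name and password here are made up):

    -- Hypothetical low-privilege domain account; xp_cmdshell runs under this
    -- account for callers who are not members of sysadmin.
    EXEC sp_xp_cmdshell_proxy_account 'MYDOMAIN\sql_bcp_proxy', 'StrongPasswordHere';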

    Since I've been doing DTS/SSIS since 2002, my teeth are fairly well worn on SSIS. Every project has its own considerations. As a contractor I'm here for a limited amount of time. Training someone at a 'very small company' who right now has only a limited grasp of basic SQL queries to take over this process seems much easier if the process I'm leaving behind is simple BCP code rather than Visual Studio, config files, error logs, data conversions, variables, etc. as part of the SSIS environment. In general, I prefer to use the simplest tool available to perform the task at hand. What I will likely end up leaving them is a 'hack': teaching them to use the SSMS Export Wizard, saving the package on the server, and scheduling it as a job, devoid of error handling or QC. That is not a high-quality solution for the company. Nor is it realistic to expect an SSIS-trained person to be on hand to maintain a few simple data exports.

    Just my 2 cents.

  • Fly Girl (6/4/2012)


    Every project has its own considerations. As a contractor I'm here for a limited amount of time. Training someone at a 'very small company' who right now has only a limited grasp of basic SQL queries to take over this process seems much easier if the process I'm leaving behind is simple BCP code rather than Visual Studio, config files, error logs, data conversions, variables, etc. as part of the SSIS environment. In general, I prefer to use the simplest tool available to perform the task at hand. What I will likely end up leaving them is a 'hack': teaching them to use the SSMS Export Wizard, saving the package on the server, and scheduling it as a job, devoid of error handling or QC. That is not a high-quality solution for the company.

    That's why contractors get paid the big bucks 🙂 I can agree to disagree on the approach, but I would send the smaller shops towards using SSIS. The GUI tends to be friendlier to the newbie, and power business users can even cross over. In my experience there is more willingness among newbies to pick up a marketable tool than to roll up their sleeves and start writing procs and bcp command lines.

    Good luck. It sounds like you have a tough decision, implementation and most importantly knowledge handoff in front of you.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato
