Best practice of storing many queries to export data?

Question

Best practice of storing many queries to export data?

SQLChao

Mr or Mrs. 500

Points: 534
More actions
October 7, 2015 at 1:22 pm

#321177

One of my current responsibilities is to export data to 3rd party vendors. Each export can contain many csv files. The exports are all different in terms of what data is being sent.
The way I have it currently setup is each file that needs to be created is a view. An SSIS package gets the data from the view, writes to CSV, and then sftp to 3rd party vendor. This seemed like a good idea at first because the columns are static but the calculations might change. So all I have to do is ALTER VIEW and I don't have to change anything in the package.
Is there a better way of doing this? I was curious to see what other people are doing. What makes it challenging is that all the exports are so different. If they were similar I could have created generic views that cover all the exports instead of each export having its own view. Eventually I'm going to have 100's of views.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Eric M Russell SSC Guru Points: 125562 More actions · Answer 1

I'd rather have more generic SSIS packages and manage 100s of views, rather than managing 100s of SSIS packages. I'd also rather deploy 100 views than I would deploy 10 SSIS packages.

Another good thing about having your queries in external views is that you can query directly from the view if needed. For example, there may be a need to use SSRS to generate something like a summary report that accompanies the CSV extract.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

g.britton SSChampion Points: 13863 More actions · Answer 2

Eric M Russell (10/7/2015)
I'd rather have more generic SSIS packages and manage 100s of viewst.

So would I, but you can't have generic dataflows unless the views produce the same columns (number and type). IIUC the OP has at least one dataflow (source=view, dest=csv). That kills the generic idea *unless* the views produce the same output schemas.

FWIW if you use BIML, it can easily generate 100 SSIS packages in moments.

Gerald Britton, Pluralsight courses

SQLChao Mr or Mrs. 500 Points: 534 More actions · Answer 3

Yes you are correct. The files themselves are very different which is why I wasn't able to make them more generic.

So for instance I may have a package that creates 6 csv files and sftps them. That means there are 6 unique views dedicated to that package.

So although I may only have 10 packages, there are 60 views. This is fine but I was just wondering if there was a better way considering I'll have to manage this.

Eric M Russell SSC Guru Points: 125562 More actions · Answer 4

Personally, I'd rather not deal with 100 separate SSIS packages, even if they are auto generated by BIML. If you can leverage BIML or .NET to programmatically modify the DataFlow of a single SSIS package at runtime, then that would be great.

Using an expression based variable for the data source, you can have a single SSIS package export from a dynamically named view and apply a dynamic WHERE clause at runtime. We probably all know about that. However, the problem with this use case is the requirement that each view is different and can have any number of columns. If there is some reason why this really needs to be done in SSIS, for example maybe the export is part of a larger workflow or you're leveraging a 3rd party or custom SSIS component, then you could conditionally branch to a different DataFlow depending on how many columns a view has. But that's messy.

If all you're basically doing is just selecting from a view into a CSV file and then doing an FTP transfer to the client, then that could more easily be accomplished using dynamic T-SQL and the BCP command.

https://www.simple-talk.com/sql/database-administration/creating-csv-files-using-bcp-and-stored-procedures/

http://stackoverflow.com/questions/11028655/t-sql-ftp-using-xp-cmdshell

Taking this route, if you have the process meta-data driven, for example storing the view name, WHERE clause, client folder, FTP account credentials, etc. for each export in a table, and then building up the BCP and FTP commands dynamically at runtime, the export process could be 100 lines of T-SQL code, which is better than 100 SSIS packages.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Jeff Moden SSC Guru Points: 1004405 More actions · Answer 5

Eric M Russell (10/8/2015)
Personally, I'd rather not deal with 100 separate SSIS packages, even if they are auto generated by BIML. If you can leverage BIML or .NET to programmatically modify the DataFlow of a single SSIS package at runtime, then that would be great.
Using an expression based variable for the data source, you can have a single SSIS package export from a dynamically named view and apply a dynamic WHERE clause at runtime. We probably all know about that. However, the problem with this use case is the requirement that each view is different and can have any number of columns. If there is some reason why this really needs to be done in SSIS, for example maybe the export is part of a larger workflow or you're leveraging a 3rd party or custom SSIS component, then you could conditionally branch to a different DataFlow depending on how many columns a view has. But that's messy.
If all you're basically doing is just selecting from a view into a CSV file and then doing an FTP transfer to the client, then that could more easily be accomplished using dynamic T-SQL and the BCP command.
https://www.simple-talk.com/sql/database-administration/creating-csv-files-using-bcp-and-stored-procedures/
http://stackoverflow.com/questions/11028655/t-sql-ftp-using-xp-cmdshell
Taking this route, if you have the process meta-data driven, for example storing the view name, WHERE clause, client folder, FTP account credentials, etc. for each export in a table, and then building up the BCP and FTP commands dynamically at runtime, the export process could be 100 lines of T-SQL code, which is better than 100 SSIS packages.

+`1000.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Kristen-173977 SSCrazy Eights Points: 8687 More actions · Answer 6

Dunno if relevant, but just in case ...

We have to "export" a CSV list of various mail-lists (i.e. filtered on various categories) to a Bulk Email service each night.

We have a small executable (probably written in VBScript! its been around for many years) which makes a query to the DB to a "Config Table" to get details of all the reports to run (in our case they are all Stored Procedure names, but could be VIEWs that will just take a SELECT * - obviously very easy to add a new one), and the details of the "target source" (again, in our case that's the details of the EMail System Owner and Target Name of the list) and it runs the query, bundles up all the rows / columns as CSV, and then squirts the output up to the bulk email service API.

The actual code to do the CSV part was trivial, but we did need to take care of properly quoting columns that had embedded commas, and then double-quoting any of those that had quotes, as per CSV spec, and to catch anything with embedded newlines etc. which is not something that BCP is able to do AFAIK.

Eric M Russell SSC Guru Points: 125562 More actions · Answer 7

Kristen-173977 (10/9/2015)
...
The actual code to do the CSV part was trivial, but we did need to take care of properly quoting columns that had embedded commas, and then double-quoting any of those that had quotes, as per CSV spec, and to catch anything with embedded newlines etc. which is not something that BCP is able to do AFAIK.

Stuff like aliasing column names and stripping quotes from column values can be easily handled within the view or stored procedure, from which BCP exports.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Kristen-173977 SSCrazy Eights Points: 8687 More actions · Answer 8

Eric M Russell (10/9/2015)
Stuff like aliasing column names and stripping quotes from column values can be easily handled within the view or stored procedure, from which BCP exports.

Yeah, but you've got to do it, and do it right, in every column in every view - unless there is a smarter way?

What I normally see recommended is something like

CREATE VIEW MyExportView

AS

SELECT [Col1] = '"' + REPLACE(Col1, '"', '""') + '"',

... repeat for all other STRING columns

Relatively hard to do for all possible Control Characters that might foul up the import at the far end, although the REPLACE can be relatively easily extended to remove CR and LF. I think it all becomes a bit unwieldy at that point!

Apologies if I have misunderstood

lionel.mathio Grasshopper Points: 11 More actions · Answer 9

I personally support Eric M Russell's idea

Graduated from Soran [/url] University with First Class Degree with Honours in Computer Science.

Eric M Russell SSC Guru Points: 125562 More actions · Answer 10

Kristen-173977 (10/9/2015)
Eric M Russell (10/9/2015)
Stuff like aliasing column names and stripping quotes from column values can be easily handled within the view or stored procedure, from which BCP exports.
Yeah, but you've got to do it, and do it right, in every column in every view - unless there is a smarter way?
What I normally see recommended is something like
CREATE VIEW MyExportView
AS
SELECT [Col1] = '"' + REPLACE(Col1, '"', '""') + '"',
... repeat for all other STRING columns
Relatively hard to do for all possible Control Characters that might foul up the import at the far end, although the REPLACE can be relatively easily extended to remove CR and LF. I think it all becomes a bit unwieldy at that point!
Apologies if I have misunderstood

Without changing the original table, a view is the best way to abstract a poorly designed table (bad column names and bad data) as table that is something more usable. If you have many such tables to deal with, then you can write a T-SQL script that leverages INFORMATION_SCHEMA.COLUMNS to generates a CREATE VIEW definition for all tables in the database, similar in concept to an automatic CREATE PROCEDURE generator for CRUD stored procedures.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

ScottPletcher SSC Guru Points: 101119 More actions · Answer 11

Eric M Russell (10/12/2015)
Kristen-173977 (10/9/2015)
Eric M Russell (10/9/2015)
Stuff like aliasing column names and stripping quotes from column values can be easily handled within the view or stored procedure, from which BCP exports.
Yeah, but you've got to do it, and do it right, in every column in every view - unless there is a smarter way?
What I normally see recommended is something like
CREATE VIEW MyExportView
AS
SELECT [Col1] = '"' + REPLACE(Col1, '"', '""') + '"',
... repeat for all other STRING columns
Relatively hard to do for all possible Control Characters that might foul up the import at the far end, although the REPLACE can be relatively easily extended to remove CR and LF. I think it all becomes a bit unwieldy at that point!
Apologies if I have misunderstood
Without changing the original table, a view is the best way to abstract a poorly designed table (bad column names and bad data) as table that is something more usable. If you have many such tables to deal with, then you can write a T-SQL script that leverages INFORMATION_SCHEMA.COLUMNS to generates a CREATE VIEW definition for all tables in the database, similar in concept to an automatic CREATE PROCEDURE generator for CRUD stored procedures.

Yes, it's relatively trivial to generate the code to add delimiters around every column and do other similar things.

But I'd strongly recommend using sys.columns, and other system tables, rather than INFORMATION_SCHEMA views. From my experience, the I_S views perform very poorly and can tend to cause deadlocks. Also, the sys views contain metadata (details) about the underlying sql object not present in the I_S views.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Eric M Russell SSC Guru Points: 125562 More actions · Answer 12

ScottPletcher (10/12/2015)
Eric M Russell (10/12/2015)
Kristen-173977 (10/9/2015)
Eric M Russell (10/9/2015)
Stuff like aliasing column names and stripping quotes from column values can be easily handled within the view or stored procedure, from which BCP exports.
Yeah, but you've got to do it, and do it right, in every column in every view - unless there is a smarter way?
What I normally see recommended is something like
CREATE VIEW MyExportView
AS
SELECT [Col1] = '"' + REPLACE(Col1, '"', '""') + '"',
... repeat for all other STRING columns
Relatively hard to do for all possible Control Characters that might foul up the import at the far end, although the REPLACE can be relatively easily extended to remove CR and LF. I think it all becomes a bit unwieldy at that point!
Apologies if I have misunderstood
Without changing the original table, a view is the best way to abstract a poorly designed table (bad column names and bad data) as table that is something more usable. If you have many such tables to deal with, then you can write a T-SQL script that leverages INFORMATION_SCHEMA.COLUMNS to generates a CREATE VIEW definition for all tables in the database, similar in concept to an automatic CREATE PROCEDURE generator for CRUD stored procedures.
Yes, it's relatively trivial to generate the code to add delimiters around every column and do other similar things.
But I'd strongly recommend using sys.columns, and other system tables, rather than INFORMATION_SCHEMA views. From my experience, the I_S views perform very poorly and can tend to cause deadlocks. Also, the sys views contain metadata (details) about the underlying sql object not present in the I_S views.

I've never seen or heard about querying INFORMATION_SHEMA views to cause blocking issues, I guess that would be in a high volume OLTP database scenario. This thing about tables names with embedded spaces and symbols and also values with embedded " quotes; it sounds like a staging database that originates from imported Excel sheets.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho