April 24, 2015 at 1:59 am
Hello Everyone,
I’ve been assigned a task to develop 3 different SSIS packages.
Package 1 – There will be roughly 50 CSV files; I need to read them one by one and create a new database and a table per CSV file in SQL Server 2014.
This will be a one-time job.
Which is the best option in an SSIS package for creating the database and tables in SQL Server 2014? I assume I need to use an "Execute SQL Task" here, but please advise.
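Something like the following is what I imagine a manual script (or the statement in an Execute SQL Task, minus the GO separators) would run per file; the database name, table name and columns here are just placeholders, not the real file layout:

-- Placeholder names and columns; the real columns would come from each CSV's header.
IF DB_ID(N'ClientA_Import') IS NULL
    CREATE DATABASE ClientA_Import;
GO
USE ClientA_Import;
GO
IF OBJECT_ID(N'dbo.Customers', N'U') IS NULL
    CREATE TABLE dbo.Customers
    (
        CustomerId   INT           NOT NULL,
        CustomerName NVARCHAR(200) NULL,
        LoadDate     DATETIME2(0)  NOT NULL DEFAULT SYSDATETIME()
    );
GO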
Package 2 – This will run on a daily basis. It will again read those 50 CSV files one by one and load their data into the tables.
Also, a couple of the 50 files need to be treated differently (these exceptional file names have to be configurable somewhere, as the client might add more files to that list).
For those files, the tables have to be truncated first before the data is inserted.
What is the optimal and fastest way to move the data from these files into the SQL tables? Can I run parallel loads inside a Foreach Loop container?
Where should I configure these exceptional file names?
Package 3 – This job will run once every six months. It will basically purge the data.
This needs to be parameterized so that the client can set a cut-off date, and it will delete data from all 50 tables in SQL Server based on that date.
How do I allow the client to supply the cut-off date?
What is the optimal way to iterate over all 50 tables in the database and perform the purge based on the cut-off date?
Common questions:
Where should I keep the path of the folder from which I need to pick up the CSV files?
How should I deploy and schedule packages 2 and 3?
Can anybody please advise on the efficient/standard ways to maintain such packages?
Any help on this would be greatly appreciated.
April 24, 2015 at 4:18 am
Package 1 –
It is not good practice to create databases and tables from within SSIS packages.
If this is an initial load, I would suggest creating all the databases and tables manually and using SSIS only to load the files into those tables.
Package 2 –
You can add tasks in parallel, but it is better not to use more than 4 parallel tasks. You can arrange them in sequence as well, e.g. 4 in parallel and then the next 4 after the first 4 files have completed.
Use a Foreach Loop to pick up the 'exceptional' files, and you can parallelize the load according to the number of files you are receiving. But don't use more than 4 parallel tasks, as it will hamper server performance.
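One way to keep those exceptional file names configurable is a small control table that the package reads inside the loop; a rough sketch (all names here are only illustrative):

-- Hypothetical control table listing files whose target tables are truncated before load.
CREATE TABLE dbo.TruncateBeforeLoad
(
    FileName  NVARCHAR(260) NOT NULL PRIMARY KEY,
    TableName SYSNAME       NOT NULL
);

INSERT INTO dbo.TruncateBeforeLoad (FileName, TableName)
VALUES (N'Customers.csv', N'dbo.Customers'),
       (N'Orders.csv',    N'dbo.Orders');

-- Inside the Foreach Loop, an Execute SQL Task could truncate only when the
-- current file appears in the control table.
DECLARE @CurrentFile NVARCHAR(260) = N'Customers.csv';  -- would be mapped from the loop variable
DECLARE @sql NVARCHAR(MAX);

SELECT @sql = N'TRUNCATE TABLE ' + QUOTENAME(PARSENAME(TableName, 2))
            + N'.' + QUOTENAME(PARSENAME(TableName, 1)) + N';'
FROM dbo.TruncateBeforeLoad
WHERE FileName = @CurrentFile;

IF @sql IS NOT NULL
    EXEC sys.sp_executesql @sql;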
Package 3 –
Create a variable for the cut-off date and make it configurable.
The purge itself is just an Execute SQL Task; a DELETE statement is all you need.
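A sketch of the kind of statement that task might run per table (table and column names are placeholders, and this assumes an OLE DB connection, where ? is the parameter marker mapped to the cut-off date variable):

-- Hypothetical purge for one table; deleting in batches keeps the transaction log manageable.
DECLARE @CutOffDate DATE = ?;    -- mapped from the SSIS cut-off date variable

WHILE 1 = 1
BEGIN
    DELETE TOP (10000)
    FROM dbo.Customers           -- repeat (or loop) for each of the 50 tables
    WHERE LoadDate < @CutOffDate;

    IF @@ROWCOUNT = 0 BREAK;
END;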
Where should I keep the path of the folder from which I need to pick up the CSV files?
Store the folder path in a variable and make it configurable using SSIS configurations (file or table).
How should I deploy and schedule packages 2 and 3?
It is up to you whether you want to use SQL Server or the file system. I would suggest the file system, as it is easier to maintain.
Can anybody please advise on the efficient/standard ways to maintain such packages?
I am not sure what you want to maintain; these are fairly basic SSIS packages.
April 24, 2015 at 4:39 am
Anshul Parmar (4/24/2015)
This is mostly good advice, but I don't agree with all of it 🙂
1) Parallelism. Determining the optimum number of tasks to run in parallel depends on complexity, hardware and concurrent load. Four is a good starting point, but the absolute "do not use more than 4 ..." advice is not necessarily correct.
2) Configurations. As this is VS2013, there is no longer any need for file or table configurations. This can all be handled in SSISDB environments.
3) Deployment. Development should be done as a single project, and deployment is then performed via a single .ispac file to SSISDB.
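For reference, here is a rough sketch of setting up an SSISDB environment for something like the CSV folder path; the folder, project, environment and parameter names are all assumptions:

-- Run against the SSISDB catalog after deploying the project.
EXEC SSISDB.catalog.create_environment
     @folder_name             = N'CsvImport',
     @environment_name        = N'Production',
     @environment_description = N'';

EXEC SSISDB.catalog.create_environment_variable
     @folder_name      = N'CsvImport',
     @environment_name = N'Production',
     @variable_name    = N'CsvFolderPath',
     @data_type        = N'String',
     @sensitive        = 0,
     @value            = N'\\fileserver\imports\csv',
     @description      = N'';

-- Bind the project parameter to the environment variable ('R' = referenced value).
-- The project also needs an environment reference (added via SSMS or
-- catalog.create_environment_reference) before execution.
EXEC SSISDB.catalog.set_object_parameter_value
     @object_type     = 20,                    -- 20 = project-level parameter
     @folder_name     = N'CsvImport',
     @project_name    = N'CsvImport',
     @parameter_name  = N'CsvFolderPath',
     @parameter_value = N'CsvFolderPath',      -- name of the environment variable
     @value_type      = 'R';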
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
April 24, 2015 at 4:54 am
Agreed.
If it is VS 2012 or later, then project deployment is available, so that will be the best approach.
Also, there is no need for configurations, as the Environments feature is available now.
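For scheduling, once the project is deployed to SSISDB, packages 2 and 3 can be started from a SQL Agent job step; a sketch of the T-SQL such a step might run (all names here are assumptions):

-- Hypothetical: start the daily load package; schedule this step daily, and a
-- similar one every six months for the purge package.
DECLARE @execution_id BIGINT;

EXEC SSISDB.catalog.create_execution
     @folder_name     = N'CsvImport',
     @project_name    = N'CsvImport',
     @package_name    = N'Package2_DailyLoad.dtsx',
     @use32bitruntime = 0,
     @execution_id    = @execution_id OUTPUT;

-- Run synchronously so the Agent step fails if the package fails.
EXEC SSISDB.catalog.set_execution_parameter_value
     @execution_id    = @execution_id,
     @object_type     = 50,                -- 50 = system parameter
     @parameter_name  = N'SYNCHRONIZED',
     @parameter_value = 1;

EXEC SSISDB.catalog.start_execution @execution_id;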
April 24, 2015 at 4:57 am
Hello Anshul,
Thank you very much for the quick response.
Package 1:
Please note that the client has many customers, each of whom will be supplying such a set of CSV files to them.
We will be creating the database based on the version number and the client name.
Also, each CSV file has a large number of columns, which they would like created "as is" in the respective table.
So it is their primary requirement that the SSIS package generate the databases and tables dynamically.
But yes, this will be a one-time job per client.
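If it helps, something like this variable-driven statement in an Execute SQL Task is what I have in mind for the dynamic creation (the variable values below are just examples):

-- The client name and version would be mapped in from SSIS variables.
DECLARE @ClientName SYSNAME = N'ClientA';
DECLARE @Version    SYSNAME = N'v2';
DECLARE @DbName     SYSNAME = @ClientName + N'_' + @Version;
DECLARE @sql        NVARCHAR(MAX);

IF DB_ID(@DbName) IS NULL
BEGIN
    SET @sql = N'CREATE DATABASE ' + QUOTENAME(@DbName) + N';';
    EXEC sys.sp_executesql @sql;
END;

-- The table DDL per CSV would be built the same way, from each file's header row.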
Package 2:
For this package I need to read the CSV files again in a Foreach Loop container one by one, extract the SQL table name from the file name, and insert the data into the respective table. The same thing will happen for all 48 CSV files, while for the remaining 2 files the package will truncate the table first and then perform the insert.
So I'm wondering how I would add the parallel tasks here. Which type of task(s) can help me achieve this requirement?
Package 3:
Do I need to store the cut-off date in a configuration? If I schedule the package to run every 6 months, how would that date change automatically for each 6-month period? Please advise.
April 24, 2015 at 5:35 am
Hello Phil,
Thank you for the response.
After reading the following blog, and based on what you suggested,
I understand that I need to create one project containing the 3 packages.
The folder path from which I'm going to read the CSV files should be configured as a project parameter.
This can then be changed at the SQL Server level with an environment variable.
Is this correct?
One question:
1. For package 2, can I use the Bulk Insert Task? As far as I know, the Bulk Insert Task does not support error logging (if any row in a CSV file fails during import, it cannot be logged).
Can package execution logs help in that context?
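For what it's worth, the T-SQL BULK INSERT statement does have error-file options, if running it from an Execute SQL Task ends up being acceptable; a sketch with placeholder paths and names:

-- Bad rows (up to MAXERRORS) are written to the error file instead of failing the whole load.
BULK INSERT dbo.Customers
FROM '\\fileserver\imports\csv\Customers.csv'
WITH
(
    FIRSTROW        = 2,          -- skip the header row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    ERRORFILE       = '\\fileserver\imports\errors\Customers',
    MAXERRORS       = 100,
    TABLOCK
);

The other option I can think of is a normal Data Flow with the flat file source's error output redirected to a log table.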
Also, can you please respond to the other questions I asked in my previous message? Thanks.