December 30, 2010 at 11:16 pm
New to SSIS.
Will someone kindly tell me what questions I need to ask about the data sources in order to select the proper type of connection from the connection manager drop down?
The data sources will be on remote servers, which I currently know nothing about. I will learn more in the near future but want to make sure I am equipped with the right questions to make the most of that interview.
Thx.
hxkresl
January 3, 2011 at 12:27 pm
Hi, I haven't received replies for this question. I am not sure if it is because I am not being clear enough about what I need, of if it is just too elementary (I am afraid it is). Still if someone could help me focus the question so that it can get answered I would appreciate it.
I have been given the scope of a project, and an outline of how many datasources I will be importing from. I do not yet know what kinds of servers or databases they reside in, I only know that in many cases they will be .csv files.
I will need to be able to create a connection manager to these sources. What would be the best questions to ask to determine how I will connect to these data sources?
January 3, 2011 at 12:53 pm
hxkresl (1/3/2011)
Hi, I haven't received replies for this question. I am not sure if it is because I am not being clear enough about what I need, of if it is just too elementary (I am afraid it is).
Or it might be that you posted it originally the day before New Years Day, a holiday in many places. π
Still if someone could help me focus the question so that it can get answered I would appreciate it.
I have been given the scope of a project, and an outline of how many datasources I will be importing from. I do not yet know what kinds of servers or databases they reside in, I only know that in many cases they will be .csv files.
CSV files are flat files, and you'll want the appropriate data source, then simply modify for the correct separators. The rest will depend exactly on what you're importing from. You're looking for information that isn't quickly explained. I personally would start with google: SQL SERVER SSIS "Data Sources" and go through the articles you find there.
I will need to be able to create a connection manager to these sources. What would be the best questions to ask to determine how I will connect to these data sources?
You don't have enough information yet, honestly, for anyone to answer the question. CSV files are simple, as mentioned. The rest, until you know where it's coming from, you can't know how to "talk" to it.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
January 3, 2011 at 1:23 pm
Thank you very much. It was a sanity check as well π
Oh....Happy New Year!
January 3, 2011 at 11:06 pm
OK, so let me take another stab at this.
I know that the sources will be .csv files. Please tell me what I should ask the owner of the data source to 'give' me in order to be able to connect to them.
Do I ask for an IP address? What other information, credentials, etc. do I need to ask for to get to the data they want me to import? Since they are not volunteering this info I assume the answer is somehow obvious and I should know what to ask for....
January 3, 2011 at 11:23 pm
hxkresl (1/3/2011)
OK, so let me take another stab at this.I know that the sources will be .csv files. Please tell me what I should ask the owner of the data source to 'give' me in order to be able to connect to them.
Do I ask for an IP address? What other information, credentials, etc. do I need to ask for to get to the data they want me to import? Since they are not volunteering this info I assume the answer is somehow obvious and I should know what to ask for....
You sound like you've been volunteered as the ETL developer. Lucky you. ETL: Extract. Transform. Load.
They're doing the extraction, and creating the CSV files. You'll have to do the necessary transforms and loads then.
This isn't something you just bring a checklist to, without knowing what your experience is. I could, for example, tell you to confirm the data dictionary. That may or may not mean anything when what you need to be informed of is "field name, size, type, and actual definition". There's a few things of that level.
The next part is that you then need to Load. Load where? To what? Do you need to re-organize the data? Is there cleansing that needs to happen to conform to your system? These aren't questions you ask the person providing the data, it's questions you ask the person receiving it... yourself in this case.
This will be abrupt, but you need to figure out more than just how to read the file. You have to know what you're going to do with it afterwards. Here's a baseline:
In general working with a new load schematic is an iterative process. Find out how the files are being presented to you (ftp, local folders, etc.). Find out if you need to do transportation of them. Find out what is in them. Find out where they're supposed to go. Find out what you need to do to those files to make them fit into their new home.
I wish you luck. I'm not trying to be mean but you might be over your head here if this is business critical. If it's not, take your time with it, go through each piece. You'll have more questions than answers. The technical questions come here with, after a googling and some BOL work. The business questions go to the database developers for the target, or the originators if you have further questions about the source data.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
January 3, 2011 at 11:58 pm
Thank you for your response. Regarding "I'm not trying to be mean but you might be over your head here if this is business critical". It is business critical, but I am only doing the ETL part. To me that sounds doable even if it is my first time. I have been doing my homework on configuring packages in SSIS and learning the various types of transforms. Am also enrolled in an on-line SSIS class which I've kicked into high gear.
I will be importing the .csv data into a SQL Server 2008 database and I will have access to the new data model on this machine.
Regarding "Find out how the files are being presented to you (ftp, local folders, etc.)", would you mind arming me with a way to ask this question? A sentence or two that I might approach the owner of the .csv file with? Would it be fair to have direct read only access to the data? Or is a data dictionary the closest to the data that I will get?
Keep in mind that these .csv files will be refreshed and the SSIS packages will have to run against these .csv files on a continous basis. It's not a one shot deal.
January 4, 2011 at 12:30 am
I think you may be over complicating this a little.
When the SSIS package runs, it will need to access the CSV files to bring in the data. This access can be as simple as just accessing any file on a network, or a little more complex (accessing files on an FTP server, for example).
Your easiest solution when building the package is to provide a UNC file path (\\server\path\filename) to each file you will be importing. This, in SSIS jargon, is called a 'Connection' and, once configured, it is this Connection that the package will use to do its work. If the files will be on your network, this is probably what you will do - so you just need to ask what the files will be called and what data they will contain and confirm that you have access to them.
You'll probably want full access to the source data, as it is common practice to move files which have already been imported to an archive folder of some sort, perhaps adding a date/time stamp to the file name in the process.
A bit later down the road, when you schedule the job to run automatically, you'll have a bit more work to do on file permissions to make sure that the SQL Agent service can also access the source files, but you don't need to get into that yet.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
January 4, 2011 at 12:40 am
^ Double what Phil said.
Sorry, I think my toothache is making me a bit grumpy... and possibly a little loopy... neither of which you care about. π
If all you need to do is transport the data, he's correct, and stated it more succinctly than I could. Thanks Phil.
One comment though: It appears you're approaching this like it's a one shot chance to talk to the people providing the files. It should, in theory, be the beginning of a running dialogue you have with them until the project is complete. Do you feel this won't be the case?
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
January 4, 2011 at 12:47 am
I think the most important questions for the initial interview are:
* where is the file located?
* can the location change over time?
* what are the columns and what data is to be expected
* can the layout of the file change over time?
* what is the filename and can it change over time?
* what do you want me to do with the data?
Ask for a mock-up file with test data that you can use to initially configure the package.
If that works, you can start on expanding the package to be more robust and flexible.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
January 4, 2011 at 12:54 am
Thank you for those answers Phil! (This facet of ETL isn't discussed in SSIS training, beyond configuring the connection manager - simple if everything is on the same machine). Since the connection part doesn't pertain to my long term learning curve of SSIS I wanted to get it out of the way. I'm eager to take a look at the data sources. Thank you!
January 4, 2011 at 12:56 am
Cool, thank you so much Koen.
January 4, 2011 at 12:39 pm
What is a good way to handle the situation if the .csv source data filenames have the potential to change? Do I ask that all source files have time stamps and be put in the same folder? Would a foreach loop and a modification to the Connection Managerβs ConnectionString property be a good way of handling the this situation?
Ideally I would only be thinking about the transforms, and I would like to get the connection issue out of the way.
January 4, 2011 at 1:07 pm
I would typically have them append the date/time to the file something along the lines of XXX_01012010 where the XXX does not change. If your source folder for the csv file is not going to change, but the file name is then you use a foreach loop container. Configure the Enumerator to be ForEach File Enumerator. Hard code the folder and in the text box for file to use the XXX from above plus a wildcard *.* ex: XXX*.*
You then would need to store the fully qualified name to a variable that you can use in your File Connection property to give it the current file and path to import. Just be sure you archive or delete the file when your done or the next time it runs you will be importing files over again π
January 4, 2011 at 1:31 pm
Thank you.
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply