November 29, 2021 at 7:03 am
Seems like you could create a step in your SSIS package that groups all the files that start or end with the same prefix/suffix together. Then just use a For Each file loop, and you're all set. Or did I miss something?
November 29, 2021 at 9:49 am
The filenames are named with the groups there are in - Filename1_Group1.csv, Filename2_Group1.CSV, Filename3_Group1.csv; Filename1_Group2.csv, Filename2_Group2.CSV, Filename3_Group2.csv, and so on
When I create the dataflow task, I add the name of the file as the name of the OLEDB destination component. So when I go to e OLEDB destination->table load or fast view-> NEW, the SQL code is CREATE TABLE Filename1_Group1. I edit this step by adding an identifier column using IDENTITY (1,1)
Once I do this, the resulting tables in SQL are named the same as the raw file mentioned above. To answer your question as to how I identify, it is manual as the names of the file is the name of the sql table that is meant to be created. I can do these in batches, i.e, load all of Group1, followed by Group2, and so on
Hope that helps
I urge you to consider refining this process such that you do not keep creating new tables.
Instead, have a single table per group and add Filename (and any other useful attributes, eg, LoadedAt) as a column in the table.
If you do this, you can process all of the files in a single group using a FOREACH container in SSIS. With seven FOREACH containers, you could import all of the files to your seven group tables.
November 29, 2021 at 11:06 am
That would have been my preferred method as well. But the project has been going on for a few years and this is what has been followed historically, even before I started. I am afraid it might be too complicated to change it now and might cause more damage
November 30, 2021 at 1:38 am
The filenames are named with the groups there are in - Filename1_Group1.csv, Filename2_Group1.CSV, Filename3_Group1.csv; Filename1_Group2.csv, Filename2_Group2.CSV, Filename3_Group2.csv, and so on
When I create the dataflow task, I add the name of the file as the name of the OLEDB destination component. So when I go to e OLEDB destination->table load or fast view-> NEW, the SQL code is CREATE TABLE Filename1_Group1. I edit this step by adding an identifier column using IDENTITY (1,1)
Once I do this, the resulting tables in SQL are named the same as the raw file mentioned above. To answer your question as to how I identify, it is manual as the names of the file is the name of the sql table that is meant to be created. I can do these in batches, i.e, load all of Group1, followed by Group2, and so on
Hope that helps
Perfect. We're almost there. Can you attached the CREATE TABLE code that's used for each of the first 3 groups?
Also, you mentioned the "text qualifier" in your Step 3. Are these files comma separated and all columns are "double quoted" or just some of the "cells" are "double quoted"? And which version of SQL Server are you using? 2016 or less or 2017 or greater?
--Jeff Moden
Change is inevitable... Change for the better is not.
November 30, 2021 at 1:40 am
That would have been my preferred method as well. But the project has been going on for a few years and this is what has been followed historically, even before I started. I am afraid it might be too complicated to change it now and might cause more damage
Not to worry... Let's peel one potato at a time. See what I think is my final question in my previous post above.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 30, 2021 at 11:25 am
Perfect. We're almost there. Can you attached the CREATE TABLE code that's used for each of the first 3 groups?
The create table sql code gets automatically generated when I go to the edit property of the OLEDB destination-> New(table load or fast view). The only thing I do for every new table is add an ID column as shown below
CREATE TABLE FILENAME1_GROUP1 (
id_num INT Identity (1,1),
Column1 varchar (800)
Column2 varchar (800)
.
.
.
ColumnN varchar (800)
)
Also, you mentioned the "text qualifier" in your Step 3. Are these files comma separated and all columns are "double quoted" or just some of the "cells" are "double quoted"? And which version of SQL Server are you using? 2016 or less or 2017 or greater?
I believe some of the column could be double quoted which is why I specify the column delimiter. To be honest, I am not sure what the Column Delimiter field does and how adding ' " ' affects the table. I am using SQL Server 2019
The other steps I perform is I set the column length for each column as varchar(800). I believe this is to prevent truncation of certain text columns; and, in the advanced property of the FLAT FILE Source, I add the FileNameColumn in Custom Properties in the Component Properties tab
Hope that helps
November 30, 2021 at 10:07 pm
Edit: the sql version at work I believe is SQL Server 2016. As for the version on my personal machine it most likely is SQL Server 2019 as I only installed in the last month or so
December 1, 2021 at 5:02 am
Ok. Understood on the column names and the table creation.
Is the data in the files separated by commas or tabs?
Also, is there any chance of attaching one of the files or is the data proprietary or contain PII, etc? Totally understood if you cannot.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 1, 2021 at 9:21 am
The CSVs were created from Excel. I just save as CSV. I am guessing they are tabbed (they do appear in a table format)
The tables do contain confidential information so I cannot share them. However, I can dummy up some raw data, or provide a couple of adventure works csv files to simulate the different filenames. Would that do ?
Thanks
December 1, 2021 at 9:23 am
The CSVs were created from Excel. I just save as CSV. I am guessing they are tabbed (they do appear in a table format)
The tables do contain confidential information so I cannot share them. However, I can dummy up some raw data, or provide a couple of adventure works csv files to simulate the different filenames. Would that do ?
Thanks
If you open one of the CSVs in Notepad++ and then take the menu option View/Show Symbol/Show All Characters, you should be able to see exactly what the row and column delimiters are.
December 1, 2021 at 7:02 pm
The CSVs were created from Excel. I just save as CSV. I am guessing they are tabbed (they do appear in a table format)
The tables do contain confidential information so I cannot share them. However, I can dummy up some raw data, or provide a couple of adventure works csv files to simulate the different filenames. Would that do ?
Thanks
No need on the latter and understood on the former. Excel files that are exported as CSV files aren't actually true "Comma separated files". Rather, they're "Comedy Separated Files" in that only those cells that contain the delimiter as part of the data will be double quoted in the file. It makes for a bit of a train wreck for tools that use are capable of only using true CSV files.
It's not a problem though. It only takes a little, easy to do pre-proceessing.
Since the data is taboo for posting, can you attache one of the actual column header lines from one of the files without adulteration and without any of the data? I can probably do without that (I'll set the code up to figure out what is being used in actual files) but it would provide a confirmation for a test of that bit of functionality.
I know it seems like there's a lot but there's really not. I just have to make sure the "gazintas" are known. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
December 2, 2021 at 10:02 pm
I think some of the columns may potentially have sensitive content or other identifying names but I will check that and, if necessary, dummy the columns or mask it before sending. Hope that will help
December 2, 2021 at 10:57 pm
Understood there, as well. No worries. It's not difficult for me to cobble some example tables. I was just going "for the real stuff" just to be sure.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 3, 2021 at 10:26 am
Thanks, Jeff. Really appreciate that!
December 6, 2021 at 9:19 pm
Sorry for not getting back sooner as I was away from work and didn't have access to files. I had a look and there aren't too many columns that I can share over here
The common ones are Employee_ID, FirstName, Last Name, Org_Unit, SessionStartdatetime,SessionEndDateTime, WorkSessionCode
I have masked some of the column names in the list. Hope it helps
Viewing 15 posts - 16 through 30 (of 52 total)
You must be logged in to reply to this topic. Login to reply