July 23, 2007 at 11:49 pm
Hi All,
I have an Excel workbook with many sheets. In each sheet the first row has to be skipped, the second row contains the column names, and the records follow after that.
The Excel Source in SSIS only gives one option: a checkbox for whether the first row has column names.
But for me the first row is junk -- a link back to the parent/first sheet -- and has to be skipped; the second row has the column info.
How can this be accomplished? Any suggestions would be of great help!
Sample:
Main | | | |
id | desc | price | date |
1 | apple | 1.0 | 1/1/1900 |
2 | banana | 2.0 | 1/1/2000 |
"Main" in the first row is actually a hyperlink; clicking it takes you to the first sheet in the workbook, which lists all the sheet names as a table of contents.
I am stuck and not sure how to skip this!!
July 24, 2007 at 1:59 am
I'm not sure there is a native way to do this, as the Jet driver only has the "first row is header" switch and I don't know that you can change anything beyond that.
I would say set it up so that there are no headers and manually rename your columns to suit. Then, using a row number component (http://www.sqlis.com/93.aspx), you could add a conditional split and ignore anything with a row number < 3.
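If pre-processing the file outside SSIS is an option, the same idea looks roughly like this in Python with pandas (just an untested sketch; the file path and sheet name are placeholders):

import pandas as pd

# Read the sheet with no header so the junk first row is just another data row
df = pd.read_excel("workbook.xlsx", sheet_name="Fruit", header=None)

# Row 1 (0-based) holds the real column names; everything before row 2 is noise
df.columns = df.iloc[1]
df = df.iloc[2:]

df.to_excel("workbook_clean.xlsx", index=False)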
Cheers,
Crispin
I can't die, there are too many people who still have to meet me!
It's not a bug, SQL just misunderstood me!
July 25, 2007 at 5:50 am
Bad news. There is no good way to do what you want short of editing the Excel workbook itself and deleting that first row. If the first row were just column headers, there is a checkbox for that, and the data would then be read from the second row on.
Now, I haven't tried this, and it might be more work than it's worth, but here's an idea: read the entire spreadsheet into an ADO Recordset, delete the first (and possibly second) row, and then manipulate it from there.
I don't know whether the above suggestion is possible. Personally I would go with manually deleting the first row and working from there.
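If you do end up touching the file programmatically, a rough sketch of the same delete-the-first-row idea (using openpyxl in Python rather than ADO; the path is just a placeholder):

import openpyxl

wb = openpyxl.load_workbook("workbook.xlsx")
for ws in wb.worksheets:
    ws.delete_rows(1)   # drop the junk hyperlink row in every sheet
wb.save("workbook_clean.xlsx")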
July 25, 2007 at 8:36 am
Is this a static workbook, or is it imported periodically (daily? weekly? monthly? hourly?)? Is there any way of arranging for the source of the workbook to put that link somewhere else? Or perhaps it was placed there deliberately to circumvent what you are trying to do?
Brandie's recordset concept will work, but it sounds like no matter which way you turn, the way the workbook is built creates LOTS of extra work. Try to get it fixed at the source if you need to do this to the workbook more than once.
July 25, 2007 at 11:59 pm
In the date column (column 4), what data type is SSIS detecting? If it is DT_DATE, the first and second rows (the junk row and the row with the "date" header) might come through as NULLs. Then you could use a Conditional Split with ISNULL(date) to exclude the first two rows; any subsequent rows with real date values would pass through to the next task.
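If you ever do this in a pre-processing script instead, the same null-based filter is short in pandas (sketch only; the file path, sheet name, and column names are assumptions):

import pandas as pd

df = pd.read_excel("workbook.xlsx", sheet_name="Fruit", header=None,
                   names=["id", "desc", "price", "date"])
# The junk row and the header row have no parsable date, so both get dropped
df = df[pd.to_datetime(df["date"], errors="coerce").notna()]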
July 27, 2007 at 1:03 pm
The exact problem is:
I have an Excel workbook in which each sheet contains a table, and the number of columns varies from table to table. I somehow managed to remove the first row, which has nothing in it.
Each worksheet name matches a table name in the database with the same schema, except that the data types differ (e.g. text in the Excel file maps to varchar in SQL).
Say there are 10 tables in the workbook; I have the same number of tables in the database.
I am looping over each sheet and trying to insert its data into the matching table in the database.
The problem now is that I am not able to dynamically change the data flow's column mapping metadata (Excel table to SQL table), since some tables have 4 columns and others have 5 or 6.
I get an error saying the external metadata source doesn't match.
I also found an article saying that metadata cannot be changed dynamically; if that is correct, is there any workaround?
Can somebody help me figure out where I am going wrong?
July 27, 2007 at 2:05 pm
IF the number of worksheets in a workbook is stable, or at least the maximum number is stable,
IF you can guarantee that, for the same target table, the spreadsheet will be in the same format,
I believe you should be able to build a case structure which will eliminate the need for dynamic metadata, as long as you can reference the correct metadata for each worksheet.
Am I missing something here? I've not done anything like this, so I'm purely in a conceptual zone on a Friday afternoon (i.e., disclaimer...)
July 27, 2007 at 2:53 pm
The workbook gets refreshed every night. The number of sheets in the workbook is fixed.
I use expressions and variables to map the sheet name (which is also the table name) to the source and destination connections,
and then map the columns: say table X has 4 columns, and initially I map it to the same table X in the destination.
Once the loop starts,
at run time the source table name changes to Y (which has 5 columns) and the destination changes to Y as well, but the column mapping does not change dynamically; the engine still looks for the old metadata (column names) and throws an error saying the destination metadata is not synchronized with the source.
This was possible in DTS; I am not sure how to implement the same thing in SSIS!
July 30, 2007 at 4:08 am
Dinesh,
What you need is a separate data source for each individual sheet. If the maximum number of sheets is fixed, then you're fine. Just create a source for each possible sheet and have each source point to only that one. Then you can set the metadata for each source according to the number of columns and the data types of its sheet.
Yeah, it's a bit of a pain, but it's the only way to make this work for you.
July 30, 2007 at 5:18 am
You cannot change the metadata of a data flow at run time.
Why not have a loop that iterates over the sheets in the file? Each iteration would execute the same data flow, but with a conditional split in the data flow that sends the rows to the correct table.
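Outside SSIS, for comparison, the loop-over-sheets idea looks like this in Python with pandas and SQLAlchemy (untested sketch; the connection string, file path, and the assumption that every sheet name matches its destination table are placeholders):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    "mssql+pyodbc://user:pass@MyServer/MyDb?driver=ODBC+Driver+17+for+SQL+Server")

# sheet_name=None loads every sheet into a dict keyed by sheet name;
# header=1 treats the second row as the column names (the first row is the junk link)
sheets = pd.read_excel("workbook.xlsx", sheet_name=None, header=1)

for sheet_name, df in sheets.items():
    # Sheet name matches the destination table, so send the rows there
    df.to_sql(sheet_name, engine, if_exists="append", index=False)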
Cheers,
Crispin
I can't die, there are too many people who still have to meet me!
It's not a bug, SQL just misunderstood me!
July 30, 2007 at 5:44 am
Oops. I guess I didn't make myself clear.
What I meant to say was that each source can have its own column count (use the maximum number of columns you think you'll need for each sheet), and if the number of columns happens to be less during one run, the "extra" columns will just come through as NULLs and can be ignored at run time.
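In a pre-processing script the padding would look something like this (pandas sketch; the expected column list is just an example):

import pandas as pd

expected = ["id", "desc", "price", "date", "extra1", "extra2"]   # max column set for this sheet

df = pd.read_excel("workbook.xlsx", sheet_name="Fruit", header=1)
df = df.reindex(columns=expected)   # columns missing from the sheet are added and filled with NULL (NaN)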
Better? @=)