Getting Data from MSExcel

Question

Getting Data from MSExcel

cmbrooks08

SSC Eights!

Points: 861
More actions
April 27, 2009 at 11:03 am

#199749

All-
I have xls files put in a share that I need to access with SSIS. The formatting of the excel files is weird. The first 8 or 9 rows are all headings or random text, but about the 10th row starts the data I actually need to extract. The problem is that when I try to use the import export wizard, SSIS thinks the datatypes of all the columns are strings, bc the numberical data I need doesn't start until about the 10th row. That results in the import/export wizard thinking all of the numerical data is nulls, so I can't use that option.
Anyone here know any ways to just extract the data that I need?

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

cmbrooks08 SSC Eights! Points: 861 More actions · Answer 1

Do I have a problem with the way I post information? I've probably posted 4 or 5 things in this forum that get views, but never any replies. Is there something I should do different or some format I should use?

MRitch Ten Centuries Points: 1181 More actions · Answer 2

Follow the link at the bottom of my post for help on getting help. I am sorry I don't have an answer on your Excel issue.

For help on getting help, follow the link below:
http://www.sqlservercentral.com/articles/Best+Practices/61537/

cmbrooks08 SSC Eights! Points: 861 More actions · Answer 3

Thanks...I had seen/read that post and made sure I was following that advice. Maybe my questions just suck, though I thought they were relevant and probably stuff others would need help on.

MRitch Ten Centuries Points: 1181 More actions · Answer 4

Sounds to me that you will need to prep you excel files by stripping the useless data from the beginning of the file and then maybe create a standard set of column headers to place before the data rows. Then I bet you could get an SSIS package to run. I think you can set the data type in the wizard during creation of the package.

I bet someone smarter than me could help you with a script that could trim the unwanted data rows from the beginning of the file, as long as the data is consistent.

For help on getting help, follow the link below:
http://www.sqlservercentral.com/articles/Best+Practices/61537/

cmbrooks08 SSC Eights! Points: 861 More actions · Answer 5

That's what I've been doing as a workaround. The problem is that we're working on building a system that needs to be fully automated by July, meaning that I need to figure out a way to automate that workaround. I was thinking about/have been researching the script component, but am not really sure what I should be looking for.

Any other suggestions?

stevefromOZ SSC-Forever Points: 43646 More actions · Answer 6

I know on the flat file source you can specify to ignore/skip n lines, where you get to set n. Is this on the Excel source as an option? If you manually trim the top n from the file and then use it as the template for the connection, the datatypes should read ok. You'd then have to have something like the 'skip/ignore n rows' I mentioned above to work in conjunction with this too.

HTH,

Steve.

cmbrooks08 SSC Eights! Points: 861 More actions · Answer 7

Steve-

Thanks for the reply. As far as I can find, the Excel connection manager doesn't have that option. It just allows for choice of file, and that's about it. I'm thinking maybe I need to use OpenQuery (or OpenRowSet...I can't remember what its called). I just want to see if anyone has any other ideas before I jump into having to learn that for this project.

stevefromOZ SSC-Forever Points: 43646 More actions · Answer 8

I know it sounds like an obvious question, but any way to get your supplier of the file to mod their output? Let's assume that you work through OPENQUERY or use the script task, what consistent indicator are you going to use to determine that you're now at the data part of the file?

Steve.

cmbrooks08 SSC Eights! Points: 861 More actions · Answer 9

Getting the supplier to mod the file would be nice and solve tons of problems that we have, however, they established that we will adapt to the data as they provide before I started on the project.

As far as a consistent indicator goes, I'm not exactly sure what I need. I can use a row number, as the headings are the same everytime, they just last for about 10 rows. I can probably also use the first row of numerical data, as well. What qualifications do I need to meet to use something as a consistent indicator?

stevefromOZ SSC-Forever Points: 43646 More actions · Answer 10

The row count would/should be ok, i'm just anal about the difference between

about 10 rows

and "always 10 rows" 😀

Just FYI, this link may offer some insight on using the script component of SSIS with Excel files, seems to be all about opening, reading etc.

Steve.

cmbrooks08 SSC Eights! Points: 861 More actions · Answer 11

I should have been clearer. It is always the same number of rows, I just don't have the file open in front of me to say the exact number. I just meant its about 10 rows as opposed to like 25 rows in case that would make a difference. If the row number is an option, then I can use that. I'll take a look at the link you gave me. Thanks a lot for answering!!

ricky-427945 Grasshopper Points: 21 More actions · Answer 12

you could use the sqlcommand within the excel data source something like select * from [sheet1$], i think you should be able to get the data in that way.

tony.hayes SSC Veteran Points: 236 More actions · Answer 13

Could you use the ISNUMERIC function in SQL to test whether the cell contents are seen as numbers? For example, if all the cells in the first row of data have numeric values you should be able to use WHERE (ISNUMERIC(CELL1) = 1) AND (ISNUMERIC(CELL2) = 1) AND so on etc... I admittedly have not built a lot of SSIS packages, so this may be more difficult than it sounds...

ManicStar SSCoach Points: 17992 More actions · Answer 14

If i understand your problem, the solution is pretty simple.

Use a select with a range:

SELECT F1,F2,F3,F4,F5,F6,F7,F8,F9

,f10,f11,f12,f13,f14,f15,f16,f17

,f18,f19,f20,f21,f22,F23,F24,f25

FROM

[Sheet1$A7:Y9000]

Set the range to the first valid row through the max of whatever possible last valid row might be.

Bad data skipped over.