suggestions for extracting values from specific Excel cells

  • I have a spreadsheet that I want to extract values from specific cells. The goal will be eventually pivoting them into a dataset for insert. The trouble is, how do I extract based on the position (E8, for example)? All the tools seem geared towards assuming a set of columns * N rows, and making it a dataset.

  • This was removed by the editor as SPAM

  • anyone?

  • I'm not sure what you mean by "all the tools" and where you're coming from on this. (I imagine other people are in the same boat, reading your message.)  IAC there are lots of ways to handle this from the Excel side in scripting, and how you reference the cell depends on what you know about the spreadsheet.  (I typically put some global constants in place at the top of the file, so the user can change these things if needed, if they really just want to reference cells that are completely arbitrary.)

    If you are working on this from the SQL Server side, and linking to the spreadsheet using the Jet driver or whatever, you have to do it a bit differently.  Again, how you deal with a specific cell depends on what you know about the placement of that cell.  But it will be difficult to deal with it from the SQL side if the spreadsheet isn't regular in format (something like tables, whether you need the tables or not).

    So... more information would probably help you get answers... what side are you working from? And what additional information about the spreadsheet can you provide?

    One way you might want to think about this, which would allow a completely irregular format in the source xls, would be to save it as spreadsheet XML (SSML, in Excel 2003 -- it's quite different in Excel 2007, although still do-able).  You could tag those cells in Excel, similar to creating a bookmark in Word, and then shred the XML with appropriate XPATH on the server side, looking for your significant tags.  This is actually quite a nice way to consume Excel; SSML is pretty easy to handle and there's much less of a hit than linking to the spreadsheet as a datasource to attempt to investigate its contents.

    HTH,

    >L<

  • Ok, the spreadsheet has multiple pages; each worksheet has a different "form". The idea is that folks would be able to distribute the file to companies to fill out. Then, upload them to the server, which would read in the data, and put it into a database. So, I guess I'm working from the DB side - I don't want to alter the spreadsheet. I guess what I'm frustrated by is trying to create this SSIS package - all the data flow tools seem to want input columns, nothing lets me name coordinates. The XML sounds like a good idea, I think, but I don't know how to set that up, either. If I add an Excel source and write it to an in-memory recordset, I can't simply point it to an XML file.

  • I just wrote you a *long* reply with examples and this (*&&!!!#@*(& forum lost it... not the first time <sigh>. I'm overseas in a hotel on a borrowed connection, so can't re-do the whole thing, but will try to re-cap:

    I was thinking that the Excel would be Saved As Spreadsheet XML (that's an option on the menu in 2003) when the "forms" were filled out.  You'd probably consume the XML document into an NTEXT field or something to start, and then XPath query the document for the values you needed, from there.

    XPath is a lot more flexible than SQL queries, when you have many axes or un-regular table, and is a good fit, although I'm sure there are other non-XML ways to do this.  The multiple worksheets in the workbook are no problem; you can address a particular worksheet in a workbook like this /ss:Workbook/ss:Worksheet[3], or query across worksheets if you want to.  You can of course address a worksheet by ss:Name attribute (tab caption) rather than by index.

    Within the workbook, I suggested you do something similar to Word bookmarks, this would use Named Ranges (or named styles if you prefer, but Named Ranges in Excel will probably work better for you).

    The named range will have a node like this in the XML:

      <NamedRange ss:Name="MyTotal" ss:RefersTo="=Sheet3!R2C2"/>

    ... and, because Xpath works along multiple axes in one query, you could address that node if you needed to know the absolute reference to the cell. But you probably wouldn't even need to do that.  The cell itself will look like this:

    <Cell ss:Index="2" ss:StyleID="s21">

    <Data ss:Type="String">test</Data>

    <NamedCell ss:Name="MyTotal"/>

    </Cell>

    ... and the Xpath can be like "give me the value of the Data child of a Cell that has a NamedCell child with the ss:Name attribute having the value 'MyTotal'".  And you can ask for the first such value, a collection of the values, you can restrict the values you receive to a single worksheet (by name or index), or across worksheets.

    Are you getting ideas?

    I know I wrote, as caveats, that all this is different in 2007 but 2007 creates SSML too and it is a very straightforward (although verbose) format.

    Regards,

    >L<

  • Alright - that sounds good. Now, I'm not an expert in XPath. Does this sound right (if I name a range of cells "Annual.Part_time"):

    Cell[NamedCell[@ss:Name="Annual.Part_time"]]/Data

    Have you worked in the SSIS designer in VS05?

  • >>Have you worked in the SSIS designer in VS05?

    Unfortunately somebody else will have to give you advice there, not my area at all.  I'm kind of interested and would have worked out a fuller example if I could have gone in there and whipped something up, but I'd probably do it a novice way and it's better if somebody else chimes in with some appropriate steps for you.

    On your xpath thing, remember that you're in a namespaced schema.  So you will probably have to preface each of your element names in the query like this: ss:Cell (even though it doesn't show that way in the output, it's a child of a namespaced element, so it belongs in that space -- attributes are different in this respect).

    If it bothers you to use the alias (ss, because it *is* an alias (although Excel always seems to use that one, it might someday for some reason use a different alias I suppose!) and if you know for a fact you're always working in the SSML schema, you could wildcard the alias instead, like this: *:Cell.

    Also, you need to reach into the tree a bit more explicitly, by drilling down from the root (/ss:Workbook/... or whatever). 

    You can do this easily, because you know exactly where you're going, and you should. Beginners often make the mistake of using // to indicate "give me this value no matter where it exists in the tree".  That's the xpath equivalent of SELECT *.  It does not belong in a production app and FWIW probably hurts performance a lot more than SELECT * in many cases.  You're basically telling the query to look along *all* branches of the tree for something that matches your chosen Cell definition.  (I suppose that it might refer to the XSD and realize that it only needs to follow some paths, but that's up to the individual parser.)

    Again, I'm not in a position to play with this and expand the idea fully right now. I'll be back from this trip late next week, could kick this around some more if you get stuck on any of the other books.  I *am* interested.

    >L<

  • So, like this?

    /ss:Workbook/ss:Worksheet[@ss:Name="Form C - Data Form"]/ss:Table/ss:Row/ss:Cell[ss:NamedCell[@ss:Name="Annual.Part_time"]]/ssata

    (sorry to seem dense, I've never used XPath)

  • Looks about right, except for the "smileys" that the forum software appears to be adding into our :'d expressions occasionally...

    I don't have anything in front of me to check.  But if you find you can't get it to work and you e-mail me what you've done so far (and at least a limited example) I will either be able to verify that the issue is your xpath expression, and fix it, or possibly suggest what else might be wrong if that part is okay.

    NB:  I may not have web access much for the next week.  Email if you need something, and don't be worried if I don't answer immediately.

    And you're not dense. You're doing fine, and xpath really is different from SQL. 

    Just think: you're still only using one axis (the default axis, moving down through children of a node). Your mind may eventually start to spin with what's possible here <g>. 

    For example, imagine you have a leftmost cell with a label, such as Totals.  You might reference that cell and then look for all its following siblings, rather than drilling down in children as you've done so far, and as feels comfortable coming from a SQL perspective  This could get a set of totals for different significant columns.Then you might need to know what group these totals belonged to, which you could do by referencing the first child of this cell's nearest ancestor.

    Sure it's not "pure" or relational from a coding perspective; instead, it mirrors a document of lumpy structure that the user insists on using to store, input, and share, complex data. 

    I personally think that Excel, in the words of somebody I respect but whose name won't come to me at this moment, is the "crack cocaine of enterprise data".  It's dangerous practice in all sorts of ways. But sometimes you have to do something as specified by all kinds of external requirements, and "do it in Excel" is one of them.

    I would do everything in my power to find out if there was a way around that requirement.  But, if I couldn't, I would use xpath, which is a non-sql syntax for "lumpy" data structures or documents, to handle what I needed to handle.

    I don't mean to get all philosophical about this, and we can keep the discussion down to your practical example if you need help.  I guess I'm feeling a bit defensive about recommending this strategy in this particular forum. I've seen a number of people get quite heated here on the subject of why, or whether, XML is ever "a good idea" in the SQL Server context. I don't want to get into a bunch of arguments about this, and if anybody has a better idea for your scenario, I'm listening.

    >L<

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply