VarBinary(Max) versus XML for over a 100 million records?

  • I have a unique situation where I am saving the formats of a grid cell. I will be having thousands of grids that have thousands of cells, so very quickly I will be dealing with about 100,000,000 cells. The table structure for the cell is as follows:

    grid_rowGUID

    grid_columnGUID

    grid_cellValue

    grid_cellFormat

    My question is - which is better, VarBinary(Max) or XML?

    Better in this case is defined as access speed, retrieval speed, and storage space. I only get one shot at this, so the feedback is super important!

  • What is a "grid cell"?

    What does your application do?

  • You might also be better looking at some other "serialization" mechanism so you get one table per grid rather than one big table as this is likely to perform better.

    Can you give us some more background?

    - James

    --
    James Moore
    Red Gate Software Ltd

  • I just posted and I cannot find the post...is this even working?

  • Some times the session times out and you lose the message.  If you had to hit post reply twice to get the post working then that is what happened.

  • Great...OK...here is my #2 try at replying

    Thank you all for answering

    In summary of what I am trying to do and what I am working with:

    I have an application that has been in development for 3 years and is almost finished. It uses SQL Server 2005 and a windows interface.

    This specific module is a cash flow forecast that operates much like an excel spreadsheet that has multiple sheets within it.

    A GridCell refers to a specific cell within a UI table display. I am using Syncfusion grid control for all of the loading and saving, formula and value management.

    A Template is defined as the following:

    1) Provides an average of 25 editable sheets per template

    2) Each sheet contains an average of 38 columns and 100 rows

    3) Is completely customizable with add, edit, delete of sheets, columns, rows, cells

    4) A template will be copied an average of 200 times per year for individual custoimization by the user.

    5) There will be an average of 25 templates per SQL database installed

    6) In one year, there will be an average of 19 million cells (25*38*100*200) or approximately 100 million after 5 years.

    7) A single template is dynamically created and can also be deleted.

    A Template Set within the SQL database is as follows:

    Template Table --> TemplateID (PK)

    Sheet Table --> SheetID (PK), TemplateID

    Column Table --> Column ID (pk), SheetID, columnFormatting (binary or xml)

    Row Table --> RowID(pk), SheetID, rowFormatting (binary or xml)

    Cell Table --> {RowID, ColumnID}(pk), cellFormatting (binary or xml), CellValue, SheetID (this makes the dataset management easier within the app)

    **There are other fields that store specific detail information, but not included for this discussion.

    columnFormatting, RowFormatting, and CellFormatting store the "skin" display but does not have any direct impact or use for calculations or value usage.

    --------------------------

    Application Use

    The application functions much like an excel spreadsheet, except that it allows users to pull data values from previous Balance Sheets, Income Statements, Deferred Taxes, and Owners Equity (all of these are template based and are working well).

    The big design problem/deviation was that now a template is not used except to jump start an actual cash flow forecast. Where in all other template designs, the template is a create once, use all of the time with the same structure (i.e. no copying or customization per use).

    -----------------------

    Solution Approach #1: Create individual tables per grid display

    Answer: If I create an individual template table for each forecast, then I would be dealing with more than a 25000 tables in 1 year (25 sheets/template *200 copies *5 tables).

    I could create single set of tables for each template created. So there would be 125 tables (25*5 tables) to manage. This is possible, but I am not sure if there is much benefit to the database management (select and update queries). And the storage of the formatting would be still be bad.

    ----------------------------------------

    Solution #2: Look at some other serialization

    Answer: There are only 2 different serializations avaiable to me: binary and xml. If I put everything into binary or XML at the grid level then I will lose the ability to retrieve individual cell level values. I am also with the understanding that I would not be able to perform efficient text searches within the binary/xml data object within SQL 2005.

    ----------------------------------------

    IF you guys think that a combined solution approach where I break out the tables for each Template Set (for use in managing the customization data), and then implement a full grid serialization/xml solution then I will pursue that approach. But it would be much better if there was a good solution that did not require me changing the current database structure since all of the app code uses it.

    Thank you tons for all of your feedback and help! I don't have anybody that have the necessary skills to discuss this particular design challenge - and your feedback and opinion is greatly appreciated!

    -Eric

  • You can search xml very efficiently in SQL 2005, if you use xquery and have the appropriate xml indexes on the column.

    What do you want to do with the data after its in the db? Will you be retrieving entire rows? Searching withing the xml?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I will be searching within the xml for specific values. In this case, whenever there will be a referencing formula (e.g. [Cash_On_Hand{pk199365}] * .25), I will be looking up the referencing text variable based on its primary key within the formulas.

    At other times, I will also need to pull back references based on column or row position.

    Would it be possible for you to point me to a good reference that would explain what you mean by "the appropriate xml indexes on the column". Are you referring to some index within the stored xml document? or the index within the sql table?

    I am reasonably skilled with SQL, but have no experience (until now) with xml inside of sql. I have googled tons and have a few xml books, but the true functionality of XML *inside* of SQL still escapes me.

    In the mean time, I will implement the combined approach of breaking out the tables and then using the xml storage object. I'll just cross my fingers that the sql won't die with large 5MB-15MB xml files.

    Let me know if you have any other ideas - Many HUGE Thanks for the feedback!

    -Eric

  • Would it be possible for you to point me to a good reference that would explain what you mean by "the appropriate xml indexes on the column". Are you referring to some index within the stored xml document? or the index within the sql table?

    SQL 2005 can create an index on the xml data. What that does is index the contents of the xml, shreading it for fast searches. There are two main types - a prmary xml index and secondary xml indexes. The secondary can be created on for searching on element, attribute or path. You'll need to use xQuery to query the xml.

    Look in books online for a good summary. Search for CREATE XML INDEX and you should get a good summary at least.

    I als don't have that much expereince with xml in SQL 2005. I've never needed to use it yet.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks Gail!

    I have been doing lots of reading based on your suggestion from your post a couple of days ago. It looks like it is exactly what is needed for situations like this.

    Even with all of the reading, I had not come across the basic information of primary and secondary xml indexes. Most everybody on the Internet is giving lots either very specific information or very generic. General sample based tutorials are hard to find.

    At this point, I am going to remove the cell table and post the entire grid as an xml object in the sheet table. Following your advice, I will create xml indexes for these xml objects and viola' I get to keep my job another year

    Thank you tons for the feedback! I learned something!

    Best Wishes

    -Eric

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply