Database with odd usage pattern.

  • Hi,

    I am designing a db that will have a rather odd pattern of usage - well, unusual by my standards at least!

    Perhaps once or twice a year it will be hit with 50-5,000 queries in the space of 30 minutes; at other times it may see 1 or 2 hits a week. Each query will extract between ~500 and ~2,500 rows of data, each row consisting of a single char(15) field. Each result set will be sent via XML/SOAP to a third party in batches of no more than 300 records.

    Limitations:

    At this stage each request has to be handled separately, i.e. I can't batch them up.

    My initial leaning is towards creating a table variable containing the result set for each request and then extracting the records in a loop, 300 rows at a time, until all the records for that request have been sent.
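    Something along these lines is what I'm picturing; this is only a rough sketch, and all the table and column names are made up:

    -- Sketch only: dbo.SourceTable, RequestId and Code are placeholders.
    DECLARE @RequestId int;
    SET @RequestId = 42; -- whatever identifies the incoming request

    DECLARE @Results TABLE
    (
        RowNum int IDENTITY(1,1) PRIMARY KEY,
        Code   char(15) NOT NULL
    );

    INSERT INTO @Results (Code)
    SELECT Code
    FROM dbo.SourceTable
    WHERE RequestId = @RequestId;

    DECLARE @Batch int;
    SET @Batch = 1;

    WHILE EXISTS (SELECT 1 FROM @Results WHERE RowNum > (@Batch - 1) * 300)
    BEGIN
        -- Hand this slice of up to 300 rows to the XML/SOAP sender.
        SELECT Code
        FROM @Results
        WHERE RowNum >  (@Batch - 1) * 300
          AND RowNum <= @Batch * 300;

        SET @Batch = @Batch + 1;
    END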

    Does anyone have any advice on the best way to tackle this problem with regard to table variables, tempdb growth, etc.?

    I've not tried anything like this before, so I'm unsure of what constitutes normal/acceptable practice.

    I'm open to any suggestions 🙂

    Thanks

    K.

  • I think I'd do something similar to what you're doing. I might create a real table, populate it with the data plus the SPID perhaps, and extract batches of 300 from there. Maybe mark the ones already sent so that if it blows up you don't have to restart from the beginning.
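    Roughly this sort of thing; just a sketch, and every name in it is invented:

    -- Persistent staging table keyed by SPID so concurrent requests
    -- don't collide, with a Sent flag for restartability.
    CREATE TABLE dbo.SendQueue
    (
        Spid   int      NOT NULL,
        RowNum int      NOT NULL,
        Code   char(15) NOT NULL,
        Sent   bit      NOT NULL DEFAULT (0),
        CONSTRAINT PK_SendQueue PRIMARY KEY (Spid, RowNum)
    );

    -- Next unsent batch for this connection:
    SELECT TOP (300) RowNum, Code
    FROM dbo.SendQueue
    WHERE Spid = @@SPID
      AND Sent = 0
    ORDER BY RowNum;

    -- Once the XML/SOAP call succeeds, mark the batch off so a failure
    -- part-way through doesn't force a restart from row 1:
    DECLARE @LastRowSent int;
    SET @LastRowSent = 300; -- high-water mark returned by the batch above

    UPDATE dbo.SendQueue
    SET Sent = 1
    WHERE Spid = @@SPID
      AND Sent = 0
      AND RowNum <= @LastRowSent;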

    The big thing will be capacity of the hardware and tuning for this process. You'll have to either plan for and pay for peak usage, or go with less and accept slower response during the peak.

  • I'd be cautious about using the temp tables because that's going to smack tempdb pretty hard. Since this is basically a paging query (give me page 1 (records 1-300), page 2 (records 301-600), etc.), why not use the methods outlined by Itzik Ben-Gan in T-SQL Querying? First, if possible, make sure you have a covering index on your source table. Then run a query similar to this:

    DECLARE @Anchor int; -- Id of the last row already sent: seed it with the
                         -- Id of the 300th row, then the 600th, and so on

    SELECT TOP (300) x.MyCol
    FROM dbo.MyTable AS x
    JOIN dbo.MyTable AS y
      ON y.Id = @Anchor   -- y is the single anchor row
     AND x.Id > y.Id      -- everything after it, in key order
    ORDER BY x.Id;

    It's extremely efficient with index seeks & nested loop joins, assuming you have good indexes in place. That should scale better than trying to maintain a bunch of temporary tables, even though you're requerying your base tables over & over again.
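    By "covering index" I mean something along these lines; purely illustrative, assuming Id is the key and MyCol is the column you're returning:

    -- With Id as the key and MyCol carried at the leaf level, the paging
    -- query above can be answered entirely from this index, no lookups.
    CREATE NONCLUSTERED INDEX IX_MyTable_Id_MyCol
    ON dbo.MyTable (Id)
    INCLUDE (MyCol);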

    There are more details available in the book. Itzik includes additional join criteria (which is why the sample above looks a bit sparse), as you probably will in real life.
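    To drive it, you just feed the highest Id from each batch back in as the next anchor. A simplified sketch, using the same placeholder names as above and writing the anchor seek as a WHERE clause (which should give the same index seek as the join form):

    -- Each pass returns one batch of up to 300 rows; MAX(Id) comes back
    -- NULL once nothing is left, which ends the loop (the final pass
    -- returns an empty batch before that happens).
    DECLARE @Anchor int;

    SELECT @Anchor = MIN(Id) - 1 FROM dbo.MyTable; -- pseudo-anchor before row 1

    WHILE @Anchor IS NOT NULL
    BEGIN
        SELECT TOP (300) x.Id, x.MyCol
        FROM dbo.MyTable AS x
        WHERE x.Id > @Anchor
        ORDER BY x.Id; -- hand this batch to the sender

        SELECT @Anchor = MAX(t.Id)
        FROM (SELECT TOP (300) Id
              FROM dbo.MyTable
              WHERE Id > @Anchor
              ORDER BY Id) AS t;
    END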

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Hi,

    Thanks very much for the replies.

    Steve, there's definitely going to be a great deal of testing done on this, particularly to make sure we've got sufficient bandwidth.

    I do like the idea of keeping a record of which rows have been sent in case something does go pop mid-way through the process.

    Grant, that is certainly an interesting idea and one that hadn't occurred to me. If my understanding is correct, a poorly indexed table might force the optimizer to use a hash join, which would hit tempdb anyway, and nested loop joins also reduce the potential for locking contention? I've read quite a few articles now from Itzik Ben-Gan; he always makes me realise just how little I really know. I'm just about to order that book; it looks just the ticket.

    As I said, I don't have any experience of this type of solution. Do you have any feel for whether, running at full bore (10 million rows or so), this would be a highly intensive process with regard to disk access/IO, CPU usage, etc.? I'm trying to get a feel for what sort of hardware we should be speccing.

    My dbs in SQL Server up to now have in fact been ODSs, where data loading is done overnight and the cubes and reports are of the 10-million-row aggregation variety.

    Thanks

    K

  • Don't try to compare what you know to Itzik. I've taken his advanced T-SQL class twice, read tons of his articles and his books, and gone to his lectures at PASS. It leads me to the conclusion that if he's a DBA, I'm a mediocre ditch digger.

    But seriously, considering the amount of I/O you will have to deal with, anything you can do to reduce the number of writes will help, and maintaining temporary tables will add writes, not reduce them, in most cases. The thing is, you're really going to have to come up with test cases and then run them through performance and load testing; then you'll know which way works best. Personally, I'd concentrate first and foremost on setting up the tables and the indexes, especially the clustered index, to optimize reads (since it sounds like you're collecting data over time but then having intense bursts of reads). From there, you can figure out how best to pull stuff together. I'd also concentrate on making sure that your queries and the data remain in cache; anything that might cause recompiles (like temp tables) or cache flushes should be avoided where possible.
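    For instance, and this is purely illustrative since I don't know your schema, something like:

    -- Hypothetical layout: clustering on (RequestId, RowNum) keeps each
    -- request's rows physically contiguous, so the read bursts become
    -- sequential range scans rather than scattered lookups.
    CREATE TABLE dbo.Codes
    (
        RequestId int      NOT NULL,
        RowNum    int      NOT NULL,
        Code      char(15) NOT NULL,
        CONSTRAINT PK_Codes PRIMARY KEY CLUSTERED (RequestId, RowNum)
    );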

    You've got work in front of you.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • LOL, point me in the direction of a shovel!

    I'm actually really excited about this; it's the first time I've written a fully transactional db in SQL Server. You're correct in saying that the data will come into the database slowly over time. I've done a fair amount of data/process modelling (my first calling is as a systems analyst), so I feel pretty confident about the logical design of the db, as it's actually very simple compared to the ERP solutions that I work with.

    When it comes to SQL Server I've spent most of my time writing reporting systems, so I've spent months reading everything I can on here and elsewhere about transactional implementations, trying to understand the nuts and bolts. But there's no substitute, in my book, for experience.

    I have had a couple of years now to get used to query/index tuning and troubleshooting in SQL Server, but there is, as much as anything, a confidence gap in my knowledge. I read blogs and posts along the lines of "what should you know for a Dev/DBA interview? Answer these 20 questions", and I surprise myself with how much I've learnt, and then I make some fundamentally stupid mistakes. I guess that's part of the fun 🙂

    The added factor is that this is going to be the backbone of a new business, so no pressure there then :w00t:

    Anyway, suffice to say that I have a whole load of nasty testing planned for this system 😀

    Thanks

    K
