April 29, 2009 at 9:10 am
For those who are using cursors, or are considering using them, to access various tables depending on some condition, there is, as I mentioned before, the option to make a view that covers all the various tables. A special view can be constructed that is updatable and can be used to insert or modify data in the underlying tables. There are some rules to follow; I won't bore you with them now 😉
The simplest method, which is supported since at least SQL 2005 (and probably SQL 2000 as well), is demonstrated in the code below. See this as a teaser to dig deeper into the things SQL Server offers you. You are not required to program everything yourself, let alone procedurally with cursors 😉
-- First make sure you are in a database that is safe to work in!
--
set nocount on
-- drop demo view (gives an error the first time, ignore it)
drop view dbo.History;
go
-- drop demo History tables (gives an error the first time, ignore it)
drop table dbo.History2005;
drop table dbo.History2006;
drop table dbo.History2007;
drop table dbo.History2008;
go
-- Four history tables, can be located in separate databases if you like that!
--
-- Days in this example are encoded as an integer in the readable format YYYYMMDD
-- Real dates can be used, but these cannot be updated through a unified view!
--
create table dbo.History2005
(
  day  int not null
       constraint pk_History2005 primary key
       constraint ck_History2005_day check ( day >= 20050101 and day < 20060101 ),
  info varchar(40) not null -- payload column, assumed for this demo
);
create table dbo.History2006
(
  day  int not null
       constraint pk_History2006 primary key
       constraint ck_History2006_day check ( day >= 20060101 and day < 20070101 ),
  info varchar(40) not null
);
create table dbo.History2007
(
  day  int not null
       constraint pk_History2007 primary key
       constraint ck_History2007_day check ( day >= 20070101 and day < 20080101 ),
  info varchar(40) not null
);
create table dbo.History2008
(
  day  int not null
       constraint pk_History2008 primary key
       constraint ck_History2008_day check ( day >= 20080101 and day < 20090101 ),
  info varchar(40) not null
);
go
-- The unified, updatable view over the four history tables
-- (the partitioning column "day" is part of each primary key and the
--  check constraints cover disjoint ranges, which is what makes it updatable)
create view dbo.History
as
select day, info from dbo.History2005 union all
select day, info from dbo.History2006 union all
select day, info from dbo.History2007 union all
select day, info from dbo.History2008;
go
-- Inserts through the view end up in the correct underlying table
insert into dbo.History ( day, info ) values ( 20051231, 'old history' );
insert into dbo.History ( day, info ) values ( 20070501, 'newer history' );
insert into dbo.History ( day, info ) values ( 20081215, 'recent history' );
go
-- Query the view; the check constraints let the optimizer skip irrelevant tables
select * from dbo.History where day >= 20070101;
go
I just threw this together in 20 minutes or so; it does what it says and is a purely synthetic demonstration of what is possible.
April 29, 2009 at 1:31 pm
Nicely done sir!
April 29, 2009 at 3:13 pm
sharad sinha (4/29/2009)
Approach 2:Use separate queries ...1 for each database...and then return the union of these resultsets...The problem with this approach is that the number of history databases keep growing and would result in a maintenance hell each time needing to add the handling for the new history database.
We have a SQL Server 2000 system that does precisely this. It works okay, although not as well as partitioning does in SQL Server 2005.
It's not maintenance hell: you just need to create a script that adds the new database, creates the empty tables within it, and then modifies the views that UNION ALL the tables across the history databases. That way, the maintenance is all automated.
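For what it's worth, the script could look something like the rough sketch below. The database and view names (HistoryDB2005 through HistoryDB2008, dbo.History) are made up purely for illustration, and the string handling assumes SQL Server 2005's nvarchar(max); on SQL 2000 you would build the string in smaller pieces.
declare @sql nvarchar(max), @year int;
set @sql = N'';
set @year = 2005;
while @year <= 2008
begin
    -- Add a UNION ALL branch for each history database that actually exists
    if db_id(N'HistoryDB' + cast(@year as nvarchar(4))) is not null
        set @sql = @sql
            + case when @sql = N'' then N'' else N' union all ' end
            + N'select * from HistoryDB' + cast(@year as nvarchar(4)) + N'.dbo.History';
    set @year = @year + 1;
end;
-- The view must already exist; use CREATE VIEW instead of ALTER VIEW the first time
set @sql = N'alter view dbo.History as ' + @sql + N';';
exec (@sql);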
Not too hard to do, and if you get stuck, just post a question on the forums.
I see someone's also added some example code of how the finished product should look. 😀
April 29, 2009 at 4:53 pm
Andy DBA (4/27/2009)
Obviously this will not help for cases where you have concurrency issues because you can't screen for that.
And that would make up the second most common of our errors, next to data issues...
Andy DBA (4/27/2009)
One approach for concurrency problems would be to break the set of good rows into "chunks" and lump all inserts and the update which marks the rows as "complete" into the same transaction. If anything "errors out", the whole transaction will be rolled back. I'm sure you'll get lots of differing opinions on that approach, though. :w00t:
That's actually a solution we've been looking at. It's not as nice as having individual rows labeled with errors from a debugging point of view, nor is it as nice from the customer point of view when a batch of data is unavailable when only a single row was in error. Nonetheless, the performance improvements might be worth it.
Andy DBA (4/27/2009)
Hopefully set based updates won't present as big a target for blocking locks, but depending on your server and other concurrent processes, 1.5 million rows at a time might be a big bite to chew.
Unfortunately, it doesn't work out that way - we tried a single set based approach originally, and we almost always had the whole thing roll back due to concurrency issues. The set based version (which operated on the whole 1.5 million+ rows) ran for between half an hour and an hour before it would hit a concurrency issue and roll back. The cursor based version runs about 4 to 5 hours, and generally fails 20 to 30 rows for concurrency, which are later re-run successfully.
We may do a "batch" approach, in which we split the rows into groups of, say, 1000 or so, and then do set processing on individual batches. It's not the best solution, but it would work, and might buy us enough performance to make it worth it.
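As a rough sketch of that batch idea (the table and column names here are invented - dbo.WorkQueue with an id key and a Processed flag feeding dbo.TargetTable - and TRY/CATCH error handling is left out for brevity), something like this per-batch loop is what we have in mind:
declare @BatchSize int;
set @BatchSize = 1000;
declare @Batch table (id int primary key);

while 1 = 1
begin
    delete from @Batch;

    -- Grab the next batch of unprocessed keys
    insert into @Batch (id)
    select top (@BatchSize) id
    from dbo.WorkQueue
    where Processed = 0
    order by id;

    if @@rowcount = 0 break;

    -- One transaction per batch: a failure only rolls back (and can retry) this batch
    begin transaction;

    insert into dbo.TargetTable (col1, col2)
    select w.col1, w.col2
    from dbo.WorkQueue w
    join @Batch b on b.id = w.id;

    update w
    set Processed = 1
    from dbo.WorkQueue w
    join @Batch b on b.id = w.id;

    commit transaction;
end;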
April 29, 2009 at 7:08 pm
greatheep (4/29/2009)
Andy DBA (4/27/2009)
Obviously this will not help for cases where you have concurrency issues because you can't screen for that. And that would make up the second most common of our errors, next to data issues...
Andy DBA (4/27/2009)
One approach for concurrency problems would be to break the set of good rows into "chunks" and lump all inserts and the update which marks the rows as "complete" into the same transaction. If anything "errors out", the whole transaction will be rolled back. I'm sure you'll get lots of differing opinions on that approach, though. :w00t: That's actually a solution we've been looking at. It's not as nice as having individual rows labeled with errors from a debugging point of view, nor is it as nice from the customer point of view when a batch of data is unavailable when only a single row was in error. Nonetheless, the performance improvements might be worth it.
Andy DBA (4/27/2009)
Hopefully set based updates won't present as big a target for blocking locks, but depending on your server and other concurrent processes, 1.5 million rows at a time might be a big bite to chew. Unfortunately, it doesn't work out that way - we tried a single set based approach originally, and we almost always had the whole thing roll back due to concurrency issues. The set based version (which operated on the whole 1.5 million+ rows) ran for between half an hour and an hour before it would hit a concurrency issue and roll back. The cursor based version runs about 4 to 5 hours, and generally fails 20 to 30 rows for concurrency, which are later re-run successfully.
We may do a "batch" approach, in which we split the rows into groups of, say, 1000 or so, and then do set processing on individual batches. It's not the best solution, but it would work, and might buy us enough performance to make it worth it.
If you are on SQL Server 2005, this sounds like a process well suited to SSIS.
April 29, 2009 at 11:41 pm
Hi greatheep,
Something unorthodox... have you tried introducing a separate table for keeping the status, with the new table forming a 1-1 relationship with the original? The idea behind it is to have as small a table as possible to run your query on; maybe move some other fields that are important to your decision logic as well, effectively forming a vertical partition. Explicit vertical partitions have official support in SQL Server 2008 Enterprise edition, I think, but even without that explicit support you should be able to make a 1-1 relationship and get the benefits; it depends on your case.
I missed the structure of your table, so I cannot determine how much air it has that can be squeezed out one way or another. At the moment your solution is likely to do a clustered index scan, which means the wider your records are, the more data must be read, slowing things down. Hence my idea to put the table on a diet 😉
Options are:
Vertical partitioning (identify groups of columns accessed together and split those off to a separate table)
Convert varchar fields to varchar(max) and configure your table to have it store the texts outside the normal page.
The usual options that optimize any table regardless of your current problem.
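As a quick illustration of the first two options (all names here are invented; dbo.Orders stands in for your wide table, and the sp_tableoption call needs SQL Server 2005 or later):
-- dbo.Orders stands in for the wide table; all names here are invented
create table dbo.Orders
(
    OrderID  int          not null primary key,
    BigNotes varchar(max) null -- plus many more wide columns in the real thing
);

-- Narrow 1-1 companion table that the status/decision queries can scan cheaply
create table dbo.OrderStatus
(
    OrderID   int     not null
        constraint pk_OrderStatus primary key
        constraint fk_OrderStatus_Orders references dbo.Orders (OrderID),
    Status    tinyint not null,
    ErrorCode int     null
);

-- Optionally push the varchar(max) values off the data pages as well,
-- so the remaining columns pack more rows per page (SQL Server 2005+)
exec sp_tableoption 'dbo.Orders', 'large value types out of row', 1;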
Just a thought, it might not make any sense in your situation!
If you put together a small test case with some bogus data (a lot of bogus data, actually) and a query to run against it, I will do my best to solve it.
April 29, 2009 at 11:59 pm
greatheep (4/29/2009)
Unfortunately, it doesn't work out that way - we tried a single set based approach originally, and we almost always had the whole thing roll back due to concurrency issues. The set based version (which operated on the whole 1.5 million+ rows) ran for between half an hour and an hour before it would hit a concurrency issue and roll back. The cursor based version runs about 4 to 5 hours, and generally fails 20 to 30 rows for concurrency, which are later re-run successfully.
First, "Set Based" does not equal "Doing it all in a single query". But let's forget that for a minute. Are you telling us that you have a 1.5 million row table and all 1.5 million rows need to be updated?
--Jeff Moden
Change is inevitable... Change for the better is not.
April 30, 2009 at 2:01 am
greatheep (4/29/2009)
Unfortunately, it doesn't work out that way - we tried a single set based approach originally, and we almost always had the whole thing roll back due to concurrency issues. The set based version (which operated on the whole 1.5 million+ rows) ran for between half an hour and an hour before it would hit a concurrency issue and roll back. The cursor based version runs about 4 to 5 hours, and generally fails 20 to 30 rows for concurrency, which are later re-run successfully.
The last project I was working on was a migration project moving data from one corporate accounting system to another and also applying new business rules to the data in the process.
Each accounting period had about 1 million transactions; adding in the other bits which had to be moved resulted in data sets comparable in size to yours, as the system was written to migrate one period at a time (there was a business reason for splitting the migration like this). If the data had been migrated all at once, it would have meant about 36 million rows in one go.
Steps taken were
Copy the data from one server to another
Split the data where necessary; for the business this meant 1.5M rows became 2M
Renumber everything (keeping the old references in a spare reference column)
Apply new business rules
Check new business rules have been applied consistently (this was written in parallel by another developer)
Fix or fill in any gaps where there were inconsistencies in the data.
Each period, i.e. 1.5 million rows, took 12 minutes to process, with no issues due to concurrency, locking or anything else
It can be done!
April 30, 2009 at 10:30 am
Jeff Moden (4/29/2009)
First, "Set Based" does not equal "Doing it all in a single query". But let's forget that for a minute. Are you telling us that you have a 1.5 million row table and all 1.5 million rows need to be updated?
I agree completely! Usually you have to weed out gross data errors first and work your way down to more specific ones. E.g., if you're checking an input date to see if it's in range, you first need to weed out records that don't have a valid date, and possibly mark them with their own error type.
Also, a looping solution that breaks set based inserts into 1,000 or 10,000 or whatever rows at a time is not RBAR!
My understanding is that greatheep needs the 1.5 million rows (the good ones) to be broken up and inserted into other tables. It does sound like it might be a good candidate for SSIS, but not sure that alone will help with concurrency errors.
Without knowing more about the concurrency errors or other details, it's hard to offer suggestions. Are they lock timeouts? Are you getting duplicate PK errors? Is the source data flattened? By that I mean is it possible for pieces of one input row to get inserted into multiple rows of a destination table? Is your input table just one giant text column, or have you broken it into discrete indexed elements?
I apologize for stating the obvious, but I would focus first on optimizing the inserts. Is the bulk of their execution time due to concurrency delays, or is it about the same on a quiescent test system? You should be able to get the final target table inserts to run very fast on a quiescent test system if you do some prep work first.
For example, if the insert statements have joins on the source table (e.g. "...where dest.col1 = SUBSTRING(source.col1, 10,4)...") that are not supported by indexes, you are likely incurring full table scans of your 1.5M row table.
I apologize if you've already done these, but I suggest taking the following steps to do as much prep work as possible prior to inserting into "shared" tables. (Hopefully others will add to this.)
Put an autoid pk on the input table.
Run queries to mark "bad" input records. (either updates to the input table's error column or inserts to a related error table).
Insert "good" input records into an intermediate staging table with correct data type columns for each discrete piece of data and with indexes on columns you may be joining or filtering on.
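To make those steps concrete, here is a rough sketch; dbo.RawInput, dbo.InputErrors and dbo.Staging (and their columns) are placeholder names standing in for your own tables:
-- 1. Give the raw input an identity key so every row can be tracked
alter table dbo.RawInput add RowID int identity(1,1) not null;

-- 2. Mark obviously bad rows first (example rule: unparseable dates)
insert into dbo.InputErrors (RowID, ErrorType)
select RowID, 'Bad date'
from dbo.RawInput
where isdate(InputDate) = 0;

-- 3. Convert the surviving rows into properly typed, indexed staging data
insert into dbo.Staging (RowID, OrderDate, Amount)
select r.RowID, cast(r.InputDate as datetime), cast(r.InputAmount as decimal(18,2))
from dbo.RawInput r
where not exists (select 1 from dbo.InputErrors e where e.RowID = r.RowID);

create index ix_Staging_OrderDate on dbo.Staging (OrderDate);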
At this point, you should not have run into any locking conflicts and all the resource intensive data conversion and indexing should be done. Now you should be able to run lightning fast inserts into the shared target tables, hopefully minimizing the chance for blocking locks.
Finally, it may be that you need to do some work on the other end and optimize processes that the other blocking users are running. No matter how efficient your inserts are, you're going to have concurrency problems if there's zillions of other processes constantly tying up your target tables.
Hope this helps!
April 30, 2009 at 1:18 pm
jwheeler (4/27/2009)
Then surely the manager hasn't done their job by employing the wrong tool for the task? If you have database development to be done you either employ a db dev or a C# / VB developer who is also a SQL dev. You don't buy a hammer to put a screw in the wall, surely?
No, that's wrong. For at least three reasons:
First: A competent manager hires people who can learn, can think clearly, can communicate, and are willing to work hard. These characteristics are what counts if you are hoping to employ someone long term. Specific expertise is useful if you have something that needs doing now with no time for a new recruit to do any learning, but even then is probably less important than the four characteristics I mentioned - and a competent manager very rarely gets into that mess. (I've been there and it's a pain - I avoid it as much as possible!) Of course some managers are looking to hire someone, use them briefly, and get rid of them so they don't care tuppence for the qualities that make a good long-term employee - maybe you want to work for one of those?
Second: How do DBAs/database developers learn SQL, if they can't be hired to work on databases until they are already experienced in SQL? You surely don't imagine that a university can or should teach students a trade instead of teaching them how to think, how to learn, and a lot of theory?
Third: Many people with SQL experience have poor (or worse) SQL experience; they love writing nested cursors instead of inner joins, they don't understand why some big and nasty operations may need to be chunked, they haven't a clue what an isolation level is and are therefore very dangerous anywhere near a production system with interactive users, and they think testing is a QA function with nothing to do with development. This of course isn't just an SQL problem: it applies to programmers/software engineers generally (I think C++ programmers are generally worse at C++ than SQL programmers are at SQL) - at least in the UK and in the parts of the USA that I have experience of - and makes recruitment an utter nightmare unless you are willing to take on people who, although they don't have experience in a particular tool or language, are willing to learn and capable of doing so.
I came to work with SQL (decades ago) from a background working in machine languages, assembly languages, Lisp, Wisp, Fortran, Algol 60, Algol 68, Coral, Autocode, and Prolog; I'd written interpreters, compilers, file systems, bits of operating systems, bits of development environments, information retrieval apps, lots of other apps, and hardware test programs; I'd worked both in industry and in academia, done modelling and discrete event simulations, learnt how to apply queuing theory to data comms software design, and done all sorts of things with computers. But not only had I never worked in SQL (or Sequel or Quel or...), I'd never worked with any RDBMS. Was the manager who decided I was a reasonable person to put on a database project (because he thought that with that background I had proved I could learn new tricks) crazy, by your book? If so, I'm crazy too!
Tom
April 30, 2009 at 1:24 pm
Samuel Vella (4/27/2009)
Jeff Moden (4/27/2009)
Heh... in SQL Server 2005, even that can be done without a cursor. We're just gonna have to wait for it in Barry's series of articles.
OK it *can* be done without a cursor... I have seen some dirty work arounds to get stored procedure calls inside select statements using linked servers but they need certain security features turned on and would still end up as a row by row operation when processed within the database engine (not to mention all the other overheads associated with opening new DB connections) so no performance benefit.
IMO a last resort technique (even after the possibility of SSIS has been excluded).
Why on earth would you need linked servers to do that? It can be done simply by accumulating a string of commands (perhaps needing EXECUTE AS) and then calling EXEC! Of course on SQL 2000 you may run into the infamous 8000-character limit, but this series is explicitly about SQL 2005 and onwards.
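For example, something along these lines (the dbo.NotifyCustomer procedure and dbo.Customers table are invented here just to show the shape of it):
declare @sql nvarchar(max);
set @sql = N'';

-- Accumulate one EXEC statement per row that needs the procedure call
select @sql = @sql
    + N'exec dbo.NotifyCustomer @CustomerID = '
    + cast(CustomerID as nvarchar(10)) + N';' + nchar(13) + nchar(10)
from dbo.Customers
where NeedsNotification = 1;

exec sp_executesql @sql;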
Tom
April 30, 2009 at 2:37 pm
Tom.Thomson (4/30/2009)
Samuel Vella (4/27/2009)
Jeff Moden (4/27/2009)
Heh... in SQL Server 2005, even that can be done without a cursor. We're just gonna have to wait for it in Barry's series of articles.
OK it *can* be done without a cursor... I have seen some dirty work arounds to get stored procedure calls inside select statements using linked servers but they need certain security features turned on and would still end up as a row by row operation when processed within the database engine (not to mention all the other overheads associated with opening new DB connections) so no performance benefit.
IMO a last resort technique (even after the possibility of SSIS has been excluded).
Why on earth would you need linked servers to do that? It can be done simply by accumulating a string of commands (perhaps needing EXECUTE AS) and then calling EXEC! Of course on SQL 2000 you may run into the infamous 8000-character limit, but this series is explicitly about SQL 2005 and onwards.
This is another "Look Ma! No cursor!" approach. Of the two, I would rather take the cursor (actually, just a loop, no cursor). Then I can easily include error handling and so on.
Why do I say that? All you are doing with this approach is serializing the procedure calls with dynamic SQL (as distinct from serializing the procedure calls with a loop over a cursor). So you end up with a whole string of procedure calls, just as you do with a loop. No effective difference to the cursor, just a change in syntax.
So now I am looking forward to Barry's solution! On the other hand, comments were made against the first article about getting operations like mailouts out of the database, with which I agree. So SQL Server Notification Services/Reporting Services could be a better way to do this.
April 30, 2009 at 11:14 pm
Jeff Moden (4/27/2009)
Samuel Vella (4/27/2009)
atoth (4/27/2009)
I still find I need Cursors when Sending out batches of Email notifications from the Server. I would love to replace this with set based handling but I can't imagine how... particularly step 2
Adam
you're performing a procedural, not a set based, operation, and the "per-row" function calls mean that you will need to use a cursor.
Cursors do have their uses and this is one of them.
Heh... in SQL Server 2005, even that can be done without a cursor. We're just gonna have to wait for it in Barry's series of articles.
Barry, I am a little disappointed with the examples that you used in your article. I mean, how easy is it to convert these examples to set-based? Here is a cursor that I would like you to convert to set-based for me, please. If you can do this, I will accept the fact that since SQL Server 2005 cursors are no longer necessary.
use AdventureWorks
declare @mychar int, @lentxt int, @name varchar(255)
declare test_cur cursor STATIC LOCAL FORWARD_ONLY
for
SELECT firstname+' '+lastname [name] from AdventureWorks.Sales.Customer
-- returns 847 rows
open test_cur
fetch next from test_cur into @name
while @@FETCH_STATUS = 0
begin
set @lentxt = LEN(@name)
print @name
while @lentxt > 0
begin
set @mychar = ascii(left(@name,1))
print '"'+left(@name,1)+'" '+ltrim(str(@mychar))
set @name = RIGHT(@name,len(@name)-1)
set @lentxt = @lentxt-1
end
fetch next from test_cur into @name
end
close test_cur
deallocate test_cur
The reason I wrote this query is that a client phoned me and said that the search engine I created for their website was not working. When he typed in a person's name which he knew existed in the database, he got nothing. I checked the data and saw that the name had a double space in it. Easy, I thought, just replace the double spaces with single spaces. Not so easy, because one of the spaces was not a space but some special character. Hence my query to see if I could find that character. The character, not in this data set, was a char(160).
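Once the culprit is known, the fix itself can be a simple set based replace; something like the following (the table and column names are just illustrative):
-- Swap the non-breaking space, CHAR(160), for a normal space;
-- collapsing any remaining double spaces can be done the same way
update dbo.Customers
set FullName = replace(FullName, char(160), ' ')
where FullName like '%' + char(160) + '%';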
:-P
Manie Verster
Developer
Johannesburg
South Africa
I can do all things through Christ who strengthens me. - Holy Bible
I am a man of fixed and unbending principles, the first of which is to be flexible at all times. - Everett Mckinley Dirkson (Well, I am trying. - Manie Verster)
May 1, 2009 at 12:19 am
Thanks a lot, peter.
Sorry for replying back to you so late... but your reply surely helped... And yes, we are using SQL Server 2005. Will be giving this approach a shot & will update back.
Thanks again
Sharad
May 1, 2009 at 12:40 am
Manie Verster (4/30/2009) Here is a cursor that I would like you to convert to set-based for me, please. If you can do this I will accept the fact that since SQL Server 2005 cursors are no longer necessary.
Does this work?
DECLARE @MaxLen INT
SELECT @MaxLen = MAX(LEN(firstname + ' ' + lastname )) FROM AdventureWorks.Sales.Customer
-- Build a tally (numbers) table with one row per character position of the longest name
SET ROWCOUNT @MaxLen;
SELECT IDENTITY(int,1,1) AS id INTO #Tally FROM sysobjects a;
SET ROWCOUNT 0;
-- Cross join each name against the tally table to split it into single characters
-- and show each character together with its ASCII code
SELECT
SUBSTRING(firstname + ' ' + lastname , id,1) search,
ASCII(SUBSTRING(firstname + ' ' + lastname , id,1)),
firstname + ' ' + lastname [searchstr] FROM AdventureWorks.Sales.Customer CROSS JOIN #Tally tally
WHERE NOT ASCII(SUBSTRING(firstname + ' ' + lastname , id,1)) IS NULL
ORDER BY searchstr, id
DROP TABLE #Tally;
(Luckily you didn't mention performance)