Breaking Up is | Easy | to Do |
It's nice to be able to package a process into a single, tidy, elegant query, but it isn't always possible to do so. And even if it is possible, it may not be the best way to do it. Often, we can get better performance out of large or complex queries by breaking them up into smaller pieces. I encountered a great example of this today.
A developer asked me about a query that was taking a really long time to process in the test environment. The particular step that was having issues is building a long string by concatenating short strings to pass through to a remote server for processing of data on the remote server. It was a simple, recursive string building query.
Declare @RowIDs nvarchar(max)
Select @RowIDs = IsNull(@RowIDs + ',', '') + '''' + RowID + ''''
From StagingTable with(nolock)
When there were only 12,000 ID's to concatenate, it ran in about 9 seconds. When the number increased to 18K ID's, it took twice as long, 18 seconds. At 20K ID's, it was taking 2 minutes, and now at 90K records, it is taking almost 1.5 hours. Obviously, the developer was not happy with performance and asked if I knew why the process time increased exponentially.
Upon observing the query run, I determined that the query was using a really large amount of memory, CPU time, and causing a lot of disk I/O. It was simply trying to do too much at once. So, I decided to see what would happen if I broke it down into smaller, more manageable chunks.
I set up a looping process to build several smaller strings. Then it would loop through the smaller strings and concatenate them together into the large, final string. The process could now build the string of 90K+ ID's, starting with short strings of 10K ID's per string, in 26 seconds. If I lowered the number of ID's in the initial set of shorter strings, it ran even faster. 5K of ID's per short string ran in 21 seconds and 1K of ID's ran in 19 seconds.
This is the final query I ended up with:
Declare @Rows nvarchar(max),
@Loops int,
@CurrLoop int,
@MaxIDsPerLoop int
Declare @RowIDs Table (RowIDKey int identity(1, 1) not null primary key,
RowID nvarchar(15) not null)
Declare @RowIDRows Table (RowIDRowKey int identity(1, 1) not null primary key,
RowIDRow nvarchar(max) not null)
Set NoCount On
Set @MaxIDsPerLoop = 1000
Set @CurrLoop = 0
Insert Into @RowIDs (RowID)
Select RowID
From StagingDatabase with(nolock)
Select @Loops = (count(RowIDKey) / @MaxIDsPerLoop>) + 1
From @RowIDs
While @CurrLoop < @Loops
Begin
Set @Rows = Null
Select @Rows = IsNull(@Rows + ',', '') + '''' + RowID + ''''
From @RowIDs
Where RowIDKey % @Loops = @CurrLoop
Insert Into @RowIDRows (RowIDRow)
Select @Rows
Set @CurrLoop = @CurrLoop + 1
End
Set @Rows = Null
Select @Rows = IsNull(@Rows + ',', '') + RowIDRow
From @RowIDRows
Select @Rows
Set NoCount Off