Threw an exception... out of memory

  • Here's my exact source(attached), and I used SSMA to copy the data up to SQL Server 2005 Developer Edition 64-bit with SP2. SSMA is SQL Server Migration Assistant. I set that product to specifically convert the decimal field to decimal(8,3). By the way, the final index took just over an hour to create, and the final db size is ~70 GIG.

    I created the database on my C: drive (the only one I have at the moment - I use the 2nd half of the drive as my backup destination for opsys backups for now. The two partitions are 500 GB each. I used 16,384 MB as the creation size for the db file, and set the log file to max out at 2048 MB.

    The code I've used to perform the various tasks, albeit a batch or portion thereof at a time, is here:

    CREATE TABLE dbo.DISTANCES (

    ZIP1 char(5),

    ZIP2 char(5),

    DISTANCE decimal(7,2)

    )

    DECLARE @START DateTime, @END DateTime, @DURATION decimal(12,3), @LOOPER int, @LOOPCHAR varchar(5)

    SET @LOOPER = 0

    SET @START = GETDATE()

    PRINT 'STARTING DATE/TIME: ' + CAST(@START AS varchar(30))

    WHILE (@LOOPER < 10000)

    BEGIN

    SET @LOOPCHAR = RIGHT('000' + CAST(@LOOPER AS varchar(4)),4) + '%'

    PRINT 'LOOP START FOR #' + @LOOPCHAR

    INSERT INTO dbo.DISTANCES (ZIP1, ZIP2, DISTANCE)

    SELECT Z.[ZIP Code] AS ZIP1, Y.[ZIP Code] AS ZIP2,

    CASE

    WHEN (Z.Latitude = Y.Latitude) AND (Z.Longitude = Y.Longitude) THEN

    CAST(0 AS decimal(7,2))

    ELSE ROUND(69.1 * 180 / PI() * ACOS(

    SIN(RADIANS(Z.Latitude)) * SIN(RADIANS(Y.Latitude)) +

    COS(RADIANS(Z.Latitude)) * COS(RADIANS(Y.Latitude)) *

    COS(RADIANS(Y.Longitude) - RADIANS(Z.Longitude))),2)

    END AS DISTANCE

    FROM dbo.[ZIP Codes] AS Z, dbo.[ZIP Codes] AS Y

    WHERE Z.[ZIP Code] < Y.[ZIP Code] AND

    Z.[ZIP Code] Like @LOOPCHAR

    SET @LOOPER = @LOOPER + 1

    PRINT 'LOOP END #' + @LOOPCHAR

    CHECKPOINT

    BACKUP LOG ZIP_CODES WITH TRUNCATE_ONLY

    DBCC SHRINKFILE(ZIP_CODES_log)

    PRINT 'LOG MAINTENANCE COMPLETED'

    END

    SET @END = GETDATE()

    SET @DURATION = CAST(DATEDIFF(ms, @START, @END) AS decimal(12,3)) / 1000.

    PRINT 'ENDING DATE/TIME: ' + CAST(@END AS varchar(30))

    PRINT 'TOTAL DURATION WAS: ' + CAST(@DURATION AS varchar(30)) + ' SECONDS'

    GO

    ALTER TABLE dbo.DISTANCES

    ADD DIST_CATEGORY smallint

    CREATE NONCLUSTERED INDEX IX_ZIP1

    ON dbo.DISTANCES(ZIP1 ASC)

    WITH FILLFACTOR = 100

    GO

    CREATE NONCLUSTERED INDEX IX_ZIP2

    ON dbo.DISTANCES(ZIP2 ASC)

    WITH FILLFACTOR = 100

    GO

    DECLARE @START DateTime, @END DateTime, @DURATION decimal(12,3), @LOOPER int, @LOOPCHAR varchar(5)

    SET @LOOPER = 0

    SET @START = GETDATE()

    PRINT 'STARTING DATE/TIME: ' + CAST(@START AS varchar(30))

    WHILE (@LOOPER < 1000)

    BEGIN

    SET @LOOPCHAR = RIGHT('00' + CAST(@LOOPER AS varchar(3)),3) + '%'

    PRINT 'LOOP START FOR #' + @LOOPCHAR

    UPDATE dbo.DISTANCES

    SET DIST_CATEGORY =

    CASE WHEN DISTANCE <= 150 THEN 5 * (((CAST(ROUND(DISTANCE + .49,0) AS smallint) - 1) / 5) + 1)

    WHEN DISTANCE <= 250 THEN 250

    WHEN DISTANCE <= 500 THEN 500

    WHEN DISTANCE <= 1000 THEN 1000

    WHEN DISTANCE <= 1500 THEN 1500

    WHEN DISTANCE <= 2000 THEN 2000

    WHEN DISTANCE <= 2500 THEN 2500

    WHEN DISTANCE <= 5000 THEN 5000

    WHEN DISTANCE <= 7500 THEN 7500

    WHEN DISTANCE <= 10000 THEN 10000

    ELSE 10001

    END

    WHERE ZIP1 Like @LOOPCHAR

    SET @LOOPER = @LOOPER + 1

    PRINT 'LOOP END #' + @LOOPCHAR

    CHECKPOINT

    BACKUP LOG ZIP_CODES WITH TRUNCATE_ONLY

    DBCC SHRINKFILE(ZIP_CODES_log)

    PRINT 'LOG MAINTENANCE COMPLETED'

    END

    SET @END = GETDATE()

    SET @DURATION = CAST(DATEDIFF(ms, @START, @END) AS decimal(12,3)) / 1000.

    PRINT 'ENDING DATE/TIME: ' + CAST(@END AS varchar(30))

    PRINT 'TOTAL DURATION WAS: ' + CAST(@DURATION AS varchar(30)) + ' SECONDS'

    GO

    CREATE NONCLUSTERED INDEX IX_DIST_CATEGORY

    ON dbo.DISTANCES(DIST_CATEGORY ASC)

    WITH FILLFACTOR = 100

    GO

    I'll look forward to whatever you come up with. What I'm most eager to learn is just what gets logged, and under what conditions.

    Steve

    (aka smunson)

    :-):-):-)

    Jeff Moden (7/5/2009)


    Never mind, Steve... I found one that's a CSV file at

    http://www.boutell.com/zipcodes/

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Just an FYI, the following query runs in as little as 0.04 seconds to as much as 0.16 seconds - always fast, and always sub-second, and that's a good thing.

    DBCC FREESYSTEMCACHE ('ALL')

    DECLARE @FIND_ZIP AS char(5), @START AS DateTime, @END AS DateTime, @DURATION AS decimal(8,3)

    SET @START = GETDATE()

    SET @FIND_ZIP = '60609'

    SELECT ZIP1, ZIP2, DISTANCE

    FROM dbo.DISTANCES

    WHERE ZIP1 = @FIND_ZIP AND

    DIST_CATEGORY <= 25

    ORDER BY DISTANCE, ZIP2

    SET @END = GETDATE()

    SET @DURATION = DATEDIFF(ms, @START, @END) / 1000.

    SELECT 'QUERY COMPLETED IN ' + CAST(@DURATION AS varchar(8)) + ' SECONDS' AS DURATION

    Steve

    (aka smunson)

    :-):-):-)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • UPDATE:

    Well, actually, my last query wasn't quite right. I just realized that only checking ZIP1 provides only those zip codes larger than than the one being checked, as I had decided to cut the database size in half by only having unique records where the combination of the two zip codes is concerned. Here's the revised query:

    DBCC FREESYSTEMCACHE ('ALL')

    DECLARE @FIND_ZIP AS char(5), @START AS DateTime, @END AS DateTime, @DURATION AS decimal(8,3)

    SET @START = GETDATE()

    SET @FIND_ZIP = '68105'

    SELECT @FIND_ZIP AS ZIP1, ZIP2, DISTANCE

    FROM dbo.DISTANCES

    WHERE ZIP1 = @FIND_ZIP AND

    DIST_CATEGORY <= 25

    UNION

    SELECT @FIND_ZIP AS ZIP1, ZIP1 AS ZIP2, DISTANCE

    FROM dbo.DISTANCES

    WHERE ZIP2 = @FIND_ZIP AND

    DIST_CATEGORY <= 25

    ORDER BY DISTANCE, ZIP1, ZIP2

    SET @END = GETDATE()

    SET @DURATION = DATEDIFF(ms, @START, @END) / 1000.

    SELECT @START AS STARTING, @END AS ENDING,

    'QUERY COMPLETED IN ' + CAST(@DURATION AS varchar(8)) + ' SECONDS' AS DURATION

    The performance is still sub-second, but can go up to .4 something seconds after the very first query, which ran in 1.7 seconds.

    Steve

    (aka smunson)

    :-):-):-)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Just check Zip 2 because there will always be all zip codes in that column.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I meant to ask... what are you doing this for? What's the business reason, I mean?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Would you believe that it's entirely a learning process? I'm trying out things, finding out what kind of stuff happens when you have a huge database, and finding out just how well it can perform. Since I'm not working right now (much as I'd PREFER to be working... unemployment ROTS!), I have the time to learn, so the more I know about things I might encounter in the real world, the better. Thus, the only BUSINESS is that of education. I've already learned a great deal from this forum, and I plan to have that continue for a long time.

    There are always so many ways to do something, and thus I need to be able to know what kinds of things might be worth trying.

    Jeff Moden (7/5/2009)


    I meant to ask... what are you doing this for? What's the business reason, I mean?

    As to just checking ZIP2, that can't work because for any given pair of zip codes, there is exactly one record in the DISTANCES table. Thus if X is the ZIP I'm searching for, if I only check against ZIP1, I get all the pairs where ZIP2 is > X, and if I only check against ZIP2, I get the opposite result, where I get all the pairs where ZIP1 < X. I need both sets together, and putting an OR in a single query isn't going to perform worth beans - I let that fly once, and cancelled the query a minute and a half later, knowing I was on the wrong track.

    Jeff Moden (7/5/2009)


    Just check Zip 2 because there will always be all zip codes in that column.

    Steve

    (aka smunson)

    :-):-):-)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Understood on the "business" of education... I do it for at least an hour a day. Kind of like practicing piano. Heh... I thought you had some extreme need for speed and thus the denormalization of the table.

    On the Zip2 thing... I haven't done the test in a while, but I'm pretty sure that a simple OR will be resolved in a fashion similar to a UNION ALL. You could try that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • On the "need for speed", that was more of a wish thing... I have an expensive system, and I have expectations for it - realistic in some cases, perhaps not in others - and in this case, it was more of a test drive to see what it can do. This kind of database could feed a mapping application that could grab a map from Yahoo, MapQuest, or Google, and plot the locations of every certain kind of business within a given radius of a typed in zip code. Of course, one would also need data on the business location zip codes and the associated addresses/phone numbers, but what the heck...

    I'm attaching an Excel 2007 spreadsheet with the execution plans for the two ways to run the query, and I'm just not curious enough to find out how long the "OR" version would take to run, given the execution plan it generates.

    Steve

    (aka smunson)

    :-):-):-)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • smunson (7/5/2009)


    I'm just not curious enough to find out how long the "OR" version would take to run, given the execution plan it generates.

    :-):-):-)

    What? Unemployed... in search of knowledge... nothing for the computer to do while your sleeping... 😉

    And remember, execution plans can lie like a rug.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Unemployment is a tad too common these days. Anyway, the idea of an execution plan doing the "lie like a rug" thing has already been put to bed. Given my subsecond response to the UNION query, and the at least 1.5 minutes and probably an AWFUL LOT more for the OR query, that's my logic, but since I'm the paduan learner here, I'll be thorough and let it run tonight. I'll update in the morning. I expect it to be bad - bad to the core, as it were... (:-D:-D:-D)

    Steve

    (aka smunson)

    :-D:-D:-D

    Jeff Moden (7/5/2009)


    smunson (7/5/2009)


    I'm just not curious enough to find out how long the "OR" version would take to run, given the execution plan it generates.

    :-):-):-)

    What? Unemployed... in search of knowledge... nothing for the computer to do while your sleeping... 😉

    And remember, execution plans can lie like a rug.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • UPDATE:

    The "OR" version of the query took 3 mins, 48.383 seconds. Way too long to be able to support any kind of website that might use this kind of database. The difference is a whopping 571 times longer than the worst case query of 0.4 seconds for the UNION version, and over 2,000 times longer than the more typical 0.1 seconds. I do have to admit I expected far worse, but I think this is bad enough on it's own merit.

    What you said about the execution plan being able to "lie like a rug" is no doubt accurate, and when I look at the EstimatedExecutions columns for both plans, the UNION version sums to 415 while the OR version sums to over 21,000, for a difference of about 51 times. I would expect the difference in of another 40x to appear in the respecttive average actual cost of each execution, but I don't have the knowledge right now to be able to figure out if that's indeed the case.

    This has been a most useful exercise in learning about large databases. I just wish I could learn more about exactly what goes into the log file, and when. Apparently, it sure can be one heck of a lot.

    Steve

    (aka smunson)

    :-):-):-)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • It would be interesting to see the complete code examples in the test harness you used.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply