Very Large Datasets

  • Currently we have a Table that stores full paths to image files we have stored in the file system. For one of our clients this table contains 3.5 million records. I need to be able periodically audit these files. I am using a VB.NET application, trying to pull the filenames into a dataset so I can run each file through the EXISTS? function. Obviously this much data overload the user memory available. I would like to process this thing in batches of 1000 or so clearing the dataset between each thousand so my user memory remains somewhat close to the same as when it ran the first 1000. I hope that makes sense. Anyone have any idea the best way to handle this?

  • Why not just pull the filenames into a table and do a left join. 3.5 million rows is really not that much data. I can load a million rows into a temp table on my laptop in about 2.5 seconds. Is this field in your database TEXT or VARCHAR/NVARCHAR data type?

    Jared
    CE - Microsoft

  • I think the OP is having problems with his application and not sql. In other words the 3.5 Million rows is too much for his .NET app. I suspect the OP is doing something like:

    for each DataRow dr in MyDataTable.Rows

    {

    ValidateFileExists();

    }

    .NET is not likely to be very content trying to process 3.5M rows in a datatable.

    That being the case I would think you would have better luck on a .NET forum as the cause of the problem is your application and not SQL.

    You could possibly break this into batches with nested for loops in your .net app. Something like:

    for int x = 0; x <= MyDataTable.Rows.Count / 1000; x++)

    {

    for(int y = 0; y <= 1000; y++)

    {

    if(y + (x * 1000) <= MyDataTable.Rows.Count)

    {

    ValidateFileExists();

    }

    else

    {

    break;

    }

    }

    }

    The above is untested but seems like it should be close. I will leave converting it VB.NET to you.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • The issue with this is the initial data load. It loads down all the data at once before it ever hits the while loop. Your are right it is with a .NET app, but I thought since many people use it that this would be a good place to take a peek. Thanks for any and all help.

  • It is a varchar field length is 256x2 for unicode. Which should cover a mximum path/file name

  • I'd like to suggest that doing this type of audit in VB is probably the wrong place to do it. A stored procedure would do this much more quickly.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I did try this through a Stored Procedure and it actually took a long time. I used a FileExists function I found. I will continue to work it form that route, but I do think VB would be faster handling this process.

    Do you have a sample of what you are referring to?

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply