Are the posted questions getting worse?

  • Lynn Pettis - Thursday, June 29, 2017 9:31 AM

    Chris Harshman - Wednesday, June 28, 2017 2:53 PM

    unfortunately had to write one this week... look through rows of a document table for the last 48 hours and compare the FileName column to the results of xp_fileexist to try to find missing documents (without waiting for the users to say "hey... the file doesn't open!")

    Just thinking that there may be a better way but would need more details.

    I considered getting a list of all files at once, then comparing it to the document table, but the problem is there are 5 years or more of the various documents out in the file share, with several thousand generated in an average day.  I suppose I could hack together something in .Net code, but it would still be loop oriented instead of set based.

  • Chris Harshman - Thursday, June 29, 2017 10:52 AM

    Lynn Pettis - Thursday, June 29, 2017 9:31 AM

    Chris Harshman - Wednesday, June 28, 2017 2:53 PM

    unfortunately had to write one this week... look through rows of a document table for the last 48 hours and compare the FileName column to the results of xp_fileexist to try to find missing documents (without waiting for the users to say "hey... the file doesn't open!")

    Just thinking that there may be a better way but would need more details.

    I considered getting a list of all files at once, then comparing it to the document table, but the problem is there are 5 years or more of the various documents out in the file share, with several thousand generated in an average day.  I suppose I could hack together something in .Net code, but it would still be loop oriented instead of set based.

    That's exactly what came to mind when you described the original problem, but thousands of files per day * 5 years = a whole lot of files.  Does the folder even behave moderately well in Windows Explorer, or is it unusable yet?

  • Ed Wagner - Thursday, June 29, 2017 11:14 AM

    That's exactly what came to mind when you described the original problem, but thousands of files per day * 5 years = a whole lot of files.  Does the folder even behave moderately well in Windows Explorer, or is it unusable yet?

    There are several sub-folders within the main document folder on the file share, but some of them are a bit painful to open; you get the scrolling green bar for a couple of minutes on some.  :crazy:  I'm just glad the person who designed this originally split out the documents by department and process, and then kept them on a separate file server away from the database!

  • Chris Harshman - Thursday, June 29, 2017 10:52 AM

    Lynn Pettis - Thursday, June 29, 2017 9:31 AM

    Chris Harshman - Wednesday, June 28, 2017 2:53 PM

    unfortunately had to write one this week... look through rows of a document table for the last 48 hours and compare the FileName column to the results of xp_fileexist to try to find missing documents (without waiting for the users to say "hey... the file doesn't open!")

    Just thinking that there may be a better way but would need more details.

    I considered getting a list of all files at once, then comparing it to the document table, but the problem is there are 5 years or more of the various documents out in the file share, with several thousand generated in an average day.  I suppose I could hack together something in .Net code, but it would still be loop oriented instead of set based.

    Quick question, do you only need to compare the filename or the entire file path and name?

  • Lynn Pettis - Thursday, June 29, 2017 12:24 PM

    Quick question, do you only need to compare the filename or the entire file path and name?

    Since there are subfolders, and there is a chance (slim but could happen) that the same filename may exist in more than one subfolder, I should verify the whole path and name.

  • Chris Harshman - Thursday, June 29, 2017 12:30 PM

    Lynn Pettis - Thursday, June 29, 2017 12:24 PM

    Quick question, do you only need to compare the filename or the entire file path and name?

    Since there are subfolders, and there is a chance (slim but could happen) that the same filename may exist in more than one subfolder, I should verify the whole path and name.

    I assume that if a file won't open it is because the file exists in the table but not on disk, correct?

  • Chris Harshman - Thursday, June 29, 2017 12:30 PM

    Lynn Pettis - Thursday, June 29, 2017 12:24 PM

    Quick question, do you only need to compare the filename or the entire file path and name?

    Since there are subfolders, and there is a chance (slim but could happen) that the same filename may exist in more than one subfolder, I should verify the whole path and name.

    Just a thought about how this could be done:

    CREATE TABLE dbo.Test(
      Id INT IDENTITY(1,1)
      , DocumentFileName NVARCHAR(4000)
      , FileHash AS CAST(HASHBYTES('MD5', [DocumentFileName]) AS VARBINARY(16)) PERSISTED -- MD5 is a 16 byte value
    );

    INSERT INTO dbo.Test([DocumentFileName])
    EXEC xp_cmdshell 'dir /b /s c:\*.*'; -- not sure if this will work with a UNC path, but give it a shot

    DELETE FROM [dbo].[Test] WHERE [DocumentFileName] IS NULL; -- xp_cmdshell emits a trailing NULL row

    CREATE NONCLUSTERED INDEX idx1 ON [dbo].[Test](FileHash);

    SELECT * FROM [dbo].[Test] AS [t];

    SELECT *
    FROM
      dbo.Media m -- or whatever your table is named
    WHERE
      m.DateCreated >= DATEADD(HOUR, -48, GETDATE()) -- or whatever column you use for reducing the data set
      AND NOT EXISTS(SELECT 1
                     FROM [dbo].[Test] AS [t]
                     WHERE [t].[FileHash] = [m].[FileHash] -- assumes you have or can put a hash on your table
                     );

    DROP TABLE dbo.Test;
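    If the file list ever has to be handled outside of SQL Server, the same anti-join can be sketched in Python. This is just a rough sketch, not production code: the row shape is made up, and since HASHBYTES('MD5', ...) over an NVARCHAR column hashes the UTF-16LE bytes of the string, the Python side mirrors that encoding.

```python
import hashlib

def path_hash(path):
    # HASHBYTES('MD5', col) on an NVARCHAR column hashes the
    # UTF-16LE bytes of the string, so mirror that encoding here
    return hashlib.md5(path.encode("utf-16-le")).digest()

def missing_documents(table_rows, disk_paths):
    """table_rows: iterable of (id, full_path) from the document table.
    disk_paths:  iterable of full paths actually found on the share.
    Returns the table rows with no matching file on disk - the same
    NOT EXISTS anti-join as the T-SQL above, done as a set lookup."""
    on_disk = {path_hash(p) for p in disk_paths}
    return [(row_id, path) for row_id, path in table_rows
            if path_hash(path) not in on_disk]
```

    The set lookup is O(1) per row, so this stays set-oriented rather than looping per file the way a cursor does.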

  • Lynn Pettis - Thursday, June 29, 2017 1:18 PM

    I assume that if a file won't open it is because the file exists in the table but not on disk, correct?

    Yes, that is what we are testing for.  FYI, the DIR /B /S command does work with a UNC name, but I cancelled it after it took longer just to run that DIR command than to run the entire CURSOR approach that only looks at rows from the document table for the past 48 hours.  Thanks for the effort though; I do appreciate you and Brandie actually considering this.

  • Chris Harshman - Thursday, June 29, 2017 2:04 PM

    Lynn Pettis - Thursday, June 29, 2017 1:18 PM

    I assume that if a file won't open it is because the file exists in the table but not on disk, correct?

    Yes, that is what we are testing for.  FYI, the DIR /B /S command does work with a UNC name, but I cancelled it after it took longer just to run that DIR command than to run the entire CURSOR approach that only looks at rows from the document table for the past 48 hours.  Thanks for the effort though; I do appreciate you and Brandie actually considering this.

    Lynn, great idea with the hash.

    Chris, I've run into a similar situation.  I wrote a process to audit (size, date, etc.) directories of files that sound a lot like yours - zillions of files in thousands of directories.  I do a dir command and then import the text file for parsing and organizing.  I find that the dir command takes a very long time because of the sheer volume of files.

    I don't know if it'll help or not, but you might want to have a look at the forfiles command.  The /D switch allows filtering by date, either an absolute date or a number of days.  You could shell out to that in Lynn's example above and then run the check against the smaller list.
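    The kind of date filtering forfiles /D does can also be sketched in Python for comparison. This is a sketch under assumptions: the share path would be whatever your root is, and the file's modification time stands in for the "generated" date.

```python
import os
import time

def recent_files(root, hours=48):
    """Walk a directory tree and return the full paths of files
    modified within the last `hours` hours - the same idea as
    filtering with forfiles /D before doing the comparison."""
    cutoff = time.time() - hours * 3600
    recent = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) >= cutoff:
                    recent.append(path)
            except OSError:
                pass  # file vanished between listing and stat
    return recent
```

    It still has to stat every file, so it won't make a slow share fast, but it does shrink the list that gets compared against the table.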

  • Ed Wagner - Thursday, June 29, 2017 2:29 PM

    Chris Harshman - Thursday, June 29, 2017 2:04 PM

    Lynn Pettis - Thursday, June 29, 2017 1:18 PM

    I assume that if a file won't open it is because the file exists in the table but not on disk, correct?

    Yes, that is what we are testing for.  FYI, the DIR /B /S command does work with a UNC name, but I cancelled it after it took longer just to run that DIR command as to run the entire CURSOR approach to only look at rows from document table from the past 48 hours.  Thanks for the effort though, I do appreciate you and Brandie actually considering this.

    Lynn, great idea with the hash.

    Chris, I've run into a similar situation.  I wrote a process to audit (size, date, etc.) directories of files that sound a lot like yours - zillions of files in thousands of directories.  I do a dir command and then import the text file for parsing and organizing.  I find that the dir command takes a very long time because of the sheer volume of files.

    I don't know if it'll help or not, but you might want to have a look at the forfiles command.  The /D switch allows filtering by date, either an absolute date or a number of days.  You could shell out to that in Lynn's example above and then run the check against the smaller list.

    Another idea would be to see if PowerShell could be used to compare hash values between the table in SQL Server and the files in the file system.  I don't know enough about PowerShell to give it a try.  I really need to learn PowerShell.

  • Lynn Pettis - Thursday, June 29, 2017 2:33 PM

    Ed Wagner - Thursday, June 29, 2017 2:29 PM

    Chris Harshman - Thursday, June 29, 2017 2:04 PM

    Lynn Pettis - Thursday, June 29, 2017 1:18 PM

    I assume that if a file won't open it is because the file exists in the table but not on disk, correct?

    Yes, that is what we are testing for.  FYI, the DIR /B /S command does work with a UNC name, but I cancelled it after it took longer just to run that DIR command as to run the entire CURSOR approach to only look at rows from document table from the past 48 hours.  Thanks for the effort though, I do appreciate you and Brandie actually considering this.

    Lynn, great idea with the hash.

    Chris, I've run into a similar situation.  I wrote a process to audit (size, date, etc.) directories of files that sound a lot like yours - zillions of files in thousands of directories.  I do a dir command and then import the text file for parsing and organizing.  I find that the dir command takes a very long time because of the sheer volume of files.

    I don't know if it'll help or not, but you might want to have a look at the forfiles command.  The /D switch allows filtering by date, either an absolute date or a number of days.  You could shell out to that in Lynn's example above and then run the check against the smaller list.

    Another idea would be to see if PowerShell could be used to compare hash values between the table in SQL Server and the files in the file system.  I don't know enough about PowerShell to give it a try.  I really need to learn PowerShell.

    Made a slight change; not sure if it will have much of an effect, but it is based on Ed's comment about writing to a file and then importing the file.


    CREATE TABLE dbo.Test(
      Id INT IDENTITY(1,1)
      , DocumentFileName NVARCHAR(4000)
      , FileHash AS CAST(HASHBYTES('MD5', [DocumentFileName]) AS VARBINARY(16)) PERSISTED -- MD5 is a 16 byte value
    );

    CREATE TABLE [dbo].[Test2](DocumentFileName NVARCHAR(4000));

    EXEC xp_cmdshell 'dir /b /s c:\*.* > C:\files.txt', no_output; -- not sure if this will work with a UNC path, but give it a shot

    BULK INSERT [dbo].[Test2]
    FROM 'c:\files.txt';

    INSERT INTO [dbo].[Test](DocumentFileName)
    SELECT DocumentFileName FROM [dbo].[Test2] AS [t];

    CREATE NONCLUSTERED INDEX idx1 ON [dbo].[Test](FileHash);

    SELECT * FROM [dbo].[Test] AS [t];

    SELECT *
    FROM
      dbo.Media m -- or whatever your table is named
    WHERE
      m.DateCreated >= DATEADD(HOUR, -48, GETDATE()) -- or whatever column you use for reducing the data set
      AND NOT EXISTS(SELECT 1
                     FROM [dbo].[Test] AS [t]
                     WHERE [t].[FileHash] = [m].[FileHash] -- assumes you have or can put a hash on your table
                     );

    DROP TABLE dbo.Test;
    DROP TABLE dbo.Test2;


    Edit: Copy paste issues

  • Lynn Pettis - Thursday, June 29, 2017 2:33 PM

    Ed Wagner - Thursday, June 29, 2017 2:29 PM

    Chris Harshman - Thursday, June 29, 2017 2:04 PM

    Lynn Pettis - Thursday, June 29, 2017 1:18 PM

    I assume that if a file won't open it is because the file exists in the table but not on disk, correct?

    Yes, that is what we are testing for.  FYI, the DIR /B /S command does work with a UNC name, but I cancelled it after it took longer just to run that DIR command as to run the entire CURSOR approach to only look at rows from document table from the past 48 hours.  Thanks for the effort though, I do appreciate you and Brandie actually considering this.

    Lynn, great idea with the hash.

    Chris, I've run into a similar situation.  I wrote a process to audit (size, date, etc.) directories of files that sound a lot like yours - zillions of files in thousands of directories.  I do a dir command and then import the text file for parsing and organizing.  I find that the dir command takes a very long time because of the sheer volume of files.

    I don't know if it'll help or not, but you might want to have a look at the forfiles command.  The /D switch allows filtering by date, either an absolute date or a number of days.  You could shell out to that in Lynn's example above and then run the check against the smaller list.

    Another idea would be to see if PowerShell could be used to compare hash values between the table in SQL Server and the files in the file system.  I don't know enough about PowerShell to give it a try.  I really need to learn PowerShell.

    I just do a BCP of the file into a staging table and do an EXCEPT or two between the staging table and the target table.  As for PowerShell, I've not found much utility in it that I can't do with SQL Server and xp_CmdShell, which isn't the security risk that so many people think it is.

    Now, if you want to learn a really cool tool, learn more about WMI/WMIC.  It does a whole lot of what people use PowerShell for, but you usually don't need to write any loops.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Brandie Tarvin - Thursday, June 29, 2017 4:14 AM

    Chris Harshman - Wednesday, June 28, 2017 2:53 PM

    Jeff Moden - Wednesday, June 28, 2017 6:55 AM

    Heh...  what's the last cursor you wrote and what did it do?

    unfortunately had to write one this week... look through rows of a document table for the last 48 hours and compare the FileName column to the results of xp_fileexist to try to find missing documents (without waiting for the users to say "hey... the file doesn't open!")

    Okay, I'm going to pretend to be a little obtuse, but couldn't you run the results of xp_fileexist into a temp table, pull the document table into another temp table with a "FileExists" bit column set to 0 (or 1 if you prefer your falses to be 1), then update that column in a set-based update for all file names that match?

    So, guys, I'm curious why this post (one of the first responses to this file thing) is being completely ignored. Does no one have a comment on it?

    It just seems a little weird to me that people are starting code based on this exact idea but no one has noticed that I posted about it before the entire discussion really kicked off.

    Brandie Tarvin, MCITP Database Administrator
    LiveJournal Blog: http://brandietarvin.livejournal.com/
    On LinkedIn!, Google+, and Twitter.
    Freelance Writer: Shadowrun
    Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Brandie Tarvin - Friday, June 30, 2017 6:53 AM

    Brandie Tarvin - Thursday, June 29, 2017 4:14 AM

    Okay, I'm going to pretend to be a little obtuse, but couldn't you run the results of xp_fileexist into a temp table, pull the document table into another temp table with a "FileExists" bit column set to 0 (or 1 if you prefer your falses to be 1), then update that column in a set-based update for all file names that match?

    So, guys, I'm curious why this post (one of the first responses to this file thing) is being completely ignored. Does no one have a comment on it?

    It just seems a little weird to me that people are starting code based on this exact idea but no one has noticed that I posted about it before the entire discussion really kicked off.

    No offense; I guess I just didn't understand your approach.  I thought xp_fileexist only produces output as an INT parameter, or with a 3-column, 1-row resultset that doesn't include the filename.  I didn't think there was a set-based approach with it.

  • OK, I've even tried using XCOPY /L /S /D to limit the filenames returned by date, so I only get a few thousand instead of the millions and millions of files, but that command still takes 3 or 4 minutes just to get the list.  Thanks again to everyone who tried to look at this, but the cursor-based approach runs entirely in under a minute for me, so I'm sticking with this:

    SET NOCOUNT ON;
    DECLARE
        @StartDate datetime = DATEADD(hour, -48, GETDATE()), --DATEADD(day, DATEDIFF(day, 7, GETDATE()), 0),
        @DocumentPrintID int, @AppName varchar(80), @DocumentFileName varchar(250), @Exists int;

    CREATE TABLE #MissingDocs (DocumentPrintID int, AppName varchar(80), DocumentFileName varchar(250));

    DECLARE cur_Docs CURSOR FAST_FORWARD FOR
    SELECT DocumentPrintID, OriginalInsertAppName, DocumentFileName
      FROM dbo.DocumentPrint
      WHERE PrintDateTime >= @StartDate
        AND DocumentFileName IS NOT NULL;

    OPEN cur_Docs;
    FETCH NEXT FROM cur_Docs INTO @DocumentPrintID, @AppName, @DocumentFileName;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @Exists = 0;
        EXEC master.dbo.xp_fileexist @DocumentFileName, @Exists OUTPUT;

        IF @Exists = 0
            INSERT INTO #MissingDocs (DocumentPrintID, AppName, DocumentFileName)
                VALUES (@DocumentPrintID, @AppName, @DocumentFileName);

        FETCH NEXT FROM cur_Docs INTO @DocumentPrintID, @AppName, @DocumentFileName;
    END

    CLOSE cur_Docs;
    DEALLOCATE cur_Docs;

    SELECT * FROM #MissingDocs;
    DROP TABLE #MissingDocs;
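    For what it's worth, the per-row test the cursor and xp_fileexist perform boils down to this (a Python sketch with a made-up row shape, just to show the logic):

```python
import os

def find_missing(rows):
    """rows: iterable of (document_print_id, app_name, file_name)
    tuples for the last 48 hours. Returns the rows whose file no
    longer exists on disk - the same check the cursor loop makes
    with xp_fileexist, one path at a time."""
    return [(doc_id, app, name)
            for doc_id, app, name in rows
            if not os.path.isfile(name)]
```

    It is still one filesystem probe per row, so it has the same cost profile as the cursor; the win of this approach is that it only touches the few thousand recent rows instead of listing millions of files.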
