July 3, 2012 at 1:57 am
I have put the results of dir *.* /s into a file and need to database the file, for a discrepancy report between external stored data and the database.
Due to the large size of the file, I need o remove the partial duplicate records pro-grammatically so the records are transformed from raw format shown in **1 below to **2
**1
------
z:\images\12346
z:\images\12346\09132010_1151
09/13/2010 12:46 PM 14,934 12346_09132010_1151_5_i.bmp
09/13/2010 12:46 PM 14,934 12346_09132010_1151_6_i.bmp
z:\images\505722
z:\images\505722\10252011_0849
10/25/2011 11:36 AM 56,200 505722_10252011_0849_35_0.d
10/25/2011 11:36 AM 2,032,930 505722_10252011_0849_21_0.d
10/25/2011 11:36 AM 14,934 505722_10252011_0849_12_i.b
z:\images\8921
z:\images\8921\04192011_1022
04/19/2011 01:36 PM 14,934 8921_04192011_1022_92_i.bmp
**2
--------
z:\images\12346\09132010_1151
09/13/2010 12:46 PM 14,934 12346_09132010_1151_5_i.bmp
09/13/2010 12:46 PM 14,934 12346_09132010_1151_6_i.bmp
z:\images\505722\10252011_0849
10/25/2011 11:36 AM 56,200 505722_10252011_0849_35_0.d
10/25/2011 11:36 AM 2,032,930 505722_10252011_0849_21_0.d
10/25/2011 11:36 AM 14,934 505722_10252011_0849_12_i.b
z:\images\8921\04192011_1022
04/19/2011 01:36 PM 14,934 8921_04192011_1022_92_i.bmp
As the final step, I would have to transform the file from **2 to format below, so it can be imported into database
z:\images\12346\09132010_1151 09/13/2010 12:46 PM 14,934 12346_09132010_1151_5_i.bmp
z:\images\12346\09132010_1151 09/13/2010 12:46 PM 14,934 12346_09132010_1151_6_i.bmp
z:\images\505722\10252011_0849 10/25/2011 11:36 AM 56,200 505722_10252011_0849_35_0.d
z:\images\505722\10252011_0849 10/25/2011 11:36 AM 2,032,930 505722_10252011_0849_21_0.d
z:\images\505722\10252011_0849 10/25/2011 11:36 AM 14,934 505722_10252011_0849_12_i.b
z:\images\8921\04192011_1022 04/19/2011 01:36 PM 14,934 8921_04192011_1022_92_i.bmp
I tried some excel Macros that reformats the data, and works well as long the number of records are in thousands , however when trying the macro on my raw data due to the file size, after six hours or so, the execution of macros was never completed and I gave up on the idea of using Excel . I am now thinking that it may be possible to reformat the files using SQL tools , however wanted to discuss the idea in this forum and see if there are any suggestions on the approach.
July 3, 2012 at 2:35 am
Try DIR *.* /s /n /b
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 3, 2012 at 7:52 am
Thanks for your helpful suggestion, using the Dir *.* /s /b /n , solves part of my problem, and redefines the problem as " how to remove partially duplicate records, using tsql?"
The result of the command looks something like this, any recommendations on how to remove the dups?
z:\images ==> need to remove (duplicate)
z:\images\12346 ==> need to remove (duplicate)
z:\images\12346\09132010_1151 09/13/2010 12:46 PM 14,934 12346_09132010_1151_5_i.bmp
z:\images\12346\09132010_1151 09/13/2010 12:46 PM 14,934 12346_09132010_1151_6_i.bmp
z:\images\505722 ==> need to remove (duplicate)
z:\images\505722\10252011_0849 10/25/2011 11:36 AM 56,200 505722_10252011_0849_35_0.d
z:\images\505722\10252011_0849 10/25/2011 11:36 AM 2,032,930 505722_10252011_0849_21_0.d
z:\images\505722\10252011_0849 10/25/2011 11:36 AM 14,934 505722_10252011_0849_12_i.b
z:\images\8921 ==> need to remove (duplicate)
z:\images\8921\04192011_1022 04/19/2011 01:36 PM 14,934 8921_04192011_1022_92_i.bmp
July 3, 2012 at 8:15 am
The problem is that you are thinking of them as "partial duplicates," which itself is impossible (IMHO). Why not look for the '.' in the file name, since that is what I presume is a defining characteristic of what you would like to keep in your set.
Jared
CE - Microsoft
July 3, 2012 at 8:33 am
Have a look at the output from this, it will point you in the direction. Come back if you have any trouble with it:
DROP TABLE #Filelist
CREATE TABLE #Filelist (DIRoutput VARCHAR(254))
INSERT INTO #Filelist
SELECT 'z:\images' UNION ALL
SELECT 'z:\images\12346' UNION ALL
SELECT 'z:\images\12346\09132010_1151 09/13/2010 12:46 PM 14,934 12346_09132010_1151_5_i.bmp' UNION ALL
SELECT 'z:\images\12346\09132010_1151 09/13/2010 12:46 PM 14,934 12346_09132010_1151_6_i.bmp' UNION ALL
SELECT 'z:\images\505722' UNION ALL
SELECT 'z:\images\505722\10252011_0849 10/25/2011 11:36 AM 56,200 505722_10252011_0849_35_0.d' UNION ALL
SELECT 'z:\images\505722\10252011_0849 10/25/2011 11:36 AM 2,032,930 505722_10252011_0849_21_0.d' UNION ALL
SELECT 'z:\images\505722\10252011_0849 10/25/2011 11:36 AM 14,934 505722_10252011_0849_12_i.b' UNION ALL
SELECT 'z:\images\8921' UNION ALL
SELECT 'z:\images\8921\04192011_1022 04/19/2011 01:36 PM 14,934 8921_04192011_1022_92_i.bmp'
SELECT a.*, b.*
FROM #Filelist a
LEFT JOIN #Filelist b
ON a.DIRoutput LIKE b.DIRoutput+'\%'
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 3, 2012 at 8:34 am
SQLKnowItAll (7/3/2012)
The problem is that you are thinking of them as "partial duplicates," which itself is impossible (IMHO). Why not look for the '.' in the file name, since that is what I presume is a defining characteristic of what you would like to keep in your set.
Risky, even if the search is confined to the tail end of the string.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 3, 2012 at 8:36 am
Hi Jared, thanks for your suggestion. The issue is that, there are no set patterns with the name of the data files. Some files have extensions and some don't. For example the file names below:
i:\PSDATA\Images\Nov2010\11_01_10
i:\PSDATA\Images\Nov2010\11_01_10\bake1
Z:\PSDATA\Images\Nov2010\11_01_10\bake1 11/01/2010 08:54 AM 20,449,708 file1 ==>file name w no ext.
Z:\PSDATA\Images\Nov2010\11_01_10\bake1 11/01/2010 08:54 AM 20,974,558 file10 ==>file name w no ext.
Z:\PSDATA\Images\Nov2010\11_01_10\bake1 11/01/2010 09:17 AM 20,712,502 file11 ==>file name w no ext.
And there is a different variation of file names that go like:
Z:\PSDATA\Images\Sep2005\09_29_05\jone1\thumbnails 05/12/2008 09:08 AM 4,039 file1.jpg
Z:\PSDATA\Images\Sep2005\09_29_05\jone1\thumbnails 05/12/2008 09:08 AM 4,175 file10.jpg
Here is the macro I used in excel that does the work but I run into a brick wall, when running it for big files, (do I need to write a program with a cursor to do the same?)
Option Explicit
Sub DeletePartialDups()
Dim rSrc As Range
Dim c As Range, r As Range
Dim vRes() As Range
Dim sFirstAdr As String
Dim i As Long
Set rSrc = Range("A1", Cells(Rows.Count, "A").End(xlUp))
ReDim vRes(0)
'Get list of cells to delete
For Each c In rSrc
sFirstAdr = c.Address
Set r = rSrc.Find(what:=c, after:=c, LookIn:=xlValues, _
lookat:=xlPart, MatchCase:=False)
If r.Address <> sFirstAdr Then
Set vRes(UBound(vRes)) = c
ReDim Preserve vRes(0 To UBound(vRes) + 1)
End If
Next c
Application.ScreenUpdating = False
For i = UBound(vRes) - 1 To 0 Step -1
vRes(i).Delete xlShiftUp
Next i
Application.ScreenUpdating = True
End Sub
July 3, 2012 at 8:44 am
jaytaba (7/3/2012)
Hi Jared, thanks for your suggestion. The issue is that, there are no set patterns with the name of the data files. Some files have extensions and some don't. For example the file names below:i:\PSDATA\Images\Nov2010\11_01_10
i:\PSDATA\Images\Nov2010\11_01_10\bake1
Z:\PSDATA\Images\Nov2010\11_01_10\bake1 11/01/2010 08:54 AM 20,449,708 file1 ==>file name w no ext.
Z:\PSDATA\Images\Nov2010\11_01_10\bake1 11/01/2010 08:54 AM 20,974,558 file10 ==>file name w no ext.
Z:\PSDATA\Images\Nov2010\11_01_10\bake1 11/01/2010 09:17 AM 20,712,502 file11 ==>file name w no ext.
And there is a different variation of file names that go like:
Z:\PSDATA\Images\Sep2005\09_29_05\jone1\thumbnails 05/12/2008 09:08 AM 4,039 file1.jpg
Z:\PSDATA\Images\Sep2005\09_29_05\jone1\thumbnails 05/12/2008 09:08 AM 4,175 file10.jpg
Here is the macro I used in excel that does the work but I run into a brick wall, when running it for big files, (do I need to write a program with a cursor to do the same?)
Option Explicit
Sub DeletePartialDups()
Dim rSrc As Range
Dim c As Range, r As Range
Dim vRes() As Range
Dim sFirstAdr As String
Dim i As Long
Set rSrc = Range("A1", Cells(Rows.Count, "A").End(xlUp))
ReDim vRes(0)
'Get list of cells to delete
For Each c In rSrc
sFirstAdr = c.Address
Set r = rSrc.Find(what:=c, after:=c, LookIn:=xlValues, _
lookat:=xlPart, MatchCase:=False)
If r.Address <> sFirstAdr Then
Set vRes(UBound(vRes)) = c
ReDim Preserve vRes(0 To UBound(vRes) + 1)
End If
Next c
Application.ScreenUpdating = False
For i = UBound(vRes) - 1 To 0 Step -1
vRes(i).Delete xlShiftUp
Next i
Application.ScreenUpdating = True
End Sub
Fair enough. Though now I see you have more problems than just importing and sorting this data :hehe:
Jared
CE - Microsoft
July 3, 2012 at 9:07 am
Try the following to get the list of files:
dir *.* /a: -D /b /s /n
NOTE: No space between the a: and the -D. I had to put the space there or you got this a:-D
July 3, 2012 at 9:13 am
Lynn Pettis (7/3/2012)
Try the following to get the list of files:
dir *.* /a: -D /b /s /n
NOTE: No space between the a: and the -D. I had to put the space there or you got this a:-D
+1 get the source data right! Nice Lynn!
Jared
CE - Microsoft
July 3, 2012 at 9:15 am
Lynn Pettis (7/3/2012)
Try the following to get the list of files:
dir *.* /a: -D /b /s /n
NOTE: No space between the a: and the -D. I had to put the space there or you got this a:-D
Spot on, nice job Lynn 😎
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 3, 2012 at 11:41 am
Lynn, nice just what I wanted.
Just curious, is there a switch that allows capturing of the file attributes (i.e. size, date)
Thanks again for saving me a lot of trouble.
Regards
Jay
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply