June 5, 2003 at 10:13 am
I was wanting to know if it's possible to query a text file to get a count of how many rows are the same in query analyzer.
Does anyone know if this is possible.
Not sure I am in the right area.
Thanks
June 5, 2003 at 2:00 pm
It would be easy if it were xml, but I am guessing it isn't. What format is it in?
June 5, 2003 at 5:11 pm
Try the following for a case-insensitive query against c:\temp\example.txt (assuming your SQL server is case-insensitive):
SELECT
COUNT(*)-1 AS Duplications,
TextLine
FROM
(
SELECT
F1 AS TextLine
FROM OpenDataSource(
'Microsoft.Jet.OLEDB.4.0',
'Data Source="C:\TEMP\";Extended properties="Text;HDR=NO"')...Example#Txt
) TextInfo
GROUP BY TextLine
HAVING COUNT(*) > 1
COMPUTE SUM(COUNT(*)-1)
Or, for a case-sensitive query, try:
SELECT
COUNT(*)-1 AS Duplications,
TextLine,
Checksum
FROM
(
SELECT
F1 AS TextLine,
BINARY_CHECKSUM (*) AS Checksum
FROM OpenDataSource(
'Microsoft.Jet.OLEDB.4.0',
'Data Source="C:\TEMP\";Extended properties="Text;HDR=NO"')...Example#Txt
) TextInfo
GROUP BY TextLine, Checksum
HAVING COUNT(*) > 1
COMPUTE SUM(COUNT(*)-1)
For my testing, c:\temp\example.txt contained:
This line appears twice (with different CASE)
This line appears 4 times
This line appears 3 times
This line appears 2 times
This line appears 1 time
This line appears 4 times
This line appears 3 times
This line appears 2 times
This line appears 4 times
This line appears 3 times
This line appears 4 times
This line appears twice (with different case)
Cheers,
- Mark
June 5, 2003 at 7:11 pm
Thanks, I will give that a shot. I have created a stored procedure that took care of the problem (had to import to a database but) this looks like a much easier solution.
One other question. When you do count(*) - 1 I don't want to subtract any thing from the count so do I just leave it off.
June 5, 2003 at 8:32 pm
The "count(*) - 1" bit is simply my interpretation of what is a "duplicate". If a line appears 4 times, then methinks we have 1 original plus 3 duplicates. If you interpret this as 4 duplicates, then yes leave the "- 1" off.
Cheers,
- Mark
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply