Query Txt file with Query Analyzer

Question

Post reply

Query Txt file with Query Analyzer

scrocker

SSC Veteran

Points: 231
More actions
June 5, 2003 at 10:13 am

#106676

I was wanting to know if it's possible to query a text file to get a count of how many rows are the same in query analyzer.
Does anyone know if this is possible.
Not sure I am in the right area.
Thanks

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

tkbr0wn Ten Centuries Points: 1092 More actions · Answer 1

It would be easy if it were xml, but I am guessing it isn't. What format is it in?

mccork SSC-Insane Points: 22288 More actions · Answer 2

Try the following for a case-insensitive query against c:\temp\example.txt (assuming your SQL server is case-insensitive):

 

SELECT 

    COUNT(*)-1 AS Duplications,

    TextLine

FROM

(

    SELECT

        F1 AS TextLine

    FROM OpenDataSource( 

        'Microsoft.Jet.OLEDB.4.0',

        'Data Source="C:\TEMP\";Extended properties="Text;HDR=NO"')...Example#Txt

) TextInfo

GROUP BY TextLine

HAVING COUNT(*) > 1

COMPUTE SUM(COUNT(*)-1)

Or, for a case-sensitive query, try:

 

SELECT 

    COUNT(*)-1 AS Duplications,

    TextLine,

    Checksum

FROM

(

    SELECT

        F1 AS TextLine,

        BINARY_CHECKSUM (*) AS Checksum

    FROM OpenDataSource( 

        'Microsoft.Jet.OLEDB.4.0',

        'Data Source="C:\TEMP\";Extended properties="Text;HDR=NO"')...Example#Txt

) TextInfo

GROUP BY TextLine, Checksum

HAVING COUNT(*) > 1

COMPUTE SUM(COUNT(*)-1)

For my testing, c:\temp\example.txt contained:

This line appears twice (with different CASE)

This line appears 4 times

This line appears 3 times

This line appears 2 times

This line appears 1 time

This line appears 4 times

This line appears 3 times

This line appears 2 times

This line appears 4 times

This line appears 3 times

This line appears 4 times

This line appears twice (with different case)

Cheers,
- Mark

scrocker SSC Veteran Points: 231 More actions · Answer 3

Thanks, I will give that a shot. I have created a stored procedure that took care of the problem (had to import to a database but) this looks like a much easier solution.

One other question. When you do count(*) - 1 I don't want to subtract any thing from the count so do I just leave it off.

mccork SSC-Insane Points: 22288 More actions · Answer 4

The "count(*) - 1" bit is simply my interpretation of what is a "duplicate". If a line appears 4 times, then methinks we have 1 original plus 3 duplicates. If you interpret this as 4 duplicates, then yes leave the "- 1" off.

Cheers,
- Mark