Query Txt file with Query Analyzer

  • I was wanting to know if it's possible to query a text file to get a count of how many rows are the same in query analyzer.

    Does anyone know if this is possible.

    Not sure I am in the right area.

    Thanks

  • It would be easy if it were xml, but I am guessing it isn't. What format is it in?

  • Try the following for a case-insensitive query against c:\temp\example.txt (assuming your SQL server is case-insensitive):

     
    
    SELECT
    COUNT(*)-1 AS Duplications,
    TextLine
    FROM
    (
    SELECT
    F1 AS TextLine
    FROM OpenDataSource(
    'Microsoft.Jet.OLEDB.4.0',
    'Data Source="C:\TEMP\";Extended properties="Text;HDR=NO"')...Example#Txt
    ) TextInfo
    GROUP BY TextLine
    HAVING COUNT(*) > 1
    COMPUTE SUM(COUNT(*)-1)

    Or, for a case-sensitive query, try:

     
    
    SELECT
    COUNT(*)-1 AS Duplications,
    TextLine,
    Checksum
    FROM
    (
    SELECT
    F1 AS TextLine,
    BINARY_CHECKSUM (*) AS Checksum
    FROM OpenDataSource(
    'Microsoft.Jet.OLEDB.4.0',
    'Data Source="C:\TEMP\";Extended properties="Text;HDR=NO"')...Example#Txt
    ) TextInfo
    GROUP BY TextLine, Checksum
    HAVING COUNT(*) > 1
    COMPUTE SUM(COUNT(*)-1)

    For my testing, c:\temp\example.txt contained:

    This line appears twice (with different CASE)

    This line appears 4 times

    This line appears 3 times

    This line appears 2 times

    This line appears 1 time

    This line appears 4 times

    This line appears 3 times

    This line appears 2 times

    This line appears 4 times

    This line appears 3 times

    This line appears 4 times

    This line appears twice (with different case)


    Cheers,
    - Mark

  • Thanks, I will give that a shot. I have created a stored procedure that took care of the problem (had to import to a database but) this looks like a much easier solution.

    One other question. When you do count(*) - 1 I don't want to subtract any thing from the count so do I just leave it off.

  • The "count(*) - 1" bit is simply my interpretation of what is a "duplicate". If a line appears 4 times, then methinks we have 1 original plus 3 duplicates. If you interpret this as 4 duplicates, then yes leave the "- 1" off.


    Cheers,
    - Mark

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply