Row count difference between flat file source and sql table

  • I have an SSIS package that imports flatFile data to a table. The package has error output for flat file and oledb destination. The execution results show the flafile to have 176,863 rows and writes the same number to the destination but when I query the table there are only 93,456 rows in the table. Can anyone explain this?

    [Source - scorecardts_tab [1]] Information: The total number of data rows processed for file "C:\Scorecardapp Test\scorecardts.tab" is 176864.

    [FlatFile Errors [2210]] Information: The final commit for the data insertion in "component "FlatFile Errors" (2210)" has started.

    [FlatFile Errors [2210]] Information: The final commit for the data insertion in "component "FlatFile Errors" (2210)" has ended.

    [Destination - SCORECARDAPP_TEST_NEW [1094]] Information: The final commit for the data insertion in "component "Destination - SCORECARDAPP_TEST_NEW" (1094)" has started.

    [Destination - SCORECARDAPP_TEST_NEW [1094]] Information: The final commit for the data insertion in "component "Destination - SCORECARDAPP_TEST_NEW" (1094)" has ended.

    [Scorecardapp Table Errors [1652]] Information: The final commit for the data insertion in "component "Scorecardapp Table Errors" (1652)" has started.

    [Scorecardapp Table Errors [1652]] Information: The final commit for the data insertion in "component "Scorecardapp Table Errors" (1652)" has ended.

    [SSIS.Pipeline] Information: Post Execute phase is beginning.

    Progress: Post Execute - 0 percent complete[Source - scorecardts_tab [1]] Information: The processing of file "C:\Scorecardapp Test\scorecardts.tab" has ended.

    Progress: Post Execute - 100 percent complete[SSIS.Pipeline] Information: "component "Destination - SCORECARDAPP_TEST_NEW" (1094)" wrote 176863 rows.

    [SSIS.Pipeline] Information: "component "Scorecardapp Table Errors" (1652)" wrote 0 rows.

    [SSIS.Pipeline] Information: "component "FlatFile Errors" (2210)" wrote 0 rows.

  • Are there duplicate rows (by PK) in the source data?

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • No

    I have a sequential numbering for the first column (APPNUMBER) and they are all in the table as well as the flat file.

  • I think I know why the count does not match.

    When I exported the data from the sql table (93,456 rows) to a txt file and then imported it to excel the count was 176,863 so all of the data is there. My guess is that sql stores data more efficiently than a (.csv,tab or txt) file. That would explain why the SSIS execution stated it wrote 176,863 rows. If anyone has ever seen documentation on this please share it.

    Thanks for you help and suggestions!

    Doug

  • jdbrown239 (4/10/2013)


    I think I know why the count does not match.

    When I exported the data from the sql table (93,456 rows) to a txt file and then imported it to excel the count was 176,863 so all of the data is there. My guess is that sql stores data more efficiently than a (.csv,tab or txt) file. That would explain why the SSIS execution stated it wrote 176,863 rows. If anyone has ever seen documentation on this please share it.

    Thanks for you help and suggestions!

    Doug

    That sounds unlikely to me. 93,456 rows in SQL Server should equate to 93,456 in a text file and in Excel.

    I would be looking VERY closely at what is going on. Check the original text file in an editor that gives you a row count. If the row count is 176,863, something has gone wrong in the export from SQL Server (or did you perhaps append rows to a file which was not empty?).

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • The original file has a row count of 176,864 and even though the row count from sql is 93,456 when I export to txt file and import to excel for row count it is 176,864. Any ideas why?

  • jdbrown239 (4/10/2013)


    The original file has a row count of 176,864 and even though the row count from sql is 93,456 when I export to txt file and import to excel for row count it is 176,864. Any ideas why?

    None.

    Do not involve Excel if you can avoid it - stick with text files. Excel and SSIS do not always work well together.

    Are you saying that you can export from a SQL table containing 93,456 rows to a text file, and that this text file will then contain 176,864 rows?

    If so, stop right there and do some analysis - this should not be happening. I suggest that you use something like Notepad ++ rather than Notepad when checking the text file.

    Find a row that is in the text file and not in SQL Server and work out why.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • After some more digging I don't think the problem is with the bulk load from the SSIS package. I think the row count in the database is the issue. Exporting the data showed me it was all there and the execution stated how many rows were written. I think the row count (Select column_name = count(*) from table_name) is giving the wrong count due to some sql bug even afetr I updated the stats on the table. Has anyone seen this before?

  • I have a problem with 176,864 rows (text file) -> 93,456 rows (SQL table) -> 176,864 rows (text file). Something does not add up here, but since we can't see what you see there really isn't much we can do.

  • Look for the line breaks in the export. That will cause the difference in count.

  • DevDB (4/11/2013)


    Look for the line breaks in the export. That will cause the difference in count.

    excellent point, something like this would be five rows in a text editor, but one row of data, if the row terminator was \n or some other custom terminator

    'A string{cr}

    with Carriage Returns{cr}

    that finally ends {cr}

    with a{cr}

    {cr}{lf}'

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Line breaks meaning carriage returns. I have an SSIS package that dumps out tables from a database into flat files and I have to verify the counts in the files match the table sizes. My package creates SQL statements for each table. For each field that is of type char, varchar, text, etc., I replace the carriage returns with some other marker. (We use ^^).

    The query for a single table looks like:

    select field1, replace(field2, Chr(13) & char(10), '^^'), field3...from myTable

    The package then opens up each sql statement and a single script task generates the output data (as opposed to using a data flow task).

    Just to be thorough, once the output files are created I have another script task that opens each one, counts the lines and compares the line count to what is expected. If a line count is off for a single file the package fails and my log table has an entry indicating which file's count was off.

  • Mystery solved: I was not getting an accurate row count using

    select APPNUMBER = COUNT(*) from SCORECARDAPP_TEST _new

    When querying the dmv I was able to get the exact count as the flat file.

    SELECT SUM (row_count)

    FROM sys.dm_db_partition_stats

    WHERE object_id=OBJECT_ID('SCORECARDAPP_TEST_new')

    AND (index_id=0 or index_id=1);

    Thanks Everyone!

    Doug

  • Nice package. Do you have the code posted some where?

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply