Get both Count and the data through single query to reduce time

  • My task includes:

    1) Getting the count(which is nothing but the total number of records present in the specific SQL table.As they are millions of rows,  it is taking a lot of time to run the count() query). I need to get the count before I write that data from data flow task(source) to the flat file destination tasks, as I need to write the total count of records as well in the flat file.

    2)I tried to use the RowCount task,and tried to save that rowcount and to further use that in the next task.But it doesn't work for me.

    3) As there are millions of rows in the sql table, it is taking 6 mins(for getting the data) and 6 mins(to get the count() of the records in before step). My manager really asked me to cut down the running time a lot. So, I am trying to run the query once for getting both the count and data(from data flow task to flat file destination).

    Can you please advise me on how step 3 works, like getting both at the same time and to reduce the running time for this whole task?

    I hope someone can help me out on this..

    Thanks in Advance!!

  • Have you tried getting the row count from the sys.partitions view?

    John

  • This is the fast way John Mitchell mentioned:

    SELECT SUM(P.[rows]) AS [RowCount]
    FROM sys.tables AS T
    INNER JOIN sys.partitions AS P
    ON P.[object_id] = T.[object_id]
    AND P.index_id IN (0, 1)
    WHERE T.name = N'myTable'
    AND SCHEMA_NAME(T.schema_id) = 'dbo'

    If you don't mind getting the count after you have run the query to get the rows, just run SELECT @@ROWCOUNT after running the select query.

  • Sorry to say this.

    Actually I am new to working on SQL and creating SSIS Packages. So, can you please explain that to me in a simplified way.

    If the table from which I need to get the data is '[dbo].[VW_TABLE_USERS],then as per your query, I will be giving the following command right:

    SELECT SUM(P.[rows]) AS [RowCount]

    FROM sys.tables AS T

    INNER JOIN sys.partitions AS P

    ON P.[object_id] = T.[object_id]

    AND P.index_id IN (0, 1)

    WHERE T.name = N'VW_TABLE_USERS'

    AND SCHEMA_NAME(T.schema_id) = 'dbo'

    Also can you please tell me the 'Data Access Mode' that I need to select:  'Table or view' OR 'table name or view name variable'?

    Thanks in Advance!!

     

    • This reply was modified 4 years, 8 months ago by  binds.
    Attachments:
    You must be logged in to view attached files.
  • I need to get the count(for header data) first as I need to write the header data(which includes count) first on the destination file and then the data.

  • binds wrote:

    Sorry to say this.

    Actually I am new to working on SQL and creating SSIS Packages. So, can you please explain that to me in a simplified way.

    If the table from which I need to get the data is '[dbo].[VW_TABLE_USERS],then as per your query, I will be giving the following command right:

    SELECT SUM(P.[rows]) AS [RowCount]

    FROM sys.tables AS T

    INNER JOIN sys.partitions AS P

    ON P.[object_id] = T.[object_id]

    AND P.index_id IN (0, 1)

    WHERE T.name = N'VW_TABLE_USERS'

    AND SCHEMA_NAME(T.schema_id) = 'dbo'

     

    Also can you please tell me the 'Data Access Mode' that I need to select:  'Table or view' OR 'table name or view name variable'?

    Thanks in Advance!! 

    SELECT SUM(P.[rows]) AS [RowCount]
    FROM sys.tables AS T
    INNER JOIN sys.partitions AS P
    ON P.[object_id] = T.[object_id]
    AND P.index_id IN (0, 1)
    WHERE T.name = N'VW_TABLE_USERS'
    AND SCHEMA_NAME(T.schema_id) = 'dbo'

    But from the table name it looks like it's a view not a table?

    It's just a select query so the same type of query you use for receiving the results.

  • John Mitchell-245523 wrote:

    Have you tried getting the row count from the sys.partitions view?

    John

    In this case, I think that's a bad idea because that may not match the actual number of rows actually being written out.  The only way it could be done this way is (maybe) exclusively lock the source table, take the count from sys.partitions, and then export the data from the locked table.  I'm not even sure of that because I'm not sure that sys.partitions is instantly updated if he count changes.

    {EDIT} Of course, that's not going to work here anyway because the source ISN'T a table to begin with... It's a VIEW!

    • This reply was modified 4 years, 8 months ago by  Jeff Moden.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • yes, It's a View.

  • Jeff Moden wrote:

    John Mitchell-245523 wrote:

    Have you tried getting the row count from the sys.partitions view?

    John

    In this case, I think that's a bad idea because that may not match the actual number of rows actually being written out.

    How so?  Maybe in SQL Server 2000, where the row_count column wasn't always up to date, but not in modern versions.  And he said he wants the number of rows in the table, not the number of rows that a particular query returns.  Now we've learned that it's not a table after all, it's a bit more difficult.  If he's just selecting the whole view, then probably choose SQL command in SSIS and use a ROLLUP clause in the SELECT statement in order to get the row count at the very end.

    John

  • John Mitchell-245523 wrote:

    Jeff Moden wrote:

    John Mitchell-245523 wrote:

    Have you tried getting the row count from the sys.partitions view?

    John

    In this case, I think that's a bad idea because that may not match the actual number of rows actually being written out.

    How so?  Maybe in SQL Server 2000, where the row_count column wasn't always up to date, but not in modern versions.

    Please cite the official MS document and paragraph that states that sys.partitions is up to date with the true/actual/instantly accurate rowcount at all times.  I ask because I double-checked the MS documentation for both sys.partitions and sys.dm_db_partition_stats and they both say the same thing (emphasis is mine)...

    rows bigint Indicates the approximate number of rows in this partition.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • binds wrote:

    2)I tried to use the RowCount task,and tried to save that rowcount and to further use that in the next task.But it doesn't work for me.

    3) As there are millions of rows in the sql table, it is taking 6 mins(for getting the data) and 6 mins(to get the count() of the records in before step). My manager really asked me to cut down the running time a lot. So, I am trying to run the query once for getting both the count and data(from data flow task to flat file destination).

    I see 2 problems here - the first problem is how to use RowCount in SSIS.  You stated it doesn't work for you - but we have no idea what that really means.  I think the problem is that the row count value will not be populated with an accurate value until *after* all rows have been moved through the pipeline.  Because of that - you cannot provide the row count as the first record of the file...

    The second problem is your query takes 6 minutes to run...and that should be addressed.

    Finally - I think you may want to include a count() over() - this will create a column that will have the total number of rows included in the query.  You could then use that column in the mapping for the header record.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Yes. I got to know that Rowcount variable will not be updated until the complete control flow completes.

    The sql table have millions of records.Is that the reason it is taking more time to run?

     

  • binds wrote:

    Yes. I got to know that Rowcount variable will not be updated until the complete control flow completes.

    The sql table have millions of records.Is that the reason it is taking more time to run?

    Maybe.  Maybe not.  You said you weren't reading from a table but a View.  The view could be the problem.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I can even work with a query instead of view.

    Again,I just want to confirm if working with a table runs fast than a view?

  • Jeff Moden wrote:

    John Mitchell-245523 wrote:

    Jeff Moden wrote:

    John Mitchell-245523 wrote:

    Have you tried getting the row count from the sys.partitions view?

    John

    In this case, I think that's a bad idea because that may not match the actual number of rows actually being written out.

    How so?  Maybe in SQL Server 2000, where the row_count column wasn't always up to date, but not in modern versions.

    Please cite the official MS document and paragraph that states that sys.partitions is up to date with the true/actual/instantly accurate rowcount at all times.  I ask because I double-checked the MS documentation for both sys.partitions and sys.dm_db_partition_stats and they both say the same thing (emphasis is mine)...

    rows bigint Indicates the approximate number of rows in this partition.

    OK, fair enough.  I'm guilty of making the assumption that because experience shows it works better than it used to, it is somehow "fixed".  My mistake.

    John

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply