How to implement the threading in storeprocedures without duplicate

  • Hi All,

        My table have 10000 records and i create the 5 store procedures and my 5 applications exec the same time and each application load the records 2000 .it is working fine but the problem is when run 1 application it load the 2000 records and take 5 min then i run the 2 application it load 1900 records .Here the problem is 1 application are working with duplication records.How to load the 2 application 10000-2000=8000 in threading application.my sp like this below

    ;with P as
    (
    select *,ROW_NUMber() over(order by id) as rownum,
    ntile(5) over(order by id) as thread
    from [Records]
    where not exists(
                    select name from
                    [Data]
                    where
                    Convert(varchar(10),date,103) = @date
                    and
                    [Records].name = [Data].name             
    )
    select name from P where thread = 1

    Regards
    Pols

  • polo.csit - Thursday, July 26, 2018 3:20 AM

    Hi All,

        My table have 10000 records and i create the 5 store procedures and my 5 applications exec the same time and each application load the records 2000 .it is working fine but the problem is when run 1 application it load the 2000 records and take 5 min then i run the 2 application it load 1900 records .Here the problem is 1 application are working with duplication records.How to load the 2 application 10000-2000=8000 in threading application.my sp like this below

    ;with P as
    (
    select *,ROW_NUMber() over(order by id) as rownum,
    ntile(5) over(order by id) as thread
    from [Records]
    where not exists(
                    select name from
                    [Data]
                    where
                    Convert(varchar(10),date,103) = @date
                    and
                    [Records].name = [Data].name             
    )
    select name from P where thread = 1

    Regards
    Pols

    Why do you need to do any sort of threading when there are only 10,000 rows?

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • my table not 10000 records it is 400000 records...

  • polo.csit - Thursday, July 26, 2018 3:20 AM

    Hi All,

        My table have 10000 records and i create the 5 store procedures and my 5 applications exec the same time and each application load the records 2000 .it is working fine but the problem is when run 1 application it load the 2000 records and take 5 min then i run the 2 application it load 1900 records .Here the problem is 1 application are working with duplication records.How to load the 2 application 10000-2000=8000 in threading application.my sp like this below

    ;with P as
    (
    select *,ROW_NUMber() over(order by id) as rownum,
    ntile(5) over(order by id) as thread
    from [Records]
    where not exists(
                    select name from
                    [Data]
                    where
                    Convert(varchar(10),date,103) = @date
                    and
                    [Records].name = [Data].name             
    )
    select name from P where thread = 1

    Regards
    Pols

    Part of the problem with speed here is the use of a function on a column in your WHERE clause.   Change the @date parameter to be of the same data type as the date column in your table, and then be sure there's an index on that table with the date column as the first column, and possibly the name column as the 2nd one, and you will probably run  fast enough that 400,000 rows won't be a problem.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • There's no way to 'thread' within a stored procedure. You can have multiple procedures that each work on a subset of the data, run them from jobs or different connections.

    That said, what are you trying to achieve?  What's the root problem that you're trying to solve here?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • From a functional point of view - the way you're trying to bucket into multiple groups isn't "stable":  if you ever changed the data size, then the bucket assignments will start shifting from one bucket into another.

    Example:  if you had 1000 rows with id's 1-1000, using NTILE(5)
    rows
    1-200 -> bucket 1
    201-400-> bucket 2
    401-600 -> bucket 3
    601-800->bucket 4
    801-1000-> bucket 5
    once you add ANOTHER 100 to your mix, then
    1-400-> bucket 1
    401-800-> bucket 2
    801-1200 -> bucket 3
    1201-1600->bucket 4
    1601-2000-> bucket 5
    Notice that your "new" bucket 1 now holds all of buckets 1 and 2 from the first run, etc....  If you're trying to split work up this way - that's NOT the right splitting strategy.   Try 
    id % 5
    instead (that % represents MODULO here)

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (4) - Thursday, July 26, 2018 8:03 AM

    From a functional point of view - the way you're trying to bucket into multiple groups isn't "stable":  if you ever changed the data size, then the bucket assignments will start shifting from one bucket into another.

    Example:  if you had 1000 rows with id's 1-1000, using NTILE(5)
    rows
    1-200 -> bucket 1
    201-400-> bucket 2
    401-600 -> bucket 3
    601-800->bucket 4
    801-1000-> bucket 5
    once you add ANOTHER 100 to your mix, then
    1-400-> bucket 1
    401-800-> bucket 2
    801-1200 -> bucket 3
    1201-1600->bucket 4
    1601-2000-> bucket 5
    Notice that your "new" bucket 1 now holds all of buckets 1 and 2 from the first run, etc....  If you're trying to split work up this way - that's NOT the right splitting strategy.   Try 
    id % 5
    instead (that % represents MODULO here)

    Indeed. Better to have a semi-static BucketList (heh!) table, which is updated from time to time outside of the 'multi-threaded' process:

    1-400  -> bucket 1
    401-800  -> bucket 2
    801-1200   -> bucket 3
    1201-1600  ->bucket 4
    1601-∞  -> bucket 5

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • sgmunson - Thursday, July 26, 2018 6:40 AM

    polo.csit - Thursday, July 26, 2018 3:20 AM

    Hi All,

        My table have 10000 records and i create the 5 store procedures and my 5 applications exec the same time and each application load the records 2000 .it is working fine but the problem is when run 1 application it load the 2000 records and take 5 min then i run the 2 application it load 1900 records .Here the problem is 1 application are working with duplication records.How to load the 2 application 10000-2000=8000 in threading application.my sp like this below

    ;with P as
    (
    select *,ROW_NUMber() over(order by id) as rownum,
    ntile(5) over(order by id) as thread
    from [Records]
    where not exists(
                    select name from
                    [Data]
                    where
                    Convert(varchar(10),date,103) = @date
                    and
                    [Records].name = [Data].name             
    )
    select name from P where thread = 1

    Regards
    Pols

    Part of the problem with speed here is the use of a function on a column in your WHERE clause.   Change the @date parameter to be of the same data type as the date column in your table, and then be sure there's an index on that table with the date column as the first column, and possibly the name column as the 2nd one, and you will probably run  fast enough that 400,000 rows won't be a problem.

    here speed is not a problem..the problem is duplicate loading in threads.

  • polo.csit - Thursday, July 26, 2018 9:44 AM

    There speed is not a problem..the problem is duplicate loading in threads.

    Please answer Gail's question.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • polo.csit - Thursday, July 26, 2018 9:44 AM

    sgmunson - Thursday, July 26, 2018 6:40 AM

    polo.csit - Thursday, July 26, 2018 3:20 AM

    Hi All,

        My table have 10000 records and i create the 5 store procedures and my 5 applications exec the same time and each application load the records 2000 .it is working fine but the problem is when run 1 application it load the 2000 records and take 5 min then i run the 2 application it load 1900 records .Here the problem is 1 application are working with duplication records.How to load the 2 application 10000-2000=8000 in threading application.my sp like this below

    ;with P as
    (
    select *,ROW_NUMber() over(order by id) as rownum,
    ntile(5) over(order by id) as thread
    from [Records]
    where not exists(
                    select name from
                    [Data]
                    where
                    Convert(varchar(10),date,103) = @date
                    and
                    [Records].name = [Data].name             
    )
    select name from P where thread = 1

    Regards
    Pols

    Part of the problem with speed here is the use of a function on a column in your WHERE clause.   Change the @date parameter to be of the same data type as the date column in your table, and then be sure there's an index on that table with the date column as the first column, and possibly the name column as the 2nd one, and you will probably run  fast enough that 400,000 rows won't be a problem.

    here speed is not a problem..the problem is duplicate loading in threads.

    Really?   Then why are you needing to split it up into batches?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • polo.csit - Thursday, July 26, 2018 3:20 AM

    Hi All,

        My table have 10000 records and i create the 5 store procedures and my 5 applications exec the same time and each application load the records 2000 .it is working fine but the problem is when run 1 application it load the 2000 records and take 5 min then i run the 2 application it load 1900 records .Here the problem is 1 application are working with duplication records.How to load the 2 application 10000-2000=8000 in threading application.my sp like this below

    ;with P as
    (
    select *,ROW_NUMber() over(order by id) as rownum,
    ntile(5) over(order by id) as thread
    from [Records]
    where not exists(
                    select name from
                    [Data]
                    where
                    Convert(varchar(10),date,103) = @date
                    and
                    [Records].name = [Data].name             
    )
    select name from P where thread = 1

    Regards
    Pols

    I had to do something similar once when sending data to a CMS system over the internet. To get enough rows sent I had to create multiple processes to read the rows as sending each row over the internet to the CMS system had a long wait for the response but if sent in parallel I could increase the rate at which they were sent. So if you are doing what I think you are doing then you can split the rows using the T-SQL mod operator: '%' using some code similar to this:
    DECLARE @ThreadToFind as int
    SET @ThreadToFind=1

    ;with P as
    (
    select *,ROW_NUMber() over(order by id) as rownum,
    id%5 as thread
    from [Records]
    where not exists(select name from [Data] where Convert(varchar(10),date,103) = @date
          and [Records].name = [Data].name
    )
    select name
    from P
    where thread = @ThreadToFind

    This will split the rows into 5 different threads (0 to 4) and you won't get duplicates as it only relies on the value of the Id column to split the rows.

  • duplicate records are loaded in threads...

  • polo.csit - Friday, July 27, 2018 4:30 AM

    duplicate records are loaded in threads...

    You've said this before. Why are you refusing to answer our questions?

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Jonathan AC Roberts - Thursday, July 26, 2018 3:25 PM

    polo.csit - Thursday, July 26, 2018 3:20 AM

    Hi All,

        My table have 10000 records and i create the 5 store procedures and my 5 applications exec the same time and each application load the records 2000 .it is working fine but the problem is when run 1 application it load the 2000 records and take 5 min then i run the 2 application it load 1900 records .Here the problem is 1 application are working with duplication records.How to load the 2 application 10000-2000=8000 in threading application.my sp like this below

    ;with P as
    (
    select *,ROW_NUMber() over(order by id) as rownum,
    ntile(5) over(order by id) as thread
    from [Records]
    where not exists(
                    select name from
                    [Data]
                    where
                    Convert(varchar(10),date,103) = @date
                    and
                    [Records].name = [Data].name             
    )
    select name from P where thread = 1

    Regards
    Pols

    I had to do something similar once when sending data to a CMS system over the internet. To get enough rows sent I had to create multiple processes to read the rows as sending each row over the internet to the CMS system had a long wait for the response but if sent in parallel I could increase the rate at which they were sent. So if you are doing what I think you are doing then you can split the rows using the T-SQL mod operator: '%' using some code similar to this:
    DECLARE @ThreadToFind as int
    SET @ThreadToFind=1

    ;with P as
    (
    select *,ROW_NUMber() over(order by id) as rownum,
    id%5 as thread
    from [Records]
    where not exists(select name from [Data] where Convert(varchar(10),date,103) = @date
          and [Records].name = [Data].name
    )
    select name
    from P
    where thread = @ThreadToFind

    This will split the rows into 5 different threads (0 to 4) and you won't get duplicates as it only relies on the value of the Id column to split the rows.

    Hi Jonathan AC Roberts,

     i am implement the id%5 but some records are not load the in thread(storeprocedure).

  • polo.csit - Friday, July 27, 2018 5:59 AM

    Jonathan AC Roberts - Thursday, July 26, 2018 3:25 PM

    polo.csit - Thursday, July 26, 2018 3:20 AM

    Hi All,

        My table have 10000 records and i create the 5 store procedures and my 5 applications exec the same time and each application load the records 2000 .it is working fine but the problem is when run 1 application it load the 2000 records and take 5 min then i run the 2 application it load 1900 records .Here the problem is 1 application are working with duplication records.How to load the 2 application 10000-2000=8000 in threading application.my sp like this below

    ;with P as
    (
    select *,ROW_NUMber() over(order by id) as rownum,
    ntile(5) over(order by id) as thread
    from [Records]
    where not exists(
                    select name from
                    [Data]
                    where
                    Convert(varchar(10),date,103) = @date
                    and
                    [Records].name = [Data].name             
    )
    select name from P where thread = 1

    Regards
    Pols

    I had to do something similar once when sending data to a CMS system over the internet. To get enough rows sent I had to create multiple processes to read the rows as sending each row over the internet to the CMS system had a long wait for the response but if sent in parallel I could increase the rate at which they were sent. So if you are doing what I think you are doing then you can split the rows using the T-SQL mod operator: '%' using some code similar to this:
    DECLARE @ThreadToFind as int
    SET @ThreadToFind=1

    ;with P as
    (
    select *,ROW_NUMber() over(order by id) as rownum,
    id%5 as thread
    from [Records]
    where not exists(select name from [Data] where Convert(varchar(10),date,103) = @date
          and [Records].name = [Data].name
    )
    select name
    from P
    where thread = @ThreadToFind

    This will split the rows into 5 different threads (0 to 4) and you won't get duplicates as it only relies on the value of the Id column to split the rows.

    Hi Jonathan AC Roberts,

     i am implement the id%5 but some records are not load the in thread(storeprocedure).

    I would create one stored procedure that takes a parameter @ThreadToFind then each process that calls it would call the stored procedure with the thread it wants (0 to 4).
    I think all data should be returned as long as you are calling the stored procedure with all the threads: 0, 1, 2, 3, 4
    You need to provide a more information on what it's not returning and ideally a sample script to illustrate if you want a more specific answer..

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply