record loading

  • I have a table from a Linked Server that I'm pulling data and would like to know if there is a more efficient way based upon this table design. I need to full reload and not use the "t_stamp >= DATEDIFF_BIG(MS, '19691231 20:00:00', GETDATE()) - (1000 * 60 * 30))" in the where clause.

    I know what the min and max for t_stamp is for the table, but that is such a large pull I was wondering if I could break up the select

    and grab say 50,000 records between the min and max until no more records to process. The select is doing and Insert into a staging table for other processing.

    Thanks, and if more info is needed ..

     

    Select 
    distinct ql.Line,ql.Tag_Description,sd.floatvalue as 'CaptureValue',DATEADD(s,t_stamp/1000,'1969-12-31 20:00:00') as 'DateRecorded'
    from
    SM_Query_Lookups ql,
    rem.db1.dbo.sqlt_data_1_2021_08 sd
    where
    (t_stamp >= DATEDIFF_BIG(MS, '19691231 20:00:00', GETDATE()) - (1000 * 60 * 30))
    and ql.Tag_ID = sd.tagid

     

    CREATE TABLE [dbo].[sqlt_data_1_2021_08](
    [tagid] [int] NOT NULL,
    [intvalue] [bigint] NULL,
    [floatvalue] [float] NULL,
    [stringvalue] [nvarchar](255) NULL,
    [datevalue] [datetime] NULL,
    [dataintegrity] [int] NULL,
    [t_stamp] [bigint] NOT NULL,
    PRIMARY KEY CLUSTERED
    (
    [tagid] ASC,
    [t_stamp] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    GO
  • Take a look at https://techcommunity.microsoft.com/t5/sql-server-blog/sql-server-2016-minimal-logging-and-impact-of-the-batchsize-in/ba-p/385537. You may be able to take advantage of minimal logging to bulk load the data.

    BCP or bulk insert are an option for loading a table from a file exported out of the source database using bcp. If the table is very large, you can use batch size to control transaction log growth.

    How many rows in the source table ?

  • The source table has 2 billion but not all records are chosen since it does a lookup ql.Tag_ID = sd.tagid

    Any example of a script to do it in batches of 50K and using the min\max of T_Stamp as the driver?

    THanks.

  • Bruin wrote:

    The source table has 2 billion but not all records are chosen since it does a lookup ql.Tag_ID = sd.tagid

    Any example of a script to do it in batches of 50K and using the min\max of T_Stamp as the driver?

    THanks.

    I don't know which table is the linked server table but joining to it means that all rows must be examined.  You need something more discreet.

    You, apparently, have an "accidental" many-to-many join going on and you have the need to use DISTINCT because of that problem.

    Consider using OPENQUERY so the query will run on the remote server and give it some decent criteria to pull the data with.  It'll run a lot fast.

    I also recommend that you avoid using the old fashion method of joining in whatever fix you come up with.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This is the table name from the Linked Server --          rem.db1.dbo.sqlt_data_1_2021_08

    SM_Query_Lookups tbl is on the local server .. The remote server contains all of the values I'm using for my insert to Staging table on

    local server..

    • This reply was modified 2 years ago by  Bruin.
  • I don't know the data well enough to say with conviction, but there has to be some criteria that you could use in an open query, other than a join, to bring the linked server data over in a hurry and the further filter it with a local join.

    Of course, with the way things have been going today, I might just be low on caffeine.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • In my original query I was using this on linked server

    t_stamp >= DATEDIFF_BIG(MS, '19691231 20:00:00', GETDATE()) - (1000 * 60 * 30)) along with the local join..

     

     

  • I'm having syntax problems doing OpenQuery with pulling values..

    DATEADD(s,t_stamp/1000,'1969-12-31 20:00:00') as 'DateRecorded'

  • Gosh, Bruin... how do you expect anyone to help with such little information?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sorry Jeff.

    I have been working with open query trying to get the values I need for my Insert into Staging Table. The query I have is returning some of my values but not getting the local table values.

     

    Insert Into Tag_Data_Capture_staging

    (EquipID,TagName,CaptureValue,DateRecorded)

    Values needed from query

    ql.Line,ql.Tag_Description,sd.floatvalue as 'CaptureValue', DATEADD(s,t_stamp/1000,'1969-12-31 20:00:00') as 'DateRecorded'

    Also, If I know what the min and max for t_stamp is for the remote table and seeing how the remote table is large I was wondering if I could break up the select and grab say 50,000 records between the min and max until no more records to process. Not sure how to incorporate that into script.

    SELECT 
    *
    FROM OPENQUERY (srv1, 'SELECT tagid,floatvalue,DATEADD(s,t_stamp/1000,''1969-12-31 20:00:00'') as ''DateRecorded''
    FROM rem.dbo.sqlt_data_4_2022_12') sd
    left
    JOIN process.dbo.SM_Query_Lookups ql
    on ql.Tag_ID = rem.tagid
    OPTION (MAXDOP 1);
  • I'm sure you have heard of

    • Dynamic SQL
    • Synonyms
    • where field in (a,b,c,d)

    All the above can help you with

    • Build a query that fetches x number of rows only
    • prevent the bad practice you have of using 4 names on your objects
    • deal with the fact that in many many cases a join from a local table to a remote table MAY result on the full remote table being copied locally before the join is applied. (you can see if this is the case by looking at the explain plan in both local and remote servers when your query is executing)

    the remaining work (e.g. build the process) its really up to you as you are the one getting paid to do this work.

  • Bruin wrote:

    Sorry Jeff.

    I have been working with open query trying to get the values I need for my Insert into Staging Table. The query I have is returning some of my values but not getting the local table values.

    Insert Into Tag_Data_Capture_staging (EquipID,TagName,CaptureValue,DateRecorded)

    Values needed from query

    ql.Line,ql.Tag_Description,sd.floatvalue as 'CaptureValue', DATEADD(s,t_stamp/1000,'1969-12-31 20:00:00') as 'DateRecorded'

    Also, If I know what the min and max for t_stamp is for the remote table and seeing how the remote table is large I was wondering if I could break up the select and grab say 50,000 records between the min and max until no more records to process. Not sure how to incorporate that into script.

    SELECT 
    *
    FROM OPENQUERY (srv1, 'SELECT tagid,floatvalue,DATEADD(s,t_stamp/1000,''1969-12-31 20:00:00'') as ''DateRecorded''
    FROM rem.dbo.sqlt_data_4_2022_12') sd
    left
    JOIN process.dbo.SM_Query_Lookups ql
    on ql.Tag_ID = rem.tagid
    OPTION (MAXDOP 1);

    You're still joining to the OPENQUERY.  That means the OPENQUERY has to read all the rows from the linked server.  There's also no criteria on the linked table.  How many rows are you retrieving from the linked table?

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • How many relevant unique tag_ids are there? Are they relatively static, or are new ones inserted continually?

    Can't you insert those into a table on the linked server (or just hardcode in an in clause if very few) so that you can avoid the join back to the local table? Even if rapidly changing, pushing those to the linked server if necessary could probably still be relatively quick if indexed correctly.

  • there are over 500 unique tag_ids

  • Insert Into Tag_Data_Capture_staging
    (TagName,CaptureValue,DateRecorded)
    select *
    from OPENQUERY(srv1, 'select tagid,floatvalue,DATEADD(s,t_stamp/1000,''1969-12-31 20:00:00'') from rem.ignition.dbo.sqlt_data_1_2021_12
    where DATEADD(s,t_stamp/1000,''1969-12-31 20:00:00'') between ''12-09-2021'' AND ''12-10-2021'' and floatvalue is not null');

    I did the Insert to local table and used criteria to narrow down data that is retrieved.  DO you see any better way to do the Open Query ?

    Thanks for your replies.

Viewing 15 posts - 1 through 15 (of 55 total)

You must be logged in to reply to this topic. Login to reply