SSIS Visual Studio 2022: Trying to avoid linked servers using SSIS

  • Hello SSC!

    First, I would like to thank everyone for their assistance with my posts. You folks are awesome and have helped me tremendously over the years!!

    Problem statement: I have an SSIS package that I am trying to optimize. The SOURCE and DESTINATION are on 2 different instances, and I am trying to avoid using a linked server. My company is moving away from Linked Servers for security and performance reasons.

    At the moment, the current package selects 5 days' worth of data and reprocess it every night even if the data has not changed. Very inefficient. The source table has a modified date that should be used in the WHERE clause of the source query, so we only select the data that we need.

    Current query in SSIS package...

    SELECT * FROM dbo.SQL_SOURCE WHERE Mod_Date >= Getdate() -5

    So, I used an execute SQL task to get the date into a variable in the parameter/variable section of the SQL Execution task. I convert here since the data type is datetime2(7) in the source table, and from what I understand, SSIS doesn't recognize datetime2 as a valid data type.

    declare @LastDate datetime 
    declare @LastDateChar varchar(19)

    SELECT @LastDate = max([date])
    FROM dbo.SQL_SOURCE
    WHERE PackageName = 'package'
    AND EventType = 'End'

    Select @LastDateChar = convert(varchar(19),@LastDate,120)

    Select ? = @LastDateChar

    Then I plug the variable into OLDB Source editor:  SQL Command from Variable. Quotes are there on purpose since this eliminated one of the error messages. Not sure why it is needed, but it looks like it is needed.

    "SELECT *  FROM dbo.SQL_SOURCE WHERE Mod_Date >= @[User::LastRunDate]"

    This is the error message that I am getting. I am not sure if my logic is flawed or if my syntax is flawed. Also, this package was not built for efficiency, so if anyone has a better design idea, please let me know. I can do what I wish as long as it doesn't involve a linked server.

    Exception from HRESULT: 0xC0202009
    Error at LOAD SQL SOURCE [2]]: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80040E14.
    An OLE DB record is available. Source: "Microsoft SQL Server Native Client 11.0" Hresult: 0x80040E14 Description: "Statement(s) could not be prepared.".
    An OLE DB record is available. Source: "Microsoft SQL Server Native Client 11.0" Hresult: 0x80040E14 Description: "Must declare the scalar variable "@".".

    Any assistance would be greatly appreciated!!

    Dave

     

     

    • This topic was modified 3 months, 2 weeks ago by  Lord Slaagh.

    The are no problems, only solutions. --John Lennon

  • I suggest not working with partial days.  Therefore, I suggest changing the first query to:

    SELECT * FROM dbo.SQL_SOURCE WHERE Mod_Date >= CAST(CAST(Getdate() AS date) AS datetime) -5

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Not sure if I posted this on the correct forum. I see an SSIS forum all the way at the bottom of the Forums list. Should I move this? Not sure how.

    Thank you.

    The are no problems, only solutions. --John Lennon

  • One of the reasons a package will be built to look back more than a single day is to cover any days where the package run is missed.  On the destination side - you can always create an upsert/merge process off a staging table to identify rows that need to be updated, inserted or ignored.

    One trick to do that is using EXCEPT to identify the rows in a CTE as the source to MERGE.  When MATCHED - update and when NOT MATCHED insert.

    You can do the same thing using a standard update/insert - but you would still need either EXCEPT or a check for any column differences.

    Regardless, you should always make sure any data extracts are pulling the data based on either full days - or based on the last time the process ran.  Using GETDATE() without setting the time to midnight can and will miss rows if that step does not start at *exactly* the same time every time it is executed.

    Some questions to consider - is extracting a single day much faster than extracting 5 days?  Is the comparison between the 5 days of data and the destination taking a long time?  Where are the performance challenges in the current process?  Is it the extract - the transformations (if any) or the load?  Is the load attempting to insert directly into the destination - or are you using staging tables?

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Create another variable, called SQLCommand (or whatever). Define SQLCommand in the Variables section as

    SQLCommand = "SELECT * FROM dbo.SQL_SOURCE WHERE Mod_Date >= " + @[User::LastRunDate]

    and use SQLCommand as your query variable.

    I'd also remove the wildcard and explicitly name the columns you are selecting, otherwise the whole thing will explode if any columns are added to the source table.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Hey Phil,

    Thank you. I was missing the plus sign!!! 🙂

    It was a silly syntax error!

    Thank you all for your answers.

    Jeffrey Williams - Thank you so much for the logic and performance pointers. I will definitely research this. I forgot about EXCEPT.

    The are no problems, only solutions. --John Lennon

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply