Using CTE with Parameters in OLEDB Data Source in SSDT (or why is SSDT so braindead?)

  • <FRUSTRATED RANT>
    Another title for this post could be "Why is SSDT so braindead?" or "Does Microsoft have a clue about the needs of SSDT developers?" 

    It's ***software*** Microsoft - you could make this work if you invested a bit of thought and development effort into SSDT.

    I really *want* to like SSDT, but run into crap like this almost daily requiring a convoluted workaround...
    </FRUSTRATED RANT>

    The below query works fine in SSMS.  It also works as a view, but would require a linked server.  But it fails miserably in SSDT as an OLEDB data source.  Any ideas how to get this to work?

    I could split the extract and deduping into two separate steps using an intermediate table, but am trying to improve performance by having it all execute on the source server before sending the results to the target server.

    (My actual query doesn't use SELECT *, I'm just trying to save hundreds of lines in this post.)

    -- Delta source table extract, new or changed rows between start and end dates
    WITH cteDelta AS (
    SELECT *
    FROM [dbo].[EPISODE]
    WHERE (
    (? <= [valid_to_date] AND [valid_to_date] < ?)
    OR
    (? <= [valid_from_date] AND [valid_from_date] < ?)
    )
    )
    ,
    -- Dedup, only want the most recent row within the keys
    cteDedup AS (
    SELECT *
    FROM (
      SELECT *, ROW=ROW_NUMBER() OVER (
       PARTITION BY facility_identifier, stay_number, episode_sequence_number
       ORDER BY valid_from_date DESC
      )
      FROM cteDelta
    ) x
    WHERE ROW=1
    )
    SELECT *
    FROM cteDedup

  • Nesting the CTE doesn't work either:

    SELECT *
    FROM (
      SELECT *, ROW=ROW_NUMBER() OVER (
       PARTITION BY facility_identifier, stay_number, episode_sequence_number
       ORDER BY replica_valid_from_date DESC
      )
      FROM (
       SELECT *
       FROM [dbo].[EPISODE]
       WHERE (
       (? <= [replica_valid_to_date] AND [replica_valid_to_date] < ?)
       OR
       (? <= [replica_valid_from_date] AND [replica_valid_from_date] < ?)
       )
      ) x
    ) x
    WHERE ROW=1

    The error message says put the entire block of code in a variable - also doesn't work.

  • So I found this SO post:  https://stackoverflow.com/questions/42782464/error-in-ssis-parameters-cannot-be-extracted-when-parsing-query-syntax-error

    Which echoes my original sentiment:  SSDT is braindead.

  • Have you tried building the SQL in an expression and using that as your data source (ie, no parameters)?
    You might have more success if you explicitly name your columns too.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Could be due to the type of your parameters, if they are not text/string types. Also could be that the variables used do not have any values when the package goes through the pre-validation phase. 

    I agree with Phil though. In these situations I prefer to use expressions and build the query that way, making it easier to see and troubleshoot the query.

  • Have you tried making it an inline table-valued function?
    I don't get why would you need a linked server to make it a view or how can you send parameters to a CTE inside a view. The problem might be on how you're calling this, but I have no idea of what you're doing.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares - Thursday, March 8, 2018 9:35 AM

    Have you tried making it an inline table-valued function?
    I don't get why would you need a linked server to make it a view or how can you send parameters to a CTE inside a view. The problem might be on how you're calling this, but I have no idea of what you're doing.

    I'd rather wrap it in a parameterized stored proc before creating a function...

  • Martin Schoombee - Thursday, March 8, 2018 9:40 AM

    Luis Cazares - Thursday, March 8, 2018 9:35 AM

    Have you tried making it an inline table-valued function?
    I don't get why would you need a linked server to make it a view or how can you send parameters to a CTE inside a view. The problem might be on how you're calling this, but I have no idea of what you're doing.

    I'd rather wrap it in a parameterized stored proc before creating a function...

    Why?

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares - Thursday, March 8, 2018 9:45 AM

    Martin Schoombee - Thursday, March 8, 2018 9:40 AM

    Luis Cazares - Thursday, March 8, 2018 9:35 AM

    Have you tried making it an inline table-valued function?
    I don't get why would you need a linked server to make it a view or how can you send parameters to a CTE inside a view. The problem might be on how you're calling this, but I have no idea of what you're doing.

    I'd rather wrap it in a parameterized stored proc before creating a function...

    Why?

    For the same reasons you would otherwise use stored procedures to logically store and execute business logic.

  • You might have to put this in a stored procedure and EXECUTE WITH RESULTS SETS where you define the schema expected. Here is MS's example

    EXEC uspGetEmployeeManagers 16
    WITH RESULT SETS

     ([Reporting Level] int NOT NULL,
      [ID of Employee] int NOT NULL,
      [Employee First Name] nvarchar(50) NOT NULL,
      [Employee Last Name] nvarchar(50) NOT NULL,
      [Employee ID of Manager] nvarchar(max) NOT NULL,
      [Manager First Name] nvarchar(50) NOT NULL,
      [Manager Last Name] nvarchar(50) NOT NULL )
    );

  • Thanks for all the replies.  I've been pretty busy so apologies for the delayed follow up post.

    "Have you tried building the SQL in an expression and using that as your data source (ie, no parameters)?" 
    I started trying this, but the text of the code was so long that I bailed on this approach.  Another equally viable question would be "Is Microsoft capable of writing a competent code parser?"

    "You might have more success if you explicitly name your columns too."
    That is not germane to my main issue.  Besides, I said this in my OP:  "(My actual query doesn't use SELECT *, I'm just trying to save hundreds of lines in this post.)"

    "Could be due to the type of your parameters, if they are not text/string types."
    The parameters are DATETIME2, read from a query against a parameters table.  So the parameters (variables) are stored as System.Object.

    "Also could be that the variables used do not have any values when the package goes through the pre-validation phase."
    They don't have values during the design phase, but have values during execution.  However, braindead SSDT won't let me save the code if it can't work out the parameters (see below).

    "In these situations I prefer to use expressions and build the query that way, making it easier to see and troubleshoot the query."
    Again, see below.  I assert that your preference is merely because SSDT doesn't allow you to do what you want.  As you'll see below, the query is not complex at all.  And I choose not to put hundreds of lines in the expression editor, only to replace my parameters via dynamic SQL, merely because SSDT is so braindead.  Using the "?" as placeholders for the parameter values is exactly what that functionality is designed for.  You're merely manually coding what should be working functionality in SSDT.

    "Have you tried making it an inline table-valued function?"
    "I'd rather wrap it in a parameterized stored proc before creating a function..."

    No.  But isn't this just another kludge to work around SSDT's lack of functionality?

    "I don't get why would you need a linked server to make it a view or how can you send parameters to a CTE inside a view. The problem might be on how you're calling this, but I have no idea of what you're doing."
    My source table is on a remote server.  I have no rights on that server - I can only read data, can't create views, sp's, functions, etc.
    I'm doing a delta extract, selecting rows that have changed between a sliding start and end date (these dates are my parameters)
    Once the data is extracted to my local server, where I have more rights, I can use views, sp's, functions, etc. against that data.
    The source table is SCD2, so there are a lot of history rows within the natural keys.  I'm only concerned with the latest row within the natural keys.  If I could dedup the data at the remote server, I think I'd get a performance gain, instead of pulling rows to my local server, then deduping it there.
    If the remote server was a linked server, I'd have more options on my local server, like creating a view, etc.  Instead, I'm using SSDT's capability of OLEDB source on one server, OLEDB target on another server, with the data "flowing" through the server on which the package is executing.
    The remote server is also more powerful than my local server, so I'd like to do as much processing there as possible (i.e. dedup there, then pull the reduced data across the network).

    "...how can you send parameters to a CTE inside a view..."
    It's not a CTE instead a view, it's a CTE as part of a query within an OLEDB data source.  And you can't send parameters to a CTE, since SSDT is so braindead (see below).

    Here's the "see below" part / example:

    So I created a simplified example:

    SQL Task:
    SELECT DataExtract_StartTime, DataExtract_EndTime FROM meta.Parameter_Values
    Single row ResultSet
    Save the values to DataExtract_StartTime and DataExtract_EndTime variables (System.object) respectively

    Data Flow Task:
    OLEDB Source:
    SELECT 1 AS ROW_COUNT
    FROM EPISODE
    WHERE (? <= replica_valid_from_date and replica_valid_to_date < ?)
    Map the parameters to DataExtract_StartTime and DataExtract_EndTime respectively

    Connect it to a Row Count task for testing.

    Test it, it works fine

    Now, change the query in the OLEDB Source to:
    WITH cteSSDTIsBrainDead AS (
    SELECT 1 AS ROW_COUNT
    FROM EPISODE
    WHERE (? <= replica_valid_from_date and replica_valid_to_date < ?)
    )
    SELECT ROW_COUNT FROM cteSSDTIsBrainDead

    I can't even save this query, I get this error:

    Source: "Microsoft SQL Server Native Client 11.0" Hresult: 0x80004005 Description: "Syntax error, permission violation, or other nonspecific error".

    If I replace the ? with '2015-12-01 00:00:00' and '2016-01-01 00:00:00', I can run the 2nd query just fine in SSMS on the remote server.

    OK, so maybe a CTE confuses the OLEDB data source.  But certainly a nested query should work, right???
    SELECT 1 AS ROW_COUNT
    FROM (
    SELECT 1 AS ROW_COUNT
    FROM EPISODE
    WHERE (? <= replica_valid_from_date and replica_valid_to_date < ?)
    )

    Same braindead results.

    How bloody hard is it to write a code parser that 1) finds question marks, 2) replaces that placeholder with the value of the parameters (with any requisite type conversions), 3) submit that code on the selected server?

    Perhaps it would be a bit tricky if a question mark was in the text, say in a where clause?  So, how bloody hard would it be for SSDT to say "I can't parse this query, would you like to manually define your parameters?".  Then, allow me to manually define the parameters corresponding to the questions marks, then run the code?

    How bloody hard would it be for Microsoft to actually test SSDT under common use cases, and fix these design bugs???

    How bloody hard would it be for Microsoft to actually put themselves in the shoes of an SSDT developer, and create well-designed, fully functional software?

    As far as my actually problem...I've bailed on trying to performance tune my processing.  I "download" the full delta extract (i.e. data between start and end dates, but with duplicate SCD2 history) to a staging table on my local server, then I dedup it there.  At least SSDT understands the parameters in the where clause of a very basic SELECT query.

    I'm open to all the ideas suggested above - I certainly don't have the SQL Server experience of most folks on this forum.  And I'm grateful for the ideas, and for the time it took you to share those ideas.  But, speaking strictly objectively here, most of those ideas just sound like klunky workarounds to SSDT's shortcomings, and that you would use a different approach if SSDT wasn't so braindead.

    Hopefully Microsoft will improve/fix SSDT in a future release.

  • In my opinion, embedding complex SQL logic in an SSIS package is a bad idea. Because when it comes to considering changes to a database (for example refactoring, new features, whatever), it's very easy to forget about code which resides outside of the database (in SSIS, SSRS, Windows apps, Web apps). Even if you remember, the exercise of changing the database can suddenly increase significantly in scope, because of all of the different areas in which changes need to be made.
    If, instead, you keep all complex logic inside the database, in the form of views/procs etc, this problem is easier to manage.
    That is why I create a proc for anything more complex than a simple SELECT, when coding in SSIS.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Phil Parkin - Friday, March 16, 2018 5:20 AM

    In my opinion, embedding complex SQL logic in an SSIS package is a bad idea. Because when it comes to considering changes to a database (for example refactoring, new features, whatever), it's very easy to forget about code which resides outside of the database (in SSIS, SSRS, Windows apps, Web apps). Even if you remember, the exercise of changing the database can suddenly increase significantly in scope, because of all of the different areas in which changes need to be made.
    If, instead, you keep all complex logic inside the database, in the form of views/procs etc, this problem is easier to manage.
    That is why I create a proc for anything more complex than a simple SELECT, when coding in SSIS.

    Have to agree with Phil here; almost all of my SSRS and SSIS packages use SP's for their interactions with the database. This is especially more important as our main database isn't maintained externally and DDL changes will and do happen without any prior notification (and break things). If you have the option of creating this within the database, I strongly suggest you do; you'll probably find the issue resolves itself.

    I might have missed it in a later post, but you state in your original post that the SQL fails miserably; I can't see you explain what "failing miserably" means. I'm guessing, perhaps, you mean it performs badly, or perhaps you're saying that the SQL doesn't actually work (at all).

    Your posted SQL also probably doesn't show what your actual query really is. You mention that you would need to use a linked server, but I can't see any reference to that in the query you have. At a guess, however, is the problem that you can't make changes to the server the data is hosted in, and those you're using different server to create objects (such as SPs) that can?

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Having had problems myself using Temporary Tables and CTE's with SSIS your best option is to create a stored procedure and use EXECUTE WITH RESULTSETS. I believe the problem is that when using a CTE SSIS is not able to determine the Schema hence you have to define it in your Execute statement.

  • @All:
    From my post above:
    "My source table is on a remote server. I have no rights on that server - I can only read data, can't create views, sp's, functions, etc."
    And where do you suggest I store these views, stored procedures, etc???????

    @Phil Parkin:
    "...embedding complex SQL logic in an SSIS package is a bad idea."
    Please answer whether you think these queries are simple or complex.  A yes or no answer is fine (and preferred):

    1) SELECT * FROM MYTABLE
    2) SELECT * FROM MYTABLE WHERE valid_from_date BETWEEN ? AND ?  (where ? is an SSDT parameter)
    3) SELECT x.* FROM (SELECT *, ROW=ROW_NUMBER() OVER (PARTITION BY whatever ORDER BY whatever) FROM MYTABLE WHERE valid_from_date BETWEEN ? AND ?) AS x WHERE ROW=1

    IMO, the correct answer is yes for all three.  If you think a simple nested query with a windowing function is "complex", well, I don't know what to say on that.  BTW, all queries would work fine - SSDT wouldn't consider it "complex" - if I just remove the parameters.

    @Thom_A:
    "Have to agree with Phil here; almost all of my SSRS and SSIS packages use SP's for their interactions with the database."
    How do you handle the situation where you need to extract data from ServerA (source) to ServerB (target), and you have no rights on ServerA (other than reading the data)?
    BTW, once my data is on my local server, all the rest of my SSDT processing is using views and stored procedures to encapsulate the required logic.  SSDT is just the "glue" or "plumbing" to package up and execute the code.

    "I'm guessing, perhaps, you mean it performs badly, or perhaps you're saying that the SQL doesn't actually work (at all)."
    I think I explained this clearly in my later post.  The SQL doesn't actually work at all (even complains when saving it).

    "You mention that you would need to use a linked server,"
    Forget I ever mentioned linked server.
    IIIIIFFFFF I had a linked server, one approach (not saying I would use it) would be to use the linked server + a view on my local server to encapsulate the logic and to pull the data across the network (via the linked server) to my local server. 

    @Tim.ffitch:
    "your best option is to create a stored procedure and use EXECUTE WITH RESULTSETS"
    See above re: access rights on the remote server.
    I'll read up on EXECUTE WITH RESULTSETS...thanks.

    "I believe the problem is that when using a CTE SSIS is not able to determine the Schema hence you have to define it in your Execute statement."
    I disagree.  I tested this on a small table using SQL Command in the OLEDB data source:

    WITH cteLastRow AS (
    SELECT *
    FROM (
    SELECT *, ROW=ROW_NUMBER() OVER (
    PARTITION BY facility_identifier
    ORDER BY replica_valid_from_date DESC)
    FROM dbo.FACILITY) AS src
    WHERE ROW=1
    )
    SELECT * FROM cteLastRow

    This works fine.

    BTW, THIS IS JUST AN EXAMPLE.  Phil, if you comment about SELECT *, or it's too complex, or whatever, I'll "scream".  It's just an example that SSDT and OLEDB data source can handle such a hideously complex query as the data source.

    But this fails:

    WITH cteLastRow AS (
    SELECT * 
    FROM (
    SELECT *, ROW=ROW_NUMBER() OVER (
    PARTITION BY facility_identifier
    ORDER BY replica_valid_from_date DESC)
    FROM dbo.FACILITY) AS src
    WHERE ROW=1
    )
    SELECT * FROM cteLastRow WHERE replica_valid_from_date > ?

    Doesn't even save.

    I'm happy to receive further comments, but I remain convinced SSDT is braindead with respect to parameter parsing.  In fact, while I appreciate all the comments, not a single one addressed my assertion that SSDT parameter parsing is broken.  Instead, they all essentially say "don't do that", rather than addressing the subject as stated.

    I'm thinking this post is about run its course...

    I'll Google how to enter bugs on SSDT (I think it's Microsoft Connect???) and see if MS have anything to say (not holding my breath on that one).

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply