Select from view on FileTable throws error

  • I have a view that computes some fields, using SubString and CharIndex. When I do a Select *, it works great. When I add some conditions, specifying conditions for the COMPUTED FIELDS, the query bombs claiming that invalid arguments were passed to the SubString function.

     

    Those computed fields all get computed correctly - there are conditions in the view to eliminate values that would make those functions bomb, so the computed fields are only executed on values that allow proper completion of the computed value. However, when I put a where clause to the select that extracts what I want from the view, it seems like it tries to compute for ALL rows in the source table, and only afterwards apply those conditions specified in the view that are what lets the view work at all.

     

    Is there any way to make the view behave properly? I need it to apply the conditions specified in the view itself, THEN do the computations, and THEN apply the conditions that I specify in my select from the view.

     

    I don't know if it makes any difference, but the initial view selects from a FileTable in a different database in the same instance on the same server, with access to it via a Synonym pointing into that other database. The other database contains ONLY FileTables, so all code, including the base view are in the database that contains all the code, and is the one directly accessed by the user app.

  • This happens because the conditions cause a different execution plan to be generated - and that new plan performs those calculations before applying the where clause.  That means rows where the computed column cannot execute successfully are processed - and thus, you get an error.

    The fix is to correct the computed column so it works for all rows.  That means, if the source column doesn't contain the character(s) you are looking for using CHARINDEX - you either return the full string or you return a NULL or some other value.
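    As a sketch of that guard (column and delimiter names are invented here, not taken from the actual view), NULLIF can turn a CHARINDEX miss into a NULL so SUBSTRING never receives an invalid length:

    ```sql
    -- Hypothetical column name. NULLIF turns a CHARINDEX miss (0) into NULL,
    -- so SUBSTRING receives a NULL length and returns NULL instead of erroring.
    SELECT SUBSTRING(name, 1, NULLIF(CHARINDEX(';', name), 0) - 1) AS prefix_part
    FROM dbo.SomeTable;
    ```

    With this shape the expression is safe for every row, so it no longer matters where in the plan the computation runs.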

    Remember, a view is not materialized - it is incorporated into the outer query and then an execution plan is generated.  SQL Server has determined that reading all of the data from the underlying source tables and then filtering the results is the better option.

    In these cases, I would not use the view - instead I would write the full query using the view as a starting point and then incorporate the additional logic.  That would allow me to do something like:

    SELECT computed_column = SUBSTRING(v.source_column, 1, p1.pos - 1)
    FROM ...
    CROSS APPLY (VALUES (concat(source_column, ','))) AS v(source_column)
    CROSS APPLY (VALUES (CHARINDEX(',', v.source_column, 1))) AS p1(pos)

    I can even add an additional CROSS APPLY to calculate the computed column - and then use that computed column in the SELECT or WHERE as needed.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • I see. That makes sense, but my source view is rather complicated. The base view picks out all the stuff that is acceptable, and the application is then supposed to select what it wants, but ONLY FROM the restricted set offered. Putting all the initial filtering conditions into the application would make it extremely messy. Also, I have a general protocol that the application is allowed to access only views and stored procedures, which are all in various schemas, and access by various users is dictated by schemas, restricted by their membership in the organization's domain.

    Your suggestion would also require making accessible all the fields that my base view removes, and which the application (theoretically) knows nothing about, although I have written the entire system solo, so it's not a matter of a DBA trying to keep a check-rein on a developer.

    Is there really no way to force the engine to do things the way I want? I realize that arguing with the optimizer is generally a bad idea, but I have a fair amount of work invested in this separation of tasks - first restriction by the view, presenting to the app only what I want the app to select from, then having the app make a secondary choice from that. It doesn't seem to me such an outlandish requirement, and I am not especially concerned with performance in this case.

    I wonder, would putting the source view in the FileTable database maybe interrupt the optimization process? Would reaching into a different database disrupt things enough that it would not try to help me, but take the view from the external database as is?

  • The Where clause in the base view is the most complex part of the whole thing. I am looking at various filenames and picking out those I want. Things like Thumbs.db and Desktop.ini are not acceptable names. What is acceptable takes quite a bit of testing, since the rules for filenames in this app are rather flexible. One of the conditions is

    AND((name LIKE '[0-9];[0-9][0-9][0-9][0-9][^0-9]%')
    OR (name LIKE '[0-9][0-9];[0-9][0-9][0-9][0-9][^0-9]%')
    OR (name LIKE '[0-9][0-9][0-9];[0-9][0-9][0-9][0-9][^0-9]%')
    OR (name LIKE '[0-9][0-9][0-9][0-9];[0-9][0-9][0-9][0-9][^0-9]%')
    OR (name LIKE '[0-9][0-9][0-9][0-9][0-9];[0-9][0-9][0-9][0-9][^0-9]%'))

    and there are several like that. Trying to put all that into the app is something I would like to avoid if at all possible, especially since the [name] column is not even one of the fields I expose to the app.

    And I don't really get why the new plan would do these computations. Is not a major point of a Where clause to restrict what the engine needs to work on? Why would the optimizer generate a plan that does a bunch of calculations whose results are only going to be discarded? There are thousands of posts on the net advising people to NOT put functions in their Where clauses for exactly that reason. It seems completely fractured for the optimizer to do just that.

  • Well, I did it with a stored procedure, but it takes some extra dicking around in the app, because it caches the results. I have to do some extra clearing and re-specifying the record source, because just requerying the source does not re-read in the updated FileTable. At least it works, but I still think there should be some way to make my original configuration with the view work properly.

  • If I'm reading all of this correctly, another issue with filtering the calculated columns of a view is that all of the rows must first be materialized and then the criteria applied.  That's not going to scale so well.

    I have, in the past, converted views to iTVFs (inline Table Valued Functions), which can be thought of as "parameterized views", but not all views can be converted.

    There's also the possibility of creating an "Indexed View", which would automatically be kept up to date via a Clustered Index, but that has some serious limitations on what the code can contain and, unless it's a pre-aggregation, it's a wanton duplication of data.
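    For the iTVF route, a hedged sketch (table and column names invented, not the poster's actual schema) might look like this - note the guard is still needed inside the function, because an inline TVF is expanded into the calling statement just like a view:

    ```sql
    -- Sketch only: the view's body moved into an inline TVF, so the caller's
    -- filter arrives as a parameter inside the single compiled statement.
    CREATE FUNCTION dbo.fnFilteredFiles (@prefix varchar(20))
    RETURNS TABLE
    AS
    RETURN
        SELECT f.name,
               SUBSTRING(f.name, 1, NULLIF(CHARINDEX(';', f.name), 0) - 1) AS tmpResult1
        FROM dbo.MyFileTable AS f
        WHERE NULLIF(CHARINDEX(';', f.name), 0) IS NOT NULL   -- guarded, safe for all rows
          AND SUBSTRING(f.name, 1, NULLIF(CHARINDEX(';', f.name), 0) - 1) = @prefix;
    GO

    -- Usage: SELECT * FROM dbo.fnFilteredFiles('1234');
    ```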

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    If I'm reading all of this correctly, another issue with filtering the calculated columns of a view is that all of the rows must first be materialized and then the criteria is applied.  That's not going to scale so well.

    So it seems, and I don't understand why that is happening. It seems completely back-asswards to me. I would expect the view, which contains restrictive conditions, to be filtered first, THEN generate the computed columns. Why compute columns when you already know you're to discard the results?

    There's also the possibility of creating an "Indexed View", which would automatically be kept up to date via a Clustered Index, but that has some serious limitations on what the code can contain and, unless it's a pre-aggregation, it's a wanton duplication of data.

    I looked at that as well, but have not been able to do it. Schemabinding is not permitted on the view as I have written it. Although, now that I think about it, I might be able to create a filtered and persisted view that contains only the conditions, and add a second layer with the computed columns selecting from that. It is not a huge table, so the issue of 'wasted' disk space is not a problem.
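    A minimal sketch of that two-layer idea, assuming invented names (`MyFileTable`, `vFilteredFiles`) and that the layer-1 view meets the indexed-view restrictions (SCHEMABINDING, two-part names, no unions, deterministic expressions):

    ```sql
    -- Layer 1: a schemabound view that only filters, so it can be persisted.
    CREATE VIEW dbo.vFilteredFiles
    WITH SCHEMABINDING
    AS
    SELECT stream_id, name
    FROM dbo.MyFileTable
    WHERE name <> N'Thumbs.db'
      AND name <> N'Desktop.ini'
      AND name LIKE '[0-9]%;%';
    GO

    -- A unique clustered index is what materializes the view's rows.
    CREATE UNIQUE CLUSTERED INDEX IX_vFilteredFiles
        ON dbo.vFilteredFiles (stream_id);
    GO

    -- Layer 2: computations run over the already-persisted rows. NOEXPAND
    -- tells the optimizer to read the indexed view's stored data as-is.
    SELECT name,
           SUBSTRING(name, 1, CHARINDEX(';', name) - 1) AS tmpResult1
    FROM dbo.vFilteredFiles WITH (NOEXPAND);
    ```

    With NOEXPAND, the computation can only ever see rows that survived the layer-1 filter, which is the ordering the poster wanted.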

  • That did it. I put the view into the FileTable database and changed it around a bit to eliminate Unions. It's massively ugly now, but getting rid of unions and synonym references allowed me to create an index on it, making it persisted. My synonym in the main database now refers to this view instead of the actual FileTable, and all works as it should. Where clauses on queries of this linked table (view) in the app filter data properly.

     

    I still think it's screwy that SQL Server made a query plan that does column computations BEFORE removing rows according to conditions in the Where clause. Performing calculations when you KNOW in advance that some of the results of those calculations will be discarded unused seems the height of absurdity. If anyone has any thoughts on why the engine does this, I'd love to hear them.

  • pdanes wrote:

    I still think it's screwy that SQL Server made a query plan that does column computations BEFORE removing rows according to conditions in the Where clause.

    For calculated columns in a view, the calculations must be executed to produce the values before any filter on those values can be applied to determine which rows match the desired criteria.  How could it do otherwise?

    The only way to effectively pre-filter on a normal view is to filter on columns that are not calculated.  It's just like having SARGable criteria.

    --Jeff Moden



  • Jeff Moden wrote:

    For calculated columns in a view, the calculations must be executed to produce the values before any filter on those values can be applied to determine which rows match the desired criteria.  How could it do otherwise?

    The only way to effectively pre-filter on a normal view is to filter on columns that are not calculated.  It's just like having SARGable criteria.

    But that's exactly what I do. The base view selects strictly on values from the table. I didn't even think it would be possible to do otherwise.

    My base query looks like (aircode):

    Select Compute1(Fld1) as tmpResult1, Compute2(Fld2) as tmpResult2, Compute3(Fld3) as tmpResult3
    From Table
    Where Fld1 = 'A' and Fld2 = 'B' And Fld3 = 'C'

    This works fine, and also works when I select the view with no conditions:

    Select * From Query1

    But it blows up when I try:

    Select * From Query1 Where tmpResult1 = 'D'

    I could see such a where clause not working in the base, since tmpResult1 has yet to be computed - how could you put a condition to test something before you have it? Or can you? I've never tried - it seems obvious to me that it could not work, but maybe I'm not giving the engine enough credit for being able to do weird things.

    But the second query should be processing the RESULTS of the first query, and by that time, the computed values should exist. And the bad values should NOT exist - the base query is supposed to filter them out, precisely to keep this from happening.

    It seems the engine is putting together my select from a select into one overall select, and getting confused about what will be available when. Isn't there some way to force what I want? I seem to remember something like a NoExpand hint is supposed to tell the engine to make queries behave as written, instead of trying to be smart about it, joining the two queries into one, and winding up with a mess like this, where it is trying to compute something I explicitly exclude from the computations.
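    For reference, the hint mentioned here does exist, but it only applies to indexed views - on a plain view it raises an error rather than changing the plan. Assuming Query1 had been made an indexed view (which requires a unique clustered index and SCHEMABINDING), the usage would look like:

    ```sql
    -- NOEXPAND makes the optimizer read the indexed view's stored rows
    -- instead of expanding the view definition into the outer query.
    SELECT *
    FROM dbo.Query1 WITH (NOEXPAND)
    WHERE tmpResult1 = 'D';
    ```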

  • pdanes wrote:

    But it blows up when I try:

    Select * From Query1 Where tmpResult1 = 'D'

    That, good Sir, is what I'm talking about being the problem.  That is making for basically a non-SARGable query.  tmpResult1 has to be calculated for all rows in the view before it can be filtered.

    --Jeff Moden



  • Jeff Moden wrote:

    pdanes wrote:

    But it blows up when I try:

    Select * From Query1 Where tmpResult1 = 'D'

    That, good Sir, is what I'm talking about being the problem.  That is making for basically a non-SARGable query.  tmpResult1 has to be calculated for all rows in the view before it can be filtered.

    But why? tmpResult1 should be calculated ONLY for those rows that the base view permits to exist at all. The base query permits only those values for which tmpResult1 can be properly calculated. THEN those calculated values should be filtered further, in the second step of selecting from the first query, rather than directly from the base table.

    The base table contains all sorts of ballast that makes the calculations blow up, which is why I filter FIRST - to get rid of all the non-calculable junk - and the second step should be selecting only from the acceptable, properly calculated results. What is happening is the query-from-query is bypassing my initial (and critical) filtering conditions, trying to calculate values which I explicitly tell it to NOT use. I want that filtering to happen as the first step. I already know that these rows contain unusable values. I'm making a point of telling the engine that those rows are to be excluded, always, *-before-* it does anything else, and it's ignoring what I tell it.

  • You still seem to think SQL Server should materialize the 'view' before applying the filter from the outer/calling query.  SQL Server does not do that - it compiles the view and the outer query into a single statement and then generates the execution plan.

    There is no 'base view' that only permits specific rows.  The view is not materialized - it is not static - and SQL Server will not guarantee that it will process the resulting query in any specific order.  SQL Server can push filtering to any portion of the process depending on what it has determined is the best plan.

    The real fix to this issue is to make sure each computed column doesn't fail, regardless of the where clause used to filter out the 'bad' rows that cause the computed columns to generate an error.

    As soon as you add additional criteria against that view - whether that is a join or in the where clause - SQL Server will generate a plan specific to that query.  In this case, that caused SQL Server to process the computed columns prior to applying the filters embedded in the view - and that caused the failure.

    For SUBSTRING with CHARINDEX - the easiest way to ensure you get a value is to make sure the search character(s) used for CHARINDEX appear for every row.  One way to do that is to use CROSS APPLY with VALUES and concatenate the search character(s) to the end of the column.

    CROSS APPLY (VALUES (concat(t.column, ','))) AS v(column)

    This then allows you to use SUBSTRING(v.column, 1, CHARINDEX(',', v.column, 1)) - and this will work because all rows contain the delimiter and you won't get an invalid length passed to SUBSTRING.  If you don't have that search character in the column - CHARINDEX returns a 0 which is invalid.
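    Putting the two pieces of that advice together (table and column names invented for illustration), the guarded pattern looks like:

    ```sql
    -- Sketch with an invented table: appending the delimiter guarantees
    -- CHARINDEX never returns 0, so SUBSTRING never gets a negative length.
    SELECT t.name,
           SUBSTRING(v.padded, 1, p.pos - 1) AS first_part
    FROM dbo.SomeTable AS t
    CROSS APPLY (VALUES (CONCAT(t.name, ';'))) AS v(padded)
    CROSS APPLY (VALUES (CHARINDEX(';', v.padded))) AS p(pos);
    ```

    Because the expression is now safe for every row, the optimizer is free to evaluate it before or after any filter without causing an error.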

     

    Jeffrey Williams

  • Okay, thanks. I guess this is one of the pitfalls of using a declarative language instead of a procedural one. I still have trouble getting my head around it sometimes, but I have to use the tool as it works, not as I wish it did.

    There is more to the situation than just the single character looked for by CharIndex. I have used the trick of gluing on a trailing character to make sure a search doesn't fail, but this needs much more than that. The details are not especially interesting, but in brief, filenames must match a specific sort of mask, of characters and numbers in a specific pattern. I cannot control what sort of filenames people might add to this system, but I need to pick out those that match this pattern for display by this query. There are other procedures that comb for non-matching names and announce them elsewhere so they can be dealt with offline, but I have no mechanism for real-time blocking of 'bad' names, and building one would be insanely complicated, so I have to deal with any text string that can possibly be a filename, even including things like Cyrillic characters. That is why I do the primary filtering first, to get rid of anything that doesn't match the pattern, because the number of ways in which it can match or fail to match the pattern is large.

     

    In any case, I got it working the way I want with the materialized view in the external database, which does the filtering first, and I could always write a stored procedure if I got really stuck. So I have a working setup, and I learned something in the process. Appreciate the time and explanations.

