Weird problem with substring function over 'text' column

  • In the query below, I get the Invalid length parameter passed to the LEFT or SUBSTRING function error, but can't figure out why.

    Query:

    select
    table1.Id,
    substring(
    table2.source_data,
    charindex('<Node1>',table2.source_data) + 7,
    charindex('</Node1>',table2.source_data) - charindex('<Node1>',table2.source_data) - 7
    ) as X
    from
    table1
    inner join table2 on (table2.Id = table1.Id)
    where
    (table1.in_timestamp >= @dateFromIncluded and table1.in_timestamp < @dateToExcluded)

    Please note that the table2.source_data has text data type. If I select the three parameters in the substring function separate, without using substring (so just to see the values), everything looks OK. I see the value table2.source_data, I see the index from, and I see the length. I can copy/paste this to Excel and perform substring, and it works correctly.

    Examples when the error occurs:

    • if I use the full month for data range, for example from '2023-03-01' to '2023-04-01'
    • if I use range bigger then 10 days, approximately, for example '2023-03-01' to '2023-03-10', from '2023-03-02' to '2023-03-11'

    Examples when the error doesn't occur:

    • if I use every possible day in the range (from '2023-03-01' to '2023-03-02', and so on...)
    • if I use a smaller range, for example from '2023-03-01' to '2023-03-09', and then on... from '2023-03-02' to '2023-03-10', from '2023-03-03' to '2023-03-11', ...
    • if I add an additional predicate to the where statement 'and table1.Id in (list of all ids)'. I get the list of all ids if I execute 'select distinct table1.Id'
    • if I store the query results into a temp table or a table variable, and then perform substring on this data.

    Any idea why this could happen? Currently I solved the problem with the use of table variable, like this:

    declare @data (Id int, source_data text, substring_from int, substring_to int);

    insert into @data (Id, source_data, substring_from, substring_to)
    select
    table1.Id,
    table2.source_data,
    charindex('<Node1>',table2.source_data) + 7,
    charindex('</Node1>',table2.source_data) - charindex('<Node1>',table2.source_data) - 7
    from
    table1
    inner join table2 on (table2.Id = table1.Id)
    where
    (table1.in_timestamp >= @dateFromIncluded and table1.in_timestamp < @dateToExcluded);

    select Id, substring(source_data, substring_from, substring_to)
    from @data

     

     

     

  • The text data type has been depreciated for sometime and the remarks in the following state that CHARINDEX cannot be used with image, ntext, or text data types.

    https://learn.microsoft.com/en-us/sql/t-sql/functions/charindex-transact-sql?view=sql-server-ver16

    I suspect </Node1> in some of your rows starts at greater than 8000. You might want to try PATINDEX() but I suspect you will need to CAST table2.source_data to varchar(MAX).

  • Just to be safe, I would CAST the text to varchar(max).  And, also, add code in case '</Node1>' is missing:

    select
    table1.Id,
    substring(
    ca1.source_data_varchar,
    charindex('<Node1>',ca1.source_data_varchar) + 7,
    nullif(charindex('</Node1>',ca1.source_data_varchar), 0) - charindex('<Node1>',ca1.source_data_varchar) - 7
    ) as X
    from
    table1
    inner join table2 on (table2.Id = table1.Id)
    cross apply (select cast(table2.source_data as varchar(max)) as source_data_varchar) ca1
    where
    (table1.in_timestamp >= @dateFromIncluded and table1.in_timestamp < @dateToExcluded)

     

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply