Substring Function in Where Clause

  • I am currently doing string manipulations with T-sql. In the where statement of my code I have the following substring function:

    WHERE

    SUBSTRING(LineRead,2,4) IN ('mon ','tue ','wed ','thu ','fri ','sat ','sun ')

    This produces the output that I want albeit it takes a while. I have been reading up about using EXISITS instead of IN. I just wanted to know if there is another way of rewriting this where clause.

  • eseosaoregie (4/20/2010)


    I am currently doing string manipulations with T-sql. In the where statement of my code I have the following substring function:

    WHERE

    SUBSTRING(LineRead,2,4) IN ('mon ','tue ','wed ','thu ','fri ','sat ','sun ')

    This produces the output that I want albeit it takes a while. I have been reading up about using EXISITS instead of IN. I just wanted to know if there is another way of rewriting this where clause.

    The reason it takes a while is that it can't use an index if one exists. Can you add a computed column to the table with the substring values and index it?

  • If i add the computed colum presumably the index to create is a non clustered index?

  • eseosaoregie (4/21/2010)


    If i add the computed colum presumably the index to create is a non clustered index?

    yes it should be non clus. index

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • I created the computed column and the index on that column. The computed column is as follows

    --Add Computed Column

    ALTER TABLE MyTable

    ADD comp_column AS substring(LineRead,2,4) varchar

    --Create Column

    CREATE INDEX IX_COMPColumn ON MyTable(comp_column)

    However when I run the query it takes the same time. How can I alter the where clause syntax such that it can utilize the index on the computed column?

  • eseosaoregie (4/21/2010)


    I created the computed column and the index on that column. The computed column is as follows

    --Add Computed Column

    ALTER TABLE MyTable

    ADD comp_column AS substring(LineRead,2,4) varchar

    --Create Column

    CREATE INDEX IX_COMPColumn ON MyTable(comp_column)

    However when I run the query it takes the same time. How can I alter the where clause syntax such that it can utilize the index on the computed column?

    it should not contain any function like "substring" , it will force the sql optimizer NOT TO USE indexes so here

    your query WHERE derived_cloumn like 'text%'

    Can you also post exec plan along with table and indexes. ?

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • try:

    comp_column IN ('mon ','tue ','wed ','thu ','fri ','sat ','sun ')

    [font="Arial Narrow"]bc[/font]

  • my tables are:

    ---index

    CREATE

    INDEX [IX_FirstFour] ON stg_LogFiles ([Firstfour])

    WITH

    DROP_EXISTING

    ON [PRIMARY]

    ---tables

    CREATE TABLE [stg_LogFiles] (

    [LineRead] [varchar] (100) COLLATE Latin1_General_CI_AS NULL ,

    [Importeddate] [datetime] NULL CONSTRAINT [DF__stg_LogFi__Impor__5441852A] DEFAULT (getdate()),

    [Firstfour] AS (substring([LineRead],2,4))

    ) ON [PRIMARY]

    GO

    CREATE TABLE [fct_Logs] (

    [LogDate] [datetime] NULL ,

    [Application] [int] NULL ,

    [Database] [int] NULL ,

    [User] [int] NULL ,

    [Retrieval] [int] NULL

    ) ON [PRIMARY]

    GO

    ---the query

    INSERT INTO [ESSBASE_TEST_DW].[dbo].[fct_Logs]([LogDate], [Application], [Database], [User], [Retrieval])

    (SELECT

    CONVERT(datetime,SUBSTRING(B.[Date],5,7)+ RIGHT(B.[Date],4) + SUBSTRING(B.[Date],11,9),108),

    A.ApplicationID,

    A.DatabaseID,

    U.UserID,

    1

    FROM

    (SELECT

    SUBSTRING(LineRead,2,24) AS [Date]

    ,SUBSTRING(

    LineRead,

    CHARINDEX('/',LineRead)+1,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)-1 - CHARINDEX('/',LineRead)) AS [Application]

    ,SUBSTRING(

    LineRead,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)+1,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)+1)-1

    - CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)) AS [Database]

    ,SUBSTRING(

    LineRead,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)+1) +1,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)+1)+1)- 1 -

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1) +1)) AS [User]

    FROM dbo.stg_LogFiles

    WHERE

    --CHARINDEX('[',LineRead) = 1

    --AND

    SUBSTRING(LineRead,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)

    ,2) <> '//' -- ONLY SELECT records where user id is present

    AND

    (Firstfour = 'mon ' or Firstfour = 'tue ' or Firstfour = 'wed ' or Firstfour = 'thu ' or Firstfour = 'fri ' or Firstfour = 'sun ' or Firstfour = 'sat ')

    )B

    LEFT JOIN dbo.dimApplication A ON A.Application = B.Application

    LEFT JOIN dbo.dimUser U ON U.UserName = B.[User]

    )

  • It looks like it still needs to scan the whole table for this:

    SUBSTRING(LineRead,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)

    ,2) <> '//'

  • eseosaoregie (4/21/2010)


    my tables are:

    ---index

    CREATE

    INDEX [IX_FirstFour] ON stg_LogFiles ([Firstfour])

    WITH

    DROP_EXISTING

    ON [PRIMARY]

    ---tables

    CREATE TABLE [stg_LogFiles] (

    [LineRead] [varchar] (100) COLLATE Latin1_General_CI_AS NULL ,

    [Importeddate] [datetime] NULL CONSTRAINT [DF__stg_LogFi__Impor__5441852A] DEFAULT (getdate()),

    [Firstfour] AS (substring([LineRead],2,4))

    ) ON [PRIMARY]

    GO

    CREATE TABLE [fct_Logs] (

    [LogDate] [datetime] NULL ,

    [Application] [int] NULL ,

    [Database] [int] NULL ,

    [User] [int] NULL ,

    [Retrieval] [int] NULL

    ) ON [PRIMARY]

    GO

    ---the query

    INSERT INTO [ESSBASE_TEST_DW].[dbo].[fct_Logs]([LogDate], [Application], [Database], [User], [Retrieval])

    (SELECT

    CONVERT(datetime,SUBSTRING(B.[Date],5,7)+ RIGHT(B.[Date],4) + SUBSTRING(B.[Date],11,9),108),

    A.ApplicationID,

    A.DatabaseID,

    U.UserID,

    1

    FROM

    (SELECT

    SUBSTRING(LineRead,2,24) AS [Date]

    ,SUBSTRING(

    LineRead,

    CHARINDEX('/',LineRead)+1,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)-1 - CHARINDEX('/',LineRead)) AS [Application]

    ,SUBSTRING(

    LineRead,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)+1,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)+1)-1

    - CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)) AS [Database]

    ,SUBSTRING(

    LineRead,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)+1) +1,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)+1)+1)- 1 -

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1) +1)) AS [User]

    FROM dbo.stg_LogFiles

    WHERE

    --CHARINDEX('[',LineRead) = 1

    --AND

    SUBSTRING(LineRead,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)

    ,2) <> '//' -- ONLY SELECT records where user id is present

    AND

    (Firstfour = 'mon ' or Firstfour = 'tue ' or Firstfour = 'wed ' or Firstfour = 'thu ' or Firstfour = 'fri ' or Firstfour = 'sun ' or Firstfour = 'sat ')

    )B

    LEFT JOIN dbo.dimApplication A ON A.Application = B.Application

    LEFT JOIN dbo.dimUser U ON U.UserName = B.[User]

    )

    Now, if you'd provide a little test data for that, you'd have an answer by now. See the first link in my signature line below for how to do that properly. Thanks.

    As a side bar, it looks like you need a very simple split function to hammer all of this out... again, correctly posted test data would go a long way in helping resolve this in the most effecient manner possible.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I have now added sample data along with all the code for tables and indexes etc. Hopefully this will make things clearer

    ---tables

    CREATE TABLE [stg_LogFiles] (

    [LineRead] [varchar] (100) NULL ,

    [Importeddate] [datetime] DEFAULT (getdate()),

    [Firstfour] AS (substring([LineRead],2,4))

    ) ON [PRIMARY]

    GO

    CREATE TABLE [fct_Logs] (

    [LogDate] [datetime] NULL ,

    [Application] [int] NULL ,

    [Database] [int] NULL ,

    [User] [int] NULL ,

    [Retrieval] [int] NULL

    ) ON [PRIMARY]

    GO

    CREATE TABLE [dbo].[dimApplication] (

    [Application] [varchar] (5) NULL ,

    [ApplicationID] [int] IDENTITY (1, 1) NOT NULL ,

    [Database] [varchar] (6) NULL ,

    [DatabaseID] [int] NULL

    ) ON [PRIMARY]

    --user table

    CREATE TABLE [dbo].[dimUser] (

    [UserID] [int] IDENTITY (1, 1) NOT NULL ,

    [UserName] [nvarchar] (10) NULL ,

    [Country] [nvarchar] (20) NULL ,

    [CountryID] [int] NULL

    ) ON [PRIMARY]

    GO

    --index for user table

    CREATE INDEX [IX_USER] ON [dbo].[dimUser]([UserName]) ON [PRIMARY]

    GO

    --index for application table

    CREATE INDEX [IX_Application] ON [dbo].[dimApplication]([Application]) ON [PRIMARY]

    GO

    --index for computed column on stg_LogFiles table

    CREATE INDEX [IX_FirstFour] ON stg_LogFiles ([Firstfour])

    ON [PRIMARY]

    --data for application table

    INSERT INTO [dimApplication]

    SELECT 'GMR08','1','OLTCHK','1' UNION ALL

    SELECT 'GMR09','2','OLTCHK','1' UNION ALL

    SELECT 'GMR10','3','OLTCHK','1'

    --data for user table

    INSERT INTO [dimUser]

    SELECT '1','KRRHEJ','Venezuela','43' UNION ALL

    SELECT '2','SGTAYL','Venezuela','43' UNION ALL

    SELECT '3','PHGANI','Venezuela','43' UNION ALL

    SELECT '4','ROBAIR','Venezuela','43' UNION ALL

    SELECT '5','silven','Venezuela','43' UNION ALL

    SELECT '6','COAGUJ','Venezuela','43' UNION ALL

    SELECT '7','USFITT','Venezuela','43' UNION ALL

    SELECT '8','admin','Netherlands','21'

    --data for stg_LogFiles table

    INSERT INTO [dbo].[stg_LogFiles] (LineRead)

    SELECT '[Mon Jan 05 13:51:05 2009]Local/GMR09///Info(1013205)' UNION ALL

    SELECT '[Mon Jan 05 13:51:18 2009]Local/GMR09///Info(1013210)' UNION ALL

    SELECT '[Mon Jan 05 13:51:30 2009]Local/GMR09/OTLCHK/admin/Info(1021000)' UNION ALL

    SELECT '[Mon Jan 05 13:52:59 2009]Local/GMR09/OTLCHK/admin/Info(1021000)' UNION ALL

    SELECT '[Mon Jan 05 13:53:19 2009]Local/GMR09/OTLCHK/admin/Info(1021000)' UNION ALL

    SELECT '[Mon Jan 05 13:53:28 2009]Local/GMR09///Info(1013210)' UNION ALL

    SELECT '[Mon Jan 05 13:53:40 2009]Local/GMR09///Info(1013210)'

    ---the query

    INSERT INTO [ESSBASE_TEST_DW].[dbo].[fct_Logs]([LogDate], [Application], [Database], [User], [Retrieval])

    (SELECT

    CONVERT(datetime,SUBSTRING(B.[Date],5,7)+ RIGHT(B.[Date],4) + SUBSTRING(B.[Date],11,9),108),

    A.ApplicationID,

    A.DatabaseID,

    U.UserID,

    1

    FROM

    (SELECT

    SUBSTRING(LineRead,2,24) AS [Date]

    ,SUBSTRING(

    LineRead,

    CHARINDEX('/',LineRead)+1,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)-1 - CHARINDEX('/',LineRead)) AS [Application]

    ,SUBSTRING(

    LineRead,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)+1,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)+1)-1

    - CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)) AS [Database]

    ,SUBSTRING(

    LineRead,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)+1) +1,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)+1)+1)- 1 -

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1) +1)) AS [User]

    FROM dbo.stg_LogFiles

    WHERE

    --CHARINDEX('[',LineRead) = 1

    --AND

    SUBSTRING(LineRead,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)

    ,2) <> '//' -- ONLY SELECT records where user id is present

    AND

    (Firstfour = 'mon ' or Firstfour = 'tue ' or Firstfour = 'wed ' or Firstfour = 'thu ' or Firstfour = 'fri ' or Firstfour = 'sun ' or Firstfour = 'sat ')

    )B

    LEFT JOIN dbo.dimApplication A ON A.Application = B.Application

    LEFT JOIN dbo.dimUser U ON U.UserName = B.[User]

    )

  • @eseosaoregie,

    Not all people ( including me ) can understand exec paln iin any format (here xlsm format) . i found it all impossible to read it. so always try to post it in XMl/graphical format.

    Moreover, do yoo have indexes oon "application" and "users" column, which are being used in LEFT JOINs in query ??

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • there are indexes on the Application and User Tables. My previous post has all the structures and sampl data which i should have posted initially.

  • (SELECT

    SUBSTRING(LineRead,2,24) AS [Date]

    ,SUBSTRING(

    LineRead,

    CHARINDEX('/',LineRead)+1,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)-1 - CHARINDEX('/',LineRead)) AS [Application]

    ,SUBSTRING(

    LineRead,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)+1,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)+1)-1

    - CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)) AS [Database]

    ,SUBSTRING(

    LineRead,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)+1) +1,

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)+1)+1)- 1 -

    CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1) +1)) AS [User]

    FROM dbo.stg_LogFiles

    WHERE

    --SUBSTRING(LineRead, CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1) ,2) <> '//' AND

    (Firstfour = 'mon ' or Firstfour = 'tue ' or Firstfour = 'wed ' or Firstfour = 'thu ' or Firstfour = 'fri ' or Firstfour = 'sun ' or Firstfour = 'sat ')

    )B AND

    WHERE

    --SUBSTRING(LineRead, CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1) ,2) <> '//'

    both sql are causing table scan ,

    1) i tried it with clustered index replacing your indexes , found some improvement

    2) "WHERE SUBSTRING" should also be avoided to force optimzer to select SEEK operation

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • How did you replace my indexes with a clustered index? Were they added to the apllication and user table?

Viewing 15 posts - 1 through 15 (of 33 total)

You must be logged in to reply to this topic. Login to reply