Will the following procedure provide accurate sorting data each time with millions of records without a time field ?

  • Sample Table

    USE [Testing]

    GO

    /****** Object: Table [dbo].[Testing] Script Date: 4/25/2014 11:08:18 AM ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    SET ANSI_PADDING ON

    GO

    CREATE TABLE [dbo].[Testing](

    [ID] [int] IDENTITY(1,1) NOT NULL,

    [BeginDate] [datetime] NULL,

    [Company] [varchar](200) NULL,

    PRIMARY KEY CLUSTERED

    (

    [ID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF

    GO

    Sample data

    insert into Testing

    Values('2014-04-10 00:00:00.000','Exxon')

    insert into Testing

    Values('2014-04-09 00:00:00.000','Walmart')

    insert into Testing

    Values('2014-04-08 00:00:00.000','Chase')

    insert into Testing

    Values('2014-04-07 00:00:00.000','ADP')

    insert into Testing

    Values('2014-04-06 00:00:00.000','Chevron')

    insert into Testing

    Values('2014-04-05 00:00:00.000','BOA')

    insert into Testing

    Values('2014-04-04 00:00:00.000','Sears')

    insert into Testing

    Values('2014-04-03 00:00:00.000','Google')

    insert into Testing

    Values('2014-04-02 00:00:00.000','Apple')

    insert into Testing

    Values('2014-04-01 00:00:00.000','Intel')

    Sample Code

    set statistics time on;

    WITH results AS

    (

    select

    id,

    begindate,

    company,

    ROW_NUMBER() OVER (ORDER BY (SELECT 1)) RowNum

    FROM dbo.testing

    WHERE 1=1 )

    SELECT

    id,

    begindate,

    company,

    (select count(*)from results) totalcount

    FROM (SELECT *, ROW_NUMBER() OVER(ORDER BY BeginDate asc) R1 FROM results) T

    WHERE R1 BETWEEN 0 AND 10

    OPTION(Maxdop 8)

    set statistics time off

    begindate index

    USE [Testing]

    GO

    /****** Object: Index [<IX_BeginDate>] Script Date: 4/25/2014 11:15:02 AM ******/

    CREATE NONCLUSTERED INDEX [<IX_BeginDate>] ON [dbo].[Testing]

    (

    [BeginDate] ASC

    )

    INCLUDE ( [ID],

    [Company]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    It seems to work fine with one million records.

    Each primary key is unique, but the begindate is non-unique, and i guess even if i use datetime2 and add nanoseconds, from what i have read, there is a chance that i could have a duplicate datetime since the date is imported via XML from multiple sources.

    Thanks

  • You can't guarantee and order if you don't have an ORDER BY in your query. If you have repeated values, the order won't be guaranteed either.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • So the 'Row Number Order By' statements are not enough?

    Is there a way to fix this either in the query, or by inserting a nanosecond time loop in the data?

    Thanks

  • ROW_NUMBER is not designed to return data in certain order (even if it returns the rows in order most of the times when dealing with 10 rows). Your query is overcomplicated to return a simple result. Maybe I'm missing something, but this should do the same thing.

    select TOP 10

    id,

    begindate,

    company,

    count(*) OVER( PARTITION BY (SELECT NULL))

    FROM dbo.testing

    ORDER BY BeginDate

    Note that you're counting all the rows but you're returning only 10. That doesn't seem logic.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • I was trying to provide something simple as a 'proof of concept'. Here is all the code. Thanks

    USE [Testing]

    GO

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    -- =============================================

    --

    -- =============================================

    ALTER Procedure [dbo].[Testing]

    -- Add the parameters for the stored procedure here

    @Title varchar(250),

    @Company varchar(250),

    @Industry VARCHAR(250),

    @Industry2 VARCHAR(250),

    @Industry3 VARCHAR(250),

    @CompanyBlank VARCHAR(5),

    @Date int,

    @JobTitle int,

    @JobType int,

    @Experience int,

    @education int,

    @Endzipcode int,

    @Startzipcode int,

    @State int,

    @City int,

    @Areacode int,

    @Salary int,

    @MaxSalary int,

    @fromRec int,

    @toRec int,

    @OrderType VARCHAR(10),

    @OrderBy VARCHAR(100)

    AS

    DECLARE @sql NVARCHAR(MAX)

    SELECT @sql = 'Begin

    WITH results AS

    (

    select

    id,

    industry1, industry2, industry3,

    jobtitle as title,

    company as companyname,

    did,

    JobDescription as description,

    begindate,

    locationcity as city,

    locationstate as state,

    '

    if (@OrderBy = 'begindate')

    Begin

    --4-24-14 SELECT @sql = @sql + 'RowNum = ROW_NUMBER() OVER (ORDER BY begindate ' + ' ' + @OrderType + ', ' + 'id' + ') '

    SELECT @sql = @sql + 'RowNum = ROW_NUMBER() OVER (ORDER BY (SELECT 1)) '

    End

    SELECT @sql = @sql + ' FROM dbo.test

    WHERE 1=1 '

    if (@Date <> 0)

    Begin

    set @sql = @sql + ' and begindate >''' + cast(((select dateadd(d,@Date,GETDATE()))) as varchar(20)) + ''''

    End

    if (@Title <> '')

    Begin

    set @sql = @sql + ' AND (SELECT COUNT(*) FROM SPLIT(''' + cast(@Title as varchar(250)) + ''','','') WHERE jobtitle like ''%'' + ITEMS + ''%'') > 0'

    End

    if (@Company <> '')

    Begin

    set @sql = @sql + ' and company = ''' + cast(@Company as varchar(250)) + ''''

    End

    if (@CompanyBlank = 'yes')

    Begin

    set @sql = @sql + ' and company <> ''' + '' + ''''

    End

    if (@Industry <> '')

    Begin

    set @sql = @sql + ' and industry1 = ''' + cast(@Industry as varchar(250)) + ''''

    End

    if (@Industry2 <> '')

    Begin

    set @sql = @sql + ' and industry2 = ''' + cast(@Industry2 as varchar(250)) + ''''

    End

    if (@Industry3 <> '')

    Begin

    set @sql = @sql + ' and industry3 = ''' + cast(@Industry3 as varchar(250)) + ''''

    End

    if (@City <> 0)

    Begin

    set @sql = @sql + ' and ' + cast(@City as varchar(10)) + ' in (select id from cities where LTRIM(RTRIM(locationcity)) = LTRIM(RTRIM(Cities.Name)))'

    End

    if (@State <> 0)

    Begin

    set @sql = @sql + ' and ' + cast(@State as varchar(10)) + ' in (select id from regions where LTRIM(RTRIM(locationstate)) = LTRIM(RTRIM(regions.abbreviatedName))) '

    End

    set @sql = @sql + ')

    SELECT

    id,

    title,

    companyname,

    did,

    begindate,

    description,

    city,

    state,

    city + '', '' + state as Location,

    (select count(*) from results) totalcount

    FROM (SELECT *, ROW_NUMBER() OVER(ORDER BY BeginDate ' + ' ' + @OrderType +') R1 FROM results) T

    WHERE R1 BETWEEN ' + cast(@fromRec as varchar(10)) + ' AND ' + cast(@toRec as varchar(10)) +

    '

    OPTION(Maxdop 8)

    End

    '

    --print @sql

    exec (@sql)

  • Speaking about bad coding habits :crazy:

    Your code will allow sql injection. You need to parametrize your query.

    You will construct your conditions like this:

    if (@Date <> 0)

    Begin

    set @sql = @sql + ' and begindate > dateadd(d,@Date,GETDATE()) '

    End

    if (@Title <> '')

    Begin

    set @sql = @sql + ' AND (SELECT COUNT(*) FROM SPLIT(@Title ,'','') WHERE jobtitle like ''%'' + ITEMS + ''%'') > 0'

    End

    if (@Company <> '')

    Begin

    set @sql = @sql + ' and company = @Company '

    End

    And need to add the parameters definition on your sp_executesql call. Read more about how to do it in here: http://technet.microsoft.com/es-es/library/ms188001.aspx and here: http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/

    If you can't define your order in the front end, you could use QUOTENAME on the column needed. I would suggest that you check the ORDER BY with OFFSET and FETCH options to return the correct order.

    Example:

    ORDER BY BeginDate

    OFFSET @fromRec ROWS

    FETCH NEXT @toRec - @fromRec + 1 ROWS

    The split function might cause performance problems, check for a great option in here: http://www.sqlservercentral.com/articles/Tally+Table/72993/

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • I forgot to mention that the CTE and ROW_NUMBER functions are not needed at all if you change to ORDER BY with OFFSET and FETCH options.

    The MAXDOP option might give you more problems than solutions if you don't have an explicit reason to use it. And a dynamic query might not be the appropiate place to use it.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Thanks

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply