What can I do to speed it up? Thanks.

  • I just got a table with 16736580 records. Index is created.

    A test shows a query like

    select * from table where charindex(column, keyword) > 1

    costs 14 minutes to return result.

    I wonder what should I do to optimize the query/table/db/server?

    Thank you for your input.

    As the table/db/server is my laptop, I can do whatever I want for the testing purpose, so any suggestion is welcome.

    I am using SQL2008R2 running on Windows 7 (64bits), hardware:

    2.8GHz CPU, 8G RAM, free hard disk space: 400GB

  • halifaxdal (8/7/2015)


    I just got a table with 16736580 records. Index is created.

    A test shows a query like

    select * from table where charindex(column, keyword) > 1

    costs 14 minutes to return result.

    I wonder what should I do to optimize the query/table/db/server?

    Thank you for your input.

    As the table/db/server is my laptop, I can do whatever I want for the testing purpose, so any suggestion is welcome.

    I am using SQL2008R2 running on Windows 7 (64bits), hardware:

    2.8GHz CPU, 8G RAM, free hard disk space: 400GB

    No amount of indexing is going to help with that query. You have wrapped a column from the table inside a function in the WHERE predicate. That means a scan, plain and simple. SQL Server MUST evaluate the result of that function for each and every row, with no exception.

    Think of this like a phone book. Your query says, "Find all entries in the phone book that have the letter 's' as part of the name". You can't scroll to a part of the phone book where those entries would be. You would have to look at every single name and see if it contains an 's'.

    Perhaps with an explanation of what you need to accomplish we can come up with a different way of retrieving this information.
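    To make the contrast concrete, here is a minimal sketch using a hypothetical dbo.ActivityLog table and FilePath column (names invented for illustration). Note that CHARINDEX takes the value to find first and the string to search second:

    ```sql
    -- Function wrapped around the column: the optimizer cannot use an index,
    -- so every row must be evaluated (a full scan).
    SELECT *
    FROM dbo.ActivityLog
    WHERE CHARINDEX('.mof', FilePath) > 0;

    -- The equivalent leading-wildcard LIKE still scans, for the same reason.
    SELECT *
    FROM dbo.ActivityLog
    WHERE FilePath LIKE '%.mof%';

    -- Only a predicate that leaves the column bare, with a known prefix,
    -- allows an index on FilePath to be used for a seek:
    SELECT *
    FROM dbo.ActivityLog
    WHERE FilePath LIKE 'c:\users\%';
    ```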

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thanks Sean.

    The table contains daily data that needs to be analyzed to find the records that fit certain criteria; that's what the query is for. I don't know what else you might need to know. The task is simple; the headache is the number of records.

  • halifaxdal (8/7/2015)


    Thanks Sean.

    The table contains daily data that to be analyzed to find out those fit into certain criteria, that's what the query is for. I don't know what else you might need to know, task is simple, the headache is the amount of records.

    I meant: what are the criteria? Maybe there is another way to think about pulling this data.


  • Sean Lange (8/7/2015)



    I meant: what are the criteria? Maybe there is another way to think about pulling this data.

    There is a key column that contains sensitive data. If that sensitive data (say, a filename) matches certain criteria, for example coming from a pre-defined website or having a pre-defined file extension, then the record will be picked up.

  • halifaxdal (8/7/2015)


    There is a key column that contains sensitive data. If that sensitive data (say, a filename) matches certain criteria, for example coming from a pre-defined website or having a pre-defined file extension, then the record will be picked up.

    Is there a way you can isolate just the piece of information you need to look for? If not, you may be stuck doing what is effectively a wildcard search with leading and trailing wildcards. Is fulltext an option here?
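    If full-text is viable, a sketch of the setup might look like this (the table, key index, and catalog names are all invented for illustration; this assumes the search column is character data and the table has a unique, single-column key index):

    ```sql
    -- Create a catalog and a full-text index over the search column.
    CREATE FULLTEXT CATALOG ftSearchCatalog;

    CREATE FULLTEXT INDEX ON dbo.ActivityLog (FilePath)
        KEY INDEX PK_ActivityLog          -- unique key index on the table
        ON ftSearchCatalog
        WITH CHANGE_TRACKING AUTO;

    -- CONTAINS can use the full-text index instead of scanning the table:
    SELECT *
    FROM dbo.ActivityLog
    WHERE CONTAINS(FilePath, N'marijuana');
    ```

    Keep in mind that full-text matches on word boundaries as determined by the word breaker, so it suits whole-word searches far better than arbitrary-substring searches like CHARINDEX.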


  • Sean Lange (8/7/2015)


    Is there a way you can isolate just the piece of the information you might need to look for? If not you may be stuck doing what is effectively a wildcard search with leading and trailing wildcards. Is fulltext an option here?

    Can you elaborate on the reason for implementing fulltext here?

    The original data had no index; creating one was such a memory hog that for hours I couldn't do anything, and the outcome was uncertain. It actually failed the first time, but succeeded when I tried again.

    That being said, I am a little bit afraid of implementing fulltext. Is there a way to monitor its progress? When creating the index I was totally blind: I didn't know how long it would take, what percentage was complete, or whether it would succeed at all.

    Is there a way to partition the table based on a timestamp column, so the table would be split into smaller tables, one per day?

    Never done that before, just a silly thought. Thanks.

  • halifaxdal (8/7/2015)


    Can you elaborate on the reason for implementing fulltext here?

    The original data had no index; creating one was such a memory hog that for hours I couldn't do anything, and the outcome was uncertain. It actually failed the first time, but succeeded when I tried again.

    That being said, I am a little bit afraid of implementing fulltext. Is there a way to monitor its progress? When creating the index I was totally blind: I didn't know how long it would take, what percentage was complete, or whether it would succeed at all.

    Is there a way to partition the table based on a timestamp column, so the table would be split into smaller tables, one per day?

    Never done that before, just a silly thought. Thanks.

    Let's forget about fulltext indexing for now.

    I am guessing that if you think an index on that column will actually help, you didn't really understand my post about the phone book. When you use CHARINDEX(column, value) you negate any indexing. The engine is forced to evaluate the result of that function for every single row.

    Partitioning is NOT a mechanism for improving the performance of a query. It isn't going to help here and could even potentially slow things down further. I don't know that there is much you can do if you have to look at every one of your 16.7 million rows for a given pattern. That just screams of something being horribly wrong.

    I asked previously if there is a way to isolate the portion that you need to search for. If you could parse that part into its own column, then searching would be a lot simpler and faster.


  • I asked previously if there is a way to isolate the portion that you need to search for. Thinking if you could parse that part into its own column then you searching would be a lot simpler and faster.

    I thought about your idea. It is possible to split that particular column into multiple columns, say URL or file extension, but again, the number of records scared me off from thinking about it further.

  • halifaxdal (8/7/2015)


    I asked previously if there is a way to isolate the portion that you need to search for. Thinking if you could parse that part into its own column then you searching would be a lot simpler and faster.

    I thought about your idea, it is possible to split the particular column into multiple columns, say url or file extension, again, the amount of records scared me from further thinking it.

    Well, since you are effectively storing multiple pieces of data in a single column, you are now discovering one of the major reasons this is not a good approach. Even just creating a computed column (or one for each piece of data in the main column) will greatly help, since you would be able to add an index to the computed column(s). There are a few restrictions on adding indexes to computed columns, but it sounds like they shouldn't be an issue here. https://msdn.microsoft.com/en-us/library/ms189292.aspx

    If you have to parse/split this column, that process is going to be very slow, but afterwards it will make everything around that column better.
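    As a rough sketch of the computed-column idea (table and column names are invented; a real parser would also need to guard against dots appearing in directory names rather than extensions):

    ```sql
    -- Extract the file extension into a persisted computed column.
    -- REVERSE/CHARINDEX finds the LAST '.', so 'c:\dir\file.mof' yields 'mof'.
    ALTER TABLE dbo.ActivityLog
        ADD FileExtension AS
            CASE WHEN CHARINDEX('.', REVERSE(FilePath)) > 0
                 THEN RIGHT(FilePath, CHARINDEX('.', REVERSE(FilePath)) - 1)
            END PERSISTED;

    -- The expression is deterministic, so the column can be indexed:
    CREATE INDEX IX_ActivityLog_FileExtension
        ON dbo.ActivityLog (FileExtension);

    -- An equality predicate on the new column can now seek instead of scanning:
    SELECT *
    FROM dbo.ActivityLog
    WHERE FileExtension = 'mof';
    ```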


    If you have to parse/split this column, that process is going to be very slow, but afterwards it will make everything around that column better.

    I agree with you.

  • Split the table horizontally and parallel process each part.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I think what Sean and Jeff are saying is to isolate your sensitive flags into other columns. If your single column contains multiple pieces of data, is it possible to split out the sensitive stuff into other columns when you create the row? I don't know if this is as simple as a flag that the row contains something sensitive or if you'd need multiple columns, one for each type of sensitive data. Then, as Sean said, index your search columns and you can search to your heart's content. You'll no longer have to search your big column with wildcards or the non-SARGable predicate you had in your OP.

    In a nutshell, your performance is suffering because you have everything in a single column. You want to isolate the data you need to search on into other columns.
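    One hedged way to combine this with the evolving criteria: keep the keywords themselves in a table, so a new criterion becomes a new row rather than a query change. All names below are invented for illustration, and this assumes the sensitive pieces have already been parsed into their own indexed column as suggested above:

    ```sql
    -- Criteria live in data, not in query text.
    CREATE TABLE dbo.SensitiveKeyword
    (
        Keyword VARCHAR(100) NOT NULL PRIMARY KEY
    );

    INSERT dbo.SensitiveKeyword (Keyword)
    VALUES ('mof'), ('marijuana'), ('knife');

    -- With an index on the parsed column, each keyword can be a seek:
    SELECT l.*
    FROM dbo.ActivityLog AS l
    INNER JOIN dbo.SensitiveKeyword AS k
        ON l.FileExtension = k.Keyword;
    ```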

  • halifaxdal (8/7/2015)


    I just got a table with 16736580 records. Index is created.

    A test shows a query like

    select * from table where charindex(column, keyword) > 1

    costs 14 minutes to return result.

    I wonder what should I do to optimize the query/table/db/server?

    Thank you for your input.

    As the table/db/server is my laptop, I can do whatever I want for the testing purpose, so any suggestion is welcome.

    I am using SQL2008R2 running on Windows 7 (64bits), hardware:

    2.8GHz CPU, 8G RAM, free hard disk space: 400GB

    Actually, we have so little information about the problem that it's nearly impossible to help here. For example...

    1. Is this a one-off task, or will it need to be done more than once?

    2. How many keywords will you need to search for?

    3. Is it important to know which keyword(s) caused a row to be selected?

    4. What is the structure of the table? Specifically, what is the CREATE TABLE statement for this table and all indexes associated with it?

    5. What do 10 rows of this table look like, with and without the keyword(s) that you're trying to find?

    If the data isn't "secret" in any way, it would be real handy if you'd post the DDL and readily consumable data IAW the first helpful link in my signature line below. Of course, you already knew that because you've been told that more than once. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    Thank you all for the replies.

    As the data is really confidential and I can't risk exposing myself in any uncertain situation, I'm sorry I can't post any real data here. What I can say is that some rows have content that might look like the following:

    e:\deploy\applications\intel management engine interface and serial over lan driver (sol) driver 7.1.2.1041v2\me_sw\mewmiprov\me\cim_schema\system\cim_computersystemdma.mof

    c:\$recycle.bin\s-1-5-21-3125639655-2069970247-2443061104-29869\$iqzvjd6.jpg

    c:\users\asdf\music\the cranberrieso need to argue\02 i can't be with you.mp3

    e:\oracle\epm\docs\epm~1.111\wa_user\about_navigating_data_objects.html

    Marijuana; Suicide; ***; butt; don’t say anything; drug; kick; knife; knives; marijuana; murder; naked; opiate; party; prick; scam; sex; smoke; smoke a joint; smoking; sneak; speculation; stabbed; stoned; suck; tripping

    For example, the first record contains info like xxxxxx.mof, and .mof is deemed a confidential file type, meaning this record should be picked out and further analysis done to determine what, who, when, where, how....

    The criteria are evolving; new criteria could be added at any time.

    What I read in your replies is that I should really consider splitting up the column, and maybe even deriving a new table containing only the useful columns or calculated columns. I will see if this is feasible in terms of processing time.

    Thank you.
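    On the processing-time concern raised above, a derived table can be filled in batches so the parse of ~16.7 million rows doesn't run as one giant transaction. This is a sketch only; the table, column, and key names are invented, and it assumes an ascending integer key Id:

    ```sql
    -- Backfill dbo.ActivityLogParsed in chunks of 100k rows,
    -- resuming from wherever the last batch stopped.
    DECLARE @batch  INT = 100000,
            @lastId INT = 0,
            @rows   INT = 1;

    WHILE @rows > 0
    BEGIN
        INSERT INTO dbo.ActivityLogParsed (Id, FilePath, FileExtension)
        SELECT TOP (@batch)
               s.Id,
               s.FilePath,
               CASE WHEN CHARINDEX('.', REVERSE(s.FilePath)) > 0
                    THEN RIGHT(s.FilePath, CHARINDEX('.', REVERSE(s.FilePath)) - 1)
               END
        FROM dbo.ActivityLog AS s
        WHERE s.Id > @lastId
        ORDER BY s.Id;

        SET @rows = @@ROWCOUNT;
        SELECT @lastId = MAX(Id) FROM dbo.ActivityLogParsed;
    END;
    ```

    Each loop iteration commits as its own implicit transaction, which keeps log growth bounded and lets you stop and restart the process without losing completed work.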
