January 11, 2016 at 11:47 am
We have a table containing open Job positions, and want to use Full-Text search on the job descriptions, to look for "finance", "accounting" etc, so I set up full-text searching, but it seems much slower than I would expect. The data being searched is in a varchar(max) column, there are 330,000 rows, the longest string is 64,000 chracters, and only 370 rows greater than 10,000 characters, so seems to me it shouldn't be a long running query..
What I did:
1) Enabled the database for full text search.
2) Define Full-Text Index, select the column JOB_Combined_Title_Description
3) create a new catalog on the F drive
4) Populate the catalog
5) run query such as:
Takes 10 seconds returning 18,000 rows
select job_id, job_title
from Job_TextSearchTesting
WHERE CONTAINS(JOB_Combined_Title_Description, '"finance" OR "accounting"');
Table: Full text search is built on the 2nd to last column, JOB_Combined_Title_Description
It had HTML code, but I stripped that out first.
USE [My_Database]
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Job_TextSearchTesting](
[JOB_ID] [int] IDENTITY(125000,1) NOT NULL,
[JOB_origMemID] [int] NULL,
[JOB_memID] [int] NULL,
[JOB_memRecID] [varchar](20) NULL,
[JOB_entryMemID] [int] NULL,
[JOB_entryRecID] [varchar](20) NULL,
[JOBS_ID] [int] NULL,
[JOB_PrivateOnly] [bit] NOT NULL,
[JOB_jobType] [varchar](50) NULL,
[JOB_jobType11] [int] NULL,
[JOB_jobType21] [int] NULL,
[JOB_refNum] [varchar](40) NULL,
[JOB_releaseDate] [datetime] NULL,
[FIRMT_ID] [int] NULL,
[TRANSTYPE_ID] [int] NULL,
[JOB_title] [varchar](80) NULL,
[JOB_funcCode1] [varchar](5) NULL,
[JOB_funcCode2] [varchar](5) NULL,
[JOB_srMgmt] [bit] NOT NULL,
[JOB_reportsTo] [varchar](40) NULL,
[JOB_coSizeNum] [decimal](18, 1) NULL,
[JOB_coSize] [varchar](15) NULL,
[JOB_coSizeCode] [bigint] NULL,
[JOB_indCode1] [varchar](5) NULL,
[JOB_indCode2] [varchar](5) NULL,
[JOB_indName1] [varchar](50) NULL,
[JOB_indName2] [varchar](50) NULL,
[JOB_locCode1] [varchar](6) NULL,
[JOB_locName1] [varchar](50) NULL,
[JOB_locCity] [varchar](50) NULL,
[JOB_actLocCode] [varchar](6) NULL,
[JOB_PI_company] [text] NULL,
[JOB_salaryFormat] [bit] NOT NULL,
[JOB_salaryDMin] [money] NULL,
[JOB_salaryDMax] [money] NULL,
[JOBCUR_ID] [int] NULL,
[JOB_bonus] [int] NULL,
[JOB_bonusPrMin] [float] NULL,
[JOB_bonusPrMax] [float] NULL,
[JOB_bonusDMin] [money] NULL,
[JOB_bonusDMax] [money] NULL,
[JOB_options] [int] NULL,
[JOB_commission] [int] NULL,
[JOB_car] [int] NULL,
[JOB_incentives] [int] NULL,
[RELOC_ID] [int] NULL,
[JOB_PI_salary1] [text] NULL,
[JOB_salaryMid] [money] NULL,
[JOB_bonusMid] [money] NULL,
[JOB_totalComp] [money] NULL,
[JOB_salaryReport] [varchar](50) NULL,
[JOB_PI_salary2] [text] NULL,
[JOB_PI_specReq] [text] NULL,
[JOB_PI_companyDesc] [text] NULL,
[JOB_PI_posDesc] [text] NULL,
[JOB_PI_RemClose] [varchar](255) NULL,
[JOB_PI_posUpdated] [text] NULL,
[JOBRMV_ID] [int] NULL,
[JOB_replyDirect] [bit] NOT NULL,
[JOB_postNameDisp] [bit] NOT NULL,
[JOB_postName] [varchar](150) NULL,
[JOB_postAddressDisp] [bit] NOT NULL,
[JOB_postAddress] [text] NULL,
[JOB_postPhoneDisp] [bit] NOT NULL,
[JOB_postPhone] [varchar](30) NULL,
[JOB_postFaxDisp] [bit] NOT NULL,
[JOB_postFax] [varchar](30) NULL,
[JOB_postEmailDisp] [bit] NOT NULL,
[JOB_postEmail] [varchar](100) NULL,
[JOB_confidentialEmail] [varchar](100) NULL,
[JOB_PI_contact] [text] NULL,
[JOB_PI_contactEdit] [bit] NOT NULL,
[JOB_staffComments] [text] NULL,
[JOB_operator] [varchar](30) NULL,
[JOB_lastUpdated] [datetime] NOT NULL,
[JOB_creationDate] [datetime] NOT NULL,
[JOB_locCode2] [varchar](6) NULL,
[JOB_locName2] [varchar](50) NULL,
[JOB_statusNum] [int] NULL,
[JOB_emailTest] [int] NOT NULL,
[JOB_memSpecType] [varchar](10) NULL,
[JOB_Coupon] [varchar](6) NULL,
[job_ShowRecProfile] [bit] NOT NULL,
[JOB_AcceptReferrals] [bit] NOT NULL,
[JOB_BasicMemApply] [char](1) NULL,
[JOB_BasicMemApplyDate] [datetime] NULL,
[JOB_JobFeed] [char](1) NULL,
[JOB_JobFeedDate] [datetime] NULL,
[JOB_postApplyLinkDisp] [bit] NULL,
[JOB_postApplyLink] [varchar](max) NULL,
[JOB_JobLevel] [int] NULL,
[JOB_EmailSubject] [varchar](max) NULL,
[JOB_specinstr] [varchar](max) NULL,
[JOB_FeedCity] [varchar](50) NULL,
[JOB_FeedState] [varchar](20) NULL,
[JOB_FeedCountry] [varchar](50) NULL,
[JOB_Combined_Title_Description] [varchar](max) NULL,
[JOB_Title_Description_Updated] [char](1) NULL,
CONSTRAINT [PK_Job_TextSearchTesting] PRIMARY KEY CLUSTERED
(
[JOB_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'Combine JOB_title JOB_PI_posDesc for Full text searching' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'TABLE',@level1name=N'Job_TextSearchTesting', @level2type=N'COLUMN',@level2name=N'JOB_Combined_Title_Description'
January 11, 2016 at 12:36 pm
A few things:
Can you post the query plan that's used for this query?
Also, does that column need to be NULLable? If not can you set it to NOT NULL, run the query and see if that helps?
-- Itzik Ben-Gan 2001
January 11, 2016 at 1:09 pm
Alan.B (1/11/2016)
A few things:Can you post the query plan that's used for this query?
Also, does that column need to be NULLable? If not can you set it to NOT NULL, run the query and see if that helps?
There are NULLs in the column, not sure why though. Could be old, inactive records.
StmtText
------------------------------------------------------------------------------
select job_id, job_title
from Job_TextSearchTesting
WHERE CONTAINS(JOB_Combined_Title_Description, '"finance" OR "accounting"')
StmtText
---------------------------------------------------------------------------------------
|--Hash Match(Inner Join, HASH: ([Full-text Search Engine].)=([execunet_DataMaint].[dbo].[Job_TextSearchTesting].[JOB_ID]))
|--Remote Scan(OBJECT: (CONTAINS))
|--Clustered Index Scan(OBJECT: ([execunet_DataMaint].[dbo].[Job_TextSearchTesting].[PK_Job_TextSearchTesting]))
January 11, 2016 at 1:58 pm
homebrew01 (1/11/2016)
Alan.B (1/11/2016)
A few things:Can you post the query plan that's used for this query?
Also, does that column need to be NULLable? If not can you set it to NOT NULL, run the query and see if that helps?
There are NULLs in the column, not sure why though. Could be old, inactive records.
StmtText
------------------------------------------------------------------------------
select job_id, job_title
from Job_TextSearchTesting
WHERE CONTAINS(JOB_Combined_Title_Description, '"finance" OR "accounting"')
StmtText
---------------------------------------------------------------------------------------
|--Hash Match(Inner Join, HASH: ([Full-text Search Engine].)=([execunet_DataMaint].[dbo].[Job_TextSearchTesting].[JOB_ID]))
|--Remote Scan(OBJECT: (CONTAINS))
|--Clustered Index Scan(OBJECT: ([execunet_DataMaint].[dbo].[Job_TextSearchTesting].[PK_Job_TextSearchTesting]))
Ok, I'm buried at the moment with work but I think I have a good alternative way of getting what you need. I'll be back online in a few hours and post what I have.
-- Itzik Ben-Gan 2001
January 11, 2016 at 7:38 pm
Ok, I have a nasty fast alternative for your under performing full-text index. We're going to create what I'll call a Nindex (N-Grams Index); it involves a Tally table and a couple indexed views. I may update this post with a solution that only uses one indexed view but, for now, this will be a huge improvement. The code below can be copied/pasted and ran as is.
In summary we're going to:
1) Create a Tally table with 1,000,000 rows and do some pre-cleanup
2) Create 330,000 rows of sample data similar to what you are working with 5.5% of the records containing the text "accounting" or "finance" in your JOB_Combined_Title_Description column
3) Create 2 indexed views
Lastly, I'll show you how to utilize those index views to get your 18,000 records in ~200ms. Note the comments in my code:
USE tempdb -- somewhere safe for testing
GO
/****************************************************************************************
STEP 1: Prep and required tally table (a CTE Tally table will not do)
****************************************************************************************/;
IF OBJECT_ID('dbo.Job_TextContainsA') IS NOT NULL DROP VIEW dbo.Job_TextContainsA;
IF OBJECT_ID('dbo.Job_TextContainsF') IS NOT NULL DROP VIEW dbo.Job_TextContainsF;
IF OBJECT_ID('dbo.Job_TextSearch') IS NOT NULL DROP TABLE dbo.Job_TextSearch;
IF OBJECT_ID('dbo.Tally') IS NOT NULL DROP TABLE dbo.Tally;
CREATE TABLE dbo.Tally(N int NOT NULL)
INSERT dbo.Tally
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a, sys.all_columns b;
ALTER TABLE dbo.Tally ADD CONSTRAINT pk_tally_N PRIMARY KEY CLUSTERED(N);
ALTER TABLE dbo.Tally ADD CONSTRAINT uq_tally_N UNIQUE(N);
GO
/****************************************************************************************
STEP 2: Use the tally table to create the sample data
This runs for about 10 seconds on my PC and creates a simplified version of your
table with 330,000 records. ~5.5% of the JOB_Combined_Title_Description fields will
contain the text "accounting" or "finance".
****************************************************************************************/;
WITH
base AS
(
SELECT TOP(330000)
job_id = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
x = ABS(CHECKSUM(newid())%1000)+1,
fa = ABS(CHECKSUM(newid())%2)
FROM dbo.tally a, dbo.tally b
),
txt AS
(
SELECT
job_id,
job_Combined_Title_Description = REPLICATE(NEWID(),ABS(ABS(CHECKSUM(newid())%200)-100)),
x, fa
FROM base
)
SELECT
job_id,
job_title = ISNULL(CAST('Some Job Title...' AS varchar(100)),'x'),
job_combined_title_description =
ISNULL
((
CASE
WHEN x < 55
THEN STUFF
(
JOB_Combined_Title_Description,
ABS(CHECKSUM(newid())%100)+1,0,
CASE fa WHEN 1 THEN 'Accounting' ELSE 'Finance' END
)
ELSE JOB_Combined_Title_Description
END),CAST(NEWID() AS varchar(36))+'Finance')
INTO dbo.Job_TextSearch
FROM txt;
GO
ALTER TABLE dbo.Job_TextSearch ALTER COLUMN job_id int NOT NULL;
GO
ALTER TABLE dbo.Job_TextSearch ADD CONSTRAINT pk_strings PRIMARY KEY(job_id);
GO
/****************************************************************************************
STEP 3: Create a couple indexed views that filter for the text "accounting" of "finance"
The first view will contain all of the job_ids where job_Combined_Title_Description
contains "accounting" and another for "finance".
Notes:
1. The indexes will take about 2 minutes to build but you would only need to build them
once.
2. This could all be done in one indexed view but at a slight performance cost; for now
we'll use two.
3. Instead of a view you could do this in a table but an indexed view will be much easier
to maintain.
4. If you will need to search for more text than just "accounting" or "finance" we can
come back revise this solution.
****************************************************************************************/;
-- View for "accounting"
CREATE VIEW dbo.Job_TextContainsA
WITH SCHEMABINDING AS
SELECT job_id, AF='A'
FROM dbo.Job_TextSearch
CROSS JOIN dbo.Tally
WHERE N <= LEN(job_Combined_Title_Description)
AND LEN(JOB_Combined_Title_Description) >= 10
AND SUBSTRING(JOB_Combined_Title_Description,N,10) = 'Accounting';
GO
-- View for "finance"
CREATE VIEW dbo.Job_TextContainsF
WITH SCHEMABINDING AS
SELECT job_id, AF='F'
FROM dbo.Job_TextSearch
CROSS JOIN dbo.Tally
WHERE N <= LEN(JOB_Combined_Title_Description)
AND LEN(JOB_Combined_Title_Description) >= 7
AND SUBSTRING(JOB_Combined_Title_Description,N,7) = 'Finance';
GO
-- Create indexes on the views
CREATE UNIQUE CLUSTERED INDEX pk_ContainsA ON dbo.Job_TextContainsA (job_id, AF);
GO
CREATE UNIQUE CLUSTERED INDEX pk_ContainsF ON dbo.Job_TextContainsF (job_id, AF);
GO
Here's see how to retrieve the data and run a performance test:
SET STATISTICS TIME ON;
SELECT s.job_id, Job_title
FROM dbo.Job_TextSearch s
JOIN dbo.Job_TextContainsA a ON s.job_id = a.job_id
UNION ALL
SELECT s.job_id, Job_title
FROM dbo.Job_TextSearch s
JOIN dbo.Job_TextContainsA a ON s.job_id = a.job_id;
SET STATISTICS TIME OFF;
Test Results:
SQL Server Execution Times:
CPU time = 218 ms, elapsed time = 236 ms.
... and the query plan:
The one warning is: if this is a table that get's inserted and updated a lot or you're inserting/updating 1000's of rows at a time my solution will add some overhead (not too much but there is a cost there).
Edit: couple errors in my comments
-- Itzik Ben-Gan 2001
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply