May 23, 2008 at 10:31 am
Calling for TSQL help
I need help altering a User Defined Function. WHat I need it to do, is take a string of characters and count the number of specific character appears in a specific location. The specific location changes. When I first arrived it was done by cursor...so I updated it to this
CREATE FUNCTION [dbo].[ufn_SubDomScore_MC] ( @pInput VARCHAR(1000), @SUBDOMAIN VARCHAR(100), @Grade VARCHAR(5), @Subject VARCHAR(5), @form VARCHAR(5), @Qtype VARCHAR(5), @pSearchChar CHAR(1) )
RETURNS INT
BEGIN
DECLARE @GetItemNoCURSOR
DECLARE @vCount INT
-- Get position numbers of item in string to count
DECLARE @ITEMNO TABLE (
ITEMNO INT NOT NULL)
INSERT INTO @ITEMNO
SELECT
ITEMNO
FROM
WK_SDT_QUESTIONS
WHERE
SUBDOMAIN = @SUBDOMAIN
AND grade = @grade
AND [subject] = @Subject
AND form = @form
AND qtype = @qtype
DECLARE @COUNT TABLE (VCOUNT VARCHAR(5))
INSERT @COUNT (VCOUNT)
SELECT SUBSTRING(@pInput, ITEMNo, 1)
FROM @ITEMNO
SELECT @vCount = (SELECT COUNT(*) FROM @COUNT WHERE VCOUNT = '+')
RETURN @vCount
END
Which did increase the speed, however, I think using a CTE will really knock this out of the park. I attempted to do with this code
CREATE FUNCTION [dbo].[ufn_SubDomScore2_MC] ( @pInput VARCHAR(1000), @SUBDOMAIN VARCHAR(100), @Grade VARCHAR(5), @Subject VARCHAR(5), @form VARCHAR(5), @Qtype VARCHAR(5), @pSearchChar CHAR(1) )
RETURNS INT
BEGIN
DECLARE @vCount INT
SET @vCount = 0;
DECLARE @N VARCHAR(5);
WITH GETITEMNO (ITEMNO) AS
(SELECT
CAST(ITEMNO AS INT)
FROM
WK_SDT_QUESTIONS
WHERE
SUBDOMAIN = @SUBDOMAIN
AND grade = @grade
AND [subject] = @Subject
AND form = @form
AND qtype = @qtype
)
SELECT @N = SUBSTRING(@pInput, ITEMNO, 1) FROM GETITEMNO
BEGIN
IF @N = @pSearchChar
SET @vCount = @vCount + 1
END
RETURN @vCount
END
However, it appears to only bringing back the count from the last line of the CTE. Is there a way to do this with a CTE and if so, what am I missing?
The DDL for the WK_SDT_QUESTION table is as follows:
CREATE TABLE [dbo].[WK_SDT_QUESTIONS](
[QUESTION_ID] [bigint] IDENTITY(1,1) NOT NULL,
[grade] [varchar](5) NULL,
[subject] [varchar](5) NULL,
[form] [varchar](5) NULL,
[qtype] [varchar](5) NULL,
[ITEMNO] [varchar](10) NULL,
[DROPPED] [varchar](5) NULL,
[ORGANIZER] [varchar](50) NULL,
[SUBDOMAIN] [varchar](50) NULL
) ON [User]
GO
I attached 100 rows of data for the table.
Thanks
Marvin Dillard
Senior Consultant
Claraview Inc
May 23, 2008 at 3:15 pm
the function body can be replaced with a single select.
CREATE FUNCTION [dbo].[ufn_SubDomScore_MC] (
@pInput VARCHAR(1000), @SUBDOMAIN VARCHAR(100),
@Grade VARCHAR(5), @Subject VARCHAR(5),
@form VARCHAR(5), @Qtype VARCHAR(5),
@pSearchChar CHAR(1) )
RETURNS INT
as
begin
declare @VCOUNT int
SELECT @VCOUNT = count(*)
FROM
WK_SDT_QUESTIONS
WHERE
SUBDOMAIN = @SUBDOMAIN
AND grade = @grade
AND [subject] = @Subject
AND form = @form
AND qtype = @qtype
and substring( @pInput, ITEMNO, 1 ) = '+'
return @VCOUNT
end
May 23, 2008 at 3:32 pm
Antonio
Thanks, I'll give it a whirl and compare the response time. I'll post back once I get the results
MD
Marvin Dillard
Senior Consultant
Claraview Inc
May 23, 2008 at 8:52 pm
Antonio,
This saved about 25% of the time. The SP that uses the function was able to produce 7M rows in 4 Hours.
Thanks
Marvin Dillard
Senior Consultant
Claraview Inc
May 24, 2008 at 9:33 am
Marvin,
I'm a bit confused and I'm obviously missing something.... I looked at the 100 rows of test data you attached... Without me wading through all the code you posted, what is it that you're actually trying to do (brief but clear description)? You said you created 7M rows... can you describe the process verbally instead of me reading the code you don't want? Thanks.
P.S. 4 Hours to gen 7M rows is a bit slow... that's why I'm asking for a clearer definition.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 24, 2008 at 1:19 pm
Jeff
What I have is a stored proc that passes a string of "+" and "-" which represents whether a student received a positive score on a particular question. The string is usually 40 characters. Each character in that string can represent a different skill that was assessed, however, they are not all in a row. Say, on a math test, the first character of the string might represent geometry, the second character addition and subtraction, the third might be geometry again. Normally there are around 4 to 20 different skills represented in the string.
The table that I posted corresponds to the various skills with the itemno field being the exact location in the string that applies to that certian skill. So what the UDF is doing is taking determining which skill each character in the string represents and rather or not the student received a positive or negative score for that question. After all is said it done, the return provides the raw score for each skill to the stored proc to merge with the other information.
Hope this definition makes since. 4 Hours was fast, as when I first arrived it was accommplished with a cursor which took about 5 to 6 hours to process the same amount of data.
Hope this explanation helps.
Thanks
Marvin
Marvin Dillard
Senior Consultant
Claraview Inc
May 24, 2008 at 2:35 pm
Jeff
Now you made me think, why even use a UDF. I could just parse the string in a CTE and do a join there. That would eliminate the call to the UDF which should speed things up.
I'll let you know how it turns out.
Thanks
Marvin
Marvin Dillard
Senior Consultant
Claraview Inc
May 24, 2008 at 3:55 pm
Heh... post the other data from the other table and let's see if we can get it down to a couple of minutes, eh? 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
May 24, 2008 at 5:01 pm
Here is the assessment table
CREATE TABLE [dbo].[LD_ASSESSMENT](
[CODEOD] [varchar](50) NULL,
[CODEOGOD] [varchar](50) NULL,
[CODEPT] [varchar](50) NULL,
[CODEOGPT] [varchar](50) NULL,
[YEAR] [varchar](50) NULL,
[GRADE] [varchar](50) NULL,
[LITHO] [varchar](50) NULL,
[LNAME] [varchar](50) NULL,
[FNAME] [varchar](50) NULL,
[MI] [varchar](50) NULL,
[STUD_ID] [varchar](50) NULL,
[IDSOURCE] [varchar](50) NULL,
[MC_SUO] [varchar](50) NULL,
[BDATE] [varchar](50) NULL,
[SEX] [varchar](50) NULL,
[ETHNIC] [varchar](50) NULL,
[PARTICIPATE] [varchar](50) NULL,
[ACADEMIC_YR] [varchar](50) NULL,
[TEST_TYPE] [varchar](50) NULL,
[EXC] [varchar](50) NULL,
[EXC_REASON] [varchar](50) NULL,
[MIG] [varchar](50) NULL,
[ELIG] [varchar](50) NULL,
[BAS] [varchar](50) NULL,
[LEP] [varchar](50) NULL,
[LEP_ACCOM] [varchar](50) NULL,
[LEP_PARA] [varchar](50) NULL,
[LEP_ENGL] [varchar](50) NULL,
[LEP_INT] [varchar](50) NULL,
[LEP_BILNG] [varchar](50) NULL,
[LEP_CUE] [varchar](50) NULL,
[LEP_ORAL] [varchar](50) NULL,
[LEP_ONLINE] [varchar](50) NULL,
[LEP_TECH] [varchar](50) NULL,
[LEP_DICT] [varchar](50) NULL,
[LEP_EXTEND] [varchar](50) NULL,
[LEP_OTH] [varchar](50) NULL,
[LEP_PSP] [varchar](50) NULL,
[LEP_EXIT] [varchar](50) NULL,
[EXT] [varchar](50) NULL,
[EXT_BEFORE] [varchar](50) NULL,
[EXT_DURING] [varchar](50) NULL,
[EXT_AFTER] [varchar](50) NULL,
[EXT_INTSES] [varchar](50) NULL,
[EXT_SUMMER] [varchar](50) NULL,
[IEP] [varchar](50) NULL,
[P504] [varchar](50) NULL,
[DISABLTY] [varchar](50) NULL,
[ACMOD_U] [varchar](50) NULL,
[ACMOD_C] [varchar](50) NULL,
[ACM_PARA] [varchar](50) NULL,
[ACM_INT] [varchar](50) NULL,
[ACM_ORAL] [varchar](50) NULL,
[ACM_CUE] [varchar](50) NULL,
[ACM_TECH] [varchar](50) NULL,
[ACM_ONLINE] [varchar](50) NULL,
[ACM_DIC] [varchar](50) NULL,
[ACM_AUD] [varchar](50) NULL,
[ACM_OTH] [varchar](50) NULL,
[ACM_EXTEND] [varchar](50) NULL,
[GIFT_COG] [varchar](50) NULL,
[GIFT_NONCOG] [varchar](50) NULL,
[GIFT_POOL] [varchar](50) NULL,
[GIFT_TLT] [varchar](50) NULL,
[GIFT_SRV] [varchar](50) NULL,
[LUNCH] [varchar](50) NULL,
[VOC_SEQ] [varchar](50) NULL,
[VOC_NOT] [varchar](50) NULL,
[EXCWP] [varchar](50) NULL,
[EXCWP_REASON] [varchar](50) NULL,
[EXCAP] [varchar](50) NULL,
[EXCAP_REASON] [varchar](50) NULL,
[ORRS_RD] [varchar](50) NULL,
[ORRS_MA] [varchar](50) NULL,
[ORRS_SC] [varchar](50) NULL,
[ORRS_SS] [varchar](50) NULL,
[ORRS_AH] [varchar](50) NULL,
[ORRS_PL] [varchar](50) NULL,
[ORWS_RD] [varchar](50) NULL,
[ORWS_MA] [varchar](50) NULL,
[ORWS_SC] [varchar](50) NULL,
[ORWS_SS] [varchar](50) NULL,
[ORWS_AH] [varchar](50) NULL,
[ORWS_PL] [varchar](50) NULL,
[MCRS_RD] [varchar](50) NULL,
[MCRS_MA] [varchar](50) NULL,
[MCRS_SC] [varchar](50) NULL,
[MCRS_SS] [varchar](50) NULL,
[MCRS_AH] [varchar](50) NULL,
[MCRS_PL] [varchar](50) NULL,
[MCAT_RD] [varchar](50) NULL,
[MCAT_MA] [varchar](50) NULL,
[MCAT_SC] [varchar](50) NULL,
[MCAT_SS] [varchar](50) NULL,
[MCAT_AH] [varchar](50) NULL,
[MCAT_PL] [varchar](50) NULL,
[SSSE_RDE] [varchar](50) NULL,
[SSSE_MAE] [varchar](50) NULL,
[SSSE_SCE] [varchar](50) NULL,
[SSSE_SSE] [varchar](50) NULL,
[SSSE_AHE] [varchar](50) NULL,
[SSSE_PLE] [varchar](50) NULL,
[SS_RDE] [varchar](50) NULL,
[SS_MAE] [varchar](50) NULL,
[SS_SCE] [varchar](50) NULL,
[SS_SSE] [varchar](50) NULL,
[SS_AHE] [varchar](50) NULL,
[SS_PLE] [varchar](50) NULL,
[SS_OWE] [varchar](50) NULL,
[TOTRD] [varchar](50) NULL,
[TOTMA] [varchar](50) NULL,
[TOTSC] [varchar](50) NULL,
[TOTSS] [varchar](50) NULL,
[TOTAH] [varchar](50) NULL,
[TOTPL] [varchar](50) NULL,
[OR_TASK] [varchar](50) NULL,
[OR_WR1] [varchar](50) NULL,
[WR_SCORE] [varchar](50) NULL,
[WP_FIN] [varchar](50) NULL,
[WP_SCORE] [varchar](50) NULL,
[AP_FIN] [varchar](50) NULL,
[RDFORM] [varchar](50) NULL,
[RD_MC] [varchar](50) NULL,
[RD_OR] [varchar](50) NULL,
[MAFORM] [varchar](50) NULL,
[MA_MC] [varchar](50) NULL,
[MA_OR] [varchar](50) NULL,
[SCFORM] [varchar](50) NULL,
[SC_MC] [varchar](50) NULL,
[SC_OR] [varchar](50) NULL,
[SSFORM] [varchar](50) NULL,
[SS_MC] [varchar](50) NULL,
[SS_OR] [varchar](50) NULL,
[AHFORM] [varchar](50) NULL,
[AH_MC] [varchar](50) NULL,
[AH_OR] [varchar](50) NULL,
[PLFORM] [varchar](50) NULL,
[PL_MC] [varchar](50) NULL,
[PL_OR] [varchar](50) NULL,
[NRT_RDLG_RSP] [varchar](50) NULL,
[NRT_MA_RSP] [varchar](50) NULL,
[EPAS_EN] [varchar](50) NULL,
[EPAS_MA] [varchar](50) NULL,
[EPAS_RD] [varchar](50) NULL,
[EPAS_SC] [varchar](50) NULL,
[EPAS_COMPOSITE] [varchar](50) NULL,
[WOFORM] [varchar](50) NULL,
[WO_MC] [varchar](50) NULL,
[SSID] [varchar](50) NULL
) ON [User]
And this is the current stored proc
(the snippet for reading. I do a union all for all of the different assessments
SELECT
LEFT(CODEOGOD, 3) AS 'District_Code',
CASE
WHEN EXCWP_REASON = '06' THEN 'ALT ASMNT'
WHEN EXCAP_REASON = '06' THEN 'ALT ASMNT'
ELSE 'KCCT'
END 'Test_Description',
LD_ASSESSMENT.[YEAR] + '-06-30' AS 'Assessment_School_Year_Date',
'READING ' +
CAST(LD_ASSESSMENT.[GRADE] AS VARCHAR(2)) + ' ' +
CASE
WHEN RDFORM IS NULL THEN ''
ELSE CAST(RDFORM AS VARCHAR(10))
END
+' ' + 'SPRING' AS 'Item_Description',
LD_ASSESSMENT.[YEAR] + '-04-23' AS 'Test_Date',
SSID AS 'Student_ID',
'ORGANIZER ID -' + CAST(b.ORGANIZER_ID AS VARCHAR(10)) AS 'Item_Response_Description',
NULL AS 'Alpha_Value',
[dbo].[ufn_Organizer_MC](RD_MC, b.ORGANIZER, b.grade, b.subject, b.form, b.qtype, '+') AS 'Numeric_Value',
'N' AS 'Level_Of_Aggregation',
LD_ASSESSMENT.[YEAR] + '-06-30' AS 'School_Year',
CASE
WHEN LEN(CODEOGOD) >= 4 THEN RIGHT(CODEOGOD, 3)
ELSE '000'
END 'Location_Code',
NULL AS 'Achievement_Level',
NULL AS 'Staff_ID'
FROM LD_SDT
LEFT OUTER JOIN WK_SDT_ORGANIZER B ON
LD_ASSESSMENT.GRADE = b.grade
AND b.FORM = LD_ASSESSMENT.RDFORM
AND b.Subject = 'RD'
AND b.qtype = 'MC'
WHERE LD_ASSESSMENT.GRADE IN (SELECT GRADE FROM WK_ASSESSMENT_GRADE_05 WHERE SUBJECT_CD ='RD')
Marvin Dillard
Senior Consultant
Claraview Inc
May 24, 2008 at 5:35 pm
Jeff
I just did the same with a CTE instead of a call to a UDF. Here is the snippet.
WITH GETRDMCSCORE (SSID, GRADE, RD_MC_SCORE, organizer) AS
(SELECT b.SSID, a.GRADE, COUNT(b.RD_MC) AS RD_MC_SCORE, a.ORGANIZER
FROM
WK_SDT_QUESTIONS a
LEFT OUTER JOIN LD_SDT b ON
a.form = b.rdFORM
AND a.GRADE = b.GRADE
WHERE a.subject = 'RD'
AND a.qtype = 'MC'
and substring(b.RD_MC, CAST(a.ITEMNO AS INT), 1 ) = '+'
GROUP BY b.SSID, a.ORGANIZER, a.grade)
SELECT
LEFT(CODEOGOD, 3) AS 'District_Code',
CASE
WHEN EXCWP_REASON = '06' THEN 'ALT ASMNT'
WHEN EXCAP_REASON = '06' THEN 'ALT ASMNT'
ELSE 'KCCT'
END 'Test_Description',
LD_ASSESSMENT.[YEAR] + '-06-30' AS 'Assessment_School_Year_Date',
'READING ' +
CAST(LD_ASSESSMENT.[GRADE] AS VARCHAR(2)) + ' ' +
CASE
WHEN RDFORM IS NULL THEN ''
ELSE CAST(RDFORM AS VARCHAR(10))
END
+' ' + 'SPRING' AS 'Item_Description',
LD_ASSESSMENT.[YEAR] + '-04-23' AS 'Test_Date',
LD_ASSESSMENT.SSID AS 'Student_ID',
'ORGANIZER ID -' + CAST(b.ORGANIZER_ID AS VARCHAR(10)) AS 'Item_Response_Description',
NULL AS 'Alpha_Value',
c.RD_MC_SCORE AS 'Numeric_Value',
'N' AS 'Level_Of_Aggregation',
LD_ASSESSMENT.[YEAR] + '-06-30' AS 'School_Year',
CASE
WHEN LEN(CODEOGOD) >= 4 THEN RIGHT(CODEOGOD, 3)
ELSE '000'
END 'Location_Code',
NULL AS 'Achievement_Level',
NULL AS 'Staff_ID'
FROM LD_SDT
LEFT OUTER JOIN WK_SDT_ORGANIZER B ON
LD_ASSESSMENT.GRADE = b.grade
AND b.FORM = LD_ASSESSMENT.RDFORM
AND b.Subject = 'RD'
AND b.qtype = 'MC'
LEFT OUTER JOIN GETRDMCSCORE c ON
LD_ASSESSMENT.SSID = c.SSID
AND c.ORGANIZER = b.ORGANIZER
WHERE LD_ASSESSMENT.GRADE IN (SELECT GRADE FROM WK_ASSESSMENT_GRADE_05 WHERE SUBJECT_CD ='RD')
The results were outstanding, it returned 543165 rows in a little over 9 Minutes. At that rate, 7M rows should return in less than two hours which is half the time of calling the UDF 7M times.
Thanks for prompting me to look outside of the UDF
Marvin
Marvin Dillard
Senior Consultant
Claraview Inc
May 24, 2008 at 5:48 pm
FROM LD_SDT
LEFT OUTER JOIN WK_SDT_ORGANIZER B ON
LD_ASSESSMENT.GRADE = b.grade
AND b.FORM = LD_ASSESSMENT.RDFORM
AND b.Subject = 'RD'
AND b.qtype = 'MC'
WHERE LD_ASSESSMENT.GRADE IN (SELECT GRADE FROM WK_ASSESSMENT_GRADE_05 WHERE SUBJECT_CD ='RD')
... and therein lies the source of the performance problem... a correlated subquery on the table called LD_ASSESSMENT. The outer query (that you've not included) has a join to LD_ADDESSMENT and you are using it in these inner unioned queries as a correlation.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 24, 2008 at 6:03 pm
Jeff
Actually, I was mocking up the query. I guess I missed that one. LD_SDT was changed to LD_ASSESSMENT but I guess my search and replace missed that one. So in reality they are both one in the same.
Marvin Dillard
Senior Consultant
Claraview Inc
May 24, 2008 at 8:43 pm
Marvin,
So, is the CTE a good enough solution for you?
Shifting gears a bit... its pretty clever how they put the test question assessment list together so that it works against a string of +/- answers. It's a PITA, but it's still clever, none the less.
One of the things I'm not quickly seeing in the code, is a filter to ensure you don't pick up any dropped questions.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 24, 2008 at 11:11 pm
Jeff
The CTE is good enough, heck, to go from 6 to 7 hours down to two is a vast improvement. There is a mechanism to ID the dropped questions, but I haven't added it yet but will to the final code I present on Tuesday.
Thanks for all of your help
Marvin
Marvin Dillard
Senior Consultant
Claraview Inc
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply