July 16, 2014 at 12:12 pm
Hello everyone,
Can anyone please suggest me on how can I avoid using correlated subqueries in the following create view statement:
CREATE VIEW [dbo].[PDR_V_FUTURE_POLICY_VERS] AS
SELECT pol.Source_System_Identifier, pol.BATCH_NUM, pol.ACCT_NO, pol.stat_header_id, pol.POLICY_NO, pol.POLICY_NBR_SEQ, pol.POLICY_ID, pol.VERS_CNTL_MAJ_MIN, pol.vers_data_maj_min,
pol.DATA_SET_NBR, sts.cur_status_c AS attrb_data, pol.tran_eff_d policy_tran, sts.effective_d status_tran
FROM dbo.PDR_POL pol,
(SELECT t.acct_no, t.policy_id, t.cur_status_c, t.effective_d, t.process_dt
FROM dbo.pol_policy_status_v AS t
WHERE t.policy_status_v_id = (SELECT MAX(i.policy_status_v_id)
FROM dbo.pol_policy_status_v AS i
WHERE i.acct_no = t.acct_no
AND i.policy_id = t.policy_id
AND i.process_dt = (SELECT MAX(ii.process_dt)
FROM dbo.pol_policy_status_v AS ii
WHERE ii.acct_no = i.acct_no
AND ii.policy_id = i.policy_id
AND ii.effective_d = (SELECT MAX(iii.effective_d)
FROM dbo.pol_policy_status_v AS iii
WHERE iii.acct_no = ii.acct_no
AND iii.policy_id = ii.policy_id)))) AS sts
WHERE pol.acct_no = sts.acct_no
AND pol.policy_id = sts.policy_id
AND pol.data_set_nbr = (SELECT MAX(i.data_set_nbr)
FROM dbo.pdr_pol AS i
WHERE i.acct_no = pol.acct_no
AND i.policy_id = pol.policy_id
AND i.stat_tran_c not like '%CNCLPA')
AND (((pol.stat_tran_c = 'CANCEL' OR pol.stat_tran_c = 'CANREWRITE') AND sts.cur_status_c = 'CANCEL')
OR (pol.stat_tran_c in ('REINSTATE','REINWGAP') AND sts.cur_status_c <> 'CANCEL'
AND pol.vers_cntl_maj_min = (SELECT MAX(i.vers_cntl_maj_min)
FROM dbo.pdr_pol AS i
WHERE i.acct_no = pol.acct_no
AND i.policy_id = pol.policy_id
AND i.data_set_nbr = pol.data_set_nbr)
AND pol.vers_cntl_maj_min <> pol.vers_data_maj_min)
OR (pol.stat_tran_c NOT IN ('REINSTATE','REINWGAP') AND sts.cur_status_c <> 'CANCEL'
AND pol.vers_cntl_maj_min = (SELECT MAX(i.vers_cntl_maj_min)
FROM dbo.pdr_pol AS i
WHERE i.acct_no = pol.acct_no
AND i.policy_id = pol.policy_id
AND i.data_set_nbr = pol.data_set_nbr)
and pol.vers_cntl_maj_min = pol.vers_data_maj_min));
GO
Any suggestions would be highly appreciated.
Thanks
T-SQL beginner
July 16, 2014 at 12:29 pm
Can you provide the requirements of view? Something like:
"Return the Policy information for active policies based on the most recent status of the policy."
I know that isn't right, but it is a start based on what I'm seeing in the query. I also don't know enough about the schema of the database and the data to know relationship between columns. I wouldn't think that you'd have to navigate from max effective date to max process data to get the max policy status v id.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
July 16, 2014 at 12:35 pm
Thanks for your reply Jack.
Basically the reason for creating this view is to get the latest version of policy since there are a lot of of policies in the database. My management has asked me if I can improve the query in a way that it is easily understandable by the person reading it. So any suggestion on that end would be great ...
July 16, 2014 at 12:57 pm
T-SQL beginnger (7/16/2014)
Thanks for your reply Jack.Basically the reason for creating this view is to get the latest version of policy since there are a lot of of policies in the database. My management has asked me if I can improve the query in a way that it is easily understandable by the person reading it. So any suggestion on that end would be great ...
Okay, so what are the rules that determine the latest version of a policy? Where I work (at a company in the insurance business) the most recent version of a policy is the one with the most recent effective date not greater than today. In your case it looks like there are quite a few more rules than that.
For readability alone I'd start by breaking out the sub-queries into CTE's because I find them easier to read. So I might start with something like this:
WITH max_status_effective_d
AS (
SELECT
iii.acct_no,
iii.policy_id,
MAX(iii.effective_d) AS effective_d
FROM
dbo.pol_policy_status_v AS iii
GROUP BY
iii.acct_no,
iii.policy_id
),
max_process_dt
AS (
SELECT
ii.acct_no,
ii.policy_id,
MAX(ii.process_dt) AS process_dt
FROM
dbo.pol_policy_status_v AS ii
WHERE
EXISTS ( SELECT
1
FROM
max_status_effective_d AS ED
WHERE
ED.acct_no = ii.acct_no AND
ED.policy_id = ii.policy_id AND
ED.effective_d = ii.effective_d )
GROUP BY
ii.acct_no,
ii.policy_id
),
max_policy_status_v_id
AS (
SELECT
i.acct_no,
i.policy_id,
MAX(i.policy_status_v_id) AS policy_status_v_id
FROM
dbo.pol_policy_status_v AS i
WHERE
EXISTS ( SELECT
1
FROM
max_process_dt AS ED
WHERE
ED.acct_no = i.acct_no AND
ED.policy_id = i.policy_id AND
ED.process_dt = i.process_dt )
GROUP BY
i.acct_no,
i.policy_id
),
policy_status
AS (
SELECT
t.acct_no,
t.policy_id,
t.cur_status_c,
t.effective_d,
t.process_dt
FROM
dbo.pol_policy_status_v AS t
WHERE
EXISTS ( SELECT
1
FROM
max_policy_status_v_id AS i
WHERE
i.acct_no = t.acct_no AND
i.policy_id = t.policy_id AND
i.policy_status_v_id = max_policy_status_v_id.policy_status_v_id )
)
SELECT
pol.Source_System_Identifier,
pol.BATCH_NUM,
pol.ACCT_NO,
pol.stat_header_id,
pol.POLICY_NO,
pol.POLICY_NBR_SEQ,
pol.POLICY_ID,
pol.VERS_CNTL_MAJ_MIN,
pol.vers_data_maj_min,
pol.DATA_SET_NBR,
sts.cur_status_c AS attrb_data,
pol.tran_eff_d policy_tran,
sts.effective_d status_tran
FROM
dbo.PDR_POL pol
JOIN policy_status AS PS
ON pol.acct_no = PS.acct_no AND
pol.policy_id = PS.policy_id
WHERE
pol.data_set_nbr = (
SELECT
MAX(i.data_set_nbr)
FROM
dbo.pdr_pol AS i
WHERE
i.acct_no = pol.acct_no AND
i.policy_id = pol.policy_id AND
i.stat_tran_c NOT LIKE '%CNCLPA'
) AND
(
(
(
pol.stat_tran_c = 'CANCEL' OR
pol.stat_tran_c = 'CANREWRITE'
) AND
PS.cur_status_c = 'CANCEL'
) OR
(
pol.stat_tran_c IN ('REINSTATE', 'REINWGAP') AND
PS.cur_status_c <> 'CANCEL' AND
pol.vers_cntl_maj_min = (
SELECT
MAX(i.vers_cntl_maj_min)
FROM
dbo.pdr_pol AS i
WHERE
i.acct_no = pol.acct_no AND
i.policy_id = pol.policy_id AND
i.data_set_nbr = pol.data_set_nbr
) AND
pol.vers_cntl_maj_min <> pol.vers_data_maj_min
) OR
(
pol.stat_tran_c NOT IN ('REINSTATE', 'REINWGAP') AND
PS.cur_status_c <> 'CANCEL' AND
pol.vers_cntl_maj_min = (
SELECT
MAX(i.vers_cntl_maj_min)
FROM
dbo.pdr_pol AS i
WHERE
i.acct_no = pol.acct_no AND
i.policy_id = pol.policy_id AND
i.data_set_nbr = pol.data_set_nbr
) AND
pol.vers_cntl_maj_min = pol.vers_data_maj_min
)
);
I know that there is a lot more that could be done for readability and performance as it seems like the query hits the same tables over and over and that could be reduced.
If you read links 1 & 3 in my signature and posted as suggested in them I think I or someone else could be even more help.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
July 16, 2014 at 1:03 pm
and another possible approach. ** although without test data I can't be sure. 🙂
I'd also recommend coming up with another way to identify the data_set_nbr. That is what is really going to kill your performance.
CREATE VIEW [dbo].[PDR_V_FUTURE_POLICY_VERS]
AS
BEGIN
;WITH
sts
AS (SELECT acct_no, policy_id, cur_status_c
,MAX(effective_d) OVER (PARTITION BY acct_no, policy_id) AS maxEffDate
,MAX(process_dt) OVER (PARTITION BY acct_no, policy_id) AS maxPrcDate
,MAX(policy_status_v_id) OVER (PARTITION BY acct_no, policy_id) AS maxStatus
FROM dbo.pol_policy_status_v AS t)
SELECT
pol.Source_System_Identifier,
pol.BATCH_NUM,
pol.ACCT_NO,
pol.stat_header_id,
pol.POLICY_NO,
pol.POLICY_NBR_SEQ,
pol.POLICY_ID,
pol.VERS_CNTL_MAJ_MIN,
pol.vers_data_maj_min,
pol.DATA_SET_NBR,
sts.cur_status_c AS attrb_data,
pol.tran_eff_d as policy_tran,
sts.maxEffDate AS status_tran
FROM dbo.PDR_POL pol
INNER JOIN sts
ON sts.acct_no = pol.acct_no
AND sts.policy_id = pol.policy_id
AND sts.maxPrcDate = pol.process_dt
AND sts.maxEffDate = pol.effective_d
AND sts.maxStatus = pol.policy_status_v_id
WHERE
AND
pol.data_set_nbr = (SELECT MAX(i.data_set_nbr)
FROM dbo.pdr_pol AS i
WHERE i.acct_no = pol.acct_no
AND i.policy_id = pol.policy_id
AND i.stat_tran_c not like '%CNCLPA')
AND (((pol.stat_tran_c = 'CANCEL' OR pol.stat_tran_c = 'CANREWRITE') AND sts.cur_status_c = 'CANCEL')
OR (pol.stat_tran_c in ('REINSTATE','REINWGAP') AND sts.cur_status_c <> 'CANCEL'
AND pol.vers_cntl_maj_min = (SELECT MAX(i.vers_cntl_maj_min)
FROM dbo.pdr_pol AS i
WHERE i.acct_no = pol.acct_no
AND i.policy_id = pol.policy_id
AND i.data_set_nbr = pol.data_set_nbr)
AND pol.vers_cntl_maj_min <> pol.vers_data_maj_min)
OR (pol.stat_tran_c NOT IN ('REINSTATE','REINWGAP') AND sts.cur_status_c <> 'CANCEL'
AND pol.vers_cntl_maj_min = (SELECT MAX(i.vers_cntl_maj_min)
FROM dbo.pdr_pol AS i
WHERE i.acct_no = pol.acct_no
AND i.policy_id = pol.policy_id
AND i.data_set_nbr = pol.data_set_nbr)
and pol.vers_cntl_maj_min = pol.vers_data_maj_min));
END
GO
______________________________________________________________________
Personal Motto: Why push the envelope when you can just open it?
If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.
Jason L. SelburgJuly 16, 2014 at 1:10 pm
Jason Selburg (7/16/2014)
and another possible approach. ** although without test data I can't be sure. 🙂I'd also recommend coming up with another way to identify the data_set_nbr. That is what is really going to kill your performance.
I agree with all that. I thought about doing the CTE you did, but I wasn't confident that it would be the same as the nested sub-queries (again, no test data)m, so I just went for something I find easier to read to start with.
I also think leveraging ROW_NUMBER to get to the most recent status row may be faster. I've seen examples where using row_number instead of MAX to get a specific row has been more efficient.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
July 16, 2014 at 1:20 pm
Jack Corbett (7/16/2014)
I also think leveraging ROW_NUMBER to get to the most recent status row may be faster. I've seen examples where using row_number instead of MAX to get a specific row has been more efficient.
Funny, that was the way I wrote it first.
______________________________________________________________________
Personal Motto: Why push the envelope when you can just open it?
If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.
Jason L. SelburgJuly 16, 2014 at 1:21 pm
I made a few changes to give you an idea. They might be wrong, but hopefully you can modify them accordingly.
WITH PStatus AS(
SELECT t.acct_no
,t.policy_id
,t.cur_status_c
,t.effective_d
,t.process_dt
,ROW_NUMBER() OVER(PARTITION BY acct_no,
policy_id
ORDER BY effective_id DESC,
process_dt DESC,
policy_status_v_id DESC ) rn
FROM dbo.pol_policy_status_v
)
SELECT pol.Source_System_Identifier
,pol.BATCH_NUM
,pol.ACCT_NO
,pol.stat_header_id
,pol.POLICY_NO
,pol.POLICY_NBR_SEQ
,pol.POLICY_ID
,pol.VERS_CNTL_MAJ_MIN
,pol.vers_data_maj_min
,pol.DATA_SET_NBR
,sts.cur_status_c AS attrb_data
,pol.tran_eff_d policy_tran
,sts.effective_d status_tran
FROM dbo.PDR_POL pol
JOIN PStatus AS sts ON pol.acct_no = sts.acct_no
AND pol.policy_id = sts.policy_id
WHERE sts.rn = 1 --The most recent from pol_policy_status_v (or something like that)
AND pol.data_set_nbr = (
SELECT MAX(i.data_set_nbr)
FROM dbo.pdr_pol AS i
WHERE i.acct_no = pol.acct_no
AND i.policy_id = pol.policy_id
AND i.stat_tran_c NOT LIKE '%CNCLPA'
)
AND (
(
pol.stat_tran_c IN( 'CANCEL', 'CANREWRITE')
AND sts.cur_status_c = 'CANCEL'
)
OR (
pol.stat_tran_c IN ('REINSTATE','REINWGAP')
AND sts.cur_status_c <> 'CANCEL'
AND pol.vers_cntl_maj_min = (
SELECT MAX(i.vers_cntl_maj_min)
FROM dbo.pdr_pol AS i
WHERE i.acct_no = pol.acct_no
AND i.policy_id = pol.policy_id
AND i.data_set_nbr = pol.data_set_nbr
)
AND pol.vers_cntl_maj_min <> pol.vers_data_maj_min
)
OR (
pol.stat_tran_c NOT IN ('REINSTATE','REINWGAP')
AND sts.cur_status_c <> 'CANCEL'
AND pol.vers_cntl_maj_min = (
SELECT MAX(i.vers_cntl_maj_min)
FROM dbo.pdr_pol AS i
WHERE i.acct_no = pol.acct_no
AND i.policy_id = pol.policy_id
AND i.data_set_nbr = pol.data_set_nbr
)
AND pol.vers_cntl_maj_min = pol.vers_data_maj_min
)
);
July 16, 2014 at 1:27 pm
Thanks a lot guys for your help. I actually joined this insurance company like a week ago and they have asked me to help improve this code.
I understand that you guys don't have the test data which doesn't helps the cause.
I actually tried running both the queries and for Jack's solution I get the following error:
Msg 4104, Level 16, State 1, Line 73
The multi-part identifier "max_policy_status_v_id.policy_status_v_id" could not be bound.
whereas for Jason's solution I am getting these errors:
Msg 207, Level 16, State 1, Line 26
Invalid column name 'process_dt'.
Msg 207, Level 16, State 1, Line 28
Invalid column name 'policy_status_v_id'
I will try to check the underlying tables and fix these errors to see if I get the same result set as with the original view and let you know.
Thanks again for your, really appreciate it !
July 16, 2014 at 1:31 pm
For the error in mine you just need to replace the max_policy_status_v_id to i because I aliased the CTE in the FROM clause.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
July 16, 2014 at 1:38 pm
I actually changed it to t and sts aliases to PS later in the code and it works fine.
I'm getting the same result set as original query.
Thanks
July 16, 2014 at 1:40 pm
T-SQL beginnger (7/16/2014)
I actually changed it to t and sts aliases to PS later in the code and it works fine.I'm getting the same result set as original query.
Thanks
Not saying that my approach is better, but you should always try different methods to see which performs better in your environment. ** just good practice IMHO.
______________________________________________________________________
Personal Motto: Why push the envelope when you can just open it?
If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.
Jason L. SelburgJuly 16, 2014 at 1:43 pm
Jason, I tried fixing the errors with your code and now I'm getting this error:
Msg 206, Level 16, State 2, Line 1
Operand type clash: date is incompatible with int
which is weird but I'm still looking into it
July 16, 2014 at 1:45 pm
Yeah, I will definitely try all three solutions and see which one is better.
July 16, 2014 at 1:46 pm
Jason Selburg (7/16/2014)
Not saying that my approach is better, but you should always try different methods to see which performs better in your environment. ** just good practice IMHO.
I'm pretty confident your approach will out-perform what I did because I didn't even try to improve performance, just readability. I just wasn't confident enough that I'd get the logic even close to the same, which is why I had the questions in my post with the code.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
Viewing 15 posts - 1 through 15 (of 26 total)
You must be logged in to reply to this topic. Login to reply