string_agg issue

  • I have a string_agg line in my code that is bringing back hundreds of duplicates in my service_cd field. Any ideas? Thx!:

    mbr_name     codes

    abc                   971, 971, 971, 971..............

    xyz                   978, 978, 978, 978..............

    select distinct
    s.MBR_NAME
    , string_agg(cast(s.SERVICE_CD as varchar(max)), ', ') as codes
    from MyReporting.dbo.EPISODE as s
    where s.EPISODE_TYPE in ( x2, x3 )
    group by s.MBR_NAME
    , s.SERVICE_CD
  • Your DISTINCT is applied to the result of the STRING_AGG, not to the elements it contains.

    Remove the dupes in a CTE and select your STRING_AGG from that.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • What is the datatype of SERVICE_CD? CASTing every element to VARCHAR(MAX) sounds like madness.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • It's varchar(15). Casting as varchar(max) was the seemingly only way to avoid this error:

    "STRING_AGG aggregation result exceeded the limit of 8000 bytes. Use LOB types to avoid result truncation."

  • DaveBriCam wrote:

    It's varchar(15). Casting as varchar(max) was the seemingly only way to avoid this error:

    "STRING_AGG aggregation result exceeded the limit of 8000 bytes. Use LOB types to avoid result truncation."

    After getting rid of the dupes, hopefully the error goes away without the need for a CAST.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Maybe like this?:

    select 
    s.MBR_NAME
    , string_agg(cast(s.SERVICE_CD as varchar(max)), ', ') as codes
    from (
    select distinct MBR_NAME, SERVICE_CD
    FROM MyReporting.dbo.EPISODE
    where EPISODE_TYPE in ( x2, x3 )
    ) as s
    group by s.MBR_NAME

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Thx! I prefer, for now, to look for a subquery since my *real* SQL already has one CTE (I've used 20+ before). I should not have oversimplified my code, but below is what I'm really up against. I tried the below and the duplicates persist

    declare @startdate date = '2023-08-01';
    declare @enddate date = '2023-08-31';

    with CTE_SARids
    as -- this is for collecting all the 'SAR' numbers for a date range along with the patients
    (select distinct
    DH.ENC_IDN
    , DH.MBR_IDN
    , DH.HIST_ADDED_DATE
    from MyReporting.dbo.V_MODEL_EPISODE_DECISION_HIST as DH
    where DH.HIST_ADDED_DATE
    between @startdate and @enddate
    group by DH.ENC_IDN
    , DH.MBR_IDN
    , DH.HIST_ADDED_DATE)

    -- Bringing back rest of needed data
    select SAR.ENC_IDN as 'SAR ID'
    , M.INDIVIDUALIZATION_NUMBER as 'Patient ID'
    , M.MBR_NAME as 'Patient Name'
    , DH.CURRENT_DECISION as 'Status'

    , string_agg(cast(S.SERVICE_CD as varchar(max)), ', ') as 'PC Codes'

    , convert(date, DH.HIST_ADDED_DATE, 101) as 'Initial Submission Date'
    , convert(date, DH.LAST_CHANGED_DATE, 101) as 'Effective Date'
    from CTE_SARids as SAR
    inner join MyReporting.dbo.V_MODEL_EPISODES as e
    on SAR.ENC_IDN = e.ENC_IDN

    inner join
    (
    select distinct
    SERVICE_CD
    , ENC_IDN
    , SVC_EXTN_IDN
    from MyReporting.dbo.V_MODEL_EPISODE_SERVICES
    ) as S

    on S.ENC_IDN = e.ENC_IDN
    inner join MyReporting.dbo.V_MODEL_EPISODE_PROVIDERS as P
    on P.ENC_IDN = e.ENC_IDN
    inner join MyReporting.dbo.V_MODEL_AUTH_SERVICE_EXTN_HIST as EH
    on EH.ENC_IDN = S.ENC_IDN
    and EH.SVC_EXTN_IDN = S.SVC_EXTN_IDN
    inner join MyReporting.dbo.V_MODEL_EPISODE_DECISION_HIST as DH
    on DH.ENC_IDN = S.ENC_IDN
    and DH.SVC_EXTN_IDN = S.SVC_EXTN_IDN
    inner join MyReporting.dbo.V_MODEL_EPISODE_COVERAGE as EC
    on EC.ENC_IDN = S.ENC_IDN
    inner join MyReporting.dbo.V_MODEL_MEMBERS as M
    on M.MBR_IDN = e.MBR_IDN
    inner join JivaReporting.dbo.V_MODEL_MBR_COVERAGE as MC
    on MC.MBR_IDN = e.MBR_IDN
    and MC.ID_TYPE_CD = 'alt'
    where e.EPISODE_TYPE in ( 'Behavioral Health Inpatient', 'Behavioral Health Outpatient', 'Inpatient', 'OutPatient' )
    and cast(DH.UPDATED_DATE as date)
    between @startdate and @enddate
    and cast(e.RECEIVED_DATE as date) > '3/31/2023'
    and DH.CURRENT_DECISION <> '-'
    and DH.CURRENT_DECISION in ( 'Approved', 'Denied' )
    and S.SERVICE_CD in ( '97151', '97152', '97153', '97154', '97155', '97156', '97157' )
    group by M.INDIVIDUALIZATION_NUMBER
    , M.MBR_NAME
    , SAR.ENC_IDN
    , DH.HIST_ADDED_DATE
    , EC.CLIENT_NAME
    , DH.LAST_CHANGED_DATE
    , DH.CURRENT_DECISION
    , S.SERVICE_CD
    , SAR.ENC_IDN;






    :

  • Wow, don't be offended, but what a mess! It might be time to think about a divide and conquer approach. Build up your result sets individually and then assemble them at the end to output what you need, as this will be much easier to debug.

    Also, in the CTE, you are doing a SELECT DISTINCT and a GROUP BY, on the same columns. Remove the GROUP BY and you'll get the same results.

     

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • select SAR.ENC_IDN                                          as 'SAR ID'
    , M.INDIVIDUALIZATION_NUMBER as 'Patient ID'
    , M.MBR_NAME as 'Patient Name'
    , DH.CURRENT_DECISION as 'Status'

    , S.SERVICE_CDS as 'PC Codes' --<<--

    , convert(date, DH.HIST_ADDED_DATE, 101) as 'Initial Submission Date'
    , convert(date, DH.LAST_CHANGED_DATE, 101) as 'Effective Date'
    from ...
    inner join --<<--
    (
    select ENC_IDN,
    string_agg(SERVICE_CD, ', ') AS SERVICE_CDS
    from
    (
    select distinct
    SERVICE_CD
    , ENC_IDN
    from MyReporting.dbo.V_MODEL_EPISODE_SERVICES
    where S.SERVICE_CD in ( '97151', '97152', '97153', '97154', '97155', '97156', '97157' )
    ) as S0
    group by ENC_IDN
    ) as S
    on S.ENC_IDN = e.ENC_IDN
    ...
    /* shouldn't need nor want this
    group by M.INDIVIDUALIZATION_NUMBER --<<--
    , M.MBR_NAME
    , SAR.ENC_IDN
    , DH.HIST_ADDED_DATE
    , EC.CLIENT_NAME
    , DH.LAST_CHANGED_DATE
    , DH.CURRENT_DECISION
    , S.SERVICE_CD
    , SAR.ENC_IDN;
    */

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Thanks Scott! That worked

  • Great.  Glad it helped, and that the abbreviated way I posted it made sense.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • I have another pesky string_agg for which I'm trying to follow the example above but not getting the results like a field that has the aggregate like: "All, MH":

    SELECT DISTINCT
    clm.pat_id AS 'Consumer #',
    TRIM(pat.pat_ln) AS 'LastName',
    TRIM(pat.pat_fn) AS 'FirstName',
    TRIM(pat.pat_mn) AS 'MiddleName',
    ISNULL(
    (
    SELECT ins_desc
    FROM AlphaMCS_shc.dbo.tb_insurances
    WHERE clm.bp_id = ins_id
    ),
    ''
    ) AS 'Benefit Plan',
    CAST(pat.pat_dob AS DATE) AS 'DOB',
    cty.cnty_nm AS 'County',
    ad.state AS 'State',


    s.Diagnosis_Group AS 'DiagnosisGroup',

    pcodes.pc_desc AS 'ServiceDescription',
    clm.prv_id AS 'ProviderNK',
    sit.site_id AS 'SiteProviderID1',
    sit.site_nm AS 'SiteProviderName1',

    CAST(clm.clm_dos AS DATE) AS 'Date Of Service'
    FROM AlphaMCS_shc.dbo.vw_claims clm
    INNER JOIN AlphaMCS_shc.dbo.tb_patients pat
    ON clm.pat_id = pat.pat_id
    INNER JOIN AlphaMCS_shc.dbo.tb_pat_addresses AS ad
    ON ad.pat_id = pat.pat_id
    INNER JOIN AlphaMCS_shc.dbo.tb_pat_diags AS pd
    ON pd.pat_id = pat.pat_id
    INNER JOIN AlphaMCS_shc.dbo.tb_diag_to_diag_groups AS ddg
    ON ddg.dx_id = pd.dx_id
    INNER JOIN AlphaMCS_shc.dbo.tb_diag_groups AS dg
    ON dg.dx_grp_id = ddg.dx_grp_id
    INNER JOIN AlphaMCS_shc.dbo.tb_counties CTY
    ON cty.cnty_id = ad.cnty_id
    INNER JOIN AlphaMCS_shc.dbo.tb_providers prv
    ON prv.prv_id = clm.prv_id
    INNER JOIN AlphaMCS_shc.dbo.tb_sites sit
    ON sit.site_id = clm.site_id
    INNER JOIN AlphaMCS_shc.dbo.tb_proc_codes pcodes
    ON clm.pc_id = pcodes.pc_id

    INNER JOIN
    (
    select dx_grp_id,
    string_agg(dx_grp_desc, ', ') AS Diagnosis_Group
    from
    (
    select distinct dx_grp_id,
    dx_grp_desc
    from AlphaMCS_shc.dbo.tb_diag_groups
    WHERE dx_grp_desc IN ('All','AO','IDD','MH','SA','MH High End','MED')
    ) as S0
    group BY dx_grp_id
    ) as S
    on S.dx_grp_id = dg.dx_grp_id

    WHERE clm.clm_dos
    BETWEEN '2023-12-01' AND '2024-01-31'
    AND clm.rev_code = '0911'
    ORDER BY clm.pat_id, 'Date Of Service'
  • Got it to work with some CTE acrobatics:

    WITH First_CTE AS (
    SELECT DISTINCT
    clm.pat_id,
    pat.pat_ln,
    pat.pat_fn,
    pat.pat_mn,
    ISNULL(
    (
    SELECT ins_desc
    FROM AlphaMCS_shc.dbo.tb_insurances
    WHERE clm.bp_id = ins_id
    ),
    ''
    ) AS 'Benefit_Plan',
    pat.pat_dob,
    cty.cnty_nm,
    ad.state,
    dg.dx_grp_desc ,
    pcodes.pc_desc,
    clm.prv_id,
    sit.site_id,
    sit.site_nm,
    clm.clm_dos
    FROM AlphaMCS_shc.dbo.vw_claims clm
    INNER JOIN AlphaMCS_shc.dbo.tb_patients pat
    ON clm.pat_id = pat.pat_id
    INNER JOIN AlphaMCS_shc.dbo.tb_pat_addresses AS ad
    ON ad.pat_id = pat.pat_id
    INNER JOIN AlphaMCS_shc.dbo.tb_pat_diags AS pd
    ON pd.pat_id = pat.pat_id
    INNER JOIN AlphaMCS_shc.dbo.tb_diag_to_diag_groups AS ddg
    ON ddg.dx_id = pd.dx_id
    INNER JOIN AlphaMCS_shc.dbo.tb_diag_groups AS dg
    ON dg.dx_grp_id = ddg.dx_grp_id
    INNER JOIN AlphaMCS_shc.dbo.tb_counties CTY
    ON cty.cnty_id = ad.cnty_id
    INNER JOIN AlphaMCS_shc.dbo.tb_providers prv
    ON prv.prv_id = clm.prv_id
    INNER JOIN AlphaMCS_shc.dbo.tb_sites sit
    ON sit.site_id = clm.site_id
    INNER JOIN AlphaMCS_shc.dbo.tb_proc_codes pcodes
    ON clm.pc_id = pcodes.pc_id
    WHERE clm.clm_dos
    BETWEEN '2023-12-01' AND '2024-01-31'
    AND clm.rev_code = '0911'
    )
    ,Second_CTE
    AS
    (
    SELECT
    *
    FROM First_CTE
    GROUP BY
    pat_id,
    pat_ln,
    pat_fn,
    pat_mn,
    Benefit_Plan,
    pat_dob,
    cnty_nm,
    state,
    dx_grp_desc,
    pc_desc,
    prv_id,
    site_id,
    site_nm,
    clm_dos
    )
    SELECT
    pat_id AS 'Consumer #',
    TRIM(pat_ln) AS 'LastName',
    TRIM(pat_fn) AS 'FirstName',
    TRIM(pat_mn) AS 'MiddleName',
    Benefit_Plan AS 'Benefit Plan',
    CAST(pat_dob AS DATE) AS 'DOB',
    cnty_nm AS 'County',
    state AS 'State',
    string_agg(dx_grp_desc, ', ') AS 'DiagnosisGroup',
    pc_desc AS 'ServiceDescription',
    prv_id AS 'ProviderNK',
    site_id AS 'SiteProviderID1',
    site_nm AS 'SiteProviderName1',
    CAST(clm_dos AS DATE) AS 'Date Of Service'
    FROM Second_CTE
    GROUP BY
    pat_id,
    pat_ln,
    pat_fn,
    pat_mn,
    Benefit_Plan,
    pat_dob,
    cnty_nm,
    state,
    pc_desc,
    prv_id,
    site_id,
    site_nm,
    clm_dos
    ORDER BY [Consumer #]


Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply