How to filter out the data?

  • Hello All,

    Need your help to filter out the data, please. I couldn't find a better term to describe, but hope the example below will instantly clarify what I need:

    1. There is a range of simple physical barcodes, applied on imported products - I am sure you have seen many examples in supermarkets around you. Let's assume for a given order the physical barcodes are numbered from 1 to 100. Some of those barcodes could remain unused and will be utilised with the next order, some can be cancelled, e.g. the glue is dry and they could not be applied on a product, etc. Below is a SQL representation of what might be.

    For order id = 1 I want to retrieve eventually a list, showing something like the list below.

    I can implement in C#, using loops, but I wonder if it is possible to have it done in SQL, on a server side, please?

    Many thanks in advance, as always!

    Code_TypeStart_IdEnd_Id
    all_codes_set15
    all_codes_set714
    all_codes_set1749
    all_codes_set61100
    voided66
    cancelled5060
    unused1516

     

     

     

    DECLARE @v_order_id INT = 1;
    WITH myData AS(
    SELECT 'all_codes_set' AS code_type,
    1 AS start_id,
    100 AS end_id,
    1 AS order_id
    UNION ALL
    SELECT 'all_codes_set' AS code_type,
    101 AS start_id,
    110 AS end_id,
    2 AS order_id
    UNION ALL
    SELECT 'voided' AS code_type,
    6 AS start_id,
    6 AS end_id,
    1 AS order_id
    UNION ALL
    SELECT 'cancelled' AS code_type,
    50 AS start_id,
    60 AS end_id,
    1 AS order_id
    UNION ALL
    SELECT 'unused' AS code_type,
    15 AS start_id,
    16 AS end_id,
    1 AS order_id
    UNION ALL
    SELECT 'unused' AS code_type,
    106 AS start_id,
    106 AS end_id,
    2 AS order_id
    )
    SELECT *FROM myData m
    WHERE m.order_id = @v_order_id
  • Looks like Min/Max works for your scenario

    IF OBJECT_ID('tempdb..#Codes') IS NOT NULL
    DROP TABLE #Codes

    Create table #Codes (
    Code_Type varchar(20),
    Start_Id int,
    End_Id int)

    insert into #Codes values
    ('all_codes_set',1,5 ),
    ('all_codes_set',7,14 ),
    ('all_codes_set',17,49 ),
    ('all_codes_set',61,100),
    ('voided',6,6),
    ('cancelled',50,60),
    ('unused',15,16)

    Select Code_type, min(Start_Id) as StartID, max(End_ID) as EndID
    from #codes
    group by Code_Type

    • This reply was modified 3 years, 10 months ago by  Mike01.

    For better, quicker answers, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Thank you, Mike01

    When I run your example I only receive one line per type, as GROUP BY expected to behave, where what I need is sequences as in my expected results set: from 1 to 5, from 7 to 14 etc.

  • I created DDL based on your sample.  You hardcoded the orderid in your union all statement.  Can you provide usable DDL and data to help show what you are looking for?

    For better, quicker answers, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • The fastest way to reliably get an answer is not necessarily the "best" way but it's functional.  Afaik something like this would work.  It uses a tally function to expand the rows.  The CTE 'lag_gaps_cte' creates the sequence fn.n (and it's LAG) using dbo.fnTally which expands the row range of start_id/end_id.  The 'lag_gaps_cte' CTE also excludes voided, cancelled, and unused values.  Then, using the calculated 'gap_flag' (from the 'lag_gaps_cte') the CTE 'sum_grps_cte' creates groups ('gap_grp') of contiguous sequences.  The outer query uses the 'gap_grp' in the GROUP BY which summarizes the (remaining) rows where code_type='all_codes_set'.  Also, to get the outer query to display in the correct order the query adds a 'sort_order' column to the CTE's and uses it in the ORDER BY of the outer query.

    DECLARE @v_order_id INT = 1;

    WITH
    myData AS(
    SELECT 'all_codes_set' AS code_type,
    1 AS start_id,
    100 AS end_id,
    1 AS order_id
    UNION ALL
    SELECT 'all_codes_set' AS code_type,
    101 AS start_id,
    110 AS end_id,
    2 AS order_id
    UNION ALL
    SELECT 'voided' AS code_type,
    6 AS start_id,
    6 AS end_id,
    1 AS order_id
    UNION ALL
    SELECT 'cancelled' AS code_type,
    50 AS start_id,
    60 AS end_id,
    1 AS order_id
    UNION ALL
    SELECT 'unused' AS code_type,
    15 AS start_id,
    16 AS end_id,
    1 AS order_id
    UNION ALL
    SELECT 'unused' AS code_type,
    106 AS start_id,
    106 AS end_id,
    2 AS order_id
    ),
    voided_cte(sort_order, code_type, start_id, end_id, order_id) as (
    select 2, *
    from myData
    where order_id = @v_order_id
    and code_type='voided'),
    cancelled_cte(sort_order, code_type, start_id, end_id, order_id) as (
    select 3, *
    from myData
    where order_id = @v_order_id
    and code_type='cancelled'),
    unused_cte(sort_order, code_type, start_id, end_id, order_id) as (
    select 4, *
    from myData
    where order_id = @v_order_id
    and code_type='unused'),
    lag_gaps_cte(code_type, start_id, end_id, order_id, n, gap_flag) as (
    select m.*, fn.n, case when fn.n-lag(fn.n, 1, 0) over (order by fn.n)>1 then 1 else 0 end
    from myData m
    cross apply dbo.fnTally(1, m.end_id-m.start_id+1) fn
    where m.order_id = @v_order_id
    and code_type='all_codes_set'
    and not exists (select 1
    from voided_cte v
    where fn.n between v.start_id and v.end_id)
    and not exists (select 1
    from cancelled_cte c
    where fn.n between c.start_id and c.end_id)
    and not exists (select 1
    from unused_cte u
    where fn.n between u.start_id and u.end_id)),
    sum_grps_cte(sort_order, code_type, start_id, end_id, order_id, n, gap_flag, gap_grp) as (
    select 1, *, sum(gap_flag) over (order by n)
    from lag_gaps_cte)
    select sort_order, code_type, min(n) start_id, max(n) end_id, order_id
    from sum_grps_cte
    group by sort_order, code_type, gap_grp, order_id
    union all
    select * from voided_cte
    union all
    select * from cancelled_cte
    union all
    select * from unused_cte
    order by sort_order, start_id;
    sort_ordercode_typestart_idend_idorder_id
    1all_codes_set151
    1all_codes_set7141
    1all_codes_set17491
    1all_codes_set611001
    2voided661
    3cancelled50601
    4unused15161

     

     

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Thank you, Steve. It is really a great solution, yet I cannot use fnTally function in this case. Simply not allowed.

  • Ok, the non-tvf same approach uses an additional CTE and SELECT TOP(n).  The following code is to be embedded in the SQL to replace dbo.fnTally.  The maximum # of rows is up to 20^4 or 160,000 (which should be plenty but if it's not then additional CROSS JOIN's could be added).

    /* the row_goal sets the number of rows to generate */
    declare @row_goal int=5;

    /* numbers_cte (1, 2, ..., 20) is CROSS JOINED 4 times to potentially generate up to 160,000 rows (20^4). */
    /* the row_goal makes the query very efficient because the unneeded rows are never generated */
    with numbers_cte(n) as (
    select * from (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),
    (11),(12),(13),(14),(15),(16),(17),(18),(19),(20)) v(n))
    select top(@row_goal) row_number() over (order by (select null)) as N
    from numbers_cte n1
    cross join numbers_cte n2
    cross join numbers_cte n3
    cross join numbers_cte n4;
    N
    1
    2
    3
    4
    5

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • DECLARE @v_order_id INT = 1;

    WITH
    myData AS(
    SELECT 'all_codes_set' AS code_type,
    1 AS start_id,
    100 AS end_id,
    1 AS order_id
    UNION ALL
    SELECT 'all_codes_set' AS code_type,
    101 AS start_id,
    110 AS end_id,
    2 AS order_id
    UNION ALL
    SELECT 'voided' AS code_type,
    6 AS start_id,
    6 AS end_id,
    1 AS order_id
    UNION ALL
    SELECT 'cancelled' AS code_type,
    50 AS start_id,
    60 AS end_id,
    1 AS order_id
    UNION ALL
    SELECT 'unused' AS code_type,
    15 AS start_id,
    16 AS end_id,
    1 AS order_id
    UNION ALL
    SELECT 'unused' AS code_type,
    106 AS start_id,
    106 AS end_id,
    2 AS order_id
    ),
    numbers_cte(n) as (
    select * from (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),
    (11),(12),(13),(14),(15),(16),(17),(18),(19),(20)) v(n)),
    voided_cte(sort_order, code_type, start_id, end_id, order_id) as (
    select 2, *
    from myData
    where order_id = @v_order_id
    and code_type='voided'),
    cancelled_cte(sort_order, code_type, start_id, end_id, order_id) as (
    select 3, *
    from myData
    where order_id = @v_order_id
    and code_type='cancelled'),
    unused_cte(sort_order, code_type, start_id, end_id, order_id) as (
    select 4, *
    from myData
    where order_id = @v_order_id
    and code_type='unused'),
    lag_gaps_cte(code_type, start_id, end_id, order_id, n, gap_flag) as (
    select m.*, fn.n, case when fn.n-lag(fn.n, 1, 0) over (order by fn.n)>1 then 1 else 0 end
    from myData m
    cross apply (select top(m.end_id-m.start_id+1) row_number() over (order by (select null)) as N
    from numbers_cte n1
    cross join numbers_cte n2
    cross join numbers_cte n3
    cross join numbers_cte n4) fn
    where m.order_id = @v_order_id
    and code_type='all_codes_set'
    and not exists (select 1
    from voided_cte v
    where fn.n between v.start_id and v.end_id)
    and not exists (select 1
    from cancelled_cte c
    where fn.n between c.start_id and c.end_id)
    and not exists (select 1
    from unused_cte u
    where fn.n between u.start_id and u.end_id)),
    sum_grps_cte(sort_order, code_type, start_id, end_id, order_id, n, gap_flag, gap_grp) as (
    select 1, *, sum(gap_flag) over (order by n)
    from lag_gaps_cte)
    select sort_order, code_type, min(n) start_id, max(n) end_id, order_id
    from sum_grps_cte
    group by sort_order, code_type, gap_grp, order_id
    union all
    select * from voided_cte
    union all
    select * from cancelled_cte
    union all
    select * from unused_cte
    order by sort_order, start_id;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Here is my take:

    Declare @orderCodeSets Table (order_id int, code_type varchar(20), start_id int, end_id int);

    Insert Into @orderCodeSets (order_id, code_type, start_id, end_id)
    Values (1, 'all_code_sets', 1, 100)
    , (2, 'all_code_sets', 101, 110)
    , (1, 'voided', 6, 6)
    , (1, 'cancelled', 50, 60)
    , (1, 'unused', 15, 16)
    , (2, 'unused', 106, 106);

    Declare @v_order_id int = 1;

    With t (n)
    As (
    Select t.n
    From (
    Values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) As t(n)
    )
    , iTally (Number)
    As (
    Select checksum(row_number() over(Order By @@spid))
    From t t1, t t2, t t3 --1000
    )
    , all_code_types
    As (
    Select ocs.order_id
    , code_type = coalesce(cs2.code_type, ocs.code_type)
    , start_id = it.Number
    , max_end = max(it.Number) over(Partition By ocs.order_id) + 1
    From iTally it
    Inner Join @orderCodeSets ocs On it.Number Between ocs.start_id And ocs.end_id
    Left Join @orderCodeSets cs2 On cs2.order_id = ocs.order_id
    And cs2.code_type <> 'all_code_sets'
    And it.Number Between cs2.start_id And cs2.end_id
    Where ocs.order_id = @v_order_id
    And ocs.code_type = 'all_code_sets'
    )
    , code_types
    As (
    Select acs.order_id
    , acs.code_type
    , acs.start_id
    , start_range = iif(acs.code_type <> lag(acs.code_type, 1, 1) over(Partition By acs.order_id Order By acs.start_id), 1, 0)
    , acs.max_end
    From all_code_types acs
    )
    Select ct.order_id
    , ct.code_type
    , ct.start_id
    , end_id = lead(ct.start_id, 1, ct.max_end) over(Partition By ct.order_id Order By ct.start_id) - 1
    From code_types ct
    Where ct.start_range = 1
    Order By
    ct.order_id
    , ct.code_type;

    Any solution with an inline tally table will need to generate enough rows to cover the maximum range of code sets per order.  This allows up to 1000 rows per order - not sure if that will be enough.

    This works by exploding out the rows for all_code_sets per order...then 'replacing' the rows for all code types that are not 'all_code_sets'.  Then - each start range is identified by checking the previous rows code type...if different then this is the start of a range.

    Now it is just a matter of calculating the end id - which will be the next start minus one - unless it is the last row which will be our max_end for the order (we calculate that as max + 1 so the minus one from lead works for all values)

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Steve, Jeffrey

    Thank you both for such a great help! May I ask couple of more questions, please:

    1. The real data I have seen may have over 20,000 barcodes per order (think of a pallet with boxes on it, with hundreds of barcodes in each). Jeffrey has mentioned his script is limited to 1,000 only. Is it possible to increase the limit significantly or it will affect the performance?
    2. I appreciate I can run many tests to establish what will be faster - using your wonderful solutions or to simply query the database to get the ranges and then "loop" through them in C# - what would be your gut feeling, please?
  • You can add two more tables in the itally.  That should generate more than enough rows.

    As for performance, that needs to be tested.

    We can shortcut the number of rows generated, when i am back at my desk I can provide those changes

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Here are the modifications:

    Declare @orderCodeSets Table (order_id int, code_type varchar(20), start_id int, end_id int);

    Insert Into @orderCodeSets (order_id, code_type, start_id, end_id)
    Values (1, 'all_code_sets', 1, 20000)
    , (2, 'all_code_sets', 20001, 20110)
    , (1, 'voided', 6, 6)
    , (1, 'cancelled', 50, 60)
    , (1, 'unused', 15, 16)
    , (1, 'voided', 10101, 10120)
    , (1, 'cancelled', 11150, 11360)
    , (1, 'unused', 11515, 11616)
    , (1, 'voided', 16806, 16909)
    , (1, 'cancelled', 18850, 18960)
    , (1, 'unused', 19915, 19816)
    , (2, 'unused', 106, 106);

    Declare @v_order_id int = 1;

    Declare @max_id int = (Select ocs.end_id + 1
    From @orderCodeSets ocs
    Where ocs.order_id = @v_order_id
    And ocs.code_type = 'all_code_sets');

    With t (n)
    As (
    Select t.n
    From (
    Values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) As t(n)
    )
    , iTally (Number)
    As (
    Select Top (@max_id)
    checksum(row_number() over(Order By @@spid))
    From t t1, t t2, t t3, t t4, t t5
    )
    , all_code_types
    As (
    Select ocs.order_id
    , code_type = coalesce(cs2.code_type, ocs.code_type)
    , start_id = it.Number
    From iTally it
    Inner Join @orderCodeSets ocs On it.Number Between ocs.start_id And ocs.end_id
    Left Join @orderCodeSets cs2 On cs2.order_id = ocs.order_id
    And cs2.code_type <> 'all_code_sets'
    And it.Number Between cs2.start_id And cs2.end_id
    Where ocs.order_id = @v_order_id
    And ocs.code_type = 'all_code_sets'
    )
    , code_types
    As (
    Select acs.order_id
    , acs.code_type
    , acs.start_id
    , start_range = iif(acs.code_type <> lag(acs.code_type, 1, 1) over(Partition By acs.order_id Order By acs.start_id), 1, 0)
    From all_code_types acs
    )
    Select ct.order_id
    , ct.code_type
    , ct.start_id
    , end_id = lead(ct.start_id, 1, @max_id) over(Partition By ct.order_id Order By ct.start_id) - 1
    From code_types ct
    Where ct.start_range = 1
    Order By
    ct.start_id;

     

    • This reply was modified 3 years, 10 months ago by  Jeffrey Williams. Reason: Removed the max_end column from the query - isn't needed since we switched to using @max_id

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Thank you, Jeffrey

     

    Much appreciated. Will now read more to understand why adding only three tables significantly increased the result set - I am not well familiar with

    Select t.n 
    From (
    Values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) As t(n)

    approach in general. I think in Oracle we would use DUAL table instead.

     

    Thank you again.

  • In Oracle - I wouldn't use this structure at all.  I would generate a sequence as needed using INTERVAL - and of course, dual.  But then again I haven't done anything in Oracle in over 15 years.

    The from clause can be rewritten as:  From t t1 Cross Join t t2 Cross Join t t3 Cross Join t t4 Cross Join t t5

    So we get 10 rows cross joined to 10 rows = 100 rows, cross joined with 10 rows = 1000 rows, cross joined with 10 rows = 10000 rows, cross joined with 10 rows = 100000 rows.

    The TOP short-circuits the generation of rows so we only generate the maximum needed for the order.

     

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • I see! Thank you for the explanation, Jeffrey.

     

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply