How to filter out the data?

Question

Post reply

How to filter out the data?

BOR15K

SSCertifiable

Points: 5894
More actions
January 11, 2021 at 12:58 pm
Go to Answer
#3830875
Hello All,
Need your help to filter out the data, please. I couldn't find a better term to describe, but hope the example below will instantly clarify what I need:
1. There is a range of simple physical barcodes, applied on imported products - I am sure you have seen many examples in supermarkets around you. Let's assume for a given order the physical barcodes are numbered from 1 to 100. Some of those barcodes could remain unused and will be utilised with the next order, some can be cancelled, e.g. the glue is dry and they could not be applied on a product, etc. Below is a SQL representation of what might be.
For order id = 1 I want to retrieve eventually a list, showing something like the list below.
I can implement in C#, using loops, but I wonder if it is possible to have it done in SQL, on a server side, please?
Many thanks in advance, as always!
```
Code_TypeStart_IdEnd_Id
all_codes_set15
all_codes_set714
all_codes_set1749
all_codes_set61100
voided66
cancelled5060
unused1516
```
```
DECLARE @v_order_id INT = 1;
WITH myData AS(
SELECT 'all_codes_set' AS code_type, 
1 AS start_id,
100 AS end_id,
1 AS order_id
UNION ALL
SELECT 'all_codes_set' AS code_type, 
101 AS start_id,
110 AS end_id,
2 AS order_id
UNION ALL
SELECT 'voided' AS code_type, 
6 AS start_id,
6 AS end_id,
1 AS order_id
UNION ALL
SELECT 'cancelled' AS code_type, 
50 AS start_id,
60 AS end_id,
1 AS order_id
UNION ALL
SELECT 'unused' AS code_type, 
15 AS start_id,
16 AS end_id,
1 AS order_id
UNION ALL
SELECT 'unused' AS code_type, 
106 AS start_id,
106 AS end_id,
2 AS order_id
)
SELECT *FROM myData m
WHERE m.order_id = @v_order_id
```

Jeffrey Williams

SSC Guru

Points: 90303

January 11, 2021 at 7:40 pm

Answer

#3831110

Here are the modifications:

Declare @orderCodeSets Table (order_id int, code_type varchar(20), start_id int, end_id int);

 Insert Into @orderCodeSets (order_id, code_type, start_id, end_id)
 Values (1, 'all_code_sets', 1, 20000)
      , (2, 'all_code_sets', 20001, 20110)
      , (1, 'voided', 6, 6)
      , (1, 'cancelled', 50, 60)
      , (1, 'unused', 15, 16)
      , (1, 'voided', 10101, 10120)
      , (1, 'cancelled', 11150, 11360)
      , (1, 'unused', 11515, 11616)
      , (1, 'voided', 16806, 16909)
      , (1, 'cancelled', 18850, 18960)
      , (1, 'unused', 19915, 19816)
      , (2, 'unused', 106, 106);

Declare @v_order_id int = 1;

Declare @max_id int = (Select ocs.end_id + 1
                         From @orderCodeSets ocs
                        Where ocs.order_id = @v_order_id
                          And ocs.code_type = 'all_code_sets');

   With t (n)
     As (
 Select t.n 
   From (
 Values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) As t(n)
        )
      , iTally (Number)
     As (
 Select Top (@max_id)
        checksum(row_number() over(Order By @@spid))
   From t t1, t t2, t t3, t t4, t t5                     
        )
      , all_code_types
     As (
 Select ocs.order_id
      , code_type = coalesce(cs2.code_type, ocs.code_type)
      , start_id = it.Number
   From iTally                           it
  Inner Join @orderCodeSets             ocs On it.Number Between ocs.start_id And ocs.end_id
   Left Join @orderCodeSets             cs2 On cs2.order_id = ocs.order_id
                                           And cs2.code_type <> 'all_code_sets'
                                           And it.Number Between cs2.start_id And cs2.end_id
  Where ocs.order_id = @v_order_id
    And ocs.code_type = 'all_code_sets'
        )
      , code_types
     As (
 Select acs.order_id
      , acs.code_type
      , acs.start_id
      , start_range = iif(acs.code_type <> lag(acs.code_type, 1, 1) over(Partition By acs.order_id Order By acs.start_id), 1, 0)
   From all_code_types                  acs
        )
 Select ct.order_id
      , ct.code_type
      , ct.start_id
      , end_id = lead(ct.start_id, 1, @max_id) over(Partition By ct.order_id Order By ct.start_id) - 1
   From code_types                       ct
  Where ct.start_range = 1
  Order By
        ct.start_id;

This reply was modified 3 years, 10 months ago by Jeffrey Williams. Reason: Removed the max_end column from the query - isn't needed since we switched to using @max_id

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

Mike01 SSChampion Points: 11395 More actions · Answer 1

Looks like Min/Max works for your scenario

IF OBJECT_ID('tempdb..#Codes') IS NOT NULL
DROP TABLE #Codes

Create table #Codes (
Code_Type varchar(20),
Start_Id int,
End_Id int)

insert into #Codes values
('all_codes_set',1,5 ),
('all_codes_set',7,14 ),
('all_codes_set',17,49 ),
('all_codes_set',61,100),
('voided',6,6),
('cancelled',50,60),
('unused',15,16)

Select Code_type, min(Start_Id) as StartID, max(End_ID) as EndID
from #codes
group by Code_Type

This reply was modified 3 years, 10 months ago by Mike01.

For better, quicker answers, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

BOR15K SSCertifiable Points: 5894 More actions · Answer 2

Thank you, Mike01

When I run your example I only receive one line per type, as GROUP BY expected to behave, where what I need is sequences as in my expected results set: from 1 to 5, from 7 to 14 etc.

Mike01 SSChampion Points: 11395 More actions · Answer 3

I created DDL based on your sample. You hardcoded the orderid in your union all statement. Can you provide usable DDL and data to help show what you are looking for?

For better, quicker answers, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Steve Collins SSCrazy Points: 2157 More actions · Answer 4

The fastest way to reliably get an answer is not necessarily the "best" way but it's functional. Afaik something like this would work. It uses a tally function to expand the rows. The CTE 'lag_gaps_cte' creates the sequence fn.n (and it's LAG) using dbo.fnTally which expands the row range of start_id/end_id. The 'lag_gaps_cte' CTE also excludes voided, cancelled, and unused values. Then, using the calculated 'gap_flag' (from the 'lag_gaps_cte') the CTE 'sum_grps_cte' creates groups ('gap_grp') of contiguous sequences. The outer query uses the 'gap_grp' in the GROUP BY which summarizes the (remaining) rows where code_type='all_codes_set'. Also, to get the outer query to display in the correct order the query adds a 'sort_order' column to the CTE's and uses it in the ORDER BY of the outer query.

DECLARE @v_order_id INT = 1;

WITH
myData AS(
SELECT 'all_codes_set' AS code_type, 
1 AS start_id,
100 AS end_id,
1 AS order_id
UNION ALL
SELECT 'all_codes_set' AS code_type, 
101 AS start_id,
110 AS end_id,
2 AS order_id
UNION ALL
SELECT 'voided' AS code_type, 
6 AS start_id,
6 AS end_id,
1 AS order_id
UNION ALL
SELECT 'cancelled' AS code_type, 
50 AS start_id,
60 AS end_id,
1 AS order_id
UNION ALL
SELECT 'unused' AS code_type, 
15 AS start_id,
16 AS end_id,
1 AS order_id
UNION ALL
SELECT 'unused' AS code_type, 
106 AS start_id,
106 AS end_id,
2 AS order_id
),
voided_cte(sort_order, code_type, start_id, end_id, order_id) as (
    select 2, * 
    from myData
    where order_id = @v_order_id
          and code_type='voided'),
cancelled_cte(sort_order, code_type, start_id, end_id, order_id) as (
    select 3, * 
    from myData
    where order_id = @v_order_id
          and code_type='cancelled'),
unused_cte(sort_order, code_type, start_id, end_id, order_id) as (
    select 4, * 
    from myData
    where order_id = @v_order_id
          and code_type='unused'),
lag_gaps_cte(code_type, start_id, end_id, order_id, n, gap_flag) as (
    select m.*, fn.n, case when fn.n-lag(fn.n, 1, 0) over (order by fn.n)>1 then 1 else 0 end
    from myData m
         cross apply dbo.fnTally(1, m.end_id-m.start_id+1) fn
    where m.order_id = @v_order_id
          and code_type='all_codes_set'
          and not exists (select 1 
                          from voided_cte v 
                          where fn.n between v.start_id and v.end_id)
          and not exists (select 1 
                          from cancelled_cte c 
                          where fn.n between c.start_id and c.end_id)
          and not exists (select 1 
                          from unused_cte u
                          where fn.n between u.start_id and u.end_id)),
sum_grps_cte(sort_order, code_type, start_id, end_id, order_id, n, gap_flag, gap_grp) as (
    select 1, *, sum(gap_flag) over (order by n)
    from lag_gaps_cte)
select sort_order, code_type, min(n) start_id, max(n) end_id, order_id
from sum_grps_cte
group by sort_order, code_type, gap_grp, order_id
union all
select * from voided_cte
union all
select * from cancelled_cte
union all
select * from unused_cte
order by sort_order, start_id;

sort_ordercode_typestart_idend_idorder_id
1all_codes_set151
1all_codes_set7141
1all_codes_set17491
1all_codes_set611001
2voided661
3cancelled50601
4unused15161

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

BOR15K SSCertifiable Points: 5894 More actions · Answer 5

Thank you, Steve. It is really a great solution, yet I cannot use fnTally function in this case. Simply not allowed.

Steve Collins SSCrazy Points: 2157 More actions · Answer 6

Ok, the non-tvf same approach uses an additional CTE and SELECT TOP(n). The following code is to be embedded in the SQL to replace dbo.fnTally. The maximum # of rows is up to 20^4 or 160,000 (which should be plenty but if it's not then additional CROSS JOIN's could be added).

/* the row_goal sets the number of rows to generate */
declare @row_goal       int=5;

/* numbers_cte (1, 2, ..., 20) is CROSS JOINED 4 times to potentially generate up to 160,000 rows (20^4). */
/* the row_goal makes the query very efficient because the unneeded rows are never generated */
with numbers_cte(n) as (
    select * from (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),
                          (11),(12),(13),(14),(15),(16),(17),(18),(19),(20)) v(n))
select top(@row_goal) row_number() over (order by (select null)) as N
from numbers_cte n1
     cross join numbers_cte n2
     cross join numbers_cte n3
     cross join numbers_cte n4;

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Steve Collins SSCrazy Points: 2157 More actions · Answer 7

DECLARE @v_order_id INT = 1;

WITH
myData AS(
SELECT 'all_codes_set' AS code_type, 
1 AS start_id,
100 AS end_id,
1 AS order_id
UNION ALL
SELECT 'all_codes_set' AS code_type, 
101 AS start_id,
110 AS end_id,
2 AS order_id
UNION ALL
SELECT 'voided' AS code_type, 
6 AS start_id,
6 AS end_id,
1 AS order_id
UNION ALL
SELECT 'cancelled' AS code_type, 
50 AS start_id,
60 AS end_id,
1 AS order_id
UNION ALL
SELECT 'unused' AS code_type, 
15 AS start_id,
16 AS end_id,
1 AS order_id
UNION ALL
SELECT 'unused' AS code_type, 
106 AS start_id,
106 AS end_id,
2 AS order_id
),
numbers_cte(n) as (
    select * from (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),
                          (11),(12),(13),(14),(15),(16),(17),(18),(19),(20)) v(n)),
voided_cte(sort_order, code_type, start_id, end_id, order_id) as (
    select 2, * 
    from myData
    where order_id = @v_order_id
          and code_type='voided'),
cancelled_cte(sort_order, code_type, start_id, end_id, order_id) as (
    select 3, * 
    from myData
    where order_id = @v_order_id
          and code_type='cancelled'),
unused_cte(sort_order, code_type, start_id, end_id, order_id) as (
    select 4, * 
    from myData
    where order_id = @v_order_id
          and code_type='unused'),
lag_gaps_cte(code_type, start_id, end_id, order_id, n, gap_flag) as (
    select m.*, fn.n, case when fn.n-lag(fn.n, 1, 0) over (order by fn.n)>1 then 1 else 0 end
    from myData m
         cross apply (select top(m.end_id-m.start_id+1) row_number() over (order by (select null)) as N
                      from numbers_cte n1
                           cross join numbers_cte n2
                           cross join numbers_cte n3
                           cross join numbers_cte n4) fn
    where m.order_id = @v_order_id
          and code_type='all_codes_set'
          and not exists (select 1 
                          from voided_cte v 
                          where fn.n between v.start_id and v.end_id)
          and not exists (select 1 
                          from cancelled_cte c 
                          where fn.n between c.start_id and c.end_id)
          and not exists (select 1 
                          from unused_cte u
                          where fn.n between u.start_id and u.end_id)),
sum_grps_cte(sort_order, code_type, start_id, end_id, order_id, n, gap_flag, gap_grp) as (
    select 1, *, sum(gap_flag) over (order by n)
    from lag_gaps_cte)
select sort_order, code_type, min(n) start_id, max(n) end_id, order_id
from sum_grps_cte
group by sort_order, code_type, gap_grp, order_id
union all
select * from voided_cte
union all
select * from cancelled_cte
union all
select * from unused_cte
order by sort_order, start_id;

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Jeffrey Williams SSC Guru Points: 90303 More actions · Answer 8

Here is my take:

Declare @orderCodeSets Table (order_id int, code_type varchar(20), start_id int, end_id int);

 Insert Into @orderCodeSets (order_id, code_type, start_id, end_id)
 Values (1, 'all_code_sets', 1, 100)
      , (2, 'all_code_sets', 101, 110)
      , (1, 'voided', 6, 6)
      , (1, 'cancelled', 50, 60)
      , (1, 'unused', 15, 16)
      , (2, 'unused', 106, 106);

Declare @v_order_id int = 1;

   With t (n)
     As (
 Select t.n 
   From (
 Values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) As t(n)
        )
      , iTally (Number)
     As (
 Select checksum(row_number() over(Order By @@spid))
   From t t1, t t2, t t3                                --1000
        )
      , all_code_types
     As (
 Select ocs.order_id
      , code_type = coalesce(cs2.code_type, ocs.code_type)
      , start_id = it.Number
      , max_end = max(it.Number) over(Partition By ocs.order_id) + 1
   From iTally                           it
  Inner Join @orderCodeSets             ocs On it.Number Between ocs.start_id And ocs.end_id
   Left Join @orderCodeSets             cs2 On cs2.order_id = ocs.order_id
                                           And cs2.code_type <> 'all_code_sets'
                                           And it.Number Between cs2.start_id And cs2.end_id
  Where ocs.order_id = @v_order_id
    And ocs.code_type = 'all_code_sets'
        )
      , code_types
     As (
 Select acs.order_id
      , acs.code_type
      , acs.start_id
      , start_range = iif(acs.code_type <> lag(acs.code_type, 1, 1) over(Partition By acs.order_id Order By acs.start_id), 1, 0)
      , acs.max_end
   From all_code_types                  acs
        )
 Select ct.order_id
      , ct.code_type
      , ct.start_id
      , end_id = lead(ct.start_id, 1, ct.max_end) over(Partition By ct.order_id Order By ct.start_id) - 1
   From code_types                       ct
  Where ct.start_range = 1
  Order By
        ct.order_id
      , ct.code_type;

Any solution with an inline tally table will need to generate enough rows to cover the maximum range of code sets per order. This allows up to 1000 rows per order - not sure if that will be enough.

This works by exploding out the rows for all_code_sets per order...then 'replacing' the rows for all code types that are not 'all_code_sets'. Then - each start range is identified by checking the previous rows code type...if different then this is the start of a range.

Now it is just a matter of calculating the end id - which will be the next start minus one - unless it is the last row which will be our max_end for the order (we calculate that as max + 1 so the minus one from lead works for all values)

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

BOR15K SSCertifiable Points: 5894 More actions · Answer 9

Steve, Jeffrey

Thank you both for such a great help! May I ask couple of more questions, please:

The real data I have seen may have over 20,000 barcodes per order (think of a pallet with boxes on it, with hundreds of barcodes in each). Jeffrey has mentioned his script is limited to 1,000 only. Is it possible to increase the limit significantly or it will affect the performance?
I appreciate I can run many tests to establish what will be faster - using your wonderful solutions or to simply query the database to get the ranges and then "loop" through them in C# - what would be your gut feeling, please?

Jeffrey Williams SSC Guru Points: 90303 More actions · Answer 10

You can add two more tables in the itally. That should generate more than enough rows.

As for performance, that needs to be tested.

We can shortcut the number of rows generated, when i am back at my desk I can provide those changes

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

BOR15K SSCertifiable Points: 5894 More actions · Answer 11

Thank you, Jeffrey

Much appreciated. Will now read more to understand why adding only three tables significantly increased the result set - I am not well familiar with

Select t.n 
   From (
 Values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) As t(n)

approach in general. I think in Oracle we would use DUAL table instead.

Thank you again.

Jeffrey Williams SSC Guru Points: 90303 More actions · Answer 12

In Oracle - I wouldn't use this structure at all. I would generate a sequence as needed using INTERVAL - and of course, dual. But then again I haven't done anything in Oracle in over 15 years.

The from clause can be rewritten as: From t t1 Cross Join t t2 Cross Join t t3 Cross Join t t4 Cross Join t t5

So we get 10 rows cross joined to 10 rows = 100 rows, cross joined with 10 rows = 1000 rows, cross joined with 10 rows = 10000 rows, cross joined with 10 rows = 100000 rows.

The TOP short-circuits the generation of rows so we only generate the maximum needed for the order.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

BOR15K SSCertifiable Points: 5894 More actions · Answer 13

BOR15K

SSCertifiable

Points: 5894

January 11, 2021 at 9:33 pm

#3831189

I see! Thank you for the explanation, Jeffrey.