April 4, 2012 at 9:09 am
Hi,
Let's say I have a table of locations (currently 3 but can grow), a table of customers, and a table of transactions.
I need to generate a report that shows how many customers shopped at the following locations:
Location 1 only
Location 2 only
Location 3 only
Location 1&2 but not 3
Location 1&3 but not 2
Location 2&2 but not 1
Location 1&2&3
This is just a simplified example, but basically we scan a barcode when a customer comes to each location. So it would store customer_id, location_id and date.
Any suggestions for how to write such a query? Is there a way to do this dynamically so that it'll still work when the locations change?
April 4, 2012 at 9:15 am
pixelwiz (4/4/2012)
Hi,Let's say I have a table of locations (currently 3 but can grow), a table of customers, and a table of transactions.
I need to generate a report that shows how many customers shopped at the following locations:
Location 1 only
Location 2 only
Location 3 only
Location 1&2 but not 3
Location 1&3 but not 2
Location 2&2 but not 1
Location 1&2&3
This is just a simplified example, but basically we scan a barcode when a customer comes to each location. So it would store customer_id, location_id and date.
Any suggestions for how to write such a query? Is there a way to do this dynamically so that it'll still work when the locations change?
Yes.
If you want a more detailed answer, please read this article[/url] about the best way to post DDL and sample data then edit your post to include it. That way the unpaid volunteers from this forum will be able to provide you with working and tested code to fulfil your particular requirements.
Thanks 😀
April 4, 2012 at 9:33 am
Ok, let me try to follow that post...
Query to create a temp table:
-----------
CREATE TABLE [dbo].[mytemptable](
[id] [int] IDENTITY(1,1) NOT NULL,
[customer_id] [int] NOT NULL,
[location_id] [int] NOT NULL,
[date_entered] [datetime] NOT NULL,
CONSTRAINT [PK_mytemptable] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
----------------------
Here is some sample data:
---------------------------
INSERT INTO mytemptable (customer_id, location_id, date_entered)
VALUES(1,1,'2012-04-01 00:00:00.000'),
(2,1,'2012-04-01 00:00:00.000'),
(3,2,'2012-04-01 00:00:00.000'),
(4,3,'2012-04-01 00:00:00.000'),
(5,3,'2012-04-01 00:00:00.000'),
(1,2,'2012-04-02 00:00:00.000'),
(1,3,'2012-04-03 00:00:00.000'),
(3,3,'2012-04-02 00:00:00.000'),
(4,2,'2012-04-02 00:00:00.000'),
(5,3,'2012-04-02 00:00:00.000'),
(1,1,'2012-04-04 00:00:00.000'),
(2,1,'2012-04-04 00:00:00.000'),
(3,2,'2012-04-04 00:00:00.000'),
(4,3,'2012-04-04 00:00:00.000'),
(5,3,'2012-04-04 00:00:00.000')
---------------------
April 4, 2012 at 9:50 am
One last thing that may not have been covered in the article, expected results. Based on the sample data you provided, what should the results from the query be?
April 4, 2012 at 11:00 am
Ok, what I'd like the results to look like is something like this:
I tried to get the outputs calculated manually not 100% sure I'm right...
For multiple locations, I'm trying to count how many customers have been to more than one location. If one customer has been to location 1 and location 2 twice, it should still only count it as 1.
Location IDs | Unique Scans | Repeat Scans
1 | 2 | 2
2 | 3 | 1
3 | 4 | 5
1,2 | 1 | NA
1,3 | 1 | NA
2,3 | 2 | NA
1,2,3 | 3 | NA
April 4, 2012 at 2:04 pm
Here is the only solution I've come up with so far, and it's not dynamic. Below partners are actually locations
SELECT
SUM(CASE WHEN p1 > 0 AND p2 = 0 AND p3 = 0 THEN 1 ELSE 0 END) p1Only,
SUM(CASE WHEN p1 = 0 AND p2 > 0 AND p3 = 0 THEN 1 ELSE 0 END) p2Only,
SUM(CASE WHEN p1 = 0 AND p2 = 0 AND p3 > 0 THEN 1 ELSE 0 END) p3Only,
SUM(CASE WHEN p1 > 0 AND p2 > 0 AND p3 = 0 THEN 1 ELSE 0 END) p1p2,
SUM(CASE WHEN p1 > 0 AND p2 = 0 AND p3 > 0 THEN 1 ELSE 0 END) p1p3,
SUM(CASE WHEN p1 = 0 AND p2 > 0 AND p3 > 0 THEN 1 ELSE 0 END) p2p3,
SUM(CASE WHEN p1 > 0 AND p2 > 0 AND p3 > 0 THEN 1 ELSE 0 END) p1p2p3
FROM (
SELECT customer_id, [1] AS p1, [2] AS p2, [3] AS p3
FROM (
SELECTDISTINCT c.customer_id, p.partner_id, (
SELECT COUNT(id)
FROM vw_dcs_scans c2
WHERE c2.customer_id = c.customer_id
AND c2.partner_id = p.partner_id) cnt
FROM vw_dcs_scans c,
(
SELECT DISTINCT spl.partner_id
FROM source_partners sp
INNER JOIN source_partner_locations spl
ON sp.id = spl.partner_id
INNER JOIN sources s
ON s.id = spl.source_id
INNER JOIN events e
ON e.id = s.event_id
WHERE e.id = 122
) p
) AS sourceTable
PIVOT
(
SUM(cnt) FOR partner_id IN ([1],[2],[3])
) AS pvtTable
) AS dataTable
April 5, 2012 at 6:53 am
If I may, I'd like to ask a question as I've taken an interest in this problem.
If I construct a list of locations, and assign to it the customers that have visited that location like this:
--LocCusts
--11, 2
--21, 3, 4
--31, 3, 4, 5
I understand how you're getting counts of 2,3,4 for visits to locations 1,2,3.
But for customers visiting:
--> 1+2 but not 3, it looks the count should be 0
--> 1+3 but not 2, it looks the count should be 0
--> 2+3 but not 1, it looks the count should be 2
Am I missing something? Perhaps my brain is fried... tough day today.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
April 5, 2012 at 7:27 am
I think you're right, I may have done my sample output values incorrectly, I was trying to do it in my head and probably counted it wrong.
April 5, 2012 at 7:42 am
Also, I think your repeat counts are wrong.
--LocCustomers(Visits)
--11(2), 2(2)
--21(1), 3(2), 4(1)
--31(1), 3(1), 4(2), 5(3)
So now, counting repeats I get:
Loc Repeats
1 2
2 1
3 3
Now that I know what I'm looking for I may be close to a solution.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
April 5, 2012 at 7:43 am
Try this working with your data. I have created 2 extra tables for the locations and decode_locations. Working on a binary theme each location can be allocated a number (power of 2), so location 1 = 1, location 2 = 2, location 3 = 4, location 4 = 8 etc.
If you then sum the locations that a customer has visited then
sum total = 7 would be locations 1,2,3
sum total = 4 would mean only 3 visited.
The final query (Build 3) uses two subqueries so have split out each intermediate query to show the progress although these are not required in the final script.
CREATE TABLE [dbo].[mytemptable](
[id] [int] IDENTITY(1,1) NOT NULL,
[customer_id] [int] NOT NULL,
[location_id] [int] NOT NULL,
[date_entered] [datetime] NOT NULL,
CONSTRAINT [PK_mytemptable] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
INSERT INTO mytemptable (customer_id, location_id, date_entered)
VALUES (1,1,'2012-04-01 00:00:00.000'),
(2,1,'2012-04-01 00:00:00.000'),
(3,2,'2012-04-01 00:00:00.000'),
(4,3,'2012-04-01 00:00:00.000'),
(5,3,'2012-04-01 00:00:00.000'),
(1,2,'2012-04-02 00:00:00.000'),
(1,3,'2012-04-03 00:00:00.000'),
(3,3,'2012-04-02 00:00:00.000'),
(4,2,'2012-04-02 00:00:00.000'),
(5,3,'2012-04-02 00:00:00.000'),
(1,1,'2012-04-04 00:00:00.000'),
(2,1,'2012-04-04 00:00:00.000'),
(3,2,'2012-04-04 00:00:00.000'),
(4,3,'2012-04-04 00:00:00.000'),
(5,3,'2012-04-04 00:00:00.000')
select * from mytemptable
create table Locations(
Location_ID int,
Location_Name varchar(30),
Location_Value bigint
)
create table Decode_Locations(
OutcomeTotal bigint,
OutcomeText varchar(30),
OutputOrder int
)
insert into Locations values (1,'Location A',1)
insert into Locations values (2,'Location B',2)
insert into Locations values (3,'Location C',4)
insert into Decode_Locations values (1,'Only Location A',10)
insert into Decode_Locations values (2,'Only Location B',20)
insert into Decode_Locations values (3,'Locations A and B only',40)
insert into Decode_Locations values (4,'Only Location C',30)
insert into Decode_Locations values (5,'Locations A and C only',50)
insert into Decode_Locations values (6,'Locations B and C only',60)
insert into Decode_Locations values (7,'All Locations (A,B.C)',70)
-- Build 1 : Find distinct customer to location combinations
select customer_id, location_id, count(*) as NumberVisits
from mytemptable
group by customer_id, location_id
-- Build 2 : Find which customers have visited which locations
---- using binary notation 1=Location A, 2=Location B, 4=Location C...(from Locations table above)
select Customer_ID, sum(Location_Value) as LocationsVisited, sum(NumberVisits) as VisitCount
from
(select customer_id, location_id, count(*) as NumberVisits
from mytemptable
group by customer_id, location_id) as VC
inner join Locations as L
on VC.location_id = L.Location_ID
group by Customer_ID
-- Build 3 : Find which customers have visited which locations
select DL.OutcomeText, count(LV.customer_id) as CustomerCount, isnull(sum(VisitCount),0) as NumberVisits from
(select Customer_ID, sum(Location_Value) as LocationsVisited, sum(NumberVisits) as VisitCount
from
(select customer_id, location_id, count(*) as NumberVisits
from mytemptable
group by customer_id, location_id) as VC
inner join Locations as L
on VC.location_id = L.Location_ID
group by Customer_ID) as LV
right outer join Decode_Locations as DL
on LV.LocationsVisited = DL.OutcomeTotal
group by DL.OutcomeText, DL.OutputOrder
order by DL.OutputOrder
From the sample data given the results would be :
OutcomeText, CustomerCount, NumberVisits
Only Location A, 1, 2
Only Location B, 0, 0
Only Location C, 1, 3
Locations A and B only, 0, 0
Locations A and C only, 0, 0
Locations B and C only, 2, 6
All Locations (A,B.C), 1, 4
So total of 5 different customers with the profiles and the number of numbers is 15 (ie number of rows in sample data set).
Fitz
April 5, 2012 at 8:57 am
Wow, very interesting
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply