Repeat and Unique customers at multiple locations

  • Hi,

    Let's say I have a table of locations (currently 3 but can grow), a table of customers, and a table of transactions.

    I need to generate a report that shows how many customers shopped at the following locations:

    Location 1 only

    Location 2 only

    Location 3 only

    Location 1&2 but not 3

    Location 1&3 but not 2

    Location 2&2 but not 1

    Location 1&2&3

    This is just a simplified example, but basically we scan a barcode when a customer comes to each location. So it would store customer_id, location_id and date.

    Any suggestions for how to write such a query? Is there a way to do this dynamically so that it'll still work when the locations change?

  • pixelwiz (4/4/2012)


    Hi,

    Let's say I have a table of locations (currently 3 but can grow), a table of customers, and a table of transactions.

    I need to generate a report that shows how many customers shopped at the following locations:

    Location 1 only

    Location 2 only

    Location 3 only

    Location 1&2 but not 3

    Location 1&3 but not 2

    Location 2&2 but not 1

    Location 1&2&3

    This is just a simplified example, but basically we scan a barcode when a customer comes to each location. So it would store customer_id, location_id and date.

    Any suggestions for how to write such a query? Is there a way to do this dynamically so that it'll still work when the locations change?

    Yes.

    If you want a more detailed answer, please read this article[/url] about the best way to post DDL and sample data then edit your post to include it. That way the unpaid volunteers from this forum will be able to provide you with working and tested code to fulfil your particular requirements.

    Thanks 😀


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Ok, let me try to follow that post...

    Query to create a temp table:

    -----------

    CREATE TABLE [dbo].[mytemptable](

    [id] [int] IDENTITY(1,1) NOT NULL,

    [customer_id] [int] NOT NULL,

    [location_id] [int] NOT NULL,

    [date_entered] [datetime] NOT NULL,

    CONSTRAINT [PK_mytemptable] PRIMARY KEY CLUSTERED

    (

    [id] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    ----------------------

    Here is some sample data:

    ---------------------------

    INSERT INTO mytemptable (customer_id, location_id, date_entered)

    VALUES(1,1,'2012-04-01 00:00:00.000'),

    (2,1,'2012-04-01 00:00:00.000'),

    (3,2,'2012-04-01 00:00:00.000'),

    (4,3,'2012-04-01 00:00:00.000'),

    (5,3,'2012-04-01 00:00:00.000'),

    (1,2,'2012-04-02 00:00:00.000'),

    (1,3,'2012-04-03 00:00:00.000'),

    (3,3,'2012-04-02 00:00:00.000'),

    (4,2,'2012-04-02 00:00:00.000'),

    (5,3,'2012-04-02 00:00:00.000'),

    (1,1,'2012-04-04 00:00:00.000'),

    (2,1,'2012-04-04 00:00:00.000'),

    (3,2,'2012-04-04 00:00:00.000'),

    (4,3,'2012-04-04 00:00:00.000'),

    (5,3,'2012-04-04 00:00:00.000')

    ---------------------

  • One last thing that may not have been covered in the article, expected results. Based on the sample data you provided, what should the results from the query be?

  • Ok, what I'd like the results to look like is something like this:

    I tried to get the outputs calculated manually not 100% sure I'm right...

    For multiple locations, I'm trying to count how many customers have been to more than one location. If one customer has been to location 1 and location 2 twice, it should still only count it as 1.

    Location IDs | Unique Scans | Repeat Scans

    1 | 2 | 2

    2 | 3 | 1

    3 | 4 | 5

    1,2 | 1 | NA

    1,3 | 1 | NA

    2,3 | 2 | NA

    1,2,3 | 3 | NA

  • Here is the only solution I've come up with so far, and it's not dynamic. Below partners are actually locations

    SELECT

    SUM(CASE WHEN p1 > 0 AND p2 = 0 AND p3 = 0 THEN 1 ELSE 0 END) p1Only,

    SUM(CASE WHEN p1 = 0 AND p2 > 0 AND p3 = 0 THEN 1 ELSE 0 END) p2Only,

    SUM(CASE WHEN p1 = 0 AND p2 = 0 AND p3 > 0 THEN 1 ELSE 0 END) p3Only,

    SUM(CASE WHEN p1 > 0 AND p2 > 0 AND p3 = 0 THEN 1 ELSE 0 END) p1p2,

    SUM(CASE WHEN p1 > 0 AND p2 = 0 AND p3 > 0 THEN 1 ELSE 0 END) p1p3,

    SUM(CASE WHEN p1 = 0 AND p2 > 0 AND p3 > 0 THEN 1 ELSE 0 END) p2p3,

    SUM(CASE WHEN p1 > 0 AND p2 > 0 AND p3 > 0 THEN 1 ELSE 0 END) p1p2p3

    FROM (

    SELECT customer_id, [1] AS p1, [2] AS p2, [3] AS p3

    FROM (

    SELECTDISTINCT c.customer_id, p.partner_id, (

    SELECT COUNT(id)

    FROM vw_dcs_scans c2

    WHERE c2.customer_id = c.customer_id

    AND c2.partner_id = p.partner_id) cnt

    FROM vw_dcs_scans c,

    (

    SELECT DISTINCT spl.partner_id

    FROM source_partners sp

    INNER JOIN source_partner_locations spl

    ON sp.id = spl.partner_id

    INNER JOIN sources s

    ON s.id = spl.source_id

    INNER JOIN events e

    ON e.id = s.event_id

    WHERE e.id = 122

    ) p

    ) AS sourceTable

    PIVOT

    (

    SUM(cnt) FOR partner_id IN ([1],[2],[3])

    ) AS pvtTable

    ) AS dataTable

  • If I may, I'd like to ask a question as I've taken an interest in this problem.

    If I construct a list of locations, and assign to it the customers that have visited that location like this:

    --LocCusts

    --11, 2

    --21, 3, 4

    --31, 3, 4, 5

    I understand how you're getting counts of 2,3,4 for visits to locations 1,2,3.

    But for customers visiting:

    --> 1+2 but not 3, it looks the count should be 0

    --> 1+3 but not 2, it looks the count should be 0

    --> 2+3 but not 1, it looks the count should be 2

    Am I missing something? Perhaps my brain is fried... tough day today.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • I think you're right, I may have done my sample output values incorrectly, I was trying to do it in my head and probably counted it wrong.

  • Also, I think your repeat counts are wrong.

    --LocCustomers(Visits)

    --11(2), 2(2)

    --21(1), 3(2), 4(1)

    --31(1), 3(1), 4(2), 5(3)

    So now, counting repeats I get:

    Loc Repeats

    1 2

    2 1

    3 3

    Now that I know what I'm looking for I may be close to a solution.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Try this working with your data. I have created 2 extra tables for the locations and decode_locations. Working on a binary theme each location can be allocated a number (power of 2), so location 1 = 1, location 2 = 2, location 3 = 4, location 4 = 8 etc.

    If you then sum the locations that a customer has visited then

    sum total = 7 would be locations 1,2,3

    sum total = 4 would mean only 3 visited.

    The final query (Build 3) uses two subqueries so have split out each intermediate query to show the progress although these are not required in the final script.

    CREATE TABLE [dbo].[mytemptable](

    [id] [int] IDENTITY(1,1) NOT NULL,

    [customer_id] [int] NOT NULL,

    [location_id] [int] NOT NULL,

    [date_entered] [datetime] NOT NULL,

    CONSTRAINT [PK_mytemptable] PRIMARY KEY CLUSTERED

    (

    [id] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    INSERT INTO mytemptable (customer_id, location_id, date_entered)

    VALUES (1,1,'2012-04-01 00:00:00.000'),

    (2,1,'2012-04-01 00:00:00.000'),

    (3,2,'2012-04-01 00:00:00.000'),

    (4,3,'2012-04-01 00:00:00.000'),

    (5,3,'2012-04-01 00:00:00.000'),

    (1,2,'2012-04-02 00:00:00.000'),

    (1,3,'2012-04-03 00:00:00.000'),

    (3,3,'2012-04-02 00:00:00.000'),

    (4,2,'2012-04-02 00:00:00.000'),

    (5,3,'2012-04-02 00:00:00.000'),

    (1,1,'2012-04-04 00:00:00.000'),

    (2,1,'2012-04-04 00:00:00.000'),

    (3,2,'2012-04-04 00:00:00.000'),

    (4,3,'2012-04-04 00:00:00.000'),

    (5,3,'2012-04-04 00:00:00.000')

    select * from mytemptable

    create table Locations(

    Location_ID int,

    Location_Name varchar(30),

    Location_Value bigint

    )

    create table Decode_Locations(

    OutcomeTotal bigint,

    OutcomeText varchar(30),

    OutputOrder int

    )

    insert into Locations values (1,'Location A',1)

    insert into Locations values (2,'Location B',2)

    insert into Locations values (3,'Location C',4)

    insert into Decode_Locations values (1,'Only Location A',10)

    insert into Decode_Locations values (2,'Only Location B',20)

    insert into Decode_Locations values (3,'Locations A and B only',40)

    insert into Decode_Locations values (4,'Only Location C',30)

    insert into Decode_Locations values (5,'Locations A and C only',50)

    insert into Decode_Locations values (6,'Locations B and C only',60)

    insert into Decode_Locations values (7,'All Locations (A,B.C)',70)

    -- Build 1 : Find distinct customer to location combinations

    select customer_id, location_id, count(*) as NumberVisits

    from mytemptable

    group by customer_id, location_id

    -- Build 2 : Find which customers have visited which locations

    ---- using binary notation 1=Location A, 2=Location B, 4=Location C...(from Locations table above)

    select Customer_ID, sum(Location_Value) as LocationsVisited, sum(NumberVisits) as VisitCount

    from

    (select customer_id, location_id, count(*) as NumberVisits

    from mytemptable

    group by customer_id, location_id) as VC

    inner join Locations as L

    on VC.location_id = L.Location_ID

    group by Customer_ID

    -- Build 3 : Find which customers have visited which locations

    select DL.OutcomeText, count(LV.customer_id) as CustomerCount, isnull(sum(VisitCount),0) as NumberVisits from

    (select Customer_ID, sum(Location_Value) as LocationsVisited, sum(NumberVisits) as VisitCount

    from

    (select customer_id, location_id, count(*) as NumberVisits

    from mytemptable

    group by customer_id, location_id) as VC

    inner join Locations as L

    on VC.location_id = L.Location_ID

    group by Customer_ID) as LV

    right outer join Decode_Locations as DL

    on LV.LocationsVisited = DL.OutcomeTotal

    group by DL.OutcomeText, DL.OutputOrder

    order by DL.OutputOrder

    From the sample data given the results would be :

    OutcomeText, CustomerCount, NumberVisits

    Only Location A, 1, 2

    Only Location B, 0, 0

    Only Location C, 1, 3

    Locations A and B only, 0, 0

    Locations A and C only, 0, 0

    Locations B and C only, 2, 6

    All Locations (A,B.C), 1, 4

    So total of 5 different customers with the profiles and the number of numbers is 15 (ie number of rows in sample data set).

    Fitz

  • Wow, very interesting

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply