Table returning duplicates when joined on

  • Hello, I'll try my best to explain the situation.

    We currently have an application whereby the user will come along sign up add some information about them, choose what interests / hobbies they're in to and then click save.

    After completing the sigh up they're able to search for other individuals within locations (look at this functionality as a dating site) We currently have a stored procedure whereby when the user clicks Advance Search they fill in all the fields they filled in when signing up and when they click search we return all users that match the data the user has selected. This part is working correctly.

    The issue im having is extending this stored procedure to take a user defined table type which has two columns UserId and ActivityId both columns are nullable.

    This user defined table type is referred to in two places, when the user signs up and when they do an advance search.

    When the user does an advance search he / she can choose a list of hobbies / interests this will then be passed in to the database and I will join on my User_Activities table to find the users that have the Activities passed in.

    When I join on this table I get duplicate records purely because one user may have 5 - 10 activities depending on how active the individual is. My current stored procedure looks like this

    @Ambition int = null,

    @Body int = null,

    @Diet int = null,

    @Drinking int = null,

    @Ethnicity int = null,

    @Exercise int = null,

    @HeightFrom int = null,

    @HeightTo int = null,

    @Intent int = null,

    @Gender int = null,

    @Seeking int = null,

    @Kids int = null,

    @Language int = null,

    @Marital int = null,

    @Smoking int = null,

    @WeightFrom int = null,

    @WeightTo int = null,

    @CountryId int = null,

    @StateId int = null,

    @CityName varchar(50) = null,

    @AgeFrom int = null,

    @AgeTo int = null,

    @Radius int = null,

    @TVP dbo.UserActivities READONLY -- Consists of the Activities

    AS

    BEGIN

    SET NOCOUNT ON;

    Declare @Geo Geography = null

    Declare @CityId int = null

    Set @CityId = (Select isnull(id, null) from [Location].Cities where CityPostcode = @Cityname and StateId = @StateId)

    Set @Geo = (Select Geo from Location.Cities where CityPostcode = @CityName and StateId = ISNULL(@StateId, StateId))

    SELECT p.Id [UserId], p.Username, p.Intent, pd.HeadLine, p.Seeking, p.Gender, s.StateName + ', ' + c.CityName [Location],

    p.LoggedIn, p.LastLoggedIn, p1.PhotoId

    FROM [User].User_Profile p

    INNER JOIN [User].[User_Details] pd on p.Id = pd.UserId

    INNER JOIN [User].User_Address a on p.Id = a.UserId

    INNER JOIN [Location].States s on a.[State] = s.Id

    INNER JOIN [Location].Cities c on a.City = c.Id

    INNER JOIN [User].User_Photos p1 on p.Id = p1.UserId

    --INNER JOIN [User].[User_Activities] ua on p.Id = ua.UserId -- When I join I get duplicate records

    WHERE p1.IsProfilePic = 1

    AND a.[Country] = @CountryId

    AND a.[State] = ISNULL(@StateId, a.[State])

    AND pd.Ambition = ISNULL(@Ambition, pd.Ambition)

    AND pd.Body = ISNULL(@Body, pd.Body)

    AND pd.Diet = ISNULL(@Diet, pd.Diet)

    AND pd.Drinking = ISNULL(@Drinking, pd.Drinking)

    AND pd.Ethnicity = ISNULL(@Ethnicity, pd.Ethnicity)

    AND pd.Exercise = ISNULL(@Exercise, pd.Exercise)

    AND pd.Height between ISNULL(@HeightFrom, pd.Height) and ISNULL(@HeightTo, pd.Height)

    AND (

    (@Seeking = 3) -- Male or Female

    OR

    (@Gender is null and @Seeking is null)

    OR

    (@Gender = 1 and @Seeking= 1 and p.Gender = @Gender and p.Seeking = @Seeking) -- Female seeking Female

    OR

    (@Gender = 2 and @Seeking = 2 and p.Gender = @Gender and p.Seeking = @Seeking) -- Male seeking Male

    OR

    (@Gender = 1 and @Seeking = 2 and p.Gender = 2 and p.Gender != @Gender and p.Seeking != @Seeking ) -- Female seeking Male

    OR

    (@Gender = 2 and @Seeking = 1 and p.Gender = 1 and p.Gender != @Gender and p.Seeking != @Seeking) -- Male seeking Female

    )

    AND pd.Kids = ISNULL(@Kids, pd.Kids)

    AND pd.[Language] = ISNULL(@Language, pd.[Language])

    AND pd.Marital = ISNULL(@Marital, pd.Marital)

    AND pd.Smoking = ISNULL(@Smoking, pd.Smoking)

    AND pd.[Weight] between ISNULL(@WeightFrom, pd.[Weight]) and ISNULL(@WeightTo, pd.[Weight])

    AND a.[State] = ISNULL(@StateId, a.[State])

    AND (

    (@CityId is null and @Radius is null and a.City = isnull(@CityId, a.city)) -- Search by city

    OR -- Search on City and radius

    (@CityId is not null AND @Geo is not null AND ceiling(@Geo.STDistance(c.Geo)/1000) <= @Radius)

    OR

    (a.City = @CityId)

    )

    AND p.Age between ISNULL(@AgeFrom, p.Age) and ISNULL(@AgeTo, p.Age)

    END

    So my question is how can I select all users that match the criteria passed in, as well as finding the users that match the activities passed in (the user can either match all of them or a minimum of 1)

    Now there may be a better way of doing this I'm not strong in SQL I'm more of a front end developer so I'm welcome to hear feedback / improvements.

    Also just a point which I feel will help, my User_Activities has two columns UserId and ActivityId identical to the user defined table type

  • If you're not returning columns from User_Activities, you can change the INNER JOIN to a WHERE EXISTS:

    SELECT aBunchOfColumns

    FROM User_Profile AS p

    INNER JOIN aBunchOfTables

    ON someCondition

    WHERE EXISTS (

    SELECT *

    FROM User_Activities AS ua

    WHERE p.Id = ua.UserId

    )

    -- Gianluca Sartori

  • Hi thanks for your help, that seems to be working I did receive duplicate profiles again but I managed to resolve that by using DISTINCT on my select statement.

    Question though when I pass in null for @TVP is returns me no profiles when in fact it should so what you have provided me doesn't cater for an empty @TVP how can I cater for that?

  • .Netter (7/8/2015)


    Hi thanks for your help, that seems to be working I did receive duplicate profiles again but I managed to resolve that by using DISTINCT on my select statement.

    Question though when I pass in null for @TVP is returns me no profiles when in fact it should so what you have provided me doesn't cater for an empty @TVP how can I cater for that?

    You originally specified there would be a "minimum of 1". What result would you expect when the TVP is empty?

    Steve (aka sgmunson) πŸ™‚ πŸ™‚ πŸ™‚
    Rent Servers for Income (picks and shovels strategy)

  • Correct, I was informed this would be a required field but the spec has now changed whereby this is optional so if they pass is null for @TVP I want it to ignore @TVP and just filter on the rest of the where clauses.

  • Hey, can anyone shed any light on this please I've been battling it all day I tried checking the count of TVP and then saying in my where clause @Count > 0 but that didn't seem to be work πŸ™

  • I wonder if you could just modify Spaghettidba's criteria to check for the null @tvp maybe?

    SELECT aBunchOfColumns

    FROM User_Profile AS p

    INNER JOIN aBunchOfTables

    ON someCondition

    WHERE EXISTS (

    SELECT *

    FROM User_Activities AS ua

    WHERE p.Id = ua.UserId

    )

    OR User_Activities IS NULL

  • Apologies on the late reply was pulled on to another project, that sadly didn't work, can any suggest anything else?

  • .Netter (7/14/2015)


    Apologies on the late reply was pulled on to another project, that sadly didn't work, can any suggest anything else?

    You have to stop assuming that you implemented the suggestion correctly and stop assuming that we know what you know. What, exactly, didn't work, and how did you detect that it failed to work? You can't just say "it didn't work" and think we'll know what you're talking about. We have to operate solely on what you tell us, and you're not being terribly detailed. Please be specific. When something doesn't work, post the attempted code so we can see if there might have been some misunderstanding that led to bad code, and post the results so we can see what results it did produce, as that often leads to a solution. You seem to be more fond of just throwing stuff at the wall and hoping it will stick. That's not a terribly productive method of problem solving. Help us help you... post the details, please...

    Steve (aka sgmunson) πŸ™‚ πŸ™‚ πŸ™‚
    Rent Servers for Income (picks and shovels strategy)

  • Ok how about this modification of spaghettidba's code:

    SELECT aBunchOfColumns

    FROM User_Profile AS p

    INNER JOIN aBunchOfTables

    ON someCondition

    WHERE EXISTS (

    SELECT *

    FROM User_Activities AS ua

    WHERE p.Id = ua.UserId

    )

    OR

    (SELECT COUNT(*) FROM User_Activities) = 0

    But even with that suggestion, I do have to share sgmunson's concerns that you aren't giving very much feedback or assistance to us in working through this other than "it didn't work". Even saying in reply to my post that "it didn't change the results" would be something, and for the purposes of the above suggestion, thats what I'm assuming happened.

  • I understand where your coming from I assumed we were still on the same page in regards to what I posted after the first code sample was provided i.e regarding @TVP being null.

    If I comment the statement

    WHERE EXISTS (

    SELECT *

    FROM User_Activities AS ua

    WHERE p.Id = ua.UserId and ua.ActivityId in (Select ActivityId from @TVP)

    )

    and execute the procedure I see all profiles, but as soon as I un-comment that statement above and pass in null for @TVP and then execute the procedure again I don't get any profiles due to that statement not catering for a null @TVP.

    If I pass in some activities within @TVP then it works as expected I see profiles which have the chosen Activities linked, so the issue is I need to change that statement in such a way that it can cater for a null @TVP (if @TVP is null then ignore it bascially)

    I have tried the following (this is at the top of the procedure)

    Declare @Count int = (Select Count(*) from @TVP)

    Then in the WHERE Exists condition it looks like this:

    WHERE EXISTS (

    SELECT *

    FROM User_Activities AS ua

    WHERE p.Id = ua.UserId and @Count > 0 and ua.ActivityId in (Select ActivityId from @TVP)

    )

    But again that didn't give me the desired result, regardless of me passing in Activities or passing in null for @TVP I get an empty record set.

  • If I understand your requirement correctly, then the below should do the trick, convoluted though it is:

    WHERE Β EXISTS (

    Β  Β  SELECT *

    Β  Β  FROM User_Activities AS ua

    Β  Β  WHERE p.Id = ua.UserId and (ua.ActivityId in (Select ActivityId from @TVP)

    OR NOT EXISTS (SELECT 1 from @TVP))

    )

    I have the feeling it can be done more simply, but I'll have to mull it over a bit.

    This is also the sort of thing where dynamically constructing the query or calling a different stored procedure based on whether @TVP is empty can be considered.

    It may well be that the execution plan that is best when @TVP is empty is quite different than the execution plan when @TVP isn't. In that situation, it may be best to use a different query for each case. This is rather like a "catch all" query, and Gail Shaw has a nice write-up about those here: http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/

    Cheers!

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply