June 11, 2020 at 4:50 pm
Hi,
I am trying to create a query for gap analysis when country value do not match based on following condition :
--For single value country in both #FuzzyMatch1 and #FuzzyMatch2, the country should be a perfect match. If both country doesn't match 100% then
--I need to include in select statement to report as a gap
-- ID 1 , this record should not be flagged as a discrepancy as USA exists in #FuzzyMatch2 even though it's not a perfect match
-- ID 2 , this record should not be flagged as a discrepancy as Africa exists in #FuzzyMatch2 even though it's not a perfect match
-- ID 3 , this record should be flagged as a discrepancy as neither USA or India exists in #FuzzyMatch2
-- ID 4 , this record should be flagged as a discrepancy as Mexico doesnot exist in #FuzzyMatch2
-- ID 5 , this record should not be flagged as it's a perfect match
CREATE TABLE #FuzzyMatch1
(
ID INT,
Country NVARCHAR(100)
)
CREATE TABLE #FuzzyMatch2
(
ID INT,
Country NVARCHAR(100)
)
INSERT INTO #FuzzyMatch1 ( ID,Country)
SELECT 1,'USA;Canada;Australia' UNION ALL
SELECT 2,'USA;Canada;Africa' UNION ALL
SELECT 3,'USA;India' UNION ALL
SELECT 4,'Mexico' UNION ALL
SELECT 5,'New Zealand'
INSERT INTO #FuzzyMatch2 ( ID,Country)
SELECT 1,'USA;Nepal' UNION ALL
SELECT 2,'China;Africa' UNION ALL
SELECT 3,'Pakistan;Brazil' UNION ALL
SELECT 4,'Algeria' UNION ALL
SELECT 5,'New Zealand'
SELECT * FROM #FuzzyMatch1
SELECT * FROM #FuzzyMatch2
--desired result
SELECT 3 as ID,'USA;India'AS Country,'Pakistan;Brazil' As Country
SELECT 4 as ID,'Mexico'AS Country,'Algeria' As Country
DROP TABLE #FuzzyMatch1
DROP TABLE #FuzzyMatch2
June 11, 2020 at 6:48 pm
So, you want to split the values in the first table, and return results WHERE Country NOT LIKE('%'+split.Value+'%') , right?
Edit: I would say, first suggestion is to normalize the data and not store multiple values in one column. If that's not an option, let us know and we'll help you work through the above.
-------------------------------------------------------------------------------------------------------------------------------------
Please follow Best Practices For Posting On Forums to receive quicker and higher quality responses
June 11, 2020 at 8:32 pm
These are two separate tables coming into SQL from SharePoint list . I need to provide a gap report where the countries do not match.
Thanks,
PSB
June 11, 2020 at 8:38 pm
splitting the values into separate rows is not an option .
June 11, 2020 at 9:10 pm
You're doing the wrong thing badly. You don't seem to know the table must have a primary key by definition; this is not an option! A row is not a record.; They are totally different concepts. There is no such thing as a generic "id" in RDBMS; it must be the identifier of something in particular, according to what is called the Law of Identity in logic. You also seem to have failed to do any basic research because you would know that there is an ISO standard code of three letters for country names. No, you need to do that for yourself.
You've also never read a book on RDBMS or SQL because you would know CSV or a violation of First Normal Form, (1NF) the very foundations of RDBMS. Finally, why are you using the old non-ANSI syntax for your insertion statements? Are you still using the original Sybase engine?
CREATE TABLE County_List_1
(country_code CHAR(3) NOT NULL PRIMARY KEY);
CREATE TABLE County_List_2
(country_code CHAR(3) NOT NULL PRIMARY KEY);
INSERT INTO County_List_1
VALUES (...);
INSERT INTO County_List_2
VALUES (...);
You can now use the set-based operators, INTERSECT and EXCEPT, to find the differences. Between the two tables.
Please post DDL and follow ANSI/ISO standards when asking for help.
June 12, 2020 at 2:53 am
EXCEPT gives one to one match in all cases. I want conditional matches depending on case by case basis.
June 12, 2020 at 4:33 am
Here's one way to do this... it's horrible because, without a Clustered Index on the ID column, it takes 4 scans of the table and doesn't do much better with the Clustered Index. Of course, that's the nature of these types of problems. Everything that touches the denormalized column is going to be a train wreck especially when you're looking for things that don't have a match.
WITH cteMatch AS
(--==== Find an an all matches. Return the IDs that have at least one match.
-- This acts like the mapping/bridge table that someone should create
-- and keep up to date but without the extra two columns for such a table.
SELECT fm1.ID
FROM #FuzzyMatch1 fm1
JOIN #FuzzyMatch2 fm2 ON fm1.ID = fm2.ID
CROSS APPLY STRING_SPLIT(fm1.Country,';') split1
CROSS APPLY STRING_SPLIT(fm2.Country,';') split2
WHERE split1.value = split2.value
GROUP BY fm1.ID
)--==== Return the rows from both tables that don''t have at least one match.
SELECT FM1_ID = fm1.ID
,FM1_Country = fm1.Country
,FM2_ID = fm2.ID
,FM2_Country = fm2.Country
FROM #FuzzyMatch1 fm1
JOIN #FuzzyMatch2 fm2 ON fm1.ID = fm2.ID
WHERE NOT EXISTS (SELECT * FROM cteMatch m WHERE m.ID = fm1.ID)
;
The output is as expected...
--Jeff Moden
Change is inevitable... Change for the better is not.
June 12, 2020 at 4:40 am
You're doing the wrong thing badly. You don't seem to know the table must have a primary key by definition; this is not an option! A row is not a record.; They are totally different concepts. There is no such thing as a generic "id" in RDBMS; it must be the identifier of something in particular, according to what is called the Law of Identity in logic. You also seem to have failed to do any basic research because you would know that there is an ISO standard code of three letters for country names. No, you need to do that for yourself.
You've also never read a book on RDBMS or SQL because you would know CSV or a violation of First Normal Form, (1NF) the very foundations of RDBMS. Finally, why are you using the old non-ANSI syntax for your insertion statements? Are you still using the original Sybase engine?
CREATE TABLE County_List_1
(country_code CHAR(3) NOT NULL PRIMARY KEY);
CREATE TABLE County_List_2
(country_code CHAR(3) NOT NULL PRIMARY KEY);
INSERT INTO County_List_1
VALUES (...);
INSERT INTO County_List_2
VALUES (...);
You can now use the set-based operators, INTERSECT and EXCEPT, to find the differences. Between the two tables.
Because you refuse to understand and acknowledge the actual "RECORDS" that are contained in the ROWS of the given problem, your code won't provide the answer to the problem. It only does what YOU want it to.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 15, 2020 at 2:18 pm
Worked . Thanks
June 15, 2020 at 3:21 pm
Worked . Thanks
Thank you for the feedback. Since you're the one that will need to support it, at least in the short term, do you understand how and why it works?
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply