November 9, 2022 at 6:28 am
I have a table where one column holds values like this.
There are multiple rows; we need to find the distinct values across all of them and create a lookup table.
Here are 3 rows from this table:
1 ["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Structured Cabling","Prod Vert - Video"]
2 ["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Video"]
3 ["Acct Vert - Chemical"]
I was thinking of creating a split-string function and getting the data from that.
After doing that, I need to create another table that holds the main table id and the lookup id, with a record for each pair.
Here is an example.
After creating the lookup table, the data will look like this:
1 Acct Vert - Chemical
2 Acct Vert - Energy
3 Acct Vert - Seaport
Here is the other table we need:
1 1
1 2
1 3
2 1
2 2
2 3
3 1
Create table Maintable ( id int IDENTITY(1,1), multiname varchar(500) )
Create table looktable ( id int IDENTITY(1,1), name varchar(100) )
Create table facttable ( MainTableid int , Lookuptableid int )
insert into Maintable Values ('["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Structured Cabling","Prod Vert - Video"]')
insert into Maintable Values ('["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Video"]')
insert into Maintable Values ('["Acct Vert - Chemical"]')
November 9, 2022 at 7:48 am
Did you search SSC for a split-string article?
"Tally OH! An Improved SQL 8K “CSV Splitter” Function"
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
November 21, 2022 at 11:24 pm
Here's a quick example solution that may work, depending on the data size, processing frequency, etc. (Temporary tables are used here to process the example dataset.)
IF OBJECT_ID('tempdb..#Maintable') IS NULL
BEGIN
--DROP TABLE #Maintable
Create table #Maintable ( id int IDENTITY(1,1), multiname varchar(500) );
insert into #Maintable Values ('["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Structured Cabling","Prod Vert - Video"]')
insert into #Maintable Values ('["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Video"]')
insert into #Maintable Values ('["Acct Vert - Chemical"]')
END
IF OBJECT_ID('tempdb..#Maintable_Split') IS NULL
BEGIN
--DROP TABLE #Maintable_Split
; WITH cte AS (
SELECT
x.id
, REPLACE(REPLACE(REPLACE(
x.multiname
,'["','')-- strip the opening square bracket and the first item's opening double quote
,'"]','')-- strip the closing square bracket and the last item's closing double quote
,'","',CHAR(30))-- replace the (double quote - comma - double quote) item delimiter with a single, rarely used control character for the string split later
AS multiname
FROM #Maintable x
)
SELECT
x.id
, y.value AS name
INTO #Maintable_Split
FROM cte x
OUTER APPLY STRING_SPLIT(x.multiname,CHAR(30)) y -- split the string on the control character substituted in the cte above
END
IF OBJECT_ID('tempdb..#looktable') IS NULL
BEGIN
--DROP TABLE #looktable
Create table #looktable ( id int IDENTITY(1,1), name varchar(100) )
INSERT INTO #looktable(name)
SELECT
x.name
FROM #Maintable_Split x
LEFT JOIN #looktable y ON y.name = x.name
WHERE y.id IS NULL
GROUP BY
x.name
END
IF OBJECT_ID('tempdb..#facttable') IS NULL
BEGIN
--DROP TABLE #facttable
Create table #facttable ( MainTableid int , Lookuptableid int )
INSERT INTO #facttable (MainTableid,Lookuptableid)
SELECT
x.id AS MainTableid
, y.id AS Lookuptableid
FROM #Maintable_Split x
JOIN #looktable y ON y.name = x.name
LEFT JOIN #facttable z ON z.MainTableid = x.id AND z.Lookuptableid = y.id
WHERE z.MainTableid IS NULL
END
SELECT *
FROM #facttable x
WHERE 1=1
November 22, 2022 at 8:51 am
@sks_989 ,
You're using SQL Server 2017. It has a nice little goodie known as TRIM() that can easily replace the need for multiple REPLACEs.
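For reference, a character-set TRIM strips any of the listed characters from both ends of a string in one call (a quick one-off sketch; this form of TRIM requires SQL Server 2017 or later):

```sql
-- TRIM with an explicit character set (SQL Server 2017+) removes any of the
-- listed characters from BOTH ends of the string.
SELECT TRIM('[]"' FROM '["Acct Vert - Chemical"]');
-- Returns: Acct Vert - Chemical
```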
Also, get in the habit of leaving some simple bread-crumbs in the form of comments that provide both visual code separators and a short logical description of what the code does for the next person to touch the code... which might be you a year from now. 😀
This will do as you ask, except I'll let you have some of the fun by writing the two FKs that are necessary for the #FactTable.
--=============================================================================
-- Create and populate the MainTable from the given data.
-- This is NOT a part of the solution. This is just setting up the data.
--=============================================================================
DROP TABLE IF EXISTS #MainTable
;
CREATE TABLE #MainTable
(
ID int IDENTITY(1,1)
,MultiName varchar(500) NOT NULL
,CONSTRAINT PK_MainTable PRIMARY KEY CLUSTERED (ID)
)
;
INSERT INTO #MainTable WITH (TABLOCK)
VALUES ('["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Structured Cabling","Prod Vert - Video"]')
,('["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Video"]')
,('["Acct Vert - Chemical"]')
;
--=============================================================================
-- Create and populate a working table that contains the split/clean data.
--=============================================================================
DROP TABLE IF EXISTS #Split
;
SELECT MainTableID = mt.ID
,t1.Name
INTO #Split
FROM #Maintable mt
CROSS APPLY STRING_SPLIT(multiname,',')s1
CROSS APPLY (VALUES(TRIM('[]"' FROM s1.Value)))t1(Name)
;
--=============================================================================
-- Create and populate the Lookup table from the unique split data.
--=============================================================================
DROP TABLE IF EXISTS #LookUp
;
CREATE TABLE #LookUp
(
LookUpTableID int IDENTITY(1,1)
,Name varchar(100) NOT NULL
,CONSTRAINT PK_LookUp PRIMARY KEY CLUSTERED (LookUpTableID)
,INDEX AK_Name UNIQUE (Name)
)
;
INSERT INTO #LookUp WITH (TABLOCK)
(Name)
SELECT DISTINCT
Name
FROM #Split
ORDER BY Name
OPTION (MAXDOP 1) --Keep parallelism from messing up the order
;
--=============================================================================
-- Create and populate the FactTable "bridge" table, joining by VERT names.
--=============================================================================
DROP TABLE IF EXISTS #FactTable
;
CREATE TABLE #FactTable
(
MainTableID int NOT NULL
,LookupTableID int NOT NULL
,CONSTRAINT PK_FactTable PRIMARY KEY CLUSTERED (MainTableID,LookupTableID)
)
;
INSERT INTO #FactTable WITH (TABLOCK)
(MainTableID, LookUpTableID)
SELECT s.MainTableID
,l.LookUpTableID
FROM #Split s
JOIN #LookUp l ON s.Name = l.Name
;
--===== All done... Drink BEER!
Also, I did like Justin did... I used Temp Tables for this demo because I use DROP TABLE to make demo reruns easier, and I don't want you to drop a real table by mistake. 😀
Also... if it were me, I'd split the "VERT" names into a VertCategory and a VertType column. For that, you'd need the DelimitedSplit8K function that Johan provided the link for above, because STRING_SPLIT() did not return a position ordinal until SQL Server 2022 (what the hell was MS thinking when they made that mistake?). It'll make future data analysis and reporting a shedload easier. I didn't want to presume to do it that way for you, though.
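For anyone on SQL Server 2022, where STRING_SPLIT() finally accepts the enable_ordinal argument, that two-column split might be sketched like this (the VertCategory/VertType column names are just illustrative; on earlier versions, substitute DelimitedSplit8K and its ItemNumber column):

```sql
-- Sketch only: the third STRING_SPLIT argument (enable_ordinal) needs SQL Server 2022.
SELECT  s.MainTableID
       ,VertCategory = MAX(CASE WHEN ss.ordinal = 1 THEN TRIM(ss.value) END)
       ,VertType     = MAX(CASE WHEN ss.ordinal = 2 THEN TRIM(ss.value) END)
  FROM #Split s
 CROSS APPLY STRING_SPLIT(s.Name, '-', 1) ss
 GROUP BY s.MainTableID, s.Name
;
```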
@justin... welcome aboard, mate!
--Jeff Moden
Change is inevitable... Change for the better is not.
November 22, 2022 at 2:48 pm
Quick thought, this can be simplified by using OPENJSON
😎
Here is an example:
USE TEEST;
GO
SET NOCOUNT ON;
GO
---------------------------------------------------------------------
-- Declare the test variable
---------------------------------------------------------------------
DECLARE @TSTRING NVARCHAR(MAX) = N'["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Structured Cabling","Prod Vert - Video"]';
---------------------------------------------------------------------
-- Return the Ordinal position (key) and the Value for each entry in
-- the test string
---------------------------------------------------------------------
;WITH PRE_PARSE(VKEY,VVAL) AS
(
SELECT
SJ.[key]
,SJ.[value]
FROM OPENJSON(@TSTRING) SJ
)
---------------------------------------------------------------------
-- Split each value by the "-" delimiter
---------------------------------------------------------------------
,SPLIT_VALUES(RRID,VKEY,SVAL) AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY PP.VKEY ORDER BY PP.VKEY) AS RRID
,PP.VKEY
,SP.[value] AS SVAL
FROM PRE_PARSE PP
CROSS APPLY string_split(PP.VVAL,N'-') SP
)
---------------------------------------------------------------------
-- Final output
---------------------------------------------------------------------
SELECT
SV.VKEY
,MAX(CASE WHEN RRID = 1 THEN SV.SVAL END) AS VCOL
,MAX(CASE WHEN RRID = 2 THEN SV.SVAL END) AS VVAL
FROM SPLIT_VALUES SV
GROUP BY SV.VKEY
;
The result set:
VKEY VCOL VVAL
----- ----------- --------------------
0 Acct Vert Chemical
1 Acct Vert Energy
2 Acct Vert Seaport
3 Prod Vert Access Control
4 Prod Vert Structured Cabling
5 Prod Vert Video
November 22, 2022 at 4:58 pm
Jeff's code seems to generate primary keys and produces the correct output afaik. Creating foreign keys should be straightforward for the OP. The split could be accomplished with OPENJSON without specifying a schema and without the need for STRING_SPLIT imo
Using OPENJSON (Jeff's code)
--===== Before getting started... Drink beer NOW!
--=============================================================================
-- Create and populate a working table that contains the split/clean data.
--=============================================================================
DROP TABLE IF EXISTS #Split
;
SELECT MainTableID = mt.ID
,oj.value [Name]
INTO #Split
FROM #Maintable mt
CROSS APPLY openjson(mt.MultiName) oj
;
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
November 22, 2022 at 6:04 pm
Although JSON is a simple solution, I generally avoid it for single level JSON because it's a fair bit slower than DelimitedSplit8k, Eirikur's wonderful "Lead" addition to the DelimitedSplit8k function, or STRING_SPLIT(). Just sayin'... 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
November 22, 2022 at 6:51 pm
Comma is not an escape character in JSON (towards the bottom of the page) 😉
https://www.json.org/json-en.html
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
November 22, 2022 at 7:04 pm
Comma is not an escape character in JSON (towards the bottom of the page) 😉
https://www.json.org/json-en.html
M$-JSon might not be strict 😉
😎
November 22, 2022 at 7:09 pm
Jeff's code seems to generate primary keys and produces the correct output afaik. Creating foreign keys should be straightforward for the OP. The split could be accomplished with OPENJSON without specifying a schema and without the need for STRING_SPLIT imo
Using OPENJSON (Jeff's code)
--===== Before getting started... Drink beer NOW!
--=============================================================================
-- Create and populate a working table that contains the split/clean data.
--=============================================================================
DROP TABLE IF EXISTS #Split
;
SELECT MainTableID = mt.ID
,oj.value [Name]
INTO #Split
FROM #Maintable mt
CROSS APPLY openjson(mt.MultiName) oj
;
The same can be done without specifying a schema with just about any method capable of performing a Split. The OP asked for tables, though... probably for building some form of Normalized data for much larger data or maybe even a DW.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 22, 2022 at 7:40 pm
If you load up one of the strings with a bunch of commas, it causes an error afaik. JSON doesn't care how many commas in a row appear inside a value. A tiny point, maybe 🙂
Try with this input
DROP TABLE IF EXISTS #MainTable
;
CREATE TABLE #MainTable
(
ID int IDENTITY(1,1)
,MultiName varchar(500) NOT NULL
,CONSTRAINT PK_MainTable PRIMARY KEY CLUSTERED (ID)
)
;
INSERT INTO #MainTable WITH (TABLOCK)
VALUES ('["Acct Vert - Chemical,,, etc","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Structured Cabling","Prod Vert - Video"]')
,('["Acct Vert - Chemical","Acct Vert - Energy","Acct Vert - Seaport","Prod Vert - Access Control","Prod Vert - Video"]')
,('["Acct Vert - Chemical"]')
;
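The difference is easy to see with a one-off test (hypothetical, not part of the solution):

```sql
DECLARE @s nvarchar(200) = N'["Acct Vert - Chemical,,, etc","Acct Vert - Energy"]';

-- OPENJSON honours the JSON quoting, so the embedded commas stay inside the value:
SELECT [value] FROM OPENJSON(@s);          -- 2 rows, clean values

-- A naive comma split treats every comma as a delimiter and tears the value
-- apart, brackets and quotes included:
SELECT [value] FROM STRING_SPLIT(@s, ',');
```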
Also, Jeff could you explain a little about the (TABLOCK) hint? Also, " OPTION (MAXDOP 1) --Keep parallelism from messing up the order"?
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
November 22, 2022 at 10:37 pm
Also, Jeff could you explain a little about the (TABLOCK) hint? Also, " OPTION (MAXDOP 1) --Keep parallelism from messing up the order"?
Sure...
About the WITH (TABLOCK)... it's being used on a Temp table, and TempDB is in the SIMPLE recovery model. We're inserting more than one row into a Temp table that has a Clustered Index on it. WITH (TABLOCK) will cause the insert to be minimally logged because of the SIMPLE recovery model. It's become an automatic habit for me. I do it even if there's no Clustered Index involved or something else may prevent the minimal logging, because it'll still prevent "eventual escalation" from row to page locks and make things a bit faster... lots faster if there's a fair bit of data and the table has a Clustered Index and starts out empty. It'll also cause minimal logging when inserting into a heap. You can't get minimal logging for INSERTs without it, so I add it to code in TempDB as a matter of rote.
As for the MAXDOP 1 thing, that's another habit of mine when I want an IDENTITY column to reflect the order of the data. Even if parallelism weren't possible, using MAXDOP in such a fashion pretty much guarantees it won't happen; if it did happen, things might not end up in the order I want them. It's also a visual indicator (to me, at least) that my intent was for the IDENTITY column to match the sort order.
For inserts that also have variables in the code, I'll also include the RECOMPILE option, because minimal logging frequently doesn't happen without it when variables are present. Sometimes it's not required, but I don't trust MS or the optimizer to make such a decision for me.
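Put together, the pattern described above looks something like this (a sketch; the table names are hypothetical):

```sql
INSERT INTO #Target WITH (TABLOCK) -- enables minimal logging in TempDB's SIMPLE recovery model
       (Name)
SELECT Name
  FROM #Staging
 ORDER BY Name                     -- the IDENTITY values should follow this order...
OPTION (MAXDOP 1, RECOMPILE);      -- ...MAXDOP 1 keeps parallelism from disturbing it, and
                                   -- RECOMPILE helps minimal logging when variables are present
```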
Some people claim it's all overkill. Ok... whatever. Someday they'll need to wear "Depends" and then they'll understand. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
November 23, 2022 at 4:42 pm
Interesting and much appreciated. Happy Thanksgiving!
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können