March 12, 2013 at 10:03 am
The base column would, as I understand it, always come across as-is in terms of it's NULL/NOT NULL property if selected straightaway in a SELECT...INTO. By 'not copying NULLability' I figured you were talking about overriding the base column using an expression, as you showed by wrapping a column in the SELECT-list with ISNULL, which seems to work on all the data types I tried it on. Similarly you can make a NOT NULL column NULL-able by wrapping it with NULLIF.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
March 12, 2013 at 10:49 am
... Similarly you can make a NOT NULL column NULL-able by wrapping it with NULLIF.
... or CASE WHEN where relevant.
March 12, 2013 at 11:53 am
COALESCE too...
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
March 12, 2013 at 4:51 pm
Hi Guys.
Thanks for all your comments. Anyway going back to the original question will it be a definate that I have to create the table first before selecting into as the source table I will be selecting from will have NULL values.
It looks like I have to create the table first so my questions is if i do
create table bla bla bla
then select * from source table into bla bla bla
what will happen if it comes across a null value? will everything stop or will it carry on inserting what it can?
also is there a way to say
If IS NULL then
do something?
else
carry on inserting
END IF
March 12, 2013 at 6:26 pm
alan_lynch (3/12/2013)
Hi Guys.Thanks for all your comments. Anyway going back to the original question...
Yeah, we can get distracted delving too far into the details 😀
...will it be a definate that I have to create the table first before selecting into as the source table I will be selecting from will have NULL values.
Nope. Not necessary. If you have a lot of data try it different ways. Some things to try:
1. Use SELECT...INTO to create the table and populate it in one step. Then add the PK in a second step. This what the sample code below does.
2. Try creating the table first and using INSERT INTO...SELECT to populate it. Then add the PK after the data is in the table. If your database is in SIMPLE or BULK_LOGGED recovery mode add the TABLOCK hint to the INSERT INTO...SELECT.
3. Again create the table first, but immediately add the PK to the empty table. Then use INSERT INTO...SELECT to populate it. With this one also, if your database is in SIMPLE or BULK_LOGGED recovery mode add the TABLOCK hint to the INSERT INTO...SELECT.
You may be surprised at the outcome. Please post the results if you find the time.
It looks like I have to create the table first so my questions is if i do
create table bla bla bla
then select * from source table into bla bla bla
what will happen if it comes across a null value? will everything stop or will it carry on inserting what it can?
It would fail when it comes time to add the PK so you'll want to ignore those rows by using a filter in the SELECT portion of your INSERT INTO...SELECT (see sample code below).
also is there a way to say
If IS NULL then
do something?
else
carry on inserting
END IF
You can do this using the ISNULL function inline (see sample code), the NULLIF function, the COALESCE function, or a combination of those. If you have a very complex set of logic you want to apply to a single column can use a CASE expression in the SELECT portion of the INSERT INTO...SELECT query to manipulate the outcome based on multiple conditions.
The code you posted initially was a good start and Jeff extended it to handle the requirements. The new wrinkle you just added about omitting rows with a NULL name is reflected below. I simply modified Jeff's code to add that condition.
To force your key column to be NOT NULL in your SELECT...INTO you'll use the ISNULL function as Jeff showed, and that plus the filter on the WHERE-clause I just added should allow you to add your PK constraint without the engine complaining about the NULL-ability of the column.
Here is a working proof of concept with some sample data that I think tests the requirements we have so far:
USE tempdb;
IF EXISTS ( SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'dbo.bigtable')
AND type IN (N'U') )
DROP TABLE dbo.bigtable;
GO
IF EXISTS ( SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'dbo.lookuptable')
AND type IN (N'U') )
DROP TABLE dbo.lookuptable;
GO
CREATE TABLE dbo.bigtable
(
Software_Name_Original VARCHAR(100) NULL,
Software_Name_Raw VARCHAR(100)
);
GO
INSERT INTO dbo.bigtable
(Software_Name_Original, Software_Name_Raw)
VALUES ('Pacman', 'Pacman'),
('Ms. Pacman', 'Pacman'),
('Dig Dug', 'Dig Dug'),
(NULL, 'Galaga');
WITH cteDupeCheck
AS (
SELECT RowNum = ROW_NUMBER() OVER (PARTITION BY Software_Name_Raw ORDER BY Software_Name_Raw),
SN_Original = Software_Name_Original,
SN_New = Software_Name_Raw
FROM dbo.BigTable
WHERE Software_Name_Original IS NOT NULL -- this will guarantee that only rows with values are pulled into 'lookuptable'
)
SELECT SN_Original = ISNULL(SN_Original, 0) --ISNULL makes the column NOT NULL
,
SN_New
INTO dbo.Lookuptable
FROM cteDupeCheck
WHERE RowNum = 1;
--===== Since we now have a NOT NULL column,
-- add the desired unique clustered index.
ALTER TABLE dbo.LookUpTable
ADD CONSTRAINT PK_LookUpTable
PRIMARY KEY CLUSTERED (SN_Original);
-- show the results
SELECT *
FROM dbo.lookuptable;
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
March 12, 2013 at 7:24 pm
ok guys this is what I have with an error, see error below
use TestData
go
Drop table lookuptable;
WITH cteDupeCheck
AS (
SELECT RowNum = ROW_NUMBER() OVER (PARTITION BY Software_Name_Raw ORDER BY Software_Name_Raw),
SN_Original = Software_Name_Original,
SN_New = Software_Name_Raw,
SV_New = Software_Version_raw,
SP_New = Software_Publisher_Raw,
Licensable = Software_Category,
MSDN_Flag,
CDL_Flag,
Source,
Pending_Classification_Flag,
Auto_Classification_Flag,
Software_Classification_Version,
Manual_Deletion,
Load_Date
FROM dbo.BigTable
WHERE Software_Name_Original IS NOT NULL -- this will guarantee that only rows with values are pulled into 'lookuptable'
)
SELECT SN_Original = ISNULL(SN_Original, 0) --ISNULL makes the column NOT NULL
,
SN_New,
SV_New,
SP_New,
Licensable,
MSDN_Flag,
CDL_Flag,
Source,
Pending_Classification_Flag,
Auto_Classification_Flag,
Software_Classification_Version,
Manual_Deletion,
Load_Date
INTO dbo.Lookuptable
FROM cteDupeCheck
WHERE RowNum = 1;
--===== Since we now have a NOT NULL column,
-- add the desired unique clustered index.
ALTER TABLE dbo.LookUpTable
ADD CONSTRAINT PK_LookUpTable
PRIMARY KEY CLUSTERED (SN_Original);
-- show the results
SELECT *
FROM dbo.lookuptable;
ERROR MESSAGE IS AS FOLLOWS
(19905 row(s) affected)
Msg 1505, Level 16, State 1, Line 40
The CREATE UNIQUE INDEX statement terminated because a duplicate key was found for the object name 'dbo.Lookuptable' and the index name 'PK_LookUpTable'. The duplicate key value is (Cover Designer).
Msg 1750, Level 16, State 0, Line 40
Could not create constraint. See previous errors.
The statement has been terminated.
March 12, 2013 at 8:17 pm
I see it. Change this:
RowNum = ROW_NUMBER() OVER (PARTITION BY Software_Name_Raw ORDER BY Software_Name_Raw),
to
RowNum = ROW_NUMBER() OVER (PARTITION BY Software_Name_Original ORDER BY Software_Name_Original),
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
March 12, 2013 at 9:21 pm
absolutey fantastic
thanks very much that works
Alan
Viewing 8 posts - 16 through 22 (of 22 total)
You must be logged in to reply to this topic. Login to reply