March 11, 2013 at 11:36 pm
I am creating a table by using an insert into from another table
then I am deleting any duplicate rows from that table
then I try to create a primary key on column1(SN_Original)
but I get an error saying
Msg 8111, Level 16, State 1, Line 27
Cannot define PRIMARY KEY constraint on nullable column in table 'lookuptable'
is there a way round this
here is my code
Drop table lookuptable
select dbo.bigtable.Software_Name_Original as SN_Original ,
dbo.bigtable.Software_Name_Raw as SN_New
into lookuptable
from dbo.BigTable
order by dbo.bigtable.Software_Name_Raw
--Delete duplicate rows from lookup table based on SN_NEW (software_name_new)
DELETE f
FROM (
select row_number() over (partition by SN_NEW order by SN_NEW) as rn from lookuptable
) AS f
WHERE rn > 1
ALTER TABLE lookuptable ADD CONSTRAINT SN_OriginalPK
PRIMARY KEY CLUSTERED (SN_Original);
March 12, 2013 at 12:11 am
When you use
Select Into
the columns created would take the data type ,nullability etc. properties from the columns available in FROM clause.
so make sure you have SOFTWARE_NAME_ORIGINAL as NOT NULL ... Since , primary key can't be created on Nullable columns ..
~ demonfox
___________________________________________________________________
Wondering what I would do next , when I am done with this one :ermm:
March 12, 2013 at 12:13 am
Or , even better , don't use select into ..
Just create a table first and then Insert values into it ..
I find it better than Select into .
~ demonfox
___________________________________________________________________
Wondering what I would do next , when I am done with this one :ermm:
March 12, 2013 at 3:30 am
demonfox (3/12/2013)
Or , even better , don't use select into ..Just create a table first and then Insert values into it ..
I find it better than Select into .
Could you elaborate on this, please?
SELECT INTO is "performance winner", why would creating table first be better?
If you use SELECT INTO you cannot create PK or INDEX in time of "SELECT INTO", but after it's execution you can create whatever you like on a new table created.
SELECT INTO will copy column definition from source (eg. IDENTITY if present) or from result of datatype casting:
SELECT CAST(NULL as INT) Col1, CAST(NULL AS Varchar(40)) ... INTO ...
will create two columns Col1 INT and Col2 varchar(40).
SELECT INTO is the best way of creating temp tables in stored procedures whenever they required.
March 12, 2013 at 7:02 am
Eugene Elutin (3/12/2013)SELECT INTO is "performance winner", why would creating table first be better?
Not always. You can achieve minimal logging with INSERT INTO...SELECT as well. Some people prefer to create the table first and there is no hard-and-fast rule that says you shouldn't go that route.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
March 12, 2013 at 7:27 am
opc.three (3/12/2013)
Eugene Elutin (3/12/2013)SELECT INTO is "performance winner", why would creating table first be better?
Not always. You can achieve minimal logging with INSERT INTO...SELECT as well. Some people prefer to create the table first and there is no hard-and-fast rule that says you shouldn't go that route.
There are few ways to achieve minimal logging for insertes with different prerequisites for each of the method. I wonder, is content of the "minimal logging" the same for all methods?
March 12, 2013 at 7:29 am
What do you mean by "content"?
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
March 12, 2013 at 7:42 am
Reminder: There's no such thing as "minimal logging" if the database is using the "Full" recovery model.
Reminder: Temporarily shifting to "Simple" recovery destroys the log chain and either a full backup or differential backup will need to be taken immediately afterward changing the recovery model back to "FULL" to establish a new log chain.
Reminder: Temporarily shifting to "Bulk Logged" recovery destroys the ability to do point-in-time restores for any log file that contains minimally logged operations.
Further, opc.three is correct. Although SELECT/INTO is a miracle of performance even in the "FULL" recovery model, it's not always better if you need to add a clustered index or other indexes. Sometimes, it is more efficient to create the table and then do an INSERT/SELECT. There is no rule of thumb here, though. To know which is faster, you must test. Like anything else in SQL Server, "It Depends".
--Jeff Moden
Change is inevitable... Change for the better is not.
March 12, 2013 at 7:44 am
demonfox (3/12/2013)
When you useSelect Into
the columns created would take the data type ,nullability etc. properties from the columns available in FROM clause.
so make sure you have SOFTWARE_NAME_ORIGINAL as NOT NULL ... Since , primary key can't be created on Nullable columns ..
Not quite. SELECT/INTO will copy the IDENTITY property, the Data Type, and the data. It will NOT copy nullability. That's why the OP is currently having the problem he cited. I'll post the fix for this in a minute or two.
[EDIT] I stand corrected. Not sure why I thought it wouldn't work in 2K5 but it does. Looking into why I made such a mistake.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 12, 2013 at 8:29 am
Jeff Moden (3/12/2013)
demonfox (3/12/2013)
When you useSelect Into
the columns created would take the data type ,nullability etc. properties from the columns available in FROM clause.
so make sure you have SOFTWARE_NAME_ORIGINAL as NOT NULL ... Since , primary key can't be created on Nullable columns ..
Not quite. SELECT/INTO will copy the IDENTITY property, the Data Type, and the data. It will NOT copy nullability. That's why the OP is currently having the problem he cited. I'll post the fix for this in a minute or two.
What do you mean by "will NOT copy nullability"?
As I'm aware, it does copy it from source column:
CREATE TABLE Table1 (Col1 INT NULL, Col2 INT NOT NULL)
SELECT * INTO Table2 FROM Table1
exec sp_help Table2
I agree that there is no rule of thum for using SELECT ... INTO over INSERT ... SELECT, however I always start with SELECT ... INTO before considering the second one, in most of cases, I have experience with, it does win...
March 12, 2013 at 8:40 am
alan_lynch (3/11/2013)
I am creating a table by using an insert into from another tablethen I am deleting any duplicate rows from that table
then I try to create a primary key on column1(SN_Original)
but I get an error saying
Msg 8111, Level 16, State 1, Line 27
Cannot define PRIMARY KEY constraint on nullable column in table 'lookuptable'
is there a way round this
here is my code
Drop table lookuptable
select dbo.bigtable.Software_Name_Original as SN_Original ,
dbo.bigtable.Software_Name_Raw as SN_New
into lookuptable
from dbo.BigTable
order by dbo.bigtable.Software_Name_Raw
--Delete duplicate rows from lookup table based on SN_NEW (software_name_new)
DELETE f
FROM (
select row_number() over (partition by SN_NEW order by SN_NEW) as rn from lookuptable
) AS f
WHERE rn > 1
ALTER TABLE lookuptable ADD CONSTRAINT SN_OriginalPK
PRIMARY KEY CLUSTERED (SN_Original);
Before I get into the simple code to do as you ask, I'd like to suggest a couple of things.
1. Always use a 2 part naming convention. there's a slight performance advantage to it (especially for frequently executed GUI related procs) and it makes your code more bullet proof if someone ever creates another schema and creates a table with the same object name in that schema. That also means that you should correctly use table aliases for queries that address more than 1 table. It also means you need to stop using the 3 part naming convention especially since it's been deprecated for the last couple of revisions of SQL Server.
2. Format your code for readability especially when it comes to indenting. You might actually be the next person that has to work on it. 😉
3. Start using semi-colons where they belong for every statement. Leaving them out has also been deprecated for a couple of revisions of SQL Server.
4. Break out Books Online (press the {f1} key in SSMS to get there) and read about CTEs. They're basically the same as the sub-query you wrote but reveal themselves in a more natural "top down" fashion.
5. Only select what you need. DELETEs are expensive because you first had to load the data and then delete it. On top of that, the DELETEs get logged even if you're in the SIMPLE recovery mode because it's not ever minimally logged.
With all of that in mind, here's the (untested) code that I'd use to replace your original code. I say "untested" only because I don't have your original table handy and you didn't provide any readily consumable data to test against. See the first link in my signature line below for future posts.
Here's the code...
--===== If the lookup table already exists, drop it.
IF OBJECT_ID('dbo.LookupTable','U') IS NOT NULL
DROP TABLE dbo.LookupTable
;
--===== Create and populate the new lookup table on-the-fly.
-- Since you're not creating an IDENTITY column here,
-- and ORDER BY is a useless and only makes the code
-- take longer an use more resources.
-- Also, stop using the 3 part naming convention in the
-- SELECT list... it's been deprecated.
-- Also notice that there's no need to load a bunch of
-- data and then delete it. That's another waste of
-- clock cycles.
WITH
cteDupeCheck AS
(
SELECT RowNum = ROW_NUMBER() OVER (PARTITION BY Software_Name_Raw
ORDER BY Software_Name_Raw)
, SN_Original = Software_Name_Original
, SN_New = Software_Name_Raw
FROM dbo.BigTable
)
SELECT SN_Original = ISNULL(SN_Original,0) --ISNULL makes the column NOT NULL
, SN_New
INTO dbo.Lookuptable
FROM cteDupeCheck
WHERE RowNum = 1
;
--===== Since we now have a NOT NULL column,
-- add the desired unique clustered index.
ALTER TABLE dbo.LookUpTable
ADD CONSTRAINT PK_LookUpTable
PRIMARY KEY CLUSTERED (SN_Original)
;
Let us know how it works out for you. Feedback is our only reward. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
March 12, 2013 at 8:50 am
Eugene Elutin (3/12/2013)I agree that there is no rule of thum for using SELECT ... INTO over INSERT ... SELECT, however I always start with SELECT ... INTO before considering the second one, in most of cases, I have experience with, it does win...
I start with SELECT...INTO as well (now anyway, thanks to Jeff explaining the situation a year or so back) but with the need for a PK and possibly other indexes, we should not dismiss INSERT INTO...SELECT straightaway.
I think you and Jeff were typing at the same time and his post should clear the air on what was meant by 'not copying NULLability', a simple matter of perspective.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
March 12, 2013 at 8:56 am
Eugene Elutin (3/12/2013)
Jeff Moden (3/12/2013)
demonfox (3/12/2013)
When you useSelect Into
the columns created would take the data type ,nullability etc. properties from the columns available in FROM clause.
so make sure you have SOFTWARE_NAME_ORIGINAL as NOT NULL ... Since , primary key can't be created on Nullable columns ..
Not quite. SELECT/INTO will copy the IDENTITY property, the Data Type, and the data. It will NOT copy nullability. That's why the OP is currently having the problem he cited. I'll post the fix for this in a minute or two.
What do you mean by "will NOT copy nullability"?
As I'm aware, it does copy it from source column:
CREATE TABLE Table1 (Col1 INT NULL, Col2 INT NOT NULL)
SELECT * INTO Table2 FROM Table1
exec sp_help Table2
I agree that there is no rule of thum for using SELECT ... INTO over INSERT ... SELECT, however I always start with SELECT ... INTO before considering the second one, in most of cases, I have experience with, it does win...
First, I absolutely agree. My experience has been that SELECT/INTO will usually win especially when using it to create a Temp Table (minimal logging there) and especially when no indexes are involved. In most cases, no indexes will be needed because, hopefully, you've only captured the data you really need and table scans will be as fast as seeks followed by a range scan (like what occurred in my "Hierchies on Steroids" articles... the heap was actually faster than a clustered table). I have, however, done some experiments with opc.three and there are times when the "create then fill" method does work faster than SELECt/INTO followed by an index creation. Like I said, "It Depends".
On the second part, it would appear they fixed the copy of nullability problem in 2008 and above. Sorry for not being aware of that. If you run your code in 2005, here's the result you get. {EDIT} My apologies. I don't know why I thought SELECT/INTO wouldn't copy nullability. There was a reason but I'll be darned if I can remember what it was.
Column_name Type Computed Length Prec Scale Nullable TrimTrailingBlanks FixedLenNullInSource Collation
----------- ---- -------- ------ ----- ----- -------- ------------------ -------------------- ---------
Col1 int no 4 10 0 yes (n/a) (n/a) NULL
Col2 int no 4 10 0 no (n/a) (n/a) NULL
Still, I'll continue to use the ISNULL method to make the code bullet proof. It costs almost nothing in terms of performance to do so.
As a sidebar, the ISNULL method will also strip the IDENTITY property off a column if you need to do such a thing.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 12, 2013 at 9:10 am
Jeff Moden (3/12/2013)
.... If you run your code in 2005, here's the result you get.
Column_name Type Computed Length Prec Scale Nullable TrimTrailingBlanks FixedLenNullInSource Collation
----------- ---- -------- ------ ----- ----- -------- ------------------ -------------------- ---------
Col1 int no 4 10 0 yes (n/a) (n/a) NULL
Col2 int no 4 10 0 no (n/a) (n/a) NULL
...
So, it works in SQL2005 too 😉
I agree on use of ISNULL to make column NOT NULL and strip away its IDENTITY if such thing is desired outcome.
On creating indexes post table creation - agree as well, it's all depends.
March 12, 2013 at 9:31 am
Eugene Elutin (3/12/2013)
Jeff Moden (3/12/2013)
.... If you run your code in 2005, here's the result you get.
Column_name Type Computed Length Prec Scale Nullable TrimTrailingBlanks FixedLenNullInSource Collation
----------- ---- -------- ------ ----- ----- -------- ------------------ -------------------- ---------
Col1 int no 4 10 0 yes (n/a) (n/a) NULL
Col2 int no 4 10 0 no (n/a) (n/a) NULL
...
So, it works in SQL2005 too 😉
I agree on use of ISNULL to make column NOT NULL and strip away its IDENTITY if such thing is desired outcome.
On creating indexes post table creation - agree as well, it's all depends.
Dammit. What's going on here? First, appologies for not drinking enough coffee before posting. :blush: These old eyes didn't see the absence of NOT on the first column. I'm not sure why I thought it wouldn't copy nullability and now I've got some digging to do.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 22 total)
You must be logged in to reply to this topic. Login to reply