April 11, 2010 at 3:03 pm
I'm practising SQL Exercises from WikiBooks and am getting an error with one of the solutions they have provided to their practise exercises, and can't figure out why I'm getting that error. The code for creating the database and populating it's two tables is pasted below, relatively simple.
The exercises are to run code to i) select the average price of each manufacturer's products, showing the manufacturer's name and ii) select the average price of each manufacturer's products, showing the manufacturer's name. Both of these are separate exercises.
When I run the solution provided by wikibooks, I get the following error:
"The text, ntext, and image data types cannot be compared or sorted, except when using IS NULL or LIKE operator"
Code to Create DB and populate tables:
CREATE TABLE Manufacturers (
Code INTEGER PRIMARY KEY NOT NULL,
Name TEXT NOT NULL
);
CREATE TABLE Products (
Code INTEGER PRIMARY KEY NOT NULL,
Name TEXT NOT NULL ,
Price REAL NOT NULL ,
Manufacturer INTEGER NOT NULL
CONSTRAINT fk_Manufacturers_Code REFERENCES MANUFACTURERS(Code)
);
Scripts I'm trying to run :
Select the average price of each manufacturer's products, showing the manufacturer's name:
SELECT AVG(Price), Manufacturers.Name
FROM Products INNER JOIN Manufacturers
ON Products.Manufacturer = Manufacturers.Code
GROUP BY Manufacturers.Name;
Select the names of manufacturer whose products have an average price larger than or equal to $150:
SELECT AVG(Price), Manufacturers.Name
FROM Products INNER JOIN Manufacturers
ON Products.Manufacturer = Manufacturers.Code
GROUP BY Manufacturers.Name
HAVING AVG(Price) >= 150
Any ideas?
April 11, 2010 at 3:12 pm
both your tables have a column named "NAME" of the datatype TEXT:
..Name TEXT NOT NULL ,...
those data types are stored in a different way; a pointer is stored and the actual data is in a separate spot on the disk;
because of that storage method, that imposes a limitation on things you can do...in your example, you cannot GROUP BY that column.
the best thing to do is to change it to a VARCHAR(MAX) or a decent sized field like VARCHAR(200); you are not going to have a NAME that is paragraphs in size, so make it a practical size so you can actually test the code you are practicing with.
Lowell
April 11, 2010 at 3:28 pm
Thanks a lot Lowell, I changed it VARCHAR (50) since this is just for testing purposes, and both scripts work now. Something new learnt.
However, I didn't quite understand what you meant by "those data types are stored in a different way; a pointer is stored and the actual data is in a separate spot on the disk; because of that storage method, that imposes a limitation on things you can do...in your example, you cannot GROUP BY that column." Would you mind elaborating?
Another related question: When we select data type of VARCHAR (200), does it allot 200 characters worth of space to that field, or only the characters entered? So if I type "Lowell" in this field, will it still occupy 200 characters on the physical hard disk, or just 6?
April 11, 2010 at 4:25 pm
Like2SQL (4/11/2010)
Thanks a lot Lowell, I changed it VARCHAR (50) since this is just for testing purposes, and both scripts work now. Something new learnt.However, I didn't quite understand what you meant by "those data types are stored in a different way; a pointer is stored and the actual data is in a separate spot on the disk; because of that storage method, that imposes a limitation on things you can do...in your example, you cannot GROUP BY that column." Would you mind elaborating?
The text and ntext datatypes were created to handle very long character strings(more than about 8000 characters for text, more than about 4000 characters for ntext). These were strings which could not be held in a normal database page because they were too big. So space is allocated for them somewhere else in the disc area and the row "containing" the data really contains a pointer to the data instead of containing the data itself. There are a lot of restrictions on what you can do with text and ntext data, as opposed to char/varchar and nchar/nvarchar data which really is contained in the row, and of course text and ntext variables can't be copied to local variables. This is not a very satisfactory state of affairs, and in the latest version of SQL server there are new datatypes varchar(max) and nvarchar(max) for very large character strings which don't have all these restrictions, and the text and ntext datatypes will be phased out in a new release sometime soon.
If you have things like a name and use text datatype, the disc space allocated will be a 16 byte pointer in the row and a multiple of some quite large number of bytes where the pointer points to. A short variable length string datatype like varchar(200) or varchar(128) is probably big enough to hold any name, and will use far less disc space, so varchar or nvarchar is the way to go for something like a name. If you need names over 8000 characters long you will need to use varchar(max) and set the configuration option that allows the use of out of row storage for this type.
You can find some quite useful documentation at http://msdn.microsoft.com/en-gb/library/ms187993.aspx and at http://msdn.microsoft.com/en-gb/library/ms178158.aspx.
Another related question: When we select data type of VARCHAR (200), does it allot 200 characters worth of space to that field, or only the characters entered? So if I type "Lowell" in this field, will it still occupy 200 characters on the physical hard disk, or just 6?
If you type N characters in a varchar field it will occupy N+2 characters. So a varchar(200) field with "Lowell" in it takes up 8 characters-worth of space on the disc, not 200 characters.
Tom
April 11, 2010 at 4:25 pm
some of this may be beyond my ability to explain it, but here goes:
a TEXT or IMAGE datatype fields can store something like 2 gig of characters; it's variable length, so if you just put something small in it, say my name, it's going to occupy 6 chars, plus two bytes for a"pointer" to where the data is actually stored on the disk. it's stored differently, with some caveats, but it is at least always treated differently than other datatypes.
other data types are stored in "pages" of data that SQL server keeps organized and has immediate access to. a page is a bit over 8K bytes. so a 100 meg database is chopped up into a bunch of 8K pages, and organized for accessibility.
for performance reasons, SQL would not keep a potentially huge document in pages so it can be grabbed easily;that would waste memory for something that probably is not accessed very often.
instead in keeps a pointer so it knows where the data is when you need it; SQL can index, and therefore can "group by" and do lots of other stuff, only to the data that exists in the pages of memory, and not the data outside of the pages; that would me IMAGE datatypes as well, which might be binary files of executables, documents etc.
behind the scenes, technically a TEXT field is stored just like a VARCHAR(8000) (the max size) until it gets bigger thank *K, then it gets moved off to separate memory; but because one row in the table might be less than 8K and the other might be bigger, the syntax for the field has to use the rules assuming something larger.
you also want to compare VARCHAR and CHAR variables; a varchar(200) with my name would take 6 characters, plus a CHAR(0) ending character to mark the end of the data;it could potentially be bigger, especially if UPDATEd, but for the moment it's pretty much the actual size of the data.
a CHAR(200) data would be padded with spaces all the way out to the full defined size of 200 characters.
Like2SQL (4/11/2010)
Thanks a lot Lowell, I changed it VARCHAR (50) since this is just for testing purposes, and both scripts work now. Something new learnt.However, I didn't quite understand what you meant by "those data types are stored in a different way; a pointer is stored and the actual data is in a separate spot on the disk; because of that storage method, that imposes a limitation on things you can do...in your example, you cannot GROUP BY that column." Would you mind elaborating?
Another related question: When we select data type of VARCHAR (200), does it allot 200 characters worth of space to that field, or only the characters entered? So if I type "Lowell" in this field, will it still occupy 200 characters on the physical hard disk, or just 6?
Lowell
April 11, 2010 at 5:24 pm
Lowell (4/11/2010)
behind the scenes, technically a TEXT field is stored just like a VARCHAR(8000) (the max size) until it gets bigger thank *K, then it gets moved off to separate memory; but because one row in the table might be less than 8K and the other might be bigger, the syntax for the field has to use the rules assuming something larger.
Yes, it is. Like2SQL should ignore the bit of my post that says it always uses the pointer, even when the text field is small. Thanks, Lowell, you've taught me something today.
Tom
April 19, 2010 at 12:41 am
The older LOB data types (text, ntext, and image) are always stored off-row, by default, though this behaviour can be modified by the 'text in row' table option. The exact storage layout depends on the size of the LOB data.
Data less than 64 bytes in length is contained in an 84-byte LOB B-tree root structure pointed to by the 16-byte in-row structure. (This root structure is stored off row).
Data less than 32KB in length is stored in a B-tree structure managed by the 84-byte root structure. The actual data might be stored on many LOB pages, in chunks determined by the size of the writes performed by the application. The data is stored wherever SQL Server finds room - with small chunks being combined to help fill pages. These pages might store data from more than one row, and are known as TEXT_MIXED pages.
Once the data exceeds about 40KB, SQL Server starts allocating dedicated pages - known as TEXT_DATA allocations - that are never shared across rows.
If the 'text in row' is set for a particular table, SQL Server attempts to store LOB data up to the specified size (24 to 7000 bytes or 256 bytes by default) in the row. In any case, the 16-byte structure is replaced in the row by the LOB B-tree root (minimum size 24 bytes - hence the minimum value for 'text in row').
Data might not always be stored in-row if doing so would exceed the usual row-length limits. SQL Server gives priority to expanding variable length columns (like VARCHAR) over LOB data, so an update to a VARCHAR column would make the row size exceed the maximum, the LOB will be pushed off-row.
As should be apparent, LOB data storage can get very complex.
MAX data types are, by default, stored in-row up to 8000 bytes (this can be overridden by the 'large value types out of row' table option).
The Storage Engine treats MAX data types below 8000 bytes exactly as if the column were VARCHAR(8000), NVARCHAR(4000), or VARBINARY(8000). This means, for example, that MAX data might be stored in ROW_OVERFLOW allocation units. Once the size exceeds 8000 bytes, MAX data is stored exactly as for the older LOB types.
It is important not to confuse storage with other concerns however: even though the Storage Engine treats MAX data types in a special way, other components in SQL Server have to treat them differently. One of the reasons MAX-type variables perform less well is because they use tempdb storage. Another long story there...
April 19, 2010 at 6:35 pm
Paul White NZ (4/19/2010)
lomg description of lob and other large thing storage types
Paul, that's wonderful. Please keep producinjg comments like that one.
Tom
April 20, 2010 at 6:47 am
Like2SQL - you may find that the exercises you are working with are designed for something like MS Access, where, I believe, the TEXT datatype is something different to SQL Server's. In fact, I think the Access TEXT type is more or less a varchar(255).
If that's the case, you may need to perform some mental translation whenever you come across that term.
Cheers
John
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply