Code first vs rightsizing nvarchar(n)/nvarcar(max)

  • I'm working on a project with a co-worker and I was working on creating a database for some integration of data for his workflow.

    I was trying to figure out the max length that my varchar(n) would need to be and during that discussion he tells me well, that his database via Entity Framework by default just creates all strings as nvarchar(max). To which I thought I remembered seeing various articles about potential performance hits if the string goes over 8K bytes/4k Chars (if all of the unicode chars are below the values where they jump to 4bytes/value) i.e. stored off page values, and cpu overhead due to the extra 24bits needed etc.

    Is this still an issue? I have read many thing various places stating that why do you need for instance unlimited string lengths for a FirstName.

    He is under the impression that after if the database resides on a SQL 2012 or better that using nvarchar(max) isn't anything to worry about if it's storing less then 4k characters (not being stored off page).

    I'm just thinking that just the extra byte per character is storage issue when scaled up by not using varchar(n/max)

    Is that right?

    Thanks in advance.

  • You're right. Nvarchar shouldn't be used unless it's needed or realistically could be needed.

    For example, if you create a two-character code column, there's no reason to use nvarchar, since you control the values the code can have.

    Similarly, you should never use length "(max)" unless it's absolutely needed. Any max length has some additional overhead, because of memory allocations. Even worse, it could prevent the DBA from properly tuning the table. The DBA can force (max) columns out of page if he/she feels that is best for overall performance; I have done so many times. But if you arbitrarily make most/all columns nvarchar(max), that option will no longer realistically be available.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • yakko_Warner (10/26/2016)


    I'm working on a project with a co-worker and I was working on creating a database for some integration of data for his workflow.

    I was trying to figure out the max length that my varchar(n) would need to be and during that discussion he tells me well, that his database via Entity Framework by default just creates all strings as nvarchar(max). To which I thought I remembered seeing various articles about potential performance hits if the string goes over 8K bytes/4k Chars (if all of the unicode chars are below the values where they jump to 4bytes/value) i.e. stored off page values, and cpu overhead due to the extra 24bits needed etc.

    Is this still an issue? I have read many thing various places stating that why do you need for instance unlimited string lengths for a FirstName.

    He is under the impression that after if the database resides on a SQL 2012 or better that using nvarchar(max) isn't anything to worry about if it's storing less then 4k characters (not being stored off page).

    I'm just thinking that just the extra byte per character is storage issue when scaled up by not using varchar(n/max)

    Is that right?

    Thanks in advance.

    From my personal experience.

    On one reasonably big database (of a GPS provider) I've managed to cut query response time (on average) by 7 times simply by redefining all NVARCHAR(MAX) columns as NVARCHAR(500) (all actual strings in there were not longer than 250 characters).

    7 times.

    With no index tuning, without even seeing a single line of code.

    _____________
    Code for TallyGenerator

  • Sergiy (10/26/2016)


    yakko_Warner (10/26/2016)


    I'm working on a project with a co-worker and I was working on creating a database for some integration of data for his workflow.

    I was trying to figure out the max length that my varchar(n) would need to be and during that discussion he tells me well, that his database via Entity Framework by default just creates all strings as nvarchar(max). To which I thought I remembered seeing various articles about potential performance hits if the string goes over 8K bytes/4k Chars (if all of the unicode chars are below the values where they jump to 4bytes/value) i.e. stored off page values, and cpu overhead due to the extra 24bits needed etc.

    Is this still an issue? I have read many thing various places stating that why do you need for instance unlimited string lengths for a FirstName.

    He is under the impression that after if the database resides on a SQL 2012 or better that using nvarchar(max) isn't anything to worry about if it's storing less then 4k characters (not being stored off page).

    I'm just thinking that just the extra byte per character is storage issue when scaled up by not using varchar(n/max)

    Is that right?

    Thanks in advance.

    From my personal experience.

    On one reasonably big database (of a GPS provider) I've managed to cut query response time (on average) by 7 times simply by redefining all NVARCHAR(MAX) columns as NVARCHAR(500) (all actual strings in there were not longer than 250 characters).

    7 times.

    With no index tuning, without even seeing a single line of code.

    Best optimisation with least effort.

    That must have been a great day.

    Catch-all queries done right [/url]
    Gail Shaw's Performance Blog[/url]

  • How large are we talking? TerraBytes or larger?

    Also can you quantify the 7x faster?

    Thanks!

  • yakko_Warner (10/27/2016)


    How large are we talking? TerraBytes or larger?

    Also can you quantify the 7x faster?

    Thanks!

    I was not a DBA there, so did not look into DB sized.

    Performance wise - terabytes are not so important as number of rows.

    It was a database of one of global GPS providers, with all geographic locations around the globe.

    Regarding 7 times - queries which normally took 30 seconds started respond within 5 seconds, the ones which timed out on 1minute - between 5 and and 10.

    Again, query from query was different, 7 times was an average estimation from their lead DEV.

    _____________
    Code for TallyGenerator

  • MadAdmin (10/27/2016)


    Sergiy (10/26/2016)


    yakko_Warner (10/26/2016)


    I'm working on a project with a co-worker and I was working on creating a database for some integration of data for his workflow.

    I was trying to figure out the max length that my varchar(n) would need to be and during that discussion he tells me well, that his database via Entity Framework by default just creates all strings as nvarchar(max). To which I thought I remembered seeing various articles about potential performance hits if the string goes over 8K bytes/4k Chars (if all of the unicode chars are below the values where they jump to 4bytes/value) i.e. stored off page values, and cpu overhead due to the extra 24bits needed etc.

    Is this still an issue? I have read many thing various places stating that why do you need for instance unlimited string lengths for a FirstName.

    He is under the impression that after if the database resides on a SQL 2012 or better that using nvarchar(max) isn't anything to worry about if it's storing less then 4k characters (not being stored off page).

    I'm just thinking that just the extra byte per character is storage issue when scaled up by not using varchar(n/max)

    Is that right?

    Thanks in advance.

    From my personal experience.

    On one reasonably big database (of a GPS provider) I've managed to cut query response time (on average) by 7 times simply by redefining all NVARCHAR(MAX) columns as NVARCHAR(500) (all actual strings in there were not longer than 250 characters).

    7 times.

    With no index tuning, without even seeing a single line of code.

    Best optimisation with least effort.

    That must have been a great day.

    Did you use len() or datalength() to find the length of the strings.

  • yakko_Warner (10/27/2016)


    Did you use len() or datalength() to find the length of the strings.

    Both.

    For verification.

    It did not matter at the end.

    They trimmed all strings in .Net code before storing in database.

    With geographic locations, building or parsing address strings it's very important not to have trailing spaces on your way.

    _____________
    Code for TallyGenerator

  • Your cow-orker should learn about data annotations.

    In current version of Visual studio (2015), SQL Server 2014, this entity does indeed create an NVARCHAR(max) column:

    public class TestEntity

    {

    public int ItemID { get; set; }

    public string SomeStringValue { get; set; }

    }

    But this entity creates an nvarchar of only 100:

    public class TestEntity

    {

    public int ItemID { get; set; }

    [MaxLength(100)]

    public string SomeStringValue { get; set; }

    }

    If you have to do code first, use data annotations for pity's sake!

    (from System.ComponentModel.DataAnnotations)

  • This is the first real Development project that is going to be "Mission Critical" to the company. So this is a first time out for us on this.

    I'm not sure that I will be able to convince him to make the change in his database but. I will take it into account in mine. i.e. since only English varchar(n) string data-types.

    Thing is this one database currently will over time maybe only get 500ish rows. (which will grow larger as the company grows).

    I will keep this in my pocket to improve performance if I can't convince them to change it now.

    Also in regards to the changing the data-type from nvarchar(max) to nvarchar(500) or varchar(n) what sort of data loss can occur, and/or after changing the data-types do you then run de-fragmentation scripts or anything else on the table/indexes b/c of the change?

  • ... he tells me well, that his database via Entity Framework by default just creates all strings as nvarchar(max).

    You plenty of reasons to be concerned at this point with this guy's approach to data modeling. For example, why doesn't he just use VARBINARY(MAX) for LastName, so users have the additional flexibility supplying an image of their family crest or sigil? You should scold this guy for limiting his design. 😀

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • So my co-worker just told me that he was dumping his database and will just be using mine. ;-(

    But no longer nvarcar(max) 😉

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply