How to Improve Database Design to Speed Up SQL Queries

  • Jonathan AC Roberts wrote:

    MMartin1 wrote:

    Muskan wrote:

    Improving database design can have a significant impact on the speed of SQL queries. Here are some tips to optimize your database design and enhance query performance:

    • ....
    • ...
    • ..
    • Denormalization: While normalization is generally recommended, there are cases where denormalization can improve query performance. Denormalization involves adding redundant data or pre-calculating values to avoid expensive joins or calculations. This can be beneficial for read-heavy applications or when dealing with complex queries.
    • ...

    ...

    ...

    .

    This section makes no sense put together as it is.

    LOL, denormalization... It probably meant NOT recommended.

    It can speed up reads? As if this were the exception.

    This looks AI-generated, and off target with meaning.

    Actually I think denormalization is often used to speed up queries when designing a database or a data warehouse, as it removes a table and a join to that table in a query.

    Yes indeed, that is what I meant. Just that the chat bot was confusing matters.

    "While normalization is generally recommended, there are cases where denormalization can improve query performance"

    With that sentence you would expect to see a case where it is not beneficial after the comma.
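
    For illustration, here is a minimal T-SQL sketch of that kind of denormalization, using hypothetical Orders/Customers tables: copying CustomerName onto Orders removes the join on reads, at the cost of keeping the redundant copy in sync.

    -- Normalized: every read of an order's customer name pays for a join.
    SELECT o.OrderID, c.CustomerName
    FROM dbo.Orders AS o
    JOIN dbo.Customers AS c
      ON c.CustomerID = o.CustomerID;

    -- Denormalized: carry a redundant copy of the name on the child table.
    ALTER TABLE dbo.Orders ADD CustomerName nvarchar(100) NULL;

    UPDATE o
    SET    o.CustomerName = c.CustomerName
    FROM   dbo.Orders AS o
    JOIN   dbo.Customers AS c
      ON   c.CustomerID = o.CustomerID;

    -- The read no longer touches Customers at all.
    SELECT OrderID, CustomerName
    FROM dbo.Orders;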


    ----------------------------------------------------

  • Jonathan AC Roberts wrote:

    MMartin1 wrote:

    Muskan wrote:

    Improving database design can have a significant impact on the speed of SQL queries. Here are some tips to optimize your database design and enhance query performance:

    • ....
    • ...
    • ..
    • Denormalization: While normalization is generally recommended, there are cases where denormalization can improve query performance. Denormalization involves adding redundant data or pre-calculating values to avoid expensive joins or calculations. This can be beneficial for read-heavy applications or when dealing with complex queries.
    • ...

    ...

    ...

    .

    This section makes no sense put together as it is.

    LOL, denormalization... It probably meant NOT recommended.

    It can speed up reads? As if this were the exception.

    This looks AI-generated, and off target with meaning.

    Actually I think denormalization is often used to speed up queries when designing a database or a data warehouse, as it removes a table and a join to that table in a query.

    I totally agree

    I did express that denormalising speeds up reads in my initial post in this thread.

    "A database design geared towards faster reads is "denormalised." "

     


    ----------------------------------------------------

  • MMartin1 wrote:

    You have to have them fairly orderly because a sufficiently fragmented index is of no use.

    I have to be a bit contrary there...

    The fragmentation of an index means nothing for performance when looking up single rows in an OLTP system.  In such cases, there's zero difference in performance whether the index has 0% logical fragmentation or 99.9999% fragmentation.

    Logical fragmentation also doesn't mean as much as people would have us believe, even on large multi-row queries.  If you notice, in virtually every article, 'tube, and blog, they all say that it CAN matter, but no one ever steps up to prove it with demonstrable code.

    I've also seen it where removing fragmentation actually causes queries to slow down quite a bit because the index suddenly occupies fewer pages and SQL Server suddenly decides that it no longer needs to go parallel for a given task.

    Someday, hopefully soon, I'll be able to document a lot of this in the form of articles along with the normal inclusion of demonstrable code.
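
    In the meantime, anyone who wants to check their own numbers can look at logical fragmentation with the standard DMV. A minimal sketch, assuming a hypothetical dbo.Orders table:

    -- Logical fragmentation and page count for each index on one table.
    SELECT i.name AS index_name,
           ips.avg_fragmentation_in_percent,
           ips.page_count
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID(N'dbo.Orders'), NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
      ON  i.object_id = ips.object_id
      AND i.index_id  = ips.index_id;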

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I have to disagree regarding "natural" vs. "surrogate" primary keys. In almost all cases, surrogate keys are the better choice, for four reasons:

    1. Natural keys are often too large, requiring too much index space.
    2. (Actually, 1a): This gets worse if you have to have composite keys. Avoid composite keys, except for relation tables (n:m tables).
    3. Natural keys are often not really unique, contrary to your belief, e.g. because of errors by the issuing authority.
    4. Natural keys can change, e.g. because you mistyped them when first entering them. This is not a show stopper, but every update trickles down to all child tables.

    Use auto increment columns, or, if you need to create the key in memory, go for GUIDs.

  • Surrogate keys are only better in terms of being easier to join.  But if the source data has a natural key, there's almost always value in enforcing it, even if that's in addition to a surrogate key.  Just sticking an auto-assigned surrogate key on your records and ignoring the natural keys is a great way to end up with duplicate records.
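
    A minimal sketch of that belt-and-braces approach, with a hypothetical dbo.Product table: the IDENTITY surrogate keeps joins small and stable, while a UNIQUE constraint still enforces the natural key and rejects duplicates.

    CREATE TABLE dbo.Product
    (
        ProductID   int IDENTITY(1,1) NOT NULL, -- surrogate key for joins
        SKU         varchar(20)       NOT NULL, -- natural key from the source data
        ProductName nvarchar(100)     NOT NULL,
        CONSTRAINT PK_Product PRIMARY KEY (ProductID),
        CONSTRAINT UQ_Product_SKU UNIQUE (SKU)  -- duplicates still rejected
    );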

  • Jonathan AC Roberts wrote:

    MMartin1 wrote:

    Muskan wrote:

    Improving database design can have a significant impact on the speed of SQL queries. Here are some tips to optimize your database design and enhance query performance:

    • ....
    • ...
    • ..
    • Denormalization: While normalization is generally recommended, there are cases where denormalization can improve query performance. Denormalization involves adding redundant data or pre-calculating values to avoid expensive joins or calculations. This can be beneficial for read-heavy applications or when dealing with complex queries.
    • ...

    ...

    ...

    .

    This section makes no sense put together as it is.

    LOL, denormalization... It probably meant NOT recommended.

    It can speed up reads? As if this were the exception.

    This looks AI-generated, and off target with meaning.

    Actually I think denormalization is often used to speed up queries when designing a database or a data warehouse, as it removes a table and a join to that table in a query.

    Ok... just don't do it in the database.  That's what Data Warehouses are for. 😀

    Personally, I've never understood the denormalization thing.  Have a properly designed, normalized database and have the people using it learn how to write better queries.

    A great example is what I just fixed at work today.  It was code that I found that had apparently caused a really nasty customer-facing issue on Friday.  A single query was taking an average of 19 seconds to run, consumed more than 60,000 data pages of I/O every run, and was necessarily running more than 45,000 times in less than 12 hours.  It was consuming between 10 and 14 terabytes (a terabyte being a million million bytes, or 1 followed by 12 zeros) of I/O every day.

    I got it down to running in less than 1 MILLI-second, using only 33 data pages per run, and I didn't denormalize a bloody thing.  In fact, the problem was caused by a denormalized table.
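
    For anyone wanting to see the same "data pages per run" numbers on their own code, a minimal sketch (the query itself is hypothetical):

    SET STATISTICS IO ON;   -- reports logical reads (8 KB data pages) per table
    SET STATISTICS TIME ON; -- reports CPU and elapsed time

    SELECT o.OrderID, o.OrderDate
    FROM dbo.Orders AS o
    WHERE o.CustomerID = 42;

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;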

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jonathan AC Roberts wrote:

    Muskan wrote:

    Improving database design can have a significant impact on the speed of SQL queries. Here are some tips to optimize your database design and enhance query performance:

    • Indexing: Proper indexing can greatly improve query performance. Identify the columns frequently used in WHERE, JOIN, and ORDER BY clauses, and create indexes on those columns. Be cautious not to over-index, as it can slow down data modification operations.
    • Normalize the schema: Normalize your database schema to eliminate data redundancy and improve data integrity. This involves breaking down data into logical tables and reducing data duplication. Normalization helps minimize the storage required and can speed up query execution.
    • Denormalization: While normalization is generally recommended, there are cases where denormalization can improve query performance. Denormalization involves adding redundant data or pre-calculating values to avoid expensive joins or calculations. This can be beneficial for read-heavy applications or when dealing with complex queries.
    • Partitioning: Partitioning divides large tables into smaller, more manageable parts based on a specific criterion (e.g., date range or key range). Partitioning can enhance query performance by reducing the amount of data that needs to be scanned.
    • Query optimization: Review and optimize your SQL queries to ensure they are structured efficiently. Use appropriate JOIN types, filter rows early in the query using WHERE clauses, and avoid unnecessary calculations or function calls in the SELECT clause. Analyze query execution plans to identify areas for improvement.
    • Database indexing statistics: Keep your database's indexing statistics up to date. These statistics help the query optimizer make better decisions when creating execution plans. Regularly update statistics to ensure accurate query optimization.
    • Avoid unnecessary data retrieval: Retrieve only the necessary columns and rows. Minimize the amount of data transferred between the database and the application. Use SELECT statements with specific column names instead of selecting all columns with SELECT *.
    • Optimize data types and table structure: Use appropriate data types that match the nature of the data. Avoid using large data types when smaller ones would suffice. Additionally, ensure that your table structure is well-designed, avoiding excessive column sizes and null values where possible.
    • Proper hardware and configuration: Ensure that your database server is properly configured and has sufficient hardware resources such as CPU, memory, and storage to handle the workload. Proper server configuration can significantly improve query performance.
    • Regular maintenance: Perform regular database maintenance tasks, such as index rebuilding, statistics updating, and data purging/archiving. This helps keep the database in optimal condition and avoids performance degradation over time.

    Remember, the effectiveness of these optimizations may vary depending on your specific database system, workload, and data characteristics. It's essential to benchmark and test the impact of any changes before applying them to a production environment.

    ChatGPT?

    I'm thinking more along the lines of "CrapGPT". 😀

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
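
    To make the indexing, SELECT-list, and statistics tips quoted above concrete, a minimal T-SQL sketch against a hypothetical dbo.Orders table:

    -- Index the columns the workload filters and sorts on.
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate
        ON dbo.Orders (CustomerID, OrderDate)
        INCLUDE (TotalDue); -- covers the query below without over-indexing

    -- Retrieve only the needed columns instead of SELECT *.
    SELECT OrderDate, TotalDue
    FROM dbo.Orders
    WHERE CustomerID = 42
    ORDER BY OrderDate;

    -- Keep optimizer statistics current.
    UPDATE STATISTICS dbo.Orders WITH FULLSCAN;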

  • Jeff Moden wrote:

    Jonathan AC Roberts wrote:

    MMartin1 wrote:

    Muskan wrote:

    Improving database design can have a significant impact on the speed of SQL queries. Here are some tips to optimize your database design and enhance query performance:

    • ....
    • ...
    • ..
    • Denormalization: While normalization is generally recommended, there are cases where denormalization can improve query performance. Denormalization involves adding redundant data or pre-calculating values to avoid expensive joins or calculations. This can be beneficial for read-heavy applications or when dealing with complex queries.
    • ...

    ...

    ...

    .

    This section makes no sense put together as it is.

    LOL, denormalization... It probably meant NOT recommended.

    It can speed up reads? As if this were the exception.

    This looks AI-generated, and off target with meaning.

    Actually I think denormalization is often used to speed up queries when designing a database or a data warehouse, as it removes a table and a join to that table in a query.

    Ok... just don't do it in the database.  That's what Data Warehouses are for. 😀

    Personally, I've never understood the denormalization thing.  Have a properly designed, normalized database and have the people using it learn how to write better queries.

    A great example is what I just fixed at work today.  It was code that I found that had apparently caused a really nasty customer-facing issue on Friday.  A single query was taking an average of 19 seconds to run, consumed more than 60,000 data pages of I/O every run, and was necessarily running more than 45,000 times in less than 12 hours.  It was consuming between 10 and 14 terabytes (a terabyte being a million million bytes, or 1 followed by 12 zeros) of I/O every day.

    I got it down to running in less than 1 MILLI-second, using only 33 data pages per run, and I didn't denormalize a bloody thing.  In fact, the problem was caused by a denormalized table.

    Jeff, this is a very interesting situation you fixed. Denormalised data has fewer joins and is faster for reads in general. Now I'm curious: what exactly was the issue?

     


    ----------------------------------------------------

  • I must commend the author for this one:

    "In some situations, there is no column guaranteed to be unique for each entry, and developers might be tempted to rely on surrogate keys. It is better practice, however, to generate a surrogate key based on a composite of multiple columns that can uniquely identify each entry in the database."

    Nothing wrong with using identity-type surrogates, but only AFTER actual keys are identified. It is true that JOINs on integers are faster than JOINs on composite keys (at least it is faster to type them).

    We are not designing databases for speed, we are doing it to ensure data integrity. What is the use of data retrieved quickly if it is inaccurate or could be inaccurate? Determining natural keys is a must.

    Another must is NOT NULL. NULLs creep in when we do not have a complete set of values to insert into a record. Think about that at the conceptual design stage. If something is not guaranteed to exist at INSERT time, move it to another table.

    Vulgar denormalization is not desirable. However, there are ways of increasing data consistency by adding some CONTROLLED redundancy. For example, students at a school are allowed to take lessons in more than one language (I am from Canada, eh). For them we must have teachers and books and everything else in the language of choice. We can start with:

    Students (StudentID PK, Language) - OK

    Teachers (TeacherID PK, Language) - OK

    StudentTeachers (TeacherID FK to Teachers, StudentID FK to Students, PK (TeacherID, StudentID)) - BAD IDEA, even if fully normalized. Why?

    There is no guarantee that an English-speaking student will not be paired with a French-speaking teacher. I've seen this happen in practice; again, I am from Canada. A better solution is:

    Students (StudentID PK, Language, SuperKey (StudentID, Language)) - OK

    Teachers (TeacherID PK, Language, SuperKey (TeacherID, Language)) - OK

    StudentTeachers (TeacherID, StudentID, Language), PK (TeacherID, StudentID),

    FK1 (TeacherID, Language) to Teachers (TeacherID, Language),

    FK2 (StudentID, Language) to Students (StudentID, Language)

    Now we cannot mix English students with French teachers, at least at work.

    🙂
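
    A minimal T-SQL sketch of that controlled-redundancy design (column types are hypothetical); the foreign keys reference the composite superkeys, so a row in StudentTeachers can only pair a student and teacher who share the same language:

    CREATE TABLE dbo.Students
    (
        StudentID int         NOT NULL PRIMARY KEY,
        Language  varchar(30) NOT NULL,
        CONSTRAINT UQ_Students_Language UNIQUE (StudentID, Language) -- superkey
    );

    CREATE TABLE dbo.Teachers
    (
        TeacherID int         NOT NULL PRIMARY KEY,
        Language  varchar(30) NOT NULL,
        CONSTRAINT UQ_Teachers_Language UNIQUE (TeacherID, Language) -- superkey
    );

    CREATE TABLE dbo.StudentTeachers
    (
        TeacherID int         NOT NULL,
        StudentID int         NOT NULL,
        Language  varchar(30) NOT NULL,
        CONSTRAINT PK_StudentTeachers PRIMARY KEY (TeacherID, StudentID),
        CONSTRAINT FK_ST_Teachers FOREIGN KEY (TeacherID, Language)
            REFERENCES dbo.Teachers (TeacherID, Language),
        CONSTRAINT FK_ST_Students FOREIGN KEY (StudentID, Language)
            REFERENCES dbo.Students (StudentID, Language)
    );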

    Zidar's Theorem: The best code is no code at all...

  • Zidar wrote:

    We are not designing databases for speed, we are doing it to ensure data integrity. What is the use of data retrieved quickly if it is inaccurate or could be inaccurate? Determining natural keys is a must.

    While I agree that designing for absolute data integrity is the #1 truth with no justifiable deviation, the #2 truth is that of performance and, although secondary in nature, it's a very, very close second.  If you can't get at the truthful data when needed and in a timely fashion, you have a serious problem.

    Just to preach a bit, for both "truths" in design, I've found that "Good enough usually isn't".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I agree with Jeff 100%.  Performance is paramount, assuming the foundation is sound. Both fast and safe at the same time is the ultimate goal, but it is a luxury not too many of us have enough skills to achieve.

    Cheers

     

    Zidar's Theorem: The best code is no code at all...

  • "

    Students (StudentID PK, Language, SuperKey (StudentID, Language)) - OK

    Teachers (TeacherID PK, Language, SuperKey (TeacherID, Language)) - OK

    StudentTeachers (TeacherID, StudentID, Language), PK (TeacherID, StudentID),

    FK1 (TeacherID, Language) to Teachers (TeacherID, Language),

    FK2 (StudentID, Language) to Students (StudentID, Language)

    "

    Language should have its own table with a LanguageID. A person can speak more than one language.

    New table --> Languages

    A person may be a teacher and a student. Who knows?

    --> Person Table, Roles table

    Two association tables

    personRoles (personID, roleID) and personLanguages (personID, languageID)

    You can map students who speak French to a teacher who speaks French with this design.
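
    A minimal T-SQL sketch of that design (names and types are hypothetical):

    CREATE TABLE dbo.Person    (PersonID   int PRIMARY KEY, PersonName   nvarchar(100) NOT NULL);
    CREATE TABLE dbo.Roles     (RoleID     int PRIMARY KEY, RoleName     varchar(20)   NOT NULL);
    CREATE TABLE dbo.Languages (LanguageID int PRIMARY KEY, LanguageName varchar(30)   NOT NULL);

    -- Association tables: a person can hold many roles and speak many languages.
    CREATE TABLE dbo.PersonRoles
    (
        PersonID int NOT NULL REFERENCES dbo.Person (PersonID),
        RoleID   int NOT NULL REFERENCES dbo.Roles (RoleID),
        PRIMARY KEY (PersonID, RoleID)
    );

    CREATE TABLE dbo.PersonLanguages
    (
        PersonID   int NOT NULL REFERENCES dbo.Person (PersonID),
        LanguageID int NOT NULL REFERENCES dbo.Languages (LanguageID),
        PRIMARY KEY (PersonID, LanguageID)
    );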

    ----------------------------------------------------

  • Zidar wrote:

    I agree with Jeff 100%.  Performance is paramount, assuming the foundation is sound. Both fast and safe at the same time is the ultimate goal, but it is a luxury not too many of us have enough skills to achieve.

    Heh... and always remember... that's a personal choice. 🙁

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
