Best way of writing a query

  • Hello,

    below is a set of queries. According to my perception, both should return the same result.

    select * from blobs where blob_id in

    (Select blob_id from attachments where purchase_order_id in

    (Select purchase_order_id from purchase_orders where is_submitted=0 and year(created_date)<=2010))

    select b.blob_id from blobs b,attachments a, purchase_orders p where

    b.blob_id=a.blob_id and a.purchase_order_id=p.purchase_order_id and p.is_submitted=0

    and year(p.created_date)<=2010

    The 1st one takes time to execute while the second one returns the result immediately. Need to know if the 2nd one is the best way to write the above query.

    2nd is the 1st one returns fewer rows than the 2nd one though both should return the same result.

    Could you please clarify the same for me.

    Thanks in advance.

    Regards,

    K. Sripriya

  • Hi ,

    The difference will be shown in the execution plan.

    Try this article for starters

    http://www.simple-talk.com/sql/performance/execution-plan-basics/

    BTW this syntax 'year(p.created_date)<=2010' is non-sargable. Basically this means that sqlserver will be unable to use any potential index on the column as it has to get every row and test which year the created date is.

    If the syntax were , p.created_date >='20100101' and p.created_date <'20110101' then an index is more likely to be used.



    Clear Sky SQL
    My Blog[/url]

  • Thank you for your response. The execution plan is not very clear. Could you please explain the above queries to me. For the second query, when I put distinct, I get the same result as the 1st one.

  • Ok , it wasnt clear from your post.

    The issue is , probably , duplicate data

    Create TAble #t1

    (

    id integer

    )

    go

    Create TAble #t2

    (

    id integer

    )

    go

    insert into #t1 values(1),(1)

    go

    insert into #t2 values(1),(1)

    go

    select * from #t1 where id in(Select id from #t2)

    go

    select * from #t1 join #t2

    on #t1.id = #t2.id



    Clear Sky SQL
    My Blog[/url]

  • Thank you. Let me explain.

    I have three tables, purchase_orders, attachments, blobs.

    Purchase Orders table is the parent and attachments is a child of it.

    Blobs is also a child table.

    The attachment table has purchase id as well as the blob id. Attachments and blobs are linked using the blob_id.

    Now, my requirement is to get the data from blobs table where the records are related to those orders created before 2010. Purchase order and attachment are linked using the Purchase_order_id field.

    Attachments and blobs are related using the blob_id field.

    Regards,

    K. Sripriya.

  • Hello,

    about your first question: yes, the second query is best than the first in terms of performance. The reason is that, when you code a subquery, the DB engine can attempt to run your subquery one time for each row in the main table; you coded two subqueries, so you are in risk twice.

    About your second question: you have two attachments with the same blob_id, or you have two purchase_orders with the same purchase_order_id. You should add attachment_id (if it exists) and purchase_order_id to the results of your query to detect these duplicate values.

    This query can help you to find the blob_id selected more than once:

    select b.blob_id, COUNT(*)

    from blobs b,attachments a, purchase_orders p

    where b.blob_id=a.blob_id and a.purchase_order_id=p.purchase_order_id and p.is_submitted=0

    and p.created_date<'2011-01-01'

    GROUP BY b.blob_id

    HAVING COUNT(*) > 1

    Regards,

    Francesc

  • PriyaBabu,

    I noticed you used the older syntax of joining the tables in the WHERE clause. While this works, it is preferred to join the tables in the FROM part of the statement. It is much easier to read that way. Less mistakes are made in SQL code when the code is easy to read and understand.

    Here would be a preferred way to write your query:

    SELECT DISTINCT B.Blob_id

    FROM Blobs B

    INNER JOIN Attachments A ON

    B.Blob_id = A.Blob_id

    INNER JOIN Purchase_Orders P ON

    A.Purchase_Order_id = P.Purchase_Order_id

    WHERE P.Is_Submitted = 0

    AND P.Created_Date < '2010-01-01'

    Regards,

    Todd Fifield

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply