September 22, 2015 at 8:13 am
Please help me re-write this query. I am sure we can get rid of multiple self joins. Data table in my query is HUGE ( 1 Billion + rows)
Thank you
SELECT COUNT(*)
FROM unit u
INNER JOIN Data_Entity Outtsed ON Outtsed.category = 'unit outage' AND Outtsed.category_id = u.unit_id
INNER JOIN Data_model mr ON mr.model_id = u.model_id
INNER JOIN Data Outtsd ON Outtsed.entity_id = Outtsd.entity_id AND Outtsd.model_id = u.model_id
LEFT JOIN Data_Entity ThUCtsed ON ThUCtsed.category = 'thermal UC' AND ThUCtsed.category_id = u.unit_id
LEFT JOIN Data ThUCtsd ON ThUCtsed.entity_id = ThUCtsd.entity_id AND ThUCtsd.model_id = u.model_id AND ThUCtsd.date = Outtsd.date
LEFT JOIN Data_Entity ThSStsed ON ThSStsed.category = 'thermal SS' AND ThSStsed.category_id = u.unit_id
LEFT JOIN Data ThSStsd ON ThSStsed.entity_id = ThSStsd.entity_id AND ThSStsd.model_id = u.model_id AND ThSStsd.date = Outtsd.date
LEFT JOIN Data_Entity HFtsed ON HFtsed.category = 'hydro forecast' AND HFtsed.category_id = u.unit_id
LEFT JOIN Data HFtsd ON HFtsed.entity_id = HFtsd.entity_id AND HFtsd.model_id = u.model_id AND HFtsd.date = Outtsd.date
LEFT JOIN Data_Entity PStsed ON PStsed.category = 'pumps forecast' AND PStsed.category_id = u.unit_id
LEFT JOIN Data PStsd ON PStsed.entity_id = PStsd.entity_id AND PStsd.model_id = u.model_id AND PStsd.date = Outtsd.date
LEFT JOIN Data_Entity SOLtsed ON SOLtsed.category = 'solar forecast' AND SOLtsed.category_id = u.unit_id
LEFT JOIN Data SOLtsd ON SOLtsed.entity_id = SOLtsd.entity_id AND SOLtsd.model_id = u.model_id AND SOLtsd.date = Outtsd.date
LEFT JOIN Data_Entity WINtsed ON WINtsed.category = 'wind forecast' AND WINtsed.category_id = u.unit_id
LEFT JOIN Data WINtsd ON WINtsed.entity_id = WINtsd.entity_id AND WINtsd.model_id = u.model_id AND WINtsd.date = Outtsd.date
September 22, 2015 at 8:25 am
If you take all those LEFT JOINs away, do you get a different result?
I mean, what happens if you try this?
select count(*)
from unit u
join Data_Entity de on de.category = 'unit outage'
and de.category_id = u.unit_id
join Data_model dm on dm.model_id = u.model_id
join Data d on de.entity_id = d.entity_id
and d.model_id = u.model_id
Best practice suggests that you should schema-qualify your object names.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
September 22, 2015 at 8:40 am
There are 2 issues here:
Joins in general can generate duplicate rows which might affect the count. If you remove any join it could reduce the row count.
Inner joins can reduce the number of rows which will affect the count as well.
Basically, we can't tell you which joins aren't necessary because that's defined by the design of your database and the requirements for this particular query.
September 22, 2015 at 8:47 am
Thanks Phil
U r the best. I dont know what i was thinking when i wrote that query.over thinking i guess.
It worked.
Thank you all.
September 22, 2015 at 11:56 am
ekant_alone (9/22/2015)
Thanks PhilU r the best. I dont know what i was thinking when i wrote that query.over thinking i guess.
It worked.
Thank you all.
Glad to help. But please remember, as Luis pointed out, that the results between the two versions could be quite different, depending on your data.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply