October 24, 2021 at 10:23 am
Hi all,
I'm still pretty new to SQL and could use some help getting this query to work. The ultimate goal is to join two temporary tables.
In the first temporary table, I pulled all of the distinct rows from a table called 'Table A' and used a JOIN clause to combine it with a table called 'Table_B' so that I could apply a condition - in this case, I only wanted the most recently submitted records.
This is the query I used to accomplish this:
CREATE TEMPORARY TABLE temp_A as
SELECT DISTINCT a.ID,
a.Year,
a.Variable_1,
a.Variable_2,
a.Variable_3,
a.Variable_4
FROM db_prod.Table_A a
INNER JOIN
(SELECT * FROM db_prod.Table_B
WHERE isMostRecentSubmission = True) AS b
ON a.ID = b.ID;
This is what the output table looks like:
ID Year Var_1 Var_2 Var_3 Var_4
1001 2019 X 100 1 2
1002 2019 Y 101 3 1
1001 2020 Z 102 1 4
1003 2020 Y 103 2 4
1002 2021 Z 101 4 3
For the second temp table, if unique combos for 'ID', 'Var_1', and 'Var_2' appear in multiple years in the temporary table 'temp_A', I only kept the row with the most recent year and dropped the previous years.
Here is the query:
CREATE TEMPORARY TABLE temp_B as
SELECT ID, Var_1, Var_2
max(Year) as 'Max_Year'
FROM temp_B
GROUP BY ID, Var_1, Var_2;
This is what the output table looks like:
ID Var_1 Var_2 Max_Year
1002 Y 101 2019
1003 Y 103 2020
1002 Z 101 2021
The ultimate goal is to use the 'temp_A' table to bring in the missing rows: ('Var_3' & 'Var_4') and join them back onto the 'temp_B' table.
So I want the final output table to look like this:
Max_Year ID Var_1 Var_2 Var_3 Var_4
2019 1002 Y 101 3 1
2020 1003 Y 103 2 4
2021 1002 Z 101 4 3
I'm pretty sure it's simple, I just can't wrap my head around how to do it. Let me know if I didn't explain something properly or you need additional context. Thanks!
NOTE: The more efficient the query the better, as the 'temp_A' table has over 280K records and the 'temp_B' table has over 175K observations and I've been having issues with the server losing connection.
October 25, 2021 at 11:10 am
Thanks for posting your issue and hopefully someone will answer soon.
This is an automated bump to increase visibility of your question.
October 25, 2021 at 3:26 pm
First, I think you would likely get better support on a MySQL forum than a SQL Server forum.
Second, my understanding of MySQL is that it keeps the temporary tables in the same way that SQL server does. A quick google confirmed this - "MySQL removes the temporary table automatically when the session ends or the connection is terminated". So you could just join temp_A on temp_B on the ID column.
Now, for efficiency, I have a feeling that is more likely related to indexes as you don't have a lot of rows. I have tables with hundreds of millions of rows and even that isn't really that large of data and I haven't had timeout issues.
I expect your timeout issues (server losing connection) is a configurable option in MySQL either in the connection string or on the server itself.
But to JOIN them, I expect it would just be a simple "SELECT <columns> FROM temp_A JOIN temp_B ON temp_A.ID = temp_B.ID", but you may need to specify the join type. I do not know MySQL well (or really at all), so I would recommend looking up the syntax if you are not certain.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply