I’ll be honest, ever since I did a SQL Homework about doing code reviews I’ve wanted to do a blog post about them. Recently Emily Krager (TikTok | Twitter) did a TikTok about code review suggestions which seemed like a good excuse for me to do this. If you don’t follow her I recommend it, she does a great job of combining humor and technology and is just a lot of fun to listen to. Here is her list as best I was able to transcribe it.
- Does every new variable and function actually get used?
- Does all of the Boolean logic make sense?
- Are we forgetting any edge cases (empty/null states for example)?
- Testing. Were there any tests added? Can there be tests added? Is there a reason tests can’t be added? If so what and can we modify the code to make it more testable?
- Ask a lot of questions. For example: Why is this here? Why did you do it this way? Is there a different way that you considered? (Documentation mentioned.)
- Is this extensible? If we have to change something later how hard is it going to be?
- Formatting check. Auto formatters help a lot!
Now, database coding is somewhat different than normal coding, but that doesn’t mean that there isn’t quite a bit of overlap. So using her list as a template here is my idea for a database code review. Feel free to add to this in the comments, and of course I’m not perfect and you certainly don’t have to agree with me.
- Does every new variable get used? In each query does every table referenced have a purpose?
- We have variables just like any coding language but on top of checking for variable usage, check for table usage in queries. I can’t tell you how many times I’ve run across a table in a query that doesn’t actually get referenced anywhere else in the query. Now sometimes a table in an INNER JOIN is there as a way to restrict the output but if so, let’s be aware that that’s why it’s there. It can give you a handle on tuning the thing later. In fact any extra code in a query that doesn’t have a specific purpose should be carefully looked at and possibly removed.
- Does all of the logic make sense.
- WHERE clauses, JOIN/ONs, subqueries, etc. The list seems never ending sometimes. We have a LOT of logic in our queries. Carefully confirm that your query logic is what you intended it to be. An OR in the wrong place, or a condition in an ON instead of the WHERE can give you data that you did not expect. Just because you are getting data that looks correct now does not mean it will be correct in production, or even next week.
- Yes, we have normal code logic as well. Make sure it does what’s expected.
- Edge cases!!!!!!
- One of the most important things I learned in college was testing/checking edge cases. And with data we get a lot of them, not just NULL, but invalid dates, letters in numbers, too big a value, to small, etc. Your code doesn’t need to check them all, but you should be aware of them.
- Testing
- Emily was talking about having formal tests to be sure your output is correct and that is equally as important for us. Just ask Steve Jones (blog|twitter) sometime. He (among others) does whole talks about it. It’s something that we in the database world need to get better at IMO.
- I’m going to add internal error handling to this because again, it’s something that database people need to get better at using all the time. Is there internal error handling? And if not why not? And if so, is it appropriate?
- Ask a lot of questions?
- This kind of reminds me of Rubber Duck Debugging. Which, I’ll be honest I’d thought I’d written a post on but I can’t find it. So expect one soon. Basically though you explain your problem to the duck. In this case the programmer is explaining themselves to the reviewer. Although I’m going to hope that you, as the person doing the code review, can come up with some better questions than the duck does.
- Is this extensible? Will it scale?
- Exactly as Emily said it’s important that we be able to easily expand our code later.
- That said, for us, it’s even more important that your code is scalable. Your code works great at 10 rows of data, 100, 1000, even 10,000 rows. But does it work at a million? A billion?
- Formatting.
- Can someone else easily read your code? Do you have formatting standards? Are they being followed? Auto formatters are helpful here, but regardless, make sure the code is well formatted.
- Documentation.
- No, code is not self formatting. Well, anything past a certain level of complexity anyway. Is the code commented sufficiently? And by sufficiently I mean is the junior DBA who’s going to be handed this monstrosity next month going to be able to follow along with what you are doing.
- Data types.
- Not specific to database work, but probably more important here than in standard code. Do your variable data types match what they are being compared against? Do all of your comparisons in your code/queries use the same data type? If not is there a way to fix that?
And of course none of this covers table structures etc. This is just for doing a code review. You’ll also want to have a separate review for your database structures.