July 10, 2020 at 3:05 pm
Sorry if this is in the wrong forum... feel free to move it.
I'm okay with T-SQL, but BigQuery boggles my mind. Is there a good reference comparing the two somewhere? I was looking for a little bit of something sort of like an intro and recipes for converting between the two.
thanks!
July 10, 2020 at 6:48 pm
Haven't seen that. I did run across this item on learning BigQuery
July 10, 2020 at 7:05 pm
Thanks, Steve, I'll check it out. (Gotta say, I never expected to have to help my nephew figure his way through BigQuery, but here I am!)
This book looks promising, but it's just about BigQuery...
Might be the best I'm going to do for the time being.
July 10, 2020 at 7:13 pm
I haven't seen a comparison.
that being said, if you're learning it, maybe you want to write one. I'd pay you, even for a series of articles as you learn the differences in the languages.
July 10, 2020 at 7:48 pm
Be warned, I'm a rank beginner... I'm absolutely clueless about BigQuery, but that might be a good thing. I guess I might have to just make an account for BigQuery (BQ, because I'm lazy) and use data there. At the moment, I'm thinking of doing something like:
One thing I was hoping to figure out (which I may do with CTEs) was solving the "sanity check" question - Well, running a query against an enormous dataset can take 15 minutes, so I wanted a way to test on the smallest reasonable dataset I could. I haven't figured it out yet, but I was tempted to create 2 CTEs -- one for "today" and one for "yesterday" (except the way things are in Nephew's BQ dataset is that each one is in a separate table, and the table names change). So I'd do a TOP/LIMIT query for each and compare a thin horizontal "slice" of data from each just to see if things make sense or are working.
part of the fun is that I have zero clue what to expect. The other problem is getting my head around Python and Pandas, because I think that's the sane way of dealing with the result sets from BQ. (And something like Notebooks)... because it would be self-documenting.
Any thoughts?
Pieter
July 10, 2020 at 8:19 pm
It might be interesting to just see your impressions and experiments. I'd think you present this as "what I learned" rather than "this is what you should do"
July 10, 2020 at 9:04 pm
Oh, I definitely wouldn't do that! It would be more like "Notes from the road: what I learned (and maybe didn't) using BigQuery"
July 11, 2020 at 3:50 pm
Okay, just found one really interesting feature... described here: https://cloud.google.com/bigquery/docs/querying-wildcard-tables#limitations
It makes sense if your data pipeline generates something like CSV files with a format of [standard file prefix]MMDDYY
the part to keep in mind is that in BigQuery, you often upload a series of huge files/CSVs/whatever. In addition to filtering what's IN a table, in BQ, you can filter the tables that "participate" in the query, like this:
SELECT
max,
ROUND((max-32)*5/9,1) celsius,
mo,
da,
year
FROM
bigquery-public-data.noaa_gsod.gsod19*
WHERE
max != 9999.9 # code for missing data
AND _TABLE_SUFFIX BETWEEN '29' and '35'
ORDER BY
max DESC
It's like partitioned tables without the hassle of partitioning. It also means you can import entire files with the same structure into their own tables, and then combine them with _TABLE_SUFFIX instead of writing a messy UNION ALL query that would require dynamic T-SQL. If you're doing time-based analysis (like changes in value or whatever over time), you can query all the tables at once. (Super handy if you're testing a query ... you can just eliminate all but the first table by changing the _TABLE_SUFFIX filter so only one table is returned.) Funky, but I can definitely see how this would be helpful if you're doing queries on a massive scale. I seriously think it's time for a book! Otherwise, I might spend too much time thinking in terms of how SQL Server does it, and the two database engines are vastly different.
Once upon a time, I did cancer research, and their studies were all in separate databases/tables. It would have been pretty wild to do a query using something like _TABLE_SUFFIX (I wonder if there's a PREFIX version of that.) to get ALL the data from ALL the protocols of the same type (like "lung", "central nervous system", etc) and just do a super quick summary of *all* of them at once, just to look for patterns.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply