Environment details
- OS: Linux (GKE container)
- Python: 3.12
- google-cloud-bigquery version: 3.30.0 (also confirmed present on latest main)
Steps to reproduce
Call query_job.to_geodataframe(progress_bar_type="tqdm", ...) on a query that takes longer than 0.5s. The issue is intermittent and affects a small fraction of queries. In my case it was a MERGE query, so it could be an edge case related to how those plans are expressed. That said, I ran a batch of a dozen of these queries and only 2 hit the issue, and I had never seen it before despite running this same query structure many times.
Stack trace
    File "google/cloud/bigquery/job/query.py", line 2154, in to_geodataframe
      query_result = wait_for_query(self, progress_bar_type, max_results=max_results)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "google/cloud/bigquery/_tqdm_helpers.py", line 113, in wait_for_query
      current_stage = query_job.query_plan[i]
                      ~~~~~~~~~~~~~~~~~~~~^^^
    IndexError: list index out of range
Description
In _tqdm_helpers.py:wait_for_query, the index i is incremented on line 133 whenever a completed stage is detected, but the bounds check on line 131 (i < default_total - 1) uses the default_total from the current iteration, which was computed before reload() ran. On the next iteration, query_job.query_plan is re-read (lines 111-113) after reload() has refreshed the job state from the server, and its length may no longer be consistent with i.
    i = 0
    while True:
        if query_job.query_plan:
            default_total = len(query_job.query_plan)
            current_stage = query_job.query_plan[i]  # <-- IndexError here
            ...
        try:
            query_result = query_job.result(timeout=0.5)
            ...
            break
        except concurrent.futures.TimeoutError:
            query_job.reload()
            if current_stage:
                if current_stage.status == "COMPLETE":
                    if i < default_total - 1:
                        progress_bar.update(i + 1)
                        i += 1
            continue

There is no bounds check on i before accessing query_job.query_plan[i] on line 113. Adding something like if i < len(query_job.query_plan) before the access would prevent the IndexError.
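
To make the failure mode concrete, here is a minimal, self-contained sketch that simulates the polling loop against a fake job. FakeStage, FakeJob, finished(), make_snapshots() and poll() are hypothetical stand-ins invented for illustration, not library code. The plan snapshot that comes back shorter after reload() is an assumption chosen to trigger the error; what the server really returns in the failing case has not been confirmed. The guard flag toggles the kind of bounds check suggested above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeStage:
    """Hypothetical stand-in for one query plan entry."""
    name: str
    status: str


class FakeJob:
    """Hypothetical stand-in for a query job.

    Each reload() swaps in the next plan snapshot, imitating a refresh
    of the job state from the server.
    """

    def __init__(self, snapshots: List[List[FakeStage]]):
        self._snapshots = list(snapshots)
        self.query_plan = self._snapshots.pop(0)

    def reload(self) -> None:
        if self._snapshots:
            self.query_plan = self._snapshots.pop(0)

    def finished(self) -> bool:
        # Assumption: the job counts as done once every snapshot was served.
        return not self._snapshots


def make_snapshots() -> List[List[FakeStage]]:
    # Assumed scenario: the last refresh reports fewer stages than before.
    return [
        [FakeStage("S0", "COMPLETE"), FakeStage("S1", "RUNNING"),
         FakeStage("S2", "PENDING")],
        [FakeStage("S0", "COMPLETE"), FakeStage("S1", "COMPLETE"),
         FakeStage("S2", "RUNNING")],
        [FakeStage("S0", "COMPLETE")],
    ]


def poll(job: FakeJob, guard: bool) -> List[str]:
    """Simplified version of the loop; returns the stage names it displayed."""
    i = 0
    default_total = 1
    current_stage = None
    shown = []
    while True:
        plan = job.query_plan
        if plan:
            default_total = len(plan)
            # With guard=True, skip the access when i is past the end
            # of the freshly read plan and keep the previous stage.
            if not guard or i < len(plan):
                current_stage = plan[i]  # unguarded: may raise IndexError
                shown.append(current_stage.name)
        if job.finished():
            break
        job.reload()
        if current_stage and current_stage.status == "COMPLETE":
            # This check uses default_total from before reload().
            if i < default_total - 1:
                i += 1
    return shown
```

Calling poll(FakeJob(make_snapshots()), guard=False) raises IndexError on the third pass, because i was advanced against the old plan length, while guard=True finishes normally. Skipping the access is only one option; clamping i to the new plan length would be another reasonable choice.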