IndexError in _tqdm_helpers.wait_for_query when query_plan changes between iterations #16168

@bnaul

Environment details

  • OS: Linux (GKE container)
  • Python: 3.12
  • google-cloud-bigquery version: 3.30.0 (also confirmed present on latest main)

Steps to reproduce

Call query_job.to_geodataframe(progress_bar_type="tqdm", ...) on a query that takes longer than 0.5s. The issue is intermittent and affects a small fraction of queries. In my case it was a MERGE query, so it could be an edge case in how those plans are expressed. That said, I ran a batch of a dozen of these queries and only 2 hit the issue, and I had never seen it before despite running this same query structure many times.

Stack trace

File "google/cloud/bigquery/job/query.py", line 2154, in to_geodataframe
    query_result = wait_for_query(self, progress_bar_type, max_results=max_results)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "google/cloud/bigquery/_tqdm_helpers.py", line 113, in wait_for_query
    current_stage = query_job.query_plan[i]
                    ~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

Description

In _tqdm_helpers.py:wait_for_query, the index i is incremented on line 133 whenever a completed stage is detected, but the bounds check on line 131 (i < default_total - 1) uses default_total, which was computed earlier in the same iteration. On the next iteration, query_job.query_plan is re-read (lines 111-113) after reload() has refreshed the job state from the server, and its length may no longer be consistent with i.

i = 0
while True:
    if query_job.query_plan:
        default_total = len(query_job.query_plan)
        current_stage = query_job.query_plan[i]        # <-- IndexError here
        ...
    try:
        query_result = query_job.result(timeout=0.5)
        ...
        break
    except concurrent.futures.TimeoutError:
        query_job.reload()
        if current_stage:
            if current_stage.status == "COMPLETE":
                if i < default_total - 1:
                    progress_bar.update(i + 1)
                    i += 1
        continue
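
The index handling above can be exercised without BigQuery. The following is a minimal sketch, not the library's code: FakeStage, FakeJob and poll are hypothetical stand-ins, every iteration is treated as a timeout, and it assumes the plan returned after reload() can be shorter than the one seen before it.

```python
from dataclasses import dataclass


@dataclass
class FakeStage:
    # Hypothetical stand-in for a query plan entry; only `status` is modeled.
    status: str


class FakeJob:
    """Hypothetical stand-in for a query job whose plan changes on reload()."""

    def __init__(self, plans):
        self._plans = list(plans)
        self.query_plan = self._plans.pop(0)

    def reload(self):
        # Swap in the next plan, if any, to imitate a refreshed job state.
        if self._plans:
            self.query_plan = self._plans.pop(0)


def poll(job, max_iterations=5):
    """Mirror the index logic of the excerpt, without progress bar or result()."""
    i = 0
    for _ in range(max_iterations):
        current_stage = None
        default_total = 1
        if job.query_plan:
            default_total = len(job.query_plan)
            current_stage = job.query_plan[i]  # unguarded access, as in the excerpt
        job.reload()  # assumption: each iteration times out and reloads
        if current_stage and current_stage.status == "COMPLETE":
            if i < default_total - 1:
                i += 1
    return i


three_stages = [FakeStage("COMPLETE"), FakeStage("COMPLETE"), FakeStage("RUNNING")]
one_stage = [FakeStage("COMPLETE")]

try:
    poll(FakeJob([three_stages, one_stage]))
    shrinking_plan_raised = False
except IndexError:
    shrinking_plan_raised = True
```

With a plan that keeps its length the loop finishes normally; with the shrinking plan, i has already advanced to 1 by the time the one-stage plan is indexed.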

There is no bounds check on i before accessing query_job.query_plan[i] on line 113. Adding something like if i < len(query_job.query_plan) before the access would prevent the IndexError.
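
A rough sketch of what such a guard could look like, written as a standalone helper so it can be tested in isolation. stage_at is a hypothetical name, not an existing function in google-cloud-bigquery, and returning None for an out-of-range index is only one possible choice.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def stage_at(query_plan: Optional[Sequence[T]], i: int) -> Optional[T]:
    """Return query_plan[i] when i is in range, otherwise None.

    Hypothetical helper: it treats a missing, empty, or too-short plan
    the same way, so the caller never indexes past the end.
    """
    if query_plan and 0 <= i < len(query_plan):
        return query_plan[i]
    return None
```

Since the loop already checks if current_stage: before reading its status, a None result would simply skip the progress update for that iteration. Whether i should instead be clamped to the last stage or reset is a design decision for the maintainers.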
