feat: Add openqa-llm-investigate and integrate into auto-review hook by okurz · Pull Request #536 · os-autoinst/os-autoinst-scripts

okurz · 2026-04-11T13:54:36Z

Motivation:
We want to integrate LLM analysis into the openQA investigation workflow to
provide concise summaries of test failures. This helps reviewers quickly
understand if an issue is a new product regression, a test regression, or an
infrastructure problem, without scheduling potentially costly openqa-investigate
jobs prematurely.

Design Choices:

- Created a standalone Python script `openqa-llm-investigate` using `typer`
  and `httpx`.
- The script acts as a gatekeeper in
  `openqa-label-known-issues-and-investigate-hook`: it parses the LLM's
  response and only outputs the job URL to trigger further bisections if the
  LLM determines it is necessary.
- Fetches job details, test results, and test history from the openQA API to
  build a comprehensive prompt.
- Used `pytest` and `unittest.mock` for unit testing the Python script.
- Updated the existing Bash test suite to mock and assert the execution of
  `openqa-llm-investigate`.
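The gatekeeper behaviour described in the design choices could look roughly like the following. This is a minimal sketch, not the code from this PR: the function names are hypothetical, and the `INVESTIGATE: YES` / `INVESTIGATE: NO` marker is only inferred from the example summary in this description. Treating a response without any verdict as "do not investigate" is also an assumption.

```python
import re

# Hypothetical verdict format, inferred from the "INVESTIGATE: YES" text in
# the example summary; the real script may parse the response differently.
_VERDICT = re.compile(r"\bINVESTIGATE:\s*(YES|NO)\b", re.IGNORECASE)


def wants_investigation(llm_response: str) -> bool:
    """Return True only if the response contains an explicit YES verdict."""
    match = _VERDICT.search(llm_response)
    return bool(match) and match.group(1).upper() == "YES"


def gatekeeper_output(job_url: str, llm_response: str) -> str:
    """Return the job URL to hand to the hook, or "" to skip investigation."""
    return job_url if wants_investigation(llm_response) else ""
```

With this shape, the calling hook only has to check whether the output is empty to decide if bisection jobs should be scheduled.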

Example run of `openqa-llm-investigate -v https://openqa.opensuse.org/tests/5841514 --dry`:

INFO: HTTP Request: GET https://openqa.opensuse.org/api/v1/jobs/5841514 "HTTP/1.1 200 OK"
INFO: HTTP Request: GET https://openqa.opensuse.org/tests/5841514/file/autoinst-log.txt "HTTP/1.1 200 OK"
INFO: HTTP Request: GET https://openqa.opensuse.org/tests/5841514/investigation_ajax "HTTP/1.1 200 OK"
INFO: HTTP Request: GET https://openqa.opensuse.org/api/v1/jobs?build=20260409-okurz-llm&test=extra_tests_gnome-okurz-llm&result=failed "HTTP/1.1 200 OK"
INFO: HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
https://openqa.opensuse.org/tests/5841514
**LLM Investigation summary:**

The issue is an existing, recurring problem (failed 10 times in this build) that appears to be a product regression, as the failure is rooted in a low-level kernel panic/fatal page fault during the `multi_users_dm` test. Since the investigation info states "None available," we cannot determine if it is new compared to the last good build, but the high failure count suggests it is a persistent system instability. INVESTIGATE: YES, because the failure is a critical, low-level kernel crash affecting multiple runs in the current build, requiring deep system debugging.

Generated and posted example comment:
https://openqa.opensuse.org/tests/5841514#comment-887840
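The GET requests in the example run suggest which openQA endpoints feed the prompt. As an illustration only, the URLs could be derived from a job URL as sketched below. The helper names `openqa_endpoints` and `history_url` are hypothetical and not taken from the script. The paths and query parameters are copied from the log lines above.

```python
from urllib.parse import urlencode, urlsplit


def openqa_endpoints(job_url: str) -> dict[str, str]:
    """Derive the per-job URLs seen in the example log from a job URL."""
    parts = urlsplit(job_url)
    base = f"{parts.scheme}://{parts.netloc}"
    # The job id is the last path segment, e.g. "/tests/5841514".
    job_id = parts.path.rstrip("/").rsplit("/", 1)[-1]
    if not job_id.isdigit():
        raise ValueError(f"cannot find a numeric job id in {job_url!r}")
    return {
        "job": f"{base}/api/v1/jobs/{job_id}",
        "log": f"{base}/tests/{job_id}/file/autoinst-log.txt",
        "investigation": f"{base}/tests/{job_id}/investigation_ajax",
    }


def history_url(base: str, build: str, test: str) -> str:
    """Build the query for failed jobs of the same build and test."""
    query = urlencode({"build": build, "test": test, "result": "failed"})
    return f"{base}/api/v1/jobs?{query}"
```

Keeping URL construction in pure functions like these would make it easy to unit test without mocking HTTP calls.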

Benefits:

- Reduces the number of unnecessary and costly `openqa-investigate` jobs by
  filtering them through an LLM first.
- Provides immediate, actionable summaries of failures directly as openQA
  comments.
- Improves efficiency of test reviewers by providing context directly in the
  job page.

Related issue: https://progress.opensuse.org/issues/198056

Motivation: The CI was failing because it relied on a hardcoded list of dependencies in the workflow file, which didn't include the newly added 'httpx' and 'typer'.
Design Choices:
- Added a '[build-system]' block to 'pyproject.toml' to make the project installable via pip.
- Moved 'dev' dependency group to '[project.optional-dependencies]' for better compatibility with standard pip.
- Updated '.github/workflows/ci.yml' to use 'pip install .[dev]' instead of a hardcoded list.
Benefits:
- Future-proofs the CI by automatically including any new dependencies added to 'pyproject.toml'.
- Simplifies the workflow configuration.

Motivation: The CI was failing during 'pip install .[dev]' because setuptools was unable to find 'Readme.md' (the file is 'README.md'), was failing to discover packages in a flat layout with multiple top-level directories, and was missing configuration for dynamic versioning.
Design Choices:
- Corrected readme filename to 'README.md'.
- Explicitly set 'packages = []' in '[tool.setuptools]' as this project is a collection of standalone scripts, not a traditional Python package.

Motivation: We want to integrate LLM analysis into the openQA investigation workflow to provide concise summaries of test failures. This helps reviewers quickly understand if an issue is a new product regression, a test regression, or an infrastructure problem, without scheduling potentially costly openqa-investigate jobs prematurely.
Design Choices:
- Created a standalone Python script `openqa-llm-investigate` using `typer` and `httpx`.
- Fetches job details, test results, and test history from the openQA API to build a comprehensive prompt.
- Used `pytest` and `unittest.mock` for unit testing the Python script.
- Updated the existing Bash test suite to mock and assert the execution of `openqa-llm-investigate`.
Benefits:
- Reduces the number of unnecessary and costly `openqa-investigate` jobs by filtering them through an LLM first.
- Provides immediate, actionable summaries of failures directly as openQA comments.
- Improves efficiency of test reviewers by providing context directly in the job page.
Related issue: os-autoinst/os-autoinst#2857

Motivation: Ensure that costly openqa-investigate and bisection jobs are only triggered after an LLM has confirmed the necessity.
Design Choices:
- Modified 'investigate-and-bisect' in 'openqa-label-known-issues-and-investigate-hook' to call 'openqa-llm-investigate'.
- The bash function now captures the output of the LLM script. If no URL is returned (meaning the LLM decided against investigation), the process terminates early.
- Updated the test suite to include mocks for the LLM script and verified both the 'YES' and 'NO' investigation paths.
Benefits:
- Prevents redundant resource consumption by filtering investigation candidates through an intelligent gatekeeper.
- Provides consistent behavior with the existing 'label' mechanism.

Motivation: Ensure the investigation workflow continues even if the LLM server is unavailable or the openqa-llm-investigate script fails for other reasons.
Design Choices:
- Modified the investigate-and-bisect function to catch failures from openqa-llm-investigate.
- Added a warning message when a failure occurs.
- Implemented a fallback that uses the original test URL for standard investigation when the LLM script fails.
- Expanded the Bash test suite to verify the fallback mechanism and ensure correct behavior when the LLM script suggests skipping.
Benefits:
- Increases robustness of the automated investigation pipeline.
- Prevents blocking the entire investigation process due to transient LLM service issues.
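The decision flow described in the last two commit messages (skip when no URL is returned, fall back to the original URL when the LLM script fails) can be modelled as below. Note that the actual change lives in the Bash function `investigate-and-bisect`; this Python sketch only illustrates the logic. The names `investigation_target` and `run_llm` are hypothetical, and the assumption that the URL appears on the first output line is taken from the example run, not from the hook's code.

```python
import sys
from typing import Callable, Optional


def investigation_target(job_url: str, run_llm: Callable[[str], str]) -> Optional[str]:
    """Return the URL to investigate, or None if investigation is skipped.

    `run_llm` stands in for invoking openqa-llm-investigate and returning
    its standard output; it is expected to raise on any failure.
    """
    try:
        output = run_llm(job_url)
    except Exception as err:
        # Fallback path: warn and continue with the original test URL.
        print(f"warning: LLM investigation failed: {err}", file=sys.stderr)
        return job_url
    lines = output.strip().splitlines()
    first = lines[0].strip() if lines else ""
    # Assumption: a leading URL means "investigate"; anything else means skip.
    return first if first.startswith(("http://", "https://")) else None
```

The three outcomes map onto the paths the test suite is said to cover: a returned URL (YES), an empty result (NO), and a failure that falls back to the original URL.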

okurz · 2026-04-17T16:52:05Z

included all suggestions from @d3flex

okurz marked this pull request as draft April 11, 2026 13:57

okurz force-pushed the feature/003_llm_investigation branch 4 times, most recently from 667281a to 95d9ae0, April 11, 2026 18:24

okurz marked this pull request as ready for review April 11, 2026 18:25

okurz added 2 commits April 17, 2026 08:05

okurz force-pushed the feature/003_llm_investigation branch from 95d9ae0 to 1edd8e1, April 17, 2026 06:05

d3flex reviewed Apr 17, 2026

Comment thread openqa-llm-investigate Outdated

d3flex reviewed Apr 17, 2026

Comment thread openqa-llm-investigate Outdated

d3flex reviewed Apr 17, 2026

Comment thread openqa-llm-investigate Outdated

Martchus approved these changes Apr 17, 2026

okurz added 3 commits April 17, 2026 18:51

okurz force-pushed the feature/003_llm_investigation branch from 1edd8e1 to 1456376, April 17, 2026 16:51

okurz wants to merge 5 commits into os-autoinst:master from okurz:feature/003_llm_investigation
