Skip to content

feat: Add openqa-llm-investigate and integrate into auto-review hook#536

Open
okurz wants to merge 5 commits intoos-autoinst:masterfrom
okurz:feature/003_llm_investigation
Open

feat: Add openqa-llm-investigate and integrate into auto-review hook#536
okurz wants to merge 5 commits intoos-autoinst:masterfrom
okurz:feature/003_llm_investigation

Conversation

@okurz
Copy link
Copy Markdown
Member

@okurz okurz commented Apr 11, 2026

Motivation:
We want to integrate LLM analysis into the openQA investigation workflow to
provide concise summaries of test failures. This helps reviewers quickly
understand if an issue is a new product regression, a test regression, or an
infrastructure problem, without scheduling potentially costly openqa-investigate
jobs prematurely.

Design Choices:

  • Created a standalone Python script openqa-llm-investigate using typer
    and httpx.
  • The script acts as a gatekeeper in
    openqa-label-known-issues-and-investigate-hook: it parses the LLM's response
    and only outputs the job URL to trigger further bisections if the LLM
    determines it is necessary.
  • Fetches job details, test results, and test history from the openQA API to
    build a comprehensive prompt.
  • Used pytest and unittest.mock for unit testing the Python script.
  • Updated the existing Bash test suite to mock and assert the execution of
    openqa-llm-investigate.

Example run openqa-llm-investigate -v https://openqa.opensuse.org/tests/5841514 --dry:

INFO: HTTP Request: GET https://openqa.opensuse.org/api/v1/jobs/5841514 "HTTP/1.1 200 OK"
INFO: HTTP Request: GET https://openqa.opensuse.org/tests/5841514/file/autoinst-log.txt "HTTP/1.1 200 OK"
INFO: HTTP Request: GET https://openqa.opensuse.org/tests/5841514/investigation_ajax "HTTP/1.1 200 OK"
INFO: HTTP Request: GET https://openqa.opensuse.org/api/v1/jobs?build=20260409-okurz-llm&test=extra_tests_gnome-okurz-llm&result=failed "HTTP/1.1 200 OK"
INFO: HTTP Request: POST http://localhost:8080/v1/chat/completions "HTTP/1.1 200 OK"
https://openqa.opensuse.org/tests/5841514
**LLM Investigation summary:**

The issue is an existing, recurring problem (failed 10 times in this build) that appears to be a product regression, as the failure is rooted in a low-level kernel panic/fatal page fault during the `multi_users_dm` test. Since the investigation info states "None available," we cannot determine if it is new compared to the last good build, but the high failure count suggests it is a persistent system instability. INVESTIGATE: YES, because the failure is a critical, low-level kernel crash affecting multiple runs in the current build, requiring deep system debugging.

Generated and posted example comment:
https://openqa.opensuse.org/tests/5841514#comment-887840

Benefits:

  • Reduces the number of unnecessary and costly openqa-investigate jobs by
    filtering them through an LLM first.
  • Provides immediate, actionable summaries of failures directly as openQA
    comments.
  • Improves efficiency of test reviewers by providing context directly in the job
    page.

Related issue: https://progress.opensuse.org/issues/198056

@okurz okurz marked this pull request as draft April 11, 2026 13:57
@okurz okurz force-pushed the feature/003_llm_investigation branch 4 times, most recently from 667281a to 95d9ae0 Compare April 11, 2026 18:24
@okurz okurz marked this pull request as ready for review April 11, 2026 18:25
okurz added 2 commits April 17, 2026 08:05
Motivation:
The CI was failing because it relied on a hardcoded list of dependencies
in the workflow file, which didn't include the newly added 'httpx' and
'typer'.

Design Choices:
- Added a '[build-system]' block to 'pyproject.toml' to make the
  project installable via pip.
- Moved 'dev' dependency group to '[project.optional-dependencies]' for
  better compatibility with standard pip.
- Updated '.github/workflows/ci.yml' to use 'pip install .[dev]' instead
  of a hardcoded list.

Benefits:
- Future-proofs the CI by automatically including any new dependencies
  added to 'pyproject.toml'.
- Simplifies the workflow configuration.
Motivation:
The CI was failing during 'pip install .[dev]' because setuptools was unable
to find 'Readme.md' (the file is 'README.md'), was failing to discover
packages in a flat layout with multiple top-level directories, and was
missing configuration for dynamic versioning.

Design Choices:
- Corrected readme filename to 'README.md'.
- Explicitly set 'packages = []' in '[tool.setuptools]' as this project is a
  collection of standalone scripts, not a traditional Python package.
@okurz okurz force-pushed the feature/003_llm_investigation branch from 95d9ae0 to 1edd8e1 Compare April 17, 2026 06:05
Comment thread openqa-llm-investigate Outdated
Comment thread openqa-llm-investigate Outdated
Comment thread openqa-llm-investigate Outdated
okurz added 3 commits April 17, 2026 18:51
Motivation:
We want to integrate LLM analysis into the openQA investigation workflow to
provide concise summaries of test failures. This helps reviewers quickly
understand if an issue is a new product regression, a test regression, or an
infrastructure problem, without scheduling potentially costly openqa-investigate
jobs prematurely.

Design Choices:
- Created a standalone Python script `openqa-llm-investigate` using `typer`
  and `httpx`.
- Fetches job details, test results, and test history from the openQA API to
  build a comprehensive prompt.
- Used `pytest` and `unittest.mock` for unit testing the Python script.
- Updated the existing Bash test suite to mock and assert the execution of
  `openqa-llm-investigate`.

Benefits:
- Reduces the number of unnecessary and costly `openqa-investigate` jobs by
  filtering them through an LLM first.
- Provides immediate, actionable summaries of failures directly as openQA
  comments.
- Improves efficiency of test reviewers by providing context directly in the job
  page.

Related issue: os-autoinst/os-autoinst#2857
Motivation:
Ensure that costly openqa-investigate and bisection jobs are only
triggered after an LLM has confirmed the necessity.

Design Choices:
- Modified 'investigate-and-bisect' in
  'openqa-label-known-issues-and-investigate-hook' to call
  'openqa-llm-investigate'.
- The bash function now captures the output of the LLM script. If no
  URL is returned (meaning the LLM decided against investigation),
  the process terminates early.
- Updated the test suite to include mocks for the LLM script and
  verified both the 'YES' and 'NO' investigation paths.

Benefits:
- Prevents redundant resource consumption by filtering investigation
  candidates through an intelligent gatekeeper.
- Provides consistent behavior with the existing 'label' mechanism.
Motivation:
Ensure the investigation workflow continues even if the LLM server is
unavailable or the openqa-llm-investigate script fails for other reasons.

Design Choices:
- Modified the investigate-and-bisect function to catch failures from
  openqa-llm-investigate.
- Added a warning message when a failure occurs.
- Implemented a fallback that uses the original test URL for standard
  investigation when the LLM script fails.
- Expanded the Bash test suite to verify the fallback mechanism and
  ensure correct behavior when the LLM script suggests skipping.

Benefits:
- Increases robustness of the automated investigation pipeline.
- Prevents blocking the entire investigation process due to transient
  LLM service issues.
@okurz okurz force-pushed the feature/003_llm_investigation branch from 1edd8e1 to 1456376 Compare April 17, 2026 16:51
@okurz
Copy link
Copy Markdown
Member Author

okurz commented Apr 17, 2026

included all suggestions from @d3flex

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants