[WIP] feat: add watchdog for runtime session handling (#187)
Conversation
Pull request overview
Adds a session “watchdog” mechanism to the CLI runtime so it can periodically detect and clean up unhealthy sessions (dead tasks / idle sessions), and introduces session health reporting to support that monitoring.
Changes:
- Introduce `SessionWatchdog` background task with periodic session health checks and removal hooks.
- Add `SessionHealthInfo` and activity tracking to session servers, and integrate the watchdog lifecycle into `UiPathMcpRuntime`.
- Add unit tests for watchdog behavior and update `.gitignore` for common IDE folders.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `tests/test_watchdog.py` | Adds unit tests covering watchdog detection logic and lifecycle. |
| `src/uipath_mcp/_cli/_runtime/_watchdog.py` | New watchdog implementation and default timeout/interval constants. |
| `src/uipath_mcp/_cli/_runtime/_session.py` | Adds `SessionHealthInfo`, transport identifiers, and last-activity tracking. |
| `src/uipath_mcp/_cli/_runtime/_runtime.py` | Implements watchdog provider methods and starts/stops the watchdog with the runtime lifecycle. |
| `.gitignore` | Ignores `.idea/`, `.vscode/`, and `.claude/`. |
```python
tracer = trace.get_tracer(__name__)


WATCHDOG_CHECK_INTERVAL = 30  # seconds
SESSION_IDLE_TIMEOUT = 60  # 1 minute for testing purposes - to be changed to 1 hour
```
SESSION_IDLE_TIMEOUT is set to 60 seconds with a note "for testing purposes" but this value is used as the default in production (UiPathMcpRuntime constructs SessionWatchdog(self) without overriding idle_timeout). This will aggressively terminate idle sessions after 1 minute. Please either change the default to the intended production timeout (e.g., 1 hour), or make the timeout configurable and ensure runtime passes the correct value.
Suggested change:

```python
SESSION_IDLE_TIMEOUT = 3600  # 1 hour default idle timeout for production
```
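One way to reconcile the two needs is to keep a production default while letting tests pass a shorter value explicitly or via an environment variable. A minimal sketch, assuming a `provider` argument and a hypothetical `UIPATH_SESSION_IDLE_TIMEOUT` variable (neither is the PR's actual API):

```python
import os

WATCHDOG_CHECK_INTERVAL = 30   # seconds
SESSION_IDLE_TIMEOUT = 3600    # 1 hour production default

class SessionWatchdog:
    """Sketch only: resolve the idle timeout from an explicit argument,
    a hypothetical environment override, or the production default."""

    def __init__(self, provider, idle_timeout=None):
        self._provider = provider
        if idle_timeout is None:
            # UIPATH_SESSION_IDLE_TIMEOUT is an assumed variable name, so
            # tests can shorten the timeout without patching constants.
            idle_timeout = float(
                os.environ.get("UIPATH_SESSION_IDLE_TIMEOUT", SESSION_IDLE_TIMEOUT)
            )
        self.idle_timeout = idle_timeout
```

With this shape, `UiPathMcpRuntime` can keep constructing `SessionWatchdog(self)` with defaults while tests pass `idle_timeout=60` directly.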
```python
session_server = self._session_servers.pop(session_id, None)
if session_server is not None:
    logger.warning(
        f"Removing session {session_id}: {reason}"
    )
    try:
        await session_server.stop()
    except Exception:
        logger.error(
            f"Error stopping session {session_id} during watchdog removal",
            exc_info=True,
        )
    await self._close_session_on_server(session_id)
```
remove_session() pops the session from _session_servers before awaiting session_server.stop() and the server-side DELETE. While those awaits are in-flight, _handle_signalr_message() can observe the session_id missing and create/start a new session server for the same session_id, racing with the close and potentially causing flapping. Consider preventing re-creation while a session is being removed (e.g., keep an entry with a 'closing' marker, or guard with an async lock, or only pop after the stop/close completes).
Suggested change:

```python
session_server = self._session_servers.get(session_id)
if session_server is not None:
    logger.warning(
        f"Removing session {session_id}: {reason}"
    )
    try:
        try:
            await session_server.stop()
        except Exception:
            logger.error(
                f"Error stopping session {session_id} during watchdog removal",
                exc_info=True,
            )
        await self._close_session_on_server(session_id)
    finally:
        # Only remove the session from the registry after shutdown completes
        self._session_servers.pop(session_id, None)
```
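The review also mentions a "closing marker" as an alternative guard. A rough sketch of that idea, with all names illustrative rather than taken from the PR: the session id stays in the registry, mapped to a sentinel, until teardown completes, so a concurrent message handler sees the id as occupied and does not start a new server for it.

```python
import asyncio

_CLOSING = object()  # sentinel kept in the registry while teardown is in flight

class RuntimeSketch:
    """Sketch of the 'closing marker' alternative; not the PR's actual code."""

    def __init__(self):
        self._session_servers = {}

    def can_create_session(self, session_id):
        # A message handler would check this before creating a server;
        # it is False both for live sessions and sessions being closed.
        return session_id not in self._session_servers

    async def remove_session(self, session_id, reason):
        server = self._session_servers.get(session_id)
        if server is None or server is _CLOSING:
            return  # already gone, or another task is closing it
        self._session_servers[session_id] = _CLOSING  # block re-creation
        try:
            await server.stop()
        finally:
            self._session_servers.pop(session_id, None)
```

Compared with popping only after the stop completes, the sentinel also makes concurrent `remove_session` calls for the same id idempotent.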
```python
async def _cleanup(self) -> None:
    """Clean up all resources."""
    if self._cleanup_done:
        return
    self._cleanup_done = True

    await self._on_runtime_abort()

    if self._token_refresher:
        await self._token_refresher.stop()

    if self._keep_alive_task:
        self._keep_alive_task.cancel()
        try:
            await self._keep_alive_task
        except asyncio.CancelledError:
            pass

    if self._watchdog:
        await self._watchdog.stop()
        self._watchdog = None
```
In _cleanup(), the watchdog is stopped only after awaiting _on_runtime_abort(), token refresher shutdown, and keep-alive cancellation. During those awaits, the watchdog can still run and concurrently mutate/stop sessions while cleanup is also stopping sessions, creating races and noisy logs. Consider stopping the watchdog immediately after setting _cleanup_done (or at least before awaiting long-running calls) to make cleanup deterministic.
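The suggested ordering can be sketched as follows — a minimal model with illustrative names, not the PR's actual `_cleanup`: stop the watchdog immediately after the reentrancy guard, before any other await, so nothing else mutates sessions while the rest of cleanup runs.

```python
import asyncio

class RuntimeSketch:
    """Sketch of the suggested cleanup ordering."""

    def __init__(self, watchdog, events):
        self._cleanup_done = False
        self._watchdog = watchdog
        self._events = events  # records ordering for the demo

    async def _cleanup(self):
        if self._cleanup_done:
            return
        self._cleanup_done = True

        # Stop the watchdog first: after this await returns, it can no
        # longer race the session teardown below.
        if self._watchdog:
            await self._watchdog.stop()
            self._watchdog = None

        self._events.append("rest of cleanup")
```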
Pull request overview
Copilot reviewed 4 out of 5 changed files in this pull request and generated 2 comments.
```python
WATCHDOG_CHECK_INTERVAL = 30  # seconds
SESSION_IDLE_TIMEOUT = 60  # 1 minute for testing purposes - to be changed to 1 hour
```
SESSION_IDLE_TIMEOUT defaults to 60 seconds (and UiPathMcpRuntime starts the watchdog with defaults). That will aggressively tear down sessions in production after 1 minute of inactivity and can also prematurely cancel sandboxed runtimes via remove_session(). Please bump the default to the intended value (e.g., 3600s) and/or make it configurable via runtime/server config so tests can still use a shorter timeout without affecting real runs.
```python
health = s.get_health_info()
runtime_sessions[sid] = {
    "task_done": health.task_done,
    "active_requests": len(s._active_requests),
```
_on_keep_alive_response() reaches into BaseSessionServer's private _active_requests to log active request counts. This tightly couples the runtime to session internals and makes refactors risky (especially since SessionHealthInfo was introduced for health reporting). Consider exposing an official API for this (e.g., add an active_request_count field back to SessionHealthInfo or add a public method/property on BaseSessionServer) and use that instead of accessing _active_requests directly.
Suggested change:

```python
"active_requests": health.active_request_count,
```
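The public-API shape the comment asks for could look like this — a minimal sketch assuming illustrative field and class names, not the PR's actual definitions: the count travels in the health snapshot, so the only code touching `_active_requests` lives inside the session server itself.

```python
from dataclasses import dataclass

@dataclass
class SessionHealthInfo:
    """Sketch: carry the request count in the health snapshot so callers
    never reach into session internals."""
    task_done: bool
    active_request_count: int

class BaseSessionServerSketch:
    def __init__(self):
        self._active_requests = {}  # private: request id -> in-flight state

    def get_health_info(self) -> SessionHealthInfo:
        # The only reader of _active_requests; everyone else consumes
        # the public snapshot.
        return SessionHealthInfo(
            task_done=False,
            active_request_count=len(self._active_requests),
        )
```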
Added `SessionWatchdog` that periodically monitors session health and cleans up dead or idle sessions. Sessions accumulate locally after the server removes them (e.g. when an agent finishes); the watchdog detects completed tasks and idle timeouts, then tears down the local session.

Also did the following refactors:

- Unified cleanup into a single `remove_session` method, replacing 4 inconsistent cleanup paths (`_handle_signalr_session_closed`, `remove_session`, `_cleanup` loop, `_monitor_http_server_process`)
- Removed `active_request_count` from `SessionHealthInfo`
- Added a `_cleanup_done` guard to `_handle_signalr_session_closed` to prevent racing with shutdown
- Removed the `_run_server` finally block cancelling `_keep_alive_task` redundantly (now only cancelled by `_cleanup`)
- Extracted the `on_keep_alive_response` closure into an `_on_keep_alive_response` method
- Added a `_get_server_env` helper to deduplicate env var merging (also fixed `_register` mutating `self._server.env` by not copying)
- Fixed the `_on_runtime_abort` docstring and log message incorrectly referring to "SignalR" (it's an HTTP POST)
- Replaced the bare `ValueError` in `_wait_for_http_server_ready` with `UiPathMcpRuntimeError` for consistency
- Changed `{str(e)}` to `{e}` in f-strings

Development Package

Use `uipath pack --nolock` to get the latest dev build from this PR (requires version range).
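The watchdog behavior described above can be sketched roughly as a background task that wakes up on an interval, asks each session for its health, and removes sessions whose task finished or that have sat idle too long. The provider protocol below (a `sessions` mapping plus `remove_session`) is an assumption for illustration, not the PR's actual interface.

```python
import asyncio
import time

class SessionWatchdogSketch:
    """Rough sketch of the described watchdog loop; names are illustrative."""

    def __init__(self, provider, check_interval=30, idle_timeout=3600):
        self._provider = provider
        self._interval = check_interval
        self._idle_timeout = idle_timeout
        self._task = None

    async def start(self):
        self._task = asyncio.create_task(self._run())

    async def stop(self):
        if self._task:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            self._task = None

    async def _run(self):
        while True:
            await asyncio.sleep(self._interval)
            await self.check_sessions()

    async def check_sessions(self):
        now = time.monotonic()
        # Copy the items so remove_session can mutate the mapping safely.
        for sid, server in list(self._provider.sessions.items()):
            health = server.get_health_info()
            if health.task_done:
                await self._provider.remove_session(sid, "task completed")
            elif now - health.last_activity > self._idle_timeout:
                await self._provider.remove_session(sid, "idle timeout")
```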