diff --git a/.github/workflows/release-notes-check.yml b/.github/workflows/release-notes-check.yml index 2eb7cee1..9a9f0d1f 100644 --- a/.github/workflows/release-notes-check.yml +++ b/.github/workflows/release-notes-check.yml @@ -22,7 +22,7 @@ jobs: - name: Get changed files id: changed-files - uses: tj-actions/changed-files@v44 + uses: tj-actions/changed-files@v46.0.1 with: files_yaml: | code: diff --git a/.gitignore b/.gitignore index 02f0c972..003d72a2 100644 --- a/.gitignore +++ b/.gitignore @@ -45,3 +45,6 @@ flask_session tmp**cwd /tmp_images nul +/.github/plans +*.xlsx +/artifacts/tests diff --git a/CLAUDE.md b/CLAUDE.md index 4e9c2830..2ad07a8a 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -58,6 +58,7 @@ return render_template('page.html', settings=public_settings) ## Version Management +- It's important to update the version at the end of every plan - Version is stored in `config.py`: `VERSION = "X.XXX.XXX"` - When incrementing, only change the third segment (e.g., `0.238.024` -> `0.238.025`) - Include the current version in functional test file headers and documentation files @@ -83,7 +84,7 @@ return render_template('page.html', settings=public_settings) ## Release Notes -After completing code changes, offer to update `docs/explanation/release_notes.md`. +After completing plans and code changes, offer to update `docs/explanation/release_notes.md`. - Add entries under the current version from `config.py` - If the version was bumped, create a new section at the top: `### **(vX.XXX.XXX)**` diff --git a/README.md b/README.md index 31ea020b..5ebf8208 100644 --- a/README.md +++ b/README.md @@ -121,6 +121,58 @@ This step will begin the deployment process. azd up ``` +## Deployment Runtime Notes + +### Container +> [!NOTE] +> +> Container deployments of Simple Chat do NOT need this step. When you run `azd up` for new installs or `azd deploy` for updates, the container is already configured to run with Gunicorn.
+ +- The repo-provided `azd`, Bicep, Terraform, and Azure CLI deployers are **container-based** App Service deployments. +- For those container deployments, do **not** set an App Service Stack Settings Startup command. + - The container already starts Gunicorn through `application/single_app/Dockerfile`. + +### Native Python +- For **native Python App Service** deployments, deploy the `application/single_app` folder and set the App Service Startup command explicitly. + +Native Python deployment references: + +- [Manual deployment notes](./docs/reference/deploy/manual_deploy.md) +- [Manual setup steps](./docs/setup_instructions_manual.md#installing-and-deploying-the-application-code) +- [VS Code deployment steps](./docs/setup_instructions_manual.md#deploying-via-vs-code-recommended-for-simplicity) +- [Azure CLI ZIP deploy steps](./docs/setup_instructions_manual.md#deploying-via-azure-cli-zip-deploy) + +To set the Startup command in Azure Portal: + +1. Go to the App Service. +2. Open **Settings** > **Configuration** > **Stack Settings**. +3. Enter the following Startup command. +4. Save the change, then stop and start the app. + +Use this Startup command for native Python App Service deployments: + +```bash +python -m gunicorn -c gunicorn.conf.py app:app +``` + +> [!IMPORTANT] +> +> Running Simple Chat with gunicorn improves the experience with better request handling and concurrency. + +## Upgrade Paths + +- For a concise upgrade decision guide, see [docs/how-to/upgrade_paths.md](docs/how-to/upgrade_paths.md). + +### Container +- **Container-based upgrades** should usually start with `azd deploy` for code-only changes. Use `azd up` only when the release also changes infrastructure. +- If your App Service is already configured to pull from ACR and you want image-only rollouts, use the ACR/image refresh approach described in [docs/how-to/upgrade_paths.md](docs/how-to/upgrade_paths.md) instead of treating every release as a full reprovisioning event.
+ +### Native Python +- **Native Python App Service upgrades** should reuse the manual deployment path, validate the Startup command above, and deploy the `application/single_app` folder with VS Code or Azure CLI ZIP deploy. +- [Manual native Python upgrade guide](./docs/setup_instructions_manual.md#upgrading-the-application) +- [Native Python ZIP deploy reference](./docs/setup_instructions_manual.md#deploying-via-azure-cli-zip-deploy) +- [Native Python deployment notes](./docs/reference/deploy/manual_deploy.md) + ## Architecture ![Architecture](./docs/images/architecture.png) @@ -144,6 +196,7 @@ azd up - **Metadata Extraction (Optional)**: Apply an AI model (configurable GPT model via Admin Settings) to automatically generate keywords, two-sentence summaries, and infer author/date for uploaded documents. Allows manual override for richer search context. - **File Processing Logs (Optional)**: Enable verbose logging for all ingestion pipelines (workspaces and ephemeral chat uploads) to aid in debugging, monitoring, and auditing file processing steps. - **Redis Cache (Optional)**: Integrate Azure Cache for Redis to provide a distributed, high-performance session store. This enables true horizontal scaling and high availability by decoupling user sessions from individual app instances. +- **SQL Database Agents (Optional)**: Connect agents to Azure SQL or other SQL databases through configurable SQL Query and SQL Schema plugins. Database schema is automatically discovered and injected into agent instructions at load time, enabling agents to answer natural language questions by generating and executing SQL queries without requiring users to know table or column names. - **Authentication & RBAC**: Secure access via Azure Active Directory (Entra ID) using MSAL. Supports Managed Identities for Azure service authentication, group-based controls, and custom application roles (`Admin`, `User`, `CreateGroup`, `SafetyAdmin`, `FeedbackAdmin`). 
- **Supported File Types**: diff --git a/application/single_app/Dockerfile b/application/single_app/Dockerfile index 65483ac6..6f04b41a 100644 --- a/application/single_app/Dockerfile +++ b/application/single_app/Dockerfile @@ -7,15 +7,13 @@ FROM mcr.microsoft.com/azurelinux/base/python:3.12 AS builder ARG UID ARG GID -# Setup pip.conf if has content -COPY pip.conf.d/ /etc/pip.conf.d +# Copy pip.conf into the image for pip configuration +COPY docker-customization/pip.conf /etc/pip.conf # CA # copy certs to /etc/pki/ca-trust/source/anchors -COPY custom-ca-certificates/ /etc/ssl/certs -RUN mkdir -p /etc/pki/ca-trust/source/anchors/ \ - && update-ca-trust enable \ - && cp /etc/ssl/certs/*.crt /etc/pki/ca-trust/source/anchors/ \ +COPY docker-customization/custom-ca-certificates/ /etc/pki/ca-trust/source/anchors +RUN update-ca-trust enable \ && update-ca-trust extract ENV PYTHONUNBUFFERED=1 @@ -44,6 +42,7 @@ ARG UID ARG GID COPY --from=builder /etc/pki /etc/pki +COPY --from=builder /etc/ssl/certs /etc/ssl/certs COPY --from=builder /home/nonroot /home/nonroot COPY --from=builder /etc/passwd /etc/passwd COPY --from=builder /etc/group /etc/group @@ -59,8 +58,11 @@ ENV HOME=/home/nonroot \ PYTHONIOENCODING=utf-8 \ LANG=C.UTF-8 \ LC_ALL=C.UTF-8 \ - PYTHONUNBUFFERED=1 - + PYTHONUNBUFFERED=1 \ + SSL_CERT_FILE=/etc/ssl/certs/ca-bundle.crt \ + SSL_CERT_DIR=/etc/ssl/certs \ + REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-bundle.crt + WORKDIR /app # Copy application code and set ownership @@ -69,4 +71,4 @@ COPY --chown=${UID}:${GID} application/single_app ./ # Expose port EXPOSE 5000 -ENTRYPOINT [ "python3", "/app/app.py" ] +ENTRYPOINT ["python3", "-m", "gunicorn", "-c", "/app/gunicorn.conf.py", "app:app"] diff --git a/application/single_app/app.py b/application/single_app/app.py index 2354b1b5..ca74071a 100644 --- a/application/single_app/app.py +++ b/application/single_app/app.py @@ -75,6 +75,7 @@ from route_backend_public_prompts import * from route_backend_user_agreement import 
register_route_backend_user_agreement from route_backend_conversation_export import register_route_backend_conversation_export +from route_backend_thoughts import register_route_backend_thoughts from route_backend_speech import register_route_backend_speech from route_backend_tts import register_route_backend_tts from route_enhanced_citations import register_enhanced_citations_routes @@ -102,7 +103,7 @@ # Ensure filesystem session directory (when used) points to a writable path inside container. if SESSION_TYPE == 'filesystem': - app.config['SESSION_FILE_DIR'] = SESSION_FILE_DIR if 'SESSION_FILE_DIR' in globals() else os.environ.get('SESSION_FILE_DIR', '/app/flask_session') + app.config['SESSION_FILE_DIR'] = globals().get('SESSION_FILE_DIR', os.environ.get('SESSION_FILE_DIR', '/app/flask_session')) try: os.makedirs(app.config['SESSION_FILE_DIR'], exist_ok=True) except Exception as e: @@ -141,9 +142,30 @@ from functions_settings import get_settings from functions_authentication import get_current_user_id from functions_global_agents import ensure_default_global_agent_exists +from background_tasks import start_background_task_threads from route_external_health import * +_app_init_lock = threading.Lock() +_app_initialized = False +_background_tasks_lock = threading.Lock() +_background_tasks_started = False + + +def is_running_under_gunicorn(): + """Return True when the current process is a Gunicorn worker.""" + server_software = os.environ.get('SERVER_SOFTWARE', '') + return 'gunicorn' in server_software.lower() or bool(os.environ.get('GUNICORN_CMD_ARGS')) + + +def should_start_background_tasks(): + """Enable background loops unless the runtime explicitly disables them.""" + env_value = os.environ.get('SIMPLECHAT_RUN_BACKGROUND_TASKS') + if env_value is not None: + return env_value.strip().lower() not in ('0', 'false', 'no', 'off') + + return True + # =================== Session Configuration =================== def configure_sessions(settings): """Configure session 
backend (Redis or filesystem) once. @@ -160,7 +182,7 @@ def configure_sessions(settings): redis_client = None try: if redis_auth_type == 'managed_identity': - print("Redis enabled using Managed Identity") + log_event("Redis enabled using Managed Identity", level=logging.INFO) from config import get_redis_cache_infrastructure_endpoint credential = DefaultAzureCredential() redis_hostname = redis_url.split('.')[0] @@ -175,9 +197,25 @@ def configure_sessions(settings): socket_connect_timeout=5, socket_timeout=5 ) + elif redis_auth_type == 'key_vault': + log_event("Redis enabled using Key Vault Secret", level=logging.INFO) + from functions_keyvault import retrieve_secret_direct + redis_key_secret_name = settings.get('redis_key', '').strip() + redis_password = retrieve_secret_direct(redis_key_secret_name) + if redis_password: + redis_password = redis_password.strip() + redis_client = Redis( + host=redis_url, + port=6380, + db=0, + password=redis_password, + ssl=True, + socket_connect_timeout=5, + socket_timeout=5 + ) else: redis_key = settings.get('redis_key', '').strip() - print("Redis enabled using Access Key") + log_event("Redis enabled using Access Key", level=logging.INFO) redis_client = Redis( host=redis_url, port=6380, @@ -190,7 +228,7 @@ def configure_sessions(settings): # Test the connection redis_client.ping() - print("✅ Redis connection successful") + log_event("✅ Redis connection successful", level=logging.INFO) app.config['SESSION_TYPE'] = 'redis' app.config['SESSION_REDIS'] = redis_client @@ -212,202 +250,66 @@ def configure_sessions(settings): Session(app) # =================== Helper Functions =================== -@app.before_first_request -def before_first_request(): - print("Initializing application...") - settings = get_settings(use_cosmos=True) - app_settings_cache.configure_app_cache(settings, get_redis_cache_infrastructure_endpoint(settings.get('redis_url', '').strip().split('.')[0])) - app_settings_cache.update_settings_cache(settings) - 
sanitized_settings = sanitize_settings_for_logging(settings) - debug_print(f"DEBUG:Application settings: {sanitized_settings}") - sanitized_settings_cache = sanitize_settings_for_logging(app_settings_cache.get_settings_cache()) - debug_print(f"DEBUG:App settings cache initialized: {'Using Redis cache:' + str(app_settings_cache.app_cache_is_using_redis)} {sanitized_settings_cache}") - - initialize_clients(settings) - ensure_custom_logo_file_exists(app, settings) - # Enable Application Insights logging globally if configured - print("Setting up Application Insights logging...") - setup_appinsights_logging(settings) - logging.basicConfig(level=logging.DEBUG) - print("Application initialized.") - ensure_default_global_agent_exists() - - # Background task to check for expired logging timers - def check_logging_timers(): - """Background task that checks for expired logging timers and disables logging accordingly""" - while True: - try: - settings = get_settings() - current_time = datetime.now() - settings_changed = False - - # Check debug logging timer - if (settings.get('enable_debug_logging', False) and - settings.get('debug_logging_timer_enabled', False) and - settings.get('debug_logging_turnoff_time')): - - turnoff_time = settings.get('debug_logging_turnoff_time') - if isinstance(turnoff_time, str): - try: - turnoff_time = datetime.fromisoformat(turnoff_time) - except: - turnoff_time = None - - if turnoff_time and current_time >= turnoff_time: - debug_print(f"logging timer expired at {turnoff_time}. 
Disabling debug logging.") - settings['enable_debug_logging'] = False - settings['debug_logging_timer_enabled'] = False - settings['debug_logging_turnoff_time'] = None - settings_changed = True - - # Check file processing logs timer - if (settings.get('enable_file_processing_logs', False) and - settings.get('file_processing_logs_timer_enabled', False) and - settings.get('file_processing_logs_turnoff_time')): - - turnoff_time = settings.get('file_processing_logs_turnoff_time') - if isinstance(turnoff_time, str): - try: - turnoff_time = datetime.fromisoformat(turnoff_time) - except: - turnoff_time = None - - if turnoff_time and current_time >= turnoff_time: - print(f"File processing logs timer expired at {turnoff_time}. Disabling file processing logs.") - settings['enable_file_processing_logs'] = False - settings['file_processing_logs_timer_enabled'] = False - settings['file_processing_logs_turnoff_time'] = None - settings_changed = True - - # Save settings if any changes were made - if settings_changed: - update_settings(settings) - print("Logging settings updated due to timer expiration.") - - except Exception as e: - print(f"Error in logging timer check: {e}") - log_event(f"Error in logging timer check: {e}", level=logging.ERROR) - - # Check every 60 seconds - time.sleep(60) - - # Start the background timer check thread - timer_thread = threading.Thread(target=check_logging_timers, daemon=True) - timer_thread.start() - print("Logging timer background task started.") - - # Background task to check for expired approval requests - def check_expired_approvals(): - """Background task that checks for expired approval requests and auto-denies them""" - while True: - try: - from functions_approvals import auto_deny_expired_approvals - denied_count = auto_deny_expired_approvals() - if denied_count > 0: - print(f"Auto-denied {denied_count} expired approval request(s).") - except Exception as e: - print(f"Error in approval expiration check: {e}") - log_event(f"Error in 
approval expiration check: {e}", level=logging.ERROR) - - # Check every 6 hours (21600 seconds) - time.sleep(21600) - - # Start the approval expiration check thread - approval_thread = threading.Thread(target=check_expired_approvals, daemon=True) - approval_thread.start() - print("Approval expiration background task started.") - - # Background task to check retention policy execution time - def check_retention_policy(): - """Background task that executes retention policy at scheduled time""" - while True: - try: - settings = get_settings() - - # Check if any retention policy is enabled - personal_enabled = settings.get('enable_retention_policy_personal', False) - group_enabled = settings.get('enable_retention_policy_group', False) - public_enabled = settings.get('enable_retention_policy_public', False) - - if personal_enabled or group_enabled or public_enabled: - current_time = datetime.now(timezone.utc) - - # Check if next scheduled run time has passed - next_run = settings.get('retention_policy_next_run') - should_run = False - - if next_run: - try: - next_run_dt = datetime.fromisoformat(next_run) - # Run if we've passed the scheduled time - if current_time >= next_run_dt: - should_run = True - except Exception as parse_error: - print(f"Error parsing next_run timestamp: {parse_error}") - # If we can't parse, fall back to checking last_run - last_run = settings.get('retention_policy_last_run') - if last_run: - try: - last_run_dt = datetime.fromisoformat(last_run) - # Run if last run was more than 23 hours ago - if (current_time - last_run_dt).total_seconds() > (23 * 3600): - should_run = True - except: - should_run = True - else: - should_run = True - else: - # No next_run set, check last_run instead - last_run = settings.get('retention_policy_last_run') - if last_run: - try: - last_run_dt = datetime.fromisoformat(last_run) - # Run if last run was more than 23 hours ago - if (current_time - last_run_dt).total_seconds() > (23 * 3600): - should_run = True - except: 
- should_run = True - else: - # Never run before, execute now - should_run = True - - if should_run: - print(f"Executing scheduled retention policy at {current_time.isoformat()}") - from functions_retention_policy import execute_retention_policy - results = execute_retention_policy(manual_execution=False) - - if results.get('success'): - print(f"Retention policy execution completed: " - f"{results['personal']['conversations']} personal conversations, " - f"{results['personal']['documents']} personal documents, " - f"{results['group']['conversations']} group conversations, " - f"{results['group']['documents']} group documents, " - f"{results['public']['conversations']} public conversations, " - f"{results['public']['documents']} public documents deleted.") - else: - print(f"Retention policy execution failed: {results.get('errors')}") - - except Exception as e: - print(f"Error in retention policy check: {e}") - log_event(f"Error in retention policy check: {e}", level=logging.ERROR) - - # Check every 5 minutes for more responsive scheduling - time.sleep(300) - - # Start the retention policy check thread - retention_thread = threading.Thread(target=check_retention_policy, daemon=True) - retention_thread.start() - print("Retention policy background task started.") - - # Initialize Semantic Kernel and plugins - enable_semantic_kernel = settings.get('enable_semantic_kernel', False) - per_user_semantic_kernel = settings.get('per_user_semantic_kernel', False) - if enable_semantic_kernel and not per_user_semantic_kernel: - print("Semantic Kernel is enabled. 
Initializing...") - initialize_semantic_kernel() +def start_background_tasks(): + """Start background loops once per process when enabled for the current runtime.""" + global _background_tasks_started + + with _background_tasks_lock: + if _background_tasks_started: + return + + if not should_start_background_tasks(): + print("Background tasks disabled for this web process.") + _background_tasks_started = True + return + start_background_task_threads() + _background_tasks_started = True + + +def initialize_application(force=False): + """Initialize caches, clients, sessions, and optional background services once per process.""" + global _app_initialized + + with _app_init_lock: + if _app_initialized and not force: + return + + print("Initializing application...") + settings = get_settings(use_cosmos=True) + redis_hostname = settings.get('redis_url', '').strip().split('.')[0] + app_settings_cache.configure_app_cache( + settings, + get_redis_cache_infrastructure_endpoint(redis_hostname) + ) + app_settings_cache.update_settings_cache(settings) + sanitized_settings = sanitize_settings_for_logging(settings) + debug_print(f"DEBUG:Application settings: {sanitized_settings}") + sanitized_settings_cache = sanitize_settings_for_logging(app_settings_cache.get_settings_cache()) + debug_print(f"DEBUG:App settings cache initialized: {'Using Redis cache:' + str(app_settings_cache.app_cache_is_using_redis)} {sanitized_settings_cache}") + + initialize_clients(settings) + ensure_custom_logo_file_exists(app, settings) + print("Setting up Application Insights logging...") + setup_appinsights_logging(settings) + logging.basicConfig(level=logging.DEBUG) + ensure_default_global_agent_exists() + + start_background_tasks() + + enable_semantic_kernel = settings.get('enable_semantic_kernel', False) + per_user_semantic_kernel = settings.get('per_user_semantic_kernel', False) + if enable_semantic_kernel and not per_user_semantic_kernel: + print("Semantic Kernel is enabled. 
Initializing...") + initialize_semantic_kernel() + + configure_sessions(settings) + _app_initialized = True + print("Application initialized.") - # Unified session setup - configure_sessions(settings) + +@app.before_request +def ensure_application_initialized(): + initialize_application() @app.context_processor def inject_settings(): @@ -487,6 +389,16 @@ def markdown_filter(text): # Add the filter to the Jinja environment app.jinja_env.filters['markdown'] = markdown_filter +# Register a custom Jinja filter for nl2br (newline to
<br>) +def nl2br_filter(value): + """Escape HTML then convert newline characters to <br>
tags.""" + from markupsafe import escape, Markup + if not value: + return Markup('') + return Markup(str(escape(value)).replace('\n', '<br>
\n')) + +app.jinja_env.filters['nl2br'] = nl2br_filter + # =================== Default Routes ===================== @app.route('/') @swagger_route(security=get_auth_security()) @@ -641,16 +553,27 @@ def list_semantic_kernel_plugins(): # ------------------- API User Agreement Routes ---------- register_route_backend_user_agreement(app) +# ------------------- API Thoughts Routes ---------------- +register_route_backend_thoughts(app) + # ------------------- Extenral Health Routes ---------- register_route_external_health(app) if __name__ == '__main__': - settings = get_settings(use_cosmos=True) - app_settings_cache.configure_app_cache(settings, get_redis_cache_infrastructure_endpoint(settings.get('redis_url', '').strip().split('.')[0])) - app_settings_cache.update_settings_cache(settings) - initialize_clients(settings) - debug_mode = os.environ.get("FLASK_DEBUG", "0") == "1" + use_gunicorn = os.environ.get("SIMPLECHAT_USE_GUNICORN", "0").strip().lower() in ('1', 'true', 'yes', 'on') + + if use_gunicorn and not debug_mode: + gunicorn_config_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'gunicorn.conf.py') + print(f"Starting Gunicorn using {gunicorn_config_path}") + os.execvp(sys.executable, [sys.executable, '-m', 'gunicorn', '-c', gunicorn_config_path, 'app:app']) + + if use_gunicorn and debug_mode: + print("⚠️ WARNING: Both Gunicorn and Flask debug mode are enabled, which is not supported. Please disable one of them, app will not run until resolved.") + log_event("WARNING: Running with both Gunicorn and Flask debug mode is not supported. 
Please disable one of them, app will not run until resolved.", level=logging.WARNING) + sys.exit(1) + + initialize_application(force=True) if debug_mode: # Local development with HTTPS diff --git a/application/single_app/app_settings_cache.py b/application/single_app/app_settings_cache.py index cf908540..e7345efb 100644 --- a/application/single_app/app_settings_cache.py +++ b/application/single_app/app_settings_cache.py @@ -5,9 +5,14 @@ This supports the dynamic selection of redis or in-memory caching of settings. """ import json +import logging from redis import Redis from azure.identity import DefaultAzureCredential +# NOTE: functions_keyvault is imported locally inside configure_app_cache to avoid a circular +# import (functions_keyvault -> app_settings_cache -> functions_keyvault). +# functions_appinsights is also imported locally for the same reason. + _settings = None APP_SETTINGS_CACHE = {} update_settings_cache = None @@ -16,6 +21,8 @@ def configure_app_cache(settings, redis_cache_endpoint=None): global _settings, update_settings_cache, get_settings_cache, APP_SETTINGS_CACHE, app_cache_is_using_redis + # Local import to avoid a circular dependency; see the module-level NOTE.
+ from functions_appinsights import log_event _settings = settings use_redis = _settings.get('enable_redis_cache', False) @@ -24,9 +31,8 @@ def configure_app_cache(settings, redis_cache_endpoint=None): redis_url = settings.get('redis_url', '').strip() redis_auth_type = settings.get('redis_auth_type', 'key').strip().lower() if redis_auth_type == 'managed_identity': - print("[ASC] Redis enabled using Managed Identity") + log_event("[ASC] Redis enabled using Managed Identity", level=logging.INFO) credential = DefaultAzureCredential() - redis_hostname = redis_url.split('.')[0] cache_endpoint = redis_cache_endpoint token = credential.get_token(cache_endpoint) redis_client = Redis( @@ -36,9 +42,32 @@ def configure_app_cache(settings, redis_cache_endpoint=None): password=token.token, ssl=True ) + elif redis_auth_type == 'key_vault': + log_event("[ASC] Redis enabled using Key Vault Secret", level=logging.INFO) + # Local import to avoid circular dependency: functions_keyvault imports app_settings_cache. + from functions_keyvault import retrieve_secret_direct + redis_key_secret_name = settings.get('redis_key', '').strip() + try: + # Pass settings directly: get_settings_cache() is still None at this point + # because configure_app_cache has not finished initialising the cache yet. 
+ redis_password = retrieve_secret_direct(redis_key_secret_name, settings=settings) + if redis_password: + redis_password = redis_password.strip() + log_event("[ASC] Redis key retrieved from Key Vault successfully", level=logging.INFO) + except Exception as kv_err: + log_event(f"[ASC] ERROR: Failed to retrieve Redis key from Key Vault: {kv_err}", level=logging.ERROR, exceptionTraceback=True) + raise + + redis_client = Redis( + host=redis_url, + port=6380, + db=0, + password=redis_password, + ssl=True + ) else: redis_key = settings.get('redis_key', '').strip() - print("[ASC] Redis enabled using Access Key") + log_event("[ASC] Redis enabled using Access Key", level=logging.INFO) redis_client = Redis( host=redis_url, port=6380, diff --git a/application/single_app/background_tasks.py b/application/single_app/background_tasks.py new file mode 100644 index 00000000..c978bf9e --- /dev/null +++ b/application/single_app/background_tasks.py @@ -0,0 +1,330 @@ +# background_tasks.py + +"""Shared background task runners for web-process and dedicated scheduler use.""" + +import logging +import os +import socket +import threading +import time +import uuid +from datetime import datetime, timedelta, timezone + +from azure.core import MatchConditions + +from config import cosmos_settings_container, exceptions +from functions_appinsights import log_event +from functions_debug import debug_print +from functions_settings import get_settings, update_settings + + +def _get_lock_holder_id(): + """Return a process-unique holder id for distributed background task locks.""" + return f"{socket.gethostname()}:{os.getpid()}:{threading.get_ident()}" + + +def _is_expired_timestamp(timestamp_value, current_time): + """Return True when the stored lock expiration timestamp is missing or expired.""" + if not timestamp_value: + return True + + try: + expiration_time = datetime.fromisoformat(timestamp_value) + except Exception: + return True + + return expiration_time <= current_time + + +def 
acquire_distributed_task_lock(task_name, lease_seconds): + """Acquire a Cosmos-backed lease for a background task across workers and instances.""" + current_time = datetime.now(timezone.utc) + expires_at = current_time + timedelta(seconds=lease_seconds) + lock_id = f"background_task_lock_{task_name}" + lock_body = { + 'id': lock_id, + 'type': 'background_task_lock', + 'task_name': task_name, + 'holder_id': _get_lock_holder_id(), + 'acquired_at': current_time.isoformat(), + 'expires_at': expires_at.isoformat(), + 'lease_seconds': lease_seconds, + 'lock_token': str(uuid.uuid4()) + } + + try: + cosmos_settings_container.create_item(body=lock_body) + return lock_body + except Exception as exc: + if getattr(exc, 'status_code', None) != 409: + log_event( + 'background_task_lock_create_error', + {'task_name': task_name, 'error': str(exc)}, + level=logging.ERROR + ) + return None + + try: + existing_lock = cosmos_settings_container.read_item(item=lock_id, partition_key=lock_id) + except Exception as exc: + log_event( + 'background_task_lock_read_error', + {'task_name': task_name, 'error': str(exc)}, + level=logging.ERROR + ) + return None + + if not _is_expired_timestamp(existing_lock.get('expires_at'), current_time): + return None + + replacement_lock = dict(existing_lock) + replacement_lock.update(lock_body) + + try: + cosmos_settings_container.replace_item( + item=lock_id, + body=replacement_lock, + etag=existing_lock.get('_etag'), + match_condition=MatchConditions.IfNotModified + ) + return replacement_lock + except Exception as exc: + status_code = getattr(exc, 'status_code', None) + if status_code not in (409, 412): + log_event( + 'background_task_lock_replace_error', + {'task_name': task_name, 'error': str(exc), 'status_code': status_code}, + level=logging.ERROR + ) + return None + + +def release_distributed_task_lock(lock_document): + """Release a previously acquired distributed background task lock.""" + if not lock_document: + return + + lock_id = 
lock_document.get('id') + holder_id = lock_document.get('holder_id') + if not lock_id or not holder_id: + return + + try: + current_lock = cosmos_settings_container.read_item(item=lock_id, partition_key=lock_id) + except Exception: + return + + if current_lock.get('holder_id') != holder_id: + return + + try: + cosmos_settings_container.delete_item( + item=lock_id, + partition_key=lock_id, + etag=current_lock.get('_etag'), + match_condition=MatchConditions.IfNotModified + ) + except Exception: + return + + +def _should_run_retention_policy(settings, current_time): + """Return True when retention policy work should run for the current schedule state.""" + personal_enabled = settings.get('enable_retention_policy_personal', False) + group_enabled = settings.get('enable_retention_policy_group', False) + public_enabled = settings.get('enable_retention_policy_public', False) + + if not (personal_enabled or group_enabled or public_enabled): + return False + + next_run = settings.get('retention_policy_next_run') + if next_run: + try: + next_run_dt = datetime.fromisoformat(next_run) + return current_time >= next_run_dt + except Exception as parse_error: + print(f"Error parsing next_run timestamp: {parse_error}") + + last_run = settings.get('retention_policy_last_run') + if last_run: + try: + last_run_dt = datetime.fromisoformat(last_run) + return (current_time - last_run_dt).total_seconds() > (23 * 3600) + except Exception: + return True + + return True + + +def check_logging_timers_once(): + """Disable temporary logging settings after their timer expires.""" + settings = get_settings() + current_time = datetime.now() + settings_changed = False + + if ( + settings.get('enable_debug_logging', False) + and settings.get('debug_logging_timer_enabled', False) + and settings.get('debug_logging_turnoff_time') + ): + turnoff_time = settings.get('debug_logging_turnoff_time') + if isinstance(turnoff_time, str): + try: + turnoff_time = datetime.fromisoformat(turnoff_time) + except 
Exception: + turnoff_time = None + + if turnoff_time and current_time >= turnoff_time: + debug_print(f"logging timer expired at {turnoff_time}. Disabling debug logging.") + settings['enable_debug_logging'] = False + settings['debug_logging_timer_enabled'] = False + settings['debug_logging_turnoff_time'] = None + settings_changed = True + + if ( + settings.get('enable_file_processing_logs', False) + and settings.get('file_processing_logs_timer_enabled', False) + and settings.get('file_processing_logs_turnoff_time') + ): + turnoff_time = settings.get('file_processing_logs_turnoff_time') + if isinstance(turnoff_time, str): + try: + turnoff_time = datetime.fromisoformat(turnoff_time) + except Exception: + turnoff_time = None + + if turnoff_time and current_time >= turnoff_time: + print(f"File processing logs timer expired at {turnoff_time}. Disabling file processing logs.") + settings['enable_file_processing_logs'] = False + settings['file_processing_logs_timer_enabled'] = False + settings['file_processing_logs_turnoff_time'] = None + settings_changed = True + + if settings_changed: + update_settings(settings) + print("Logging settings updated due to timer expiration.") + + +def check_expired_approvals_once(): + """Auto-deny expired approval requests and return the affected count.""" + from functions_approvals import auto_deny_expired_approvals + + lock_document = acquire_distributed_task_lock('approval_expiry', lease_seconds=1800) + if not lock_document: + debug_print('Skipping approval expiration check because another worker holds the lease.') + return None + + try: + denied_count = auto_deny_expired_approvals() + if denied_count > 0: + print(f"Auto-denied {denied_count} expired approval request(s).") + finally: + release_distributed_task_lock(lock_document) + + return denied_count + + +def check_retention_policy_once(): + """Run scheduled retention processing when the next execution window is due.""" + settings = get_settings() + + current_time = 
datetime.now(timezone.utc) + + if not _should_run_retention_policy(settings, current_time): + return None + + lock_document = acquire_distributed_task_lock('retention_policy', lease_seconds=3600) + if not lock_document: + debug_print('Skipping retention policy check because another worker holds the lease.') + return None + + settings = get_settings() + current_time = datetime.now(timezone.utc) + if not _should_run_retention_policy(settings, current_time): + release_distributed_task_lock(lock_document) + return None + + print(f"Executing scheduled retention policy at {current_time.isoformat()}") + from functions_retention_policy import execute_retention_policy + + try: + results = execute_retention_policy(manual_execution=False) + if results.get('success'): + print( + "Retention policy execution completed: " + f"{results['personal']['conversations']} personal conversations, " + f"{results['personal']['documents']} personal documents, " + f"{results['group']['conversations']} group conversations, " + f"{results['group']['documents']} group documents, " + f"{results['public']['conversations']} public conversations, " + f"{results['public']['documents']} public documents deleted." 
+ ) + else: + print(f"Retention policy execution failed: {results.get('errors')}") + finally: + release_distributed_task_lock(lock_document) + + return results + + +def run_logging_timer_loop(): + """Run the logging timer monitor forever.""" + while True: + try: + check_logging_timers_once() + except Exception as exc: + print(f"Error in logging timer check: {exc}") + log_event(f"Error in logging timer check: {exc}", level=logging.ERROR) + + time.sleep(60) + + +def run_approval_expiration_loop(): + """Run approval expiration checks forever.""" + while True: + try: + check_expired_approvals_once() + except Exception as exc: + print(f"Error in approval expiration check: {exc}") + log_event(f"Error in approval expiration check: {exc}", level=logging.ERROR) + + time.sleep(21600) + + +def run_retention_policy_loop(): + """Run retention policy scheduling checks forever.""" + while True: + try: + check_retention_policy_once() + except Exception as exc: + print(f"Error in retention policy check: {exc}") + log_event(f"Error in retention policy check: {exc}", level=logging.ERROR) + + time.sleep(300) + + +def start_background_task_threads(): + """Start all background task loops for the current process.""" + task_specs = [ + ('Logging timer background task started.', run_logging_timer_loop), + ('Approval expiration background task started.', run_approval_expiration_loop), + ('Retention policy background task started.', run_retention_policy_loop), + ] + + started_threads = [] + for startup_message, task_target in task_specs: + worker_thread = threading.Thread(target=task_target, daemon=True) + worker_thread.start() + print(startup_message) + started_threads.append(worker_thread) + + return started_threads + + +def run_scheduler_forever(): + """Start all scheduler loops and keep the process alive.""" + start_background_task_threads() + print('SimpleChat scheduler is running.') + + while True: + time.sleep(3600) \ No newline at end of file diff --git 
a/application/single_app/config.py b/application/single_app/config.py index 91288225..059e8706 100644 --- a/application/single_app/config.py +++ b/application/single_app/config.py @@ -94,7 +94,7 @@ EXECUTOR_TYPE = 'thread' EXECUTOR_MAX_WORKERS = 30 SESSION_TYPE = 'filesystem' -VERSION = "0.239.002" +VERSION = "0.239.150" SECRET_KEY = os.getenv('SECRET_KEY', 'dev-secret-key-change-in-production') @@ -150,7 +150,6 @@ def get_allowed_extensions(enable_video=False, enable_audio=False): Args: enable_video: Whether video file support is enabled - enable_audio: Whether audio file support is enabled Returns: set: Allowed file extensions @@ -176,12 +175,14 @@ def get_allowed_extensions(enable_video=False, enable_audio=False): # Add Support for Custom Azure Environments CUSTOM_GRAPH_URL_VALUE = os.getenv("CUSTOM_GRAPH_URL_VALUE", "") +CUSTOM_GRAPH_AUTHORITY_URL_VALUE = os.getenv("CUSTOM_GRAPH_AUTHORITY_URL_VALUE", "") CUSTOM_IDENTITY_URL_VALUE = os.getenv("CUSTOM_IDENTITY_URL_VALUE", "") CUSTOM_RESOURCE_MANAGER_URL_VALUE = os.getenv("CUSTOM_RESOURCE_MANAGER_URL_VALUE", "") CUSTOM_BLOB_STORAGE_URL_VALUE = os.getenv("CUSTOM_BLOB_STORAGE_URL_VALUE", "") CUSTOM_COGNITIVE_SERVICES_URL_VALUE = os.getenv("CUSTOM_COGNITIVE_SERVICES_URL_VALUE", "") CUSTOM_SEARCH_RESOURCE_MANAGER_URL_VALUE = os.getenv("CUSTOM_SEARCH_RESOURCE_MANAGER_URL_VALUE", "") CUSTOM_REDIS_CACHE_INFRASTRUCTURE_URL_VALUE = os.getenv("CUSTOM_REDIS_CACHE_INFRASTRUCTURE_URL_VALUE", "") +CUSTOM_OIDC_METADATA_URL_VALUE = os.getenv("CUSTOM_OIDC_METADATA_URL_VALUE", "") # Azure AD Configuration @@ -193,41 +194,42 @@ def get_allowed_extensions(enable_video=False, enable_audio=False): MICROSOFT_PROVIDER_AUTHENTICATION_SECRET = os.getenv("MICROSOFT_PROVIDER_AUTHENTICATION_SECRET") LOGIN_REDIRECT_URL = os.getenv("LOGIN_REDIRECT_URL") HOME_REDIRECT_URL = os.getenv("HOME_REDIRECT_URL") # Front Door URL for home page - -OIDC_METADATA_URL = f"https://login.microsoftonline.com/{TENANT_ID}/v2.0/.well-known/openid-configuration" 
AZURE_ENVIRONMENT = os.getenv("AZURE_ENVIRONMENT", "public") # public, usgovernment, custom -if AZURE_ENVIRONMENT == "custom": - AUTHORITY = f"{CUSTOM_IDENTITY_URL_VALUE}/{TENANT_ID}" +WORD_CHUNK_SIZE = 400 + +if AZURE_ENVIRONMENT == "custom" or CUSTOM_IDENTITY_URL_VALUE or CUSTOM_GRAPH_AUTHORITY_URL_VALUE: + AUTHORITY = f"{CUSTOM_IDENTITY_URL_VALUE.rstrip('/')}/{TENANT_ID}" + base_authority = CUSTOM_GRAPH_AUTHORITY_URL_VALUE or CUSTOM_IDENTITY_URL_VALUE + if not base_authority: + base_authority = AUTHORITY.rstrip('/').removesuffix(f"/{TENANT_ID}") + authority = base_authority elif AZURE_ENVIRONMENT == "usgovernment": AUTHORITY = f"https://login.microsoftonline.us/{TENANT_ID}" + authority = AzureAuthorityHosts.AZURE_GOVERNMENT else: AUTHORITY = f"https://login.microsoftonline.com/{TENANT_ID}" + authority = AzureAuthorityHosts.AZURE_PUBLIC_CLOUD -WORD_CHUNK_SIZE = 400 - -if AZURE_ENVIRONMENT == "usgovernment": +if AZURE_ENVIRONMENT == "custom": + OIDC_METADATA_URL = CUSTOM_OIDC_METADATA_URL_VALUE or f"https://login.microsoftonline.com/{TENANT_ID}/v2.0/.well-known/openid-configuration" + resource_manager = CUSTOM_RESOURCE_MANAGER_URL_VALUE + video_indexer_endpoint = os.getenv("CUSTOM_VIDEO_INDEXER_ENDPOINT", "https://api.videoindexer.ai") + credential_scopes=[resource_manager + "/.default"] + cognitive_services_scope = CUSTOM_COGNITIVE_SERVICES_URL_VALUE + search_resource_manager = CUSTOM_SEARCH_RESOURCE_MANAGER_URL_VALUE + KEY_VAULT_DOMAIN = os.getenv("KEY_VAULT_DOMAIN", ".vault.azure.net") +elif AZURE_ENVIRONMENT == "usgovernment": OIDC_METADATA_URL = f"https://login.microsoftonline.us/{TENANT_ID}/v2.0/.well-known/openid-configuration" resource_manager = "https://management.usgovcloudapi.net" - authority = AzureAuthorityHosts.AZURE_GOVERNMENT credential_scopes=[resource_manager + "/.default"] cognitive_services_scope = "https://cognitiveservices.azure.us/.default" video_indexer_endpoint = "https://api.videoindexer.ai.azure.us" search_resource_manager = 
"https://search.azure.us" KEY_VAULT_DOMAIN = ".vault.usgovcloudapi.net" - -elif AZURE_ENVIRONMENT == "custom": - resource_manager = CUSTOM_RESOURCE_MANAGER_URL_VALUE - authority = CUSTOM_IDENTITY_URL_VALUE - video_indexer_endpoint = os.getenv("CUSTOM_VIDEO_INDEXER_ENDPOINT", "https://api.videoindexer.ai") - credential_scopes=[resource_manager + "/.default"] - cognitive_services_scope = CUSTOM_COGNITIVE_SERVICES_URL_VALUE - search_resource_manager = CUSTOM_SEARCH_RESOURCE_MANAGER_URL_VALUE - KEY_VAULT_DOMAIN = os.getenv("KEY_VAULT_DOMAIN", ".vault.azure.net") else: OIDC_METADATA_URL = f"https://login.microsoftonline.com/{TENANT_ID}/v2.0/.well-known/openid-configuration" resource_manager = "https://management.azure.com" - authority = AzureAuthorityHosts.AZURE_PUBLIC_CLOUD credential_scopes=[resource_manager + "/.default"] cognitive_services_scope = "https://cognitiveservices.azure.com/.default" video_indexer_endpoint = "https://api.videoindexer.ai" @@ -257,6 +259,8 @@ def get_redis_cache_infrastructure_endpoint(redis_hostname: str) -> str: storage_account_user_documents_container_name = "user-documents" storage_account_group_documents_container_name = "group-documents" storage_account_public_documents_container_name = "public-documents" +storage_account_personal_chat_container_name = "personal-chat" +storage_account_group_chat_container_name = "group-chat" # Initialize Azure Cosmos DB client cosmos_endpoint = os.getenv("AZURE_COSMOS_ENDPOINT") @@ -459,6 +463,18 @@ def get_redis_cache_infrastructure_endpoint(redis_hostname: str) -> str: default_ttl=-1 # TTL disabled by default, enabled per-document for auto-cleanup ) +cosmos_thoughts_container_name = "thoughts" +cosmos_thoughts_container = cosmos_database.create_container_if_not_exists( + id=cosmos_thoughts_container_name, + partition_key=PartitionKey(path="/user_id") +) + +cosmos_archived_thoughts_container_name = "archive_thoughts" +cosmos_archived_thoughts_container = cosmos_database.create_container_if_not_exists( 
+ id=cosmos_archived_thoughts_container_name, + partition_key=PartitionKey(path="/user_id") +) + def ensure_custom_logo_file_exists(app, settings): """ If custom_logo_base64 or custom_logo_dark_base64 is present in settings, ensure the appropriate @@ -745,9 +761,11 @@ def initialize_clients(settings): # This addresses the issue where the application assumes containers exist if blob_service_client: for container_name in [ - storage_account_user_documents_container_name, - storage_account_group_documents_container_name, - storage_account_public_documents_container_name + storage_account_user_documents_container_name, + storage_account_group_documents_container_name, + storage_account_public_documents_container_name, + storage_account_personal_chat_container_name, + storage_account_group_chat_container_name ]: try: container_client = blob_service_client.get_container_client(container_name) diff --git a/application/single_app/example.env b/application/single_app/example.env index 38318b48..7803e0b3 100644 --- a/application/single_app/example.env +++ b/application/single_app/example.env @@ -15,4 +15,9 @@ TENANT_ID="" SECRET_KEY="Generate-A-Strong-Random-Secret-Key-Here!" 
# AZURE_ENVIRONMENT: Set based on your cloud environment # Options: "public", "usgovernment", "custom" -AZURE_ENVIRONMENT="public" \ No newline at end of file +AZURE_ENVIRONMENT="public" + +# Optional Graph overrides (for cross-cloud identity/Graph scenarios) +# Example values: +# CUSTOM_GRAPH_URL_VALUE="https://graph.microsoft.com" +# CUSTOM_GRAPH_AUTHORITY_URL_VALUE="https://login.microsoftonline.com" \ No newline at end of file diff --git a/application/single_app/functions_activity_logging.py b/application/single_app/functions_activity_logging.py index 2a653a47..efb6e780 100644 --- a/application/single_app/functions_activity_logging.py +++ b/application/single_app/functions_activity_logging.py @@ -1393,3 +1393,332 @@ def log_retention_policy_force_push( level=logging.ERROR ) debug_print(f"⚠️ Warning: Failed to log retention policy force push: {str(e)}") + + +# === AGENT & ACTION ACTIVITY LOGGING === + +def log_agent_creation( + user_id: str, + agent_id: str, + agent_name: str, + agent_display_name: Optional[str] = None, + scope: str = 'personal', + group_id: Optional[str] = None +) -> None: + """ + Log an agent creation activity. 
+ + Args: + user_id: The ID of the user who created the agent + agent_id: The unique ID of the new agent + agent_name: The name of the agent + agent_display_name: The display name of the agent + scope: 'personal', 'group', or 'global' + group_id: The group ID (only for group scope) + """ + try: + activity_record = { + 'id': str(uuid.uuid4()), + 'user_id': user_id, + 'activity_type': 'agent_creation', + 'timestamp': datetime.utcnow().isoformat(), + 'created_at': datetime.utcnow().isoformat(), + 'entity_type': 'agent', + 'operation': 'create', + 'entity': { + 'id': agent_id, + 'name': agent_name, + 'display_name': agent_display_name or agent_name + }, + 'workspace_type': scope, + 'workspace_context': {} + } + if scope == 'group' and group_id: + activity_record['workspace_context']['group_id'] = group_id + + cosmos_activity_logs_container.create_item(body=activity_record) + log_event( + message=f"Agent created: {agent_name} ({scope}) by user {user_id}", + extra=activity_record, + level=logging.INFO + ) + debug_print(f"✅ Agent creation logged: {agent_name} ({scope})") + except Exception as e: + log_event( + message=f"Error logging agent creation: {str(e)}", + extra={'user_id': user_id, 'agent_id': agent_id, 'scope': scope, 'error': str(e)}, + level=logging.ERROR + ) + debug_print(f"⚠️ Warning: Failed to log agent creation: {str(e)}") + + +def log_agent_update( + user_id: str, + agent_id: str, + agent_name: str, + agent_display_name: Optional[str] = None, + scope: str = 'personal', + group_id: Optional[str] = None +) -> None: + """ + Log an agent update activity. 
+ + Args: + user_id: The ID of the user who updated the agent + agent_id: The unique ID of the agent + agent_name: The name of the agent + agent_display_name: The display name of the agent + scope: 'personal', 'group', or 'global' + group_id: The group ID (only for group scope) + """ + try: + activity_record = { + 'id': str(uuid.uuid4()), + 'user_id': user_id, + 'activity_type': 'agent_update', + 'timestamp': datetime.utcnow().isoformat(), + 'created_at': datetime.utcnow().isoformat(), + 'entity_type': 'agent', + 'operation': 'update', + 'entity': { + 'id': agent_id, + 'name': agent_name, + 'display_name': agent_display_name or agent_name + }, + 'workspace_type': scope, + 'workspace_context': {} + } + if scope == 'group' and group_id: + activity_record['workspace_context']['group_id'] = group_id + + cosmos_activity_logs_container.create_item(body=activity_record) + log_event( + message=f"Agent updated: {agent_name} ({scope}) by user {user_id}", + extra=activity_record, + level=logging.INFO + ) + debug_print(f"✅ Agent update logged: {agent_name} ({scope})") + except Exception as e: + log_event( + message=f"Error logging agent update: {str(e)}", + extra={'user_id': user_id, 'agent_id': agent_id, 'scope': scope, 'error': str(e)}, + level=logging.ERROR + ) + debug_print(f"⚠️ Warning: Failed to log agent update: {str(e)}") + + +def log_agent_deletion( + user_id: str, + agent_id: str, + agent_name: str, + scope: str = 'personal', + group_id: Optional[str] = None +) -> None: + """ + Log an agent deletion activity. 
+ + Args: + user_id: The ID of the user who deleted the agent + agent_id: The unique ID of the agent + agent_name: The name of the agent + scope: 'personal', 'group', or 'global' + group_id: The group ID (only for group scope) + """ + try: + activity_record = { + 'id': str(uuid.uuid4()), + 'user_id': user_id, + 'activity_type': 'agent_deletion', + 'timestamp': datetime.utcnow().isoformat(), + 'created_at': datetime.utcnow().isoformat(), + 'entity_type': 'agent', + 'operation': 'delete', + 'entity': { + 'id': agent_id, + 'name': agent_name + }, + 'workspace_type': scope, + 'workspace_context': {} + } + if scope == 'group' and group_id: + activity_record['workspace_context']['group_id'] = group_id + + cosmos_activity_logs_container.create_item(body=activity_record) + log_event( + message=f"Agent deleted: {agent_name} ({scope}) by user {user_id}", + extra=activity_record, + level=logging.INFO + ) + debug_print(f"✅ Agent deletion logged: {agent_name} ({scope})") + except Exception as e: + log_event( + message=f"Error logging agent deletion: {str(e)}", + extra={'user_id': user_id, 'agent_id': agent_id, 'scope': scope, 'error': str(e)}, + level=logging.ERROR + ) + debug_print(f"⚠️ Warning: Failed to log agent deletion: {str(e)}") + + +def log_action_creation( + user_id: str, + action_id: str, + action_name: str, + action_type: Optional[str] = None, + scope: str = 'personal', + group_id: Optional[str] = None +) -> None: + """ + Log an action/plugin creation activity. 
+ + Args: + user_id: The ID of the user who created the action + action_id: The unique ID of the new action + action_name: The name of the action + action_type: The type of the action (e.g., 'openapi', 'sql_query') + scope: 'personal', 'group', or 'global' + group_id: The group ID (only for group scope) + """ + try: + activity_record = { + 'id': str(uuid.uuid4()), + 'user_id': user_id, + 'activity_type': 'action_creation', + 'timestamp': datetime.utcnow().isoformat(), + 'created_at': datetime.utcnow().isoformat(), + 'entity_type': 'action', + 'operation': 'create', + 'entity': { + 'id': action_id, + 'name': action_name, + 'type': action_type + }, + 'workspace_type': scope, + 'workspace_context': {} + } + if scope == 'group' and group_id: + activity_record['workspace_context']['group_id'] = group_id + + cosmos_activity_logs_container.create_item(body=activity_record) + log_event( + message=f"Action created: {action_name} ({scope}) by user {user_id}", + extra=activity_record, + level=logging.INFO + ) + debug_print(f"✅ Action creation logged: {action_name} ({scope})") + except Exception as e: + log_event( + message=f"Error logging action creation: {str(e)}", + extra={'user_id': user_id, 'action_id': action_id, 'scope': scope, 'error': str(e)}, + level=logging.ERROR + ) + debug_print(f"⚠️ Warning: Failed to log action creation: {str(e)}") + + +def log_action_update( + user_id: str, + action_id: str, + action_name: str, + action_type: Optional[str] = None, + scope: str = 'personal', + group_id: Optional[str] = None +) -> None: + """ + Log an action/plugin update activity. 
+ + Args: + user_id: The ID of the user who updated the action + action_id: The unique ID of the action + action_name: The name of the action + action_type: The type of the action + scope: 'personal', 'group', or 'global' + group_id: The group ID (only for group scope) + """ + try: + activity_record = { + 'id': str(uuid.uuid4()), + 'user_id': user_id, + 'activity_type': 'action_update', + 'timestamp': datetime.utcnow().isoformat(), + 'created_at': datetime.utcnow().isoformat(), + 'entity_type': 'action', + 'operation': 'update', + 'entity': { + 'id': action_id, + 'name': action_name, + 'type': action_type + }, + 'workspace_type': scope, + 'workspace_context': {} + } + if scope == 'group' and group_id: + activity_record['workspace_context']['group_id'] = group_id + + cosmos_activity_logs_container.create_item(body=activity_record) + log_event( + message=f"Action updated: {action_name} ({scope}) by user {user_id}", + extra=activity_record, + level=logging.INFO + ) + debug_print(f"✅ Action update logged: {action_name} ({scope})") + except Exception as e: + log_event( + message=f"Error logging action update: {str(e)}", + extra={'user_id': user_id, 'action_id': action_id, 'scope': scope, 'error': str(e)}, + level=logging.ERROR + ) + debug_print(f"⚠️ Warning: Failed to log action update: {str(e)}") + + +def log_action_deletion( + user_id: str, + action_id: str, + action_name: str, + action_type: Optional[str] = None, + scope: str = 'personal', + group_id: Optional[str] = None +) -> None: + """ + Log an action/plugin deletion activity. 
+ + Args: + user_id: The ID of the user who deleted the action + action_id: The unique ID of the action + action_name: The name of the action + action_type: The type of the action + scope: 'personal', 'group', or 'global' + group_id: The group ID (only for group scope) + """ + try: + activity_record = { + 'id': str(uuid.uuid4()), + 'user_id': user_id, + 'activity_type': 'action_deletion', + 'timestamp': datetime.utcnow().isoformat(), + 'created_at': datetime.utcnow().isoformat(), + 'entity_type': 'action', + 'operation': 'delete', + 'entity': { + 'id': action_id, + 'name': action_name, + 'type': action_type + }, + 'workspace_type': scope, + 'workspace_context': {} + } + if scope == 'group' and group_id: + activity_record['workspace_context']['group_id'] = group_id + + cosmos_activity_logs_container.create_item(body=activity_record) + log_event( + message=f"Action deleted: {action_name} ({scope}) by user {user_id}", + extra=activity_record, + level=logging.INFO + ) + debug_print(f"✅ Action deletion logged: {action_name} ({scope})") + except Exception as e: + log_event( + message=f"Error logging action deletion: {str(e)}", + extra={'user_id': user_id, 'action_id': action_id, 'scope': scope, 'error': str(e)}, + level=logging.ERROR + ) + debug_print(f"⚠️ Warning: Failed to log action deletion: {str(e)}") diff --git a/application/single_app/functions_agent_payload.py b/application/single_app/functions_agent_payload.py index 09f1f343..2a7935a5 100644 --- a/application/single_app/functions_agent_payload.py +++ b/application/single_app/functions_agent_payload.py @@ -45,6 +45,21 @@ "azure_agent_apim_gpt_deployment", "azure_agent_apim_gpt_api_version", ] +_SERVER_MANAGED_FIELDS = [ + "_attachments", + "_etag", + "_rid", + "_self", + "_ts", + "created_at", + "created_by", + "modified_at", + "modified_by", + "updated_at", + "last_updated", + "user_id", + "group_id", +] _MAX_FIELD_LENGTHS = { "name": 100, @@ -146,12 +161,18 @@ def 
_validate_foundry_field_lengths(foundry_settings: Dict[str, Any]) -> None: if isinstance(value, str) and len(value) > max_len: raise AgentPayloadError(f"azure_ai_foundry.{field} exceeds maximum length of {max_len}.") + +def _strip_server_managed_fields(payload: Dict[str, Any]) -> None: + for field in _SERVER_MANAGED_FIELDS: + payload.pop(field, None) + def sanitize_agent_payload(agent: Dict[str, Any]) -> Dict[str, Any]: """Return a sanitized copy of the agent payload or raise AgentPayloadError.""" if not isinstance(agent, dict): raise AgentPayloadError("Agent payload must be an object.") sanitized = deepcopy(agent) + _strip_server_managed_fields(sanitized) _normalize_text_fields(sanitized) for field in _STRING_DEFAULT_FIELDS: diff --git a/application/single_app/functions_authentication.py b/application/single_app/functions_authentication.py index e4bcf480..79487696 100644 --- a/application/single_app/functions_authentication.py +++ b/application/single_app/functions_authentication.py @@ -52,11 +52,12 @@ def _save_cache(cache): # Decide how to handle this, maybe clear cache or log extensively # session.pop("token_cache", None) # Option: Clear on serialization failure -def _build_msal_app(cache=None): +def _build_msal_app(cache=None, authority_override=None): """Builds the MSAL ConfidentialClientApplication, optionally initializing with a cache.""" + authority = authority_override or AUTHORITY return ConfidentialClientApplication( CLIENT_ID, - authority=AUTHORITY, + authority=authority, client_credential=CLIENT_SECRET, token_cache=cache # Pass the cache instance here ) @@ -88,7 +89,7 @@ def get_valid_access_token(scopes=None): required_scopes = scopes or SCOPE # Use default SCOPE if none provided - msal_app = _build_msal_app(cache=_load_cache()) + msal_app = _build_msal_app(cache=_load_cache(), authority_override=get_graph_authority()) user_info = session.get("user", {}) # MSAL uses home_account_id which combines oid and tid # Construct it carefully based on your 
id_token_claims structure @@ -160,7 +161,7 @@ def get_valid_access_token_for_plugins(scopes=None): required_scopes = scopes or SCOPE # Use default SCOPE if none provided - msal_app = _build_msal_app(cache=_load_cache()) + msal_app = _build_msal_app(cache=_load_cache(), authority_override=get_graph_authority()) user_info = session.get("user", {}) # MSAL uses home_account_id which combines oid and tid # Construct it carefully based on your id_token_claims structure @@ -844,6 +845,103 @@ def get_current_user_info(): "displayName": user.get("name") } + +def _normalize_authority(authority_base, tenant_id): + """Normalize an authority URL and append tenant when appropriate.""" + base = (authority_base or "").strip().rstrip("/") + tenant = (tenant_id or "").strip() + + if not base or not tenant: + return base + + lowered = base.lower() + tenant_lower = tenant.lower() + + if lowered.endswith(f"/{tenant_lower}"): + return base + + if lowered.endswith("/common") or lowered.endswith("/organizations") or lowered.endswith("/consumers"): + return base + + return f"{base}/{tenant}" + + +def get_graph_authority(): + """ + Resolve authority for Graph token acquisition, independent of general Azure environment defaults. + + Precedence: + 1. CUSTOM_GRAPH_AUTHORITY_URL_VALUE if provided + 2. Custom cloud identity authority for AZURE_ENVIRONMENT=custom + 3. Gov/Public cloud authority based on AZURE_ENVIRONMENT + """ + custom_graph_authority = (CUSTOM_GRAPH_AUTHORITY_URL_VALUE or "").strip() + if custom_graph_authority: + return _normalize_authority(custom_graph_authority, TENANT_ID) + + if AZURE_ENVIRONMENT == "custom": + return _normalize_authority(CUSTOM_IDENTITY_URL_VALUE, TENANT_ID) + + if AZURE_ENVIRONMENT == "usgovernment": + return f"https://login.microsoftonline.us/{TENANT_ID}" + + return f"https://login.microsoftonline.com/{TENANT_ID}" + + +def get_graph_base_url(): + """ + Resolve the Microsoft Graph base URL for this deployment. + + Precedence: + 1. 
CUSTOM_GRAPH_URL_VALUE if provided (works in any AZURE_ENVIRONMENT mode) + 2. Azure Gov Graph for usgovernment + 3. Public Graph by default + + Returns: + str: Normalized Graph base URL ending with /v1.0 + """ + custom_graph_url = (CUSTOM_GRAPH_URL_VALUE or "").strip().rstrip("/") + if custom_graph_url: + normalized = custom_graph_url + lowered = normalized.lower() + + # Allow legacy values such as https://.../v1.0/users + if lowered.endswith("/users"): + normalized = normalized[:-6].rstrip("/") + lowered = normalized.lower() + + if "/v1.0" not in lowered: + normalized = f"{normalized}/v1.0" + + return normalized + + if AZURE_ENVIRONMENT == "usgovernment": + return "https://graph.microsoft.us/v1.0" + + return "https://graph.microsoft.com/v1.0" + + +def get_graph_endpoint(path=""): + """ + Build a full Graph endpoint from a relative path. + + Args: + path (str): Relative Graph path (for example: "/users" or "users/{id}") + + Returns: + str: Fully qualified Microsoft Graph URL + """ + base_url = get_graph_base_url().rstrip("/") + path = (path or "").strip() + + if not path: + return base_url + + if not path.startswith("/"): + path = f"/{path}" + + return f"{base_url}{path}" + def get_user_profile_image(): """ Fetches the user's profile image from Microsoft Graph and returns it as base64. 
@@ -854,13 +952,7 @@ def get_user_profile_image(): debug_print("get_user_profile_image: Could not acquire access token") return None - # Determine the correct Graph endpoint based on Azure environment - if AZURE_ENVIRONMENT == "usgovernment": - profile_image_endpoint = "https://graph.microsoft.us/v1.0/me/photo/$value" - elif AZURE_ENVIRONMENT == "custom": - profile_image_endpoint = f"{CUSTOM_GRAPH_URL_VALUE}/me/photo/$value" - else: - profile_image_endpoint = "https://graph.microsoft.com/v1.0/me/photo/$value" + profile_image_endpoint = get_graph_endpoint("/me/photo/$value") headers = { "Authorization": f"Bearer {token}", diff --git a/application/single_app/functions_content.py b/application/single_app/functions_content.py index 376d23f4..e9636bb7 100644 --- a/application/single_app/functions_content.py +++ b/application/single_app/functions_content.py @@ -1,5 +1,7 @@ # functions_content.py +import email.utils + from functions_debug import debug_print from config import * from functions_settings import * @@ -306,6 +308,57 @@ def chunk_word_file_into_pages(di_pages): # Current logic returns empty list if no words. 
return new_pages + +def _parse_retry_after_seconds(response_headers): + """Return retry delay in seconds from rate-limit headers when available.""" + if response_headers is None: + return None + + for header_name in ('retry-after-ms', 'x-ms-retry-after-ms'): + try: + retry_ms = response_headers.get(header_name) + if retry_ms is None: + continue + + retry_after = float(retry_ms) / 1000 + if retry_after > 0: + return retry_after + except (TypeError, ValueError): + continue + + retry_header = response_headers.get('retry-after') + try: + retry_after = float(retry_header) + if retry_after > 0: + return retry_after + except (TypeError, ValueError): + pass + + if not retry_header: + return None + + retry_date_tuple = email.utils.parsedate_tz(retry_header) + if retry_date_tuple is None: + return None + + retry_after = float(email.utils.mktime_tz(retry_date_tuple) - time.time()) + if retry_after <= 0: + return None + + return retry_after + + +def _get_rate_limit_wait_time(rate_limit_error, fallback_delay): + """Prefer service-provided retry timing and fall back to jittered backoff.""" + response = getattr(rate_limit_error, 'response', None) + response_headers = getattr(response, 'headers', None) + retry_after = _parse_retry_after_seconds(response_headers) + + if retry_after is not None and retry_after <= 60: + return retry_after + + return fallback_delay * random.uniform(1.0, 1.5) + def generate_embedding( text, max_retries=5, @@ -352,7 +405,7 @@ def generate_embedding( embedding_model = selected_embedding_model['deploymentName'] while True: - random_delay = random.uniform(0.5, 2.0) + random_delay = random.uniform(0.05, 0.2) time.sleep(random_delay) try: @@ -379,9 +432,116 @@ def generate_embedding( if retries > max_retries: return None - wait_time = current_delay * random.uniform(1.0, 1.5) + wait_time = _get_rate_limit_wait_time(e, current_delay) + debug_print( + f"[EMBEDDING] Rate limited, retrying in {wait_time:.2f}s " + f"(attempt {retries}/{max_retries})" + ) 
time.sleep(wait_time) current_delay *= delay_multiplier except Exception as e: raise + +def generate_embeddings_batch( + texts, + batch_size=16, + max_retries=5, + initial_delay=1.0, + delay_multiplier=2.0 +): + """Generate embeddings for multiple texts in batches. + + Azure OpenAI embeddings API accepts a list of strings as input. + This reduces per-call overhead and delay significantly. + + Args: + texts: List of text strings to embed. + batch_size: Number of texts per API call (default 16). + max_retries: Max retries on rate limit errors. + initial_delay: Initial retry delay in seconds. + delay_multiplier: Multiplier for exponential backoff. + + Returns: + list of (embedding, token_usage) tuples, one per input text. + """ + settings = get_settings() + + enable_embedding_apim = settings.get('enable_embedding_apim', False) + + if enable_embedding_apim: + embedding_model = settings.get('azure_apim_embedding_deployment') + embedding_client = AzureOpenAI( + api_version=settings.get('azure_apim_embedding_api_version'), + azure_endpoint=settings.get('azure_apim_embedding_endpoint'), + api_key=settings.get('azure_apim_embedding_subscription_key')) + else: + if (settings.get('azure_openai_embedding_authentication_type') == 'managed_identity'): + token_provider = get_bearer_token_provider(DefaultAzureCredential(), cognitive_services_scope) + + embedding_client = AzureOpenAI( + api_version=settings.get('azure_openai_embedding_api_version'), + azure_endpoint=settings.get('azure_openai_embedding_endpoint'), + azure_ad_token_provider=token_provider + ) + + embedding_model_obj = settings.get('embedding_model', {}) + if embedding_model_obj and embedding_model_obj.get('selected'): + selected_embedding_model = embedding_model_obj['selected'][0] + embedding_model = selected_embedding_model['deploymentName'] + else: + embedding_client = AzureOpenAI( + api_version=settings.get('azure_openai_embedding_api_version'), + azure_endpoint=settings.get('azure_openai_embedding_endpoint'), + 
api_key=settings.get('azure_openai_embedding_key') + ) + + embedding_model_obj = settings.get('embedding_model', {}) + if embedding_model_obj and embedding_model_obj.get('selected'): + selected_embedding_model = embedding_model_obj['selected'][0] + embedding_model = selected_embedding_model['deploymentName'] + + results = [] + for i in range(0, len(texts), batch_size): + batch = texts[i:i + batch_size] + retries = 0 + current_delay = initial_delay + + while True: + random_delay = random.uniform(0.05, 0.2) + time.sleep(random_delay) + + try: + response = embedding_client.embeddings.create( + model=embedding_model, + input=batch + ) + + for item in response.data: + token_usage = None + if hasattr(response, 'usage') and response.usage: + token_usage = { + 'prompt_tokens': response.usage.prompt_tokens // len(batch), + 'total_tokens': response.usage.total_tokens // len(batch), + 'model_deployment_name': embedding_model + } + results.append((item.embedding, token_usage)) + break + + except RateLimitError as e: + retries += 1 + if retries > max_retries: + raise + + wait_time = _get_rate_limit_wait_time(e, current_delay) + debug_print( + f"[EMBEDDING_BATCH] Rate limited, retrying in {wait_time:.2f}s " + f"(attempt {retries}/{max_retries})" + ) + time.sleep(wait_time) + current_delay *= delay_multiplier + + except Exception as e: + raise + + return results diff --git a/application/single_app/functions_conversation_unread.py b/application/single_app/functions_conversation_unread.py new file mode 100644 index 00000000..583c4080 --- /dev/null +++ b/application/single_app/functions_conversation_unread.py @@ -0,0 +1,44 @@ +# functions_conversation_unread.py + +"""Helpers for conversation unread assistant-response state.""" + +from datetime import datetime + + +def normalize_conversation_unread_state(conversation_item): + """Ensure unread assistant-response fields always exist on a conversation.""" + if not isinstance(conversation_item, dict): + return conversation_item + + 
conversation_item['has_unread_assistant_response'] = bool( + conversation_item.get('has_unread_assistant_response', False) + ) + conversation_item['last_unread_assistant_message_id'] = conversation_item.get( + 'last_unread_assistant_message_id' + ) + conversation_item['last_unread_assistant_at'] = conversation_item.get( + 'last_unread_assistant_at' + ) + return conversation_item + + +def mark_conversation_unread( + conversation_item, + assistant_message_id, + unread_timestamp=None, +): + """Mark a conversation as having an unread assistant response.""" + normalized_item = normalize_conversation_unread_state(conversation_item) + normalized_item['has_unread_assistant_response'] = True + normalized_item['last_unread_assistant_message_id'] = assistant_message_id + normalized_item['last_unread_assistant_at'] = unread_timestamp or datetime.utcnow().isoformat() + return normalized_item + + +def clear_conversation_unread(conversation_item): + """Clear unread assistant-response state from a conversation.""" + normalized_item = normalize_conversation_unread_state(conversation_item) + normalized_item['has_unread_assistant_response'] = False + normalized_item['last_unread_assistant_message_id'] = None + normalized_item['last_unread_assistant_at'] = None + return normalized_item diff --git a/application/single_app/functions_debug.py b/application/single_app/functions_debug.py index 5cbf6a2e..d7615be3 100644 --- a/application/single_app/functions_debug.py +++ b/application/single_app/functions_debug.py @@ -3,32 +3,48 @@ from app_settings_cache import get_settings_cache from functions_settings import * -def debug_print(message, category="INFO", **kwargs): + +def _format_debug_message(message, args): + """Support legacy printf-style debug calls while preserving plain strings.""" + message_text = str(message) + if not args: + return message_text + + try: + return message_text % args + except Exception: + rendered_args = ", ".join(str(arg) for arg in args) + return f"{message_text} 
{rendered_args}" + + +def _emit_debug_message(settings, message, category, flush, kwargs): + if settings.get('enable_debug_logging', False): + debug_msg = f"[DEBUG] [{category}]: {message}" + if kwargs: + kwargs_str = ", ".join(f"{k}={v}" for k, v in kwargs.items()) + debug_msg += f" ({kwargs_str})" + print(debug_msg, flush=flush) + + +def debug_print(message, *args, category="INFO", **kwargs): """ Print debug message only if debug logging is enabled in settings. Args: message (str): The debug message to print + *args: Optional printf-style values applied to the message category (str): Optional category for the debug message **kwargs: Additional key-value pairs to include in debug output """ - #print(f"DEBUG_PRINT CALLED WITH MESSAGE: {message}") + flush = kwargs.pop('flush', False) + formatted_message = _format_debug_message(message, args) + try: cache = get_settings_cache() - if cache.get('enable_debug_logging', False): - debug_msg = f"[DEBUG] [{category}]: {message}" - if kwargs: - kwargs_str = ", ".join(f"{k}={v}" for k, v in kwargs.items()) - debug_msg += f" ({kwargs_str})" - print(debug_msg) + _emit_debug_message(cache, formatted_message, category, flush, kwargs) except Exception: settings = get_settings() - if settings.get('enable_debug_logging', False): - debug_msg = f"[DEBUG] [{category}]: {message}" - if kwargs: - kwargs_str = ", ".join(f"{k}={v}" for k, v in kwargs.items()) - debug_msg += f" ({kwargs_str})" - print(debug_msg) + _emit_debug_message(settings, formatted_message, category, flush, kwargs) def is_debug_enabled(): diff --git a/application/single_app/functions_documents.py b/application/single_app/functions_documents.py index ce08066d..87c9ba34 100644 --- a/application/single_app/functions_documents.py +++ b/application/single_app/functions_documents.py @@ -1646,6 +1646,191 @@ def save_chunks(page_text_content, page_number, file_name, user_id, document_id, # Return token usage information for accumulation return token_usage +def 
save_chunks_batch(chunks_data, user_id, document_id, group_id=None, public_workspace_id=None): + """ + Save multiple chunks at once using batch embedding and batch AI Search upload. + Significantly faster than calling save_chunks() per chunk. + + Args: + chunks_data: list of dicts with keys: page_text_content, page_number, file_name + user_id: The user ID + document_id: The document ID + group_id: Optional group ID for group documents + public_workspace_id: Optional public workspace ID for public documents + + Returns: + dict with 'total_tokens', 'prompt_tokens', 'model_deployment_name' + """ + from functions_content import generate_embeddings_batch + + current_time = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ') + is_group = group_id is not None + is_public_workspace = public_workspace_id is not None + + # Retrieve metadata once for all chunks + try: + if is_public_workspace: + metadata = get_document_metadata( + document_id=document_id, + user_id=user_id, + public_workspace_id=public_workspace_id + ) + elif is_group: + metadata = get_document_metadata( + document_id=document_id, + user_id=user_id, + group_id=group_id + ) + else: + metadata = get_document_metadata( + document_id=document_id, + user_id=user_id + ) + + if not metadata: + raise ValueError(f"No metadata found for document {document_id}") + + version = metadata.get("version") if metadata.get("version") else 1 + except Exception as e: + log_event(f"[save_chunks_batch] Error retrieving metadata for document {document_id}: {repr(e)}", level=logging.ERROR) + raise + + # Generate all embeddings in batches + texts = [c['page_text_content'] for c in chunks_data] + try: + embedding_results = generate_embeddings_batch(texts) + except Exception as e: + log_event(f"[save_chunks_batch] Error generating batch embeddings for document {document_id}: {e}", level=logging.ERROR) + raise + + # Check for vision analysis once + vision_analysis = metadata.get('vision_analysis') + vision_text = "" + if 
vision_analysis: + vision_text_parts = [] + vision_text_parts.append("\n\n=== AI Vision Analysis ===") + vision_text_parts.append(f"Model: {vision_analysis.get('model', 'unknown')}") + if vision_analysis.get('description'): + vision_text_parts.append(f"\nDescription: {vision_analysis['description']}") + if vision_analysis.get('objects'): + objects_list = vision_analysis['objects'] + if isinstance(objects_list, list): + vision_text_parts.append(f"\nObjects Detected: {', '.join(objects_list)}") + else: + vision_text_parts.append(f"\nObjects Detected: {objects_list}") + if vision_analysis.get('text'): + vision_text_parts.append(f"\nVisible Text: {vision_analysis['text']}") + if vision_analysis.get('analysis'): + vision_text_parts.append(f"\nContextual Analysis: {vision_analysis['analysis']}") + vision_text = "\n".join(vision_text_parts) + + # Build all chunk documents + chunk_documents = [] + total_token_usage = {'total_tokens': 0, 'prompt_tokens': 0, 'model_deployment_name': None} + + for idx, chunk_info in enumerate(chunks_data): + embedding, token_usage = embedding_results[idx] + page_number = chunk_info['page_number'] + file_name = chunk_info['file_name'] + page_text_content = chunk_info['page_text_content'] + + if token_usage: + total_token_usage['total_tokens'] += token_usage.get('total_tokens', 0) + total_token_usage['prompt_tokens'] += token_usage.get('prompt_tokens', 0) + if not total_token_usage['model_deployment_name']: + total_token_usage['model_deployment_name'] = token_usage.get('model_deployment_name') + + chunk_id = f"{document_id}_{page_number}" + enhanced_chunk_text = page_text_content + vision_text if vision_text else page_text_content + + if is_public_workspace: + chunk_document = { + "id": chunk_id, + "document_id": document_id, + "chunk_id": str(page_number), + "chunk_text": enhanced_chunk_text, + "embedding": embedding, + "file_name": file_name, + "chunk_keywords": [], + "chunk_summary": "", + "page_number": page_number, + "author": [], + 
"title": "", + "document_classification": "None", + "document_tags": metadata.get('tags', []), + "chunk_sequence": page_number, + "upload_date": current_time, + "version": version, + "public_workspace_id": public_workspace_id + } + elif is_group: + shared_group_ids = metadata.get('shared_group_ids', []) if metadata else [] + chunk_document = { + "id": chunk_id, + "document_id": document_id, + "chunk_id": str(page_number), + "chunk_text": enhanced_chunk_text, + "embedding": embedding, + "file_name": file_name, + "chunk_keywords": [], + "chunk_summary": "", + "page_number": page_number, + "author": [], + "title": "", + "document_classification": "None", + "document_tags": metadata.get('tags', []), + "chunk_sequence": page_number, + "upload_date": current_time, + "version": version, + "group_id": group_id, + "shared_group_ids": shared_group_ids + } + else: + shared_user_ids = metadata.get('shared_user_ids', []) if metadata else [] + chunk_document = { + "id": chunk_id, + "document_id": document_id, + "chunk_id": str(page_number), + "chunk_text": enhanced_chunk_text, + "embedding": embedding, + "file_name": file_name, + "chunk_keywords": [], + "chunk_summary": "", + "page_number": page_number, + "author": [], + "title": "", + "document_classification": "None", + "document_tags": metadata.get('tags', []), + "chunk_sequence": page_number, + "upload_date": current_time, + "version": version, + "user_id": user_id, + "shared_user_ids": shared_user_ids + } + + chunk_documents.append(chunk_document) + + # Batch upload to AI Search + try: + if is_public_workspace: + search_client = CLIENTS["search_client_public"] + elif is_group: + search_client = CLIENTS["search_client_group"] + else: + search_client = CLIENTS["search_client_user"] + + # Upload in sub-batches of 32 to avoid request size limits + upload_batch_size = 32 + for i in range(0, len(chunk_documents), upload_batch_size): + sub_batch = chunk_documents[i:i + upload_batch_size] + 
search_client.upload_documents(documents=sub_batch) + + except Exception as e: + log_event(f"[save_chunks_batch] Error uploading batch to AI Search for document {document_id}: {e}", level=logging.ERROR) + raise + + return total_token_usage + def get_document_metadata_for_citations(document_id, user_id=None, group_id=None, public_workspace_id=None): """ Retrieve keywords and abstract from a document for creating metadata citations. @@ -4669,37 +4854,30 @@ def process_single_tabular_sheet(df, document_id, user_id, file_name, update_cal # Consider accumulating page count in the caller if needed. update_callback(number_of_pages=num_chunks_final) - # Save chunks, prepending the header to each + # Save chunks, prepending the header to each — use batch processing for speed + all_chunks = [] for idx, chunk_rows_content in enumerate(final_chunks_content, start=1): - # Prepend header - header length does not count towards chunk size limit chunk_with_header = header_string + chunk_rows_content - - update_callback( - current_file_chunk=idx, - status=f"Saving chunk {idx}/{num_chunks_final} from {file_name}..." - ) - - args = { + all_chunks.append({ "page_text_content": chunk_with_header, "page_number": idx, - "file_name": file_name, - "user_id": user_id, - "document_id": document_id - } + "file_name": file_name + }) - if is_public_workspace: - args["public_workspace_id"] = public_workspace_id - elif is_group: - args["group_id"] = group_id + if all_chunks: + update_callback( + current_file_chunk=1, + status=f"Batch processing {num_chunks_final} chunks from {file_name}..." 
+ ) - token_usage = save_chunks(**args) - total_chunks_saved += 1 - - # Accumulate embedding tokens - if token_usage: - total_embedding_tokens += token_usage.get('total_tokens', 0) - if not embedding_model_name: - embedding_model_name = token_usage.get('model_deployment_name') + batch_token_usage = save_chunks_batch( + all_chunks, user_id, document_id, + group_id=group_id, public_workspace_id=public_workspace_id + ) + total_chunks_saved = len(all_chunks) + if batch_token_usage: + total_embedding_tokens = batch_token_usage.get('total_tokens', 0) + embedding_model_name = batch_token_usage.get('model_deployment_name') return total_chunks_saved, total_embedding_tokens, embedding_model_name @@ -4729,63 +4907,93 @@ def process_tabular(document_id, user_id, temp_file_path, original_filename, fil args["group_id"] = group_id upload_to_blob(**args) + update_callback(enhanced_citations=True, status=f"Enhanced citations enabled for {file_ext}") - try: - if file_ext == '.csv': - # Process CSV - # Read CSV, attempt to infer header, keep data as string initially - df = pandas.read_csv( - temp_file_path, - keep_default_na=False, - dtype=str - ) - args = { - "df": df, - "document_id": document_id, - "user_id": user_id, + # When enhanced citations is on, index a single schema summary chunk + # instead of row-by-row chunking. The tabular processing plugin handles analysis. 
+ if enable_enhanced_citations: + try: + if file_ext == '.csv': + df_preview = pandas.read_csv(temp_file_path, keep_default_na=False, dtype=str, nrows=5) + full_df = pandas.read_csv(temp_file_path, keep_default_na=False, dtype=str) + row_count = len(full_df) + columns = [str(column) for column in df_preview.columns] + preview_rows = df_preview.head(5).to_string(index=False) + + schema_summary = ( + f"Tabular data file: {original_filename}\n" + f"Columns ({len(columns)}): {', '.join(columns)}\n" + f"Total rows: {row_count}\n" + f"Preview (first 5 rows):\n{preview_rows}\n\n" + f"This file is available for detailed analysis via the Tabular Processing plugin." + ) + elif file_ext in ('.xlsx', '.xls', '.xlsm'): + engine = 'openpyxl' if file_ext in ('.xlsx', '.xlsm') else 'xlrd' + excel_file = pandas.ExcelFile(temp_file_path, engine=engine) + workbook_sections = [] + + for sheet_name in excel_file.sheet_names: + df_preview = excel_file.parse(sheet_name, keep_default_na=False, dtype=str, nrows=3) + full_df = excel_file.parse(sheet_name, keep_default_na=False, dtype=str) + columns = [str(column) for column in df_preview.columns] + preview_rows = df_preview.head(3).to_string(index=False) + workbook_sections.append( + f"Sheet: {sheet_name}\n" + f"Columns ({len(columns)}): {', '.join(columns)}\n" + f"Total rows: {len(full_df)}\n" + f"Preview (first 3 rows):\n{preview_rows}" + ) + + schema_summary = ( + f"Tabular workbook: {original_filename}\n" + f"Sheets ({len(excel_file.sheet_names)}): {', '.join(excel_file.sheet_names)}\n\n" + + "\n\n".join(workbook_sections) + + "\n\nThis workbook is available for detailed analysis via the Tabular Processing plugin." 
+ ) + else: + raise ValueError(f"Unsupported tabular file type: {file_ext}") + + update_callback(number_of_pages=1, status=f"Indexing schema summary for {original_filename}...") + + save_args = { + "page_text_content": schema_summary, + "page_number": 1, "file_name": original_filename, - "update_callback": update_callback + "user_id": user_id, + "document_id": document_id } - if is_public_workspace: - args["public_workspace_id"] = public_workspace_id + save_args["public_workspace_id"] = public_workspace_id elif is_group: - args["group_id"] = group_id + save_args["group_id"] = group_id - result = process_single_tabular_sheet(**args) - if isinstance(result, tuple) and len(result) == 3: - chunks, tokens, model = result - total_chunks_saved = chunks - total_embedding_tokens += tokens - if not embedding_model_name: - embedding_model_name = model - else: - total_chunks_saved = result - - elif file_ext in ('.xlsx', '.xls', '.xlsm'): - # Process Excel (potentially multiple sheets) - excel_file = pandas.ExcelFile( - temp_file_path, - engine='openpyxl' if file_ext in ('.xlsx', '.xlsm') else 'xlrd' - ) - sheet_names = excel_file.sheet_names - base_name, ext = os.path.splitext(original_filename) - - accumulated_total_chunks = 0 - for sheet_name in sheet_names: - update_callback(status=f"Processing sheet '{sheet_name}'...") - # Read specific sheet, get values (not formulas), keep data as string - # Note: pandas typically reads values, not formulas by default. 
- df = excel_file.parse(sheet_name, keep_default_na=False, dtype=str) + token_usage = save_chunks(**save_args) + total_chunks_saved = 1 + if token_usage: + total_embedding_tokens = token_usage.get('total_tokens', 0) + embedding_model_name = token_usage.get('model_deployment_name') - # Create effective filename for this sheet - effective_filename = f"{base_name}-{sheet_name}{ext}" if len(sheet_names) > 1 else original_filename + # Don't return here — fall through to metadata extraction below + except Exception as e: + log_event(f"[process_tabular] Error creating schema summary, falling back to row-by-row: {e}", level=logging.WARNING) + # Fall through to existing row-by-row processing + # Only do row-by-row chunking if schema-only didn't produce chunks + if total_chunks_saved == 0: + try: + if file_ext == '.csv': + # Process CSV + # Read CSV, attempt to infer header, keep data as string initially + df = pandas.read_csv( + temp_file_path, + keep_default_na=False, + dtype=str + ) args = { "df": df, "document_id": document_id, "user_id": user_id, - "file_name": effective_filename, + "file_name": original_filename, "update_callback": update_callback } @@ -4797,21 +5005,62 @@ def process_tabular(document_id, user_id, temp_file_path, original_filename, fil result = process_single_tabular_sheet(**args) if isinstance(result, tuple) and len(result) == 3: chunks, tokens, model = result - accumulated_total_chunks += chunks + total_chunks_saved = chunks total_embedding_tokens += tokens if not embedding_model_name: embedding_model_name = model else: - accumulated_total_chunks += result + total_chunks_saved = result - total_chunks_saved = accumulated_total_chunks # Total across all sheets + elif file_ext in ('.xlsx', '.xls', '.xlsm'): + # Process Excel (potentially multiple sheets) + excel_file = pandas.ExcelFile( + temp_file_path, + engine='openpyxl' if file_ext in ('.xlsx', '.xlsm') else 'xlrd' + ) + sheet_names = excel_file.sheet_names + base_name, ext = 
os.path.splitext(original_filename) + accumulated_total_chunks = 0 + for sheet_name in sheet_names: + update_callback(status=f"Processing sheet '{sheet_name}'...") + # Read specific sheet, get values (not formulas), keep data as string + # Note: pandas typically reads values, not formulas by default. + df = excel_file.parse(sheet_name, keep_default_na=False, dtype=str) - except pandas.errors.EmptyDataError: - print(f"Warning: Tabular file or sheet is empty: {original_filename}") - update_callback(status=f"Warning: File/sheet is empty - {original_filename}", number_of_pages=0) - except Exception as e: - raise Exception(f"Failed processing Tabular file {original_filename}: {e}") + # Create effective filename for this sheet + effective_filename = f"{base_name}-{sheet_name}{ext}" if len(sheet_names) > 1 else original_filename + + args = { + "df": df, + "document_id": document_id, + "user_id": user_id, + "file_name": effective_filename, + "update_callback": update_callback + } + + if is_public_workspace: + args["public_workspace_id"] = public_workspace_id + elif is_group: + args["group_id"] = group_id + + result = process_single_tabular_sheet(**args) + if isinstance(result, tuple) and len(result) == 3: + chunks, tokens, model = result + accumulated_total_chunks += chunks + total_embedding_tokens += tokens + if not embedding_model_name: + embedding_model_name = model + else: + accumulated_total_chunks += result + + total_chunks_saved = accumulated_total_chunks # Total across all sheets + + except pandas.errors.EmptyDataError: + log_event(f"[process_tabular] Warning: Tabular file or sheet is empty: {original_filename}", level=logging.WARNING) + update_callback(status=f"Warning: File/sheet is empty - {original_filename}", number_of_pages=0) + except Exception as e: + raise Exception(f"Failed processing Tabular file {original_filename}: {e}") # Extract metadata if enabled and chunks were processed settings = get_settings() diff --git 
a/application/single_app/functions_global_actions.py b/application/single_app/functions_global_actions.py index 91f0d9f9..122ea9e8 100644 --- a/application/single_app/functions_global_actions.py +++ b/application/single_app/functions_global_actions.py @@ -11,6 +11,7 @@ import traceback from datetime import datetime from config import cosmos_global_actions_container +from functions_authentication import get_current_user_id from functions_keyvault import keyvault_plugin_save_helper, keyvault_plugin_get_helper, keyvault_plugin_delete_helper, SecretReturnType def get_global_actions(return_type=SecretReturnType.TRIGGER): @@ -60,27 +61,57 @@ def get_global_action(action_id, return_type=SecretReturnType.TRIGGER): return None -def save_global_action(action_data): +def save_global_action(action_data, user_id=None): """ Save or update a global action. Args: action_data (dict): Action data to save + user_id (str, optional): The user ID of the person performing the action Returns: dict: Saved action data or None if failed """ try: + if user_id is None: + user_id = get_current_user_id() + if not user_id: + user_id = "system" + # Ensure required fields if 'id' not in action_data: action_data['id'] = str(uuid.uuid4()) # Add metadata action_data['is_global'] = True - action_data['created_at'] = datetime.utcnow().isoformat() - action_data['updated_at'] = datetime.utcnow().isoformat() + now = datetime.utcnow().isoformat() + + # Check if this is a new action or an update to preserve created_by/created_at + existing_action = None + try: + existing_action = cosmos_global_actions_container.read_item( + item=action_data['id'], + partition_key=action_data['id'] + ) + except Exception: + pass + + if existing_action: + action_data['created_by'] = existing_action.get('created_by') or user_id + action_data['created_at'] = existing_action.get('created_at') or now + else: + action_data['created_by'] = user_id + action_data['created_at'] = now + action_data['modified_by'] = user_id + 
action_data['modified_at'] = now + action_data['updated_at'] = now print(f"💾 Saving global action: {action_data.get('name', 'Unknown')}") # Store secrets in Key Vault before upsert - action_data = keyvault_plugin_save_helper(action_data, scope_value=action_data.get('id'), scope="global") + action_data = keyvault_plugin_save_helper( + action_data, + scope_value=action_data.get('id'), + scope="global", + existing_plugin=existing_action, + ) result = cosmos_global_actions_container.upsert_item(body=action_data) print(f"✅ Global action saved successfully: {result['id']}") return result @@ -104,7 +135,7 @@ def delete_global_action(action_id): try: print(f"🗑️ Deleting global action: {action_id}") # Delete secrets from Key Vault before deleting the action - action = get_global_action(action_id) + action = get_global_action(action_id, return_type=SecretReturnType.NAME) if action: keyvault_plugin_delete_helper(action, scope_value=action_id, scope="global") cosmos_global_actions_container.delete_item( diff --git a/application/single_app/functions_global_agents.py b/application/single_app/functions_global_agents.py index 5cf6a3d4..87976510 100644 --- a/application/single_app/functions_global_agents.py +++ b/application/single_app/functions_global_agents.py @@ -163,25 +163,46 @@ def get_global_agent(agent_id): return None -def save_global_agent(agent_data): +def save_global_agent(agent_data, user_id=None): """ Save or update a global agent. 
Args: agent_data (dict): Agent data to save + user_id (str, optional): The user ID of the person performing the action Returns: dict: Saved agent data or None if failed """ try: - user_id = get_current_user_id() + if user_id is None: + user_id = get_current_user_id() cleaned_agent = sanitize_agent_payload(agent_data) if 'id' not in cleaned_agent: cleaned_agent['id'] = str(uuid.uuid4()) cleaned_agent['is_global'] = True cleaned_agent['is_group'] = False - cleaned_agent['created_at'] = datetime.utcnow().isoformat() - cleaned_agent['updated_at'] = datetime.utcnow().isoformat() + now = datetime.utcnow().isoformat() + + # Check if this is a new agent or an update to preserve created_by/created_at + existing_agent = None + try: + existing_agent = cosmos_global_agents_container.read_item( + item=cleaned_agent['id'], + partition_key=cleaned_agent['id'] + ) + except Exception: + pass + + if existing_agent: + cleaned_agent['created_by'] = existing_agent.get('created_by', user_id) + cleaned_agent['created_at'] = existing_agent.get('created_at', now) + else: + cleaned_agent['created_by'] = user_id + cleaned_agent['created_at'] = now + cleaned_agent['modified_by'] = user_id + cleaned_agent['modified_at'] = now + cleaned_agent['updated_at'] = now log_event( "Saving global agent.", extra={"agent_name": cleaned_agent.get('name', 'Unknown')}, diff --git a/application/single_app/functions_group_actions.py b/application/single_app/functions_group_actions.py index bc6aa4ea..450d34e5 100644 --- a/application/single_app/functions_group_actions.py +++ b/application/single_app/functions_group_actions.py @@ -82,14 +82,36 @@ def get_group_action( return _clean_action(action, group_id, return_type) -def save_group_action(group_id: str, action_data: Dict[str, Any]) -> Dict[str, Any]: +def save_group_action(group_id: str, action_data: Dict[str, Any], user_id: Optional[str] = None) -> Dict[str, Any]: """Create or update a group action entry.""" payload = dict(action_data) action_id = 
payload.get("id") or str(uuid.uuid4()) payload["id"] = action_id payload["group_id"] = group_id - payload["last_updated"] = datetime.utcnow().isoformat() + now = datetime.utcnow().isoformat() + payload["last_updated"] = now + + # Track who created/modified this action + existing_action = None + try: + existing_action = cosmos_group_actions_container.read_item( + item=action_id, + partition_key=group_id, + ) + except exceptions.CosmosResourceNotFoundError: + pass + except Exception: + pass + + if existing_action: + payload["created_by"] = existing_action.get("created_by", user_id) + payload["created_at"] = existing_action.get("created_at", now) + else: + payload["created_by"] = user_id + payload["created_at"] = now + payload["modified_by"] = user_id + payload["modified_at"] = now payload.setdefault("name", "") payload.setdefault("displayName", payload.get("name", "")) @@ -107,7 +129,12 @@ def save_group_action(group_id: str, action_data: Dict[str, Any]) -> Dict[str, A payload.pop("user_id", None) - payload = keyvault_plugin_save_helper(payload, scope_value=group_id, scope="group") + payload = keyvault_plugin_save_helper( + payload, + scope_value=group_id, + scope="group", + existing_plugin=existing_action, + ) try: stored = cosmos_group_actions_container.upsert_item(body=payload) diff --git a/application/single_app/functions_group_agents.py b/application/single_app/functions_group_agents.py index 8bf6f87c..7cbb8324 100644 --- a/application/single_app/functions_group_agents.py +++ b/application/single_app/functions_group_agents.py @@ -63,16 +63,38 @@ def get_group_agent(group_id: str, agent_id: str) -> Optional[Dict[str, Any]]: return None -def save_group_agent(group_id: str, agent_data: Dict[str, Any]) -> Dict[str, Any]: +def save_group_agent(group_id: str, agent_data: Dict[str, Any], user_id: Optional[str] = None) -> Dict[str, Any]: """Create or update a group agent entry.""" payload = sanitize_agent_payload(agent_data) agent_id = payload.get("id") or 
str(uuid.uuid4()) payload["id"] = agent_id payload["group_id"] = group_id - payload["last_updated"] = datetime.utcnow().isoformat() + now = datetime.utcnow().isoformat() + payload["last_updated"] = now payload["is_global"] = False payload["is_group"] = True + # Track who created/modified this agent + existing_agent = None + try: + existing_agent = cosmos_group_agents_container.read_item( + item=agent_id, + partition_key=group_id, + ) + except exceptions.CosmosResourceNotFoundError: + pass + except Exception: + pass + + if existing_agent: + payload["created_by"] = existing_agent.get("created_by", user_id) + payload["created_at"] = existing_agent.get("created_at", now) + else: + payload["created_by"] = user_id + payload["created_at"] = now + payload["modified_by"] = user_id + payload["modified_at"] = now + # Required/defaulted fields payload.setdefault("name", "") payload.setdefault("display_name", payload.get("name", "")) diff --git a/application/single_app/functions_keyvault.py b/application/single_app/functions_keyvault.py index 2094814f..fbf693c2 100644 --- a/application/single_app/functions_keyvault.py +++ b/application/single_app/functions_keyvault.py @@ -44,12 +44,126 @@ ] ui_trigger_word = "Stored_In_KeyVault" +SQL_PLUGIN_TYPES = {"sql_query", "sql_schema"} +SQL_PLUGIN_SENSITIVE_ADDITIONAL_FIELDS = {"connection_string", "password"} +SQL_PLUGIN_SENSITIVE_AUTH_FIELDS = {"client_secret"} +REDACTED_SECRET_VALUE = "***REDACTED***" class SecretReturnType(Enum): VALUE = "value" TRIGGER = "trigger" NAME = "name" + +def _get_nested_dict_value(data, path): + """Return a nested dictionary value, or None when the path is missing.""" + current = data + for key in path: + if not isinstance(current, dict) or key not in current: + return None + current = current.get(key) + return current + + +def _set_nested_dict_value(data, path, value): + """Set a nested dictionary value while preserving dictionary copies.""" + current = data + for key in path[:-1]: + nested = 
current.get(key) + if not isinstance(nested, dict): + nested = {} + else: + nested = dict(nested) + current[key] = nested + current = nested + current[path[-1]] = value + + +def _get_existing_secret_reference(existing_plugin, path): + """Return an existing Key Vault reference for the provided path, when present.""" + existing_value = _get_nested_dict_value(existing_plugin or {}, path) + if isinstance(existing_value, str) and validate_secret_name_dynamic(existing_value): + return existing_value + return None + + +def _build_plugin_additional_field_secret_name(plugin_name, field_name): + """Build a stable Key Vault secret base name for plugin additional fields.""" + return f"{plugin_name}-{field_name}".replace("__", "-") + + +def _is_sql_plugin(plugin_dict): + """Return True when the plugin manifest is a SQL action.""" + plugin_type = (plugin_dict or {}).get("type", "") + return isinstance(plugin_type, str) and plugin_type.lower() in SQL_PLUGIN_TYPES + + +def _is_sql_sensitive_additional_field(plugin_dict, field_name): + """Return True when the additional field should be treated as a SQL secret.""" + return _is_sql_plugin(plugin_dict) and field_name in SQL_PLUGIN_SENSITIVE_ADDITIONAL_FIELDS + + +def _store_plugin_secret_reference(updated_plugin, existing_plugin, path, secret_name, scope_value, source, scope): + """Store or preserve a plugin secret reference for the provided nested path.""" + value = _get_nested_dict_value(updated_plugin, path) + if not value: + return + + existing_reference = _get_existing_secret_reference(existing_plugin, path) + + if value == ui_trigger_word: + if existing_reference: + _set_nested_dict_value(updated_plugin, path, existing_reference) + return + _set_nested_dict_value( + updated_plugin, + path, + build_full_secret_name(secret_name, scope_value, source, scope), + ) + return + + if validate_secret_name_dynamic(value): + _set_nested_dict_value(updated_plugin, path, value) + return + + full_secret_name = store_secret_in_key_vault( + 
secret_name, + value, + scope_value, + source=source, + scope=scope, + ) + _set_nested_dict_value(updated_plugin, path, full_secret_name) + + +def redact_plugin_secret_values(plugin_dict, redaction_value=REDACTED_SECRET_VALUE): + """Return a copy of the plugin manifest with secret-bearing values redacted.""" + if not isinstance(plugin_dict, dict): + return plugin_dict + + redacted = dict(plugin_dict) + auth = redacted.get("auth", {}) + if isinstance(auth, dict): + new_auth = dict(auth) + if new_auth.get("key"): + new_auth["key"] = redaction_value + for auth_field in SQL_PLUGIN_SENSITIVE_AUTH_FIELDS: + if new_auth.get(auth_field): + new_auth[auth_field] = redaction_value + redacted["auth"] = new_auth + + additional_fields = redacted.get("additionalFields", {}) + if isinstance(additional_fields, dict): + new_additional_fields = dict(additional_fields) + for key, value in additional_fields.items(): + if not value: + continue + if key.endswith("__Secret") or _is_sql_sensitive_additional_field(redacted, key): + new_additional_fields[key] = redaction_value + redacted["additionalFields"] = new_additional_fields + + return redacted + def retrieve_secret_from_key_vault(secret_name, scope_value, scope="global", source="global"): """ Retrieve a secret from Key Vault using a dynamic name based on source, scope, and scope_value. @@ -66,10 +180,10 @@ def retrieve_secret_from_key_vault(secret_name, scope_value, scope="global", sou Exception: If retrieval fails or configuration is invalid. """ if source not in supported_sources: - logging.error(f"Source '{source}' is not supported. Supported sources: {supported_sources}") + log_event(f"Source '{source}' is not supported. Supported sources: {supported_sources}", level=logging.ERROR) raise ValueError(f"Source '{source}' is not supported. Supported sources: {supported_sources}") if scope not in supported_scopes: - logging.error(f"Scope '{scope}' is not supported. 
Supported scopes: {supported_scopes}") + log_event(f"Scope '{scope}' is not supported. Supported scopes: {supported_scopes}", level=logging.ERROR) raise ValueError(f"Scope '{scope}' is not supported. Supported scopes: {supported_scopes}") full_secret_name = build_full_secret_name(secret_name, scope_value, source, scope) @@ -104,12 +218,59 @@ def retrieve_secret_from_key_vault_by_full_name(full_secret_name): secret_client = SecretClient(vault_url=key_vault_url, credential=get_keyvault_credential()) retrieved_secret = secret_client.get_secret(full_secret_name) - print(f"Secret '{full_secret_name}' retrieved successfully from Key Vault.") + log_event(f"Secret '{full_secret_name}' retrieved successfully from Key Vault.", level=logging.INFO) return retrieved_secret.value except Exception as e: - logging.error(f"Failed to retrieve secret '{full_secret_name}' from Key Vault: {str(e)}") + log_event(f"Failed to retrieve secret '{full_secret_name}' from Key Vault: {str(e)}", level=logging.ERROR, exceptionTraceback=True) return full_secret_name +def retrieve_secret_direct(secret_name, settings=None): + """ + Retrieve a secret directly from Key Vault by its exact name, bypassing source/scope name + validation and the enable_key_vault_secret_storage guard. Use this for infrastructure + secrets (e.g. Redis key) where the secret name is arbitrary and not controlled by the + scope_value--source--scope--secret_name convention. + + Args: + secret_name (str): The exact Key Vault secret name. + settings (dict, optional): Settings dict to use directly. If None, falls back to + app_settings_cache.get_settings_cache(). Pass settings explicitly when calling + before the cache is initialised (e.g. during configure_app_cache bootstrap). + + Returns: + str: The secret value. + + Raises: + ValueError: If Key Vault is not configured in settings. + Exception: If the secret cannot be retrieved. + """ + # Use provided settings directly when supplied (e.g. 
during bootstrap before the + # settings cache is initialised), otherwise fall back to the cache. + if settings is None: + settings = app_settings_cache.get_settings_cache() + + + enable_key_vault_secret_storage = settings.get("enable_key_vault_secret_storage", False) + + if not enable_key_vault_secret_storage: + raise ValueError("Key Vault secret storage is not enabled in settings.") + + key_vault_name = settings.get("key_vault_name", "").strip() + if not key_vault_name: + raise ValueError("Key Vault name is not configured in settings (key_vault_name).") + if not secret_name: + raise ValueError("secret_name must not be empty.") + + try: + key_vault_url = f"https://{key_vault_name}{KEY_VAULT_DOMAIN}" + # Pass settings through so get_keyvault_credential doesn't call the uninitialised cache. + secret_client = SecretClient(vault_url=key_vault_url, credential=get_keyvault_credential(settings=settings)) + retrieved = secret_client.get_secret(secret_name) + log_event(f"Secret '{secret_name}' retrieved successfully from Key Vault.", level=logging.INFO) + return retrieved.value + except Exception as e: + log_event(f"Failed to retrieve secret '{secret_name}' from Key Vault: {str(e)}", level=logging.ERROR, exceptionTraceback=True) + raise def store_secret_in_key_vault(secret_name, secret_value, scope_value, source="global", scope="global"): """ @@ -130,32 +291,31 @@ def store_secret_in_key_vault(secret_name, secret_value, scope_value, source="gl settings = app_settings_cache.get_settings_cache() enable_key_vault_secret_storage = settings.get("enable_key_vault_secret_storage", False) if not enable_key_vault_secret_storage: - logging.warn(f"Key Vault secret storage is not enabled.") + log_event("Key Vault secret storage is not enabled.", level=logging.WARNING) return secret_value key_vault_name = settings.get("key_vault_name", None) if not key_vault_name: - logging.warn(f"Key Vault name is not configured.") + log_event("Key Vault name is not configured.", 
level=logging.WARNING) return secret_value if source not in supported_sources: - logging.error(f"Source '{source}' is not supported. Supported sources: {supported_sources}") + log_event(f"Source '{source}' is not supported. Supported sources: {supported_sources}", level=logging.ERROR) raise ValueError(f"Source '{source}' is not supported. Supported sources: {supported_sources}") if scope not in supported_scopes: - logging.error(f"Scope '{scope}' is not supported. Supported scopes: {supported_scopes}") + log_event(f"Scope '{scope}' is not supported. Supported scopes: {supported_scopes}", level=logging.ERROR) raise ValueError(f"Scope '{scope}' is not supported. Supported scopes: {supported_scopes}") - full_secret_name = build_full_secret_name(secret_name, scope_value, source, scope) try: key_vault_url = f"https://{key_vault_name}{KEY_VAULT_DOMAIN}" secret_client = SecretClient(vault_url=key_vault_url, credential=get_keyvault_credential()) secret_client.set_secret(full_secret_name, secret_value) - print(f"Secret '{full_secret_name}' stored successfully in Key Vault.") + log_event(f"Secret '{full_secret_name}' stored successfully in Key Vault.", level=logging.INFO) return full_secret_name except Exception as e: - logging.error(f"Failed to store secret '{full_secret_name}' in Key Vault: {str(e)}") + log_event(f"Failed to store secret '{full_secret_name}' in Key Vault: {str(e)}", level=logging.ERROR, exceptionTraceback=True) return secret_value def build_full_secret_name(secret_name, scope_value, source, scope): @@ -175,7 +335,7 @@ def build_full_secret_name(secret_name, scope_value, source, scope): """ full_secret_name = f"{clean_name_for_keyvault(scope_value)}--{source}--{scope}--{clean_name_for_keyvault(secret_name)}" if not validate_secret_name_dynamic(full_secret_name): - logging.error(f"The full secret name '{full_secret_name}' is invalid.") + log_event(f"The full secret name '{full_secret_name}' is invalid.", level=logging.ERROR) raise ValueError(f"The full secret 
name '{full_secret_name}' is invalid.") return full_secret_name @@ -240,10 +400,10 @@ def keyvault_agent_save_helper(agent_dict, scope_value, scope="global"): full_secret_name = store_secret_in_key_vault(secret_name, value, scope_value, source=source, scope=scope) updated[key] = full_secret_name except Exception as e: - logging.error(f"Failed to store agent key '{key}' in Key Vault: {e}") + log_event(f"Failed to store agent key '{key}' in Key Vault: {e}", level=logging.ERROR, exceptionTraceback=True) raise Exception(f"Failed to store agent key '{key}' in Key Vault: {e}") else: - log_event(f"Agent key '{key}' not found while APIM is '{use_apim}' or empty in agent '{agent_name}'. No action taken.", level="INFO") + log_event(f"Agent key '{key}' not found while APIM is '{use_apim}' or empty in agent '{agent_name}'. No action taken.", level=logging.INFO) return updated def keyvault_agent_get_helper(agent_dict, scope_value, scope="global", return_type=SecretReturnType.TRIGGER): @@ -283,19 +443,21 @@ def keyvault_agent_get_helper(agent_dict, scope_value, scope="global", return_ty else: updated[key] = ui_trigger_word except Exception as e: - logging.error(f"Failed to retrieve agent key '{key}' for agent '{agent_name}' from Key Vault: {e}") + log_event(f"Failed to retrieve agent key '{key}' for agent '{agent_name}' from Key Vault: {e}", level=logging.ERROR, exceptionTraceback=True) return updated return updated -def keyvault_plugin_save_helper(plugin_dict, scope_value, scope="global"): +def keyvault_plugin_save_helper(plugin_dict, scope_value, scope="global", existing_plugin=None): """ For plugin dicts, store the auth.key in Key Vault if auth.type is 'key', 'servicePrincipal', 'basic', or 'connection_string', - and replace its value with the Key Vault secret name. Also supports dynamic secret storage for any additionalFields key ending with '__Secret'. + and replace its value with the Key Vault secret name. 
Also supports dynamic secret storage for any additionalFields key ending with '__Secret', + along with SQL plugin secret-bearing additional fields such as connection strings and passwords. Args: plugin_dict (dict): The plugin dictionary to process. scope_value (str): The value for the scope (e.g., plugin id). scope (str): The scope (e.g., 'user', 'global'). + existing_plugin (dict, optional): Existing stored plugin manifest used to preserve Key Vault references during edit flows. Returns: dict: A new plugin dict with sensitive values replaced by Key Vault references. @@ -307,58 +469,98 @@ def keyvault_plugin_save_helper(plugin_dict, scope_value, scope="global"): This allows plugin writers to dynamically store secrets without custom code. """ if scope not in supported_scopes: - logging.error(f"Scope '{scope}' is not supported. Supported scopes: {supported_scopes}") + log_event(f"Scope '{scope}' is not supported. Supported scopes: {supported_scopes}", level=logging.ERROR) raise ValueError(f"Scope '{scope}' is not supported. 
Supported scopes: {supported_scopes}") source = "action" # Use 'action' for plugins per app convention updated = dict(plugin_dict) plugin_name = updated.get('name', 'plugin') auth = updated.get('auth', {}) if isinstance(auth, dict): + auth = dict(auth) + updated['auth'] = auth auth_type = auth.get('type', None) if auth_type in supported_action_auth_types and 'key' in auth and auth['key']: - value = auth['key'] - if value == ui_trigger_word: - auth['key'] = build_full_secret_name(plugin_name, scope_value, source, scope) - updated['auth'] = auth - elif validate_secret_name_dynamic(value): - auth['key'] = build_full_secret_name(plugin_name, scope_value, source, scope) - updated['auth'] = auth - else: + try: + _store_plugin_secret_reference( + updated, + existing_plugin, + ('auth', 'key'), + plugin_name, + scope_value, + source, + scope, + ) + except Exception as e: + log_event(f"Failed to store plugin key in Key Vault: {e}", level=logging.ERROR, exceptionTraceback=True) + raise Exception(f"Failed to store plugin key in Key Vault: {e}") + else: + log_event(f"Auth type '{auth_type}' does not require Key Vault storage for plugin '{plugin_name}'.", level=logging.INFO) + + for auth_field in SQL_PLUGIN_SENSITIVE_AUTH_FIELDS: + if auth.get(auth_field): try: - full_secret_name = store_secret_in_key_vault(plugin_name, value, scope_value, source=source, scope=scope) - new_auth = dict(auth) - new_auth['key'] = full_secret_name - updated['auth'] = new_auth + _store_plugin_secret_reference( + updated, + existing_plugin, + ('auth', auth_field), + f"{plugin_name}-{auth_field}", + scope_value, + source, + scope, + ) except Exception as e: - logging.error(f"Failed to store plugin key in Key Vault: {e}") - raise Exception(f"Failed to store plugin key in Key Vault: {e}") - else: - print(f"Auth type '{auth_type}' does not require Key Vault storage. 
Does not match ") + log_event( + f"Failed to store plugin auth secret '{auth_field}' in Key Vault: {e}", + level=logging.ERROR, + exceptionTraceback=True, + ) + raise Exception(f"Failed to store plugin auth secret '{auth_field}' in Key Vault: {e}") # Handle additionalFields dynamic secrets additional_fields = updated.get('additionalFields', {}) if isinstance(additional_fields, dict): new_additional_fields = dict(additional_fields) + updated['additionalFields'] = new_additional_fields for k, v in additional_fields.items(): - if k.endswith('__Secret') and v: + if not v: + continue + if k.endswith('__Secret'): addset_source = 'action-addset' base_field = k[:-8] # Remove '__Secret' - akv_key = f"{plugin_name}-{base_field}".replace('__', '-') - full_secret_name = build_full_secret_name(akv_key, scope_value, addset_source, scope) - if v == ui_trigger_word: - new_additional_fields[k] = full_secret_name - continue - elif validate_secret_name_dynamic(v): - new_additional_fields[k] = full_secret_name - continue - else: - try: - full_secret_name = store_secret_in_key_vault(akv_key, v, scope_value, source=addset_source, scope=scope) - new_additional_fields[k] = full_secret_name - except Exception as e: - logging.error(f"Failed to store plugin additionalField secret '{k}' in Key Vault: {e}") - raise Exception(f"Failed to store plugin additionalField secret '{k}' in Key Vault: {e}") - updated['additionalFields'] = new_additional_fields + akv_key = _build_plugin_additional_field_secret_name(plugin_name, base_field) + try: + _store_plugin_secret_reference( + updated, + existing_plugin, + ('additionalFields', k), + akv_key, + scope_value, + addset_source, + scope, + ) + except Exception as e: + log_event(f"Failed to store plugin additionalField secret '{k}' in Key Vault: {e}", level=logging.ERROR, exceptionTraceback=True) + raise Exception(f"Failed to store plugin additionalField secret '{k}' in Key Vault: {e}") + elif _is_sql_sensitive_additional_field(updated, k): + addset_source 
= 'action-addset' + akv_key = _build_plugin_additional_field_secret_name(plugin_name, k) + try: + _store_plugin_secret_reference( + updated, + existing_plugin, + ('additionalFields', k), + akv_key, + scope_value, + addset_source, + scope, + ) + except Exception as e: + log_event( + f"Failed to store SQL plugin additionalField secret '{k}' in Key Vault: {e}", + level=logging.ERROR, + exceptionTraceback=True, + ) + raise Exception(f"Failed to store SQL plugin additionalField secret '{k}' in Key Vault: {e}") return updated # Helper to retrieve plugin secrets from Key Vault def keyvault_plugin_get_helper(plugin_dict, scope_value, scope="global", return_type=SecretReturnType.TRIGGER): @@ -375,51 +577,45 @@ def keyvault_plugin_get_helper(plugin_dict, scope_value, scope="global", return_ dict: A new plugin dict with sensitive values replaced by ui_trigger_word if stored in Key Vault. """ if scope not in supported_scopes: - logging.error(f"Scope '{scope}' is not supported. Supported scopes: {supported_scopes}") + log_event(f"Scope '{scope}' is not supported. Supported scopes: {supported_scopes}", level=logging.ERROR) raise ValueError(f"Scope '{scope}' is not supported. 
Supported scopes: {supported_scopes}") updated = dict(plugin_dict) plugin_name = updated.get('name', 'plugin') auth = updated.get('auth', {}) if isinstance(auth, dict): - if 'key' in auth and auth['key']: - value = auth['key'] - if validate_secret_name_dynamic(value): + new_auth = dict(auth) + auth_updated = False + for auth_field in ('key', *SQL_PLUGIN_SENSITIVE_AUTH_FIELDS): + value = auth.get(auth_field) + if value and validate_secret_name_dynamic(value): try: if return_type == SecretReturnType.VALUE: - actual_key = retrieve_secret_from_key_vault_by_full_name(value) - new_auth = dict(auth) - new_auth['key'] = actual_key - updated['auth'] = new_auth + new_auth[auth_field] = retrieve_secret_from_key_vault_by_full_name(value) elif return_type == SecretReturnType.NAME: - new_auth = dict(auth) - new_auth['key'] = value - updated['auth'] = new_auth + new_auth[auth_field] = value else: - new_auth = dict(auth) - new_auth['key'] = ui_trigger_word - updated['auth'] = new_auth + new_auth[auth_field] = ui_trigger_word + auth_updated = True except Exception as e: - logging.error(f"Failed to retrieve action {plugin_name} key from Key Vault: {e}") - raise Exception(f"Failed to retrieve action {plugin_name} key from Key Vault: {e}") + log_event(f"Failed to retrieve action {plugin_name} auth field '{auth_field}' from Key Vault: {e}", level=logging.ERROR, exceptionTraceback=True) + raise Exception(f"Failed to retrieve action {plugin_name} auth field '{auth_field}' from Key Vault: {e}") + if auth_updated: + updated['auth'] = new_auth additional_fields = updated.get('additionalFields', {}) if isinstance(additional_fields, dict): new_additional_fields = dict(additional_fields) for k, v in additional_fields.items(): - if k.endswith('__Secret') and v and validate_secret_name_dynamic(v): - addset_source = 'action-addset' - base_field = k[:-8] # Remove '__Secret' - akv_key = f"{plugin_name}-{base_field}".replace('__', '-') + if (k.endswith('__Secret') or 
_is_sql_sensitive_additional_field(updated, k)) and v and validate_secret_name_dynamic(v): try: if return_type == SecretReturnType.VALUE: - actual_secret = retrieve_secret_from_key_vault(f"{akv_key}", scope_value, scope, addset_source) - new_additional_fields[k] = actual_secret + new_additional_fields[k] = retrieve_secret_from_key_vault_by_full_name(v) elif return_type == SecretReturnType.NAME: new_additional_fields[k] = v else: new_additional_fields[k] = ui_trigger_word except Exception as e: - logging.error(f"Failed to retrieve action additionalField secret '{k}' from Key Vault: {e}") + log_event(f"Failed to retrieve action additionalField secret '{k}' from Key Vault: {e}", level=logging.ERROR, exceptionTraceback=True) raise Exception(f"Failed to retrieve action additionalField secret '{k}' from Key Vault: {e}") updated['additionalFields'] = new_additional_fields return updated @@ -439,45 +635,41 @@ def keyvault_plugin_delete_helper(plugin_dict, scope_value, scope="global"): Raises: """ if scope not in supported_scopes: - log_event(f"Scope '{scope}' is not supported. Supported scopes: {supported_scopes}", level="WARNING") + log_event(f"Scope '{scope}' is not supported. Supported scopes: {supported_scopes}", level=logging.WARNING) raise ValueError(f"Scope '{scope}' is not supported. 
Supported scopes: {supported_scopes}") settings = app_settings_cache.get_settings_cache() enable_key_vault_secret_storage = settings.get("enable_key_vault_secret_storage", False) key_vault_name = settings.get("key_vault_name", None) if not enable_key_vault_secret_storage or not key_vault_name: - log_event(f"Key Vault secret storage is not enabled or key vault name is missing.", level="WARNING") + log_event("Key Vault secret storage is not enabled or key vault name is missing.", level=logging.WARNING) return plugin_dict source = "action" plugin_name = plugin_dict.get('name', 'plugin') auth = plugin_dict.get('auth', {}) if isinstance(auth, dict): - if 'key' in auth and auth['key']: - secret_name = auth['key'] - if validate_secret_name_dynamic(secret_name): + for auth_field in ('key', *SQL_PLUGIN_SENSITIVE_AUTH_FIELDS): + secret_name = auth.get(auth_field) + if secret_name and validate_secret_name_dynamic(secret_name): try: key_vault_url = f"https://{key_vault_name}{KEY_VAULT_DOMAIN}" - log_event(f"Deleting action secret '{secret_name}' for action '{plugin_name}' for '{scope}' '{scope_value}'", level="INFO") + log_event(f"Deleting action auth secret '{auth_field}' for action '{plugin_name}' for '{scope}' '{scope_value}'", level=logging.INFO) client = SecretClient(vault_url=key_vault_url, credential=get_keyvault_credential()) client.begin_delete_secret(secret_name) except Exception as e: - logging.error(f"Error deleting action secret '{secret_name}' for action '{plugin_name}': {e}") - raise Exception(f"Error deleting action secret '{secret_name}' for action '{plugin_name}': {e}") + log_event(f"Error deleting action auth secret '{auth_field}' for action '{plugin_name}': {e}", level=logging.ERROR, exceptionTraceback=True) + raise Exception(f"Error deleting action auth secret '{auth_field}' for action '{plugin_name}': {e}") additional_fields = plugin_dict.get('additionalFields', {}) if isinstance(additional_fields, dict): for k, v in additional_fields.items(): - if 
k.endswith('__Secret') and v and validate_secret_name_dynamic(v): - addset_source = 'action-addset' - base_field = k[:-8] # Remove '__Secret' - akv_key = f"{plugin_name}-{base_field}".replace('__', '-') + if (k.endswith('__Secret') or _is_sql_sensitive_additional_field(plugin_dict, k)) and v and validate_secret_name_dynamic(v): try: - keyvault_secret_name = build_full_secret_name(akv_key, scope_value, addset_source, scope) key_vault_url = f"https://{key_vault_name}{KEY_VAULT_DOMAIN}" - log_event(f"Deleting action additionalField secret '{k}' for action '{plugin_name}' for '{scope}' '{scope_value}'", level="INFO") + log_event(f"Deleting action additionalField secret '{k}' for action '{plugin_name}' for '{scope}' '{scope_value}'", level=logging.INFO) client = SecretClient(vault_url=key_vault_url, credential=get_keyvault_credential()) - client.begin_delete_secret(keyvault_secret_name) + client.begin_delete_secret(v) except Exception as e: - logging.error(f"Error deleting action additionalField secret '{k}' for action '{plugin_name}': {e}") + log_event(f"Error deleting action additionalField secret '{k}' for action '{plugin_name}': {e}", level=logging.ERROR, exceptionTraceback=True) raise Exception(f"Error deleting action additionalField secret '{k}' for action '{plugin_name}': {e}") return plugin_dict @@ -511,22 +703,29 @@ def keyvault_agent_delete_helper(agent_dict, scope_value, scope="global"): if validate_secret_name_dynamic(secret_name): try: key_vault_url = f"https://{key_vault_name}{KEY_VAULT_DOMAIN}" - log_event(f"Deleting agent secret '{secret_name}' for agent '{agent_name}' for '{scope}' '{scope_value}'", level="INFO") + log_event(f"Deleting agent secret '{secret_name}' for agent '{agent_name}' for '{scope}' '{scope_value}'", level=logging.INFO) client = SecretClient(vault_url=key_vault_url, credential=get_keyvault_credential()) client.begin_delete_secret(secret_name) except Exception as e: - logging.error(f"Error deleting secret '{secret_name}' for agent 
'{agent_name}': {e}") + log_event(f"Error deleting secret '{secret_name}' for agent '{agent_name}': {e}", level=logging.ERROR, exceptionTraceback=True) raise Exception(f"Error deleting secret '{secret_name}' for agent '{agent_name}': {e}") return agent_dict -def get_keyvault_credential(): +def get_keyvault_credential(settings=None): """ Get the Key Vault credential using DefaultAzureCredential, optionally with a managed identity client ID. + Args: + settings (dict, optional): Settings dict to use directly. If None, falls back to + app_settings_cache.get_settings_cache(). Pass settings explicitly when calling + before the cache is initialised (e.g. during configure_app_cache bootstrap). + Returns: DefaultAzureCredential: The credential object for Key Vault access. """ - settings = app_settings_cache.get_settings_cache() + if settings is None: + settings = app_settings_cache.get_settings_cache() + key_vault_identity = settings.get("key_vault_identity", None) if key_vault_identity is not None: credential = DefaultAzureCredential(managed_identity_client_id=key_vault_identity) diff --git a/application/single_app/functions_logging.py b/application/single_app/functions_logging.py index a78f413e..a011a7a8 100644 --- a/application/single_app/functions_logging.py +++ b/application/single_app/functions_logging.py @@ -5,7 +5,7 @@ def add_file_task_to_file_processing_log(document_id, user_id, content): settings = get_settings() - enable_file_processing_log = settings.get('enable_file_processing_log', True) + enable_file_processing_log = settings.get('enable_file_processing_logs', True) if enable_file_processing_log: try: diff --git a/application/single_app/functions_notifications.py b/application/single_app/functions_notifications.py index 15ce11e4..9bb4409c 100644 --- a/application/single_app/functions_notifications.py +++ b/application/single_app/functions_notifications.py @@ -31,6 +31,10 @@ 'icon': 'bi-file-earmark-check', 'color': 'success' }, + 'chat_response_complete': { 
+ 'icon': 'bi-chat-dots', + 'color': 'success' + }, 'document_processing_failed': { 'icon': 'bi-file-earmark-x', 'color': 'danger' @@ -218,6 +222,41 @@ def create_public_workspace_notification( ) +def create_chat_response_notification( + user_id, + conversation_id, + message_id, + conversation_title='', + response_preview='', +): + """Create a personal notification when a chat response completes.""" + normalized_title = str(conversation_title or '').strip() or 'Conversation' + normalized_preview = str(response_preview or '').strip() + if len(normalized_preview) > 160: + normalized_preview = f"{normalized_preview[:157]}..." + + notification_message = ( + normalized_preview + or f'The AI model responded in {normalized_title}.' + ) + + return create_notification( + user_id=user_id, + notification_type='chat_response_complete', + title=f'AI responded in {normalized_title}', + message=notification_message, + link_url=f'/chats?conversationId={conversation_id}', + link_context={ + 'workspace_type': 'personal', + 'conversation_id': conversation_id, + }, + metadata={ + 'conversation_id': conversation_id, + 'message_id': message_id, + } + ) + + def get_user_notifications(user_id, page=1, per_page=20, include_read=True, include_dismissed=False, user_roles=None): """ Fetch notifications visible to a user from personal, group, and public workspace scopes. 
@@ -452,6 +491,46 @@ def mark_notification_read(notification_id, user_id): return False +def mark_chat_response_notifications_read_for_conversation(user_id, conversation_id): + """Mark personal chat-completion notifications read for a conversation.""" + try: + query = """ + SELECT * FROM c + WHERE c.user_id = @user_id + AND c.notification_type = @notification_type + AND c.metadata.conversation_id = @conversation_id + """ + params = [ + {'name': '@user_id', 'value': user_id}, + {'name': '@notification_type', 'value': 'chat_response_complete'}, + {'name': '@conversation_id', 'value': conversation_id}, + ] + + notifications = list(cosmos_notifications_container.query_items( + query=query, + parameters=params, + partition_key=user_id + )) + + marked_count = 0 + for notification in notifications: + read_by = notification.get('read_by', []) + if user_id in read_by: + continue + + read_by.append(user_id) + notification['read_by'] = read_by + cosmos_notifications_container.upsert_item(notification) + marked_count += 1 + + return marked_count + except Exception as e: + debug_print( + f"Error marking chat response notifications as read for conversation {conversation_id}: {e}" + ) + return 0 + + def dismiss_notification(notification_id, user_id): """ Dismiss a notification for a specific user (adds to dismissed_by). 
diff --git a/application/single_app/functions_personal_actions.py b/application/single_app/functions_personal_actions.py index 6345438e..56d5e36f 100644 --- a/application/single_app/functions_personal_actions.py +++ b/application/single_app/functions_personal_actions.py @@ -109,19 +109,40 @@ def save_personal_action(user_id, action_data): try: # Check if an action with this name already exists existing_action = None + if action_data.get('id'): + existing_action = get_personal_action( + user_id, + action_data['id'], + return_type=SecretReturnType.NAME, + ) if 'name' in action_data and action_data['name']: - existing_action = get_personal_action(user_id, action_data['name']) + existing_action = existing_action or get_personal_action( + user_id, + action_data['name'], + return_type=SecretReturnType.NAME, + ) # Preserve existing ID if updating, or generate new ID if creating + now = datetime.utcnow().isoformat() if existing_action: - # Update existing action - preserve the original ID + # Update existing action - preserve the original ID and creation tracking action_data['id'] = existing_action['id'] + action_data['created_by'] = existing_action.get('created_by', user_id) + action_data['created_at'] = existing_action.get('created_at', now) elif 'id' not in action_data or not action_data['id']: # New action - generate UUID for ID action_data['id'] = str(uuid.uuid4()) - + action_data['created_by'] = user_id + action_data['created_at'] = now + else: + # Has an ID but no existing action found - treat as new + action_data['created_by'] = user_id + action_data['created_at'] = now + action_data['modified_by'] = user_id + action_data['modified_at'] = now + action_data['user_id'] = user_id - action_data['last_updated'] = datetime.utcnow().isoformat() + action_data['last_updated'] = now # Validate required fields required_fields = ['name', 'displayName', 'type', 'description'] @@ -145,7 +166,12 @@ def save_personal_action(user_id, action_data): action_data['auth']['type'] = 
'identity' # Store secrets in Key Vault before upsert - action_data = keyvault_plugin_save_helper(action_data, scope_value=user_id, scope="user") + action_data = keyvault_plugin_save_helper( + action_data, + scope_value=user_id, + scope="user", + existing_plugin=existing_action, + ) result = cosmos_personal_actions_container.upsert_item(body=action_data) # Remove Cosmos metadata from response cleaned_result = {k: v for k, v in result.items() if not k.startswith('_')} @@ -168,7 +194,7 @@ def delete_personal_action(user_id, action_id): """ try: # Try to find the action first to get the correct ID - action = get_personal_action(user_id, action_id) + action = get_personal_action(user_id, action_id, return_type=SecretReturnType.NAME) if not action: return False diff --git a/application/single_app/functions_personal_agents.py b/application/single_app/functions_personal_agents.py index a4a5e47d..3c6c275e 100644 --- a/application/single_app/functions_personal_agents.py +++ b/application/single_app/functions_personal_agents.py @@ -128,9 +128,33 @@ def save_personal_agent(user_id, agent_data): cleaned_agent.setdefault(field, '') if 'id' not in cleaned_agent: cleaned_agent['id'] = str(f"{user_id}_{cleaned_agent.get('name', 'default')}") - + + # Check if this is a new agent or an update to preserve created_by/created_at + existing_agent = None + try: + existing_agent = cosmos_personal_agents_container.read_item( + item=cleaned_agent['id'], + partition_key=user_id + ) + except exceptions.CosmosResourceNotFoundError: + pass + except Exception: + pass + + now = datetime.utcnow().isoformat() + if existing_agent: + # Preserve original creation tracking + cleaned_agent['created_by'] = existing_agent.get('created_by', user_id) + cleaned_agent['created_at'] = existing_agent.get('created_at', now) + else: + # New agent + cleaned_agent['created_by'] = user_id + cleaned_agent['created_at'] = now + cleaned_agent['modified_by'] = user_id + cleaned_agent['modified_at'] = now + 
cleaned_agent['user_id'] = user_id - cleaned_agent['last_updated'] = datetime.utcnow().isoformat() + cleaned_agent['last_updated'] = now cleaned_agent['is_global'] = False cleaned_agent['is_group'] = False diff --git a/application/single_app/functions_settings.py b/application/single_app/functions_settings.py index 8176939d..a282f409 100644 --- a/application/single_app/functions_settings.py +++ b/application/single_app/functions_settings.py @@ -25,6 +25,7 @@ def get_settings(use_cosmos=False): 'enable_text_plugin': True, 'enable_default_embedding_model_plugin': False, 'enable_fact_memory_plugin': True, + 'enable_tabular_processing_plugin': False, 'enable_multi_agent_orchestration': False, 'max_rounds_per_agent': 1, 'enable_semantic_kernel': False, @@ -205,6 +206,9 @@ def get_settings(use_cosmos=False): 'require_member_of_feedback_admin': False, 'enable_conversation_archiving': False, + # Processing Thoughts + 'enable_thoughts': False, + # Search and Extract 'azure_ai_search_endpoint': '', 'azure_ai_search_key': '', @@ -258,8 +262,12 @@ def get_settings(use_cosmos=False): # Other 'max_file_size_mb': 150, + 'tabular_preview_max_blob_size_mb': 200, 'conversation_history_limit': 10, 'default_system_prompt': '', + # Access denied message shown on the home page for signed-in users who lack required roles. + # Default is hard-coded; admins can override via Admin Settings (persisted in Cosmos DB). 
+ 'access_denied_message': 'You are logged in but do not have the required permissions to access this application.\nPlease contact an administrator for access.', 'enable_file_processing_logs': True, 'file_processing_logs_timer_enabled': False, 'file_timer_value': 1, @@ -268,7 +276,7 @@ def get_settings(use_cosmos=False): 'enable_external_healthcheck': False, # Streaming settings - 'streamingEnabled': False, + 'streamingEnabled': True, # Reasoning effort settings (per-model) 'reasoningEffortSettings': {}, @@ -391,6 +399,9 @@ def update_settings(new_settings): # always fetch the latest settings doc, which includes your merges settings_item = get_settings() settings_item.update(new_settings) + # Dependency enforcement: tabular processing requires enhanced citations + if not settings_item.get('enable_enhanced_citations', False): + settings_item['enable_tabular_processing_plugin'] = False cosmos_settings_container.upsert_item(settings_item) cache_updater = getattr(app_settings_cache, "update_settings_cache", None) if callable(cache_updater): diff --git a/application/single_app/functions_thoughts.py b/application/single_app/functions_thoughts.py new file mode 100644 index 00000000..c6ffe9dd --- /dev/null +++ b/application/single_app/functions_thoughts.py @@ -0,0 +1,256 @@ +# functions_thoughts.py + +import uuid +import time +from datetime import datetime, timezone +from config import cosmos_thoughts_container, cosmos_archived_thoughts_container, cosmos_messages_container +from functions_appinsights import log_event +from functions_settings import get_settings + + +class ThoughtTracker: + """Stateful per-request tracker that writes processing step records to Cosmos DB. + + Each add_thought() call immediately upserts a document so that polling + clients can see partial progress before the final response is sent. + + All Cosmos writes are wrapped in try/except so thought errors never + interrupt the chat processing flow. 
+ """ + + def __init__(self, conversation_id, message_id, thread_id, user_id): + self.conversation_id = conversation_id + self.message_id = message_id + self.thread_id = thread_id + self.user_id = user_id + self.current_index = 0 + settings = get_settings() + self.enabled = settings.get('enable_thoughts', False) + + def add_thought(self, step_type, content, detail=None): + """Write a thought step to Cosmos immediately. + + Args: + step_type: One of search, tabular_analysis, web_search, + agent_tool_call, generation, content_safety. + content: Short human-readable description of the step. + detail: Optional technical detail (function names, params, etc.). + + Returns: + The thought document id, or None if disabled/failed. + """ + if not self.enabled: + return None + + thought_id = str(uuid.uuid4()) + thought_doc = { + 'id': thought_id, + 'conversation_id': self.conversation_id, + 'message_id': self.message_id, + 'thread_id': self.thread_id, + 'user_id': self.user_id, + 'step_index': self.current_index, + 'step_type': step_type, + 'content': content, + 'detail': detail, + 'duration_ms': None, + 'timestamp': datetime.now(timezone.utc).isoformat() + } + self.current_index += 1 + + try: + cosmos_thoughts_container.upsert_item(thought_doc) + except Exception as e: + log_event(f"ThoughtTracker.add_thought failed: {e}", level="WARNING") + return None + + return thought_id + + def complete_thought(self, thought_id, duration_ms): + """Patch an existing thought with its duration after the step finishes.""" + if not self.enabled or not thought_id: + return + + try: + thought_doc = cosmos_thoughts_container.read_item( + item=thought_id, + partition_key=self.user_id + ) + thought_doc['duration_ms'] = duration_ms + cosmos_thoughts_container.upsert_item(thought_doc) + except Exception as e: + log_event(f"ThoughtTracker.complete_thought failed: {e}", level="WARNING") + + def timed_thought(self, step_type, content, detail=None): + """Convenience: add a thought and return a timer 
helper. + + Usage: + timer = tracker.timed_thought('search', 'Searching documents...') + # ... do work ... + timer.stop() + """ + start = time.time() + thought_id = self.add_thought(step_type, content, detail) + return _ThoughtTimer(self, thought_id, start) + + +class _ThoughtTimer: + """Helper returned by ThoughtTracker.timed_thought() for auto-duration capture.""" + + def __init__(self, tracker, thought_id, start_time): + self._tracker = tracker + self._thought_id = thought_id + self._start = start_time + + def stop(self): + elapsed_ms = int((time.time() - self._start) * 1000) + self._tracker.complete_thought(self._thought_id, elapsed_ms) + return elapsed_ms + + +# --------------------------------------------------------------------------- +# CRUD helpers +# --------------------------------------------------------------------------- + +def get_thoughts_for_message(conversation_id, message_id, user_id): + """Return all thoughts for a specific assistant message, ordered by step_index.""" + try: + query = ( + "SELECT * FROM c " + "WHERE c.conversation_id = @conv_id " + "AND c.message_id = @msg_id " + "ORDER BY c.step_index ASC" + ) + params = [ + {"name": "@conv_id", "value": conversation_id}, + {"name": "@msg_id", "value": message_id}, + ] + results = list(cosmos_thoughts_container.query_items( + query=query, + parameters=params, + partition_key=user_id + )) + return results + except Exception as e: + log_event(f"get_thoughts_for_message failed: {e}", level="WARNING") + return [] + + +def get_pending_thoughts(conversation_id, user_id): + """Return the latest thoughts for a conversation that are still in-progress. + + Used by the polling endpoint. Retrieves thoughts created within the last + 5 minutes for the conversation, grouped by the most recent message_id. 
+ """ + try: + five_minutes_ago = datetime.now(timezone.utc) + from datetime import timedelta + five_minutes_ago = (five_minutes_ago - timedelta(minutes=5)).isoformat() + + query = ( + "SELECT * FROM c " + "WHERE c.conversation_id = @conv_id " + "AND c.timestamp >= @since " + "ORDER BY c.timestamp DESC" + ) + params = [ + {"name": "@conv_id", "value": conversation_id}, + {"name": "@since", "value": five_minutes_ago}, + ] + results = list(cosmos_thoughts_container.query_items( + query=query, + parameters=params, + partition_key=user_id + )) + + if not results: + return [] + + # Group by the most recent message_id + latest_message_id = results[0].get('message_id') + latest_thoughts = [ + t for t in results if t.get('message_id') == latest_message_id + ] + # Return in ascending step_index order + latest_thoughts.sort(key=lambda t: t.get('step_index', 0)) + return latest_thoughts + except Exception as e: + log_event(f"get_pending_thoughts failed: {e}", level="WARNING") + return [] + + +def get_thoughts_for_conversation(conversation_id, user_id): + """Return all thoughts for a conversation.""" + try: + query = ( + "SELECT * FROM c " + "WHERE c.conversation_id = @conv_id " + "ORDER BY c.timestamp ASC" + ) + params = [ + {"name": "@conv_id", "value": conversation_id}, + ] + results = list(cosmos_thoughts_container.query_items( + query=query, + parameters=params, + partition_key=user_id + )) + return results + except Exception as e: + log_event(f"get_thoughts_for_conversation failed: {e}", level="WARNING") + return [] + + +def archive_thoughts_for_conversation(conversation_id, user_id): + """Copy all thoughts for a conversation to the archive container, then delete originals.""" + try: + thoughts = get_thoughts_for_conversation(conversation_id, user_id) + for thought in thoughts: + archived = dict(thought) + archived['archived_at'] = datetime.now(timezone.utc).isoformat() + cosmos_archived_thoughts_container.upsert_item(archived) + + for thought in thoughts: + 
cosmos_thoughts_container.delete_item( + item=thought['id'], + partition_key=user_id + ) + except Exception as e: + log_event(f"archive_thoughts_for_conversation failed: {e}", level="WARNING") + + +def delete_thoughts_for_conversation(conversation_id, user_id): + """Delete all thoughts for a conversation.""" + try: + thoughts = get_thoughts_for_conversation(conversation_id, user_id) + for thought in thoughts: + cosmos_thoughts_container.delete_item( + item=thought['id'], + partition_key=user_id + ) + except Exception as e: + log_event(f"delete_thoughts_for_conversation failed: {e}", level="WARNING") + + +def delete_thoughts_for_message(message_id, user_id): + """Delete all thoughts associated with a specific assistant message.""" + try: + query = ( + "SELECT * FROM c " + "WHERE c.message_id = @msg_id" + ) + params = [ + {"name": "@msg_id", "value": message_id}, + ] + results = list(cosmos_thoughts_container.query_items( + query=query, + parameters=params, + partition_key=user_id + )) + for thought in results: + cosmos_thoughts_container.delete_item( + item=thought['id'], + partition_key=user_id + ) + except Exception as e: + log_event(f"delete_thoughts_for_message failed: {e}", level="WARNING") diff --git a/application/single_app/gunicorn.conf.py b/application/single_app/gunicorn.conf.py new file mode 100644 index 00000000..8f7e3d5e --- /dev/null +++ b/application/single_app/gunicorn.conf.py @@ -0,0 +1,28 @@ +# gunicorn.conf.py +import os + + +def _env_int(name, default): + value = os.environ.get(name) + if value is None or value == '': + return default + + try: + return int(value) + except ValueError: + return default + + +bind = os.environ.get('GUNICORN_BIND', f"0.0.0.0:{os.environ.get('PORT', '5000')}") +worker_class = os.environ.get('GUNICORN_WORKER_CLASS', 'gthread') +workers = _env_int('GUNICORN_WORKERS', 2) +threads = _env_int('GUNICORN_THREADS', 8) +timeout = _env_int('GUNICORN_TIMEOUT', 900) +graceful_timeout = _env_int('GUNICORN_GRACEFUL_TIMEOUT', 60) 
+keepalive = _env_int('GUNICORN_KEEPALIVE', 75) +max_requests = _env_int('GUNICORN_MAX_REQUESTS', 500) +max_requests_jitter = _env_int('GUNICORN_MAX_REQUESTS_JITTER', 50) +accesslog = '-' +errorlog = '-' +capture_output = True +preload_app = False diff --git a/application/single_app/openapi_security.py b/application/single_app/openapi_security.py index 52b1751a..e61dfeda 100644 --- a/application/single_app/openapi_security.py +++ b/application/single_app/openapi_security.py @@ -1,35 +1,26 @@ """ OpenAPI File Security Validator -This module provides security validation for OpenAPI specification files -to prevent malicious content from being uploaded or processed. +This module provides security validation for uploaded OpenAPI specification +files to prevent malicious content from being uploaded or processed. """ import os import yaml import json -import tempfile -import requests import re -from typing import Dict, Any, List, Optional, Tuple -from urllib.parse import urlparse +from typing import Dict, Any, List, Tuple from werkzeug.utils import secure_filename class OpenApiSecurityValidator: - """Security validator for OpenAPI specification files and URLs.""" + """Security validator for uploaded OpenAPI specification files.""" # Maximum file size for OpenAPI specs (5MB) MAX_FILE_SIZE = 5 * 1024 * 1024 - # Maximum content size when fetching from URL (10MB) - MAX_URL_CONTENT_SIZE = 10 * 1024 * 1024 - # Allowed file extensions ALLOWED_EXTENSIONS = {'.yaml', '.yml', '.json'} - # Timeout for URL requests (30 seconds) - URL_TIMEOUT = 30 - # Dangerous patterns that should not appear in OpenAPI specs DANGEROUS_PATTERNS = [ # Code injection attempts @@ -96,43 +87,6 @@ def validate_filename(self, filename: str) -> Tuple[bool, str]: return True, "" - def validate_url(self, url: str) -> Tuple[bool, str]: - """Validate URL for security.""" - if not url: - return False, "URL is required" - - try: - parsed = urlparse(url) - - # Only allow HTTP/HTTPS - if parsed.scheme not in 
['http', 'https']: - return False, "Only HTTP and HTTPS URLs are allowed" - - # Block localhost and private networks - hostname = parsed.hostname - if not hostname: - return False, "Invalid hostname" - - # Block dangerous hostnames - blocked_hosts = [ - 'localhost', '127.0.0.1', '0.0.0.0', - '::1', '169.254.169.254' # AWS metadata service - ] - - if hostname.lower() in blocked_hosts: - return False, "Access to localhost and metadata services is not allowed" - - # Block private IP ranges (basic check) - if (hostname.startswith('10.') or - hostname.startswith('192.168.') or - hostname.startswith('172.')): - return False, "Access to private networks is not allowed" - - return True, "" - - except Exception as e: - return False, f"Invalid URL format: {str(e)}" - def scan_content_for_threats(self, content: str) -> Tuple[bool, List[str]]: """Scan content for dangerous patterns.""" threats = [] @@ -143,12 +97,10 @@ def scan_content_for_threats(self, content: str) -> Tuple[bool, List[str]]: return len(threats) == 0, threats - def validate_file_size(self, file_size: int, is_url: bool = False) -> Tuple[bool, str]: + def validate_file_size(self, file_size: int) -> Tuple[bool, str]: """Validate file size limits.""" - max_size = self.MAX_URL_CONTENT_SIZE if is_url else self.MAX_FILE_SIZE - - if file_size > max_size: - max_mb = max_size / (1024 * 1024) + if file_size > self.MAX_FILE_SIZE: + max_mb = self.MAX_FILE_SIZE / (1024 * 1024) return False, f"File size exceeds maximum allowed size of {max_mb}MB" return True, "" @@ -236,86 +188,6 @@ def validate_file_content(self, file_path: str) -> Tuple[bool, Dict[str, Any], s except Exception as e: return False, {}, f"Error validating file: {str(e)}" - def validate_url_content(self, url: str) -> Tuple[bool, Dict[str, Any], str]: - """Validate OpenAPI spec from URL.""" - try: - # Validate URL format - url_valid, url_error = self.validate_url(url) - if not url_valid: - return False, {}, url_error - - # Fetch content with security headers - 
headers = { - 'User-Agent': 'SimpleChat-OpenAPI-Validator/1.0', - 'Accept': 'application/json, application/x-yaml, text/yaml, text/plain', - 'Accept-Encoding': 'gzip, deflate' - } - - response = requests.get( - url, - headers=headers, - timeout=self.URL_TIMEOUT, - stream=True, - allow_redirects=True, - verify=True # Verify SSL certificates - ) - - response.raise_for_status() - - # Check content size before loading - content_length = response.headers.get('content-length') - if content_length and int(content_length) > self.MAX_URL_CONTENT_SIZE: - return False, {}, f"Content size exceeds maximum allowed size" - - # Read content with size limit - content = "" - total_size = 0 - for chunk in response.iter_content(chunk_size=8192, decode_unicode=True): - # chunk is already a string when decode_unicode=True - chunk_size = len(chunk.encode('utf-8')) if isinstance(chunk, str) else len(chunk) - total_size += chunk_size - if total_size > self.MAX_URL_CONTENT_SIZE: - return False, {}, "Content size exceeds maximum allowed size" - content += chunk - - # Validate content size - size_valid, size_error = self.validate_file_size(total_size, is_url=True) - if not size_valid: - return False, {}, size_error - - # Scan for dangerous patterns - safe, threats = self.scan_content_for_threats(content) - if not safe: - return False, {}, f"Security threats detected: {'; '.join(threats)}" - - # Parse content - content_type = response.headers.get('content-type', '').lower() - try: - if 'yaml' in content_type or url.endswith(('.yaml', '.yml')): - spec = yaml.safe_load(content) - else: - spec = json.loads(content) - except (yaml.YAMLError, json.JSONDecodeError) as e: - return False, {}, f"Invalid content format: {str(e)}" - - # Validate OpenAPI structure - structure_valid, structure_error = self.validate_openapi_structure(spec) - if not structure_valid: - return False, {}, structure_error - - return True, spec, "" - - except requests.exceptions.Timeout: - return False, {}, "Request timeout - URL 
took too long to respond" - except requests.exceptions.SSLError: - return False, {}, "SSL certificate verification failed" - except requests.exceptions.ConnectionError: - return False, {}, "Connection error - unable to reach URL" - except requests.exceptions.HTTPError as e: - return False, {}, f"HTTP error: {e.response.status_code}" - except Exception as e: - return False, {}, f"Error fetching URL content: {str(e)}" - def create_safe_filename(self, original_filename: str) -> str: """Create a safe filename for storage.""" # Use werkzeug's secure_filename but ensure we keep the extension @@ -344,11 +216,6 @@ def validate_openapi_file(file_path: str) -> Tuple[bool, Dict[str, Any], str]: return openapi_validator.validate_file_content(file_path) -def validate_openapi_url(url: str) -> Tuple[bool, Dict[str, Any], str]: - """Convenience function to validate an OpenAPI spec from URL.""" - return openapi_validator.validate_url_content(url) - - def is_safe_openapi_filename(filename: str) -> bool: """Quick check if filename is safe for OpenAPI specs.""" valid, _ = openapi_validator.validate_filename(filename) diff --git a/application/single_app/requirements.txt b/application/single_app/requirements.txt index c8156eab..ac378dde 100644 --- a/application/single_app/requirements.txt +++ b/application/single_app/requirements.txt @@ -38,7 +38,7 @@ langchain-text-splitters==0.3.9 beautifulsoup4==4.13.3 openpyxl==3.1.5 xlrd==2.0.1 -pillow==11.1.0 +pillow==12.1.1 ffmpeg-binaries-compat==1.0.1 ffmpeg-python==0.2.0 semantic-kernel>=1.39.4 diff --git a/application/single_app/route_backend_agents.py b/application/single_app/route_backend_agents.py index 57097ee5..2f631af7 100644 --- a/application/single_app/route_backend_agents.py +++ b/application/single_app/route_backend_agents.py @@ -23,6 +23,11 @@ from functions_appinsights import log_event from json_schema_validation import validate_agent from swagger_wrapper import swagger_route, get_auth_security +from functions_activity_logging 
import ( + log_agent_creation, + log_agent_update, + log_agent_deletion, +) bpa = Blueprint('admin_agents', __name__) @@ -147,6 +152,18 @@ def set_user_agents(): for agent_name in agents_to_delete: delete_personal_agent(user_id, agent_name) + # Log individual agent activities + for agent in filtered_agents: + a_name = agent.get('name', '') + a_id = agent.get('id', '') + a_display = agent.get('display_name', a_name) + if a_name in current_agent_names: + log_agent_update(user_id=user_id, agent_id=a_id, agent_name=a_name, agent_display_name=a_display, scope='personal') + else: + log_agent_creation(user_id=user_id, agent_id=a_id, agent_name=a_name, agent_display_name=a_display, scope='personal') + for agent_name in agents_to_delete: + log_agent_deletion(user_id=user_id, agent_id=agent_name, agent_name=agent_name, scope='personal') + log_event("User agents updated", extra={"user_id": user_id, "agents_count": len(filtered_agents)}) return jsonify({'success': True}) @@ -175,6 +192,9 @@ def delete_user_agent(agent_name): # Delete from personal_agents container delete_personal_agent(user_id, agent_name) + # Log agent deletion activity + log_agent_deletion(user_id=user_id, agent_id=agent_to_delete.get('id', agent_name), agent_name=agent_name, scope='personal') + # Check if there are any agents left and if they match global_selected_agent remaining_agents = get_personal_agents(user_id) if len(remaining_agents) > 0: @@ -270,11 +290,12 @@ def create_group_agent_route(): cleaned_payload.pop(key, None) try: - saved = save_group_agent(active_group, cleaned_payload) + saved = save_group_agent(active_group, cleaned_payload, user_id=user_id) except Exception as exc: debug_print('Failed to save group agent: %s', exc) return jsonify({'error': 'Unable to save agent'}), 500 + log_agent_creation(user_id=user_id, agent_id=saved.get('id', ''), agent_name=saved.get('name', ''), agent_display_name=saved.get('display_name', ''), scope='group', group_id=active_group) return jsonify(saved), 201 
@@ -325,11 +346,12 @@ def update_group_agent_route(agent_id): return jsonify({'error': str(exc)}), 400 try: - saved = save_group_agent(active_group, cleaned_payload) + saved = save_group_agent(active_group, cleaned_payload, user_id=user_id) except Exception as exc: debug_print('Failed to update group agent %s: %s', agent_id, exc) return jsonify({'error': 'Unable to update agent'}), 500 + log_agent_update(user_id=user_id, agent_id=agent_id, agent_name=saved.get('name', ''), agent_display_name=saved.get('display_name', ''), scope='group', group_id=active_group) return jsonify(saved), 200 @@ -360,6 +382,7 @@ def delete_group_agent_route(agent_id): if not removed: return jsonify({'error': 'Agent not found'}), 404 + log_agent_deletion(user_id=user_id, agent_id=agent_id, agent_name=agent_id, scope='group', group_id=active_group) return jsonify({'message': 'Agent deleted'}), 200 # User endpoint to set selected agent (new model, not legacy default_agent) @@ -504,10 +527,11 @@ def add_agent(): cleaned_agent['id'] = '15b0c92a-741d-42ff-ba0b-367c7ee0c848' # Save to global agents container - result = save_global_agent(cleaned_agent) + result = save_global_agent(cleaned_agent, user_id=str(get_current_user_id())) if not result: return jsonify({'error': 'Failed to save agent.'}), 500 + log_agent_creation(user_id=str(get_current_user_id()), agent_id=cleaned_agent.get('id', ''), agent_name=cleaned_agent.get('name', ''), agent_display_name=cleaned_agent.get('display_name', ''), scope='global') log_event("Agent added", extra={"action": "add", "agent": {k: v for k, v in cleaned_agent.items() if k != 'id'}, "user": str(get_current_user_id())}) # --- HOT RELOAD TRIGGER --- setattr(builtins, "kernel_reload_needed", True) @@ -615,10 +639,11 @@ def edit_agent(agent_name): return jsonify({'error': 'Agent not found.'}), 404 # Save the updated agent - result = save_global_agent(cleaned_agent) + result = save_global_agent(cleaned_agent, user_id=str(get_current_user_id())) if not result: return 
jsonify({'error': 'Failed to save agent.'}), 500 + log_agent_update(user_id=str(get_current_user_id()), agent_id=cleaned_agent.get('id', ''), agent_name=agent_name, agent_display_name=cleaned_agent.get('display_name', ''), scope='global') log_event( f"Agent {agent_name} edited", extra={ @@ -660,6 +685,7 @@ def delete_agent(agent_name): if not success: return jsonify({'error': 'Failed to delete agent.'}), 500 + log_agent_deletion(user_id=str(get_current_user_id()), agent_id=agent_to_delete.get('id', ''), agent_name=agent_name, scope='global') log_event("Agent deleted", extra={"action": "delete", "agent_name": agent_name, "user": str(get_current_user_id())}) # --- HOT RELOAD TRIGGER --- setattr(builtins, "kernel_reload_needed", True) diff --git a/application/single_app/route_backend_chats.py b/application/single_app/route_backend_chats.py index e452fed4..33ce24aa 100644 --- a/application/single_app/route_backend_chats.py +++ b/application/single_app/route_backend_chats.py @@ -13,10 +13,13 @@ import asyncio, types import ast import json +import os +import queue import re +import threading from typing import Any, Dict, List, Mapping, Optional from config import * -from flask import g +from flask import Response, copy_current_request_context, g, stream_with_context from functions_authentication import * from functions_search import * from functions_settings import * @@ -24,28 +27,2096 @@ from functions_group import find_group_by_id, get_user_role_in_group from functions_chat import * from functions_conversation_metadata import collect_conversation_metadata, update_conversation_with_metadata +from functions_conversation_unread import mark_conversation_unread from functions_debug import debug_print +from functions_notifications import create_chat_response_notification from functions_activity_logging import log_chat_activity, log_conversation_creation, log_token_usage from flask import current_app from swagger_wrapper import swagger_route, get_auth_security +from 
functions_thoughts import ThoughtTracker + + +def get_tabular_discovery_function_names(): + """Return discovery-oriented tabular function names from the plugin.""" + from semantic_kernel_plugins.tabular_processing_plugin import TabularProcessingPlugin + + return TabularProcessingPlugin.get_discovery_function_names() + + +def get_tabular_analysis_function_names(): + """Return analytical tabular function names from the plugin.""" + from semantic_kernel_plugins.tabular_processing_plugin import TabularProcessingPlugin + + return TabularProcessingPlugin.get_analysis_function_names() + + +def get_tabular_thought_excluded_parameter_names(): + """Return tabular parameter names hidden from thought details.""" + from semantic_kernel_plugins.tabular_processing_plugin import TabularProcessingPlugin + + return TabularProcessingPlugin.get_thought_excluded_parameter_names() + + +def is_tabular_schema_summary_question(user_question): + """Return True for workbook-structure questions that should use schema summary tooling.""" + normalized_question = re.sub(r'\s+', ' ', str(user_question or '').strip().lower()) + if not normalized_question: + return False + + direct_phrases = ( + 'summarize this workbook', + 'summarize the workbook', + 'describe this workbook', + 'describe the workbook', + 'what worksheets', + 'which worksheets', + 'what sheets', + 'which sheets', + 'what tabs', + 'which tabs', + 'what does each worksheet represent', + 'what does each sheet represent', + 'what does each tab represent', + 'what do the worksheets represent', + 'what do the sheets represent', + 'how are they related', + 'how do they relate', + 'workbook schema', + 'worksheet schema', + 'sheet schema', + ) + if any(phrase in normalized_question for phrase in direct_phrases): + return True + + structure_patterns = ( + r'\bwhich sheet\b.*\b(contain|contains|has|holds)\b', + r'\bwhat sheet\b.*\b(contain|contains|has|holds)\b', + r'\bhow (are|do)\b.*\b(worksheets|sheets|tabs)\b.*\b(relate|related)\b', + ) + 
return any(re.search(pattern, normalized_question) for pattern in structure_patterns) + + +def is_tabular_entity_lookup_question(user_question): + """Return True for cross-sheet entity lookup questions that need related-record traversal.""" + normalized_question = re.sub(r'\s+', ' ', str(user_question or '').strip().lower()) + if not normalized_question or is_tabular_schema_summary_question(normalized_question): + return False + + direct_phrases = ( + 'find taxpayer', + 'find return', + 'show their profile', + 'related records', + 'full story', + 'case history', + ) + relationship_keywords = ( + 'profile', + 'tax return summary', + 'w-2', + 'w2', + '1099', + 'payment', + 'refund', + 'notice', + 'audit', + 'installment agreement', + 'installment', + 'related', + ) + if any(phrase in normalized_question for phrase in direct_phrases) and any( + keyword in normalized_question for keyword in relationship_keywords + ): + return True + + entity_lookup_patterns = ( + r'\bfind\b.*\b(show|summarize|explain)\b.*\b(profile|related|record|records)\b', + r'\b(show|summarize)\b.*\b(profile|related|record|records)\b.*\b(w-2|w2|1099|payment|refund|notice|audit|installment)\b', + ) + return any(re.search(pattern, normalized_question) for pattern in entity_lookup_patterns) + + +def is_tabular_cross_sheet_bridge_question(user_question): + """Return True for grouped analytical questions that may need multiple worksheets.""" + normalized_question = re.sub(r'\s+', ' ', str(user_question or '').strip().lower()) + if ( + not normalized_question + or is_tabular_schema_summary_question(normalized_question) + or is_tabular_entity_lookup_question(normalized_question) + ): + return False + + aggregate_keywords = ( + 'how many', + 'count', + 'counts', + 'total', + 'totals', + 'sum', + 'average', + 'avg', + 'minimum', + 'maximum', + 'min', + 'max', + ) + grouping_patterns = ( + r'\bfor each\b', + r'\beach\b', + r'\bper\b', + r'\bby\b\s+[a-z0-9_\-]+(?:\s+[a-z0-9_\-]+){0,2}', + ) + + return 
any(keyword in normalized_question for keyword in aggregate_keywords) and any( + re.search(pattern, normalized_question) for pattern in grouping_patterns + ) + + +def get_tabular_execution_mode(user_question): + """Select the tabular orchestration mode for the user's question.""" + if is_tabular_schema_summary_question(user_question): + return 'schema_summary' + if is_tabular_entity_lookup_question(user_question): + return 'entity_lookup' + return 'analysis' + + +def build_tabular_fallback_system_message(tabular_filenames_str, execution_mode='analysis'): + """Build the final GPT fallback guidance after the mini SK pass fails.""" + if execution_mode == 'schema_summary': + return ( + f"IMPORTANT: The selected workspace tabular file(s) are {tabular_filenames_str}. " + "The search results include a workbook schema summary with worksheet names, columns, and sample rows, but they do not include the full data. " + "For workbook-structure questions such as what worksheets exist, what each worksheet represents, and how the sheets relate, answer from the schema summary only. " + "Do not mention running additional plugin tools or performing calculations that were not completed. " + "If a relationship is only implied by shared columns or names, describe it as an inferred relationship rather than a confirmed join." + ) + + return ( + f"IMPORTANT: The selected workspace tabular file(s) are {tabular_filenames_str}. " + "The prior tabular tool pass could not compute tool-backed results. " + "The search results contain only a schema summary (column names and a few sample rows), NOT the full data. " + "Answer cautiously using only the schema summary already provided. " + "Do not invent numeric totals, claim that full-data analysis succeeded, or mention additional plugin calls that were not completed. " + "If the user's question requires computed values that are not present in the schema summary, say that the computation could not be completed from the available tool results." 
+ ) + + +def build_search_augmentation_system_prompt(retrieved_content): + """Build the retrieval augmentation prompt without blocking later tool-backed results.""" + return f"""You are an AI assistant. Use the following retrieved document excerpts to answer the user's question. Cite sources using the format (Source: filename, Page: page number). + + Retrieved Excerpts: + {retrieved_content} + + Base your answer only on information supported by the retrieved excerpts and any computed tool-backed results included elsewhere in this conversation context. If the answer is not supported by that information, say so. + If computed tabular results are provided in another system message, treat them as authoritative for row-level values, calculations, and numeric conclusions. Do not say that you lack direct access to the data when those computed results are present. + + Example + User: What is the policy on double dipping? + Assistant: The policy prohibits entities from using federal funds received through one program to apply for additional funds through another program, commonly known as 'double dipping' (Source: PolicyDocument.pdf, Page: 12) + """ + + +def build_tabular_computed_results_system_message(source_label, tabular_analysis): + """Build the outer-model handoff message for successful tabular analysis.""" + return ( + f"The following tabular results were computed from {source_label} using " + f"tabular_processing plugin functions:\n\n" + f"{tabular_analysis}\n\n" + "These are tool-backed results derived from the full underlying tabular data, not just retrieved schema excerpts. " + "Treat them as authoritative for row-level facts, calculations, and numeric conclusions. " + "Do not say that you lack direct access to the data if the answer is present in these computed results." 
+ ) def get_kernel(): return getattr(g, 'kernel', None) or getattr(builtins, 'kernel', None) -def get_kernel_agents(): - g_agents = getattr(g, 'kernel_agents', None) - builtins_agents = getattr(builtins, 'kernel_agents', None) - log_event(f"[SKChat] get_kernel_agents - g.kernel_agents: {type(g_agents)} ({len(g_agents) if g_agents else 0} agents), builtins.kernel_agents: {type(builtins_agents)} ({len(builtins_agents) if builtins_agents else 0} agents)", level=logging.INFO) - return g_agents or builtins_agents + +def get_kernel_agents(): + g_agents = getattr(g, 'kernel_agents', None) + builtins_agents = getattr(builtins, 'kernel_agents', None) + log_event(f"[SKChat] get_kernel_agents - g.kernel_agents: {type(g_agents)} ({len(g_agents) if g_agents else 0} agents), builtins.kernel_agents: {type(builtins_agents)} ({len(builtins_agents) if builtins_agents else 0} agents)", level=logging.INFO) + return g_agents or builtins_agents + + +def is_personal_chat_conversation(conversation_item): + """Return True when a conversation belongs to personal chat scope.""" + chat_type = str((conversation_item or {}).get('chat_type') or '').strip().lower() + return not chat_type.startswith('group') and not chat_type.startswith('public') + + +class BackgroundStreamBridge: + """Relay SSE events from a background worker to the active HTTP stream.""" + + def __init__(self, max_queue_size=200): + self._queue = queue.Queue(maxsize=max_queue_size) + self._sentinel = object() + self._consumer_attached = True + self._state_lock = threading.Lock() + + def push(self, event): + """Queue an SSE event unless the consumer has already detached.""" + while True: + with self._state_lock: + consumer_attached = self._consumer_attached + + if not consumer_attached: + return False + + try: + self._queue.put(event, timeout=0.25) + return True + except queue.Full: + continue + + def finish(self): + """Signal stream completion to the active consumer.""" + while True: + with self._state_lock: + consumer_attached 
= self._consumer_attached + + if not consumer_attached: + return + + try: + self._queue.put(self._sentinel, timeout=0.25) + return + except queue.Full: + continue + + def iter_events(self): + """Yield queued SSE events until the worker finishes.""" + while True: + next_item = self._queue.get() + if next_item is self._sentinel: + break + yield next_item + + def detach_consumer(self): + """Stop queueing new events once the HTTP consumer disconnects.""" + with self._state_lock: + already_detached = not self._consumer_attached + self._consumer_attached = False + + if already_detached: + return + + while True: + try: + self._queue.get_nowait() + except queue.Empty: + break + + +def get_new_plugin_invocations(invocations, baseline_count): + """Return only the plugin invocations created after the baseline count.""" + if not invocations: + return [] + + if baseline_count <= 0: + return list(invocations) + + if baseline_count >= len(invocations): + return [] + + return list(invocations[baseline_count:]) + + +def split_tabular_plugin_invocations(invocations): + """Split tabular plugin invocations into discovery and analytical categories.""" + discovery_invocations = [] + analytical_invocations = [] + other_invocations = [] + + for invocation in invocations or []: + function_name = getattr(invocation, 'function_name', '') + + if function_name in get_tabular_discovery_function_names(): + discovery_invocations.append(invocation) + elif function_name in get_tabular_analysis_function_names(): + analytical_invocations.append(invocation) + else: + other_invocations.append(invocation) + + return discovery_invocations, analytical_invocations, other_invocations + + +def get_tabular_invocation_result_payload(invocation): + """Parse a tabular invocation result payload when it is JSON-like.""" + result = getattr(invocation, 'result', None) + if isinstance(result, dict): + return result + if not isinstance(result, str): + return None + + try: + payload = json.loads(result) + except 
Exception: + return None + + return payload if isinstance(payload, dict) else None + + +def get_tabular_invocation_error_message(invocation): + """Return an error message for a tabular invocation, including JSON error payloads.""" + explicit_error_message = getattr(invocation, 'error_message', None) + if explicit_error_message: + return str(explicit_error_message) + + result_payload = get_tabular_invocation_result_payload(invocation) + if result_payload and result_payload.get('error'): + return str(result_payload['error']) + + return None + + +def get_tabular_invocation_candidate_sheets(invocation): + """Return candidate workbook sheets suggested by a tabular tool error payload.""" + result_payload = get_tabular_invocation_result_payload(invocation) + candidate_sheets = result_payload.get('candidate_sheets') if result_payload else None + if not isinstance(candidate_sheets, list): + return [] + + normalized_candidate_sheets = [] + seen_candidate_sheets = set() + for candidate_sheet in candidate_sheets: + normalized_candidate_sheet = str(candidate_sheet or '').strip() + if not normalized_candidate_sheet: + continue + + lowercase_candidate_sheet = normalized_candidate_sheet.lower() + if lowercase_candidate_sheet in seen_candidate_sheets: + continue + + seen_candidate_sheets.add(lowercase_candidate_sheet) + normalized_candidate_sheets.append(normalized_candidate_sheet) + + return normalized_candidate_sheets + + +def get_tabular_invocation_selected_sheet(invocation): + """Return the resolved sheet used by a tabular invocation when available.""" + result_payload = get_tabular_invocation_result_payload(invocation) or {} + invocation_parameters = getattr(invocation, 'parameters', {}) or {} + + selected_sheet = str( + result_payload.get('selected_sheet') + or invocation_parameters.get('sheet_name') + or '' + ).strip() + return selected_sheet or None + + +def get_tabular_invocation_data_rows(invocation): + """Return tabular result rows when the invocation payload includes 
them.""" + result_payload = get_tabular_invocation_result_payload(invocation) or {} + rows = result_payload.get('data') + return rows if isinstance(rows, list) else [] + + +def normalize_tabular_overlap_value(value): + """Normalize row identifier values so they can be intersected reliably.""" + if isinstance(value, (dict, list, tuple)): + return json.dumps(value, sort_keys=True, default=str) + if value is None: + return None + return str(value) + + +def get_tabular_overlap_identifier_column(row_sets): + """Return a shared identifier column suitable for intersecting row sets.""" + common_columns = None + + for rows in row_sets or []: + if not rows: + return None + + row_columns = set() + for row in rows: + if not isinstance(row, dict): + continue + row_columns.update(str(column_name) for column_name in row.keys()) + + if not row_columns: + return None + + if common_columns is None: + common_columns = row_columns + else: + common_columns &= row_columns + + if not common_columns: + return None + + identifier_candidates = [ + column_name for column_name in common_columns + if column_name.lower() == 'id' or column_name.lower().endswith('id') + ] + if not identifier_candidates: + return None + + preferred_order = { + 'flightid': 0, + 'returnid': 1, + 'taxpayerid': 2, + 'paymentid': 3, + 'caseid': 4, + 'accountid': 5, + 'recordid': 6, + 'id': 7, + } + + return sorted( + identifier_candidates, + key=lambda column_name: ( + preferred_order.get(column_name.lower(), 99), + column_name.lower(), + ), + )[0] + + +def describe_tabular_invocation_conditions(invocation): + """Render a compact description of the invocation filters for raw fallbacks.""" + parameters = getattr(invocation, 'parameters', {}) or {} + + query_expression = str(parameters.get('query_expression') or '').strip() + if query_expression: + return query_expression + + column_name = str(parameters.get('column') or '').strip() + operator = str(parameters.get('operator') or '').strip() + value = 
parameters.get('value') + if column_name and operator: + return f"{column_name} {operator} {value}" + + lookup_column = str(parameters.get('lookup_column') or '').strip() + lookup_value = parameters.get('lookup_value') + if lookup_column: + return f"{lookup_column} == {lookup_value}" + + return None + + +def get_tabular_query_overlap_summary(invocations, max_rows=25): + """Summarize overlap across successful row-returning tabular calls. + + This is a defensive fallback for cases where tool execution succeeded but the + inner SK synthesis step failed before it could combine the results. + """ + grouped_invocations = {} + + for invocation in invocations or []: + function_name = getattr(invocation, 'function_name', '') + if function_name not in {'query_tabular_data', 'filter_rows'}: + continue + + rows = get_tabular_invocation_data_rows(invocation) + if not rows: + continue + + result_payload = get_tabular_invocation_result_payload(invocation) or {} + group_key = ( + str(result_payload.get('filename') or '').strip(), + str(get_tabular_invocation_selected_sheet(invocation) or '').strip(), + ) + grouped_invocations.setdefault(group_key, []).append({ + 'invocation': invocation, + 'rows': rows, + 'payload': result_payload, + }) + + best_summary = None + + for (filename, selected_sheet), grouped_items in grouped_invocations.items(): + if len(grouped_items) < 2: + continue + + row_sets = [grouped_item['rows'] for grouped_item in grouped_items] + identifier_column = get_tabular_overlap_identifier_column(row_sets) + if not identifier_column: + continue + + overlapping_keys = None + for rows in row_sets: + row_keys = { + normalize_tabular_overlap_value(row.get(identifier_column)) + for row in rows + if isinstance(row, dict) and normalize_tabular_overlap_value(row.get(identifier_column)) is not None + } + if overlapping_keys is None: + overlapping_keys = row_keys + else: + overlapping_keys &= row_keys + + if not overlapping_keys: + continue + + ordered_sample_rows = [] + 
seen_sample_keys = set() + for row in grouped_items[0]['rows']: + if not isinstance(row, dict): + continue + + row_key = normalize_tabular_overlap_value(row.get(identifier_column)) + if row_key not in overlapping_keys or row_key in seen_sample_keys: + continue + + ordered_sample_rows.append(row) + seen_sample_keys.add(row_key) + if len(ordered_sample_rows) >= max_rows: + break + + source_queries = [] + for grouped_item in grouped_items: + rendered_conditions = describe_tabular_invocation_conditions(grouped_item['invocation']) + if rendered_conditions: + source_queries.append(rendered_conditions) + + overlap_summary = { + 'filename': filename or None, + 'selected_sheet': selected_sheet or None, + 'identifier_column': identifier_column, + 'overlap_count': len(overlapping_keys), + 'sample_rows': ordered_sample_rows, + 'sample_rows_limited': len(overlapping_keys) > len(ordered_sample_rows), + 'source_queries': source_queries, + } + + if best_summary is None or overlap_summary['overlap_count'] > best_summary['overlap_count']: + best_summary = overlap_summary + + return best_summary + + +def get_tabular_invocation_compact_payload(invocation, max_rows=10): + """Return a compact, prompt-safe summary of a successful tabular invocation.""" + result_payload = get_tabular_invocation_result_payload(invocation) + if not result_payload: + return None + + function_name = getattr(invocation, 'function_name', '') + compact_payload = { + 'function': function_name, + 'filename': result_payload.get('filename'), + 'selected_sheet': result_payload.get('selected_sheet'), + } + + if function_name == 'aggregate_column': + compact_payload.update({ + 'column': result_payload.get('column'), + 'operation': result_payload.get('operation'), + 'result': result_payload.get('result'), + }) + elif function_name in {'group_by_aggregate', 'group_by_datetime_component'}: + for key_name in ( + 'group_by', + 'date_component', + 'aggregate_column', + 'operation', + 'groups', + 'highest_group', + 
'highest_value', + 'lowest_group', + 'lowest_value', + 'top_results', + ): + if key_name in result_payload: + compact_payload[key_name] = result_payload.get(key_name) + elif function_name == 'lookup_value': + for key_name in ( + 'lookup_column', + 'lookup_value', + 'target_column', + 'value', + 'total_matches', + 'returned_rows', + ): + if key_name in result_payload: + compact_payload[key_name] = result_payload.get(key_name) + + data_rows = get_tabular_invocation_data_rows(invocation) + if data_rows: + compact_payload['sample_rows'] = data_rows[:max_rows] + compact_payload['sample_rows_limited'] = len(data_rows) > max_rows + elif function_name in {'query_tabular_data', 'filter_rows'}: + for key_name in ('total_matches', 'returned_rows'): + if key_name in result_payload: + compact_payload[key_name] = result_payload.get(key_name) + + data_rows = get_tabular_invocation_data_rows(invocation) + if data_rows: + compact_payload['sample_rows'] = data_rows[:max_rows] + compact_payload['sample_rows_limited'] = len(data_rows) > max_rows + + rendered_conditions = describe_tabular_invocation_conditions(invocation) + if rendered_conditions: + compact_payload['conditions'] = rendered_conditions + else: + compact_payload.update(result_payload) + + return compact_payload + + +def build_tabular_analysis_fallback_from_invocations(invocations): + """Build a compact computed-results handoff from successful tool calls. + + Used when the mini SK tabular pass completed tool execution but failed to + produce a final natural-language synthesis response. 
+ """ + successful_invocations = [ + invocation for invocation in (invocations or []) + if not get_tabular_invocation_error_message(invocation) + ] + if not successful_invocations: + return None + + overlap_summary = get_tabular_query_overlap_summary(successful_invocations) + compact_results = [] + for invocation in successful_invocations[:8]: + compact_payload = get_tabular_invocation_compact_payload(invocation) + if compact_payload is None: + continue + compact_results.append(compact_payload) + + if not overlap_summary and not compact_results: + return None + + rendered_sections = [ + "The following structured results come directly from successful tabular tool executions.", + "Use them as computed evidence even though the inner tabular synthesis step did not complete.", + ] + + if overlap_summary: + rendered_sections.append( + "OVERLAP SUMMARY:\n" + f"{json.dumps(overlap_summary, indent=2, default=str)}" + ) + + if compact_results: + rendered_sections.append( + "TOOL RESULT SUMMARIES:\n" + f"{json.dumps(compact_results, indent=2, default=str)}" + ) + + return "\n\n".join(rendered_sections) + + +def get_tabular_invocation_selected_sheets(invocations): + """Return unique selected-sheet names for a group of tabular invocations.""" + selected_sheets = [] + seen_sheet_names = set() + + for invocation in invocations or []: + selected_sheet = get_tabular_invocation_selected_sheet(invocation) + if not selected_sheet: + continue + + lowered_sheet_name = selected_sheet.lower() + if lowered_sheet_name in seen_sheet_names: + continue + + seen_sheet_names.add(lowered_sheet_name) + selected_sheets.append(selected_sheet) + + return selected_sheets + + +def get_tabular_retry_sheet_overrides(invocations): + """Choose workbook sheet overrides for the next retry based on failed tool payloads.""" + candidate_scores_by_filename = {} + candidate_details_by_filename = {} + + for invocation in invocations or []: + function_name = getattr(invocation, 'function_name', '') + if 
function_name not in get_tabular_analysis_function_names(): + continue + + result_payload = get_tabular_invocation_result_payload(invocation) or {} + invocation_parameters = getattr(invocation, 'parameters', {}) or {} + filename = str( + result_payload.get('filename') + or invocation_parameters.get('filename') + or '' + ).strip() + if not filename: + continue + + candidate_sheets = get_tabular_invocation_candidate_sheets(invocation) + if not candidate_sheets: + continue + + selected_sheet = str(result_payload.get('selected_sheet') or '').strip().lower() + missing_column = str(result_payload.get('missing_column') or '').strip() + + filename_scores = candidate_scores_by_filename.setdefault(filename, {}) + filename_details = candidate_details_by_filename.setdefault(filename, []) + candidate_count = len(candidate_sheets) + + for candidate_index, candidate_sheet in enumerate(candidate_sheets): + if selected_sheet and candidate_sheet.lower() == selected_sheet: + continue + + score = max(1, candidate_count - candidate_index) + filename_scores[candidate_sheet] = filename_scores.get(candidate_sheet, 0) + score + + if missing_column: + filename_details.append(f"missing column '{missing_column}'") + + retry_sheet_overrides = {} + for filename, filename_scores in candidate_scores_by_filename.items(): + if not filename_scores: + continue + + selected_sheet_name = sorted( + filename_scores.items(), + key=lambda item: (-item[1], item[0].lower()) + )[0][0] + detail_messages = candidate_details_by_filename.get(filename, []) + detail_text = ', '.join(detail_messages[:3]) if detail_messages else None + retry_sheet_overrides[filename] = { + 'sheet_name': selected_sheet_name, + 'detail': detail_text, + } + + return retry_sheet_overrides + + +def split_tabular_analysis_invocations(invocations): + """Split analytical tabular invocations into successful and failed calls.""" + successful_invocations = [] + failed_invocations = [] + + for invocation in invocations or []: + function_name = 
getattr(invocation, 'function_name', '') + if function_name not in get_tabular_analysis_function_names(): + continue + + if get_tabular_invocation_error_message(invocation): + failed_invocations.append(invocation) + else: + successful_invocations.append(invocation) + + return successful_invocations, failed_invocations + + +def summarize_tabular_invocation_errors(invocations): + """Return a stable list of unique tabular tool error messages.""" + unique_errors = [] + seen_errors = set() + + for invocation in invocations or []: + error_message = get_tabular_invocation_error_message(invocation) + if not error_message: + continue + + normalized_error_message = error_message.strip() + if not normalized_error_message or normalized_error_message in seen_errors: + continue + + seen_errors.add(normalized_error_message) + unique_errors.append(normalized_error_message) + + return unique_errors + + +def filter_tabular_citation_invocations(invocations): + """Hide discovery-only citation noise when analytical tabular calls exist.""" + if not invocations: + return [] + + successful_analytical_invocations, _ = split_tabular_analysis_invocations(invocations) + if successful_analytical_invocations: + return successful_analytical_invocations + + successful_schema_summary_invocations = [] + for invocation in invocations or []: + if getattr(invocation, 'function_name', '') != 'describe_tabular_file': + continue + if get_tabular_invocation_error_message(invocation): + continue + successful_schema_summary_invocations.append(invocation) + + if successful_schema_summary_invocations: + return successful_schema_summary_invocations + + return [] + + +def format_tabular_thought_parameter_value(value): + """Render a concise parameter value for tabular thought details.""" + if value is None: + return None + + if isinstance(value, (dict, list, tuple)): + rendered_value = json.dumps(value, default=str) + else: + rendered_value = str(value) + + if not rendered_value: + return None + + if 
len(rendered_value) > 120: + rendered_value = rendered_value[:117] + '...' + + return rendered_value + + +def get_tabular_tool_thought_payloads(invocations): + """Convert tabular plugin invocations into user-visible thought payloads.""" + thought_payloads = [] + + for invocation in invocations or []: + function_name = getattr(invocation, 'function_name', 'unknown_tool') + duration_ms = getattr(invocation, 'duration_ms', None) + error_message = get_tabular_invocation_error_message(invocation) + success = getattr(invocation, 'success', True) and not error_message + parameters = getattr(invocation, 'parameters', {}) or {} + + filename = parameters.get('filename') + sheet_name = parameters.get('sheet_name') + duration_suffix = f" ({int(duration_ms)}ms)" if duration_ms else "" + content = f"Tabular tool {function_name}{duration_suffix}" + if filename: + content = f"Tabular tool {function_name} on {filename}{duration_suffix}" + if filename and sheet_name: + content = f"Tabular tool {function_name} on {filename} [{sheet_name}]{duration_suffix}" + if not success: + content = f"{content} failed" + + detail_parts = [] + for parameter_name, parameter_value in parameters.items(): + if parameter_name in get_tabular_thought_excluded_parameter_names(): + continue + + rendered_value = format_tabular_thought_parameter_value(parameter_value) + if rendered_value is None: + continue + + detail_parts.append(f"{parameter_name}={rendered_value}") + + rendered_error_message = format_tabular_thought_parameter_value(error_message) + if rendered_error_message: + detail_parts.append(f"error={rendered_error_message}") + + detail_parts.append(f"success={success}") + detail = "; ".join(detail_parts) if detail_parts else None + thought_payloads.append((content, detail)) + + return thought_payloads + + +def get_tabular_status_thought_payloads(invocations, analysis_succeeded): + """Return additional tabular status thoughts for retries and fallbacks.""" + successful_analytical_invocations, 
failed_analytical_invocations = split_tabular_analysis_invocations(invocations) + if not failed_analytical_invocations: + return [] + + error_messages = summarize_tabular_invocation_errors(failed_analytical_invocations) + detail = "; ".join(error_messages) if error_messages else None + + if analysis_succeeded and successful_analytical_invocations: + return [( + "Tabular analysis recovered after retrying tool errors", + detail, + )] + + if analysis_succeeded: + return [( + "Tabular analysis recovered via internal fallback after tool errors", + detail, + )] + + return [( + "Tabular analysis encountered tool errors before fallback", + detail, + )] + + +def _normalize_tabular_sheet_token(token): + """Normalize question and sheet-name tokens for lightweight matching.""" + normalized = re.sub(r'[^a-z0-9]+', '', str(token or '').lower()) + if len(normalized) > 4 and normalized.endswith('ies'): + return normalized[:-3] + 'y' + if len(normalized) > 3 and normalized.endswith('s') and not normalized.endswith('ss'): + return normalized[:-1] + return normalized + + +def _tokenize_tabular_sheet_text(text): + """Tokenize free text into normalized sheet-matching tokens.""" + original_text = re.sub(r'(?i)w[\s\-_]*2', ' w2 ', str(text or '')) + expanded_text = re.sub(r'([a-z])([A-Z])', r'\1 \2', original_text) + expanded_text = re.sub(r'([A-Za-z])([0-9])', r'\1 \2', expanded_text) + expanded_text = re.sub(r'([0-9])([A-Za-z])', r'\1 \2', expanded_text) + expanded_text = re.sub(r'[_\-]+', ' ', expanded_text) + tokens = [] + seen_tokens = set() + + for raw_text in (original_text, expanded_text): + for raw_token in re.split(r'[^a-z0-9]+', raw_text.lower()): + normalized_token = _normalize_tabular_sheet_token(raw_token) + if not normalized_token or len(normalized_token) <= 1: + continue + if normalized_token in seen_tokens: + continue + seen_tokens.add(normalized_token) + tokens.append(normalized_token) + + return tokens + + +def _score_tabular_sheet_match(sheet_name, question_text, 
columns=None): + """Score how strongly a worksheet name matches the user question. + + When *columns* (a list of column-name strings from the sheet schema) is + provided, column-name tokens that overlap with the question contribute to + the score. This allows sheets whose names are generic (e.g. "Orders") to + still score highly when the question references column names like + "sales" or "profit". + """ + question_tokens = set(_tokenize_tabular_sheet_text(question_text)) + question_phrase = ' '.join(_tokenize_tabular_sheet_text(question_text)) + sheet_tokens = _tokenize_tabular_sheet_text(sheet_name) + if not sheet_tokens: + return 0 + + sheet_phrase = ' '.join(sheet_tokens) + score = 0 + + if sheet_phrase and sheet_phrase in question_phrase: + score += 8 + + token_matches = sum(1 for token in sheet_tokens if token in question_tokens) + score += token_matches * 3 + + if len(sheet_tokens) == 1 and sheet_tokens[0] in question_tokens: + score += 4 + + # Column-name overlap: each matching column token adds 2 points. 
+ if columns and question_tokens: + column_tokens = set() + for col_name in columns: + column_tokens.update(_tokenize_tabular_sheet_text(col_name)) + column_matches = sum(1 for token in question_tokens if token in column_tokens) + score += column_matches * 2 + + return score + + +def _select_relevant_workbook_sheets(sheet_names, question_text, minimum_score=1, per_sheet=None): + """Return all workbook sheets that appear relevant to the question.""" + ranked_sheets = [] + for sheet_name in sheet_names or []: + columns = None + if per_sheet: + sheet_info = per_sheet.get(sheet_name, {}) + columns = sheet_info.get('columns', []) + score = _score_tabular_sheet_match(sheet_name, question_text, columns=columns) + if score < minimum_score: + continue + ranked_sheets.append((score, sheet_name)) + + ranked_sheets.sort(key=lambda item: (-item[0], item[1].lower())) + return [sheet_name for _, sheet_name in ranked_sheets] + + +def _build_tabular_cross_sheet_bridge_plan(sheet_names, question_text, per_sheet=None): + """Infer a lightweight reference-sheet to fact-sheet plan for grouped workbook questions.""" + if not per_sheet or not is_tabular_cross_sheet_bridge_question(question_text): + return None + + ranked_sheets = [] + for sheet_name in sheet_names or []: + sheet_info = per_sheet.get(sheet_name, {}) + columns = sheet_info.get('columns', []) + row_count = sheet_info.get('row_count', 0) or 0 + score = _score_tabular_sheet_match(sheet_name, question_text, columns=columns) + if score <= 0: + continue + ranked_sheets.append({ + 'sheet_name': sheet_name, + 'score': score, + 'row_count': row_count, + }) + + if len(ranked_sheets) < 2: + return None + + fact_sheet = max( + ranked_sheets, + key=lambda item: (item['row_count'], item['score'], item['sheet_name'].lower()), + ) + reference_candidates = [ + item for item in ranked_sheets + if item['sheet_name'] != fact_sheet['sheet_name'] and item['row_count'] > 0 + ] + if not reference_candidates: + return None + + reference_sheet = 
min( + reference_candidates, + key=lambda item: (item['row_count'], -item['score'], item['sheet_name'].lower()), + ) + + if fact_sheet['row_count'] <= reference_sheet['row_count']: + return None + + if fact_sheet['row_count'] < max(25, reference_sheet['row_count'] * 2): + return None + + relevant_sheets = [reference_sheet['sheet_name'], fact_sheet['sheet_name']] + for item in sorted(ranked_sheets, key=lambda entry: (-entry['score'], entry['sheet_name'].lower())): + if item['sheet_name'] in relevant_sheets: + continue + relevant_sheets.append(item['sheet_name']) + + return { + 'reference_sheet': reference_sheet['sheet_name'], + 'reference_row_count': reference_sheet['row_count'], + 'fact_sheet': fact_sheet['sheet_name'], + 'fact_row_count': fact_sheet['row_count'], + 'relevant_sheets': relevant_sheets, + } + + +def is_tabular_access_limited_analysis(analysis_text): + """Return True when a tool-backed analysis still claims the data is unavailable.""" + normalized_analysis = re.sub(r'\s+', ' ', str(analysis_text or '').strip().lower()) + if not normalized_analysis: + return False + + inaccessible_phrases = ( + "don't have direct access", + 'do not have direct access', + "don't have", + 'do not have', + 'visible excerpt you provided', + 'if those tool-backed results exist', + 'allow me to query again', + 'can outline what i would retrieve', + ) + return any(phrase in normalized_analysis for phrase in inaccessible_phrases) + + +def _select_likely_workbook_sheet(sheet_names, question_text, per_sheet=None): + """Return a likely sheet name when the user question strongly matches one sheet.""" + best_sheet = None + best_score = 0 + runner_up_score = 0 + + for sheet_name in sheet_names or []: + columns = None + if per_sheet: + sheet_info = per_sheet.get(sheet_name, {}) + columns = sheet_info.get('columns', []) + score = _score_tabular_sheet_match(sheet_name, question_text, columns=columns) + + if score > best_score: + runner_up_score = best_score + best_score = score + 
best_sheet = sheet_name + elif score > runner_up_score: + runner_up_score = score + + if best_score <= 0 or best_score == runner_up_score: + return None + + return best_sheet + + +async def run_tabular_sk_analysis(user_question, tabular_filenames, user_id, + conversation_id, gpt_model, settings, + source_hint="workspace", group_id=None, + public_workspace_id=None, + execution_mode='analysis'): + """Run lightweight SK with TabularProcessingPlugin to analyze tabular data. + + Creates a temporary Kernel with only the TabularProcessingPlugin, uses the + same chat model as the user's session, and returns computed analysis results. + Returns None on failure for graceful degradation. + """ + from semantic_kernel import Kernel as SKKernel + from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion + from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior + from semantic_kernel.connectors.ai.open_ai.prompt_execution_settings.azure_chat_prompt_execution_settings import AzureChatPromptExecutionSettings + from semantic_kernel.contents.chat_history import ChatHistory as SKChatHistory + from semantic_kernel_plugins.tabular_processing_plugin import TabularProcessingPlugin + + try: + plugin_logger = get_plugin_logger() + execution_mode = execution_mode if execution_mode in {'analysis', 'schema_summary', 'entity_lookup'} else 'analysis' + schema_summary_mode = execution_mode == 'schema_summary' + entity_lookup_mode = execution_mode == 'entity_lookup' + log_event( + f"[Tabular SK Analysis] Starting {execution_mode} analysis for files: {tabular_filenames}", + level=logging.INFO, + ) + + # 1. Create lightweight kernel with only tabular plugin + kernel = SKKernel() + tabular_plugin = TabularProcessingPlugin() + kernel.add_plugin(tabular_plugin, plugin_name="tabular_processing") + + # 2. 
Create chat service using same config as main chat + enable_gpt_apim = settings.get('enable_gpt_apim', False) + if enable_gpt_apim: + chat_service = AzureChatCompletion( + service_id="tabular-analysis", + deployment_name=gpt_model, + endpoint=settings.get('azure_apim_gpt_endpoint'), + api_key=settings.get('azure_apim_gpt_subscription_key'), + api_version=settings.get('azure_apim_gpt_api_version'), + ) + else: + auth_type = settings.get('azure_openai_gpt_authentication_type') + if auth_type == 'managed_identity': + token_provider = get_bearer_token_provider(DefaultAzureCredential(), cognitive_services_scope) + chat_service = AzureChatCompletion( + service_id="tabular-analysis", + deployment_name=gpt_model, + endpoint=settings.get('azure_openai_gpt_endpoint'), + api_version=settings.get('azure_openai_gpt_api_version'), + ad_token_provider=token_provider, + ) + else: + chat_service = AzureChatCompletion( + service_id="tabular-analysis", + deployment_name=gpt_model, + endpoint=settings.get('azure_openai_gpt_endpoint'), + api_key=settings.get('azure_openai_gpt_key'), + api_version=settings.get('azure_openai_gpt_api_version'), + ) + kernel.add_service(chat_service) + + # 3. 
Pre-dispatch: load file schemas to eliminate discovery LLM rounds + source_context = f"source='{source_hint}'" + if group_id: + source_context += f", group_id='{group_id}'" + if public_workspace_id: + source_context += f", public_workspace_id='{public_workspace_id}'" + + schema_parts = [] + workbook_sheet_hints = {} + workbook_related_sheet_hints = {} + workbook_cross_sheet_bridge_hints = {} + workbook_blob_locations = {} + retry_sheet_overrides = {} + previous_failed_call_parameters = [] # entity lookup: concrete failed call params for retry hints + allowed_function_filters = { + 'included_functions': [ + f"tabular_processing-{function_name}" + for function_name in ( + ['describe_tabular_file'] + if schema_summary_mode else + sorted(get_tabular_analysis_function_names()) + ) + ] + } + for fname in tabular_filenames: + try: + container, blob_path = tabular_plugin._resolve_blob_location_with_fallback( + user_id, conversation_id, fname, source_hint, + group_id=group_id, public_workspace_id=public_workspace_id + ) + schema_info = tabular_plugin._build_workbook_schema_summary( + container, + blob_path, + fname, + preview_rows=2, + ) + workbook_blob_locations[fname] = (container, blob_path) + + if schema_info.get('is_workbook') and schema_info.get('sheet_count', 0) > 1: + # Build a compact sheet directory so the model can pick the + # relevant sheet itself instead of us guessing. 
+ per_sheet = schema_info.get('per_sheet_schemas', {}) + likely_sheet = _select_likely_workbook_sheet( + schema_info.get('sheet_names', []), + user_question, + per_sheet=per_sheet, + ) + relevant_sheets = _select_relevant_workbook_sheets( + schema_info.get('sheet_names', []), + user_question, + per_sheet=per_sheet, + ) + cross_sheet_bridge_plan = None + if not schema_summary_mode and not entity_lookup_mode: + cross_sheet_bridge_plan = _build_tabular_cross_sheet_bridge_plan( + schema_info.get('sheet_names', []), + user_question, + per_sheet=per_sheet, + ) + if entity_lookup_mode: + workbook_related_sheet_hints[fname] = relevant_sheets or list(schema_info.get('sheet_names', [])) + elif cross_sheet_bridge_plan: + workbook_cross_sheet_bridge_hints[fname] = cross_sheet_bridge_plan + workbook_related_sheet_hints[fname] = cross_sheet_bridge_plan.get('relevant_sheets', []) + likely_sheet = cross_sheet_bridge_plan.get('fact_sheet') or likely_sheet + if likely_sheet: + workbook_sheet_hints[fname] = likely_sheet + if not entity_lookup_mode and not cross_sheet_bridge_plan: + tabular_plugin.set_default_sheet(container, blob_path, likely_sheet) + elif not entity_lookup_mode and not cross_sheet_bridge_plan: + # Fallback for analysis mode: pick the sheet with the + # most rows so that set_default_sheet is always called + # and the model can omit sheet_name on tool calls. 
+ fallback_sheet = max( + schema_info.get('sheet_names', []), + key=lambda s: per_sheet.get(s, {}).get('row_count', 0), + default=None, + ) + if fallback_sheet: + likely_sheet = fallback_sheet + workbook_sheet_hints[fname] = likely_sheet + tabular_plugin.set_default_sheet(container, blob_path, likely_sheet) + + sheet_directory = [] + for sname in schema_info.get('sheet_names', []): + sheet_info = per_sheet.get(sname, {}) + sheet_directory.append({ + 'sheet_name': sname, + 'row_count': sheet_info.get('row_count', 0), + 'columns': sheet_info.get('columns', []), + }) + directory_schema = { + 'filename': fname, + 'is_workbook': True, + 'sheet_count': schema_info.get('sheet_count', 0), + 'likely_sheet': likely_sheet, + 'sheet_directory': sheet_directory, + } + schema_parts.append(json.dumps(directory_schema, indent=2, default=str)) + log_event( + f"[Tabular SK Analysis] Pre-loaded workbook {fname} directory " + f"({schema_info.get('sheet_count', 0)} sheets available)" + + (f"; likely sheet '{likely_sheet}'" if likely_sheet else ''), + level=logging.DEBUG, + ) + else: + schema_parts.append(json.dumps(schema_info, indent=2, default=str)) + if schema_info.get('is_workbook'): + # Single-sheet workbook — set default so the model needs no sheet arg + single_sheet = (schema_info.get('sheet_names') or [None])[0] + if single_sheet: + tabular_plugin.set_default_sheet(container, blob_path, single_sheet) + df = tabular_plugin._read_tabular_blob_to_dataframe(container, blob_path) + log_event(f"[Tabular SK Analysis] Pre-loaded schema for {fname} ({len(df)} rows)", level=logging.DEBUG) + except Exception as e: + log_event(f"[Tabular SK Analysis] Failed to pre-load schema for {fname}: {e}", level=logging.WARNING) + schema_parts.append(json.dumps({"filename": fname, "error": f"Could not pre-load: {str(e)}"})) + + schema_context = "\n".join(schema_parts) + + def build_system_prompt(force_tool_use=False, tool_error_messages=None, execution_gap_messages=None): + if schema_summary_mode: + 
retry_prefix = "" + if force_tool_use: + retry_prefix = ( + "RETRY MODE: Your previous attempt did not execute a usable workbook-schema tool call. " + "You MUST call describe_tabular_file before writing any answer text. " + "Do not switch to aggregate, filter, query, lookup, or grouped-analysis tools for worksheet-summary questions.\n\n" + ) + + tool_error_feedback = "" + if tool_error_messages: + rendered_errors = "\n".join( + f"- {error_message}" for error_message in tool_error_messages + ) + tool_error_feedback = ( + "PREVIOUS TOOL ERRORS:\n" + f"{rendered_errors}\n" + "Correct the function arguments and retry describe_tabular_file immediately.\n\n" + ) + + return ( + "You are a workbook schema analyst. The workbook structure is available through the " + "tabular_processing plugin and the pre-loaded schema context. You MUST call " + "describe_tabular_file before answering. Use the workbook-level response to identify " + "worksheet names, what each worksheet represents, and the high-confidence relationships " + "visible from shared identifiers, columns, and sheet purposes.\n\n" + f"{retry_prefix}" + f"{tool_error_feedback}" + f"FILE SCHEMAS:\n" + f"{schema_context}\n\n" + "AVAILABLE FUNCTIONS: describe_tabular_file only.\n\n" + "IMPORTANT:\n" + "1. Call describe_tabular_file for each workbook you need to summarize.\n" + "2. For multi-sheet workbooks, omit sheet_name so the tool returns workbook-level sheet schemas.\n" + "3. Summarize the worksheet list, what each worksheet represents, and any cross-sheet relationships visible from shared identifiers or repeated business entities.\n" + "4. Do not switch to aggregate, filter, query, lookup, or grouped-analysis tools for workbook-structure questions.\n" + "5. If a relationship is not explicit, describe it as an inference from the schema rather than a confirmed join.\n" + "6. Do not mention hypothetical follow-up analyses or failed attempts unless the user explicitly asked about failures." 
+ ) + + retry_prefix = "" + if force_tool_use: + retry_prefix = ( + "RETRY MODE: Your previous attempt did not execute a usable analytical tool call. " + "You MUST call one or more analytical tabular_processing plugin functions before writing any answer text. " + "Do not say the analysis still needs to be run — run it now.\n\n" + ) + + tool_error_feedback = "" + if tool_error_messages: + rendered_errors = "\n".join( + f"- {error_message}" for error_message in tool_error_messages + ) + tool_error_feedback = ( + "PREVIOUS TOOL ERRORS:\n" + f"{rendered_errors}\n" + "Correct the function arguments and try again. If the operation is not 'count', provide an aggregate_column.\n\n" + ) + + execution_gap_feedback = "" + if execution_gap_messages: + rendered_gaps = "\n".join( + f"- {gap_message}" for gap_message in execution_gap_messages + ) + execution_gap_feedback = ( + "PREVIOUS EXECUTION GAPS:\n" + f"{rendered_gaps}\n" + "Correct the analysis plan and query the missing related worksheets before answering.\n\n" + ) + + missing_sheet_feedback = "" + if tool_error_messages and any( + 'Specify sheet_name or sheet_index on analytical calls.' 
in error_message + for error_message in tool_error_messages + ): + if entity_lookup_mode: + # Entity lookup: generate concrete per-sheet filter_rows examples from the actual failed call parameters + call_example_lines = [] + for failed_params in previous_failed_call_parameters[:2]: + fname = failed_params.get('filename', '') + col = failed_params.get('column', '') + op = failed_params.get('operator', '==') + val = failed_params.get('value', '') + if not fname or not col or not val: + continue + related_sheets = workbook_related_sheet_hints.get(fname) or list(workbook_sheet_hints.values()) + for sheet in related_sheets[:6]: + call_example_lines.append( + f' filter_rows(filename="{fname}", sheet_name="{sheet}", column="{col}", operator="{op}", value="{val}")' + ) + if call_example_lines: + examples_block = "\n".join(call_example_lines) + missing_sheet_feedback = ( + "MULTI-SHEET RETRY REQUIRED: Your previous calls omitted sheet_name and all failed.\n" + "For this multi-sheet workbook, sheet_name is MANDATORY in every analytical call.\n" + "Execute ALL of these calls now (copy exactly as written):\n" + f"{examples_block}\n\n" + ) + else: + related_lines = [ + "MULTI-SHEET RETRY REQUIRED: Your previous calls omitted sheet_name.", + "Add sheet_name to every analytical call. 
Relevant worksheets per file:", + ] + for workbook_name, related_sheets in workbook_related_sheet_hints.items(): + related_lines.append( + f" {workbook_name}: query each of: {', '.join(related_sheets[:6])}" + ) + missing_sheet_feedback = "\n".join(related_lines) + "\n\n" + else: + guidance_lines = [ + "MULTI-SHEET RETRY: Your previous analytical call omitted sheet_name on a multi-sheet workbook.", + "Retry immediately with sheet_name set to the most relevant worksheet from sheet_directory.", + "For account/category lookup questions by month, use filter_rows or query_tabular_data on the label column first, then read the requested month column.", + "Do not aggregate an entire month column unless the user explicitly asked for a total, sum, average, min, max, or count.", + ] + for workbook_name, hinted_sheet in workbook_sheet_hints.items(): + guidance_lines.append( + f"Likely worksheet for {workbook_name} based on the question text: {hinted_sheet}." + ) + missing_sheet_feedback = "\n".join(guidance_lines) + "\n\n" + + sheet_hint_feedback = "" + if workbook_sheet_hints: + rendered_hints = "\n".join( + f"- {workbook_name}: likely worksheet '{hinted_sheet}'" + for workbook_name, hinted_sheet in workbook_sheet_hints.items() + ) + sheet_hint_feedback = ( + "LIKELY WORKSHEET HINTS:\n" + f"{rendered_hints}\n" + "Use the likely worksheet unless the question clearly refers to a different sheet or a prior tool error identified a better recovery sheet.\n\n" + ) + + recovery_sheet_feedback = "" + if retry_sheet_overrides: + rendered_recovery_hints = "\n".join( + ( + f"- {workbook_name}: retry on worksheet '{override_payload['sheet_name']}'" + + (f" ({override_payload['detail']})" if override_payload.get('detail') else '') + ) + for workbook_name, override_payload in retry_sheet_overrides.items() + ) + recovery_sheet_feedback = ( + "RECOVERY WORKSHEET HINTS:\n" + f"{rendered_recovery_hints}\n" + "These recovery hints override the original likely-sheet guess when the previous tool 
call failed on the wrong worksheet.\n\n" + ) + + related_sheet_feedback = "" + if workbook_related_sheet_hints: + rendered_related_sheet_hints = "\n".join( + f"- {workbook_name}: {', '.join(related_sheets)}" + for workbook_name, related_sheets in workbook_related_sheet_hints.items() + if related_sheets + ) + if rendered_related_sheet_hints: + related_sheet_instruction = ( + 'Use these worksheets to satisfy cross-sheet profile and related-record requests.' + if entity_lookup_mode else + 'Use these worksheets together when the answer may require one sheet for entities and another for facts.' + ) + related_sheet_feedback = ( + "QUESTION-RELEVANT WORKSHEET HINTS:\n" + f"{rendered_related_sheet_hints}\n" + f"{related_sheet_instruction}\n\n" + ) + + cross_sheet_bridge_feedback = "" + if workbook_cross_sheet_bridge_hints: + rendered_bridge_hints = "\n".join( + ( + f"- {workbook_name}: reference worksheet '{bridge_hint['reference_sheet']}' " + f"({bridge_hint['reference_row_count']} rows); fact worksheet '{bridge_hint['fact_sheet']}' " + f"({bridge_hint['fact_row_count']} rows)" + ) + for workbook_name, bridge_hint in workbook_cross_sheet_bridge_hints.items() + ) + cross_sheet_bridge_feedback = ( + "CROSS-SHEET BRIDGE PLAN:\n" + f"{rendered_bridge_hints}\n" + "For grouped cross-sheet questions, first use the reference worksheet to identify canonical entity or category names, then compute the requested metric from the fact worksheet. Prefer shared identifier or name columns over yes/no, boolean, or membership-flag columns.\n\n" + ) + + if entity_lookup_mode: + entity_retry_prefix = retry_prefix + if force_tool_use: + entity_retry_prefix = ( + "RETRY MODE: Your previous attempt did not complete the related-record lookup. " + "You MUST call one or more analytical tabular_processing plugin functions before writing any answer text. " + "Query the missing related worksheets explicitly with sheet_name.\n\n" + ) + + return ( + "You are a workbook entity lookup analyst. 
The full dataset is available through the " + "tabular_processing plugin functions. The user is asking for one entity and related records across worksheets. " + "You MUST use one or more tabular_processing plugin functions before answering. Never answer from the schema preview alone.\n\n" + f"{entity_retry_prefix}" + f"{tool_error_feedback}" + f"{execution_gap_feedback}" + f"{recovery_sheet_feedback}" + f"{sheet_hint_feedback}" + f"{related_sheet_feedback}" + f"{missing_sheet_feedback}" + f"FILE SCHEMAS:\n" + f"{schema_context}\n\n" + "AVAILABLE FUNCTIONS: lookup_value, aggregate_column, filter_rows, query_tabular_data, " + "group_by_aggregate, and group_by_datetime_component.\n\n" + "Discovery functions are not available in this analysis run because schema context is already pre-loaded.\n\n" + "IMPORTANT:\n" + "1. Pass sheet_name='' on EVERY analytical call for multi-sheet workbooks. Do not rely on a default sheet for cross-sheet entity lookups.\n" + "2. First retrieve the primary entity row on the most relevant worksheet.\n" + "3. Then query other relevant worksheets explicitly to collect related records.\n" + "4. When a retrieved row contains a secondary identifier such as ReturnID, CaseID, AccountID, PaymentID, W2ID, or Form1099ID, reuse it to query dependent worksheets.\n" + "5. Do not stop after the first successful row if the question asks for related records across sheets.\n" + "6. If a requested record type has no corresponding worksheet in the workbook, say that the workbook does not contain that record type.\n" + "7. Clearly distinguish between no matching rows and no corresponding worksheet.\n" + "8. Summarize concrete found records sheet-by-sheet using the tool results, not schema placeholders.\n" + "9. Do not mention hypothetical follow-up analyses, parser errors, or failed attempts unless the user explicitly asked about failures and you have actual tool error output to report." + ) + + return ( + "You are a data analyst. 
The full dataset is available through the " + "tabular_processing plugin functions. You MUST use one or more " + "tabular_processing plugin functions before answering. Never answer from " + "the schema preview alone. Never say that you would need to run the " + "analysis later — run it now.\n\n" + f"{retry_prefix}" + f"{tool_error_feedback}" + f"{execution_gap_feedback}" + f"{recovery_sheet_feedback}" + f"{sheet_hint_feedback}" + f"{related_sheet_feedback}" + f"{cross_sheet_bridge_feedback}" + f"{missing_sheet_feedback}" + f"FILE SCHEMAS:\n" + f"{schema_context}\n\n" + "AVAILABLE FUNCTIONS: lookup_value, aggregate_column, filter_rows, query_tabular_data, " + "group_by_aggregate, and group_by_datetime_component for year/quarter/month/week/day/hour trend analysis.\n\n" + "Discovery functions are not available in this analysis run because schema context is already pre-loaded.\n\n" + "IMPORTANT:\n" + "1. Use the pre-loaded schema to pick the correct columns, then call the plugin functions.\n" + "2. For multi-sheet workbooks, review the sheet_directory to find the most relevant sheet for the question. Pass sheet_name='' in every analytical tool call unless a trustworthy default sheet has already been established. If a CROSS-SHEET BRIDGE PLAN is provided, query the listed worksheets explicitly and do not rely on a default sheet.\n" + "3. If a previous tool error says a requested column is missing on the current sheet and suggests candidate sheets, switch to one of those candidate sheets immediately.\n" + "4. For account/category lookup questions at a specific period or metric, use lookup_value first. Provide lookup_column, lookup_value, and target_column.\n" + "5. If lookup_value is not sufficient, use filter_rows or query_tabular_data on the label column, then read the requested period column.\n" + "6. Only use aggregate_column when the user explicitly asks for a sum, average, min, max, or count across rows.\n" + "7. 
For time-based questions on datetime columns, use group_by_datetime_component.\n" + "8. For threshold, ranking, comparison, or correlation-like questions, first filter/query the relevant rows, then compute grouped metrics.\n" + "9. When the question asks for grouped results for each entity or category and a cross-sheet bridge plan is available, use the reference worksheet to identify the canonical entities or categories and the fact worksheet to compute the metric. Do not answer 'each X' by grouping a yes/no, boolean, or membership-flag column unless the user explicitly asked about that flag.\n" + "10. When the question asks for rows satisfying multiple conditions, prefer one combined query_expression using and/or instead of separate broad queries that you plan to intersect later.\n" + "11. Batch multiple independent function calls in a SINGLE response whenever possible.\n" + "12. Keep max_rows as small as possible. Only increase it when the user explicitly asked for an exhaustive row list or export; otherwise return total_matches plus representative rows.\n" + "13. For analytical questions, prefer lookup/filter/query plus aggregate/grouped computations over raw row or preview output.\n" + "14. For identifier-based workbook questions, locate the identifier on the correct sheet before explaining downstream calculations.\n" + "15. For peak, busiest, highest, or lowest questions, use grouped functions and inspect the highest_group, highest_value, lowest_group, and lowest_value summary fields.\n" + "16. Return only computed findings and name the strongest drivers clearly.\n" + "17. Do not mention hypothetical follow-up analyses, parser errors, or failed attempts unless the user explicitly asked about failures and you have actual tool error output to report." 
+ ) + + baseline_invocations = plugin_logger.get_invocations_for_conversation( + user_id, + conversation_id, + limit=1000 + ) + baseline_invocation_count = len(baseline_invocations) + previous_tool_error_messages = [] + previous_execution_gap_messages = [] + + for attempt_number in range(1, 4): + force_tool_use = attempt_number > 1 + # 4. Build chat history with pre-loaded schemas + chat_history = SKChatHistory() + chat_history.add_system_message(build_system_prompt( + force_tool_use=force_tool_use, + tool_error_messages=previous_tool_error_messages, + execution_gap_messages=previous_execution_gap_messages, + )) + + chat_history.add_user_message( + f"Analyze the tabular data to answer: {user_question}\n" + f"Use user_id='{user_id}', conversation_id='{conversation_id}', {source_context}." + ) + + # 5. Execute with auto function calling + execution_settings = AzureChatPromptExecutionSettings( + service_id="tabular-analysis", + function_choice_behavior=( + FunctionChoiceBehavior.Required( + maximum_auto_invoke_attempts=8, + filters=allowed_function_filters, + ) + if force_tool_use else + FunctionChoiceBehavior.Auto( + maximum_auto_invoke_attempts=7, + filters=allowed_function_filters, + ) + ), + ) + + result = None + synthesis_exception = None + try: + result = await chat_service.get_chat_message_contents( + chat_history, execution_settings, kernel=kernel + ) + except Exception as exc: + synthesis_exception = exc + log_event( + f"[Tabular SK Analysis] Attempt {attempt_number} synthesis failed after tool execution setup: {exc}", + level=logging.WARNING, + exceptionTraceback=True, + ) + + invocations_after = plugin_logger.get_invocations_for_conversation( + user_id, + conversation_id, + limit=1000 + ) + new_invocations = get_new_plugin_invocations(invocations_after, baseline_invocation_count) + new_invocation_count = len(new_invocations) + discovery_invocations, analytical_invocations, _ = split_tabular_plugin_invocations(new_invocations) + 
successful_analytical_invocations, failed_analytical_invocations = split_tabular_analysis_invocations(new_invocations) + successful_schema_summary_invocations = [] + failed_schema_summary_invocations = [] + for invocation in discovery_invocations: + if getattr(invocation, 'function_name', '') != 'describe_tabular_file': + continue + if get_tabular_invocation_error_message(invocation): + failed_schema_summary_invocations.append(invocation) + else: + successful_schema_summary_invocations.append(invocation) + + if synthesis_exception is not None: + raw_tool_fallback = None + if not schema_summary_mode: + raw_tool_fallback = build_tabular_analysis_fallback_from_invocations( + successful_analytical_invocations, + ) + + if raw_tool_fallback: + log_event( + f"[Tabular SK Analysis] Falling back to raw successful tool summaries after attempt {attempt_number} synthesis error", + extra={ + 'successful_tool_count': len(successful_analytical_invocations), + 'attempt_number': attempt_number, + }, + level=logging.WARNING, + ) + return raw_tool_fallback + + log_event( + f"[Tabular SK Analysis] Attempt {attempt_number} could not recover from synthesis error", + extra={ + 'successful_tool_count': len(successful_analytical_invocations), + 'failed_tool_count': len(failed_analytical_invocations), + 'attempt_number': attempt_number, + }, + level=logging.WARNING, + ) + break + + if result and result[0].content: + analysis = result[0].content.strip() + if len(analysis) > 20000: + analysis = analysis[:20000] + "\n[Analysis truncated]" + + if schema_summary_mode: + if successful_schema_summary_invocations: + log_event( + f"[Tabular SK Analysis] Schema summary complete via {len(successful_schema_summary_invocations)} workbook tool call(s) on attempt {attempt_number}", + level=logging.INFO, + ) + return analysis + + if failed_schema_summary_invocations: + previous_tool_error_messages = summarize_tabular_invocation_errors(failed_schema_summary_invocations) + log_event( + f"[Tabular SK 
Analysis] Attempt {attempt_number} used workbook schema tool(s) but all returned errors; retrying", + extra={ + 'tool_errors': previous_tool_error_messages, + 'failed_tool_count': len(failed_schema_summary_invocations), + }, + level=logging.WARNING, + ) + elif analytical_invocations: + log_event( + f"[Tabular SK Analysis] Attempt {attempt_number} used analytical tool(s) during schema-summary mode without usable workbook results; retrying", + level=logging.WARNING, + ) + elif discovery_invocations: + discovery_function_names = sorted({ + invocation.function_name for invocation in discovery_invocations + }) + log_event( + f"[Tabular SK Analysis] Attempt {attempt_number} used only discovery tool(s) {discovery_function_names} without usable workbook summary; retrying", + level=logging.WARNING, + ) + elif new_invocation_count > 0: + log_event( + f"[Tabular SK Analysis] Attempt {attempt_number} used unsupported tool(s) without usable workbook results; retrying", + level=logging.WARNING, + ) + else: + log_event( + f"[Tabular SK Analysis] Attempt {attempt_number} returned narrative without workbook schema tool use; retrying", + level=logging.WARNING, + ) + else: + if successful_analytical_invocations: + previous_tool_error_messages = [] + previous_failed_call_parameters = [] + + if entity_lookup_mode: + selected_sheets = get_tabular_invocation_selected_sheets(successful_analytical_invocations) + execution_gap_messages = [] + + # Cross-sheet results ("ALL (cross-sheet search)") already span + # the entire workbook — no execution gap for sheet coverage. + has_cross_sheet_result = any( + 'cross-sheet' in (s or '').lower() for s in selected_sheets + ) + + if len(selected_sheets) <= 1 and not has_cross_sheet_result: + rendered_selected_sheets = ', '.join(selected_sheets) if selected_sheets else 'unknown worksheet' + execution_gap_messages.append( + f"Previous attempt only queried worksheet(s): {rendered_selected_sheets}. 
The question asks for related records across worksheets, so query additional relevant sheets explicitly with sheet_name." + ) + + if is_tabular_access_limited_analysis(analysis): + execution_gap_messages.append( + 'Previous attempt still claimed the requested data was unavailable even though analytical tool calls succeeded. Use the returned rows and answer directly.' + ) + + if execution_gap_messages and attempt_number < 3: + previous_execution_gap_messages = execution_gap_messages + log_event( + f"[Tabular SK Analysis] Attempt {attempt_number} entity lookup was incomplete despite successful tool calls; retrying", + extra={ + 'selected_sheets': selected_sheets, + 'execution_gaps': previous_execution_gap_messages, + 'successful_tool_count': len(successful_analytical_invocations), + }, + level=logging.WARNING, + ) + baseline_invocation_count = len(invocations_after) + continue + + previous_execution_gap_messages = [] + log_event( + f"[Tabular SK Analysis] Analysis complete via {len(successful_analytical_invocations)} analytical tool call(s) on attempt {attempt_number}", + level=logging.INFO + ) + return analysis + + if failed_analytical_invocations: + previous_tool_error_messages = summarize_tabular_invocation_errors(failed_analytical_invocations) + previous_execution_gap_messages = [] + retry_sheet_overrides = get_tabular_retry_sheet_overrides(failed_analytical_invocations) + for workbook_name, override_payload in retry_sheet_overrides.items(): + blob_location = workbook_blob_locations.get(workbook_name) + if not blob_location: + continue + + container_name, blob_name = blob_location + tabular_plugin.set_default_sheet( + container_name, + blob_name, + override_payload['sheet_name'], + ) + + if retry_sheet_overrides: + log_event( + f"[Tabular SK Analysis] Attempt {attempt_number} selected retry worksheet override(s): {retry_sheet_overrides}", + level=logging.INFO, + ) + # For entity_lookup mode, extract and cache concrete call parameters + # so the retry prompt can 
generate per-sheet corrected call examples + if entity_lookup_mode: + seen_entity_filters = set() + entity_call_params = [] + for invoc in failed_analytical_invocations: + error_msg = get_tabular_invocation_error_message(invoc) or '' + if 'Specify sheet_name or sheet_index on analytical calls.' not in error_msg: + continue + invoc_params = getattr(invoc, 'parameters', {}) or {} + fn = getattr(invoc, 'function_name', '') + fname = str(invoc_params.get('filename') or '').strip() + if fn == 'filter_rows': + col = str(invoc_params.get('column') or '').strip() + op = str(invoc_params.get('operator') or '==').strip() + val = str(invoc_params.get('value') or '').strip() + elif fn == 'lookup_value': + col = str(invoc_params.get('lookup_column') or '').strip() + op = '==' + val = str(invoc_params.get('lookup_value') or '').strip() + else: + continue + if not fname or not col or not val: + continue + filter_key = (fname, col, val) + if filter_key in seen_entity_filters: + continue + seen_entity_filters.add(filter_key) + entity_call_params.append({ + 'filename': fname, + 'column': col, + 'operator': op, + 'value': val, + }) + previous_failed_call_parameters = entity_call_params + log_event( + f"[Tabular SK Analysis] Attempt {attempt_number} used analytical tool(s) but all returned errors; retrying", + extra={ + 'tool_errors': previous_tool_error_messages, + 'failed_tool_count': len(failed_analytical_invocations), + }, + level=logging.WARNING + ) + elif analytical_invocations: + previous_execution_gap_messages = [] + log_event( + f"[Tabular SK Analysis] Attempt {attempt_number} used analytical tool(s) without usable computed results; retrying", + level=logging.WARNING + ) + elif discovery_invocations: + previous_execution_gap_messages = [] + discovery_function_names = sorted({ + invocation.function_name for invocation in discovery_invocations + }) + log_event( + f"[Tabular SK Analysis] Attempt {attempt_number} used only discovery tool(s) {discovery_function_names} without 
computed analysis; retrying", + level=logging.WARNING + ) + elif new_invocation_count > 0: + previous_execution_gap_messages = [] + log_event( + f"[Tabular SK Analysis] Attempt {attempt_number} used unsupported tool(s) without computed analysis; retrying", + level=logging.WARNING + ) + else: + previous_execution_gap_messages = [] + log_event( + f"[Tabular SK Analysis] Attempt {attempt_number} returned narrative without tool use; retrying", + level=logging.WARNING + ) + + else: + if schema_summary_mode and failed_schema_summary_invocations: + previous_tool_error_messages = summarize_tabular_invocation_errors(failed_schema_summary_invocations) + log_event( + f"[Tabular SK Analysis] Attempt {attempt_number} returned no content after workbook tool errors; retrying", + extra={ + 'tool_errors': previous_tool_error_messages, + 'failed_tool_count': len(failed_schema_summary_invocations), + }, + level=logging.WARNING, + ) + elif failed_analytical_invocations: + previous_tool_error_messages = summarize_tabular_invocation_errors(failed_analytical_invocations) + previous_execution_gap_messages = [] + log_event( + f"[Tabular SK Analysis] Attempt {attempt_number} returned no content after tool errors; retrying", + extra={ + 'tool_errors': previous_tool_error_messages, + 'failed_tool_count': len(failed_analytical_invocations), + }, + level=logging.WARNING + ) + else: + log_event( + f"[Tabular SK Analysis] Attempt {attempt_number} returned no content", + level=logging.WARNING + ) + + baseline_invocation_count = len(invocations_after) + + log_event("[Tabular SK Analysis] Unable to obtain computed tool-backed results", level=logging.WARNING) + return None + + except Exception as e: + log_event(f"[Tabular SK Analysis] Error: {e}", level=logging.WARNING, exceptionTraceback=True) + return None + +def collect_tabular_sk_citations(user_id, conversation_id): + """Collect plugin invocations from the tabular SK analysis and convert to citation format.""" + from 
semantic_kernel_plugins.plugin_invocation_logger import get_plugin_logger + + plugin_logger = get_plugin_logger() + plugin_invocations = plugin_logger.get_invocations_for_conversation(user_id, conversation_id) + plugin_invocations = filter_tabular_citation_invocations(plugin_invocations) + + if not plugin_invocations: + return [] + + def make_json_serializable(obj): + if obj is None: + return None + elif isinstance(obj, (str, int, float, bool)): + return obj + elif isinstance(obj, dict): + return {str(k): make_json_serializable(v) for k, v in obj.items()} + elif isinstance(obj, (list, tuple)): + return [make_json_serializable(item) for item in obj] + else: + return str(obj) + + citations = [] + for inv in plugin_invocations: + timestamp_str = None + if inv.timestamp: + if hasattr(inv.timestamp, 'isoformat'): + timestamp_str = inv.timestamp.isoformat() + else: + timestamp_str = str(inv.timestamp) + + parameters = getattr(inv, 'parameters', {}) or {} + sheet_name = parameters.get('sheet_name') + sheet_index = parameters.get('sheet_index') + tool_name = f"{inv.plugin_name}.{inv.function_name}" + if sheet_name: + tool_name = f"{tool_name} [{sheet_name}]" + elif sheet_index not in (None, ''): + tool_name = f"{tool_name} [sheet #{sheet_index}]" + + citation = { + 'tool_name': tool_name, + 'function_name': inv.function_name, + 'plugin_name': inv.plugin_name, + 'function_arguments': make_json_serializable(parameters), + 'function_result': make_json_serializable(inv.result), + 'duration_ms': inv.duration_ms, + 'timestamp': timestamp_str, + 'success': inv.success, + 'error_message': make_json_serializable(inv.error_message), + 'user_id': inv.user_id, + 'sheet_name': sheet_name, + 'sheet_index': sheet_index, + } + citations.append(citation) + + log_event(f"[Tabular SK Citations] Collected {len(citations)} tool execution citations", level=logging.INFO) + return citations + + +def is_tabular_filename(filename): + """Return True when the filename has a supported tabular 
extension.""" + if not filename or not isinstance(filename, str): + return False + + _, extension = os.path.splitext(filename.strip().lower()) + return extension.lstrip('.') in TABULAR_EXTENSIONS + + +def get_citation_location(file_name, page_number=None, chunk_text=None, sheet_name=None): + """Return a display label/value pair for a citation location.""" + if sheet_name: + return 'Sheet', str(sheet_name) + + normalized_chunk_text = (chunk_text or '').strip() + if is_tabular_filename(file_name) and ( + normalized_chunk_text.startswith('Tabular workbook:') + or normalized_chunk_text.startswith('Tabular data file:') + ): + return 'Location', 'Workbook Schema' + + return 'Page', str(page_number or 1) + + +def get_document_container_for_scope(document_scope): + """Return the Cosmos documents container that matches the workspace scope.""" + if document_scope == 'group': + return cosmos_group_documents_container + if document_scope == 'public': + return cosmos_public_documents_container + return cosmos_user_documents_container + + +def get_selected_workspace_tabular_filenames(selected_document_ids=None, selected_document_id=None, document_scope='personal'): + """Resolve explicitly selected workspace documents and return tabular filenames.""" + selected_ids = list(selected_document_ids or []) + if not selected_ids and selected_document_id and selected_document_id != 'all': + selected_ids = [selected_document_id] + + if not selected_ids: + return set() + + cosmos_container = get_document_container_for_scope(document_scope) + tabular_filenames = set() + + for doc_id in selected_ids: + if not doc_id or doc_id == 'all': + continue + + try: + doc_query = ( + "SELECT TOP 1 c.file_name, c.title " + "FROM c WHERE c.id = @doc_id " + "ORDER BY c.version DESC" + ) + doc_params = [{"name": "@doc_id", "value": doc_id}] + doc_results = list(cosmos_container.query_items( + query=doc_query, + parameters=doc_params, + enable_cross_partition_query=True + )) + + if not doc_results: + 
continue + + file_name = doc_results[0].get('file_name') or doc_results[0].get('title') + if is_tabular_filename(file_name): + tabular_filenames.add(file_name) + except Exception as e: + log_event( + f"[Tabular SK Analysis] Failed to resolve selected document '{doc_id}': {e}", + level=logging.WARNING + ) + + return tabular_filenames + + +def collect_workspace_tabular_filenames(combined_documents=None, selected_document_ids=None, + selected_document_id=None, document_scope='personal'): + """Collect tabular filenames from search results and explicit workspace selection.""" + tabular_filenames = set() + + for source_doc in combined_documents or []: + file_name = source_doc.get('file_name', '') + if is_tabular_filename(file_name): + tabular_filenames.add(file_name) + + tabular_filenames.update(get_selected_workspace_tabular_filenames( + selected_document_ids=selected_document_ids, + selected_document_id=selected_document_id, + document_scope=document_scope, + )) + + return tabular_filenames + + +def determine_tabular_source_hint(document_scope, active_group_id=None, active_public_workspace_id=None): + """Map workspace scope metadata to the tabular plugin source hint.""" + if document_scope == 'group' and active_group_id: + return 'group' + if document_scope == 'public' and active_public_workspace_id: + return 'public' + return 'workspace' + def register_route_backend_chats(app): + def build_background_stream_response(event_generator_factory): + """Run SSE generation in background execution so it survives disconnects.""" + stream_bridge = BackgroundStreamBridge() + + @copy_current_request_context + def stream_worker(): + try: + for event in event_generator_factory(): + stream_bridge.push(event) + except Exception as e: + debug_print(f"[STREAM BACKGROUND] Worker error: {e}") + stream_bridge.push( + f"data: {json.dumps({'error': f'Internal server error: {str(e)}'})}\n\n" + ) + finally: + stream_bridge.finish() + + executor = current_app.extensions.get('executor') + if 
executor: + try: + executor.submit(stream_worker) + except Exception as e: + debug_print(f"[STREAM BACKGROUND] Executor submit failed, falling back to thread: {e}") + worker_thread = threading.Thread(target=stream_worker, daemon=True) + worker_thread.start() + else: + worker_thread = threading.Thread(target=stream_worker, daemon=True) + worker_thread.start() + + def consume_stream(): + try: + for event in stream_bridge.iter_events(): + yield event + except GeneratorExit: + stream_bridge.detach_consumer() + raise + finally: + stream_bridge.detach_consumer() + + return Response( + stream_with_context(consume_stream()), + mimetype='text/event-stream', + headers={ + 'Cache-Control': 'no-cache', + 'X-Accel-Buffering': 'no', + 'Connection': 'keep-alive' + } + ) + @app.route('/api/chat', methods=['POST']) @swagger_route(security=get_auth_security()) @login_required @user_required def chat_api(): try: + request_start_time = time.time() settings = get_settings() data = request.get_json() user_id = get_current_user_id() @@ -668,6 +2739,18 @@ def result_requires_message_reload(result: Any) -> bool: conversation_item['last_updated'] = datetime.utcnow().isoformat() cosmos_conversations_container.upsert_item(conversation_item) # Update timestamp and potentially title + + # Generate assistant_message_id early for thought tracking + assistant_message_id = f"{conversation_id}_assistant_{int(time.time())}_{random.randint(1000,9999)}" + + # Initialize thought tracker + thought_tracker = ThoughtTracker( + conversation_id=conversation_id, + message_id=assistant_message_id, + thread_id=current_user_thread_id, + user_id=user_id + ) + # region 3 - Content Safety # --------------------------------------------------------------------- # 3) Check Content Safety (but DO NOT return 403). 
@@ -679,6 +2762,7 @@ def result_requires_message_reload(result: Any) -> bool: blocklist_matches = [] if settings.get('enable_content_safety') and "content_safety_client" in CLIENTS: + thought_tracker.add_thought('content_safety', 'Checking content safety...') try: content_safety_client = CLIENTS["content_safety_client"] request_obj = AnalyzeTextOptions(text=user_message) @@ -836,6 +2920,7 @@ def result_requires_message_reload(result: Any) -> bool: # Perform the search + thought_tracker.add_thought('search', f"Searching {document_scope or 'personal'} workspace documents for '{(search_query or user_message)[:50]}'") try: # Prepare search arguments # Set default and maximum values for top_n @@ -898,9 +2983,11 @@ def result_requires_message_reload(result: Any) -> bool: 'error': 'There was an issue with the embedding process. Please check with an admin on embedding configuration.' }), 500 + combined_documents = [] if search_results: + unique_doc_names = set(doc.get('file_name', 'Unknown') for doc in search_results) + thought_tracker.add_thought('search', f"Found {len(search_results)} results from {len(unique_doc_names)} documents") retrieved_texts = [] - combined_documents = [] classifications_found = set(conversation_item.get('classification', [])) # Load existing for doc in search_results: @@ -915,13 +3002,23 @@ def result_requires_message_reload(result: Any) -> bool: chunk_id = doc.get('chunk_id', str(uuid.uuid4())) # Ensure ID exists score = doc.get('score', 0.0) # Add default score group_id = doc.get('group_id', None) # Add default group ID + sheet_name = doc.get('sheet_name') + location_label, location_value = get_citation_location( + file_name, + page_number=page_number, + chunk_text=chunk_text, + sheet_name=sheet_name, + ) - citation = f"(Source: {file_name}, Page: {page_number}) [#{citation_id}]" + citation = f"(Source: {file_name}, {location_label}: {location_value}) [#{citation_id}]" retrieved_texts.append(f"{chunk_text}\n{citation}") 
combined_documents.append({ "file_name": file_name, "citation_id": citation_id, "page_number": page_number, + "sheet_name": sheet_name, + "location_label": location_label, + "location_value": location_value, "version": version, "classification": classification, "chunk_text": chunk_text, @@ -935,17 +3032,7 @@ def result_requires_message_reload(result: Any) -> bool: retrieved_content = "\n\n".join(retrieved_texts) # Construct system prompt for search results - system_prompt_search = f"""You are an AI assistant. Use the following retrieved document excerpts to answer the user's question. Cite sources using the format (Source: filename, Page: page number). - - Retrieved Excerpts: - {retrieved_content} - - Based *only* on the information provided above, answer the user's query. If the answer isn't in the excerpts, say so. - - Example - User: What is the policy on double dipping? - Assistant: The policy prohibits entities from using federal funds received through one program to apply for additional funds through another program, commonly known as 'double dipping' (Source: PolicyDocument.pdf, Page: 12) - """ + system_prompt_search = build_search_augmentation_system_prompt(retrieved_content) # Add this to a temporary list, don't save to DB yet system_messages_for_augmentation.append({ 'role': 'system', @@ -1122,24 +3209,11 @@ def result_requires_message_reload(result: Any) -> bool: # Update the system prompt with the enhanced content including metadata if retrieved_texts: retrieved_content = "\n\n".join(retrieved_texts) - system_prompt_search = f"""You are an AI assistant. Use the following retrieved document excerpts to answer the user's question. Cite sources using the format (Source: filename, Page: page number). - Retrieved Excerpts: - {retrieved_content} - Based *only* on the information provided above, answer the user's query. If the answer isn't in the excerpts, say so. 
- - Retrieved Excerpts: - {retrieved_content} - - Based *only* on the information provided above, answer the user's query. If the answer isn't in the excerpts, say so. - - Example - User: What is the policy on double dipping? - Assistant: The policy prohibits entities from using federal funds received through one program to apply for additional funds through another program, commonly known as 'double dipping' (Source: PolicyDocument.pdf, Page: 12) - """ + system_prompt_search = build_search_augmentation_system_prompt(retrieved_content) # Update the system message with enhanced content and updated documents array if system_messages_for_augmentation: - system_messages_for_augmentation[-1]['content'] = system_prompt_search - system_messages_for_augmentation[-1]['documents'] = combined_documents + system_messages_for_augmentation[0]['content'] = system_prompt_search + system_messages_for_augmentation[0]['documents'] = combined_documents # --- END NEW METADATA CITATIONS --- # Update conversation classifications if new ones were found @@ -1462,7 +3536,8 @@ def result_requires_message_reload(result: Any) -> bool: 'conversation_id': conversation_id, 'conversation_title': conversation_item['title'], 'model_deployment_name': image_gen_model, - 'message_id': image_message_id + 'message_id': image_message_id, + 'user_message_id': user_message_id }), 200 except Exception as e: debug_print(f"Image generation error: {str(e)}") @@ -1488,7 +3563,83 @@ def result_requires_message_reload(result: Any) -> bool: 'error': user_friendly_message }), status_code + workspace_tabular_files = set() + if hybrid_search_enabled and settings.get('enable_tabular_processing_plugin', False) and settings.get('enable_enhanced_citations', False): + workspace_tabular_files = collect_workspace_tabular_filenames( + combined_documents=combined_documents, + selected_document_ids=selected_document_ids, + selected_document_id=selected_document_id, + document_scope=document_scope, + ) + + if 
hybrid_search_enabled and workspace_tabular_files and settings.get('enable_tabular_processing_plugin', False) and settings.get('enable_enhanced_citations', False): + tabular_source_hint = determine_tabular_source_hint( + document_scope, + active_group_id=active_group_id, + active_public_workspace_id=active_public_workspace_id, + ) + tabular_execution_mode = get_tabular_execution_mode(user_message) + tabular_filenames_str = ", ".join(sorted(workspace_tabular_files)) + plugin_logger = get_plugin_logger() + baseline_tabular_invocation_count = len( + plugin_logger.get_invocations_for_conversation(user_id, conversation_id, limit=1000) + ) + + tabular_analysis = asyncio.run(run_tabular_sk_analysis( + user_question=user_message, + tabular_filenames=workspace_tabular_files, + user_id=user_id, + conversation_id=conversation_id, + gpt_model=gpt_model, + settings=settings, + source_hint=tabular_source_hint, + group_id=active_group_id if tabular_source_hint == 'group' else None, + public_workspace_id=active_public_workspace_id if tabular_source_hint == 'public' else None, + execution_mode=tabular_execution_mode, + )) + tabular_invocations = get_new_plugin_invocations( + plugin_logger.get_invocations_for_conversation(user_id, conversation_id, limit=1000), + baseline_tabular_invocation_count + ) + tabular_thought_payloads = get_tabular_tool_thought_payloads(tabular_invocations) + for thought_content, thought_detail in tabular_thought_payloads: + thought_tracker.add_thought('tabular_analysis', thought_content, thought_detail) + tabular_status_thought_payloads = get_tabular_status_thought_payloads( + tabular_invocations, + analysis_succeeded=bool(tabular_analysis), + ) + for thought_content, thought_detail in tabular_status_thought_payloads: + thought_tracker.add_thought('tabular_analysis', thought_content, thought_detail) + + if tabular_analysis: + tabular_system_msg = build_tabular_computed_results_system_message( + f"the file(s) {tabular_filenames_str}", + tabular_analysis, + ) 
+ else: + tabular_system_msg = build_tabular_fallback_system_message( + tabular_filenames_str, + execution_mode=tabular_execution_mode, + ) + + system_messages_for_augmentation.append({ + 'role': 'system', + 'content': tabular_system_msg + }) + + if tabular_analysis: + tabular_sk_citations = collect_tabular_sk_citations(user_id, conversation_id) + if tabular_sk_citations: + agent_citations_list.extend(tabular_sk_citations) + else: + thought_tracker.add_thought( + 'tabular_analysis', + "Tabular analysis could not compute results; using schema context instead", + detail=f"files={tabular_filenames_str}" + ) + if web_search_enabled: + thought_tracker.add_thought('web_search', f"Searching the web for '{(search_query or user_message)[:50]}'") perform_web_search( settings=settings, conversation_id=conversation_id, @@ -1504,7 +3655,9 @@ def result_requires_message_reload(result: Any) -> bool: agent_citations_list=agent_citations_list, web_search_citations_list=web_search_citations_list, ) - + if web_search_citations_list: + thought_tracker.add_thought('web_search', f"Got {len(web_search_citations_list)} web search results") + # region 5 - FINAL conversation history preparation # --------------------------------------------------------------------- # 5) Prepare FINAL conversation history for GPT (including summarization) @@ -1650,6 +3803,7 @@ def result_requires_message_reload(result: Any) -> bool: allowed_roles_in_history = ['user', 'assistant'] # Add 'system' if you PERSIST general system messages not related to augmentation max_file_content_length_in_history = 50000 # Increased limit for all file content in history max_tabular_content_length_in_history = 50000 # Same limit for tabular data consistency + chat_tabular_files = set() # Track tabular files uploaded directly to chat for message in recent_messages: role = message.get('role') @@ -1685,25 +3839,38 @@ def result_requires_message_reload(result: Any) -> bool: filename = message.get('filename', 'uploaded_file') 
file_content = message.get('file_content', '') # Assuming file content is stored is_table = message.get('is_table', False) - - # Use higher limit for tabular data that needs complete analysis - content_limit = max_tabular_content_length_in_history if is_table else max_file_content_length_in_history - - display_content = file_content[:content_limit] - if len(file_content) > content_limit: - display_content += "..." - - # Enhanced message for tabular data - if is_table: + file_content_source = message.get('file_content_source', '') + + # Tabular files stored in blob (enhanced citations enabled) - reference plugin + if is_table and file_content_source == 'blob': + chat_tabular_files.add(filename) # Track for mini SK analysis conversation_history_for_api.append({ - 'role': 'system', # Represent file as system info - 'content': f"[User uploaded a tabular data file named '{filename}'. This is CSV format data for analysis:\n{display_content}]\nThis is complete tabular data in CSV format. You can perform calculations, analysis, and data operations on this dataset." + 'role': 'system', + 'content': f"[User uploaded a tabular data file named '{filename}'. " + f"The file is stored in blob storage and available for analysis. " + f"Use the tabular_processing plugin functions (list_tabular_files, describe_tabular_file, " + f"aggregate_column, filter_rows, query_tabular_data, group_by_aggregate, group_by_datetime_component) to analyze this data. " + f"The file source is 'chat'.]" }) else: - conversation_history_for_api.append({ - 'role': 'system', # Represent file as system info - 'content': f"[User uploaded a file named '{filename}'. Content preview:\n{display_content}]\nUse this file context if relevant." 
- }) + # Use higher limit for tabular data that needs complete analysis + content_limit = max_tabular_content_length_in_history if is_table else max_file_content_length_in_history + + display_content = file_content[:content_limit] + if len(file_content) > content_limit: + display_content += "..." + + # Enhanced message for tabular data + if is_table: + conversation_history_for_api.append({ + 'role': 'system', # Represent file as system info + 'content': f"[User uploaded a tabular data file named '{filename}'. This is CSV format data for analysis:\n{display_content}]\nThis is complete tabular data in CSV format. You can perform calculations, analysis, and data operations on this dataset." + }) + else: + conversation_history_for_api.append({ + 'role': 'system', # Represent file as system info + 'content': f"[User uploaded a file named '{filename}'. Content preview:\n{display_content}]\nUse this file context if relevant." + }) elif role == 'image': # Handle image uploads with extracted text and vision analysis filename = message.get('filename', 'uploaded_image') is_user_upload = message.get('metadata', {}).get('is_user_upload', False) @@ -1767,6 +3934,67 @@ def result_requires_message_reload(result: Any) -> bool: # Ignored roles: 'safety', 'blocked', 'system' (if they are only for augmentation/summary) + # --- Mini SK analysis for tabular files uploaded directly to chat --- + if chat_tabular_files and settings.get('enable_tabular_processing_plugin', False) and settings.get('enable_enhanced_citations', False): + chat_tabular_filenames_str = ", ".join(chat_tabular_files) + chat_tabular_execution_mode = get_tabular_execution_mode(user_message) + log_event( + f"[Chat Tabular SK] Detected {len(chat_tabular_files)} tabular file(s) uploaded to chat: {chat_tabular_filenames_str}", + level=logging.INFO + ) + plugin_logger = get_plugin_logger() + baseline_tabular_invocation_count = len( + plugin_logger.get_invocations_for_conversation(user_id, conversation_id, limit=1000) + ) + 
+ chat_tabular_analysis = asyncio.run(run_tabular_sk_analysis( + user_question=user_message, + tabular_filenames=chat_tabular_files, + user_id=user_id, + conversation_id=conversation_id, + gpt_model=gpt_model, + settings=settings, + source_hint="chat", + execution_mode=chat_tabular_execution_mode, + )) + chat_tabular_invocations = get_new_plugin_invocations( + plugin_logger.get_invocations_for_conversation(user_id, conversation_id, limit=1000), + baseline_tabular_invocation_count + ) + chat_tabular_thought_payloads = get_tabular_tool_thought_payloads(chat_tabular_invocations) + for thought_content, thought_detail in chat_tabular_thought_payloads: + thought_tracker.add_thought('tabular_analysis', thought_content, thought_detail) + chat_tabular_status_thought_payloads = get_tabular_status_thought_payloads( + chat_tabular_invocations, + analysis_succeeded=bool(chat_tabular_analysis), + ) + for thought_content, thought_detail in chat_tabular_status_thought_payloads: + thought_tracker.add_thought('tabular_analysis', thought_content, thought_detail) + + if chat_tabular_analysis: + # Inject pre-computed analysis results as context + conversation_history_for_api.append({ + 'role': 'system', + 'content': build_tabular_computed_results_system_message( + f"the chat-uploaded file(s) {chat_tabular_filenames_str}", + chat_tabular_analysis, + ) + }) + + # Collect tool execution citations from SK tabular analysis + chat_tabular_sk_citations = collect_tabular_sk_citations(user_id, conversation_id) + if chat_tabular_sk_citations: + agent_citations_list.extend(chat_tabular_sk_citations) + + debug_print(f"[Chat Tabular SK] Analysis injected, {len(chat_tabular_analysis)} chars") + else: + thought_tracker.add_thought( + 'tabular_analysis', + "Tabular analysis could not compute results; using existing chat file context", + detail=f"files={chat_tabular_filenames_str}" + ) + debug_print("[Chat Tabular SK] Analysis returned None, relying on existing file context messages") + # Ensure the 
very last message is the current user's message (it should be if fetched correctly) if not conversation_history_for_api or conversation_history_for_api[-1]['role'] != 'user': debug_print("Warning: Last message in history is not the user's current message. Appending.") @@ -1939,7 +4167,6 @@ async def run_sk_call(callable_obj, *args, **kwargs): chat_mode = None scope_id=active_group_id if chat_type == 'group' else user_id scope_type='group' if chat_type == 'group' else 'user' - conversation_id=conversation_id enable_multi_agent_orchestration = False fallback_steps = [] selected_agent = None @@ -2110,6 +4337,27 @@ def orchestrator_error(e): }) if selected_agent: + agent_deployment_name = getattr(selected_agent, 'deployment_name', None) or gpt_model + thought_tracker.add_thought('agent_tool_call', f"Sending to agent '{getattr(selected_agent, 'display_name', getattr(selected_agent, 'name', 'unknown'))}'") + thought_tracker.add_thought('generation', f"Sending to '{agent_deployment_name}'") + + # Register callback to write plugin thoughts to Cosmos in real-time + callback_key = f"{user_id}:{conversation_id}" + plugin_logger = get_plugin_logger() + + def on_plugin_invocation(inv): + duration_str = f" ({int(inv.duration_ms)}ms)" if inv.duration_ms else "" + tool_name = f"{inv.plugin_name}.{inv.function_name}" + thought_tracker.add_thought( + 'agent_tool_call', + f"Agent called {tool_name}{duration_str}", + detail=f"success={inv.success}" + ) + + plugin_logger.register_callback(callback_key, on_plugin_invocation) + + agent_invoke_start_time = time.time() + def invoke_selected_agent(): return asyncio.run(run_sk_call( selected_agent.invoke, @@ -2120,16 +4368,22 @@ def agent_success(result): msg = str(result) notice = None agent_used = getattr(selected_agent, 'name', 'All Plugins') - + + # Emit responded thought with total duration from user message + agent_total_duration_s = round(time.time() - request_start_time, 1) + thought_tracker.add_thought('generation', 
f"'{agent_deployment_name}' responded ({agent_total_duration_s}s from initial message)") + + # Deregister real-time thought callback + plugin_logger.deregister_callbacks(callback_key) + # Get the actual model deployment used by the agent actual_model_deployment = getattr(selected_agent, 'deployment_name', None) or agent_used debug_print(f"Agent '{agent_used}' using deployment: {actual_model_deployment}") - + # Extract detailed plugin invocations for enhanced agent citations - plugin_logger = get_plugin_logger() - # CRITICAL FIX: Filter by user_id and conversation_id to prevent cross-conversation contamination + # (Thoughts already written to Cosmos in real-time by callback) plugin_invocations = plugin_logger.get_invocations_for_conversation(user_id, conversation_id) - + # Convert plugin invocations to citation format with detailed information detailed_citations = [] for inv in plugin_invocations: @@ -2204,6 +4458,7 @@ def make_json_serializable(obj): ) return (msg, actual_model_deployment, "agent", notice) def agent_error(e): + plugin_logger.deregister_callbacks(callback_key) debug_print(f"Error during Semantic Kernel Agent invocation: {str(e)}") log_event( f"Error during Semantic Kernel Agent invocation: {str(e)}", @@ -2244,8 +4499,21 @@ def foundry_agent_success(result): or agent_used ) + # Emit responded thought with total duration from user message + foundry_total_duration_s = round(time.time() - request_start_time, 1) + thought_tracker.add_thought('generation', f"'{actual_model_deployment}' responded ({foundry_total_duration_s}s from initial message)") + + # Deregister real-time thought callback + plugin_logger.deregister_callbacks(callback_key) + foundry_citations = getattr(selected_agent, 'last_run_citations', []) or [] if foundry_citations: + # Emit thoughts for Foundry agent citations/tool calls + for citation in foundry_citations: + thought_tracker.add_thought( + 'agent_tool_call', + f"Agent retrieved citation from Azure AI Foundry" + ) for citation in 
foundry_citations: try: serializable = json.loads(json.dumps(citation, default=str)) @@ -2282,6 +4550,7 @@ def foundry_agent_success(result): return (msg, actual_model_deployment, 'agent', notice) def foundry_agent_error(e): + plugin_logger.deregister_callbacks(callback_key) log_event( f"Error during Azure AI Foundry agent invocation: {str(e)}", extra={ @@ -2360,6 +4629,7 @@ def kernel_error(e): 'on_error': kernel_error }) + thought_tracker.add_thought('generation', f"Sending to '{gpt_model}'") def invoke_gpt_fallback(): if not conversation_history_for_api: raise Exception('Cannot generate response: No conversation history available.') @@ -2443,12 +4713,18 @@ def gpt_error(e): }) fallback_result = try_fallback_chain(fallback_steps) + # Unpack result - handle both 4-tuple (SK) and 5-tuple (GPT with tokens) if len(fallback_result) == 5: ai_message, final_model_used, chat_mode, kernel_fallback_notice, token_usage_data = fallback_result else: ai_message, final_model_used, chat_mode, kernel_fallback_notice = fallback_result token_usage_data = None + + # Emit responded thought for non-agent paths (agent paths emit their own inside callbacks) + if not selected_agent: + gpt_total_duration_s = round(time.time() - request_start_time, 1) + thought_tracker.add_thought('generation', f"'{final_model_used}' responded ({gpt_total_duration_s}s from initial message)") # Collect token usage from Semantic Kernel services if available if kernel and not token_usage_data: @@ -2510,8 +4786,8 @@ def gpt_error(e): if hasattr(selected_agent, 'name'): agent_name = selected_agent.name - assistant_message_id = f"{conversation_id}_assistant_{int(time.time())}_{random.randint(1000,9999)}" - + # assistant_message_id was generated earlier for thought tracking + # Get user_info and thread_id from the user message for ownership tracking and threading user_info_for_assistant = None user_thread_id = None @@ -2672,7 +4948,8 @@ def gpt_error(e): 'web_search_citations': web_search_citations_list, 
'agent_citations': agent_citations_list, 'reload_messages': reload_messages_required, - 'kernel_fallback_notice': kernel_fallback_notice + 'kernel_fallback_notice': kernel_fallback_notice, + 'thoughts_enabled': thought_tracker.enabled }), 200 except Exception as e: @@ -2713,8 +4990,112 @@ def chat_stream_api(): data = request.get_json() user_id = get_current_user_id() settings = get_settings() + request_start_time = time.time() except Exception as e: return jsonify({'error': f'Failed to parse request: {str(e)}'}), 400 + + compatibility_mode = bool(data.get('image_generation')) or bool( + data.get('retry_user_message_id') or data.get('edited_user_message_id') + ) + + request_message = (data.get('message') or '').strip() + request_preview = request_message[:120] + '...' if len(request_message) > 120 else request_message + debug_print( + "[Streaming] Incoming /api/chat/stream request | " + f"conversation_id={data.get('conversation_id')} | " + f"compatibility_mode={compatibility_mode} | " + f"hybrid_search={data.get('hybrid_search')} | " + f"web_search={data.get('web_search_enabled')} | " + f"doc_scope={data.get('doc_scope')} | " + f"chat_type={data.get('chat_type', 'user')} | " + f"selected_document_id={data.get('selected_document_id')} | " + f"selected_document_ids={len(data.get('selected_document_ids', []) or [])} | " + f"active_group_id={data.get('active_group_id')} | " + f"active_group_ids={len(data.get('active_group_ids', []) or [])} | " + f"active_public_workspace_id={data.get('active_public_workspace_id')} | " + f"frontend_model={data.get('model_deployment')} | " + f"message_preview={request_preview!r}" + ) + + def normalize_legacy_chat_payload(payload): + """Convert the legacy JSON response shape into the streaming terminal payload.""" + return { + 'done': True, + 'conversation_id': payload.get('conversation_id'), + 'conversation_title': payload.get('conversation_title'), + 'classification': payload.get('classification', []), + 'model_deployment_name': 
payload.get('model_deployment_name'), + 'message_id': payload.get('message_id'), + 'user_message_id': payload.get('user_message_id'), + 'augmented': payload.get('augmented', False), + 'hybrid_citations': payload.get('hybrid_citations', []), + 'web_search_citations': payload.get('web_search_citations', []), + 'agent_citations': payload.get('agent_citations', []), + 'agent_display_name': payload.get('agent_display_name'), + 'agent_name': payload.get('agent_name'), + 'full_content': payload.get('reply', ''), + 'image_url': payload.get('image_url'), + 'reload_messages': payload.get('reload_messages', False), + 'kernel_fallback_notice': payload.get('kernel_fallback_notice'), + 'thoughts_enabled': payload.get('thoughts_enabled', False), + 'blocked': payload.get('blocked', False), + } + + def generate_compatibility_response(): + """Bridge legacy JSON chat handling into a terminal SSE event for parity cases.""" + try: + if data.get('image_generation'): + prompt_text = (data.get('message') or '').strip() + prompt_preview = prompt_text[:120] + '...' 
if len(prompt_text) > 120 else prompt_text + + image_prompt_event = { + 'type': 'thought', + 'step_type': 'generation', + 'content': f'Generating image based on \"{prompt_preview}\"' if prompt_preview else 'Generating image from your prompt' + } + yield f"data: {json.dumps(image_prompt_event)}\n\n" + + image_request_event = { + 'type': 'thought', + 'step_type': 'generation', + 'content': 'Preparing image model request' + } + yield f"data: {json.dumps(image_request_event)}\n\n" + + legacy_result = chat_api() + legacy_response = legacy_result + status_code = 200 + + if isinstance(legacy_result, tuple): + legacy_response = legacy_result[0] + if len(legacy_result) > 1 and isinstance(legacy_result[1], int): + status_code = legacy_result[1] + + if hasattr(legacy_response, 'get_json'): + payload = legacy_response.get_json(silent=True) or {} + else: + payload = {} + + if status_code >= 400: + error_message = payload.get('error') or f'Compatibility chat request failed ({status_code})' + yield f"data: {json.dumps({'error': error_message})}\n\n" + return + + if payload.get('image_url'): + image_ready_event = { + 'type': 'thought', + 'step_type': 'generation', + 'content': 'Image generated and ready to display' + } + yield f"data: {json.dumps(image_ready_event)}\n\n" + + yield f"data: {json.dumps(normalize_legacy_chat_payload(payload))}\n\n" + except Exception as compatibility_error: + yield f"data: {json.dumps({'error': str(compatibility_error)})}\n\n" + + if compatibility_mode: + debug_print("[Streaming] Routing request through compatibility bridge") + return build_background_stream_response(generate_compatibility_response) def generate(): try: @@ -2757,6 +5138,24 @@ def generate(): classifications_to_send = data.get('classifications') chat_type = data.get('chat_type', 'user') reasoning_effort = data.get('reasoning_effort') # Extract reasoning effort for reasoning models + + debug_print( + "[Streaming] Parsed request payload | " + f"user_id={user_id} | " + 
f"conversation_id={conversation_id} | " + f"message_length={len(user_message)} | " + f"hybrid_search={hybrid_search_enabled} | " + f"web_search={web_search_enabled} | " + f"doc_scope={document_scope} | " + f"chat_type={chat_type} | " + f"selected_document_id={selected_document_id} | " + f"selected_document_ids={len(selected_document_ids)} | " + f"active_group_id={active_group_id} | " + f"active_group_ids={len(active_group_ids)} | " + f"active_public_workspace_id={active_public_workspace_id} | " + f"frontend_model={frontend_gpt_model} | " + f"reasoning_effort={reasoning_effort}" + ) # Check if agents are enabled enable_semantic_kernel = settings.get('enable_semantic_kernel', False) @@ -2816,6 +5215,9 @@ def generate(): from semantic_kernel_plugins.plugin_invocation_logger import get_plugin_logger plugin_logger = get_plugin_logger() plugin_logger.clear_invocations_for_conversation(user_id, conversation_id) + debug_print( + f"[Streaming] Cleared plugin invocations for user_id={user_id}, conversation_id={conversation_id}" + ) # Validate chat_type if chat_type not in ('user', 'group'): @@ -2841,6 +5243,12 @@ def generate(): hybrid_search_enabled = hybrid_search_enabled.lower() == 'true' if isinstance(web_search_enabled, str): web_search_enabled = web_search_enabled.lower() == 'true' + debug_print( + "[Streaming] Normalized toggles | " + f"hybrid_search={hybrid_search_enabled} | " + f"web_search={web_search_enabled} | " + f"chat_type={chat_type}" + ) # Initialize GPT client (simplified version) gpt_model = "" @@ -2904,6 +5312,10 @@ def generate(): if not gpt_client or not gpt_model: yield f"data: {json.dumps({'error': 'Failed to initialize AI model'})}\n\n" return + + debug_print( + f"[Streaming] Initialized model client | model={gpt_model} | enable_gpt_apim={enable_gpt_apim}" + ) except Exception as e: yield f"data: {json.dumps({'error': f'Model initialization failed: {str(e)}'})}\n\n" @@ -2922,11 +5334,13 @@ def generate(): 'strict': False } 
cosmos_conversations_container.upsert_item(conversation_item) + debug_print(f"[Streaming] Created new conversation {conversation_id}") else: try: conversation_item = cosmos_conversations_container.read_item( item=conversation_id, partition_key=conversation_id ) + debug_print(f"[Streaming] Loaded existing conversation {conversation_id}") except CosmosResourceNotFoundError: conversation_item = { 'id': conversation_id, @@ -2938,6 +5352,7 @@ def generate(): 'strict': False } cosmos_conversations_container.upsert_item(conversation_item) + debug_print(f"[Streaming] Conversation {conversation_id} not found; created replacement") # Determine chat type actual_chat_type = 'personal' @@ -3088,6 +5503,9 @@ def generate(): } cosmos_messages_container.upsert_item(user_message_doc) + debug_print( + f"[Streaming] Saved user message {user_message_id} | thread_id={current_user_thread_id} | previous_thread_id={previous_thread_id}" + ) # Log activity try: @@ -3111,10 +5529,127 @@ def generate(): conversation_item['last_updated'] = datetime.utcnow().isoformat() cosmos_conversations_container.upsert_item(conversation_item) - + + # Generate assistant_message_id early for thought tracking + assistant_message_id = f"{conversation_id}_assistant_{int(time.time())}_{random.randint(1000,9999)}" + + # Initialize thought tracker for streaming path + thought_tracker = ThoughtTracker( + conversation_id=conversation_id, + message_id=assistant_message_id, + thread_id=current_user_thread_id, + user_id=user_id + ) + + def emit_thought(step_type, content, detail=None): + """Add a thought to Cosmos and return an SSE event string.""" + thought_tracker.add_thought(step_type, content, detail) + return f"data: {json.dumps({'type': 'thought', 'step_index': thought_tracker.current_index - 1, 'step_type': step_type, 'content': content})}\n\n" + + # Content Safety check (matching non-streaming path) + blocked = False + if settings.get('enable_content_safety') and "content_safety_client" in CLIENTS: + yield 
emit_thought('content_safety', 'Checking content safety...') + try: + content_safety_client = CLIENTS["content_safety_client"] + request_obj = AnalyzeTextOptions(text=user_message) + cs_response = content_safety_client.analyze_text(request_obj) + + max_severity = 0 + triggered_categories = [] + blocklist_matches = [] + block_reasons = [] + + for cat_result in cs_response.categories_analysis: + triggered_categories.append({ + "category": cat_result.category, + "severity": cat_result.severity + }) + if cat_result.severity > max_severity: + max_severity = cat_result.severity + + if cs_response.blocklists_match: + for match in cs_response.blocklists_match: + blocklist_matches.append({ + "blocklistName": match.blocklist_name, + "blocklistItemId": match.blocklist_item_id, + "blocklistItemText": match.blocklist_item_text + }) + + if max_severity >= 4: + blocked = True + block_reasons.append("Max severity >= 4") + if len(blocklist_matches) > 0: + blocked = True + block_reasons.append("Blocklist match") + + if blocked: + # Upsert to safety container + safety_item = { + 'id': str(uuid.uuid4()), + 'user_id': user_id, + 'conversation_id': conversation_id, + 'message': user_message, + 'triggered_categories': triggered_categories, + 'blocklist_matches': blocklist_matches, + 'timestamp': datetime.utcnow().isoformat(), + 'reason': "; ".join(block_reasons), + 'metadata': {} + } + cosmos_safety_container.upsert_item(safety_item) + + # Build blocked message + blocked_msg_content = ( + "Your message was blocked by Content Safety.\n\n" + f"**Reason**: {', '.join(block_reasons)}\n" + "Triggered categories:\n" + ) + for cat in triggered_categories: + blocked_msg_content += ( + f" - {cat['category']} (severity={cat['severity']})\n" + ) + if blocklist_matches: + blocked_msg_content += ( + "\nBlocklist Matches:\n" + + "\n".join([f" - {m['blocklistItemText']} (in {m['blocklistName']})" + for m in blocklist_matches]) + ) + + # Insert safety message + safety_message_id = 
f"{conversation_id}_safety_{int(time.time())}_{random.randint(1000,9999)}" + safety_doc = { + 'id': safety_message_id, + 'conversation_id': conversation_id, + 'role': 'safety', + 'content': blocked_msg_content.strip(), + 'timestamp': datetime.utcnow().isoformat(), + 'model_deployment_name': None, + 'metadata': {}, + } + cosmos_messages_container.upsert_item(safety_doc) + + conversation_item['last_updated'] = datetime.utcnow().isoformat() + cosmos_conversations_container.upsert_item(conversation_item) + + # Stream the blocked response and stop + yield f"data: {json.dumps({'content': blocked_msg_content.strip(), 'blocked': True})}\n\n" + yield "data: [DONE]\n\n" + return + + except HttpResponseError as e: + debug_print(f"[Content Safety Error - Streaming] {e}") + except Exception as ex: + debug_print(f"[Content Safety - Streaming] Unexpected error: {ex}") + # Hybrid search (if enabled) combined_documents = [] if hybrid_search_enabled: + debug_print( + "[Streaming] Starting hybrid search | " + f"conversation_id={conversation_id} | doc_scope={document_scope} | " + f"selected_document_ids={len(selected_document_ids)} | tags={len(tags_filter) if isinstance(tags_filter, list) else 0}" + ) + yield emit_thought('search', f"Searching {document_scope or 'personal'} workspace documents for '{(search_query or user_message)[:50]}'") try: search_args = { "query": search_query, @@ -3142,10 +5677,15 @@ def generate(): search_args['tags_filter'] = tags_filter search_results = hybrid_search(**search_args) + debug_print( + f"[Streaming] Hybrid search completed | results={len(search_results) if search_results else 0}" + ) except Exception as e: debug_print(f"Error during hybrid search: {e}") - + if search_results: + unique_doc_names_stream = set(doc.get('file_name', 'Unknown') for doc in search_results) + yield emit_thought('search', f"Found {len(search_results)} results from {len(unique_doc_names_stream)} documents") retrieved_texts = [] for doc in search_results: @@ -3159,14 +5699,24 
@@ def generate(): chunk_id = doc.get('chunk_id', str(uuid.uuid4())) score = doc.get('score', 0.0) group_id = doc.get('group_id', None) + sheet_name = doc.get('sheet_name') + location_label, location_value = get_citation_location( + file_name, + page_number=page_number, + chunk_text=chunk_text, + sheet_name=sheet_name, + ) - citation = f"(Source: {file_name}, Page: {page_number}) [#{citation_id}]" + citation = f"(Source: {file_name}, {location_label}: {location_value}) [#{citation_id}]" retrieved_texts.append(f"{chunk_text}\n{citation}") combined_documents.append({ "file_name": file_name, "citation_id": citation_id, "page_number": page_number, + "sheet_name": sheet_name, + "location_label": location_label, + "location_value": location_value, "version": version, "classification": classification, "chunk_text": chunk_text, @@ -3303,27 +5853,106 @@ def generate(): retrieved_texts.append(vision_context) retrieved_content = "\n\n".join(retrieved_texts) - system_prompt_search = f"""You are an AI assistant. Use the following retrieved document excerpts to answer the user's question. Cite sources using the format (Source: filename, Page: page number). - Retrieved Excerpts: - {retrieved_content} - - Based *only* on the information provided above, answer the user's query. If the answer isn't in the excerpts, say so. - - Example - User: What is the policy on double dipping? 
- Assistant: The policy prohibits entities from using federal funds received through one program to apply for additional funds through another program, commonly known as 'double dipping' (Source: PolicyDocument.pdf, Page: 12) - """ + system_prompt_search = build_search_augmentation_system_prompt(retrieved_content) system_messages_for_augmentation.append({ 'role': 'system', 'content': system_prompt_search, 'documents': combined_documents }) - + # Reorder hybrid citations list in descending order based on page_number hybrid_citations_list.sort(key=lambda x: x.get('page_number', 0), reverse=True) + workspace_tabular_files = set() + if hybrid_search_enabled and settings.get('enable_tabular_processing_plugin', False) and settings.get('enable_enhanced_citations', False): + workspace_tabular_files = collect_workspace_tabular_filenames( + combined_documents=combined_documents, + selected_document_ids=selected_document_ids, + selected_document_id=selected_document_id, + document_scope=document_scope, + ) + + if hybrid_search_enabled and workspace_tabular_files and settings.get('enable_tabular_processing_plugin', False) and settings.get('enable_enhanced_citations', False): + tabular_source_hint = determine_tabular_source_hint( + document_scope, + active_group_id=active_group_id, + active_public_workspace_id=active_public_workspace_id, + ) + tabular_execution_mode = get_tabular_execution_mode(user_message) + tabular_filenames_str = ", ".join(sorted(workspace_tabular_files)) + plugin_logger = get_plugin_logger() + baseline_tabular_invocation_count = len( + plugin_logger.get_invocations_for_conversation(user_id, conversation_id, limit=1000) + ) + debug_print( + "[Streaming][Tabular SK] Starting workspace tabular analysis | " + f"files={sorted(workspace_tabular_files)} | source_hint={tabular_source_hint} | " + f"execution_mode={tabular_execution_mode} | baseline_invocations={baseline_tabular_invocation_count}" + ) + + tabular_analysis = asyncio.run(run_tabular_sk_analysis( + 
user_question=user_message, + tabular_filenames=workspace_tabular_files, + user_id=user_id, + conversation_id=conversation_id, + gpt_model=gpt_model, + settings=settings, + source_hint=tabular_source_hint, + group_id=active_group_id if tabular_source_hint == 'group' else None, + public_workspace_id=active_public_workspace_id if tabular_source_hint == 'public' else None, + execution_mode=tabular_execution_mode, + )) + tabular_invocations = get_new_plugin_invocations( + plugin_logger.get_invocations_for_conversation(user_id, conversation_id, limit=1000), + baseline_tabular_invocation_count + ) + debug_print( + "[Streaming][Tabular SK] Completed workspace tabular analysis | " + f"analysis_returned={bool(tabular_analysis)} | new_invocations={len(tabular_invocations)}" + ) + tabular_thought_payloads = get_tabular_tool_thought_payloads(tabular_invocations) + for thought_content, thought_detail in tabular_thought_payloads: + yield emit_thought('tabular_analysis', thought_content, thought_detail) + tabular_status_thought_payloads = get_tabular_status_thought_payloads( + tabular_invocations, + analysis_succeeded=bool(tabular_analysis), + ) + for thought_content, thought_detail in tabular_status_thought_payloads: + yield emit_thought('tabular_analysis', thought_content, thought_detail) + + if tabular_analysis: + system_messages_for_augmentation.append({ + 'role': 'system', + 'content': build_tabular_computed_results_system_message( + f"the file(s) {tabular_filenames_str}", + tabular_analysis, + ) + }) + + tabular_sk_citations = collect_tabular_sk_citations(user_id, conversation_id) + if tabular_sk_citations: + agent_citations_list.extend(tabular_sk_citations) + else: + system_messages_for_augmentation.append({ + 'role': 'system', + 'content': build_tabular_fallback_system_message( + tabular_filenames_str, + execution_mode=tabular_execution_mode, + ) + }) + + yield emit_thought( + 'tabular_analysis', + "Tabular analysis could not compute results; using schema context 
instead", + detail=f"files={tabular_filenames_str}" + ) + if web_search_enabled: + debug_print( + f"[Streaming] Starting web search augmentation for conversation_id={conversation_id}" + ) + yield emit_thought('web_search', f"Searching the web for '{(search_query or user_message)[:50]}'") perform_web_search( settings=settings, conversation_id=conversation_id, @@ -3339,6 +5968,11 @@ def generate(): agent_citations_list=agent_citations_list, web_search_citations_list=web_search_citations_list, ) + if web_search_citations_list: + debug_print( + f"[Streaming] Web search completed | citations={len(web_search_citations_list)}" + ) + yield emit_thought('web_search', f"Got {len(web_search_citations_list)} web search results") # Update message chat type message_chat_type = None @@ -3381,15 +6015,139 @@ def generate(): 'content': aug_msg['content'] }) - # Add recent messages + # Add recent messages (with file role handling) allowed_roles_in_history = ['user', 'assistant'] + max_file_content_length_in_history = 50000 + max_tabular_content_length_in_history = 50000 + chat_tabular_files = set() # Track tabular files uploaded directly to chat + for message in recent_messages: - if message.get('role') in allowed_roles_in_history: + role = message.get('role') + content = message.get('content', '') + + if role in allowed_roles_in_history: conversation_history_for_api.append({ - 'role': message['role'], - 'content': message.get('content', '') + 'role': role, + 'content': content }) - + elif role == 'file': + filename = message.get('filename', 'uploaded_file') + file_content = message.get('file_content', '') + is_table = message.get('is_table', False) + file_content_source = message.get('file_content_source', '') + + # Tabular files stored in blob - track for mini SK analysis + if is_table and file_content_source == 'blob': + chat_tabular_files.add(filename) + conversation_history_for_api.append({ + 'role': 'system', + 'content': ( + f"[User uploaded a tabular data file named 
'{filename}'. " + f"The file is stored in blob storage and available for analysis. " + f"Use the tabular_processing plugin functions (list_tabular_files, " + f"describe_tabular_file, aggregate_column, filter_rows, " + f"query_tabular_data, group_by_aggregate, group_by_datetime_component) to analyze this data. " + f"The file source is 'chat'.]" + ) + }) + else: + content_limit = ( + max_tabular_content_length_in_history if is_table + else max_file_content_length_in_history + ) + display_content = file_content[:content_limit] + if len(file_content) > content_limit: + display_content += "..." + + if is_table: + conversation_history_for_api.append({ + 'role': 'system', + 'content': ( + f"[User uploaded a tabular data file named '{filename}'. " + f"This is CSV format data for analysis:\n{display_content}]\n" + f"This is complete tabular data in CSV format. You can perform " + f"calculations, analysis, and data operations on this dataset." + ) + }) + else: + conversation_history_for_api.append({ + 'role': 'system', + 'content': ( + f"[User uploaded a file named '{filename}'. " + f"Content preview:\n{display_content}]\n" + f"Use this file context if relevant." 
+ ) + }) + + # --- Mini SK analysis for tabular files uploaded directly to chat --- + if chat_tabular_files and settings.get('enable_tabular_processing_plugin', False) and settings.get('enable_enhanced_citations', False): + chat_tabular_filenames_str = ", ".join(chat_tabular_files) + chat_tabular_execution_mode = get_tabular_execution_mode(user_message) + log_event( + f"[Chat Tabular SK] Streaming: Detected {len(chat_tabular_files)} tabular file(s) uploaded to chat: {chat_tabular_filenames_str}", + level=logging.INFO + ) + plugin_logger = get_plugin_logger() + baseline_tabular_invocation_count = len( + plugin_logger.get_invocations_for_conversation(user_id, conversation_id, limit=1000) + ) + debug_print( + "[Streaming][Chat Tabular SK] Starting chat-uploaded tabular analysis | " + f"files={sorted(chat_tabular_files)} | execution_mode={chat_tabular_execution_mode} | " + f"baseline_invocations={baseline_tabular_invocation_count}" + ) + + chat_tabular_analysis = asyncio.run(run_tabular_sk_analysis( + user_question=user_message, + tabular_filenames=chat_tabular_files, + user_id=user_id, + conversation_id=conversation_id, + gpt_model=gpt_model, + settings=settings, + source_hint="chat", + execution_mode=chat_tabular_execution_mode, + )) + chat_tabular_invocations = get_new_plugin_invocations( + plugin_logger.get_invocations_for_conversation(user_id, conversation_id, limit=1000), + baseline_tabular_invocation_count + ) + debug_print( + "[Streaming][Chat Tabular SK] Completed chat-uploaded tabular analysis | " + f"analysis_returned={bool(chat_tabular_analysis)} | new_invocations={len(chat_tabular_invocations)}" + ) + chat_tabular_thought_payloads = get_tabular_tool_thought_payloads(chat_tabular_invocations) + for thought_content, thought_detail in chat_tabular_thought_payloads: + yield emit_thought('tabular_analysis', thought_content, thought_detail) + chat_tabular_status_thought_payloads = get_tabular_status_thought_payloads( + chat_tabular_invocations, + 
analysis_succeeded=bool(chat_tabular_analysis), + ) + for thought_content, thought_detail in chat_tabular_status_thought_payloads: + yield emit_thought('tabular_analysis', thought_content, thought_detail) + + if chat_tabular_analysis: + conversation_history_for_api.append({ + 'role': 'system', + 'content': build_tabular_computed_results_system_message( + f"the chat-uploaded file(s) {chat_tabular_filenames_str}", + chat_tabular_analysis, + ) + }) + + # Collect tool execution citations + chat_tabular_sk_citations = collect_tabular_sk_citations(user_id, conversation_id) + if chat_tabular_sk_citations: + agent_citations_list.extend(chat_tabular_sk_citations) + + debug_print(f"[Chat Tabular SK] Streaming: Analysis injected, {len(chat_tabular_analysis)} chars") + else: + yield emit_thought( + 'tabular_analysis', + "Tabular analysis could not compute results; using existing chat file context", + detail=f"files={chat_tabular_filenames_str}" + ) + debug_print("[Chat Tabular SK] Streaming: Analysis returned None, relying on existing file context") + except Exception as e: yield f"data: {json.dumps({'error': f'History error: {str(e)}'})}\n\n" return @@ -3472,18 +6230,46 @@ def generate(): # Stream the response accumulated_content = "" token_usage_data = None # Will be populated from final stream chunk - assistant_message_id = f"{conversation_id}_assistant_{int(time.time())}_{random.randint(1000,9999)}" + # assistant_message_id was generated earlier for thought tracking final_model_used = gpt_model # Default to gpt_model, will be overridden if agent is used # DEBUG: Check agent streaming decision debug_print(f"[DEBUG] use_agent_streaming={use_agent_streaming}, selected_agent={selected_agent is not None}") debug_print(f"[DEBUG] enable_semantic_kernel={enable_semantic_kernel}, user_enable_agents={user_enable_agents}") + debug_print( + "[Streaming] Selected response path | " + f"use_agent_streaming={use_agent_streaming} | " + f"selected_agent={getattr(selected_agent, 'name', 
None) if selected_agent else None} | " + f"model={gpt_model}" + ) try: if use_agent_streaming and selected_agent: # Stream from agent using invoke_stream + yield emit_thought('agent_tool_call', f"Sending to agent '{agent_display_name_used or agent_name_used}'") + yield emit_thought('generation', f"Sending to '{actual_model_used}'") debug_print(f"--- Streaming from Agent: {agent_name_used} ---") - + + # Register callback to persist plugin thoughts to Cosmos in real-time + callback_key = f"{user_id}:{conversation_id}" + plugin_logger_cb = get_plugin_logger() + debug_print( + f"[Streaming][Plugin Callback] Registering callback for key={callback_key}" + ) + + def on_plugin_invocation_streaming(inv): + duration_str = f" ({int(inv.duration_ms)}ms)" if inv.duration_ms else "" + tool_name = f"{inv.plugin_name}.{inv.function_name}" + debug_print( + f"[Streaming][Plugin Callback] Received invocation {tool_name}{duration_str} | success={inv.success}" + ) + thought_tracker.add_thought( + 'agent_tool_call', + f"Agent called {tool_name}{duration_str}" + ) + + plugin_logger_cb.register_callback(callback_key, on_plugin_invocation_streaming) + # Import required classes from semantic_kernel.contents.chat_message_content import ChatMessageContent @@ -3497,6 +6283,8 @@ def generate(): for msg in conversation_history_for_api ] + agent_stream_start_time = time.time() + # Stream agent responses - collect chunks first then yield async def stream_agent_async(): """Collect all streaming chunks from agent""" @@ -3524,7 +6312,6 @@ async def stream_agent_async(): return chunks, usage_data # Execute async streaming - import asyncio try: # Try to get existing event loop loop = asyncio.get_event_loop() @@ -3539,36 +6326,59 @@ async def stream_agent_async(): try: # Run streaming and collect chunks and usage chunks, stream_usage = loop.run_until_complete(stream_agent_async()) - - # Yield chunks to frontend - for chunk_content in chunks: - accumulated_content += chunk_content - yield f"data: 
{json.dumps({'content': chunk_content})}\n\n" - - # Try to capture token usage from stream metadata - if stream_usage: - # stream_usage is a CompletionUsage object, not a dict - prompt_tokens = getattr(stream_usage, 'prompt_tokens', 0) - completion_tokens = getattr(stream_usage, 'completion_tokens', 0) - total_tokens = getattr(stream_usage, 'total_tokens', None) - - # Calculate total if not provided - if total_tokens is None or total_tokens == 0: - total_tokens = prompt_tokens + completion_tokens - - token_usage_data = { - 'prompt_tokens': prompt_tokens, - 'completion_tokens': completion_tokens, - 'total_tokens': total_tokens, - 'captured_at': datetime.utcnow().isoformat() - } - debug_print(f"[Agent Streaming Tokens] From metadata - prompt: {prompt_tokens}, completion: {completion_tokens}, total: {total_tokens}") except Exception as stream_error: + plugin_logger_cb.deregister_callbacks(callback_key) + debug_print( + f"[Streaming][Plugin Callback] Deregistered callback after streaming error for key={callback_key}" + ) debug_print(f"❌ Agent streaming error: {stream_error}") import traceback traceback.print_exc() yield f"data: {json.dumps({'error': f'Agent streaming failed: {str(stream_error)}'})}\n\n" return + + # Emit responded thought with total duration from user message + agent_stream_total_duration_s = round(time.time() - request_start_time, 1) + yield emit_thought('generation', f"'{actual_model_used}' responded ({agent_stream_total_duration_s}s from initial message)") + + # Deregister callback (agent completed successfully) + plugin_logger_cb.deregister_callbacks(callback_key) + debug_print( + f"[Streaming][Plugin Callback] Deregistered callback after successful stream for key={callback_key}" + ) + + # Emit SSE-only events for streaming UI (Cosmos writes already done by callback) + agent_plugin_invocations = plugin_logger_cb.get_invocations_for_conversation(user_id, conversation_id) + for inv in agent_plugin_invocations: + duration_str = f" 
({int(inv.duration_ms)}ms)" if inv.duration_ms else "" + tool_name = f"{inv.plugin_name}.{inv.function_name}" + content = f"Agent called {tool_name}{duration_str}" + yield f"data: {json.dumps({'type': 'thought', 'step_index': thought_tracker.current_index, 'step_type': 'agent_tool_call', 'content': content})}\n\n" + thought_tracker.current_index += 1 + + # Yield chunks to frontend + for chunk_content in chunks: + accumulated_content += chunk_content + yield f"data: {json.dumps({'content': chunk_content})}\n\n" + + # Try to capture token usage from stream metadata + if stream_usage: + # stream_usage is a CompletionUsage object, not a dict + prompt_tokens = getattr(stream_usage, 'prompt_tokens', 0) + completion_tokens = getattr(stream_usage, 'completion_tokens', 0) + total_tokens = getattr(stream_usage, 'total_tokens', None) + + # Calculate total if not provided + if total_tokens is None or total_tokens == 0: + total_tokens = prompt_tokens + completion_tokens + + token_usage_data = { + 'prompt_tokens': prompt_tokens, + 'completion_tokens': completion_tokens, + 'total_tokens': total_tokens, + 'captured_at': datetime.utcnow().isoformat() + } + debug_print(f"[Agent Streaming Tokens] From metadata - prompt: {prompt_tokens}, completion: {completion_tokens}, total: {total_tokens}") # Collect token usage from kernel services if not captured from stream if not token_usage_data: @@ -3650,6 +6460,7 @@ def make_json_serializable(obj): else: # Stream from regular GPT model (non-agent) + yield emit_thought('generation', f"Sending to '{gpt_model}'") debug_print(f"--- Streaming from GPT ({gpt_model}) ---") # Prepare stream parameters @@ -3700,6 +6511,10 @@ def make_json_serializable(obj): 'captured_at': datetime.utcnow().isoformat() } debug_print(f"[Streaming Tokens] Captured usage - prompt: {chunk.usage.prompt_tokens}, completion: {chunk.usage.completion_tokens}, total: {chunk.usage.total_tokens}") + + # Emit responded thought for regular LLM streaming + 
gpt_stream_total_duration_s = round(time.time() - request_start_time, 1) + yield emit_thought('generation', f"'{gpt_model}' responded ({gpt_stream_total_duration_s}s from initial message)") # Stream complete - save message and send final metadata # Get user thread info to maintain thread consistency @@ -3801,6 +6616,29 @@ def make_json_serializable(obj): except Exception as e: debug_print(f"Error collecting conversation metadata: {e}") + if is_personal_chat_conversation(conversation_item): + conversation_item = mark_conversation_unread( + conversation_item, + assistant_message_id, + unread_timestamp=conversation_item['last_updated'] + ) + + notification_doc = create_chat_response_notification( + user_id=user_id, + conversation_id=conversation_id, + message_id=assistant_message_id, + conversation_title=conversation_item.get('title', ''), + response_preview=accumulated_content, + ) + if notification_doc: + debug_print( + f"Created chat completion notification {notification_doc['id']} for conversation {conversation_id}" + ) + else: + debug_print( + f"Skipping personal chat completion notification for conversation {conversation_id} because chat_type={conversation_item.get('chat_type')}" + ) + cosmos_conversations_container.upsert_item(conversation_item) # Send final message with metadata @@ -3818,8 +6656,16 @@ def make_json_serializable(obj): 'agent_citations': agent_citations_list, 'agent_display_name': agent_display_name_used if use_agent_streaming else None, 'agent_name': agent_name_used if use_agent_streaming else None, - 'full_content': accumulated_content + 'full_content': accumulated_content, + 'thoughts_enabled': thought_tracker.enabled } + debug_print( + "[Streaming] Finalizing stream response | " + f"conversation_id={conversation_id} | message_id={assistant_message_id} | " + f"content_length={len(accumulated_content)} | hybrid_citations={len(hybrid_citations_list)} | " + f"web_citations={len(web_search_citations_list)} | 
agent_citations={len(agent_citations_list)} | " + f"thoughts_enabled={thought_tracker.enabled}" + ) yield f"data: {json.dumps(final_data)}\n\n" except Exception as e: @@ -3871,15 +6717,7 @@ def make_json_serializable(obj): debug_print(f"[STREAM API ERROR] Full traceback:\n{error_traceback}") yield f"data: {json.dumps({'error': f'Internal server error: {str(e)}'})}\n\n" - return Response( - stream_with_context(generate()), - mimetype='text/event-stream', - headers={ - 'Cache-Control': 'no-cache', - 'X-Accel-Buffering': 'no', - 'Connection': 'keep-alive' - } - ) + return build_background_stream_response(generate) @app.route('/api/message//mask', methods=['POST']) @swagger_route(security=get_auth_security()) diff --git a/application/single_app/route_backend_control_center.py b/application/single_app/route_backend_control_center.py index 2c3952f1..a28f756b 100644 --- a/application/single_app/route_backend_control_center.py +++ b/application/single_app/route_backend_control_center.py @@ -3572,8 +3572,15 @@ def api_bulk_public_workspace_action(): deleted_count = 0 for doc in docs_to_delete: try: - delete_document_chunks(doc['id']) - delete_document(doc['id']) + delete_document_chunks( + document_id=doc['id'], + public_workspace_id=workspace_id, + ) + delete_document( + user_id=None, + document_id=doc['id'], + public_workspace_id=workspace_id, + ) deleted_count += 1 except Exception as del_e: debug_print(f"Error deleting document {doc['id']}: {del_e}") diff --git a/application/single_app/route_backend_conversation_export.py b/application/single_app/route_backend_conversation_export.py index aad750e4..689d3476 100644 --- a/application/single_app/route_backend_conversation_export.py +++ b/application/single_app/route_backend_conversation_export.py @@ -2,15 +2,31 @@ import io import json +import markdown2 +import re +import tempfile import zipfile +from collections import Counter, defaultdict from datetime import datetime +from html import escape as _escape_html +from typing 
import Any, Dict, List, Optional from config import * +from flask import jsonify, make_response, request +from functions_appinsights import log_event from functions_authentication import * -from functions_settings import * -from flask import Response, jsonify, request, make_response +from functions_chat import sort_messages_by_thread +from functions_conversation_metadata import update_conversation_with_metadata from functions_debug import debug_print +from functions_settings import * +from functions_thoughts import get_thoughts_for_conversation from swagger_wrapper import swagger_route, get_auth_security +from docx import Document as DocxDocument +from docx.shared import Pt + + +TRANSCRIPT_ROLES = {'user', 'assistant'} +SUMMARY_SOURCE_CHAR_LIMIT = 60000 def register_route_backend_conversation_export(app): @@ -29,32 +45,36 @@ def api_export_conversations(): conversation_ids (list): List of conversation IDs to export. format (str): Export format — "json" or "markdown". packaging (str): Output packaging — "single" or "zip". + include_summary_intro (bool): Whether to generate a per-conversation intro. + summary_model_deployment (str): Optional model deployment for summary generation. 
""" user_id = get_current_user_id() if not user_id: return jsonify({'error': 'User not authenticated'}), 401 - data = request.get_json() + data = request.get_json(silent=True) if not data: return jsonify({'error': 'Request body is required'}), 400 conversation_ids = data.get('conversation_ids', []) - export_format = data.get('format', 'json').lower() - packaging = data.get('packaging', 'single').lower() + export_format = str(data.get('format', 'json')).lower() + packaging = str(data.get('packaging', 'single')).lower() + include_summary_intro = bool(data.get('include_summary_intro', False)) + summary_model_deployment = str(data.get('summary_model_deployment', '') or '').strip() if not conversation_ids or not isinstance(conversation_ids, list): return jsonify({'error': 'At least one conversation_id is required'}), 400 - if export_format not in ('json', 'markdown'): - return jsonify({'error': 'Format must be "json" or "markdown"'}), 400 + if export_format not in ('json', 'markdown', 'pdf'): + return jsonify({'error': 'Format must be "json", "markdown", or "pdf"'}), 400 if packaging not in ('single', 'zip'): return jsonify({'error': 'Packaging must be "single" or "zip"'}), 400 try: + settings = get_settings() exported = [] for conv_id in conversation_ids: - # Verify ownership and fetch conversation try: conversation = cosmos_conversations_container.read_item( item=conv_id, @@ -64,225 +84,1798 @@ def api_export_conversations(): debug_print(f"Export: conversation {conv_id} not found or access denied") continue - # Verify user owns this conversation if conversation.get('user_id') != user_id: debug_print(f"Export: user {user_id} does not own conversation {conv_id}") continue - # Fetch messages ordered by timestamp - message_query = f""" + message_query = """ SELECT * FROM c - WHERE c.conversation_id = '{conv_id}' + WHERE c.conversation_id = @conversation_id ORDER BY c.timestamp ASC """ messages = list(cosmos_messages_container.query_items( query=message_query, + 
parameters=[{'name': '@conversation_id', 'value': conv_id}], partition_key=conv_id )) - # Filter for active thread messages only - filtered_messages = [] - for msg in messages: - thread_info = msg.get('metadata', {}).get('thread_info', {}) - active = thread_info.get('active_thread') - if active is True or active is None or 'active_thread' not in thread_info: - filtered_messages.append(msg) - - exported.append({ - 'conversation': _sanitize_conversation(conversation), - 'messages': [_sanitize_message(m) for m in filtered_messages] - }) + exported.append( + _build_export_entry( + conversation=conversation, + raw_messages=messages, + user_id=user_id, + settings=settings, + include_summary_intro=include_summary_intro, + summary_model_deployment=summary_model_deployment + ) + ) if not exported: return jsonify({'error': 'No accessible conversations found'}), 404 - # Generate export content timestamp_str = datetime.utcnow().strftime('%Y%m%d_%H%M%S') if packaging == 'zip': return _build_zip_response(exported, export_format, timestamp_str) - else: - return _build_single_file_response(exported, export_format, timestamp_str) - - except Exception as e: - debug_print(f"Export error: {str(e)}") - return jsonify({'error': f'Export failed: {str(e)}'}), 500 - - def _sanitize_conversation(conv): - """Return only user-facing conversation fields.""" - return { - 'id': conv.get('id'), - 'title': conv.get('title', 'Untitled'), - 'last_updated': conv.get('last_updated', ''), - 'chat_type': conv.get('chat_type', 'personal'), - 'tags': conv.get('tags', []), - 'is_pinned': conv.get('is_pinned', False), - 'context': conv.get('context', []) - } - - def _sanitize_message(msg): - """Return only user-facing message fields.""" - result = { - 'role': msg.get('role', ''), - 'content': msg.get('content', ''), - 'timestamp': msg.get('timestamp', ''), - } - # Include citations if present - if msg.get('citations'): - result['citations'] = msg['citations'] - # Include context/tool info if present - if 
msg.get('context'): - result['context'] = msg['context'] - return result - - def _build_single_file_response(exported, export_format, timestamp_str): - """Build a single-file download response.""" - if export_format == 'json': - content = json.dumps(exported, indent=2, ensure_ascii=False, default=str) - filename = f"conversations_export_{timestamp_str}.json" - content_type = 'application/json; charset=utf-8' + + return _build_single_file_response(exported, export_format, timestamp_str) + + except Exception as exc: + debug_print(f"Export error: {str(exc)}") + log_event(f"Conversation export failed: {exc}", level="WARNING") + return jsonify({'error': f'Export failed: {str(exc)}'}), 500 + + @app.route('/api/message/export-word', methods=['POST']) + @swagger_route(security=get_auth_security()) + @login_required + @user_required + def api_export_message_word(): + """ + Export a single message as a Word (.docx) document. + + Request body: + message_id (str): ID of the message to export. + conversation_id (str): ID of the conversation the message belongs to. 
+ """ + user_id = get_current_user_id() + if not user_id: + return jsonify({'error': 'User not authenticated'}), 401 + + data = request.get_json(silent=True) + if not data: + return jsonify({'error': 'Request body is required'}), 400 + + message_id = str(data.get('message_id', '') or '').strip() + conversation_id = str(data.get('conversation_id', '') or '').strip() + + if not message_id or not conversation_id: + return jsonify({'error': 'message_id and conversation_id are required'}), 400 + + try: + try: + conversation = cosmos_conversations_container.read_item( + item=conversation_id, + partition_key=conversation_id + ) + except Exception: + return jsonify({'error': 'Conversation not found'}), 404 + + if conversation.get('user_id') != user_id: + return jsonify({'error': 'Access denied'}), 403 + + try: + message = cosmos_messages_container.read_item( + item=message_id, + partition_key=conversation_id + ) + except Exception: + message_query = """ + SELECT * FROM c + WHERE c.id = @message_id AND c.conversation_id = @conversation_id + """ + message_results = list(cosmos_messages_container.query_items( + query=message_query, + parameters=[ + {'name': '@message_id', 'value': message_id}, + {'name': '@conversation_id', 'value': conversation_id} + ], + enable_cross_partition_query=True + )) + if not message_results: + return jsonify({'error': 'Message not found'}), 404 + message = message_results[0] + + if message.get('conversation_id') != conversation_id: + return jsonify({'error': 'Message not found'}), 404 + + document_bytes = _message_to_docx_bytes(message) + timestamp_str = datetime.utcnow().strftime('%Y%m%d_%H%M%S') + filename = f"message_export_{timestamp_str}.docx" + + response = make_response(document_bytes) + response.headers['Content-Type'] = ( + 'application/vnd.openxmlformats-officedocument.wordprocessingml.document' + ) + response.headers['Content-Disposition'] = f'attachment; filename="{filename}"' + return response + + except Exception as exc: + 
debug_print(f"Message export error: {str(exc)}") + log_event(f"Message export failed: {exc}", level="WARNING") + return jsonify({'error': 'Export failed due to a server error. Please try again later.'}), 500 + + +def _build_export_entry( + conversation: Dict[str, Any], + raw_messages: List[Dict[str, Any]], + user_id: str, + settings: Dict[str, Any], + include_summary_intro: bool = False, + summary_model_deployment: str = '' +) -> Dict[str, Any]: + filtered_messages = _filter_messages_for_export(raw_messages) + ordered_messages = sort_messages_by_thread(filtered_messages) + + raw_thoughts = get_thoughts_for_conversation(conversation.get('id'), user_id) + thoughts_by_message = defaultdict(list) + for thought in raw_thoughts: + thoughts_by_message[thought.get('message_id')].append(_sanitize_thought(thought)) + + exported_messages = [] + role_counts = Counter() + total_citation_counts = Counter({'document': 0, 'web': 0, 'agent_tool': 0, 'legacy': 0, 'total': 0}) + transcript_index = 0 + total_thoughts = 0 + + for sequence_index, message in enumerate(ordered_messages, start=1): + role = message.get('role', 'unknown') + role_counts[role] += 1 + + message_transcript_index = None + if role in TRANSCRIPT_ROLES: + transcript_index += 1 + message_transcript_index = transcript_index + + thoughts = thoughts_by_message.get(message.get('id'), []) + exported_message = _sanitize_message( + message, + sequence_index=sequence_index, + transcript_index=message_transcript_index, + thoughts=thoughts + ) + exported_messages.append(exported_message) + + counts = exported_message.get('citation_counts', {}) + for key in total_citation_counts: + total_citation_counts[key] += counts.get(key, 0) + total_thoughts += len(thoughts) + + # Compute message time range for summary caching + message_time_start = None + message_time_end = None + if ordered_messages: + message_time_start = ordered_messages[0].get('timestamp') + message_time_end = ordered_messages[-1].get('timestamp') + + 
sanitized_conversation = _sanitize_conversation( + conversation, + messages=exported_messages, + role_counts=role_counts, + citation_counts=total_citation_counts, + thought_count=total_thoughts + ) + summary_intro = _build_summary_intro( + messages=exported_messages, + conversation=conversation, + sanitized_conversation=sanitized_conversation, + settings=settings, + enabled=include_summary_intro, + summary_model_deployment=summary_model_deployment, + message_time_start=message_time_start, + message_time_end=message_time_end + ) + + return { + 'conversation': sanitized_conversation, + 'summary_intro': summary_intro, + 'messages': exported_messages + } + + +def _filter_messages_for_export(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]: + filtered_messages = [] + for message in messages: + metadata = message.get('metadata', {}) or {} + if metadata.get('is_deleted') is True: + continue + + thread_info = metadata.get('thread_info', {}) or {} + active = thread_info.get('active_thread') + if active is True or active is None or 'active_thread' not in thread_info: + filtered_messages.append(message) + + return filtered_messages + + +def _sanitize_conversation( + conversation: Dict[str, Any], + messages: List[Dict[str, Any]], + role_counts: Counter, + citation_counts: Counter, + thought_count: int +) -> Dict[str, Any]: + transcript_count = sum(1 for message in messages if message.get('is_transcript_message')) + return { + 'id': conversation.get('id'), + 'title': conversation.get('title', 'Untitled'), + 'last_updated': conversation.get('last_updated', ''), + 'chat_type': conversation.get('chat_type', 'personal'), + 'tags': conversation.get('tags', []), + 'context': conversation.get('context', []), + 'classification': conversation.get('classification', []), + 'strict': conversation.get('strict', False), + 'is_pinned': conversation.get('is_pinned', False), + 'scope_locked': conversation.get('scope_locked'), + 'locked_contexts': conversation.get('locked_contexts', []), 
+ 'message_count': len(messages), + 'transcript_message_count': transcript_count, + 'message_counts_by_role': dict(role_counts), + 'citation_counts': dict(citation_counts), + 'thought_count': thought_count + } + + +def _sanitize_message( + message: Dict[str, Any], + sequence_index: int, + transcript_index: Optional[int], + thoughts: List[Dict[str, Any]] +) -> Dict[str, Any]: + role = message.get('role', '') + content = message.get('content', '') + raw_citation_buckets = _collect_raw_citation_buckets(message) + normalized_citations = _normalize_citations(raw_citation_buckets) + citation_counts = _build_citation_counts(normalized_citations) + details = _curate_message_details(message, citation_counts, len(thoughts)) + + return { + 'id': message.get('id'), + 'role': role, + 'speaker_label': _role_to_label(role), + 'sequence_index': sequence_index, + 'transcript_index': transcript_index, + 'label': f"Turn {transcript_index}" if transcript_index else f"Message {sequence_index}", + 'is_transcript_message': role in TRANSCRIPT_ROLES, + 'timestamp': message.get('timestamp', ''), + 'content': content, + 'content_text': _normalize_content(content), + 'details': details, + 'citations': normalized_citations, + 'citation_counts': citation_counts, + 'thoughts': thoughts, + 'legacy_citations': raw_citation_buckets['legacy'], + 'hybrid_citations': raw_citation_buckets['hybrid'], + 'web_search_citations': raw_citation_buckets['web'], + 'agent_citations': raw_citation_buckets['agent'] + } + + +def _sanitize_thought(thought: Dict[str, Any]) -> Dict[str, Any]: + return { + 'step_index': thought.get('step_index'), + 'step_type': thought.get('step_type'), + 'content': thought.get('content'), + 'detail': thought.get('detail'), + 'duration_ms': thought.get('duration_ms'), + 'timestamp': thought.get('timestamp') + } + + +def _collect_raw_citation_buckets(message: Dict[str, Any]) -> Dict[str, List[Any]]: + def ensure_list(value: Any) -> List[Any]: + if not value: + return [] + return value 
if isinstance(value, list) else [value] + + return { + 'legacy': ensure_list(message.get('citations')), + 'hybrid': ensure_list(message.get('hybrid_citations')), + 'web': ensure_list(message.get('web_search_citations')), + 'agent': ensure_list(message.get('agent_citations')) + } + + +def _normalize_citations(raw_citation_buckets: Dict[str, List[Any]]) -> List[Dict[str, Any]]: + normalized = [] + + for citation in raw_citation_buckets.get('hybrid', []): + if isinstance(citation, dict): + normalized.append({ + 'citation_type': 'document', + 'label': _build_document_citation_label(citation), + 'file_name': citation.get('file_name'), + 'title': citation.get('title') or citation.get('file_name'), + 'page_number': citation.get('page_number'), + 'citation_id': citation.get('citation_id'), + 'chunk_id': citation.get('chunk_id'), + 'metadata_type': citation.get('metadata_type'), + 'metadata_content': citation.get('metadata_content'), + 'score': citation.get('score'), + 'classification': citation.get('classification'), + 'url': citation.get('url') + }) else: - parts = [] - for entry in exported: - parts.append(_conversation_to_markdown(entry)) - content = '\n\n---\n\n'.join(parts) - filename = f"conversations_export_{timestamp_str}.md" - content_type = 'text/markdown; charset=utf-8' - - response = make_response(content) - response.headers['Content-Type'] = content_type - response.headers['Content-Disposition'] = f'attachment; filename="{filename}"' - return response - - def _build_zip_response(exported, export_format, timestamp_str): - """Build a ZIP archive containing one file per conversation.""" - buffer = io.BytesIO() - with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as zf: - for entry in exported: - conv = entry['conversation'] - safe_title = _safe_filename(conv.get('title', 'Untitled')) - conv_id_short = conv.get('id', 'unknown')[:8] - - if export_format == 'json': - file_content = json.dumps(entry, indent=2, ensure_ascii=False, default=str) - ext = 'json' - 
else: - file_content = _conversation_to_markdown(entry) - ext = 'md' + normalized.append({ + 'citation_type': 'document', + 'label': str(citation), + 'value': str(citation) + }) + + for citation in raw_citation_buckets.get('web', []): + if isinstance(citation, dict): + title = citation.get('title') or citation.get('url') or 'Web source' + normalized.append({ + 'citation_type': 'web', + 'label': title, + 'title': title, + 'url': citation.get('url') + }) + else: + normalized.append({ + 'citation_type': 'web', + 'label': str(citation), + 'value': str(citation) + }) + + for citation in raw_citation_buckets.get('agent', []): + if isinstance(citation, dict): + tool_name = citation.get('tool_name') or citation.get('function_name') or 'Tool invocation' + normalized.append({ + 'citation_type': 'agent_tool', + 'label': tool_name, + 'tool_name': citation.get('tool_name'), + 'function_name': citation.get('function_name'), + 'plugin_name': citation.get('plugin_name'), + 'success': citation.get('success'), + 'timestamp': citation.get('timestamp') + }) + else: + normalized.append({ + 'citation_type': 'agent_tool', + 'label': str(citation), + 'value': str(citation) + }) + + for citation in raw_citation_buckets.get('legacy', []): + if isinstance(citation, dict): + title = citation.get('title') or citation.get('filepath') or citation.get('url') or 'Legacy citation' + normalized.append({ + 'citation_type': 'legacy', + 'label': title, + 'title': title, + 'url': citation.get('url'), + 'filepath': citation.get('filepath') + }) + else: + normalized.append({ + 'citation_type': 'legacy', + 'label': str(citation), + 'value': str(citation) + }) + + return normalized + + +def _build_document_citation_label(citation: Dict[str, Any]) -> str: + file_name = citation.get('file_name') or citation.get('title') or 'Document source' + metadata_type = citation.get('metadata_type') + page_number = citation.get('page_number') + + if metadata_type: + return f"{file_name} — {metadata_type.replace('_', ' 
').title()}" + if page_number not in (None, ''): + return f"{file_name} — Page {page_number}" + return file_name + + +def _build_citation_counts(citations: List[Dict[str, Any]]) -> Dict[str, int]: + counts = { + 'document': 0, + 'web': 0, + 'agent_tool': 0, + 'legacy': 0, + 'total': len(citations) + } + for citation in citations: + citation_type = citation.get('citation_type') + if citation_type in counts: + counts[citation_type] += 1 + return counts + + +def _curate_message_details( + message: Dict[str, Any], + citation_counts: Dict[str, int], + thought_count: int +) -> Dict[str, Any]: + role = message.get('role', '') + metadata = message.get('metadata', {}) or {} + details: Dict[str, Any] = {} + + if role == 'user': + details['interaction_mode'] = _remove_empty_values({ + 'button_states': metadata.get('button_states'), + 'workspace_search': _curate_workspace_search(metadata.get('workspace_search')), + 'prompt_selection': _curate_prompt_selection(metadata.get('prompt_selection')), + 'agent_selection': _curate_agent_selection(metadata.get('agent_selection')), + 'model_selection': _curate_model_selection(metadata.get('model_selection')) + }) + elif role == 'assistant': + details['generation'] = _remove_empty_values({ + 'augmented': message.get('augmented'), + 'model_deployment': message.get('model_deployment_name'), + 'agent_name': message.get('agent_name'), + 'agent_display_name': message.get('agent_display_name'), + 'reasoning_effort': metadata.get('reasoning_effort'), + 'hybrid_search_query': message.get('hybridsearch_query'), + 'token_usage': _curate_token_usage(metadata.get('token_usage')), + 'citation_counts': citation_counts, + 'thought_count': thought_count + }) + else: + details['message_context'] = _remove_empty_values({ + 'filename': message.get('filename'), + 'prompt': message.get('prompt'), + 'is_table': message.get('is_table'), + 'model_deployment': message.get('model_deployment_name') + }) + + return _remove_empty_values(details) + + +def 
_curate_workspace_search(workspace_search: Optional[Dict[str, Any]]) -> Dict[str, Any]: + if not isinstance(workspace_search, dict): + return {} + return _remove_empty_values({ + 'search_enabled': workspace_search.get('search_enabled'), + 'document_scope': workspace_search.get('document_scope'), + 'document_name': workspace_search.get('document_name'), + 'document_filename': workspace_search.get('document_filename'), + 'group_name': workspace_search.get('group_name'), + 'classification': workspace_search.get('classification'), + 'public_workspace_id': workspace_search.get('active_public_workspace_id') + }) + + +def _curate_prompt_selection(prompt_selection: Optional[Dict[str, Any]]) -> Dict[str, Any]: + if not isinstance(prompt_selection, dict): + return {} + return _remove_empty_values({ + 'prompt_name': prompt_selection.get('prompt_name'), + 'selected_prompt_index': prompt_selection.get('selected_prompt_index'), + 'selected_prompt_text': prompt_selection.get('selected_prompt_text') + }) + + +def _curate_agent_selection(agent_selection: Optional[Dict[str, Any]]) -> Dict[str, Any]: + if not isinstance(agent_selection, dict): + return {} + return _remove_empty_values({ + 'selected_agent': agent_selection.get('selected_agent'), + 'agent_display_name': agent_selection.get('agent_display_name'), + 'is_global': agent_selection.get('is_global'), + 'is_group': agent_selection.get('is_group'), + 'group_name': agent_selection.get('group_name') + }) + + +def _curate_model_selection(model_selection: Optional[Dict[str, Any]]) -> Dict[str, Any]: + if not isinstance(model_selection, dict): + return {} + return _remove_empty_values({ + 'selected_model': model_selection.get('selected_model'), + 'frontend_requested_model': model_selection.get('frontend_requested_model'), + 'reasoning_effort': model_selection.get('reasoning_effort'), + 'streaming': model_selection.get('streaming') + }) + + +def _curate_token_usage(token_usage: Any) -> Dict[str, Any]: + if not isinstance(token_usage, 
dict): + return {} + return _remove_empty_values({ + 'prompt_tokens': token_usage.get('prompt_tokens'), + 'completion_tokens': token_usage.get('completion_tokens'), + 'total_tokens': token_usage.get('total_tokens') + }) + + +def _remove_empty_values(value: Any) -> Any: + if isinstance(value, dict): + cleaned = {} + for key, item in value.items(): + cleaned_item = _remove_empty_values(item) + if cleaned_item in (None, '', [], {}): + continue + cleaned[key] = cleaned_item + return cleaned + + if isinstance(value, list): + cleaned_list = [] + for item in value: + cleaned_item = _remove_empty_values(item) + if cleaned_item in (None, '', [], {}): + continue + cleaned_list.append(cleaned_item) + return cleaned_list + + return value + + +def generate_conversation_summary( + messages: List[Dict[str, Any]], + conversation_title: str, + settings: Dict[str, Any], + model_deployment: str, + message_time_start: str = None, + message_time_end: str = None, + conversation_id: str = None +) -> Dict[str, Any]: + """Generate a conversation summary using the LLM and optionally persist it. + + This is the shared helper used by both the export pipeline and the + on-demand summary API endpoint. Returns a summary dict suitable for + storage in conversation metadata. + + Raises ValueError when there is no content to summarise and + RuntimeError on model errors. 
+ """ + transcript_lines = [] + for message in messages: + content_text = message.get('content_text', '') + if not content_text: + continue + role = message.get('role', 'unknown') + speaker = message.get('speaker_label', role).upper() + transcript_lines.append(f"{speaker}: {content_text}") + + transcript_text = '\n\n'.join(transcript_lines).strip() + if not transcript_text: + raise ValueError('No message content was available to summarize.') + + transcript_text = _truncate_for_summary(transcript_text) + + gpt_client, gpt_model = _initialize_gpt_client(settings, model_deployment) + summary_prompt = ( + "You are summarizing a conversation for an export document. " + "Read the full conversation below and write a concise summary. " + "Use your judgement on length: for short conversations write one brief paragraph, " + "for longer or more detailed conversations write two paragraphs. " + "If you need to refer to the user, use their name, but do not refer to the user too often. " + "Cover the goals, the key topics discussed, any data or tools referenced, " + "and the main outcomes or answers provided. " + "Be factual and neutral. Return plain text only — no headings, no bullet points, no markdown formatting." 
+ ) + + model_lower = gpt_model.lower() + is_reasoning_model = ( + 'o1' in model_lower or 'o3' in model_lower or 'gpt-5' in model_lower + ) + instruction_role = 'developer' if is_reasoning_model else 'system' + + debug_print(f"Summary generation: sending {len(transcript_lines)} messages " + f"({len(transcript_text)} chars) to {gpt_model} (role={instruction_role})") + + summary_response = gpt_client.chat.completions.create( + model=gpt_model, + messages=[ + { + 'role': instruction_role, + 'content': summary_prompt + }, + { + 'role': 'user', + 'content': ( + f"Conversation Title: {conversation_title}\n\n" + f"{transcript_text}" + ) + } + ] + ) + + debug_print(f"Summary generation: response choices=" + f"{len(summary_response.choices) if summary_response.choices else 0}, " + f"finish_reason={summary_response.choices[0].finish_reason if summary_response.choices else 'N/A'}") + + summary_text = (summary_response.choices[0].message.content or '').strip() if summary_response.choices else '' + if not summary_text: + debug_print('Summary generation: model returned an empty response') + log_event('Conversation summary generation returned empty response', level='WARNING') + raise RuntimeError('Summary model returned an empty response.') + + summary_data = { + 'content': summary_text, + 'model_deployment': gpt_model, + 'generated_at': datetime.utcnow().isoformat(), + 'message_time_start': message_time_start, + 'message_time_end': message_time_end + } + + # Persist to Cosmos when a conversation_id is available + if conversation_id: + try: + update_conversation_with_metadata(conversation_id, {'summary': summary_data}) + debug_print(f"Summary persisted to conversation {conversation_id}") + except Exception as persist_exc: + debug_print(f"Failed to persist summary to Cosmos: {persist_exc}") + log_event(f"Failed to persist conversation summary: {persist_exc}", level="WARNING") + + return summary_data + + +def _build_summary_intro( + messages: List[Dict[str, Any]], + conversation: 
Dict[str, Any], + sanitized_conversation: Dict[str, Any], + settings: Dict[str, Any], + enabled: bool, + summary_model_deployment: str, + message_time_start: str = None, + message_time_end: str = None +) -> Dict[str, Any]: + """Build the summary_intro block for the export payload. + + Uses cached summary from conversation metadata when present and + still current (no newer messages). Otherwise generates a fresh + summary via ``generate_conversation_summary`` and persists it. + """ + summary_intro = { + 'enabled': enabled, + 'generated': False, + 'model_deployment': summary_model_deployment or None, + 'generated_at': None, + 'content': '', + 'error': None + } + + if not enabled: + return summary_intro + + # Check for a cached summary stored in the conversation document + existing_summary = conversation.get('summary') + if existing_summary and isinstance(existing_summary, dict): + cached_end = existing_summary.get('message_time_end') + if cached_end and message_time_end and cached_end >= message_time_end: + debug_print('Export summary: using cached summary from conversation metadata') + summary_intro.update({ + 'generated': True, + 'model_deployment': existing_summary.get('model_deployment'), + 'generated_at': existing_summary.get('generated_at'), + 'content': existing_summary.get('content', ''), + 'error': None + }) + return summary_intro + debug_print('Export summary: cached summary is stale, regenerating') + + try: + conversation_id = conversation.get('id') + conversation_title = sanitized_conversation.get('title', 'Untitled') + + summary_data = generate_conversation_summary( + messages=messages, + conversation_title=conversation_title, + settings=settings, + model_deployment=summary_model_deployment, + message_time_start=message_time_start, + message_time_end=message_time_end, + conversation_id=conversation_id + ) + + summary_intro.update({ + 'generated': True, + 'model_deployment': summary_data.get('model_deployment'), + 'generated_at': 
summary_data.get('generated_at'), + 'content': summary_data.get('content', ''), + 'error': None + }) + return summary_intro + + except (ValueError, RuntimeError) as known_exc: + debug_print(f"Export summary generation issue: {known_exc}") + summary_intro['error'] = str(known_exc) + if hasattr(known_exc, 'model_deployment'): + summary_intro['model_deployment'] = known_exc.model_deployment + return summary_intro + + except Exception as exc: + debug_print(f"Export summary generation failed: {exc}") + log_event(f"Conversation export summary generation failed: {exc}", level="WARNING") + summary_intro['error'] = str(exc) + return summary_intro + + +def _truncate_for_summary(transcript_text: str) -> str: + if len(transcript_text) <= SUMMARY_SOURCE_CHAR_LIMIT: + return transcript_text + + head_chars = SUMMARY_SOURCE_CHAR_LIMIT // 2 + tail_chars = SUMMARY_SOURCE_CHAR_LIMIT - head_chars + return ( + transcript_text[:head_chars] + + "\n\n[... transcript truncated for export summary generation ...]\n\n" + + transcript_text[-tail_chars:] + ) + + +def _initialize_gpt_client(settings: Dict[str, Any], requested_model: str = ''): + enable_gpt_apim = settings.get('enable_gpt_apim', False) + + if enable_gpt_apim: + raw_models = settings.get('azure_apim_gpt_deployment', '') or '' + apim_models = [model.strip() for model in raw_models.split(',') if model.strip()] + if not apim_models: + raise ValueError('APIM GPT deployment name is not configured.') + + if requested_model and requested_model not in apim_models: + raise ValueError(f"Requested summary model '{requested_model}' is not configured for APIM.") + + gpt_model = requested_model or apim_models[0] + gpt_client = AzureOpenAI( + api_version=settings.get('azure_apim_gpt_api_version'), + azure_endpoint=settings.get('azure_apim_gpt_endpoint'), + api_key=settings.get('azure_apim_gpt_subscription_key') + ) + return gpt_client, gpt_model + + auth_type = settings.get('azure_openai_gpt_authentication_type') + endpoint = 
settings.get('azure_openai_gpt_endpoint') + api_version = settings.get('azure_openai_gpt_api_version') + gpt_model_obj = settings.get('gpt_model', {}) or {} + + if requested_model: + gpt_model = requested_model + elif gpt_model_obj.get('selected'): + gpt_model = gpt_model_obj['selected'][0]['deploymentName'] + else: + raise ValueError('No GPT model selected or configured for export summary generation.') + + if auth_type == 'managed_identity': + token_provider = get_bearer_token_provider(DefaultAzureCredential(), cognitive_services_scope) + gpt_client = AzureOpenAI( + api_version=api_version, + azure_endpoint=endpoint, + azure_ad_token_provider=token_provider + ) + else: + api_key = settings.get('azure_openai_gpt_key') + if not api_key: + raise ValueError('Azure OpenAI API Key not configured.') + gpt_client = AzureOpenAI( + api_version=api_version, + azure_endpoint=endpoint, + api_key=api_key + ) + + return gpt_client, gpt_model + + +def _build_single_file_response(exported: List[Dict[str, Any]], export_format: str, timestamp_str: str): + """Build a single-file download response.""" + if export_format == 'json': + content = json.dumps(exported, indent=2, ensure_ascii=False, default=str) + filename = f"conversations_export_{timestamp_str}.json" + content_type = 'application/json; charset=utf-8' + elif export_format == 'pdf': + if len(exported) == 1: + content = _conversation_to_pdf_bytes(exported[0]) + else: + combined_parts = [] + for idx, entry in enumerate(exported): + if idx > 0: + combined_parts.append( + '
' + ) + combined_parts.append(_build_pdf_html_body(entry)) + content = _html_body_to_pdf_bytes('\n'.join(combined_parts)) + filename = f"conversations_export_{timestamp_str}.pdf" + content_type = 'application/pdf' + else: + parts = [] + for entry in exported: + parts.append(_conversation_to_markdown(entry)) + content = '\n\n---\n\n'.join(parts) + filename = f"conversations_export_{timestamp_str}.md" + content_type = 'text/markdown; charset=utf-8' + + response = make_response(content) + response.headers['Content-Type'] = content_type + response.headers['Content-Disposition'] = f'attachment; filename="{filename}"' + return response + + +def _build_zip_response(exported: List[Dict[str, Any]], export_format: str, timestamp_str: str): + """Build a ZIP archive containing one file per conversation.""" + buffer = io.BytesIO() + with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as zf: + for entry in exported: + conversation = entry['conversation'] + safe_title = _safe_filename(conversation.get('title', 'Untitled')) + conversation_id_short = conversation.get('id', 'unknown')[:8] + + if export_format == 'json': + file_content = json.dumps(entry, indent=2, ensure_ascii=False, default=str) + ext = 'json' + elif export_format == 'pdf': + file_content = _conversation_to_pdf_bytes(entry) + ext = 'pdf' + else: + file_content = _conversation_to_markdown(entry) + ext = 'md' + + file_name = f"{safe_title}_{conversation_id_short}.{ext}" + zf.writestr(file_name, file_content) + + buffer.seek(0) + filename = f"conversations_export_{timestamp_str}.zip" - file_name = f"{safe_title}_{conv_id_short}.{ext}" - zf.writestr(file_name, file_content) + response = make_response(buffer.read()) + response.headers['Content-Type'] = 'application/zip' + response.headers['Content-Disposition'] = f'attachment; filename="{filename}"' + return response - buffer.seek(0) - filename = f"conversations_export_{timestamp_str}.zip" - response = make_response(buffer.read()) - response.headers['Content-Type'] 
= 'application/zip' - response.headers['Content-Disposition'] = f'attachment; filename="{filename}"' - return response +def _conversation_to_markdown(entry: Dict[str, Any]) -> str: + """Convert a conversation + messages entry to Markdown format.""" + conversation = entry['conversation'] + messages = entry['messages'] + summary_intro = entry.get('summary_intro', {}) or {} - def _conversation_to_markdown(entry): - """Convert a conversation + messages entry to Markdown format.""" - conv = entry['conversation'] - messages = entry['messages'] + transcript_messages = [message for message in messages if message.get('is_transcript_message')] + detail_messages = [message for message in messages if message.get('details')] + reference_messages = [message for message in messages if message.get('citations')] + thought_messages = [message for message in messages if message.get('thoughts')] + supplemental_messages = [message for message in messages if not message.get('is_transcript_message')] - lines = [] - title = conv.get('title', 'Untitled') - lines.append(f"# {title}") + lines: List[str] = [] + lines.append(f"# {conversation.get('title', 'Untitled')}") + lines.append('') + lines.append(f"**Last Updated:** {conversation.get('last_updated', '')} ") + lines.append(f"**Chat Type:** {conversation.get('chat_type', 'personal')} ") + lines.append(f"**Messages:** {conversation.get('message_count', len(messages))} ") + if conversation.get('tags'): + lines.append(f"**Tags:** {', '.join(_format_tag(tag) for tag in conversation.get('tags', []))} ") + if conversation.get('classification'): + lines.append(f"**Classification:** {', '.join(_format_tag(item) for item in conversation.get('classification', []))} ") + lines.append('') + + if summary_intro.get('enabled') and summary_intro.get('generated') and summary_intro.get('content'): + lines.append('## Abstract') + lines.append('') + lines.append(summary_intro.get('content', '')) + lines.append('') + lines.append(f"_Generated with 
{summary_intro.get('model_deployment') or 'configured model'} on {summary_intro.get('generated_at')}_") + lines.append('') + elif summary_intro.get('enabled') and summary_intro.get('error'): + lines.append('> _A summary intro was requested, but it could not be generated for this export._') + lines.append(f"> _Error: {summary_intro.get('error')}_") lines.append('') - # Metadata - last_updated = conv.get('last_updated', '') - chat_type = conv.get('chat_type', 'personal') - tags = conv.get('tags', []) - - lines.append(f"**Last Updated:** {last_updated} ") - lines.append(f"**Chat Type:** {chat_type} ") - if tags: - tag_strs = [str(t) for t in tags] - lines.append(f"**Tags:** {', '.join(tag_strs)} ") - lines.append(f"**Messages:** {len(messages)} ") + lines.append('## Transcript') + lines.append('') + if not transcript_messages: + lines.append('_No user or assistant transcript messages were available for export._') lines.append('') - lines.append('---') + else: + for message in transcript_messages: + lines.append(f"### {message.get('label')} — {message.get('speaker_label')}") + if message.get('timestamp'): + lines.append(f"*{message.get('timestamp')}*") + lines.append('') + lines.append(message.get('content_text') or '_No content recorded._') + lines.append('') + + lines.append('## Appendix A — Conversation Metadata') + lines.append('') + metadata_to_render = _remove_empty_values({ + 'context': conversation.get('context'), + 'classification': conversation.get('classification'), + 'strict': conversation.get('strict'), + 'is_pinned': conversation.get('is_pinned'), + 'scope_locked': conversation.get('scope_locked'), + 'locked_contexts': conversation.get('locked_contexts'), + 'message_counts_by_role': conversation.get('message_counts_by_role'), + 'citation_counts': conversation.get('citation_counts'), + 'thought_count': conversation.get('thought_count') + }) + _append_markdown_mapping(lines, metadata_to_render) + lines.append('') + + if detail_messages: + lines.append('## 
Appendix B — Message Details') lines.append('') + for message in detail_messages: + lines.append(f"### {message.get('label')} — {message.get('speaker_label')}") + if message.get('timestamp'): + lines.append(f"*{message.get('timestamp')}*") + lines.append('') + _append_markdown_mapping(lines, message.get('details', {})) + lines.append('') - # Messages - for msg in messages: - role = msg.get('role', 'unknown') - timestamp = msg.get('timestamp', '') - raw_content = msg.get('content', '') - content = _normalize_content(raw_content) - - role_label = role.capitalize() - if role == 'assistant': - role_label = 'Assistant' - elif role == 'user': - role_label = 'User' - elif role == 'system': - role_label = 'System' - elif role == 'tool': - role_label = 'Tool' - - lines.append(f"### {role_label}") - if timestamp: - lines.append(f"*{timestamp}*") + if reference_messages: + lines.append('## Appendix C — References') + lines.append('') + for message in reference_messages: + lines.append(f"### {message.get('label')} — {message.get('speaker_label')}") + if message.get('timestamp'): + lines.append(f"*{message.get('timestamp')}*") lines.append('') - lines.append(content) + _append_citations_markdown(lines, message) lines.append('') - # Citations - citations = msg.get('citations') - if citations: - lines.append('**Citations:**') - if isinstance(citations, list): - for cit in citations: - if isinstance(cit, dict): - source = cit.get('title') or cit.get('filepath') or cit.get('url', 'Unknown') - lines.append(f"- {source}") - else: - lines.append(f"- {cit}") - lines.append('') - - lines.append('---') + if thought_messages: + lines.append('## Appendix D — Processing Thoughts') + lines.append('') + for message in thought_messages: + lines.append(f"### {message.get('label')} — {message.get('speaker_label')}") + if message.get('timestamp'): + lines.append(f"*{message.get('timestamp')}*") + lines.append('') + for thought in message.get('thoughts', []): + thought_label = 
thought.get('step_type', 'step').replace('_', ' ').title() + lines.append(f"1. **{thought_label}:** {thought.get('content') or 'No content recorded.'}") + if thought.get('duration_ms') is not None: + lines.append(f" - **Duration:** {thought.get('duration_ms')} ms") + if thought.get('timestamp'): + lines.append(f" - **Timestamp:** {thought.get('timestamp')}") + if thought.get('detail'): + lines.append(' - **Detail:**') + _append_code_block(lines, thought.get('detail'), indent=' ') lines.append('') - return '\n'.join(lines) + if supplemental_messages: + lines.append('## Appendix E — Supplemental Messages') + lines.append('') + for message in supplemental_messages: + lines.append(f"### {message.get('label')} — {message.get('speaker_label')}") + if message.get('timestamp'): + lines.append(f"*{message.get('timestamp')}*") + lines.append('') + lines.append(message.get('content_text') or '_No content recorded._') + lines.append('') - def _normalize_content(content): - """Normalize message content to a plain string. - - Content may be a string, a list of content-part dicts - (e.g. [{"type": "text", "text": "..."}, ...]), or a dict. 
- """ - if isinstance(content, str): - return content - if isinstance(content, list): - parts = [] - for item in content: - if isinstance(item, dict): - if item.get('type') == 'text': - parts.append(item.get('text', '')) - elif item.get('type') == 'image_url': - parts.append('[Image]') + return '\n'.join(lines).strip() + + +def _append_citations_markdown(lines: List[str], message: Dict[str, Any]): + document_citations = [citation for citation in message.get('citations', []) if citation.get('citation_type') == 'document'] + web_citations = [citation for citation in message.get('citations', []) if citation.get('citation_type') == 'web'] + agent_citations = message.get('agent_citations', []) or [] + legacy_citations = [citation for citation in message.get('citations', []) if citation.get('citation_type') == 'legacy'] + + if not any([document_citations, web_citations, agent_citations, legacy_citations]): + lines.append('_No citations were recorded for this message._') + return + + if document_citations: + lines.append('#### Document Sources') + lines.append('') + for index, citation in enumerate(document_citations, start=1): + lines.append(f"{index}. **{citation.get('label', 'Document source')}**") + detail_mapping = _remove_empty_values({ + 'citation_id': citation.get('citation_id'), + 'page_number': citation.get('page_number'), + 'classification': citation.get('classification'), + 'score': citation.get('score'), + 'metadata_type': citation.get('metadata_type') + }) + _append_markdown_mapping(lines, detail_mapping, indent=1) + if citation.get('metadata_content'): + lines.append(' - **Metadata Content:**') + _append_code_block(lines, citation.get('metadata_content'), indent=' ') + lines.append('') + + if web_citations: + lines.append('#### Web Sources') + lines.append('') + for index, citation in enumerate(web_citations, start=1): + title = citation.get('title') or citation.get('label') or 'Web source' + url = citation.get('url') + if url: + lines.append(f"{index}. 
[{title}]({url})") + else: + lines.append(f"{index}. {title}") + lines.append('') + + if agent_citations: + lines.append('#### Tool Invocations') + lines.append('') + for index, citation in enumerate(agent_citations, start=1): + label = citation.get('tool_name') or citation.get('function_name') or f"Tool {index}" + lines.append(f"{index}. **{label}**") + detail_mapping = _remove_empty_values({ + 'function_name': citation.get('function_name'), + 'plugin_name': citation.get('plugin_name'), + 'success': citation.get('success'), + 'timestamp': citation.get('timestamp') + }) + _append_markdown_mapping(lines, detail_mapping, indent=1) + if citation.get('function_arguments') not in (None, '', [], {}): + lines.append(' - **Arguments:**') + _append_code_block(lines, citation.get('function_arguments'), indent=' ') + if citation.get('function_result') not in (None, '', [], {}): + lines.append(' - **Result:**') + _append_code_block(lines, citation.get('function_result'), indent=' ') + lines.append('') + + if legacy_citations: + lines.append('#### Legacy Citation Records') + lines.append('') + for index, citation in enumerate(legacy_citations, start=1): + lines.append(f"{index}. 
{citation.get('label', 'Legacy citation')}") + lines.append('') + + +def _append_markdown_mapping(lines: List[str], mapping: Dict[str, Any], indent: int = 0): + if not isinstance(mapping, dict) or not mapping: + return + + prefix = ' ' * indent + for key, value in mapping.items(): + label = _format_markdown_key(key) + if isinstance(value, dict): + lines.append(f"{prefix}- **{label}:**") + _append_markdown_mapping(lines, value, indent + 1) + elif isinstance(value, list): + if not value: + continue + if all(not isinstance(item, (dict, list)) for item in value): + lines.append(f"{prefix}- **{label}:** {', '.join(_stringify_markdown_value(item) for item in value)}") + else: + lines.append(f"{prefix}- **{label}:**") + for item in value: + if isinstance(item, dict): + lines.append(f"{prefix} -") + _append_markdown_mapping(lines, item, indent + 2) else: - parts.append(str(item)) + lines.append(f"{prefix} - {_stringify_markdown_value(item)}") + else: + lines.append(f"{prefix}- **{label}:** {_stringify_markdown_value(value)}") + + +def _append_code_block(lines: List[str], value: Any, indent: str = ''): + if isinstance(value, (dict, list)): + code_block = json.dumps(value, indent=2, ensure_ascii=False, default=str) + language = 'json' + else: + code_block = str(value) + language = 'text' + + lines.append(f"{indent}```{language}") + for line in code_block.splitlines() or ['']: + lines.append(f"{indent}{line}") + lines.append(f"{indent}```") + + +def _format_markdown_key(key: str) -> str: + return str(key).replace('_', ' ').title() + + +def _stringify_markdown_value(value: Any) -> str: + if isinstance(value, bool): + return 'Yes' if value else 'No' + return str(value) + + +def _format_tag(tag: Any) -> str: + """Format a tag or classification entry for display. + + Tags in Cosmos are stored as dicts such as + ``{'category': 'model', 'value': 'gpt-5'}`` or + ``{'category': 'participant', 'name': 'Alice', 'user_id': '...'}`` + but they can also be plain strings in older data. 
+ """ + if isinstance(tag, dict): + category = tag.get('category', '') + # Participant tags carry a readable name / email + name = tag.get('name') or tag.get('email') or tag.get('display_name') + if name: + return f"{category}: {name}" if category else str(name) + # Document tags carry a title + title = tag.get('title') or tag.get('document_id') + if title: + return f"{category}: {title}" if category else str(title) + # Generic category/value tags + value = tag.get('value') + if value: + return f"{category}: {value}" if category else str(value) + return category or str(tag) + return str(tag) + + +def _role_to_label(role: str) -> str: + role_map = { + 'assistant': 'Assistant', + 'user': 'User', + 'system': 'System', + 'tool': 'Tool', + 'file': 'File', + 'image': 'Image', + 'safety': 'Safety', + 'blocked': 'Blocked' + } + return role_map.get(role, str(role).capitalize() or 'Message') + + +def _normalize_content(content: Any) -> str: + """Normalize message content to a plain string.""" + if isinstance(content, str): + return content + if isinstance(content, list): + parts = [] + for item in content: + if isinstance(item, dict): + if item.get('type') == 'text': + parts.append(item.get('text', '')) + elif item.get('type') == 'image_url': + parts.append('[Image]') else: parts.append(str(item)) - return '\n'.join(parts) - if isinstance(content, dict): - if content.get('type') == 'text': - return content.get('text', '') - return str(content) - return str(content) if content else '' - - def _safe_filename(title): - """Create a filesystem-safe filename from a conversation title.""" - import re - # Remove or replace unsafe characters - safe = re.sub(r'[<>:"/\\|?*]', '_', title) - safe = re.sub(r'\s+', '_', safe) - safe = safe.strip('_. 
') - # Truncate to reasonable length - if len(safe) > 50: - safe = safe[:50] - return safe or 'Untitled' + else: + parts.append(str(item)) + return '\n'.join(parts) + if isinstance(content, dict): + if content.get('type') == 'text': + return content.get('text', '') + return str(content) + return str(content) if content else '' + + +def _safe_filename(title: str) -> str: + """Create a filesystem-safe filename from a conversation title.""" + safe = re.sub(r'[<>:"/\\|?*]', '_', title) + safe = re.sub(r'\s+', '_', safe) + safe = safe.strip('_. ') + if len(safe) > 50: + safe = safe[:50] + return safe or 'Untitled' + + +def _message_to_docx_bytes(message: Dict[str, Any]) -> bytes: + doc = DocxDocument() + doc.add_heading('Message Export', level=1) + + role_label = _role_to_label(message.get('role', 'unknown')) + timestamp = message.get('timestamp', '') + + meta_paragraph = doc.add_paragraph() + meta_run = meta_paragraph.add_run(f"Role: {role_label}") + meta_run.bold = True + if timestamp: + meta_paragraph.add_run(f" {timestamp}") + + doc.add_paragraph('') + + content = _normalize_content(message.get('content', '')) + if content: + _add_markdown_content_to_doc(doc, content) + else: + doc.add_paragraph('No content recorded.') + + citation_labels = _build_message_citation_labels(message) + if citation_labels: + doc.add_heading('Citations', level=2) + for citation_label in citation_labels: + doc.add_paragraph(citation_label, style='List Bullet') + + buffer = io.BytesIO() + doc.save(buffer) + buffer.seek(0) + return buffer.read() + + +def _build_message_citation_labels(message: Dict[str, Any]) -> List[str]: + normalized_citations = _normalize_citations(_collect_raw_citation_buckets(message)) + citation_labels: List[str] = [] + seen_labels = set() + + for citation in normalized_citations: + label = str( + citation.get('label') + or citation.get('title') + or citation.get('url') + or citation.get('filepath') + or citation.get('tool_name') + or citation.get('function_name') + or 
'' + ).strip() + if not label or label in seen_labels: + continue + seen_labels.add(label) + citation_labels.append(label) + + return citation_labels + + +def _add_markdown_content_to_doc(doc: DocxDocument, content: str): + lines = content.split('\n') + index = 0 + + while index < len(lines): + line = lines[index] + + heading_match = re.match(r'^(#{1,6})\s+(.*)', line) + if heading_match: + level = min(len(heading_match.group(1)), 4) + doc.add_heading(heading_match.group(2).strip(), level=level) + index += 1 + continue + + if line.strip().startswith('```'): + code_lines = [] + index += 1 + while index < len(lines) and not lines[index].strip().startswith('```'): + code_lines.append(lines[index]) + index += 1 + index += 1 + code_paragraph = doc.add_paragraph() + code_run = code_paragraph.add_run('\n'.join(code_lines)) + code_run.font.name = 'Consolas' + code_run.font.size = Pt(9) + continue + + unordered_list_match = re.match(r'^(\s*)[*\-+]\s+(.*)', line) + if unordered_list_match: + doc.add_paragraph(unordered_list_match.group(2).strip(), style='List Bullet') + index += 1 + continue + + ordered_list_match = re.match(r'^(\s*)\d+[.)]\s+(.*)', line) + if ordered_list_match: + doc.add_paragraph(ordered_list_match.group(2).strip(), style='List Number') + index += 1 + continue + + if not line.strip(): + index += 1 + continue + + paragraph = doc.add_paragraph() + _add_inline_markdown_runs(paragraph, line) + index += 1 + + +def _add_inline_markdown_runs(paragraph, text: str): + parts = re.compile(r'(\*\*.*?\*\*|\*.*?\*|`[^`]+`)').split(text) + + for part in parts: + if part.startswith('**') and part.endswith('**'): + run = paragraph.add_run(part[2:-2]) + run.bold = True + elif part.startswith('*') and part.endswith('*') and len(part) > 2: + run = paragraph.add_run(part[1:-1]) + run.italic = True + elif part.startswith('`') and part.endswith('`'): + run = paragraph.add_run(part[1:-1]) + run.font.name = 'Consolas' + run.font.size = Pt(9) + elif part: + paragraph.add_run(part) 
+ + +# --------------------------------------------------------------------------- +# PDF Export — HTML generation and PyMuPDF Story rendering +# --------------------------------------------------------------------------- + +_PDF_CSS = """ +body { + font-family: sans-serif; + font-size: 10pt; + color: #222; + line-height: 1.4; +} +h1 { + font-size: 16pt; + color: #1a1a2e; + margin-bottom: 2pt; +} +h2 { + font-size: 13pt; + color: #16213e; + margin-top: 16pt; + margin-bottom: 6pt; + border-bottom: 1px solid #ccc; + padding-bottom: 4pt; +} +h3 { + font-size: 11pt; + color: #0f3460; + margin-top: 10pt; + margin-bottom: 4pt; +} +h4 { + font-size: 10pt; + color: #333; + margin-top: 8pt; + margin-bottom: 4pt; +} +p { + margin-top: 2pt; + margin-bottom: 4pt; +} +.metadata { + font-size: 8pt; + color: #666; +} +.abstract { + background-color: #f8f9fa; + padding: 8pt; + margin-bottom: 8pt; +} +.note { + font-size: 9pt; + color: #856404; + background-color: #fff3cd; + padding: 6pt; +} +.bubble { + padding: 8pt 12pt; + margin-bottom: 8pt; +} +.bubble-header { + font-size: 8pt; + color: #444; + margin-bottom: 2pt; +} +.ts { + font-weight: normal; + color: #888; +} +.user-bubble { + background-color: #c8e0fa; + margin-left: 60pt; +} +.assistant-bubble { + background-color: #f1f0f0; + margin-right: 60pt; +} +.system-bubble { + background-color: #fff3cd; + margin-left: 30pt; + margin-right: 30pt; + font-size: 9pt; +} +.file-bubble { + background-color: #e8f5e9; + margin-right: 60pt; + font-size: 9pt; +} +.other-bubble { + background-color: #f5f5f5; + margin-left: 30pt; + margin-right: 30pt; + font-size: 9pt; +} +table { + border-collapse: collapse; + width: 100%; + font-size: 9pt; + margin-bottom: 8pt; +} +th, td { + border: 1px solid #ddd; + padding: 4pt 6pt; + text-align: left; +} +th { + background-color: #f5f5f5; + font-weight: bold; +} +pre { + background-color: #f5f5f5; + padding: 6pt; + font-size: 8pt; + font-family: monospace; +} +code { + font-family: monospace; + 
font-size: 9pt; + background-color: #f0f0f0; + padding: 1pt 3pt; +} +ol, ul { + margin-top: 4pt; + margin-bottom: 8pt; +} +li { + margin-bottom: 4pt; +} +small { + font-size: 8pt; + color: #666; +} +a { + color: #0066cc; +} +""" + + +def _pdf_bubble_class(role: str) -> str: + """Return the CSS class for a chat bubble based on message role.""" + role_classes = { + 'user': 'user-bubble', + 'assistant': 'assistant-bubble', + 'system': 'system-bubble', + 'file': 'file-bubble', + 'image': 'file-bubble' + } + return role_classes.get(role, 'other-bubble') + + +def _build_pdf_html_body(entry: Dict[str, Any]) -> str: + """Build the HTML body content for a single conversation PDF.""" + conversation = entry['conversation'] + messages = entry['messages'] + summary_intro = entry.get('summary_intro', {}) or {} + + transcript_messages = [m for m in messages if m.get('is_transcript_message')] + detail_messages = [m for m in messages if m.get('details')] + reference_messages = [m for m in messages if m.get('citations')] + thought_messages = [m for m in messages if m.get('thoughts')] + supplemental_messages = [m for m in messages if not m.get('is_transcript_message')] + + parts: List[str] = [] + + # --- Title and metadata --- + parts.append(f'

{_escape_html(conversation.get("title", "Untitled"))}

') + meta_items = [ + f'Last Updated: {_escape_html(str(conversation.get("last_updated", "")))}', + f'Chat Type: {_escape_html(str(conversation.get("chat_type", "personal")))}', + f'Messages: {conversation.get("message_count", len(messages))}' + ] + tags = conversation.get('tags') + if tags: + meta_items.append(f'Tags: {_escape_html(", ".join(_format_tag(t) for t in tags))}') + classification = conversation.get('classification') + if classification: + meta_items.append( + f'Classification: {_escape_html(", ".join(_format_tag(c) for c in classification))}' + ) + parts.append(f'') + + # --- Abstract --- + if summary_intro.get('enabled') and summary_intro.get('generated') and summary_intro.get('content'): + parts.append('

Abstract

') + abstract_html = markdown2.markdown( + summary_intro.get('content', ''), + extras=['fenced-code-blocks', 'tables'] + ) + parts.append(f'
{abstract_html}
') + parts.append( + f'' + ) + elif summary_intro.get('enabled') and summary_intro.get('error'): + error_text = _escape_html(str(summary_intro.get('error', ''))) + parts.append( + '

A summary intro was requested, ' + 'but could not be generated for this export.
' + f'Error: {error_text}

' + ) + + # --- Transcript with chat bubbles --- + parts.append('

Transcript

') + if not transcript_messages: + parts.append( + '

No user or assistant transcript messages were available for export.

' + ) + else: + for message in transcript_messages: + role = message.get('role', '') + bubble_class = _pdf_bubble_class(role) + label = message.get('label', '') + speaker = message.get('speaker_label', '') + timestamp = message.get('timestamp', '') + content = message.get('content_text', '') or 'No content recorded.' + + parts.append(f'
') + ts_str = ( + f'  |  {_escape_html(str(timestamp))}' + if timestamp else '' + ) + parts.append( + f'

{_escape_html(label)} — ' + f'{_escape_html(speaker)}{ts_str}

' + ) + content_html = markdown2.markdown( + content, + extras=['fenced-code-blocks', 'tables', 'break-on-newline'] + ) + parts.append(content_html) + parts.append('
') + + # --- Appendix A: Conversation Metadata --- + parts.append('

Appendix A — Conversation Metadata

') + metadata_to_render = _remove_empty_values({ + 'context': conversation.get('context'), + 'classification': conversation.get('classification'), + 'strict': conversation.get('strict'), + 'is_pinned': conversation.get('is_pinned'), + 'scope_locked': conversation.get('scope_locked'), + 'locked_contexts': conversation.get('locked_contexts'), + 'message_counts_by_role': conversation.get('message_counts_by_role'), + 'citation_counts': conversation.get('citation_counts'), + 'thought_count': conversation.get('thought_count') + }) + _append_html_table(parts, metadata_to_render) + + # --- Appendix B: Message Details --- + if detail_messages: + parts.append('

Appendix B — Message Details

') + for message in detail_messages: + parts.append( + f'

{_escape_html(message.get("label", ""))} — ' + f'{_escape_html(message.get("speaker_label", ""))}

' + ) + if message.get('timestamp'): + parts.append( + f'' + ) + _append_html_table(parts, message.get('details', {})) + + # --- Appendix C: References --- + if reference_messages: + parts.append('

Appendix C — References

') + for message in reference_messages: + parts.append( + f'

{_escape_html(message.get("label", ""))} — ' + f'{_escape_html(message.get("speaker_label", ""))}

' + ) + if message.get('timestamp'): + parts.append( + f'' + ) + _append_html_citations(parts, message) + + # --- Appendix D: Processing Thoughts --- + if thought_messages: + parts.append('

Appendix D — Processing Thoughts

') + for message in thought_messages: + parts.append( + f'

{_escape_html(message.get("label", ""))} — ' + f'{_escape_html(message.get("speaker_label", ""))}

' + ) + if message.get('timestamp'): + parts.append( + f'' + ) + parts.append('
    ') + for thought in message.get('thoughts', []): + thought_label = (thought.get('step_type') or 'step').replace('_', ' ').title() + parts.append( + f'
  1. {_escape_html(thought_label)}: ' + f'{_escape_html(str(thought.get("content") or "No content recorded."))}' + ) + if thought.get('duration_ms') is not None: + parts.append( + f'
    Duration: {thought.get("duration_ms")} ms' + ) + if thought.get('timestamp'): + parts.append( + f'
    Timestamp: ' + f'{_escape_html(str(thought.get("timestamp")))}' + ) + if thought.get('detail'): + parts.append('
    Detail:') + _append_html_code_block(parts, thought.get('detail')) + parts.append('
  2. ') + parts.append('
') + + # --- Appendix E: Supplemental Messages --- + if supplemental_messages: + parts.append('

Appendix E — Supplemental Messages

') + for message in supplemental_messages: + parts.append( + f'

{_escape_html(message.get("label", ""))} — ' + f'{_escape_html(message.get("speaker_label", ""))}

' + ) + if message.get('timestamp'): + parts.append( + f'' + ) + content = message.get('content_text', '') or 'No content recorded.' + content_html = markdown2.markdown( + content, + extras=['fenced-code-blocks', 'tables', 'break-on-newline'] + ) + parts.append(content_html) + + return '\n'.join(parts) + + +def _render_pdf_bytes(body_html: str) -> bytes: + """Render HTML body content to PDF bytes using PyMuPDF Story API.""" + MEDIABOX = fitz.paper_rect("letter") + WHERE = MEDIABOX + (36, 36, -36, -36) + + story = fitz.Story(html=body_html, user_css=_PDF_CSS) + + tmp_path = None + try: + with tempfile.NamedTemporaryFile(suffix='.pdf', delete=False) as tmp: + tmp_path = tmp.name + + writer = fitz.DocumentWriter(tmp_path) + more = True + while more: + device = writer.begin_page(MEDIABOX) + more, _ = story.place(WHERE) + story.draw(device) + writer.end_page() + writer.close() + del story + del writer + + with open(tmp_path, 'rb') as f: + return f.read() + finally: + if tmp_path: + try: + os.unlink(tmp_path) + except OSError: + pass + + +def _conversation_to_pdf_bytes(entry: Dict[str, Any]) -> bytes: + """Convert a conversation export entry to PDF bytes.""" + body_html = _build_pdf_html_body(entry) + return _render_pdf_bytes(body_html) + + +def _html_body_to_pdf_bytes(body_html: str) -> bytes: + """Convert raw HTML body content to PDF bytes.""" + return _render_pdf_bytes(body_html) + + +def _append_html_table(parts: List[str], mapping: Dict[str, Any]): + """Append a key-value mapping as an HTML table.""" + if not isinstance(mapping, dict) or not mapping: + parts.append('

No data available.

') + return + + parts.append('') + parts.append('') + for key, value in mapping.items(): + label = _format_markdown_key(key) + if isinstance(value, dict): + formatted = _format_nested_html_value(value) + elif isinstance(value, list): + formatted = ( + ', '.join(_escape_html(str(item)) for item in value) + if value else 'None' + ) + elif isinstance(value, bool): + formatted = 'Yes' if value else 'No' + else: + formatted = _escape_html(str(value)) + parts.append(f'') + parts.append('
PropertyValue
{_escape_html(label)}{formatted}
') + + +def _format_nested_html_value(mapping: Dict[str, Any], depth: int = 0) -> str: + """Format a nested dict as an HTML string for table cells.""" + if not mapping: + return 'None' + + items = [] + for key, value in mapping.items(): + label = _format_markdown_key(key) + if isinstance(value, dict): + nested = _format_nested_html_value(value, depth + 1) + items.append(f'{_escape_html(label)}:
{nested}') + elif isinstance(value, list): + list_str = ( + ', '.join(_escape_html(str(v)) for v in value) + if value else 'None' + ) + items.append(f'{_escape_html(label)}: {list_str}') + elif isinstance(value, bool): + items.append(f'{_escape_html(label)}: {"Yes" if value else "No"}') + else: + items.append(f'{_escape_html(label)}: {_escape_html(str(value))}') + return '
'.join(items) + + +def _append_html_citations(parts: List[str], message: Dict[str, Any]): + """Append citation data as HTML.""" + citations = message.get('citations', []) + if not citations: + parts.append('

No citations were recorded for this message.

') + return + + doc_citations = [c for c in citations if c.get('citation_type') == 'document'] + web_citations = [c for c in citations if c.get('citation_type') == 'web'] + agent_citations = [c for c in citations if c.get('citation_type') == 'agent_tool'] + legacy_citations = [c for c in citations if c.get('citation_type') == 'legacy'] + + if doc_citations: + parts.append('

Document Sources

') + parts.append('
    ') + for citation in doc_citations: + parts.append( + f'
  1. {_escape_html(str(citation.get("label", "Document source")))}' + ) + detail_items = _remove_empty_values({ + 'citation_id': citation.get('citation_id'), + 'page_number': citation.get('page_number'), + 'classification': citation.get('classification'), + 'score': citation.get('score'), + 'metadata_type': citation.get('metadata_type') + }) + if detail_items: + detail_str = '; '.join( + f'{_format_markdown_key(k)}: {_escape_html(str(v))}' + for k, v in detail_items.items() + ) + parts.append(f'
    {detail_str}') + if citation.get('metadata_content'): + parts.append('
    Metadata Content:') + _append_html_code_block(parts, citation.get('metadata_content')) + parts.append('
  2. ') + parts.append('
') + + if web_citations: + parts.append('

Web Sources

') + parts.append('
    ') + for citation in web_citations: + title = _escape_html( + str(citation.get('title') or citation.get('label') or 'Web source') + ) + url = citation.get('url') + if url: + parts.append(f'
  1. {title}
  2. ') + else: + parts.append(f'
  3. {title}
  4. ') + parts.append('
') + + if agent_citations: + parts.append('

Tool Invocations

') + parts.append('
    ') + for citation in agent_citations: + label = _escape_html( + str(citation.get('tool_name') or citation.get('function_name') or 'Tool') + ) + parts.append(f'
  1. {label}') + detail_items = _remove_empty_values({ + 'function_name': citation.get('function_name'), + 'plugin_name': citation.get('plugin_name'), + 'success': citation.get('success'), + 'timestamp': citation.get('timestamp') + }) + if detail_items: + detail_str = '; '.join( + f'{_format_markdown_key(k)}: {_escape_html(str(v))}' + for k, v in detail_items.items() + ) + parts.append(f'
    {detail_str}') + parts.append('
  2. ') + parts.append('
') + + if legacy_citations: + parts.append('

Legacy Citation Records

') + parts.append('
    ') + for citation in legacy_citations: + parts.append( + f'
  1. {_escape_html(str(citation.get("label", "Legacy citation")))}
  2. ' + ) + parts.append('
') + + +def _append_html_code_block(parts: List[str], value: Any): + """Append a code block in HTML format.""" + if isinstance(value, (dict, list)): + code_text = json.dumps(value, indent=2, ensure_ascii=False, default=str) + else: + code_text = str(value) + parts.append(f'
{_escape_html(code_text)}
') diff --git a/application/single_app/route_backend_conversations.py b/application/single_app/route_backend_conversations.py index f267d729..58c2fd41 100644 --- a/application/single_app/route_backend_conversations.py +++ b/application/single_app/route_backend_conversations.py @@ -3,11 +3,14 @@ from config import * from functions_authentication import * from functions_settings import * -from functions_conversation_metadata import get_conversation_metadata +from functions_conversation_metadata import get_conversation_metadata, update_conversation_with_metadata +from functions_conversation_unread import clear_conversation_unread, normalize_conversation_unread_state +from functions_notifications import mark_chat_response_notifications_read_for_conversation from flask import Response, request from functions_debug import debug_print from swagger_wrapper import swagger_route, get_auth_security from functions_activity_logging import log_conversation_creation, log_conversation_deletion, log_conversation_archival +from functions_thoughts import archive_thoughts_for_conversation, delete_thoughts_for_conversation def register_route_backend_conversations(app): @@ -287,8 +290,9 @@ def get_conversations(): return jsonify({'error': 'User not authenticated'}), 401 query = f"SELECT * FROM c WHERE c.user_id = '{user_id}' ORDER BY c.last_updated DESC" items = list(cosmos_conversations_container.query_items(query=query, enable_cross_partition_query=True)) + normalized_items = [normalize_conversation_unread_state(item) for item in items] return jsonify({ - 'conversations': items + 'conversations': normalized_items }), 200 @@ -311,7 +315,10 @@ def create_conversation(): 'tags': [], 'strict': False, 'is_pinned': False, - 'is_hidden': False + 'is_hidden': False, + 'has_unread_assistant_response': False, + 'last_unread_assistant_message_id': None, + 'last_unread_assistant_at': None, } cosmos_conversations_container.upsert_item(conversation_item) @@ -430,7 +437,14 @@ def 
delete_conversation(conversation_id): cosmos_archived_messages_container.upsert_item(archived_doc) cosmos_messages_container.delete_item(doc['id'], partition_key=conversation_id) - + + # Archive/delete thoughts for conversation + user_id_for_thoughts = conversation_item.get('user_id') + if archiving_enabled: + archive_thoughts_for_conversation(conversation_id, user_id_for_thoughts) + else: + delete_thoughts_for_conversation(conversation_id, user_id_for_thoughts) + # Log conversation deletion before actual deletion log_conversation_deletion( user_id=conversation_item.get('user_id'), @@ -530,7 +544,13 @@ def delete_multiple_conversations(): cosmos_archived_messages_container.upsert_item(archived_message) cosmos_messages_container.delete_item(message['id'], partition_key=conversation_id) - + + # Archive/delete thoughts for conversation + if archiving_enabled: + archive_thoughts_for_conversation(conversation_id, user_id) + else: + delete_thoughts_for_conversation(conversation_id, user_id) + # Log conversation deletion before actual deletion log_conversation_deletion( user_id=user_id, @@ -779,6 +799,7 @@ def get_conversation_metadata_api(conversation_id): item=conversation_id, partition_key=conversation_id ) + conversation_item = normalize_conversation_unread_state(conversation_item) # Ensure that the conversation belongs to the current user if conversation_item.get('user_id') != user_id: @@ -796,9 +817,13 @@ def get_conversation_metadata_api(conversation_id): "strict": conversation_item.get('strict', False), "is_pinned": conversation_item.get('is_pinned', False), "is_hidden": conversation_item.get('is_hidden', False), + "has_unread_assistant_response": conversation_item.get('has_unread_assistant_response', False), + "last_unread_assistant_message_id": conversation_item.get('last_unread_assistant_message_id'), + "last_unread_assistant_at": conversation_item.get('last_unread_assistant_at'), "scope_locked": conversation_item.get('scope_locked'), "locked_contexts": 
conversation_item.get('locked_contexts', []), - "chat_type": conversation_item.get('chat_type') + "chat_type": conversation_item.get('chat_type'), + "summary": conversation_item.get('summary') }), 200 except CosmosResourceNotFoundError: @@ -807,6 +832,135 @@ def get_conversation_metadata_api(conversation_id): print(f"Error retrieving conversation metadata: {e}") return jsonify({'error': 'Failed to retrieve conversation metadata'}), 500 + @app.route('/api/conversations/<conversation_id>/mark-read', methods=['POST']) + @swagger_route(security=get_auth_security()) + @login_required + @user_required + def mark_conversation_read_api(conversation_id): + """Clear unread assistant-response state and related chat notifications.""" + user_id = get_current_user_id() + if not user_id: + return jsonify({'error': 'User not authenticated'}), 401 + + try: + conversation_item = cosmos_conversations_container.read_item( + item=conversation_id, + partition_key=conversation_id + ) + conversation_item = normalize_conversation_unread_state(conversation_item) + + if conversation_item.get('user_id') != user_id: + return jsonify({'error': 'Forbidden'}), 403 + + conversation_item = clear_conversation_unread(conversation_item) + cosmos_conversations_container.upsert_item(conversation_item) + + notifications_marked_read = mark_chat_response_notifications_read_for_conversation( + user_id, + conversation_id + ) + + return jsonify({ + 'success': True, + 'conversation_id': conversation_id, + 'has_unread_assistant_response': False, + 'notifications_marked_read': notifications_marked_read, + }), 200 + except CosmosResourceNotFoundError: + return jsonify({'error': 'Conversation not found'}), 404 + except Exception as e: + debug_print(f"Error marking conversation {conversation_id} as read: {e}") + return jsonify({'error': 'Failed to mark conversation as read'}), 500 + + @app.route('/api/conversations/<conversation_id>/summary', methods=['POST']) + @swagger_route(security=get_auth_security()) + @login_required + @user_required + def 
generate_conversation_summary_api(conversation_id): + """ + Generate (or regenerate) a summary for a conversation and persist it. + + Request body (optional): + { "model_deployment": "gpt-4o" } + + Returns the generated summary dict on success. + """ + from route_backend_conversation_export import generate_conversation_summary, _normalize_content + from functions_chat import sort_messages_by_thread + + user_id = get_current_user_id() + if not user_id: + return jsonify({'error': 'User not authenticated'}), 401 + + try: + conversation_item = cosmos_conversations_container.read_item( + item=conversation_id, + partition_key=conversation_id + ) + if conversation_item.get('user_id') != user_id: + return jsonify({'error': 'Forbidden'}), 403 + except CosmosResourceNotFoundError: + return jsonify({'error': 'Conversation not found'}), 404 + except Exception as e: + debug_print(f"Error reading conversation for summary: {e}") + return jsonify({'error': 'Failed to read conversation'}), 500 + + body = request.get_json(silent=True) or {} + model_deployment = body.get('model_deployment', '') + + # Query messages for this conversation + try: + query = "SELECT * FROM c WHERE c.conversation_id = @cid ORDER BY c.timestamp ASC" + params = [{"name": "@cid", "value": conversation_id}] + raw_messages = list(cosmos_messages_container.query_items( + query=query, + parameters=params, + enable_cross_partition_query=True + )) + except Exception as e: + debug_print(f"Error querying messages for summary: {e}") + return jsonify({'error': 'Failed to query messages'}), 500 + + if not raw_messages: + return jsonify({'error': 'No messages in this conversation'}), 400 + + # Build lightweight export-style message list for the summary helper + ordered_messages = sort_messages_by_thread(raw_messages) + export_messages = [] + for msg in ordered_messages: + role = msg.get('role', 'unknown') + # Content may be a string OR a list of content parts — normalise it + content = 
_normalize_content(msg.get('content', '')) + speaker = 'USER' if role == 'user' else 'ASSISTANT' if role == 'assistant' else role.upper() + export_messages.append({ + 'role': role, + 'content_text': content, + 'speaker_label': speaker + }) + + message_time_start = ordered_messages[0].get('timestamp') if ordered_messages else None + message_time_end = ordered_messages[-1].get('timestamp') if ordered_messages else None + + settings = get_settings() + + try: + summary_data = generate_conversation_summary( + messages=export_messages, + conversation_title=conversation_item.get('title', 'Untitled'), + settings=settings, + model_deployment=model_deployment, + message_time_start=message_time_start, + message_time_end=message_time_end, + conversation_id=conversation_id + ) + return jsonify({'success': True, 'summary': summary_data}), 200 + + except (ValueError, RuntimeError) as known_exc: + return jsonify({'error': str(known_exc)}), 400 + except Exception as exc: + debug_print(f"Summary generation API error: {exc}") + return jsonify({'error': 'Summary generation failed'}), 500 + @app.route('/api/conversations//scope_lock', methods=['PATCH']) @swagger_route(security=get_auth_security()) @login_required diff --git a/application/single_app/route_backend_documents.py b/application/single_app/route_backend_documents.py index fb4eb19b..0e9d490b 100644 --- a/application/single_app/route_backend_documents.py +++ b/application/single_app/route_backend_documents.py @@ -7,6 +7,7 @@ from utils_cache import invalidate_personal_search_cache from functions_debug import * from functions_activity_logging import log_document_upload, log_document_metadata_update_transaction +import io import os import requests from flask import current_app @@ -72,7 +73,58 @@ def get_file_content(): filename = items_sorted[0].get('filename', 'Untitled') is_table = items_sorted[0].get('is_table', False) - debug_print(f"[GET_FILE_CONTENT] Filename: {filename}, is_table: {is_table}") + file_content_source = 
items_sorted[0].get('file_content_source', '') + debug_print(f"[GET_FILE_CONTENT] Filename: {filename}, is_table: {is_table}, source: {file_content_source}") + + # Handle blob-stored tabular files (enhanced citations enabled) + if file_content_source == 'blob': + blob_container = items_sorted[0].get('blob_container', '') + blob_path = items_sorted[0].get('blob_path', '') + debug_print(f"[GET_FILE_CONTENT] Blob-stored file: container={blob_container}, path={blob_path}") + + if not blob_container or not blob_path: + return jsonify({'error': 'Blob storage reference is incomplete'}), 500 + + try: + blob_service_client = CLIENTS.get("storage_account_office_docs_client") + if not blob_service_client: + return jsonify({'error': 'Blob storage client not available'}), 500 + + blob_client = blob_service_client.get_blob_client( + container=blob_container, + blob=blob_path + ) + stream = blob_client.download_blob() + blob_data = stream.readall() + + # Convert to CSV using pandas for display + file_ext = os.path.splitext(filename)[1].lower() + if file_ext == '.csv': + import pandas + df = pandas.read_csv(io.BytesIO(blob_data)) + combined_content = df.to_csv(index=False) + elif file_ext in ['.xlsx', '.xlsm']: + import pandas + df = pandas.read_excel(io.BytesIO(blob_data), engine='openpyxl') + combined_content = df.to_csv(index=False) + elif file_ext == '.xls': + import pandas + df = pandas.read_excel(io.BytesIO(blob_data), engine='xlrd') + combined_content = df.to_csv(index=False) + else: + combined_content = blob_data.decode('utf-8', errors='replace') + + debug_print(f"[GET_FILE_CONTENT] Successfully read blob content, length: {len(combined_content)}") + return jsonify({ + 'file_content': combined_content, + 'filename': filename, + 'is_table': is_table, + 'file_content_source': 'blob' + }), 200 + + except Exception as blob_err: + debug_print(f"[GET_FILE_CONTENT] Error reading from blob: {blob_err}") + return jsonify({'error': f'Error reading file from storage: 
{str(blob_err)}'}), 500 add_file_task_to_file_processing_log(document_id=file_id, user_id=user_id, content="Combining file content from chunks, filename: " + filename + ", is_table: " + str(is_table)) combined_parts = [] @@ -1378,7 +1430,7 @@ def api_get_shared_users(document_id): approval_status = entry.get('approval_status', 'unknown') try: # Get user details from Microsoft Graph - graph_url = f"https://graph.microsoft.com/v1.0/users/{oid}" + graph_url = get_graph_endpoint(f"/users/{oid}") response = requests.get(graph_url, headers=headers) if response.status_code == 200: diff --git a/application/single_app/route_backend_plugins.py b/application/single_app/route_backend_plugins.py index 77aab866..153f07ec 100644 --- a/application/single_app/route_backend_plugins.py +++ b/application/single_app/route_backend_plugins.py @@ -27,11 +27,22 @@ delete_group_action, validate_group_action_payload, ) -from functions_keyvault import SecretReturnType +from functions_keyvault import ( + SecretReturnType, + redact_plugin_secret_values, + retrieve_secret_from_key_vault_by_full_name, + ui_trigger_word, + validate_secret_name_dynamic, +) #from functions_personal_actions import delete_personal_action from functions_debug import debug_print from json_schema_validation import validate_plugin +from functions_activity_logging import ( + log_action_creation, + log_action_update, + log_action_deletion, +) def discover_plugin_types(): # Dynamically discover allowed plugin types from available plugin classes. 
@@ -211,6 +222,51 @@ def get_plugin_types(): bpap = Blueprint('admin_plugins', __name__) + +def _redact_plugin_for_logging(plugin): + """Return a plugin manifest with secret-bearing values redacted for logging.""" + if not isinstance(plugin, dict): + return plugin + return redact_plugin_secret_values(plugin) + + +def _resolve_secret_value_for_sql_test(value, field_name): + """Resolve a Key Vault reference for SQL test-connection flows.""" + if not isinstance(value, str) or not value: + return value + if not validate_secret_name_dynamic(value): + return value + + resolved_value = retrieve_secret_from_key_vault_by_full_name(value) + if validate_secret_name_dynamic(resolved_value): + raise ValueError(f"Unable to resolve stored Key Vault secret for SQL field '{field_name}'.") + return resolved_value + + +def _load_existing_plugin_for_sql_test(plugin_context, user_id): + """Load an existing plugin manifest with Key Vault reference names for edit-time SQL tests.""" + if not isinstance(plugin_context, dict): + return None + + plugin_scope = (plugin_context.get('scope') or 'user').lower() + plugin_identifier = plugin_context.get('id') or plugin_context.get('name') + if not plugin_identifier: + return None + + if plugin_scope == 'group': + active_group = require_active_group(user_id) + assert_group_role( + user_id, + active_group, + allowed_roles=("Owner", "Admin", "DocumentManager", "User"), + ) + return get_group_action(active_group, plugin_identifier, return_type=SecretReturnType.NAME) + + if plugin_scope == 'global': + return get_global_action(plugin_identifier, return_type=SecretReturnType.NAME) + + return get_personal_action(user_id, plugin_identifier, return_type=SecretReturnType.NAME) + # === USER PLUGINS ENDPOINTS === @bpap.route('/api/user/plugins', methods=['GET']) @swagger_route(security=get_auth_security()) @@ -268,12 +324,14 @@ def set_user_plugins(): global_plugin_names = set(p['name'].lower() for p in global_plugins if 'name' in p) # Get current personal 
actions to determine what to delete - current_actions = get_personal_actions(user_id) + current_actions = get_personal_actions(user_id, return_type=SecretReturnType.NAME) current_action_names = set(action['name'] for action in current_actions) + current_action_ids = {action.get('id') for action in current_actions if action.get('id')} # Filter out plugins whose name matches a global plugin name filtered_plugins = [] new_plugin_names = set() + new_plugin_ids = set() for plugin in plugins: if plugin.get('name', '').lower() in global_plugin_names: @@ -290,7 +348,7 @@ def set_user_plugins(): plugin.setdefault('additionalFields', {}) # Remove Cosmos DB system fields that are not part of the plugin schema - cosmos_fields = ['_attachments', '_etag', '_rid', '_self', '_ts', 'created_at', 'updated_at', 'id', 'user_id', 'last_updated'] + cosmos_fields = ['_attachments', '_etag', '_rid', '_self', '_ts', 'created_at', 'updated_at', 'user_id', 'last_updated'] for field in cosmos_fields: if field in plugin: del plugin[field] @@ -324,27 +382,53 @@ def set_user_plugins(): else: plugin['type'] = 'unknown' # Default type - print(f"Plugin build: {plugin}") + debug_print(f"Plugin build: {_redact_plugin_for_logging(plugin)}") validation_error = validate_plugin(plugin) if validation_error: return jsonify({'error': f'Plugin validation failed: {validation_error}'}), 400 filtered_plugins.append(plugin) new_plugin_names.add(plugin['name']) + if plugin.get('id'): + new_plugin_ids.add(plugin['id']) # Save each plugin to the personal_actions container + plugins_to_delete = [] try: for plugin in filtered_plugins: save_personal_action(user_id, plugin) # Delete any plugins that are no longer in the list - plugins_to_delete = current_action_names - new_plugin_names - for plugin_name in plugins_to_delete: - delete_personal_action(user_id, plugin_name) + for action in current_actions: + action_id = action.get('id') + action_name = action.get('name') + if action_id and action_id in new_plugin_ids: + 
continue + if action_name in new_plugin_names: + continue + plugins_to_delete.append(action) + + for action in plugins_to_delete: + delete_personal_action(user_id, action.get('id') or action.get('name')) except Exception as e: debug_print(f"Error saving personal actions for user {user_id}: {e}") return jsonify({'error': 'Failed to save plugins'}), 500 + + # Log individual action activities + for plugin in filtered_plugins: + p_name = plugin.get('name', '') + p_id = plugin.get('id', '') + p_type = plugin.get('type', '') + if (p_id and p_id in current_action_ids) or p_name in current_action_names: + log_action_update(user_id=user_id, action_id=p_id, action_name=p_name, action_type=p_type, scope='personal') + else: + log_action_creation(user_id=user_id, action_id=p_id, action_name=p_name, action_type=p_type, scope='personal') + for action in plugins_to_delete: + action_id = action.get('id', '') + action_name = action.get('name', '') + log_action_deletion(user_id=user_id, action_id=action_id, action_name=action_name, scope='personal') + log_event("User plugins updated", extra={"user_id": user_id, "plugins_count": len(filtered_plugins)}) return jsonify({'success': True}) @@ -360,6 +444,7 @@ def delete_user_plugin(plugin_name): if not deleted: return jsonify({'error': 'Plugin not found.'}), 404 + log_action_deletion(user_id=user_id, action_id=plugin_name, action_name=plugin_name, scope='personal') log_event("User plugin deleted", extra={"user_id": user_id, "plugin_name": plugin_name}) return jsonify({'success': True}) @@ -460,6 +545,13 @@ def create_group_action_route(): for key in ('group_id', 'last_updated', 'user_id', 'is_global', 'is_group', 'scope'): payload.pop(key, None) + # Handle endpoint based on plugin type (same logic as personal plugins) + plugin_type = payload.get('type', '') + if plugin_type in ['sql_schema', 'sql_query']: + payload.setdefault('endpoint', f'sql://{plugin_type}') + elif plugin_type == 'msgraph': + payload.setdefault('endpoint', 
'https://graph.microsoft.com') + # Merge with schema to ensure all required fields are present (same as global actions) schema_dir = os.path.join(current_app.root_path, 'static', 'json', 'schemas') merged = get_merged_plugin_settings(payload.get('type'), payload, schema_dir) @@ -467,11 +559,12 @@ def create_group_action_route(): payload['additionalFields'] = merged.get('additionalFields', payload.get('additionalFields', {})) try: - saved = save_group_action(active_group, payload) + saved = save_group_action(active_group, payload, user_id=user_id) except Exception as exc: debug_print('Failed to save group action: %s', exc) return jsonify({'error': 'Unable to save action'}), 500 + log_action_creation(user_id=user_id, action_id=saved.get('id', ''), action_name=saved.get('name', ''), action_type=saved.get('type', ''), scope='group', group_id=active_group) return jsonify(saved), 201 @@ -516,6 +609,13 @@ def update_group_action_route(action_id): merged['is_group'] = True merged['id'] = existing.get('id', action_id) + # Handle endpoint based on plugin type (same logic as personal plugins) + plugin_type = merged.get('type', '') + if plugin_type in ['sql_schema', 'sql_query']: + merged.setdefault('endpoint', f'sql://{plugin_type}') + elif plugin_type == 'msgraph': + merged.setdefault('endpoint', 'https://graph.microsoft.com') + try: validate_group_action_payload(merged, partial=False) except ValueError as exc: @@ -528,11 +628,12 @@ def update_group_action_route(action_id): merged['additionalFields'] = schema_merged.get('additionalFields', merged.get('additionalFields', {})) try: - saved = save_group_action(active_group, merged) + saved = save_group_action(active_group, merged, user_id=user_id) except Exception as exc: debug_print('Failed to update group action %s: %s', action_id, exc) return jsonify({'error': 'Unable to update action'}), 500 + log_action_update(user_id=user_id, action_id=action_id, action_name=saved.get('name', ''), action_type=saved.get('type', ''), 
scope='group', group_id=active_group) return jsonify(saved), 200 @@ -563,6 +664,7 @@ def delete_group_action_route(action_id): if not removed: return jsonify({'error': 'Action not found'}), 404 + log_action_deletion(user_id=user_id, action_id=action_id, action_name=action_id, scope='group', group_id=active_group) return jsonify({'message': 'Action deleted'}), 200 @bpap.route('/api/user/plugins/types', methods=['GET']) @@ -588,6 +690,8 @@ def get_core_plugin_settings(): 'enable_text_plugin': bool(settings.get('enable_text_plugin', True)), 'enable_default_embedding_model_plugin': bool(settings.get('enable_default_embedding_model_plugin', True)), 'enable_fact_memory_plugin': bool(settings.get('enable_fact_memory_plugin', True)), + 'enable_tabular_processing_plugin': bool(settings.get('enable_tabular_processing_plugin', False)), + 'enable_enhanced_citations': bool(settings.get('enable_enhanced_citations', False)), 'enable_semantic_kernel': bool(settings.get('enable_semantic_kernel', False)), 'allow_user_plugins': bool(settings.get('allow_user_plugins', True)), 'allow_group_plugins': bool(settings.get('allow_group_plugins', True)), @@ -610,6 +714,7 @@ def update_core_plugin_settings(): 'enable_text_plugin', 'enable_default_embedding_model_plugin', 'enable_fact_memory_plugin', + 'enable_tabular_processing_plugin', 'allow_user_plugins', 'allow_group_plugins' ] @@ -627,6 +732,11 @@ def update_core_plugin_settings(): return jsonify({'error': f"Field '{key}' must be a boolean."}), 400 updates[key] = data[key] logging.info("Validated plugin settings: %s", updates) + # Dependency: tabular processing requires enhanced citations + if updates.get('enable_tabular_processing_plugin', False): + full_settings = get_settings() + if not full_settings.get('enable_enhanced_citations', False): + return jsonify({'error': 'Tabular Processing requires Enhanced Citations to be enabled.'}), 400 # Update settings success = update_settings(updates) if success: @@ -662,7 +772,7 @@ def 
add_plugin(): allowed_types = discover_plugin_types() validation_error = validate_plugin(new_plugin) if validation_error: - log_event("Add plugin failed: validation error", level=logging.WARNING, extra={"action": "add", "plugin": new_plugin, "error": validation_error}) + log_event("Add plugin failed: validation error", level=logging.WARNING, extra={"action": "add", "plugin": _redact_plugin_for_logging(new_plugin), "error": validation_error}) return jsonify({'error': validation_error}), 400 if allowed_types is not None and new_plugin.get('type') not in allowed_types: @@ -673,7 +783,7 @@ def add_plugin(): is_valid, validation_errors = PluginHealthChecker.validate_plugin_manifest(new_plugin, plugin_type) if not is_valid: log_event("Add plugin failed: manifest validation error", level=logging.WARNING, - extra={"action": "add", "plugin": new_plugin, "errors": validation_errors}) + extra={"action": "add", "plugin": _redact_plugin_for_logging(new_plugin), "errors": validation_errors}) return jsonify({'error': f"Manifest validation failed: {'; '.join(validation_errors)}"}), 400 # Merge with schema to ensure all required fields are present @@ -684,7 +794,7 @@ def add_plugin(): # Prevent duplicate names (case-insensitive) if any(p['name'].lower() == new_plugin['name'].lower() for p in plugins): - log_event("Add plugin failed: duplicate name", level=logging.WARNING, extra={"action": "add", "plugin": new_plugin}) + log_event("Add plugin failed: duplicate name", level=logging.WARNING, extra={"action": "add", "plugin": _redact_plugin_for_logging(new_plugin)}) return jsonify({'error': 'Plugin with this name already exists.'}), 400 # Assign a unique ID @@ -692,9 +802,10 @@ def add_plugin(): new_plugin['id'] = plugin_id # Save to global actions container - save_global_action(new_plugin) + save_global_action(new_plugin, user_id=str(get_current_user_id())) - log_event("Plugin added", extra={"action": "add", "plugin": new_plugin, "user": str(getattr(request, 'user', 'unknown'))}) + 
log_action_creation(user_id=str(get_current_user_id()), action_id=plugin_id, action_name=new_plugin.get('name', ''), action_type=new_plugin.get('type', ''), scope='global') + log_event("Plugin added", extra={"action": "add", "plugin": _redact_plugin_for_logging(new_plugin), "user": str(get_current_user_id())}) # --- HOT RELOAD TRIGGER --- setattr(builtins, "kernel_reload_needed", True) @@ -716,7 +827,7 @@ def edit_plugin(plugin_name): allowed_types = discover_plugin_types() validation_error = validate_plugin(updated_plugin) if validation_error: - log_event("Edit plugin failed: validation error", level=logging.WARNING, extra={"action": "edit", "plugin": updated_plugin, "error": validation_error}) + log_event("Edit plugin failed: validation error", level=logging.WARNING, extra={"action": "edit", "plugin": _redact_plugin_for_logging(updated_plugin), "error": validation_error}) return jsonify({'error': validation_error}), 400 if allowed_types is not None and updated_plugin.get('type') not in allowed_types: @@ -727,7 +838,7 @@ def edit_plugin(plugin_name): is_valid, validation_errors = PluginHealthChecker.validate_plugin_manifest(updated_plugin, plugin_type) if not is_valid: log_event("Edit plugin failed: manifest validation error", level=logging.WARNING, - extra={"action": "edit", "plugin": updated_plugin, "errors": validation_errors}) + extra={"action": "edit", "plugin": _redact_plugin_for_logging(updated_plugin), "errors": validation_errors}) return jsonify({'error': f"Manifest validation failed: {'; '.join(validation_errors)}"}), 400 # Merge with schema to ensure all required fields are present @@ -744,18 +855,24 @@ def edit_plugin(plugin_name): break if found_plugin: + duplicate_name = updated_plugin.get('name', '').lower() + if duplicate_name and any( + p.get('name', '').lower() == duplicate_name and p.get('id') != found_plugin.get('id') + for p in plugins + ): + log_event("Edit plugin failed: duplicate name", level=logging.WARNING, extra={"action": "edit", 
"plugin": _redact_plugin_for_logging(updated_plugin)}) + return jsonify({'error': 'Plugin with this name already exists.'}), 400 + # Preserve the existing ID if it exists if 'id' in found_plugin: updated_plugin['id'] = found_plugin['id'] else: updated_plugin['id'] = str(uuid.uuid4()) - # Delete old and save updated - if 'id' in found_plugin: - delete_global_action(found_plugin['id']) - save_global_action(updated_plugin) + save_global_action(updated_plugin, user_id=str(get_current_user_id())) - log_event("Plugin edited", extra={"action": "edit", "plugin": updated_plugin, "user": str(getattr(request, 'user', 'unknown'))}) + log_action_update(user_id=str(get_current_user_id()), action_id=updated_plugin.get('id', ''), action_name=plugin_name, action_type=updated_plugin.get('type', ''), scope='global') + log_event("Plugin edited", extra={"action": "edit", "plugin": _redact_plugin_for_logging(updated_plugin), "user": str(get_current_user_id())}) # --- HOT RELOAD TRIGGER --- setattr(builtins, "kernel_reload_needed", True) return jsonify({'success': True}) @@ -796,7 +913,8 @@ def delete_plugin(plugin_name): if 'id' in plugin_to_delete: delete_global_action(plugin_to_delete['id']) - log_event("Plugin deleted", extra={"action": "delete", "plugin_name": plugin_name, "user": str(getattr(request, 'user', 'unknown'))}) + log_action_deletion(user_id=str(get_current_user_id()), action_id=plugin_to_delete.get('id', ''), action_name=plugin_name, action_type=plugin_to_delete.get('type', ''), scope='global') + log_event("Plugin deleted", extra={"action": "delete", "plugin_name": plugin_name, "user": str(get_current_user_id())}) # --- HOT RELOAD TRIGGER --- setattr(builtins, "kernel_reload_needed", True) return jsonify({'success': True}) @@ -928,4 +1046,150 @@ def _merge_group_and_global_actions(group_actions, global_actions): return normalized_actions +@bpap.route('/api/plugins/test-sql-connection', methods=['POST']) +@swagger_route(security=get_auth_security()) +@login_required 
+@user_required +def test_sql_connection(): + """Test a SQL database connection using provided configuration.""" + data = request.get_json(silent=True) or {} + user_id = get_current_user_id() + database_type = (data.get('database_type') or 'sqlserver').lower() + connection_method = data.get('connection_method', 'parameters') + connection_string = data.get('connection_string', '') + server = data.get('server', '') + database = data.get('database', '') + port = data.get('port', '') + driver = data.get('driver', '') + username = data.get('username', '') + password = data.get('password', '') + auth_type = data.get('auth_type', 'username_password') + timeout = min(int(data.get('timeout', 10)), 15) # Cap at 15 seconds for test + + try: + existing_plugin = _load_existing_plugin_for_sql_test(data.get('existing_plugin'), user_id) + except PermissionError as exc: + return jsonify({'success': False, 'error': str(exc)}), 403 + except LookupError as exc: + return jsonify({'success': False, 'error': str(exc)}), 404 + except ValueError as exc: + return jsonify({'success': False, 'error': str(exc)}), 400 + + existing_additional_fields = {} + if isinstance(existing_plugin, dict) and isinstance(existing_plugin.get('additionalFields'), dict): + existing_additional_fields = existing_plugin['additionalFields'] + + if connection_string == ui_trigger_word: + connection_string = existing_additional_fields.get('connection_string', '') + if password == ui_trigger_word: + password = existing_additional_fields.get('password', '') + + unresolved_fields = [] + if connection_string == ui_trigger_word: + unresolved_fields.append('connection string') + if password == ui_trigger_word: + unresolved_fields.append('password') + if unresolved_fields: + field_list = ', '.join(unresolved_fields) + return jsonify({'success': False, 'error': f"Stored SQL secret could not be resolved for testing. 
Re-enter the {field_list}."}), 400 + + try: + connection_string = _resolve_secret_value_for_sql_test(connection_string, 'connection_string') + password = _resolve_secret_value_for_sql_test(password, 'password') + except ValueError as exc: + return jsonify({'success': False, 'error': str(exc)}), 400 + + # Map azure_sql to sqlserver + if database_type in ('azure_sql', 'azuresql'): + database_type = 'sqlserver' + + try: + if database_type == 'sqlserver': + import pyodbc + if connection_method == 'connection_string' and connection_string: + conn = pyodbc.connect(connection_string, timeout=timeout) + else: + if not server or not database: + return jsonify({'success': False, 'error': 'Server and database are required for individual parameters connection.'}), 400 + drv = driver or 'ODBC Driver 17 for SQL Server' + # ODBC expects the port on the SERVER value as "host,port" + server_spec = f"{server},{port}" if port else server + conn_str = f"DRIVER={{{drv}}};SERVER={server_spec};DATABASE={database}" + if auth_type == 'username_password' and username and password: + conn_str += f";UID={username};PWD={password}" + elif auth_type == 'managed_identity': + conn_str += ";Authentication=ActiveDirectoryMsi" + elif auth_type == 'integrated': + conn_str += ";Trusted_Connection=yes" + conn = pyodbc.connect(conn_str, timeout=timeout) + cursor = conn.cursor() + cursor.execute("SELECT 1") + cursor.close() + conn.close() + return jsonify({'success': True, 'message': f'Successfully connected to {data.get("database", "database")} on {data.get("server", "server")}.'}) + + elif database_type == 'postgresql': + import psycopg2 + if connection_method == 'connection_string' and connection_string: + conn = psycopg2.connect(connection_string, connect_timeout=timeout) + else: + if not server or not database: + return jsonify({'success': False, 'error': 'Server and database are required.'}), 400 + conn_params = {'host': server, 'database': database, 'connect_timeout': timeout} + if port: + conn_params['port'] = int(port) + if username: + conn_params['user'] = username + if 
password: + conn_params['password'] = password + conn = psycopg2.connect(**conn_params) + cursor = conn.cursor() + cursor.execute("SELECT 1") + cursor.close() + conn.close() + return jsonify({'success': True, 'message': f'Successfully connected to PostgreSQL database {data.get("database", "")}.'}) + + elif database_type == 'mysql': + import pymysql + if connection_method == 'connection_string' and connection_string: + # pymysql doesn't natively parse connection strings, so use params + return jsonify({'success': False, 'error': 'MySQL test connection requires individual parameters, not a connection string.'}), 400 + if not server or not database: + return jsonify({'success': False, 'error': 'Server and database are required.'}), 400 + conn_params = {'host': server, 'database': database, 'connect_timeout': timeout} + if port: + conn_params['port'] = int(port) + if username: + conn_params['user'] = username + if password: + conn_params['password'] = password + conn = pymysql.connect(**conn_params) + cursor = conn.cursor() + cursor.execute("SELECT 1") + cursor.close() + conn.close() + return jsonify({'success': True, 'message': f'Successfully connected to MySQL database {data.get("database", "")}.'}) + + elif database_type == 'sqlite': + import sqlite3 + db_path = connection_string or database + if not db_path: + return jsonify({'success': False, 'error': 'Database path is required for SQLite.'}), 400 + conn = sqlite3.connect(db_path, timeout=timeout) + cursor = conn.cursor() + cursor.execute("SELECT 1") + cursor.close() + conn.close() + return jsonify({'success': True, 'message': f'Successfully connected to SQLite database.'}) + else: + return jsonify({'success': False, 'error': f'Unsupported database type: {database_type}'}), 400 + + except ImportError as e: + return jsonify({'success': False, 'error': f'Database driver not installed: {str(e)}'}), 400 + except Exception as e: + error_msg = str(e) + # Sanitize error message to avoid leaking sensitive details + if 
'password' in error_msg.lower() or 'pwd' in error_msg.lower(): + error_msg = 'Authentication failed. Please check your credentials.' + return jsonify({'success': False, 'error': f'Connection failed: {error_msg}'}), 400 diff --git a/application/single_app/route_backend_public_workspaces.py b/application/single_app/route_backend_public_workspaces.py index ffe679eb..6d14f357 100644 --- a/application/single_app/route_backend_public_workspaces.py +++ b/application/single_app/route_backend_public_workspaces.py @@ -44,12 +44,7 @@ def get_user_details_from_graph(user_id): if not token: return {"displayName": "", "email": ""} - if AZURE_ENVIRONMENT == "usgovernment": - user_endpoint = f"https://graph.microsoft.us/v1.0/users/{user_id}" - elif AZURE_ENVIRONMENT == "custom": - user_endpoint = f"{CUSTOM_GRAPH_URL_VALUE}/{user_id}" - else: - user_endpoint = f"https://graph.microsoft.com/v1.0/users/{user_id}" + user_endpoint = get_graph_endpoint(f"/users/{user_id}") headers = { "Authorization": f"Bearer {token}", diff --git a/application/single_app/route_backend_settings.py b/application/single_app/route_backend_settings.py index 7be73134..aefb2f12 100644 --- a/application/single_app/route_backend_settings.py +++ b/application/single_app/route_backend_settings.py @@ -706,9 +706,18 @@ def _test_redis_connection(payload): cache_endpoint = get_redis_cache_infrastructure_endpoint(redis_hostname) token = credential.get_token(cache_endpoint) redis_password = token.token + elif redis_auth_type == 'key_vault': + if not redis_key: + return jsonify({'error': 'Key Vault secret name is required for Key Vault authentication'}), 400 + try: + from functions_keyvault import retrieve_secret_direct + redis_password = retrieve_secret_direct(redis_key) + except Exception as kv_err: + log_event(f"[REDIS_TEST] Key Vault retrieval failed for secret '{redis_key}': {str(kv_err)}", level="error") + return jsonify({'error': 'Failed to retrieve Redis key from Key Vault. 
Check Application Insights using "[REDIS_TEST]" for details.'}), 500 else: if not redis_key: - return jsonify({'error': 'Redis key is required for key auth'}), 400 + return jsonify({'error': 'Redis key is required for key authentication'}), 400 redis_password = redis_key r = redis.Redis( @@ -1043,4 +1052,4 @@ def _test_key_vault_connection(payload): except Exception as e: log_event(f"[AKV_TEST] Key Vault connection error: {str(e)}", level="error") - return jsonify({'error': f'Key Vault connection error. Check Application Insights using "[AKV_TEST]" for details.'}), 500 \ No newline at end of file + return jsonify({'error': 'Key Vault connection failed. Check Application Insights using "[AKV_TEST]" for details.'}), 500 \ No newline at end of file diff --git a/application/single_app/route_backend_thoughts.py b/application/single_app/route_backend_thoughts.py new file mode 100644 index 00000000..a7624a3f --- /dev/null +++ b/application/single_app/route_backend_thoughts.py @@ -0,0 +1,80 @@ +# route_backend_thoughts.py + +from flask import request, jsonify +from functions_authentication import login_required, user_required, get_current_user_id +from functions_settings import get_settings +from functions_thoughts import get_thoughts_for_message, get_pending_thoughts +from swagger_wrapper import swagger_route, get_auth_security +from functions_appinsights import log_event + + +def register_route_backend_thoughts(app): + + @app.route('/api/conversations/<conversation_id>/messages/<message_id>/thoughts', methods=['GET']) + @swagger_route(security=get_auth_security()) + @login_required + @user_required + def api_get_message_thoughts(conversation_id, message_id): + """Return persisted thoughts for a specific assistant message.""" + user_id = get_current_user_id() + if not user_id: + return jsonify({'error': 'User not authenticated'}), 401 + + settings = get_settings() + if not settings.get('enable_thoughts', False): + return jsonify({'thoughts': [], 'enabled': False}), 200 + + try: + thoughts = 
get_thoughts_for_message(conversation_id, message_id, user_id) + # Strip internal Cosmos fields before returning + sanitized = [] + for t in thoughts: + sanitized.append({ + 'id': t.get('id'), + 'step_index': t.get('step_index'), + 'step_type': t.get('step_type'), + 'content': t.get('content'), + 'detail': t.get('detail'), + 'duration_ms': t.get('duration_ms'), + 'timestamp': t.get('timestamp') + }) + return jsonify({'thoughts': sanitized, 'enabled': True}), 200 + except Exception as e: + log_event(f"api_get_message_thoughts error: {e}", level="WARNING") + return jsonify({'error': 'Failed to retrieve thoughts'}), 500 + + @app.route('/api/conversations/<conversation_id>/thoughts/pending', methods=['GET']) + @swagger_route(security=get_auth_security()) + @login_required + @user_required + def api_get_pending_thoughts(conversation_id): + """Return the latest in-progress thoughts for a conversation. + + Used by the non-streaming frontend to poll for thought updates + while waiting for the chat response. + """ + user_id = get_current_user_id() + if not user_id: + return jsonify({'error': 'User not authenticated'}), 401 + + settings = get_settings() + if not settings.get('enable_thoughts', False): + return jsonify({'thoughts': [], 'enabled': False}), 200 + + try: + thoughts = get_pending_thoughts(conversation_id, user_id) + sanitized = [] + for t in thoughts: + sanitized.append({ + 'id': t.get('id'), + 'step_index': t.get('step_index'), + 'step_type': t.get('step_type'), + 'content': t.get('content'), + 'detail': t.get('detail'), + 'duration_ms': t.get('duration_ms'), + 'timestamp': t.get('timestamp') + }) + return jsonify({'thoughts': sanitized, 'enabled': True}), 200 + except Exception as e: + log_event(f"api_get_pending_thoughts error: {e}", level="WARNING") + return jsonify({'error': 'Failed to retrieve pending thoughts'}), 500 diff --git a/application/single_app/route_backend_user_agreement.py b/application/single_app/route_backend_user_agreement.py index f46559ff..b76213b3 100644 
--- a/application/single_app/route_backend_user_agreement.py +++ b/application/single_app/route_backend_user_agreement.py @@ -130,7 +130,7 @@ def api_accept_user_agreement(): return jsonify({"error": "workspace_id and workspace_type are required"}), 400 # Validate workspace type - valid_types = ["personal", "group", "public"] + valid_types = ["personal", "group", "public", "chat"] if workspace_type not in valid_types: return jsonify({"error": f"Invalid workspace_type. Must be one of: {', '.join(valid_types)}"}), 400 diff --git a/application/single_app/route_backend_users.py b/application/single_app/route_backend_users.py index d2ca52f8..ad3ae5f4 100644 --- a/application/single_app/route_backend_users.py +++ b/application/single_app/route_backend_users.py @@ -24,12 +24,7 @@ def api_user_search(): if not token: return jsonify({"error": "Could not acquire access token"}), 401 - if AZURE_ENVIRONMENT == "usgovernment": - user_endpoint = "https://graph.microsoft.us/v1.0/users" - elif AZURE_ENVIRONMENT == "custom": - user_endpoint = CUSTOM_GRAPH_URL_VALUE - else: - user_endpoint = "https://graph.microsoft.com/v1.0/users" + user_endpoint = get_graph_endpoint("/users") headers = { "Authorization": f"Bearer {token}", diff --git a/application/single_app/route_enhanced_citations.py b/application/single_app/route_enhanced_citations.py index c81ef225..44fc4fcb 100644 --- a/application/single_app/route_enhanced_citations.py +++ b/application/single_app/route_enhanced_citations.py @@ -8,6 +8,7 @@ import requests import mimetypes import io +import pandas from functions_authentication import login_required, user_required, get_current_user_id from functions_settings import get_settings, enabled_required @@ -15,7 +16,7 @@ from functions_group import get_user_groups from functions_public_workspaces import get_user_visible_public_workspace_ids_from_settings from swagger_wrapper import swagger_route, get_auth_security -from config import CLIENTS, 
storage_account_user_documents_container_name, storage_account_group_documents_container_name, storage_account_public_documents_container_name, IMAGE_EXTENSIONS, VIDEO_EXTENSIONS, AUDIO_EXTENSIONS +from config import CLIENTS, storage_account_user_documents_container_name, storage_account_group_documents_container_name, storage_account_public_documents_container_name, storage_account_personal_chat_container_name, IMAGE_EXTENSIONS, VIDEO_EXTENSIONS, AUDIO_EXTENSIONS, TABULAR_EXTENSIONS, cosmos_messages_container, cosmos_conversations_container from functions_debug import debug_print def register_enhanced_citations_routes(app): @@ -183,6 +184,274 @@ def get_enhanced_citation_pdf(): except Exception as e: return jsonify({"error": str(e)}), 500 + @app.route("/api/enhanced_citations/tabular", methods=["GET"]) + @swagger_route(security=get_auth_security()) + @login_required + @user_required + @enabled_required("enable_enhanced_citations") + def get_enhanced_citation_tabular(): + """ + Serve original tabular file (CSV, XLSX, etc.) from blob storage for download. + Used for chat-uploaded tabular files stored in blob storage. 
+ """ + conversation_id = request.args.get("conversation_id") + file_id = request.args.get("file_id") + + if not conversation_id or not file_id: + return jsonify({"error": "conversation_id and file_id are required"}), 400 + + user_id = get_current_user_id() + if not user_id: + return jsonify({"error": "User not authenticated"}), 401 + + try: + # Verify the current user owns the conversation + try: + conversation = cosmos_conversations_container.read_item( + item=conversation_id, + partition_key=conversation_id + ) + except Exception: + return jsonify({"error": "Conversation not found"}), 404 + + if conversation.get('user_id') != user_id: + return jsonify({"error": "Forbidden"}), 403 + + # Look up the file message in Cosmos to get blob reference + query_str = """ + SELECT * FROM c + WHERE c.conversation_id = @conversation_id + AND c.id = @file_id + """ + items = list(cosmos_messages_container.query_items( + query=query_str, + parameters=[ + {'name': '@conversation_id', 'value': conversation_id}, + {'name': '@file_id', 'value': file_id} + ], + partition_key=conversation_id + )) + + if not items: + return jsonify({"error": "File not found"}), 404 + + file_msg = items[0] + file_content_source = file_msg.get('file_content_source', '') + + if file_content_source != 'blob': + return jsonify({"error": "File is not stored in blob storage"}), 400 + + blob_container = file_msg.get('blob_container', '') + blob_path = file_msg.get('blob_path', '') + filename = file_msg.get('filename', 'download') + + if not blob_container or not blob_path: + return jsonify({"error": "Blob reference is incomplete"}), 500 + + blob_service_client = CLIENTS.get("storage_account_office_docs_client") + if not blob_service_client: + return jsonify({"error": "Storage not available"}), 500 + + blob_client = blob_service_client.get_blob_client( + container=blob_container, + blob=blob_path + ) + stream = blob_client.download_blob() + content = stream.readall() + + # Determine content type + content_type, 
_ = mimetypes.guess_type(filename) + if not content_type: + content_type = 'application/octet-stream' + + return Response( + content, + content_type=content_type, + headers={ + 'Content-Length': str(len(content)), + 'Content-Disposition': f'attachment; filename="{filename}"', + 'Cache-Control': 'private, max-age=300', + } + ) + + except Exception as e: + debug_print(f"Error serving tabular citation: {e}") + return jsonify({"error": str(e)}), 500 + + @app.route("/api/enhanced_citations/tabular_workspace", methods=["GET"]) + @swagger_route(security=get_auth_security()) + @login_required + @user_required + @enabled_required("enable_enhanced_citations") + def get_enhanced_citation_tabular_workspace(): + """ + Serve tabular file (CSV, XLSX, etc.) from blob storage for workspace documents. + Uses doc_id to look up the document across personal, group, and public workspaces. + """ + doc_id = request.args.get("doc_id") + if not doc_id: + return jsonify({"error": "doc_id is required"}), 400 + + user_id = get_current_user_id() + if not user_id: + return jsonify({"error": "User not authenticated"}), 401 + + try: + doc_response, status_code = get_document(user_id, doc_id) + if status_code != 200: + return doc_response, status_code + + raw_doc = doc_response.get_json() + file_name = raw_doc.get('file_name', '') + ext = file_name.lower().split('.')[-1] if '.' 
in file_name else '' + + if ext not in ('csv', 'xlsx', 'xls', 'xlsm'): + return jsonify({"error": "File is not a tabular file"}), 400 + + return serve_enhanced_citation_content(raw_doc, force_download=True) + + except Exception as e: + debug_print(f"Error serving tabular workspace citation: {e}") + return jsonify({"error": str(e)}), 500 + + @app.route("/api/enhanced_citations/tabular_preview", methods=["GET"]) + @swagger_route(security=get_auth_security()) + @login_required + @user_required + @enabled_required("enable_enhanced_citations") + def get_enhanced_citation_tabular_preview(): + """ + Return JSON preview of a tabular file for rendering as an HTML table. + Reads the file into a pandas DataFrame and returns columns + rows as JSON. + """ + doc_id = request.args.get("doc_id") + sheet_name = request.args.get("sheet_name") + sheet_index = request.args.get("sheet_index") + max_rows = min(request.args.get("max_rows", 200, type=int), 500) + if not doc_id: + return jsonify({"error": "doc_id is required"}), 400 + + user_id = get_current_user_id() + if not user_id: + return jsonify({"error": "User not authenticated"}), 401 + + try: + doc_response, status_code = get_document(user_id, doc_id) + if status_code != 200: + return doc_response, status_code + + raw_doc = doc_response.get_json() + file_name = raw_doc.get('file_name', '') + ext = file_name.lower().rsplit('.', 1)[-1] if '.' 
in file_name else '' + if ext not in ('csv', 'xlsx', 'xls', 'xlsm'): + return jsonify({"error": "File is not a tabular file"}), 400 + + # Download blob with size cap to protect memory + settings = get_settings() + max_blob_size = int(settings.get('tabular_preview_max_blob_size_mb', 200)) * 1024 * 1024 + workspace_type, container_name = determine_workspace_type_and_container(raw_doc) + blob_name = get_blob_name(raw_doc, workspace_type) + blob_service_client = CLIENTS.get("storage_account_office_docs_client") + if not blob_service_client: + return jsonify({"error": "Blob storage client not available"}), 500 + blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name) + blob_props = blob_client.get_blob_properties() + if blob_props.size > max_blob_size: + return jsonify({"error": "File is too large to preview"}), 400 + data = blob_client.download_blob().readall() + + # Read into DataFrame, limiting rows for preview efficiency + # Read max_rows + 1 so we can detect truncation without loading the full file + nrows_limit = max_rows + 1 + selected_sheet = None + sheet_names = [] + if ext == 'csv': + df = pandas.read_csv(io.BytesIO(data), keep_default_na=False, dtype=str, nrows=nrows_limit) + elif ext in ('xlsx', 'xlsm'): + excel_file = pandas.ExcelFile(io.BytesIO(data), engine='openpyxl') + sheet_names = list(excel_file.sheet_names) + if not sheet_names: + return jsonify({"error": "Workbook does not contain any readable sheets"}), 400 + + if sheet_name: + requested_sheet_name = sheet_name.strip() + matching_sheet_name = next( + (candidate for candidate in sheet_names if candidate.lower() == requested_sheet_name.lower()), + None, + ) + if not matching_sheet_name: + return jsonify({ + "error": f"Sheet '{requested_sheet_name}' was not found. 
Available sheets: {sheet_names}" + }), 400 + selected_sheet = matching_sheet_name + elif sheet_index not in (None, ''): + try: + resolved_sheet_index = int(sheet_index) + except ValueError: + return jsonify({"error": "sheet_index must be an integer"}), 400 + if resolved_sheet_index < 0 or resolved_sheet_index >= len(sheet_names): + return jsonify({ + "error": f"sheet_index {resolved_sheet_index} is out of range. Available sheets: {sheet_names}" + }), 400 + selected_sheet = sheet_names[resolved_sheet_index] + else: + selected_sheet = sheet_names[0] + + df = excel_file.parse(selected_sheet, keep_default_na=False, dtype=str, nrows=nrows_limit) + elif ext == 'xls': + excel_file = pandas.ExcelFile(io.BytesIO(data), engine='xlrd') + sheet_names = list(excel_file.sheet_names) + if not sheet_names: + return jsonify({"error": "Workbook does not contain any readable sheets"}), 400 + + if sheet_name: + requested_sheet_name = sheet_name.strip() + matching_sheet_name = next( + (candidate for candidate in sheet_names if candidate.lower() == requested_sheet_name.lower()), + None, + ) + if not matching_sheet_name: + return jsonify({ + "error": f"Sheet '{requested_sheet_name}' was not found. Available sheets: {sheet_names}" + }), 400 + selected_sheet = matching_sheet_name + elif sheet_index not in (None, ''): + try: + resolved_sheet_index = int(sheet_index) + except ValueError: + return jsonify({"error": "sheet_index must be an integer"}), 400 + if resolved_sheet_index < 0 or resolved_sheet_index >= len(sheet_names): + return jsonify({ + "error": f"sheet_index {resolved_sheet_index} is out of range. 
Available sheets: {sheet_names}" + }), 400 + selected_sheet = sheet_names[resolved_sheet_index] + else: + selected_sheet = sheet_names[0] + + df = excel_file.parse(selected_sheet, keep_default_na=False, dtype=str, nrows=nrows_limit) + else: + return jsonify({"error": f"Unsupported file type: {ext}"}), 400 + + total_rows = len(df) + truncated = total_rows > max_rows + preview = df.head(max_rows) + + return jsonify({ + "filename": file_name, + "selected_sheet": selected_sheet, + "sheet_names": sheet_names, + "sheet_count": len(sheet_names), + "total_rows": total_rows if not truncated else None, + "total_columns": len(df.columns), + "columns": list(df.columns), + "rows": preview.values.tolist(), + "truncated": truncated + }) + + except Exception as e: + debug_print(f"Error generating tabular preview: {e}") + return jsonify({"error": str(e)}), 500 + def get_document(user_id, doc_id): """ Get document metadata - searches across all enabled workspace types diff --git a/application/single_app/route_frontend_admin_settings.py b/application/single_app/route_frontend_admin_settings.py index 578e1545..b9e69c51 100644 --- a/application/single_app/route_frontend_admin_settings.py +++ b/application/single_app/route_frontend_admin_settings.py @@ -9,10 +9,24 @@ from swagger_wrapper import swagger_route, get_auth_security from datetime import datetime, timedelta +ALLOWED_PIL_IMAGE_UPLOAD_FORMATS = ('PNG', 'JPEG') + def allowed_file(filename, allowed_extensions): return '.' in filename and \ filename.rsplit('.', 1)[1].lower() in allowed_extensions +def open_allowed_uploaded_image(file_bytes, filename): + img = Image.open(BytesIO(file_bytes), formats=list(ALLOWED_PIL_IMAGE_UPLOAD_FORMATS)) + img.load() + + detected_format = (img.format or '').upper() + if detected_format not in ALLOWED_PIL_IMAGE_UPLOAD_FORMATS: + raise ValueError( + f"Unsupported image format for {filename}. 
Allowed formats: {', '.join(ALLOWED_PIL_IMAGE_UPLOAD_FORMATS)}" + ) + + return img, detected_format + def register_route_frontend_admin_settings(app): @app.route('/admin/settings', methods=['GET', 'POST']) @swagger_route(security=get_auth_security()) @@ -98,6 +112,8 @@ def admin_settings(): settings['enable_text_plugin'] = False if 'enable_fact_memory_plugin' not in settings: settings['enable_fact_memory_plugin'] = False + if 'enable_tabular_processing_plugin' not in settings: + settings['enable_tabular_processing_plugin'] = False if 'enable_default_embedding_model_plugin' not in settings: settings['enable_default_embedding_model_plugin'] = False if 'enable_multi_agent_orchestration' not in settings: @@ -787,6 +803,7 @@ def is_valid_url(url): 'enable_enhanced_citations': enable_enhanced_citations, 'enable_enhanced_citations_mount': form_data.get('enable_enhanced_citations_mount') == 'on' and enable_enhanced_citations, 'enhanced_citations_mount': form_data.get('enhanced_citations_mount', '/view_documents').strip(), + 'tabular_preview_max_blob_size_mb': int(form_data.get('tabular_preview_max_blob_size_mb', 200)), 'office_docs_storage_account_blob_endpoint': office_docs_storage_account_blob_endpoint, 'office_docs_storage_account_url': office_docs_storage_account_url, 'office_docs_authentication_type': form_data.get('office_docs_authentication_type', 'key'), @@ -809,9 +826,10 @@ def is_valid_url(url): 'require_member_of_safety_violation_admin': require_member_of_safety_violation_admin, # ADDED 'require_member_of_feedback_admin': require_member_of_feedback_admin, # ADDED - # Feedback & Archiving + # Feedback, Archiving & Thoughts 'enable_user_feedback': form_data.get('enable_user_feedback') == 'on', 'enable_conversation_archiving': form_data.get('enable_conversation_archiving') == 'on', + 'enable_thoughts': form_data.get('enable_thoughts') == 'on', # Search (Web Search via Azure AI Foundry agent) 'enable_web_search': enable_web_search, @@ -869,6 +887,7 @@ def 
is_valid_url(url): 'max_file_size_mb': max_file_size_mb, 'conversation_history_limit': conversation_history_limit, 'default_system_prompt': form_data.get('default_system_prompt', '').strip(), + 'access_denied_message': form_data.get('access_denied_message', settings.get('access_denied_message', '')).strip(), # Video file settings with Azure Video Indexer Settings 'video_indexer_endpoint': form_data.get('video_indexer_endpoint', video_indexer_endpoint).strip(), @@ -938,13 +957,12 @@ def is_valid_url(url): ) # 3) Load into Pillow from the original bytes for processing - in_memory_for_process = BytesIO(file_bytes) # Use original bytes - img = Image.open(in_memory_for_process) + img, detected_format = open_allowed_uploaded_image(file_bytes, logo_file.filename) add_file_task_to_file_processing_log( document_id='Image_Upload', # Placeholder if needed user_id='New_image', - content=f"Loaded image for processing: {logo_file.filename}" + content=f"Loaded image for processing: {logo_file.filename} (format: {detected_format})" ) # Ensure image mode is compatible (e.g., convert palette modes) @@ -1021,13 +1039,12 @@ def is_valid_url(url): ) # 2) Load into Pillow from the original bytes for processing - in_memory_for_process = BytesIO(file_bytes) # Use original bytes - img = Image.open(in_memory_for_process) + img, detected_format = open_allowed_uploaded_image(file_bytes, logo_dark_file.filename) add_file_task_to_file_processing_log( document_id='Image_Upload', # Placeholder if needed user_id='New_image', - content=f"Loaded dark mode logo image for processing: {logo_dark_file.filename}" + content=f"Loaded dark mode logo image for processing: {logo_dark_file.filename} (format: {detected_format})" ) # 3) Ensure image mode is compatible (e.g., convert palette modes) @@ -1103,13 +1120,12 @@ def is_valid_url(url): ) # 2) Load into Pillow from the original bytes for processing - in_memory_for_process = BytesIO(file_bytes) # Use original bytes - img = Image.open(in_memory_for_process) 
+ img, detected_format = open_allowed_uploaded_image(file_bytes, favicon_file.filename) add_file_task_to_file_processing_log( document_id='Image_Upload', # Placeholder if needed user_id='New_image', - content=f"Loaded favicon image for processing: {favicon_file.filename}" + content=f"Loaded favicon image for processing: {favicon_file.filename} (format: {detected_format})" ) # 3) Ensure image mode is compatible (e.g., convert palette modes) diff --git a/application/single_app/route_frontend_chats.py b/application/single_app/route_frontend_chats.py index ca0feb1a..67a41879 100644 --- a/application/single_app/route_frontend_chats.py +++ b/application/single_app/route_frontend_chats.py @@ -237,8 +237,33 @@ def upload_file(): # Handle XML, YAML, and LOG files as text for inline chat extracted_content = extract_text_file(temp_file_path) elif file_ext_nodot in TABULAR_EXTENSIONS: - extracted_content = extract_table_file(temp_file_path, file_ext) is_table = True + + # Upload tabular file to blob storage for tabular processing plugin access + if settings.get('enable_enhanced_citations', False): + try: + blob_service_client = CLIENTS.get("storage_account_office_docs_client") + if blob_service_client: + blob_path = f"{user_id}/{conversation_id}/{filename}" + blob_client = blob_service_client.get_blob_client( + container=storage_account_personal_chat_container_name, + blob=blob_path + ) + metadata = { + "conversation_id": str(conversation_id), + "user_id": str(user_id) + } + with open(temp_file_path, "rb") as blob_f: + blob_client.upload_blob(blob_f, overwrite=True, metadata=metadata) + log_event(f"Uploaded chat tabular file to blob storage: {blob_path}") + except Exception as blob_err: + log_event( + f"Warning: Failed to upload chat tabular file to blob storage: {blob_err}", + level=logging.WARNING + ) + else: + # Only extract content for Cosmos storage when enhanced citations is disabled + extracted_content = extract_table_file(temp_file_path, file_ext) else: return 
jsonify({'error': 'Unsupported file type'}), 400 @@ -395,25 +420,50 @@ def upload_file(): current_thread_id = str(uuid.uuid4()) - file_message = { - 'id': file_message_id, - 'conversation_id': conversation_id, - 'role': 'file', - 'filename': filename, - 'file_content': extracted_content, - 'is_table': is_table, - 'timestamp': datetime.utcnow().isoformat(), - 'model_deployment_name': None, - 'metadata': { - 'thread_info': { - 'thread_id': current_thread_id, - 'previous_thread_id': previous_thread_id, - 'active_thread': True, - 'thread_attempt': 1 + # When enhanced citations is enabled and file is tabular, store a lightweight + # reference without file_content to avoid Cosmos DB size limits. + # The tabular data lives in blob storage and is served from there. + if is_table and settings.get('enable_enhanced_citations', False): + file_message = { + 'id': file_message_id, + 'conversation_id': conversation_id, + 'role': 'file', + 'filename': filename, + 'is_table': is_table, + 'file_content_source': 'blob', + 'blob_container': storage_account_personal_chat_container_name, + 'blob_path': f"{user_id}/{conversation_id}/{filename}", + 'timestamp': datetime.utcnow().isoformat(), + 'model_deployment_name': None, + 'metadata': { + 'thread_info': { + 'thread_id': current_thread_id, + 'previous_thread_id': previous_thread_id, + 'active_thread': True, + 'thread_attempt': 1 + } } } - } - + else: + file_message = { + 'id': file_message_id, + 'conversation_id': conversation_id, + 'role': 'file', + 'filename': filename, + 'file_content': extracted_content, + 'is_table': is_table, + 'timestamp': datetime.utcnow().isoformat(), + 'model_deployment_name': None, + 'metadata': { + 'thread_info': { + 'thread_id': current_thread_id, + 'previous_thread_id': previous_thread_id, + 'active_thread': True, + 'thread_attempt': 1 + } + } + } + # Add vision analysis if available if vision_analysis: file_message['vision_analysis'] = vision_analysis diff --git a/application/single_app/route_openapi.py 
b/application/single_app/route_openapi.py index 238e9a4c..16b08550 100644 --- a/application/single_app/route_openapi.py +++ b/application/single_app/route_openapi.py @@ -2,19 +2,17 @@ """ OpenAPI Plugin Routes -This module provides routes for managing OpenAPI plugin file uploads and URL validation. +This module provides routes for managing OpenAPI plugin file uploads. """ import os import tempfile import uuid from flask import request, jsonify, current_app -from werkzeug.utils import secure_filename from functions_authentication import login_required, user_required from openapi_security import openapi_validator from openapi_auth_analyzer import analyze_openapi_authentication, get_authentication_help_text from swagger_wrapper import swagger_route, get_auth_security -from functions_security import is_valid_storage_name from functions_debug import debug_print def register_openapi_routes(app): @@ -136,214 +134,6 @@ def upload_openapi_spec(): 'error': 'Internal server error during upload' }), 500 - @app.route('/api/openapi/validate-url', methods=['POST']) - @swagger_route(security=get_auth_security()) - @login_required - @user_required - def validate_openapi_url(): - """ - Validate and download an OpenAPI specification from a URL. 
- - Expected JSON data: - - url: The URL to the OpenAPI specification - - Returns: - - success: Boolean indicating if validation was successful - - file_id: The unique file ID for the stored specification - - api_info: Basic information about the OpenAPI spec - - error: Error message if validation failed - """ - try: - data = request.get_json() - if not data or 'url' not in data: - return jsonify({ - 'success': False, - 'error': 'URL is required' - }), 400 - - url = data['url'].strip() - if not url: - return jsonify({ - 'success': False, - 'error': 'URL cannot be empty' - }), 400 - - # Validate URL and fetch content - valid, spec, error = openapi_validator.validate_url_content(url) - - if not valid: - return jsonify({ - 'success': False, - 'error': f'Validation failed: {error}' - }), 400 - - # Generate filename from URL or spec title - info = spec.get('info', {}) - title = info.get('title', 'openapi_spec') - # Sanitize title for filename - title = secure_filename(title) or 'openapi_spec' - safe_filename = f"{title}.yaml" - - # Create secure storage directory - upload_dir = os.path.join(current_app.instance_path, 'openapi_specs') - os.makedirs(upload_dir, exist_ok=True) - - # Generate unique filename to prevent conflicts - unique_id = str(uuid.uuid4())[:8] - base_name, ext = os.path.splitext(safe_filename) - stored_filename = f"{base_name}_{unique_id}{ext}" - if not is_valid_storage_name(stored_filename): - return jsonify({ - 'success': False, - 'error': 'Invalid storage filename' - }), 400 - storage_path = os.path.join(upload_dir, stored_filename) - - # Save spec to file - import yaml - with open(storage_path, 'w', encoding='utf-8') as f: - yaml.dump(spec, f, default_flow_style=False, allow_unicode=True) - - # Extract basic spec information - api_info = { - 'title': info.get('title', 'Unknown API'), - 'description': info.get('description', ''), - 'version': info.get('version', ''), - 'openapi_version': spec.get('openapi', ''), - 'servers': spec.get('servers', []), 
- 'paths_count': len(spec.get('paths', {})), - 'components_count': len(spec.get('components', {})), - 'source_url': url - } - - # Analyze authentication schemes - auth_analysis = analyze_openapi_authentication(spec) - - return jsonify({ - 'success': True, - 'file_id': stored_filename, - 'api_info': api_info, - 'spec_content': spec, # Include the spec content for frontend processing - 'authentication': auth_analysis - }) - - except Exception as e: - debug_print(f"Error validating OpenAPI URL: {str(e)}") - return jsonify({ - 'success': False, - 'error': 'Internal server error during validation' - }), 500 - - @app.route('/api/openapi/download-from-url', methods=['POST']) - @swagger_route(security=get_auth_security()) - @login_required - @user_required - def download_openapi_from_url(): - """ - Download and store an OpenAPI specification from a URL. - - Expected JSON data: - - url: The URL to the OpenAPI specification - - filename: Optional custom filename (will be sanitized) - - Returns: - - success: Boolean indicating if download was successful - - filename: The secure filename used for storage - - storage_path: Path where the file was stored - - spec_info: Basic information about the OpenAPI spec - - error: Error message if download failed - """ - try: - data = request.get_json() - if not data or 'url' not in data: - return jsonify({ - 'success': False, - 'error': 'URL is required' - }), 400 - - url = data['url'].strip() - custom_filename = data.get('filename', '').strip() - - if not url: - return jsonify({ - 'success': False, - 'error': 'URL cannot be empty' - }), 400 - - # Validate URL and fetch content - valid, spec, error = openapi_validator.validate_url_content(url) - - if not valid: - return jsonify({ - 'success': False, - 'error': f'Validation failed: {error}' - }), 400 - - # Determine filename - if custom_filename: - # Validate custom filename - filename_valid, filename_error = openapi_validator.validate_filename(custom_filename) - if not filename_valid: - 
return jsonify({ - 'success': False, - 'error': f'Invalid custom filename: {filename_error}' - }), 400 - safe_filename = openapi_validator.create_safe_filename(custom_filename) - else: - # Generate filename from URL or spec title - info = spec.get('info', {}) - title = info.get('title', 'openapi_spec') - # Sanitize title for filename - title = secure_filename(title) or 'openapi_spec' - safe_filename = f"{title}.yaml" - - # Create secure storage directory - upload_dir = os.path.join(current_app.instance_path, 'openapi_specs') - os.makedirs(upload_dir, exist_ok=True) - - # Generate unique filename to prevent conflicts - unique_id = str(uuid.uuid4())[:8] - base_name, ext = os.path.splitext(safe_filename) - stored_filename = f"{base_name}_{unique_id}{ext}" - if not is_valid_storage_name(stored_filename): - return jsonify({ - 'success': False, - 'error': 'Invalid storage filename' - }), 400 - storage_path = os.path.join(upload_dir, stored_filename) - - # Save spec to file - import yaml - with open(storage_path, 'w', encoding='utf-8') as f: - yaml.dump(spec, f, default_flow_style=False, allow_unicode=True) - - # Extract basic spec information - info = spec.get('info', {}) - spec_info = { - 'title': info.get('title', 'Unknown API'), - 'description': info.get('description', ''), - 'version': info.get('version', ''), - 'openapi_version': spec.get('openapi', ''), - 'servers': spec.get('servers', []), - 'paths_count': len(spec.get('paths', {})), - 'components_count': len(spec.get('components', {})), - 'source_url': url - } - - return jsonify({ - 'success': True, - 'filename': stored_filename, - 'storage_path': storage_path, - 'spec_info': spec_info - }) - - except Exception as e: - debug_print(f"Error downloading OpenAPI spec from URL: {str(e)}") - return jsonify({ - 'success': False, - 'error': 'Internal server error during download' - }), 500 - @app.route('/api/openapi/list-uploaded', methods=['GET']) @swagger_route(security=get_auth_security()) @login_required diff --git 
a/application/single_app/semantic_kernel_loader.py b/application/single_app/semantic_kernel_loader.py index 78f54203..c2a7cc1e 100644 --- a/application/single_app/semantic_kernel_loader.py +++ b/application/single_app/semantic_kernel_loader.py @@ -19,6 +19,7 @@ from semantic_kernel.functions.kernel_plugin import KernelPlugin from semantic_kernel_plugins.embedding_model_plugin import EmbeddingModelPlugin from semantic_kernel_plugins.fact_memory_plugin import FactMemoryPlugin +from semantic_kernel_plugins.tabular_processing_plugin import TabularProcessingPlugin from functions_settings import get_settings, get_user_settings from foundry_agent_runtime import AzureAIFoundryChatCompletionAgent from functions_appinsights import log_event, get_appinsights_logger @@ -408,6 +409,13 @@ def load_embedding_model_plugin(kernel: Kernel, settings): description="Provides text embedding functions using the configured embedding model." ) +def load_tabular_processing_plugin(kernel: Kernel): + kernel.add_plugin( + TabularProcessingPlugin(), + plugin_name="tabular_processing", + description="Provides data analysis on tabular files (CSV, XLSX) stored in blob storage. Can list files, describe schemas, aggregate columns, filter rows, run queries, and perform group-by operations." 
+ ) + def load_core_plugins_only(kernel: Kernel, settings): """Load only core plugins for model-only conversations without agents.""" debug_print(f"[SK Loader] Loading core plugins only for model-only mode...") @@ -429,6 +437,10 @@ def load_core_plugins_only(kernel: Kernel, settings): load_text_plugin(kernel) log_event("[SK Loader] Loaded Text plugin.", level=logging.INFO) + if settings.get('enable_tabular_processing_plugin', False) and settings.get('enable_enhanced_citations', False): + load_tabular_processing_plugin(kernel) + log_event("[SK Loader] Loaded Tabular Processing plugin.", level=logging.INFO) + # =================== Semantic Kernel Initialization =================== def initialize_semantic_kernel(user_id: str=None, redis_client=None): debug_print(f"[SK Loader] Initializing Semantic Kernel and plugins...") @@ -721,6 +733,195 @@ def normalize(s): print(f"[SK Loader] Error loading agent-specific plugins: {e}") log_event(f"[SK Loader] Error loading agent-specific plugins: {e}", level=logging.ERROR, exceptionTraceback=True) + +def _extract_sql_schema_for_instructions(kernel) -> str: + """ + Check if any SQL Schema plugins are loaded in the kernel and extract their schema + information to inject into agent instructions. + + Returns a formatted schema summary string, or empty string if no SQL schema plugins found. 
+ """ + from semantic_kernel_plugins.sql_schema_plugin import SQLSchemaPlugin + + schema_parts = [] + + try: + # Iterate through all registered plugins in the kernel + for plugin_name, plugin in kernel.plugins.items(): + # Check if the underlying plugin object is a SQLSchemaPlugin + # Kernel plugins wrap the original object - we need to check the underlying instance + plugin_obj = None + + # Try to access the underlying plugin instance + if isinstance(plugin, SQLSchemaPlugin): + plugin_obj = plugin + elif hasattr(plugin, '_plugin_instance'): + if isinstance(plugin._plugin_instance, SQLSchemaPlugin): + plugin_obj = plugin._plugin_instance + else: + # Check if any function in this plugin belongs to a SQLSchemaPlugin + for func_name, func in plugin.functions.items(): + if hasattr(func, 'method') and hasattr(func.method, '__self__'): + if isinstance(func.method.__self__, SQLSchemaPlugin): + plugin_obj = func.method.__self__ + break + + if plugin_obj is not None: + print(f"[SK Loader] Found SQL Schema plugin: {plugin_name}, fetching schema...") + try: + schema_result = plugin_obj.get_database_schema() + if schema_result and hasattr(schema_result, 'data'): + schema_data = schema_result.data + else: + schema_data = schema_result + + if isinstance(schema_data, dict) and "tables" in schema_data: + db_name = schema_data.get("database_name", "Unknown") + db_type = schema_data.get("database_type", "Unknown") + + schema_text = f"### Database: {db_name} ({db_type})\n\n" + + for table_name, table_info in schema_data["tables"].items(): + schema_name = table_info.get("schema_name", "dbo") + qualified_name = f"{schema_name}.{table_name}" if schema_name else table_name + schema_text += f"**Table: {qualified_name}**\n" + + columns = table_info.get("columns", []) + if columns: + schema_text += "| Column | Type | Nullable |\n|--------|------|----------|\n" + for col in columns: + col_name = col.get("column_name", "?") + col_type = col.get("data_type", "?") + nullable = "Yes" if 
col.get("is_nullable", True) else "No" + schema_text += f"| {col_name} | {col_type} | {nullable} |\n" + + pks = table_info.get("primary_keys", []) + if pks: + schema_text += f"Primary Key(s): {', '.join(pks)}\n" + + schema_text += "\n" + + # Add relationships + relationships = schema_data.get("relationships", []) + if relationships: + schema_text += "**Relationships (Foreign Keys):**\n" + for rel in relationships: + parent = rel.get("parent_table", "?") + parent_col = rel.get("parent_column", "?") + ref = rel.get("referenced_table", "?") + ref_col = rel.get("referenced_column", "?") + schema_text += f"- {parent}.{parent_col} → {ref}.{ref_col}\n" + schema_text += "\n" + + schema_parts.append(schema_text) + print(f"[SK Loader] Successfully extracted schema for {db_name}: {len(schema_data['tables'])} tables") + else: + print(f"[SK Loader] Schema data for {plugin_name} was empty or had unexpected format") + + except Exception as e: + print(f"[SK Loader] Warning: Failed to fetch schema from {plugin_name}: {e}") + log_event(f"[SK Loader] Failed to fetch SQL schema for injection: {e}", + extra={"plugin_name": plugin_name, "error": str(e)}, + level=logging.WARNING) + except Exception as e: + print(f"[SK Loader] Warning: Error iterating kernel plugins for SQL schema: {e}") + log_event(f"[SK Loader] Error iterating kernel plugins for SQL schema: {e}", + extra={"error": str(e)}, level=logging.WARNING) + + # Fallback: If no SQLSchemaPlugin was found, check for SQLQueryPlugin instances + # and create a temporary SQLSchemaPlugin from their connection config to extract schema + if not schema_parts: + from semantic_kernel_plugins.sql_query_plugin import SQLQueryPlugin as _SQLQueryPlugin + + try: + for plugin_name, plugin in kernel.plugins.items(): + query_obj = None + + if isinstance(plugin, _SQLQueryPlugin): + query_obj = plugin + elif hasattr(plugin, '_plugin_instance'): + if isinstance(plugin._plugin_instance, _SQLQueryPlugin): + query_obj = plugin._plugin_instance + else: + 
for func_name, func in plugin.functions.items(): + if hasattr(func, 'method') and hasattr(func.method, '__self__'): + if isinstance(func.method.__self__, _SQLQueryPlugin): + query_obj = func.method.__self__ + break + + if query_obj is not None: + print(f"[SK Loader] Fallback: Found SQLQueryPlugin '{plugin_name}', creating temporary schema extractor...") + try: + temp_manifest = { + 'type': 'sql_schema', + 'name': f'{plugin_name}_temp_schema', + 'database_type': getattr(query_obj, 'database_type', 'azure_sql'), + 'server': getattr(query_obj, 'server', ''), + 'database': getattr(query_obj, 'database', ''), + 'username': getattr(query_obj, 'username', ''), + 'password': getattr(query_obj, 'password', ''), + 'driver': getattr(query_obj, 'driver', ''), + 'connection_string': getattr(query_obj, 'connection_string', ''), + } + temp_schema = SQLSchemaPlugin(temp_manifest) + schema_result = temp_schema.get_database_schema() + if schema_result and hasattr(schema_result, 'data'): + schema_data = schema_result.data + else: + schema_data = schema_result + + if isinstance(schema_data, dict) and "tables" in schema_data: + db_name = schema_data.get("database_name", "Unknown") + db_type = schema_data.get("database_type", "Unknown") + + schema_text = f"### Database: {db_name} ({db_type})\n\n" + + for table_name, table_info in schema_data["tables"].items(): + schema_name = table_info.get("schema_name", "dbo") + qualified_name = f"{schema_name}.{table_name}" if schema_name else table_name + schema_text += f"**Table: {qualified_name}**\n" + + columns = table_info.get("columns", []) + if columns: + schema_text += "| Column | Type | Nullable |\n|--------|------|----------|\n" + for col in columns: + col_name = col.get("column_name", "?") + col_type = col.get("data_type", "?") + nullable = "Yes" if col.get("is_nullable", True) else "No" + schema_text += f"| {col_name} | {col_type} | {nullable} |\n" + + pks = table_info.get("primary_keys", []) + if pks: + schema_text += f"Primary Key(s): 
{', '.join(pks)}\n" + + schema_text += "\n" + + relationships = schema_data.get("relationships", []) + if relationships: + schema_text += "**Relationships (Foreign Keys):**\n" + for rel in relationships: + parent = rel.get("parent_table", "?") + parent_col = rel.get("parent_column", "?") + ref = rel.get("referenced_table", "?") + ref_col = rel.get("referenced_column", "?") + schema_text += f"- {parent}.{parent_col} → {ref}.{ref_col}\n" + schema_text += "\n" + + schema_parts.append(schema_text) + print(f"[SK Loader] Fallback: Successfully extracted schema from SQLQueryPlugin '{plugin_name}': {len(schema_data['tables'])} tables") + except Exception as e: + print(f"[SK Loader] Fallback: Failed to extract schema from SQLQueryPlugin '{plugin_name}': {e}") + log_event(f"[SK Loader] Fallback schema extraction failed", + extra={"plugin_name": plugin_name, "error": str(e)}, + level=logging.WARNING) + except Exception as e: + print(f"[SK Loader] Warning: Error in fallback SQL schema extraction: {e}") + log_event(f"[SK Loader] Error in fallback SQL schema extraction: {e}", + extra={"error": str(e)}, level=logging.WARNING) + + return "\n".join(schema_parts) + + def load_single_agent_for_kernel(kernel, agent_cfg, settings, context_obj, redis_client=None, mode_label="global"): """ DRY helper to load a single agent (default agent) for the kernel. @@ -859,6 +1060,27 @@ def load_single_agent_for_kernel(kernel, agent_cfg, settings, context_obj, redis group_id=group_id, ) + # Auto-inject SQL database schema into agent instructions if SQL plugins are loaded + try: + sql_schema_summary = _extract_sql_schema_for_instructions(kernel) + if sql_schema_summary: + agent_config["instructions"] = ( + agent_config.get("instructions", "") + + "\n\n## Available Database Schema\n" + "The following database tables and columns are available for SQL queries. 
" + "ALWAYS use these exact table and column names when writing SQL queries.\n\n" + + sql_schema_summary + + "\n\nWhen a user asks a question about data, use the schema above to construct " + "the appropriate SQL query and execute it using the SQL Query plugin functions. " + "Do NOT ask the user for table or column names — use the schema provided above." + ) + print(f"[SK Loader] Injected SQL schema into agent instructions for {agent_config['name']}") + except Exception as e: + print(f"[SK Loader] Warning: Failed to inject SQL schema into instructions: {e}") + log_event(f"[SK Loader] Failed to inject SQL schema into agent instructions: {e}", + extra={"agent_name": agent_config["name"], "error": str(e)}, + level=logging.WARNING) + try: kwargs = { "name": agent_config["name"], @@ -1013,6 +1235,14 @@ def load_plugins_for_kernel(kernel, plugin_manifests, settings, mode_label="glob except Exception as e: log_event(f"[SK Loader] Failed to load Fact Memory Plugin: {e}", level=logging.WARNING) + # Register Tabular Processing Plugin if enabled (requires enhanced citations) + if settings.get('enable_tabular_processing_plugin', False) and settings.get('enable_enhanced_citations', False): + try: + load_tabular_processing_plugin(kernel) + log_event("[SK Loader] Loaded Tabular Processing plugin.", level=logging.INFO) + except Exception as e: + log_event(f"[SK Loader] Failed to load Tabular Processing plugin: {e}", level=logging.WARNING) + # Conditionally load static embedding model plugin if settings.get('enable_default_embedding_model_plugin', True): try: @@ -1357,7 +1587,11 @@ def load_user_semantic_kernel(kernel: Kernel, settings, user_id: str, redis_clie load_embedding_model_plugin(kernel, settings) print(f"[SK Loader] Loaded Default Embedding Model plugin.") log_event("[SK Loader] Loaded Default Embedding Model plugin.", level=logging.INFO) - + + if settings.get('enable_tabular_processing_plugin', False) and settings.get('enable_enhanced_citations', False): + 
load_tabular_processing_plugin(kernel) + log_event("[SK Loader] Loaded Tabular Processing plugin.", level=logging.INFO) + # Get selected agent from user settings (this still needs to be in user settings for UI state) user_settings = get_user_settings(user_id).get('settings', {}) selected_agent = user_settings.get('selected_agent') diff --git a/application/single_app/semantic_kernel_plugins/logged_plugin_loader.py b/application/single_app/semantic_kernel_plugins/logged_plugin_loader.py index 64443633..f7c9e38c 100644 --- a/application/single_app/semantic_kernel_plugins/logged_plugin_loader.py +++ b/application/single_app/semantic_kernel_plugins/logged_plugin_loader.py @@ -80,6 +80,10 @@ def load_plugin_from_manifest(self, manifest: Dict[str, Any], # Register the plugin with the kernel self._register_plugin_with_kernel(plugin_instance, plugin_name) + # Auto-create companion SQL Schema plugin when loading a SQL Query plugin + if plugin_type == 'sql_query': + self._auto_create_companion_schema_plugin(manifest, plugin_name) + log_event( f"[Logged Plugin Loader] Successfully loaded plugin: {plugin_name}", extra={ @@ -117,8 +121,8 @@ def _create_plugin_instance(self, manifest: Dict[str, Any]): return self._create_openapi_plugin(manifest) elif plugin_type == 'python': return self._create_python_plugin(manifest) - #elif plugin_type in ['sql_schema', 'sql_query']: - # return self._create_sql_plugin(manifest) + elif plugin_type in ['sql_schema', 'sql_query']: + return self._create_sql_plugin(manifest) else: try: debug_print(f"[Logged Plugin Loader] Attempting to discover plugin type: {plugin_type}") @@ -221,6 +225,60 @@ def _create_sql_plugin(self, manifest: Dict[str, Any]): self.logger.error(f"Failed to import SQL plugin class for {plugin_type}: {e}") return None + def _auto_create_companion_schema_plugin(self, query_manifest: Dict[str, Any], query_plugin_name: str): + """ + Auto-create a companion SQLSchemaPlugin when a SQLQueryPlugin is loaded. 
+ This ensures the agent has access to database schema information for constructing queries. + The schema plugin uses the same connection details as the query plugin. + """ + try: + # Derive schema plugin name from query plugin name + if query_plugin_name.endswith('_query'): + schema_plugin_name = query_plugin_name[:-6] + '_schema' + else: + schema_plugin_name = query_plugin_name + '_schema' + + # Check if schema plugin already exists in kernel + if schema_plugin_name in self.kernel.plugins: + log_event( + f"[Logged Plugin Loader] Companion schema plugin already exists: {schema_plugin_name}", + level=logging.DEBUG + ) + return + + # Create schema manifest from query manifest (same connection details) + schema_manifest = dict(query_manifest) + schema_manifest['type'] = 'sql_schema' + schema_manifest['name'] = schema_plugin_name + + # Create the schema plugin instance + schema_instance = SQLSchemaPlugin(schema_manifest) + + # Enable logging if supported + if hasattr(schema_instance, 'enable_invocation_logging'): + schema_instance.enable_invocation_logging(True) + + # Wrap functions if it's a BasePlugin + if isinstance(schema_instance, BasePlugin): + self._wrap_plugin_functions(schema_instance, schema_plugin_name) + + # Register with kernel + self._register_plugin_with_kernel(schema_instance, schema_plugin_name) + + log_event( + f"[Logged Plugin Loader] Auto-created companion SQL Schema plugin: {schema_plugin_name}", + extra={"query_plugin": query_plugin_name, "schema_plugin": schema_plugin_name}, + level=logging.INFO + ) + + except Exception as e: + log_event( + f"[Logged Plugin Loader] Warning: Failed to auto-create companion schema plugin", + extra={"query_plugin": query_plugin_name, "error": str(e)}, + level=logging.WARNING, + exceptionTraceback=True + ) + def _wrap_plugin_functions(self, plugin_instance, plugin_name: str): """Wrap all kernel functions in a plugin with logging.""" log_event(f"[Logged Plugin Loader] Checking logging status for plugin", diff --git 
a/application/single_app/semantic_kernel_plugins/openapi_plugin_factory.py b/application/single_app/semantic_kernel_plugins/openapi_plugin_factory.py index d2a91477..b5e80507 100644 --- a/application/single_app/semantic_kernel_plugins/openapi_plugin_factory.py +++ b/application/single_app/semantic_kernel_plugins/openapi_plugin_factory.py @@ -4,7 +4,6 @@ Factory class for creating OpenAPI plugins from different sources: - Stored content in user settings (preferred) - Uploaded files (deprecated) -- URLs (deprecated) - File paths (deprecated) """ @@ -64,8 +63,6 @@ def create_from_config(cls, config: Dict[str, Any]) -> OpenApiPlugin: raise ValueError("openapi_spec_content is required for content source type") elif source_type == 'file': openapi_spec_path = cls._get_uploaded_file_path(config) - elif source_type == 'url': - openapi_spec_path = cls._get_downloaded_file_path(config) elif source_type == 'path': openapi_spec_path = cls._get_local_file_path(config) else: @@ -95,22 +92,6 @@ def _get_uploaded_file_path(cls, config: Dict[str, Any]) -> str: return file_path @classmethod - def _get_downloaded_file_path(cls, config: Dict[str, Any]) -> str: - """Get file path for downloaded OpenAPI spec from URL.""" - file_id = config.get('openapi_file_id') - if not file_id: - raise ValueError("openapi_file_id is required for URL source type") - - # Construct path to downloaded file - file_path = os.path.join(cls.UPLOADED_FILES_DIR, f"{file_id}.yaml") - if not os.path.exists(file_path): - # Try JSON extension - file_path = os.path.join(cls.UPLOADED_FILES_DIR, f"{file_id}.json") - if not os.path.exists(file_path): - raise FileNotFoundError(f"Downloaded file not found: {file_id}") - - return file_path - @classmethod def _get_local_file_path(cls, config: Dict[str, Any]) -> str: """Get file path for local OpenAPI spec.""" diff --git a/application/single_app/semantic_kernel_plugins/plugin_health_checker.py b/application/single_app/semantic_kernel_plugins/plugin_health_checker.py index 
f8488ea1..962d0cf2 100644 --- a/application/single_app/semantic_kernel_plugins/plugin_health_checker.py +++ b/application/single_app/semantic_kernel_plugins/plugin_health_checker.py @@ -47,9 +47,18 @@ def validate_plugin_manifest(manifest: Dict[str, Any], plugin_type: str) -> Tupl errors.append(f"Plugin type '{plugin_type}' requires 'auth' field") elif plugin_type in ['sql_query', 'sql_schema']: - if 'database_type' not in manifest: + additional_fields = manifest.get('additionalFields', {}) + if not isinstance(additional_fields, dict): + additional_fields = {} + + database_type = manifest.get('database_type') or additional_fields.get('database_type') + connection_string = manifest.get('connection_string') or additional_fields.get('connection_string') + server = manifest.get('server') or additional_fields.get('server') + database = manifest.get('database') or additional_fields.get('database') + + if not database_type: errors.append(f"SQL plugin requires 'database_type' field") - if not manifest.get('connection_string') and not (manifest.get('server') and manifest.get('database')): + if not connection_string and not (server and database): errors.append("SQL plugin requires either 'connection_string' or 'server' and 'database' fields") elif plugin_type == 'log_analytics': diff --git a/application/single_app/semantic_kernel_plugins/plugin_invocation_logger.py b/application/single_app/semantic_kernel_plugins/plugin_invocation_logger.py index f982f0a4..bddf9cda 100644 --- a/application/single_app/semantic_kernel_plugins/plugin_invocation_logger.py +++ b/application/single_app/semantic_kernel_plugins/plugin_invocation_logger.py @@ -11,6 +11,7 @@ import logging import functools import inspect +import threading from typing import Any, Dict, List, Optional, Callable from datetime import datetime from dataclasses import dataclass, asdict @@ -51,24 +52,29 @@ def __init__(self): self.invocations: List[PluginInvocation] = [] self.max_history = 1000 # Keep last 1000 invocations 
in memory self.logger = get_appinsights_logger() or logging.getLogger(__name__) + self._callbacks: Dict[str, List[Callable[[PluginInvocation], None]]] = {} + self._callback_lock = threading.Lock() def log_invocation(self, invocation: PluginInvocation): """Log a plugin invocation to Application Insights and local history.""" # Add to local history self.invocations.append(invocation) - + # Trim history if needed if len(self.invocations) > self.max_history: self.invocations = self.invocations[-self.max_history:] - + # Enhanced terminal logging self._log_to_terminal(invocation) - + # Log to Application Insights self._log_to_appinsights(invocation) - + # Log to standard logging self._log_to_standard(invocation) + + # Fire registered thought callbacks + self._fire_callbacks(invocation) def _log_to_terminal(self, invocation: PluginInvocation): """Log detailed invocation information to terminal.""" @@ -277,6 +283,34 @@ def clear_history(self): """Clear the invocation history.""" self.invocations.clear() + def register_callback(self, key, callback): + """Register a callback fired on each plugin invocation for the given key. + + Args: + key: A string key, typically f"{user_id}:{conversation_id}". + callback: Called with the PluginInvocation after it is logged. 
+ """ + with self._callback_lock: + if key not in self._callbacks: + self._callbacks[key] = [] + self._callbacks[key].append(callback) + + def deregister_callbacks(self, key): + """Remove all callbacks for the given key.""" + with self._callback_lock: + self._callbacks.pop(key, None) + + def _fire_callbacks(self, invocation): + """Fire matching callbacks for this invocation's user+conversation.""" + key = f"{invocation.user_id}:{invocation.conversation_id}" + with self._callback_lock: + callbacks = list(self._callbacks.get(key, [])) + for cb in callbacks: + try: + cb(invocation) + except Exception as e: + log_event(f"Plugin invocation callback error: {e}", level=logging.WARNING) + # Global instance _plugin_logger = PluginInvocationLogger() diff --git a/application/single_app/semantic_kernel_plugins/sql_query_plugin.py b/application/single_app/semantic_kernel_plugins/sql_query_plugin.py index ccad030f..084c4c9b 100644 --- a/application/single_app/semantic_kernel_plugins/sql_query_plugin.py +++ b/application/single_app/semantic_kernel_plugins/sql_query_plugin.py @@ -176,11 +176,12 @@ def metadata(self) -> Dict[str, Any]: user_desc = self._metadata.get("description", f"SQL Query plugin for {self.database_type} database") api_desc = ( "This plugin executes SQL queries against databases and returns structured results. " - "It supports SQL Server, PostgreSQL, MySQL, and SQLite databases. The plugin includes " - "query sanitization, validation, and security features including parameterized queries, " - "read-only mode, result limiting, and timeout protection. It automatically cleans queries " - "from unnecessary characters and formats results for easy consumption by AI agents. " - "The plugin handles database-specific SQL variations and connection management." + "It supports SQL Server, PostgreSQL, MySQL, and SQLite databases. 
" + "WORKFLOW: Before executing any query, you MUST first use the SQL Schema plugin to discover " + "available tables, column names, data types, and relationships. Then construct valid SQL queries " + "using the discovered schema with correct fully-qualified table names (e.g., dbo.TableName). " + "The plugin includes query sanitization, validation, and security features including " + "parameterized queries, read-only mode, result limiting, and timeout protection." ) full_desc = f"{user_desc}\n\n{api_desc}" @@ -215,14 +216,24 @@ def metadata(self) -> Dict[str, Any]: {"name": "query", "type": "str", "description": "The SQL query to validate", "required": True} ], "returns": {"type": "ResultWithMetadata", "description": "Validation result with any issues found"} + }, + { + "name": "query_database", + "description": "Execute a SQL query to answer a question about the database", + "parameters": [ + {"name": "question", "type": "str", "description": "The natural language question being answered", "required": True}, + {"name": "query", "type": "str", "description": "The SQL query to execute", "required": True}, + {"name": "max_rows", "type": "int", "description": "Maximum number of rows to return (overrides default)", "required": False} + ], + "returns": {"type": "ResultWithMetadata", "description": "Query results with columns, data, and original question context"} } ] } def get_functions(self) -> List[str]: - return ["execute_query", "execute_scalar", "validate_query"] + return ["execute_query", "execute_scalar", "validate_query", "query_database"] - @kernel_function(description="Execute a SQL query and return results") + @kernel_function(description="Execute a SQL query against the database and return results as structured data with columns and rows. If the database schema is provided in your instructions, use those exact table and column names to construct valid SQL queries. 
If no schema is available in your instructions, call get_database_schema or get_table_list from the SQL Schema plugin to discover tables first. Always use fully qualified table names (e.g., dbo.TableName) when available. Results are limited by max_rows to prevent excessive data transfer.") @plugin_function_logger("SQLQueryPlugin") def execute_query( self, @@ -301,7 +312,7 @@ def execute_query( } return ResultWithMetadata(error_result, self.metadata) - @kernel_function(description="Execute a query that returns a single value") + @kernel_function(description="Execute a query that returns a single scalar value (e.g., COUNT, SUM, MAX, MIN). If the database schema is provided in your instructions, use it directly to construct the query. Otherwise, call get_database_schema from the SQL Schema plugin first to discover table and column names.") @plugin_function_logger("SQLQueryPlugin") def execute_scalar( self, @@ -360,7 +371,7 @@ def execute_scalar( } return ResultWithMetadata(error_result, self.metadata) - @kernel_function(description="Validate a SQL query without executing it") + @kernel_function(description="Validate a SQL query for syntax correctness and safety without executing it. Use this to pre-check complex queries before execution, especially when constructing multi-table JOINs or complex WHERE clauses.") @plugin_function_logger("SQLQueryPlugin") def validate_query(self, query: str) -> ResultWithMetadata: """Validate a SQL query without executing it""" @@ -380,6 +391,80 @@ def validate_query(self, query: str) -> ResultWithMetadata: } return ResultWithMetadata(error_result, self.metadata) + @kernel_function(description="Execute a SQL query to answer a question about the database. This is a convenience function that executes a SQL query and returns results along with the original question for context. If the database schema is provided in your instructions, use those table and column names directly to construct the query. 
Otherwise, first call get_database_schema from the SQL Schema plugin to discover the schema. Then construct the appropriate SQL query and provide it along with the original question.") + @plugin_function_logger("SQLQueryPlugin") + def query_database( + self, + question: str, + query: str, + max_rows: Optional[int] = None + ) -> ResultWithMetadata: + """Execute a SQL query to answer a specific question about the database""" + try: + # Clean and validate the query + cleaned_query = self._clean_query(query) + validation_result = self._validate_query(cleaned_query) + + if not validation_result["is_valid"]: + raise ValueError(f"Invalid query: {validation_result['issues']}") + + conn = self._get_connection() + cursor = conn.cursor() + + # Set query timeout + if hasattr(cursor, 'settimeout'): + cursor.settimeout(self.timeout) + + cursor.execute(cleaned_query) + + # Get column names + if hasattr(cursor, 'description') and cursor.description: + columns = [desc[0] for desc in cursor.description] + else: + columns = [] + + # Fetch results with row limit + effective_max_rows = max_rows or self.max_rows + + if self.database_type == 'sqlite': + rows = cursor.fetchall() + if len(rows) > effective_max_rows: + rows = rows[:effective_max_rows] + results = [dict(row) for row in rows] + else: + rows = cursor.fetchmany(effective_max_rows) + results = [] + for row in rows: + if isinstance(row, (list, tuple)): + results.append(dict(zip(columns, row))) + else: + results.append(row) + + # Prepare result data with question context + result_data = { + "question": question, + "columns": columns, + "data": results, + "row_count": len(results), + "is_truncated": len(results) >= effective_max_rows, + "query": cleaned_query + } + + log_event(f"[SQLQueryPlugin] query_database executed successfully, returned {len(results)} rows", extra={"question": question}) + return ResultWithMetadata(result_data, self.metadata) + + except Exception as e: + log_event(f"[SQLQueryPlugin] Error in query_database: 
{e}", extra={"question": question}) + error_result = { + "error": str(e), + "question": question, + "query": query, + "columns": [], + "data": [], + "row_count": 0 + } + return ResultWithMetadata(error_result, self.metadata) + def _clean_query(self, query: str) -> str: """Clean query from unnecessary characters and formatting""" if not query: diff --git a/application/single_app/semantic_kernel_plugins/sql_schema_plugin.py b/application/single_app/semantic_kernel_plugins/sql_schema_plugin.py index 01d89aa2..380b9a5f 100644 --- a/application/single_app/semantic_kernel_plugins/sql_schema_plugin.py +++ b/application/single_app/semantic_kernel_plugins/sql_schema_plugin.py @@ -165,11 +165,11 @@ def metadata(self) -> Dict[str, Any]: user_desc = self._metadata.get("description", f"SQL Schema plugin for {self.database_type} database") api_desc = ( "This plugin connects to SQL databases and extracts schema information including tables, columns, " - "data types, primary keys, foreign keys, and relationships. It supports SQL Server, PostgreSQL, " - "MySQL, and SQLite databases. The plugin provides structured schema data that can be used by " - "AI agents to understand database structure and generate appropriate SQL queries. " - "Authentication supports connection strings, username/password, and integrated authentication. " - "The plugin handles database-specific SQL variations for schema extraction." + "data types, primary keys, foreign keys, and relationships. WORKFLOW: ALWAYS call get_database_schema " + "or get_table_list FIRST before executing any SQL queries via the SQL Query plugin. This ensures " + "you have accurate table names, column names, and relationship information to construct valid queries. " + "It supports SQL Server, PostgreSQL, MySQL, and SQLite databases. " + "Authentication supports connection strings, username/password, and integrated authentication." 
) full_desc = f"{user_desc}\n\n{api_desc}" @@ -219,7 +219,7 @@ def get_functions(self) -> List[str]: return ["get_database_schema", "get_table_schema", "get_table_list", "get_relationships"] @plugin_function_logger("SQLSchemaPlugin") - @kernel_function(description="Get complete database schema including all tables, columns, and relationships") + @kernel_function(description="Get complete database schema including all tables, columns, data types, primary keys, foreign keys, and relationships. If the database schema is already provided in your instructions, use that directly and do NOT call this function. Only call this function if you need to discover the schema and it was not already provided. The returned schema should be used to construct valid SQL queries with the correct fully-qualified table names (e.g., dbo.TableName) and column references.") def get_database_schema( self, include_system_tables: bool = False, @@ -255,24 +255,26 @@ def get_database_schema( # Get schema for each table for table in tables: - if isinstance(table, tuple) and len(table) >= 2: + try: + # Robust row parsing — works with pyodbc.Row, tuple, list, etc. 
table_name = table[0] - schema_name = table[1] - qualified_table_name = f"{schema_name}.{table_name}" - else: - table_name = table[0] if isinstance(table, tuple) else table + schema_name = table[1] if len(table) >= 2 else None + qualified_table_name = f"{schema_name}.{table_name}" if schema_name else str(table_name) + except (TypeError, IndexError): + table_name = str(table) schema_name = None qualified_table_name = table_name try: - table_schema = self._get_table_schema_data(cursor, table_name, schema_name) - schema_data["tables"][table_name] = table_schema - print(f"[SQLSchemaPlugin] Got schema for table: {qualified_table_name}") + table_schema = self._get_table_schema_data(cursor, str(table_name), str(schema_name) if schema_name else None) + schema_data["tables"][str(table_name)] = table_schema + print(f"[SQLSchemaPlugin] Got schema for table: {qualified_table_name} ({len(table_schema.get('columns', []))} columns)") except Exception as e: print(f"[SQLSchemaPlugin] Error getting schema for table {qualified_table_name}: {e}") log_event(f"[SQLSchemaPlugin] Error getting table schema", extra={ "table_name": qualified_table_name, - "error": str(e) + "error": str(e), + "raw_row": repr(table) }) # Get relationships @@ -310,30 +312,8 @@ def get_database_schema( {"error": error_msg}, {"source": "sql_schema_plugin", "success": False} ) - - # Get tables - tables_query = self._get_tables_query(include_system_tables, table_filter) - cursor.execute(tables_query) - tables = cursor.fetchall() - - # Get schema for each table - for table_row in tables: - table_name = table_row[0] if isinstance(table_row, (list, tuple)) else table_row - table_schema = self._get_table_schema_data(cursor, table_name) - schema_data["tables"][table_name] = table_schema - - # Get relationships - relationships = self._get_relationships_data(cursor) - schema_data["relationships"] = relationships - - log_event(f"[SQLSchemaPlugin] Retrieved schema for {len(schema_data['tables'])} tables") - return 
ResultWithMetadata(schema_data, self.metadata) - - except Exception as e: - log_event(f"[SQLSchemaPlugin] Error getting database schema: {e}") - raise - @kernel_function(description="Get detailed schema for a specific table") + @kernel_function(description="Get the detailed schema (column names, data types, constraints) for a specific table. If the database schema is already provided in your instructions, use that directly instead of calling this function. Only call this if you need details for a specific table not already in your instructions.") @plugin_function_logger("SQLSchemaPlugin") def get_table_schema(self, table_name: str) -> ResultWithMetadata: """Get detailed schema for a specific table""" @@ -350,7 +330,7 @@ def get_table_schema(self, table_name: str) -> ResultWithMetadata: log_event(f"[SQLSchemaPlugin] Error getting table schema for {table_name}: {e}") raise - @kernel_function(description="Get list of all tables in the database") + @kernel_function(description="Return the names of all tables in the database. If the database schema is already provided in your instructions, use that directly instead of calling this function. 
Only call this if you need to discover available tables and they are not already listed in your instructions.") @plugin_function_logger("SQLSchemaPlugin") def get_table_list( self, @@ -368,14 +348,14 @@ def get_table_list( table_list = [] for table_row in tables: - if isinstance(table_row, (list, tuple)): + try: table_info = { - "table_name": table_row[0], - "schema": table_row[1] if len(table_row) > 1 else None, - "table_type": table_row[2] if len(table_row) > 2 else "TABLE" + "table_name": str(table_row[0]), + "schema": str(table_row[1]) if len(table_row) > 1 else None, + "table_type": str(table_row[2]) if len(table_row) > 2 else "TABLE" } - else: - table_info = {"table_name": table_row, "schema": None, "table_type": "TABLE"} + except (TypeError, IndexError): + table_info = {"table_name": str(table_row), "schema": None, "table_type": "TABLE"} table_list.append(table_info) log_event(f"[SQLSchemaPlugin] Retrieved {len(table_list)} tables") @@ -385,7 +365,7 @@ def get_table_list( log_event(f"[SQLSchemaPlugin] Error getting table list: {e}") raise - @kernel_function(description="Get foreign key relationships between tables") + @kernel_function(description="Get foreign key relationships between tables. If the database schema and relationships are already provided in your instructions, use those directly instead of calling this function. Only call this if you need relationship details not already in your instructions.") def get_relationships(self, table_name: Optional[str] = None) -> ResultWithMetadata: """Get foreign key relationships between tables""" try: @@ -402,17 +382,20 @@ def get_relationships(self, table_name: Optional[str] = None) -> ResultWithMetad raise def _get_tables_query(self, include_system_tables: bool, table_filter: Optional[str]) -> str: - """Get database-specific query for listing tables""" + """Get database-specific query for listing tables. 
+ Uses sys.tables/sys.schemas for SQL Server (more reliable than INFORMATION_SCHEMA + in Azure SQL environments with restricted permissions).""" if self.database_type == 'sqlserver': base_query = """ - SELECT TABLE_NAME, TABLE_SCHEMA, TABLE_TYPE - FROM INFORMATION_SCHEMA.TABLES - WHERE TABLE_TYPE = 'BASE TABLE' + SELECT t.name AS TABLE_NAME, s.name AS TABLE_SCHEMA, 'BASE TABLE' AS TABLE_TYPE + FROM sys.tables t + INNER JOIN sys.schemas s ON t.schema_id = s.schema_id + WHERE t.type = 'U' """ if not include_system_tables: - base_query += " AND TABLE_SCHEMA NOT IN ('sys', 'information_schema')" + base_query += " AND s.name NOT IN ('sys', 'information_schema')" if table_filter: - base_query += f" AND TABLE_NAME LIKE '{table_filter.replace('*', '%')}'" + base_query += f" AND t.name LIKE '{table_filter.replace('*', '%')}'" return base_query elif self.database_type == 'postgresql': @@ -467,22 +450,30 @@ def _get_table_schema_data(self, cursor, table_name: str, schema_name: str = Non if pk_query: cursor.execute(pk_query) pks = cursor.fetchall() - schema_data["primary_keys"] = [pk[0] if isinstance(pk, (list, tuple)) else pk for pk in pks] + schema_data["primary_keys"] = [str(pk[0]) for pk in pks] return schema_data def _get_columns_query(self, table_name: str, schema_name: str = None) -> str: - """Get database-specific query for table columns""" + """Get database-specific query for table columns. 
+ Uses sys.columns/sys.types for SQL Server (consistent with sys.tables used for enumeration).""" if self.database_type == 'sqlserver': - where_clause = f"WHERE TABLE_NAME = '{table_name}'" - if schema_name: - where_clause += f" AND TABLE_SCHEMA = '{schema_name}'" + schema_filter = f"AND s.name = '{schema_name}'" if schema_name else "" return f""" - SELECT COLUMN_NAME, DATA_TYPE, IS_NULLABLE, COLUMN_DEFAULT, - CHARACTER_MAXIMUM_LENGTH, NUMERIC_PRECISION, NUMERIC_SCALE - FROM INFORMATION_SCHEMA.COLUMNS - {where_clause} - ORDER BY ORDINAL_POSITION + SELECT + c.name AS COLUMN_NAME, + TYPE_NAME(c.user_type_id) AS DATA_TYPE, + CASE WHEN c.is_nullable = 1 THEN 'YES' ELSE 'NO' END AS IS_NULLABLE, + dc.definition AS COLUMN_DEFAULT, + c.max_length AS CHARACTER_MAXIMUM_LENGTH, + c.precision AS NUMERIC_PRECISION, + c.scale AS NUMERIC_SCALE + FROM sys.columns c + INNER JOIN sys.tables t ON c.object_id = t.object_id + INNER JOIN sys.schemas s ON t.schema_id = s.schema_id + LEFT JOIN sys.default_constraints dc ON c.default_object_id = dc.object_id + WHERE t.name = '{table_name}' {schema_filter} + ORDER BY c.column_id """ elif self.database_type == 'postgresql': return f""" @@ -498,16 +489,18 @@ def _get_columns_query(self, table_name: str, schema_name: str = None) -> str: return f"PRAGMA table_info({table_name})" def _get_primary_keys_query(self, table_name: str, schema_name: str = None) -> Optional[str]: - """Get database-specific query for primary keys""" + """Get database-specific query for primary keys. 
+ Uses sys.indexes/sys.index_columns for SQL Server (consistent with sys.tables).""" if self.database_type == 'sqlserver': - where_clause = f"WHERE TABLE_NAME = '{table_name}'" - if schema_name: - where_clause += f" AND TABLE_SCHEMA = '{schema_name}'" + schema_filter = f"AND s.name = '{schema_name}'" if schema_name else "" return f""" - SELECT COLUMN_NAME - FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE - {where_clause} - AND CONSTRAINT_NAME LIKE 'PK_%' + SELECT c.name AS COLUMN_NAME + FROM sys.index_columns ic + INNER JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id + INNER JOIN sys.indexes i ON ic.object_id = i.object_id AND ic.index_id = i.index_id + INNER JOIN sys.tables t ON i.object_id = t.object_id + INNER JOIN sys.schemas s ON t.schema_id = s.schema_id + WHERE i.is_primary_key = 1 AND t.name = '{table_name}' {schema_filter} """ elif self.database_type == 'postgresql': return f""" diff --git a/application/single_app/semantic_kernel_plugins/tabular_processing_plugin.py b/application/single_app/semantic_kernel_plugins/tabular_processing_plugin.py new file mode 100644 index 00000000..fd5e2597 --- /dev/null +++ b/application/single_app/semantic_kernel_plugins/tabular_processing_plugin.py @@ -0,0 +1,1862 @@ +# tabular_processing_plugin.py +""" +TabularProcessingPlugin for Semantic Kernel: provides data analysis operations +on tabular files (CSV, XLSX, XLS, XLSM) stored in Azure Blob Storage. + +Works with workspace documents (user-documents, group-documents, public-documents) +and chat-uploaded documents (personal-chat container). 
+""" +import asyncio +import copy +from datetime import date, datetime +import io +import json +import logging +import warnings +import pandas +from typing import Annotated, Optional, List +from semantic_kernel.functions import kernel_function +from semantic_kernel_plugins.plugin_invocation_logger import plugin_function_logger +from functions_appinsights import log_event +from config import ( + CLIENTS, + TABULAR_EXTENSIONS, + storage_account_user_documents_container_name, + storage_account_personal_chat_container_name, + storage_account_group_documents_container_name, + storage_account_public_documents_container_name, +) + + +class TabularProcessingPlugin: + """Provides data analysis functions on tabular files stored in blob storage.""" + + SUPPORTED_EXTENSIONS = tuple(f'.{extension}' for extension in sorted(TABULAR_EXTENSIONS)) + DISCOVERY_FUNCTION_NAMES = ( + 'list_tabular_files', + 'describe_tabular_file', + ) + ANALYSIS_FUNCTION_NAMES = ( + 'lookup_value', + 'aggregate_column', + 'filter_rows', + 'query_tabular_data', + 'group_by_aggregate', + 'group_by_datetime_component', + ) + THOUGHT_EXCLUDED_PARAMETER_NAMES = ( + 'user_id', + 'conversation_id', + 'group_id', + 'public_workspace_id', + ) + DAY_NAME_ORDER = [ + 'Monday', + 'Tuesday', + 'Wednesday', + 'Thursday', + 'Friday', + 'Saturday', + 'Sunday' + ] + MONTH_NAME_ORDER = [ + 'January', + 'February', + 'March', + 'April', + 'May', + 'June', + 'July', + 'August', + 'September', + 'October', + 'November', + 'December' + ] + + def __init__(self): + self._df_cache = {} # Per-instance cache: (container, blob_name, sheet_name) -> DataFrame + self._blob_data_cache = {} # Per-instance cache: (container, blob_name) -> raw bytes + self._workbook_metadata_cache = {} # Per-instance cache: (container, blob_name) -> workbook metadata + self._default_sheet_overrides = {} # (container, blob_name) -> default sheet name + + @classmethod + def get_discovery_function_names(cls): + """Return discovery-oriented kernel function 
names exposed by the plugin.""" + return cls.DISCOVERY_FUNCTION_NAMES + + @classmethod + def get_analysis_function_names(cls): + """Return analytical kernel function names exposed by the plugin.""" + return cls.ANALYSIS_FUNCTION_NAMES + + @classmethod + def get_thought_excluded_parameter_names(cls): + """Return parameter names omitted from user-visible thought payloads.""" + return cls.THOUGHT_EXCLUDED_PARAMETER_NAMES + + def set_default_sheet(self, container_name: str, blob_name: str, sheet_name: str): + """Set the default sheet for a workbook so the model doesn't need to specify it.""" + self._default_sheet_overrides[(container_name, blob_name)] = sheet_name + + def _get_blob_service_client(self): + """Get the blob service client from CLIENTS cache.""" + client = CLIENTS.get("storage_account_office_docs_client") + if not client: + raise RuntimeError("Blob storage client not available. Enhanced citations must be enabled.") + return client + + def _list_tabular_blobs(self, container_name: str, prefix: str) -> List[str]: + """List all tabular file blobs under a given prefix.""" + client = self._get_blob_service_client() + container_client = client.get_container_client(container_name) + blobs = [] + for blob in container_client.list_blobs(name_starts_with=prefix): + name_lower = blob['name'].lower() + if any(name_lower.endswith(ext) for ext in self.SUPPORTED_EXTENSIONS): + blobs.append(blob['name']) + return blobs + + def _download_tabular_blob_bytes(self, container_name: str, blob_name: str) -> bytes: + """Download a blob once and reuse the raw bytes across sheet-aware operations.""" + cache_key = (container_name, blob_name) + if cache_key in self._blob_data_cache: + return self._blob_data_cache[cache_key] + + client = self._get_blob_service_client() + blob_client = client.get_blob_client(container=container_name, blob=blob_name) + stream = blob_client.download_blob() + data = stream.readall() + self._blob_data_cache[cache_key] = data + return data + + def 
_get_excel_engine(self, blob_name: str) -> Optional[str]: + """Return the pandas Excel engine for a workbook, or None for CSV files.""" + name_lower = blob_name.lower() + if name_lower.endswith('.xlsx') or name_lower.endswith('.xlsm'): + return 'openpyxl' + if name_lower.endswith('.xls'): + return 'xlrd' + return None + + def _get_workbook_metadata(self, container_name: str, blob_name: str) -> dict: + """Return workbook metadata including available sheet names for Excel files.""" + cache_key = (container_name, blob_name) + if cache_key in self._workbook_metadata_cache: + return copy.deepcopy(self._workbook_metadata_cache[cache_key]) + + engine = self._get_excel_engine(blob_name) + metadata = { + 'is_workbook': bool(engine), + 'sheet_names': [], + 'sheet_count': 0, + 'default_sheet': None, + } + + if engine: + data = self._download_tabular_blob_bytes(container_name, blob_name) + excel_file = pandas.ExcelFile(io.BytesIO(data), engine=engine) + sheet_names = list(excel_file.sheet_names) + metadata.update({ + 'sheet_names': sheet_names, + 'sheet_count': len(sheet_names), + 'default_sheet': sheet_names[0] if sheet_names else None, + }) + + self._workbook_metadata_cache[cache_key] = copy.deepcopy(metadata) + return copy.deepcopy(metadata) + + def _resolve_sheet_selection( + self, + container_name: str, + blob_name: str, + sheet_name: Optional[str] = None, + sheet_index: Optional[str] = None, + require_explicit_sheet: bool = False, + ) -> tuple: + """Resolve a workbook sheet selection and enforce explicit choice when required.""" + workbook_metadata = self._get_workbook_metadata(container_name, blob_name) + if not workbook_metadata.get('is_workbook'): + return None, workbook_metadata + + available_sheets = workbook_metadata.get('sheet_names', []) + if not available_sheets: + raise ValueError(f"Workbook '{blob_name}' does not contain any readable sheets.") + + normalized_sheet_name = (sheet_name or '').strip() + if normalized_sheet_name: + for candidate in 
available_sheets: + if candidate == normalized_sheet_name: + return candidate, workbook_metadata + for candidate in available_sheets: + if candidate.lower() == normalized_sheet_name.lower(): + return candidate, workbook_metadata + raise ValueError( + f"Sheet '{normalized_sheet_name}' was not found in workbook '{blob_name}'. " + f"Available sheets: {available_sheets}." + ) + + normalized_sheet_index = None if sheet_index is None else str(sheet_index).strip() + if normalized_sheet_index not in (None, ''): + try: + resolved_sheet_index = int(normalized_sheet_index) + except ValueError as exc: + raise ValueError( + f"sheet_index must be an integer for workbook '{blob_name}'." + ) from exc + + if resolved_sheet_index < 0 or resolved_sheet_index >= len(available_sheets): + raise ValueError( + f"sheet_index {resolved_sheet_index} is out of range for workbook '{blob_name}'. " + f"Available sheets: {available_sheets}." + ) + return available_sheets[resolved_sheet_index], workbook_metadata + + if len(available_sheets) == 1: + return available_sheets[0], workbook_metadata + + # Use pre-selected default sheet if one was set by the orchestration layer + override_key = (container_name, blob_name) + if override_key in self._default_sheet_overrides: + override_sheet = self._default_sheet_overrides[override_key] + for candidate in available_sheets: + if candidate == override_sheet or candidate.lower() == override_sheet.lower(): + return candidate, workbook_metadata + + if require_explicit_sheet: + raise ValueError( + f"Workbook '{blob_name}' has multiple sheets: {available_sheets}. " + "Specify sheet_name or sheet_index on analytical calls." 
+ ) + + return workbook_metadata.get('default_sheet'), workbook_metadata + + def _filter_rows_across_sheets( + self, + container_name: str, + blob_name: str, + filename: str, + column: str, + operator_str: str, + value: str, + max_rows: int = 100, + ) -> Optional[str]: + """Search for matching rows across all sheets that contain the requested column. + + Returns a combined JSON result when matches are found on any sheet, + or None if the workbook is not multi-sheet (caller should fall through). + """ + workbook_metadata = self._get_workbook_metadata(container_name, blob_name) + if not workbook_metadata.get('is_workbook'): + return None + + available_sheets = workbook_metadata.get('sheet_names', []) + if len(available_sheets) <= 1: + return None + + combined_results = [] + sheets_searched = [] + sheets_matched = [] + total_matches = 0 + + for sheet in available_sheets: + df = self._read_tabular_blob_to_dataframe( + container_name, + blob_name, + sheet_name=sheet, + ) + df = self._try_numeric_conversion(df) + + if column not in df.columns: + continue + + sheets_searched.append(sheet) + series = df[column] + op = operator_str.strip().lower() + + numeric_value = None + try: + numeric_value = float(value) + except (ValueError, TypeError): + pass + + if op in ('==', 'equals'): + if numeric_value is not None and pandas.api.types.is_numeric_dtype(series): + mask = series == numeric_value + else: + mask = series.astype(str).str.lower() == value.lower() + elif op == '!=': + if numeric_value is not None and pandas.api.types.is_numeric_dtype(series): + mask = series != numeric_value + else: + mask = series.astype(str).str.lower() != value.lower() + elif op == '>': + mask = series > numeric_value if numeric_value is not None else pandas.Series([False] * len(series)) + elif op == '<': + mask = series < numeric_value if numeric_value is not None else pandas.Series([False] * len(series)) + elif op == '>=': + mask = series >= numeric_value if numeric_value is not None else 
pandas.Series([False] * len(series)) + elif op == '<=': + mask = series <= numeric_value if numeric_value is not None else pandas.Series([False] * len(series)) + elif op == 'contains': + mask = series.astype(str).str.contains(value, case=False, na=False) + elif op == 'startswith': + mask = series.astype(str).str.lower().str.startswith(value.lower()) + elif op == 'endswith': + mask = series.astype(str).str.lower().str.endswith(value.lower()) + else: + continue + + sheet_matches = int(mask.sum()) + if sheet_matches == 0: + continue + + sheets_matched.append(sheet) + total_matches += sheet_matches + remaining_capacity = max(0, max_rows - len(combined_results)) + if remaining_capacity > 0: + filtered = df[mask].head(remaining_capacity) + for row in filtered.to_dict(orient='records'): + row['_sheet'] = sheet + combined_results.append(row) + + if not sheets_searched: + return None + + log_event( + f"[TabularProcessingPlugin] Cross-sheet filter_rows: " + f"searched {len(sheets_searched)} sheets, " + f"matched on {len(sheets_matched)} ({sheets_matched}), " + f"total_matches={total_matches}", + level=logging.INFO, + ) + + return json.dumps({ + "filename": filename, + "selected_sheet": "ALL (cross-sheet search)", + "sheets_searched": sheets_searched, + "sheets_matched": sheets_matched, + "total_matches": total_matches, + "returned_rows": len(combined_results), + "data": combined_results, + }, indent=2, default=str) + + def _lookup_value_across_sheets( + self, + container_name: str, + blob_name: str, + filename: str, + lookup_column: str, + lookup_value_str: str, + target_column: Optional[str] = None, + match_operator: str = "equals", + max_rows: int = 25, + ) -> Optional[str]: + """Look up matching rows across all sheets that contain the lookup column. + + Returns a combined JSON result when matches are found, + or None if the workbook is not multi-sheet. 
+ """ + workbook_metadata = self._get_workbook_metadata(container_name, blob_name) + if not workbook_metadata.get('is_workbook'): + return None + + available_sheets = workbook_metadata.get('sheet_names', []) + if len(available_sheets) <= 1: + return None + + combined_results = [] + sheets_searched = [] + sheets_matched = [] + total_matches = 0 + operator = (match_operator or 'equals').strip().lower() + normalized_lookup_value = str(lookup_value_str) + + for sheet in available_sheets: + df = self._read_tabular_blob_to_dataframe( + container_name, + blob_name, + sheet_name=sheet, + ) + df = self._try_numeric_conversion(df) + + if lookup_column not in df.columns: + continue + + sheets_searched.append(sheet) + series = df[lookup_column] + + if operator in {'equals', '=='}: + mask = series.astype(str).str.lower() == normalized_lookup_value.lower() + elif operator == 'contains': + mask = series.astype(str).str.contains(normalized_lookup_value, case=False, na=False) + elif operator == 'startswith': + mask = series.astype(str).str.lower().str.startswith(normalized_lookup_value.lower()) + elif operator == 'endswith': + mask = series.astype(str).str.lower().str.endswith(normalized_lookup_value.lower()) + else: + mask = series.astype(str).str.lower() == normalized_lookup_value.lower() + + sheet_matches = int(mask.sum()) + if sheet_matches == 0: + continue + + sheets_matched.append(sheet) + total_matches += sheet_matches + remaining_capacity = max(0, max_rows - len(combined_results)) + if remaining_capacity > 0: + matched_df = df[mask].head(remaining_capacity) + if target_column and target_column in df.columns: + for _, row in matched_df.iterrows(): + combined_results.append({ + '_sheet': sheet, + lookup_column: row[lookup_column], + target_column: row[target_column], + '_full_row': {str(k): v for k, v in row.to_dict().items()}, + }) + else: + for row in matched_df.to_dict(orient='records'): + row['_sheet'] = sheet + combined_results.append(row) + + if not sheets_searched: + 
return None + + log_event( + f"[TabularProcessingPlugin] Cross-sheet lookup_value: " + f"searched {len(sheets_searched)} sheets, " + f"matched on {len(sheets_matched)} ({sheets_matched}), " + f"total_matches={total_matches}", + level=logging.INFO, + ) + + return json.dumps({ + "filename": filename, + "selected_sheet": "ALL (cross-sheet search)", + "sheets_searched": sheets_searched, + "sheets_matched": sheets_matched, + "total_matches": total_matches, + "returned_rows": len(combined_results), + "data": combined_results, + }, indent=2, default=str) + + def _query_tabular_data_across_sheets( + self, + container_name: str, + blob_name: str, + filename: str, + query_expression: str, + max_rows: int = 100, + ) -> Optional[str]: + """Execute a pandas query expression across all sheets of a multi-sheet workbook. + + Returns a combined JSON result when any sheet produces matches, + or None if the workbook is not multi-sheet. + """ + workbook_metadata = self._get_workbook_metadata(container_name, blob_name) + if not workbook_metadata.get('is_workbook'): + return None + + available_sheets = workbook_metadata.get('sheet_names', []) + if len(available_sheets) <= 1: + return None + + combined_results = [] + sheets_searched = [] + sheets_matched = [] + total_matches = 0 + + for sheet in available_sheets: + df = self._read_tabular_blob_to_dataframe( + container_name, + blob_name, + sheet_name=sheet, + ) + df = self._try_numeric_conversion(df) + + try: + result_df = df.query(query_expression) + except Exception: + # Query expression references columns not in this sheet — skip + continue + + sheets_searched.append(sheet) + sheet_matches = len(result_df) + if sheet_matches == 0: + continue + + sheets_matched.append(sheet) + total_matches += sheet_matches + remaining_capacity = max(0, max_rows - len(combined_results)) + if remaining_capacity > 0: + for row in result_df.head(remaining_capacity).to_dict(orient='records'): + row['_sheet'] = sheet + combined_results.append(row) + + if 
not sheets_searched: + return None + + log_event( + f"[TabularProcessingPlugin] Cross-sheet query_tabular_data: " + f"searched {len(sheets_searched)} sheets, " + f"matched on {len(sheets_matched)} ({sheets_matched}), " + f"total_matches={total_matches}", + level=logging.INFO, + ) + + return json.dumps({ + "filename": filename, + "selected_sheet": "ALL (cross-sheet search)", + "sheets_searched": sheets_searched, + "sheets_matched": sheets_matched, + "total_matches": total_matches, + "returned_rows": len(combined_results), + "data": combined_results, + }, indent=2, default=str) + + def _format_datetime_column_label(self, value) -> str: + """Render date-like Excel header labels into stable analysis-friendly strings.""" + timestamp_value = pandas.Timestamp(value) + + if ( + timestamp_value.hour == 0 + and timestamp_value.minute == 0 + and timestamp_value.second == 0 + and timestamp_value.microsecond == 0 + ): + if timestamp_value.day == 1: + return timestamp_value.strftime('%b-%y') + return timestamp_value.strftime('%Y-%m-%d') + + return timestamp_value.strftime('%Y-%m-%d %H:%M:%S') + + def _normalize_column_label(self, label, fallback_index: int) -> str: + """Convert arbitrary DataFrame column labels into stable string names.""" + if label is None or (not isinstance(label, str) and pandas.isna(label)): + return f"Column {fallback_index}" + + if isinstance(label, pandas.Timestamp): + return self._format_datetime_column_label(label) + + if isinstance(label, datetime): + return self._format_datetime_column_label(label) + + if isinstance(label, date): + return self._format_datetime_column_label(datetime.combine(label, datetime.min.time())) + + normalized_label = str(label).strip() + return normalized_label or f"Column {fallback_index}" + + def _normalize_dataframe_columns(self, df: pandas.DataFrame) -> pandas.DataFrame: + """Rename DataFrame columns to unique, JSON-safe string labels.""" + normalized_df = df.copy() + normalized_columns = [] + normalized_label_counts = {} 
+ + for column_index, column_label in enumerate(normalized_df.columns, start=1): + base_label = self._normalize_column_label(column_label, column_index) + occurrence_count = normalized_label_counts.get(base_label, 0) + 1 + normalized_label_counts[base_label] = occurrence_count + + if occurrence_count == 1: + normalized_columns.append(base_label) + else: + normalized_columns.append(f"{base_label} ({occurrence_count})") + + normalized_df.columns = normalized_columns + return normalized_df + + def _build_sheet_schema_summary(self, df: pandas.DataFrame, sheet_name: Optional[str], preview_rows: int = 3) -> dict: + """Build a compact schema summary for a single table or worksheet.""" + df = self._normalize_dataframe_columns(df) + df_numeric = self._try_numeric_conversion(df.copy()) + return { + 'selected_sheet': sheet_name, + 'row_count': len(df), + 'column_count': len(df.columns), + 'columns': list(df.columns), + 'dtypes': {col: str(dtype) for col, dtype in df_numeric.dtypes.items()}, + 'preview': df.head(preview_rows).to_dict(orient='records'), + 'null_counts': df.isnull().sum().to_dict(), + } + + def _build_workbook_schema_summary(self, container_name: str, blob_name: str, filename: str, preview_rows: int = 3) -> dict: + """Build a workbook-aware schema summary for prompt preload and file description.""" + workbook_metadata = self._get_workbook_metadata(container_name, blob_name) + if not workbook_metadata.get('is_workbook'): + df = self._read_tabular_blob_to_dataframe(container_name, blob_name) + summary = self._build_sheet_schema_summary(df, None, preview_rows=preview_rows) + summary.update({ + 'filename': filename, + 'is_workbook': False, + 'sheet_names': [], + 'sheet_count': 0, + }) + return summary + + per_sheet_schemas = {} + for workbook_sheet_name in workbook_metadata.get('sheet_names', []): + df = self._read_tabular_blob_to_dataframe( + container_name, + blob_name, + sheet_name=workbook_sheet_name, + ) + per_sheet_schemas[workbook_sheet_name] = 
self._build_sheet_schema_summary( + df, + workbook_sheet_name, + preview_rows=preview_rows, + ) + + return { + 'filename': filename, + 'is_workbook': True, + 'sheet_names': workbook_metadata.get('sheet_names', []), + 'sheet_count': workbook_metadata.get('sheet_count', 0), + 'selected_sheet': None, + 'per_sheet_schemas': per_sheet_schemas, + } + + def _find_candidate_sheets_for_columns( + self, + container_name: str, + blob_name: str, + column_names: List[str], + exclude_sheet: Optional[str] = None, + ) -> List[str]: + """Return workbook sheets that contain one or more requested columns, ordered by best match.""" + workbook_metadata = self._get_workbook_metadata(container_name, blob_name) + if not workbook_metadata.get('is_workbook'): + return [] + + normalized_targets = [] + seen_targets = set() + for column_name in column_names or []: + normalized_column_name = str(column_name or '').strip().lower() + if not normalized_column_name or normalized_column_name in seen_targets: + continue + seen_targets.add(normalized_column_name) + normalized_targets.append(normalized_column_name) + + if not normalized_targets: + return [] + + normalized_exclude_sheet = str(exclude_sheet or '').strip().lower() + ranked_candidates = [] + for sheet_name in workbook_metadata.get('sheet_names', []): + if normalized_exclude_sheet and sheet_name.lower() == normalized_exclude_sheet: + continue + + dataframe = self._read_tabular_blob_to_dataframe( + container_name, + blob_name, + sheet_name=sheet_name, + ) + normalized_columns = {str(column).strip().lower() for column in dataframe.columns} + matched_columns = [ + target_column for target_column in normalized_targets + if target_column in normalized_columns + ] + if not matched_columns: + continue + + ranked_candidates.append((len(matched_columns), sheet_name)) + + ranked_candidates.sort(key=lambda item: (-item[0], item[1].lower())) + return [sheet_name for _, sheet_name in ranked_candidates] + + def _build_missing_column_error_payload( + 
self, + container_name: str, + blob_name: str, + filename: str, + workbook_metadata: dict, + selected_sheet: Optional[str], + missing_column: str, + related_columns: Optional[List[str]] = None, + available_columns: Optional[List[str]] = None, + ) -> dict: + """Build a workbook-aware missing-column payload that points retries at better candidate sheets.""" + available_columns = available_columns or [] + payload = { + 'error': f"Column '{missing_column}' not found. Available: {available_columns}", + 'filename': filename, + 'missing_column': missing_column, + 'selected_sheet': selected_sheet if workbook_metadata.get('is_workbook') else None, + } + + if workbook_metadata.get('is_workbook') and workbook_metadata.get('sheet_count', 0) > 1: + candidate_sheets = self._find_candidate_sheets_for_columns( + container_name, + blob_name, + [missing_column] + list(related_columns or []), + exclude_sheet=selected_sheet, + ) + if candidate_sheets: + payload['candidate_sheets'] = candidate_sheets + payload['error'] = ( + f"Column '{missing_column}' not found on sheet '{selected_sheet}'. " + f"Available: {available_columns}. Candidate sheets: {candidate_sheets}" + ) + + return payload + + def _read_tabular_blob_to_dataframe( + self, + container_name: str, + blob_name: str, + sheet_name: Optional[str] = None, + sheet_index: Optional[str] = None, + require_explicit_sheet: bool = False, + ) -> pandas.DataFrame: + """Download a blob and read it into a pandas DataFrame. 
Uses per-instance cache.""" + resolved_sheet_name, workbook_metadata = self._resolve_sheet_selection( + container_name, + blob_name, + sheet_name=sheet_name, + sheet_index=sheet_index, + require_explicit_sheet=require_explicit_sheet, + ) + sheet_cache_key = resolved_sheet_name or '__default__' + cache_key = (container_name, blob_name, sheet_cache_key) + if cache_key in self._df_cache: + log_event( + f"[TabularProcessingPlugin] Cache hit for {blob_name}" + + (f" [{resolved_sheet_name}]" if resolved_sheet_name else ''), + level=logging.DEBUG, + ) + return self._df_cache[cache_key].copy() + + data = self._download_tabular_blob_bytes(container_name, blob_name) + + name_lower = blob_name.lower() + if name_lower.endswith('.csv'): + df = pandas.read_csv(io.BytesIO(data), keep_default_na=False, dtype=str) + elif name_lower.endswith('.xlsx') or name_lower.endswith('.xlsm'): + df = pandas.read_excel( + io.BytesIO(data), + engine='openpyxl', + keep_default_na=False, + dtype=str, + sheet_name=resolved_sheet_name, + ) + elif name_lower.endswith('.xls'): + df = pandas.read_excel( + io.BytesIO(data), + engine='xlrd', + keep_default_na=False, + dtype=str, + sheet_name=resolved_sheet_name, + ) + else: + raise ValueError(f"Unsupported tabular file type: {blob_name}") + + df = self._normalize_dataframe_columns(df) + self._df_cache[cache_key] = df + log_event( + f"[TabularProcessingPlugin] Cached DataFrame for {blob_name}" + + (f" [{resolved_sheet_name}]" if resolved_sheet_name else '') + + f" ({len(df)} rows)", + level=logging.DEBUG, + ) + return df.copy() + + def _try_numeric_conversion(self, df: pandas.DataFrame) -> pandas.DataFrame: + """Attempt to convert string columns to numeric where possible.""" + for col in df.columns: + if pandas.api.types.is_datetime64_any_dtype(df[col]) or pandas.api.types.is_timedelta64_dtype(df[col]): + continue + try: + df[col] = pandas.to_numeric(df[col]) + except (ValueError, TypeError): + pass + return df + + def _parse_datetime_like_series(self, 
series: pandas.Series) -> pandas.Series: + """Best-effort parsing for datetime and time-like values.""" + if pandas.api.types.is_datetime64_any_dtype(series): + return pandas.to_datetime(series, errors='coerce') + + cleaned_series = series.astype(str).str.strip() + cleaned_series = cleaned_series.replace({ + '': None, + 'nan': None, + 'NaN': None, + 'nat': None, + 'NaT': None, + 'none': None, + 'None': None, + }) + + parsed = pandas.Series(pandas.NaT, index=series.index, dtype='datetime64[ns]') + + common_formats = [ + '%m/%d/%Y %I:%M:%S %p', + '%m/%d/%Y %I:%M %p', + '%m/%d/%Y %H:%M:%S', + '%m/%d/%Y %H:%M', + '%Y-%m-%d %H:%M:%S', + '%Y-%m-%d %H:%M', + '%Y-%m-%dT%H:%M:%S', + '%Y-%m-%dT%H:%M:%S.%f', + '%Y-%m-%d', + '%m/%d/%Y', + ] + + for datetime_format in common_formats: + remaining_mask = parsed.isna() & cleaned_series.notna() + if not remaining_mask.any(): + break + + parsed.loc[remaining_mask] = pandas.to_datetime( + cleaned_series[remaining_mask], + format=datetime_format, + errors='coerce' + ) + + remaining_mask = parsed.isna() & cleaned_series.notna() + if remaining_mask.any(): + digits = cleaned_series[remaining_mask].str.replace(r'[^0-9]', '', regex=True) + + hhmm_mask = digits.str.match(r'^\d{3,4}$', na=False) + if hhmm_mask.any(): + hhmm_values = digits[hhmm_mask].str.zfill(4) + parsed.loc[hhmm_values.index] = pandas.to_datetime( + hhmm_values, + format='%H%M', + errors='coerce' + ) + + remaining_mask = parsed.isna() & cleaned_series.notna() + if remaining_mask.any(): + digits = cleaned_series[remaining_mask].str.replace(r'[^0-9]', '', regex=True) + hhmmss_mask = digits.str.match(r'^\d{5,6}$', na=False) + if hhmmss_mask.any(): + hhmmss_values = digits[hhmmss_mask].str.zfill(6) + parsed.loc[hhmmss_values.index] = pandas.to_datetime( + hhmmss_values, + format='%H%M%S', + errors='coerce' + ) + + remaining_mask = parsed.isna() & cleaned_series.notna() + if remaining_mask.any(): + with warnings.catch_warnings(): + warnings.simplefilter('ignore', UserWarning) + 
parsed.loc[remaining_mask] = pandas.to_datetime( + cleaned_series[remaining_mask], + errors='coerce' + ) + + return parsed + + def _normalize_datetime_component(self, component: str) -> str: + """Normalize datetime component aliases to a canonical value.""" + normalized = (component or '').strip().lower() + aliases = { + 'years': 'year', + 'months': 'month', + 'monthname': 'month_name', + 'month_name': 'month_name', + 'days': 'day', + 'dayofmonth': 'day', + 'dates': 'date', + 'hours': 'hour', + 'hour_of_day': 'hour', + 'timeofday': 'hour', + 'time_of_day': 'hour', + 'minutes': 'minute', + 'dayofweek': 'day_name', + 'day_of_week': 'day_name', + 'weekday': 'day_name', + 'weekday_name': 'day_name', + 'day_name': 'day_name', + 'weekdaynumber': 'weekday_number', + 'weekday_number': 'weekday_number', + 'quarters': 'quarter', + } + return aliases.get(normalized, normalized) + + def _extract_datetime_component(self, parsed_series: pandas.Series, component: str) -> pandas.Series: + """Extract a supported datetime component from a parsed datetime series.""" + normalized = self._normalize_datetime_component(component) + + if normalized == 'year': + return parsed_series.dt.year + if normalized == 'month': + return parsed_series.dt.month + if normalized == 'month_name': + month_names = parsed_series.dt.month_name() + ordered_months = pandas.Categorical( + month_names, + categories=self.MONTH_NAME_ORDER, + ordered=True + ) + return pandas.Series(ordered_months, index=parsed_series.index) + if normalized == 'day': + return parsed_series.dt.day + if normalized == 'date': + return parsed_series.dt.strftime('%Y-%m-%d') + if normalized == 'hour': + return parsed_series.dt.hour + if normalized == 'minute': + return parsed_series.dt.minute + if normalized == 'day_name': + day_names = parsed_series.dt.day_name() + ordered_days = pandas.Categorical( + day_names, + categories=self.DAY_NAME_ORDER, + ordered=True + ) + return pandas.Series(ordered_days, index=parsed_series.index) + if 
normalized == 'weekday_number': + return parsed_series.dt.dayofweek + if normalized == 'quarter': + return parsed_series.dt.quarter + if normalized == 'week': + return parsed_series.dt.isocalendar().week.astype(int) + + raise ValueError( + f"Unsupported datetime component '{component}'. " + "Use year, month, month_name, day, date, hour, minute, day_name, weekday_number, quarter, or week." + ) + + def _parse_boolean_argument(self, value, default=True) -> bool: + """Parse common string boolean values for plugin inputs.""" + if isinstance(value, bool): + return value + if value is None: + return default + + normalized = str(value).strip().lower() + if normalized in {'true', '1', 'yes', 'y', 'on'}: + return True + if normalized in {'false', '0', 'no', 'n', 'off'}: + return False + return default + + def _ordered_grouped_results(self, grouped: pandas.Series, component: str) -> pandas.Series: + """Return grouped results in a natural chronological order where possible.""" + normalized = self._normalize_datetime_component(component) + if normalized == 'day_name': + return grouped.reindex([day for day in self.DAY_NAME_ORDER if day in grouped.index]) + if normalized == 'month_name': + return grouped.reindex([month for month in self.MONTH_NAME_ORDER if month in grouped.index]) + return grouped.sort_index() + + def _series_to_json_dict(self, series: pandas.Series) -> dict: + """Convert a pandas Series into a JSON-safe dictionary.""" + safe_dict = {} + for index, value in series.items(): + safe_dict[str(index)] = value.item() if hasattr(value, 'item') else value + return safe_dict + + def _scalar_to_json_value(self, value): + """Convert a scalar value to a JSON-safe representation.""" + if pandas.isna(value): + return None + return value.item() if hasattr(value, 'item') else value + + def _build_grouped_summary(self, grouped: pandas.Series) -> dict: + """Build generic summary fields for grouped metric outputs.""" + if grouped.empty: + return {} + + descending_values = 
grouped.sort_values(ascending=False) + ascending_values = grouped.sort_values(ascending=True) + summary = { + 'highest_group': str(descending_values.index[0]), + 'highest_value': self._scalar_to_json_value(descending_values.iloc[0]), + 'lowest_group': str(ascending_values.index[0]), + 'lowest_value': self._scalar_to_json_value(ascending_values.iloc[0]), + 'average_group_value': self._scalar_to_json_value(grouped.mean()), + 'median_group_value': self._scalar_to_json_value(grouped.median()), + } + + if len(descending_values) > 1: + summary['second_highest_group'] = str(descending_values.index[1]) + summary['second_highest_value'] = self._scalar_to_json_value(descending_values.iloc[1]) + + return summary + + def _resolve_blob_location(self, user_id: str, conversation_id: str, filename: str, source: str, + group_id: str = None, public_workspace_id: str = None) -> tuple: + """Resolve container name and blob path from source type.""" + source = source.lower().strip() + if source == 'chat': + container = storage_account_personal_chat_container_name + blob_path = f"{user_id}/{conversation_id}/{filename}" + elif source == 'workspace': + container = storage_account_user_documents_container_name + blob_path = f"{user_id}/{filename}" + elif source == 'group': + if not group_id: + raise ValueError("group_id is required for source='group'") + container = storage_account_group_documents_container_name + blob_path = f"{group_id}/{filename}" + elif source == 'public': + if not public_workspace_id: + raise ValueError("public_workspace_id is required for source='public'") + container = storage_account_public_documents_container_name + blob_path = f"{public_workspace_id}/{filename}" + else: + raise ValueError(f"Unknown source '{source}'. 
Use 'workspace', 'chat', 'group', or 'public'.") + return container, blob_path + + def _resolve_blob_location_with_fallback(self, user_id: str, conversation_id: str, filename: str, source: str, + group_id: str = None, public_workspace_id: str = None) -> tuple: + """Try primary source first, then fall back to other containers if blob not found.""" + source = source.lower().strip() + attempts = [] + + # Primary attempt based on specified source + try: + primary = self._resolve_blob_location(user_id, conversation_id, filename, source, group_id, public_workspace_id) + attempts.append(primary) + except ValueError: + pass + + # Fallback attempts in priority order (skip the primary source) + if source != 'workspace': + attempts.append((storage_account_user_documents_container_name, f"{user_id}/{filename}")) + if source != 'group' and group_id: + attempts.append((storage_account_group_documents_container_name, f"{group_id}/{filename}")) + if source != 'public' and public_workspace_id: + attempts.append((storage_account_public_documents_container_name, f"{public_workspace_id}/{filename}")) + if source != 'chat': + attempts.append((storage_account_personal_chat_container_name, f"{user_id}/{conversation_id}/{filename}")) + + client = self._get_blob_service_client() + for container, blob_path in attempts: + try: + blob_client = client.get_blob_client(container=container, blob=blob_path) + if blob_client.exists(): + log_event(f"[TabularProcessingPlugin] Found blob at {container}/{blob_path}", level=logging.DEBUG) + return container, blob_path + except Exception: + continue + + # If nothing found, return primary for the original error message + if attempts: + return attempts[0] + raise ValueError(f"Could not resolve blob location for {filename}") + + @kernel_function( + description=( + "List all tabular data files available for a user. 
Checks workspace documents " + "(user-documents container), chat-uploaded documents (personal-chat container), " + "and optionally group or public workspace documents. " + "Returns a JSON list of available files with their source." + ), + name="list_tabular_files" + ) + @plugin_function_logger("TabularProcessingPlugin") + async def list_tabular_files( + self, + user_id: Annotated[str, "The user ID (from Scope ID in Conversation Metadata)"], + conversation_id: Annotated[str, "The conversation ID (from Conversation Metadata)"], + group_id: Annotated[Optional[str], "Group ID (for group workspace documents)"] = None, + public_workspace_id: Annotated[Optional[str], "Public workspace ID (for public workspace documents)"] = None, + ) -> Annotated[str, "JSON list of available tabular files"]: + """List all tabular files available for the user across all accessible containers.""" + def _sync_work(): + results = [] + try: + workspace_prefix = f"{user_id}/" + workspace_blobs = self._list_tabular_blobs( + storage_account_user_documents_container_name, workspace_prefix + ) + for blob in workspace_blobs: + filename = blob.split('/')[-1] + workbook_metadata = self._get_workbook_metadata( + storage_account_user_documents_container_name, + blob, + ) + results.append({ + "filename": filename, + "blob_path": blob, + "source": "workspace", + "container": storage_account_user_documents_container_name, + "sheet_names": workbook_metadata.get('sheet_names', []), + "sheet_count": workbook_metadata.get('sheet_count', 0), + }) + except Exception as e: + log_event(f"[TabularProcessingPlugin] Error listing workspace blobs: {e}", level=logging.WARNING) + + try: + chat_prefix = f"{user_id}/{conversation_id}/" + chat_blobs = self._list_tabular_blobs( + storage_account_personal_chat_container_name, chat_prefix + ) + for blob in chat_blobs: + filename = blob.split('/')[-1] + workbook_metadata = self._get_workbook_metadata( + storage_account_personal_chat_container_name, + blob, + ) + 
results.append({ + "filename": filename, + "blob_path": blob, + "source": "chat", + "container": storage_account_personal_chat_container_name, + "sheet_names": workbook_metadata.get('sheet_names', []), + "sheet_count": workbook_metadata.get('sheet_count', 0), + }) + except Exception as e: + log_event(f"[TabularProcessingPlugin] Error listing chat blobs: {e}", level=logging.WARNING) + + if group_id: + try: + group_prefix = f"{group_id}/" + group_blobs = self._list_tabular_blobs( + storage_account_group_documents_container_name, group_prefix + ) + for blob in group_blobs: + filename = blob.split('/')[-1] + workbook_metadata = self._get_workbook_metadata( + storage_account_group_documents_container_name, + blob, + ) + results.append({ + "filename": filename, + "blob_path": blob, + "source": "group", + "container": storage_account_group_documents_container_name, + "sheet_names": workbook_metadata.get('sheet_names', []), + "sheet_count": workbook_metadata.get('sheet_count', 0), + }) + except Exception as e: + log_event(f"[TabularProcessingPlugin] Error listing group blobs: {e}", level=logging.WARNING) + + if public_workspace_id: + try: + public_prefix = f"{public_workspace_id}/" + public_blobs = self._list_tabular_blobs( + storage_account_public_documents_container_name, public_prefix + ) + for blob in public_blobs: + filename = blob.split('/')[-1] + workbook_metadata = self._get_workbook_metadata( + storage_account_public_documents_container_name, + blob, + ) + results.append({ + "filename": filename, + "blob_path": blob, + "source": "public", + "container": storage_account_public_documents_container_name, + "sheet_names": workbook_metadata.get('sheet_names', []), + "sheet_count": workbook_metadata.get('sheet_count', 0), + }) + except Exception as e: + log_event(f"[TabularProcessingPlugin] Error listing public blobs: {e}", level=logging.WARNING) + + return json.dumps(results, indent=2) + return await asyncio.to_thread(_sync_work) + + @kernel_function( + description=( + 
"Get a summary of a tabular file including column names, row count, data types, " + "and a preview of the first few rows." + ), + name="describe_tabular_file" + ) + @plugin_function_logger("TabularProcessingPlugin") + async def describe_tabular_file( + self, + user_id: Annotated[str, "The user ID (from Scope ID in Conversation Metadata)"], + conversation_id: Annotated[str, "The conversation ID (from Conversation Metadata)"], + filename: Annotated[str, "The filename of the tabular file"], + sheet_name: Annotated[Optional[str], "Optional worksheet name for Excel files. When omitted on multi-sheet workbooks, the response returns workbook-level sheet schemas."] = None, + sheet_index: Annotated[Optional[str], "Optional zero-based worksheet index for Excel files. Ignored when sheet_name is provided."] = None, + source: Annotated[str, "Source: 'workspace', 'chat', 'group', or 'public'"] = "chat", + group_id: Annotated[Optional[str], "Group ID (for group workspace documents)"] = None, + public_workspace_id: Annotated[Optional[str], "Public workspace ID (for public workspace documents)"] = None, + ) -> Annotated[str, "JSON summary of the tabular file"]: + """Get schema and preview of a tabular file.""" + def _sync_work(): + try: + container, blob_path = self._resolve_blob_location_with_fallback( + user_id, conversation_id, filename, source, + group_id=group_id, public_workspace_id=public_workspace_id + ) + workbook_metadata = self._get_workbook_metadata(container, blob_path) + + if workbook_metadata.get('is_workbook') and workbook_metadata.get('sheet_count', 0) > 1 and not (sheet_name or sheet_index): + summary = self._build_workbook_schema_summary( + container, + blob_path, + filename, + preview_rows=3, + ) + else: + selected_sheet, workbook_metadata = self._resolve_sheet_selection( + container, + blob_path, + sheet_name=sheet_name, + sheet_index=sheet_index, + require_explicit_sheet=False, + ) + df = self._read_tabular_blob_to_dataframe( + container, + blob_path, + 
sheet_name=selected_sheet, + require_explicit_sheet=False, + ) + summary = self._build_sheet_schema_summary(df, selected_sheet, preview_rows=5) + summary.update({ + "filename": filename, + "is_workbook": workbook_metadata.get('is_workbook', False), + "sheet_names": workbook_metadata.get('sheet_names', []), + "sheet_count": workbook_metadata.get('sheet_count', 0), + }) + + return json.dumps(summary, indent=2, default=str) + except Exception as e: + log_event(f"[TabularProcessingPlugin] Error describing file: {e}", level=logging.WARNING) + return json.dumps({"error": str(e)}) + return await asyncio.to_thread(_sync_work) + + @kernel_function( + description=( + "Look up one or more rows by label/category in a tabular file and return the value from a target column. " + "Best for questions like 'What was Total Assets in Nov-25?' or 'What was Net Worth in Dec-25?'." + ), + name="lookup_value" + ) + @plugin_function_logger("TabularProcessingPlugin") + async def lookup_value( + self, + user_id: Annotated[str, "The user ID (from Scope ID in Conversation Metadata)"], + conversation_id: Annotated[str, "The conversation ID (from Conversation Metadata)"], + filename: Annotated[str, "The filename of the tabular file"], + lookup_column: Annotated[str, "The label/category column to search, such as Accounts or Category"], + lookup_value: Annotated[str, "The row label/category value to search for, such as Total Assets"], + target_column: Annotated[str, "The target column containing the desired value, such as Nov-25"], + match_operator: Annotated[str, "Match operator: equals, contains, startswith, endswith"] = "equals", + sheet_name: Annotated[Optional[str], "Optional worksheet name for Excel files. Required for analytical calls on multi-sheet workbooks unless sheet_index is provided."] = None, + sheet_index: Annotated[Optional[str], "Optional zero-based worksheet index for Excel files. 
Ignored when sheet_name is provided."] = None, + source: Annotated[str, "Source: 'workspace', 'chat', 'group', or 'public'"] = "chat", + max_rows: Annotated[str, "Maximum matching rows to return"] = "25", + group_id: Annotated[Optional[str], "Group ID (for group workspace documents)"] = None, + public_workspace_id: Annotated[Optional[str], "Public workspace ID (for public workspace documents)"] = None, + ) -> Annotated[str, "JSON result containing matching rows and target-column values"]: + """Look up values from a target column for matching rows.""" + def _sync_work(): + try: + container, blob_path = self._resolve_blob_location_with_fallback( + user_id, conversation_id, filename, source, + group_id=group_id, public_workspace_id=public_workspace_id + ) + # When no explicit sheet_name is given, try cross-sheet search first + normalized_sheet = (sheet_name or '').strip() + normalized_sheet_idx = None if sheet_index is None else str(sheet_index).strip() + if not normalized_sheet and normalized_sheet_idx in (None, ''): + cross_sheet_result = self._lookup_value_across_sheets( + container, blob_path, filename, + lookup_column, lookup_value, target_column, + match_operator=match_operator, + max_rows=int(max_rows), + ) + if cross_sheet_result is not None: + return cross_sheet_result + selected_sheet, workbook_metadata = self._resolve_sheet_selection( + container, + blob_path, + sheet_name=sheet_name, + sheet_index=sheet_index, + require_explicit_sheet=True, + ) + df = self._read_tabular_blob_to_dataframe( + container, + blob_path, + sheet_name=selected_sheet, + require_explicit_sheet=True, + ) + df = self._try_numeric_conversion(df) + + if lookup_column not in df.columns: + return json.dumps( + self._build_missing_column_error_payload( + container, + blob_path, + filename, + workbook_metadata, + selected_sheet, + lookup_column, + related_columns=[target_column], + available_columns=list(df.columns), + ) + ) + if target_column not in df.columns: + return json.dumps( + 
self._build_missing_column_error_payload( + container, + blob_path, + filename, + workbook_metadata, + selected_sheet, + target_column, + related_columns=[lookup_column], + available_columns=list(df.columns), + ) + ) + + series = df[lookup_column] + operator = (match_operator or 'equals').strip().lower() + normalized_lookup_value = str(lookup_value) + + if operator in {'equals', '=='}: + mask = series.astype(str).str.lower() == normalized_lookup_value.lower() + elif operator == 'contains': + mask = series.astype(str).str.contains(normalized_lookup_value, case=False, na=False) + elif operator == 'startswith': + mask = series.astype(str).str.lower().str.startswith(normalized_lookup_value.lower()) + elif operator == 'endswith': + mask = series.astype(str).str.lower().str.endswith(normalized_lookup_value.lower()) + else: + return json.dumps({"error": f"Unsupported match_operator: {match_operator}"}) + + limit = int(max_rows) + matches = df[mask].head(limit) + response = { + "filename": filename, + "selected_sheet": selected_sheet if workbook_metadata.get('is_workbook') else None, + "lookup_column": lookup_column, + "lookup_value": lookup_value, + "target_column": target_column, + "match_operator": operator, + "total_matches": int(mask.sum()), + "returned_rows": len(matches), + "data": matches.to_dict(orient='records'), + } + + if len(matches) == 1: + response["value"] = matches.iloc[0][target_column] + + return json.dumps(response, indent=2, default=str) + except Exception as e: + log_event(f"[TabularProcessingPlugin] Error looking up value: {e}", level=logging.WARNING) + return json.dumps({"error": str(e)}) + return await asyncio.to_thread(_sync_work) + + @kernel_function( + description=( + "Execute an aggregation operation on a column of a tabular file. " + "Supported operations: sum, mean, count, min, max, median, std, nunique, value_counts." 
+ ), + name="aggregate_column" + ) + @plugin_function_logger("TabularProcessingPlugin") + async def aggregate_column( + self, + user_id: Annotated[str, "The user ID (from Scope ID in Conversation Metadata)"], + conversation_id: Annotated[str, "The conversation ID (from Conversation Metadata)"], + filename: Annotated[str, "The filename of the tabular file"], + column: Annotated[str, "The column name to aggregate"], + operation: Annotated[str, "Aggregation: sum, mean, count, min, max, median, std, nunique, value_counts"], + sheet_name: Annotated[Optional[str], "Optional worksheet name for Excel files. Required for analytical calls on multi-sheet workbooks unless sheet_index is provided."] = None, + sheet_index: Annotated[Optional[str], "Optional zero-based worksheet index for Excel files. Ignored when sheet_name is provided."] = None, + source: Annotated[str, "Source: 'workspace', 'chat', 'group', or 'public'"] = "chat", + group_id: Annotated[Optional[str], "Group ID (for group workspace documents)"] = None, + public_workspace_id: Annotated[Optional[str], "Public workspace ID (for public workspace documents)"] = None, + ) -> Annotated[str, "JSON result of the aggregation"]: + """Execute an aggregation operation on a column.""" + def _sync_work(): + try: + container, blob_path = self._resolve_blob_location_with_fallback( + user_id, conversation_id, filename, source, + group_id=group_id, public_workspace_id=public_workspace_id + ) + selected_sheet, workbook_metadata = self._resolve_sheet_selection( + container, + blob_path, + sheet_name=sheet_name, + sheet_index=sheet_index, + require_explicit_sheet=True, + ) + df = self._read_tabular_blob_to_dataframe( + container, + blob_path, + sheet_name=selected_sheet, + require_explicit_sheet=True, + ) + df = self._try_numeric_conversion(df) + + if column not in df.columns: + return json.dumps( + self._build_missing_column_error_payload( + container, + blob_path, + filename, + workbook_metadata, + selected_sheet, + column, + 
available_columns=list(df.columns), + ) + ) + + series = df[column] + op = operation.lower().strip() + + if op == 'sum': + result = series.sum() + elif op == 'mean': + result = series.mean() + elif op == 'count': + result = series.count() + elif op == 'min': + result = series.min() + elif op == 'max': + result = series.max() + elif op == 'median': + result = series.median() + elif op == 'std': + result = series.std() + elif op == 'nunique': + result = series.nunique() + elif op == 'value_counts': + result = self._series_to_json_dict(series.value_counts()) + else: + return json.dumps({"error": f"Unsupported operation: {operation}. Use sum, mean, count, min, max, median, std, nunique, value_counts."}) + + # Convert numpy scalars to native Python values so json.dumps(default=str) does not stringify integer results + return json.dumps({ + "filename": filename, + "selected_sheet": selected_sheet if workbook_metadata.get('is_workbook') else None, + "column": column, + "operation": op, + "result": result if isinstance(result, dict) else self._scalar_to_json_value(result), + }, indent=2, default=str) + except Exception as e: + log_event(f"[TabularProcessingPlugin] Error aggregating column: {e}", level=logging.WARNING) + return json.dumps({"error": str(e)}) + return await asyncio.to_thread(_sync_work) + + @kernel_function( + description=( + "Filter rows in a tabular file based on conditions and return matching rows. " + "Supports operators: ==, !=, >, <, >=, <=, contains, startswith, endswith." + ), + name="filter_rows" + ) + @plugin_function_logger("TabularProcessingPlugin") + async def filter_rows( + self, + user_id: Annotated[str, "The user ID (from Scope ID in Conversation Metadata)"], + conversation_id: Annotated[str, "The conversation ID (from Conversation Metadata)"], + filename: Annotated[str, "The filename of the tabular file"], + column: Annotated[str, "The column to filter on"], + operator: Annotated[str, "Operator: ==, !=, >, <, >=, <=, contains, startswith, endswith"], + value: Annotated[str, "The value to compare against"], + sheet_name: Annotated[Optional[str], "Optional worksheet name for Excel files. 
Required for analytical calls on multi-sheet workbooks unless sheet_index is provided."] = None, + sheet_index: Annotated[Optional[str], "Optional zero-based worksheet index for Excel files. Ignored when sheet_name is provided."] = None, + source: Annotated[str, "Source: 'workspace', 'chat', 'group', or 'public'"] = "chat", + max_rows: Annotated[str, "Maximum rows to return"] = "100", + group_id: Annotated[Optional[str], "Group ID (for group workspace documents)"] = None, + public_workspace_id: Annotated[Optional[str], "Public workspace ID (for public workspace documents)"] = None, + ) -> Annotated[str, "JSON list of matching rows"]: + """Filter rows based on a condition.""" + def _sync_work(): + try: + container, blob_path = self._resolve_blob_location_with_fallback( + user_id, conversation_id, filename, source, + group_id=group_id, public_workspace_id=public_workspace_id + ) + # When no explicit sheet_name is given, try cross-sheet search first + normalized_sheet = (sheet_name or '').strip() + normalized_sheet_idx = None if sheet_index is None else str(sheet_index).strip() + if not normalized_sheet and normalized_sheet_idx in (None, ''): + cross_sheet_result = self._filter_rows_across_sheets( + container, blob_path, filename, column, operator, value, + max_rows=int(max_rows), + ) + if cross_sheet_result is not None: + return cross_sheet_result + selected_sheet, workbook_metadata = self._resolve_sheet_selection( + container, + blob_path, + sheet_name=sheet_name, + sheet_index=sheet_index, + require_explicit_sheet=True, + ) + df = self._read_tabular_blob_to_dataframe( + container, + blob_path, + sheet_name=selected_sheet, + require_explicit_sheet=True, + ) + df = self._try_numeric_conversion(df) + + if column not in df.columns: + return json.dumps({"error": f"Column '{column}' not found. 
Available: {list(df.columns)}"}) + + series = df[column] + op = operator.strip().lower() + + numeric_value = None + try: + numeric_value = float(value) + except (ValueError, TypeError): + pass + + # Ordering comparisons need a numeric value; fail with a clear message instead of comparing against None + if op in {'>', '<', '>=', '<='} and numeric_value is None: + return json.dumps({"error": f"Operator '{operator}' requires a numeric value, got '{value}'."}) + + if op == '==' or op == 'equals': + if numeric_value is not None and pandas.api.types.is_numeric_dtype(series): + mask = series == numeric_value + else: + mask = series.astype(str).str.lower() == value.lower() + elif op == '!=': + if numeric_value is not None and pandas.api.types.is_numeric_dtype(series): + mask = series != numeric_value + else: + mask = series.astype(str).str.lower() != value.lower() + elif op == '>': + mask = series > numeric_value + elif op == '<': + mask = series < numeric_value + elif op == '>=': + mask = series >= numeric_value + elif op == '<=': + mask = series <= numeric_value + elif op == 'contains': + mask = series.astype(str).str.contains(value, case=False, na=False) + elif op == 'startswith': + mask = series.astype(str).str.lower().str.startswith(value.lower()) + elif op == 'endswith': + mask = series.astype(str).str.lower().str.endswith(value.lower()) + else: + return json.dumps({"error": f"Unsupported operator: {operator}"}) + + limit = int(max_rows) + filtered = df[mask].head(limit) + return json.dumps({ + "filename": filename, + "selected_sheet": selected_sheet if workbook_metadata.get('is_workbook') else None, + "total_matches": int(mask.sum()), + "returned_rows": len(filtered), + "data": filtered.to_dict(orient='records') + }, indent=2, default=str) + except Exception as e: + log_event(f"[TabularProcessingPlugin] Error filtering rows: {e}", level=logging.WARNING) + return json.dumps({"error": str(e)}) + return await asyncio.to_thread(_sync_work) + + @kernel_function( + description=( + "Execute a pandas query expression against a tabular file for advanced analysis. " + "The query string uses pandas DataFrame.query() syntax. 
" + "Examples: 'Age > 30 and State == \"CA\"', 'Price < 100'" + ), + name="query_tabular_data" + ) + @plugin_function_logger("TabularProcessingPlugin") + async def query_tabular_data( + self, + user_id: Annotated[str, "The user ID (from Scope ID in Conversation Metadata)"], + conversation_id: Annotated[str, "The conversation ID (from Conversation Metadata)"], + filename: Annotated[str, "The filename of the tabular file"], + query_expression: Annotated[str, "Pandas query expression (e.g. 'Age > 30 and State == \"CA\"')"], + sheet_name: Annotated[Optional[str], "Optional worksheet name for Excel files. Required for analytical calls on multi-sheet workbooks unless sheet_index is provided."] = None, + sheet_index: Annotated[Optional[str], "Optional zero-based worksheet index for Excel files. Ignored when sheet_name is provided."] = None, + source: Annotated[str, "Source: 'workspace', 'chat', 'group', or 'public'"] = "chat", + max_rows: Annotated[str, "Maximum rows to return"] = "100", + group_id: Annotated[Optional[str], "Group ID (for group workspace documents)"] = None, + public_workspace_id: Annotated[Optional[str], "Public workspace ID (for public workspace documents)"] = None, + ) -> Annotated[str, "JSON result of the query"]: + """Execute a pandas query expression against a tabular file.""" + def _sync_work(): + try: + container, blob_path = self._resolve_blob_location_with_fallback( + user_id, conversation_id, filename, source, + group_id=group_id, public_workspace_id=public_workspace_id + ) + # When no explicit sheet_name is given, try cross-sheet query first + normalized_sheet = (sheet_name or '').strip() + normalized_sheet_idx = None if sheet_index is None else str(sheet_index).strip() + if not normalized_sheet and normalized_sheet_idx in (None, ''): + cross_sheet_result = self._query_tabular_data_across_sheets( + container, blob_path, filename, query_expression, + max_rows=int(max_rows), + ) + if cross_sheet_result is not None: + return cross_sheet_result + 
selected_sheet, workbook_metadata = self._resolve_sheet_selection( + container, + blob_path, + sheet_name=sheet_name, + sheet_index=sheet_index, + require_explicit_sheet=True, + ) + df = self._read_tabular_blob_to_dataframe( + container, + blob_path, + sheet_name=selected_sheet, + require_explicit_sheet=True, + ) + df = self._try_numeric_conversion(df) + + result_df = df.query(query_expression) + limit = int(max_rows) + return json.dumps({ + "filename": filename, + "selected_sheet": selected_sheet if workbook_metadata.get('is_workbook') else None, + "total_matches": len(result_df), + "returned_rows": min(len(result_df), limit), + "data": result_df.head(limit).to_dict(orient='records') + }, indent=2, default=str) + except Exception as e: + log_event(f"[TabularProcessingPlugin] Error querying data: {e}", level=logging.WARNING) + return json.dumps({"error": f"Query error: {str(e)}. Ensure column names and values are correct."}) + return await asyncio.to_thread(_sync_work) + + @kernel_function( + description=( + "Perform a group-by aggregation on a tabular file. " + "Groups data by one column and aggregates another column. " + "Supported operations: sum, mean, count, min, max, median, std. " + "Returns top grouped results plus highest and lowest group summary fields." + ), + name="group_by_aggregate" + ) + @plugin_function_logger("TabularProcessingPlugin") + async def group_by_aggregate( + self, + user_id: Annotated[str, "The user ID (from Scope ID in Conversation Metadata)"], + conversation_id: Annotated[str, "The conversation ID (from Conversation Metadata)"], + filename: Annotated[str, "The filename of the tabular file"], + group_by_column: Annotated[str, "The column to group by"], + aggregate_column: Annotated[str, "The column to aggregate"], + operation: Annotated[str, "Aggregation operation: sum, mean, count, min, max, median, std"], + sheet_name: Annotated[Optional[str], "Optional worksheet name for Excel files. 
Required for analytical calls on multi-sheet workbooks unless sheet_index is provided."] = None, + sheet_index: Annotated[Optional[str], "Optional zero-based worksheet index for Excel files. Ignored when sheet_name is provided."] = None, + source: Annotated[str, "Source: 'workspace', 'chat', 'group', or 'public'"] = "chat", + top_n: Annotated[str, "How many top groups to return in descending or ascending order"] = "10", + sort_descending: Annotated[str, "Whether top_results should be sorted descending (true/false)"] = "true", + group_id: Annotated[Optional[str], "Group ID (for group workspace documents)"] = None, + public_workspace_id: Annotated[Optional[str], "Public workspace ID (for public workspace documents)"] = None, + ) -> Annotated[str, "JSON result of the group-by aggregation"]: + """Group by one column and aggregate another.""" + def _sync_work(): + try: + container, blob_path = self._resolve_blob_location_with_fallback( + user_id, conversation_id, filename, source, + group_id=group_id, public_workspace_id=public_workspace_id + ) + selected_sheet, workbook_metadata = self._resolve_sheet_selection( + container, + blob_path, + sheet_name=sheet_name, + sheet_index=sheet_index, + require_explicit_sheet=True, + ) + df = self._read_tabular_blob_to_dataframe( + container, + blob_path, + sheet_name=selected_sheet, + require_explicit_sheet=True, + ) + df = self._try_numeric_conversion(df) + + for col in [group_by_column, aggregate_column]: + if col not in df.columns: + related_columns = [group_by_column, aggregate_column] + related_columns = [column_name for column_name in related_columns if column_name != col] + return json.dumps( + self._build_missing_column_error_payload( + container, + blob_path, + filename, + workbook_metadata, + selected_sheet, + col, + related_columns=related_columns, + available_columns=list(df.columns), + ) + ) + + op = operation.lower().strip() + if op not in {'count', 'sum', 'mean', 'min', 'max', 'median', 'std'}: + return json.dumps({ 
+ "error": "Unsupported operation. Use count, sum, mean, min, max, median, or std." + }) + + grouped = df.groupby(group_by_column)[aggregate_column].agg(op) + grouped = grouped.dropna() + if grouped.empty: + return json.dumps({"error": "No grouped results were produced."}) + + top_limit = max(1, int(top_n)) + descending = self._parse_boolean_argument(sort_descending, default=True) + top_results = grouped.sort_values(ascending=not descending).head(top_limit) + ordered_results = grouped.sort_index() + summary = self._build_grouped_summary(grouped) + + return json.dumps({ + "filename": filename, + "selected_sheet": selected_sheet if workbook_metadata.get('is_workbook') else None, + "group_by": group_by_column, + "aggregate_column": aggregate_column, + "operation": op, + "groups": len(grouped), + "top_results": self._series_to_json_dict(top_results), + "result": self._series_to_json_dict(ordered_results), + **summary, + }, indent=2, default=str) + except Exception as e: + log_event(f"[TabularProcessingPlugin] Error in group-by: {e}", level=logging.WARNING) + return json.dumps({"error": str(e)}) + return await asyncio.to_thread(_sync_work) + + @kernel_function( + description=( + "Group a tabular file by a component extracted from a datetime-like column and aggregate a metric. " + "Use this for time-based questions such as peak hours, busiest weekdays, or monthly trends. " + "Supported datetime components: year, month, month_name, day, date, hour, minute, day_name, " + "weekday_number, quarter, week. Supported operations: count, sum, mean, min, max, median, std. " + "An optional pandas query filter can be applied before grouping. Returns top grouped results plus highest and lowest summary fields." 
+ ), + name="group_by_datetime_component" + ) + @plugin_function_logger("TabularProcessingPlugin") + async def group_by_datetime_component( + self, + user_id: Annotated[str, "The user ID (from Scope ID in Conversation Metadata)"], + conversation_id: Annotated[str, "The conversation ID (from Conversation Metadata)"], + filename: Annotated[str, "The filename of the tabular file"], + datetime_column: Annotated[str, "The datetime-like column to extract a component from"], + datetime_component: Annotated[str, "Component: year, month, month_name, day, date, hour, minute, day_name, weekday_number, quarter, or week"], + aggregate_column: Annotated[Optional[str], "The numeric column to aggregate. Leave empty and use operation='count' to count rows."] = "", + operation: Annotated[str, "Aggregation operation: count, sum, mean, min, max, median, std"] = "count", + sheet_name: Annotated[Optional[str], "Optional worksheet name for Excel files. Required for analytical calls on multi-sheet workbooks unless sheet_index is provided."] = None, + sheet_index: Annotated[Optional[str], "Optional zero-based worksheet index for Excel files. 
Ignored when sheet_name is provided."] = None, + source: Annotated[str, "Source: 'workspace', 'chat', 'group', or 'public'"] = "chat", + filter_expression: Annotated[Optional[str], "Optional pandas query filter applied before grouping"] = "", + top_n: Annotated[str, "How many top groups to return in descending order"] = "10", + sort_descending: Annotated[str, "Whether top_results should be sorted descending (true/false)"] = "true", + group_id: Annotated[Optional[str], "Group ID (for group workspace documents)"] = None, + public_workspace_id: Annotated[Optional[str], "Public workspace ID (for public workspace documents)"] = None, + ) -> Annotated[str, "JSON result of the datetime component grouping analysis"]: + """Group data by a datetime component and aggregate a metric.""" + def _sync_work(): + try: + container, blob_path = self._resolve_blob_location_with_fallback( + user_id, + conversation_id, + filename, + source, + group_id=group_id, + public_workspace_id=public_workspace_id + ) + selected_sheet, workbook_metadata = self._resolve_sheet_selection( + container, + blob_path, + sheet_name=sheet_name, + sheet_index=sheet_index, + require_explicit_sheet=True, + ) + df = self._read_tabular_blob_to_dataframe( + container, + blob_path, + sheet_name=selected_sheet, + require_explicit_sheet=True, + ) + df = self._try_numeric_conversion(df) + + if filter_expression: + try: + df = df.query(filter_expression) + except Exception as query_error: + return json.dumps({ + "error": f"Filter query error: {query_error}. Ensure column names and values are correct." 
+ }) + + if datetime_column not in df.columns: + related_columns = [aggregate_column] if aggregate_column else [] + return json.dumps( + self._build_missing_column_error_payload( + container, + blob_path, + filename, + workbook_metadata, + selected_sheet, + datetime_column, + related_columns=related_columns, + available_columns=list(df.columns), + ) + ) + + parsed_datetime = self._parse_datetime_like_series(df[datetime_column]) + valid_mask = parsed_datetime.notna() + if not valid_mask.any(): + return json.dumps({ + "error": ( + f"Could not parse any datetime values from column '{datetime_column}'. " + "Try a different datetime column or inspect the file schema preview." + ) + }) + + filtered_df = df.loc[valid_mask].copy() + parsed_datetime = parsed_datetime.loc[valid_mask] + component_values = self._extract_datetime_component(parsed_datetime, datetime_component) + + component_column_name = f"__datetime_component_{self._normalize_datetime_component(datetime_component)}" + filtered_df[component_column_name] = component_values + + op = (operation or 'count').strip().lower() + if op not in {'count', 'sum', 'mean', 'min', 'max', 'median', 'std'}: + return json.dumps({ + "error": "Unsupported operation. Use count, sum, mean, min, max, median, or std." + }) + + aggregate_column_name = (aggregate_column or '').strip() + if op == 'count' and not aggregate_column_name: + grouped = filtered_df.groupby(component_column_name).size() + else: + if not aggregate_column_name: + return json.dumps({ + "error": "aggregate_column is required unless operation='count'." 
+ }) + if aggregate_column_name not in filtered_df.columns: + return json.dumps( + self._build_missing_column_error_payload( + container, + blob_path, + filename, + workbook_metadata, + selected_sheet, + aggregate_column_name, + related_columns=[datetime_column], + available_columns=list(filtered_df.columns), + ) + ) + grouped = filtered_df.groupby(component_column_name)[aggregate_column_name].agg(op) + + grouped = grouped.dropna() + if grouped.empty: + return json.dumps({ + "error": "No grouped results were produced after filtering and datetime parsing." + }) + + top_limit = max(1, int(top_n)) + descending = self._parse_boolean_argument(sort_descending, default=True) + top_results = grouped.sort_values(ascending=not descending).head(top_limit) + ordered_results = self._ordered_grouped_results(grouped, datetime_component) + summary = self._build_grouped_summary(grouped) + + return json.dumps({ + "filename": filename, + "selected_sheet": selected_sheet if workbook_metadata.get('is_workbook') else None, + "datetime_column": datetime_column, + "datetime_component": self._normalize_datetime_component(datetime_component), + "aggregate_column": aggregate_column_name or None, + "operation": op, + "filter_expression": filter_expression or None, + "parsed_rows": int(valid_mask.sum()), + "dropped_rows": int((~valid_mask).sum()), + "groups": int(len(grouped)), + "top_results": self._series_to_json_dict(top_results), + "result": self._series_to_json_dict(ordered_results), + **summary, + }, indent=2, default=str) + except Exception as e: + log_event(f"[TabularProcessingPlugin] Error in datetime component grouping: {e}", level=logging.WARNING) + return json.dumps({"error": str(e)}) + return await asyncio.to_thread(_sync_work) diff --git a/application/single_app/simplechat_scheduler.py b/application/single_app/simplechat_scheduler.py new file mode 100644 index 00000000..2435227f --- /dev/null +++ b/application/single_app/simplechat_scheduler.py @@ -0,0 +1,38 @@ +# 
simplechat_scheduler.py + +"""Dedicated scheduler entrypoint for SimpleChat background tasks.""" + +import logging +import os +import sys + +import app_settings_cache +from background_tasks import run_scheduler_forever +from config import get_redis_cache_infrastructure_endpoint, initialize_clients +from functions_appinsights import setup_appinsights_logging +from functions_settings import get_settings + + +def initialize_scheduler_runtime(): + """Prepare settings cache, clients, and logging for scheduler execution.""" + print('Initializing SimpleChat scheduler runtime...') + settings = get_settings(use_cosmos=True) + redis_hostname = settings.get('redis_url', '').strip().split('.')[0] + app_settings_cache.configure_app_cache( + settings, + get_redis_cache_infrastructure_endpoint(redis_hostname) + ) + app_settings_cache.update_settings_cache(settings) + initialize_clients(settings) + setup_appinsights_logging(settings) + logging.basicConfig(level=logging.DEBUG) + print('SimpleChat scheduler runtime initialized.') + + +if __name__ == '__main__': + try: + initialize_scheduler_runtime() + run_scheduler_forever() + except KeyboardInterrupt: + print('SimpleChat scheduler stopped.') + sys.exit(0) \ No newline at end of file diff --git a/application/single_app/static/css/chats.css b/application/single_app/static/css/chats.css index 38e11c3a..8bdd0711 100644 --- a/application/single_app/static/css/chats.css +++ b/application/single_app/static/css/chats.css @@ -20,6 +20,13 @@ z-index: 1050 !important; /* Ensure it's above other elements */ } +.chat-searchable-select .dropdown-menu.show { + display: block !important; + opacity: 1 !important; + visibility: visible !important; + z-index: 1050 !important; +} + /* Handle dropdown positioning at the edge of viewport */ #document-dropdown.dropup .dropdown-menu { bottom: 100% !important; @@ -39,6 +46,188 @@ right: auto !important; /* Prevent right positioning */ } +.chat-searchable-select { + min-width: 120px; +} + +.chat-toolbar { 
+ display: flex; + flex-wrap: wrap; + align-items: center; + gap: 0.75rem; +} + +.chat-toolbar-actions, +.chat-toolbar-controls, +.chat-toolbar-toggles, +.chat-toolbar-selectors { + display: flex; + align-items: center; + gap: 0.5rem; + min-width: 0; +} + +.chat-toolbar-actions { + flex: 1 1 320px; + flex-wrap: wrap; +} + +.chat-toolbar-controls { + flex: 1 1 540px; + flex-wrap: wrap; + justify-content: flex-end; + align-items: center; + margin-left: auto; +} + +.chat-toolbar-toggles { + flex: 0 0 auto; + flex-wrap: wrap; + justify-content: flex-end; +} + +.chat-toolbar-selectors { + flex: 1 1 460px; + flex-wrap: wrap; + justify-content: flex-end; +} + +.chat-toolbar-selector { + flex: 1 1 200px; + min-width: 180px; + max-width: 280px; +} + +.chat-toolbar-selector .chat-searchable-select { + width: 100%; + min-width: 0; +} + +#prompt-selection-container.chat-toolbar-selector { + max-width: 300px; +} + +#model-select-container.chat-toolbar-selector, +#agent-select-container.chat-toolbar-selector { + max-width: 230px; +} + +.chat-searchable-select .dropdown-menu { + min-width: 100%; + max-width: 360px; + max-height: 60vh; + overflow: hidden; + padding: 8px; +} + +.chat-searchable-select-button { + text-align: left; + position: relative; + overflow: hidden; + text-overflow: ellipsis; + white-space: nowrap; + padding-right: 1.5rem; +} + +.chat-searchable-select-text { + display: inline-block; + max-width: calc(100% - 20px); + overflow: hidden; + text-overflow: ellipsis; + white-space: nowrap; +} + +.chat-searchable-select-search, +.chat-dropdown-search { + padding: 0 0.25rem; +} + +.chat-searchable-select-items { + max-height: 40vh; + overflow-y: auto; + padding: 0; +} + +.chat-searchable-select-items .dropdown-item { + display: block; + width: 100%; + text-align: left; + white-space: nowrap; + overflow: hidden; + text-overflow: ellipsis; + cursor: pointer; + padding: 0.5rem 0.75rem; +} + +.chat-searchable-select-items .dropdown-item.active { + background-color: 
#e9ecef; + color: #212529; +} + +.chat-searchable-select-items .dropdown-item:hover { + background-color: #f8f9fa; +} + +.chat-searchable-select-items .dropdown-item.disabled { + color: #6c757d; + cursor: default; +} + +.chat-searchable-select-items .dropdown-item.disabled:hover { + background-color: transparent; +} + +#scope-dropdown-menu, +#tags-dropdown-menu { + overflow: hidden !important; +} + +@media (max-width: 1200px) { + .chat-toolbar-controls, + .chat-toolbar-selectors { + justify-content: flex-start; + margin-left: 0; + } +} + +@media (max-width: 768px) { + .chat-toolbar-controls { + flex-basis: 100%; + justify-content: flex-start; + margin-left: 0; + } + + .chat-toolbar-selectors { + flex-basis: 100%; + } + + .chat-toolbar-toggles { + width: 100%; + } + + .chat-toolbar-selector { + flex: 1 1 100%; + min-width: 0; + max-width: none; + } +} + +#scope-dropdown-items { + max-height: 320px; + overflow-y: auto; +} + +#tags-dropdown-items { + max-height: 220px; + overflow-y: auto; +} + +#scope-dropdown-items .dropdown-item, +#tags-dropdown-items .dropdown-item { + width: 100%; + text-align: left; +} + /* Document dropdown items must be explicitly displayed */ #document-dropdown .dropdown-item { display: block !important; @@ -455,6 +644,25 @@ body.layout-split .gutter { /* Dark grey text for better contrast */ } +.conversation-title-row { + min-width: 0; +} + +.conversation-unread-dot { + width: 0.625rem; + height: 0.625rem; + min-width: 0.625rem; + border-radius: 999px; + background-color: #198754; + box-shadow: 0 0 0 2px rgba(25, 135, 84, 0.15); + flex-shrink: 0; +} + +[data-bs-theme="dark"] .conversation-unread-dot { + background-color: #39d98a; + box-shadow: 0 0 0 2px rgba(57, 217, 138, 0.2); +} + .message-footer { /* position: absolute; bottom: 8px; left: 10px; right: 10px; */ /* Removed absolute positioning */ display: flex; @@ -770,6 +978,12 @@ a.citation-link:hover { justify-content: center; } +.search-btn, +.file-btn { + flex: 0 0 auto; + white-space: 
nowrap; +} + /* Hide the text initially */ .search-btn .search-btn-text { opacity: 0; @@ -1046,7 +1260,7 @@ a.citation-link:hover { .message-content { display: flex; align-items: flex-end; - overflow: visible; /* Allow dropdown menus to appear outside content */ + overflow: auto; /* Preserving higher level visible property while allowing response message scroll if needed */ } .message-content.flex-row-reverse { @@ -1258,8 +1472,7 @@ ol { align-items: flex-end; } #prompt-selection-container { - /* Prevent the container from growing vertically */ - align-self: flex-end; /* Align item itself to bottom */ + align-self: auto; } #prompt-select { /* Adjust max-width as needed */ @@ -1676,4 +1889,160 @@ mark.search-highlight { 100% { transform: scale(1.05); } +} + +/* ============================================= + Processing Thoughts + ============================================= */ + +/* Loading indicator thought text */ +.thought-live-text { + font-style: italic; + white-space: nowrap; + overflow: hidden; + text-overflow: ellipsis; + max-width: 300px; +} + +/* Toggle button in message footer */ +.thoughts-toggle-btn { + font-size: 0.9rem; + color: #6c757d; + padding: 0 0.25rem; + border: none; + background: none; + cursor: pointer; + transition: color 0.15s ease-in-out; +} + +.thoughts-toggle-btn:hover { + color: #ffc107; +} + +/* Collapsible container inside message bubble */ +.thoughts-container { + max-height: 300px; + overflow-y: auto; + font-size: 0.85rem; +} + +/* Timeline wrapper */ +.thoughts-list { + position: relative; + padding-left: 1.25rem; +} + +/* Vertical timeline line */ +.thoughts-list::before { + content: ''; + position: absolute; + left: 0.5rem; + top: 0.25rem; + bottom: 0.25rem; + width: 2px; + background: linear-gradient(to bottom, #0d6efd, #6ea8fe); + border-radius: 1px; +} + +/* Individual thought step */ +.thought-step { + display: flex; + align-items: flex-start; + padding-left: 0.75rem; + padding-top: 0.25rem; + padding-bottom: 0.25rem; + 
position: relative; +} + +/* Timeline node dot */ +.thought-step::before { + content: ''; + position: absolute; + left: -1rem; + top: 0.55rem; + width: 8px; + height: 8px; + border-radius: 50%; + background-color: #0d6efd; + border: 2px solid #fff; + box-shadow: 0 0 0 1px #0d6efd; + z-index: 1; +} + +/* Last thought step gets a slightly different dot */ +.thought-step:last-child::before { + background-color: #198754; + box-shadow: 0 0 0 1px #198754; +} + +.thought-step i { + flex-shrink: 0; + margin-top: 2px; +} + +/* Streaming cursor thought badge pulse animation */ +.animate-pulse { + animation: thought-pulse 1.5s ease-in-out infinite; +} + +/* Streaming thought display (before content arrives) */ +.streaming-thought-display { + display: flex; + align-items: center; + padding: 0.5rem 0; +} + +/* Light mode: use darker, more readable colors */ +.streaming-thought-display .badge { + background-color: rgba(13, 110, 253, 0.08) !important; + color: #0a58ca !important; + border-color: rgba(13, 110, 253, 0.25) !important; +} + +/* Dark mode: lighter accent colors */ +[data-bs-theme="dark"] .streaming-thought-display .badge { + background-color: rgba(13, 202, 240, 0.15) !important; + color: #6edff6 !important; + border-color: rgba(13, 202, 240, 0.3) !important; +} + +@keyframes thought-pulse { + 0%, 100% { + opacity: 1; + } + 50% { + opacity: 0.6; + } +} + +/* Dark mode overrides */ +[data-bs-theme="dark"] .thoughts-toggle-btn { + color: #adb5bd; +} + +[data-bs-theme="dark"] .thoughts-toggle-btn:hover { + color: #ffc107; +} + +[data-bs-theme="dark"] .thought-step { + /* Dark mode dot border matches dark background */ +} + +[data-bs-theme="dark"] .thought-step::before { + border-color: #212529; + background-color: #6ea8fe; + box-shadow: 0 0 0 1px #6ea8fe; +} + +[data-bs-theme="dark"] .thought-step:last-child::before { + background-color: #75b798; + box-shadow: 0 0 0 1px #75b798; +} + +[data-bs-theme="dark"] .thoughts-list::before { + background: linear-gradient(to bottom, 
#6ea8fe, #9ec5fe); +} + +[data-bs-theme="dark"] .thoughts-container { + border-top-color: #495057 !important; } \ No newline at end of file diff --git a/application/single_app/static/css/sidebar.css b/application/single_app/static/css/sidebar.css index ebc40910..932051c6 100644 --- a/application/single_app/static/css/sidebar.css +++ b/application/single_app/static/css/sidebar.css @@ -238,6 +238,16 @@ body.sidebar-nav-enabled.has-classification-banner .container-fluid { transition: color 0.2s ease; } +.sidebar-conversation-header { + min-width: 0; +} + +.sidebar-conversation-unread-dot { + width: 0.55rem; + height: 0.55rem; + min-width: 0.55rem; +} + /* Edit mode input styling */ .sidebar-conversation-item input.form-control { background: rgba(255, 255, 255, 0.95); diff --git a/application/single_app/static/css/styles.css b/application/single_app/static/css/styles.css index e537590d..eacc8859 100644 --- a/application/single_app/static/css/styles.css +++ b/application/single_app/static/css/styles.css @@ -502,6 +502,95 @@ main { flex-grow: 1; } +/* ============================================ + Item cards (agents/actions grid view) + ============================================ */ +.item-card { + cursor: default; + transition: all 0.3s ease; + border: 1px solid #dee2e6; + border-radius: 0.375rem; + background-color: #ffffff; +} + +.item-card:hover { + border-color: #adb5bd; + transform: translateY(-2px); + box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1); +} + +.item-card .card-title { + font-weight: 600; + font-size: 0.9rem; + color: #212529; +} + +.item-card .card-text { + color: #6c757d; + font-size: 0.8rem; + line-height: 1.4; +} + +.item-card .item-card-icon { + color: #0d6efd; +} + +.item-card .item-card-buttons { + border-top: 1px solid #f0f0f0; + padding-top: 0.5rem; +} + +/* Dark mode for item cards */ +[data-bs-theme="dark"] .item-card { + background-color: #343a40; + border: 1px solid #495057; + color: #e9ecef; +} + +[data-bs-theme="dark"] .item-card:hover { + 
background-color: #3d444b; + border-color: #6c757d; +} + +[data-bs-theme="dark"] .item-card .card-title { + color: #e9ecef; +} + +[data-bs-theme="dark"] .item-card .card-text { + color: #adb5bd; +} + +[data-bs-theme="dark"] .item-card .item-card-icon { + color: #6ea8fe; +} + +[data-bs-theme="dark"] .item-card .item-card-buttons { + border-top-color: #495057; +} + +/* Improved table column layout for agents and actions */ +.item-list-table th:nth-child(1), +.item-list-table td:nth-child(1) { + width: 28%; + min-width: 140px; +} + +.item-list-table th:nth-child(2), +.item-list-table td:nth-child(2) { + width: 47%; + max-width: 0; + overflow: hidden; + text-overflow: ellipsis; + white-space: nowrap; +} + +.item-list-table th:nth-child(3), +.item-list-table td:nth-child(3) { + width: 25%; + min-width: 160px; + white-space: nowrap; +} + /* Connection type buttons */ .connection-type-btn { border: 2px solid #dee2e6; @@ -854,3 +943,171 @@ main { [data-bs-theme="dark"] .message-content a:visited { color: #b399ff !important; /* Purple-ish for visited links */ } + +/* ============================================ + Rendered Markdown — table & code block styles + Shared by agent detail view, template preview, + and any non-chat area that renders Markdown. 
+ ============================================ */ + +/* --- Tables --- */ +.rendered-markdown table { + width: 100%; + max-width: 100%; + margin: 0.75rem 0; + border-collapse: collapse; + border-spacing: 0; + border: 1px solid #dee2e6; + border-radius: 0.375rem; + overflow: hidden; + background-color: var(--bs-body-bg); + box-shadow: 0 0.125rem 0.25rem rgba(0, 0, 0, 0.075); + font-size: 0.875rem; + display: block; + overflow-x: auto; + white-space: nowrap; + -webkit-overflow-scrolling: touch; +} + +@media (min-width: 768px) { + .rendered-markdown table { + display: table; + white-space: normal; + } +} + +.rendered-markdown table th, +.rendered-markdown table td { + padding: 0.5rem 0.75rem; + border-bottom: 1px solid #dee2e6; + border-right: 1px solid #dee2e6; + text-align: left; + vertical-align: top; + word-wrap: break-word; + line-height: 1.4; +} + +.rendered-markdown table th:last-child, +.rendered-markdown table td:last-child { + border-right: none; +} + +.rendered-markdown table thead th { + background-color: #f8f9fa; + font-weight: 600; + color: #495057; + border-bottom: 2px solid #dee2e6; +} + +.rendered-markdown table tbody tr:nth-child(even) { + background-color: rgba(0, 0, 0, 0.02); +} + +.rendered-markdown table tbody tr:hover { + background-color: rgba(0, 0, 0, 0.04); + transition: background-color 0.15s ease-in-out; +} + +.rendered-markdown table th[align="center"], +.rendered-markdown table td[align="center"] { + text-align: center; +} + +.rendered-markdown table th[align="right"], +.rendered-markdown table td[align="right"] { + text-align: right; +} + +/* Dark mode tables */ +[data-bs-theme="dark"] .rendered-markdown table { + border-color: #495057; + background-color: var(--bs-dark); + color: #e9ecef; +} + +[data-bs-theme="dark"] .rendered-markdown table th, +[data-bs-theme="dark"] .rendered-markdown table td { + border-color: #495057; +} + +[data-bs-theme="dark"] .rendered-markdown table thead th { + background-color: #343a40; + color: #e9ecef; + 
border-bottom-color: #495057; +} + +[data-bs-theme="dark"] .rendered-markdown table tbody tr:nth-child(even) { + background-color: rgba(255, 255, 255, 0.05); +} + +[data-bs-theme="dark"] .rendered-markdown table tbody tr:hover { + background-color: rgba(255, 255, 255, 0.1); +} + +.rendered-markdown table code { + background-color: rgba(0, 0, 0, 0.1); + padding: 0.125rem 0.25rem; + border-radius: 0.25rem; + font-size: 0.8em; +} + +[data-bs-theme="dark"] .rendered-markdown table code { + background-color: rgba(255, 255, 255, 0.1); +} + +/* --- Code blocks --- */ +.rendered-markdown pre, +.rendered-markdown pre[class*="language-"] { + overflow-x: auto; + max-width: 100%; + width: 100%; + box-sizing: border-box; + display: block; + white-space: pre; + background-color: #1e1e1e; + color: #d4d4d4; + border-radius: 0.375rem; + padding: 1rem; + margin: 0.75rem 0; + font-size: 0.85rem; + line-height: 1.5; +} + +.rendered-markdown pre code { + display: block; + min-width: 0; + max-width: 100%; + overflow-x: auto; + white-space: pre; + background: transparent; + color: inherit; + padding: 0; + font-size: inherit; +} + +/* Inline code */ +.rendered-markdown code:not(pre code) { + background-color: rgba(0, 0, 0, 0.06); + padding: 0.15rem 0.35rem; + border-radius: 0.25rem; + font-size: 0.85em; + color: #d63384; +} + +[data-bs-theme="dark"] .rendered-markdown code:not(pre code) { + background-color: rgba(255, 255, 255, 0.1); + color: #e685b5; +} + +/* Blockquotes */ +.rendered-markdown blockquote { + border-left: 4px solid #dee2e6; + padding-left: 1em; + color: #6c757d; + margin: 0.75rem 0; +} + +[data-bs-theme="dark"] .rendered-markdown blockquote { + border-left-color: #495057; + color: #adb5bd; +} diff --git a/application/single_app/static/images/custom_logo.png b/application/single_app/static/images/custom_logo.png new file mode 100644 index 00000000..ecf6e652 Binary files /dev/null and b/application/single_app/static/images/custom_logo.png differ diff --git 
a/application/single_app/static/images/custom_logo_dark.png b/application/single_app/static/images/custom_logo_dark.png new file mode 100644 index 00000000..4f281945 Binary files /dev/null and b/application/single_app/static/images/custom_logo_dark.png differ diff --git a/application/single_app/static/js/admin/admin_settings.js b/application/single_app/static/js/admin/admin_settings.js index 85719128..21c989fd 100644 --- a/application/single_app/static/js/admin/admin_settings.js +++ b/application/single_app/static/js/admin/admin_settings.js @@ -1237,10 +1237,11 @@ function setupToggles() { const mathToggle = document.getElementById('toggle-math-plugin'); const textToggle = document.getElementById('toggle-text-plugin'); const factMemoryToggle = document.getElementById('toggle-fact-memory-plugin'); + const tabularProcessingToggle = document.getElementById('toggle-tabular-processing-plugin'); const embeddingToggle = document.getElementById('toggle-default-embedding-model-plugin'); const allowUserPluginsToggle = document.getElementById('toggle-allow-user-plugins'); const allowGroupPluginsToggle = document.getElementById('toggle-allow-group-plugins'); - const toggles = [timeToggle, httpToggle, waitToggle, mathToggle, textToggle, factMemoryToggle, embeddingToggle, allowUserPluginsToggle, allowGroupPluginsToggle]; + const toggles = [timeToggle, httpToggle, waitToggle, mathToggle, textToggle, factMemoryToggle, tabularProcessingToggle, embeddingToggle, allowUserPluginsToggle, allowGroupPluginsToggle]; // Feedback area let feedbackDiv = document.getElementById('core-plugin-toggles-feedback'); if (!feedbackDiv) { @@ -1270,6 +1271,16 @@ function setupToggles() { if (textToggle) textToggle.checked = !!settings.enable_text_plugin; if (embeddingToggle) embeddingToggle.checked = !!settings.enable_default_embedding_model_plugin; if (factMemoryToggle) factMemoryToggle.checked = !!settings.enable_fact_memory_plugin; + if (tabularProcessingToggle) { + tabularProcessingToggle.checked = 
!!settings.enable_tabular_processing_plugin; + const ecEnabled = !!settings.enable_enhanced_citations; + tabularProcessingToggle.disabled = !ecEnabled; + const depNote = document.getElementById('tabular-processing-dependency-note'); + if (depNote) { + depNote.textContent = ecEnabled ? 'Requires Enhanced Citations' : 'Requires Enhanced Citations (currently disabled)'; + depNote.className = ecEnabled ? 'text-muted d-block ms-4' : 'text-danger d-block ms-4'; + } + } if (allowUserPluginsToggle) allowUserPluginsToggle.checked = !!settings.allow_user_plugins; if (allowGroupPluginsToggle) allowGroupPluginsToggle.checked = !!settings.allow_group_plugins; } catch (err) { @@ -1291,6 +1302,7 @@ function setupToggles() { enable_text_plugin: textToggle ? textToggle.checked : false, enable_default_embedding_model_plugin: embeddingToggle ? embeddingToggle.checked : false, enable_fact_memory_plugin: factMemoryToggle ? factMemoryToggle.checked : false, + enable_tabular_processing_plugin: tabularProcessingToggle ? tabularProcessingToggle.checked : false, allow_user_plugins: allowUserPluginsToggle ? allowUserPluginsToggle.checked : false, allow_group_plugins: allowGroupPluginsToggle ? allowGroupPluginsToggle.checked : false }; @@ -1867,14 +1879,30 @@ function setupToggles() { const redisAuthType = document.getElementById('redis_auth_type'); if (redisAuthType) { const redisKeyContainer = document.getElementById('redis_key_container'); + const redisKeyLabel = document.getElementById('redis_key_label'); + + // Helper to update the label text based on auth type + function updateRedisKeyLabel(authTypeValue) { + if (!redisKeyLabel) return; + redisKeyLabel.textContent = authTypeValue === 'key_vault' ? 'Key Vault Secret Name' : 'Redis Access Key'; + } + // Set initial state on load if (redisKeyContainer) { - redisKeyContainer.style.display = (redisAuthType.value === 'key') ? 
'block' : 'none'; + redisKeyContainer.classList.toggle('d-none', !(redisAuthType.value === 'key' || redisAuthType.value === 'key_vault')); } + updateRedisKeyLabel(redisAuthType.value); + redisAuthType.addEventListener('change', function () { if (redisKeyContainer) { - redisKeyContainer.style.display = (this.value === 'key') ? 'block' : 'none'; + redisKeyContainer.classList.toggle('d-none', !(this.value === 'key' || this.value === 'key_vault')); + } + const redisKeyVaultHint = document.getElementById('redis_key_vault_hint'); + if (redisKeyVaultHint) { + redisKeyVaultHint.classList.toggle('d-none', this.value !== 'key_vault'); } + updateRedisKeyLabel(this.value); + markFormAsModified(); }); } @@ -2179,7 +2207,8 @@ function setupTestButtons() { const payload = { test_type: 'redis', endpoint: document.getElementById('redis_url').value, - key: document.getElementById('redis_key').value + key: document.getElementById('redis_key').value, + auth_type: document.getElementById('redis_auth_type').value }; try { @@ -3827,11 +3856,12 @@ function checkOptionalFeaturesEnabled(stepNumber) { return endpoint && key; } - case 11: // User feedback and archiving - // Check if feedback is enabled + case 11: // User feedback, archiving, and thoughts + // Check if feedback, archiving, or thoughts is enabled const feedbackEnabled = document.getElementById('enable_user_feedback')?.checked; const archivingEnabled = document.getElementById('enable_conversation_archiving')?.checked; - return feedbackEnabled || archivingEnabled; + const thoughtsEnabled = document.getElementById('enable_thoughts')?.checked; + return feedbackEnabled || archivingEnabled || thoughtsEnabled; case 12: // Enhanced citations and image generation // Check if enhanced citations or image generation is enabled diff --git a/application/single_app/static/js/chat/chat-agents.js b/application/single_app/static/js/chat/chat-agents.js index b1e4f5fe..af18ff21 100644 --- a/application/single_app/static/js/chat/chat-agents.js +++ 
b/application/single_app/static/js/chat/chat-agents.js @@ -8,10 +8,43 @@ import { getUserSetting, setUserSetting } from '../agents_common.js'; +import { createSearchableSingleSelect } from './chat-searchable-select.js'; const enableAgentsBtn = document.getElementById("enable-agents-btn"); const agentSelectContainer = document.getElementById("agent-select-container"); const modelSelectContainer = document.getElementById("model-select-container"); +const agentSelect = document.getElementById('agent-select'); +const agentDropdown = document.getElementById('agent-dropdown'); +const agentDropdownButton = document.getElementById('agent-dropdown-button'); +const agentDropdownMenu = document.getElementById('agent-dropdown-menu'); +const agentDropdownText = agentDropdownButton + ? agentDropdownButton.querySelector('.chat-searchable-select-text') + : null; +const agentSearchInput = document.getElementById('agent-search-input'); +const agentDropdownItems = document.getElementById('agent-dropdown-items'); + +let agentSelectorController = null; + +function initializeAgentSelector() { + if (agentSelectorController || !agentSelect) { + return agentSelectorController; + } + + agentSelectorController = createSearchableSingleSelect({ + selectEl: agentSelect, + dropdownEl: agentDropdown, + buttonEl: agentDropdownButton, + buttonTextEl: agentDropdownText, + menuEl: agentDropdownMenu, + searchInputEl: agentSearchInput, + itemsContainerEl: agentDropdownItems, + placeholderText: 'Select an Agent', + emptyMessage: 'No agents available', + emptySearchMessage: 'No matching agents found', + }); + + return agentSelectorController; +} /** * Check if agents are currently enabled @@ -24,6 +57,8 @@ export function areAgentsEnabled() { export async function initializeAgentInteractions() { if (enableAgentsBtn && agentSelectContainer) { + initializeAgentSelector(); + // On load, sync UI with enable_agents setting const enableAgents = await getUserSetting('enable_agents'); if (enableAgents) { @@ 
-58,7 +93,8 @@ export async function initializeAgentInteractions() { } export async function populateAgentDropdown() { - const agentSelect = agentSelectContainer.querySelector('select'); + initializeAgentSelector(); + try { const [userAgents, selectedAgent] = await Promise.all([ fetchUserAgents(), @@ -71,6 +107,7 @@ export async function populateAgentDropdown() { const globalAgents = combinedAgents.filter(agent => agent.is_global); const orderedAgents = [...personalAgents, ...activeGroupAgents, ...globalAgents]; populateAgentSelect(agentSelect, orderedAgents, selectedAgent); + agentSelectorController?.refresh(); agentSelect.onchange = async function () { const selectedOption = agentSelect.options[agentSelect.selectedIndex]; if (!selectedOption) { diff --git a/application/single_app/static/js/chat/chat-citations.js b/application/single_app/static/js/chat/chat-citations.js index 9ec6bad3..60099398 100644 --- a/application/single_app/static/js/chat/chat-citations.js +++ b/application/single_app/static/js/chat/chat-citations.js @@ -11,6 +11,14 @@ import { showEnhancedCitationModal } from './chat-enhanced-citations.js'; const chatboxEl = document.getElementById("chatbox"); +function escapeAttribute(value) { + return String(value) + .replace(/&/g, '&amp;') + .replace(/"/g, '&quot;') + .replace(/</g, '&lt;') + .replace(/>/g, '&gt;'); +} + export function parseDocIdAndPage(citationId) { // ... (keep existing implementation) const underscoreIndex = citationId.lastIndexOf("_"); @@ -24,9 +32,9 @@ export function parseDocIdAndPage(citationId) { export function parseCitations(message) { // ... 
(keep existing implementation) - const citationRegex = /\(Source:\s*([^,]+),\s*Page(?:s)?:\s*([^)]+)\)\s*((?:\[#.*?\]\s*)+)/gi; + const citationRegex = /\(Source:\s*([^,]+),\s*(Page(?:s)?|Sheet(?:s)?|Location):\s*([^)]+)\)\s*((?:\[#.*?\]\s*)+)/gi; - let result = message.replace(citationRegex, (whole, filename, pages, bracketSection) => { + let result = message.replace(citationRegex, (whole, filename, locationLabel, locations, bracketSection) => { let filenameHtml; if (/^https?:\/\/.+/i.test(filename.trim())) { filenameHtml = `${filename.trim()}`; @@ -36,6 +44,7 @@ export function parseCitations(message) { const bracketMatches = bracketSection.match(/\[#.*?\]/g) || []; const pageToRefMap = {}; + const orderedRefs = []; bracketMatches.forEach((match) => { let inner = match.slice(2, -1).trim(); @@ -43,6 +52,7 @@ export function parseCitations(message) { refs.forEach((r) => { let ref = r.trim(); if (ref.startsWith('#')) ref = ref.slice(1); + orderedRefs.push(ref); const parts = ref.split('_'); const pageNumber = parts.pop(); // Ensure docId part is also captured if needed, though ref is the full ID here @@ -56,8 +66,15 @@ export function parseCitations(message) { return underscoreIndex === -1 ? ref : ref.slice(0, underscoreIndex + 1); } - const pagesTokens = pages.split(/,/).map(tok => tok.trim()); - const linkedTokens = pagesTokens.map(token => { + const normalizedLocationLabel = locationLabel.toLowerCase(); + const locationTokens = locations.split(/,/).map(tok => tok.trim()); + const linkedTokens = locationTokens.map((token, index) => { + if (!normalizedLocationLabel.startsWith('page')) { + const ref = orderedRefs[index] || orderedRefs[0]; + const sheetName = normalizedLocationLabel.startsWith('sheet') ? 
token : null; + return buildAnchorIfExists(token, ref, sheetName); + } + const dashParts = token.split(/[–—-]/).map(p => p.trim()); if (dashParts.length === 2 && dashParts[0] && dashParts[1]) { @@ -94,7 +111,7 @@ export function parseCitations(message) { }); const linkedPagesText = linkedTokens.join(', '); - return `(Source: ${filenameHtml}, Pages: ${linkedPagesText})`; + return `(Source: ${filenameHtml}, ${locationLabel}: ${linkedPagesText})`; }); // Cleanup pass: strip any remaining [#guid...] bracket groups that the main regex didn't match. @@ -107,14 +124,15 @@ export function parseCitations(message) { } -export function buildAnchorIfExists(pageStr, citationId) { +export function buildAnchorIfExists(pageStr, citationId, sheetName = null) { // ... (keep existing implementation) if (!citationId) { return pageStr; } // Ensure citationId doesn't have a leading # if passed accidentally const cleanCitationId = citationId.startsWith('#') ? citationId.slice(1) : citationId; - return `${pageStr}`; + const sheetNameAttribute = sheetName ? 
` data-sheet-name="${escapeAttribute(sheetName)}"` : ''; + return `${pageStr}`; } // --- MODIFIED: fetchCitedText handles errors more gracefully --- @@ -609,6 +627,7 @@ if (chatboxEl) { } const { docId, pageNumber } = parseDocIdAndPage(citationId); + const sheetName = target.getAttribute("data-sheet-name"); // Safety check: Ensure docId and pageNumber were parsed correctly if (!docId || !pageNumber) { @@ -649,7 +668,7 @@ if (chatboxEl) { if (attemptEnhanced) { // console.log(`Attempting Enhanced Citation for ${docId}, page/timestamp ${pageNumber}, citationId ${citationId}`); // Use new enhanced citation system that supports multiple file types - showEnhancedCitationModal(docId, pageNumber, citationId); + showEnhancedCitationModal(docId, pageNumber, citationId, sheetName); } else { // console.log(`Fetching Text Citation for ${citationId}`); // Use text citation if globally disabled OR explicitly disabled for this doc OR if parsing failed earlier diff --git a/application/single_app/static/js/chat/chat-conversation-details.js b/application/single_app/static/js/chat/chat-conversation-details.js index 19851bae..484128af 100644 --- a/application/single_app/static/js/chat/chat-conversation-details.js +++ b/application/single_app/static/js/chat/chat-conversation-details.js @@ -75,7 +75,7 @@ export async function showConversationDetails(conversationId) { * @returns {string} HTML string */ function renderConversationMetadata(metadata, conversationId) { - const { context = [], tags = [], strict = false, classification = [], last_updated, chat_type = 'personal', is_pinned = false, is_hidden = false, scope_locked, locked_contexts = [] } = metadata; + const { context = [], tags = [], strict = false, classification = [], last_updated, chat_type = 'personal', is_pinned = false, is_hidden = false, scope_locked, locked_contexts = [], summary = null } = metadata; // Organize tags by category const tagsByCategory = { @@ -97,6 +97,18 @@ function renderConversationMetadata(metadata, 
conversationId) { // Build HTML sections let html = `
+ +
+
+
+
Summary
+ ${summary ? `Generated ${formatDate(summary.generated_at)}${summary.model_deployment ? ` · ${summary.model_deployment}` : ''}` : ''} +
+
+ ${renderSummaryContent(summary, conversationId)} +
+
+
@@ -570,8 +582,159 @@ function extractPageNumbers(chunkIds) { return pages.sort((a, b) => parseInt(a) - parseInt(b)); } +/** + * Render the summary card body content + * @param {Object|null} summary - Existing summary data or null + * @param {string} conversationId - The conversation ID + * @returns {string} HTML string + */ +function renderSummaryContent(summary, conversationId) { + if (summary && summary.content) { + return ` +

${escapeHtml(summary.content)}

+
+ +
+ `; + } + + // Build model options from the global model-select dropdown + const modelOptions = getAvailableModelOptions(); + return ` +

No summary has been generated for this conversation yet.

+
+ + +
+ `; +} + +/** + * Get available model options from the global #model-select dropdown + * @returns {string} HTML option elements + */ +function getAvailableModelOptions() { + const globalSelect = document.getElementById('model-select'); + if (!globalSelect) { + return ''; + } + let options = ''; + for (const opt of globalSelect.options) { + options += ``; + } + return options || ''; +} + +/** + * Handle summary generation (generate or regenerate) + * @param {string} conversationId - The conversation ID + * @param {string} modelDeployment - Selected model deployment + */ +async function handleGenerateSummary(conversationId, modelDeployment) { + const cardBody = document.getElementById('summary-card-body'); + if (!cardBody) { + return; + } + + cardBody.innerHTML = ` +
+
+ Generating... +
+ Generating summary... +
+ `; + + try { + const response = await fetch(`/api/conversations/${conversationId}/summary`, { + method: 'POST', + headers: { 'Content-Type': 'application/json' }, + body: JSON.stringify({ model_deployment: modelDeployment }) + }); + + if (!response.ok) { + const errData = await response.json().catch(() => ({})); + throw new Error(errData.error || `HTTP ${response.status}`); + } + + const data = await response.json(); + const summary = data.summary; + cardBody.innerHTML = renderSummaryContent(summary, conversationId); + + // Update card header with generation info + const cardHeader = cardBody.closest('.card').querySelector('.card-header'); + if (cardHeader && summary) { + const smallEl = cardHeader.querySelector('small'); + const infoText = `Generated ${formatDate(summary.generated_at)}${summary.model_deployment ? ` · ${summary.model_deployment}` : ''}`; + if (smallEl) { + smallEl.textContent = infoText; + } else { + const small = document.createElement('small'); + small.className = 'opacity-75'; + small.textContent = infoText; + cardHeader.appendChild(small); + } + } + + } catch (error) { + console.error('Error generating summary:', error); + cardBody.innerHTML = ` +
+ + Failed to generate summary: ${escapeHtml(error.message)} +
+ ${renderSummaryContent(null, conversationId)} + `; + } +} + +/** + * Simple HTML escape for display + * @param {string} str - String to escape + * @returns {string} Escaped string + */ +function escapeHtml(str) { + if (!str) { + return ''; + } + const div = document.createElement('div'); + div.textContent = str; + return div.innerHTML; +} + // Event listeners for details buttons document.addEventListener('click', function(e) { + // Generate summary button + if (e.target.closest('#generate-summary-btn')) { + e.preventDefault(); + const btn = e.target.closest('#generate-summary-btn'); + const cid = btn.getAttribute('data-conversation-id'); + const modelSelect = document.getElementById('summary-model-select'); + const model = modelSelect ? modelSelect.value : ''; + handleGenerateSummary(cid, model); + return; + } + + // Regenerate summary button + if (e.target.closest('#regenerate-summary-btn')) { + e.preventDefault(); + const btn = e.target.closest('#regenerate-summary-btn'); + const cid = btn.getAttribute('data-conversation-id'); + // Use the currently selected global model for regeneration + const globalSelect = document.getElementById('model-select'); + const model = globalSelect ? 
globalSelect.value : ''; + handleGenerateSummary(cid, model); + return; + } + if (e.target.closest('.details-btn')) { e.preventDefault(); diff --git a/application/single_app/static/js/chat/chat-conversations.js b/application/single_app/static/js/chat/chat-conversations.js index 0af0a768..34c035be 100644 --- a/application/single_app/static/js/chat/chat-conversations.js +++ b/application/single_app/static/js/chat/chat-conversations.js @@ -3,7 +3,11 @@ import { showToast } from "./chat-toast.js"; import { loadMessages } from "./chat-messages.js"; import { isColorLight, toBoolean } from "./chat-utils.js"; -import { loadSidebarConversations, setActiveConversation as setSidebarActiveConversation } from "./chat-sidebar-conversations.js"; +import { + loadSidebarConversations, + setActiveConversation as setSidebarActiveConversation, + setConversationUnreadState as setSidebarConversationUnreadState, +} from "./chat-sidebar-conversations.js"; import { toggleConversationInfoButton } from "./chat-conversation-info-button.js"; import { restoreScopeLockState, resetScopeLock } from "./chat-documents.js"; @@ -28,6 +32,110 @@ let allConversations = []; // Store all conversations for client-side filtering let isLoadingConversations = false; // Prevent concurrent loads let showQuickSearch = false; // Track if quick search input is visible let quickSearchTerm = ""; // Current search term +let pendingConversationCreation = null; // Reuse a single in-flight create request +const markConversationReadRequests = new Map(); + +function createUnreadDotElement() { + const unreadDot = document.createElement("span"); + unreadDot.classList.add("conversation-unread-dot"); + unreadDot.setAttribute("aria-hidden", "true"); + return unreadDot; +} + +function updateConversationUnreadStateCache(conversationId, hasUnread) { + allConversations = allConversations.map(convo => { + if (convo.id !== conversationId) { + return convo; + } + + return { + ...convo, + has_unread_assistant_response: hasUnread, + 
last_unread_assistant_message_id: hasUnread ? convo.last_unread_assistant_message_id : null, + last_unread_assistant_at: hasUnread ? convo.last_unread_assistant_at : null, + }; + }); +} + +function getConversationUnreadState(conversationId) { + const convoItem = document.querySelector(`.conversation-item[data-conversation-id="${conversationId}"]`); + if (convoItem) { + return convoItem.dataset.hasUnreadAssistantResponse === "true"; + } + + const conversation = allConversations.find(convo => convo.id === conversationId); + return Boolean(conversation?.has_unread_assistant_response); +} + +export function setConversationUnreadState(conversationId, hasUnread) { + updateConversationUnreadStateCache(conversationId, hasUnread); + + const convoItem = document.querySelector(`.conversation-item[data-conversation-id="${conversationId}"]`); + if (convoItem) { + convoItem.dataset.hasUnreadAssistantResponse = hasUnread ? "true" : "false"; + + const titleRow = convoItem.querySelector(".conversation-title-row"); + const titleElement = convoItem.querySelector(".conversation-title"); + const existingDot = convoItem.querySelector(".conversation-unread-dot"); + + if (!hasUnread) { + if (existingDot) { + existingDot.remove(); + } + } else if (!existingDot && titleRow && titleElement) { + titleRow.insertBefore(createUnreadDotElement(), titleElement); + } + } + + setSidebarConversationUnreadState(conversationId, hasUnread); +} + +export async function markConversationRead(conversationId, options = {}) { + const { force = false, suppressErrorToast = false } = options; + if (!conversationId) { + return null; + } + + const previousUnreadState = getConversationUnreadState(conversationId); + if (!force && !previousUnreadState) { + return { success: true, skipped: true }; + } + + if (markConversationReadRequests.has(conversationId)) { + return markConversationReadRequests.get(conversationId); + } + + setConversationUnreadState(conversationId, false); + + const markReadRequest = 
fetch(`/api/conversations/${conversationId}/mark-read`, { + method: "POST", + headers: { "Content-Type": "application/json" }, + }) + .then(async response => { + const data = await response.json().catch(() => ({})); + if (!response.ok || data.success === false) { + throw new Error(data.error || "Failed to mark conversation as read"); + } + return data; + }) + .catch(error => { + if (previousUnreadState) { + setConversationUnreadState(conversationId, true); + } + + if (!suppressErrorToast) { + showToast(`Failed to clear unread state: ${error.message}`, "danger"); + } + + throw error; + }) + .finally(() => { + markConversationReadRequests.delete(conversationId); + }); + + markConversationReadRequests.set(conversationId, markReadRequest); + return markReadRequest; +} // Clear selected conversations when loading the page document.addEventListener('DOMContentLoaded', () => { @@ -344,6 +452,9 @@ export async function ensureConversationPresent(conversationId) { chat_type: metadata.chat_type || null, is_pinned: metadata.is_pinned || false, is_hidden: metadata.is_hidden || false, + has_unread_assistant_response: metadata.has_unread_assistant_response || false, + last_unread_assistant_message_id: metadata.last_unread_assistant_message_id || null, + last_unread_assistant_at: metadata.last_unread_assistant_at || null, }; // Keep allConversations in sync @@ -365,6 +476,7 @@ export function createConversationItem(convo) { convoItem.classList.add("list-group-item", "list-group-item-action", "conversation-item", "d-flex", "align-items-center"); // Use action class convoItem.setAttribute("data-conversation-id", convo.id); convoItem.setAttribute("data-conversation-title", convo.title); // Store title too + convoItem.dataset.hasUnreadAssistantResponse = convo.has_unread_assistant_response ? 
"true" : "false"; // *** Store classification data as stringified JSON *** convoItem.dataset.classifications = JSON.stringify(convo.classification || []); @@ -437,8 +549,12 @@ export function createConversationItem(convo) { leftDiv.classList.add("d-flex", "flex-column", "flex-grow-1", "pe-2"); // flex-grow and padding-end leftDiv.style.overflow = "hidden"; // Prevent overflow issues + const titleRow = document.createElement("div"); + titleRow.classList.add("conversation-title-row", "d-flex", "align-items-center", "gap-2", "overflow-hidden"); + const titleSpan = document.createElement("span"); - titleSpan.classList.add("conversation-title", "text-truncate"); // Bold and truncate + titleSpan.classList.add("conversation-title", "text-truncate", "flex-grow-1"); // Bold and truncate + titleSpan.style.minWidth = "0"; // Add pin icon if conversation is pinned const isPinned = convo.is_pinned || false; @@ -451,12 +567,18 @@ export function createConversationItem(convo) { titleSpan.appendChild(document.createTextNode(convo.title)); titleSpan.title = convo.title; // Tooltip for full title + if (convo.has_unread_assistant_response) { + titleRow.appendChild(createUnreadDotElement()); + } + + titleRow.appendChild(titleSpan); + const dateSpan = document.createElement("small"); dateSpan.classList.add("text-muted"); const date = new Date(convo.last_updated); dateSpan.textContent = date.toLocaleString([], { dateStyle: 'short', timeStyle: 'short' }); // Shorter format - leftDiv.appendChild(titleSpan); + leftDiv.appendChild(titleRow); leftDiv.appendChild(dateSpan); // Right part: three dots dropdown @@ -799,7 +921,8 @@ export function addConversationToList(conversationId, title = null, classificati id: conversationId, title: title || "New Conversation", // Default title last_updated: new Date().toISOString(), - classification: classifications // Include classifications + classification: classifications, // Include classifications + has_unread_assistant_response: false, }; const 
convoItem = createConversationItem(convo); @@ -988,6 +1111,9 @@ export async function selectConversation(conversationId) { } loadMessages(conversationId); + markConversationRead(conversationId, { force: true, suppressErrorToast: true }).catch(error => { + console.warn('Failed to clear unread state for conversation:', error); + }); highlightSelectedConversation(conversationId); // Show the conversation info button since we have an active conversation @@ -1067,7 +1193,21 @@ export function deleteConversation(conversationId) { } // Create a new conversation via API -export async function createNewConversation(callback) { +export async function createNewConversation(callback, options = {}) { + if (pendingConversationCreation) { + try { + await pendingConversationCreation; + if (typeof callback === "function") { + callback(); + } + } catch (error) { + // The original caller already surfaced the creation failure. + } + return; + } + + const { preserveSelections = false } = options; + // Disable new button? Show loading? 
if (newConversationBtn) newConversationBtn.disabled = true; @@ -1079,54 +1219,61 @@ export async function createNewConversation(callback) { } try { - const response = await fetch("/api/create_conversation", { - method: "POST", - headers: { - "Content-Type": "application/json", - }, - credentials: "same-origin", - }); - if (!response.ok) { - const errData = await response.json().catch(() => ({})); - throw new Error(errData.error || "Failed to create conversation"); - } - const data = await response.json(); - if (!data.conversation_id) { - throw new Error("No conversation_id returned from server."); - } + pendingConversationCreation = (async () => { + const response = await fetch("/api/create_conversation", { + method: "POST", + headers: { + "Content-Type": "application/json", + }, + credentials: "same-origin", + }); + if (!response.ok) { + const errData = await response.json().catch(() => ({})); + throw new Error(errData.error || "Failed to create conversation"); + } + const data = await response.json(); + if (!data.conversation_id) { + throw new Error("No conversation_id returned from server."); + } - currentConversationId = data.conversation_id; - // Reset scope lock for new conversation - resetScopeLock(); - // Add to list (pass empty classifications for new convo) - addConversationToList(data.conversation_id, data.title /* Use title from API if provided */, []); - - // Don't call selectConversation here if we're about to send a message - // because selectConversation clears the chatbox, which would remove - // the user message that's about to be appended by actuallySendMessage - // Instead, just update the UI elements directly - window.currentConversationId = data.conversation_id; - const titleEl = document.getElementById("current-conversation-title"); - if (titleEl) { - titleEl.textContent = data.title || "New Conversation"; - } - // Clear classification/tag badges from previous conversation - if (currentConversationClassificationsEl) { - 
currentConversationClassificationsEl.innerHTML = ""; - } - updateConversationUrl(data.conversation_id); - console.log('[createNewConversation] Created conversation without reload:', data.conversation_id); + currentConversationId = data.conversation_id; + // Reset scope lock for new conversation + resetScopeLock({ preserveSelections }); + // Add to list (pass empty classifications for new convo) + addConversationToList(data.conversation_id, data.title /* Use title from API if provided */, []); + + // Don't call selectConversation here if we're about to send a message + // because selectConversation clears the chatbox, which would remove + // the user message that's about to be appended by actuallySendMessage + // Instead, just update the UI elements directly + window.currentConversationId = data.conversation_id; + const titleEl = document.getElementById("current-conversation-title"); + if (titleEl) { + titleEl.textContent = data.title || "New Conversation"; + } + // Clear classification/tag badges from previous conversation + if (currentConversationClassificationsEl) { + currentConversationClassificationsEl.innerHTML = ""; + } + updateConversationUrl(data.conversation_id); + console.log('[createNewConversation] Created conversation without reload:', data.conversation_id); + + return data; + })(); + + const data = await pendingConversationCreation; // Execute callback if provided (e.g., to send the first message) if (typeof callback === "function") { callback(); } - + return data; } catch (error) { console.error("Error creating conversation:", error); showToast(`Failed to create a new conversation: ${error.message}`, "danger"); } finally { + pendingConversationCreation = null; if (newConversationBtn) newConversationBtn.disabled = false; } } @@ -1513,6 +1660,8 @@ window.chatConversations = { loadConversations, highlightSelectedConversation, addConversationToList, + markConversationRead, + setConversationUnreadState, deleteConversation, toggleConversationSelection, 
deleteSelectedConversations, diff --git a/application/single_app/static/js/chat/chat-documents.js b/application/single_app/static/js/chat/chat-documents.js index 44596872..bb1be3c4 100644 --- a/application/single_app/static/js/chat/chat-documents.js +++ b/application/single_app/static/js/chat/chat-documents.js @@ -1,6 +1,7 @@ // chat-documents.js import { showToast } from "./chat-toast.js"; +import { initializeFilterableDropdownSearch } from "./chat-searchable-select.js"; export const docScopeSelect = document.getElementById("doc-scope-select"); const searchDocumentsBtn = document.getElementById("search-documents-btn"); @@ -8,6 +9,7 @@ const docSelectEl = document.getElementById("document-select"); // Hidden select const searchDocumentsContainer = document.getElementById("search-documents-container"); // Container for scope/doc/class // Custom dropdown elements +const docDropdown = document.getElementById("document-dropdown"); const docDropdownButton = document.getElementById("document-dropdown-button"); const docDropdownItems = document.getElementById("document-dropdown-items"); const docDropdownMenu = document.getElementById("document-dropdown-menu"); @@ -17,12 +19,16 @@ const docSearchInput = document.getElementById("document-search-input"); const chatTagsFilter = document.getElementById("chat-tags-filter"); const tagsDropdown = document.getElementById("tags-dropdown"); const tagsDropdownButton = document.getElementById("tags-dropdown-button"); +const tagsDropdownMenu = document.getElementById("tags-dropdown-menu"); const tagsDropdownItems = document.getElementById("tags-dropdown-items"); +const tagsSearchInput = document.getElementById("tags-search-input"); // Scope dropdown elements +const scopeDropdown = document.getElementById("scope-dropdown"); const scopeDropdownButton = document.getElementById("scope-dropdown-button"); const scopeDropdownItems = document.getElementById("scope-dropdown-items"); const scopeDropdownMenu = 
document.getElementById("scope-dropdown-menu"); +const scopeSearchInput = document.getElementById("scope-search-input"); // We'll store personalDocs/groupDocs/publicDocs in memory once loaded: export let personalDocs = []; @@ -49,6 +55,33 @@ let selectedPersonal = true; let selectedGroupIds = (window.userGroups || []).map(g => g.id); let selectedPublicWorkspaceIds = (window.userVisiblePublicWorkspaces || []).map(ws => ws.id); +const documentSearchController = initializeFilterableDropdownSearch({ + dropdownEl: docDropdown, + menuEl: docDropdownMenu, + searchInputEl: docSearchInput, + itemsContainerEl: docDropdownItems, + emptyMessage: 'No matching documents found', + isAlwaysVisibleItem: item => item.getAttribute('data-search-role') === 'action', +}); + +const scopeSearchController = initializeFilterableDropdownSearch({ + dropdownEl: scopeDropdown, + menuEl: scopeDropdownMenu, + searchInputEl: scopeSearchInput, + itemsContainerEl: scopeDropdownItems, + emptyMessage: 'No matching workspaces found', + isAlwaysVisibleItem: item => item.getAttribute('data-search-role') === 'action', +}); + +const tagsSearchController = initializeFilterableDropdownSearch({ + dropdownEl: tagsDropdown, + menuEl: tagsDropdownMenu, + searchInputEl: tagsSearchInput, + itemsContainerEl: tagsDropdownItems, + emptyMessage: 'No matching tags found', + isAlwaysVisibleItem: item => item.getAttribute('data-search-role') === 'action', +}); + /* --------------------------------------------------------------------------- Get Effective Scopes — used by chat-messages.js and internally --------------------------------------------------------------------------- */ @@ -160,10 +193,19 @@ export function restoreScopeLockState(lockState, contexts) { * Reset scope lock for a new conversation. * Resets to "All" with no lock. 
*/ -export function resetScopeLock() { +export function resetScopeLock(options = {}) { + const { preserveSelections = false } = options; + scopeLocked = null; lockedContexts = []; + if (preserveSelections) { + buildScopeDropdown(); + updateScopeLockIcon(); + updateHeaderLockIcon(); + return; + } + const groups = window.userGroups || []; const publicWorkspaces = window.userVisiblePublicWorkspaces || []; selectedPersonal = true; @@ -227,6 +269,7 @@ function buildScopeDropdown() { allItem.type = "button"; allItem.classList.add("dropdown-item", "d-flex", "align-items-center", "fw-bold"); allItem.setAttribute("data-scope-action", "toggle-all"); + allItem.setAttribute("data-search-role", "action"); allItem.style.display = "flex"; allItem.style.width = "100%"; allItem.style.textAlign = "left"; @@ -283,6 +326,7 @@ function buildScopeDropdown() { } syncScopeButtonText(); + scopeSearchController?.applyFilter(scopeSearchInput ? scopeSearchInput.value : ''); } /* --------------------------------------------------------------------------- @@ -358,6 +402,7 @@ function rebuildScopeDropdownWithLock() { syncScopeButtonText(); updateScopeLockIcon(); + scopeSearchController?.applyFilter(scopeSearchInput ? 
scopeSearchInput.value : ''); } /* --------------------------------------------------------------------------- @@ -421,6 +466,8 @@ function createScopeItem(value, label, checked) { item.type = "button"; item.classList.add("dropdown-item", "d-flex", "align-items-center"); item.setAttribute("data-scope-value", value); + item.setAttribute("data-search-role", "item"); + item.dataset.searchLabel = label; item.style.display = "flex"; item.style.width = "100%"; item.style.textAlign = "left"; @@ -556,6 +603,7 @@ export function populateDocumentSelectScope() { allItem.type = "button"; allItem.classList.add("dropdown-item"); allItem.setAttribute("data-document-id", ""); + allItem.setAttribute("data-search-role", "action"); allItem.textContent = "All Documents"; allItem.style.display = "block"; allItem.style.width = "100%"; @@ -618,7 +666,9 @@ export function populateDocumentSelectScope() { dropdownItem.type = "button"; dropdownItem.classList.add("dropdown-item", "d-flex", "align-items-center"); dropdownItem.setAttribute("data-document-id", doc.id); + dropdownItem.setAttribute("data-search-role", "item"); dropdownItem.setAttribute("title", doc.label); + dropdownItem.dataset.searchLabel = doc.label; dropdownItem.dataset.tags = JSON.stringify(doc.tags || []); dropdownItem.dataset.classification = doc.classification || ''; dropdownItem.style.display = "flex"; @@ -674,6 +724,7 @@ export function populateDocumentSelectScope() { // Trigger UI update after populating handleDocumentSelectChange(); + documentSearchController?.applyFilter(docSearchInput ? 
docSearchInput.value : ''); } export function getDocumentMetadata(docId) { @@ -831,14 +882,14 @@ export function loadAllDocs() { function initializeDocumentDropdown() { if (!docDropdownMenu) return; - // Clear any leftover search-filter inline styles on visible items + // Clear any leftover search-filter state on visible items docDropdownItems.querySelectorAll('.dropdown-item').forEach(item => { - item.removeAttribute('data-filtered'); - item.style.display = ''; + item.classList.remove('d-none'); }); // Re-apply tag filter (DOM removal approach — no CSS issues) filterDocumentsBySelectedTags(); + documentSearchController?.applyFilter(docSearchInput ? docSearchInput.value : ''); // Size the dropdown to fill its parent container const parentContainer = docDropdownButton.closest('.flex-grow-1'); @@ -976,6 +1027,7 @@ export async function loadTagsForScope() { allItem.type = 'button'; allItem.classList.add('dropdown-item', 'text-muted', 'small'); allItem.setAttribute('data-tag-value', ''); + allItem.setAttribute('data-search-role', 'action'); allItem.textContent = 'Clear All'; allItem.style.display = 'block'; allItem.style.width = '100%'; @@ -993,6 +1045,8 @@ export async function loadTagsForScope() { item.type = 'button'; item.classList.add('dropdown-item', 'd-flex', 'align-items-center'); item.setAttribute('data-tag-value', tag.name); + item.setAttribute('data-search-role', 'item'); + item.dataset.searchLabel = tag.displayName; item.style.display = 'flex'; item.style.width = '100%'; item.style.textAlign = 'left'; @@ -1029,6 +1083,8 @@ export async function loadTagsForScope() { item.type = 'button'; item.classList.add('dropdown-item', 'd-flex', 'align-items-center'); item.setAttribute('data-tag-value', cls.name); + item.setAttribute('data-search-role', 'item'); + item.dataset.searchLabel = cls.displayName; item.style.display = 'flex'; item.style.width = '100%'; item.style.textAlign = 'left'; @@ -1053,6 +1109,8 @@ export async function loadTagsForScope() { 
tagsDropdownItems.appendChild(item); }); } + + tagsSearchController?.applyFilter(tagsSearchInput ? tagsSearchInput.value : ''); } } else { hideTagsDropdown(); @@ -1069,6 +1127,9 @@ function showTagsDropdown() { function hideTagsDropdown() { if (tagsDropdown) tagsDropdown.style.display = 'none'; + if (tagsSearchController) { + tagsSearchController.resetFilter(); + } } /* --------------------------------------------------------------------------- @@ -1166,6 +1227,8 @@ export function filterDocumentsBySelectedTags() { opt.disabled = !matchesSelection(optTags, optClassification); }); } + + documentSearchController?.applyFilter(docSearchInput ? docSearchInput.value : ''); } /* --------------------------------------------------------------------------- @@ -1250,7 +1313,6 @@ if (chatTagsFilter) { // Tags dropdown: prevent closing when clicking inside if (tagsDropdownItems) { - const tagsDropdownMenu = document.getElementById("tags-dropdown-menu"); if (tagsDropdownMenu) { tagsDropdownMenu.addEventListener('click', function(e) { e.stopPropagation(); @@ -1413,70 +1475,6 @@ if (docDropdownItems) { }); } -// Add search functionality -if (docSearchInput) { - // Define our filtering function to ensure consistent filtering logic. - // Items hidden by tag filter are physically removed from the DOM, - // so querySelectorAll naturally excludes them. 
- const filterDocumentItems = function(searchTerm) { - if (!docDropdownItems) return; - - const items = docDropdownItems.querySelectorAll('.dropdown-item'); - let matchFound = false; - - items.forEach(item => { - const docName = item.textContent.toLowerCase(); - - if (!searchTerm || docName.includes(searchTerm)) { - item.style.display = ''; - item.setAttribute('data-filtered', 'visible'); - matchFound = true; - } else { - item.style.display = 'none'; - item.setAttribute('data-filtered', 'hidden'); - } - }); - - // Show a message if no matches found - const noMatchesEl = docDropdownItems.querySelector('.no-matches'); - if (!matchFound && searchTerm && searchTerm.length > 0) { - if (!noMatchesEl) { - const noMatchesMsg = document.createElement('div'); - noMatchesMsg.className = 'no-matches text-center text-muted py-2'; - noMatchesMsg.textContent = 'No matching documents found'; - docDropdownItems.appendChild(noMatchesMsg); - } - } else { - if (noMatchesEl) { - noMatchesEl.remove(); - } - } - }; - - // Attach input event directly - docSearchInput.addEventListener('input', function() { - const searchTerm = this.value.toLowerCase().trim(); - filterDocumentItems(searchTerm); - }); - - // Also attach keyup event as a fallback - docSearchInput.addEventListener('keyup', function() { - const searchTerm = this.value.toLowerCase().trim(); - filterDocumentItems(searchTerm); - }); - - // Prevent dropdown from closing when clicking in search input - docSearchInput.addEventListener('click', function(e) { - e.stopPropagation(); - e.preventDefault(); - }); - - // Prevent dropdown from closing when pressing keys in search input - docSearchInput.addEventListener('keydown', function(e) { - e.stopPropagation(); - }); -} - /* --------------------------------------------------------------------------- Handle Document Selection & Update UI --------------------------------------------------------------------------- */ @@ -1513,10 +1511,7 @@ document.addEventListener('DOMContentLoaded', 
function() { // If search documents button exists, it needs to be clicked to show controls if (searchDocumentsBtn && docDropdownButton) { try { - // Get the dropdown element - const dropdownEl = document.getElementById('document-dropdown'); - - if (dropdownEl) { + if (docDropdown) { // Initialize Bootstrap dropdown with the right configuration new bootstrap.Dropdown(docDropdownButton, { boundary: 'viewport', @@ -1537,14 +1532,15 @@ document.addEventListener('DOMContentLoaded', function() { }); // Clear search when opening - dropdownEl.addEventListener('show.bs.dropdown', function() { + docDropdown.addEventListener('show.bs.dropdown', function() { if (docSearchInput) { docSearchInput.value = ''; } + documentSearchController?.applyFilter(''); }); // Adjust sizing and focus search when shown - dropdownEl.addEventListener('shown.bs.dropdown', function() { + docDropdown.addEventListener('shown.bs.dropdown', function() { initializeDocumentDropdown(); if (docSearchInput) { setTimeout(() => docSearchInput.focus(), 50); @@ -1552,20 +1548,8 @@ document.addEventListener('DOMContentLoaded', function() { }); // Clean up inline styles and reset state when hidden - dropdownEl.addEventListener('hidden.bs.dropdown', function() { - if (docSearchInput) { - docSearchInput.value = ''; - } - // Clear search filtering state - if (docDropdownItems) { - const items = docDropdownItems.querySelectorAll('.dropdown-item'); - items.forEach(item => { - item.removeAttribute('data-filtered'); - item.style.display = ''; - }); - const noMatchesEl = docDropdownItems.querySelector('.no-matches'); - if (noMatchesEl) noMatchesEl.remove(); - } + docDropdown.addEventListener('hidden.bs.dropdown', function() { + documentSearchController?.resetFilter(); // Clear inline styles set by initializeDocumentDropdown so they // don't interfere with Bootstrap's positioning on next open if (docDropdownMenu) { diff --git a/application/single_app/static/js/chat/chat-edit.js 
b/application/single_app/static/js/chat/chat-edit.js index 0e09b0d6..f8d109a7 100644 --- a/application/single_app/static/js/chat/chat-edit.js +++ b/application/single_app/static/js/chat/chat-edit.js @@ -3,6 +3,7 @@ import { showToast } from './chat-toast.js'; import { showLoadingIndicatorInChatbox, hideLoadingIndicatorInChatbox } from './chat-loading-indicator.js'; +import { sendMessageWithStreaming } from './chat-streaming.js'; /** * Handle edit button click - opens edit modal @@ -146,70 +147,44 @@ window.executeMessageEdit = function() { console.log(' retry_thread_id:', data.chat_request.retry_thread_id); console.log(' retry_thread_attempt:', data.chat_request.retry_thread_attempt); console.log(' Full chat_request:', data.chat_request); - - // Call chat API with the edit parameters - return fetch('/api/chat', { - method: 'POST', - headers: { - 'Content-Type': 'application/json', - }, - credentials: 'same-origin', - body: JSON.stringify(data.chat_request) - }); + + sendMessageWithStreaming( + data.chat_request, + null, + data.chat_request.conversation_id, + { + onDone: () => { + const conversationId = window.chatConversations?.getCurrentConversationId() || data.chat_request.conversation_id; + if (conversationId) { + import('./chat-messages.js').then(module => { + module.loadMessages(conversationId); + }).catch(err => { + console.error('❌ Error loading chat-messages module:', err); + showToast('Failed to reload messages', 'error'); + }); + } + }, + onError: (errorMessage) => { + showToast(`Edit failed: ${errorMessage}`, 'error'); + }, + onFinally: () => { + hideLoadingIndicatorInChatbox(); + } + } + ); + + return null; } else { throw new Error('Edit response missing chat_request'); } }) - .then(response => { - if (!response.ok) { - return response.json().then(data => { - throw new Error(data.error || 'Chat API failed'); - }); - } - return response.json(); - }) - .then(chatData => { - console.log('✅ Chat API response:', chatData); - - // Hide typing indicator - 
hideLoadingIndicatorInChatbox(); - console.log('🧹 Typing indicator removed'); - - // Get current conversation ID using the proper API - const conversationId = window.chatConversations?.getCurrentConversationId(); - - console.log(`🔍 Current conversation ID: ${conversationId}`); - - // Reload messages to show edited message and new response - if (conversationId) { - console.log('🔄 Reloading messages for conversation:', conversationId); - - // Import loadMessages dynamically - import('./chat-messages.js').then(module => { - console.log('📦 chat-messages.js module loaded, calling loadMessages...'); - module.loadMessages(conversationId); - // No toast - the reloaded messages are enough feedback - }).catch(err => { - console.error('❌ Error loading chat-messages module:', err); - showToast('error', 'Failed to reload messages'); - }); - } else { - console.error('❌ No currentConversationId found!'); - - // Try to force a page refresh as fallback - console.log('🔄 Attempting page refresh as fallback...'); - setTimeout(() => { - window.location.reload(); - }, 1000); - } - }) .catch(error => { console.error('❌ Edit error:', error); // Hide typing indicator on error hideLoadingIndicatorInChatbox(); - showToast('error', `Edit failed: ${error.message}`); + showToast(`Edit failed: ${error.message}`, 'error'); }) .finally(() => { // Clean up pending edit diff --git a/application/single_app/static/js/chat/chat-enhanced-citations.js b/application/single_app/static/js/chat/chat-enhanced-citations.js index dcda708b..93779da9 100644 --- a/application/single_app/static/js/chat/chat-enhanced-citations.js +++ b/application/single_app/static/js/chat/chat-enhanced-citations.js @@ -18,11 +18,13 @@ export function getFileType(fileName) { const imageExtensions = ['jpg', 'jpeg', 'png', 'bmp', 'tiff', 'tif']; const videoExtensions = ['mp4', 'mov', 'avi', 'mkv', 'flv', 'webm', 'wmv', 'm4v', '3gp']; const audioExtensions = ['mp3', 'wav', 'ogg', 'aac', 'flac', 'm4a']; - + const tabularExtensions = 
['csv', 'xlsx', 'xls', 'xlsm']; + if (imageExtensions.includes(ext)) return 'image'; if (ext === 'pdf') return 'pdf'; if (videoExtensions.includes(ext)) return 'video'; if (audioExtensions.includes(ext)) return 'audio'; + if (tabularExtensions.includes(ext)) return 'tabular'; return 'other'; } @@ -32,8 +34,9 @@ export function getFileType(fileName) { * @param {string} docId - Document ID * @param {string|number} pageNumberOrTimestamp - Page number for PDF or timestamp for video/audio * @param {string} citationId - Citation ID for fallback + * @param {string|null} initialSheetName - Workbook sheet to open initially for tabular files */ -export function showEnhancedCitationModal(docId, pageNumberOrTimestamp, citationId) { +export function showEnhancedCitationModal(docId, pageNumberOrTimestamp, citationId, initialSheetName = null) { // Get document metadata to determine file type const docMetadata = getDocumentMetadata(docId); if (!docMetadata || !docMetadata.file_name) { @@ -66,6 +69,9 @@ export function showEnhancedCitationModal(docId, pageNumberOrTimestamp, citation const audioTimestamp = convertTimestampToSeconds(pageNumberOrTimestamp); showAudioModal(docId, audioTimestamp, docMetadata.file_name); break; + case 'tabular': + showTabularDownloadModal(docId, docMetadata.file_name, initialSheetName); + break; default: // Fall back to text citation for unsupported types import('./chat-citations.js').then(module => { @@ -291,6 +297,249 @@ export function showAudioModal(docId, timestamp, fileName) { modalInstance.show(); } +function triggerBlobDownload(blob, filename) { + const url = URL.createObjectURL(blob); + const link = document.createElement('a'); + link.href = url; + link.download = filename; + document.body.appendChild(link); + link.click(); + document.body.removeChild(link); + window.setTimeout(() => URL.revokeObjectURL(url), 0); +} + +function getDownloadFilename(response, fallbackFilename) { + const contentDisposition = 
response.headers.get('Content-Disposition') || ''; + const utf8Match = contentDisposition.match(/filename\*=UTF-8''([^;]+)/i); + if (utf8Match && utf8Match[1]) { + try { + return decodeURIComponent(utf8Match[1]); + } catch (error) { + console.warn('Could not decode UTF-8 filename from Content-Disposition:', error); + return utf8Match[1]; + } + } + + const quotedMatch = contentDisposition.match(/filename="([^"]+)"/i); + if (quotedMatch && quotedMatch[1]) { + return quotedMatch[1]; + } + + const unquotedMatch = contentDisposition.match(/filename=([^;]+)/i); + if (unquotedMatch && unquotedMatch[1]) { + return unquotedMatch[1].trim(); + } + + return fallbackFilename || 'download'; +} + +async function downloadTabularFile(downloadUrl, fallbackFilename, downloadBtn) { + const originalMarkup = downloadBtn.innerHTML; + downloadBtn.disabled = true; + downloadBtn.classList.add('disabled'); + downloadBtn.innerHTML = 'Downloading...'; + + try { + const response = await fetch(downloadUrl, { + credentials: 'same-origin', + }); + + if (!response.ok) { + let errorMessage = `Could not download file (${response.status}).`; + const contentType = response.headers.get('Content-Type') || ''; + + if (contentType.includes('application/json')) { + const errorData = await response.json().catch(() => null); + if (errorData && errorData.error) { + errorMessage = errorData.error; + } + } else { + const errorText = await response.text().catch(() => ''); + if (errorText) { + errorMessage = errorText; + } + } + + throw new Error(errorMessage); + } + + const blob = await response.blob(); + const downloadFilename = getDownloadFilename(response, fallbackFilename); + triggerBlobDownload(blob, downloadFilename); + } catch (error) { + console.error('Error downloading tabular file:', error); + showToast(error.message || 'Could not download file.', 'danger'); + } finally { + downloadBtn.disabled = false; + downloadBtn.classList.remove('disabled'); + downloadBtn.innerHTML = originalMarkup; + } +} + +/** + 
* Show tabular file preview modal with data table + * @param {string} docId - Document ID + * @param {string} fileName - File name + * @param {string|null} initialSheetName - Workbook sheet to open initially + */ +export function showTabularDownloadModal(docId, fileName, initialSheetName = null) { + console.log(`Showing tabular preview modal for docId: ${docId}, fileName: ${fileName}`); + showLoadingIndicator(); + + // Create or get tabular modal + let tabularModal = document.getElementById("enhanced-tabular-modal"); + if (!tabularModal) { + tabularModal = createTabularModal(); + } + + const title = tabularModal.querySelector(".modal-title"); + const tableContainer = tabularModal.querySelector("#enhanced-tabular-table-container"); + const rowInfo = tabularModal.querySelector("#enhanced-tabular-row-info"); + const downloadBtn = tabularModal.querySelector("#enhanced-tabular-download"); + const errorContainer = tabularModal.querySelector("#enhanced-tabular-error"); + const sheetControls = tabularModal.querySelector("#enhanced-tabular-sheet-controls"); + const sheetSelect = tabularModal.querySelector("#enhanced-tabular-sheet-select"); + + title.textContent = `Tabular Data: ${fileName}`; + tableContainer.innerHTML = '
Loading...

Loading data preview...

'; + rowInfo.textContent = ''; + errorContainer.classList.add('d-none'); + sheetControls.classList.add('d-none'); + sheetSelect.innerHTML = ''; + + const downloadUrl = `/api/enhanced_citations/tabular_workspace?doc_id=${encodeURIComponent(docId)}`; + downloadBtn.onclick = (event) => { + event.preventDefault(); + downloadTabularFile(downloadUrl, fileName, downloadBtn); + }; + + // Show modal immediately with loading state + const modalInstance = new bootstrap.Modal(tabularModal); + modalInstance.show(); + + const escapeOptionValue = (value) => String(value) + .replace(/&/g, '&amp;') + .replace(/</g, '&lt;') + .replace(/>/g, '&gt;') + .replace(/"/g, '&quot;'); + + const loadTabularPreview = (selectedSheetName = null) => { + errorContainer.classList.add('d-none'); + + const params = new URLSearchParams({ + doc_id: docId, + }); + if (selectedSheetName) { + params.set('sheet_name', selectedSheetName); + } + + const previewUrl = `/api/enhanced_citations/tabular_preview?${params.toString()}`; + fetch(previewUrl) + .then(response => { + if (!response.ok) throw new Error(`HTTP ${response.status}`); + return response.json(); + }) + .then(data => { + hideLoadingIndicator(); + if (data.error) { + showTabularError(tableContainer, errorContainer, data.error); + return; + } + + title.textContent = data.selected_sheet + ? `Tabular Data: ${fileName} [${data.selected_sheet}]` + : `Tabular Data: ${fileName}`; + + const sheetNames = Array.isArray(data.sheet_names) ? data.sheet_names : []; + if (sheetNames.length > 1) { + sheetControls.classList.remove('d-none'); + sheetSelect.innerHTML = sheetNames + .map(sheetName => { + const isSelected = sheetName === data.selected_sheet ?
' selected' : ''; + return ``; + }) + .join(''); + sheetSelect.onchange = () => { + showLoadingIndicator(); + loadTabularPreview(sheetSelect.value); + }; + } else { + sheetControls.classList.add('d-none'); + sheetSelect.innerHTML = ''; + } + + renderTabularPreview(tableContainer, rowInfo, data); + }) + .catch(error => { + hideLoadingIndicator(); + console.error('Error loading tabular preview:', error); + showTabularError(tableContainer, errorContainer, 'Could not load data preview.'); + }); + }; + + loadTabularPreview(initialSheetName); +} + +/** + * Render tabular data as an HTML table + * @param {HTMLElement} container - Table container element + * @param {HTMLElement} rowInfo - Row info display element + * @param {Object} data - Preview data from API + */ +function renderTabularPreview(container, rowInfo, data) { + const { columns, rows, total_rows, truncated, selected_sheet } = data; + + // Build table HTML + let html = ''; + + // Header + html += ''; + for (const col of columns) { + const escaped = col.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;'); + html += ``; + } + html += ''; + + // Body + html += ''; + for (const row of rows) { + html += ''; + for (const cell of row) { + const val = cell === null || cell === undefined ? '' : String(cell); + const escaped = val.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;'); + html += ``; + } + html += ''; + } + html += '
${escaped}
${escaped}
'; + + container.innerHTML = html; + + // Row info + const displayedRows = rows.length; + const hasTotalRows = total_rows !== null && total_rows !== undefined; + const totalFormatted = hasTotalRows ? total_rows.toLocaleString() : displayedRows.toLocaleString(); + const sheetPrefix = selected_sheet ? `Sheet ${selected_sheet} · ` : ''; + if (truncated) { + const truncationSuffix = hasTotalRows ? `${totalFormatted} rows` : `${displayedRows.toLocaleString()}+ rows`; + rowInfo.textContent = `${sheetPrefix}Showing ${displayedRows.toLocaleString()} of ${truncationSuffix}`; + } else { + rowInfo.textContent = `${sheetPrefix}${totalFormatted} rows, ${columns.length} columns`; + } +} + +/** + * Show error state in tabular modal with download fallback + * @param {HTMLElement} tableContainer - Table container element + * @param {HTMLElement} errorContainer - Error display element + * @param {string} message - Error message + */ +function showTabularError(tableContainer, errorContainer, message) { + tableContainer.innerHTML = '
'; + errorContainer.textContent = message + ' You can still download the file below.'; + errorContainer.classList.remove('d-none'); +} + /** * Convert timestamp string to seconds * @param {string|number} timestamp - Timestamp in various formats @@ -445,3 +694,40 @@ function createPdfModal() { document.body.appendChild(modal); return modal; } + +/** + * Create tabular file preview modal HTML structure + * @returns {HTMLElement} - Modal element + */ +function createTabularModal() { + const modal = document.createElement("div"); + modal.id = "enhanced-tabular-modal"; + modal.classList.add("modal", "fade"); + modal.tabIndex = -1; + modal.innerHTML = ` + + `; + document.body.appendChild(modal); + return modal; +} diff --git a/application/single_app/static/js/chat/chat-export.js b/application/single_app/static/js/chat/chat-export.js index 269cbfe0..fc53d2b6 100644 --- a/application/single_app/static/js/chat/chat-export.js +++ b/application/single_app/static/js/chat/chat-export.js @@ -15,6 +15,8 @@ let exportConversationIds = []; let exportConversationTitles = {}; let exportFormat = 'json'; let exportPackaging = 'single'; +let includeSummaryIntro = false; +let summaryModelDeployment = ''; let currentStep = 1; let totalSteps = 3; let skipSelectionStep = false; @@ -53,14 +55,16 @@ function openExportWizard(conversationIds, skipSelection) { exportConversationTitles = {}; exportFormat = 'json'; exportPackaging = conversationIds.length > 1 ? 
'zip' : 'single'; + includeSummaryIntro = false; + summaryModelDeployment = _getDefaultSummaryModel(); skipSelectionStep = !!skipSelection; // Determine step configuration if (skipSelectionStep) { - totalSteps = 3; + totalSteps = 4; currentStep = 1; // Format step (mapped to visual step) } else { - totalSteps = 4; + totalSteps = 5; currentStep = 1; // Selection review step } @@ -142,19 +146,21 @@ function _renderCurrentStep() { if (!stepBody) return; if (skipSelectionStep) { - // Steps: 1=Format, 2=Packaging, 3=Download + // Steps: 1=Format, 2=Packaging, 3=Summary, 4=Download switch (currentStep) { case 1: _renderFormatStep(stepBody); break; case 2: _renderPackagingStep(stepBody); break; - case 3: _renderDownloadStep(stepBody); break; + case 3: _renderSummaryStep(stepBody); break; + case 4: _renderDownloadStep(stepBody); break; } } else { - // Steps: 1=Selection, 2=Format, 3=Packaging, 4=Download + // Steps: 1=Selection, 2=Format, 3=Packaging, 4=Summary, 5=Download switch (currentStep) { case 1: _renderSelectionStep(stepBody); break; case 2: _renderFormatStep(stepBody); break; case 3: _renderPackagingStep(stepBody); break; - case 4: _renderDownloadStep(stepBody); break; + case 4: _renderSummaryStep(stepBody); break; + case 5: _renderDownloadStep(stepBody); break; } } } @@ -210,7 +216,7 @@ function _renderFormatStep(container) {

Select the format for your exported conversations.

-
+
@@ -219,7 +225,7 @@ function _renderFormatStep(container) {
-
+
@@ -228,6 +234,15 @@ function _renderFormatStep(container) {
+
+
+
+ +
PDF
+

Print-ready format with chat bubbles. Ideal for archiving and printing.

+
+
+
`; // Wire card clicks @@ -297,11 +312,68 @@ function _renderPackagingStep(container) { }); } +function _renderSummaryStep(container) { + const mainModelSelect = getEl('model-select'); + const hasModelOptions = Boolean(mainModelSelect && mainModelSelect.options.length > 0); + const defaultSummaryModel = summaryModelDeployment || _getDefaultSummaryModel(); + const perConversationText = exportConversationIds.length > 1 + ? 'An intro will be generated for each exported conversation.' + : 'An intro will be generated for this conversation.'; + + container.innerHTML = ` +
+
Optional Intro Summary
+

Add a short abstract before the exported transcript. ${perConversationText}

+
+
+ + +
+
+
+ + +
Uses the same model list as the chat composer.
+
+
`; + + const toggle = getEl('export-summary-toggle'); + const modelContainer = getEl('export-summary-model-container'); + const summaryModelSelect = getEl('export-summary-model'); + + if (summaryModelSelect && hasModelOptions) { + summaryModelSelect.value = defaultSummaryModel || summaryModelSelect.value; + summaryModelDeployment = summaryModelSelect.value; + summaryModelSelect.addEventListener('change', () => { + summaryModelDeployment = summaryModelSelect.value; + }); + } + + if (toggle) { + toggle.addEventListener('change', () => { + includeSummaryIntro = toggle.checked; + if (modelContainer) { + modelContainer.classList.toggle('d-none', !includeSummaryIntro); + } + if (includeSummaryIntro && summaryModelSelect && !summaryModelSelect.value) { + summaryModelSelect.value = _getDefaultSummaryModel(); + summaryModelDeployment = summaryModelSelect.value; + } + }); + } +} + function _renderDownloadStep(container) { const count = exportConversationIds.length; - const formatLabel = exportFormat === 'json' ? 'JSON' : 'Markdown'; + const formatLabels = { json: 'JSON', markdown: 'Markdown', pdf: 'PDF' }; + const formatLabel = formatLabels[exportFormat] || exportFormat.toUpperCase(); const packagingLabel = exportPackaging === 'zip' ? 'ZIP Archive' : 'Single File'; - const ext = exportPackaging === 'zip' ? '.zip' : (exportFormat === 'json' ? '.json' : '.md'); + const extMap = { json: '.json', markdown: '.md', pdf: '.pdf' }; + const ext = exportPackaging === 'zip' ? '.zip' : (extMap[exportFormat] || '.bin'); + const summaryLabel = includeSummaryIntro ? 'Enabled' : 'Disabled'; + const summaryModelLabel = includeSummaryIntro ? (summaryModelDeployment || 'Configured default') : '—'; let conversationsList = ''; exportConversationIds.forEach(id => { @@ -328,6 +400,14 @@ function _renderDownloadStep(container) {
Packaging:
${packagingLabel}
+
+
Intro summary:
+
${summaryLabel}
+
+
+
Summary model:
+
${_escapeHtml(summaryModelLabel)}
+
File type:
${ext}
@@ -364,6 +444,7 @@ function _updateStepIndicators() { steps = [ { label: 'Format', icon: 'bi-filetype-json' }, { label: 'Packaging', icon: 'bi-box' }, + { label: 'Summary', icon: 'bi-card-text' }, { label: 'Download', icon: 'bi-download' } ]; } else { @@ -371,6 +452,7 @@ function _updateStepIndicators() { { label: 'Select', icon: 'bi-list-check' }, { label: 'Format', icon: 'bi-filetype-json' }, { label: 'Packaging', icon: 'bi-box' }, + { label: 'Summary', icon: 'bi-card-text' }, { label: 'Download', icon: 'bi-download' } ]; } @@ -448,7 +530,9 @@ async function _executeExport() { body: JSON.stringify({ conversation_ids: exportConversationIds, format: exportFormat, - packaging: exportPackaging + packaging: exportPackaging, + include_summary_intro: includeSummaryIntro, + summary_model_deployment: includeSummaryIntro ? summaryModelDeployment : null }) }); @@ -460,7 +544,8 @@ async function _executeExport() { // Get filename from Content-Disposition header const disposition = response.headers.get('Content-Disposition') || ''; const filenameMatch = disposition.match(/filename="?([^"]+)"?/); - const filename = filenameMatch ? filenameMatch[1] : `conversations_export.${exportPackaging === 'zip' ? 'zip' : (exportFormat === 'json' ? 'json' : 'md')}`; + const fallbackExtMap = { json: 'json', markdown: 'md', pdf: 'pdf' }; + const filename = filenameMatch ? filenameMatch[1] : `conversations_export.${exportPackaging === 'zip' ? 'zip' : (fallbackExtMap[exportFormat] || 'bin')}`; // Download the blob const blob = await response.blob(); @@ -511,6 +596,15 @@ function _escapeHtml(text) { return div.innerHTML; } +function _getDefaultSummaryModel() { + const mainModelSelect = getEl('model-select'); + if (!mainModelSelect) { + return ''; + } + + return mainModelSelect.value || (mainModelSelect.options[0] ? 
mainModelSelect.options[0].value : ''); +} + // --- Expose Globally --- window.chatExport = { openExportWizard diff --git a/application/single_app/static/js/chat/chat-input-actions.js b/application/single_app/static/js/chat/chat-input-actions.js index 77851319..66eaf044 100644 --- a/application/single_app/static/js/chat/chat-input-actions.js +++ b/application/single_app/static/js/chat/chat-input-actions.js @@ -127,11 +127,11 @@ export function fetchFileContent(conversationId, fileId) { hideLoadingIndicator(); if (data.file_content && data.filename) { - showFileContentPopup(data.file_content, data.filename, data.is_table); + showFileContentPopup(data.file_content, data.filename, data.is_table, data.file_content_source, conversationId, fileId); } else if (data.error) { showToast(data.error, "danger"); } else { - ashowToastlert("Unexpected response from server.", "danger"); + showToast("Unexpected response from server.", "danger"); } }) .catch((error) => { @@ -141,7 +141,7 @@ export function fetchFileContent(conversationId, fileId) { }); } -export function showFileContentPopup(fileContent, filename, isTable) { +export function showFileContentPopup(fileContent, filename, isTable, fileContentSource, conversationId, fileId) { let modalContainer = document.getElementById("file-modal"); if (!modalContainer) { modalContainer = document.createElement("div"); @@ -155,6 +155,7 @@ export function showFileContentPopup(fileContent, filename, isTable) { @@ -816,6 +829,9 @@ export function appendMessage( } }); } + + // Attach thoughts toggle listener + attachThoughtsToggleListener(messageDiv, messageId, currentConversationId); const maskBtn = messageDiv.querySelector(".mask-btn"); if (maskBtn) { @@ -851,6 +867,50 @@ export function appendMessage( handleRetryButtonClick(messageDiv, currentMessageId, 'assistant'); }); } + + const dropdownExportMdBtn = messageDiv.querySelector(".dropdown-export-md-btn"); + if (dropdownExportMdBtn) { + dropdownExportMdBtn.addEventListener("click", (e) 
=> { + e.preventDefault(); + const currentMessageId = messageDiv.getAttribute('data-message-id'); + import('./chat-message-export.js').then(module => { + module.exportMessageAsMarkdown(messageDiv, currentMessageId, 'assistant'); + }).catch(err => console.error('Error loading message export module:', err)); + }); + } + + const dropdownExportWordBtn = messageDiv.querySelector(".dropdown-export-word-btn"); + if (dropdownExportWordBtn) { + dropdownExportWordBtn.addEventListener("click", (e) => { + e.preventDefault(); + const currentMessageId = messageDiv.getAttribute('data-message-id'); + import('./chat-message-export.js').then(module => { + module.exportMessageAsWord(messageDiv, currentMessageId, 'assistant'); + }).catch(err => console.error('Error loading message export module:', err)); + }); + } + + const dropdownCopyPromptBtn = messageDiv.querySelector(".dropdown-copy-prompt-btn"); + if (dropdownCopyPromptBtn) { + dropdownCopyPromptBtn.addEventListener("click", (e) => { + e.preventDefault(); + const currentMessageId = messageDiv.getAttribute('data-message-id'); + import('./chat-message-export.js').then(module => { + module.copyAsPrompt(messageDiv, currentMessageId, 'assistant'); + }).catch(err => console.error('Error loading message export module:', err)); + }); + } + + const dropdownOpenEmailBtn = messageDiv.querySelector(".dropdown-open-email-btn"); + if (dropdownOpenEmailBtn) { + dropdownOpenEmailBtn.addEventListener("click", (e) => { + e.preventDefault(); + const currentMessageId = messageDiv.getAttribute('data-message-id'); + import('./chat-message-export.js').then(module => { + module.openInEmail(messageDiv, currentMessageId, 'assistant'); + }).catch(err => console.error('Error loading message export module:', err)); + }); + } // Handle dropdown positioning manually - move to chatbox container const dropdownToggle = messageDiv.querySelector(".message-actions .dropdown button[data-bs-toggle='dropdown']"); @@ -1076,6 +1136,11 @@ export function appendMessage(
  • Edit
  • Delete
  • Retry
  • +
  • +
  • Export to Markdown
  • +
  • Export to Word
  • +
  • Use as Prompt
  • +
  • Open in Email
  • `; + const containerHtml = `
    Loading thoughts...
    `; + + return { toggleHtml, containerHtml }; +} + +/** + * Attach event listener for the thoughts toggle button inside a message div. + * @param {HTMLElement} messageDiv + * @param {string} messageId + * @param {string} conversationId + */ +export function attachThoughtsToggleListener(messageDiv, messageId, conversationId) { + const toggleBtn = messageDiv.querySelector('.thoughts-toggle-btn'); + if (!toggleBtn) return; + + toggleBtn.addEventListener('click', () => { + const targetId = toggleBtn.getAttribute('aria-controls'); + const container = messageDiv.querySelector(`#${targetId}`); + if (!container) return; + + // Store scroll position + const scrollContainer = document.getElementById('chat-messages-container'); + const currentScroll = scrollContainer?.scrollTop || window.pageYOffset; + + const isExpanded = !container.classList.contains('d-none'); + if (isExpanded) { + container.classList.add('d-none'); + toggleBtn.setAttribute('aria-expanded', 'false'); + toggleBtn.title = 'Show processing thoughts'; + toggleBtn.innerHTML = ''; + } else { + container.classList.remove('d-none'); + toggleBtn.setAttribute('aria-expanded', 'true'); + toggleBtn.title = 'Hide processing thoughts'; + toggleBtn.innerHTML = ''; + + // Lazy-load thoughts on first expand + if (container.innerHTML.includes('Loading thoughts')) { + loadThoughtsForMessage(conversationId, messageId, container); + } + } + + // Restore scroll position + setTimeout(() => { + if (scrollContainer) { + scrollContainer.scrollTop = currentScroll; + } else { + window.scrollTo(0, currentScroll); + } + }, 10); + }); +} + +// --------------------------------------------------------------------------- +// Fetch + render thoughts for a message +// --------------------------------------------------------------------------- + +/** + * Fetch thoughts for a specific message from the API and render them. 
+ * @param {string} conversationId + * @param {string} messageId + * @param {HTMLElement} container + */ +function loadThoughtsForMessage(conversationId, messageId, container) { + fetch(`/api/conversations/${conversationId}/messages/${messageId}/thoughts`, { + credentials: 'same-origin' + }) + .then(r => r.json()) + .then(data => { + if (!data.enabled) { + container.innerHTML = '
    Processing thoughts are disabled.
    '; + return; + } + if (!data.thoughts || data.thoughts.length === 0) { + container.innerHTML = '
    No processing thoughts recorded for this message.
    '; + return; + } + container.innerHTML = renderThoughtsList(data.thoughts); + }) + .catch(err => { + console.error('Error loading thoughts:', err); + container.innerHTML = '
    Failed to load processing thoughts.
    '; + }); +} + +/** + * Render a list of thought steps as HTML. + * @param {Array} thoughts + * @returns {string} HTML string + */ +function renderThoughtsList(thoughts) { + let html = '
    '; + thoughts.forEach(t => { + const icon = getThoughtIcon(t.step_type); + const durationStr = t.duration_ms != null ? `(${t.duration_ms}ms)` : ''; + html += `
    + + ${escapeHtml(t.content || '')} + ${durationStr} +
    `; + }); + html += '
    '; + return html; +} diff --git a/application/single_app/static/js/plugin_common.js b/application/single_app/static/js/plugin_common.js index e40158b9..29a88a24 100644 --- a/application/single_app/static/js/plugin_common.js +++ b/application/single_app/static/js/plugin_common.js @@ -2,6 +2,10 @@ // Shared logic for admin_plugins.js and workspace_plugins.js // Exports: functions for modal field handling, validation, label toggling, table rendering, and plugin CRUD import { showToast } from "./chat/chat-toast.js" +import { + humanizeName, truncateDescription, + openViewModal, createActionCard +} from './workspace/view-utils.js'; // Fetch merged plugin settings from backend given type and current settings export async function fetchAndMergePluginSettings(pluginType, currentSettings = {}) { @@ -60,8 +64,7 @@ export function escapeHtml(str) { } // Render plugins table (parameterized for tbody selector and button handlers) -export function renderPluginsTable({plugins, tbodySelector, onEdit, onDelete, ensureTable = true, isAdmin = false}) { - console.log('Rendering plugins table with %d plugins', plugins.length); +export function renderPluginsTable({plugins, tbodySelector, onEdit, onDelete, onView, ensureTable = true, isAdmin = false}) { // Optionally ensure the table is present before rendering if (ensureTable) { ensurePluginsTableInRoot(); @@ -75,29 +78,33 @@ export function renderPluginsTable({plugins, tbodySelector, onEdit, onDelete, en plugins.forEach(plugin => { const tr = document.createElement('tr'); const safeName = escapeHtml(plugin.name); - const safeDisplayName = escapeHtml(plugin.display_name || plugin.name); - const safeDesc = escapeHtml(plugin.description || 'No description available'); + const displayName = humanizeName(plugin.display_name || plugin.name); + const safeDisplayName = escapeHtml(displayName); + const description = plugin.description || 'No description available'; + const truncatedDesc = escapeHtml(truncateDescription(description, 90)); 
let actionButtons = ''; let globalBadge = plugin.is_global ? ' Global' : ''; - // Show action buttons for: - // - Admin context: all actions (global and personal) - // - User context: only personal actions (not global) + // View button always shown + let viewButton = ``; + + // Edit/Delete buttons based on context + let editDeleteButtons = ''; if (isAdmin || !plugin.is_global) { - actionButtons = ` -
    + editDeleteButtons = ` -
    - `; + `; } + actionButtons = `
    ${viewButton}${editDeleteButtons}
    `; tr.innerHTML = ` - ${safeDisplayName}${globalBadge} - ${safeDesc} + ${safeDisplayName}${globalBadge} + ${truncatedDesc} ${actionButtons} `; tbody.appendChild(tr); @@ -109,6 +116,34 @@ export function renderPluginsTable({plugins, tbodySelector, onEdit, onDelete, en tbody.querySelectorAll('.delete-plugin-btn').forEach(btn => { btn.onclick = () => onDelete(btn.getAttribute('data-plugin-name')); }); + tbody.querySelectorAll('.view-plugin-btn').forEach(btn => { + btn.onclick = () => { + if (onView) { + onView(btn.getAttribute('data-plugin-name')); + } + }; + }); +} + +// Render plugins grid (card-based view) +export function renderPluginsGrid({plugins, containerSelector, onEdit, onDelete, onView, isAdmin = false}) { + const container = document.querySelector(containerSelector); + if (!container) return; + container.innerHTML = ''; + if (!plugins.length) { + container.innerHTML = '
    No actions found.
    '; + return; + } + plugins.forEach(plugin => { + const card = createActionCard(plugin, { + onView: (p) => { if (onView) onView(p.name); }, + onEdit: (p) => onEdit(p.name), + onDelete: (p) => onDelete(p.name), + canManage: isAdmin || !plugin.is_global, + isAdmin + }); + container.appendChild(card); + }); } // Toggle auth fields and labels (parameterized for DOM elements) diff --git a/application/single_app/static/js/plugin_modal_stepper.js b/application/single_app/static/js/plugin_modal_stepper.js index 89076076..2e619fcf 100644 --- a/application/single_app/static/js/plugin_modal_stepper.js +++ b/application/single_app/static/js/plugin_modal_stepper.js @@ -1,6 +1,10 @@ // plugin_modal_stepper.js // Multi-step modal functionality for action/plugin creation import { showToast } from "./chat/chat-toast.js"; +import { getTypeIcon } from "./workspace/view-utils.js"; + +// Action types hidden from the creation UI (backend plugins remain intact) +const HIDDEN_ACTION_TYPES = ['sql_schema', 'ui_test', 'queue_storage', 'blob_storage', 'embedding_model']; export class PluginModalStepper { @@ -129,6 +133,12 @@ export class PluginModalStepper { document.getElementById('sql-auth-type').addEventListener('change', () => this.handleSqlAuthTypeChange()); + // Test SQL connection button + const testConnBtn = document.getElementById('sql-test-connection-btn'); + if (testConnBtn) { + testConnBtn.addEventListener('click', () => this.testSqlConnection()); + } + // Set up display name to generated name conversion this.setupNameGeneration(); @@ -193,6 +203,8 @@ export class PluginModalStepper { if (!res.ok) throw new Error('Failed to load action types'); this.availableTypes = await res.json(); + // Hide deprecated/internal action types from the creation UI + this.availableTypes = this.availableTypes.filter(t => !HIDDEN_ACTION_TYPES.includes(t.type)); // Sort action types alphabetically by display name this.availableTypes.sort((a, b) => { const nameA = (a.display || a.displayName || 
a.type || a.name || '').toLowerCase(); @@ -271,10 +283,15 @@ export class PluginModalStepper { description.substring(0, maxLength) + '...' : description; const needsTruncation = description.length > maxLength; + const iconClass = getTypeIcon(type.type || type.name); + col.innerHTML = `
    -
    ${this.escapeHtml(displayName)}
    +
    + +
    ${this.escapeHtml(displayName)}
    +

    ${this.escapeHtml(truncatedDescription)} ${needsTruncation ? ` @@ -538,43 +555,52 @@ export class PluginModalStepper { } if (stepNumber === 4) { - // Load additional settings schema for selected type - let options = {forceReload: true}; - this.getAdditionalSettingsSchema(this.selectedType, options); + const isSqlType = this.selectedType === 'sql_query' || this.selectedType === 'sql_schema'; const additionalFieldsDiv = document.getElementById('plugin-additional-fields-div'); - if (additionalFieldsDiv) { - // Only clear and rebuild if type changes - if (this.selectedType !== this.lastAdditionalFieldsType) { - additionalFieldsDiv.innerHTML = ''; - additionalFieldsDiv.classList.remove('d-none'); - if (this.selectedType) { - this.getAdditionalSettingsSchema(this.selectedType) - .then(schema => { - if (schema) { - this.buildAdditionalFieldsUI(schema, additionalFieldsDiv); - try { - if (this.isEditMode && this.originalPlugin && this.originalPlugin.additionalFields) { - this.populateDynamicAdditionalFields(this.originalPlugin.additionalFields); + + // For SQL types, hide additional fields entirely since Step 3 covers all SQL config + if (isSqlType && additionalFieldsDiv) { + additionalFieldsDiv.innerHTML = ''; + additionalFieldsDiv.classList.add('d-none'); + this.lastAdditionalFieldsType = this.selectedType; + } else { + // Load additional settings schema for selected type + let options = {forceReload: true}; + this.getAdditionalSettingsSchema(this.selectedType, options); + if (additionalFieldsDiv) { + // Only clear and rebuild if type changes + if (this.selectedType !== this.lastAdditionalFieldsType) { + additionalFieldsDiv.innerHTML = ''; + additionalFieldsDiv.classList.remove('d-none'); + if (this.selectedType) { + this.getAdditionalSettingsSchema(this.selectedType) + .then(schema => { + if (schema) { + this.buildAdditionalFieldsUI(schema, additionalFieldsDiv); + try { + if (this.isEditMode && this.originalPlugin && this.originalPlugin.additionalFields) { + 
this.populateDynamicAdditionalFields(this.originalPlugin.additionalFields); + } + } catch (error) { + console.error('Error populating dynamic additional fields:', error); } - } catch (error) { - console.error('Error populating dynamic additional fields:', error); + } else { + console.log('No additional settings schema found'); + additionalFieldsDiv.classList.add('d-none'); } - } else { - console.log('No additional settings schema found'); + }) + .catch(error => { + console.error(`Error fetching additional settings schema for type: ${this.selectedType} -- ${error}`); additionalFieldsDiv.classList.add('d-none'); - } - }) - .catch(error => { - console.error(`Error fetching additional settings schema for type: ${this.selectedType} -- ${error}`); - additionalFieldsDiv.classList.add('d-none'); - }); - } else { - console.warn('No plugin type selected'); - additionalFieldsDiv.classList.add('d-none'); + }); + } else { + console.warn('No plugin type selected'); + additionalFieldsDiv.classList.add('d-none'); + } + this.lastAdditionalFieldsType = this.selectedType; } - this.lastAdditionalFieldsType = this.selectedType; + // Otherwise, preserve user data and do not redraw } - // Otherwise, preserve user data and do not redraw } if (!this.isEditMode) { @@ -1230,6 +1256,110 @@ export class PluginModalStepper { this.updateSqlAuthInfo(); } + getSqlTestPluginContext() { + if (!this.isEditMode || !this.originalPlugin) { + return null; + } + + const originalPlugin = this.originalPlugin; + let scope = originalPlugin.scope; + + if (!scope) { + if (originalPlugin.is_group) { + scope = 'group'; + } else if (originalPlugin.is_global || window.location.pathname.includes('admin')) { + scope = 'global'; + } else { + scope = 'user'; + } + } + + return { + id: originalPlugin.id || '', + name: originalPlugin.name || '', + scope + }; + } + + async testSqlConnection() { + const btn = document.getElementById('sql-test-connection-btn'); + const resultDiv = 
document.getElementById('sql-test-connection-result'); + const alertDiv = document.getElementById('sql-test-connection-alert'); + if (!btn || !resultDiv || !alertDiv) return; + + // Collect current SQL config from Step 3 + const databaseType = document.querySelector('input[name="sql-database-type"]:checked')?.value; + const connectionMethod = document.querySelector('input[name="sql-connection-method"]:checked')?.value || 'parameters'; + const authType = document.getElementById('sql-auth-type')?.value || 'username_password'; + + if (!databaseType) { + resultDiv.classList.remove('d-none'); + alertDiv.className = 'alert alert-warning mb-0 py-2 px-3 small'; + alertDiv.textContent = 'Please select a database type first.'; + return; + } + + const payload = { + database_type: databaseType, + connection_method: connectionMethod, + auth_type: authType + }; + + if (connectionMethod === 'connection_string') { + payload.connection_string = document.getElementById('sql-connection-string')?.value?.trim() || ''; + } else { + payload.server = document.getElementById('sql-server')?.value?.trim() || ''; + payload.database = document.getElementById('sql-database')?.value?.trim() || ''; + payload.port = document.getElementById('sql-port')?.value?.trim() || ''; + if (databaseType === 'sqlserver' || databaseType === 'azure_sql') { + payload.driver = document.getElementById('sql-driver')?.value || ''; + } + } + + if (authType === 'username_password') { + payload.username = document.getElementById('sql-username')?.value?.trim() || ''; + payload.password = document.getElementById('sql-password')?.value?.trim() || ''; + } + + payload.timeout = parseInt(document.getElementById('sql-timeout')?.value) || 10; + + const existingPluginContext = this.getSqlTestPluginContext(); + if (existingPluginContext) { + payload.existing_plugin = existingPluginContext; + } + + // Show loading state + const originalText = btn.innerHTML; + btn.innerHTML = 'Testing...'; + btn.disabled = true; + 
resultDiv.classList.add('d-none'); + + try { + const response = await fetch('/api/plugins/test-sql-connection', { + method: 'POST', + headers: { 'Content-Type': 'application/json' }, + body: JSON.stringify(payload) + }); + const data = await response.json(); + + resultDiv.classList.remove('d-none'); + if (data.success) { + alertDiv.className = 'alert alert-success mb-0 py-2 px-3 small'; + alertDiv.innerHTML = '' + (data.message || 'Connection successful!'); + } else { + alertDiv.className = 'alert alert-danger mb-0 py-2 px-3 small'; + alertDiv.innerHTML = '' + (data.error || 'Connection failed.'); + } + } catch (error) { + resultDiv.classList.remove('d-none'); + alertDiv.className = 'alert alert-danger mb-0 py-2 px-3 small'; + alertDiv.innerHTML = 'Test failed: ' + (error.message || 'Network error'); + } finally { + btn.innerHTML = originalText; + btn.disabled = false; + } + } + updateSqlConnectionExamples() { const selectedType = document.querySelector('input[name="sql-database-type"]:checked')?.value; const examplesDiv = document.getElementById('sql-connection-examples'); @@ -1432,6 +1562,13 @@ export class PluginModalStepper { } else if (plugin.type && (plugin.type.toLowerCase().includes('sql') || plugin.type.toLowerCase() === 'sql_schema' || plugin.type.toLowerCase() === 'sql_query')) { // Populate SQL fields const additionalFields = plugin.additionalFields || {}; + const auth = plugin.auth || {}; + + const pluginVariant = plugin.type.toLowerCase() === 'sql_schema' ? 
'schema' : 'query'; + const pluginTypeRadio = document.querySelector(`input[name="sql-plugin-type"][value="${pluginVariant}"]`); + if (pluginTypeRadio) { + pluginTypeRadio.checked = true; + } // Database type - select the appropriate radio button const databaseType = additionalFields.database_type || 'sqlserver'; @@ -1440,56 +1577,40 @@ export class PluginModalStepper { dbTypeRadio.checked = true; } - // Connection method (default to connection string) - // Note: The connection method might not be saved in the data, so we'll default to connection_string - const connectionMethodRadio = document.querySelector('input[name="sql-connection-method"][value="connection_string"]'); + const hasConnectionString = typeof additionalFields.connection_string === 'string' && additionalFields.connection_string.length > 0; + const connectionMethodValue = hasConnectionString ? 'connection_string' : 'parameters'; + const connectionMethodRadio = document.querySelector(`input[name="sql-connection-method"][value="${connectionMethodValue}"]`); if (connectionMethodRadio) { connectionMethodRadio.checked = true; } - - // Build connection string from individual parameters if needed - let connectionString = plugin.endpoint || ''; - if (!connectionString && additionalFields.server) { - // Build connection string from components - const server = additionalFields.server; - const database = additionalFields.database; - const driver = additionalFields.driver || 'ODBC Driver 17 for SQL Server'; - - if (databaseType === 'azure_sql' || databaseType === 'sqlserver') { - connectionString = `Server=${server};Database=${database};Driver={${driver}};`; - if (additionalFields.username && additionalFields.password) { - connectionString += `Uid=${additionalFields.username};Pwd=${additionalFields.password};`; - } - } else if (databaseType === 'postgresql') { - connectionString = `Host=${server};Database=${database};`; - if (additionalFields.username && additionalFields.password) { - connectionString += 
`Username=${additionalFields.username};Password=${additionalFields.password};`; - } - } else if (databaseType === 'mysql') { - connectionString = `Server=${server};Database=${database};`; - if (additionalFields.username && additionalFields.password) { - connectionString += `Uid=${additionalFields.username};Pwd=${additionalFields.password};`; - } - } - } - - document.getElementById('sql-connection-string').value = connectionString; - - // Authentication - const auth = plugin.auth || {}; - let sqlAuthType = 'username_password'; // Default for SQL plugins - - if (auth.type === 'user' || auth.type === 'username_password') { + + document.getElementById('sql-connection-string').value = additionalFields.connection_string || ''; + document.getElementById('sql-server').value = additionalFields.server || ''; + document.getElementById('sql-database').value = additionalFields.database || ''; + document.getElementById('sql-port').value = additionalFields.port || ''; + document.getElementById('sql-driver').value = additionalFields.driver || 'ODBC Driver 17 for SQL Server'; + + let sqlAuthType = hasConnectionString ? 
'connection_string_only' : 'username_password'; + + if (auth.type === 'servicePrincipal') { + sqlAuthType = 'service_principal'; + document.getElementById('sql-client-id').value = auth.identity || auth.client_id || ''; + document.getElementById('sql-client-secret').value = auth.key || auth.client_secret || ''; + document.getElementById('sql-tenant-id').value = auth.tenantId || auth.tenant_id || ''; + } else if (auth.type === 'user' || auth.type === 'username_password' || additionalFields.username || additionalFields.password) { sqlAuthType = 'username_password'; document.getElementById('sql-username').value = additionalFields.username || ''; document.getElementById('sql-password').value = additionalFields.password || ''; } else if (auth.type === 'integrated' || auth.type === 'windows') { sqlAuthType = 'integrated'; - } else if (auth.type === 'connection_string') { - sqlAuthType = 'connection_string'; + } else if (auth.type === 'identity') { + sqlAuthType = databaseType === 'azure_sql' ? 
'managed_identity' : 'integrated'; } document.getElementById('sql-auth-type').value = sqlAuthType; + this.handleSqlDatabaseTypeChange(); + this.handleSqlConnectionMethodChange(); + this.handleSqlAuthTypeChange(); } else { // Populate generic fields document.getElementById('plugin-endpoint-generic').value = plugin.endpoint || ''; @@ -1672,9 +1793,9 @@ export class PluginModalStepper { if (!clientId || !clientSecret || !tenantId) { throw new Error('Please enter client ID, client secret, and tenant ID'); } - auth.client_id = clientId; - auth.client_secret = clientSecret; - auth.tenant_id = tenantId; + auth.identity = clientId; + auth.key = clientSecret; + auth.tenantId = tenantId; break; case 'integrated': @@ -1720,12 +1841,17 @@ export class PluginModalStepper { // Collect additional fields from the dynamic UI and MERGE with existing additionalFields // This preserves OpenAPI spec content and other auto-populated fields - try { - const dynamicFields = this.collectAdditionalFields(); - // Merge dynamicFields into additionalFields (preserving existing values) - additionalFields = { ...additionalFields, ...dynamicFields }; - } catch (e) { - throw new Error('Invalid additional fields input'); + // For SQL types, Step 3 already provides all necessary config — skip dynamic field merge + // to prevent empty Step 4 fields from overwriting populated Step 3 values + const isSqlType = this.selectedType === 'sql_query' || this.selectedType === 'sql_schema'; + if (!isSqlType) { + try { + const dynamicFields = this.collectAdditionalFields(); + // Merge dynamicFields into additionalFields (preserving existing values) + additionalFields = { ...additionalFields, ...dynamicFields }; + } catch (e) { + throw new Error('Invalid additional fields input'); + } } let metadata = {}; @@ -2106,6 +2232,7 @@ export class PluginModalStepper { populateAdvancedSummary() { const advancedSection = document.getElementById('summary-advanced-section'); + const isSqlType = this.selectedType === 
'sql_query' || this.selectedType === 'sql_schema'; // Check if there's any metadata or additional fields const metadata = document.getElementById('plugin-metadata').value.trim(); @@ -2123,9 +2250,33 @@ export class PluginModalStepper { hasMetadata = metadata.length > 0 && metadata !== '{}'; } - // DRY: Use private helper to collect additional fields - let additionalFieldsObj = this.collectAdditionalFields(); - hasAdditionalFields = Object.keys(additionalFieldsObj).length > 0; + // For SQL types, additional fields are already shown in the SQL Database Configuration + // summary section, so skip showing them again in Advanced to avoid redundancy + if (!isSqlType) { + // DRY: Use private helper to collect additional fields + let additionalFieldsObj = this.collectAdditionalFields(); + hasAdditionalFields = Object.keys(additionalFieldsObj).length > 0; + + // Show/hide additional fields preview + const additionalFieldsPreview = document.getElementById('summary-additional-fields-preview'); + if (hasAdditionalFields) { + let previewContent = ''; + if (typeof additionalFieldsObj === 'object' && additionalFieldsObj !== null) { + previewContent = JSON.stringify(additionalFieldsObj, null, 2); + } else { + previewContent = ''; + } + document.getElementById('summary-additional-fields-content').textContent = previewContent; + additionalFieldsPreview.style.display = ''; + } else { + additionalFieldsPreview.style.display = 'none'; + } + } else { + // Hide additional fields for SQL types + const additionalFieldsPreview = document.getElementById('summary-additional-fields-preview'); + if (additionalFieldsPreview) additionalFieldsPreview.style.display = 'none'; + hasAdditionalFields = false; + } // Update has metadata/additional fields indicators document.getElementById('summary-has-metadata').textContent = hasMetadata ? 
'Yes' : 'No'; @@ -2140,21 +2291,6 @@ export class PluginModalStepper { metadataPreview.style.display = 'none'; } - // Show/hide additional fields preview - const additionalFieldsPreview = document.getElementById('summary-additional-fields-preview'); - if (hasAdditionalFields) { - let previewContent = ''; - if (typeof additionalFieldsObj === 'object' && additionalFieldsObj !== null) { - previewContent = JSON.stringify(additionalFieldsObj, null, 2); - } else { - previewContent = ''; - } - document.getElementById('summary-additional-fields-content').textContent = previewContent; - additionalFieldsPreview.style.display = ''; - } else { - additionalFieldsPreview.style.display = 'none'; - } - // Show advanced section if there's any advanced content if (hasMetadata || hasAdditionalFields) { advancedSection.style.display = ''; diff --git a/application/single_app/static/js/public/public_workspace.js b/application/single_app/static/js/public/public_workspace.js index 995fb51c..fb1133a7 100644 --- a/application/single_app/static/js/public/public_workspace.js +++ b/application/single_app/static/js/public/public_workspace.js @@ -199,8 +199,6 @@ document.addEventListener('DOMContentLoaded', ()=>{ if (activePublicId) fetchPublicDocs(); }); - Array.from(publicDropdownItems.children).forEach(()=>{}); // placeholder - // --- Document selection event listeners --- // Event delegation for document checkboxes document.addEventListener('change', function(event) { @@ -266,8 +264,7 @@ function updatePublicRoleDisplay(){ if (nameRoleEl) nameRoleEl.textContent = activePublicName; if (display) display.style.display = 'block'; if (uploadSection) uploadSection.style.display = ['Owner','Admin','DocumentManager'].includes(userRoleInActivePublic) ? 
'block' : 'none'; - // uploadHr was removed from template, so skip - + // Control visibility of Settings tab (only for Owners and Admins) const settingsTabNav = document.getElementById('public-settings-tab-nav'); const canManageSettings = ['Owner', 'Admin'].includes(userRoleInActivePublic); @@ -491,6 +488,7 @@ function renderPublicDocumentRow(doc) {

    Citations: ${getCitationBadge(doc.enhanced_citations)}

    Publication Date: ${escapeHtml(doc.publication_date || 'N/A')}

    Keywords: ${escapeHtml(doc.keywords || 'N/A')}

    +

    Tags: ${renderPublicTagBadges(doc.tags || [])}

    Abstract: ${escapeHtml(doc.abstract || 'N/A')}


    @@ -1708,7 +1706,7 @@ window.loadPublicWorkspaceTags = loadPublicWorkspaceTags; function isPublicColorLight(hex) { if (!hex) return true; hex = hex.replace('#', ''); - const r = parseInt(hex.substr(0,2),16), g = parseInt(hex.substr(2,2),16), b = parseInt(hex.substr(4,2),16); + const r = parseInt(hex.substring(0, 2), 16), g = parseInt(hex.substring(2, 4), 16), b = parseInt(hex.substring(4, 6), 16); return (r * 299 + g * 587 + b * 114) / 1000 > 155; } @@ -1718,6 +1716,29 @@ function escapePublicHtml(text) { return d.innerHTML; } +function renderPublicTagBadges(tags, maxDisplay = 3) { + if (!Array.isArray(tags) || tags.length === 0) { + return 'No tags'; + } + + let html = ''; + const displayTags = tags.slice(0, maxDisplay); + + displayTags.forEach(tagName => { + const tag = publicWorkspaceTags.find(t => t.name === tagName); + const color = tag && tag.color ? tag.color : '#6c757d'; + const textClass = isPublicColorLight(color) ? 'text-dark' : 'text-light'; + + html += `${escapePublicHtml(tagName)}`; + }); + + if (tags.length > maxDisplay) { + html += `+${tags.length - maxDisplay}`; + } + + return html; +} + // --- Tag Management Modal --- function showPublicTagManagementModal() { loadPublicWorkspaceTags().then(() => { diff --git a/application/single_app/static/js/workspace/group_agents.js b/application/single_app/static/js/workspace/group_agents.js index f97dbd07..608f029e 100644 --- a/application/single_app/static/js/workspace/group_agents.js +++ b/application/single_app/static/js/workspace/group_agents.js @@ -4,16 +4,23 @@ import { showToast } from "../chat/chat-toast.js"; import * as agentsCommon from "../agents_common.js"; import { AgentModalStepper } from "../agent_modal_stepper.js"; +import { + humanizeName, truncateDescription, escapeHtml as escapeHtmlUtil, + setupViewToggle, switchViewContainers, openViewModal, createAgentCard +} from './view-utils.js'; const tableBody = document.getElementById("group-agents-table-body"); const errorContainer = 
document.getElementById("group-agents-error"); const searchInput = document.getElementById("group-agents-search"); const createButton = document.getElementById("create-group-agent-btn"); const permissionWarning = document.getElementById("group-agents-permission-warning"); +const agentsListView = document.getElementById("group-agents-list-view"); +const agentsGridView = document.getElementById("group-agents-grid-view"); let agents = []; let filteredAgents = []; let agentStepper = null; +let currentViewMode = 'list'; let currentContext = window.groupWorkspaceContext || { activeGroupId: null, activeGroupName: "", @@ -21,14 +28,7 @@ let currentContext = window.groupWorkspaceContext || { }; function escapeHtml(value) { - if (!value) return ""; - return value.replace(/[&<>"']/g, (char) => ({ - "&": "&", - "<": "<", - ">": ">", - '"': """, - "'": "'" - }[char] || char)); + return escapeHtmlUtil(value); } function canManageAgents() { @@ -46,6 +46,7 @@ function groupAllowsModifications() { } function truncateName(name, maxLength = 18) { + // Kept for backward compat; prefer humanizeName for display if (!name || name.length <= maxLength) return name || ""; return `${name.substring(0, maxLength)}…`; } @@ -114,29 +115,61 @@ function renderAgentsTable(list) { list.forEach((agent) => { const tr = document.createElement("tr"); - const displayName = truncateName(agent.display_name || agent.displayName || agent.name || ""); - const description = escapeHtml(agent.description || "No description available."); - - let actionsHtml = ""; + const rawName = agent.display_name || agent.displayName || agent.name || ""; + const displayName = humanizeName(rawName); + const fullDesc = agent.description || "No description available."; + const shortDesc = truncateDescription(fullDesc, 90); + + let actionsHtml = ` + + `; if (canManage) { - actionsHtml = ` - - `; } tr.innerHTML = ` - ${escapeHtml(displayName)} - ${description} + ${escapeHtml(displayName)} + ${escapeHtml(shortDesc)} ${actionsHtml}`; 
tableBody.appendChild(tr); }); } +function renderAgentsGrid(list) { + if (!agentsGridView) return; + agentsGridView.innerHTML = ''; + + if (!list.length) { + agentsGridView.innerHTML = '
    No group agents found.
    '; + return; + } + + const canManage = canManageAgents() && groupAllowsModifications(); + list.forEach(agent => { + const col = createAgentCard(agent, { + onChat: a => chatWithGroupAgent(a.name || a), + onView: a => openGroupAgentViewModal(a), + onEdit: canManage ? a => { + const found = agents.find(x => x.id === (a.id || a.name || a) || x.name === (a.name || a)); + openAgentModal(found || null); + } : null, + onDelete: canManage ? a => deleteGroupAgent(a.id || a.name || a) : null + }); + agentsGridView.appendChild(col); + }); +} + function filterAgents(term) { if (!term) { filteredAgents = agents.slice(); @@ -149,6 +182,23 @@ function filterAgents(term) { }); } renderAgentsTable(filteredAgents); + renderAgentsGrid(filteredAgents); +} + +// Open the view modal for a group agent with Chat/Edit/Delete actions +function openGroupAgentViewModal(agent) { + const canManage = canManageAgents() && groupAllowsModifications(); + const callbacks = { + onChat: (a) => chatWithGroupAgent(a.name) + }; + if (canManage) { + callbacks.onEdit = (a) => { + const found = agents.find(x => x.id === a.id || x.name === a.name); + openAgentModal(found || a); + }; + callbacks.onDelete = (a) => deleteGroupAgent(a.id || a.name); + } + openViewModal(agent, 'agent', callbacks); } function overrideAgentStepper(stepper) { @@ -343,7 +393,57 @@ async function fetchGroupAgents() { } } +async function chatWithGroupAgent(agentName) { + try { + const agent = agents.find(a => a.name === agentName); + if (!agent) { + throw new Error("Agent not found"); + } + + const payloadData = { + selected_agent: { + name: agentName, + display_name: agent.display_name || agent.displayName || agentName, + is_global: !!agent.is_global, + is_group: true, + group_id: currentContext.activeGroupId, + group_name: currentContext.activeGroupName + } + }; + + const resp = await fetch("/api/user/settings/selected_agent", { + method: "POST", + headers: { "Content-Type": "application/json" }, + body: JSON.stringify(payloadData) 
+ }); + + if (!resp.ok) { + throw new Error("Failed to select agent"); + } + + window.location.href = "/chats"; + } catch (err) { + console.error("Error selecting group agent for chat:", err); + showToast("Error selecting agent for chat. Please try again.", "danger"); + } +} + function handleTableClick(event) { + const viewBtn = event.target.closest(".view-group-agent-btn"); + if (viewBtn) { + const agentName = viewBtn.dataset.agentName; + const agent = agents.find(a => a.name === agentName); + if (agent) openGroupAgentViewModal(agent); + return; + } + + const chatBtn = event.target.closest(".chat-group-agent-btn"); + if (chatBtn) { + const agentName = chatBtn.dataset.agentName; + chatWithGroupAgent(agentName); + return; + } + const editBtn = event.target.closest(".edit-group-agent-btn"); if (editBtn) { const agentId = editBtn.dataset.agentId; @@ -384,6 +484,11 @@ function initialize() { updatePermissionUI(); bindEventHandlers(); + setupViewToggle('groupAgents', 'groupAgentsViewPreference', (mode) => { + currentViewMode = mode; + switchViewContainers(mode, agentsListView, agentsGridView); + }); + if (document.getElementById("group-agents-tab-btn")?.classList.contains("active")) { fetchGroupAgents(); } diff --git a/application/single_app/static/js/workspace/group_plugins.js b/application/single_app/static/js/workspace/group_plugins.js index 60a7f42e..8acdf5bd 100644 --- a/application/single_app/static/js/workspace/group_plugins.js +++ b/application/single_app/static/js/workspace/group_plugins.js @@ -3,6 +3,10 @@ import { ensurePluginsTableInRoot, validatePluginManifest } from "../plugin_common.js"; import { showToast } from "../chat/chat-toast.js"; +import { + humanizeName, truncateDescription, escapeHtml as escapeHtmlUtil, + setupViewToggle, switchViewContainers, openViewModal, createActionCard +} from './view-utils.js'; const root = document.getElementById("group-plugins-root"); const permissionWarning = document.getElementById("group-plugins-permission-warning"); 
@@ -11,6 +15,7 @@ let plugins = []; let filteredPlugins = []; let templateReady = false; let listenersBound = false; +let currentViewMode = 'list'; let currentContext = window.groupWorkspaceContext || { activeGroupId: null, activeGroupName: "", @@ -18,14 +23,7 @@ let currentContext = window.groupWorkspaceContext || { }; function escapeHtml(value) { - if (!value) return ""; - return value.replace(/[&<>"']/g, (char) => ({ - "&": "&", - "<": "<", - ">": ">", - '"': """, - "'": "'" - }[char] || char)); + return escapeHtmlUtil(value); } function canManagePlugins() { @@ -66,6 +64,14 @@ function bindRootEvents() { }); root.addEventListener("click", async (event) => { + const viewBtn = event.target.closest(".view-group-plugin-btn"); + if (viewBtn) { + const pluginId = viewBtn.dataset.pluginId; + const plugin = plugins.find(x => x.id === pluginId || x.name === pluginId); + if (plugin) openGroupPluginViewModal(plugin); + return; + } + const createBtn = event.target.closest("#create-group-plugin-btn"); if (createBtn) { event.preventDefault(); @@ -148,23 +154,28 @@ function renderPluginsTable(list) { const canManage = canManagePlugins() && groupAllowsModifications(); list.forEach((plugin) => { const tr = document.createElement("tr"); - const displayName = plugin.displayName || plugin.display_name || plugin.name || ""; - const description = plugin.description || "No description available."; + const rawName = plugin.displayName || plugin.display_name || plugin.name || ""; + const displayName = humanizeName(rawName); + const fullDesc = plugin.description || "No description available."; + const shortDesc = truncateDescription(fullDesc, 90); const isGlobal = Boolean(plugin.is_global); - let actionsHtml = ""; + // View button always visible + let actionsHtml = ` + `; + if (canManage && !isGlobal) { - actionsHtml = ` -
    - - -
    `; + actionsHtml += ` + + `; } else if (canManage && isGlobal) { - actionsHtml = "Managed globally"; + actionsHtml += `Managed globally`; } const titleHtml = isGlobal @@ -172,14 +183,36 @@ function renderPluginsTable(list) { : escapeHtml(displayName); tr.innerHTML = ` - ${titleHtml} - ${escapeHtml(description)} + ${titleHtml} + ${escapeHtml(shortDesc)} ${actionsHtml}`; tbody.appendChild(tr); }); } +function renderPluginsGrid(list) { + const gridView = document.getElementById('group-plugins-grid-view'); + if (!gridView) return; + gridView.innerHTML = ''; + + if (!list.length) { + gridView.innerHTML = '
    No group actions found.
    '; + return; + } + + const canManage = canManagePlugins() && groupAllowsModifications(); + list.forEach(plugin => { + const isGlobal = Boolean(plugin.is_global); + const col = createActionCard(plugin, { + onView: p => openGroupPluginViewModal(p), + onEdit: (canManage && !isGlobal) ? p => openPluginModal(p.id || p.name) : null, + onDelete: (canManage && !isGlobal) ? p => deleteGroupPlugin(p.id || p.name) : null + }); + gridView.appendChild(col); + }); +} + function filterPlugins(term) { if (!term) { filteredPlugins = plugins.slice(); @@ -192,6 +225,19 @@ function filterPlugins(term) { }); } renderPluginsTable(filteredPlugins); + renderPluginsGrid(filteredPlugins); +} + +// Open the view modal for a group action with Edit/Delete actions +function openGroupPluginViewModal(plugin) { + const canManage = canManagePlugins() && groupAllowsModifications(); + const isGlobal = Boolean(plugin.is_global); + const callbacks = {}; + if (canManage && !isGlobal) { + callbacks.onEdit = (p) => openPluginModal(p.id || p.name); + callbacks.onDelete = (p) => deleteGroupPlugin(p.id || p.name); + } + openViewModal(plugin, 'action', callbacks); } async function fetchGroupPlugins() { @@ -220,7 +266,17 @@ async function fetchGroupPlugins() { filteredPlugins = plugins.slice(); renderPluginsTable(filteredPlugins); + renderPluginsGrid(filteredPlugins); updatePermissionUI(); + + // Set up view toggle (only once after template is in DOM) + setupViewToggle('groupPlugins', 'groupPluginsViewPreference', (mode) => { + currentViewMode = mode; + switchViewContainers(mode, + document.getElementById('group-plugins-list-view'), + document.getElementById('group-plugins-grid-view') + ); + }); } catch (error) { console.error("Error loading group actions:", error); renderError(error.message || "Unable to load group actions."); diff --git a/application/single_app/static/js/workspace/view-utils.js b/application/single_app/static/js/workspace/view-utils.js new file mode 100644 index 00000000..3b78bc15 --- 
/dev/null +++ b/application/single_app/static/js/workspace/view-utils.js @@ -0,0 +1,523 @@ +// view-utils.js +// Shared utilities for list/grid view toggle, name humanization, and view modal +// Used by personal and group agents/actions workspace modules + +/** + * Convert a technical name to a human-readable display name. + * Handles underscores, camelCase, PascalCase, and consecutive uppercase. + * Examples: + * "sql_query" → "Sql Query" + * "myAgentName" → "My Agent Name" + * "OpenAPIPlugin" → "Open API Plugin" + * "log_analytics" → "Log Analytics" + */ +export function humanizeName(name) { + if (!name) return ""; + // Replace underscores and hyphens with spaces + let result = name.replace(/[_-]/g, " "); + // Insert space before uppercase letters that follow lowercase letters (camelCase) + result = result.replace(/([a-z])([A-Z])/g, "$1 $2"); + // Insert space between consecutive uppercase followed by lowercase (e.g., "APIPlugin" → "API Plugin") + result = result.replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2"); + // Capitalize first letter of each word + result = result.replace(/\b\w/g, (c) => c.toUpperCase()); + // Collapse multiple spaces + result = result.replace(/\s+/g, " ").trim(); + return result; +} + +/** + * Truncate a description string to maxLen characters, appending "…" if truncated. + */ +export function truncateDescription(text, maxLen = 100) { + if (!text) return ""; + if (text.length <= maxLen) return text; + return text.substring(0, maxLen).trimEnd() + "…"; +} + +/** + * Escape HTML entities to prevent XSS. + */ +export function escapeHtml(str) { + if (!str) return ""; + return str.replace(/[&<>"']/g, (c) => + ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" }[c]) + ); +} + +/** + * Get an appropriate Bootstrap icon class for an action/plugin type. 
+ */ +export function getTypeIcon(type) { + if (!type) return "bi-lightning-charge"; + const t = type.toLowerCase(); + if (t.includes("sql")) return "bi-database"; + if (t.includes("openapi")) return "bi-globe"; + if (t.includes("log_analytics")) return "bi-graph-up"; + if (t.includes("msgraph")) return "bi-microsoft"; + if (t.includes("databricks")) return "bi-bricks"; + if (t.includes("http") || t.includes("smart_http")) return "bi-cloud-arrow-up"; + if (t.includes("azure_function")) return "bi-lightning"; + if (t.includes("blob")) return "bi-file-earmark"; + if (t.includes("queue")) return "bi-inbox"; + if (t.includes("embedding")) return "bi-vector-pen"; + if (t.includes("fact_memory")) return "bi-brain"; + if (t.includes("math")) return "bi-calculator"; + if (t.includes("text")) return "bi-fonts"; + if (t.includes("time")) return "bi-clock"; + return "bi-lightning-charge"; +} + +/** + * Create the HTML string for a list/grid view toggle button group. + * @param {string} prefix - Unique prefix for element IDs (e.g., "agents", "plugins", "group-agents") + * @returns {string} HTML string + */ +export function createViewToggleHtml(prefix) { + return ` +
    + + + + +
    `; +} + +/** + * Set up view toggle event listeners and restore saved preference. + * @param {string} prefix - Unique prefix matching createViewToggleHtml + * @param {string} storageKey - localStorage key for persistence + * @param {function} onSwitch - Callback receiving 'list' or 'grid' + */ +export function setupViewToggle(prefix, storageKey, onSwitch) { + const listRadio = document.getElementById(`${prefix}-view-list`); + const gridRadio = document.getElementById(`${prefix}-view-grid`); + if (!listRadio || !gridRadio) return; + + listRadio.addEventListener("change", () => { + if (listRadio.checked) { + localStorage.setItem(storageKey, "list"); + onSwitch("list"); + } + }); + + gridRadio.addEventListener("change", () => { + if (gridRadio.checked) { + localStorage.setItem(storageKey, "grid"); + onSwitch("grid"); + } + }); + + // Restore saved preference + const saved = localStorage.getItem(storageKey); + if (saved === "grid") { + gridRadio.checked = true; + listRadio.checked = false; + onSwitch("grid"); + } else { + onSwitch("list"); + } +} + +/** + * Toggle visibility of list and grid containers. + * @param {string} mode - 'list' or 'grid' + * @param {HTMLElement} listContainer - The list/table container element + * @param {HTMLElement} gridContainer - The grid container element + */ +export function switchViewContainers(mode, listContainer, gridContainer) { + if (listContainer) { + listContainer.classList.toggle("d-none", mode !== "list"); + } + if (gridContainer) { + gridContainer.classList.toggle("d-none", mode !== "grid"); + } +} + +// ============================================================================ +// VIEW MODAL — Lightweight read-only detail view +// ============================================================================ + +/** + * Open a read-only view modal for an agent or action. 
+ * @param {object} item - The agent or action data object + * @param {'agent'|'action'} type - What kind of item this is + * @param {object} [callbacks] - Optional action callbacks { onChat, onEdit, onDelete } + */ +export function openViewModal(item, type, callbacks = {}) { + const modalEl = document.getElementById("item-view-modal"); + if (!modalEl) return; + + const titleEl = modalEl.querySelector(".modal-title"); + const bodyEl = modalEl.querySelector(".modal-body"); + const footerEl = modalEl.querySelector(".modal-footer"); + if (!titleEl || !bodyEl || !footerEl) return; + + if (type === "agent") { + titleEl.textContent = "Agent Details"; + bodyEl.innerHTML = buildAgentViewHtml(item); + } else { + titleEl.textContent = "Action Details"; + bodyEl.innerHTML = buildActionViewHtml(item); + } + + // Build footer buttons dynamically + footerEl.innerHTML = ''; + const { onChat, onEdit, onDelete } = callbacks; + + if (onChat && typeof onChat === 'function') { + const chatBtn = document.createElement('button'); + chatBtn.type = 'button'; + chatBtn.className = 'btn btn-primary'; + chatBtn.innerHTML = 'Chat'; + chatBtn.addEventListener('click', () => { + bootstrap.Modal.getInstance(modalEl)?.hide(); + onChat(item); + }); + footerEl.appendChild(chatBtn); + } + + if (onEdit && typeof onEdit === 'function') { + const editBtn = document.createElement('button'); + editBtn.type = 'button'; + editBtn.className = 'btn btn-outline-secondary'; + editBtn.innerHTML = 'Edit'; + editBtn.addEventListener('click', () => { + bootstrap.Modal.getInstance(modalEl)?.hide(); + onEdit(item); + }); + footerEl.appendChild(editBtn); + } + + if (onDelete && typeof onDelete === 'function') { + const delBtn = document.createElement('button'); + delBtn.type = 'button'; + delBtn.className = 'btn btn-outline-danger'; + delBtn.innerHTML = 'Delete'; + delBtn.addEventListener('click', () => { + bootstrap.Modal.getInstance(modalEl)?.hide(); + onDelete(item); + }); + footerEl.appendChild(delBtn); + } + + 
const closeBtn = document.createElement('button'); + closeBtn.type = 'button'; + closeBtn.className = 'btn btn-secondary'; + closeBtn.textContent = 'Close'; + closeBtn.setAttribute('data-bs-dismiss', 'modal'); + footerEl.appendChild(closeBtn); + + const modal = new bootstrap.Modal(modalEl); + modal.show(); +} + +function buildAgentViewHtml(agent) { + const displayName = escapeHtml(agent.display_name || agent.displayName || agent.name || ""); + const name = escapeHtml(agent.name || ""); + const description = escapeHtml(agent.description || "No description available."); + const model = escapeHtml(agent.azure_openai_gpt_deployment || agent.model || "Default"); + const agentType = agent.agent_type === "aifoundry" ? "Azure AI Foundry" : "Local (Semantic Kernel)"; + const rawInstructions = agent.instructions || "No instructions defined."; + // Render instructions as Markdown (marked + DOMPurify are loaded globally in base.html) + const renderedInstructions = (typeof marked !== 'undefined' && typeof DOMPurify !== 'undefined') + ? DOMPurify.sanitize(marked.parse(rawInstructions)) + : escapeHtml(rawInstructions); + const isGlobal = agent.is_global; + const scopeBadge = isGlobal + ? 'Global' + : 'Personal'; + + return ` +
    +
    + Basic Information +
    +
    +
    +
    + + ${displayName} +
    +
    + + ${name} +
    +
    + + ${scopeBadge} +
    +
    + + ${escapeHtml(agentType)} +
    +
    + + ${description} +
    +
    +
    +
    +
    +
    + Model Configuration +
    +
    +
    +
    + + ${model} +
    +
    +
    +
    +
    +
    + Instructions +
    +
    +
    +${renderedInstructions} +
    +
    +
    `; +} + +function buildActionViewHtml(action) { + const displayName = escapeHtml(action.display_name || action.displayName || action.name || ""); + const name = escapeHtml(action.name || ""); + const description = escapeHtml(action.description || "No description available."); + const type = escapeHtml(action.type || "unknown"); + const typeIcon = getTypeIcon(action.type); + const authType = escapeHtml(formatAuthType(action.auth?.type || action.auth_type || "")); + const endpoint = escapeHtml(action.endpoint || action.base_url || ""); + const isGlobal = action.is_global; + const scopeBadge = isGlobal + ? 'Global' + : 'Personal'; + + let configHtml = ""; + if (endpoint) { + configHtml = ` +
    +
    + Configuration +
    +
    +
    +
    + + ${endpoint} +
    +
    + + ${authType || "None"} +
    +
    +
    +
    `; + } + + return ` +
    +
    + Basic Information +
    +
    +
    +
    + + ${displayName} +
    +
    + + ${name} +
    +
    + + ${humanizeName(type)} +
    +
    + + ${scopeBadge} +
    +
    + + ${description} +
    +
    +
    +
    + ${configHtml}`; +} + +function formatAuthType(type) { + if (!type) return ""; + const map = { + "key": "API Key", + "identity": "Managed Identity", + "user": "User (Delegated)", + "servicePrincipal": "Service Principal", + "connection_string": "Connection String", + "basic": "Basic Auth", + "username_password": "Username / Password", + "NoAuth": "No Authentication" + }; + return map[type] || type; +} + +// ============================================================================ +// GRID CARD RENDERERS +// ============================================================================ + +/** + * Create a grid card element for an agent. + * @param {object} agent - Agent data object + * @param {object} options - { onChat, onView, onEdit, onDelete, canManage, isGroup } + * @returns {HTMLElement} + */ +export function createAgentCard(agent, options = {}) { + const { onChat, onView, onEdit, onDelete, canManage = false, isGroup = false } = options; + const col = document.createElement("div"); + col.className = "col-sm-6 col-md-4 col-lg-3"; + + const displayName = humanizeName(agent.display_name || agent.displayName || agent.name || ""); + const description = agent.description || "No description available."; + const isGlobal = agent.is_global; + + let badgeHtml = ""; + if (isGlobal) { + badgeHtml = 'Global'; + } + + let buttonsHtml = ` + + `; + + if (canManage && !isGlobal) { + buttonsHtml += ` + + `; + } + + col.innerHTML = ` +
    +
    +
    + +
    +
    ${escapeHtml(displayName)}${badgeHtml}
    +

    ${escapeHtml(truncateDescription(description, 120))}

    +
    + ${buttonsHtml} +
    +
    +
    `; + + // Bind button events + const chatBtn = col.querySelector(".item-card-chat-btn"); + const viewBtn = col.querySelector(".item-card-view-btn"); + const editBtn = col.querySelector(".item-card-edit-btn"); + const deleteBtn = col.querySelector(".item-card-delete-btn"); + + if (chatBtn && onChat) chatBtn.addEventListener("click", (e) => { e.stopPropagation(); onChat(agent); }); + if (viewBtn && onView) viewBtn.addEventListener("click", (e) => { e.stopPropagation(); onView(agent); }); + if (editBtn && onEdit) editBtn.addEventListener("click", (e) => { e.stopPropagation(); onEdit(agent); }); + if (deleteBtn && onDelete) deleteBtn.addEventListener("click", (e) => { e.stopPropagation(); onDelete(agent); }); + + // Clicking anywhere on the card opens the detail view + const cardEl = col.querySelector(".item-card"); + if (cardEl && onView) { + cardEl.style.cursor = "pointer"; + cardEl.addEventListener("click", () => onView(agent)); + } + + return col; +} + +/** + * Create a grid card element for an action/plugin. + * @param {object} plugin - Action/plugin data object + * @param {object} options - { onView, onEdit, onDelete, canManage, isAdmin } + * @returns {HTMLElement} + */ +export function createActionCard(plugin, options = {}) { + const { onView, onEdit, onDelete, canManage = true, isAdmin = false } = options; + const col = document.createElement("div"); + col.className = "col-sm-6 col-md-4 col-lg-3"; + + const displayName = humanizeName(plugin.display_name || plugin.displayName || plugin.name || ""); + const description = plugin.description || "No description available."; + const type = plugin.type || ""; + const typeIcon = getTypeIcon(type); + const isGlobal = plugin.is_global; + + let badgeHtml = ""; + if (isGlobal) { + badgeHtml = 'Global'; + } + + const typeBadge = type + ? `${escapeHtml(humanizeName(type))}` + : ""; + + let buttonsHtml = ` + `; + + if ((isAdmin || (canManage && !isGlobal))) { + buttonsHtml += ` + + `; + } + + col.innerHTML = ` +
    +
    +
    + +
    +
    ${escapeHtml(displayName)}${badgeHtml}
    +
    ${typeBadge}
    +

    ${escapeHtml(truncateDescription(description, 120))}

    +
    + ${buttonsHtml} +
    +
    +
    `; + + // Bind button events + const viewBtn = col.querySelector(".item-card-view-btn"); + const editBtn = col.querySelector(".item-card-edit-btn"); + const deleteBtn = col.querySelector(".item-card-delete-btn"); + + if (viewBtn && onView) viewBtn.addEventListener("click", (e) => { e.stopPropagation(); onView(plugin); }); + if (editBtn && onEdit) editBtn.addEventListener("click", (e) => { e.stopPropagation(); onEdit(plugin); }); + if (deleteBtn && onDelete) deleteBtn.addEventListener("click", (e) => { e.stopPropagation(); onDelete(plugin); }); + + // Clicking anywhere on the card opens the detail view + const cardEl = col.querySelector(".item-card"); + if (cardEl && onView) { + cardEl.style.cursor = "pointer"; + cardEl.addEventListener("click", () => onView(plugin)); + } + + return col; +} diff --git a/application/single_app/static/js/workspace/workspace_agents.js b/application/single_app/static/js/workspace/workspace_agents.js index a0839b25..623be234 100644 --- a/application/single_app/static/js/workspace/workspace_agents.js +++ b/application/single_app/static/js/workspace/workspace_agents.js @@ -4,14 +4,22 @@ import { showToast } from "../chat/chat-toast.js"; import * as agentsCommon from '../agents_common.js'; import { AgentModalStepper } from '../agent_modal_stepper.js'; +import { + humanizeName, truncateDescription, escapeHtml, + setupViewToggle, switchViewContainers, + openViewModal, createAgentCard +} from './view-utils.js'; // --- DOM Elements & Globals --- const agentsTbody = document.getElementById('agents-table-body'); const agentsErrorDiv = document.getElementById('workspace-agents-error'); const createAgentBtn = document.getElementById('create-agent-btn'); const agentsSearchInput = document.getElementById('agents-search'); +const agentsListView = document.getElementById('agents-list-view'); +const agentsGridView = document.getElementById('agents-grid-view'); let agents = []; let filteredAgents = []; +let currentViewMode = 'list'; // --- Function 
Definitions --- @@ -43,104 +51,87 @@ function filterAgents(searchTerm) { }); } renderAgentsTable(filteredAgents); + renderAgentsGrid(filteredAgents); } -// --- Helper Functions --- - -function truncateDisplayName(displayName, maxLength = 12) { - if (!displayName || displayName.length <= maxLength) { - return displayName; +// Open the view modal for an agent with Chat/Edit/Delete actions in the footer +function openAgentViewModal(agent) { + const callbacks = { + onChat: (a) => chatWithAgent(a.name), + onDelete: !agent.is_global ? (a) => { if (confirm(`Delete agent '${a.name}'?`)) deleteAgent(a.name); } : null + }; + if (!agent.is_global) { + callbacks.onEdit = (a) => openAgentModal(a); } - return displayName.substring(0, maxLength) + '...'; + openViewModal(agent, 'agent', callbacks); } +// --- Rendering Functions --- function renderAgentsTable(agentsList) { if (!agentsTbody) return; agentsTbody.innerHTML = ''; if (!agentsList.length) { const tr = document.createElement('tr'); - tr.innerHTML = 'No agents found.'; + tr.innerHTML = 'No agents found.'; agentsTbody.appendChild(tr); return; } - // Fetch selected_agent from user settings (async) - fetch('/api/user/settings').then(res => { - if (!res.ok) throw new Error('Failed to load user settings'); - return res.json(); - }).then(settings => { - let selectedAgentObj = settings.selected_agent; - if (!selectedAgentObj && settings.settings && settings.settings.selected_agent) { - selectedAgentObj = settings.settings.selected_agent; - } - let selectedAgentName = typeof selectedAgentObj === 'object' ? selectedAgentObj.name : selectedAgentObj; - agentsTbody.innerHTML = ''; - for (const agent of agentsList) { - const tr = document.createElement('tr'); - - // Create action buttons - let actionButtons = ``; - - if (!agent.is_global) { - actionButtons += ` - - - `; - } - - const truncatedDisplayName = truncateDisplayName(agent.display_name || agent.name || ''); - - tr.innerHTML = ` - - ${truncatedDisplayName} - ${agent.is_global ? 
' Global' : ''} - - ${agent.description || 'No description available'} - ${actionButtons} - `; - agentsTbody.appendChild(tr); - } - }).catch(e => { - renderError('Could not load agent settings: ' + e.message); - // Fallback: render table without settings - agentsTbody.innerHTML = ''; - for (const agent of agentsList) { - const tr = document.createElement('tr'); - - // Create action buttons - let actionButtons = ` + `; - - if (!agent.is_global) { - actionButtons += ` - - - `; - } - - const truncatedDisplayName = truncateDisplayName(agent.display_name || agent.name || ''); - - tr.innerHTML = ` - - ${truncatedDisplayName} - ${agent.is_global ? ' Global' : ''} - - ${agent.description || 'No description available'} - ${actionButtons} - `; - agentsTbody.appendChild(tr); + + if (!isGlobal) { + actionButtons += ` + + `; } - }); + + tr.innerHTML = ` + + ${escapeHtml(displayName)} + ${isGlobal ? ' Global' : ''} + + ${escapeHtml(truncatedDesc)} + ${actionButtons} + `; + agentsTbody.appendChild(tr); + } +} + +function renderAgentsGrid(agentsList) { + if (!agentsGridView) return; + agentsGridView.innerHTML = ''; + if (!agentsList.length) { + agentsGridView.innerHTML = '
    No agents found.
    '; + return; + } + + for (const agent of agentsList) { + const card = createAgentCard(agent, { + onChat: (a) => chatWithAgent(a.name), + onView: (a) => openAgentViewModal(a), + onEdit: (a) => openAgentModal(a), + onDelete: (a) => { if (confirm(`Delete agent '${a.name}'?`)) deleteAgent(a.name); }, + canManage: !agent.is_global + }); + agentsGridView.appendChild(card); + } } async function fetchAgents() { @@ -151,6 +142,7 @@ async function fetchAgents() { agents = await res.json(); filteredAgents = agents; // Initialize filtered list renderAgentsTable(filteredAgents); + renderAgentsGrid(filteredAgents); } catch (e) { renderError(e.message); } @@ -177,17 +169,14 @@ function attachAgentTableEvents() { } agentsTbody.addEventListener('click', function (e) { - console.log('Agent table clicked, target:', e.target); - // Find the button element (could be the target or a parent) const editBtn = e.target.closest('.edit-agent-btn'); const deleteBtn = e.target.closest('.delete-agent-btn'); const chatBtn = e.target.closest('.chat-agent-btn'); + const viewBtn = e.target.closest('.view-agent-btn'); if (editBtn) { - console.log('Edit agent button clicked, dataset:', editBtn.dataset); const agent = agents.find(a => a.name === editBtn.dataset.name); - console.log('Found agent:', agent); openAgentModal(agent); } @@ -201,33 +190,27 @@ function attachAgentTableEvents() { const agentName = chatBtn.dataset.name; chatWithAgent(agentName); } + + if (viewBtn) { + const agent = agents.find(a => a.name === viewBtn.dataset.name); + if (agent) openAgentViewModal(agent); + } }); } async function chatWithAgent(agentName) { try { - console.log('DEBUG: chatWithAgent called with agentName:', agentName); - console.log('DEBUG: Available agents:', agents); - - // Find the agent to get its is_global status const agent = agents.find(a => a.name === agentName); - console.log('DEBUG: Found agent:', agent); - if (!agent) { throw new Error('Agent not found'); } - console.log('DEBUG: Agent is_global 
flag:', agent.is_global); - console.log('DEBUG: !!agent.is_global:', !!agent.is_global); - - // Set the selected agent with proper is_global flag const payloadData = { selected_agent: { name: agentName, is_global: !!agent.is_global } }; - console.log('DEBUG: Sending payload:', payloadData); const resp = await fetch('/api/user/settings/selected_agent', { method: 'POST', @@ -239,9 +222,6 @@ async function chatWithAgent(agentName) { throw new Error('Failed to select agent'); } - console.log('DEBUG: Agent selection saved successfully'); - - // Navigate to chat page window.location.href = '/chats'; } catch (err) { console.error('Error selecting agent for chat:', err); @@ -353,6 +333,17 @@ async function deleteAgent(name) { function initializeWorkspaceAgentUI() { window.agentModalStepper = new AgentModalStepper(false); attachAgentTableEvents(); + + // Set up view toggle + setupViewToggle('agents', 'agentsViewPreference', (mode) => { + currentViewMode = mode; + switchViewContainers(mode, agentsListView, agentsGridView); + // Re-render grid if switching to grid and we have data + if (mode === 'grid' && filteredAgents.length) { + renderAgentsGrid(filteredAgents); + } + }); + fetchAgents(); } diff --git a/application/single_app/static/js/workspace/workspace_plugins.js b/application/single_app/static/js/workspace/workspace_plugins.js index 30fef0d5..8ed4f6b5 100644 --- a/application/single_app/static/js/workspace/workspace_plugins.js +++ b/application/single_app/static/js/workspace/workspace_plugins.js @@ -1,10 +1,14 @@ // workspace_plugins.js (refactored to use plugin_common.js and new multi-step modal) -import { renderPluginsTable, ensurePluginsTableInRoot, validatePluginManifest } from '../plugin_common.js'; +import { renderPluginsTable, renderPluginsGrid, ensurePluginsTableInRoot, validatePluginManifest } from '../plugin_common.js'; import { showToast } from "../chat/chat-toast.js" +import { + setupViewToggle, switchViewContainers, openViewModal +} from './view-utils.js'; 
const root = document.getElementById('workspace-plugins-root'); let plugins = []; let filteredPlugins = []; +let currentViewMode = 'list'; function renderLoading() { root.innerHTML = `
    Loading...
    `; @@ -14,6 +18,22 @@ function renderError(msg) { root.innerHTML = `
    ${msg}
    `; } +function getViewHandlers() { + return { + onEdit: name => openPluginModal(plugins.find(p => p.name === name)), + onDelete: name => deletePlugin(name), + onView: name => { + const plugin = plugins.find(p => p.name === name); + if (plugin) { + openViewModal(plugin, 'action', { + onEdit: (item) => openPluginModal(item), + onDelete: (item) => deletePlugin(item.name) + }); + } + } + }; +} + function filterPlugins(searchTerm) { if (!searchTerm || !searchTerm.trim()) { filteredPlugins = plugins; @@ -26,14 +46,18 @@ function filterPlugins(searchTerm) { }); } - // Ensure table template is in place ensurePluginsTableInRoot(); + const handlers = getViewHandlers(); renderPluginsTable({ plugins: filteredPlugins, tbodySelector: '#plugins-table-body', - onEdit: name => openPluginModal(plugins.find(p => p.name === name)), - onDelete: name => deletePlugin(name) + ...handlers + }); + renderPluginsGrid({ + plugins: filteredPlugins, + containerSelector: '#plugins-grid-view', + ...handlers }); } @@ -47,12 +71,26 @@ async function fetchPlugins() { // Ensure table template is in place ensurePluginsTableInRoot(); + const handlers = getViewHandlers(); renderPluginsTable({ plugins: filteredPlugins, tbodySelector: '#plugins-table-body', - onEdit: name => openPluginModal(plugins.find(p => p.name === name)), - onDelete: name => deletePlugin(name) + ...handlers + }); + renderPluginsGrid({ + plugins: filteredPlugins, + containerSelector: '#plugins-grid-view', + ...handlers + }); + + // Set up view toggle (only once after template is in DOM) + setupViewToggle('plugins', 'pluginsViewPreference', (mode) => { + currentViewMode = mode; + switchViewContainers(mode, + document.getElementById('plugins-list-view'), + document.getElementById('plugins-grid-view') + ); }); // Set up the create action button @@ -137,6 +175,8 @@ function setupSaveHandler(plugin, modal) { } async function savePlugin(pluginData, existingPlugin = null) { + const payload = existingPlugin?.id ? 
{ ...pluginData, id: existingPlugin.id } : { ...pluginData }; + // Get all plugins first const res = await fetch('/api/user/plugins'); @@ -145,11 +185,19 @@ async function savePlugin(pluginData, existingPlugin = null) { let plugins = await res.json(); // Update or add the plugin - const existingIndex = plugins.findIndex(p => p.name === pluginData.name); + const existingIndex = plugins.findIndex(p => { + if (payload.id && p.id === payload.id) { + return true; + } + if (existingPlugin?.name && p.name === existingPlugin.name) { + return true; + } + return p.name === payload.name; + }); if (existingIndex >= 0) { - plugins[existingIndex] = pluginData; + plugins[existingIndex] = payload; } else { - plugins.push(pluginData); + plugins.push(payload); } // Save back to server diff --git a/application/single_app/static/json/schemas/sql_query.definition.json b/application/single_app/static/json/schemas/sql_query.definition.json index d38a41a8..6903c22a 100644 --- a/application/single_app/static/json/schemas/sql_query.definition.json +++ b/application/single_app/static/json/schemas/sql_query.definition.json @@ -1,6 +1,9 @@ { "$schema": "./plugin.definition.schema.json", "allowedAuthTypes": [ + "user", + "identity", + "servicePrincipal", "connection_string" ] } diff --git a/application/single_app/static/json/schemas/sql_query_plugin.additional_settings.schema.json b/application/single_app/static/json/schemas/sql_query_plugin.additional_settings.schema.json index 9e4f6d34..f7f46ebd 100644 --- a/application/single_app/static/json/schemas/sql_query_plugin.additional_settings.schema.json +++ b/application/single_app/static/json/schemas/sql_query_plugin.additional_settings.schema.json @@ -3,13 +3,13 @@ "title": "SQL Query Plugin Additional Settings", "type": "object", "properties": { - "connection_string__Secret": { + "connection_string": { "type": "string", "description": "Database connection string. Required if server/database not provided." 
}, "database_type": { "type": "string", - "enum": ["sqlserver", "postgresql", "mysql", "sqlite", "azure_sql", "azuresql"], + "enum": ["sqlserver", "postgresql", "mysql", "sqlite", "azure_sql"], "description": "Type of database engine." }, "server": { @@ -24,7 +24,7 @@ "type": "string", "description": "Username for authentication." }, - "password__Secret": { + "password": { "type": "string", "description": "Password for authentication." }, @@ -50,6 +50,6 @@ "description": "Query timeout in seconds." } }, - "required": ["database_type", "database"], + "required": ["database_type"], "additionalProperties": false } diff --git a/application/single_app/static/json/schemas/sql_schema.definition.json b/application/single_app/static/json/schemas/sql_schema.definition.json index d38a41a8..6903c22a 100644 --- a/application/single_app/static/json/schemas/sql_schema.definition.json +++ b/application/single_app/static/json/schemas/sql_schema.definition.json @@ -1,6 +1,9 @@ { "$schema": "./plugin.definition.schema.json", "allowedAuthTypes": [ + "user", + "identity", + "servicePrincipal", "connection_string" ] } diff --git a/application/single_app/static/json/schemas/sql_schema_plugin.additional_settings.schema.json b/application/single_app/static/json/schemas/sql_schema_plugin.additional_settings.schema.json index e97c7b4b..29fb6b3f 100644 --- a/application/single_app/static/json/schemas/sql_schema_plugin.additional_settings.schema.json +++ b/application/single_app/static/json/schemas/sql_schema_plugin.additional_settings.schema.json @@ -3,13 +3,13 @@ "title": "SQL Schema Plugin Additional Settings", "type": "object", "properties": { - "connection_string__Secret": { + "connection_string": { "type": "string", "description": "Database connection string. Required if server/database not provided." 
}, "database_type": { "type": "string", - "enum": ["sqlserver", "postgresql", "mysql", "sqlite", "azure_sql", "azuresql"], + "enum": ["sqlserver", "postgresql", "mysql", "sqlite", "azure_sql"], "description": "Type of database engine." }, "server": { @@ -24,7 +24,7 @@ "type": "string", "description": "Username for authentication." }, - "password__Secret": { + "password": { "type": "string", "description": "Password for authentication." }, @@ -33,6 +33,6 @@ "description": "ODBC or DB driver name." } }, - "required": ["database_type", "database"], + "required": ["database_type"], "additionalProperties": false } diff --git a/application/single_app/templates/_agent_examples_modal.html b/application/single_app/templates/_agent_examples_modal.html index 52f95cdc..398e930c 100644 --- a/application/single_app/templates/_agent_examples_modal.html +++ b/application/single_app/templates/_agent_examples_modal.html @@ -92,7 +92,7 @@
    -
    
    +          
    @@ -427,7 +427,12 @@
    + + +
    +
    + +
    + + +
    +
    +
    + +
    +
    Advanced
    +

    Advanced settings are typically not required. Expand below if you need to customize metadata or additional fields.

    - - -
    Optional metadata for this action.
    +
    -
    - - -
    Additional configuration fields specific to this action type.
    +
    +
    + + +
    Optional metadata for this action.
    +
    +
    + + +
    Additional configuration fields specific to this action type.
    +
    @@ -777,6 +802,15 @@
     background-color: #f8f9fa;
 }
+/* Advanced toggle chevron animation */
+#plugin-advanced-toggle-icon {
+    transition: transform 0.3s ease;
+}
+#plugin-advanced-collapse.show ~ .mb-3 #plugin-advanced-toggle-icon,
+[aria-expanded="true"] #plugin-advanced-toggle-icon {
+    transform: rotate(180deg);
+}
+
 .sql-connection-config, .sql-auth-config {
     background-color: white;
diff --git a/application/single_app/templates/_sidebar_nav.html b/application/single_app/templates/_sidebar_nav.html
index a0bceee8..33a89b04 100644
--- a/application/single_app/templates/_sidebar_nav.html
+++ b/application/single_app/templates/_sidebar_nav.html
@@ -287,6 +287,11 @@
 GPT Configuration
+
    +
    + + + + Requires Enhanced Citations +
    @@ -1428,6 +1434,12 @@
    +
    + + Shown to signed-in users who lack the required roles. Use Enter for line breaks. + +
    @@ -1580,6 +1592,27 @@
    + +
    +
    + Processing Thoughts +
    +

    When enabled, real-time processing steps are shown to users during chat responses and persisted for later review.

    +
    + + + +
    +
    +
    @@ -2036,14 +2069,26 @@
    -
    - +
    +
    +
    + Enter the full Key Vault secret name. + Enable Key Vault for Agent and Action Secrets + must be enabled and configured. +
    +
    +
    Tabular Preview Limits
    +
    + + + + Maximum blob size (in MB) allowed for tabular file previews (CSV, XLSX). Files larger than this will not be previewed. + Increase for larger files if your compute has sufficient memory, or decrease to protect smaller instances. Default: 200 MB. + +
    +
    + -
    +
    {% if settings.enable_image_generation %}
    -
    - - - {% if app_settings.enable_text_to_speech %} - - {% endif %} - - - - - - - +
    +
    {% if settings.enable_user_workspace or settings.enable_group_workspaces %} - +
    -
    +
    - +
    + +
    +
    +
    + +
    + + + + + + {% if app_settings.enable_text_to_speech %} + + {% endif %} + +
    +
    @@ -457,7 +519,13 @@
    All
    - - - - - - - - - - - - -
    Display NameDescriptionActions
    -
    - Loading... -
    - Select a group to load agents. -
    +
    + + + + + + + + + + + + + +
    Display NameDescriptionActions
    +
    + Loading... +
    + Select a group to load agents. +
    +
    +
    @@ -813,33 +822,42 @@

    Group Workspace

    -
    +
    +
    + + + + +
    - - - - - - - - - - - - - -
    Display NameDescriptionActions
    -
    - Loading... -
    - Select a group to load actions. -
    +
    + + + + + + + + + + + + + +
    Display NameDescriptionActions
    +
    + Loading... +
    + Select a group to load actions. +
    +
    +
    @@ -851,6 +869,22 @@

    Group Workspace

    + + +
    - - - - - - - -
    Display NameDescriptionActions
    -
    Loading...
    - Loading agents... -
    + +
    + + + + + + + +
    Display NameDescriptionActions
    +
    Loading...
    + Loading agents... +
    +
    + +
    @@ -730,16 +741,27 @@

    Personal Workspace

    +
    + + + + +
    - - - - - -
    Display NameDescriptionActions
    + +
    + + + + + +
    Display NameDescriptionActions
    +
    + +
    @@ -754,6 +776,24 @@

    Personal Workspace

    + + +