Description
Summary
Gen AI eval: MULTI_TURN_GENERAL_QUALITY fails with "conversation_history is required but not provided" even when column is present in dataset
Environment details
- OS type and version: Colab / Colab Enterprise (Linux, managed runtime)
- Python version: Python 3.10 (from Colab)
- pip version: pip 24.x (from Colab)
- `google-cloud-aiplatform` version: 1.134.0 (also reproducible on 1.135.0)
- `google-genai` version: 1.61.0
- API usage: `vertexai.Client(..., http_options=genai_types.HttpOptions(api_version="v1beta1"))`
Steps to reproduce
- Install the SDK in a fresh Colab / Colab Enterprise runtime

```python
%pip install -U -q "google-cloud-aiplatform[evaluation]==1.134.0"

import google.cloud.aiplatform as aiplatform
import google.genai as genai

print("aiplatform:", aiplatform.__version__)  # 1.134.0
print("genai:", genai.__version__)  # 1.61.0
```
- Initialize Vertex and define `agent_info`

```python
import os

import vertexai
from google.cloud import storage
from google.genai import types as genai_types
from vertexai import Client
from vertexai import types
import pandas as pd

PROJECT_ID = os.getenv("PROJECT_ID", "<project-id>")
LOCATION = os.getenv("LOCATION", "us-central1")
AGENT = os.getenv("AGENT", "<reasoningEngine resource name>")
GCS_DEST = os.getenv("GCS_DEST", "<bucket-or-prefix>")
AUTOMATED_RUN = os.getenv("AUTOMATED_RUN", "false")
AGENT_DISPLAY_NAME = os.getenv("AGENT_DISPLAY_NAME", AGENT.split("/")[-1])

vertexai.init(project=PROJECT_ID, location=LOCATION)

client = Client(
    project=PROJECT_ID,
    location=LOCATION,
    http_options=genai_types.HttpOptions(api_version="v1beta1"),
)

# Define agent_info (simplified)
agent_info = types.evals.AgentInfo(
    agent_resource_name=AGENT,
    name="orchestrator_agent",
    # instruction + tools omitted for brevity
)
```
- Build a multi‑turn dataset with `history` and run inference

```python
from vertexai import generative_models as genai_models

multi_turn_conversations = [
    {
        "history": [
            genai_models.Content(
                role="user",
                parts=[genai_models.Part.from_text("First question")],
            ),
            genai_models.Content(
                role="model",
                parts=[genai_models.Part.from_text("First response")],
            ),
        ],
        "prompt": "Second question",
        "session_inputs": types.evals.SessionInput(
            user_id="user_1",
            state={"agent_type": "engineering", "conversation_id": "1"},
        ),
    },
    # ... a few more rows ...
]

prompts = [conv["prompt"] for conv in multi_turn_conversations]
histories = [conv["history"] for conv in multi_turn_conversations]
session_inputs_list = [conv["session_inputs"] for conv in multi_turn_conversations]


def content_to_dict(content):
    parts_list = []
    for part in content.parts:
        if hasattr(part, "text"):
            parts_list.append({"text": part.text})
    return {
        "role": content.role,
        "parts": parts_list,
    }


histories_as_dicts = [
    [content_to_dict(content) for content in history] for history in histories
]

# Create DataFrame with required columns
df = pd.DataFrame({
    "prompt": prompts,
    "history": histories_as_dicts,  # list[dict] with role + parts[text]
    "session_inputs": session_inputs_list,
})

multi_turn_dataset = types.EvaluationDataset(eval_dataset_df=df)
print(multi_turn_dataset.eval_dataset_df.columns)
# Index(['prompt', 'history', 'session_inputs'], dtype='object')

# Run inference
eval_dataset = client.evals.run_inference(
    src=multi_turn_dataset,
    agent=AGENT,
)

# Workaround attempt: explicitly add conversation_history
df2 = eval_dataset.eval_dataset_df.copy()
df2["conversation_history"] = df2["history"]
eval_dataset = types.EvaluationDataset(eval_dataset_df=df2)
print(eval_dataset.eval_dataset_df.columns)
# Index([... 'history', 'conversation_history', ...], dtype='object')
```
- Create the multi‑turn evaluation run with `MULTI_TURN_GENERAL_QUALITY`

```python
import datetime
import time

print("📊 Running multi-turn evaluation...")

run_type = "Auto" if AUTOMATED_RUN.lower() == "true" else "Manual"

evaluation_run = client.evals.create_evaluation_run(
    display_name=f"{run_type}-MultiTurn-Eval-{AGENT_DISPLAY_NAME}-{datetime.datetime.now().strftime('%Y%m%d-%H%M%S')}",
    dataset=eval_dataset,
    agent_info=agent_info,
    metrics=[
        types.RubricMetric.MULTI_TURN_GENERAL_QUALITY,
    ],
    dest=GCS_DEST,
)
evaluation_run.show()

# Poll to completion
while evaluation_run.state not in {"SUCCEEDED", "FAILED", "CANCELLED"}:
    evaluation_run = client.evals.get_evaluation_run(name=evaluation_run.name)
    time.sleep(10)

eval_result = client.evals.get_evaluation_run(
    name=evaluation_run.name,
    include_evaluation_items=True,
)
eval_result.show()
```
Stack trace / error
The evaluation run consistently fails with FAILED_PRECONDITION:
```
📈 Multi-Turn Evaluation Results:
Status: FAILED
Error:
code=9 details=None message='code=FAILED_PRECONDITION, message=Evaluation items failed with errors:
Item ...: INVALID_ARGUMENT: code=INVALID_ARGUMENT, message=Error rendering metric prompt template: Variable conversation_history is required but not provided.., cause=null,
Item ...: INVALID_ARGUMENT: code=INVALID_ARGUMENT, message=Error rendering metric prompt template: Variable conversation_history is required but not provided.., cause=null,
...
cause=null'
```
This happens even when `eval_dataset.eval_dataset_df` clearly contains a `conversation_history` column that is a copy of `history` (each value is a list of `{"role": ..., "parts": [{"text": ...}]}` dicts).
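For reference, a quick check along these lines confirms the column and its contents on the dataset passed to `create_evaluation_run` (output shown for the first row of the dataset built above):

```python
# Sanity check on the dataset passed to create_evaluation_run:
# the copied column exists and holds role/parts dicts.
print("conversation_history" in eval_dataset.eval_dataset_df.columns)  # True
print(eval_dataset.eval_dataset_df["conversation_history"].iloc[0])
# [{'role': 'user', 'parts': [{'text': 'First question'}]},
#  {'role': 'model', 'parts': [{'text': 'First response'}]}]
```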
Single‑turn evaluations in the same environment work as expected.
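For contrast, a minimal sketch of the kind of single‑turn run that works is below (illustrative only: the prompt text and the `GENERAL_QUALITY` metric are placeholders rather than the exact values from our working runs; same client, agent and destination as above):

```python
# Illustrative single-turn setup: prompt-only dataset plus a single-turn rubric metric.
single_turn_df = pd.DataFrame({"prompt": ["Example single-turn question"]})
single_turn_dataset = types.EvaluationDataset(eval_dataset_df=single_turn_df)

single_turn_dataset = client.evals.run_inference(
    src=single_turn_dataset,
    agent=AGENT,
)

single_turn_run = client.evals.create_evaluation_run(
    display_name=f"SingleTurn-Eval-{AGENT_DISPLAY_NAME}",
    dataset=single_turn_dataset,
    agent_info=agent_info,
    metrics=[types.RubricMetric.GENERAL_QUALITY],  # placeholder single-turn metric
    dest=GCS_DEST,
)
```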
Expected behavior
- Either `MULTI_TURN_GENERAL_QUALITY` should accept the documented `history` field for multi‑turn datasets, or
- If `conversation_history` is now required by the metric prompt template, then providing `conversation_history = history` in the dataset should allow the metric to run successfully instead of returning:

```
Error rendering metric prompt template: Variable conversation_history is required but not provided.
```
If a different schema is now required for multi‑turn agent evaluation (e.g. a request object wrapper or a different field name/structure), updated documentation or a validation error before running the metric would be very helpful.
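In case it helps with triage, one untested guess at an alternative column shape (a plain JSON string instead of a list of dicts) is sketched below; this is only an assumption about what the template renderer might expect, not a documented schema:

```python
import json

# Untested assumption: if the metric prompt template expects a plain string
# variable, serializing each history to JSON might satisfy the renderer.
df2["conversation_history"] = [json.dumps(h) for h in histories_as_dicts]
eval_dataset = types.EvaluationDataset(eval_dataset_df=df2)
```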
Thanks!