Merged
9 changes: 9 additions & 0 deletions .env.example
@@ -1 +1,10 @@
OPENSYNDROME_PROVIDER=ollama
OPENSYNDROME_MODEL=mistral
# only needed when using Ollama (defaults to http://localhost:11434)
OLLAMA_BASE_URL=http://localhost:11434
# optional: set the API key of your chosen provider in your .env file
ANTHROPIC_API_KEY=sk-ant-...
MISTRAL_API_KEY=...
DEEPSEEK_API_KEY=...
GEMINI_API_KEY=...
OPENAI_API_KEY=sk-...
42 changes: 31 additions & 11 deletions README.md
@@ -4,8 +4,8 @@

## Installation

You can install it from PyPI or Docker. By default, the conversion features use [Ollama](https://github.com/ollama/ollama) running locally.
Cloud providers (OpenAI, Anthropic, Mistral, DeepSeek, Gemini) are also supported and require only an API key.

From PyPI, install the package with `pip install opensyndrome`. Then run it with `opensyndrome --help`.

@@ -46,13 +46,32 @@ opensyndrome download definitions

The files will be placed in the folder `.open_syndrome` in `$HOME`.

### Providers and configuration

The provider and model can be set via environment variables so you don't have to pass them on every command:

```bash
OPENSYNDROME_PROVIDER=ollama # ollama (default), openai, anthropic, mistral, deepseek, gemini
OPENSYNDROME_MODEL=mistral # overrides the provider's default model
```

Copy `.env.example` to `.env` and fill in the relevant values:

| Provider | Required env var | Default model |
|----------|-----------------|---------------|
| `ollama` | — (runs locally) | `mistral` |
| `openai` | `OPENAI_API_KEY` | `gpt-4o` |
| `anthropic` | `ANTHROPIC_API_KEY` | `claude-3-haiku-20240307` |
| `mistral` | `MISTRAL_API_KEY` | `mistral-large-latest` |
| `deepseek` | `DEEPSEEK_API_KEY` | `deepseek-chat` |
| `gemini` | `GEMINI_API_KEY` | `gemini-1.5-flash` |

For Ollama, the model must be pulled before use: `ollama pull mistral`. You can also override the Ollama base URL with `OLLAMA_BASE_URL`
(default: `http://localhost:11434`).

Ollama models tested: `llama3.2`, `mistral`, `deepseek-r1`. Known to not work well with structured output: `qwen2.5-coder`.
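
Putting this together, a minimal `.env` for a cloud provider might look like the following (the key value is a placeholder, as in `.env.example`):

```bash
# Hypothetical .env for running conversions against Anthropic
# instead of the local Ollama default.
OPENSYNDROME_PROVIDER=anthropic
OPENSYNDROME_MODEL=claude-3-haiku-20240307
ANTHROPIC_API_KEY=sk-ant-...
```

With these variables set, `opensyndrome convert` and `opensyndrome humanize` can be run without `--provider` or `--model` flags.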

### Convert a human-readable syndrome definition to a machine-readable JSON

> If you do not pass `-hr` or `-hf`, an editor will open for you to enter the definition.

@@ -65,10 +84,11 @@ opensyndrome convert -hr "Any person with pneumonia"
# pass the definition from a TXT file
opensyndrome convert -hf definition.txt

# use a specific provider and model
opensyndrome convert -hr "Any person with pneumonia" --provider openai --model gpt-4o

# to have the JSON translated to a specific language and edit it just after conversion
opensyndrome convert --language "Português do Brasil" --edit

# include a validation step after conversion
opensyndrome convert --validate
@@ -78,8 +98,8 @@ opensyndrome convert --validate

```bash
opensyndrome humanize <path-to-json-file>
opensyndrome humanize <path-to-json-file> --provider anthropic
opensyndrome humanize <path-to-json-file> --model mistral-large-latest --language "Português do Brasil"
```

### Validate a machine-readable JSON syndrome definition
94 changes: 66 additions & 28 deletions opensyndrome/cli.py
@@ -5,15 +5,20 @@
from pygments import highlight, lexers, formatters
import jsonschema
import click
from instructor.core.exceptions import InstructorRetryException

from opensyndrome.converters import (
generate_machine_readable_format,
generate_human_readable_format,
)
from opensyndrome.artifacts import get_schema_filepath, get_definition_dir
from opensyndrome.validators import validate_machine_readable_format
from opensyndrome.providers import (
build_model_string,
check_provider_available,
DEFAULT_PROVIDER,
SUPPORTED_PROVIDERS,
)


@click.group()
@@ -54,25 +59,27 @@ def color_json(json_definition: dict):
return highlight(formatted_json, lexers.JsonLexer(), formatters.TerminalFormatter())


def _show_llm_error(exception: Exception, provider: str, model: str) -> None:
click.echo(
click.style(
f"❌ Request to LLM failed for {provider} {model} after {exception.n_attempts} attempts:\n"
f"Details: {exception.args[0].message}",
fg="red",
),
err=True,
)


def check_provider(func):
@wraps(func)
def wrapper(*args, **kwargs):
provider = kwargs.get("provider", DEFAULT_PROVIDER)
model = kwargs.get("model", "mistral")
available, message = check_provider_available(provider, model)
if not available:
click.echo(click.style(message, fg="red"), err=True)
raise click.Abort()
return func(*args, **kwargs)

return wrapper

@@ -84,8 +91,9 @@ def wrapper(*args, **kwargs):
@click.option(
"--model",
type=str,
help="Model to use. Defaults per provider: ollama=mistral, openai=gpt-4o, anthropic=claude-3-haiku-20240307, mistral=mistral-large-latest, deepseek=deepseek-chat, gemini=gemini-1.5-flash.",
default=None,
envvar="OPENSYNDROME_MODEL",
)
@click.option(
"--language",
@@ -110,14 +118,22 @@ def wrapper(*args, **kwargs):
type=click.Path(exists=True),
help="Path to a TXT file containing the human-readable definition.",
)
@click.option(
"--provider",
type=click.Choice(SUPPORTED_PROVIDERS),
default=DEFAULT_PROVIDER,
envvar="OPENSYNDROME_PROVIDER",
help="LLM provider to use.",
)
@check_provider
def convert_to_json(
validate,
model,
language,
edit,
human_readable_definition,
human_readable_definition_file,
provider,
):
"""
Convert human-readable definition (TEXT) to the machine-readable format (JSON).
@@ -130,9 +146,15 @@ def convert_to_json(
human_readable_definition = Path(human_readable_definition_file).read_text()
if not human_readable_definition:
human_readable_definition = click.edit(extension=".txt")
resolved_model = build_model_string(provider, model)
click.echo(click.style(f"Using {provider} / {resolved_model}", fg="cyan"), err=True)
try:
machine_readable_definition = generate_machine_readable_format(
human_readable_definition, model, language, provider
)
except InstructorRetryException as exception:
_show_llm_error(exception, provider, model)
return

if edit:
machine_readable_definition_edited = click.edit(
@@ -152,20 +174,36 @@
@click.option(
"--model",
type=str,
help="Model to use. Defaults per provider: ollama=mistral, openai=gpt-4o, anthropic=claude-3-haiku-20240307, mistral=mistral-large-latest, deepseek=deepseek-chat, gemini=gemini-1.5-flash.",
default=None,
envvar="OPENSYNDROME_MODEL",
)
@click.option(
"--language",
type=str,
help="Language used to generate the human-readable definition.",
default="American English",
)
@click.option(
"--provider",
type=click.Choice(SUPPORTED_PROVIDERS),
default=DEFAULT_PROVIDER,
envvar="OPENSYNDROME_PROVIDER",
help="LLM provider to use.",
)
@check_provider
def convert_to_text(json_file, model, language, provider):
"""Convert a machine-readable format (JSON) to a human-readable format (TEXT)."""
resolved_model = build_model_string(provider, model)
click.echo(click.style(f"Using {provider} / {resolved_model}", fg="cyan"), err=True)
machine_readable_definition = json.loads(Path(json_file).read_text())
try:
text = generate_human_readable_format(
machine_readable_definition, model, language, provider
)
except InstructorRetryException as exception:
_show_llm_error(exception, provider, model)
return
click.echo(click.style(text, fg="green"))


76 changes: 33 additions & 43 deletions opensyndrome/converters.py
@@ -1,28 +1,34 @@
import json
import logging
import litellm
from datetime import datetime
from importlib.resources import files
from pathlib import Path
import random

from dotenv import load_dotenv

from opensyndrome.artifacts import get_schema_filepath
from opensyndrome.schema import OpenSyndromeCaseDefinitionSchema
from opensyndrome.providers import (
DEFAULT_MODEL,
DEFAULT_PROVIDER,
build_model_string,
get_instructor_client,
get_litellm_kwargs,
)

load_dotenv()
logger = logging.getLogger(__name__)


def load_examples(examples_dir: Path, random_k=None):
json_definitions = {}
for raw_json in examples_dir.glob("**/*"):
if not raw_json.name.endswith(".json"):
continue
if raw_json.read_text(encoding="utf-8") != "":
content = json.loads(raw_json.read_text(encoding="utf-8"))
if content:
json_definitions[raw_json.stem] = content

@@ -112,25 +118,11 @@ def _fill_automatic_fields(
return machine_readable_definition




def generate_machine_readable_format(
human_readable_definition,
model=DEFAULT_MODEL,
language="American English",
provider=DEFAULT_PROVIDER,
):
if not human_readable_definition:
raise ValueError("Human-readable definition cannot be empty.")
@@ -143,24 +135,19 @@
language=language,
)

client = get_instructor_client(provider)
model_string = build_model_string(provider, model)
instance = client.chat.completions.create(
model=model_string,
messages=[{"role": "user", "content": formatted_prompt}],
response_model=OpenSyndromeCaseDefinitionSchema,
temperature=0,
**get_litellm_kwargs(provider),
)


return _fill_automatic_fields(
instance.model_dump(exclude_none=True, by_alias=True, mode="json"),
human_readable_definition,
)


Expand All @@ -175,7 +162,10 @@ def _exclude_metadata_fields(definition: dict):


def generate_human_readable_format(
machine_readable_definition,
model=DEFAULT_MODEL,
language="American English",
provider=DEFAULT_PROVIDER,
):
if not machine_readable_definition:
raise ValueError("Machine-readable definition cannot be empty.")
Expand All @@ -186,11 +176,11 @@ def generate_human_readable_format(
machine_readable_definition
),
)
model_string = build_model_string(provider, model)
response = litellm.completion(
model=model_string,
messages=[{"role": "user", "content": formatted_prompt}],
temperature=0,
**get_litellm_kwargs(provider),
)
return response.choices[0].message.content