Skip to content

Conversation

@aditigopalan
Copy link

@aditigopalan aditigopalan commented Feb 9, 2026

Problem:

The CLI didn't have a workflow for registering and binding JSON Schemas.

Specifically:

  • There was no register-json-schema CLI command
  • There was no bind-json-schema CLI command

Highlighted in SYNPY-1762

Solution:

  • Introduced a new schema_management.py module that provides thin wrapper functions (register_jsonschema() and bind_jsonschema()), keeping CLI logic minimal and consistent with previous work
  • Added a new register-json-schema CLI command that registers schemas using JSONSchema.store().
  • Added a new bind-json-schema CLI command that binds schemas to entities via Entity.bind_schema().
  • Updated CLI and Python tutorial documentation to reflect the new commands and recommended usage.

Testing:

  • Added unit tests covering both register-json-schema and bind-json-schema CLI commands.
  • Verified that schema registration and binding routes through JSONSchema.store() and Entity.bind_schema() respectively.
  • Manually tested the CLI commands to confirm expected behavior and alignment with the updated docs.

- Create schema_management.py with register_jsonschema and bind_jsonschema
- Export functions from curator extension __init__.py
- Update register_json_schema to use register_jsonschema wrapper
- Update bind_json_schema to use bind_jsonschema wrapper
- Add register-json-schema CLI command with argparse configuration
- Add register-json-schema to table of contents
- Add complete command documentation with usage and parameters
- Document schema registration workflow
- Update tutorial to cover complete JSON Schema workflow
- Add section 9: Register JSON Schema to Synapse
- Add section 10: Bind JSON Schema to entities
- Update tutorial script with register_jsonschema and bind_jsonschema examples
- Update imports and line references
- Add test for argument parsing
- Add test for --enable-derived-annotations flag
- Add test for API call invocation
Update test to mock bind_jsonschema wrapper instead of asyncio.run,
and verify correct arguments are passed to the wrapper function.
@aditigopalan aditigopalan requested a review from a team as a code owner February 9, 2026 20:54
@andrewelamb
Copy link
Contributor

Thanks for doing this @aditigopalan ! Could you add an integration test as well as the unit testing you did?

)

# Build the schema URI
schema_uri = f"{organization_name}-{schema_name}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JSONSchema class has a uri attribute that does this for you.

if schema_version:
schema_uri += f"-{schema_version}"

message = f"Successfully registered schema '{schema_name}' to organization '{organization_name}'"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of returning this message, you should log it instead.


message = f"Successfully registered schema '{schema_name}' to organization '{organization_name}'"

return schema_uri, message
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just return the JSONSchema object to the user.

synapse_client=syn,
)
else:
# Fallback to direct API call for old-style entities
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I'm not sure this is what we want to be doing here. The entity should really just be a Project, Folder or File correct? Those definitely have the bind_schema method.

- Update register_jsonschema to log message and return only schema_uri
- Update CLI function to match new signature
- Update docstring and examples
- Update return type from str to JSONSchema
- Update docstring and examples to show accessing json_schema.uri
- Update tutorial script accordingly
Add comprehensive integration tests for register_jsonschema and bind_jsonschema
wrapper functions. Tests cover:
- Registering schemas with and without version
- Binding schemas to folders
- Enabling derived annotations
- Complete register + bind workflow
Add integration tests for register-json-schema and bind-json-schema
CLI commands in test_command_line_client.py.
Tests cover:
- Registering a schema via CLI
- Binding a schema to an entity via CLI
- Complete workflow of register and bind operations
Copy link
Contributor

@linglp linglp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your hard work @aditigopalan ! I left some comments about documentation and code style changes.

with open(schema_path, "r") as f:
schema_body = json.load(f)

# Create JSONSchema instance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I don't think we need comments like this since the code is pretty self-explanatory

json_schema_uri: str,
enable_derived_annotations: bool = False,
synapse_client: Optional["Synapse"] = None,
) -> dict:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: What does entity.bind_schema() actually return?

When I checked out the bind_schema function, it returns a JSONSchemaBinding dataclass. Should this return the same?


syn = Synapse.get_client(synapse_client=synapse_client)

# Get the entity (File, Folder, or Project)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Like above, I think comments like these can be removed

Arguments:
schema_path: Path to the JSON schema file to register
organization_name: Name of the organization to register the schema under
schema_name: The name of the JSON schema
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Name of the JSON Schema
This is to be consistent with organization_name

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just meant that you have:
organization_name: Name of the organization to register the schema under
And similarly, we can do:
schema_name: Name of the JSON schema

syn.logger.info(
f"Successfully bound schema '{args.json_schema_uri}' to entity '{args.id}'"
)
syn.logger.info(f"Binding details: {result}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewelamb Should we print the binding details here? Will binding details be useful or distracting? When I tested it out, it seems like a bit distracting to read all the binding details.

(synapsePythonClient) lpeng@w290 synapsePythonClient % synapse bind-json-schema syn73698041 sage.schemas.v2571-el.BiospecimenHumanTemplate.schema-0.0.0 --enable-derived-annotations
Welcome, Lingling Peng! You are using the 'default' profile.
Successfully bound schema 'sage.schemas.v2571-el.BiospecimenHumanTemplate.schema-0.0.0' to entity 'syn73698041'
Binding details: JSONSchemaBinding(json_schema_version_info=JSONSchemaVersionInfo(organization_id='607', organization_name='sage.schemas.v2571', schema_id='7066', id='sage.schemas.v2571-el.BiospecimenHumanTemplate.schema-0.0.0', schema_name='el.BiospecimenHumanTemplate.schema', version_id='43041', semantic_version='0.0.0', json_sha256_hex='17339e10ff2e6d363b3cd96a80426faab4133c2ea6180d0fb6d201ee3d71598d', created_on='2025-07-31T16:33:39.171Z', created_by='3360851'), object_id=73698041, object_type='entity', created_on='2026-02-10T22:06:04.080Z', created_by='3443707', enable_derived_annotations=True)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, let's return the result here instead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you meant to not return the result, and just Successfully bound schema '{args.json_schema_uri}' to entity '{args.id}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like:

result = bind_jsonschema(
        entity_id=args.id,
        json_schema_uri=args.json_schema_uri,
        enable_derived_annotations=args.enable_derived_annotations,
        synapse_client=syn,
    )
    syn.logger.info(
        f"Successfully bound schema '{args.json_schema_uri}' to entity '{args.id}'"
    )

return result


This tutorial uses the Python client as a library. To use the CLI tool, see the [command line documentation](../command_line_client.md).

## Prerequisites
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here users also need to have organizations that they can work with?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean we should list the organizations here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is referring to the fact that a Jsonschema organization is required, I don't think we need to list out all of the organizations

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think you can have something like: "The user has a schema organization that they can work with" as a prerequisite.

{!docs/tutorials/python/tutorial_scripts/schema_operations.py!lines=1-6}
```

You'll need to import:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format here seems to be off Image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't think in other tutorials, we explain the imports. Login, imports, and set up variables can just be one step called "Initial set up". See example here: https://synapsepythonclient--1317.org.readthedocs.build/en/1317/tutorials/python/wiki/

Copy link
Member

@thomasyu888 thomasyu888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution @aditigopalan !

@andrewelamb or @linglp once all the changes are made, can one of you make sure the all the tests contributed here runs, since contributions via forks don't have CI tests.

Thanks all!

@aditigopalan
Copy link
Author

@linglp @andrewelamb thank you both for reviewing! Please let me know if I understood the documentation changes correctly.

Regarding the organizations section, I’m not entirely sure what would make sense to include in the documentation. Since new organizations can be created, I’m not sure what it means for the JSON schemas to be “consistent with organization name.”

For example, for HTAN2Organization, we generate schemas like these:
https://github.com/ncihtan/htan2-data-model/tree/main/JSON_Schemas/v1.2.0

I’m also not sure what other DCCs are doing in this space, so I’ll defer to you both (and @thomasyu888) on what makes sense to describe in the documentation.

@aditigopalan aditigopalan requested a review from linglp February 12, 2026 18:55
assert result is not None

# Verify the schema is actually bound by retrieving it
retrieved_folder = syn.get(folder.id, downloadFile=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get is deprecated here make sure to use the newer version as before.


# Verify the schema is actually bound by retrieving it
retrieved_folder = syn.get(folder.id, downloadFile=False)
bound_schema = retrieved_folder.get_schema(synapse_client=syn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work because the object you are getting from the deprecated get method doesn't have the method.

if TYPE_CHECKING:
from synapseclient import Synapse
from synapseclient.models.schema_organization import JSONSchema, JSONSchemaBinding

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is something wrong with the JSONSchemaBinding import, use:
from synapseclient.models.mixins.json_schema import JSONSchemaBinding


# Verify the binding
assert result is not None
assert "entityId" in result or "entity_id" in result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result here is a JSONSchemaBinding, this assert statement won't work. What are you trying to test here?


# Verify the binding
assert result is not None
assert "entityId" in result or "entity_id" in result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

saem as below

assert json_schema is not None
assert json_schema.uri is not None
assert json_schema.name == schema_name
assert json_schema.version is not None # Should have auto-generated version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSONSchemas don't have an innate version, it can contain many versions.

assert json_schema is not None
assert json_schema.uri is not None
assert json_schema.name == schema_name
assert json_schema.version == version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as bellow

def delete_org():
# Delete all schemas in the organization
for schema in org.get_json_schemas(synapse_client=syn):
schema.delete(synapse_client=syn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is failing. One of the tests below is doing this cleanup, but the schema is still bound to an folder. Either you need to unbind the schema, or delete the folder first.

@andrewelamb
Copy link
Contributor

@aditigopalan Thank you for adding the integration tests! However, they are not passing. These all need to be resolved:

ERROR tests/integration/synapseclient/test_command_line_client.py::TestSchemaManagementCommands::test_register_json_schema - ValueError: Name may be separated by periods, but each part must start with a letter and contain only letters and numbers: test.org.1e893b7b
ERROR tests/integration/synapseclient/test_command_line_client.py::TestSchemaManagementCommands::test_bind_json_schema - ValueError: Name may be separated by periods, but each part must start with a letter and contain only letters and numbers: test.org.3b8d5232
ERROR tests/integration/synapseclient/test_command_line_client.py::TestSchemaManagementCommands::test_register_and_bind_workflow - ValueError: Name may be separated by periods, but each part must start with a letter and contain only letters and numbers: test.org.193d052a
FAILED tests/integration/synapseclient/test_schema_management.py::TestRegisterJsonSchema::test_register_jsonschema_with_version - AttributeError: 'JSONSchema' object has no attribute 'version'
FAILED tests/integration/synapseclient/test_schema_management.py::TestRegisterJsonSchema::test_register_jsonschema_without_version - AttributeError: 'JSONSchema' object has no attribute 'version'
FAILED tests/integration/synapseclient/test_schema_management.py::TestBindJsonSchema::test_bind_jsonschema_to_folder - TypeError: argument of type 'JSONSchemaBinding' is not iterable
FAILED tests/integration/synapseclient/test_schema_management.py::TestBindJsonSchema::test_bind_jsonschema_with_derived_annotations - TypeError: argument of type 'JSONSchemaBinding' is not iterable
FAILED tests/integration/synapseclient/test_schema_management.py::TestRegisterAndBindWorkflow::test_complete_workflow - AttributeError: get_schema
ERROR tests/integration/synapseclient/test_schema_management.py::TestRegisterAndBindWorkflow::test_complete_workflow - synapseclient.core.exceptions.SynapseHTTPError: 400 Client Error: Cannot delete a schema that is bound to an object

- previously tried to test if that the binding result contains entity information (like entityId or entity_id)
- result is a JSONSchemaBinding object (a Python class instance), not a dictionary
- just checking that the result is not none should be sufficient
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants