SMSD Pro

Substructure & MCS Search for Chemical Graphs

SMSD Pro provides exact substructure search and maximum common substructure (MCS) search for chemical graphs. It is available for Java, C++ (header-only), and Python. Optional GPU paths are available for CUDA and Apple Metal builds.

Version 7.1.1 is a bug-fix patch on top of 7.1.0. It delivers cross-language ECFP/FCFP fingerprint parity between Java, C++, and Python, fixes a canonical SMILES writer corner case, restores the correct semantics of MatchResult.overlapCoefficient, and fixes a Kekulization edge case so the canonical SMILES output round-trips cleanly in downstream readers for pyrrole-type aromatic nitrogen. No new features and no public API breakage for existing callers. Builds on the 7.1.0 unified Python API (find_mcs, find_substructure).

Dalke Nearest-Neighbor MCS Benchmark (1,000 pairs)

We benchmark on the community-standard Dalke NN dataset (1,000 high-similarity ChEMBL pairs) — the same dataset widely used by RDKit, CDK, and the academic MCS literature. Identical SMILES input, same 10 s timeout, same process, same machine. We gratefully acknowledge Andrew Dalke's foundational work on MCS benchmarking.

Metric	SMSD Pro 7.1.1	RDKit FindMCS 2026.03
Total time	40 s	213 s
Median time	0.6 ms	0.4 ms
Mean MCS size	25.8 atoms	25.0 atoms
Timeouts	0	8
Larger-MCS wins	211 (21 %)	29 (3 %)

SMSD Pro's adaptive multi-strategy native engine complements RDKit's well-proven VF2-based approach. Both engines are excellent; SMSD tends to find slightly larger common substructures on hard pairs while RDKit offers superb median-case latency. We recommend choosing based on your workload: SMSD for coverage-critical applications (reaction mapping, SAR), RDKit for high-throughput screening where median speed dominates.

Full benchmark suite and reproduction scripts in benchmarks/.

Guides and References

Document	Description
Examples, How-To, and Cautions	Worked examples for every feature with cautions and performance tips
Python API Guide	Full Python API reference with code examples
Java Guide	Java API and CLI usage
C++ Guide	Header-only C++ integration
Release Notes	What's new in this release
How to Install	Build from source on all platforms
Changelog	Full versioned change history

Molfile Support

V2000 and V3000 core graph round-trip, names/comments, SDF properties, charges, isotopes, atom classes/maps, R# plus M RGP, and basic stereo flags.

Install

Java (Maven)

<dependency>
  <groupId>com.bioinceptionlabs</groupId>
  <artifactId>smsd</artifactId>
  <version>7.1.1</version>
</dependency>

Java (Download JAR)

curl -LO https://github.com/asad/SMSD/releases/download/v7.1.1/smsd-7.1.1-jar-with-dependencies.jar

java -jar smsd-7.1.1-jar-with-dependencies.jar \
  --Q SMI --q "c1ccccc1" --T SMI --t "c1ccc(O)cc1" --json -

Python (PyPI)

pip install smsd

Supported CPython versions: 3.10 through 3.13. Wheels available for Linux (x86_64, aarch64), macOS (arm64, x86_64), and Windows (x64). CPU execution is the default path. CUDA and Metal acceleration are optional. RDKit and Open Babel are optional interop layers.

import smsd

result = smsd.find_substructure("c1ccccc1", "c1ccc(O)cc1")
mcs    = smsd.find_mcs("c1ccccc1", "c1ccc2ccccc2c1")

# Tautomer-aware MCS
mcs    = smsd.find_mcs("CC(=O)C", "CC(O)=C", tautomer_aware=True)

# Prefer rare heteroatoms (S, P, Se) in MCS scoring
mcs    = smsd.find_mcs("C[S+](C)CCC(N)C(=O)O", "SCCC(N)C(=O)O",
                   prefer_rare_heteroatoms=True)

# Similarity upper bound (fast pre-filter)
sim    = smsd.similarity("c1ccccc1", "c1ccc(O)cc1")

fp     = smsd.fingerprint("c1ccccc1", kind="mcs")

# Circular fingerprint (ECFP4 equivalent)
ecfp4 = smsd.fingerprint_from_smiles("c1ccccc1", radius=2, fp_size=2048)

Java API

import com.bioinception.smsd.core.*;

SMSD smsd = new SMSD(mol1, mol2, new ChemOptions());
boolean isSub = smsd.isSubstructure();
var mcs = smsd.findMCS();

// CIP stereo assignment (Rules 1-5, including pseudoasymmetric r/s)
Map<Integer, Character> stereo = CIPAssigner.assignRS(g);
Map<Long, Character> ez = CIPAssigner.assignEZ(g);

// Batch MCS with non-overlap constraints
var mappings = SearchEngine.batchMCSConstrained(queries, targets, new ChemOptions(), 10_000);

Python — Advanced Features

import smsd

# --- Structured MCS Result ---
result = smsd.mcs_result("c1ccccc1", "c1ccc(O)cc1")
print(result.size)          # 6
print(result.overlapCoefficient)  # 0.857 (overlap coefficient)
print(result.mcs_smiles)    # "c1ccccc1"
print(result.mapping)       # {0: 0, 1: 1, ...}

# --- Works with any input type ---
# SMILES strings
mcs = smsd.find_mcs("c1ccccc1", "c1ccc(O)cc1")

# MolGraph objects (pre-parsed, fastest for batch)
g1 = smsd.parse_smiles("c1ccccc1")
g2 = smsd.parse_smiles("c1ccc(O)cc1")
mcs = smsd.find_mcs(g1, g2)

# Native Mol objects (auto-detected, indices returned in native ordering)
# from rdkit import Chem
# mcs = smsd.find_mcs(Chem.MolFromSmiles("c1ccccc1"), Chem.MolFromSmiles("c1ccc(O)cc1"))

# --- Fingerprints ---
g = smsd.parse_smiles("c1ccccc1")
ecfp4  = smsd.circular_fingerprint(g, radius=2, fp_size=2048)
fcfp4  = smsd.circular_fingerprint(g, radius=2, fp_size=2048, mode="fcfp")
counts = smsd.circular_fingerprint_counts(g, radius=2, fp_size=2048)
torsion = smsd.topological_torsion("c1ccccc1", fp_size=2048)
tan    = smsd.overlap_coefficient(ecfp4, ecfp4)

# --- 2D Layout ---
g = smsd.parse_smiles("c1ccc2c(c1)cc1ccccc1c2")  # phenanthrene
coords = smsd.generate_coords_2d(g)
_, coords = smsd.force_directed_layout(g, coords, max_iter=500, target_bond_length=1.5)
_, coords = smsd.stress_majorisation(g, coords, max_iter=300)
crossings = smsd.reduce_crossings(g, coords, max_iter=2000)

Python — MCS Variants & Batch Operations

import smsd

# --- All MCS variants ---
mcs = smsd.find_mcs("c1ccccc1", "c1ccc(O)cc1")                     # Connected MCS (default)
mcs = smsd.find_mcs("c1ccccc1", "c1ccc(O)cc1", connected_only=False) # Disconnected MCS
mcs = smsd.find_mcs("c1ccccc1", "c1ccc(O)cc1", induced=True)         # Induced MCS
mcs = smsd.find_mcs("c1ccccc1", "c1ccc(O)cc1", maximize_bonds=True)  # Edge MCS (MCES)

# Find top-N distinct MCS solutions
all_mcs = smsd.find_mcs("c1ccccc1", "c1ccc(O)cc1", max_results=5)

# SMARTS-based MCS
mcs = smsd.find_mcs_smarts("[#6]~[#7]", "c1ccc(N)cc1")

# Scaffold MCS (Murcko framework)
scaffold = smsd.find_scaffold_mcs(
    smsd.parse_smiles("CC(=O)Oc1ccccc1C(=O)O"),
    smsd.parse_smiles("Oc1ccccc1C(=O)O")
)

# R-group decomposition
rgroups = smsd.decompose_r_groups("c1ccccc1", ["c1ccc(O)cc1", "c1ccc(N)cc1"])

# --- Substructure Search ---
hit = smsd.find_substructure("c1ccccc1", "c1ccc(O)cc1")
all_matches = smsd.find_substructure("c1ccccc1", "c1ccc(O)cc1", max_results=10)

# SMARTS pattern matching
matches = smsd.smarts_match("[OH]", smsd.parse_smiles("c1ccc(O)cc1"))

# --- Similarity & Screening ---
sim = smsd.overlap_coefficient(
    smsd.circular_fingerprint(smsd.parse_smiles("CCO"), radius=2),
    smsd.circular_fingerprint(smsd.parse_smiles("CCCO"), radius=2)
)
dice = smsd.dice(
    smsd.circular_fingerprint_counts(smsd.parse_smiles("CCO"), radius=2),
    smsd.circular_fingerprint_counts(smsd.parse_smiles("CCCO"), radius=2)
)

# --- Chemistry Options ---
# Tautomer-aware with solvent and pH
mcs = smsd.find_mcs("CC(=O)C", "CC(O)=C", tautomer_aware=True)

# Loose bond matching (FMCS-style)
mcs = smsd.find_mcs("c1ccccc1", "C1CCCCC1", match_bond_order="loose")

# --- Canonical SMILES ---
# v7.1.1: canonical_smiles() and to_smiles() accept a SMILES string OR a MolGraph.
# Output is byte-identical across the Java, C++, and Python engines.
smi = smsd.canonical_smiles("OC(=O)c1ccccc1")                 # from SMILES string
smi = smsd.to_smiles(smsd.parse_smiles("OC(=O)c1ccccc1"))     # from MolGraph
mcs_smi = smsd.mcs_to_smiles(g1, mapping)                     # extract MCS as SMILES

# --- CIP Stereo Assignment ---
g = smsd.parse_smiles("N[C@@H](C)C(=O)O")  # L-alanine
stereo = smsd.assign_rs(g)                   # {1: 'S'}
ez = smsd.assign_ez(smsd.parse_smiles("C/C=C/C"))  # E-2-butene

# --- Native MolGraph I/O ---
g = smsd.parse_smiles("c1ccccc1")
g = smsd.read_molfile("molecule.mol")
mol_block = smsd.write_mol_block(g)
v3000 = smsd.write_mol_block_v3000(g)
smsd.write_molfile(g, "molecule_out.mol", v3000=True)
smsd.export_sdf([g1, g2], "output.sdf")

Publication-Quality Depiction (ACS 1996 Standard)

Zero-dependency SVG renderer — the same specification used by Nature, Science, JACS, and Springer journals. See Examples for full usage guide.

import smsd

# Render any molecule as publication-quality SVG
svg = smsd.depict_svg("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
smsd.save_svg(svg, "aspirin.svg")

# MCS comparison — side-by-side with highlighted matching atoms
mol1 = smsd.parse_smiles("c1ccccc1")
mol2 = smsd.parse_smiles("c1ccc(O)cc1")
mapping = smsd.find_mcs(mol1, mol2)
svg = smsd.depict_pair(mol1, mol2, mapping)
smsd.save_svg(svg, "mcs_comparison.svg")

# Substructure highlighting
svg = smsd.depict_mapping(mol2, mapping)

# Custom styling (all ACS proportions auto-scale from bond_length)
svg = smsd.depict_svg("Cn1cnc2c1c(=O)n(c(=O)n2C)C",  # caffeine
    bond_length=50, width=600, height=400)

# Export to SDF file
mols = [smsd.parse_smiles(s) for s in ["CCO", "c1ccccc1", "CC(=O)O"]]
smsd.export_sdf(mols, "output.sdf")

Features: skeletal formula, Jmol/CPK element colors, asymmetric double bonds, wedge/dash stereo, H-count subscripts, charge superscripts, bond-to-label clipping, aromatic inner circles, atom map numbers.

C++ (Header-Only)

git clone https://github.com/asad/SMSD.git
# Add SMSD/cpp/include to your include path — no other dependencies needed

#include "smsd/smsd.hpp"

auto mol1 = smsd::parseSMILES("c1ccccc1");
auto mol2 = smsd::parseSMILES("c1ccc(O)cc1");

bool isSub = smsd::isSubstructure(mol1, mol2, smsd::ChemOptions{});
auto mcs   = smsd::findMCS(mol1, mol2, smsd::ChemOptions{}, smsd::MCSOptions{});

// Batch MCS with non-overlap constraints
auto mappings = smsd::batchMCSConstrained(queries, targets, smsd::ChemOptions{});

Build from Source

git clone https://github.com/asad/SMSD.git
cd SMSD

# Java
mvn -U clean package

# C++
mkdir cpp/build && cd cpp/build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

# Python
cd python && pip install -e .

Docker

docker build -t smsd .
docker run --rm smsd --Q SMI --q "c1ccccc1" --T SMI --t "c1ccc(O)cc1" --json -

Benchmarks

MCS Performance (Python)

Representative pairs from the checked-in Python benchmark results on the same machine and in the same Python process. Full data: benchmarks/results_python.tsv For the maintained local core leaderboard, run python3 benchmarks/benchmark_leaderboard.py --mode core --compare-mode strict. Use the mode-matched core leaderboard for current cross-tool comparisons.

Pair	Category	SMSD (ms)	MCS Size
Cubane (self)	Cage	0.003	8
Coronene (self)	PAH	0.006	24
NAD / NADH	Cofactor	0.012	44
Caffeine / Theophylline	N-methyl diff	0.017	13
Morphine / Codeine	Alkaloid	0.079	20
Ibuprofen / Naproxen	NSAID	0.070	15
ATP / ADP	Nucleotide	0.148	27
PEG-12 / PEG-16	Polymer	0.039	40
Paclitaxel / Docetaxel	Taxane	1,691	56

Substructure Performance (Java)

Current maintained cached Java core summary: 28/28 hit agreement and 28/28 favourable timings on the local curated corpus.

Run python3 benchmarks/benchmark_leaderboard.py --mode core --compare-mode strict to refresh the maintained local summary.

External Benchmark Datasets

Community-standard datasets for reproducible evaluation, stored in benchmarks/data/:

Dataset	Pairs/Patterns	Source	Purpose
Tautobase (Chodera subset)	468 tautomer pairs	Wahl & Sander 2020	Tautomer-aware MCS validation
Tautobase (full SMIRKS)	1,680 pairs	Wahl & Sander 2020	Tautomer transform coverage
Ehrlich-Rarey SMARTS v2.0	1,400 patterns	Ehrlich & Rarey 2012	Substructure search validation
Dalke-style random pairs	1,000 pairs	MoleculeNet drug collections	Low-similarity MCS scaling
Dalke-style NN pairs	1,000 pairs	MoleculeNet drug collections	High-similarity MCS quality
Stress pairs	12 pairs	Duesbury et al. 2017	Timeout/robustness
Molecule pool	5,590 SMILES	MoleculeNet (BBBP, SIDER, ClinTox, BACE)	Pair generation source

# Run external benchmarks (Java)
mvn test -Dtest=ExternalBenchmarkTest -Dbenchmark=true

# Run external benchmarks (Python)
SMSD_BENCHMARK=1 pytest python/tests/test_external_benchmarks.py -v -s

# Regenerate Dalke-style pairs (requires RDKit)
python benchmarks/generate_dalke_pairs.py

Algorithms

MCS Engine

SMSD Pro ships an adaptive multi-strategy MCS engine that selects the best technique for each input pair. Implementation details are subject to change between minor releases.

Public algorithmic foundations the engine builds on (citations only, not the SMSD pipeline itself):

Foundation	Reference
Partition-refinement clique search	McCreesh, Prosser & Trimble, J. Artif. Intell. Res. 2017
Edge-growth backtracking	McGregor, Software: Practice & Experience 1982
Maximum-clique enumeration	Bron & Kerbosch, Comm. ACM 1973; Tomita et al. 2006
Subgraph isomorphism (VF2++)	Juttner & Madarasi, Discrete Appl. Math. 2018
Ring perception	Vismara, J. Chem. Inf. Comput. Sci. 1997

MCS Variants

Variant	Flag
MCIS (induced)	`induced=true`
MCCS (connected)	default
MCES (edge subgraph)	`maximizeBonds=true`
dMCS (disconnected)	`disconnectedMCS=true`
N-MCS (multi-molecule)	`findNMCS()`
Weighted MCS	`atomWeights`
Scaffold MCS	`findScaffoldMCS()`
Tautomer-aware MCS	`ChemOptions.tautomerProfile()`

Substructure Search (VF2++)

VF2++ (Juttner & Madarasi 2018) matcher with optional GPU-accelerated domain initialization (CUDA + Metal). Implementation details are subject to change between minor releases.

Ring Perception

Horton's candidate generation + 2-phase GF(2) elimination (Vismara 1997) for relevant cycles, orbit-based grouping for Unique Ring Families (URFs).

Output	Description
SSSR / MCB	Smallest Set of Smallest Rings
RCB	Relevant Cycle Basis
URF	Unique Ring Families (automorphism orbit grouping)

Chemistry Options

Option	Values
Chirality	R/S tetrahedral, E/Z double bond
Isotope	`matchIsotope=true`
Tautomers	30 transforms with pKa-informed weights (Sitzmann 2010, Dhaked & Nicklaus 2024)
Solvent	AQUEOUS, DMSO, METHANOL, CHLOROFORM, ACETONITRILE, DIETHYL_ETHER
Ring fusion	IGNORE / PERMISSIVE / STRICT
Bond order	STRICT / LOOSE / ANY
Aromaticity	STRICT / FLEXIBLE
Lenient SMILES	`ParseOptions{.lenient=true}` (C++) / `ChemOptions.lenientSmiles` (Java)

Preset profiles: ChemOptions() (default), .tautomerProfile(), .fmcsProfile()

With the default chemistry profile, ringMatchesRingOnly=true enforces ring/non-ring parity for matched atoms and bonds in both directions. Use .fmcsProfile() when you explicitly want loose FMCS-style topology where ring atoms may map to chain atoms and partial ring fragments are accepted.

Solvent-aware tautomers (Tier 2 pKa): opts.withSolvent(Solvent.DMSO) adjusts tautomer equilibrium weights for non-aqueous environments.

Platform & GPU Support

Platform	CPU	GPU
macOS (Apple Silicon)	OpenMP	Metal (zero-copy unified memory)
Linux	OpenMP	CUDA
Windows	OpenMP	CUDA
Any (no GPU)	OpenMP	Automatic CPU fallback

GPU acceleration covers RASCAL batch screening, Tanimoto clustering, and substructure domain initialization. Recursive matching runs on CPU. Dispatch: CUDA -> Metal -> OpenMP -> sequential.

Performance Caching

SMSD employs multi-level caching to eliminate redundant computation in batch and reaction workloads:

Cache	Target	Benefit
MolGraph identity cache	Molecule object conversion	Same molecule reused across 6-18 calls per reaction pair
Domain space cache	VF2++ atom compatibility matrix	Avoids O(Nq*Nt) rebuild on repeated queries
ECFP/FCFP fingerprint cache	Default-parameter fingerprints	337x speedup on repeated fingerprint calls
Pharmacophore features cache	FCFP atom invariants	Eliminates O(n*degree^2) per FCFP call
C++ GraphBuilder compat matrix	All MCS strategies	Pre-computed once, shared across algorithms

Call SearchEngine.clearMolGraphCache() (Java) or reuse MolGraph instances (C++/Python) between batches.

Additional Tools

Tool	Description
CIP R/S/E/Z assignment	Full digraph-based stereo descriptors (IUPAC 2013 Rules 1-5) including Rule 3 (Z > E), like/unlike pairing, and pseudoasymmetric r/s
Circular fingerprint (ECFP/FCFP)	Tautomer-aware Morgan/ECFP with configurable radius (-1 = whole molecule)
Count-based ECFP/FCFP	`ecfpCounts()` / `fcfpCounts()` — superior to binary for ML
Topological Torsion fingerprint	4-atom path with atom typing (SOTA on peptide benchmarks)
Path fingerprint	Graph-aware, tautomer-invariant path enumeration
MCS fingerprint	MCS-aware, auto-sized
Similarity metrics	Tanimoto, Dice, Cosine, Soergel (binary + count-vector)
Fingerprint formats	`toBitSet()`, `toHex()`, `toBinaryString()`, `fromBitSet()`, `fromHex()`
MCS SMILES extraction	`findMCSSMILES()` — extract MCS as canonical SMILES
findAllMCS	Top-N MCS enumeration with canonical SMILES dedup
SMARTS-based MCS	`findMCSSMARTS()` — largest substructure matching a SMARTS pattern
R-group decomposition	`decomposeRGroups()`
MatchResult	Structured result: size, mapping, overlap coefficient, query/target atom counts
RASCAL screening	O(V+E) similarity upper bound
Canonical SMILES / SMARTS	deterministic, toolkit-independent (including `X` total connectivity)
Publication-quality SVG depiction	ACS 1996 standard renderer: skeletal formulas, Jmol colors, stereo wedges, MCS highlighting, side-by-side pair rendering
Lenient SMILES parser	Best-effort recovery from malformed SMILES
N-MCS	Multi-molecule MCS with provenance tracking
Tautomer validation	`validateTautomerConsistency()` — proton conservation check
30 tautomer transforms	pKa-informed weights, 6 solvents, pH-sensitive, ring-chain tautomerism
8-phase 2D layout pipeline	Template match, ring-first, chain zig-zag, force-directed, overlap resolution, crossing reduction, canonical orientation, bond normalisation
Distance geometry 3D	Bounds matrix, double-centering, power iteration, force-field refinement
40+ scaffold templates	Pharmaceutical scaffolds, PAH, spiro, bridged (norbornane, adamantane)
Coordinate transforms	translate, rotate, scale, mirror, center, align, bounding box, RMSD
Force-directed layout	`forceDirectedLayout()` for bond-crossing minimisation
SMACOF stress majorisation	`stressMajorisation()` for optimal 2D embedding
Batch constrained MCS	`batchMCSConstrained()` multi-pair MCS with non-overlap atom exclusion
Two-phase crossing reduction	`reduceCrossings()` Phase 1: system-level flipping, Phase 2: individual ring flipping with fusion-atom pivots
computeSSSR / layoutSSSR	Clean SSSR APIs: minimum cycle basis and layout-ordered ring perception

File Formats

Format	Read	Write
SMILES	Java, C++	Java, C++
SMARTS	Java, C++	C++
MOL V2000	Java, C++	C++
SDF	Java, C++	—
Mol2, PDB, CML	Java	—

Release Downloads

Every release includes all platforms:

Download	Description
`SMSD.Pro-7.1.1.dmg`	macOS installer (Apple Silicon) — drag to Applications
`SMSD.Pro-7.1.1.msi`	Windows installer — next, next, finish
`smsd-pro_7.1.1_amd64.deb`	Linux installer — `sudo dpkg -i`
`smsd-7.1.1.jar`	Pure library JAR (Maven/Gradle dependency)
`smsd-7.1.1-jar-with-dependencies.jar`	Standalone CLI (just `java -jar`)
`smsd-cpp-7.1.1-headers.tar.gz`	C++ header-only library (unpack, `#include "smsd/smsd.hpp"`)
`pip install smsd`	Python package (PyPI — Linux, macOS, Windows wheels)

# Native installer — download .dmg / .msi / .deb, double-click, done

# CLI
java -jar smsd-7.1.1-jar-with-dependencies.jar --Q SMI --q "c1ccccc1" --T SMI --t "c1ccc(O)cc1" --json -

# Docker CLI
docker build -t smsd .
docker run --rm smsd --Q SMI --q "c1ccccc1" --T SMI --t "c1ccc(O)cc1" --json -

# Python
pip install smsd

Tests

1,518 tests passed across all platforms:

Suite	Tests	Coverage
Java	581 (+ 25 parity)	MCS, substructure, reactions, tautomers, stereochemistry, ring perception, hydrogen handling, cross-language ECFP/FCFP/canonical-SMILES parity
C++ core	114	MCS, substructure, precision chemistry, kekulisation, implicit H
C++ parser	542	SMILES, SMARTS, 1,003 diverse molecules, edge cases
C++ layout	42	2D/3D generation, transforms, overlap resolution, templates
C++ CIP	42	R/S, E/Z, pseudoasymmetric, sequence rules
Python	172 (+ 25 parity)	Full API coverage, hydrogen handling, charged species, golden-vector parity

AddressSanitizer: zero memory errors.

Documentation

Document	Description
Examples, How-To, and Cautions	Worked examples for every feature with cautions and performance tips
Python API Guide	Full Python API reference
Java Guide	Java API and CLI usage
C++ Guide	Header-only C++ integration
Release Notes	Current release
Changelog	Full versioned change history
How to Install	Build from source on all platforms
NOTICE	Attribution, trademark, and novel algorithm terms

License and Commercial Use

SMSD Pro is released under the Apache License 2.0 — free for any use, including commercial, with no fee, registration, or approval required.

Use Case	Permitted
Commercial products and services	Yes
Proprietary / closed-source software	Yes
SaaS platforms and cloud services	Yes
Pharmaceutical, biotech, agrochemical pipelines	Yes
Academic research and teaching	Yes
Internal corporate tools	Yes
Modify and redistribute	Yes

What you must do (Apache 2.0 Section 4): include the LICENSE and NOTICE files in your distribution, retain copyright notices, and state any changes you made to source files.

What you must not do: use "SMSD", "SMSD Pro", or BioInception trademarks to endorse your product without permission (see NOTICE for trademark terms).

Full details: LICENSE | NOTICE

Citation

If you use SMSD Pro in your research, please cite the following paper describing the tautomer-aware MCS engine:

Rahman SA. SMSD Pro: Tautomer-Aware Maximum Common Substructure Search. ChemRxiv, 2025. DOI: 10.26434/chemrxiv.15001534

For the original SMSD toolkit, please also cite:

Rahman SA, Bashton M, Holliday GL, Schrader R, Thornton JM. Small Molecule Subgraph Detector (SMSD) toolkit. Journal of Cheminformatics, 1:12, 2009. DOI: 10.1186/1758-2946-1-12

GitHub renders a "Cite this repository" button from CITATION.cff.

Author

Syed Asad Rahman — BioInception PVT LTD

License

Apache License 2.0 — see LICENSE and NOTICE

SMSD Pro is developed at BioInception and distributed under Apache License 2.0. Commercial use and redistribution are allowed, subject to the license and notice requirements.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
.github/workflows		.github/workflows
benchmarks		benchmarks
cpp		cpp
docs		docs
icons		icons
python		python
src		src
.dockerignore		.dockerignore
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CITATION.cff		CITATION.cff
Dockerfile		Dockerfile
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
pom.xml		pom.xml
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

SMSD Pro

Dalke Nearest-Neighbor MCS Benchmark (1,000 pairs)

Guides and References

Molfile Support

Install

Java (Maven)

Java (Download JAR)

Python (PyPI)

Java API

Python — Advanced Features

Python — MCS Variants & Batch Operations

Publication-Quality Depiction (ACS 1996 Standard)

C++ (Header-Only)

Build from Source

Docker

Benchmarks

MCS Performance (Python)

Substructure Performance (Java)

External Benchmark Datasets

Algorithms

MCS Engine

MCS Variants

Substructure Search (VF2++)

Ring Perception

Chemistry Options

Platform & GPU Support

Performance Caching

Additional Tools

File Formats

Release Downloads

Tests

Documentation

License and Commercial Use

Citation

Author

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 11

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages