Substructure & MCS Search for Chemical Graphs
SMSD Pro provides exact substructure search and maximum common substructure (MCS) search for chemical graphs. It is available for Java, C++ (header-only), and Python. Optional GPU paths are available for CUDA and Apple Metal builds.
Version 7.1.1 is a bug-fix patch on top of 7.1.0. It delivers
cross-language ECFP/FCFP fingerprint parity between Java, C++, and
Python, fixes a canonical SMILES writer corner case, restores the
correct semantics of MatchResult.overlapCoefficient, and fixes a
Kekulization edge case so that canonical SMILES output for pyrrole-type
aromatic nitrogen round-trips cleanly in downstream readers.
It adds no new features and does not break the public API for existing callers.
It builds on the 7.1.0 unified Python API (find_mcs, find_substructure).
We benchmark on the community-standard Dalke NN dataset (1,000 high-similarity ChEMBL pairs) — the same dataset widely used by RDKit, CDK, and the academic MCS literature. Identical SMILES input, same 10 s timeout, same process, same machine. We gratefully acknowledge Andrew Dalke's foundational work on MCS benchmarking.
| Metric | SMSD Pro 7.1.1 | RDKit FindMCS 2026.03 |
|---|---|---|
| Total time | 40 s | 213 s |
| Median time | 0.6 ms | 0.4 ms |
| Mean MCS size | 25.8 atoms | 25.0 atoms |
| Timeouts | 0 | 8 |
| Larger-MCS wins | 211 (21 %) | 29 (3 %) |
SMSD Pro's adaptive multi-strategy native engine complements RDKit's well-proven VF2-based approach. Both engines are excellent; SMSD tends to find slightly larger common substructures on hard pairs while RDKit offers superb median-case latency. We recommend choosing based on your workload: SMSD for coverage-critical applications (reaction mapping, SAR), RDKit for high-throughput screening where median speed dominates.
Full benchmark suite and reproduction scripts in benchmarks/.
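The summary rows in the table above can be derived from per-pair records. The following is a minimal sketch, assuming a hypothetical record layout (`PairResult` is an illustrative name, not the format used in benchmarks/); the three demo pairs are made-up values, not benchmark results.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class PairResult:
    """Hypothetical per-pair record; not the layout used in benchmarks/."""
    seconds_a: float          # wall time of tool A on this pair
    seconds_b: float          # wall time of tool B on this pair
    size_a: int               # MCS size (atoms) reported by tool A
    size_b: int               # MCS size (atoms) reported by tool B
    timeout_a: bool = False   # tool A hit the time limit
    timeout_b: bool = False   # tool B hit the time limit


def summarise(results):
    """Return each table metric as an (A, B) tuple."""
    wins_a = sum(r.size_a > r.size_b for r in results)
    wins_b = sum(r.size_b > r.size_a for r in results)
    return {
        "total_s": (sum(r.seconds_a for r in results),
                    sum(r.seconds_b for r in results)),
        "median_s": (median(r.seconds_a for r in results),
                     median(r.seconds_b for r in results)),
        "mean_size": (mean(r.size_a for r in results),
                      mean(r.size_b for r in results)),
        "timeouts": (sum(r.timeout_a for r in results),
                     sum(r.timeout_b for r in results)),
        "larger_mcs_wins": (wins_a, wins_b),
    }


# Made-up illustration data (three pairs), not benchmark results.
demo = [
    PairResult(0.001, 0.002, 20, 20),
    PairResult(0.004, 0.001, 25, 24),
    PairResult(0.500, 10.0, 31, 28, timeout_b=True),
]
summary = summarise(demo)
print(summary["larger_mcs_wins"], summary["timeouts"])
```

Ties count for neither tool, which is why the two "larger-MCS wins" figures in a table like the one above need not sum to the number of pairs.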
| Document | Description |
|---|---|
| Examples, How-To, and Cautions | Worked examples for every feature with cautions and performance tips |
| Python API Guide | Full Python API reference with code examples |
| Java Guide | Java API and CLI usage |
| C++ Guide | Header-only C++ integration |
| Release Notes | What's new in this release |
| How to Install | Build from source on all platforms |
| Changelog | Full versioned change history |
MOL/SDF support covers V2000 and V3000 core graph round-trip, names/comments, SDF properties, charges,
isotopes, atom classes/maps, R# plus M RGP, and basic stereo flags.
Copyright (c) 2018-2026 Syed Asad Rahman — BioInception PVT LTD
<dependency>
<groupId>com.bioinceptionlabs</groupId>
<artifactId>smsd</artifactId>
<version>7.1.1</version>
</dependency>
curl -LO https://github.com/asad/SMSD/releases/download/v7.1.1/smsd-7.1.1-jar-with-dependencies.jar
java -jar smsd-7.1.1-jar-with-dependencies.jar \
--Q SMI --q "c1ccccc1" --T SMI --t "c1ccc(O)cc1" --json -
Python (PyPI)
pip install smsd
Supported CPython versions: 3.10 through 3.13.
Wheels available for Linux (x86_64, aarch64), macOS (arm64, x86_64), and Windows (x64).
CPU execution is the default path. CUDA and Metal acceleration are optional.
RDKit and Open Babel are optional interop layers.
import smsd
result = smsd.find_substructure("c1ccccc1", "c1ccc(O)cc1")
mcs = smsd.find_mcs("c1ccccc1", "c1ccc2ccccc2c1")
# Tautomer-aware MCS
mcs = smsd.find_mcs("CC(=O)C", "CC(O)=C", tautomer_aware=True)
# Prefer rare heteroatoms (S, P, Se) in MCS scoring
mcs = smsd.find_mcs("C[S+](C)CCC(N)C(=O)O", "SCCC(N)C(=O)O",
prefer_rare_heteroatoms=True)
# Similarity upper bound (fast pre-filter)
sim = smsd.similarity("c1ccccc1", "c1ccc(O)cc1")
fp = smsd.fingerprint("c1ccccc1", kind="mcs")
# Circular fingerprint (ECFP4 equivalent)
ecfp4 = smsd.fingerprint_from_smiles("c1ccccc1", radius=2, fp_size=2048)
import com.bioinception.smsd.core.*;
SMSD smsd = new SMSD(mol1, mol2, new ChemOptions());
boolean isSub = smsd.isSubstructure();
var mcs = smsd.findMCS();
// CIP stereo assignment (Rules 1-5, including pseudoasymmetric r/s)
Map<Integer, Character> stereo = CIPAssigner.assignRS(g);
Map<Long, Character> ez = CIPAssigner.assignEZ(g);
// Batch MCS with non-overlap constraints
var mappings = SearchEngine.batchMCSConstrained(queries, targets, new ChemOptions(), 10_000);
import smsd
# --- Structured MCS Result ---
result = smsd.mcs_result("c1ccccc1", "c1ccc(O)cc1")
print(result.size) # 6
print(result.overlapCoefficient) # 0.857 = 6 / (6 + 7 - 6): mapped atoms over atoms in the union (Tanimoto form)
print(result.mcs_smiles) # "c1ccccc1"
print(result.mapping) # {0: 0, 1: 1, ...}
# --- Works with any input type ---
# SMILES strings
mcs = smsd.find_mcs("c1ccccc1", "c1ccc(O)cc1")
# MolGraph objects (pre-parsed, fastest for batch)
g1 = smsd.parse_smiles("c1ccccc1")
g2 = smsd.parse_smiles("c1ccc(O)cc1")
mcs = smsd.find_mcs(g1, g2)
# Native Mol objects (auto-detected, indices returned in native ordering)
# from rdkit import Chem
# mcs = smsd.find_mcs(Chem.MolFromSmiles("c1ccccc1"), Chem.MolFromSmiles("c1ccc(O)cc1"))
# --- Fingerprints ---
g = smsd.parse_smiles("c1ccccc1")
ecfp4 = smsd.circular_fingerprint(g, radius=2, fp_size=2048)
fcfp4 = smsd.circular_fingerprint(g, radius=2, fp_size=2048, mode="fcfp")
counts = smsd.circular_fingerprint_counts(g, radius=2, fp_size=2048)
torsion = smsd.topological_torsion("c1ccccc1", fp_size=2048)
tan = smsd.overlap_coefficient(ecfp4, ecfp4)
# --- 2D Layout ---
g = smsd.parse_smiles("c1ccc2c(c1)cc1ccccc1c2") # phenanthrene
coords = smsd.generate_coords_2d(g)
_, coords = smsd.force_directed_layout(g, coords, max_iter=500, target_bond_length=1.5)
_, coords = smsd.stress_majorisation(g, coords, max_iter=300)
crossings = smsd.reduce_crossings(g, coords, max_iter=2000)
import smsd
# --- All MCS variants ---
mcs = smsd.find_mcs("c1ccccc1", "c1ccc(O)cc1") # Connected MCS (default)
mcs = smsd.find_mcs("c1ccccc1", "c1ccc(O)cc1", connected_only=False) # Disconnected MCS
mcs = smsd.find_mcs("c1ccccc1", "c1ccc(O)cc1", induced=True) # Induced MCS
mcs = smsd.find_mcs("c1ccccc1", "c1ccc(O)cc1", maximize_bonds=True) # Edge MCS (MCES)
# Find top-N distinct MCS solutions
all_mcs = smsd.find_mcs("c1ccccc1", "c1ccc(O)cc1", max_results=5)
# SMARTS-based MCS
mcs = smsd.find_mcs_smarts("[#6]~[#7]", "c1ccc(N)cc1")
# Scaffold MCS (Murcko framework)
scaffold = smsd.find_scaffold_mcs(
smsd.parse_smiles("CC(=O)Oc1ccccc1C(=O)O"),
smsd.parse_smiles("Oc1ccccc1C(=O)O")
)
# R-group decomposition
rgroups = smsd.decompose_r_groups("c1ccccc1", ["c1ccc(O)cc1", "c1ccc(N)cc1"])
# --- Substructure Search ---
hit = smsd.find_substructure("c1ccccc1", "c1ccc(O)cc1")
all_matches = smsd.find_substructure("c1ccccc1", "c1ccc(O)cc1", max_results=10)
# SMARTS pattern matching
matches = smsd.smarts_match("[OH]", smsd.parse_smiles("c1ccc(O)cc1"))
# --- Similarity & Screening ---
sim = smsd.overlap_coefficient(
smsd.circular_fingerprint(smsd.parse_smiles("CCO"), radius=2),
smsd.circular_fingerprint(smsd.parse_smiles("CCCO"), radius=2)
)
dice = smsd.dice(
smsd.circular_fingerprint_counts(smsd.parse_smiles("CCO"), radius=2),
smsd.circular_fingerprint_counts(smsd.parse_smiles("CCCO"), radius=2)
)
# --- Chemistry Options ---
# Tautomer-aware matching
mcs = smsd.find_mcs("CC(=O)C", "CC(O)=C", tautomer_aware=True)
# Loose bond matching (FMCS-style)
mcs = smsd.find_mcs("c1ccccc1", "C1CCCCC1", match_bond_order="loose")
# --- Canonical SMILES ---
# v7.1.1: canonical_smiles() and to_smiles() accept a SMILES string OR a MolGraph.
# Output is byte-identical across the Java, C++, and Python engines.
smi = smsd.canonical_smiles("OC(=O)c1ccccc1") # from SMILES string
smi = smsd.to_smiles(smsd.parse_smiles("OC(=O)c1ccccc1")) # from MolGraph
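g1 = smsd.parse_smiles("c1ccccc1")
g2 = smsd.parse_smiles("c1ccc(O)cc1")
mapping = smsd.find_mcs(g1, g2) # atom mapping between g1 and g2
mcs_smi = smsd.mcs_to_smiles(g1, mapping) # extract MCS as SMILES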
# --- CIP Stereo Assignment ---
g = smsd.parse_smiles("N[C@@H](C)C(=O)O") # L-alanine
stereo = smsd.assign_rs(g) # {1: 'S'}
ez = smsd.assign_ez(smsd.parse_smiles("C/C=C/C")) # E-2-butene
# --- Native MolGraph I/O ---
g = smsd.parse_smiles("c1ccccc1")
g = smsd.read_molfile("molecule.mol")
mol_block = smsd.write_mol_block(g)
v3000 = smsd.write_mol_block_v3000(g)
smsd.write_molfile(g, "molecule_out.mol", v3000=True)
smsd.export_sdf([g1, g2], "output.sdf")
Zero-dependency SVG renderer that follows the ACS 1996 drawing standard, the same specification used by Nature, Science, JACS, and Springer journals. See Examples for the full usage guide.
import smsd
# Render any molecule as publication-quality SVG
svg = smsd.depict_svg("CC(=O)Oc1ccccc1C(=O)O") # aspirin
smsd.save_svg(svg, "aspirin.svg")
# MCS comparison — side-by-side with highlighted matching atoms
mol1 = smsd.parse_smiles("c1ccccc1")
mol2 = smsd.parse_smiles("c1ccc(O)cc1")
mapping = smsd.find_mcs(mol1, mol2)
svg = smsd.depict_pair(mol1, mol2, mapping)
smsd.save_svg(svg, "mcs_comparison.svg")
# Substructure highlighting
svg = smsd.depict_mapping(mol2, mapping)
# Custom styling (all ACS proportions auto-scale from bond_length)
svg = smsd.depict_svg("Cn1cnc2c1c(=O)n(c(=O)n2C)C", # caffeine
bond_length=50, width=600, height=400)
# Export to SDF file
mols = [smsd.parse_smiles(s) for s in ["CCO", "c1ccccc1", "CC(=O)O"]]
smsd.export_sdf(mols, "output.sdf")
Features: skeletal formula, Jmol/CPK element colors, asymmetric double bonds, wedge/dash stereo, H-count subscripts, charge superscripts, bond-to-label clipping, aromatic inner circles, atom map numbers.
git clone https://github.com/asad/SMSD.git
# Add SMSD/cpp/include to your include path; no other dependencies needed
#include "smsd/smsd.hpp"
auto mol1 = smsd::parseSMILES("c1ccccc1");
auto mol2 = smsd::parseSMILES("c1ccc(O)cc1");
bool isSub = smsd::isSubstructure(mol1, mol2, smsd::ChemOptions{});
auto mcs = smsd::findMCS(mol1, mol2, smsd::ChemOptions{}, smsd::MCSOptions{});
// Batch MCS with non-overlap constraints
auto mappings = smsd::batchMCSConstrained(queries, targets, smsd::ChemOptions{});
git clone https://github.com/asad/SMSD.git
cd SMSD
# Java
mvn -U clean package
# C++
mkdir cpp/build && cd cpp/build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
cd ../..
# Python
(cd python && pip install -e .)
# Docker
docker build -t smsd .
docker run --rm smsd --Q SMI --q "c1ccccc1" --T SMI --t "c1ccc(O)cc1" --json -
Representative pairs from the checked-in Python benchmark results on the same
machine and in the same Python process.
Full data: benchmarks/results_python.tsv
For the maintained local core leaderboard, run python3 benchmarks/benchmark_leaderboard.py --mode core --compare-mode strict.
Use the mode-matched core leaderboard for current cross-tool comparisons.
| Pair | Category | SMSD (ms) | MCS Size |
|---|---|---|---|
| Cubane (self) | Cage | 0.003 | 8 |
| Coronene (self) | PAH | 0.006 | 24 |
| NAD / NADH | Cofactor | 0.012 | 44 |
| Caffeine / Theophylline | N-methyl diff | 0.017 | 13 |
| Morphine / Codeine | Alkaloid | 0.079 | 20 |
| Ibuprofen / Naproxen | NSAID | 0.070 | 15 |
| ATP / ADP | Nucleotide | 0.148 | 27 |
| PEG-12 / PEG-16 | Polymer | 0.039 | 40 |
| Paclitaxel / Docetaxel | Taxane | 1,691 | 56 |
Current maintained cached Java core summary: 28/28 hit agreement and 28/28 favourable timings on the local curated corpus.
Run python3 benchmarks/benchmark_leaderboard.py --mode core --compare-mode strict
to refresh the maintained local summary.
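Per-pair timings of the kind listed in the table above can be collected with a small harness. This is a minimal sketch, not the script in benchmarks/: `time_call_ms` is a hypothetical helper, and the workload is a stand-in so the example runs without SMSD installed.

```python
import time
from statistics import median


def time_call_ms(func, *args, repeats=5, **kwargs):
    """Return (median wall time in ms, last result) over `repeats` calls."""
    timings = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        timings.append((time.perf_counter() - start) * 1000.0)
    return median(timings), result


def shared_characters(a, b):
    """Stand-in workload: number of distinct characters common to both strings."""
    return len(set(a) & set(b))


elapsed_ms, value = time_call_ms(shared_characters, "c1ccccc1", "c1ccc(O)cc1")
print(f"{elapsed_ms:.4f} ms, result={value}")
```

With the package installed, the same helper could wrap a real call, for example `time_call_ms(smsd.find_mcs, "c1ccccc1", "c1ccc(O)cc1")`. Reporting the median of several repeats reduces the influence of one-off warm-up costs.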
Community-standard datasets for reproducible evaluation, stored in benchmarks/data/:
| Dataset | Pairs/Patterns | Source | Purpose |
|---|---|---|---|
| Tautobase (Chodera subset) | 468 tautomer pairs | Wahl & Sander 2020 | Tautomer-aware MCS validation |
| Tautobase (full SMIRKS) | 1,680 pairs | Wahl & Sander 2020 | Tautomer transform coverage |
| Ehrlich-Rarey SMARTS v2.0 | 1,400 patterns | Ehrlich & Rarey 2012 | Substructure search validation |
| Dalke-style random pairs | 1,000 pairs | MoleculeNet drug collections | Low-similarity MCS scaling |
| Dalke-style NN pairs | 1,000 pairs | MoleculeNet drug collections | High-similarity MCS quality |
| Stress pairs | 12 pairs | Duesbury et al. 2017 | Timeout/robustness |
| Molecule pool | 5,590 SMILES | MoleculeNet (BBBP, SIDER, ClinTox, BACE) | Pair generation source |
# Run external benchmarks (Java)
mvn test -Dtest=ExternalBenchmarkTest -Dbenchmark=true
# Run external benchmarks (Python)
SMSD_BENCHMARK=1 pytest python/tests/test_external_benchmarks.py -v -s
# Regenerate Dalke-style pairs (requires RDKit)
python benchmarks/generate_dalke_pairs.py
SMSD Pro ships an adaptive multi-strategy MCS engine that selects the best technique for each input pair. Implementation details are subject to change between minor releases.
Public algorithmic foundations the engine builds on (citations only, not the SMSD pipeline itself):
| Foundation | Reference |
|---|---|
| Partition-refinement clique search | McCreesh, Prosser & Trimble, J. Artif. Intell. Res. 2017 |
| Edge-growth backtracking | McGregor, Software: Practice & Experience 1982 |
| Maximum-clique enumeration | Bron & Kerbosch, Comm. ACM 1973; Tomita et al. 2006 |
| Subgraph isomorphism (VF2++) | Juttner & Madarasi, Discrete Appl. Math. 2018 |
| Ring perception | Vismara, J. Chem. Inf. Comput. Sci. 1997 |
| Variant | Flag |
|---|---|
| MCIS (induced) | induced=true |
| MCCS (connected) | default |
| MCES (edge subgraph) | maximizeBonds=true |
| dMCS (disconnected) | disconnectedMCS=true |
| N-MCS (multi-molecule) | findNMCS() |
| Weighted MCS | atomWeights |
| Scaffold MCS | findScaffoldMCS() |
| Tautomer-aware MCS | ChemOptions.tautomerProfile() |
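The difference between the induced (MCIS) and edge (MCES) variants in the table above comes down to how bonds between mapped atoms are treated. The sketch below illustrates the textbook definitions on plain bond lists; `matched_bonds` and `is_induced` are hypothetical helpers for illustration, not SMSD functions.

```python
from itertools import combinations


def bond_set(bonds):
    """Normalise (a, b) pairs into an order-independent set of bonds."""
    return {frozenset(b) for b in bonds}


def matched_bonds(query_bonds, target_bonds, mapping):
    """Count bonds present in both graphs between mapped atoms (the MCES objective)."""
    q, t = bond_set(query_bonds), bond_set(target_bonds)
    return sum(
        1
        for a, b in combinations(mapping, 2)
        if frozenset((a, b)) in q and frozenset((mapping[a], mapping[b])) in t
    )


def is_induced(query_bonds, target_bonds, mapping):
    """True if every mapped atom pair is bonded in both graphs or in neither (MCIS)."""
    q, t = bond_set(query_bonds), bond_set(target_bonds)
    return all(
        (frozenset((a, b)) in q) == (frozenset((mapping[a], mapping[b])) in t)
        for a, b in combinations(mapping, 2)
    )


# Cyclopropane skeleton (query) against a propane chain (target).
ring = [(0, 1), (1, 2), (0, 2)]
chain = [(0, 1), (1, 2)]

full = {0: 0, 1: 1, 2: 2}   # maps all three atoms
partial = {0: 0, 1: 1}      # maps two atoms only

print(matched_bonds(ring, chain, full), is_induced(ring, chain, full))
print(matched_bonds(ring, chain, partial), is_induced(ring, chain, partial))
```

The three-atom mapping shares two bonds but is not induced, because the ring-closing bond 0-2 has no counterpart in the chain. Ring and bond-order constraints, which a chemistry-aware engine would also apply, are deliberately left out of this sketch.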
VF2++ (Juttner & Madarasi 2018) matcher with optional GPU-accelerated domain initialization (CUDA + Metal). Implementation details are subject to change between minor releases.
Horton's candidate generation + 2-phase GF(2) elimination (Vismara 1997) for relevant cycles, orbit-based grouping for Unique Ring Families (URFs).
| Output | Description |
|---|---|
| SSSR / MCB | Smallest Set of Smallest Rings |
| RCB | Relevant Cycle Basis |
| URF | Unique Ring Families (automorphism orbit grouping) |
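The GF(2) elimination step mentioned above can be illustrated with a greedy cycle-basis selection: each candidate cycle becomes a bit vector over the bonds, and a cycle is kept only if it is not the XOR of shorter cycles already kept. This is a minimal sketch of that linear-independence test only. It assumes the candidate cycles are already given, and it does not reproduce Horton candidate generation, the relevant-cycle phase, or URF grouping; the function names are illustrative.

```python
def edge_mask(cycle, edge_index):
    """Encode a cycle (atom indices, closed implicitly) as a bitmask over bonds."""
    mask = 0
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        mask |= 1 << edge_index[frozenset((a, b))]
    return mask


def independent_cycles(cycles, edges):
    """Greedily keep the shortest cycles that are linearly independent over GF(2)."""
    edge_index = {frozenset(e): i for i, e in enumerate(edges)}
    pivots = {}   # highest set bit -> reduced bit vector with that leading bit
    chosen = []
    for cycle in sorted(cycles, key=len):
        mask = edge_mask(cycle, edge_index)
        while mask:
            top = mask.bit_length() - 1
            if top not in pivots:
                pivots[top] = mask      # new pivot: the cycle is independent
                chosen.append(cycle)
                break
            mask ^= pivots[top]         # eliminate the leading bit
        # If mask reached 0, the cycle is a XOR of shorter cycles and is dropped.
    return chosen


# Two fused four-membered rings sharing bond 2-3 (bicyclo[2.2.0]hexane skeleton).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4), (4, 5), (5, 2)]
candidates = [
    [0, 1, 2, 5, 4, 3],   # six-membered envelope
    [0, 1, 2, 3],
    [2, 3, 4, 5],
]
basis = independent_cycles(candidates, edges)
print(basis)
```

The graph has 7 bonds and 6 atoms in one component, so its cycle space has dimension 7 - 6 + 1 = 2. The envelope is the XOR of the two four-membered rings and is therefore rejected.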
| Option | Values |
|---|---|
| Chirality | R/S tetrahedral, E/Z double bond |
| Isotope | matchIsotope=true |
| Tautomers | 30 transforms with pKa-informed weights (Sitzmann 2010, Dhaked & Nicklaus 2024) |
| Solvent | AQUEOUS, DMSO, METHANOL, CHLOROFORM, ACETONITRILE, DIETHYL_ETHER |
| Ring fusion | IGNORE / PERMISSIVE / STRICT |
| Bond order | STRICT / LOOSE / ANY |
| Aromaticity | STRICT / FLEXIBLE |
| Lenient SMILES | ParseOptions{.lenient=true} (C++) / ChemOptions.lenientSmiles (Java) |
Preset profiles: ChemOptions() (default), .tautomerProfile(), .fmcsProfile()
With the default chemistry profile, ringMatchesRingOnly=true enforces ring/non-ring
parity for matched atoms and bonds in both directions. Use .fmcsProfile() when you
explicitly want loose FMCS-style topology where ring atoms may map to chain atoms and
partial ring fragments are accepted.
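The ring/non-ring parity rule described above amounts to a symmetric compatibility test on each candidate atom or bond pair. A minimal sketch, assuming a hypothetical helper `ring_parity_ok` (not part of the SMSD API) that receives precomputed ring-membership flags:

```python
def ring_parity_ok(query_in_ring, target_in_ring, ring_matches_ring_only=True):
    """Return True if a query/target atom (or bond) pair passes the ring parity rule.

    With the rule enabled, ring members may only match ring members and
    chain members may only match chain members. With it disabled, ring
    membership is ignored (the loose, FMCS-style behaviour).
    """
    if not ring_matches_ring_only:
        return True
    return query_in_ring == target_in_ring


# Ring atom against ring atom (e.g. benzene carbon vs cyclohexane carbon).
print(ring_parity_ok(True, True))
# Ring atom against chain atom, and the reverse direction: both rejected.
print(ring_parity_ok(True, False), ring_parity_ok(False, True))
# Same pair with the rule switched off.
print(ring_parity_ok(True, False, ring_matches_ring_only=False))
```

Because the test is an equality check, it rejects chain-to-ring matches as well as ring-to-chain matches, which is what "in both directions" means in the paragraph above.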
Solvent-aware tautomers (Tier 2 pKa): opts.withSolvent(Solvent.DMSO) adjusts tautomer equilibrium weights for non-aqueous environments.
| Platform | CPU | GPU |
|---|---|---|
| macOS (Apple Silicon) | OpenMP | Metal (zero-copy unified memory) |
| Linux | OpenMP | CUDA |
| Windows | OpenMP | CUDA |
| Any (no GPU) | OpenMP | Automatic CPU fallback |
GPU acceleration covers RASCAL batch screening, Tanimoto clustering, and substructure domain initialization. Recursive matching runs on CPU. Dispatch: CUDA -> Metal -> OpenMP -> sequential.
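The dispatch order stated above is a first-available fallback chain. The sketch below shows that pattern in isolation; the backend names and the `available` dictionary are assumptions made for illustration, and SMSD's actual detection logic is internal to the library.

```python
DISPATCH_ORDER = ("cuda", "metal", "openmp", "sequential")


def pick_backend(available, order=DISPATCH_ORDER):
    """Return the first backend in `order` that is reported as available.

    `available` maps backend names to booleans. The sequential path is
    used as the final fallback even if it is not listed.
    """
    for name in order:
        if available.get(name, False):
            return name
    return "sequential"


# Apple Silicon machine without CUDA: Metal wins over OpenMP.
print(pick_backend({"cuda": False, "metal": True, "openmp": True}))
# CPU-only machine with OpenMP.
print(pick_backend({"openmp": True}))
# Nothing detected: fall back to the sequential path.
print(pick_backend({}))
```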
SMSD employs multi-level caching to eliminate redundant computation in batch and reaction workloads:
| Cache | Target | Benefit |
|---|---|---|
| MolGraph identity cache | Molecule object conversion | Same molecule reused across 6-18 calls per reaction pair |
| Domain space cache | VF2++ atom compatibility matrix | Avoids O(Nq*Nt) rebuild on repeated queries |
| ECFP/FCFP fingerprint cache | Default-parameter fingerprints | 337x speedup on repeated fingerprint calls |
| Pharmacophore features cache | FCFP atom invariants | Eliminates O(n*degree^2) per FCFP call |
| C++ GraphBuilder compat matrix | All MCS strategies | Pre-computed once, shared across algorithms |
Call SearchEngine.clearMolGraphCache() (Java) or reuse MolGraph instances (C++/Python) between batches.
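The benefit of a default-parameter fingerprint cache can be illustrated with ordinary memoisation. This sketch uses the standard library's `functools.lru_cache` and a toy stand-in fingerprint; it shows the general technique only and is not how SMSD implements its caches.

```python
import zlib
from functools import lru_cache

calls = {"count": 0}   # counts how often the expensive body actually runs


@lru_cache(maxsize=1024)
def cached_fingerprint(smiles, radius=2, fp_size=2048):
    """Toy stand-in for an expensive fingerprint, memoised on its arguments."""
    calls["count"] += 1
    bits = set()
    # Hash character n-grams of length 1..radius+1 and fold them into fp_size bits.
    for n in range(1, radius + 2):
        for i in range(len(smiles) - n + 1):
            bits.add(zlib.crc32(smiles[i:i + n].encode()) % fp_size)
    return frozenset(bits)


first = cached_fingerprint("c1ccccc1")
second = cached_fingerprint("c1ccccc1")   # served from the cache
print(calls["count"], cached_fingerprint.cache_info().hits)

# Comparable to clearing a cache between batches.
cached_fingerprint.cache_clear()
```

The cache key includes every argument, so calls with a different radius or size are computed separately. Returning an immutable `frozenset` keeps callers from modifying a shared cached value.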
| Tool | Description |
|---|---|
| CIP R/S/E/Z assignment | Full digraph-based stereo descriptors (IUPAC 2013 Rules 1-5) including Rule 3 (Z > E), like/unlike pairing, and pseudoasymmetric r/s |
| Circular fingerprint (ECFP/FCFP) | Tautomer-aware Morgan/ECFP with configurable radius (-1 = whole molecule) |
| Count-based ECFP/FCFP | ecfpCounts() / fcfpCounts() — superior to binary for ML |
| Topological Torsion fingerprint | 4-atom path with atom typing (SOTA on peptide benchmarks) |
| Path fingerprint | Graph-aware, tautomer-invariant path enumeration |
| MCS fingerprint | MCS-aware, auto-sized |
| Similarity metrics | Tanimoto, Dice, Cosine, Soergel (binary + count-vector) |
| Fingerprint formats | toBitSet(), toHex(), toBinaryString(), fromBitSet(), fromHex() |
| MCS SMILES extraction | findMCSSMILES() — extract MCS as canonical SMILES |
| findAllMCS | Top-N MCS enumeration with canonical SMILES dedup |
| SMARTS-based MCS | findMCSSMARTS() — largest substructure matching a SMARTS pattern |
| R-group decomposition | decomposeRGroups() |
| MatchResult | Structured result: size, mapping, overlap coefficient, query/target atom counts |
| RASCAL screening | O(V+E) similarity upper bound |
| Canonical SMILES / SMARTS | deterministic, toolkit-independent (including X total connectivity) |
| Publication-quality SVG depiction | ACS 1996 standard renderer: skeletal formulas, Jmol colors, stereo wedges, MCS highlighting, side-by-side pair rendering |
| Lenient SMILES parser | Best-effort recovery from malformed SMILES |
| N-MCS | Multi-molecule MCS with provenance tracking |
| Tautomer validation | validateTautomerConsistency() — proton conservation check |
| 30 tautomer transforms | pKa-informed weights, 6 solvents, pH-sensitive, ring-chain tautomerism |
| 8-phase 2D layout pipeline | Template match, ring-first, chain zig-zag, force-directed, overlap resolution, crossing reduction, canonical orientation, bond normalisation |
| Distance geometry 3D | Bounds matrix, double-centering, power iteration, force-field refinement |
| 40+ scaffold templates | Pharmaceutical scaffolds, PAH, spiro, bridged (norbornane, adamantane) |
| Coordinate transforms | translate, rotate, scale, mirror, center, align, bounding box, RMSD |
| Force-directed layout | forceDirectedLayout() for bond-crossing minimisation |
| SMACOF stress majorisation | stressMajorisation() for optimal 2D embedding |
| Batch constrained MCS | batchMCSConstrained() multi-pair MCS with non-overlap atom exclusion |
| Two-phase crossing reduction | reduceCrossings() Phase 1: system-level flipping, Phase 2: individual ring flipping with fusion-atom pivots |
| computeSSSR / layoutSSSR | Clean SSSR APIs: minimum cycle basis and layout-ordered ring perception |
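The similarity metrics named in the table above have standard definitions that are easy to state on plain Python sets (binary fingerprints) and dictionaries (count vectors). The sketch below follows those textbook formulas. The function names, the min/max form of the count-vector Tanimoto, and the convention of returning 0.0 for two empty inputs are assumptions of this example, not a description of SMSD's implementation.

```python
from math import sqrt


def tanimoto(a, b):
    """Binary Tanimoto (Jaccard): shared bits over bits set in either."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a, b):
    """Binary Dice: twice the shared bits over the sum of both sizes."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """Binary cosine: shared bits over the geometric mean of both sizes."""
    a, b = set(a), set(b)
    denom = sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def soergel(a, b):
    """Soergel distance; for binary fingerprints this equals 1 - Tanimoto."""
    return 1.0 - tanimoto(a, b)


def tanimoto_counts(x, y):
    """Count-vector Tanimoto: sum of minima over sum of maxima."""
    keys = set(x) | set(y)
    top = sum(min(x.get(k, 0), y.get(k, 0)) for k in keys)
    bottom = sum(max(x.get(k, 0), y.get(k, 0)) for k in keys)
    return top / bottom if bottom else 0.0


def dice_counts(x, y):
    """Count-vector Dice: twice the sum of minima over the total counts."""
    keys = set(x) | set(y)
    top = 2 * sum(min(x.get(k, 0), y.get(k, 0)) for k in keys)
    bottom = sum(x.values()) + sum(y.values())
    return top / bottom if bottom else 0.0


fp_a, fp_b = {1, 2, 3, 4}, {3, 4, 5, 6}     # set bit positions
cnt_a, cnt_b = {1: 2, 2: 1}, {1: 1, 3: 1}   # feature id -> count

print(tanimoto(fp_a, fp_b), dice(fp_a, fp_b), cosine(fp_a, fp_b), soergel(fp_a, fp_b))
print(tanimoto_counts(cnt_a, cnt_b), dice_counts(cnt_a, cnt_b))
```

Count vectors keep how often a feature occurs, so two molecules that share a feature but differ in its multiplicity score below 1.0, whereas the binary forms would treat that feature as identical.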
| Format | Read | Write |
|---|---|---|
| SMILES | Java, C++ | Java, C++ |
| SMARTS | Java, C++ | C++ |
| MOL V2000 | Java, C++ | C++ |
| SDF | Java, C++ | — |
| Mol2, PDB, CML | Java | — |
Every release includes all platforms:
| Download | Description |
|---|---|
| SMSD.Pro-7.1.1.dmg | macOS installer (Apple Silicon): drag to Applications |
| SMSD.Pro-7.1.1.msi | Windows installer: next, next, finish |
| smsd-pro_7.1.1_amd64.deb | Linux installer: sudo dpkg -i |
| smsd-7.1.1.jar | Pure library JAR (Maven/Gradle dependency) |
| smsd-7.1.1-jar-with-dependencies.jar | Standalone CLI (just java -jar) |
| smsd-cpp-7.1.1-headers.tar.gz | C++ header-only library (unpack, #include "smsd/smsd.hpp") |
| pip install smsd | Python package (PyPI: Linux, macOS, Windows wheels) |
# Native installer — download .dmg / .msi / .deb, double-click, done
# CLI
java -jar smsd-7.1.1-jar-with-dependencies.jar --Q SMI --q "c1ccccc1" --T SMI --t "c1ccc(O)cc1" --json -
# Docker CLI
docker build -t smsd .
docker run --rm smsd --Q SMI --q "c1ccccc1" --T SMI --t "c1ccc(O)cc1" --json -
# Python
pip install smsd
1,518 tests passed across all platforms:
| Suite | Tests | Coverage |
|---|---|---|
| Java | 581 (+ 25 parity) | MCS, substructure, reactions, tautomers, stereochemistry, ring perception, hydrogen handling, cross-language ECFP/FCFP/canonical-SMILES parity |
| C++ core | 114 | MCS, substructure, precision chemistry, kekulisation, implicit H |
| C++ parser | 542 | SMILES, SMARTS, 1,003 diverse molecules, edge cases |
| C++ layout | 42 | 2D/3D generation, transforms, overlap resolution, templates |
| C++ CIP | 42 | R/S, E/Z, pseudoasymmetric, sequence rules |
| Python | 172 (+ 25 parity) | Full API coverage, hydrogen handling, charged species, golden-vector parity |
AddressSanitizer: zero memory errors.
| Document | Description |
|---|---|
| Examples, How-To, and Cautions | Worked examples for every feature with cautions and performance tips |
| Python API Guide | Full Python API reference |
| Java Guide | Java API and CLI usage |
| C++ Guide | Header-only C++ integration |
| Release Notes | Current release |
| Changelog | Full versioned change history |
| How to Install | Build from source on all platforms |
| NOTICE | Attribution, trademark, and novel algorithm terms |
SMSD Pro is released under the Apache License 2.0 — free for any use, including commercial, with no fee, registration, or approval required.
| Use Case | Permitted |
|---|---|
| Commercial products and services | Yes |
| Proprietary / closed-source software | Yes |
| SaaS platforms and cloud services | Yes |
| Pharmaceutical, biotech, agrochemical pipelines | Yes |
| Academic research and teaching | Yes |
| Internal corporate tools | Yes |
| Modify and redistribute | Yes |
What you must do (Apache 2.0 Section 4): include the LICENSE and NOTICE files in your distribution, retain copyright notices, and state any changes you made to source files.
What you must not do: use "SMSD", "SMSD Pro", or BioInception trademarks to endorse your product without permission (see NOTICE for trademark terms).
Full details: LICENSE | NOTICE
If you use SMSD Pro in your research, please cite the following paper describing the tautomer-aware MCS engine:
Rahman SA. SMSD Pro: Tautomer-Aware Maximum Common Substructure Search. ChemRxiv, 2025. DOI: 10.26434/chemrxiv.15001534
For the original SMSD toolkit, please also cite:
Rahman SA, Bashton M, Holliday GL, Schrader R, Thornton JM. Small Molecule Subgraph Detector (SMSD) toolkit. Journal of Cheminformatics, 1:12, 2009. DOI: 10.1186/1758-2946-1-12
GitHub renders a "Cite this repository" button from CITATION.cff.
Syed Asad Rahman — BioInception PVT LTD
Copyright (c) 2018-2026 BioInception PVT LTD. Algorithm Copyright (c) 2009-2026 Syed Asad Rahman.
Apache License 2.0 — see LICENSE and NOTICE
SMSD Pro is developed at BioInception and distributed under Apache License 2.0. Commercial use and redistribution are allowed, subject to the license and notice requirements.