Skip to content

Conversation

@calvinp0
Copy link
Member

@calvinp0 calvinp0 commented Nov 27, 2025

Addition of CREST Adapter that complements the heuristic adapter.

This pull request adds support for the CREST conformer and transition state (TS) search method to the ARC project, along with several related improvements and code cleanups. The most important changes include integrating CREST as a TS search adapter, updating configuration and constants, and enhancing the heuristics TS search logic for better provenance tracking and code clarity.

CREST Integration:

  • Added CREST as a supported TS search method: updated JobEnum (arc/job/adapter.py), included CREST in the list of adapters and RMG family mapping, and registered it as a default incore adapter (arc/job/adapters/common.py, arc/job/adapters/ts/__init__.py). [1] [2] [3] [4]
  • Implemented a new test suite for CREST input generation (arc/job/adapters/ts/crest_test.py).
  • Added a Makefile target and installation script for CREST (Makefile). [1] [2]

Constants and Configuration:

  • Added the angstrom_to_bohr conversion constant to both Cython and Python constants modules (arc/constants.pxd, arc/constants.py). [1] [2] [3]

Heuristics TS Search Enhancements and Refactoring:

  • Refactored heuristics TS search logic to track and combine method provenance for TS guesses, allowing for more precise attribution when multiple methods contribute to a guess (arc/job/adapters/ts/heuristics.py). [1] [2] [3] [4]
  • Improved code readability and maintainability by reformatting imports and function calls, and clarifying data structures and comments in heuristics TS search (arc/job/adapters/ts/heuristics.py). [1] [2] [3] [4] [5] [6] [7]
    .

This comment was marked as resolved.

This comment was marked as resolved.

@calvinp0 calvinp0 force-pushed the crest_adapter branch 2 times, most recently from 3e88c36 to 7674f5f Compare February 2, 2026 09:49
@calvinp0 calvinp0 requested a review from Copilot February 2, 2026 12:10

This comment was marked as resolved.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

This comment was marked as resolved.

@calvinp0 calvinp0 force-pushed the crest_adapter branch 2 times, most recently from 9be935e to eab7647 Compare February 4, 2026 20:14
@calvinp0 calvinp0 requested a review from alongd February 4, 2026 21:30
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 26 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 434 to 441
def parse_version(folder_name):
"""
Parses the version from the folder name and returns a tuple for comparison.
Supports versions like: 3.0.2, v212, 2.1, 2
"""
version_regex = re.compile(r"(?:v?(\d+)(?:\.(\d+))?(?:\.(\d+))?)", re.IGNORECASE)
match = version_regex.search(folder_name)
if not match:
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_version() uses the re module, but arc/settings/settings.py doesn't import re. Since find_crest_executable() is executed at import time, this will raise a NameError during ARC startup. Add import re near the other imports (or avoid using re at import time).

Copilot uses AI. Check for mistakes.
Comment on lines 369 to 394
if cluster_soft in ["condor", "htcondor"]:
# HTCondor branch (kept for completeness – you can delete if you don't use it)
sub_job = submit_scripts["local"]["crest"]
format_params = {
"name": f"crest_{xyz_crest_int}",
"cpus": cpus,
"memory": int(SERVERS["local"].get("memory", 32.0) * 1024),
}
sub_job = sub_job.format(**format_params)

with open(
os.path.join(path, settings["submit_filenames"]["HTCondor"]), "w"
) as f:
f.write(sub_job)

crest_job = submit_scripts["local"]["crest_job"]
crest_job = crest_job.format(
path=path,
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crest_ts_conformer_search() expects submit_scripts["local"]["crest"] (and crest_job) templates to exist, but arc/settings/submit.py in this repo doesn't define any CREST entries. This will raise a KeyError the first time CREST is used unless every user provides a custom ~/.arc/submit.py. Add default CREST submit templates to arc/settings/submit.py (for PBS/HTCondor) or change the adapter to generate a minimal submit script internally without relying on missing submit_scripts keys.

Copilot uses AI. Check for mistakes.

if len(self.reactions) < 5:
successes = len([tsg for tsg in rxn.ts_species.ts_guesses if tsg.success and 'heuristics' in tsg.method])
successes = [tsg for tsg in rxn.ts_species.ts_guesses if tsg.success]
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The success summary counts all successful TS guesses on rxn.ts_species.ts_guesses, not just the ones produced/updated by the heuristics adapter. This can misreport results once other TS adapters also add guesses to the same ts_guesses list. Filter by heuristics provenance (e.g., "heuristics" in tsg.method_sources or tsg.method == "heuristics") when computing successes for the heuristics log message.

Suggested change
successes = [tsg for tsg in rxn.ts_species.ts_guesses if tsg.success]
successes = [
tsg
for tsg in rxn.ts_species.ts_guesses
if tsg.success
and (
(
getattr(tsg, "method_sources", None) is not None
and any(
isinstance(src, str) and src.lower() == "heuristics"
for src in tsg.method_sources
)
)
or (
isinstance(getattr(tsg, "method", None), str)
and tsg.method.lower() == "heuristics"
)
)
]

Copilot uses AI. Check for mistakes.
Comment on lines +58 to +110
"""
Reorder an XYZ string between ``ATOM X Y Z`` and ``X Y Z ATOM`` with optional unit conversion.

Args:
xyz_str (str): The string xyz format to be converted.
reverse_atoms (bool, optional): Whether to reverse the atoms and coordinates.
units (str, optional): Units of the input coordinates ('angstrom' or 'bohr').
convert_to (str, optional): The units to convert to (either 'angstrom' or 'bohr').
project_directory (str, optional): The path to the project directory.

Raises:
ConverterError: If xyz_str is not a string or does not have four space-separated entries per non-empty line.

Returns: str
The converted string xyz format.
"""
if isinstance(xyz_str, tuple):
xyz_str = '\n'.join(xyz_str)
if isinstance(xyz_str, list):
xyz_str = '\n'.join(xyz_str)
if not isinstance(xyz_str, str):
raise ConverterError(f'Expected a string input, got {type(xyz_str)}')
if project_directory is not None:
file_path = os.path.join(project_directory, xyz_str)
if os.path.isfile(file_path):
with open(file_path, 'r') as f:
xyz_str = f.read()


if units.lower() == 'angstrom' and convert_to.lower() == 'angstrom':
conversion_factor = 1
elif units.lower() == 'bohr' and convert_to.lower() == 'bohr':
conversion_factor = 1
elif units.lower() == 'angstrom' and convert_to.lower() == 'bohr':
conversion_factor = constants.angstrom_to_bohr
elif units.lower() == 'bohr' and convert_to.lower() == 'angstrom':
conversion_factor = constants.bohr_to_angstrom
else:
raise ConverterError("Invalid target unit. Choose 'angstrom' or 'bohr'.")

processed_lines = list()
# Split the string into lines
lxyz = xyz_str.strip().splitlines()
# Determine whether the atom label appears first or last in each line
first_line_tokens = lxyz[0].strip().split()
atom_first = not is_str_float(first_line_tokens[0])

for item in lxyz:
parts = item.strip().split()

if len(parts) != 4:
raise ConverterError(f'xyz_str has an incorrect format, expected 4 elements in each line, '
f'got "{item}" in:\n{xyz_str}')
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reorder_xyz_string() claims to validate “four space-separated entries per non-empty line”, but the implementation iterates over all splitlines() results and will raise on blank lines (since len(parts) == 0). Either filter out empty/whitespace-only lines before parsing, or update the docstring/error message to match the stricter behavior. Also, the raised message “Invalid target unit” is misleading when units is invalid (not just convert_to).

Copilot uses AI. Check for mistakes.
Copy link
Member

@alongd alongd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @calvinp0 for this awesome contribution! I think it strengthens our heuristics adapter significantly.
Please see my comments, and some comments from copilot (like unused imports)

Important: Please add a description of the H-Abstraction heuristics (briefly) and of the CREST adapter (briefly) to ARC's Docs in this file.


MAX_CHECK_INTERVAL_SECONDS = 100

try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you turn these into single lines such as: CREST_PATH = settings.get("CREST_PATH", None)?

continue
if not crest_available():
logger.warning('CREST is not available. Skipping CREST TS search.')
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace with break

from arc.plotter import save_geo
from arc.species.converter import reorder_xyz_string, str_to_xyz, xyz_to_dmat, xyz_to_str
from arc.species.species import ARCSpecies, TSGuess
from arc.job.adapters.ts.heuristics import h_abstraction
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you place this import a bot higher, next to the other job imports?

tsg.tic()

crest_job_dirs = []
xyz_guesses = h_abstraction(reaction=rxn,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the CREST adapter is a wrapper of h_abstraction only (as also defined by ts_adapters_by_rmg_family).
I wonder whether it should wrap other TS search methods, and whether users can elegantly ask ARC to apply CREST to their chosen method/adapter (or even a user guess for the TS?).

At least, if it "complements the heuristics adapter", then let's make it call a switch/hub function in heuristics.py that maps to the specific heuristic function by the reaction family (h_abstraction or hydrolysis, more to come in the future). Then CREST can wrap that hub function. Users will specify 'crest' as the TS search adapter, it it will automatically do 'appropriate-heuristics-method' + CREST.
I think we'll just need a function in heuristics that returns pairs of atoms for CREST to constrain.

If you think there's a more elegant way to request a CREST run for any TS search adapter in ARC, then that would be welcomed. Let's discuss this.

if reactions is None:
raise ValueError('Cannot execute TS CREST without ARCReaction object(s).')

dihedral_increment = dihedral_increment or 30
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make it a to level constant, like in heuristics: DIHEDRAL_INCREMENT = 30, or even better, just import DIHEDRAL_INCREMENT from heuristics for consistency

install-xtb:
bash $(DEVTOOLS_DIR)/install_xtb.sh

install-crest:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we add install-crest into install-all as well?

for tsg in self.ts_guesses:
for cluster_tsg in cluster_tsgs:
if cluster_tsg.almost_equal_tsgs(tsg):
logger.info(f"Similar TSGuesses found: {tsg.index} is similar to {cluster_tsg.index}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove this or make it a debug comment


# TS methods to try when appropriate for a reaction (other than user guesses which are always allowed):
ts_adapters = ['heuristics', 'AutoTST', 'GCN', 'xtb_gsm']
ts_adapters = ['heuristics', 'AutoTST', 'GCN', 'xtb_gsm', 'crest']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should wither use CREST or use H_Abstraction (this is an example-like settings, no?)




def parse_version(folder_name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should house these funcs somewhere else, not in settings.py
maybe try in imports.py?

Are these funcs testable?

arc/constants.py Outdated
epsilon_0 = 8.8541878128

bohr_to_angstrom = 0.529177
angstrom_to_bohr = 1.8897259886
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need it? maybe in the code we can just do 1 / bohr_to_angstrom instead of 1* angstrom_to_bohr
If you feel that you need it for semantics, than maybe consider defining angstrom_to_bohr = 1/ bohr_to_angstrom here, for consistency?

Adds the angstrom to bohr conversion factor to the constants module.

This facilitates easier conversions between these units within the codebase,
enhancing usability and reducing potential errors.
import os
import string
import sys
import shutil

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'shutil' is not used.

Copilot Autofix

AI about 19 hours ago

In general, an unused import should be removed from the file to avoid unnecessary dependencies and keep the code clean. Since we must not change existing functionality, we only remove the import when there is no evidence of it being used.

For arc/settings/settings.py, the best fix is to delete the import shutil line (line 12 in the snippet) while leaving all other imports intact. No additional methods, imports, or definitions are needed. The only change is to edit arc/settings/settings.py and remove the single unused import statement.

Suggested changeset 1
arc/settings/settings.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/arc/settings/settings.py b/arc/settings/settings.py
--- a/arc/settings/settings.py
+++ b/arc/settings/settings.py
@@ -9,7 +9,6 @@
 import os
 import string
 import sys
-import shutil
 from arc.settings.crest import (
     find_crest_executable,
     find_highest_version_in_directory,
EOF
@@ -9,7 +9,6 @@
import os
import string
import sys
import shutil
from arc.settings.crest import (
find_crest_executable,
find_highest_version_in_directory,
Copilot is powered by AI and may make mistakes. Always verify output.
Adds CREST settings and installation to ARC for transition state search.

This commit introduces necessary files and updates to enable CREST within the ARC framework, enhancing its capabilities for exploring reaction pathways.
It includes:
- A script to install CREST via conda.
- A module for locating the CREST executable and setting up the environment.
- Integration of CREST into the settings to allow its use as a TS adapter.
Adds a CREST adapter for transition state (TS) conformer searches, leveraging heuristics-generated guesses to find suitable TS structures. This facilitates more comprehensive TS exploration, particularly for reaction families supported by heuristics but potentially refined through CREST's conformer searching capabilities.

Also introduces a TS seed hub, which centralizes requests to base TS-search adapters, and provides wrapper adapters (e.g., CREST) family-specific constraints for a seed.
Adds documentation for the CREST adapter, including a guide for extending CREST-based TS workflows and minimal usage examples.

The documentation covers current family support, external references, and extension instructions for adding new families to CREST or enabling CREST to wrap a new TS seed adapter.

Also includes seed schema contract.
Introduces a `method_sources` attribute to the TSGuess class.

This attribute stores all methods that produced an equivalent xyz guess.
Normalizes the method sources to a unique, ordered, lowercase list, ensuring consistency and avoiding duplicates.
This allows for better tracking of the origins of TS guesses and simplifies the clustering logic.
Adds 'crest' as a valid option for transition state search methods.

Adds a new job to the test suite for coverage of the wall time exceeded functionality.

Updates testing path for wall_exceeded fixture

Updates the path used to locate the `wall_exceeded.txt`
fixture in the `TestJobAdapter` test class. This ensures
that the test can correctly access the fixture file,
regardless of the execution environment.
Includes CREST as a valid TS adapter option for H_Abstraction reactions.
This allows users to utilize CREST for transition state searches,
expanding the available methods.
Adds a function to reorder and convert XYZ strings between
``ATOM X Y Z`` and ``X Y Z ATOM`` formats, with optional unit conversion.

Also adds a backwards-compatible wrapper with a deprecation warning.
Modifies restart tests to generate unique project names when running in parallel using pytest-xdist. This avoids collisions during cleanup of project directories.

Updates restart tests to use ARC_TESTING_PATH

Modifies restart tests to utilize the ARC_TESTING_PATH
constant for specifying test directories. This change ensures
consistency and simplifies path management within the testing
framework.
Includes setuptools as a dependency in the environment file.
This ensures that the package installation process has access to required tools and resources.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants