Merged
85 commits
fdceeba
initial commit
ArthurDeclercq Feb 24, 2024
5374ed8
finalize ms2 feature generation
ArthurDeclercq Feb 25, 2024
60207a3
add rustyms
ArthurDeclercq Feb 25, 2024
ae39844
remove exit statement fixed IM required value
ArthurDeclercq Feb 26, 2024
9b98c4d
change logger.info to debug
ArthurDeclercq Feb 26, 2024
5e45756
added profile decorator to get timings for functions
ArthurDeclercq Feb 26, 2024
304777c
removed profile as standard rescore debug statement
ArthurDeclercq Feb 26, 2024
95ee475
added new basic features
ArthurDeclercq Feb 26, 2024
73f4573
fixes for ms2 feature generator, removed multiprocessing
ArthurDeclercq Feb 26, 2024
947233e
return empty list on parsing error with rustyms, removed multiprocessing
ArthurDeclercq Feb 28, 2024
24ce565
add deeplc_calibration psm set
ArthurDeclercq Mar 15, 2024
114b006
Merge branch 'timsRescore' of https://github.com/compomics/ms2rescore…
ArthurDeclercq Apr 17, 2024
33c38b0
remove unused import
ArthurDeclercq Apr 17, 2024
40425c7
Merge branch 'timsRescore' of https://github.com/compomics/ms2rescore…
ArthurDeclercq Apr 19, 2024
b810b8c
Merge branch 'timsRescore' of https://github.com/compomics/ms2rescore…
ArthurDeclercq Apr 19, 2024
69b5d1a
Merge tag 'main' of https://github.com/compomics/ms2rescore into spec…
ArthurDeclercq Aug 16, 2024
6e2d102
Merge pull request #177 from compomics/main
ArthurDeclercq Aug 16, 2024
11fdc51
integrate mumble into ms2branch
ArthurDeclercq Aug 21, 2024
3140c44
Merge remote-tracking branch 'origin/main' into spectrum-feature-gene…
ArthurDeclercq Sep 23, 2024
883169a
temp removal of sage features before rescoring
ArthurDeclercq Sep 27, 2024
97865e7
Merge branch 'main' of https://github.com/compomics/ms2rescore into s…
ArthurDeclercq Sep 27, 2024
da39ae8
remove psm_file features when rescoring with mumble
ArthurDeclercq Nov 8, 2024
37fff28
linting
SamvPy Nov 19, 2024
e8b59f3
add hyperscore calculation
SamvPy Nov 19, 2024
c51cd34
calibration fixes
ArthurDeclercq Nov 21, 2024
295e37f
changes for mumble implementation
ArthurDeclercq Nov 21, 2024
909860d
change openms peptide formatting
SamvPy Nov 22, 2024
c5902c2
add mumble psm filtering functionality
ArthurDeclercq Nov 22, 2024
6eaceb2
Merge branch 'spectrum-feature-generator' of https://github.com/compo…
ArthurDeclercq Nov 22, 2024
5ce55f5
remove pyopenms dependency for hyperscore calculation
SamvPy Nov 22, 2024
986c5f6
fix spectrum_id accession
ArthurDeclercq Nov 22, 2024
bbecf6a
Merge branch 'spectrum-feature-generator' of https://github.com/compo…
ArthurDeclercq Nov 22, 2024
6fd6053
Merge remote-tracking branch 'origin/main' into spectrum-feature-gene…
paretje Jan 14, 2025
5333e46
remove unused imports
paretje Jan 17, 2025
dd2259f
remove unused import in deeplc feature generator
paretje Jan 17, 2025
d24ef30
add rustyms dependency
paretje Jan 17, 2025
21cafc7
drop rustyms requirement to 0.8.3
paretje Jan 17, 2025
ca9da7d
mumble related changes
ArthurDeclercq Jan 17, 2025
c5b6eb0
add mumble
paretje Jan 17, 2025
aee8ec7
update mumble to use user cache dir
paretje Jan 21, 2025
7ce56c2
bump im2deep dependency
paretje Jan 24, 2025
106ad8f
make mumble and rustyms optional dependancy
ArthurDeclercq Feb 14, 2025
72e2b71
Merge branch 'main' of https://github.com/compomics/ms2rescore into s…
ArthurDeclercq Jun 10, 2025
29aac8a
set defaults in mumble config
ArthurDeclercq Sep 23, 2025
487f661
fix rustyms 0.8.0 -> 0.10.0
SamvPy Dec 3, 2025
9a97ed1
Merge remote-tracking branch 'origin/main' into refactoring
ArthurDeclercq Dec 22, 2025
05078cd
moved maxquant features to ms2
ArthurDeclercq Dec 22, 2025
2011241
im2deep refactoring
ArthurDeclercq Dec 24, 2025
36750b4
ms2pip refactoring
ArthurDeclercq Dec 24, 2025
3091b0f
parsing spectra once and storing spectra objects
ArthurDeclercq Dec 24, 2025
5b3d4c4
directly operate on spectra objects instead of reacquiring them
ArthurDeclercq Dec 24, 2025
a3cbb1b
updated profiling
ArthurDeclercq Dec 24, 2025
a1df72d
removed maxquant generator from fg
ArthurDeclercq Dec 24, 2025
2c7a09b
changes to column names
ArthurDeclercq Jan 5, 2026
e855abf
changes to avoid out of memory error due to multiprocessing
ArthurDeclercq Jan 5, 2026
577df19
replace list with set to reduce lookup time to O(1)
ArthurDeclercq Jan 5, 2026
a80238b
remove unused imports
ArthurDeclercq Jan 12, 2026
abf66b4
migrate ms2 and ms2pip features to ms2rescore-rs
ArthurDeclercq Jan 12, 2026
a9108b9
reimplement deeplc feature calculation
ArthurDeclercq Jan 12, 2026
698dd5e
change logging
ArthurDeclercq Jan 12, 2026
862f9be
fix im2deep fg
ArthurDeclercq Jan 14, 2026
889b42d
add support for continue runs and writing intermediate file on error …
ArthurDeclercq Jan 14, 2026
ba930b8
changes to default features sets instead of based on charge
ArthurDeclercq Jan 14, 2026
a3875da
minor changes
ArthurDeclercq Jan 14, 2026
8e793a0
conditional import of mumble
ArthurDeclercq Jan 14, 2026
07b96e7
add tracking to spectrum file reading
ArthurDeclercq Jan 14, 2026
496c0b8
change rust function names
ArthurDeclercq Jan 20, 2026
95e149e
minor changes to logging and other bugfixes
ArthurDeclercq Jan 21, 2026
6f49935
add deeplc plot to plotting module
ArthurDeclercq Jan 22, 2026
d66426e
making report generation funcitonal again
ArthurDeclercq Jan 22, 2026
c132684
change fg colors
ArthurDeclercq Jan 22, 2026
e782616
remove ionmob from ms2rescore
ArthurDeclercq Jan 27, 2026
f011705
update required python version
ArthurDeclercq Jan 27, 2026
623b95f
remove ionmob from gui
ArthurDeclercq Jan 27, 2026
704c22d
update numpy versioning
ArthurDeclercq Jan 28, 2026
4cce1f6
updata colors of report
ArthurDeclercq Jan 28, 2026
13a72b8
updated documentation on intermediate files
ArthurDeclercq Jan 28, 2026
603ee50
Address review comments
RalfG Mar 18, 2026
8498de9
Merge branch 'main' of https://github.com/CompOmics/ms2rescore into r…
RalfG Mar 18, 2026
c3e5560
Update dependencies for alpha release; clean up imports and configs
RalfG Apr 8, 2026
1fc12b4
Fix duplicate psm_id_pattern application and inverted mz condition
RalfG Apr 8, 2026
dbb6dad
fix: Remove redundant rustyms dependency and skip mumble install for …
RalfG Apr 8, 2026
38ce072
fix: Address review findings across refactoring branch
RalfG Apr 8, 2026
41a1906
Fix missing import on type annotation
RalfG Apr 8, 2026
d97c9b0
fix: Fix multiple bugs found during code review
RalfG Apr 8, 2026
9 changes: 8 additions & 1 deletion .github/workflows/test.yml
@@ -35,8 +35,15 @@ jobs:
python-version: ${{ matrix.python-version }}
enable-cache: true

# Temporarily skip mumble on 3.14 until rustyms supports it
- name: Install the project
run: uv sync --all-extras --dev
run: |
if [ "${{ matrix.python-version }}" = "3.14" ]; then
uv sync --extra idxml --dev
else
uv sync --all-extras --dev
fi


- name: Run tests
run: uv run pytest
2 changes: 1 addition & 1 deletion .readthedocs.yml
@@ -1,7 +1,7 @@
version: 2

build:
os: ubuntu-22.04
os: ubuntu-lts-latest
tools:
python: "3.11"
jobs:
7 changes: 7 additions & 0 deletions docs/source/userguide/input-files.rst
@@ -29,6 +29,13 @@ Check the list of :py:mod:`psm_utils` tags in the
file extension, the file type can also be inferred from the file name. In that case,
``psm_file_type`` option can be set to ``infer``.

.. note::
If a previous MS²Rescore run crashed during feature generation or rescoring, an intermediate
file (``<prefix>.intermediate.psms.tsv``) is automatically saved. This file contains all PSMs
with features that were successfully added up to that point. You can resume processing by
providing this file as the PSM file (``-p <prefix>.intermediate.psms.tsv -t tsv``) to skip
already completed feature generation steps.


Spectrum file(s)
================
4 changes: 4 additions & 0 deletions docs/source/userguide/output-files.rst
@@ -31,6 +31,10 @@ Log and configuration files:
+--------------------------------------+--------------------------------------------------------------------------------------+
| ``<prefix>.feature_names.tsv`` | List of the features and their descriptions |
+--------------------------------------+--------------------------------------------------------------------------------------+
| ``<prefix>.intermediate.psms.tsv`` | Created automatically if the process crashes during feature generation or rescoring. |
| | Contains all PSMs with successfully added features up to the crash point. Can be |
| | used to resume processing with ``-p <prefix>.intermediate.psms.tsv -t tsv``. |
+--------------------------------------+--------------------------------------------------------------------------------------+

Rescoring engine files:

2 changes: 1 addition & 1 deletion ms2rescore.spec
@@ -16,7 +16,7 @@ project = "ms2rescore"
bundle_name = "ms2rescore"
bundle_identifier = f"{bundle_name}.{__version__}"

extra_requirements = {"ionmob"}
extra_requirements = {}

# Requirements config
skip_requirements_regex = r"^(?:.*\..*)"
10 changes: 9 additions & 1 deletion ms2rescore/__main__.py
@@ -6,6 +6,7 @@
import json
import logging
import sys
from datetime import datetime
from pathlib import Path
from typing import Union

@@ -196,7 +197,13 @@ def profile(fnc, filepath):
def inner(*args, **kwargs):
with cProfile.Profile() as profiler:
return_value = fnc(*args, **kwargs)
profiler.dump_stats(filepath + ".profile.prof")

# Add timestamp to profiler output filename
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
profile_filename = f"{filepath}.profile_{timestamp}.prof"
profiler.dump_stats(profile_filename)
LOGGER.info(f"Profile data written to: {profile_filename}")

return return_value

return inner
@@ -248,6 +255,7 @@ def main(tims=False):
# Run MS²Rescore
try:
if config["ms2rescore"]["profile"]:
LOGGER.info("Profiling enabled")
profiled_rescore = profile(rescore, config["ms2rescore"]["output_path"])
profiled_rescore(configuration=config)
else:
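The timestamped `.prof` file written by the updated `profile` wrapper can be inspected with the standard-library `pstats` module. The following is a minimal, self-contained sketch and not code from this PR: `slow_sum`, the temporary output prefix, and the `last_profile` attribute are hypothetical stand-ins added for illustration (in the diff, the wrapped function is `rescore` and the prefix comes from `config["ms2rescore"]["output_path"]`).

```python
import cProfile
import pstats
import tempfile
from datetime import datetime
from pathlib import Path


def profile(fnc, filepath):
    """Wrap ``fnc`` so that each call writes a timestamped cProfile dump."""

    def inner(*args, **kwargs):
        with cProfile.Profile() as profiler:
            return_value = fnc(*args, **kwargs)
        # A timestamp in the file name keeps repeated runs from overwriting each other
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        profile_filename = f"{filepath}.profile_{timestamp}.prof"
        profiler.dump_stats(profile_filename)
        inner.last_profile = profile_filename  # hypothetical convenience attribute
        return return_value

    return inner


def slow_sum(n):
    """Hypothetical workload standing in for ``rescore``."""
    return sum(i * i for i in range(n))


output_prefix = str(Path(tempfile.mkdtemp()) / "run")  # hypothetical output path
profiled = profile(slow_sum, output_prefix)
result = profiled(10_000)

# Load the dump and print the five most expensive calls by cumulative time
stats = pstats.Stats(profiled.last_profile)
stats.sort_stats("cumulative").print_stats(5)
```

Because the file name now includes a timestamp, the path logged by `LOGGER.info` is the simplest way to find the dump for a given run.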
6 changes: 6 additions & 0 deletions ms2rescore/constants.py
@@ -0,0 +1,6 @@
"""Shared constants for ms2rescore."""

import re

# Regex pattern to strip charge state suffix (e.g., "/2") from peptidoform strings
CHARGE_PATTERN = re.compile(r"(/\d+$)")
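As a quick illustration of what the shared pattern does, the sketch below applies it to a few peptidoform strings. The `strip_charge` helper and the example strings are hypothetical and only serve to show the behaviour; in the diff, the pattern is passed to a pandas `str.replace` call in `core.py`.

```python
import re

# Same pattern as in ms2rescore/constants.py
CHARGE_PATTERN = re.compile(r"(/\d+$)")


def strip_charge(peptidoform: str) -> str:
    """Return the peptidoform string without a trailing ``/<charge>`` suffix."""
    return CHARGE_PATTERN.sub("", peptidoform, count=1)


# Hypothetical example strings
examples = ["ACDEK/2", "AC[Carbamidomethyl]DEK/3", "ACDEK"]
stripped = [strip_charge(p) for p in examples]
print(stripped)
```

The `$` anchor restricts the match to a suffix at the very end of the string, so a string without a charge suffix is returned unchanged.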
81 changes: 69 additions & 12 deletions ms2rescore/core.py
@@ -3,17 +3,22 @@
from multiprocessing import cpu_count
from typing import Dict, Optional

import numpy as np
import psm_utils.io
from mokapot.dataset import LinearPsmDataset
from psm_utils import PSMList

from ms2rescore import exceptions
from ms2rescore.constants import CHARGE_PATTERN
from ms2rescore.feature_generators import FEATURE_GENERATORS
from ms2rescore.parse_psms import parse_psms
from ms2rescore.parse_spectra import add_precursor_values
from ms2rescore.report import generate
from ms2rescore.rescoring_engines import mokapot, percolator
from ms2rescore.rescoring_engines.mokapot import add_peptide_confidence, add_psm_confidence
from ms2rescore.rescoring_engines.mokapot import (
add_peptide_confidence,
add_psm_confidence,
)

logger = logging.getLogger(__name__)

@@ -34,7 +39,9 @@ def rescore(configuration: Dict, psm_list: Optional[PSMList] = None) -> None:
f"Running MS²Rescore with following configuration: {json.dumps(configuration, indent=4)}"
)
config = configuration["ms2rescore"]
output_file_root = config["output_path"]
output_file_root = config["output_path"].split(".intermediate.")[
0
] # if no intermediate, takes full name

# Write full configuration including defaults to file
with open(output_file_root + ".full-config.json", "w") as f:
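The `split(".intermediate.")[0]` expression above can be checked in isolation. This is a minimal sketch; the `output_root` helper name and the example paths are hypothetical, and it assumes the output path is derived from the PSM file name when a run is resumed from an intermediate file.

```python
def output_root(output_path: str) -> str:
    """Drop everything from the first ``.intermediate.`` marker onwards."""
    return output_path.split(".intermediate.")[0]


# A fresh run: no marker present, so the full path is kept
fresh = output_root("results/run1")

# A resumed run: the intermediate suffix is removed, so the outputs of the
# resumed run reuse the original prefix
resumed = output_root("results/run1.intermediate.psms")

print(fresh, resumed)
```

One consequence of using `str.split` here is that a path containing `.intermediate.` anywhere (for example in a directory name) is cut at the first occurrence.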
@@ -59,11 +66,24 @@ def rescore(configuration: Dict, psm_list: Optional[PSMList] = None) -> None:
logger.debug(
f"PSMs already contain the following rescoring features: {psm_list_feature_names}"
)
# Check if all features are already present; collect generators to skip
skip_fgens = set()
for fgen_name, fgen_config in config["feature_generators"].items():
fgen_features = FEATURE_GENERATORS[fgen_name]().feature_names
if set(fgen_features).issubset(psm_list_feature_names):
logger.debug(
f"Skipping feature generator {fgen_name} because all features are already "
"present in the PSM file."
)
feature_names[fgen_name] = set(fgen_features)
feature_names["psm_file"] = psm_list_feature_names - set(fgen_features)
skip_fgens.add(fgen_name)

# Add missing precursor info from spectrum file if needed
required_ms_data = {
ms_data
for fgen_name in config["feature_generators"].keys()
if fgen_name not in skip_fgens
for ms_data in FEATURE_GENERATORS[fgen_name].required_ms_data
}
available_ms_data = add_precursor_values(
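The skip logic in this hunk comes down to a subset test between the features a generator would produce and the features already present in the PSM file. A minimal sketch of that idea on plain Python sets follows; the function name, generator names, and feature names are all hypothetical, and no ms2rescore classes are used.

```python
def plan_feature_generators(enabled, generator_features, present_features):
    """Split enabled generators into those to skip and those still to run."""
    skip = set()
    feature_names = {}
    for name in enabled:
        produced = set(generator_features[name])
        # A generator is redundant when everything it would add is already present
        if produced.issubset(present_features):
            skip.add(name)
            feature_names[name] = produced
    # Whatever no skipped generator accounts for is attributed to the PSM file
    covered = set().union(*feature_names.values())
    feature_names["psm_file"] = set(present_features) - covered
    to_run = [name for name in enabled if name not in skip]
    return skip, to_run, feature_names


# Hypothetical generators and feature names
generator_features = {
    "basic": ["basic_1", "basic_2"],
    "deeplc": ["rt_1", "rt_2"],
    "ms2pip": ["ms2_1", "ms2_2"],
}
present = {"basic_1", "basic_2", "rt_1", "rt_2", "search_engine_score"}

skip, to_run, feature_names = plan_feature_generators(
    ["basic", "deeplc", "ms2pip"], generator_features, present
)
print(skip, to_run, feature_names["psm_file"])
```

Note one deliberate difference from the diff: the sketch subtracts the features of all skipped generators from the `psm_file` set at once, whereas the loop in the hunk reassigns `feature_names["psm_file"]` from the full set on each iteration, so only the last skipped generator's features end up subtracted. Whether that is intended is not clear from the diff alone.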
@@ -75,6 +95,8 @@ def rescore(configuration: Dict, psm_list: Optional[PSMList] = None) -> None:

# Add rescoring features
for fgen_name, fgen_config in config["feature_generators"].items():
if fgen_name in skip_fgens:
continue
# Compile configuration
conf = config.copy()
conf.update(fgen_config)
@@ -89,12 +111,29 @@ def rescore(configuration: Dict, psm_list: Optional[PSMList] = None) -> None:
"files or disable the feature generator."
)
continue

# Add features
fgen.add_features(psm_list)
try:
fgen.add_features(psm_list)
except (
Exception,
KeyboardInterrupt,
) as e: # Intentionally broad to save intermediate output before re-raising
logger.error(
f"Error while adding features from {fgen_name}: {e}, writing intermediary output..."
)
# Write intermediate TSV
psm_utils.io.write_file(
psm_list, output_file_root + ".intermediate.psms.tsv", filetype="tsv"
)
raise
logger.debug(f"Adding features from {fgen_name}: {set(fgen.feature_names)}")
feature_names[fgen_name] = set(fgen.feature_names)

# Remove overlapping features from psm_file to avoid duplicates
# (e.g., hyperscore can be in both psm_file and ms2pip)
overlap = feature_names.get("psm_file", set()) & feature_names[fgen_name]
if overlap:
feature_names["psm_file"] = feature_names["psm_file"] - overlap

# Filter out psms that do not have all added features
all_feature_names = {f for fgen in feature_names.values() for f in fgen}
psms_with_features = [
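The overlap handling added in this hunk is a plain set intersection followed by a set difference. The sketch below reproduces it on a small dictionary; the `drop_overlap` helper and all feature names other than `hyperscore` (which the code comment mentions) are hypothetical.

```python
def drop_overlap(feature_names: dict, fgen_name: str) -> set:
    """Remove from ``psm_file`` any feature that ``fgen_name`` also provides."""
    overlap = feature_names.get("psm_file", set()) & feature_names[fgen_name]
    if overlap:
        feature_names["psm_file"] = feature_names["psm_file"] - overlap
    return overlap


feature_names = {
    "psm_file": {"hyperscore", "expect"},       # features read from the PSM file
    "ms2pip": {"hyperscore", "spec_pearson"},   # features added by a generator
}
overlap = drop_overlap(feature_names, "ms2pip")

# Each feature is now attributed to exactly one source
all_feature_names = {f for names in feature_names.values() for f in names}
print(overlap, feature_names["psm_file"], sorted(all_feature_names))
```

Using `.get("psm_file", set())` keeps the helper safe when the PSM file contributed no features at all.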
@@ -114,6 +153,12 @@ def rescore(configuration: Dict, psm_list: Optional[PSMList] = None) -> None:
)
psm_list = psm_list[psms_with_features]

if "mumble" in config["psm_generator"]:
from ms2rescore.utils import filter_mumble_psms

# Remove PSMs where matched_ions_pct drops 25% below the original hit
psm_list = filter_mumble_psms(psm_list, threshold=0.75)

# Write feature names to file
_write_feature_names(feature_names, output_file_root)

@@ -160,13 +205,18 @@ def rescore(configuration: Dict, psm_list: Optional[PSMList] = None) -> None:
protein_kwargs=protein_kwargs,
**config["rescoring_engine"]["mokapot"],
)
except exceptions.RescoringError as e:
except (
Exception,
KeyboardInterrupt,
): # Intentionally broad to save intermediate output before re-raising
# Write output
logger.info(f"Writing intermediary output to {output_file_root}.psms.tsv...")
psm_utils.io.write_file(psm_list, output_file_root + ".psms.tsv", filetype="tsv")
logger.info(f"Writing intermediary output to {output_file_root}.intermediate.psms.tsv...")
psm_utils.io.write_file(
psm_list, output_file_root + ".intermediate.psms.tsv", filetype="tsv"
)

# Reraise exception
raise e
raise

# Post-rescoring processing
if all(psm_list["pep"] == 1.0):
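Both `except` blocks in this file follow the same save-then-re-raise pattern. A minimal standalone sketch of that pattern is shown here; `write_rows` is a hypothetical stand-in for `psm_utils.io.write_file`, and `run_step`, `failing_step`, and the row data are likewise invented for illustration.

```python
import csv
import tempfile
from pathlib import Path


def write_rows(rows, path):
    """Hypothetical stand-in for ``psm_utils.io.write_file``: write dicts as TSV."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=sorted(rows[0]), delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


def run_step(step, rows, output_file_root):
    """Run ``step``; on any failure, save what exists so far and re-raise."""
    try:
        step(rows)
    except (Exception, KeyboardInterrupt):
        # KeyboardInterrupt derives from BaseException, not Exception,
        # so it has to be listed explicitly to be caught here
        write_rows(rows, output_file_root + ".intermediate.psms.tsv")
        raise  # a bare raise re-raises the active exception unchanged


def failing_step(rows):
    rows[0]["feature_a"] = 1.0  # partial progress before the failure
    raise RuntimeError("simulated crash")


rows = [{"spectrum_id": "s1", "feature_a": 0.0}]
root = str(Path(tempfile.mkdtemp()) / "run")
caught = None
try:
    run_step(failing_step, rows, root)
except RuntimeError as error:
    caught = error

intermediate = Path(root + ".intermediate.psms.tsv")
print(caught, intermediate.read_text())
```

The caller still sees the original exception, while the partially processed data is available on disk for a later resumed run.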
@@ -219,7 +269,12 @@ def _write_feature_names(feature_names, output_file_root):
def _log_id_psms_before(psm_list: PSMList, fdr: float = 0.01, max_rank: int = 1) -> int:
"""Log #PSMs identified before rescoring."""
id_psms_before = (
(psm_list["qvalue"] <= 0.01) & (psm_list["rank"] <= max_rank) & (~psm_list["is_decoy"])
(psm_list["qvalue"] <= fdr)
& (psm_list["rank"] <= max_rank)
& (~psm_list["is_decoy"])
& np.array(
[(metadata or {}).get("original_psm", True) for metadata in psm_list["metadata"]]
)
).sum()
logger.info(
f"Found {id_psms_before} identified PSMs with rank <= {max_rank} at {fdr} FDR before "
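The updated count combines four conditions, and it now uses the `fdr` argument where the earlier expression appears to have hard-coded `0.01`. The sketch below expresses the same filter on plain dictionaries rather than on `PSMList` columns and NumPy arrays; the `count_identified` function and the example records are hypothetical.

```python
def count_identified(psms, fdr=0.01, max_rank=1):
    """Count target PSMs passing the q-value and rank cut-offs.

    PSMs whose metadata marks them as not original (``original_psm`` is False)
    are excluded; missing or empty metadata counts as original.
    """
    return sum(
        psm["qvalue"] <= fdr
        and psm["rank"] <= max_rank
        and not psm["is_decoy"]
        and bool((psm.get("metadata") or {}).get("original_psm", True))
        for psm in psms
    )


# Hypothetical PSM records
psms = [
    {"qvalue": 0.001, "rank": 1, "is_decoy": False, "metadata": None},
    {"qvalue": 0.005, "rank": 1, "is_decoy": False, "metadata": {"original_psm": False}},
    {"qvalue": 0.005, "rank": 1, "is_decoy": True, "metadata": {}},
    {"qvalue": 0.050, "rank": 1, "is_decoy": False, "metadata": {"original_psm": True}},
    {"qvalue": 0.002, "rank": 2, "is_decoy": False, "metadata": {}},
]

strict = count_identified(psms, fdr=0.01)
relaxed = count_identified(psms, fdr=0.1)
print(strict, relaxed)
```

The `(metadata or {})` guard is what lets PSMs with `None` metadata pass through as original hits instead of raising an `AttributeError`.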
@@ -274,7 +329,7 @@ def _calculate_confidence(psm_list: PSMList) -> PSMList:
psm_df = psm_list.to_dataframe()
psm_df = psm_df.reset_index(drop=True).reset_index()
psm_df["peptide"] = (
psm_df["peptidoform"].astype(str).str.replace(r"(/\d+$)", "", n=1, regex=True)
psm_df["peptidoform"].astype(str).str.replace(CHARGE_PATTERN, "", n=1, regex=True)
)
psm_df["is_target"] = ~psm_df["is_decoy"]
lin_psm_data = LinearPsmDataset(
@@ -285,7 +340,9 @@ def _calculate_confidence(psm_list: PSMList) -> PSMList:
)

# Recalculate confidence
new_confidence = lin_psm_data.assign_confidence(scores=psm_list["score"])
new_confidence = lin_psm_data.assign_confidence(
scores=list(psm_list["score"])
) # explicitly make it a list to avoid TypingError: Failed in nopython mode pipeline (step: nopython frontend) in mokapot

# Add new confidence estimations to PSMList
add_psm_confidence(psm_list, new_confidence)
6 changes: 6 additions & 0 deletions ms2rescore/exceptions.py
@@ -41,3 +41,9 @@ class RescoringError(MS2RescoreError):
"""Error while rescoring PSMs."""

pass


class ParseSpectrumError(MS2RescoreError):
"""Error while parsing spectrum files."""

pass
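Because the new exception derives from `MS2RescoreError`, callers can catch it either specifically or through the shared base class. The sketch below redefines the two classes locally so that it runs on its own; the `read_spectra` function and its file-extension check are hypothetical and do not reflect how ms2rescore actually parses spectra.

```python
class MS2RescoreError(Exception):
    """Local stand-in for the package's base exception."""


class ParseSpectrumError(MS2RescoreError):
    """Error while parsing spectrum files."""


def read_spectra(path: str) -> list:
    """Hypothetical reader that rejects unknown file extensions."""
    if not path.endswith((".mgf", ".mzML")):
        raise ParseSpectrumError(f"Unsupported spectrum file: {path}")
    return []


messages = []
try:
    read_spectra("run1.txt")
except MS2RescoreError as error:  # also catches ParseSpectrumError
    messages.append(f"{type(error).__name__}: {error}")

print(messages)
```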
6 changes: 2 additions & 4 deletions ms2rescore/feature_generators/__init__.py
@@ -6,15 +6,13 @@
from ms2rescore.feature_generators.basic import BasicFeatureGenerator
from ms2rescore.feature_generators.deeplc import DeepLCFeatureGenerator
from ms2rescore.feature_generators.im2deep import IM2DeepFeatureGenerator
from ms2rescore.feature_generators.ionmob import IonMobFeatureGenerator
from ms2rescore.feature_generators.maxquant import MaxQuantFeatureGenerator
from ms2rescore.feature_generators.ms2 import MS2FeatureGenerator
from ms2rescore.feature_generators.ms2pip import MS2PIPFeatureGenerator

FEATURE_GENERATORS: dict[str, type[FeatureGeneratorBase]] = {
"basic": BasicFeatureGenerator,
"ms2pip": MS2PIPFeatureGenerator,
"deeplc": DeepLCFeatureGenerator,
"maxquant": MaxQuantFeatureGenerator,
"ionmob": IonMobFeatureGenerator,
"im2deep": IM2DeepFeatureGenerator,
"ms2": MS2FeatureGenerator,
}
4 changes: 2 additions & 2 deletions ms2rescore/feature_generators/base.py
@@ -17,11 +17,11 @@ def __init__(self, *args, **kwargs) -> None:

@property
@abstractmethod
def feature_names(self):
def feature_names(self) -> list[str]:
pass

@abstractmethod
def add_features(self, psm_list: PSMList):
def add_features(self, psm_list: PSMList) -> None:
pass
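To show how the annotated abstract interface and a name-to-class registry like `FEATURE_GENERATORS` fit together, here is a minimal self-contained sketch. The base class is redefined locally, PSMs are plain dictionaries rather than `psm_utils` objects, and `PeptideLengthGenerator` and `REGISTRY` are hypothetical names used only for illustration.

```python
from abc import ABC, abstractmethod


class FeatureGeneratorBase(ABC):
    """Local stand-in mirroring the annotated signatures in the diff."""

    @property
    @abstractmethod
    def feature_names(self) -> list[str]:
        ...

    @abstractmethod
    def add_features(self, psm_list) -> None:
        ...


class PeptideLengthGenerator(FeatureGeneratorBase):
    """Hypothetical generator that adds one feature per PSM."""

    @property
    def feature_names(self) -> list[str]:
        return ["peptide_length"]

    def add_features(self, psm_list) -> None:
        # Features are added in place; nothing is returned
        for psm in psm_list:
            psm["features"]["peptide_length"] = float(len(psm["sequence"]))


# Hypothetical registry mapping configuration names to generator classes
REGISTRY: dict[str, type[FeatureGeneratorBase]] = {"length": PeptideLengthGenerator}

psms = [{"sequence": "ACDEK", "features": {}}]
generator = REGISTRY["length"]()
generator.add_features(psms)
print(generator.feature_names, psms)
```

The `-> None` annotation on `add_features` documents that generators mutate the PSM list in place, which is what allows the calling code to write the same list to an intermediate file when a later step fails.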

