
Feature/evaluate corpus#54

Merged
tanhaow merged 11 commits into develop from feature/evaluate-corpus
Mar 19, 2026

Conversation

@tanhaow
Contributor

@tanhaow tanhaow commented Mar 17, 2026

Associated Issue(s): resolves #16

Changes in this PR

Include all key changes in this pull request

  • Added evaluate_corpus.py script that generates CSV files containing MT evaluation metrics (ChrF and COMET scores) for machine translation corpora
  • Implemented caching in metrics.py to avoid reloading models for each translation, reducing evaluation time from ~9 seconds per translation to ~0.35 seconds per translation
  • Suppressed verbose PyTorch Lightning logging messages to provide cleaner output during evaluation
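The caching change can be sketched roughly as follows (a minimal illustration of the pattern described above, not the actual metrics.py code; the names are hypothetical):

```python
# Minimal sketch of the metric-caching pattern (hypothetical names):
# load each metric model once, then reuse it for every translation.
_METRIC_CACHE = {}

def get_metric(name, loader):
    """Return the cached metric for `name`, calling `loader` only on first use."""
    if name not in _METRIC_CACHE:
        # The expensive step (model download/load) happens only once per metric.
        _METRIC_CACHE[name] = loader()
    return _METRIC_CACHE[name]
```

With this shape, something like `get_metric("comet", lambda: evaluate.load("comet"))` would pay the multi-second model load once and hit the cache on every subsequent call.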

Notes

  • Added REQUIRED_FIELDS constant documenting expected input fields in MT corpus records

Reviewer Checklist

  • Run the script on a sample MT corpus to verify CSV output format is correct
  • Verify that ChrF and COMET scores are computed correctly and fall within expected ranges
  • Verify that progress bar and logging output are clean and informative

tanhaow added 5 commits March 11, 2026 08:56
Resolved conflicts by accepting develop's changes:
- Added environment variables to suppress HuggingFace logging
- Wrapped COMET computation in context managers to suppress output
@tanhaow tanhaow requested a review from laurejt March 17, 2026 14:45
@tanhaow tanhaow self-assigned this Mar 17, 2026
Contributor

@laurejt laurejt left a comment


Overall, this looks good and only needs a few small changes. Requested changes:

  • Document why the pytorch- and huggingface-related environment variables are set in metrics.py. Not all of these are related to logging.
  • As a small add-on (since you're already updating metrics.py), please normalize the chrF score so its range is 0-1.

Comment thread src/muse/evaluation/evaluate_corpus.py Outdated
parsed = args.parse_args()

# Setup logging
log_level = logging.DEBUG if parsed.verbose else logging.INFO
Contributor


Typically, default logging should be set to warning not info, especially when there is a verbose mode.

Comment thread src/muse/evaluation/metrics.py Outdated
# Suppress verbose HuggingFace logging
# Suppress verbose HuggingFace and PyTorch Lightning logging
os.environ["TRANSFORMERS_VERBOSITY"] = "error"
os.environ["TOKENIZERS_PARALLELISM"] = "false"
Contributor


Why do you need to explicitly set this? Were you encountering a warning message?

Contributor Author


There were a lot of INFO messages suggesting LitLogger, which made the output long and messy:

INFO: 💡 Tip: For seamless cloud logging and experiment tracking, try installing [litlogger](https://pypi.org/project/litlogger/) to enable LitLogger, which logs metrics and artifacts automatically to the Lightning Experiments platform.
INFO: 💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.

Contributor


I see this pointing to the bigger issue that the code assumes that INFO is the default non-verbose level, which is not standard.

Contributor Author


You are right. I have set the default logging level to WARNING. Let me also remove some of this logging-related code.

Contributor Author


So, the thing is, the PyTorch Lightning messages are still showing after setting the default logging to WARN. I did some investigation and found that PyTorch Lightning uses its own internal logging system that bypasses Python's logging module... so we actually need to keep the stderr contextlib redirects. But I will simplify them into a minimal version.

Contributor


Thanks for looking into this!

Comment thread src/muse/evaluation/metrics.py Outdated
# Suppress verbose HuggingFace and PyTorch Lightning logging
os.environ["TRANSFORMERS_VERBOSITY"] = "error"
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
Contributor


Why do you need to set this?

Contributor Author


It's for enabling fallback for unsupported MPS operations. But I can remove this.

Contributor


There's no need to remove this, just document it. It's worth documenting that this library supports MPS / Apple Silicon whereas we have not done that with other parts of the pipeline / package.

Comment thread src/muse/evaluation/metrics.py Outdated
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
os.environ["PYTHONWARNINGS"] = "ignore"
os.environ["PL_DISABLE_FORK"] = "1"
Contributor


Why do you need to set this?

Contributor Author


It's recommended by HuggingFace to prevent tokenizer deadlocks when used with PyTorch's parallel processing during COMET evaluation. I will add a note in the code
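Taken together, the thread suggests a small, documented configuration block along these lines (a sketch based on the purposes discussed above; the exact comments in metrics.py may differ):

```python
import os

# Logging-related: silence verbose HuggingFace transformers output.
os.environ["TRANSFORMERS_VERBOSITY"] = "error"
# Behavior-related: avoid tokenizer fork deadlocks when HF tokenizers are
# combined with PyTorch parallel processing during COMET evaluation.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
# Behavior-related: fall back to CPU for operations unsupported on
# Apple Silicon MPS, so evaluation still runs on that hardware.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
```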

Comment thread src/muse/evaluation/metrics.py Outdated
import torch

# Suppress verbose HuggingFace logging
# Suppress verbose HuggingFace and PyTorch Lightning logging
Contributor


Only a few of these environment variables are logging related. All others must be documented since they impact the execution of pytorch & huggingface

Comment thread src/muse/evaluation/evaluate_corpus.py Outdated
Comment on lines +23 to +24
# Required fields in input machine translation corpus records
REQUIRED_FIELDS = ["tr_id", "src_text", "ref_text", "tr_text"]
Contributor


This is fine, but unnecessary. There's no need to validate input we control for experimental code.

Comment on lines +37 to +41
# Cache for loaded metrics to avoid reloading models
LOADED_METRICS = {
"chrf": None,
"comet": None,
}
Contributor


How much RAM does this require? This will be especially important (and worth noting) when CometKiwi is added.

@@ -28,7 +52,11 @@ def compute_chrf(
Returns a float in the range [0, 100], where 0 indicates no match and 100
indicates a perfect match.
"""
Contributor


In retrospect, the range of ChrF here is odd: since it's an f-score, it typically ranges between 0 and 1. This must have something to do with the implementation that huggingface-evaluate uses.

Let's correct this by normalizing the scores to 0-1 (i.e., divide by 100) and update the relevant documentation.
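The requested change amounts to a one-line normalization (a sketch; the function name is illustrative, not from the PR):

```python
def normalize_chrf(raw_score: float) -> float:
    """Convert a chrF score from huggingface-evaluate's 0-100 scale
    to the conventional 0-1 f-score range."""
    return raw_score / 100.0
```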

LOADED_METRICS["comet"] = evaluate.load("comet")

comet_metric = LOADED_METRICS["comet"]
gpus = 1 if (torch.cuda.is_available() or torch.backends.mps.is_available()) else 0
Contributor


This is fine for now, we may want to update this logic if we plan on running this on a cluster (i.e., make it possible to specify the use of gpu or cpu-only regardless of what the device has available).
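That future update could look roughly like this (a hedged sketch, with device availability passed in explicitly so the policy is testable; the function and parameter names are hypothetical):

```python
def pick_gpu_count(cuda_available: bool, mps_available: bool,
                   force_cpu: bool = False) -> int:
    """Return how many GPUs to request: 0 when forced to CPU-only
    or when no accelerator is available, else 1."""
    if force_cpu:
        return 0
    return 1 if (cuda_available or mps_available) else 0
```

In practice the caller would pass `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and a CLI flag (e.g. a hypothetical `--cpu-only`) could set `force_cpu=True` for cluster runs.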

@tanhaow
Contributor Author

tanhaow commented Mar 18, 2026

Thanks for your review, @laurejt ! I've made the following changes:

  • Change default logging level to WARNING
  • Simplify and document environment variable configuration
  • Normalize ChrF scores to 0-1 range (good catch!)
  • Document RAM requirements for COMET caching
  • Remove REQUIRED_FIELDS constant

@tanhaow tanhaow requested a review from laurejt March 18, 2026 14:15
Contributor

@laurejt laurejt left a comment


When I run evaluate_corpus.py, none of the pytorch lightning INFO logging messages are suppressed.

@tanhaow tanhaow requested a review from laurejt March 19, 2026 14:38
@tanhaow
Contributor Author

tanhaow commented Mar 19, 2026

PyTorch Lightning logging is now suppressed. Now only 1 INFO message appears on first evaluation (checkpoint upgrade notice) instead of 6 messages per translation.

Contributor

@laurejt laurejt left a comment


Pytorch lightning does use Python logging, but I think this is a case of not being specific enough with the logger name. I was able to suppress the logging messages with the following two lines:

logging.getLogger("pytorch_lightning.utilities.rank_zero").setLevel(logging.WARNING)
logging.getLogger("pytorch_lightning.utilities.migration").setLevel(logging.WARNING)

Try simplifying the code to this and verify that this works for your local dev environment as well.

In case you're curious, I figured out the particular module names by searching for the portions of the INFO logs within the pytorch-lightning repo.

Comment thread src/muse/evaluation/metrics.py Outdated
Comment on lines +95 to +102
# Suppress PyTorch Lightning INFO messages by redirecting stderr at file descriptor level
# This is necessary because PyTorch Lightning bypasses Python's logging system
_stderr_fd = sys.stderr.fileno()
_original_stderr_fd = os.dup(_stderr_fd)
_devnull_fd = os.open(os.devnull, os.O_WRONLY)
os.dup2(_devnull_fd, _stderr_fd)

try:
Contributor


This is a lot of complexity for what we're trying to do, see if my suggestion works instead.

Comment thread src/muse/evaluation/metrics.py Outdated
Comment on lines +109 to +113
finally:
# Restore original stderr
os.dup2(_original_stderr_fd, _stderr_fd)
os.close(_original_stderr_fd)
os.close(_devnull_fd)
Contributor


This is a lot of complexity for what we're trying to do, see if my suggestion works instead.

@tanhaow
Contributor Author

tanhaow commented Mar 19, 2026

Pytorch lightning does use Python logging, but I think this is a case of not being specific enough with the logger name. I was able to suppress the logging messages with the following two lines:

logging.getLogger("pytorch_lightning.utilities.rank_zero").setLevel(logging.WARNING)
logging.getLogger("pytorch_lightning.utilities.migration").setLevel(logging.WARNING)

Try simplifying the code to this and verify that this works for your local dev environment as well.

In case you're curious, I figured out the particular module names by searching for the portions of the INFO logs within the pytorch-lightning repo.

This does look like a much more elegant solution!! Ugh you're so smart

@tanhaow tanhaow requested a review from laurejt March 19, 2026 14:56
@tanhaow
Contributor Author

tanhaow commented Mar 19, 2026

@laurejt It works perfectly. Thank you for digging out this solution!!

Contributor

@laurejt laurejt left a comment


🚀

@tanhaow tanhaow merged commit 80c74b7 into develop Mar 19, 2026
1 check passed
@tanhaow tanhaow deleted the feature/evaluate-corpus branch March 19, 2026 15:07