Conversation
Resolved conflicts by accepting develop's changes:
- Added environment variables to suppress HuggingFace logging
- Wrapped COMET computation in context managers to suppress output
laurejt
left a comment
Overall, this looks good and only needs a few small changes. Requested changes:
- Document why the PyTorch- and HuggingFace-related environment variables are set in metrics.py. Not all of these are related to logging.
- As a small add-on (since you're already updating metrics.py), please normalize the chrF score so its range is 0-1.
parsed = args.parse_args()

# Setup logging
log_level = logging.DEBUG if parsed.verbose else logging.INFO
Typically, default logging should be set to warning not info, especially when there is a verbose mode.
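A minimal sketch of that suggestion, wrapped in a hypothetical `pick_level` helper for illustration (the actual script sets the level inline; the `-v/--verbose` flag is assumed from the diff):

```python
import argparse
import logging

parser = argparse.ArgumentParser()
parser.add_argument("-v", "--verbose", action="store_true")

def pick_level(argv):
    # WARNING (not INFO) is the quiet default; DEBUG only with --verbose
    parsed = parser.parse_args(argv)
    return logging.DEBUG if parsed.verbose else logging.WARNING
```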
# Suppress verbose HuggingFace logging
# Suppress verbose HuggingFace and PyTorch Lightning logging
os.environ["TRANSFORMERS_VERBOSITY"] = "error"
os.environ["TOKENIZERS_PARALLELISM"] = "false"
Why do you need to explicitly set this? Were you encountering a warning message?
There were a lot of INFO messages suggesting LitLogger, which made the output long and messy:
INFO: 💡 Tip: For seamless cloud logging and experiment tracking, try installing [litlogger](https://pypi.org/project/litlogger/) to enable LitLogger, which logs metrics and artifacts automatically to the Lightning Experiments platform.
INFO: 💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
I see this pointing to the bigger issue that the code assumes that INFO is the default non-verbose level, which is not standard.
You are right. I have set the default logging level to WARNING. Let me also remove some of this logging-related code.
So, the thing is, the PyTorch Lightning messages are still showing after setting the default logging to WARN. I did some investigation and found it's because PyTorch Lightning uses its own internal logging system that bypasses Python's logging module... So we actually need to keep the stderr contextlib redirects. But I will simplify it into a minimal version
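A minimal sketch of the contextlib-based redirect mentioned above (`quiet_stderr` is a hypothetical name). Note this only silences Python-level writes to `sys.stderr`, which is why the fd-level version appears later in the thread:

```python
import contextlib
import os
import sys

@contextlib.contextmanager
def quiet_stderr():
    # Temporarily point sys.stderr at the null device while a block runs
    with open(os.devnull, "w") as devnull:
        with contextlib.redirect_stderr(devnull):
            yield
```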
Thanks for looking into this!
# Suppress verbose HuggingFace and PyTorch Lightning logging
os.environ["TRANSFORMERS_VERBOSITY"] = "error"
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
Why do you need to set this?
It's for enabling fallback for unsupported MPS operations. But I can remove this.
There's no need to remove this, just document it. It's worth documenting that this library supports MPS / Apple Silicon whereas we have not done that with other parts of the pipeline / package.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
os.environ["PYTHONWARNINGS"] = "ignore"
os.environ["PL_DISABLE_FORK"] = "1"
Why do you need to set this?
It's recommended by HuggingFace to prevent tokenizer deadlocks when used with PyTorch's parallel processing during COMET evaluation. I will add a note in the code
import torch

# Suppress verbose HuggingFace logging
# Suppress verbose HuggingFace and PyTorch Lightning logging
Only a few of these environment variables are logging related. All others must be documented since they impact the execution of pytorch & huggingface
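A sketch of what that documentation could look like, with each variable annotated according to the explanations given in this thread (the `PL_DISABLE_FORK` rationale is taken from the author's reply above, not verified independently):

```python
import os

# Logging-related:
os.environ["TRANSFORMERS_VERBOSITY"] = "error"   # silence HuggingFace transformers INFO/WARNING chatter
os.environ["PYTHONWARNINGS"] = "ignore"          # suppress Python warning messages

# Execution-related (these change runtime behavior, not just logging):
os.environ["TOKENIZERS_PARALLELISM"] = "false"   # avoid tokenizer deadlocks with PyTorch multiprocessing
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # fall back to CPU for ops unsupported on Apple Silicon (MPS)
os.environ["PL_DISABLE_FORK"] = "1"              # per the thread: prevents fork-related deadlocks during COMET evaluation
```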
# Required fields in input machine translation corpus records
REQUIRED_FIELDS = ["tr_id", "src_text", "ref_text", "tr_text"]
This is fine, but unnecessary. There's no need to validate input we control for experimental code.
# Cache for loaded metrics to avoid reloading models
LOADED_METRICS = {
    "chrf": None,
    "comet": None,
}
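The caching pattern can be sketched with an injectable loader so it runs without downloading models (`get_metric` and the `loader` argument are hypothetical; the real code calls `evaluate.load` directly):

```python
LOADED_METRICS = {"chrf": None, "comet": None}

def get_metric(name, loader):
    # loader stands in for evaluate.load; it is called at most once per
    # metric name, so the expensive COMET model load happens only once
    if LOADED_METRICS.get(name) is None:
        LOADED_METRICS[name] = loader(name)
    return LOADED_METRICS[name]
```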
How much RAM does this require? This will be important especially worth noting when CometKiwi is added.
@@ -28,7 +52,11 @@ def compute_chrf(
    Returns a float in the range [0, 100], where 0 indicates no match and 100
    indicates a perfect match.
    """
In retrospect, the [0, 100] range of chrF is odd: as an f-score, it typically ranges from 0 to 1. This must have something to do with the implementation that huggingface-evaluate uses.
Let's correct this by normalizing the scores to 0-1 (i.e., divide by 100) and update the relevant documentation.
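A minimal sketch of that normalization (`normalize_chrf` is a hypothetical helper name):

```python
def normalize_chrf(raw_score: float) -> float:
    # huggingface-evaluate reports chrF on a 0-100 scale; divide by 100
    # so the score lands in the usual 0-1 f-score range
    return raw_score / 100.0
```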
LOADED_METRICS["comet"] = evaluate.load("comet")

comet_metric = LOADED_METRICS["comet"]
gpus = 1 if (torch.cuda.is_available() or torch.backends.mps.is_available()) else 0
This is fine for now, we may want to update this logic if we plan on running this on a cluster (i.e., make it possible to specify the use of gpu or cpu-only regardless of what the device has available).
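A sketch of that cluster-friendly logic (`gpu_count` and its parameters are hypothetical; `accel_available` stands in for `torch.cuda.is_available() or torch.backends.mps.is_available()` so the sketch runs without torch):

```python
def gpu_count(device="auto", accel_available=False):
    # "cpu" and "gpu" force the choice regardless of hardware, which is
    # what a cluster job spec would need; "auto" keeps current behavior
    if device == "cpu":
        return 0
    if device == "gpu":
        return 1
    return 1 if accel_available else 0
```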
Thanks for your review, @laurejt! I've made the following changes:
laurejt
left a comment
When I run evaluate_corpus.py, none of the PyTorch Lightning INFO logging messages are suppressed.
PyTorch Lightning logging is now suppressed. Now only 1 INFO message appears on first evaluation (checkpoint upgrade notice) instead of 6 messages per translation.
PyTorch Lightning does use Python logging, but I think this is a case of not being specific enough with the logger name. I was able to suppress the logging messages with the following two lines:

    logging.getLogger("pytorch_lightning.utilities.rank_zero").setLevel(logging.WARNING)
    logging.getLogger("pytorch_lightning.utilities.migration").setLevel(logging.WARNING)

Try simplifying the code to this and verify that this works for your local dev environment as well.
In case you're curious, I figured out the particular module names by searching for the portions of the INFO logs within the pytorch-lightning repo.
# Suppress PyTorch Lightning INFO messages by redirecting stderr at file descriptor level
# This is necessary because PyTorch Lightning bypasses Python's logging system
_stderr_fd = sys.stderr.fileno()
_original_stderr_fd = os.dup(_stderr_fd)
_devnull_fd = os.open(os.devnull, os.O_WRONLY)
os.dup2(_devnull_fd, _stderr_fd)

try:
This is a lot of complexity for what we're trying to do, see if my suggestion works instead.
finally:
    # Restore original stderr
    os.dup2(_original_stderr_fd, _stderr_fd)
    os.close(_original_stderr_fd)
    os.close(_devnull_fd)
This is a lot of complexity for what we're trying to do, see if my suggestion works instead.
This does look like a much more elegant solution!! Ugh you're so smart
@laurejt It works perfectly. Thank you for digging out this solution!!
Associated Issue(s): resolves #16
Changes in this PR
Include all key changes in this pull request
- evaluate_corpus.py script that generates CSV files containing MT evaluation metrics (ChrF and COMET scores) for machine translation corpora
- Caching of loaded metrics in metrics.py to avoid reloading models for each translation, reducing evaluation time from ~9 seconds per translation to ~0.35 seconds per translation
Notes
- REQUIRED_FIELDS constant documenting expected input fields in MT corpus records
Reviewer Checklist