Add TORCHCODEC_FFMPEG_DIR to specify FFmpeg dynamic libs search path.#1161
Add TORCHCODEC_FFMPEG_DIR to specify FFmpeg dynamic libs search path.#1161bcw222 wants to merge 8 commits intometa-pytorch:mainfrom
Conversation
|
Hi @bcw222! Thank you for your pull request and welcome to our community. Action RequiredIn order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you. ProcessIn order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks! |
|
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks! |
|
Thanks for the PR @bcw222! Let's try to implement a more general solution. Perhaps we could expose a global variable that we pass to Would that work for your workflow? |
|
Hmm, that sounds good, but it would result in a feature that should work out-of-the-box (on devices with the required FFmpeg already installed) still requiring additional manual steps. In a narrow sense, my fix already avoids hardcoding paths as much as possible by reading the Using an environment variable is certainly a good idea, but this pull request is just a quick fix I made after encountering the issue while using I would be happy to refine this further if I have the time, or if someone else is willing to help. |
|
Got it, I understand your concern that an environment variable would require an extra manual step. From our perspective, there are too many package managers to implement specific solutions for each. Our thinking is that an environment variable will provide a general solution that helps simplify your use case and others. Roughly, the implementation would be:
Would you be interested in contributing this change? |
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/meta-pytorch/torchcodec/1161
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ⏳ No Failures, 1 PendingAs of commit cce75db with merge base 910005c ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
This way? |
Dan-Flores
left a comment
There was a problem hiding this comment.
Thanks for updating this @bcw222! I left a comment on avoiding package manager specific logic.
|
Reasonable. But maybe we need to document this. |
Co-authored-by: Dan-Flores <danielflores3@meta.com>
|
This way? |
|
Thanks for the update @bcw222! I'll push a quick change to load the FFmpeg libraries in macOS and linux, since |
| if not ffmpeg_library_paths: | ||
| raise RuntimeError( | ||
| "TORCHCODEC_FFMPEG_DIR is set, but no FFmpeg shared libraries " | ||
| f"were found in {repr(ffmpeg_dir)}." |
| for ffmpeg_library_path in ffmpeg_library_paths: | ||
| ctypes.CDLL(ffmpeg_library_path) |
There was a problem hiding this comment.
This is not ideal: this is loading every single .so file that is present in the TORCHCODEC_FFMPEG_DIR.
If, for example, the user specifies TORCHCODEC_FFMPEG_DIR=$CONDA_PREFIX/lib/ then we're loading hundreds and hundreds of .so files that we'll never use.
We shouldn't load the .so files ourselves, we should continue leaving it up to load_torchcodec_shared_libraries() load all the transitive .so files it needs. What we want is to tell load_torchcodec_shared_libraries() where to find some extra .so files. I.e. I think we just want to update LD_LIBRARY_PATH on linux and its equivalent on MacOS.
Something along these lines:
custom_path = os.environ.get('TORCHCODEC_FFMPEG_LIBRARIES')
if custom_path:
# Prepend so it takes priority, just like LD_LIBRARY_PATH
ld_path = os.environ.get('LD_LIBRARY_PATH', '')
os.environ['LD_LIBRARY_PATH'] = f"{custom_path}:{ld_path}" if ld_path else custom_path
# now call load_torchcodec_shared_libraries() as usualThere was a problem hiding this comment.
That seems reasonable on Linux. However, as far as I know, on macOS, DYLD_LIBRARY_PATH is ignored when using the system Python because of System Integrity Protection (SIP). Should we simply document this limitation and recommend that users use an external Python installation, or should we keep the logic to load the shared libraries manually with ctypes?
There was a problem hiding this comment.
Thanks for pointing this out @bcw222! Unfortunately this behavior on Linux and MacOS means we need to know the names of all the required .so files to avoid loading many unnecessary .so files (as @NicolasHug points out), and we load the .so files before load_torchcodec_shared_libraries().
These introduce complexity and potential for unexpected side effects. Based on our discussion, what we actually want is for TORCHCODEC_FFMPEG_DIR to act as an alias for LD_LIBRARY_PATH.
So reconsidering this approach, we should be able to have FFmpeg libs be correctly detected by setting LD_LIBRARY_PATH / DYLD_LIBRARY_PATH / PATH.
We can update this PR to document this in README.md:
# Conda on Linux
LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"
# Conda on macOS
DYLD_LIBRARY_PATH="$CONDA_PREFIX/lib:$DYLD_LIBRARY_PATH"# Conda on Windows Powershell
$env:PATH = "$env:CONDA_PREFIX\Library\bin;" + "$env:PATH"There was a problem hiding this comment.
Wait, are you sure? On Linux it should be LD_LIBRARY_PATH while on Windows no environment variable equivalent exists. That's why we need TORCHCODEC_FFMPEG_DIR on Windows.
There was a problem hiding this comment.
My apologies, those examples were very incorrect as I sent my message while drafting, I'll fix my previous comment.
My understanding is that the PATH variable is the Windows equivalent to make DLLs discoverable, and the syntax to use it would be very similar to LD_LIBRARY_PATH or DYLD_LIBRARY_PATH.
There was a problem hiding this comment.
On Linux it should be LD_LIBRARY_PATH while on Windows no environment variable equivalent exists. That's why we need TORCHCODEC_FFMPEG_DIR on Windows.
@bcw222 our understanding is that on Windows, the LD_LIBRARY_PATH equivalent is actually the PATH env var. Is that not the case?
There was a problem hiding this comment.
Indeed, but globally modifying PATH would clutter the environment (essentially losing the advantages of using shims like chocolatey provides; for example, when using Tab auto-completion; although Windows is already quite messy in this regard); while manually passing it every time is cumbersome. Especially considering the differences across platforms, I think using a dedicated environment variable (TORCHCODEC_FFMPEG_DIR) to configure it would be better.
However, given that this is essentially just an alias for existing environment variables (PATH/LD_LIBRARY_PATH/DYLD_LIBRARY_PATH), perhaps documenting these considerations for users would be sufficient.
So what solution do you all think is more appropriate?
There was a problem hiding this comment.
Given that the OS differences require us to:
- Know the names of all .so files we want to load on Linux/MacOS (we could grep on
libav*,libsw*, but thats not guaranteed to always work). - Load the library using
ctypeson Linux/MacOS, which diverges from the Windows logic and introduces library loading logic, which adds complexity and potentially side effects.
Lets stick to the standard environmental variables, and add a section documenting how to use them! It can be similar to my edited comment above and list the various PATH variables.
manually passing it every time is cumbersome
Depending on your set up, could you call os.add_dll_directory() before import torchcodec?
There was a problem hiding this comment.
You're right. So we have two solutions: with the original logic untouched, on Windows we could either modify PATH or directly call os.add_dll_directory() before import torchcodec.
Maybe sometimes FFmpeg's DLLs is not in
PATHso we need to add them manually.