diff --git a/admin_manual/ai/app_live_transcription.rst b/admin_manual/ai/app_live_transcription.rst
index ab2a872a5c4..832f6f86565 100644
--- a/admin_manual/ai/app_live_transcription.rst
+++ b/admin_manual/ai/app_live_transcription.rst
@@ -1,13 +1,14 @@
-==============================================================
-App: Live Transcription in Nextcloud Talk (live_transcription)
-==============================================================
+==============================================================================
+App: Live Transcription and Translation in Nextcloud Talk (live_transcription)
+==============================================================================
 
 .. _ai-live-transcription:
 
-This app provides live transcription of speech in Nextcloud Talk calls using open source AI models provided by `Vosk <https://alphacephei.com/vosk>`_.
-The transcription is done on your own server, preserving your privacy and data sovereignty.
+| This app provides live transcription and translation of speech in Nextcloud Talk calls using open source AI models provided by `Vosk <https://alphacephei.com/vosk>`_.
+| The transcription is done on your own server, preserving your privacy and data sovereignty, while the translation is done using a translation task processing provider such as the :ref:`translate2 app `. The `OpenAI and LocalAI integration <https://apps.nextcloud.com/apps/integration_openai>`_ and `DeepL integration <https://apps.nextcloud.com/apps/integration_deepl>`_ apps will soon also be supported for translation.
 
-A good set of language models are auto-downloaded. They include Arabic, Arabic (Tunisian), Breton, Catalan, Czech, German, English, Esperanto, Spanish, Persian (Farsi), French, Hindi, Italian, Japanese, Kazakh, Korean, Dutch, Polish, Portuguese (Brazilian), Russian, Telegu, Tajik, Turkish, Ukrainian, Uzbek, Vietnamese and Chinese.
+| A good set of language models for transcription is auto-downloaded. They include Arabic, Arabic (Tunisian), Breton, Catalan, Czech, German, English, Esperanto, Spanish, Persian (Farsi), French, Hindi, Italian, Japanese, Kazakh, Korean, Dutch, Polish, Portuguese (Brazilian), Russian, Telugu, Tajik, Turkish, Ukrainian, Uzbek, Vietnamese and Chinese.
+| The translation capabilities depend on the installed translation task processing provider app. A list of translation-capable apps can be found :ref:`here ` in the "Backend apps" section.
 
 Installation
 ------------
@@ -24,21 +25,42 @@ Installation
       --env LT_INTERNAL_SECRET=1234 \
       --wait-finish
 
+.. important::
 
-.. note::
+    The environment variables ``LT_HPB_URL`` and ``LT_INTERNAL_SECRET`` must be set in the :ref:`Deploy Options <ai-app_api_deploy_options>` during installation,
+    and the High-Performance Backend must be functionally configured in Nextcloud Talk settings for the app to work.
 
-    Environment variables and mounts can be set during the app installation from the "Deploy Options" button.
-    The models are stored in a persistent volume at ``/nc_app_live_transcription_data``.
-    This volume is created automatically during the installation but you can also mount your own volume there.
-    As the name suggests, this volume is persistent and will not be deleted when the app is updated or uninstalled
-    (without removing data).
+    To change these environment variables after installation, uninstall the app and install it again with the new values.
 
+5. Install a translation task processing provider app from the "Backend apps" section :ref:`here ` to enable translation.
 
-.. important::
+Requirements
+------------
 
-    The environment variables ``LT_HPB_URL`` and ``LT_INTERNAL_SECRET`` must be set in the Deploy Options,
-    and the High-Performance Backend must be functionally configured in Nextcloud Talk settings for the app to work.
+* Minimum Nextcloud version: 33
+* Nextcloud AIO is supported
+* We currently support NVIDIA GPUs and x86_64 CPUs. CPU-only transcription is also supported and works well on modern x86 CPUs.
+* CUDA >= v12.4.1 on your host system for GPU-based transcription
+* GPU Sizing
+
+  * An NVIDIA GPU with at least 10 GB VRAM
+  * 16 GB of system RAM should be enough for one or two concurrent calls
+
+* CPU Sizing
+
+  * x86 CPU with 4 threads, plus 2 additional threads per concurrent call
+  * 16 GB of RAM should be enough for one or two concurrent calls
+
+* Space usage
+
+  * ~ 2.8 GB for the Docker container
+  * ~ 6.0 GB for the default models
+
+.. note::
+    We currently have very little real-world experience running this software on production instances.
+    The above sizing recommendations come from our estimates and are not real-world benchmarks.
+    Actual requirements will vary based on factors such as the number of concurrent calls, audio quality, and selected languages.
+    Please do thorough testing to confirm your hardware meets your needs.
 
 App store
 ---------
@@ -59,3 +81,4 @@ Limitations
 * The app currently supports only a limited number of languages. More languages may be added in the future.
 * The languages other than English may have lower accuracy mainly due to the shipped models being smaller.
 * The app currently does not support punctuation in the transcription.
+* The `OpenAI and LocalAI integration <https://apps.nextcloud.com/apps/integration_openai>`_ and `DeepL integration <https://apps.nextcloud.com/apps/integration_deepl>`_ apps are not yet supported for translation.
diff --git a/admin_manual/ai/overview.rst b/admin_manual/ai/overview.rst
index 30e0c288cb8..bf30798623a 100644
--- a/admin_manual/ai/overview.rst
+++ b/admin_manual/ai/overview.rst
@@ -137,13 +137,14 @@ Frontend apps
 * *Text* for offering the translation menu
 * `Assistant <https://apps.nextcloud.com/apps/assistant>`_ offering a graphical translation UI
 * `Analytics <https://apps.nextcloud.com/apps/analytics>`_ for translating graph labels
+* `Talk <https://apps.nextcloud.com/apps/spreed>`_ for translating chat messages and providing live translation in calls in conjunction with the :ref:`Live Transcription app <ai-live-transcription>`
 
 Backend apps
 ~~~~~~~~~~~~
 
 * :ref:`translate2 (ExApp)` - Runs open source AI translation models locally on your own server hardware (Customer support available upon request)
 * `OpenAI and LocalAI integration (via OpenAI API) <https://apps.nextcloud.com/apps/integration_openai>`_ - Integrates with the OpenAI API to provide AI functionality from OpenAI servers (Customer support available upon request; see :ref:`AI as a Service`)
-* *integration_deepl* - Integrates with the deepl API to provide translation functionality from Deepl.com servers (Only community supported)
+* `DeepL integration <https://apps.nextcloud.com/apps/integration_deepl>`__ - Integrates with the DeepL API to provide translation functionality from DeepL.com servers (Only community supported)
 
 Speech-To-Text
 ^^^^^^^^^^^^^^
diff --git a/admin_manual/exapps_management/AdvancedDeployOptions.rst b/admin_manual/exapps_management/AdvancedDeployOptions.rst
index 1157804f90f..b1c1609e517 100644
--- a/admin_manual/exapps_management/AdvancedDeployOptions.rst
+++ b/admin_manual/exapps_management/AdvancedDeployOptions.rst
@@ -2,6 +2,8 @@
 Advanced Deploy Options
 =======================
 
+.. _ai-app_api_deploy_options:
+
 AppAPI allows optionally to configure environment variables and mounts for the ExApp container.
 It is available via "Deploy options" modal next to "Deploy and Enable" button in the sidebar of the ExApp page on the Apps management page:
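
For reference, the registration command abridged in the second hunk (only its final ``--env LT_INTERNAL_SECRET=1234 \`` and ``--wait-finish`` lines appear as context) might look roughly like the sketch below once expanded. This is an illustration only, not the manual's exact wording: the daemon configuration name ``docker_install``, the signaling URL and the secret value ``1234`` are placeholders, and the authoritative command is the one in the rendered documentation::

    # Hypothetical expansion of the abridged occ command from the hunk above.
    # "docker_install", the URL and the secret are placeholder values:
    # LT_HPB_URL should point at your High-Performance Backend and
    # LT_INTERNAL_SECRET must match the secret the app is configured to expect.
    sudo -u www-data php occ app_api:app:register live_transcription docker_install \
        --env LT_HPB_URL=https://cloud.example.com/standalone-signaling \
        --env LT_INTERNAL_SECRET=1234 \
        --wait-finish

When installing from the App store instead, the same two variables are entered in the "Deploy options" modal that the newly added ``ai-app_api_deploy_options`` label points to.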