A high-throughput inference engine for Tr with C++ extensions, updated for environments like Ubuntu 20.04+.
The web service depends on three third-party repositories. Each adopts a header-only design, which is convenient for web-service development.
Let's just keep it simple, and fast.
Note: this project is still in early development and not ready for production use. It may be finished in several days.
Since the release files from Tr contain only `.so` files, we cannot see how it was implemented. However, following the tutorials and example scripts it offers, we may infer that:

- The `tr_run` function can be executed in a multi-threaded environment, which is vital for reducing inference latency and improving throughput.
- The input of `tr_run` can be either a path to a local image or a pointer to an ndarray. For ease of development, I adopt the local image path (`const char*`) as input; the other kind of input will be tried later.
git clone & fetch submodules:

```bash
git clone https://github.com/SamuraiBUPT/TrWebOCR.cpp.git
cd TrWebOCR.cpp
git submodule update --init --recursive
```

And then build the project:

```bash
mkdir build && cd build

# without GPU
cmake -DUSE_GPU=OFF ..

# with GPU
cmake -DUSE_GPU=ON ..

# compile
make
```

If there is no error, you can run the server:

```bash
./main
```

As for requests, you may use Python:

```bash
cd scripts
python test_api.py
```

- Support inference
- GPU support
- Flexible image serving (In progress)
- Support Chinese OCR.
- Image rotation C++ implement.
CPU Mode
- Num=100 requests
- Concurrency=20
GPU Mode
- Num=100 requests
- Concurrency=20
The C++ backend seems to offer a more stable service than the Tornado backend.
And the GPU utilization comparison:
GPU: C++ backend

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 2080 Ti On | 00000000:B2:00.0 Off | N/A |
| 33% 45C P2 135W / 250W | 2673MiB / 11264MiB | 73% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
```
GPU: Tornado backend

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 2080 Ti On | 00000000:B2:00.0 Off | N/A |
| 33% 45C P2 135W / 250W | 875MiB / 11264MiB | 39% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
```
I think GPU utilization plays a role here: the Tornado backend drives the GPU to only 39% while the C++ backend reaches 73%, and that can be one of the reasons the C++ backend is faster.
Tested on 10,000 requests, concurrency=20, with the same image dataset, in GPU mode.


