Even with tensorflow-gpu, the program puts a heavy load on the CPU (one core).
Find out which parts are slow and how well they can be optimized:
- Quite a few numpy operations could probably be done in-place, but in general I would not expect numpy operations on arrays of shape (1280, 720, 4) to be slow.
- Passing the frame around should only pass a reference, so it should not be slow by itself.
- How well do the cv2 operations perform?
- Is there a bottleneck in passing data between tensorflow and cv2/numpy?
- Is there data that should be global (created once) but is recreated in every iteration of the main loop?
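
A first step toward finding the slow parts could be Python's built-in `cProfile`. The sketch below profiles a hypothetical `process_frame` function (a stand-in for the body of the main loop, not a name from the original program) and returns the report sorted by cumulative time. cProfile only sees calls made from Python: time spent inside a cv2 or tensorflow C function shows up as that single built-in call, which is still enough to see which call dominates.

```python
import cProfile
import io
import pstats


def profile_calls(func, *args, n_iterations=100, top=15):
    """Call func(*args) n_iterations times under cProfile and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(n_iterations):
        func(*args)
    profiler.disable()

    # Sort by cumulative time so the most expensive call chains are listed first.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


def process_frame(frame):
    # Hypothetical stand-in for one iteration of the main loop
    # (cv2 preprocessing, tensorflow inference, numpy postprocessing).
    return [[value * 2 for value in row] for row in frame]


if __name__ == "__main__":
    dummy_frame = [[1] * 64 for _ in range(64)]
    print(profile_calls(process_frame, dummy_frame, n_iterations=20))
```

Without changing any code, `python -m cProfile -s cumulative main.py` prints a similar report for the whole program (`main.py` is a placeholder for the actual script name).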
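
On the in-place point: an expression such as `frame * factor` allocates a fresh result array, which for a float32 array of shape (1280, 720, 4) is about 14.7 MB per temporary, while passing `out=` (or using an augmented assignment such as `*=`) reuses the existing buffer. The function names below are illustrative, not taken from the program.

```python
import numpy as np


def darken_copy(frame, factor):
    # `frame * factor` allocates a new array for the result on every call.
    return frame * factor


def darken_inplace(frame, factor):
    # Writes the result back into `frame`'s own buffer, so nothing new is allocated.
    # Equivalent to `frame *= factor`; `frame` must already have a float dtype,
    # because an in-place float multiply into a uint8 array raises a casting error.
    np.multiply(frame, factor, out=frame)
    return frame


if __name__ == "__main__":
    frame = np.ones((1280, 720, 4), dtype=np.float32)
    copied = darken_copy(frame, 0.5)
    print(np.shares_memory(copied, frame))  # False: a new ~14.7 MB array
    same = darken_inplace(frame, 0.5)
    print(same is frame)                    # True: the input buffer was reused
```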
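
Whether the frame is really passed by reference can be checked with `np.shares_memory`. The minimal sketch below uses a hypothetical `annotate` function. It also shows that basic slicing returns a view, whereas turning a non-contiguous slice into a contiguous array with `np.ascontiguousarray` creates a copy. That last case may be worth watching for, since code that requires contiguous input can make such copies implicitly.

```python
import numpy as np


def annotate(frame):
    # Receives a reference to the caller's array; nothing is copied here.
    frame[0, 0] = 255
    return frame


if __name__ == "__main__":
    frame = np.zeros((1280, 720, 4), dtype=np.uint8)
    result = annotate(frame)
    print(result is frame)                  # True: same object
    print(np.shares_memory(result, frame))  # True
    view = frame[:, :, :3]                  # basic slicing creates a view, not a copy
    print(np.shares_memory(view, frame))    # True
    copy = np.ascontiguousarray(view)       # non-contiguous view -> contiguous copy
    print(np.shares_memory(copy, frame))    # False
```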
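
To answer the cv2 and tensorflow questions with numbers, each stage of the loop can be timed separately. `StageTimer` below is a hypothetical helper, not part of either library, and the stage names are whatever the caller chooses. One assumption to keep in mind: GPU work may run asynchronously, so the timed tensorflow stage should include the step that fetches the result to the host (for example `.numpy()` in eager mode). Otherwise the waiting time can show up in whichever stage touches the result next and look like a cv2/numpy bottleneck.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage of the main loop."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        """Return (name, total_seconds, mean_ms) tuples, slowest stage first."""
        rows = [
            (name, total, 1000.0 * total / self.counts[name])
            for name, total in self.totals.items()
        ]
        return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    timer = StageTimer()
    for _ in range(5):
        with timer.stage("cv2"):         # stand-in for resize / color conversion
            time.sleep(0.001)
        with timer.stage("tensorflow"):  # stand-in for inference + fetching the result
            time.sleep(0.005)
    for name, total, mean_ms in timer.report():
        print(f"{name:>10}: {total:.3f} s total, {mean_ms:.2f} ms per frame")
```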
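
For the last question, a common pattern is to hoist anything that does not change between frames (buffers, constants, kernels, model objects) out of the loop. The sketch assumes the tensorflow input is a float32 version of a uint8 frame scaled to [0, 1]; that preprocessing and the `make_normalizer` name are illustrative assumptions, not details of the original program.

```python
import numpy as np


def make_normalizer(shape, dtype=np.float32):
    """Allocate the output buffer once and return a function that reuses it per frame."""
    buffer = np.empty(shape, dtype=dtype)  # created once, outside the main loop
    scale = 1.0 / 255.0

    def normalize(frame_uint8):
        # uint8 -> float conversion and scaling, written into the same buffer every call.
        np.multiply(frame_uint8, scale, out=buffer, dtype=buffer.dtype)
        return buffer

    return normalize


if __name__ == "__main__":
    normalize = make_normalizer((1280, 720, 4))  # once, before the loop
    for _ in range(3):  # stand-in for the main loop
        frame = np.random.randint(0, 256, size=(1280, 720, 4), dtype=np.uint8)
        batch = normalize(frame)
```

Because the same buffer is returned on every call, its contents are overwritten by the next frame; a result that must outlive the iteration has to be copied explicitly.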