Need to retry jobs that died with ProcessPrunedError

We sometimes get `ProcessPrunedError`s for jobs that were running on workers that were killed (in Kubernetes). We need a way to have those jobs retried.

`on_thread_error` seems like it might be appropriate, but it is unclear what the lifecycle would look like when using it for retries, since the examples all cover only error capture.