Skip to content

Parallel query and PLJava

Chapman Flack edited this page Oct 30, 2016 · 2 revisions

Parallel query and PL/Java

PL/Java 1.5.1 adds support for PostgreSQL 9.6, and with that comes the possibility of using PL/Java functions in parallel queries. Simple testing shows that this actually works; PL/Java functions can even be declared PARALLEL SAFE if they meet the requirements, and executed in the parallelized parts of queries.

However, this is a substantial change to conditions in which PL/Java was developed, so this wiki page is here to collect the notes that are likely to come with experience using this new capability. Such experience might include empirically-determined, good values for parallel_setup_cost, nonobvious cases where a function should not be declared RESTRICTED or SAFE, and so on.


Notes go here


Preview of new documentation

Until PL/Java 1.5.1 is released, here is a preview of the new section of the user's guide.

PL/Java in parallel query or background worker

PL/Java can be used in some background worker processes as introduced in PostgreSQL 9.3, and in parallel queries, from PostgreSQL 9.6, with some restrictions.

Background worker processes

Because PL/Java requires access to a database containing the sqlj schema, PL/Java is only usable in a worker process that initializes a database connection, which must happen before the first use of any function that depends on PL/Java.

Parallel queries

Like any user-defined function, a PL/Java function can be annotated with a level of "parallel safety", UNSAFE by default.

When a function labeled UNSAFE is used in a query, the query cannot be parallelized at all. If a query contains a function labeled RESTRICTED, parts of the query may execute in parallel, but the part that calls the RESTRICTED function will be executed only in the lead process. A function labeled SAFE may be executed in every process participating in the query.

Parallel setup cost

PostgreSQL parallel query processing uses multiple operating-system processes, and these processes are new for each parallel query. If a PL/Java function is labeled PARALLEL SAFE and is pushed by the query planner to run in the parallel worker processes, each new process will start a Java virtual machine. The cost of doing so will reduce the expected advantage of parallel execution.

To inform the query planner of this trade-off, the value of the PostgreSQL configuration variable parallel_setup_cost should be increased. The startup cost can be minimized with attention to the PL/Java VM option recommendations, including class data sharing.

Limits on RESTRICTED/SAFE function behavior

There are stringent limits on what a function labeled RESTRICTED may do, and even more stringent limits on what may be done in a function labeled SAFE. The PostgreSQL manual describes the limits in the section Parallel Labeling for Functions and Aggregates.

While PostgreSQL does check for some inappropriate operations from a PARALLEL SAFE or RESTRICTED function, for the most part it relies on functions being labeled correctly. When in doubt, the conservative approach is to label a function UNSAFE, which can't go wrong. A function mistakenly labeled RESTRICTED or SAFE could produce unpredictable results.

Internal workings of PL/Java

While a given PL/Java function itself may clearly qualify as RESTRICTED or SAFE by inspection, there may still be cases where a forbidden operation results from the internal workings of PL/Java itself. This has not been seen in testing (simple parallel queries with RESTRICTED or SAFE PL/Java functions work fine), but to rule out the possibility would require a careful audit of PL/Java's code. Until then, it would be prudent for any application involving parallel query with RESTRICTED or SAFE PL/Java functions to be first tested in a non-production environment.

Further reading

README.parallel in the PostgreSQL source, for more detail on why parallel query works the way it does.

Clone this wiki locally