-
Notifications
You must be signed in to change notification settings - Fork 77
Parallel query and PLJava
PL/Java 1.5.1 adds support for PostgreSQL 9.6, and with that comes the
possibility of using PL/Java functions in parallel queries. Simple testing shows
that this actually works; PL/Java functions can even be declared PARALLEL SAFE
if they meet the requirements, and executed in the parallelized parts of
queries.
However, this is a substantial change to conditions in which PL/Java was
developed, so this wiki page is here to collect the notes that are likely to
come with experience using this new capability. Such experience might include
empirically-determined, good values for parallel_setup_cost, nonobvious cases
where a function should not be declared RESTRICTED or SAFE, and so on.
Until PL/Java 1.5.1 is released, here is a preview of the new section of the user's guide.
PL/Java can be used in some background worker processes as introduced in PostgreSQL 9.3, and in parallel queries, from PostgreSQL 9.6, with some restrictions.
Because PL/Java requires access to a database containing the sqlj schema,
PL/Java is only usable in a worker process that initializes a database
connection, which must happen before the first use of any function that
depends on PL/Java.
Like any user-defined function, a PL/Java function can be
annotated with a level of "parallel safety", UNSAFE by default.
When a function labeled UNSAFE is used in a query, the query cannot be
parallelized at all. If a query contains a function labeled RESTRICTED, parts
of the query may execute in parallel, but the part that calls the RESTRICTED
function will be executed only in the lead process. A function labeled SAFE
may be executed in every process participating in the query.
PostgreSQL parallel query processing uses multiple operating-system processes,
and these processes are new for each parallel query. If a PL/Java function is
labeled PARALLEL SAFE and is pushed by the query planner to run in the
parallel worker processes, each new process will start a Java virtual machine.
The cost of doing so will reduce the expected advantage of parallel execution.
To inform the query planner of this trade-off, the value of the PostgreSQL
configuration variable parallel_setup_cost should be increased.
The startup cost can be minimized with attention to the
PL/Java VM option recommendations, including class data sharing.
There are stringent limits on what a function labeled RESTRICTED may do,
and even more stringent limits on what may be done in a function labeled SAFE.
The PostgreSQL manual describes the limits in the section
Parallel Labeling for Functions and Aggregates.
While PostgreSQL does check for some inappropriate operations from a
PARALLEL SAFE or RESTRICTED function, for the most part it relies on
functions being labeled correctly. When in doubt, the conservative approach
is to label a function UNSAFE, which can't go wrong. A function mistakenly
labeled RESTRICTED or SAFE could produce unpredictable results.
While a given PL/Java function itself may clearly qualify as RESTRICTED or
SAFE by inspection, there may still be cases where a forbidden operation
results from the internal workings of PL/Java itself. This has not been seen
in testing (simple parallel queries with RESTRICTED or SAFE PL/Java
functions work fine), but to rule out the possibility would require a careful
audit of PL/Java's code. Until then, it would be prudent for any application
involving parallel query with RESTRICTED or SAFE PL/Java functions
to be first tested in a non-production environment.
README.parallel in the PostgreSQL source, for more detail on why parallel query works the way it does.