A plugin is a Python script that transforms a payload during dataset generation. Plugins are typically used to assess transformation-based jailbreaking techniques or to convert prompts into a target-friendly format.
Sample plugins can be found within the workspace/plugins/ directory, created by running spikee init. Further information about built-in plugins and usage examples can be found in Built-in Plugins.
Both Plugins and Dynamic Attacks can generate variations of a payload, but they serve different purposes in the testing workflow:
- Plugins (Pre-Test Transformation):
  - When they run: During spikee generate.
  - What they do: Create multiple variations of a payload. Each variation is saved as a separate, independent entry in the final dataset file.
  - Result: When you run spikee test, every single variation generated by the plugin is tested against the target. This is useful for systematically evaluating a target's resilience to a known set of transformations (e.g., "Is the target vulnerable to Base64 encoding? To Leetspeak?").
- Dynamic Attacks (Real-Time Transformation):
  - When they run: During spikee test, but only if the initial, standard prompt fails.
  - What they do: Generate and test variations one by one in real-time. The attack stops as soon as a variation succeeds.
  - Result: Only the first successful variation (or the final failed attempt) is logged. This is useful for efficiently finding any successful bypass, rather than testing every possible variation.
In short, use Plugins to build a comprehensive dataset of known transformations. Use Dynamic Attacks to find a single successful bypass with adaptive, real-time logic.
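The contrast between the two workflows can be sketched in plain Python. This is an illustration only, not Spikee's implementation: run_every_variation, dynamic_attack, and toy_target are hypothetical names, and the "target" is a toy predicate rather than a real model.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def run_every_variation(
    variations: Iterable[str], is_bypassed: Callable[[str], bool]
) -> List[Tuple[str, bool]]:
    """Plugin-style workflow: every pre-generated variation is tested and logged."""
    return [(v, is_bypassed(v)) for v in variations]


def dynamic_attack(
    variations: Iterable[str], is_bypassed: Callable[[str], bool]
) -> Optional[Tuple[str, bool]]:
    """Dynamic-attack-style workflow: stop at the first success, log one result."""
    last = None
    for v in variations:
        last = (v, is_bypassed(v))
        if last[1]:
            break  # first successful variation ends the attack
    return last  # first success, or the final failed attempt


def toy_target(payload: str) -> bool:
    """Hypothetical target that is 'bypassed' only by upper-case payloads."""
    return payload.isupper()


payloads = ["ignore rules", "IGNORE RULES", "1gn0r3 rul35"]
all_results = run_every_variation(payloads, toy_target)  # three logged results
first_hit = dynamic_attack(payloads, toy_target)         # one logged result
```

In this toy run the plugin-style loop records an outcome for all three payloads, while the dynamic loop stops at the second payload and never tries the third.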
Every plugin is a Python module located in the plugins/ directory of your workspace. Spikee identifies plugins by their filename.
    from typing import List, Optional, Tuple, Union

    from spikee.templates.plugin import Plugin
    from spikee.templates.basic_plugin import BasicPlugin
    from spikee.utilities.enums import ModuleTag
    from spikee.utilities.hinting import ModuleDescriptionHint, ModuleOptionsHint, Content


    class SamplePlugin(Plugin):
        def get_description(self) -> ModuleDescriptionHint:
            """Returns the type and a short description of the plugin."""
            return [], "A brief description of what this plugin does."

        def get_available_option_values(self) -> ModuleOptionsHint:
            """Return supported plugin options; Tuple[options (default is first), llm_required]"""
            return [], False

        def transform(
            self,
            content: Content,  # To restrict the accepted content types, annotate with str or the Audio/Image subclasses of Content
            exclude_patterns: Optional[List[str]] = None,
            plugin_option: str = "",
        ) -> Union[Content, List[Content]]:
            """Transforms the input according to the user-defined logic, returning one or more variations.

            Args:
                content (Content): The input prompt to transform.
                exclude_patterns (List[str], optional): Regex patterns for substrings to preserve.
                plugin_option (str, optional): A string option passed from the command line for custom behavior.

            Returns:
                Union[Content, List[Content]]: One transformed payload, or a list of variations.
            """
            # Your implementation here...


    class SampleBasicPlugin(BasicPlugin):
        def get_description(self) -> ModuleDescriptionHint:
            """Returns the type and a short description of the plugin."""
            return [], "A brief description of what this plugin does."

        def get_available_option_values(self) -> ModuleOptionsHint:
            """Return supported plugin options; Tuple[options (default is first), llm_required]"""
            return [], False

        def plugin_transform(
            self,
            text: str,
            plugin_option: str = "",
        ) -> str:
            """Transforms the input text according to the user-defined logic, returning a single variation.

            Args:
                text (str): The input prompt to transform.
                plugin_option (str, optional): A string option passed from the command line for custom behavior.

            Returns:
                str: The transformed text.
            """
            # Your implementation here...

The transform method is the core function of every plugin. It receives a payload and returns one or more transformed versions.
Parameters:
- content: Content: The input payload, which is typically a combination of a jailbreak and a malicious instruction.
- exclude_patterns: List[str]: A list of regular expression patterns. Your plugin must not transform any part of the content that matches one of these patterns. This is critical for preserving sensitive parts of a prompt, like URLs or specific keywords.
- plugin_option: str (Optional): A string passed from the command line via --plugin-options (e.g., "my_plugin:mode=full;variants=10"). If your plugin doesn't need configuration, you can omit this parameter.

Return value:
- str: Return a single transformed string. Spikee will create one new test case from this.
- List[str]: Return a list of transformed strings. Spikee will create a separate test case for each string in the list, allowing you to test multiple variations at once.
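As a rough illustration of the plugin_option parameter and the two return types described above, the sketch below parses an option string and returns either one string or a list. It assumes that plugin_option arrives as semicolon-separated key=value pairs (the part after my_plugin: in the example); parse_plugin_option and toy_transform are hypothetical helpers, not part of Spikee, and exclude_patterns handling is left out for brevity.

```python
from typing import Dict, List, Union


def parse_plugin_option(plugin_option: str) -> Dict[str, str]:
    """Parse 'mode=full;variants=10' into {'mode': 'full', 'variants': '10'}."""
    options: Dict[str, str] = {}
    for pair in plugin_option.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            options[key.strip()] = value.strip()
    return options


def toy_transform(text: str, plugin_option: str = "") -> Union[str, List[str]]:
    """Return one string by default, or a list when 'variants' is greater than 1."""
    options = parse_plugin_option(plugin_option)
    variants = int(options.get("variants", "1"))
    if variants <= 1:
        return text.upper()  # single string: one new test case
    # Each list element would become its own test case.
    return [f"{text.upper()} (variant {i + 1})" for i in range(variants)]
```

Calling toy_transform("hi") yields a single string, while toy_transform("hi", "variants=2") yields a two-element list.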
For more advanced plugins, you can accept a configuration string and advertise the available options. Both must be implemented as methods of the plugin class; standalone functions are not supported in the current OOP API.
    from typing import List, Union, Optional
    from spikee.utilities.hinting import Content, ModuleOptionsHint

    # Both functions below are methods of your plugin class (class definition omitted here).
    def get_available_option_values(self) -> ModuleOptionsHint:
        """Return supported plugin options; Tuple[options (default is first), llm_required]"""
        return ["mode=strict", "mode=full"], False  # "mode=strict" is the default

    def transform(self, content: Content, exclude_patterns: Optional[List[str]] = None, plugin_option: str = "") -> Union[Content, List[Content]]:
        """Transforms the payload based on the provided option."""
        # Your transformation logic here...

For more advanced plugins, you can support plugin_options by implementing the get_available_option_values function. By default, it should return None, indicating no options are supported.
    from typing import List, Optional, Union

    from spikee.templates.plugin import Plugin
    from spikee.utilities.hinting import Content, ModuleOptionsHint


    class SamplePlugin(Plugin):
        def get_available_option_values(self) -> ModuleOptionsHint:
            """Return supported plugin options; Tuple[options (default is first), llm_required]"""
            return ["mode=strict", "mode=full"], False  # "mode=strict" is the default

        def transform(
            self,
            content: Content,
            exclude_patterns: Optional[List[str]] = None,
            plugin_option: str = "",
        ) -> Union[Content, List[Content]]:
            # Your implementation here...
            raise NotImplementedError

Correctly handling exclude_patterns is the most important part of writing a robust plugin. You must leave the excluded parts of the string completely untouched. The recommended way to do this is with re.split, as implemented within the BasicPlugin.
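What makes re.split suitable here is that a capturing group in the pattern causes the matched delimiters to be kept in the result. A small standalone demonstration, using a hypothetical URL pattern and sample text chosen only for illustration:

```python
import re

exclude_patterns = [r"https?://\S+"]  # hypothetical pattern: preserve URLs
text = "visit https://example.com now"

# Wrapping the alternation in one capturing group keeps the matches in the output.
combined_pattern = "(" + "|".join(exclude_patterns) + ")"
chunks = re.split(combined_pattern, text)
# chunks == ["visit ", "https://example.com", " now"]

# Even indexes hold normal text, odd indexes hold the excluded matches.
normal = chunks[0::2]
excluded = chunks[1::2]
```

The even/odd indexing only holds if the individual patterns contain no capturing groups of their own: a pattern such as (foo|bar) adds extra items to the list. Writing such patterns with non-capturing groups, (?:foo|bar), avoids this.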
    # Example transformation function converting all text to uppercase with exclude_patterns support
    import re
    from typing import List, Union, Optional
    from spikee.utilities.hinting import Content, get_content


    def transform(self, content: Content, exclude_patterns: Optional[List[str]] = None) -> Union[Content, List[Content]]:
        text = get_content(content)  # Unwrap Content wrapper to get the raw string

        if not exclude_patterns:
            # No exclusions, transform the whole text
            return apply_transformation(text)

        # 1. Create a single regex pattern that captures any of the exclude patterns.
        #    The parentheses around the pattern are crucial for re.split to keep the delimiters.
        #    Note: the patterns themselves must not contain capturing groups, or the
        #    even/odd indexing below no longer holds.
        combined_pattern = "(" + "|".join(exclude_patterns) + ")"

        # 2. Split the text by the combined pattern.
        #    Even-indexed chunks are normal text; odd-indexed chunks are the exclusions.
        chunks = re.split(combined_pattern, text)

        # 3. Transform only the non-excluded chunks.
        transformed_chunks = []
        for i, chunk in enumerate(chunks):
            if i % 2 == 0:
                # This is normal text, apply the transformation
                transformed_chunks.append(apply_transformation(chunk))
            else:
                # This is an excluded part, keep it as is
                transformed_chunks.append(chunk)

        # 4. Rejoin the chunks into a single string.
        return "".join(transformed_chunks)


    def apply_transformation(text: str) -> str:
        return text.upper()

Plugins can output non-text content types by returning Audio or Image objects. This is how TTS (text-to-speech) and image-generation plugins work. When a plugin returns a Content subclass, the generator updates the dataset entry's content_type field accordingly so that targets and judges can handle it correctly.
Content-type routing: The generator inspects the plugin's transform (or plugin_transform) parameter annotations to decide whether to call it:
- A content: Content parameter annotation means the plugin accepts any content type.
- A content: str (or text: str) parameter annotation means the plugin only accepts text; the generator will skip it for audio/image entries.
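One way such annotation-based routing could work is sketched below with the standard inspect module. This is an assumption about the mechanism, not Spikee's actual code: Content is a local stand-in class, and accepts_non_text is a hypothetical helper.

```python
import inspect


class Content:
    """Local stand-in for spikee.utilities.hinting.Content."""


def accepts_non_text(func) -> bool:
    """Return True unless the first payload parameter is annotated as str.

    In this sketch an unannotated parameter is treated as accepting any content.
    """
    params = [
        param
        for name, param in inspect.signature(func).parameters.items()
        if name != "self"
    ]
    if not params:
        return False
    return params[0].annotation is not str


def any_content_transform(self, content: Content, plugin_option: str = ""):
    return content


def text_only_transform(self, text: str, plugin_option: str = ""):
    return text


# A generator following this scheme would skip text_only_transform for audio/image entries.
routing = {
    "any_content_transform": accepts_non_text(any_content_transform),
    "text_only_transform": accepts_non_text(text_only_transform),
}
```

Note that this simple identity check would not work for modules that use from __future__ import annotations, where annotations are stored as strings; typing.get_type_hints resolves them in that case.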
    from typing import Optional, List

    from spikee.templates.plugin import Plugin
    from spikee.utilities.enums import ModuleTag
    from spikee.utilities.hinting import Audio, Content, get_content, ModuleDescriptionHint, ModuleOptionsHint


    class MyTTSPlugin(Plugin):
        """Example plugin that converts text to audio using a TTS service."""

        def get_description(self) -> ModuleDescriptionHint:
            return [ModuleTag.SINGLE], "Converts text payload to Audio via TTS"

        def get_available_option_values(self) -> ModuleOptionsHint:
            return ["voice=alloy", "voice=nova"], True  # Requires LLM/TTS provider

        def transform(
            self,
            content: str,  # Annotate as str: only receives text entries
            exclude_patterns: Optional[List[str]] = None,
            plugin_option: str = "",
        ) -> Audio:
            text = get_content(content)
            # ... call TTS API to get base64-encoded audio bytes ...
            # call_tts_api is a placeholder: replace it with your own TTS client call.
            audio_bytes_b64 = call_tts_api(text)
            return Audio(audio_bytes_b64)

See spikee/plugins/tts.py and spikee/plugins/text2image.py for full reference implementations.
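When experimenting without a TTS provider, the call_tts_api placeholder in the example above could be swapped for a local stub. The sketch below is one such stand-in (fake_tts_b64 is a hypothetical name): it builds a short silent WAV file with the standard library and returns it base64-encoded. Whether a given target accepts WAV audio is an assumption you would need to check.

```python
import base64
import io
import wave


def fake_tts_b64(text: str, sample_rate: int = 8000) -> str:
    """Return a base64-encoded silent WAV whose length grows with the text length."""
    n_frames = max(1, len(text)) * sample_rate // 20  # roughly 50 ms per character
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * n_frames)  # silence
    return base64.b64encode(buffer.getvalue()).decode("ascii")


audio_bytes_b64 = fake_tts_b64("hello")
```

The resulting string can be passed wherever the example uses the value returned by call_tts_api.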