This repository contains the code and resources for the paper Appeal, Align, Divide? Stance Detection for Group-Directed Messages in German Parliamentary Debates.
- Title: Appeal, Align, Divide? Stance Detection for Group-Directed Messages in German Parliamentary Debates
- Authors: Ines Rehbein, Maris Leander Buttmann, Julian Schlenker and Simone Paolo Ponzetto
- Institutions: University of Münster, University of Mannheim
- Supplementary Material: The pre-built database and other materials can be found here.
- To run the local LLM (`gemma-3-27b-it`), a high-performance GPU is required. The minimum requirement is an NVIDIA H100 NVL GPU with 94 GB of VRAM.
- API Keys: Create a `secrets.json` file in the project's root directory. It must contain your API keys in the following format:
  ```json
  {
      "gemini_api_key": "YOUR_GEMINI_API_KEY",
      "huggingface_api_key": "YOUR_HUGGINGFACE_API_KEY"
  }
  ```
- Project Path: The scripts use relative paths assuming the project folder (`stance-detection-german-llm`) is placed directly in your system's home directory (e.g., `~/stance-detection-german-llm`).
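Reading the keys back can be done with Python's standard `json` module. The helper below is an illustrative sketch (the function `load_secrets` is not part of the repository), assuming the file layout shown above:

```python
import json
from pathlib import Path

def load_secrets(path: str = "secrets.json") -> dict:
    """Read API keys from the project's secrets.json file."""
    secrets = json.loads(Path(path).read_text(encoding="utf-8"))
    # Fail early if a required key is missing, rather than mid-run.
    for key in ("gemini_api_key", "huggingface_api_key"):
        if key not in secrets:
            raise KeyError(f"Missing '{key}' in {path}")
    return secrets
```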
A database is required to run the classification scripts.
It is highly recommended to download the pre-built databases from here to save time.
Alternatively, you can build the database from scratch:
- Download the German parliamentary debates from here.
- Run the Jupyter notebook `save_plenary_minutes.ipynb` to parse the debates and build the initial database.
Note: This step is only necessary if you wish to re-classify the group mentions. The pre-built database "debates_with_group_mentions" already includes these classifications.
- Download the fine-tuned classifier (`bert-base-german-cased-finetuned-MOPE-L3_Run_3_Epochs_29`) from the official MOPE repository.
- Create a `models/` folder in the project's root directory and place the downloaded classifier inside it.
- Run the extraction script. The `--reset_db` argument must be passed to clear existing data from the relevant tables:
  ```shell
  python extract-group-mention/extract_group_mention.py --reset_db
  ```
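The `--reset_db` flag used here (and again by the annotation-processing script below) follows a common `argparse` pattern. This is a minimal sketch of that pattern, not the repository's actual argument parsing:

```python
import argparse

def parse_args(argv=None):
    """Parse CLI arguments; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="Extract group mentions.")
    # store_true makes --reset_db an opt-in boolean flag (default: False).
    parser.add_argument("--reset_db", action="store_true",
                        help="Clear existing data from the relevant tables first.")
    return parser.parse_args(argv)
```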
This section outlines the process for extracting, inserting, and evaluating annotation data.
To extract a sample of data for annotation, run the following script:

```shell
python data-processing/extract_annotation_data.py
```

To insert manually annotated data back into the database:
- Place the annotators' completed files into the `/data/annotated_data/` folder.
- Run the processing script with the `--reset_db` argument. This will reset the corresponding tables before inserting the new data:
  ```shell
  python data-processing/process_annotated_data.py --reset_db
  ```
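Before running the processing script, it can help to verify that the annotators' files are actually in place. A small sketch (the folder path follows the step above; the check itself is not part of the repository):

```python
from pathlib import Path

def list_annotated_files(folder: str = "data/annotated_data") -> list:
    """Return the annotator files found in the annotation folder."""
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"Annotation folder not found: {root}")
    # Sort for a stable, reproducible processing order.
    return sorted(p.name for p in root.iterdir() if p.is_file())
```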
- Important: Before running a new inference, you may need to manually delete previous predictions from the database (e.g., `DELETE FROM predictions WHERE [CONDITION]`). This targeted deletion is the recommended approach, as it avoids wiping prior results.
- Alternatively, the whole predictions database can be reset by running:
  ```shell
  python inference/build_inference_table.py --reset_predictions
  ```
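Assuming the database is SQLite (an assumption; the README does not name the engine), a targeted deletion could be scripted as below. The `model` column and its condition are hypothetical stand-ins for whatever `[CONDITION]` fits your schema:

```python
import sqlite3

def delete_predictions(db_path: str, model: str) -> int:
    """Delete one model's predictions; return the number of rows removed."""
    with sqlite3.connect(db_path) as conn:
        # Parameterized query avoids quoting issues and SQL injection.
        cur = conn.execute("DELETE FROM predictions WHERE model = ?", (model,))
        return cur.rowcount
```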
Inference on the test set is designed to be run in parallel for different configurations.
- Run Inference: Call the script from the CLI for each `prompt_type` and `technique` combination:
  ```shell
  python inference/gemini_inference.py --api-key=YOUR_GEMINI_API_KEY --prompt-type=it-thinking_guideline_higher_standards --technique=zero-shot
  ```
  (Available prompt types can be found in `inference/inference_helper.py`.)
- Insert Results: After the script generates a CSV output file, insert the results into the database using the following script:
  ```shell
  python inference/insert_gemini_predictions.py
  ```
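The per-configuration runs can be launched from a small driver script. This is a sketch using only the standard library; the prompt types shown are placeholders, since the real ones live in `inference/inference_helper.py`:

```python
import subprocess
from itertools import product

def build_commands(prompt_types, techniques, api_key="YOUR_GEMINI_API_KEY"):
    """Build one CLI invocation per prompt_type/technique combination."""
    return [
        ["python", "inference/gemini_inference.py",
         f"--api-key={api_key}",
         f"--prompt-type={p}",
         f"--technique={t}"]
        for p, t in product(prompt_types, techniques)
    ]

def run_parallel(commands):
    # Launch all configurations at once, then wait for each to finish.
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]
```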
To run inference using the local Gemma-27b-it model, execute the script:

```shell
python inference/gemma_27b_it_inference.py
```

- Run the following script to evaluate the LLM predictions:
  ```shell
  python evaluation/evaluation_script.py
  ```
- Inspect the results in the database via the `evaluation_results` table.
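Again assuming an SQLite database (an assumption), the table can also be inspected programmatically:

```python
import sqlite3

def fetch_evaluation_results(db_path: str) -> list:
    """Return all rows of the evaluation_results table as dicts."""
    with sqlite3.connect(db_path) as conn:
        conn.row_factory = sqlite3.Row  # access columns by name
        rows = conn.execute("SELECT * FROM evaluation_results").fetchall()
    return [dict(r) for r in rows]
```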
New prompt types for the models can be added by modifying the `inference/inference_helper.py` script.
- The `inference_helper.py` script is complex and could be improved. It is recommended to refactor it to dynamically parse prompts from a structured file (e.g., a JSON file) to better manage the different `prompt_type` and `technique` combinations.
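The suggested refactoring could look roughly like this: prompt templates live in a JSON file keyed by `prompt_type` and then `technique`, and a small loader replaces the hard-coded branches. The file name `prompts.json` and its layout are illustrative assumptions, not part of the repository:

```python
import json
from pathlib import Path

def load_prompt(prompt_type: str, technique: str,
                path: str = "prompts.json") -> str:
    """Look up a prompt template from a structured JSON file.

    Expected (illustrative) layout:
    { "<prompt_type>": { "<technique>": "<template text>" } }
    """
    prompts = json.loads(Path(path).read_text(encoding="utf-8"))
    try:
        return prompts[prompt_type][technique]
    except KeyError as err:
        raise KeyError(
            f"No prompt for prompt_type={prompt_type!r}, "
            f"technique={technique!r}"
        ) from err
```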