Can LLMs Beat BERT in Biomedical Information Extraction? Evaluating Prompting and Fine-Tuning Strategies for NER and Classification
Author: Vera Bernhard
Date: December 2025
Institution: University of Zurich, Switzerland
This repository contains the code and data for the Master’s thesis by Vera Bernhard.
- bert_baseline/: Prediction files and evaluation outputs for the BERT baseline models
- data/: The PsyNamic dataset
- evaluation/: Evaluation, post-processing, and plotting scripts
- few_shot/: Predictions and plots for the few-shot experiments
- finetuning/: All files related to fine-tuning LLMs
- ift/: Instruction fine-tuning dataset and training scripts
- lst/: Label-supervised fine-tuning scripts and predictions
- prompts/: Prompt templates, prompt generation scripts, and annotation guidelines for the PsyNamic dataset
- test/: Unit tests for evaluation and post-processing scripts
- zero_shot/: Predictions and plots for zero-shot experiments, including predictions from the instruction fine-tuned model
- Python 3.12
- Hugging Face Transformers – model loading, inference, and training
- PEFT – parameter-efficient fine-tuning methods
- TRL – training large language models with instruction tuning
- BiLLM – converting LLMs from uni-directional to bidirectional for classification tasks