This repository contains research code for detecting coordinated behavior in information operation (IO) campaigns on Twitter/X through different approaches (e.g., weighted co-retweets). Using labeled datasets from the "Coordinated Behavior in Information Operations on Twitter" article, accessible at "url", we present an improvement to the co-retweets feature for identifying users engaged in coordinated campaigns: weighting each co-retweet.
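The exact weighting scheme lives in the `coretweets_weighted_*` modules listed below; as a rough, hypothetical illustration (the decay form, the function name, and the use of the window as the decay scale are assumptions, not the repository's actual code), each co-retweet can be down-weighted by the time gap between the two retweets:

```python
import math

def coretweet_weight(delta_sec: float, window_sec: float) -> float:
    # Hypothetical weighting: co-retweets closer in time count more,
    # decaying exponentially with the gap between the two retweets.
    return math.exp(-delta_sec / window_sec)

# Two users retweeting the same tweet one hour apart, 1-day window:
coretweet_weight(3600, 86400)  # ≈ 0.959
```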
polio/
├── bin/ # Executable scripts
│ ├── analyze_results.py # Statistical analysis and visualization
│ ├── benchmark_pnorm.py # Benchmark Lp-norm aggregation modes
│ ├── bootstrap_experiments.py # Bootstrap analysis with confidence intervals for metrics
│ ├── compare_datasets.ipynb # Dataset comparison notebook (Seckin vs Cima datasets)
│ ├── convert_to_pickle.py # Convert RTs to pickle format for faster loading
│ ├── count_io_users.py # Count IO users with filtering criteria
│ ├── generate_RTs_files.py # Generate filtered datasets (calls src/extractors.py)
│ ├── plot_multiple_results_toghether.py # Plot multiple approaches in the same Cumulative IO Discovery Plot
│ └── run_experiments.py # Batch experiment runner
├── src/ # Core libraries and utilities
│ ├── analysis/ # Analysis utilities
│ │ ├── __init__.py
│ │ ├── io_utils.py # IO accounts analysis utilities
│ │ ├── metrics.py # Evaluation metrics (AUC, AP, NDCG)
│ │ ├── reporting.py # Report generation (summary.txt)
│ │ ├── statistics.py # Statistical tests (Wilcoxon, Nemenyi)
│ │ └── visualization.py # Plot generation
│ ├── approaches/ # Detection approach implementations
│ │ ├── base.py # Base approach class
│ │ ├── factory.py # Approach factory (register new approaches here)
│ │ ├── lexicographic.py # Lexicographic approach (hierarchical)
│ │ ├── coretweets.py # Co-retweet counting (baseline)
│ │ ├── coretweets_weighted_1day.py # 1-day weighted co-retweets
│ │ ├── coretweets_weighted_12h.py # 12-hour weighted co-retweets
│ │ └── coretweets_weighted_2days.py # 2-day weighted co-retweets
│ ├── data_loader.py # Load filtered datasets for experiments
│ ├── extractors.py # Extract and generate filtered data (retweets, tweets, hashtags)
│ └── synchronous_repeated_detection.py # Main detection library
├── experiments/ # Experiment configurations and results
│ └── CimaIO_coretweetweighted_ranking_modes.json # Example: comparing metrics
├── output/ # Experiment outputs (results and plots)
│ └── CimaIO_coretweetweighted_rts_filtering/ # Example experiment output
│ └── analysis/
│ ├── plots/<plot>.png # Cumulative IO Discovery Plot and Bootstrap Plots
│ ├── <metric>/summary.txt # Results for each specified metric (AUC, AP, NDCG)
│ └── summary.txt # General evaluations (ED, P@100, P@500, Filtering Concentration)
├── pyproject.toml # Project dependencies
└── README.md
bin/: Command-line tools for the entire workflow: data generation, experiment execution, and analysis.
analysis/: Utilities for metrics computation, visualization, statistical tests, and reporting
approaches/: Detection approach implementations (baseline and weighted variants)
extractors.py: Called by generate_RTs_files.py to extract and filter datasets (retweets, tweets, hashtags, etc.)
data_loader.py: Loads filtered datasets when running experiments with run_experiments.py
JSON files defining experiment scenarios that compare different approaches on the Cima datasets. Each file also specifies the folder where the datasets are stored.
- Python 3.12+
- uv package manager (recommended) or pip
git clone <repository-url>
cd polio
uv sync
source .venv/bin/activate

To create a custom detection approach:
- Create approach file: use an existing approach in src/approaches/ as a template (e.g., coretweets.py)
- Implement required methods: inherit from BaseApproach in base.py
- Register in factory: add your approach to src/approaches/factory.py so it can be called by name in experiment configurations
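Before the registration step, the approach itself needs a class. Below is a minimal, self-contained sketch of a co-retweet-counting approach; the stand-in base class and the method name `score_pairs` are assumptions for illustration, since the real interface in `src/approaches/base.py` may differ:

```python
from collections import Counter

# Stand-in for the real BaseApproach in src/approaches/base.py;
# the actual interface may differ.
class BaseApproach:
    name = "base"

    def score_pairs(self, retweets):
        raise NotImplementedError

class MyNewApproach(BaseApproach):
    name = "my_new_approach"

    def score_pairs(self, retweets):
        """Count co-retweets: +1 for each pair of users that retweeted
        the same tweet (retweets assumed as (user, tweet_id) pairs)."""
        by_tweet = {}
        scores = Counter()
        for user, tweet_id in retweets:
            for other in by_tweet.get(tweet_id, []):
                if other != user:
                    scores[frozenset((user, other))] += 1
            by_tweet.setdefault(tweet_id, []).append(user)
        return scores
```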
Example registration in factory.py:
from .my_new_approach import MyNewApproach
APPROACHES = {
"coretweets": CoreTweets,
"my_new_approach": MyNewApproach, # Add your approach here
# ...
}

Once you download the datasets you want to analyze, store them in the corresponding folder.
Store datasets from the "Coordinated Behavior in Information Operations on Twitter" article in the folder datasets/CimaIO/:
datasets/CimaIO/Honduras/honduras-bad-anonymized.jsonl
If you want to compare with the datasets presented in the "Labeled Datasets for Research on Information Operations" article, available at "url", place them in the datasets/ folder, for example:
datasets/Catalonia/Catalonia_part_1.gzip.parquet
Extract retweets, tweets, and other features from raw data (e.g., Cima anonymized JSONL):
uv run bin/generate_RTs_files.py

This script calls src/extractors.py to generate filtered datasets in the appropriate format.
Create detection approaches in src/approaches/. Remember to register new approaches in factory.py.
Create an experiment JSON file in experiments/. This defines:
- Which approaches to compare
- Which datasets to use
- Time window parameters
- Ranking modes (metrics)
- Whether the approach needs an initial filtering of the user cohort (users with a minimum number of co-retweets)
Important: Different ranking modes (L1, L2, ..., Linf) aggregate scores differently. To determine the optimal metric for your use case, first run an experiment comparing the ranking modes. For example:
uv run bin/run_experiments.py experiments/CimaIO_coretweetweighted_ranking_modes.json
uv run bin/benchmark_pnorm.py experiments/CimaIO_coretweetweighted_ranking_modes.json <approach_name>

This benchmarks the metrics (AUC, AP, NDCG) across all datasets to help you choose the best aggregation mode.
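As a rough intuition for what the Lp ranking modes do (illustrative only; the repository's actual aggregation lives in the approach implementations), consider aggregating one user's co-retweet pair scores into a single suspicion score:

```python
import numpy as np

def aggregate(pair_scores, p):
    """Collapse a user's co-retweet pair scores into one value via the
    Lp norm (a sketch of how the ranking modes could differ)."""
    return float(np.linalg.norm(np.asarray(pair_scores, dtype=float), ord=p))

scores = [5.0, 1.0, 1.0]
# aggregate(scores, 1)       -> 7.0    (every pair counts equally)
# aggregate(scores, 2)       -> ≈5.196 (strong pairs dominate)
# aggregate(scores, np.inf)  -> 5.0    (only the strongest pair matters)
```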
Example experiment configuration:
{
"name": "my_experiment",
"data_dir": "datasets/CimaIO/RTs/",
"output_dir": "output/",
"window_sec": 60,
"approaches": [
{
"name": "coretweets",
"min_coactions": 2,
"ranking_mode": "Linf"
},
{
"name": "coretweets_weighted_1day",
"ranking_mode": "L2",
"need_filtering": false
}
],
"datasets": ["Honduras"],
"force": false
}

Execute the experiment across all specified datasets and approaches:
uv run bin/run_experiments.py experiments/my_experiment.json

Results are saved to experiments/my_experiment/results/.
Generate metrics, statistical tests, and visualizations:
# Analyze all metrics
uv run bin/analyze_results.py experiments/my_experiment.json --metric all
# Analyze specific metric
uv run bin/analyze_results.py experiments/my_experiment.json --metric AUC
# Skip plot generation
uv run bin/analyze_results.py experiments/my_experiment.json --no-plots
# Generate plots without truncation (useful when one approach is not based
# on co-retweets and ends up with a larger suspicious-user cohort)
uv run bin/analyze_results.py experiments/my_experiment.json --notruncation
# Compare approach results with ideal (catching all IO users first)
uv run bin/analyze_results.py experiments/my_experiment.json --ideal

Statistical tests (require many datasets):
- 2 approaches: Wilcoxon Signed-Rank test
- 3+ approaches: Nemenyi test with Critical Diagram
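The Cumulative IO Discovery Plot tracks how quickly a ranking surfaces the known IO accounts. A minimal sketch of the underlying curve (function name and input shapes are assumptions, not the repository's plotting code):

```python
def cumulative_io_discovery(ranked_users, io_users):
    """Fraction of known IO users found after inspecting the top-k
    ranked users, for each k."""
    io = set(io_users)
    found, curve = 0, []
    for user in ranked_users:
        found += user in io
        curve.append(found / len(io))
    return curve

cumulative_io_discovery(["u3", "u1", "u5", "u2"], {"u1", "u2"})
# -> [0.0, 0.5, 0.5, 1.0]
```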
Plot multiple experiment results together:
uv run bin/plot_multiple_results_toghether.py experiments/my_experiment.json

For limited datasets, perform bootstrap resampling to compute 95% confidence intervals for the metrics (AUC, AP, NDCG):
uv run bin/bootstrap_experiments.py experiments/my_experiment.json

This resamples users in the retweet data with replacement and computes confidence intervals across the bootstrap samples.
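A minimal sketch of the bootstrap procedure (the resampling unit, the metric signature, and the quantile handling here are assumptions; the script's actual implementation may differ):

```python
import random

def bootstrap_ci(per_user_scores, metric, n_boot=1000, alpha=0.05, seed=0):
    """Resample users with replacement and return the (alpha/2, 1-alpha/2)
    quantile interval of the metric across bootstrap samples."""
    rng = random.Random(seed)
    stats = sorted(
        metric([rng.choice(per_user_scores) for _ in per_user_scores])
        for _ in range(n_boot)
    )
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

# 95% CI for the mean of a toy score list:
lo, hi = bootstrap_ci(list(range(100)), metric=lambda xs: sum(xs) / len(xs))
```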
# 1. Generate datasets from Cima anonymized data
uv run bin/generate_RTs_files.py
# 2. Compare ranking modes to find optimal metric
uv run bin/run_experiments.py experiments/Ranking_modes_experiment.json
uv run bin/benchmark_pnorm.py experiments/Ranking_modes_experiment.json coretweets_weighted_1day
# 3. Define and run main experiment
uv run bin/run_experiments.py experiments/my_experiment.json
# 4. Analyze results
uv run bin/analyze_results.py experiments/my_experiment.json --metric all
# 5. (Optional) Bootstrap for confidence intervals
uv run bin/bootstrap_experiments.py experiments/my_experiment.json

- count_io_users.py: Analyze the IO user distribution under different filtering thresholds
- compare_datasets.ipynb: Interactive dataset comparison notebook (Cima vs. Seckin). Requires both datasets to be downloaded.
- convert_to_pickle.py: Convert filtered datasets to pickle format for faster loading (done automatically when running an experiment)