Rbinetwork

Rbinetwork is an R package designed to recover latent peer influence networks and estimate peer influence effects through high-dimensional sparse machine learning. A behavior influence network (binetwork) captures the relational structure among individuals whose actions mutually affect one another within a shared social context—such as a classroom, grade level, or school. In social science research, understanding interpersonal dynamics is crucial, yet a fundamental challenge remains: we often cannot directly observe who influences whom. By reconstructing the underlying “binetwork” from observed behavioral data, this package enables researchers to identify concrete influence pathways—for instance, tracing which peers contributed to an individual’s decision to engage in substance use. Once recovered, the network reveals the full architecture of behavioral influence, allowing for the estimation of peer influence effects, including both the direction and strength of social impact.

Unlike conventional social network analysis, which typically relies on self-reported friendship ties that are costly to collect and prone to measurement error, Rbinetwork provides a novel methodology to infer genuine influence relationships without requiring prior network data. It accommodates both cross‑sectional and longitudinal observation schemes, though the current repository demonstrates implementation for cross‑sectional settings with undirectional peer influence networks. The underlying algorithms deliver high estimation accuracy, but their iterative large‑scale matrix operations make them computationally intensive on standard personal computers. We have therefore designed parallel computation procedures to improve efficiency. We also recommend running the procedures on high‑performance computing (HPC) platforms, such as supercomputers or compute clusters with many CPU cores, to achieve feasible runtimes.

(Recovered bullying influence networks for three schools by the Sparse Machine Learning (SML) algorithm. Each panel displays the bullying influence network recovered by Algorithm SML for one school. Nodes represent students; red circles denote bullies and blue squares denote non-bullies. Directed edges indicate estimated behavioral influence relations. Node size is proportional to degree centrality. Isolated nodes indicate students with no recovered influence links.)

Before running the code, please load the R packages "MASS", "parallel", "ggplot2", "gridExtra", "glmnet", "igraph", and "ivreg".
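As a minimal sketch of that loading step, the snippet below checks which of the listed packages are available and attaches them only when all are present. The variable names `required` and `missing_pkgs` are illustrative and are not part of the repository.

```r
# Packages named in this README
required <- c("MASS", "parallel", "ggplot2", "gridExtra",
              "glmnet", "igraph", "ivreg")

# requireNamespace() returns FALSE (instead of erroring) when a package is absent
available <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!available]

if (length(missing_pkgs) > 0) {
  message("Please install first: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Attach every package so its functions can be called without a prefix
  invisible(lapply(required, library, character.only = TRUE))
}
```

Checking before attaching avoids a hard stop halfway through a long script when a single package is missing.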

#################################################################################### ####################################################################################

Code Introduction

📋 1. Overview

This R project investigates how different network structures affect the accuracy of network recovery algorithms. The study implements various network generation models, a sophisticated network recovery algorithm, and comprehensive evaluation metrics to analyze recovery performance across different topological configurations.

✨ 2. Enhanced Recovery Algorithm

(1) Optimized Implementation: Reduced optimization loops for computational efficiency

(2) Parallel Computing: Leverages multicore processing for faster execution

(3) Lambda Tuning: Automatic optimization of regularization parameter across multiple values

🚀 3. Required R Packages

install.packages(c("MASS", "ggplot2", "gridExtra", "glmnet", "igraph", "ivreg"))  # "parallel" ships with base R and needs no installation

🚀 4. System Requirements

(1) R version 3.6 or higher

(2) Multi-core processor for parallel execution

(3) Sufficient memory (≥ 8GB recommended for larger networks)

📁 5. Project Structure

├── Network Generation Functions
│   ├── generate_ER_network()                    # Random networks
│   ├── generate_WS_network()                    # Small-world networks
│   ├── generate_BA_network()                    # Scale-free networks
│   ├── generate_community_network()             # Community networks
│   └── [Additional network types...]
├── Network Recovery Algorithm
│   ├── f_function()                             # Primary objective function
│   ├── ff_function()                            # Threshold optimization function
│   ├── calculate_metrics()                      # Comprehensive performance evaluation
│   └── run_network_recovery()                   # Main recovery pipeline
├── Network Feature Extraction
│   └── extract_network_features()               # Computes topological characteristics
├── Research Studies
│   ├── conduct_network_structure_study()        # Impact of network topology
│   ├── conduct_sample_size_study()              # Effect of node count
│   ├── conduct_fixed_edges_vary_nodes_study()   # Fixed edges, varying nodes
│   └── conduct_fixed_nodes_vary_edges_study()   # Fixed nodes, varying density
├── Visualization & Analysis
│   ├── analyze_and_visualize_results()          # Multi-figure analysis
│   ├── visualize_fixed_edges_vary_nodes()       # Fixed-edge study plots
│   └── visualize_fixed_nodes_vary_edges()       # Fixed-node study plots
└── Main Execution Functions
    ├── main_network_structure_study()           # Primary study runner
    ├── main_sample_size_study()                 # Sample size analysis
    ├── main_fixed_edges_vary_nodes_study()      # Fixed-edge analysis
    └── main_fixed_nodes_vary_edges_study()      # Fixed-node analysis
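
To give a feel for what a network generation function produces, here is a minimal base-R sketch of an Erdős-Rényi style generator. It is not the repository's generate_ER_network(); the name `sketch_er_adjacency` and its arguments are hypothetical, and it assumes an undirected network without self-loops.

```r
# Hypothetical illustration, not the package's generate_ER_network()
sketch_er_adjacency <- function(n_nodes, p_edge) {
  A <- matrix(0L, n_nodes, n_nodes)
  upper <- upper.tri(A)
  # Each unordered pair of nodes is linked independently with probability p_edge
  A[upper] <- rbinom(sum(upper), size = 1, prob = p_edge)
  # Mirror the upper triangle so the adjacency matrix is symmetric
  A + t(A)
}

set.seed(123)                       # fix the seed so the same network is drawn again
A <- sketch_er_adjacency(20, 0.2)
sum(A) / 2                          # number of undirected edges drawn
```

Calling set.seed() before generation is what makes a simulated network reproducible across runs.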

🎯 6. Usage Examples

#Test network recovery on a single scale-free network

result <- quick_test_single_network()

#Compare recovery rates across 8 different network types

study_results <- main_network_structure_study()

#Analyze how node count affects recovery accuracy

sample_results <- main_sample_size_study()

#Investigate recovery with constant edges but varying nodes

fixed_edges_results <- main_fixed_edges_vary_nodes_study()

#Examine density effects with constant node count

fixed_nodes_results <- main_fixed_nodes_vary_edges_study()

📊 7. Primary Outputs

(1) Statistical Summaries: Average recovery rates, standard deviations, standard errors

(2) Performance Metrics: TPR, TNR, F1 scores, MAD values for each network type

(3) Correlation Analysis: Relationships between network features and recovery metrics

(4) Regression Models: Predictive models for recovery rates based on network characteristics

(5) Visualizations: Comprehensive multi-panel plots comparing performance across conditions
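
For readers unfamiliar with the edge-level metrics, the sketch below computes TPR, TNR, and F1 by comparing a true adjacency matrix with a recovered one. It is an illustration in base R, not the repository's calculate_metrics(); the function name `edge_recovery_metrics` and the two toy matrices are made up for the example. MAD is left out because its exact definition is not stated here.

```r
# Hypothetical helper, not the package's calculate_metrics()
edge_recovery_metrics <- function(A_true, A_hat) {
  off_diag <- row(A_true) != col(A_true)      # ignore self-loops
  truth <- A_true[off_diag] != 0
  est   <- A_hat[off_diag] != 0

  tp <- sum(truth & est)                      # links correctly recovered
  tn <- sum(!truth & !est)                    # absent links correctly left out
  fp <- sum(!truth & est)                     # spurious links
  fn <- sum(truth & !est)                     # missed links

  tpr <- tp / (tp + fn)
  tnr <- tn / (tn + fp)
  precision <- tp / (tp + fp)
  f1 <- 2 * precision * tpr / (precision + tpr)
  c(TPR = tpr, TNR = tnr, F1 = f1)
}

# Toy 4-node example (made-up matrices)
A_true <- matrix(c(0, 1, 1, 0,
                   1, 0, 0, 0,
                   1, 0, 0, 1,
                   0, 0, 1, 0), nrow = 4, byrow = TRUE)
A_hat  <- matrix(c(0, 1, 0, 0,
                   1, 0, 0, 1,
                   0, 0, 0, 1,
                   0, 1, 1, 0), nrow = 4, byrow = TRUE)

metrics <- edge_recovery_metrics(A_true, A_hat)
print(round(metrics, 3))
```

The sketch does not guard against empty denominators, for example a network with no true links, so a production version would need that check.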

📊 8. Saved Results

All studies automatically save results to RData files:

(1) network_structure_study_results.RData

(2) sample_size_study_results.RData

(3) fixed_edges_vary_nodes_results.RData

(4) fixed_nodes_vary_edges_results.RData
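
A saved RData file can be inspected without overwriting objects in the current workspace by loading it into a separate environment. The sketch below writes a placeholder file to a temporary directory so that it runs on its own; the file name `demo_study_results.RData` and the object `study_results` are hypothetical, since the object names stored inside the real result files are not documented here.

```r
# Placeholder object standing in for real study output (contents are made up)
study_results <- list(note = "placeholder", n_repetitions = 3)
demo_file <- file.path(tempdir(), "demo_study_results.RData")
save(study_results, file = demo_file)

# Load into a fresh environment so nothing in the workspace is overwritten
results_env <- new.env()
loaded_names <- load(demo_file, envir = results_env)

print(loaded_names)                    # names of the objects stored in the file
str(results_env[[loaded_names[1]]])    # structure of the first stored object
```

For a real run, replace `demo_file` with one of the file names listed above.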

🔬 9. Key Studies Included

(1) Network Topology Impact

Investigates how 8 different network structures affect recovery accuracy using:

20-node networks
3 repetitions per network type
11 lambda values for regularization tuning

(2) Sample Size Effects

Examines recovery performance with varying node counts (10-30 nodes) on ER networks.

(3) Fixed Edge Count Analysis

Holds edge count constant (~50 edges) while varying node count (15-40 nodes) to isolate density effects.

(4) Fixed Node Count Analysis

Maintains constant node count (30 nodes) while varying edge density (0.1-0.5) to study density impact.
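
The link between node count, edge count, and density in studies (3) and (4) can be made concrete with a short calculation. The sketch assumes an undirected network without self-loops, so density is the edge count divided by n(n-1)/2. The function names are illustrative, and the intermediate node counts 20 and 30 are chosen only for the example.

```r
# Density of an undirected simple graph: edges / possible pairs
undirected_density <- function(n_nodes, n_edges) {
  n_edges / (n_nodes * (n_nodes - 1) / 2)
}

# Study (3): about 50 edges, node count varied between 15 and 40
n_nodes <- c(15, 20, 30, 40)
fixed_edges <- data.frame(
  n_nodes = n_nodes,
  density = round(undirected_density(n_nodes, 50), 3)
)
print(fixed_edges)

# Study (4): 30 nodes, density varied between 0.1 and 0.5
expected_edges <- function(n_nodes, density) {
  density * n_nodes * (n_nodes - 1) / 2
}
print(expected_edges(30, c(0.1, 0.5)))
```

With the edge count fixed, density drops quickly as nodes are added, which is why varying the node count in study (3) also varies density.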

🛠️ 10. Technical Details

Algorithm Optimizations

(1) Reduced optimization loops from original implementation

(2) Parallel processing across multiple cores

(3) Efficient matrix operations for large networks

Network Features Computed

(1) Density, average degree, degree variance

(2) Clustering coefficient, degree assortativity

(3) Network diameter, number of components

(4) Degree distribution skewness

Statistical Analysis

(1) Mean comparisons with standard errors

(2) Correlation matrices between features and metrics

(3) Multiple regression modeling

(4) Trade-off analysis between TPR and TNR
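
Three of the listed network features can be computed directly from an adjacency matrix in base R, as sketched below. This is not the repository's extract_network_features(); the function name `network_features_sketch` is hypothetical, and the code assumes a symmetric 0/1 adjacency matrix with a zero diagonal.

```r
# Hypothetical illustration, not the package's extract_network_features()
network_features_sketch <- function(A) {
  n <- nrow(A)
  deg <- rowSums(A != 0)                    # degree of each node
  n_edges <- sum(A[upper.tri(A)] != 0)      # count each undirected edge once
  list(
    density         = n_edges / (n * (n - 1) / 2),
    avg_degree      = mean(deg),
    degree_variance = var(deg)              # sample variance (divides by n - 1)
  )
}

# Toy 4-node path-like network: 1-2, 1-3, 3-4
A <- matrix(0, 4, 4)
A[1, 2] <- A[2, 1] <- 1
A[1, 3] <- A[3, 1] <- 1
A[3, 4] <- A[4, 3] <- 1

feats <- network_features_sketch(A)
str(feats)
```

Features such as the clustering coefficient, assortativity, and diameter are more involved and are usually obtained from a graph library such as igraph.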

⚡ 11. Performance Considerations

Computation Time

(1) Single network recovery: ~30-60 seconds (depending on node count)

(2) Full structure study: ~15-30 minutes (8 networks × 3 repetitions)

(3) Parallelization: Utilizes (CPU cores - 1) for lambda optimization

Memory Usage

(1) Scales with O(n²), where n is the node count

(2) 20-node networks require ~100MB RAM

(3) 40-node networks require ~400MB RAM
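
The "(CPU cores - 1)" pattern for lambda optimization can be sketched with the parallel package as follows. The worker `fit_one_lambda` is a hypothetical stand-in with a toy loss, not the repository's recovery routine, and the grid of 11 values between 0 and 1 is an assumed range used only for illustration.

```r
library(parallel)

# Hypothetical stand-in for one recovery run at a given lambda (toy loss)
fit_one_lambda <- function(lambda) {
  list(lambda = lambda, loss = (lambda - 0.3)^2)
}

lambdas <- seq(0, 1, length.out = 11)   # assumed grid, 11 candidate values

# Leave one core free for the operating system; detectCores() may return NA
n_detected <- detectCores()
n_cores <- if (is.na(n_detected)) 1L else max(1L, n_detected - 1L)
if (.Platform$OS.type == "windows") n_cores <- 1L   # mclapply cannot fork on Windows

# Evaluate every lambda, spreading the work over the available cores
fits <- mclapply(lambdas, fit_one_lambda, mc.cores = n_cores)

losses <- vapply(fits, function(f) f$loss, numeric(1))
best_lambda <- lambdas[which.min(losses)]
print(best_lambda)
```

On Windows, a socket cluster created with makeCluster() and used through parLapply() is the usual substitute for mclapply().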

📈 12. Visualization Examples

The code generates comprehensive visualizations including:

(1) Recovery rate bar charts with error bars

(2) TPR vs TNR scatter plots

(3) F1 score comparisons across network types

(4) Recovery rate vs network density relationships

(5) Time-series plots for sample size studies

(6) Trade-off analyses between different metrics

🔧 13. Customization Options

Adjustable Parameters

(1) Node count (10-50 nodes recommended)

(2) Edge density (0.1-0.8 typical range)

(3) Number of repetitions (≥3 for statistical reliability)

(4) Lambda range and count (5-15 values recommended)

(5) Network type parameters (k, p_rewire, m, etc.)

Extending the Code

(1) Add new network generation functions following existing patterns

(2) Modify evaluation metrics in calculate_metrics()

(3) Adjust visualization themes in analysis functions

(4) Incorporate additional statistical tests as needed

📚 14. Citation & References

If using this code for research, please cite relevant methodologies from:

(1) Wasserman & Roeder (2009) on high-dimensional variable selection

(2) Classic network models: Erdős-Rényi, Watts-Strogatz, Barabási-Albert

(3) Network recovery and spatial econometrics literature

🤝 15. Contributing

Contributions are welcome! Please:

(1) Fork the repository

(2) Create a feature branch

(3) Add tests for new functionality

(4) Submit a pull request with clear documentation

📄 16. License

This project is provided for academic and research purposes. Please contact the authors for specific licensing inquiries.

⚠️ 17. Known Issues & Limitations

Computational Intensity: Recovery algorithm is computationally expensive for large networks (>50 nodes)

Memory Usage: Dense networks require significant memory

Convergence: Optimization may occasionally fail to converge; results include error handling

Determinism: Random network generation requires set.seed() for reproducibility

(For questions or issues, please open an issue on GitHub or contact the maintainer.)
