WikiHow Network Analysis through Data Mining and Network Graph Modelling

The first stage of the project extracts information from WikiHow pages by web scraping. The goal is to collect data about each article: main heading, subheadings, categories, ratings, views, and other relevant details. Using the Python libraries BeautifulSoup and requests, the code fetches and parses the HTML of WikiHow pages and pulls out these fields. The extracted data is then structured and cleaned so that it is consistent and reliable. This dataset is the foundation for the network analysis in the second stage.

![Network Graph](Network%20Graph.png)

Part 1: Data Mining and Processing

- Web Scraping: Fetch and parse HTML from WikiHow pages with the Python libraries BeautifulSoup and requests.
- Data Extraction: Extract main headings, sub-headings, categories, co-authors, views, and similar fields from the parsed HTML.
- Cleaning and Preprocessing: Remove leftover HTML tags, formatting inconsistencies, and noise from the extracted data.
- Structuring Data: Organize the extracted fields into a structured format such as a pandas DataFrame for easy manipulation and analysis.
- Handling Unwanted URLs: Filter out unwanted URLs and irrelevant data so that only WikiHow articles in the specified categories remain.
- Batch Processing: Iterate over multiple WikiHow pages to build up a comprehensive dataset.
- Data Storage: Save the cleaned, structured data to a CSV file for further analysis and visualization.
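
As a rough illustration of the scraping and extraction steps above, the sketch below parses a small inline HTML sample with BeautifulSoup. The tag and class names (`h1`, `h3`, `ul.breadcrumb`, `span.views`), the sample markup, and the helper names `fetch_html` and `parse_article` are assumptions made for this example; they are not taken from the project notebooks or checked against live WikiHow pages.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical markup: these tags and class names are placeholders,
# not selectors confirmed against real WikiHow pages.
SAMPLE_HTML = """
<html><body>
  <ul class="breadcrumb"><li><a>Home</a></li><li><a>Hobbies</a></li></ul>
  <h1 class="title">How to Fold a Paper Crane</h1>
  <h3 class="step-heading"> Preparing the Paper </h3>
  <h3 class="step-heading">Making the Folds</h3>
  <span class="views">1,234 views</span>
</body></html>
"""


def fetch_html(url, timeout=10):
    """Download one page and return its HTML text (raises on HTTP errors)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_article(html):
    """Extract a few fields from one article's HTML into a flat dict."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("h1")
    views = soup.find("span", class_="views")
    return {
        "main_heading": title.get_text(strip=True) if title else None,
        "sub_headings": [h.get_text(strip=True) for h in soup.find_all("h3")],
        "categories": [a.get_text(strip=True) for a in soup.select("ul.breadcrumb a")],
        "views": views.get_text(strip=True) if views else None,
    }


# Parse the inline sample instead of hitting the network.
record = parse_article(SAMPLE_HTML)
print(record)
```

For batch processing, `parse_article(fetch_html(url))` could be called in a loop over article URLs, collecting one dict per page.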

Code for [Data Mining](Webscrapping_wikihow_new.ipynb) and [Data Cleaning](cleaning_wikiHow_data.ipynb)
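
The cleaning, filtering, and storage steps from Part 1 might look roughly like the following pandas sketch. The column names (`url`, `main_heading`, `views`), the sample rows, the `clean` helper, and the idea that unwanted URLs are held as a plain set of strings are all assumptions for illustration; the actual notebooks may organize this differently.

```python
import pandas as pd

# Hypothetical raw records; column names and values are illustrative only.
raw = pd.DataFrame({
    "url": [
        "https://www.wikihow.com/Fold-a-Paper-Crane",
        "https://www.wikihow.com/Special:Randomizer",
        "https://www.wikihow.com/Fold-a-Paper-Crane",
        "https://www.wikihow.com/Bake-Bread",
    ],
    "main_heading": [
        "  How to Fold a Paper Crane ",
        "Random",
        "  How to Fold a Paper Crane ",
        "How to Bake Bread",
    ],
    "views": ["1,234", "n/a", "1,234", "56,789"],
})

# Assumed representation of the unwanted-URL list: a set of exact URLs.
unwanted = {"https://www.wikihow.com/Special:Randomizer"}


def clean(df, unwanted_urls):
    """Drop unwanted and duplicate URLs, trim text, and make views numeric."""
    out = df[~df["url"].isin(unwanted_urls)].copy()
    out["main_heading"] = out["main_heading"].str.strip()
    # Remove thousands separators; values that still fail to parse become NaN.
    out["views"] = pd.to_numeric(
        out["views"].str.replace(",", "", regex=False), errors="coerce"
    )
    return out.drop_duplicates(subset="url").reset_index(drop=True)


cleaned = clean(raw, unwanted)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = cleaned.to_csv(index=False)
print(csv_text)
```

Passing a file name to `to_csv` instead would write the cleaned table to disk.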

Part 2: Graph Construction and Analysis

- Network Graph Creation: Build a NetworkX graph that represents the relationships between main headings and the categories/sub-categories extracted from WikiHow articles.
- Node and Edge Addition: Add a node for each main heading and each category/sub-category, and connect each heading to its categories with an edge.
- Visualization: Draw the network graph with Matplotlib to show the connections between main headings and categories/sub-categories.
- Graph Metrics Calculation: Compute graph metrics such as degree distribution, average degree, PageRank, diameter, and closeness and betweenness centrality.
- Statistical Analysis: Analyze the graph metrics statistically to understand the structure and characteristics of the WikiHow network.
- Visualization Enhancement: Highlight major nodes, plot histograms of the centrality measures, and display the top PageRank values.
- Insight Generation: Use the analyzed metrics to assess the importance and centrality of different categories/sub-categories within the WikiHow network.
- Presentation of Results: Present descriptive statistics, visualizations, and key findings so that the WikiHow network structure is easy to interpret.
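
The graph construction and metric steps above can be sketched with NetworkX as follows. The heading and category pairs are invented sample data, and the `kind` node attribute is a hypothetical label added for this example; neither comes from the project dataset or notebook.

```python
import networkx as nx

# Hypothetical (main heading, category) pairs standing in for the scraped data.
pairs = [
    ("How to Fold a Paper Crane", "Hobbies"),
    ("How to Fold a Paper Crane", "Origami"),
    ("How to Make a Paper Boat", "Origami"),
    ("How to Bake Bread", "Cooking"),
    ("How to Bake Bread", "Hobbies"),
]

# Undirected graph: one node per heading and per category, one edge per pair.
G = nx.Graph()
for heading, category in pairs:
    G.add_node(heading, kind="heading")
    G.add_node(category, kind="category")
    G.add_edge(heading, category)

# Degree of every node and the average degree of the graph.
degrees = dict(G.degree())
average_degree = sum(degrees.values()) / G.number_of_nodes()

# Centrality measures, each returned as a dict keyed by node.
closeness = nx.closeness_centrality(G)
betweenness = nx.betweenness_centrality(G)

# nx.diameter raises an error on a disconnected graph,
# so measure the largest connected component.
largest = G.subgraph(max(nx.connected_components(G), key=len))
diameter = nx.diameter(largest)

# Rank nodes by betweenness to find the main connectors.
top_by_betweenness = sorted(betweenness, key=betweenness.get, reverse=True)[:3]
print(average_degree, diameter, top_by_betweenness)
```

PageRank can be added in the same style with `nx.pagerank(G)`, which also returns a dict keyed by node; depending on the NetworkX version, it may need SciPy to be installed.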

Code for [Network Visualization and Analysis](Network%20graph.ipynb)
