Mugeshgithub/Wat-sTheStory-DataAnalysis.

Wat’sTheStory Data Analysis Project

Description

This project explores the Wat’sTheStory dataset to uncover trends in cybersecurity, technology, stock markets, marketing, and Irish news. It transforms a messy dataset into a structured format, then applies keyword extraction, sentiment analysis, and visualization to extract insights, identify correlations between topics, and highlight emerging trends.

Key Features

1. Data Cleaning and Preprocessing:
   • Cleaned the dataset using Python and Pandas to handle missing values, remove inconsistencies, and standardize text.
   • Added meaningful features like word count and cleaned summaries for deeper analysis.
2. Keyword Analysis:
   • Extracted dominant keywords using Scikit-learn’s CountVectorizer to uncover thematic patterns across topics.
3. Sentiment Analysis:
   • Classified sentiments as positive, neutral, or negative using TextBlob, providing insights into public mood across categories.
4. Visualizations:
   • Generated word clouds, topic-wise trends, and sentiment distributions using Matplotlib, Plotly, and WordCloud.
5. Emerging Trends:
   • Explored startup growth in FinTech and AI, revealing a significant spike in September 2024 tied to industry events.
6. Topic Correlations:
   • Analyzed overlaps between topics like tech news and cybersecurity, uncovering their interconnected nature.
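The cleaning and feature-engineering step described above might look roughly like the sketch below. It is an illustration only, not the project's actual code: the column names `topic` and `summary`, the helper names `clean_text` and `preprocess`, and the sample rows are all assumptions made for the example.

```python
import re

import pandas as pd


def clean_text(text: str) -> str:
    """Lowercase, drop URLs and non-letter characters, collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy with 'cleaned_summary' and 'word_count' columns."""
    # Rows without a summary cannot be analysed, so drop them (assumed policy).
    out = df.dropna(subset=["summary"]).copy()
    # Standardize topic labels so " Tech News" and "tech news" match.
    out["topic"] = out["topic"].str.strip().str.lower()
    out["cleaned_summary"] = out["summary"].map(clean_text)
    out["word_count"] = out["cleaned_summary"].str.split().str.len()
    # Remove rows whose cleaned text is identical to an earlier row.
    return out.drop_duplicates(subset=["cleaned_summary"]).reset_index(drop=True)


# Hypothetical sample rows, for illustration only.
sample = pd.DataFrame(
    {
        "topic": [" Tech News", "Cybersecurity ", "Tech News", "Stocks"],
        "summary": [
            "AI startups raise funding! https://example.com",
            "New data breach hits 3 banks.",
            None,
            "AI startups raise funding!",
        ],
    }
)
cleaned = preprocess(sample)
print(cleaned[["topic", "cleaned_summary", "word_count"]])
```

Dropping rows with a missing summary is one possible way to handle missing values; imputing or flagging them would be equally reasonable depending on the dataset.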

Technologies Used

• Python: Core language for data manipulation and analysis.
• Pandas: For data cleaning and feature engineering.
• Scikit-learn: For keyword extraction and thematic analysis.
• TextBlob: For sentiment analysis.
• Matplotlib, Plotly, WordCloud: For creating insightful visualizations.

Key Insights

1. Emerging Startups:
   • Identified FinTech and AI startups as the leading sectors for growth, with a significant spike in September 2024.
2. Sentiment Trends:
   • Positive sentiment dominated tech news, reflecting optimism for innovation, while cybersecurity leaned negative, highlighting risk concerns.
3. Word Cloud Analysis:
   • Highlighted recurring themes like data, security, privacy, and AI across categories.
4. Topic Overlaps:
   • Strong connections between tech news and cybersecurity demonstrated the ripple effects of technological advancements on security.
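The positive, neutral, and negative labels behind the sentiment trends could be derived from TextBlob’s polarity score along the lines of the sketch below. The cut-off of 0.05 and the helper names `label_polarity` and `classify_sentiment` are assumptions for illustration; the project’s actual thresholds are not stated.

```python
def label_polarity(score: float, threshold: float = 0.05) -> str:
    """Map a polarity score in [-1, 1] to a sentiment label.

    The threshold is an assumed value; scores within +/- threshold
    of zero are treated as neutral.
    """
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def classify_sentiment(text: str) -> str:
    """Label a text using TextBlob's polarity score."""
    # Imported here so label_polarity can be used without TextBlob installed.
    from textblob import TextBlob

    return label_polarity(TextBlob(text).sentiment.polarity)


# The thresholding step on its own, with hand-picked scores.
print([label_polarity(s) for s in (0.4, 0.0, -0.3)])
```

Grouping the resulting labels by topic and counting them would give the kind of per-category sentiment distribution described above.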

Future Work

• Incorporate real-time data for dynamic insights.
• Leverage advanced NLP models like BERT for more robust sentiment and thematic analysis.
• Expand to additional topics like healthcare or climate change for broader insights.

Call to Action

Explore the insights and discover how structured data analysis transforms news:

• Visit the Wat’sTheStory platform: https://www.watsthestory.ie/
