
Dashboard Guide

Filters

Interactive controls for customizing dashboard view by time, region, person, and source.

Interactive filter bar allowing you to customize the dashboard view. Use the query box to search for specific keywords or topics. The time filter lets you select any date range (default is last 30 days). Additional dropdowns allow you to filter results by region, country, person (political figure or public personality), and news source. The "Refresh" button updates all visualizations to reflect your current filter selections.

Number of Posts

Total Reddit posts analyzed from r/worldnews during the selected time period.

Total number of Reddit posts from r/worldnews analyzed during the selected time period. The count reflects all posts collected in daily data collection runs at 6 PM UTC, filtered by the chosen date range.

Trending Person

Most frequently mentioned political figure extracted using BERT NER and Wikipedia mapping.

Most frequently mentioned political figure or public personality in r/worldnews posts during the selected time period. Person names are extracted from each Reddit post's title using BERT-based Named Entity Recognition (NER), then cleaned and mapped to canonical Wikipedia entries via fuzzy matching. All counts reflect the chosen time filter.
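
The cleaning and fuzzy-matching step can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: it uses Python's standard difflib as a stand-in matcher, and the CANONICAL_NAMES list, the clean_name and resolve_person helpers, and the 0.8 cutoff are all assumptions made for the example.

```python
import difflib
import unicodedata
from typing import Optional

# Hypothetical canonical names; the real pipeline resolves these via Wikipedia.
CANONICAL_NAMES = [
    "Donald Trump",
    "Vladimir Putin",
    "Volodymyr Zelenskyy",
    "Emmanuel Macron",
]


def clean_name(raw: str) -> str:
    """Strip accents, possessive endings, and extra whitespace."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.replace("'s", "").replace("\u2019s", "")
    return " ".join(text.split())


def resolve_person(raw: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest canonical name, or None if nothing is similar enough."""
    lookup = {name.lower(): name for name in CANONICAL_NAMES}
    matches = difflib.get_close_matches(
        clean_name(raw).lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None
```

A spelling variant such as "Volodymyr Zelensky" resolves to the canonical "Volodymyr Zelenskyy", while a name with no close candidate returns None and would be dropped from the counts.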

Trending Country

Most discussed country using NER extraction and geocoding to ISO codes.

Country most frequently mentioned in r/worldnews discussions during the selected time period. Location entities are extracted from each post title using BERT-based Named Entity Recognition (NER), then geocoded with geopy/pycountry/reverse_geocoder and mapped to standardized country names and ISO codes using pycountry. Duplicate mentions are removed. All counts reflect the chosen time filter.

Trending Region

Geographic region with highest discussion volume mapped from country mentions.

Geographic region with the highest discussion volume in r/worldnews during the selected time period. Regions are determined by mapping each mentioned country (extracted from post titles using BERT NER and geocoded with geopy/pycountry/reverse_geocoder) to its continent using pycountry_convert. Middle East countries receive special handling, because the continent mapping alone has no Middle East category. Duplicate region mentions are removed. All counts reflect the chosen time filter.
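
The region logic can be sketched roughly as below. To keep the example self-contained, a small hard-coded dictionary stands in for the pycountry_convert continent lookup. The CONTINENT_BY_ISO2 table, the membership of MIDDLE_EAST_ISO2, and both function names are assumptions for illustration only.

```python
# Hypothetical stand-in for the continent lookup done with pycountry_convert.
CONTINENT_BY_ISO2 = {
    "FR": "Europe",
    "EG": "Africa",
    "IL": "Asia",
    "SA": "Asia",
    "JP": "Asia",
    "BR": "South America",
}

# Assumed override set; the dashboard's actual Middle East list is not documented here.
MIDDLE_EAST_ISO2 = {"IL", "SA", "IR", "IQ", "SY", "JO", "LB", "AE", "QA", "KW", "BH", "OM", "YE"}


def region_for(iso2: str) -> str:
    """Map an ISO alpha-2 code to a region, checking the Middle East override first."""
    code = iso2.upper()
    if code in MIDDLE_EAST_ISO2:
        return "Middle East"
    return CONTINENT_BY_ISO2.get(code, "Unknown")


def regions_for_post(iso2_codes):
    """Return each region once per post, preserving first-seen order."""
    return list(dict.fromkeys(region_for(code) for code in iso2_codes))
```

Checking the override before the continent lookup is what keeps countries such as Israel or Saudi Arabia from being folded into "Asia".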

Trending Source

Most shared news source extracted from post URLs with domain normalization.

News source most frequently shared and discussed in r/worldnews during the selected time period. Domains are extracted from post URLs using urlparse, with 'www.' prefixes and common suffixes (like '.com') removed for normalization. Only posts with valid URLs are included. All counts reflect the chosen time filter.
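
A minimal sketch of the domain normalization, assuming a short illustrative suffix list (the guide only names '.com' explicitly). The COMMON_SUFFIXES tuple and the normalize_source name are hypothetical.

```python
from urllib.parse import urlparse

# Assumed suffix list; only '.com' is named in the description above.
COMMON_SUFFIXES = (".com", ".co.uk", ".org", ".net")


def normalize_source(url):
    """Return a normalized source name, or None when the URL has no host."""
    host = urlparse(url).hostname  # lowercased, port stripped, None if missing
    if not host:
        return None
    if host.startswith("www."):
        host = host[len("www."):]
    for suffix in COMMON_SUFFIXES:
        if host.endswith(suffix):
            host = host[: -len(suffix)]
            break
    return host or None
```

For example, "https://www.reuters.com/world/" normalizes to "reuters", and a string without a scheme and host returns None, so that post would be excluded from the source counts.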

Sentiment Heatmap

World map showing sentiment distribution by country using DistilBERT analysis.

World map visualizing sentiment distribution by country in r/worldnews posts during the selected time period. Each country's color reflects the average sentiment score of posts mentioning that country (green = positive, red/orange = negative), as determined by the DistilBERT sentiment model analyzing post titles. Country boundaries are mapped using standardized ISO country codes, which are extracted and deduplicated from locations mentioned in the post titles using NER. The tooltip displays additional metrics for each country, including country name, number of posts, average upvotes, average comments, and average sentiment score. Sentiment scores range from -1 to +1 and are averaged per country. All data reflects the chosen time filter.
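
The per-country averaging behind the map can be illustrated with a small sketch. The post dictionaries, their "countries" and "sentiment" keys, and the function name are assumptions for the example. On the dashboard itself, this aggregation is presumably performed by the visualization layer rather than by code like this.

```python
from collections import defaultdict


def average_sentiment_by_country(posts):
    """Average post sentiment (-1 to +1) per ISO country code.

    Each post counts once per country, even if the title mentions it twice.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for post in posts:
        for iso in set(post["countries"]):  # deduplicate within a post
            totals[iso] += post["sentiment"]
            counts[iso] += 1
    return {iso: totals[iso] / counts[iso] for iso in totals}
```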

Trending Figures

Treemap showing relative discussion volume of political figures by mention percentage.

Treemap showing the relative discussion volume of political figures in r/worldnews during the selected time period. Person names are extracted from post titles using BERT-based Named Entity Recognition (NER), then cleaned and mapped to canonical Wikipedia entries via fuzzy matching and category scoring. The size of each block indicates the percentage share of that person's mentions relative to the total across all posts for the selected time filter.
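
As a rough illustration of the block sizing, each person's share can be computed as their mention count divided by the total number of mentions. The mention_shares function and its flat list input are hypothetical.

```python
from collections import Counter


def mention_shares(mentions):
    """Return each person's percentage share of all mentions, largest first."""
    counts = Counter(mentions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100 * n / total for name, n in counts.most_common()}
```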

Engagement vs Sentiment by Region

Scatter plot correlating discussion engagement with sentiment across global regions.

Scatter plot showing how discussion engagement relates to sentiment across regions in r/worldnews during the selected time period. Each bubble represents a region (determined from geocoded country mentions in post titles). The x-axis shows the region's average sentiment score (from -1 to +1, calculated using DistilBERT on post titles), while the y-axis shows its average engagement score (engagement score for a post = (upvotes × 0.5) + (comments × 0.5)). Bubble size indicates the total number of posts for that region, and color reflects average sentiment (red = negative, green = positive). All metrics are filtered by the chosen time range.

Distribution of Posts by Engagement Tier

Area chart showing post engagement tiers (Low/Medium/High) across 24-hour UTC timeline.

Area chart showing the distribution of r/worldnews posts by engagement tier across each hour of the day (UTC) for the selected time period. Engagement tier (Low, Medium, High) is determined by calculating an engagement score for each post: engagement score = (upvotes × 0.5) + (comments × 0.5). Posts with a score less than 100 are classified as Low, 100–499 as Medium, and 500 or more as High engagement. The hour of the day is based on when each post was published. The chart helps identify when posts with different engagement levels are most frequently submitted. All data reflects the chosen time filter.
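
The scoring and tiering rules above translate directly into a few lines of code. In this sketch the function names are illustrative, and fractional scores between 499 and 500 are treated as Medium, which is an assumption since the stated ranges do not cover them.

```python
def engagement_score(upvotes, comments):
    """Equal-weight combination of upvotes and comments."""
    return upvotes * 0.5 + comments * 0.5


def engagement_tier(score):
    """Classify a score as Low (< 100), Medium (100 to < 500), or High (>= 500)."""
    if score < 100:
        return "Low"
    if score < 500:
        return "Medium"
    return "High"
```

A post with 150 upvotes and 50 comments scores (150 × 0.5) + (50 × 0.5) = 100 and therefore sits at the bottom of the Medium tier.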

Upvotes vs Number of People Engaging

Bar chart analyzing relationship between post upvotes and engagement metrics.

Bar chart showing the relationship between post popularity (upvotes) and engagement metrics (unique commenters, comment depth) in r/worldnews for the selected time period. Posts are grouped into upvote ranges (0-100, 100-500, 500-1000, etc.), and the chart displays three metrics: number of posts in each range (pink bars), average unique commenters per post (purple bars), and maximum comment depth (yellow line). The visualization shows how higher upvote counts correlate with increased discussion participation and conversation complexity. Comment depth is determined by following the reply chains under each post to measure how deeply nested its discussions become. All data reflects the chosen time filter.
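
One way to measure nesting is to follow each comment's parent_id until the chain reaches the post. The sketch below assumes Reddit-style fullnames, where a parent_id beginning with "t1_" points to another comment and "t3_" points to the post. The function names and dictionary layout are illustrative.

```python
def comment_depth(comment_id, parent_of):
    """Count how many comments lie on the chain from this comment up to the post."""
    depth = 0
    current = comment_id
    while current in parent_of:
        depth += 1
        parent = parent_of[current]
        if not parent.startswith("t1_"):  # parent is the post itself
            break
        current = parent[len("t1_"):]
    return depth


def max_comment_depth(comments):
    """Return the deepest nesting level among a post's comments (0 if none)."""
    parent_of = {c["id"]: c["parent_id"] for c in comments}
    return max((comment_depth(c["id"], parent_of) for c in comments), default=0)
```

With this convention a top-level comment has depth 1 and a reply to it has depth 2. If a parent comment is missing from the data, the walk simply stops at that point.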

Post Reach vs Time taken to comment

Analysis of discussion response times by post popularity showing first and top comment timing.

Horizontal bar chart analyzing how quickly discussions begin after post creation in r/worldnews, showing the relationship between post reach (measured by upvote ranges) and response times for the selected time period. Posts are grouped by upvote range (0-100, 100-500, 500-1000, etc.), and the chart displays two key metrics: average minutes to first comment (purple bars) and average minutes to most upvoted comment (green bars). Higher-reach posts (more upvotes) tend to generate faster initial responses, with the most popular posts receiving their first comments within a few minutes and their top comments shortly after. Response time is calculated as the difference between the post creation timestamp and the comment timestamp, in minutes. Only posts with at least one comment are included. All data reflects the chosen time filter.
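
The two response-time metrics can be sketched per post as follows, assuming timestamps in epoch seconds and comment dictionaries with "created_utc" and "score" keys. These field names follow Reddit conventions but are assumptions about the stored data.

```python
def response_times(post_created_utc, comments):
    """Minutes from post creation to the first and to the most upvoted comment.

    Returns None for posts without comments, which the chart excludes.
    """
    if not comments:
        return None
    first = min(comments, key=lambda c: c["created_utc"])
    top = max(comments, key=lambda c: c["score"])
    return {
        "minutes_to_first_comment": (first["created_utc"] - post_created_utc) / 60,
        "minutes_to_top_comment": (top["created_utc"] - post_created_utc) / 60,
    }
```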

People/Countries with High Initial Discussion

Top entities generating immediate engagement measured by first-hour comment activity.

Horizontal bar charts showing the political figures and countries that generate the most immediate engagement in r/worldnews, measured by average comments received in the first hour after post creation for the selected time period. The top chart displays the top 5 political figures ranked by their ability to spark rapid discussion, while the bottom chart shows the top 5 countries with the highest initial engagement. First-hour comments are calculated by counting all comments posted within 60 minutes of post creation, and results are averaged across all posts mentioning each entity. Only entities (people or countries) mentioned in at least 10 posts are included, to reduce noise from small samples. This threshold may limit results for shorter time ranges, as fewer entities will reach the 10-post minimum. All data reflects the chosen time filter and only includes posts with resolved political figures and geocoded countries from the entity resolution processes.
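
The ranking logic, including the 10-post minimum, can be sketched as below. The post layout, the function names, and the choice to treat "within 60 minutes" as strictly less than 3600 seconds are assumptions for illustration.

```python
from collections import defaultdict

MIN_POSTS = 10  # entities with fewer posts are excluded


def first_hour_comment_count(post_created_utc, comment_times):
    """Count comments posted less than 60 minutes after the post."""
    return sum(1 for t in comment_times if 0 <= t - post_created_utc < 3600)


def top_entities(posts, top_n=5):
    """Rank entities by average first-hour comments across the posts mentioning them."""
    per_entity = defaultdict(list)
    for post in posts:
        n = first_hour_comment_count(post["created_utc"], post["comment_times"])
        for entity in set(post["entities"]):
            per_entity[entity].append(n)
    averages = {
        entity: sum(values) / len(values)
        for entity, values in per_entity.items()
        if len(values) >= MIN_POSTS
    }
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```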

Post Lifecycle: Comment Distribution Over Time

Line chart tracking comment activity evolution across engagement tiers over time.

Line chart showing how comment activity evolves over time for different engagement tiers in r/worldnews for the selected time period. The chart displays three lines representing High (red), Medium (orange), and Low (blue) engagement posts, tracking average comments per hour from post creation through 12-24 hours. Engagement tier (Low, Medium, High) is determined by calculating an engagement score for each post: engagement score = (upvotes × 0.5) + (comments × 0.5). Posts with a score less than 100 are classified as Low, 100–499 as Medium, and 500 or more as High engagement. Time buckets include 0-1h, 1-2h, 2-3h, 3-4h, 4-5h, 5-6h, 6-12h, and 12-24h after post creation. Comments are assigned to buckets based on the time difference between comment creation and original post creation. The chart reveals that discussion intensity peaks immediately after posting and gradually diminishes over time across all engagement levels, with High engagement posts showing the steepest initial activity and fastest decay. All data reflects the chosen time filter.
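
Assigning a comment to one of the listed time buckets can be sketched with a sorted list of bucket edges. Timestamps are assumed to be epoch seconds, and a comment that falls exactly on an edge (for example, at 1 hour) is placed in the later bucket, which is an assumption since the description does not specify boundary handling.

```python
import bisect

# Upper edges (in hours) of the buckets listed above, with matching labels.
BUCKET_EDGES_H = [1, 2, 3, 4, 5, 6, 12, 24]
BUCKET_LABELS = ["0-1h", "1-2h", "2-3h", "3-4h", "4-5h", "5-6h", "6-12h", "12-24h"]


def bucket_for(post_created_utc, comment_created_utc):
    """Return the bucket label for a comment, or None if outside 0-24 hours."""
    hours = (comment_created_utc - post_created_utc) / 3600
    if hours < 0 or hours >= 24:
        return None
    return BUCKET_LABELS[bisect.bisect_right(BUCKET_EDGES_H, hours)]
```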

Top Posts

Table of highest-performing posts with comprehensive metrics and direct Reddit links.

Table displaying the top-performing posts in r/worldnews for the selected time period. Columns include post title, upvotes, comments, sentiment score (from DistilBERT analysis of the title), unique commenters, time posted (UTC), discussion duration in hours (from first to last comment), and a direct link to the news article mentioned in the post. All metrics are calculated from daily collected data and filtered by the chosen time range.

About

Overview

Welcome to Reddit World News, a real-time intelligence platform that transforms r/worldnews discussions (posts and comments) into actionable insights on global news trends, political figures, and public sentiment.

Every day at 6 PM UTC, our automated pipeline collects and analyzes the most discussed posts and comments of the past day from r/worldnews, one of Reddit's largest news communities with over 47 million members. Using advanced natural language processing, we extract meaningful patterns from thousands of discussions to answer questions like:

  • Who are the most talked-about political figures right now?
  • Which countries are dominating global news discussions?
  • What regions are experiencing the highest engagement?
  • How does public sentiment vary across different countries and topics?
  • Which news sources are driving the most discussion?
  • How quickly do major stories generate responses and engagement?

Our dashboard provides an unfiltered view into what the global Reddit community is discussing, offering insights that traditional news metrics often miss: the near real-time pulse of public opinion and engagement around world events.

Architecture

The system architecture follows modern data engineering principles with containerized microservices, automated orchestration, and real-time processing capabilities.

Reddit WorldNews Analytics Pipeline Architecture

End-to-end production pipeline showing automated Data Collection, NLP processing, Entity Resolution, Indexing and Real-time Visualization

Data Flow

Our pipeline operates in 7 automated stages, processing data from raw Reddit discussions to actionable insights:

  1. Automated Data Collection
    Every day at 6 PM UTC, Apache Airflow triggers the pipeline DAG. The system uses PRAW (Python Reddit API Wrapper) to collect top posts and all associated comments from r/worldnews for the past 24 hours. This includes post metadata (id, upvotes, creation time, author, URL) as well as full comment threads with their metadata (comment id, upvotes, etc.) and reply hierarchies.
  2. NLP Processing & Entity Extraction
    Each post title and comment is processed through transformer-based models. BERT-large-cased extracts named entities (people, locations, organizations), while DistilBERT analyzes sentiment. Text is preprocessed to improve accuracy (removing punctuation, normalizing spacing). NER is applied only to post titles, while sentiment analysis is performed on both post titles and comments.
  3. Entity Resolution & Mapping
    Person names undergo fuzzy matching against Wikipedia's "Living People" category, with scoring based on political relevance. Locations are geocoded using Nominatim (via geopy) and mapped to standardized country names and ISO codes via pycountry, and to regions via pycountry-convert. All mappings are cached for performance.
  4. Feature Engineering
    The system computes discussion metrics for both posts and comments including unique commenters, comment depth (via parent_id chains), time-based engagement buckets (0-1h, 1-2h, etc.), and response times. Engagement scores combine upvotes and comments with weighted formulas.
  5. Data Indexing
    All processed data flows into Elasticsearch with custom mappings optimized for aggregations. Runtime fields enable dynamic calculations, while eager global ordinals speed up frequent queries on entities.
  6. Real-time Visualization
    Kibana dashboards automatically refresh with new data from Elasticsearch, providing interactive filtering by time, region, person, and source. Visualizations include world maps, treemaps, scatter plots, and time series charts.
  7. Monitoring & Error Handling
    Airflow monitors pipeline health, with automatic retries and alerting. All components are containerized with Docker for consistent deployment and scaling.
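
For the indexing stage, the kind of mapping described (runtime fields plus eager global ordinals on entity fields) might look like the sketch below. The mapping is written as a Python dictionary that could be passed to the Elasticsearch client when creating an index. Every field name here is hypothetical, and the runtime script simply mirrors the engagement formula from the guide. None of it reflects the project's real schema.

```python
import json

# Hypothetical index mapping; field names are illustrative, not the project's schema.
POSTS_MAPPING = {
    "mappings": {
        # Runtime field: computed at query time instead of being stored.
        "runtime": {
            "engagement_score": {
                "type": "double",
                "script": {
                    "source": (
                        "emit(doc['upvotes'].value * 0.5 "
                        "+ doc['num_comments'].value * 0.5)"
                    )
                },
            }
        },
        "properties": {
            "title": {"type": "text"},
            "created_utc": {"type": "date", "format": "epoch_second"},
            "upvotes": {"type": "integer"},
            "num_comments": {"type": "integer"},
            "sentiment_score": {"type": "float"},
            # Keyword fields that are aggregated often load global ordinals eagerly.
            "person": {"type": "keyword", "eager_global_ordinals": True},
            "country_iso": {"type": "keyword", "eager_global_ordinals": True},
        },
    }
}

# Serialized form, as it would be sent in an index-creation request body.
mapping_json = json.dumps(POSTS_MAPPING, indent=2)
```

Eager global ordinals move the cost of building ordinals from the first aggregation query to index refresh time, which suits fields such as person or country that back several dashboard panels.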

Data Pipeline Components

Data Collection

Automated Reddit API integration for collection of posts and comments from r/worldnews

Libraries
PRAW
APIs
Reddit API
Tech
Python

Natural Language Processing

Uses BERT-based transformer models for Named Entity Recognition and sentiment analysis

Libraries
Transformers torch
APIs
Hugging Face Hub
Tech
Python BERT-large-cased DistilBERT CUDA

Person Entity Resolution

Political figure name resolution and mapping to canonical Wikipedia entries

Libraries
requests unicodedata json re
APIs
Wikipedia API
Tech
Python Fuzzy matching Political scoring Caching

Location & Region Resolution

Location geocoding and country/region mapping with ISO codes

Libraries
geopy pycountry pycountry-convert reverse_geocoder
APIs
Nominatim API
Tech
Python Geocoding ISO mapping Region classification Caching

Indexing (Storage) and Search

Real-time indexing and search capabilities

Libraries
elasticsearch
APIs
Elasticsearch REST API
Tech
Python Custom mappings Runtime fields Bulk indexing

Visualization & Analytics

Interactive dashboards with real-time filtering and advanced visualizations

Libraries
Vega-Lite
APIs
Kibana API
Tech
Lens Vega KQL Visualization

Infrastructure & Ops Components

Orchestration & Workflow

Automated workflow scheduling, monitoring, and error handling

Libraries
airflow psycopg2
APIs
Airflow REST API
Tech
Airflow PostgreSQL Scheduling

Infrastructure & Deployment

Secure cloud deployment with SSL, load balancing, and reverse proxy

Libraries
None
APIs
Let's Encrypt API DuckDNS API
Tech
Docker Nginx SSL/TLS Oracle Cloud VPS

Web Hosting & Frontend

Static web hosting with responsive design and interactive architecture viewer with usage tracking

Libraries
None
APIs
Google Analytics
Tech
Nginx HTML5/CSS3 JavaScript