What is a News Aggregator?
It is a web application that collects data (news articles) from multiple websites and presents it in one location.
As we all know, there are tons of news sites online, and each publishes its content on its own platform. Now imagine opening 20–30 websites daily just to read the news, and the time you would waste gathering that information.
This web app makes that task easier for you. In a news aggregator, you select the websites you want to follow, and the app collects the desired articles for you.
Requirements/Prerequisites
You should have a basic understanding of the frameworks/libraries given below:
- Django Framework
- BeautifulSoup
- requests module

Setup
Set up the basic Django project with the following command:
```shell
django-admin startproject NewsAggregator
```
Then navigate to the project folder and create the app:
```shell
python manage.py startapp news
```
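Before Django can use the new app, it has to be registered in the project settings. This step is not shown in the original post, but it is the standard procedure; the fragment below assumes the default settings generated by `startproject`:

```python
# NewsAggregator/settings.py (fragment)
# Register the "news" app so Django discovers its models and templates.
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'news',  # our app
]
```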
We will also store the articles in the database, so create the model inside the models.py file:
```python
# news/models.py
from django.db import models


class Headline(models.Model):
    title = models.CharField(max_length=200)
    image = models.URLField(null=True, blank=True)
    url = models.TextField()

    def __str__(self):
        return self.title
```
We store three things: the title, image, and URL of each article. Also, make sure the image field has blank=True and null=True, because some articles have no image.
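With the model in place, generate and apply the migrations so the database table actually exists. This standard Django step is not shown explicitly in the original:

```shell
python manage.py makemigrations news
python manage.py migrate
```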
Now, let’s start with the steps for the web crawler.
Step 1: Scraping
To scrape the website we will use the BeautifulSoup library and the requests module. Open your views.py and start writing the code as follows:
```python
# news/views.py
# basic imports
import requests
from django.shortcuts import render, redirect
from bs4 import BeautifulSoup as BSoup

from news.models import Headline
```
Now create a function news_scrape() for scraping the articles:
```python
# news/views.py
def news_scrape(request):
    session = requests.Session()
    session.headers = {"User-Agent": "Googlebot/2.1 (+http://www.google.com/bot.html)"}
    url = "https://www.theonion.com/"
    # verify=False skips SSL certificate verification
    content = session.get(url, verify=False).content
    soup = BSoup(content, "html.parser")
    News = soup.find_all('div', {"class": "curation-module__item"})
    for article in News:
        # the first <a> inside each item holds the link, image, and title
        main = article.find_all('a')[0]
        link = main['href']
        image_src = str(main.find('img')['srcset']).split(" ")[-4]
        title = main['title']
        news_headline = Headline()
        news_headline.title = title
        news_headline.url = link
        news_headline.image = image_src
        news_headline.save()
    return redirect("./")
```
The news_scrape() view scrapes the news articles from the URL “theonion.com”.
Our function sends these headers when it requests the webpage, so the scraper acts like a normal HTTP client to the news site. The User-Agent key is important here.
This HTTP header tells the server information about the client. We identify ourselves as Googlebot, so when our client requests anything from the server, the server sees the request as coming from a Google bot. You can also configure it to look like a browser User-Agent.
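For instance, to identify as a regular browser instead, you can set a browser-style User-Agent on the session. The exact string below is illustrative, not from the original article:

```python
import requests

# Configure a session with a browser-style User-Agent
# (this particular string is an example, not a requirement).
session = requests.Session()
session.headers["User-Agent"] = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)
print(session.headers["User-Agent"])
```

Every request made through this session will now carry that header automatically.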
In the News object, we collect every <div> of a particular class. We selected this class by inspecting the webpage, and each of these <div> elements contains all three things (title, image, URL).
To access the link, we use main['href'], and then we store the data through our Headline model.
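To see how this selection works without hitting the live site, here is a small self-contained sketch that parses a made-up snippet with the same structure. The markup below is an assumption for illustration; the real page's HTML may differ:

```python
from bs4 import BeautifulSoup

# A toy snippet mimicking the structure the scraper expects
# (this HTML is invented for the example).
html = """
<div class="curation-module__item">
  <a href="https://example.com/story" title="Example Headline">
    <img srcset="small.jpg 320w, large.jpg 1024w">
  </a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
# find_all returns every matching <div>; we take the first item
item = soup.find_all("div", {"class": "curation-module__item"})[0]
main = item.find_all("a")[0]
print(main["href"])   # https://example.com/story
print(main["title"])  # Example Headline
```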
Now we have to show this data to our client. Follow these steps to achieve this.
Show the stored database objects
- create the article() view in views.py to show the data
```python
# news/views.py
def article(request):
    headlines = Headline.objects.all()
    context = {
        'headlines': headlines,
    }
    return render(request, "news/index.html", context)
```
Now simply use this context variable to access the data in the HTML template.
```html
<!-- index.html -->
.....
<div class='container'>
  {% for headline in headlines %}
    <p>{{ headline.title }}</p>
    <img src="{{ headline.image }}">
    <a href="{{ headline.url }}">Read Full Article</a>
  {% endfor %}
</div>
.....
```

Note that image is a URLField, so the template uses {{ headline.image }} directly; the .url attribute is only for ImageField/FileField.
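The original post does not show its URL configuration, but the views above need routes to be reachable. Here is a minimal URLconf sketch; the paths and route names are our own assumptions:

```python
# news/urls.py -- a minimal sketch; the paths and names here are
# assumptions, since the original post does not show its urls.py.
from django.urls import path

from news import views

urlpatterns = [
    path("", views.article, name="home"),
    path("scrape/", views.news_scrape, name="scrape"),
]
```

You would then include this module from the project's urls.py with include("news.urls").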
Run the server and you are good to go. Style the webpage as you want.
Cheers!!
Happy Coding!!
Stay Safe!!!