Generating Link Previews Using beautifulsoup4 and Django

Link previews are pop-up boxes you might see on a chat app or other social media platform when you share a URL. Link previews summarize the contents of the URL and display the name of the linked website, an image and a description of the website’s content.

In this article, we will use the beautifulsoup4 library to scrape basic metadata from a web page, and then use that data to generate a preview of the link.

A detailed video tutorial is available on YouTube. Watch it for a fuller walkthrough.

Libraries we need.

  1. beautifulsoup4
  2. requests

requests will provide us with our target’s HTML, and beautifulsoup4 will parse that data.

Install both with pip inside a virtual environment:

$ pip3 install beautifulsoup4 requests
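
With both packages installed, the parsing step can be tried on its own before any Django code is written. The following is a minimal sketch that parses a small inline HTML string instead of a downloaded page; the sample markup and the variable names are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, standing in for HTML fetched with requests.
SAMPLE_HTML = """
<html>
  <head>
    <title>Example Page</title>
    <meta property="og:description" content="A short summary.">
  </head>
  <body><p>Hello</p></body>
</html>
"""

# Parse the markup with Python's built-in HTML parser.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The <title> tag is exposed as an attribute of the parsed document.
page_title = soup.title.string

# Meta tags are looked up by attribute; the text lives in "content".
og_description = soup.find("meta", property="og:description").get("content")

print(page_title)
print(og_description)
```

The scraping functions later in this article use the same two lookups, with fallbacks for pages that lack a given tag.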

Now create a new Django project and open it. In views.py, import the libraries and define the headers that will be sent with the outgoing request.

import requests
from bs4 import BeautifulSoup
from django.http import JsonResponse


# Headers sent with the outgoing request. A browser-like User-Agent helps
# with sites that reject the default python-requests one.
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'
    }

Create the basic functions to get the title, description and image from the link.

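def get_title(html):
    """Scrape page title."""
    title = None
    if html.title and html.title.string:
        # html.title is None when the page has no <title> tag, hence the guard.
        title = html.title.string.strip()
    elif html.find("meta", property="og:title"):
        title = html.find("meta", property="og:title").get('content')
    elif html.find("meta", attrs={"name": "twitter:title"}):
        # Twitter card tags are normally declared with name=, not property=.
        title = html.find("meta", attrs={"name": "twitter:title"}).get('content')
    elif html.find("h1"):
        # get_text() also works when the heading contains nested tags.
        title = html.find("h1").get_text(strip=True)
    return title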


def get_description(html):
    """Scrape page description."""
    description = None
    if html.find("meta", attrs={"name": "description"}):
        # The standard description tag is <meta name="description" ...>.
        description = html.find("meta", attrs={"name": "description"}).get('content')
    elif html.find("meta", property="og:description"):
        description = html.find("meta", property="og:description").get('content')
    elif html.find("meta", attrs={"name": "twitter:description"}):
        description = html.find("meta", attrs={"name": "twitter:description"}).get('content')
    elif html.find("p"):
        # Fall back to the first paragraph as plain text so it serializes to JSON.
        description = html.find("p").get_text(strip=True)
    return description


def get_image(html):
    """Scrape share image."""
    image = None
    if html.find("meta", property="image"):
        image = html.find("meta", property="image").get('content')
    elif html.find("meta", property="og:image"):
        image = html.find("meta", property="og:image").get('content')
    elif html.find("meta", attrs={"name": "twitter:image"}):
        image = html.find("meta", attrs={"name": "twitter:image"}).get('content')
    elif html.find("img", src=True):
        # find() returns a single tag; find_all() returns a list with no .get().
        image = html.find("img", src=True).get('src')
    return image
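
Image URLs found in meta tags or img src attributes are sometimes relative (for example /static/cover.png), and a client cannot load those on its own. One way to handle this, sketched below, is to resolve the scraped value against the page URL with urljoin from the standard library. The helper name absolutize is hypothetical and not part of the original code.

```python
from urllib.parse import urljoin


def absolutize(page_url, image_url):
    """Resolve image_url against page_url (hypothetical helper).

    Returns None when no image was found, so the caller can pass the
    result of get_image() straight in.
    """
    if not image_url:
        return None
    # urljoin leaves absolute URLs untouched and resolves relative ones.
    return urljoin(page_url, image_url)


print(absolutize("https://example.com/blog/post/", "/static/cover.png"))
print(absolutize("https://example.com/blog/post/", "cover.png"))
```

In the view, this could be applied as absolutize(url, get_image(html)).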

Now let’s create the view function to generate the preview of the link.

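def generate_preview(request):
    """Return the title, description and image of the page at ?link=<url> as JSON."""
    url = request.GET.get('link')
    if not url:
        return JsonResponse({'error': 'Missing "link" query parameter.'}, status=400)
    # Pass headers by keyword: the second positional argument of
    # requests.get() is params, not headers.
    req = requests.get(url, headers=headers, timeout=10)
    html = BeautifulSoup(req.content, 'html.parser')
    meta_data = {
       'title': get_title(html),
       'description': get_description(html),
       'image': get_image(html),
    }
    return JsonResponse(meta_data)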
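
Because the view fetches whatever address arrives in the link parameter, it may be worth rejecting values that are not plain web URLs before calling requests.get. The following is a minimal sketch, assuming a hypothetical helper named is_fetchable_url that is not part of the original code. It only checks the scheme and host, and is not a complete defence against requests to internal addresses.

```python
from urllib.parse import urlparse


def is_fetchable_url(url):
    """Return True only for http(s) URLs that include a host (hypothetical helper)."""
    if not url:
        return False
    try:
        parts = urlparse(url)
    except ValueError:
        # urlparse raises ValueError for some malformed inputs.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(is_fetchable_url("https://example.com/article"))
print(is_fetchable_url("file:///etc/passwd"))
```

The view could call this right after reading the parameter and return a 400 response when it yields False.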

That is all we need on the backend. From the frontend, send an AJAX request to this view with the URL in the link query parameter, then render the returned JSON in a template.
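
The URL being previewed usually contains characters such as ?, & and = that need to be percent-encoded before it is placed in the link query parameter; otherwise part of it can be lost. The sketch below shows in Python how such a request URL can be built. It assumes a hypothetical route /generate_preview/ (this article does not name the route) and a hypothetical helper preview_request_url. In browser JavaScript, encodeURIComponent plays the same role.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def preview_request_url(endpoint, link):
    """Build the request URL for the preview view (hypothetical helper)."""
    # urlencode percent-encodes the link so it survives as one parameter.
    return f"{endpoint}?{urlencode({'link': link})}"


# "/generate_preview/" is an assumed route name for illustration.
request_url = preview_request_url(
    "/generate_preview/", "https://example.com/a?b=1&c=2"
)
print(request_url)

# Decoding the query string gives back the original link unchanged.
print(parse_qs(urlparse(request_url).query)["link"][0])
```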

For a full walkthrough, consider watching the video tutorial on YouTube.

The full code for this tutorial is on GitHub.

Thanks for reading.

Follow for more such articles and videos.

Cheers!!

Happy coding.
