Automating PDF Download and Verification Using Selenium and Python

✅ Goals

Download a PDF file via Selenium.


Verify that the file was downloaded.


(Optional) Read/Validate contents of the PDF.


📦 Prerequisites

pip install selenium PyPDF2

You’ll also need:


A browser driver (e.g., ChromeDriver)


A known download directory
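A "known" download directory is easiest to work with when it is an absolute path that exists and starts out empty of PDFs, so a file left over from an earlier run is not mistaken for a fresh download. The sketch below shows one way to set that up. `prepare_download_dir` is a hypothetical helper name, not part of Selenium, and it assumes the directory is dedicated to test downloads, because it deletes any existing `.pdf` files there.

```python
import tempfile
from pathlib import Path


def prepare_download_dir(path):
    """Create the download directory if needed and return its absolute path.

    Hypothetical helper, not part of Selenium. Assumes the directory is used
    only for test downloads, because existing PDFs in it are deleted.
    """
    directory = Path(path).expanduser().resolve()
    directory.mkdir(parents=True, exist_ok=True)
    # Remove leftover PDFs so a stale file is not counted as a new download.
    for old_pdf in directory.glob("*.pdf"):
        old_pdf.unlink()
    return str(directory)


if __name__ == "__main__":
    # Demo against a throwaway directory rather than a real downloads folder.
    with tempfile.TemporaryDirectory() as tmp:
        download_dir = prepare_download_dir(Path(tmp) / "downloads")
        print("Download directory ready:", download_dir)
```

The returned string can be used as the `download_dir` value in Step 1.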


🧪 Step 1: Configure Selenium to Download PDFs Automatically

Example with Chrome WebDriver:

from selenium import webdriver

from selenium.webdriver.chrome.service import Service

from selenium.webdriver.common.by import By

import time

import os


# Define download directory

download_dir = "/path/to/downloads"


# Chrome options

options = webdriver.ChromeOptions()

prefs = {

    "download.default_directory": download_dir,

    "plugins.always_open_pdf_externally": True,  # Skip built-in PDF viewer

    "download.prompt_for_download": False,

}

options.add_experimental_option("prefs", prefs)


# Launch browser

driver = webdriver.Chrome(options=options)


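# Open the target URL. When it points straight at a PDF, as here, the prefs

# above make Chrome download the file as soon as the page is requested.

driver.get("https://example.com/sample.pdf")


# If the PDF sits behind a link or button instead, load that page and click it.

# Left commented out because a direct PDF URL has no such link to find:

# driver.find_element(By.LINK_TEXT, "Download PDF").click()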


# Wait for the download to finish (a fixed sleep is a placeholder; large or slow downloads need longer)

time.sleep(5)


# Verify PDF downloaded

files = os.listdir(download_dir)

pdf_files = [f for f in files if f.endswith(".pdf")]

if pdf_files:

    print("✅ PDF downloaded:", pdf_files[0])

else:

    print("❌ PDF not downloaded.")


driver.quit()
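The fixed `time.sleep(5)` above can be swapped for a polling loop that returns as soon as the file is complete. The following is a minimal sketch, assuming Chrome, which writes in-progress downloads to temporary files ending in `.crdownload` and renames them when finished. `wait_for_pdf` and its parameters are hypothetical names chosen for illustration.

```python
import os
import time


def wait_for_pdf(download_dir, timeout=30.0, poll_interval=0.5):
    """Return the path of a finished PDF in download_dir, or raise TimeoutError.

    Hypothetical helper. Assumes Chrome's convention of naming unfinished
    downloads with a .crdownload suffix.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = os.listdir(download_dir)
        in_progress = any(name.endswith(".crdownload") for name in names)
        pdfs = [name for name in names if name.lower().endswith(".pdf")]
        if pdfs and not in_progress:
            # Prefer the most recently modified PDF in case several exist.
            paths = [os.path.join(download_dir, name) for name in pdfs]
            newest = max(paths, key=os.path.getmtime)
            if os.path.getsize(newest) > 0:
                return newest
        time.sleep(poll_interval)
    raise TimeoutError(f"No finished PDF appeared in {download_dir!r} within {timeout}s")
```

In the script above, `pdf_path = wait_for_pdf(download_dir)` would take the place of both the sleep and the directory listing. The same check could also be wrapped in a lambda and passed to `WebDriverWait(driver, 30).until(...)` if you prefer Selenium's own wait mechanism.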

📄 Step 2: Read and Verify PDF Content (Optional)

Use PyPDF2 or pdfplumber to verify content:


import os

from PyPDF2 import PdfReader


# download_dir and pdf_files come from the Step 1 script

pdf_path = os.path.join(download_dir, pdf_files[0])

reader = PdfReader(pdf_path)


text = ""

for page in reader.pages:

    text += page.extract_text() or ""  # guard against pages that yield no text


# Check content

if "expected keyword" in text:

    print("✅ PDF content verified.")

else:

    print("❌ Expected content not found.")
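Two small checks can make this verification sturdier. A server sometimes returns an HTML error page saved under a `.pdf` name, and extracted text often contains irregular line breaks and spacing that defeat an exact substring match. The sketch below, using only the standard library, checks the file signature (well-formed PDFs begin with the bytes `%PDF-`) and compares keywords after collapsing whitespace and case. `looks_like_pdf` and `contains_keyword` are hypothetical helper names.

```python
import os
import tempfile


def looks_like_pdf(path):
    """Cheap sanity check: well-formed PDF files start with the bytes %PDF-."""
    with open(path, "rb") as handle:
        return handle.read(5) == b"%PDF-"


def contains_keyword(text, keyword):
    """Match a keyword while ignoring case and runs of whitespace."""
    def normalize(value):
        return " ".join(value.split()).casefold()

    return normalize(keyword) in normalize(text)


if __name__ == "__main__":
    # Demo with throwaway files; real use would pass the downloaded pdf_path.
    with tempfile.TemporaryDirectory() as tmp:
        real = os.path.join(tmp, "real.pdf")
        fake = os.path.join(tmp, "fake.pdf")
        with open(real, "wb") as handle:
            handle.write(b"%PDF-1.4\n")
        with open(fake, "wb") as handle:
            handle.write(b"<html>Not found</html>")
        print(looks_like_pdf(real), looks_like_pdf(fake))
        print(contains_keyword("Expected\n   Keyword here", "expected keyword"))
```

Calling `looks_like_pdf(pdf_path)` before `PdfReader(pdf_path)` gives a clearer failure message than a parser error, and `contains_keyword(text, "expected keyword")` can stand in for the plain `in` test.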

🛠 Tips

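Replace the fixed time.sleep() with an explicit wait, such as WebDriverWait with a condition that checks the download directory, so the script continues as soon as the file is ready and fails clearly if it never arrives.


PDFs may take time to finish downloading. Check that the file size is greater than zero, or watch for Chrome's temporary .crdownload file to disappear, before treating the download as complete.


For headless mode, confirm that files actually land in the configured directory. Older headless Chrome builds may not save downloads unless download behavior is enabled explicitly.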
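For the headless case, one commonly used workaround is to enable downloads through the Chrome DevTools Protocol. The sketch below assumes a Selenium 4 Chrome driver, which exposes `execute_cdp_cmd`, and uses the `Page.setDownloadBehavior` command; that command is marked deprecated in the protocol in favor of `Browser.setDownloadBehavior`, and newer headless builds may not need either. `allow_headless_downloads` is a hypothetical helper name.

```python
import os


def allow_headless_downloads(driver, download_dir):
    """Ask Chrome, via the DevTools Protocol, to save downloads to download_dir.

    Hypothetical helper. Assumes `driver` is a Chromium-based Selenium 4
    driver that provides execute_cdp_cmd(cmd, cmd_args).
    """
    driver.execute_cdp_cmd(
        "Page.setDownloadBehavior",
        {"behavior": "allow", "downloadPath": os.path.abspath(download_dir)},
    )


# Intended use with a real browser (not executed here):
#   options.add_argument("--headless=new")
#   driver = webdriver.Chrome(options=options)
#   allow_headless_downloads(driver, download_dir)
```

Whether this call is required depends on the Chrome version in use, so it is worth running the download check from Step 1 in headless mode first and adding the call only if no file appears.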

