Automating PDF Download and Verification Using Selenium and Python

June 03, 2025


✅ Goals

Download a PDF file via Selenium.

Verify that the file was downloaded.

(Optional) Read/Validate contents of the PDF.

📦 Prerequisites

bash

pip install selenium PyPDF2

You’ll also need:

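A browser driver (e.g., ChromeDriver). Selenium 4.6 and later can fetch a matching driver automatically through Selenium Manager, so a manual install is often unnecessary.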

A known download directory
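
A fresh, empty download directory makes the later "did a PDF appear?" check unambiguous, because files left over from earlier runs cannot match. The following is a minimal sketch, assuming a temporary directory is acceptable for your tests. The helper name make_download_dir is hypothetical, and the path is made absolute because Chrome's download preference is commonly reported to require one.

```python
import tempfile
from pathlib import Path


def make_download_dir(prefix="pdf_downloads_"):
    """Create an empty temporary directory and return its absolute path as a string."""
    path = Path(tempfile.mkdtemp(prefix=prefix)).resolve()
    return str(path)


# Use the returned string as download_dir in the Chrome preferences.
download_dir = make_download_dir()
print(download_dir)
```

The directory is not removed automatically; delete it at the end of the test run (for example with shutil.rmtree) if you do not need the files.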

🧪 Step 1: Configure Selenium to Download PDFs Automatically

Example with Chrome WebDriver:

python

import os
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

# Define download directory
download_dir = "/path/to/downloads"

# Chrome options
options = webdriver.ChromeOptions()
prefs = {
    "download.default_directory": download_dir,
    "plugins.always_open_pdf_externally": True,  # Skip built-in PDF viewer
    "download.prompt_for_download": False,
}
options.add_experimental_option("prefs", prefs)

# Launch browser
driver = webdriver.Chrome(options=options)

# Open the PDF URL, or the page that holds the download link.
# A direct PDF URL starts downloading as soon as it is loaded.
driver.get("https://example.com/sample.pdf")

# If the page offers a button/link instead, click it.
# find_elements() returns an empty list (no exception) when the link is absent.
links = driver.find_elements(By.LINK_TEXT, "Download PDF")
if links:
    links[0].click()

# Wait for the download to finish (crude fixed delay; see Tips)
time.sleep(5)

# Verify PDF downloaded
files = os.listdir(download_dir)
pdf_files = [f for f in files if f.endswith(".pdf")]

if pdf_files:
    print("✅ PDF downloaded:", pdf_files[0])
else:
    print("❌ PDF not downloaded.")

driver.quit()
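
A file name ending in .pdf does not prove the download is a usable PDF: an error page or an empty file can be saved under the same name. As a hedged sketch, the check above could be tightened by confirming that the file is non-empty and starts with the "%PDF-" signature that well-formed PDF files begin with. The function name looks_like_pdf is hypothetical, and this is a quick sanity check rather than full validation.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file exists, is non-empty, and starts with the PDF signature."""
    file_path = Path(path)
    if not file_path.is_file() or file_path.stat().st_size == 0:
        return False
    with file_path.open("rb") as handle:
        # Well-formed PDF files start with the bytes "%PDF-" followed by a version number.
        return handle.read(5) == b"%PDF-"
```

For example, looks_like_pdf(os.path.join(download_dir, pdf_files[0])) could be called right after the directory listing and before the file is opened with a PDF library.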

📄 Step 2: Read and Verify PDF Content (Optional)

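Use PyPDF2 or pdfplumber to extract the text and check it (PyPDF2's development has since moved to the pypdf package, which provides the same PdfReader class):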

python

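from PyPDF2 import PdfReader

# Continues from Step 1: os, download_dir, and pdf_files are already defined
pdf_path = os.path.join(download_dir, pdf_files[0])
reader = PdfReader(pdf_path)

text = ""
for page in reader.pages:
    # Pages without extractable text (e.g., scanned images) may yield nothing
    text += page.extract_text() or ""

# Check content
if "expected keyword" in text:
    print("✅ PDF content verified.")
else:
    print("❌ Expected content not found.")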
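
Text extracted from a PDF often contains line breaks and repeated spaces in the middle of a phrase, so an exact substring check like the one above can fail even when the content is present. One possible way to make the check more tolerant, sketched here with hypothetical helper names (normalize, missing_keywords) and made-up sample text, is to collapse whitespace and ignore case before matching.

```python
import re


def normalize(text):
    """Collapse runs of whitespace to single spaces and lowercase the result."""
    return re.sub(r"\s+", " ", text).strip().lower()


def missing_keywords(text, keywords):
    """Return the keywords that do not appear in the normalized text."""
    haystack = normalize(text)
    return [kw for kw in keywords if normalize(kw) not in haystack]


# Made-up sample standing in for text extracted from a PDF.
sample = "Invoice  Number:\n12345\nTotal   Due"
print(missing_keywords(sample, ["invoice number: 12345", "total due", "refund"]))
# prints ['refund']
```

An empty result means every expected phrase was found, which also gives a more useful failure message than a single true/false check.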

🛠 Tips

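Prefer explicit waits over a fixed time.sleep(): use WebDriverWait for page elements (for example, waiting until the download link is clickable), and poll the download directory for the file itself.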

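PDFs may take time to finish downloading. Chrome writes an in-progress download to a temporary file ending in .crdownload and renames it when the transfer completes, so wait until no .crdownload file remains (or until the file size stops changing) before reading the PDF.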

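In headless mode, test the download path before relying on it. Chrome's older headless mode blocked downloads unless they were enabled through the DevTools command Page.setDownloadBehavior, while the newer mode (--headless=new) is generally reported to honor the same download preferences as a normal window.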
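
The waiting advice in the tips above can be sketched as a small polling helper that replaces the fixed time.sleep(5). This is an illustration under stated assumptions, not a definitive implementation: the name wait_for_download and its default timeout are hypothetical, the download directory is assumed to start out empty, and the .crdownload check applies to Chrome only.

```python
import time
from pathlib import Path


def wait_for_download(download_dir, timeout=30.0, poll_interval=0.5):
    """Poll download_dir until a .pdf exists and no .crdownload files remain.

    Returns the Path of the most recently modified PDF, or raises TimeoutError.
    """
    directory = Path(download_dir)
    deadline = time.monotonic() + timeout
    while True:
        # Chrome keeps a *.crdownload file for as long as a transfer is running.
        in_progress = list(directory.glob("*.crdownload"))
        pdfs = list(directory.glob("*.pdf"))
        if pdfs and not in_progress:
            return max(pdfs, key=lambda p: p.stat().st_mtime)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"No completed PDF in {directory} after {timeout} s")
        time.sleep(poll_interval)
```

In the Step 1 script, pdf_path = wait_for_download(download_dir) could stand in for the sleep and the directory listing. Because WebDriverWait(driver, 30).until(...) accepts any callable that takes the driver, the same directory check could also be passed to it if you prefer to keep all waits in Selenium's style.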
