Automating PDF Download and Verification Using Selenium and Python
✅ Goals
Download a PDF file via Selenium.
Verify that the file was downloaded.
(Optional) Read and validate the contents of the PDF.
📦 Prerequisites
```bash
pip install selenium PyPDF2
```
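You’ll also need:
Google Chrome and a matching ChromeDriver (Selenium 4.6 and later can download the driver automatically through Selenium Manager)
An absolute path to a known download directory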
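Starting each run from a fresh, empty directory means any PDF found afterwards must have come from that run, not from an earlier one. A minimal sketch using Python's standard `tempfile` module; the helper name `make_download_dir` is hypothetical:

```python
import os
import tempfile


def make_download_dir(prefix: str = "pdf_downloads_") -> str:
    """Create an empty, uniquely named directory and return its absolute path.

    Hypothetical helper: Chrome's "download.default_directory" preference
    should point at an absolute path, and an empty directory guarantees
    that any PDF found later was downloaded during this run.
    """
    path = tempfile.mkdtemp(prefix=prefix)
    return os.path.abspath(path)


download_dir = make_download_dir()
print(download_dir)
```

The returned path can be used as `download_dir` in the Chrome preferences of Step 1.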
🧪 Step 1: Configure Selenium to Download PDFs Automatically
Example with Chrome WebDriver:
```python
import os
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Define the download directory (Chrome expects an absolute path)
download_dir = os.path.abspath("/path/to/downloads")
os.makedirs(download_dir, exist_ok=True)

# Chrome options
options = webdriver.ChromeOptions()
prefs = {
    "download.default_directory": download_dir,
    "download.prompt_for_download": False,
    "plugins.always_open_pdf_externally": True,  # Skip the built-in PDF viewer
}
options.add_experimental_option("prefs", prefs)

# Launch the browser
driver = webdriver.Chrome(options=options)
try:
    # Open the page with the PDF download link (placeholder URL).
    # Opening a direct PDF URL also works: with the prefs above,
    # Chrome saves the file instead of displaying it.
    driver.get("https://example.com/downloads")

    # Wait until the link is clickable, then click it
    WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.LINK_TEXT, "Download PDF"))
    ).click()

    # Wait for the download to finish (a fixed sleep is fragile; see Tips)
    time.sleep(5)

    # Verify that a PDF was downloaded
    pdf_files = [f for f in os.listdir(download_dir) if f.endswith(".pdf")]
    if pdf_files:
        print("✅ PDF downloaded:", pdf_files[0])
    else:
        print("❌ PDF not downloaded.")
finally:
    # Always close the browser, even if a step above fails
    driver.quit()
```
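A file name ending in `.pdf` does not by itself prove the file is a valid PDF, for example if an error page was saved under that name. One extra check relies on the fact that PDF files begin with the `%PDF-` header. A minimal sketch with a hypothetical `looks_like_pdf` helper; it inspects only the first five bytes, not the whole file:

```python
import os
import tempfile


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists, is non-empty, and starts with %PDF-.

    Hypothetical helper: it checks only the header bytes, not whether
    the rest of the file is a well-formed PDF.
    """
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"


# Usage with two throwaway files: one with a PDF header, one HTML page
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "report.pdf")
    bad = os.path.join(tmp, "error.pdf")
    with open(good, "wb") as f:
        f.write(b"%PDF-1.7\n")
    with open(bad, "wb") as f:
        f.write(b"<html>Not found</html>")
    results = (looks_like_pdf(good), looks_like_pdf(bad))

print(results)  # (True, False)
```

In the Step 1 script, `looks_like_pdf(os.path.join(download_dir, pdf_files[0]))` could be added to the success branch.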
Step 2: Read and Verify PDF Content (Optional)
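Use PyPDF2 (its maintained successor, pypdf, provides the same PdfReader class) or pdfplumber to extract the text and check it for expected content. This snippet continues the Step 1 script and reuses download_dir and pdf_files: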
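```python
import os

from PyPDF2 import PdfReader

# Continues from Step 1: reuses download_dir and pdf_files
pdf_path = os.path.join(download_dir, pdf_files[0])
reader = PdfReader(pdf_path)

# Join page texts with newlines; "or ''" guards against pages with no text
# (e.g. scanned, image-only pages)
text = "\n".join(page.extract_text() or "" for page in reader.pages)

# Check content
if "expected keyword" in text:
    print("✅ PDF content verified.")
else:
    print("❌ Expected content not found.")
```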
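Extracted PDF text often contains line breaks or runs of spaces where the layout wrapped, so an exact substring match can fail even when the words are on the page. A minimal sketch of a more tolerant check that collapses whitespace and ignores case before matching; the `normalize` and `find_missing_keywords` helpers are hypothetical:

```python
import re


def normalize(text: str) -> str:
    """Collapse every whitespace run to a single space and lowercase the text."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_missing_keywords(text: str, keywords: list[str]) -> list[str]:
    """Return the keywords that do not appear in the normalized text."""
    haystack = normalize(text)
    return [kw for kw in keywords if normalize(kw) not in haystack]


# Text as it might come back from extract_text(), wrapped across lines
sample = "Invoice   Number: 12345\nTotal\nAmount: $99.00"
missing = find_missing_keywords(sample, ["invoice number", "total amount", "Due date"])
print(missing)  # ['Due date']
```

An empty result means every expected keyword was found; in Step 2, `find_missing_keywords(text, [...])` could replace the single `in` check.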
Tips
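Use WebDriverWait instead of time.sleep() when waiting on the page, for example until the download link is clickable. WebDriverWait only inspects the browser, not the file system, so wait for the file itself by polling the download directory.
Large PDFs can take a while to download. Chrome writes an in-progress download as a .crdownload file and renames it when it finishes, so wait until no .crdownload file remains (or until the file size stops changing).
For headless mode, check that the download preferences still apply: Chrome's new headless mode (--headless=new) honors them, while the legacy headless mode may not allow downloads without extra configuration.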
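The polling approach described above can be sketched as follows. It assumes Chrome's `.crdownload` naming for partial files and a download directory that was empty before the run; the `wait_for_download` helper and its parameters are hypothetical:

```python
import os
import tempfile
import time


def wait_for_download(download_dir: str, timeout: float = 30.0,
                      poll_interval: float = 0.5) -> str:
    """Poll download_dir until a finished .pdf file appears.

    Hypothetical helper: treats the download as complete when at least one
    .pdf exists and no Chrome partial files (*.crdownload) remain.
    Returns the path of the newest .pdf; raises TimeoutError otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = os.listdir(download_dir)
        in_progress = [n for n in names if n.endswith(".crdownload")]
        pdfs = [n for n in names if n.endswith(".pdf")]
        if pdfs and not in_progress:
            paths = [os.path.join(download_dir, n) for n in pdfs]
            return max(paths, key=os.path.getmtime)
        time.sleep(poll_interval)
    raise TimeoutError(f"No completed PDF in {download_dir} after {timeout}s")


# Usage sketch without a browser: simulate a finished download in a temp dir
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "sample.pdf"), "wb").close()
    found = os.path.basename(wait_for_download(tmp, timeout=2))

print(found)  # sample.pdf
```

In the Step 1 script, a call such as `pdf_path = wait_for_download(download_dir)` could stand in for `time.sleep(5)` and the directory listing, returning as soon as the file is ready instead of always waiting five seconds.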
Learn Selenium Python Training in Hyderabad
Visit Our IHUB Talent Training Institute in Hyderabad