Jiaxi Liu (Jesse)

Master’s Graduate

Software Engineer | Scalable APIs · Web Scraping · Data Integration · Code Quality & Refactoring


BeautifulSoup and Selenium: Static HTML Parsing vs Browser Automation

BeautifulSoup and Selenium both work with web pages, but they serve different purposes: BeautifulSoup parses HTML that has already been fetched, while Selenium drives a real browser.

Selection Rule

If the target data already exists in the HTML returned by the server, use BeautifulSoup. It is fast and lightweight.

If the data only appears after JavaScript runs, or reaching it requires logging in, clicking, or scrolling, use Selenium.
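One way to apply this rule is to check whether a selector already matches non-empty content in the raw HTML. The sketch below assumes the HTML has been downloaded beforehand; the function name `is_in_static_html` and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup


def is_in_static_html(raw_html: str, css_selector: str) -> bool:
    """Return True if the selector matches an element with non-empty text."""
    soup = BeautifulSoup(raw_html, "html.parser")
    node = soup.select_one(css_selector)
    return node is not None and bool(node.get_text(strip=True))


# Hypothetical markup: one page ships the price in the HTML,
# the other leaves an empty placeholder for JavaScript to fill in.
static_page = '<div id="price">19.99</div>'
dynamic_page = '<div id="price"></div><script src="app.js"></script>'

print(is_in_static_html(static_page, "#price"))   # True
print(is_in_static_html(dynamic_page, "#price"))  # False
```

If the check fails, that is a hint (not proof) that the value is rendered client-side and a browser may be needed.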

BeautifulSoup Setup

from bs4 import BeautifulSoup

# html_doc is a string of HTML, for example the body of an HTTP response.
soup = BeautifulSoup(html_doc, "html.parser")

Finding Elements

first_p = soup.find("p")
all_links = soup.find_all("a")
title = soup.find("p", class_="title")

CSS selectors:

links = soup.select("a.sister")
id_link = soup.select("#link1")
nested = soup.select("p.story a")

Text and attributes:

text = soup.find("p").get_text()
href = soup.find("a").get("href")
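Note that `find` returns `None` when nothing matches, so chaining `.get_text()` or `.get()` directly can raise `AttributeError`. A minimal sketch of guarded extraction, using a made-up `html_doc` that mirrors the class and id names used above:

```python
from bs4 import BeautifulSoup

# Sample document invented for illustration.
html_doc = """
<html><body>
<p class="title"><b>The Story</b></p>
<p class="story">Once upon a time there were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a> and
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>.</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find() returns None when no tag matches, so check before using the result.
title_tag = soup.find("p", class_="title")
title_text = title_tag.get_text(strip=True) if title_tag else None

# select() always returns a list (possibly empty), so iterating is safe.
hrefs = [a.get("href") for a in soup.select("p.story a")]

missing = soup.find("table")  # the sample contains no <table>

print(title_text)  # The Story
print(hrefs)       # ['http://example.com/elsie', 'http://example.com/lacie']
print(missing)     # None
```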

Modify and delete:

tag = soup.find("b")
tag.string = "New Title"
 
link = soup.find("a", id="link1")
link.decompose()
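To see the effect of both operations, here is a small sketch on an invented snippet of HTML. After `decompose()`, the removed tag no longer appears in later searches.

```python
from bs4 import BeautifulSoup

# Markup invented for illustration.
html_doc = (
    '<p><b>Old Title</b> '
    '<a id="link1" href="/a">A</a><a id="link2" href="/b">B</a></p>'
)
soup = BeautifulSoup(html_doc, "html.parser")

soup.find("b").string = "New Title"     # replace the tag's text
soup.find("a", id="link1").decompose()  # remove the tag and its contents

new_title = soup.find("b").get_text()
remaining_ids = [a["id"] for a in soup.find_all("a")]

print(new_title)      # New Title
print(remaining_ids)  # ['link2']
```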

Selenium Setup

from selenium import webdriver
 
driver = webdriver.Chrome()
driver.get("https://example.com")

Browser operations:

driver.maximize_window()
driver.refresh()
driver.back()
driver.forward()
print(driver.current_url)
print(driver.title)

Locating Elements

from selenium.webdriver.common.by import By
 
driver.find_element(By.ID, "username")
driver.find_element(By.NAME, "email")
driver.find_element(By.CSS_SELECTOR, "button.submit")
driver.find_element(By.XPATH, "//div[@id='content']")

Element Actions

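# Assumes input_box, button, and form were located with driver.find_element(...).
input_box.clear()                   # remove any existing text first
input_box.send_keys("my_username")  # type into the field
button.click()                      # click the element
form.submit()                       # or submit the enclosing form instead of clicking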

Explicit Waits

Dynamic pages need condition-based waits, not blind sleeps.

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
 
element = WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.ID, "username"))
)

Advanced Page Operations

Scroll:

driver.execute_script("arguments[0].scrollIntoView();", element)

Alerts:

alert = driver.switch_to.alert
alert.accept()

iframes:

driver.switch_to.frame("iframe_name")
driver.switch_to.default_content()
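Forgetting to switch back leaves every later lookup scoped to the iframe. A context manager is one way to make the return to the main document automatic, even when an exception is raised inside the frame. The name `inside_frame` is hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into a frame for the duration of a with-block."""
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # Always return to the top-level document.
        driver.switch_to.default_content()
```

Usage would look like `with inside_frame(driver, "iframe_name"): ...`, with element lookups for the frame's content placed inside the block.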

Screenshots:

driver.save_screenshot("page.png")
element.screenshot("element.png")

In practice, try parsing the static HTML first, then look for JSON that the page embeds or loads, and only then fall back to browser automation.
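That order of preference can be sketched as a single function that reports where it found the value. Everything specific here is an assumption for illustration: the `h1.product-title` selector, the `application/json` script tag, and the `title` key.

```python
import json

from bs4 import BeautifulSoup


def extract_title(raw_html: str):
    """Try static HTML, then embedded JSON.

    Returns (value, source); source is "browser" when neither step worked.
    """
    soup = BeautifulSoup(raw_html, "html.parser")

    # Step 1: the value is already present in the static markup.
    node = soup.select_one("h1.product-title")
    if node and node.get_text(strip=True):
        return node.get_text(strip=True), "html"

    # Step 2: the value is shipped as JSON inside a <script> tag.
    for script in soup.find_all("script", type="application/json"):
        try:
            data = json.loads(script.string or "")
        except json.JSONDecodeError:
            continue  # not valid JSON, try the next script tag
        if isinstance(data, dict) and "title" in data:
            return data["title"], "json"

    # Step 3: nothing found without executing JavaScript.
    return None, "browser"


print(extract_title('<h1 class="product-title">Lamp</h1>'))
# ('Lamp', 'html')
print(extract_title('<script type="application/json">{"title": "Lamp"}</script>'))
# ('Lamp', 'json')
print(extract_title('<div id="root"></div>'))
# (None, 'browser')
```

Only the last case would justify starting Selenium, which keeps the slow path for pages that genuinely need it.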