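BeautifulSoup and Selenium both work with web pages, but they serve different purposes: BeautifulSoup parses HTML you already have, while Selenium drives a real browser that loads the page and runs its JavaScript.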
Selection Rule
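If the target data is already in the HTML the server returns, use BeautifulSoup. It is fast and lightweight, but it only parses. Fetch the page first with a library such as requests, and check the raw response (View Source), not the browser's element inspector, which shows the page after JavaScript has run.
If the data appears only after JavaScript runs, or reaching it requires logging in, clicking, or scrolling, use Selenium.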
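The rule can be checked mechanically for one target: parse the raw server response and see whether the selector matches. A minimal sketch, assuming `bs4` is installed; `choose_tool` and both sample pages are hypothetical, and a match is only a heuristic (an empty placeholder element would also match).

```python
from bs4 import BeautifulSoup


def choose_tool(raw_html: str, css_selector: str) -> str:
    """Pick a tool for one target, following the selection rule.

    raw_html should be the HTML exactly as the server returned it
    (for example response.text from requests), before any JavaScript ran.
    """
    soup = BeautifulSoup(raw_html, "html.parser")
    # The selector already matches in the static HTML: no browser needed.
    if soup.select_one(css_selector) is not None:
        return "beautifulsoup"
    # Otherwise the content is probably injected by JavaScript
    # (or sits behind an interaction), which calls for a real browser.
    return "selenium"


static_page = '<ul id="prices"><li class="price">19.99</li></ul>'
js_page = '<div id="app"></div><script src="bundle.js"></script>'

print(choose_tool(static_page, "li.price"))  # beautifulsoup
print(choose_tool(js_page, "li.price"))      # selenium
```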
BeautifulSoup Setup
from bs4 import BeautifulSoup
# html_doc is the page's HTML as a string, e.g. requests.get(url).text
soup = BeautifulSoup(html_doc, "html.parser")
Finding Elements
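The lookups in this section need a parsed document. A hypothetical sample page, modeled on the class and id names the snippets use, makes them runnable end to end (assuming `bs4` is installed):

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; the class and id names match the snippets below.
html_doc = """<html><head><title>Story</title></head><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three sisters:
<a href="https://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="https://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="https://example.com/tillie" class="sister" id="link3">Tillie</a>.</p>
</body></html>"""

soup = BeautifulSoup(html_doc, "html.parser")

title = soup.find("p", class_="title")    # first <p> whose class list contains "title"
links = soup.select("a.sister")           # every <a class="sister">, as a list
first_link = soup.select_one("#link1")    # a single tag, or None when nothing matches
missing = soup.find("a", id="link9")      # no such tag, so find() returns None

print(title.get_text())                   # The Dormouse's story
print([a.get("href") for a in links])
print(first_link.get_text(), missing)     # Elsie None
```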
first_p = soup.find("p")                  # first match, or None
all_links = soup.find_all("a")            # list of all matches
title = soup.find("p", class_="title")    # class_ because "class" is a Python keyword
CSS selectors:
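links = soup.select("a.sister")           # select() always returns a list
id_link = soup.select("#link1")           # also a list; select_one("#link1") returns one tag or None
nested = soup.select("p.story a")         # <a> tags anywhere inside <p class="story">
Text and attributes: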
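text = soup.find("p").get_text()          # all text inside the tag; get_text(strip=True) trims whitespace
href = soup.find("a").get("href")         # None if the attribute is missing; tag["href"] raises KeyError
Modify and delete: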
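Both operations change the parsed tree in place. A sketch on a hypothetical snippet shows the effect and contrasts `decompose()` with `extract()`, which detaches a tag but hands it back for reuse (assuming `bs4` is installed):

```python
from bs4 import BeautifulSoup

# Hypothetical snippet used only for this demonstration.
soup = BeautifulSoup(
    '<p class="title"><b>Old Title</b></p>'
    '<p><a id="link1" href="/a">A</a> <a id="link2" href="/b">B</a></p>',
    "html.parser",
)

# Assigning .string replaces everything inside <b> with one text node.
soup.find("b").string = "New Title"

# decompose() removes the tag from the tree and destroys it; it returns nothing.
soup.find("a", id="link1").decompose()

# extract() also removes the tag, but returns it so it can be inspected or reinserted.
moved = soup.find("a", id="link2").extract()

print(soup)
print(moved)
```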
tag = soup.find("b")
tag.string = "New Title"                  # replaces the tag's contents with this text
link = soup.find("a", id="link1")
link.decompose()                          # removes the tag from the tree and destroys it
Selenium Setup
from selenium import webdriver
driver = webdriver.Chrome()               # Selenium 4.6+ finds or downloads a matching chromedriver itself
driver.get("https://example.com")
Browser operations:
driver.maximize_window()
driver.refresh()
driver.back()
driver.forward()
print(driver.current_url)
print(driver.title)
Locating Elements
from selenium.webdriver.common.by import By
input_box = driver.find_element(By.ID, "username")
email_box = driver.find_element(By.NAME, "email")
button = driver.find_element(By.CSS_SELECTOR, "button.submit")
content = driver.find_element(By.XPATH, "//div[@id='content']")
Element Actions
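input_box.clear()                         # clear any existing text before typing
input_box.send_keys("my_username")
button.click()
form = driver.find_element(By.TAG_NAME, "form")
form.submit()                             # alternative to clicking the button: submits the form directly
Explicit Waits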
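Dynamic pages need condition-based waits, not blind sleeps. A fixed time.sleep() wastes time when the page is fast and fails when it is slow; an explicit wait returns as soon as its condition holds and gives up after a timeout.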
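A stdlib-only sketch of the polling loop behind this idea; `wait_until` is hypothetical and simplified (Selenium's `WebDriverWait` also ignores `NoSuchElementException` between polls by default and raises its own `TimeoutException`):

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 10.0, poll: float = 0.5) -> T:
    """Return condition()'s first truthy result, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # stop waiting the moment the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)  # short pause before checking again


start = time.monotonic()
# Simulated condition: becomes truthy ("loaded") about 0.3 s after start.
ready = wait_until(lambda: time.monotonic() - start > 0.3 and "loaded", timeout=2.0, poll=0.05)
print(ready)  # loaded
```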
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Polls (every 0.5 s by default) for up to 10 s; returns the element or raises TimeoutException
element = WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.ID, "username"))
)
Advanced Page Operations
Scroll:
driver.execute_script("arguments[0].scrollIntoView();", element)
Alerts:
alert = driver.switch_to.alert
alert.accept()                            # alert.dismiss() clicks Cancel instead
iframes:
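After switching into a frame, every lookup searches inside that frame until `default_content()` is called, so forgetting the switch back is an easy bug. A small context manager (hypothetical `inside_frame`, stdlib only) pairs the two calls automatically:

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into an iframe for the duration of a with-block, then switch back.

    frame_reference is passed straight to driver.switch_to.frame().
    """
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # Return to the top-level document even if the block raised.
        driver.switch_to.default_content()
```

Usage would look like `with inside_frame(driver, "iframe_name"): driver.find_element(By.ID, "username")`. For nested frames, `driver.switch_to.parent_frame()` steps out one level instead of all the way.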
driver.switch_to.frame("iframe_name")     # by name or id; an index or a located iframe element also works
driver.switch_to.default_content()        # back to the top-level page
Screenshots:
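driver.save_screenshot("page.png")
element.screenshot("element.png")         # only this element's area
driver.quit()                             # close the browser and end the session when done
In practice, try the cheapest source first: parse the static HTML with BeautifulSoup, then look for JSON that the page embeds or loads from an API endpoint, and start browser automation only when neither contains the data.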
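As an example of the middle step, many sites ship their data as JSON inside a `<script>` tag (Next.js pages, for instance, use a script with id `__NEXT_DATA__`), so it can be read without a browser; JSON loaded from API endpoints shows up in the browser's Network panel and can often be fetched directly. A minimal sketch with a hypothetical page, assuming `bs4` is installed; the `id` and keys differ per site:

```python
import json

from bs4 import BeautifulSoup

# Hypothetical server response; real pages use their own ids and keys.
raw_html = """
<html><body>
<div id="root"></div>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"products": [{"name": "Lamp", "price": 19.99},
                        {"name": "Desk", "price": 120.0}]}}
</script>
</body></html>
"""

soup = BeautifulSoup(raw_html, "html.parser")
script = soup.find("script", id="__NEXT_DATA__")
data = json.loads(script.string)  # .string holds the script's raw text

for product in data["props"]["products"]:
    print(product["name"], product["price"])
```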