Java Selenium WebDriver Tutorial
Master browser automation from setup to advanced design patterns. Learn WebDriver API, locators, waits, Page Object Model, and best practices used in professional QA teams.
What is Selenium?
Selenium is a free, open-source suite of tools for automating web browsers. It provides a programming interface to drive browsers programmatically, enabling automated functional testing of web applications across different browsers and operating systems.
Selenium was originally created by Jason Huggins in 2004 at ThoughtWorks to automate repetitive manual tests. Today it is governed by the Selenium Project under the Software Freedom Conservancy. The official documentation is at selenium.dev.
The Selenium Suite
Selenium WebDriver is like a remote control for your browser. Just as a TV remote sends signals to change channels, WebDriver sends commands — click here, type this, navigate there — through code instead of your hands.
Selenium 4 (released October 2021) implements the W3C WebDriver standard natively, removing the older JSON Wire Protocol. It adds Chrome DevTools Protocol (CDP) support, a revamped Grid, and relative locators.
Which component of the Selenium suite is used to run tests in parallel across multiple machines and browsers?
Selenium WebDriver Architecture
Understanding the architecture helps you troubleshoot failures and write more reliable automation. Selenium 4 uses the W3C WebDriver standard — a REST-like HTTP API for controlling browsers.
The W3C WebDriver specification defines a platform- and language-neutral wire protocol for remote control of web browsers. Every Selenium 4 command maps to an HTTP request sent to the browser driver executable.
4 Architecture Layers
Test Script (Language Bindings): The Java / Python / C# / JS code you write using the Selenium client library.
Selenium Client Library: Translates your method calls into HTTP requests conforming to the W3C WebDriver protocol.
Browser Driver (ChromeDriver / GeckoDriver etc.): A separate executable that receives HTTP requests and translates them into browser-native instructions.
Real Browser: Chrome, Firefox, Edge, or Safari — the actual browser being automated.
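When you call driver.findElement(By.id("btn")).click(), the client sends two requests to the driver (for example localhost:9515, ChromeDriver's default port when started standalone). The first is a POST to /session/{id}/element to locate the element. The second is a POST to /session/{id}/element/{element id}/click. ChromeDriver translates each request into a native Chrome command, and the browser clicks the element.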
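As a rough illustration of the locate step described above, the sketch below builds (but does not send) the kind of HTTP request a client could issue for a W3C "Find Element" command. The class name, the session id `abc123`, and the driver URL are assumptions made for the example. The W3C specification has no `id` locator strategy, so an id lookup is expressed here as the CSS selector `#btn`.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class FindElementRequestSketch {

    // Builds (but does not send) a W3C "Find Element" request.
    // No JSON escaping is done, so this only suits simple selectors.
    static HttpRequest findElementRequest(String driverUrl, String sessionId, String cssSelector) {
        // W3C locator payload: a strategy name ("using") plus its value.
        String body = "{\"using\":\"css selector\",\"value\":\"" + cssSelector + "\"}";
        return HttpRequest.newBuilder()
                .uri(URI.create(driverUrl + "/session/" + sessionId + "/element"))
                .header("Content-Type", "application/json; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical driver address and session id, for illustration only.
        HttpRequest request = findElementRequest("http://localhost:9515", "abc123", "#btn");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Seeing the request in this form can help when reading driver logs: a failing findElement call shows up as a failed POST to the /element endpoint.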
ChromeDriver's major version must match your installed Chrome version. A mismatch causes SessionNotCreatedException. Use Selenium Manager (built into Selenium 4.6+) or WebDriverManager to handle this automatically.
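The major-version rule can be checked with a few lines of plain Java. This is only a sketch of the comparison, not how Selenium Manager or WebDriverManager work internally. The class name and the version strings are made up for illustration.

```java
final class DriverVersionCheck {

    // Extracts the major version, e.g. "126.0.6478.61" -> 126.
    static int majorVersion(String version) {
        String trimmed = version.trim();
        int dot = trimmed.indexOf('.');
        return Integer.parseInt(dot < 0 ? trimmed : trimmed.substring(0, dot));
    }

    // True when browser and driver share the same major version.
    static boolean majorsMatch(String browserVersion, String driverVersion) {
        return majorVersion(browserVersion) == majorVersion(driverVersion);
    }

    public static void main(String[] args) {
        // Illustrative version strings only.
        System.out.println(majorsMatch("126.0.6478.61", "126.0.6478.126"));
        System.out.println(majorsMatch("126.0.6478.61", "125.0.6422.141"));
    }
}
```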
In Selenium 4 architecture, what is the role of the Browser Driver (e.g., ChromeDriver)?
Setting Up the Environment
This tutorial uses Java with Maven — the most common enterprise QA setup. The same concepts apply to Python, C#, and JavaScript bindings.
Prerequisites
java -version
mvn -version

pom.xml: Maven Dependencies

<dependencies>
    <!-- Selenium Java -->
    <dependency>
        <groupId>org.seleniumhq.selenium</groupId>
        <artifactId>selenium-java</artifactId>
        <version>4.21.0</version>
    </dependency>
    <!-- TestNG -->
    <dependency>
        <groupId>org.testng</groupId>
        <artifactId>testng</artifactId>
        <version>7.10.2</version>
        <scope>test</scope>
    </dependency>
</dependencies>
First Test
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class FirstTest {
    public static void main(String[] args) {
        // Selenium 4.6+ manages driver versions automatically
        WebDriver driver = new ChromeDriver();
        driver.get("https://www.selenium.dev");
        System.out.println("Title: " + driver.getTitle());
        driver.quit();
    }
}
From Selenium 4.6 onward, the built-in Selenium Manager automatically downloads the correct browser driver at runtime. You no longer need to manually set the ChromeDriver path.
Starting from Selenium 4.6, what handles browser driver management automatically?
WebDriver Core Browser Commands
The WebDriver interface provides top-level methods for controlling the browser session — the building blocks of every Selenium script.
import org.openqa.selenium.Dimension;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

WebDriver driver = new ChromeDriver();

// Navigation
driver.get("https://example.com");
driver.navigate().back();
driver.navigate().forward();
driver.navigate().refresh();
driver.navigate().to("https://page2.com");

// Browser info
String title = driver.getTitle();
String url = driver.getCurrentUrl();
String src = driver.getPageSource();

// Window
driver.manage().window().maximize();
driver.manage().window().setSize(new Dimension(1280, 800));

// Cleanup
driver.close(); // Close the current tab or window only
driver.quit();  // End the session and close ALL windows

Not calling driver.quit() leaves browser and driver processes running after the test ends. Always place it in an @AfterMethod(alwaysRun = true) method or a finally block to guarantee cleanup even when tests fail.
What is the key difference between driver.close() and driver.quit()?
Locators & Strategies
Locators tell WebDriver how to find elements on the page. Choosing the right strategy affects how robust and performant your tests are.
All Locator Types
| Locator | Example | Best Used When | Speed |
|---|---|---|---|
| By.id | By.id("login") | Element has a unique id attribute | Fastest |
| By.name | By.name("email") | Form input with name attribute | Fast |
| By.cssSelector | By.cssSelector(".btn") | Complex attribute-based selection | Fast |
| By.xpath | By.xpath("//input[@id='q']") | Text-based paths, parent traversal | Slower |
| By.className | By.className("card") | Single class on unique element | Medium |
| By.linkText | By.linkText("Sign In") | Exact anchor link text | Medium |
| By.tagName | By.tagName("h1") | Fetching all elements of a tag | Medium |
XPath Common Patterns
By attribute:
    //input[@id='username']
    //button[@type='submit']

By text:
    //button[text()='Login']
    //button[contains(text(),'Log')]

Sibling / parent traversal:
    //label[text()='Email']/following-sibling::input
    //td[text()='John']/parent::tr

❌ AVOID: absolute XPath (breaks on any DOM change)
    /html/body/div[2]/form/input[1]
Use in this order: ID → Name → CSS Selector → XPath. Use XPath only when CSS cannot do the job — e.g., selecting by text content or traversing to a parent element.
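The XPath patterns above can be tried without a browser by evaluating them with the JDK's built-in javax.xml.xpath API. This is only an approximation: a browser evaluates XPath against the live HTML DOM, while this sketch parses a small well-formed snippet as XML. The markup and the class name are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathPatternsSketch {

    // Hypothetical, well-formed markup standing in for a real page.
    static final String PAGE =
          "<html><body>"
        + "<form><label>Email</label><input id='username'/>"
        + "<button type='submit'>Login</button>"
        + "<button type='button'>Logout</button></form>"
        + "<table><tr id='row1'><td>John</td><td>Admin</td></tr></table>"
        + "</body></html>";

    // Returns how many nodes in PAGE match the given XPath expression.
    static int count(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(PAGE)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "//button[text()='Login']",                          // exact text match
            "//button[contains(text(),'Log')]",                  // partial text match
            "//label[text()='Email']/following-sibling::input",  // sibling traversal
            "//td[text()='John']/parent::tr"                     // parent traversal
        };
        for (String expression : expressions) {
            System.out.println(count(expression) + " match(es): " + expression);
        }
    }
}
```

With this markup the exact-text pattern matches only the Login button, while the contains() pattern also matches Logout. Partial text matches are broader than they first look.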
Which XPath expression correctly selects a button whose text contains "Login"?
WebElement Interactions
Once located, the WebElement interface provides methods to interact with and read data from elements.
WebElement field = driver.findElement(By.id("email"));

// Actions
field.click();
field.sendKeys("[email protected]");
field.clear();
field.submit();

// Read data
String text = field.getText();
String value = field.getAttribute("value");
String css = field.getCssValue("color");

// State checks
boolean vis = field.isDisplayed();
boolean ena = field.isEnabled();
boolean sel = field.isSelected();
HTML select Dropdown — Select Class
Select select = new Select(driver.findElement(By.id("country")));
select.selectByVisibleText("India");
select.selectByValue("IN");
select.selectByIndex(2);
String chosen = select.getFirstSelectedOption().getText();

// Special keys
field.sendKeys(Keys.ENTER);
field.sendKeys(Keys.chord(Keys.CONTROL, "a"), Keys.DELETE);
The Select class only works on native HTML <select> elements. For custom dropdowns built with div/ul/li (React, Angular etc.), locate and click each element directly.
Which Selenium class is specifically designed to interact with native HTML <select> dropdowns?
Waits & Synchronization
Timing issues are the leading cause of flaky tests. Selenium provides three types of waits to handle page load timing reliably.
Implicit Wait
A global timeout telling WebDriver to poll the DOM for a set duration when any findElement call doesn't immediately find the element. Set once for the whole session.
driver.manage().timeouts().implicitlyWait(Duration.ofSeconds(10));
Explicit Wait — Recommended
WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(15));

// Wait until visible
WebElement el = wait.until(
    ExpectedConditions.visibilityOfElementLocated(By.id("result"))
);

// Wait until clickable
wait.until(ExpectedConditions.elementToBeClickable(By.id("btn")));

// Wait for page title
wait.until(ExpectedConditions.titleContains("Dashboard"));

// Wait for spinner to disappear
wait.until(ExpectedConditions.invisibilityOfElementLocated(
    By.cssSelector(".spinner")
));
Fluent Wait
// NoSuchElementException here is org.openqa.selenium.NoSuchElementException,
// not java.util.NoSuchElementException
FluentWait<WebDriver> wait = new FluentWait<>(driver)
    .withTimeout(Duration.ofSeconds(30))
    .pollingEvery(Duration.ofMillis(500))
    .ignoring(NoSuchElementException.class);

WebElement el = wait.until(d -> d.findElement(By.id("dynamic")));

Do not mix implicit and explicit waits. Combining them causes unpredictable wait times. The official Selenium docs recommend using only one approach; prefer explicit waits for granular control.
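The polling idea behind explicit and fluent waits can be sketched in plain Java. This is not Selenium's implementation of WebDriverWait or FluentWait, only a minimal model of the behaviour: check a condition, return as soon as it produces a value, and give up once the timeout has passed. The class name, method name, and the fake condition are assumptions for the example.

```java
import java.time.Duration;
import java.util.function.Supplier;

final class PollingWaitSketch {

    // Polls the condition until it yields a non-null value or the timeout elapses.
    static <T> T pollUntil(Supplier<T> condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            T result = condition.get();
            if (result != null) {
                return result; // exit as soon as the condition is met
            }
            if (System.nanoTime() >= deadline) {
                throw new IllegalStateException("Condition not met within " + timeout);
            }
            try {
                Thread.sleep(interval.toMillis()); // pause between polls
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Hypothetical condition: the "element" appears on the third poll.
        String found = pollUntil(
                () -> ++attempts[0] >= 3 ? "element" : null,
                Duration.ofSeconds(2), Duration.ofMillis(50));
        System.out.println(found + " after " + attempts[0] + " polls");
    }
}
```

In this run the call returns after roughly 100 ms rather than the full 2-second timeout. A fixed Thread.sleep() would always wait its full duration.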
What is the key advantage of Explicit Wait over Implicit Wait?
Advanced Actions & JavascriptExecutor
The Actions class handles complex gestures. JavascriptExecutor lets you run JS directly in the browser for edge cases.
Actions actions = new Actions(driver);

// Hover (reveals dropdown menus)
actions.moveToElement(navMenu).perform();

// Right-click
actions.contextClick(element).perform();

// Double-click
actions.doubleClick(element).perform();

// Drag and drop
actions.dragAndDrop(sourceEl, targetEl).perform();

// Shift + click (multi-select)
actions.keyDown(Keys.SHIFT)
    .click(item1).click(item2)
    .keyUp(Keys.SHIFT).perform();

JavascriptExecutor js = (JavascriptExecutor) driver;

// Scroll element into view
js.executeScript("arguments[0].scrollIntoView(true);", el);

// Force-click (bypass overlays)
js.executeScript("arguments[0].click();", el);

// Scroll to bottom
js.executeScript("window.scrollTo(0, document.body.scrollHeight);");
What method must be called at the end of an Actions chain to actually execute the actions?
Alerts, iFrames & Multiple Windows
Real applications use JavaScript alerts, iframes, and multiple browser tabs. Selenium's switchTo() API handles each.
JavaScript Alerts
wait.until(ExpectedConditions.alertIsPresent());
Alert alert = driver.switchTo().alert();

String message = alert.getText();  // Read message
alert.sendKeys("input");           // Type into a prompt() dialog before closing it

// Close the alert with one of the following
alert.accept();                    // Click OK
alert.dismiss();                   // Click Cancel
iFrames
// Switch into iframe (by index, by name or id, or by WebElement)
driver.switchTo().frame(0);
driver.switchTo().frame("iframeName");
driver.switchTo().frame(driver.findElement(By.id("myFrame")));

// Return to the main (top-level) document
driver.switchTo().defaultContent();

// Move up one level only, to the frame's parent
driver.switchTo().parentFrame();
Multiple Windows / Tabs
String original = driver.getWindowHandle();
driver.findElement(By.id("openNew")).click();

for (String h : driver.getWindowHandles()) {
    if (!h.equals(original)) {
        driver.switchTo().window(h);
        break;
    }
}

// Switch back when done
driver.switchTo().window(original);
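The loop above picks the first handle that differs from the original, which is enough when exactly two windows are open. When more windows may already exist, one alternative is to snapshot the handles before the click and diff them afterwards. The sketch below shows only that set logic in plain Java. The helper name and the handle strings are hypothetical stand-ins for what getWindowHandles() would return.

```java
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

final class WindowHandleSketch {

    // Returns a handle present after the click but not before it, if any.
    static Optional<String> newHandle(Set<String> before, Set<String> after) {
        Set<String> added = new LinkedHashSet<>(after);
        added.removeAll(before);
        return added.stream().findFirst();
    }

    public static void main(String[] args) {
        // Hypothetical handles: two windows were open, a third one appears.
        Set<String> before = Set.of("win-1", "win-2");
        Set<String> after = Set.of("win-1", "win-2", "win-3");
        System.out.println(newHandle(before, after).orElse("no new window"));
    }
}
```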
After interacting inside an iframe, which method returns focus to the main page?
Page Object Model (POM)
POM is a design pattern where each web page has a corresponding Java class. Each class encapsulates that page's locators and interaction methods, separating test logic from page-interaction code.
A Page Object is like a user manual for a single page. When a test needs to log in, it calls the LoginPage "manual" — loginPage.login("user","pass") — without knowing how the page works internally.
Benefits
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.FindBy;
import org.openqa.selenium.support.PageFactory;

public class LoginPage {
    private WebDriver driver;

    @FindBy(id = "username")
    private WebElement usernameField;

    @FindBy(id = "password")
    private WebElement passwordField;

    @FindBy(cssSelector = "button[type='submit']")
    private WebElement loginBtn;

    public LoginPage(WebDriver driver) {
        this.driver = driver;
        // Wire up the @FindBy fields as lazily located elements
        PageFactory.initElements(driver, this);
    }

    public DashboardPage login(String user, String pass) {
        usernameField.sendKeys(user);
        passwordField.sendKeys(pass);
        loginBtn.click();
        return new DashboardPage(driver);
    }
}

@Test
public void testValidLogin() {
    LoginPage page = new LoginPage(driver);
    DashboardPage dash = page.login("admin", "secret");
    Assert.assertTrue(dash.isLoaded());
}
What is the role of PageFactory.initElements(driver, this) in a Page Object class?
TestNG Integration
TestNG is one of the most widely used test frameworks with Selenium in Java. It provides lifecycle annotations, grouping, parallel execution, data-driven testing, and HTML reports.
Key Annotations
Data-Driven Testing
@DataProvider(name = "loginData")
public Object[][] credentials() {
    return new Object[][] {
        { "admin", "pass1", true },
        { "user2", "pass2", true },
        { "invalid", "wrong", false }
    };
}

@Test(dataProvider = "loginData")
public void testLogin(String user, String pass, boolean success) {
    loginPage.login(user, pass);
    if (success) {
        Assert.assertTrue(dashboard.isLoaded());
    } else {
        Assert.assertTrue(loginPage.getError().contains("Invalid"));
    }
}
Which TestNG annotation is best suited for opening a browser before each individual test method?
Best Practices & Common Pitfalls
Knowing the API is only half the battle. These practices separate reliable automation suites from fragile scripts that teams eventually abandon.
Best Practices — Always Follow
Use explicit waits, never Thread.sleep(). Thread.sleep(3000) always waits the full duration. Explicit waits exit as soon as the condition is met, saving seconds per test.
Implement POM from day one, even for small projects. Locators scattered across test classes become a nightmare after 20+ tests.
Never hard-code test data. Use @DataProvider, external CSV/Excel files, or a TestData factory class.
Always clean up in @AfterMethod. Calling driver.quit() there guarantees cleanup even when tests fail.
Capture screenshots on failure. Use a TestNG ITestListener to auto-save screenshots on test failures.
Use meaningful method names. testLoginWithValidCredentials() is far better than test1().
❌ Absolute XPaths like /html/body/div[3]/table/tr[2]/td[1] — break on any DOM change
❌ Thread.sleep() everywhere — slow and still flaky
❌ No assertions — tests pass even when app is broken
❌ Sharing one WebDriver instance across parallel tests
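One common way to avoid the last pitfall is to give each test thread its own driver through a ThreadLocal. The sketch below shows the pattern with a generic holder so that it compiles without Selenium. The class name is hypothetical, and `Object` stands in for a WebDriver; in a real suite the factory might be something like `ChromeDriver::new`.

```java
import java.util.function.Supplier;

final class PerThreadHolder<T> {

    private final ThreadLocal<T> holder;

    // The factory runs once per thread, on that thread's first call to get().
    PerThreadHolder(Supplier<T> factory) {
        this.holder = ThreadLocal.withInitial(factory);
    }

    // Returns the instance that belongs to the calling thread.
    T get() {
        return holder.get();
    }

    // Drops the calling thread's instance (call after quitting the driver).
    void remove() {
        holder.remove();
    }

    public static void main(String[] args) throws InterruptedException {
        // Object is a stand-in for a WebDriver in this sketch.
        PerThreadHolder<Object> drivers = new PerThreadHolder<>(Object::new);
        Object mainInstance = drivers.get();

        Object[] workerInstance = new Object[1];
        Thread worker = new Thread(() -> workerInstance[0] = drivers.get());
        worker.start();
        worker.join();

        System.out.println("Distinct per thread: " + (mainInstance != workerInstance[0]));
    }
}
```

In a TestNG suite the cleanup method would typically quit the thread's driver and then call remove(), so that a reused worker thread starts the next test with a fresh instance.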
Recommended Project Structure
src/main/java/
    pages/              ← Page Object classes
    utils/              ← DriverFactory, ScreenshotUtil, WaitUtil
    data/               ← TestData constants
src/test/java/
    tests/              ← All @Test classes
    listeners/          ← ITestListener (screenshot on fail)
src/test/resources/
    testng.xml          ← Suite configuration
    config.properties   ← Base URL, timeouts, credentials
screenshots/            ← Auto-captured on failure
pom.xml
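A config.properties file like the one in the structure above can be read with java.util.Properties. The sketch below is one possible shape for such a reader. The class name, the keys `base.url` and `timeout.seconds`, and their default values are assumptions for illustration, not part of the tutorial's project.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.time.Duration;
import java.util.Properties;

final class TestConfigSketch {

    private final Properties props = new Properties();

    // Loads key=value pairs from any Reader (a file reader in a real project).
    TestConfigSketch(Reader source) {
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical key "base.url", with a local default.
    String baseUrl() {
        return props.getProperty("base.url", "http://localhost:8080");
    }

    // Hypothetical key "timeout.seconds", defaulting to 10 seconds.
    Duration explicitWait() {
        return Duration.ofSeconds(Long.parseLong(props.getProperty("timeout.seconds", "10")));
    }

    public static void main(String[] args) {
        // Inline content stands in for src/test/resources/config.properties.
        TestConfigSketch config = new TestConfigSketch(
                new StringReader("base.url=https://example.com\ntimeout.seconds=15\n"));
        System.out.println(config.baseUrl() + " " + config.explicitWait());
    }
}
```

Keeping timeouts and URLs in one place lets the same tests run against different environments without code changes.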
Why should Thread.sleep() be avoided for synchronization in Selenium tests?