How to Use Selenium in R


Selenium is a powerful tool for web scraping and browser automation. While it's most commonly used from Python, it can also be used from R. This tutorial explains how to use Selenium in R, focusing on the selenider package, which is more up-to-date and better suited to modern browsers than the older RSelenium package.

Prerequisites

Before we start, make sure you have the following installed:

  1. R: You can download and install R from CRAN.
  2. RStudio (optional but recommended): An integrated development environment for R. Download it from RStudio.
  3. Java: Selenium requires Java to be installed on your system. Make sure you have Java version 17 or higher. Download it from Oracle.
  4. WebDriver: Depending on the browser you want to automate (Chrome, Firefox, etc.), download the appropriate WebDriver.
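
To confirm the prerequisites from within R, you can run a quick check. This is a minimal sketch; it assumes the java executable is already on your PATH.

 r

# Print the R version string
R.version.string

# Print the Java version (assumes java is on your PATH)
system("java -version")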

Setting Up Selenium in R

Step 1: Install Required Packages

You need the selenium and selenider packages to use Selenium from R. Install them with the following commands:

 r

install.packages("selenium")
install.packages("selenider")

Step 2: Load the Packages

Load the selenider package into your R session. We'll also load dplyr for its %>% pipe, which makes chained element operations easier to read.

 r

library(selenider)
library(dplyr)

Step 3: Start a Selenium Session

Create a session that launches a new Google Chrome window. Make sure ChromeDriver is in your PATH.

 r

# Start a Selenium session with Chrome
session <- selenider_session("selenium", browser = "chrome")

Step 4: Navigating to a Webpage

Use the open_url function to navigate to a webpage. Here, we'll navigate to Wikipedia.

 r

# Navigate to a webpage
open_url("https://www.wikipedia.com/")

If you encounter a timeout error, you can increase the timeout duration:

 r

# Navigate to a webpage with increased timeout
open_url("https://www.wikipedia.com/", timeout = 120)

Step 5: Interacting with Web Elements

To interact with web elements, you need to locate them first. You can use CSS selectors or XPath expressions.

Finding an Element by CSS Selector

 r

# Find an element and get its text
title <- session %>% find_element("h1") %>% elem_text()
print(title)

The s function is a shorthand for find_element:

 r

# Shorthand for find_element
title <- s("h1") %>% elem_text()
print(title)
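
Both s() and find_element() also accept an xpath argument, so the same lookup can be written with an XPath expression:

 r

# Find the same element with an XPath expression
title <- s(xpath = "//h1") %>% elem_text()
print(title)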

Finding Multiple Elements

 r

# Find multiple elements
elements <- session %>% find_elements(".central-featured-lang strong")
extracted_text <- sapply(elements, elem_text)
print(extracted_text)


The ss function is a shorthand for find_elements:

 r

# Shorthand for find_elements
elements <- ss(".central-featured-lang strong")
extracted_text <- sapply(elements, elem_text)
print(extracted_text)
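
Searches can also be scoped to an element rather than the whole page, which is handy for drilling into one part of it. Here, the .central-featured container is assumed from the same Wikipedia landing page:

 r

# Find a container first, then search within it
langs <- s(".central-featured") %>% find_elements("strong")
print(sapply(langs, elem_text))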


Step 6: Clicking an Element

Use the elem_click function to click on a web element.

 r

# Click an element
session %>% find_element(".central-featured-lang strong") %>% elem_click()
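
Clicking that link navigates away from the main page. selenider's back() and forward() let you move through the browser history afterwards:

 r

# Return to the previous page, then move forward again
back()
forward()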


Step 7: Extracting Information from Webpages

Web scraping involves extracting information from webpages. Here's an example of extracting the titles of news items from the R Project homepage.

Example: Extracting Titles of News Items

 r

# Navigate to the R Project homepage
open_url("https://www.r-project.org/")

# Find the news items by CSS selector
news_items <- ss(".news-item")
news_titles <- sapply(news_items, elem_text)
print(news_titles)
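
Besides text, you can read attributes with elem_attr(). For example, to pull the link target out of each news item (the ".news-item a" selector is an assumption about the page's markup):

 r

# Extract the href attribute of the link inside each news item
news_links <- ss(".news-item a")
news_urls <- sapply(news_links, function(el) elem_attr(el, "href"))
print(news_urls)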

Step 8: Taking Screenshots

Taking screenshots can be useful for debugging or documentation purposes.

 r

# Take a screenshot
file_path <- withr::local_tempfile(fileext = ".png")
take_screenshot(file_path, view = TRUE)

Step 9: Running JavaScript

You can execute JavaScript code in the browser.

 r

# Execute JavaScript
execute_js_expr("return document.querySelectorAll('.central-featured-lang strong')")
execute_js_expr("return navigator.userAgent")
execute_js_expr("arguments[0].click()", s(".central-featured-lang strong"))

Step 10: Closing the Selenium Session

After completing your tasks, it is important to close the Selenium session to free up resources.

 r

# Close the Selenium session
close_session()

Example Use Case: Automating Google Search

Let's combine everything we've learned into a practical example. We'll automate a Google search and extract the titles of the search results.

 r

# Load the required packages
library(selenider)
library(dplyr)

# Start a Selenium session with Chrome
session <- selenider_session("selenium", browser = "chrome")

# Navigate to Google
open_url("https://www.google.com/")

# Find the search box by its name attribute and enter a query
search_box <- session %>% find_element(name = "q")
search_box %>% elem_send_keys("R programming", keys$enter)

# Wait for the results to load
Sys.sleep(3)

# Extract the titles of the search results
result_titles <- ss("h3") %>% sapply(elem_text)
print(result_titles)

# Take a screenshot of the results page
file_path <- withr::local_tempfile(fileext = ".png")
take_screenshot(file_path, view = TRUE)

# Close the Selenium session
close_session()
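
A fixed Sys.sleep() works, but selenider also ships explicit waiting helpers. As a sketch, assuming selenider's elem_expect() with the has_at_least() condition, the sleep above could be replaced with an expectation that at least one result heading is present:

 r

# Instead of Sys.sleep(3): wait until at least one <h3> result exists
elem_expect(ss("h3"), has_at_least(1), timeout = 10)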


Conclusion

In this tutorial, we explored how to use Selenium in R for web scraping and browser automation using the selenider package. We covered the setup process, navigating to webpages, interacting with web elements, extracting information, taking screenshots, and running JavaScript code. We also demonstrated a practical example of automating a Google search.

Using Selenium in R can greatly enhance your ability to gather and interact with web data programmatically.
