# How to Use Selenium in R
Selenium is a powerful tool for web scraping and browser automation. While it is most commonly used with Python, it can also be used with R. This tutorial explains how to use Selenium in R, focusing on the selenider package, which is more up to date and more compatible with modern browsers than the older RSelenium package.

## Prerequisites

Before we start, make sure you have the following installed:

- **R**: Download and install R from CRAN.
- **RStudio** (optional but recommended): An integrated development environment for R.
- **Java**: Selenium requires Java to be installed on your system. Make sure you have Java version 17 or higher.
- **WebDriver**: Depending on the browser you want to automate, download the appropriate driver: ChromeDriver for Chrome or GeckoDriver for Firefox.

## Setting Up Selenium in R

### Step 1: Install Required Packages

You need the selenium and selenider packages to use Selenium in R. Install them with:

```r
install.packages("selenium")
install.packages("selenider")
```

### Step 2: Load the Packages

Load the selenider package into your R session. We'll also load dplyr so we can chain calls with the `%>%` pipe.

```r
library(selenider)
library(dplyr)
```

### Step 3: Start a Selenium Session

Create a session that launches a new window of the Google Chrome browser. Make sure ChromeDriver is in your PATH.

```r
# Start a Selenium session with Chrome
session <- selenider_session("selenium", browser = "chrome")
```

### Step 4: Navigating to a Webpage

Use the open_url() function to navigate to a webpage. Here, we'll navigate to Wikipedia.

```r
# Navigate to a webpage
open_url("https://www.wikipedia.org/")
```

If you encounter a timeout error, you can increase the timeout duration:

```r
# Navigate to a webpage with a longer timeout (in seconds)
open_url("https://www.wikipedia.org/", timeout = 120)
```

### Step 5: Interacting with Web Elements

To interact with web elements, you first need to locate them, using either CSS selectors or XPath expressions (an XPath sketch follows Step 8).

**Finding an Element by CSS Selector**

```r
# Find an element and get its text
title <- session %>%
  find_element("h1") %>%
  elem_text()
print(title)
```

The s() function is a shorthand for find_element():

```r
# Shorthand for find_element()
title <- s("h1") %>% elem_text()
print(title)
```

**Finding Multiple Elements**

```r
# Find multiple elements
elements <- session %>% find_elements(".central-featured-lang strong")
extracted_text <- sapply(elements, elem_text)
print(extracted_text)
```

The ss() function is a shorthand for find_elements():

```r
# Shorthand for find_elements()
elements <- ss(".central-featured-lang strong")
extracted_text <- sapply(elements, elem_text)
print(extracted_text)
```

### Step 6: Clicking an Element

Use the elem_click() function to click on a web element.

```r
# Click an element
session %>%
  find_element(".central-featured-lang strong") %>%
  elem_click()
```

### Step 7: Extracting Information from Webpages

Web scraping means extracting information from webpages. Here's an example that extracts the titles of news items from the R Project homepage.

**Example: Extracting Titles of News Items**

```r
# Navigate to the R Project homepage
open_url("https://www.r-project.org/")

# Find the news items by CSS selector
news_items <- ss(".news-item")
news_titles <- sapply(news_items, elem_text)
print(news_titles)
```

### Step 8: Taking Screenshots

Taking screenshots can be useful for debugging or documentation.

```r
# Take a screenshot and open it in the viewer
file_path <- withr::local_tempfile(fileext = ".png")
take_screenshot(file_path, view = TRUE)
```
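**Finding an Element by XPath**

As mentioned in Step 5, the locator functions also accept XPath expressions through the xpath argument. Here's a minimal sketch against the Wikipedia homepage from Step 4; the XPath string is an assumption about the portal's markup, so adjust it for other pages.

```r
# Locate a featured-language link via XPath instead of a CSS selector
# (the expression below assumes Wikipedia's current homepage markup)
lang_link <- s(xpath = "//div[contains(@class, 'central-featured-lang')]//strong")
lang_link %>% elem_text() %>% print()
```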
### Step 9: Running JavaScript

You can execute JavaScript code directly in the browser.

```r
# Run JavaScript in the browser and return the result
execute_js_expr("return document.querySelectorAll('.central-featured-lang strong')")
execute_js_expr("return navigator.userAgent")

# Pass a selenider element into the script as arguments[0] and click it
execute_js_expr("arguments[0].click()", s(".central-featured-lang strong"))
```

### Step 10: Closing the Selenium Session

After completing your tasks, close the Selenium session to free up resources.

```r
# Close the Selenium session
close_session()
```

## Example Use Case: Automating a Google Search

Let's combine everything we've learned into a practical example: automating a Google search and extracting the titles of the search results.

```r
# Load the required packages
library(selenider)
library(dplyr)

# Start a Selenium session with Chrome
session <- selenider_session("selenium", browser = "chrome")

# Navigate to Google
open_url("https://www.google.com/")

# Find the search box by its name attribute and enter a query;
# keys$enter presses the Enter key
search_box <- session %>% find_element(name = "q")
search_box %>% elem_send_keys("R programming", keys$enter)

# Wait for the results to load
Sys.sleep(3)

# Extract the titles of the search results
result_titles <- ss("h3") %>% sapply(elem_text)
print(result_titles)

# Take a screenshot of the results page
file_path <- withr::local_tempfile(fileext = ".png")
take_screenshot(file_path, view = TRUE)

# Close the Selenium session
close_session()
```
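The Sys.sleep(3) call above is a fixed delay: it wastes time when the page loads quickly and fails when it loads slowly. Here's a minimal sketch of a condition-based wait using selenider's elem_expect(); the h3 selector and the 10-second timeout are illustrative assumptions.

```r
# Instead of Sys.sleep(3): wait up to 10 seconds until a result heading
# is visible; elem_expect() throws an informative error on timeout
s("h3") %>% elem_expect(is_visible, timeout = 10)
result_titles <- ss("h3") %>% sapply(elem_text)
```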
In this tutorial, we explored how to use Selenium in R for web scraping and browser automation with the selenider package. We covered the setup process, navigating to webpages, interacting with web elements, extracting information, taking screenshots, and running JavaScript, and we finished with a practical example of automating a Google search.

Using Selenium in R can greatly enhance your ability to gather and interact with web data programmatically.