Web Automation with Python
Using Selenium
Introduction
Website automation means handing common web actions, such as filling out forms, clicking buttons, and downloading files, over to helpful software bots. The Internet makes doing business faster and easier in countless ways, but doing these actions by hand can be time-consuming and prone to errors.
As you might have guessed by now, we’ll use a Python module to achieve this: Selenium.
Installation
For Windows:
`pip install selenium`
For Linux and Mac:
`pip3 install selenium`
Selenium requires a driver to interface with the chosen browser, which will be Chrome for us.
First, go to the About section of Chrome and update it to the latest version.
Keep the version in mind.
For me, the version is 103.something.something, so I will go to the site below and download the chromedriver for the same version.
So I used the second link and downloaded the driver. Keep in mind that we won’t run the driver ourselves; Python will start it for us.
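Since the driver has to match the browser, a quick check before going further can save some confusion. The sketch below compares only the major version number (the part before the first dot), which is an assumption about what "the same version" needs to mean here; the helper names and the version strings are made up for illustration.

```python
def major_version(version: str) -> int:
    """Return the leading number of a dotted version string, e.g. 103."""
    return int(version.strip().split(".")[0])


def versions_match(browser_version: str, driver_version: str) -> bool:
    """True when the browser and the driver share the same major version."""
    return major_version(browser_version) == major_version(driver_version)


# Hypothetical version strings, as you might copy them from Chrome's
# About page and from the driver download page.
print(versions_match("103.0.5060.114", "103.0.5060.53"))  # True
print(versions_match("103.0.5060.114", "102.0.5005.61"))  # False
```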
Code
Importing and creating a driver object
from selenium import webdriver
# this import is optional, in case you want to tweak the driver itself
from selenium.webdriver.chrome.options import Options
# Selenium 4 takes the driver path through a Service object
from selenium.webdriver.chrome.service import Service
opt = Options()
opt.add_argument("--start-maximized")
# or, if you don't even want to see the browser window
opt.add_argument("--headless")
# in case you want to mute the browser
opt.add_argument("--mute-audio")
opt.add_argument("--disable-extensions")
# put the downloaded driver in the same folder as your script
# (the older executable_path argument of Chrome() is deprecated in Selenium 4)
driver = webdriver.Chrome(service=Service(executable_path="chromedriver.exe"),
                          options=opt)
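One thing worth adding once a driver object exists is closing the browser again when you are done. A minimal sketch, assuming you wrap your automation in a function: the helper below (run_and_quit is a made-up name) calls driver.quit() even when the task raises an error, so no stray Chrome windows are left behind.

```python
def run_and_quit(driver, task):
    """Run task(driver) and always close the browser afterwards.

    `driver` is expected to be a Selenium WebDriver (anything with a
    quit() method works); `task` is any callable that takes the driver.
    """
    try:
        return task(driver)
    finally:
        # quit() closes every window and ends the driver process
        driver.quit()
```

With the driver created above, a call could look like `run_and_quit(driver, lambda d: d.get("https://google.com"))`.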
Navigating
The first thing you’ll want to do with WebDriver is navigate to a link. The usual way to do this is by calling the get method:
# to open a website, just use the get method
driver.get("https://google.com")
WebDriver will wait until the page has fully loaded (that is, the onload event has fired) before returning control to your test or script. However, content that the page adds afterwards with JavaScript may still be missing at that point, so you still need to make sure an element is actually there before you try to access it.
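If you want to double-check the loading state yourself, one option is to ask the browser for document.readyState through execute_script. This is only a sketch: wait_for_ready_state is a made-up helper, and the timeout and polling interval are arbitrary defaults.

```python
import time


def wait_for_ready_state(driver, timeout=10.0, poll=0.25):
    """Poll document.readyState until it is "complete" or time runs out.

    Returns True if the page reported "complete" within `timeout`
    seconds, False otherwise. `driver` is a Selenium WebDriver.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = driver.execute_script("return document.readyState")
        if state == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

Note that "complete" only covers the initial page load; elements that scripts insert later still need a wait of their own.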
Locating Elements
Now, let’s say we want to find the search input inside the page’s HTML to make a query. For that, we can use many methods.
from selenium.webdriver.common.by import By
element = driver.find_element(By.CSS_SELECTOR, 'input[aria-label="Search"]')
# we can also use other criteria, like
driver.find_element(By.ID, "id")
driver.find_element(By.NAME, "name")
driver.find_element(By.XPATH, "xpath")
driver.find_element(By.LINK_TEXT, "link text")
driver.find_element(By.PARTIAL_LINK_TEXT, "partial link text")
driver.find_element(By.TAG_NAME, "tag name")
driver.find_element(By.CLASS_NAME, "class name")
driver.find_element(By.CSS_SELECTOR, "css selector")
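find_element raises an error when nothing matches. If you would rather get None back, one possible approach is to use find_elements (plural), which returns a list. The helper name first_or_none is made up for this sketch.

```python
def first_or_none(driver, by, value):
    """Return the first element matching the locator, or None.

    Uses find_elements (plural), which returns an empty list instead of
    raising an exception when nothing matches.
    """
    matches = driver.find_elements(by, value)
    return matches[0] if matches else None
```

A call could look like `first_or_none(driver, By.CSS_SELECTOR, 'input[aria-label="Search"]')`, followed by a check for None before using the result.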
Keys
Now that we have the element, we want to type some text into it. For that, we use the send_keys function. To send special keys such as Alt, Ctrl, Enter or F1, we also need the Keys class.
from selenium.webdriver.common.keys import Keys
element.send_keys("python")
element.send_keys(Keys.ENTER)
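If you do this often, the two calls can be wrapped into one small function. This is just a sketch with a made-up name (type_and_submit); it also clears the field first, which is an extra step you may or may not want.

```python
def type_and_submit(element, text, submit_key):
    """Replace the field's content with `text`, then press `submit_key`."""
    element.clear()                # remove whatever is already in the field
    element.send_keys(text)        # type the ordinary characters
    element.send_keys(submit_key)  # then the special key, e.g. Keys.ENTER
```

With the element and the Keys import from above, the call would be `type_and_submit(element, "python", Keys.ENTER)`.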
Waits
Now, let’s say that after doing that you want to click on the first link, so you look for that element. But what if it hasn’t loaded yet? The result will be an error saying that no such element was found. To prevent this, we should search the page only after the element has loaded.
One way to achieve that is the time.sleep() function, but it wastes time if the element loads early (and still fails if the element loads late).
import time
time.sleep(2)
So the better solution is to use Selenium’s wait function, which waits only as long as it has to, up to a timeout.
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
# wait for at most 30 seconds
match = WebDriverWait(driver, 30).until(
    # keep searching until the element is located
    expected_conditions.presence_of_element_located(
        # the locator for the first link of each search
        (By.CSS_SELECTOR, "div[lang='en']")
    )
)
# just printing out the text attribute of the match
print(match.text)
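To make the difference from time.sleep() concrete, here is a rough sketch of the polling idea behind such a wait, in plain Python. It is not Selenium's actual implementation; wait_until and its parameters are made-up names used only to illustrate the technique.

```python
import time


def wait_until(condition, timeout=30.0, poll=0.5):
    """Call condition() repeatedly until it returns something truthy.

    Returns that value as soon as it appears, so no time is wasted when
    the condition is met early. Raises TimeoutError if `timeout` seconds
    pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

The fixed sleep always costs the full two seconds, while a polling wait returns as soon as the condition holds and only gives up once the timeout is reached.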
Final Touches to make this a bit more usable
# finding the link
link_tag = match.find_element(By.TAG_NAME, "a")
link = link_tag.get_attribute("href")
# and navigating to it
driver.get(link)
Extras
If you want to click on the elements instead of sending in keys, it’s pretty simple.
# Assume the button has the ID "submit" :)
driver.find_element(By.ID, "submit").click()
# or, if the element is a form, you could just
element.submit()
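Clicking can fail when the element is hidden or disabled. One cautious approach, sketched here with a made-up helper name (click_if_possible), is to check the element's state first using is_displayed() and is_enabled().

```python
def click_if_possible(element):
    """Click the element only if it is visible and enabled.

    Returns True when a click was sent, False otherwise.
    """
    if element.is_displayed() and element.is_enabled():
        element.click()
        return True
    return False
```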
If you want to move between windows:
driver.switch_to.window("windowName")
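Often you don't know a window's name, only that a click has just opened a new one. A minimal sketch for that case, assuming you remembered the window handles from before the click (switch_to_new_window is a made-up helper name):

```python
def switch_to_new_window(driver, known_handles):
    """Switch to the first window handle that is not in `known_handles`.

    Returns the new handle, or None if no new window has opened.
    """
    for handle in driver.window_handles:
        if handle not in known_handles:
            driver.switch_to.window(handle)
            return handle
    return None
```

Typical use would be to store `before = set(driver.window_handles)`, click the link, and then call `switch_to_new_window(driver, before)`.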
Selenium WebDriver has built-in support for handling popup dialog boxes. After you’ve triggered an action that would open a popup, you can access the alert with the following:
alert = driver.switch_to.alert
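Once you have the alert object, you can read its text and then confirm or cancel it. The helper below is only a sketch with a made-up name (read_and_close_alert); it assumes an alert is actually open when it is called.

```python
def read_and_close_alert(driver, accept=True):
    """Return the alert's message, then accept (OK) or dismiss (Cancel) it."""
    alert = driver.switch_to.alert
    message = alert.text
    if accept:
        alert.accept()
    else:
        alert.dismiss()
    return message
```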
To move backwards and forwards in your browser’s history:
driver.forward()
driver.back()
You may be interested in understanding how to use cookies. First of all, you need to be on the domain that the cookie will be valid for:
driver.get("http://www.example.com")
# Now set the cookie. This one's valid for the entire domain
cookie = {"name": "foo", "value": "bar"}
driver.add_cookie(cookie)
# And now output all the available cookies for the current URL
print(driver.get_cookies())
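get_cookies() returns a list of dictionaries, one per cookie. If you only care about names and values, a small helper can flatten that list; cookies_as_dict is a made-up name for this sketch.

```python
def cookies_as_dict(driver):
    """Map cookie names to values for the page the driver is currently on."""
    return {cookie["name"]: cookie["value"] for cookie in driver.get_cookies()}
```

After the add_cookie call above, `cookies_as_dict(driver)` should contain the key "foo" with the value "bar", next to whatever cookies the site itself has set.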
Conclusion
You can do almost anything you do on the Internet using Selenium. With the various supported browsers and a handful of simple functions, nearly anything can be automated (leaving out sites protected by CAPTCHA) and made much faster; you can even run multiple browsers in parallel and on different operating systems.
But remember that the toughest part is finding the elements in the HTML that you want to interact with, and generalizing your code so that it works with different parameters.