
Python Selenium + PhantomJS on AWS EC2 Ubuntu Instance - Headless Browser Automation

Selenium paired with PhantomJS is one of the best headless browser automation options available today. I had been running a task written in Python on my local Mac (OS X Yosemite) for some time. When I tried to schedule the same job on my EC2 Ubuntu (14.04, 64-bit) instance, I ran into far more trouble than I anticipated. This is a summary of what it took to get it working on my EC2 machine.

Step 1: Install PhantomJS

sudo npm install -g phantomjs

If you do not have npm installed, install Node.js, which includes npm: https://nodejs.org/en/download/

Step 2: Install Selenium Through pip

2.1 Install Python pip

wget https://bootstrap.pypa.io/get-pip.py  
sudo python get-pip.py  

2.2 Install Selenium using pip

sudo pip install selenium  
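
Before moving on, it can help to confirm that both installs are visible from the environment your job will run in. The following is a minimal sketch, not part of the original setup: `preflight` is a hypothetical helper name, and it assumes Python 3 (for `shutil.which`). It only checks that a `phantomjs` executable is on PATH and that the `selenium` package can be found.

```python
import importlib.util
import shutil


def preflight(binary="phantomjs", package="selenium"):
    """Report whether the binary and the Python package are visible here."""
    return {
        # Full path of the executable if it is on PATH, otherwise None.
        "binary_path": shutil.which(binary),
        # True if Python can locate the package in this environment.
        "package_found": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    status = preflight()
    print(status)
    if status["binary_path"] is None:
        print("phantomjs is not on PATH; check where npm put its global binaries")
    if not status["package_found"]:
        print("selenium is not importable; re-run the pip install step")
```

If the binary turns up somewhere that is not on PATH for your scheduled job, webdriver.PhantomJS also accepts an executable_path argument pointing at it directly.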

Step 3: Python Selenium Code

To my horror, the Python code that worked on my Mac did not work when I copied it to my Ubuntu instance! I kept taking screenshots to see what the browser was rendering, and they came back blank. The culprit turned out to be the SSL protocol setting: passing --ssl-protocol=any to PhantomJS fixed it.

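from selenium import webdriver
from selenium.webdriver.common.keys import Keys

# --ssl-protocol=any lets PhantomJS negotiate any SSL/TLS version it supports
driver = webdriver.PhantomJS(service_args=['--ssl-protocol=any'])
# Wait up to 10 seconds when looking up elements that are not present yet
driver.implicitly_wait(10)
driver.get('http://www.python.org/')
assert "Python" in driver.title
# Type a query into the search box and submit it
elem = driver.find_element_by_name("q")
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)
assert "No results found." not in driver.page_source
print(driver.title)
# Shut down the PhantomJS process
driver.quit()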
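
Since screenshots were what exposed the blank page, here is a minimal sketch of a helper for capturing them while debugging a headless run. `save_debug_snapshot` and its parameters are hypothetical names, and it assumes Python 3; it relies only on the WebDriver `save_screenshot` method and `page_source` attribute. Saving the HTML next to the image makes it easier to tell a page that never loaded from one that loaded but rendered nothing.

```python
import os


def save_debug_snapshot(driver, prefix="debug", directory="."):
    """Write a screenshot and the page HTML so a headless run can be inspected."""
    png_path = os.path.join(directory, prefix + ".png")
    html_path = os.path.join(directory, prefix + ".html")
    # save_screenshot writes a PNG of whatever the browser currently shows.
    driver.save_screenshot(png_path)
    # page_source is the current DOM serialized as text.
    with open(html_path, "w", encoding="utf-8") as fh:
        fh.write(driver.page_source)
    return png_path, html_path
```

Call it right after driver.get(...), for example save_debug_snapshot(driver, prefix="after_get"), and copy the two files off the instance to look at them.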

I used Selenium's implicit wait here. If you are less certain how quickly the page you are testing loads, use an explicit wait instead, which blocks until a specific condition is met.
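
To make the idea of an explicit wait concrete, here is a minimal sketch of the polling loop behind it, written in plain Python. `wait_until` is a hypothetical helper for illustration, not Selenium's API, and it assumes Python 3. Selenium ships its own WebDriverWait class in selenium.webdriver.support.ui, which is what you would normally use.

```python
import time


def wait_until(driver, condition, timeout=10.0, poll=0.5):
    """Call condition(driver) repeatedly until it returns a truthy value.

    Returns that value, or raises RuntimeError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition(driver)
        if result:
            return result
        if time.monotonic() >= deadline:
            raise RuntimeError("condition not met within %.1f seconds" % timeout)
        # Pause between checks so the loop does not spin the CPU.
        time.sleep(poll)
```

With the script above, wait_until(driver, lambda d: "Python" in d.title) would replace the bare title assertion. The equivalent with Selenium's built-in class is WebDriverWait(driver, 10).until(lambda d: "Python" in d.title), which raises a TimeoutException instead of the RuntimeError used in this sketch.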

Happy coding! Let me know if this doesn't work for you and I can try to help you debug: kai@randomdotnext.com