Python Selenium + PhantomJS on AWS EC2 Ubuntu Instance - Headless Browser Automation

Selenium driving PhantomJS was, when I wrote this, one of the most popular options for headless browser automation. I had been running a Python task on my local Mac OS X Yosemite machine for some time. When I tried to schedule the same job on my EC2 Ubuntu (14.04 64-bit) instance, I ran into way more trouble than I anticipated. This is a summary of the research I did to get it working on my EC2 machine.

Step 1: Install PhantomJS
sudo apt-get install build-essential g++ flex bison gperf ruby perl libsqlite3-dev libfontconfig1-dev libicu-dev libfreetype6 libssl-dev libpng-dev libjpeg-dev python git
git clone https://github.com/ariya/phantomjs.git
cd phantomjs
git checkout 2.0
./build.sh

Once the build finishes, move the generated binary (bin/phantomjs inside the checkout) into a directory on your PATH, for example with sudo cp bin/phantomjs /usr/local/bin/
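Before pointing Selenium at PhantomJS, it can save time to confirm the binary is actually reachable. Here is a minimal sketch using only the standard library (it assumes Python 3.7+ for subprocess.run with capture_output). The helper name find_phantomjs is hypothetical. shutil.which does the PATH lookup, and phantomjs --version prints the version string.

```python
import shutil
import subprocess


def find_phantomjs(path=None):
    """Return (location, version) for the phantomjs binary, or None if it is not found.

    `path` overrides the directories searched; by default the PATH environment
    variable is used, which is what Selenium relies on as well.
    """
    location = shutil.which("phantomjs", path=path)
    if location is None:
        return None
    # `phantomjs --version` prints the version, e.g. "2.0.0", on stdout.
    out = subprocess.run([location, "--version"], capture_output=True, text=True, timeout=30)
    return location, out.stdout.strip()


if __name__ == "__main__":
    found = find_phantomjs()
    print(found if found else "phantomjs not found on PATH")
```

If this prints "phantomjs not found on PATH", Selenium will fail to start the driver too, so fix the copy step first.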

Step 2: Install Selenium Through pip

2.1 Install Python pip

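wget https://bootstrap.pypa.io/get-pip.py
sudo python get-pip.py

(The get-pip.py at that URL now requires a recent Python 3. If python on your instance is 2.7, fetch https://bootstrap.pypa.io/pip/2.7/get-pip.py instead.)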

2.2 Install Selenium using pip

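sudo pip install 'selenium<4'

Pin the version: Selenium 4 removed webdriver.PhantomJS, so an unpinned install today gives you a Selenium that cannot drive PhantomJS at all.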
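Since Selenium 4 dropped the PhantomJS driver (and Selenium 4.3 also removed the find_element_by_* helpers), it is worth checking which version pip actually installed. A minimal sketch, assuming Python 3.8+ for importlib.metadata; supports_phantomjs and installed_selenium_version are hypothetical helpers that only look at the major version number.

```python
from importlib.metadata import PackageNotFoundError, version


def supports_phantomjs(selenium_version):
    """True if this Selenium release still ships webdriver.PhantomJS (anything before 4.0)."""
    major = int(selenium_version.split(".")[0])
    return major < 4


def installed_selenium_version():
    """Version string of the installed selenium package, or None if it is missing."""
    try:
        return version("selenium")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_selenium_version()
    if v is None:
        print("selenium is not installed")
    elif supports_phantomjs(v):
        print("selenium", v, "can drive PhantomJS")
    else:
        print("selenium", v, "is too new for PhantomJS; reinstall with 'selenium<4'")
```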

Step 3: Python Selenium Code

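To my horror, the Python code that ran fine on my Mac did not work when I copy-pasted it onto my Ubuntu instance! I kept taking screenshots to see what the browser was showing, and they were blank. I finally traced it to PhantomJS's --ssl-protocol setting: python.org redirects to HTTPS, and with PhantomJS's default setting the SSL handshake can fail, leaving an empty page. Passing --ssl-protocol=any lets PhantomJS negotiate whichever protocol the server accepts.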
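The screenshot-and-inspect loop described above can be wrapped in a small helper. This is a rough sketch: debug_snapshot and page_looks_blank are hypothetical names, the only Selenium members used are driver.save_screenshot(filename), driver.page_source and driver.current_url, and the blank-page test is a crude regex heuristic that treats an empty or missing body as a page that never loaded.

```python
import re


def page_looks_blank(page_source):
    """Crude heuristic: True if <body> is missing or contains only whitespace."""
    match = re.search(r"<body[^>]*>(.*?)</body>", page_source, re.DOTALL | re.IGNORECASE)
    return match is None or match.group(1).strip() == ""


def debug_snapshot(driver, filename="debug.png"):
    """Save a screenshot of the current page and report whether it looks blank."""
    driver.save_screenshot(filename)
    blank = page_looks_blank(driver.page_source)
    if blank:
        # A blank page right after driver.get() is the symptom of the SSL problem above.
        print("Page looks blank; check --ssl-protocol. URL:", driver.current_url)
    return blank
```

Calling debug_snapshot(driver) right after driver.get(...) gives you both the image and a quick yes/no answer, which is easier to scan in a cron job's log than opening screenshots by hand.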

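from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# --ssl-protocol=any lets PhantomJS negotiate whatever SSL/TLS version the site
# supports; without it, HTTPS pages can come back blank.
driver = webdriver.PhantomJS(service_args=['--ssl-protocol=any'])
# Element lookups poll for up to 10 seconds before giving up.
driver.implicitly_wait(10)
try:
    driver.get('http://www.python.org/')
    assert "Python" in driver.title
    elem = driver.find_element(By.NAME, "q")  # the site's search box
    elem.send_keys("pycon")
    elem.send_keys(Keys.RETURN)
    assert "No results found." not in driver.page_source
    print(driver.title)
finally:
    # Always shut down the PhantomJS process, even if an assertion fails.
    driver.quit()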

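I used Selenium's implicit wait here: every find_element call polls for up to 10 seconds before raising NoSuchElementException. If you are less certain how quickly the page you are testing loads, try an explicit wait instead, which waits for a specific condition on a specific element.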
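In Selenium itself, an explicit wait for the search box looks like WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.NAME, "q"))), with WebDriverWait imported from selenium.webdriver.support.ui and selenium.webdriver.support.expected_conditions imported as EC. To show what that call is doing, here is a hand-rolled sketch of the same polling pattern. wait_until is a hypothetical helper, not Selenium's implementation: it repeatedly calls condition(driver), treats the exception types you pass in as "not ready yet", and raises TimeoutError when time runs out.

```python
import time


def wait_until(driver, condition, timeout=10.0, poll=0.5, ignored=()):
    """Poll condition(driver) until it returns a truthy value, then return that value.

    Exceptions whose types are listed in `ignored` count as "not ready yet";
    any other exception propagates. Raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition(driver)
            if result:
                return result
        except ignored:
            pass  # e.g. the element is not in the DOM yet; try again
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %s seconds" % timeout)
        time.sleep(poll)
```

With a real driver, wait_until(driver, lambda d: d.find_element(By.NAME, "q"), ignored=(NoSuchElementException,)) would wait for the search box, where NoSuchElementException comes from selenium.common.exceptions. In practice WebDriverWait is the better choice; the sketch is only meant to make the behavior concrete.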

Happy coding! Let me know if this doesn't work for you and I can try to help you debug: kai@randomdotnext.com