Python Selenium + PhantomJS on AWS EC2 Ubuntu Instance - Headless Browser Automation
Selenium paired with PhantomJS is one of the best options for headless browser automation. I had been running a Python task on my local Mac (OS X Yosemite) for some time. When I tried to schedule the same job on my EC2 Ubuntu (14.04 64-bit) instance, I ran into far more trouble than I anticipated. This is a summary of the research it took to get it working on my EC2 machine.
Step 1: Install PhantomJS
sudo apt-get install build-essential g++ flex bison gperf ruby perl libsqlite3-dev libfontconfig1-dev libicu-dev libfreetype6 libssl-dev libpng-dev libjpeg-dev python
git clone https://github.com/ariya/phantomjs.git
cd phantomjs
git checkout 2.0
./build.sh
When the build finishes, move the generated binary (bin/phantomjs inside the repository) to a directory on your PATH, for example /usr/local/bin.
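Before moving on, it can help to confirm that the binary really is reachable from PATH. The snippet below is a minimal sketch, assuming Python 3; `phantomjs_version` is a hypothetical helper name, not part of PhantomJS or Selenium.

```python
import shutil
import subprocess


def phantomjs_version(binary="phantomjs"):
    """Return the output of `<binary> --version`, or None if it is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    # Merge stderr into stdout so the version is captured wherever it is printed.
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    version = phantomjs_version()
    if version is None:
        print("phantomjs not found on PATH")
    else:
        print("phantomjs version:", version)
```

If this reports that phantomjs is not found, Selenium will not be able to start it either, so it is worth fixing PATH before debugging anything else.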
Step 2: Install Selenium Through pip
2.1 Install Python pip
wget https://bootstrap.pypa.io/get-pip.py
sudo python get-pip.py
2.2 Install Selenium using pip
# webdriver.PhantomJS was removed in Selenium 4, so install a 3.x release
sudo pip install "selenium<4"
Step 3: Python Selenium Code
To my horror, the Python code that worked on my Mac did not work when I copied it to my Ubuntu instance! I kept taking screenshots to see what the browser was rendering, and they all came back blank. I finally traced the problem to the SSL protocol setting: PhantomJS needs to be started with --ssl-protocol=any.
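from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# webdriver.PhantomJS needs Selenium 3.x or older; it was removed in Selenium 4.
# --ssl-protocol=any allows any supported SSL/TLS version instead of one fixed default.
driver = webdriver.PhantomJS(service_args=['--ssl-protocol=any'])
try:
    # Wait up to 10 seconds for an element to appear before a lookup fails.
    driver.implicitly_wait(10)
    driver.get('http://www.python.org/')
    assert "Python" in driver.title
    # Type a query into the search box and submit it.
    elem = driver.find_element(By.NAME, "q")
    elem.send_keys("pycon")
    elem.send_keys(Keys.RETURN)
    assert "No results found." not in driver.page_source
    print(driver.title)
finally:
    # Always shut PhantomJS down, even if an assertion fails.
    driver.quit()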
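The screenshot debugging mentioned at the start of this step can be wrapped in a small helper. This is only a sketch, assuming Python 3: `dump_page_state`, the `label` argument and the `debug` output directory are hypothetical names chosen for illustration. It relies on the driver's `save_screenshot` method and `page_source` property, and saves the HTML alongside the image so a blank screenshot can be compared against what the server actually returned.

```python
import os
import time


def dump_page_state(driver, label, out_dir="debug"):
    """Save a screenshot and the page HTML; return (png_path, html_path)."""
    os.makedirs(out_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    png_path = os.path.join(out_dir, "{}-{}.png".format(stamp, label))
    html_path = os.path.join(out_dir, "{}-{}.html".format(stamp, label))
    # A blank image together with an empty HTML body suggests the page
    # never loaded at all, rather than a rendering problem.
    driver.save_screenshot(png_path)
    with open(html_path, "w", encoding="utf-8") as fh:
        fh.write(driver.page_source)
    return png_path, html_path
```

Calling `dump_page_state(driver, "after-get")` right after `driver.get(...)` and again after submitting the search gives a before/after pair to inspect.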
I used Selenium's implicit wait here. If you are less certain how quickly the page you are testing loads, you should try an explicit wait instead, which blocks until a specific condition is met.
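To illustrate what an explicit wait does, here is a standard-library-only sketch of the polling idea, assuming Python 3. `wait_until` is a hypothetical helper, not Selenium's API; Selenium ships its own implementation as `WebDriverWait` in `selenium.webdriver.support.ui`, which is what you would normally use.

```python
import time


def wait_until(driver, condition, timeout=10.0, poll_interval=0.5):
    """Call condition(driver) repeatedly until it is truthy or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition(driver)
        if result:
            # Hand back whatever the condition produced, e.g. the found elements.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "condition not met within {} seconds".format(timeout)
            )
        time.sleep(poll_interval)
```

With a real driver this could be called as `wait_until(driver, lambda d: d.find_elements("name", "q"))`, which returns as soon as the search box exists. The Selenium equivalent is `WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.NAME, "q")))`, with `EC` imported from `selenium.webdriver.support.expected_conditions`.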
Happy coding! Let me know if this doesn't work for you and I can try to help you debug: kai@randomdotnext.com