Automating Firefox for Web Application Integration

7 March, 2008

This post explains how to control Firefox from the command line with Telnet and Ruby. After presenting some context to explain why I think this hack represents an important area of concern in contemporary web application development, I'll show how to execute it with actual install directions and code samples.

Ok, I'll say it: I think JavaScript is cool. One of my favorite effects of the move to the modern AJAX-oriented web application architecture has been the opportunity to move ever more functionality into the client. Where I work, we like to say, "Anything you can implement in JavaScript is free." Instead of running on our servers, the JavaScript portion of our app runs on a distributed grid of thousands of machines maintained for us by our users. Also, despite the reputation it picked up during the Browser Wars, JavaScript is incredibly fun to develop in: it's lightweight and flexible in a unique way that somehow pushes you to keep your code closely tied to the data it's manipulating.

The one big downside to JavaScript is its runtime environment. Not only does code running in the browser confront a Gordian Knot of browser compatibility problems, it's also irretrievably isolated from the rest of the application's code. While JavaScript libraries (like the inestimable jQuery) are increasingly proving to be the Alexander's sword that cuts the compatibility Knot, the lack of interoperability is only just starting to become a serious problem. As JavaScript's innate advantages lure more and more application code into the browser, the question will be unavoidable: How do you get modules implemented in JavaScript to interact with those built in other languages that live in more traditional environments? How do you avoid duplicating all the functionality you put into the JavaScript portion of the application just so that you can call it from outside the browser?

This week, trying to solve exactly these types of problems, I discovered a tantalizing avenue towards addressing some of these questions: browser automation from the command line and from scripting languages. Here was my situation.

As part of an upcoming Grabbit project, I've built a highly interactive data browser for our customers. The JavaScript running on that page makes a series of JSON GET requests to gather all of the information it needs to compose its display, and it makes a few AJAX POST requests to report certain bits of status back to the server. Now I wanted to trigger those POSTs programmatically on a schedule rather than waiting for customers to trigger them. The dilemma was that I'd already written this relatively sophisticated JavaScript application that makes all the necessary requests, implements the business logic, and knows how to POST the data back. I had two options: redo all of that work in my server-side application (ick!) or figure out a way to trigger this JavaScript code by automating its runtime environment (the browser).

After a half day's research, here's what I discovered: there's a Firefox extension that allows other applications to establish JavaScript shell connections to a running Mozilla process via TCP/IP. It's called JSSH. Once you've got JSSH installed and running in Firefox, you can open a telnet connection to the browser that allows you to automate it using JavaScript commands to do things like load new pages or even manipulate the DOM on pages you've loaded. You can then automate this interaction using any scripting language with a telnet library. For the remainder of this post, I'll provide step-by-step instructions for running JSSH and for automating it with Ruby.

Install JSSH

The easiest way to install JSSH is to download the JSSH.xpi and open it with Firefox, which will offer to install the extension (if you're interested in compiling Firefox with it from scratch or installing an existing binary, you should read these instructions).

Start Firefox with JSSH

Once you've got a copy of Firefox with JSSH installed, you'll need to run it. You can do this by providing the correct options when launching Firefox from the command line. On Mac OS X, that looks like this:

/Applications/ -jssh &

The "&" at the end of that line will background your command so it doesn't take over your terminal session.
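If you'd rather start Firefox from a script than from your terminal, Ruby can do the equivalent of that trailing "&". Here's a minimal sketch, not tested against a real Firefox: FIREFOX_BINARY is a hypothetical placeholder for your own install path, and launch_in_background and wait_for_port are names I made up for illustration. Because JSSH may take a moment to start listening after launch, the sketch polls port 9997 (the port used in the telnet session below) instead of assuming it's ready right away.

```ruby
require 'socket'

# HYPOTHETICAL: replace with the path to the Firefox binary on your machine.
FIREFOX_BINARY = "/path/to/firefox"

# Start a program without waiting for it (Ruby's analogue of a trailing "&")
# and return its process id.
def launch_in_background(command, *args)
  pid = Process.spawn(command, *args)
  Process.detach(pid) # reap the child when it eventually exits
  pid
end

# Poll until something accepts TCP connections on host:port.
# Returns true once a connection succeeds, false after `timeout` seconds.
def wait_for_port(host, port, timeout: 30, interval: 0.5)
  deadline = Time.now + timeout
  loop do
    begin
      TCPSocket.new(host, port).close
      return true
    rescue SystemCallError # e.g. Errno::ECONNREFUSED while nothing is listening yet
      return false if Time.now >= deadline
      sleep interval
    end
  end
end

# Usage (not executed here):
#   launch_in_background(FIREFOX_BINARY, "-jssh")
#   abort "JSSH never started listening" unless wait_for_port("localhost", 9997)
```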

Telnet into the JavaScript Shell

Once Firefox is running, we can use telnet to log into JSSH like so:

$ telnet localhost 9997
Trying ::1...
telnet: connect to address ::1: Connection refused
Connected to localhost.
Escape character is '^]'.
Welcome to the Mozilla JavaScript Shell!
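Telnet isn't doing anything magic here: JSSH is just trading lines of text over a plain TCP connection. To make that exchange concrete, here's a rough sketch of a tiny client written with Ruby's standard socket library instead of telnet. It assumes the shell ends each response with the "> " prompt shown in the transcripts in this post, that it doesn't echo your command back, and that it accepts telnet-style "\r\n" line endings; read_until_prompt and jssh_eval are hypothetical helper names.

```ruby
require 'socket'

# ASSUMPTION: JSSH ends every response with the "> " prompt seen in the
# transcripts in this post.
JSSH_PROMPT = "> "

# Read from the socket until the text received so far ends with the prompt,
# then return that text with the prompt removed.
def read_until_prompt(sock, prompt = JSSH_PROMPT)
  buffer = +""
  buffer << sock.readpartial(4096) until buffer.end_with?(prompt)
  buffer.delete_suffix(prompt)
end

# Connect, skip the welcome banner, send one line of JavaScript, and return
# the shell's reply. ASSUMPTIONS: the shell does not echo the command back and
# accepts telnet-style "\r\n" line endings.
def jssh_eval(js, host: "localhost", port: 9997)
  TCPSocket.open(host, port) do |sock|
    read_until_prompt(sock) # discard "Welcome to the Mozilla JavaScript Shell!"
    sock.write("#{js}\r\n")
    read_until_prompt(sock).strip
  end
end

# Usage (with Firefox running under -jssh):
#   puts jssh_eval("getWindows().length")
```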

Load a URL from JSSH

Now that we're in, we can tell Firefox to load pages for us, thusly:

var w0 = getWindows()[0]
var browser = w0.getBrowser()
browser.loadURI("http://example.com/")  // placeholder URL: substitute the page you want to load

And that's it! If the JavaScript application I wanted to run lived at "", we'd be done. That command would load the page and Firefox would interpret and run the JavaScript it found there.
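If you generate that loadURI call from another program rather than typing it, the URL has to be embedded in the JavaScript as a string literal, and a stray quote in the URL will break the command. One way to guard against that, sketched here in Ruby (the language I use for automation later in this post), is to let JSON encoding produce the string literal, since for ordinary URLs a JSON string is also a valid JavaScript string. load_uri_command is a hypothetical helper, and it assumes a browser variable has already been defined in the shell as above.

```ruby
require 'json'

# Build a JSSH command that loads `url` in the shell's `browser` variable.
# Encoding the URL as a JSON string yields a double-quoted literal that
# JavaScript can parse, with any quotes or backslashes in the URL escaped.
def load_uri_command(url)
  "browser.loadURI(#{url.to_json})"
end

puts load_uri_command("http://example.com/search?q=it's")
# prints: browser.loadURI("http://example.com/search?q=it's")
```

Interpolating the URL inside single quotes instead would break on exactly that example, because the apostrophe would end the JavaScript string early.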

Now, all we've got left to do is make it so that we can trigger this process from our application code. So, we'll...

Automate the Process with Ruby

Like any good scripting language, Ruby has a telnet library, which means that once we've got an instance of Firefox running with JSSH enabled, we can talk to it from Ruby whenever we want. Here's an example script that logs into the telnet shell and loads a series of URLs one at a time:

require 'net/telnet' # standard library in older Rubies; the "net-telnet" gem in Ruby 2.3+
my_urls = ["", "", "", ""]
# start telnet session with the Firefox JavaScript shell and set up the browser object
puts "starting telnet session"
firefox = Net::Telnet.new("Host" => "localhost", "Port" => 9997)
firefox.cmd "var w0 = getWindows()[0]"
firefox.cmd "var browser = w0.getBrowser()"
# load each page
my_urls.each do |url|
  puts "loading...#{url}"
  firefox.cmd "browser.loadURI('#{url}')"
  sleep 10 # so that the browser has time to load even if the page is slow
end
# close the telnet connection (Firefox itself keeps running)
firefox.close
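The fixed sleep 10 is the crudest part of that script: it wastes time on fast pages and may still be too short on slow ones. A more robust pattern is to poll until some condition holds, with a timeout. Here's a minimal, generic polling helper (wait_until is a hypothetical name); what you poll for inside the block is up to you. The usage comment assumes the object returned by getBrowser() exposes webProgress.isLoadingDocument, which I haven't verified against JSSH, and right after loadURI that flag may still read false for a moment before the new load begins, so treat it as a starting point to test in your own Firefox.

```ruby
# Call the block every `interval` seconds until it returns something truthy,
# giving up after `timeout` seconds. Returns the block's value, or nil on timeout.
def wait_until(timeout: 30, interval: 0.5)
  deadline = Time.now + timeout
  loop do
    result = yield
    return result if result
    return nil if Time.now >= deadline
    sleep interval
  end
end

# Possible use in the script above, in place of `sleep 10`.
# ASSUMPTION: getBrowser() returns an object exposing
# webProgress.isLoadingDocument in your Firefox version; verify this first.
#   firefox.cmd "browser.loadURI('#{url}')"
#   loaded = wait_until(timeout: 60) do
#     firefox.cmd("browser.webProgress.isLoadingDocument").include?("false")
#   end
#   puts "timed out waiting for #{url}" unless loaded
```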

Further Research: Screen Scraping JavaScript Heavy Sites

What else might this rickety bridge we've built to the JavaScript runtime environment be good for? One thing that immediately occurs to me is screen scraping sites that rely heavily on JavaScript. Another side effect of the rise of rich JavaScript applications has been to create intractable problems for people trying to do screen scraping. If the data you want is not in the page's HTML when you first request it, but is only written in later when the page's JavaScript runs, then traditional spidering and screen scraping techniques will fail to find it. Freebase, the open database application built by Danny Hillis and his team, for example, uses a highly dynamic interface for presenting its data that is almost entirely based in JavaScript. Or, on the low-brow side, MySpace uses JavaScript throughout the forms in its interface to help with date picking and such. If you wanted to scrape or automate interaction with either of these sites, you'd need access to a runtime environment that could execute JavaScript.

I haven't really tackled this problem with JSSH, but I do have some leads. For example, here's how you get the HTML of the current document:

> browser.contentDocument
[object XPCNativeWrapper [object HTMLDocument]]
> domDumpFull(domNode(browser.contentDocument))
<HTML><HEAD><META content="text/html...
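To pull that markup into Ruby, you can send the same command through a telnet session like the one in the script earlier in this post. Net::Telnet's cmd returns the shell's output with the trailing prompt still attached, so the sketch below strips it off. current_html is a hypothetical helper; it accepts any object with a cmd method (such as that Net::Telnet session), and it assumes the prompt is "> " as in the transcript.

```ruby
# Fetch the current page's markup through an open JSSH session, using the
# domDumpFull/domNode calls from the transcript above. `session` can be any
# object with a `cmd` method, such as a Net::Telnet connection.
# ASSUMPTION: the shell's prompt is "> ".
def current_html(session)
  response = session.cmd("domDumpFull(domNode(browser.contentDocument))")
  response.sub(/\s*> \z/, "") # drop the trailing prompt that cmd returns with the output
end

# Usage:
#   File.write("page.html", current_html(firefox))
```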

If you want to explore this avenue further, one of the best places to look is FireWatir, a project to add Firefox support to the Watir browser testing framework. They do lots of click-by-click automation and checking for results, so they've likely already worked out approaches to much of what you'd confront when screen scraping. The JSSH documentation itself is useful and clear, but not very in-depth.

Happy automating! Let me know what you discover...
