CoScripter

What do you get when you mix one part automation, one part natural language interpretation, two parts programming by demonstration, and three parts online collaboration? If you stir all of these research areas together and toss in some XUL, you get one of the most innovative extensions for Firefox: CoScripter.

CoScripter was created by a research team at IBM led by Allen Cypher. It allows you to record your actions on the Web, play them back, and share them with others. For instance, one popular script automates the process of adding your phone number to the national Do Not Call Registry.

A video demonstrating how CoScripter works is available on IBM’s alphaWorks site; in it, the team automates a search for houses in Palo Alto. Rather than bookmarking individual static pages, using CoScripter is like bookmarking a series of actions, with online sharing and tagging built in.

In addition to being really useful, CoScripter is also very interesting from a research perspective. One of its most innovative aspects is that actions are represented as human-readable, editable text. In their CHI 2007 paper, the authors place CoScripter in the context of previous Firefox extensions:

Greasemonkey [1] enables users to make client-side modifications to the appearance and behavior of web pages on their computer. However, creating a Greasemonkey script requires detailed knowledge of JavaScript programming to alter the DOM of the web page.

Chickenfoot [2] eases client-side customization by providing a higher-level API for accessing and manipulating common web page elements, using information in the rendered DOM. For example, the Chickenfoot instruction click(‘search button’) will click a button with the text “search” on it. However, the Chickenfoot interface is still very much a programming interface, in which users write syntactically correct statements in the Chickenfoot programming language.

Unlike Greasemonkey and Chickenfoot, CoScripter does not require users to know how to program. CoScripter expresses commands in natural language, as opposed to a formal scripting syntax. This means that you can literally edit the textual instructions and play the script again, or even drop in instructions written by hand and see if CoScripter is able to execute them. Because CoScripter’s interpreter is extremely flexible, this actually works surprisingly well. They call this approach Sloppy Programming:

…Koala [former name] leverages the sloppy programming approach in the web domain by taking advantage of the fact that most web commands are flat: there is one verb, and one or two arguments. This assumption dramatically simplifies the algorithm, and makes it more robust to extraneous words. It can handle long expressions originally intended for humans.
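The “flat command” assumption in that quote is what makes sloppy interpretation tractable. As a rough illustration, the Python sketch below interprets a free-form instruction against a toy model of a page: it looks for a known verb anywhere in the sentence, ignores extraneous words, and picks the element whose label shares the most words with the instruction. The `Element` type, the verb table, and the word-overlap scoring are hypothetical simplifications chosen for illustration, not CoScripter’s actual algorithm.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Element:
    kind: str   # hypothetical element type: "textbox", "button", or "link"
    label: str  # visible text, or the label text next to a form field


# Hypothetical verb table: several everyday words map to one action.
VERBS = {
    "click": "click", "press": "click", "push": "click",
    "enter": "enter", "type": "enter", "put": "enter",
}

# Which element kinds each action can sensibly apply to.
TARGET_KINDS = {"click": {"button", "link"}, "enter": {"textbox"}}


def words(text: str) -> List[str]:
    """Lowercase alphanumeric words; punctuation is ignored."""
    return re.findall(r"[a-z0-9]+", text.lower())


def interpret(
    instruction: str, page: List[Element]
) -> Optional[Tuple[str, Element, Optional[str]]]:
    """Map a sloppy instruction to (action, element, value), or None if nothing fits."""
    # A quoted string, if present, is treated as the value to type.
    quoted = re.search(r'"([^"]*)"', instruction)
    value = quoted.group(1) if quoted else None
    # Use only the unquoted words, so the value cannot sway verb or element choice.
    tokens = words(re.sub(r'"[^"]*"', " ", instruction))

    # Flat command: take the first recognized verb and ignore the other words.
    action = next((VERBS[t] for t in tokens if t in VERBS), None)
    if action is None:
        return None

    # Pick the compatible element whose label overlaps most with the instruction.
    vocabulary = set(tokens)
    best, best_score = None, 0
    for el in page:
        if el.kind not in TARGET_KINDS[action]:
            continue
        score = len(set(words(el.label)) & vocabulary)
        if score > best_score:
            best, best_score = el, score
    return (action, best, value) if best is not None else None


if __name__ == "__main__":
    page = [
        Element("textbox", "Phone number"),
        Element("textbox", "Email address"),
        Element("button", "Register"),
    ]
    print(interpret('please type "555-0100" into the phone number box', page))
```

Scoring by shared words is what lets filler such as “please” or “box” pass through harmlessly. A real interpreter would also have to cope with synonyms, ordinal references (“the second link”), and ties between equally good matches.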

Some of the code that drives CoScripter is also interesting from an accessibility perspective. Imagine commanding your browser using only your voice, or tabbing through form fields on a Web page and having a screen reader accurately tell you what each element is by analyzing the surrounding text. Since the CoScripter team plans to open source their code, Mozilla’s accessibility team will be looking into leveraging their work.

In the future CoScripter might also impact how we test Firefox. Ray Kiddy recently wrote a post proposing that we allow beta testers to attach a log of their actions to a bug report, instead of having to manually write a list of steps explaining how to recreate the issue. Ray notes that in addition to helping our testers quickly communicate the steps to recreate a bug, these scripts could also eventually be used for automated testing.
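Ray’s proposal essentially amounts to rendering a recorded action log as readable steps to reproduce. A minimal sketch of that rendering step follows, assuming a hypothetical log format (the `Event` fields are invented for illustration) and phrasing only loosely modeled on CoScripter’s scripts, not its exact grammar.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    action: str                  # hypothetical log format: "goto", "enter", or "click"
    target: str                  # a URL, or the label of a field or button
    value: Optional[str] = None  # text typed into a field, for "enter"
    kind: str = ""               # element type, e.g. "textbox" or "button"


def to_repro_steps(log: List[Event]) -> str:
    """Render a recorded event log as numbered, human-readable steps."""
    lines = []
    for i, ev in enumerate(log, start=1):
        if ev.action == "goto":
            text = f'go to "{ev.target}"'
        elif ev.action == "enter":
            text = f'enter "{ev.value}" into the "{ev.target}" {ev.kind}'
        elif ev.action == "click":
            text = f'click the "{ev.target}" {ev.kind}'
        else:
            raise ValueError(f"unrecognized action: {ev.action!r}")
        lines.append(f"{i}. {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    log = [
        Event("goto", "www.example.com"),
        Event("enter", "Search", value="firefox", kind="textbox"),
        Event("click", "Go", kind="button"),
    ]
    print(to_repro_steps(log))
```

Because the output is plain text, a tester could paste it directly into a bug report, while the structured log itself remains available for an automated harness to replay.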

Quick note about phishing: since scripts are shared between users, be careful what you run. Hopefully the social nature of CoScripter will result in the community quickly flagging and removing any malicious scripts that get submitted.

To everyone at IBM who worked on building CoScripter: congratulations on setting a new bar for the state of the art in Web browser automation.

15 responses

  1. F Murray Rumpelstiltskin wrote:

    >> “Simplifying web based processes”

    Sheesh.

    IBM, get a clue! It’s not at AMO – and I’m not wasting any time on it.

  2. diş beyazlatma wrote:

    thanks

  3. Matthew Cornell wrote:

    Neat tool. Terrible registration process – should support anonymous downloads. Need keystrokes for the run button. Thanks a lot – very handy.

  4. Matthew Cornell wrote:

    Great tool, HORRID steps to get it.

  5. John Tian wrote:

    Alex, did you look at Dejaclick and iMacros yet? Both extensions are easy to use and work well.

  6. Jim wrote:

    Looks nice – but does it work with the majority of websites like iMacros does? If not, it’ll remain useless for me.

    Sure, the iMacros syntax is not exactly natural language 🙂 but not too complicated either. And as soon as the task gets a bit more involved, a natural language “program” becomes very confusing (download files, time stamps, tab support, clear cache… how do you describe these concepts in natural language?)

    >In the future CoScripter might also impact how we test Firefox.

    This is a good idea! But why not use iMacros for this purpose? It’s already there and it works.

  7. Alex Faaborg wrote:

    >how fragile a recorded script is when faced with site structure changes

    My impression is that because the script abstracts actions down to field and button names, the design and layout can change without breaking the script, but the overall UI cannot.

  8. Jason Huggins wrote:

    We’ve had something very similar to this with Selenium IDE (http://www.openqa.org/selenium-ide/) for a few years now, albeit without the natural language support. Selenium IDE lets you record scripts and play them back one step at a time.

    I think the best example of this approach was Apple’s HyperTalk, the scripting language built into HyperCard. HyperTalk really was a sweet language. I think GUI automation really lends itself to this natural language approach… but only in the simple cases. When testing gets slightly more complicated, and you want to use loops, conditionals, or functions… the natural language approach starts to get in the way… and a “real” language like JavaScript becomes far more productive. But I can’t deny the seductive attraction of the natural language approach.

  10. Steve Lee wrote:

    Hi Alex, Thanks for bringing this to our attention.

    Another benefit is that by lowering the cost of entry we should see this getting used by clinical and educational support staff to set up customisations for individual users. That should really help it (and FF) take off and provide great web accessibility. I’m going to see how to use this in Jambu.

    My one question is how fragile a recorded script is when faced with site structure changes. I guess the ‘sloppy programming’ removes some of the dependency.

  11. James wrote:

    I’m reminded of this Stikkit review.

  13. BanjoPlayingHamster wrote:

    Alright, this is brilliant.

    While job hunting online, just about every bloody job outlet or organization just HAS to have its own online system for applications, and you end up wasting precious time filling out the application for $BAR, which is just different enough from $FOO that:

    a) it needs to be done manually

    b) it is still similar enough to reduce you to tears that you have to do it manually

    I’ll look forward to testing this!

  14. AndyEd wrote:

    I’ve long advocated a programmatic event trace for unstructured feedback, see http://surfmind.com/muzings/?p=98, but this is a killer app for “Repro steps”.

  15. David Naylor wrote:

    This seems really interesting, and I love how it understands normal language! I’m just trying to think of a way to use it…:)