Hello guys, have you ever heard about automation scripting with a headless browser? Maybe yes, because it is one of many ways to test a front-end website, or to scrape data from a website that uses CSR (Client-Side Rendering).

Today I want to write about installing PhantomJS on elementary OS 5 Hera, describe briefly what PhantomJS is used for, and give you two simple examples of how to use it.

What is PhantomJS

PhantomJS is a (now discontinued) headless browser used for automating web page interaction. PhantomJS provides a JavaScript API that enables automated navigation, screenshots, simulated user behavior and assertions, which made it a common tool for running browser-based unit tests on a headless system such as a continuous integration environment.

But there is also malicious use of PhantomJS. Because PhantomJS runs without a UI, is scriptable via JavaScript, and is relatively adherent to modern browser specifications, it is commonly used to automate attacks against websites.

Install

1. Prerequisite Libraries
Before installing PhantomJS, we need to make sure the prerequisite libraries are installed on our machine.

sudo apt update
sudo apt install build-essential chrpath libssl-dev libxft-dev
sudo apt install libfreetype6 libfreetype6-dev libfontconfig1 libfontconfig1-dev

2. Install PhantomJS
Now we have to download PhantomJS. Development of PhantomJS is suspended, so we download version 2.1.1 (the latest stable release).

wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
sudo tar xvjf phantomjs-2.1.1-linux-x86_64.tar.bz2 -C /usr/local/share/

3. Create a Symlink
A symlink (also called a symbolic link) is a type of file in Linux that points to another file or a folder on your computer. Symlinks are similar to shortcuts in Windows.

Some people call symlinks "soft links", as opposed to "hard links". Here, the link puts the phantomjs binary in /usr/local/bin, which is on the PATH, so the command can be run from any directory.

sudo ln -sf /usr/local/share/phantomjs-2.1.1-linux-x86_64/bin/phantomjs /usr/local/bin

4. Verify PhantomJS Version

phantomjs --version

If your terminal shows version 2.1.1, then PhantomJS is installed on your machine.
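
The same check can also be done from inside a script. This is only a minimal sketch: the file name version.js and the helper formatVersion are my own choices for illustration, while phantom.version is the object PhantomJS exposes with major, minor and patch fields.

```javascript
"use strict";

// Turn a version object such as {major: 2, minor: 1, patch: 1} into "2.1.1".
// formatVersion is a hypothetical helper, not part of the PhantomJS API.
function formatVersion(v) {
    return [v.major, v.minor, v.patch].join('.');
}

// `phantom` only exists when the script runs under PhantomJS.
if (typeof phantom !== 'undefined') {
    console.log('PhantomJS ' + formatVersion(phantom.version));
    phantom.exit();
}
```

Save it as version.js (any name works) and run it with phantomjs version.js.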

Example

1. Folder Structure
In this example I will use a folder structure like this:

  • learn-phantomjs (directory)
    • screenshots (directory)
    • src (directory)

2. Basic
This is just a basic example that takes a screenshot of a web page.

  • Create a file named basic.js inside the src directory.
    "use strict";
    // Create a page object (roughly, a browser tab)
    var page = require('webpage').create();
    page.open('http://example.com', function (status) {
        console.log('Status: ' + status);
        if (status === 'success') {
            // Save what the page looks like as a PNG file
            page.render('screenshots/basic.png');
        }
        // Always exit, otherwise PhantomJS keeps running
        phantom.exit();
    });
  • Run it from the learn-phantomjs directory, so that the relative screenshots/ path resolves:
    phantomjs src/basic.js
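
As a small variation on the basic example, the URL and the output file could be read from the command line instead of being hard-coded. This is a sketch, not part of the original example: the helper parseArgs and the fallback values are assumptions of mine, while system.args is the PhantomJS argument list (index 0 is the script name).

```javascript
"use strict";

// Read the URL and output file from the argument list.
// args[0] is the script name, so user arguments start at index 1.
// parseArgs is a hypothetical helper; the defaults mirror the basic example.
function parseArgs(args) {
    return {
        url: args[1] || 'http://example.com',
        output: args[2] || 'screenshots/basic.png'
    };
}

// Only touch the PhantomJS modules when running under PhantomJS.
if (typeof phantom !== 'undefined') {
    var options = parseArgs(require('system').args);
    var page = require('webpage').create();
    page.open(options.url, function (status) {
        console.log('Status: ' + status);
        if (status === 'success') {
            page.render(options.output);
        }
        // Report failure through the exit code
        phantom.exit(status === 'success' ? 0 : 1);
    });
}
```

Assuming the file is saved as src/args.js, it could be run as phantomjs src/args.js https://example.org screenshots/example.png.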

3. Advanced
In this advanced example, I want to show you how we can scrape data from another site that uses CSR (Client-Side Rendering).

  • Create a file named datatables.js inside the src directory.
    "use strict";

    var page = require('webpage').create();
    var system = require('system');

    // Poll testFx every 250ms until it returns true, then call onReady.
    // Exits with code 1 if the condition is not met within timeOutMillis.
    function waitFor(testFx, onReady, timeOutMillis) {
        var maxtimeOutMillis = timeOutMillis ? timeOutMillis : 3000, //< Default max timeout is 3s
            start = new Date().getTime(),
            condition = false,
            interval = setInterval(function() {
                if ( (new Date().getTime() - start < maxtimeOutMillis) && !condition ) {
                    // If not time-out yet and condition not yet fulfilled
                    condition = (typeof(testFx) === "string" ? eval(testFx) : testFx()); //< defensive code
                } else {
                    if (!condition) {
                        // If condition still not fulfilled (timeout but condition is 'false')
                        console.log("'waitFor()' timeout");
                        phantom.exit(1);
                    } else {
                        // Condition fulfilled (timeout and/or condition is 'true')
                        console.log("'waitFor()' finished in " + (new Date().getTime() - start) + "ms.");
                        typeof(onReady) === "string" ? eval(onReady) : onReady(); //< Do what it's supposed to do once the condition is fulfilled
                        clearInterval(interval); //< Stop this interval
                    }
                }
            }, 250); //< repeat check every 250ms
    }

    // system.args[0] is the script name, so a length of 1 means "no extra arguments"
    if (system.args.length !== 1) {
        console.log('invalid call');
        phantom.exit(1);
    } else {
        // Set another User Agent
        page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36';

        // Set the viewport
        page.viewportSize = {
            width: 800,
            height: 600
        };

        // Listen for requests made by the page (uncomment to debug)
        // page.onResourceRequested = function (request) {
        //     console.log('Request ' + JSON.stringify(request, undefined, 4))
        // }

        // Listen for JavaScript errors raised in the page
        page.onError = function (msg, trace) {
            console.log(msg);
            trace.forEach(function (item) {
                console.log(' ', item.file, ':', item.line);
            });
        };

        // Forward console messages from the target page to the terminal
        page.onConsoleMessage = function (msg) {
            console.log(msg);
        };

        page.open('https://datatables.net/examples/data_sources/ajax', function (status) {
            // Check for page load success
            if (status !== 'success') {
                console.log('Unable to access network');
                phantom.exit(1);
            } else {
                console.log('Status: ' + status);
                // Wait for 'tbody' to be visible
                waitFor(function() {
                    // Check in the page if a specific element is now visible
                    // ($ is the jQuery already loaded by the target page)
                    return page.evaluate(function() {
                        return $("tbody").is(":visible");
                    });
                }, function() {
                    console.log("The tbody should be visible now.");
                    // Get the data from DataTables
                    page.evaluate(function () {
                        console.log(document.title);
                        var tBody = document.querySelector('tbody');
                        var tableRow = tBody.getElementsByTagName('tr');
                        for (var t = 0; t < tableRow.length; t++) {
                            console.log(tableRow[t].innerText);
                        }
                    });
                    // Take a screenshot
                    page.render('screenshots/datatables.png');
                    // Exit PhantomJS
                    phantom.exit();
                });
            }
        });
    }
  • Run it from the learn-phantomjs directory:
    phantomjs src/datatables.js

The scraped data will be shown in your terminal.
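
Each row arrives in the terminal as one line of text. If you want objects instead, a row can be split into named fields. The sketch below rests on two assumptions of mine: that innerText separates the cells of a table row with tab characters, and that the demo table's columns are in the order listed in COLUMNS. The sample row is made up for illustration, and parseRow is a hypothetical helper.

```javascript
"use strict";

// Assumed column order of the demo table; adjust it to the table you scrape.
var COLUMNS = ['name', 'position', 'office', 'extn', 'startDate', 'salary'];

// Split one line of row text into an object keyed by column name.
// Assumes cells are tab-separated; missing cells become empty strings.
function parseRow(text, columns) {
    var cells = text.split('\t');
    var record = {};
    for (var i = 0; i < columns.length; i++) {
        record[columns[i]] = (cells[i] || '').trim();
    }
    return record;
}

// Illustrative sample row, not real scraped output.
var sample = 'Tiger Nixon\tSystem Architect\tEdinburgh\t5421\t2011/04/25\t$320,800';
console.log(JSON.stringify(parseRow(sample, COLUMNS)));

// Exit cleanly when run under PhantomJS.
if (typeof phantom !== 'undefined') {
    phantom.exit();
}
```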

Explanation
In this advanced example, we make PhantomJS scrape the data that DataTables renders on another web page. The table is filled in by Ajax after the page has loaded, so we create the waitFor function to make PhantomJS wait until a specific selector is visible. I've also added listeners for page errors and console messages, to make it easier to see what is happening in the target page.
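
To make the waiting logic easier to follow, here is a stripped-down sketch of the same polling idea. It is not the waitFor from the example: the names pollDecision, waitUntil and done are mine, and the condition here is a simple counter rather than a check inside a page.

```javascript
"use strict";

// Decide what a polling loop should do on each tick:
// 'ready' if the condition holds, 'timeout' if time ran out, else 'wait'.
function pollDecision(elapsedMs, timeoutMs, conditionMet) {
    if (conditionMet) { return 'ready'; }
    if (elapsedMs >= timeoutMs) { return 'timeout'; }
    return 'wait';
}

// Call testFx every intervalMs until it returns true or timeoutMs passes.
function waitUntil(testFx, onReady, onTimeout, timeoutMs, intervalMs) {
    var start = Date.now();
    var timer = setInterval(function () {
        var decision = pollDecision(Date.now() - start, timeoutMs, testFx());
        if (decision === 'wait') { return; }
        clearInterval(timer);
        if (decision === 'ready') { onReady(); } else { onTimeout(); }
    }, intervalMs);
}

// Exit only when running under PhantomJS.
function done(code) {
    if (typeof phantom !== 'undefined') { phantom.exit(code); }
}

// Usage: the condition becomes true on the third check.
var ticks = 0;
waitUntil(
    function () { ticks += 1; return ticks >= 3; },
    function () { console.log('ready after ' + ticks + ' checks'); done(0); },
    function () { console.log('timed out'); done(1); },
    3000,
    250
);
```

Keeping the decision in a separate function makes the three possible outcomes explicit. In a real scraper, testFx would call page.evaluate to look for the element.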

Conclusion

Why did I write an article about PhantomJS? Installation may be the hard part for a newbie, because the official PhantomJS website has no documentation that explains how to install it. Even though I have used PhantomJS before, I never remember how to install it and always have to search Google again.

If you read and follow this tutorial carefully, installing and using PhantomJS is very easy. I've kept the explanation very simple. After reading this article, I hope you will find it easier to follow the PhantomJS API documentation.

What is CSR?
In today's JavaScript development world, there are two types of websites: those that use SSR (Server-Side Rendering) and those that use CSR. CSR is Client-Side Rendering. As a simple explanation, any website that uses Ajax to load its data from the server can be called CSR.

The problem is that the data of a website that uses Ajax can't be scraped directly with a server-side request (e.g. cURL). Why? Because cURL just downloads the response and can't execute JavaScript. So the solution is a headless browser (PhantomJS).
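
The difference can be illustrated with two HTML strings. Both are made up for this sketch: rawResponse stands for what a cURL-like request would download, and afterAjax stands for the same table after the page's JavaScript has filled it in. The helper countBodyRows is also my own, and its regular expressions are only good enough for this simple markup.

```javascript
"use strict";

// Count <tr> elements that sit inside the <tbody> of an HTML string.
function countBodyRows(html) {
    var body = /<tbody[^>]*>([\s\S]*?)<\/tbody>/i.exec(html);
    if (!body) { return 0; }
    var rows = body[1].match(/<tr[\s>]/gi);
    return rows ? rows.length : 0;
}

// Made-up markup: the table as the server sends it (empty body)...
var rawResponse = '<table><thead><tr><th>Name</th></tr></thead>' +
    '<tbody></tbody></table>';
// ...and the same table after client-side JavaScript has added the rows.
var afterAjax = '<table><thead><tr><th>Name</th></tr></thead>' +
    '<tbody><tr><td>Row 1</td></tr><tr><td>Row 2</td></tr></tbody></table>';

console.log('rows in raw response: ' + countBodyRows(rawResponse));
console.log('rows after JavaScript ran: ' + countBodyRows(afterAjax));

// Exit cleanly when run under PhantomJS.
if (typeof phantom !== 'undefined') {
    phantom.exit();
}
```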

Source Code
This example is already on my GitHub. Download it to make learning easier and to use as a starting point for your own experiments.

Pros and Cons
Actually, I used PhantomJS a lot years ago, and I still use it today, but only in certain situations.

Pros

  • PhantomJS is cross-platform (it supports Windows, Linux and macOS).
  • PhantomJS is very fast because it runs natively.
  • PhantomJS is based on the WebKit engine, the same family as Safari (and older versions of Google Chrome), is very easy to use, and has a huge community.
  • PhantomJS uses an outdated WebKit build, which means you can still use deprecated JavaScript functions.

Cons

  • PhantomJS does not support modern JavaScript syntax; you have to write your code following the ES5 standard.
  • PhantomJS does not show an error stack when code inside the evaluate function fails.
  • PhantomJS development has been suspended since 2018 and the latest release is still 2.1.1, so PhantomJS seems too old to use today.
  • PhantomJS lacks many built-in JavaScript features because it uses an outdated WebKit build.
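
One way to work around the missing error stack is to catch errors inside the evaluated function and return them as data. This is a sketch of that idea, not an official PhantomJS pattern. The function name collectRows is mine, the code is ES5 only, and for brevity it does not wait for the Ajax data, so the row list may come back empty.

```javascript
"use strict";

// Meant to be passed to page.evaluate(); it runs inside the target page,
// so it must be self-contained and written in ES5.
function collectRows() {
    try {
        var rows = document.querySelectorAll('tbody tr');
        var texts = [];
        for (var i = 0; i < rows.length; i++) {
            texts.push(rows[i].innerText);
        }
        return { ok: true, rows: texts };
    } catch (e) {
        // Return the error as data instead of letting it disappear.
        return { ok: false, error: String(e), stack: e.stack || '' };
    }
}

// Only touch the PhantomJS modules when running under PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.open('https://datatables.net/examples/data_sources/ajax', function (status) {
        var result = status === 'success' ?
            page.evaluate(collectRows) :
            { ok: false, error: 'load failed: ' + status };
        console.log(JSON.stringify(result, null, 2));
        phantom.exit(result && result.ok ? 0 : 1);
    });
}
```

Because the result is a plain object, it passes back from the page to the script, and the error message ends up in your terminal.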

For a better experience with PhantomJS, I suggest you learn CasperJS. CasperJS is a suite of libraries on top of PhantomJS that extends its capabilities as a client for automated web page testing.

If you want to use PhantomJS with Node.js, I suggest you learn SpookyJS.

Alternative
If you need an alternative headless browser to PhantomJS, you can try Puppeteer.
Maybe I will write an article about Puppeteer next time.

If you have any questions about PhantomJS, feel free to leave them in the comments below.

Thank you for reading my article.