PSN Trophy Scraper to Scrape Games and Trophy Information

This scraper lets you scrape game and trophy data from the official PlayStation website. It uses PhantomJS to pull the data for a given user profile. The data can then be parsed by a programming language and inserted into a database.

This is still an early version of the code. I finally have it producing output, so I'm going to post it now and update it with more code the next time I am able to work on it.

There is no official PSN API, so we will have to take advantage of the data available in the public trophies DB on the PlayStation website. There are websites that have trophy list information for PlayStation trophies, so I wanted to find a way to do the same. The PlayStation website uses a ton of JavaScript and AJAX, meaning we can't easily scrape the data using PHP or any other language on its own. To make life even harder, they seem to have a lot of safeguards in place to prevent people from getting access to this trophy data easily. I'm not sure why they have to be so secretive with this stuff!

Anyway, I was able to use PhantomJS to get past the difficult stuff put in place to block this, and I successfully dumped the data from a user's trophy list into a text file. The hardest part is done now. Once we have access to the list, it's just a matter of gathering the data you want. It gets returned in HTML format, which makes it nice and easy to parse.

If anyone has any suggestions or improvements, please post them. As a group we may get a fully functioning scraper working!

This is my first time using PhantomJS so I'm still getting the hang of it. For anyone who doesn't know what it is, PhantomJS is a scriptable headless browser: it loads a web page and lets you interact with it the way a user would. You provide a URL and give commands, and PhantomJS can then click buttons and fill in forms on the page. We can return the content of the current page at any time, which allows us to pull the trophy data, or anything else for that matter.

For this I will use Hakoom as the username since he has the most trophies of any user on PSN. If you visit the website yourself you will see that the page only loads a certain number of trophies, and there is an AJAX button at the bottom of the page to load more content. Handling that button is the next thing I will add to this document so that we can scrape a huge number of trophies at once. For now the code is able to get the first page of trophies, which I think is a good start (considering it took me ages to get working 😛 ).

In order to run this script you will need to have PhantomJS installed. This is a command line tool, but it’s not as difficult to use as it might seem. If you save the code below to a file you can run it using the following command.

phantomjs psn.js

The output starts with a couple of status lines like these.

Waiting for trophy list to load...
'waitFor()' finished in 1270ms.

The script will then dump a huge load of HTML that contains the trophy data. The important part looks like this.

<h2 class="clearfix title">The Swapper</h2></div><ul class="trophies clearfix"><li class="bronze">0</li><li class="silver">0</li><li class="gold">0</li><li class="platinum">0</li></ul>

The best way to handle this data is to parse the HTML with a programming language and pull out the values we need. Many languages can do this. One of the simplest options, in my opinion, is PHP. The shell_exec() function runs the above command and returns all of its output as a string, which you can then parse. (Plain exec() only returns the last line of the output unless you pass it an array as a second argument.) You will need to update the path to psn.js if the PHP file is not in the same folder as psn.js. The call might look like this.

$trophyOutput = shell_exec("phantomjs /var/www/psnscrapper/psn.js");
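
Whatever language does the parsing, the idea is the same. Here is a rough sketch of that step in JavaScript (runnable with Node.js), to keep it in the same language as psn.js. It assumes every game in the dump follows the same markup as the sample fragment shown earlier, and the function name parseTrophies is made up for this example. Regular expressions are good enough for a fixed fragment like this, but a real HTML parser would be more robust if the markup changes.

```javascript
// Rough sketch: pull game titles and trophy counts out of the HTML dump.
// Assumes each game block uses the markup from the sample fragment;
// parseTrophies is a hypothetical helper, not part of psn.js.
function parseTrophies(html) {
  var games = [];
  // Match each game title and the trophy list that follows it.
  var blockRe = /<h2 class="clearfix title">([^<]*)<\/h2>[\s\S]*?<ul class="trophies clearfix">([\s\S]*?)<\/ul>/g;
  var block;
  while ((block = blockRe.exec(html)) !== null) {
    var game = { title: block[1].trim(), bronze: 0, silver: 0, gold: 0, platinum: 0 };
    // Each <li> has the trophy type as its class and the count as its text.
    var itemRe = /<li class="(bronze|silver|gold|platinum)">(\d+)<\/li>/g;
    var item;
    while ((item = itemRe.exec(block[2])) !== null) {
      game[item[1]] = parseInt(item[2], 10);
    }
    games.push(game);
  }
  return games;
}

// Try it on the fragment from the sample output.
var sample = '<h2 class="clearfix title">The Swapper</h2></div>' +
  '<ul class="trophies clearfix"><li class="bronze">0</li><li class="silver">0</li>' +
  '<li class="gold">0</li><li class="platinum">0</li></ul>';
console.log(JSON.stringify(parseTrophies(sample)));
// prints [{"title":"The Swapper","bronze":0,"silver":0,"gold":0,"platinum":0}]
```

Each entry in the returned array is a plain object, so it should be straightforward to turn into a database row.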

File Contents : psn.js

var page = require('webpage').create();

//open the url of the playstation trophy site.
page.open('http://my.playstation.com/logged-in/trophies/public-trophies/', function(status)
{
  //stop early if the page itself failed to load.
  if (status !== 'success') {
    console.log("Unable to load the trophy page.");
    phantom.exit(1);
    return;
  }

  page.evaluate(function() {
    document.getElementById("trophiesId").value = "hakoom";
    //checkPTrophies(); btn click calls this function
    //this relies on the page providing jQuery as $.
    $('#btn_publictrophy').click();
  });

  //generally this completes in about 300-500ms.
  console.log("\nWaiting for trophy list to load...");

  waitFor(function(){
    return page.evaluate(function(){
      //this div contains all of the trophy content. Once it is present we know that the page has successfully loaded and we are able to pull the trophy data.
      //This is the most difficult part of using this tool. If you try calling values that aren't loaded yet it can mess things up.
      //return a plain boolean, since page.evaluate can only pass simple values (not DOM elements) back out of the page.
      return document.querySelector("#trophyTrophyList .trophy-image") !== null;
    });
  }, function(){
    setTimeout(function(){
      var trophiesDiv = page.evaluate(function(){
        //dump all of the trophy list innerHTML data.
        return document.getElementById("trophyTrophyList").innerHTML;
      });
      console.log(trophiesDiv);
      phantom.exit();
    }, 1000); // wait a little longer
  }, 20000);

});


//thanks to Artjom B for helping with this part.
function waitFor(testFx, onReady, timeOutMillis) {
    var maxtimeOutMillis = timeOutMillis ? timeOutMillis : 3000, //< Default Max Timeout is 3s
        start = new Date().getTime(),
        condition = false,
        interval = setInterval(function() {
            if ( (new Date().getTime() - start < maxtimeOutMillis) && !condition ) {
                // If not time-out yet and condition not yet fulfilled
                condition = (typeof(testFx) === "string" ? eval(testFx) : testFx()); //< defensive code
            } else {
                if(!condition) {
                    // If condition still not fulfilled (timeout but condition is 'false')
                    console.log("'waitFor()' timeout");
                    phantom.exit(1);
                } else {
                    // Condition fulfilled (timeout and/or condition is 'true')
                    console.log("'waitFor()' finished in " + (new Date().getTime() - start) + "ms.");
                    clearInterval(interval); //< Stop this interval first, so onReady only ever runs once
                    typeof(onReady) === "string" ? eval(onReady) : onReady(); //< Do what it's supposed to do once the condition is fulfilled
                }
            }
        }, 250); //< repeat check every 250ms
}
