PSN Trophy Scraper to Scrape Games and Trophy Information

This scraper will allow you to scrape game and trophy data from the official PlayStation website. It uses PhantomJS to load the trophy data for a given PSN user profile. The data can then be parsed with a programming language and inserted into a database.

This is still an early version of the code. It finally produces output, so I’m going to post it now and update it with more code the next time I’m able to work on it.

There is no official PSN API, so we have to take advantage of the data available in the public trophies database on the PlayStation website. Other websites already publish trophy list information for PlayStation games, so I wanted to find a way to do the same. The PlayStation site relies heavily on JavaScript and AJAX, which means we can’t easily scrape the data with a simple HTTP request from PHP or any other language. To make life even harder, they seem to have a lot of safeguards in place to stop people from getting at this trophy data easily. I’m not sure why they have to be so secretive with this stuff!

Anyway, I was able to use PhantomJS to get past the obstacles that block users from doing this, and I successfully dumped the data from a user’s trophy list into a text file. The hardest part is done now. Once we have access to the list, it’s just a matter of extracting the data you want. It is returned as HTML, which makes it nice and easy to parse.

If anyone has suggestions or improvements, please post them. As a group we may be able to get a fully functioning scraper working!

This is my first time using PhantomJS, so I’m still getting the hang of it. For anyone who doesn’t know what it is, PhantomJS is a headless web browser that you control with JavaScript, so a script can interact with a web page the way a user would. You provide a URL and give it commands; PhantomJS can then fill in forms, click buttons and interact with the page just like a user. We can grab the content of the current page at any time, which lets us pull out the trophy data, or anything else for that matter.

For this example I will use Hakoom as the username, since he has the most trophies of any user on PSN. If you visit the page in a browser, you will see that it only loads a certain number of trophies at first, and there is a button at the bottom of the page that loads more content via AJAX. Handling that button is the next thing I will add to this post, so that we can scrape a huge number of trophies at once. For now the code gets the first page of trophies, which I think is a good start (considering it took me ages to get working 😛).

To run this script you will need to have PhantomJS installed. It is a command-line tool, but it’s not as difficult to use as it might seem. Save the code below to a file named psn.js and run it with the following command:

phantomjs psn.js

When the script runs, the console should show something like this:

Waiting for trophy list to load...
'waitFor()' finished in 1270ms.

The console will then print a huge block of HTML that contains the trophy data. The important part looks like this:

<h2 class="clearfix title">The Swapper</h2></div><ul class="trophies clearfix"><li class="bronze">0</li><li class="silver">0</li><li class="gold">0</li><li class="platinum">0</li></ul>

The best way to handle this data is to parse the HTML with a programming language and pull out the values we need. There are many languages you can use for this; in my opinion one of the simplest is PHP. The shell_exec() function runs the command above and returns everything it printed as a single string, which you can then parse. (Be careful with exec(): its return value is only the last line of the output, so to capture the whole dump with exec() you have to pass an array as its second argument, which receives one element per line.) You will need to update the path to psn.js if the PHP file is not in the same folder as psn.js. So the call might look like this:

$trophyOutput = shell_exec("phantomjs /var/www/psnscrapper/psn.js");
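As a starting point for the parsing step, here is a minimal sketch in plain JavaScript that pulls each game title and its four trophy counts out of the dumped HTML. It assumes the markup always looks exactly like the snippet above (an h2 with class "clearfix title", followed by a ul with class "trophies clearfix" containing bronze, silver, gold and platinum items). The name parseTrophyList is a hypothetical one of my own, and regular expressions are brittle, so a real HTML parser (for example PHP’s DOMDocument) is the safer choice if the page markup changes.

```javascript
// Hypothetical helper: extracts { title, bronze, silver, gold, platinum } for every
// game in the HTML dumped by psn.js. Assumes the markup matches the sample exactly.
function parseTrophyList(html) {
  var games = [];
  // One match per game: the <h2> title, then the <ul> of trophy counts that follows it.
  var gameRe = /<h2 class="clearfix title">([\s\S]*?)<\/h2>[\s\S]*?<ul class="trophies clearfix">([\s\S]*?)<\/ul>/g;
  // One match per <li>: the class name is the trophy grade, the text is the count.
  var countRe = /<li class="(bronze|silver|gold|platinum)">(\d+)<\/li>/g;
  var gameMatch, countMatch;

  while ((gameMatch = gameRe.exec(html)) !== null) {
    // The title is returned as-is, so HTML entities such as &amp; are not decoded.
    var game = { title: gameMatch[1].trim(), bronze: 0, silver: 0, gold: 0, platinum: 0 };
    countRe.lastIndex = 0; // restart the count regex for each game's <ul>
    while ((countMatch = countRe.exec(gameMatch[2])) !== null) {
      game[countMatch[1]] = parseInt(countMatch[2], 10);
    }
    games.push(game);
  }
  return games;
}
```

The function only uses ES5 features, so it should also work inside psn.js itself: calling parseTrophyList(trophiesDiv) just before the final console.log() would let the script print structured data instead of raw HTML.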

File contents: psn.js

var page = require('webpage').create();

//open the url of the playstation trophy site.
page.open('http://my.playstation.com/logged-in/trophies/public-trophies/', function(status) 
{
  //stop straight away if the page itself failed to load.
  if (status !== 'success') {
    console.log("Unable to load the trophy page (status: " + status + ")");
    phantom.exit(1);
    return;
  }

  page.evaluate(function() {
    //enter the PSN ID whose trophies we want to scrape.
    document.getElementById("trophiesId").value = "hakoom";
    //clicking the button calls the site's checkPTrophies() function.
    //jQuery's delay() only delays queued animations, so it can't pause the script; waitFor() below does the waiting.
    $('#btn_publictrophy').click();
  });

  //generally this completes in about 300-500ms.
  console.log("\nWaiting for trophy list to load...");

  waitFor(function(){
    return page.evaluate(function(){
      //this div contains all of the trophy content. Once it is present we know the page has successfully loaded and we can pull the trophy data.
      //This is the most difficult part of using this tool. If you try reading values that aren't loaded yet it can mess things up.
      //return a plain boolean: page.evaluate() can only pass simple, serializable values out of the page, not DOM elements.
      return document.querySelector("#trophyTrophyList .trophy-image") !== null;
    });
  }, function(){
    setTimeout(function(){
      var trophiesDiv = page.evaluate(function(){
        //dump all of the trophy list innerHTML data.
        return document.getElementById("trophyTrophyList").innerHTML;
      });
      console.log(trophiesDiv);
      phantom.exit();
    }, 1000); // wait a little longer so the rest of the list can render
  }, 20000); // give up after 20 seconds

});


//thanks to Artjom B for helping with this part.
function waitFor(testFx, onReady, timeOutMillis) {
    var maxtimeOutMillis = timeOutMillis ? timeOutMillis : 3000, //< Default Max Timeout is 3s
        start = new Date().getTime(),
        condition = false,
        interval = setInterval(function() {
            if ( (new Date().getTime() - start < maxtimeOutMillis) && !condition ) {
                // If not time-out yet and condition not yet fulfilled
                condition = (typeof(testFx) === "string" ? eval(testFx) : testFx()); //< defensive code
            } else {
                if(!condition) {
                    // If condition still not fulfilled (timeout but condition is 'false')
                    console.log("'waitFor()' timeout");
                    phantom.exit(1);
                } else {
                    // Condition fulfilled (timeout and/or condition is 'true')
                    console.log("'waitFor()' finished in " + (new Date().getTime() - start) + "ms.");
                    typeof(onReady) === "string" ? eval(onReady) : onReady(); //< Do what it's supposed to do once the condition is fulfilled
                    clearInterval(interval); //< Stop this interval
                }
            }
        }, 250); //< repeat check every 250ms
}
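The waitFor() helper above is a plain polling loop: every 250ms it re-runs the test function, and it either calls onReady once the test has returned true or prints a timeout message and exits once the time limit has passed. To make that pattern easier to see (and to try out without PhantomJS), here is a minimal sketch of the same idea with the PhantomJS-specific parts removed. The name waitForCondition and its separate onTimeout callback are my own hypothetical additions, not part of PhantomJS; the code sticks to ES5 and setInterval(), so it should run in Node as well as in PhantomJS.

```javascript
// Hypothetical helper: the same polling idea as waitFor(), without page.evaluate() or phantom.exit().
function waitForCondition(testFx, onReady, onTimeout, timeOutMillis, intervalMillis) {
  var maxTimeOutMillis = timeOutMillis || 3000; // same 3s default as waitFor()
  var pollMillis = intervalMillis || 250;       // same 250ms polling interval
  var start = Date.now();

  var interval = setInterval(function () {
    var elapsed = Date.now() - start;
    if (testFx()) {
      clearInterval(interval); // stop polling before handing control to the caller
      onReady(elapsed);
    } else if (elapsed >= maxTimeOutMillis) {
      clearInterval(interval);
      onTimeout(elapsed);      // the caller decides what a timeout means
    }
  }, pollMillis);
}
```

In psn.js the timeout callback would be the place to print the timeout message and call phantom.exit(1); handing that decision to the caller keeps the helper free of PhantomJS calls.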
