Merge pull request #158 from keneanung/GetGemxchangeFromApiv2
Get the Gem Exchange data from the official GW2 API
rubensayshi committed Nov 24, 2014
2 parents beefb78 + 8204dd4 commit d4a7e19
Showing 13 changed files with 88 additions and 590 deletions.
91 changes: 62 additions & 29 deletions README.md
You will need the pear library Log
pear channel-discover pear.phing.info
pear install phing/phing
pear install Log

Node.js and grunt
-----------------
While Node.js and grunt are not directly needed for running gw2spidy, grunt is needed to build the js and css files that are
served with the web pages. That means you don't need node on your webserver, but you can do the grunt tasks on any other machine
as long as you copy the files in `webroot/assets/compiled` over to the same directory on the web server.

Install Node.js via the usual installation mechanism for your OS. Afterwards run `npm install -g grunt` to globally install
grunt on the machine.


Project Setup
=============

RequestSlots
------------
ArenaNet is okay with me doing this, but nonetheless I want to limit the number of requests I'm sending to their website, or at
least spread them out a bit.
I came up with the concept of 'request slots': I set up a fixed number of slots, claim one when I do a request, and then give
the slot a cooldown before I can use it again.
That way I can control the flood a bit better; on the technical side this is done using Redis sorted sets.
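The mechanics can be sketched in a few lines. This is a simplified, in-memory stand-in for the Redis sorted set; the slot count and cooldown below are illustrative, the real values come from the config:

```python
import time

SLOT_COUNT = 100   # illustrative; the real count comes from the config
COOLDOWN = 30      # seconds; illustrative

# value = timestamp at which the slot becomes usable again
# (in Redis this is a sorted set where the score is that timestamp)
slots = {f"slot-{i}": 0.0 for i in range(SLOT_COUNT)}

def claim_slot(now=None):
    """Return a free slot and put it on cooldown, or None if all are cooling down."""
    now = time.time() if now is None else now
    for name, ready_at in slots.items():
        if ready_at <= now:
            slots[name] = now + COOLDOWN
            return name
    return None
```

In Redis terms, `ZRANGEBYSCORE` finds a slot whose score is in the past and `ZADD` puts it back on cooldown; a worker that gets no slot just sleeps and retries, which is the behaviour described under Running The Workers below.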

Background Queues
-----------------
All crawling work is done in background processes / daemons, and the heavy lifting is done with a few queues; the queue
processes also handle the previously mentioned request slots.
This is also done using Redis sorted sets.

Config / Env
------------
Think of a name that represents your machine / env, e.g. *ruben-vm1* or *gw2spidy-prod-vps2*.
Copy `config/cnf/example-custom-cnf.json` to `config/cnf/<your-chosen-name>.json` and edit it to set the values for
*auth_email* and *auth_password*.

Copy `config/cnf/example-env` to `config/cnf/env` and edit it; it contains one line for each config file it should load from
`config/cnf/<name>.json`.
Replace the first line (*ruben-vm1*) with the name you chose previously, and leave *dev* and *default*; those are other
config files it should load too (or change *dev* to *prod* if you don't want debug mode).

The config files you specify in `config/cnf/env` will be loaded (in reverse order), each overwriting the values of the previous
ones.
For overloading other config values (like database login etc.), check `config/cnf/default.json` for all the options you could
also set in your custom config file.
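As a sketch of that layering (the file names and values here are hypothetical, mirroring the example files):

```python
import json

def load_config(names, read_file):
    """Merge the config files listed in config/cnf/env.
    Loading happens in reverse order, so the file named on the
    first line of env is loaded last and wins.
    `read_file` maps a name to the raw JSON of config/cnf/<name>.json."""
    merged = {}
    for name in reversed(names):      # default.json first, custom file last
        merged.update(json.loads(read_file(name)))
    return merged

# hypothetical file contents, in the order they appear in config/cnf/env
files = {
    "ruben-vm1": '{"auth_email": "me@example.com"}',
    "dev":       '{"debug": true}',
    "default":   '{"debug": false, "auth_email": null}',
}
cnf = load_config(["ruben-vm1", "dev", "default"], files.__getitem__)
```

The custom file only needs the keys it actually overrides; everything else falls through to `dev.json` and `default.json`.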

**The `config/cnf/env` file and any `config/cnf/*.json` other than `default.json`, `dev.json` and `prod.json` are on .gitignore
so they won't be version controlled.**

Database Setup
--------------
In the `config` folder there's a `config/schema.sql` (generated by propel based on `config/schema.xml`, so database changes
should be made to the XML and the SQL file then regenerated!).
You should create a database called 'gw2spidy' and load `config/schema.sql` into it. Afterwards, import
`config/itemTypesAndDisciplines.sql` to get certain stable Disciplines and item types.
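For example, with a local MySQL server (the root credentials here are an assumption; use whatever account you configured):

```shell
# create the database, load the schema, then the stable Disciplines / item types
mysqladmin -u root -p create gw2spidy
mysql -u root -p gw2spidy < config/schema.sql
mysql -u root -p gw2spidy < config/itemTypesAndDisciplines.sql
```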

RequestSlots Setup
------------------
Run `tools/setup-request-slots.php` to create the initial request slots. You can also run this during development to
reinitialize the slots, so you can instantly use them again if they are all on cooldown.

Building The Item Database
--------------------------

The scripts described below are called by the script `bin/rebuild-items-recpipes.sh`.

To build the item database, you want to run `tools/update-items-from-api.php`. This gives you all known items in the game and
creates new types and subtypes on the fly.

Afterwards you may want to run the script nightly to keep up to date with known items.
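For example, a nightly cron entry could look like this (the paths and user are assumptions; adjust them to your setup):

```shell
# /etc/cron.d/gw2spidy: refresh the item database every night at 04:00
0 4 * * * www-data cd /var/www/gw2spidy && php tools/update-items-from-api.php >> /var/log/gw2spidy/update-items.log 2>&1
```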

If you want or need the recipe data in the database, you also need to run
`php tools/create-recipe-map /some/place/on/harddrive`. After this is complete, you also have to import the map with
`php tools/import-recipe-map /some/place/on/harddrive`.

Creating the web assets
-----------------------
gw2spidy serves the js and css files as single combined files, minified depending on the configuration. To build these files,
you simply need to run `grunt`.

Crawling The Tradingpost
========================

ItemListingDB Worker
--------------------
The ItemListingDB Worker itself is this script: `daemons/worker-queue-item-listing-db.php`.
It will pop items off the listing queue and process them; these queue items are automatically re-queued with their priority, so
you should only have to run `daemons/fill-queue-item-listing-db.php` once to get the initial batch in.
Since the v2/commerce APIs are enabled, the worker uses the v2/commerce/listings endpoint to process the configured
'items-per-request' number of items at a time (max 250!).
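The batching boils down to chunking the queued item ids and building one listings URL per chunk, roughly like this (a sketch; the function name is made up, only the endpoint and the 250 cap come from the text above):

```python
def listing_urls(item_ids, items_per_request=250):
    """Chunk item ids and build one v2/commerce/listings URL per chunk.
    250 is the per-request maximum mentioned above; the configured
    'items-per-request' value must not exceed it."""
    base = "https://api.guildwars2.com/v2/commerce/listings?ids="
    for i in range(0, len(item_ids), items_per_request):
        chunk = item_ids[i:i + items_per_request]
        yield base + ",".join(str(id_) for id_ in chunk)
```

Each URL then costs a single request slot instead of one slot per item.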

However, if the script fails we might sometimes lose a queue item, or new items might be added to the database at some point, so
there's a `daemons/supervise-queue-item-listing-db.php` script which makes sure that the queue is still filled properly.

There's a priority system in place so that some items (like weapons above a certain rarity / level) are processed more often
than others (like salvage kits, which nobody buys from the TP ...).
See the Priority System section below for more info on that!


Gem Worker
----------
The `daemons/worker-gem.php` script does 2 requests to the gem-exchange GW2-API to retrieve the exchange rates and volume, and
then sleeps for 180 seconds (3 minutes).
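A cycle of that worker can be sketched like this; the official exchange endpoints are `v2/commerce/exchange/gems` and `v2/commerce/exchange/coins`, while the quantities and function names here are illustrative:

```python
import json
import time
from urllib.request import urlopen

API = "https://api.guildwars2.com/v2/commerce/exchange"

def exchange_urls(coins=100 * 100 * 100, gems=100):
    """The worker's two requests per cycle: gems->coins rate and coins->gems rate.
    The quantities are illustrative, not the worker's actual values."""
    return (f"{API}/gems?quantity={gems}",
            f"{API}/coins?quantity={coins}")

def run_once():
    # one worker cycle: two API calls, then the 180 second sleep described above
    for url in exchange_urls():
        rate = json.load(urlopen(url))
        print(url, rate)
    time.sleep(180)
```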

Running The Workers
-------------------
The workers all do 100 loops of their specific task; if there are no tasks, they do short sleeps while waiting for one.
They will also sleep if there are no slots available.

Previously I used to run 4 workers in parallel using
`while [ true ]; do php daemons/worker-queue-item-listing-db.php >> /var/log/gw2spidy/worker.1.log; echo "restart"; done;`
I replaced the .1 with the worker's number so I got 4 logs to tail.

I have now added some bash scripts in the `bin` folder, `bin/start-workers.sh <num-listing-workers> <num-gem-workers>`
and `bin/stop-workers.sh <now>`, to manage them.
You should check the bash scripts and understand them before running them imo ;) but you could also trust me on my blue eyes
and just run it xD

Priority System
===============
The amount of requests we can do is limited by our request-slot system; unfortunately we're now bound to doing 1 item per
request (previously we could combine up to 250).
So I created a priority system to process 'important' items more often. In this spreadsheet I calculated the priorities:
https://docs.google.com/a/rubensayshi.com/spreadsheet/ccc?key=0Alq65aekWXJmdGotSmdBYXJPZ0NKbHBhdzVZMlh5Q1E#gid=0

**This has been changed slightly; I need to update the spreadsheet and will write more here soon.**
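The core idea can be sketched with an in-memory stand-in for the Redis sorted set: the score is the time an item is next due, and a higher priority shortens the re-queue interval (the interval formula here is illustrative, not the spreadsheet's):

```python
import heapq

def schedule(items, now=0, base_interval=60):
    """Build (due_time, item_id) pairs: higher priority => shorter interval,
    so important items come around more often. In the real thing this is a
    Redis sorted set with the due time as the score."""
    return [(now + base_interval / prio, item_id) for item_id, prio in items]

def pop_due(queue, now):
    """Pop the item with the lowest due time if it is due, else None."""
    heapq.heapify(queue)
    if queue and queue[0][0] <= now:
        return heapq.heappop(queue)[1]
    return None
```

After an item is processed it is re-queued with a fresh due time, which is why the fill script only needs to run once.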

GW2 Sessions (obsolete)
=======================
When spidering we used to access the tradingpost using a session created by logging into accounts.guildwars2.com.
After logging in it gives us a session_key that allows access to the tradingpost, though limited to only being able to get the
lists of items!

12 changes: 2 additions & 10 deletions bin/start-workers.sh
#!/bin/bash

LISTING_CNT=$1
GEM_CNT=$2
LOGDIR="/var/log/gw2spidy"
PIDDIR="/var/run/gw2spidy"

sudo chmod -R 0777 $PIDDIR
if [[ -z "${LISTING_CNT}" ]]; then
LISTING_CNT=1
fi
if [[ -z "${GEM_CNT}" ]]; then
GEM_CNT=1
fi
for ((i = 0; i < LISTING_CNT; i++)); do
start_worker "worker-queue-item-listing-db" $i
done

for ((i = 0; i < GEM_CNT; i++)); do
start_worker "worker-gem" $i
done

5 changes: 1 addition & 4 deletions config/cnf/default.json
cnf = {
"count" : 100,
"cooldown" : 30
},
"gw2api_url" : "https://api.guildwars2.com",
"gw2render_url" : "https://render.guildwars2.com",
"auth_app" : null,
"use_shroud_magic" : false,
"use_samuirai_magic" : false,
"save_listing_from_item_data" : true,
"items-per-request" : 200
},

/*
47 changes: 0 additions & 47 deletions controllers/other.php

use GW2Spidy\DB\UserQuery;

use GW2Spidy\NewQueue\RequestSlotManager;
use GW2Spidy\NewQueue\QueueHelper;

use Symfony\Component\HttpFoundation\Request;

use GW2Spidy\DB\ItemQuery;


})
->bind('admin_session');


/**
* ----------------------
* route /admin/password
Expand Down
18 changes: 0 additions & 18 deletions daemons/worker-gem.php
<?php

use GW2Spidy\DB\GoldToGemRate;
use GW2Spidy\DB\GoldToGemRateQuery;


$slotManager = RequestSlotManager::getInstance();


/*
* $run up to $max in 1 process, then exit so process gets revived
* this is to avoid any memory problems (propel keeps a lot of stuff in memory)
Expand Down
29 changes: 16 additions & 13 deletions src/GW2Spidy/BaseSpider.php

namespace GW2Spidy;

use Exception;
use GW2Spidy\Util\CurlRequest;
use GW2Spidy\Util\Singleton;

abstract class BaseSpider extends Singleton {
    /**
     * @param $url string The url for this API call.
     * @return mixed[] The API answer parsed as an array.
     * @throws Exception Thrown if a non-200 HTTP code was returned.
     */
    protected function getApiData($url)
    {
        $curl = CurlRequest::newInstance($url)
            ->exec();

        if ($curl->getInfo("http_code") != 200) {
            throw new Exception("Failed to retrieve API data.");
        }

        $data = json_decode($curl->getResponseBody(), true);
        return $data;
    }
}

