Commit a1db064

v1.2.5
1 parent 9584708 commit a1db064

File tree

10 files changed

+141
-83
lines changed

README.md

Lines changed: 38 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -18,8 +18,11 @@ npm i puppeteer-page-proxy
1818
#### PageProxy(pageOrReq, proxy)
1919

2020
- `pageOrReq` <[object](https://developer.mozilla.org/en-US/docs/Glossary/Object)> 'Page' or 'Request' object to set a proxy for.
21-
- `proxy` <[string](https://developer.mozilla.org/en-US/docs/Glossary/String)> Proxy to use in the current page.
21+
- `proxy` <[string](https://developer.mozilla.org/en-US/docs/Glossary/String)|[object](https://developer.mozilla.org/en-US/docs/Glossary/Object)> Proxy to use in the current page.
2222
* Begins with a protocol (e.g. http://, https://, socks://)
23+
* In the case of [proxy per request](https://github.com/Cuadrix/puppeteer-page-proxy#proxy-per-request), this can be an object with optional properties for overriding requests:\
24+
`url`, `method`, `postData`, `headers`\
25+
See [request.continue](https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#requestcontinueoverrides) for more info about the above properties.
2326

2427
#### PageProxy.lookup(page[, lookupService, isJSON, timeout])
2528

@@ -34,7 +37,7 @@ npm i puppeteer-page-proxy
3437

3538
**NOTE:** By default this method expects a response in [JSON](https://en.wikipedia.org/wiki/JSON#Example) format and [JSON.parse](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/parse)'s it to a usable javascript object. To disable this functionality, set `isJSON` to `false`.
3639

37-
## Examples
40+
## Usage
3841
#### Proxy per page:
3942
```js
4043
const puppeteer = require('puppeteer');
@@ -56,7 +59,7 @@ const useProxy = require('puppeteer-page-proxy');
5659
await page2.goto(site);
5760
})();
5861
```
59-
To remove a proxy set this way, simply pass a falsy value (e.g `null`) instead of the proxy;
62+
To remove a proxy, omit the argument or pass in a falsy value (e.g. `null`):
6063
```js
6164
await useProxy(page, null);
6265
```
@@ -74,30 +77,43 @@ const useProxy = require('puppeteer-page-proxy');
7477
const page = await browser.newPage();
7578

7679
await page.setRequestInterception(true);
77-
page.on('request', req => {
78-
useProxy(req, proxy);
80+
page.on('request', async req => {
81+
await useProxy(req, proxy);
7982
});
8083
await page.goto(site);
8184
})();
8285
```
8386
The request object itself is passed as the first argument. The proxy can now be changed every request.
84-
Leaving it as is will have the same effect as applying a proxy for the whole page by passing in the page object as an argument. Basically, the same proxy will be used for all requests within the page.
8587

86-
Using it with other interception methods is straight forward aswell:
88+
Using it along with other interception methods:
8789
```js
8890
await page.setRequestInterception(true);
89-
page.on('request', req => {
91+
page.on('request', async req => {
9092
if (req.resourceType() === 'image') {
9193
req.abort();
9294
} else {
93-
useProxy(req, proxy);
95+
await useProxy(req, proxy);
9496
}
9597
});
9698
```
97-
All requests can be handled exactly once, so it's not possible to intercept the same request after a proxy has been applied to it. This means that it will not be possible to call (e.g. [request.abort](https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#requestaborterrorcode), [request.continue](https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#requestcontinueoverrides)) on the same request without getting a *'Request is already handled!'* error message. This is because `puppeteer-page-proxy` internally calls [request.respond](https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#requestrespondresponse) which fulfills the request.
9899

99-
**NOTE:** It is necessary to set [page.setRequestInterception](https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#pagesetrequestinterceptionvalue) to true when setting proxies this way, otherwise the function will fail.
100+
Overriding requests:
101+
```js
102+
await page.setRequestInterception(true);
103+
page.on('request', async req => {
104+
await useProxy(req, {
105+
proxy: proxy,
106+
url: 'https://example.com',
107+
method: 'POST',
108+
postData: '404',
109+
headers: {
110+
accept: 'text/html'
111+
}
112+
});
113+
});
114+
```
100115

116+
**NOTE:** It is necessary to set [page.setRequestInterception](https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#pagesetrequestinterceptionvalue) to true when setting proxies per request, otherwise the function will fail.
101117

102118
#### Authentication:
103119
```js
@@ -134,16 +150,21 @@ const useProxy = require('puppeteer-page-proxy');
134150
```
135151

136152
## FAQ
137-
#### How does puppeteer-page-proxy work?
153+
#### How does this module work?
138154

139-
It takes over the task of requesting resources from the browser to instead do it internally. This means that the requests that the browser is usually supposed to make directly, are instead intercepted and made indirectly via Node using a requests library. This naturally means that Node also receives the responses that the browser would have normally received from those requests. For changing the proxy, the requests are routed through the specified proxy server using ***-proxy-agent**'s. The responses are then forwarded back to the browser as mock/simulated responses using the [request.respond](https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#requestrespondresponse) method, making the browser think that a response has been received from the server, thus fulfilling the request and rendering any content from the response onto the screen.
155+
It takes over the task of requesting content **from** the browser and does it internally via a requests library instead. Requests that are normally made by the browser are thus made by Node. The IP addresses are changed by routing the requests through the specified proxy servers using ***-proxy-agent**s. When Node gets a response back from the server, it is forwarded to the browser for completion/rendering.
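The flow described above can be sketched roughly as follows. This is a simplified illustration, not the module's actual handler: `forwardThroughNode` and `fetchViaProxy` are hypothetical names, and `fetchViaProxy` stands in for whatever HTTP client call performs the proxied request and resolves to an object with `statusCode`, `headers` and `body`.

```js
// Sketch only: `forwardThroughNode` and `fetchViaProxy` are hypothetical names.
// `req` is expected to look like an intercepted puppeteer request.
async function forwardThroughNode(req, fetchViaProxy) {
    try {
        // Node, not the browser, performs the actual network request
        const res = await fetchViaProxy(req.url(), {
            method: req.method(),
            body: req.postData(),
            headers: req.headers()
        });
        // Hand the response back to the browser for rendering
        await req.respond({
            status: res.statusCode,
            headers: res.headers,
            body: res.body
        });
    } catch (error) {
        // If the proxied request fails, the browser request is aborted
        await req.abort();
    }
}
```

Passing the HTTP client in as a parameter is only a choice made for this sketch, so that the flow can be exercised with stub objects and without a browser.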
140156

141-
#### Why does the browser show _"Your connection to this site is not secure"_ when connecting to **https** sites?
157+
#### Why am I getting _"Request is already handled!"_?
142158

143-
This is simply because the server and the browser are unable perform the secure handshakes for the connections due to the requests being intercepted and effectively blocked by Node when forwarding responses to the browser. However, despite the browser alerting of an insecure connection, the requests are infact made securely through Node as seen from the connection property of the response object:
159+
This happens when there is an attempt to handle the same request more than once. An intercepted request is handled by one of the [request.abort](https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#requestaborterrorcode), [request.continue](https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#requestcontinueoverrides) or [request.respond](https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#requestrespondresponse) methods. Each of these methods 'sends' the request to its destination. A request that has already reached its destination cannot be intercepted or handled again.
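One way to avoid handling a request twice is to track which requests have already been passed to a handler. The sketch below is not part of the module; `handleOnce` is a hypothetical helper, and it assumes every handler for a given request goes through it.

```js
// Hypothetical guard: remembers which request objects were already handled.
const handled = new WeakSet();

async function handleOnce(req, handler) {
    if (handled.has(req))
        return false;      // someone already aborted/continued/responded
    handled.add(req);      // mark before awaiting so concurrent callers are skipped
    await handler(req);
    return true;
}
```

A listener could then call, for example, `await handleOnce(req, r => r.abort())` for images and `await handleOnce(req, r => useProxy(r, proxy))` for everything else; the second call for the same request returns `false` instead of raising the error.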
144160

145161

146-
```
162+
#### Why does the browser show _"Your connection to this site is not secure"_?
163+
164+
Because direct requests from the browser to the server are being intercepted by Node, making the establishment of a secure connection between them impossible. However, the requests aren't made by the browser, they are made by Node. All `https` requests made through Node using this module are secure. This is evidenced by the connection property of the response object:
165+
166+
167+
```json
147168
connection: TLSSocket {
148169
_tlsOptions: {
149170
secureContext: [SecureContext],
@@ -155,7 +176,7 @@ connection: TLSSocket {
155176
encrypted: true,
156177
}
157178
```
158-
While a proxy is applied, the browser is just an empty drawing board used for rendering content on the screen. All the network requests and responses, both secure and non-secure, are made by Node. Because of this, it makes no difference whether the site in the browser is shown as insecure or not.
179+
You can think of the warning as a false positive.
159180

160181
## Dependencies
161182
- [Got](https://github.com/sindresorhus/got)

changelog.md

Lines changed: 4 additions & 0 deletions
@@ -1,4 +1,8 @@
11
# Change log
2+
### [1.2.5] - 2020-05-21
3+
#### Changes
4+
- Added ability to override requests
5+
- Increase redirect restriction ([#17](https://github.com/Cuadrix/puppeteer-page-proxy/issues/17))
26
### [1.2.4] - 2020-05-18
37
#### Changes
48
- Fix 'net::ERR_FAILED' by updating package to work with latest Got ([#16](https://github.com/Cuadrix/puppeteer-page-proxy/issues/16), [#14](https://github.com/Cuadrix/puppeteer-page-proxy/issues/14))

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
11
{
22
"name": "puppeteer-page-proxy",
33
"description": "Additional Node.js module to use with 'puppeteer' for setting proxies per page basis.",
4-
"version": "1.2.4",
4+
"version": "1.2.5",
55
"author": "Cuadrix <[email protected]> (https://github.com/Cuadrix)",
66
"homepage": "https://github.com/Cuadrix/puppeteer-page-proxy",
77
"main": "./src/index.js",

src/core/lookup.js

Lines changed: 1 addition & 2 deletions
@@ -30,9 +30,8 @@ const lookup = async (page, lookupService = "https://api.ipify.org?format=json",
3030
}
3131
return await XMLHttpRequest();
3232
} catch(error) {
33-
if (error.message === "Execution context was destroyed, most likely because of a navigation." || error.message === "Execution context was destroyed.") {
33+
if (error.message === "Execution context was destroyed, most likely because of a navigation." || error.message === "Execution context was destroyed.")
3434
return await XMLHttpRequest();
35-
}
3635
}
3736
};
3837
module.exports = lookup;

src/core/proxy.js

Lines changed: 36 additions & 28 deletions
@@ -1,56 +1,64 @@
1-
const {setHeaders, setAgent, request} = require("../lib/request");
2-
const cookies = require("../lib/cookies");
1+
const request = require("got");
2+
const {type} = require("../lib/types");
3+
const {getCookies, cookieStore} = require("../lib/cookies");
4+
const {setOverrides, setHeaders, setAgent} = require("../lib/options");
35

4-
const pageProxy = async (param, proxy) => {
5-
let page, req;
6-
if (param.constructor.name === "Request") {
7-
req = param;
8-
} else if (param.constructor.name === "Page") {
9-
page = param;
10-
await page.setRequestInterception(true);
11-
}
12-
// Responsible for forward requesting using proxy
13-
const $puppeteerPageProxyHandler = async req => {
6+
const useProxy = async (target, proxy) => {
7+
// Listener responsible for applying proxy
8+
const $puppeteerPageProxyHandler = async req => {
149
endpoint = req._client._connection._url;
1510
targetId = req._frame._id;
16-
const cookieJar = cookies.store(
17-
await cookies.get(endpoint, targetId)
11+
const cookieJar = cookieStore(
12+
await getCookies(endpoint, targetId)
1813
);
1914
const options = {
2015
cookieJar,
2116
method: req.method(),
22-
responseType: "buffer",
23-
headers: setHeaders(req),
2417
body: req.postData(),
25-
followRedirect: false,
18+
headers: setHeaders(req),
19+
agent: setAgent(proxy),
20+
responseType: "buffer",
2621
throwHttpErrors: false
2722
};
2823
try {
29-
options.agent = setAgent(req.url(), proxy);
3024
const res = await request(req.url(), options);
31-
await req.respond(res);
25+
await req.respond({
26+
status: res.statusCode,
27+
headers: res.headers,
28+
body: res.body
29+
});
3230
} catch(error) {
3331
await req.abort();
3432
}
3533
};
3634
// Remove existing listener for reassigning proxy of current page
37-
const removeRequestListener = () => {
35+
const removeRequestListener = (page, listenerName) => {
3836
const listeners = page.listeners("request");
3937
for (let i = 0; i < listeners.length; i++) {
40-
if (listeners[i].name === "$puppeteerPageProxyHandler") {
38+
if (listeners[i].name === listenerName) {
4139
page.removeListener("request", listeners[i]);
4240
}
4341
}
4442
};
45-
if (req) {
46-
$puppeteerPageProxyHandler(req);
47-
} else {
48-
removeRequestListener();
43+
// Proxy per request
44+
if (target.constructor.name === "Request") {
45+
if (type(proxy) == "object") {
46+
target = setOverrides(target, proxy);
47+
proxy = proxy.proxy;
48+
}
49+
await $puppeteerPageProxyHandler(target);
50+
// Page-wide proxy
51+
} else if (target.constructor.name === "Page") {
52+
if (type(proxy) == "object") {
53+
proxy = proxy.proxy;
54+
}
55+
await target.setRequestInterception(true);
56+
removeRequestListener(target, "$puppeteerPageProxyHandler");
4957
if (proxy) {
50-
page.on("request", $puppeteerPageProxyHandler);
58+
target.on("request", $puppeteerPageProxyHandler);
5159
} else {
52-
await page.setRequestInterception(false);
60+
await target.setRequestInterception(false);
5361
}
5462
}
5563
};
56-
module.exports = pageProxy;
64+
module.exports = useProxy;
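The listener-removal step in the diff above matches listeners by function name rather than by reference, presumably because each call to `useProxy` creates a fresh handler function. The following sketch reproduces that idea on a plain Node `EventEmitter`, used here as a stand-in for a puppeteer `Page`; `removeListenersByName` and `otherHandler` are illustrative names.

```js
const {EventEmitter} = require("events");

// Remove every listener of `eventName` whose function name matches `listenerName`
const removeListenersByName = (emitter, eventName, listenerName) => {
    // listeners() returns a copy, so removing while iterating is safe
    for (const listener of emitter.listeners(eventName)) {
        if (listener.name === listenerName)
            emitter.removeListener(eventName, listener);
    }
};

const page = new EventEmitter(); // stand-in for a puppeteer Page
const $puppeteerPageProxyHandler = async req => {};
const otherHandler = req => {};

page.on("request", $puppeteerPageProxyHandler);
page.on("request", otherHandler);

removeListenersByName(page, "request", "$puppeteerPageProxyHandler");
console.log(page.listenerCount("request")); // 1
```

Only the proxy handler is removed, so request listeners registered by the user stay in place.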

src/index.d.ts

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ export = puppeteer_page_proxy;
1111
* @param page 'Page' or 'Request' object to set a proxy for.
1212
* @param proxy Proxy to use in the current page. Must begin with a protocol e.g. **http://**, **https://**, **socks://**.
1313
*/
14-
declare function puppeteer_page_proxy(page: object, proxy: string): Promise<any>;
14+
declare function puppeteer_page_proxy(page: object, proxy: string | object): Promise<any>;
1515
declare namespace puppeteer_page_proxy {
1616
/**
1717
* **Request data from a lookupservice.**

src/lib/cdp.js

Lines changed: 2 additions & 6 deletions
@@ -21,9 +21,7 @@ const cdp = {
2121
flatten: true,
2222
}
2323
})).result;
24-
if (result) {
25-
return result.sessionId;
26-
}
24+
return (result) ? result.sessionId : undefined;
2725
}
2826
},
2927
Network: {
@@ -33,9 +31,7 @@ const cdp = {
3331
id: 2,
3432
method: "Network.getCookies"
3533
})).result;
36-
if (result) {
37-
return result.cookies;
38-
}
34+
return (result) ? result.cookies : undefined;
3935
}
4036
}
4137
};

src/lib/cookies.js

Lines changed: 3 additions & 4 deletions
@@ -3,7 +3,7 @@ const {CookieJar} = require("tough-cookie");
33
const {Target, Network} = require("./cdp");
44

55
const cookies = {
6-
async get(endpoint, targetId) {
6+
async getCookies(endpoint, targetId) {
77
const ws = new WebSocket(endpoint, {
88
perMessageDeflate: false,
99
maxPayload: 180 * 4096 // 737,280 bytes (about 0.74 MB)
@@ -13,10 +13,9 @@ const cookies = {
1313
const sessionId = await Target.attachToTarget(ws, targetId);
1414
return await Network.getCookies(ws, sessionId);
1515
},
16-
store(cookies) {
17-
if (!cookies) {
16+
cookieStore(cookies) {
17+
if (!cookies)
1818
return;
19-
}
2019
return CookieJar.deserializeSync({
2120
version: '[email protected]',
2221
storeType: 'MemoryCookieStore',
Lines changed: 24 additions & 24 deletions
@@ -1,16 +1,29 @@
1-
const got = require("got");
21
const HttpProxyAgent = require("http-proxy-agent");
32
const HttpsProxyAgent = require("https-proxy-agent");
43
const SocksProxyAgent = require("socks-proxy-agent");
54

6-
const request = {
5+
const options = {
6+
setOverrides(req, overrides) {
7+
const map = {url: true, method: true, postData: true, headers: true};
8+
for (const key in overrides) {
9+
if (key == "headers")
10+
req.$headers = true
11+
if (map[key])
12+
req[key] = () => overrides[key];
13+
}
14+
return req;
15+
},
716
setHeaders(req) {
17+
// If headers have been overridden
18+
if (req.$headers)
19+
return req.headers();
20+
// Extended default headers
821
const headers = {
922
...req.headers(),
1023
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
1124
"accept-encoding": "gzip, deflate, br",
1225
"host": new URL(req.url()).hostname
13-
};
26+
}
1427
if (req.isNavigationRequest()) {
1528
headers["sec-fetch-mode"] = "navigate";
1629
headers["sec-fetch-site"] = "none";
@@ -21,30 +34,17 @@ const request = {
2134
}
2235
return headers;
2336
},
24-
setAgent(url, proxy) {
25-
if (proxy.startsWith("socks")) {
37+
// For applying proxy
38+
setAgent(proxy) {
39+
if (proxy.startsWith("socks"))
2640
return {
2741
http: new SocksProxyAgent(proxy),
2842
https: new SocksProxyAgent(proxy)
29-
}
30-
} else {
31-
return {
32-
http: new HttpProxyAgent(proxy),
33-
https: new HttpsProxyAgent(proxy)
34-
}
35-
}
36-
},
37-
async request(url, options) {
38-
try {
39-
const res = await got(url, options);
40-
return {
41-
status: res.statusCode,
42-
headers: res.headers,
43-
body: res.body
4443
};
45-
} catch(error) {
46-
throw new Error(error);
47-
}
44+
return {
45+
http: new HttpProxyAgent(proxy),
46+
https: new HttpsProxyAgent(proxy)
47+
};
4848
}
4949
};
50-
module.exports = request;
50+
module.exports = options;
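To see what `setOverrides` does in isolation, the function from the diff above can be run against a plain object. In this sketch `fakeRequest` is a hypothetical stand-in for a puppeteer request, and the proxy address is a placeholder.

```js
// Same logic as setOverrides in the diff above, extracted as a standalone function
const setOverrides = (req, overrides) => {
    const map = {url: true, method: true, postData: true, headers: true};
    for (const key in overrides) {
        if (key == "headers")
            req.$headers = true; // tells setHeaders to skip the default headers
        if (map[key])
            req[key] = () => overrides[key]; // replace the accessor method
    }
    return req;
};

// Hypothetical stand-in for a puppeteer request object
const fakeRequest = {
    url: () => "https://original.example/",
    method: () => "GET",
    postData: () => undefined,
    headers: () => ({accept: "*/*"})
};

const overridden = setOverrides(fakeRequest, {
    proxy: "http://127.0.0.1:8080", // not in the whitelist, so it is ignored here
    method: "POST",
    headers: {accept: "text/html"}
});

console.log(overridden.method());     // POST
console.log(overridden.url());        // https://original.example/
console.log(typeof overridden.proxy); // undefined
console.log(overridden.$headers);     // true
```

Only the four whitelisted keys replace accessors on the request; the `proxy` key is left for the caller to read from the original options object.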
