[Bug] Stealth plugin's iframe.contentWindow deletes the entire DOM and crashes site in an unrecoverable error #909

Open
AbraarArique opened this issue Aug 21, 2024 · 2 comments
Labels
issue: bug report A bug has been reported needs triage

@AbraarArique

Describe the bug

When visiting certain pages (see full code example below), puppeteer-extra-plugin-stealth's iframe.contentWindow evasion interacts with the site's JavaScript to cause the entire DOM/HTML to go blank or get deleted.

No direct errors are thrown, which made it very difficult to pinpoint the cause of this issue when I first encountered it.

But after tweaking many Puppeteer settings, I found that puppeteer-extra-plugin-stealth is causing it.

Then I was able to track it down to a specific evasion: iframe.contentWindow.
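
For anyone who wants to repeat this narrowing-down on another site, here is a minimal sketch of the idea. `findBreakingEvasions` and `probe` are hypothetical names, not part of puppeteer-extra: `probe` is assumed to be a function you write yourself that launches a browser with only the given evasions enabled and resolves to `true` when `document.body` is still present after navigation.

```javascript
// Hypothetical helper (not part of puppeteer-extra): enable one evasion at a
// time and collect the ones for which the probe reports a broken page.
// `probe` is an assumed caller-supplied async function that receives a Set of
// evasion names and resolves to true when the page is still intact.
async function findBreakingEvasions(evasionNames, probe) {
  const breaking = [];
  for (const name of evasionNames) {
    let ok;
    try {
      ok = await probe(new Set([name]));
    } catch {
      // A probe that throws is treated the same as a broken page.
      ok = false;
    }
    if (!ok) breaking.push(name);
  }
  return breaking;
}
```

Testing one evasion at a time will miss breakage that only appears when two evasions interact, so this is only a first pass. Each probe run should probably start from a fresh plugin instance or process so that earlier evasions do not leak into later runs.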

Code Snippet

import pptr from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';

const puppeteer = pptr.default;

const evasions = new Set([
  'chrome.app',
  'chrome.csi',
  'chrome.loadTimes',
  'chrome.runtime',
  'defaultArgs',
  'iframe.contentWindow',
  'media.codecs',
  'navigator.hardwareConcurrency',
  'navigator.languages',
  'navigator.permissions',
  'navigator.plugins',
  'navigator.webdriver',
  'sourceurl',
  'user-agent-override',
  'webgl.vendor',
  'window.outerdimensions',
]);
puppeteer.use(StealthPlugin({ enabledEvasions: evasions }));

const browser = await puppeteer.launch();
const page = await browser.newPage();

// This try/catch is needed because this site will often exceed the default 30s timeout
try {
  await page.goto(
    'https://variety.com/2024/film/global/johnny-depp-modi-premiere-san-sebastian-film-festival-1236111999/',
  );
} catch {}

const body = await page.evaluate(() => document.body.outerHTML.slice(0, 50));
console.log(body);

When I run this code, this error is thrown:

node:internal/process/esm_loader:34
      internalBinding('errors').triggerUncaughtException(
                                ^

Error [TypeError]: Cannot read properties of null (reading 'outerHTML')
    at evaluate (evaluate at file:///Users/abraar/Documents/monorepo/packages/ai-dataset-bot/src/run.ts:61:1242, <anonymous>:0:19)
    at ExecutionContext.#evaluate (/Users/abraar/Documents/monorepo/node_modules/.pnpm/puppeteer-core@22.6.5/node_modules/puppeteer-core/src/cdp/ExecutionContext.ts:304:34)
    at ExecutionContext.evaluate (/Users/abraar/Documents/monorepo/node_modules/.pnpm/puppeteer-core@22.6.5/node_modules/puppeteer-core/src/cdp/ExecutionContext.ts:157:12)
    at IsolatedWorld.evaluate (/Users/abraar/Documents/monorepo/node_modules/.pnpm/puppeteer-core@22.6.5/node_modules/puppeteer-core/src/cdp/IsolatedWorld.ts:143:12)
    at CdpFrame.evaluate (/Users/abraar/Documents/monorepo/node_modules/.pnpm/puppeteer-core@22.6.5/node_modules/puppeteer-core/src/api/Frame.ts:470:12)
    at CdpPage.evaluate (/Users/abraar/Documents/monorepo/node_modules/.pnpm/puppeteer-core@22.6.5/node_modules/puppeteer-core/src/api/Page.ts:2190:12)
    at <anonymous> (/Users/abraar/Documents/monorepo/packages/ai-dataset-bot/src/run.ts:488:14)

Node.js v20.11.1

The TypeError occurs because document.body is null at the time page.evaluate runs.

In fact, if you turn off headless and inspect visually, you'll first see the site load normally, but then everything on the page goes blank, and all HTML elements inside Chrome DevTools have disappeared.
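
Since the failure shows up only as a null `document.body`, a guard makes the symptom easier to detect without crashing the script. This is a sketch, not a fix for the underlying problem; `bodyPreview` is a hypothetical helper name, and it takes the document as a parameter so it can be exercised outside a browser.

```javascript
// Hypothetical helper: return the first `length` characters of the body's
// outerHTML, or null when the document has no body (the symptom in this issue).
function bodyPreview(doc, length = 50) {
  if (!doc || !doc.body) return null;
  return doc.body.outerHTML.slice(0, length);
}

// Inside the script from this issue, the same guard can be inlined, because
// the function passed to page.evaluate runs in the page and cannot see
// helpers defined in Node:
//
//   const body = await page.evaluate(() =>
//     document.body ? document.body.outerHTML.slice(0, 50) : null,
//   );
//   if (body === null) console.warn('document.body is null: DOM was wiped');
```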

But if you comment out iframe.contentWindow from the list of evasions, it works properly:

<body class="home blog pmc-gallery__ pmc-desktop p
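
Instead of commenting the entry out by hand, the set can be built by exclusion. A minimal sketch: `ALL_EVASIONS` is simply the list copied from the snippet in this issue, and `evasionsWithout` is a hypothetical helper, not something the stealth plugin provides.

```javascript
// The evasion names used in the snippet from this issue.
const ALL_EVASIONS = [
  'chrome.app',
  'chrome.csi',
  'chrome.loadTimes',
  'chrome.runtime',
  'defaultArgs',
  'iframe.contentWindow',
  'media.codecs',
  'navigator.hardwareConcurrency',
  'navigator.languages',
  'navigator.permissions',
  'navigator.plugins',
  'navigator.webdriver',
  'sourceurl',
  'user-agent-override',
  'webgl.vendor',
  'window.outerdimensions',
];

// Hypothetical helper: return a Set of all evasions except the excluded names.
function evasionsWithout(...excluded) {
  return new Set(ALL_EVASIONS.filter((name) => !excluded.includes(name)));
}

// Intended use with the snippet from this issue:
//   puppeteer.use(
//     StealthPlugin({ enabledEvasions: evasionsWithout('iframe.contentWindow') }),
//   );
```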

For the particular site above, I also discovered that if you block all requests that include the path pmc-plugins in DevTools (this is a WordPress plugin), the above error doesn't occur.
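
The same blocking can be reproduced from the script with Puppeteer's request interception rather than DevTools. A sketch under the assumption that matching on the URL substring `pmc-plugins` is enough; `makeBlockingHandler` is a hypothetical helper name.

```javascript
// Hypothetical helper: build a 'request' event handler that aborts any request
// whose URL contains one of the given fragments and lets everything else through.
function makeBlockingHandler(blockedFragments) {
  return (request) => {
    const url = request.url();
    if (blockedFragments.some((fragment) => url.includes(fragment))) {
      return request.abort();
    }
    return request.continue();
  };
}

// Intended use, before page.goto(...):
//   await page.setRequestInterception(true);
//   page.on('request', makeBlockingHandler(['pmc-plugins']));
```

This only works around the symptom on this one site; it does not explain why the evasion and the blocked scripts conflict.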

So I'm guessing that stealth plugin's iframe.contentWindow somehow messes up this site's internal JS scripts.

This webpage also seems to contain <iframe> tags with a srcdoc attribute, which may be relevant to this issue.

I've also experienced this issue on other sites run by the same organization, such as https://www.billboard.com/

Versions

  System:
    OS: macOS 12.7.4
    CPU: (4) x64 Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
    Memory: 786.00 MB / 8.00 GB
    Shell: 5.8.1 - /bin/zsh
  Binaries:
    Node: 20.11.1 - /usr/local/bin/node
    Yarn: 1.22.19 - ~/.yarn/bin/yarn
    npm: 10.2.4 - /usr/local/bin/npm
    pnpm: 9.5.0 - /usr/local/bin/pnpm
  npmPackages:
    puppeteer-core: ^22.6.5 => 22.6.5 
    puppeteer-extra: ^3.3.6 => 3.3.6 
    puppeteer-extra-plugin-adblocker: ^2.13.6 => 2.13.6 
    puppeteer-extra-plugin-stealth: ^2.11.2 => 2.11.2 
@AbraarArique AbraarArique added issue: bug report A bug has been reported needs triage labels Aug 21, 2024
@dannyokec

Have you been able to resolve this, or should we look into it?

@vladtreny

Just remove iframe.contentWindow from the enabled evasions. This works:

import pptr from 'puppeteer-extra'
import StealthPlugin from 'puppeteer-extra-plugin-stealth'

const puppeteer = pptr.default

const evasions = new Set([
  'chrome.app',
  'chrome.csi',
  'chrome.loadTimes',
  'chrome.runtime',
  'defaultArgs',
  // 'iframe.contentWindow',  // disabled: this evasion blanks the DOM on this site
  'media.codecs',
  'navigator.hardwareConcurrency',
  'navigator.languages',
  'navigator.permissions',
  'navigator.plugins',
  'navigator.webdriver',
  'sourceurl',
  'user-agent-override',
  'webgl.vendor',
  'window.outerdimensions'
])
console.log('here')
puppeteer.use(StealthPlugin({enabledEvasions: evasions}))

const browser = await puppeteer.launch({
  userDataDir: '_',
  executablePath: '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
  headless: false  // run headful so the page can be inspected visually
})
const page = await browser.newPage()

// This try/catch is needed because this site will often exceed the navigation timeout (5s here)
try {
  await page.goto(
    'https://variety.com/2024/film/global/johnny-depp-modi-premiere-san-sebastian-film-festival-1236111999/',
    {timeout: 5_000}
  )
} catch {}

const body = await page.evaluate(() => document.body.outerHTML.slice(0, 50))
console.log(body)
