@@ -198,7 +198,8 @@ ZYTE_API_MAX_REQUESTS
Default: ``None``

When set to an integer value > 0, the spider will close when the number of Zyte
- API requests reaches it.
+ API requests reaches it, with ``closespider_max_zapi_requests`` as the close
+ reason.

Note that requests with error responses that cannot be retried or exceed their
retry limit also count here.
@@ -246,6 +247,261 @@ subclass.
See :ref:`retry`.


+ .. setting:: ZYTE_API_SESSION_CHECKER
+
+ ZYTE_API_SESSION_CHECKER
+ ========================
+
+ Default: ``None``
+
+ A :ref:`Scrapy component <topics-components>` (or its import path as a string)
+ that defines a ``check`` method.
+
+ If ``check`` returns ``True``, the response session is considered valid; if
+ ``check`` returns ``False``, the response session is considered invalid, and
+ will be discarded. ``check`` can also raise a
+ :exc:`~scrapy.exceptions.CloseSpider` exception to close the spider.
+
+ If defined, the ``check`` method is called on every response that is using a
+ :ref:`session managed by scrapy-zyte-api <session>`. If not defined, the
+ default implementation checks the outcome of the ``setLocation`` action if
+ session initialization was location-based, as described in
+ :ref:`session-check`.
+
+ Example:
+
+ .. code-block:: python
+     :caption: settings.py
+
+     from scrapy import Request
+     from scrapy.http.response import Response
+
+
+     class MySessionChecker:
+
+         def check(self, request: Request, response: Response) -> bool:
+             # The session is valid if the page has an element with the
+             # "is_valid" class.
+             return bool(response.css(".is_valid"))
+
+
+     ZYTE_API_SESSION_CHECKER = MySessionChecker
+
+ Because the session checker is a Scrapy component, you can access the crawler
+ object, for example to read settings:
+
+ .. code-block:: python
+     :caption: settings.py
+
+     from scrapy import Request
+     from scrapy.http.response import Response
+
+
+     class MySessionChecker:
+
+         @classmethod
+         def from_crawler(cls, crawler):
+             return cls(crawler)
+
+         def __init__(self, crawler):
+             location = crawler.settings["ZYTE_API_SESSION_LOCATION"]
+             self.postal_code = location["postalCode"]
+
+         def check(self, request: Request, response: Response) -> bool:
+             return response.css(".postal_code::text").get() == self.postal_code
+
+
+     ZYTE_API_SESSION_CHECKER = MySessionChecker
+
+
+ .. setting:: ZYTE_API_SESSION_ENABLED
+
+ ZYTE_API_SESSION_ENABLED
+ ========================
+
+ Default: ``False``
+
+ Enables :ref:`scrapy-zyte-api session management <session>`.
+
+
+ .. setting:: ZYTE_API_SESSION_LOCATION
+
+ ZYTE_API_SESSION_LOCATION
+ =========================
+
+ Default: ``{}``
+
+ If defined, sessions are initialized using the ``setLocation``
+ :http:`action <request:actions>`, and the value of this setting must be the
+ target address :class:`dict`. For example:
+
+ .. code-block:: python
+     :caption: settings.py
+
+     ZYTE_API_SESSION_LOCATION = {"postalCode": "10001"}
+
+ If the :setting:`ZYTE_API_SESSION_PARAMS` setting or the
+ :reqmeta:`zyte_api_session_params` request metadata key sets a ``"url"``, that
+ URL is used for session initialization as well. Otherwise, the URL of the
+ request for which the session is being initialized is used instead.
+
+ This setting, if not empty, takes precedence over the
+ :setting:`ZYTE_API_SESSION_PARAMS` setting and the
+ :reqmeta:`zyte_api_session_params` request metadata key, but it can be
+ overridden by the :reqmeta:`zyte_api_session_location` request metadata key.
+
+ To disable the :setting:`ZYTE_API_SESSION_LOCATION` setting on a specific
+ request, e.g. to use the :setting:`ZYTE_API_SESSION_PARAMS` setting or the
+ :reqmeta:`zyte_api_session_params` request metadata key instead, set
+ the :reqmeta:`zyte_api_session_location` request metadata key to ``{}``.
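The precedence rules above can be sketched in plain Python. This is only an illustration of the documented behavior, not the scrapy-zyte-api implementation: ``resolve_session_init`` is a hypothetical helper, settings and request metadata are modeled as plain dictionaries, and ``httpResponseBody`` is just an example parameter.

```python
from typing import Any, Dict, Tuple


# Hypothetical helper, for illustration only; not part of scrapy-zyte-api.
def resolve_session_init(
    settings: Dict[str, Any], meta: Dict[str, Any]
) -> Tuple[str, Dict[str, Any]]:
    """Return ("location", address) or ("params", params)."""
    # The request metadata key overrides the setting, even when it is {}.
    if "zyte_api_session_location" in meta:
        location = meta["zyte_api_session_location"]
    else:
        location = settings.get("ZYTE_API_SESSION_LOCATION", {})
    # A non-empty location takes precedence over any session parameters.
    if location:
        return "location", location
    # Otherwise use parameters: the metadata key first, then the setting.
    default_params = {"browserHtml": True}
    params = meta.get(
        "zyte_api_session_params",
        settings.get("ZYTE_API_SESSION_PARAMS", default_params),
    )
    return "params", params


settings = {
    "ZYTE_API_SESSION_LOCATION": {"postalCode": "10001"},
    "ZYTE_API_SESSION_PARAMS": {"httpResponseBody": True},
}
# No metadata: the location setting wins.
default_choice = resolve_session_init(settings, {})
# An empty location in the metadata disables the location setting.
disabled_choice = resolve_session_init(settings, {"zyte_api_session_location": {}})
```

Here ``default_choice`` selects location-based initialization, while ``disabled_choice`` falls back to the session parameters.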
+
+
+ .. setting:: ZYTE_API_SESSION_MAX_BAD_INITS
+
+ ZYTE_API_SESSION_MAX_BAD_INITS
+ ==============================
+
+ Default: ``8``
+
+ The maximum number of consecutive :ref:`scrapy-zyte-api sessions <session>`
+ per pool that are allowed to fail their session check right after creation.
+ If the maximum is reached, the spider closes with ``bad_session_inits`` as the
+ close reason.
+
+ To override this value for specific pools, use
+ :setting:`ZYTE_API_SESSION_MAX_BAD_INITS_PER_POOL`.
+
+
+ .. setting:: ZYTE_API_SESSION_MAX_BAD_INITS_PER_POOL
+
+ ZYTE_API_SESSION_MAX_BAD_INITS_PER_POOL
+ =======================================
+
+ Default: ``{}``
+
+ :class:`dict` where keys are :ref:`pool <session-pools>` IDs and values are
+ overrides of :setting:`ZYTE_API_SESSION_MAX_BAD_INITS` for those pools.
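As a minimal sketch of a ``settings.py`` that combines the global limit with a per-pool override: the pool ID ``"example.com"`` is a made-up placeholder, so use the IDs of your own pools.

```python
# settings.py sketch; "example.com" is a placeholder pool ID.
ZYTE_API_SESSION_MAX_BAD_INITS = 8  # Default value, used by all other pools.
ZYTE_API_SESSION_MAX_BAD_INITS_PER_POOL = {
    # Give up sooner on a pool where bad initializations are unexpected.
    "example.com": 3,
}
```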
+
+
+ .. setting:: ZYTE_API_SESSION_MAX_ERRORS
+
+ ZYTE_API_SESSION_MAX_ERRORS
+ ===========================
+
+ Default: ``1``
+
+ Maximum number of :ref:`unsuccessful responses
+ <zyte-api-unsuccessful-responses>` allowed for any given session before
+ discarding the session.
+
+ You might want to increase this number if you find that a session may continue
+ to work even after an unsuccessful response. See :ref:`optimize-sessions`.
+
+ .. note:: This setting does not affect session checks
+     (:setting:`ZYTE_API_SESSION_CHECKER`). A session is always discarded the
+     first time it fails its session check.
+
+
+ .. setting:: ZYTE_API_SESSION_PARAMS
+
+ ZYTE_API_SESSION_PARAMS
+ =======================
+
+ Default: ``{"browserHtml": True}``
+
+ Parameters to use for session initialization.
+
+ It works similarly to :http:`request:sessionContextParams` from
+ :ref:`server-managed sessions <zyte-api-session-contexts>`, but it supports
+ arbitrary Zyte API parameters instead of a specific subset.
+
+ If it does not define a ``"url"``, the URL of the request for which the session
+ is being initialized will be used.
+
+ This setting can be overridden by the :setting:`ZYTE_API_SESSION_LOCATION`
+ setting, the :reqmeta:`zyte_api_session_location` request metadata key, or the
+ :reqmeta:`zyte_api_session_params` request metadata key.
+
+ Example:
+
+ .. code-block:: python
+     :caption: settings.py
+
+     ZYTE_API_SESSION_PARAMS = {
+         "browserHtml": True,
+         "actions": [
+             {
+                 "action": "setLocation",
+                 "address": {"postalCode": "10001"},
+             }
+         ],
+     }
+
+ .. tip:: The example above is equivalent to setting
+     :setting:`ZYTE_API_SESSION_LOCATION` to ``{"postalCode": "10001"}``.
+
+
+ .. setting:: ZYTE_API_SESSION_POOL_SIZE
+
+ ZYTE_API_SESSION_POOL_SIZE
+ ==========================
+
+ Default: ``8``
+
+ The maximum number of active :ref:`scrapy-zyte-api sessions <session>` to keep
+ per :ref:`pool <session-pools>`.
+
+ To override this value for specific pools, use
+ :setting:`ZYTE_API_SESSION_POOL_SIZES`.
+
+ Increase this number to lower the frequency with which requests are sent
+ through each session, which on some websites may increase the lifetime of each
+ session. See :ref:`optimize-sessions`.
+
+
+ .. setting:: ZYTE_API_SESSION_POOL_SIZES
+
+ ZYTE_API_SESSION_POOL_SIZES
+ ===========================
+
+ Default: ``{}``
+
+ :class:`dict` where keys are :ref:`pool <session-pools>` IDs and values are
+ overrides of :setting:`ZYTE_API_SESSION_POOL_SIZE` for those pools.
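The override semantics can be pictured as a dictionary lookup with a fallback. In this sketch, ``effective_pool_size`` is a hypothetical helper written for illustration, not a scrapy-zyte-api function, and the pool IDs are placeholders.

```python
# settings.py sketch; pool IDs below are placeholders.
ZYTE_API_SESSION_POOL_SIZE = 8
ZYTE_API_SESSION_POOL_SIZES = {
    "example.com": 16,  # Spread requests over more sessions for this pool.
    "example.org": 4,
}


# Hypothetical helper showing which size applies to a given pool.
def effective_pool_size(pool_id: str) -> int:
    # Use the per-pool override if present, else the global pool size.
    return ZYTE_API_SESSION_POOL_SIZES.get(pool_id, ZYTE_API_SESSION_POOL_SIZE)
```

A pool without an entry in ``ZYTE_API_SESSION_POOL_SIZES`` keeps the size set by ``ZYTE_API_SESSION_POOL_SIZE``.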
+
+
+ .. setting:: ZYTE_API_SESSION_QUEUE_MAX_ATTEMPTS
+
+ ZYTE_API_SESSION_QUEUE_MAX_ATTEMPTS
+ ===================================
+
+ Default: ``60``
+
+ scrapy-zyte-api maintains a rotation queue of ready-to-use sessions per
+ :ref:`pool <session-pools>`. At times, the queue might be empty for a
+ given pool because all its sessions are in the process of being initialized or
+ refreshed.
+
+ If the queue is empty when trying to assign a session to a request,
+ scrapy-zyte-api will wait some time
+ (:setting:`ZYTE_API_SESSION_QUEUE_WAIT_TIME`), and then try to get a session
+ from the queue again.
+
+ Use this setting to configure the maximum number of attempts before giving up
+ and raising a :exc:`RuntimeError` exception.
+
+
+ .. setting:: ZYTE_API_SESSION_QUEUE_WAIT_TIME
+
+ ZYTE_API_SESSION_QUEUE_WAIT_TIME
+ ================================
+
+ Default: ``1.0``
+
+ Number of seconds to wait between attempts to get a session from a rotation
+ queue.
+
+ See :setting:`ZYTE_API_SESSION_QUEUE_MAX_ATTEMPTS` for details.
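How the two queue settings interact can be sketched as a simple polling loop. This is a synchronous illustration under assumptions, not the actual scrapy-zyte-api code, which may differ (for example, by waiting asynchronously or by not waiting after the final attempt). ``get_session`` and its parameters are hypothetical names.

```python
import time
from collections import deque
from typing import Callable, Deque


# Hypothetical helper, for illustration only.
def get_session(
    queue: Deque[str],
    max_attempts: int = 60,  # Mirrors ZYTE_API_SESSION_QUEUE_MAX_ATTEMPTS.
    wait_time: float = 1.0,  # Mirrors ZYTE_API_SESSION_QUEUE_WAIT_TIME.
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Pop a session ID from the queue, polling until one is available."""
    for _ in range(max_attempts):
        if queue:
            return queue.popleft()
        # The queue is empty: wait before the next attempt.
        sleep(wait_time)
    raise RuntimeError(f"No session available after {max_attempts} attempts")


# Usage with a fake sleep function, so that the example runs instantly.
waits = []
first = get_session(deque(["session-1"]), sleep=waits.append)
try:
    get_session(deque(), max_attempts=3, wait_time=0.5, sleep=waits.append)
    gave_up = False
except RuntimeError:
    gave_up = True
```

Under the assumptions of this sketch, the default values mean that a request waits roughly 60 seconds in total for a session before the error is raised.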
+
+
.. setting:: ZYTE_API_SKIP_HEADERS

ZYTE_API_SKIP_HEADERS