@@ -316,7 +316,7 @@ instances of the :class:`~.PageObjectRegistry` instead:
316316 cool_gadget_fr_registry = PageObjectRegistry()
317317 furniture_shop_registry = PageObjectRegistry()
318318
319- After declaring the :class: `~.PageObjectRegistry ` instances, they can be imported
319+ After declaring the :class: `~.PageObjectRegistry ` instances, they can be used
320320in each of the Page Object packages like so:
321321
322322.. code-block :: python
@@ -432,3 +432,173 @@ Retrieving all of the Product Listing Override rules would simply be:
432432
433433 # We can also filter it down further on a per site basis if needed.
434434 rules = product_listings_registry.get_overrides_from("my_page_obj_project.cool_gadget_site")
435+
436+ Using Overrides from External Packages
437+ --------------------------------------
438+
439+ Developers have the option to import existing Page Objects alongside the Override
440+ Rules attached to them. This section aims to showcase different ways you can
441+ play with the Registries to manipulate the Override Rules according to your needs.
442+
443+ Let's suppose we have the following use case before us:
444+
445+ - An external Python package named ``ecommerce_page_objects `` is available
446+ which contains Page Objects for common websites. It's using the
447+ ``default_registry `` from **web-poet **.
448+ - Another similar package named ``gadget_sites_page_objects `` is available
449+ for more specific websites. It's using its own registry named
450+ ``gadget_registry ``.
451+ - Your project's objective is to handle as many eCommerce websites as you
452+ can. Thus, you'd want to use the already available packages above and
453+ perhaps improve on them or create new Page Objects for new websites.
454+
455+ Assuming that you'd want to **use all existing Override rules from the external
456+ packages ** in your project, you can do it like:
457+
458+ .. code-block :: python
459+
460+ import ecommerce_page_objects
461+ import gadget_sites_page_objects
462+ from web_poet import PageObjectRegistry, consume_modules, default_registry
463+
464+ consume_modules("ecommerce_page_objects", "gadget_sites_page_objects")
465+
466+ combined_registry = PageObjectRegistry()
467+ combined_registry.data = {
468+     # Since ecommerce_page_objects is using web_poet.default_registry, then
469+     # it functions like a global registry which we can access as:
470+     **default_registry.data,
471+
472+     **gadget_sites_page_objects.gadget_registry.data,
473+ }
474+
475+ combined_rules = combined_registry.get_overrides()
476+
477+ # The combined_rules would be as follows:
478+ # 1. OverrideRule(for_patterns=Patterns(include=['site_1.com'], exclude=[], priority=500), use=<class 'ecommerce_page_objects.site_1.EcomSite1'>, instead_of=<class 'ecommerce_page_objects.EcomGenericPage'>, meta={})
479+ # 2. OverrideRule(for_patterns=Patterns(include=['site_2.com'], exclude=[], priority=500), use=<class 'ecommerce_page_objects.site_2.EcomSite2'>, instead_of=<class 'ecommerce_page_objects.EcomGenericPage'>, meta={})
480+ # 3. OverrideRule(for_patterns=Patterns(include=['site_2.com'], exclude=[], priority=500), use=<class 'gadget_sites_page_objects.site_2.GadgetSite2'>, instead_of=<class 'gadget_sites_page_objects.GadgetGenericPage'>, meta={})
481+ # 4. OverrideRule(for_patterns=Patterns(include=['site_3.com'], exclude=[], priority=500), use=<class 'gadget_sites_page_objects.site_3.GadgetSite3'>, instead_of=<class 'gadget_sites_page_objects.GadgetGenericPage'>, meta={})
482+
483+ .. note ::
484+
485+ Note that ``registry.get_overrides() == list(registry.data.values())``. We're
486+ using ``registry.data`` for these cases so that we can easily look up specific
487+ Page Objects using the ``dict``'s key. Otherwise, finding a specific rule in
488+ the list may become cumbersome when there are many Override rules.
489+
490+ .. note ::
491+
492+ If you don't need the entire data contents of Registries, then you can opt
493+ to use :meth: `~.PageObjectRegistry.data_from ` to easily filter them out
494+ per package/module.
495+
496+ Here's an example:
497+
498+ .. code-block :: python
499+
500+ default_registry.data_from("ecommerce_page_objects.site_1", "ecommerce_page_objects.site_2")
501+
502+ As you can see in the example above, we can easily combine the data from multiple
503+ different registries as it simply follows a ``Dict[Callable, OverrideRule] ``
504+ structure. There won't be any duplication or clashes of ``dict `` keys between
505+ registries of different external packages since the keys are the Page Object
506+ classes intended to be used. From our example above, the ``dict `` keys from a
507+ given ``data `` registry attribute would be:
508+
509+ 1. ``<class 'ecommerce_page_objects.site_1.EcomSite1'> ``
510+ 2. ``<class 'ecommerce_page_objects.site_2.EcomSite2'> ``
511+ 3. ``<class 'gadget_sites_page_objects.site_2.GadgetSite2'> ``
512+ 4. ``<class 'gadget_sites_page_objects.site_3.GadgetSite3'> ``
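
The absence of key clashes follows from ordinary Python ``dict`` semantics: classes are hashable, and distinct classes are distinct keys even when their names look alike. The sketch below illustrates this with hypothetical stand-in classes and plain strings in place of real ``OverrideRule`` objects; none of these names are web-poet API.

```python
# Hypothetical stand-ins for Page Object classes from two external packages.
class EcomSite1: ...
class EcomSite2: ...
class GadgetSite2: ...
class GadgetSite3: ...

# Plain strings stand in for OverrideRule instances (an assumption for brevity).
ecom_data = {EcomSite1: "rule for site_1.com", EcomSite2: "rule for site_2.com"}
gadget_data = {GadgetSite2: "rule for site_2.com", GadgetSite3: "rule for site_3.com"}

# Dict unpacking merges both mappings. The keys are class objects, so the two
# rules that target site_2.com do not overwrite each other.
combined = {**ecom_data, **gadget_data}
merged_keys = [cls.__name__ for cls in combined]
```

Both ``site_2.com`` rules survive the merge because they are stored under different classes; whether they conflict at matching time is a separate question.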
513+
514+ As you might've observed, combining the two Registries above may result in a
515+ conflict for the Override rules for **#2 ** and **#3 **:
516+
517+ .. code-block :: python
518+
519+ # 2. OverrideRule(for_patterns=Patterns(include=['site_2.com'], exclude=[], priority=500), use=<class 'ecommerce_page_objects.site_2.EcomSite2'>, instead_of=<class 'ecommerce_page_objects.EcomGenericPage'>, meta={})
520+ # 3. OverrideRule(for_patterns=Patterns(include=['site_2.com'], exclude=[], priority=500), use=<class 'gadget_sites_page_objects.site_2.GadgetSite2'>, instead_of=<class 'gadget_sites_page_objects.GadgetGenericPage'>, meta={})
521+
522+ The `url-matcher`_ library is responsible for resolving such conflicts. It's
523+ specifically discussed in this section: `rules-conflict-resolution
524+ <https://url-matcher.readthedocs.io/en/stable/intro.html#rules-conflict-resolution> `_.
525+
526+ However, it's technically **NOT ** a conflict, **yet **, since:
527+
528+ - ``ecommerce_page_objects.site_2.EcomSite2 `` would only be used in **site_2.com **
529+ if ``ecommerce_page_objects.EcomGenericPage `` is to be replaced.
530+ - The same case with ``gadget_sites_page_objects.site_2.GadgetSite2 `` wherein
531+ it's only going to be utilized for **site_2.com ** if the following is to be
532+ replaced: ``gadget_sites_page_objects.GadgetGenericPage ``.
533+
534+ It would only become a conflict if the **#2** and **#3** Override Rules for
535+ **site_2.com ** both intend to replace the same Page Object. In fact, none of the
536+ Override Rules above would ever be used if your project never intends to use the
537+ following Page Objects *(since there's nothing to override) *. You can import
538+ these Page Objects into your project and use them so they can be overridden:
539+
540+ - ``ecommerce_page_objects.EcomGenericPage ``
541+ - ``gadget_sites_page_objects.GadgetGenericPage ``
542+
543+ However, let's assume that you want to create your own generic Page Object and
544+ only intend to use it instead of the ones above. We can easily replace them like:
545+
546+ .. code-block :: python
547+
548+ class ImprovedEcommerceGenericPage:
549+     def to_item(self):
550+         ...  # different type of generic parsers
551+
552+ for rule in combined_registry.data.values():
553+     rule.instead_of = ImprovedEcommerceGenericPage
554+
555+ updated_rules = combined_registry.get_overrides()
556+
557+ # The updated_rules would be as follows:
558+ # 1. OverrideRule(for_patterns=Patterns(include=['site_1.com'], exclude=[], priority=500), use=<class 'ecommerce_page_objects.site_1.EcomSite1'>, instead_of=<class 'my_project.ImprovedEcommerceGenericPage'>, meta={})
559+ # 2. OverrideRule(for_patterns=Patterns(include=['site_2.com'], exclude=[], priority=500), use=<class 'ecommerce_page_objects.site_2.EcomSite2'>, instead_of=<class 'my_project.ImprovedEcommerceGenericPage'>, meta={})
560+ # 3. OverrideRule(for_patterns=Patterns(include=['site_2.com'], exclude=[], priority=500), use=<class 'gadget_sites_page_objects.site_2.GadgetSite2'>, instead_of=<class 'my_project.ImprovedEcommerceGenericPage'>, meta={})
561+ # 4. OverrideRule(for_patterns=Patterns(include=['site_3.com'], exclude=[], priority=500), use=<class 'gadget_sites_page_objects.site_3.GadgetSite3'>, instead_of=<class 'my_project.ImprovedEcommerceGenericPage'>, meta={})
562+
563+ Now, **#2 ** and **#3 ** have a conflict since they now both intend to replace
564+ ``ImprovedEcommerceGenericPage ``. As mentioned earlier, the `url-matcher `_
565+ would be the one to resolve such conflicts.
566+
567+ However, it would help prevent future confusion if we could remove the source of
568+ ambiguity in our Override Rules.
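
One way to spot such ambiguity before it reaches the matcher is to group rules by the pattern they target and the Page Object they replace. The sketch below is only an illustration under stated assumptions: ``Rule`` is a hypothetical, simplified stand-in for ``OverrideRule`` (a single ``pattern`` string instead of ``Patterns``, and class names as strings), and ``find_conflicts`` is a made-up helper, not part of web-poet.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical, simplified stand-in for OverrideRule."""
    pattern: str      # a single include pattern, e.g. "site_2.com"
    use: str          # name of the Page Object to use
    instead_of: str   # name of the Page Object being replaced


def find_conflicts(rules):
    """Return the groups of rules that share both pattern and instead_of."""
    groups = defaultdict(list)
    for rule in rules:
        groups[(rule.pattern, rule.instead_of)].append(rule)
    return {key: group for key, group in groups.items() if len(group) > 1}


rules = [
    Rule("site_1.com", "EcomSite1", "EcomGenericPage"),
    Rule("site_2.com", "EcomSite2", "EcomGenericPage"),
    Rule("site_2.com", "GadgetSite2", "GadgetGenericPage"),
    Rule("site_3.com", "GadgetSite3", "GadgetGenericPage"),
]
conflicts_before = find_conflicts(rules)

# Point every rule at the same generic page, as in the loop above.
for rule in rules:
    rule.instead_of = "ImprovedEcommerceGenericPage"
conflicts_after = find_conflicts(rules)
```

Here ``conflicts_before`` is empty, while ``conflicts_after`` holds a single group for ``site_2.com`` containing both the ``EcomSite2`` and ``GadgetSite2`` rules.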
569+
570+ Suppose we prefer ``gadget_sites_page_objects.site_2.GadgetSite2`` over
571+ ``ecommerce_page_objects.site_2.EcomSite2``. As such, we could remove the latter:
572+
573+ .. code-block :: python
574+
575+ del combined_registry.data[ecommerce_page_objects.site_2.EcomSite2]
576+
577+ updated_rules = combined_registry.get_overrides()
578+
579+ # The newly updated_rules would be as follows:
580+ # 1. OverrideRule(for_patterns=Patterns(include=['site_1.com'], exclude=[], priority=500), use=<class 'ecommerce_page_objects.site_1.EcomSite1'>, instead_of=<class 'my_project.ImprovedEcommerceGenericPage'>, meta={})
581+ # 2. OverrideRule(for_patterns=Patterns(include=['site_2.com'], exclude=[], priority=500), use=<class 'gadget_sites_page_objects.site_2.GadgetSite2'>, instead_of=<class 'my_project.ImprovedEcommerceGenericPage'>, meta={})
582+ # 3. OverrideRule(for_patterns=Patterns(include=['site_3.com'], exclude=[], priority=500), use=<class 'gadget_sites_page_objects.site_3.GadgetSite3'>, instead_of=<class 'my_project.ImprovedEcommerceGenericPage'>, meta={})
583+
584+ As discussed before, the Registry's data is structured simply as
585+ ``Dict[Callable, OverrideRule]``, which we can easily manipulate via ``dict``
586+ operations.
587+
588+ Now, suppose we want to improve ``ecommerce_page_objects.site_1.EcomSite1 ``
589+ from **#1 ** above by perhaps adding/fixing fields. We can do that by:
590+
591+ .. code-block :: python
592+
593+ class ImprovedEcomSite1(ecommerce_page_objects.site_1.EcomSite1):
594+     def to_item(self):
595+         ...  # replace and improve some of the parsers here
596+
597+ combined_registry.data[ecommerce_page_objects.site_1.EcomSite1].use = ImprovedEcomSite1
598+
599+ updated_rules = combined_registry.get_overrides()
600+
601+ # The newly updated_rules would be as follows:
602+ # 1. OverrideRule(for_patterns=Patterns(include=['site_1.com'], exclude=[], priority=500), use=<class 'my_project.ImprovedEcomSite1'>, instead_of=<class 'my_project.ImprovedEcommerceGenericPage'>, meta={})
603+ # 2. OverrideRule(for_patterns=Patterns(include=['site_2.com'], exclude=[], priority=500), use=<class 'gadget_sites_page_objects.site_2.GadgetSite2'>, instead_of=<class 'my_project.ImprovedEcommerceGenericPage'>, meta={})
604+ # 3. OverrideRule(for_patterns=Patterns(include=['site_3.com'], exclude=[], priority=500), use=<class 'gadget_sites_page_objects.site_3.GadgetSite3'>, instead_of=<class 'my_project.ImprovedEcommerceGenericPage'>, meta={})