Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enabled to overwrite sort method for the regexp. #6

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

usualoma
Copy link

@usualoma usualoma commented Apr 9, 2017

Motivation

I want to weight to some conditions explicitly.

my $ra = Regexp::Assemble->new->track(1);
$ra->add(qw(ab [a-z]+));
say $ra->match('ab'); # "[a-z]+" is returned, but I want to get "ab"

Proposal

I think that we can specify matching order by specifying comparison function for regular expression.

local $Regexp::Assemble::Regexp_Cmp = sub($$) {
    my ($a, $b) = @_;
    $a =~ m/[\*\+]/ <=> $b =~ m/[\*\+]/ || $a cmp $b;
};

my $ra = Regexp::Assemble->new->track(1);
$ra->add(qw(ab [a-z]+));
say $ra->match('ab'); # "ab" is returned.

@ronsavage
Copy link
Owner

I would not do that because it would affect everyone who upgraded to the new version, wouldn't it? I suggest you simply subclass Regexp::Assemble itself, add that code (or a new method) and use your own local copy of that code. Alternately, you could propose a new method which does what you want but has no effect on any other user. That is, users could call or ignore your new method.

@usualoma
Copy link
Author

@ronsavage
Thank you for your quick response.

Simple subclassing

I think that we can not resolve this problem by simple subclassing.

package Regexp::Assemble::Custom {
    use parent 'Regexp::Assemble';

    sub _re_sort($$) {
        my ($a, $b) = @_;
        $a =~ m/[\*\+]/ <=> $b =~ m/[\*\+]/ || $a cmp $b;
    }
}

my $ra = Regexp::Assemble::Custom->new->track(1);
$ra->add(qw(ab [a-z]+));
say $ra->match('ab'); # "[a-z]+" is returned. Regexp::Assemble::Custom::_re_sort is not called

To overwrite _re_sort by simple subclassing

If $self->_re_sort is used at Regexp::Assemble, we can resolve that by simple sublassing. However, there are places where we can not be simply replaced by $self->_re_sort.

diff --git a/lib/Regexp/Assemble.pm b/lib/Regexp/Assemble.pm
index 57f2fe9..c2f0dc0 100644
--- a/lib/Regexp/Assemble.pm
+++ b/lib/Regexp/Assemble.pm
@@ -1832,7 +1832,7 @@ sub _re_path_track {
                 grep { $_ ne '' }
                 keys %{$in->[$n]}
             ];
-            $o = '(?:' . join( '|' => sort _re_sort @$path ) . ')';
+            $o = '(?:' . join( '|' => sort {$self->_re_sort($a,$b)} @$path ) . ')';
             $o .= '?' if exists $in->[$n]{''};
             $simple  .= $o;
             $augment .= $o;
package Regexp::Assemble::Custom {
    use parent 'Regexp::Assemble';

    sub _re_sort {
        my ($self, $a, $b) = @_;
        $a =~ m/[\*\+]/ <=> $b =~ m/[\*\+]/ || $a cmp $b;
    }
}

my $ra = Regexp::Assemble::Custom->new->track(1);
$ra->add(qw(ab [a-z]+));
say $ra->match('ab'); # "ab" is returned.

To overwrite _re_sort by redifine

If __PACKAGE__->_re_sort is used at Regexp::Assemble, we can resolve that by redifine.

patch: https://gist.github.com/usualoma/58a89fd5bc8c4cfe634259158f1f46d2

use Regexp::Assemble;

package Regexp::Assemble;
no warnings 'redefine';
sub _re_sort {
    $a =~ m/[\*\+]/ <=> $b =~ m/[\*\+]/ || $a cmp $b;
};

package main;

my $ra = Regexp::Assemble->new->track(1);
$ra->add(qw(ab [a-z]+));
say $ra->match('ab'); # "ab" is returned.

about 40608cb

I believe that 40608cb is less affected and can be solved without using dirty hacks.
40608cb is fully compatible with current version (0.37), so I think that 40608cb does not break environments that is upgraded from current version.

@ronsavage
Copy link
Owner

Hmmm. I can see you've been very thorough. I can't work on this right now, since I'm at $work, so I'll have to think about it.

@usualoma
Copy link
Author

Thank you very much. I am not in a hurry. I'd appreciate it if you could answer at your convenience.

@usualoma
Copy link
Author

Oh... very sorry.

To overwrite _re_sort by redifine

The following code also worked at 0.37 without patch

use Regexp::Assemble;

package Regexp::Assemble;
no warnings 'redefine';
sub _re_sort {
    $a =~ m/[\*\+]/ <=> $b =~ m/[\*\+]/ || $a cmp $b;
};

package main;

my $ra = Regexp::Assemble->new->track(1);
$ra->add(qw(ab [a-z]+));
say $ra->match('ab'); # "ab" is returned.

I think that 40608cb is useful, however even without it, I can do what I want to do.

@ronsavage
Copy link
Owner

ronsavage commented Apr 11, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants