Regexp::Ignore

Regexp::Ignore is a Perl module that lets us ignore unwanted parts while parsing text.

Regexp::Ignore Ranking & Summary

  • License: Perl Artistic License
  • Price: FREE
  • Publisher Name: Rani Pinchuk
  • Publisher web site: http://search.cpan.org/~rani/Class-Phrasebook-0.88/SQL/SQL.pm

Regexp::Ignore Description

Regexp::Ignore is a Perl module that lets us ignore unwanted parts while parsing text.

WARNING

This is alpha code. Really. It was written at the end of 2001. It has not been tested much yet. The only reason I am submitting it to CPAN this early is to get feedback about the idea, and hopefully to get some help in finding the many bugs that must still be in it. In our company we use this code, though, and for our needs it runs well.

SYNOPSIS

  use Regexp::IgnoreXXX;

  my $rei = new Regexp::IgnoreXXX($text, "");

  # split the wanted text from the unwanted text
  $rei->split();

  # use substitution function
  $rei->s('(var)_(\d+)', '$2$1', 'gi');
  $rei->s('(\d+):(\d+)', '$2:$1');

  # merge back to get the resulting text
  my $changed_text = $rei->merge();

Markup languages, like HTML, are difficult to parse. The reason is that you can have a line like:

  <font size=+1>H</font>ello <font size=+1>W</font>orld

How can we find the string "Hello World" in the above line and replace it with "Hello Universe" (which is a lot deeper)? Or how can we run a speller on the text and replace the mistakes with suggestions for the correct spelling?

This module comes to help you do exactly that.

The module lets you first split the text into the parts you are interested in and the unwanted parts. For example, all the HTML tags can be treated as unwanted parts.

Then it lets you parse the parts you are interested in (while totally ignoring the unwanted parts).

In the end it lets you merge the unwanted parts back with the possibly changed parts you were interested in.

There is just one catch. The module assumes that when you replace the above "Hello World" with "Hello Universe", all the unwanted parts between the start of the match and the end of the match will be pushed after the text that replaces the match. This is not easy to follow, right? Look at the example:

The text:

  <font size=+1>H</font>ello <font size=+1>W</font>orld

will first be split, and we will get the "cleaned" text:

  Hello World

Then we can parse it using something like:

  s/Hello World/Hello Universe/;

This will give us the changed "cleaned" text:

  Hello Universe

When we merge it with the unwanted parts we will get:

  <font size=+1>Hello Universe</font><font size=+1></font>

So, the unwanted parts in the match were pushed after the replacement.

Requirements:
  • Perl
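
The split, substitute, and merge cycle described above can be sketched in plain Perl. This is only an illustration of the idea under stated assumptions, not the module's actual implementation or API: the helper names split_text, substitute_first and merge_text are hypothetical, the unwanted parts are assumed to be anything matching <[^>]*>, the replacement is a literal string (no $1-style references), and only the first match is replaced.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text into the wanted ("cleaned") text and a
# list of unwanted parts, each stored as [offset_in_clean_text, string].
sub split_text {
    my ($text, $unwanted_re) = @_;
    my $clean = '';
    my @unwanted;
    my $pos = 0;
    while ($text =~ /($unwanted_re)/g) {
        # Text between the previous unwanted part and this one is wanted.
        $clean .= substr($text, $pos, $-[1] - $pos);
        push @unwanted, [length($clean), $1];
        $pos = $+[1];
    }
    $clean .= substr($text, $pos);
    return ($clean, \@unwanted);
}

# Hypothetical helper: replace the first match of $re in the cleaned text
# with the literal $replacement, and push every unwanted part that sat
# inside the match to just after the replacement.
sub substitute_first {
    my ($clean, $unwanted, $re, $replacement) = @_;
    return $clean unless $clean =~ $re;
    my ($start, $end) = ($-[0], $+[0]);
    my $new_end = $start + length($replacement);
    for my $part (@$unwanted) {
        if    ($part->[0] <= $start) { next }                          # before the match
        elsif ($part->[0] <  $end)   { $part->[0] = $new_end }         # inside the match
        else                         { $part->[0] += $new_end - $end } # after the match
    }
    substr($clean, $start, $end - $start) = $replacement;
    return $clean;
}

# Hypothetical helper: re-insert the unwanted parts at their (possibly
# shifted) offsets in the cleaned text.
sub merge_text {
    my ($clean, $unwanted) = @_;
    my ($out, $pos) = ('', 0);
    for my $part (@$unwanted) {
        $out .= substr($clean, $pos, $part->[0] - $pos) . $part->[1];
        $pos = $part->[0];
    }
    return $out . substr($clean, $pos);
}

my $text = '<font size=+1>H</font>ello <font size=+1>W</font>orld';
my ($clean, $unwanted) = split_text($text, qr/<[^>]*>/);
$clean = substitute_first($clean, $unwanted, qr/Hello World/, 'Hello Universe');
my $merged = merge_text($clean, $unwanted);
print "$merged\n";  # <font size=+1>Hello Universe</font><font size=+1></font>
```

Keeping each unwanted part as an offset into the cleaned text is one simple way to make the "pushed after the replacement" rule explicit: a part at or before the start of the match stays where it is, a part inside the match moves to the end of the replacement, and a part after the match shifts by the change in length.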

