manuel

Document parser and evaluator
manuel Ranking & Summary

  • License: ZPL
  • Price: FREE
  • Publisher Name: Benji York
  • Publisher web site: http://pypi.python.org/pypi/manuel


manuel Description

Manuel is a Python module that parses documents, evaluates their contents, then formats the result of the evaluation.

The core functionality is accessed through an instance of a Manuel object, which is used to build up the handling of a document type. Each phase (parsing, evaluation, formatting) has a corresponding slot to which various implementations are attached.

    >>> import manuel

Parsing

Manuel operates on Documents. Each Document is created from a string containing one or more lines.

    >>> source = """\
    ... This is our document, it has several lines.
    ... one: 1, 2, 3
    ... two: 4, 5, 7
    ... three: 3, 5, 1
    ... """
    >>> document = manuel.Document(source)

For example purposes we will create a type of test that consists of a sequence of numbers, so let's create a NumbersTest object to represent the parsed list.

    >>> class NumbersTest(object):
    ...     def __init__(self, description, numbers):
    ...         self.description = description
    ...         self.numbers = numbers

The Document is divided into one or more regions. Each region is a distinct "chunk" of the document and will be acted upon in later (post-parsing) phases. Initially the Document is made up of a single element, the source string.

The Document offers a "find_regions" method to assist in locating the portions of the document a particular parser is interested in. Given a regular expression (either as a string or compiled), it returns "region" objects that contain the matched source text, the line number (1-based) at which the region begins, and the associated re.Match object.

    >>> import re
    >>> numbers_test_finder = re.compile(
    ...     r'^(?P<description>.*?): (?P<numbers>(\d+,?[ ]?)+)$', re.MULTILINE)
    >>> regions = document.find_regions(numbers_test_finder)
    >>> regions
    [<manuel.Region object at 0x...>,
     <manuel.Region object at 0x...>,
     <manuel.Region object at 0x...>]
    >>> regions[0].lineno
    2
    >>> regions[0].source
    'one: 1, 2, 3\n'
    >>> regions[0].start_match.group('description')
    'one'
    >>> regions[0].start_match.group('numbers')
    '1, 2, 3'

If given two regular expressions, find_regions will use the first to identify the beginning of a region and the second to identify the end.

    >>> region = document.find_regions(
    ...     re.compile('^one:.*$', re.MULTILINE),
    ...     re.compile('^three:.*$', re.MULTILINE),
    ...     )[0]
    >>> region.lineno
    2
    >>> print(region.source)
    one: 1, 2, 3
    two: 4, 5, 7
    three: 3, 5, 1
    <BLANKLINE>

Also, instead of just a "start_match" attribute, the region will have both start_match and end_match attributes.

    >>> region.start_match
    <_sre.SRE_Match object at 0x...>
    >>> region.end_match
    <_sre.SRE_Match object at 0x...>

Regions must always consist of whole lines.

    >>> document.find_regions('1, 2, 3')
    Traceback (most recent call last):
        ...
    ValueError: Regions must start at the begining of a line.

    >>> document.find_regions('three')
    Traceback (most recent call last):
        ...
    ValueError: Regions must end at the ending of a line.

    >>> document.find_regions(
    ...     re.compile('ne:.*$', re.MULTILINE),
    ...     re.compile('^one:.*$', re.MULTILINE),
    ...     )
    Traceback (most recent call last):
        ...
    ValueError: Regions must start at the begining of a line.

    >>> document.find_regions(
    ...     re.compile('^one:.*$', re.MULTILINE),
    ...     re.compile('^three:', re.MULTILINE),
    ...     )
    Traceback (most recent call last):
        ...
    ValueError: Regions must end at the ending of a line.

Now we can register a parser that will identify the regions we're interested in and create NumbersTest objects from the source text.

    >>> def parse(document):
    ...     for region in document.find_regions(numbers_test_finder):
    ...         description = region.start_match.group('description')
    ...         numbers = list(map(
    ...             int, region.start_match.group('numbers').split(',')))
    ...         test = NumbersTest(description, numbers)
    ...         document.replace_region(region, test)

    >>> parse(document)

Requirements:
· Python

What's New in This Release:
· fix a bug that caused instances of zope.testing.doctest.Example (and instances of subclasses of the same) to be silently ignored
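The whole-line rule that find_regions enforces in the description above can be tried out without installing manuel. The following is a minimal sketch using only the standard library. SketchRegion, find_line_regions and parse_numbers are hypothetical names introduced for illustration, and the error messages are this sketch's own; none of it is manuel's actual implementation, and it covers only the single-pattern case.

```python
import re

# Sample text mirroring the document used in the description.
SOURCE = (
    "This is our document, it has several lines.\n"
    "one: 1, 2, 3\n"
    "two: 4, 5, 7\n"
    "three: 3, 5, 1\n"
)

# Matches "description: n, n, n" lines; group names follow the description.
NUMBERS_LINE = re.compile(
    r"^(?P<description>.*?): (?P<numbers>(\d+,?[ ]?)+)$", re.MULTILINE)


class SketchRegion:
    """Hypothetical stand-in for a region; not manuel's own class."""

    def __init__(self, lineno, source, start_match):
        self.lineno = lineno            # 1-based line where the region starts
        self.source = source            # matched line(s) plus trailing newline
        self.start_match = start_match  # the match object that located it


def find_line_regions(text, pattern):
    """Return a SketchRegion for every whole-line match of pattern in text."""
    if isinstance(pattern, str):
        pattern = re.compile(pattern, re.MULTILINE)
    regions = []
    for match in pattern.finditer(text):
        start, end = match.span()
        # Reject matches that begin or end in the middle of a line.
        if start > 0 and text[start - 1] != "\n":
            raise ValueError("region must start at the beginning of a line")
        if end < len(text) and text[end] != "\n":
            raise ValueError("region must end at the end of a line")
        stop = min(end + 1, len(text))  # take the newline along, if any
        lineno = text.count("\n", 0, start) + 1
        regions.append(SketchRegion(lineno, text[start:stop], match))
    return regions


def parse_numbers(text):
    """Turn each matching line into a (description, [int, ...]) pair."""
    return [
        (region.start_match.group("description"),
         [int(n) for n in region.start_match.group("numbers").split(",")])
        for region in find_line_regions(text, NUMBERS_LINE)
    ]


if __name__ == "__main__":
    for description, numbers in parse_numbers(SOURCE):
        print(description, numbers)
```

Checking the characters on either side of each match is what makes a pattern such as '1, 2, 3' fail here: it matches in the middle of a line, so no region can be built from it. The sketch keeps the trailing newline in each region's source so that the regions can be concatenated back into the original text.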


manuel Related Software