Pep 0.4 API Documentation

edu.osu.ling.pep
Class EarleyParser

java.lang.Object
  extended by edu.osu.ling.pep.EarleyParser

public class EarleyParser
extends Object

An Earley parser, named after the inventor of the algorithm it implements.

Earley parsers are used to parse strings for conformance with a given context-free grammar. Once instantiated with a grammar, an instance of this class can be used to parse (or just recognize) strings (represented as iterable series of tokens).

This parser fills out a chart based on the specified tokens for a specified seed category. Because of this, it can be used to recognize strings that represent any rule in the grammar. The parse(Iterable, Category) method returns a Parse object that encapsulates the completed chart, the tokens given and the seed category for that parse.

For example, if a grammar contains the following rules:

parses can be requested for category S ("the boy left") but also for category NP ("the boy"). For convenience, this class provides the recognize(Iterable, Category) method that just returns the status for a given parse (but not its completed chart, tokens, and seed category).
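
To make the role of the seed category concrete, here is a minimal, self-contained sketch of an Earley recognizer. It is not Pep's implementation and uses none of Pep's classes: the class name ToyEarley, the Edge helper, and the five grammar rules are assumptions chosen only so that "the boy left" and "the boy" can be tried against different seeds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy recognizer for illustration only; it does not handle empty (epsilon) rules.
final class ToyEarley {
    // Hypothetical grammar: each row is {left, right...}.
    // Symbols that never appear as a left side are treated as terminals.
    static final String[][] GRAMMAR = {
        {"S", "NP", "VP"},
        {"NP", "Det", "N"},
        {"Det", "the"},
        {"N", "boy"},
        {"VP", "left"},
    };

    // A dotted rule plus the chart position where it started.
    static final class Edge {
        final String[] rule; final int dot; final int origin;
        Edge(String[] rule, int dot, int origin) { this.rule = rule; this.dot = dot; this.origin = origin; }
        boolean passive() { return dot == rule.length - 1; }
        String next() { return rule[dot + 1]; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Edge)) return false;
            Edge e = (Edge) o;
            return rule == e.rule && dot == e.dot && origin == e.origin;
        }
        @Override public int hashCode() { return 31 * (31 * System.identityHashCode(rule) + dot) + origin; }
    }

    static boolean recognize(List<String> tokens, String seed) {
        int n = tokens.size();
        List<Set<Edge>> chart = new ArrayList<>();
        for (int i = 0; i <= n; i++) chart.add(new LinkedHashSet<>());
        // Seeding: a dummy rule whose right side is the requested seed category.
        String[] start = {"<seed>", seed};
        chart.get(0).add(new Edge(start, 0, 0));
        for (int i = 0; i <= n; i++) {
            List<Edge> agenda = new ArrayList<>(chart.get(i));
            for (int k = 0; k < agenda.size(); k++) {
                Edge e = agenda.get(k);
                if (e.passive()) {
                    // Completion: advance edges that were waiting for e's left side.
                    for (Edge w : new ArrayList<>(chart.get(e.origin))) {
                        if (!w.passive() && w.next().equals(e.rule[0])) {
                            add(chart.get(i), agenda, new Edge(w.rule, w.dot + 1, w.origin));
                        }
                    }
                } else {
                    // Prediction: introduce rules for the category after the dot.
                    for (String[] r : GRAMMAR) {
                        if (r[0].equals(e.next())) add(chart.get(i), agenda, new Edge(r, 0, i));
                    }
                    // Scanning: move the dot over a matching input token.
                    if (i < n && e.next().equals(tokens.get(i))) {
                        chart.get(i + 1).add(new Edge(e.rule, e.dot + 1, e.origin));
                    }
                }
            }
        }
        // Accept iff the seed rule spans the whole input.
        return chart.get(n).contains(new Edge(start, 1, 0));
    }

    private static void add(Set<Edge> set, List<Edge> agenda, Edge e) {
        if (set.add(e)) agenda.add(e);
    }

    public static void main(String[] args) {
        System.out.println(recognize(Arrays.asList("the", "boy", "left"), "S")); // true
        System.out.println(recognize(Arrays.asList("the", "boy"), "NP"));        // true
        System.out.println(recognize(Arrays.asList("the", "boy"), "S"));         // false
    }
}
```

Because the chart is seeded with the requested category rather than a fixed start symbol, the same grammar accepts "the boy" as an NP while rejecting it as an S.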

A parser instance can be configured using setOption(ParserOption, Boolean). A parser that has not been configured uses the default value of each option. Note, however, that instances of this class are not synchronized. If one thread could be calling parse(Iterable, Category) or recognize(Iterable, Category) while another thread is setting options, the developer should take steps to ensure that these calls do not happen concurrently.
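
One way to keep option changes and parsing from overlapping is to route every call through a single lock. The sketch below is only an illustration of that idea: Guarded, FakeParser, and runDemo are hypothetical names, and FakeParser is a stand-in for any unsynchronized object, not a Pep class.

```java
import java.util.function.Function;

final class GuardedAccessSketch {
    // Stand-in for a non-thread-safe parser: one mutable option plus a call counter.
    static final class FakeParser {
        boolean option;
        int calls;
        String parse(String tokens) {
            calls++;
            return (option ? "on:" : "off:") + tokens;
        }
    }

    // Wraps a target so that every action runs while holding the same lock.
    static final class Guarded<P> {
        private final P target;
        Guarded(P target) { this.target = target; }
        synchronized <R> R with(Function<P, R> action) { return action.apply(target); }
    }

    // Two threads parse while a third flips the option; returns the number of parse calls.
    static int runDemo() {
        Guarded<FakeParser> guarded = new Guarded<>(new FakeParser());
        Runnable parsing = () -> {
            for (int i = 0; i < 10_000; i++) guarded.with(p -> p.parse("the boy left"));
        };
        Runnable configuring = () -> {
            for (int i = 0; i < 10_000; i++) {
                guarded.with(p -> { p.option = !p.option; return p.option; });
            }
        };
        Thread a = new Thread(parsing);
        Thread b = new Thread(parsing);
        Thread c = new Thread(configuring);
        a.start(); b.start(); c.start();
        try {
            a.join(); b.join(); c.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return guarded.<Integer>with(p -> p.calls);
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // 20000: no increments are lost under the lock
    }
}
```

A single lock also serializes parses against each other, which is the conservative choice when nothing is known about whether concurrent parses on one instance are safe. Giving each thread its own parser instance is a simpler alternative where that is practical.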

Version:
$LastChangedRevision: 1807 $
Author:
Scott Martin
See Also:
Grammar, Parse, ParserOption

Constructor Summary
EarleyParser(Grammar grammar)
          Creates a new Earley parser for the specified grammar.
EarleyParser(Grammar grammar, ParserListener listener)
          Creates a new Earley parser for the given set of production rules with the specified listener.
 
Method Summary
 boolean containsOption(ParserOption optionName)
          Tests whether this parser has a defined option identified by the specified option name.
 Grammar getGrammar()
          The grammar where this parser looks up its production rules.
 ParserListener getListener()
          Gets the listener currently receiving events from this parser.
 Boolean getOption(ParserOption optionName)
          Gets the value of the option with the specified name.
 Parse parse(Iterable<String> tokens, Category seed)
          Gets a parse for the specified string (iterable series of tokens) and seed category.
 Parse parse(String tokens, Category seed)
          Convenience method for parsing a string of tokens separated by spaces.
 Parse parse(String tokens, String separator, Category seed)
          Convenience method for parsing a string of tokens separated by a specified string.
 Status recognize(Iterable<String> tokens, Category seed)
          Tests whether this parser recognizes a given string (iterable series of tokens) for the specified seed category.
 Status recognize(String tokens, Category seed)
          Convenience method for recognizing a string of tokens separated by spaces.
 Status recognize(String tokens, String separator, Category seed)
          Convenience method for recognizing a string of tokens separated by a specified string.
 void setGrammar(Grammar grammar)
          Sets the grammar this parser uses to look up its production rules.
 void setListener(ParserListener listener)
          Sets the listener that will receive notification of events during parsing.
 Boolean setOption(ParserOption optionName, Boolean value)
          Sets an option on this parser instance with the specified name.
 String toString()
          Gets a string representation of this Earley parser.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

EarleyParser

public EarleyParser(Grammar grammar)
Creates a new Earley parser for the specified grammar.

See Also:
EarleyParser(Grammar, ParserListener)

EarleyParser

public EarleyParser(Grammar grammar,
                    ParserListener listener)
Creates a new Earley parser for the given set of production rules with the specified listener.

Parameters:
grammar - The grammar that this parser will consult for valid production rules.
listener - A listener that will be notified as edges are added and tokens scanned by this Earley parser.
See Also:
setGrammar(Grammar)
Method Detail

containsOption

public boolean containsOption(ParserOption optionName)
Tests whether this parser has a defined option identified by the specified option name.

Parameters:
optionName - The option name to test for.
Returns:
true iff the corresponding option has been previously set on this parser instance. Even if this returns false, getOption(ParserOption) can still be called as it will just return the default value of the specified option.

getGrammar

public Grammar getGrammar()
The grammar where this parser looks up its production rules.

Returns:
The grammar provided when this parser was created.

getListener

public ParserListener getListener()
Gets the listener currently receiving events from this parser.

Returns:
null if no listener has been specified.
Since:
0.2

getOption

public Boolean getOption(ParserOption optionName)
Gets the value of the option with the specified name.

Parameters:
optionName - The option to fetch a value for.
Returns:
The defined value of the specified option, or its default value if it has not been set.

parse

public Parse parse(Iterable<String> tokens,
                   Category seed)
            throws PepException
Gets a parse for the specified string (iterable series of tokens) and seed category.

While parsing is underway, this method will generate events to the listener specified for this parser, if any. Specifically, events are generated whenever the parser is seeded, an edge is added to the chart as a result of prediction or completion, or a token is scanned from the input string.

Parameters:
tokens - The tokens to parse.
seed - The seed category to attempt to find for the given tokens.
Returns:
A parse for the specified tokens and seed, containing a completed chart.
Throws:
PepException - If no listener has been specified for this parser, or if this parser's listener decides to re-throw exceptions it is notified about, then this method throws a PepException in any of the following cases:
  • tokens is null or empty
  • seed is null
  • An exception is thrown in the process of parsing, for example, in case the parser is unable to parse one of the input tokens

parse

public Parse parse(String tokens,
                   Category seed)
            throws PepException
Convenience method for parsing a string of tokens separated by spaces.

Parameters:
tokens - The string of tokens to parse.
seed - The seed category to attempt to find for the given tokens.
Throws:
PepException
Since:
0.4
See Also:
parse(String, String, Category)

parse

public Parse parse(String tokens,
                   String separator,
                   Category seed)
            throws PepException
Convenience method for parsing a string of tokens separated by a specified string.

Parameters:
tokens - The string of tokens to parse.
separator - The separator in the token string.
seed - The seed category to attempt to find for the given tokens.
Throws:
PepException
Since:
0.4
See Also:
parse(Iterable, Category)
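
The String-based convenience methods presumably reduce to the Iterable form by splitting the input. A minimal sketch of that step follows, under the assumption that the separator is matched literally rather than as a regular expression (the documentation above does not say which). TokenSplitSketch and tokenize are hypothetical names, not part of Pep.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class TokenSplitSketch {
    // Splits on the literal separator; Pattern.quote keeps characters like "|" from acting as regex syntax.
    static List<String> tokenize(String tokens, String separator) {
        return Arrays.asList(tokens.split(Pattern.quote(separator)));
    }

    // Space-separated form, mirroring the two-argument convenience overloads.
    static List<String> tokenize(String tokens) {
        return tokenize(tokens, " ");
    }

    public static void main(String[] args) {
        System.out.println(tokenize("the boy left"));      // [the, boy, left]
        System.out.println(tokenize("the|boy|left", "|")); // [the, boy, left]
    }
}
```

The resulting list is an Iterable<String>, so it has the shape that parse(Iterable, Category) and recognize(Iterable, Category) accept.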

recognize

public Status recognize(Iterable<String> tokens,
                        Category seed)
                 throws PepException
Tests whether this parser recognizes a given string (iterable series of tokens) for the specified seed category.

Parameters:
tokens - The tokens to recognize.
seed - The seed category to attempt to recognize for the given tokens.
Returns:
Status.ACCEPT if the string is recognized, Status.REJECT if the string is rejected, and Status.ERROR if an error occurred during parsing.
Throws:
PepException
See Also:
parse(Iterable, Category), Parse.getStatus()

recognize

public Status recognize(String tokens,
                        Category seed)
                 throws PepException
Convenience method for recognizing a string of tokens separated by spaces.

Parameters:
tokens - The string of tokens to recognize.
seed - The seed category to attempt to recognize for the given tokens.
Throws:
PepException
Since:
0.4
See Also:
recognize(String, String, Category)

recognize

public Status recognize(String tokens,
                        String separator,
                        Category seed)
                 throws PepException
Convenience method for recognizing a string of tokens separated by a specified string.

Parameters:
tokens - The string of tokens to recognize.
separator - The separator in the token string.
seed - The seed category to attempt to recognize for the given tokens.
Throws:
PepException
Since:
0.4
See Also:
recognize(Iterable, Category)

setGrammar

public void setGrammar(Grammar grammar)
Sets the grammar this parser uses to look up its production rules.

Throws:
IllegalArgumentException - If a null grammar is provided.
Since:
0.2

setListener

public void setListener(ParserListener listener)
Sets the listener that will receive notification of events during parsing.

Parameters:
listener - A listener, possibly null. If a null listener is specified, event notification is effectively turned off for this parser.
Since:
0.2

setOption

public Boolean setOption(ParserOption optionName,
                         Boolean value)
Sets an option on this parser instance with the specified name.

Parameters:
optionName - The option to set.
value - The new value for this option.
Returns:
The former value for the specified option, or the default value that would have been used.
Throws:
IllegalArgumentException - If value or optionName is null.
See Also:
EnumMap.put(Enum, Object)
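
The contracts of containsOption, getOption, and setOption fit together as an EnumMap with a fallback to per-option defaults. The sketch below shows one way that could look. It is an assumption about the general shape, not Pep's source: DemoOption and its two constants are invented for illustration, since the real ParserOption constants are not listed on this page.

```java
import java.util.EnumMap;
import java.util.Map;

final class OptionStoreSketch {
    // Hypothetical options, each carrying its own default value.
    enum DemoOption {
        VERBOSE(Boolean.FALSE), STRICT(Boolean.TRUE);
        final Boolean defaultValue;
        DemoOption(Boolean defaultValue) { this.defaultValue = defaultValue; }
    }

    private final Map<DemoOption, Boolean> options = new EnumMap<>(DemoOption.class);

    // True only if the option was explicitly set on this instance.
    boolean containsOption(DemoOption name) {
        return options.containsKey(name);
    }

    // Explicit value if set, otherwise the option's default.
    Boolean getOption(DemoOption name) {
        Boolean value = options.get(name);
        return value != null ? value : name.defaultValue;
    }

    // Returns the former value, or the default that would have applied.
    Boolean setOption(DemoOption name, Boolean value) {
        if (name == null || value == null) {
            throw new IllegalArgumentException("option name and value must be non-null");
        }
        Boolean former = options.put(name, value);
        return former != null ? former : name.defaultValue;
    }

    public static void main(String[] args) {
        OptionStoreSketch store = new OptionStoreSketch();
        System.out.println(store.containsOption(DemoOption.STRICT));           // false
        System.out.println(store.getOption(DemoOption.STRICT));                // true (default)
        System.out.println(store.setOption(DemoOption.STRICT, Boolean.FALSE)); // true (former default)
        System.out.println(store.getOption(DemoOption.STRICT));                // false
    }
}
```

Under this reading, a false result from containsOption only means the option was never set explicitly, and getOption still yields a usable value.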

toString

public String toString()
Gets a string representation of this Earley parser.

Overrides:
toString in class Object

Pep: Pep is an Earley parser

Pep API Documentation, Copyright © 2007 Scott Martin

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the overview file.