At Cades Cove in the Smokies.

Scott Martin

Principal Scientist, Ozlo Inc.

Public encryption key


About Me

I'm doing research into semantic modeling and conversational interfaces as part of our efforts to reinvent search at Ozlo. In the past, I have worked on various topics in computational linguistics and NLP during stints at Yahoo Labs (parsing and semantic modeling for search and web analytics) and at Nuance's NLU and AI lab (computational semantics, dialog management for intelligent conversational assistance). Before that, I was a presidential fellow at OSU while finishing my Ph.D. in linguistics there in 2013. My CV has details, and so do my profile pages at Google Scholar, LinkedIn, and ResearchGate.

My academic interests broadly include computational, formal, and mathematical linguistics, touching on computer science, artificial intelligence, and mathematical logic. My computational work has focused on anaphora and coreference resolution, automatic paraphrase alignment, generation, entailment-based task selection for dialog management, syntactic parsing, and semantic representation. My work in linguistic theory mostly focuses on developing an interface between syntax and semantics that can model discourse compositionally, especially projective meaning types like anaphora and supplements.

Besides research, I'm also accomplished as a software engineer. I contributed to the core parsing components of the SkyPhrase SDK, and I designed and implemented systems for resolving anaphora, dialog task matching, and lambda conversion while at Nuance's NLU/AI lab. I'm one of the main authors of OpenCCG (an open-source parser and realizer for CCG), and I wrote the PEP Java Earley parser. I also built an NLP component in the initial release of Sermo's social network for physicians.

Sometimes I sneak away for a bike ride, usually in the Santa Cruz mountains.

Selected Publications & Talks







2011 and Before

Code & Data


This corpus is an enhanced version of the Edinburgh paraphrase corpus, with both machine- and hand-corrected tokenization, hand-corrected alignments based on the retokenization, and parses from both the OpenCCG parser and the Stanford dependency parser. It also includes named entity annotations generated by the Stanford parser and Meteor alignments for use as a baseline.

Edinburgh++ Corpus
(03/22/2013 release)

The corpus is encoded in JSON format and comes with a handy Python script that outputs just the alignments. The training and test partitions are based on the partitioning scheme in my COLING 2012 paper.


The name PEP stands for "PEP is an Earley Parser," a recursive acronym that is itself an example of direct left recursion. PEP is an implementation of Earley's chart-parsing algorithm in Java. It includes a thin command-line interface, but is intended to be used as a library. PEP is free software released under the LGPL.

PEP source and binaries
Version 0.4
generated using my public key
API Documentation
generated by JavaDoc

The tar bundle above contains PEP's binaries, full source code, generated documentation, and an Ant build file. It also includes several sample grammars for testing and automated JUnit tests.

PEP can parse strings licensed by any CFG (including those that contain recursive rules). PEP's charts use backpointers so that if a grammar allows ambiguity, PEP keeps track of all of the possible parses in a set of traversable parse trees. Version 0.4 is generalized to allow rules with right-hand sides that include a mix of terminals and nonterminals.
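As a rough illustration of the chart-parsing idea described above, here is a minimal Earley recognizer sketch in Java. It is not PEP's source code or API: the names `EarleySketch`, `Rule`, `Item`, and `recognizes` are hypothetical, the sketch only answers yes or no (it keeps no backpointers, so it cannot enumerate parse trees), and it assumes the grammar has no rules with an empty right-hand side. It does handle left-recursive rules and right-hand sides that mix terminals and nonterminals.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names come from PEP.
public class EarleySketch {
    /** A context-free rule. A symbol counts as a nonterminal iff some rule has it as its left-hand side. */
    public record Rule(String lhs, List<String> rhs) {}

    /** An Earley item: a rule, the dot position in its right-hand side, and the column where it started. */
    record Item(Rule rule, int dot, int origin) {
        boolean complete() { return dot == rule.rhs().size(); }
        String next() { return rule.rhs().get(dot); }
        Item advance() { return new Item(rule, dot + 1, origin); }
    }

    /** Returns true iff the grammar derives the token sequence from the start symbol. Assumes no empty rules. */
    public static boolean recognizes(List<Rule> grammar, String start, List<String> tokens) {
        Set<String> nonterminals = new HashSet<>();
        for (Rule r : grammar) nonterminals.add(r.lhs());

        // One chart column per position between tokens (0 .. tokens.size()).
        List<List<Item>> chart = new ArrayList<>();
        for (int i = 0; i <= tokens.size(); i++) chart.add(new ArrayList<>());
        for (Rule r : grammar)
            if (r.lhs().equals(start)) add(chart.get(0), new Item(r, 0, 0));

        for (int i = 0; i <= tokens.size(); i++) {
            List<Item> column = chart.get(i);
            // The column grows while it is processed, so iterate by index.
            for (int j = 0; j < column.size(); j++) {
                Item item = column.get(j);
                if (item.complete()) {
                    // Completer: advance items in the origin column that were waiting for this left-hand side.
                    List<Item> originColumn = chart.get(item.origin());
                    for (int k = 0; k < originColumn.size(); k++) {
                        Item waiting = originColumn.get(k);
                        if (!waiting.complete() && waiting.next().equals(item.rule().lhs()))
                            add(column, waiting.advance());
                    }
                } else if (nonterminals.contains(item.next())) {
                    // Predictor: the duplicate check in add() is what keeps left recursion from looping.
                    for (Rule r : grammar)
                        if (r.lhs().equals(item.next())) add(column, new Item(r, 0, i));
                } else if (i < tokens.size() && item.next().equals(tokens.get(i))) {
                    // Scanner: the next symbol is a terminal that matches the current token.
                    add(chart.get(i + 1), item.advance());
                }
            }
        }
        for (Item item : chart.get(tokens.size()))
            if (item.complete() && item.origin() == 0 && item.rule().lhs().equals(start)) return true;
        return false;
    }

    private static void add(List<Item> column, Item item) {
        if (!column.contains(item)) column.add(item);
    }

    public static void main(String[] args) {
        // A directly left-recursive grammar whose first rule mixes terminals and a nonterminal: E -> E + n | n
        List<Rule> grammar = List.of(
            new Rule("E", List.of("E", "+", "n")),
            new Rule("E", List.of("n")));
        System.out.println(recognizes(grammar, "E", List.of("n", "+", "n"))); // true
        System.out.println(recognizes(grammar, "E", List.of("n", "+")));      // false
    }
}
```

Turning a recognizer like this into a parser that tracks every analysis of an ambiguous string would require recording, for each advanced item, which completed or scanned item licensed the advance. That is the role the backpointers play in the description above.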



Formal Foundations of Linguistic Theory
(Assistant to Carl Pollard.) Foundational course on the mathematical tools used in formal linguistics.
Syntax 1
(Assistant to Bob Levine.) Overview of syntactic theory and description based on HPSG.
  • Fall 2011


Language and Computers
Broad-based overview of topics in computational linguistics.
Language and Formal Reasoning
Truth-conditional meaning in natural language and its interaction with deductive reasoning.
Introduction to Language in the Humanities
Survey course in general linguistics.