: At Cades Cove in the Smokies.

Scott Martin

Computer scientist, linguist, software engineering leader, math and logic enthusiast, cyclist

Email: scott [at] coffeeblack [dot] org; Public encryption key

About Me

My work in software engineering has focused on three main areas:

Climate technology
- Home electrification at Onsemble, building mobile apps that help people tap into subsidies to switch from gas to electric appliances
- Demand response, also at Onsemble, developing a virtual power plant to aggregate the negative load from an install base of electrified appliances, reducing demand on the grid at peak times
NLP / AI
- Dialog systems at Ozlo, which eventually became the engine behind Meta's conversational assistant embedded in the Portal, Ray-Ban Stories, and several other devices
- Semantic modeling for conversational assistants at NLU and AI lab and Yahoo Labs
Trust and safety, also known as Integrity
- Civic integrity, preventing encitements to violence on Facebook in at-risk countries
- Facebook feed integrity, by optimizing the content curation algorithm to limit the spread of severe misinformation and disinformation

Some open-source software projects and datasets I have contributed to are

The MuDoCo multi-domain coreference dataset
OpenCCG a parser and realizer for CCG
PEP, a general syntactic parser for CFLs

As an academic, I have done research into

Computational approaches to language generation and anaphora/coreference resolution
Formal and mathematical linguistics, compiled into a textbook on the foundations of linguistic theory
Theoretical modeling for contextual interpretation aimed at complex phenomena like supplements in English

My full CV has all the details. I also enjoy bike rides on or around Mount Tam from time to time.

Selected Publications & Talks

On semantics, syntax, and formal/mathematical/theoretical linguistics

Formal Foundations of Linguistic Theory, CSLI Publications, to appear. (With Carl Pollard; updated 10/9/2015.)
Supplemental update. Semantics and Pragmatics, 9(5), 2016 [ doi | preprint ].
It all depends: a modern, type-theoretic, compositional dynamic semantics for projection and beyond. Invited talk at the workshop Dynamic Semantics: Modern Type Theoretic and Category Theoretic Approaches, 2015.
A unidimensional syntax-semantics interface for supplements. ESSLLI 27 workshop Empirical Advances in Categorial Grammar, 2015 [ talk ].
A dynamic categorial grammar. Formal Grammar 19, LNCS 8612, 2014 [ doi ]. (With Carl Pollard.)
The Dynamics of Sense and Implicature, Ph.D. thesis, Ohio State University, 2013 [ invited talk ]. Committee: Carl Pollard (co-advisor), Craige Roberts (co-advisor), and Michael White.
A multistratal account of the projective Tagalog evidential ‘daw’. SALT 22, 2012 [ talk ]. (With Greg Kierstead.)
A higher-order theory of presupposition. Studia Logica 100(4):727–751, 2012 [ doi | talk ]. (With Carl Pollard.)
Weak familiarity and anaphoric accessibility in dynamic semantics. Formal Grammar 16, LNCS 7395, 2012 [ doi | talk ].
Hyperintensional Dynamic Semantics: Analyzing definiteness with enriched contexts. Formal Grammar 15, LNCS 7395, 2012 [ doi ]. (With Carl Pollard.)
Dynamic semantics in direct style, unpublished manuscript. Presented in the LLIC group, 2010.
Enriching contexts for type-theoretic dynamics. Invited talk at the CAuLD Workshop on Logical Methods for Discourse, 2009. (With Carl Pollard.)
A poof-theoretic approach to French pronominal clitics. 13th ESSLLI Student Session, 2008 [ talk ].

On and computational linguistics and NLP

MuDoCo: Corpus for Multidomain Coreference Resolution and Referring Expression Generation, Proceedings of LREC, 2020. [ dataset ] (With Shivani Poddar and Kartikeya Upasani.)
Betting Big on Small Data for Conversational AI. Medium article, 2017.
Putting the Conversation in Conversational AI. Talk on dialog and understanding work at Ozlo, 2017.
The role of salience ranking in anaphora resolution. Invited talk at the ESSLLI 27 workshop Logic and Probabilistic Methods for Dialog, 2015.
Inferring the antecedent: Anaphora resolution off the deep end. Presented at the AMPRA 2 panel Computational and Experimental Approaches to Reference and Anaphoric Inference, 2014.
A joint phrasal and dependency model for paraphrase alignment. COLING 24, 2012. (With Kapil Thadani and Michael White.)
Creating disjunctive logical forms from aligned sentences for grammar-based paraphrase generation. Workshop on Monolingual Text-to-Text Generation, 2011 [ talk ]. (With Michael White.)
Grammar engineering for CCG using Ant and XSLT. SETQA- NLP, 2009 [ poster ]. (With Rajakrishnan Rajkumar and Michael White.)
Developing an annotation scheme for ELL spelling errors. MCLC 5, 2008. (With D.J. Hovermale.)
Towards broad coverage surface realization with CCG. UCNLG+MT, 2007. (With Michael White and Rajakrishnan Rajkumar.)

The Multidomain Coreference (MuDoCo) dataset was released as an open-source project in conjunction with our LREC 2020 paper. It contains almost 8,500 authored human-machine dialogs annotated for coreference links.

Dataset on the Facebook Research GitHub: (Initial release with the paper)

This dataset is encoded in JSON, with named entity and reference types and link annotations. It is broken down by domain and additionally split into an 80%/10%/10% scheme for training, testing, and development data, respectively.

Edinburgh++

This corpus is an enhanced version of the Edinburgh paraphrase corpus, with both machine- and hand-corrected tokenization, hand-corrected alignments based on retokenization, parses from both the OpenCCG parser and the Stanford dependency parser. It also includes named entity annotations generated by the Stanford parser and Meteor alignments for use as a baseline.

Edinburgh++ Corpus: (03/22/2013 release)
README file

The corpus is encoded in JSON format, but comes with a handy Python script that outputs just the alignments. The training and a test partitions are based on the partitioning scheme in my COLING 2012 paper.

PEP

The name PEP stands for PEP is an Earley Parser and is an example of direct left recursion. PEP is an implementation of Earley's chart-parsing algorithm in Java. It includes a thin command-line interface, but is intended to be used as a library. PEP is free software released under the LGPL.

PEP project: on GitHub
PEP source and binaries: Version 0.4
Signature: generated using my public key
API Documentation: generated by JavaDoc

The PEP GitHub project, and the tar bundle above, contains PEP's binaries, full source code, generated documentation, and an Ant build file. It also includes several sample grammars for testing and automated JUnit tests.

PEP can parse strings licensed by any CFG (including those that contain recursive rules). PEP's charts use backpointers so that if a grammar allows ambiguity, PEP keeps track of all of the possible parses in a set of traversable parse trees. Version 0.4 is generalized to allow rules with right-hand sides that include a mix of terminals and nonterminals.

Teaching

Graduate

Formal Foundations of Linguistic Theory

(Assistant to Carl Pollard.) Foundational course on the mathematical tools used in formal linguistics.

Syntax 1

(Assistant to Bob Levine.) Overview of syntactic theory and description based on HPSG.

Fall 2011

Undergraduate

Language and Computers

Broad-based overview of topics in computational linguistics.

Language and Formal Reasoning

Truth-conditional meaning in natural language and its interaction with deductive reasoning.

Introduction to Language in the Humanities

Survey course in general linguistics.

Fall 2008