About Me
My work in software engineering has focused on three main areas:
- Climate technology
  - Grid optimization at WeaveGrid, designing algorithms for optimal EV charging
  - Demand response at Onsemble, where I developed a virtual power plant to aggregate the negative load from an installed base of electrified appliances, reducing demand on the grid at peak times
  - Home electrification, also at Onsemble, building mobile apps that help people tap into subsidies to switch from gas to electric appliances
- NLP / AI
  - Dialog systems at Ozlo, which eventually became the engine behind Meta's conversational assistant embedded in the Portal, Ray-Ban Stories, and several other devices
  - Semantic modeling for conversational assistants at NLU and AI lab and Yahoo Labs
- Trust and safety, also known as Integrity
  - Civic integrity, preventing incitements to violence on Facebook in at-risk countries
  - Facebook feed integrity, optimizing the content curation algorithm to limit the spread of severe misinformation and disinformation
Some open-source software projects and datasets I have contributed to:
- The MuDoCo multi-domain coreference dataset
- OpenCCG, a parser and realizer for CCG
- PEP, a general syntactic parser for CFLs
As an academic, I have done research into
My full CV has all the details. I also enjoy bike rides on or around Mount Tam from time to time.
On semantics, syntax, and formal/mathematical/theoretical linguistics
- Formal Foundations of Linguistic Theory. CSLI Publications, to appear. (With Carl Pollard; updated 10/9/2015.)
- Supplemental update. Semantics and Pragmatics, 9(5), 2016 [ doi | preprint ].
- It all depends: a modern, type-theoretic, compositional dynamic semantics for projection and beyond. Invited talk at the workshop Dynamic Semantics: Modern Type Theoretic and Category Theoretic Approaches, 2015.
- A unidimensional syntax-semantics interface for supplements. ESSLLI 27 workshop Empirical Advances in Categorial Grammar, 2015 [ talk ].
- A dynamic categorial grammar. Formal Grammar 19, LNCS 8612, 2014 [ doi ]. (With Carl Pollard.)
- The Dynamics of Sense and Implicature. Ph.D. thesis, Ohio State University, 2013 [ invited talk ]. Committee: Carl Pollard (co-advisor), Craige Roberts (co-advisor), and Michael White.
- A multistratal account of the projective Tagalog evidential ‘daw’. SALT 22, 2012 [ talk ]. (With Greg Kierstead.)
- A higher-order theory of presupposition. Studia Logica 100(4):727–751, 2012 [ doi | talk ]. (With Carl Pollard.)
- Weak familiarity and anaphoric accessibility in dynamic semantics. Formal Grammar 16, LNCS 7395, 2012 [ doi | talk ].
- Hyperintensional Dynamic Semantics: Analyzing definiteness with enriched contexts. Formal Grammar 15, LNCS 7395, 2012 [ doi ]. (With Carl Pollard.)
- Dynamic semantics in direct style. Unpublished manuscript, presented in the LLIC group, 2010.
- Enriching contexts for type-theoretic dynamics. Invited talk at the CAuLD Workshop on Logical Methods for Discourse, 2009. (With Carl Pollard.)
- A proof-theoretic approach to French pronominal clitics. 13th ESSLLI Student Session, 2008 [ talk ].
On computational linguistics and NLP
- MuDoCo: Corpus for Multidomain Coreference Resolution and Referring Expression Generation. Proceedings of LREC, 2020 [ dataset ]. (With Shivani Poddar and Kartikeya Upasani.)
- Betting Big on Small Data for Conversational AI. Medium article, 2017.
- Putting the Conversation in Conversational AI. Talk on dialog and understanding work at Ozlo, 2017.
- The role of salience ranking in anaphora resolution. Invited talk at the ESSLLI 27 workshop Logic and Probabilistic Methods for Dialog, 2015.
- Inferring the antecedent: Anaphora resolution off the deep end. Presented at the AMPRA 2 panel Computational and Experimental Approaches to Reference and Anaphoric Inference, 2014.
- A joint phrasal and dependency model for paraphrase alignment. COLING 24, 2012. (With Kapil Thadani and Michael White.)
- Creating disjunctive logical forms from aligned sentences for grammar-based paraphrase generation. Workshop on Monolingual Text-to-Text Generation, 2011 [ talk ]. (With Michael White.)
- Grammar engineering for CCG using Ant and XSLT. SETQA-NLP, 2009 [ poster ]. (With Rajakrishnan Rajkumar and Michael White.)
- Developing an annotation scheme for ELL spelling errors. MCLC 5, 2008. (With D.J. Hovermale.)
- Towards broad coverage surface realization with CCG. UCNLG+MT, 2007. (With Michael White and Rajakrishnan Rajkumar.)
The Multidomain Coreference (MuDoCo) dataset was released as an open-source project in conjunction with our LREC 2020 paper. It contains almost 8,500 authored human-machine dialogs annotated for coreference links.
- Dataset on the Facebook Research GitHub (initial release with the paper)
This dataset is encoded in JSON, with annotations for named entity types, reference types, and coreference links. It is broken down by domain and additionally split 80%/10%/10% into training, testing, and development data, respectively.
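To make the 80%/10%/10% layout concrete, here is a hypothetical Java sketch of how such a partition can be drawn, assuming the split is applied within each domain. It is not the procedure used to build the released MuDoCo splits (those ship with the dataset); the class `DomainSplit`, the seeded shuffle, and treating dialogs as plain ID strings are all assumptions made for illustration.

```java
import java.util.*;

/** Hypothetical per-domain 80/10/10 split; not how the released MuDoCo splits were made. */
public class DomainSplit {
    /** Shuffles one domain's dialog IDs with a fixed seed and cuts them into train/test/dev. */
    public static Map<String, List<String>> split(List<String> dialogIds, long seed) {
        List<String> shuffled = new ArrayList<>(dialogIds);
        Collections.shuffle(shuffled, new Random(seed)); // fixed seed makes the split reproducible
        int n = shuffled.size();
        int trainEnd = n * 8 / 10;       // first 80% for training
        int testEnd = trainEnd + n / 10; // next 10% for testing
        Map<String, List<String>> parts = new LinkedHashMap<>();
        parts.put("train", new ArrayList<>(shuffled.subList(0, trainEnd)));
        parts.put("test", new ArrayList<>(shuffled.subList(trainEnd, testEnd)));
        // The remainder (about 10%, absorbing any rounding) is for development.
        parts.put("dev", new ArrayList<>(shuffled.subList(testEnd, n)));
        return parts;
    }

    /** Splits each domain separately, so every domain keeps the same proportions in all three sets. */
    public static Map<String, Map<String, List<String>>> splitByDomain(
            Map<String, List<String>> idsByDomain, long seed) {
        Map<String, Map<String, List<String>>> result = new TreeMap<>();
        idsByDomain.forEach((domain, ids) -> result.put(domain, split(ids, seed)));
        return result;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 100; i++) ids.add("dialog-" + i);
        Map<String, List<String>> parts = split(ids, 42L);
        // prints 80 10 10
        System.out.println(parts.get("train").size() + " " + parts.get("test").size() + " " + parts.get("dev").size());
    }
}
```

Splitting per domain rather than over the pooled corpus avoids a small domain ending up underrepresented, or missing entirely, in the test or development set.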
This corpus is an enhanced version of the Edinburgh paraphrase corpus, with both machine- and hand-corrected tokenization, hand-corrected alignments based on the retokenization, and parses from both the OpenCCG parser and the Stanford dependency parser. It also includes named entity annotations generated by the Stanford parser, as well as Meteor alignments for use as a baseline.
- Edinburgh++ Corpus (03/22/2013 release)
- README file
The corpus is encoded in JSON format and comes with a handy Python script that outputs just the alignments. The training and test partitions are based on the partitioning scheme in my COLING 2012 paper.
The name PEP stands for "PEP is an Earley Parser," which is an example of direct left recursion. PEP is an implementation of Earley's chart-parsing algorithm in Java. It includes a thin command-line interface but is intended to be used as a library. PEP is free software released under the LGPL.
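For readers unfamiliar with Earley's algorithm, here is a minimal recognizer sketched in Java. It is not PEP's code or API: the class `EarleySketch`, its `recognize` method, and the map-based grammar encoding (each nonterminal maps to its list of right-hand sides; any symbol that is not a key is a terminal) are hypothetical, and the sketch assumes there are no empty (epsilon) rules. It shows the three chart operations (prediction, scanning, and completion) and accepts left-recursive rules such as VP -> VP PP.

```java
import java.util.*;

/** Minimal Earley recognizer: a hypothetical sketch, not PEP's API. Assumes no empty (epsilon) rules. */
public class EarleySketch {
    /** Dotted rule lhs -> rhs with the dot before rhs[dot], started at chart column origin. */
    record Item(String lhs, List<String> rhs, int dot, int origin) {
        boolean complete() { return dot == rhs.size(); }
        String next() { return complete() ? null : rhs.get(dot); }
        Item advance() { return new Item(lhs, rhs, dot + 1, origin); }
    }

    /** True if start derives tokens; symbols that are not grammar keys are terminals. */
    public static boolean recognize(Map<String, List<List<String>>> grammar,
                                    String start, List<String> tokens) {
        int n = tokens.size();
        List<Set<Item>> chart = new ArrayList<>();
        for (int i = 0; i <= n; i++) chart.add(new LinkedHashSet<>());
        for (List<String> rhs : grammar.getOrDefault(start, List.of()))
            chart.get(0).add(new Item(start, rhs, 0, 0));

        for (int i = 0; i <= n; i++) {
            // Worklist over column i; items added while processing are visited as well.
            List<Item> agenda = new ArrayList<>(chart.get(i));
            for (int k = 0; k < agenda.size(); k++) {
                Item item = agenda.get(k);
                if (item.complete()) {
                    // Completion: advance items in the origin column that were waiting for item.lhs.
                    for (Item waiting : new ArrayList<>(chart.get(item.origin())))
                        if (item.lhs().equals(waiting.next()))
                            addTo(chart.get(i), agenda, waiting.advance());
                } else if (grammar.containsKey(item.next())) {
                    // Prediction: start every rule for the nonterminal after the dot at column i.
                    for (List<String> rhs : grammar.get(item.next()))
                        addTo(chart.get(i), agenda, new Item(item.next(), rhs, 0, i));
                } else if (i < n && item.next().equals(tokens.get(i))) {
                    // Scanning: the terminal after the dot matches token i, so move past it.
                    chart.get(i + 1).add(item.advance());
                }
            }
        }
        for (Item item : chart.get(n))
            if (item.lhs().equals(start) && item.complete() && item.origin() == 0) return true;
        return false;
    }

    /** Adds item to the column and, only if it is new, to the worklist. */
    private static void addTo(Set<Item> column, List<Item> agenda, Item item) {
        if (column.add(item)) agenda.add(item);
    }

    public static void main(String[] args) {
        // The 'tiny' grammar from the PEP samples, transcribed by hand into this encoding.
        Map<String, List<List<String>>> tiny = Map.of(
            "S", List.of(List.of("NP", "VP")),
            "NP", List.of(List.of("Mary"), List.of("Det", "N")),
            "VP", List.of(List.of("VP", "PP"), List.of("VT", "NP")),
            "PP", List.of(List.of("P", "NP")),
            "N", List.of(List.of("man"), List.of("telescope"), List.of("N", "PP")),
            "Det", List.of(List.of("the")),
            "P", List.of(List.of("with")),
            "VT", List.of(List.of("saw")));
        List<String> tokens = List.of("Mary saw the man with the telescope".split(" "));
        System.out.println(recognize(tiny, "S", tokens)); // prints true
    }
}
```

Because each chart column is a set, re-predicting VP while processing VP -> . VP PP adds nothing new, which is what keeps the left-recursive rule from looping.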
- PEP project on GitHub
- PEP source and binaries (version 0.4)
- Signature (generated using my public key)
- API Documentation (generated by JavaDoc)
The PEP GitHub project and the tar bundle above contain PEP's binaries, full source code, generated documentation, and an Ant build file. They also include several sample grammars for testing and automated JUnit tests. Version 0.4 is generalized to allow rules with right-hand sides that include a mix of terminals and nonterminals.
PEP can parse strings licensed by any CFG (including those that contain recursive rules). PEP's charts use backpointers, so if a grammar allows ambiguity, PEP keeps track of all of the possible parses in a set of traversable parse trees. For example, the 'tiny' grammar included with PEP contains the rules:
- Det -> the
- N -> man
- N -> telescope
- NP -> Mary
- P -> with
- VT -> saw
- N -> N PP
- PP -> P NP
- NP -> Det N
- VP -> VP PP
- VP -> VT NP
- S -> NP VP
which means that there are two possible parses for "Mary saw the man with the telescope" (the PP "with the telescope" can attach either to the VP or to the N "man"), and PEP generates both:
$ echo "Mary saw the man with the telescope" | ./bin/pep -s S -g samples/tiny.xml -
ACCEPT: S -> [Mary, saw, the, man, with, the, telescope] (2)
1. [S[NP[Mary]][VP[VP[VT[saw]][NP[Det[the]][N[man]]]][PP[P[with]][NP[Det[the]][N[telescope]]]]]]
2. [S[NP[Mary]][VP[VT[saw]][NP[Det[the]][N[N[man]][PP[P[with]][NP[Det[the]][N[telescope]]]]]]]]
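PEP recovers both trees by following backpointers through its Earley chart. As a simpler, swapped-in illustration of why the 'tiny' grammar yields exactly two parses, the hypothetical class `ParseEnumerator` below enumerates trees with memoized recursion over token spans (an inside-style dynamic program, not PEP's chart or backpointers) and prints them in the same bracketed notation. It assumes no empty rules and no cycles of unary rules between nonterminals, which holds for the 'tiny' grammar; the left-recursive rules N -> N PP and VP -> VP PP terminate here because each recursive call covers a strictly shorter span.

```java
import java.util.*;

/** Enumerates every parse tree by memoized recursion over token spans (hypothetical sketch, not PEP's chart). */
public class ParseEnumerator {
    private final Map<String, List<List<String>>> grammar;
    private final List<String> tokens;
    private final Map<String, List<String>> memo = new HashMap<>();

    public ParseEnumerator(Map<String, List<List<String>>> grammar, List<String> tokens) {
        this.grammar = grammar;
        this.tokens = tokens;
    }

    /** All bracketed trees in which symbol covers tokens[i, j). Assumes no empty rules or unary cycles. */
    public List<String> trees(String symbol, int i, int j) {
        if (!grammar.containsKey(symbol)) {
            // Terminal: covers exactly one token equal to the symbol, printed as [word].
            boolean match = j == i + 1 && tokens.get(i).equals(symbol);
            return match ? List.of("[" + symbol + "]") : List.of();
        }
        String key = symbol + "/" + i + "/" + j;
        if (memo.containsKey(key)) return memo.get(key);
        List<String> result = new ArrayList<>();
        for (List<String> rhs : grammar.get(symbol))
            for (String children : sequences(rhs, 0, i, j))
                result.add("[" + symbol + children + "]");
        memo.put(key, result);
        return result;
    }

    /** All ways rhs[pos..] can cover tokens[i, j), each as the concatenation of its child trees. */
    private List<String> sequences(List<String> rhs, int pos, int i, int j) {
        List<String> out = new ArrayList<>();
        if (pos == rhs.size()) {
            if (i == j) out.add("");
            return out;
        }
        int later = rhs.size() - pos - 1; // each later symbol needs at least one token
        for (int k = i + 1; k <= j - later; k++)
            for (String first : trees(rhs.get(pos), i, k))
                for (String rest : sequences(rhs, pos + 1, k, j))
                    out.add(first + rest);
        return out;
    }

    public static void main(String[] args) {
        // Hand transcription of the 'tiny' grammar listed above.
        Map<String, List<List<String>>> tiny = Map.of(
            "S", List.of(List.of("NP", "VP")),
            "NP", List.of(List.of("Mary"), List.of("Det", "N")),
            "VP", List.of(List.of("VP", "PP"), List.of("VT", "NP")),
            "PP", List.of(List.of("P", "NP")),
            "N", List.of(List.of("man"), List.of("telescope"), List.of("N", "PP")),
            "Det", List.of(List.of("the")),
            "P", List.of(List.of("with")),
            "VT", List.of(List.of("saw")));
        List<String> tokens = List.of("Mary saw the man with the telescope".split(" "));
        List<String> parses = new ParseEnumerator(tiny, tokens).trees("S", 0, tokens.size());
        System.out.println(parses.size()); // prints 2
        parses.forEach(System.out::println);
    }
}
```

Memoizing on (symbol, start, end) means shared subtrees such as the NP "the telescope" are built once, but the number of complete trees can still grow exponentially with sentence length, which is one reason a chart parser keeps backpointers rather than expanded trees until they are needed.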
Graduate
- Formal Foundations of Linguistic Theory (assistant to Carl Pollard): Foundational course on the mathematical tools used in formal linguistics.
- Syntax 1 (assistant to Bob Levine): Overview of syntactic theory and description based on HPSG.
Undergraduate
- Language and Computers: Broad-based overview of topics in computational linguistics.
- Language and Formal Reasoning: Truth-conditional meaning in natural language and its interaction with deductive reasoning.
- Introduction to Language in the Humanities: Survey course in general linguistics.