- InfoMator was designed and developed by Dean H. Nelson (Salt Lake City, UT), with testing assistance from NarraText's other original co-founders, Tina Mercer and Rock.
The name "InfoMator" is a coinage combining "informate" (to dispense information) with "-ator" (one that performs that function), conveying that InfoMator(tm) "dispenses structurized extractions from unstructured text".
- InfoMator is a workflow-based software prototype used to construct, correct, and parse domain-specific text data sets into data vectors amenable to quantitative and qualitative statistical analysis. Its workflow architecture is based on NarraText's adaptation of CRISP-DM (the Cross-Industry Standard Process for Data Mining).
- InfoMator, as a software prototype, emphasizes two objectives: 1) a fully transparent and fully functional software prototype that produces results for business use-cases, and 2) a tested/validated program specification that can be used to engineer the same functionality in any desired programming language (e.g., Java, Python, Perl, Visual Basic, .NET) with minimal testing costs. Furthermore, InfoMator has an embedded code generator (aka "Snipper(tm)") that is used to generate early-binding, executable scripts and software artifacts that execute to produce the desired results.
- InfoMator was originally deployed in the insurance market to extract information from unstructured claims notes.
- However, InfoMator's design has focused on rapid development and implementation of new functionality: handling of unstructured data, user-defined extraction types, data sanitization and masking, and use-case reporting and analysis requirements in any industry or market.
- InfoMator was first deployed in the fall of 2010; the current version is 3.3.
- InfoMator is based on a workflow process structure that is supported by a Core Knowledge Base (CKB) consisting of Core LeXicons (CLX) and Core TaXonomies (CTX).
- Smaller, richer, operational domain-specific knowledge bases are produced by intersecting domain-specific dataset contents with a domain-specific "Knowledge Sieve" that captures and prioritizes Core Knowledge artifacts via a Dataset LeXicon (DTLX) and a Dataset TaXonomy (DTX).
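The sieve step above can be pictured as a simple set intersection: keep only the core-knowledge entries that actually occur in the target dataset. The following is a minimal sketch of that idea; all names (`build_dataset_lexicon`, the sample lexicon and documents) are illustrative assumptions, not InfoMator's actual API.

```python
# Hypothetical "Knowledge Sieve" sketch: intersect a dataset's vocabulary
# with a Core LeXicon to yield a smaller, dataset-specific lexicon (DTLX).

def build_dataset_lexicon(core_lexicon, documents):
    """Keep only core-lexicon entries that actually occur in the dataset."""
    vocab = {token.lower() for doc in documents for token in doc.split()}
    return {term: tag for term, tag in core_lexicon.items() if term in vocab}

core_lexicon = {"claim": "NOUN", "denied": "VERB", "policy": "NOUN", "adjudicate": "VERB"}
docs = ["Claim denied pending review", "Policy claim received"]

dtlx = build_dataset_lexicon(core_lexicon, docs)
# "adjudicate" is dropped because it never appears in the dataset.
```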
- All dataset lexical items that are not known in the DTLX & DTX are made available to InfoMator's knowledge-engineering functions, which mark and identify unknown acronyms and abbreviations and correct spelling errors (e.g., "speling erors"), broken tokens (e.g., "br oken tok en"), and touching tokens (e.g., "touchingtokens").
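Two of the error types named above can be sketched with lexicon-driven heuristics: re-joining broken tokens whose concatenation is a known word, and splitting touching tokens into two known words. This is a toy illustration under those assumptions; the function names and lexicon are invented, not InfoMator's real interfaces.

```python
# Toy knowledge-engineering pass: flag unknown tokens, then repair
# broken tokens and touching tokens against a known lexicon.

lexicon = {"broken", "tokens", "touching", "claim"}

def find_unknown(tokens, lexicon):
    """Tokens absent from the lexicon, queued for knowledge engineering."""
    return [t for t in tokens if t.lower() not in lexicon]

def repair_broken(tokens, lexicon):
    """Re-join adjacent fragments when their concatenation is a known word."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i] + tokens[i + 1]).lower() in lexicon:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def repair_touching(token, lexicon):
    """Split one run-together token into two known words, if possible."""
    for cut in range(1, len(token)):
        left, right = token[:cut], token[cut:]
        if left.lower() in lexicon and right.lower() in lexicon:
            return [left, right]
    return [token]

fixed = repair_broken("br oken tok ens".split(), lexicon)   # broken tokens re-joined
split = repair_touching("touchingtokens", lexicon)          # touching tokens split
```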
- Knowledge-engineering results reside in a User LeXicon (ULX), which is dynamically combined with the Dataset LeXicon to complement the domain knowledge.
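The dynamic combination of the two lexicons can be sketched with a layered lookup, where user entries complement (and, on conflict, take precedence over) dataset entries. The entries shown are invented examples, not InfoMator data.

```python
# Minimal sketch of a ULX layered over a DTLX at lookup time.
from collections import ChainMap

dtlx = {"claim": "NOUN", "denied": "VERB"}   # dataset lexicon
ulx = {"subro": "NOUN"}                      # user-defined entry

# ChainMap is a live view: later additions to ulx are visible immediately,
# matching the "dynamically combined" behavior described above.
combined = ChainMap(ulx, dtlx)
```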
- Once the target dataset is corrected (and normalized), each sentence (token vector) is assigned a unique Syntax Vector and one or more Semantic-Tag Vectors.
Data vectors consisting of "structured data values", grammatical extractions, and use-case-defined extractions are parsed from the combined token-syntax-semantics vectors.
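The vector-assignment step can be illustrated as producing three parallel vectors per sentence: tokens, syntax tags, and semantic tags. The tag inventories and lookup tables below are invented for illustration and are not InfoMator's actual tag sets.

```python
# Sketch: one sentence yields a token vector plus parallel
# syntax and semantic-tag vectors.

SYNTAX = {"the": "DET", "claim": "NOUN", "was": "AUX", "denied": "VERB"}
SEMANTICS = {"claim": "INS_OBJECT", "denied": "INS_DECISION"}

def vectorize(sentence):
    """Return (token, syntax, semantic-tag) vectors for one sentence."""
    tokens = sentence.lower().split()
    syntax = [SYNTAX.get(t, "UNK") for t in tokens]
    semantics = [SEMANTICS.get(t, "-") for t in tokens]
    return tokens, syntax, semantics

tokens, syntax, semantics = vectorize("The claim was denied")
# tokens    → ['the', 'claim', 'was', 'denied']
# syntax    → ['DET', 'NOUN', 'AUX', 'VERB']
# semantics → ['-', 'INS_OBJECT', '-', 'INS_DECISION']
```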
- The data vector extractions can then be distilled into higher-level extractions (thematic roles, triples, generative-grammar constituents, epistemological modality, and eventuality (fact vs. ...)).
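One of the higher-level extractions named above, triples, can be sketched from the token-syntax vectors. The heuristic here (first noun before the verb, first noun after it) is a toy stand-in for a real grammar, not InfoMator's parsing logic.

```python
# Toy distillation of a token-syntax vector into a subject-verb-object triple.

def extract_triple(tokens, syntax):
    """Return (subject, verb, object) using a naive positional heuristic."""
    verb_ix = next((i for i, tag in enumerate(syntax) if tag == "VERB"), None)
    if verb_ix is None:
        return None
    subj = next((tokens[i] for i in range(verb_ix) if syntax[i] == "NOUN"), None)
    obj = next((tokens[i] for i in range(verb_ix + 1, len(tokens))
                if syntax[i] == "NOUN"), None)
    return (subj, tokens[verb_ix], obj)

triple = extract_triple(["adjuster", "denied", "the", "claim"],
                        ["NOUN", "VERB", "DET", "NOUN"])
# triple → ('adjuster', 'denied', 'claim')
```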
- The resulting use-case data vectors provide rich, hyperparametric input for Statistical Analyses and training sets for Machine Learning.
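The hand-off to statistics and machine learning can be pictured as flattening the extracted tags into a numeric feature matrix. The feature names and records below are illustrative assumptions.

```python
# Sketch: extracted semantic tags per record → binary feature matrix
# usable as input to statistical analysis or ML training.

def to_feature_matrix(records, feature_names):
    """One row per record; 1/0 for presence of each extracted feature."""
    return [[1 if f in rec else 0 for f in feature_names] for rec in records]

records = [{"INS_DECISION", "INS_OBJECT"}, {"INS_OBJECT"}]
features = ["INS_DECISION", "INS_OBJECT", "INS_PAYMENT"]

matrix = to_feature_matrix(records, features)
# matrix → [[1, 1, 0], [0, 1, 0]]
```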
- InfoMator was developed (1) to improve the levels of precision and recall in text extractions, (2) to provide rapid prototyping development, and (3) to extend functional robustness in our Knowledge Engineering and Text Extraction efforts beyond what we could achieve using a commercial Text Analytics software package, Attensity 5, which was in use at NarraText between 2009 and 2009.