Many applications of current interest involve using
databases or data streams of events to detect instances of
processes. In those applications, events provide evidence that is
used to infer the existence and estimate the states of the various
processes of interest. Examples of such applications include:
network and computer security; network management; sensor
network tracking; military situational awareness; and critical
infrastructure monitoring and protection.
While these and other applications are superficially different
from one another, they in fact share many common features
when viewed from an appropriately abstract perspective.
This abstract framework posits a collection of processes that
together produce an interleaved stream of observable
events,
$\ldots, e_i, e_{i+1}, e_{i+2}, \ldots$
where event $e_j$ occurs at time $t_j$ and
$t_j \le t_{j+1}$. The goal
in many applications is to solve the inverse problem, namely
determining which processes produced which events in the
observed event stream. A Process Query System (PQS) is a
software system that strives to solve this inverse problem.
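To make the abstraction concrete, the following Python sketch (an illustration, not TRAFEN code) models the forward direction: each process emits its own time-ordered events, and the observer sees only their merge, ordered so that $t_j \le t_{j+1}$. The `Event` class, the process labels `P1` and `P2`, and the observation strings are hypothetical. The inverse problem is to recover the hidden `source` labels from the observed (time, observation) pairs alone.

```python
import heapq
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    time: float   # t_j, the time at which the event occurs
    obs: str      # observable content of the event
    source: str   # ground-truth process label, hidden from a PQS

def interleave(*streams):
    """Merge per-process event streams, each already sorted by time,
    into a single stream with nondecreasing timestamps."""
    return list(heapq.merge(*streams, key=lambda e: e.time))

# Two hypothetical processes, each emitting its own time-ordered events.
p1 = [Event(0.0, "scan", "P1"), Event(2.0, "login", "P1"), Event(5.0, "exfil", "P1")]
p2 = [Event(1.0, "ping", "P2"), Event(3.0, "ping", "P2")]

stream = interleave(p1, p2)
assert all(a.time <= b.time for a, b in zip(stream, stream[1:]))

# What a PQS actually observes: timestamps and observations, no labels.
observed = [(e.time, e.obs) for e in stream]
# The labeling the inverse problem asks us to recover from `observed` alone.
truth = [e.source for e in stream]
print(observed)
print(truth)
```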
We adopt the viewpoint of modern systems and control theory (including
areas such as communications, speech recognition, and other fields that
use Hidden Markov Models) in which processes have internal,
or "hidden," states that are not always externally observable.
The processes' hidden states generate observable events from
which we seek to infer the existence of the processes and
estimate the hidden states of the instantiated processes as
observable events are collected.
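The hidden-state viewpoint can be illustrated with a small sketch. Assuming a hypothetical two-state process whose transition matrix `A`, emission matrix `B`, and initial distribution `PI` are made up for illustration (none come from this paper), the standard forward-algorithm filter below updates the estimated distribution over hidden states as each observable event arrives.

```python
# Hypothetical two-state process: hidden states 0 = "idle", 1 = "active".
# Observable events: 0 = "quiet", 1 = "alert".
A = [[0.9, 0.1],   # A[i][j] = P(next hidden state j | current hidden state i)
     [0.2, 0.8]]
B = [[0.8, 0.2],   # B[i][k] = P(observing event k | hidden state i)
     [0.3, 0.7]]
PI = [0.5, 0.5]    # initial hidden-state distribution

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def filter_states(observations):
    """Forward-algorithm filtering: after each event, return the posterior
    P(hidden state | all events observed so far)."""
    n = len(PI)
    belief = normalize([PI[i] * B[i][observations[0]] for i in range(n)])
    history = [belief]
    for o in observations[1:]:
        # Predict: propagate the current belief through the transition matrix.
        predicted = [sum(belief[i] * A[i][j] for i in range(n)) for j in range(n)]
        # Update: weight each state by how likely it is to emit the new event.
        belief = normalize([predicted[j] * B[j][o] for j in range(n)])
        history.append(belief)
    return history

for t, b in enumerate(filter_states([0, 1, 1, 1])):
    print(t, [round(p, 3) for p in b])
```

With one "quiet" event followed by repeated "alert" events, the estimated probability of the "active" state rises with each new alert, which is the sense in which hidden states are estimated as evidence accumulates.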
Process Query Systems stand in the same relation to this inverse
problem as database management systems (DBMS) do to certain types
of data archival and retrieval problems. Previous publications have
discussed the application of PQS technology to specific problems.
Our current working implementation of a PQS is called TRAFEN,
for TRacking And Fusion ENgine; TRAFEN is to the general concept
of a PQS what Oracle™ or SQL Server™ is to the general concept
of a database management system.
To solve the inverse problem described above, which we call the
discrete source separation problem (DSSP), a Process Query System
must address a variety of subproblems, including:
1) Model derivation and description - How are process
models developed for the various application problems
in which a DSSP arises and how are those models
represented?
2) Model-event scoring - Given a subset of events and a
specific process model, what are effective and efficient
algorithms for producing a metric that captures the extent
to which that process could have produced that event
sequence?
3) Event stream partitioning - What are effective and efficient
algorithms for partitioning the sequence of events
and assigning the resulting subsequences to specific process
models?
4) Event stream gating - How can we efficiently filter events
so that they are evaluated only against process models
that have a reasonable chance of having generated those
events?
5) Solution evaluation - How can we evaluate the robustness
of a solution to the DSSP? That is, what is the analog
of the variance estimates used in traditional statistical
inference?
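Subproblems 2 through 4 can be illustrated together with a deliberately tiny Python sketch. It assumes hypothetical HMM-style process models (`MODELS`, with made-up probabilities), scores a subsequence by its forward-algorithm log-likelihood, gates events by discarding models that assign them zero emission probability, and partitions the stream by exhaustive search under the simplifying assumption of at most one process instance per model. This is a toy, not TRAFEN's algorithms; exhaustive search is exponential in the stream length, which is exactly why subproblems 3 and 4 ask for efficient methods.

```python
import itertools
import math

# Hypothetical process models: initial distribution, transition matrix,
# and per-state emission probabilities over observation strings.
MODELS = {
    "scanner": {"pi": [1.0, 0.0],
                "A": [[0.5, 0.5], [0.0, 1.0]],
                "B": [{"ping": 0.9, "scan": 0.1}, {"scan": 1.0}]},
    "user": {"pi": [1.0],
             "A": [[1.0]],
             "B": [{"login": 0.5, "read": 0.3, "ping": 0.2}]},
}

def log_score(model, obs_seq):
    """Model-event scoring: log P(obs_seq | model) via the scaled forward
    algorithm. An empty sequence scores 0.0 (probability 1)."""
    pi, A, B = model["pi"], model["A"], model["B"]
    n = len(pi)
    alpha, total = list(pi), 0.0
    for t, o in enumerate(obs_seq):
        if t > 0:
            # Predict: propagate the normalized forward variables through A.
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) for j in range(n)]
        # Update: weight each hidden state by its probability of emitting o.
        alpha = [alpha[j] * B[j].get(o, 0.0) for j in range(n)]
        s = sum(alpha)
        if s == 0.0:
            return -math.inf  # the model cannot have produced this sequence
        total += math.log(s)  # log P(o_t | earlier observations, model)
        alpha = [a / s for a in alpha]
    return total

def gate(model, obs):
    """Event gating: True if some hidden state of the model can emit obs."""
    return any(b.get(obs, 0.0) > 0.0 for b in model["B"])

def partition(events):
    """Event stream partitioning by exhaustive search over assignments of
    events to models, maximizing the total log-score. Returns (None, -inf)
    if some event passes no model's gate."""
    names = list(MODELS)
    # Gating prunes models that could not have produced an event at all.
    choices = [[m for m in names if gate(MODELS[m], e)] for e in events]
    best, best_assign = -math.inf, None
    for assign in itertools.product(*choices):
        score = sum(
            log_score(MODELS[m], [e for e, a in zip(events, assign) if a == m])
            for m in names)
        if score > best:
            best, best_assign = score, assign
    return best_assign, best

assignment, score = partition(["ping", "login", "scan", "read", "scan"])
print(assignment, score)
```

In this example, gating leaves only the initial `ping` ambiguous, so the search compares just two partitions; attributing the `ping` to `scanner` yields the higher total score.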
These subproblems are discussed in detail below, and the
reader is encouraged to appreciate their importance and role in
solving the DSSP. Doing so will lead to a better understanding
of the challenges that arise in Process Query Systems and their
implementations as we develop the concepts further.