Next: SOME
DEFINITIONS Up:
RESEARCH
ACTIVITIES of Stavros Previous: Other Research/Scholarly Activities
Theorem [ Kon05itw, Kon07].1 The following problem can be computed in polynomial time:
Input = a finite automaton
Output = the edit (or Levenshtein) distance of the set of words accepted by the automaton
Consequences: A trie
is a finite automaton that represents a finite set of words (a dictionary). A
trellis is a finite automaton that represents a block code. A constraint system
is another term used for automata and finds applications in coding for data
storage. The result applies to tries, trellises, and constraint systems, as
these objects are finite automata. If the distance of a language is >
, for some positive integer
, then the language is error-detecting for the channel that
allows up to
errors (substitutions,
insertions, and deletions) in any word.
Theorem [KKPWX04wseas].2 The following problem can be computed in quadratic time:
Input = a finite automaton
Output = the Hamming distance of the set of words accepted by the automaton
Consequences: as above.
A DNA
word is a word over the DNA alphabet {A,C,G,T} and
represents a DNA molecule in the 5'- to 3'- direction; for example, AGGTC
represents the molecule 5'-AGGTC-3'. The symbol A is complementary to T and the
symbol C to G. We indicate this by writing A=
(T) or T=
(A), and C=
(G) or G=
(C). The Watson/Crick complement of a DNA word
is the word
. If
is sufficiently large, the two
complementary words (molecules) would stick to each other as a result of
forming
bonds between the
complementary
pairs of symbols (nucleotides). A set of DNA words L is called Hamming(0,k)-bond-free
if, for any segments x and y of length k of any two words in L, we have that x
is not complementary to y, that is, x
(y). This definition was introduced in [JoMa04] - the
authors used the term
-k-code.
The
following result
gives an explicit characterization of
all maximal Hamming(0,k)-bond-free languages. It uses the subword closure operation
. Let F be a set of words of
length k. Then the set F
consists of all words w such that
any segment of w of length k is in F.
Theorem [KKSijfcs, KKS04dna10b] A DNA language is maximal
Hamming(0,k)-bond-free if and only if it is of the form S
, where S is any set of DNA words of
length k such that
constitutes a partition of the set of all DNA words of length k that are not
self-complementary =
.
Proposition[KKLSTfi].4.2. Let X be an alphabet containing at least two symbols, let t be an involution, and let k be a positive integer. The set of all hairpin (t,k)-free words over X is finite if and only if one of the following holds:
(a) t is the identity, or
(b) t is the mirror involution and, either k=1, or k>3 and |X|=2.
Solid codes have remarkable synchronization capabilities. The following results provide methods for the construction of large classes of solid codes. If x is any symbol of the alphabet then the quantity tx(w) is the maximum length of a run of x’s in w. The quantity t(w) is the maximal length of any run in w. For example, tb(aabbaaaba) = 2 and t(aabbaaaba) = 3. If f is an increasing and unbounded function of the positive integers into positive integers, then we use f’ for denoting the near-inverse of f.
Theorem[JKL04jalc].4.1. Let X be an alphabet containing at least two symbols a and b, let f be any increasing (not necessarily strictly) and unbounded function of the positive integers into positive integers, let L be any infix code, and let ta‘ and tb‘ be functions mapping words over X into non-negative integers such that ta‘(w) >= ta(w) and tb‘(w)>= tb(w), for every word w. The following language is a solid code
.
An important application of the above construction method is the following. For each positive integer n, let Kn be the solid code obtained when we use the following values for the parameters in the above construction: X={a,b}, f(k)=k and f’(k)=k+1, L = bXn a, where Xn = all words of length n, and ta‘(w) = tb‘(w) = t(w). Thus,
![]()
Proposition[JKL04jalc].6.1 The information rate of Kn tends to 1, as n goes to infinity. Moreover, Kn has linear-time encoding and decoding complexities.
Next: SOME DEFINITIONS Up:
RESEARCH
ACTIVITIES of Stavros Previous: Other Research/Scholarly Activities