# SOME RESULTS (under construction)

Theorem [ Kon05itw, Kon07].1   The following problem can be computed in polynomial time:

Input = a finite automaton

Output = the edit (or Levenshtein)  distance  of the set of words accepted by the automaton

Consequences: A trie is a finite automaton that represents a finite set of words (a dictionary). A trellis is a finite automaton that represents a block code. A constraint system is another term used for automata and finds applications in coding for data storage. The result applies to tries, trellises, and constraint systems, as these objects are finite automata. If the distance of a language is > , for some positive integer  , then the language is error-detecting for the channel that allows up to   errors (substitutions, insertions, and deletions) in any word.

Theorem [KKPWX04wseas].2   The following problem can be computed in quadratic time:

Input = a finite automaton

Output = the Hamming distance of the set of words accepted by the automaton

Consequences: as above.

A DNA word is a word over the DNA alphabet {A,C,G,T} and represents a DNA molecule in the 5'- to 3'- direction; for example, AGGTC represents the molecule 5'-AGGTC-3'. The symbol A is complementary to T and the symbol C to G. We indicate this by writing A=(T) or T=(A), and C=(G) or G=(C). The Watson/Crick complement of a DNA word  is the word . If  is sufficiently large, the two complementary words (molecules) would stick to each other as a result of forming  bonds between the  complementary pairs of symbols (nucleotides). A set of DNA words L is called Hamming(0,k)-bond-free if, for any segments x and y of length k of any two words in L, we have that x is not complementary to y, that is, x (y). This definition was introduced in [JoMa04] - the authors used the term -k-code.

The following result  gives an explicit characterization of all maximal Hamming(0,k)-bond-free languages. It uses the subword closure operation . Let F  be a set of words of length k. Then the set F consists of all words w such that any segment of w of length k is in F.

Theorem [KKSijfcs, KKS04dna10b]   A DNA language is maximal Hamming(0,k)-bond-free if and only if it is of the form S, where S is any set of DNA words of length k such that  constitutes a partition of the set of  all DNA words of length k that are not self-complementary = .

Proposition[KKLSTfi].4.2.  Let  X  be an alphabet containing at least two symbols, let  t  be an involution, and let  k  be a positive integer. The set of all hairpin (t,k)-free words over X is finite if and only if one of the following holds:

(a) t is the identity, or

(b) t is the mirror involution and,   either k=1,   or  k>3 and |X|=2.

Solid codes have remarkable synchronization capabilities. The following results provide methods for the construction of large classes of solid codes. If x is any symbol of the alphabet then  the quantity tx(w)  is the maximum length of a run of x’s in w. The quantity t(w) is the maximal length of any run in w. For example,  tb(aabbaaaba) = 2 and t(aabbaaaba) = 3. If  f  is an increasing and unbounded function  of the positive integers into positive integers, then we use f’  for denoting the near-inverse of  f.

Theorem[JKL04jalc].4.1.  Let  X  be an alphabet containing at least two symbols a and b, let  f  be   any increasing (not necessarily strictly) and unbounded function of the positive integers into positive integers, let  L  be any infix code, and let  ta  and  tb  be functions mapping words over X into non-negative integers such that ta‘(w) >= ta(w) and tb‘(w)>= tb(w), for every word w. The following language is a solid code

.

An important application of the above construction method is the following. For each positive integer n, let Kn be the solid code obtained when we use the following values for the parameters in the above construction:  X={a,b},  f(k)=k and f’(k)=k+1,  L = bXn a, where Xn = all words of length n,  and ta‘(w) = tb‘(w) = t(w).  Thus,

Proposition[JKL04jalc].6.1 The information rate of  Kn  tends to 1, as n goes to infinity. Moreover, Kn  has linear-time encoding and decoding complexities.