# SOME DEFINITIONS

1. * operation

If X is an alphabet then X* is the set of all words (or strings) that can be formed using the symbols of X. For example, if X={a,b} then b, abba, aabbbba are words in X*. The set X* is infinite and contains the empty word. If L is any language (that is a subset of X*) then L* is also a language that consists of the empty word and of any word that can be obtained by concatenating one or more words from L. For example, if L={aa, babb} then the words aababbaa and babb belong to L*.

1. alphabet, word (or string)

Any nonempty set whose elements are intended to be used as symbols is called an alphabet. Normally an alphabet is finite. For example: binary alphabet {0,1}, DNA alphabet {A,C,G,T}. Also {a,b,c} is an alphabet, etc. Let X be an alphabet. Any sequence of symbols in X is called a word or string (over X). For example, abb, acbb, abbbccaa are words (or strings) over {a,b,c}. The empty word is the only word that consists of no symbols. Usually it is denoted with  or . Words can be concatenated: the concatenation of u=abb and v=bcc is the word uv=abbbcc.  Obviously, u = u = u for every word u. The set of all words over X is denoted by X*.

1. channel (combinatorial)

Let X be an alphabet. A (combinatorial) channel g is a binary relation on X* that is domain preserving; that is, g is a set of pairs of words such that if (w,w') is in g then also (w,w) is in g, for all words w and w' in X*. The expression ‘(w,w') is in g’ represents the fact that the word w' can be received via the channel g when the word w is used as input. In general, the channel g is noisy, meaning that w is not equal to w' - then we say that w has been received with errors. The requirement that (w,w) is in g ensures that error-free communication is possible. This channel definition is given in [Kon02jucs] and is based on a more general channel definition in [JuKo97codes]. These definitions are inspired by the work of Levenshtein in [Le66], where codes for certain combinatorial channels are studied.

1. code (uniquely decodable)

A language K is called a uniquely decodable code (or simply a code), if every word w in K* can be written in the form  w = x1xn , where each xi  is in K, in exactly one way. For example, {aa, bb, aabbbb} is not a code because  (aa)(bb)(bb)(bb) = (aabbbb)(bb). On the other hand every prefix code is a code.

1. edit distance (or Levenshtein distance)

The edit distance between two words u and v is the smallest number of errors in an edit string that can be used to transform u into v. Let’s denote with dist(u,v) this distance. Then dist(bbbb,ababb) = 2, because we can transform bbbb into ababb by inserting an a in front of bbbb and substituting the 2nd b with a, that is, using the edit string (/a)(b/b)(b/a)(b/b)(b/b).

1. edit operation, edit string, error, weight of edit string

Let X be an alphabet of (ordinary) symbols and let  be the empty word. The alphabet E of the edit operations (over X) consists of all symbols (x/y) such that each of x,y is in X or is , and at least one of x and y is in X, that is, nonempty. If (x/y) is an edit operation and x is not equal to y then we call (x/y) an error [KaKo04jalc]. Any word in E* is called an edit string (or e-string). We write (/) for the empty edit string. For example, h =  (/a)(b/b)(b/a)(b/b)(b/) is an edit string. The input part of h is the word bbbb and the output part of h is the word abab. The edit string h says that we can transform the word bbbb into abab using one insertion (insert a in front of bbbb), one substitution (substitute the 2nd b with a) and one deletion (delete the last b). The expression weight(h) denotes the number of errors in the edit string h.

1. error-correction (combinatorial)

In information transmission we fix a communications language L whose words are transmitted over a channel g. In this context, the following problem is important: Suppose that a word w' is received via the channel g - possibly w' has been received with errors, that is, w' is not equal to the transmitted word. We want to be able to compute the transmitted word w of L by looking only at w'. This is possible when no other word z of L can also result into w' via the channel g. This is ensured when the language L is error-correcting for g.

Definition [Kon02jucs]  A language L is called error-correcting for some (combinatorial) channel g if, for any words w and z of , whenever (w,w') is in g and z is not equal to w then (z,w') is not in g.

In this defintion,  is the language together with the empty word. The use of the empty word ensures that a word w'cannot be received by both the empty word (=no transmission) and by a nonempty word of.

1. error-detection (combinatorial)

In information transmission we fix a communications language L whose words are transmitted over a channel g. In this context, the most basic concern is the following: Suppose that a word w' is received via g and w' is in L. Is w' the same as the word of L that was sent into g? or is it the case that another word w of L was sent but, due to channel errors, the word w' was received? If the language L is error-detecting for g then this cannot happen.

Definition [Kon02jucs]  A language L is called error-detecting for some (combinatorial) channel g if, for any words w and w' with w in , whenever (w,w') is in g and w' is in  then w'=w.

In this defintion,  is the language L together with the empty word. The use of the empty word ensures that no word of L can be completely erased by g, and that no word of L can be received via g when the empty word is transmitted.

1. hairpin free word

Let t be an involution and let k be a positive integer. A word  w  is called hairpin (t,k)-free if  w  does not contain two separate segments  v  and  t(v) of   length k; that is,  w  cannot be written in the form  xvyt(v)z  with v being of length k.

1. infix code

A set of words (language) K is an infix code if  no word of K is properly contained in another word of K. For example, the language {abbaa, baa} is not an infix code, as the word  baa  is contained in  abbaa. The language {ababb, aabb} is an infix code.

1. information rate

Let K be a finite code over some alphabet X. Let l(K) be the average length of the words in K and let |K| be the number of words in K. The information  rate of K is the quantity  log|K| / l(K), where the base of the logarithm is equal to the size of the alphabet X. The closer the rate is to 1 the more efficient it is to encode information using the code – here efficiency refers to the length of messages in K* that are used to encode information.

1. involution

Let X be an alphabet. An involution, [KKT02], is a function  t: X à X  such that, if  x  is a symbol in X, then  t(x) is also in X. Moreover, the involution is either morphic, or antimorphic. It is morphic when t(x1xn) = t(x1)…t(xn), for all words x1xn. It is antimorphic when t(x1xn) = t(xn)…t(x1). The mirror involution is the antimorphic involution t such that  t(x1xn) = xn…x1,  for all words x1xn.

1. language

Let X be an alphabet. A language is any set of words; that is, any subset of X*. A language can be finite or infinite.

1. near-inverse of a sequence

Let  s = (s(n))  be a sequence of positive integers, where n=1,2,3,…, such that  s  is monotonically increasing (not necessarily strictly) and unbounded. The near-inverse  of  s  -- called (∞,∞)-inverse in [JuKo06ijcm] --  is the monotonically increasing and unbounded function g, say, such that

g(i) = min {j : s(j) > i}   and   s(j) = min {i : g(i) > j }.

The fact that g is unique is shown in Theorem [JuKo06ijcm].3.1. The term near-inverse is justified by Theorem [JuKo06ijcm].4.1, which states that for any strictly increasing and continuous real function  ŝ  with     for all n  -- such an ŝ  always exists and is called a continuous approximation of s -- one has that   g(i) = 1 + .  The term near-inverse is also defined when at least one of  s and g  is finite [JuKo06ijcm].

1. prefix code

A set of words (language) K is a prefix code if  no word of K is a prefix of another word of K. For example, the language {abbaa, abb} is not a prefix code, as the word  abb  is a prefix of  abbaa. The language {ababb, aab} is a prefix code. See also the dual notion of suffix code.

1. solid code

A set of words (language) K is a solid code [ShYu90,JuYu90,Lam01, JKK01, JKL04], or code without overlaps [Le70,Bal02], if  (a) no word of K is properly contained in another word of K – thus, K is an infix code – and (b) no prefix of a word in K is a suffix of a word in K (unless the prefix is the empty word). For example, the language {abbaa, baa} is not a solid code, as the word  baa  is contained in  abbaa. Similarly, {abbaaabbab, …anything here….} is not a solid code, as  ab is both a prefix of a word and a suffix of a word. The language {ababb, aabb} is a solid code. Solid codes have the following remarkable synchronization property: click here.

1. suffix code

A set of words (language) K is a suffix code if  no word of K is a suffix of another word of K. For example, the language {abbaa, baa} is not a suffix code, as the word  baa  is a suffix of  abbaa. The language {ababb, aab} is a suffix code. See also the dual notion of prefix code.

1. uniquely decodable code: see code

1. word (see alphabet, word)