Wednesday 28 November 2012

Introduction to Automata Theory

Here, we introduce the field of automata theory.

The most general notion of abstract automaton is called a state transition machine (STM), a formalism used to model machines or computations by a 3-tuple (L, S, T):
  • L is a linguistic structure symbolising the input that the machine will manipulate, as well as any possible output.
  • S is a state structure with a set of states Q, a single starting state in Q and a set of final states in Q.
  • T is an action structure containing a set of labels and a transition function, modelling the behaviour of the machine. That is, given a state in Q and specific constraints from a label (input/output symbols, costs, capacities, probabilities, etc.), the function will derive which states the machine should switch to. 
  Note that there are countably many ways to define an STM, and the literature on STMs is relatively sparse and inconsistent. Suffice it to say that STMs are a blueprint for constructing any form of automata.

   On the other hand, there are uncountably many different types of automata and, much to our despair, there is no unifying classification for all of them...

   Automata can then be organised by which machinery or dynamic system they are defined to model (cells - cellular automata, neurons - neural nets, computation - Turing machines, stochastic languages - probabilistic automata, etc.); or by the extent of their computational power, which is reflected by how many algorithms they can simulate or, equivalently, how many languages they can accept:
  • Finite State Machines/Automata (FSM/FSA) accept regular languages.
  • Pushdown Automata (PDA) accept context-free languages.
  • Linear Bounded Automata (LBA) accept context-sensitive languages.
  • Turing Machines (TM), including quantum TMs, accept recursively enumerable languages.
  • Stream Automata (\omega-Automata) accept \omega-languages.
  • Inductive Turing Machines (ITM) accept super-recursive languages.
  In the following segment we will focus on finite-state automata, that is: machines with string input that parse their input character by character, changing states according to their transition function, and eventually halt (since strings are finite); the string is accepted if the automaton halts in an accepting state and rejected otherwise.

An FSA is a 5-tuple (Q, \Sigma, \delta, q_0, F), where:
  • Q is a finite set of states
  • \Sigma is a finite alphabet
  • \delta is a transition function: Q x \Sigma -> Q or P(Q) (where P(Q) is the powerset of Q)
  • q_0 is the start state
  • F is the set of accepting states
FSAs are characterised by having finite memory that depends only on the finite number of states the automaton can reach. They differ in the form of their transition function:
  • Deterministic FSA (DFSA): \delta : Q x \Sigma -> Q
    • Moore machine: a DFSA with output, whose output function \lambda : Q -> \Gamma depends only on the current state
    • Mealy machine: a DFSA with output, whose output function \lambda : Q x \Sigma -> \Gamma depends on both the current state and the input symbol
  • Non-deterministic FSA (NFSA): \delta : Q x \Sigma -> P(Q)
  FSAs can only be in a single state at once (like most non-quantum automata): the state a DFSA is in is deterministic (we know which state it is in with probability 1), whereas the state an NFSA is in is non-deterministic (we do not know which state it is in; it could be any state within the element of P(Q) returned by \delta). An NFSA accepts an input if it could be in an accepting state after reading it.
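
  To make this concrete, here is a minimal sketch in Python (the helper names dfa_accepts/nfa_accepts and the example machine are our own illustration, not a standard API): the DFSA follows exactly one state, while the NFSA tracks the whole set of states it could be in.

def dfa_accepts(delta, q0, F, w):
    q = q0
    for c in w:
        q = delta[(q, c)]            # total function Q x Sigma -> Q
    return q in F

def nfa_accepts(delta, q0, F, w):
    states = {q0}                    # all states the NFSA could be in
    for c in w:
        states = {r for q in states for r in delta.get((q, c), set())}
    return bool(states & F)          # accept if it COULD be in an accepting state

# Example NFSA over {a, b} accepting strings whose second-to-last symbol is 'a':
nd = {('q0', 'a'): {'q0', 'q1'}, ('q0', 'b'): {'q0'},
      ('q1', 'a'): {'q2'},       ('q1', 'b'): {'q2'}}
print(nfa_accepts(nd, 'q0', {'q2'}, 'abab'))   # True
print(nfa_accepts(nd, 'q0', {'q2'}, 'abb'))    # False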

  Notice that the transition function of an NFSA differs in that its range is the powerset of the range of a DFSA's; thus it is possible to model any NFSA with a DFSA (the reverse direction is trivial), but doing so may require a number of states bounded by the cardinality of the powerset of the NFSA's states - recall that we proved this bound to be 2^{#states in NFSA}. Hence non-determinism is advantageous for complexity reasons (rather than computability reasons).
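
  The translation itself is the classic subset construction; a rough sketch (the function name is our own), reusing the NFSA above, where every reachable subset of Q becomes a single DFSA state:

def subset_construction(delta, q0, F, alphabet):
    start = frozenset({q0})
    dstates, worklist, ddelta = {start}, [start], {}
    while worklist:
        S = worklist.pop()
        for c in alphabet:
            # The DFSA state reached from subset S on symbol c:
            T = frozenset(r for q in S for r in delta.get((q, c), set()))
            ddelta[(S, c)] = T
            if T not in dstates:
                dstates.add(T)
                worklist.append(T)
    dF = {S for S in dstates if S & F}   # subsets containing an accepting state
    return dstates, ddelta, start, dF

dstates, _, _, _ = subset_construction(nd, 'q0', {'q2'}, 'ab')
print(len(dstates))   # 4 reachable states here, within the 2^3 = 8 bound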

  There is a one-to-one correspondence between the languages that can be formed from a regular expression and the languages that can be accepted by an FSA; namely, these two classes of languages coincide - they are the class of regular languages:
  •  prove L(R) \subseteq L(M): use the recursive definition of regular expressions to show, by construction, that an FSA can be built to accept the language induced by each base case and each inductive case.
  •  prove L(M) \subseteq L(R): use state elimination on an arbitrary FSA: show how a regular expression can express any intermediary state of the FSA, and thus how any FSA can be reduced to a single start state with regular-expression transitions to accepting states. The union of all these REs is the regexp generating the language the FSA accepts.
  •  conclude that L(M) = L(R), that is: REs and FSAs are equally suitable to generate/accept any regular language.

  We have provided a small foundation in automata theory, which gives us powerful formalisms to model computations. We also had a brief glance at the standard formalism of Computation, Turing Machines, which are based on automata (using the same STM construct). We tentatively extrapolate our remark about FSAs being as powerful a formalism as REs to TMs being as powerful a formalism as another formalism for Computation, the Lambda-Calculus.

Monday 26 November 2012

Introduction to Formal Language Theory

Here, we introduce the field of formal language theory.

There are 4 types of languages:

  • natural language - language arising naturally within societies
  • constructed language - language consciously defined within societies
  • artificial language - language arising from artificial situations
  • formal language - language axiomatically defined
We now use set theory to construct the theory of formal languages.

  An alphabet is a set \Sigma whose elements are symbols (characters). We define the object of string (word), akin to tuple, by introducing the concatenation operation o on symbols and strings (the concatenation of symbols (or strings) a, b in \Sigma, a o b = ab is a string).
  For this new type of object, we define empty string \epsilon (akin to empty set), length (akin to cardinality), substring relation (akin to subset relation), and string exponentiation (repeated concatenation).
  Finally, we define a new set operator that makes sense on alphabets or sets of strings, the Kleene star * (akin to infinite powersetting): an operator on a set S defined inductively as constructing a first 'power' by forming all possible concatenations of elements of S U {\epsilon} (hence it will properly contain S and \epsilon), then constructing all concatenations within that new set, and so on, iterating infinitely. Then, if S is a finite set of symbols or strings, S* will be a countably infinite set closed under o, namely the set of all possible words that can be made by joining together elements of S.

  We can now define a language L over an alphabet \Sigma as the subset of \Sigma* that contains all valid words in the language. Moreover, all usual set theoretic operations still hold and we can extend the string operations of concatenation and exponentiation to languages.
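
  A quick sketch of these extended operations (languages as Python sets of strings; the helper names and the length cutoff are our own devices, since the true Kleene star is infinite):

def concat(L1, L2):
    return {u + v for u in L1 for v in L2}       # the o operation on languages

def power(L, n):
    return {''} if n == 0 else concat(power(L, n - 1), L)   # L^n

def kleene_star(L, max_len):
    # Enumerate L* only up to words of length max_len.
    result, n = set(), 0
    while True:
        layer = {w for w in power(L, n) if len(w) <= max_len}
        if not layer and n > 0:
            break
        result |= layer
        n += 1
    return result

sigma = {'a', 'b'}
print(sorted(kleene_star(sigma, 2)))   # ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb']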


  This paragraph will be a short digression. Notice that in a language all words are finite, so, in order to avoid insulting Cantor's teachings, we also define an \omega-language over \Sigma as a subset of \Sigma^\omega (constructed by infinitely combining elements of \Sigma). Also, we can denote \Sigma^\infty = \Sigma* U \Sigma^\omega, the set of all finite and infinite words over \Sigma.

  To start categorizing formal languages, we first need a formalism to describe the syntax of a language: the concept of (generative) formal grammars. We informally define a formal grammar as the set of production rules necessary to construct all the words in a language; it thus formalizes the language's syntax.
  Then we partition the set of all formal languages into 2 major kinds: recursively enumerable and non-recursively-enumerable languages - notions central to computability/recursion theory. Suffice it to say that to build a non-recursively-enumerable language, one has to use non-straightforward techniques like Cantor's diagonalization, which are beyond the reach of formal grammars' production rules.
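
  To see production rules constructing words in action, here is a toy sketch (the generate helper is our own illustration) expanding the grammar S -> aSb | \epsilon, whose rules construct exactly the words of {a^n b^n : n >= 0}:

def generate(max_depth):
    # Expand the grammar S -> aSb | epsilon up to a bounded depth.
    words = set()
    def expand(form, depth):
        if 'S' not in form:
            words.add(form)              # no nonterminals left: a finished word
        elif depth < max_depth:
            expand(form.replace('S', 'aSb', 1), depth + 1)   # rule S -> aSb
            expand(form.replace('S', '', 1), depth + 1)      # rule S -> epsilon
    expand('S', 0)
    return sorted(words, key=len)

print(generate(4))   # ['', 'ab', 'aabb', 'aaabbb']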

The Chomsky hierarchy provides a way to categorize formal grammars:

  • Unrestricted grammars generate recursively enumerable languages. This class includes all formal grammars and contains properly all context-sensitive grammars.
  • Context-sensitive grammars generate context-sensitive languages. This class contains properly all context-free grammars.
  • Context-free grammars generate context-free languages. This class contains properly all regular grammars.
  • Regular grammars generate regular languages. This class of grammars has the same expressiveness as regular expressions, that is both are equivalently suitable to generate the class of regular languages. 
  We now introduce regular expressions as a convenient, compact notation for representing regular grammars and regular languages. That is, a given regular expression will provide a way to construct/describe all words of a given regular language; the procedure used is known as pattern matching.
  Given a finite alphabet \Sigma, the set of regular expressions RE is defined recursively as the smallest set satisfying:
Basis: empty set, empty string and any string literal (symbol) in \Sigma belong to RE.
Inductive construction: if R,S belong to RE, then R+S (alternation), RoS (concatenation, abbreviated RS) and R* (Kleene star) belong to RE.

  Thus, for a given regular language, we can construct a regular expression - a pattern from which every word in the language can be generated. The string operations o and * carry over to this concept of pattern, as does alternation (+), the analogue of set union: it matches a string that, at a given position within it, matches either of the disjuncted regexps.

  Procedurally: the regexp A*(R+S)BB will match any string that starts with 0 to n literals 'A', followed by a pattern matching either R or S, and ending with 2 literals 'B'; for example, if \Sigma = {A, B, C}, R = C* and S = CA, the strings BB, AAABB, CCCBB, AAACBB, CABB, AACABB, etc. are in the language L(A*(C*+CA)BB).
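
  One can sanity-check this with Python's re module ('|' is its syntax for alternation, so the pattern reads A*(C*|CA)BB):

import re

pattern = re.compile(r'A*(C*|CA)BB')
for s in ['BB', 'AAABB', 'CCCBB', 'AAACBB', 'CABB', 'AACABB', 'ABAB']:
    print(s, bool(pattern.fullmatch(s)))   # True for all but 'ABAB'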

  We provided a brief basis in formal language theory, the field of study concerned with the syntactical aspects of formalised language. This basis is essential to domains of TCS such as programming language theory, in which words of a formal language are given specific semantics, and computational complexity theory and mathematical logic, where this formalism is chosen to express decision problems and logical statements.

Wednesday 7 November 2012

Divide and Conquer

Here, we develop results for divide and conquer algorithms.

  The divide and conquer strategy consists of dividing a problem into subproblems until these become trivial to solve, then combining the results iteratively (bottom-up) to provide the solution to the original problem. The paradigm is very efficient for certain kinds of problems where that design makes sense: sorting (quicksort, mergesort), searching (binary search), multiplication (Karatsuba, Strassen), etc. This strategy is naturally implemented using recursion: if the problem is trivial, solve it; otherwise, make recursive calls to solve the subproblems and combine their return values in a specific way.
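
  As an illustration, here is a sketch of mergesort, a textbook instance of the recipe (trivial case: lists of length at most 1; combine step: merging two sorted lists):

def mergesort(xs):
    if len(xs) <= 1:                  # trivial subproblem: already sorted
        return xs
    mid = len(xs) // 2
    left = mergesort(xs[:mid])        # solve the first half recursively
    right = mergesort(xs[mid:])       # solve the second half recursively
    return merge(left, right)         # combine the two partial solutions

def merge(left, right):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

print(mergesort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]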

  The time complexity (the function describing the amount of time the algorithm takes to terminate depending on the size of its input) of such algorithms depends on the number of recursive calls, the time each recursive call takes, and the time complexity of the function that combines their return values. Hence, the time complexity takes the form of a recurrence relation: for the trivial cases the time is fixed; for the nontrivial cases, the running time on an input of size n depends on the time on an input of size n' < n. To obtain a closed form (depending only on the size of the input) for the time complexity, we use the method of repeated substitution.

  Since the validity of the obtained closed form relies on the assumptions made on the input (e.g. that its size is a power of 2), there are several more steps to take in order to prove the result for all inputs: since we are interested in the time behaviour, we will actually end up using that closed form to set a bound on the time complexity. We can prove inductively that T(n) is a non-decreasing function, then use the closed form to set a lower bound, upper bound and/or tight bound on T. For convenience:

Recall the asymptotic complexity bounds (big O notation):
Upper asymptotic bound: T(n) is in O(f(n)) iff there exist c > 0 and n_0 such that T(n) <= c*f(n) for all n >= n_0
Tight asymptotic bound: T(n) is in \Theta(f(n)) iff T(n) is in both O(f(n)) and \Omega(f(n))
Lower asymptotic bound: T(n) is in \Omega(f(n)) iff there exist c > 0 and n_0 such that T(n) >= c*f(n) for all n >= n_0

  A generalisation of deriving asymptotic bounds for the recurrence relations arising as time complexities of divide and conquer algorithms is the Master Theorem. Given a relation of the form T(n) = aT(n/b) + f(n) for n bigger than a given threshold and f(n) in \Theta(n^d), where a constitutes the number of recursive calls, b is the dividing factor and f(n) represents the time complexity of the dividing and recombining procedure, the master theorem will provide a tight bound for T as follows:
  • if a < b^d then T(n) in \Theta(n^d)
  • if a = b^d then T(n) in \Theta(log(n)*n^d)
  • if a > b^d then T(n) in \Theta(n^(log_b a))
 Notice that the theorem assumes a relation with a constant b, i.e. the subproblems must be of roughly the same size. There is another generalisation of this procedure for deriving bounds on the time complexity of divide and conquer recurrences that can be used when the subproblems have different sizes: the Akra-Bazzi method.
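
  A small sketch applying the three cases mechanically (the helper is our own, with the familiar examples as sanity checks):

import math

def master_theorem(a, b, d):
    # T(n) = a*T(n/b) + Theta(n^d): compare a against b^d.
    if a < b ** d:
        return f"Theta(n^{d})"
    if a == b ** d:
        return f"Theta(n^{d} * log n)"
    return f"Theta(n^{math.log(a, b):.3f})"   # exponent log_b(a)

print(master_theorem(2, 2, 1))   # mergesort:     Theta(n^1 * log n)
print(master_theorem(1, 2, 0))   # binary search: Theta(n^0 * log n), i.e. Theta(log n)
print(master_theorem(3, 2, 1))   # Karatsuba:     Theta(n^1.585)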

Wednesday 31 October 2012

Recursion

Here, we briefly discuss recursion.

To understand recursion one must first understand recursion

  Structurally, recursive definitions are common for building recursively defined sets (e.g. N: 1 in N and [n in N => n+1 in N]); functionally, for building recursively defined functions.

  These are alternatively referred to as inductively defined functions, but we will prefer the former term. The Recursion Theorem states their existence and uniqueness; their general form (after http://www.cs.toronto.edu/~vassos/b36-notes/) is, roughly: given a value c and a function g, there exists a unique function f on N such that f(0) = c and f(n+1) = g(f(n), n).

  In many cases it is hard to derive a closed form for such functions; however, for the special case of functions defined by simple recursion (where there is a single recursive call in the recurrence relation), a method called repeated substitution can be applied. We give its outline:

Let k be large enough, consider the recursively defined function:    T(k) = T(h(k)) + n_k
> Substitute T(h(k)) recursively in the equation:
    T(k) = T(h(h(k))) + n_{h(k)} + n_k
> Then apply arithmetic rules to simplify it
> Repeatedly iterate the process until a pattern emerges (not always the case!), then write the general formula for the ith iteration. In that form, substitute the input k by a value that eliminates the recursive call by reducing it to the base case. Now the closed form can be proved valid using induction. Note that sometimes it might be necessary to make specific assumptions on k (if h(k) = k/2 then we might want to consider k to be a power of 2) in order to make progress.


We provide a working example:
open form       : T(n) = 2T(n/2) + n   (n > 1; T(1) = 1)
let n=2^k       : T(2^k) = 2T(2^(k-1)) + 2^k
1st sub.        : T(2^k) = 2^2 * T(2^(k-2)) + 2 * 2^k
2nd sub.        : T(2^k) = 2^3 * T(2^(k-3)) + 3 * 2^k
3rd sub.        : T(2^k) = 2^4 * T(2^(k-4)) + 4 * 2^k
gen. ith form   : T(2^k) = 2^i * T(2^(k-i)) + i * 2^k
let i=k         : T(2^k) = 2^k * T(1) + k * 2^k
elim rec. call  : T(2^k) = 2^k + k * 2^k
n=2^k => k=log_2(n) : T(n) = n + n*log_2(n)
closed form     : T(n) = n*log_2(n) + n (when n is a power of 2)
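
  The closed form is easy to sanity-check numerically (a throwaway script comparing the open form against n*log_2(n) + n on powers of 2):

import math

def T(n):                              # the open (recursive) form
    return 1 if n == 1 else 2 * T(n // 2) + n

for k in range(5):
    n = 2 ** k
    closed = n * math.log2(n) + n      # the derived closed form
    print(n, T(n), closed)             # both columns agree on powers of 2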

  In algorithm design, recursion is a powerful tool applied in many paradigms (strategic approaches to problems), the best known being the divide and conquer method.

Wednesday 24 October 2012

Correctness

Here, we look at program correctness.

  A program can be viewed as a contract binding a precondition to a postcondition; let Pre(X) be the predicate denoting the precondition, where X is the input of the program, and Post(X) the predicate denoting the postcondition. Then a program is correct if: for all X satisfying Pre(X), Post(X) holds after the program terminates.


  In the context of iterative algorithms, this leads to the notion of a loop invariant with respect to these predicates. Hence, to prove total correctness of an iterative algorithm, we need to prove termination and partial correctness (the loop invariant holds). Both of these are proved inductively, separately.
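
  For instance, a sketch of iterative summation with its loop invariant checked by assert statements (the invariant and helper are our own illustration):

def iterative_sum(xs):
    # Precondition: xs is a list of numbers.
    total, i = 0, 0
    while i < len(xs):
        # Loop invariant: total == xs[0] + ... + xs[i-1]
        assert total == sum(xs[:i])
        total += xs[i]
        i += 1
    # Termination: i grows by 1 each pass and is bounded by len(xs).
    # Invariant + exit condition (i == len(xs)) yield the postcondition:
    assert total == sum(xs)
    return total

print(iterative_sum([3, 1, 4, 1, 5]))   # 14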


  To prove the correctness of recursive algorithms, we define a predicate stating their total correctness (that is, that the algorithm terminates on valid input and that Pre(X) => Post(X) when it does). We then prove this predicate inductively; hence termination and partial correctness are proved jointly.
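
  Correspondingly, a recursive sketch where a single inductive claim covers both parts at once (our own toy example), with P(n): "fact(n) terminates and returns n!":

def fact(n):
    # Base case: P(0) holds directly.
    if n == 0:
        return 1
    # Inductive step: the recursive call is on n-1 < n, so it terminates and
    # returns (n-1)! by the induction hypothesis; hence we return n! here.
    return n * fact(n - 1)

print(fact(5))   # 120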

  It should be noted that termination analysis is not always possible because of the undecidability of the Halting Problem (given an arbitrary program and input, decide whether it halts). The existence of such undecidable problems demonstrates a 'hole' in everything our field (CS) hopes to be able to compute, an incompleteness very much related to Gödel's theorems.

Monday 15 October 2012

Generalized Induction

Here, we want to generalise the notion of induction.
 

  Mathematical induction (simple and complete) works by exploiting the well-ordering principle (of the natural numbers). A first (natural) generalisation would be to extend the notion to any well-ordered set, that is: a totally ordered set (a set with a total order) such that any non-empty subset of it has a smallest element. This generalisation is called Transfinite Induction, and it works by exploiting the well-orderedness of a set (furthermore, both are equivalent - the same way mathematical induction is equivalent to the well-ordering principle).

  I suppose that the name derives from its most useful / original purpose: induction on transfinite numbers, such as transfinite ordinals (infinite numbers that describe position) and transfinite cardinals (infinite numbers that describe size, such as aleph_naught = |N| that we mentioned last time). Together ordinals and cardinals form a (set-theoretic) extension of the natural numbers (to describe 'relative positioning of elements' and size of infinite sets); and whereas mathematical induction works on finite ordinals (= finite cardinals = natural numbers), it fails on infinite ords and cards.


  Most proofs by transfinite induction are thus limited to the realm of set theory, but there are some non set-theoretic proofs too! Check out http://www.mathnerds.com/best/Mazurkiewicz/index.aspx to learn if it is possible to construct a subset of the plane such that every line intersects that subset exactly twice!

  Another way to extend the principle of induction would be to look at a stronger (= broader) notion of well-ordering. Since a well-order is a well-founded total order, we might want to include well-founded partial orders as part of our generalisation. This generalisation is called Well-founded Induction (and is the most general we can get). This is a quote from Wikipedia about it:

  "As an example, consider the well-founded relation (N, S), where N is the set of all natural numbers, and S is the graph of the successor function x → x + 1. Then induction on S is the usual mathematical induction, and recursion on S gives primitive recursion. If we consider the order relation (N, <), we obtain complete induction, and course-of-values recursion. The statement that (N, <) is well-founded is also known as the well-ordering principle.
There are other interesting special cases of well-founded induction. When the well-founded relation is the usual ordering on the class of all ordinal numbers, the technique is called transfinite induction. When the well-founded set is a set of recursively-defined data structures, the technique is called structural induction. When the well-founded relation is set membership on the universal class, the technique is known as epsilon-induction. See those articles for more details."
-- http://en.wikipedia.org/wiki/Well-founded

  Hence, Structural Induction is concerned with a subset of that generalisation, namely structures with a well-founded partial order (coming from the fact that they are recursively defined, e.g. the Fibonacci numbers). Thus, in structural induction, we prove that a property holds for the smallest structures, and that it holds for a larger structure whenever it holds for its substructures, to conclude that it holds for any structure of that type.


  Further self-questioning: which generalisation seems the most natural to you? Why does transfinite induction seem like a more natural generalisation than structural induction even though SI is clearly more intuitive than TI? Does a total order exist on the set of all generalisations of induction such that in any subset of them there is a least 'natural' element?

Wednesday 26 September 2012

Mathematical Induction

Here, we examine a fundamental property of discrete mathematics.

  At the heart of theoretical CS and discrete math is the set of natural numbers. A core structural aspect of the set is the principle of well-ordering, that is that every non-empty subset has a smallest element. It is thus possible to define a total order over the natural numbers, and this leads us to mathematical induction: we can use the ordering to prove statements inductively.

  Simple induction consists of proving a property for a base case (the smallest number concerned) and for an inductive step (if the property holds at a given number, it holds at the number one increment greater). Using the fact that the natural numbers are well-ordered, we can deduce properties over a countably infinite set!

  Complete induction is similar, but its inductive step requires assuming the property for every number smaller than the given one.

We can prove very interesting (structural) properties thanks to mathematical induction, such as the cardinality of a powerset:

for all n in N, P(n): "for all s, |s| = n => |Power(s)| = 2^n"
> Base case: |Power({})| = |{{}}| = 1 = 2^0 => P(0)
> Inductive step:  
    Assume n in N.
      Assume P(n)
        Assume S such that |S| = n+1
          Let X and t such that X = S\{t} (i.e. S = X U {t})
            Then |X| = n
          Partition Power(S) as follows: (1)
            Let S- = {Y in Power(S) | t not in Y}
            Let S+ = {Y in Power(S) | t in Y}
              Then S- = Power(X)
                Then |S-| = |Power(X)| = 2^n (2)
                # because of the induction hypothesis
          Let f: S-   ->   S+
                 x   |->   x U {t}
          We can easily show that this function is a bijection by
          proving by contradiction that it is one-to-one and by
          contradiction that it is onto
            Then f(S-) = S+
              Then |S-| = |S+| (3)
          From (1), (2) and (3): |Power(S)| = |S-| + |S+| = 2^n + 2^n = 2^(n+1)
        We generalize: for all s, |s| = n+1 => |Power(s)| = 2^(n+1)
      Thus P(n) => P(n+1)
    We generalize: for all n, P(n) => P(n+1)
> Conclusion: By the principle of SI:
              [P(0) and (for all n, P(n) => P(n+1))] => for all n, P(n)
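
  The result is easy to check empirically with itertools (a throwaway verification script):

from itertools import combinations

def powerset(s):
    return [set(c) for r in range(len(s) + 1) for c in combinations(s, r)]

for n in range(5):
    s = set(range(n))
    print(n, len(powerset(s)), 2 ** n)   # |Power(s)| matches 2^n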

   This result concerns finite sets, but the exponent notation extends to infinite cardinalities as well. For instance, Cantor showed that the cardinality of R is the same as the cardinality of the power set of N (there is a bijection from one to the other), hence we can get a superficial idea of how big the continuum is in relation to the cardinality of N: |R| = 2^|N|

Monday 24 September 2012

Introduction to Lexlog: TCS

Welcome to Lexlog: Theoretical Computer Science

About: I am an undergraduate student in the department of Computer Science at the University of Toronto.

Purpose: A strong background in theoretical computer science is fundamental for the computer scientist. This blog is intended to record my progress in learning this foundation.

Overview: The major fields encompassed under Theoretical CS are:


> Theory of Computation
  • Computability Theory (recursion theory)
  • Computational Complexity Theory
  • Automata Theory
> Algorithm Design and Analysis
 
> Programming Language Theory

> Distributed Computing

> Information Theory

> Formal Methods

Other related fields are discrete mathematics, graph theory, cryptography and formal language theory, to name a few.