Saturday, 8 March 2014

Introduction to Algorithms

Here, we briefly introduce the two interrelated fields of Algorithm Design and the Analysis of Algorithms.

Algorithm Design is a framework for studying algorithmic design paradigms: which types of problems they can solve optimally, which data structures they can use, which algorithms are robust enough in nondeterministic settings, and so on. The Analysis of Algorithms is the subfield of complexity theory that provides a framework for studying the running time and correctness of algorithms.

Here is an attempt to organise algorithms according to some important properties:

  • recursive/iterative  | not using looping
  • serial                       | parallel (= concurrent)
              • "classical" parallel
              • distributed
  • exact                        | approximate
  • deterministic            | nondeterministic*
  • not using randomness | probabilistic (= randomized)
              • Monte Carlo (ND solution)
              • Las Vegas (ND runtime)
  • not using heuristics     | heuristic based*
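
To make the Monte Carlo / Las Vegas distinction above concrete, here is a minimal sketch. The function names, the toy search problem, and the `max_probes` parameter are assumptions made for illustration, not standard definitions. The Las Vegas version always returns a correct answer but takes a random number of probes; the Monte Carlo version has a fixed probe budget but may return a wrong answer (here: "not found" when the target is in fact present).

```python
import random

def las_vegas_find(items, target, rng):
    """Always returns a correct index; the number of probes is random.

    Assumes target occurs at least once, otherwise this loops forever.
    """
    while True:
        i = rng.randrange(len(items))
        if items[i] == target:
            return i

def monte_carlo_find(items, target, rng, max_probes=5):
    """Uses at most max_probes probes; may return None even if target exists."""
    for _ in range(max_probes):
        i = rng.randrange(len(items))
        if items[i] == target:
            return i
    return None

rng = random.Random(0)           # seeded only to make runs repeatable
items = ["a", "b"] * 8           # half of the entries are "a"
lv = las_vegas_find(items, "a", rng)
mc = monte_carlo_find(items, "a", rng)
```

With half of the entries matching, the Monte Carlo version misses with probability (1/2)^max_probes, so a larger budget buys a smaller error at the cost of more runtime.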

Here is a classification by design paradigm:
  • Brute Force - Try every candidate solution in the search space until a solution is found.
  • Divide and Conquer - Break the problem down to its most trivial subproblems and recombine solutions bottom-up.
  • Dynamic Programming - Remember solutions to some subproblems in a table so there is no need to recompute them.
  • Greedy Method - Make the locally best choice at each step, ordering candidates in an informed way to avoid exploring the entire search space.
  • Linear Programming - Optimization of a function given linear constraints.
  • Other Search/Enumeration/Optimization design paradigms
    • Backtracking - Depth-first search on the search space (a tree), exploring only the branches where solutions can still lie.
    • Branch and Bound - Special backtracking where bounding is a bit more sophisticated.
    • Hill Climbing - Make a move in the search space and see if it improves the objective function and keep on going in that direction.
    • Local Search - Move to the neighbour in the search space that improves the objective function the most.
    • Tabu Search - Local Search that avoids (by prior or learning) tabu areas in the search space.
    • Simulated Annealing - Probabilistic exploration of the search space in which the randomness of the decisions (the "temperature") is decreased slowly over time.
    • Genetic Algorithm - Algorithm that simulates evolution, ensuring that the genes of the genotypes that are fittest for their environment (according to some objective function) are more likely to be carried over to the next generation.
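
As a rough illustration of how the first few paradigms differ, here is a sketch that solves the same toy problem three ways: find the minimum number of coins that sum to a given amount. The problem choice and the function names are assumptions made for this example. Brute force tries every combination, dynamic programming fills a table of subproblem answers, and the greedy method always grabs the largest coin that fits, which is fast but not always optimal.

```python
def brute_force(coins, amount):
    """Try every combination recursively (exponential time)."""
    if amount == 0:
        return 0
    best = None
    for c in coins:
        if c <= amount:
            sub = brute_force(coins, amount - c)
            if sub is not None and (best is None or sub + 1 < best):
                best = sub + 1
    return best  # None means the amount cannot be formed

def dynamic_programming(coins, amount):
    """table[a] holds the fewest coins needed to form amount a."""
    INF = float("inf")
    table = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and table[a - c] + 1 < table[a]:
                table[a] = table[a - c] + 1
    return None if table[amount] == INF else table[amount]

def greedy(coins, amount):
    """Always take as many of the largest remaining coin as possible."""
    count = 0
    for c in sorted(coins, reverse=True):
        count += amount // c
        amount %= c
    return count if amount == 0 else None

coins, amount = [1, 3, 4], 6
print(brute_force(coins, amount))          # 2  (3 + 3)
print(dynamic_programming(coins, amount))  # 2  (3 + 3)
print(greedy(coins, amount))               # 3  (4 + 1 + 1)
```

On this input the greedy answer is worse than the optimum, which is why greedy algorithms need a correctness argument specific to the problem at hand.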


Notes (*):

Nondeterminism: when behaviour cannot be predicted deterministically from state and input. There is some discrepancy in the terminology here again:
  • Generally, nondeterminism is equivalent to randomness and stochasticity. It can arise from the algorithm itself (e.g. a probabilistic alg) or from externalities, i.e. from the implementation setting (e.g. a parallel alg running on a system with a race condition).
  • In Complexity Theory, nondeterminism has a very specific sense: a nondeterministic machine can branch its thread of execution at any point in the control flow, so an ND alg here is one that can run on an ND model of computation. Notice that the result is deterministic!
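
The complexity-theoretic sense can be sketched by simulating the branching on an ordinary deterministic machine. The example problem (subset sum) and the name `nd_accepts` are assumptions chosen for illustration. At each element the "machine" branches into two executions, one that includes the element and one that skips it, and the input is accepted if at least one branch accepts. The simulation explores every branch, so it takes exponential time, but the answer is the same on every run.

```python
def nd_accepts(numbers, target):
    """Accept if some sequence of include/skip choices sums to target."""
    def branch(i, remaining):
        if i == len(numbers):
            # End of this execution path: accept only on an exact match.
            return remaining == 0
        # Nondeterministic choice point, simulated by trying both options.
        return (branch(i + 1, remaining - numbers[i])   # include numbers[i]
                or branch(i + 1, remaining))            # skip numbers[i]
    return branch(0, target)

print(nd_accepts([3, 34, 4, 12, 5, 2], 9))  # True  (4 + 5)
print(nd_accepts([3, 34, 4], 8))            # False
```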

Heuristic: If you have taken a psychology course, you might have learned that the difference between an algorithm and a heuristic concerns the optimality of the solution. So what about approximation algs or heuristic algs? To reconcile the terminology, we treat "heuristic algorithms" (or "heuristic-based algs") as a synonym for "algorithms making some use of heuristics", but we do not count heuristics themselves as algorithms:
  • A heuristic-based algorithm integrates into its logic one or more strategies that improve some aspect of the behaviour of the alg without providing rigorous info about the tradeoffs they induce.
  • A heuristic can be viewed as a pure heuristic-based alg (the heuristic aspect applies to the purpose of the alg itself, hence the induced tradeoff here always affects the optimality of the solution). In this sense, heuristics are approximation algs for which we don't have info on the approximation ratio.
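
A small sketch of the tradeoff described above, using a travelling-salesman toy instance. The instance, the choice of the nearest-neighbour rule, and all function names are assumptions for illustration. The heuristic builds a tour quickly by always visiting the closest unvisited point, and the code itself makes no claim about how far from optimal that tour is. The exhaustive version is exact but only feasible for a handful of points.

```python
import math
from itertools import permutations

def tour_length(points, order):
    """Length of the closed tour visiting points in the given order."""
    n = len(order)
    return sum(math.dist(points[order[i]], points[order[(i + 1) % n]])
               for i in range(n))

def nearest_neighbour_tour(points):
    """Heuristic: start at point 0, always go to the closest unvisited point."""
    unvisited = set(range(1, len(points)))
    order = [0]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        unvisited.remove(nxt)
        order.append(nxt)
    return order

def optimal_tour(points):
    """Exact: try every ordering of the remaining points (factorial time)."""
    rest = range(1, len(points))
    return min(([0] + list(p) for p in permutations(rest)),
               key=lambda order: tour_length(points, order))

points = [(0, 0), (1, 0), (3, 0), (-2, 0.5), (0, 4), (5, 5)]
heuristic = nearest_neighbour_tour(points)
exact = optimal_tour(points)
print(tour_length(points, heuristic), tour_length(points, exact))
```

The heuristic tour can never be shorter than the exact one. How much longer it can get is precisely the information a heuristic, in the sense used here, does not come with.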

As a field of study, the Design and Analysis of Algorithms - or simply "Algorithms" - lacks an overarching formalism (Bellman, Borodin...). Even the definition of an algorithm is not generally agreed upon (if you have time: http://en.wikipedia.org/wiki/Algorithm_characterizations).