Bits of Learning

Learning sometimes happens in big jumps, but mostly in little tiny steps. I share my baby steps of learning here, mostly on topics around programming, programming languages, software engineering, and computing in general. But occasionally, even on other disciplines of engineering or even science. I mostly learn through examples and doing. And this place is a logbook of my experiences in learning something. You may find several things interesting here: little cute snippets of (hopefully useful) code, a bit of backing theory, and a lot of gyan on how learning can be so much fun.

Tuesday, September 13, 2016

APIs with Functions

This post introduces the concept of application programming interfaces (APIs) and what they are good for. As an example, we continue to use the Hisaab program.

As we have seen, functions are a very powerful abstraction mechanism to wrap around pieces of computation which can then be used and re-used as needed. This leads to a code which is more compact, readable, modifiable and flexible.

In Hisaab, the implementation we first created uses a particular representation of the graph data-structure called the edge list. For example, the following graph will be represented by the code right below that:



[
  ("A", 15, "F"),
  ("A", 20, "B"),
  ("B", 10, "D"),
  ("B", 10, "C"),
  ("D", 30, "C"),
  ("C", 30, "G"),
  ("G", 25, "E"),
  ("F", 100, "E"),
  ("E", 150, "D"),
  ("G", 90, "D")
]
The code of the algorithm is reproduced below:



 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
def reroute((n1, w, n2), n):
  if(n1 == n): return [(n2, -w, n1)]
  elif(n1 != n) and (n2 != n): return [(n1, w, n), (n2, -w, n)]
  else: return [(n1, w, n2)]

def starify_graph(g, n):
  new_g = []
  for e in g:
    new_edge = reroute(e, n)
    new_g.extend(new_edge)
  return new_g

# assumption: g is not a multigraph.
def merge((n1, w, n2), g):
  for i in range(len(g)):
    (m1, w1, m2) = g[i]
    if(m1 == n1 and m2 == n2):
      g[i] = (n1, w + w1, n2)
      return g
  g.append((n1, w, n2))
  return g

def graph_of_mgraph(mg):
  g = []
  for e in mg:
    g = merge(e, g)
  return g


The above code depends on the way we have implemented the graph data-structure. There are several examples in the above code which demonstrate the fact that the above code has been written with the knowledge about the internal implementation of the graph data-structure.

  • Line 1, 14: assumes that edges are triples.
  • Line 8, 10, 16, 18, 20, 24: graph is assumed to be an edge list.

The disadvantage with the above implementation of the algorithm is that it is strictly dependent on the way the graph data-structure is implemented. If the implementation of graph changes, the algorithm would have to be pretty much completely re-implemented.

Let's see how the above graph represented as an adjacency list would look like:

{
  'A': [('F', 15), ('B', 20)],
  'C': [('G', 30)],
  'B': [('D', 10), ('C', 10)],
  'E': [('D', 150)],
  'D': [('C', 30)],
  'G': [('E', 25), ('D', 90)],
  'F': [('E', 100)]
}

Below, we should the implementation of the algorithm when the graph is implemented as an adjacency list.


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
def starify_graph(g, n):
  new_g = {} # instantiating a dictionary
  for n1 in g: #initialising the empty graph
    new_g[n1] = []
  for n1 in g:
    # invert and add every node starting with n.
    if(n1 == n):
      for (n2, w) in g[n1]:
        add_edge(new_g[n2], (n1, -w))
    else:
      for (n2, w) in g[n1]:
        # add every node ending at n.
        if(n2 == n):
          add_edge(new_g[n1], (n2, w))
        # reroute all other nodes.
        else:
          add_edge(new_g[n1], (n, w))
          add_edge(new_g[n2], (n, -w))
  return new_g

As is evident, this implementation is significantly different from the previous implementation. Again, there are several evidences in this code to show that it strictly assumes an adjacency list implementation of the graph data-structure.
Line 2: Graph is a dictionary.
Line 4: The elements of the graph dictionary are lists.
Line 8, 9, 11: The elements of the graph dictionary are pairs.

Modularity

Can we implement the above hisaab algorithm in a way so that it is independent of the internal details of the graph data-structure?

If we could do so, there would be the following advantages:
  • The hisaab algorithm would work fine irrespective of which implementation of graph we use.
  • The graph data-structure implementation can be modified subsequently without a risk of breaking the hisaab code.

The answer to the above question is Yes. By defining a set of functions which define a uniform way by which the rest of the code interacts with the graph code, we can ensure that there is separation between the view of the graph that the rest of the code sees, and its internal implementation. We do so, using a group of functions as follows:
  1. empty_graph: Returns a new empty graph with no nodes or edges
  2. make_graph: Returns a new graph having the edges which are provided in a list
  3. get_edges: Returns the list of edges in the graph
  4. add_edge: Adds an edge to a graph
The implementation of the above functions is specific to the specific implementation of the graph, but how they are used is not. Hence, the code which uses a graph only through the above functions is also independent of the specific implementation of the graph data-structure. The above set of functions can be termed as an application programming interface (API). This is because, this set defines the way one part of the program (called the client) interacts with another part of the program (called the server). Further functions like empty_graph and make_graph are called constructors (because they create new instances of the data-structure), get_edges are called extractors (because they reveal information about the data-structure without modifying it) and add_edge are called modifiers (because they modify the underlying data-structure). One way to look at the entire scenario would be as follows:

The client code (or user code) makes use of the facilities provided by the server code (or back-end code). In hisaab, the algorithm (implemented through the functions reroute and starify_graph) can be considered as the client code, while the graph data-structure can be viewed as the server code. As shown above, the API consisting of the four functions listed above form the interface through which the client talks to the server. Note that it is possible to freely change which of the two implementations the interface uses as the server without affecting the client. This gives us the flexibility of independently implementing the client without concerning ourselves about the internal details of the server.



 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
def reroute(e, n):
  n1 = get_start_node(e)
  w  = get_weight(e)
  n2 = get_end_node(e)
  if(get_start_node(e) == n):
    return [make_edge(n1 = n2, w = -w, n2 = n1)]
  elif(n1 != n) and (n2 != n):
    return [make_edge(n1 = n1, w = w, n2 = n), make_edge(n1 = n2, w = -w, n2 = n)]
  else:
    return [e]

def starify_graph(g, n):
  new_g = empty_graph()
  edges = get_edges(g)
  for e in edges:
    new_edges = reroute(e, n)
    for e in new_edges:
      add_edge(g = new_g, e = e)
  return new_g

The above code shows the hisaab algorithm re-implemented using the above API. The best way to get convinced this code is independent of any of the internal implementation details of graph data-structure is to look for a dependence. You will not find one!

The API allows us to implement hisaab as a re-usable algorithm independent of the implementation of the graph.

Large Scale Software Development and Software Design

In large scale software development scenario, software systems are developed by teams of multiple software developers. It would be too inefficient to develop each part of the system sequentially. It's important to ensure that all teams are kept busy throughout the project duration. All sub-teams in a project work in parallel to ensure an early completion of the project.

The way the above is achieved is by dividing the work into several modules which interact with each other to create the complete system. This process is called software design. The division is worked out in a way that there is comparable distribution of work between the various modules. More importantly, each module defines an interface (or API) through which the rest of the system interacts with that module. A good design will ensure that the interfaces are enough to allow powerful interaction between modules but doesn't expose unnecessary details of a module to the rest of the system. While the internal implementation of a module is likely to change rapidly over the execution of a project, the module interfaces must be decided early and frozen. This is important to ensure that all other parts of the system can proceed assuming this interface about the given module. Hence, every sub-team, working on an individual module, can proceed in parallel.

Prototypes and Improvisations

As mentioned above, API enable independent development of interacting modules. Suppose, one team is developing the client C, and another is developing the server S. A typical way this piece of software would be developed in practice is as follows: An API for S is defined first. As C is dependent on S, a quick-and-dirty implement of S -- say S1 -- is created and handed over to the client team so that C's development can proceed. While C's development proceeds, the server team continues working on a better-engineered version of S -- say S2 -- in parallel. S2 would follow the same API of S, however may score better than S1 in certain parameters like performance, security etc. By the time C's development gets over, S2 is also ready. The version of the product which gets shipped to the customer uses C as client S2 as the server.

Code:

hisaab1.py: Implements hisaab using a edge-list implementation
hisaab2.py: Implements hisaab using an adjacency-list implementation
hisaab3.py: Implements hisaab as a re-usable code by using a graph API

No comments: