I know I am getting into a well-known topic. What more, make is a pretty old technology. We have so many other cooler build automation technologies out there. IDEs like Eclipse and Visual Studio make it unnecessary to write your old build system in the first place.
After the above negative marketing, let me tell you why then I am writing this post: because understanding how dependencies work between the various modules of your software is fundamentally connected to your design thinking. In fact, understanding how make's algorithm works at a high level is worth the effort. I present a prototype implementation of the same below, all of ~30 lines of OCaml code.
A Running Example
(Adapted from a illustrative project by my student Anirudh C.)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | (* arith.ml -- begin *) let inc x = x + 1 let dec x = x - 1 let add x y = let rec loop x y = if y = 0 then x else loop (inc x) (dec y) in loop x y let sub x y = let rec loop x y = if y = 0 then x else loop (dec x) (dec y) in loop x y (* arith.ml -- end *) (* circle.ml -- begin*) let pi = 3.141 let area r = pi *. r *. r let circumference r = 2. *. pi *. r (* circle.ml - end *) (* main.ml -- begin *) let main () = print_endline (string_of_int (Arith.add 1 2)); print_endline (string_of_int (Arith.sub 3 2)); print_endline (string_of_float (Circle.area 2.)) let _ = main () (* main.ml -- end *) |
In the code shown above, we have three modules, arith (arith.ml, arith.mli), circle (circle.ml, circle.mli) and main (main.ml, main.mli). main uses both arith and circle which are self contained modules. The OCaml implementation of the above idea, therefore, has the following additional files:
arith.cmi: The object code for the interface of arith module
arith.cmo: The object code for the implementation of arith module
circle.cmi: The object code for the interface of circle module
circle.cmo: The object code for the implementation of circle module
main.cmi: The object code for the interface of main module
main.cmo: The object code for the implementation of main module
app: The final executable.
Here are the listings of the interface files:
1 2 3 4 5 6 7 8 9 10 11 12 13 | (* arith.mli -- begin *) val add : int -> int -> int val sub : int -> int -> int (* arith.mli -- end *) (* circle.mli -- begin*) val area : float -> float val circumference : float -> float (* circle.mli - end *) (* main.mli -- begin *) val main: unit -> unit (* main.mli -- end *) |
The dependencies between the various files is shown by the following directed acyclic graph:
As you can see, the dependencies look a bit complicated, if not a lot. Part of why it's this way has to do with the particular programming language that you are using. For example, OCaml has this idea of object code being generated from the interface files by compiling the .mli file. However, the most important aspect of this structure is agnostic to the programming language. That aspect of the dependencies is its directed acyclic nature. In the dependency graph above, you will find no cycles or loops.
What impact do the above dependency relations have on build automation process. Here is the exhaustive sequence of build steps:
1 2 3 4 5 6 7 | ocamlc -c arith.mli # builds arith.cmi ocamlc -c arith.ml # builds arith.cmo ocamlc -c circle.mli # builds circle.cmi ocamlc -c circle.ml # builds circle.cmo ocamlc -c main.mli # builds main.cmi ocamlc -c main.ml # builds main.cmo ocamlc -o app arith.cmo circle.cmo main.cmo # builds app |
The first influence is the following: The dependee must always be compiled after all its dependencies have been compiled. The sequence in which the above steps are written respects this order.
The second one is an opportunity of optimisation: When you are building the application, you compile only that part of the application which is depends on any part which has been modified since the last build. For example, if we make a modification in the circle.ml file, which parts of the build process need to be repeated? Looking at the dependency graph we can say that its circle.cmo, and app, because these are only two nodes in the graph from which you can reach circle.ml by following the directed edges of the dependence graph.
The make Algorithm
The make algorithm allows us to implement the above optimisation. We specify the dependency graph in a makefile. Here's the Makefile for the above application:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | app : arith.cmo circle.cmo main.cmo ocamlc -o app arith.cmo circle.cmo main.cmo arith.cmo: arith.ml arith.cmi ocamlc -c arith.ml circle.cmo: circle.ml circle.cmi ocamlc -c circle.ml arith.cmi : arith.mli ocamlc -c arith.mli circle.cmi : circle.mli ocamlc -c circle.mli main.cmo : main.ml main.cmi arith.cmi circle.cmi ocamlc -c main.ml main.cmi : main.mli ocamlc -c main.mli |
make algorithm takes this graph as an input, studies the modification timestamps of all the targets and does just what is necessary (and useful) in the build process of the application. What's the rule to be followed here? Whenever any of the dependencies, say d, of a dependee/target file f was modified after f was modified, f needs to be rebuilt. However, one more bit. The timestamp of f and d is checked only after this rule is applied on d. And recursively. While an untrained human mind may find this a bit mind-bending, it's really a very very simple algorithm.
1 2 3 4 5 6 7 8 9 10 | function process_node(n) if(n.deps != {}) then process_node(n') for all n' in n.deps maxdepts = max(timestamps(n.deps)) if(timestamp(n) < maxdepts) then execute(n.action) update_timestamp(n, now) endif endif end |
The algorithm above does a depth first traversal starting with the target which we are building. After having processed all the dependencies, it compares the timestamp of the current target with those of all its dependencies. If any one of them is more recently modified than the target, the target is rebuilt by executing its build command.
To round up the article, here's the OCaml implementation of the above algorithm. Even though the code has a good 170 LOC, note that the core algorithm is just 33 lines (L42 to L72). Rest of it is all plumbing and testing code.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 | type node = { name : bytes; dependencies : node list; action : bytes; } type timestamp = { n : node; t : int; } let string_of_timestamp ts = "\t(" ^ ts.n.name ^ ", " ^ (string_of_int ts.t) ^ ")" let string_of_timestamps tslist = "[" ^ (let strtss = List.map string_of_timestamp tslist in (List.fold_left (fun x y -> x ^ y ^ "\n") "" strtss)) ^ "]" exception My_exception of string let rec get_timestamp timestamps n' = match timestamps with [] -> raise (My_exception n'.name) | { n; t } :: tl -> if n = n' then t else (get_timestamp tl n') let rec update_timestamp timestamps n' t' = match timestamps with [] -> [] | { n = n; t = t } :: tl -> if n = n' then { n = n'; t = t'} :: tl else { n ; t } :: (update_timestamp tl n' t') let rec max = function [] -> raise Not_found | [h] -> h | h :: t -> let maxt = max t in if h > maxt then h else maxt let rec process_node n ts = let myt = get_timestamp ts n in if (List.length n.dependencies <> 0) then let depts = process_dependencies n.dependencies ts in let maxts = max (List.map (get_timestamp depts) n.dependencies) in if myt < maxts then begin print_endline ("executing " ^ n.action); let ts' = update_timestamp depts n (maxts + 1) in begin print_endline (string_of_timestamps ts'); ts' end end else begin print_endline ("Nothing to be done for " ^ n.name); ts end else begin print_endline ("Nothing to be done for " ^ n.name); ts end and process_dependencies deps ts = let rec loop deps ts = match deps with [] -> ts | d :: deps' -> let ts' = process_node d ts in loop deps' ts' in loop deps ts let arith_mli = { name = "arith.mli"; dependencies = []; action = ""; } let arith_cmi = { name = "arith.cmi"; dependencies = [arith_mli]; action = "ocamlc -c arith.mli" } let arith_ml = { name = "arith.ml"; dependencies = []; action = ""; } let arith_cmo = { name = "arith.cmo"; dependencies = [arith_ml; arith_cmi]; action = "ocamlc -c arith.ml"; } let circle_mli = { name = "circle.mli"; dependencies = []; action = ""; } let circle_cmi = { name = "circle.cmi"; dependencies = [circle_mli]; action = "ocamlc -c circle.mli" } let circle_ml = { name = "circle.ml"; dependencies = []; action = ""; } let circle_cmo = { name = "circle.cmo"; dependencies = [circle_ml; circle_cmi]; action = "ocamlc -c circle.ml"; } let main_mli = { name = "main.mli"; dependencies = []; action = ""; } let main_cmi = { name = "main.cmi"; dependencies = [main_mli]; action = "ocamlc -c main.mli" } let main_ml = { name = "main.ml"; dependencies = []; action = ""; } let main_cmo = { name = "main.cmo"; dependencies = [main_ml; main_cmi; circle_cmi; arith_cmi]; action = "ocamlc -c main.ml"; } let app = { name = "app"; dependencies = [arith_cmo; circle_cmo; main_cmo]; action = "ocamlc -o app arith.cmo circle.cmo main.cmo"; } let ts = [ {n = arith_mli; t = 1}; {n = arith_cmi; t = 0}; {n = arith_ml; t = 0}; {n = arith_cmo; t = 0}; {n = circle_mli; t = 0}; {n = circle_cmi; t = 0}; {n = circle_ml; t = 0}; {n = circle_cmo; t = 0}; {n = main_mli; t = 0}; {n = main_cmi; t = 0}; {n = main_ml; t = 0}; {n = main_cmo; t = 0}; {n = app; t = 0}; ] let t1 () = process_node app ts let t2 () = (update_timestamp ts main_cmo 10) |> string_of_timestamps |> print_endline let t3 () = (get_timestamp ts arith_mli) |> string_of_int |> print_endline; (get_timestamp ts circle_mli) |> string_of_int |> print_endline let _ = t2 () let _ = t1 () let _ = t3 () |
Hope you found the above useful and interesting!