Wednesday, March 25, 2015

A very first step towards fragment-based code distribution

Code package managers like Cabal, NuGet, Maven, Composer and Leiningen allow you to share and reuse code. They work in units of packages. What if we took not entire packages but much smaller code fragments as the unit of distribution? The idea to have finer units of code distribution and dependency has been around for quite a while. A very prominent example is this post by Joe Armstrong. Another example is this proposal to track dependencies on the function level.

I am working on an experimental code manager for Haskell called fragnix. It will allow users to share and reuse code in units of fragments instead of packages. Let’s look at an example fragment:

drawForest :: Forest String -> String
drawForest = unlines . map drawTree

A fragment is a small list of Haskell declarations. We can not use a fragment on its own. We need to know which other fragments it uses and which language extensions it assumes.

That’s why we introduce a slice. A slice consists of four parts:

  • A fragment.
  • Dependencies on other slices or built-in symbols.
  • A list of language extensions.
  • A unique ID.

The unique ID is a hash of the other three parts of the slice. In particular it includes the hashes of all used slices which in turn means that the hash transitively includes all fragments the slice uses. Fragnix stores slices in a structured way and generates an ordinary Haskell modules from each slice for compilation. Those generated modules are not pretty, but I’ll list one anyway.

{-# LANGUAGE NoImplicitPrelude, DeriveDataTypeable,
  StandaloneDeriving, MultiParamTypeClasses,
  NondecreasingIndentation, ExplicitForAll,
  PatternGuards #-}

module F8703530133646291376 where

import Data.List (unlines)
import GHC.Base ((.))
import GHC.Base (map)
import F4814358126730938317 (drawTree)
import F2690501842250508231 (Forest)
import GHC.Base (String)

drawForest :: Forest String -> String
drawForest = unlines . map drawTree

The module name is a capital F concatenated with the slice’s ID. It imports exactly those symbols from other slices and built-in modules that it needs. Currently all of base, ghc-prim and integer-gmp is built-in. The big number of language extensions is unfortunate but hard to avoid because language extensions are declared per module or even per package.

We could use the generated modules in our Haskell code via normal imports. But it is awkward to import from a module with a name that is partly some hash. We therefore introduce environments. You can think of an environment as a list of modules that reexport parts of the generated modules under saner module names. In other words an environment is a map from module name to list of exported symbols. We can import from the environment without any mention of slice IDs. Such an environment module could for example look like:

module Data.Tree (
    drawForest, Forest, [...]) where

import F4814358126730938317 (drawTree)
import F2690501842250508231 (Forest)

We can convert a list of regular Haskell modules into slices and an environment that provide the same functionality under the same names. In fact we have taken the example slice from the containers package. If we want to use containers we also need array and deepseq. There are 1074 slices in these three packages combined. The following example uses two of them directly: one that declares drawForest and one that declares Tree and Node.

module Main where

import Data.Tree (drawForest,Tree(Node))

main :: IO ()
main = putStrLn (drawForest [Node "Fragnix" []])

The environment generated from containers exposes a Data.Tree module. If we use this environment and invoke fragnix on the listed Main module we get the following output:

> fragnix Main.hs
[ 1 of 20] Compiling F2690501842250508231[boot] [...]
[ 2 of 20] Compiling F3665315464247256951[boot] [...]
[ 3 of 20] Compiling F5275745121659705090[boot] [...]
[ 4 of 20] Compiling F1433611137629712686[boot] [...]
[18 of 20] Compiling F4814358126730938317 [...]
[19 of 20] Compiling F8703530133646291376 [...]
[20 of 20] Compiling F7999362455043744817 [...]
Linking main ...

The resulting main executable works as expected. We actually needed only 20 of 1074 slices and got away with compiling only those.

The current state of fragnix is far from a useful tool. I appreciate any feedback and suggestions on the general idea as well as this early implementation.


  1. Great ideas, great post, great project!
    Looking forward to watching it progress.

  2. Any ideas for a tool to create environment modules?

  3. This comment has been removed by the author.

  4. Do you ultimately want to keep generating these fragments on disk, or make this all in-memory?

  5. It depends on what my goal is: If I wanted to see if this speeds up compilation or reduces the size of the resulting binary I would avoid writing the fragments to disk. But I want to distribute them so I have to write them to disk.