Wednesday, December 23, 2015

Releasing haskell-names-0.6.0

A new version of haskell-names is on hackage. haskell-names does name resolution for modules parsed with haskell-src-exts.
The API is very different in this new version. The most important type is now:
type Environment = Map ModuleName [Symbol]
An environment is a map from module name to list of symbols the module exports. A symbol is for example a value, a class, a method or a type. You use functions from Data.Map to work with environments, for example to create an empty environment.
Then there are two functions that work on environments (type class constraints omitted for brevity):
resolve :: [Module s] -> Environment -> Environment
annotate :: Environment -> Module s -> Module (Scoped s)
resolve takes a list of modules and an environment, finds the lists of symbols each of the given modules exports and inserts them into the given environment.
annotate takes an environment and a module AST and annotates the given AST with scoping information. Scoping information is for example the origin of an imported or local name or name resolution errors.
You load and persist environments with two functions:
readSymbols :: FilePath -> IO [Symbol]
writeSymbols :: FilePath -> [Symbol] -> IO ()
You have to come up with your own system to remember which file name corresponds to which module name. Internally the two functions use aeson to serialize symbols to json.
Finally, for quick experimentation there is a function
loadBase :: IO Environment
that gives you a base environment very similar to GHC’s base. An example for a program that uses the new API is on the github page.

Wednesday, March 25, 2015

A very first step towards fragment-based code distribution

Code package managers like Cabal, NuGet, Maven, Composer and Leiningen allow you to share and reuse code. They work in units of packages. What if we took not entire packages but much smaller code fragments as the unit of distribution? The idea to have finer units of code distribution and dependency has been around for quite a while. A very prominent example is this post by Joe Armstrong. Another example is this proposal to track dependencies on the function level.

I am working on an experimental code manager for Haskell called fragnix. It will allow users to share and reuse code in units of fragments instead of packages. Let’s look at an example fragment:

drawForest :: Forest String -> String
drawForest = unlines . map drawTree

A fragment is a small list of Haskell declarations. We can not use a fragment on its own. We need to know which other fragments it uses and which language extensions it assumes.

That’s why we introduce a slice. A slice consists of four parts:

  • A fragment.
  • Dependencies on other slices or built-in symbols.
  • A list of language extensions.
  • A unique ID.

The unique ID is a hash of the other three parts of the slice. In particular it includes the hashes of all used slices which in turn means that the hash transitively includes all fragments the slice uses. Fragnix stores slices in a structured way and generates an ordinary Haskell modules from each slice for compilation. Those generated modules are not pretty, but I’ll list one anyway.

{-# LANGUAGE NoImplicitPrelude, DeriveDataTypeable,
  StandaloneDeriving, MultiParamTypeClasses,
  NondecreasingIndentation, ExplicitForAll,
  PatternGuards #-}

module F8703530133646291376 where

import Data.List (unlines)
import GHC.Base ((.))
import GHC.Base (map)
import F4814358126730938317 (drawTree)
import F2690501842250508231 (Forest)
import GHC.Base (String)

drawForest :: Forest String -> String
drawForest = unlines . map drawTree

The module name is a capital F concatenated with the slice’s ID. It imports exactly those symbols from other slices and built-in modules that it needs. Currently all of base, ghc-prim and integer-gmp is built-in. The big number of language extensions is unfortunate but hard to avoid because language extensions are declared per module or even per package.

We could use the generated modules in our Haskell code via normal imports. But it is awkward to import from a module with a name that is partly some hash. We therefore introduce environments. You can think of an environment as a list of modules that reexport parts of the generated modules under saner module names. In other words an environment is a map from module name to list of exported symbols. We can import from the environment without any mention of slice IDs. Such an environment module could for example look like:

module Data.Tree (
    drawForest, Forest, [...]) where

import F4814358126730938317 (drawTree)
import F2690501842250508231 (Forest)
[...]

We can convert a list of regular Haskell modules into slices and an environment that provide the same functionality under the same names. In fact we have taken the example slice from the containers package. If we want to use containers we also need array and deepseq. There are 1074 slices in these three packages combined. The following example uses two of them directly: one that declares drawForest and one that declares Tree and Node.

module Main where

import Data.Tree (drawForest,Tree(Node))

main :: IO ()
main = putStrLn (drawForest [Node "Fragnix" []])

The environment generated from containers exposes a Data.Tree module. If we use this environment and invoke fragnix on the listed Main module we get the following output:

> fragnix Main.hs
[ 1 of 20] Compiling F2690501842250508231[boot] [...]
[ 2 of 20] Compiling F3665315464247256951[boot] [...]
[ 3 of 20] Compiling F5275745121659705090[boot] [...]
[ 4 of 20] Compiling F1433611137629712686[boot] [...]
[...]
[18 of 20] Compiling F4814358126730938317 [...]
[19 of 20] Compiling F8703530133646291376 [...]
[20 of 20] Compiling F7999362455043744817 [...]
Linking main ...

The resulting main executable works as expected. We actually needed only 20 of 1074 slices and got away with compiling only those.

The current state of fragnix is far from a useful tool. I appreciate any feedback and suggestions on the general idea as well as this early implementation.