Programming Language Manifesto
Published on
Over the past 10 years I have grown interested in a particular set of programming language features:
- Total
- Dependently typed
- Deterministic time and space
- One expression per file
- Minimalist
- Cacheable
- For software developers
Each one of these attributes is held by many languages, but combinations are rare, and as far as I can tell, the full suite is not supported by any one language. I am vaguely interested in building such a language. I will shortly explain what I mean by all those properties, but first let’s look at a few standout examples of languages that point in the general direction I am looking:
- Elm is total (mostly), and built for web application development. All applications are required to use The Elm Architecture, so you know exactly what things are and are not possible within a program, and you know exactly how a program can interact with its surroundings.
- Roc is toted as a “successor to Elm”, but built for general purpose application development and designed to compile to machine code. Roc code does not run immediately, but rather a selected “platform” runs your Roc code, passing it a functional interface to some particular class of application. Because of this, you know exactly what capabilities a Roc program has by understanding its selected platform. Some best-practice design points can be encoded directly into the platform.
- Dhall is total, dependently-typed (in a limited way), minimalist, cacheable, and has one expression per file. If you can see past the explicit type applications, the unhelpful parse and compile errors, and the general lack of ergonomics, there is a very beautiful gem hiding within. Dhall’s unusual design gives it software engineering superpowers I have not seen in any other language. If you place both Dhall source and its output in the same git repository, you can fearlessly refactor because any mistakes will manifest as changes to the output. There is no need for package files because of its ability to import and freeze arbitrary URIs. Because there is no configuration, and because it is designed like a Unix tool, it fits nicely in a scripting environment.
- Unison has a great caching story. All intermediate build artefacts are cacheable and shareable between developers, and even test results are cacheable. Most other languages are not designed to optimise build and test times in a software development team, and this leads to enormous amounts of time being spent waiting for these tasks to complete, even when the process will produce exactly the same output for everyone.
- F* has a very expressive type system, and its refinement types are integrated seamlessly. The way it lets you express the same structure in many different ways means I can express my thoughts more directly than I can in Haskell.
- Jq proves that modern tools can still be built in the style of old Unix tools. It is incredibly fast. A program can be passed as an inline command-line argument, or placed in a file. It reads from
stdin
and writes tostdout
andstderr
. Theman
page includes everything you might want to know.
Now to explain each of the design attributes, and why I think they are important.
Total. This means every expression has a determinable value. Total languages are by necessity not Turing-complete, so they cannot express every possible program. I think this is a good thing, and that the limitations can be worked around by embedding the language into a more expressive one (this is Roc’s “platform” idea, and Elm’s “The Elm Architecture” idea). “Recursion” is allowable by induction over natural numbers – you give the function an amount of “gas” and it needs to return a result before the gas runs out. Because every program can be normalised, some refactors can be verified by checking if there were any changes to the normal form. Because programs can’t have side effects (only platforms can) it is safe to execute any code. eval
ing code you got from a network request is perfectly fine.
A basic web application might be a total function from request to response, with a platform providing the long-running event loop and error handling. A video game might be a total function from the state and latest set of inputs to a new state and a view, while the platform provides the event loop, interaction with hardware, and timing guarantees. There might be a platform to convert a normalised constant to a JSON or YAML value.
Dependently typed. As far as I can tell, languages need dependent types in order to properly be correct-by-construction. Many languages do not have dependent types, and end up with large amounts of features to compensate. You don’t always need them, but it seems that if they are not designed in from the beginning, they are hard to add later on. I want F*’s refinement types everywhere.
Deterministic time and space. I have worked in hard real-time environments with limited resources. These limitations push you into using a “systems” language wherein you have to forfeit expressivity. It is infeasible to use heap allocation in this environment because allocation time is nondeterministic. I have a hunch (though unproven) that a total language can have deterministic runtime and memory requirements. If this is true then the compiler should report how much memory is required.
Minimalist and one expression per file. The operating system should be a perfectly good programming environment. Tools that lean into this philosophy tend to be easier to understand and plumb together. A source file should be self-standing and not require an external configuration file. That means there should not be a fancy module system, but rather that directories should automatically behave as namespaces and files as individual modules. Ideally you can import a whole directory and get access to its contents by dereferencing filenames. It should be possible to import the expression represented by another file by referencing its file path, or any URI. Importing external file types should just work for some file types, like plaintext or JSON. Freezing imports should be achievable using a tool, similar to dhall freeze
which adds a sha256 hash after each import. It should be possible to write a program on the command line as an input, in the style of awk
, sed
, and jq
.
Cacheable. If a language is deterministic then it is possible to store build artefacts right next to the source code, signed with all inputs to that artefact including the compiler itself. If you clone a git repository for the first time, it should come with intermediate build artefacts so you don’t need to build, and it should come with test results so you don’t need to run the tests. (Dependent types should give the ability to specify unit tests as types, so test errors are actually type errors). Because caching includes remotely-downloaded artefacts (raw and normalised), all dependencies are “vendored” so local development does not require internet access, and your project is not vulnerable to things like the package repository going down or the maintainer deleting their package. Locally patching a dependency is easy: you just copy the dependency out of cache into your project directory and make whatever change you need.
For software developers. It is a pity that so many great programming languages are designed for use in academia, with real-world use appearing as an afterthought. Software engineering poses certain demands and constraints that are (quite reasonably) ignored for research languages, like the fact that people need to get up to speed with a language very quickly, learn details gradually, understand error messages, integrate with software written in other languages, use libraries that have become unmaintained, interact with colleagues, find documentation, dig through years of git blame
to discover why on earth the code is written that way, make changes quickly in high-stress scenarios, perform dependency updates, refactor enormous amounts of code over a weekend, run CI/CD, deploy something that mostly works, etc.
My ideal language would be general purpose, so that it could be used for most applications. Common refactorings and other operations should be tool-supported. Error messages should be oriented toward solving an engineering problem rather than a math problem. Updates to the language should not break existing code (so files need to support fixing a language version). For ease of learning, the language should be a superset of JSON (that means it needs to support something like row polymorphism). Shebangs should be supported, so #
should be the comment symbol. The syntax and tooling should permit use within a shell pipeline. The parser should retain comments and formatting so that automatic formatters and refactoring tools can be smart. Absolute correctness is not the primary goal, which permits a bunch of useful things things that might mortify a type-theorist: refinement types powered by an external SMT solver; theorems that are “proved” by randomised property test; JSON serialisation verified by validating many possible values against a JSON schema with ajv
; etc. Responsibility for program safety should lie entirely with the executing platform so security does not depend on regularly updating your package dependencies. If something works, it’s going to continue working: the fact that a library has not had a commit in 3 years should not deter you from using it.
Totality seems to permit clever refactoring tools, such as this imaginary one: replace selected code with a function, by normalising the code and then iteratively reverse-applying all defined functions and asking the user to prune or elaborate on the structure.