Software is great, but sometimes life is too short to write programs bound by pipelined processors, one-size-fits-all arithmetic units, and network cards.
High-frequency trading systems exemplify this. When counting latency in nanoseconds, every clock cycle matters. This is why trading applications like punt-engine are implemented in hardware description languages (HDLs) that compile1 to configurations of registers and digital logic, unlike traditional programming languages that instruct pre-built processors.
This post is primarily a resource for new punt-engine contributors to see first-hand how we describe hardware using a functional (as in pure and declarative, not just “working”) HDL. Others can find a quite beautiful application of a typed lambda calculus to map input wires to output wires in described circuits, producing faster applications than would be in C2 Haskell is faster than C (╯°□°)╯︵ ┻━┻.
Finite state machines
Finite state machines (FSMs) model computation by defining a finite set of program states, an initial state, and a function deriving state transitions from inputs. Transducers are a class of state machines often used to control applications. There’s two distinct types: Moore machines and Mealy machines.
Introducing our example
To illustrate the differences between Moore and Mealy machines, we’ll implement both using Clash, a functional HDL implemented as a subset of Haskell. Our example will focus on detecting rising edges (transitions from 0 to 1) in a bitstream:
- Moore: Outputs 1 one clock cycle after detecting a rising edge.
- Mealy: Outputs 1 immediately upon detecting a rising edge.
We’ll simulate both implementations to see how the different machines respond to the input.
Boilerplate code
Before getting into the specifics of the implementations of the machines, let’s set up some boilerplate. Since we’re solving the same problem in two different ways, we’ll be using the same language pragmas, test input, and simulation function. Below is some generic boilerplate I produced by removing the word “moore” or “mealy” in shared code between my two circuits and just replacing it with “fsm”. Do the inverse to create runable code.
With this out of the way, we’ve reduced the problem to implementing state machine types, a state transition function, and an output function.
Moore machine
In a Moore machine, outputs are determined solely by the current state. Outputs only change when state changes, irrespective of the input. Consequently, there’s a one-clock-cycle delay between an input change and the corresponding output change.
We can implement a Moore FSM to detect rising edges3:
We can use clashi
, a Clash repl, to compile our file and call its members. Running the simulation gives us this output:
Recall that simulateMoore
is a de-generalization of the boilerplate simulation function. You can see in the output that our circuit’s detection of the rising edge is delayed by one clock cycle. Since the Moore machine’s output is purely based on its state, we spend the first clock cycle updating the state based on the input, and the second one outputting it.
Let’s consider a different design that does both of these steps at once.
Mealy machine
In contrast with Moore machines, Mealy machines derive outputs from both the current state and the current inputs.
Consider the same circuit that uses a Mealy machine to react to a rising edge immediately based on the input, instead of updating the state on this cycle and the output on the next one like the Moore one did. You’ll find that this circuit appears to be a clock cycle ahead:
Run the simulation function:
Our Mealy machine outputs the rising edge in the input as soon as it is recieved.
:t mealy
Like most things in Haskell, mealy
is a function. We use this function provided by the Clash prelude to describe a Mealy machine.
You can use ghci
or the Clash docs to find that the mealy
function has a type:
This is a bit of a daunting type signature, but it makes sense when you consider the definition of a Mealy machine. Recall:
In contrast with Moore machines, Mealy machines derive outputs from both the current state and the current inputs.
Running a Mealy machine requires three things:
- a state transition function. That’s the point of a state machine, after all.
- an initial state. In our case, this is
(MIdle, 0)
. We start in the Idle state, and we want to output a rising edge if the input starts with 1. - an input wire. Our function needs an input to produce outputs, or in other words, our circuit needs an input wire to assert an output wire.
With these three parameters, we produce one output: a wire holding the value of the accumulator after performing our operation. This is the last entry in the chain of ->
in our type signature: Signal dom o
.
With the type signature understood, consider how we call mealy
in our code:
We’ve called mealy
with two parameters, yet mealy
expects three. By supplying just two parameters, we’ve performed partial function application to construct mealyMachine
, a function that takes in an input wire Signal dom Bit
as input and outputs a different Signal dom Bit
.
Conclusion
We’ve compared finite-state transducers (Moore and Mealy machines) via their implementations of a trivial rising-edge-detector circuit in Haskell and Clash.
The punt-engine project uses this paradigm to synthesize various cores in our FPGA-accelerated trading engine. For more info on our project, see our GitHub.
Footnotes
-
Clash transforms Haskell to VHDL, Verilog, or SystemVerilog; “transpile” is likely a better word here. ↩
-
Or perhaps, more specifically parallelized. ↩
-
This implementation has syntax likely foreign to imperative programmers. I try explaining most of it in the comments, but you can see Philipp Hagenlocher’s playlist Haskell for Imperative Programmers for a more comprehensive guide. ↩