A photograph of an old European house, with different portions of the facade build from distinct materials, such as stone, red brick, white brick.

Composable Data Validation with Haskell

By Ben Levy and Christian Charukiewicz — July 26, 2021

haskell

Recently, a client asked us to work on a new Rules Engine for them. This system serves as the back end to a highly configurable dashboard where non-technical users define business rules for how a certain component of the system behaves. When deployed, this system has to handle a substantial production workload, so performance is also a key consideration.

So, our primary requirements were that this rules engine would be:

Configurable - The end user should be given toggles to get the desired behavior from the fixed kinds of rules.
Fast - This system should be able to handle several hundred validation requests per second.
Robust - The system should not go down and should be easy to modify on the fly.

A secondary requirement related to robustness would be that the rules be:

Declarative - A product manager should be able to look at the rule code and have a reasonably accurate understanding as to what the rule is doing.

Creating an eDSL

In order to meet the above requirements, we decided to write a small embedded domain-specific language (eDSL) to enable writing declarative validation rules. This article will show a simplified version of the actual language being used in production. We will be using a shallow embedding also known as a monomorphic final tagless encoding.

The first question that we must ask ourselves is not what operations we want to perform but what is the essence of our problem or domain, this is often called denotational design.

In our case, this is relatively simple—a validation is a mapping from an input value to whether it is valid or not. Putting this into code:

newtype ValidationRule a = ValidationRule { validate :: a -> Bool }

Given this as a basis we can start to write functions that help us to build these validation rules, for our basic rules we will use a convention of a trailing “_” to avoid name collisions. A simple example is the equality rule:

eq_ :: Eq a => a -> ValidationRule a
eq_ ruleValue = ValidationRule $ \actual -> actual == ruleValue

In the above, ruleValue is a value that is embedded in the ValidationRule whereas actual is the value that this rule will attempt to validate during a validation operation. An example of how this rule can be built and configured is:

equalsFive :: ValidationRule Int
equalsFive = eq_ 5

Already we see that rules are composed of both static and dynamic parts. The shape of the rule is statically determined by the function eq_ but the value (which is 5 in our example) can be any runtime value. eq_ isn’t the only rule we may want. Let’s make a few more basic rules:

lt_, gt_ :: Ord a => a -> ValidationRule a
gt_ ruleValue = ValidationRule $ \actual -> actual > ruleValue
lt_ ruleValue = ValidationRule $ \actual -> actual < ruleValue

Making rules composable

With this, our eDSL gives us a way to compare a value to something, but this alone is not very useful. We can make bespoke rules each time there is a new business use case, but what we need is the ability to build up larger validations. In boolean logic we have two main operations for conjunction and disjunction also know as AND and OR. Let’s write them:

and_, or_ :: ValidationRule a -> ValidationRule a -> ValidationRule a
and_ rule1 rule2 = ValidationRule $ \actual ->
    validate rule1 actual && validate rule2 actual
or_ rule1 rule2 = ValidationRule $ \actual ->
    validate rule1 actual || validate rule2 actual

It’s easy for us to add support for the inverse, or NOT:

not_ :: ValidationRule a -> ValidationRule a
not_ rule = ValidationRule $ \actual -> not $ validate rule actual

We can compose the rules we have written so far to create new combinators. Let’s create the greater than or equal to operator (>= in most languages):

geq_ :: Ord a => a -> ValidationRule a
geq_ value = gt_ value `or_` eq_ value

Handling complex data types

At the moment, there is still a key limitation with our language. All of these rules need to be of the same type and we do not have a way to change that type. Usually when we see something that looks like f a and we want f b we reach for fmap on the Functor type class. Let’s try to write that for our type:

instance Functor ValidationRule where
    fmap :: (a -> b) -> ValidationRule a -> ValidationRule b
    fmap f rule = ValidationRule $ \actual ->
        validate rule ????

We have run into a problem, we have a function a -> b and a ValidationRule a and want a ValidationRule b. This means when we try to construct our new validation rule we have a value actual :: b but validate rule needs something of type a. If we had a function b -> a we could achieve this. However, f is the reverse of what we need. It seems like we want something that is like a reverse functor:

notQuiteFmap :: (b -> a) -> ValidationRule a -> ValidationRule b
notQuiteFmap f rule = ValidationRule $ \actual ->
    validate rule (f actual)

This function has a name, contramap, and it belongs to the Contravariant Functor also known as a Cofunctor. The documentation for Contravariant defines the difference between the two:

Whereas in Haskell, one can think of a Functor as containing or producing values, a contravariant functor is a functor that can be thought of as consuming values.

In fact, the example used is Predicate a which is exactly the same as our type ValidationRule a. It’s always nice when you can find a preexisting type to validate your approach.

The example provided (checking whether an account balance is overdrawn) is good demonstration of how to use contramap, so let’s implement it. First we set up the data type of our account and our negative rule which is in terms of an Integer:

data Account = Account
    { accountBalance :: Integer
    , accountName :: Text
    , accountOwner :: Text
    }

negative_ :: ValidationRule Integer
negative_ = lt_ 0

We want to be able to validate an Account but we only have a rule that works on Integer and we don’t want to be writing lots of one-off rules. This is where we can apply the contramap function:

overdrawn :: ValidationRule Account
overdrawn = contramap accountBalance negative_

That’s fairly terse, so what does it mean? In plain English we can say an Account (the type that’s being validated) is overdrawn (the name of the function) if the accountBalance (the name of the field we are using contramap with) is negative_ (the rule we contramapped). We can imagine this being used in a larger validation:

accountOwnedBy :: Text -> ValidationRule Account
accountOwnedBy owner = contramap accountOwner $ eq_ owner

withdrawalAllowed :: ValidationRule Account
withdrawalAllowed =
    accountOwnedBy "Alice" `and_` (not_ overdrawn)

So we are now able to make fairly complex validation rules which read almost verbatim like English. “A withdrawal on this Account is allowed if the account is owned by Alice and it is not overdrawn.” That’s pretty cool but we’re starting to see the ergonomics of our API fray.

Cloudtrellis

A new service built by Foxhound Systems Discover problems with your website before your users do

Cloudtrellis scans your entire site for broken links, accessibility issues, and SEO errors to ensure a flawless user experience.

Detect error pages, broken links, accessibility issues, and SEO problems
Create scans with tailored configurations for each website and subdomain you manage
Schedule scans to run monthly, weekly, or even daily to closely monitor for new issues
Get notified of new scan results via email
Share scan results with your team via direct link

Learn more

Adding validation results

Looking at the above example, we know if a withdrawal is allowed or not but we don’t know why it is not allowed. For example, the failure may have occurred due to the account being overdrawn, but it could also have been because the owner of the account was Bob rather than Alice.

Let’s revise our semantic domain a bit to help us keep track of why something failed. In order to track this we’re going to define a Validation data type instead of using a Bool for our return value:

type ErrMsg = Text

data Validation err
    = Success
    | Failure err

type ValidationResult = Validation [ErrMsg]

success :: ValidationResult
success = Success

failure :: ErrMsg -> ValidationResult
failure errMsg = Failure [errMsg]

newtype ValidationRule a =
    ValidationRule { validate :: a -> ValidationResult }

We’re going to need to rewrite our core functions now to use the new representation. We will only rewrite a few of them. The rest are left as an exercise to the reader.

eq_ :: (Show a, Eq a) => a -> ValidationRule a
eq_ value = ValidationRule $ \actual ->
    if actual == value then
        success
    else
        failure (Text.pack $ "Expected " <> show actual <> " to equal " <> show value)

and_ :: ValidationRule a -> ValidationRule a -> ValidationRule a
and_ rule1 rule2 = ValidationRule $ \actual ->
    case (validate rule1 actual, validate rule2 actual) of
        (Failure e1, Failure e2) -> Failure (e1 <> e2)
        (Failure e1, _)          -> Failure e1
        (_, Failure e2)          -> Failure e2
        (Success, Success)       -> Success

or_ :: ValidationRule a -> ValidationRule a -> ValidationRule a
or_ rule1 rule2 = ValidationRule $ \actual ->
    case (validate rule1 actual, validate rule2 actual) of
        (Failure e1, Failure e2) -> Failure (e1 <> e2)
        (Success, _)             -> Success
        (_, Success)             -> Success

Now we can run our validate function and if there is a failure we will know why the validation failed. And the best part is, we don’t actually have to change how the top level rule is written (though we do have to recompile since we’re not taking full advantage of the polymorphic final tagless approach). Since only our primitives are aware of ValidationResult, we did not have to update any of the more sophisticated business rules during this change.

Conclusion

In this post we solved the problem of configurable validation by creating an eDSL that allows us to create sophisticated business rules through the composition of primitive rules. The performance of validating inputs using this approach is very good since there is almost no interpretive overhead when composing functions as we have done here.

The eDSL we have shown here is only slightly simpler than what we built for our client, and illustrates the core ideas clearly. A language like this one can continue to evolve as necessary, oftentimes without requiring existing rules to be rewritten, as we saw above. One aspect we haven’t covered in this post is adding context to rules, giving us the ability to see where one of our primitive rules failed relative to a larger validation (e.g. we expected “Alice” but got “Bob” in the context of validating the account owner).

Ben Levy and Christian Charukiewicz are Partners and Principal Software Engineers at Foxhound Systems. At Foxhound Systems, we focus on building fast and reliable custom software. Are you looking for help with something you’re working on? Reach out to us at [email protected].