
Haskell Software Development Principles

Table of contents

  1. Design functions independent of their context
  2. Organize modules by common purpose
  3. Break convention with intention
  4. Write readable code rather than less code
  5. Refine and improve data types
  6. Avoid overextending types
  7. Be strategic, not tactical
  8. Rely on automated unit and integration tests
  9. Find tools and libraries that already exist
  10. Use explicit imports and exports

Design functions independent of their context

We use functions to break our code into small pieces that solve specific tasks. However, in many cases a function may be useful in many different contexts, so we strive to reuse it. Maximizing the reusability of a function requires designing for general applicability. Keeping the name, parameters, and dependencies of a function general and context independent not only helps with reusability, but also keeps code clean and easy to understand, because the function stands in its own context rather than depending heavily on the context for which it was initially written.

A simple example of this is when we have a Text field from the database that needs to be converted into a domain type. The context-specific naming of that function might resemble something like convertDbFieldToDomainField. The better, general purpose name for this function is simply fromText. Now we do not need any information about the DB type to understand what this function is doing.

It also can be used in more situations and the function name will still accurately describe what it is doing. If the function is named convertDbFieldToDomainField, there is the risk of another developer not viewing this function as reusable, and possibly reimplementing the same functionality elsewhere. We can imagine someone writing convertRequestFieldToDomainField that performs the same conversion but on the JSON data sent to the API instead of what is in the database.

Let’s look at a concrete example. Here’s an example of a context-aware implementation of a conversion function in our Domain.OrderStatus module.

module Domain.OrderStatus where

data OrderStatus
    = Pending
    | Completed
    | Refunded

convertDbOrderStatusToDomainOrderStatus :: Text -> Maybe OrderStatus
convertDbOrderStatusToDomainOrderStatus statusVal =
    case statusVal of
        "PENDING"   -> Just Pending
        "COMPLETED" -> Just Completed
        "REFUNDED"  -> Just Refunded
        _           -> Nothing

The above function will convert the order status string we store in our SQL database into a domain-specific value represented by the OrderStatus domain type. This conversion is both necessary and useful, but the function name implies that it isn’t generally applicable. Another developer looking to convert an order_status field in our incoming API request data is likely to ignore convertDbOrderStatusToDomainOrderStatus as being suitable for the conversion. So instead, they may write:

convertRequestOrderStatusToDomainOrderStatus :: Text -> Maybe OrderStatus
convertRequestOrderStatusToDomainOrderStatus statusVal =
    case statusVal of
        "PENDING"   -> Just Pending
        "COMPLETED" -> Just Completed
        "REFUNDED"  -> Just Refunded
        _           -> Nothing

Notice that the bodies of convertDbOrderStatusToDomainOrderStatus and convertRequestOrderStatusToDomainOrderStatus are identical. This is because they are the same function, just with a different name.

What should we do instead? Let’s change our function to use context-unaware naming:

module Domain.OrderStatus where

data OrderStatus
    = Pending
    | Completed
    | Refunded

fromText :: Text -> Maybe OrderStatus
fromText statusVal =
    case statusVal of
        "PENDING"   -> Just Pending
        "COMPLETED" -> Just Completed
        "REFUNDED"  -> Just Refunded
        _           -> Nothing

While the above example may seem obvious, the concept holds true for more complex situations. Perhaps the parameters are very specific to one application of the function without good reason. Perhaps our function is doing too much at once and can be broken down into more reusable, general purpose functions. Perhaps the parameters depend on a large record type when the function only acts on one field. Be wary of making a function dependent on more than it needs.
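As a sketch of that last point, consider a hypothetical check that only inspects a customer's email. The Customer record and the corporate domain below are invented for illustration:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical record, invented for illustration.
data Customer = Customer
    { customerEmail :: Text
    , customerName  :: Text
    }

-- Overly dependent: demands a whole Customer even though only the
-- email is inspected, so it can't be reused for emails that arrive
-- from other sources (API requests, CSV imports, ...).
isCorporateCustomer :: Customer -> Bool
isCorporateCustomer customer =
    "@corp.example.com" `T.isSuffixOf` customerEmail customer

-- Context independent: depends only on the data it actually uses.
isCorporateEmail :: Text -> Bool
isCorporateEmail email = "@corp.example.com" `T.isSuffixOf` email
```

The second version can be applied anywhere an email Text appears, while the first is tied to the Customer record for no good reason.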


Organize modules by common purpose

There is a tendency to split types and functions into separate modules, to have shared helper modules, or to have modules that follow other popular framework file structures. It’s somewhat common to see a structure like:

src/
├── Handler/
│   ├── Customer.hs
│   ├── Order.hs
│   └── Refund.hs
├── Types/
│   ├── Customer.hs
│   ├── Order.hs
│   └── Refund.hs
└── Utils/
    └── OrderValidator.hs

In the structure above, we can see a top level split in the namespace between our handlers and our types, as well as an additional “utils” namespace. This may seem intuitive at the start of a project but as a project evolves, it makes less sense since we will likely have multiple types representing a customer, order, and refund.

Imagine that the above structure eventually grows to the following, as we add repository modules for accessing our database:

src/
├── Handler/
│   ├── Customer.hs
│   ├── Order.hs
│   └── Refund.hs
├── Repository/
│   ├── Customer.hs
│   ├── Order.hs
│   └── Refund.hs
├── Types/
│   ├── DB/
│   │   ├── CustomerModel.hs
│   │   ├── OrderModel.hs
│   │   └── RefundModel.hs
│   ├── Customer.hs
│   ├── Order.hs
│   └── Refund.hs
└── Utils/
    └── OrderValidator.hs

Now we may have an OrderRequest and OrderResponse for our handler, an Order domain type for our business logic, and another OrderModel that our database library will use.

With the top level Types namespace, it becomes very unclear which of the four kinds of types should go into it. If we want to put all of them there, then we need to split the Types namespace into submodules like Types.HTTP, Types.Domain, and Types.DB, which quickly becomes unwieldy. What we should do instead is define our types in the modules that most closely need them, and break them out only when necessary. If we really need dedicated types modules, it’s better to do something like:

src/
├── Handler/
│   ├── Customer/
│   │   └── Types.hs
│   ├── Order/
│   │   ├── Types.hs
│   │   └── Validator.hs
│   ├── Refund/
│   │   └── Types.hs
│   ├── Customer.hs
│   ├── Order.hs
│   └── Refund.hs
└── Repository/
    ├── Model/
    │   ├── Customer.hs
    │   ├── Order.hs
    │   └── Refund.hs
    ├── Customer.hs
    └── Order.hs

With the above structure, it’s pretty clear that Handler.Order and Handler.Order.Types are related. It’s very easy to see this through the logical placement of the modules in the directory structure of the project. Moreover, if we need to depend on the handler’s types, but do not want to import the handler’s implementation, we can still import only the latter module.

A downside of breaking types out from their implementation is that when both are typically needed together, more imports are required. For example, breaking up a DateTime module into Service.DateTime and Types.DateTime just means that we will need to import both modules whenever we want to work with DateTime values. It is much more intuitive to keep functions with the types they act on. And conversely, it is also more intuitive to keep types that are only used with a specific collection of functions alongside those functions.

A benefit of this approach is that it will allow us to namespace functions and types in a manner that lends itself to good organization. For example, if we have a DateTime module with a toUTC function, we can call it using DateTime.toUTC. This is easy to read and it is clear that it is converting a DateTime rather than just a Time or some other type to a UTC time. It allows us to simplify naming conventions due to the namespace. If this function existed in a Utils module, it would be less clear what type it is converting. It might even need to be called dateTimeToUTC to avoid clashing with other functions with the same name. (See the note about generic names and ambiguity in the previous section and the importance of designing for qualified imports.)
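The containers library follows the same convention: Data.Map’s own documentation recommends a qualified import, which is what keeps short names like fromList, lookup, and size readable without clashing with the Prelude.

```haskell
import qualified Data.Map as Map

-- A small map built with the qualified names; Map.lookup reads
-- clearly and never collides with Prelude.lookup.
inventory :: Map.Map String Int
inventory = Map.fromList [("widget", 3), ("gadget", 7)]

widgetCount :: Maybe Int
widgetCount = Map.lookup "widget" inventory
```

Designing our own modules the same way means callers get concise, unambiguous names like DateTime.toUTC for free.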

Overall, it is easier to conceptualize this type of organization. Types and functions work together, so they should be organized in the same manner. It is the same way that classes usually contain both methods and the data those methods act upon. This encapsulation makes these modules more self-contained and easy to follow.


Break convention with intention

When contributing to larger projects, code bases can become messy due to individual preferences, design patterns, and organizational differences interweaving and clashing. Even though these variations in design patterns and organization do not break anything, they frequently add little value and the inconsistency they introduce makes it much more difficult for future developers to understand how the code is structured throughout the code base. As a general rule of thumb, it is more often than not beneficial to follow the conventions and organization that is already in place.

For example, we can imagine a project with several request handler modules in a Handler namespace:

src/
├── Handler/
│   ├── CustomerHandler.hs
│   ├── OrderHandler.hs
│   └── RefundHandler.hs
└── ...

Another developer may look at this structure in the future and think, “The Handler namespace (directory) these modules live in, combined with the Handler suffix on each module name, is redundant. For all new handlers I will omit the suffix.” The developer then goes on to make two additional handlers for metrics and transactions. So the directory now looks like:

src/
├── Handler/
│   ├── CustomerHandler.hs
│   ├── Metric.hs
│   ├── OrderHandler.hs
│   ├── RefundHandler.hs
│   └── Transaction.hs
└── ...

The developer has broken convention in order to remove the redundancy. It is true that now our import statements for the new handlers are more concise (we can write import Handler.Transaction instead of import Handler.CustomerHandler). However, there is now a significant inconsistency in the handler modules that can confuse both the original developer and others when they look at this code later. Does the Handler suffix on the individual module names mean something? Are the Metric and Transaction modules actually handlers, or do they serve some other purpose? It turns out that the suffix means nothing, but the inconsistent naming suggests otherwise.

So what should we do if the existing structures are less than ideal? Does this mean the first pass is always the best pass? Are we forever stuck with the repetitive Handler namespace and module suffix in the above example? Certainly not. In many cases, the code base can benefit from refactoring the organization and design patterns. However, this decision needs to be made carefully, intentionally, and thoroughly. If it makes sense to add an opposing pattern to the existing flow, it more than likely also makes sense to invest time to convert the old patterns.

In the case of the above example, the developer who decides to eliminate the redundancy should do so from all of the handlers in the Handler namespace, leaving the project structured as follows:

src/
├── Handler/
│   ├── Customer.hs
│   ├── Metric.hs
│   ├── Order.hs
│   ├── Refund.hs
│   └── Transaction.hs
└── ...

Whether to break convention is a judgment call that should take into account the value the break in convention adds, how much time it will take to convert the old code, and how much noise it is likely to introduce. It is important to weigh the pros and cons of these changes in every situation. We should not stick to the style that we are most comfortable with only because we are comfortable with it.


Write readable code rather than less code

“Any fool can write code that a computer can understand. Good programmers write code that humans can understand.”

— Martin Fowler

After getting our code working, our main goal as developers is to make it as easy as possible for future developers to understand, fix, and extend it. Comments can describe complex functionality and give insight as to why a certain piece of code was written. However, we should also strive to write self-documenting code, to the greatest extent possible, since it is easy to do the opposite and write confusing code that is difficult to understand.

In Haskell, there are often many ways to write code that achieves the same outcome. It’s almost always possible to reduce the number of lines of code we’ve written. However, if the reduction in lines of code comes at the cost of readability, it probably is not worth it. It is okay to accomplish what we need the “boring” way rather than with some lesser-known syntactical sleight-of-hand.
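As a small illustration, both definitions below compute the same average; neither is wrong, but the point-free version trades readability for brevity:

```haskell
-- Clever: the function Applicative makes this very compact, but a
-- reader has to stop and decode it.
mean :: [Double] -> Double
mean = (/) <$> sum <*> (fromIntegral . length)

-- Boring: a few more characters, immediately obvious.
mean' :: [Double] -> Double
mean' xs = sum xs / fromIntegral (length xs)
```

Both compile to the same behavior; the second one costs the next reader far less time.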


Refine and improve data types

Our code will revolve around the data types that we define. The more accurately that we represent the data with types, the easier our code will be to work with, as the types will help reinforce a mental model for us and anyone else working on the same code base. Haskell has a very powerful type system that allows us to represent data in ways that will keep our data in valid states. Utilization of these data types is conducive to cleaner code and fewer bugs. It all starts with making sure our types effectively represent our data.

The Haskell type system is built up of composable components starting with the primitive types every programmer knows from most any language they’ve used before, be it statically typed or not. This composability allows us to reach for increasingly sophisticated building blocks as we seek to represent our data in types.

Let’s walk through a simple example of refining our types. Suppose we want to model a customer of our ecommerce system. We know that the customer will have an email address and a name.

type CustomerEmail = Text
type CustomerName = Text

We can start by defining type aliases where our customer is represented by their email and name, modeled as Text types. Obviously this is a very limited implementation. The problem is that a type alias is just renaming a type—anywhere we use CustomerEmail or CustomerName we can replace it with Text. The following type signatures are all the same as far as the compiler is concerned:

isAuthenticated :: CustomerEmail -> CustomerName -> Bool

isAuthenticated :: CustomerName -> CustomerEmail -> Bool

isAuthenticated :: Text -> Text -> Bool

This doesn’t provide much type safety. As we pass around this customer data through our system, we can accidentally flip CustomerEmail and CustomerName at any point and the compiler won’t help us at all. Let’s improve this by defining a data type instead of only relying on type aliases.

type CustomerEmail = Text
type CustomerName = Text

data Customer = Customer CustomerEmail CustomerName

With the above implementation, we now have some type safety afforded to us. The Customer data type we defined now has two fields inside of it, one for email and one for name. We continue to rely on the type aliases we defined earlier. However, we still run into the type alias interchangeability issue mentioned earlier, where there’s nothing protecting us from incorrectly referencing the fields within the customer.
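A quick sketch of that remaining hazard, using the same aliases and positional constructor:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

type CustomerEmail = Text
type CustomerName = Text

data Customer = Customer CustomerEmail CustomerName

-- Both definitions type-check, but the second silently swaps the
-- fields; the compiler cannot object, since both fields are Text.
good :: Customer
good = Customer "ada@example.com" "Ada Lovelace"

bad :: Customer
bad = Customer "Ada Lovelace" "ada@example.com"
```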

We can improve this further by converting this into a record type, which will allow us to give labels to the field names.

data Customer = Customer
    { customerEmail :: Text
    , customerName :: Text
    }

Now we finally have an implementation that makes it clear what specific data the customer contains, and the customerEmail and customerName fields we’ve defined will serve as accessor functions that will extract the information we want, eliminating the risk of flipping the two fields when accessing them.

We can extend this type safety further by defining additional data types for each field, such as EmailAddress for the email field and a NonEmptyText for the name field. These data types would include smart constructor functions such as parseEmailAddress :: Text -> Maybe EmailAddress that would allow us to construct EmailAddress values that indeed contained valid email addresses.
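A minimal sketch of such a smart constructor, assuming a deliberately simplistic validation rule (exactly one '@' separating non-empty parts; real email validation is considerably more involved):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

newtype EmailAddress = EmailAddress Text
    deriving (Eq, Show)

-- Smart constructor: only Text that passes the (simplistic) check
-- can become an EmailAddress.
parseEmailAddress :: Text -> Maybe EmailAddress
parseEmailAddress t =
    case T.splitOn "@" t of
        [local, domain]
            | not (T.null local) && not (T.null domain) ->
                Just (EmailAddress t)
        _ -> Nothing
```

By exporting the EmailAddress type abstractly (without its constructor) and exposing only parseEmailAddress, a module can guarantee that every EmailAddress in the program has passed validation.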

The key takeaway here is that we should continue to refine and improve our data types in order to improve type safety and model our data more accurately. Sometimes the kind of refinement we walked through happens at the outset, when we are first introducing a type and can iterate on it before settling on the best option. Other times, data types already exist in an application and we see opportunities to improve or extend them to better fit our evolving application needs. The good news is that even when types are changed in a mature code base, Haskell’s compiler will protect us from introducing type errors, and the refactoring process of updating code that relies on the type tends to be straightforward.

Ultimately, taking opportunities to improve our types is what will keep our code base healthy and will make maintenance and subsequent development easier. This type of iterative improvement is an essential part of writing production code in any language, and fortunately for us, Haskell makes it easy to do so.


Avoid overextending types

We often think of data types as representing an absolute state of some data. While they do represent our data, it is okay to have multiple types that represent that data in different contexts. In other words, don’t be afraid to have multiple types that all represent different forms of one abstract piece of our data.

We use data types to model consistent representations of our data. However, it’s important to recognize that no representation is absolute, and a single piece of information may be transformed into multiple types in our application depending on the surrounding context.

For example, we might have data that we call “User”. It is common to define a data type User that contains all of the fields that will represent a “User”. The mistake that is frequently made is to then attempt to use this single User data type for our incoming API requests, API responses, domain logic, and database persistence.

In such a case, it will almost certainly make sense to have multiple variations of a User type (typically namespaced according to different parent modules). For example, we will define API.User, Domain.User, and DB.User modules, with several different User-related data types in each.

API.User

{-# LANGUAGE DuplicateRecordFields #-}
-- DuplicateRecordFields allows both records below to declare an email field.

module API.User where

data User = User
    { id :: Int
    , email :: Text
    , name :: Text
    , role :: Text
    }

data UserRequest = UserRequest
    { email :: Text
    , password :: Text
    }

In API.User, we define a User data type which is our external representation of a User. This includes primitive types that can easily be encoded into JSON for our API response. In this module we also include a UserRequest that represents an inbound authentication request, containing only an email and the text of a password.

Domain.User

module Domain.User where

data User
    = AuthenticatedUser UserInfo
    | AnonymousUser

data UserInfo = UserInfo
    { id :: Int
    , email :: Text
    , name :: Text
    , role :: UserRole
    }

data UserRole
    = AdminUser
    | StandardUser

Next, we define Domain.User, where we have the most expressive version of a user. Here, our User type allows us to model both authenticated and anonymous users, depending on whether the user interacting with the application has already authenticated. For example, we may render a “Login” button for an AnonymousUser but a “Log out” button for an AuthenticatedUser.

Along with the User type, we have a UserInfo data type that contains similar fields to our API.User type, except here we use the custom sum type UserRole to represent a user’s role instead of a Text. This makes it easier to use case expressions when examining the user’s role and changing the application logic based on whether the user is an admin or not.
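For example, a hypothetical authorization helper can pattern match on UserRole directly, and with -Wall the compiler warns us if a new role constructor is ever added without a corresponding case:

```haskell
data UserRole
    = AdminUser
    | StandardUser

-- Hypothetical authorization rule, invented for illustration.
canIssueRefund :: UserRole -> Bool
canIssueRefund role =
    case role of
        AdminUser    -> True
        StandardUser -> False
```

Had role remained a Text, the equivalent check would compare against magic strings and silently miss newly introduced roles.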

DB.User

module DB.User where

data User = User
    { id :: Int
    , email :: Text
    , passwordHash :: Text
    , name :: Text
    , role :: Text
    }

Finally, we define DB.User, which directly represents the columns in our users database table. Here, we need to store a password hash for the purpose of authentication, and fields like role are converted back into Text for storage in the database. Some databases, such as PostgreSQL, support custom enum data types, in which case we could define a user_role type for the users.role column to use, but sticking to a textual representation is suitable in many cases.

The main idea behind the various data types for different representations of a User in these three modules is that different parts of our application have different concerns related to a User, and our goal is to create an optimal representation for each. For example, our primary domain logic may be concerned with what access role a user has, and will use that to determine what functionality the user is authorized to use, but it is not concerned with authentication or validating that a user is who they say they are. Therefore, the domain types representing a User have no need for fields related to a password.

The domain module is also the only one that uses an algebraic data type (ADT) for representation of the user role. At the edges of our application—both the API and DB modules—we have to work with primitive types to send our data outside of the Haskell application, so the role field becomes a Text.

It’s important not to overextend and overshare the data types in the various parts of the application. It tends to be easier and more maintainable to write a function whose signature is DB.User -> Domain.User that performs the conversion to a domain user after retrieval from the database than it is to use a single data type in both parts of the application that has to compromise its representation in an attempt to fit both. A single type for both will either be laden with too many fields or use sub-optimal representations (e.g. a Text where a custom ADT would be more beneficial).
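A sketch of such a conversion function, collapsed into a single module for illustration (in the layout above the two types would live in DB.User and Domain.User, and the "ADMIN"/"STANDARD" role encodings are assumptions for this sketch):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

-- Stand-in for DB.User: mirrors the users table columns.
data DbUser = DbUser
    { dbId           :: Int
    , dbEmail        :: Text
    , dbPasswordHash :: Text
    , dbName         :: Text
    , dbRole         :: Text
    }

-- Stand-ins for the Domain.User types.
data UserRole = AdminUser | StandardUser
    deriving (Eq, Show)

data UserInfo = UserInfo
    { userId    :: Int
    , userEmail :: Text
    , userName  :: Text
    , userRole  :: UserRole
    }
    deriving (Eq, Show)

-- The conversion drops the password hash (the domain doesn't need
-- it) and parses the textual role, failing on unrecognized values.
toDomain :: DbUser -> Maybe UserInfo
toDomain dbUser = do
    role <- parseRole (dbRole dbUser)
    pure UserInfo
        { userId    = dbId dbUser
        , userEmail = dbEmail dbUser
        , userName  = dbName dbUser
        , userRole  = role
        }

parseRole :: Text -> Maybe UserRole
parseRole "ADMIN"    = Just AdminUser
parseRole "STANDARD" = Just StandardUser
parseRole _          = Nothing
```

The Maybe result makes the one genuinely fallible step, parsing the stored role text, explicit at the boundary rather than letting an invalid role leak into domain logic.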

At the same time, the advice to split data types should be applied sensibly. A new data type for another representation of some object in our system should only be created when it is actually necessary. The right time to create a new data type is when we encounter an issue with sharing an existing one. Do not make the mistake of thinking “every object in my system needs an API, Domain, and DB representation.” Some objects may not require an API type since they are derived from other objects and are never exposed via the API. Other objects may not require a DB type since they aren’t stored in the database and are instead derived. Only add each category of type when it is necessary to do so.


Be strategic, not tactical

This principle was conceived of by John Ousterhout. See his talk, A Philosophy of Software Design, for more discussion.

The tactical programmer has one main goal: to get it working as quickly as possible. The strategic programmer has two: to get it working AND to minimize technical debt. This means they focus on implementing the best design patterns, minimizing mess, and doing everything they can to help the future developers who will inevitably need to fix and add to the code we write today. Even with deadlines, it is important to adopt the strategic mindset and invest in minimizing technical debt now in order to speed up future development.

In Haskell, we have many tools at our disposal to help us lay out and refine our strategy. Earlier we discussed refining our data types as well as not overextending them. These are two of many ways to take a strategic approach to creating, maintaining, and evolving our code base as the needs in our application change.


Rely on automated unit and integration tests

With the Haskell compiler turning many runtime errors into compile time errors, there is a natural tendency to shy away from writing automated unit and integration tests. However, tests help with much more than making sure the program doesn’t crash when the code is run. Most code ends up having domain or business logic that can’t be modeled in the type system and can only be verified through testing.

Even if we test the code manually, another developer may later make changes that cause our code to behave erroneously and never realize incorrect results are being returned. A good test suite eases the mental burden on developers and documents how our code is expected to behave. Implement pinpointed, thorough tests that developers can rely on to know when something is broken.

One other key element of automated testing is writing tests that are easy to run. In an ideal setup, this would mean tests that ship with the code and that can be provisioned and executed with a few or even just one command. In many cases, tests that watch the code for changes and rerun upon compilation are also helpful and conducive to a fast development workflow. Tests that are difficult or extremely slow to run are unlikely to be used, and even less likely to be updated. Part of good testing practice is good testing ergonomics so all developers working with the code base can easily utilize the available tests.
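As a minimal, base-only sketch of an easy-to-run test (real projects typically reach for a framework such as hspec or tasty), here is a runner exercising a total version of the OrderStatus parser from the first section:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Monad (unless)
import Data.Text (Text)
import System.Exit (exitFailure)

data OrderStatus = Pending | Completed | Refunded
    deriving (Eq, Show)

fromText :: Text -> Maybe OrderStatus
fromText "PENDING"   = Just Pending
fromText "COMPLETED" = Just Completed
fromText "REFUNDED"  = Just Refunded
fromText _           = Nothing

-- Minimal test runner: prints named failures and exits non-zero so
-- CI can pick up a broken build.
main :: IO ()
main = do
    let cases =
            [ (fromText "PENDING"  == Just Pending,  "parses PENDING")
            , (fromText "REFUNDED" == Just Refunded, "parses REFUNDED")
            , (fromText "bogus"    == Nothing,       "rejects unknown status")
            ]
        failures = [name | (ok, name) <- cases, not ok]
    unless (null failures) $ do
        mapM_ (putStrLn . ("FAILED: " <>)) failures
        exitFailure
    putStrLn "All tests passed"
```

Because the whole suite runs with a single command and a non-zero exit code signals failure, it is cheap for every developer (and CI) to run constantly.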


Find tools and libraries that already exist

When there is a large task to complete, there may be an urge to dive in and write all the implementation code and tests required. We tend to lean towards creating a brand new implementation of everything the task requires, so a custom solution seems like the only answer. Sometimes it is. However, it is always better to spend some amount of time checking whether a solution already exists and is available to use. Whether it is a function in another module of the existing code base, a library we can import, or a developer tool, researching off-the-shelf solutions is worth the time if it saves us from spending copious amounts of effort only to add more complexity to the system.

In Haskell, we have both Hackage and Hoogle at our disposal for finding the open source code we’re looking for. Hackage is the central repository of open source Haskell packages, which we can search using high level terms. For example, we can search for terms like “csv”, “xml”, or “parsing” to help us identify useful packages. Sorting the search results by download count can be particularly helpful for identifying more commonly used and better maintained libraries.

Hoogle, on the other hand, can be used to search for specific functions within packages. With Hoogle, we can even search by type signature if we have a specific function in mind but don’t know whether it exists or which package it is in. For example, if we can think of the type signature for searching a list for an element but can’t think of the function, we can search Hoogle for a -> [a] -> Bool, which points us to the elem function.

For searching within our own code base, most text editors offer built in search support. For example, Vim and Emacs use ctags and etags, which can be generated by simple utilities, to help us quickly jump around our code base. VSCode, another popular text editor, has numerous plugins available for quickly jumping throughout our code base. Taking some time to configure one of these tools can be immeasurably helpful, especially when working on a code base split into many repositories and constantly evolving. We can search through all of the source code available to us quickly and effectively by using the proper tools.


Use explicit imports and exports

A simple way to help both ourselves and future developers, and to improve code readability, is to avoid importing and exporting more than we need. In Haskell, we can enable compiler warnings that highlight unused imports. However, when importing many modules we can improve on this by defining an explicit import list that contains only what we need. Moreover, we can also use qualified imports to ensure that modules containing functions with common names are namespaced and do not clash with our context.

For example, we may write code that looks like the following:

import Prelude
import Data.Text

isUsernameTooLong :: Text -> Bool
isUsernameTooLong name = length name > 32

Unfortunately, this will result in an error:

Main.hs:4:26-31: error:
    Ambiguous occurrence ‘length’
    It could refer to
       either ‘Data.Text.length’, imported from ‘Data.Text’
           or ‘Prelude.length’,
              imported from ‘Prelude’ (and originally defined in ‘Data.Foldable’)

We can avoid this problem entirely by importing Data.Text with a qualified import.

import Prelude
import qualified Data.Text as T

isUsernameTooLong :: T.Text -> Bool
isUsernameTooLong name = T.length name > 32

With the above approach, the ambiguity error disappears because we are explicit about which length function we are using. This becomes even more important with domain specific modules, such as repository modules. It’s far easier to understand the meaning of User.findById than that of a bare findById, which is likely implemented in many of the repository modules in our application. Seeing the latter likely requires us to examine the types or the module imports at the top of the file to understand exactly which findById is being invoked.

On the other side of the coin, when writing new modules we should be diligent to export only the intended public interface of the module. Using a repository module as an example again, we might see something like the following.

module Repository.User where

findById :: (MonadDB m) => Int -> m (Maybe User)
findById userId = do
    ...
    findBaseQuery
    ...

findByEmail :: (MonadDB m) => Text -> m (Maybe User)
findByEmail email = do
    ...
    findBaseQuery
    ...

findBaseQuery :: (MonadDB m) => m User
findBaseQuery = ...

In the example above, findBaseQuery is intended as a starting point for querying for users, and the function is invoked in the body of findById and findByEmail, which merely accept different search conditions for querying against the same table. As written above, all three of these functions will be exposed and anyone who imports this module can invoke any of them. However, findBaseQuery is clearly not intended to be part of the public interface of this module, so we should create an explicit export list.

module Repository.User
    ( findById
    , findByEmail
    ) where

...

Changing the module definition in the first line of the file to the definition above limits the external interface of the module. This means that anyone who imports Repository.User will never see findBaseQuery in their own module scope. This is helpful, since as authors of Repository.User, we don’t intend for it to be invoked directly. This helps anyone importing this module even if they aren’t using an explicit import list.
