<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
        <title>Foxhound Systems</title>
        <link>https://www.foxhound.systems</link>
        <description><![CDATA[Foxhound Systems is a software development company that specializes in building reliable, high performance web applications with superior security.]]></description>
        <atom:link href="https://www.foxhound.systems/blog/rss.xml" rel="self"
                   type="application/rss+xml" />
        <lastBuildDate>Fri, 27 Jun 2025 00:00:00 UT</lastBuildDate>
        <item>
    <title>The AI delegation dilemma</title>
    <link>https://www.foxhound.systems/blog/ai-delegation-dilemma/index.html</link>
    <description><![CDATA[
<article class="grow post">
    
        <div class="lg-container post-banner">
            <picture>
                
                    <source srcset="https://www.foxhound.systems/img/2025-06-27-ai-delegation-dilemma/banner.webp" type="image/webp" height="717" width="1024">
                
                <img src="https://www.foxhound.systems/img/2025-06-27-ai-delegation-dilemma/banner.jpg" alt="A woman and a robot are in a living room, with the robot holding the vacuum cleaner and vacuuming a table. The woman looks annoyed and is scolding the robot." height="717" width="1024">
            </picture>
        </div>
    
    <div class="container">
        
        <h1 class="title">The AI delegation dilemma</h1>
        <header class="info gy-1">
            <div class="author-date space-y-1">
                <div>
                    <span class="date">June 27, 2025</span>
                </div>
                <div class="flex flex-wrap gx-2 gy-1">
                    
                        
                        <div class="flex gap-2 items-center">
                            <img class="rounded-full w-12" src="https://www.foxhound.systems/assets/img/team/christian-sm.jpg" alt="Photo of Christian Charukiewicz">
                            <div class="ml-1 flex flex-col">
                                <span class="font-bold">Christian Charukiewicz</span>
                                <span>Partner &amp; Principal Software Engineer</span>
                            </div>
                        </div>
                        
                        
                    
                </div>
            </div>
            
            <div>
                <div class="tags"><a title="Posts tagged: software-development" href="https://www.foxhound.systems/blog/tag/software-development/">software-development</a></div>
            </div>
            
        </header>
        <hr>
        <div class="content">
        <p>AI-assisted programming has rapidly become commonplace in software development. This is largely due to the rise of large language models (LLMs), which bring with them the promise of improved developer productivity and faster delivery. AI tools can excel when used surgically: fix this bug, document this function, generate a series of test cases. These bite-sized tasks are tightly scoped, so they are easy to specify and easy to validate. But once you ask AI to build larger components—like an entire authentication flow, a service layer, or even a full-stack CRUD application—you’re not just using AI to <em>assist</em>, you’re instead opting to <em>delegate</em> work to AI. This shift to delegation introduces a subtle but significant challenge.</p>
<p>Imagine you ask an AI to build a complete feature, such as a new account settings page in your web application. The LLM produces a correct-seeming result very quickly, which feels like magic. You test its output and find that a few things are off and require fixing. You go back to the LLM and ask for a few tweaks. This fixes some things but causes others to regress. You try again, this time attempting to be more specific. Before you know it, you’re drowning in prompt revisions or sifting through code that neither fits your architecture nor behaves exactly as you wanted.</p>
<p>This is where the constraint of AI delegation emerges: what we refer to as the <em>AI delegation dilemma</em>.</p>
<!--more-->
<h2 id="a-triangle-of-trade-offs">A triangle of trade-offs</h2>
<p>The more you ask the AI to do, the more you find yourself navigating a three-way trade-off between the following priorities:</p>
<ul>
<li><strong>Design Fidelity</strong>: The design and functionality of the code reflect your desired intent, not the AI’s assumptions</li>
<li><strong>Prompt Efficiency</strong>: Writing the prompts requires less effort than writing the code yourself</li>
<li><strong>Output Usability</strong>: There is minimal post-generation cleanup or rewriting of the output code required of you</li>
</ul>
<figure>
<img src="https://www.foxhound.systems/img/2025-06-27-ai-delegation-dilemma/delegation-dilemma-triangle.svg" class="w-100 md-w-75" alt="The dilemma defines the inherent constraints when offloading large amounts of work to AI" />
<figcaption aria-hidden="true">The dilemma defines the inherent constraints when offloading large amounts of work to AI</figcaption>
</figure>
<p>You can only optimize for at most two at a time. Much like the classic “Fast, Cheap, Good: pick two” triangle, this model shows the constraints of AI-generated code:</p>
<ul>
<li>If you want code that matches your intended design (Design Fidelity) and generated results that are ready to use (Output Usability), you’ll spend substantial time writing and revising prompts (loss of Prompt Efficiency).</li>
<li>If you want a quick prompting experience (Prompt Efficiency) and don’t want to spend time revising the output (Output Usability), you’ll have to let the AI make design decisions and assumptions for you (loss of Design Fidelity).</li>
<li>If you want to stay in control (Design Fidelity) and write prompts quickly (Prompt Efficiency), you will have to make extensive revisions to the AI’s output (loss of Output Usability).</li>
</ul>
<p>We call this triangle of trade-offs <em>the AI delegation dilemma</em>. Inherent to the AI delegation dilemma is the constraint that ‘pick 3’ isn’t possible. There are several reasons why.</p>
<h2 id="the-roots-of-the-dilemma">The roots of the dilemma</h2>
<p>Understanding why the AI delegation dilemma is unavoidable requires examining how modern LLMs work and the limitations one quickly runs into when delegating work to AI.</p>
<h3 id="the-nature-of-llms-and-code-generation">The nature of LLMs and code generation</h3>
<p>LLMs operate by predicting the most probable sequence of tokens based on their training data, which is vast but tends to pull outputs toward the generic. In that sense they work from patterns and statistical prediction, as “<a href="https://en.wikipedia.org/wiki/Stochastic_parrot" target="_blank" rel="noopener">stochastic parrots</a>.” For code generation, additional techniques, such as reinforcement learning, are used to hone the output of LLMs and produce code that is favored by humans. Even so, they lack actual comprehension of your specific project’s nuances, architectural philosophy, organizational constraints, or business logic that isn’t explicitly relayed.</p>
<figure>
<img src="https://www.foxhound.systems/img/2025-06-27-ai-delegation-dilemma/parrot_1280.jpg" class="w-100 md-w-60" alt="An LLM is like a parrot that knows everything (in its training data) but understands nothing. This is a departure from real parrots, which understand quite a lot." />
<figcaption aria-hidden="true">An LLM is like a parrot that knows everything (in its training data) but understands nothing. This is a departure from real parrots, which understand quite a lot.</figcaption>
</figure>
<p>The result is that LLMs produce <em>plausible</em> code, but not necessarily <em>correct</em> or <em>optimal</em> code for a specific context. This is why LLMs are prone not only to failing to adhere to your intent, but also to hallucinating code that is entirely believable but completely wrong (ask an LLM to write code for you and it won’t be long before it uses a nonexistent function or imports a fake-but-real-sounding library).</p>

<h3 id="the-cost-of-context-and-implicit-knowledge">The cost of context and implicit knowledge</h3>
<p>Human developers implicitly carry vast amounts of context (team conventions, project history, domain-specific nuances, unspoken requirements, security best practices) that are difficult or impossible to convey fully in a prompt. AI can typically only glimpse a small portion of this through what is conveyed in a prompt or supplemental file, often leading to generic or misaligned output.</p>
<p>The other issue regarding context acquisition by LLMs is that they suffer from anterograde amnesia, which is the inability to form new memories. OpenAI co-founder and former Tesla AI director Andrej Karpathy <a href="https://youtu.be/LCEmiRjPEtQ?t=1000" target="_blank" rel="noopener">describes this in a talk on the role of AI in software development</a>:</p>
<blockquote>
<p>LLMs also suffer from anterograde amnesia. I’m alluding to the fact that if you have a coworker who joins your organization, this coworker will over time learn your organization and they will understand and gain a huge amount of context about the organization. And they go home and they sleep and they consolidate knowledge. They develop expertise over time. LLMs don’t natively do this and this is not something that has really been solved in the R&amp;D of LLMs.</p>
<p>Context windows are really like working memory and you have to program the working memory quite directly because they don’t get smarter by default. I think a lot of people get tripped up by the analogies in this way.</p>
<p>In popular culture I recommend people watch these two movies: <em>Memento</em> and <em>50 First Dates</em>. In both of these movies the protagonists have their weights fixed and their context windows get wiped every single morning. It’s really problematic to go to work or have relationships when this happens, and this happens to them all the time.</p>
</blockquote>
<p>The very essence of the dilemma is that the complete context for the required features and functionality (the design fidelity) must be relayed to the LLM (at the cost of prompt efficiency) or else post-generation revisions are required (at the cost of output usability). Achieving maximal design fidelity through prompting demands a level of detail that effectively mirrors writing the code itself. To get truly bespoke, usable code quickly, you’d need an AI that perfectly understands your intricate design intent and all necessary context without explicit instruction—a capability LLMs simply do not possess.</p>
<p>This means that a sacrifice must always be made: either in the ease of prompting, the correctness and usability of the output, or the relinquishing of full design control and settling for generic or somewhat misaligned outputs. You need to be prepared to be highly specific in giving instructions to an LLM and even after doing so, still expect to spend significant time in follow-up prompts or manual code revisions to achieve the results you want.</p>
<h2 id="navigating-the-dilemma">Navigating the dilemma</h2>
<p>The AI delegation dilemma means that the LLM-wielding software developer must make a series of trade-offs when opting to delegate work to an LLM. There are several different approaches to navigating the dilemma, and the best approach largely depends on the nature of the work that the developer is performing.</p>
<h3 id="picking-your-two-priorities-wisely">Picking your two priorities wisely</h3>
<p>Each pair of priorities requires a different set of expectations and leads to distinct usage of the LLM.</p>
<h4 id="get-a-working-prototype-quickly-prompt-efficiency-output-usability-sacrificing-design-fidelity">Get a working prototype quickly: Prompt Efficiency + Output Usability (sacrificing Design Fidelity)</h4>
<p>For quick, immediately usable results without concern for the specifics, you can write concise prompts and embrace generic solutions. This approach works for rapid prototyping. Accept that the AI will make design and functionality assumptions, and budget for potential refactoring if the code later becomes critical.</p>
<p>The downside of this approach is obvious: you’re going to get generic results that don’t necessarily do what you’d ideally want them to do. Depending on the brevity of your prompts and your willingness to accept whatever the LLM produces, you might end up with what is often referred to as “AI slop,” or low quality output that is immediately recognizable as automatically generated. Use this approach for production code at your peril.</p>
<h4 id="produce-a-useful-starting-point-design-fidelity-prompt-efficiency-sacrificing-output-usability">Produce a useful starting point: Design Fidelity + Prompt Efficiency (sacrificing Output Usability)</h4>
<p>If you need to maintain tight control over the design while keeping prompting quick, view the AI as a brainstorming partner generating a rough draft. This means using concise prompts to convey your intent, but being ready for extensive manual editing and cleanup of the AI’s output, utilizing it primarily for ideation or as a conceptual starting point.</p>
<p>This is a path to get the results you actually want while shaving away some of the tedium in a new project, including things like writing boilerplate code beyond what the framework provides for you, setting up configuration, or creating template files that will be filled in later on.</p>
<p>With this approach, the downside—or, depending on your perspective, upside—is that you’re only using the AI for a boost, recognizing that most of the work will be done by you, the software developer.</p>
<h4 id="create-comprehensive-specifications-for-maximal-precision-design-fidelity-output-usability-sacrificing-prompt-efficiency">Create comprehensive specifications for maximal precision: Design Fidelity + Output Usability (sacrificing Prompt Efficiency)</h4>
<p>When demanding usable code that aligns with your design and functional requirements, be prepared for substantial effort in writing detailed prompts iteratively. This means providing explicit examples, defining all constraints, and enabling “thinking modes” that allow the LLM to produce a better result at the cost of more time.</p>
<p>Like the preceding approach, this tactic seeks to maintain Design Fidelity, except that it puts the onus of implementation on the LLM rather than a human developer. The task of the human is to supply all of the relevant details of the look, feel, and behavior of the software in English (or whatever non-programming language is chosen).</p>
<p>The downsides of this approach are the burdens associated with relaying a level of detail that coerces the LLM to produce the correct results, along with the need to subsequently review and test the extensive outputs. As mentioned earlier, prompting with sufficient detail to achieve the ultimate level of fidelity will converge with the level of effort required in writing the code itself. Moreover, the limitations of the context window that an LLM operates under mean that very large prompts are often at odds with good performance and cost. So even when providing ample context, delegating the task to the LLM may be impractical.</p>
<h2 id="avoid-the-dilemma-entirely">Avoid the dilemma entirely</h2>
<figure>
<img src="https://www.foxhound.systems/img/2025-06-27-ai-delegation-dilemma/exit_1280.jpg" class="w-100 md-w-60" alt="A photograph of an exit sign" />
<figcaption> </figcaption>
</figure>
<p>Sometimes the best way to navigate the dilemma is to avoid it altogether. When working on critical code with hyper-specific requirements, large code bases that require large amounts of contextual information, or when integrating with existing complex systems, it is best to avoid delegating large amounts of work to an LLM.</p>
<p>For example, if working in a well-established code base, delegating a large refactoring task to an LLM can result in significant deviation from existing conventions, introduction of new assumptions, or other undesirable changes. If inclined to use an LLM for such a task, the safer path is to use it to assist with refactoring smaller pieces but steer the high level process on your own.</p>
<p>Keep in mind that with large volumes of work delegated to an AI, the burden of review and revision falls on you, the human developer. In the same way that a senior engineer may sometimes avoid delegating a critical task to a junior engineer because of the increased overall burden caused by the need for more laborious review, avoiding delegation to an AI will in many instances be the best way to get the highest quality result for the lowest amount of total effort.</p>
<h2 id="takeaways">Takeaways</h2>
<p>AI-assisted programming brings with it the potential for additional productivity, but it also introduces the AI delegation dilemma. This fundamental trade-off between Design Fidelity, Prompt Efficiency, and Output Usability means that when offloading larger programming tasks to LLMs, you can achieve at most two out of the three priorities. This isn’t a flaw in the tools themselves, but a consequence of their probabilistic nature and their inherent lack of complete contextual understanding of your specific project.</p>
<p>Recognizing this dilemma allows you to move beyond the initial “magic” of AI code generation and adopt a more strategic approach. Whether you choose to spend time on detailed prompting for precise output, accept generic solutions for rapid prototyping, or use AI as a quick ideation partner for later refinement, recognizing the trade-offs is imperative. And for truly critical, complex, or deeply integrated work, sometimes the wisest choice is to avoid the delegation dilemma altogether, and avoid using LLMs for wholesale code generation.</p>
<hr />
<p><span class="color-muted"><em>Christian Charukiewicz is a Partner at Foxhound Systems, a small US-based software engineering agency building outstanding software systems. Need help with a project? <a href="https://www.foxhound.systems/contact/">Contact us</a>.</em></span></p>
        </div>
    </div>
</article>
]]></description>
    <pubDate>Fri, 27 Jun 2025 00:00:00 UT</pubDate>
    <guid>https://www.foxhound.systems/blog/ai-delegation-dilemma/index.html</guid>
    <dc:creator>Foxhound Systems</dc:creator>
</item>
<item>
    <title>TSIDs strike the perfect balance between integers and UUIDs for most databases</title>
    <link>https://www.foxhound.systems/blog/time-sorted-unique-identifiers/index.html</link>
    <description><![CDATA[
<article class="grow post">
    
        <div class="lg-container post-banner">
            <picture>
                
                    <source srcset="https://www.foxhound.systems/img/2023-12-20-time-sorted-unique-identifiers/hard-drives.webp" type="image/webp" height="731" width="1280">
                
                <img src="https://www.foxhound.systems/img/2023-12-20-time-sorted-unique-identifiers/hard-drives.png" alt="A man stands in the center of a vast hall filled with what appears to be hard drives. Some of the hard drives are huge, as large as computer racks." height="731" width="1280">
            </picture>
        </div>
    
    <div class="container">
        
        <h1 class="title">TSIDs strike the perfect balance between integers and UUIDs for most databases</h1>
        <header class="info gy-1">
            <div class="author-date space-y-1">
                <div>
                    <span class="date">December 20, 2023</span>
                </div>
                <div class="flex flex-wrap gx-2 gy-1">
                    
                        
                        <div class="flex gap-2 items-center">
                            <img class="rounded-full w-12" src="https://www.foxhound.systems/assets/img/team/christian-sm.jpg" alt="Photo of Christian Charukiewicz">
                            <div class="ml-1 flex flex-col">
                                <span class="font-bold">Christian Charukiewicz</span>
                                <span>Partner &amp; Principal Software Engineer</span>
                            </div>
                        </div>
                        
                        
                    
                </div>
            </div>
            
            <div>
                <div class="tags"><a title="Posts tagged: sql" href="https://www.foxhound.systems/blog/tag/sql/">sql</a></div>
            </div>
            
        </header>
        <hr>
        <div class="content">
        <p>When designing a database schema, an important decision is deciding on the type used for the key columns. These are the primary key and foreign key columns in a database, such as <code>users.id</code> and <code>posts.user_id</code>, respectively. The two most common choices are auto-incrementing integer types and Universally Unique Identifiers (UUIDs).</p>
<p>In this article, we’ll examine each of these to understand the trade-offs between them, and then examine a third option that, in our experience, offers the best of both worlds: Time-Sorted Unique Identifiers (TSIDs). We’ll explain how TSIDs work, what the pros and cons versus the other two options are, and look at an implementation of TSIDs in PostgreSQL that we’re currently using in our production systems.</p>
<!--more-->
<h2 id="auto-incrementing-integer-keys">Auto-incrementing integer keys</h2>
<p>The default choice in most database engines is to use an integer type. This is undoubtedly the type that anyone learning the fundamentals of database schema design or looking at examples of <code>CREATE TABLE</code> statements will see.</p>
<p>Using an integer type is sufficient for the vast majority of use cases, so long as the integer type selected for the primary key column is large enough to not cause overflow issues in the given business domain. For example, a mistake that someone unfamiliar with this potential issue can make is setting the primary key type to a signed 32 bit integer (called just <code>INT</code> in many databases), which allows for roughly 2.147 billion keys, a number that can certainly be exceeded in certain contexts. Most SQL databases have a 64 bit integer type with a name like <code>BIGINT</code>, making <code>2 ^ 63 - 1</code> (about 9 <em>billion billion</em>) keys available per table. This is a number so large that it is unreachable for most databases.</p>
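<p>As a quick sanity check on those limits, the following Python sketch computes the largest value of a signed 32 bit and a signed 64 bit integer. The helper name <code>max_signed</code> and the insert rate of 1,000 rows per second are assumptions chosen purely for illustration.</p>

```python
def max_signed(bits: int) -> int:
    """Largest value representable by a signed two's complement integer."""
    return 2 ** (bits - 1) - 1


INT_MAX = max_signed(32)      # a typical INT column
BIGINT_MAX = max_signed(64)   # a typical BIGINT column

print(f"{INT_MAX:,}")     # 2,147,483,647
print(f"{BIGINT_MAX:,}")  # 9,223,372,036,854,775,807

# Hypothetical workload: how long until a table runs out of INT keys
# when inserting 1,000 rows per second around the clock?
ASSUMED_INSERTS_PER_SECOND = 1_000
days_until_exhausted = INT_MAX / ASSUMED_INSERTS_PER_SECOND / 86_400
print(round(days_until_exhausted, 1))  # 24.9
```

<p>At that assumed rate a 32 bit key space lasts under a month, while the 64 bit key space would last for hundreds of millions of years.</p>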
<h3 id="benefits-of-auto-incrementing-integers">Benefits of auto-incrementing integers</h3>
<p>There are several obvious benefits to using auto-incrementing integers. One of the most basic benefits is that they will be natively supported by essentially every single SQL database, and they will more than likely have excellent performance due to their modest space requirements and the excellent indexing characteristics of sequential integers. The reason for this is data locality, meaning that similar records live together on disk, allowing significantly more efficient data writes as well as data retrievals in many contexts, which we’ll cover in more detail later.</p>
<p>Another benefit for the purpose of debugging and auditing is that auto-incrementing integers are chronologically sorted, with newer records always having larger values than older records. The same information can be captured without auto-incrementing integers simply by defining a <code>created_at</code> column in every table with the default value set to the current time, but experience shows that many developers and other schema designers tend to omit this column. Sequential primary keys allow us to at least discern the relative age of the records in a table.</p>
<p>An additional and perhaps underrated benefit is that auto-incrementing integers are human readable. Integers that are just a few digits long can be temporarily recalled by most people for long enough to transcribe them into another window or computer, such as when investigating an issue and searching for them in logs. Even ten or twelve digit numbers can be transcribed by most people with two or three quick glances. When printed with separator characters such as commas (e.g. 3,212,303,404 rather than 3212303404), they’re particularly easy to read.</p>

<h3 id="downsides-of-auto-incrementing-integers">Downsides of auto-incrementing integers</h3>
<p>There are several downsides to using auto-incrementing integers, however. One problem is that they cannot be generated by multiple separate nodes in tandem. This is a non-issue for systems with a single database that is responsible for generating all of its primary keys. Even then, relying on the database to generate keys means that determining the primary key value of a new record requires waiting for the completion of the <code>INSERT</code> operation.</p>
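<p>The dependency on the <code>INSERT</code> can be seen in a small Python sketch. It uses SQLite only because it ships with Python, so treat it as a stand-in for whichever database you run. The <code>users</code> table and its columns are hypothetical. In PostgreSQL the equivalent step would typically be an <code>INSERT ... RETURNING id</code> statement.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL)"
)

# The application cannot know the id ahead of time. It only becomes
# available once the database has executed the INSERT.
cursor = conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
alice_id = cursor.lastrowid

cursor = conn.execute("INSERT INTO users (name) VALUES (?)", ("bob",))
bob_id = cursor.lastrowid

print(alice_id, bob_id)  # 1 2
conn.close()
```

<p>Any code that needs the key, for example to build related rows, has to run after this round trip to the database.</p>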
<p>In situations where the client is to generate keys or multiple write-database nodes are needed, using simple auto-incrementing stops being a viable option. In these situations, some sort of orchestration is required to safely generate distinct primary keys in multiple places at the same time.</p>
<p>Another issue is what is referred to as the <a href="https://en.wikipedia.org/wiki/German_tank_problem" target="_blank" rel="noopener">German tank problem</a>, which is the idea that sequential serial numbers (or primary keys) can allow an external observer to make inferences about the total number of records of a given type in a given database. This is particularly an issue in highly sensitive business or even government contexts, where revealing any information at all about the underlying data is considered a risk.</p>
<p>Consider the following scenario: you are a SaaS company that sends out monthly invoices that are tied to an auto-incrementing integer primary key (e.g. visible via URL such as <code>example.com/invoices/10555</code>). In this case, a customer that gets an invoice each month can see the total number of invoices you issue each month. If in successive months they receive invoices with id values of <code>10555</code>, <code>11621</code>, <code>12698</code>, then they can assume that just over 1,000 invoices were issued each month, likely corresponding to the number of customers you have.</p>
<p>This might not only tell them how many customers you have, but with some additional contextual information about your company, they may even be able to approximate your average invoice size. Then, with both the number of invoices you issue each month as well as your average invoice size, they can guess your monthly revenue with reasonable accuracy, which is something that most companies prefer to conceal.</p>
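<p>The inference in this scenario takes only a few lines of arithmetic. The sketch below uses the three invoice ids from the example. The average invoice amount is a made-up figure standing in for the “additional contextual information” an observer might have.</p>

```python
# Invoice ids seen by one customer in three successive months
# (the values from the example above).
observed_ids = [10555, 11621, 12698]

# Each gap approximates how many invoices were issued in between.
monthly_counts = [
    later - earlier for earlier, later in zip(observed_ids, observed_ids[1:])
]
average_per_month = sum(monthly_counts) / len(monthly_counts)

# Hypothetical guess at the average invoice amount, in dollars.
assumed_average_invoice = 250
estimated_monthly_revenue = average_per_month * assumed_average_invoice

print(monthly_counts)             # [1066, 1077]
print(average_per_month)          # 1071.5
print(estimated_monthly_revenue)  # 267875.0
```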
<p>The German tank problem can be alleviated through mechanisms such as hashing (e.g. <code>/invoices/10555</code> becomes <code>/invoices/AbW0D21</code>), but this requires significant extra effort. Many companies just decide that what little information they may be revealing through integer ids is not worth concealing, and don’t bother with any special treatment.</p>
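<p>To give a sense of the extra effort involved, here is one possible sketch of such a mechanism in Python. It is not a cryptographic hash. It is a reversible scramble (multiplication by an odd constant modulo 2<sup>64</sup>, an XOR mask, then base 62 encoding) chosen because it lets the server map a token back to the integer id without a lookup table. The constants and the function names <code>obscure</code> and <code>reveal</code> are assumptions for illustration, and the scheme only hides the sequence from casual observers. It should not be treated as encryption.</p>

```python
import string

ALPHABET = string.digits + string.ascii_letters  # 62 URL-safe characters
MODULUS = 2 ** 64

# Hypothetical secrets. Any odd multiplier is invertible modulo 2**64.
MULTIPLIER = 0x9E3779B97F4A7C15
XOR_MASK = 0x5BD1E995A3C2F107
INVERSE = pow(MULTIPLIER, -1, MODULUS)  # modular inverse, Python 3.8+


def base62_encode(number: int) -> str:
    """Encode a non-negative integer using digits and ASCII letters."""
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def base62_decode(text: str) -> int:
    """Inverse of base62_encode."""
    number = 0
    for char in text:
        number = number * 62 + ALPHABET.index(char)
    return number


def obscure(record_id: int) -> str:
    """Turn a sequential id into a token that no longer looks sequential."""
    scrambled = (record_id * MULTIPLIER % MODULUS) ^ XOR_MASK
    return base62_encode(scrambled)


def reveal(token: str) -> int:
    """Recover the original id from a token produced by obscure()."""
    scrambled = base62_decode(token) ^ XOR_MASK
    return scrambled * INVERSE % MODULUS


token = obscure(10555)
print(token)          # an opaque alphanumeric string
print(reveal(token))  # 10555
```

<p>Every place that reads or writes an id in a URL now has to pass through <code>obscure</code> or <code>reveal</code>, which is the kind of ongoing cost that leads many teams to skip this step.</p>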
<h2 id="universally-unique-identifier-keys">Universally Unique Identifier keys</h2>
<p>UUIDs are another option for primary keys. A UUID is a 128 bit integer that is usually represented as a 32 character hexadecimal string, and is typically displayed in an 8-4-4-4-12 format, such as <code>cd6aefb6-5898-49d9-906d-f7443450cb39</code>. UUIDs can be generated in many different ways, but for most of this article we’ll focus on UUIDv4, a random generation scheme that is very popular whenever UUIDs are used for primary keys.</p>
<h3 id="benefits-of-uuids">Benefits of UUIDs</h3>
<p>Using UUIDs as database primary keys brings with it several benefits over auto-incrementing integers. First, and often touted by proponents of UUIDs, since UUIDs are randomly generated, they can be generated without relying on a central authority—in the database, in the client’s browser, in the server side application, by some external service—and are assumed to always be unique due to the vastness of possible values in the 128 bit number space. Whereas auto-incrementing integers always increase by 1, UUID values can for all intents and purposes be any 128 bit number (for UUIDv4, the randomly generated portion is 122 bits, but we won’t go into the technical details of UUID generation here).</p>
<p>This characteristic has many additional benefits that are not immediately obvious. For example, if two completely separate systems have users that are keyed on UUIDs, the user records from those two systems can be merged without any conflicts, while allowing each user record to retain its original primary key. In a situation where auto-incrementing integers were used in both systems, one system’s set of users would have to be chosen as the incumbent and the other system’s users would need to be reassigned new primary keys during an import process in order to resolve all conflicts.</p>
<p>UUIDs, especially the completely randomly generated UUIDv4 values, cannot be predicted and do not leak information. Since UUIDs are not monotonically incrementing, the gap between successively generated values will vary. Given this, there’s no potential for the German tank problem to be an issue for record sets where UUIDs are the primary key. The number of records in a system cannot be inferred from a UUID, because there’s no meaningful sequence that correlates with the cardinality of the underlying record set that the keys are being created for.</p>
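<p>As a rough illustration of generation without a central authority, the sketch below assumes PostgreSQL 13 or later, where <code>gen_random_uuid()</code> is built in and returns a UUIDv4. The <code>demo_users</code> table is hypothetical and exists only for this example.</p>
<div class="sourceCode"><pre class="sourceCode sql"><code class="sourceCode sql">-- PostgreSQL 13+ (assumption): gen_random_uuid() is available without an extension.
-- demo_users is a hypothetical table used only for illustration.
create table demo_users (
    id uuid primary key default gen_random_uuid(),
    name text not null
);

-- The database can generate the key itself...
insert into demo_users (name) values ('Bob');

-- ...or accept one generated elsewhere (client, application server, another system).
insert into demo_users (id, name)
values ('cd6aefb6-5898-49d9-906d-f7443450cb39', 'Alice');</code></pre></div>
<p>Either path yields a usable key; the only coordination required is that every generator produces well-formed random UUIDs.</p>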
<h3 id="downsides-of-uuids">Downsides of UUIDs</h3>
<p>UUIDs have several major disadvantages as well. First, the 128 bit integers they consist of take significantly more space than auto-incrementing integers. In a large database, the larger keys can take up significantly more total space. The issue with using UUIDs for primary keys specifically is that the entire UUID is stored not only in the table but also in all indexes created for that table. In narrow tables (tables with only a few small columns), the size of the UUID may be larger than all the rest of the data combined. <a href="https://www.percona.com/blog/uuids-are-popular-but-bad-for-performance-lets-discuss/" target="_blank" rel="noopener">An analysis of UUIDs conducted by Percona</a> gave the following example and observation about the characteristics of UUIDs in schemas:</p>
<blockquote>
<p>Let’s assume a table of 1B rows having UUID values as primary key and five secondary indexes. If you read the previous paragraph, you know the primary key values are stored <strong>six</strong> times for each row. That means a total of 6B char(36) values representing 216 GB. That is just the tip of the iceberg, as tables normally have foreign keys, explicit or not, pointing to other tables. When the schema is based on UUID values, all these columns and indexes supporting them are char(36). I recently analyzed a UUID based schema and found that about <strong>70 percent of storage</strong> was for these values.</p>
</blockquote>
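<p>The arithmetic behind the quoted figure is easy to check. The sketch below counts raw key bytes only (no row or page overhead, which varies by database) for six copies of one billion keys, and adds two narrower representations for comparison: a 16 byte binary or native UUID value and an 8 byte integer.</p>
<div class="sourceCode"><pre class="sourceCode sql"><code class="sourceCode sql">-- PostgreSQL
-- 6,000,000,000 stored key values, as in the quoted example; sizes are payload bytes only.
select representation,
       bytes_per_value,
       (6000000000 * bytes_per_value) / 1000000000 as total_gb
from (values ('char(36)', 36),
             ('16 byte uuid', 16),
             ('8 byte integer', 8)) as t(representation, bytes_per_value);</code></pre></div>
<p>This gives 216 GB, 96 GB, and 48 GB respectively.</p>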
<p>UUID support also varies across databases, from acceptable to poor. As a result of this, UUIDs typically have worse performance characteristics than normal integers, and are comparable at best. This is especially true when UUIDs are stored in a string column type, such as <code>VARCHAR</code>, where index performance will not be as good as <code>INTEGER</code> column types. The decision to use <code>VARCHAR</code> may be for a variety of reasons—no better <code>UUID</code> type at all, complexity associated with compiling/loading a UUID handling extension, or a mistake on the part of the developer creating the database. The <code>BINARY(16)</code> type is another option to use in the case of no native UUID support, but as the <a href="https://www.percona.com/blog/uuids-are-popular-but-bad-for-performance-lets-discuss/" target="_blank" rel="noopener">previously linked analysis of UUIDs from Percona</a> says, changing the representation of UUIDs only offers a marginal benefit:</p>
<blockquote>
<p>The use of a smaller representation for the UUID values just allows more rows to fit in the buffer pool but in the long run, it doesn’t really help the performance, as the random insertion order dominates. If you are using random UUID values as primary keys, your performance is limited by the amount of memory you can afford.</p>
</blockquote>
<p>Even if UUID type support is good, randomly generated UUIDs lead to a variety of issues arising from the non-sequential distribution of data, which can be detrimental for performance. Insert performance tends to diminish significantly when data being written uses random primary keys on databases that use normal hard drives (HDDs) rather than solid-state drives (SSDs). Since most SQL databases use B-trees for indexing primary keys, sequential keys lead to a B-tree that grows in a predictable and sequential manner. Conversely, UUIDs, due to their random nature, can lead to a more scattered B-tree. This scattering necessitates keeping larger portions of the tree in memory, potentially reducing overall efficiency.</p>
<p>With an auto-incrementing primary key, records are usually inserted in chronological order. This ordering creates a correlation between temporal and memory locality. As a result, the database can efficiently manage memory by keeping only the frequently accessed (or “hot”) portions of the table in memory, while less frequently accessed (or “cold”) data can be stored on disk.</p>
<p>The contiguous storage of temporally related data with auto-incrementing keys can also enhance CPU prefetching. Prefetching is a process where the CPU anticipates the need for certain data and loads it into faster memory ahead of time. When data is stored non-contiguously, as with UUIDs, the benefits of prefetching are diminished, since the CPU cannot as easily predict which data will be needed next.</p>
<p>An issue with UUIDv4 values is that their total randomness means there’s no way to discern which keys are newer than others. As discussed above in the section on auto-incrementing integers, sequential keys provide insight into the relative creation times of various records, making debugging and auditing easier in certain circumstances.</p>
<p>The good news is that as of this writing, there is a <a href="https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-04.html#v7" target="_blank" rel="noopener">draft of an upcoming UUIDv7 specification</a>, which will use a timestamp as a component of each generated UUID to ensure that successive values are sequential. It’s unclear when the specification will be finalized, and how much longer it will take before there’s a practical means of generating UUIDv7 values in your database or application of choice. If you are starting a new project at the time of this writing, UUIDv7 is likely not a practical option for you.</p>
<p>Another significant downside of UUIDs, even if sequential, is their lack of readability. Whereas even relatively large integers are easy to read, easy to say, and easy to transcribe, this is largely untenable for UUIDs. Sharing UUIDs between programs, systems, log files, spreadsheets, and wherever else can only be practically done via copy-and-paste. Manually doing so is both error prone and laborious.</p>
<p>The readability of UUIDs is also an issue in other contexts. Consider the following URLs:</p>
<pre><code>https://example.com/user/281714/invoice/3981292</code></pre>
<p>compared to</p>
<pre><code>https://example.com/user/c011fcb7-a180-4430-9412-3684a3a3668c/invoice/264265c0-4861-473e-bc67-80d3b8341847</code></pre>
<p>With UUIDs, the URL becomes so long that it’s likely hard to see the whole thing. Depending on screen size and the visible length of the browser’s address bar, even the <code>/invoice/</code> portion of the URL may be obscured, making it impossible to see at a glance that we are in fact looking at an invoice.</p>
<p>Moreover, as already mentioned, remembering an integer is generally feasible. “Oh yeah, user 281714, that’s the same one that received the other invoice I was looking at.” UUIDs do not lend themselves to the same. It’s certainly possible to do something like “Oh yeah, user that started with c011fc.” But the onus is on the person to remember they’re looking at prefixes rather than suffixes, and in the unlikely event of a prefix collision, things can become very confusing.</p>
<p>The readability concerns persist in other contexts as well, such as when viewing data in a database client or spreadsheet. UUIDs lead to very large and visually noisy data. Consider the following two tables, which use the same IDs employed in the URLs above.</p>
<h4 id="table-with-integer-identifiers">Table with integer identifiers</h4>
<div class="table-narrow table-code">
<table>
<colgroup>
<col />
<col />
<col />
<col />
<col />
</colgroup>
<thead>
<tr>
<th>invoice_id</th>
<th>user_id</th>
<th>invoice_amount</th>
<th>invoice_status</th>
<th>created_at</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>3981292</code></td>
<td><code>281714</code></td>
<td>$2,100</td>
<td>PAID</td>
<td>2023-12-12T16:41:27Z</td>
</tr>
</tbody>
</table>
</div>
<p>compared to</p>
<h4 id="table-with-uuid-identifiers">Table with UUID identifiers</h4>
<div class="table-narrow table-code">
<table>
<colgroup>
<col />
<col />
<col />
<col />
<col />
</colgroup>
<thead>
<tr>
<th>invoice_id</th>
<th>user_id</th>
<th>invoice_amount</th>
<th>invoice_status</th>
<th>created_at</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>264265c0-4861-473e-bc67-80d3b8341847</code></td>
<td><code>c011fcb7-a180-4430-9412-3684a3a3668c</code></td>
<td>$2,100</td>
<td>PAID</td>
<td>2023-12-12T16:41:27Z</td>
</tr>
</tbody>
</table>
</div>
<p>We can see that with UUIDs, the table that contains otherwise the same data is significantly wider whether it be viewed in a database client, a spreadsheet, a log file, or embedded on an HTML page like this.</p>
<p>The last issue we’ll mention with UUIDs as primary keys is that they are incompatible with integers. What this means is that there’s no easy way to switch a given table’s primary key from auto-incrementing integers to UUIDs or vice versa. Doing so may require a significant migration procedure, and may be totally untenable if the data is synchronized across multiple systems (e.g. a nightly data feed or ETL process that is commonly used in B2B integrations or for data analysis purposes). In other words, if you chose auto-incrementing integers as your primary key, odds are you aren’t ever going to be able to change that primary key to UUIDs in a production system. The reverse is also true.</p>
<h2 id="time-sorted-unique-identifier-keys">Time-sorted Unique Identifier keys</h2>
<p>Now, let’s look at TSIDs, which is the name of a particular specification for implementing time-sorted identifiers. The <a href="https://github.com/f4b6a3/tsid-creator" target="_blank" rel="noopener">specification by Fabio Lima</a> that we referenced for our own implementation mentions that TSIDs combine ideas from <a href="https://github.com/twitter-archive/snowflake/tree/snowflake-2010" target="_blank" rel="noopener">Snowflake IDs</a>, developed and used by Twitter, and <a href="https://github.com/ulid/spec" target="_blank" rel="noopener">ULIDs</a>, another time-sorted identifier that touts a shorter canonical form than UUIDs, amongst several other benefits.</p>
<p>Some of the key features are:</p>
<ul>
<li>TSIDs are generated in time-sortable order (as with other time-sorted identifiers, including the aforementioned UUIDv7)</li>
<li>TSIDs are a 64 bit integer</li>
<li>TSIDs can be represented as a 13 character string through <a href="https://www.crockford.com/base32.html" target="_blank" rel="noopener">Crockford base32 encoding</a></li>
<li>The TSID generation algorithm can optionally include node IDs, to ensure that TSIDs generated at multiple sources (e.g. multiple databases or application servers) stay unique</li>
</ul>
<h3 id="benefits-of-tsids">Benefits of TSIDs</h3>
<p>TSIDs combine many of the benefits of both auto-incrementing integers and UUIDs. As already mentioned, one of the fundamental ideas is that TSIDs are time-sorted, so they are naturally sequenced. TSIDs generated at least one millisecond apart will always maintain their generation order when numerically sorted. This is a benefit that for the purpose of most applications is shared with auto-incrementing integers.</p>
<p>However, unlike auto-incrementing integers, and like UUIDs, TSIDs also include a random component, so successively generated TSIDs cannot be predicted ahead of time. Equally importantly, it is impossible to discern how many TSIDs have already been generated by looking at one or even many TSIDs.</p>
<p>We can demonstrate the above behaviors through examples. Consider the following example in our PostgreSQL database, using our <code>generate_tsid()</code> function (the implementation of which we’ll show later in this article). We’ll run the following query that invokes the function four times in extremely quick succession, and certainly in less than a millisecond.</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">select</span> generate_tsid() <span class="kw">as</span> tsid, <span class="st">'first'</span> <span class="kw">as</span> ord</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="kw">union</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="kw">select</span> generate_tsid() <span class="kw">as</span> tsid, <span class="st">'second'</span> <span class="kw">as</span> ord</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a><span class="kw">union</span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a><span class="kw">select</span> generate_tsid() <span class="kw">as</span> tsid, <span class="st">'third'</span> <span class="kw">as</span> ord</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a><span class="kw">union</span></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a><span class="kw">select</span> generate_tsid() <span class="kw">as</span> tsid, <span class="st">'fourth'</span> <span class="kw">as</span> ord</span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a><span class="kw">order</span> <span class="kw">by</span> tsid <span class="kw">asc</span>;</span></code></pre></div>
<p>Results in:</p>
<div class="table-narrow table-code">
<table>
<thead>
<tr>
<th>tsid</th>
<th>ord</th>
</tr>
</thead>
<tbody>
<tr>
<td>522,836,310,860,897,560</td>
<td>fourth</td>
</tr>
<tr>
<td>522,836,310,861,131,029</td>
<td>first</td>
</tr>
<tr>
<td>522,836,310,862,691,606</td>
<td>second</td>
</tr>
<tr>
<td>522,836,310,863,962,391</td>
<td>third</td>
</tr>
</tbody>
</table>
</div>
<p>We see that the results are returned out of order. This is because a TSID uses millisecond precision in the timestamp component: all four values were generated within the same millisecond, so they share the same timestamp and their relative order is decided by the bits that follow it (in our implementation, the random component). At the same time, even though we generated four TSIDs in the same millisecond, the difference between the largest and the smallest one is a numerical value of 3,064,831, making it completely untenable to discern how many TSIDs were generated in a given millisecond, let alone across time.</p>
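<p>To see this in the values themselves, we can split each one into its parts. The sketch below assumes the bit layout of the <code>generate_tsid()</code> implementation shown later in this article (42 bit timestamp, 12 bit random component, 10 bit counter); a different TSID configuration would need different shifts and masks.</p>
<div class="sourceCode"><pre class="sourceCode sql"><code class="sourceCode sql">-- PostgreSQL
-- The four values are copied from the result table above.
select ord,
       tsid &gt;&gt; 22           as timestamp_ms,       -- leading 42 bits
       (tsid &gt;&gt; 10) &amp; 4095  as random_component,   -- next 12 bits
       tsid &amp; 1023          as counter_component   -- lowest 10 bits
from (values (522836310861131029, 'first'),
             (522836310862691606, 'second'),
             (522836310863962391, 'third'),
             (522836310860897560, 'fourth')) as t(tsid, ord);</code></pre></div>
<p>All four rows share the same <code>timestamp_ms</code> value, so the sort order falls to the random component. The counter component does increase in call order (277 through 280), but it occupies the lowest bits and therefore has no influence on the sort here.</p>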
<p>We can use the <code>pg_sleep()</code> function to slow down each query just slightly:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">select</span> tsid, ord <span class="kw">from</span> (</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>		<span class="kw">select</span> generate_tsid() <span class="kw">as</span> tsid, <span class="st">'first'</span> <span class="kw">as</span> ord</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>	) alias</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="kw">union</span></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="kw">select</span> tsid, ord <span class="kw">from</span> (</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a>		<span class="kw">select</span> pg_sleep(<span class="fl">0.001</span>), generate_tsid() <span class="kw">as</span> tsid, <span class="st">'second'</span> <span class="kw">as</span> ord</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a>	) alias</span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a><span class="kw">union</span></span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a><span class="kw">select</span> tsid, ord <span class="kw">from</span> (</span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a>		<span class="kw">select</span> pg_sleep(<span class="fl">0.001</span>), generate_tsid() <span class="kw">as</span> tsid, <span class="st">'third'</span> <span class="kw">as</span> ord</span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a>	) alias</span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a><span class="kw">union</span></span>
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a><span class="kw">select</span> tsid, ord <span class="kw">from</span> (</span>
<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a>		<span class="kw">select</span> pg_sleep(<span class="fl">0.001</span>), generate_tsid() <span class="kw">as</span> tsid, <span class="st">'fourth'</span> <span class="kw">as</span> ord</span>
<span id="cb4-15"><a href="#cb4-15" aria-hidden="true" tabindex="-1"></a>	) alias</span>
<span id="cb4-16"><a href="#cb4-16" aria-hidden="true" tabindex="-1"></a><span class="kw">order</span> <span class="kw">by</span> tsid <span class="kw">asc</span>;</span></code></pre></div>
<p>Which results in:</p>
<div class="table-narrow table-code">
<table>
<thead>
<tr>
<th>tsid</th>
<th>ord</th>
</tr>
</thead>
<tbody>
<tr>
<td>522,837,249,527,293,229</td>
<td>first</td>
</tr>
<tr>
<td>522,837,249,533,629,742</td>
<td>second</td>
</tr>
<tr>
<td>522,837,249,545,303,343</td>
<td>third</td>
</tr>
<tr>
<td>522,837,249,554,283,824</td>
<td>fourth</td>
</tr>
</tbody>
</table>
</div>
<p>We can see that delaying each call to <code>generate_tsid()</code> restores the generation order, because the timestamp component that leads the TSID moves forward between successive calls spaced at least one millisecond apart. We also see that the difference between the largest and smallest TSID values in the above sequence, generated only a few milliseconds apart, is now more than <em>25 million</em>, utterly obliterating any concerns pertaining to leaking information related to how many records there are in our set.</p>
<p>Looking at the above examples, we can also see that TSIDs are short. Their integer representation is only 18 characters long. Encoded in the aforementioned Crockford base32, a TSID is only 13 URL-safe characters. That’s about 65% shorter than the 36 characters that the standard display format of UUIDs consists of. This means that TSIDs are readable. Let’s consider the above comparison against integers again:</p>
<pre><code>https://example.com/user/281714/invoice/3981292</code></pre>
<p>compared to</p>
<pre><code>https://example.com/user/E34NNFRTCQ15/invoice/DXZBE2D7TB04</code></pre>
<p>This is a huge improvement over UUIDs. Plain integers still have the upper hand when it comes to recognition and short term recall or transcription, but the additional cost of the TSID in this respect is modest compared to the UUID.</p>
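<p>For reference, the conversion from the stored integer to the 13 character form can be sketched as a small PL/pgSQL function. The name <code>tsid_to_base32()</code> is ours for illustration and is not part of the TSID specification; the sketch assumes non-negative values and always pads the result to 13 characters.</p>
<div class="sourceCode"><pre class="sourceCode sql"><code class="sourceCode sql">-- PostgreSQL
-- Hypothetical helper: encodes a non-negative bigint as 13 Crockford base32 characters.
create or replace function tsid_to_base32(tsid bigint) returns text as $$
declare
    -- Crockford's alphabet omits I, L, O and U.
    alphabet constant text := '0123456789ABCDEFGHJKMNPQRSTVWXYZ';
    result text := '';
begin
    -- 13 groups of 5 bits cover all 64 bits (the top group holds only 4 of them).
    for i in 0..12 loop
        -- Take the next 5 bits, starting from the least significant, and prepend
        -- the matching character so the most significant group ends up first.
        result := substr(alphabet, ((tsid &gt;&gt; (i * 5)) &amp; 31)::int + 1, 1) || result;
    end loop;
    return result;
end $$ language plpgsql immutable;</code></pre></div>
<p>Any value below 2<sup>60</sup>, which includes every TSID shown in this article, encodes with a leading <code>0</code> in this padded form.</p>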
<p>One of the most important elements of TSIDs as compared to UUIDs is that they are stored as integers. This means that all of the space, performance, and database support benefits of auto-incrementing integers are true for TSIDs as well. From the perspective of the database, the only distinction between auto-incrementing integers and TSIDs is how each is generated, with auto-incrementing integers relying on a per-table sequence that the database stores, and TSIDs relying on their own generation function. The means of storage, of indexing, and of retrieval are identical for both.</p>
<p>There’s another benefit that comes from TSIDs being integers, which is that TSIDs can be a drop-in replacement for auto-incrementing integers in any database table whose largest existing primary key value is less than about 500,000,000,000,000,000 (that’s a “5” followed by seventeen zeros). I think it’s safe to say that this will likely not be an issue in most tables in most databases.</p>
<p>Once the means of generating TSIDs is defined, it takes around two statements to switch to TSIDs. Here’s what that looks like for our <code>users</code> table in our PostgreSQL database:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="kw">ALTER</span> <span class="kw">TABLE</span> <span class="kw">public</span>.users <span class="kw">ALTER</span> <span class="kw">COLUMN</span> <span class="kw">id</span> <span class="kw">TYPE</span> int8 <span class="kw">USING</span> <span class="kw">id</span>:<span class="ch">:int8</span>;</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="kw">ALTER</span> <span class="kw">TABLE</span> <span class="kw">public</span>.users <span class="kw">ALTER</span> <span class="kw">COLUMN</span> <span class="kw">id</span> <span class="kw">SET</span> <span class="kw">DEFAULT</span> generate_tsid();</span></code></pre></div>
<p>And here’s the resulting data (with two rows existing before the switch, and two more being added after):</p>
<div class="table-narrow table-code">
<table>
<thead>
<tr>
<th>id</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Bob</td>
</tr>
<tr>
<td>2</td>
<td>Alice</td>
</tr>
<tr>
<td>522,848,938,755,798,323</td>
<td>Jim</td>
</tr>
<tr>
<td>522,849,990,964,860,213</td>
<td>Jane</td>
</tr>
</tbody>
</table>
</div>
<p>The gap in values may look jarring, but should be inconsequential if the keys are used solely as identifiers throughout the system. For new tables, it’s just as easy. We simply set the <code>DEFAULT</code> value for the primary key column to our TSID generation function.</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="kw">CREATE</span> <span class="kw">TABLE</span> <span class="kw">public</span>.users (</span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>	<span class="kw">id</span> int8 <span class="kw">NOT</span> <span class="kw">NULL</span> <span class="kw">DEFAULT</span> generate_tsid(),</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a>	<span class="ot">&quot;name&quot;</span> <span class="dt">varchar</span> <span class="kw">NOT</span> <span class="kw">NULL</span>,</span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a>	<span class="kw">CONSTRAINT</span> users_pk <span class="kw">PRIMARY</span> <span class="kw">KEY</span> (<span class="kw">id</span>)</span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a>);</span></code></pre></div>
<p>Another aspect of TSIDs that may be either an upside or a downside is that how they are generated is configurable. The TSID specification defines several variables that can be changed to adjust the generation behavior for a specific context. For example, the node id may be omitted and replaced with a longer random component. While this configurability gives flexibility, a configuration that is written or altered incorrectly can lead to errors, so any adjustment must be made carefully.</p>
<p>One of the other aspects of TSID configuration is that the implementation used in a given database can change over time. As long as the first 42 bits of the TSID are implemented as the timestamp component, the remaining 22 bits can be any combination of random bits, a node id, and a counter component depending on the needs of the deployment, and this portion can change as needed, given that any potential collisions arising from this combination are only a risk in a given millisecond.</p>
<h3 id="downsides-of-tsids">Downsides of TSIDs</h3>
<p>TSIDs do have a few downsides. The first is that they rely on a timestamp component that must fit into the first 42 bits of each standard TSID. The current standard way to generate TSIDs is to use the number of milliseconds elapsed since <code>2020-01-01</code> as the timestamp component. Depending on whether the TSID is stored as a signed or an unsigned 64 bit integer (a signed type gives up one of those bits to keep values positive), this means that the timestamp fits into the allocated space for about 70 or 140 years. This may not be a practical concern for anyone building software today, but is worth noting.</p>
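<p>Both figures follow from the number of bits that hold the timestamp. The sketch below assumes a 365.25 day year and compares 41 usable bits (one bit given up to the sign of a signed 64 bit integer, which is our reading of the lower figure) with the full 42 bits.</p>
<div class="sourceCode"><pre class="sourceCode sql"><code class="sourceCode sql">-- PostgreSQL
-- Milliseconds representable in N bits, converted to years (365.25 day year assumed).
select bits,
       round((2::numeric ^ bits) / 1000 / 60 / 60 / 24 / 365.25, 1) as years_of_timestamps
from (values (41), (42)) as t(bits);</code></pre></div>
<p>The result is 69.7 years for 41 bits and 139.4 years for 42 bits. Counting from the 2020 epoch, that puts the limit around 2089 or 2159.</p>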
<p>As we saw in the example above, it is possible to generate TSIDs that are out of order if they are generated within the same millisecond. For multi-node generation, clock drift between machines is also an issue, so it is possible that TSIDs generated across machines will be generated out of order due to deviations in the machine clock time. It is worth remarking that clock drift will be a concern for all multi-machine time-sorted identifiers, including UUIDv7 and Snowflake IDs, so TSIDs are no worse in this regard.</p>
<p>Distributed TSID generation is not as straightforward as it is with UUIDs. When generating TSIDs in multiple places, a node ID or machine ID must be maintained. In addition to requiring an orchestration scheme to manage the node ID itself, the presence of this node ID reduces the number of bits available in the random component of the TSID, increasing the possibility of collisions. The probability of a collision remains extremely small, and is only a potential issue when large numbers of TSIDs are generated in a very brief period on a single node (at least a million identifiers <em>per second</em> before the risk appears), but this risk must be considered and managed in high throughput systems, or in cases where large blocks of identifiers are generated before they are actually needed.</p>
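<p>To make the trade-off concrete, here is a hypothetical node-aware variant of the generator. The function name, the <code>node_id</code> parameter, and the split into a 10 bit node id and a 12 bit random component are all assumptions chosen for illustration; how each node learns its id is left out entirely.</p>
<div class="sourceCode"><pre class="sourceCode sql"><code class="sourceCode sql">-- PostgreSQL
-- Hypothetical sketch: 42 bit timestamp | 10 bit node id | 12 bit random component.
create or replace function generate_tsid_for_node(node_id int) returns bigint as $$
declare
    -- Milliseconds since 2020-01-01Z (1577836800 seconds after the Unix epoch).
    ts bigint := floor((extract(epoch from clock_timestamp()) - 1577836800) * 1000);
    -- 12 random bits: an integer from 0 to 4095.
    random_bits bigint := floor(random() * 4096);
begin
    if node_id not between 0 and 1023 then
        raise exception 'node_id must fit in 10 bits, got %', node_id;
    end if;
    return (ts &lt;&lt; 22) | (node_id::bigint &lt;&lt; 12) | random_bits;
end $$ language plpgsql;</code></pre></div>
<p>With this layout, two calls on the same node within the same millisecond draw from only 4,096 random values, which is the collision risk described above. Replacing some of the random bits with a per-node counter is one way to reduce it.</p>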
<p>A final significant downside is that since TSID generation is highly configurable, managing the configuration can be a burden if it needs to be ported across databases or even across completely distinct system components (e.g. a SQL database and an auxiliary service that both need to generate TSIDs will each need to maintain their own copy of the implementation that stays in agreement to eliminate the risk of collisions). It’s also possible that not every database may make it easy to define a custom TSID generation function. Our PostgreSQL implementation example below is rather compact and self-contained, but other databases may not make it so simple.</p>
<h3 id="tsid-implementation-example">TSID implementation example</h3>
<p>Let’s look at an example implementation of TSIDs that we use for single-database systems. The implementation below is based on an <a href="https://gist.github.com/fabiolimace/6d8d2a4abf67d54d025eca26bcbd1cde/" target="_blank" rel="noopener">example implementation by Fabio Lima</a>. It defines the PostgreSQL function <code>generate_tsid()</code>, which we’ve referenced several times earlier in this article. The following must be run once per database instance before the function becomes available for use:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="co">-- PostgreSQL</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a><span class="kw">drop</span> <span class="kw">sequence</span> <span class="cf">if</span> <span class="kw">exists</span> <span class="ot">&quot;generate_tsid_seq&quot;</span>;</span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="kw">create</span> <span class="kw">sequence</span> <span class="ot">&quot;generate_tsid_seq&quot;</span> <span class="kw">maxvalue</span> <span class="dv">1024</span> <span class="kw">as</span> <span class="dt">smallint</span> <span class="kw">cycle</span>;</span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a><span class="kw">create</span> <span class="kw">or</span> <span class="kw">replace</span> <span class="kw">function</span> generate_tsid() returns bigint <span class="kw">as</span> $$</span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a><span class="kw">declare</span></span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a>    <span class="co">-- Milliseconds precision</span></span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a>    C_MILLI_PREC bigint <span class="op">:=</span> <span class="dv">10</span>^<span class="dv">3</span>;</span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a>    <span class="co">-- Random component bit length: 12 bits</span></span>
<span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a>    C_RANDOM_LEN bigint <span class="op">:=</span> <span class="dv">2</span>^<span class="dv">12</span>;</span>
<span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a>    <span class="co">-- TSID epoch: 2020-01-01Z, expressed as seconds since the Unix epoch</span></span>
<span id="cb9-12"><a href="#cb9-12" aria-hidden="true" tabindex="-1"></a>    <span class="co">-- extract(epoch from '2020-01-01'::date)</span></span>
<span id="cb9-13"><a href="#cb9-13" aria-hidden="true" tabindex="-1"></a>    C_TSID_EPOCH bigint <span class="op">:=</span> <span class="dv">1577836800</span>;</span>
<span id="cb9-14"><a href="#cb9-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-15"><a href="#cb9-15" aria-hidden="true" tabindex="-1"></a>	<span class="co">-- 42 bits</span></span>
<span id="cb9-16"><a href="#cb9-16" aria-hidden="true" tabindex="-1"></a>    C_TIMESTAMP_COMPONENT bigint <span class="op">:=</span> <span class="fu">floor</span>((<span class="fu">extract</span>(<span class="st">'epoch'</span> <span class="kw">from</span> clock_timestamp()) <span class="op">-</span> C_TSID_EPOCH) <span class="op">*</span> C_MILLI_PREC);</span>
<span id="cb9-17"><a href="#cb9-17" aria-hidden="true" tabindex="-1"></a>    <span class="co">-- 12 bits</span></span>
<span id="cb9-18"><a href="#cb9-18" aria-hidden="true" tabindex="-1"></a>    C_RANDOM_COMPONENT bigint <span class="op">:=</span> <span class="fu">floor</span>(<span class="kw">random</span>() <span class="op">*</span> C_RANDOM_LEN);</span>
<span id="cb9-19"><a href="#cb9-19" aria-hidden="true" tabindex="-1"></a>    <span class="co">-- 10 bits</span></span>
<span id="cb9-20"><a href="#cb9-20" aria-hidden="true" tabindex="-1"></a>    C_COUNTER_COMPONENT bigint <span class="op">:=</span> nextval(<span class="st">'generate_tsid_seq'</span>) <span class="op">-</span> <span class="dv">1</span>;</span>
<span id="cb9-21"><a href="#cb9-21" aria-hidden="true" tabindex="-1"></a><span class="cf">begin</span></span>
<span id="cb9-22"><a href="#cb9-22" aria-hidden="true" tabindex="-1"></a>    <span class="kw">return</span> ((C_TIMESTAMP_COMPONENT <span class="op">&lt;&lt;</span> <span class="dv">22</span>) | (C_RANDOM_COMPONENT <span class="op">&lt;&lt;</span> <span class="dv">10</span>) | C_COUNTER_COMPONENT);</span>
<span id="cb9-23"><a href="#cb9-23" aria-hidden="true" tabindex="-1"></a><span class="cf">end</span> $$ language plpgsql;</span></code></pre></div>
<p>Some details about this implementation:</p>
<ul>
<li>We don’t have a node ID, and instead we have a 10-bit counter component.</li>
<li>The counter component of TSIDs generated by this function relies on a PostgreSQL sequence (named <code>generate_tsid_seq</code>) that cycles through the integers from 1 to 1,024; the function subtracts 1 from each value, so the counter ranges from 0 to 1,023. This sequence is shared across all invocations of the <code>generate_tsid()</code> function across the whole database. In high throughput databases, this can easily be modified to use a sequence per table, further reducing the already low chance of collisions.</li>
<li>We have a 12-bit random component.</li>
<li>Leading each TSID, we have the aforementioned 42-bit timestamp component, calculated as the number of milliseconds elapsed since <code>2020-01-01</code>.</li>
</ul>
<p>With the above function defined in a database, TSIDs can start to be used as keys in table columns (see the earlier <code>CREATE TABLE</code> and <code>ALTER TABLE</code> statements). This is all that is required for a single-database implementation of TSIDs.</p>
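<p>To make the bit layout described above concrete, the following is a minimal Python sketch (not part of our PostgreSQL implementation) that packs and unpacks the three components. The helper names <code>compose_tsid</code> and <code>decompose_tsid</code> are hypothetical, and the sketch assumes the 42/12/10-bit layout and the <code>2020-01-01</code> epoch used by <code>generate_tsid()</code>. It uses multiplication and division by powers of two, which for non-negative, in-range components is equivalent to the shifts and bitwise ORs in the SQL function.</p>

```python
from datetime import datetime, timedelta, timezone

# Assumed layout, taken from the SQL function above:
# 42-bit timestamp, then 12-bit random component, then 10-bit counter.
TSID_EPOCH = datetime(2020, 1, 1, tzinfo=timezone.utc)
RANDOM_SIZE = 2**12   # number of values in the 12-bit random component
COUNTER_SIZE = 2**10  # number of values in the 10-bit counter component


def compose_tsid(millis, random_part, counter):
    """Pack the components the same way the SQL return expression does."""
    return (millis * RANDOM_SIZE + random_part) * COUNTER_SIZE + counter


def decompose_tsid(tsid):
    """Split a TSID into (creation time, random part, counter)."""
    rest, counter = divmod(tsid, COUNTER_SIZE)
    millis, random_part = divmod(rest, RANDOM_SIZE)
    return TSID_EPOCH + timedelta(milliseconds=millis), random_part, counter


# Example: a TSID created exactly one day after the epoch.
tsid = compose_tsid(millis=86_400_000, random_part=5, counter=7)
created_at, random_part, counter = decompose_tsid(tsid)
print(created_at.isoformat(), random_part, counter)
# prints: 2020-01-02T00:00:00+00:00 5 7
```

<p>Because the timestamp occupies the most significant bits, comparing two TSIDs as plain integers orders them by creation time first.</p>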
<h2 id="feature-shootout-auto-incrementing-integers-vs.-uuids-vs.-tsids">Feature shootout: Auto-incrementing integers vs. UUIDs vs. TSIDs</h2>
<p>Having made extensive comparisons of auto-incrementing integers, UUIDs, and TSIDs in this article, let’s now look at a summary of what we’ve covered. The table below shows a breakdown of how well each type fares on each feature or trait we’ve examined. Each feature includes an annotation as to whether it is a relative positive, negative, or neutral compared to the other types. This comparison is performed in the context of a typical B2B or B2C SaaS application backed by a single SQL database (spanning one or a few nodes), which describes the vast majority of applications built today.</p>
<div class="table-wrapper table-leftcol-nowrap table-rows-h-5">
<table>
    <colgroup>
        <col />
        <col />
        <col />
        <col />
    </colgroup>
    <thead>
        <tr class="header">
            <th>Feature</th>
            <th>Auto-incr. Integers</th>
            <th>UUIDs</th>
            <th>TSIDs</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td><strong>Key Type</strong></td>
            <td><span class="trait pro" data-aria-label-prefix="Positive trait">Variable size integer</span></td>
            <td><span class="trait con" data-aria-label-prefix="Negative trait">128-bit integer</span></td>
            <td><span class="trait neutral" data-aria-label-prefix="Neutral trait">64-bit integer</span></td>
        </tr>
        <tr>
            <td><strong>Uniqueness</strong></td>
            <td><span class="trait con" data-aria-label-prefix="Negative trait">Unique within a database</span></td>
            <td><span class="trait pro" data-aria-label-prefix="Positive trait">Universally unique</span></td>
            <td><span class="trait neutral" data-aria-label-prefix="Neutral trait">Unique across nodes</span></td>
        </tr>
        <tr>
            <td><strong>Predictability</strong></td>
            <td><span class="trait con" data-aria-label-prefix="Negative trait">Predictable sequence</span></td>
            <td><span class="trait pro" data-aria-label-prefix="Positive trait">Unpredictable</span></td>
            <td><span class="trait pro" data-aria-label-prefix="Positive trait">Unpredictable</span></td>
        </tr>
        <tr>
            <td><strong>Space Efficiency</strong></td>
            <td><span class="trait pro" data-aria-label-prefix="Positive trait"><span>High<br><small>(small size)</small></span></span></td>
            <td><span class="trait con" data-aria-label-prefix="Negative trait"><span>Low<br><small>(large size)</small></span></span></td>
            <td><span class="trait neutral" data-aria-label-prefix="Neutral trait"><span>Moderate<br /><small>(larger than integers but smaller than UUIDs)</small></span></span></td>
        </tr>
        <tr>
            <td><strong>Data locality</strong></td>
            <td><span class="trait pro" data-aria-label-prefix="Positive trait"><span>High<br><small>(sequential increment)</small></span></span></td>
            <td><span class="trait con" data-aria-label-prefix="Negative trait"><span>Low<br><small>(random order)</small></span></span></td>
            <td><span class="trait pro" data-aria-label-prefix="Positive trait"><span>High<br><small>(time-sorted with random component)</small></span></span></td>
        </tr>
        <tr>
            <td><strong>Performance</strong></td>
            <td><span class="trait pro" data-aria-label-prefix="Positive trait"><span>High<br><small>(efficient indexing, inserts, reads)</small></span></span></td>
            <td><span class="trait con" data-aria-label-prefix="Negative trait"><span>Poor<br><small>(inefficient inserts, scattered indexes, read penalty)</small></span></span></td>
            <td><span class="trait pro" data-aria-label-prefix="Positive trait"><span>High<br><small>(similar to integers)</small></span></span></td>
        </tr>
        <tr>
            <td><strong>Readability</strong></td>
            <td><span class="trait pro" data-aria-label-prefix="Positive trait"><span>High<br><small>(simple numbers)</small></span></span></td>
            <td><span class="trait con" data-aria-label-prefix="Negative trait"><span>Low<br><small>(32 character strings)</small></span></span></td>
            <td><span class="trait neutral" data-aria-label-prefix="Neutral trait"><span>Moderate<br><small>(13 character strings)</small></span></span></td>
        </tr>
        <tr>
            <td><strong>Chronological Sorting</strong></td>
            <td><span class="trait pro" data-aria-label-prefix="Positive trait"><span>Yes, implicit<br><small>(based on sequence)</small></span></span></td>
            <td><span class="trait con" data-aria-label-prefix="Negative trait">No inherent order</span></td>
            <td><span class="trait pro" data-aria-label-prefix="Positive trait"><span>Yes, time-sorted<br><small>(based on time component)</small></span></span></td>
        </tr>
        <tr>
            <td><strong>Multi-node Generation</strong></td>
            <td><span class="trait con" data-aria-label-prefix="Negative trait">Not feasible</span></td>
            <td><span class="trait pro" data-aria-label-prefix="Positive trait">Easily feasible</span></td>
            <td><span class="trait neutral" data-aria-label-prefix="Neutral trait">Feasible with node IDs</span></td>
        </tr>
        <tr>
            <td><strong>Security (Inference Risk)</strong></td>
            <td><span class="trait con" data-aria-label-prefix="Negative trait"><span>High<br><small>(German Tank Problem)</small></span></span></td>
            <td><span class="trait pro" data-aria-label-prefix="Positive trait"><span>Low<br><small>(no inference)</small></span></span></td>
            <td><span class="trait pro" data-aria-label-prefix="Positive trait"><span>Low<br><small>(no inference)</small></span></span></td>
        </tr>
        <tr>
            <td><strong>Ease of Implementation</strong></td>
            <td><span class="trait pro" data-aria-label-prefix="Positive trait"><span>High<br><small>(natively supported)</small></span></span></td>
            <td><span class="trait neutral" data-aria-label-prefix="Neutral trait"><span>Moderate<br><small>(varies by database)</small></span></span></td>
            <td><span class="trait con" data-aria-label-prefix="Negative trait"><span>Low<br><small>(least support, requires function implementation, managing node IDs)</small></span></span></td>
        </tr>
        <tr>
            <td><strong>Scalability</strong></td>
            <td><span class="trait neutral" data-aria-label-prefix="Neutral trait"><span>Varies<br><small>(limited by integer type)</small></span></span></td>
            <td><span class="trait pro" data-aria-label-prefix="Positive trait"><span>High<br><small>(no practical limit)</small></span></span></td>
            <td><span class="trait pro" data-aria-label-prefix="Positive trait"><span>High<br><small>(at least ~70 years, limited by timestamp size)</small></span></span></td>
        </tr>
        <tr>
            <td><strong>Migration Flexibility</strong></td>
            <td><span class="trait neutral" data-aria-label-prefix="Neutral trait"><span>Moderate<br><small>(can change to larger integer type)</small></span></span></td>
            <td><span class="trait con" data-aria-label-prefix="Negative trait"><span>Low<br><small>(hard to change key type)</small></span></span></td>
            <td><span class="trait pro" data-aria-label-prefix="Positive trait"><span>High<br><small>(drop-in compatible with integers)</small></span></span></td>
        </tr>
    </tbody>
</table>
</div>
<h2 id="tsids-have-your-cake-and-eat-it-too">TSIDs: Have your cake and eat it too</h2>
<p>In the projects we’re currently working on, including our last production system, we’re employing TSIDs using the PostgreSQL implementation shared above (along with a language-specific implementation of Crockford base32 for use of TSIDs in URLs). Our experience has been very positive, with a completely seamless switch from predominantly using auto-incrementing integers (along with some use of UUIDs) in older systems to TSIDs in our current work. This ease of adoption was especially true once we settled on the specific TSID configuration we wanted to use.</p>
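<p>As an illustration of the kind of Crockford base32 helper mentioned above, here is a minimal Python sketch. It is not the implementation we use; the names <code>encode_tsid</code> and <code>decode_tsid</code> are hypothetical, and it assumes TSIDs are handled as non-negative 64-bit integers, which yields the fixed 13-character form. For brevity, decoding skips the lenient character mappings that Crockford’s scheme allows (such as treating <code>O</code> as <code>0</code>).</p>

```python
# Crockford's base32 alphabet: digits plus uppercase letters, omitting I, L, O and U.
CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
TSID_LENGTH = 13  # 13 characters of 5 bits each are enough for a 64-bit value


def encode_tsid(tsid):
    """Encode a non-negative 64-bit TSID as a fixed-width 13-character string."""
    if tsid not in range(2**64):
        raise ValueError("TSID must be a non-negative 64-bit integer")
    chars = []
    for _ in range(TSID_LENGTH):
        # Peel off the lowest 5 bits and map them to one character.
        tsid, digit = divmod(tsid, 32)
        chars.append(CROCKFORD_ALPHABET[digit])
    return "".join(reversed(chars))


def decode_tsid(text):
    """Decode a string produced by encode_tsid back into an integer."""
    value = 0
    for char in text.upper():
        # str.index raises ValueError for characters outside the alphabet.
        value = value * 32 + CROCKFORD_ALPHABET.index(char)
    return value


# Example round trip with an arbitrary value.
encoded = encode_tsid(1234567890123456789)
print(encoded, decode_tsid(encoded))
```

<p>Padding to a fixed width keeps the encoded strings sortable in the same order as the underlying integers, which preserves the time-sorted property in URLs.</p>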
<p>In considering the trade-offs of auto-incrementing integers, UUIDs, and TSIDs, our experience suggests that TSIDs bring together many of the benefits of both auto-incrementing integers and UUIDs while minimizing their downsides. As an organization, we’ve seen success with using TSIDs in production, and we strongly recommend evaluating whether TSIDs are an improved means of primary key generation in your systems, especially if you are already considering a switch from auto-incrementing integers to UUIDs or vice versa. The TSID may be exactly the right balance of features you are looking for.</p>
<hr />
<p><span class="color-muted"><em>Christian Charukiewicz is a Partner at Foxhound Systems. We’re a small team of Software Engineering leaders that can lead your organization towards its software development goals. Want to improve the effectiveness of your development team? Take a look at our <a href="https://www.foxhound.systems/services/technical-guidance/">Technical Guidance subscriptions</a> or for a larger project, <a href="https://www.foxhound.systems/contact/">contact us</a>.</em></span></p>
        </div>
    </div>
</article>
]]></description>
    <pubDate>Wed, 20 Dec 2023 00:00:00 UT</pubDate>
    <guid>https://www.foxhound.systems/blog/time-sorted-unique-identifiers/index.html</guid>
    <dc:creator>Foxhound Systems</dc:creator>
</item>
<item>
    <title>Technical Debt is not real</title>
    <link>https://www.foxhound.systems/blog/technical-debt-is-not-real/index.html</link>
    <description><![CDATA[<nav class="container mt-2 mb-2">
    <a href="https://www.foxhound.systems/blog/" class="no-underline magic-underline-slim">Back to all posts</a>
</nav>
<article class="grow post">
    
        <div class="lg-container post-banner">
            <picture>
                
                    <source srcset="https://www.foxhound.systems/img/2023-12-14-technical-debt/tech-debt-vacuum.webp" type="image/webp" height="854" width="1280">
                
                <img src="https://www.foxhound.systems/img/2023-12-14-technical-debt/tech-debt-vacuum.png" alt="A 1950s-style ad for a futuristic robot vacuum cleaner. The words 'Technical' and 'Debt' are displayed on banners at the top of the image." height="854" width="1280">
            </picture>
        </div>
    
    <div class="container">
        
        <h1 class="title">Technical Debt is not real</h1>
        <header class="info gy-1">
            <div class="author-date space-y-1">
                <div>
                    <span class="date">December 14, 2023</span>
                </div>
                <div class="flex flex-wrap gx-2 gy-1">
                    
                        
                        
                        <div class="flex gap-2 items-center">
                            <img class="rounded-full w-12" src="https://www.foxhound.systems/assets/img/team/ben-sm.jpg" alt="Photo of Ben Levy">
                            <div class="ml-1 flex flex-col">
                                <span class="font-bold">Ben Levy</span>
                                <span>Partner &amp; Principal Software Engineer</span>
                            </div>
                        </div>
                        
                    
                </div>
            </div>
            
            <div>
                <div class="tags"><a title="Posts tagged: software-development" href="https://www.foxhound.systems/blog/tag/software-development/">software-development</a> <a title="Posts tagged: tech-leadership" href="https://www.foxhound.systems/blog/tag/tech-leadership/">tech-leadership</a></div>
            </div>
            
        </header>
        <hr>
        <div class="content">
        <p>In software development, “Technical Debt” often emerges as a foreboding specter, casting a long shadow over codebases and development teams alike. Yet, herein lies a provocative truth: technical debt is not a tangible entity lurking within lines of code. It’s a metaphor, a way of thinking about the accumulated consequences of past decisions and shortcuts.</p>
<p>In this article, we’ll examine this metaphor and see how technical debt has less to do with the code itself and more to do with the choices and compromises that emerge from the challenges within software development. We’ll delve into two forms of it: emergent technical debt, which arises from evolving system requirements, and deliberate technical debt, a strategic choice to prioritize rapid development over code quality. By looking at technical debt in this way, we not only refine our understanding of the concept but also identify effective strategies for managing it.</p>
<!--more-->
<h2 id="the-conceptual-origins-of-technical-debt">The conceptual origins of “Technical Debt”</h2>
<p>The term “Technical Debt” can be traced back to Ward Cunningham, a notable figure in the software development world and one of the original authors of the Agile Manifesto. Cunningham introduced this term to describe a phenomenon in software development akin to financial debt. He explained that just as it’s sometimes necessary to incur financial debt, it can be strategically acceptable to accumulate technical debt:</p>
<blockquote>
<p>It’s OK to borrow against the future, as long as you pay it off.</p>
<p>— Ward Cunningham [<a href="https://www.agilealliance.org/wp-content/uploads/2016/05/IntroductiontotheTechnicalDebtConcept-V-02.pdf" target="_blank" rel="noopener">source</a>]</p>
</blockquote>
<p>His analogy was simple yet profound: incurring a small amount of debt and repaying it promptly can be beneficial, but allowing it to accumulate unchecked can lead to a crippling cycle, where one is overwhelmed by the burden of merely servicing the interest. Cunningham’s comparison was specifically targeted at the challenges from technical debt arising “because of the way the system is.” However, the term has evolved, often used to describe a broader range of challenges in software development, including the consequences of expedited development.</p>
<div class="article-banner">
    <hr>
    <a href="https://www.cloudtrellis.com/?utm_source=foxhound.systems&amp;utm_medium=banner&amp;utm_campaign=cloudtrellis" class="service-offering no-underline magic-underline-container color-accent-hover-container flex-col w-100 items-center text-left font-manrope" target="_blank" rel="noopener">
        <span class="block w-100 text-center mb-1">
            <div class="flex items-center g-0">
                <span class="inline-block relative">
                    <svg class="color-muted color-accent-hover block s-w-3 s-h-3"><use href="#cloudtrellis-logo"></use></svg>
                </span>
                <span class="font-larger-3">Cloudtrellis</span>
            </div>
        </span>
        <span class="w-100 mb-1 block color-muted font-smaller">A new service built by Foxhound Systems</span>
        <span class="font-bold font-larger-3 block mb-2">Discover problems with your website before your users do</span>
        <br><br>
        <span class="w-100 mb-1 block">Cloudtrellis scans your entire site for broken links, accessibility issues, and SEO errors to ensure a flawless user experience.</span>
        <br><br>
        <ul class="flex-col items-baseline mb-2 mt-none">
            <li class="flex items-baseline mt-1">
                <span class="ml-1">Detect error pages, broken links, accessibility issues, and SEO problems</span>
            </li>
            <li class="flex items-baseline mt-1">
                <span class="ml-1">Create scans with tailored configurations for each website and subdomain you manage</span>
            </li>
            <li class="flex items-baseline mt-1">
                <span class="ml-1">Schedule scans to run monthly, weekly, or even daily to closely monitor for new issues</span>
            </li>
            <li class="flex items-baseline mt-1">
                <span class="ml-1">Get notified of new scan results via email</span>
            </li>
            <li class="flex items-baseline mt-1">
                <span class="ml-1">Share scan results with your team via direct link</span>
            </li>
        </ul>
        <span class="block w-full cta-btn pt-2">Learn more</span>
    </a>
    <hr>
</div>

<h2 id="because-of-the-way-it-is">Because of the way it is</h2>
<p>In the realm of software development, the inception of a project is often marked by a paradoxical certainty: we know the least about the project at its beginning. This is an inherent trait, not a flaw, of the developmental process. It’s a recognition that emerging requirements are both inevitable and unpredictable, a fundamental principle at the core of the Agile movement.</p>
<p>We build just enough to address current understanding without overreaching. This is essential to building a useful product with the least amount of waste. We try to avoid the pitfall of developing features that, while impressive, may not align with the actual needs of the end-users. We must come face-to-face with the crucial question: What value does a feature-rich software hold if its core functionality fails to serve its intended purpose?</p>
<p>However, this pair of ideas has a complex knock-on effect. When a new requirement emerges that is antithetical to our previous understanding of the system, we are faced with a dilemma between two choices. The first is to throw up our hands and declare the requirement impossible “Because of the way it is”—this of course is not realistic. The second is to somehow shoehorn this new requirement into the system by means of a nasty hack. In this view of technical debt, borrowing against the future is an acknowledgement that “we know we don’t know, so let’s not pretend”. This loan will come due when we learn what the future holds.</p>
<p>There is, of course, a third, less-taken route: adapting the system to make it as if the requirement had always been known. Performing this type of change is analogous to repaying our debt in our financial metaphor. The more we defer paying off these mismatches between what we initially believed and what we know now, the more interest we accrue. This interest takes the form of bugs, of extra time necessary to understand the system, and of prolonged development time for new features.</p>
<h2 id="buying-on-credit">Buying on credit</h2>
<p>There is another, more common, usage of the term “Technical Debt” that seems to have originated from the startup world, a realm where customer needs and business models are often unclear initially. This form of technical debt draws parallels to the consumer credit boom of mid-20th century America.</p>
<p>In the 1950s, remarkable new inventions were hitting the market, including vacuum cleaners for the home, dishwashers, microwave ovens, and washing machines. This plethora of new products gave many people a lifestyle leap that had previously been exclusive to the ultrarich.</p>
<p>To top it all off, acquiring all these products was possible without having the cash upfront; they could all be purchased on credit. But of course, this left a significant portion of the middle class saddled with debt, living a life beyond their means.</p>
<p>In this startup context, technical debt is akin to buying time on credit. Companies may rush to release products, compromising on code quality with the hope that success will grant them the opportunity to rectify these shortcuts in the future. This mindset – “we will write bad code now and pay for it later” – represents a distinct category of technical debt, differing fundamentally from the kind driven by evolving project understanding.</p>
<p>This situation mirrors the nuances in financial debt. While corporate debt can be a leveraged tool for growth, personal debt, especially unsecured debt like credit cards, can have devastating impacts. However, unlike people, startups frequently die at a young age, so a heightened risk appetite at the outset is the norm.</p>
<p>If we have this kind of debt at our organization and the organization isn’t on the verge of failure, congratulations are in order: we’ve made it. Our best bet now is to develop an understanding of the systems we’ve built and to “refinance” our debt.</p>
<h2 id="paying-down-debt">Paying down debt</h2>
<p>A common sentiment among development teams is the desire for a dedicated refactoring sprint, with the belief that this will restore the system to a more manageable state and improve development efficiency. This often stems from a scenario where changing requirements have prevented the team from tidying up the codebase.</p>
<p>However, this reflects a deeper issue: a fundamental lack of solid understanding of the system from the outset, leading to an ad-hoc approach akin to buying on credit. This mindset can dangerously lead to the illusion of a “big rewrite” as a panacea, which, more often than not, results in failure and an exacerbation of existing technical debt.</p>
<p>The concept of refactoring sprints, while seemingly a solution, is often misguided. The root problem often lies not in the code itself, but in the processes that built that code. In cases of what might be termed “corporate technical debt,” a continuous, proactive approach to managing technical debt is more effective than sporadic refactoring sprints. This approach requires a rethinking of how we structure our work—every time we get a feature request that doesn’t fit with the system in its current state, we are in fact getting two stories: change the design of the system to be as though it was always meant to have that feature, and make the actual feature.</p>
<p>This aligns with the philosophy of Kent Beck, creator of Extreme Programming and author of Test-Driven Development by Example, who said:</p>
<blockquote>
<p>for each desired change, make the change easy (warning: this may be hard), then make the easy change.</p>
<p>— Kent Beck [<a href="https://twitter.com/KentBeck/status/250733358307500032?lang=en" target="_blank" rel="noopener">source</a>]</p>
</blockquote>
<p>Neglecting this practice leads to being mired in code that we resent working on. Working on a system in this dilapidated state is a slog, and the addition of successive features only adds to the problem.</p>
<p>Ignoring this approach can lead to a dreaded scenario where the codebase becomes burdensome “Legacy Code,” causing developer burnout and eventual attrition. In this context, “just one sprint” is insufficient for extricating a project from a technical debt quagmire. It requires a sustained, concerted effort with each feature request to incrementally align the system with the desired state. This involves not just adding new features but also ensuring each addition seamlessly integrates as if it were always part of the system’s design. This strategy transforms the way technical debt is managed, moving away from reactive measures to a more holistic and sustainable approach to software development.</p>
<h2 id="striking-a-balance-in-managing-technical-debt">Striking a balance in managing technical debt</h2>
<p>This exploration into the realm of technical debt and its dual manifestations should provide a foundational understanding of this complex concept. It’s important to recognize that a moderate amount of technical debt, whether emergent or deliberate, can be manageable and sometimes even necessary. However, an excessive accumulation of either type can lead to significant challenges.</p>
<p>Armed with the knowledge of the two forms of technical debt and how to manage them, when responding to concerns about technical debt from your team, you can ask them to elaborate on what specifically they mean when they use the term. If you are in a technical leadership role, the insights gained here may guide you towards more effective strategies than relying solely on the elusive “magical refactoring sprint.” Ultimately, the key lies in striking a balance and pursuing a sustainable approach to managing technical debt.</p>
<hr />
<p><span class="color-muted">Ben Levy is a Partner at Foxhound Systems. We build fast, reliable, and maintainable custom software systems across a wide variety of industries. Want to improve the effectiveness of your development team? Take a look at our <a href="https://www.foxhound.systems/services/technical-guidance/">Technical Guidance subscriptions</a> or for a larger project, <a href="https://www.foxhound.systems/contact/">contact us</a>.</span></p>
        </div>
    </div>
</article>
]]></description>
    <pubDate>Thu, 14 Dec 2023 00:00:00 UT</pubDate>
    <guid>https://www.foxhound.systems/blog/technical-debt-is-not-real/index.html</guid>
    <dc:creator>Foxhound Systems</dc:creator>
</item>
<item>
    <title>The missing letter in 'MVP'</title>
    <link>https://www.foxhound.systems/blog/missing-letter-in-mvp/index.html</link>
    <description><![CDATA[<nav class="container mt-2 mb-2">
    <a href="https://www.foxhound.systems/blog/" class="no-underline magic-underline-slim">Back to all posts</a>
</nav>
<article class="grow post">
    
        <div class="lg-container post-banner">
            <picture>
                
                    <source srcset="https://www.foxhound.systems/img/2023-06-22-missing-letter-in-mvp/banner-skateboard.webp" type="image/webp" height="853" width="1280">
                
                <img src="https://www.foxhound.systems/img/2023-06-22-missing-letter-in-mvp/banner-skateboard.jpg" alt="A photograph of a well-worn skateboard, with its front wheels on an elevated curb and its rear wheels on the asphalt." height="853" width="1280">
            </picture>
        </div>
    
    <div class="container">
        
        <h1 class="title">The missing letter in 'MVP'</h1>
        <header class="info gy-1">
            <div class="author-date space-y-1">
                <div>
                    <span class="date">June 22, 2023</span>
                </div>
                <div class="flex flex-wrap gx-2 gy-1">
                    
                        
                        <div class="flex gap-2 items-center">
                            <img class="rounded-full w-12" src="https://www.foxhound.systems/assets/img/team/christian-sm.jpg" alt="Photo of Christian Charukiewicz">
                            <div class="ml-1 flex flex-col">
                                <span class="font-bold">Christian Charukiewicz</span>
                                <span>Partner &amp; Principal Software Engineer</span>
                            </div>
                        </div>
                        
                        
                    
                </div>
            </div>
            
            <div>
                <div class="tags"><a title="Posts tagged: business" href="https://www.foxhound.systems/blog/tag/business/">business</a> <a title="Posts tagged: product-strategy" href="https://www.foxhound.systems/blog/tag/product-strategy/">product-strategy</a></div>
            </div>
            
        </header>
        <hr>
        <div class="content">
        <p>When someone with an idea for a software product sets out to research how to create it, they’ll quickly run into two pieces of advice regarding building software, often presented in tandem:</p>
<ol type="1">
<li>You need to determine if there’s demand in the market for your idea by validating it.</li>
<li>In order to validate your idea, you need to build an MVP.</li>
</ol>
<p>An ‘MVP’ here is a minimum viable product. The term has been in use since the early 2000s, when it rose to popularity in the startup world. In general, an MVP is a product that has a minimal set of features but is entirely usable. The intent behind an MVP is that it allows someone who sets out to build software to do so at a more modest cost by keeping the initial feature set slim but still offering something of value to consumers. Whatever our view is on the totality of the concept, there’s useful advice that can be distilled from this idea. But there’s also something missing.</p>
<!--more-->
<h2 id="avoiding-runaway-costs-what-the-idea-of-the-mvp-gets-right">Avoiding runaway costs: What the idea of the MVP gets right</h2>
<p>Ideas are cheap. A potential software startup founder or leader at an established company that may or may not have a budget to build something can spend hours, days, or even weeks developing an idea for a software product. However, turning a product idea into a finished product ready for actual users can be extremely expensive.</p>
<p>One of the most expensive ways to build a product is to strive for the “100%” version. That is, implementing every single feature, every single convenience, and every luxurious detail that is believed to make a perfect product. Pursuing this type of build as part of a greenfield project (one that is building something that doesn’t yet exist at all) leads to a massive project scope that tends to balloon even further as emergent complexity and open questions are discovered during development. Often, the project fails entirely or is walked back significantly after most, all, or even more than the initial budget is spent with no end in sight.</p>
<p>Running a greenfield project as an MVP build helps to protect against this. Pursuing a minimal feature set that leads to a still viable-for-use product can result in significantly more modest development costs. Planning a project in such a way and aggressively sticking to the plan will usually result in something approximating the original idea being delivered for roughly the initial budget. At the end of such a build, there’s an actual product ready for users to use that will be used to test the market. Whether the test proves that the product is successful or not is separate from whether the execution of the build of the MVP is successful. Even if the product is abandoned because there is no market for it, the endeavor to build the product as an MVP will have been a success in that costs were controlled and risk was mitigated.</p>
<h2 id="the-common-criticism-of-mvps">The common criticism of MVPs</h2>
<p>If you read articles about MVPs, you’ll find a commonly repeated diagram intended to illustrate the idea. It begins with a skateboard, which then evolves into a kick scooter, which in turn becomes a bike, then a motorcycle, then a car. The idea is apparently that the skateboard is the MVP in this succession of products that culminates with a car.</p>
<figure>
<img src="https://www.foxhound.systems/img/2023-06-22-missing-letter-in-mvp/vehicle-mvp.png" class="w-100 md-w-75" alt="Flawed depiction of the evolution from MVP to refined product. Source: Wikipedia" />
<figcaption aria-hidden="true">Flawed depiction of the evolution from MVP to refined product. Source: <a href="https://en.wikipedia.org/wiki/Minimum_viable_product#/media/File:From_minimum_viable_product_to_more_complex_product.png" target="_blank" rel="noopener">Wikipedia</a></figcaption>
</figure>
<p>Herein lies the problem with the way MVPs are discussed and what leads to criticism of the concept. Building and selling a skateboard doesn’t tell you anything at all about the business viability or the demand for cars. For that matter, it likely tells you nothing about the demand for motorcycles, bikes, or even kick scooters.</p>
<p>An MVP for a highly refined skateboard is going to be a more primitive and less feature-filled skateboard. Likewise, a highly refined car being sold today might have started its life as an MVP, but that MVP must also have been a car. In the world of physical products, the word ‘prototype’ is used instead of MVP.</p>
<p>The concept of MVP being discussed in this way leads to a lot of criticism of the concept, often leading to the eschewing of the term and approach altogether. In his post, <a href="https://world.hey.com/jason/validation-is-a-mirage-273c0969" target="_blank" rel="noopener">Validation is a mirage</a>, Jason Fried writes:</p>
<blockquote>
<p>When I hear MVP, I don’t think Minimum Viable Product. I think Minimum Viable Pie. The food kind.</p>
<p>A slice of pie is all you need to evaluate the whole pie. It’s homogenous. But that’s not how products work. Products are a collection of interwoven parts, one dependent on another, one leading to another, one integrating with another. You can’t take a slice a product [<em>sic</em>], ask people how they like it, and deduce they’ll like the <em>rest</em> of the product once you’ve completed it. All you learn is that they like or don’t like the slice you gave them.</p>
<p>If you want to see if something works, make it. The whole thing. The simplest version of the whole thing – that’s what version 1.0 is supposed to be.</p>
</blockquote>
<p>Fried’s position seems to be that the concept of an MVP is just wrong and that the term should be avoided. He describes the simplest version of a product as simply being its version 1.0, and that this is what should be built in pursuit of testing the viability of a product. However, there’s value in the idea, and with refinement it can be made significantly more useful and less prone to causing wasted effort on building an incomplete or entirely different thing.</p>
<h2 id="an-mvp-must-be-representative">An MVP must be representative</h2>
<p>The issue with MVP isn’t that the concept is fundamentally flawed, but that it is incomplete. Instead of talking and thinking about minimum viable products for the purposes of testing a market, we should think in terms of minimum viable <em>representative</em> products, or what I’ll call MVRPs.</p>
<p>The critical problem with MVPs as they are often discussed, and the target of criticisms such as the one quoted above, is that they do not in any way reflect the product we intend to test in the market. Potential users can’t glean what the final product is supposed to be like because the MVP doesn’t clearly illustrate, or represent, what the refined version is to be. This means that whatever the users’ reception to this non-representative MVP is, whether good or bad, it does not actually answer the question of market viability for the final product we have in mind.</p>
<p>Once we talk and think in terms of MVPs that are representative, or MVRPs, the problem goes away. We should also recognize that all MVRPs are MVPs, but the inverse is not true.</p>
<figure>
<img src="https://www.foxhound.systems/img/2023-06-22-missing-letter-in-mvp/mvp-vs-mvrp.svg" class="w-100 md-w-60" alt="The relationship between the set of all possible minimum viable products and all possible minimum viable representative products: MVRP ⊂ MVP" />
<figcaption aria-hidden="true">The relationship between the set of all possible minimum viable products and all possible minimum viable representative products: MVRP ⊂ MVP</figcaption>
</figure>
<p>If we look back at the diagram above that begins with a skateboard and ends with a car, it’s pretty clear that a skateboard is not an MVRP for the car. In fact, the only MVRP for a car is a simpler version of the car. Perhaps the MVRP doesn’t have leather seats, floor mats, or a radio. But it certainly needs to have an engine, several seats, a trunk, and doors. Part of the intrinsic value of a car is that it is powered not by our own energy but by a power source (be it a combustible fuel or a battery), there’s room for passengers, and that it isolates us from the outside environment.</p>
<div class="article-banner">
    <hr>
    <a href="https://www.cloudtrellis.com/?utm_source=foxhound.systems&amp;utm_medium=banner&amp;utm_campaign=cloudtrellis" class="service-offering no-underline magic-underline-container color-accent-hover-container flex-col w-100 items-center text-left font-manrope" target="_blank" rel="noopener">
        <span class="block w-100 text-center mb-1">
            <div class="flex items-center g-0">
                <span class="inline-block relative">
                    <svg class="color-muted color-accent-hover block s-w-3 s-h-3"><use href="#cloudtrellis-logo"></use></svg>
                </span>
                <span class="font-larger-3">Cloudtrellis</span>
            </div>
        </span>
        <span class="w-100 mb-1 block color-muted font-smaller">A new service built by Foxhound Systems</span>
        <span class="font-bold font-larger-3 block mb-2">Discover problems with your website before your users do</span>
        <br><br>
        <span class="w-100 mb-1 block">Cloudtrellis scans your entire site for broken links, accessibility issues, and SEO errors to ensure a flawless user experience.</span>
        <br><br>
        <ul class="flex-col items-baseline mb-2 mt-none">
            <li class="flex items-baseline mt-1">
                <span class="ml-1">Detect error pages, broken links, accessibility issues, and SEO problems</span>
            </li>
            <li class="flex items-baseline mt-1">
                <span class="ml-1">Create scans with tailored configurations for each website and subdomain you manage</span>
            </li>
            <li class="flex items-baseline mt-1">
                <span class="ml-1">Schedule scans to run monthly, weekly, or even daily to closely monitor for new issues</span>
            </li>
            <li class="flex items-baseline mt-1">
                <span class="ml-1">Get notified of new scan results via email</span>
            </li>
            <li class="flex items-baseline mt-1">
                <span class="ml-1">Share scan results with your team via direct link</span>
            </li>
        </ul>
        <span class="block w-full cta-btn pt-2">Learn more</span>
    </a>
    <hr>
</div>

<h2 id="determining-what-is-representative">Determining what is representative</h2>
<p>I just described what an MVRP for a car is in the preceding paragraph. But my description might be wrong. Or it might not be. It depends on what we envision the final version of the car to be and what features are representative of that. In order to determine what is required in our MVRP to be in fact representative, we need to examine what our vision for the final product is and what distinctive features it offers.</p>
<p>For example, if we’re making a car designed exclusively for off-roading, we might not need doors. In the United States, it’s very common to see Jeep Wranglers (small SUVs sold as off-road vehicles) without doors on them. Owners will deliberately take the doors off, ostensibly to feel closer to nature and more immersed in their off-roading experience. A car intended to compete in this niche market might not even need doors in its MVRP. Likewise, if we’re building a sports car designed for maximum performance, we might not need a trunk in the MVRP we build. Yet we probably need doors if we’re to isolate the driver from the outside environment at high speed.</p>
<p>When building the MVRP of a software product, here too we must identify both the essential and distinctive elements of the product for our initial release to be representative of the refined vision. The core and distinctive features are highly context dependent and require applying the type of discernment we used when determining whether we’re ultimately building an off road vehicle or a sports car. The features required to offer a representative experience of a new music streaming application will be different than those of an image host which will be different still than those of a messaging platform.</p>
<p>But there are other features as well. We shouldn’t forget basic affordances necessary to a decent user experience such as a password reset process or the ability to update our account email address. Not offering these would be kind of like building a car without windshield wipers. All may be fine until you get some rain and then the car is nearly unusable. In the same vein, we don’t want to build a product that leaves users locked out of their accounts.</p>
<h2 id="make-sure-your-next-mvp-is-an-mvrp">Make sure your next MVP is an MVRP</h2>
<p>The concept of a minimum viable product has plenty of utility to it. But it is often interpreted and applied incorrectly due to its incompleteness. Someone seeking to test the market for a car of any sort should not waste time and money building a skateboard. There is no version of a skateboard that is representative of the experience of a car. To alleviate this problem of building products that tell us nothing about the market for our product vision, we must focus on building minimum viable <em>representative</em> products.</p>
<p>Making an MVRP instead of just an MVP requires us to be more careful in thinking about what it is that are the essential elements distilled from the final vision of our product. Starting from the “100%” version mentioned at the very beginning, we can begin to peel away certain aspects of the product until we’re left at a core set of features, both the distinctive and the essential. This allows us to build an initial product that can be used to determine whether there is a demand for the ultimate vision of our product.</p>
<hr />
<p><em>Christian Charukiewicz is a Partner at Foxhound Systems, where we build fast and reliable custom applications across a wide variety of industries. Looking for help with something you’re working on? Reach out to us at <a href="mailto:info@foxhound.systems" target="_blank" rel="noopener">info@foxhound.systems</a></em>.</p>
        </div>
    </div>
</article>
]]></description>
    <pubDate>Thu, 22 Jun 2023 00:00:00 UT</pubDate>
    <guid>https://www.foxhound.systems/blog/missing-letter-in-mvp/index.html</guid>
    <dc:creator>Foxhound Systems</dc:creator>
</item>
<item>
    <title>Essential elements of high performance applications: Server side caching</title>
    <link>https://www.foxhound.systems/blog/essential-elements-of-high-performance-server-side-caching/index.html</link>
    <description><![CDATA[
<article class="grow post">
    
        <div class="lg-container post-banner">
            <picture>
                
                    <source srcset="https://www.foxhound.systems/img/2022-10-07-essential-elements-of-high-performance-server-side-caching/server-side-caching-banner.webp" type="image/webp" height="853" width="1280">
                
                <img src="https://www.foxhound.systems/img/2022-10-07-essential-elements-of-high-performance-server-side-caching/server-side-caching-banner.jpg" alt="A photograph of a squirrel with a walnut in its mouth perched on a concrete wall. The squirrel appears to be looking for a place to put the nut." height="853" width="1280">
            </picture>
        </div>
    
    <div class="container">
        
        <h2 class="mt-2 mb-none color-muted">Essential elements of high performance applications</h2>
        
        <h1 class="title">Server side caching</h1>
        <header class="info gy-1">
            <div class="author-date space-y-1">
                <div>
                    <span class="date">October 7, 2022</span>
                </div>
                <div class="flex flex-wrap gx-2 gy-1">
                    
                        
                        <div class="flex gap-2 items-center">
                            <img class="rounded-full w-12" src="https://www.foxhound.systems/assets/img/team/christian-sm.jpg" alt="Photo of Christian Charukiewicz">
                            <div class="ml-1 flex flex-col">
                                <span class="font-bold">Christian Charukiewicz</span>
                                <span>Partner &amp; Principal Software Engineer</span>
                            </div>
                        </div>
                        
                        
                    
                </div>
            </div>
            
            <div>
                <div class="tags"><a title="Posts tagged: performance-optimization" href="https://www.foxhound.systems/blog/tag/performance-optimization/">performance-optimization</a> <a title="Posts tagged: essential-elements-of-high-performance" href="https://www.foxhound.systems/blog/tag/essential-elements-of-high-performance/">essential-elements-of-high-performance</a></div>
            </div>
            
        </header>
        <hr>
        <div class="content">
        <p>Our application’s SQL database is a good place to start with performance optimization, as it doesn’t require changing our infrastructure or major rewrites of the code. <a href="https://www.foxhound.systems/blog/essential-elements-of-high-performance-sql-indexes/">Adding indexes</a> and <a href="https://www.foxhound.systems/blog/essential-elements-of-high-performance-offloading-work-sql-database/">rewriting queries</a> are generally isolated measures that we can take to improve the performance of our application. However, sometimes even after optimization we’ll find performance is still worse than required. This may be for a variety of reasons—our request volume is so large that our database is struggling to serve all the queries even after optimization, or we’re running already optimized but more complex queries whose performance is still inadequate.</p>
<p>One technique we can employ in such a situation is server side caching. In simple terms, server side caching is saving the result of an expensive query or computation and making it more quickly retrievable. The results are typically written against and retrieved using a particular ID or URL that acts as a distinct identifier for the particular data.</p>
<!--more-->
<h2 id="server-side-caching-flow-cache-hits-and-misses">Server side caching flow: cache hits and misses</h2>
<p>Employing caching in an application requires handling two cases in our data retrieval flow: the <em>cache hit</em> and the <em>cache miss</em>. A cache hit occurs when the cache is checked for a particular piece of data and it is found. By contrast, a cache miss is when the cache is checked and the piece of data is absent from the cache. Let’s look at both caching flows.</p>
<p>Assuming our cache is totally empty to start with, here’s what a cache miss flow looks like:</p>
<ol type="1">
<li>Receive the request for a particular piece of data</li>
<li>Check whether the data for that request’s ID or URL is in the cache</li>
<li>Since the data is not in the cache, run the query or computation to retrieve the result</li>
<li>Save the result in the cache</li>
<li>Return the result</li>
</ol>
<p>We can see above that the cache miss occurs in step 2, causing step 3 to result in the costly query or computation that we are aiming to avoid executing. By saving the result of this query in the cache in step 4, subsequent requests for this piece of data can result in a cache hit, which looks like the following:</p>
<ol type="1">
<li>Receive the request for a particular piece of data</li>
<li>Check whether the data for that request’s ID or URL is in the cache</li>
<li>Since the data is in the cache, return this result</li>
</ol>
<p>In this sequence, we avoid querying the database altogether and rely only on the cache. By avoiding having to read from the database, we can dramatically speed up retrieval of the data that we need.</p>
<p>Let’s look at an example. Suppose we’re using a piece of project management software and we look up the recent activity of one of our coworkers. The URL of the coworker’s profile and activity feed might be something like <code>https://www.example.com/users/123456/activity</code> and we navigate to it in our browser. On the web server, the application routes the request to an activity feed handler (a function that handles the request) that takes <code>123456</code>, our coworker’s user ID, as the argument.</p>
<p>The handler function may look like the following:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> getUserActivityHandler(userId):</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>	<span class="co"># [1] Check the cache for activity feed data for the given user</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>	cachedActivityFeedItems <span class="op">=</span> retrieveCachedActivityFeedForUser(key<span class="op">=</span>userId)</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>	<span class="co"># [2] If activity feed data was found for this user, return it</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a>	<span class="cf">if</span>(cachedActivityFeedItems <span class="op">!=</span> <span class="va">None</span>):</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a>		<span class="co"># CACHE HIT</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a>		<span class="cf">return</span> cachedActivityFeedItems</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a>	<span class="co"># CACHE MISS</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a>	<span class="co"># [3] Otherwise, run SQL queries for each type of activity feed item</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a>	recentPosts <span class="op">=</span> findRecentPostsForUser(userId)</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a>	recentComments <span class="op">=</span> findRecentCommentsForUser(userId)</span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a>	recentCompletedTasks <span class="op">=</span> findRecentCompletedTasksForUser(userId)</span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a>	<span class="co"># [4] Sort all retrieved items by date</span></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a>	sortedActivityFeedItems <span class="op">=</span> sortByDate([</span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a>		recentPosts,</span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a>		recentComments,</span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a>		recentCompletedTasks</span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a>	])</span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a>	<span class="co"># [5] Save the sorted activity feed items in the cache,</span></span>
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a>	<span class="co">#     associated with the current user id</span></span>
<span id="cb1-23"><a href="#cb1-23" aria-hidden="true" tabindex="-1"></a>	saveCachedActivityForUser(key<span class="op">=</span>userId, value<span class="op">=</span>sortedActivityFeedItems)</span>
<span id="cb1-24"><a href="#cb1-24" aria-hidden="true" tabindex="-1"></a>	<span class="co"># [6] In addition, return the same activity feed data</span></span>
<span id="cb1-25"><a href="#cb1-25" aria-hidden="true" tabindex="-1"></a>	<span class="cf">return</span> sortedActivityFeedItems</span></code></pre></div>
<p>The above handler function will only run steps [1] and [2] if the activity feed data for the specified user ID is found in the cache. If the data isn’t present in the cache, it will run several SQL queries and sort the results (steps [3] and [4]), and then save the sorted data in the cache in step [5] before also returning the data in step [6].</p>
<p>When employing caching like this, it’s imperative that the <em>key</em> used to look up the cached data is the same as the one used to store it. In both step [1] and step [5], we’re using the <code>userId</code> parameter as the cache key. If the keys did not match, then the lookup in step [1] would always result in a cache miss, since the data in the cache would not be retrieved using the same identifier that it was stored under.</p>
<p>Looking at the example above, you’ll notice that caching only matters in successive calls of the <code>getUserActivityHandler</code> function with a particular <code>userId</code> parameter. This means that data in the cache is persisted across requests. One way to conceptualize the cache is as a special type of database or data store that our application uses in tandem with a SQL database.</p>
<p>But how do we know when to update the data in the cache? What happens if the underlying data in the SQL database changes? These concerns are solved through <em>cache invalidation</em>.</p>

<h2 id="expunging-stale-data-cache-invalidation">Expunging stale data: Cache invalidation</h2>
<p>A critical consideration of caching is the concept of <em>cache invalidation</em>, or removing data that is stored in the cache that is not reflective of what is currently present in the SQL database<sup id="ret-delta" title="See footnote: Delta">Δ</sup>, which we can consider our system of record or our single source of truth. Looking back to the code example above, we need to consider what happens when our coworker makes a new post or completes a task in the project management system. If we’ve looked at his activity feed recently, his activity feed data will be cached, so looking at the feed again will not show the latest post.</p>
<p>In order to resolve this issue, we need to invalidate the data in the cache. More specifically, we need to invalidate the activity feed cache entry for our coworker’s user ID. How we do this depends on our tolerance for <em>stale data</em> and the performance related considerations of retrieving data from the SQL database. For example, we may set a 15 minute <em>time to live</em> (or <em>TTL</em>) on all activity feed cache entries, causing any piece of data to be dropped from the cache once its age reaches 15 minutes. With a 15 minute TTL, the data we see in any user’s activity feed should never be outdated by more than 15 minutes. Whether this is acceptable depends on the user experience expectations for the application.</p>
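<p>A TTL can be sketched in a few lines. The <code>TTLCache</code> class below is a hypothetical in-memory illustration, not the API of any particular cache store; it takes an injectable clock so that the 15 minute expiry can be demonstrated without actually waiting.</p>

```python
import time


class TTLCache:
    """A tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be shown without sleeping
        self.entries = {}   # key -> (stored_at, value)

    def set(self, key, value):
        self.entries[key] = (self.clock(), value)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl_seconds:
            del self.entries[key]  # the entry reached its TTL: drop it
            return None
        return value


# Demonstration with a fake clock so that 15 minutes pass instantly.
now = [0.0]
feed_cache = TTLCache(ttl_seconds=15 * 60, clock=lambda: now[0])
feed_cache.set("activity:123456", ["completed a task"])

now[0] = 14 * 60
fresh = feed_cache.get("activity:123456")    # still cached
now[0] = 15 * 60
expired = feed_cache.get("activity:123456")  # None: the age reached the TTL
```

<p>This sketch only expires an entry when it is next read. A dedicated cache store would typically also reclaim expired entries in the background.</p>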
<p>Another cache invalidation strategy is to invalidate on creation of any of the individual items that make up the activity feed. Using this strategy, when a coworker makes a post, writes a comment, or completes a task, their activity feed cache entry is automatically invalidated. This way we ensure that the data on any user’s activity feed is always up to date, as a stale copy that isn’t reflective of what was last saved in the database should never stick around in the cache. One of the obvious downsides of this approach is that it requires changing the implementation of every location in our code base that saves data that appears in activity feeds. In this case, this would include the call sites that save recent posts, recent comments, and recently completed tasks. An additional downside of this strategy is that if users in the system are very active and frequently post, comment, or mark tasks as completed, their activity feeds will seldom be cached and our retrieval code will usually result in a cache miss.</p>
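<p>As a rough sketch of on-create invalidation, assume two dictionaries standing in for the SQL table and the cache. The names <code>create_post</code>, <code>get_activity_feed</code>, and <code>activity_key</code> are made up for illustration. The important part is that the write path deletes the cache entry that the read path would otherwise serve.</p>

```python
# Hypothetical in-memory stand-ins for the SQL table and the cache.
posts_by_user = {}   # user_id -> list of post titles (the "database")
activity_cache = {}  # cache key -> cached activity feed


def activity_key(user_id):
    return f"activity:{user_id}"


def get_activity_feed(user_id):
    key = activity_key(user_id)
    if key in activity_cache:
        return activity_cache[key]  # cache hit
    feed = list(posts_by_user.get(user_id, []))  # the "expensive query"
    activity_cache[key] = feed
    return feed


def create_post(user_id, title):
    posts_by_user.setdefault(user_id, []).append(title)
    # Invalidate on write: the cached feed no longer matches the database.
    activity_cache.pop(activity_key(user_id), None)


create_post(123456, "Kickoff notes")
before = get_activity_feed(123456)   # miss, then cached
create_post(123456, "Sprint recap")  # drops the cached feed
after = get_activity_feed(123456)    # miss again, includes the new post
```

<p>Every other function that writes activity data (comments, completed tasks) would need the same invalidation call, which is the maintenance cost described above.</p>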
<p>In practice, we may want to employ a combination of these strategies across our application, with certain pages or pieces of information using only a TTL strategy, others using the on-create invalidation, and others still using a combination of the two.</p>
<aside>
<sup title="Footnote Delta">Δ</sup> In this article, we’re referring to a SQL database as our system of record or single source of truth with the assumption that some of the queries are too expensive to run every time and require caching. However, it’s worth underscoring that almost <em>any</em> data being read can be cached by our application server, including API calls to external services, the contents of a file, or query results from non-SQL databases. Of course, in each case, the permissibility of stale data and cache invalidation implications must be considered before implementing caching. [<a href="#ret-delta">↑</a>]
</aside>
<h2 id="selecting-a-store-for-our-cached-data">Selecting a store for our cached data</h2>
<p>Up to this point we’ve only discussed caching in general and as it relates to the overall flow of our application and the role it plays in relation to the SQL database. However, actually implementing caching in our system requires selecting a store for the data that is to be cached.</p>
<p>One of the most commonly used databases for caching is <a href="https://redis.io/" target="_blank" rel="noopener">Redis</a>, which has a number of distinctions from SQL databases that make it well suited for serving as a server side data cache:</p>
<ul>
<li>Redis is an in-memory database, meaning its data is stored on the server’s RAM rather than on its disk, and commodity RAM is significantly faster to read from and write to than commodity SSDs, which are commonplace on web application servers.</li>
<li>Redis is a key-value store rather than a SQL database, meaning that instead of tables consisting of columns and rows, it associates every piece of stored data with a single key value. We can think of Redis as a large hashmap or dictionary, which is exactly the data structure we use to implement key-based caching.</li>
<li>Redis does not enforce any schema, meaning that unlike SQL, which has tables with predefined structure, we can store free form data in each Redis key (for the purpose of caching, the data we store is usually a serialized array or JSON string).</li>
<li>Redis supports automatic data expiration, so we can set a TTL for each piece of cached data. This allows us to automatically drop data from the cache after it reaches a certain age.</li>
<li>Redis also allows us to set a data eviction policy, such as <em>least recently used</em> (or <em>LRU</em>), meaning that as the cache server reaches its limit for RAM use, it will start evicting least recently used pieces of data, even if their TTL hasn’t been hit.</li>
<li>Redis runs as a server and has a networking interface. This means that an individual Redis instance deployed on a single host can serve multiple application servers, allowing them to share a cache. Redis also supports clustering across multiple nodes, which enables horizontal scaling and stable performance even at heavy cache workloads.</li>
</ul>
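<p>To tie the key-value, serialization, and expiration points together, here is a hedged sketch of a read-through helper. <code>get_or_compute</code>, <code>InMemoryClient</code>, and <code>load_feed</code> are hypothetical names. The helper only relies on a client exposing <code>get(name)</code> and <code>set(name, value, ex=seconds)</code>, which is the shape of the redis-py client’s methods, so the stand-in class lets the example run without a Redis server.</p>

```python
import json


def get_or_compute(client, key, ttl_seconds, compute):
    """Return the cached value for key, computing and caching it on a miss."""
    cached = client.get(key)  # redis-py returns None when the key is absent
    if cached is not None:
        return json.loads(cached)
    value = compute()
    # ex= sets the expiry in seconds, so the store drops the entry by itself
    client.set(key, json.dumps(value), ex=ttl_seconds)
    return value


class InMemoryClient:
    """Stand-in with the same get/set shape, for running without a server.

    It ignores the expiry argument; a real Redis server enforces it.
    """

    def __init__(self):
        self.data = {}

    def get(self, name):
        return self.data.get(name)

    def set(self, name, value, ex=None):
        self.data[name] = value
        return True


client = InMemoryClient()
# With a real server: import redis; client = redis.Redis(host="localhost", port=6379)
calls = []


def load_feed():
    calls.append(1)  # record each time the "expensive query" runs
    return ["completed a task"]


feed = get_or_compute(client, "activity:123456", 900, load_feed)
feed_again = get_or_compute(client, "activity:123456", 900, load_feed)
```

<p>The value is serialized to a JSON string before storage and parsed on the way out, which is why the schema-free nature of the store is convenient here.</p>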
<p>There are other options available to use as a cached data store besides Redis. For example, <a href="https://memcached.org/" target="_blank" rel="noopener">memcached</a> is a tool very similar to Redis when used for the purposes of caching. Its feature set is more limited, with Redis having support for data structures beyond just strings and integers that memcached supports. This difference in features is largely inconsequential for the purpose of caching, since as mentioned earlier, caching typically involves serializing data into a string before storage. However, if Redis is used or will likely be used for other purposes such as pub/sub, message queues, or geospatial indexes, it becomes a natural tool to reach for over memcached, as the overhead of learning, finding language libraries for, and maintaining the infrastructure of multiple tools is avoided.</p>
<p>A cache data store can also be even more rudimentary than the dedicated stores discussed above. For example, file caching is sometimes employed “out of the box” by web frameworks, where an application will write to and read from temporary files on its host machine. Despite the somewhat primitive implementation, there are some notable benefits to this approach:</p>
<ol type="1">
<li>Infrastructure simplicity. Nothing beyond the application needs to be deployed.</li>
<li>Near zero-latency cache lookups. A tool like Redis deployed as described above will have network overhead associated with each cache lookup. Even a host in the same data center will likely incur a couple of milliseconds of wait time, whereas a local file can be read without any network round trip.</li>
</ol>
<p>However, there are plenty of downsides as well:</p>
<ol type="1">
<li>A local file cache cannot be shared by multiple hosts, so the same web request subsequently routed to a different server will result in the underlying query running again, even if already cached on the first server.</li>
<li>With file caching, cache invalidation in a multi-host environment becomes impractical. This can lead to inconsistent results across servers, since different versions of the same data may be cached on different machines.</li>
<li>No “TTL” functionality without cron jobs, some other system process, or cleanup logic in the application, so cache files may take up space on disk forever.</li>
<li>Security implications related to having pieces of your data “at rest” outside of your databases. An attacker that gets access to the application server’s disk may be able to see pieces of other users’ data even without database access.</li>
</ol>
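<p>To make the trade-offs above concrete, here is a minimal sketch of a local file cache in Python. The <code>file_cache_get</code> function, its parameters, and the JSON file layout are assumptions made for this illustration rather than the behavior of any particular web framework. Note that the “TTL” has to be emulated by the application by checking each file’s modification time.</p>

```python
import json
import os
import tempfile
import time


def file_cache_get(cache_dir, key, ttl_seconds, compute):
    """Return the cached value for key, recomputing it when missing or too old."""
    path = os.path.join(cache_dir, f"{key}.json")  # assumes key is a safe file name
    try:
        age = time.time() - os.path.getmtime(path)
        if age > ttl_seconds:
            os.remove(path)  # application-level cleanup standing in for a TTL
        else:
            with open(path, "r", encoding="utf-8") as f:
                return json.load(f)
    except FileNotFoundError:
        pass  # cache miss: fall through and compute the value
    value = compute()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(value, f)
    return value


def demo():
    calls = []

    def expensive_query():
        calls.append(1)  # count how often the underlying work runs
        return {"total": 42}

    with tempfile.TemporaryDirectory() as cache_dir:
        first = file_cache_get(cache_dir, "report", 60, expensive_query)
        second = file_cache_get(cache_dir, "report", 60, expensive_query)  # read from disk
        calls_before_expiry = len(calls)
        # Backdate the file so it looks two minutes old, then read again.
        path = os.path.join(cache_dir, "report.json")
        two_minutes_ago = time.time() - 120
        os.utime(path, (two_minutes_ago, two_minutes_ago))
        third = file_cache_get(cache_dir, "report", 60, expensive_query)
        return first, second, third, calls_before_expiry, len(calls)
```

<p>The sketch also shows the first downside directly: the cache lives in a directory on one host, so a second application server would start with an empty cache and run <code>expensive_query</code> again.</p>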
<p>Ultimately, most modern caching setups are likely to employ tools like Redis rather than local file system caching. When network latency is a concern, there are more sophisticated setups that can be employed, such as installing a Redis process on each application server which then broadcasts cache invalidations to other servers (see the links at the end of this article for more on this topic).</p>
<h2 id="wrap-up">Wrap up</h2>
<p>In summary, server side caching allows us to significantly reduce the latency associated with repeatedly running intensive database queries or other computations. Most caching implementations rely on tools like Redis that are well-suited to caching data in a multi-host environment. However, caching of this sort comes with trade-offs. It adds the infrastructure complexity inherent to running an additional caching database, it requires updating our code to actually make use of the cache, and it requires carefully considering our cache invalidation strategies and deciding on our tolerance for serving stale data.</p>
<h2 id="further-reading-about-caching">Further reading about caching</h2>
<ul>
<li><a href="https://redis.io/docs/latest/develop/reference/eviction/" target="_blank" rel="noopener">Redis key eviction</a></li>
<li><a href="https://redis.io/docs/latest/develop/clients/client-side-caching/" target="_blank" rel="noopener">Redis server-assisted client side caching</a></li>
</ul>
<hr />
<p>This post is part of a series titled <em>Essential elements of high performance applications</em>. The full list of published posts is available below.</p>
<ul>
<li><a href="https://www.foxhound.systems/blog/essential-elements-of-high-performance-sql-indexes/">SQL indexes</a></li>
<li><a href="https://www.foxhound.systems/blog/essential-elements-of-high-performance-offloading-work-sql-database/">Offloading work to the SQL database</a></li>
<li><strong>Server side caching</strong> (this post)</li>
</ul>
<hr />
<p><em>Christian Charukiewicz is a Partner at Foxhound Systems, where we build fast and reliable custom applications across a wide variety of industries. Looking for help with something you’re working on? Reach out to us at <a href="mailto:info@foxhound.systems" target="_blank" rel="noopener">info@foxhound.systems</a></em>.</p>
        </div>
    </div>
</article>
]]></description>
    <pubDate>Fri, 07 Oct 2022 00:00:00 UT</pubDate>
    <guid>https://www.foxhound.systems/blog/essential-elements-of-high-performance-server-side-caching/index.html</guid>
    <dc:creator>Foxhound Systems</dc:creator>
</item>
<item>
    <title>Essential elements of high performance applications: Offloading work to the SQL database</title>
    <link>https://www.foxhound.systems/blog/essential-elements-of-high-performance-offloading-work-sql-database/index.html</link>
    <description><![CDATA[
<article class="grow post">
    
        <div class="lg-container post-banner">
            <picture>
                
                    <source srcset="https://www.foxhound.systems/img/2022-07-25-essential-elements-of-high-performance-offloading-work-sql-database/offloading-work-sql-database-banner.webp" type="image/webp" height="853" width="1280">
                
                <img src="https://www.foxhound.systems/img/2022-07-25-essential-elements-of-high-performance-offloading-work-sql-database/offloading-work-sql-database-banner.jpg" alt="A photograph of a baby elephant walking away from the camera, towards a body of water in which an adult elephant is bathing." height="853" width="1280">
            </picture>
        </div>
    
    <div class="container">
        
        <h2 class="mt-2 mb-none color-muted">Essential elements of high performance applications</h2>
        
        <h1 class="title">Offloading work to the SQL database</h1>
        <header class="info gy-1">
            <div class="author-date space-y-1">
                <div>
                    <span class="date">July 25, 2022</span>
                </div>
                <div class="flex flex-wrap gx-2 gy-1">
                    
                        
                        <div class="flex gap-2 items-center">
                            <img class="rounded-full w-12" src="https://www.foxhound.systems/assets/img/team/christian-sm.jpg" alt="Photo of Christian Charukiewicz">
                            <div class="ml-1 flex flex-col">
                                <span class="font-bold">Christian Charukiewicz</span>
                                <span>Partner &amp; Principal Software Engineer</span>
                            </div>
                        </div>
                        
                        
                    
                        
                        
                        <div class="flex gap-2 items-center">
                            <img class="rounded-full w-12" src="https://www.foxhound.systems/assets/img/team/ben-sm.jpg" alt="Photo of Ben Levy">
                            <div class="ml-1 flex flex-col">
                                <span class="font-bold">Ben Levy</span>
                                <span>Partner &amp; Principal Software Engineer</span>
                            </div>
                        </div>
                        
                    
                </div>
            </div>
            
            <div>
                <div class="tags"><a title="Posts tagged: performance-optimization" href="https://www.foxhound.systems/blog/tag/performance-optimization/">performance-optimization</a> <a title="Posts tagged: sql" href="https://www.foxhound.systems/blog/tag/sql/">sql</a> <a title="Posts tagged: essential-elements-of-high-performance" href="https://www.foxhound.systems/blog/tag/essential-elements-of-high-performance/">essential-elements-of-high-performance</a></div>
            </div>
            
        </header>
        <hr>
        <div class="content">
        <p>In the <a href="https://www.foxhound.systems/blog/essential-elements-of-high-performance-sql-indexes/">previous post</a> in this series we discussed the importance of proper use of indexes for performance when defining your SQL database. Creating indexes is a simple but essential part of application performance. However, once your database schema is created and indexes are employed, another key aspect of building a fast application is effectively leaning on the database in order to quickly execute work and perform computations. Doing so effectively will often reduce the total amount of work that your application servers as well as your database have to perform. What this looks like in practice is writing SQL queries that use the full breadth of features available.</p>
<!--more-->
<h2 id="joins-and-subqueries">Joins and Subqueries</h2>
<p>One of the simplest examples of this is knowing how and when to employ <code>JOIN</code> queries, as well as understanding the differences between different types of <code>JOIN</code>s and how they can impact performance.</p>
<p>For example, a common practice that hurts application performance is employing “N+1” queries. We’ve written about this as well as other approaches for retrieving data in the beginning of our post on <a href="https://www.foxhound.systems/blog/grouping-query-results-haskell/">grouping query results with Haskell</a>. In short, code structured in this manner will run one query for each item in a list of other items retrieved in a preceding query.</p>
<p>Here’s an example of code that does this, where we need to query both a <code>posts</code> table and a <code>users</code> table to determine the total number of posts made by each author that has recently made a post:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Runs a SQL query to retrieve recent posts</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>posts <span class="op">=</span> selectAllRecentPosts()</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>postCounts <span class="op">=</span> {}</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> post <span class="kw">in</span> posts:</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a>    <span class="co"># Runs a SQL query to count the posts for a given author</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a>    postCounts[post.authorId] <span class="op">=</span> countPostsForAuthor(post.authorId)</span></code></pre></div>
<p>Where <code>selectAllRecentPosts()</code> may run a query like:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">SELECT</span> <span class="op">*</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="kw">FROM</span> posts</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="kw">WHERE</span> created <span class="op">&gt;=</span> NOW() <span class="op">-</span> <span class="dt">INTERVAL</span> <span class="st">'1'</span> <span class="dt">DAY</span>;</span></code></pre></div>
<p>And <code>countPostsForAuthor(123)</code> runs a query like:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">SELECT</span> <span class="fu">COUNT</span>(<span class="kw">DISTINCT</span> posts.<span class="kw">id</span>)</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="kw">FROM</span> posts</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="kw">WHERE</span> posts.author_id <span class="op">=</span> <span class="dv">123</span>;</span></code></pre></div>
<p>These queries are both simple, and it’s easy to make them both take advantage of indexes in order to ensure they run quickly. However, the issue with the N+1 approach is that the second query has to run <em>for each</em> post retrieved by the first query. This means that a list of 20 posts will run 21 (or 20+1) queries.</p>
<p>Not only does this require waiting for the database to execute 21 queries (a number that will scale as the number of recently created posts goes up), but this also requires 21 separate network calls between the application and the database. It is common practice to have database server hardware that is separate from the application server, so network latency of 7 milliseconds per query will add 147 milliseconds to our request <em>in network latency alone</em>. If we had 150 recently created posts, this network latency would balloon to a whopping 1,057 milliseconds, or <em>over 1 second</em> of just waiting for the application to send and retrieve data from the database.</p>
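<p>The arithmetic behind these figures is simple enough to write down. The helper below is a hypothetical name for this illustration, and the 7 millisecond latency is the same assumed figure used above.</p>

```python
def n_plus_one_latency_ms(post_count, latency_ms_per_query=7):
    """Network latency spent on 1 list query plus 1 count query per post."""
    query_count = post_count + 1
    return query_count * latency_ms_per_query


print(n_plus_one_latency_ms(20))   # 21 queries
print(n_plus_one_latency_ms(150))  # 151 queries
```

<p>A single combined query, by comparison, pays the per-query latency only once regardless of how many posts are returned.</p>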
<p>With this in mind, we want to offload this work to the database while also reducing the number of database calls required. So instead of the code we saw earlier, we’re going to move all of the work to the database and change our code to invoke the following query:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">SELECT</span> recent_authors.<span class="kw">id</span>, <span class="fu">COUNT</span>(posts.<span class="kw">id</span>)</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="kw">FROM</span> (</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>    <span class="kw">SELECT</span> <span class="kw">DISTINCT</span> users.<span class="op">*</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a>    <span class="kw">FROM</span> users</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a>    <span class="kw">INNER</span> <span class="kw">JOIN</span> posts</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a>    <span class="kw">ON</span> users.<span class="kw">id</span> <span class="op">=</span> posts.author_id</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a>    <span class="kw">WHERE</span> posts.created <span class="op">&gt;=</span> NOW() <span class="op">-</span> <span class="dt">INTERVAL</span> <span class="st">'1'</span> <span class="dt">DAY</span></span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a>) <span class="kw">AS</span> recent_authors</span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a><span class="kw">INNER</span> <span class="kw">JOIN</span> posts</span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> recent_authors.<span class="kw">id</span> <span class="op">=</span> posts.author_id</span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a><span class="kw">GROUP</span> <span class="kw">BY</span> recent_authors.<span class="kw">id</span>;</span></code></pre></div>
<p>This query is more complicated than either of the preceding queries on its own. It employs an <code>INNER JOIN</code> to retrieve <code>users</code> as well as the associated <code>posts</code> table data. It also uses a subquery to determine which authors have been recently active. Using this approach, we are certain that we will always run one query. Moreover, our code will have been simplified as well, as we were able to eliminate the <code>for</code> loop entirely and just invoke this query directly.</p>
<p>This example of replacing two separate queries with a single more complex query that employs a <code>JOIN</code> and a subquery illustrates our general point—let the database do the work.</p>
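<p>As a rough way to compare the two approaches side by side, the sketch below runs both against an in-memory SQLite database using Python’s standard <code>sqlite3</code> module. The sample rows and function names are made up for this illustration, and SQLite’s <code>datetime('now', '-1 day')</code> stands in for <code>NOW() - INTERVAL '1' DAY</code>, which SQLite does not support.</p>

```python
import sqlite3

RECENT = "datetime('now', '-1 day')"  # SQLite stand-in for NOW() - INTERVAL '1' DAY


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            author_id INTEGER REFERENCES users(id),
            created TEXT
        );
        INSERT INTO users (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO posts (author_id, created) VALUES
            (1, datetime('now', '-1 hour')),
            (1, datetime('now', '-10 days')),
            (2, datetime('now', '-10 days')),
            (3, datetime('now', '-2 hours')),
            (3, datetime('now', '-3 hours'));
    """)
    return conn


def counts_n_plus_one(conn):
    """One query for recent posts, then one count query per post."""
    recent = conn.execute(
        f"SELECT author_id FROM posts WHERE created >= {RECENT}"
    ).fetchall()
    queries_run = 1
    counts = {}
    for (author_id,) in recent:
        counts[author_id] = conn.execute(
            "SELECT COUNT(*) FROM posts WHERE author_id = ?", (author_id,)
        ).fetchone()[0]
        queries_run += 1
    return counts, queries_run


def counts_single_query(conn):
    """The same result from one query with a subquery and a join."""
    rows = conn.execute(f"""
        SELECT recent_authors.id, COUNT(posts.id)
        FROM (
            SELECT DISTINCT users.id
            FROM users
            INNER JOIN posts ON users.id = posts.author_id
            WHERE posts.created >= {RECENT}
        ) AS recent_authors
        INNER JOIN posts ON recent_authors.id = posts.author_id
        GROUP BY recent_authors.id
    """).fetchall()
    return dict(rows), 1
```

<p>With this sample data there are three recent posts, so the loop issues four queries where the combined query issues one, and both return the same counts per author.</p>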
<p>One other important point to make here is that the type of <code>JOIN</code> we are using here is very deliberate. An <code>INNER JOIN</code> will join together two tables and only keep rows that have matches on both the left and right side of the join. In our case, that means every result must include both a <code>users</code> table entry and a <code>posts</code> table entry. This is exactly what we want, since we’re only interested in displaying post counts for users that have recently made a post.</p>
<p>If, by contrast, we wanted to generate a report that included the name of each user as well as how many posts they’ve authored, including users that did not author any posts, we might write a query like this:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="kw">SELECT</span> users.<span class="kw">id</span>, <span class="fu">COUNT</span>(<span class="kw">DISTINCT</span> posts.<span class="kw">id</span>)</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a><span class="kw">FROM</span> users</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="kw">LEFT</span> <span class="kw">JOIN</span> posts</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> users.<span class="kw">id</span> <span class="op">=</span> posts.author_id</span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a><span class="kw">GROUP</span> <span class="kw">BY</span> users.<span class="kw">id</span>;</span></code></pre></div>
<p>Here, the <code>LEFT JOIN</code> (which can also be called and written as <code>LEFT OUTER JOIN</code>) will change the way the query runs to keep results from the <code>users</code> table even if there are no matching <code>posts</code>. This is a requirement given that we are interested in displaying a <code>0</code> count for users that have never authored anything.</p>
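<p>The difference between the two join types is easy to see on a small data set. The following sketch uses Python’s standard <code>sqlite3</code> module with made-up rows, and the <code>post_counts</code> helper is a hypothetical name for this illustration. User 3 has no posts.</p>

```python
import sqlite3


def post_counts(join_kind):
    """Count posts per user with either an INNER or a LEFT join."""
    assert join_kind in ("INNER", "LEFT")  # only these two keywords are interpolated
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER);
        INSERT INTO users (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO posts (author_id) VALUES (1), (1), (2);
    """)
    query = f"""
        SELECT users.id, COUNT(DISTINCT posts.id)
        FROM users
        {join_kind} JOIN posts ON users.id = posts.author_id
        GROUP BY users.id
        ORDER BY users.id
    """
    return conn.execute(query).fetchall()


print(post_counts("INNER"))  # user 3 is missing from the result
print(post_counts("LEFT"))   # user 3 appears with a count of 0
```

<p>The <code>LEFT JOIN</code> result includes the row <code>(3, 0)</code> because <code>COUNT</code> ignores the <code>NULL</code> post id that the join produces for a user without posts.</p>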
<p>Different types of joins have different behaviors and performance implications. <code>INNER</code> and <code>LEFT</code> joins are the two most common types you are likely to encounter in most applications, but there are also <code>FULL</code>, <code>RIGHT</code>, and <code>CROSS</code> joins that you may encounter in certain situations.</p>
<p>SQL databases aren’t slow, and in a properly indexed database with sensibly written queries, leaning into the database will result in better system performance than attempting to write simpler queries and then instead performing the same work in the application.</p>

<h2 id="grouping-and-aggregations">Grouping and Aggregations</h2>
<p>Another way to lean on our database is to use it for calculating aggregates. In simple terms, aggregation is the computation of a single value from many values (or many rows, in the case of SQL aggregations). Examples of aggregate functions supported by most databases include <code>SUM</code>, <code>AVG</code>, <code>MIN</code>, and <code>MAX</code>. Perhaps the most commonly used aggregation function is <code>COUNT</code>. Aggregations will almost always be used alongside a <code>GROUP BY</code> clause, which defines how to aggregate the rows for a given set of results.</p>
<p>We employed this manner of grouping above in order to count the number of posts created by each user who was active in our specified search interval. In the outer query, we indicated that we wanted to <code>GROUP BY recent_authors.id</code>, which causes all rows with a given user ID to “collapse” into a single row, and any aggregation functions (the <code>COUNT</code> function in our case) to treat all rows in each group as an input to the respective calculation. In our query, every <code>posts.id</code> for each <code>recent_authors.id</code> was counted, giving us the number of posts each user created.</p>
<p>Grouping can be somewhat confusing to people first learning SQL, since when the aggregation function is removed, we can’t typically “see” the results of the <code>GROUP BY</code> operation. Different database engines will handle this situation slightly differently, but generally speaking the effect of grouping isn’t visible until some aggregation function is applied.</p>
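<p>As a small illustration of several aggregate functions working over the same groups, the sketch below uses Python’s standard <code>sqlite3</code> module. The <code>word_count</code> column and the sample rows are hypothetical and exist only for this example.</p>

```python
import sqlite3


def author_stats():
    """Collapse each author's posts into one row of aggregates."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, word_count INTEGER);
        INSERT INTO posts (author_id, word_count) VALUES (1, 100), (1, 300), (2, 50);
    """)
    return conn.execute("""
        SELECT
            author_id,
            COUNT(*),
            SUM(word_count),
            AVG(word_count),
            MIN(word_count),
            MAX(word_count)
        FROM posts
        GROUP BY author_id
        ORDER BY author_id
    """).fetchall()


for row in author_stats():
    print(row)  # one row per author, regardless of how many posts they wrote
```

<p>Three input rows produce two output rows, one per <code>author_id</code>, with each aggregate computed over only the rows in that group.</p>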
<h2 id="window-functions">Window Functions</h2>
<p>Above, we mentioned how grouping causes rows to “collapse” into a single row, specified by whatever column(s) should be used as the group identifier. Window functions provide another way to apply aggregations to subsets of rows in a given set of results, but unlike with grouping, using window functions allows rows in a given grouping (or “partition”) to maintain their own identity and continue to appear in the result set.</p>
<h3 id="basic-window-function-example">Basic window function example</h3>
<p>This behavior of window functions is best illustrated through an example. Suppose we have a <code>post_metrics</code> table that contains metrics about the performance of each post in our blog. We can query the table and use a window function in order to see view count metrics about each individual post, as well as the average post performance for each author.</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">SELECT</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>    author,</span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a>    post_title,</span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a>    view_count,</span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a>    <span class="fu">AVG</span>(view_count) <span class="kw">OVER</span> (<span class="kw">PARTITION</span> <span class="kw">BY</span> author) <span class="kw">AS</span> average_view_count</span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a><span class="kw">FROM</span></span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a>    post_metrics;</span></code></pre></div>
<p>Which would give us a set of results like:</p>
<div class="table-wrapper">
<table>
<colgroup>
<col />
<col />
<col />
<col />
</colgroup>
<thead>
<tr>
<th>author</th>
<th>post_title</th>
<th>view_count</th>
<th>average_view_count</th>
</tr>
</thead>
<tbody>
<tr>
<td>John Smith</td>
<td>SQL basics, pt 1</td>
<td>1012</td>
<td>772</td>
</tr>
<tr>
<td>John Smith</td>
<td>SQL basics, pt 2</td>
<td>748</td>
<td>772</td>
</tr>
<tr>
<td>John Smith</td>
<td>SQL basics, pt 3</td>
<td>556</td>
<td>772</td>
</tr>
<tr>
<td>Michael Walters</td>
<td>Use Python for your next project</td>
<td>972</td>
<td>1660.5</td>
</tr>
<tr>
<td>Michael Walters</td>
<td>I was wrong, use Rust for your next project</td>
<td>2349</td>
<td>1660.5</td>
</tr>
</tbody>
</table>
</div>
<p>As we can see above, the benefit of using a window function over a <code>GROUP BY</code> statement is that we preserve the presence of each row in the result set. If we were to <code>GROUP BY author</code> (as opposed to applying the window over <code>PARTITION BY author</code>), we would still be able to see the <code>average_view_count</code>. However, depending on database engine, inferring information about the <code>post_title</code> and <code>view_count</code> columns would be either impossible due to error (such as in PostgreSQL, which would not allow us to include these non-aggregated and non-grouped columns in the query) or nonsensical (such as in MySQL, which would display an arbitrary single value from the group).</p>
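<p>The example above can be reproduced with Python’s standard <code>sqlite3</code> module, assuming the bundled SQLite is version 3.25 or newer (the first release with window function support). The <code>author_averages</code> function name is hypothetical, and an <code>ORDER BY</code> has been added only to make the row order predictable.</p>

```python
import sqlite3


def author_averages():
    """Return every post alongside its author's average view count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE post_metrics (author TEXT, post_title TEXT, view_count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO post_metrics VALUES (?, ?, ?)",
        [
            ("John Smith", "SQL basics, pt 1", 1012),
            ("John Smith", "SQL basics, pt 2", 748),
            ("John Smith", "SQL basics, pt 3", 556),
            ("Michael Walters", "Use Python for your next project", 972),
            ("Michael Walters", "I was wrong, use Rust for your next project", 2349),
        ],
    )
    return conn.execute("""
        SELECT
            author,
            post_title,
            view_count,
            AVG(view_count) OVER (PARTITION BY author) AS average_view_count
        FROM post_metrics
        ORDER BY author, view_count DESC
    """).fetchall()


for row in author_averages():
    print(row)  # five rows: each post keeps its own row
```

<p>All five posts remain in the result, and each row carries the average for its own author’s partition.</p>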
<h3 id="more-advanced-window-function-example">More advanced window function example</h3>
<p>Let’s look at another application of window functions that really shows their strength. Suppose that we’re writing banking software and want to create an account statement that includes account activity, such as the list of all debits and credits that most banks make visible in their web applications. Let’s assume our database already has a <code>charges</code> table that records each debit or credit, including the id of the account, the value of the charge, and the date, with the following data:</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>account_id</th>
<th>value</th>
<th>created_at</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>20</td>
<td>2022-07-12</td>
</tr>
<tr>
<td>2</td>
<td>95</td>
<td>2022-07-12</td>
</tr>
<tr>
<td>2</td>
<td>40</td>
<td>2022-07-13</td>
</tr>
<tr>
<td>1</td>
<td>35</td>
<td>2022-07-13</td>
</tr>
<tr>
<td>1</td>
<td>50</td>
<td>2022-07-14</td>
</tr>
<tr>
<td>1</td>
<td>-15</td>
<td>2022-07-15</td>
</tr>
<tr>
<td>2</td>
<td>-135</td>
<td>2022-07-15</td>
</tr>
</tbody>
</table>
</div>
<p>Generating a statement that includes activity requires indicating whether each charge is a debit or a credit. In our case we’ll treat positive values as a credit and negative values as a debit. The activity view also requires displaying the account balance (the running total) along with each charge. Fortunately, we can use a window function to achieve this.</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="kw">SELECT</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>    account_id,</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a>    <span class="fu">abs</span>(<span class="fu">value</span>) <span class="kw">AS</span> charge_amount,</span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a>    <span class="cf">CASE</span> <span class="cf">WHEN</span> <span class="fu">value</span> <span class="op">&gt;</span> <span class="dv">0</span> <span class="cf">THEN</span> <span class="st">'CREDIT'</span> <span class="cf">ELSE</span> <span class="st">'DEBIT'</span> <span class="cf">END</span> <span class="kw">AS</span> charge_kind,</span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a>    <span class="fu">SUM</span>(<span class="fu">value</span>) <span class="kw">OVER</span> (</span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a>        <span class="kw">PARTITION</span> <span class="kw">BY</span> account_id</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a>        <span class="kw">ORDER</span> <span class="kw">BY</span> created_at</span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a>        <span class="kw">ROWS</span> <span class="kw">BETWEEN</span> <span class="kw">UNBOUNDED</span> <span class="kw">PRECEDING</span> <span class="kw">AND</span> <span class="kw">CURRENT</span> <span class="kw">ROW</span></span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a>        ) <span class="kw">AS</span> balance,</span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a>    created_at <span class="kw">AS</span> transaction_date</span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a><span class="kw">FROM</span> charges;</span></code></pre></div>
<p>This produces:</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>account_id</th>
<th>charge_amount</th>
<th>charge_kind</th>
<th>balance</th>
<th>transaction_date</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>20</td>
<td>CREDIT</td>
<td>20</td>
<td>2022-07-12</td>
</tr>
<tr>
<td>1</td>
<td>35</td>
<td>CREDIT</td>
<td>55</td>
<td>2022-07-13</td>
</tr>
<tr>
<td>1</td>
<td>50</td>
<td>CREDIT</td>
<td>105</td>
<td>2022-07-14</td>
</tr>
<tr>
<td>1</td>
<td>15</td>
<td>DEBIT</td>
<td>90</td>
<td>2022-07-15</td>
</tr>
<tr>
<td>2</td>
<td>95</td>
<td>CREDIT</td>
<td>95</td>
<td>2022-07-12</td>
</tr>
<tr>
<td>2</td>
<td>40</td>
<td>CREDIT</td>
<td>135</td>
<td>2022-07-13</td>
</tr>
<tr>
<td>2</td>
<td>135</td>
<td>DEBIT</td>
<td>0</td>
<td>2022-07-15</td>
</tr>
</tbody>
</table>
</div>
<p>Like our first example, this query partitions the data, in this case by the <code>account_id</code>. However, the window function here also applies an ordering and indicates which rows should be used in the calculation. The <code>ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code> clause tells the database to use only rows up to and including the current row in order to calculate the sum of the charges. Without the <code>ORDER BY</code> and this frame clause, we would not see a running total in each row, but the final balance amount repeated for every row in each partition, similar to how we saw the <code>average_view_count</code> repeated in the previous example.</p>
<p>In practice, we may want to limit the data retrieval to one account at a time. In order to do that, we’d simply add a condition like <code>WHERE charges.account_id = 1</code> to the end of the query. The window function would still generate the <code>balance</code> column for just the one account.</p>
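<p>Here is a sketch of the single-account variant using Python’s standard <code>sqlite3</code> module, again assuming SQLite 3.25 or newer for window function support. The function names are hypothetical. The second function drops the ordering and frame from the window to show the contrast between a running total and a per-partition total.</p>

```python
import sqlite3

CHARGES = [
    (1, 20, "2022-07-12"), (2, 95, "2022-07-12"), (2, 40, "2022-07-13"),
    (1, 35, "2022-07-13"), (1, 50, "2022-07-14"), (1, -15, "2022-07-15"),
    (2, -135, "2022-07-15"),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE charges (account_id INTEGER, value INTEGER, created_at TEXT)")
    conn.executemany("INSERT INTO charges VALUES (?, ?, ?)", CHARGES)
    return conn


def account_activity(conn, account_id):
    """Charges for one account with a running balance on every row."""
    return conn.execute("""
        SELECT
            abs(value) AS charge_amount,
            CASE WHEN value > 0 THEN 'CREDIT' ELSE 'DEBIT' END AS charge_kind,
            SUM(value) OVER (
                PARTITION BY account_id
                ORDER BY created_at
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
            ) AS balance
        FROM charges
        WHERE charges.account_id = ?
        ORDER BY created_at
    """, (account_id,)).fetchall()


def account_totals(conn, account_id):
    """Same window without ordering or frame: the final balance on every row."""
    rows = conn.execute("""
        SELECT SUM(value) OVER (PARTITION BY account_id) AS balance
        FROM charges
        WHERE charges.account_id = ?
    """, (account_id,)).fetchall()
    return [balance for (balance,) in rows]
```

<p>For account 1, <code>account_activity</code> returns balances that climb and fall with each charge, while <code>account_totals</code> returns the same final figure four times.</p>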
<p>When building our application, we can utilize window functions to perform these types of operations directly in the query, precluding the need to make an extra pass over the result set in our application code. Without the use of a window function, if we wanted to preserve individual post metrics in our result set, we would either need to write a second query to calculate the average view count or make a pass through the result set in our application code in order to compute the average for each author.</p>
<h2 id="set-operations-union-intersect-and-except">Set Operations: Union, Intersect, and Except</h2>
<p>Another way of leaning on the database is to use <code>UNION</code>, which combines the results of two otherwise distinct queries that return the same column set into a single query. <code>UNION</code> can be used to reduce the number of distinct but similar calls to the database, reducing the round trip time that we highlighted earlier. <a href="https://www.foxhound.systems/blog/sql-performance-with-union/"><code>UNION</code> can also be used as an optimization technique</a>, enabling us to rewrite queries in a way that improves their performance while making them easier to understand. In the linked post, we walk through an in-depth example of a large query joining many tables with complex join conditions that was split into two much simpler queries and the result was combined using <code>UNION</code>.</p>
<p>The general structure of a <code>UNION</code> query is:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="kw">SELECT</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>    col_1, col_2, col_3</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a><span class="kw">FROM</span></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a>    some_table</span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a><span class="kw">INNER</span> <span class="kw">JOIN</span></span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a>    some_other_table</span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a><span class="op">..</span>.</span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a><span class="kw">UNION</span></span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a><span class="kw">SELECT</span></span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a>    col_1, col_2, col_3</span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a><span class="kw">FROM</span></span>
<span id="cb8-12"><a href="#cb8-12" aria-hidden="true" tabindex="-1"></a>    some_table</span>
<span id="cb8-13"><a href="#cb8-13" aria-hidden="true" tabindex="-1"></a><span class="kw">LEFT</span> <span class="kw">JOIN</span></span>
<span id="cb8-14"><a href="#cb8-14" aria-hidden="true" tabindex="-1"></a>    another_table</span>
<span id="cb8-15"><a href="#cb8-15" aria-hidden="true" tabindex="-1"></a><span class="op">..</span>.</span></code></pre></div>
<p>The result of a query shaped like the one above would be a single set of rows with the <code>col_1</code>, <code>col_2</code>, and <code>col_3</code> columns. It does not matter which tables either side of the <code>UNION</code> performs a <code>SELECT</code> on, so long as each side returns the same number of columns, with compatible data types, in the same order. The column names in the result are taken from the first query, so using the same names on both sides is a readability convention rather than a requirement.</p>
<p>The <code>UNION</code> operation has the benefit of automatically deduplicating rows. If deduplication is unnecessary or undesirable, the <code>UNION ALL</code> operation should be used instead. For larger data sets, the performance of <code>UNION ALL</code> may be better than <code>UNION</code> as the deduplication can add to the query execution time.</p>
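<p>As a minimal sketch of the difference, the following self-contained queries use inline <code>VALUES</code> lists instead of real tables (PostgreSQL syntax; the aliases <code>a</code>, <code>b</code>, and <code>x</code> are made up for illustration):</p>
<pre class="sourceCode sql"><code class="sourceCode sql">-- UNION removes the duplicate 2: returns 1, 2, 3
SELECT x FROM (VALUES (1), (2)) AS a(x)
UNION
SELECT x FROM (VALUES (2), (3)) AS b(x)
ORDER BY x;

-- UNION ALL keeps every row: returns 1, 2, 2, 3
SELECT x FROM (VALUES (1), (2)) AS a(x)
UNION ALL
SELECT x FROM (VALUES (2), (3)) AS b(x)
ORDER BY x;</code></pre>
<p>The trailing <code>ORDER BY</code> applies to the combined result, not just to the second <code>SELECT</code>.</p>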
<p>In addition to <code>UNION</code>, there are also <code>INTERSECT</code> and <code>EXCEPT</code> operations that can be performed. <code>INTERSECT</code> will only include rows that appear in the result of both queries, whereas <code>EXCEPT</code> will include all rows from the left query other than ones that also appear in the right query. Both <code>INTERSECT</code> and <code>EXCEPT</code> will deduplicate rows unless the <code>ALL</code> keyword is used as a suffix. These operations are useful for filtering data when a single <code>SELECT</code> query cannot be used to define the conditions while achieving a well performing query, or when readability of the single-query form suffers.</p>
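<p>The same kind of illustration works for <code>INTERSECT</code> and <code>EXCEPT</code>. Again, this is only a sketch using inline <code>VALUES</code> lists with made-up aliases, written for PostgreSQL:</p>
<pre class="sourceCode sql"><code class="sourceCode sql">-- INTERSECT keeps rows present on both sides: returns 2
SELECT x FROM (VALUES (1), (2), (2)) AS a(x)
INTERSECT
SELECT x FROM (VALUES (2), (3)) AS b(x);

-- EXCEPT keeps left-side rows that are absent on the right: returns 1
SELECT x FROM (VALUES (1), (2), (2)) AS a(x)
EXCEPT
SELECT x FROM (VALUES (2), (3)) AS b(x);

-- EXCEPT ALL subtracts one occurrence per matching row: returns 1, 2
SELECT x FROM (VALUES (1), (2), (2)) AS a(x)
EXCEPT ALL
SELECT x FROM (VALUES (2), (3)) AS b(x)
ORDER BY x;</code></pre>
<p>Support for the <code>ALL</code> variants of <code>INTERSECT</code> and <code>EXCEPT</code> varies between database engines, so it is worth checking your database’s documentation before relying on them.</p>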
<p>All three operations can be used both to reduce the number of distinct queries that need to be sent to the database and to reduce the workload of the application servers. For example, there’s no need to merge and deduplicate the results of two separate queries in your code when the SQL database has already done the work by the time the query results are returned.</p>
<h2 id="wrap-up">Wrap Up</h2>
<p>In this post we discussed some of the primary operations that SQL databases offer for writing queries that not only retrieve data but also perform complex computations. Effectively utilizing features like subqueries, aggregations, and set operations allows us to reduce the amount of chatter with the database, cut down on the amount of data sent over the network, and altogether eliminate some of the computations that our database and application servers would otherwise need to perform. Applying these techniques effectively is critical to a highly performant application.</p>
<h2 id="further-reading-about-offloading-work-to-the-database">Further reading about offloading work to the database</h2>
<p>Below, we link to PostgreSQL documentation for most of these features, but the functionality is largely the same in MySQL/MariaDB, SQL Server, SQLite, and other SQL databases. If you’re just trying to generally familiarize yourself with these features, reading any database’s documentation should suffice.</p>
<ul>
<li><a href="https://www.postgresql.org/docs/current/tutorial-join.html" target="_blank" rel="noopener">PostgreSQL joins tutorial</a></li>
<li><a href="https://www.postgresql.org/docs/current/tutorial-agg.html" target="_blank" rel="noopener">PostgreSQL aggregate function tutorial</a></li>
<li><a href="https://www.postgresql.org/docs/current/tutorial-window.html" target="_blank" rel="noopener">PostgreSQL window function tutorial</a></li>
<li><a href="https://www.postgresql.org/docs/14/queries-union.html" target="_blank" rel="noopener">PostgreSQL union, intersect, and except tutorial</a></li>
</ul>
<hr />
<p>This post is part of a series titled <em>Essential elements of high performance applications</em>. The full list of published posts is available below.</p>
<ul>
<li><a href="https://www.foxhound.systems/blog/essential-elements-of-high-performance-sql-indexes/">SQL indexes</a></li>
<li><strong>Offloading work to the SQL database</strong> (this post)</li>
<li><a href="https://www.foxhound.systems/blog/essential-elements-of-high-performance-server-side-caching/">Server side caching</a></li>
</ul>
<hr />
<p><em>Christian Charukiewicz and Ben Levy are Partners at Foxhound Systems, where we focus on building fast and reliable custom software. Are you looking for help with something you’re working on? Reach out to us at <a href="mailto:info@foxhound.systems" target="_blank" rel="noopener">info@foxhound.systems</a></em>.</p>
        </div>
    </div>
</article>
]]></description>
    <pubDate>Mon, 25 Jul 2022 00:00:00 UT</pubDate>
    <guid>https://www.foxhound.systems/blog/essential-elements-of-high-performance-offloading-work-sql-database/index.html</guid>
    <dc:creator>Foxhound Systems</dc:creator>
</item>
<item>
    <title>Essential elements of high performance applications: SQL indexes</title>
    <link>https://www.foxhound.systems/blog/essential-elements-of-high-performance-sql-indexes/index.html</link>
    <description><![CDATA[<nav class="container mt-2 mb-2">
    <a href="https://www.foxhound.systems/blog/" class="no-underline magic-underline-slim">Back to all posts</a>
</nav>
<article class="grow post">
    
        <div class="lg-container post-banner">
            <picture>
                
                    <source srcset="https://www.foxhound.systems/img/2022-05-24-essential-elements-of-high-performance-sql-indexes/sql-indexes-banner.webp" type="image/webp" height="853" width="1280">
                
                <img src="https://www.foxhound.systems/img/2022-05-24-essential-elements-of-high-performance-sql-indexes/sql-indexes-banner.jpg" alt="A photograph of a snail attempting to cross the gap between two large rocks. The snail is climbing out of the gap and onto one of the rocks, apparently traveling from the opposite one." height="853" width="1280">
            </picture>
        </div>
    
    <div class="container">
        
        <h2 class="mt-2 mb-none color-muted">Essential elements of high performance applications</h2>
        
        <h1 class="title">SQL indexes</h1>
        <header class="info gy-1">
            <div class="author-date space-y-1">
                <div>
                    <span class="date">May 24, 2022</span>
                </div>
                <div class="flex flex-wrap gx-2 gy-1">
                    
                        
                        <div class="flex gap-2 items-center">
                            <img class="rounded-full w-12" src="https://www.foxhound.systems/assets/img/team/christian-sm.jpg" alt="Photo of Christian Charukiewicz">
                            <div class="ml-1 flex flex-col">
                                <span class="font-bold">Christian Charukiewicz</span>
                                <span>Partner &amp; Principal Software Engineer</span>
                            </div>
                        </div>
                        
                        
                    
                </div>
            </div>
            
            <div>
                <div class="tags"><a title="Posts tagged: performance-optimization" href="https://www.foxhound.systems/blog/tag/performance-optimization/">performance-optimization</a> <a title="Posts tagged: sql" href="https://www.foxhound.systems/blog/tag/sql/">sql</a> <a title="Posts tagged: essential-elements-of-high-performance" href="https://www.foxhound.systems/blog/tag/essential-elements-of-high-performance/">essential-elements-of-high-performance</a></div>
            </div>
            
        </header>
        <hr>
        <div class="content">
        <p>There are many aspects that go into making a fast application. Web application performance is a broad topic because there are numerous concerns in making a page load quickly or a button click feel responsive. One of the difficulties developers must grapple with in pursuit of performance is that any one of these facets can become a bottleneck if overlooked.</p>
<p>Building fast web applications requires a comprehensive understanding and examination of the entire system. In this post, we kick off a series that covers essential elements that go into building high performance web applications. Throughout this series, we are going to discuss the performance of web applications written in a high level language (such as PHP or Python), backed by a SQL database, and where the front end interacts with the back end through HTTP requests that download HTML, JSON, or a combination of both, since <a href="https://en.wikipedia.org/wiki/Multitier_architecture#Three-tier_architecture" target="_blank" rel="noopener">this structure</a> is the most common one found today.</p>
<!--more-->
<h2 id="sql-indexes">SQL indexes</h2>

<p>One of the biggest boons to performance in a web application is effective use of indexes. An index improves the performance of queries by using a lookup table to answer a query instead of scanning all of the data in the underlying table. The role of an index in SQL is similar to the role of the index in the back of a book—instead of scanning the entire book for a particular term, we can look for the term in the index and then jump to the page in the book that the index specifies for that term.</p>
<p>Indexes improve the performance of queries that use conditions to retrieve or modify data. These are frequently <code>SELECT</code> queries but often include <code>DELETE</code> and <code>UPDATE</code> queries as well; more specifically, they are queries containing clauses such as <code>WHERE</code>, <code>HAVING</code>, and <code>JOIN</code>, among others. The trade-off is that indexes reduce the performance of <code>INSERT</code>, <code>UPDATE</code>, and <code>DELETE</code> queries, since the database engine must update the relevant indexes whenever the underlying data changes.</p>
<p>Indexes are configured for a table by specifying which columns are a part of the index. Suppose we create the following table:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">CREATE</span> <span class="kw">TABLE</span> users (</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>    <span class="kw">id</span> SERIAL <span class="kw">PRIMARY</span> <span class="kw">KEY</span>,</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>    password_hash <span class="dt">VARCHAR</span>,</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>    email_address <span class="dt">VARCHAR</span>,</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a>    created_at <span class="dt">TIMESTAMP</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a>);</span></code></pre></div>
<p>It’s worth noting that most SQL databases will automatically create an index on primary key and unique columns, so in the example above our <code>id</code> column will have an index. However, if we find ourselves searching for recently created users by filtering on the <code>created_at</code> column, we’ll want to create an index on that column as well:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">CREATE</span> <span class="kw">INDEX</span> users_created_at_index <span class="kw">ON</span> users (created_at);</span></code></pre></div>
<p>In PostgreSQL, we can use the <code>\d</code> command to see information about a table. Here’s the output after both creating the table and the index:</p>
<pre><code>postgres=# \d users
                                          Table &quot;public.users&quot;
    Column     |            Type             | Collation | Nullable |              Default
---------------+-----------------------------+-----------+----------+-----------------------------------
 id            | integer                     |           | not null | nextval('users_id_seq'::regclass)
 password_hash | character varying           |           |          |
 email_address | character varying           |           |          |
 created_at    | timestamp without time zone |           |          |
Indexes:
    &quot;users_pkey&quot; PRIMARY KEY, btree (id)
    &quot;users_created_at_index&quot; btree (created_at)</code></pre>
<p>We can see that the <code>users_created_at_index</code> exists on the <code>created_at</code> column. The output gives us information about the type of data structure this index uses (a <a href="https://en.wikipedia.org/wiki/B-tree" target="_blank" rel="noopener">B-tree</a>) as well as whether the index is part of a primary key, which we can see is true for the index on the <code>id</code> column. The default data structure your database engine uses to create indexes is adequate for most use cases, and it is rare that you will have to specify something different.</p>
<p>With the above index, queries that use the <code>created_at</code> column will now be able to take advantage of the index. Here’s an example query that finds users created in the last week:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">SELECT</span> <span class="op">*</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="kw">FROM</span> users</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="kw">WHERE</span> created_at <span class="op">&gt;=</span> NOW() <span class="op">-</span> <span class="dt">INTERVAL</span> <span class="st">'7'</span> <span class="dt">DAY</span>;</span></code></pre></div>
<p>With or without the index, the query will return the same results, but in a table with a lot of data, the presence of the index on the <code>created_at</code> column may allow the query to complete hundreds or even thousands of times faster than without.</p>
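<p>One way to check whether a query actually uses an index is to ask the database for its query plan. In PostgreSQL this is done by prefixing the query with <code>EXPLAIN</code>. The sketch below assumes the <code>users</code> table and the <code>users_created_at_index</code> index created above:</p>
<pre class="sourceCode sql"><code class="sourceCode sql">-- Show the plan PostgreSQL would use, without running the query
EXPLAIN
SELECT *
FROM users
WHERE created_at &gt;= NOW() - INTERVAL '7' DAY;</code></pre>
<p>If the plan mentions <code>users_created_at_index</code>, the index is being used. On a small table the planner may still choose a sequential scan because reading the whole table is cheaper, so this is best tested against a realistic volume of data.</p>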
<h3 id="multicolumn-indexes">Multicolumn indexes</h3>
<p>In some instances, it can be beneficial to include multiple columns in an index. When your use case involves frequently running queries that specify conditions on multiple columns, creating multicolumn indexes that include several or all of the columns in the condition can significantly improve performance.</p>
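<p>For example, suppose our <code>users</code> table has <code>first</code>, <code>last</code>, and <code>email</code> columns, and we often run this query (with <code>?</code> being a stand-in for arbitrary string values):</p>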
<div class="sourceCode" id="cb5"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="kw">SELECT</span> <span class="fu">first</span>, <span class="fu">last</span>, email</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a><span class="kw">FROM</span> users</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="kw">WHERE</span> <span class="fu">first</span> <span class="op">=</span> <span class="st">'?'</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a><span class="kw">AND</span> <span class="fu">last</span> <span class="op">=</span> <span class="st">'?'</span>;</span></code></pre></div>
<p>Since both the <code>first</code> and <code>last</code> columns are used in our <code>WHERE</code> conditions, we can create an index on both columns:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">CREATE</span> <span class="kw">INDEX</span> users_first_last_index <span class="kw">ON</span> users (<span class="fu">first</span>, <span class="fu">last</span>);</span></code></pre></div>
<p>Multicolumn indexes have very specific performance characteristics. Generally speaking, utilization of a multicolumn index is most efficient when there are constraints on the leftmost column in the index (<code>first</code> in this case). It’s worth reading your database engine’s documentation on multicolumn indexes to understand how to best utilize them for your use case.</p>
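<p>To make the leftmost-column rule more concrete, here is a sketch of three queries against the <code>users_first_last_index</code> index from above. The literal values are made up, and the exact behavior depends on the database engine and its planner:</p>
<pre class="sourceCode sql"><code class="sourceCode sql">-- Both indexed columns are constrained: the best case for the index.
SELECT first, last, email
FROM users
WHERE first = 'Ada' AND last = 'Lovelace';

-- Only the leftmost column is constrained: the index can still be used.
SELECT first, last, email
FROM users
WHERE first = 'Ada';

-- Only the second column is constrained: the index is generally of
-- little help, and the database may fall back to scanning the table.
SELECT first, last, email
FROM users
WHERE last = 'Lovelace';</code></pre>
<p>If queries like the third one are common, a separate index on <code>last</code> may be a better fit than relying on the multicolumn index.</p>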
<h3 id="covering-indexes">Covering indexes</h3>
<p>Building on top of multicolumn indexes, we’ll introduce one last feature: the covering index. When a multicolumn index contains every column that is to be retrieved by the query, most database engines can avoid reading from the table altogether and will return the values directly from the index.</p>
<p>Here’s an example of a query that is suitable for this type of optimization:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="kw">SELECT</span> email, created_at</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="kw">FROM</span> users</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="kw">WHERE</span> email <span class="op">=</span> <span class="st">'?'</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a><span class="kw">AND</span> created_at <span class="op">&gt;=</span> <span class="st">'?'</span>;</span></code></pre></div>
<p>We can create the following index that will end up serving as a covering index for this query:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="kw">CREATE</span> <span class="kw">INDEX</span> users_email_created_at_index <span class="kw">ON</span> users (email, created_at);</span></code></pre></div>
<p>Since we’re returning only the <code>email</code> and <code>created_at</code> columns from the query, and the index contains both of these columns, the database will be able to perform an index-only scan and skip reading from the table in order to return our desired results. This can lead to a significant improvement in performance for this query.</p>
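<p>Whether the database really skips the table can be checked from the query plan. The following sketch uses PostgreSQL’s <code>EXPLAIN</code> with made-up literal values in place of the <code>?</code> placeholders:</p>
<pre class="sourceCode sql"><code class="sourceCode sql">-- Look for an "Index Only Scan" node on users_email_created_at_index
EXPLAIN
SELECT email, created_at
FROM users
WHERE email = 'ada@example.com'
AND created_at &gt;= '2022-01-01';</code></pre>
<p>In PostgreSQL specifically, an index-only scan also depends on the table’s visibility information being reasonably up to date, so a recently modified table may still require some reads from the table until it has been vacuumed.</p>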
<p>However, keep in mind that every index comes with a cost: while the performance improves for querying the data, the performance of writing data goes down, as the index must be updated alongside the underlying table data. Deciding what indexes to create often requires analysis of the performance of your database running under a real workload. Preemptively creating numerous multicolumn indexes in an attempt to improve the performance of various queries that your application may perform is not recommended.</p>
<h3 id="further-reading-about-indexes">Further reading about indexes</h3>
<p>Utilizing indexes effectively is an important part of building a fast application and what we’ve covered here only scratches the surface. You can read the index documentation for the SQL database of your choice to learn more.</p>
<ul>
<li><a href="https://use-the-index-luke.com/" target="_blank" rel="noopener">Use the index, Luke!</a></li>
<li><a href="https://www.postgresql.org/docs/current/indexes.html" target="_blank" rel="noopener">PostgreSQL index documentation</a></li>
<li><a href="https://dev.mysql.com/doc/refman/8.0/en/mysql-indexes.html" target="_blank" rel="noopener">MySQL index documentation</a></li>
<li><a href="https://sqlite.org/lang_createindex.html" target="_blank" rel="noopener">SQLite index documentation</a></li>
</ul>
<hr />
<p>This post is part of a series titled <em>Essential elements of high performance applications</em>. The full list of published posts is available below.</p>
<ul>
<li><strong>SQL indexes</strong> (this post)</li>
<li><a href="https://www.foxhound.systems/blog/essential-elements-of-high-performance-offloading-work-sql-database/">Offloading work to the SQL database</a></li>
<li><a href="https://www.foxhound.systems/blog/essential-elements-of-high-performance-server-side-caching/">Server side caching</a></li>
</ul>
<hr />
<p><em>Christian Charukiewicz is a Partner at Foxhound Systems, where we focus on building fast, reliable, and intuitive custom software. Have an idea for a new application? We’ll deliver the best version of it. <a href="https://www.foxhound.systems/contact/">Start a project with us</a>.</em></p>
        </div>
    </div>
</article>
]]></description>
    <pubDate>Tue, 24 May 2022 00:00:00 UT</pubDate>
    <guid>https://www.foxhound.systems/blog/essential-elements-of-high-performance-sql-indexes/index.html</guid>
    <dc:creator>Foxhound Systems</dc:creator>
</item>
<item>
    <title>Composable Data Validation with Haskell</title>
    <link>https://www.foxhound.systems/blog/composable-data-validation/index.html</link>
    <description><![CDATA[<nav class="container mt-2 mb-2">
    <a href="https://www.foxhound.systems/blog/" class="no-underline magic-underline-slim">Back to all posts</a>
</nav>
<article class="grow post">
    
        <div class="lg-container post-banner">
            <picture>
                
                    <source srcset="https://www.foxhound.systems/img/2021-07-26-composable-data-validation/composable-validation-banner.webp" type="image/webp" height="853" width="1280">
                
                <img src="https://www.foxhound.systems/img/2021-07-26-composable-data-validation/composable-validation-banner.jpg" alt="A photograph of an old European house, with different portions of the facade built from distinct materials, such as stone, red brick, and white brick." height="853" width="1280">
            </picture>
        </div>
    
    <div class="container">
        
        <h1 class="title">Composable Data Validation with Haskell</h1>
        <header class="info gy-1">
            <div class="author-date space-y-1">
                <div>
                    <span class="date">July 26, 2021</span>
                </div>
                <div class="flex flex-wrap gx-2 gy-1">
                    
                        
                        
                        <div class="flex gap-2 items-center">
                            <img class="rounded-full w-12" src="https://www.foxhound.systems/assets/img/team/ben-sm.jpg" alt="Photo of Ben Levy">
                            <div class="ml-1 flex flex-col">
                                <span class="font-bold">Ben Levy</span>
                                <span>Partner &amp; Principal Software Engineer</span>
                            </div>
                        </div>
                        
                    
                        
                        <div class="flex gap-2 items-center">
                            <img class="rounded-full w-12" src="https://www.foxhound.systems/assets/img/team/christian-sm.jpg" alt="Photo of Christian Charukiewicz">
                            <div class="ml-1 flex flex-col">
                                <span class="font-bold">Christian Charukiewicz</span>
                                <span>Partner &amp; Principal Software Engineer</span>
                            </div>
                        </div>
                        
                        
                    
                </div>
            </div>
            
            <div>
                <div class="tags"><a title="Posts tagged: haskell" href="https://www.foxhound.systems/blog/tag/haskell/">haskell</a></div>
            </div>
            
        </header>
        <hr>
        <div class="content">
        <p>Recently, a client asked us to work on a new <a href="https://en.wikipedia.org/wiki/Business_rules_engine" target="_blank" rel="noopener">Rules Engine</a> for them. This system serves as the back end to a highly configurable dashboard where non-technical users define business rules for how a certain component of the system behaves. When deployed, this system has to handle a substantial production workload, so performance is also a key consideration.</p>
<!--more-->
<p>So, our primary requirements were that this rules engine would be:</p>
<ol type="1">
<li>Configurable - The end user should be given toggles to get the desired behavior from the fixed kinds of rules.</li>
<li>Fast - This system should be able to handle several hundred validation requests per second.</li>
<li>Robust - The system should not go down and should be easy to modify on the fly.</li>
</ol>
<p>A secondary requirement related to robustness would be that the rules be:</p>
<ol start="4" type="1">
<li>Declarative - A product manager should be able to look at the rule code and have a reasonably accurate understanding as to what the rule is doing.</li>
</ol>
<h2 id="creating-an-edsl">Creating an eDSL</h2>
<p>In order to meet the above requirements, we decided to write a small <a href="https://en.wikipedia.org/wiki/Domain-specific_language#External_and_Embedded_Domain_Specific_Languages" target="_blank" rel="noopener">embedded domain-specific language (eDSL)</a> to enable writing declarative validation rules. This article will show a simplified version of the actual language being used in production. We will be using a shallow embedding, also known as a monomorphic <a href="https://www.foxhound.systems/blog/final-tagless/">final tagless encoding</a>.</p>
<p>The first question that we must ask ourselves is not what operations we want to perform but what the essence of our problem or domain is. This approach is often called <a href="http://conal.net/blog/posts/denotational-design-with-type-class-morphisms" target="_blank" rel="noopener">denotational design</a>.</p>
<p>In our case, this is relatively simple—a <em>validation</em> is a mapping from an input value to whether it is valid or not. Putting this into code:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">ValidationRule</span> a <span class="ot">=</span> <span class="dt">ValidationRule</span> {<span class="ot"> validate ::</span> a <span class="ot">-&gt;</span> <span class="dt">Bool</span> }</span></code></pre></div>
<p>Given this as a basis, we can start to write functions that help us build these validation rules. For our basic rules, we will use the convention of a trailing “_” to avoid name collisions. A simple example is the equality rule:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="ot">eq_ ::</span> <span class="dt">Eq</span> a <span class="ot">=&gt;</span> a <span class="ot">-&gt;</span> <span class="dt">ValidationRule</span> a</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>eq_ ruleValue <span class="ot">=</span> <span class="dt">ValidationRule</span> <span class="op">$</span> \actual <span class="ot">-&gt;</span> actual <span class="op">==</span> ruleValue</span></code></pre></div>
<p>In the above, <code>ruleValue</code> is a value that is embedded in the <code>ValidationRule</code> whereas <code>actual</code> is the value that this rule will attempt to validate during a validation operation. An example of how this rule can be built and configured is:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="ot">equalsFive ::</span> <span class="dt">ValidationRule</span> <span class="dt">Int</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>equalsFive <span class="ot">=</span> eq_ <span class="dv">5</span></span></code></pre></div>
<p>Already we see that rules are composed of both static and dynamic parts. The shape of the rule is statically determined by the function <code>eq_</code> but the value (which is <code>5</code> in our example) can be any runtime value. <code>eq_</code> isn’t the only rule we may want. Let’s make a few more basic rules:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>lt_,<span class="ot"> gt_ ::</span> <span class="dt">Ord</span> a <span class="ot">=&gt;</span> a <span class="ot">-&gt;</span> <span class="dt">ValidationRule</span> a</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>gt_ ruleValue <span class="ot">=</span> <span class="dt">ValidationRule</span> <span class="op">$</span> \actual <span class="ot">-&gt;</span> actual <span class="op">&gt;</span> ruleValue</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>lt_ ruleValue <span class="ot">=</span> <span class="dt">ValidationRule</span> <span class="op">$</span> \actual <span class="ot">-&gt;</span> actual <span class="op">&lt;</span> ruleValue</span></code></pre></div>
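<p>To see these primitives in action, here is a minimal, self-contained sketch. It assumes the <code>Bool</code>-returning <code>ValidationRule</code> newtype from earlier in the post (repeated so the snippet compiles on its own); the name <code>examples</code> is hypothetical and exists only for illustration.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">-- Assumed to mirror the Bool-returning newtype introduced earlier in the post.
newtype ValidationRule a = ValidationRule { validate :: a -&gt; Bool }

eq_ :: Eq a =&gt; a -&gt; ValidationRule a
eq_ ruleValue = ValidationRule $ \actual -&gt; actual == ruleValue

lt_, gt_ :: Ord a =&gt; a -&gt; ValidationRule a
gt_ ruleValue = ValidationRule $ \actual -&gt; actual &gt; ruleValue
lt_ ruleValue = ValidationRule $ \actual -&gt; actual &lt; ruleValue

-- Hypothetical examples: each rule embeds a value when it is built and is
-- later run against an actual value with `validate`.
examples :: [Bool]
examples =
    [ validate (eq_ (5 :: Int)) 5  -- is 5 equal to 5?
    , validate (gt_ (5 :: Int)) 3  -- is 3 greater than 5?
    , validate (lt_ (5 :: Int)) 3  -- is 3 less than 5?
    ]</code></pre></div>
<p>Note the argument order: the embedded <code>ruleValue</code> sits on the right-hand side of the comparison, so <code>gt_ 5</code> reads as “is greater than 5”.</p>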
<h2 id="making-rules-composable">Making rules composable</h2>
<p>With this, our eDSL gives us a way to compare a value to something, but this alone is not very useful. We could write a bespoke rule each time a new business use case comes up, but what we really need is the ability to build larger validations out of smaller ones. Boolean logic has two main operations for this, conjunction and disjunction, also known as <code>AND</code> and <code>OR</code>. Let’s write them:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>and_,<span class="ot"> or_ ::</span> <span class="dt">ValidationRule</span> a <span class="ot">-&gt;</span> <span class="dt">ValidationRule</span> a <span class="ot">-&gt;</span> <span class="dt">ValidationRule</span> a</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>and_ rule1 rule2 <span class="ot">=</span> <span class="dt">ValidationRule</span> <span class="op">$</span> \actual <span class="ot">-&gt;</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>    validate rule1 actual <span class="op">&amp;&amp;</span> validate rule2 actual</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>or_ rule1 rule2 <span class="ot">=</span> <span class="dt">ValidationRule</span> <span class="op">$</span> \actual <span class="ot">-&gt;</span></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a>    validate rule1 actual <span class="op">||</span> validate rule2 actual</span></code></pre></div>
<p>It’s easy for us to add support for the inverse, or <code>NOT</code>:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="ot">not_ ::</span> <span class="dt">ValidationRule</span> a <span class="ot">-&gt;</span> <span class="dt">ValidationRule</span> a</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>not_ rule <span class="ot">=</span> <span class="dt">ValidationRule</span> <span class="op">$</span> \actual <span class="ot">-&gt;</span> <span class="fu">not</span> <span class="op">$</span> validate rule actual</span></code></pre></div>
<p>We can compose the rules we have written so far to create new combinators. Let’s create the <em>greater than or equal to</em> operator (<code>&gt;=</code> in most languages):</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="ot">geq_ ::</span> <span class="dt">Ord</span> a <span class="ot">=&gt;</span> a <span class="ot">-&gt;</span> <span class="dt">ValidationRule</span> a</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>geq_ value <span class="ot">=</span> gt_ value <span class="ot">`or_`</span> eq_ value</span></code></pre></div>
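<p>The same approach scales to slightly larger rules. As a sketch, the snippet below repeats the definitions so far (again assuming the <code>Bool</code>-returning <code>ValidationRule</code>) and adds a hypothetical <code>inRange_</code> rule, which is not part of the original eDSL, built purely from existing combinators:</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">-- Assumed to mirror the Bool-returning newtype introduced earlier in the post.
newtype ValidationRule a = ValidationRule { validate :: a -&gt; Bool }

eq_ :: Eq a =&gt; a -&gt; ValidationRule a
eq_ ruleValue = ValidationRule $ \actual -&gt; actual == ruleValue

lt_, gt_ :: Ord a =&gt; a -&gt; ValidationRule a
gt_ ruleValue = ValidationRule $ \actual -&gt; actual &gt; ruleValue
lt_ ruleValue = ValidationRule $ \actual -&gt; actual &lt; ruleValue

and_, or_ :: ValidationRule a -&gt; ValidationRule a -&gt; ValidationRule a
and_ rule1 rule2 = ValidationRule $ \actual -&gt;
    validate rule1 actual &amp;&amp; validate rule2 actual
or_ rule1 rule2 = ValidationRule $ \actual -&gt;
    validate rule1 actual || validate rule2 actual

geq_ :: Ord a =&gt; a -&gt; ValidationRule a
geq_ value = gt_ value `or_` eq_ value

-- Hypothetical rule: the value lies in the half-open range [lo, hi).
inRange_ :: Ord a =&gt; a -&gt; a -&gt; ValidationRule a
inRange_ lo hi = geq_ lo `and_` lt_ hi</code></pre></div>
<p>Because <code>inRange_</code> never touches the <code>ValidationRule</code> constructor directly, it depends only on the combinators and not on how rules are represented internally.</p>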
<h2 id="handling-complex-data-types">Handling complex data types</h2>
<p>At the moment, there is still a key limitation with our language. All of these rules need to be of the same type and we do not have a way to change that type. Usually when we see something that looks like <code>f a</code> and we want <code>f b</code> we reach for <code>fmap</code> on the <code>Functor</code> type class. Let’s try to write that for our type:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Functor</span> <span class="dt">ValidationRule</span> <span class="kw">where</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="ot">    fmap ::</span> (a <span class="ot">-&gt;</span> b) <span class="ot">-&gt;</span> <span class="dt">ValidationRule</span> a <span class="ot">-&gt;</span> <span class="dt">ValidationRule</span> b</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a>    <span class="fu">fmap</span> f rule <span class="ot">=</span> <span class="dt">ValidationRule</span> <span class="op">$</span> \actual <span class="ot">-&gt;</span></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a>        validate rule <span class="op">????</span></span></code></pre></div>
<p>We have run into a problem: we have a function <code>a -&gt; b</code> and a <code>ValidationRule a</code>, and we want a <code>ValidationRule b</code>. This means that when we try to construct our new validation rule we have a value <code>actual :: b</code>, but <code>validate rule</code> needs something of type <code>a</code>. If we had a function <code>b -&gt; a</code> we could achieve this. However, <code>f</code> is the reverse of what we need. It seems like we want something that is like a reverse functor:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="ot">notQuiteFmap ::</span> (b <span class="ot">-&gt;</span> a) <span class="ot">-&gt;</span> <span class="dt">ValidationRule</span> a <span class="ot">-&gt;</span> <span class="dt">ValidationRule</span> b</span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>notQuiteFmap f rule <span class="ot">=</span> <span class="dt">ValidationRule</span> <span class="op">$</span> \actual <span class="ot">-&gt;</span></span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a>    validate rule (f actual)</span></code></pre></div>
<p>This function has a name, <code>contramap</code>, and it belongs to the <a href="https://hackage.haskell.org/package/base-4.15.0.0/docs/Data-Functor-Contravariant.html" target="_blank" rel="noopener"><code>Contravariant</code></a> type class, which models contravariant functors (sometimes called cofunctors). The documentation for <code>Contravariant</code> describes the difference between the two kinds of functor:</p>
<blockquote>
<p>Whereas in Haskell, one can think of a <a href="https://hackage.haskell.org/package/base-4.15.0.0/docs/Data-Functor.html#t:Functor" target="_blank" rel="noopener">Functor</a> as containing or producing values, a contravariant functor is a functor that can be thought of as consuming values.</p>
</blockquote>
<p>In fact, the example used is <code>Predicate a</code> which is exactly the same as our type <code>ValidationRule a</code>. It’s always nice when you can find a preexisting type to validate your approach.</p>
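<p>Concretely, <code>notQuiteFmap</code> becomes the body of a <code>Contravariant</code> instance. The sketch below assumes the <code>Bool</code>-returning <code>ValidationRule</code> from earlier; <code>shorterThan_</code> is a hypothetical rule added only to show the instance at work.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import Data.Functor.Contravariant (Contravariant (..))

-- Assumed to mirror the Bool-returning newtype introduced earlier in the post.
newtype ValidationRule a = ValidationRule { validate :: a -&gt; Bool }

-- contramap has type (b -&gt; a) -&gt; ValidationRule a -&gt; ValidationRule b:
-- it converts the incoming value *before* the wrapped rule sees it.
instance Contravariant ValidationRule where
    contramap f rule = ValidationRule $ \actual -&gt; validate rule (f actual)

lt_ :: Ord a =&gt; a -&gt; ValidationRule a
lt_ ruleValue = ValidationRule $ \actual -&gt; actual &lt; ruleValue

-- Hypothetical rule: validate a String by looking only at its length.
shorterThan_ :: Int -&gt; ValidationRule String
shorterThan_ n = contramap length (lt_ n)</code></pre></div>
<p>Here a rule about <code>Int</code> is reused as a rule about <code>String</code> by supplying a function from <code>String</code> to <code>Int</code>, which is exactly the <code>b -&gt; a</code> direction we were missing with <code>fmap</code>.</p>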
<p>The example provided (checking whether an account balance is overdrawn) is a good demonstration of how to use <code>contramap</code>, so let’s implement it. First we set up the data type of our account and our negative rule, which is expressed in terms of an <code>Integer</code>:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Account</span> <span class="ot">=</span> <span class="dt">Account</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>    {<span class="ot"> accountBalance ::</span> <span class="dt">Integer</span></span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>    ,<span class="ot"> accountName ::</span> <span class="dt">Text</span></span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a>    ,<span class="ot"> accountOwner ::</span> <span class="dt">Text</span></span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a>    }</span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a><span class="ot">negative_ ::</span> <span class="dt">ValidationRule</span> <span class="dt">Integer</span></span>
<span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a>negative_ <span class="ot">=</span> lt_ <span class="dv">0</span></span></code></pre></div>
<p>We want to be able to validate an <code>Account</code> but we only have a rule that works on <code>Integer</code> and we don’t want to be writing lots of one-off rules. This is where we can apply the <code>contramap</code> function:</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="ot">overdrawn ::</span> <span class="dt">ValidationRule</span> <span class="dt">Account</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>overdrawn <span class="ot">=</span> contramap accountBalance negative_</span></code></pre></div>
<p>That’s fairly terse, so what does it mean? In plain English we can say an <code>Account</code> (the type that’s being validated) is <code>overdrawn</code> (the name of the function) if the <code>accountBalance</code> (the name of the field we are using <code>contramap</code> with) is <code>negative_</code> (the rule we contramapped). We can imagine this being used in a larger validation:</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="ot">accountOwnedBy ::</span> <span class="dt">Text</span> <span class="ot">-&gt;</span> <span class="dt">ValidationRule</span> <span class="dt">Account</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>accountOwnedBy owner <span class="ot">=</span> contramap accountOwner <span class="op">$</span> eq_ owner</span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a><span class="ot">withdrawalAllowed ::</span> <span class="dt">ValidationRule</span> <span class="dt">Account</span></span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a>withdrawalAllowed <span class="ot">=</span></span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a>    accountOwnedBy <span class="st">&quot;Alice&quot;</span> <span class="ot">`and_`</span> (not_ overdrawn)</span></code></pre></div>
<p>So we are now able to make fairly complex validation rules which read almost verbatim like English. “A withdrawal on this Account is allowed if the account is owned by Alice and it is not overdrawn.” That’s pretty cool but we’re starting to see the ergonomics of our API fray.</p>
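<p>To make this concrete, the following self-contained sketch runs <code>withdrawalAllowed</code> against a few accounts. The rule definitions repeat the ones above (assuming the <code>Bool</code>-returning <code>ValidationRule</code>); the sample accounts and the name <code>results</code> are hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">{-# LANGUAGE OverloadedStrings #-}

import Data.Functor.Contravariant (Contravariant (..))
import Data.Text (Text)

-- Assumed to mirror the Bool-returning newtype introduced earlier in the post.
newtype ValidationRule a = ValidationRule { validate :: a -&gt; Bool }

instance Contravariant ValidationRule where
    contramap f rule = ValidationRule $ \actual -&gt; validate rule (f actual)

eq_ :: Eq a =&gt; a -&gt; ValidationRule a
eq_ ruleValue = ValidationRule $ \actual -&gt; actual == ruleValue

lt_ :: Ord a =&gt; a -&gt; ValidationRule a
lt_ ruleValue = ValidationRule $ \actual -&gt; actual &lt; ruleValue

and_ :: ValidationRule a -&gt; ValidationRule a -&gt; ValidationRule a
and_ rule1 rule2 = ValidationRule $ \actual -&gt;
    validate rule1 actual &amp;&amp; validate rule2 actual

not_ :: ValidationRule a -&gt; ValidationRule a
not_ rule = ValidationRule $ \actual -&gt; not $ validate rule actual

data Account = Account
    { accountBalance :: Integer
    , accountName :: Text
    , accountOwner :: Text
    }

overdrawn :: ValidationRule Account
overdrawn = contramap accountBalance (lt_ 0)

accountOwnedBy :: Text -&gt; ValidationRule Account
accountOwnedBy owner = contramap accountOwner $ eq_ owner

withdrawalAllowed :: ValidationRule Account
withdrawalAllowed = accountOwnedBy "Alice" `and_` not_ overdrawn

-- Hypothetical sample data, for illustration only.
aliceChecking, aliceOverdrawn, bobSavings :: Account
aliceChecking = Account { accountBalance = 100, accountName = "Checking", accountOwner = "Alice" }
aliceOverdrawn = Account { accountBalance = -20, accountName = "Checking", accountOwner = "Alice" }
bobSavings = Account { accountBalance = 500, accountName = "Savings", accountOwner = "Bob" }

-- One Bool per account, in the order listed.
results :: [Bool]
results = map (validate withdrawalAllowed) [aliceChecking, aliceOverdrawn, bobSavings]</code></pre></div>
<p>Only the first account passes: the second is overdrawn and the third is not owned by Alice.</p>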
<div class="article-banner">
    <hr>
    <a href="https://www.cloudtrellis.com/?utm_source=foxhound.systems&amp;utm_medium=banner&amp;utm_campaign=cloudtrellis" class="service-offering no-underline magic-underline-container color-accent-hover-container flex-col w-100 items-center text-left font-manrope" target="_blank" rel="noopener">
        <span class="block w-100 text-center mb-1">
            <div class="flex items-center g-0">
                <span class="inline-block relative">
                    <svg class="color-muted color-accent-hover block s-w-3 s-h-3"><use href="#cloudtrellis-logo"></use></svg>
                </span>
                <span class="font-larger-3">Cloudtrellis</span>
            </div>
        </span>
        <span class="w-100 mb-1 block color-muted font-smaller">A new service built by Foxhound Systems</span>
        <span class="font-bold font-larger-3 block mb-2">Discover problems with your website before your users do</span>
        <br><br>
        <span class="w-100 mb-1 block">Cloudtrellis scans your entire site for broken links, accessibility issues, and SEO errors to ensure a flawless user experience.</span>
        <br><br>
        <ul class="flex-col items-baseline mb-2 mt-none">
            <li class="flex items-baseline mt-1">
                <span class="ml-1">Detect error pages, broken links, accessibility issues, and SEO problems</span>
            </li>
            <li class="flex items-baseline mt-1">
                <span class="ml-1">Create scans with tailored configurations for each website and subdomain you manage</span>
            </li>
            <li class="flex items-baseline mt-1">
                <span class="ml-1">Schedule scans to run monthly, weekly, or even daily to closely monitor for new issues</span>
            </li>
            <li class="flex items-baseline mt-1">
                <span class="ml-1">Get notified of new scan results via email</span>
            </li>
            <li class="flex items-baseline mt-1">
                <span class="ml-1">Share scan results with your team via direct link</span>
            </li>
        </ul>
        <span class="block w-full cta-btn pt-2">Learn more</span>
    </a>
    <hr>
</div>

<h2 id="adding-validation-results">Adding validation results</h2>
<p>Looking at the above example, we know if a withdrawal is allowed or not but we don’t know why it is not allowed. For example, the failure may have occurred due to the account being overdrawn, but it could also have been because the owner of the account was Bob rather than Alice.</p>
<p>Let’s revise our semantic domain a bit to help us keep track of why something failed. In order to track this we’re going to define a <code>Validation</code> data type instead of using a <code>Bool</code> for our return value:</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">ErrMsg</span> <span class="ot">=</span> <span class="dt">Text</span></span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Validation</span> err</span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a>    <span class="ot">=</span> <span class="dt">Success</span></span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a>    <span class="op">|</span> <span class="dt">Failure</span> err</span>
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-7"><a href="#cb13-7" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">ValidationResult</span> <span class="ot">=</span> <span class="dt">Validation</span> [<span class="dt">ErrMsg</span>]</span>
<span id="cb13-8"><a href="#cb13-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-9"><a href="#cb13-9" aria-hidden="true" tabindex="-1"></a><span class="ot">success ::</span> <span class="dt">ValidationResult</span></span>
<span id="cb13-10"><a href="#cb13-10" aria-hidden="true" tabindex="-1"></a>success <span class="ot">=</span> <span class="dt">Success</span></span>
<span id="cb13-11"><a href="#cb13-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-12"><a href="#cb13-12" aria-hidden="true" tabindex="-1"></a><span class="ot">failure ::</span> <span class="dt">ErrMsg</span> <span class="ot">-&gt;</span> <span class="dt">ValidationResult</span></span>
<span id="cb13-13"><a href="#cb13-13" aria-hidden="true" tabindex="-1"></a>failure errMsg <span class="ot">=</span> <span class="dt">Failure</span> [errMsg]</span>
<span id="cb13-14"><a href="#cb13-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-15"><a href="#cb13-15" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">ValidationRule</span> a <span class="ot">=</span></span>
<span id="cb13-16"><a href="#cb13-16" aria-hidden="true" tabindex="-1"></a>    <span class="dt">ValidationRule</span> {<span class="ot"> validate ::</span> a <span class="ot">-&gt;</span> <span class="dt">ValidationResult</span> }</span></code></pre></div>
<p>We’re going to need to rewrite our core functions now to use the new representation. We will only rewrite a few of them. The rest are left as an exercise to the reader.</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="ot">eq_ ::</span> (<span class="dt">Show</span> a, <span class="dt">Eq</span> a) <span class="ot">=&gt;</span> a <span class="ot">-&gt;</span> <span class="dt">ValidationRule</span> a</span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a>eq_ value <span class="ot">=</span> <span class="dt">ValidationRule</span> <span class="op">$</span> \actual <span class="ot">-&gt;</span></span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a>    <span class="kw">if</span> actual <span class="op">==</span> value <span class="kw">then</span></span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a>        success</span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a>    <span class="kw">else</span></span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a>        failure (Text.pack <span class="op">$</span> <span class="st">&quot;Expected &quot;</span> <span class="op">&lt;&gt;</span> <span class="fu">show</span> actual <span class="op">&lt;&gt;</span> <span class="st">&quot; to equal &quot;</span> <span class="op">&lt;&gt;</span> <span class="fu">show</span> value)</span>
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a><span class="ot">and_ ::</span> <span class="dt">ValidationRule</span> a <span class="ot">-&gt;</span> <span class="dt">ValidationRule</span> a <span class="ot">-&gt;</span> <span class="dt">ValidationRule</span> a</span>
<span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a>and_ rule1 rule2 <span class="ot">=</span> <span class="dt">ValidationRule</span> <span class="op">$</span> \actual <span class="ot">-&gt;</span></span>
<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a>    <span class="kw">case</span> (validate rule1 actual, validate rule2 actual) <span class="kw">of</span></span>
<span id="cb14-11"><a href="#cb14-11" aria-hidden="true" tabindex="-1"></a>        (<span class="dt">Failure</span> e1, <span class="dt">Failure</span> e2) <span class="ot">-&gt;</span> <span class="dt">Failure</span> (e1 <span class="op">&lt;&gt;</span> e2)</span>
<span id="cb14-12"><a href="#cb14-12" aria-hidden="true" tabindex="-1"></a>        (<span class="dt">Failure</span> e1, _)          <span class="ot">-&gt;</span> <span class="dt">Failure</span> e1</span>
<span id="cb14-13"><a href="#cb14-13" aria-hidden="true" tabindex="-1"></a>        (_, <span class="dt">Failure</span> e2)          <span class="ot">-&gt;</span> <span class="dt">Failure</span> e2</span>
<span id="cb14-14"><a href="#cb14-14" aria-hidden="true" tabindex="-1"></a>        (<span class="dt">Success</span>, <span class="dt">Success</span>)       <span class="ot">-&gt;</span> <span class="dt">Success</span></span>
<span id="cb14-15"><a href="#cb14-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-16"><a href="#cb14-16" aria-hidden="true" tabindex="-1"></a><span class="ot">or_ ::</span> <span class="dt">ValidationRule</span> a <span class="ot">-&gt;</span> <span class="dt">ValidationRule</span> a <span class="ot">-&gt;</span> <span class="dt">ValidationRule</span> a</span>
<span id="cb14-17"><a href="#cb14-17" aria-hidden="true" tabindex="-1"></a>or_ rule1 rule2 <span class="ot">=</span> <span class="dt">ValidationRule</span> <span class="op">$</span> \actual <span class="ot">-&gt;</span></span>
<span id="cb14-18"><a href="#cb14-18" aria-hidden="true" tabindex="-1"></a>    <span class="kw">case</span> (validate rule1 actual, validate rule2 actual) <span class="kw">of</span></span>
<span id="cb14-19"><a href="#cb14-19" aria-hidden="true" tabindex="-1"></a>        (<span class="dt">Failure</span> e1, <span class="dt">Failure</span> e2) <span class="ot">-&gt;</span> <span class="dt">Failure</span> (e1 <span class="op">&lt;&gt;</span> e2)</span>
<span id="cb14-20"><a href="#cb14-20" aria-hidden="true" tabindex="-1"></a>        (<span class="dt">Success</span>, _)             <span class="ot">-&gt;</span> <span class="dt">Success</span></span>
<span id="cb14-21"><a href="#cb14-21" aria-hidden="true" tabindex="-1"></a>        (_, <span class="dt">Success</span>)             <span class="ot">-&gt;</span> <span class="dt">Success</span></span></code></pre></div>
<p>Now we can run our validate function and if there is a failure we will know why the validation failed. And the best part is, we don’t actually have to change how the top level rule is written (though we do have to recompile since we’re not taking full advantage of the polymorphic final tagless approach). Since only our primitives are aware of <code>ValidationResult</code>, we did not have to update any of the more sophisticated business rules during this change.</p>
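<p>As a sketch of what this looks like end to end, the snippet below repeats the new representation and adds one possible answer to the exercise: a <code>geq_</code> primitive with its own message. The message wording, the trimmed-down <code>Account</code> record, and the <code>bobOverdrawn</code> sample are assumptions made for illustration. We use <code>geq_ 0</code> rather than <code>not_ overdrawn</code> here because it is less obvious what error message a negated rule should report.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">{-# LANGUAGE OverloadedStrings #-}

import Data.Functor.Contravariant (Contravariant (..))
import Data.Text (Text)
import qualified Data.Text as Text

type ErrMsg = Text

data Validation err
    = Success
    | Failure err
    deriving (Eq, Show)

type ValidationResult = Validation [ErrMsg]

newtype ValidationRule a =
    ValidationRule { validate :: a -&gt; ValidationResult }

instance Contravariant ValidationRule where
    contramap f rule = ValidationRule $ \actual -&gt; validate rule (f actual)

eq_ :: (Show a, Eq a) =&gt; a -&gt; ValidationRule a
eq_ value = ValidationRule $ \actual -&gt;
    if actual == value
        then Success
        else Failure [Text.pack $ "Expected " &lt;&gt; show actual &lt;&gt; " to equal " &lt;&gt; show value]

-- One possible answer to the exercise; the message wording is an assumption.
geq_ :: (Show a, Ord a) =&gt; a -&gt; ValidationRule a
geq_ value = ValidationRule $ \actual -&gt;
    if actual &gt;= value
        then Success
        else Failure [Text.pack $ "Expected " &lt;&gt; show actual &lt;&gt; " to be at least " &lt;&gt; show value]

-- Runs both rules and keeps every error message, not just the first.
and_ :: ValidationRule a -&gt; ValidationRule a -&gt; ValidationRule a
and_ rule1 rule2 = ValidationRule $ \actual -&gt;
    case (validate rule1 actual, validate rule2 actual) of
        (Failure e1, Failure e2) -&gt; Failure (e1 &lt;&gt; e2)
        (Failure e1, Success)    -&gt; Failure e1
        (Success, Failure e2)    -&gt; Failure e2
        (Success, Success)       -&gt; Success

-- Trimmed-down, hypothetical version of the Account record.
data Account = Account
    { accountBalance :: Integer
    , accountOwner :: Text
    }

withdrawalAllowed :: ValidationRule Account
withdrawalAllowed =
    contramap accountOwner (eq_ "Alice") `and_` contramap accountBalance (geq_ 0)

-- Hypothetical account that violates both rules.
bobOverdrawn :: Account
bobOverdrawn = Account { accountBalance = -20, accountOwner = "Bob" }

result :: ValidationResult
result = validate withdrawalAllowed bobOverdrawn</code></pre></div>
<p>Since <code>and_</code> accumulates errors from both sides, <code>result</code> is a <code>Failure</code> carrying two messages, one for the owner mismatch and one for the negative balance.</p>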
<h2 id="conclusion">Conclusion</h2>
<p>In this post we solved the problem of configurable validation by creating an eDSL that allows us to create sophisticated business rules through the composition of primitive rules. The performance of validating inputs using this approach is very good since there is almost no interpretive overhead when composing functions as we have done here.</p>
<p>The eDSL we have shown here is only slightly simpler than what we built for our client, and illustrates the core ideas clearly. A language like this one can continue to evolve as necessary, oftentimes without requiring existing rules to be rewritten, as we saw above. One aspect we haven’t covered in this post is adding context to rules, giving us the ability to see where one of our primitive rules failed relative to a larger validation (e.g. we expected “Alice” but got “Bob” <em>in the context of validating the account owner</em>).</p>
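<p>To give a flavor of what adding context might look like, here is one minimal sketch. The <code>withContext_</code> combinator and the <code>ownerIsAlice</code> rule are hypothetical and are not the design we built for our client; they simply prefix each error message with a label.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as Text

type ErrMsg = Text

data Validation err
    = Success
    | Failure err
    deriving (Eq, Show)

type ValidationResult = Validation [ErrMsg]

newtype ValidationRule a =
    ValidationRule { validate :: a -&gt; ValidationResult }

eq_ :: (Show a, Eq a) =&gt; a -&gt; ValidationRule a
eq_ value = ValidationRule $ \actual -&gt;
    if actual == value
        then Success
        else Failure [Text.pack $ "Expected " &lt;&gt; show actual &lt;&gt; " to equal " &lt;&gt; show value]

-- Hypothetical combinator: label every error produced by the wrapped rule.
withContext_ :: Text -&gt; ValidationRule a -&gt; ValidationRule a
withContext_ context rule = ValidationRule $ \actual -&gt;
    case validate rule actual of
        Success      -&gt; Success
        Failure errs -&gt; Failure (map (\err -&gt; context &lt;&gt; ": " &lt;&gt; err) errs)

-- Hypothetical rule using the combinator.
ownerIsAlice :: ValidationRule Text
ownerIsAlice = withContext_ "account owner" (eq_ "Alice")</code></pre></div>
<p>Like the earlier change to <code>ValidationResult</code>, this is a new combinator layered on top of the existing ones, so rules that do not need context are unaffected.</p>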
<hr />
<p><em>Ben Levy and Christian Charukiewicz are Partners and Principal Software Engineers at Foxhound Systems. At Foxhound Systems, we focus on building fast and reliable custom software. Are you looking for help with something you’re working on? Reach out to us at <a href="mailto:info@foxhound.systems" target="_blank" rel="noopener">info@foxhound.systems</a></em>.</p>
        </div>
    </div>
</article>
]]></description>
    <pubDate>Mon, 26 Jul 2021 00:00:00 UT</pubDate>
    <guid>https://www.foxhound.systems/blog/composable-data-validation/index.html</guid>
    <dc:creator>Foxhound Systems</dc:creator>
</item>
<item>
    <title>Final tagless encodings have little to do with typeclasses</title>
    <link>https://www.foxhound.systems/blog/final-tagless/index.html</link>
    <description><![CDATA[<nav class="container mt-2 mb-2">
    <a href="https://www.foxhound.systems/blog/" class="no-underline magic-underline-slim">Back to all posts</a>
</nav>
<article class="grow post">
    
        <div class="lg-container post-banner">
            <picture>
                
                    <source srcset="https://www.foxhound.systems/img/2021-05-27-final-tagless/final-tagless-banner.webp" type="image/webp" height="853" width="1280">
                
                <img src="https://www.foxhound.systems/img/2021-05-27-final-tagless/final-tagless-banner.jpg" alt="A photograph of what appears to be a farmer's market stand containing a variety of squashes, including butternut, kabocha, and delicata." height="853" width="1280">
            </picture>
        </div>
    
    <div class="container">
        
        <h1 class="title">Final tagless encodings have little to do with typeclasses</h1>
        <header class="info gy-1">
            <div class="author-date space-y-1">
                <div>
                    <span class="date">May 27, 2021</span>
                </div>
                <div class="flex flex-wrap gx-2 gy-1">
                    
                        
                        
                        <div class="flex gap-2 items-center">
                            <img class="rounded-full w-12" src="https://www.foxhound.systems/assets/img/team/ben-sm.jpg" alt="Photo of Ben Levy">
                            <div class="ml-1 flex flex-col">
                                <span class="font-bold">Ben Levy</span>
                                <span>Partner &amp; Principal Software Engineer</span>
                            </div>
                        </div>
                        
                    
                </div>
            </div>
            
            <div>
                <div class="tags"><a title="Posts tagged: haskell" href="https://www.foxhound.systems/blog/tag/haskell/">haskell</a></div>
            </div>
            
        </header>
        <hr>
        <div class="content">
        <p>There’s a common misconception as to what <em>final tagless</em>, and more specifically, a <em>final encoding</em> is. A common claim I see is that <em>final tagless</em> means coding against typeclasses. The <a href="https://hackage.haskell.org/package/mtl" target="_blank" rel="noopener"><code>mtl</code></a> library and code written in MTL style are raised as examples of <em>final tagless</em>.</p>
<p>I would like to argue that what people are referring to as <em>final tagless</em> is in fact just coding against an interface and that the novelty of <em>final tagless</em> really has very little to do with abstract interfaces. So then what is <em>final tagless</em>? It’s a complicated name for a not-so-complicated idea.</p>
<!--more-->
<p>We can break it down into its constituent parts: <em>final</em> and <em>tagless</em>. The use of <em>final</em> is to contrast with the typical <em>initial</em> encoding of a language. Let’s start by looking at an <em>initial</em> encoding.</p>
<h2 id="an-initial-encoding">An Initial Encoding</h2>
<p>The most straightforward <em>initial</em> encoding is to make a sum type with a constructor for each kind of term in the language. Let’s consider modeling a boolean argument to a SQL <code>WHERE</code> clause:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">SqlExpr</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>    <span class="ot">=</span> <span class="dt">B</span> <span class="dt">Bool</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>    <span class="op">|</span> <span class="dt">And</span> <span class="dt">SqlExpr</span> <span class="dt">SqlExpr</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>    <span class="op">|</span> <span class="dt">Or</span> <span class="dt">SqlExpr</span> <span class="dt">SqlExpr</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a>    <span class="op">|</span> <span class="dt">Not</span> <span class="dt">SqlExpr</span></span></code></pre></div>
<p>This <em>initial</em> encoding allows you to create arbitrarily nested boolean expressions. To run these expressions, one would create an <code>eval</code> function that takes a <code>SqlExpr</code> and returns a <code>Bool</code>:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="ot">eval ::</span> <span class="dt">SqlExpr</span> <span class="ot">-&gt;</span> <span class="dt">Bool</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>eval (<span class="dt">B</span> b)             <span class="ot">=</span> b</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>eval (<span class="dt">And</span> expr1 expr2) <span class="ot">=</span> eval expr1 <span class="op">&amp;&amp;</span> eval expr2</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a>eval (<span class="dt">Or</span> expr1 expr2)  <span class="ot">=</span> eval expr1 <span class="op">||</span> eval expr2</span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a>eval (<span class="dt">Not</span> expr)        <span class="ot">=</span> <span class="fu">not</span> (eval expr)</span></code></pre></div>
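<p>As a quick, self-contained sketch of how this encoding is used, the snippet below builds one expression and interprets it twice. The <code>example</code> value and the <code>render</code> function are hypothetical additions; <code>render</code> is included only to illustrate that an <em>initial</em> encoding is plain data, so the same expression can be consumed by more than one interpreter.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">data SqlExpr
    = B Bool
    | And SqlExpr SqlExpr
    | Or SqlExpr SqlExpr
    | Not SqlExpr

-- Interpreter 1: evaluate the expression to a Bool.
eval :: SqlExpr -&gt; Bool
eval (B b)             = b
eval (And expr1 expr2) = eval expr1 &amp;&amp; eval expr2
eval (Or expr1 expr2)  = eval expr1 || eval expr2
eval (Not expr)        = not (eval expr)

-- Interpreter 2 (hypothetical): render the expression as SQL-like text.
render :: SqlExpr -&gt; String
render (B True)          = "TRUE"
render (B False)         = "FALSE"
render (And expr1 expr2) = "(" ++ render expr1 ++ " AND " ++ render expr2 ++ ")"
render (Or expr1 expr2)  = "(" ++ render expr1 ++ " OR " ++ render expr2 ++ ")"
render (Not expr)        = "NOT " ++ render expr

-- Hypothetical expression: NOT (TRUE AND FALSE) OR FALSE
example :: SqlExpr
example = Or (Not (And (B True) (B False))) (B False)</code></pre></div>
<p>Here <code>eval example</code> is <code>True</code>, while <code>render example</code> produces a textual form of the same tree.</p>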
<p>This encoding is simple Haskell 98. However, our <code>WHERE</code> clause needs to take more than boolean arguments. Let’s try to expand this encoding to include <code>Int</code>:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">SqlExpr</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>    <span class="ot">=</span> <span class="dt">I</span> <span class="dt">Int</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>    <span class="op">|</span> <span class="dt">B</span> <span class="dt">Bool</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>    <span class="op">|</span> <span class="dt">Leq</span> <span class="dt">SqlExpr</span> <span class="dt">SqlExpr</span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a>    <span class="op">|</span> <span class="dt">And</span> <span class="dt">SqlExpr</span> <span class="dt">SqlExpr</span></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a>    <span class="op">|</span> <span class="dt">Or</span> <span class="dt">SqlExpr</span> <span class="dt">SqlExpr</span></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a>    <span class="op">|</span> <span class="dt">Not</span> <span class="dt">SqlExpr</span></span></code></pre></div>

<p>So now let’s define an <code>eval</code> function. But what should its type be? An expression can evaluate to either an <code>Int</code> or a <code>Bool</code>. Let’s change our data type to include a type variable, with the intent of letting us specify a <code>SqlExpr Bool</code> or a <code>SqlExpr Int</code>:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">SqlExpr</span> a</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>    <span class="ot">=</span> <span class="dt">I</span> <span class="dt">Int</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>    <span class="op">|</span> <span class="dt">B</span> <span class="dt">Bool</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a>    <span class="op">|</span> <span class="dt">Leq</span> (<span class="dt">SqlExpr</span> <span class="dt">Int</span>) (<span class="dt">SqlExpr</span> <span class="dt">Int</span>)</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a>    <span class="op">|</span> <span class="dt">And</span> (<span class="dt">SqlExpr</span> <span class="dt">Bool</span>) (<span class="dt">SqlExpr</span> <span class="dt">Bool</span>)</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a>    <span class="op">|</span> <span class="dt">Or</span> (<span class="dt">SqlExpr</span> <span class="dt">Bool</span>) (<span class="dt">SqlExpr</span> <span class="dt">Bool</span>)</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a>    <span class="op">|</span> <span class="dt">Not</span> (<span class="dt">SqlExpr</span> <span class="dt">Bool</span>)</span></code></pre></div>
<p>This allows us to change our <code>eval</code> function to take a <code>SqlExpr a</code> and return an <code>a</code>:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="ot">eval ::</span> <span class="dt">SqlExpr</span> a <span class="ot">-&gt;</span> a</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>eval (<span class="dt">B</span> b)             <span class="ot">=</span> b</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>eval (<span class="dt">I</span> i)             <span class="ot">=</span> i</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>eval (<span class="dt">Leq</span> expr1 expr2) <span class="ot">=</span> eval expr1 <span class="op">&lt;=</span> eval expr2</span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a>eval (<span class="dt">And</span> expr1 expr2) <span class="ot">=</span> eval expr1 <span class="op">&amp;&amp;</span> eval expr2</span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a>eval (<span class="dt">Or</span> expr1 expr2)  <span class="ot">=</span> eval expr1 <span class="op">||</span> eval expr2</span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a>eval (<span class="dt">Not</span> expr)        <span class="ot">=</span> <span class="fu">not</span> (eval expr)</span></code></pre></div>
<p>Unfortunately, this doesn’t compile:</p>
<pre><code>Main.hs:30:26: error:
    • Couldn't match expected type ‘a’ with actual type ‘Bool’
      ‘a’ is a rigid type variable bound by
        the type signature for:
          eval :: forall a. SqlExpr a -&gt; a
        at Main.hs:29:1-22
    • In the expression: b
      In an equation for ‘eval’: eval (B b) = b
    • Relevant bindings include
        eval :: SqlExpr a -&gt; a (bound at Main.hs:30:1)
   |
30 | eval (B b)             = b
   |</code></pre>
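<p>The compiler can’t tell that this is correct because there is nothing tying <code>B</code> to <code>SqlExpr Bool</code> and <code>I</code> to <code>SqlExpr Int</code>: with an ordinary data declaration, every constructor produces a <code>SqlExpr a</code> for whatever <code>a</code> the caller chooses.</p>
<p>We can work around this by introducing a universal result type:</p>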
<div class="sourceCode" id="cb7"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">SqlExprResult</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>    <span class="ot">=</span> <span class="dt">BoolResult</span> <span class="dt">Bool</span></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a>    <span class="op">|</span> <span class="dt">IntResult</span> <span class="dt">Int</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a><span class="ot">eval ::</span> <span class="dt">SqlExpr</span> a <span class="ot">-&gt;</span> <span class="dt">SqlExprResult</span></span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a>eval (<span class="dt">B</span> b) <span class="ot">=</span> <span class="dt">BoolResult</span> b</span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a>eval (<span class="dt">I</span> i) <span class="ot">=</span> <span class="dt">IntResult</span> i</span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a>eval (<span class="dt">Leq</span> expr1 expr2) <span class="ot">=</span></span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a>    <span class="kw">let</span> <span class="dt">IntResult</span> i1 <span class="ot">=</span> eval expr1</span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a>        <span class="dt">IntResult</span> i2 <span class="ot">=</span> eval expr2</span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a>    <span class="kw">in</span> <span class="dt">BoolResult</span> (i1 <span class="op">&lt;=</span> i2)</span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a><span class="op">...</span></span></code></pre></div>
<p>This universal result type is a tag, and we have to pattern match on it in the recursive cases. Those pattern matches are incomplete, and because nothing stops us from constructing a malformed expression like <code>Leq (B True) (I 10)</code>, evaluating one fails with a runtime error.</p>
<p>If we want to continue down the path of this <em>initial</em> encoding, we can use fancy types like <code>GADTs</code> to eliminate these errors. This leads to a <em>tagless initial</em> encoding.</p>
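<p>As a rough sketch of what that <em>tagless initial</em> encoding could look like (assuming GHC with the <code>GADTs</code> extension; the <code>main</code> function is only there for illustration), each constructor states the exact type of expression it builds, which lets <code>eval</code> return a plain <code>a</code> with no result tag:</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">{-# LANGUAGE GADTs #-}

module Main where

-- Each constructor fixes the type index of the expression it builds,
-- so B can only ever produce a SqlExpr Bool and I a SqlExpr Int.
data SqlExpr a where
    I   :: Int -&gt; SqlExpr Int
    B   :: Bool -&gt; SqlExpr Bool
    Leq :: SqlExpr Int -&gt; SqlExpr Int -&gt; SqlExpr Bool
    And :: SqlExpr Bool -&gt; SqlExpr Bool -&gt; SqlExpr Bool
    Or  :: SqlExpr Bool -&gt; SqlExpr Bool -&gt; SqlExpr Bool
    Not :: SqlExpr Bool -&gt; SqlExpr Bool

-- Matching on a constructor tells the compiler what 'a' is in that branch.
eval :: SqlExpr a -&gt; a
eval (I i)       = i
eval (B b)       = b
eval (Leq e1 e2) = eval e1 &lt;= eval e2
eval (And e1 e2) = eval e1 &amp;&amp; eval e2
eval (Or e1 e2)  = eval e1 || eval e2
eval (Not e)     = not (eval e)

main :: IO ()
main = print (eval (And (Leq (I 1) (I 10)) (Not (B False))))</code></pre></div>
<p>With this definition, <code>Leq (B True) (I 10)</code> no longer typechecks, so the malformed case is ruled out before the program runs.</p>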
<h2 id="switching-to-a-final-encoding">Switching to a Final Encoding</h2>
<p>The paper <a href="https://okmij.org/ftp/tagless-final/JFP.pdf" target="_blank" rel="noopener">Finally Tagless, Partially Evaluated</a> presents an alternative to this <em>initial</em> encoding: the so-called <em>final</em> encoding. It gets its name because it works in terms of the final representation rather than an intermediate data type. Let’s write an implementation of our <code>SqlExpr</code> language in a <em>final</em> encoding:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">SqlExpr</span> a <span class="ot">=</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>    <span class="dt">SqlExpr</span> {<span class="ot"> unSqlExpr ::</span> a }</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a><span class="ot">bool ::</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span></span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a>bool b <span class="ot">=</span> <span class="dt">SqlExpr</span> b</span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a><span class="ot">int ::</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">SqlExpr</span> <span class="dt">Int</span></span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a>int i <span class="ot">=</span> <span class="dt">SqlExpr</span> i</span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-12"><a href="#cb8-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-13"><a href="#cb8-13" aria-hidden="true" tabindex="-1"></a><span class="ot">leq ::</span> <span class="dt">SqlExpr</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">SqlExpr</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span></span>
<span id="cb8-14"><a href="#cb8-14" aria-hidden="true" tabindex="-1"></a>leq expr1 expr2 <span class="ot">=</span> <span class="dt">SqlExpr</span> (unSqlExpr expr1 <span class="op">&lt;=</span> unSqlExpr expr2)</span>
<span id="cb8-15"><a href="#cb8-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-16"><a href="#cb8-16" aria-hidden="true" tabindex="-1"></a><span class="op">...</span></span>
<span id="cb8-17"><a href="#cb8-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-18"><a href="#cb8-18" aria-hidden="true" tabindex="-1"></a><span class="ot">eval ::</span> <span class="dt">SqlExpr</span> a <span class="ot">-&gt;</span> a</span>
<span id="cb8-19"><a href="#cb8-19" aria-hidden="true" tabindex="-1"></a>eval <span class="ot">=</span> unSqlExpr</span></code></pre></div>
<p>This compiles, has no tags, and allows only well-formed expressions to compile. A malformed expression such as <code>leq (bool True) (int 10)</code> is rejected at compile time with a type error. This is all that <em>final tagless</em> means. And if this is the only interpretation our language needs, then we are done.</p>
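<p>To see the directly evaluating version in use, here is a small self-contained sketch. The combinators elided above (<code>and_</code>, <code>or_</code>, <code>not_</code>) are filled in here in the obvious way, which is an assumption rather than the original code, and <code>main</code> exists only for illustration:</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">module Main where

-- The representation is simply the value the expression evaluates to.
newtype SqlExpr a =
    SqlExpr { unSqlExpr :: a }

bool :: Bool -&gt; SqlExpr Bool
bool = SqlExpr

int :: Int -&gt; SqlExpr Int
int = SqlExpr

leq :: SqlExpr Int -&gt; SqlExpr Int -&gt; SqlExpr Bool
leq e1 e2 = SqlExpr (unSqlExpr e1 &lt;= unSqlExpr e2)

-- Assumed definitions for the combinators not shown in the article.
and_ :: SqlExpr Bool -&gt; SqlExpr Bool -&gt; SqlExpr Bool
and_ e1 e2 = SqlExpr (unSqlExpr e1 &amp;&amp; unSqlExpr e2)

or_ :: SqlExpr Bool -&gt; SqlExpr Bool -&gt; SqlExpr Bool
or_ e1 e2 = SqlExpr (unSqlExpr e1 || unSqlExpr e2)

not_ :: SqlExpr Bool -&gt; SqlExpr Bool
not_ e = SqlExpr (not (unSqlExpr e))

eval :: SqlExpr a -&gt; a
eval = unSqlExpr

main :: IO ()
main = print (eval (and_ (leq (int 1) (int 10)) (not_ (bool False))))</code></pre></div>
<p>Terms are built by calling ordinary functions, so the type checker validates them as they are written and <code>eval</code> only has to unwrap the result.</p>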
<p>But this is, of course, only one of a family of possible interpreters. Another interpreter generates the SQL expression rather than evaluating it directly. Let’s write a version that generates output suitable for the <code>rawQuery</code> function in <a href="https://hackage.haskell.org/package/persistent" target="_blank" rel="noopener">persistent</a>.</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE OverloadedStrings #-}</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Text.Lazy.Builder</span> ( <span class="dt">Builder</span> )</span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a><span class="co">-- Stand-in for Database.Persist.PersistValue</span></span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">PersistValue</span></span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a>    <span class="ot">=</span> <span class="dt">PersistInt64</span> <span class="dt">Integer</span></span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a>    <span class="op">|</span> <span class="dt">PersistBool</span> <span class="dt">Bool</span></span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">SqlExpr</span> a <span class="ot">=</span></span>
<span id="cb9-12"><a href="#cb9-12" aria-hidden="true" tabindex="-1"></a>    <span class="dt">SqlExpr</span> {<span class="ot"> unSqlExpr ::</span> (<span class="dt">Builder</span>,  [<span class="dt">PersistValue</span>]) }</span>
<span id="cb9-13"><a href="#cb9-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-14"><a href="#cb9-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-15"><a href="#cb9-15" aria-hidden="true" tabindex="-1"></a><span class="ot">bool ::</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span></span>
<span id="cb9-16"><a href="#cb9-16" aria-hidden="true" tabindex="-1"></a>bool b <span class="ot">=</span> <span class="dt">SqlExpr</span> (<span class="st">&quot;?&quot;</span>, [<span class="dt">PersistBool</span> b])</span>
<span id="cb9-17"><a href="#cb9-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-18"><a href="#cb9-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-19"><a href="#cb9-19" aria-hidden="true" tabindex="-1"></a><span class="ot">int ::</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">SqlExpr</span> <span class="dt">Int</span></span>
<span id="cb9-20"><a href="#cb9-20" aria-hidden="true" tabindex="-1"></a>int i <span class="ot">=</span> <span class="dt">SqlExpr</span> (<span class="st">&quot;?&quot;</span>, [<span class="dt">PersistInt64</span> (<span class="fu">fromIntegral</span> i)])</span>
<span id="cb9-21"><a href="#cb9-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-22"><a href="#cb9-22" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-23"><a href="#cb9-23" aria-hidden="true" tabindex="-1"></a><span class="ot">leq ::</span> <span class="dt">SqlExpr</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">SqlExpr</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span></span>
<span id="cb9-24"><a href="#cb9-24" aria-hidden="true" tabindex="-1"></a>leq expr1 expr2 <span class="ot">=</span></span>
<span id="cb9-25"><a href="#cb9-25" aria-hidden="true" tabindex="-1"></a>    <span class="kw">let</span> (b1, v1) <span class="ot">=</span> unSqlExpr expr1</span>
<span id="cb9-26"><a href="#cb9-26" aria-hidden="true" tabindex="-1"></a>        (b2, v2) <span class="ot">=</span> unSqlExpr expr2</span>
<span id="cb9-27"><a href="#cb9-27" aria-hidden="true" tabindex="-1"></a>    <span class="kw">in</span> <span class="dt">SqlExpr</span> (b1 <span class="op">&lt;&gt;</span> <span class="st">&quot; &lt;= &quot;</span> <span class="op">&lt;&gt;</span> b2, v1 <span class="op">&lt;&gt;</span> v2)</span>
<span id="cb9-28"><a href="#cb9-28" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-29"><a href="#cb9-29" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-30"><a href="#cb9-30" aria-hidden="true" tabindex="-1"></a><span class="ot">and_ ::</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span></span>
<span id="cb9-31"><a href="#cb9-31" aria-hidden="true" tabindex="-1"></a>and_ expr1 expr2 <span class="ot">=</span></span>
<span id="cb9-32"><a href="#cb9-32" aria-hidden="true" tabindex="-1"></a>   <span class="kw">let</span> (b1, v1) <span class="ot">=</span> unSqlExpr expr1</span>
<span id="cb9-33"><a href="#cb9-33" aria-hidden="true" tabindex="-1"></a>       (b2, v2) <span class="ot">=</span> unSqlExpr expr2</span>
<span id="cb9-34"><a href="#cb9-34" aria-hidden="true" tabindex="-1"></a>   <span class="kw">in</span> <span class="dt">SqlExpr</span> ( b1 <span class="op">&lt;&gt;</span> <span class="st">&quot; AND &quot;</span> <span class="op">&lt;&gt;</span> b2, v1 <span class="op">&lt;&gt;</span> v2)</span>
<span id="cb9-35"><a href="#cb9-35" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-36"><a href="#cb9-36" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-37"><a href="#cb9-37" aria-hidden="true" tabindex="-1"></a><span class="ot">or_ ::</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span></span>
<span id="cb9-38"><a href="#cb9-38" aria-hidden="true" tabindex="-1"></a>or_ expr1 expr2 <span class="ot">=</span></span>
<span id="cb9-39"><a href="#cb9-39" aria-hidden="true" tabindex="-1"></a>   <span class="kw">let</span> (b1, v1) <span class="ot">=</span> unSqlExpr expr1</span>
<span id="cb9-40"><a href="#cb9-40" aria-hidden="true" tabindex="-1"></a>       (b2, v2) <span class="ot">=</span> unSqlExpr expr2</span>
<span id="cb9-41"><a href="#cb9-41" aria-hidden="true" tabindex="-1"></a>   <span class="kw">in</span> <span class="dt">SqlExpr</span> ( b1 <span class="op">&lt;&gt;</span> <span class="st">&quot; OR &quot;</span> <span class="op">&lt;&gt;</span> b2, v1 <span class="op">&lt;&gt;</span> v2)</span>
<span id="cb9-42"><a href="#cb9-42" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-43"><a href="#cb9-43" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-44"><a href="#cb9-44" aria-hidden="true" tabindex="-1"></a><span class="ot">not_ ::</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span></span>
<span id="cb9-45"><a href="#cb9-45" aria-hidden="true" tabindex="-1"></a>not_ expr <span class="ot">=</span></span>
<span id="cb9-46"><a href="#cb9-46" aria-hidden="true" tabindex="-1"></a>    <span class="kw">let</span> (b, v) <span class="ot">=</span> unSqlExpr expr</span>
<span id="cb9-47"><a href="#cb9-47" aria-hidden="true" tabindex="-1"></a>    <span class="kw">in</span> <span class="dt">SqlExpr</span> (<span class="st">&quot;NOT &quot;</span> <span class="op">&lt;&gt;</span> b, v)</span></code></pre></div>
<p>Both this and the preceding version are valid interpretations. However, they are mutually exclusive, because each one fixes a single definition of <code>SqlExpr</code>. To support both, the paper introduces a typeclass that abstracts over the representation. This is not a requirement for something to be considered <em>final tagless</em>. This confusion has led a lot of people astray and had them avoid the simple solution even in situations where several interpretations are not required.</p>
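<p>For comparison, a typeclass-based version in the spirit of the paper might look roughly like the sketch below. The names <code>SqlSym</code>, <code>Eval</code>, <code>Render</code>, and <code>example</code> are hypothetical, and the rendering interpreter produces a plain <code>String</code> of placeholders without collecting parameters, purely to keep the example short:</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">module Main where

-- The language as a typeclass: each interpreter picks its own 'repr'.
-- (Hypothetical class name; or_ is omitted for brevity.)
class SqlSym repr where
    bool :: Bool -&gt; repr Bool
    int  :: Int -&gt; repr Int
    leq  :: repr Int -&gt; repr Int -&gt; repr Bool
    and_ :: repr Bool -&gt; repr Bool -&gt; repr Bool
    not_ :: repr Bool -&gt; repr Bool

-- Interpreter 1: evaluate directly.
newtype Eval a = Eval { runEval :: a }

instance SqlSym Eval where
    bool     = Eval
    int      = Eval
    leq a b  = Eval (runEval a &lt;= runEval b)
    and_ a b = Eval (runEval a &amp;&amp; runEval b)
    not_ a   = Eval (not (runEval a))

-- Interpreter 2: render SQL text with placeholders (simplified to String).
newtype Render a = Render { runRender :: String }

instance SqlSym Render where
    bool _   = Render &quot;?&quot;
    int _    = Render &quot;?&quot;
    leq a b  = Render (runRender a ++ &quot; &lt;= &quot; ++ runRender b)
    and_ a b = Render (runRender a ++ &quot; AND &quot; ++ runRender b)
    not_ a   = Render (&quot;NOT &quot; ++ runRender a)

-- One term, usable with any representation.
example :: SqlSym repr =&gt; repr Bool
example = and_ (leq (int 1) (int 10)) (not_ (bool False))

main :: IO ()
main = do
    print (runEval example)
    putStrLn (runRender example)</code></pre></div>
<p>The extra machinery only pays for itself when the same term really has to be interpreted more than one way. With a single interpreter, the plain functions shown earlier do the same job.</p>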
<h2 id="bonus-round-context-aware-encoding">Bonus Round: Context-aware encoding</h2>
<p>There is a potential issue in the code above: we never emit parentheses, so a nested expression can be regrouped by SQL’s operator precedence. We also don’t want to emit parentheses where they aren’t actually required. This requires us to know our context. One might think it wouldn’t be possible for a function to know what context it is in, but we can use a trick where we make the context explicit by turning the representation into a function of it. Let’s look at our new version:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE OverloadedStrings #-}</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Text.Lazy.Builder</span> ( <span class="dt">Builder</span> )</span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">PersistValue</span></span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a>    <span class="ot">=</span> <span class="dt">PersistInt64</span> <span class="dt">Integer</span></span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a>    <span class="op">|</span> <span class="dt">PersistBool</span> <span class="dt">Bool</span></span>
<span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-9"><a href="#cb10-9" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">WithParens</span> <span class="ot">=</span> <span class="dt">Bool</span></span>
<span id="cb10-10"><a href="#cb10-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-11"><a href="#cb10-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-12"><a href="#cb10-12" aria-hidden="true" tabindex="-1"></a><span class="ot">parens ::</span> <span class="dt">Builder</span> <span class="ot">-&gt;</span> <span class="dt">Builder</span></span>
<span id="cb10-13"><a href="#cb10-13" aria-hidden="true" tabindex="-1"></a>parens b <span class="ot">=</span> <span class="st">&quot;(&quot;</span> <span class="op">&lt;&gt;</span> b <span class="op">&lt;&gt;</span> <span class="st">&quot;)&quot;</span></span>
<span id="cb10-14"><a href="#cb10-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-15"><a href="#cb10-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-16"><a href="#cb10-16" aria-hidden="true" tabindex="-1"></a><span class="ot">parensM ::</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">Builder</span> <span class="ot">-&gt;</span> <span class="dt">Builder</span></span>
<span id="cb10-17"><a href="#cb10-17" aria-hidden="true" tabindex="-1"></a>parensM <span class="dt">True</span>  <span class="ot">=</span> parens</span>
<span id="cb10-18"><a href="#cb10-18" aria-hidden="true" tabindex="-1"></a>parensM <span class="dt">False</span> <span class="ot">=</span> <span class="fu">id</span></span>
<span id="cb10-19"><a href="#cb10-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-20"><a href="#cb10-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-21"><a href="#cb10-21" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span>  <span class="dt">SqlExpr</span> a <span class="ot">=</span></span>
<span id="cb10-22"><a href="#cb10-22" aria-hidden="true" tabindex="-1"></a>    <span class="dt">SqlExpr</span> {<span class="ot"> unSqlExpr ::</span> <span class="dt">WithParens</span> <span class="ot">-&gt;</span> (<span class="dt">Builder</span>,  [<span class="dt">PersistValue</span>]) }</span>
<span id="cb10-23"><a href="#cb10-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-24"><a href="#cb10-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-25"><a href="#cb10-25" aria-hidden="true" tabindex="-1"></a><span class="ot">bool ::</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span></span>
<span id="cb10-26"><a href="#cb10-26" aria-hidden="true" tabindex="-1"></a>bool b <span class="ot">=</span> <span class="dt">SqlExpr</span> <span class="op">$</span> \_ <span class="ot">-&gt;</span> (<span class="st">&quot;?&quot;</span>, [<span class="dt">PersistBool</span> b])</span>
<span id="cb10-27"><a href="#cb10-27" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-28"><a href="#cb10-28" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-29"><a href="#cb10-29" aria-hidden="true" tabindex="-1"></a><span class="ot">int ::</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">SqlExpr</span> <span class="dt">Int</span></span>
<span id="cb10-30"><a href="#cb10-30" aria-hidden="true" tabindex="-1"></a>int i <span class="ot">=</span> <span class="dt">SqlExpr</span> (<span class="fu">const</span> (<span class="st">&quot;?&quot;</span>, [<span class="dt">PersistInt64</span> (<span class="fu">fromIntegral</span> i)]))</span>
<span id="cb10-31"><a href="#cb10-31" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-32"><a href="#cb10-32" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-33"><a href="#cb10-33" aria-hidden="true" tabindex="-1"></a><span class="ot">leq ::</span> <span class="dt">SqlExpr</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">SqlExpr</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span></span>
<span id="cb10-34"><a href="#cb10-34" aria-hidden="true" tabindex="-1"></a>leq expr1 expr2 <span class="ot">=</span></span>
<span id="cb10-35"><a href="#cb10-35" aria-hidden="true" tabindex="-1"></a>    <span class="kw">let</span> (b1, v1) <span class="ot">=</span> unSqlExpr expr1 <span class="dt">True</span></span>
<span id="cb10-36"><a href="#cb10-36" aria-hidden="true" tabindex="-1"></a>        (b2, v2) <span class="ot">=</span> unSqlExpr expr2 <span class="dt">True</span></span>
<span id="cb10-37"><a href="#cb10-37" aria-hidden="true" tabindex="-1"></a>    <span class="kw">in</span> <span class="dt">SqlExpr</span> <span class="op">$</span> \p <span class="ot">-&gt;</span> (parensM p (b1 <span class="op">&lt;&gt;</span> <span class="st">&quot; &lt;= &quot;</span> <span class="op">&lt;&gt;</span> b2), v1 <span class="op">&lt;&gt;</span> v2)</span>
<span id="cb10-38"><a href="#cb10-38" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-39"><a href="#cb10-39" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-40"><a href="#cb10-40" aria-hidden="true" tabindex="-1"></a><span class="ot">and_ ::</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span></span>
<span id="cb10-41"><a href="#cb10-41" aria-hidden="true" tabindex="-1"></a>and_ expr1 expr2 <span class="ot">=</span></span>
<span id="cb10-42"><a href="#cb10-42" aria-hidden="true" tabindex="-1"></a>    <span class="kw">let</span> (b1, v1) <span class="ot">=</span> unSqlExpr expr1 <span class="dt">True</span></span>
<span id="cb10-43"><a href="#cb10-43" aria-hidden="true" tabindex="-1"></a>        (b2, v2) <span class="ot">=</span> unSqlExpr expr2 <span class="dt">True</span></span>
<span id="cb10-44"><a href="#cb10-44" aria-hidden="true" tabindex="-1"></a>    <span class="kw">in</span> <span class="dt">SqlExpr</span> <span class="op">$</span> \p <span class="ot">-&gt;</span> (parensM p  ( b1 <span class="op">&lt;&gt;</span> <span class="st">&quot; AND &quot;</span> <span class="op">&lt;&gt;</span> b2), v1 <span class="op">&lt;&gt;</span> v2)</span>
<span id="cb10-45"><a href="#cb10-45" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-46"><a href="#cb10-46" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-47"><a href="#cb10-47" aria-hidden="true" tabindex="-1"></a><span class="ot">or_ ::</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span></span>
<span id="cb10-48"><a href="#cb10-48" aria-hidden="true" tabindex="-1"></a>or_ expr1 expr2 <span class="ot">=</span></span>
<span id="cb10-49"><a href="#cb10-49" aria-hidden="true" tabindex="-1"></a>    <span class="kw">let</span> (b1, v1) <span class="ot">=</span> unSqlExpr expr1 <span class="dt">True</span></span>
<span id="cb10-50"><a href="#cb10-50" aria-hidden="true" tabindex="-1"></a>        (b2, v2) <span class="ot">=</span> unSqlExpr expr2 <span class="dt">True</span></span>
<span id="cb10-51"><a href="#cb10-51" aria-hidden="true" tabindex="-1"></a>    <span class="kw">in</span> <span class="dt">SqlExpr</span> <span class="op">$</span> \p <span class="ot">-&gt;</span> ( parensM p (b1 <span class="op">&lt;&gt;</span> <span class="st">&quot; OR &quot;</span> <span class="op">&lt;&gt;</span> b2), v1 <span class="op">&lt;&gt;</span> v2)</span>
<span id="cb10-52"><a href="#cb10-52" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-53"><a href="#cb10-53" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-54"><a href="#cb10-54" aria-hidden="true" tabindex="-1"></a><span class="ot">not_ ::</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span></span>
<span id="cb10-55"><a href="#cb10-55" aria-hidden="true" tabindex="-1"></a>not_ expr <span class="ot">=</span></span>
<span id="cb10-56"><a href="#cb10-56" aria-hidden="true" tabindex="-1"></a>    <span class="kw">let</span> (b, v) <span class="ot">=</span> unSqlExpr expr <span class="dt">True</span></span>
<span id="cb10-57"><a href="#cb10-57" aria-hidden="true" tabindex="-1"></a>    <span class="kw">in</span> <span class="dt">SqlExpr</span> <span class="op">$</span> \p <span class="ot">-&gt;</span> (parensM p (<span class="st">&quot;NOT &quot;</span> <span class="op">&lt;&gt;</span> b), v)</span>
<span id="cb10-58"><a href="#cb10-58" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-59"><a href="#cb10-59" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-60"><a href="#cb10-60" aria-hidden="true" tabindex="-1"></a><span class="ot">where_ ::</span> <span class="dt">SqlExpr</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> (<span class="dt">Builder</span>, [<span class="dt">PersistValue</span>])</span>
<span id="cb10-61"><a href="#cb10-61" aria-hidden="true" tabindex="-1"></a>where_ e <span class="ot">=</span></span>
<span id="cb10-62"><a href="#cb10-62" aria-hidden="true" tabindex="-1"></a>    <span class="kw">let</span> (b, v) <span class="ot">=</span> unSqlExpr e <span class="dt">False</span></span>
<span id="cb10-63"><a href="#cb10-63" aria-hidden="true" tabindex="-1"></a>    <span class="kw">in</span> (<span class="st">&quot;WHERE &quot;</span> <span class="op">&lt;&gt;</span> b, v)</span></code></pre></div>
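<p>To check what the context-aware version produces, here is a condensed, self-contained sketch of the same idea. The <code>binop</code> helper is hypothetical and only factors out the shape shared by <code>leq</code> and <code>or_</code>, <code>and_</code> is left out for brevity, and the <code>Show</code> instance and <code>main</code> are added purely for illustration:</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">{-# LANGUAGE OverloadedStrings #-}

module Main where

import Data.Text.Lazy.Builder ( Builder, toLazyText )

-- Stand-in for Database.Persist.PersistValue, as in the article.
data PersistValue
    = PersistInt64 Integer
    | PersistBool Bool
    deriving (Show, Eq)

type WithParens = Bool

newtype SqlExpr a =
    SqlExpr { unSqlExpr :: WithParens -&gt; (Builder, [PersistValue]) }

parensM :: WithParens -&gt; Builder -&gt; Builder
parensM True  b = &quot;(&quot; &lt;&gt; b &lt;&gt; &quot;)&quot;
parensM False b = b

-- Hypothetical helper: children are always rendered in a context that
-- needs parentheses; the node itself defers to its own context 'p'.
binop :: Builder -&gt; SqlExpr a -&gt; SqlExpr b -&gt; SqlExpr c
binop op e1 e2 =
    let (b1, v1) = unSqlExpr e1 True
        (b2, v2) = unSqlExpr e2 True
    in SqlExpr $ \p -&gt; (parensM p (b1 &lt;&gt; op &lt;&gt; b2), v1 &lt;&gt; v2)

bool :: Bool -&gt; SqlExpr Bool
bool b = SqlExpr (const (&quot;?&quot;, [PersistBool b]))

int :: Int -&gt; SqlExpr Int
int i = SqlExpr (const (&quot;?&quot;, [PersistInt64 (fromIntegral i)]))

leq :: SqlExpr Int -&gt; SqlExpr Int -&gt; SqlExpr Bool
leq = binop &quot; &lt;= &quot;

or_ :: SqlExpr Bool -&gt; SqlExpr Bool -&gt; SqlExpr Bool
or_ = binop &quot; OR &quot;

not_ :: SqlExpr Bool -&gt; SqlExpr Bool
not_ e =
    let (b, v) = unSqlExpr e True
    in SqlExpr $ \p -&gt; (parensM p (&quot;NOT &quot; &lt;&gt; b), v)

-- The top level asks for no parentheses around the whole condition.
where_ :: SqlExpr Bool -&gt; (Builder, [PersistValue])
where_ e =
    let (b, v) = unSqlExpr e False
    in (&quot;WHERE &quot; &lt;&gt; b, v)

main :: IO ()
main = do
    let (sql, params) = where_ (or_ (leq (int 1) (int 2)) (not_ (bool False)))
    print (toLazyText sql)
    print params</code></pre></div>
<p>Here the rendered clause is <code>WHERE (? &lt;= ?) OR (NOT ?)</code>: literals and the outermost expression stay bare, while every nested operator is wrapped.</p>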
<p><em>Ben Levy is a Partner and Principal Software Engineer at Foxhound Systems. At Foxhound Systems, we focus on using Haskell to create fast and reliable custom built software systems. Looking for help with something you’re working on? Reach out to us at <a href="mailto:info@foxhound.systems" target="_blank" rel="noopener">info@foxhound.systems</a></em>.</p>
        </div>
    </div>
</article>
]]></description>
    <pubDate>Thu, 27 May 2021 00:00:00 UT</pubDate>
    <guid>https://www.foxhound.systems/blog/final-tagless/index.html</guid>
    <dc:creator>Foxhound Systems</dc:creator>
</item>
<item>
    <title>Speeding up SQL queries by orders of magnitude using UNION</title>
    <link>https://www.foxhound.systems/blog/sql-performance-with-union/index.html</link>
    <description><![CDATA[<nav class="container mt-2 mb-2">
    <a href="https://www.foxhound.systems/blog/" class="no-underline magic-underline-slim">Back to all posts</a>
</nav>
<article class="grow post">
    
        <div class="lg-container post-banner">
            <picture>
                
                    <source srcset="https://www.foxhound.systems/img/2021-03-19-sql-performance-with-union/sql-union-banner.webp" type="image/webp" height="860" width="1280">
                
                <img src="https://www.foxhound.systems/img/2021-03-19-sql-performance-with-union/sql-union-banner.jpg" alt="A picture of a construction crew working: one constructor worker is operating a small excavator, while three others are observing him doing his work." height="860" width="1280">
            </picture>
        </div>
    
    <div class="container">
        
        <h1 class="title">Speeding up SQL queries by orders of magnitude using UNION</h1>
        <header class="info gy-1">
            <div class="author-date space-y-1">
                <div>
                    <span class="date">March 19, 2021</span>
                </div>
                <div class="flex flex-wrap gx-2 gy-1">
                    
                        
                        
                        <div class="flex gap-2 items-center">
                            <img class="rounded-full w-12" src="https://www.foxhound.systems/assets/img/team/ben-sm.jpg" alt="Photo of Ben Levy">
                            <div class="ml-1 flex flex-col">
                                <span class="font-bold">Ben Levy</span>
                                <span>Partner &amp; Principal Software Engineer</span>
                            </div>
                        </div>
                        
                    
                        
                        <div class="flex gap-2 items-center">
                            <img class="rounded-full w-12" src="https://www.foxhound.systems/assets/img/team/christian-sm.jpg" alt="Photo of Christian Charukiewicz">
                            <div class="ml-1 flex flex-col">
                                <span class="font-bold">Christian Charukiewicz</span>
                                <span>Partner &amp; Principal Software Engineer</span>
                            </div>
                        </div>
                        
                        
                    
                </div>
            </div>
            
            <div>
                <div class="tags"><a title="Posts tagged: sql" href="https://www.foxhound.systems/blog/tag/sql/">sql</a> <a title="Posts tagged: performance-optimization" href="https://www.foxhound.systems/blog/tag/performance-optimization/">performance-optimization</a></div>
            </div>
            
        </header>
        <hr>
        <div class="content">
        <p>SQL is a very powerful tool for querying data. It allows you to write queries against your relational data in a declarative manner, letting you describe <em>what</em> data you want to retrieve without having to describe <em>how</em> to retrieve it. In most cases, this works very well, and the query optimizer in many database engines (MySQL, PostgreSQL, etc.) will create an efficient query plan.</p>
<p>Efficient query plans rely on a schema that uses appropriate data types, especially for primary key columns, where doing things such as misusing <code>VARCHAR</code> can kill performance. Another critical element of enabling fast query plans is appropriately indexing columns, which eliminates the need to perform full table scans when retrieving data. Unfortunately, even following these schema rules, it’s possible to write SQL queries that have surprisingly poor performance, often leading to the bewilderment of the developer writing such a query. Perhaps the most surprising aspect of this type of query is that it is often written in the most intuitive way to describe the data.</p>
<!--more-->
<h2 id="a-performance-trap-the-diamond-shaped-schema">A performance trap: the diamond-shaped schema</h2>
<p>One of the most common cases where SQL query performance can degrade significantly is in a diamond-shaped schema, where there are multiple ways of joining two tables together. In such a schema, a query is likely to use <code>OR</code> to join tables in more than one way, which undermines the optimizer’s ability to create an efficient query plan. This scenario is best illustrated through an example.</p>
<p>Imagine we have the following schema for a chain of retail stores that sell food and drink. The table layout is as follows:</p>
<pre><code>                                stores
                                +---------+------+
     customers            +----&gt;| id      | int  |&lt;----------------+
     +----------+------+  |     | address | text |                 |
+---&gt;| id       | int  |  |     +---------+------+                 |
|    | name     | text |  |                                        |
|    | store_id | int  +--+                   employees            |
|    +----------+------+                      +----------+------+  |
|                                      +-----&gt;| id       | int  |  |
|                                      |      | name     | text |  |
|  customer_orders                     |      | role     | text |  |
|  +-------------+-----------+         |      | store_id | int  +--+
|  | id          | int       |&lt;--+     |      +----------+------+
+--+ customer_id | int       |   |     |
   | created     | timestamp |   |     |
   +-------------+-----------+   |     |  employee_markouts
                                 |     |  +--------------+-----------+
                                 |     |  | id           | int       |
    customer_order_items         |     +--+ employee_id  | int       |
    +-------------------+-----+  |        | meal_item_id | int       +--+
    | id                | int |  |        | created      | timestamp |  |
    | customer_order_id | int +--+        +--------------+-----------+  |
 +--+ meal_item_id      | int |                                         |
 |  +-------------------+-----+                                         |
 |                                 meal_items                           |
 |                                 +-------+------+                     |
 +--------------------------------&gt;| id    | int  |&lt;--------------------+
                                   | label | text |
                                   | price | int  |
                                   +-------+------+</code></pre>
<p>Here are a few key features of this schema:</p>
<ul>
<li>Both <code>customers</code> and <code>employees</code> belong to <code>stores</code>.</li>
<li>Customers place <code>customer_orders</code> which consist of one or several <code>customer_order_items</code>.</li>
<li>Employees periodically get free items for their employment, which are recorded as <code>employee_markouts</code>.</li>
<li>Both <code>customer_order_items</code> and <code>employee_markouts</code> reference <code>meal_items</code>, which include the labels and prices of the food items sold.</li>
</ul>
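<p>For readers who want to follow along, the diagram above could be expressed as DDL roughly as follows. This is only a sketch: the constraints (<code>PRIMARY KEY</code>, <code>NOT NULL</code>, <code>REFERENCES</code>) are assumptions inferred from the arrows in the diagram, and the column types simply mirror the diagram rather than the schema we actually deployed.</p>
<div class="sourceCode"><pre class="sourceCode sql"><code class="sourceCode sql">-- Sketch of the schema from the diagram (constraints are assumed).
CREATE TABLE stores (
    id      integer PRIMARY KEY,
    address text NOT NULL
);

CREATE TABLE customers (
    id       integer PRIMARY KEY,
    name     text NOT NULL,
    store_id integer NOT NULL REFERENCES stores (id)
);

CREATE TABLE employees (
    id       integer PRIMARY KEY,
    name     text NOT NULL,
    role     text NOT NULL,
    store_id integer NOT NULL REFERENCES stores (id)
);

-- The diagram lists price as int; a numeric type may suit
-- fractional prices better.
CREATE TABLE meal_items (
    id    integer PRIMARY KEY,
    label text NOT NULL,
    price integer NOT NULL
);

CREATE TABLE customer_orders (
    id          integer PRIMARY KEY,
    customer_id integer NOT NULL REFERENCES customers (id),
    created     timestamp NOT NULL
);

CREATE TABLE customer_order_items (
    id                integer PRIMARY KEY,
    customer_order_id integer NOT NULL REFERENCES customer_orders (id),
    meal_item_id      integer NOT NULL REFERENCES meal_items (id)
);

CREATE TABLE employee_markouts (
    id           integer PRIMARY KEY,
    employee_id  integer NOT NULL REFERENCES employees (id),
    meal_item_id integer NOT NULL REFERENCES meal_items (id),
    created      timestamp NOT NULL
);</code></pre></div>
<p>The two paths from <code>stores</code> down to <code>meal_items</code>, one through the customer tables and one through the employee tables, are what give the schema its diamond shape.</p>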
<p>For the purposes of our testing, we’ll be deploying this schema with proper indexing on a PostgreSQL 12.6 database with the following number of records in each table:</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Table</th>
<th>Number of Records</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>stores</code></td>
<td>800</td>
</tr>
<tr>
<td><code>employees</code></td>
<td>20,000</td>
</tr>
<tr>
<td><code>employee_markouts</code></td>
<td>25,000</td>
</tr>
<tr>
<td><code>customers</code></td>
<td>20,000</td>
</tr>
<tr>
<td><code>customer_orders</code></td>
<td>100,000</td>
</tr>
<tr>
<td><code>customer_order_items</code></td>
<td>550,482</td>
</tr>
<tr>
<td><code>meal_items</code></td>
<td>500</td>
</tr>
</tbody>
</table>
</div>
<p>All of the orders and markouts are randomly distributed amongst the customers and employees, respectively. Employees and customers are also randomly distributed across stores.</p>
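<p>As a rough illustration of what “proper indexing” might look like here, the statements below index the foreign key columns and the <code>created</code> timestamps used for filtering. The index names are hypothetical and the exact set of indexes is an assumption. PostgreSQL creates an index for each primary key automatically, but it does not do so for foreign key columns.</p>
<div class="sourceCode"><pre class="sourceCode sql"><code class="sourceCode sql">-- Hypothetical index names; one index per foreign key column.
CREATE INDEX idx_customers_store_id
    ON customers (store_id);
CREATE INDEX idx_employees_store_id
    ON employees (store_id);
CREATE INDEX idx_customer_orders_customer_id
    ON customer_orders (customer_id);
CREATE INDEX idx_customer_order_items_order_id
    ON customer_order_items (customer_order_id);
CREATE INDEX idx_customer_order_items_meal_item_id
    ON customer_order_items (meal_item_id);
CREATE INDEX idx_employee_markouts_employee_id
    ON employee_markouts (employee_id);
CREATE INDEX idx_employee_markouts_meal_item_id
    ON employee_markouts (meal_item_id);

-- Indexes supporting the date range filters in the report queries.
CREATE INDEX idx_customer_orders_created
    ON customer_orders (created);
CREATE INDEX idx_employee_markouts_created
    ON employee_markouts (created);</code></pre></div>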
<h2 id="handling-a-report-request">Handling a report request</h2>
<p>In order to audit inventory, the logistics team at the corporate headquarters requests a tool that can generate a report containing all <code>meal_items</code> that left a given store’s inventory on a particular day. This requires a query that includes items that were either sold to customers or recorded as employee markouts for the specified store on the specified day.</p>
<p>To break this request down into more manageable segments, we’ll first retrieve all of the meal items that are a part of an employee markout created on the given day at the given store. Once we have this, we’ll expand it to include meal items that have been purchased by customers.</p>
<p>The query that retrieves only employee markout data starts at the <code>stores</code> table and joins the employee tables down to the <code>meal_items</code> table. This is a fairly straightforward query, and since the columns are indexed, we expect it to perform well.</p>
<h3 id="query-1---retrieving-only-employee-meal-items">Query #1 - Retrieving only employee meal items</h3>
<div class="sourceCode" id="cb2"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">SELECT</span> meal_items.<span class="op">*</span>, employee_markouts.employee_id</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="kw">FROM</span> stores</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="kw">INNER</span> <span class="kw">JOIN</span> employees</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> stores.<span class="kw">id</span> <span class="op">=</span> employees.store_id</span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a><span class="kw">INNER</span> <span class="kw">JOIN</span> employee_markouts</span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> employees.<span class="kw">id</span> <span class="op">=</span> employee_markouts.employee_id</span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a><span class="kw">INNER</span> <span class="kw">JOIN</span> meal_items</span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> employee_markouts.meal_item_id <span class="op">=</span> meal_items.<span class="kw">id</span></span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a><span class="kw">WHERE</span> stores.<span class="kw">id</span> <span class="op">=</span> <span class="dv">250</span></span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a><span class="kw">AND</span> employee_markouts.created <span class="op">&gt;=</span> <span class="st">'2021-02-03'</span></span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a><span class="kw">AND</span> employee_markouts.created <span class="op">&lt;</span> <span class="st">'2021-02-04'</span>;</span></code></pre></div>
<p>We get the following when we run this query:</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>id</th>
<th>label</th>
<th>price</th>
<th>employee_id</th>
</tr>
</thead>
<tbody>
<tr>
<td>173</td>
<td>Pizza</td>
<td>3.73</td>
<td>3737</td>
</tr>
<tr>
<td>339</td>
<td>Tuna Sashimi</td>
<td>21.41</td>
<td>3737</td>
</tr>
</tbody>
</table>
</div>
<p><strong>2 results</strong>, Execution Time: <strong>1.499 ms</strong></p>
<p>This query gives us the data we’re looking for and runs in a blazing fast 1.499 milliseconds, which is the excellent performance we expected. We’re not done yet, though: we also need to retrieve the meal items that are a part of customer orders. In order to do this, we’ll modify the above query in the following ways:</p>
<ul>
<li>We’ll include a second branch of joins from <code>stores</code> to <code>meal_items</code> through the customers tables, updating the final join into the <code>meal_items</code> table to use an <code>OR</code> to merge both branches.</li>
<li>Since we’re looking for meal items that are a part of either employee markouts or customer orders, we’ll convert all of our joins to be <code>LEFT</code> joins and add a condition in our customers branch to ignore employee markouts.</li>
<li>We’ll also select both <code>employee_id</code> and <code>customer_id</code>, so that each row shows which employee or customer the meal item belongs to.</li>
</ul>
<p>Our new query with these changes incorporated is below:</p>
<h3 id="query-2---retrieving-both-employee-and-customer-meal-items-using-multi-branch-joins">Query #2 - Retrieving both employee and customer meal items using multi-branch joins</h3>
<div class="sourceCode" id="cb3"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">SELECT</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>  meal_items.<span class="op">*</span>,</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>  employee_markouts.employee_id,</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>  customer_orders.customer_id</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a><span class="kw">FROM</span> stores</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a><span class="co">-- employees branch</span></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a><span class="kw">LEFT</span> <span class="kw">JOIN</span> employees</span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> stores.<span class="kw">id</span> <span class="op">=</span> employees.store_id</span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a><span class="kw">LEFT</span> <span class="kw">JOIN</span> employee_markouts</span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> employees.<span class="kw">id</span> <span class="op">=</span> employee_markouts.employee_id</span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a><span class="co">-- customers branch</span></span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a><span class="kw">LEFT</span> <span class="kw">JOIN</span> customers</span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> (stores.<span class="kw">id</span> <span class="op">=</span> customers.store_id <span class="kw">AND</span> employee_markouts.<span class="kw">id</span> <span class="kw">IS</span> <span class="kw">null</span>)</span>
<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a><span class="kw">LEFT</span> <span class="kw">JOIN</span> customer_orders</span>
<span id="cb3-15"><a href="#cb3-15" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> customers.<span class="kw">id</span> <span class="op">=</span> customer_orders.customer_id</span>
<span id="cb3-16"><a href="#cb3-16" aria-hidden="true" tabindex="-1"></a><span class="kw">LEFT</span> <span class="kw">JOIN</span> customer_order_items</span>
<span id="cb3-17"><a href="#cb3-17" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> customer_orders.<span class="kw">id</span> <span class="op">=</span> customer_order_items.customer_order_id</span>
<span id="cb3-18"><a href="#cb3-18" aria-hidden="true" tabindex="-1"></a><span class="co">-- join both branches into meal_items</span></span>
<span id="cb3-19"><a href="#cb3-19" aria-hidden="true" tabindex="-1"></a><span class="kw">LEFT</span> <span class="kw">JOIN</span> meal_items</span>
<span id="cb3-20"><a href="#cb3-20" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> (customer_order_items.meal_item_id <span class="op">=</span> meal_items.<span class="kw">id</span></span>
<span id="cb3-21"><a href="#cb3-21" aria-hidden="true" tabindex="-1"></a>    <span class="kw">OR</span> employee_markouts.meal_item_id <span class="op">=</span> meal_items.<span class="kw">id</span></span>
<span id="cb3-22"><a href="#cb3-22" aria-hidden="true" tabindex="-1"></a>   )</span>
<span id="cb3-23"><a href="#cb3-23" aria-hidden="true" tabindex="-1"></a><span class="kw">WHERE</span> stores.<span class="kw">id</span> <span class="op">=</span> <span class="dv">250</span></span>
<span id="cb3-24"><a href="#cb3-24" aria-hidden="true" tabindex="-1"></a><span class="kw">AND</span> meal_items.<span class="kw">id</span> <span class="kw">IS</span> <span class="kw">NOT</span> <span class="kw">null</span></span>
<span id="cb3-25"><a href="#cb3-25" aria-hidden="true" tabindex="-1"></a><span class="kw">AND</span> ( employee_markouts.created <span class="op">&gt;=</span> <span class="st">'2021-02-03'</span> <span class="kw">AND</span> employee_markouts.created <span class="op">&lt;</span> <span class="st">'2021-02-04'</span></span>
<span id="cb3-26"><a href="#cb3-26" aria-hidden="true" tabindex="-1"></a>      <span class="kw">OR</span> customer_orders.created <span class="op">&gt;=</span> <span class="st">'2021-02-03'</span> <span class="kw">AND</span> customer_orders.created <span class="op">&lt;</span> <span class="st">'2021-02-04'</span></span>
<span id="cb3-27"><a href="#cb3-27" aria-hidden="true" tabindex="-1"></a>    )</span>
<span id="cb3-28"><a href="#cb3-28" aria-hidden="true" tabindex="-1"></a><span class="kw">GROUP</span> <span class="kw">BY</span> meal_items.<span class="kw">id</span>, employee_markouts.<span class="kw">id</span>, customer_orders.<span class="kw">id</span>, customer_order_items.<span class="kw">id</span>;</span></code></pre></div>
<p>Running query #2 gives the following results (abridged for brevity):</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>id</th>
<th>label</th>
<th>price</th>
<th>employee_id</th>
<th>customer_id</th>
</tr>
</thead>
<tbody>
<tr>
<td>5</td>
<td>Stinky Tofu</td>
<td>21.24</td>
<td></td>
<td>3769</td>
</tr>
<tr>
<td>17</td>
<td>Chicken Sandwich</td>
<td>18.37</td>
<td></td>
<td>11085</td>
</tr>
<tr>
<td>25</td>
<td>Kebab</td>
<td>16.30</td>
<td></td>
<td>3769</td>
</tr>
<tr>
<td>…</td>
<td>…</td>
<td>…</td>
<td>…</td>
<td>…</td>
</tr>
<tr>
<td>173</td>
<td>Pizza</td>
<td>3.73</td>
<td>3737</td>
<td></td>
</tr>
<tr>
<td>339</td>
<td>Tuna Sashimi</td>
<td>21.41</td>
<td>3737</td>
<td></td>
</tr>
<tr>
<td>…</td>
<td>…</td>
<td>…</td>
<td>…</td>
<td>…</td>
</tr>
<tr>
<td>490</td>
<td>Ribeye Steak</td>
<td>20.10</td>
<td></td>
<td>1052</td>
</tr>
</tbody>
</table>
</div>
<p><strong>45 results</strong>, Execution Time: <strong>3264.547 ms</strong></p>
<p>Of the 45 results, 43 represent meal items purchased by customers, and we continue to see the two meal items from the previous query that come from employee markouts. Unfortunately, the performance of this multi-branch query is far worse than before. The sub-2-millisecond query from before has ballooned into a sluggish 3,264 milliseconds.</p>

<p>This performance may be acceptable for a one-off query, but for any other use case, an execution time of more than three seconds is very poor given the relatively small amount of data in our database, which holds only about 717,000 rows in total. If we were dealing with row counts in the tens or hundreds of millions, the performance of our report would likely be in the tens of seconds. This is an unacceptable amount of time to make our end users wait, especially if they need to run multiple reports, so we need to find a way to achieve better performance.</p>
<h2 id="sticking-to-fast-and-simple-queries">Sticking to fast and simple queries</h2>
<p>After running each of our two queries, it’s apparent that attempting to retrieve meal items for both employees and customers in a single query, in the way that we wrote query #2, resulted in a significant degradation in performance. Even without being familiar with the specifics of the query plan (which we can see by re-running the query prefixed with <code>EXPLAIN</code> or <code>EXPLAIN ANALYZE</code>), we can try to stick to using simpler queries that we think will have better performance, and see whether there’s a better way to compose the results.</p>
<p>Query #1 retrieved only the meal items that are a part of employee markouts, and it performed extremely well. Let’s try writing the query to retrieve only the meal items that are a part of customer orders and examine its performance. Like the query for employee data, this query will join the tables between the <code>stores</code> and <code>meal_items</code> tables, but instead do so through the customer tables. There are three customer-specific tables rather than two employee-specific tables, but otherwise this query is very similar to the first:</p>
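<p>For reference, inspecting a plan in PostgreSQL might look like the sketch below, which wraps query #1. Keep in mind that <code>EXPLAIN ANALYZE</code> actually executes the statement in order to report real timings, whereas plain <code>EXPLAIN</code> only shows the estimated plan. The output is omitted here, since it depends on the data, statistics, and server configuration.</p>
<div class="sourceCode"><pre class="sourceCode sql"><code class="sourceCode sql">-- Runs the query and prints the chosen plan with actual row counts
-- and timings; BUFFERS adds information about page reads.
EXPLAIN (ANALYZE, BUFFERS)
SELECT meal_items.*, employee_markouts.employee_id
FROM stores
INNER JOIN employees
ON stores.id = employees.store_id
INNER JOIN employee_markouts
ON employees.id = employee_markouts.employee_id
INNER JOIN meal_items
ON employee_markouts.meal_item_id = meal_items.id
WHERE stores.id = 250
AND employee_markouts.created &gt;= '2021-02-03'
AND employee_markouts.created &lt; '2021-02-04';</code></pre></div>
<p>When reading the output, sequential scans over large tables and join nodes whose actual row counts far exceed the final result size are usually the first things worth looking for.</p>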
<h3 id="query-3---retrieving-only-customer-meal-items">Query #3 - Retrieving only customer meal items</h3>
<div class="sourceCode" id="cb4"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">SELECT</span> meal_items.<span class="op">*</span>, customer_orders.customer_id</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="kw">FROM</span> stores</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="kw">INNER</span> <span class="kw">JOIN</span> customers</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> stores.<span class="kw">id</span> <span class="op">=</span> customers.store_id</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="kw">INNER</span> <span class="kw">JOIN</span> customer_orders</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> customers.<span class="kw">id</span> <span class="op">=</span> customer_orders.customer_id</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a><span class="kw">INNER</span> <span class="kw">JOIN</span> customer_order_items</span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> customer_orders.<span class="kw">id</span> <span class="op">=</span> customer_order_items.customer_order_id</span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a><span class="kw">INNER</span> <span class="kw">JOIN</span> meal_items</span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> customer_order_items.meal_item_id <span class="op">=</span> meal_items.<span class="kw">id</span></span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a><span class="kw">WHERE</span> stores.<span class="kw">id</span> <span class="op">=</span> <span class="dv">250</span></span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a><span class="kw">AND</span> customer_orders.created <span class="op">&gt;=</span> <span class="st">'2021-02-03'</span></span>
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a><span class="kw">AND</span> customer_orders.created <span class="op">&lt;</span> <span class="st">'2021-02-04'</span>;</span></code></pre></div>
<p>When we run this query, we get the following results:</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>id</th>
<th>label</th>
<th>price</th>
<th>customer_id</th>
</tr>
</thead>
<tbody>
<tr>
<td>5</td>
<td>Stinky Tofu</td>
<td>21.24</td>
<td>3769</td>
</tr>
<tr>
<td>17</td>
<td>Chicken Sandwich</td>
<td>18.37</td>
<td>11085</td>
</tr>
<tr>
<td>…</td>
<td>…</td>
<td>…</td>
<td>…</td>
</tr>
<tr>
<td>482</td>
<td>Vegetable Soup</td>
<td>4.50</td>
<td>3769</td>
</tr>
<tr>
<td>490</td>
<td>Ribeye Steak</td>
<td>20.10</td>
<td>1052</td>
</tr>
</tbody>
</table>
</div>
<p><strong>43 results</strong>, Execution Time: <strong>102.283 ms</strong></p>
<p>We get exactly the results we expect. Looking at the performance, we see that this query runs in only 102 milliseconds. This is slower than query #1 because the customer tables hold far more rows than the employee tables (100,000 orders and roughly 550,000 order items versus 25,000 markouts), but it is still far faster than the 3,264 milliseconds query #2 took to run.</p>
<p>Now we are in a situation where we retrieve the correct results, albeit split across two queries. Even so, running query #1 (only employee meal items) and query #3 (only customer meal items) one after the other is more than <em>30 times</em> faster than running query #2 (both employee and customer meal items through multi-branch joins). All we need to do is merge the results of these queries. The good news is that SQL has an operation that will let us do this while preserving this speed.</p>
<h2 id="preserving-performance-through-union">Preserving performance through UNION</h2>
<p>The <code>UNION</code> operation allows us to merge the results of two queries. Since we know that query #1 and query #3 are each significantly faster than query #2, we would expect that the results of the <code>UNION</code> operation will be fast as well.</p>
<p>We use both query #1 and query #3 nearly verbatim in what will be our new combined query. Since the <code>UNION</code> operation requires that the results of each query contain the same columns, we have to include a <code>NULL</code> placeholder column for whichever type of data (either <code>employee_id</code> or <code>customer_id</code>) the given side of the <code>UNION</code> will not retrieve.</p>
<p>One other thing that the <code>UNION</code> operation does is deduplicate rows in the result set. Since we don’t care about deduplication, we can use <code>UNION ALL</code> to tell the database engine that it can skip the deduplication step. This results in a performance boost with larger data sets.</p>
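<p>The difference between the two operations is easy to see on a toy example. The snippet below is purely illustrative and uses inline <code>VALUES</code> lists rather than the tables from our schema:</p>
<div class="sourceCode"><pre class="sourceCode sql"><code class="sourceCode sql">-- UNION removes duplicate rows: this returns a single 'Pizza' row.
SELECT label FROM (VALUES ('Pizza'), ('Pizza')) AS a (label)
UNION
SELECT label FROM (VALUES ('Pizza')) AS b (label);

-- UNION ALL keeps every row: this returns three 'Pizza' rows.
SELECT label FROM (VALUES ('Pizza'), ('Pizza')) AS a (label)
UNION ALL
SELECT label FROM (VALUES ('Pizza')) AS b (label);</code></pre></div>
<p>For an inventory report this distinction can matter beyond speed: if the same customer bought the same meal item twice on the same day, the two rows would be identical, and a plain <code>UNION</code> would collapse them into one.</p>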
<p>The resulting query is as follows:</p>
<h3 id="query-4---retrieving-both-employee-and-customer-meal-items-using-union">Query #4 - Retrieving both employee and customer meal items using <code>UNION</code></h3>
<div class="sourceCode" id="cb5"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="kw">SELECT</span> <span class="co">-- employees query</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>  meal_items.<span class="op">*</span>,</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>  employee_markouts.employee_id,</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>  <span class="kw">null</span> <span class="kw">as</span> customer_id</span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a><span class="kw">FROM</span> stores</span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a><span class="kw">INNER</span> <span class="kw">JOIN</span> employees</span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> stores.<span class="kw">id</span> <span class="op">=</span> employees.store_id</span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a><span class="kw">INNER</span> <span class="kw">JOIN</span> employee_markouts</span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> employees.<span class="kw">id</span> <span class="op">=</span> employee_markouts.employee_id</span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a><span class="kw">INNER</span> <span class="kw">JOIN</span> meal_items</span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> employee_markouts.meal_item_id <span class="op">=</span> meal_items.<span class="kw">id</span></span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a><span class="kw">WHERE</span> stores.<span class="kw">id</span> <span class="op">=</span> <span class="dv">250</span></span>
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a><span class="kw">AND</span> employee_markouts.created <span class="op">&gt;=</span> <span class="st">'2021-02-03'</span></span>
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true" tabindex="-1"></a><span class="kw">AND</span> employee_markouts.created <span class="op">&lt;</span> <span class="st">'2021-02-04'</span></span>
<span id="cb5-15"><a href="#cb5-15" aria-hidden="true" tabindex="-1"></a><span class="kw">UNION</span> <span class="kw">ALL</span></span>
<span id="cb5-16"><a href="#cb5-16" aria-hidden="true" tabindex="-1"></a><span class="kw">SELECT</span> <span class="co">-- customers query</span></span>
<span id="cb5-17"><a href="#cb5-17" aria-hidden="true" tabindex="-1"></a>  meal_items.<span class="op">*</span>,</span>
<span id="cb5-18"><a href="#cb5-18" aria-hidden="true" tabindex="-1"></a>  <span class="kw">null</span> <span class="kw">as</span> employee_id,</span>
<span id="cb5-19"><a href="#cb5-19" aria-hidden="true" tabindex="-1"></a>  customer_orders.customer_id</span>
<span id="cb5-20"><a href="#cb5-20" aria-hidden="true" tabindex="-1"></a><span class="kw">FROM</span> stores</span>
<span id="cb5-21"><a href="#cb5-21" aria-hidden="true" tabindex="-1"></a><span class="kw">INNER</span> <span class="kw">JOIN</span> customers</span>
<span id="cb5-22"><a href="#cb5-22" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> stores.<span class="kw">id</span> <span class="op">=</span> customers.store_id</span>
<span id="cb5-23"><a href="#cb5-23" aria-hidden="true" tabindex="-1"></a><span class="kw">INNER</span> <span class="kw">JOIN</span> customer_orders</span>
<span id="cb5-24"><a href="#cb5-24" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> customers.<span class="kw">id</span> <span class="op">=</span> customer_orders.customer_id</span>
<span id="cb5-25"><a href="#cb5-25" aria-hidden="true" tabindex="-1"></a><span class="kw">INNER</span> <span class="kw">JOIN</span> customer_order_items</span>
<span id="cb5-26"><a href="#cb5-26" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> customer_orders.<span class="kw">id</span> <span class="op">=</span> customer_order_items.customer_order_id</span>
<span id="cb5-27"><a href="#cb5-27" aria-hidden="true" tabindex="-1"></a><span class="kw">INNER</span> <span class="kw">JOIN</span> meal_items</span>
<span id="cb5-28"><a href="#cb5-28" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> customer_order_items.meal_item_id <span class="op">=</span> meal_items.<span class="kw">id</span></span>
<span id="cb5-29"><a href="#cb5-29" aria-hidden="true" tabindex="-1"></a><span class="kw">WHERE</span> stores.<span class="kw">id</span> <span class="op">=</span> <span class="dv">250</span></span>
<span id="cb5-30"><a href="#cb5-30" aria-hidden="true" tabindex="-1"></a><span class="kw">AND</span> customer_orders.created <span class="op">&gt;=</span> <span class="st">'2021-02-03'</span></span>
<span id="cb5-31"><a href="#cb5-31" aria-hidden="true" tabindex="-1"></a><span class="kw">AND</span> customer_orders.created <span class="op">&lt;</span> <span class="st">'2021-02-04'</span>;</span></code></pre></div>
<p>Given what we’ve seen above, we expect 45 results from this query: two for employees and 43 for customers. Running the query gives the following results:</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>id</th>
<th>label</th>
<th>price</th>
<th>employee_id</th>
<th>customer_id</th>
</tr>
</thead>
<tbody>
<tr>
<td>173</td>
<td>Pizza</td>
<td>3.73</td>
<td>3737</td>
<td></td>
</tr>
<tr>
<td>339</td>
<td>Tuna Sashimi</td>
<td>21.41</td>
<td>3737</td>
<td></td>
</tr>
<tr>
<td>…</td>
<td>…</td>
<td>…</td>
<td>…</td>
<td>…</td>
</tr>
<tr>
<td>403</td>
<td>Ricotta Stuffed Ravioli</td>
<td>11.09</td>
<td></td>
<td>17910</td>
</tr>
<tr>
<td>386</td>
<td>Tacos</td>
<td>11.02</td>
<td></td>
<td>17910</td>
</tr>
</tbody>
</table>
</div>
<p><strong>45 results</strong>, Execution Time: <strong>112.309 ms</strong></p>
<p>We get exactly the same results we expect, in a blazing fast 112 milliseconds. This is now a single query that gives us the same results that query #2 gave us, but does so approximately 30 times faster. Using <code>UNION</code> here costs us virtually nothing in terms of performance. The time is essentially just the sum of the two underlying queries.</p>
<p>It’s worth noting that the results of the above query are ordered differently than our original query, which is ordered by the <code>id</code> column. This is because the database ran each underlying query in turn and appended the rows in that order (which is also why we get the employee meal items first). SQL does not guarantee any particular row order without an <code>ORDER BY</code>, so if we need the order to match, we can achieve this by wrapping query #4 in a very simple <code>SELECT</code> operation that orders the results by <code>id</code>:</p>
<h3 id="query-5---retrieving-both-employee-and-customer-meal-items-using-union-ordered-by-id">Query #5 - Retrieving both employee and customer meal items using <code>UNION</code>, ordered by <code>id</code></h3>
<div class="sourceCode" id="cb6"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">SELECT</span> <span class="op">*</span> <span class="kw">FROM</span> (</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>   <span class="co">-- ... Query #4 from above, omitted for brevity</span></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a>) results <span class="kw">ORDER</span> <span class="kw">BY</span> <span class="kw">id</span>;</span></code></pre></div>
<p>Which gives us:</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>id</th>
<th>label</th>
<th>price</th>
<th>employee_id</th>
<th>customer_id</th>
</tr>
</thead>
<tbody>
<tr>
<td>5</td>
<td>Stinky Tofu</td>
<td>21.24</td>
<td></td>
<td>3769</td>
</tr>
<tr>
<td>17</td>
<td>Chicken Sandwich</td>
<td>18.37</td>
<td></td>
<td>11085</td>
</tr>
<tr>
<td>…</td>
<td>…</td>
<td>…</td>
<td>…</td>
<td>…</td>
</tr>
<tr>
<td>173</td>
<td>Pizza</td>
<td>3.73</td>
<td>3737</td>
<td></td>
</tr>
<tr>
<td>…</td>
<td>…</td>
<td>…</td>
<td>…</td>
<td>…</td>
</tr>
<tr>
<td>490</td>
<td>Ribeye Steak</td>
<td>20.10</td>
<td></td>
<td>1052</td>
</tr>
</tbody>
</table>
</div>
<p><strong>45 results</strong>, Execution Time: <strong>113.340 ms</strong></p>
<p>Query #5 gives us exactly the same results in the same order as query #2, but with a 2,880% increase in performance. This is an outstanding improvement, and query #5 is now fast enough to be used in any application.</p>
<h2 id="conclusion">Conclusion</h2>
<p>There are many ways to write a SQL query to retrieve a given set of results. Most database engines are great at creating performant query plans, but certain features within a query can derail the query planner and result in a very slow query. In this post, we covered a common scenario that results in poor query performance: using <code>OR</code> to combine multiple branches of joins in a single query.</p>
<p>Arriving at query #2 to get the combined results was the intuitive way of thinking through the problem, and something that someone with intermediate or advanced SQL skills could come up with. However, once we realized that performance was bad, we applied the following steps to find a solution:</p>
<ol type="1">
<li>We focused on writing only simpler and well-performing queries that each gave different portions of our desired results.</li>
<li>We merged the results using SQL’s <code>UNION</code> operation.</li>
<li>We ensured ordering was identical using a simple wrapper query.</li>
</ol>
<p>This technique can be applied in many situations where query performance is poor due to this type of diamond-shaped branching and merging. When working on production software systems, we often see bottlenecks caused by slow queries disappear once the queries are rewritten in this manner. In many cases, the performance improvement is so dramatic that it eliminates the need to cache query results in systems like Redis, resulting in less system complexity in addition to better performance.</p>
<p>SQL’s <code>UNION</code> operation is not usually thought of as a means to boost performance. However, in many cases it can dramatically speed queries up by enabling an otherwise complex query to be split into several faster and simpler queries that are then merged together. Recognizing when <code>UNION</code> can be applied takes some practice, but once someone is aware of this technique, it’s possible to look for situations where a performance bottleneck can be removed through this approach.</p>
<hr />
<p><em>Ben Levy and Christian Charukiewicz are Partners and Principal Software Engineers at Foxhound Systems. At Foxhound Systems, we focus on building fast and reliable custom software. Are you facing a performance issue or looking for help with something you’re working on? Reach out to us at <a href="mailto:info@foxhound.systems" target="_blank" rel="noopener">info@foxhound.systems</a></em>.</p>
        </div>
    </div>
</article>
]]></description>
    <pubDate>Fri, 19 Mar 2021 00:00:00 UT</pubDate>
    <guid>https://www.foxhound.systems/blog/sql-performance-with-union/index.html</guid>
    <dc:creator>Foxhound Systems</dc:creator>
</item>
<item>
    <title>Why Haskell is our first choice for building production software systems</title>
    <link>https://www.foxhound.systems/blog/why-haskell-for-production/index.html</link>
    <description><![CDATA[<nav class="container mt-2 mb-2">
    <a href="https://www.foxhound.systems/blog/" class="no-underline magic-underline-slim">Back to all posts</a>
</nav>
<article class="grow post">
    
        <div class="lg-container post-banner">
            <picture>
                
                    <source srcset="https://www.foxhound.systems/img/2021-01-11-why-haskell-for-production/architecture-banner.webp" type="image/webp" height="854" width="1280">
                
                <img src="https://www.foxhound.systems/img/2021-01-11-why-haskell-for-production/architecture-banner.jpg" alt="A symmetrical photograph of what appears to be two identical buildings. The perspective of the photograph is from the middle of the two buildings, looking up towards the sky. There is a reflection of the opposite building in each building's windows." height="854" width="1280">
            </picture>
        </div>
    
    <div class="container">
        
        <h1 class="title">Why Haskell is our first choice for building production software systems</h1>
        <header class="info gy-1">
            <div class="author-date space-y-1">
                <div>
                    <span class="date">January 11, 2021</span>
                </div>
                <div class="flex flex-wrap gx-2 gy-1">
                    
                        
                        <div class="flex gap-2 items-center">
                            <img class="rounded-full w-12" src="https://www.foxhound.systems/assets/img/team/christian-sm.jpg" alt="Photo of Christian Charukiewicz">
                            <div class="ml-1 flex flex-col">
                                <span class="font-bold">Christian Charukiewicz</span>
                                <span>Partner &amp; Principal Software Engineer</span>
                            </div>
                        </div>
                        
                        
                    
                </div>
            </div>
            
            <div>
                <div class="tags"><a title="Posts tagged: haskell" href="https://www.foxhound.systems/blog/tag/haskell/">haskell</a></div>
            </div>
            
        </header>
        <hr>
        <div class="content">
        <p>Haskell is the first programming language we reach for when we build production software systems. This likely seems unusual to anyone who only has a passing familiarity with the language. Haskell has a reputation for being an advanced language with a steep learning curve. It is also often thought of as a research language with limited practical utility.</p>
<p>While Haskell does have a very large surface area, with many concepts and a syntax that will feel unfamiliar to programmers coming from most other languages, it is unrivaled in the combination of developer productivity, code maintainability, software reliability, and performance that it offers. In this post I will cover some of the defining features of Haskell that make it an excellent, industrial-strength language that is well-suited for building commercial software, and why it is usually the first tool we consider using for new projects.</p>
<!--more-->
<h2 id="haskell-has-a-strong-static-type-system-that-prevents-errors-and-reduces-cognitive-load">Haskell has a strong static type system that prevents errors and reduces cognitive load</h2>
<p>Haskell has a very powerful static type system which serves as a programmer aid that catches and prevents many errors before code ever even runs. Many programmers encounter statically typed languages like Java or C++ and find that the compiler feels like an annoyance. By contrast, Haskell’s static type system, in conjunction with compile-time type checking, acts as an invaluable pair-programming buddy that gives instantaneous feedback during development.</p>
<p>There’s a far smaller cognitive load that needs to be maintained when writing Haskell than when writing in languages like Python, JavaScript, or PHP. Many concerns can be completely offloaded to the compiler rather than needing to be remembered by the programmer. For example, when writing Haskell, there’s no need to preemptively ask questions like:</p>
<ul>
<li>Do I need to check whether this field is null?</li>
<li>What if fields are missing from the request payload?</li>
<li>Has this string already been decoded to an integer?</li>
<li>What if this string can’t be decoded to an integer?</li>
<li>Will this operator implicitly convert this integer to a string?</li>
<li>Are these two values comparable?</li>
</ul>
<p>This is not to say that these questions never need answering in Haskell; rather, the compiler will throw an error when you need to address one of these issues. For example, a Haskell program may need to handle values that are sometimes not present. Instead of setting a value to <code>NULL</code>, a Haskell programmer must use a <code>Maybe</code> type, which indicates that the value may not be there. The compiler then forces the programmer to explicitly handle the <code>Nothing</code> value, which is the case where the value is not present.</p>
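<p>As a minimal sketch of what this looks like in practice, consider parsing a number from user input. The names <code>parseAge</code> and <code>describeAge</code> are hypothetical and exist only for illustration; <code>readMaybe</code> comes from <code>Text.Read</code> in the standard library.</p>

```haskell
import Text.Read (readMaybe)

-- Hypothetical helper: the input may or may not be a valid integer,
-- and the Maybe in the return type says so.
parseAge :: String -> Maybe Int
parseAge = readMaybe

-- A Maybe Int cannot be used where an Int is expected, so the value
-- has to be unwrapped, and both cases are handled explicitly here.
describeAge :: String -> String
describeAge input =
    case parseAge input of
        Nothing  -> "Age is missing or not a number"
        Just age -> "Age is " ++ show age
```

<p>Trying to pass the result of <code>parseAge</code> directly to a function that expects an <code>Int</code> is a compile-time type error rather than a surprise at runtime.</p>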
<p>Haskell’s static type system also leads to other benefits. Haskell code uses type signatures that precede its functions and describe the types of each parameter and return value. For example, a signature like <code>Int -&gt; Int -&gt; Bool</code> indicates that a function takes two integers and returns a boolean value. Since these type signatures are checked and enforced by the compiler, this allows a programmer reading Haskell code to look only at type signatures when getting a sense of what a certain piece of code does. For example, one would not use the type signature above when looking for a function that manipulates strings, decodes JSON, or queries a database.</p>
<p>Type signatures can even be used to search through the entire corpus of Haskell code for a relevant function. Using <a href="https://hoogle.haskell.org/" target="_blank" rel="noopener">Hoogle</a>, Haskell’s API search, we can search for a type signature based on functionality we know that we need. For example, if we need to convert an <code>Int</code> to a <code>Float</code>, we can search Hoogle for <code>Int -&gt; Float</code> (<a href="https://hoogle.haskell.org/?hoogle=Int+-%3E+Float" target="_blank" rel="noopener">search results</a>), which will point us to the aptly named <code>int2Float</code> function.</p>
<p>Haskell also lets us create polymorphic type signatures through the use of type variables, represented by lowercase type names. For example, a signature of <code>a -&gt; b -&gt; a</code> tells us that the function takes two parameters of two arbitrary types, and returns a value whose type is the same as the first parameter. Suppose we want to check whether an element is in a list. We’re looking for a function that takes an item to search for, a list of items, and returns a boolean. We don’t care about the type of the item, so long as the search item and the items in the list are of the same type. So we can search Hoogle for <code>a -&gt; [a] -&gt; Bool</code> (<a href="https://hoogle.haskell.org/?hoogle=a%20-%3E%20%5Ba%5D%20-%3E%20Bool" target="_blank" rel="noopener">search results</a>), which will point us to the <code>elem</code> function. Parametric types are an extremely powerful feature in Haskell and are what enable writing reusable code.</p>
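<p>To make the two signatures above concrete, here is a small sketch. The function names <code>keepFirst</code>, <code>hasAdmin</code>, and <code>containsZero</code> are made up for this example; only <code>elem</code> comes from the standard library.</p>

```haskell
-- A function with the polymorphic signature a -> b -> a. Because nothing
-- is known about either type, a total function can only hand back its
-- first argument (the standard library calls this function const).
keepFirst :: a -> b -> a
keepFirst x _ = x

-- elem is reused unchanged at two different element types.
hasAdmin :: [String] -> Bool
hasAdmin roles = "admin" `elem` roles

containsZero :: [Int] -> Bool
containsZero numbers = 0 `elem` numbers
```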
<h2 id="haskell-enables-writing-code-that-is-composable-testable-and-has-predictable-side-effects">Haskell enables writing code that is composable, testable, and has predictable side-effects</h2>
<p>In addition to being statically typed, Haskell is a pure functional programming language. This is one of Haskell’s defining features and what the language is well known for, even amongst programmers that have only heard of Haskell but never used it. Writing in a pure functional style has many benefits, and is conducive to a well-organized code base.</p>

<p>The word “pure” in “pure functional programming” is significant. Purity in this sense means that the code we write is pure, or free of side-effects. Another term that describes this is <a href="https://en.wikipedia.org/wiki/Referential_transparency" target="_blank" rel="noopener">referential transparency</a>, or the property where any expression (e.g. a function call with a given list of parameters) can be replaced with its return value without changing the functionality of the code. This is only possible when such pure functions do not have side effects, such as creating files on the host system, running database queries, or making HTTP requests. Haskell’s type system imposes this sort of purity.</p>
<p>So does being pure mean that Haskell programs cannot have side effects? Certainly not—but it does mean that effects are pushed to the edge of our system. Any functions that perform I/O actions (such as querying a database or receiving HTTP requests) must have a return type that captures this. This means that type signatures like the ones we saw in the previous section (e.g. <code>Int -&gt; Float</code> or <code>a -&gt; [a] -&gt; Bool</code>) are indicators that the corresponding functions do not produce side effects, since <code>Float</code> and <code>Bool</code> are just primitive return types. For a contrasting example that includes a side effect, a function signature of <code>FilePath -&gt; IO String</code> indicates that the function takes a file path and performs an I/O action that returns a string (which is exactly what the <code>readFile</code> function does).</p>
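<p>A short sketch of this split, assuming two hypothetical functions: <code>countWords</code> is pure, while <code>countWordsInFile</code> has to read from disk and therefore carries <code>IO</code> in its return type. Only <code>readFile</code>, <code>words</code>, and <code>length</code> are standard library functions.</p>

```haskell
-- Pure: the result depends only on the argument, and the signature
-- contains no IO.
countWords :: String -> Int
countWords = length . words

-- Effectful: reading a file is an I/O action, so the return type is
-- IO Int rather than Int.
countWordsInFile :: FilePath -> IO Int
countWordsInFile path = do
    contents <- readFile path
    pure (countWords contents)
```

<p>The effectful function is a thin wrapper at the edge, and the logic itself stays in a pure function that can be tested without touching the file system.</p>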
<p>Another feature of a pure functional programming paradigm is higher-order functions, which are functions that take functions as parameters. One of the most commonly used higher-order functions is <code>fmap</code>, which applies a function to each value in a container (such as a list). For example, we can apply a function named <code>square</code>, which takes an integer and returns that integer multiplied by itself, to a list of integers to turn it into a list of squared integers:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">square ::</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> <span class="dt">Int</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>square x <span class="ot">=</span> x <span class="op">*</span> x</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="fu">fmap</span> square [<span class="dv">1</span>,<span class="dv">2</span>,<span class="dv">3</span>,<span class="dv">4</span>,<span class="dv">5</span>] <span class="co">-- returns [1,4,9,16,25]</span></span></code></pre></div>
<p>Code written in this style tends to be both composable and testable. The above example is trivial, but there are many applications of higher-order functions. For example, we can write a function like <code>renderPost</code> which takes a record of post data and returns the version of the post rendered in HTML. If we have a list of posts, we can run <code>fmap renderPost postList</code> to produce a list of rendered posts. Our <code>renderPost</code> function can be used in both the single case and the multi-post case without any changes, because composing it with <code>fmap</code> changes how we can apply it. We can also write tests for the <code>renderPost</code> function and compose it with <code>fmap</code> in our tests when validating the behavior for a list of posts.</p>
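<p>A rough sketch of the <code>renderPost</code> idea might look like the following. The <code>Post</code> record, its field names, and the markup it produces are assumptions made for illustration, and the sketch skips HTML escaping entirely.</p>

```haskell
-- Hypothetical post record; the field names are assumptions.
data Post = Post
    { postTitle :: String
    , postBody  :: String
    }

-- Renders a single post. No escaping is performed in this sketch.
renderPost :: Post -> String
renderPost post =
    "<article><h1>" ++ postTitle post ++ "</h1><p>" ++ postBody post ++ "</p></article>"

-- The multi-post case reuses renderPost unchanged by composing it with fmap.
renderPosts :: [Post] -> [String]
renderPosts = fmap renderPost
```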
<h2 id="haskell-facilitates-rapid-development-worry-free-refactoring-and-excellent-maintainability">Haskell facilitates rapid development, worry-free refactoring, and excellent maintainability</h2>
<p>Through the combination of the aforementioned static types and pure functional style that Haskell has, developing software in Haskell tends to be very fast. One of the common development workflows we employ relies on a tool called <a href="https://github.com/ndmitchell/ghcid" target="_blank" rel="noopener"><code>ghcid</code></a>, a simple command line tool that relies on the Haskell REPL to automatically watch code for changes and incrementally recompile. This allows us to see any compiler errors in our code immediately after saving changes to a file. It’s not uncommon for us to open only a terminal with a text editor in one pane and <code>ghcid</code> in another while developing applications in Haskell.</p>
<p>While manually validating the results of our code is eventually necessary, such as by refreshing a page in a browser or using a tool to validate a JSON endpoint, a lot of this can be deferred until the end of a programming session. Many of the runtime errors that a programmer would encounter when writing a web service in a language like Python or PHP are caught immediately and displayed as compiler errors by <code>ghcid</code>. This is a far cry from the need to switch to a browser window and refresh the page after making a change to some code, a development workflow that everyone who has worked on a web application is intimately familiar with.</p>
<p>Beyond the tight feedback loop during development, Haskell code is easy to refactor and modify. Like real-world code written in any other language, Haskell code is not write-only. It will eventually need to be maintained, updated, and extended, often by developers that are not the original authors of the code. With the aid of compile-time checking, many code refactors in Haskell become easy; a common refactoring workflow is to make a desired change in one location and then fix one compiler error at a time until the program compiles again. This is far easier than the equivalent changes in dynamically typed languages that offer no such assistance to the programmer.</p>
<p>Proponents of dynamically typed languages will often argue that automated tests supplant the need for compile-time type checking, and can help prevent errors as well. However, tests are not as powerful as type constraints. For tests to be effective, they must:</p>
<ol type="1">
<li>Actually be written, yet many real world code bases have limited testing.</li>
<li>Make correct assertions.</li>
<li>Be comprehensive (test a variety of inputs) and provide good coverage (test a large portion of the code base).</li>
<li>Be easy to run and finish quickly, otherwise they will not become part of the development workflow.</li>
<li>Be updated and maintained in tandem with the code they test.</li>
</ol>
<p>Haskell’s type system has none of the above issues. The type system is a fixture in the language and the compiler always validates that the types are correct. The type system is inherently comprehensive, providing full coverage of every piece of Haskell code, and there are no changes to make to it as the underlying code changes. All this is not to say that the type system can replace every type of test. But what it does do is provide assurances that are more comprehensive than tests, and are present in every code base, even when no tests exist.</p>
<h2 id="haskell-programs-have-stellar-performance-leading-to-faster-applications-and-lower-hardware-costs">Haskell programs have stellar performance, leading to faster applications and lower hardware costs</h2>
<p>GHC, the most commonly used Haskell compiler, produces extremely fast executables, especially when compared against other languages commonly used for application development, such as PHP or Python. This improved performance leads to both a more responsive application and lower hardware costs.</p>
<p>It’s common to hear proponents of other languages be dismissive when their language is described as slow, as hardware is a relatively small cost compared to the cost of hiring programmers. This may be true, but we have found that the difference between Haskell and other languages used for web development is staggering.</p>
<p>On one project we worked on in the past, we began implementing new API endpoints in a Haskell web service instead of the incumbent PHP. After around a year of building features and adding endpoints in Haskell, both the PHP and Haskell web services were dealing with a similar average workload in terms of request count and type, and performed similar CRUD actions backed by the same SQL database. The infrastructure was hosted on AWS, and the breakdown of the infrastructure used for each web service is below.</p>
<div class="table-wrapper">
<table>
<colgroup>
<col />
<col />
<col />
<col />
<col />
<col />
<col />
</colgroup>
<thead>
<tr>
<th>Web Service Language</th>
<th>EC2 Instance Type</th>
<th>CPU</th>
<th>RAM</th>
<th>Monthly Cost Per Instance</th>
<th>Number of Instances Used</th>
<th>Total Monthly Cost</th>
</tr>
</thead>
<tbody>
<tr>
<td>PHP</td>
<td><code>c5.xlarge</code></td>
<td>4 Dedicated CPU cores</td>
<td>8 GB</td>
<td>$122</td>
<td>2</td>
<td>$244</td>
</tr>
<tr>
<td>Haskell</td>
<td><code>t3.nano</code></td>
<td>2 Flex CPU cores (limited to 20% use)</td>
<td>0.5 GB</td>
<td>$3.75</td>
<td>4</td>
<td>$15</td>
</tr>
</tbody>
</table>
</div>
<p>In this application, each of the Haskell and PHP web services handled a similar number of requests, handled a similar workload, and had similar traffic spikes throughout the day, all while querying the same database. Both the PHP and Haskell web services used Nginx as a reverse proxy. In the end, the cost of operating the Haskell infrastructure was roughly 1/16th (or 6%) of what the PHP infrastructure was. Examining our AWS usage metrics, the CPU on our Haskell machines never even hit 5%. The Haskell endpoints consistently had response times of 100ms or less, slightly outperforming the PHP endpoints.</p>
<p>Ultimately, we had two web services, one written in Haskell and the other written in PHP, that had similar performance, but the former cost roughly $180/year and the latter roughly $2,900/year. It’s worth noting that the user base of this application was relatively small, with under 25,000 monthly active users (MAUs). This difference in cost would scale as the size of the user base, number of MAUs, and underlying infrastructure increased.</p>
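<p>For readers who want to check the arithmetic, the figures from the table above can be worked through in a few lines. The function and value names here are made up for this sketch, and the per-instance prices are simply the ones listed in the table.</p>

```haskell
-- Monthly cost of a fleet: price per instance times instance count.
monthlyCost :: Double -> Int -> Double
monthlyCost perInstance instances = perInstance * fromIntegral instances

phpMonthly :: Double
phpMonthly = monthlyCost 122 2        -- 2 x c5.xlarge

haskellMonthly :: Double
haskellMonthly = monthlyCost 3.75 4   -- 4 x t3.nano

-- Annual totals and the ratio between the two fleets.
phpAnnual, haskellAnnual, costRatio :: Double
phpAnnual     = phpMonthly * 12
haskellAnnual = haskellMonthly * 12
costRatio     = phpMonthly / haskellMonthly
```

<p>This works out to $2,928 and $180 per year respectively, with the PHP fleet costing a little over 16 times as much as the Haskell fleet.</p>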
<p>It’s certainly possible to criticize this comparison, and I do not claim that it is in any way scientific. But it’s clear to me that, based on our past experience running production workloads, Haskell outperforms PHP by at least an order of magnitude (and PHP 7.0+ performs remarkably well compared to many other similar languages). The cost reduction that comes with operating Haskell over other web languages is not by any means insignificant.</p>
<h2 id="haskell-is-great-for-domain-modeling-and-preventing-errors-in-domain-logic">Haskell is great for domain modeling and preventing errors in domain logic</h2>
<p>Another benefit of Haskell’s type system beyond simple compile-time type checking is that it enables modeling a problem domain through the use of custom data types within an application. This allows a programmer to create a description of business logic rules that are enforced by the type system. Haskell has what are referred to as algebraic data types (ADTs), consisting of both records (product types) and tagged unions (sum types). Records are similar to dictionaries or JSON objects, and are commonly available in many languages. Tagged unions, however, are not available in many languages, but are what enable a significant amount of flexibility in domain modeling.</p>
<p>The power of ADTs is best illustrated through an example. Suppose we are creating an invoicing system that must keep track of customer invoices. Each invoice must contain a list of line items that the invoice is for and have an invoice status that indicates whether the order has been paid or canceled. The types we would use to model this might look like the following:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Dollars</span> <span class="ot">=</span> <span class="dt">Int</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">CustomerInvoice</span> <span class="ot">=</span> <span class="dt">CustomerInvoice</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a>    {<span class="ot"> invoiceNumber ::</span> <span class="dt">Int</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a>    ,<span class="ot"> amountDue     ::</span> <span class="dt">Dollars</span></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a>    ,<span class="ot"> tax           ::</span> <span class="dt">Dollars</span></span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a>    ,<span class="ot"> billableItems ::</span> [<span class="dt">String</span>]</span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a>    ,<span class="ot"> status        ::</span> <span class="dt">InvoiceStatus</span></span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a>    ,<span class="ot"> createdAt     ::</span> <span class="dt">UTCTime</span></span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a>    ,<span class="ot"> dueDate       ::</span> <span class="dt">Day</span></span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a>    }</span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">InvoiceStatus</span></span>
<span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a>    <span class="ot">=</span> <span class="dt">Issued</span></span>
<span id="cb2-15"><a href="#cb2-15" aria-hidden="true" tabindex="-1"></a>    <span class="op">|</span> <span class="dt">Paid</span></span>
<span id="cb2-16"><a href="#cb2-16" aria-hidden="true" tabindex="-1"></a>    <span class="op">|</span> <span class="dt">Canceled</span></span></code></pre></div>
<p>Modeling domain rules in the type system like this (e.g. the status of an invoice is either <code>Issued</code>, <code>Paid</code>, or <code>Canceled</code>) results in these rules getting enforced at compile time, as described in the earlier section on static typing. This is a much stronger set of guarantees than encoding similar rules in class methods, as one might do in an object oriented language that does not have sum types. With the types above, it becomes impossible to define a <code>CustomerInvoice</code> that doesn’t have an amount due, for example. It’s also impossible to define an <code>InvoiceStatus</code> that is anything other than one of the three aforementioned values.</p>
<p>One application of the above types may be a function that creates a notification message based on the status of the invoice. This function would take a <code>CustomerInvoice</code> as a parameter and return a string representing the content of the notification.</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="ot">createCustomerNotification ::</span> <span class="dt">CustomerInvoice</span> <span class="ot">-&gt;</span> <span class="dt">String</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>createCustomerNotification invoice <span class="ot">=</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>    <span class="kw">case</span> status invoice <span class="kw">of</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>        <span class="dt">Issued</span> <span class="ot">-&gt;</span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a>            <span class="st">&quot;Invoice #&quot;</span> <span class="op">++</span> <span class="fu">show</span> (invoiceNumber invoice) <span class="op">++</span> <span class="st">&quot; due on &quot;</span> <span class="op">++</span> <span class="fu">show</span> (dueDate invoice)</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a>        <span class="dt">Paid</span> <span class="ot">-&gt;</span></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a>            <span class="st">&quot;Successfully paid invoice #&quot;</span> <span class="op">++</span> <span class="fu">show</span> (invoiceNumber invoice)</span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a>        <span class="dt">Canceled</span> <span class="ot">-&gt;</span></span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a>            <span class="st">&quot;Invoice #&quot;</span> <span class="op">++</span> <span class="fu">show</span> (invoiceNumber invoice) <span class="op">++</span> <span class="st">&quot; has been canceled&quot;</span></span></code></pre></div>
<p>The above function uses pattern matching, another feature in the language, to handle every possible <code>InvoiceStatus</code> value. The <code>case</code> expression lets us supply a separate branch for each possible value of the <code>status</code> field.</p>
<p>The type system can protect us from making mistakes when changing the rules of our domain. Suppose that after this application is live for a while, we get feedback from our users that we need to be able to refund invoices. To facilitate this, we’ll update our <code>InvoiceStatus</code> type to include a <code>Refunded</code> value constructor:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">InvoiceStatus</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>    <span class="ot">=</span> <span class="dt">Issued</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>    <span class="op">|</span> <span class="dt">Paid</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a>    <span class="op">|</span> <span class="dt">Canceled</span></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a>    <span class="op">|</span> <span class="dt">Refunded</span></span></code></pre></div>
<p>If this is the only code we change, then upon compilation, we get the following error:</p>
<pre><code>CustomerInvoice.hs:(15,5)-(20,35): error: [-Wincomplete-patterns, -Werror=incomplete-patterns]
    Pattern match(es) are non-exhaustive
    In a case alternative: Patterns not matched: Refunded
   |
15 |     case status invoice of
   |     ^^^^^^^^^^^^^^^^^^^^^^...</code></pre>
<p>Whoops! Looks like we forgot to update the <code>createCustomerNotification</code> function to handle this new status value. The compiler reports an error telling us that the <code>case</code> expression does not handle the <code>Refunded</code> value as part of its pattern matches.</p>
<p>By modeling our domain in our types, the compiler assists us in ensuring that all of our domain logic can handle every possible value in the domain*. This protects us from the very common mistake of an unhandled value when writing in dynamically typed languages. Automated tests are not a replacement for types in this situation, because the introduction of new possible values often requires updating tests to assert whether the new values can be handled, which doesn’t help us avoid the problem—it’s just as easy to forget to update tests for the business logic as it is to forget to update the business logic.</p>
<aside>
* By default, GHC (the Haskell compiler) will not throw an error in the case of an unhandled value, but it’s standard practice for production Haskell projects to use the <code>-Wall</code> and <code>-Werror</code> flags, which turn on nearly every available warning and turn all warnings into errors.
</aside>
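<p>A minimal sketch of the fix, assuming the types from the earlier listings: adding a <code>Refunded</code> branch makes the pattern match exhaustive again. The wording of the refund message is a hypothetical choice made for illustration, and the record is trimmed to the three fields the function reads so that the snippet compiles on its own (<code>Day</code> comes from the <code>time</code> package).</p>
<pre class="sourceCode haskell"><code class="sourceCode haskell">import Data.Time.Calendar (Day)

data InvoiceStatus
    = Issued
    | Paid
    | Canceled
    | Refunded

-- Trimmed to the fields used by the function below.
data CustomerInvoice = CustomerInvoice
    { invoiceNumber :: Int
    , status        :: InvoiceStatus
    , dueDate       :: Day
    }

createCustomerNotification :: CustomerInvoice -&gt; String
createCustomerNotification invoice =
    case status invoice of
        Issued -&gt;
            &quot;Invoice #&quot; ++ show (invoiceNumber invoice) ++ &quot; due on &quot; ++ show (dueDate invoice)

        Paid -&gt;
            &quot;Successfully paid invoice #&quot; ++ show (invoiceNumber invoice)

        Canceled -&gt;
            &quot;Invoice #&quot; ++ show (invoiceNumber invoice) ++ &quot; has been canceled&quot;

        -- New branch for the new constructor; the message text is hypothetical.
        Refunded -&gt;
            &quot;Invoice #&quot; ++ show (invoiceNumber invoice) ++ &quot; has been refunded&quot;
</code></pre>
<p>With this branch in place the module compiles again under <code>-Wall</code> and <code>-Werror</code>. Any other <code>case</code> expression that enumerates the <code>InvoiceStatus</code> constructors without a wildcard pattern would be reported in the same way until it is updated.</p>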
<h2 id="haskell-has-a-large-number-of-mature-high-quality-libraries">Haskell has a large number of mature, high-quality libraries</h2>
<p>The Haskell community has published a large number of high quality, production grade packages, many of which have been maintained for a decade or longer. The Haskell community has general consensus as to which packages are good options in each functional category (e.g. decoding/encoding JSON, parsing XML, decoding CSVs, working with SQL databases, HTML templating, websockets, using Redis, etc). In some categories there is a single, best option that is the <em>de facto</em> standard. In other categories, there are several comparable options to choose from, depending on what design decisions or trade-offs a developer is willing to make.</p>
<p>Haskell has over 21,000 packages available in its package repository, <a href="https://hackage.haskell.org" target="_blank" rel="noopener">Hackage</a>, and many more published in various places such as GitHub that build tools can depend on. However, this number is dwarfed by the number of packages available in the repositories of many other languages. As of this post’s publication date, Ruby has <a href="https://rubygems.org/stats" target="_blank" rel="noopener">164,000 gems published</a>. There are <a href="https://pypi.org/" target="_blank" rel="noopener">282,000 Python packages on PyPI</a>. There were over <a href="https://blog.npmjs.org/post/615388323067854848/so-long-and-thanks-for-all-the-packages" target="_blank" rel="noopener">1.3 million JavaScript packages on npm</a> as of April 2020.</p>
<p>This discrepancy leads to one of the reservations I have heard expressed about using Haskell in production: there aren’t as many Haskell packages available as there are in other languages. My response to this is that when building production systems, the total number of packages available for a given language is largely irrelevant.</p>
<p>When building a production system, the decision of which packages to use is never based on the total number of packages available, but on which individual packages have a good reputation and widespread use, along with other factors such as good documentation and whether a given package is still being maintained. To put it simply, it’s quality and not quantity that matters, and to that end, the Haskell community does an excellent job at curating the packages necessary for the real world use cases I described earlier.</p>
<h2 id="haskell-makes-it-easy-to-write-concurrent-programs">Haskell makes it easy to write concurrent programs</h2>
<p>One feature of being a pure functional language is that, by default, values in Haskell are immutable. This is not to say that values never change, but state is not changed in-place. For example, when a function appends an element to a list, a new list is returned and the memory used by the old list will be freed by the garbage collector. A benefit of such immutability is that it simplifies concurrent programming. In a language with mutable values, multiple threads accessing the same value can lead to issues such as race conditions and deadlocks.</p>
<p>Since values in Haskell are immutable, pure code carries no risk of these types of issues even when a program is running on multiple threads and accessing shared memory. Shared mutable state is still available, but it has to be requested explicitly through types such as <code>MVar</code>, which confines it to the places that need it. This also results in a simpler mental model surrounding concurrent programming. Concurrent code can often be written in the same style as single-threaded code, with functions that run the underlying workload on a new thread simply wrapping the single-threaded implementation.</p>
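<p>The wrapping pattern can be sketched with nothing but the <code>base</code> library. The names <code>sumOfSquares</code>, <code>runInThread</code>, and <code>sumBothHalves</code> are hypothetical and chosen for illustration, and the sketch assumes the wrapped function does not throw an exception (a production version would need to handle that case).</p>
<pre class="sourceCode haskell"><code class="sourceCode haskell">import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)

-- The single-threaded implementation: an ordinary pure function.
sumOfSquares :: [Int] -&gt; Int
sumOfSquares xs = sum [ x * x | x &lt;- xs ]

-- Hypothetical wrapper: evaluates a pure function on a new thread and
-- returns an MVar that will eventually hold the result.
runInThread :: (a -&gt; b) -&gt; a -&gt; IO (MVar b)
runInThread f x = do
    result &lt;- newEmptyMVar
    -- ($!) forces the result on the worker thread before storing it.
    _ &lt;- forkIO (putMVar result $! f x)
    pure result

-- Splits the input, processes both halves concurrently, and combines them.
sumBothHalves :: [Int] -&gt; IO Int
sumBothHalves xs = do
    let (front, back) = splitAt (length xs `div` 2) xs
    frontResult &lt;- runInThread sumOfSquares front
    backResult  &lt;- runInThread sumOfSquares back
    (+) &lt;$&gt; takeMVar frontResult &lt;*&gt; takeMVar backResult
</code></pre>
<p>Both threads read from the same input list without any locking, because neither can modify it. Evaluating <code>sumBothHalves [1..10]</code> yields <code>385</code>, the same value that <code>sumOfSquares [1..10]</code> returns on a single thread.</p>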
<p>Concurrency is a useful tool in the Haskell programmer’s toolbox. On projects we have worked on in the past, we have done everything from implementing websocket servers that run as part of the same executable that serves an HTTP API, to creating a multi-threaded worker system that required far less overhead than managing the individual Linux processes necessary for workers written in languages with limited concurrency support.</p>
<h2 id="haskell-enables-domain-specific-languages-which-foster-expressiveness-and-reduce-boilerplate">Haskell enables domain-specific languages, which foster expressiveness and reduce boilerplate</h2>
<p>Haskell’s type system and language features make it a common choice for writing compilers. One offshoot of this is that Haskell libraries sometimes employ <a href="https://en.wikipedia.org/wiki/Domain-specific_language" target="_blank" rel="noopener">domain-specific languages</a> (DSLs) to improve their usability. A DSL, in contrast to a general purpose language, is a small language designed to be well-suited for expressing the rules of a specific application or problem domain.</p>
<p>One of the most well known and widely used DSLs is SQL, which is the language used to query data stored in relational database systems. Unlike most languages, SQL is declarative rather than imperative. This means that a SQL program tends to describe <em>what</em> the outcome of its execution should be rather than <em>how</em> that outcome should be achieved. Any developer familiar with SQL can imagine how writing code to retrieve data stored in tables as a series of rows in an imperative style would be very cumbersome.</p>
<p>One of the features in Haskell that facilitates DSLs is called Template Haskell. This is commonly employed by library authors to give consumers of the library an expressive syntax that avoids a lot of boilerplate. One example of this is in the <a href="https://hackage.haskell.org/package/persistent" target="_blank" rel="noopener">Persistent</a> library, one of the most popular SQL libraries. Persistent exposes a DSL, referred to as Persistent Entity Syntax, that allows the user of the library to define their database schema. An example of this syntax is below.</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="dt">Person</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>    name <span class="dt">Text</span></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a>    age <span class="dt">Int</span> <span class="dt">Maybe</span></span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a><span class="dt">BlogPost</span></span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a>    title <span class="dt">Text</span></span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a>    authorId <span class="dt">PersonId</span></span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a>    publicationDate <span class="dt">UTCTime</span></span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a><span class="dt">BlogPostTag</span></span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a>    label <span class="dt">Text</span></span>
<span id="cb6-10"><a href="#cb6-10" aria-hidden="true" tabindex="-1"></a>    blogPostId <span class="dt">BlogPostId</span></span></code></pre></div>
<p>The code above is not Haskell, and if you have never used Haskell’s Persistent library, odds are you have never seen this syntax. Yet it is apparent what it does—it defines three tables (<code>Person</code>, <code>BlogPost</code>, and <code>BlogPostTag</code>) and the columns within them. This code gets consumed by a Haskell program and supplants the need to write approximately 150 lines of Haskell code to define all of the data types and accessor functions for working with the data from these three tables.</p>
<p>The above is only one example of an external DSL, which is a DSL that uses its own syntax. Other libraries that expose DSLs include ones for webserver route definitions and for HTML templating. Some library authors opt to create embedded domain-specific languages (eDSLs), which are written in Haskell syntax. This results in a series of types and functions that are specialized to a particular domain. <a href="https://hackage.haskell.org/package/esqueleto" target="_blank" rel="noopener">Esqueleto</a> is an example of a widely-used library that exposes an eDSL for writing type-safe SQL queries.</p>
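<p>To make the idea of an eDSL concrete, here is a deliberately tiny sketch. Every name in it (<code>Expr</code>, its constructors, and <code>render</code>) is hypothetical; this is not Esqueleto’s API, and it makes no attempt at the type safety that Esqueleto provides. It only shows how ordinary Haskell types and operators can be combined into a small language for one domain, in this case filter conditions.</p>
<pre class="sourceCode haskell"><code class="sourceCode haskell">-- A hypothetical expression language for filter conditions.
data Expr
    = Col String        -- a column reference
    | IntLit Int        -- an integer literal
    | Expr :==: Expr    -- equality comparison
    | Expr :&gt;: Expr     -- greater-than comparison
    | Expr :&amp;&amp;: Expr    -- logical conjunction

-- Comparisons bind tighter than conjunction, as they do in SQL.
infix 4 :==:, :&gt;:
infixr 3 :&amp;&amp;:

-- Interprets an expression by rendering it as SQL-like text.
render :: Expr -&gt; String
render (Col name) = name
render (IntLit n) = show n
render (a :==: b) = render a ++ &quot; = &quot; ++ render b
render (a :&gt;: b)  = render a ++ &quot; &gt; &quot; ++ render b
render (a :&amp;&amp;: b) = &quot;(&quot; ++ render a ++ &quot; AND &quot; ++ render b ++ &quot;)&quot;

-- A condition written in the eDSL using plain Haskell syntax.
example :: Expr
example = Col &quot;age&quot; :&gt;: IntLit 17 :&amp;&amp;: Col &quot;author_id&quot; :==: IntLit 1
</code></pre>
<p>Here <code>render example</code> produces the string <code>(age &gt; 17 AND author_id = 1)</code>. Because the condition is an ordinary Haskell value, it can be passed to functions, stored in data structures, and checked by the compiler like any other code.</p>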
<h2 id="haskell-has-a-large-community-filled-with-smart-and-friendly-people">Haskell has a large community filled with smart and friendly people</h2>
<p>One of the most important facets of using a programming language is the community. Haskell’s community is large and includes a wide variety of people coming from many different technical backgrounds. This includes programming language researchers, some of whom have been working on Haskell since its inception in 1990, creators of other programming languages whose compilers are written in Haskell, self-taught Haskell enthusiasts, professional Haskell programmers using Haskell commercially (we at Foxhound Systems fall into this category), as well as eager-to-learn students, amongst many others.</p>
<p>The Haskell community is very welcoming to beginners. While the language has a learning curve that is steeper than that of many others due to its depth and breadth, it’s easy to ask questions and find help from any number of people that sincerely want to help others learn the language.</p>
<p>Some of the forms of communication we like to use to engage with the Haskell community are:</p>
<ul>
<li>The <a href="https://www.reddit.com/r/haskell" target="_blank" rel="noopener">Haskell subreddit</a>, which has over 60,000 readers and is one of the largest programming language communities on reddit.</li>
<li>The <a href="https://fpslack.com/" target="_blank" rel="noopener">Functional Programming Slack</a>, which has a number of channels dedicated to Haskell (including <strong>#haskell</strong>, <strong>#haskell-beginners</strong>, <strong>#haskell-jobs</strong>, and <strong>#haskell-adoption</strong>).</li>
<li>The Haskell mailing lists, such as <a href="https://mail.haskell.org/mailman/listinfo/haskell-cafe" target="_blank" rel="noopener">haskell-cafe</a>, which have a variety of content from library announcements, to Q&amp;A about the language, to volunteer opportunities.</li>
<li>The <strong>#haskell</strong> channel on the Freenode IRC network often has over 1,000 people connected to it, and is a great alternative to the Slack channels.</li>
<li>The <a href="https://haskellweekly.news/" target="_blank" rel="noopener">Haskell Weekly Newsletter</a>, which is a weekly newsletter that highlights blog posts and other announcements from the preceding week.</li>
<li>Although not a conventional community, the <a href="https://stackoverflow.com/questions/tagged/haskell" target="_blank" rel="noopener"><code>haskell</code> tag on StackOverflow</a> has over 46,000 questions associated with it. It’s not uncommon to find excellent answers that give a great overview of a specific topic or issue related to the language.</li>
</ul>
<p>This is not an exhaustive list, and participation through every forum is not necessary. But when someone is looking for help or generally learning about the language, it’s worth using any of the forums above.</p>
<h2 id="conclusion">Conclusion</h2>
<p>There are many reasons why Haskell is our first choice of programming language for building production software systems. To recap the whole list covered in this post:</p>
<ul>
<li>Haskell has a strong static type system that prevents errors and reduces cognitive load.</li>
<li>Haskell enables writing code that is composable, testable, and has predictable side-effects.</li>
<li>Haskell facilitates rapid development, worry-free refactoring, and excellent maintainability.</li>
<li>Haskell programs have stellar performance, leading to faster applications and lower hardware costs.</li>
<li>Haskell is great for domain modeling and preventing errors in domain logic.</li>
<li>Haskell has a large number of mature, high-quality libraries.</li>
<li>Haskell makes it easy to write concurrent programs.</li>
<li>Haskell enables domain-specific languages, which foster expressiveness and reduce boilerplate.</li>
<li>Haskell has a large community filled with smart and friendly people.</li>
</ul>
<p>It is the sum of these reasons that makes Haskell such a compelling choice. Haskell enables rapid development, worry-free refactoring, and easy maintainability, provides excellent performance, and has a mature ecosystem. These facets among many others make it an excellent choice for building production applications.</p>
<hr />
<p><em>Christian Charukiewicz is a Partner and Principal Software Engineer at Foxhound Systems. At Foxhound Systems, we use Haskell to create fast and reliable custom built software. Looking for someone to help you build a new product or to introduce Haskell to your own development team? Reach out to us at <a href="mailto:info@foxhound.systems" target="_blank" rel="noopener">info@foxhound.systems</a>.</em></p>
        </div>
    </div>
</article>
]]></description>
    <pubDate>Mon, 11 Jan 2021 00:00:00 UT</pubDate>
    <guid>https://www.foxhound.systems/blog/why-haskell-for-production/index.html</guid>
    <dc:creator>Foxhound Systems</dc:creator>
</item>
<item>
    <title>Reducing the pain of grouping SQL query results using Haskell</title>
    <link>https://www.foxhound.systems/blog/grouping-query-results-haskell/index.html</link>
    <description><![CDATA[<nav class="container mt-2 mb-2">
    <a href="https://www.foxhound.systems/blog/" class="no-underline magic-underline-slim">Back to all posts</a>
</nav>
<article class="grow post">
    
        <div class="lg-container post-banner">
            <picture>
                
                    <source srcset="https://www.foxhound.systems/img/2020-11-23-grouping-query-results-haskell/ducks-in-a-row-banner.webp" type="image/webp" height="850" width="1280">
                
                <img src="https://www.foxhound.systems/img/2020-11-23-grouping-query-results-haskell/ducks-in-a-row-banner.jpg" alt="A photograph of a line of rubber ducks floating down what appears to be a concrete gutter in between two cobblestone paths." height="850" width="1280">
            </picture>
        </div>
    
    <div class="container">
        
        <h1 class="title">Reducing the pain of grouping SQL query results using Haskell</h1>
        <header class="info gy-1">
            <div class="author-date space-y-1">
                <div>
                    <span class="date">November 23, 2020</span>
                </div>
                <div class="flex flex-wrap gx-2 gy-1">
                    
                        
                        
                        <div class="flex gap-2 items-center">
                            <img class="rounded-full w-12" src="https://www.foxhound.systems/assets/img/team/ben-sm.jpg" alt="Photo of Ben Levy">
                            <div class="ml-1 flex flex-col">
                                <span class="font-bold">Ben Levy</span>
                                <span>Partner &amp; Principal Software Engineer</span>
                            </div>
                        </div>
                        
                    
                        
                        <div class="flex gap-2 items-center">
                            <img class="rounded-full w-12" src="https://www.foxhound.systems/assets/img/team/christian-sm.jpg" alt="Photo of Christian Charukiewicz">
                            <div class="ml-1 flex flex-col">
                                <span class="font-bold">Christian Charukiewicz</span>
                                <span>Partner &amp; Principal Software Engineer</span>
                            </div>
                        </div>
                        
                        
                    
                </div>
            </div>
            
            <div>
                <div class="tags"><a title="Posts tagged: haskell" href="https://www.foxhound.systems/blog/tag/haskell/">haskell</a> <a title="Posts tagged: sql" href="https://www.foxhound.systems/blog/tag/sql/">sql</a> <a title="Posts tagged: performance-optimization" href="https://www.foxhound.systems/blog/tag/performance-optimization/">performance-optimization</a></div>
            </div>
            
        </header>
        <hr>
        <div class="content">
        <p>Relational databases allow us to model the associations between different types of data in our system domain. Most application database schemas rely on normalization to avoid data duplication. We use SQL to retrieve this data from a database, but SQL has limitations. When we need data from several tables, we’re forced to make trade-offs in how we query our data, and our query results often do not contain an ideal representation of the relationships between our data entities.</p>
<p>In order to mitigate this limitation of SQL, we typically transform the data we retrieve via our queries in our application layer. With a system written in Haskell, we can use the <code>Semigroup</code> typeclass and the append operation it exposes (<code>&lt;&gt;</code>) to transform the data into the shape we need by defining our desired custom data types and simple transformation functions. In this post we’ll explore this method of solving this problem in more detail.</p>
<!--more-->
<h2 id="limitations-of-sql">Limitations of SQL</h2>
<p>SQL is an excellent tool for querying data, with queries allowing us to retrieve data from one or several tables in our database in many ways. However, the data that a query can produce has limitations. Namely, a query can only produce two-dimensional data consisting of columns and rows. We can write <code>JOIN</code> queries to retrieve data from several tables at once, but are still limited to producing two-dimensional data.</p>
<p>This is not a problem when the data between joined tables has a one-to-one relationship, but when data has one-to-many or many-to-many relationships, we’ll see repetition in our query results (e.g. if a user has authored many posts, we’ll see that user repeat in any query that joins the tables together). With this in mind, we have several strategies we can apply in how we query our data.</p>
<h3 id="query-strategy-1-n1-queries">Query strategy 1: N+1 queries</h3>
<p>When we have one-to-many relationships in our data, we can eschew using <code>JOIN</code>s altogether and instead run many simple queries, the number of which scales with the number of related pieces of data. Typically this results in running successive queries in a loop to get each related piece of data:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode php"><code class="sourceCode php"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>authors <span class="op">=</span> queryAuthorsByCountry(<span class="st">'USA'</span>)          <span class="co">// 1</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="cf">foreach</span> author <span class="op">&lt;-</span> authors<span class="ot">:</span>                      <span class="co">// N</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>    authors[author<span class="op">.</span>id][<span class="st">'books'</span>] <span class="op">=</span> queryBooksByAuthorId(author<span class="op">.</span>id)</span></code></pre></div>
<p>The benefit of this approach is that it is simple and allows us to easily write application code that results in data structured in a format that accurately models the relationships between entities. But this approach has a major performance trade-off. The number of queries increases linearly with N, the number of items we’re working with. The workload multiplies again with each additional level of nesting: when each of the N items has M children that need their own query, the N+1 becomes (M*N)+N+1.</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode php"><code class="sourceCode php"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>authors <span class="op">=</span> queryAuthorsByCountry(<span class="st">'USA'</span>)          <span class="co">// 1</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="cf">foreach</span> author <span class="op">&lt;-</span> authors<span class="ot">:</span>                      <span class="co">// N</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>    books <span class="op">=</span> queryBooksByAuthorId(author<span class="op">.</span>id)</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a>    <span class="cf">foreach</span> book <span class="op">&lt;-</span> books<span class="ot">:</span>                      <span class="co">// N * M</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a>        book_genre_tags <span class="op">=</span> queryBookGenresByBookId(book<span class="op">.</span>id)</span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a>        <span class="co">// ...</span></span></code></pre></div>
<p>In practice, this rapid growth in the number of queries executed can quickly result in significant slowdowns in an application. An application database schema often has half-a-dozen or more tables that require querying in response to a single request, and it’s not uncommon to see this approach resulting in hundreds or thousands of queries being executed per request. What’s more, it is also possible for a developer to <em>accidentally</em> write code that produces queries like this. Many ORM libraries in object-oriented languages produce queries using this pattern when accessing related data entities.</p>
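<p>The arithmetic behind those numbers is easy to check. The function below is a hypothetical helper written only for this illustration; it counts the queries issued by the nested loops above, assuming N authors and an average of M books per author.</p>
<pre class="sourceCode haskell"><code class="sourceCode haskell">-- Queries issued by the nested loops:
--   1     query for the authors
--   n     queries for books (one per author)
--   n * m queries for genre tags (one per book)
nestedQueryCount :: Int -&gt; Int -&gt; Int
nestedQueryCount n m = 1 + n + n * m
</code></pre>
<p>For 50 authors with 20 books each, <code>nestedQueryCount 50 20</code> comes to 1 + 50 + 1,000 = 1,051 queries for a single request, even though only three tables are involved.</p>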
<h3 id="query-strategy-2-query-per-table">Query strategy 2: Query per table</h3>
<p>An improvement over N+1 queries is to run one query per table. There are several variations in how this can be done in practice, but the most common approach is to use the results from each query to construct a list of keys to constrain each successive query by.</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode php"><code class="sourceCode php"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>authors <span class="op">=</span> queryAuthorsByCountry(<span class="st">'USA'</span>)</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>author_id_list <span class="op">=</span> []</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="cf">foreach</span> author <span class="op">&lt;-</span> authors<span class="ot">:</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>    author_id_list<span class="op">.</span>push(author<span class="op">.</span>id)</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a>books <span class="op">=</span> queryBooksByAuthorIds(author_id_list)</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a>book_id_list <span class="op">=</span> []</span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a><span class="cf">foreach</span> book <span class="op">&lt;-</span> books<span class="ot">:</span></span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a>    book_id_list<span class="op">.</span>push(book<span class="op">.</span>id)</span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a>book_genre_tags <span class="op">=</span> queryBookGenresByBookIds(book_id_list)</span></code></pre></div>
<p>This approach results in significantly better performance than the N+1 approach in many situations. It’s easy to see that by not running the queries on the related table in a loop, the number of queries executed remains relatively small. The downside, however, is that although we have dramatically reduced the number of queries we’re executing, the data we now have in memory is in distinct lists grouped by type, which does not capture any of the parent-child relationships between the data entities.</p>
<p>When rendering a user interface or returning a JSON API response, it’s often necessary to structure the data in a way where children entities are nested within their parents (e.g. each author contains a list of books, and each book contains a list of genre tags). In order to do this, we have to make multiple passes through the data in order to create the required nested structure. This requires writing additional code, the complexity of which scales both in terms of how nested the relationships are as well as how many related types of entities are retrieved at each level.</p>
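<p>As a rough sketch of what that additional code looks like for a single level of nesting, the snippet below regroups a flat list of books under their authors. The <code>Author</code> and <code>Book</code> types and both function names are assumptions made for this illustration, and <code>Data.Map.Strict</code> comes from the <code>containers</code> package.</p>
<pre class="sourceCode haskell"><code class="sourceCode haskell">import qualified Data.Map.Strict as Map

-- Hypothetical row types for the two query results.
data Author = Author { authorId :: Int, authorName :: String }
data Book = Book { bookAuthorId :: Int, bookTitle :: String }

-- First pass: index the flat book list by author ID.
-- `flip (++)` keeps each author's books in their original query order.
booksByAuthor :: [Book] -&gt; Map.Map Int [Book]
booksByAuthor books =
    Map.fromListWith (flip (++)) [ (bookAuthorId book, [book]) | book &lt;- books ]

-- Second pass: pair each author with their books (an empty list if none).
nestBooks :: [Author] -&gt; [Book] -&gt; [(Author, [Book])]
nestBooks authors books =
    [ (author, Map.findWithDefault [] (authorId author) index) | author &lt;- authors ]
  where
    index = booksByAuthor books
</code></pre>
<p>Nesting genre tags inside each book would need another index and another pass written in the same style, which is the growth in complexity described above.</p>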
<h3 id="query-strategy-3-single-query-with-joins">Query strategy 3: Single query with joins</h3>
<p>In order to cut down the amount of querying we’re doing, we can write a single query that uses <code>JOIN</code> operations to retrieve data from each of the related tables. This approach typically results in the best performance, as long as queries are written correctly and columns are indexed where necessary. The downside of this approach is that our data is subject to <em>fanout</em>, which occurs when a row in the primary table matches more than one row in a joined table. This happens whenever there is a one-to-many relationship between the data.</p>
<p>If we had a schema like the following, it would be possible to write several queries in which a single user’s data repeats over several rows. This would occur when joining <code>users</code> and <code>posts</code> for any user that had written more than one post. The same would be true if we retrieved <code>posts</code> with <code>post_comments</code>, for posts with more than one comment.</p>
<pre class="no-code">
   users
   +---------+-------+
+-&gt;| id      | int   |&lt;----+      posts
|  | name    | text  |     |      +-------------+------------+
|  | email   | text  |     |      | id          | int        |&lt;-+
|  +---------+-------+     +------+ user_id     | int        |  |
|                                 | title       | text       |  |
|                                 | body        | text       |  |
|                                 | created_at  | timestamp  |  |
|  post_comments                  +-------------+------------+  |
|  +-------------+------------+                                 |
|  | id          | int        |                                 |
|  | post_id     | int        +---------------------------------+
+--+ user_id     | int        |
   | body        | text       |
   | created_at  | timestamp  |
   +-------------+------------+
</pre>
<p>We can illustrate the fanout problem for the above schema by running the following query:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">SELECT</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>    users.<span class="kw">id</span> <span class="kw">AS</span> user_id,</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>    users.name,</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a>    posts.<span class="kw">id</span> <span class="kw">AS</span> post_id,</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a>    posts.title,</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a>    posts.created_at</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a><span class="kw">FROM</span> users</span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a><span class="kw">LEFT</span> <span class="kw">JOIN</span> posts</span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a><span class="kw">ON</span> users.<span class="kw">id</span> <span class="op">=</span> posts.user_id;</span></code></pre></div>
<p>This query might yield results like the following:</p>
<div class="table-wrapper monospace">
<table>
<colgroup>
<col />
<col />
<col />
<col />
<col />
</colgroup>
<thead>
<tr>
<th>user_id</th>
<th>name</th>
<th>post_id</th>
<th>title</th>
<th>created_at</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>John Smith</td>
<td>10</td>
<td>How to write SQL</td>
<td>2020-11-18 13:06:02</td>
</tr>
<tr>
<td>1</td>
<td>John Smith</td>
<td>11</td>
<td>How to write SQL pt. 2</td>
<td>2020-11-21 18:17:34</td>
</tr>
<tr>
<td>1</td>
<td>John Smith</td>
<td>12</td>
<td>How to write SQL pt. 3</td>
<td>2020-12-01 15:26:22</td>
</tr>
<tr>
<td>2</td>
<td>Karen Doe</td>
<td>17</td>
<td>How to really write SQL</td>
<td>2021-01-18 11:19:44</td>
</tr>
</tbody>
</table>
</div>
<p>We can see that the <code>users</code> table data containing John Smith’s information is repeated multiple times in our query results, with several rows containing his primary key (a <code>user_id</code> value of 1). This is because a query like this will always produce a denormalized form, with each row’s data repeated however many times is necessary to match the number of entries in the joined table.</p>
<p>This data fanout increases as the number of tables we join into increases. If we added a <code>LEFT JOIN</code> into <code>post_comments</code> to our query, and there were multiple comments for each post, we would see John Smith’s information get duplicated in our query result even more times. We’d also see information from the <code>posts</code> table get duplicated.</p>
<p>In order to structure the data in a way that models parent-child relationships, we can write application code that will transform the data into the necessary format. We’ll apply several different strategies for doing this in Haskell.</p>

<h2 id="transforming-denormalized-data-to-model-parent-child-relationships">Transforming denormalized data to model parent-child relationships</h2>
<p>We’re big fans of the <a href="https://hackage.haskell.org/package/esqueleto" target="_blank" rel="noopener">Esqueleto</a> library, since it allows us to write what looks and feels like SQL in a Haskell EDSL, which gives us the benefits of type safety and compile-time checking of our SQL queries. Esqueleto uses a data type that roughly looks like the following to model entities that have been saved in the database, with the <code>entityKey</code> being the primary key of a record.</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Entity</span> a <span class="ot">=</span> <span class="dt">Entity</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>    {<span class="ot"> entityKey ::</span> <span class="dt">Key</span> a</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>    ,<span class="ot"> entityVal ::</span> a</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>    }</span></code></pre></div>
<p>So a post that is saved in the database would be represented by <code>Entity Post</code>. The result of our query that joins users, posts, and post comments together would look like:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>[(<span class="dt">Entity</span> <span class="dt">User</span>, <span class="dt">Entity</span> <span class="dt">Post</span>, <span class="dt">Entity</span> <span class="dt">PostComment</span>)]</span></code></pre></div>
<p>But we want to transform this into:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>[(<span class="dt">Entity</span> <span class="dt">User</span>, [(<span class="dt">Entity</span> <span class="dt">Post</span>, [<span class="dt">Entity</span> <span class="dt">PostComment</span>])])]</span></code></pre></div>
<h3 id="approach-1-grouping-child-data-with-a-custom-merge-function">Approach 1: Grouping child data with a custom merge function</h3>
<p>One of the common approaches in transforming the former into the latter is to write a pair of functions that look like the following:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE FlexibleContexts #-}</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Data.Map</span>         <span class="kw">as</span> <span class="dt">Map</span></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span>           <span class="dt">Data.Traversable</span></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span>           <span class="dt">Database.Persist</span> ( <span class="dt">Entity</span> (..), <span class="dt">Key</span> )</span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a><span class="ot">groupData ::</span> (<span class="dt">Ord</span> (<span class="dt">Key</span> a)) <span class="ot">=&gt;</span> [(<span class="dt">Entity</span> a, b)] <span class="ot">-&gt;</span> [(<span class="dt">Entity</span> a, [b])]</span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a>groupData res <span class="ot">=</span></span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a>    Map.elems <span class="op">$</span> <span class="fu">foldr</span></span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a>        (\(a, b) accumulator <span class="ot">-&gt;</span></span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a>            Map.insertWith</span>
<span id="cb8-12"><a href="#cb8-12" aria-hidden="true" tabindex="-1"></a>                mergeData</span>
<span id="cb8-13"><a href="#cb8-13" aria-hidden="true" tabindex="-1"></a>                (entityKey a)</span>
<span id="cb8-14"><a href="#cb8-14" aria-hidden="true" tabindex="-1"></a>                (a, [b])</span>
<span id="cb8-15"><a href="#cb8-15" aria-hidden="true" tabindex="-1"></a>                accumulator</span>
<span id="cb8-16"><a href="#cb8-16" aria-hidden="true" tabindex="-1"></a>        ) Map.empty res</span>
<span id="cb8-17"><a href="#cb8-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-18"><a href="#cb8-18" aria-hidden="true" tabindex="-1"></a><span class="ot">mergeData ::</span> (a, [b]) <span class="ot">-&gt;</span> (a, [b]) <span class="ot">-&gt;</span> (a, [b])</span>
<span id="cb8-19"><a href="#cb8-19" aria-hidden="true" tabindex="-1"></a>mergeData (a, b) (a', b') <span class="ot">=</span></span>
<span id="cb8-20"><a href="#cb8-20" aria-hidden="true" tabindex="-1"></a>    (a, b <span class="op">++</span> b')</span></code></pre></div>
<p>In order to apply <code>groupData</code> to solve our problem, we would have to massage our data and apply the function recursively, since we have multiple levels at which the data must be grouped. This approach works, but has shortcomings. The function above helps us deal only with a single level of parent-child relationships. Depending on the specific results of a query, we may end up writing several variations of functions that look like <code>groupData</code> and <code>mergeData</code>. We end up with a lot of similar-but-slightly-different code when we need to apply this group operation for the results of various queries. We think that we can improve on this approach.</p>
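<p>To make the “massage and apply recursively” step concrete, here is a minimal sketch for two levels of nesting. To stay self-contained it uses a hypothetical <code>groupOn</code> helper that takes a key function instead of relying on <code>Entity</code>; both <code>groupOn</code> and <code>groupTwoLevels</code> are illustrative names, not part of any library.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import qualified Data.Map as Map

-- Stand-in for groupData that takes a key function rather than using entityKey.
groupOn :: Ord k =&gt; (a -&gt; k) -&gt; [(a, b)] -&gt; [(a, [b])]
groupOn key =
    Map.elems . foldr step Map.empty
  where
    step (a, b) = Map.insertWith merge (key a) (a, [b])
    merge (a, bs) (_, bs') = (a, bs ++ bs')

-- Reshape each row into (parent, (child, grandchild)), group the outer level,
-- then regroup every parent's children the same way.
groupTwoLevels
    :: (Ord k1, Ord k2)
    =&gt; (a -&gt; k1) -&gt; (b -&gt; k2) -&gt; [(a, b, c)] -&gt; [(a, [(b, [c])])]
groupTwoLevels keyA keyB rows =
    [ (a, groupOn keyB bcs)
    | (a, bcs) &lt;- groupOn keyA [ (a, (b, c)) | (a, b, c) &lt;- rows ]
    ]</code></pre></div>
<p>Every additional level of nesting needs another reshaping step and another regrouping pass like the ones above.</p>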
<h3 id="approach-2-grouping-child-data-by-writing-our-own-semigroup-and-monoid-instances">Approach 2: Grouping child data by writing our own Semigroup and Monoid instances</h3>
<p>If we look at the <code>mergeData</code> function, we’ll notice that it looks like the <code>&lt;&gt;</code> operator defined by the <code>Semigroup</code> typeclass (also available as <code>mappend</code> through <code>Monoid</code>), which we can apply to any values whose types have a <code>Semigroup</code> instance defined. For example, we can apply this operation to tuples so long as each member is also an instance of <code>Semigroup</code>:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="ot">genericMergeData ::</span> (<span class="dt">Semigroup</span> a, <span class="dt">Semigroup</span> b) <span class="ot">=&gt;</span> (a, b) <span class="ot">-&gt;</span> (a, b) <span class="ot">-&gt;</span> (a, b)</span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>genericMergeData (a, b) (a', b') <span class="ot">=</span></span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a>    (a, b) <span class="op">&lt;&gt;</span> (a', b')</span></code></pre></div>
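<p>As a small illustration of what <code>&lt;&gt;</code> does on a pair (the values here are made up, and we assume a version of <code>base</code> that exports <code>&lt;&gt;</code> from the Prelude):</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">-- (&lt;&gt;) on a pair combines the components pairwise.
mergedPair :: (String, [Int])
mergedPair = ("ab", [1, 2]) &lt;&gt; ("cd", [3])
-- mergedPair == ("abcd", [1, 2, 3])</code></pre></div>
<p>Note that both components are combined. This differs from <code>mergeData</code>, which keeps the left parent unchanged and only appends the lists of children.</p>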
<p>It should be noted that we don’t even need a named function like the one above at all. We can modify <code>groupData</code> to replace our call to <code>mergeData</code> with <code>(&lt;&gt;)</code>. With this observation in mind, we can try to leverage the <code>Semigroup</code> typeclass and write our own instances.</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">GroupedPostResult</span> <span class="ot">=</span> <span class="dt">GroupedPostResult</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>    {<span class="ot"> groupedResultUser ::</span> <span class="dt">Entity</span> <span class="dt">User</span></span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>    ,<span class="ot"> groupedResultPosts ::</span> [<span class="dt">Entity</span> <span class="dt">Post</span>]</span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a>    }</span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Semigroup</span> <span class="dt">GroupedPostResult</span> <span class="kw">where</span></span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a>    <span class="dt">GroupedPostResult</span> a b <span class="op">&lt;&gt;</span> <span class="dt">GroupedPostResult</span> _ b' <span class="ot">=</span> <span class="dt">GroupedPostResult</span> a (b <span class="op">++</span> b')</span></code></pre></div>
<p>What we have now is a single operator instead of a custom <code>mergeData</code> function. This by itself doesn’t help us a lot, but we can also observe that applying <code>foldr</code> using <code>(&lt;&gt;)</code> is very similar to <code>mconcat</code>. However, since <code>mconcat</code> is an operation in the <code>Monoid</code> typeclass, we will have to choose a type that has such an instance as well. In order to accumulate the list of values for each of our children, we want <code>Map (Key a) GroupedPostResult</code>, from the <code>Data.Map.Strict</code> module in the <code>containers</code> package.</p>
<p>But there’s a problem. The <code>Semigroup</code> instance for <code>Map</code> <a href="https://hackage.haskell.org/package/containers-0.6.4.1/docs/Data-Map-Strict.html#t:Map" target="_blank" rel="noopener">is a union operation</a> that discards the right-hand side whenever both maps contain the same key:</p>
<blockquote>
<p>The <code>Semigroup</code> operation for <code>Map</code> is <code>union</code>, which prefers values from the left operand. If <code>m1</code> maps a key <code>k</code> to a value <code>a1</code>, and <code>m2</code> maps the same key to a different value <code>a2</code>, then their union <code>m1 &lt;&gt; m2</code> maps <code>k</code> to <code>a1</code>.</p>
</blockquote>
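<p>A quick sketch of the difference, using made-up keys and values:</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import qualified Data.Map.Strict as Map

leftMap, rightMap :: Map.Map Int [String]
leftMap  = Map.fromList [(1, ["a"])]
rightMap = Map.fromList [(1, ["b"]), (2, ["c"])]

-- The default instance keeps the left value for key 1 and drops ["b"].
defaultUnion :: Map.Map Int [String]
defaultUnion = leftMap &lt;&gt; rightMap
-- fromList [(1,["a"]),(2,["c"])]

-- unionWith (&lt;&gt;) merges the two values stored under key 1 instead.
mergedUnion :: Map.Map Int [String]
mergedUnion = Map.unionWith (&lt;&gt;) leftMap rightMap
-- fromList [(1,["a","b"]),(2,["c"])]</code></pre></div>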
<p>We don’t want to discard data, we want to merge it. In order to do this we can write our own <code>newtype</code> for <code>Map</code> that implements a <code>Semigroup</code> instance with the behavior we want.</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">GroupedPostResultMap</span> <span class="ot">=</span> <span class="dt">GroupedPostResultMap</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>    {<span class="ot"> unGroupedPostResultMap ::</span> <span class="dt">Map</span> (<span class="dt">Key</span> <span class="dt">User</span>) <span class="dt">GroupedPostResult</span></span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a>    }</span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Semigroup</span> <span class="dt">GroupedPostResultMap</span> <span class="kw">where</span></span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a>    <span class="dt">GroupedPostResultMap</span> lhs <span class="op">&lt;&gt;</span> <span class="dt">GroupedPostResultMap</span> rhs <span class="ot">=</span> <span class="dt">GroupedPostResultMap</span> <span class="op">$</span> Map.unionWith (<span class="op">&lt;&gt;</span>) lhs rhs</span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-8"><a href="#cb11-8" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Monoid</span> <span class="dt">GroupedPostResultMap</span> <span class="kw">where</span></span>
<span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a>    <span class="fu">mempty</span> <span class="ot">=</span> <span class="dt">GroupedPostResultMap</span> Map.empty</span>
<span id="cb11-10"><a href="#cb11-10" aria-hidden="true" tabindex="-1"></a>    <span class="fu">mappend</span> <span class="ot">=</span> (<span class="op">&lt;&gt;</span>)</span>
<span id="cb11-11"><a href="#cb11-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-12"><a href="#cb11-12" aria-hidden="true" tabindex="-1"></a><span class="ot">groupData ::</span> [(<span class="dt">Entity</span> <span class="dt">User</span>, <span class="dt">Entity</span> <span class="dt">Post</span>)] <span class="ot">-&gt;</span> [(<span class="dt">Entity</span> <span class="dt">User</span>, [<span class="dt">Entity</span> <span class="dt">Post</span>])]</span>
<span id="cb11-13"><a href="#cb11-13" aria-hidden="true" tabindex="-1"></a>groupData <span class="ot">=</span></span>
<span id="cb11-14"><a href="#cb11-14" aria-hidden="true" tabindex="-1"></a>    <span class="fu">fmap</span> (\(<span class="dt">GroupedPostResult</span> a b) <span class="ot">-&gt;</span> (a, b)) <span class="op">.</span></span>
<span id="cb11-15"><a href="#cb11-15" aria-hidden="true" tabindex="-1"></a>        Map.elems <span class="op">.</span></span>
<span id="cb11-16"><a href="#cb11-16" aria-hidden="true" tabindex="-1"></a>        unGroupedPostResultMap <span class="op">.</span></span>
<span id="cb11-17"><a href="#cb11-17" aria-hidden="true" tabindex="-1"></a>        <span class="fu">mconcat</span> <span class="op">.</span></span>
<span id="cb11-18"><a href="#cb11-18" aria-hidden="true" tabindex="-1"></a>        <span class="fu">fmap</span> (\(a, b) <span class="ot">-&gt;</span> <span class="dt">GroupedPostResultMap</span> <span class="op">$</span></span>
<span id="cb11-19"><a href="#cb11-19" aria-hidden="true" tabindex="-1"></a>                            Map.singleton (entityKey a) (<span class="dt">GroupedPostResult</span> a [b])</span>
<span id="cb11-20"><a href="#cb11-20" aria-hidden="true" tabindex="-1"></a>             )</span></code></pre></div>
<p>While this works, this is a lot of boilerplate code to write just to get the correct <code>Map</code> behavior. Moreover, if the data we’re working with has more than one level of nesting, the final lambda function that is passed into <code>fmap</code> will become even more cumbersome than it is in our example above.</p>
<h3 id="approach-3-grouping-child-data-using-ad-hoc-product-types">Approach 3: Grouping child data using ad-hoc product types</h3>
<p>Instead of making a new custom record type each time we want to create groupings of children under a parent, we can take advantage of what we saw earlier when we applied the <code>&lt;&gt;</code> operator to tuples: the product of multiple <code>Semigroup</code> values is itself a <code>Semigroup</code>. In other words, we can apply <code>&lt;&gt;</code> to a tuple so long as each member of the tuple supports <code>&lt;&gt;</code>.</p>
<p>For the code in the rest of this post, we’ll need the following imports and language extension:</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE FlexibleContexts #-}</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span>           <span class="dt">Data.Coerce</span></span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span>           <span class="dt">Data.Map.Strict</span>  <span class="kw">as</span> <span class="dt">Map</span></span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span>           <span class="dt">Data.Map.Append</span></span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span>           <span class="dt">Data.Semigroup</span></span>
<span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span>           <span class="dt">Database.Persist</span></span></code></pre></div>
<p>We can start by thinking of the general ad-hoc structure we want (notice that this is the generalized version of the concrete type described at the beginning of this post):</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">GroupedListStructure</span> a b c <span class="ot">=</span></span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a>    [(<span class="dt">Entity</span> a, [(<span class="dt">Entity</span> b, [<span class="dt">Entity</span> c])])]</span></code></pre></div>
<p>And we can make an equivalent structure out of <code>Map</code> instead of lists:</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">GroupedMapStructure</span> a b c <span class="ot">=</span></span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a>    <span class="dt">Map</span> ( <span class="dt">Key</span> a )</span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a>        ( <span class="dt">Entity</span> a</span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a>        , <span class="dt">Map</span> ( <span class="dt">Key</span> b )</span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a>              ( <span class="dt">Entity</span> b</span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a>              , <span class="dt">Map</span> (<span class="dt">Key</span> c) (<span class="dt">Entity</span> c)</span>
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a>              )</span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a>        )</span></code></pre></div>
<p>This structure will ensure that applying the <code>unionWith</code> operation will still work as intended. However, we still have the issue of <code>Map</code> not having the necessary <code>Semigroup</code> behavior and <code>Entity</code> not having any instance at all. In the previous section we solved the first issue by defining <code>GroupedPostResultMap</code>.</p>
<p>Looking at the implementation, we observe that <code>GroupedPostResultMap</code> does not use any information about its key or value types other than the fact that its values have a <code>Semigroup</code> instance. We can extract this by creating a <code>newtype</code> like the following:</p>
<div class="sourceCode" id="cb15"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">AppendMap</span> k v <span class="ot">=</span> <span class="dt">AppendMap</span></span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a>    {<span class="ot"> unAppendMap ::</span> <span class="dt">Map</span> k v</span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a>    }</span></code></pre></div>
<p>With this, we can define a <code>Semigroup</code> instance whenever <code>v</code> is a <code>Semigroup</code>. Fortunately, there’s already a package that exists to solve this problem. <a href="https://hackage.haskell.org/package/appendmap" target="_blank" rel="noopener"><code>appendmap</code></a> is a tiny library that depends only on <code>base</code> and <code>containers</code>, the latter of which you will already be using if you’re using <code>Map</code>. It exposes an <code>AppendMap</code> data type that delegates its <code>Monoid</code> and <code>Semigroup</code> instances to those of its elements.</p>
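<p>For reference, instances for such a <code>newtype</code> could be written by hand roughly as follows. This is a local stand-in for illustration, assuming a version of <code>base</code> where <code>mappend</code> defaults to <code>(&lt;&gt;)</code>; it is not the source of the <code>appendmap</code> package.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import qualified Data.Map.Strict as Map

-- Local stand-in for illustration, not the appendmap package's definition.
newtype AppendMap k v = AppendMap
    { unAppendMap :: Map.Map k v
    } deriving (Show, Eq)

-- Values stored under the same key are merged with their own (&lt;&gt;).
instance (Ord k, Semigroup v) =&gt; Semigroup (AppendMap k v) where
    AppendMap lhs &lt;&gt; AppendMap rhs = AppendMap (Map.unionWith (&lt;&gt;) lhs rhs)

instance (Ord k, Semigroup v) =&gt; Monoid (AppendMap k v) where
    mempty = AppendMap Map.empty</code></pre></div>
<p>This is the same <code>unionWith (&lt;&gt;)</code> trick as before, just no longer tied to one particular key or value type.</p>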
<p>We can now change our desired structure to use <code>AppendMap</code> instead of <code>Map</code>:</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">GroupedAppendMapStructure</span> a b c <span class="ot">=</span></span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a>    <span class="dt">AppendMap</span> ( <span class="dt">Key</span> a )</span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a>              ( <span class="dt">Entity</span> a</span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a>              , <span class="dt">AppendMap</span> ( <span class="dt">Key</span> b )</span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a>                          ( <span class="dt">Entity</span> b</span>
<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a>                          , <span class="dt">AppendMap</span> (<span class="dt">Key</span> c) (<span class="dt">Entity</span> c)</span>
<span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a>                          )</span>
<span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a>              )</span></code></pre></div>
<p>There’s one last problem. The tuples in our structure are only a <code>Semigroup</code> if <code>Entity</code> has a <code>Semigroup</code> instance, and it doesn’t. Since the key in our structure is the <code>Key</code> within our <code>Entity</code>, the <code>Semigroup</code> instance only needs the first value, and can ignore subsequent occurrences of the same <code>Entity</code>.</p>
<p>Conveniently, the <code>Data.Semigroup</code> module provides us with a <code>newtype</code> wrapper named <code>First</code> that does exactly what we need.</p>
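<p>A short sketch of how <code>First</code> behaves, on its own and inside a tuple. The names and values are made up for illustration.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import Data.Semigroup (First (..))

-- First keeps the left operand and ignores the right one.
keptParent :: First String
keptParent = First "John Smith" &lt;&gt; First "John Smith (repeated row)"
-- getFirst keptParent == "John Smith"

-- Inside a tuple, the parent is kept once while the children are appended.
combinedRow :: (First String, [Int])
combinedRow = (First "John Smith", [10]) &lt;&gt; (First "John Smith", [11, 12])
-- combinedRow == (First "John Smith", [10, 11, 12])</code></pre></div>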
<p>So, we can update our ad-hoc structure to use <code>First</code>, and it finally becomes:</p>
<div class="sourceCode" id="cb17"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">GroupedAppendMapSemigroupStructure</span> a b c <span class="ot">=</span></span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a>    <span class="dt">AppendMap</span> ( <span class="dt">Key</span> a )</span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a>              ( <span class="dt">First</span> (<span class="dt">Entity</span> a)</span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a>              , <span class="dt">AppendMap</span> ( <span class="dt">Key</span> b )</span>
<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a>                          ( <span class="dt">First</span> (<span class="dt">Entity</span> b)</span>
<span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a>                          , <span class="dt">AppendMap</span> (<span class="dt">Key</span> c) (<span class="dt">First</span> (<span class="dt">Entity</span> c))</span>
<span id="cb17-7"><a href="#cb17-7" aria-hidden="true" tabindex="-1"></a>                          )</span>
<span id="cb17-8"><a href="#cb17-8" aria-hidden="true" tabindex="-1"></a>              )</span></code></pre></div>
<p>We can write a function that will give us this structure:</p>
<div class="sourceCode" id="cb18"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="ot">makeGroupedStructure ::</span> (<span class="dt">Entity</span> a, <span class="dt">Entity</span> b, <span class="dt">Entity</span> c)</span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a>                     <span class="ot">-&gt;</span> <span class="dt">GroupedAppendMapSemigroupStructure</span> a b c</span>
<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a>makeGroupedStructure (a, b, c) <span class="ot">=</span></span>
<span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a>    <span class="dt">AppendMap</span> <span class="op">$</span> Map.singleton</span>
<span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"></a>            ( entityKey a )</span>
<span id="cb18-6"><a href="#cb18-6" aria-hidden="true" tabindex="-1"></a>            ( <span class="dt">First</span> a</span>
<span id="cb18-7"><a href="#cb18-7" aria-hidden="true" tabindex="-1"></a>            , <span class="dt">AppendMap</span> <span class="op">$</span> Map.singleton</span>
<span id="cb18-8"><a href="#cb18-8" aria-hidden="true" tabindex="-1"></a>                ( entityKey b )</span>
<span id="cb18-9"><a href="#cb18-9" aria-hidden="true" tabindex="-1"></a>                ( <span class="dt">First</span> b</span>
<span id="cb18-10"><a href="#cb18-10" aria-hidden="true" tabindex="-1"></a>                , <span class="dt">AppendMap</span> <span class="op">$</span> Map.singleton (entityKey c) (<span class="dt">First</span> c)</span>
<span id="cb18-11"><a href="#cb18-11" aria-hidden="true" tabindex="-1"></a>                )</span>
<span id="cb18-12"><a href="#cb18-12" aria-hidden="true" tabindex="-1"></a>            )</span></code></pre></div>
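<p>To see why one singleton per row is enough, it helps to look at what <code>&lt;&gt;</code> does to two of them. The following is a minimal, self-contained sketch rather than the code used in this post: the <code>AppendMap</code> defined here is a hypothetical stand-in that assumes values under the same key are merged with their own <code>&lt;&gt;</code>, the structure has two levels instead of three, and plain <code>Int</code> keys and <code>String</code> values take the place of <code>Key</code> and <code>Entity</code>. The names <code>single</code> and <code>merged</code> are illustrative only.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import           Data.Coerce    (coerce)
import           Data.Map       (Map)
import qualified Data.Map       as Map
import           Data.Semigroup (First (..))

-- Hypothetical stand-in for AppendMap: a Map whose (&lt;&gt;) merges the
-- values stored under the same key with their own (&lt;&gt;).
newtype AppendMap k v = AppendMap (Map k v)

instance (Ord k, Semigroup v) =&gt; Semigroup (AppendMap k v) where
    AppendMap a &lt;&gt; AppendMap b = AppendMap (Map.unionWith (&lt;&gt;) a b)

-- Two-level version of the grouped structure, with Int keys and String payloads.
type Grouped = AppendMap Int (First String, AppendMap Int (First String))

-- Builds the singleton structure for one (parent, child) row.
single :: (Int, String) -&gt; (Int, String) -&gt; Grouped
single (pk, p) (ck, c) =
    AppendMap $ Map.singleton pk
        ( First p
        , AppendMap $ Map.singleton ck (First c)
        )

-- Two rows that repeat the same parent collapse into one parent entry.
merged :: Map Int (String, Map Int String)
merged =
    coerce (single (1, "alice") (10, "first post") &lt;&gt; single (1, "alice") (11, "second post"))</code></pre></div>
<p>Both rows share the parent key <code>1</code>, so <code>merged</code> holds a single entry for <code>"alice"</code> whose child map contains keys <code>10</code> and <code>11</code>. <code>First</code> keeps the first copy of the duplicated parent, and the inner maps are unioned.</p>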
<p>We can now update the <code>groupData</code> function we created during our first approach to use this implementation instead:</p>
<div class="sourceCode" id="cb19"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="ot">groupWith ::</span> (<span class="dt">Monoid</span> m, <span class="dt">Coercible</span> m b) <span class="ot">=&gt;</span> (r <span class="ot">-&gt;</span> m) <span class="ot">-&gt;</span> [r] <span class="ot">-&gt;</span> b</span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a>groupWith fn <span class="ot">=</span></span>
<span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a>    coerce <span class="op">.</span> <span class="fu">mconcat</span> <span class="op">.</span> <span class="fu">fmap</span> fn</span>
<span id="cb19-4"><a href="#cb19-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb19-5"><a href="#cb19-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb19-6"><a href="#cb19-6" aria-hidden="true" tabindex="-1"></a><span class="ot">groupData ::</span> (<span class="dt">Ord</span> (<span class="dt">Key</span> a), <span class="dt">Ord</span> (<span class="dt">Key</span> b), <span class="dt">Ord</span> (<span class="dt">Key</span> c))</span>
<span id="cb19-7"><a href="#cb19-7" aria-hidden="true" tabindex="-1"></a>          <span class="ot">=&gt;</span> [(<span class="dt">Entity</span> a, <span class="dt">Entity</span> b, <span class="dt">Entity</span> c)]</span>
<span id="cb19-8"><a href="#cb19-8" aria-hidden="true" tabindex="-1"></a>          <span class="ot">-&gt;</span> <span class="dt">Map</span> (<span class="dt">Key</span> a) (<span class="dt">Entity</span> a, <span class="dt">Map</span> (<span class="dt">Key</span> b) (<span class="dt">Entity</span> b, <span class="dt">Map</span> (<span class="dt">Key</span> c) (<span class="dt">Entity</span> c)))</span>
<span id="cb19-9"><a href="#cb19-9" aria-hidden="true" tabindex="-1"></a>groupData <span class="ot">=</span></span>
<span id="cb19-10"><a href="#cb19-10" aria-hidden="true" tabindex="-1"></a>    groupWith makeGroupedStructure</span></code></pre></div>
<p>If necessary, we can transform the resulting <code>Map</code> structure into the list structure we described earlier by applying <code>Map.elems</code> at each level. We can write a helper function to do this:</p>
<div class="sourceCode" id="cb20"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="ot">transformMap ::</span> <span class="dt">Map</span> (<span class="dt">Key</span> a) (<span class="dt">Entity</span> a, <span class="dt">Map</span> (<span class="dt">Key</span> b) (<span class="dt">Entity</span> b, <span class="dt">Map</span> (<span class="dt">Key</span> c) (<span class="dt">Entity</span> c)))</span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a>             <span class="ot">-&gt;</span> [(<span class="dt">Entity</span> a, [(<span class="dt">Entity</span> b, [<span class="dt">Entity</span> c ])])]</span>
<span id="cb20-3"><a href="#cb20-3" aria-hidden="true" tabindex="-1"></a>transformMap grouped <span class="ot">=</span></span>
<span id="cb20-4"><a href="#cb20-4" aria-hidden="true" tabindex="-1"></a>    <span class="fu">fmap</span> (\(parentA, childrenA) <span class="ot">-&gt;</span></span>
<span id="cb20-5"><a href="#cb20-5" aria-hidden="true" tabindex="-1"></a>             ( parentA</span>
<span id="cb20-6"><a href="#cb20-6" aria-hidden="true" tabindex="-1"></a>             , <span class="fu">fmap</span> (\(parentB, childrenB) <span class="ot">-&gt;</span></span>
<span id="cb20-7"><a href="#cb20-7" aria-hidden="true" tabindex="-1"></a>                        ( parentB</span>
<span id="cb20-8"><a href="#cb20-8" aria-hidden="true" tabindex="-1"></a>                        , Map.elems childrenB</span>
<span id="cb20-9"><a href="#cb20-9" aria-hidden="true" tabindex="-1"></a>                        )</span>
<span id="cb20-10"><a href="#cb20-10" aria-hidden="true" tabindex="-1"></a>                    ) (Map.elems childrenA)</span>
<span id="cb20-11"><a href="#cb20-11" aria-hidden="true" tabindex="-1"></a>             )</span>
<span id="cb20-12"><a href="#cb20-12" aria-hidden="true" tabindex="-1"></a>         ) (Map.elems grouped)</span></code></pre></div>
<p>Finally, we can compose the two:</p>
<div class="sourceCode" id="cb21"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="ot">groupQueryResults ::</span> (<span class="dt">Ord</span> (<span class="dt">Key</span> a), <span class="dt">Ord</span> (<span class="dt">Key</span> b), <span class="dt">Ord</span> (<span class="dt">Key</span> c))</span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a>                  <span class="ot">=&gt;</span> [(<span class="dt">Entity</span> a, <span class="dt">Entity</span> b, <span class="dt">Entity</span> c)]</span>
<span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a>                  <span class="ot">-&gt;</span> [(<span class="dt">Entity</span> a, [(<span class="dt">Entity</span> b, [<span class="dt">Entity</span> c])])]</span>
<span id="cb21-4"><a href="#cb21-4" aria-hidden="true" tabindex="-1"></a>groupQueryResults <span class="ot">=</span></span>
<span id="cb21-5"><a href="#cb21-5" aria-hidden="true" tabindex="-1"></a>    transformMap <span class="op">.</span> groupData</span></code></pre></div>
<p>With this, we’ve successfully grouped the children at each level and structured the data the way we set out to at the beginning.</p>
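<p>As a quick check of the whole pipeline, here is a self-contained sketch that runs the same two steps (merge, then flatten with <code>Map.elems</code>) on a handful of fanned-out rows. It makes several assumptions so that it can run without Persistent: <code>(Int, String)</code> pairs stand in for <code>Entity</code> values, the <code>AppendMap</code> newtype is a hypothetical stand-in whose <code>&lt;&gt;</code> merges values under the same key, and <code>Row</code>, <code>toGroup</code>, <code>groupRows</code>, and <code>sampleRows</code> are illustrative names.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import           Data.Coerce    (coerce)
import           Data.Map       (Map)
import qualified Data.Map       as Map
import           Data.Semigroup (First (..))

-- Hypothetical stand-in for AppendMap: (&lt;&gt;) merges values under the same key.
newtype AppendMap k v = AppendMap (Map k v)

instance (Ord k, Semigroup v) =&gt; Semigroup (AppendMap k v) where
    AppendMap a &lt;&gt; AppendMap b = AppendMap (Map.unionWith (&lt;&gt;) a b)

instance (Ord k, Semigroup v) =&gt; Monoid (AppendMap k v) where
    mempty = AppendMap Map.empty

-- Each (key, payload) pair stands in for an Entity.
type Row = ((Int, String), (Int, String), (Int, String))

-- One singleton structure per row, mirroring makeGroupedStructure.
toGroup :: Row
        -&gt; AppendMap Int (First String, AppendMap Int (First String, AppendMap Int (First String)))
toGroup ((ak, a), (bk, b), (ck, c)) =
    AppendMap $ Map.singleton ak
        ( First a
        , AppendMap $ Map.singleton bk
            ( First b
            , AppendMap $ Map.singleton ck (First c)
            )
        )

-- Merge every row, strip the newtypes, then flatten each level to a list.
groupRows :: [Row] -&gt; [(String, [(String, [String])])]
groupRows rows =
    [ (a, [ (b, Map.elems cs) | (b, cs) &lt;- Map.elems bs ])
    | (a, bs) &lt;- Map.elems grouped
    ]
  where
    grouped :: Map Int (String, Map Int (String, Map Int String))
    grouped = coerce (mconcat (fmap toGroup rows))

-- Four rows with fanout: alice appears three times, her first post twice.
sampleRows :: [Row]
sampleRows =
    [ ((1, "alice"), (10, "post A"), (100, "nice"))
    , ((1, "alice"), (10, "post A"), (101, "agreed"))
    , ((1, "alice"), (11, "post B"), (102, "hmm"))
    , ((2, "bob"),   (12, "post C"), (103, "first"))
    ]</code></pre></div>
<p>Evaluating <code>groupRows sampleRows</code> yields <code>[("alice",[("post A",["nice","agreed"]),("post B",["hmm"])]),("bob",[("post C",["first"])])]</code>: each parent appears once, with its children ordered by key.</p>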
<h2 id="putting-it-all-together">Putting it all together</h2>
<p>Let’s apply this approach to the schema we looked at earlier.</p>
<div class="sourceCode" id="cb22"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a><span class="co">-- The type representing each row from our Esqueleto query</span></span>
<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">PostModel</span> <span class="ot">=</span> (<span class="dt">Entity</span> <span class="dt">User</span>, <span class="dt">Entity</span> <span class="dt">Post</span>, <span class="dt">Entity</span> <span class="dt">PostComment</span>)</span>
<span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-4"><a href="#cb22-4" aria-hidden="true" tabindex="-1"></a><span class="co">-- The resulting grouped type we want to achieve</span></span>
<span id="cb22-5"><a href="#cb22-5" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">GroupedPostModel</span> <span class="ot">=</span></span>
<span id="cb22-6"><a href="#cb22-6" aria-hidden="true" tabindex="-1"></a>    <span class="dt">Map</span> <span class="dt">UserId</span></span>
<span id="cb22-7"><a href="#cb22-7" aria-hidden="true" tabindex="-1"></a>        ( <span class="dt">Entity</span> <span class="dt">User</span></span>
<span id="cb22-8"><a href="#cb22-8" aria-hidden="true" tabindex="-1"></a>        , <span class="dt">Map</span> <span class="dt">PostId</span></span>
<span id="cb22-9"><a href="#cb22-9" aria-hidden="true" tabindex="-1"></a>              ( <span class="dt">Entity</span> <span class="dt">Post</span></span>
<span id="cb22-10"><a href="#cb22-10" aria-hidden="true" tabindex="-1"></a>              , <span class="dt">Map</span> <span class="dt">PostCommentId</span> (<span class="dt">Entity</span> <span class="dt">PostComment</span>)</span>
<span id="cb22-11"><a href="#cb22-11" aria-hidden="true" tabindex="-1"></a>              )</span>
<span id="cb22-12"><a href="#cb22-12" aria-hidden="true" tabindex="-1"></a>        )</span>
<span id="cb22-13"><a href="#cb22-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-14"><a href="#cb22-14" aria-hidden="true" tabindex="-1"></a><span class="co">-- The intermediary semigroup type that defines grouping behavior</span></span>
<span id="cb22-15"><a href="#cb22-15" aria-hidden="true" tabindex="-1"></a><span class="co">-- This corresponds directly to GroupedPostModel above</span></span>
<span id="cb22-16"><a href="#cb22-16" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">GroupedPostSemigroup</span> <span class="ot">=</span></span>
<span id="cb22-17"><a href="#cb22-17" aria-hidden="true" tabindex="-1"></a>    <span class="dt">AppendMap</span> ( <span class="dt">UserId</span> )</span>
<span id="cb22-18"><a href="#cb22-18" aria-hidden="true" tabindex="-1"></a>              ( <span class="dt">First</span> (<span class="dt">Entity</span> <span class="dt">User</span>)</span>
<span id="cb22-19"><a href="#cb22-19" aria-hidden="true" tabindex="-1"></a>              , <span class="dt">AppendMap</span> ( <span class="dt">PostId</span> )</span>
<span id="cb22-20"><a href="#cb22-20" aria-hidden="true" tabindex="-1"></a>                          ( <span class="dt">First</span> (<span class="dt">Entity</span> <span class="dt">Post</span>)</span>
<span id="cb22-21"><a href="#cb22-21" aria-hidden="true" tabindex="-1"></a>                          , <span class="dt">AppendMap</span> ( <span class="dt">PostCommentId</span> )</span>
<span id="cb22-22"><a href="#cb22-22" aria-hidden="true" tabindex="-1"></a>                                      ( <span class="dt">First</span> (<span class="dt">Entity</span> <span class="dt">PostComment</span>) )</span>
<span id="cb22-23"><a href="#cb22-23" aria-hidden="true" tabindex="-1"></a>                          )</span>
<span id="cb22-24"><a href="#cb22-24" aria-hidden="true" tabindex="-1"></a>              )</span>
<span id="cb22-25"><a href="#cb22-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-26"><a href="#cb22-26" aria-hidden="true" tabindex="-1"></a><span class="co">-- Takes our query result and produces values in the semigroup shape above</span></span>
<span id="cb22-27"><a href="#cb22-27" aria-hidden="true" tabindex="-1"></a><span class="ot">makeSinglePostGroup ::</span> <span class="dt">PostModel</span> <span class="ot">-&gt;</span> <span class="dt">GroupedPostSemigroup</span></span>
<span id="cb22-28"><a href="#cb22-28" aria-hidden="true" tabindex="-1"></a>makeSinglePostGroup (user, post, postComment) <span class="ot">=</span></span>
<span id="cb22-29"><a href="#cb22-29" aria-hidden="true" tabindex="-1"></a>    <span class="dt">AppendMap</span> <span class="op">$</span> Map.singleton</span>
<span id="cb22-30"><a href="#cb22-30" aria-hidden="true" tabindex="-1"></a>            ( entityKey user )</span>
<span id="cb22-31"><a href="#cb22-31" aria-hidden="true" tabindex="-1"></a>            ( <span class="dt">First</span> user</span>
<span id="cb22-32"><a href="#cb22-32" aria-hidden="true" tabindex="-1"></a>            , <span class="dt">AppendMap</span> <span class="op">$</span> Map.singleton</span>
<span id="cb22-33"><a href="#cb22-33" aria-hidden="true" tabindex="-1"></a>                ( entityKey post )</span>
<span id="cb22-34"><a href="#cb22-34" aria-hidden="true" tabindex="-1"></a>                ( <span class="dt">First</span> post</span>
<span id="cb22-35"><a href="#cb22-35" aria-hidden="true" tabindex="-1"></a>                , <span class="dt">AppendMap</span> <span class="op">$</span> Map.singleton (entityKey postComment) (<span class="dt">First</span> postComment)</span>
<span id="cb22-36"><a href="#cb22-36" aria-hidden="true" tabindex="-1"></a>                )</span>
<span id="cb22-37"><a href="#cb22-37" aria-hidden="true" tabindex="-1"></a>            )</span>
<span id="cb22-38"><a href="#cb22-38" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-39"><a href="#cb22-39" aria-hidden="true" tabindex="-1"></a><span class="co">-- Copied directly from the previous section</span></span>
<span id="cb22-40"><a href="#cb22-40" aria-hidden="true" tabindex="-1"></a><span class="ot">groupWith ::</span> (<span class="dt">Monoid</span> m, <span class="dt">Coercible</span> m b) <span class="ot">=&gt;</span> (r <span class="ot">-&gt;</span> m) <span class="ot">-&gt;</span> [r] <span class="ot">-&gt;</span> b</span>
<span id="cb22-41"><a href="#cb22-41" aria-hidden="true" tabindex="-1"></a>groupWith fn <span class="ot">=</span></span>
<span id="cb22-42"><a href="#cb22-42" aria-hidden="true" tabindex="-1"></a>    coerce <span class="op">.</span> <span class="fu">mconcat</span> <span class="op">.</span> <span class="fu">fmap</span> fn</span>
<span id="cb22-43"><a href="#cb22-43" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-44"><a href="#cb22-44" aria-hidden="true" tabindex="-1"></a><span class="ot">groupData ::</span> [<span class="dt">PostModel</span>] <span class="ot">-&gt;</span> <span class="dt">GroupedPostModel</span></span>
<span id="cb22-45"><a href="#cb22-45" aria-hidden="true" tabindex="-1"></a>groupData <span class="ot">=</span></span>
<span id="cb22-46"><a href="#cb22-46" aria-hidden="true" tabindex="-1"></a>    groupWith makeSinglePostGroup</span></code></pre></div>
<p>Applying this approach is straightforward once we decide how the data should be grouped. Once the grouping is defined in our <code>GroupedPostModel</code> type, writing the rest is purely mechanical. What’s more, this approach gives us the flexibility to define our groups however we’d like; we can add additional levels of nesting or add more items at each level without increasing the complexity of our <code>groupData</code> function.</p>
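<p>To illustrate the point about adding more items at a level, the sketch below attaches a set of tags to each post, next to its comments. This is an assumed extension rather than part of the schema above: the fourth row column, the <code>Set String</code> of tags, and the plain <code>(Int, String)</code> pairs used in place of <code>Entity</code> values are all hypothetical, and <code>AppendMap</code> is again a local stand-in whose <code>&lt;&gt;</code> merges values under the same key.</p>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell">import           Data.Coerce    (coerce)
import           Data.Map       (Map)
import qualified Data.Map       as Map
import           Data.Semigroup (First (..))
import           Data.Set       (Set)
import qualified Data.Set       as Set

-- Hypothetical stand-in for AppendMap: (&lt;&gt;) merges values under the same key.
newtype AppendMap k v = AppendMap (Map k v)

instance (Ord k, Semigroup v) =&gt; Semigroup (AppendMap k v) where
    AppendMap a &lt;&gt; AppendMap b = AppendMap (Map.unionWith (&lt;&gt;) a b)

instance (Ord k, Semigroup v) =&gt; Monoid (AppendMap k v) where
    mempty = AppendMap Map.empty

-- (user, post, comment, tag); comments and tags are both children of a post.
type Row = ((Int, String), (Int, String), (Int, String), String)

-- The grouped type we want: each post now carries comments and tags.
type GroupedModel =
    Map Int (String, Map Int (String, Map Int String, Set String))

-- The matching semigroup shape; Set already has the union we need.
type GroupedSemigroup =
    AppendMap Int
        ( First String
        , AppendMap Int (First String, AppendMap Int (First String), Set String)
        )

makeSingleGroup :: Row -&gt; GroupedSemigroup
makeSingleGroup ((uk, u), (pk, p), (ck, c), tag) =
    AppendMap $ Map.singleton uk
        ( First u
        , AppendMap $ Map.singleton pk
            ( First p
            , AppendMap $ Map.singleton ck (First c)
            , Set.singleton tag
            )
        )

-- Unchanged in shape: merge every row, then strip the newtypes.
groupData :: [Row] -&gt; GroupedModel
groupData = coerce . mconcat . fmap makeSingleGroup</code></pre></div>
<p>Only the two type synonyms and the singleton constructor changed; <code>groupData</code> is still a single line. Because two sibling one-to-many joins produce a cross product of rows, it matters that both <code>Map</code> and <code>Set</code> discard duplicates when merged.</p>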
<h2 id="conclusion">Conclusion</h2>
<p>Writing the most performant SQL comes with a trade-off: the data that our application receives from the database will have fanout if we have any one-to-many relationships in our data. The typical way to solve this problem involves defining a function that specifies merge behavior and using the <code>Map</code> data type to create our parent-child data, which is the first approach we explored. This works, but has shortcomings.</p>
<p>We set out to find a better solution to this problem, noticing that the <code>Semigroup</code> typeclass gives us the append operation we need to generalize the grouping behavior. After a few iterations, the approach we settled on creates an ad-hoc data type to define our intended structure and relies on instances of <code>Semigroup</code> and <code>Monoid</code> to perform the grouping as necessary. This new approach is an improvement in that it allows us to focus primarily on the type of the grouped data, with the in-between being a straightforward mechanical translation of this type. This saves us from writing cumbersome merge operations and instead relies on Haskell’s type system to achieve the intended results.</p>
<hr />
<p><em>Looking for help with something you’re working on? We’d love to hear from you. At Foxhound Systems, we focus on using Haskell to create custom built software your business can depend on. Reach out to us at <a href="mailto:info@foxhound.systems" target="_blank" rel="noopener">info@foxhound.systems</a></em></p>
        </div>
    </div>
</article>
]]></description>
    <pubDate>Mon, 23 Nov 2020 00:00:00 UT</pubDate>
    <guid>https://www.foxhound.systems/blog/grouping-query-results-haskell/index.html</guid>
    <dc:creator>Foxhound Systems</dc:creator>
</item>
<item>
    <title>Who is Foxhound Systems?</title>
    <link>https://www.foxhound.systems/blog/who-is-foxhound-systems/index.html</link>
    <description><![CDATA[<nav class="container mt-2 mb-2">
    <a href="https://www.foxhound.systems/blog/" class="no-underline magic-underline-slim">Back to all posts</a>
</nav>
<article class="grow post">
    
        <div class="lg-container post-banner">
            <picture>
                
                <img src="https://www.foxhound.systems/img/fxs-social-media-post-banner.png" alt height="600" width="1200">
            </picture>
        </div>
    
    <div class="container">
        
        <h1 class="title">Who is Foxhound Systems?</h1>
        <header class="info gy-1">
            <div class="author-date space-y-1">
                <div>
                    <span class="date">November 13, 2020</span>
                </div>
                <div class="flex flex-wrap gx-2 gy-1">
                    
                </div>
            </div>
            
            <div>
                <div class="tags"><a title="Posts tagged: foxhound-systems" href="https://www.foxhound.systems/blog/tag/foxhound-systems/">foxhound-systems</a></div>
            </div>
            
        </header>
        <hr>
        <div class="content">
        <p>Foxhound Systems is a custom software development company founded by Christian Charukiewicz and Ben Levy. We started Foxhound Systems because we care about great software, and having worked as professional software engineers for a long time, we believe we can help other organizations grow by creating fast, reliable, and well-designed software systems. We have both seen and had our hands in creating a wide variety of software and products, and one trait that we’ve shared along the way is that we’ve always been on the lookout for ways to learn, improve, and help others.</p>
<!--more-->
<h2 id="our-backstory">Our backstory</h2>
<p>Christian and Ben met in college while working on their Computer Science degrees and have each had varied careers in software development.</p>
<p>Christian started his software development career at a small game development company, where he oversaw product development. He then worked at a Chicago SaaS startup, where he spent his last 5 years as Chief Technology Officer. Over the course of his career, he has served as both technology lead and product lead, setting direction and working day-to-day in product, engineering, and infrastructure. As CTO, he introduced his team to the Haskell and Elm programming languages, which became the standard tools for new development.</p>
<p>Ben has worked for a wide variety of organizations over the course of his professional career, including eBay, Bloomberg, Boeing, as well as several smaller companies. Ben has a very broad software engineering background, and has professional experience working on a wide array of software systems and languages, ranging from aircraft flight simulators to web applications. He has spent the last few years working predominantly on web services and open source libraries written in Haskell. He is a maintainer of the <a href="https://github.com/bitemyapp/esqueleto/" target="_blank" rel="noopener">Esqueleto</a> library, which is the most widely used SQL library in the Haskell ecosystem.</p>
<p>Over the course of our professional careers, we’ve both observed numerous ways in which software projects could have been executed more effectively. We’ve witnessed this in our jobs, in talking to other professionals, and in examining applications we’ve used. We believe that there’s no need to settle for mediocre software or to accept that many software projects deliver only passable results after taking many times longer than expected to finish. We decided to start Foxhound Systems because we know we can do better, and think that we can help organizations create the software they’re hoping to get.</p>
<h2 id="who-we-are-as-a-company">Who we are as a company</h2>
<p>As a custom software development company, we believe we can help organizations grow through software that can be relied on, whether that’s working on an old system or building something completely new. There are many ways in which we describe ourselves, but here are some of our most defining traits.</p>
<ul>
<li><strong>We’re software engineers.</strong> We’ve been writing code professionally for years. We care about writing code that runs fast and is easy to maintain, and strive to make our software reliable.</li>
<li><strong>We’re functional programmers.</strong> We’re experts at Haskell and Elm and embrace the strengths of these languages. Our experience has led us to believe that statically typed functional programming helps developers write reliable and maintainable code.</li>
<li><strong>We’re ops people.</strong> We have extensive experience designing and maintaining the infrastructure that runs our code. We pride ourselves on zero-downtime deployments, and will go far out of our way to achieve them.</li>
<li><strong>We’re big on relational databases.</strong> We love SQL. Query optimization and database tuning are right in our wheelhouse—we have a ton of experience with squeezing out as much performance as possible.</li>
<li><strong>We’re product people.</strong> We focus on building applications that are not only a joy to use, but also provide significant return on investment. This means building user interfaces that are intuitive and responsive, while keeping in mind the needs of the business.</li>
</ul>
<h2 id="how-we-contribute">How we contribute</h2>
<p>We’ve benefited greatly from open source software and software communities over the course of our careers, and we continue to benefit as a company today. We’d like to give back and help these communities continue to grow. Here are a few of the things we’ve been doing, as well as what we plan to do as a company:</p>
<ul>
<li><strong>We contribute to established open source projects.</strong> We’ve mentioned our involvement in Esqueleto, and we’ve also made numerous contributions to other projects, such as Yesod. Beyond code, we also try to write or improve documentation.</li>
<li><strong>We publish new open source software.</strong> We’ve published open source software both as individuals and <a href="https://github.com/foxhound-systems" target="_blank" rel="noopener">as a company</a>. We frequently look for opportunities to extract libraries that can be used widely.</li>
<li><strong>We help others on various community platforms.</strong> You’ll often find us answering questions and helping beginners in places like the <a href="https://fpslack.com/" target="_blank" rel="noopener">Functional Programming Slack</a>, the <a href="https://www.reddit.com/r/haskell/" target="_blank" rel="noopener">/r/Haskell subreddit</a>, or the <strong>#nixos</strong> channel on Freenode.</li>
<li><strong>We will be writing about what we learn on this blog.</strong> As we discover new ways to solve some of the problems we encounter, we’ll be writing about them here.</li>
</ul>
<h2 id="welcome-aboard">Welcome aboard</h2>
<p>Thanks for joining us on this journey. We hope that what we learn and share can help you.</p>
<p>— Christian &amp; Ben</p>
<hr />
<p><em>Looking for help with something you’re working on? We’d love to hear from you. Reach out to us at <a href="mailto:info@foxhound.systems" target="_blank" rel="noopener">info@foxhound.systems</a></em></p>
        </div>
    </div>
</article>
]]></description>
    <pubDate>Fri, 13 Nov 2020 00:00:00 UT</pubDate>
    <guid>https://www.foxhound.systems/blog/who-is-foxhound-systems/index.html</guid>
    <dc:creator>Foxhound Systems</dc:creator>
</item>

    </channel>
</rss>
