2025-11-16

On Small Functions (and Haskell)

If you’re one of my 3 stalkers (HEY Y’ALL!), you might’ve noticed that I started to write about Haskell recently. If not, well you know now :)

Haskell is a nice language in that it teaches one (i.e. me) new tricks, tricks that I've started writing a piece about. I’m at the trick of small functions, but I also recently read an interesting article that I (in general) don’t agree with, and decided to follow up on it –¹ Small functions considered harmful.

But I don’t think function size can be discussed without a context.

For context: I’m using Haskell out of pure laziness (pun intended), but the concepts in this post are relevant to virtually any other programming language. The sole advantage of Haskell here is that it has two very important “features” that I never considered deeply before:

Explicit purity and Hoogle.

Hoogle

Hoogle is a search engine for Haskell that lets the user not only find functions across libraries and packages but also search by type signature. Besides the website there are many Hoogle clients: a CLI one, <Your Editor>, TUI, etc.

As an example for those uninitiated in the Black Arts of Haskell, I can search for a “join” function using the signature [Char] -> [[Char]] -> [Char] ², i.e. a function that, given a string and a list of strings, returns a string.

A quick Hoogle (CLI) search spits out the following result:

[~] xlii> hoogle "[Char] -> [[Char]] -> [Char]"
Data.List intercalate :: [a] -> [[a]] -> [a]
Distribution.Simple.Utils intercalate :: [a] -> [[a]] -> [a]
Data.List.Utils join :: [a] -> [[a]] -> [a]
Data.String.Utils join :: [a] -> [[a]] -> [a]
Protolude intercalate :: [a] -> [[a]] -> [a]
Relude.List.Reexport intercalate :: [a] -> [[a]] -> [a]
RIO.List intercalate :: [a] -> [[a]] -> [a]
BasePrelude intercalate :: [a] -> [[a]] -> [a]
Network.Curl concRev :: [a] -> [[a]] -> [a]
Distribution.Compat.Prelude intercalate :: [a] -> [[a]] -> [a]

From the results I not only learn the word “intercalate”³, but also see that Data.List.Utils and Data.String.Utils offer it under the familiar name join. I can then search Hoogle for the concrete name to see if it matches my expectations and read the linked documentation, which often shows examples too.
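To make that search result concrete, here’s a minimal sketch of what intercalate from Data.List (it ships with base) does. joinWithComma is a name I made up for illustration; it is not a library function.

```haskell
import Data.List (intercalate)

-- intercalate :: [a] -> [[a]] -> [a]
-- Puts the separator between the lists and flattens the result.
-- With String = [Char] this is the familiar "join".

-- Hypothetical helper, only here to show the call shape.
joinWithComma :: [String] -> String
joinWithComma = intercalate ", "

-- The same function works on any list, not just strings.
joinWithZero :: [[Int]] -> [Int]
joinWithZero = intercalate [0]
```

In GHCi, joinWithComma ["a", "b", "c"] gives "a, b, c" and joinWithZero [[1, 2], [3]] gives [1,2,0,3].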

As one can imagine, this solves the “problematic naming of small functions” argument. In the end I might not care about the name, because I can care about the signature instead. Sure, sometimes it takes me a few Hoogle searches, but I could probably name my functions anything and then Hoogle the project I’m working on ⁴ to check whether something like this function already exists.

For search to be effective it requires one extra thing - heavy usage of types. Why?

Because if all your functions have the signature Text -> Text -> Text you won’t find anything. With a few newtypes like:

import qualified Data.Text

type Text = Data.Text.Text
newtype Login = Login Text
newtype Password = Password Text
newtype PasswordHash = PasswordHash Text

I can search for Login -> Password -> PasswordHash and don’t have to worry if it’s called genHash, genPasswordHash or bobbyPwHasher.
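A minimal sketch of how those newtypes pay off. To stay self-contained it wraps Data.Text directly instead of going through the alias, and the “hashing” is a made-up toy (it just reverses the text) that must never be used for real passwords. The only point is that the signature Login -> Password -> PasswordHash is searchable, whatever silly name the function has.

```haskell
import qualified Data.Text as T

newtype Login = Login T.Text deriving (Eq, Show)
newtype Password = Password T.Text deriving (Eq, Show)
newtype PasswordHash = PasswordHash T.Text deriving (Eq, Show)

-- Toy stand-in for a real hash function: NOT secure, illustration only.
-- It joins login and password with ':' and reverses the result.
bobbyPwHasher :: Login -> Password -> PasswordHash
bobbyPwHasher (Login login) (Password password) =
  PasswordHash (T.reverse (login <> T.pack ":" <> password))

-- Swapping the arguments is now a compile error instead of a silent bug:
--   bobbyPwHasher (Password ...) (Login ...)   -- rejected by the type checker
```

With plain Text -> Text -> Text the swapped call would compile just fine, and the search would return every two-argument text function in the project.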

I know (and agree) that naming is hard, but signatures aren’t hard. Even if your current language isn’t Haskell you could still use some tricks to make functions easier to find (see Extra ideas).

Purity

It’s not like Haskell invented purity; it’s a well known concept. And I’ll admit that I never considered it as much as I did after working with Haskell (even though I did develop in functional languages like Erlang, Elixir or OCaml before).

In case you never heard about it: purity is a quality of a function that says it will always return the same result for the same inputs and won’t cause any side effects along the way. It’s similar to idempotency in the “calling it again changes nothing” sense (every pure function is idempotent in that sense, but idempotent functions don’t have to be pure).

In Haskell this impurity is explicit and marked with the infamous IO (if you’ve heard that it’s a Monad, try to forget about it). What’s very visible in practice is that once you start working with IO, IT DRAGS WITH YOU. It’s like colored functions, etc. Thus, with IO in mind (both the marker/monad and the concept), I’ll write:

getTime :: IO Time

…because the time is going to be different on each call. getTime is impure, and so everything that uses it will also be in IO, e.g.:

getTime :: IO Time
createLogLine :: Text -> IO Time -> IO LogLine

It makes sense, right? If something depends on an IO/impure result, it’s also an IO/impure result.

Note that it is possible to refactor the code so that it’s more pure than the original example:

getTime :: IO Time

-- I'm opening the IO container, calling the pure function from inside
-- and wrapping the result back up with `pure`
-- don't think too much about it; this ain't a Haskell tutorial
bobLogLine :: IO LogLine
bobLogLine = getTime >>= (\currentTime -> pure (createLogLine "Hey Bob" currentTime))

-- createLogLine is pure - for same text and time we get same LogLine
createLogLine :: Text -> Time -> LogLine

Note that bobLogLine is IO, because we “tainted” it with getTime. createLogLine, however, is pure. We’re calling it with some timestamp and for that timestamp we produce a logline which will always be the same ⁵.
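Here’s a runnable sketch of that split, under a few assumptions of mine: Time is just a wrapper around whole seconds, LogLine wraps a String (not Text, to keep the example short), the clock is getPOSIXTime from the time package, and the log format is invented.

```haskell
import Data.Time.Clock.POSIX (getPOSIXTime)

-- Assumed representations, not from any real logging library.
newtype Time = Time Integer deriving (Eq, Show)
newtype LogLine = LogLine String deriving (Eq, Show)

-- Impure: asks the system clock, so the result differs between calls.
getTime :: IO Time
getTime = Time . round <$> getPOSIXTime

-- Pure: the same text and time always give the same LogLine.
createLogLine :: String -> Time -> LogLine
createLogLine message (Time seconds) =
  LogLine ("[" ++ show seconds ++ "] " ++ message)

-- IO only because of getTime; <$> applies the pure function inside IO.
bobLogLine :: IO LogLine
bobLogLine = createLogLine "Hey Bob" <$> getTime
```

The nice side effect (pun intended) is that createLogLine can be checked with a fixed timestamp, e.g. createLogLine "Hey Bob" (Time 42) is always LogLine "[42] Hey Bob", no clock required.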

I won’t show it, but believe me that if you start carrying IO with you, your Haskell code is going to be much more complicated than it has to be. Keeping functions pure is a good way not only to make code simpler but also to preserve the composability of functions without abstracting ad infinitum.

There are a few patterns to purify code. For example, when working with many side effects it’s a good idea (when possible) to gather the results of those side effects first:

// Pseudo-JS this time :)
function userLoggedInCallback() {
  // impure - Session access
  const username = getUserNameFromSession();
  // impure - Hardware clock dependent
  const time = getTime();

  // pure, same time with same username produces same result
  const logline = createLogLine(time, username, "Logged In");

  // impure - side effect happens
  logServer.send(logline);
}
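The same gather-first shape can be sketched in Haskell. This is only one possible arrangement: here the three effects are passed in as arguments (my own assumption, as are all the names and the log format), which lets the callback run against fakes without a session, a clock or a log server.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)

newtype Time = Time Integer
newtype LogLine = LogLine String deriving (Eq, Show)

-- Pure: same time, username and event always produce the same line.
createLogLine :: Time -> String -> String -> LogLine
createLogLine (Time t) username event =
  LogLine (show t ++ " " ++ username ++ ": " ++ event)

-- Impure shell: gather effect results first, compute purely, emit last.
userLoggedInCallback
  :: IO String            -- impure: session access
  -> IO Time              -- impure: clock
  -> (LogLine -> IO ())   -- impure: sending the line somewhere
  -> IO ()
userLoggedInCallback getUserName getTime send = do
  username <- getUserName
  time <- getTime
  let logline = createLogLine time username "Logged In"
  send logline

-- Runs the callback with fake effects and returns what was "sent".
runWithFakes :: IO [LogLine]
runWithFakes = do
  sent <- newIORef []
  userLoggedInCallback (pure "bob") (pure (Time 42)) (\l -> modifyIORef' sent (l :))
  readIORef sent
```

runWithFakes yields [LogLine "42 bob: Logged In"] every time, because the only moving parts were replaced with constants.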

It’s easy to spot an impure function with questions such as:

Can the function work without any parameters whatsoever?
Does it NOT return a result at all (i.e. its role is to make a side effect)?
Is this a method? ⁶
Does it return different results for the same input?
Does the result depend on something that’s not in the function?
Does the result depend on hardware, a service, or external conditions like temperature, real time, a randomness generator, an input file etc.?
Can you - without modifying the code of the function - make it produce two different results on two same-input calls?

If the answer is no to every question - most likely it’s pure ⁷.

Function Chaining vs Function Branching

This once again comes as a lesson from Haskell, but chaining functions is conceptually much simpler than branching them. Consider the following code, which for some reason cannot use normal operators:

-- Calculating a^2 + b^2
calc a b =
  let aSquared = square a
      bSquared = square b in
  plus aSquared bSquared

The call tree would be as follows:

     plus
   ╭───┴────╮
square  square
╭──┴──╮  ╭──┴──╮
a     a  b     b

It’s a simple tree, but I still need to keep track of it in my head to be able to (for example) debug a square function that, god forbid, turns out to be buggy.

However, one could rewrite the function to a chain version as follows:

calc a b =
  -- Please ignore Haskell Dark Magic if possible
  sum $ square <$> [a,b]

…which becomes effectively:

     ╭─╮       ╭──╮                    ╭──╮
a, b ╯ ╰ [a,b] ╯  ╰ map(square, [a,b]) ╯  ╰ sum(map(square, [a,b]))

The gist of it is that I no longer need to actually consider all the effects of a function. Conceptually the logic is linear; I can make a comment with a dot and write “at this moment inputs are squared” and go eat lunch, after which I can return to the code without trying to unwind a concept map that would make an average DnD Dungeon Master happy.
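Both versions side by side, as a runnable sketch. square and plus get trivial definitions of my own here (the post leaves them undefined), and the chain is written with & from Data.Function so that it reads left to right like the diagram above.

```haskell
import Data.Function ((&))

-- Stand-ins for the operators we supposedly cannot use.
square :: Int -> Int
square x = x * x

plus :: Int -> Int -> Int
plus = (+)

-- Branching: two intermediate results meet in `plus`.
calcBranch :: Int -> Int -> Int
calcBranch a b =
  let aSquared = square a
      bSquared = square b
   in plus aSquared bSquared

-- Chaining: one value flows through one step after another.
calcChain :: Int -> Int -> Int
calcChain a b =
  [a, b]
    & map square -- at this moment inputs are squared
    & sum
```

calcBranch 3 4 and calcChain 3 4 both give 25; only the shape of the reasoning differs.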

Function chaining is not unique to Haskell in any way. Many languages support “pipelining” which is sometimes known as “pipeline operator”, “dot chaining”, etc.

Even if not, languages often provide functions like then in their standard library, so one doesn’t have to learn Haskell in order to use chaining techniques effectively.

I’ll write even more – I’m confident enough to bet my own goat⁸ that in every single programming language function chaining IS both possible and ergonomic!

Watch out!

When chaining, sooner or later you’ll start wondering along the lines of: What happens when my function is fallible and might not produce a result; how can my pipeline work with that?!

These ponderings are very dangerous because they lead to inevitable search queries such as:

What is monad?
Monad support in <Programming Language>
Haskell Tutorial

… which might make you an unwelcome guest to <Programming Language> pizza parties.
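For those willing to risk the pizza, here’s a minimal sketch of one way a chain can survive fallible steps in Haskell: each step returns a Maybe, and >>= only runs the next step if the previous one produced a value. The function names and the “100 divided by the input, then squared” task are made up for illustration.

```haskell
import Text.Read (readMaybe)

-- Fallible: the input might not be a number.
parseNumber :: String -> Maybe Int
parseNumber = readMaybe

-- Fallible: division by zero has no result.
safeDivide :: Int -> Int -> Maybe Int
safeDivide _ 0 = Nothing
safeDivide n d = Just (n `div` d)

-- Cannot fail.
square :: Int -> Int
square x = x * x

-- Still a straight line: parse, divide 100 by the number, square.
-- The first Nothing short-circuits everything after it.
hundredOverSquared :: String -> Maybe Int
hundredOverSquared input =
  parseNumber input >>= safeDivide 100 >>= pure . square
```

hundredOverSquared "5" is Just 400, while "0" and "abc" both end up as Nothing, without a single if in the pipeline.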

Back to the topic (small functions)

Ok, so I know that Haskell is explicit about side effects through the IO “marker” ⁹ and that it has search by signature. Functions can be pure or impure, and they can be arranged like a tree or like a line.

So now I can have an opinion on whether functions should be small or big. Here it is, in the form of opinions and preferences¹⁰:

If a function is impure, I’d rather have one big function than many small ones - that way I don’t have to consider and track all possible side effects
Composition should be a chain and not a branch (when reasonable¹¹)
In such impure functions, I’d rather see effect gathering first and pure function calls later
When functions are pure, I’d like them to be as small as possible, in the hope of reusing them
I don’t care about the name as long as I can find those pure functions through other means (like signature search).

Yet I agree that many small functions (especially impure ones with high branching) tax one’s cognitive abilities and thus should be considered harmful.

Extra ideas

Most languages don’t have a Hoogle-like search, but they do have working LSPs - given my function’s input and output, an LSP should be able to guess and propose which function is of use to me in a given context.
…and so the name you picked doesn’t matter unless I can’t find the function
Great effects can be achieved by type-hinting pure functions (when the language supports it)
Even if you’re working in a language without mature tooling or type hints, you can still agree with your team to use pseudo-types in comments and then use ripgrep to search for them
In the same way you can even mark functions as impure, if you find it useful
…you might even map the whole system using simple graph tools/ripgrep to find blank spots, but who might need that ;)

Yes, that’s an mdash. Yes, I used it on purpose just to make you think I am an LLM :D ↩︎
String is [Char] in Haskell ↩︎
I’d like to give a Bobby Naming Award to the person who came up with “intercalate” ↩︎
Yes, in Haskell you should do that ↩︎
On another note, if you ever work with time-based code please consider that a clock is only an entity that produces an integer; it doesn’t have to be a hardware based clock! ↩︎
Reminder - a method is a type of function that’s linked to an object or a struct; the linked object/struct can be implicitly or explicitly provided as the first argument to that function ↩︎
Note that even functions quite sensibly classified as pure might be impure due to leaky abstractions, e.g. complex float calculations could be architecture-dependent or even hardware-clock-dependent - it doesn’t impact the code per se but is worth keeping in mind when the Chaos Goblins come after you ↩︎
Note: I don’t have a goat¹² ↩︎
It’s a monad and not-a-monad; in Facebook relationship language it would be: it’s complicated ↩︎
Along with Murphy’s Law of All The Laws which states: Everyone can make a new law ↩︎
Reasonability is a function of available time and available resources ↩︎
A joke from an old PC adventure game. I don’t remember which one exactly; it could be The Prince and the Coward, which is riddled with jokes like this one. ↩︎