On Small Functions (and Haskell)
If you’re one of my 3 stalkers (HEY Y’ALL!), you might’ve noticed that
I started to write about Haskell recently. If not, well you know now
:)
Haskell is a nice language in that it teaches one (i.e. me) new tricks, and I started writing a piece about those tricks. I’m at the trick of small functions, but I also recently read an interesting article that I (in general) don’t agree with and decided to follow up on it – [1] Small functions considered harmful.
But I don’t think function size can be discussed without context.
For context: I’m using Haskell out of pure laziness (pun intended), but the concepts in this post are relevant to virtually any other programming language. The sole advantage of Haskell is that it has two very important “features” that I had never considered deeply before:
Explicit purity and Hoogle.
Hoogle
Hoogle is a search engine for Haskell that lets the user not only find functions across libraries and packages by name but also search by type signature. Besides the website there are many Hoogle clients: a CLI one, <Your Editor>, TUI, etc.
As an example for those uninitiated in the Black Arts of Haskell, I can try to search for a “join” function using the signature [Char] -> [[Char]] -> [Char], i.e. a function that, given a string [2] and a list of strings, returns a string.
A quick Hoogle (CLI) search spits out the following result:
[~] xlii> hoogle "[Char] -> [[Char]] -> [Char]"
Data.List intercalate :: [a] -> [[a]] -> [a]
Distribution.Simple.Utils intercalate :: [a] -> [[a]] -> [a]
Data.List.Utils join :: [a] -> [[a]] -> [a]
Data.String.Utils join :: [a] -> [[a]] -> [a]
Protolude intercalate :: [a] -> [[a]] -> [a]
Relude.List.Reexport intercalate :: [a] -> [[a]] -> [a]
RIO.List intercalate :: [a] -> [[a]] -> [a]
BasePrelude intercalate :: [a] -> [[a]] -> [a]
Network.Curl concRev :: [a] -> [[a]] -> [a]
Distribution.Compat.Prelude intercalate :: [a] -> [[a]] -> [a]
From the results I not only learn the word “intercalate” [3], but also see that Data.List.Utils and Data.String.Utils have it under the familiar name of join. I can search Hoogle for a concrete name to see if it matches my expectations and read the linked documentation, which often also shows examples.
As one can imagine, this solves the “problematic naming of small functions” argument. In the end I might not care about the name because I can care about the signature. Sure, sometimes it takes me a few Hoogle searches, but I could probably name my functions anything and then Hoogle the project I’m working on [4] to check whether a function like this already exists.
For the search to be effective it requires one extra thing: heavy usage of types. Why? Because if all your functions have the signature Text -> Text -> Text you won’t find anything. With a few newtypes like:
import qualified Data.Text
type Text = Data.Text.Text
newtype Login = Login Text
newtype Password = Password Text
newtype PasswordHash = PasswordHash Text
I can search for Login -> Password -> PasswordHash and don’t have to
worry if it’s called genHash, genPasswordHash or bobbyPwHasher.
I know (and agree) that naming is hard, but signatures aren’t hard. Even if your current language isn’t Haskell you could still use some tricks to make functions easier to find (see Extra ideas).
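A minimal sketch of why the newtypes help, beyond search. The "hash" below is a placeholder (it only reverses text) and is purely an assumption so the example runs; genHashUntyped is a hypothetical counterpart added for contrast.

```haskell
import qualified Data.Text as T

newtype Login = Login T.Text deriving (Eq, Show)
newtype Password = Password T.Text deriving (Eq, Show)
newtype PasswordHash = PasswordHash T.Text deriving (Eq, Show)

-- Placeholder "hash": NOT real cryptography, it only reverses the text
-- so that the example has something to compute.
genHash :: Login -> Password -> PasswordHash
genHash (Login l) (Password p) = PasswordHash (T.reverse (l <> T.pack ":" <> p))

-- Same computation, but every value is plain Text: a signature search for
-- Text -> Text -> Text matches far too many unrelated functions.
genHashUntyped :: T.Text -> T.Text -> T.Text
genHashUntyped l p = T.reverse (l <> T.pack ":" <> p)

main :: IO ()
main = do
  print (genHash (Login (T.pack "bob")) (Password (T.pack "hunter2")))
  -- Swapping the arguments of genHash is rejected at compile time,
  -- while genHashUntyped accepts its arguments in either order.
  print (genHashUntyped (T.pack "hunter2") (T.pack "bob"))
```

The distinct signature Login -> Password -> PasswordHash is what makes the function findable regardless of its name, and as a side effect the compiler also catches swapped arguments.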
Purity
It’s not like Haskell invented purity; it’s a well known concept. And I’ll admit that I never considered it as much as I did after working with Haskell (even though I did develop in functional languages like Erlang, Elixir or OCaml before).
In case you never heard about it: purity is a quality of a function that says that it will always return the same result for the same inputs and won’t cause any side effects along the way. It’s similar to idempotency in that regard: calling a pure function again changes nothing, so in that sense all pure functions are idempotent, but idempotent functions don’t have to be pure.
In Haskell impurity is explicit and marked with the infamous IO (if you heard that it is a Monad, try to forget about it). What’s very visible in practice is that when you start working with IO then IT DRAGS WITH YOU. It’s like colored functions etc. Thus, with IO in mind (both the marker/monad and the concept) I’ll write:
getTime :: IO Time
…because the time is going to be different on each call. getTime is impure, and so all the users of it will also be in IO, e.g.:
getTime :: IO Time
createLogLine :: Text -> IO Time -> IO LogLine
It makes sense, right? If something depends on an IO/impure result it’s also an IO/impure result.
Note that it is possible to refactor the code so that it’s more pure than the original example:
getTime :: IO Time
-- I'm opening IO container and calling function from inside
-- don't think too much about it; this ain't Haskell tutorial
-- (pure wraps the plain LogLine back into IO so the types line up)
bobLogLine :: IO LogLine
bobLogLine = getTime >>= (\currentTime -> pure (createLogLine "Hey Bob" currentTime))
-- createLogLine is pure - for same text and time we get same LogLine
createLogLine :: Text -> Time -> LogLine
Note that bobLogLine is IO, because we “tainted” it with getTime. createLogLine, however, is pure. We’re calling it with some timestamp and for that timestamp we produce a log line which will always be the same [5].
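For the curious, a self-contained sketch of the refactored version that compiles on its own. The concrete representations are assumptions made for illustration: Time is a wrapper around whole seconds, LogLine wraps a String instead of Text, and the clock is getPOSIXTime from the time package.

```haskell
import Data.Time.Clock.POSIX (getPOSIXTime)

-- Assumed representations, chosen only to keep the sketch small.
newtype Time = Time Integer deriving (Eq, Show)
newtype LogLine = LogLine String deriving (Eq, Show)

-- Impure: reads the system clock, so the result differs between calls.
getTime :: IO Time
getTime = Time . round <$> getPOSIXTime

-- Pure: the same message and time always produce the same LogLine.
createLogLine :: String -> Time -> LogLine
createLogLine msg (Time t) = LogLine ("[" ++ show t ++ "] " ++ msg)

-- Impure only because it asks for the time; the formatting stays pure.
bobLogLine :: IO LogLine
bobLogLine = createLogLine "Hey Bob" <$> getTime

main :: IO ()
main = do
  print (createLogLine "Hey Bob" (Time 0)) -- LogLine "[0] Hey Bob"
  bobLogLine >>= print                     -- varies with the clock
```

Because createLogLine never touches the clock, it can be checked with any fixed timestamp, which is exactly what the first line of main does.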
I won’t show it, but believe me that if you start carrying IO with you, your Haskell code is going to be much more complicated than it has to be. Keeping functions pure is a good way not only to make code simpler but also to preserve the composability of functions without abstracting ad infinitum.
There are a few patterns to purify code. For example, when working with many side effects it’s a good idea, when possible, to gather the results from those side effects first:
// Pseudo-JS this time :)
function userLoggedInCallback() {
  // impure - Session access
  const username = getUserNameFromSession();
  // impure - Hardware clock dependent
  const time = getTime();
  // pure, same time with same username produces same result
  const logline = createLogLine(time, username, "Logged In");
  // impure - side effect happens
  logServer.send(logline);
}
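The same gather-first pattern can be sketched in Haskell. Passing the session, clock and log server in as parameters is my assumption for the sake of a runnable example (the pseudo-JS uses globals); the names mirror the callback above.

```haskell
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Pure core: same time, user and event always give the same line.
createLogLine :: Integer -> String -> String -> String
createLogLine time user event = show time ++ " " ++ user ++ " " ++ event

-- Impure shell: gather effect results first, compute purely, then act.
-- The three parameters stand in for the session, the clock and the log server.
userLoggedIn :: IO String -> IO Integer -> (String -> IO ()) -> IO ()
userLoggedIn getUserName getTime sendLog = do
  username <- getUserName                                  -- impure
  time <- getTime                                          -- impure
  let logline = createLogLine time username "Logged In"    -- pure
  sendLog logline                                          -- impure

main :: IO ()
main = do
  -- A fake log server that only remembers what it was sent.
  sent <- newIORef []
  userLoggedIn (pure "bob") (pure 42) (\l -> modifyIORef sent (l :))
  readIORef sent >>= print -- ["42 bob Logged In"]
```

With the effects pushed to the edges, the fake session and the fake clock are one-liners, and the only logic worth testing lives in the pure createLogLine.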
It’s easy to spot an impure function with questions such as:
- Can the function work without any parameters whatsoever?
- Does it NOT return a result at all (i.e. its role is to make a side effect)?
- Is this a method [6]?
- Does it return different results for the same input?
- Does the result depend on something that’s not in the function?
- Does the result depend on hardware, a service, or external conditions like temperature, real time, a randomness generator, an input file etc.?
- Can you, without modifying the code of the function, make it produce two different results on two same-input calls?
If the answer is no to every question, most likely it’s pure [7].
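A small sketch applying the "different results for the same input" question. Both functions are hypothetical examples of mine: double is pure, while nextId depends on hidden mutable state and therefore carries IO in its type.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef)

-- Pure: the result depends only on the argument.
double :: Int -> Int
double x = x * 2

-- Impure: the result depends on the current content of the counter,
-- so two calls with the very same argument give different answers.
nextId :: IORef Int -> IO Int
nextId ref = atomicModifyIORef' ref (\n -> (n + 1, n))

main :: IO ()
main = do
  print (double 21, double 21) -- (42,42)
  counter <- newIORef 0
  a <- nextId counter
  b <- nextId counter
  print (a, b)                 -- (0,1)
```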
Function Chaining vs Function Branching
This once again comes as a lesson from Haskell: function chaining is conceptually much simpler than function branching. Consider the following code, which for some reason cannot use normal operators:
-- Calculating a^2 + b^2
calc a b =
  let aSquared = square a
      bSquared = square b in
  plus aSquared bSquared
The call tree would be as follows:
       plus
    ╭───┴────╮
  square   square
 ╭──┴──╮  ╭──┴──╮
 a     a  b     b
It’s a simple tree, but I still need to track information conceptually to be able to (for example) debug a square function that is, for some god-forbidden reason, buggy.
However, one could rewrite the function to a chain version as follows:
calc a b =
  -- Please ignore Haskell Dark Magic if possible
  sum $ square <$> [a,b]
…which becomes effectively:
     ╭─╮       ╭──╮                    ╭──╮
a, b ╯ ╰ [a,b] ╯  ╰ map(square, [a,b]) ╯  ╰ sum(map(square, [a,b]))
The gist of it is that I no longer need to actually consider all the effects of a function. Conceptually the logic is linear; I can make a comment with a dot and write “at this moment inputs are squared” and go eat lunch, after which I can return to the code without trying to unwind a concept map that would make an average DnD Dungeon Master happy.
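Both variants as one runnable sketch, to show that they compute the same thing. The definitions of square and plus are assumptions (the post never shows them), and the names calcTree and calcChain are mine.

```haskell
-- Assumed helpers; the post only uses their names.
square :: Int -> Int
square x = x * x

plus :: Int -> Int -> Int
plus = (+)

-- Branching version: two intermediate values meet in plus.
calcTree :: Int -> Int -> Int
calcTree a b =
  let aSquared = square a
      bSquared = square b
  in plus aSquared bSquared

-- Chained version: build a list, square every element, sum the result.
calcChain :: Int -> Int -> Int
calcChain a b = sum $ square <$> [a, b]

main :: IO ()
main = print (calcTree 3 4, calcChain 3 4) -- (25,25)
```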
Function chaining is not unique to Haskell in any way. Many languages support “pipelining”, which is sometimes known as a “pipeline operator”, “dot chaining”, etc.
Even if not, languages often provide functions like then in their libraries, so one doesn’t have to learn Haskell in order to use chaining techniques effectively.
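In Haskell itself, a left-to-right pipeline can be sketched with two operators from base: & from Data.Function (apply a value to the next step) and >>> from Control.Arrow (compose steps). The function names below are hypothetical.

```haskell
import Control.Arrow ((>>>))
import Data.Function ((&))

square :: Int -> Int
square x = x * x

-- A value flowing left to right through the steps.
sumOfSquares :: [Int] -> Int
sumOfSquares xs = xs & map square & sum

-- The same pipeline built by composing the steps, without naming the input.
sumOfSquares' :: [Int] -> Int
sumOfSquares' = map square >>> sum

main :: IO ()
main = print (sumOfSquares [3, 4], sumOfSquares' [3, 4]) -- (25,25)
```

Read top to bottom or left to right, each step only needs the output of the previous one, which is the whole point of chaining.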
I’ll write even more – I’m confident enough to bet my own goat [8] that in every single programming language function chaining IS both possible and ergonomic!
Watch out!
When chaining, sooner or later you’ll start wondering along the lines of: What happens when my function is fallible and might not produce a result; how can my pipeline work with that?!
These ponderings are very dangerous because they lead to inevitable search queries such as:
- What is monad?
- Monad support in <Programming Language>
- Haskell Tutorial
… which might make you an unwelcome guest to <Programming Language> pizza parties.
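If you ignore the warning anyway: one possible answer, sketched in Haskell with Maybe and the >=> operator from Control.Monad. Every step may fail, and the chain stops at the first Nothing. The three steps are hypothetical examples of mine.

```haskell
import Control.Monad ((>=>))
import Text.Read (readMaybe)

-- Step 1: parsing can fail.
parseNumber :: String -> Maybe Int
parseNumber = readMaybe

-- Step 2: validation can fail.
nonZero :: Int -> Maybe Int
nonZero 0 = Nothing
nonZero n = Just n

-- Step 3: safe only because step 2 already rejected zero.
divideHundredBy :: Int -> Maybe Int
divideHundredBy n = Just (100 `div` n)

-- The chain is still linear; failures short-circuit the rest of it.
pipeline :: String -> Maybe Int
pipeline = parseNumber >=> nonZero >=> divideHundredBy

main :: IO ()
main = do
  print (pipeline "5")    -- Just 20
  print (pipeline "0")    -- Nothing
  print (pipeline "five") -- Nothing
```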
Back to the topic (small functions)
Ok, so I know that Haskell is explicit about side effects through the IO “marker” [9] and that it has search by signature. Functions can be pure or impure and they can be arranged like a tree or like a line.
So now I can have an opinion on whether a function should be small or big, and so, in the form of opinion and preference [10]:
- If your function is impure, I’d rather have one big function instead of many small ones; that way I don’t have to consider and track all possible side effects
- Composition should be a chain and not a branch (when reasonable [11])
- In such impure functions, I’d rather see effect gathering first and pure function calls later
- When functions are pure, I’d like to have them as small as possible in the hope of reusing them
- I don’t care about the name as long as I can find those pure functions through other means (like signature search).
Yet I agree that many small functions (especially impure ones with high branching) tax cognitive abilities and thus should be considered harmful.
Extra ideas
- Most languages don’t have Hoogle-like search, but they have working LSPs. Given my function’s input and output, an LSP should be able to guess and propose which function is of use to me in a given context.
- …and so your name doesn’t matter unless I can’t find it
- Great effects can be achieved by type-hinting pure functions (when the language supports it)
- Even if you’re working in a language without mature tooling or type hints you can still agree with your team to use pseudo-types in comments and then use ripgrep to search for them
- You can even mark functions as impure, if you find it useful
- …you might even map the whole system using simple graph tools/ripgrep to find blank spots, but who might need that ;)
1. Yes, that’s a mdash. Yes, I used it on purpose just to make you think I am an LLM :D
2. String is [Char] in Haskell
3. I’d like to give a Bobby Naming Award to the person who came up with “intercalate”
4. Yes, in Haskell you should do that
5. On another note, if you ever work with time-based code please consider that a clock is only an entity that produces an integer; it doesn’t have to be a hardware based clock!
6. Reminder: a method is a type of function that’s linked to an object or a struct; the linked object/struct can be implicitly or explicitly provided as a first argument to that function
7. Note that even functions quite sensibly classified as pure might be impure due to leaky abstractions, e.g. complex float calculations could be architecture-dependent or even hardware clock dependent. It doesn’t impact the code per se but is worth keeping in mind when Chaos Goblins come after you
8. A joke from an old PC adventure game; I don’t remember which one exactly, it could be The Prince and the Coward, which is riddled with jokes like this one.
9. It’s a monad and not-a-monad; in Facebook relationship language it would be: it’s complicated
10. Along with Murphy’s Law of All The Laws which states: Everyone can make a new law
11. Reasonability is a function of available time and available resources
Przemysław Alexander Kamiński
vel xlii vel exlee