Emacs: The MacOS Bug

The Context
I have been roaming recently.
Doing some Zig, doing some Go, some Janet. Some C integration. Should have focused on my project but life threw more at me than I could handle, so I sought… happy distractions.
My experience with those technologies taught me new tricks, and one day, when I needed some more distraction, I decided to debug something that had made me furious for years: Emacs jank [1].
Whatever build I tried, whatever configuration I used, Emacs’ RAM usage always kept growing. It became slower and slower within hours of use and eventually froze completely, requiring a kill.
I have read many complaints about performance and searched for debugging techniques, possible causes, etc. Some things helped, others didn’t, but nothing brought my experience closer to that on Linux or Windows.
All the sanctioned profiling I did hadn’t uncovered any anomalies. Crash dumps looked fine, sometimes showing that a hang happened in ns_select_1, something I wouldn’t expect to be a problem. Everything was reporting near-perfect behavior. The only thing that seemed off was the CPU- and memory-heavy redisplay_internal, but rendering a big canvas on a HiDPI display might be costly.
It seemed to be a case of “It’s not you, it’s me”. Until I got my hands on the traces.
Meta-consideration
Before going any further, I need to emphasize that, after spending more than a week digging into nsterm.m [2], I’m not criticizing the design or implementation.
It’s easy to claim that something was badly designed or implemented inefficiently. Experience often points to simple reasons: growing complexity and lack of time.
In the specific case of Emacs’ MacOS code, I could almost see how it developed over time and how layers upon layers were carefully added, slowly tangling roots into a complex mesh. With the ever-changing MacOS API, maintaining the code likely took more effort than maintainers could provide. Besides, it’s no secret that MacOS isn’t the primary platform - which explains the multiple flavors of Emacs on MacOS: Emacs Doom, MacPorts Emacs, Emacs Makefile, Homebrew distribution, Homebrew Emacs-Plus or Mitsuharu Yamamoto’s emacs-mac, to name a few.
Even though the bug lives on such a non-primary platform, the visible care put into the code makes it easy to track down, even for me - and I’m not an expert in Objective-C, C or MacOS system engineering.
Issue
The problem is awkward because it revolves around invocations of [NSApp run] [3].
This invocation starts the main run loop, drains all pending events (the queue is shared) and then exits. It doesn’t seem to be problematic at all. Yet, it’s often used to process incoming low-level system messages, notably inside ns_select (a replacement for pselect) and ns_read_socket, a communication pipe between “core” Emacs and the GUI.
And inside it, there is a barely visible side effect. The run invocation often occurs within NSAutoreleasePool init, which starts a blank memory slate. Since it initiates a blank run, it creates everything that the main run loop would create during initialization: it creates windows, initializes the graphical context, loads glyphs and fonts, and draws the frame.
It sounds horrible. But does it make sense? If it’s that bad - why didn’t anyone notice?
Because Emacs is very efficient.
This invocation is run as a response to an event - which can be almost anything. MacOS processes events rapidly in an async way. However, it’s also very efficient at juggling resources. Emacs, a pinnacle of engineering excellence, sets up and tears down resources in milliseconds.
…and the faster your hardware is, the faster it happens.
For instance, dragging a window handle - depending on the machine - could result in thousands if not millions of such events, causing allocation and reallocation of gigabytes of memory; Emacs is preparing you a new, laid-out window that is deallocated half a millisecond later because you’re not done yet.
Rapid allocations and deallocations cause various issues. Some allocations aren’t released, and that is on purpose, because it’s the “core” Emacs that should handle closing, releasing resources, etc. However, since it doesn’t expect those phantom instances, some small allocations might be left hanging. NSMenu items, some strings, and glyphs could be affected. Emacs is unaware that pselect might be doing that, so it’s not cleaning them up.
Amusingly, when MacOS sees thousands of similar allocations, it starts to… cache them, “perceiving” them as important. They’re mostly useless: already loaded font glyphs in live windows, menu strings, copies of the current state of the rendered frame for use in the next re-rendering events. Those objects are allocated in hundreds of thousands, maybe millions. Meanwhile, things like Emacs variables, search results, and reads, which number only in the hundreds, are being pushed further and further out.
All of this processing is unbounded. The faster your Mac can process the events, the more loops happen. The higher the DPI of your display, the bigger the IOSurface size [4] and the bigger the allocations. The newer your MacBook, the more sensitive it is, dispatching more mouse events.
In short: the faster your Mac, the slower Emacs gets.
For many, this slowness won’t be a surprise. There are plenty of complaints about slowness on MacOS, especially around popular packages. Package authors often dismiss such issues as an external problem, and on that point they’re correct.
Unfortunately, getting into Emacs debugging on MacOS is not easy, so it’s hard to provide any proof of that behavior (see the tutorial if you want to try).
This concrete problem is easy to observe: open a fresh Emacs instance (with -Q), grab the window handle and resize like a loon for 10 seconds. Then check Activity Monitor’s memory tab. Another easy way is to use the following snippet, courtesy of Rudolf Adamkovič, sent to the emacs-devel mailing list:
;; Create and immediately delete 1000 frames, pausing briefly in between.
(dotimes (x 1000)
  (let ((frame (make-frame-command)))
    (sleep-for 0.01)
    (delete-frame frame)))
Check out the memory usage.
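To put a number on the growth rather than eyeballing Activity Monitor, the same loop can be wrapped in a small measuring helper. This is only a minimal sketch and not part of the original report: the names my/emacs-rss-kb and my/report-frame-churn are hypothetical, and it assumes a ps(1) that accepts -o rss= -p, as the one shipped with MacOS does.

```elisp
;; Hypothetical helpers for illustration; they are not part of Emacs.

(defun my/emacs-rss-kb ()
  "Return the resident set size of this Emacs process in KB, via ps(1)."
  (string-to-number
   (shell-command-to-string
    (format "ps -o rss= -p %d" (emacs-pid)))))

(defun my/report-frame-churn (count)
  "Create and delete COUNT frames, then report RSS before and after.
Return a list (BEFORE AFTER), both in KB."
  (let ((before (my/emacs-rss-kb)))
    (dotimes (_ count)
      (let ((frame (make-frame-command)))
        (sleep-for 0.01)
        (delete-frame frame)))
    ;; Collect Lisp garbage first, so that what remains is less likely
    ;; to be ordinary Lisp-level garbage.
    (garbage-collect)
    (let ((after (my/emacs-rss-kb)))
      (message "RSS before: %d KB, after: %d KB, delta: %d KB"
               before after (- after before))
      (list before after))))
```

Evaluating (my/report-frame-churn 1000) in an emacs -Q session prints the difference in the echo area. Resident set size is a coarse measure, so treat the delta as an indication rather than an exact count of leaked memory.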
What can be done?
The unfortunate situation is that it cannot be easily fixed. The code is deeply rooted, and - I can bet - there ARE reasons why it is the way it is.
As of the moment of writing these words, there is a discussion on emacs-devel on how to address and alleviate the issues. Even in the best case though, things won’t be as great as they are on Linux or Windows.
Emacs on MacOS still uses only 3 threads (Main, EventDispatch, and FD handler); it will still use locks heavily and process (and wait) in infinite loops. To bring Emacs on MacOS up to par, deep work on the event queue and proper threading support is required. While this won’t break anything for others, it might make memory management more efficient.
While investigating this issue and attempting to improve memory efficiency, my attention was drawn to Swift. It provides some facilities that could simplify, safeguard, and definitely optimize the code. Today, Swift also offers thread-safety support, which wasn’t the case some time back.
I have an idea of gradually moving MacOS specific code to a Swift-controlled environment, often delegating to existing code. Swift is not only more concise but also has:
- Tooling that works outside of Xcode
- Built-in asynchrony support
- Thread-safety features (such as Actors)
- More efficient memory management
Hopefully this would allow dropping some of the complex locking mechanisms, using callbacks on return and specialized MacOS APIs.
Who knows, maybe - if that works - it could even be a part of a common GUI interface for more platforms (and pave the way for something like IMGUI across all platforms - equally fast and good-looking).
Sure, it might be a pipe dream, but there’s one thing I’m certain of: Emacs on MacOS is a supercar with a shift lock engaged.
Today.
[1] Jank is a small lag or unresponsiveness in user interface rendering.
[2] nsterm.m contains Objective-C code specific to MacOS (and also GNUstep).
[3] Objective-C 101: everything is a message. Square brackets mean that I’m sending run to NSApp. Simplifying, it’s like calling a method.
[4] The surface used for rendering Emacs in an OS window.
Przemysław Alexander Kamiński
vel xlii
vel exlee