Name

html-is-a-tree

I use a static site generator to build this site, but I keep hitting its limitations.

Introduction

A long while ago, I used clojure for some web development and used hiccup-style html templates for the first time. For the uninitiated, clojure is a lisp dialect. This makes it possible to directly represent tree structures. For example, here is an HTML document:

[:html
 [:head
  [:link { :rel "stylesheet" :href "/style.css" }]]
 [:body
  [:article
   [:h1 "Hello world!"]
   [:p "This clojure data structure directly encodes an HTML document."]]]]

One can do more than just write static documents this way, as it becomes very easy to directly inject new structure using code:

(let [posts (load-documents "posts/**/*.md")]
 [:html
  [:head
   [:link { :rel "stylesheet" :href "/style.css" }]]
  [:body
   [:article
    [:h1 "Hello world!"]
    [:h2 "Read my articles"]
    [:ol (map
          (fn [post] [:li (post-summary post)])
          posts)]]]])

When the whole document has a uniform and consistent representation, it becomes easy to transform and manipulate. Want to iterate every heading in the doc and make a table of contents? Easy. Want to collect every link, check if it has metadata, and make a list of citations at the bottom of your page? Easy.
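As a rough illustration of the table-of-contents case, here is a minimal sketch in janet (a lisp I get to further down). It assumes every node is a tuple of the form [tag attrs? & children] with plain strings as text; the helper names heading-tags, children-of, collect-headings and table-of-contents are made up for this example.

```janet
# Hypothetical helpers; assumes hiccup-style nodes: [tag attrs? & children].
(def heading-tags {:h1 true :h2 true :h3 true :h4 true :h5 true :h6 true})

(defn children-of
  "Return a node's children, skipping the optional attribute dictionary."
  [node]
  (def tail (tuple/slice node 1))
  (if (dictionary? (first tail)) (tuple/slice tail 1) tail))

(defn collect-headings
  "Depth-first walk that returns every heading node in document order."
  [node]
  (cond
    # Text (or anything else that is not a node) contains no headings.
    (not (indexed? node)) @[]
    # A heading is returned as-is, wrapped so mapcat keeps it as one item.
    (get heading-tags (first node)) @[node]
    # Otherwise, gather the headings found in each child.
    (mapcat collect-headings (children-of node))))

(defn table-of-contents
  "Build an [:ol ...] node with one [:li ...] per heading."
  [doc]
  [:ol ;(map (fn [h] [:li ;(children-of h)]) (collect-headings doc))])

(def doc
  [:html
   [:body
    [:article
     [:h1 "Hello world!"]
     [:p "Some text."]
     [:section {:id "posts"}
      [:h2 {:class "fancy"} "Read my articles"]]]]])

(def toc (table-of-contents doc))
```

Because the result is just another node, it can be spliced back into the document before rendering.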

I haven't used a static site generator that fully embraced this, but I never tried really hard to find one since I had zola and was satisfied, and I knew it would take a lot of work to migrate.

I also had some big concerns which nagged away at me:

scss: I use scss to write my stylesheets, and didn't want to take on the huge job of translating them to raw css if a new generator required it.
markdown: I write my articles in markdown and translate that to HTML, but I also use a lot of HTML and shortcodes embedded within my markdown documents.
clojure: clojure (and most LISPs) feel like a heavy dependency, and I didn't want to deal with complicated language setup in addition to complicated dependency management for the site generator itself. A single binary that I can easily compile or install from my package manager is very very hard to beat.

Interestingly, each of these concerns has diminished over time. This has me evaluating my options again.

Just Use CSS

I found an article called You no longer need JavaScript which showed just how far CSS had come. The standard has adopted almost everything I use scss for:

I already make heavy use of CSS variables
It turns out nesting is standards compliant now

I really could just write this site in plain CSS if I wanted to now, and I don't expect I would lose much of anything. I use scss @mixins in a couple of places, but I think it would be possible to use a simpler solution.

Limitations of Markdown and Zola

While text makes up the majority of my documents, I keep finding nice HTML elements like <dl> that I want to use, which markdown cannot represent.

I also find that I need nice ways to express structural data and freely mix structured data and text. Markdown doesn't do this well since it only really supports inline HTML which is verbose, or shortcodes which are limited. Markdown also doesn't allow attributes on elements, which is niche for me, but useful.

One recent pain point: if I leave extraneous whitespace in some parts of my document, the markdown parser will read code as a preformatted text block instead of HTML or a shortcode (due to the rule that 4 leading spaces start a code block).

If I put the following inside my markdown document:

<style>
figure {
  display: block;

  figcaption {
    display: block;
    padding-top: 1em;

    p { color: var(--fg2); }
  }
}
</style>

The nested p selector inside the figcaption has 4 leading spaces and a blank line in front of it, which makes it a code block and results in the following html:

<style>
figure {
  display: block;

  figcaption {
    display: block;
    padding-top: 1em;
<pre>
<code>
p { color: var(--fg2); }
</code>
</pre>
  }
}
</style>

This is easy (though annoying) to avoid when writing the document directly, but quite difficult to avoid with shortcodes, which sometimes have non-trivial template statements inside them. If I fail to use the proper whitespace-removal syntax, the output of a shortcode may trigger a 4-space code block.

Shortcodes are not like lisp macros. They do not operate on the AST or structure of the document; they produce a soup of letters. This makes post-processing impossible as well!
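For contrast, a shortcode that works on structure could be an ordinary function that returns a node. This is only a sketch in janet: the figure function and its arguments are hypothetical, not something zola or any library provides.

```janet
# Hypothetical "shortcode": a plain function that returns a tree node
# instead of a string, so whitespace in its definition cannot leak out.
(defn figure
  "Build a figure node from an image path and a caption."
  [src caption]
  [:figure
   [:img {:src src :alt caption}]
   [:figcaption
    [:p caption]]])

(def node (figure "/img/pot.jpg" "A pot I made"))
```

Since the result is data rather than text, later passes can still inspect or rewrite it, for example to collect every image path.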

Do I need markdown?

The thing is, while I do need a way to write mostly-text documents conveniently, I don't necessarily need to use markdown. What I really want is some way to transform a lightweight document into some sort of tree structure, and then perhaps transform that tree structure into HTML.

For example, I would like to transform a document like this:

## Hobbies

My current hobbies include:

- 3D art
- live coding
- sewing & embroidery
- blogging

But in the past I have enjoyed pottery, bookbinding, and sculpture.
I am always working on projects and endeavor to write about them
from time to time.

Into a data structure like this:

[:section
 [:h2 "Hobbies"]
 [:p "My current hobbies include:"]
 [:ul
  [:li "3D art"]
  [:li "live coding"]
  [:li "sewing & embroidery"]
  [:li "blogging"]]
 [:p "But in the past I have enjoyed pottery, bookbinding, and sculpture.
I am always working on projects and endeavor to write about them
from time to time."]]

And then process that somehow, perhaps rendering it to HTML, perhaps not. Ideally, the scripting would operate on the level of AST nodes.

A lisp-y scripting language feels like the perfect fit for something like this.
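To make "operating on AST nodes" a little more concrete, here is a small sketch in janet of a bottom-up tree transform. It assumes element nodes are tuples that start with a keyword; element?, transform, demoted and demote-heading are names invented for this example.

```janet
# Hypothetical helpers; assumes element nodes look like [tag attrs? & children].
(defn element?
  "True for element nodes: an indexed value whose head is a keyword."
  [node]
  (and (indexed? node) (keyword? (first node))))

(defn transform
  "Rebuild the tree bottom-up, calling f on every element node."
  [f node]
  (if (element? node)
    # Tags, attributes and text pass through unchanged; child elements recurse.
    (f (tuple ;(map (fn [child] (transform f child)) node)))
    node))

# Example pass: push every heading down one level (h6 is left alone).
(def demoted {:h1 :h2 :h2 :h3 :h3 :h4 :h4 :h5 :h5 :h6})

(defn demote-heading
  "Replace a heading tag with the next level down; leave other nodes alone."
  [node]
  (def tag (get demoted (first node)))
  (if tag (tuple tag ;(tuple/slice node 1)) node))

(def out
  (transform demote-heading
             [:section
              [:h1 "Hobbies"]
              [:p "My current hobbies include:"]]))
```

Any pass with the same shape (node in, node out) can be plugged into transform, which is the kind of composability a string-based shortcode cannot offer.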

As it turns out, people have already experimented with writing mostly-text documents with lisp code inside; for example, skribe! It looks something like this:

(p [Everything inside square brackets is text, but one can inject code by using
lisp's ,(code "unquote") form: 1+4=,(+ 1 4)])

This is missing some of the niceties of markdown, but I find the idea of using something like ,() to inject code which outputs new syntax nodes compelling!

Enter Janet

At this point, the ideas were milling about in my head already and zola's limitations were getting harder to deal with. I found haunt, which seemed to do what I wanted, but I wasn't able to install it so I gave up.

Then I found janet! It was super easy to install, and it takes a lot of inspiration from clojure, which is good since I've used clojure before. I got hacking away and was able to convert tuples to HTML in a day!

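# Assumed definition (not shown in the original snippet): the HTML void elements.
(def html/void-tags
  {:area true :base true :br true :col true :embed true :hr true :img true
   :input true :link true :meta true :source true :track true :wbr true})

(defn html/void-tag? [tag] (in html/void-tags tag))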

(def- tab "  ")

(defn- align [i] (string "\n" (string/repeat tab i)))

(defn html/attribute [key value] (string/format " %s=\"%s\"" key value))

(defn html/attributes [dict]
  (reduce
    (fn [i [k v]] (string i (html/attribute k v)))
    ""
    (pairs dict)))

(defn html/void [indent name attr]
  (string (align indent) "<" name (html/attributes attr) " />"))

(defn html/open [indent name attr]
  (string (align indent) "<" name (html/attributes attr) ">"))

(defn html/close [indent name]
  (string (align indent) "</" name ">"))

(defn html/render [indent el]
  (let [element (fn [tag attr children]
        (cond
          (and (not (empty? children)) (html/void-tag? tag))
            (error (string "Void element \":" tag "\" declared with children"))
          (in html/void-tags tag) (html/void indent tag attr)
          (string
            (html/open indent tag attr)
            ;(map (fn [child] (html/render (+ 1 indent) child)) children)
            (html/close indent tag))))]
   (match el
     ([tag attr & children] (dictionary? attr))
       (element tag attr children)
     [tag & children]
       (element tag {} children)
     [tag]
       (element tag {} []))))

It's so nice! And it was not too painful to write. This:

(print
 (html/render 0
  '[:html
    [:head [:meta {}]]
    [:body {:class "name"}]]))

Outputs this:

<html>
  <head>
    <meta />
  </head>
  <body class="name">
  </body>
</html>

The translator doesn't properly handle text elements or escaping, but I think the proof-of-concept of representing the tree-structure of HTML is a success.
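Escaping is the smaller of the two gaps. A minimal sketch, assuming it is enough to escape the four characters that matter in text and double-quoted attribute values; html/escape is a hypothetical name and is not part of the code above.

```janet
# Hypothetical helper, not part of the renderer above.
(defn html/escape
  "Escape the characters that are significant in HTML text and attribute values."
  [text]
  (->> (string text)
       # "&" has to go first, otherwise the entities added below get re-escaped.
       (string/replace-all "&" "&amp;")
       (string/replace-all "<" "&lt;")
       (string/replace-all ">" "&gt;")
       (string/replace-all "\"" "&quot;")))

(def escaped (html/escape "sewing & embroidery <3"))
```

With something like this in place, html/attribute could escape its value, and html/render could grow a branch that passes string children through html/escape instead of trying to match them as elements.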

Hare-Brained Scheme

At this point, I started having ideas. janet is insanely cool! I decided to dig into the janet C api via zig. It was super easy to install the development package through my system package manager and link to it in a build.zig file.

It took a bit more work to figure out how to actually run janet code from within zig, but I was able to create a function which calls std.log.debug, and then call that function from janet code which I in turn ran from zig.

The ouroboros was complete, and I realized that I could potentially create a pretty good static-site generator that takes inspiration from zine and std.Build, but using janet as the orchestration and templating/translation language.

The entry point could be a build.janet file, which exposes a builder API that defines the dependency graph of the site, registering janet functions as the "build pipeline" to transform files from one representation into another until they get "installed" to the build directory.
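None of this exists yet, so the following is purely a thought experiment: a tiny in-memory sketch of what such a builder might look like. Every name here (make-builder, add-step, build) is hypothetical, there is no file I/O, and cycles in the graph are not detected.

```janet
# Hypothetical builder sketch; not an existing API.
(defn make-builder
  "Create an empty builder with no steps and no cached results."
  []
  @{:steps @{} :results @{}})

(defn add-step
  "Register a named step, the steps it depends on, and a function of their results."
  [builder name deps run]
  (put (builder :steps) name {:deps deps :run run})
  name)

(defn build
  "Run a step after its dependencies, caching each result so it runs only once."
  [builder name]
  (def results (builder :results))
  (when (nil? (get results name))
    (def step (get (builder :steps) name))
    (def inputs (map (fn [dep] (build builder dep)) (step :deps)))
    (put results name ((step :run) ;inputs)))
  (get results name))

# A three-step pipeline: source text -> tree -> html string.
(def site (make-builder))
(add-step site :source [] (fn [] "# Hello"))
(add-step site :tree [:source] (fn [src] [:h1 (string/slice src 2)]))
(add-step site :html [:tree]
          (fn [[tag text]] (string "<" tag ">" text "</" tag ">")))

(def page (build site :html))
```

A real version would key steps on files rather than keywords and would need invalidation for rebuild and watch support, but the shape (named steps, declared dependencies, janet functions as the transforms) is the part that matters here.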

I could write a function that parses md or orgdown into a janet tuple, splatting code inside ,() forms right into the data.

Since janet also lets one create handlers to use when importing files, I could maybe re-use the import functionality to register those files in the dependency graph for rebuild and watch support.

Suddenly things are feeling dangerously exciting!

And that is where I am now. I intend to prototype this in the most janky, single-threaded way possible and see where it goes :)