Posts from 'december-adventure-2025' series - 2
December Adventure Day 11
I took a break! I got to enjoy the fruits of the past 10 days when I read O Pioneers! part 2 of 39 this morning.
Some things I might pick up when I continue the adventure:
- Error handling.
- Rewrite in another language.
- Investigate why the chunking script failed on the Standard Ebooks edition of Frankenstein.
- Do something else! My little e-ink dashboard could use a Message Of The Day.
See you tomorrow!
December Adventure Day 12
When I tried cutting up Standard Ebooks' Frankenstein
as a possible book for the book-posting bot, the script
failed with an error that suggested there was a "navigation item" with
a "target" that wasn't in the "manifest". I looked into that today and
discovered that a nav item's target can include an anchor. In this
case there's a subsection in chapter 24 for "Walton, in Continuation"
that points to text/chapter-24.xhtml#walton-in-continuation.
The manifest only lists the files themselves, so there was nothing
matching exactly text/chapter-24.xhtml#walton-in-continuation.
So I did the easiest thing I could think of: the script now strips
any anchor off the target and tracks which files have already been
processed (so it doesn't end up repeating the content of
text/chapter-24.xhtml).
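The original script is in another language and isn't shown here, but the idea, sketched in Rust with invented names, is roughly:

use std::collections::HashSet;

// Drop any #anchor from a nav item's target and only return the file
// the first time we see it, so a chapter with internal anchors is
// processed exactly once.
fn file_to_process<'a>(target: &'a str, seen: &mut HashSet<String>) -> Option<&'a str> {
    // "text/chapter-24.xhtml#walton-in-continuation" -> "text/chapter-24.xhtml"
    let path = target.split('#').next().unwrap_or(target);
    if seen.insert(path.to_string()) {
        Some(path)
    } else {
        None
    }
}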
This will work as long as I'm dealing with linear narrative, which I think should be a safe assumption for a while.
December Adventure Day 13
Just a couple of quick edits today to add some missing type annotations (I like doing types).
I went for a run and made a cornbread chili casserole, too.
December Adventure Day 14
This morning I continued my adjustments to make Frankenstein work.
Each xhtml file in a Standard Ebooks epub basically looks like this:
<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops"
      lang="en-GB"
      epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/, se: https://standardebooks.org/vocab/1.0"
      xml:lang="en-GB">
  <head>
    <title>Chapter XXIII</title>
    <link href="../css/core.css" rel="stylesheet" type="text/css"/>
    <link href="../css/local.css" rel="stylesheet" type="text/css"/>
  </head>
  <body epub:type="bodymatter z3998:fiction">
    <section id="chapter-23" role="doc-chapter" epub:type="chapter">
      <h2>
        <span epub:type="label">Chapter</span>
        <span epub:type="ordinal z3998:roman">XXIII</span>
      </h2>
      <p>It was eight o’clock when we landed...</p>
      <p>...</p>
      <p>...</p>
      ...
    </section>
  </body>
</html>
One assumption of the script that splits up the xhtml files is that
each immediate child of the <body>'s <section> will be "small". So
all it does is take each child, check whether adding it to the
current chunk stays within the size threshold, and either add it or
start a new chunk.
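Sketched in Rust with invented names (the real script isn't shown here), that accumulation is roughly:

// Greedily pack each child's markup into chunks no larger than
// `threshold` characters, starting a new chunk whenever the next child
// would overflow the current one. A single child bigger than the
// threshold still becomes one oversized chunk, which is exactly the
// assumption that's about to break.
fn accumulate(children: &[String], threshold: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for child in children {
        if !current.is_empty() && current.len() + child.len() > threshold {
            chunks.push(std::mem::take(&mut current));
        }
        current.push_str(child);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}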
However, Frankenstein proved that assumption false. There were two
kinds of big children: another <section> and a <blockquote>. The
<blockquote> examples are kind of fun, because Frankenstein is a
story-within-a-story (-within-a-story...).
Today's solution was to "unwrap" <section>s. A <section> doesn't
do anything in a fediverse post, so it seems safe to do that.
<blockquote>s do affect rendering, so I'm unwrapping them but
re-wrapping each child in its own <blockquote>. So this:
<blockquote>
  <p>Three be the things I shall have till I die:</p>
  <p>Laughter and hope and a sock in the eye.</p>
</blockquote>
Becomes this:
<blockquote>
  <p>Three be the things I shall have till I die:</p>
</blockquote>
<blockquote>
  <p>Laughter and hope and a sock in the eye.</p>
</blockquote>
Semantically bad, structurally questionable, but renders fine.
Good enough for me.
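If you picture the children as already-serialized markup, the re-wrapping is just this (a sketch with made-up names; the real code isn't shown here):

// Turn one <blockquote> with N children into N single-child <blockquote>s,
// so each child can land in a different chunk without losing the quote styling.
fn rewrap_blockquote(children: &[String]) -> Vec<String> {
    children
        .iter()
        .map(|child| format!("<blockquote>{child}</blockquote>"))
        .collect()
}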
December Adventure Day 15
I think today will just be a quick morning update to handle ebooks with multiple title entries by simply picking the first one. In the case in front of me,
["Frankenstein", "Or, the Modern Prometheus", "Frankenstein, or the Modern Prometheus"]
it will choose Frankenstein.
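A minimal sketch, assuming the titles have already been pulled out of the epub metadata in order:

// With multiple title entries, just keep the first one.
fn pick_title(titles: &[String]) -> Option<&str> {
    titles.first().map(String::as_str)
}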
Meanwhile, O Pioneers! is proving to be a good read. It wasn't even on my list!
December Adventure Day 16
All my thoughts about cleaning up the code that powers @tomes@phantasmal.work got sidetracked by starting on Tumble Forth, which promises, "Starting from bare metal on the PC platform, we build a Forth from scratch..."
I came across it from a sequence of clicks that started from browsing
the #DecemberAdventure hashtag and ended up derailing my morning.
Now here I am, looking at wikis and trying to write assembly.
mov al, 0    ; scroll zero lines, i.e. blank the window
mov cl, 0    ; starting at the left column
mov ch, 0    ; starting at the top row
mov dh, 7    ; until line 8 (which is enough)
mov ah, 0x06 ; BIOS function 06h: scroll up window
int 0x10     ; BIOS video services interrupt
I wrote a few things in Uxn earlier this year, which primed me for getting captured by a tutorial for implementing Forth from scratch.
December Adventure Day 17
After yesterday's detour into assembly, I decided today to reacquaint myself with Uxn, its ecosystem, and the code I've written in it.
Just a little bit of exploration, late in the day.
December Adventure Day 19
Yesterday was basically a day off. I looked a little bit at the current state of Scheme implementations... I don't know what I'm waiting for, but I think I'm waiting a little longer.
Today I wrote notes toward a new ebook chunking algorithm for @tomes.
The idea is that the splitting works from the whole instead of accumulating from the beginning, trying to balance the chunks better.
As an example, a series of chapters with the following character counts:
- 12,000
- 8,000
- 12,000
Currently, the accumulation strategy grows each chunk until it hits around 8,000 characters, which would result in chunks approximately like so:
- 8,000
- 8,000 (spanning the chapter 1-2 border)
- 8,000 (spanning the chapter 2-3 border)
- 8,000
Instead, look first for any chapters within the threshold, and make their chunks (nearly?) invincible. Then chunk the remainders "evenly":
- 6,000 (chapter 1, front half)
- 6,000 (chapter 1, back half)
- 8,000 (the entirety of chapter 2)
- 6,000 (chapter 3, front half)
- 6,000 (chapter 3, back half)
Additionally, apply some penalties to certain pieces of markup to try to avoid breaking up sections that would likely suffer.
For example, near the middle of a long chapter:
<p>
.... long paragraph ...
</p>
<p>"Quick dialog," she said.</p>
<p>"Witty retort," he replied</p>
<p>"Devastating comeback."</p>
<p>
.... long paragraph ...
</p>
It'd be nice to penalize breaking a chunk between short paragraphs, which are likely to be dialog that's better kept together, and to apply large penalties near the beginning and end of a chapter, to avoid cutting too close to natural seams.
Then find the lowest-penalty cut point near where a naïve even split would go.
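To make that concrete, the cut selection might look something like this sketch (the names, and the idea of pre-computed offsets and penalties, are all invented for illustration):

// Candidate cut points are offsets in characters from the chapter start,
// each with a penalty reflecting how bad it is to break there (short
// paragraphs nearby, too close to the chapter edges, ...). Pick the
// cheapest cut within `window` characters of the naive even-split target.
fn best_cut(offsets: &[usize], penalties: &[f64], target: usize, window: usize) -> Option<usize> {
    (0..offsets.len().min(penalties.len()))
        .filter(|&i| offsets[i].abs_diff(target) <= window)
        .min_by(|&a, &b| penalties[a].total_cmp(&penalties[b]))
}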
December Adventure Day 21
Yesterday and today I started a Rust project where I'm re-implementing the ebook-splitting code and where I'll write the new algorithm I mentioned the other day.
Part of picking up Rust again is getting my bearings, especially around organizing code into modules. I think I've finally got the rule I need to remember for the thing that always trips me up:
When you have binary and library code, they are different crates and
the library crate controls all the Rust source except the one file
that makes the binary, main.rs. That's why in main.rs you use PACKAGE_NAME::... and everywhere else you use crate::.... The
binary has to import the library. The library gets to talk about
itself as the crate.
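A tiny example of the shape (the package name here, tomes, is just a stand-in):

// src/lib.rs -- the library crate root; it owns every module
pub mod book;

pub fn load() -> book::Book {
    // inside the library, the library is `crate`
    crate::book::Book::default()
}

// src/book.rs
#[derive(Default)]
pub struct Book;

// src/main.rs -- the binary crate; it has to import the library by its
// package name, not as `crate`
fn main() {
    let _book = tomes::load();
}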
December Adventure Day 23
The last couple of days have been spent writing Rust in 20-line stints, on things that start with declarations like:
impl TryFrom<PathBuf> for Book {
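The body of that one isn't shown here; a minimal sketch of the shape it takes, with invented fields and a simplified error type, would be:

use std::fs;
use std::path::PathBuf;

pub struct Book {
    pub title: String,
    pub raw: Vec<u8>,
}

impl TryFrom<PathBuf> for Book {
    type Error = std::io::Error;

    fn try_from(path: PathBuf) -> Result<Self, Self::Error> {
        // Read the epub bytes; real parsing of the container would go here.
        let raw = fs::read(&path)?;
        // Fall back to the file stem as a stand-in title.
        let title = path
            .file_stem()
            .map(|s| s.to_string_lossy().into_owned())
            .unwrap_or_default();
        Ok(Book { title, raw })
    }
}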