Page - 2

When I tried cutting up Standard Ebooks' edition of Frankenstein as a possible book for the book-posting bot, the script failed with an error saying that a "navigation item" had a "target" that wasn't in the "manifest". I looked into it today and discovered that a nav item's target can include an anchor. In this case, chapter 24 has a subsection, "Walton, in Continuation", whose target is text/chapter-24.xhtml#walton-in-continuation.

The manifest only lists whole files, so nothing in it exactly matched text/chapter-24.xhtml#walton-in-continuation.

So I did the easiest thing I could think of: I strip any anchor off each target and track which files have already been processed (so I don't end up repeating the content of text/chapter-24.xhtml).

This will work as long as I'm dealing with linear narrative, which I think should be a safe assumption for a while.
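Here is a minimal sketch of that fix. It assumes the nav targets arrive as a list of strings in reading order, and the function name is hypothetical, not taken from the actual script. The standard library's `urldefrag` splits off the anchor:

```python
from urllib.parse import urldefrag


def unique_files_in_reading_order(nav_targets: list[str]) -> list[str]:
    """Turn nav targets into an ordered list of files, each listed once.

    A target like "text/chapter-24.xhtml#walton-in-continuation" points into
    the middle of a file, but the manifest only lists whole files, so the
    anchor (URL fragment) is dropped before the file is looked up.
    """
    seen: set[str] = set()
    files: list[str] = []
    for target in nav_targets:
        path, _fragment = urldefrag(target)
        if path not in seen:  # skip files whose content was already emitted
            seen.add(path)
            files.append(path)
    return files


targets = [
    "text/chapter-23.xhtml",
    "text/chapter-24.xhtml",
    "text/chapter-24.xhtml#walton-in-continuation",
]
print(unique_files_in_reading_order(targets))
# ['text/chapter-23.xhtml', 'text/chapter-24.xhtml']
```

Keeping the first occurrence is what makes the linear-narrative assumption matter. A book whose nav jumped back into an earlier file would have that later entry silently dropped.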

I took a break! I got to enjoy the fruits of the past 10 days when I read part 2 of 39 of O Pioneers! this morning.

Some things I might pick up when I continue the adventure:

  • Error handling.
  • Rewrite in another language.
  • Investigate why the chunking script failed on the Standard Ebooks edition of Frankenstein.
  • Do something else! My little e-ink dashboard could use a Message Of The Day.

See you tomorrow!

Just a quick morning of:

  1. Seeing that the bot posted at 8am, just as I expected.
  2. Validating the text. There was a jarring jump in the middle of a paragraph that made me worry I had accidentally dropped a bunch of text. But it was all good. Willa Cather made a choice!
  3. Noticing that the RSS feed wasn't working and looking into it, without making any progress.
  4. Deleting the post and re-running the posting script after changing post visibility to public. (Since the text is behind a spoiler/CW and the bot only posts once a day, I don't think I'm violating good bot practice.)
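In case you're wondering what changing the visibility amounts to: GoToSocial implements the Mastodon client API, where a post is a POST to /api/v1/statuses with form fields including `status`, `spoiler_text` (the CW), and `visibility`. The sketch below uses only the standard library and builds the request without sending it. The instance URL, token, and function name are placeholders, not what the bot actually uses.

```python
import urllib.parse
import urllib.request


def build_status_request(
    instance: str, token: str, text: str, spoiler: str
) -> urllib.request.Request:
    """Build (but don't send) a request to the Mastodon-compatible statuses API."""
    form = urllib.parse.urlencode(
        {
            "status": text,
            "spoiler_text": spoiler,  # shown as the CW; the text stays hidden until opened
            "visibility": "public",
        }
    ).encode()
    # With a data payload and no explicit Content-Type, urllib sends it as
    # application/x-www-form-urlencoded.
    return urllib.request.Request(
        f"{instance.rstrip('/')}/api/v1/statuses",
        data=form,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


req = build_status_request(
    "https://social.example/", "TOKEN", "Part 2 text goes here.", "O Pioneers!, part 2 of 39"
)
# urllib.request.urlopen(req) would actually publish the post.
```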

I'm excited. Reading a book slowly via fediverse posts is gonna work!

Today's goal was to post sections of a book to the GoToSocial bot I set up earlier in the adventure.

This morning was preparation: cleaning up the code that splits ebooks, cleaning up the directory structure where book parts get dumped, and saving a little JSON file with each book to hold the information I'll need at posting time (like the title, author, and number of parts).
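Here is a minimal sketch of that per-book JSON file. The `book.json` file name and the `title`/`author`/`parts` keys are hypothetical; the post only says which details get stored.

```python
import json
import tempfile
from pathlib import Path


def write_book_info(book_dir: Path, title: str, author: str, parts: int) -> Path:
    """Save the details the posting step needs next to the book's parts."""
    info_path = book_dir / "book.json"  # hypothetical file name
    info = {"title": title, "author": author, "parts": parts}
    info_path.write_text(json.dumps(info, indent=2), encoding="utf-8")
    return info_path


def read_book_info(book_dir: Path) -> dict:
    """Load the details back when it's time to post."""
    return json.loads((book_dir / "book.json").read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    path = write_book_info(Path(tmp), "O Pioneers!", "Willa Cather", 39)
    info = read_book_info(Path(tmp))
print(info)  # {'title': 'O Pioneers!', 'author': 'Willa Cather', 'parts': 39}
```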

In the evening, I made a bunch of test posts. I found some inconsistencies after checking on the posts in Elk (the web frontend I use most), Pinafore (a popular web frontend), and Tusky (the Android app I use). They all treated whitespace a little differently.

For example:

<p>
  This ebook is the product of
  <a href="https://standardebooks.org/">
    Standard Ebooks
  </a>
</p>

One displayed it as you would expect. Another treated the newline between "of" and "Standard" as meaningful and put the two words on different lines. And the third collapsed all the whitespace between "of" and "Standard" down to "ofStandard".

Bizarre.

It was an easy enough fix: I just stopped pretty-printing the HTML. Now they all display things consistently.
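The post doesn't say which library produces the HTML, so this sketch assumes Python's xml.etree.ElementTree. Its `tostring()` adds no indentation or newlines of its own, so the words around the link stay exactly as they appear in the text. Pretty-printing, such as `ElementTree.indent()` (Python 3.9+) or a similar prettify step, is what inserts newlines and indentation between tags. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_paragraph() -> ET.Element:
    """Rebuild the example paragraph from above as an element tree."""
    p = ET.Element("p")
    p.text = "This ebook is the product of "
    a = ET.SubElement(p, "a", href="https://standardebooks.org/")
    a.text = "Standard Ebooks"
    return p


def to_compact_html(elem: ET.Element) -> str:
    # No indent() call: the only whitespace in the output is the text's own.
    return ET.tostring(elem, encoding="unicode")


html = to_compact_html(build_paragraph())
print(html)
# <p>This ebook is the product of <a href="https://standardebooks.org/">Standard Ebooks</a></p>
```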

So, test posts are done. Tomorrow I'll try deploying the code to my little server, set up a timer to post on a schedule, and let people know they can read O Pioneers! by Willa Cather with me over the course of 39 days.

I've half changed my mind about building an RSS feed. The RSS reader I use, a self-hosted DanB/RSS, truncates content. I don't feel like migrating to another reader or updating DanB/RSS to selectively skip truncation. I could just put up pages to link to, but I'd rather have the content where I already am, not a click away.

So the natural choice is to make a bot on the Fediverse.

Today's quick work, then, was creating an account on my GoToSocial server, configuring it as a bot, and setting up an "application" for statuses to be posted through.
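One way to set up that "application" is through the Mastodon-compatible client API that GoToSocial implements: a POST to /api/v1/apps with a client name, a redirect URI, and the scopes the bot needs. This sketch assumes the API route was used (a settings page may have been used instead). The instance URL and app name are placeholders, and the request is only built, not sent.

```python
import urllib.parse
import urllib.request


def build_app_registration(instance: str, name: str) -> urllib.request.Request:
    """Build (but don't send) a Mastodon-compatible app registration request."""
    form = urllib.parse.urlencode(
        {
            "client_name": name,
            # Out-of-band redirect: the authorization code is shown on screen.
            "redirect_uris": "urn:ietf:wg:oauth:2.0:oob",
            "scopes": "write",  # enough to post statuses
        }
    ).encode()
    return urllib.request.Request(
        f"{instance.rstrip('/')}/api/v1/apps", data=form, method="POST"
    )


req = build_app_registration("https://social.example", "book-bot")
```

The response carries a client_id and client_secret. Together with an authorization code, those are exchanged at /oauth/token for the access token that posting uses.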

I did turn on the RSS feed for the bot, so if I ever get my RSS reader to behave the way I want, I could read there.

Next session I'll do some test posts.

A rough plan for what's to come:

  • Add a header of some sort to posts (like, "Title, by Author, part M of N")
  • Pick a book
  • Split it up
  • Schedule posts
  • Share and celebrate
  • Set a reminder close to when the book will finish to pick another
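The header from the first bullet could be as simple as a format string. In this sketch the function name is hypothetical, and the values are the ones from O Pioneers!:

```python
def post_header(title: str, author: str, part: int, total: int) -> str:
    """Build a header in the "Title, by Author, part M of N" shape."""
    if not 1 <= part <= total:
        raise ValueError(f"part {part} is outside 1..{total}")
    return f"{title}, by {author}, part {part} of {total}"


print(post_header("O Pioneers!", "Willa Cather", 2, 39))
# O Pioneers!, by Willa Cather, part 2 of 39
```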

This morning I fixed the problem where I was skipping content from an ebook. I was filtering it out based on a faulty idea of what was valid content versus empty content! Oops.

I replaced prints with logging.

And I improved the chunking algorithm to honor chapter breaks when they appear within 30% of the size limit.
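Here's one reading of that rule, sketched with hypothetical names. Blocks are added to a chunk until the next one would push it past the limit. A block that starts a chapter also closes the chunk early once the chunk is at least 70% full, that is, within 30% of the limit. The actual script may measure size or define the window differently.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    starts_chapter: bool = False


def chunk(blocks: list[Block], limit: int, slack: float = 0.30) -> list[list[Block]]:
    """Group blocks into chunks of at most `limit` characters.

    A single block longer than the limit still gets a chunk of its own.
    A chapter break ends the current chunk early when the chunk is already
    within `slack` of the limit, so parts tend to end at chapter ends.
    """
    chunks: list[list[Block]] = []
    current: list[Block] = []
    size = 0
    for block in blocks:
        near_limit = size >= limit * (1 - slack)
        too_big = size + len(block.text) > limit
        if current and ((block.starts_chapter and near_limit) or too_big):
            chunks.append(current)
            current, size = [], 0
        current.append(block)
        size += len(block.text)
    if current:
        chunks.append(current)
    return chunks


blocks = [
    Block("a" * 80, starts_chapter=True),
    Block("b" * 15, starts_chapter=True),
    Block("c" * 50),
]
sizes = [sum(len(b.text) for b in c) for c in chunk(blocks, limit=100)]
print(sizes)  # [80, 65]
```

In the example, 80 + 15 = 95 characters would still fit under the limit of 100. But the chapter break arrives when the chunk is already 80% full, so the new chapter starts a new part.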

I'm going to swap the ebook I've been testing with, The Return of the Native by Thomas Hardy, for something else. I started reading my own copy, and by the time this is ready to read via RSS I'll be too far ahead. Not a bad problem!

That's probably it for today. I'm headed to electronics recycling and the library. Then: relaxation.

Short (and late) day of reading ebook content and re-chunking it into the approximate size I want.

Except for the part where I'm accidentally skipping huge swaths of text. Gonna have to debug that tomorrow.

Today I got as far as going from items in my flattened table of contents to their xhtml content. I had to take a bit of a detour that I suspect I wouldn't have needed if I'd been feeling sharper today.

Each item in the table of contents has a target property, which is the path to its xhtml file inside the ePub. But to pull the content out, you need an id to pass to the find_content_by_id method on the Document object, and the table of contents items don't include one. So I had to match each item to an entry in the manifest, which has both the xhtml path and the id but no ordering.

It's just a little dance. It's fine.
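That dance is sketched below with plain data structures instead of the ePub library. The TocItem class, the id-to-href manifest dict, and the function names are all hypothetical. The steps are: flatten the nested table of contents depth-first to get reading order, invert the manifest into an href-to-id lookup, and map each target through it. Each resulting id is what would go to find_content_by_id.

```python
from dataclasses import dataclass, field


@dataclass
class TocItem:
    title: str
    target: str  # path of the item's xhtml file inside the ePub
    children: list["TocItem"] = field(default_factory=list)


def flatten(items: list[TocItem]) -> list[TocItem]:
    """Depth-first walk, so nested entries keep their reading order."""
    flat: list[TocItem] = []
    for item in items:
        flat.append(item)
        flat.extend(flatten(item.children))
    return flat


def ids_in_toc_order(toc: list[TocItem], manifest: dict[str, str]) -> list[str]:
    """Map each TOC target to its manifest id.

    `manifest` maps id -> href and has no useful order, so it is inverted
    into an href -> id lookup and the TOC supplies the order.
    """
    href_to_id = {href: item_id for item_id, href in manifest.items()}
    return [href_to_id[item.target] for item in flatten(toc)]


toc = [
    TocItem("Part I", "text/part-1.xhtml", [TocItem("Chapter 1", "text/chapter-1.xhtml")]),
    TocItem("Part II", "text/part-2.xhtml"),
]
manifest = {
    "part-2": "text/part-2.xhtml",
    "chapter-1": "text/chapter-1.xhtml",
    "part-1": "text/part-1.xhtml",
}
print(ids_in_toc_order(toc, manifest))  # ['part-1', 'chapter-1', 'part-2']
```

A target carrying an anchor, like text/chapter-24.xhtml#walton-in-continuation, has no exact match in the manifest and would raise a KeyError here.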

I also put a # type: ignore directive on one line because I didn't want to bother figuring out how to please pyright this morning. (I use pyright.)

Tomorrow I'm betting I'll get to making fresh xhtml files, re-slicing the content into consistently sized files.