Backing up the Kobo Clara Database
I have a Kobo Clara HD e-reader. It’s great, and the whole concept of e-readers is great.
A neat thing about the Kobo is that it logs everything you do with it and stores that into an SQLite database. Literally everything – it’s not just tracking what books you’ve read, but it tracks how many seconds per reading session and how many pages you covered. A session is defined, as best I can tell, from the time you enter a book (either turning it on, or navigating to a new book from the menu) to the time you leave a book.
My plan was to automatically back up this database every time I connected the e-reader to my computer, and automatically do some data analysis (like getting the last 10 books I read, or what I’m currently reading) and post it on my website, or put it in my discord status or whatever.
I accomplished the back-up part, and played around with some SQL queries for doing analysis, but never ended up wiring that up to any platforms.
I’m sharing some information I discovered in the course of this project, in the hopes that it helps out anyone else attempting a similar task.
The Database
The data is stored in .kobo/KoboReader.sqlite
. There are a number of tables, but two are notable: AnalyticsEvents
, and content
.
AnalyticsEvents
AnalyticsEvents
is the one that stores all the things that have ever happened. The Type
column holds the type of event, and there are, in my database, 50 different entries (there may be more that I just haven’t encountered). Examples include OpenContent
, LeaveContent
, StartReadingBook
, MarkAsFinished
, PluggedIn
, and so on.
There’s also a Timestamp
column, but the really interesting data is stored as JSON in either Attributes
or Metrics
To give you an example, here’s a LeaveContent
’s Attributes
entry:
{
"ContentFormat": "application/epub+zip",
"ContentType": "application/epub+zip",
"Monetization": "Sideloaded",
"Origin": "Sideloaded",
"ViewType": null,
"author": "Indra Das",
"isbn": "9781101967522",
"progress": "2",
"title": "The Devourers"
}
And then the matching Metrics
:
{
"IdleTime": 91,
"PagesTurned": 0,
"SecondsRead": 91
}
(Side note: The Devourers is a fantastic book.)
Not all events have both, or even either. For example, AppStart
just has {}
for both columns, while OpenContent
only includes the attributes, and PluggedIn
only has Metrics
.
content
The content
table (note the lowercase c – there doesn’t seem to be a pattern for when table names use CamelCase or snake_case) stores information about books currently on the device. Of interest are columns like FirstTimeReading
, DateLastRead
, ReadStatus
, CurrentChapterProgres and (I'm just noticing this now)
IsInternetArchive`.
However, most of these columns aren’t used, or the values are confusing – FirstTimeReading
doesn’t seem to actually track what the name would suggest. Your milage may vary.
The SQL
ChatGPT was a big help – I don’t write SQL often enough to remember it all on my own, but I know enough to wrangle GPT to do it for me.
Last 10 Books Read
SELECT Title, Attribution, DateLastRead as DateFinished
FROM content
WHERE ReadStatus=2
ORDER BY DateLastRead DESC
LIMIT 10;
Currently Reading
SELECT Title, Attribution
FROM content
WHERE ReadStatus=1
ORDER BY DateLastRead DESC
LIMIT 1;
More Advanced Statistics
As mentioned, content
only contains data for books that are still on the device. To include books that I read and then deleted, I would have to use AnalyticsEvents
. In addition, anything more sophisticated, like finding out which books I reread (and how many times), or the total time spent on a book, would require using AnalyticsEvents
. Another issue with content
is that if I accidentally open a finished book and set it to finished again, then that becomes the new finished time. In order to weed out these sorts of things, I’d have to use some heuristics and look at a few different events.
With the help of GPT I produced this monstrosity, that finds all books with a progress of above 90%, which is a starting point for finding the “real” finished date.
WITH BookProgress90 AS (
SELECT
Timestamp,
json_extract(Attributes, '$.progress') AS Progress,
Attributes
FROM
AnalyticsEvents
WHERE
Type = 'BookProgress'
AND json_extract(Attributes, '$.progress') = '90'
)
SELECT
BP.Timestamp AS ProgressTimestamp,
BP.Progress,
(
SELECT
AE.Timestamp
FROM
AnalyticsEvents AE
WHERE
AE.Type = 'OpenContent'
AND AE.Timestamp < BP.Timestamp
ORDER BY
AE.Timestamp DESC
LIMIT 1
) AS LastOpenContentTimestamp,
(
SELECT
json_extract(AE.Attributes, '$.title')
FROM
AnalyticsEvents AE
WHERE
AE.Type = 'OpenContent'
AND AE.Timestamp < BP.Timestamp
ORDER BY
AE.Timestamp DESC
LIMIT 1
) AS LastOpenContentTitle,
json_extract(BP.Attributes, '$.volumeid')
FROM
BookProgress90 BP
Honestly, I’m not sure if SQL is the right tool for the job – it may be simpler to use a state machine(s) to parse the data.
Automatic Backups (on Windows)
There are two parts to this: a Rust program that does the copying, and the scheduled task that runs the program when the Kobo is plugged in.
The Rust Program
I was a bit rusty (haha) when I wrote this program, so don’t judge too harshly. The program exits silently if the hardcoded path of F:\.kobo\KoboReader.sqlite
isn’t found. As you’ll see in a moment, this program is going to be automatically run when any USB device is plugged in. Then it checks the sizes, and copies it only if the one on the reader is bigger. Which (I just realized as I wrote this) is going to be always, because the kobo logs when it’s plugged in. Oh well.
Note the flags in cargo.toml
:
rustflags = ["-C", "link-args=/SUBSYSTEM:WINDOWS", "-C", "link-args=/ENTRY:mainCRTStartup"]
This transforms it from a console app to a Windows app, meaning there won’t be the flicker of the command prompt popping up when the program runs in the background. It was the desire to eliminate that command prompt that’s the reason this simple program is a compiled program in the first place and not just a powershell script.
The Task
First, you need to open the Event Viewer. Then drill down to Applications and Services Logs/Microsoft/Windows/DriverFrameworks-UserMode/Operational
, then right click the log and enable it.
Now plug in your Kobo, and you should see a bunch of new events. Pay attention to the event ID – there should be one event that has a unique ID. For me, it’s ID 2006, and it’s the last “Loading Drivers …” event. When we make the task in the next step, it’s just going to be watching this log for that event ID – as far as I can tell, the other information can’t be used to affect whether a task runs or not.
Right click the event, and select “Attach Task to Event”. Follow the wizard, and point it at your EXE that you want to run. Once complete, it will create the task and you’ll be able to find and modify/delete it in the Task Scheduler, under Task Scheduler Library/Event Viewer Tasks
.
The End
That’s all, folks! Hope this helped someone out there.