Fixing a bug in NewsFlash and shaving a yak.

Abstract
A boring story on how I wanted to fix a bug in NewsFlash and ended up shaving a yak.

The Initial Task

There is a bug in NewsFlash, my RSS reader of choice. When using the light theme, <blockquote>’s from this feed have the same white font color as the background and are therefore unreadable. Fixing this seemed like a simple and quick task: It consists of just 18k lines of Rust code (although I later found out that NewsFlash is split into two Rust crates, called news_flash_gtk and news_flash, which adds another 13k lines of Rust code), so it cannot be that hard to read. find -name *.css returns three results, one of which is news_flash_gtk/data/resources/article_view/style.css. Naturally, I suspected that the reason some text is white is that this CSS file sets it to white or at least forgets to set it to black. So, it should be a quick fix. But it turned out to be quite a yak-shaving…

Building NewsFlash

NewsFlash is built with Meson. I like Meson quite a lot, because its build definitions are really boring and readable. As recommended, I built it with

meson --prefix=/usr build
cd build
ninja
which worked just fine. Unfortunately, changing style.css and re-running ninja did not trigger a rebuild. This could have two possible reasons:
- style.css is just dead code.
- style.css is read in some build step, but not specified as a dependency in the build definition.

I tried for a while, without success, to find the build step in which style.css is read. The winning idea was to prevent the file from being read using chmod 000 style.css, then do a clean rebuild and see who complains.
error: proc-macro derive panicked
  --> src/article_view/mod.rs:39:10
   |
39 | #[derive(RustEmbed)]
   |          ^^^^^^^^^
   |
   = help: message: File should be readable: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }

Bingo. In hindsight, I should have come up with that idea sooner. Understanding why touching style.css did not trigger a rebuild is simple: news_flash_gtk/src/meson.build includes the lines

newsflash_sources = files(
    'article_list/models/article.rs',
    ...
)
cargo_release = custom_target('cargo-build',
    build_by_default: true,
    input: [
            newsflash_sources,
            ],
    output: ['com.gitlab.newsflash'],
    install: true,
    install_dir: newsflash_bindir,
    console: true,
    command: [cargo_script,
                '@SOURCE_ROOT@',
                '@OUTPUT@',
                meson.build_root(),
                profile,
                '--features "@0@"'.format(features)
                ])

Running ninja will only rerun cargo if one of the files listed in newsflash_sources has changed. Adding style.css to newsflash_sources therefore fixes this bug. I sent this patch upstream and it got merged.
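
To make the failure mode concrete, here is a minimal sketch of the kind of staleness check a build tool performs. It is a simplified model, not ninja's actual implementation (which also tracks command lines and dependency files); the function name needs_rebuild and the timestamps are made up for illustration.

```rust
use std::time::{Duration, SystemTime};

/// Returns true if the output is missing or older than any declared input.
/// Files that are not in `declared_inputs` are never looked at.
fn needs_rebuild(declared_inputs: &[SystemTime], output: Option<SystemTime>) -> bool {
    match output {
        None => true, // the output does not exist yet
        Some(out) => declared_inputs.iter().any(|input| *input > out),
    }
}

fn main() {
    // Hypothetical modification times, in seconds after the epoch.
    let t0 = SystemTime::UNIX_EPOCH;
    let built_at = t0 + Duration::from_secs(100);
    let old_source = t0 + Duration::from_secs(50); // a .rs file, unchanged
    let edited_css = t0 + Duration::from_secs(200); // style.css, edited after the build

    // style.css is not declared as an input: the edit goes unnoticed.
    println!("{}", needs_rebuild(&[old_source], Some(built_at)));
    // style.css is declared as an input: the edit triggers a rebuild.
    println!("{}", needs_rebuild(&[old_source, edited_css], Some(built_at)));
}
```

Under this model, an undeclared file can change as often as it likes without the target ever being considered out of date, which matches what I observed.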

Database Problems

Now that the build works, let’s start NewsFlash. Once I attempt to select Local RSS (or attempt to log in), nothing happens, but this is printed to stderr:
02:20:41 - ERROR - Database migration failed: Failed with: FOREIGN KEY constraint failed (news_flash::database:181)

Oh Fuck Me. Deleting the config folder that stores the database does not change anything.

Failing Tests

After debugging for a while, I thought: I should run the unit tests. Maybe there is some minimal database example that triggers the same bug.

 volker   Sync  git  news_flash_gtk  cargo test
    Finished test [unoptimized + debuginfo] target(s) in 0.62s
     Running unittests (target/debug/deps/news_flash_gtk-0f333f82bdbfa5b7)

running 8 tests
test color::tests::hsla_to_rgba ... ok
test color::tests::parse_color_string ... ok
test color::tests::rgba_to_hsla ... ok
test i18n::tests::test_i18n ... ok
test i18n::tests::test_i18n_f ... ok
test i18n::tests::test_pi18n ... ok
test i18n::tests::test_i18n_k ... ok
test util::mercury::tests::parse_phoronix ... ok

test result: ok. 8 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 10.55s

error: test failed, to rerun pass '--bin news_flash_gtk'

Caused by:
  process didn't exit successfully: `/home/volker/Sync/git/news_flash_gtk/target/debug/deps/news_flash_gtk-0f333f82bdbfa5b7` (signal: 6, SIGABRT: process abort signal)
 volker   Sync  git  news_flash_gtk  
Wait what? There are 8 tests, 8 passed, 0 failed, 0 were ignored or filtered out, we have 8 ok marks … but it failed anyway? It took me a while to understand what is happening: It turns out that the parse_phoronix function contains code that will make the program crash. But this crash is delayed - control flow reaches the end of the parse_phoronix function and returns to the test harness. It is the exit handler that aborts:
Thread 1 "news_flash_gtk-" received signal SIGABRT, Aborted.
0x00007ffff227a36c in ?? () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff227a36c in  () at /usr/lib/libc.so.6
#1  0x00007ffff222a838 in raise () at /usr/lib/libc.so.6
#2  0x00007ffff2214535 in abort () at /usr/lib/libc.so.6
#3  0x00007ffff54774ac in  () at /usr/lib/libwebkit2gtk-5.0.so.0
#4  0x00007ffff586ecdd in  () at /usr/lib/libwebkit2gtk-5.0.so.0
#5  0x00007ffff57d5c5b in  () at /usr/lib/libwebkit2gtk-5.0.so.0
#6  0x00007ffff2cb1b35 in g_object_unref () at /usr/lib/libgobject-2.0.so.0
#7  0x00007ffff57ddfbc in  () at /usr/lib/libwebkit2gtk-5.0.so.0
#8  0x00007ffff2cb1b35 in g_object_unref () at /usr/lib/libgobject-2.0.so.0
#9  0x00007ffff222cef5 in  () at /usr/lib/libc.so.6
#10 0x00007ffff222d070 in on_exit () at /usr/lib/libc.so.6
#11 0x00007ffff2215297 in  () at /usr/lib/libc.so.6
#12 0x00007ffff221534a in __libc_start_main () at /usr/lib/libc.so.6
#13 0x00005555555dfec5 in _start () at ../sysdeps/x86_64/start.S:115
(gdb)

Rust Linking Errors

I suspected that the parse_phoronix function corrupts memory, which leads to this crash. So, I decided to run everything under ASAN. The documentation says I should compile with RUSTFLAGS="-Zsanitizer=address". So let’s try that:
error: /home/volker/Sync/git/news_flash_gtk/target/debug/deps/libproc_macro_error_attr-6323cc5d4e1386c9.so: undefined symbol: __asan_option_detect_stack_use_after_return
   --> /home/volker/.cargo/registry/src/github.com-1ecc6299db9ec823/proc-macro-error-1.0.4/src/lib.rs:284:9
    |
284 | pub use proc_macro_error_attr::proc_macro_error;
    |         ^^^^^^^^^^^^^^^^^^^^^

A fucking linking error in Rust. What a rare sight. It turns out that you need an additional flag, so I created a PR that adds a note to the documentation. I am a bit disappointed that Rust does not catch this error and give you a nicer error message. Anyway, ASAN confirmed the absence of memory corruption. So, our search for the bug that causes the SIGABRT continues.

WebKit Debug Symbols

I want to have debug symbols in my backtrace. So let’s try to compile WebKit with debug symbols. I followed their documentation, then compiled NewsFlash using

export LIBRARY_PATH="/run/media/volker/DATA/cloned/webkitgtk-2.36.1/lib"
export LD_LIBRARY_PATH="/run/media/volker/DATA/cloned/webkitgtk-2.36.1/lib"
cargo build
Now the test fails with
** (process:222197): ERROR **: 22:15:48.714: Unable to spawn a new child process: Failed to spawn child process “/usr/local/libexec/webkit2gtk-4.1/WebKitNetworkProcess” (No such file or directory)

Oh Fuck Me Hard. Looks like there are some subtleties in how WebKit is built. So let’s download the PKGBUILD that describes how Arch Linux builds Webkit and modify it to include Debug symbols. Once we run it in gdb, we are greeted by:

warning: Could not find DWO CU Source/WebKit/CMakeFiles/WebKit.dir/__/__/DerivedSources/WebKit/AutomationBackendDispatchers.cpp.dwo(0x8819e41405bd3bc0) referenced by CU at offset 0x23 [in module /usr/lib/debug/usr/lib/libwebkit2gtk-5.0.so.0.0.0.debug]
Dwarf Error: unexpected tag 'DW_TAG_skeleton_unit' at offset 0x23 [in module /usr/lib/debug/usr/lib/libwebkit2gtk-5.0.so.0.0.0.debug]
warning: Could not find DWO CU Source/JavaScriptCore/CMakeFiles/LowLevelInterpreterLib.dir/llint/LowLevelInterpreter.cpp.dwo(0x8640769469a7cd6a) referenced by CU at offset 0x23 [in module /usr/lib/debug/usr/lib/libjavascriptcoregtk-5.0.so.0.0.0.debug]
Dwarf Error: unexpected tag 'DW_TAG_skeleton_unit' at offset 0x23 [in module /usr/lib/debug/usr/lib/libjavascriptcoregtk-5.0.so.0.0.0.debug]

Angry German noises.

It took me longer than it should have to understand both warnings: Those *.dwo files get built, but not installed. Source/.../AutomationBackendDispatchers.cpp.dwo does exist in the build directory. So let’s cd into the build directory and try again:

Dwarf Error: unexpected tag 'DW_TAG_skeleton_unit' at offset 0x47fc [in module /usr/lib/debug/usr/lib/libjavascriptcoregtk-5.0.so.0.0.0.debug]
Exception ignored in: <gdb._GdbOutputFile object at 0x7fdccb07fa30>
Traceback (most recent call last):
  File "/usr/share/gdb/python/gdb/__init__.py", line 47, in flush
    def flush(self):
KeyboardInterrupt:

Lol, What? I did not touch the keyboard. Running sudo journalctl -b shows the true reason:

Mai 26 00:01:17 battle earlyoom[513]: sending SIGTERM to process 40789 uid 1000 "gdb": badness 1124, VmRSS 4782 MiB

GDB uses up too much memory, so it gets killed. I have 8 GB of RAM and 24 GB of swap, but apparently that is not enough. I opened an issue at bugzilla to complain about how GDB claims it was killed by a KeyboardInterrupt, even though it was a SIGTERM.

Next try: Let’s replace -ggdb3 with -ggdb1. This should generate less debuginfo and should therefore use less RAM.

… Nope, still eats too much RAM.

I am running Arch Linux on my machine here. The main reason I use Arch Linux is that it has a really, really detailed wiki. In fact, I found something about my problem in the wiki:

Alternatively you can put the debug information in a separate package by enabling both debug and strip, debug symbols will then be stripped from the main package and placed, together with source files to aid in stepping through the debugger, in a separate pkgbase-debug package. This is advantageous if the package contains very large binaries (e.g. over a GB with debug symbols included) as it might cause freezing and other strange, unwanted behavior occurring.

Ok, so let’s try that … Nope, the same problem persists: OOM if I cd into build and warnings about nonexistent .dwo files otherwise.

Downloading Debug Symbols

I don’t think trying to solve the OOM issue from the previous section would have a good cost/reward ratio. So, let’s give up on building it by hand and just use the best feature since sliced bread: For Arch Linux packages (and webkit2gtk-5.0 is an Arch Linux package) gdb gives you this prompt:

This GDB supports auto-downloading debuginfo from the following URLs:
https://debuginfod.archlinux.org
Enable debuginfod for this session? (y or [n])

Upon confirming this prompt, gdb will download the correct debug symbols and we get our backtrace:

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff208e3d3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007ffff203e838 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff2028535 in __GI_abort () at abort.c:79
#4  0x00007ffff543e4bc in WTFCrashWithInfo(int, char const*, char const*, int) () at /usr/src/debug/build/WTF/Headers/wtf/Assertions.h:741
#5  allDataStores () at /usr/src/debug/webkitgtk-2.36.3/Source/WebKit/UIProcess/WebsiteData/WebsiteDataStore.cpp:101
#6  WebKit::WebsiteDataStore::~WebsiteDataStore() () at /usr/src/debug/webkitgtk-2.36.3/Source/WebKit/UIProcess/WebsiteData/WebsiteDataStore.cpp:152
#7  0x00007ffff583626d in WebKit::WebsiteDataStore::~WebsiteDataStore() () at /usr/src/debug/webkitgtk-2.36.3/Source/WebKit/UIProcess/WebsiteData/WebsiteDataStore.cpp:158
#8  0x00007ffff579d04b in WTF::ThreadSafeRefCounted<API::Object, (WTF::DestructionThread)0>::deref() const::{lambda()#1}::operator()() const () at /usr/src/debug/build/WTF/Headers/wtf/ThreadSafeRefCounted.h:117
#9  WTF::ThreadSafeRefCounted<API::Object, (WTF::DestructionThread)0>::deref() const () at /usr/src/debug/build/WTF/Headers/wtf/ThreadSafeRefCounted.h:129
#10 WTF::DefaultRefDerefTraits<WebKit::WebsiteDataStore>::derefIfNotNull(WebKit::WebsiteDataStore*) () at /usr/src/debug/build/WTF/Headers/wtf/RefPtr.h:42
#11 WTF::RefPtr<WebKit::WebsiteDataStore, WTF::RawPtrTraits<WebKit::WebsiteDataStore>, WTF::DefaultRefDerefTraits<WebKit::WebsiteDataStore> >::~RefPtr() () at /usr/src/debug/build/WTF/Headers/wtf/RefPtr.h:74
#12 _WebKitWebsiteDataManagerPrivate::~_WebKitWebsiteDataManagerPrivate() () at /usr/src/debug/webkitgtk-2.36.3/Source/WebKit/UIProcess/API/glib/WebKitWebsiteDataManager.cpp:100
#13 webkit_website_data_manager_finalize() () at /usr/src/debug/webkitgtk-2.36.3/Source/WebKit/UIProcess/API/glib/WebKitWebsiteDataManager.cpp:118
#14 0x00007ffff4269b35 in g_object_unref (_object=<optimized out>) at ../glib/gobject/gobject.c:3678
#15 g_object_unref (_object=0x7fffe825bd20) at ../glib/gobject/gobject.c:3553
#16 0x00007ffff57a53ac in WTF::derefGPtr<_WebKitWebsiteDataManager>(_WebKitWebsiteDataManager*) () at /usr/src/debug/build/WTF/Headers/wtf/glib/GRefPtr.h:269
#17 WTF::GRefPtr<_WebKitWebsiteDataManager>::~GRefPtr() () at /usr/src/debug/build/WTF/Headers/wtf/glib/GRefPtr.h:82
#18 _WebKitWebContextPrivate::~_WebKitWebContextPrivate() () at /usr/src/debug/webkitgtk-2.36.3/Source/WebKit/UIProcess/API/glib/WebKitWebContext.cpp:205
#19 webkit_web_context_finalize() () at /usr/src/debug/webkitgtk-2.36.3/Source/WebKit/UIProcess/API/glib/WebKitWebContext.cpp:305
#20 0x00007ffff4269b35 in g_object_unref (_object=<optimized out>) at ../glib/gobject/gobject.c:3678
#21 g_object_unref (_object=0x7fffe8246140) at ../glib/gobject/gobject.c:3553
#22 0x00007ffff2040ef5 in __run_exit_handlers (status=0, listp=0x7ffff21fe778 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:113
#23 0x00007ffff2041070 in __GI_exit (status=<optimized out>) at exit.c:143
#24 0x00007ffff2029297 in __libc_start_call_main (main=main@entry=0x5555556c8630 <main>, argc=argc@entry=3, argv=argv@entry=0x7fffffffd818) at ../sysdeps/nptl/libc_start_call_main.h:74
#25 0x00007ffff202934a in __libc_start_main_impl (main=0x5555556c8630 <main>, argc=3, argv=0x7fffffffd818, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd808) at ../csu/libc-start.c:392
#26 0x00005555556c70e5 in _start () at ../sysdeps/x86_64/start.S:115

The good news: We have a backtrace. The bad news: It is a backtrace straight from hell. The good thing is that we can now understand how the crash happens after the test passed: The Webkit/GTK functions called by the unit test use atexit to register a function (g_object_unref, #21 in this bt, called from __run_exit_handlers in #22) that is called upon termination. This function aborts. Why? Because #5 in this bt triggers this assertion:

static HashMap<PAL::SessionID, WebsiteDataStore*>& allDataStores()
{
    RELEASE_ASSERT(isUIThread());
    static NeverDestroyed<HashMap<PAL::SessionID, WebsiteDataStore*>> map;
    return map;
}

Apparently Webkit/GTK does not like the fact that the unit test ran on a different thread than the atexit handler. We can confirm this theory: A unit test containing

gtk4::init().unwrap();
MercuryParser::parse("");

will crash with the bt above, but if we put this into fn main it runs fine. Or does it? If we build the webkit2gtk-5.0 package in debug mode, cargo run outputs

LEAK: 1 WebProcessPool

and exits with exit code 0. With the debug build of webkit2gtk-5.0, cargo test still crashes after the unit tests are completed, but the crash is slightly different:

ASSERTION FAILED: !m_impl || !m_shouldEnableAssertions || m_impl->wasConstructedOnMainThread() == isMainThread()
/usr/src/debug/build/WTF/Headers/wtf/WeakPtr.h(148) : T* WTF::WeakPtr< <template-parameter-1-1>, <template-parameter-1-2> >::operator->() const [with T = IPC::MessageReceiver; Counter = WTF::EmptyCounter]

Thread 1 "news_flash_gtk-" received signal SIGABRT, Aborted.
LEAK: 4 WebCoreNode
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44        return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007fffeda8e3d3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007fffeda3e838 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007fffeda28535 in __GI_abort () at abort.c:79
#4  0x00007ffff1ab152d in  () at /usr/lib/libwebkit2gtk-5.0.so.0
#5  0x00007ffff1ab19b9 in IPC::MessageReceiverMap::invalidate() [clone .cold] () at /usr/lib/libwebkit2gtk-5.0.so.0
#6  0x00007ffff252489a in WebKit::WebProcessPool::~WebProcessPool() () at /usr/lib/libwebkit2gtk-5.0.so.0
#7  0x00007ffff25257dd in WebKit::WebProcessPool::~WebProcessPool() () at /usr/lib/libwebkit2gtk-5.0.so.0
#8  0x00007ffff25ff929 in webkitWebContextDispose(_GObject*) () at /usr/lib/libwebkit2gtk-5.0.so.0
#9  0x00007ffff08b1a64 in g_object_unref (_object=<optimized out>) at ../glib/gobject/gobject.c:3636
#10 g_object_unref (_object=0x7fffe40cd2b0) at ../glib/gobject/gobject.c:3553
#11 0x00007fffeda40ef5 in __run_exit_handlers (status=0, listp=0x7fffedbfe778 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:113
#12 0x00007fffeda41070 in __GI_exit (status=<optimized out>) at exit.c:143
#13 0x00007fffeda29297 in __libc_start_call_main (main=main@entry=0x5555556c8630 <main>, argc=argc@entry=3, argv=argv@entry=0x7fffffffd5e8) at ../sysdeps/nptl/libc_start_call_main.h:74
#14 0x00007fffeda2934a in __libc_start_main_impl (main=0x5555556c8630 <main>, argc=3, argv=0x7fffffffd5e8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd5d8) at ../csu/libc-start.c:392
#15 0x00005555556c70e5 in _start () at ../sysdeps/x86_64/start.S:115
(gdb)

I am sorry that there are missing debug symbols. Building the debug symbols myself fails as described in “WebKit Debug Symbols” and the debug symbols from https://debuginfod.archlinux.org are for a differently built package. I could take a look at how the debug symbols from https://debuginfod.archlinux.org are built and use this to create working debug symbols. But I decided against it, as it would probably have a bad cost/reward ratio. Anyway, we can still see that the crash is in an atexit function that complains it runs on the wrong thread.
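
The pattern behind both assertions can be sketched in a few lines of Rust. This is only an analogy under my own assumptions, not WebKit's code: ThreadBound and is_on_owner_thread are hypothetical names, and the check merely mirrors the spirit of RELEASE_ASSERT(isUIThread()).

```rust
use std::thread::{self, ThreadId};

/// Hypothetical stand-in for an object that must only be touched
/// from the thread that created it.
struct ThreadBound {
    owner: ThreadId,
}

impl ThreadBound {
    /// Remembers the thread on which the object was constructed.
    fn new() -> Self {
        ThreadBound { owner: thread::current().id() }
    }

    /// True only when called from the constructing thread.
    fn is_on_owner_thread(&self) -> bool {
        thread::current().id() == self.owner
    }
}

fn main() {
    // Created on a helper thread, like the objects created by the unit test.
    let object = thread::spawn(ThreadBound::new).join().unwrap();
    // Checked on the main thread, like the exit handler does.
    println!("check passes: {}", object.is_on_owner_thread());
}
```

An object built on one thread and torn down on another fails such a check, no matter how well-behaved the code in between was.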

-D_GLIBCXX_DEBUG Leads to Another Warning

Forget for a second the problems of the previous sections. I found another bug, more or less by chance: If I compile webkitgtk with -D_GLIBCXX_DEBUG and put

gtk4::init().unwrap();
MercuryParser::parse("");

into fn main, it still prints LEAK: 1 WebProcessPool, but an additional warning appears:

/usr/include/c++/12.1.0/bits/stl_heap.h:209:
In function:
    constexpr void std::push_heap(_RAIter, _RAIter, _Compare) [with _RAIter
    = WebCore::TimerHeapIterator; _Compare =
    WebCore::TimerHeapLessThanFunction]

Error: comparison doesn't meet irreflexive requirements, assert(!(a < a)).

Objects involved in the operation:
    instance "functor" @ 0x7fff0154ba9f {
    }
    iterator::value_type "ordered type"  {
    }

The documentation about the -D_GLIBCXX_DEBUG flag says:

Note that this flag changes the sizes and behavior of standard class templates such as std::vector, and therefore you can only link code compiled with debug mode and code compiled without debug mode if no instantiation of a container is passed between the two translation units.

Compiling everything with this flag would be too much work, so I manually added the same check to stl_heap.h:

if (!(__first == __last || !__comp(*__first, *__first))) {
  __builtin_trap();
}

and built webkit2gtk-5.0 with this modified library. Running everything works and it does not trap, so I guess our "comparison doesn't meet irreflexive requirements" warning was just an artifact of us compiling one part with -D_GLIBCXX_DEBUG and the other part without it. Right?
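
For readers who have not met the term: a "less than" comparison is irreflexive if no element compares less than itself. A minimal sketch of such a check, written in Rust rather than C++ and testing every element instead of only the first one, could look like this. The helper name is_irreflexive and the sample values are mine.

```rust
/// Returns true if `less(a, a)` is false for every element,
/// i.e. the comparison is irreflexive on these values.
fn is_irreflexive<T>(items: &[T], less: impl Fn(&T, &T) -> bool) -> bool {
    items.iter().all(|a| !less(a, a))
}

fn main() {
    let deadlines = [3.0_f64, 1.5, 7.25];
    // A strict comparison satisfies the requirement.
    println!("{}", is_irreflexive(&deadlines, |a, b| a < b));
    // A non-strict comparison violates it, because a <= a holds.
    println!("{}", is_irreflexive(&deadlines, |a, b| a <= b));
}
```

A comparator that uses <= where < is expected is the classic way to break this requirement.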

Giving Up

So, where was I again? We have a unit test that makes the exit handler crash, and we have a backtrace. However, with a heavy heart, I decided to give up on fixing it. Why? The code crashes because the Rust code calls webkit in a way you should not call it. But how are you supposed to call it? Surely it has proper documentation, you might say. You are partially right - webkit has good documentation. But we are not using webkit: There are four different versions of webkit:

1. The main version of webkit from Apple
2. The GTK3 fork of webkit, which has a Rust wrapper
3. The GTK4 version of it
4. A custom fork of the GTK4 version

Each step down in that list is a step down in the quality of documentation. (To be fair, 3. and 4. are nearly identical.) Other people seem to agree with me. The basic usage example of the GTK3 fork works. The basic usage example of the GTK4 fork is broken: it is just a copy of the GTK3 example and will not compile with GTK4. Also, take a look at these two programs:

#include "webkit2/webkit2.h"
int main() { auto view = webkit_web_view_new(); }

and

#include "webkit2/webkit2.h"
#include <thread>
int main() {
  std::thread helper(webkit_web_view_new);
  helper.join();
}

Both of them run fine with a release version of webkit2gtk-5.0, but with a debug version of webkit2gtk-5.0, the first one warns

LEAK: 1 WebProcessPool
LEAK: 1 WebPageProxy

and the second one aborts with

ASSERTION FAILED: Completion handler should always be called
!m_function
/usr/src/debug/build/WTF/Headers/wtf/CompletionHandler.h(59) : WTF::CompletionHandler<Out(In ...)>::~CompletionHandler() [with Out = void; In = {WTF::HashMap<WTF::String, std::unique_ptr<WebKit::DeviceIdHashSaltStorage::HashSaltForOrigin, std::default_delete<WebKit::DeviceIdHashSaltStorage::HashSaltForOrigin> >, WTF::DefaultHash<WTF::String>, WTF::HashTraits<WTF::String>, WTF::HashTraits<std::unique_ptr<WebKit::DeviceIdHashSaltStorage::HashSaltForOrigin, std::default_delete<WebKit::DeviceIdHashSaltStorage::HashSaltForOrigin> > >, WTF::HashTableTraits>&&}]

So, it seems we need to properly destruct our WebView object. Surely, the documentation can tell us how we should properly destruct it. Nope. There is no tutorial, no guide, no usage example - nothing. I am not looking forward to touching 3 million lines of C++ code with bad API documentation, so I gave up. At the time, I was at the GPN20 and everyone ran away once they heard the word “webkit”.

Update: There actually is an example program, called MiniBrowser in the source tree.

More Failing Unit Tests

I said before that cargo test reports the success of 8 of 8 unit tests, then aborts. That is true for news_flash_gtk, but news_flash_gtk has a dependency called news_flash. Running the unit tests of news_flash shows 15 fails, apparently with a common reason.

Using git bisect, I found that commit 3c840c2b introduced this bug by adding another database migration. I should note here that if news_flash wants to set up a new database, it sets up an empty one, then runs every database migration they ever did to arrive at the current version. A dump of the database constructed this way showed the problem:

CREATE TABLE taggings (
        article_id TEXT NOT NULL REFERENCES "_articles_old"(article_id),
        tag_id TEXT NOT NULL REFERENCES tags(tag_id),
        PRIMARY KEY (article_id, tag_id)
);

I should note that no _articles_old table exists. If I concatenate all database migrations and run them, instead of running them using diesel, the constructed database contains

CREATE TABLE taggings (
        article_id TEXT NOT NULL REFERENCES articles(article_id),
        tag_id TEXT NOT NULL REFERENCES tags(tag_id),
        PRIMARY KEY (article_id, tag_id)
);

(the articles table exists) and everything works fine. I used diesel_logger to log all SQL queries executed by diesel. But strangely, executing those queries did not reproduce the problem. It turned out that there is a bug in diesel_logger: It does not log batch_execute queries. This bug is fixed in master, and I made a PR to improve the documentation. Instead of using this logger, I ended up putting raw println! statements in the diesel source code, to get the output format right. Executing the logged queries did reproduce the problem. The difference between the logged queries and the concatenation of all migrations is that diesel wraps every migration in a BEGIN/COMMIT block. Reducing the log and the concatenation leads to this minimal example:

PRAGMA foreign_keys = ON;
CREATE TABLE article (
    id TEXT
);
CREATE TABLE other (
    id TEXT REFERENCES article(id)
);
BEGIN;
PRAGMA legacy_alter_table=ON;
PRAGMA foreign_keys=OFF;
ALTER TABLE article RENAME TO old;
COMMIT;

This creates the following database:

PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE IF NOT EXISTS "old" (
    id TEXT
);
CREATE TABLE other (
    id TEXT REFERENCES "old"(id)
);
COMMIT;

If BEGIN and COMMIT are removed, it creates this database.

PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE IF NOT EXISTS "old" (
    id TEXT
);
CREATE TABLE other (
    id TEXT REFERENCES article(id)
);
COMMIT;

Notice the difference in line 7. The reason for the difference can be found in the SQLite docs of PRAGMA foreign_keys:

This pragma is a no-op within a transaction; foreign key constraint enforcement may only be enabled or disabled when there is no pending BEGIN or SAVEPOINT.

One way to fix this is to do a PRAGMA foreign_keys=OFF before the migrations and a PRAGMA foreign_keys=ON afterwards. Unfortunately, while I was working on fixing this bug, another fix landed.
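
The ordering described above can be sketched as follows. This is plain string assembly under my own assumptions, not diesel's API and not the fix that landed upstream; build_migration_script is a hypothetical helper that only shows where the pragmas have to go relative to the transactions.

```rust
/// Builds one SQL script from a list of migrations.
/// The foreign_keys pragma is toggled outside of any transaction,
/// because SQLite ignores it while a transaction is open.
fn build_migration_script(migrations: &[&str]) -> String {
    let mut script = String::from("PRAGMA foreign_keys=OFF;\n");
    for migration in migrations {
        // Each migration still gets its own transaction.
        script.push_str("BEGIN;\n");
        script.push_str(migration.trim_end());
        script.push_str("\nCOMMIT;\n");
    }
    script.push_str("PRAGMA foreign_keys=ON;\n");
    script
}

fn main() {
    let script = build_migration_script(&[
        "PRAGMA legacy_alter_table=ON;\nALTER TABLE article RENAME TO old;",
    ]);
    print!("{}", script);
}
```

The point is only the placement: the pragma that the migration itself tried to set inside BEGIN/COMMIT is moved to a place where SQLite actually honors it.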

Frustrated German Noises

Note to myself: If you are debugging something, debug it on the latest release and check every day if the bug still exists on master. Or even better: Talk to the other programmers.

Failing Package Build

Now, back to our database problem. This bug only occurred when I built newsflash myself, but the version I got with pacman -S newsflash worked fine. So, let’s try to build the newsflash package myself:

 volker   Documents  cloned  newsflash  wget https://raw.githubusercontent.com/archlinux/svntogit-community/packages/newsflash/trunk/PKGBUILD
 volker   Documents  cloned  newsflash  makepkg
...
Running custom install script '/usr/bin/python /home/volker/Documents/cloned/newsflash/src/news_flash_gtk-1.5.1/build-aux/meson_post_install.py'
rm: cannot remove '/home/volker/Documents/cloned/newsflash/pkg/newsflash/build': No such file or directory
==> ERROR: A failure occurred in package().
    Aborting...
 volker   Documents  cloned  newsflash      
Is there anything, anything that is not broken? If I build it in a clean chroot, it works. I searched for the reason, fixed it and sent the following diff to the maintainer of the PKGBUILD:
<   rm -r "$pkgdir"/build
---
> 
>   # Prior to 3ce5d93, news_flash_gtk/data/meson.build contained the lines
>   # gnome.compile_resources(
>   #   'symbolic_icons',
>   #   'symbolic_icons.gresource.xml',
>   #   gresource_bundle: true,
>   #   source_dir: meson.current_build_dir(),
>   #   install: true,
>   #   install_dir: join_paths(meson.source_root(), 'data/resources/gresource_bundles')
>   # )
>   # gnome.compile_resources(
>   #   'ui_templates',
>   #   'ui_templates.gresource.xml',
>   #   gresource_bundle: true,
>   #   source_dir: meson.current_build_dir(),
>   #   install: true,
>   #   install_dir: join_paths(meson.source_root(), 'data/resources/gresource_bundles')
>   # )
>   # This causes symbolic_icons.gresource and ui_templates.gresource to be
>   # installed in a weird path, the concatination of DESTDIR and the absolute
>   # path of *.gresource in the source tree, to be exact. We want neither those
>   # *.gresource files, nor their parent directories in our package. Properly
>   # finding those parent directories is tricky so we just delete everything
>   # except the usr folder. If you would build this package inside your /usr
>   # folder, this would create problems, but I am too lazy to fix that and the
>   # next release of news_flash_gtk will contain a fix anyway.
> 
>   # If e.g. you build the package in a clean chroot, as
>   # explained in
>   # https://wiki.archlinux.org/title/DeveloperWiki:Building_in_a_clean_chroot ,
>   # this will delete "$pkgdir"/build which is
>   # /build/newsflash/pkg/newsflash/build . If you e.g. build this package in
>   # your home directory, this will delete
>   # /home/username/pkgbuildfolder/pkg/newsflash/home
>   # which contains
>   # /home/username/pkgbuildfolder/pkg/newsflash/home/username/pkgbuildfolder/src/news_flash_gtk-1.5.1/data/resources/gresource_bundles/symbolic_icons.gresource
> 
>   find "$pkgdir" -mindepth 1 -maxdepth 1 ! -name 'usr' -exec rm -r {} +

Version Trouble

Now, let’s run the newest master version of news_flash_gtk: On this version, rendering does not work at all, the article is simply one big white space. (There is no text at all, not even white text, as can be confirmed by attempting to highlight it.) git bisect does not work nicely because there are quite a lot of commits where the build is broken, but in the end I managed to find the following:

Did I say it does not render anything? It does not render anything on my desktop, but it works perfectly on my laptop. Is there a single kind of bug we did not encounter in this yak-shaving? Now we have a fucking platform-dependent bug! My desktop and laptop are nearly identical Arch Linux machines.

Turns out that this is a known issue with WebKit and Nvidia GPUs.

Finishing Up

This story ends here. As I said in the abstract, nothing interesting, just bug-fixing.