Notes on LLM GUIs

This week’s notes come a little earlier, partly because of an upcoming long weekend and partly because I’ve been mulling over the LLM space again, prompted by the near-simultaneous releases of llama3 and phi-3.

I’ve been spending a fair bit of time seeing how feasible it is to run these “small” models on a slow(ish), low-power consumer-grade GPU (as well as more ARM hardware that I will write about later), and I think we’re now at a point where they are borderline usable for some tasks and the tooling is finally becoming truly polished.

As an example, I’ve been playing around with dify for a few days. In a nutshell, it is a node-based visual environment for defining chatbots, workflows and agents that can run locally against llama3 or phi-3 (or any other LLM, for that matter), and it makes it reasonably easy to build and test new “applications”.

It’s pretty great in the sense that it is a docker-compose invocation away and the choice of components is sane, but it is, like all modern solutions, just a trifle too busy:

$ docker ps
CONTAINER ID   IMAGE                              COMMAND                  CREATED        STATUS                  PORTS                               NAMES
592917c4d941   nginx:latest                       "/docker-entrypoint.…"   36 hours ago   Up 36 hours             0.0.0.0:80->80/tcp, :::80->80/tcp   me_nginx_1
2942ff1a3983   langgenius/dify-api:0.6.4          "/bin/bash /entrypoi…"   36 hours ago   Up 36 hours             5001/tcp                            me_worker_1
bc1319008a7a   langgenius/dify-api:0.6.4          "/bin/bash /entrypoi…"   36 hours ago   Up 36 hours             5001/tcp                            me_api_1
2265bb3bc546   langgenius/dify-sandbox:latest     "/main"                  36 hours ago   Up 36 hours                                                 me_sandbox_1
aed35da737f9   langgenius/dify-web:0.6.4          "/bin/sh ./entrypoin…"   36 hours ago   Up 36 hours             3000/tcp                            me_web_1
824dfb63f2d0   postgres:15-alpine                 "docker-entrypoint.s…"   36 hours ago   Up 36 hours (healthy)   5432/tcp                            me_db_1
bbf486e08182   redis:6-alpine                     "docker-entrypoint.s…"   36 hours ago   Up 36 hours (healthy)   6379/tcp                            me_redis_1
b4280691205d   semitechnologies/weaviate:1.19.0   "/bin/weaviate --hos…"   36 hours ago   Up 36 hours                                                 me_weaviate_1

It has most of what you’d need to do RAG or ReAct, except for things like database and filesystem indexing, and it is sophisticated to the point where you can not only provide your models with a plethora of baked-in tools (for function calling) but also define your own tools and API endpoints to call. It is pretty neat, really, and probably the best all-round, self-hostable graphical environment I’ve come across so far.
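
To make “define your own tools” a bit more concrete, here is a minimal sketch of the kind of HTTP endpoint an agent platform like this could call as a custom tool. The route, port and payload shape are my own illustrative choices, not anything dify itself prescribes:

from datetime import datetime, timezone

from flask import Flask, jsonify, request

# Hypothetical custom "tool" endpoint -- the route, port and payload shape
# are illustrative assumptions, not part of dify's own API.
app = Flask(__name__)

@app.post("/tools/timestamp")
def timestamp():
    # Accept an optional strftime format in the JSON body and return UTC time.
    payload = request.get_json(silent=True) or {}
    fmt = payload.get("format", "%Y-%m-%dT%H:%M:%SZ")
    return jsonify({"now": datetime.now(timezone.utc).strftime(fmt)})

if __name__ == "__main__":
    app.run(port=8001)

From there it is mostly a matter of describing the endpoint to the platform so the model knows when (and how) to call it.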

The examples are… naïve, but workable, and the debugging experience is actually useful.

But I have trouble building actually useful applications with it, because none of the data I want to use is accessible to it, and it can’t automate the things I want to automate because it doesn’t have the right hooks. By shifting everything to the web, we’ve foregone the ability to, for instance, index a local filesystem or e-mail archive, interact with desktop applications, or even create local files (like, say, a PDF).

And all the stuff I want to do with local models is, well, local. And still relies on things like Mail.app and actual documents. They might be in the cloud, but they are neither in the same cloud nor are they accessible via uniform APIs (and, let’s face it, I don’t want them to be).

This may be an unpopular opinion in these days of cloud-first everything, but the truth is that I don’t want to have to centralize all my data or deal with the hassle of multiple cloud integrations just to be able to automate it. I want to be able to run models locally, and I want to be able to run them against my own data without having to jump through hoops.

On the other hand, I am moderately concerned about control over the tooling and code that runs these agents and workflows. I’ve had great success doing this kind of automation in Node-RED (partly because it can be done “upstream” without the need for any local data), and there’s probably nothing I can do in dify that I can’t do there, but building a bunch of custom nodes for this would take time.

Dropping back to actual code, a cursory review of the current state of the art around langchain and all sorts of other LLM-related projects shows that code quality still leaves a lot to be desired¹, to the point where it’s just easier (and more reliable) to write some things from scratch.
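
As a rough data point for how little code “from scratch” actually takes, here is a minimal, dependency-free sketch that talks to a local model directly, assuming an Ollama instance serving llama3 on its default port; the endpoint and field names are Ollama’s, everything else is illustrative:

import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumes a local Ollama install

def ask(prompt: str, model: str = "llama3") -> str:
    # Send a single, non-streaming completion request and return the generated text.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = request.Request(OLLAMA_URL, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ask("Summarize the trade-offs of running small LLMs locally."))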

But the key thing for me is that creating something genuinely useful and reliable that does more than just spew text is still… hard. And I really don’t like that the current state of the art is still so focused on content generation.

One of dify's predefined workflows for generating garbage SEO articles

It’s not the prompting, or the endless looping and filtering, or even the fact that many agent frameworks generate their own prompts to the point where they’re impossible to debug; it’s that the actually useful stuff is still hard to do, and LLMs are still a long way from being able to do it.

They do make for amazing Marketing demos and autocomplete engines, though.


  1. I recently came across a project that was so bad it just had to be AI-generated. The giveaway was the blanket comment coverage on what were essentially re-implementations of Python standard library functionality, written in the typical “Java cosplay” object-oriented style that is so prevalent in the LLM space. It was that bad. ↩︎

Notes for April 15-21

I ended up throwing my back out early in the week, so most of my time was spent in comical pain, moving around like a crab on stilts and trying to get some work done in between bouts of lying down, watching Fallout and reading Scalzi’s Starter Villain, which was actually quite fun.

Read More...

The BSP D8 Bluetooth Game Controller

Since it seems to be , I thought I’d rewind back to my , when as a concession to the need to relax I decided to pack some form of gaming device. But since I also wanted to minimize packing, I settled on a game controller and using my iPad Pro for light gaming.

Read More...

Notes for April 8-14

In short, I spent a fair chunk of my time dabbling with LLMs again, but also still dealing with shifting priorities at work.

Read More...

iGPU Compute and LLMs on the AceMagic AM18

As part of my forays into LLMs and GPU compute “on the small” I’ve been playing around with the AceMagic AM18 in a few unusual ways. If you missed my earlier posts on it, you might want to check them out first.

Read More...

Notes for April 1-7

Guess what, anyway. Easter Break helped reset my expectations towards work and this week I managed to get back into the swing of things, with a fair bit of writing and documentation work done on my own time.

Read More...

The Keychron K7 Max

This one is going to be a bit of a nostalgia trip, so bear with me.

Read More...

The Bunnies Are Loose

Easter break was a bit different this year, but we still managed to pop over to the countryside for a (wet) couple of days.

Read More...

Taking a Break From Spherical Cows

I have been having a highly unusual couple of weeks (which included recovering from a bout of food poisoning that hit the day after my ), and I am now taking a break from my usual routine to try to get some perspective on things.

Read More...

The AceMagic AM18 Linux Gaming Experience

If you’ve been tracking my , you’ll know that I’ve been on the lookout for a small, powerful, and quiet Ryzen desktop and mini-server. I’ve been using a for almost three years now and it’s been a great experience, but I wanted to explore the desktop side of things, especially since my have shown that ARM-based servers are at an interesting inflection point for running small AI models, and I just had no real data on AMD‘s capabilities in that space.

Read More...

On Notes, This Site and Other Things

Somewhat like , I decided to put my on hold again–just a bit earlier than Easter this time.

Read More...

Notes for March 11-12

When things are so-so, they can always get worse. Much worse, often in multiple dimensions at once.

Read More...

Notes for March 4-10

A busy, exhausting work week punctuated by surreal moments that shall remain forever .

Read More...
