On Things I Can't Code with LLMs
I’m a big believer in using LLMs for coding. I think on average they make skilled coders 2-5x faster and more efficient at what they do. When I’m coding I use LLMs like ChatGPT and Claude for basically everything: new endpoints, database models, frontend components, configs, essentially anything you could imagine. Our entire API is LLM-generated, docs and all.
Over the past few months we’ve been building the voice-only version of flyflow. The backend is fairly complex: roughly 10 goroutines (lightweight threads) asynchronously processing data for each call, many database objects (saves, fetches, etc.), many inference calls, all happening at once, and latency really, really matters. It’s the first time that I’ve really noticed I can’t use LLMs for coding.
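To make the shape of the problem concrete, here’s a minimal sketch in Go of what a per-call pipeline like this looks like. None of this is flyflow’s actual code; every type and function name is hypothetical, and the real system has roughly 10 goroutines plus real DB and inference calls where this has stubs.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// audioChunk and transcript are hypothetical stand-ins for the real per-call data types.
type audioChunk struct{ data []byte }
type transcript struct{ text string }

// handleCall wires up a per-call pipeline: one goroutine doing speech-to-text,
// one doing inference, with DB saves fired off asynchronously so they never
// block the latency-critical path.
func handleCall(ctx context.Context, audioIn <-chan audioChunk) {
	transcripts := make(chan transcript, 16)
	replies := make(chan string, 16)

	// Speech-to-text goroutine (stubbed).
	go func() {
		defer close(transcripts)
		for chunk := range audioIn {
			transcripts <- transcript{text: fmt.Sprintf("heard %d bytes", len(chunk.data))}
		}
	}()

	// Inference goroutine (stubbed). Each transcript is also persisted
	// in its own goroutine, off the hot path.
	go func() {
		defer close(replies)
		for t := range transcripts {
			go saveTranscript(t) // async DB write, stubbed
			replies <- "reply to: " + t.text
		}
	}()

	// Stream replies back to the caller until the call ends.
	for {
		select {
		case r, ok := <-replies:
			if !ok {
				return
			}
			fmt.Println("speak:", r)
		case <-ctx.Done():
			return
		}
	}
}

// saveTranscript stands in for a database save that must not block the call.
func saveTranscript(t transcript) { _ = t }

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	audio := make(chan audioChunk)
	go func() {
		defer close(audio)
		for i := 0; i < 3; i++ {
			audio <- audioChunk{data: make([]byte, 320)}
		}
	}()
	handleCall(ctx, audio)
}
```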
I was trying to describe this today and realized that the reason is I can’t concretely explain what I want, across multiple files, in plain English. For most other things there’s a plain-English query that gets me most of the way there. For whatever reason, with the flyflow voice streaming infrastructure, I can’t hold everything I want in my head. I just have an intuition about what it should look like, and I need to “feel” all of the pieces come together for myself.
Most of the queries I throw into ChatGPT look something like the following (a sketch of what the second one yields is below):
“I want a Dockerfile for this Python 3.11 project, with a requirements file requirements.txt, that serves app.py using WSGI on port 5000”
“I want an endpoint in Golang using gorilla/mux that takes db model [insert gorm model definition] and returns a paginated list based on created_at”
“I want a matplotlib plot of dataframe [insert head() of dataframe] that shows x data over time”
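For contrast, here’s roughly the kind of thing the second prompt gets back, with a hypothetical Call model standing in for the elided GORM definition:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"strconv"
	"time"

	"github.com/gorilla/mux"
	"gorm.io/driver/sqlite"
	"gorm.io/gorm"
)

// Call is a stand-in for whatever GORM model you’d paste into the prompt.
type Call struct {
	ID        uint      `gorm:"primaryKey" json:"id"`
	CreatedAt time.Time `json:"created_at"`
}

var db *gorm.DB

// listCalls returns a page of Calls ordered by created_at, newest first,
// driven by ?page= and ?page_size= query params.
func listCalls(w http.ResponseWriter, r *http.Request) {
	page, _ := strconv.Atoi(r.URL.Query().Get("page"))
	if page < 1 {
		page = 1
	}
	pageSize, _ := strconv.Atoi(r.URL.Query().Get("page_size"))
	if pageSize < 1 || pageSize > 100 {
		pageSize = 20
	}

	var calls []Call
	if err := db.Order("created_at DESC").
		Offset((page - 1) * pageSize).
		Limit(pageSize).
		Find(&calls).Error; err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(calls)
}

func main() {
	var err error
	db, err = gorm.Open(sqlite.Open("calls.db"), &gorm.Config{})
	if err != nil {
		log.Fatal(err)
	}
	db.AutoMigrate(&Call{})

	r := mux.NewRouter()
	r.HandleFunc("/calls", listCalls).Methods(http.MethodGet)
	log.Fatal(http.ListenAndServe(":8080", r))
}
```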
All of these things require context, but the LLM can handle it by parroting back what it knows. The flyflow voice streaming infrastructure goes beyond parroting back code from a description: you can’t put all of it in a prompt, and it requires real thinking and reasoning to arrive at a conclusion about what it should look like. It’s a weird line that I’m still working on describing, but I think it’s a unique skill that humans still have and LLMs don’t.
Generally, I think the “LLMs won’t replace software engineers” line is coping from engineers who love their high-paying jobs, but it’s possible this is something LLMs really can’t do. Maybe as we scale LLMs we get there; I don’t know. My bet, though, is that this skill is embedded in our internal representation of the world, which is very separate from the internet text LLMs train on. If GPT-5 breaks through this, I will be a true believer.