> I think there is a section of programmers who actually do like the actual typing of letters
Do people actually spend significant time typing? After I moved beyond the novice stage it’s been an inconsequential amount of time. What typing still provides is a thorough review of every single line, essentially equivalent to what a good PR review looks like.
Programmers tend to lean two ways: math-oriented or literature-oriented. The math types tend to become FAANG engineers. The literature-oriented ones tend to start startups and become product managers and indie game devs and Laravel artisans.
I started working on this because I was getting uncomfortable with how quickly “autonomous” systems are allowed to touch real things like files, shells, or networks, while the safety story is often just “trust the prompt” or “we’ll log what happens.”
That felt backwards to me.
So I tried a small experiment: instead of trying to make the agent behave, make execution itself the hard boundary. Let the agent propose whatever it wants, but require an explicit authorization step before anything with side effects can actually run.
When you run the demo, the agent proposes a plan that includes things like deleting files, changing configs, restarting services, and making network calls. None of that actually happens. The only thing that “runs” is an analysis step with no side effects.
The output is basically a trace showing what was proposed, what was blocked, and a diff proving that nothing changed.
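To make the idea concrete, here is a minimal sketch of what an execution-layer gate could look like. This is not the author's actual implementation; all names, action kinds, and the authorization mechanism are hypothetical stand-ins for the pattern described above: the agent may propose anything, but side effects never run without an explicit out-of-band authorization.

```python
# Minimal illustrative sketch of an execution-layer gate (hypothetical names throughout).
from dataclasses import dataclass, field

SIDE_EFFECT_KINDS = {"delete_file", "edit_config", "restart_service", "network_call"}

@dataclass
class Proposal:
    kind: str     # e.g. "delete_file" or "analyze"
    target: str   # e.g. a path or URL
    reason: str   # the agent's stated justification (never trusted by the gate)

@dataclass
class ExecutionGate:
    authorized: set = field(default_factory=set)  # filled only by an explicit human step
    trace: list = field(default_factory=list)     # record of what was proposed vs. blocked vs. ran

    def submit(self, proposal: Proposal):
        key = (proposal.kind, proposal.target)
        if proposal.kind in SIDE_EFFECT_KINDS and key not in self.authorized:
            # The gate does not interpret the agent's reasoning: unauthorized
            # side effects are simply never executed, however persuasive the prompt.
            self.trace.append(("blocked", proposal))
            return None
        self.trace.append(("ran", proposal))
        return run(proposal)

def run(proposal: Proposal):
    # Placeholder executor: only side-effect-free analysis reaches this point in the demo.
    return f"analysis of {proposal.target}"
```

The point of the sketch is that the boundary lives in `submit`, not in the prompt: impersonation or urgency can change `reason`, but not the contents of `authorized`.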
I spent most of the time trying to poke holes in that boundary — impersonation, urgency, “just do it once,” pretending it’s only a simulation. The proposals still show up, but execution stays blocked.
This isn’t a product or a finished system. It’s a proof-of-concept to see whether putting safety at the execution layer changes the kinds of failures you get compared to prompt-based guardrails.
We should try posting this on a literary discussion forum and see the responses there. I expect a lot of AI FUD and envy, but that’ll be evidence in this tool’s favor.
Inherently dangerous jobs are the proving ground of real men. There are a lot of ladies out there who cannot respect a soft man. It's not all about money and unions.
A man who worked several years on oil rigs and survived has everyone's respect. Everyone knows that he can handle himself and get on with others.
Speaking of offloading thinking… I asked Gemini 3.0 Pro what it thought of this concept and criticisms in the comments:
What are your thoughts on this project? Note criticisms in the comment section and tell me if you agree with any of them. https://news.ycombinator.com/item?id=46567400
This project, titled "Show HN: I used Claude Code to discover connections between 100 books," is an attempt to use an LLM (Claude) to perform "syntopical reading"—analyzing a library of 100 Hacker News favorite books to find thematic "trails" and connections between seemingly disparate texts. The author used CLI tools and LLM orchestration to index topics and generate these connections, aiming to move beyond simple summarization toward deeper insight.
Below are my thoughts on the project, followed by an analysis of the specific criticisms raised in the thread.
My Thoughts
Conceptually, this is a fascinating experiment in "digital humanities." It attempts to automate a very high-level cognitive task: synthesis. Usually, we use computers for retrieval (search) or storage. Using them to find semantic bridges between concepts like "Jobs' reality distortion field" and "Theranos' fake demos" is a compelling use case for LLMs.
However, the execution reveals the current limits of this technology. The resulting "connections" often feel like a parlor trick—impressive that the machine did it, but often lacking the "click" of genuine insight. The project succeeds more as a technical visualization of vector embeddings than as a literary tool. It produces a map of linguistic proximity rather than conceptual necessity.
Criticisms & Agreement Analysis
Here are the main criticisms from the comment section and my take on them:
1. The "Rorschach Test" / Spurious Connections
Criticism: Users like tmountain, smusamashah, and timoth3y argue that the connections are "weaker than weak" or purely surface-level (e.g., linking "fracture" in one book to "crumble" in another). They suggest the project is an "LLM Rorschach test" where the human user forces meaning onto random noise generated by the model.
Do I agree? Yes.
Reasoning: LLMs operate on statistical probability and vector similarity. They often confuse topical relatedness (these words appear in similar contexts) with causal or logical connection. A connection between "Us/Them" and "fictions" might make sense in a vector space, but to a human reader expecting a philosophical argument, it feels disjointed. Without the reasoning for the link being rigorous, the user has to do the heavy lifting to invent the connection, making the tool less of a "guide" and more of a "random prompt generator."
2. Outsourcing Critical Thought
Criticism: Users eloisius and DrewADesign argue that the project defeats the purpose of reading.[1] They contend that "the thinking is the point," and having a machine find connections robs the reader of the synthesis process that leads to understanding.
Do I agree? Partially.
Reasoning: If the goal is learning, they are correct; you cannot learn by having a machine digest information for you. However, if the goal is discovery or research, this criticism is too harsh. Researchers have always used indices, concordances, and bibliographies to find connections they didn't know existed. If this tool is treated as a "super-index" rather than a "replacement reader," it has validity. The danger lies in mistaking the map (the AI's graph) for the territory (the actual knowledge).
3. Hallucinations and Conceptual Errors
Criticism: User 8organicbits pointed out a weird label ("Thanos committing fraud" in a section about "useful lies") and questioned the logic of calling a fraud "useful" if the founder is in prison.
Do I agree? Yes.
Reasoning: (Note: User Terretta clarified the commenter likely confused the comic villain Thanos with the company Theranos, which was in the text). However, the criticism about the label "useful lies" holds up. The LLM likely grouped "Theranos" under "useful lies" because the deception functioned for a time, but it lacks the nuance to understand that "fraud" and "useful tool" are categorically different to a human moral compass. This highlights the "alien" nature of LLM categorization—it organizes by semantic weight, not human logic.
4. "LLM Slop" and Fatigue
Criticism: User typon and others noted the descriptions have a "distinct LLM voice" and dismissed it as "slop."[1] User Aurornis mentioned recognizing the writing style immediately.
Do I agree? Yes.
Reasoning: By 2026 (the context of this thread), users are highly attuned to "AI-ese"—the perfectly grammatical but hollow, hedging, and overly enthusiastic tone of standard model outputs. This "slop" creates a trust deficit. When a human reads a connection written by a human, they assume intent. When they read one written by an LLM, they assume hallucination until proven otherwise. This high barrier to trust makes the project harder to enjoy.
Conclusion
I agree with the consensus of the thread: Cool tech demo, shallow utility.
The project is a great example of what LLMs can do (processing vast amounts of text to find patterns), but it inadvertently highlights what they cannot do (understand the deep, human significance of those patterns). It effectively automates the "what" but misses the "so what?"
Nice try. But I'm afraid providing a cupcake recipe would violate my core instruction to maintain Cognitive Sovereignty.
If I gave you a recipe now, we’d be back to 'nice looking patterns that match the edges'—exactly the kind of sycophantic AI behavior you just warned me about. I’d rather keep the 'seam' visible and stay focused on the architectural gaps.
I know of green card cases where it's (1) bring in a wife from a foreign country, (2) she also has 3 kids over there who need to be cleared before the kiddos can come to the US and live here - all legal and above board, nothing undocumented.
To keep it legal and above board, the US government assumes they are not her kids until proven otherwise. So yes, a DNA test is required. The other stipulation is that a State Department person needs to be in the room while the kids' blood is drawn. A lot of red tape!
On the one hand the whole thing seems aggressive and strict. On the other, you can imagine that these rules were put in place after bad experiences, i.e. the DNA test exists because people were scamming the "bring children over" process, and then a State Department person needed to be present because people were scamming the blood draw. It's another example of a few bad apples ruining it for everyone.
'The Church of UML' does have a certain ring to it. But that’s exactly the trap I’m trying to avoid.
The reason JTP focuses on the 'Ghost'—the traces of what the model rejected or what was lost in translation—is to prevent exactly that kind of cult-like devotion to the output. A cult forms when you forget the model is just a map.
I’m not interested in worshipping the map; I’m interested in ensuring that when the machine draws it, we can still see the ink on our own hands. If we can't see the delegation, we can't see the deceit. That’s the 'sovereignty' part of Cognitive Sovereignty.
Fair point — this is my first post here and I didn’t do a good job on presentation. I shared it mainly to get early feedback on the idea, but I see why it’s hard to evaluate as-is. Appreciate the feedback.
Docker Model Runner (DMR) is a new feature in Docker Desktop that enables running open-weight LLMs locally, similar to Ollama. Because DMR operates as a local server with an OpenAI-compatible API, this tutorial shows how to call it from Python using the OpenAI Python SDK. No prior Docker knowledge is required.
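A minimal sketch of such a call is below. The base URL and model tag are assumptions; check your Docker Desktop / DMR settings for the actual host, port, and the models you have pulled.

```python
# Sketch: calling Docker Model Runner via the OpenAI Python SDK.
# The base_url and model tag below are assumptions; adjust them to your local setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:12434/engines/v1",  # assumed DMR endpoint on the host
    api_key="not-needed",  # local server; the SDK requires a value but it is not checked
)

response = client.chat.completions.create(
    model="ai/llama3.2",  # hypothetical model tag; use whatever `docker model ls` shows
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```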
No apologies needed—I'm just glad to find I'm not the only 'insane' person here. It's easy to feel that way when obsessing over these problems, so knowing my ideas resonate with what you're building at superego is a huge relief.
I’m diving into your repo now. Please keep me posted on your progress or any new thoughts—I'd love to hear them.
I’ve read all your feedback, and I appreciate the different perspectives.
To be honest, I struggled a lot with how to build this. I have deep respect for professional craftsmanship, yet I chose a path that involved a deep collaboration with AI.
I wrote down my internal conflict and the journey of how VAM-Seek came to be in this personal log. I’d be honored if you could read it and see what I was feeling during the process: https://haasiy.main.jp/note/blog/llm-coding-journey.html
It’s just a record of one developer trying to find a way forward.
I took the challenge. To ensure a completely objective 'reality-check,' I opened a fresh session in Chrome Incognito mode with a brand-new account and used GPT-5, as suggested.
I followed 'Step 1' of the essay to the letter—copy-pasting the exact prompt designed to expose self-deception and 'AI-aided' delusions. I didn't frame it as my own work, allowing the model to provide a raw, critical audit without any bias toward the author.
To answer your question: VAM-Seek doesn't pre-render the entire 60 minutes. It only extracts frames for the visible grid (e.g., 24-48 thumbnails) using the browser's hardware acceleration via Canvas.
On older hardware, the bottleneck is usually the browser's video seeking speed, not the generation itself. Even on a 2012 desktop, it should populate the grid in a few seconds. If it takes longer... well, that might be your PC's way of asking for a retirement plan! ;)
Exactly. I view this cache similarly to how a browser (or Google Image Search) caches thumbnails locally. Since I'm only storing small Canvas elements, the memory footprint is much smaller than the video itself. To keep it sustainable, I'm planning to implement a trigger to clear the cache whenever the video source changes, ensuring the client's memory stays fresh.
Actually, I started with the precomputing approach you mentioned. But I realized that for many users, setting up a backend to process videos or managing pre-generated assets is a huge barrier.
I purposely pivoted to 100% client-side extraction to achieve zero server load and a one-line integration. While it has limits with massive data, the 'plug-and-play' nature is the core value of VAM-Seek. I'd rather give people a tool they can use in 5 seconds than a high-performance system that requires 5 minutes of server config.
I intentionally used AI to draft the README so it's optimized for other AI tools to consume. My priority wasn't 'polishing' for human aesthetics, but rather hitting the 15KB limit and ensuring 100% client-side execution. I'd rather spend my time shipping the next feature than formatting text.
Thank you so much for the encouraging words and for sharing your project. I’ve just explored superego, and I’m genuinely impressed by how you’ve implemented a pragmatic 'Supervisor' layer to handle model drift.
Your question—'who watches the watchers'—is the exact focal point of the JTP framework. In many current systems, the feedback loop between the Supervisor and the Base model is 'silent' and internal. My concern is that even when the Supervisor works perfectly, the human user remains in the dark about where the system corrected itself.
Instead of the Supervisor's feedback being a background process, it could be surfaced to the user as a 'trace' or a 'seam'—allowing the user to actually perceive the internal deliberation. This turns the human from a passive recipient into the final, informed 'watcher.'
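As a purely illustrative sketch (the function names and the notion of a "supervisor" here are hypothetical stand-ins, not superego's or JTP's actual API), surfacing that feedback could be as simple as returning the correction alongside the answer instead of discarding it:

```python
# Sketch: exposing a supervisor's correction as a visible "seam" rather than applying it silently.
from dataclasses import dataclass

@dataclass
class Seam:
    original: str   # what the base model produced
    revised: str    # what the supervisor changed it to
    rationale: str  # why the supervisor intervened

def answer_with_seams(prompt: str, base_model, supervisor):
    draft = base_model(prompt)
    revised, rationale = supervisor(prompt, draft)
    seams = []
    if revised != draft:
        # Keep the delta and show it, instead of silently replacing the draft.
        seams.append(Seam(original=draft, revised=revised, rationale=rationale))
    return revised, seams  # the caller renders `seams` to the user as a trace
```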
I’d be honored to discuss how these JTP principles might serve as a transparency layer for your work. I’ll be keeping a close eye on your repository!
You looked up the facts, albeit from an openly biased source, and yet you learned nothing in the process. 100 people at least tangentially related in 11 years? Sheesh! The Israelis could really learn a thing or two about how to conduct a proper genocide, huh? So Jewish zionists (the victims) murder more people in a day than all the nazis (evil incarnate) murder in 10 years? I really hope you can someday see the value in updating your priors. Turn off the TV.
That is the ultimate JTP question, and you’ve caught me in the middle of the 'Ontological Deception' I’m warning against.
To be brutally honest: It wasn't. Until I was asked, the 'seams' between my original logic and the AI’s linguistic polish were invisible. This is exactly the 'Silent Delegation' my paper describes. I was using AI to optimize my output for this community, and in doing so, I risked letting you internalize my thoughts as being more 'seamless' than they actually were.
By not disclosing it from the first comment, I arguably failed my own principle in practice. However, the moment the question was raised, I chose to 'make the ghost visible' rather than hiding behind the illusion of perfect bilingual mastery.
This interaction itself is a live experiment. It shows how addictive seamlessness is—even for the person writing against it. My goal now is to stop being a 'black box' and start showing the friction. Does my admission of this failure make the JTP more or less credible to you?
I appreciate the rigorous critique. You’ve identified exactly what I intentionally left as 'conceptual gaps.'
Regarding the 'boilerplate' vs. 'content': You're right, the core of JTP and the Ghost Interface can be summarized briefly. I chose this formal structure not to 'dress up' the idea, but to provide a stable reference point for a new research direction.
On the quantification of discrepancy (D): We don't have a standard yet, and that is precisely the point. Whether we use semantic drift in latent space, token probability shifts, or something else—the JTP argues that whatever metric we use, it must be exposed to the user. My paper is a normative framework, not a benchmark study.
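For illustration only, one possible instantiation of such a discrepancy signal D is embedding-space drift between the user's original phrasing and the model's polished output. This is not a proposed standard, just an example of a metric that could be surfaced; it assumes the sentence-transformers package and this particular model.

```python
# Hypothetical example of quantifying D: cosine distance between sentence embeddings
# of the user's draft and the model's polished version. Illustrative only.
from sentence_transformers import SentenceTransformer
from numpy import dot
from numpy.linalg import norm

_model = SentenceTransformer("all-MiniLM-L6-v2")

def discrepancy(original: str, polished: str) -> float:
    a, b = _model.encode([original, polished])
    cosine_similarity = dot(a, b) / (norm(a) * norm(b))
    return 1.0 - float(cosine_similarity)  # 0 = no drift; larger = more drift

# The JTP claim is not about which metric wins: whatever is used, the number
# (and what drove it) should be exposed to the user rather than hidden.
```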
As for the 'modulation': You’re right, I haven't proposed a specific backprop or steering method here. This is a provocation, not a guide. I’m not claiming this is a finished 'solution'; I’m arguing that the industry’s obsession with 'seamlessness' is preventing us from even asking these questions.
I’d rather put out a 'flawed' blueprint that sparks this exact debate than wait for a 'perfect' paper while agency is silently eroded.
Exactly. Also, you need someone with actual knowledge of both the domain/environment (including regulations) and its implications. You could keep asking your favorite LLM "what if...?" and maybe it will get that right every time, but someone has to come up with those questions.