Today’s goal was to determine which open-source LLMs I can run locally using my upgraded Dell T3600 workstation. With 64GB RAM, 12GB VRAM, and 2TB SSD, I’m preparing to light the fire in the Forge and populate it with multiple specialized agents.
This post reviews five top local LLMs that fit my hardware, plus a strategy for how many can run simultaneously. The Forge will eventually host multiple bots, each tailored to a specific domain of knowledge: philosophy, code, news, and beyond.
Pros: Fast, efficient, Apache 2.0 licensed, and solid for general use.
Cons: Needs prompt tuning for better domain alignment.
Pros: Great accuracy, 128K token support, backed by Meta.
Cons: Pushes the 12GB VRAM limit unless quantized (Q4/Q5); see the sizing sketch after this list.
Pros: Blazing fast inference, low memory footprint, math-friendly.
Cons: Less documentation, fewer fine-tuned variants.
Pros: Versatile, actively maintained, low VRAM overhead.
Cons: Mid-tier performance on highly technical tasks.
Pros: Great for creative dialog, responsive and expressive tone.
Cons: Not ideal for factual tasks; its roleplay (RP) focus may bias output.
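Before committing to any of these, a quick back-of-envelope check helps. The sketch below estimates whether a quantized model fits in 12GB of VRAM; the bits-per-weight figures and overhead factor are rough assumptions for illustration, not measured numbers.

```python
# Rough VRAM estimate for a quantized model: parameters * bits-per-weight / 8,
# plus an overhead factor for KV cache and runtime buffers. Illustrative only.

def est_vram_gb(params_b: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Approximate GPU memory (GB) needed to fully offload a quantized model."""
    weight_bytes = params_b * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

if __name__ == "__main__":
    vram_gb = 12
    # Hypothetical configurations: common quant levels on 7B/8B models.
    for label, params_b, bits in [("7B @ Q4", 7, 4.5), ("8B @ Q5", 8, 5.5), ("8B @ FP16", 8, 16)]:
        need = est_vram_gb(params_b, bits)
        verdict = "fits" if need <= vram_gb else "needs partial CPU offload"
        print(f"{label}: ~{need:.1f} GB -> {verdict} on a {vram_gb} GB card")
```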
With 64GB of RAM and 12GB of VRAM, I can confidently run several of these models at once. Using llama.cpp, koboldcpp, or exllama2 backends, the Forge can dynamically allocate agents by role. This allows me to host a live IRC channel where each LLM bot handles a different topic, self-contained and locally run.
I'll begin downloading these models and organizing them within the Forge directory. Each will serve as a unique modular brain in my upcoming IRC setup. A control interface will allow hot-swapping roles on demand.
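One possible shape for that control interface, sketched under the assumption that roles are recorded in a small JSON file inside the Forge directory; the paths and filenames below are hypothetical, not the actual layout.

```python
import json
from pathlib import Path

# Hypothetical Forge layout: models/ holds the model files, roles.json maps
# role names to model files. Re-reading roles.json lets bots pick up swapped
# roles without a restart.
FORGE_DIR = Path("~/forge").expanduser()
ROLES_FILE = FORGE_DIR / "roles.json"

def load_roles() -> dict:
    """Read the current role -> model-file mapping, empty if none exists yet."""
    if not ROLES_FILE.exists():
        return {}
    return json.loads(ROLES_FILE.read_text())

def swap_role(role: str, model_file: str) -> None:
    """Point a role at a different model file and persist the change."""
    FORGE_DIR.mkdir(parents=True, exist_ok=True)
    roles = load_roles()
    roles[role] = model_file
    ROLES_FILE.write_text(json.dumps(roles, indent=2))
    print(f"{role} now served by {model_file}")

if __name__ == "__main__":
    swap_role("news", "models/example-7b-q4.gguf")
```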
View all logs at DevLogs