Today’s goal was to determine which open-source LLMs I can run locally using my upgraded Dell T3600 workstation. With 64GB RAM, 12GB VRAM, and 2TB SSD, I’m preparing to light the fire in the Forge and populate it with multiple specialized agents.
This post reviews five top local LLMs that fit my hardware, plus a strategy for how many can run simultaneously. The Forge will eventually host multiple bots, each tailored to a specific domain of knowledge: philosophy, code, news, and beyond.
Pros: Fast, efficient, Apache 2.0 licensed, and solid for general use.
Cons: Needs prompt tuning for better domain alignment.
Pros: Great accuracy, 128K token support, backed by Meta.
Cons: Pushes the 12GB VRAM limit unless quantized (Q4/Q5); see the sizing sketch after this list.
Pros: Blazing fast inference, low memory footprint, math-friendly.
Cons: Less documentation, fewer fine-tuned variants.
Pros: Versatile, actively maintained, low VRAM overhead.
Cons: Mid-tier performance on highly technical tasks.
Pros: Great for creative dialog, responsive and expressive tone.
Cons: Not ideal for factual tasks; its roleplay (RP) focus may bias output.
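Before committing to any of these, a quick back-of-envelope check helps. The sketch below estimates whether a quantized model fits in 12GB of VRAM; the bits-per-weight figures and overhead factor are rough assumptions for illustration, not measured numbers.

```python
# Rough VRAM estimate for a quantized model: parameters * bits-per-weight / 8,
# plus an overhead factor for KV cache and runtime buffers. Illustrative only.

def est_vram_gb(params_b: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Approximate GPU memory (GB) needed to fully offload a quantized model."""
    weight_bytes = params_b * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

if __name__ == "__main__":
    vram_gb = 12
    # Hypothetical configurations: common quant levels on 7B/8B models.
    for label, params_b, bits in [("7B @ Q4", 7, 4.5), ("8B @ Q5", 8, 5.5), ("8B @ FP16", 8, 16)]:
        need = est_vram_gb(params_b, bits)
        verdict = "fits" if need <= vram_gb else "needs partial CPU offload"
        print(f"{label}: ~{need:.1f} GB -> {verdict} on a {vram_gb} GB card")
```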
With 64GB of RAM and 12GB of VRAM, I can confidently run several of these models at once. Using llama.cpp, koboldcpp, or exllama2 backends, the Forge can dynamically allocate agents by role. This allows me to host a live IRC channel where each LLM bot handles a different topic, self-contained and locally run.
I'll begin downloading these models and organizing them within the Forge directory. Each will serve as a unique modular brain in my upcoming IRC setup. A control interface will allow hot-swapping roles on demand.
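One possible shape for that control interface, sketched under the assumption that roles are recorded in a small JSON file inside the Forge directory; the paths and filenames below are hypothetical, not the actual layout.

```python
import json
from pathlib import Path

# Hypothetical Forge layout: models/ holds the model files, roles.json maps
# role names to model files. Re-reading roles.json lets bots pick up swapped
# roles without a restart.
FORGE_DIR = Path("~/forge").expanduser()
ROLES_FILE = FORGE_DIR / "roles.json"

def load_roles() -> dict:
    """Read the current role -> model-file mapping, empty if none exists yet."""
    if not ROLES_FILE.exists():
        return {}
    return json.loads(ROLES_FILE.read_text())

def swap_role(role: str, model_file: str) -> None:
    """Point a role at a different model file and persist the change."""
    FORGE_DIR.mkdir(parents=True, exist_ok=True)
    roles = load_roles()
    roles[role] = model_file
    ROLES_FILE.write_text(json.dumps(roles, indent=2))
    print(f"{role} now served by {model_file}")

if __name__ == "__main__":
    swap_role("news", "models/example-7b-q4.gguf")
```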
View all logs at DevLogs