Today I tested multiple Hugging Face models and finally stood up a free version of my Arynwood Robot — currently named Veylan. This post documents where I hit walls with local installs, where I found success, and what remains to be styled and refined.
I started the day by attempting to install `phi-4` locally. The model was just released by Microsoft and looks promising in benchmarks; however, my 2016 MacBook Air simply couldn't handle it. With no dedicated GPU and very little memory to spare, I hit errors and slowdowns almost immediately when loading the weights via `transformers`.
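For reference, this is roughly the load I was attempting. A minimal sketch, assuming the `microsoft/phi-4` hub ID and default `transformers` settings; the point is that without a GPU, the full weights have to fit in system RAM.

```python
# Minimal sketch of the local load attempt (hub ID assumed).
# On a 2016 MacBook Air the ~14B-parameter weights dwarf
# available RAM, which is where things fell over for me.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```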
After abandoning the local route, I pivoted to trying a series of Hugging Face-hosted models. I attempted to stand up Spaces using `microsoft/phi-2`, `mistralai/Mistral-7B-Instruct`, and `HuggingFaceH4/zephyr-7b-beta`. However, most of these either required paid inference endpoints, exceeded free-tier compute limits, or returned "no API found" errors because the model was incompatible with static CPU deployments.
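In hindsight, I could have saved some trial and error by checking each model's hosted-inference status up front. A rough sketch, assuming `huggingface_hub`'s `InferenceClient.get_model_status` helper works the way I've seen it documented:

```python
from huggingface_hub import InferenceClient

client = InferenceClient()
for model_id in [
    "microsoft/phi-2",
    "mistralai/Mistral-7B-Instruct",
    "HuggingFaceH4/zephyr-7b-beta",
]:
    try:
        status = client.get_model_status(model_id)
        # state is e.g. "Loadable", "Loaded", or "TooBig" for models
        # that exceed the free serverless tier.
        print(model_id, status.state)
    except Exception as err:
        print(model_id, "no hosted inference:", err)
```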
I finally had success using `HuggingFaceH4/zephyr-7b-beta` via the `InferenceClient` with Gradio's `ChatInterface`. This setup doesn't require API keys as long as the model supports hosted inference. I created a new Space named veylan and used a simplified `app.py` that streams responses.
```python
from huggingface_hub import InferenceClient

client = InferenceClient("HuggingFaceH4/zephyr-7b-beta")
```
The assistant currently accepts a user-defined system message and has sliders for `max_tokens`, `temperature`, and `top_p`. It's clean, fast, and feels close to the goal.
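For anyone who wants to replicate it, here's the shape of the whole `app.py`. A minimal sketch along the lines of the stock Hugging Face chat Space template; the default system message, slider ranges, and labels are my own placeholders, not the exact values in the Veylan Space.

```python
import gradio as gr
from huggingface_hub import InferenceClient

client = InferenceClient("HuggingFaceH4/zephyr-7b-beta")

def respond(message, history, system_message, max_tokens, temperature, top_p):
    # Rebuild the chat history in the messages format the API expects.
    messages = [{"role": "system", "content": system_message}]
    for user_msg, bot_msg in history:
        if user_msg:
            messages.append({"role": "user", "content": user_msg})
        if bot_msg:
            messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": message})

    # Stream tokens back so the reply renders as it is generated.
    response = ""
    for chunk in client.chat_completion(
        messages,
        max_tokens=max_tokens,
        stream=True,
        temperature=temperature,
        top_p=top_p,
    ):
        response += chunk.choices[0].delta.content or ""
        yield response

demo = gr.ChatInterface(
    respond,
    additional_inputs=[
        gr.Textbox(value="You are Veylan, a helpful assistant.", label="System message"),
        gr.Slider(minimum=1, maximum=2048, value=512, step=1, label="max_tokens"),
        gr.Slider(minimum=0.1, maximum=2.0, value=0.7, step=0.1, label="temperature"),
        gr.Slider(minimum=0.1, maximum=1.0, value=0.95, step=0.05, label="top_p"),
    ],
)

if __name__ == "__main__":
    demo.launch()
```

The generator-style `respond` function is what makes the streaming work: `ChatInterface` treats each `yield` as a progressively longer version of the reply.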
Next steps: finish styling the interface, refine the assistant voice, and eventually explore hosting an actual open-source model myself if I ever get better hardware. For now, the project lives in the cloud and is fully functional — no API keys, no fees.
Here's a link: Visit Veylan at Arynwood.com/robot