DevLog 250424 – Hugging Face Experiments

> Log Date: 2025-04-24

Today I tested multiple Hugging Face models and finally stood up a free version of my Arynwood Robot — currently named Veylan. This post documents where I hit walls with local installs, where I found success, and what remains to be styled and refined.

I started the day by attempting to install Phi-4 locally. Microsoft released this model recently and it benchmarks well, but my 2016 MacBook Air simply couldn't handle it: no dedicated GPU, and no VRAM to speak of. I hit errors and slowdowns almost immediately when loading the weights via transformers.
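
For reference, the attempt looked more or less like this (microsoft/phi-4 is the model ID on the Hub; the exact arguments varied between tries):

```python
# Roughly what I was running when the Air gave up.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-4"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# With no CUDA device, the weights land in system RAM at full precision.
# A roughly 14B-parameter model is far more than this machine can hold.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
```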

After abandoning the local route, I pivoted to trying a series of Hugging Face-hosted models. I attempted to stand up Spaces using microsoft/phi-2, mistralai/Mistral-7B-Instruct, and HuggingFaceH4/zephyr-7b-beta. However, most of these either required paid inference endpoints, exceeded free-tier compute limits, or returned “no API found” errors because the model was incompatible with static CPU deployments.

Current Implementation

I finally had success by using HuggingFaceH4/zephyr-7b-beta via the InferenceClient with Gradio’s ChatInterface. This setup doesn’t require API keys as long as the model supports hosted inference. I created a new Space named veylan, and used a simplified app.py that streams responses.

```python
from huggingface_hub import InferenceClient

client = InferenceClient("HuggingFaceH4/zephyr-7b-beta")
```

The assistant currently accepts a user-defined system message and has sliders for max_tokens, temperature, and top_p. It's clean, fast, and feels close to the goal.
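
For anyone curious, the heart of app.py looks roughly like this. The system prompt and slider defaults below are placeholders rather than the exact values in the Space:

```python
import gradio as gr
from huggingface_hub import InferenceClient

client = InferenceClient("HuggingFaceH4/zephyr-7b-beta")


def respond(message, history, system_message, max_tokens, temperature, top_p):
    # Rebuild the conversation in chat-completion format; ChatInterface
    # hands us history as (user, assistant) pairs by default.
    messages = [{"role": "system", "content": system_message}]
    for user_msg, bot_msg in history:
        if user_msg:
            messages.append({"role": "user", "content": user_msg})
        if bot_msg:
            messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": message})

    # Stream tokens back to the UI as they arrive.
    response = ""
    for chunk in client.chat_completion(
        messages,
        max_tokens=max_tokens,
        stream=True,
        temperature=temperature,
        top_p=top_p,
    ):
        response += chunk.choices[0].delta.content or ""
        yield response


demo = gr.ChatInterface(
    respond,
    additional_inputs=[
        gr.Textbox(value="You are Veylan, the Arynwood robot.", label="System message"),
        gr.Slider(minimum=1, maximum=2048, value=512, step=1, label="max_tokens"),
        gr.Slider(minimum=0.1, maximum=2.0, value=0.7, step=0.1, label="temperature"),
        gr.Slider(minimum=0.1, maximum=1.0, value=0.95, step=0.05, label="top_p"),
    ],
)

if __name__ == "__main__":
    demo.launch()
```

Yielding the growing string from respond is what makes Gradio render the reply as a live stream instead of waiting for the whole completion.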


What’s Left

Next steps: finish styling the interface, refine the assistant voice, and eventually explore hosting an actual open-source model myself if I ever get better hardware. For now, the project lives in the cloud and is fully functional — no API keys, no fees.


Here's a link: Visit Veylan at Arynwood.com/robot