Issue #7 · April 28, 2026

Issue #7 of sumocat -- this week's AI repos worth trying.

TL;DR

This week: a fine-tuning launcher, a browser automation agent, two eval/observability tools, and a repo-to-context packer. All five are cloneable and runnable this week with no new frameworks to learn.

  • hiyouga/LlamaFactory -- run QLoRA or DPO on Llama 3, Qwen3, or DeepSeek from a single YAML with no training loop to write
  • browser-use/browser-use -- give an OpenAI or Anthropic model a natural-language task and have it navigate real websites via Playwright
  • langfuse/langfuse -- self-host LLM tracing, prompt versioning, and evals in one docker compose instead of paying for LangSmith
  • promptfoo/promptfoo -- run prompt regression tests and jailbreak scans locally before your next deployment
  • yamadashy/repomix -- pack an entire git repo into a single token-counted file for Claude or ChatGPT context

This Week's Picks

1. hiyouga/LlamaFactory: unified CLI and web UI for fine-tuning 100+ LLMs via LoRA, QLoRA, DPO, PPO, or full fine-tuning without writing training code

hiyouga/LlamaFactory -- 70,711 stars

Covers SFT, DPO, KTO, PPO, and reward modeling under one CLI and Gradio UI, so you stop stitching together Axolotl for supervised fine-tuning and TRL for RLHF separately. It wraps both Unsloth and TRL under the hood, so you get their speed and flexibility without wiring them yourself.

Who should try it: You if you need a working QLoRA or DPO run on Llama 3, Qwen3, DeepSeek, or Gemma against your own dataset this week without hand-rolling a TRL training loop.

Try it:

git clone https://github.com/hiyouga/LLaMA-Factory.git && cd LLaMA-Factory && pip install -e . && llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
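
For reference, the kind of YAML that command consumes looks roughly like this -- a sketch modeled on the repo's examples/train_lora configs, so exact keys and defaults may differ in your checkout:

```yaml
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: alpaca_en_demo
template: llama3
cutoff_len: 2048

### train
output_dir: saves/llama3-8b/lora/sft
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
```

Swapping `stage: sft` for `dpo` and pointing `dataset` at a preference dataset is the whole difference between a supervised and a DPO run, which is the point of the tool.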

Honest caveat: Documentation is marked WIP on ReadTheDocs, so for anything beyond the example YAML configs you will be reading source code or Discord threads.

Ship window: this week


2. browser-use/browser-use: Python library that wraps Playwright with an LLM agent loop so a model can navigate, click, fill forms, and extract data from real websites

browser-use/browser-use -- 90,880 stars

Replaces fragile CSS selector scripts with LLM reasoning over an indexed DOM, handling element extraction and retry logic out of the box -- the part that makes raw Playwright + LangChain browser tools break on dynamic pages. Compared to Skyvern, it runs fully in your own Python process with no external service dependency.
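
The "indexed DOM" idea is worth seeing concretely. Here is a hypothetical stdlib-only simplification -- not browser-use's actual code -- of how interactive elements get numbered so an LLM can say "click element 1" instead of emitting a CSS selector:

```python
from html.parser import HTMLParser

class InteractiveIndexer(HTMLParser):
    """Collect clickable/fillable elements and number them, roughly how an
    agent loop exposes a page to the model (toy simplification)."""
    INTERACTIVE = {"a", "button", "input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.elements = []  # (index, tag, attrs) tuples the LLM can refer to

    def handle_starttag(self, tag, attrs):
        if tag in self.INTERACTIVE:
            self.elements.append((len(self.elements), tag, dict(attrs)))

html = '<form><input name="q"><button>Go</button></form><a href="/x">link</a>'
idx = InteractiveIndexer()
idx.feed(html)
for i, tag, attrs in idx.elements:
    print(i, tag, attrs)
```

The real library does this against a live Playwright page with visibility checks and retry logic, but the contract with the model is the same: a numbered menu of actions instead of a raw DOM.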

Who should try it: You if you are building an automation or agent pipeline in Python that needs to interact with sites lacking APIs -- form submissions, scraping behind login walls, or multi-step web workflows -- using OpenAI, Anthropic, or Gemini models.

Try it:

pip install browser-use && playwright install chromium

Honest caveat: LLM-driven browser agents are still unreliable on complex multi-step tasks without supervision; success rates in their own benchmark drop noticeably with models below the GPT-4o/Claude class, so plan for retry logic and human-in-the-loop fallback in production.

Ship window: this week


3. langfuse/langfuse: self-hostable backend that captures LLM traces in ClickHouse and provides a UI for prompt versioning, evals, datasets, and a prompt playground

langfuse/langfuse -- 26,231 stars

MIT-licensed and self-hosted, so your traces stay on your own infra instead of LangSmith's cloud, and the core costs nothing to run. The prompt versioning and management UI alone replaces the ad-hoc .txt files and spreadsheets most teams use alongside LangChain or the OpenAI SDK in production.

Who should try it: You if you are running the OpenAI SDK, LiteLLM, or LangChain in production and currently have zero visibility into which prompts are failing, what latency looks like per trace, or how evals trend over time.

Try it:

git clone https://github.com/langfuse/langfuse.git && cd langfuse && docker compose up

Honest caveat: Self-hosted setup pulls in ClickHouse, Postgres, and Redis via Docker Compose -- not ideal if you are on a small VPS or want a minimal single-container footprint.

Ship window: this week


4. promptfoo/promptfoo: CLI that runs automated test suites against LLM prompts and scans them for security vulnerabilities via red-teaming attacks

promptfoo/promptfoo -- 20,665 stars

Runs entirely locally so prompts never leave your machine, and covers both prompt regression evals and jailbreak/injection scanning in one YAML config -- instead of using Braintrust for evals and Garak for red-teaming separately. It wires into CI/CD and gets you a working test suite in under 30 minutes without Python boilerplate.
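
The eval side of that YAML looks roughly like this -- a hedged sketch following the documented promptfooconfig.yaml shape, with an illustrative model id and assertions you would swap for your own (red-team scans are configured in the same file under a separate redteam section):

```yaml
prompts:
  - "Summarize this support ticket in one sentence: {{ticket}}"

providers:
  - openai:gpt-4o-mini

tests:
  - vars:
      ticket: "My invoice from March was charged twice."
    assert:
      - type: contains
        value: "invoice"
      - type: llm-rubric
        value: "Is a single, factually accurate sentence"
```

Each prompt change then becomes a diff against a fixed test matrix rather than a vibe check in the playground.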

Who should try it: You if you are shipping a GPT-4o or Claude-based app and need regression tests on prompt changes before they hit production, or need to run injection scans before a security review.

Try it:

export OPENAI_API_KEY=<your-openai-key> && npx promptfoo@latest init --example getting-started && cd getting-started && npx promptfoo@latest eval && npx promptfoo@latest view

Honest caveat: Now owned by OpenAI, which may affect long-term neutrality for teams building on competing providers like Anthropic or Google.

Ship window: this week


5. yamadashy/repomix: CLI that packs an entire git repo into a single XML, Markdown, or plain-text file for LLM context windows, with token counting and secrets scanning built in

yamadashy/repomix -- 24,036 stars

Adds token counts per file, Secretlint scanning, and a --compress flag that uses tree-sitter to strip boilerplate before you paste into Claude, ChatGPT, or Gemini -- the parts that Simon Willison's files-to-prompt and tools like gitingest skip. It is .gitignore-aware so you are not leaking node_modules or .env into your context window.
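
The core pack-everything-into-one-file idea is simple enough to sketch with the stdlib -- a hypothetical toy, nowhere near repomix's real feature set (no .gitignore parsing, token counting, or secrets scanning):

```python
import os
import tempfile

SKIP_DIRS = {".git", "node_modules", "__pycache__"}

def pack_repo(root: str) -> str:
    """Toy repo packer: concatenate readable files under a path header."""
    parts = []
    for dirpath, dirnames, filenames in os.walk(root):
        # prune ignored directories in place so os.walk never descends into them
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    text = f.read()
            except (UnicodeDecodeError, OSError):
                continue  # skip binaries and unreadable files
            rel = os.path.relpath(path, root)
            parts.append(f"## File: {rel}\n\n{text}")
    return "\n".join(parts)

# demo on a throwaway directory
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "node_modules"))
with open(os.path.join(root, "main.py"), "w") as f:
    f.write("print('hi')\n")
with open(os.path.join(root, "node_modules", "dep.js"), "w") as f:
    f.write("module.exports = 1\n")
packed = pack_repo(root)
print(packed)
```

The value of the real tool is everything layered on top of this loop: knowing how many tokens each file costs you, and refusing to pack the .env file you forgot about.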

Who should try it: You if you regularly paste codebases into Claude or Gemini for code review or onboarding and currently do it with ad-hoc shell scripts or manual file copying.

Try it:

npx repomix@latest

Honest caveat: The --compress tree-sitter parsing is language-dependent and may silently fall back to uncompressed output for unsupported languages.

Ship window: this week


Get the next issue

Sharp insights from AI research. Every week. No fluff.