# pydantic-ai-mlx

MLX local inference for Pydantic AI, through LM Studio or mlx-lm directly.

Run MLX-compatible HuggingFace models locally on Apple silicon with Pydantic AI. Two backends are provided:

- **LM Studio backend**: an OpenAI-compatible server that can also use mlx-lm; the model runs in a separate background process.
- **mlx-lm backend**: direct integration with Apple's library; the model runs within your application (experimental support).

**Still in development, not recommended for production use yet. Contributions are welcome!**

## Features

- [x] LM Studio backend, should be fully supported
- [x] Streaming text support for mlx-lm backend
- [ ] Tool calling support for mlx-lm backend

Apple's MLX appears to be more performant on Apple silicon than llama.cpp (used by Ollama), as of January 2025.

## Installation

```bash
uv add pydantic-ai-mlx
```

## Usage

### LM Studio backend

```python
from pydantic_ai import Agent
from pydantic_ai.messages import ModelMessage

from pydantic_ai_lm_studio import LMStudioModel

model = LMStudioModel(model_name="mlx-community/Qwen2.5-7B-Instruct-4bit")  # supports tool calling
agent = Agent(model, system_prompt="You are a chatbot.")


async def stream_response(user_prompt: str, message_history: list[ModelMessage]):
    async with agent.run_stream(user_prompt, message_history=message_history) as result:
        async for message in result.stream():
            yield message
```

### mlx-lm backend

```python
from pydantic_ai import Agent
from pydantic_ai.messages import ModelMessage

from pydantic_ai_mlx_lm import MLXModel

model = MLXModel(model_name="mlx-community/Llama-3.2-3B-Instruct-4bit")
# For other supported models, see
# https://github.com/ml-explore/mlx-examples/blob/main/llms/README.md#supported-models
# and https://huggingface.co/mlx-community
agent = Agent(model, system_prompt="You are a chatbot.")


async def stream_response(user_prompt: str, message_history: list[ModelMessage]):
    async with agent.run_stream(user_prompt, message_history=message_history) as result:
        async for message in result.stream():
            yield message
```
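Since `stream_response` above is an async generator, it has to be consumed from an async context. Below is a minimal sketch of a caller, assuming either backend's `stream_response` is in scope; the prompt and empty history are just placeholders:

```python
import asyncio

from pydantic_ai.messages import ModelMessage


async def main() -> None:
    history: list[ModelMessage] = []  # previous turns; empty for a fresh conversation
    async for chunk in stream_response("Write a haiku about Apple silicon.", history):
        # Each item is the response text streamed so far
        print(chunk)


asyncio.run(main())
```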
Latest version: 0.2.5 (released 2025-03-09)