Microsoft's SkillOpt framework optimizes AI agent skills without changing model weights
Microsoft's open-source SkillOpt framework helps AI agents adapt to new domains by optimizing their skills without changing the underlying model weights.

Agent skills have become a crucial part of real-world AI applications, providing a mechanism for models to adapt to specific enterprise use cases and complex workflows. These skills are typically stored as text documents and inserted into the agent's context before execution. However, optimizing these skills is a slow and faulty process, as they cannot be trained in the same way as the parameters of the underlying AI model.
Microsoft has developed an open-source framework called SkillOpt, which introduces an optimizer designed for agent skills. It turns the agent's skill document into a trainable object that evolves based on performance feedback. SkillOpt uses deep-learning-style optimization to make it possible for the AI to systematically explore modifications to the document and find the best combination of instructions.
The framework has been tested on various industry benchmarks, outperforming existing baselines and significantly boosting accuracy for models like GPT-5.5 and Qwen. According to Yifan Yang, Senior Research SDE at Microsoft Research Asia, the problem is not making changes, but ensuring those changes are mathematically sound. SkillOpt optimizes a text document through an iterative propose-and-test loop that separates the model executing the tasks from the model optimizing the skill.
The process unfolds in several steps: SkillOpt starts with an initial skill document and a frozen target model, where the target model runs a batch of tasks to generate execution trajectories that act as the evidence for the current step. The creators note that “the deep-learning analogy is operational rather than decorative,” helping the framework avoid the instability issues associated with other optimization techniques. SkillOpt directly addresses the problem of treating text as a trainable object by importing mathematical concepts from deep learning.
To evaluate the technique in practice, researchers tested SkillOpt across different models, ranging from large-scale frontier models like GPT-5.5 to smaller closed and open models including GPT-5.4-mini and Qwen3.5-4B. The true value of SkillOpt lies in its portability, efficiency, and compatibility with existing infrastructure. Experiments confirm that the framework is harness-agnostic.
In addition to basic chat, the same optimization loop was successfully integrated into tool-backed execution environments like the Codex CLI and Claude Code with significant gains on industry benchmarks. Developers can train a skill using one execution loop and deploy it in another. For example, a spreadsheet skill trained entirely inside the Codex loop was moved directly into Claude Code and drove a +59.7 point gain over Claude Code's native baseline without any further changes.
Source: VentureBeat