
New MRAgent Framework Uses 118K Tokens Per Query, Outperforms LangMem
MRAgent framework reduces token consumption and runtime costs for long-horizon reasoning tasks in AI agents.
Source: AINOVAT

Incomplete data hampers autonomous security agents; experts stress verifying endpoint coverage before deployment.

OpenAI announces limited preview of GPT-5.6 family, including Sol, Terra, and Luna models for various enterprise needs.

Industrialized factories changed how the world produced physical goods: more output, lower costs, faster than anything that came before.

Liquid AI's LFM2.5-230M model outperforms larger models in data extraction and can run on local devices.

OpenAI updates GPT-5.5 Instant model, used in free ChatGPT version, with improved shopping results and complex constraint handling.

Mindstone's Rebel enables enterprises to automatically select the best AI model for each task and subtask, ensuring reliable and secure AI workflows.

Mistral AI releases OCR 4, a document intelligence model that extracts structured representations of documents, complete with bounding boxes and confidence scores.

Alibaba's Qwen team releases Qwen-AgentWorld, two models trained to predict environment returns, boosting agent performance in seven domains.

Xiaomi's HarnessX framework autonomously improves AI software scaffolding, yielding significant performance gains across domains.

Stanford researchers develop agentic AI 'scientists' to streamline drug discovery, reducing inefficiency and high failure rates.

Shopify built an LLM proxy that gives every engineer access to multiple AI providers — with automatic failover when any one of them goes down, changes, or disappears.

Amazon's AGI autonomy lab develops framework for trustworthy AI agents, focusing on consistency, robustness, and safety.

Intuit overhauled its AI infrastructure to support complex tasks, shifting from a multi-agent setup to a granular, skill-and-tool-based architecture.

OpenAI and Broadcom partner on Jalapeño, a custom AI accelerator chip for large language model inference.

Krea releases Krea 2 Raw and Turbo, open-weights AI image models with fast generation speeds, under a custom license.

Anthropic launches Claude Tag, an AI agent that embeds directly inside Slack as a persistent teammate that anyone can delegate work to.

Enterprises struggle to scale AI workloads due to fragile data delivery infrastructure.

Alibaba Cloud on Sunday released HappyHorse 1.1, a major upgrade to its AI video generation model that the company says delivers production-ready video synthesis across core content creation scenarios.

Sakana AI launches Fugu, a multi-agent orchestration system delivering frontier-level AI performance through a single, OpenAI-compatible API.

Organizations need to capture and reuse knowledge gained from daily operations to improve AI-driven decisions.

Self-Harness lets AI agents systematically improve their operating rules, boosting performance up to 60% without relying on human engineers or stronger external models.

As AI inference workloads evolve, GPU availability is no longer the primary bottleneck; instead, context management has become a major challenge.

Your AI agent did exactly what it was designed to do.

Enterprise AI agents often stall in production, requiring human oversight, but hypernetworks may offer a solution by generating task-specific models on demand.

Anthropic announced a potentially game-changing new feature for users of Claude Code on the Claude Team and Enterprise subscription plans: Artifacts.

Arbor framework automates AI-driven research and optimization, outperforming Claude Code and Codex by 2.5x on the same compute budget.

Two AI tools broke in the same way in the same two weeks, and four research teams proved it.

Adobe expands 'creative agent' across Creative Cloud suite and upgrades Firefly AI studio to automate complex production workflows.

AWS launches context intelligence stack with knowledge graph service that improves over time through agent usage.

When Anthropic quietly released Claude Design in April as a "research preview," it generated the kind of instant traction most product teams dream about: more than one million users in its first week.

A 3-billion-parameter language model from Sina Weibo achieves benchmark scores comparable to much larger models, sparking debate over AI benchmarks and scaling laws.

Chinese AI startup Z.ai releases GLM-5.2, a 753-billion-parameter open-weights LLM that outperforms GPT-5.5 on multiple long-horizon coding benchmarks at 1/6th the cost.

Databricks announces two products to unify operational and analytical databases, eliminating latency and performance degradation.

Stanford's DeLM framework enables agents to coordinate directly, reducing multi-agent task costs by 50% without a central controller.

Microsoft CEO Satya Nadella warns AI could concentrate value, commoditize industries, and urges businesses to build proprietary learning loops.

Tokyo-based Sakana AI debuts Sakana Marlin, an autonomous B2B research agent generating in-depth strategy reports.

Organizational leaders are nearly twice as likely to hide their AI use compared to all other employees, at 42% versus 23%, according to new Ivanti research surveying 3,900 employees across six countries.

AI has changed the economics of cyber deception, making it essential for defenders to prioritize truth at machine speed.

US government, citing national security, issues export control directive requiring Anthropic to suspend access to top-tier AI models for foreign nationals.

Moonshot AI releases Kimi K2.7-Code, an open-source update to its K2 coding model family, with claimed performance gains and reduced thinking tokens.

Large language models continue to struggle with hallucinations, presenting a major roadblock for real-world enterprise applications.

NanoClaw and JFrog launch joint security integration to protect AI agents from malicious code injection.

PixelRAG skips text parsing, rendering web pages as screenshots to improve retrieval accuracy and cut AI agent token costs by 10x.

Microsoft's open-source SkillOpt framework helps AI agents adapt to new domains by optimizing their skills without changing the underlying model weights.

Xiaomi's open-source MiMo Code V0.1.0 beats Anthropic's Claude Code on agentic coding benchmarks, especially on 200+ step tasks.

AI teams focus on compute and storage, but neglect network issues that cause performance drops in production.

Google's DiffusionGemma generates 256 tokens in parallel, self-correcting as it goes, with speeds up to 4x faster than standard models on GPUs.

Enterprises struggle to make AI work in the real world, not to experiment with it.

Researchers launch Agents' Last Exam, a benchmark testing AI's ability to execute long-horizon professional workflows, with GPT-5.5 taking top spot.

Sapient's HRM-Text model achieves competitive performance with much larger models at a fraction of the cost and training data.

Anthropic CEO Dario Amodei calls for government regulations on powerful AI models, comparing the industry to commercial aviation.

MassMutual uses 12-month contracts, measures productivity gains, and avoids vendor lock-in to stay agile with AI.

Apple's new Siri AI, unveiled yesterday at Apple's annual Worldwide Developers Conference (WWDC 2026), may look like a consumer product story on the surface.

Cohere releases North Mini Code, an open-source coding agent that runs on a single H100, targeting agentic software engineering and coding pipelines.

On-device AI models have stayed small because the entire weight set has to live in DRAM, capping practical parameter counts well below what server-side deployments use.

Anthropic releases Claude Fable 5 and Claude Mythos 5, its most powerful generally available AI models, with enhanced performance and safeguards.

Researchers develop Harness-1, a 20-billion parameter open-source search agent that surpasses GPT-5.4 in recalling relevant information.

The integration of agentic AI in software engineering has accelerated code generation, but also revealed deeper challenges in defining requirements, integrating complex systems, and maintaining software under real-world conditions.

A software team's experience with a large language model upgrade highlights the challenges of managing AI blast radius in production.