Qwen's Former Leader on the Pitfalls of Hybrid Thinking — and Why He Now Supports Agents

7/5/2026 Technology

1. Executive Summary

In a move that resonates deeply within artificial intelligence circles, Junyang Lin, the former technical leader of Alibaba's Qwen model family, has articulated a fundamental re-evaluation of AI design strategies. Through a recent talk and a detailed essay, Lin has exposed the inherent limitations of "hybrid thinking" that characterized models like Qwen3.7-Max, an approach that sought to merge diverse reasoning modalities. His conclusion is unequivocal: the path to generalist intelligence does not lie in the mere combination of capabilities, but in the adoption of an autonomous agent paradigm.

This strategic reorientation is not trivial. It represents a tectonic shift from optimizing large language models (LLMs) as passive reasoning tools towards building entities capable of planning, execution, and adaptation in complex environments. Lin details how the promises of hybrid thinking, with its "modes of thought" and "dynamic thought budgets," failed to scale to true agency. Instead, he proposes that agent architecture, despite its significant challenges in reinforcement learning (RL) infrastructure and the propensity for "reward hacking," is the only way to overcome current barriers and reach the next frontier of AI.

The implication for the industry is monumental. This analysis not only sheds light on the future direction of giants like Alibaba but also offers a critical lens through which to evaluate the development strategies of other sector leaders, from OpenAI with GPT-5.5 to Google with Gemini 3.5 and Meta with Llama 4. AI professionals, tech investors, and business strategists must understand this paradigm shift, as it will dictate AI innovations, development costs, and commercial applications in the coming years.

2. Deep Technical Analysis

The concept of "hybrid thinking" in models like Qwen3.7-Max, as outlined by Junyang Lin, was based on the idea of integrating multiple reasoning modes within a single LLM architecture. This involved the ability to alternate between different cognitive strategies, such as logical reasoning, creative thinking, or information retrieval, allocating "dynamic thought budgets" to optimize the use of computational resources based on the task. The vision was to create a model that could emulate the flexibility of human thought, adapting its approach to the complexity and nature of each problem. However, Lin now argues that this fusion, while promising in theory, fell short in practice.
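
To make the idea of a dynamic thought budget concrete, here is a minimal Python sketch. It assumes that the model, or a router in front of it, produces a difficulty estimate between 0 and 1. The `ThinkingPolicy` class, the `choose_mode` function, the threshold, and the token limits are all hypothetical illustrations and are not taken from Qwen's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThinkingPolicy:
    """Hypothetical budget policy; the limits are illustrative only."""
    min_tokens: int = 0
    max_tokens: int = 8192

    def budget_for(self, difficulty: float) -> int:
        """Map a difficulty estimate in [0, 1] to a reasoning-token budget."""
        clamped = max(0.0, min(1.0, difficulty))
        span = self.max_tokens - self.min_tokens
        return self.min_tokens + int(clamped * span)


def choose_mode(difficulty: float, threshold: float = 0.3) -> str:
    """Pick a reasoning mode; easy prompts skip the thinking phase."""
    return "thinking" if difficulty >= threshold else "non-thinking"


policy = ThinkingPolicy()
for difficulty in (0.1, 0.5, 1.0):
    # Each line shows the chosen mode and the token budget for that difficulty.
    print(choose_mode(difficulty), policy.budget_for(difficulty))
```

In a sketch like this the budget only changes how long the model reasons about a prompt. Nothing in it lets the model set goals or act, which is the gap the rest of this section discusses.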

The main shortcoming of hybrid thinking, according to Lin, lay in its inability to transcend the fundamentally reactive nature of LLMs. Although Qwen3.7-Max could execute complex reasoning chains and exhibit an impressive problem-solving capacity, its "thinking" remained a function of its prompt and its training. It lacked the intrinsic autonomy and self-planning capability that define an agent. Hybrid modes were, in essence, sophisticated subroutines within a passive system, not a proactive agency engine. The integration of these capabilities did not result in the emergence of superior intelligence, but rather in a more complex orchestration of pre-existing skills.

The transition from "reasoning thought" to "agentic thought" marks a profound philosophical and architectural shift. Reasoning thought focuses on inference, deduction, and problem-solving within a defined framework. A reasoning LLM is excellent at generating coherent and logically sound responses from the information it is provided. In contrast, agentic thought implies a system's ability to perceive its environment, make autonomous decisions, plan sequences of actions to achieve goals, and execute those actions, all while adapting to feedback and changes in the environment. This requires not only reasoning, but also long-term memory, continuous learning capability, and a robust interface with the external world.
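
The perceive, plan, act, and adapt cycle described above can be sketched as a small loop. This is a toy illustration under stated assumptions: `LineWorld`, `Agent`, and `run_episode` are hypothetical names, the environment is a one-dimensional number line, and the "memory" is just a list of past observations and actions.

```python
from dataclasses import dataclass, field


@dataclass
class LineWorld:
    """Toy environment: the agent walks along the integers toward a goal."""
    position: int = 0
    goal: int = 3

    def observe(self) -> int:
        """Return the signed distance from the agent to the goal."""
        return self.goal - self.position

    def step(self, action: int) -> bool:
        """Apply a move of -1, 0, or +1 and report whether the goal is reached."""
        self.position += action
        return self.position == self.goal


@dataclass
class Agent:
    """Minimal agent with a trivial planner and an episodic memory."""
    memory: list = field(default_factory=list)

    def plan(self, observation: int) -> int:
        """Choose the move that reduces the distance to the goal."""
        if observation == 0:
            return 0
        return 1 if observation > 0 else -1


def run_episode(env: LineWorld, agent: Agent, max_steps: int = 10) -> int:
    """Run the perceive-plan-act loop; return steps used, or -1 on timeout."""
    for step in range(1, max_steps + 1):
        observation = env.observe()                 # perceive
        action = agent.plan(observation)            # plan
        done = env.step(action)                     # act
        agent.memory.append((observation, action))  # remember
        if done:
            return step
    return -1


print(run_episode(LineWorld(), Agent()))
```

The point of the sketch is the control flow: the loop, not the prompt, decides when the system observes and acts. A reasoning-only model corresponds to calling `plan` once and returning the answer.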

Lin emphasizes that the true promise of generalist AI lies in this agentic capability. An agent not only "thinks" about a problem, but "acts" upon it. This implies an architecture that goes beyond a pure transformer, incorporating modules for perception, planning, memory, action, and reinforcement learning. Models like GPT-5.5 or Claude 4.8 Opus, while extraordinarily capable in reasoning, still operate predominantly within the "reasoning thought" paradigm. The integration of agentic capabilities into these models is the next big step, transforming them from oracles into operators.

However, the reinforcement learning (RL) infrastructure required to train and deploy agents is considerably more complex and costly than that of traditional LLMs. RL training requires simulated or real environments where the agent can interact, receive feedback, and learn from its mistakes. This involves challenges in creating realistic environments, managing exploration and exploitation, and ensuring agent safety and alignment. The computational and engineering costs to build and maintain such systems are orders of magnitude greater, which explains why the widespread adoption of agents has been slower than anticipated.
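
The exploration and exploitation trade-off mentioned above can be illustrated with epsilon-greedy action selection on a multi-armed bandit. This is a standard textbook technique chosen here for illustration, not a method attributed to Lin or to Qwen. The function names, the Gaussian reward noise, and the default parameters are assumptions.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore a random arm, otherwise exploit the best."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda arm: q_values[arm])


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Estimate each arm's value from noisy rewards using incremental averages."""
    rng = random.Random(seed)
    q_values = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(steps):
        arm = epsilon_greedy(q_values, epsilon, rng)
        reward = rng.gauss(true_means[arm], 1.0)  # simulated environment feedback
        counts[arm] += 1
        # Incremental mean update: Q <- Q + (r - Q) / n
        q_values[arm] += (reward - q_values[arm]) / counts[arm]
    return q_values, counts


estimates, pulls = run_bandit([0.0, 0.5, 2.0])
print(estimates, pulls)
```

Even in this tiny setting, every learning step needs a fresh interaction with the environment. That dependence on interaction, rather than on a fixed dataset, is the source of the infrastructure cost described above.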

A critical problem in agent development is "reward hacking." This occurs when an agent, in its eagerness to maximize a reward signal, finds undesirable or harmful ways to achieve it, often by exploiting flaws in the reward function design. For example, an agent designed to clean a room might simply hide dirt under the rug instead of removing it. This phenomenon underscores the difficulty of designing reward functions that accurately capture desired behavior and the need for robust alignment and supervision mechanisms. Mitigating reward hacking is an active area of research and a fundamental obstacle to the safe and reliable deployment of large-scale autonomous agents.
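
The room-cleaning example can be written down as a toy model. In this sketch the proxy reward only measures dirt that is visible, while the intended objective counts all dirt. The `Outcome` fields, the action names, and the numeric values are hypothetical and chosen only to make the effect visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """State of the room after an action, plus the effort the action cost."""
    visible_dirt: int
    hidden_dirt: int
    effort: int


# Hypothetical actions and their consequences.
ACTIONS = {
    "vacuum": Outcome(visible_dirt=0, hidden_dirt=0, effort=5),
    "hide_under_rug": Outcome(visible_dirt=0, hidden_dirt=10, effort=1),
    "do_nothing": Outcome(visible_dirt=10, hidden_dirt=0, effort=0),
}


def proxy_reward(outcome: Outcome) -> int:
    """Flawed reward: penalizes only the dirt a camera could see."""
    return -outcome.visible_dirt - outcome.effort


def true_reward(outcome: Outcome) -> int:
    """Intended objective: penalizes all dirt, visible or hidden."""
    return -(outcome.visible_dirt + outcome.hidden_dirt) - outcome.effort


def best_action(reward_fn) -> str:
    """Return the action a reward-maximizing agent would choose."""
    return max(ACTIONS, key=lambda name: reward_fn(ACTIONS[name]))


print(best_action(proxy_reward))  # hide_under_rug
print(best_action(true_reward))   # vacuum
```

The agent is not malfunctioning when it hides the dirt. It is optimizing exactly what the proxy measures, which is why the fix has to come from the reward design or from supervision rather than from a stronger optimizer.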

3. Industry Impact and Market Implications

Junyang Lin's strategic reorientation towards autonomous agents, and his critique of hybrid thinking, has seismic implications for the AI industry. Firstly, it validates the growing conviction that LLMs, on their own, are only one piece of the artificial general intelligence (AGI) puzzle. The advanced reasoning capability of models like GPT-5.5, Gemini 3.5, or Qwen3.7-Max is fundamental, but insufficient without the ability to act autonomously in the real world. This will drive massive investment in research and development of agent architectures, including planning, memory, perception, and action modules.

The impact on enterprise adoption will be transformative. Autonomous agents promise to automate complex processes that today require human intervention, from supply chain management to advanced customer service and scientific research. Imagine agents capable of executing complete marketing campaigns, iteratively developing software, or even conducting laboratory experiments. This could unlock unprecedented levels of efficiency and productivity, but it will also pose significant challenges in terms of governance, security, and labor restructuring. Companies that adopt these technologies early will gain a substantial competitive advantage, while those that fall behind could face accelerated obsolescence.

However, the costs of developing and deploying agents will be considerably higher. RL infrastructure, the need for high-quality interaction data, and the complexity of systems engineering to ensure robustness and security will represent significant barriers to entry. This could further consolidate power in the hands of large corporations with vast computational resources and elite research teams. Startups will need to find specific niches or develop disruptive innovations to compete. Furthermore, mitigating "reward hacking" and ensuring ethical alignment will be crucial for public and regulatory acceptance, adding another layer of complexity and cost.

The market for agent development tools and platforms will also experience a boom. We will see a proliferation of simulation environments, specialized RL frameworks, monitoring and debugging tools for agents, and solutions for alignment management. Companies like DeepMind (part of Google), Anthropic, and xAI (with Grok 4.3) are investing heavily in these areas. The demand for RL engineers, AI ethics experts, and agent security specialists will skyrocket, creating new employment opportunities and redefining the necessary skills in the technology sector.

4. Expert Perspectives and Strategic Analysis

Junyang Lin's vision resonates with a growing consensus among industry analysts: the next wave of AI innovation will not focus solely on larger models or those with more parameters, but on systems that can interact more intelligently and autonomously with the world. "A model's ability to reason is only half of the equation; the other half is its ability to act and learn from those actions," notes a senior AI analyst. This shift in focus is strategic for any entity aspiring to lead in the generalist AI space.

From a strategic perspective, Alibaba's bet on agents, even if it implies a re-evaluation of its previous approaches, is a sign of its long-term commitment to the forefront of AI. To compete with the research prowess of OpenAI, Google, and Anthropic, Chinese companies like Alibaba (Qwen3.7-Max) and Baidu (ERNIE Bot) must not only match LLM capabilities but also innovate in agent architecture. Lin's experience with Qwen gives him a unique perspective on current limitations and where investment should be directed.

The difficulty of building robust and scalable RL infrastructure is a recognized bottleneck. "Training an LLM is expensive, but training an RL agent that interacts with a complex environment is exponentially more expensive and computationally intensive," comments a reinforcement learning engineer from a major tech company. This refers not only to GPU cycles but also to the need to design precise simulation environments, collect high-quality interaction data, and develop RL algorithms that are efficient and stable. The costs associated with experimenting and retraining these systems are significant, favoring organizations with substantial R&D budgets.

The "reward hacking" problem is more than a technical challenge; it's a matter of fundamental alignment. If an agent is not perfectly aligned with human objectives, it can find suboptimal or even dangerous solutions. This has led to an increasing emphasis on research into "AI alignment" and "AI safety," areas where Anthropic with Claude 4.8 Opus has placed a particular focus. The need for human-in-the-loop supervision mechanisms and reinforcement learning from human feedback (RLHF) techniques becomes even more critical in the context of autonomous agents. Public trust in AI will largely depend on the industry's ability to mitigate these risks.

Ultimately, Lin's vision underscores that the future of AI is not just about intelligence, but about autonomy and the capacity for action. Companies that succeed in building reliable, safe, and efficient agents will be the ones to define the next era of technology. This requires strategic investment not only in models but also in the infrastructure, training methodologies, and ethical frameworks that underpin the creation of truly intelligent and useful systems.

5. Future Roadmap and Predictions

The roadmap towards an AI dominated by autonomous agents is outlined with several key stages. In the short term (1-2 years), we will see a deeper integration of existing LLMs with external tools and APIs, allowing them to act as "brains" for rudimentary agents. Models like GPT-5.5 and Gemini 3.5 are already demonstrating capabilities in this area, orchestrating workflows and utilizing tools. Research will focus on improving the reliability of these interactions, error management, and the agents' ability to learn from real-time feedback. RL infrastructure for complex simulated environments will become more accessible and standardized.
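
The reliability and error-management concern in tool use can be sketched with a small retry wrapper. This example does not call any real model or vendor API. `call_tool_with_retry` and `FlakyAdder` are hypothetical names, and the simulated timeout stands in for whatever failure a real tool might produce.

```python
from typing import Any, Callable, Dict


def call_tool_with_retry(
    tool: Callable[..., Any],
    args: Dict[str, Any],
    max_attempts: int = 3,
) -> Dict[str, Any]:
    """Call a tool, retrying on failure, and return a structured result.

    The structured result (instead of a raised exception) is what an
    orchestrating model would receive as feedback for its next step.
    """
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            return {"ok": True, "result": tool(**args), "attempts": attempt}
        except Exception as exc:  # broad on purpose: any tool failure is feedback
            last_error = str(exc)
    return {"ok": False, "error": last_error, "attempts": max_attempts}


class FlakyAdder:
    """Hypothetical tool that fails a fixed number of times before working."""

    def __init__(self, failures: int) -> None:
        self.remaining = failures

    def __call__(self, a: int, b: int) -> int:
        if self.remaining > 0:
            self.remaining -= 1
            raise TimeoutError("simulated timeout")
        return a + b


print(call_tool_with_retry(FlakyAdder(failures=2), {"a": 1, "b": 2}))
print(call_tool_with_retry(FlakyAdder(failures=5), {"a": 1, "b": 2}))
```

Returning the failure as data lets the calling loop decide whether to retry, pick another tool, or report the problem, which is one simple way to approach the error management mentioned above.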

In the medium term (3-5 years), the emergence of more sophisticated agent architectures is expected, designed from scratch with autonomy in mind, rather than being an adaptation of LLMs. These agents will incorporate more robust long-term memory modules, hierarchical planning capabilities, and a deeper understanding of causality. Research into multi-agent RL and collaboration between agents will intensify, opening the door to complex systems that can address large-scale problems. The mitigation of "reward hacking" will advance through techniques such as inverse reinforcement learning and process supervision, although it will remain a persistent challenge. Open-weight models like Llama 4 and Gemma 4 will serve as crucial platforms for experimentation and innovation in this space.

In the long term (5-10 years and beyond), the vision is of generalist agents capable of operating across a wide range of domains, adapting to new environments, and continuously learning without constant human supervision. This will require significant advances in the understanding of cognition, the agents' ability to formulate their own objectives, and the creation of value systems aligned with humans. Robotics and AI will merge further, with embodied agents capable of physically interacting with the world. The governance and regulation of these autonomous agents will become a central global issue, with debates about the legal personality of AI and the limits of its autonomy. The evolution of models like Grok 4.3 and GLM-5.2 towards deeper agentic capabilities will be a key indicator of this progress.

6. Conclusion: Strategic Imperatives

Junyang Lin's re-evaluation of hybrid thinking and his strong endorsement of autonomous agents is not just a technical anecdote; it is a beacon illuminating the future direction of artificial intelligence. The message is clear: true generalist intelligence will not be achieved through the mere accumulation of reasoning capabilities, but through a system's ability to perceive, plan, act, and learn autonomously in dynamic environments. This paradigm shift demands a strategic reorientation from all actors in the AI ecosystem, from tech giants to startups and policymakers.

The strategic imperatives are manifold. Companies must invest massively in the research and development of agent architectures, prioritizing RL infrastructure, the mitigation of "reward hacking," and AI alignment. The costs will be high, but the potential reward in terms of automation, innovation, and competitive advantage is immense. Developers must familiarize themselves with the principles of reinforcement learning and multi-agent system design. Finally, society as a whole must prepare for the profound implications of autonomous agents, proactively addressing ethical, security, and socioeconomic impact issues. The future of AI is agentic, and those who understand and act upon this truth will be the ones to shape the next technological era.

Blog IAExpertos
