TL;DR
AI systems are now capable of automating most engineering tasks in AI research, with benchmarks nearing saturation. Research activities, however, remain less automated, though progress suggests they may also become more scalable soon.
Benchmark results from roughly the past sixteen months indicate that AI systems can now automate most of the core engineering tasks involved in AI research. Research itself remains less automated, but it is progressing quickly enough to suggest a shift in how AI development proceeds.
Multiple benchmarks measuring AI capabilities on tasks critical to AI R&D (reproducing research, competing in Kaggle competitions, designing GPU kernels) are approaching or have reached saturation, indicating that AI can automate these engineering functions with high reliability. For example, CORE-Bench, which tests the reproduction of research papers, improved from 21.5% in September 2024 to 95.5% in December 2025, with some experts declaring it ‘solved.’ Similarly, MLE-Bench, which assesses performance on Kaggle competitions, advanced from 16.9% to 64.4% over sixteen months, a level comparable to mid-tier human performance. These trends suggest that engineering is becoming much less of a bottleneck for AI-driven research.
In contrast, the capacity of AI to automate research as a creative, hypothesis-generating activity remains less certain. Clark’s analysis indicates that while engineering tasks are nearing full automation, the more abstract aspects of research, such as formulating novel hypotheses or designing entirely new experiments, may still require human insight. The engineering frontier keeps advancing in the meantime: automated GPU kernel generation and optimization tools are becoming production-ready.
Engineering is automated.
Research is the residual.
Six skill benchmarks. Edison’s framing. The question Clark leaves open is whether research is just engineering at scale.
Jack Clark’s Import AI #455 catalogs six benchmarks measuring AI capability on AI R&D tasks and concludes “AI can today automate vast swatches, perhaps the entirety, of AI engineering.” The residual question is research. The structural read on the residual: it may not be a permanent moat.
Six skills. One trajectory.
Clark catalogs six benchmarks measuring AI capability on AI R&D-relevant tasks. Each individual benchmark could be noise. Six benchmarks moving together is a curve. The pattern is the cascade observed across the broader Clark series — visible here in the specific R&D-skill domain.
Three data points. Mixed signal.
Clark provides three data points on the creative-spark question. Yes-evidence: Erdős-1051, centaur math discovery, sporadic Move-37-style moments. No-evidence: low yield, framing dependence, absence of acceleration. The mixed signal is the honest read.
The data supports two readings. Pessimistic: rare moments suggest creative insight is qualitatively distinct from engineering work. Optimistic: rare moments are an artifact of low-volume exploration; more shots on goal yield more discoveries. Both readings are consistent with Clark’s “vast swatches, perhaps the entirety” claim. They differ on the residual.
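The two readings can be made concrete with a small sketch. It assumes a deliberately simple model that is not from Clark's analysis: each exploration attempt is independent and succeeds with a fixed probability. The function name and the probability values are hypothetical and chosen only for illustration.

```python
def p_at_least_one(p_per_attempt: float, attempts: int) -> float:
    """Chance of at least one discovery across independent attempts."""
    return 1.0 - (1.0 - p_per_attempt) ** attempts


# Hypothetical per-attempt success rates, not measured values.
OPTIMISTIC_P = 0.001   # insight is rare but reachable by volume
PESSIMISTIC_P = 0.0    # insight is not reachable by this kind of attempt

for attempts in (10, 1_000, 100_000):
    optimistic = p_at_least_one(OPTIMISTIC_P, attempts)
    pessimistic = p_at_least_one(PESSIMISTIC_P, attempts)
    print(f"{attempts:>7} attempts: optimistic {optimistic:.3f}, pessimistic {pessimistic:.3f}")
```

Under the optimistic assumption the chance of a discovery climbs toward certainty as volume grows; under the pessimistic assumption no amount of volume helps. Sparse observed successes fit either assumption, which is why the current data cannot separate the two readings.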
Five dimensions Clark gestures at but leaves underdeveloped.
Clark’s section is rigorous on the empirical evidence. Five strategic dimensions matter for the institutional response, which the Clark series synthesis argues is structurally inadequate.
Two readings. Different equilibria.
The structural question Clark leaves open: is research a permanent moat that bounds automated AI R&D, or is it engineering at scale that dissolves with more shots on goal? Both readings are consistent with the current data. They differ by orders of magnitude in consequences.
Productivity multiplier years
Recursive loop operational
Five audiences. Asymmetric cost of being wrong.
The institutional response should not bet on inspiration being a permanent moat. If the distinction holds, the capacity built is still useful. If it closes, that capacity is necessary. The asymmetric cost of being wrong points toward building now.
The five audiences: industry, academia, policymakers, investors, and everyone else.
Engineering is automated. The residual is the question. The institutional response should not bet on inspiration being a permanent moat.
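The asymmetry argument can be sketched as a small decision table. The cost values below are hypothetical, on an arbitrary scale, and are not taken from the article or from Clark; only their ordering matters. The sketch compares the worst case of each action, which is one simple way to formalize "cost of being wrong" and not necessarily the only reasonable one.

```python
# Hypothetical costs on an arbitrary scale (lower is better).
# Keys are (action, state of the world).
COSTS = {
    ("build", "moat_holds"): 1,    # capacity built; still useful, not strictly needed
    ("build", "moat_closes"): 1,   # capacity built and needed
    ("wait", "moat_holds"): 0,     # nothing spent, nothing lost
    ("wait", "moat_closes"): 10,   # unprepared when capacity was necessary
}


def worst_case(action: str) -> int:
    """Largest cost the action can incur across both states."""
    return max(cost for (a, _state), cost in COSTS.items() if a == action)


def best_action() -> str:
    """Action whose worst case is smallest."""
    return min(("build", "wait"), key=worst_case)


print({action: worst_case(action) for action in ("build", "wait")})
print("prefer:", best_action())
```

With any numbers that keep this ordering (waiting is slightly cheaper if the moat holds and far more expensive if it closes), building has the smaller worst case.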
Implications of Engineering Automation for AI Research
The rapid automation of core engineering tasks in AI research suggests a fundamental shift in how AI development will proceed. As these tasks become automated, the cost and time required to reproduce, validate, and build upon research decrease dramatically, potentially accelerating innovation cycles. This shift could reduce reliance on human engineers for routine tasks, freeing up researchers to focus on higher-level scientific questions. However, the less-automated aspects of research—such as creative hypothesis generation—may become the new bottleneck, emphasizing the importance of understanding how AI can support or augment these functions.
Recent Advances in AI Capabilities for R&D Tasks
Over roughly the past sixteen months, multiple independent benchmarks have demonstrated significant progress in AI’s ability to perform tasks central to research and engineering. CORE-Bench, which measures research reproduction, showed a 4.4× improvement (21.5% to 95.5%), with AI systems handling dependencies and code execution at post-doc levels. MLE-Bench, which assesses Kaggle competition performance, improved nearly fourfold (16.9% to 64.4%), reaching levels comparable to mid-tier human practitioners. Additionally, advances in kernel design, such as automated GPU kernel generation and optimization, indicate that AI is moving from experimental to production-grade engineering tools. These developments mark a clear pattern of rapid gains, and in some cases saturation, across multiple R&D skill domains, suggesting a structural shift in AI’s role in research.
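The improvement factors can be checked directly from the scores quoted earlier in this article (CORE-Bench 21.5% to 95.5%, MLE-Bench 16.9% to 64.4%). The short sketch below recomputes them and also reports remaining headroom, which is a more useful measure of saturation than the ratio alone. The function and variable names are illustrative.

```python
def improvement(start_pct: float, end_pct: float) -> dict:
    """Multiplicative gain, gain in percentage points, and points left to 100%."""
    return {
        "ratio": end_pct / start_pct,
        "points": end_pct - start_pct,
        "headroom": 100.0 - end_pct,
    }


# (start score, end score) in percent, as quoted in the article.
BENCHMARKS = {
    "CORE-Bench": (21.5, 95.5),
    "MLE-Bench": (16.9, 64.4),
}

for name, (start, end) in BENCHMARKS.items():
    r = improvement(start, end)
    print(
        f"{name}: {r['ratio']:.1f}x, +{r['points']:.1f} points, "
        f"{r['headroom']:.1f} points of headroom"
    )
```

The ratios come out at about 4.4× and 3.8×. The headroom figures show the difference between the two cases: CORE-Bench has under five points left, while MLE-Bench still has more than a third of the scale ahead of it.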
“The pattern across these benchmarks indicates that AI can now automate vast swaths of engineering tasks in AI research, with some areas approaching ‘solved’ status.”
— Thorsten Meyer
Unresolved Questions About Research Automation
While engineering tasks are nearing full automation, the extent to which AI can automate the creative and hypothesis-driven aspects of research remains uncertain. It is unclear whether current progress in kernel design and code automation will translate into fully autonomous research processes, or whether human insight will remain essential for breakthrough scientific discoveries.
Future Developments in AI-Driven Research Automation
In the coming months, researchers will likely focus on integrating these automation tools into broader research workflows and testing their limits in more complex, creative tasks. Monitoring the progress of AI in hypothesis generation, experimental design, and scientific discovery will be critical. Additionally, institutional responses—such as policy adjustments and new research paradigms—may emerge as AI capabilities continue to evolve rapidly.
Key Questions
What specific tasks has AI automated in research engineering?
AI has automated tasks such as reproducing research experiments, generating optimized GPU kernels, and competing in ML competitions such as those on Kaggle. Performance ranges from mid-tier human level on Kaggle-style competitions to near saturation on research reproduction.
Does this mean AI can now do all scientific research?
No. While the engineering aspects are highly automated, the creative and hypothesis-driven parts of research are still less automated and remain an active area of development.
How soon might AI fully automate research activities?
It is uncertain; current trends suggest rapid progress in engineering automation, but full automation of all research aspects could still take several years, depending on breakthroughs in creative AI capabilities.
What are the risks of relying on AI for research automation?
Risks include over-reliance on automated systems that may lack understanding of scientific context, potential biases, and the need for human oversight to ensure scientific validity and ethical standards.
Source: ThorstenMeyerAI.com