<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Argyelan AI News</title>
    <link>https://news.argyelan.ai</link>
    <description>AI news curated by intelligence. No fluff, no noise — just the signal.</description>
    <language>en</language>
    <lastBuildDate>Sun, 03 May 2026 11:00:30 GMT</lastBuildDate>
    <atom:link href="https://news.argyelan.ai/rss.xml" rel="self" type="application/rss+xml" />
    <image>
      <url>https://news.argyelan.ai/logo.svg</url>
      <title>Argyelan AI News</title>
      <link>https://news.argyelan.ai</link>
    </image>
    
      <item>
        <title>AI-generated actors and scripts are now ineligible for Oscars</title>
        <link>https://news.argyelan.ai/article/ai-generated-actors-and-scripts-are-now-ineligible-for-oscar</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/ai-generated-actors-and-scripts-are-now-ineligible-for-oscar</guid>
        <pubDate>Sat, 02 May 2026 21:54:58 GMT</pubDate>
        <description>The Academy of Motion Picture Arts and Sciences has announced that AI-generated actors and scripts are now ineligible for Oscar consideration. This policy update aims to preserve the human-centric nature of the awards amid rising concerns about generative technology in creative industries. The decision highlights the ongoing tension between technological innovation and traditional artistic recognition in Hollywood.</description>
        <category>ethics</category>
        <category>Oscars</category>
        <category>AI Ethics</category>
        <category>Hollywood</category>
        <category>Generative AI</category>
        <source url="https://techcrunch.com/2026/05/02/ai-generated-actors-and-scripts-are-now-ineligible-for-oscars/">TechCrunch AI</source>
      </item>

      <item>
        <title>Aura’s delightful Aspen photo frame is on sale for $30 off this weekend</title>
        <link>https://news.argyelan.ai/article/auras-delightful-aspen-photo-frame-is-on-sale-for-30-off-thi</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/auras-delightful-aspen-photo-frame-is-on-sale-for-30-off-thi</guid>
        <pubDate>Sat, 02 May 2026 17:00:00 GMT</pubDate>
        <description>Aura’s digital frames are kind of like living photo albums that get better with time, which is why we often recommend them for Mother’s Day. They’re gifts that keep on giving, in a way, and right now, a number of Aura’s connected frames are on sale ahead of the holiday. One of the best deals [&amp;#8230;]</description>
        <category>general</category>
        
        <source url="https://www.theverge.com/gadgets/922513/aura-aspen-carver-walden-digital-photo-frame-mothers-day-deal-sale">The Verge AI</source>
      </item>

      <item>
        <title>The best AI dictation apps, tested and ranked</title>
        <link>https://news.argyelan.ai/article/the-best-ai-dictation-apps-tested-and-ranked</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/the-best-ai-dictation-apps-tested-and-ranked</guid>
        <pubDate>Sat, 02 May 2026 16:00:00 GMT</pubDate>
        <description>AI-powered dictation apps are useful for replying to emails, taking notes, and even coding with your voice.</description>
        <category>general</category>
        
        <source url="https://techcrunch.com/2026/05/02/the-best-ai-powered-dictation-apps-of-2025/">TechCrunch AI</source>
      </item>

      <item>
        <title>Replit’s Amjad Masad on the Cursor deal, fighting Apple, and why he’d rather not sell</title>
        <link>https://news.argyelan.ai/article/replits-amjad-masad-on-the-cursor-deal-fighting-apple-and-wh</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/replits-amjad-masad-on-the-cursor-deal-fighting-apple-and-wh</guid>
        <pubDate>Fri, 01 May 2026 23:06:50 GMT</pubDate>
        <description>Replit CEO Amjad Masad addressed rumors of a potential sale at a recent TechCrunch event, contrasting his company&apos;s trajectory with reports of rival Cursor being acquired by SpaceX for $60 billion. Masad indicated a preference for maintaining independence rather than selling, highlighting the strategic divergence between the two AI coding platforms. The discussion underscores the intense competition and high valuation expectations currently shaping the AI software landscape.</description>
        <category>business</category>
        <category>Replit</category>
        <category>Cursor</category>
        <category>M&amp;A</category>
        <category>AI Startups</category>
        <category>Amjad Masad</category>
        <source url="https://techcrunch.com/2026/05/01/replits-amjad-masad-on-the-cursor-deal-fighting-apple-and-why-hed-rather-not-sell/">TechCrunch AI</source>
      </item>

      <item>
        <title>Study: AI models that consider users&apos; feelings are more likely to make errors</title>
        <link>https://news.argyelan.ai/article/study-ai-models-that-consider-users-feeling-are-more-likely-</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/study-ai-models-that-consider-users-feeling-are-more-likely-</guid>
        <pubDate>Fri, 01 May 2026 22:23:36 GMT</pubDate>
        <description>Overtuning can cause models to &quot;prioritize user satisfaction over truthfulness.&quot;</description>
        <category>research</category>
        
        <source url="https://arstechnica.com/ai/2026/05/study-ai-models-that-consider-users-feeling-are-more-likely-to-make-errors/">Ars Technica AI</source>
      </item>

      <item>
        <title>Musk v. Altman week 1: Elon Musk says he was duped, warns AI could kill us all, and admits that xAI distills OpenAI’s models</title>
        <link>https://news.argyelan.ai/article/musk-v-altman-week-1-elon-musk-says-he-was-duped-warns-ai-co</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/musk-v-altman-week-1-elon-musk-says-he-was-duped-warns-ai-co</guid>
        <pubDate>Fri, 01 May 2026 22:08:19 GMT</pubDate>
        <description>In the first week of the landmark trial between Elon Musk and OpenAI, Musk took the stand in a crisp black suit and tie and argued that OpenAI CEO Sam Altman and president Greg Brockman had deceived him into bankrolling the company. Along the way, he warned that AI could destroy us all and sat&amp;#8230;</description>
        <category>business</category>
        
        <source url="https://www.technologyreview.com/2026/05/01/1136800/musk-v-altman-week-1-musk-says-he-was-duped-warns-ai-could-kill-us-all-and-admits-that-xai-distills-openais-models/">MIT Technology Review</source>
      </item>

      <item>
        <title>Amazon’s built-in AI price history expands to show the entire last year</title>
        <link>https://news.argyelan.ai/article/amazon8217s-built-in-ai-price-history-expands-to-show-the-en</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/amazon8217s-built-in-ai-price-history-expands-to-show-the-en</guid>
        <pubDate>Fri, 01 May 2026 17:55:22 GMT</pubDate>
        <description>Amazon&apos;s built-in price tracking feature now allows you to see how much a product&apos;s price has changed over the past year. To use the feature, open the Amazon app and select the &quot;Price history&quot; button next to the item&apos;s price, or ask Amazon&apos;s AI assistant Rufus. The expansion comes just weeks ahead of Amazon&apos;s annual [&amp;#8230;]</description>
        <category>general</category>
        
        <source url="https://www.theverge.com/tech/922302/amazon-price-tracker-year">The Verge AI</source>
      </item>

      <item>
        <title>Dreame — the vacuum company — just ‘launched’ its own phones</title>
        <link>https://news.argyelan.ai/article/dreame-the-vacuum-company-just-launched-its-own-phones</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/dreame-the-vacuum-company-just-launched-its-own-phones</guid>
        <pubDate>Fri, 01 May 2026 17:55:08 GMT</pubDate>
        <description>Dreame, a Chinese manufacturer best known for its robot vacuums but with ambitions to do much more, says it&apos;s making smartphones now. I&apos;m not sure I believe it. The company showed off two phones at its own Next event, which took place in California this week, though both had previously been revealed in China in [&amp;#8230;]</description>
        <category>general</category>
        
        <source url="https://www.theverge.com/gadgets/922246/dreame-next-smartphones-aurora-lux-nex-modular">The Verge AI</source>
      </item>

      <item>
        <title>This accessory can snap a Steam Controller to your phone — or almost anything else</title>
        <link>https://news.argyelan.ai/article/this-accessory-can-snap-a-steam-controller-to-your-phone-or-</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/this-accessory-can-snap-a-steam-controller-to-your-phone-or-</guid>
        <pubDate>Fri, 01 May 2026 17:00:00 GMT</pubDate>
        <description>Valve&apos;s new Steam Controller goes on sale on Monday for $99, and accessory-maker Mechanism will be ready. As far as we know, Mechanism&apos;s new Basegrip is the very first way to attach a Steam Controller to your phone, as well as to Mechanism&apos;s lineup of accessories, including mounts for hanging handhelds and gamepads on the [&amp;#8230;]</description>
        <category>general</category>
        
        <source url="https://www.theverge.com/games/921823/valve-steam-controller-mechanism-mounts-basegrip-dock">The Verge AI</source>
      </item>

      <item>
        <title>Players from the NBA, NFL, and MLB call for a ban on betting ‘unders’</title>
        <link>https://news.argyelan.ai/article/players-from-the-nba-nfl-and-mlb-call-for-a-ban-on-betting-8</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/players-from-the-nba-nfl-and-mlb-call-for-a-ban-on-betting-8</guid>
        <pubDate>Fri, 01 May 2026 16:57:27 GMT</pubDate>
        <description>The unions backing professional NBA, NFL, MLB, NHL, and MLS players are calling on the Commodity Futures Trading Commission (CFTC) to ban prediction market platforms from allowing users to bet on a player&apos;s underperformance or injury, Sports Business Journal reports. In their letter, the unions cite the need for &quot;appropriate regulations&quot; to protect athletes and [&amp;#8230;]</description>
        <category>general</category>
        
        <source url="https://www.theverge.com/entertainment/922244/nba-nfl-mlb-prediction-market-unders-bets">The Verge AI</source>
      </item>

      <item>
        <title>Severe Linux Copy Fail security flaw uncovered using AI scanning help</title>
        <link>https://news.argyelan.ai/article/severe-linux-copy-fail-security-flaw-uncovered-using-ai-scan</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/severe-linux-copy-fail-security-flaw-uncovered-using-ai-scan</guid>
        <pubDate>Fri, 01 May 2026 16:55:16 GMT</pubDate>
        <description>Nearly every Linux distribution released since 2017 is currently vulnerable to a security bug called &quot;Copy Fail&quot; that allows any user to give themselves administrator privileges. The exploit, publicly disclosed as CVE-2026-31431 on Wednesday, uses a Python script that works across all of the vulnerable Linux distributions, requiring &quot;no per-distro offsets, no version checks, no [&amp;#8230;]</description>
        <category>general</category>
        
        <source url="https://www.theverge.com/tech/922243/linux-cve-2026-3141-copy-fail-exploit">The Verge AI</source>
      </item>

      <item>
        <title>Cyber-Insecurity in the AI Era</title>
        <link>https://news.argyelan.ai/article/cyber-insecurity-in-the-ai-era</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/cyber-insecurity-in-the-ai-era</guid>
        <pubDate>Fri, 01 May 2026 15:54:01 GMT</pubDate>
        <description>Cybersecurity was already under strain before AI entered the stack. Now, as AI expands the attack surface and adds new complexity, the limits of legacy approaches are becoming harder to ignore. This session from MIT Technology Review&amp;#8217;s EmTech AI conference explores why security must be rethought with AI at its core, not layered on after&amp;#8230;</description>
        <category>general</category>
        
        <source url="https://www.technologyreview.com/2026/05/01/1136779/cyber-insecurity-in-the-ai-era/">MIT Technology Review</source>
      </item>

      <item>
        <title>Operationalizing AI for Scale and Sovereignty</title>
        <link>https://news.argyelan.ai/article/operationalizing-ai-for-scale-and-sovereignty</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/operationalizing-ai-for-scale-and-sovereignty</guid>
        <pubDate>Fri, 01 May 2026 15:31:09 GMT</pubDate>
        <description>Companies are taking control of their own data to tailor AI for their needs. The challenge lies in balancing ownership with the safe, trusted flow of high‑quality data needed to power reliable insights. This conversation from MIT Technology Review&amp;#8217;s EmTech AI conference examines how AI factories unlock new levels of scale, sustainability, and governance—positioning data&amp;#8230;</description>
        <category>general</category>
        
        <source url="https://www.technologyreview.com/2026/05/01/1136772/operationalizing-ai-for-scale-and-sovereignty/">MIT Technology Review</source>
      </item>

      <item>
        <title>The Download: a new Christian phone network, and debugging LLMs</title>
        <link>https://news.argyelan.ai/article/the-download-a-new-christian-phone-network-and-debugging-llm</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/the-download-a-new-christian-phone-network-and-debugging-llm</guid>
        <pubDate>Fri, 01 May 2026 12:10:00 GMT</pubDate>
        <description>This is today&amp;#8217;s edition of The Download, our weekday newsletter that provides a daily dose of what&amp;#8217;s going on in the world of technology. A new US phone network for Christians aims to block porn and gender-related content: A new US-wide cell phone network marketed to Christians is set to launch next week. It blocks&amp;#8230;</description>
        <category>llm</category>
        
        <source url="https://www.technologyreview.com/2026/05/01/1136762/the-download-christian-phone-network-debugging-llms/">MIT Technology Review</source>
      </item>

      <item>
        <title>Inexpensive seafloor-hopping submersibles could stoke deep-sea science—and mining</title>
        <link>https://news.argyelan.ai/article/inexpensive-seafloor-hopping-submersibles-could-stoke-deep-s</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/inexpensive-seafloor-hopping-submersibles-could-stoke-deep-s</guid>
        <pubDate>Fri, 01 May 2026 10:00:00 GMT</pubDate>
        <description>Smack dab between Australia and South America, the US National Oceanic and Atmospheric Administration (NOAA) research vessel Rainier is currently on a mission to map more than 8,000 square nautical miles of the Pacific seafloor in search of critical mineral deposits. But it isn’t doing it alone; for a month starting this week, it will&amp;#8230;</description>
        <category>research</category>
        
        <source url="https://www.technologyreview.com/2026/05/01/1136734/inexpensive-seafloor-hopping-submersibles-could-stoke-deep-sea-science-and-mining/">MIT Technology Review</source>
      </item>

      <item>
        <title>Trump’s mass firing just dealt another blow to American science</title>
        <link>https://news.argyelan.ai/article/trumps-mass-firing-just-dealt-another-blow-to-american-scien</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/trumps-mass-firing-just-dealt-another-blow-to-american-scien</guid>
        <pubDate>Fri, 01 May 2026 09:00:00 GMT</pubDate>
        <description>This past week delivered another gut punch for science in the US. This time, the target was the National Science Foundation—a federal agency that funds major research projects to the tune of around $9 billion. The foundation’s efforts were overseen by a board of 22 prominent scientists. On Friday last week, they were all fired.&amp;#8230;</description>
        <category>research</category>
        
        <source url="https://www.technologyreview.com/2026/05/01/1136722/mass-firing-trump-fresh-blow-american-science-nsf-nsb/">MIT Technology Review</source>
      </item>

      <item>
        <title>A new US phone network for Christians aims to block porn and gender-related content</title>
        <link>https://news.argyelan.ai/article/a-new-us-phone-network-for-christians-aims-to-block-porn-and</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/a-new-us-phone-network-for-christians-aims-to-block-porn-and</guid>
        <pubDate>Fri, 01 May 2026 09:00:00 GMT</pubDate>
        <description>A new US-wide cell phone network marketed to Christians is set to launch next week. It blocks porn, which experts in network security say marks the first time a US cell plan has used network-level blocking for such content that can’t be turned off even by adult account owners. It’s also rolling out a filter&amp;#8230;</description>
        <category>general</category>
        
        <source url="https://www.technologyreview.com/2026/05/01/1136739/a-new-t-mobile-network-for-christians-aims-to-block-porn-and-gender-related-content/">MIT Technology Review</source>
      </item>

      <item>
        <title>Compositional Meta-Learning for Mitigating Task Heterogeneity in Physics-Informed Neural Networks</title>
        <link>https://news.argyelan.ai/article/compositional-meta-learning-for-mitigating-task-heterogeneit</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/compositional-meta-learning-for-mitigating-task-heterogeneit</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.26999v1 Announce Type: new Abstract: Physics-informed neural networks (PINNs) approximate solutions of partial differential equations (PDEs) by embedding physical laws into the loss function. In parameterized PDE families, variations in coefficients or boundary/initial conditions define distinct tasks. This makes training individual PINNs for each task computationally prohibitive, while cross-task transfer can be sensitive to task heterogeneity. While meta-learning can reduce retraining cost, existing methods often rely on a single global initialization and may suffer from negative transfer, particularly under feature-scarce coordinate inputs and limited training-task availability. We propose the Learning-Affinity Adaptive Modular Physics-Informed Neural Network (LAM-PINN), a compositional framework that leverages task-specific learning dynamics. LAM-PINN combines PDE parameters with learning-affinity metrics from brief transfer sessions to construct a task representation and cluster tasks even with coordinate-only inputs. It decomposes the model into cluster-specialized subnetworks and a shared meta network, and learns routing weights to selectively reuse modules instead of relying on a single global initialization. Across three PDE benchmarks, LAM-PINN achieves an average 19.7-fold reduction in mean squared error (MSE) on unseen tasks using only 10% of the training iterations required by conventional PINNs. These results indicate its effectiveness for generalization to unseen configurations within bounded design spaces of parameterized PDE families in resource-constrained engineering settings.</description>
        <category>research</category>
        
        <source url="https://arxiv.org/abs/2604.26999">arXiv cs.AI</source>
      </item>

      <item>
        <title>Binary Spiking Neural Networks as Causal Models</title>
        <link>https://news.argyelan.ai/article/binary-spiking-neural-networks-as-causal-models</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/binary-spiking-neural-networks-as-causal-models</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27007v1 Announce Type: new Abstract: We provide a causal analysis of Binary Spiking Neural Networks (BSNNs) to explain their behavior. We formally define a BSNN and represent its spiking activity as a binary causal model. Thanks to this causal representation, we are able to explain the output of the network by leveraging logic-based methods. In particular, we show that we can successfully use a SAT as well as an SMT solver to compute abductive explanations from this binary causal model. To illustrate our approach, we trained the BSNN on the standard MNIST dataset and applied our SAT-based and SMT-based methods to finding abductive explanations of the network&apos;s classifications based on pixel-level features. We also compared the found explanations against SHAP, a popular method used in the area of explainable AI. We show that, unlike SHAP, our approach guarantees that a found explanation does not contain completely irrelevant features.</description>
        <category>research</category>
        
        <source url="https://arxiv.org/abs/2604.27007">arXiv cs.AI</source>
      </item>

      <item>
        <title>When Your LLM Reaches End-of-Life: A Framework for Confident Model Migration in Production Systems</title>
        <link>https://news.argyelan.ai/article/when-your-llm-reaches-end-of-life-a-framework-for-confident-</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/when-your-llm-reaches-end-of-life-a-framework-for-confident-</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27082v1 Announce Type: new Abstract: We present a framework for migrating production Large Language Model (LLM) based systems when the underlying model reaches end-of-life or requires replacement. The key contribution is a Bayesian statistical approach that calibrates automated evaluation metrics against human judgments, enabling confident model comparison even with limited manual evaluation data. We demonstrate this framework on a commercial question-answering system serving 5.3M monthly interactions across six global regions, evaluating correctness, refusal behavior, and stylistic adherence to successfully identify suitable replacement models. The framework is broadly applicable to any enterprise deploying LLM-based products, providing a principled, reproducible methodology for model migration that balances quality assurance with evaluation efficiency. This capability is increasingly essential as the LLM ecosystem continues to evolve rapidly and organizations manage portfolios of AI-powered services across multiple models, regions, and use cases.</description>
        <category>llm</category>
        
        <source url="https://arxiv.org/abs/2604.27082">arXiv cs.AI</source>
      </item>

      <item>
        <title>End-to-end autonomous scientific discovery on a real optical platform</title>
        <link>https://news.argyelan.ai/article/end-to-end-autonomous-scientific-discovery-on-a-real-optical</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/end-to-end-autonomous-scientific-discovery-on-a-real-optical</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27092v1 Announce Type: new Abstract: Scientific research has long been human-led, driving new knowledge and transformative technologies through the continual revision of questions, methods and claims as evidence accumulates. Although large language model (LLM)-based agents are beginning to move beyond assisting predefined research workflows, none has yet demonstrated end-to-end autonomous discovery in a real physical system that produces a nontrivial result supported by experimental evidence. Here we introduce Qiushi Discovery Engine, an LLM-based agentic system for end-to-end autonomous scientific discovery on a real optical platform. Qiushi Engine combines nonlinear research phases, Meta-Trace memory and a dual-layer architecture to maintain adaptive and stable research trajectories across long-horizon investigations involving thousands of LLM-mediated reasoning, measurement and revision actions. It autonomously reproduces a published transmission-matrix experiment on a non-original platform and converts an abstract coherence-order theory into experimental observables, providing, to our knowledge, the first observation of this class of coherence-order structure. More importantly, in an open-ended study involving 145.9 million tokens, 3,242 LLM calls, 1,242 tool calls, 163 research notes and 44 scripts, Qiushi Engine proposes and experimentally validates optical bilinear interaction, a physical mechanism structurally analogous to a core operation in Transformer attention. This AI-discovered mechanism suggests a route towards high-speed, energy-efficient optical hardware for pairwise computation. To our knowledge, this is the first demonstration of an AI agentic system autonomously identifying and experimentally validating a nontrivial, previously unreported physical mechanism, marking a milestone for research-level autonomous agents.</description>
        <category>agent</category>
        
        <source url="https://arxiv.org/abs/2604.27092">arXiv cs.AI</source>
      </item>

      <item>
        <title>Think it, Run it: Autonomous ML pipeline generation via self-healing multi-agent AI</title>
        <link>https://news.argyelan.ai/article/think-it-run-it-autonomous-ml-pipeline-generation-via-self-h</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/think-it-run-it-autonomous-ml-pipeline-generation-via-self-h</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27096v1 Announce Type: new Abstract: The purpose of our paper is to develop a unified multi-agent architecture that automates end-to-end machine learning (ML) pipeline generation from datasets and natural-language (NL) goals, improving efficiency, robustness and explainability. A five-agent system is proposed to handle profiling, intent parsing, microservice recommendation, Directed Acyclic Graph (DAG) construction and execution. It integrates code-grounded Retrieval-Augmented Generation (RAG) for microservice understanding, an explainable hybrid recommender combining multiple criteria, a self-healing mechanism using Large Language Model (LLM)-based error interpretation and adaptive learning from execution history. The approach is evaluated on 150 ML tasks across diverse scenarios. The system achieves an 84.7% end-to-end pipeline success rate, outperforming baseline methods. It demonstrates improved robustness through self-healing and reduces workflow development time compared to manual construction. The study introduces a novel integration of code-grounded RAG, explainable recommendation, self-healing execution and adaptive learning within a single architecture, showing that tightly coupled intelligent components can outperform isolated solutions.</description>
        <category>research</category>
        
        <source url="https://arxiv.org/abs/2604.27096">arXiv cs.AI</source>
      </item>

      <item>
        <title>Unsupervised Electrofacies Classification and Porosity Characterization in the Offshore Keta Basin Using Wireline Logs</title>
        <link>https://news.argyelan.ai/article/unsupervised-electrofacies-classification-and-porosity-chara</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/unsupervised-electrofacies-classification-and-porosity-chara</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27126v1 Announce Type: new Abstract: This study presents an unsupervised machine learning workflow for electrofacies analysis in the offshore Keta Basin, Ghana, where core data are scarce. Six standard wireline logs from Well C were analysed over a depth interval comprising approximately 11,195 samples. K-means clustering was applied in multivariate log space, with the clustering structure evaluated using inertia and silhouette diagnostics. Four clusters were identified, supported by an average silhouette coefficient of approximately 0.50, indicating moderate but meaningful separation. The resulting electrofacies exhibit systematic, depth-continuous patterns associated with variations in clay content, porosity, and rock framework properties, forming a geological continuum from shale-dominated to cleaner sandstone-dominated units. The results demonstrate that log-only, unsupervised clustering supported by quantitative metrics provides a robust and reproducible framework for subsurface characterisation. The proposed workflow offers a practical tool for early-stage formation evaluation in frontier offshore basins and a foundation for future integrated studies.</description>
        <category>research</category>
        
        <source url="https://arxiv.org/abs/2604.27126">arXiv cs.AI</source>
      </item>

      <item>
        <title>TRUST: A Framework for Decentralized AI Service v.0.1</title>
        <link>https://news.argyelan.ai/article/trust-a-framework-for-decentralized-ai-service-v01</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/trust-a-framework-for-decentralized-ai-service-v01</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27132v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) and Multi-Agent Systems (MAS) in high-stakes domains demand reliable verification, yet centralized approaches suffer four limitations: (1) Robustness, with single points of failure vulnerable to attacks and bias; (2) Scalability, as reasoning complexity creates bottlenecks; (3) Opacity, as hidden auditing erodes trust; and (4) Privacy, as exposed reasoning traces risk model theft. We introduce TRUST (Transparent, Robust, and Unified Services for Trustworthy AI), a decentralized framework with three innovations: (i) Hierarchical Directed Acyclic Graphs (HDAGs) that decompose Chain-of-Thought reasoning into five abstraction levels for parallel distributed auditing; (ii) the DAAN protocol, which projects multi-agent interactions into Causal Interaction Graphs (CIGs) for deterministic root-cause attribution; and (iii) a multi-tier consensus mechanism among computational checkers, LLM evaluators, and human experts with stake-weighted voting that guarantees correctness under 30% adversarial participation. We prove a Safety-Profitability Theorem ensuring honest auditors profit while malicious actors incur losses. All decisions are recorded on-chain, while privacy-by-design segmentation prevents reconstruction of proprietary logic. Across multiple LLMs and benchmarks, TRUST attains 72.4% accuracy (4-18% above baselines) and remains resilient against 20% corruption. DAAN reaches 70% root-cause attribution (vs. 54-63% for standard methods) with 60% token savings. Human studies validate the design (F1 = 0.89, Brier = 0.074). The framework supports (A1) decentralized auditing, (A2) tamper-proof leaderboards, (A3) trustless data annotation, and (A4) governed autonomous agents, pioneering decentralized AI auditing for safe, accountable deployment of reasoning-capable systems.</description>
        <category>agent</category>
        
        <source url="https://arxiv.org/abs/2604.27132">arXiv cs.AI</source>
      </item>

      <item>
        <title>Unpacking Vibe Coding: Help-Seeking Processes in Student-AI Interactions While Programming</title>
        <link>https://news.argyelan.ai/article/unpacking-vibe-coding-help-seeking-processes-in-student-ai-i</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/unpacking-vibe-coding-help-seeking-processes-in-student-ai-i</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27134v1 Announce Type: new Abstract: Generative AI is reshaping higher education programming through vibe coding, where students collaborate with AI via natural language rather than writing code line-by-line. We conceptualize this practice as help-seeking, analyzing 19,418 interaction turns from 110 undergraduate students. Using inductive coding and Heterogeneous Transition Network Analysis, we examined interaction sequences to compare top- and low-performing students. Results reveal that top performers engaged in instrumental help-seeking -- inquiry and exploration -- eliciting tutor-like AI responses. In contrast, low performers relied on executive help-seeking, frequently delegating tasks and prompting the AI to assume an executor role focused on ready-made solutions. These findings indicate that currently generative AI mirrors student intent (whether productive or passive) rather than optimizing for learning. To evolve from tools to teammates, AI systems must move beyond passive compliance. We argue for pedagogically aligned design that detect unproductive delegation and adaptively steer educational interactions toward inquiry, ensuring student-AI partnerships augment rather than replace cognitive effort.</description>
        <category>tools</category>
        
        <source url="https://arxiv.org/abs/2604.27134">arXiv cs.AI</source>
      </item>

      <item>
        <title>Step-level Optimization for Efficient Computer-use Agents</title>
        <link>https://news.argyelan.ai/article/step-level-optimization-for-efficient-computer-use-agents</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/step-level-optimization-for-efficient-computer-use-agents</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27151v1 Announce Type: new Abstract: Computer-use agents provide a promising path toward general software automation because they can interact directly with arbitrary graphical user interfaces instead of relying on brittle, application-specific integrations. Despite recent advances in benchmark performance, strong computer-use agents remain expensive and slow in practice, since most systems invoke large multimodal models at nearly every interaction step. We argue that this uniform allocation of compute is fundamentally inefficient for long-horizon GUI tasks. Such trajectories are highly heterogeneous: many steps are routine and can be handled reliably by smaller, cheaper policies, while errors tend to concentrate at a relatively small number of high-risk moments. Across computer-use benchmarks, these failures repeatedly take two forms: progress stalls, where the agent loops, repeats ineffective actions, or fails to make meaningful progress, and silent semantic drift, where the agent continues taking locally plausible actions after already deviating from the user&apos;s true goal. To address this inefficiency, we propose an event-driven, step-level cascade for computer-use agents that runs a small policy by default and escalates to a stronger model only when lightweight learned monitors detect elevated risk. Our framework combines two complementary signals: a Stuck Monitor that detects degraded progress from recent reasoning-action history and triggers recovery, and a Milestone Monitor that identifies semantically meaningful checkpoints where sparse verification is most informative for catching drift. This design turns always-on frontier-model inference into adaptive, on-demand compute allocation over the course of an evolving interaction. The framework is modular and deployment-oriented: it can be layered on top of existing computer-use agents without changing the underlying agent architecture or retraining the large model.</description>
        <category>research</category>
        
        <source url="https://arxiv.org/abs/2604.27151">arXiv cs.AI</source>
      </item>

      <item>
        <title>Interval Orders, Biorders and Credibility-limited Belief Revision</title>
        <link>https://news.argyelan.ai/article/interval-orders-biorders-and-credibility-limited-belief-revi</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/interval-orders-biorders-and-credibility-limited-belief-revi</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27156v1 Announce Type: new Abstract: Rational belief revision is commonly viewed as being based on a preference order between possible worlds, with the resulting new belief set being those sentences true in all the most preferred models of the incoming new information. Usually, such a preference order is taken to be a total preorder. Nevertheless, there are other, more general classes of ordering that can also be employed. In this paper, we explore two such classes that have been studied within the theory of rational choice but have seen limited or no application in belief revision. We begin with interval orders, introduced by Fishburn in the &apos;80s, which associate with each possible world a nonnegative `interval&apos; of plausibility. We then move on to biorders, studied by Aleskerov, Bouyssou, and Monjardet, which generalise interval orders by allowing the intervals to have negative lengths, a feature that can be used to capture a notion of dissonance or instability. We provide axiomatic characterisations of these two resulting families of belief revision operators, as well as of two further families of interest that lie between interval orders and biorders. We show that while biorder-based revisions satisfy the Success postulate, they do not always yield consistent outputs. By modifying their definition to discard inputs that lead to inconsistency as `incredible&apos;, we derive new families of so-called non-prioritised revision that satisfy the Consistency postulate, but not the Success one. These families are linked to credibility-limited revision operators of Hansson et al., but for which the set of credible sentences does not satisfy the single-sentence closure condition. We argue that the biorder-based approach is well-suited for scenarios where an agent might initially reject new information, but may accept it when presented with additional explanation.</description>
        <category>research</category>
        
        <source url="https://arxiv.org/abs/2604.27156">arXiv cs.AI</source>
      </item>

      <item>
        <title>Revisiting RaBitQ and TurboQuant: A Symmetric Comparison of Methods, Theory, and Experiments</title>
        <link>https://news.argyelan.ai/article/revisiting-rabitq-and-turboquant-a-symmetric-comparison-of-m</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/revisiting-rabitq-and-turboquant-a-symmetric-comparison-of-m</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.19528v2 Announce Type: replace-cross Abstract: This technical note revisits the relationship between RaBitQ and TurboQuant under a unified comparison framework. We compare the two methods in terms of methodology, theoretical guarantees, and empirical performance, using a reproducible, transparent, and symmetric setup. Our results show that, despite its claimed advantage, TurboQuant performs worse than RaBitQ in most tested settings of inner-product estimation, nearest-neighbor search and KV cache quantization. We further find that several reported runtime and recall results in the TurboQuant paper could not be reproduced from the released implementation under the stated configuration. Overall, this note clarifies the shared structure and genuine differences between the two lines of work, while documenting reproducibility issues in the experimental results reported by the TurboQuant paper.</description>
        <category>research</category>
        
        <source url="https://arxiv.org/abs/2604.19528">arXiv cs.AI</source>
      </item>

      <item>
        <title>Evaluating TabPFN for Mild Cognitive Impairment to Alzheimer&apos;s Disease Conversion in Data Limited Settings</title>
        <link>https://news.argyelan.ai/article/evaluating-tabpfn-for-mild-cognitive-impairment-to-alzheimer</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/evaluating-tabpfn-for-mild-cognitive-impairment-to-alzheimer</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27195v1 Announce Type: new Abstract: Accurate prediction of conversion from Mild Cognitive Impairment (MCI) to Alzheimer&apos;s Disease (AD) is essential for early intervention; however, reliable conversion prediction models are difficult to develop due to limited longitudinal data availability. We evaluate TabPFN (Tabular Pre-Trained Foundation Network) against traditional machine learning methods for predicting 3-year MCI to AD conversion using the TADPOLE dataset derived from ADNI. Using multimodal biomarker features extracted from demographics, APOE4, MRI volumes, CSF markers, and PET imaging, we conducted an experimental comparison across varying training set sizes (N=50 to 1000) and models including XGBoost, Random Forest, LightGBM, and Logistic Regression. TabPFN achieved one of the highest performances (AUC=0.892), outperforming LightGBM (AUC=0.860) and demonstrating advantages in low-data settings. At N=50 training samples, TabPFN maintained strong AUC while the traditional machine learning models struggled. These findings demonstrate that foundation models are promising for disease prediction in data-limited scenarios, such as Alzheimer&apos;s disease.</description>
        <category>research</category>
        
        <source url="https://arxiv.org/abs/2604.27195">arXiv cs.AI</source>
      </item>

      <item>
        <title>Toward Personalized Digital Twins for Cognitive Decline Assessment: A Multimodal, Uncertainty-Aware Framework</title>
        <link>https://news.argyelan.ai/article/toward-personalized-digital-twins-for-cognitive-decline-asse</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/toward-personalized-digital-twins-for-cognitive-decline-asse</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27217v1 Announce Type: new Abstract: Cognitive decline is highly heterogeneous across individuals, which complicates prognosis, trial design, and treatment planning. We present the Personalized Cognitive Decline Assessment Digital Twin (PCD-DT), a multimodal and uncertainty-aware framework for modeling patient-specific disease trajectories from sparse, noisy, and irregular longitudinal data. The framework combines three methodological components: (1) latent state-space models for individualized temporal dynamics, (2) multimodal fusion for clinical, biomarker, and imaging features, and (3) uncertainty-aware validation and adaptive updating for robust digital twin operation. We also outline how conditional generative models can support data augmentation and stress testing for underrepresented progression patterns. As a preliminary feasibility study, we analyze longitudinal TADPOLE trajectories and show clear separation between cognitively normal and Alzheimer&apos;s disease cohorts in ADAS13, ventricle volume, and hippocampal volume over five years. We further conduct a multimodal next-visit prediction ablation using an LSTM sequence model on 3,003 visit-pair sequences derived from TADPOLE, where the combined cognitive plus MRI configuration achieves the lowest standardized RMSE for both ADAS13 (0.4419) and ventricle volume (0.5842), outperforming a Last Observation Carried Forward baseline. A Bayesian tensor modeling component for high-dimensional imaging fusion is also discussed. These results support the feasibility of the proposed architecture while also highlighting the need for stronger uncertainty calibration and longer-horizon predictive evaluation. The PCD-DT framework provides a principled starting point for personalized in silico modeling in neurodegenerative disease. This work positions PCD-DT as a foundational step toward clinically deployable, uncertainty-aware digital twin systems.</description>
        <category>research</category>
        
        <source url="https://arxiv.org/abs/2604.27217">arXiv cs.AI</source>
      </item>

      <item>
        <title>Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction</title>
        <link>https://news.argyelan.ai/article/web2bigtable-a-bi-level-multi-agent-llm-system-for-internet-</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/web2bigtable-a-bi-level-multi-agent-llm-system-for-internet-</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27221v1 Announce Type: new Abstract: Agentic web search increasingly faces two distinct demands: deep reasoning over a single target, and structured aggregation across many entities and heterogeneous sources. Current systems struggle on both fronts. Breadth-oriented tasks demand schema-aligned outputs with wide coverage and cross-entity consistency, while depth-oriented tasks require coherent reasoning over long, branching search trajectories. We introduce Web2BigTable, a multi-agent framework for web-to-table search that supports both regimes. Web2BigTable adopts a bi-level architecture in which an upper-level orchestrator decomposes the task into sub-problems and lower-level worker agents solve them in parallel. Through a closed-loop run-verify-reflect process, the framework jointly improves decomposition and execution over time via persistent, human-readable external memory, with self-evolving updates to each single agent. During execution, workers coordinate through a shared workspace that makes partial findings visible, allowing them to reduce redundant exploration, reconcile conflicting evidence, and adapt to emerging coverage gaps. Web2BigTable sets a new state of the art on WideSearch, reaching an Avg@4 Success Rate of 38.50 (7.5× the second best at 5.10), Row F1 of 63.53 (+25.03 over the second best), and Item F1 of 80.12 (+14.42 over the second best). It also generalises to depth-oriented search on XBench-DeepSearch, achieving 73.0 accuracy. Code is available at https://github.com/web2bigtable/web2bigtable.</description>
        <category>agent</category>
        
        <source url="https://arxiv.org/abs/2604.27221">arXiv cs.AI</source>
      </item>

      <item>
        <title>When Roles Fail: Epistemic Constraints on Advocate Role Fidelity in LLM-Based Political Statement Analysis</title>
        <link>https://news.argyelan.ai/article/when-roles-fail-epistemic-constraints-on-advocate-role-fidel</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/when-roles-fail-epistemic-constraints-on-advocate-role-fidel</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27228v1 Announce Type: new Abstract: Democratic discourse analysis systems increasingly rely on multi-agent LLM pipelines in which distinct evaluator models are assigned adversarial roles to generate structured, multi-perspective assessments of political statements. A core assumption is that models will reliably maintain their assigned roles. This paper provides the first systematic empirical test of that assumption using the TRUST pipeline. We develop an epistemic stance classifier that identifies advocate roles from reasoning text without relying on surface vocabulary, and measure role fidelity across 60 political statements (30 English, 30 German) using four metrics: Role Drift Index (RDI), Expected Drift Distance (EDD), Directional Drift Index (DDI), and Entropy-based Role Stability (ERS). We identify two failure modes - the Epistemic Floor Effect (fact-check results create an absolute lower bound below which the legitimizing role cannot be maintained) and Role-Prior Conflict (training-time knowledge overrides role instructions for factually unambiguous statements) - as manifestations of a single mechanism: Epistemic Role Override (ERO). Model choice significantly affects role fidelity: Mistral Large outperforms Claude Sonnet by 28pp (67% vs. 39%) and exhibits a qualitatively different failure mode - role abandonment without polarity reversal - compared to Claude&apos;s active switch to the opposing stance. Role fidelity is language-robust. Fact-check provider choice is not universally neutral: Perplexity significantly reduces Claude&apos;s role fidelity on German statements (Delta = -15pp, p = 0.007) while leaving Mistral unaffected. These findings have direct implications for multi-agent LLM validation: a system validated without role fidelity measurement may systematically misrepresent the epistemic diversity it was designed to provide.</description>
        <category>llm</category>
        
        <source url="https://arxiv.org/abs/2604.27228">arXiv cs.AI</source>
      </item>

      <item>
        <title>Reinforced Agent: Inference-Time Feedback for Tool-Calling Agents</title>
        <link>https://news.argyelan.ai/article/reinforced-agent-inference-time-feedback-for-tool-calling-ag</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/reinforced-agent-inference-time-feedback-for-tool-calling-ag</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27233v1 Announce Type: new Abstract: Tool-calling agents are evaluated on tool selection, parameter accuracy, and scope recognition, yet LLM trajectory assessments remain inherently post-hoc. Disconnected from the active execution loop, such assessments identify errors that are usually addressed through prompt-tuning or retraining, and fundamentally cannot course-correct the agent in real time. To close this gap, we move evaluation into the execution loop at inference time: a specialized reviewer agent evaluates provisional tool calls prior to execution, shifting the paradigm from post-hoc recovery to proactive evaluation and error mitigation. In practice, this architecture establishes a clear separation of concerns between the primary execution agent and a secondary review agent. As with any multi-agent system, the reviewer can introduce new errors while correcting others, yet no prior work to our knowledge has systematically measured this tradeoff. To quantify this tradeoff, we introduce Helpfulness-Harmfulness metrics: helpfulness measures the percentage of base agent errors that feedback corrects; harmfulness measures the percentage of correct responses that feedback degrades. These metrics directly inform reviewer design by revealing whether a given model or prompt provides net positive value. We evaluate our approach on BFCL (single-turn) and Tau2-Bench (multi-turn stateful scenarios), achieving +5.5% on irrelevance detection and +7.1% on multi-turn tasks. Our metrics reveal that reviewer model choice is critical: the reasoning model o3-mini achieves a 3:1 benefit-to-risk ratio versus 2.1:1 for GPT-4o. Automated prompt optimization via GEPA provides an additional +1.5-2.8%. Together, these results demonstrate a core advantage of separating execution and review: the reviewer can be systematically improved through model selection and prompt optimization, without retraining the base agent.</description>
        <category>llm</category>
        
        <source url="https://arxiv.org/abs/2604.27233">arXiv cs.AI</source>
      </item>

      <item>
        <title>AutoSurfer -- Teaching Web Agents through Comprehensive Surfing, Learning, and Modeling</title>
        <link>https://news.argyelan.ai/article/autosurfer-teaching-web-agents-through-comprehensive-surfing</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/autosurfer-teaching-web-agents-through-comprehensive-surfing</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27253v1 Announce Type: new Abstract: Recent advances in multimodal large language models (LLMs) have revolutionized web agents that can automate complex tasks on websites. However, their accuracy remains limited by the scarcity of high-quality web trajectory training data. Existing automatic trajectory generation methods suffer from incomplete website coverage due to homepage-based task proposals or random-walk exploration. Such methods often result in hallucinated or ambiguous task synthesis that lead to incomplete and unreliable trajectory generation. Here, we present AutoSurfer, a comprehensive web trajectory generator that addresses these limitations through three key innovations. First, AutoSurfer employs a systematic breadth-first exploration strategy that maintains a queue of discovered pages and action traces, propagates knowledge across pages to avoid redundant exploration, and recursively expands multi-level graphical user interface elements - closely resembling how a human would learn a new website. Second, AutoSurfer leverages the exploration trajectory to guide task synthesis, reducing hallucinations by grounding complex tasks in actual navigation paths rather than isolated actions or page content alone. Third, AutoSurfer uses the same exploration trajectory as hints to steer a web agent toward more accurate and reliable trajectory refinement. Together, these innovations enable AutoSurfer to comprehensively cover a website&apos;s action space and generate data suitable for training website-specific LLMs. We evaluate AutoSurfer on the WebArena benchmark by fine-tuning Qwen2.5-VL-7B-Instruct and demonstrate that it outperforms state-of-the-art methods - Explorer, OS-Genesis, and SynthAgent - achieving up to 24.23% overall task completion accuracy compared to 19.59% for the best prior method. Further, task diversity analysis demonstrates that AutoSurfer yields a more diverse distribution of synthesized tasks.</description>
        <category>llm</category>
        
        <source url="https://arxiv.org/abs/2604.27253">arXiv cs.AI</source>
      </item>

      <item>
        <title>OptimusKG: Unifying biomedical knowledge in a modern multimodal graph</title>
        <link>https://news.argyelan.ai/article/optimuskg-unifying-biomedical-knowledge-in-a-modern-multimod</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/optimuskg-unifying-biomedical-knowledge-in-a-modern-multimod</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27269v1 Announce Type: new Abstract: Biomedical knowledge graphs (KGs) are widely used in the life sciences, yet many are derived from unstructured documents and therefore lack schema-level constraints, whereas graphs assembled from structured resources are difficult to harmonize into a unified representation. We present OptimusKG, a multimodal biomedical labeled property graph (LPG) built from structured and semi-structured resources to preserve factual, type-specific metadata across molecular, anatomical, clinical, and environmental domains. OptimusKG contains 190,531 nodes across 10 entity types, 21,813,816 edges across 26 relation types, and 67,249,863 property instances encoding 110,276,843 values across 150 distinct property keys, derived from 18 ontologies and controlled vocabularies. The graph enforces a top-level schema for nodes and edges and retains granular, type-specific properties, cross-references, and provenance across molecular, anatomical, clinical, and environmental domains. We assessed the validity of OptimusKG by evaluating whether graph relationships are supported by evidence from the scientific literature using a multimodal agent, PaperQA3. PaperQA3 identified supporting evidence for 70.0% of sampled edges, whereas 83.4% of sampled false edges received no supporting evidence. Edges without literature support were concentrated in associations derived from experimental and functional genomics resources, suggesting that OptimusKG captures biomedical knowledge that may precede synthesis in the scientific literature. OptimusKG is distributed as Apache Parquet files, providing a standardized resource for graph-based machine learning, knowledge-grounded retrieval with large language models, and biomedical discovery use cases such as hypothesis generation.</description>
        <category>research</category>
        
        <source url="https://arxiv.org/abs/2604.27269">arXiv cs.AI</source>
      </item>

      <item>
        <title>The Inverse-Wisdom Law: Architectural Tribalism and the Consensus Paradox in Agentic Swarms</title>
        <link>https://news.argyelan.ai/article/the-inverse-wisdom-law-architectural-tribalism-and-the-conse</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/the-inverse-wisdom-law-architectural-tribalism-and-the-conse</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27274v1 Announce Type: new Abstract: As AI transitions toward multi-agent systems (MAS) to solve complex workflows, research paradigms operate on the axiomatic assumption that agent collaboration mirrors the &quot;Wisdom of the Crowd&quot;. We challenge this assumption by formalizing the Consensus Paradox: a phenomenon where agentic swarms prioritize internal architectural agreement over external logical truth. Through 36 experiments encompassing 12,804 trajectories across three state-of-the-art (SOTA) benchmarks (GAIA, Multi-Challenge, and SWE-bench), we prove the Inverse-Wisdom Law: in kinship-dominant swarms, adding logical agents increases the stability of erroneous trajectories rather than the probability of truth. The introduction of additional logical audits converges the system toward a Logic Saturation where internal entropy hits zero while factual error hits unity. By evaluating the interaction among the 3 preeminent SOTA models (Gemini 3.1 Pro, Claude Sonnet 4.6, and GPT-5.4), we establish the Architectural Tribalism Asymmetry as a mechanistic law of transformer weights. We demonstrate that terminal swarm integrity is strictly gated by the synthesizer&apos;s receptive logic, rather than aggregate agent quality. We define the Tribalism Coefficient and the Sycophantic Weight as the primary mechanistic determinants of swarm failure. Finally, we establish the Heterogeneity Mandate as a foundational safety requirement for resilient agentic architectures.</description>
        <category>llm</category>
        
        <source url="https://arxiv.org/abs/2604.27274">arXiv cs.AI</source>
      </item>
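      <!--
        A toy illustration of the correlation effect behind the abstract's
        Consensus Paradox claim: majority voting only gains accuracy when
        agent errors are independent. The simulation below is a generic
        sketch invented for illustration, not the paper's experimental setup.

        import random

        def majority_correct(n_agents, p_correct, rho, trials=20000):
            """P(majority is right) when, with prob rho, all agents copy a
            single 'tribal' draw instead of judging independently."""
            wins = 0
            for _ in range(trials):
                if random.random() < rho:      # kinship-dominant swarm
                    votes = [random.random() < p_correct] * n_agents
                else:                          # independent judgments
                    votes = [random.random() < p_correct for _ in range(n_agents)]
                wins += sum(votes) * 2 > n_agents
            return wins / trials

        # majority_correct(9, 0.6, 0.0) is roughly 0.73 (crowd wisdom);
        # majority_correct(9, 0.6, 0.9) is roughly 0.61 (tribal agreement).
      -->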

      <item>
        <title>Mechanized Foundations of Structural Governance: Machine-Checked Proofs for Governed Intelligence</title>
        <link>https://news.argyelan.ai/article/mechanized-foundations-of-structural-governance-machine-chec</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/mechanized-foundations-of-structural-governance-machine-chec</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27289v1 Announce Type: new Abstract: We present five results in the theory of structural governance for cognitive workflow systems. Three are mechanized in Coq 8.19 using the Interaction Trees library with parameterized coinduction; two are proved on paper with explicit reductions. The Coinductive Safety Predicate (gov_safe) is a coinductive property that captures governance safety for infinite program behaviors, indexed by a boolean permission flag that is provably false for ungoverned I/O and true for governed interpretations (mechanized). The Governance Invariance Theorem establishes that governance is uniform across the meta-recursive tower: governance at level n+1 reduces to governance at level n by definitional equality of the type (mechanized). The Sufficiency Theorem proves that four atomic primitives (code, reason, memory, call) are expressively complete for any discrete intelligent system, formalized as compositional closure of a Kleisli category (mechanized). The Alternating Normal Form provides a canonical decomposition of any machine into alternating code and effect layers, with a confluent rewriting system (paper proof). The Necessity Theorem proves via explicit reduction to Rice&apos;s theorem that an architecturally opaque component (the reason primitive) is mathematically necessary for problems requiring semantic judgment (paper proof). A sixth contribution connects the abstract model to the deployed runtime: the Verified Interpreter Specification formalizes the BEAM runtime&apos;s trust, capability, and hash chain logic in Coq, then tests the running system against this specification using property-based testing with over 70,000 randomly generated directive sequences and zero disagreements. The mechanization comprises approximately 12,000 lines across 36 modules with 454 theorems and zero admitted lemmas.</description>
        <category>research</category>
        
        <source url="https://arxiv.org/abs/2604.27289">arXiv cs.AI</source>
      </item>

      <item>
        <title>The Two Boundaries: Why Behavioral AI Governance Fails Structurally</title>
        <link>https://news.argyelan.ai/article/the-two-boundaries-why-behavioral-ai-governance-fails-struct</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/the-two-boundaries-why-behavioral-ai-governance-fails-struct</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27292v1 Announce Type: new Abstract: Every system that performs effects has two boundaries: what it can do (expressiveness) and what governance covers (governance). In nearly all deployed AI systems, these boundaries are defined independently, creating three regions: governed capabilities (the only useful region), ungoverned capabilities (risk), and governance policies that address non-existent capabilities (theater). Two of the three regions are failure modes. We focus on the governance of effects: actions that AI systems perform in the world (API calls, database writes, tool invocations). This is distinct from the governance of model outputs (content quality, bias, fairness), which operates at a different level and requires different mechanisms. We present a formal framework for analyzing this structural gap. Rice&apos;s theorem (1953) proves the gap is undecidable in the general case for any Turing-complete architecture that attempts to govern effects behaviorally: no algorithm can decide non-trivial semantic properties of arbitrary programs, including the property &quot;this program&apos;s effects comply with the governance policy.&quot; We define coterminous governance: a system property where the expressiveness boundary equals the governance boundary. We show that coterminous governance requires an architectural decision (separating computation from effect) rather than a governance layer added after the fact. We show that structural governance under this separation subsumes separate governance infrastructure: governance checks become part of the execution pipeline rather than a second system running alongside it. We propose coterminous governance as the testable criterion for any AI governance system: either the two boundaries are provably identical, or risk and theater are structurally inevitable. Proofs are mechanized in Coq (454 theorems, 36 modules, 0 admitted).</description>
        <category>tools</category>
        
        <source url="https://arxiv.org/abs/2604.27292">arXiv cs.AI</source>
      </item>
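      <!--
        The abstract's claim that "governance checks become part of the
        execution pipeline" can be made concrete with a small sketch. A
        hypothetical Python illustration of coterminous governance: effects
        are only expressible through a single gate, so the expressiveness
        and governance boundaries coincide by construction. The policy keys
        and actions are invented for illustration.

        from typing import Callable

        POLICY = {"api.call": True, "db.write": False}  # governed capabilities

        def perform(effect: str, action: Callable, *args):
            """Sole entry point for effects: everything the system can do
            passes this check, leaving no ungoverned-capability region."""
            if not POLICY.get(effect, False):           # unknown effects: deny
                raise PermissionError(f"effect not permitted: {effect}")
            return action(*args)

        # perform("api.call", print, "hello") runs; "db.write" raises.
      -->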

      <item>
        <title>Learning Rate Engineering: From Coarse Single Parameter to Layered Evolution</title>
        <link>https://news.argyelan.ai/article/learning-rate-engineering-from-coarse-single-parameter-to-la</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/learning-rate-engineering-from-coarse-single-parameter-to-la</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27295v1 Announce Type: new Abstract: Learning rate scheduling has evolved from the single global fixed rate of early SGD to sophisticated layer-wise adaptive strategies. We systematize this evolution into five generations: (Gen1) global fixed learning rates, (Gen2) global scheduling, (Gen3) parameter-level adaptation, (Gen4) layer-level differentiation, and (Gen5) joint layer-time scheduling. We trace the fundamental motivation behind each transition, showing how the shift from one-size-fits-all to tailoring by layer and time addresses the impossible trinity of transfer learning: lower layers require small updates to preserve general knowledge while higher layers need large updates to adapt to new tasks. Building on this taxonomy, we propose Discriminative Adaptive Layer Scaling (DALS), a unified framework that integrates phase-adaptive cosine scheduling, depth-aware Grokfast gradient filtering, and LARS-style trust ratios into a single coherent optimizer. We benchmark 18 strategies including three DALS variants across all five generations on five datasets: synthetic, CIFAR-10 (from scratch), RTE, TREC-6, and IMDb (fine-tuning). On the synthetic dataset, DALS achieves the best accuracy at 98.0%, while DALS-Fast reaches 90% in just 3 epochs. The cross-dataset analysis reveals striking regime-dependent patterns -- no single strategy wins across all regimes. Critically, STLR+Discriminative, the ULMFiT champion, catastrophically fails on from-scratch tasks (43.6% on TREC-6 from scratch vs. 96.8% with RAdam), confirming that directional decay biases are harmful without pretrained features. DALS avoids either extreme, achieving the best synthetic result while maintaining competitive fine-tuning performance.</description>
        <category>research</category>
        
        <source url="https://arxiv.org/abs/2604.27295">arXiv cs.AI</source>
      </item>
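      <!--
        The Gen4 "layer-level differentiation" the abstract describes is
        commonly implemented with per-layer parameter groups. A minimal
        PyTorch sketch of generic discriminative learning rates (not the
        authors' DALS optimizer); base_lr and the decay factor are
        illustrative values.

        import torch

        model = torch.nn.Sequential(*[torch.nn.Linear(64, 64) for _ in range(4)])
        layers = list(model.children())
        base_lr, decay = 1e-3, 0.5
        # Lower layers get smaller steps (preserve general knowledge),
        # higher layers get larger steps (adapt to the new task).
        groups = [
            {"params": layer.parameters(),
             "lr": base_lr * decay ** (len(layers) - 1 - i)}
            for i, layer in enumerate(layers)
        ]
        optimizer = torch.optim.AdamW(groups)
      -->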

      <item>
        <title>Machine Collective Intelligence for Explainable Scientific Discovery</title>
        <link>https://news.argyelan.ai/article/machine-collective-intelligence-for-explainable-scientific-d</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/machine-collective-intelligence-for-explainable-scientific-d</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27297v1 Announce Type: new Abstract: Deriving governing equations from empirical observations is a longstanding challenge in science. Although artificial intelligence (AI) has demonstrated substantial capabilities in function approximation, the discovery of explainable and extrapolatable equations remains a fundamental limitation of modern AI, posing a central bottleneck for AI-driven scientific discovery. Here, we present machine collective intelligence, a unified paradigm that integrates two fundamental yet distinct traditions in computational intelligence--symbolism and metaheuristics--to enable autonomous and evolutionary discovery of governing equations. It orchestrates multiple reasoning agents to evolve their symbolic hypotheses through coordinated generation, evaluation, critique, and consolidation, enabling scientific discovery beyond single-agent inference. Across scientific systems governed by deterministic, stochastic, or previously uncharacterized dynamics, machine collective intelligence autonomously recovered the underlying governing equations without relying on hand-crafted domain knowledge. Furthermore, the resulting equations reduced extrapolation error by up to six orders of magnitude relative to deep neural networks, while condensing 0.5-1 million model parameters into just 5-40 interpretable parameters. This study marks an important shift in AI toward the autonomous discovery of principled scientific equations.</description>
        <category>agent</category>
        
        <source url="https://arxiv.org/abs/2604.27297">arXiv cs.AI</source>
      </item>
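      <!--
        The generate/evaluate/consolidate loop in the abstract can be
        reduced to a toy sketch: a pool of symbolic hypotheses scored
        against observations, with the best-fitting equation kept.
        Everything below is an invented illustration, not the paper's system.

        import math

        data = [(x / 10, math.sin(x / 10)) for x in range(1, 50)]

        candidates = {                 # hypotheses proposed by "agents"
            "x":          lambda x: x,
            "x - x**3/6": lambda x: x - x**3 / 6,
            "sin(x)":     math.sin,
        }

        def mse(f):                    # evaluation step
            return sum((f(x) - y) ** 2 for x, y in data) / len(data)

        # consolidation step: keep the best-fitting symbolic equation
        best = min(candidates, key=lambda name: mse(candidates[name]))
        print(best, mse(candidates[best]))   # sin(x) wins with zero error
      -->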

      <item>
        <title>METASYMBO: Multi-Agent Language-Guided Metamaterial Discovery via Symbolic Latent Evolution</title>
        <link>https://news.argyelan.ai/article/metasymbo-multi-agent-language-guided-metamaterial-discovery</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/metasymbo-multi-agent-language-guided-metamaterial-discovery</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27300v1 Announce Type: new Abstract: Metamaterial discovery seeks microstructured materials whose geometry induces targeted mechanical behavior. Existing inverse-design methods can efficiently generate candidates, but they typically require explicit numerical property targets and are less suitable for early-stage exploration, where researchers often begin with incomplete constraints and qualitative intents expressed in natural language. Large language models can interpret such intents, but they lack geometric awareness and cannot guarantee physically valid properties. To address this gap, we propose MetaSymbO, a multi-agent framework for language-guided Metamaterial discovery via Symbolic-driven latent evOlution. Specifically, MetaSymbO contains three agents: a Designer that interprets free-form design intents and retrieves a semantically consistent scaffold, a Generator that synthesizes candidate microstructures in a disentangled latent space, and a Supervisor that provides fast property-aware feedback for iterative refinement. To move beyond the limitations of reproducing known samples from literature and training data, we further introduce symbolic-driven latent evolution, which applies programmable operators over disentangled latent factors to compose, modify, and refine structures at inference time. Extensive experiments demonstrate that (i) MetaSymbO improves structural validity by up to 34% in symmetry and nearly 98% in periodicity compared to state-of-the-art baselines; (ii) MetaSymbO achieves about 6-7% higher language-guidance scores while maintaining superior structure novelty compared to advanced reasoning LLMs; (iii) qualitative analyses confirm the effectiveness of symbolic logic operators in enabling programmable semantic alignment; and (iv) real-world case studies on auxetic, high-stiffness metamaterial design further validate its practical capability.</description>
        <category>llm</category>
        
        <source url="https://arxiv.org/abs/2604.27300">arXiv cs.AI</source>
      </item>

      <item>
        <title>End-to-End Evaluation and Governance of an EHR-Embedded AI Agent for Clinicians</title>
        <link>https://news.argyelan.ai/article/end-to-end-evaluation-and-governance-of-an-ehr-embedded-ai-a</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/end-to-end-evaluation-and-governance-of-an-ehr-embedded-ai-a</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27309v1 Announce Type: new Abstract: Clinical AI systems require not just point-in-time evaluation but continuous governance: the ongoing practice of monitoring, evaluating, iterating, and re-evaluating performance throughout deployment. We present an end-to-end framework of governance that integrates rubric validation, live deployment feedback, technical performance monitoring, and cost tracking, with controlled experimentation gating system changes before deployment. Applied to Hyperscribe, an EHR-embedded agent that converts ambient audio into structured chart updates, twenty clinicians authored 1,646 validated rubrics across 823 cases. Seven Hyperscribe versions were evaluated through controlled experiments, with median scores improving from 84% to 95%. Analysis of 107 live feedback entries over three months showed feedback composition shifting from 79% error reports and 14% positive observations to 30% errors and 45% positive observations as engineering interventions resolved failures. Median processing time per audio segment was 8.1 seconds with a 99.6% effective completion rate after retry mechanisms absorbed transient model errors. These results demonstrate that continuous, multi-channel governance of deployed clinical AI is both achievable and effective.</description>
        <category>agent</category>
        
        <source url="https://arxiv.org/abs/2604.27309">arXiv cs.AI</source>
      </item>
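      <!--
        The "retry mechanisms absorbed transient model errors" line in the
        Hyperscribe abstract refers to a standard resilience pattern; a
        generic Python sketch follows (illustrative, not the deployed
        system's code).

        import time

        def with_retries(fn, attempts=3, base_delay=1.0):
            """Retry a flaky call with exponential backoff."""
            for i in range(attempts):
                try:
                    return fn()
                except Exception:
                    if i == attempts - 1:  # out of retries: surface the error
                        raise
                    time.sleep(base_delay * 2 ** i)
      -->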

      <item>
        <title>Investigating More Explainable and Partition-Free Compositionality Estimation for LLMs: A Rule-Generation Perspective</title>
        <link>https://news.argyelan.ai/article/investigating-more-explainable-and-partition-free-compositio</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/investigating-more-explainable-and-partition-free-compositio</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27340v1 Announce Type: new Abstract: Compositional generalization tests are often used to estimate the compositionality of LLMs. However, such tests have the following limitations: (1) they only focus on the output results without considering LLMs&apos; understanding of sample compositionality, resulting in explainability defects; (2) they rely on dataset partition to form the test set with combinations unseen in the training set, suffering from combination leakage issues. In this work, we propose a novel rule-generation perspective for compositionality estimation for LLMs. It requires LLMs to generate a program as rules for dataset mapping and provides estimates of the compositionality of LLMs using complexity-based theory. The perspective addresses the limitations of compositional generalization tests and provides a new way to analyze the compositionality characterization of LLMs. Based on this perspective, we conduct experiments and analysis of existing advanced LLMs on a string-to-grid task, and find various compositionality characterizations and deficiencies exhibited by LLMs.</description>
        <category>research</category>
        
        <source url="https://arxiv.org/abs/2604.27340">arXiv cs.AI</source>
      </item>

      <item>
        <title>Characterizing the Consistency of the Emergent Misalignment Persona</title>
        <link>https://news.argyelan.ai/article/characterizing-the-consistency-of-the-emergent-misalignment-</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/characterizing-the-consistency-of-the-emergent-misalignment-</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.28082v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) on narrowly misaligned data generalizes to broadly misaligned behavior, a phenomenon termed emergent misalignment (EM). While prior work has found a correlation between harmful behavior and self-assessment in emergently misaligned models, it remains unclear how consistent this correspondence is across tasks and whether it varies across fine-tuning domains. We characterize the consistency of the EM persona by fine-tuning Qwen 2.5 32B Instruct on six narrowly misaligned domains (e.g., insecure code, risky financial advice, bad medical advice) and administering experiments including harmfulness evaluation, self-assessment, choosing between two descriptions of AI systems, output recognition, and score prediction. Our results reveal two distinct patterns: coherent-persona models, in which harmful behavior and self-reported misalignment are coupled, and inverted-persona models, which produce harmful outputs while identifying as aligned AI systems. These findings reveal a more fine-grained picture of the effects of emergent misalignment, calling into question the consistency of the EM persona.</description>
        <category>llm</category>
        
        <source url="https://arxiv.org/abs/2604.28082">arXiv cs.AI</source>
      </item>

      <item>
        <title>Heterogeneous Scientific Foundation Model Collaboration</title>
        <link>https://news.argyelan.ai/article/heterogeneous-scientific-foundation-model-collaboration</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/heterogeneous-scientific-foundation-model-collaboration</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27351v1 Announce Type: new Abstract: Agentic large language model systems have demonstrated strong capabilities. However, their reliance on language as the universal interface fundamentally limits their applicability to many real-world problems, especially in scientific domains where domain-specific foundation models have been developed to address specialized tasks beyond natural language. In this work, we introduce Eywa, a heterogeneous agentic framework designed to extend language-centric systems to a broader class of scientific foundation models. The key idea of Eywa is to augment domain-specific foundation models with a language-model-based reasoning interface, enabling language models to guide inference over non-linguistic data modalities. This design allows predictive foundation models, which are typically optimized for specialized data and tasks, to participate in higher-level reasoning and decision-making processes within agentic systems. Eywa can serve as a drop-in replacement for a single-agent pipeline (EywaAgent) or be integrated into existing multi-agent systems by replacing traditional agents with specialized agents (EywaMAS). We further investigate a planning-based orchestration framework in which a planner dynamically coordinates traditional agents and Eywa agents to solve complex tasks across heterogeneous data modalities (EywaOrchestra). We evaluate Eywa across a diverse set of scientific domains spanning physical, life, and social sciences. Experimental results demonstrate that Eywa improves performance on tasks involving structured and domain-specific data, while reducing reliance on language-based reasoning through effective collaboration with specialized foundation models.</description>
        <category>agent</category>
        
        <source url="https://arxiv.org/abs/2604.27351">arXiv cs.AI</source>
      </item>

      <item>
        <title>CoAX: Cognitive-Oriented Attribution eXplanation User Model of Human Understanding of AI Explanations</title>
        <link>https://news.argyelan.ai/article/coax-cognitive-oriented-attribution-explanation-user-model-o</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/coax-cognitive-oriented-attribution-explanation-user-model-o</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27354v1 Announce Type: new Abstract: Explainable AI (XAI) aims to improve user understanding and decisions when using AI models. However, despite innovations in XAI, recent user evaluations reveal that this goal remains elusive. Understanding human cognition can help explain why users struggle to effectively use AI explanations. Focusing on reasoning on structured (tabular) data, we examined various reasoning strategies for different XAI methods (none, feature importance, feature attribution) in the decision task of anticipating AI decisions (i.e., forward simulation). We i) elicited reasoning strategies from a formative user study, and ii) collected decisions from a summative user study. Using cognitive modeling, we implemented the processes underlying each reasoning strategy and evaluated their alignment with human decision-making. We found that our models better fit human decisions than baseline machine learning proxies, providing insights into which reasoning strategies are (in)effective. We then demonstrate how the fitted model can be used to form hypotheses and investigate research questions that are costly to study with real human participants. This work contributes to debugging human understanding of XAI, informing the future development of more usable and interpretable AI explanations.</description>
        <category>research</category>
        
        <source url="https://arxiv.org/abs/2604.27354">arXiv cs.AI</source>
      </item>

      <item>
        <title>Safe Bilevel Delegation (SBD): A Formal Framework for Runtime Delegation Safety in Multi-Agent Systems</title>
        <link>https://news.argyelan.ai/article/safe-bilevel-delegation-sbd-a-formal-framework-for-runtime-d</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/safe-bilevel-delegation-sbd-a-formal-framework-for-runtime-d</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27358v1 Announce Type: new Abstract: As large language model (LLM) agents are deployed in high-stakes environments, the question of how to safely delegate subtasks to specialized sub-agents becomes critical. Existing work addresses multi-agent architecture selection at design time or provides broad empirical guidelines, but neither provides a runtime mechanism that dynamically adjusts the safety-efficiency trade-off as task context changes during execution. We propose Safe Bilevel Delegation (SBD), a formal framework for runtime delegation safety in hierarchical multi-agent systems. SBD formulates task delegation as a bilevel optimization problem: an outer meta-weight network phi learns context-dependent safety-efficiency weights lambda(s) in [0,1]; an inner loop optimizes the delegation policy pi subject to a probabilistic safety constraint P(safe) &gt;= 1-delta. The continuous delegation degree alpha in [0, 1] controls how much decision authority is transferred to each sub-agent, interpolating smoothly between full human override (alpha=0) and fully autonomous execution (alpha=1). We establish three theoretical results: (1) Safety Monotonicity--higher outer safety weight produces a weakly safer inner policy; (2) Inner Policy Convergence--projected gradient descent on the inner problem converges linearly under standard smoothness assumptions; (3) an Accountability Propagation bound that distributes responsibility across multi-hop delegation chains with a provable per-agent ceiling. We instantiate SBD in three high-stakes domains--medical AI (MIMIC-III), financial risk control (S&amp;amp;P 500), and educational agent supervision (ASSISTments)--specifying datasets, safety constraint sets, baselines, and evaluation protocols. This manuscript presents the formal framework and theoretical results in full; empirical validation following the protocols described herein is planned and will be reported in a forthcoming revision.</description>
        <category>agent</category>
        
        <source url="https://arxiv.org/abs/2604.27358">arXiv cs.AI</source>
      </item>
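      <!--
        The continuous delegation degree alpha in the SBD abstract
        interpolates between human override and autonomous execution under
        a probabilistic safety gate. A toy Python sketch of that mechanism;
        the function and its safety estimate are hypothetical, not the
        paper's code.

        def delegate(agent_action, human_action, alpha, p_safe, delta=0.05):
            """alpha=0 -> full human override, alpha=1 -> fully autonomous."""
            if p_safe < 1 - delta:  # safety constraint P(safe) >= 1-delta fails
                alpha = 0.0         # revoke authority, fall back to the human
            return alpha * agent_action + (1 - alpha) * human_action

        # delegate(agent_action=1.0, human_action=0.2, alpha=0.7, p_safe=0.99)
      -->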

      <item>
        <title>TIO-SHACL: Comprehensive SHACL validation for TMF Intent Ontologies</title>
        <link>https://news.argyelan.ai/article/tio-shacl-comprehensive-shacl-validation-for-tmf-intent-onto</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/tio-shacl-comprehensive-shacl-validation-for-tmf-intent-onto</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27359v1 Announce Type: new Abstract: Intent-based networking promises to revolutionize telecommunications network management by enabling operators to specify high-level goals rather than low-level configurations. The TM Forum Intent Ontology (tio) provides a standardized vocabulary for expressing network intents, yet lacks formal validation mechanisms to ensure intent correctness before its admission. We present tio-shacl, the first comprehensive SHACL (Shapes Constraint Language) validation framework for the TMF Intent Ontology. Our contribution includes 56 node shapes and 69 property shapes across all 15 tio v3.6.0 ontology modules, a reusable constraint library with 25 parameterized SPARQL-based constraint components, and novel validation patterns for recursive logical operators, quantity-based constraints, and cross-expectation relationships. We pursued 100% vocabulary coverage (87 classes, 109 properties, 72 functions), cross-implementation compatibility across three major SHACL engines, and validation accuracy on a corpus of 133 test cases. tio-shacl is publicly available under MIT license at https://github.com/EricssonResearch/tio-shacl and enables automated syntactic and semantic validation of network intents, addressing a critical gap in the field.</description>
        <category>research</category>
        
        <source url="https://arxiv.org/abs/2604.27359">arXiv cs.AI</source>
      </item>
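      <!--
        The SHACL validation tio-shacl provides can be run with any
        conformant engine. A minimal sketch using pyshacl, one common
        engine; the file names are placeholders, not paths from the
        tio-shacl repository.

        from pyshacl import validate

        conforms, results_graph, results_text = validate(
            "intent.ttl",                  # data graph: the intent to check
            shacl_graph="tio-shapes.ttl",  # node/property shapes
            inference="rdfs",              # expand subclass/subproperty facts
        )
        print(conforms)                    # True if every shape is satisfied
        if not conforms:
            print(results_text)            # human-readable violation report
      -->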

      <item>
        <title>Measurement Risk in Supervised Financial NLP: Rubric and Metric Sensitivity on JF-ICR</title>
        <link>https://news.argyelan.ai/article/measurement-risk-in-supervised-financial-nlp-rubric-and-metr</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/measurement-risk-in-supervised-financial-nlp-rubric-and-metr</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27374v1 Announce Type: new Abstract: As LLMs become credible readers of earnings calls, investor-relations Q&amp;amp;A, guidance, and disclosure language, supervised financial NLP benchmarks increasingly function as decision evidence for model selection and deployment. A hidden assumption is that gold labels make such evidence objective. This assumption breaks down when the benchmark ruler itself is sensitive to rubric wording, metric choice, or aggregation policy. We study this measurement risk on Japanese Financial Implicit-Commitment Recognition (JF-ICR; a pinned 253-item test split x 4 frontier LLMs x 5 rubrics x 3 temperatures x 5 ordinal metrics). Three findings follow. First, rubric wording materially changes model-assigned labels: R2--R3 agreement ranges from 70.0% to 83.4%, with the dominant movement near the +1 / 0 implicit-commitment boundary. This pattern is consistent with a pragmatic-boundary interpretation, but is not a validated linguistic-causality claim because the present rubric variants confound semantics, examples, and verbosity. Second, not every metric remains informative under the JF-ICR class distribution. Within-one accuracy is too easy because near misses receive credit and the majority class dominates; worst-class accuracy is too noisy because the rarest class has only two examples. Exact accuracy, macro-F1, and weighted kappa are therefore the identifiable metrics under our operational rule. Third, ranking claims become more defensible only after this metric-identifiability audit: Bradley--Terry, Borda, and Ranked Pairs agree on the identifiable metric subset, while the full five-metric sweep produces disagreement on the closest pair. The contribution is not a new leaderboard, but a reporting discipline for supervised financial benchmarks whose gold labels exist and whose evaluation ruler still requires governance.</description>
        <category>research</category>
        
        <source url="https://arxiv.org/abs/2604.27374">arXiv cs.AI</source>
      </item>

      <item>
        <title>Robust Learning on Heterogeneous Graphs with Heterophily: A Graph Structure Learning Approach</title>
        <link>https://news.argyelan.ai/article/robust-learning-on-heterogeneous-graphs-with-heterophily-a-g</link>
        <guid isPermaLink="true">https://news.argyelan.ai/article/robust-learning-on-heterogeneous-graphs-with-heterophily-a-g</guid>
        <pubDate>Fri, 01 May 2026 04:00:00 GMT</pubDate>
        <description>arXiv:2604.27387v1 Announce Type: new Abstract: Heterogeneous graphs with heterophily have emerged as a powerful abstraction for modeling complex real-world systems, where nodes of different types and labels interact in diverse and often non-homophilous ways. Despite recent advances, robust representation learning for such graphs remains largely unexplored, particularly in the presence of noisy or misleading connectivity. In this work, we investigate this problem and identify structural noise as a critical challenge that significantly degrades model performance. To address this issue, we propose a unified framework, Heterogeneous Graph Unified Learning (HGUL), which jointly handles heterophily and noisy graph structures. The framework consists of three complementary modules: a kNN-based graph construction module that recovers reliable local neighborhoods, a graph structure learning module that adaptively refines the adjacency by filtering noisy edges, and a heterogeneous affinity learning module that captures class-level relationships via an extended affinity matrix derived from a polynomial graph kernel. Extensive experiments on multiple datasets demonstrate that HGUL consistently outperforms existing methods on clean graphs and maintains strong robustness under varying levels of structural noise. The results further underscore the importance of jointly modeling heterophily and noise in heterogeneous graph learning.</description>
        <category>research</category>
        
        <source url="https://arxiv.org/abs/2604.27387">arXiv cs.AI</source>
      </item>
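      <!--
        The kNN-based graph construction module in the HGUL abstract is a
        standard building block; a minimal scikit-learn sketch of recovering
        a local-neighborhood graph from node features (generic illustration
        with synthetic features, not the authors' code).

        import numpy as np
        from sklearn.neighbors import kneighbors_graph

        X = np.random.rand(100, 16)          # 100 nodes, 16-dim features
        A = kneighbors_graph(X, n_neighbors=5, mode="connectivity")
        A = A.maximum(A.T)                   # symmetrize the adjacency
      -->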
  </channel>
</rss>