parsifalster's documents

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

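The past decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods. Indeed, with appropriate computational scale, many high-dimensional learning tasks previously thought to be beyond reach (such as computer vision, playing games, or protein folding) are in fact feasible. Remarkably, the essence of deep learning is built from two simple algorithmic principles: first, the notion of representation or feature learning, whereby adapted, often hierarchical, features capture the appropriate notion of regularity for each task, and second, learning by local gradient-descent type methods, typically implemented as backpropagation ...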

2025/07/06 arXiv:2104.13478v2 parsifalster

Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought

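Large language models (LLMs) have demonstrated remarkable performance in many applications, including challenging reasoning problems, via chain-of-thought (CoT) techniques that generate "thinking tokens" before answering a question. While existing theoretical work shows that CoTs with discrete tokens boost the capability of LLMs, recent work on continuous CoTs lacks a theoretical understanding of why it outperforms its discrete counterparts on various reasoning tasks such as directed graph reachability, a fundamental graph reasoning problem that includes many practical domain applications as special cases. In this paper, we prove that a two-layer Transformer with $D$ steps of continuous CoTs can solve the directed graph reachability problem, where $D$ is the diameter of the graph, while the best known result for constant-depth Transformers with discrete CoTs requires $O(n^2)$ decoding steps, where $n$ is the number of vertices ($D < n$) ...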

2025/07/06 arXiv:2505.12514v2 parsifalster

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought

On the Turing Completeness of Modern Neural Network Architectures

Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View

Saturated Transformers are Constant-Depth Threshold Circuits

Neural Machine Translation and Sequence-to-sequence Models: A Tutorial

Statistical Physics of Deep Neural Networks: Generalization Capability, Beyond the Infinite Width, and Feature Learning

The Hintons in your Neural Network: a Quantum Field Theory View of Deep Learning

FieldFormer: Self-supervised Reconstruction of Physical Fields via Tensor Attention Prior

An Interpretable Automated Mechanism Design Framework with Large Language Models