arXiv Documents

Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from Sparse Uncalibrated Images

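Photo-realistic scene reconstruction from sparse-view, uncalibrated images is in high demand in practice. Although some progress has been made, existing methods either handle sparse views but require accurate camera parameters (i.e. ...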

0 0 0 0 2025/08/01 arXiv:2412.19518v1 wonglliam

InstantSplat: Sparse-view Gaussian Splatting in Seconds

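While novel view synthesis (NVS) from a sparse set of images has advanced significantly in 3D computer vision, it relies on precise estimation of camera parameters using Structure-from-Motion (SfM). For instance, the recently developed Gaussian Splatting depends heavily on the accuracy of SfM-derived points and poses. However, SfM processes are time-consuming and often unreliable in sparse-view scenarios, where matched features are scarce, leading to accumulated errors and limited generalization across datasets... ...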

0 0 0 0 2025/08/01 arXiv:2403.20309v6 wonglliam

GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking

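4D video control is essential in video generation because it enables sophisticated lens techniques, such as multi-camera shooting and dolly zoom, which existing methods do not currently support. Training a video Diffusion Transformer (DiT) directly to control 4D content requires expensive multi-view videos. Inspired by Monocular Dynamic novel View Synthesis (MDVS), which optimizes a 4D representation and renders videos according to different 4D elements such as camera pose and object motion editing, we bring pseudo 4D Gaussian fields to video generation ...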

0 0 0 0 2025/08/01 arXiv:2501.02690v1 小小卡拉米

StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter

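Text-to-video (T2V) models have shown remarkable capabilities in generating diverse videos. However, they struggle to produce user-desired stylized videos because of (i) the inherent clumsiness of text in expressing specific styles and (ii) generally degraded style fidelity. To address these challenges, we introduce StyleCrafter, a generic method that enhances pre-trained T2V models with a style control adapter, enabling video generation in any style by providing a reference image ...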

0 0 0 0 2025/08/01 arXiv:2312.00330v2 Abidalswark

TrajFlow: Multi-modal Motion Prediction via Flow Matching

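Efficient and accurate motion prediction is crucial for ensuring safety and informed decision-making in autonomous driving, particularly in dynamic real-world conditions that require multi-modal forecasts. We introduce TrajFlow, a novel flow matching-based motion prediction framework that addresses the scalability and efficiency challenges of existing generative trajectory prediction methods. Unlike conventional generative approaches that employ i ...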

0 0 0 0 2025/08/01 arXiv:2506.08541v2 zhlstone

The Evolving Role of Large Language Models in Scientific Innovation: Evaluator, Collaborator, and Scientist

Scientific innovation is undergoing a paradigm shift driven by the rapid advancement of Large Language Models (LLMs). As science faces mounting challenges including information overload, disciplinary silos, and diminishing returns on conventional research methods, LLMs are emerging as powerful agents capable not only of enhancing scientific workflows but also of participating in and potentially leading the innovation process. Existing surveys mainly focus on different perspectives, phrases, and tasks in scientific research and discovery, while they have limitations in understanding the transformative potential and role differentiation of LLM.

0 0 0 0 2025/08/01 arXiv:2507.11810v1 kkkk

A Survey on Memory-Efficient Transformer-Based Model Training in AI for Science

Deep Graph Anomaly Detection: A Survey and New Perspectives

ICPC-Eval: Probing the Frontiers of LLM Reasoning with Competitive Programming Contests

One Object, Multiple Lies: A Benchmark for Cross-task Adversarial Attack on Unified Vision-Language Models
