Notes · Research Reading · 2026

Papers

  • Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction
  • Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
  • GenEval 2: Addressing Benchmark Drift in Text-to-Image Evaluation
  • Towards Scalable Pre-training of Visual Tokenizers for Generation