Notes · Research Reading · 2026
Papers
- Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction
- Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
- GenEval 2: Addressing Benchmark Drift in Text-to-Image Evaluation
- Towards Scalable Pre-training of Visual Tokenizers for Generation