Papers
- Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction
- Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
- GenEval 2: Addressing Benchmark Drift in Text-to-Image Evaluation
- Towards Scalable Pre-training of Visual Tokenizers for Generation
- Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition
- The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding
- Meta-RL Induces Exploration in Language Agents
- Qwen3-VL Technical Report
- DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
- Hyper-Connections
- mHC: Manifold-Constrained Hyper-Connections
- Next-Latent Prediction Transformers Learn Compact World Models
- NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
- Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
- Gramian Multimodal Representation Learning and Alignment