Papers
- Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction
- Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
- GenEval 2: Addressing Benchmark Drift in Text-to-Image Evaluation
- Towards Scalable Pre-training of Visual Tokenizers for Generation
- Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition
- The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding
- Meta-RL Induces Exploration in Language Agents
- Qwen3-VL Technical Report
- DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
- Hyper-Connections
- mHC: Manifold-Constrained Hyper-Connections
- Next-Latent Prediction Transformers Learn Compact World Models
- NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
- Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
- Gramian Multimodal Representation Learning and Alignment