I'm a research scientist at Sakana AI. Before that, I was a research scientist at Google DeepMind (formerly Google Brain) based in Tokyo. I received my PhD in Computer Science from the University of Tokyo, my M.S. from Waseda University, and my B.S. from Shanghai Jiao Tong University. My research interests are in reinforcement learning, robotics, evolutionary algorithms, and generative models.
We introduce a finetuning method that enables large language models to scale test-time compute using the diffusion framework. Accuracy improves with more diffusion steps, and the model can adaptively allocate compute via ODE solvers and guided generation. Our method applies to any model trained with cross-entropy, leaves the original weights untouched, complements standard finetuning and search, and bridges the autoregressive and diffusion paradigms.
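The step-count/accuracy tradeoff at the heart of this idea can be illustrated with a toy fixed-step ODE solver: spending more solver steps buys a more accurate answer, which is the same knob the diffusion framework exposes at test time. This is only an analogy sketch, not the paper's method; `euler_solve` and the example ODE are my own illustrative choices.

```python
import numpy as np

def euler_solve(f, x0, t1, steps):
    """Fixed-step Euler integration: more steps = more compute, less error."""
    x, dt = x0, t1 / steps
    for _ in range(steps):
        x = x + dt * f(x)
    return x

# Toy ODE dx/dt = -x with exact solution x(1) = exp(-1) * x0.
x0 = 1.0
exact = np.exp(-1.0)
coarse = euler_solve(lambda x: -x, x0, 1.0, 4)    # few steps, cheap, coarse
fine = euler_solve(lambda x: -x, x0, 1.0, 64)     # many steps, costly, accurate
assert abs(fine - exact) < abs(coarse - exact)
```

The same dial appears in diffusion-based generation: a coarse solve is fast but approximate, and the model can spend extra steps only where the task warrants it.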
We present ASAL, the first method to apply vision-language foundation models to Artificial Life (ALife). ASAL automates the discovery of lifelike simulations by finding target behaviors, generating open-ended novelty, and illuminating diverse phenomena. It works across multiple ALife substrates—Boids, Lenia, Game of Life, and more—and has led to the discovery of previously unseen lifeforms. By enabling human-aligned, scalable exploration, ASAL introduces a powerful new paradigm for accelerating ALife research beyond manual trial-and-error.
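The core search primitive behind this kind of foundation-model-guided discovery can be sketched as ranking candidate simulations by how well their rendered frames match a text prompt in a shared embedding space (as a CLIP-style vision-language model would provide). The sketch below stubs out the embeddings as plain vectors; `score_simulations` and the cosine scoring are illustrative assumptions, not ASAL's actual implementation.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def score_simulations(sim_embeddings, target_embedding):
    """Rank candidate simulations by how closely their rendered frames
    (embedded by a vision-language model) match a target text embedding.
    Returns indices sorted from best match to worst."""
    return sorted(
        range(len(sim_embeddings)),
        key=lambda i: cosine(sim_embeddings[i], target_embedding),
        reverse=True,
    )

# Two stub "frame embeddings" and a target prompt embedding.
frames = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
target = np.array([0.0, 1.0])
best_first = score_simulations(frames, target)  # simulation 1 matches best
```

An outer evolutionary or search loop can then mutate simulation parameters and keep the highest-scoring candidates, replacing manual trial-and-error with human-aligned automatic evaluation.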
We present Transformer², a self-adaptive framework that enables large language models to handle unseen tasks in real time by adjusting only the singular components of their weight matrices. Using a two-pass inference process with a task dispatcher and RL-trained expert vectors, Transformer² outperforms methods like LoRA with fewer parameters and greater efficiency. It generalizes across architectures and modalities, offering a scalable path toward dynamic, self-organizing AI systems.
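Adjusting only the singular components of a weight matrix can be sketched in a few lines: decompose W = U diag(S) Vᵀ, then rescale the singular values S by an expert vector while the singular vectors U and Vᵀ stay frozen. In Transformer² the expert vectors are trained with RL and selected by a dispatcher; here `apply_expert` and the hand-set vector `z` are a minimal illustrative stand-in.

```python
import numpy as np

def apply_expert(W, z):
    """Rescale only the singular values of W by the expert vector z,
    leaving the singular vectors (U, Vt) untouched."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(S * z) @ Vt

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))

z = np.ones(4)                  # identity expert: reconstructs W exactly
W_same = apply_expert(W, z)

z_task = np.array([1.2, 1.0, 0.8, 0.5])  # hypothetical task-specific expert
W_adapted = apply_expert(W, z_task)
```

Because only one scalar per singular value is learned, each expert is tiny compared to a LoRA adapter, which is where the parameter-efficiency claim comes from.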
We introduce CycleQD, a skill-focused training method for language models that applies Quality Diversity with cyclic task adaptation, model merging–based crossover, and SVD-based mutation. By rotating the focus across tasks, CycleQD avoids data imbalance issues and simplifies objective design. Applied to LLAMA3-8B-INSTRUCT, it outperforms traditional fine-tuning on coding, OS, and database tasks, matching GPT-3.5-Turbo’s performance while preserving strong language abilities. The method is general and extends beyond language to domains like image segmentation.
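The two genetic operators named above can be sketched directly on weight matrices: model-merging crossover as a linear interpolation of two parents' weights, and SVD-based mutation as a perturbation of singular values only. The function names and the Gaussian perturbation scheme below are illustrative assumptions, not CycleQD's exact operators.

```python
import numpy as np

def crossover(W_a, W_b, t=0.5):
    """Model-merging crossover: interpolate two parents' weights."""
    return (1 - t) * W_a + t * W_b

def svd_mutation(W, sigma=0.1, rng=None):
    """SVD-based mutation: jitter the singular values of W while
    keeping the singular vectors fixed."""
    rng = rng or np.random.default_rng()
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    S_new = S * (1.0 + sigma * rng.normal(size=S.shape))
    return U @ np.diag(S_new) @ Vt
```

A Quality Diversity loop would apply these operators per layer, cycling which task defines fitness each generation while the other tasks act as behavior descriptors.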
We introduce Neural Attention Memory Models (NAMMs), a learned memory management system that enhances both the efficiency and performance of transformers by selectively focusing on relevant context. Unlike prior rule-based methods, NAMMs evolve atop pre-trained models and condition only on attention matrices, making them broadly applicable. Trained on a small set of tasks, NAMMs improve performance across long-context benchmarks while drastically reducing input size, and they generalize zero-shot across architectures and modalities—including vision and reinforcement learning.
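Conditioning memory management only on attention matrices can be illustrated with a much-simplified stand-in for the learned scorer: score each cached token by the attention it receives and evict the rest. NAMMs evolve a network to produce these scores; the sum-of-attention heuristic and `prune_kv_cache` below are my own simplification for illustration.

```python
import numpy as np

def prune_kv_cache(attn, keep):
    """Toy memory manager: score each cached token (key) by the total
    attention it receives across queries, and return the indices of the
    top-`keep` tokens to retain, in original order."""
    scores = attn.sum(axis=0)                    # one score per cached token
    keep_idx = np.sort(np.argsort(scores)[-keep:])
    return keep_idx

# Rows = queries, columns = cached tokens (keys).
attn = np.array([[0.1, 0.8, 0.1],
                 [0.2, 0.7, 0.1]])
retained = prune_kv_cache(attn, keep=2)  # drops the least-attended token
```

Because the input is just an attention matrix, the same manager can be dropped onto any transformer, which is what enables the zero-shot transfer across architectures and modalities.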