Mehmet Onurcan Kaya1,2, Desmond Elliott3,2, Dim P. Papadopoulos1,2
1 Technical University of Denmark 2 Pioneer Center for AI 3 University of Copenhagen
Our framework consists of two main pipelines: (1) Test-Time Augmentation: Given an input image and text prompt, we apply various transformations to create multiple augmented versions. The VLM processes each augmented input to produce next-token probability distributions, which are then aggregated at the token level to generate the final response. (2) Test-Time Adaptation: We create pseudolabels through test-time augmentation, fine-tune the VLM parameters on them, and then repeat the process. Both methods demonstrate effectiveness across nine diverse benchmarks, as shown in (b).
Small Vision-Language Models (VLMs) provide a computationally efficient alternative to larger models, at the cost of weaker generalization abilities and downstream task performance. These shortcomings could be addressed by test-time scaling techniques, but existing methods are typically computationally demanding, contradicting the resource-efficient design goals of small models. To address these limitations, we propose two novel and efficient test-time scaling strategies that leverage the model-internal features rather than external supervision: (i) Test-Time Augmentation (TTAug), which generates multiple augmented inputs and aggregates outputs at the token level without parameter updates, and (ii) Test-Time Adaptation (TTAdapt), which adapts model parameters during inference using consensus-based pseudolabels from TTAug. Through extensive experiments across nine benchmarks, we demonstrate consistent performance improvements while maintaining computational efficiency suitable for resource-constrained environments. The generality of our approach is demonstrated both within models at different scales and across different VLMs without additional tuning.
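To make the token-level aggregation in TTAug concrete, below is a minimal sketch. It is purely illustrative, not the actual implementation (which lives in `vlmeval/vlm/tta`): the callable `vlm_next_token_logits(view, prompt, generated)`, the `augment_fns` list, and the greedy decoding loop are assumptions introduced here for clarity.

```python
# Hypothetical sketch of TTAug-style token-level aggregation.
# Assumes `vlm_next_token_logits(view, prompt, generated)` returns a 1-D tensor
# of next-token logits for a single augmented view; not the repository's API.
import torch

def ttaug_generate(image, prompt, augment_fns, vlm_next_token_logits,
                   eos_token_id, max_new_tokens=64):
    """Greedy decoding where each step averages the next-token distributions
    computed from all augmented views of the same input."""
    views = [aug(image) for aug in augment_fns]  # e.g. identity, flips, crops
    generated = []
    for _ in range(max_new_tokens):
        # One forward pass per augmented view, conditioned on the same prompt
        # and the tokens generated so far.
        probs = [
            torch.softmax(vlm_next_token_logits(v, prompt, generated), dim=-1)
            for v in views
        ]
        avg_probs = torch.stack(probs, dim=0).mean(dim=0)  # token-level aggregation
        next_token = int(avg_probs.argmax(dim=-1))
        if next_token == eos_token_id:
            break
        generated.append(next_token)
    return generated
```

Under this reading, TTAdapt would reuse the same aggregated outputs as consensus pseudolabels for lightweight fine-tuning at inference time; see the paper and `vlmeval/vlm/tta` for the actual procedure.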
```bash
git clone https://github.com/monurcan/efficient_test_time_scaling.git
cd efficient_test_time_scaling
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt --no-deps
pip install -e . --no-deps
```

Note that the code has been tested with Python 3.10.12 and CUDA 12.5.
```bash
bash scripts/benchmark.sh benchmark_configs/test_config.json
```

This will execute the experiment configuration defined in `benchmark_configs/test_config.json`.
For customizing experiments, refer to the configuration system documentation: `docs/en/ConfigSystem.md`

Results will be automatically saved to the `benchmark_results` directory, as specified in `scripts/benchmark.sh`.
The core logic of our methods is located in `vlmeval/vlm/tta`.

Utility scripts for analysis and visualization are available in `scripts`:

- `figure_create.ipynb` – Figure generation; saves figures to the `benchmark_visualizations` directory
- `table_create.ipynb` – Results table generation
This project builds upon VLMEvalKit.
For more details, refer to `README_VLMEVALKIT.md`.
@article{Kaya2025EfficientTTS,
title={Efficient Test-Time Scaling for Small Vision-Language Models},
author={Mehmet Onurcan Kaya and Desmond Elliott and Dim P. Papadopoulos},
journal={arXiv preprint arXiv:2510.03574},
year={2025},
url={https://monurcan.github.io/efficient_test_time_scaling}
}
For questions, please open an issue or contact me at [email protected]
