Low-Bit-width-Large-Model-Quantization-Challenge

The IEEE International Conference on Multimedia & Expo (ICME) 2026 Low Bit-width Large Model Quantization Challenge (also referred to as the Large Model Low-Precision Quantization Challenge)

This challenge is an official challenge track of the
IEEE International Conference on Multimedia & Expo (ICME) 2026.

Note: This page provides a brief overview of the challenge.
The simulation tools, benchmarks, and related resources used in this challenge will be released progressively.
Please follow this page for the latest updates and announcements.


🔹 Quick Start


1. Challenge Description

Challenge Details

The challenge centers on Text-to-Image and Text-to-Video generation tasks and establishes two primary research directions focused on low-precision computation for large language models (LLMs).

Direction A – Quantization-Aware Training (QAT).
Participants are required to use specified public datasets and pre-trained models to perform quantization-aware fine-tuning using the HiFloat8 (HiF8) numerical format [1]. The objective is to optimize model accuracy on downstream tasks while reducing training and computation costs.

Direction B – Post-Training Quantization (PTQ).
Participants are required to apply inference-time quantization directly to pre-trained models to achieve model compression and acceleration. Low-precision formats such as HiFloat4 (HiF4) [2] or MXFP4 [3] are used in this direction.
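For intuition on how such 4-bit formats behave, MXFP4 [3] groups elements into 32-wide blocks that share one power-of-two scale, with each element stored as FP4 (E2M1). A minimal numpy simulation of that rounding is sketched below; HiF4 differs in its encoding (see [2] and the official toolkit), and this sketch is not the challenge's reference implementation.

```python
import numpy as np

# Representable FP4 (E2M1) magnitudes
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_mxfp4_block(block):
    """Quantize one 32-element block to simulated MXFP4.

    The shared scale is a power of two (E8M0-style), chosen so the
    block's max magnitude lands near the top of the FP4 range.
    """
    amax = np.max(np.abs(block))
    if amax == 0:
        return np.zeros_like(block)
    # emax of E2M1 is 2 (largest value 6.0 = 1.5 * 2^2)
    scale = 2.0 ** (np.floor(np.log2(amax)) - 2)
    scaled = block / scale
    # Round each element to the nearest representable FP4 magnitude
    idx = np.argmin(np.abs(np.abs(scaled)[:, None] - FP4_GRID[None, :]), axis=1)
    return np.sign(scaled) * FP4_GRID[idx] * scale

def fake_quant_mxfp4(x, block_size=32):
    """Fake-quantize a 1-D tensor block by block (padding the tail)."""
    pad = (-len(x)) % block_size
    xp = np.pad(x, (0, pad))
    out = np.concatenate([
        quantize_mxfp4_block(xp[i:i + block_size])
        for i in range(0, len(xp), block_size)
    ])
    return out[:len(x)]
```

Applying `fake_quant_mxfp4` to the weights and inputs of each linear layer simulates W4A4 numerics while keeping the computation in high precision.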

Based on these two research directions, two sub-challenges are set up. Each sub-challenge adheres to the unified principles stipulated in the Evaluation Criteria section.

Each participant may choose only one sub-challenge to participate in.


Sub-Challenge 1: W4A4 Quantization for Inference (HiF4 / MXFP4)

LLM inference incurs significant deployment costs, which often constrain broad application. Quantizing the weights and activations of the linear layers in LLMs effectively improves inference performance.

This sub-challenge focuses on 4-bit weight and activation quantization (W4A4), restricted to either the HiF4 or MXFP4 numerical format. Participants are required to develop and apply quantization strategies to the open-source, state-of-the-art multimodal generative model Wan 2.2. We will use the comprehensive OpenS2V-5M dataset and the associated VBench [5] metrics to rank submissions.

Participants are allowed to keep a limited number of Transformer blocks in high precision: a maximum of 5 layers for MXFP4 and 2 layers for HiF4.

Mini-Challenge for W4A4 Quantization

To promote and encourage research into low-precision data formats for quantization, a Mini-Challenge is established under this sub-challenge. This track will not be formally ranked against the main competition leaderboards. However, submissions to this track will be collectively eligible for consideration in the Innovation Award category.

The reference model for this Mini-Challenge is Pangu-72B-2512. Evaluation is conducted on standard downstream task datasets, using the mean absolute percentage precision loss as the final metric.
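Assuming "mean absolute percentage precision loss" means the per-dataset absolute accuracy gap relative to the BF16 baseline, averaged over datasets, the metric can be sketched as follows (the dataset names and helper are hypothetical):

```python
def mean_abs_pct_precision_loss(bf16_scores, quant_scores):
    """Mean absolute percentage precision loss across datasets.

    Both inputs map dataset name -> accuracy. Interpretation
    (assumption): |bf16 - quant| / bf16 per dataset, in percent,
    averaged over all datasets.
    """
    losses = [abs(bf16_scores[k] - quant_scores[k]) / bf16_scores[k] * 100.0
              for k in bf16_scores]
    return sum(losses) / len(losses)
```

For example, dropping from 80.0 to 79.2 on one dataset and from 60.0 to 59.4 on another both amount to a 1% relative loss, giving a mean of 1%.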

The objective is to achieve a W4A4 inference average precision loss of no more than 1% relative to the BF16 baseline on the following datasets:

Only submissions that satisfy this objective will qualify for evaluation for the Innovation Award.


Sub-Challenge 2: W8A8 Quantization for Training (HiF8)

LLM training incurs high costs and lengthy iteration cycles, limiting rapid development. Quantizing the weights and activations of linear layers, and utilizing low-precision formats within attention layers, can effectively reduce data movement costs and accelerate training via low-precision computation.

This task focuses on 8-bit weight and activation quantization and attention quantization, strictly limited to the HiF8 numerical format. The test model is Wan2.1 T2V-1.3B.

Participants are encouraged to employ delayed scaling strategies[4] wherever possible to further minimize quantization overhead and accelerate training.
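Delayed scaling, as used in FP8-LM [4], derives the quantization scale for the current step from the amax (absolute-maximum) history of previous steps, so the expensive max reduction over the current tensor stays off the critical path. A minimal sketch, assuming an E4M3-style maximum magnitude of 448 (HiF8's actual range is defined in [1]):

```python
from collections import deque
import numpy as np

class DelayedScaler:
    """Delayed scaling: the scale applied at step t comes only from
    the amax history of earlier steps, never from the current tensor."""

    def __init__(self, history_len=16, fmt_max=448.0):
        self.amax_history = deque(maxlen=history_len)
        self.fmt_max = fmt_max  # format max magnitude (E4M3-style assumption)
        self.scale = 1.0

    def update(self, tensor):
        # Record the current tensor's amax for use in *future* steps.
        self.amax_history.append(float(np.max(np.abs(tensor))))

    def current_scale(self):
        # The scale used now is based on history only (delayed).
        if self.amax_history:
            self.scale = self.fmt_max / max(self.amax_history)
        return self.scale
```

Typical usage per training step: read `current_scale()`, quantize the scaled tensor, then call `update()` with the unscaled tensor so the next steps see its amax.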

The training methodology will be evaluated based on the video generation quality of the resulting trained model, using the BestWishYsh/OpenS2V-5M dataset.

The VBench evaluation metric will be used, with a target precision loss of less than 0.5%.

Participants are allowed to keep a limited number of Transformer blocks in high precision: up to 5 layers when training with HiF8.

Mini-Challenge for W8A8 Quantization

A Mini-Challenge is also established under this sub-challenge to encourage further exploration of low-precision training methodologies. This track will not be formally ranked on the main competition leaderboard, but eligible submissions will be considered for the Innovation Award.

The test model for this Mini-Challenge is OpenPangu1B. Participants are encouraged to employ delayed scaling strategies to reduce quantization overhead and accelerate training.

Evaluation will be conducted on a set of language model benchmarks, including but not limited to:

The target objectives are:

Only submissions that satisfy both objectives will qualify for evaluation for the Innovation Award.


2. Participation and Registration

🔗 Registration Form:
Register for the Challenge

Alternative Registration Form (use this link only if the first registration link does not open)

For registration inquiries:
📧 zhaoy21@tsinghua.org.cn

Participation Terms


3. Evaluation Criteria

All submissions will be evaluated based on the following primary criteria, totaling 100%.
The evaluation is divided into two main categories: Objective Evaluation and Subjective Evaluation.

I. Objective Evaluation (70% of total score)

This section utilizes quantitative metrics to measure model performance.
The final score is calculated from comparative rankings: the objective score is determined by percentile rank in the following components.

1. Baseline Requirement

This metric measures the discrepancy between the model output and the target data. Lower loss indicates more accurate model performance.

Requirement:

Only submissions meeting this baseline requirement will proceed to the ranking phase.

2. Quality & Diversity Metrics: VBench (50%)

3. Quantitative Proportions (20%)

Percentile Rank | Core Quality Score (Weight: 50%) | Quantitative Ratio Score (Weight: 20%)
Top 10%         | 50                               | 20
Top 11% – 25%   | 45                               | 18
Top 26% – 50%   | 35                               | 14
Top 51% – 75%   | 20                               | 8
Bottom 25%      | 10                               | 4
Unqualified     | 0                                | 0
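The table above maps directly to a lookup; the sketch below assumes percentile 0 is best and `None` marks an unqualified submission (both conventions are assumptions, not stated rules):

```python
def objective_score(percentile):
    """Map a submission's percentile rank (0 = best, assumption) to
    the (core quality, quantitative ratio) score pair per the table.
    `None` denotes an unqualified submission."""
    if percentile is None:
        return 0, 0
    if percentile <= 10:
        return 50, 20
    if percentile <= 25:
        return 45, 18
    if percentile <= 50:
        return 35, 14
    if percentile <= 75:
        return 20, 8
    return 10, 4
```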

II. Subjective Evaluation (30% of total score)

This section involves perceptual quality scoring by a panel of human judges, consisting of technical experts and university student volunteers.

1. Realism and Clarity (20%)

2. Innovation (10%)


4. Important Dates

Phase        | Description                           | Date
Registration | Challenge registration period         | Feb 10, 2026 – Apr 10, 2026
Submission   | Result submission window              | Mar 15, 2026 – Apr 20, 2026 (Tentative)
Review       | Review & reproducibility verification | Apr 15, 2026 – Apr 23, 2026 (Tentative)
Announcement | Final results announcement            | Apr 27, 2026 (Tentative)
Camera-Ready | Camera-ready paper deadline           | May 15, 2026

5. Submission

📧 Submission Email: zhaoy21@tsinghua.org.cn

Each submission must be sent as a single archive (.zip / .tar.gz) and include:

Please note:

For details, please refer to the Awards section.


6. Code, Tools, and Datasets

The HiFloat8/4 (HiF8/4) simulation toolkit used in this challenge has been officially released.

Participants are required to download and use the HiF8/4 simulation tools from the following repository:

👉 HiFloat8 Simulation Toolkit (GitHub)

👉 HiFloat4 Simulation Toolkit (GitHub)

Detailed deployment instructions, environment configuration, and usage examples are provided in the README file of the repository.


7. Awards

Awards are established for this challenge to recognize both overall performance and technical innovation.

The First, Second, and Third Prizes are awarded separately for each sub-challenge, based on the total evaluation score within that sub-challenge.
The Innovation Awards are evaluated across all sub-challenges, including Mini-Challenge submissions, and are judged solely on originality and technical breakthrough.

Mandatory Open-Source Requirement

All award-winning teams are required to open-source their final submission, including source code, materials, and team information, on GitCode.

This is a mandatory condition for receiving any award. Failure to comply within 7 working days after the award announcement may result in disqualification and revocation of the award.

Alternatively, teams may voluntarily forfeit their awards if they choose not to comply with the open-source requirement.


8. Official Contact

📧 zhaoy21@tsinghua.org.cn


9. Organizers


References

[1] Luo Y., Zhang Z., Wu R., et al. Ascend HiFloat8 Format for Deep Learning. arXiv preprint, arXiv:2409.16626, 2024.

[2] Luo Y., Huang J., Cheng Y., et al. HiFloat4 Format for Language Model Inference. arXiv preprint, arXiv:2602.11287, 2026.

[3] Rouhani B. D., Zhao R., More A., et al. Microscaling Data Formats for Deep Learning. arXiv preprint, arXiv:2310.10537, 2023.

[4] Peng H., Wu K., Wei Y., et al. FP8-LM: Training FP8 Large Language Models. arXiv preprint, arXiv:2310.18313, 2023.

[5] Huang Z., He Y., Yu J., et al. VBench: Comprehensive Benchmark Suite for Video Generative Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 21807–21818.

Note: Baseline code, simulation tools, datasets, and evaluation scripts will be released in stages. Please watch the GitCode repository for updates.