---
license: other
license_name: sequential-hidden-decoding
license_link: LICENSE
base_model:
- Qwen/Qwen3-8B-Base
tags:
- sequential-hidden-decoding
- pretrained
- base-model
---

# Sequential-Hidden-Decoding-8B-n4

This is the **n=4** variant of Sequential Hidden Decoding, a method that scales sequence length by n× with only additional Embedding parameters: same Transformer, more compute per token.

- **Base model:** [Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base)
- **Scale:** 4×
- **Additional Embedding Params:** 3.1B
- **Training Tokens:** 150B
- **Dtype:** bfloat16

> **Note:** This is a **base model** (not instruction-tuned). It is intended for benchmarking, text completion, and as a foundation for downstream fine-tuning (SFT / RLHF). For conversational or instruction-following use cases, please fine-tune on your own data.

## Key Idea

Prepare *n* independent Embedding matrices to encode the same token sequence *n* times, interleave the results, and feed the *n*×-length sequence into the same Transformer. Only the last embedding of each token computes the next-token loss, while the preceding embeddings serve as implicit reasoning steps in a continuous latent space.

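
The interleaving described above can be illustrated with a small pure-Python sketch. It is not the released implementation: the helper names (`make_tables`, `interleave`, `loss_positions`), the random initialisation, and the token-major ordering (each token's *n* embeddings placed next to each other) are assumptions made for illustration only.

```python
import random


def make_tables(n, vocab_size, dim, seed=0):
    """Build n independent embedding tables (hypothetical random init)."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]
        for _ in range(n)
    ]


def interleave(token_ids, tables):
    """Embed every token once per table and interleave the results.

    Assumed layout: token t occupies positions n*t .. n*t + n - 1
    of the expanded sequence, where n = len(tables).
    """
    return [table[tok] for tok in token_ids for table in tables]


def loss_positions(num_tokens, n):
    """Indices of the last embedding of each token, i.e. the only
    positions that would contribute to the next-token loss."""
    return [n * t + (n - 1) for t in range(num_tokens)]


n, vocab_size, dim = 4, 10, 3
tables = make_tables(n, vocab_size, dim)
token_ids = [7, 2, 5]

expanded = interleave(token_ids, tables)
print(len(expanded))                      # 12, i.e. n * len(token_ids)
print(loss_positions(len(token_ids), n))  # [3, 7, 11]
```

In this sketch the Transformer would see 12 input vectors for 3 tokens, and only positions 3, 7, and 11 would be trained to predict the next token.
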
## Results

Scores for this model (n=4) are shown in bold.

| Benchmark | # Shots | 8B Baseline | 8B scale n=2 | 8B scale n=4 | 8B scale n=8 |
|-----------|:-------:|:-----------:|:------------:|:------------:|:------------:|
| BBH (EM) | 3-shot | 78.8 | 81.3 | **83.0** | 83.9 |
| MMLU (EM) | 5-shot | 79.8 | 80.9 | **81.9** | 82.2 |
| MBPP+ (Pass@1) | 1-shot | 66.7 | 69.4 | **68.7** | 69.4 |
| MATH (LLM-judge) | 4-shot | 56.0 | 58.2 | **60.0** | 61.1 |
| ARC-C | 25-shot | 93.9 | 94.3 | **94.4** | 94.7 |
| Hellaswag | 10-shot | 79.7 | 83.1 | **85.0** | 85.3 |
| GSM8K | 4-shot | 92.5 | 93.3 | **93.9** | 94.6 |

## Serving (SGLang)

This model requires a patched version of [SGLang](https://github.com/sgl-project/sglang) for inference. See the [project page](https://github.com/Tencent/Sequential-Hidden-Decoding) for installation options (Docker image, forked repo, or manual patch).

```bash
python -m sglang.launch_server \
    --model-path tencent/Sequential-Hidden-Decoding-8B-n4 \
    --trust-remote-code \
    --tp-size 1 \
    --port 30000 --host 0.0.0.0 \
    --chunked-prefill-size -1 \
    --attention-backend fa3 \
    --mem-fraction-static 0.82 \
    --max-running-requests 32 \
    --context-length 131072 \
    --cuda-graph-max-bs 128 \
    --cuda-graph-bs 1 2 4 8 16 32 64 128
```

```python
from openai import OpenAI

# Point the OpenAI client at the local SGLang server started above.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

# Base model: use the plain completions endpoint, not chat.
response = client.completions.create(
    model="tencent/Sequential-Hidden-Decoding-8B-n4",
    prompt="The meaning of life is",
    max_tokens=128,
    temperature=0,
)
print(response.choices[0].text)
```

## All Models

| Model | Scale | Embedding Params | Training Tokens |
|-------|:-----:|:----------------:|:---------------:|
| [Sequential-Hidden-Decoding-8B-n2](https://huggingface.co/tencent/Sequential-Hidden-Decoding-8B-n2) | 2× | 1.9B | 75B |
| [Sequential-Hidden-Decoding-8B-n4](https://huggingface.co/tencent/Sequential-Hidden-Decoding-8B-n4) | 4× | 3.1B | 150B |
| [Sequential-Hidden-Decoding-8B-n8](https://huggingface.co/tencent/Sequential-Hidden-Decoding-8B-n8) | 8× | 5.6B | 187B |

## Citation

```bibtex
@article{hidden_decoding_2026,
  title = {Hidden Decoding: Scaling Sequence Length in Pretraining},
  year  = {2026},
  url   = {https://welm.weixin.qq.com/posts/hidden_decoding/}
}
```

## License

This model is released under the [License Terms of Sequential-Hidden-Decoding](LICENSE).