
Agentica and Together AI are releasing DeepCoder-14B-Preview, an open-source model boasting coding abilities on par with proprietary models.
The development marks a notable advance in applying reinforcement learning (RL) to coding, an area that has traditionally lagged behind the progress seen in mathematical reasoning models. The key challenge has been sourcing the high-quality, verifiable coding datasets essential for effective RL training.
Overcoming data hurdles for open-source AI coding models
The teams behind DeepCoder addressed the data scarcity issue head-on. Existing datasets such as APPS, TACO, CodeContests, KodCode, and LeetCode were evaluated, but many proved too simple, too noisy, or lacking in reliable test cases.
To build a robust training set, Agentica and Together AI curated 24,000 high-quality, verifiable coding problems. This involved combining verified problems from TACO and PrimeIntellect’s SYNTHETIC-1 dataset with specific LiveCodeBench problems.
A rigorous filtering pipeline ensured that each problem had an official solution verified to pass all of its unit tests, and that it included at least five unit tests to prevent 'reward hacking', where models learn to game simple tests. Crucially, deduplication across datasets and checks against evaluation test sets prevented contamination.
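In outline, such a filter might look like the sketch below. The problem schema and the run_test helper are assumptions made for illustration, not the authors' actual pipeline code.

```python
# A minimal sketch of the filtering rules described above. It assumes
# each problem is a dict with "prompt", "solution", and "tests" keys,
# plus a caller-supplied run_test() callable; all names here are
# illustrative, not the project's real code.
from typing import Callable

def filter_problems(problems: list[dict],
                    run_test: Callable[[str, dict], bool],
                    min_tests: int = 5) -> list[dict]:
    kept, seen_prompts = [], set()
    for p in problems:
        # Deduplicate across source datasets by prompt text.
        if p["prompt"] in seen_prompts:
            continue
        # Require at least five unit tests to discourage reward hacking.
        if len(p["tests"]) < min_tests:
            continue
        # Require an official solution that passes every unit test.
        if not all(run_test(p["solution"], t) for t in p["tests"]):
            continue
        seen_prompts.add(p["prompt"])
        kept.append(p)
    return kept
```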
Training innovations and performance
DeepCoder-14B-Preview was trained over 2.5 weeks on 32 H100 GPUs. The training employed a sparse Outcome Reward Model (ORM), assigning reward only if the generated code passed all sampled unit tests, avoiding issues seen with partial rewards. Test cases were sampled carefully, focusing on the 15 most challenging ones per problem, with strict timeouts.
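A sparse outcome reward of this kind can be expressed in a few lines. In the sketch below, the "most challenging" heuristic, the execution helper, and the timeout value are assumptions for illustration rather than the project's exact settings.

```python
# Sparse outcome reward: 1.0 only when the program passes every sampled
# test, otherwise 0.0, with no partial credit for passing a subset.
from typing import Callable

def outcome_reward(program: str,
                   tests: list[dict],
                   run_test: Callable[..., bool],
                   max_tests: int = 15,
                   timeout_s: float = 6.0) -> float:
    # Keep up to the 15 most challenging tests; here "challenging" is
    # approximated by input size, purely for illustration.
    sampled = sorted(tests, key=lambda t: len(str(t["input"])),
                     reverse=True)[:max_tests]
    # All sampled tests must pass within the (strict) timeout.
    return 1.0 if all(run_test(program, t, timeout=timeout_s)
                      for t in sampled) else 0.0
```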
The team enhanced the GRPO reinforcement learning algorithm (termed GRPO+) by integrating insights from DAPO, resulting in more stable training.
Key modifications included removing the entropy and KL loss terms and implementing 'overlong filtering' and 'clip high' techniques. Overlong filtering, which masks the loss on sequences truncated by the context limit, proved vital: it lets the model preserve long-context reasoning even when trained on shorter contexts.
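As a rough illustration of those changes, here is a PyTorch-style sketch of a GRPO+-like surrogate objective with asymmetric clipping and overlong-sequence masking. The tensor shapes and clipping values are assumptions, and there is deliberately no KL or entropy term.

```python
import torch

def grpo_plus_loss(logp_new: torch.Tensor,    # [batch, seq] new-policy log-probs
                   logp_old: torch.Tensor,    # [batch, seq] sampling-policy log-probs
                   advantages: torch.Tensor,  # [batch, 1] group-normalised advantages
                   truncated: torch.Tensor,   # [batch] bool, True if cut at max length
                   eps_low: float = 0.2,
                   eps_high: float = 0.28) -> torch.Tensor:
    ratio = torch.exp(logp_new - logp_old)
    # "Clip high": a looser upper bound (eps_high > eps_low) lets
    # positive-advantage tokens be reinforced more, aiding exploration.
    surrogate = torch.minimum(
        ratio * advantages,
        torch.clamp(ratio, 1 - eps_low, 1 + eps_high) * advantages)
    # "Overlong filtering": zero out the loss on truncated sequences so
    # responses cut off by the context limit are not penalised.
    mask = (~truncated).float().unsqueeze(-1).expand_as(surrogate)
    return -(surrogate * mask).sum() / mask.sum().clamp(min=1.0)
```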
A technique called 'iterative context lengthening', previously successful in smaller models, was adapted: training began at a 16K context length and was extended to 32K. Combined with overlong filtering, this gave DeepCoder remarkable generalisation. Although trained with at most a 32K context, the model achieved its peak performance when evaluated at 64K, unlike base models, which plateaued.
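The schedule itself is simple. The sketch below shows the staged structure, with the training loop stubbed out; stage step counts and function names are placeholders.

```python
# Iterative context lengthening: run the same RL loop over stages with
# a growing response-length cap, keeping overlong filtering active so
# rollouts truncated at the cap contribute no loss.
CONTEXT_STAGES = [16_384, 32_768]   # train at 16K, then extend to 32K

def train_with_context_schedule(run_rl_training, steps_per_stage: int = 100):
    for max_len in CONTEXT_STAGES:
        # run_rl_training stands in for the GRPO+ loop sketched above.
        run_rl_training(max_response_tokens=max_len,
                        overlong_filtering=True,
                        steps=steps_per_stage)
```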
DeepCoder achieved an impressive 60.6% Pass@1 accuracy on LiveCodeBench, matching the performance of OpenAI's o3-mini (Low) and o1-2024-12-17 models with significantly fewer parameters (14B). It also attained a Codeforces rating of 1936 (95.3rd percentile) and 92.6% on HumanEval+.
Despite not being trained specifically for maths, its enhanced reasoning generalised well: it scored 73.8% on AIME 2024, improving on its base model.
System optimisations for quicker RL
Recognising that long-context RL training is incredibly time-intensive, the collaboration also yielded verl-pipeline: an optimised, open-source extension to the verl RLHF library.
Post-training systems are often bottlenecked by the time it takes to sample long sequences. verl-pipeline introduces system-level improvements, including “one-off pipelining,” which allows sampling, reward calculation, and training steps to occur in parallel more effectively than naive minibatch pipelining.
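The idea is easiest to see in a toy form: the sampler runs one step ahead of the trainer, so generating the next batch overlaps reward scoring and the policy update for the current one. The sketch below uses plain Python threads and stand-in callables; it is a conceptual illustration, not verl-pipeline's actual implementation.

```python
# Toy one-step-ahead pipelining with a single background sampler thread.
from concurrent.futures import ThreadPoolExecutor

def train_pipelined(sample_batch, score_batch, train_step, num_iters: int):
    with ThreadPoolExecutor(max_workers=1) as sampler:
        pending = sampler.submit(sample_batch)       # prefetch batch 0
        for _ in range(num_iters):
            rollouts = pending.result()              # wait for sampling
            pending = sampler.submit(sample_batch)   # overlap next batch...
            rewards = score_batch(rollouts)          # ...with reward scoring
            train_step(rollouts, rewards)            # ...and the policy update
```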
This resulted in up to a 2x speedup in end-to-end training time for coding tasks compared with baseline verl implementations. The optimisations were validated by training a smaller DeepCoder-1.5B model, which showed significant improvement over its base model.
By open-sourcing the dataset, training code, logs, and the verl-pipeline system optimisations, Agentica and Together AI aim to empower the broader community.
This release not only provides a powerful, parameter-efficient open-source AI coding model but also offers a blueprint and tools for others to “progress on scaling and accelerating intelligence with RL.”
(Photo by Bernd Dittrich)
See also: Stack Overflow: Developers strongly prefer open-source AI