The latest blogs

All the latest blogs and news, straight from the team.

Aug 5, 2025

Mind the Gap: Why Arabic LLMs Still Lag Behind (and What We Can Do About It)

This post explores why Arabic large language models lag behind their English counterparts, highlighting gaps in post-training data quality, cultural alignment, and task diversity. It offers practical steps for building authentic, high-impact Arabic datasets that support better AI for Arabic speakers across dialects and domains.

Aug 1, 2025

5 min read

Auto Evol-Instruct: Toward Self-Evolving Instruction Tuning

A deep dive into Microsoft's Auto Evol-Instruct framework: how LLMs can autonomously generate, analyze, and optimize instruction data to improve alignment and reasoning.

Sep 5, 2024

5 min read

Scaling Synthetic Data Creation with 1,000,000,000 Personas

How the Persona Hub framework uses one billion fictional personas to scale synthetic data creation and diversify AI training data.