Join as a Contributor
Open issues on GitHub, propose experiments.
Our Mission
To advance the field of artificial intelligence through rigorous research, open collaboration, and a commitment to creating tools and insights that empower researchers and practitioners worldwide to build better, more reliable AI systems.
Our Vision
A future where AI research is accessible, transparent, and impactful. We envision a global community united by shared knowledge, where breakthrough discoveries in data science and machine learning accelerate innovation and solve humanity's greatest challenges.
Whether you're a researcher, hacker, founder, or policymaker — you're welcome here.
Short-term projects with publishing support.
Build on our SDKs, tools, and benchmarks.
Bring your own vision to life with our lab's support.
The principles that guide our research and shape our community
Pushing the boundaries of AI research with cutting-edge methodologies and novel approaches to data science.
Building a global community of researchers, developers, and enthusiasts working together towards common goals.
Committed to transparency and knowledge sharing, making our research accessible to everyone.
Creating solutions that address real-world challenges and benefit communities worldwide.
Whether you’re a researcher, developer, or AI enthusiast, there’s a place for you in our community. Let’s build the future of AI together.
All the latest blog posts and news, straight from the team.
This post explores why Arabic large language models lag behind their English counterparts, highlighting gaps in post-training data quality, cultural alignment, and task diversity. It offers practical approaches for building authentic, high-impact Arabic datasets that enable better AI for Arabic speakers across dialects and domains.
A deep dive into Microsoft's Auto Evol-Instruct framework — how LLMs can autonomously generate, analyze, and optimize instruction data to improve alignment and reasoning.
How the Persona Hub framework uses billions of fictional identities to scale synthetic data creation and diversify AI training.