Typhoon: Open-Source Language Technologies for Thai Language, Knowledge, and Culture

Overview

Typhoon is a leading initiative advancing AI and Large Language Models for Thai. As a founding leader at SCB 10X, I helped establish it as what is now Thailand’s top AI research lab.

The project spans the full AI development cycle, from research on multimodal and reasoning adaptation to application, aiming to position Thailand as a technology creator, not just a user.

Key Achievements & Impact

  • Built Thailand’s Leading AI Research Lab: As a founding member of the Typhoon team, helped build it from the ground up
  • Massive Adoption at Scale: Achieved 1M+ downloads on Hugging Face and processed 30M+ API requests at opentyphoon.ai
  • Enterprise Production Deployment: First LLM adopted by SCBx for production use. Currently deployed by multiple enterprise clients including PEA, Toyota Leasing, TDRI, VISAI, Siriraj Hospital, and the Office of the Education Council
  • Superior Performance & Cost: Models outperform GPT-4o, Claude 3.7, and Gemini 2.5 Flash on Thai-specific tasks. Typhoon ASR delivers performance comparable to Google and Azure at 156× lower cost
  • Industry Recognition:
    • Winner of Techsauce Innovation Award 2024
    • Selected as AWS GAIA 2025 startup from over 1,000 applicants, receiving $1M in AWS credits
  • Open Research: All major Typhoon models and research papers are open-sourced, fostering collaboration and advancing the field within Thailand, SEA, and globally

Core Typhoon Lab Works

1. Foundational Models (Typhoon)

  • Led development of the first comprehensive Thai Large Language Model family, establishing it as the most competitive open-source Thai LLM
  • Designed a Thai knowledge evaluation system for LLMs from scratch, establishing evaluation standards for the Thai AI community
  • Implemented an end-to-end pipeline from web crawling to data filtering and continuous pretraining, enabling efficient adaptation of high-resource language LLMs to Thai
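
To make the data-filtering stage of such a pipeline concrete, here is a minimal sketch. It is not the Typhoon pipeline: the function names (`thai_ratio`, `filter_documents`) and the thresholds (`min_ratio`, `min_chars`) are hypothetical, chosen only for illustration. It keeps documents that are long enough and mostly Thai script (Unicode block U+0E00 to U+0E7F), then drops exact duplicates by hash.

```python
import hashlib

# Thai Unicode block; used here as a simple script-based language heuristic.
THAI_START, THAI_END = 0x0E00, 0x0E7F


def thai_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters in the Thai block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    thai = sum(1 for ch in chars if THAI_START <= ord(ch) <= THAI_END)
    return thai / len(chars)


def filter_documents(docs, min_ratio=0.5, min_chars=20):
    """Keep sufficiently long, mostly-Thai documents and drop exact duplicates.

    min_ratio and min_chars are illustrative thresholds, not tuned values.
    """
    seen = set()
    kept = []
    for doc in docs:
        text = doc.strip()
        if len(text) < min_chars:
            continue  # too short to be useful training text
        if thai_ratio(text) < min_ratio:
            continue  # mostly non-Thai content
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier document
        seen.add(digest)
        kept.append(text)
    return kept
```

A production pipeline would typically add near-duplicate detection and quality classifiers on top of heuristics like these; the sketch only shows the shape of the step.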

2. Multimodal Capabilities (Typhoon2)

  • Typhoon Family of Multimodal LLMs: Developed one of the first text, vision, and audio multimodal models in Southeast Asia, successfully deployed by TDRI, VISAI, Siriraj Hospital, the Office of the Education Council, and other organizations

3. Reasoning Models (Typhoon T1 & Typhoon R1)

  • Typhoon T1: Developed the first reasoning model in Southeast Asia, pioneering test-time compute scaling for the Thai language
  • Typhoon R1: Created the most advanced reasoning LLM tailored for Thai, matching DeepSeek R1’s reasoning performance while surpassing it on Thai benchmarks
  • Pioneered novel model merging techniques to efficiently adapt language-specific LLMs into reasoning models, reducing development time from months to days
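
For readers unfamiliar with model merging, the simplest form is linear interpolation of two checkpoints that share an architecture. The sketch below shows only that generic baseline, not the Typhoon merging technique; `merge_linear`, `alpha`, and the toy weight dictionaries are hypothetical names used for illustration, and weights are represented as plain lists of floats to keep the example dependency-free.

```python
def merge_linear(weights_a, weights_b, alpha=0.5):
    """Linearly interpolate two models' parameters.

    weights_a, weights_b: dicts mapping parameter names to lists of floats.
    alpha: weight given to model B (0.0 returns A, 1.0 returns B).
    """
    if weights_a.keys() != weights_b.keys():
        raise ValueError("models must have identical parameter names")
    merged = {}
    for name, a in weights_a.items():
        b = weights_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise convex combination of the two parameter vectors.
        merged[name] = [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]
    return merged


# Toy usage: a "language-specific" and a "reasoning" checkpoint with one tensor.
thai_model = {"layer.weight": [0.0, 2.0]}
reasoning_model = {"layer.weight": [2.0, 4.0]}
merged = merge_linear(thai_model, reasoning_model, alpha=0.5)
```

Because merging only combines existing weights and needs no gradient updates, it is far cheaper than retraining, which is the general reason such approaches can shorten development time.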

4. Leadership & Team Development

As Lead AI Scientist and Founding Member, I built, led, mentored, and directed the team to deliver:

  • Typhoon Audio2: Among the first end-to-end speech LLMs in Southeast Asia, enabling advanced audio processing and understanding for low-resource languages

  • Typhoon OCR: Next-generation bilingual vision-language model outperforming GPT-4o and Gemini 2.5 Flash in Thai document understanding. Currently deployed by PEA and Toyota Leasing for production document parsing

  • Typhoon Translate: Lightweight Thai-English translation model with only 4B parameters, outperforming GPT-4o, Claude 3.7, and Gemini 2.5 Flash while enabling efficient edge deployment for privacy-first applications

  • Typhoon ASR Real-Time: Real-time Thai speech-to-text model delivering performance comparable to Google and Azure solutions at 156× lower cost. Trained from scratch using a Transducer architecture with streaming capabilities

  • Regional & International Collaboration: Led collaborations with SEA AI LAB (Sealion2), AI-SG (Sealion, Project Aquarium), and Stanford University (ThaiHelm, Talk-Arena, and multiple research initiatives)

Key Publications: