Master the full range of KOAI Subject 4 (Natural Language Processing & Audio) through hands-on practice. Covering BERT, encoder-decoder models, LLM APIs, and Whisper, students reach a level where they can build end-to-end pipelines for text and audio data themselves. This is an advanced NLP and audio course for high school division candidates.
Published: May 16, 2026 | Last updated: May 16, 2026 · Based on the KOAI 2026 guidelines
| Item | Summary | Details |
|---|---|---|
| Track | Advanced | High-school division competitors |
| Target Grade | High-school division | F1·F2 completed |
| Recommended Hours | About 10 hours | For 1:1 · varies 6–14 hours |
| KOAI Mapping | Full range of Subject 4 | Syllabus 4-1 ~ 4-2 |
By the end of A2, students have hands-on command of the full range of KOAI Subject 4 (Natural Language Processing & Audio). Starting from tokenization and vocabulary building, the course covers BERT, encoder-decoder models, language modeling, LLM API usage, and Whisper-based speech recognition. The goal is for students to build end-to-end pipelines for text and audio data on their own.
A2 builds natural language processing and audio processing on top of the machine learning and deep learning foundation laid in F1 and F2, and it is an advanced course exclusively for high school division candidates. It treats the distinctive characteristics of Korean NLP as a separate topic, so a capstone built on Korean data can serve directly as distinctive material for the application essay.
The outline below is the standard plan for 1:1 instruction. Depending on a student's prior knowledge and absorption speed, some weeks may be accelerated, compressed, or covered in greater depth. Core tools: Hugging Face Transformers, PyTorch, BERT, mT5/MarianMT, KoBERT/KLUE, Llama/Qwen, Anthropic/OpenAI API, Whisper, HuBERT.
| Week | Topic | Key Deliverable |
|---|---|---|
| 1 | Text classification + tokenization & vocabulary building | TF-IDF + neural network baseline |
| 2 | Pretrained text encoder BERT (theory + practice) | BERT fine-tuning (sentiment analysis) |
| 3 | Language modeling (theory + practice), causal vs masked | Token-level LM training |
| 4 | Encoder-decoder models (machine translation, summarization) | mT5 & MarianMT fine-tuning |
| 5 | Korean NLP specifics (morphemes, Korean tokenizers) | Using KoBERT & KLUE |
| 6 | Using open-source LLMs (Llama, Qwen) | Local inference + LoRA |
| 7 | Using LLM APIs (Anthropic, OpenAI): prompt engineering | RAG mini system |
| 8 | Audio data processing + HuBERT | Audio classification |
| 9 | Whisper, Qwen-Audio, Voxtral | Speech recognition + multilingual |
| 10 | Capstone: NLP or audio application project | repo + demo |
※ Weeks are content units; actual time varies by student. The recommendation is about 10 hours, with a range of 6–14 hours.
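To give a sense of the Week 1 deliverable, the TF-IDF half of the baseline can be sketched in plain Python. This is a minimal illustration, not the course's actual notebook: the function names (`tokenize`, `build_vocab`, `tfidf_vectors`), the whitespace tokenizer, the sample sentences, and the unsmoothed weighting `idf = log(N / df)` are all assumptions chosen for brevity. Library implementations typically use a smoothed variant.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for a real tokenizer.
    return text.lower().split()

def build_vocab(docs):
    # Map every distinct token to an integer index (sorted for determinism).
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}

def tfidf_vectors(docs, vocab):
    n_docs = len(docs)
    tokenized = [tokenize(doc) for doc in docs]
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(toks).items():
            tf = count / len(toks)             # relative term frequency
            idf = math.log(n_docs / df[tok])   # unsmoothed inverse document frequency
            vec[vocab[tok]] = tf * idf
        vectors.append(vec)
    return vectors

# Hypothetical toy corpus for illustration only.
docs = [
    "the movie was great",
    "the movie was boring",
    "great acting and great plot",
]
vocab = build_vocab(docs)
vectors = tfidf_vectors(docs, vocab)
```

Each row of `vectors` is a fixed-length feature vector that could be fed to a small neural network classifier. Tokens that appear in fewer documents receive a larger weight.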
Students write a Jupyter notebook every week, implementing each topic in code (tokenization, BERT fine-tuning, LLM APIs, speech recognition) and building up a cumulative asset. English notebooks annotate key terms in Korean alongside the English text, which also prepares students for the KOAI Round 2 Korean written section.
Students complete one Korean or English text/audio application project. They choose either NLP or audio, implement it end-to-end, and package it as a repo and demo to leave as a portfolio asset.
A2 leaves one GitHub repo, koai-nlp-audio, as a cumulative asset. A capstone built on Korean data can be used directly as material for the "localized AI experience" in Question 2 of the application essay.
| Asset | Details |
|---|---|
| GitHub | 1 organized repo koai-nlp-audio |
| Korean NLP | KoBERT/KLUE deliverable |
| Application essay | Question 2 "localized AI experience" |
This track record accumulates as dated evidence for the Portfolio (40%) and AI competency (30%) sections of the KOAI Round 1 application. The earlier you start, the deeper your level by the time you sit the exam.
A2 is a course in the advanced track of the KOAI curriculum. For the full track structure, see the KOAI Prep Curriculum Hub.
Current Course: A2. Advanced II: NLP & Audio (Natural Language Processing & Audio)
A2 requires completion of F1 and F2. It can be taken alongside A1 and is recommended for students planning to sit the high school division exam.
A2 maps to the entire range of KOAI Syllabus Subject 4 (Natural Language Processing & Audio), 4-1 through 4-2. It covers tokenization, BERT, LLM APIs, RAG, and Whisper.
Korean NLP is covered in Week 5: morphemes, Korean tokenizers, KoBERT, and KLUE. The Korean-data capstone can be used as material for the "localized AI experience" in the application essay.
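One reason Korean tokenization needs separate treatment is that each Hangul syllable block is itself built from smaller letters (jamo), so meaningful units do not line up with whitespace or with single characters. The sketch below shows the standard Unicode arithmetic for splitting a precomposed syllable into its jamo. It is an illustration only, not morpheme analysis and not the method used by KoBERT or KLUE; the helper names `decompose_syllable` and `to_jamo` are hypothetical.

```python
import unicodedata

HANGUL_BASE = 0xAC00         # first precomposed syllable, "가"
HANGUL_LAST = 0xD7A3         # last precomposed syllable, "힣"
N_VOWELS, N_TAILS = 21, 28   # 28 tail slots, including "no final consonant"

def decompose_syllable(ch):
    """Split one precomposed Hangul syllable into conjoining jamo."""
    code = ord(ch)
    if not (HANGUL_BASE <= code <= HANGUL_LAST):
        return [ch]  # leave non-Hangul characters unchanged
    index = code - HANGUL_BASE
    lead = index // (N_VOWELS * N_TAILS)
    vowel = (index % (N_VOWELS * N_TAILS)) // N_TAILS
    tail = index % N_TAILS
    jamo = [chr(0x1100 + lead), chr(0x1161 + vowel)]
    if tail:  # tail == 0 means the syllable has no final consonant
        jamo.append(chr(0x11A7 + tail))
    return jamo

def to_jamo(text):
    return [j for ch in text for j in decompose_syllable(ch)]

word = "한국어"
jamo = to_jamo(word)
# Cross-check against Unicode canonical decomposition (NFD).
agrees_with_nfd = "".join(jamo) == unicodedata.normalize("NFD", word)
```

Here three syllable blocks expand into eight jamo. Splitting text into morphemes additionally requires linguistic knowledge, which is what dedicated Korean analyzers and tokenizers supply.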
In Week 7, students build a RAG mini system hands-on using the Anthropic and OpenAI APIs and prompt engineering.
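The retrieval step of such a mini system can be sketched without any API call. The example below is a simplified stand-in under stated assumptions: bag-of-words counts replace real embeddings, the passages and prompt wording are invented for illustration, and the helper names (`embed`, `cosine`, `retrieve`, `build_prompt`) are hypothetical. In the actual exercise the resulting prompt would be sent to an LLM API, which is deliberately omitted here.

```python
import math
from collections import Counter

def embed(text):
    # Assumption: word counts stand in for a learned embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[tok] * b[tok] for tok in a if tok in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, passages, k=1):
    # Rank passages by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def build_prompt(query, passages, k=1):
    # Place the retrieved passages ahead of the question as context.
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# Hypothetical knowledge base for illustration only.
passages = [
    "whisper is a speech recognition model",
    "bert is a pretrained text encoder",
    "lora is a parameter-efficient fine-tuning method",
]
prompt = build_prompt("what is whisper", passages)
```

Keeping retrieval separate from generation makes each half easy to test: the ranking can be checked on its own before any model is called.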
After A2, the track leads to C1 Portfolio Studio → C2 Mock Bootcamp → C3 Selection Camp. For exact dates, see the KOAI competition guide (https://citcoding.com/competitions/koai.html).
We design your advanced text-and-audio pathway and a Korean-NLP strength strategy individually during a diagnostic session.