The Mathematics of Learning
duiduidui! is built on a unified mathematical framework combining Bayesian inference, information theory, and cognitive science. It starts with a difficulty scale rooted in the structure of Chinese itself, then measures everything — your knowledge, your skill, your cognitive load — on that same scale.
1. The Difficulty Hierarchy
Every character, word, and phrase in the app has a difficulty score — a single number that tells the system how hard it is. These scores aren't arbitrary: they follow the compositional structure of Chinese itself. Simple characters get low scores, and when characters combine into words and phrases, the difficulty of the combination is calculated from the difficulty of its parts. This creates a natural learning progression from easy building blocks to complex expressions.
This difficulty scale is the foundation of everything else in the system. A user with a skill level of N should be more likely than not to have mastered any item with a difficulty score of N or lower. Every model described on this page — proficiency tracking, skill estimation, cognitive load, card selection — operates on this same scale.
Single Characters
The foundation. Every character has a hand-curated difficulty score based on character complexity, usefulness, and real-world frequency. Examples:
| Character | Pinyin | Meaning | Difficulty |
|---|---|---|---|
| 一 | yī | one | $1$ |
| 二 | èr | two | $2$ |
| 三 | sān | three | $3$ |
| 你 | nǐ | you | $4$ |
| 好 | hǎo | good | $5$ |
Multi-Character Phrases
When characters combine into a phrase, the phrase's difficulty is always higher than its hardest character — but not as hard as simply adding all the character difficulties together. This makes intuitive sense: knowing the individual characters gives you a head start on the phrase, but there's still something extra to learn about how they work together.
Formally, for a phrase $P$ composed of characters $C_1, C_2, \ldots, C_n$, difficulty is calculated using the ceiling of the root-sum-of-squares:
$$\text{difficulty}(P) = \left\lceil \sqrt{\sum_{i=1}^{n} \text{difficulty}(C_i)^2} \;\right\rceil$$

This formula has several desirable properties:
- Strict monotonicity: Phrases are always harder than their hardest component, guaranteed by the ceiling operation.
- Subadditivity: $\text{difficulty}(P) \leq \sum_i \text{difficulty}(C_i)$ — the whole is never harder than the sum of its parts, and for most combinations it is strictly easier.
- Component sensitivity: Every constituent character influences the final difficulty.
Worked Examples
| Phrase | Components | Calculation | Difficulty |
|---|---|---|---|
| 不好 (bùhǎo, bad) | 不(8), 好(5) | $\lceil\sqrt{64+25}\rceil = \lceil 9.4 \rceil$ | $10$ |
| 很好 (hěnhǎo, very good) | 很(6), 好(5) | $\lceil\sqrt{36+25}\rceil = \lceil 7.8 \rceil$ | $8$ |
| 你好 (nǐhǎo, hello) | 你(4), 好(5) | $\lceil\sqrt{16+25}\rceil = \lceil 6.4 \rceil$ | $7$ |
| 好吃 (hǎochī, delicious) | 好(5), 吃(12) | $\lceil\sqrt{25+144}\rceil = \lceil 13.0 \rceil$ | $13$ |
| 你很好 (nǐhěnhǎo, you're very good) | 你(4), 很(6), 好(5) | $\lceil\sqrt{16+36+25}\rceil = \lceil 8.8 \rceil$ | $9$ |
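The composition rule is a one-liner. A minimal Python sketch (the function name `phrase_difficulty` is illustrative, not the app's actual API) that reproduces the worked examples:

```python
import math

def phrase_difficulty(component_difficulties):
    """Ceiling of the root-sum-of-squares of the component difficulties."""
    return math.ceil(math.sqrt(sum(d * d for d in component_difficulties)))

# Reproduce the worked examples above
print(phrase_difficulty([8, 5]))     # 不好 → 10
print(phrase_difficulty([4, 6, 5]))  # 你很好 → 9
```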
Radicals
Radicals are the recurring building blocks inside characters — like how the "water" radical (氵) appears in 河 (hé, river), 海 (hǎi, ocean), and 湖 (hú, lake). Learning radicals helps you recognize patterns and guess meanings, but they're abstract concepts that only make sense once you've seen enough real characters. So the app waits until you've built a solid foundation of characters before introducing them, ensuring you have plenty of concrete examples to anchor each radical's meaning.
Radical difficulty is calculated to favor common radicals early:
$$\text{difficulty} = \max\!\Big(500,\; 500 + \frac{d_3}{\sqrt{c}} + \frac{c_{\max} - c}{10}\Big)$$

where $d_3$ is the difficulty of the 3rd-easiest character containing the radical, $c$ is the number of characters referencing it, and $c_{\max}$ is the count of the most-referenced radical. Common radicals surface earlier; rare ones wait.
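A sketch of the radical scoring in the same style — the parameter names `d3`, `c`, and `c_max` mirror the formula's symbols and are otherwise hypothetical:

```python
import math

def radical_difficulty(d3, c, c_max):
    """d3: difficulty of the 3rd-easiest character containing the radical;
    c: number of characters referencing the radical;
    c_max: references of the most-referenced radical."""
    return max(500.0, 500.0 + d3 / math.sqrt(c) + (c_max - c) / 10.0)

# A heavily referenced radical scores lower (surfaces earlier) than a rare one
common = radical_difficulty(d3=20, c=250, c_max=250)
rare = radical_difficulty(d3=40, c=4, c_max=250)
print(common < rare)  # True
```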
2. Bayesian Proficiency Model
Think of each flashcard as having a hidden "knowledge meter" somewhere between 0% and 100%. When you first encounter a card, the app genuinely doesn't know whether you'll get it right — you could be anywhere on that scale. With each review, it watches how you do and nudges its estimate accordingly: get it right, and the meter climbs; get it wrong, and it dips back down. Over many reviews, the app builds an increasingly accurate picture of what you actually know.
Mathematically, we model your knowledge of each Chinese phrase as a Bernoulli random variable with unknown parameter $\theta \in [0,1]$, representing the probability you'll correctly recall it. We place a Beta prior on $\theta$:
$$\theta \sim \text{Beta}(\alpha, \beta)$$

The Beta distribution is the conjugate prior for the Bernoulli likelihood, which means every review produces an elegant closed-form update:
$$\theta \mid \text{data} \;\sim\; \text{Beta}(\alpha + s,\; \beta + f)$$

where $s$ is the number of successes and $f$ the number of failures. The expected proficiency at any point is:
$$\mathbb{E}[\theta] = \frac{\alpha}{\alpha + \beta}$$

Review Score Updates
After each review, you tell the app how it went: Easy, Good, Hard, or Fail. Each response provides a different amount of evidence. Nailing an "Easy" is a strong signal that you know the material. Tapping "Hard" is a gentle hint that you're still working on it. Here's how each outcome shifts the model:
| Outcome | $\Delta\alpha$ | $\Delta\beta$ | Interpretation |
|---|---|---|---|
| EASY | $+1.0$ | — | Strong evidence of knowledge |
| GOOD | $+0.7$ | — | Moderate evidence of knowledge |
| HARD | — | $+0.3$ | Slight evidence of difficulty |
| FAIL | — | $+1.0$ | Strong evidence of non-knowledge |
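The update table translates directly into code. A minimal sketch (the dictionary and function names are illustrative, not the app's API):

```python
# Evidence each outcome adds to the Beta posterior: (delta_alpha, delta_beta),
# taken from the table above
OUTCOME_UPDATES = {
    "EASY": (1.0, 0.0),
    "GOOD": (0.7, 0.0),
    "HARD": (0.0, 0.3),
    "FAIL": (0.0, 1.0),
}

def review(alpha, beta, outcome):
    """Conjugate update: add the outcome's evidence to the Beta parameters."""
    da, db = OUTCOME_UPDATES[outcome]
    return alpha + da, beta + db

def expected_proficiency(alpha, beta):
    return alpha / (alpha + beta)

# Ten EASY reviews starting from the flat prior Beta(1, 1)
a, b = 1.0, 1.0
for _ in range(10):
    a, b = review(a, b, "EASY")
print(expected_proficiency(a, b))  # 11/12 ≈ 0.917
```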
Conservative Convergence
The system is deliberately conservative — it won't decide you've mastered something after just one or two correct answers. Mastery is earned through consistent performance over time. This means a lucky guess won't fool the system, but genuine knowledge will always be recognized:
| After $n$ EASY reviews | Distribution | $\mathbb{E}[\theta]$ |
|---|---|---|
| $n = 0$ | $\text{Beta}(1, 1)$ | $0.50$ |
| $n = 1$ | $\text{Beta}(2, 1)$ | $0.67$ |
| $n = 3$ | $\text{Beta}(4, 1)$ | $0.80$ |
| $n = 10$ | $\text{Beta}(11, 1)$ | $0.92$ |
This conservatism prevents overconfidence from lucky guesses while allowing genuine mastery to be recognized over time.
3. Component Attribution
Here's something unique about Chinese: characters combine to form words, and words combine to form phrases. When you successfully review a phrase like 好吃 (hǎochī, delicious), that tells us something not just about the phrase, but also about the individual characters 好 (hǎo) and 吃 (chī) that make it up. You clearly know those characters too — otherwise you couldn't have gotten the phrase right.
But here's the subtle part: not every character in a phrase deserves equal credit. If one character is much harder than another, and you got the whole phrase right, the hard character is the more impressive part. So we give it more credit.
Difficulty-Weighted Attribution
Each character $i$ in a phrase has a difficulty score $d_i$. We use squared difficulties to compute normalized weights:
$$w_i = \frac{d_i^2}{\displaystyle\sum_{j} d_j^2}$$

The quadratic relationship ensures harder characters receive proportionally more attribution. For a review with outcome $O$, each constituent character receives a weighted update:
$$\Delta\alpha_i = w_i \cdot \Delta\alpha(O) \qquad\qquad \Delta\beta_i = w_i \cdot \Delta\beta(O)$$

Worked Example
Suppose you score EASY on 好吃 (hǎochī, delicious), where 好 (hǎo) has difficulty 5 and 吃 (chī) has difficulty 12:
$$w_{\text{好}} = \frac{5^2}{5^2 + 12^2} = \frac{25}{169} \approx 0.148 \qquad\qquad w_{\text{吃}} = \frac{12^2}{5^2 + 12^2} = \frac{144}{169} \approx 0.852$$

So 吃 (chī) receives about 6× the attribution of 好 (hǎo), reflecting its greater contribution to the phrase's difficulty. This propagation means that every phrase review simultaneously builds evidence about every character inside it.
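The weighting step can be sketched as follows (function names are illustrative):

```python
def attribution_weights(difficulties):
    """Squared-difficulty weights, normalized to sum to 1."""
    total = sum(d * d for d in difficulties)
    return [d * d / total for d in difficulties]

# 好吃: difficulties 5 and 12
w_hao, w_chi = attribution_weights([5, 12])
print(round(w_hao, 3), round(w_chi, 3))  # 0.148 0.852

def component_update(weight, d_alpha, d_beta):
    """Scale the outcome's (delta_alpha, delta_beta) by a character's weight."""
    return weight * d_alpha, weight * d_beta
```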
4. Skill Level Tracking (Clamped IRT)
Beyond tracking individual cards, the app also maintains an estimate of your overall skill level — a single number that represents roughly how difficult a card you can handle. This number lives on the same difficulty scale as the dictionary: a skill level of 100 means you should be comfortable with characters and phrases up to difficulty 100.
This estimate is deliberately cautious: it only updates when you do something surprising. Getting an easy card right doesn't prove anything new, and getting a hard card wrong is expected. But getting a hard card right? That's meaningful evidence that your skill level should increase.
Technically, this uses a custom Item Response Theory model with asymmetric evidence clamping. Unlike traditional Elo or Glicko systems where single observations can cause large jumps, the clamped approach ensures skill estimates move gradually toward demonstrated competence.
Two-Parameter Model
Your skill is represented by $\mu$ (current estimate) and $\sigma$ (uncertainty):
$$\text{skill} = \big(\mu,\; \sigma\big) \qquad \text{initially} \;(\mu_0 = 0,\;\sigma_0 = 50)$$

The 95% confidence interval at any time is:
$$\text{CI}_{95\%} = \mu \pm 1.96\,\sigma$$

Asymmetric Evidence Clamping
The key insight is that not all review outcomes are equally informative. Imagine you're a skill-100 learner who gets a difficulty-5 card right — of course you did, that's trivial for you. But if you get a difficulty-150 card right, that's real news. The system only updates your skill estimate when the outcome is genuinely surprising:
| Outcome | Card vs $\mu$ | Informative? | Effect |
|---|---|---|---|
| Success | difficulty $> \mu$ | Yes | $\mu$ moves toward difficulty |
| Failure | difficulty $< \mu$ | Yes | $\mu$ moves toward difficulty |
| Success | difficulty $\leq \mu$ | No | No change |
| Failure | difficulty $\geq \mu$ | No | No change |
Clamped Update Rule
When a surprising outcome happens, your skill estimate moves partway toward the card's difficulty — but never all the way. Each informative review closes 35% of the gap, which means the system converges smoothly without overreacting to any single review:
$$\mu' = \mu + \alpha \cdot (d - \mu) \qquad\text{where}\quad \alpha = 0.35$$

This is exponential smoothing — each informative review moves skill 35% of the remaining distance toward the demonstrated level. Convergence is asymptotic: you can never overshoot in a single observation.
| Review | $\mu$ | Distance to $d = 100$ |
|---|---|---|
| Start | $0$ | $100$ |
| 1 | $35.0$ | $65.0$ |
| 2 | $57.8$ | $42.2$ |
| 3 | $72.5$ | $27.5$ |
| 5 | $88.4$ | $11.6$ |
| 10 | $98.7$ | $1.3$ |
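The informativeness test and the clamped update together, as a Python sketch (function names are assumptions, not the app's API):

```python
ALPHA = 0.35  # fraction of the gap closed per informative review

def is_informative(mu, difficulty, success):
    """Only surprising outcomes count: success above mu, or failure below it."""
    return (success and difficulty > mu) or (not success and difficulty < mu)

def update_skill(mu, difficulty, success):
    if not is_informative(mu, difficulty, success):
        return mu  # expected outcome: no change
    return mu + ALPHA * (difficulty - mu)

# Repeated successes on a difficulty-100 card, starting from mu = 0
mu = 0.0
for _ in range(5):
    mu = update_skill(mu, 100.0, success=True)
print(round(mu, 1))  # → 88.4, matching the table above
```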
Uncertainty Decay
The system also tracks how confident it is in your skill estimate. Early on, it's quite uncertain and open to big revisions. As you provide more informative reviews, that uncertainty shrinks — but it never reaches zero, because people's abilities naturally fluctuate over time.
Importantly, only surprising outcomes reduce uncertainty. You can't game the system by drilling easy cards all day — that won't make it any more confident about your true level.
Uncertainty decreases geometrically, but only when evidence is informative:
$$\sigma_n = \sigma_0 \cdot \gamma^{\,n} \qquad\text{where}\quad \gamma = 0.93$$

where $n$ counts informative reviews. Each one reduces $\sigma$ by 7%; starting from $\sigma_0 = 50$, 13 informative reviews bring $\sigma$ below 20 — high confidence. A floor of $\sigma_{\min} = 5$ maintains epistemic humility: the system never becomes completely certain.
Crucially, uninformative outcomes (success on easy cards, failure on hard cards) do not reduce $\sigma$. You can't build confidence by drilling material you've already mastered.
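A matching sketch of the decay rule with the floor applied (constants from the text; names are illustrative):

```python
GAMMA = 0.93      # 7% multiplicative shrink per informative review
SIGMA_MIN = 5.0   # epistemic-humility floor

def update_sigma(sigma, was_informative):
    if not was_informative:
        return sigma  # drilling mastered material builds no confidence
    return max(SIGMA_MIN, sigma * GAMMA)

sigma = 50.0
for _ in range(13):
    sigma = update_sigma(sigma, was_informative=True)
print(round(sigma, 1))  # → 19.5, below the high-confidence mark of 20
```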
5. Cognitive Load Model
Have you ever crammed too many new vocabulary words at once and ended up remembering none of them? That's cognitive overload, and it's the biggest enemy of effective language learning. duiduidui! constantly monitors how much mental effort you're juggling and adjusts the pace accordingly. When you're fresh and things are clicking, it introduces new material. When you're at capacity, it slows down and focuses on reinforcing what you've already seen.
The system tracks four factors to estimate your cognitive load: how uncertain each card is, how well you know it, how hard it is relative to your level, and how much total capacity you have as a learner.
Only active cards (cards you've directly reviewed) contribute to cognitive load. Cards with only implied reviews from component attribution are excluded — they represent inferred knowledge, not active learning burden.
Attention Demand
A card you've never seen demands maximum attention — it could be anything. A card you've reviewed many times, whether you know it well or not, demands less attention because you've at least categorized it in your mind. The system measures this using the statistical uncertainty of your knowledge estimate:
$$\text{Var}[\theta] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$

We normalize this into an attention score:
$$\text{attention} = \sqrt{\frac{\text{Var}[\theta]}{0.25}}$$

The $0.25$ in the denominator is the maximum possible variance of a $[0,1]$ random variable. Maximum attention occurs at $\text{Beta}(1,1)$ (never seen). As evidence accumulates, attention drops toward zero regardless of whether the card is easy or hard — you've simply categorized it.
Mastery Factor
As you get better at a card, it takes up less mental space. This isn't a sudden switch — it's a smooth curve where every small improvement in proficiency gives you a little more breathing room. A card you're 80% confident about barely registers as cognitive load at all:
$$m(\theta) = \frac{1}{1 + e^{12(\theta - 0.65)}}$$

| $\theta$ | $m(\theta)$ | Relief |
|---|---|---|
| $0.50$ | $0.86$ | 14% — card is starting to stick |
| $0.60$ | $0.65$ | 35% — knowledge solidifying |
| $0.70$ | $0.35$ | 65% — mostly mastered |
| $0.80$ | $0.14$ | 86% — fully mastered |
| $0.90$ | $0.05$ | 95% — automatic recall |
Every increment in proficiency provides tangible relief — no plateau feeling.
Relative Difficulty Weight
Difficulty is always relative. A card with difficulty 50 is overwhelming for a beginner but trivial for an intermediate learner. The system compares each card's difficulty to your current skill level, so a card that matches your level always contributes the same amount of load — whether you're on day one or day three hundred:
$$r = \sqrt{\frac{\text{difficulty}}{\text{effective\_skill}}} \qquad\text{where}\quad \text{effective\_skill} = \max(\mu, 10)$$

A card at your skill level always has weight $\approx 1.0$, whether you're a beginner (skill 10, difficulty 10) or intermediate (skill 500, difficulty 500).
Cognitive Capacity
Beginners can only juggle a handful of cards before feeling overwhelmed. But as you learn more Chinese, you develop mental shortcuts, pattern recognition, and contextual anchors that let you handle more material at once. The system models this growing capacity — an advanced learner can comfortably manage ten times more active cards than a beginner:
$$\text{capacity} = \text{effective\_skill}^{\,0.6}$$

| Skill Level | Capacity | ~Cards for 80% Load |
|---|---|---|
| 10 (beginner) | $4.0$ | ~10 |
| 100 | $15.9$ | ~39 |
| 500 (intermediate) | $41.6$ | ~100 |
| 2000 (advanced) | $95.6$ | ~235 |
Combined Load
Putting it all together: each card's cognitive cost depends on how uncertain it is, how well you know it, how hard it is for someone at your level, and your total capacity. Sum these up across all your active cards and you get a single number that represents how "full" your brain is:
$$\ell_i = \frac{\text{attention}_i \;\cdot\; m(\theta_i) \;\cdot\; r_i}{\text{capacity}}$$

Total cognitive load is the sum across all active cards:
$$L = \sum_{i \in \text{active}} \ell_i$$

The capacity normalization ensures that "80% loaded" always means "near your personal limit" — whether that limit is 10 cards or 235.
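Putting the four factors into code — a self-contained sketch built directly from the formulas above (function names are illustrative, not the app's API):

```python
import math

def attention(alpha, beta):
    """Normalized posterior uncertainty; 0.25 is the max variance on [0, 1]."""
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return math.sqrt(var / 0.25)

def mastery(theta):
    """Sigmoid relief curve: well-known cards take up little mental space."""
    return 1.0 / (1.0 + math.exp(12.0 * (theta - 0.65)))

def card_load(alpha, beta, difficulty, mu):
    """Load contribution of one active card for a learner with skill mu."""
    effective_skill = max(mu, 10.0)
    theta = alpha / (alpha + beta)
    r = math.sqrt(difficulty / effective_skill)       # relative difficulty
    capacity = effective_skill ** 0.6                 # cognitive capacity
    return attention(alpha, beta) * mastery(theta) * r / capacity

def total_load(active_cards, mu):
    """active_cards: iterable of (alpha, beta, difficulty) tuples."""
    return sum(card_load(a, b, d, mu) for a, b, d in active_cards)
```

A brand-new card at your own level contributes far more load than one you have reviewed to mastery, which is exactly the behavior the selection strategy below relies on.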
6. Adaptive Card Selection
This is where all the pieces come together to answer the most important question: what should you study next? The system faces a fundamental tradeoff — introduce something new and exciting, or go back and reinforce something you're still learning? Too much new material and you'll forget everything. Too much review and you'll get bored. The answer depends on how loaded you are right now.
Strategy Selection
When your cognitive load is low (you have plenty of mental bandwidth), the system leans heavily toward introducing new cards. When you're approaching capacity, it shifts almost entirely to review. The balance point is determined dynamically:
$$T = \text{clamp}\!\big(0.5 + \Delta(L),\; 0.1,\; 0.9\big)$$

When $L < 0.3$ (ample capacity), $\Delta = +0.3$ — heavily favor new cards. When $L > 0.8$ (overloaded), $\Delta = -0.4$ — almost exclusively review.
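A sketch of the threshold calculation. The text pins down $\Delta$ only below load 0.3 and above 0.8; the linear ramp between those regimes is an assumption of this sketch:

```python
def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def new_card_threshold(load):
    """T = clamp(0.5 + delta(load), 0.1, 0.9). Endpoint deltas come from the
    text; the linear interpolation between them is assumed."""
    if load < 0.3:
        delta = 0.3
    elif load > 0.8:
        delta = -0.4
    else:
        delta = 0.3 - 1.4 * (load - 0.3)  # ramp from +0.3 down to -0.4
    return clamp(0.5 + delta, 0.1, 0.9)

print(new_card_threshold(0.1))  # 0.8 — plenty of room: favor new cards
print(new_card_threshold(0.9))  # 0.1 — overloaded: almost all review
```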
Probabilistic Review Selection
When the system decides to show you a review card, it doesn't just pick at random. It looks for cards in your "sweet spot" — ones you're uncertain about (not too easy, not too hard) and whose difficulty matches your current skill level. Cards right at the edge of your knowledge get the highest scores, because that's where review is most valuable:
$$\text{score} = 0.7 \cdot \underbrace{e^{-\left(\frac{\theta - 0.5}{0.2}\right)^2}}_{\text{learning zone}} + 0.3 \cdot \underbrace{e^{-|\mu - d|/50}}_{\text{difficulty match}}$$

The first term is a Gaussian centered at $\theta = 0.5$ — the optimal learning zone where you're most likely to benefit from review. The second term favors cards whose difficulty is close to your current skill level. Cards are then selected probabilistically according to these scores.
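The scoring and the probabilistic draw might look like this sketch (`pick_review` and its card tuples are illustrative):

```python
import math
import random

def review_score(theta, mu, difficulty):
    learning_zone = math.exp(-(((theta - 0.5) / 0.2) ** 2))
    difficulty_match = math.exp(-abs(mu - difficulty) / 50.0)
    return 0.7 * learning_zone + 0.3 * difficulty_match

def pick_review(cards, mu, rng=random):
    """cards: list of (theta, difficulty). Sample one index with probability
    proportional to its score."""
    scores = [review_score(theta, mu, d) for theta, d in cards]
    return rng.choices(range(len(cards)), weights=scores, k=1)[0]
```

A card sitting at $\theta = 0.5$ with difficulty equal to your skill scores the maximum of 1.0, so it dominates the draw over cards you already know cold.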
7. How It All Fits Together
Every time you tap a button — Easy, Good, Hard, or Fail — you set off a chain reaction through every layer of the system. Your knowledge estimates update, your skill level adjusts, your cognitive load recalibrates, and the next card is chosen. It all happens in milliseconds, and the result is a learning experience that feels like it's reading your mind.
Here's the full cascade:
- Review event: Update Bayesian $\alpha, \beta$ for the phrase and propagate weighted updates to all constituent characters.
- Skill update: If the outcome is informative, adjust $\mu$ toward the card's difficulty and reduce $\sigma$.
- Cognitive load: Recalculate $L$ from all active cards' attention, mastery, relative difficulty, and capacity.
- Strategy selection: Use $L$ to set the new/review threshold $T$.
- Card selection: Choose the next card — new or review — scored by learning value and difficulty match.
What makes this system robust is that it self-corrects. Several natural feedback loops keep everything in balance:
- Overload protection: High $L$ → fewer new cards → $L$ decreases.
- Progress acceleration: Low $L$ → more new cards → sustained challenge.
- Difficulty calibration: $\mu$ adjusts via clamped IRT → better card matching → appropriate challenge.
- Uncertainty reduction: Informative reviews → $\sigma$ shrinks → narrower confidence bounds → more precise card selection.
The result is a learning experience that feels effortless — because the mathematics are doing the heavy lifting behind every card.