"The decipherment of Hieroglyphic Egyptian required the replacement of the deep-seated notion of symbolism by the correct view that the main (though not the only) feature of the script is phonetic."
— Cyrus H. Gordon, Forgotten Scripts, 1968, p. 24

Roughly 80% of Chinese characters can speak—whether in something close to Mandarin or in the ancient utterances of Old Chinese, understood only by scholars. They’re not particularly talkative, capable of producing just one or two syllables in their own language. In fact, the only thing they were ever taught to say is their own name.

Teacher: Chinese characters are not difficult. They are just pictograms 1—simplified drawings representing real things. Try to guess what these mean: 一, 二, 三, 火, 雨, 山, 木…

Student: Oh! Clearly those first three represent the numbers 1, 2 and 3. I can imagine 火 represents fire, perhaps 雨 represents rain. The character 山 is definitely a crown, so maybe it stands for a king…? Hmmm… I think 木 is a depiction of a slingshot—yeah, someone is aiming and about to shoot.

Teacher: I can see where you’re coming from, but 山 actually represents a mountain. And as for 木—well, you’ve got quite the imagination, but it’s just a tree. Surely you see that 女 shows a woman with her legs crossed, right? 2

Student: Huh… well, now that you mention it, and with a bit of imagination… Let’s also pretend men didn’t like crossing their legs in ancient times.

Teacher: Ok, let’s try a harder example: the character 玛, which can mean agate, a type of precious stone. It consists of two parts: 𤣩 and 马.

  • The first part 𤣩 comes from 玉, which depicts three pieces of jade strung together on a string. When used in other characters, it takes the form 𤣩 and typically relates to gems.
  • The second part 马 is the simplified version of 馬, which is a depiction of a horse.

In ancient times, some traders would showcase their gemstones—like agates—at markets. But some buyers would question their authenticity. To gain trust, traders would also show off their magnificent horses, symbols of wealth and elegance, just like the gems themselves.

Student: Wow, seems like these ancient Chinese people also had some imagination… Anyway, how is this story specific to agate? It could just as well apply to any other precious stone, right? And wait—how do you even pronounce this character? We haven’t even talked about pronouncing characters! Such a cool story for the meaning, but maybe there’s more to it?

If you’ve ever started learning Chinese, you might find the previous dialogue somewhat relatable. If you’ve delved deeper, perhaps you already know that characters can speak. Or maybe you still think of them as mute individuals, silent figures that only convey meaning (I made up the story for 玛—guessed it?). Of course, you could argue that this dialogue is exaggerated and that teachers do mention that phonetic parts of characters are a thing, and you would have a point. However, this is often only briefly touched upon, whereas pictogram-based explanations tend to dominate. These, in turn, are the ones that stick with students, shaping their perception of Chinese characters for the rest of their journey.

Consider the following set of characters: 七, 十, 五, 三, 一, 六, 九, 四, 八, 二. These are the Chinese characters for the numbers from 1 to 10, but shuffled in a random order. Would you be able to guess them if you didn’t already know them? From the previous dialogue, you should already know that 一, 二 and 三 are 1, 2 and 3 respectively. As for the rest, if by any chance you have some strong detective-like intuition and manage to figure them out, I would be delighted to know how3. For the rest of us, they’re just a bunch of strokes put together, following no apparent easy ideogram explanation like for the first three numbers. Why these choices? Open your ears wide—you will slowly start unraveling the hidden voices of the characters!

Excuse me, can I borrow your sound?

Okay, our previous game wasn’t fair. Instead, I could have told you that the number 4 is written as 亖, and you would, of course, believe me. How is this not better than 四? What is 四 even representing?

You see, Chinese people like their characters stylized. The character 亖 was once the representation of the number 4 in ancient times, but they must have felt that four horizontal strokes were too much—not exactly in line with their idea of beautiful characters. They could have chosen other visual ways to represent numbers (good ol’ tally marks?), but instead, they took another ingenious path.

Enter the world of sound loan.

Because spoken language is older than written language, people already had a spoken word for the number 4 (sì in modern Mandarin). Incidentally, the word for ‘to exhale’ sounded the same, and, believe it or not, it was originally represented by 四 (trust me, this is a depiction of a nose). Can you connect the dots? There was a character, 四, that already had a meaning (to exhale), but people also started using it for another meaning (the number 4) just because the sounds matched. Now, for the number 4, the character only tells us how to say it; there’s no direct meaning involved! As is often the case with these examples, over time, the original meaning of ‘to exhale’ gradually fell out of use. It was later represented by 呬, with a mouth (口) component added to reinforce the meaning and distinguish it from 四. Even so, the character 呬 eventually became obsolete too, with speakers now opting instead for compound words (terms made of at least two characters).

Evolution of the number 4 (leftmost) and of the character borrowed for it, originally a depiction of a nose meaning to exhale (the rest)

Whether the explanation for the number 4 left you satisfied or disappointed, the truth is that the characters for the rest of the numbers follow the same concept of sound loan. We could now discuss the next number, the number 5, which is represented by the character 五 (wǔ). A sharp reader might try to explain that this character is made up of five strokes, so maybe there were five individual lines at some point in the past. In fact, a more experienced Chinese learner would know that the character is actually written with four strokes (one with a bend), not five, and even if we look at early texts, there are only four well-defined strokes, following a crisscross pattern, which was also the original meaning. While there are some theories that try to explain this choice, the most logical one is still—you guessed it—just a sound loan. The sound for the number 5 happened to match, so the character was borrowed just for its sound.

Evolution of the character 五 now used to represent the number 5

This pattern of borrowing a character only for its sound and using it to represent a meaning completely unrelated to the original character’s meaning is quite common in general—it’s not limited to numbers. For instance, another one of the first characters Chinese learners see is 我 (wǒ), which is used as the first person singular pronoun (‘I’ in English). This character is a depiction of a trident-like weapon. You already know where this is going, don’t you? Yes, just a sound loan.

I am not a linguist myself, but I am at a point where I find knowing the real explanations (i.e., the most widely accepted by scholars) much more satisfying than coming up with a mnemonic story to remember the character and its meaning, even if the latter seems easier. In the end, we still have to learn by repetition. Anyway, I have nothing against mnemonics themselves (Heisig enjoyers, u there?), as long as we acknowledge that the vast majority are made up. However, there is usually only a really thin line between these stories and folk etymology (people thinking these stories are the real origins of the meanings). Maybe some readers have heard the popular explanation for 我—that it depicts a hand (手) holding a weapon (戈) to protect oneself (我). While the story sounds reasonable, scholars believe it to be a reinterpretation of the character into more commonly understood concepts, which is why it falls into folk etymology. In contrast, our sound loan explanation is simple but less sticky, which is why many people don’t find it useful.

Many students who start learning Chinese are so attracted by the characters that they often fall for The ideographic myth4. Thus, methods like Heisig’s Remembering the Hanzi disregard character pronunciation, as if it were acceptable for students to turn a deaf ear to the characters’ voices. I myself used Remembering the Kanji—the equivalent for Japanese characters—when I was learning Japanese, and memorized hundreds of them without knowing how to pronounce them. There is actually a part 2 of Remembering the Kanji which focuses on mnemonic stories for how characters are read, although as far as I know this doesn’t exist in Remembering the Hanzi, which makes that method rather detrimental.

I started with a popular statistic about 80% of Chinese characters being able to speak. This was to catch the reader’s attention, because in reality this percentage doesn’t include the sound loan characters introduced in this section. While not many people know about the widespread use of sound loan in common characters, others do know about another special type—the one actually being referred to when talking about the 80%—and potentially the most important when trying to decipher the characters’ voices.

But my life isn’t totally meaningless

Of course, let’s not forget that at some point in ancient times, before some were borrowed for their sound, all characters did have an original meaning. Recall, for example, the character 马 introduced in the dialogue, which is a depiction of a horse, as a simplified version of 馬. Do you still not see the horse here? Oh well…

Evolution of the 馬 character from left to right

Now the meaning argument goes like this: Chinese characters, when part of larger characters, still contribute their meaning as part of a compound explanation. If we believe this, then naturally, we must find a reasonable story to connect the pieces in the character 玛—we have to find a way to connect the seemingly unrelated meanings of its components gemstone and horse. While certain characters composed only of meaning components—known as associative compounds—are widely accepted, we shouldn’t force this idea onto the vast majority of compound characters, despite many attempts to do so.

Enter the world of phono-semantic compounds.

Our example 玛 (agate5) has a much simpler explanation—though for some, a less satisfying one. The character 马 is pronounced mǎ, and, well, the character 玛 is pronounced—yeah, you guessed it—also mǎ. The character has a meaning component, 𤣩, which provides a small hint—indicating that it’s a kind of gemstone—and a sound component, which suggests its pronunciation but doesn’t necessarily contribute to its meaning. This is how phono-semantic characters work. Believe it or not, you now more or less understand how 80% of Chinese characters work.

Now that we know 马 can be used as a sound component, a Chinese learner might recall two other characters that are taught right at the start.

  • The character 吗 (ma). This is used as a sentence-final particle6 to denote yes-no questions. Particles are often another example of phono-semantic characters. They typically include the mouth 口 (i.e. speech) as the meaning component, along with the sound component. In the case of 吗, our lovely horse provides the sound. Just note that the tone7 is not the same, but as I mentioned earlier, the sound component serves as a hint and doesn’t need to be exactly spot on.
  • The character 妈 (mā), which means mother, usually also doubled as 妈妈. From the previous dialogue, you now know that 女 means woman. What do women have to do with horses? Oh, come on, Chinese characters are not a joke… Fortunately, by this point, you can already understand how this works: a mother is a woman, and 马 provides the sound—it’s as simple as that.

You can see where this is going. There are some characters which are commonly used as sound components of other characters, and 马 is one of them. But they don’t have to be just one type of component. In this case, 马 can also work as a meaning component in many other characters, and in these cases the meaning related to horses should be much more convincing than, say, my made-up story for 玛. A couple of examples (still phono-semantic) of 马 as a meaning component:

  • The character 骑 (qí), which originally meant to ride a horse, but later on extended its meaning to riding in general (horse, motorcycle, bike…). Here, it makes sense that 马 provides the meaning, while the other component, 奇 (qí), provides the sound. But wait, 奇 means ‘weird, unusual’… No, no, no, no, no, there is nothing weird about riding horses. Remember, there’s no need to force meanings—it just provides the sound.
  • The character 驴 (lǘ), which means donkey. Again, 马 provides the meaning (a donkey is similar to a horse), while 户 (hù, meaning ‘door’) provides the sound. Another witty joke? Okay, okay, no more jokes. Here 户 just gives the sound. But are hù and lǘ that similar sounds…? Well, you decide. But do they really need to be? Not necessarily. How come? Bear with me—we’ll go back to this example in the next section.

The sharp reader may have noticed that when the character works as a meaning component it appears on the left, and when it works as a sound component it appears on the right. This is a nice observation, and is true most of the time, so it can be used as a rule of thumb. However, always be careful because there are also exceptions where this pattern doesn’t hold.

The set of other characters where a specific character is used with the same function is called a series. In the case of 马, if you’re interested in more examples, the (probably non-exhaustive) series are:

  • The sound series: 吗 (ma), 妈 (mā), 玛 (mǎ), 码 (mǎ), 蚂 (mǎ), 骂 (mà).

  • The semantic series: 骑, 驴, 验, 驻, 骗, 驱, 驶, 驾, 冯, 驳, 骤, 驰, 闯, 骚, 骄, 骇, 驼, 驯, 骆, 笃, 骏, 骡, 驭, 驮, 驹, 骁.
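The idea of a series can be sketched in a few lines of Python. This is only an illustration: the table below contains just the five decompositions discussed in this section, and the names `DECOMPOSITIONS` and `series` are made up for the example rather than taken from any dictionary or library.

```python
# Decompositions taken from the examples in this section:
# character -> (meaning component, sound component)
DECOMPOSITIONS = {
    "吗": ("口", "马"),
    "妈": ("女", "马"),
    "玛": ("𤣩", "马"),
    "骑": ("马", "奇"),
    "驴": ("马", "户"),
}


def series(component, table=DECOMPOSITIONS):
    """Split the characters that contain `component` by the role it plays."""
    sound = [ch for ch, (_, s) in table.items() if s == component]
    meaning = [ch for ch, (m, _) in table.items() if m == component]
    return {"sound": sound, "meaning": meaning}


result = series("马")
print(result)
```

With this tiny table, 马 lands in the sound series of 吗, 妈 and 玛, and in the semantic series of 骑 and 驴. A real dictionary would need a far larger, hand-checked table.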

While the number of characters commonly used as meaning components is fairly manageable, the number of those used as sound components is, I think, way too large. However, I don’t recommend trying to memorize either the sound or the semantic series of characters, since I think this is something you pick up naturally while encountering the characters themselves; I would instead focus on learning the most common characters. For instance, in the sound and semantic series for 马 presented above, I could probably identify the function of 马 in each character without knowing the character itself. Honestly, I don’t even know most of them, and it would be useless to try to learn them just because they happen to be in the same series.

At the end of the last section, I mentioned Remembering the Hanzi as a book that seemed to ignore the phonetic aspect of characters. Thanks to some fellow learners, I also became aware of a similar book in Spanish by Pedro Ceinos titled Caracteres chinos (Chinese characters). This one actually does seem to introduce phonetic components, trying to group the characters by their series, but it always tries to enforce a meaning along with the sound. In the book’s introduction he explains this point of view, which he shares with a few other scholars, that no sound component comes without a slight hint of additional meaning to the overall character, effectively saying that almost all sound components are also meaning components at the same time. In addition, even though he mentions the concept of sound loan, in the actual characters where this happens he doesn’t identify it, relying only on meaning-based arguments.

While most scholars believe that sound loans and sound components can stand on their own without adding any additional meaning—just like I tried to convey here—I still think authors like Ceinos are laudable for their hard work. He himself states that he’s no linguist, and his book is a result of his own experiences and beliefs, not necessarily aligned with other linguists’ point of view, which he is well aware of.

By now, I think you can’t deny that characters do speak, and sound is a crucial part of most of them. Unfortunately, you may still encounter some tricky characters whose voices you will struggle to understand. Although this can be quite challenging, we will devote the next section to at least try to decipher some of them.

I think I’m just a bit empty

In this article, I’m focusing on simplified Chinese characters, which are used in mainland China, Singapore, and Malaysia, because that’s what most foreign learners start with, although I think experienced learners should know both simplified and traditional scripts. For example, 馬 is the traditional horse, while 马 is the simplified counterpart. Traditional characters are now mainly used in Taiwan, Hong Kong and Macao. Until the last century, they used to be a single script with maybe some character variants. We won’t try to address why and when they changed but rather what they changed and how these changes affected the understanding of the characters’ etymology.

In the previous section, we saw 驴 (lǘ) as an example of a phono-semantic character, where 户 (hù) gave the sound. While you could believe this works as a sound hint, another person might argue they are quite different. While in some cases the answer won’t be too satisfying, in this case, it’s quite simple. The traditional version of 驴 is 驢, and it turns out that the phonetic component’s (盧) sound is lú, which seems much more accurate. What happened here?

Enter the world of empty components.

When characters are simplified in one way or another, we’re doomed to lose some important information about them. In the previous example, the simplified component 户 can be described as both an empty component and a sound component. When simplifying characters, there are a few ways to do it. In this case, 户 doesn’t look excessively similar to 盧, nor does its sound, but together they somehow work as a simplified alternative. It can be called empty because it lost its original form, although it can also still be called a sound component because it still conveys a small sound hint. Simplifying characters often comes at a small cost—they become harder to understand logically.

Other times, however, the simplified character works quite well without losing any significant parts. Let’s have a look at 认 (rèn), which can mean to know (part of the common compound 认识, rènshi, to know), and whose traditional character is 認.

  • The meaning component is 讠, which is the simplified version of 訁, meaning speech or something related to speech, giving the overall meaning to recognize, to admit, to know. One of the common simplification techniques is the regularization of cursive script. In fact, our lovely 马 also comes from a cursive variant! In some cases, this may still look similar to the original character and thus not be considered an empty component. Additionally, this simplification of some characters is consistent and always used this way when it works as a component in other characters.
  • The sound component is 人 (rén), which is totally not a reasonable simplification of 忍 (rěn) when it comes to its form, and can thus be considered an empty component. But keep in mind that its function was only as a sound hint, and the simplified alternative also does this pretty well, so it’s still a perfect sound component in the simplified character. We couldn’t ask for more. Also, remember that it’s fine for characters to work as either meaning or sound components, so in this case, just because 人 (meaning person) is almost always a meaning component (written as 亻) when part of other characters, you don’t need to force a meaning, it can also just work as a sound component.

Was that okay? Well, now for the real mess… A character that often appears as an empty component in simplified characters is 又 (yòu), which would by itself be a depiction of a right hand, and looks something more similar to ㄡ when used as a component on the left side. Let’s see some examples in characters that may be familiar even to inexperienced Chinese learners:

  • The character 汉 (hàn), which can now reference the Han ethnic group, and is also part of the word 汉字 (hànzì), which means Chinese character. The 又 is an empty component and used to be the sound component 𦰩 (jiān) for the traditional variant 漢.
  • The character 难 (nán), which now means difficult just by pure sound loan. However, before that, its original meaning was a type of bird, with 又 being an empty component, simplification of 𦰩 (jiān) from the traditional variant 難. What? The same one as for 汉? Then it’s always like this? Ha! Not even close… Just wait and see the next ones.
  • The character 对 (duì), which now means yes or to face, among other things. Here the 又 is an empty component, simplification of 丵 and 土 in 對. It used to represent something related to developing the land, certainly not a sound component, maybe later a sound loan, but well, I won’t even try to explain this one because I don’t even get it myself.
  • The character 欢 (huān), which can mean happy. Here the 又 is—you guessed it—an empty component, simplification of 雚 (guàn), a sound component in the traditional… What? Well, somehow there are four different variants in the traditional script—歡, 驩, 懽, 讙. At least the sound component is the same in all of them…
  • The character 观 (guān), which means to observe. Your good friend 又 is yet again here trying to hide another sound component, this one again being 雚 (guàn), from the traditional variant 觀.
  • The character 鸡 (jī), which means chicken. You obviously know what function 又 plays, and it used to be 奚 (xī), a sound component for the two different variants 鷄 and 雞 of the traditional script.
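To see how many unrelated components hide behind 又, the list above can be written down as a small lookup table and grouped by the original component. This is a rough sketch that only covers the six examples just discussed; the names `YOU_AS_EMPTY` and `group_by_original` are made up for the illustration, and only one traditional variant is listed per character.

```python
from collections import defaultdict

# simplified character -> (one traditional variant, component that 又 replaced)
YOU_AS_EMPTY = {
    "汉": ("漢", "𦰩"),
    "难": ("難", "𦰩"),
    "对": ("對", "丵+土"),
    "欢": ("歡", "雚"),
    "观": ("觀", "雚"),
    "鸡": ("雞", "奚"),
}


def group_by_original(table):
    """Group simplified characters by the component that 又 stands in for."""
    groups = defaultdict(list)
    for simplified, (_traditional, original) in table.items():
        groups[original].append(simplified)
    return dict(groups)


groups = group_by_original(YOU_AS_EMPTY)
for original, characters in groups.items():
    print(original, "->", " ".join(characters))
```

Even in this short list, 又 stands in for four different originals, which is exactly why it gives no reliable hint about sound or meaning.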

Phew! That really was a mess… And believe me, there are like ten more characters where 又 is an empty component. When confronted with these difficulties, one can’t help but think that maybe traditional characters are not that hard, since their components are much more consistent.

Another unfortunate example of empty component can be found in 续 (xù), which can mean to continue. Here, 卖 (mài) is supposed to be the sound component. Well, this doesn’t sound quite similar… Its traditional form is 賣. It turns out that this is accurate for the single character 卖, but for our original one 续/續, the component 賣 was actually a simplification of 𧶠8 (you’d better zoom your screen if you can’t tell the difference), which is pronounced yù (much more similar to xù). Therefore, the sound component was quite accurate at some point in the past but turned into an empty component when it was simplified to the unrelated 賣, leaving us apparently with two different sounds for the same character. But now you know the reason.

Seal scripts for 续/續 (left), its actual sound component 𧶠 (middle) and the real 賣 (right)

We’re bringing the third act to a close, but there’s still one thing I want to talk about. In the first sentence of the article, I mentioned that characters can produce utterances of Old Chinese understood only by scholars. Although not directly related to empty components, this phenomenon can also disguise the real intentions, just like empty components do. I will try to explain what I meant with another example.

Consider the character 也 (yě), which can now be used by sound loan to mean also or can be used as a particle6. The following is, again, a non-exhaustive list of its sound series: 他 (tā), 她 (tā), 牠 (tā), 地 (dì), 施 (shī), 池 (chí), 驰 (chí), 弛 (chí), 㐌 (yí), 匜 (yí). What? There are at least five different sounds here! What’s going on? Well, as almost always, the explanation can be quite simple, although not too satisfying.

You see, the sounds I showed you are the evolved modern sounds used in Mandarin for those characters. Thousands of years have passed, and spoken language evolves, so in the past, all of them would have had a similar sound, but they evolved into quite different sounds as time passed. For instance, some of them in Old Chinese are transcribed as *l̥ʰaːl, *l’els, *hljal, *l’aːl, *l’al, *hljalʔ, *lal. No, no, no, I’m not an expert in Old Chinese. I just thought you would like to see the transcriptions yourself (see this phonetic series table). Unless we happen to be linguists ourselves, trained in the recognition of sound evolutions, this can just be a pain, but the only workaround is to acknowledge that these characters now have multiple voices, and we should take all of them into account. In the end, I guess we should just trust phonologists on this one.
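If you want to count the modern voices of the 也 series yourself, here is a minimal Python sketch. It uses only the ten readings listed above and groups them by syllable after dropping the tone mark; the names `YE_SERIES`, `strip_tone` and `group_by_syllable` are made up for this example.

```python
import unicodedata
from collections import defaultdict

# Mandarin readings of the 也 sound series, as listed in this section
YE_SERIES = {
    "他": "tā", "她": "tā", "牠": "tā", "地": "dì", "施": "shī",
    "池": "chí", "驰": "chí", "弛": "chí", "㐌": "yí", "匜": "yí",
}


def strip_tone(pinyin):
    """Remove tone marks by decomposing and dropping combining characters.

    Note: this would also strip the dots of ü, which is fine for the
    syllables used here but not for pinyin in general.
    """
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def group_by_syllable(readings):
    """Group characters by their reading with the tone removed."""
    groups = defaultdict(list)
    for character, pinyin in readings.items():
        groups[strip_tone(pinyin)].append(character)
    return dict(groups)


voices = group_by_syllable(YE_SERIES)
print(len(voices), sorted(voices))
```

The grouping gives five syllables (ta, di, shi, chi, yi), none of which matches the yě of 也 itself. The code only describes the modern readings; it says nothing about how they diverged from the Old Chinese ones.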

What a beautiful recital!

By now, I just hope I didn’t disappoint you when I claimed that Chinese characters can speak. I will be satisfied if I helped you train your ear just a tiny bit so that next time you encounter a character in the wild, you’ll have a higher chance of deciphering its voice. Let’s review what we learned here.

  • The sound loan phenomenon. Sometimes characters are borrowed just for their sound, as simple as that. They had an original meaning but it started being substituted by another one whose spoken word sounded the same, perhaps because it was too abstract of a concept to build a real pictogram for. Example: 四 (sì), originally to exhale, but now four (sì), because it had the same sound.
  • The phono-semantic compound. Roughly 80% of Chinese characters are built from two blocks: a meaning component, which gives a small hint for its meaning, and a sound component, which gives a small hint for how it’s read. Example: 妈 (mā), from 女 (woman), meaning component, and 马 (mǎ), sound component.
  • The empty component. Sometimes a component undergoes a simplification process that loses the clearer sight of its function in the character, usually conveying misleading information (mostly about its sound). Example: the excessively large number of unrelated components that 又 simplifies, like 汉, 难, 对, 欢, 观, 鸡, etc.

You’re now closer to becoming the conductor of the characters’ orchestra!

It was never my intention to make you think that learning and understanding Chinese characters is easy. Again, I still remember many years ago when I would immerse myself in playful stories and crafty mnemonics, with the sole purpose of getting my brain to remember a bit of these Japanese characters, which for some reason the Japanese thought were a good idea to borrow from the Chinese hundreds of years ago. But at some point, I quit Heisig’s method out of pure boredom. I just wanted to read.

I started changing my strategy when I met etymology—real etymology. Although there were still some parts of the puzzle that I couldn’t put together at times, overall, the process of understanding the characters became increasingly clear. I was really starting to hear (and enjoy) the characters’ voices to the point that they would end up feeling like a real alphabet—not much different from a Latin alphabet (ok, ok, just a few thousand more symbols), but still a millennia-old script system that initially built itself on the pretense of pictographic roots yet eventually kept growing systematically out of the need to represent abstract and complex concepts that would otherwise prove impossible without stealing some voices from here or there, recycling some characters all over again, building their own monumental hieroglyphs, inducing false fantasies in the marveled Westerners whose eyes dreamed of intellectual paintings, metaphysical and ideal pictorial algebras of the sciences and arts9, but still, down to its core, just another way to convey the power and beauty of language.


Bonus: Character etymology resources

Pssst, the show is over… If you keep reading this, I’m assuming I somehow caught your attention and you’re interested in knowing more about Chinese character etymology. After this long journey, I wouldn’t want you to conclude that I wrote the whole thing just as a shameless advertisement. This post was born for three main reasons:

  • I barely ever write. But I wanted to give it a try. I just wanted to convey my interests in one topic and make an attempt at some serious writing. I can now say I enjoyed the process, and I hope I’ll keep writing in the future.
  • I really got into real etymology when stumbling upon Outlier Linguistics. Yeah, yeah, this is a paid service, blah, blah, blah. Since it has been so useful to me, I also wanted to present it briefly here but not in the main article, lest it feel like I’m trying to sell you something. If you’re curious and willing to spend a bit of money, maybe you could also read the second subsection.
  • Once I got to grasp the characters’ voices myself, and also being aware of many people who still haven’t, I just wanted to put my two cents here and link them to this so that they could widen their vision a little bit.

Let’s go to the resources!

I’m not into pay-to-win, there’s plenty of free stuff on the internet

Yes, you’re right! I’m also fond of free resources, particularly when you’re a student and can’t afford any paid alternatives. I have some interesting ones for you that I also use from time to time:

  • Wiktionary: I wasn’t paying too much attention to it in the past, but believe me, I used it a lot when writing this article. When looking at entries for Chinese characters, I mainly use the section called Glyph origin, which is where you can find information like the one I present here. Look, for example, at this entry for 四, which I used to get some info for our sound loan introduction. There are probably not too many places that offer such varied knowledge while still being detailed and quite accurate.
  • Hanziyuan: From 汉字源 (hànzìyuán), meaning source of Chinese characters. This is a lifelong project by Richard Sears, popularly known as Uncle Hanzi in China. He presents an astounding database of different versions of each character throughout history to see their evolution, along with additional data about the character, some related to its etymology and decomposition, though this information can sometimes feel a bit cryptic. The site can feel a bit old, but it’s still incredibly relevant. On the main page, there are also a few articles linked, but the links are broken. You can access them here. These are also an invaluable resource to get started in etymology and might also contain some additional explanations to better understand the sometimes cryptic information found on the site.
  • Dong Chinese dictionary: Okay, I must confess I haven’t used this one myself a lot, but probably because I only found it recently. I had a look at some dictionary entries and the origin explanations seem accurate. For instance, the entry for 四 is also in line with what we explained. It seems they take the origins seriously, as I read through this brief post. Incidentally, they also mention a resource I wasn’t aware of, which they describe as The most reliable and comprehensive source for character origins available online that I’ve found. I’ll include it here as well, again with our example for 四 . That one is only in Chinese—fortunately, we live in a time when we can use tools like ChatGPT for accurate translations, so that shouldn’t be a problem.
  • Zhongwen: When talking about Chinese character etymology, it’s impossible not to mention the Shuowen Jiezi, a character dictionary compiled by Xu Shen around 100 CE, which continues to be a relevant source to this day, even though some of its character explanations may now be considered outdated due to more recent research. Many tools try to break down characters into radicals, but doing so automatically usually results in a horrible and meaningless decomposition. On this site, you can navigate through a tree containing all characters and their inclusion relations as compiled manually by the Shuowen Jiezi. Each entry also contains a little explanation from the book itself and vocabulary examples. Keep in mind that the general tree is presented in traditional script, but the simplified version is also shown for each entry. Also, remember this is faithful to the Shuowen Jiezi, so some explanations may be outdated.
  • Shuowenjiezi: Tired of seeing short explanations from others who have read the Shuowen Jiezi for you? Then why not read it yourself? Okay, this is a hardcore resource. Here, you can access the entire Shuowen Jiezi in Chinese, along with commentaries from Duan Yucai’s annotated edition from 1815 and many other sources. You must really like etymology if you end up here. Again, if you don’t read Chinese, you can just translate it, no big deal. Another thing I find awesome (and value in any dictionary) is that you can click any character in the text and go straight to the entry for that character. A paradise for linguists and etymology enthusiasts.

Let’s acknowledge and appreciate the immense effort that so many people have put into compiling these resources and offering them for free. The world is a better place thanks to people like them.

I value my time, take my money and give me the etymology

After such a useful list of free resources, what are we left with? Can anything be better than that? Well, it depends on what you consider better. Apart from the Dong Chinese dictionary and, to a lesser extent, Hanziyuan and Zhongwen, the vast majority of reliable etymology resources are aimed at academia and research rather than at Chinese learners. A Chinese language student might therefore find them overwhelming (I’m aware you might have felt the same when reading some parts of my article, and I’m sorry for that).

As much as I find all of this interesting, you would have a point if you said most of the information is not very helpful for learning the language. Unfortunately, I don’t remember when or how it happened, but at some point while I was learning Japanese I stumbled upon Outlier Linguistics. I found myself watching their How Chinese characters actually work playlist and found it enlightening. Then I also read through some of their posts (the equivalents for Japanese, but I’m linking the ones for Chinese here), notably part 1 and part 2 (they have more nice free-to-read posts there if you’re interested).

The team at Outlier Linguistics is made up of two people: Ash Henson, who has a PhD touching many fields of Chinese linguistics, and John Renfroe, who basically got into Sinology by accident, but nothing could stop him until he found himself doing research in Classical Chinese, Chinese paleography, history and archaeology. Surely this pair must know something about Chinese character origins.

These two put their efforts into building a reliable dictionary of Chinese character etymology that keeps up to date with research and provides only the information learners need to understand how characters are formed, without going into too much detail. I find this systematic approach extremely convenient when I’m reading a text and want to know about an unfamiliar character. I can look it up in less than 30 seconds from my phone (it integrates with the Pleco app).

If you believe in paying for a service that makes your life easier and you can afford it, I would highly recommend The Outlier Dictionary of Chinese Characters (the Essentials edition). The Expert edition contains much more information, in the spirit of the more detailed research resources I presented before, but since that takes a lot of work, for now they only have this extra information for a subset of the characters. In any case, an amazing feature is that, even in the Essentials edition, they include at least one reference to a technical book as a source for their claims, which makes it extremely reliable and gives you a starting point if you really want to delve deeper. For learners, though, the Essentials edition is enough. Only consider the Expert edition if you use the Essentials and absolutely love it.

Of everything I have presented, this is without a doubt the resource I use the most, and I also used it for some of the information included in this article (trying not to abuse it too much and to draw from open resources as well), so I hope the authors can forgive me for having devoted a whole subsection to their amazing dictionary. I won’t try to explain anything else; they already do it a hundred times better than I could.

Lastly, for those of you learning Chinese, if you’ll accept a little advice from a fellow learner: just go and read. Use all these resources as a helpful secondary tool. Only turn to them when you find characters you don’t know. Don’t waste time trying to memorize words, or worse, individual characters. The best way to remember a word or character is through the frustration of not having been able to understand it in context. And if you don’t understand the sentence, fear not! Tell ChatGPT to break it down and explain the grammar points. You will grasp them in no time. Just go and read. Personally, I love graded readers. You can see some recommendations here (I’m particularly fond of the Chinese Breeze series) and also learn more about extensive reading.

Enjoy the journey.


Images sourced from Wikimedia Commons, created by various contributors. Available sources: Sinica Database, Chinese Etymology, Multi-function Chinese Character Database, Chinese Text Project, and Chinese Linguipedia. Public Domain.
  1. There is a subtle difference in wording here. First, characters like 一, 二 and 三 should be called ideograms, since they represent abstract ideas rather than pictures of real things. The other examples, however, should be called pictograms. While the terms pictogram and pictograph are sometimes used interchangeably, I would argue that pictograph is more precise when referring to ancient historical representations like cave paintings. In contrast, pictogram is more appropriate when discussing characters within a writing system. 

  2. I’ve heard this explanation many times, and while it might seem reasonable, early depictions in texts suggest otherwise. The pictogram appears to more closely resemble a human figure with breasts, either kneeling or standing—not sitting with crossed legs. In that case, the inferred gender would be better explained by the presence of breasts, which would likely be a more convincing argument for our confused student. 

  3. If you didn’t already know the characters and tried to guess them, you can check a table of the numbers here. If you’d like to know a bit more about the numbers, this article is quite interesting. 

  4. This is one chapter from The Chinese Language: Fact and Fantasy by John DeFrancis, who seems to be controversial at times. I can’t tell you why, since I have only read this specific chapter, so I can’t speak for the rest, but at least this one I found quite inspiring. 

  5. The meaning actually comes as an abbreviation of 玛瑙 (mǎnǎo), from an earlier form, 马脑, which seems to be a mistranslation from Sanskrit, mistakenly understood as horse’s brain. So while originally even the horse part could have an explanation, here we only focus on the modern character. See the Wiktionary entry for more details. 

  6. For the reader not familiar with particles, just think of them as characters without a real concrete meaning, which are used to give a sentence or a part of it a certain function or emotion. 

  7. For the reader not familiar with tones, just keep in mind that the sound (what you say) is the same, but the intonation (how you say it) can be different. In Mandarin pinyin (the romanization of Chinese characters) we denote the intonations with different marks on the vowels. 

  8. This character is itself a simplification. Initially I wanted to show here the absolute beast that is the older character 𧸇, but most likely you can’t see it on your device unless you have a special font installed that supports old characters. It’s not your fault. If you can’t see the one in this footnote, you can get an idea of what it looks like from the images I included. 

  9. Metaphors from an excerpt of Mémoires concernant l’histoire, les sciences, les arts, les mœurs, les usages, &c. des Chinois, par les missionnaires de Pékin, which I read in the article The ideographic myth.