Teaching a computer to think like a human



In this series, NUS News profiles the University’s Presidential Young Professors who are at the forefront of their research fields, turning creative ideas into important innovations that make the world better.


What is easier to teach: a computer or a child? Maybe a computer, because it can execute commands perfectly. Maybe a child, because most of us know how to speak to another human but may not know how to write complex code. Assistant Professor You Yang of NUS Computer Science, a Presidential Young Professor, says that it depends on the task.

We are at a stage of technological development where it is easy to teach a computer basic tasks such as elementary image recognition. However, computers still struggle to learn concepts that require higher-order thinking.

The human brain is a spectacularly complicated organ, and what comes naturally to people is difficult to teach a computer. Take language. At six months old, a baby can babble “ma-ma” and repeat the sounds they hear. By the age of two to three, the child begins to use prepositions that indicate the position of an object (“the cat is in the box”) and pronouns (“you”, “me”, “her”), and can respond to simple questions.

These actions require a high level of cognition. Language is made up of grammar, context and tone, as well as the literal meaning of individual words. In fact, the human brain is so advanced that within five years of casual learning and observation, most children can carry on a basic conversation. The same goes for a human’s ability to make sense of the images they see in the world. So, how do we teach a computer to see the world the way a human does?

This is the scope of Asst Prof You’s work: developing and improving upon artificial intelligence (AI) and machine learning. This area of research is part of NUS’ research thrust of developing capabilities to realise Singapore’s vision of becoming a smart nation, and supports the Smart Nation and Digital Economy domain of Singapore’s Research, Innovation and Enterprise 2025 Plan.

Fascination with AI

When Asst Prof You was 15, he watched Google and Facebook’s explosive growth on the news, and the young teenager knew computers were going to be the future. Later, while studying computer science at Tsinghua University, he attended a talk about the future of AI by Andrew Ng, then Chief Scientist at Baidu. It became evident that AI would dominate technology in the coming century, with uses in everyday applications like Google Translate and YouTube’s recommendation algorithm. AI would also form the foundation for future innovation.

“Tesla needs a supercomputer to train an AI system for their self-driving cars,” Asst Prof You said, “But the future may be even more crazy – we could have individual self-flying machines.”

To achieve these large technological feats, scientists must be able to train AI models accurately and efficiently, a field in which Asst Prof You has made headlines. In 2017, his team broke the ImageNet training speed record, and two years later he broke the BERT training speed record. These contributions earned him a place on the Forbes 30 Under 30 Asia list in 2021, the same year he was awarded the Presidential Young Professorship at NUS.

Asst Prof You says that training a neural network is similar to teaching a child to read by giving them books, except that instead of books, we give the AI data to learn from. Training takes time, and speeding it up often sacrifices the accuracy of the end result. For example, if we want a child to read 1,000 books in total, we can either give them 10 books a day for 100 days or 100 books a day for 10 days. However, when a child has too many books to read at once, we cannot be sure that the child is learning the correct content. In other words, increasing the “batch size” often decreases accuracy.
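The trade-off in the books analogy comes down to simple arithmetic. The Python sketch below assumes a hypothetical dataset of 1,000 examples standing in for the 1,000 books, and the helper names are illustrative only. It shows that a bigger batch means fewer update steps to get through the whole dataset once.

```python
import math

def steps_per_epoch(num_examples, batch_size):
    """Number of training steps needed to see every example once."""
    return math.ceil(num_examples / batch_size)

def batches(examples, batch_size):
    """Yield consecutive batches; the last one may be smaller."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]

books = list(range(1000))  # hypothetical stand-ins for 1,000 training examples

for batch_size in (10, 100):
    steps = sum(1 for _ in batches(books, batch_size))
    print(f"batch size {batch_size}: {steps} steps, "
          f"{steps_per_epoch(len(books), batch_size)} expected")
```

Fewer, larger steps are what make big batches attractive when many processors can share the work of each step. The catch, as the analogy suggests, is that simply enlarging the batch tends to hurt final accuracy.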

Asst Prof You helps to solve this problem by creating an optimiser that allows computer scientists to train neural networks quickly without sacrificing performance.

Training computers faster and more accurately

So what is an optimiser? Take the example of a neural network learning to translate from English to French. Say you wish to translate the sentence “I like basketball” (Asst Prof You says he enjoys watching basketball in his spare time). You could simply translate each word into its French equivalent: “I” is “je”, “like” is “aimer” and “basketball” is just “basketball”. However, French speakers know that “Je aimer basketball” is grammatically incorrect. The correct French translation is actually “J’aime le basketball”.

Hard-coding all the small and unique rules of grammar and context would take ages and probably would not yield good results. It is much better if we can somehow teach English and French to the computer. With neural networks, scientists feed the AI system a huge bank of English sentences and their correct French translations. These sentences are converted into “vectors”, which are lists of numbers that the computer can understand and work with. The computer then maps the English-language vectors onto the French-language vectors using a mathematical function (i.e. an equation). This function is the computer’s way of translating English into French. A “decoder” then converts the resulting numbers back into words and sentences.
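To make “vectors” and “functions” a little more concrete, here is a deliberately tiny Python sketch. The three-word vocabularies, the one-hot encoding and the hand-written matrix `W` are all illustrative assumptions, not how a production translation system works: real systems learn dense vectors and far more complex functions over whole sentences from data. Because this toy function works word by word, it reproduces the ungrammatical “je aimer basketball”, which is exactly the limitation that learning from many example sentences is meant to overcome.

```python
# Hypothetical toy vocabularies; real systems use tens of thousands of words.
EN = ["I", "like", "basketball"]
FR = ["basketball", "je", "aimer"]

def one_hot(index, size):
    """Encode a word as a vector: all zeros except a 1 at the word's index."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def encode(sentence):
    """Turn an English sentence into a list of vectors, one per word."""
    return [one_hot(EN.index(word), len(EN)) for word in sentence.split()]

# The "function": a matrix W mapping each English vector to a French vector.
# It is written by hand here; a neural network would learn it from data.
W = [
    [0.0, 1.0, 0.0],  # "I"          -> "je"
    [0.0, 0.0, 1.0],  # "like"       -> "aimer"
    [1.0, 0.0, 0.0],  # "basketball" -> "basketball"
]

def apply_function(vector):
    """Multiply a row vector by W to get a French-language vector."""
    return [sum(vector[i] * W[i][j] for i in range(len(EN))) for j in range(len(FR))]

def decode(vectors):
    """Turn French vectors back into words by picking each vector's largest entry."""
    return " ".join(FR[max(range(len(v)), key=v.__getitem__)] for v in vectors)

print(decode([apply_function(v) for v in encode("I like basketball")]))
```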

Of course, these functions are rarely simple: they may work well on some examples but fail on others. So we need a way to measure, and then improve, how accurate they are. This is measured by a “loss function”: the lower the loss, the better the result. But minimising loss is hard for a computer. How does it know whether its current loss is the smallest possible? How does it know that there is no better function that would give a more accurate translation? The procedure that decides how the model adjusts its function, step by step, to reduce the loss is called an optimiser.
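A minimal sketch of what an optimiser does, written in Python: plain gradient descent fitting a straight line to made-up data. The data, the starting guess and the learning rate are assumptions chosen for illustration. At each step, the gradient says in which direction the loss increases, and the update moves the parameters the opposite way. In general, gradient descent only promises to reach a point where the loss stops decreasing; for this simple, bowl-shaped loss, that point happens to be the true minimum.

```python
# Hypothetical training data: targets follow the rule y = 3x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x + 1.0 for x in xs]

def loss(w, b):
    """Mean squared error: the average squared gap between prediction and target."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradients(w, b):
    """How fast the loss changes as w and b change (partial derivatives)."""
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    dw = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    db = 2 * sum(errors) / len(xs)
    return dw, db

w, b = 0.0, 0.0        # arbitrary starting guess
learning_rate = 0.05   # step size, picked by hand for this example
for step in range(2000):
    dw, db = gradients(w, b)
    w -= learning_rate * dw   # step each parameter against its gradient
    b -= learning_rate * db

print(f"w = {w:.3f}, b = {b:.3f}, loss = {loss(w, b):.2e}")
```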

Computers need to adjust what they have learned when they come across examples they get wrong, but these adjustments should be neither too small nor too large. A good optimiser makes sure that a neural network makes appropriately sized adjustments to what it learns.
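The size of each adjustment is usually controlled by a number called the learning rate. The Python sketch below uses a made-up one-parameter loss, `w ** 2`, whose minimum is at zero, and three arbitrarily chosen learning rates. It shows why the size matters: too small and progress is slow, about right and the loss drops quickly, too large and every step overshoots, so the loss grows instead of shrinking.

```python
def run(learning_rate, steps=50, w=1.0):
    """Minimise loss(w) = w**2 with gradient descent; return the final w."""
    for _ in range(steps):
        gradient = 2 * w          # derivative of w**2
        w -= learning_rate * gradient
    return w

for lr in (0.01, 0.4, 1.1):
    w = run(lr)
    print(f"learning rate {lr}: final w = {w:.3g}, loss = {w * w:.3g}")
```

With a learning rate of 1.1, each step multiplies `w` by -1.2, so the value flips sign and grows, which is the “too large” failure in miniature.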

Asst Prof You devised two optimisation techniques: Layer-wise Adaptive Rate Scaling (LARS) and Layer-wise Adaptive Moments optimisation for Batch training (LAMB). Both made neural network training faster by allowing much larger batch sizes to be used without degrading performance. The training time for BERT (a natural language model) was reduced from three days to just 76 minutes. Similarly, the time to train an image-recognition model on ImageNet (a large dataset of labelled images) was reduced from 14 days to 14 minutes. Scientists from Google, Microsoft and NVIDIA have since used Asst Prof You’s techniques to further improve training speeds for BERT and ImageNet.
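The central idea commonly described for LARS and LAMB is a per-layer “trust ratio”: each layer’s step is scaled by the size of that layer’s weights, so that layers with large weights and layers with small weights change by a similar proportion. The Python sketch below is a simplified illustration under our own assumptions: LARS without momentum, LAMB without the optional clipping of the trust ratio, hyperparameter defaults chosen only for illustration, and made-up layers and gradients. It is not Asst Prof You’s reference implementation.

```python
import math

def norm(vector):
    """Euclidean length of a vector stored as a list of floats."""
    return math.sqrt(sum(x * x for x in vector))

def lars_step(w, grad, lr=0.1, eta=0.001, weight_decay=1e-4):
    """One simplified LARS update for a single layer (momentum omitted)."""
    w_norm, g_norm = norm(w), norm(grad)
    # Layer-wise trust ratio: big weights allow big steps, small weights small ones.
    if w_norm > 0 and g_norm > 0:
        local_lr = eta * w_norm / (g_norm + weight_decay * w_norm)
    else:
        local_lr = 1.0
    return [wi - lr * local_lr * (gi + weight_decay * wi) for wi, gi in zip(w, grad)]

def lamb_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999,
              eps=1e-6, weight_decay=0.01):
    """One simplified LAMB update for a single layer; returns (w, m, v).

    t is the step count, starting at 1.
    """
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, grad)]       # mean of gradients
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, grad)]  # mean of squares
    update = []
    for wi, mi, vi in zip(w, m, v):
        m_hat = mi / (1 - beta1 ** t)   # bias correction for the running means
        v_hat = vi / (1 - beta2 ** t)
        update.append(m_hat / (math.sqrt(v_hat) + eps) + weight_decay * wi)  # Adam-style direction
    w_norm, u_norm = norm(w), norm(update)
    trust_ratio = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    new_w = [wi - lr * trust_ratio * ui for wi, ui in zip(w, update)]
    return new_w, m, v

def relative_change(old, new):
    """Size of an update compared with the size of the layer's weights."""
    return norm([n - o for o, n in zip(old, new)]) / norm(old)

# Two hypothetical layers whose weights differ in scale by a factor of 100.
small_layer = [0.01 * ((-1) ** i) for i in range(100)]
large_layer = [1.0 * ((-1) ** i) for i in range(100)]
grad = [0.5 + 0.1 * (i % 3) for i in range(100)]  # same made-up gradient for both

for name, w in (("small", small_layer), ("large", large_layer)):
    lars_w = lars_step(w, grad)
    lamb_w, _, _ = lamb_step(w, grad, [0.0] * 100, [0.0] * 100, t=1)
    print(f"{name} layer: LARS change {relative_change(w, lars_w):.2e}, "
          f"LAMB change {relative_change(w, lamb_w):.2e}")
```

In this toy run, both layers change by roughly the same fraction of their size (about 0.01 per cent per step with the LARS settings, 0.1 per cent with LAMB), even though one layer’s weights are 100 times larger. That per-layer balancing is the usual intuition for why these methods tolerate the higher learning rates that very large batches call for.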

Breaking new ground in distributed computing

Besides smashing records in training times, Asst Prof You’s work encompasses a wide gamut of other AI research. He runs several projects at NUS Computing, and one of his favourites is developing a computer system that allows code written for a single computer to be easily transferred onto a distributed computing system.

Often one computer is not enough to run or train an AI model. Instead, scientists must split tasks and datasets across multiple servers, but getting many computers to communicate with each other efficiently is currently very hard. Asst Prof You’s work brings science one step closer to surmounting these challenges in distributed computing.
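One common way to split training across machines is data parallelism: each machine (“worker”) holds a share of the data, computes how the model should change based on its share, and the workers then combine their results. The Python sketch below simulates this in a single process; the dataset, the one-parameter model and the four workers are assumptions for illustration, and this is not the system developed in Asst Prof You’s group. It shows why the approach works (the combined gradient matches what one machine would compute on all the data) and where the difficulty lies: the combining step is the communication between computers that is hard to do efficiently.

```python
import random

random.seed(1)
# Hypothetical training set for a one-parameter model y = w * x (true w = 2).
data = [(x, 2.0 * x) for x in (random.uniform(-1, 1) for _ in range(1000))]

def gradient(w, examples):
    """Gradient of the mean squared error of y = w * x on some examples."""
    return sum(2 * (w * x - y) * x for x, y in examples) / len(examples)

def split(examples, num_workers):
    """Deal the examples out to workers as evenly as possible."""
    return [examples[i::num_workers] for i in range(num_workers)]

def data_parallel_gradient(w, examples, num_workers):
    """Each worker computes a gradient on its own shard; results are combined.

    The combination is weighted by shard size so it equals the full-batch
    gradient. In a real cluster the shards would sit on different machines
    and this step would be network communication (e.g. an all-reduce).
    """
    shards = split(examples, num_workers)
    local = [gradient(w, shard) for shard in shards]        # done in parallel in practice
    total = sum(len(s) * g for s, g in zip(shards, local))  # the communication step
    return total / len(examples)

w = 0.0
single = gradient(w, data)
parallel = data_parallel_gradient(w, data, num_workers=4)
print(f"one machine: {single:.6f}, four workers: {parallel:.6f}")

# Training with the combined gradient reaches the same answer.
for _ in range(200):
    w -= 0.5 * data_parallel_gradient(w, data, num_workers=4)
print(f"learned w = {w:.4f}")
```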

In August 2021, Asst Prof You founded a startup, HPC-AI Tech, which uses cloud-based technology to speed up central processing units and graphics processing units so both can keep up with ever faster AI models. The company has since attracted $4.7 million in venture capital from former Google China chief Kai-Fu Lee's Sinovation Ventures, Forbes Midas Lister Anna Fang's ZhenFund and Menlo Park-based BlueRun Ventures.

While HPC-AI currently has 10 clients, including Chinese carmaker Geely, Asst Prof You expects big demand for his technology in the future. He has therefore set a goal of 1,000 clients in three years, with a 50 per cent annual increase after that.

Never one to rest on his laurels, Asst Prof You looks ahead and says, “The future of computing is distributed – and I'm excited to leverage this to maximise the potential of AI and beyond.”



The NUS Presidential Young Professorship (PYP) scheme supports talented young academics with excellent research track records in advancing their cutting-edge research.