A conference room was packed with software engineers, researchers, and entrepreneurs eager to learn about the emerging field of generative artificial intelligence (AI), including its origins, its development into chatbots based on large language models (LLMs), and associated applications such as computer vision (CV).
The event was hosted by Galaxy Software Services (GSS), an over-the-counter (6752-TE) listed company that services more than 2,000 companies ranging from those in financial services to government agencies, hospitals, and telecom providers. The company offers software that helps organizations automate their services, cutting down on staffing costs and even record-keeping.
Servicing so many clients means that GSS has a major stake in the development of AI and is one of the companies powering this technological revolution in Taiwan. Hosting an annual forum for clients and stakeholders is just one way the company addresses the latest opportunities and challenges in AI.
The first speaker to address the forum was GSS Chairman Perry Chang (張培鏞), who advocates the creation of an indigenous LLM for the benefit of the nation. Chang explained that his initial interest in this field was to create a simple Chinese grammar and “spell check” function equivalent to those widely available in English.
His foray into this field proved more difficult than he expected, and after running through his initial funding, he gave up. Years later, a colleague encouraged him to take up the task again, this time using machine learning (ML). After restarting in 2013, he ultimately succeeded, gaining the tools and experience to start down the path of natural language processing (NLP), a field of AI dedicated to the intersection of computing and human language.
At the outset, his company made use of rudimentary optical character recognition (OCR) for many of its software applications. “For example, if you had a poster, we would train the computer to examine preset locations on the poster to read the data,” said Chang. He added that much has changed since the early days of NLP, where computers were trained to look for certain characters or symbols in a prescribed location.
The rise of messaging as a means of communicating with friends as well as businesses created a business opportunity for companies such as GSS, which quickly rolled out its first chatbot in 2017. Early versions of these chatbots were quite limited, responding only to a number of preset cue words that could trigger an automated response.
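The cue-word approach described above can be illustrated with a minimal sketch. The cue words, canned replies, and the `reply` function below are hypothetical and are not taken from any GSS product; they only show why such chatbots were limited to whatever phrases had been anticipated in advance.

```python
import re

# Hypothetical cue words and canned replies, for illustration only.
CANNED_RESPONSES = {
    "hours": "We are open 9:00-18:00, Monday to Friday.",
    "balance": "Please log in to view your account balance.",
    "refund": "Refund requests are handled within 7 business days.",
}
FALLBACK = "Sorry, I did not understand. Please contact an agent."


def reply(message: str) -> str:
    """Return the canned response for the first cue word found in the message."""
    # Lowercase the message and split it into plain words, dropping punctuation.
    words = set(re.findall(r"[a-z]+", message.lower()))
    for cue, response in CANNED_RESPONSES.items():
        if cue in words:
            return response
    # No cue word matched, so the bot has nothing useful to say.
    return FALLBACK
```

A message such as "What are your opening hours?" triggers the `hours` reply, while "I want my money back" falls through to the fallback because it contains no preset cue word, even though a human would recognize it as a refund request.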
At the outset, chatbots relied only on NLP, which was largely unstructured and cumbersome, being limited to cue words and canned responses. With the introduction of AI, chatbots could achieve greater accuracy and better predictive responses. Improving their quality further requires a far larger set of cues and prompts; models trained on data at that scale are generically referred to as large language models (LLMs).
“Taiwan needs to create an indigenous LLM that reads traditional Chinese characters, understands Taiwan’s cultural heritage, ensures data confidentiality, and its cost is manageable,” said Chang. He added that since 2013, GSS has invested heavily in machine learning, NLP, and later visual document understanding (VDU).
“A breakthrough came at the presence of the ChatGPT in 2022,” said Chang. “We see challenges but more opportunities, and we are thrilled to bring our long-time endeavor in the AI field to the next level by fine-tuning the LLM to develop tailor-made AI solutions to solve the pain points for our clients from government agencies, logistics to healthcare service providers.”
The convergence of chatbots and AI, which created ChatGPT at the end of 2022, is expected to bring even more opportunities and challenges for GSS to explore. Given the company's extensive implementation experience, it can quickly test and clarify innovative LLM functions paired with AI to meet market demand.
Developing an indigenous LLM for Taiwan
Advances in LLMs depend on a large and accurate database from which computers can draw. Naturally, the accuracy and integrity of these databases are vital to effective AI applications. Given the diversity of languages around the world, different databases are needed to increase the effectiveness of LLMs.
Chang believes it is essential for Taiwan to develop its own LLM that can better address the needs of its public and private sectors. He said it would be a closed system that would protect sensitive data and other personal information from being misused by bad actors both domestically and abroad.
Addressing the issue of a secure database for LLM was the topic of the next speaker at the forum, National Taiwan University Computer Science and Information Engineering Department Chair, Jane Hsu (許永真). For the past few years, she has been developing the Trustworthy AI Dialogue Engine (TAIDE), which is dedicated to compiling Taiwan-specific information regarding language, culture, and history.
“Classical machine learning needs a huge amount of data and labeled data to be effective. But with GPT (generative pre-trained transformers), a type of large language model, we have made tremendous progress; we can now guess the next word through predictive capabilities and even input sentences. We even have tools for the summarization of text,” said Hsu.
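The idea of guessing the next word can be illustrated with a toy example. The sketch below is not how GPT works internally (GPT models use transformer neural networks trained on vast amounts of text); it simply counts which word most often follows another in a tiny made-up corpus. The corpus and the function names are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, how often each other word follows it."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# A tiny made-up corpus; real models learn from billions of words.
corpus = "taiwan needs a model . taiwan needs data . taiwan builds a model"
model = train_bigrams(corpus)
```

With this corpus, `predict_next(model, "taiwan")` returns `"needs"`, because "needs" follows "taiwan" twice while "builds" follows it only once. A word the model has never seen yields `None`, which hints at why large, representative training data matters.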
But all this technology comes at a cost: the number of processors needed for AI computation has grown exponentially, leading to higher costs and increased staffing needs. Hsu lamented that Taiwan is having trouble keeping up with international trends and innovation, and she called upon the government to quickly increase funding.
The release of the Taiwan AI Action Plan 2.0 has been both a blessing and a curse for Hsu and her colleagues. On the one hand, more government attention increases the possibility of more funding; on the other, greater scrutiny opens the door to more oversight and demands for progress.
“We now have a national AI development strategy. At the moment, this is currently focused on the development of TAIDE. We simply can’t ask ChatGPT to do everything because we would run the risk of data bias, trustworthiness, norms, and cybersecurity issues to protect businesses and customer information,” said Hsu.
At a broader level, the development of databases such as TAIDE as well as domestic AI capabilities helps cultivate Taiwanese talent in this area. “We are working on a blend of Indigenous languages and the use of Chinese characters, as Taiwan has its own unique linguistic landscape,” said Hsu.
Hsu believes that AI is reaching an “Oppenheimer” moment where each invention is bringing us closer to a sort of escalation in technology or a virtual arms race for greater power and superiority. In turn, this will put a strain on staff, funding, and computational abilities.
Despite the challenges, there is a clear need to develop TAIDE as a trustworthy base model that can build on existing transformer-based LLMs such as LLaMA. “There are a lot of challenges in providing instructions, fine-tuning, and a quest for quality data and parallel efforts,” said Hsu.