A visitor tries his hand at a holographic intelligent medical imaging system under the guidance of a staff member at the Light of Internet Expo in Wuzhen, east China's Zhejiang Province, Nov. 19, 2024. (Photo/Xinhua)
Chongqing - Able to write poetry, paint, diagnose illness, and solve problems, today’s AI shows truly extraordinary all-around capabilities. At the forefront of this technological evolution, AI trainers are steadily emerging as key players.
From data "movers" to expert "trainers," AI trainers have experienced both the rapid advancement of artificial intelligence and their own constant transformation.
Earlier this year, Ya, a recent graduate majoring in the Internet of Things, joined an outsourcing company working with a major tech firm, stepping into the role of AI trainer. “I mostly handle data labeling,” Ya explained. “My assignment is in the mathematics field, and I use tools like LaTeX to write out the problem-solving process.”
Ya acknowledged that the position didn’t require a highly specialized background. “There are rule documents to follow, and it’s simply a matter of organizing the steps.”
At the end of last year, Lin, an art major, also made the leap into AI training after a decade as a UI designer. “I’d been a designer for ten years, but I felt that this industry would soon be overtaken by AI, so I decided to make a career switch,” Lin said.
Lin quickly discovered that her new role bore little resemblance to her previous one. “I mainly work on text-based data labeling, scoring, and rewriting model-generated content according to company guidelines, which helps train large models,” she noted.
Lin likens her new job to teaching a child. “You have to guide the large model in generating more reasonable content. When it makes mistakes, you need to correct them.”
Unlike Ya and Lin, Lei is a seasoned veteran in the field. With only a high school diploma, Lei first encountered AI training in 2018 while working in customer service. “Back then, I was assisting trainers on an outsourcing project, doing data labeling for a major tech company. A year later, a trainer introduced me to the role, and I began doing this full-time.”
At that time, the job was still relatively niche. “My educational background wasn’t an issue since I had relevant work experience,” Lei said. With a willingness to learn, Lei made her entry into the AI industry.
“In the beginning, I worked on implementing customized requirements for intelligent products, mostly Q&A systems, which were essentially small models. I improved answer quality by building knowledge bases and fine-tuning Q&A strategies,” Lei explained.
Since 2022, Lei has focused on larger models, managing corpora and leading resource teams in data production. “This involves close collaboration with the algorithm team, translating their data requirements into detailed labeling guidelines.”
Lei’s current responsibilities involve large-scale data labeling tasks that often number in the tens or even hundreds of thousands. Team sizes fluctuate according to project demands—sometimes as few as five or six people, and other times expanding to over a hundred.
After years of working in the industry, Lei has seen firsthand how large models have changed the role of AI trainers. “Before the advent of large models, the work of AI trainers was relatively focused,” Lei recalled. “Content output relied mainly on knowledge base retrieval, and data labeling was simply about redesigning poorly performing data. The labeling work was relatively light, and the content triggered by questions was all internal industry knowledge, making it easier to control.”
With the rise of large models, however, the complexity of the work has grown. “Now, there are various types of tasks, not only text but also images, audio, etc. The answers inferred by large models based on corpora are also more uncontrollable.”
The introduction of DeepSeek has also brought notable changes to the field. “In the past, everyone was stacking corpora, thinking the more, the better. But now, we need to question whether we should adjust the direction,” Lei said.
Lei pointed out that this issue isn’t entirely new. “At first, adding more knowledge improved results, but once it passed a certain threshold, it led to intent confusion and the knowledge became unclear.”
Despite the increasing capabilities of large models, Lei has observed new problems arising. For instance, while large models are adept at writing official documents, their accuracy still has limits. Additionally, they sometimes produce erroneous responses or even “speak nonsense with confidence.” Lei suggested that these issues often trace back to corpus quality. “It’s necessary to troubleshoot the cases, identify which part of the process went wrong, and then adjust the strategies of the large model.”
The growing sophistication of large models has also made AI training more specialized. “In the past, teaching AI mainly involved teaching basic knowledge, like teaching a child common sense. Now that AI has some cognitive ability, it needs higher-level or more specialized individuals to teach it advanced thinking,” Lei said.
According to Lei, large models are increasingly used in specialized fields, requiring professionals with backgrounds in areas like medicine, education, and law. “These specialized corpus-generation tasks can’t be done by outsiders,” Lei noted.
Lei also pointed out that as these roles become more specialized, the pay disparity for AI trainers has widened. “For basic data labeling positions, the monthly salary may only be a few thousand yuan, while for roles with higher requirements in large companies, the monthly salary can reach 30,000 to 50,000 yuan (about $4,120 to $6,870).”
Many outsourcing companies in the data labeling field have high recruitment needs, Lei said, but the job’s repetitive nature and limited growth potential lead many people to leave after a short period. Still, Lei believes that for individuals with limited education who are interested in the industry, starting with data labeling can be a valuable entry point. “At least it gives you exposure to the industry, and with work experience, you can gradually transition to higher-level training roles.”
As demand for AI trainers grows, related training programs have begun to appear. However, Lei cautioned against rushing into costly courses. “If it’s just about data labeling, the requirements aren’t that high. Even if it’s to prepare for interview questions, you don’t necessarily need to spend a lot of money on a course. You can easily find relevant knowledge and learn it on your own.”
(Beijing Daily contributed to this report.)