Job Description
SUMMARY:
Implement, manage, and maintain data applications and data security for AI in MPS.
RESPONSIBILITIES:
• Responsible for end-to-end data processes for AI projects, including data collection, cleaning, labeling, augmentation, and the construction and management of high-quality datasets.
• Develop and optimize data labeling tools to support the data preparation required for large model training (e.g., SFT and RM data) and for knowledge graph construction.
• Collaborate with AI teams to improve LLM performance through data solutions.
• Perform other duties as assigned by the supervisor.
REQUIREMENTS:
• Data Capabilities: Proficient in Python, SQL, and data processing technologies such as Pandas and Spark; experience customizing and extending data labeling tools.
• AI Knowledge: Familiar with machine learning data formats and requirements; understanding of LLM data construction methods; mastery of data augmentation techniques.
• Professional Qualities: High standards for data quality; excellent communication skills; ability to accurately understand algorithm and business requirements.
• Good interpersonal and communication skills in both English and Chinese.