Soy Vitou
Software Engineer specializing in AI & Data Science. Currently designing optimized systems for ACLEDA Bank, focusing on Khmer OCR, ASR, and robust backend architectures.
Phnom Penh, Cambodia
GPA: 3.66
Professional Experience
AI Engineer & Data Scientist
- • Designing and optimizing AI models: TTS, Tiny Whisper (ASR), NLP, LLM, and Khmer OCR.
- • End-to-End Development Lifecycle: Data preprocessing, training, deployment, and MLOps monitoring.
- • Collaborating across data science, software engineering, and DevOps teams to integrate production AI solutions.
- • Tech Stack: Python, FastAPI, Docker, MinIO, Hadoop, Kubernetes, MLflow, Airflow, ClickHouse, Power BI.
Web Developer
Metra BCCJP Co., Ltd.
- • Developed and maintained responsive websites and web applications.
- • Managed NAS and high-performance computing servers to ensure data availability and infrastructure stability.
- • Provided networking and IT support and system troubleshooting to keep internal office operations running smoothly.
Industry Projects
Banking & Automation
- • Khmer Voice Command System: Server-side deployment on Kubernetes with batched inference, letting users control the bank application's UI by voice under high concurrency.
- • Registration Document Classification: Integrated a ResNet101 + YOLO pipeline to classify Khmer IDs, birth certificates, and passports for automated registration.
- • Background Removal for Marketplace AC Super App: Removes the background from product images so listings look clean and consistent; currently fine-tuning BiRefNet for this task.
- • Cambodia Land Classification on Google Maps: Returns land information for a lat-lng input submitted by a CO, using location geometry covering the whole country, to support land price estimation.
- • MOC (Business Registration): Developed a tool that scrapes information from the MOC and displays it, helping the marketing team collaborate and partner with clients more easily.
- • eKYC (Khmer ID): Developed Khmer ID data verification and an SDK (Android and iOS), and trained text detection and OCR models from scratch on 20 million images, with 400k Khmer ID images.
Technical Research Projects
Academic & Experimental
- • Khmer Handwritten OCR (TrOCR): Fine-tuned TrOCR on handwritten samples, achieving a CER of 0.17, with a custom tokenizer for mixed Khmer/English text.
- • BacII Data Extraction (YOLO + CNN-LSTM): High-speed structured data extraction from government certificates using CTC decoding.
- • Attendance System (Facial Recognition): Face detection with MTCNN, 512-dimensional face embeddings from ArcFace, and identity matching by cosine similarity.
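The matching step of the attendance system can be illustrated with a minimal sketch. It assumes embeddings are already available as 512-dimensional vectors; the function names, the 0.5 threshold, and the random vectors standing in for ArcFace embeddings are hypothetical and not taken from the project itself.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def match_identity(query, gallery, threshold=0.5):
    """Return (name, score) for the most similar enrolled embedding.

    Returns (None, score) when the best score is below the threshold.
    The threshold value is an assumption and would need tuning on real data.
    """
    best_name, best_score = None, -1.0
    for name, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Random vectors stand in for real face embeddings (illustration only).
rng = np.random.default_rng(0)
gallery = {
    "student_a": rng.normal(size=512),
    "student_b": rng.normal(size=512),
}
# A slightly perturbed copy simulates a new photo of the same person.
query = gallery["student_a"] + 0.1 * rng.normal(size=512)
name, score = match_identity(query, gallery)
print(name, round(score, 3))
```

Cosine similarity ignores vector length and compares direction only, which is why it is a common choice for comparing face embeddings.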
Open Source Datasets
HuggingFace Contributions
- • Khmer Printed Dataset (62k images): View on HuggingFace.
- • Khmer Handwritten Dataset (4.2k): View on HuggingFace.
Open Source Projects
GitHub & HuggingFace
- • Fast Khmer OCR Running on Edge Device (ONNX conversion): View on GitHub.
- • Fast Khmer OCR Running on Edge Device (Live Demo): 🟢 Live Demo.
- • Infinity Large Khmer OCR Live Demo: 🟢 Live on HuggingFace.
Education & Academic Research
Academic Background
- • B.E. Information Technology Engineering: Royal University of Phnom Penh (RUPP), 2022 – 2025, GPA: 3.66.
- • Teaching Assistant: Supported students in writing publication papers and guided them through completing their project practicum research, 2023 – 2024.
- • Python + Big Data: Samsung Innovation Campus. Studied Python programming and Big Data tools such as Hadoop, PySpark, NiFi, and Sqoop, 2022 – 2024.
Selected Publications
Research & Conferences
- • STI Focus Vol. 3: Abusive Image Classification using Hybrid Model. DOI: 10.13140/RG.2.2.13718.77129. View on ResearchGate.
- • ACET Conference (CADT): Lung Cancer Classification Based on CT Images (Hybrid CNN-RF). View on ResearchGate.
- • 2nd SCDT Conf.: Animal Classification using Convolutional Neural Network. Best Research Paper Award (2023). View on ResearchGate.