Soy Vitou

Software Engineer specializing in AI & Data Science. Currently designing optimized systems for ACLEDA Bank, focusing on Khmer OCR, ASR, and robust backend architectures.

Phnom Penh, Cambodia | GPA: 3.66

Professional Experience

AI Engineer & Data Scientist

ACLEDA Bank Plc.

Mar 2025 – Present
  • Designing and optimizing AI models: TTS, Tiny Whisper (ASR), NLP, LLM, and Khmer OCR.
  • End-to-End Development Lifecycle: Data preprocessing, training, deployment, and MLOps monitoring.
  • Collaborating across data science, software engineering, and DevOps teams to integrate production AI solutions.
  • Tech Stack: Python, FastAPI, Docker, MinIO, Hadoop, Kubernetes, MLflow, Airflow, ClickHouse, Power BI.

Web Developer

Metra BCCJP Co., Ltd.

Sep 2024 – Mar 2025
  • Developed and maintained responsive websites and web applications.
  • Managed NAS and high-performance computing servers to ensure data availability and infrastructure stability.
  • Provided networking and IT support, including system troubleshooting, to keep internal office operations running smoothly.

Industry Projects

Banking & Automation

  • Khmer Voice Command System: Server-side deployment on Kubernetes with batched inference for high-concurrency UI control of the bank application.
  • Registration Document Classification: Integrated a ResNet101 + YOLO pipeline to classify Khmer IDs, birth certificates, and passports for automated registration.
  • Background Removal for Marketplace (AC Super App): Removes product image backgrounds to give listings a clean, standardized look; currently fine-tuning BiRefNet.
  • Cambodia Land Classification on Google Maps: Returns land information for a lat-lng input from a CO, drawing on location geometry for the whole country, to support land price estimation.
  • MOC (Business Registration): Developed a tool that scrapes information from the MOC and displays it, helping the marketing team collaborate and partner with clients more easily.
  • eKYC (Khmer ID): Developed Khmer ID data verification and SDKs (Android and iOS); trained text detection and OCR models from scratch on 20 million images, with 400k Khmer ID images.
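
As an illustration of the lat-lng lookup behind the land classification project, here is a minimal sketch of a point-in-polygon test using ray casting. The source does not say how the production system does this lookup, so the approach is an assumption. The zone names, coordinates, and the helpers `point_in_polygon` and `lookup_land_class` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (lat, lng)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: toggle 'inside' each time a ray from the point crosses an edge."""
    lat, lng = point
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lng1 = polygon[i]
        lat2, lng2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            cross_lng = lng1 + (lat - lat1) * (lng2 - lng1) / (lat2 - lat1)
            if lng < cross_lng:
                inside = not inside
    return inside


def lookup_land_class(point: Point, zones: Dict[str, List[Point]]) -> Optional[str]:
    """Return the name of the first zone containing the point, or None."""
    for name, polygon in zones.items():
        if point_in_polygon(point, polygon):
            return name
    return None


# Hypothetical zones, for illustration only (not real land data).
ZONES: Dict[str, List[Point]] = {
    "residential": [(11.50, 104.80), (11.50, 104.90), (11.60, 104.90), (11.60, 104.80)],
    "commercial": [(11.60, 104.90), (11.60, 105.00), (11.70, 105.00), (11.70, 104.90)],
}

if __name__ == "__main__":
    print(lookup_land_class((11.55, 104.85), ZONES))
```

A nationwide dataset would likely need a spatial index rather than this linear scan; the sketch only shows the containment test itself.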

Technical Research Projects

Academic & Experimental

  • Khmer Handwritten OCR (TrOCR): Fine-tuned TrOCR on handwritten samples, achieving a CER of 0.17. Built a custom tokenizer for mixed Khmer/English text.
  • BacII Data Extraction (YOLO + CNN-LSTM): High-speed structured data extraction from government certificates using CTC decoding.
  • Attendance System (Facial Recognition): Face detection with MTCNN, 512-dimensional face embeddings from ArcFace, and cosine similarity for identity matching.
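
The matching step of the attendance system can be sketched as follows: compare a query embedding against enrolled embeddings by cosine similarity and accept the best match only above a threshold. This is a minimal sketch, not the project's actual code. The 0.5 threshold, the gallery names, and the helper `match_identity` are assumptions, and 3-dimensional vectors stand in for the 512-dimensional ArcFace embeddings.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_identity(
    query: List[float],
    gallery: Dict[str, List[float]],
    threshold: float = 0.5,  # assumed value; would be tuned on validation data
) -> Tuple[Optional[str], float]:
    """Return (best matching name, score), or (None, score) if below the threshold."""
    best_name: Optional[str] = None
    best_score = -1.0
    for name, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Hypothetical enrolled embeddings (toy 3-D vectors for illustration).
GALLERY: Dict[str, List[float]] = {
    "student_a": [1.0, 0.0, 0.0],
    "student_b": [0.0, 1.0, 0.0],
}

if __name__ == "__main__":
    print(match_identity([0.9, 0.1, 0.0], GALLERY))
```

Cosine similarity ignores vector magnitude, which suits embeddings where identity is encoded in direction rather than length.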

Open Source Datasets

HuggingFace Contributions

Open Source Projects

GitHub & HuggingFace

Education & Academic Research

Academic Background

  • B.E. Information Technology Engineering: Royal University of Phnom Penh (RUPP)
    2022 – 2025, GPA: 3.66.
  • Teaching Assistant: Supported students in writing papers for publication and guided them through completing their practicum research projects
    2023 – 2024.
  • Python + Big Data: Samsung Innovation Campus. Studied Python programming and Big Data tools such as Hadoop, PySpark, NiFi, and Sqoop.
    2022 – 2024.

Selected Publications

Research & Conferences

  • STI Focus Vol. 3: Abusive Image Classification using Hybrid Model. DOI: 10.13140/RG.2.2.13718.77129
    View on ResearchGate
  • ACET Conference (CADT): Lung Cancer Classification Based on CT Images (Hybrid CNN-RF).
    View on ResearchGate
  • 2nd SCDT Conf.: Animal Classification using Convolutional Neural Network. Best Research Paper Award (2023).
    View on ResearchGate