Technical Projects

Technical Portfolio

My technical work focuses on language technology, natural language processing, and software development. Here are selected projects that demonstrate my technical skills and approach to solving complex problems.

NLP & Machine Learning Projects

Image Captioning with LSTM

A deep learning system that generates natural language descriptions for images using a CNN-LSTM architecture.

  • Developed CNN-LSTM model for automated image captioning using Keras and InceptionV3
  • Implemented beam search decoding to improve caption quality across 8,000+ images
  • Technologies: Python, TensorFlow, Keras, OpenCV
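
The beam search step can be illustrated with a minimal sketch. It is not the project's code: `beam_search`, `step_fn`, and the toy probability table are hypothetical names standing in for a trained CNN-LSTM decoder, and the sketch omits refinements such as length normalization.

```python
import math

def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=20):
    """Return the highest-scoring token sequence found by beam search.

    step_fn(sequence) is assumed to return a dict mapping each candidate
    next token to its probability given the sequence so far.
    """
    beams = [([start_token], 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_token:
                candidates.append((seq, score))  # finished captions carry over
                continue
            for token, prob in step_fn(seq).items():
                if prob > 0:
                    candidates.append((seq + [token], score + math.log(prob)))
        # Keep only the beam_width best hypotheses
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(seq[-1] == end_token for seq, _ in beams):
            break
    return beams[0][0]

# Toy next-token table (illustrative only; a real decoder would query the model)
TOY_MODEL = {
    "<start>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.55, "cat": 0.45},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"<end>": 1.0},
    "cat": {"<end>": 1.0},
}

def toy_step(seq):
    return TOY_MODEL[seq[-1]]

greedy = beam_search(toy_step, "<start>", "<end>", beam_width=1)
wider = beam_search(toy_step, "<start>", "<end>", beam_width=2)
```

In this toy table a beam width of 1 (greedy decoding) commits to "a" early, while a width of 2 keeps "the" alive long enough to find the more probable full sequence.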

Lexical Substitution System

An NLP system that identifies contextually appropriate word replacements using multiple approaches.

  • Combined WordNet, pre-trained Word2Vec embeddings, and BERT for contextual word substitution
  • Achieved 10% higher accuracy than baseline methods in suggesting replacements
  • Technologies: Python, NLTK, Gensim, BERT
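
One common way to make substitution context-sensitive is to rank candidate words by embedding similarity to the surrounding words. The sketch below assumes that approach; it is not the project's actual pipeline. The function names and the two-dimensional toy vectors are hypothetical, standing in for WordNet candidates and pre-trained Word2Vec embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_substitutes(candidates, context_words, vectors):
    """Order candidates by similarity to the mean vector of the context."""
    context = [vectors[w] for w in context_words if w in vectors]
    if not context:
        return list(candidates)  # no usable context: keep the given order
    centroid = [sum(dim) / len(context) for dim in zip(*context)]
    scored = [(cosine(vectors[c], centroid), c) for c in candidates if c in vectors]
    return [c for _, c in sorted(scored, reverse=True)]

# Toy embeddings (illustrative values, not real Word2Vec vectors)
TOY_VECTORS = {
    "student": [0.9, 0.1],
    "smart": [0.8, 0.2],
    "lamp": [0.2, 0.8],
    "shiny": [0.1, 0.9],
}

# Candidate replacements for "bright" in two different contexts
for_student = rank_substitutes(["shiny", "smart"], ["student"], TOY_VECTORS)
for_lamp = rank_substitutes(["shiny", "smart"], ["lamp"], TOY_VECTORS)
```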

PCFG Parsing Implementation

Implementation of the CKY algorithm for parsing with Probabilistic Context-Free Grammars (PCFGs).

  • Developed efficient implementation of the CKY dynamic programming algorithm
  • Added handling of rule probabilities to select the most likely syntactic analysis
  • Technologies: Python, NLTK
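
A compact sketch of probabilistic CKY follows. It assumes the grammar is already in Chomsky normal form, and the rule-dictionary layout, the function name `cky_parse`, and the toy grammar are illustrative choices rather than the project's actual data structures.

```python
from collections import defaultdict

def cky_parse(words, lexical_rules, binary_rules, start="S"):
    """Probabilistic CKY for a grammar in Chomsky normal form.

    lexical_rules: {(A, word): prob} for rules A -> word
    binary_rules:  {(A, B, C): prob} for rules A -> B C
    Returns (probability, tree) for the best parse, or (0.0, None).
    """
    n = len(words)
    # chart[(i, j)][A] = (best probability, tree) for A spanning words[i:j]
    chart = defaultdict(dict)
    for i, word in enumerate(words):
        for (lhs, w), prob in lexical_rules.items():
            if w == word and prob > chart[(i, i + 1)].get(lhs, (0.0, None))[0]:
                chart[(i, i + 1)][lhs] = (prob, (lhs, word))
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length
            for k in range(i + 1, j):       # split point
                for (lhs, b, c), rule_prob in binary_rules.items():
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        left_p, left_t = chart[(i, k)][b]
                        right_p, right_t = chart[(k, j)][c]
                        prob = rule_prob * left_p * right_p
                        if prob > chart[(i, j)].get(lhs, (0.0, None))[0]:
                            chart[(i, j)][lhs] = (prob, (lhs, left_t, right_t))
    return chart[(0, n)].get(start, (0.0, None))

# Toy grammar (illustrative only)
LEXICAL = {("NP", "she"): 0.5, ("NP", "fish"): 0.5, ("V", "eats"): 1.0}
BINARY = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}

prob, tree = cky_parse(["she", "eats", "fish"], LEXICAL, BINARY)
```

Each chart cell stores only the best-scoring analysis per nonterminal, so the chart holds at most one entry per nonterminal per span and the best tree can be read off the top cell directly.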

Sustainability Language Analysis Tool (SLAT)

NLP tool developed during a hackathon that analyzes corporate sustainability commitments.

  • Built web scraping and text analysis tool to evaluate sustainability language in business communications
  • Implemented scoring algorithm based on keyword frequency and sentiment analysis
  • Technologies: Python, BeautifulSoup, NLTK, scikit-learn
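
A scoring function of this kind might look roughly like the sketch below. The word lists, the weights, and the name `score_text` are all hypothetical, and a small sentiment lexicon stands in for whatever sentiment model the tool actually used.

```python
import re

# Hypothetical word lists, for illustration only
SUSTAINABILITY_TERMS = {"renewable", "emissions", "recycling", "carbon", "sustainable"}
POSITIVE_WORDS = {"reduce", "commit", "achieve", "improve"}
NEGATIVE_WORDS = {"delay", "fail", "increase"}

def score_text(text, keyword_weight=0.7, sentiment_weight=0.3):
    """Combine keyword frequency with a lexicon-based sentiment estimate."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    # Share of tokens that are sustainability keywords, in [0, 1]
    keyword_rate = sum(t in SUSTAINABILITY_TERMS for t in tokens) / len(tokens)
    pos = sum(t in POSITIVE_WORDS for t in tokens)
    neg = sum(t in NEGATIVE_WORDS for t in tokens)
    # Sentiment polarity in [-1, 1]; neutral when no lexicon word occurs
    sentiment = (pos - neg) / (pos + neg) if pos + neg else 0.0
    return keyword_weight * keyword_rate + sentiment_weight * sentiment

committed = score_text("We commit to reduce carbon emissions")
evasive = score_text("We delay and fail")
```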

Computational Research Projects

Neighborhood-based Clustering for Visual Mental Imagery

Applied machine learning techniques to categorize visual mental imagery and perceptual domains.

  • Developed clustering algorithms to identify domain-specific patterns in visual processing
  • Implemented dimension reduction techniques to analyze performance score relationships
  • Created visualization tools for complex cognitive data
  • Technologies: Python, scikit-learn, pandas, matplotlib, k-means clustering
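
For readers unfamiliar with k-means, here is a dependency-free sketch of the algorithm. The project itself lists scikit-learn, so this is only an illustration of the technique; the function names and sample points are made up.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means: returns (labels, centroids) for a list of tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

# Made-up scores forming two well-separated groups
POINTS = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(POINTS, k=2)
```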

Multilingual Acquisition Analysis

Comparative corpus-driven study of sentence-final particle acquisition across different language backgrounds.

  • Designed data processing pipeline for analyzing 10,000+ multilingual utterances
  • Implemented statistical models to identify acquisition patterns across language groups
  • Created quantitative metrics for cross-linguistic influence measurement
  • Technologies: Python, R, NLTK, pandas, lme4, statistical modeling

Web & Application Development

CantoLeap - Cantonese Vocabulary Learning App

A beginner-friendly mobile application designed to help users learn essential Cantonese vocabulary through interactive exercises.

  • Developed vocabulary learning system with flashcard-style exercises
  • Implemented practice quizzes to reinforce retention
  • Created community notes feature for users to share learning tips and insights
  • Technologies: Kotlin, Android Development

Django Web Application

Backend system developed with the Django framework.

  • Created RESTful API endpoints and database models
  • Implemented authentication and authorization systems
  • Technologies: Python, Django, SQL, REST APIs

Speech Recognition System

Multi-API speech recognition project with sentiment analysis capabilities.

  • Integrated multiple speech recognition APIs to compare their performance
  • Implemented real-time transcription and sentiment analysis
  • Technologies: Python, AssemblyAI API, OpenAI API

Vocabulary Test and Translation Game

Educational web application for language learning.

  • Developed interactive web-based game for vocabulary practice across languages
  • Implemented responsive design for cross-device compatibility
  • Technologies: HTML, CSS, JavaScript, LocalStorage

Language Engineering Tools

Grammatical Error Detection

NLTK-based system for identifying and correcting grammatical errors.

  • Created a combined rule-based and statistical approach to error detection
  • Achieved a 20% improvement in accuracy over baseline methods
  • Technologies: Python, NLTK, POS tagging

Java Spell Checker

Implementation of spelling correction algorithms in Java.

  • Developed efficient edit distance algorithms for spelling suggestions
  • Implemented context-aware correction prioritization
  • Technologies: Java, String processing, Algorithm optimization
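
The core of edit-distance-based suggestion can be sketched as follows. The project was written in Java; Python is used here only for consistency with the other examples on this page. The sketch assumes plain Levenshtein distance, the names `edit_distance` and `suggest` are illustrative, and the context-aware prioritization is not shown.

```python
def edit_distance(a, b):
    """Levenshtein distance using two rows of the DP table."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def suggest(word, dictionary, max_distance=2, limit=3):
    """Return up to `limit` dictionary words within max_distance, closest first."""
    scored = sorted((edit_distance(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance][:limit]

suggestions = suggest("speling", ["spelling", "spell", "selling", "apple"])
```

Keeping only two rows of the table brings memory use down to the length of the shorter-lived row, which matters when a misspelling is compared against a large dictionary.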

Technical Approach

My engineering work combines software development best practices with linguistic expertise to create robust, user-friendly language technology solutions. I focus on building systems that are technically sound, linguistically accurate, and globally accessible.

View my GitHub | Contact me about collaboration opportunities