Technical Projects

Technical Portfolio

My technical work focuses on language technology, natural language processing, and software development. Here are selected projects that demonstrate my technical skills and approach to solving complex problems.

NLP & Machine Learning Projects

Beyond Pattern Matching: Dataset Artifacts in SQuAD

A systematic analysis of the shortcuts that reading comprehension models exploit, with two mitigation strategies: adversarial training and a question-type-aware loss.

Uncovered model brittleness through comprehensive testing: 26.1% adversarial vulnerability and 35.5% reasoning gap
Achieved 1.4x robustness improvement through adversarial training (50.1% → 72.5% EM)
Improved reasoning performance (+2.6% on reasoning questions) while maintaining 77.2% overall accuracy
Technologies: Python, PyTorch, Transformers, ELECTRA, NLP research methods
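
The EM figures above are exact-match scores. As a rough illustration of how that metric is commonly computed, the sketch below mirrors the answer normalization used by the standard SQuAD evaluation script (lowercasing, stripping punctuation and English articles, collapsing whitespace). The function names and the toy predictions are hypothetical and are not taken from the project code.

```python
import re
import string
from typing import List


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: List[str]) -> bool:
    """A prediction counts as correct if it matches any gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def em_score(predictions: List[str], references: List[List[str]]) -> float:
    """Percentage of predictions that exactly match a reference answer."""
    hits = sum(exact_match(p, g) for p, g in zip(predictions, references))
    return 100.0 * hits / len(predictions)


# Toy data (hypothetical), not results from the project.
preds = ["The Eiffel Tower", "1889", "Paris, France"]
golds = [["Eiffel Tower"], ["in 1889", "1889"], ["Paris"]]
score = em_score(preds, golds)
```

Because the match is all-or-nothing after normalization, "Paris, France" earns no credit against the gold answer "Paris", which is one reason EM is often reported alongside a softer token-overlap F1.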

Fact-Checking LLM Outputs with Textual Entailment

An automated verification system that validates ChatGPT-generated biographies against Wikipedia using bag-of-words and neural entailment models.

Implemented a verification pipeline to decompose model outputs into atomic facts and validate them against BM25-retrieved Wikipedia passages
Developed a high-precision fact-checker using a fine-tuned DeBERTa-v3 model to determine logical entailment between claims and source text
Conducted detailed error analysis of false positives and negatives to identify linguistic patterns where LLMs struggle with factual consistency
Technologies: Python, PyTorch, DeBERTa-v3, Textual Entailment (NLI), FActScore, BM25
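
To make the retrieval step more concrete, here is a minimal pure-Python sketch of Okapi BM25 scoring over tokenized passages. It is an illustration under assumptions rather than the project's pipeline: the `bm25_scores` name, the `k1` and `b` defaults, the smoothed IDF variant, and the toy passages are all chosen for this example.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            if term not in term_freq:
                continue
            df = doc_freq[term]
            # Smoothed IDF; the +1 inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = term_freq[term]
            # Length normalization: longer documents are penalized via b.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy passages (hypothetical), standing in for Wikipedia text.
passages = [
    "marie curie won the nobel prize in physics in 1903".split(),
    "the eiffel tower is located in paris".split(),
    "curie was born in warsaw".split(),
]
claim = "curie won the nobel prize".split()
scores = bm25_scores(claim, passages)
best = max(range(len(passages)), key=scores.__getitem__)
```

In a pipeline like the one described, the top-scoring passage would then be paired with the atomic fact and passed to the entailment model as premise and hypothesis.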

Transformer-Based Character Language Model

A custom-built Transformer architecture designed for sequence-to-sequence character counting and next-token prediction.

Engineered a Transformer encoder from scratch, implementing self-attention, residual connections, and positional encodings without high-level library abstractions
Developed a causal-masked language model trained on the text8 Wikipedia collection to predict next-character probability distributions
Tuned hyperparameters and inspected attention maps to bring perplexity under the target of 7
Technologies: Python, PyTorch, Transformer Architecture, Self-Attention, Positional Encoding, Language Modeling
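
For context on the perplexity target, the sketch below shows how perplexity follows from the probabilities a character model assigns to the true next characters: it is the exponential of the mean negative log-probability. The 27-symbol vocabulary (a-z plus space) and the example probabilities are assumptions made for illustration.

```python
import math
from typing import List


def perplexity(target_probs: List[float]) -> float:
    """exp of the mean negative log-probability of the true characters."""
    nll = -sum(math.log(p) for p in target_probs) / len(target_probs)
    return math.exp(nll)


# Baseline: a model that spreads probability uniformly over an assumed
# 27-symbol vocabulary is as uncertain as a 27-way guess.
uniform = perplexity([1 / 27] * 100)

# Hypothetical probabilities a trained model might assign to the
# correct next character at five positions.
model = perplexity([0.30, 0.12, 0.25, 0.08, 0.20])
```

Under these assumptions the uniform baseline has perplexity 27, so a perplexity below 7 means the model is, on average, about as uncertain as a choice among fewer than seven equally likely characters.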

Deep Averaging Networks for Robust Sentiment Analysis

A neural text classification system exploring the impact of word embeddings and architectural depth on sentiment detection.

Implemented a Deep Averaging Network (DAN) using GloVe embeddings to classify movie review sentiment into binary positive/negative labels
Developed a typo-robust generalization module using prefix embeddings to maintain performance on misspelled text where standard word-level models fail
Optimized training performance through mini-batching and dynamic sequence padding in PyTorch to handle varying sentence lengths efficiently
Technologies: Python, PyTorch, GloVe Embeddings, Deep Averaging Networks, String Edit Distance
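
The averaging step and the prefix-embedding idea can be sketched without any framework. This is a simplified illustration under assumptions, not the project's module: the three-character prefix length, the toy two-dimensional vectors, and the function names are all hypothetical.

```python
from typing import Dict, List

PREFIX_LEN = 3  # assumed prefix length, purely illustrative

# Toy 2-d vectors (hypothetical); a real model would learn these
# or initialize them from pre-trained embeddings.
prefix_vectors: Dict[str, List[float]] = {
    "gre": [0.9, 0.1],   # e.g. "great"
    "mov": [0.0, 0.2],   # e.g. "movie"
    "ter": [-0.8, 0.1],  # e.g. "terrible"
}
UNK = [0.0, 0.0]  # fallback vector for unseen prefixes


def embed(word: str) -> List[float]:
    """Look a word up by its prefix, so 'great' and 'greta' share a vector."""
    return prefix_vectors.get(word.lower()[:PREFIX_LEN], UNK)


def average_embedding(tokens: List[str]) -> List[float]:
    """The averaging step of a DAN: the mean of the token vectors."""
    vectors = [embed(t) for t in tokens]
    dim = len(UNK)
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


clean = average_embedding("great movie".split())
typo = average_embedding("greta movei".split())
```

Here the misspelled input maps to the same averaged vector as the clean one, because both typos preserve the first three characters. In a full DAN, that averaged vector would then pass through feedforward layers to produce the positive/negative prediction.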

Lexical Substitution System

An NLP system that identifies contextually appropriate word replacements using multiple approaches.

Combined WordNet, pre-trained Word2Vec embeddings, and BERT for contextual word substitution
Achieved 10% higher accuracy than baseline methods in suggesting replacements
Technologies: Python, NLTK, Gensim, BERT

PCFG Parsing Implementation

Implementation of the CKY algorithm for parsing with Probabilistic Context-Free Grammars (PCFGs).

Developed an efficient implementation of the CKY dynamic programming algorithm
Added handling of rule probabilities so the parser supports probabilistic grammars for syntactic analysis
Technologies: Python, NLTK
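
As a compact illustration of the technique, the following sketch runs probabilistic CKY over a toy grammar in Chomsky normal form and returns the probability of the best parse. The grammar, its probabilities, and the `cky_best_prob` name are hypothetical, and the sketch omits the backpointers a full parser would keep to recover the tree.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Toy PCFG in Chomsky normal form (hypothetical rules and probabilities).
# Keys are right-hand sides; values list (left-hand side, probability).
binary_rules: Dict[Tuple[str, str], List[Tuple[str, float]]] = {
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 1.0)],
    ("Det", "N"): [("NP", 0.6)],
}
lexical_rules: Dict[str, List[Tuple[str, float]]] = {
    "she": [("NP", 0.4)],
    "saw": [("V", 1.0)],
    "the": [("Det", 1.0)],
    "dog": [("N", 1.0)],
}


def cky_best_prob(words: List[str], start: str = "S") -> float:
    """Return the probability of the best parse, or 0.0 if none exists."""
    n = len(words)
    # chart[(i, j)][A] = best probability of A spanning words[i:j]
    chart = defaultdict(dict)
    for i, word in enumerate(words):
        for lhs, prob in lexical_rules.get(word, []):
            chart[(i, i + 1)][lhs] = prob

    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length
            for k in range(i + 1, j):       # split point
                for left, p_left in chart[(i, k)].items():
                    for right, p_right in chart[(k, j)].items():
                        for lhs, p_rule in binary_rules.get((left, right), []):
                            prob = p_rule * p_left * p_right
                            # Keep only the highest-probability derivation.
                            if prob > chart[(i, j)].get(lhs, 0.0):
                                chart[(i, j)][lhs] = prob
    return chart[(0, n)].get(start, 0.0)


best = cky_best_prob("she saw the dog".split())
no_parse = cky_best_prob("dog saw she".split())
```

The three nested loops over span length, start position, and split point give the cubic dependence on sentence length that CKY is known for; storing only the best probability per nonterminal in each cell is what makes this the Viterbi variant.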

Computational Research Projects

Neighborhood-based Clustering for Visual Mental Imagery

Applied machine learning techniques to categorize visual mental imagery and perceptual domains.

Developed clustering algorithms to identify domain-specific patterns in visual processing
Implemented dimensionality reduction techniques to analyze relationships among performance scores
Created visualization tools for complex cognitive data
Technologies: Python, scikit-learn, pandas, matplotlib, k-means clustering
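
The clustering step can be illustrated with a small from-scratch version of k-means (Lloyd's algorithm). This is a teaching sketch, not the project's code, which lists scikit-learn: the `k_means` function, the fixed starting centroids, and the toy two-dimensional scores are assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def squared_distance(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def k_means(points: List[Point], centroids: List[Point], max_iters: int = 100):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    labels: List[int] = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append((sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)))
            else:
                updated.append(old)  # leave an empty cluster's centroid in place
        centroids = updated
    return labels, centroids


# Toy 2-d scores (hypothetical), e.g. two task scores per participant.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = k_means(data, centroids=[data[0], data[3]])
```

Starting centroids are passed in explicitly so the result is deterministic; library implementations typically choose them with a randomized scheme and rerun the algorithm several times.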

Multilingual Acquisition Analysis

Comparative corpus-driven study of sentence-final particle acquisition across different language backgrounds.

Designed a data processing pipeline for analyzing 10,000+ multilingual utterances
Implemented statistical models to identify acquisition patterns across language groups
Created quantitative metrics for cross-linguistic influence measurement
Technologies: Python, R, NLTK, pandas, lme4, statistical modeling

Web & Application Development

CantoLeap - Cantonese Vocabulary Learning App

A beginner-friendly mobile application designed to help users learn essential Cantonese vocabulary through interactive exercises.

Developed a vocabulary learning system with flashcard-style exercises
Implemented practice quizzes to reinforce learning retention
Created community notes feature for users to share learning tips and insights
Technologies: Kotlin, Android Development

Django Web Application

A backend system built with the Django framework.

Created RESTful API endpoints and database models
Implemented authentication and authorization systems
Technologies: Python, Django, SQL, REST APIs

Speech Recognition System

Multi-API speech recognition project with sentiment analysis capabilities.

Integrated multiple speech recognition APIs to compare their performance
Implemented real-time transcription and sentiment analysis
Technologies: Python, AssemblyAI API, OpenAI API

View my GitHub

Kassey Chang
