Projects & Work
Applied Data Science
The following projects were completed as part of my Master of Science in Applied Data Science program, spanning machine learning, database management, financial analysis, and cloud infrastructure — each grounded in real-world business problems.
Problem: Heart disease is the leading cause of death in the US and remains difficult to detect due to overlapping symptoms with other conditions.
Goal: Using a CDC dataset of over 300,000 patient questionnaire responses, build a predictive model to help medical professionals identify patients at high risk for heart disease early.
Method: Six classifiers were evaluated through a 5-fold cross-validation pipeline measuring accuracy, ROC-AUC, and F1-score. SMOTE addressed class imbalance, grid search tuned hyperparameters, and PCA reduced dimensionality.
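To illustrate the class-imbalance step, here is a minimal standard-library sketch of the idea behind SMOTE: synthetic minority samples are created by interpolating between a minority point and one of its nearest minority neighbours. The function name `smote_oversample`, its parameters, and the toy points are assumptions for illustration only; the project itself most likely relied on a library implementation, which the write-up does not name.

```python
import math
import random

def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Toy 2-D minority class (placeholder values, not project data)
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_oversample(minority, n_new=6)
```

In a cross-validation setting, oversampling of this kind is normally applied to the training folds only, so that synthetic points never leak into the fold used for scoring.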
Clinical Insight: Age and chronic conditions — stroke, kidney disease, diabetes — are the strongest predictors. Sleep, exercise, BMI, and mental health are measurable levers for early intervention.
Model Performance
Problem: First Rate Financial, a Texas mortgage brokerage funding over $100M annually, faced inefficiencies in identifying refinance opportunities among past clients and in evaluating new borrowers for loan approval.
Solution: Designed and implemented a relational SQL database in Azure, separating data across Property, Borrower, and Loan tables. Loan officers can now query past transactions by interest rate, mortgage insurance, and second-lien status.
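The three-table layout can be sketched as follows. This is an illustrative stand-in rather than the production schema: it uses an in-memory SQLite database through Python's `sqlite3` module instead of Azure SQL, and every column name, the helper `refinance_candidates`, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Borrower (
    borrower_id INTEGER PRIMARY KEY,
    full_name   TEXT NOT NULL
);
CREATE TABLE Property (
    property_id INTEGER PRIMARY KEY,
    address     TEXT NOT NULL
);
CREATE TABLE Loan (
    loan_id          INTEGER PRIMARY KEY,
    borrower_id      INTEGER NOT NULL REFERENCES Borrower(borrower_id),
    property_id      INTEGER NOT NULL REFERENCES Property(property_id),
    interest_rate    REAL    NOT NULL,  -- annual rate in percent
    has_mortgage_ins INTEGER NOT NULL,  -- 1 = yes, 0 = no
    has_second_lien  INTEGER NOT NULL   -- 1 = yes, 0 = no
);
""")

# Placeholder rows, invented purely for the example
conn.executemany("INSERT INTO Borrower VALUES (?, ?)",
                 [(1, "Borrower A"), (2, "Borrower B"), (3, "Borrower C")])
conn.executemany("INSERT INTO Property VALUES (?, ?)",
                 [(1, "Address 1"), (2, "Address 2"), (3, "Address 3")])
conn.executemany("INSERT INTO Loan VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 1, 1, 7.25, 1, 0),
                  (2, 2, 2, 3.10, 0, 0),
                  (3, 3, 3, 6.80, 1, 1)])

def refinance_candidates(conn, min_rate):
    """Past loans at or above min_rate that carry mortgage insurance
    and have no second lien, highest rate first."""
    return conn.execute(
        """SELECT b.full_name, l.interest_rate
           FROM Loan AS l
           JOIN Borrower AS b ON b.borrower_id = l.borrower_id
           WHERE l.interest_rate >= ?
             AND l.has_mortgage_ins = 1
             AND l.has_second_lien = 0
           ORDER BY l.interest_rate DESC""",
        (min_rate,),
    ).fetchall()

candidates = refinance_candidates(conn, min_rate=6.0)
```

Keeping borrowers, properties, and loans in separate tables means a repeat client or a refinanced property is stored once and referenced by key from each loan.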
Outcome: New applicants are automatically segmented into High, Medium, and Low Risk groups based on income, DTI ratio, FICO scores, and loan-to-value — improving efficiency across the lending workflow.
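A rule-based segmentation along these lines might look like the sketch below. The `Applicant` fields, the `risk_tier` function, and every threshold are hypothetical placeholders chosen for illustration; the actual cut-offs used in the project are not stated here.

```python
from dataclasses import dataclass

@dataclass
class Applicant:
    annual_income: float  # US dollars
    dti: float            # debt-to-income ratio, 0.0 to 1.0
    fico: int             # credit score
    ltv: float            # loan-to-value ratio, 0.0 to 1.0

def risk_tier(a: Applicant) -> str:
    """Count how many hypothetical risk flags an applicant trips
    and map the count to a Low / Medium / High tier."""
    flags = 0
    if a.fico < 660:              # assumed credit-score floor
        flags += 1
    if a.dti > 0.43:              # assumed DTI ceiling
        flags += 1
    if a.ltv > 0.90:              # assumed LTV ceiling
        flags += 1
    if a.annual_income < 50_000:  # assumed income floor
        flags += 1
    if flags == 0:
        return "Low"
    if flags == 1:
        return "Medium"
    return "High"

tiers = [
    risk_tier(Applicant(120_000, 0.30, 760, 0.75)),
    risk_tier(Applicant(85_000, 0.45, 700, 0.80)),
    risk_tier(Applicant(40_000, 0.50, 610, 0.95)),
]
```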
Database Overview
Objective: Analyze the relationship among five high-performing tech stocks (NVDA, AAPL, MSFT, GOOGL, and AMZN) to determine whether they outpaced the general market over a 15-year period, benchmarked against the Dow Jones Industrial Average and the S&P 500.
Method: Time series analysis, correlation studies, and trend tracking, visualized through line plots, heatmaps, boxplots, and histograms. Sentiment analysis on tweets collected with Tweepy attempted to correlate market volatility with major news events.
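As a rough sketch of the correlation step, the snippet below converts price series to period-over-period returns and computes pairwise Pearson correlations, the kind of matrix a heatmap would display. The helper names and the price lists are placeholders, not real market data, and correlating returns rather than raw prices is an assumption about the approach.

```python
import math
from itertools import combinations

def simple_returns(prices):
    """Period-over-period percentage change of a price series."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Placeholder price series (series_b is series_a at half scale)
prices = {
    "series_a": [100, 102, 101, 105, 107],
    "series_b": [50, 51, 50.5, 52.5, 53.5],
    "series_c": [200, 198, 201, 199, 202],
}
returns = {name: simple_returns(p) for name, p in prices.items()}

# Pairwise correlation of returns for every pair of series
corr = {
    (a, b): pearson(returns[a], returns[b])
    for a, b in combinations(returns, 2)
}
```

Working with returns removes the effect of price level, so a stock trading near $100 and an index trading in the thousands can be compared on the same footing.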
Key Finding: NVIDIA showed the lowest correlation with the other four stocks. Apple, Google, Amazon, and Microsoft moved closely in tandem with the S&P 500 and the Dow. The Dow (DJI) experienced the largest single-month absolute price changes across the study period.
Correlation with S&P 500
Problem: A fictional music streaming platform (Cecilia) competed against Spotify and Apple Music while constrained by aging on-premises datacenter infrastructure — unable to scale during peak hours, suffering outages, and draining engineering resources.
Analysis: AWS, Microsoft Azure, and Google Cloud were evaluated across 19 categories spanning financial, technical, and operational dimensions using a structured side-by-side scorecard.
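A side-by-side scorecard of this kind can be reduced to a weighted sum per provider. The sketch below is a generic illustration: the provider labels, category names, scores, and weights are all invented placeholders, and whether the original scorecard weighted its 19 categories at all is an assumption.

```python
def score_providers(scores, weights):
    """Weighted total per provider: sum of weight * score over categories."""
    return {
        provider: sum(weights[cat] * by_category[cat] for cat in weights)
        for provider, by_category in scores.items()
    }

# Hypothetical categories and 1-5 scores, not the project's actual scorecard
weights = {"cost": 0.5, "scalability": 0.3, "support": 0.2}
scores = {
    "Provider A": {"cost": 4, "scalability": 5, "support": 3},
    "Provider B": {"cost": 3, "scalability": 4, "support": 4},
}

totals = score_providers(scores, weights)
best = max(totals, key=totals.get)
```

Setting every weight to the same value reduces this to a plain unweighted comparison.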
Conclusion: AWS was identified as the clear choice for media-specific workloads. A 10-year cost analysis showed the on-premises path costing over $118M versus $3.1M on AWS, a gap of roughly $115M.
Annual Cost Comparison