Projects & Work

Portfolio

Applied Data Science

Projects & Analysis

I completed the following projects during my Master of Science in Applied Data Science program. They span machine learning, database management, financial analysis, and cloud infrastructure, and each is grounded in a real-world business problem.

01

Predictive Analytics — Heart Disease Detection

Machine Learning · Final Project · CDC Dataset · Complexity: Advanced

Problem: Heart disease is the leading cause of death in the US and remains difficult to detect due to overlapping symptoms with other conditions.

Goal: Using a CDC dataset of over 300,000 patient questionnaire responses, build a predictive model to help medical professionals identify patients at high risk for heart disease early.

Method: Six classifiers were evaluated in a 5-fold cross-validation pipeline on Accuracy, ROC-AUC, and F1-score. SMOTE addressed class imbalance, grid search tuned hyperparameters, and PCA reduced dimensionality.
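To illustrate two pieces of that pipeline, here is a minimal standard-library sketch of stratified fold assignment and macro-averaged F1. It is not the project's code, which presumably relied on library implementations; the function names and the toy labels are assumptions made up for this example.

```python
from collections import defaultdict


def stratified_folds(labels, k=5):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal each class's samples round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so the minority class counts equally."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (hypothetical): 1 = heart disease, 0 = no heart disease.
toy_labels = [0] * 15 + [1] * 5
folds = stratified_folds(toy_labels, k=5)
```

Stratifying matters on an imbalanced dataset because a plain random split can leave a fold with very few positive cases, which makes per-fold F1 and ROC-AUC unstable.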

Clinical Insight: Age and chronic conditions — stroke, kidney disease, diabetes — are the strongest predictors. Sleep, exercise, BMI, and mental health are measurable levers for early intervention.

Python · Random Forest · Logistic Regression · SMOTE · GridSearch · PCA

Model Performance

Best Model: Logistic Regression
Test Accuracy: 75.7%
F1-Macro Score: 0.76
ROC-AUC: 0.84
Dataset Size: 300,000+ patients
Classifiers Evaluated: 6 models

Top Risk Predictors

Age / Chronic Conditions: High
BMI: Medium
Sleep / Exercise: Medium

02

SQL Querying for Mortgage Lender

Data Admin & Database Management · Final Project · Azure · Complexity: Intermediate

Problem: First Rate Financial, a Texas mortgage brokerage funding over $100M annually, faced inefficiencies identifying refinance opportunities among past clients and evaluating new borrowers for loan approval.

Solution: Designed and implemented a relational SQL database in Azure, separating data across Property, Borrower, and Loan tables. Loan officers can now query past transactions by interest rate, mortgage insurance, and second lien status.
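The three-table layout can be sketched as follows. This uses Python's built-in sqlite3 module as an in-memory stand-in for the Azure database, and every column name, sample row, and the `refinance_candidates` helper are hypothetical; only the table names come from the project description.

```python
import sqlite3

# In-memory SQLite stands in for the Azure-hosted database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Property (
    property_id INTEGER PRIMARY KEY,
    address     TEXT NOT NULL
);
CREATE TABLE Borrower (
    borrower_id INTEGER PRIMARY KEY,
    full_name   TEXT NOT NULL
);
CREATE TABLE Loan (
    loan_id            INTEGER PRIMARY KEY,
    borrower_id        INTEGER NOT NULL REFERENCES Borrower(borrower_id),
    property_id        INTEGER NOT NULL REFERENCES Property(property_id),
    interest_rate      REAL NOT NULL,     -- annual rate, percent
    mortgage_insurance INTEGER NOT NULL,  -- 1 = loan carries mortgage insurance
    second_lien        INTEGER NOT NULL   -- 1 = a second lien exists
);
""")

# Made-up sample rows, for illustration only.
conn.executemany("INSERT INTO Property VALUES (?, ?)",
                 [(1, "12 Example St"), (2, "34 Sample Ave")])
conn.executemany("INSERT INTO Borrower VALUES (?, ?)",
                 [(1, "Borrower A"), (2, "Borrower B")])
conn.executemany("INSERT INTO Loan VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 1, 1, 7.25, 1, 0), (2, 2, 2, 3.50, 0, 0)])


def refinance_candidates(conn, rate_floor):
    """Return past loans whose rate exceeds rate_floor, highest rate first."""
    query = """
        SELECT b.full_name, p.address, l.interest_rate
        FROM Loan AS l
        JOIN Borrower AS b ON b.borrower_id = l.borrower_id
        JOIN Property AS p ON p.property_id = l.property_id
        WHERE l.interest_rate > ?
        ORDER BY l.interest_rate DESC
    """
    return conn.execute(query, (rate_floor,)).fetchall()


candidates = refinance_candidates(conn, rate_floor=6.0)
```

Keeping borrowers and properties in their own tables means a repeat client or a refinanced property is stored once and referenced by key from each loan.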

Outcome: New applicants are automatically segmented into High, Medium, and Low Risk groups based on income, debt-to-income (DTI) ratio, FICO score, and loan-to-value ratio, improving efficiency across the lending workflow.
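As a rough illustration of the segmentation step, the sketch below assigns a tier from DTI alone, using the DTI bands listed under Database Overview. The real segmentation also weighs income, FICO, and loan-to-value, whose cutoffs are not given here, so this is a simplification; both function names are hypothetical.

```python
def dti_percent(monthly_debt, monthly_income):
    """Debt-to-income ratio as a percentage of gross monthly income."""
    if monthly_income <= 0:
        raise ValueError("monthly_income must be positive")
    return 100 * monthly_debt / monthly_income


def dti_risk_tier(dti):
    """Map a DTI percentage to a tier: >43 High, 36 to 43 Medium, <36 Low."""
    if dti > 43:
        return "High"
    if dti >= 36:
        return "Medium"
    return "Low"


# Hypothetical applicant: $2,000 monthly debt on $5,000 monthly income.
example_tier = dti_risk_tier(dti_percent(2000, 5000))
```

In the database itself the same rule would more likely live in a view or a CASE expression, so that every query sees a consistent tier.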

SQL · Azure · Relational DB · Risk Segmentation · ERD Modeling

Database Overview

Platform: Microsoft Azure
Annual Loan Volume: $100M+
Tables Designed: 3 core tables
Query Types: Views, Joins, Aggregates
High Risk: DTI > 43%
Medium Risk: DTI 36–43%
Low Risk: DTI < 36%

03

FAANG Stock Data Analysis

Scripting for Data Analysis · Final Project · Python · Complexity: Intermediate

Objective: Analyze the relationship between five high-performing tech stocks — NVDA, AAPL, MSFT, GOOGL, and AMZN — to determine whether they outpace the general market over a 15-year period, benchmarked against the DOW and S&P 500.

Method: Time series analysis, correlation studies, and trend tracking, visualized through line plots, heatmaps, boxplots, and histograms. Sentiment analysis on tweets collected with Tweepy attempted to correlate market volatility with major news events.
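The correlation step boils down to a Pearson coefficient between two aligned series. The sketch below is a plain-Python version with made-up prices; the project itself most likely used a library routine on data pulled with yfinance, and whether it correlated price levels or returns is not stated, so treat the inputs as an assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up monthly closing prices, for illustration only.
stock_prices = [10, 12, 11, 15, 18]
index_levels = [100, 104, 103, 110, 115]
r = pearson(stock_prices, index_levels)
```

Two series that both trend upward over many years will correlate strongly on price levels almost regardless of their month-to-month behavior, so running the same function on period-over-period returns is a useful cross-check.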

Key Finding: NVIDIA showed the lowest correlation with the other FAANG stocks. Apple, Google, Amazon, and Microsoft moved closely in tandem with the S&P 500 and the DOW. The DOW (DJI) experienced the largest single-month absolute price changes across the study period.

Python · Time Series · yfinance API · Tweepy API · Sentiment Analysis · Correlation Matrix

Correlation with S&P 500

Study Period: 15 Years
Securities Tracked: 7 total

Correlation Scores

GOOGL: 0.99
AAPL: 0.97
MSFT: 0.97
AMZN: 0.95
NVDA: 0.79

04

Cloud Migration Feasibility Study

Cloud Management · Final Project · AWS · Complexity: Intermediate

Problem: A fictional music streaming platform (Cecilia) competed against Spotify and Apple Music while constrained by aging on-premises datacenter infrastructure — unable to scale during peak hours, suffering outages, and draining engineering resources.

Analysis: AWS, Microsoft Azure, and Google Cloud were evaluated across 19 categories spanning financial, technical, and operational dimensions using a structured side-by-side scorecard.
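A side-by-side scorecard of this kind can be sketched as a weighted average per provider. Everything below is a placeholder: the three categories, the weights, the ratings, and the neutral provider labels are invented for illustration and are not the study's 19 categories or its scores, and the study may not have weighted categories at all.

```python
def score_providers(weights, ratings):
    """Weighted average rating per provider across all scorecard categories."""
    total_weight = sum(weights.values())
    totals = {}
    for provider, scores in ratings.items():
        missing = set(weights) - set(scores)
        if missing:
            raise ValueError(f"{provider} is missing ratings for {sorted(missing)}")
        weighted_sum = sum(weights[cat] * scores[cat] for cat in weights)
        totals[provider] = weighted_sum / total_weight
    return totals


# Placeholder weights and 1-to-5 ratings (hypothetical, not the study's data).
weights = {"cost": 3, "scalability": 2, "support": 1}
ratings = {
    "Provider A": {"cost": 4, "scalability": 5, "support": 3},
    "Provider B": {"cost": 3, "scalability": 4, "support": 5},
}
totals = score_providers(weights, ratings)
best = max(totals, key=totals.get)
```

Raising an error on a missing category keeps the comparison honest, since a provider that was never rated on a dimension would otherwise be silently favored or penalized.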

Conclusion: AWS was identified as the clear choice for media-specific workloads. A 10-year cost analysis showed the on-premises path costing over $118M versus $3.1M on AWS — a gap of more than $115M.

AWS · Microsoft Azure · Google Cloud · Cost Modeling · Cloud Architecture · IaaS / PaaS / SaaS

Annual Cost Comparison

On-Premises: $13.2M/yr
AWS Cloud: $2.4M–$3.1M/yr
Est. Annual Savings: $9.6M–$10.4M

10-Year Cost Total

On-Premises: $118M
AWS Cloud: $3.1M
10-Year Total Savings: $115M+
Categories Evaluated: 19 dimensions