Comparative ML study using XGBoost and LSTM to predict significant crypto price movements, with extensive feature engineering and financial performance optimization.
A two-phase research project on cryptocurrency price prediction. Phase 1 compared logistic regression, random forest, XGBoost, and LSTM models for predicting price increases of 15% or more over 30 days across 2,000+ cryptocurrencies. Phase 2 tuned the hyperparameters of an XGBoost model to predict price doublings within 60 days, optimizing for financial returns rather than conventional classification metrics. We collected daily data for 2,000+ coins via the CoinGecko API, engineered 50+ features across price, volume, volatility, momentum, and temporal categories, and introduced a novel train-leader-test-follower dataset split methodology.
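As a rough illustration of the two prediction targets, the sketch below turns a series of daily closes into binary labels. The exact labeling rule is not stated here, so two assumptions are made: a Phase 1 style label compares the close exactly `horizon` days ahead with today's close, and a Phase 2 style label ("within 60 days") checks the highest close inside the window. The function name `label_future_gain` and its parameters are hypothetical.

```python
from typing import List, Optional


def label_future_gain(
    closes: List[float],
    horizon: int,
    min_gain: float,
    within: bool = False,
) -> List[Optional[int]]:
    """Label each day 1 if the future price gain reaches min_gain, else 0.

    within=False: compare the close exactly `horizon` days ahead (assumed Phase 1 rule).
    within=True:  compare the highest close in the next `horizon` days (assumed Phase 2 rule).
    Days without a full future window are labeled None and should be dropped.
    """
    labels: List[Optional[int]] = []
    n = len(closes)
    for t in range(n):
        if t + horizon >= n:
            labels.append(None)  # not enough future data to decide
            continue
        if within:
            future = max(closes[t + 1 : t + horizon + 1])
        else:
            future = closes[t + horizon]
        labels.append(int(future >= closes[t] * (1.0 + min_gain)))
    return labels


# Phase 1 style target: +15% after 30 days  -> label_future_gain(closes, 30, 0.15)
# Phase 2 style target: doubling in 60 days -> label_future_gain(closes, 60, 1.0, within=True)
```

Computing labels per coin, before any pooling across coins, keeps one coin's future window from reading another coin's prices.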
0.389
LSTM F1-Score
0.383
XGBoost F1-Score
1,689%
Portfolio Return
9.59
Profit Factor
Month (20.4%)
Top Feature
2,000+
Coins Analyzed
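For readers unfamiliar with the profit factor metric listed above: it is conventionally defined as gross profit divided by gross loss over a set of trades. The sketch below applies that standard definition to a list of per-trade returns. How trades were defined and sized in this project is not stated, so the per-trade return list and the name `profit_factor` are illustrative assumptions.

```python
from typing import List


def profit_factor(trade_returns: List[float]) -> float:
    """Gross profit divided by gross loss over a list of per-trade returns.

    A value above 1 means winning trades earned more than losing trades lost.
    """
    gross_profit = sum(r for r in trade_returns if r > 0)
    gross_loss = -sum(r for r in trade_returns if r < 0)
    if gross_loss == 0:
        # No losing trades: the ratio is unbounded (or undefined with no winners).
        return float("inf") if gross_profit > 0 else float("nan")
    return gross_profit / gross_loss


# Hypothetical example: two winners (+200%, +100%) and two losers (-50% each).
example = profit_factor([2.0, -0.5, 1.0, -0.5])
```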
Collected 365 days of daily data for 2,000+ cryptocurrencies via the CoinGecko API with custom rate limiting. Filtered out stablecoins and engineered features across five categories: price (moving averages, momentum), volume (OBV, correlations), volatility (Bollinger Bands, standard deviation), momentum (RSI, MACD), and temporal (month, day of week). Used a train-leader-test-follower split to evaluate generalization: training on coins in the top 50% by market cap and testing on a random sample from the bottom 50%. Addressed severe class imbalance (fewer than 1% positive examples) with undersampling combined with SMOTE. Phase 2 included a sensitivity analysis over classification probability thresholds up to 0.95.
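The train-leader-test-follower split described above can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes a mapping from coin ID to market cap, ranks coins by that value, trains on the top half, and draws a seeded random sample from the bottom half for testing. The function name, the `test_sample_size` parameter, and the tie-breaking for an odd number of coins are all assumptions.

```python
import random
from typing import Dict, List, Tuple


def leader_follower_split(
    market_caps: Dict[str, float],
    test_sample_size: int,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Return (train_coins, test_coins) split by market-cap rank.

    Leaders (top half by market cap) form the training set; the test set is
    a random sample of followers (bottom half), so no coin appears in both.
    """
    ranked = sorted(market_caps, key=market_caps.get, reverse=True)
    half = len(ranked) // 2  # with an odd count, the middle coin is a follower
    leaders, followers = ranked[:half], ranked[half:]

    rng = random.Random(seed)  # seeded for a reproducible test sample
    k = min(test_sample_size, len(followers))
    test_coins = rng.sample(followers, k)
    return leaders, test_coins


# Hypothetical market caps, for illustration only.
caps = {"aaa": 100.0, "bbb": 50.0, "ccc": 10.0, "ddd": 5.0, "eee": 1.0, "fff": 0.5}
train_coins, test_coins = leader_follower_split(caps, test_sample_size=2)
```

Splitting by coin rather than by row means that any resampling for class imbalance, such as undersampling or SMOTE, would be applied to the training coins only, leaving the follower test set at its natural class ratio.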