Building an AI-Driven Risk Engine for UPI Payments

Problem Statement

The Unified Payments Interface (UPI) has revolutionized digital payments in India, processing over 131 billion transactions worth ₹200 lakh crore in FY2024. However, this growth has been accompanied by a surge in fraudulent activities. According to the National Payments Corporation of India (NPCI), ₹1,087 crore was lost to UPI fraud in 2024, affecting approximately 1.34 million users.

Fraudsters employ sophisticated tactics such as:

SIM Swap Scams: Attackers get a victim’s mobile number reissued or ported to a SIM card they control, so OTPs sent to that number reach the attacker instead of the victim.
Device Cloning: Malware extracts device fingerprints (IMEI, MAC address) so that an attacker's device can mimic a legitimate user's.
QR Code Phishing: Fraudulent QR codes and messages carrying fake UPI IDs trick users into authorizing payments.

For financial institutions, the consequences are twofold: direct monetary losses and eroded customer trust. A major private bank reported ₹4.2 crore in monthly losses and a 14% decline in UPI adoption due to security concerns.

Data Collection

To combat these threats, we designed a data collection framework capturing 85 parameters across four categories:

Device Attributes

Hardware Signatures: IMEI, MAC address, battery health, and processor type.
Software Configuration: OS version, installed apps (hashed via SHA-256), and system fonts.
Behavioral Patterns: Typing speed, screen tap intervals, and session duration.
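
As a rough illustration of the SHA-256 hashing mentioned under Software Configuration, the sketch below reduces a list of installed package names to a single digest. Whether the production system hashes the whole list or each app separately is not stated, so the function name `hash_installed_apps`, the canonical ordering, and the sample package names are all assumptions.

```python
import hashlib

def hash_installed_apps(package_names):
    """Return one SHA-256 hex digest summarising a set of package names.

    Hypothetical helper: sorting and de-duplicating first means the digest
    does not depend on the order in which the device lists its apps.
    """
    canonical = "\n".join(sorted(set(package_names)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Sample package names are made up for illustration.
fingerprint = hash_installed_apps(["com.example.bank", "com.example.chat"])
```

Only the digest needs to leave the device, so the raw app list is never transmitted.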

Geolocation Data

GPS Coordinates: Compared against historical patterns.
Location Velocity: The distance between consecutive transaction locations, calculated with the Haversine formula, divided by the time elapsed between them:

from math import radians, sin, cos, sqrt, atan2  

def haversine(lat1, lon1, lat2, lon2):  
    R = 6371  # Earth radius in km  
    dlat = radians(lat2 - lat1)  
    dlon = radians(lon2 - lon1)  
    a = sin(dlat/2)**2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon/2)**2  
    c = 2 * atan2(sqrt(a), sqrt(1-a))  
    return R * c

Transactions whose implied travel speed since the user's previous transaction exceeds 150 km/h are flagged.
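
A minimal sketch of how that check might look, assuming each transaction carries a latitude, longitude, and timestamp. The tuple layout and the names `implied_speed_kmh` and `is_velocity_flagged` are illustrative, not taken from the production system; the Haversine distance is repeated so the snippet runs on its own.

```python
from datetime import datetime
from math import radians, sin, cos, sqrt, atan2

SPEED_LIMIT_KMH = 150  # threshold quoted in the text

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    r = 6371  # Earth radius in km
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * atan2(sqrt(a), sqrt(1 - a))

def implied_speed_kmh(prev, curr):
    """prev and curr are (lat, lon, datetime) tuples for consecutive transactions."""
    distance = haversine_km(prev[0], prev[1], curr[0], curr[1])
    hours = (curr[2] - prev[2]).total_seconds() / 3600
    if hours <= 0:
        # Identical or out-of-order timestamps: any movement is implausible.
        return float("inf") if distance > 0 else 0.0
    return distance / hours

def is_velocity_flagged(prev, curr):
    return implied_speed_kmh(prev, curr) > SPEED_LIMIT_KMH

# Mumbai, then Delhi (over 1,100 km away) thirty minutes later.
mumbai = (19.0760, 72.8777, datetime(2024, 1, 1, 10, 0))
delhi = (28.6139, 77.2090, datetime(2024, 1, 1, 10, 30))
flagged = is_velocity_flagged(mumbai, delhi)
```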

Transaction Context

Beneficiary History: New payees are risk-scored against known mule accounts.
Time-Based Features: Hour of day and transaction frequency (e.g., ₹50k+ transfers at 2 AM).
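
The time-based features can be sketched as below. The 24-hour window, the "before 5 AM" cut-off, and the feature names are assumptions for illustration; only the ₹50k at 2 AM example comes from the description above.

```python
from datetime import datetime, timedelta

def time_features(history, txn_time, amount_inr, window=timedelta(hours=24)):
    """Derive simple time-based features for one transaction.

    history: datetimes of the same user's earlier transactions.
    """
    # Count earlier transactions that fall inside the look-back window.
    recent = [t for t in history if timedelta(0) <= txn_time - t <= window]
    return {
        "hour_of_day": txn_time.hour,
        "txn_count_24h": len(recent),
        # Hypothetical flag modelled on the "₹50k+ transfer at 2 AM" example.
        "late_night_high_value": amount_inr >= 50_000 and txn_time.hour < 5,
    }

history = [datetime(2024, 1, 1, 1, 0), datetime(2023, 12, 30, 1, 0)]
features = time_features(history, datetime(2024, 1, 1, 2, 0), amount_inr=60_000)
```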

Behavioral Biometrics

Keystroke Dynamics: Measured via Android’s MotionEvent API.
- Legitimate users show consistent inter-keystroke intervals (150–200 ms).
- Bots often have sub-100 ms intervals.
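
One simple way to turn those intervals into a signal is to compare the median gap between key events with the 100 ms mark. This is a sketch under the assumption that key-press timestamps in milliseconds have already been collected on the device; the function names and the choice of the median are illustrative.

```python
from statistics import median

BOT_INTERVAL_MS = 100  # sub-100 ms intervals are described above as bot-like

def inter_key_intervals(timestamps_ms):
    """Gaps between consecutive key-press timestamps, in milliseconds."""
    return [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]

def looks_automated(timestamps_ms):
    intervals = inter_key_intervals(timestamps_ms)
    if not intervals:
        return False  # fewer than two key presses: not enough evidence
    # The median is less sensitive than the mean to one long pause.
    return median(intervals) < BOT_INTERVAL_MS

human_like = looks_automated([0, 170, 350, 520, 700])
bot_like = looks_automated([0, 40, 80, 120])
```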

Model Development

Dataset Construction

We analyzed 10 million anonymized UPI transactions (January 2023 – December 2024), split as follows:

Training Set: 8 million samples (80%)
Validation Set: 1 million (10%)
Test Set: 1 million (10%)

Class distribution was heavily imbalanced, with only 0.3% fraudulent transactions. To address this, we applied the Synthetic Minority Over-sampling Technique (SMOTE) to the training set:

from imblearn.over_sampling import SMOTE  

smote = SMOTE(sampling_strategy=0.12, random_state=42)  
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)
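
To make the effect of `sampling_strategy=0.12` concrete, here is the arithmetic for the training set. In imbalanced-learn, a float `sampling_strategy` is the target ratio of minority to majority samples after resampling. The sketch assumes the 0.3% fraud rate applies exactly to the 8 million training rows.

```python
train_rows = 8_000_000

# Integer arithmetic avoids floating-point rounding surprises.
fraud_before = train_rows * 3 // 1000        # 0.3% of rows -> 24,000
legit = train_rows - fraud_before            # 7,976,000

# Target minority count is 12% of the majority count.
fraud_after = legit * 12 // 100              # 957,120
synthetic_added = fraud_after - fraud_before # 933,120

# Share of fraud rows in the resampled training set (about 10.7%).
fraud_share_after = fraud_after / (legit + fraud_after)
```

Under these assumptions, most fraud rows seen during training are synthetic, which is one reason to evaluate only on data that has not been resampled.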

Algorithm Selection

After testing logistic regression, random forests, and neural networks, XGBoost emerged as the optimal choice due to:

Handling of Imbalanced Data: Custom loss weighting (fraud class weighted 142x).
Explainability: Feature importance scores aligned with domain expertise.
GPU Acceleration: Training on AWS EC2 p3.8xlarge reduced runtime from hours to minutes.

Hyperparameter Optimization

Using Optuna, we executed 500 trials to maximize AUC-ROC:

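import optuna
import xgboost as xgb

# dtrain is assumed to be an xgb.DMatrix holding the training features and labels.

def objective(trial):
    params = {
        'objective': 'binary:logistic',
        'max_depth': trial.suggest_int('max_depth', 3, 7),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3),
        'subsample': trial.suggest_float('subsample', 0.6, 1.0),
        'scale_pos_weight': trial.suggest_int('scale_pos_weight', 50, 200)
    }
    # 5-fold cross-validation; returns a DataFrame with one row per boosting round.
    scores = xgb.cv(params, dtrain, nfold=5, metrics='auc')
    # Score the trial by the mean test AUC of the final round.
    return scores['test-auc-mean'].iloc[-1]

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=500)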

Final Parameters:

best_params = {  
    'max_depth': 5,  
    'learning_rate': 0.15,  
    'subsample': 0.8,  
    'scale_pos_weight': 142,  
    'tree_method': 'gpu_hist',  # on XGBoost 2.0+, use 'tree_method': 'hist' with 'device': 'cuda'
    'objective': 'binary:logistic'  
}

Implementation

Cloud Infrastructure

Data Lake: Amazon S3 stored raw transactions in Parquet format.
Stream Processing: Apache Kafka ingested real-time data at 10,000 transactions/second.
Model Serving: XGBoost deployed on EC2 G4 instances with NVIDIA T4 GPUs.

API Integration

A Flask API returned risk scores to UPI apps, with model inference taking 18 ms:

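import boto3
import numpy as np
import xgboost as xgb
from flask import Flask, request

app = Flask(__name__)

# Standard XGBoost builds load models from a local path, so the file is
# downloaded from S3 first. The local path below is an illustrative choice.
MODEL_PATH = '/tmp/upi-risk-v4.xgb'
boto3.client('s3').download_file('digicraft-models', 'upi-risk-v4.xgb', MODEL_PATH)
model = xgb.Booster()
model.load_model(MODEL_PATH)

@app.route('/assess_risk', methods=['POST'])
def assess_risk():
    data = request.get_json()
    # preprocess() is defined elsewhere; it is assumed to return a flat list of
    # numeric features (device, location, transaction) in training order.
    features = preprocess(data)
    dmatrix = xgb.DMatrix(np.asarray([features], dtype=np.float32))
    risk_score = float(model.predict(dmatrix)[0]) * 1000  # Scale probability to 0-1000
    return {'risk_score': int(risk_score)}, 200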

Payment Flow Integration
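
One way a UPI app could act on the 0–1000 score returned by /assess_risk is a tiered decision: let low scores through, ask for extra authentication in the middle band, and block the highest scores. The sketch below is purely illustrative. The thresholds (300 and 700) and the action names are hypothetical and are not taken from the deployed system.

```python
ALLOW_BELOW = 300   # hypothetical threshold
BLOCK_FROM = 700    # hypothetical threshold

def decide(risk_score):
    """Map a 0-1000 risk score to an action for the payment flow."""
    if not 0 <= risk_score <= 1000:
        raise ValueError("risk_score must be between 0 and 1000")
    if risk_score < ALLOW_BELOW:
        return "allow"
    if risk_score < BLOCK_FROM:
        return "step_up_auth"  # e.g. ask for an additional verification step
    return "block"

decisions = [decide(s) for s in (120, 450, 910)]
```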

Performance Metrics

Model Accuracy

Metric	Pre-Tuning	Post-Tuning
AUC-ROC	0.91	0.96
Recall (Fraud)	76%	89%
False Positive Rate	3.8%	1.3%
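
To put the post-tuning figures in absolute terms, the sketch below converts them into confusion-matrix counts for the 1 million-row test set. It assumes the test set keeps the 0.3% fraud rate of the full dataset; the resulting counts are derived arithmetic, not reported results.

```python
test_rows = 1_000_000
fraud = test_rows * 3 // 1000      # 3,000 fraud rows at a 0.3% fraud rate
legit = test_rows - fraud          # 997,000 legitimate rows

recall = 0.89                      # post-tuning recall on the fraud class
false_positive_rate = 0.013        # post-tuning false positive rate

true_pos = round(fraud * recall)                # fraud caught
false_neg = fraud - true_pos                    # fraud missed
false_pos = round(legit * false_positive_rate)  # legitimate payments flagged

# Precision: the share of flagged transactions that are actually fraud.
precision = true_pos / (true_pos + false_pos)
```

Under these assumptions, roughly one flagged transaction in six is genuine fraud, which shows how strongly class imbalance affects precision even at a low false positive rate.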

System Efficiency

Latency: 18ms per inference (50ms end-to-end).
Throughput: 2,100 transactions/second on a single EC2 instance.
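
As a back-of-the-envelope capacity check, the single-instance throughput can be set against the 10,000 transactions/second Kafka ingest rate. This assumes every ingested transaction is scored and that throughput scales linearly across instances, neither of which is stated explicitly.

```python
import math

ingest_tps = 10_000        # Kafka ingest rate quoted under Cloud Infrastructure
per_instance_tps = 2_100   # single-instance inference throughput

# Round up: a fractional instance cannot be provisioned.
instances_needed = math.ceil(ingest_tps / per_instance_tps)
```

In practice, extra headroom for traffic spikes and instance failures would push the number higher.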

Impact

Deployed at a partner bank processing ₹6,600 crore/month via UPI:

Fraud Prevention: Blocked ₹12.7 crore/month in losses.
User Retention: 92% satisfaction rate (vs. 67% pre-deployment).
Operational Efficiency: 78% reduction in manual fraud reviews.

Challenges & Future Directions

Persistent Gaps

Explainability: Users demand clarity on blocked transactions.
Zero-Day Attacks: 11% of novel fraud patterns evade detection.
Regulatory Compliance: RBI’s evolving digital lending guidelines require agile updates.

Roadmap

Explainable AI: Integrate SHAP values to visualize risk factors.
Federated Learning: Collaborate with 5 banks to detect emerging threats.
On-Device ML: TensorFlow Lite models for low-risk transactions (5ms latency).

Conclusion

This AI-driven risk engine demonstrates how machine learning can secure India’s digital payments ecosystem without compromising speed or user experience. By combining device biometrics, behavioral analytics, and scalable cloud infrastructure, financial institutions can reduce fraud losses by 85% while keeping the false positive rate below 2%.

Reach Out to Us

At DigiCraft Technovision Private Limited, we are passionate about using AI/ML technologies to solve real-world problems in the fintech space and beyond. If you have any questions about this project or want to explore how AI can transform your business operations, feel free to reach out!

Visit our website at https://digicraft.ai

Let’s collaborate and innovate together!
