ATP Tennis Match Analysis and Anomaly Detection

This project analyzes ATP tennis match data with a deep learning model built on joint embeddings. The goal is to detect anomalies in professional men's tennis tournament draws. It uses PyTorch to build and train the neural network, Optuna for hyperparameter optimization, and DBSCAN for anomaly detection.

Overview
Features
Setup
Usage
Model Architecture
Hyperparameter Optimization
Anomaly Detection
Results
Contributing
License

Overview

The project aims to identify irregularities in tennis matches by examining patterns and discrepancies in player rankings, ages, and other match-related features. This analysis can help detect potential biases or unusual outcomes in tournament draws.

Features

Data Loading and Preprocessing: Handles ATP match data from multiple years, with preprocessing steps including encoding categorical features and handling missing values.
Feature Engineering: Creates new features such as age difference and rank difference between players.
Joint Embedding Neural Network: A PyTorch-based model that combines categorical and numerical features for robust prediction of match outcomes.
Hyperparameter Tuning: Uses Optuna for efficient optimization of model hyperparameters.
Anomaly Detection: Applies DBSCAN clustering to the embeddings generated by the model to identify anomalies in player performance.
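
As an illustration of the feature engineering step, the following is a minimal sketch of how the age and rank differences could be computed with Pandas. The column names (winner_age, loser_age, winner_rank, loser_rank) and the function name are assumptions made for this example and may not match the names used in the project's code or data files.

```python
import pandas as pd


def add_difference_features(matches: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `matches` with age and rank difference columns added."""
    out = matches.copy()
    # Assumed column names; adjust them to the actual CSV headers.
    out["age_diff"] = out["winner_age"] - out["loser_age"]
    out["rank_diff"] = out["winner_rank"] - out["loser_rank"]
    return out


# Two made-up matches, used only to show the shape of the result.
matches = pd.DataFrame({
    "winner_age": [24.5, 31.0],
    "loser_age": [28.0, 22.5],
    "winner_rank": [3, 57],
    "loser_rank": [41, 12],
})
features = add_difference_features(matches)
print(features[["age_diff", "rank_diff"]])
```

A negative rank_diff here means the winner was the better-ranked player, since a lower rank number is a better ranking.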

Setup

Prerequisites

Python 3.8 or later
PyTorch
Optuna
Scikit-learn
Matplotlib
Pandas
NumPy

Installation

Clone the repository:

git clone https://github.com/yourusername/atp-tennis-analysis.git
cd atp-tennis-analysis

Install the required dependencies:

pip install -r requirements.txt

Download the ATP match data files and place them in the project directory. Ensure the files are named in the format atp_matches_YYYY.csv, where YYYY is the year (e.g., atp_matches_2000.csv).
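
For reference, here is a rough sketch of how yearly files named like atp_matches_2000.csv could be collected into a single DataFrame. The function name load_matches, the added year column, and the choice to skip files that do not match the pattern are assumptions for this example, not a description of what main.py does.

```python
import re
from pathlib import Path

import pandas as pd

# Matches file names such as atp_matches_2000.csv and captures the year.
FILE_PATTERN = re.compile(r"atp_matches_(\d{4})\.csv")


def load_matches(data_dir: str = ".") -> pd.DataFrame:
    """Concatenate every yearly match file found directly in `data_dir`."""
    frames = []
    for path in sorted(Path(data_dir).iterdir()):
        match = FILE_PATTERN.fullmatch(path.name)
        if match is None:
            continue  # not a yearly match file, skip it
        frame = pd.read_csv(path)
        # Hypothetical helper column recording which file a row came from.
        frame["year"] = int(match.group(1))
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"no atp_matches_YYYY.csv files in {data_dir!r}")
    return pd.concat(frames, ignore_index=True)
```

Calling load_matches(".") from the project directory would then return one row per match across all years found.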

Usage

Run the main script to load the data, preprocess it, and train the model:

python main.py

Model Training

The script trains the model using the preprocessed data, optimizing hyperparameters with Optuna, and saves the best-performing model.

Anomaly Detection

The model’s predictions are used to perform anomaly detection, identifying unusual matches or player performances.

View Results

Results, including anomaly plots and metrics, will be saved in the output directory. CSV files summarizing the anomalies per player, year, and tournament will also be generated.
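
The per-player, per-year, and per-tournament summaries could be produced along the following lines. This is only a sketch: the column names (player, year, tournament, is_anomaly), the output file names, and the function name are hypothetical and chosen for the example.

```python
from pathlib import Path

import pandas as pd


def write_anomaly_summaries(flagged: pd.DataFrame, output_dir: str = "output") -> dict:
    """Write one CSV of anomaly counts per player, year, and tournament."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # `is_anomaly` is assumed to be a boolean column set by the detection step.
    anomalies = flagged[flagged["is_anomaly"]]
    paths = {}
    for key in ("player", "year", "tournament"):
        counts = (
            anomalies.groupby(key).size()
            .rename("anomaly_count")
            .reset_index()
            .sort_values("anomaly_count", ascending=False)
        )
        paths[key] = out / f"anomalies_per_{key}.csv"
        counts.to_csv(paths[key], index=False)
    return paths
```

The function returns the paths of the three files it wrote, so a caller can log or reload them.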

Model Architecture

The JointEmbeddedModel consists of:

Embeddings for Categorical Features: Each categorical variable (e.g., player IDs, tournament IDs) is embedded into a dense vector.

Fully Connected Layers: These layers combine embeddings and numerical features to predict match outcomes.

Dropout Layers: Used to prevent overfitting and improve model generalization.
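
To make the three components concrete, here is a minimal PyTorch sketch of a model with this general shape. It is not the project's JointEmbeddedModel: the class name, the single hidden layer, the default sizes, and the choice to return the hidden representation alongside the prediction are all assumptions made for illustration.

```python
import torch
from torch import nn


class JointEmbeddingSketch(nn.Module):
    """Illustrative model: embeddings + numeric features -> hidden layer -> output."""

    def __init__(self, cardinalities, num_numeric, embedding_dim=16,
                 hidden_size=64, dropout=0.2):
        super().__init__()
        # One embedding table per categorical feature (e.g. player ID, tournament ID).
        self.embeddings = nn.ModuleList(
            [nn.Embedding(n, embedding_dim) for n in cardinalities]
        )
        joint_dim = embedding_dim * len(cardinalities) + num_numeric
        self.encoder = nn.Sequential(
            nn.Linear(joint_dim, hidden_size),
            nn.ReLU(),
            nn.Dropout(dropout),  # regularization against overfitting
        )
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, categorical, numeric):
        # categorical: (batch, n_categorical) integer codes
        # numeric:     (batch, num_numeric) float features
        embedded = [emb(categorical[:, i]) for i, emb in enumerate(self.embeddings)]
        joint = torch.cat(embedded + [numeric], dim=1)
        hidden = self.encoder(joint)
        # Return the prediction and the hidden vector, which can serve as an embedding.
        return self.head(hidden).squeeze(-1), hidden


model = JointEmbeddingSketch(cardinalities=[100, 20], num_numeric=2)
categorical = torch.randint(0, 20, (8, 2))  # codes valid for both tables
numeric = torch.randn(8, 2)
prediction, hidden = model(categorical, numeric)
print(prediction.shape, hidden.shape)
```

Returning the hidden vector is one convenient way to expose per-match embeddings for later clustering; the actual model may expose them differently.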

Hyperparameter Optimization

The project uses Optuna to automatically search for the best combination of model parameters, including:

Embedding dimension

Hidden layer size

Learning rate

Batch size

Dropout rate

Anomaly Detection

Anomalies are detected by comparing expected and actual rank differences in matches using DBSCAN clustering. Anomalies can indicate unexpected match outcomes, potential biases, or errors in player rankings.
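
One common way to use DBSCAN for this purpose is to treat the points it labels as noise (label -1) as anomalies. The sketch below shows that idea on synthetic data. The function name, the scaling step, and the eps and min_samples values are assumptions, and the project's actual inputs to DBSCAN may differ.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler


def flag_anomalies(vectors, eps=0.5, min_samples=5):
    """Return a boolean mask that is True where DBSCAN labels a row as noise."""
    # Standardize so that eps means the same thing along every dimension.
    scaled = StandardScaler().fit_transform(vectors)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(scaled)
    return labels == -1  # DBSCAN marks noise points with the label -1


# Synthetic data: one dense cluster plus three far-away points.
rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.1, size=(200, 2))
outliers = np.array([[5.0, 5.0], [-5.0, 4.0], [4.0, -5.0]])
is_anomaly = flag_anomalies(np.vstack([dense, outliers]))
print(int(is_anomaly.sum()), "rows flagged")
```

Both eps and min_samples strongly affect how many rows are flagged, so they usually need tuning on the real data.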

Results

Positive Anomalies: Matches where the predicted rank difference was significantly lower than expected.
Negative Anomalies: Matches where the predicted rank difference was significantly higher than expected.

The results are visualized using t-SNE plots and saved as images and CSV files.
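
A t-SNE plot of this kind could be generated with scikit-learn and Matplotlib as sketched below. The function name, colors, default file name, and perplexity handling are assumptions for the example rather than the project's plotting code.

```python
import matplotlib
matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE


def plot_tsne(vectors, is_anomaly, path="tsne_anomalies.png"):
    """Project `vectors` to 2D with t-SNE and save a scatter plot to `path`."""
    vectors = np.asarray(vectors)
    is_anomaly = np.asarray(is_anomaly, dtype=bool)
    # Perplexity must be smaller than the number of samples.
    perplexity = min(30, len(vectors) - 1)
    coords = TSNE(n_components=2, perplexity=perplexity,
                  random_state=0).fit_transform(vectors)
    fig, ax = plt.subplots(figsize=(6, 5))
    ax.scatter(coords[~is_anomaly, 0], coords[~is_anomaly, 1],
               s=8, c="lightgray", label="normal")
    ax.scatter(coords[is_anomaly, 0], coords[is_anomaly, 1],
               s=20, c="crimson", label="anomaly")
    ax.set_title("t-SNE projection of match embeddings")
    ax.legend()
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return coords
```

Here vectors would be the model's embeddings and is_anomaly a boolean mask from the detection step. Note that t-SNE distorts distances, so the plot is a visual aid and not a measure of how anomalous a match is.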

Contributing

Contributions are welcome! Please feel free to submit a Pull Request or open an Issue for any improvements or bugs you encounter.

License

This project is licensed under the MIT License. See the LICENSE file for more details.
