
Research Question:
Can a machine learning model, trained on extracted audio features, accurately distinguish between truthful and deceptive speech in recordings spanning multiple natural languages?
Objective of the exercise:
Develop a machine learning model to detect deception from short speech clips using acoustic features such as:
Mel-Frequency Cepstral Coefficients (MFCCs)
Pitch
Spectral characteristics
Dataset:
100 labelled audio recordings in multiple languages (Hindi, English, Bengali)
Each sample is labelled as either truthful or deceptive
Methodology:
Feature Extraction: MFCCs, pitch, spectral centroid, bandwidth
Preprocessing:
Standardisation of features
Stratified 80/20 train-test split
Model: Support Vector Machine (SVM) with linear kernel
Evaluation Metrics:
Accuracy
Precision, Recall
F1-Score
Confusion Matrix
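The methodology above can be sketched end-to-end. This is a minimal illustration, not the project's actual code: the feature matrix here is a random placeholder standing in for the real 100-clip features, and only spectral centroid/bandwidth are computed from scratch (MFCCs and pitch would typically come from a library such as librosa).

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def spectral_centroid_bandwidth(frame, sr):
    """Spectral centroid and bandwidth of one audio frame (numpy only;
    a fuller pipeline would also extract MFCCs and pitch)."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    centroid = np.sum(freqs * mag) / np.sum(mag)
    bandwidth = np.sqrt(np.sum(((freqs - centroid) ** 2) * mag) / np.sum(mag))
    return centroid, bandwidth

# Synthetic stand-in for the 100-clip feature matrix (placeholder data)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))      # e.g. 13 MFCCs + pitch + spectral stats
y = rng.integers(0, 2, size=100)    # 0 = truthful, 1 = deceptive

# Standardisation, stratified 80/20 split, linear-kernel SVM
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
scaler = StandardScaler().fit(X_train)
clf = SVC(kernel="linear").fit(scaler.transform(X_train), y_train)
y_pred = clf.predict(scaler.transform(X_test))

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

Fitting the scaler on the training split only, as above, avoids leaking test-set statistics into preprocessing.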
Impact of Using SVM
The Support Vector Machine (SVM) classifier, especially with a linear kernel, proved well-suited for this task for several reasons:
Effective on small datasets: SVM is robust even with limited data (like the 100 samples used here), especially when the number of features is high after extraction.
High-dimensional space handling: Acoustic features such as MFCCs and spectral statistics form a high-dimensional feature space, which SVM handles efficiently.
Reduced overfitting: Compared to more complex models, SVM performed well without overfitting, especially after dimensionality reduction via PCA.
Observed Improvements with SVM:
Performance improved over baseline models (e.g., logistic regression or naive classifiers).
Provided more balanced precision and recall, making it better at both detecting deception and minimising false accusations.
When combined with PCA, SVM training became faster and more stable, helping the model generalise better on the test set.
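The baseline comparison described here can be reproduced in outline with scikit-learn. The data below is synthetic (generated to mimic the 100-sample, high-dimensional setup), so the absolute scores are illustrative only, not the project's reported results.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# 100 samples, 40-dimensional features, mimicking the report's setup
X, y = make_classification(n_samples=100, n_features=40,
                           n_informative=8, random_state=42)

baseline = make_pipeline(StandardScaler(),
                         LogisticRegression(max_iter=1000))
svm_pca = make_pipeline(StandardScaler(),
                        PCA(n_components=10),      # dimensionality reduction
                        SVC(kernel="linear"))

for name, model in [("logistic baseline", baseline),
                    ("PCA + linear SVM", svm_pca)]:
    scores = cross_val_score(model, X, y, cv=5)    # 5-fold cross-validation
    print(f"{name}: mean CV accuracy = {scores.mean():.2f}")
```

Cross-validation rather than a single split gives a steadier comparison on a dataset this small.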

Key Insights
Limitations
Small Dataset: Only 100 samples were available, limiting the model’s ability to generalize effectively.
Artificial Deception: The dataset consists of acted deceptive stories, which may not reflect natural, spontaneous lying behaviour.
Language Imbalance: Uneven representation of languages may bias the model, as certain audio features may correlate with specific linguistic traits.
Limited Feature Scope: Only spectral and prosodic features were used; temporal or linguistic features could provide deeper insights.
Subtle Differences: Deceptive speech may not always differ clearly in acoustic features, making consistent detection difficult.
Speaker Variability: Natural differences in speech patterns across individuals introduce noise and reduce model reliability.
Lack of Context: Without non-verbal or situational context, audio-only analysis may miss key cues relevant to deception.
Future Improvements
Increase Dataset Size: Collect a larger dataset to improve the model's generalisability and allow it to capture more diverse patterns in speech.
Collect Natural Deception Data: Instead of relying on prompted deceptive stories, collect data in more realistic contexts where deception occurs naturally.
Prosodic Features: Include features like pitch range, speaking rate, or energy dynamics that might capture intentional modulations in deceptive speech.
Voice Quality Features: Analyse jitter, shimmer, or harmonics-to-noise ratio (HNR) to detect subtle changes in voice quality.
Higher-Level Linguistic Features: Extract semantic or syntactic features using tools like ASR (Automatic Speech Recognition) to analyse the content of the stories.
Experiment with Other Models: Test more advanced classifiers like Gradient Boosting Machines (e.g., XGBoost or LightGBM) or Neural Networks.
Ensemble Learning: Combine multiple models (e.g., SVM and Random Forest) to improve classification performance.
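As one concrete direction, the voice-quality features mentioned above (jitter, HNR) can be approximated in plain numpy. This is a simplified autocorrelation-based sketch of Praat-style measures, demonstrated on a synthetic 150 Hz tone, not a production voice-analysis routine; a real system would likely use a dedicated tool such as Praat.

```python
import numpy as np

def hnr_and_jitter(signal, sr, f0_min=75.0, f0_max=500.0):
    """Rough harmonics-to-noise ratio (dB) and local jitter (%) via
    autocorrelation pitch-period estimation. Simplified sketch."""
    x = signal - signal.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    ac = ac / ac[0]                       # normalised autocorrelation
    lag_min = int(sr / f0_max)            # shortest plausible pitch period
    lag_max = int(sr / f0_min)            # longest plausible pitch period
    period = lag_min + np.argmax(ac[lag_min:lag_max])
    r = float(np.clip(ac[period], 1e-6, 1 - 1e-6))
    hnr_db = 10.0 * np.log10(r / (1.0 - r))   # harmonic vs. noise energy

    # Jitter: variability of consecutive pitch periods, found by picking
    # successive signal peaks roughly one period apart (crude peak picking)
    peaks = [int(np.argmax(x[:period]))]
    while peaks[-1] + period + period // 4 < len(x):
        lo = peaks[-1] + period - period // 4
        hi = peaks[-1] + period + period // 4
        peaks.append(lo + int(np.argmax(x[lo:hi])))
    periods = np.diff(peaks) / sr
    jitter_pct = 100.0 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)
    return hnr_db, jitter_pct

# Demo: a 150 Hz tone with a little added noise (1 s at 16 kHz)
sr = 16000
t = np.arange(sr) / sr
noisy = np.sin(2 * np.pi * 150.0 * t) \
    + 0.02 * np.random.default_rng(0).normal(size=sr)
hnr, jit = hnr_and_jitter(noisy, sr)
print(f"HNR ≈ {hnr:.1f} dB, jitter ≈ {jit:.2f}%")
```

A mostly periodic signal like this yields a high HNR and low jitter; hoarse or strained voicing would push HNR down and jitter up, which is the cue such features aim to capture.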