Abuse and Fraud Detection in Streaming Services Using Heuristic-Aware Machine Learning

Authors - Soheil Esmaeilzadeh, Negin Salajegheh, Amir Ziai, and Jeff Boote

Affiliation - Netflix, CA

Abstract - This work presents a fraud and abuse detection framework for streaming services by modeling user streaming behavior. The goal is to discover anomalous and suspicious incidents and scale the investigation efforts by creating models that characterize the user behavior. We study the use of semi-supervised as well as supervised approaches for anomaly detection. In the semi-supervised approach, by leveraging only a set of authenticated anomaly-free data samples, we show the use of one-class classification algorithms as well as autoencoder deep neural networks for anomaly detection. In the supervised anomaly detection task, we present a so-called heuristic-aware data labeling strategy for creating labeled data samples. We carry out binary classification as well as multi-class multi-label classification tasks for not only detecting the anomalous samples but also identifying the underlying anomaly behavior(s) associated with each one. Finally, using a systematic feature importance study we provide insights into the underlying set of features that characterize different streaming fraud categories. To the best of our knowledge, this is the first paper to use machine learning methods for fraud and abuse detection in real-world scale streaming services.

Keywords: Heuristic-Aware, Fraud Detection, Machine Learning, Streaming Services

Schematic of a streaming service platform: subfigure (a) illustrates device types that can be used by clients for streaming, subfigure (b) designates the set of authentication and authorization systems such as DRMs and license and manifest servers for providing encrypted contents as well as decryption keys and manifests, and subfigure (c) shows the streaming service provider, as a surrogate entity for digital content providers, that interacts with the other two components.

Highlights

Fig.1. Model-based anomaly detection approaches: (a) semi-supervised and (b) supervised.
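
The semi-supervised setting in panel (a) can be illustrated with a toy example: a model is fitted on anomaly-free samples only, and new samples are scored by how far they deviate from that clean profile. The sketch below is a deliberately simple stand-in (per-feature z-scores), not the one-class classifiers or autoencoders studied in the paper; the function names, the two example features, and the threshold of 3.0 are all assumptions made for illustration.

```python
import statistics


def fit_clean_profile(clean_rows):
    """Per-feature mean and standard deviation, learned from anomaly-free rows only."""
    columns = list(zip(*clean_rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 when a feature has zero spread, to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def anomaly_score(row, profile):
    """Largest absolute z-score of the row across all features."""
    means, stds = profile
    return max(abs(x - m) / s for x, m, s in zip(row, means, stds))


def is_anomalous(row, profile, threshold=3.0):
    """Flag a row whose score exceeds the (hypothetical) threshold."""
    return anomaly_score(row, profile) > threshold


# Hypothetical clean samples with two made-up features.
clean = [[10.0, 0.20], [12.0, 0.25], [11.0, 0.22], [9.0, 0.18], [10.5, 0.21]]
profile = fit_clean_profile(clean)

print(is_anomalous([10.2, 0.21], profile))  # close to the clean profile
print(is_anomalous([95.0, 0.90], profile))  # far from the clean profile
```

The supervised setting in panel (b) differs in that both clean and anomalous labeled samples are available at training time, so a standard classifier can be trained instead of a clean-only profile.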

Table 1. The list of streaming-related features; the suffixes _pct and _cnt refer to percentage and count, respectively.

Fig.3. Correlation matrix of the features presented in Table 1 for (a) clean and (b) anomalous data samples.
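
For readers who want to reproduce this kind of plot on their own data, a feature correlation matrix can be computed as sketched below. The sketch assumes the Pearson coefficient, which this page does not confirm; the helper names and the toy rows are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def correlation_matrix(rows):
    """Pairwise feature correlations; each row is one sample, each column one feature."""
    columns = list(zip(*rows))
    return [[pearson(a, b) for b in columns] for a in columns]


# Toy data: feature 1 rises with feature 0, feature 2 falls as feature 0 rises.
rows = [[1, 2, 9], [2, 4, 7], [3, 6, 5], [4, 8, 3]]
matrix = correlation_matrix(rows)
for line in matrix:
    print([round(value, 2) for value in line])
```

Computing one matrix on the clean samples and another on the anomalous samples gives the two panels to compare.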

Fig.4. A schematic of Synthetic Minority Over-sampling Technique (SMOTE)
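
The core idea of SMOTE is to create a synthetic minority sample by interpolating between an existing minority sample and one of its nearest minority-class neighbours. The following is a minimal sketch of that interpolation step for a single minority class. It picks base samples at random, which is a simplification, and it is not the multi-class multi-label variant used in the paper; the function name, parameters, and toy points are assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic points by interpolating between minority-class neighbours.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample, excluding the base itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Random point on the line segment between the base and the chosen neighbour.
        u = rng.random()
        synthetic.append([b + u * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class samples in a two-feature space.
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.2], [3.0, 3.0]]
new_points = smote(minority, n_synthetic=4)
print(new_points)
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region already covered by the minority class.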

Fig.5. For the three fraud categories before and after carrying out multi-class multi-label SMOTE: (a) number of anomalous tagged accounts and (b) label imbalance ratio.
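
One common way to quantify label imbalance in multi-label data is the per-label imbalance ratio: the count of the most frequent label divided by the count of the label in question, so the most frequent label scores 1.0 and rarer labels score higher. The sketch below assumes that definition, which may differ from the one used in the paper; the category names and tagged accounts are placeholders.

```python
def label_imbalance_ratios(label_sets, labels):
    """Per-label ratio: (count of the most frequent label) / (count of this label)."""
    counts = {label: sum(label in tags for tags in label_sets) for label in labels}
    largest = max(counts.values())
    # A label that never occurs gets an infinite ratio.
    return {
        label: (largest / count if count else float("inf"))
        for label, count in counts.items()
    }


# Each account carries a set of (placeholder) fraud-category tags.
tagged_accounts = [
    {"category_a"},
    {"category_a", "category_b"},
    {"category_a"},
    {"category_b", "category_c"},
    {"category_a"},
    {"category_a", "category_c"},
]
ratios = label_imbalance_ratios(
    tagged_accounts, ["category_a", "category_b", "category_c"]
)
print(ratios)
```

Under this definition, upsampling the rarer categories pushes their ratios toward 1.0.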

Table 2. The values of the evaluation metrics for a set of semi-supervised anomaly detection models.

Table 3. The values of the evaluation metrics for a set of supervised binary anomaly detection classifiers.

Table 4. The values of the evaluation metrics for a set of supervised multi-class multi-label anomaly detection approaches. The values in parentheses refer to the performance of the models trained on the original (not upsampled) datasets.

