The internet has become an integral part of our daily lives, making it more important than ever to ensure a safe and secure online environment. As an academic researcher, my primary goal is to contribute to the understanding of malicious activity on the internet and its impact on individuals and society. My core research interests lie in malware analysis, vulnerability management, and threat intelligence. My work broadly combines the design, analysis, and development of tools that impact the different stages of the incident response lifecycle, adopting exploratory, constructive, and empirical methods. Through malware analysis, my research aims to enable effective detection and identification of malware threats. My vulnerability management efforts aim at improved tracking and prioritization of vulnerabilities. My threat intelligence research explores the utilization of crowdsourced intelligence for practical detection of threats at the network level. My research contributions have proposed robust Machine Learning (ML)-based detection systems for prevalent IoT malware, investigated emerging threat actors, and improved the usability of vulnerability databases.
[Research overview, four thrusts: Uncovering attack infrastructure patterns; Temporal analysis of vulnerability disclosure; Abusing malware variance and persistence; OSCTI for practical threat identification]
Practical Open Source Cyber Threat Intelligence
An effective way to ensure security is to understand the capabilities of the adversary. Understanding the threats we are exposed to helps us build defenses, share insights, and identify perpetrators so as to limit their impact. A widely accepted and effective approach utilizes a network of honeypots. Honeypot interactions have been widely studied in the past; lately, however, the focus has been on IoT-based threats, leaving other threats largely unexamined. The lack of such overarching studies over the past decade, despite the evolution of the threat landscape in that period, underlines the need to revisit the dynamics of the threats we are exposed to. To do so, we perform a measurement study on honeypot data collected between July 2020 and June 2021 by an industry giant in cybersecurity, measuring 806 million alerts raised by 603 endpoints (honeypots). We create a framework that leverages the Open Source Cyber Threat Intelligence (OSCTI) published by security organizations and independent researchers to generate high-level attack inferences and malware campaign inferences. Our investigation shows that exploited vulnerabilities travel across geographies over time, emphasizing the need for cross-continental resource sharing to limit the impact of recent attack vectors. Additionally, we find that rogue networks identified a decade ago continue to exist, creating an atmosphere of unaccountability. This work has been accepted for publication at ACSAC 2022.
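The OSCTI enrichment step described above can be sketched as an indicator-matching pipeline. The function names, feed format, and campaign labels below are illustrative assumptions, not the framework's actual interface:

```python
# Hypothetical sketch of OSCTI-based enrichment: index public reports
# by the indicators they mention, then map raw honeypot alerts to
# candidate campaigns. Field names and data are illustrative only.

def build_index(oscti_reports):
    """Index OSCTI reports by the indicators (IPs, CVE IDs) they mention."""
    index = {}
    for report in oscti_reports:
        for indicator in report["indicators"]:
            index.setdefault(indicator, []).append(report["campaign"])
    return index

def infer_campaigns(alert, index):
    """Map a raw honeypot alert to candidate malware campaigns."""
    candidates = set()
    for indicator in (alert["src_ip"], alert.get("cve", "")):
        candidates.update(index.get(indicator, []))
    return sorted(candidates)

# Toy feed with two reports (addresses from documentation ranges).
reports = [
    {"campaign": "Mirai-variant", "indicators": ["203.0.113.7", "CVE-2017-17215"]},
    {"campaign": "Mozi", "indicators": ["198.51.100.2"]},
]
index = build_index(reports)
alert = {"src_ip": "203.0.113.7", "cve": "CVE-2017-17215"}
print(infer_campaigns(alert, index))  # -> ['Mirai-variant']
```

At scale, the same lookup structure lets hundreds of millions of alerts be labeled in a single pass over the data.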
Securing computer systems in practice entails identifying, understanding, and remediating the stream of software security concerns that are continuously uncovered. To do so effectively, security professionals and researchers depend on various sources of information to acquaint themselves with new security issues. One vital source is vulnerability databases, which operate as repositories of vulnerability information. In this line of work, we investigate the reliability of vulnerability reports in these databases and their financial impact on vendors. Parts of this work have been published at AsiaCCS 2018, SecureComm 2018, and TDSC 2020.
Cleaning the NVD
Are the vulnerability databases reliable and accurate? This work explores this question with the National Vulnerability Database (NVD), the U.S. government's repository of vulnerability information that arguably serves as the industry standard.
We uncover inconsistent or incomplete data in the NVD that can impact its practical uses, affecting information such as the vulnerability publication dates, applications affected by the vulnerability, their severity scores, and their high level type categorization. We explore the extent of these discrepancies and identify methods for their automated corrections. Finally, we demonstrate the impact that these data issues can pose by comparing analyses using the original and our rectified versions of the NVD. Ultimately, our investigation of the NVD not only produces an improved source of vulnerability information, but also provides important insights and guidance for the security community on the curation and use of such data sources.
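The kinds of discrepancies described above lend themselves to automated auditing. Below is a minimal sketch of such a check on NVD-style records; the field names loosely follow the NVD JSON feeds, and the specific rules are illustrative assumptions rather than our full correction pipeline:

```python
# Illustrative data-quality audit for NVD-style vulnerability records.
# Field names approximate the NVD JSON feeds; rules are assumptions.

REQUIRED = ("cve_id", "published", "cvss_score", "cwe")

def audit(record):
    """Return a list of data-quality issues found in one record."""
    issues = [f"missing:{f}" for f in REQUIRED if not record.get(f)]
    score = record.get("cvss_score")
    if score is not None and not (0.0 <= score <= 10.0):
        issues.append("cvss_out_of_range")   # CVSS scores are 0.0-10.0
    if record.get("cwe") == "NVD-CWE-noinfo":  # NVD's "no type info" marker
        issues.append("uncategorized_cwe")
    return issues

# A record with an invalid severity score and no type categorization.
rec = {"cve_id": "CVE-2021-0001", "published": "2021-01-05",
       "cvss_score": 11.2, "cwe": "NVD-CWE-noinfo"}
print(audit(rec))  # -> ['cvss_out_of_range', 'uncategorized_cwe']
```

Running such checks over the whole database surfaces the records whose publication dates, affected applications, severity scores, or type categories need rectification.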
Cost of Vulnerabilities
Vulnerabilities also have detrimental effects on end users and enterprises, both direct and indirect, including the loss of private data, intellectual property, competitive edge, and performance. This work investigates the hidden cost of publicly disclosed software vulnerabilities. We estimate vulnerability disclosure dates as a baseline for assessing the implications of software vulnerabilities, and we build a stock price prediction model using a NARX (Nonlinear AutoRegressive with eXogenous inputs) neural network to estimate the effect of vulnerability disclosure on the stock price. Our analysis shows that the effect of vulnerabilities on vendors varies and greatly depends on the specific software industry: some industries are statistically shown to be negatively affected by the release of software vulnerabilities, even when those vulnerabilities are not broadly covered by the media, while others are not affected at all. This work has been published at SecureComm 2018.
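The core idea of a NARX model is to predict the next value of a series from its own lagged values plus lagged values of an exogenous input (here, a disclosure signal). The windowing below illustrates that feature construction only; the variable names and toy data are assumptions, and the paper's actual model is a neural network, not reproduced here:

```python
# Sketch of NARX-style feature construction: predict today's stock
# return from lagged returns (autoregressive part) and a lagged
# exogenous disclosure indicator. Data and lag are illustrative.

def narx_windows(returns, disclosures, lag):
    """Build (features, target) pairs: [y(t-1..t-lag), x(t-1..t-lag)] -> y(t)."""
    rows = []
    for t in range(lag, len(returns)):
        feats = returns[t - lag:t] + disclosures[t - lag:t]
        rows.append((feats, returns[t]))
    return rows

returns = [0.01, -0.02, 0.00, 0.03, -0.01]   # daily stock returns
disclosures = [0, 0, 1, 0, 0]                # 1 = vulnerability disclosed
pairs = narx_windows(returns, disclosures, lag=2)
print(pairs[0])  # features from days 0-1 predict the day-2 return
```

Each pair feeds the network during training; the exogenous channel is what lets the model attribute part of a price movement to a disclosure event.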
Internet of Things Security
The increasing acceptance of Internet of Things (IoT) devices by end users has been paralleled by their increased susceptibility to attacks. Software vulnerabilities in these emerging systems allow for multiple attack vectors that adversaries exploit for malicious intents. One such vector is malware, yet limited efforts have been dedicated to IoT malware analysis, characterization, and understanding. In this area of research, I have focused on understanding the behavior of IoT malware through static and dynamic analysis. We propose machine and deep learning based defenses against such malware and analyze their behavior under adversarial attacks. We also propose a lightweight method to categorize malicious executions through execution traces. This work has been published at ICICS 2020, the Internet of Things Journal (2019, 2020, 2021), TDSC 2021, and RAID 2022.
Statically Dissecting Internet of Things Malware
We analyze recent IoT malware through the lens of static analysis. Towards this, we reverse-engineer and perform a detailed analysis of almost 2,900 IoT malware samples spanning eight architectures across multiple analysis directions. We conduct string analysis, unveiling their operation, unique textual characteristics, and network dependencies. Through control flow graph (CFG) analysis, we unveil unique graph-theoretic features. Through function analysis, we address obfuscation by function approximation. We then pursue two applications based on our analysis: 1) combining the various analysis aspects, we reconstruct the infection lifecycle of prominent malware families, and 2) using multiple classes of features obtained from our static analysis, we design a machine learning-based detection model with robust features and an average detection rate of 99.8%.
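To make the graph-theoretic feature extraction concrete, here is a minimal sketch over a CFG represented as an adjacency dict of basic blocks. The feature names are illustrative of the kind used in such analysis, not our exact feature set:

```python
# Minimal sketch of graph-theoretic feature extraction from a control
# flow graph (CFG). The CFG is an adjacency dict: block -> successors.
# Feature names are illustrative assumptions, not the paper's exact set.

def cfg_features(cfg):
    """Compute simple structural features of a CFG."""
    nodes = len(cfg)
    edges = sum(len(succ) for succ in cfg.values())
    # Directed-graph density: edges / possible edges.
    density = edges / (nodes * (nodes - 1)) if nodes > 1 else 0.0
    terminal = sum(1 for succ in cfg.values() if not succ)  # exit blocks
    return {"nodes": nodes, "edges": edges,
            "density": round(density, 3), "terminal_blocks": terminal}

# Toy CFG with an entry block, a loop, and a single exit block.
cfg = {"entry": ["check"], "check": ["loop", "exit"],
       "loop": ["check"], "exit": []}
print(cfg_features(cfg))
# -> {'nodes': 4, 'edges': 4, 'density': 0.333, 'terminal_blocks': 1}
```

Feature vectors of this kind, computed per binary, are what the detection model consumes.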
Malware Detection in Light of Adversarial Attacks
We investigated the robustness of binary-based detection models against adversarial attacks. The project utilized practical augmentation of subgraphs into the CFG of a malware binary such that it does not affect the execution of the program. The proposed approach, SGEA (Subgraph Embedding and Augmentation), injects patterns learned from benign software into malicious software to craft adversarial IoT software. Through experiments, we showed that all adversarial software was classified as benign after embedding an average of 6.8 execution blocks. Our experiments showcase that CFG-based malware detection systems are prone to practical adversarial attacks, putting forward the necessity of building robust systems that detect such manipulated features.
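The augmentation idea can be sketched as grafting a benign-looking subgraph onto the malware CFG behind a branch that is never taken at run time, so the program's behavior is preserved while its static CFG changes. The function name, graph encoding, and block labels below are illustrative assumptions, not SGEA's implementation:

```python
# Conceptual sketch of SGEA-style augmentation: attach a subgraph of
# benign-looking blocks to a malware CFG via a dead (never-taken)
# branch. CFGs are adjacency dicts; all names are illustrative.

def inject_subgraph(cfg, entry, benign_sub, sub_entry):
    """Return a copy of cfg with benign_sub attached behind `entry`."""
    augmented = {block: list(succ) for block, succ in cfg.items()}
    augmented.update(benign_sub)
    # Dead branch: the condition guarding this edge is always false at
    # run time, so the added blocks never execute but still appear in
    # the statically extracted CFG that the classifier sees.
    augmented[entry] = augmented[entry] + [sub_entry]
    return augmented

cfg = {"entry": ["payload"], "payload": []}
benign = {"b0": ["b1"], "b1": []}   # pattern learned from benign software
aug = inject_subgraph(cfg, "entry", benign, "b0")
print(sorted(aug))  # -> ['b0', 'b1', 'entry', 'payload']
```

Because the injected blocks shift the CFG's structural features toward those of benign software, a classifier trained on such features can be steered to a benign verdict without altering the malware's behavior.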