About the project

Existing single-modality techniques have significant limitations. To overcome them, we propose a novel infrastructure, comprising both algorithmic and architectural advances, that supports a multi-modal anomaly detection framework. We propose to investigate how this infrastructure can serve as the foundation for the next generation of intrusion detection and attack forensics techniques.

In particular, the many controllers available in a system provide different perspectives on system behavior: a NIC controller witnesses network behavior, a SATA controller observes hard-disk behavior, memory controllers can be used to track memory accesses, and so on. These controllers can be queried to collect a rich set of events, which can be uploaded to a secure cloud or sent to trusted hardware such as a GPU for subsequent intrusion detection and forensics. The aggregated set of traces can substantially improve not only defense effectiveness but also resilience in Byzantine environments, where any component outside the trusted computing base (TCB) can be presumed to be compromised. For example, although ransomware and the zip utility have indistinguishable file-system behaviors, they can be distinguished by their memory accesses and/or network behaviors. Moreover, when information from multiple modalities is aggregated, failures in a subset of modalities (due to cyber attacks) can be suppressed.
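As a rough illustration of how aggregation can suppress a failure in one modality, the sketch below combines per-modality anomaly scores with a median. The median rule, the score range, and the example values are assumptions made for illustration only; they are not the aggregation method used in this project.

```python
from statistics import median


def aggregate(scores):
    """Combine per-modality anomaly scores (0 = benign, 1 = anomalous).

    The median is an assumed, illustrative choice: a minority of
    corrupted modalities cannot drag the combined score arbitrarily far.
    """
    return median(scores.values())


# Hypothetical scores for one sample, one per trace modality.
honest = {"syscall": 0.9, "network": 0.8, "disk": 0.85, "perf": 0.9}

# Same sample, but an attacker has forced the disk modality to report "benign".
tampered = dict(honest, disk=0.0)

print(aggregate(honest), aggregate(tampered))
```

With these values the combined score stays high even though one of the four modalities has been suppressed, whereas a detector relying on the disk modality alone would miss the sample.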

Malware Collection

We have downloaded a total of 2,672 x64 malware samples from VirusTotal and Malpedia. This collection includes 297 dynamically linked and 2,375 statically linked malware samples.

Malware Patching

Why do we need to patch malware?

Due to the inherent nature of malware, such as evasion, cloaking, and server connections, the actual malicious payload is difficult to expose.
The execution of malware samples seems to terminate very early, possibly due to configuration issues.

How do we patch malware?

PMP (SP’20) is an advanced dynamic malware analysis technique that can expose actual malicious payloads regardless of malware cloaking or any similar techniques.
The main idea behind PMP is to flip predicates, enabling the dynamic execution to access deeper program states.
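The predicate-flipping idea can be shown with a toy example. The sketch below is not PMP; it is a minimal Python illustration in which a hypothetical `run_sample` function hides its "payload" behind two environment checks, and an explorer forces each branch outcome to be inverted. All names and the environment keys are made up for this example.

```python
from itertools import product


def run_sample(env, flip):
    """Toy 'malware' whose payload hides behind two environment checks.

    `flip` is a set of predicate indices whose outcome is forcibly inverted.
    """
    trace = []

    def pred(index, value):
        # Invert the branch outcome when this predicate is selected for flipping.
        outcome = (not value) if index in flip else value
        trace.append((index, outcome))
        return outcome

    if pred(0, env.get("debugger_present", False)):
        return trace, "exit: debugger detected"
    if not pred(1, env.get("c2_reachable", False)):
        return trace, "exit: no server"
    return trace, "payload reached"


def explore(env, num_predicates=2):
    """Try every subset of flipped predicates and record each outcome."""
    results = {}
    for bits in product([False, True], repeat=num_predicates):
        flip = {i for i, b in enumerate(bits) if b}
        _, outcome = run_sample(env, flip)
        results[frozenset(flip)] = outcome
    return results


# A hostile analysis environment: debugger attached, no server reachable.
results = explore({"debugger_present": True, "c2_reachable": False})
```

Without flipping, the toy sample exits at the first check; only when both predicates are flipped does execution reach the payload branch. Exhaustive enumeration is used here only because the example has two predicates.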

More details can be found at patch.

Environment Setup

We use VirtualBox to run the malware. The network is configured as host-only and the disk is isolated from the host. Each run starts from the same snapshot, “start_snap_for_malware”; we then run the patched malware and collect traces.
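Resetting the VM before each run might be scripted along the following lines. This is a sketch, not the project's actual tooling: the VM name `malware-vm` is hypothetical, and starting the VM headless is an assumption. Only the snapshot name comes from the setup described above. The `VBoxManage` subcommands used are `snapshot ... restore`, `startvm`, and `controlvm ... poweroff`.

```python
import subprocess

SNAPSHOT = "start_snap_for_malware"  # snapshot name from the setup above


def build_commands(vm_name, snapshot=SNAPSHOT):
    """Return the VBoxManage commands that restore the snapshot and boot the VM."""
    return [
        ["VBoxManage", "snapshot", vm_name, "restore", snapshot],
        # Headless start is an assumption; drop "--type headless" for a GUI window.
        ["VBoxManage", "startvm", vm_name, "--type", "headless"],
    ]


def poweroff_command(vm_name):
    """Return the command that powers the VM off after a run."""
    return ["VBoxManage", "controlvm", vm_name, "poweroff"]


def reset_and_start(vm_name, dry_run=True):
    """Restore the clean snapshot and start the VM.

    With dry_run=True (the default) nothing is executed; the commands are
    only returned so they can be inspected first.
    """
    commands = build_commands(vm_name)
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


# "malware-vm" is a hypothetical VM name.
planned = reset_and_start("malware-vm")
```

The VM must be powered off before its snapshot can be restored, so a real loop would run `poweroff_command` at the end of each run before calling `reset_and_start` again.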

Trace Collection

For each patched malware, we execute it and collect four types of traces: syscall traces, network traces, disk traces, and perf traces.

More details can be found at dataset.

Model Training

We extract 240 semantic features from the four types of logs.
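The 240 features themselves are not listed here. Purely to illustrate what turning a log into features can look like, the sketch below computes normalized syscall frequencies from strace-style lines. The log format, the regular expression, and the three-entry vocabulary are assumptions for this example and are not the project's feature set.

```python
import re
from collections import Counter

# Matches strace-style lines such as 'read(3, "...", 4096) = 4',
# optionally prefixed with a PID. The format is an assumption.
SYSCALL_RE = re.compile(r"^\s*(?:\d+\s+)?(\w+)\(")


def syscall_counts(lines):
    """Count how often each syscall name appears in the log lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def to_vector(counts, vocabulary):
    """Turn counts into relative frequencies over a fixed vocabulary."""
    total = sum(counts.values()) or 1  # avoid division by zero on empty logs
    return [counts.get(name, 0) / total for name in vocabulary]


log = [
    'openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3',
    'read(3, "root", 4096) = 4',
    'read(3, "", 4096) = 0',
    "+++ exited with 0 +++",
]
counts = syscall_counts(log)
vector = to_vector(counts, ["openat", "read", "connect"])
```

A fixed vocabulary keeps the vector length identical across samples, which is what a downstream model needs.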

More details can be found at result.

License

This dataset release is governed by the MIT license.

Acknowledgement

This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA) under LastAct seedling cooperative agreement number HR00112320035. The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of DARPA, the Department of Defense, or the U.S. Government, and no official endorsement should be inferred.