Intertwined developments between program attacks and defenses witness the evolution of program anomaly detection methods, yet emerging categories of program attacks, e.g., non-control data attacks, nullify state-of-the-art detection mechanisms, e.g., pushdown automata anomaly detection. This paper points out the deficiency of existing program anomaly detection models under new attacks and presents an anomaly detection model based on context-sensitive grammar verification. The key feature of the proposed model is the reasoning of correlations among arbitrary events occurred in long program traces. It relaxes existing correlation analysis between events at a stack snapshot, e.g., paired call and ret, to correlation analysis among events historically occurred during the execution. The proposed method leverages specialized machine learning techniques to probe normal program behavior boundaries in vast high-dimensional detection space. Its two-stage modeling/detection design analyzes event correlation at both binary and quantitative levels. Our prototype successfully detects all reproduced real-world attacks against sshd, libpcre, and sendmail. The detection procedure incurs 0.1-1.3 ms overhead to profile and analyze a single behavior instance that consists of tens of thousands of function call or system call events.
In this paper, we present an efficient, unidirectional and multi-hop lattice-based Proxy Re-Encryption (PRE) scheme and describe its applications to distributed ad-hoc information sharing. Unidirectional and multi-hop PRE is ideal for secure publish-subscribe operations where a publisher encrypts information using a public key without knowing upfront who the subscriber will be (and what private key will be used for decryption). The proposed PRE scheme provides a multi-hop capability, meaning that when PRE-encrypted information is published onto a PRE-enabled server, the server can either delegate access to specific clients or enable other servers the right to delegate access. Our approach relies on NTRU-LTV, a lattice-based homomorphic encryption scheme that is also computationally efficient for use on low-resource embedded systems. We present an open-source C++ implementation of this scheme and discuss several algorithmic and software optimizations. In particular, we examine parameter selection tradeoffs in the context of security, runtime/latency, throughput, ciphertext expansion, and multi-hop capabilities. We show experimentally that our implementation runs two orders of magnitude faster than an implementation of a prior lattice-based PRE scheme which is arguably less secure than our design. Finally, we present practical recommendations for applying the PRE scheme to several use cases of ad-hoc information sharing for publish-subscribe operations.
K-means clustering is a widely used clustering analysis technique in machine learning. In this paper, we study the problem of differentially private k-means clustering. Several state-of-the-art methods follow the single-workload approach which adapts an existing machine learning algorithm by making each step private. However, most of them do not have satisfactory empirical performance. In this work, we develop techniques to analyze the empirical error behaviors of one of the state-of-the-art single-workload approaches, DPLloyd, which is a differentially private version of the Lloyd algorithm. Based on the analysis, we propose an improvement of DPLloyd. We also propose a new algorithm for k-means clustering from the perspective of the non-interactive approach which publishes a synopsis of the input dataset. After analyzing the empirical error behaviors of EUGkM, we further propose a hybrid approach that combines our DPLloyd improvement and EUGkM. Results from extensive and systematic experiments support our analysis and demonstrate the effectiveness of the DPLloyd improvement, EUGkM and the hybrid approach.
Online Social Networks (OSNs) offer convenient ways to cheaply reach out to potentially large audiences worldwide. As number of likes of a page has become de-facto measure of its popularity and profitability, alongside Facebooks official targeted advertising platform, an underground market of services artificially inflating page likes, like-farms, has also emerged. However, there is very little work that systematically analyzes Facebook pages promotion methods. This paper presents a honeypot-based comparative measurement study of page likes. First, we analyze likes based on demographic, temporal, and social characteristics, and find that some farms seem to be operated by bots and do not really try to hide the nature of their operations, while others follow a stealthier approach, mimicking regular users behavior. Next, we look at fraud-detection algorithms currently deployed by Facebook and show that they are inefficient to detect stealthy farms which spread likes over longer timespans and like popular pages to mimic regular users. To overcome their limitations, we investigate the feasibility of timeline-based detection of like-farm accounts, focusing on characterizing content generated by Facebook accounts on their timelines as an indicator of genuine versus fake social activity. We analyze wide range of features extracted from timeline posts, categorized into lexical and non-lexical. We find that like-farm accounts tend to re-share content, use fewer words and poorer vocabulary, and generate duplicate comments and likes compared to normal users. Using lexical and non-lexical features, we built a classifier to detect like farms accounts that achieves precision higher than 99% and 93% recall.