User:Gtyopal/sandbox

= Location Privacy =
Location privacy concerns the location information of consumers. It is an important issue in location-based services (LBS) such as Foursquare and Gowalla, which transmit and share user location data. Although LBS have long provided benefits and convenience to consumers, they inevitably disclose detailed personal attributes that may reveal users' interests or health information.

According to some literature, location privacy is a special type of information privacy: the claim of individuals to determine for themselves when, how, and to what extent location information about them is communicated to others. It follows that location privacy mainly involves two aspects: accurate positioning and inference from side channels. Following the terminology introduced in [10], location privacy is usually divided into two levels: macroscopic and microscopic. Macroscopic location privacy is the user's privacy measured on a large scale, for example over the multiple queries that a user issues while constantly moving. Microscopic location privacy is the user's privacy measured on a small scale, for example for a single query: it represents how accurately the adversary can determine the user's location after observing that single query, given previous background knowledge.

== General Situation ==
As smartphones and social networks gain popularity, the mobile Internet shows a new trend often termed ‘SoLoMo’, the combination of social, local and mobile. Not only do social sites such as Facebook, Twitter and Microblog offer location-based services, but conventional information services such as ‘search for nearby restaurants’ now also carry social elements. Location data is of great value in all these social applications, and it can be grouped into three classes according to its purpose.
 * In the first class, location data is the input to a service request, such as finding the nearest bank.
 * In the second class, location data is data that users are willing to share with others; check-ins are one such application scenario.
 * In the third class, location data is the price users pay for certain free services.

Y.-A. de Montjoye et al. studied fifteen months of human mobility data for one and a half million individuals and found that human mobility traces are highly unique: four spatio-temporal points are enough to uniquely identify 95% of the individuals. More generally, a variety of risks arise once location data is misused. For example, attackers may infer users' interests, habits or health information by aggregating their check-ins, and may obtain a user's address by searching the location data generated at night. In addition, advertising agencies can push location-aware advertisements to consumers if they gain timely access to their location data. In short, the disclosure of location data can bring potential harm to consumers, especially in the big data era. Ensuring the privacy of consumers' location data has therefore become a necessary and urgent problem.

== Challenges ==

 * First, preserving location privacy is often a personalized requirement: different consumers have different demands, and even the same user may want different levels of location privacy protection on different occasions. Here we consider two types of privacy, location privacy and query privacy. The former relates directly to the user's sensitive location information: the adversary is interested in whether the user can be located accurately or whether the user's personal information can be inferred from the location. The latter relates to the detailed content of the user's LBS request, which the adversary can also use to deduce the user's true needs.
 * Second, location privacy and location information availability are in tension, because the service provider needs to know exactly where the consumer is when the consumer demands a higher QoS. An appropriate balance is therefore needed between the location privacy protection strategy and location information availability[1].

== Countermeasures ==
Privacy preserving techniques adopted in LBS must take consumers' personal requirements and the overall QoS into consideration. In short, location privacy preserving techniques have to address the following key problems:
 * 1) How to accurately measure the risk of disclosing consumers' privacy;
 * 2) How to choose an efficient privacy protection mechanism that preserves users' privacy comprehensively;
 * 3) How to strike a balance between the level of protection, the QoS, and the resource cost.

 Classification of privacy preserving techniques in LBS :
 * 1) Policy-based privacy preserving techniques (such as P3P);
 * 2) Fuzzification-based privacy preserving techniques (such as Pseudonym, Dummy, Location Cloaking, Space-transformation);
 * 3) Cryptography-based privacy preserving techniques (such as PIR, HilCloak);
 * Pseudonym breaks the mapping between user identity and location, so that an untrusted server receives only the location without the user identity. A user's pseudonym can take diverse forms, such as a username or content inferable from the IP address, all of which may reveal some information about the user's true identity. Permanent pseudonyms can link a user's queries and thereby enable re-identification, whereas most pseudonyms are dynamically changeable. However, this technique is limited to location-based services that do not require the user's identity. In particular, the lack of user identity makes billing for these services impossible.
 * Dummy[3] generates fake user locations (called dummies) and mixes them with the genuine user location in the request. However, the server may be able to distinguish the genuine location from the dummies by examining the user's long-term movement patterns. In contrast, Ghinita et al. introduced a new framework based on PIR that partitions the space into grid cells, whose detailed content the user can request cell by cell. The user receives the exact content of a cell while hiding information about which cells were requested. The scheme ensures zero disclosure of information from the user to the server, but at a high cost: to keep PIR secure, it must compute with a modulus of many bits, which incurs significant computational and communication overhead compared with location cloaking techniques. The scheme therefore becomes too costly in environments with large datasets.
 * Location Cloaking aims to blur user locations when users request location-based services. The main idea is to extend the user's accurate location into a cloaked region according to a privacy metric such as cloaking granularity or k-anonymity[4]. Under cloaking granularity, the area of the cloaked region must be larger than a threshold specified by the user. Under k-anonymity, each of a user's queries must be indistinguishable from those of at least k − 1 other users, so the user's location is extended to a cloaked region that covers at least k users. The k-anonymity metric is very popular because of its simplicity. Much research has focused on reducing the cost of query fuzzification and improving the efficiency of k-anonymity, on extending fuzzification to protect traces (location privacy at the macroscopic level), and on applying k-anonymity under different conditions[10]. Cloaking/anonymity schemes can be implemented in two ways: centralized and decentralized. Centralized approaches use a trusted Central Anonymity Server (CAS) that acts as cloaking/anonymity engine, cloaking/anonymity repository, and results refiner. In decentralized approaches, the cloaked region is computed in a distributed manner by a set of entities, such as the users themselves, which cooperatively play the role of the CAS.
 * Space-transformation uses a one-way space transformation to encode the user's location and the POIs (points of interest) into an encrypted space. Queries are evaluated in the transformed space, which maintains the distance features of the original space. On receiving the transformed query results, the user can efficiently reverse the transformation using trapdoor information. The trapdoor information is provided only to users within the same group and is unknown to the service provider. The key advantage of this scheme over existing k-anonymity and cloaking approaches is that it eliminates the trusted third party "anonymizer" from query processing.
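
The two cloaking metrics described in the Location Cloaking item above can be illustrated with a minimal Python sketch. It is not taken from any of the cited schemes: the names `Rect` and `cloak` are hypothetical, and choosing the k − 1 nearest users is a naive assumption made only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)

    def contains(self, p) -> bool:
        return (self.x_min <= p[0] <= self.x_max
                and self.y_min <= p[1] <= self.y_max)


def cloak(user, others, k, min_area):
    """Return a rectangle that covers `user` plus at least k-1 other users
    (location k-anonymity) and whose area is at least `min_area`
    (cloaking granularity). Illustrative only."""
    if len(others) < k - 1:
        raise ValueError("not enough users to reach k-anonymity")
    # Naive assumption: group the requester with the k-1 closest users.
    nearest = sorted(others, key=lambda p: math.dist(p, user))[:k - 1]
    pts = [user] + nearest
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    rect = Rect(min(xs), min(ys), max(xs), max(ys))
    if rect.area() < min_area:
        # Grow every side by d, where (w + 2d)(h + 2d) = min_area.
        w = rect.x_max - rect.x_min
        h = rect.y_max - rect.y_min
        d = (-(w + h) + math.sqrt((w + h) ** 2 - 4 * (w * h - min_area))) / 4
        rect = Rect(rect.x_min - d, rect.y_min - d,
                    rect.x_max + d, rect.y_max + d)
    return rect
```

A real cloaking engine would also have to avoid placing the requester at a predictable position inside the region, which this sketch does not attempt.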

 Limitations of Cloaking and Anonymity techniques : In fuzzification-based privacy preserving, location k-anonymity and cloaking granularity are the two commonly used privacy metrics for addressing location privacy. Location k-anonymity hides the user's identity among k users, but it may not prevent location disclosure (e.g., a cloaked region covering k users in a populated area could be very small). Cloaking granularity, on the other hand, prevents location disclosure but cannot defend against attacks on user identities when user locations are publicly known and there is only one user in the cloaked region, as shown in Fig.3.

1. Many of the k-anonymity/cloaking approaches rely on a trusted third party to "anonymize" the user's location, which means every query must pass through the "anonymizer" during regular operation of the system.

2. Either the QoS or the overall system performance degrades significantly as users choose stricter privacy preferences.

3. The majority of k-anonymity/cloaking approaches are subject to location-dependent attacks: if the attacker knows the historical cloaked regions or the user's movement pattern, such as the maximal speed, the user's location privacy may be compromised.

4. The concept of k-anonymity may not work in all conditions. For example, in a sparsely populated area, the cloaked region may have to be very large to include all k users. In other cases, too few users subscribe to the service to construct the required cloaked region.

5. These schemes assume that all users are trustworthy. In decentralized approaches, however, malicious users can easily collude to compromise a user's location privacy.

 PIR in Location Privacy Preserving :

In cryptography, a private information retrieval (PIR) protocol allows a user to retrieve an item from a server in possession of a database without revealing which item is retrieved. Assuming a secure communication channel between the user and the location server that no adversary can eavesdrop on, users can use PIR to request content about their own location and about POIs without revealing any information about their locations.

There are roughly two types of PIR in location privacy preserving: computational PIR and hardware-based PIR. Computational PIR converts spatial query processing into private retrievals from a matrix representation of the server's data, and makes it computationally hard for any adversary to determine, from the client-server communication, which database item is being queried. The secrecy of the computational PIR protocol rests on the Quadratic Residuosity Assumption (QRA). The drawback of this scheme is its very high communication and computational cost. Hardware-based PIR uses a secure coprocessor that reads the content requested by the user from the server while preventing the server from learning which database items the coprocessor reads. The coprocessor is equipped with hardware cryptographic accelerators that implement algorithms such as DES and RSA. The major drawback of this method is the limited storage and computational resources of the secure coprocessor.

Finally, another significant restriction of PIR-based spatial query processing is its inability to deal effectively with continuous location updates, although some novel algorithms have recently been proposed to address this issue.
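
The computational PIR idea described above, private retrieval of one bit from a matrix with secrecy resting on quadratic residuosity, can be sketched as follows. This is a toy illustration under stated assumptions, not the protocol of any cited paper: the primes `P` and `Q` are far too small to be secure, and the function names `make_query`, `answer` and `decode` are hypothetical.

```python
import math
import random

P, Q = 1019, 1031   # toy primes kept secret by the client (insecure sizes)
N = P * Q           # public modulus known to the server


def is_residue(a, prime):
    """Euler's criterion: True if a is a quadratic residue mod `prime`."""
    return pow(a % prime, (prime - 1) // 2, prime) == 1


def random_unit(rng):
    """Random element of the multiplicative group mod N."""
    while True:
        x = rng.randrange(2, N)
        if math.gcd(x, N) == 1:
            return x


def make_query(target_col, n_cols, rng):
    """Client: one number per column. Only the target column holds a
    non-residue (mod both P and Q); every other column holds a residue.
    Telling them apart without P and Q is what the QRA assumes is hard."""
    query = []
    for col in range(n_cols):
        if col == target_col:
            y = random_unit(rng)
            while is_residue(y, P) or is_residue(y, Q):
                y = random_unit(rng)
        else:
            y = pow(random_unit(rng), 2, N)
        query.append(y)
    return query


def answer(matrix, query):
    """Server: per row, multiply y (bit 1) or y*y (bit 0) over all columns."""
    reply = []
    for row in matrix:
        z = 1
        for bit, y in zip(row, query):
            z = z * (y if bit else y * y) % N
        reply.append(z)
    return reply


def decode(reply, target_row):
    """Client: the wanted bit is 1 exactly when the row product is a
    non-residue, which only the holder of P can test."""
    return 0 if is_residue(reply[target_row], P) else 1
```

The server touches every matrix entry and the client receives one number per row, which gives a feel for the communication and computational cost mentioned above.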

== Some Front-end Location Privacy Preserving Techniques ==
1. 2PASS (2-Phase Asynchronous Secure Search):

Hu and Xu proposed a novel framework called 2PASS[4]. Spatio-temporal granularity techniques allow a user to define the granularity of the area within which his location is exposed. Previous work focused only on minimizing the cloaked region subject to the privacy metric, which minimizes bandwidth only indirectly, and paid little attention to the query content. Consider a nearest neighbor query in which a user searches for the restaurant nearest to his location. A simple and direct method is to generate a region that covers the user's location and satisfies the granularity metric T, and to send the query for this anonymity region to the service provider. The service provider then returns the detailed content of all possible nearest neighbor objects for the cloaked region, and the user determines the final result objects based on his genuine location, as shown in Fig.4.(a). This method has two deficiencies. First, the service provider's processing time may be long, because the cloaked region may have many possible nearest neighbor objects and the provider must check all of them. Second, the result return time may be long, because the detailed content sent along with the objects may be large, which increases the data transmission overhead. To overcome these problems, the 2PASS framework adopts the Voronoi cell, as shown in Fig.4.(b). Given a set of n objects, a Voronoi diagram divides the space into n partitions. Each partition is called a Voronoi cell and corresponds to one object, which is the nearest neighbor of every point in its cell. Connecting the objects of adjacent cells yields the weighted adjacency graph (WAG), in which the weight of a vertex is the area of the corresponding Voronoi cell, as shown in Fig.5.
As shown in Fig.6, a mobile user wants to use a location-based service, such as finding the nearest restaurant, from the LBS server. The figure shows how 2PASS differs from conventional cloaking approaches. In a conventional approach, the user first invokes location cloaking before requesting the service (step ➀). The cloaking function generates a random cloaked region that covers the user's genuine location according to the user-specified threshold, and the user attaches the region to the service request sent to the service provider (step ➁). Upon receiving this request, the server processes it and returns the result objects (step ➂). 2PASS, in contrast, does not generate the cloaked region blindly without knowing the dataset; rather, it knows the spatial locations of the objects and can directly request the detailed contents of the result objects from the server. 2PASS works in two phases. In the first phase (steps ➊➋), the mobile user requests a WAG of its neighborhood area from the LBS server, and the LBS server generates this information and sends it back to the user. In the second phase (steps ➌➍), the client selects objects from this WAG, here the two restaurants Mille’s and Maxim’s, and requests their detailed contents, such as opening time, map, customer reviews, and reservation status, from the LBS server. Upon receiving the request, the LBS server processes it, generates the detailed content for these objects, and sends it back to the user.

The core component of 2PASS, used between steps ➋ and ➌, is a lightweight WAG-tree index from which the client computes the objects to request from the server. In the system initialization phase, the WAG-tree is sent to the client and cached there. The client looks up the WAG-tree and locates the WAG snippet that contains the query point. If the subgraph area of this snippet is still under the threshold value, the client locates the lowest-level ancestor node whose subgraph area is just over the threshold and requests all snippets rooted at that node; these are called the host snippets. The client joins all host snippets into a single WAG and applies the approximate MVWCC (minimum valid-weight connected component) algorithm, which can be reduced to a kMST (k-minimum spanning tree) problem in polynomial time, to generate the result VWCC (valid-weight connected component). In the second phase, the client requests the detailed content of the objects that appear in the result VWCC, as in step ➌.

Without additional information, the server learns only that the user is in the cloaked region implied by the requested objects. The mobile user, meanwhile, controls which objects are returned and minimizes their number, and thus the total bandwidth usage, while still satisfying the privacy requirement. The framework can also be extended to support kNN queries and other LBS with different objectives. There are three threats from the server that might compromise the location privacy of the client. The first is reverse engineering of the genuine NN object from the set of requested objects and the approximate MVWCC algorithm. The second is the "pseudorandomness" of the genuine NN. The third is guessing k out of the k' received objects for kNN queries. 2PASS addresses these threats by assuming that the client hides several secret values from the server, including the scaling factor for the approximate MVWCC algorithm and the promotion degree for the k-promotion algorithm.

2. ICliqueCloak : Pan and Xu proposed a new framework called ICliqueCloak[6]. It mainly addresses the defense against location-dependent attacks in scenarios where mobile users continuously issue LBS requests while moving. Most k-anonymity/cloaking approaches consider only the snapshot user location and defend against snapshot location attacks; they do not take continuous location updates into account and so cannot defend against location-dependent attacks. For example, if the attacker knows the historical cloaked region and the user's movement pattern, such as the maximal speed, the attacker can deduce that at time ti+1 the user must be within the MMB (Maximal Movement Boundary), a rounded rectangle obtained by extending the previous cloaked region R at ti by Va*(ti+1-ti). The attacker can then further deduce that at time ti+1 the user must be in the intersection of the cloaked region R at ti+1 and this MMB, which means that the user's location privacy is compromised. ICliqueCloak, an incremental clique-based cloaking algorithm that uses both location k-anonymity and cloaking granularity as privacy metrics, is therefore proposed to defend against location-dependent attacks, as shown in Fig.7. The problem is formulated with a graph model, as shown in Fig.8. Each mobile user is a node in the graph, and an edge exists between two nodes only if they are within the MMB of each other and are thus potential partners for cloaking.
The problem then becomes finding a k-node clique in the graph: the nodes of the clique form a cloaking set that meets the location k-anonymity requirement, and the MBR (minimum bounding rectangle) of the cloaking set is considered a candidate cloaked region. To further reduce the computational complexity, maximal cliques are identified and maintained incrementally, based on three classes, by updating the graph when a new request arrives. Finally, the candidate cloaked region is checked to see whether it needs to be adjusted to prevent location-dependent attacks; once a clique that meets the requirements is identified, it is used to generate the cloaking set and the cloaking succeeds. The ICliqueCloak system architecture is shown in Fig.9. Like most existing work, it considers a system consisting of mobile users, a trusted third party anonymizer, and an untrusted service provider. The "anonymizer" consists of a cloaking engine, a cloaked repository, and a results refiner. A mobile user sends location-based query requests (e.g., “finding the nearest restaurant”) in the form (id, l, P, q) to the "anonymizer" through a secure, encrypted connection, where id is the real user identity, l is the user location, P contains the privacy parameters, and q is the query content. The framework works as follows. First, upon receiving an LBS request, the "anonymizer" checks whether the user has made any queries before. If this is not the first query, the cloaking engine replaces id with the user’s existing pseudonym id’; otherwise, it generates a new pseudonym id’ and replaces the user's true id with it.
Second, the cloaking engine generates a new cloaked region R according to the user's privacy metrics, such as cloaking granularity and location k-anonymity, and the cloaked region is sent to the cloaked repository for storage in the format (id, id', P, Rti, ti). Third, the "anonymizer" sends the revised query request, in the form (id’, R, q), to the service provider. Fourth, upon receiving the LBS query request, the service provider generates the detailed content of the query results for every location point within R and sends the results back to the "anonymizer". Fifth, the "anonymizer" refines the results based on the user's genuine location. Finally, the "anonymizer" sends the refined results to the mobile user.
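
The location-dependent attack that motivates ICliqueCloak can be sketched in a few lines of Python. The sketch makes two assumptions that are not in the source: rectangles are plain tuples `(x_min, y_min, x_max, y_max)`, and the MMB is approximated by its bounding box rather than the rounded rectangle described above, so the attacker's inferred area is slightly overestimated. The function names are hypothetical.

```python
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def mmb(prev_region: Rect, v_max: float, dt: float) -> Rect:
    """Bounding box of the maximal movement boundary: the previous cloaked
    region grown on every side by the farthest distance v_max * dt."""
    reach = v_max * dt
    x0, y0, x1, y1 = prev_region
    return (x0 - reach, y0 - reach, x1 + reach, y1 + reach)


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Overlap of two rectangles, or None if they do not overlap."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x0 >= x1 or y0 >= y1:
        return None
    return (x0, y0, x1, y1)


def attacker_view(prev_region: Rect, new_region: Rect,
                  v_max: float, dt: float) -> Optional[Rect]:
    """Area to which the attacker can narrow the user's position at the
    time of the new request."""
    return intersect(new_region, mmb(prev_region, v_max, dt))


def leaks(prev_region: Rect, new_region: Rect,
          v_max: float, dt: float) -> bool:
    """True if the new cloaked region is shrunk by the attack, i.e. it is
    not fully contained in the MMB of the previous region."""
    view = attacker_view(prev_region, new_region, v_max, dt)
    return view is None or area(view) < area(new_region)
```

A candidate region for which `leaks` returns `True` would be one that needs adjusting before it is released.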

Experimental results show that ICliqueCloak is efficient in terms of various performance metrics, including the cloaking time, the request processing time, and the cloaking success rate, while its anonymization cost is only slightly higher than that of the existing algorithm. Besides location-dependent attacks, ICliqueCloak can also effectively prevent snapshot location attacks, query tracking attacks, and trajectory attacks.

3. Multi-level Grid Scheme :

According to Li, Hu and Xu[7], proximity detection is an emerging category of mobile geosocial networking services. A typical example is the “nearby friend alert”, which notifies a user when a friend is nearby. The basic idea of privacy-preserving proximity detection is to apply a one-way space transformation that preserves proximity, so that the service provider can perform proximity detection in the transformed space. With such services, users not only issue proximity queries but also serve as query results, so new privacy schemes are needed that protect both parties, those querying and those being queried, while both are constantly moving. There have been no quantitative studies of detection accuracy against true proximity.

A grid-and-hashing paradigm is proposed, as shown in Fig.10: if the signatures of two users are the same in certain dimensions, the two users are probably in proximity. This paradigm imposes a uniform grid G to partition the space into cells. The placement of G is a key that is distributed among the users in the same group and is unknown to the service provider. The cell in which a user is located is indexed and then encoded with a one-way hash function such as SHA-256 to generate a signature according to G, which is sent to the service provider (the key is changed dynamically, by rehashing, to guard against inference by the service provider). The service provider can then perform proximity detection simply by testing the equality of the uploaded signatures. One issue with the grid-and-hashing paradigm is that it may produce 'false negative' detection results because the grid division is predefined: two users who are actually in proximity may be put into different cells, so the signatures they upload differ. To enhance the grid-and-hashing paradigm by increasing detection accuracy while preserving wireless bandwidth, the multilevel grid scheme offers continuous proximity detection. First, the proposed scheme can eliminate false positive cases by setting an appropriate grid size. To minimize false negative detection, the authors propose a grid overlay (as shown in Fig.11) with a set of independent grids, and study the optimal placement of these grids (as shown in Fig.12). The optimal grid overlay is created by shifting the first grid G along both diagonals in steps of 1/k of the cell length. Second, continuous monitoring involves user signature updates and query reevaluation by the service provider. A naive approach is for the user to update any signature of the grid overlay whenever it changes.
Obviously, this approach is extremely costly, because the overlay might contain dozens of grids and thus lead to frequent signature updates. To address this, the authors designed a multilevel grid overlay hierarchy, as shown in Fig.13. The bottom level of the grid overlay is used for in-proximity pairs, while all upper levels are used for non-proximity pairs.
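
The grid-and-hashing paradigm and the grid overlay described above can be sketched as follows. This is a simplified illustration rather than the scheme of [7]: keying the hash with HMAC-SHA-256 is an assumption made here, the overlay shifts the grid along a single diagonal only, and the names `signature`, `overlay_signatures` and `maybe_in_proximity` are hypothetical.

```python
import hashlib
import hmac
import math


def signature(x, y, cell, offset, key, grid_id=0):
    """Keyed one-way hash of the index of the grid cell containing (x, y).
    `offset` is the secret placement of the grid; `key` is the group secret."""
    ix = math.floor((x - offset[0]) / cell)
    iy = math.floor((y - offset[1]) / cell)
    msg = f"{grid_id}:{ix}:{iy}".encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def overlay_signatures(x, y, cell, key, k):
    """One signature per grid of a k-grid overlay; grid i is shifted by
    i/k of the cell length along the diagonal (single-diagonal assumption)."""
    return [signature(x, y, cell, (i * cell / k, i * cell / k), key, i)
            for i in range(k)]


def maybe_in_proximity(sigs_a, sigs_b):
    """Server side: equality test only, with no access to coordinates.
    Two users may be close if they share a cell in at least one grid."""
    return any(a == b for a, b in zip(sigs_a, sigs_b))
```

With a single grid, two users standing on either side of a cell border receive different signatures, which is the false negative case discussed above; adding shifted grids gives such pairs a chance to fall into a common cell.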

An important advantage of this scheme over existing work is that accuracy can be traded dynamically for communication or computational cost (or vice versa): users can do so in a quantitative manner by adding or removing grids as they wish, which is a key issue in mobile environments. A client-side location update scheme and a server-side location update handling procedure are also devised with this scheme to support continuous proximity detection, reducing the frequency of location updates when a user is far from their friends.

==Summary Of Current Location Privacy Preserving Techniques ==