Details of Grant 

EPSRC Reference: EP/J005266/1
Title: The Uncertainty of Identity: Linking Spatiotemporal Information Between Virtual and Real Worlds
Principal Investigator: Longley, Professor PA
Other Investigators: Rajarajan, Professor M; Musolesi, Dr M
Department: Geography
Organisation: UCL
Scheme: Standard Research
Starts: 01 November 2011
Ends: 30 April 2015
Value (£): 1,218,191
EPSRC Research Topic Classifications:
Computer Graphics & Visual.
Human Geography
Information & Knowledge Mgmt
Mobile Computing
New Media/Web-Based Studies
EPSRC Industrial Sector Classifications:
Information Technologies
Communications
Summary on Grant Application Form
This is an interdisciplinary proposal from Computer Science (St. Andrews), Engineering (City University London) and Geography (UCL), in partnership with experts in Visual Analytics at Purdue University in the United States. Our goal is to link information about human characteristics in the 'real' and 'virtual' worlds in order to better manage the uncertainties inherent in establishing human identity and linking it to geographic locations. Our basic premise is that uncertainty in identifying and characterising individuals may be managed and understood by: (a) detecting and exploring spatio-temporal profiles of lifestyles and activity patterns; (b) concatenating and conflating detailed, but under-exploited, datasets in the virtual and real domains; and, more speculatively, (c) soliciting and analysing crowd-sourced volunteered data that link physical and virtual identities. Through these actions it will be possible to improve our ability to characterise and validate an individual's identity, to infer more informative profiles of individuals and groups that bridge the real and virtual domains, and to document and manage the uncertainties inherent in these tasks. Aspects of this highly innovative research agenda are inevitably risky and speculative, but following an EPSRC 'WDYTYA' Sandpit we have appraised risk, examined the feasibility of data acquisition and addressed ethical approval issues.

The research will require multiple sources of data about a user's online activities (henceforth 'virtual' sources, such as multiple social networks, commercial information, purchases, etc.) alongside more conventional data (henceforth 'real' sources: population censuses, names registers, telephone directories, social surveys, etc.). Systematic linkage of these sources will be used to better resolve the question "Who do I think you are?" We propose to exploit complementary databases and methods in order to relate 'real' and 'virtual' properties, to glean, synergise and cross-validate new information, and to leverage value from secondary sources. This will be achieved by developing novel methods of data collection, maintenance, exploration, analysis and modelling that are efficient, effective, scalable and safe to use.

The work programme will be undertaken through six inter-linked work packages in the UK and US:

Work Package 1: Data Collection Tools

Development of new and effective tools for virtual data collection.

Work Package 2: Text Analytics

Development of text analytics algorithms to describe clusters of concepts, or associations between certain concepts or named entities.

Work Package 3: Data Anonymisation and Privacy Preservation

Balancing the benefits of enhanced data collection (Work Package 1) and text mining (Work Package 2) against the imperative of preserving individual privacy.

Work Package 4: Cybergeodemographics

Use of primary (Work Packages 1 and 2) and secondary data to relate virtual Internet traffic to the probable physical locations from which it emanated, and development of typologies of social networks that are robust, generalised and related to physical locations.

Work Package 5: Spatio-temporal Network Analysis

Development and application of spatio-temporal network analysis techniques to emerging social and geographic networks of individuals and the systems used by them.

Work Package 6: Visual Analytics

Deployment of a range of visual exploratory data analysis techniques to alert users to deviations from trend or from average behaviour and profile.

http://worldnames.publicprofiler.org/UncertaintyOfIdentity/index.html
Key Findings
1) Identification of community structures in social media datasets using an attribute that users provide when registering with a social media service, in this case the user name.

2) Analysis of the spatial and temporal activity patterns of social media usage across different world cities.

3) Development of a prediction model for studying the spread of information on the Internet, using the announcement of the discovery of the Higgs boson as a case study.

4) Development of a model for studying and predicting the interactions between physical mobility and communication patterns (mobile phone calls) in the presence of an epidemic.

5) Development of a prediction model of future locations of individuals given their previous movement patterns and their social ties.

6) Development of methods to group email addresses based on the semantics of surnames, using a vector space model.

7) Study of geo-genealogy using Internet search histories.
Potential use in non-academic contexts
Project URL: http://www.uncertaintyofidentity.com/
