Welcome to the homepage of the BIGDATA: Mid-Scale: ESCE: DCM: Collaborative Research: DataBridge – A Sociometric System for Long-Tail Science Data Collections project.

For updates on the project, check out the News page.


About DataBridge

Thousands of scientists are currently creating millions of data sets that describe an increasingly diverse matrix of social and physical phenomena. This rapid increase in both the amount and the diversity of data implies a corresponding increase in the potential of data to empower important new collaborative research initiatives. However, the sheer volume and diversity of data present a new set of challenges in locating all of the data relevant to a particular line of research. Taking full advantage of the unique data managed by the “long tail of science” requires new tools created specifically to help scientists search for relevant data sets.

DataBridge is an e-science collaboration environment designed for exploring a rich set of sociometric tools and the corresponding space of relevance algorithms, and for adapting them to define semantic bridges that link large numbers of diverse data sets into a sociometric network. Data from several large NSF-funded projects will be analyzed to develop relevance-based data discovery methods. Sociometric network analysis (SNA) algorithms will be used to explore the space of relevancy (the different ways data sets can be related to each other): by metadata and ontology, by pattern analysis and feature extraction, and via human connections. By linking data, human interactions, and usage methods and practices, rich models of social networks that interconnect massive long-tail science data can be created, enhancing scientific collaboration and discovery.

DataBridge supports advances in science and engineering by directly enabling and improving the discovery of relevant scientific data across large, distributed, and diverse collections. The system will also provide an easy means of publishing data to DataBridge and will give data producers an incentive to do so by enabling collaboration and citation.
The design will be domain-agnostic, highly extensible, and adaptive, supporting the inclusion of new relevance algorithms and indexing techniques. DataBridge will be distributed under an open-source license, enabling wider use and crowd-sourced improvements of the technology. The concepts developed in the project, chiefly the semantic linking of data through sociometric network analysis, will also have an impact on non-scientific data collections and will improve access to and discovery of information on the Web.