WEBINAR MAR 03: Linking databases to code repositories with Throughput
Join us for a one-hour webinar on Wednesday, March 03 for an introduction to the Throughput Project from one of its core developers. This online presentation will be hosted by Simon Goring, an Assistant Scientist with the Department of Geography at the University of Wisconsin-Madison and an Adjunct Professor of Computer Science at the University of British Columbia.
Linking databases to code repositories with Throughput
Wednesday, March 03
10:00 - 11:00 am Pacific
Register Online Here
The National Science Foundation (NSF)-funded Throughput Project has several parts, but its core is a large graph database that adheres to the W3C Annotation standard and uses schema.org labels for its data elements. It links almost 2000 databases, across a range of disciplines, to nearly 200,000 online code repositories, 158,000 journal articles, 215,000 individuals and 465,000 NSF research grants, comprising 17 million relationships among more than 1.6 million nodes.
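To make the idea of an annotation that links a database to a code repository more concrete, here is a minimal Python sketch. It loosely follows the W3C Web Annotation Data Model (the `@context`, `type`, `motivation`, `body` and `target` keys come from that standard). The function name `make_link_annotation`, the `schema_type` key and the example.org URLs are hypothetical and chosen for illustration only; this is not Throughput's actual schema.

```python
import json


def make_link_annotation(database_name, database_url, repo_url, note):
    """Build a Web Annotation-style record linking a code repository to a database.

    The "schema_type" key is a hypothetical stand-in for however schema.org
    labels are attached to nodes; it is not taken from Throughput itself.
    """
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "linking",  # one of the motivations defined by the W3C model
        "body": {"type": "TextualBody", "value": note},
        "target": [
            {
                "id": database_url,
                "schema_type": "DataCatalog",  # schema.org type for a data collection
                "name": database_name,
            },
            {
                "id": repo_url,
                "schema_type": "SoftwareSourceCode",  # schema.org type for source code
            },
        ],
    }


# Placeholder values; these URLs do not point at real resources.
annotation = make_link_annotation(
    database_name="Example Data Portal",
    database_url="https://example.org/data-portal",
    repo_url="https://example.org/some-user/some-repo",
    note="This repository downloads and plots records from the portal.",
)
print(json.dumps(annotation, indent=2))
```

In a graph database, a record like this would correspond to an annotation node with relationships to the database node and the repository node.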
The database itself (snapshots available: https://doi.org/10.6084/m9.figshare.c.5075912.v1) is the result of web-scraping, the use of the xDeepDive infrastructure (http://geodeepdive.org), and user contributions through the Throughput Annotation API (https://throughputdb.com/api). Throughput is intended to support the work of several communities. Importantly, we hope to promote the uptake of research computing infrastructure by creating a "cookbook" for data resources that uses links between data and computing resources and their implementations in publications and online. Throughput has catalogued over 200,000 links between research data services and GitHub repositories, and has begun making these links discoverable.
For example, GeoGratis is linked to 104 separate code repositories, and a search for Ocean Networks Canada surfaces 39 code repositories. Our goal with this "Cookbook" search is to give researchers, particularly early career researchers, tools to build on the work of data providers. By making these code resources discoverable, and by providing a simple tool to export BibTeX, RIS and other citation formats for code repositories, we also hope to increase the practice of code citation. Finally, we hope to promote best practices in documentation by allowing data providers to see how their data or resources (R or Python packages, APIs, etc.) are being used, to find exemplars of use, and to detect common anti-patterns.
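As an illustration of what a citation export for a code repository might produce, the following Python sketch formats repository metadata as a BibTeX `@misc` entry. The function `repo_to_bibtex`, its parameters and the sample values are all hypothetical; this is an assumed shape for such an export, not the tool Throughput provides.

```python
def repo_to_bibtex(key, author, title, year, url):
    """Format basic repository metadata as a BibTeX @misc entry (illustrative only)."""
    fields = {
        "author": author,
        "title": title,
        "year": str(year),
        "howpublished": f"\\url{{{url}}}",  # assumes the LaTeX url package is available
    }
    lines = [f"@misc{{{key},"]
    lines += [f"  {name} = {{{value}}}," for name, value in fields.items()]
    lines.append("}")
    return "\n".join(lines)


# Placeholder metadata; not a real repository.
print(
    repo_to_bibtex(
        key="example2021repo",
        author="Doe, Jane",
        title="example-repo: scripts for working with an example data portal",
        year=2021,
        url="https://example.org/some-user/example-repo",
    )
)
```

The `@misc` entry type is used here because standard BibTeX styles have no dedicated software type; styles that support `@software` could use that instead.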
This is an online webinar. Connection instructions will be emailed to all registrants.