WEBINAR Feb 06: Text parsing and matching with High Performance Computing resources
Join us for a one-hour webinar on Wednesday, February 06 for a brief introduction to some of the core concepts of analyzing text using computational tools.
We will use examples from social science research in which multiple data sets refer to the same individuals and must be merged while accounting for variations in how those individuals are named or described.
This online presentation will be hosted by Ian Percel, a data scientist at the University of Calgary.
Text parsing & matching with High Performance Computing resources
Wednesday, February 06
10:00 - 11:00 am Pacific
Register Online Here
This session will demonstrate how standard calculations can be scaled to work on very large data sets through simple strategies that are easy to deploy in an HPC environment. To illustrate a typical solution, we will walk through three key steps:
- text parsing and cleaning with data frames and regular expressions
- a parallelization strategy using blocking keys
- approximate text matching, string similarity measures, and reduction to a well-defined machine learning problem
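For readers who want a feel for these steps before the session, here is a minimal Python sketch using only the standard library. It is not the presenter's material: the function names, the blocking rule (first letter of the last name token), the 0.85 threshold, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions chosen for illustration. A fixed threshold stands in for the machine learning step, and plain lists stand in for data frames.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import product


def clean_name(raw):
    """Lowercase, drop everything except letters and spaces, collapse whitespace."""
    text = raw.lower()
    text = re.sub(r"[^a-z\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def blocking_key(cleaned):
    """Hypothetical blocking key: first letter of the last token."""
    tokens = cleaned.split()
    return tokens[-1][0] if tokens else ""


def block(records):
    """Group (raw, cleaned) records by blocking key."""
    blocks = defaultdict(list)
    for raw in records:
        cleaned = clean_name(raw)
        blocks[blocking_key(cleaned)].append((raw, cleaned))
    return blocks


def match(left, right, threshold=0.85):
    """Compare records only within shared blocks and keep pairs above the threshold."""
    left_blocks, right_blocks = block(left), block(right)
    matches = []
    # Each shared key is an independent unit of work, so on a cluster
    # every block could be handed to a separate worker or job.
    for key in left_blocks.keys() & right_blocks.keys():
        for (raw_a, a), (raw_b, b) in product(left_blocks[key], right_blocks[key]):
            score = SequenceMatcher(None, a, b).ratio()
            if score >= threshold:
                matches.append((raw_a, raw_b, score))
    return sorted(matches)


if __name__ == "__main__":
    survey = ["John Smith", "Mary O'Neil"]
    registry = ["Jon Smith", "mary oneil", "Jane Doe"]
    for raw_a, raw_b, score in match(survey, registry):
        print(f"{raw_a!r} ~ {raw_b!r} (similarity {score:.2f})")
```

Blocking trades recall for speed: records that land in different blocks are never compared, so the choice of key matters as much as the similarity measure.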
This is an online webinar. Connection instructions will be emailed to all registrants.