WEBINAR Feb 06: Text parsing and matching with High Performance Computing resources

WestGrid webinar text parsing Feb 2019

Join us for a one-hour webinar on Wednesday, February 06 for a brief introduction to some of the core concepts of analyzing text using computational tools.

We will use examples from social science research in which multiple data sets refer to the same individuals and must be merged while accounting for variations in how those individuals are named or described.

This online presentation will be hosted by Ian Percel, a data scientist at the University of Calgary.


Text parsing & matching with High Performance Computing resources
Wednesday, February 06

10:00 - 11:00 am Pacific 
Register Online Here


This session will demonstrate how standard calculations can be scaled to work on very large data sets through simple strategies that are easy to deploy in an HPC environment. To illustrate a typical solution, we will walk through three key steps:

  • text parsing and cleaning with data frames and regular expressions
  • a parallelization strategy using blocking keys
  • approximate text matching, string similarity measures, and reduction to a well-defined machine learning problem
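The three steps above can be sketched roughly as follows. This is a minimal illustration and not the presenter's material: it uses only the Python standard library (plain lists rather than data frames), and the function names, the choice of blocking key (first letter of the last name token), and the 0.8 similarity threshold are all assumptions made for the example. A fixed threshold also stands in for the machine learning step, which is not shown.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher


def clean_name(raw):
    """Step 1: lowercase, drop anything that is not a letter or space,
    and collapse runs of whitespace."""
    text = re.sub(r"[^a-z\s]", "", raw.lower())
    return re.sub(r"\s+", " ", text).strip()


def blocking_key(name):
    """Step 2 (hypothetical key): first letter of the last token.
    Only records that share a key are ever compared."""
    tokens = name.split()
    return tokens[-1][0] if tokens else ""


def group_by_block(names):
    """Group (raw, cleaned) name pairs by their blocking key."""
    blocks = defaultdict(list)
    for raw in names:
        cleaned = clean_name(raw)
        blocks[blocking_key(cleaned)].append((raw, cleaned))
    return blocks


def match_block(left, right, threshold=0.8):
    """Step 3: compare every pair inside one block with a string
    similarity measure and keep pairs at or above the threshold."""
    pairs = []
    for raw_l, clean_l in left:
        for raw_r, clean_r in right:
            score = SequenceMatcher(None, clean_l, clean_r).ratio()
            if score >= threshold:
                pairs.append((raw_l, raw_r, score))
    return pairs


def match_datasets(names_a, names_b, threshold=0.8):
    """Match two lists of names block by block.
    Each block is independent of the others, so this loop is the
    part that could be distributed across workers."""
    blocks_a = group_by_block(names_a)
    blocks_b = group_by_block(names_b)
    matches = []
    for key in sorted(blocks_a.keys() & blocks_b.keys()):
        matches.extend(match_block(blocks_a[key], blocks_b[key], threshold))
    return matches


if __name__ == "__main__":
    a = ["John  Smith", "Mary O'Brien"]
    b = ["john smith.", "Marie Obrien", "Alan Turing"]
    for left, right, score in match_datasets(a, b):
        print(left, "<->", right, round(score, 2))
```

Because no comparison crosses a block boundary, the per-block work could be handed to separate processes or separate cluster jobs without coordination between them. The trade-off is that two records for the same person are never compared if a spelling difference changes their key, so the choice of blocking key matters.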


This is an online webinar. Connection instructions will be emailed to all registrants. 

More Information

For more information or to register, visit the event page. Questions can be emailed to