New Collaboration with the Institute for European Global Studies in Basel

In November 2015 Data Futures commenced a collaboration with the Institute for European Global Studies at the University of Basel, Switzerland, to develop a workflow for comprehensive digital extraction and semantic engineering on trade directories produced by British and Japanese publishers during the 19th and early 20th centuries and covering European activities in many South-East Asian countries. These publications have been known for many years to be rich sources of information about persons, corporations and locations during this critical period in history. However, the irregular layout, typography and evolving conventions over this extended period, in Chinese, English and Japanese, have previously made this corpus resistant to digitization and more challenging still for automated entity extraction. Effective automation is also crucial because these volumes, often of more than 2000 dense pages each and published every year for almost a century, require careful workflow planning to minimize skilled manual intervention if an accurate digital resource is to be developed even within a five-year period.

The director of the Institute for European Global Studies, Professor Madeleine Herren-Oesch, and Christian Henriot, Professor of Asian Studies at Aix-Marseille University, have long-standing research activities using the Asian Directories and Chronicles. They have built up a collection of these volumes and have instigated the digitization of others held in multiple libraries. This new collaboration is using advanced workflows embedding the International Image Interoperability Framework (IIIF) with Data Futures freizo technology to manage more than 100,000 high-resolution page impressions. Entities are extracted automatically from OCR at the University of Heidelberg's Center for Transcultural Studies using a new parser, and analysis with progressively enriched dictionaries now significantly reduces the need for experts to interpret inconsistent and evolving representations as well as OCR ambiguities. Ten volumes have already been extracted with this new approach, reducing tasks that previously occupied multiple skilled historians for many months to a few days. From early 2017 this workflow will support a small team addressing extraction of many more volumes of the Directories, and in parallel a virtual research environment (VRE) is being developed which integrates the fichoz digital history platform. Dr Jean-Pierre Dedieu, the developer of fichoz at France's CNRS, who has applied it to other major projects in South America, the Middle East and China over the last decade, has joined the Asian Directories collaboration, enabling new research on connecting the VRE with the latest semantic engineering instruments.