The Center for Transcultural Studies at Heidelberg University and the Contemporary China Centre at the University of Westminster have long-standing research activities in the field of the Cultural Revolution and Mao- and post-Mao-era studies. Both organizations hold significant collections of Mao-era wall posters, an important communication and publication mechanism employed by the Communist Party of China, the ruling political party of the People's Republic of China. The Westminster China Poster Collection, dating from the 1970s, was one of the first to be digitized and to appear on the internet, and with almost 800 posters remains the largest publicly available collection of this material. However, by 2009, after multiple internet deployments, it had become increasingly difficult to maintain because of technical obsolescence and organizational change, and part of this document addresses its transition into a standards-based research resource. The Heidelberg China Poster Collection of almost 1500 unique posters and many duplicates addresses the same period and, unusually, has little overlap with the Westminster Collection. The Heidelberg Collection was digitized in 2014, and Heidelberg and Westminster have now agreed to create a combined Digital China Poster Collection and to develop protocols for its joint operation and the administration of research projects. Commencing in 2013, this program has so far achieved:
* Tamboti is a collection management ecosystem developed by the Heidelberg Research Architecture (HRA) and currently supports approximately 50 independent digital collections - it is an application of eXistdb, an open source native-XML database.
** The International Image Interoperability Framework, originally an initiative of the Andrew W. Mellon Foundation, has among its goals the provision for 'scholars [of] an unprecedented level of uniform and rich access to image-based resources hosted around the world'. The IIIF service for the combined digital poster collection at http://iiif.freizo.org/china_posters/ currently supports research into both Tamboti binary storage requirements and detail windowing developments to enable metadata developers to work on small fragments of large images without the need frequently to transfer very large files. This service is designed as a management infrastructure component or workflow development tool and does not itself provide facilities to search collections. However a windowing demonstration using image files from the combined digital poster collection re-directs to https://data-futures.org/iiif.html.
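The detail windowing described above relies on the region parameter of the IIIF Image API, which lets a client request a rectangular fragment of a large image instead of the whole file. The following is a minimal sketch of building such a request URL. It assumes the service follows the Image API 2.x URL template (`{identifier}/{region}/{size}/{rotation}/{quality}.{format}`); the identifier `poster-0001.tif` and the helper name `iiif_region_url` are hypothetical and are not taken from the project.

```python
from urllib.parse import quote

# Base URL as given in the text; everything appended to it is illustrative.
IIIF_BASE = "http://iiif.freizo.org/china_posters"


def iiif_region_url(identifier, x, y, width, height,
                    size="full", rotation=0, quality="default", fmt="jpg",
                    base=IIIF_BASE):
    """Build an IIIF Image API URL for a rectangular window of an image.

    Assumes the Image API 2.x template
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format},
    with the region expressed in pixels as x,y,w,h.
    """
    if width <= 0 or height <= 0:
        raise ValueError("region width and height must be positive")
    if x < 0 or y < 0:
        raise ValueError("region origin must not be negative")
    region = f"{x},{y},{width},{height}"
    # Percent-encode the identifier so that reserved characters such as '/'
    # cannot be confused with the path separators of the template.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical identifier: request a 512x512 pixel window at full resolution.
print(iiif_region_url("poster-0001.tif", 1200, 800, 512, 512))
```

A metadata developer transcribing small printed text could request only the window containing that text, so the very large source file never has to be transferred.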
Several new areas of research have been identified as a result of the recent work on the collections reported (they are summarized here, but not addressed in detail) and further collaboration between the organizations is currently being defined. The current phase of development on the China Poster collections is scheduled to complete in Spring 2015 with installation of the newly Combined Heidelberg-Westminster Digital Poster Collection and provision of a new publicly-accessible website. Managed under Tamboti, with instances at both the HRA and The University of Westminster Archive, this will be accompanied by launch of an English language Tamboti collection management training program in The Westminster Archive.
This case study focuses on the methods and infrastructure used to transition the Westminster collection, to develop multiple standards-based metadata descriptions of both collections and to deliver the new combined digital collection into the Tamboti management system. More information about the poster collections themselves can be found in the appendix.
Professional photography of Westminster's posters dating from the 1980s had produced transparencies, which were later scanned to produce TIFF files of approximately 20MB. While inadequate by today's standards (a discussion below addresses resolution issues associated with small printed text and the development of windowing solutions), these files still form a significant part of the main high-resolution corpus. A new digitization program is planned at Westminster, but rapid changes in imaging technology, together with the significant re-investment necessary to minimize geometric distortion and to ensure color correction, have contributed to that project being deferred. The implications of such collections retaining multiple versions of digitized files are considered in more detail elsewhere, especially given the variable production values of the original posters as a disposable high-volume communication medium, which led to significant variation over time.
By 2001, much of the collection had been made freely available as medium-resolution images through a simple HTML website, which remained one of the main collection services until 2014. In 2009 a new digital collection was produced using a content management system (CMS) called Zenphoto, an open-source PHP image gallery application using a MySQL database. In contrast to the 2001 project, which handled Chinese characters as images, the 2009 internet project supported Unicode, which enabled its interface to be made available in Chinese as well as in English. However, this project never provided the main poster collection service for the Contemporary China Centre, partly because its personnel had difficulty administering it directly as a research and reporting tool, and partly because of organizational changes. Contrary to early expectations, continued recourse to engineering support from Information Services was necessary to assist with content changes after initial configuration of the website. Crucially, Chinese language metadata needed to be entered by subject-based researchers or native language speakers.
More significantly still, although selection of the Zenphoto CMS had (potentially) permitted delivery of the Poster Collection to internet users within the available budget, its functionality was oriented primarily towards development of image collection websites: its metadata formats did not conform to any digital collection standards, and it offered no more than rudimentary search facilities, which were themselves defined at implementation time. This meant that interoperability with emerging digital humanities projects in contemporary China studies or other fields, or with metadata harvesting services, was not possible without creating a parallel implementation of the collection, which would have required significant further investment. Also, the period for which Zenphoto will continue to be supported is unclear, and both scaling its performance for large collections and very large image files and arranging backup and fail-over mechanisms require engineering investment. This situation, in which internet presentation and development of metadata are integrated in a common technical solution, presents sustainability challenges because of the rapid development (and limited lifetime) of internet technologies and the difficulty of exporting metadata in a convenient format for re-use.
IMCC's Data Futures project was approached to develop migration and management strategies for the China Poster Collection, with emphasis on creating a sustainable infrastructure able to support direct accessibility by scholars, students and contractors, as well as inter-operation with standards-based research tools and other collections. This led to a roadmap for:
The subsequent workplan to create a new research collection included generation of metadata compliant with Library of Congress standards, notably the Visual Resources Association's VRA Core 4, but recognized that such standards were still evolving. Further, it was evident that comprehensive collection description would require flexibility to integrate information defined by other activities such as METS and PREMIS at a later date, and that introduction of structured annotation metadata such as OADM was still at an early stage of testing. This meant that even a standards-based collection would have to be re-organized multiple times for reasons independent of the obsolescence of the technologies with which it was deployed. Accommodating evolution of the standards themselves, and a future distinction between object and annotation metadata, made portability for future migration an important design consideration for the new collection.
However, it was also essential to be able to re-deliver the digital poster collection, together with new research, in a short timescale to support a government review scheduled for mid 2014. Development of requirements for a new public website for the China Poster Collection was impractical in such a timescale. Budget restrictions, and the time needed to comply with University design guidelines if the existing CMS were abandoned, also contributed to the decision to develop an interim solution: using the existing Zenphoto CMS template, but with an automatically generated replacement MySQL database. The longer-term strategy for producing internet presentations, which separates website development activities from the collection management ecosystem in order to protect investment in metadata development, is addressed in more detail in a later section.
The workplan for reclamation of the Westminster collection was based on developing importer software for the existing internet projects and databases, as well as for newer research (largely recorded using spreadsheets), and creating an intermediate freizo [3] collection from which a range of different data structures could be produced automatically. Such collection-wide transformation was possible because of the scalability of MongoDB (an open source document-oriented database technology), on which the freizo platform is currently based. Programmatic access to metadata represented in MongoDB's native JavaScript Object Notation (JSON, stored internally as BSON), together with the large computational and storage capacity provided by distribution across multiple real and virtual computers, enabled the existing fragments of the China Poster Collection, both metadata and multiple versions of binary asset files, to be imported from their different sources into a single container.
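The idea of importing fragments from several legacy sources into a single container of JSON-like documents can be illustrated with a short sketch. It does not reproduce the freizo importers, whose details are not given here; the source names, the `poster_id` key, the field names and all sample values are invented for illustration, and plain Python dictionaries stand in for database documents.

```python
import json
from collections import defaultdict


def consolidate(sources):
    """Merge per-source records into one document per poster.

    `sources` maps a source name to a list of records. Each record is a
    dict carrying a shared 'poster_id' (a hypothetical identifier). Every
    value is stored together with its provenance, so conflicting values
    from different sources are kept side by side rather than overwritten.
    """
    merged = defaultdict(lambda: {"fields": defaultdict(list), "assets": []})
    for source_name, records in sources.items():
        for record in records:
            doc = merged[record["poster_id"]]
            for key, value in record.items():
                if key == "poster_id":
                    continue
                if key == "files":
                    # Keep every version of the binary asset, tagged by source.
                    doc["assets"].extend(
                        {"source": source_name, "path": path} for path in value
                    )
                else:
                    doc["fields"][key].append(
                        {"source": source_name, "value": value}
                    )
    return [
        {"_id": poster_id, "fields": dict(doc["fields"]), "assets": doc["assets"]}
        for poster_id, doc in sorted(merged.items())
    ]


# Invented sample data standing in for the three kinds of legacy source.
SAMPLE_SOURCES = {
    "html_2001": [
        {"poster_id": "E01", "title_en": "Example poster one",
         "files": ["2001/e01_medium.jpg"]},
    ],
    "zenphoto_2009": [
        {"poster_id": "E01", "title_zh": "示例标题", "files": ["2009/e01.tif"]},
        {"poster_id": "E02", "title_en": "Example poster two",
         "files": ["2009/e02.tif"]},
    ],
    "spreadsheet": [
        {"poster_id": "E02", "printer": "Example printing house"},
    ],
}

docs = consolidate(SAMPLE_SOURCES)
print(json.dumps(docs, ensure_ascii=False, indent=2))
```

Keeping the provenance of each value means that a later workflow step can decide which value to retain, which matches the consolidation task given to scholars and contractors rather than resolving conflicts automatically.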
Temporary freizo collections were created from the 2001 internet project, from the 2009 Zenphoto MySQL database and from researchers' documents. A webpage-based workflow was then developed to enable scholars and contractors to search the existing material, organize image files and consolidate metadata into the new intermediate collection (internally designated CC3). Multiple contributors were able to re-organize multi-language metadata and enter new research directly into CC3 concurrently, using dynamically rendered contributor webpages that supported English, Chinese, and unaccented and accented Pinyin character sets. The use of native speakers in different locations sped up translation and checking tasks and reduced costs. In some cases research based on GoogleDocs was linked dynamically to CC3, so that continuing updates from research became directly accessible to the workflow. A side effect of this strategy was that the temporary collections became records, potentially accessible in the long term, of the legacy internet projects, which were to be taken offline because support for them was ending.
Development of the workflow web pages drew on experience with tools produced for contributor communities in other freizo-based projects, such as Princeton's Phono-Post collection and Merve Verlag's hybrid publishing research. These had tested fine-grained control of the permissions of contributors' accounts, determining which information each account-holder could modify, and had provided curator-defined buttons and message fields for communication about work-in-progress, enabling management of the workflow by temporary personnel. Administration of credentials, potentially with different permissions across multiple collections, together with user password services, is key to effective coordination of extended communities of scholars, students and contractors. Additionally, freizo's bulk accession mechanisms, originally used to create large batches of new assets automatically from external digitization contractors (see the Phono-Post case study for a more detailed description of these developments), enabled the rapid creation of temporary collections from the historic projects. Further, the ability of managers and curators to design forms for contributor use without programming or developer intervention was invaluable for customizing the workflow for different tasks and user groups, in some cases on a day-to-day basis. The native JSON of MongoDB provides a particularly effective environment for developing web pages that render database information dynamically and that are efficient in contemporary browsers.
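Fine-grained permissions of this kind can be thought of as a per-account, per-collection list of fields that the account-holder may modify. The sketch below illustrates that idea only; it is not the freizo implementation, and the `Account` class, the `apply_update` function, the field names and the account name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """A contributor account with per-collection editable fields."""
    name: str
    # Maps a collection name to the set of field names this account may modify.
    editable: dict = field(default_factory=dict)


def apply_update(account, collection, document, changes):
    """Return a copy of `document` with `changes` applied, if permitted.

    Raises PermissionError when any changed field is not in the account's
    editable set for that collection; the original document is not touched.
    """
    allowed = account.editable.get(collection, set())
    denied = sorted(name for name in changes if name not in allowed)
    if denied:
        raise PermissionError(
            f"{account.name} may not modify {', '.join(denied)} in {collection}"
        )
    updated = dict(document)
    updated.update(changes)
    return updated


# Hypothetical contractor who may only edit Chinese-language fields in CC3.
translator = Account("contractor_a", {"CC3": {"title_zh", "pinyin"}})
poster = {"title_en": "Example poster", "title_zh": "", "pinyin": ""}

poster_v2 = apply_update(translator, "CC3", poster, {"title_zh": "示例标题"})
print(poster_v2)

try:
    apply_update(translator, "CC3", poster, {"title_en": "Changed"})
except PermissionError as err:
    print("rejected:", err)
```

Expressing permissions as data rather than code is one way a curator could adjust them from a form, without developer intervention, as the workflow changes.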
Since early availability of the freizo workflow in mid 2013, significant new information has been entered in CC3 about the China Poster Collection, especially in the area of printer and artist information, which had previously been available only in one language or within detail images. In addition, powerful search tools could be developed as needed for the freizo collection, while the facilities of the CMS were fixed. It was possible to supply to the CMS, via the MySQL export, pre-defined information that had previously been unavailable for some posters, and also to create new poster entries, but not to change the functionality of the website. Replacement of the Zenphoto CMS interface with a new publicly accessible presentation website driven directly by CC3 is scheduled for the first quarter of 2015. Uploadable JavaScript and CSS techniques [5], which support development of third-party-supplied interfaces tailored to specific collection and research needs, have already been evaluated in other freizo projects, and will allow the display of information now available in CC3, such as locations of specific Chinese texts within each poster.
For the China poster reclamation project described here, generation of VRA Core 4 metadata by freizo was accomplished by writing exporter software - initially to render poster object metadata so that it could be displayed and managed within the CC3 workflow - and subsequently to export it as a file (including digitized image file locators) for import by Tamboti. It is significant that multiple different valid mappings of CC3 metadata to VRA could be defined - which we refer to as possible 'profiles'. Amongst these, the encapsulation of annotation metadata within miscellaneous VRA mechanisms might be achieved. However, such profiles require definition of collection-specific conventions for representation of contextual information, and the development of custom methods based on those conventions in order to employ annotation - for example, in searching the collection. Significantly, migration of the collection to remain compliant with future VRA developments would incur collection-specific engineering investment.
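To make the notion of a 'profile' concrete, the sketch below maps a flat intermediate record to a VRA Core 4 style `work` element through a mapping table, so that changing the table changes the export without touching the exporter. This is an illustration under stated assumptions, not the freizo exporter: the record fields, the two profiles and the `export_work` function are hypothetical, the output is not validated against the VRA Core 4 schema, and image file locators are omitted.

```python
import xml.etree.ElementTree as ET

VRA_NS = "http://www.vraweb.org/vracore4.htm"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("vra", VRA_NS)

# Two hypothetical profiles. Each maps an intermediate field to a path of
# VRA element names below <work>, optionally with a language tag.
PROFILE_A = {
    "artist": {"path": ["agentSet", "agent", "name"]},
    "title_en": {"path": ["titleSet", "title"], "lang": "en"},
    "title_zh": {"path": ["titleSet", "title"], "lang": "zh"},
}
# Alternative mapping: the same artist information carried as a description.
PROFILE_B = dict(PROFILE_A, artist={"path": ["descriptionSet", "description"]})


def export_work(record, profile, work_id):
    """Render one intermediate record as a VRA-style <work> element."""
    work = ET.Element(f"{{{VRA_NS}}}work", {"id": work_id})
    for field_name, rule in profile.items():
        value = record.get(field_name)
        if not value:
            continue  # nothing recorded for this field
        # Reuse an existing set element (e.g. one titleSet for all titles).
        set_tag = f"{{{VRA_NS}}}{rule['path'][0]}"
        node = work.find(set_tag)
        if node is None:
            node = ET.SubElement(work, set_tag)
        for tag in rule["path"][1:]:
            node = ET.SubElement(node, f"{{{VRA_NS}}}{tag}")
        if "lang" in rule:
            node.set(XML_LANG, rule["lang"])
        node.text = value
    return work


# Invented sample record.
record = {"title_en": "Example poster", "title_zh": "示例标题",
          "artist": "Example artist"}
work_a = export_work(record, PROFILE_A, "w_example_1")
work_b = export_work(record, PROFILE_B, "w_example_1")
print(ET.tostring(work_a, encoding="unicode"))
print(ET.tostring(work_b, encoding="unicode"))
```

Both outputs are plausible renderings of the same record, which is the point made above: a search tool that relies on the artist's name must know which convention the collection used.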
Installation under Tamboti of the new Westminster China Poster Collection, developed in freizo as CC3, allowed editing of its metadata by collection managers in the same environment as many other collections in different subject areas. This has the advantage of the availability of mature training courses and an independent user support and maintenance organization. Further, the approach adopted of defining the relationship between CC3 and VRA at the export stage meant not only that this mapping could be changed with low overhead and a new profile exported to Tamboti, but also that implementation of a mapping table mechanism in the exporter allowed documentation of the VRA profile history of the collection. It is expected that such traceability will become important as the VRA standard evolves and also as better strategies for defining structured contextual metadata are developed - making possible more clear guidelines for the payload of metadata elements such as VRA and MODS. This infrastructure will support a new research program addressing tools for migration of collections between metadata profiles as well as evaluation of annotation strategies.
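One way to picture how a mapping table can document profile history is to store each exported version of the table together with the differences from its predecessor. The sketch below is an assumption about how such a record might look, not a description of the actual exporter; the `ProfileHistory` class, the version labels, the dates and the mappings are hypothetical.

```python
from datetime import date


def diff_profiles(old, new):
    """List the fields whose mapping differs between two profile tables."""
    changes = {}
    for field_name in sorted(set(old) | set(new)):
        before, after = old.get(field_name), new.get(field_name)
        if before != after:
            changes[field_name] = {"from": before, "to": after}
    return changes


class ProfileHistory:
    """Append-only log of the mapping tables used for each export."""

    def __init__(self):
        self.entries = []

    def record(self, version, mapping, exported_on):
        previous = self.entries[-1]["mapping"] if self.entries else {}
        entry = {
            "version": version,
            "exported_on": exported_on.isoformat(),
            "mapping": dict(mapping),  # copy, so later edits do not alter history
            "changes": diff_profiles(previous, mapping),
        }
        self.entries.append(entry)
        return entry


# Hypothetical profile versions: field name -> target element path.
history = ProfileHistory()
history.record("profile-1", {"artist": "agentSet/agent/name",
                             "title_en": "titleSet/title"}, date(2014, 1, 1))
second = history.record("profile-2", {"artist": "agentSet/agent/name",
                                      "title_en": "titleSet/title",
                                      "printer": "agentSet/agent/name"},
                        date(2014, 6, 1))
print(second["changes"])
```

With such a log, each exported collection can be traced back to the mapping that produced it, which is the kind of traceability expected to matter as the VRA standard evolves.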
With availability of the CC3 collection in the Tamboti ecosystem, the relationship between the Westminster collection and the Heidelberg China Poster collection was examined. The Heidelberg collection had not been digitized previously, and during mid 2014 it was decided that a preliminary photography project be undertaken so that a survey could be conducted. An intermediate freizo collection - CPH - was generated from the resulting image files and a workflow made to enable the significant numbers of duplicated posters to be grouped - leading to a count of almost 1500 unique posters - and comparison with the Westminster collection to be made. As a result it was determined that less than 10% of the assets of the individual collections overlap. The CPH workflow has subsequently been extended to enable native Chinese-speaking contractors to develop initial metadata for the Heidelberg Collection.
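The two survey steps, grouping duplicate copies and measuring overlap between collections, reduce to simple set operations once each image has been assigned a key identifying the poster it depicts. The sketch below assumes that such a key exists (for example, one assigned by a contributor in the workflow); the key format, filenames and the tiny data set are invented and bear no relation to the real figures.

```python
from collections import defaultdict


def group_duplicates(image_records):
    """Group image files by the poster they depict.

    `image_records` is an iterable of (filename, poster_key) pairs, where
    poster_key is a hypothetical identifier assigned during the workflow.
    """
    groups = defaultdict(list)
    for filename, poster_key in image_records:
        groups[poster_key].append(filename)
    return dict(groups)


def overlap_share(keys_a, keys_b):
    """Return the share of each collection's unique posters found in the other."""
    a, b = set(keys_a), set(keys_b)
    if not a or not b:
        raise ValueError("both collections must contain at least one poster")
    shared = a & b
    return len(shared) / len(a), len(shared) / len(b)


# Invented data: four photographs showing three distinct posters.
heidelberg_images = [
    ("img_0001.tif", "poster-A"),
    ("img_0002.tif", "poster-A"),  # second copy of the same poster
    ("img_0003.tif", "poster-B"),
    ("img_0004.tif", "poster-C"),
]
heidelberg = group_duplicates(heidelberg_images)
westminster_keys = ["poster-C", "poster-D"]

print(len(heidelberg), "unique posters")
print(overlap_share(heidelberg.keys(), westminster_keys))
```

The overlap is reported separately for each collection because the two collections differ in size, so the same shared posters form a different share of each.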
A memorandum of collaboration was established between the Heidelberg Center for Transcultural Studies and the Westminster Contemporary China Centre to develop a Combined China Poster Collection. As a result, the VRA profile in use at that time for the Westminster Collection was applied to CPH and a Tamboti collection exported. The freizo IIIF service was also extended to CPH, and evaluation of IIIF as a binary asset service for Tamboti commenced. The Tamboti Heidelberg Poster Collection entered service during November 2014, supporting a course being delivered at Stanford University. During the first quarter of 2015, completion of the current phase of metadata entry for the Heidelberg collection and further development of the poster collection VRA profile will enable launch of the Combined China Poster Collection. This will provide a valuable research platform, not just for subject specialists but also for collection management work. Multiple instances of the Tamboti ecosystem, in particular an instance operating in the University of Westminster Archive, will not only provide fail-over and improved service performance but also allow testing of synchronization and collaborative administration of the two sub-collections by the respective organizations' archivists. The new Westminster China Poster website, which will be driven from the combined VRA-based collection, will provide public access only to posters in the Westminster collection. In addition, Data Futures will investigate automated updating of the freizo intermediate collections based on research and administration activities applied to the Tamboti collection. Such a synchronization mechanism would enable flexible collection-wide migration of the Combined Collection to address metadata profile changes and new VRA developments.
The Heidelberg University China Poster Collection is a large collection of Mao and post-Mao era posters (almost 1,500 unique posters with, unusually, many multiple copies) collected in China by Peter Köppler and donated by him to The University in 2007. Early work to catalog these posters led to a digitization project in the Heidelberg Library during the Summer of 2014 and creation of a freizo digital collection in collaboration with the Data Futures Laboratory at the University of Westminster.
The University of Westminster’s China Poster Collection is a unique resource of approximately 800 posters spanning the 1950s-1980s. It attracts visitors from the EU, US and China, including film-makers, curators and collectors as well as academics and PhD students of visual and film studies, modern history, politics and international relations, and museums and collections.
The Collection was founded in 1977 by the writer and journalist John Gittings, then Senior Lecturer in Chinese at the Polytechnic of Central London (PCL) – later The University of Westminster – with the aim of providing a teaching and learning resource on the Mao era. Major exhibitions of posters from the Collection have taken place in Australia as well as the EU and US. There have also been numerous symposia, seminar papers and public lectures based on the Collection’s materials, and two documentary films based on study of the Collection were made in 2006 – ‘Painting for the Revolution: Peasant Paintings from Huxian’ and ‘Red Art’ by China’s most eminent documentary film makers – Hu Jie and Ai Xiaoming.