Making Datasets Comparable

Making sets of microdata comparable is the necessary first step towards a ‘European life course history’. The databases contain widely varying variables, because individual lives are reconstructed from diverging sources. These sources range from censuses, tax records and land registers, through church records and civil records (recording birth, marriage and death), to dynamically updated population registers. Furthermore, each database has its own problems of sampling, selection bias and ‘loss to observation’, which are documented extensively. The diversity in content means that not every research question can be addressed with every database. The tasks of the network are fourfold:

  • Mapping the diversity of data contained in European historical micro databases into a common interface based on an already established format (IDS), including metadata and documentation;
  • Bringing together programmers and researchers in projects to write data extraction programmes for a number of research fields such as migration, fertility and social mobility;
  • Enhancing dissemination by educating scholars in the use of the IDS and data extraction programmes;
  • Encouraging the development of new databases in parts of Europe where historical population databases are scarce.

The figure below presents the outline of our strategy. The main idea is that all relevant databases will transfer their data into the common data format of the IDS. Databases whose data are not yet linked, such as census databases, can easily be included in this format because they hold the same kind of data and will ultimately develop into longitudinal databases. On the left side of the diagram you find the various types of sources included in historical longitudinal databases. Because each database captures and stores data in a different way, it is impossible to create a single data management structure that will work for every situation. On the right side of the diagram are the data files that researchers require for analyses. These files are made by the extraction software, which uses the data from the IDS of each database in the same way. Standardised data extraction, metadata and documentation make for large gains in efficiency.
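The strategy described above can be illustrated with a minimal sketch. It assumes a deliberately simplified common layout of (database, person, attribute type, value) rows; the field names, attribute types and sample records are hypothetical and are not taken from the IDS specification. The point is only that each database needs its own mapping into the common layout, after which one extraction routine serves all of them.

```python
from collections import defaultdict

# Hypothetical source records from two databases with different layouts.
census_rows = [{"person": "c-1", "birth_year": 1850, "occupation": "farmer"}]
parish_rows = [{"pid": "p-7", "born": 1851, "trade": "weaver"}]

# Per-database mapping from local field names to shared attribute types
# (illustrative names, not the official IDS vocabulary).
CENSUS_MAP = {"birth_year": "BIRTH_YEAR", "occupation": "OCCUPATION"}
PARISH_MAP = {"born": "BIRTH_YEAR", "trade": "OCCUPATION"}


def to_common(rows, id_field, field_map, database):
    """Flatten source rows into (database, person_id, type, value) tuples."""
    out = []
    for row in rows:
        for local_name, common_type in field_map.items():
            if row.get(local_name) is not None:
                out.append((database, row[id_field], common_type, row[local_name]))
    return out


def extract(common_rows, wanted_types):
    """Build one analysis record per person from the common layout."""
    people = defaultdict(dict)
    for database, person_id, attr_type, value in common_rows:
        if attr_type in wanted_types:
            people[(database, person_id)][attr_type] = value
    return dict(people)


# Each database is converted once; the same extraction then runs on both.
common = (to_common(census_rows, "person", CENSUS_MAP, "census")
          + to_common(parish_rows, "pid", PARISH_MAP, "parish"))
analysis = extract(common, {"BIRTH_YEAR", "OCCUPATION"})
```

In this sketch the database-specific work is confined to the two mapping tables, which is where the efficiency gain of a shared format would come from: the extraction function never needs to know which source a row came from.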

George Alter and Kees Mandemakers finished the fourth version of the IDS. This version is published in the first volume of Historical Life Course Studies. It integrates the results of several IDS meetings and workshops, among others those in Chicago (2010), Boston (2011) and Vancouver (2012). Since 2011 the enterprise has been part of the European Historical Population Samples network, resulting in workshops in Umeå (2012) and Lund (2013).

Version 4 is accompanied by a new version of the metadata table, version 4.01, dated 12 June 2013. Changes and additions to this table have to be reported to the Clearing Committee (Luciana Quaranta, George Alter and Kees Mandemakers; kma@iisg.nl).

For more information about the scientific background of the IDS, read the article ‘Defining and Distributing Longitudinal Historical Data in a General Way Through an Intermediate Structure’ by George Alter, Kees Mandemakers and Myron Gutmann, published in Historical Social Research.