Tested on the Scanian Economic Demographic Database
Rectangular file where each row represents a unique combination of Id_I and date, and each variable has been transposed into a column.
“Elisodes_file_time_string”. The time string contains the date and time when the final file is saved.
Id_I, date1 (start of spell), date2 (end of spell), and all variables included in the extraction.
The extraction must contain a variable specifying the period at risk. The program excludes from the final episodes table all rows in which the individual is not at risk.
None. It is possible to add labels to categorical variables. Labels should be stored in an external file containing the fields Type, Value and ValueLabel.
The file can be specified in the syntax when running the program, as shown below. Using labels is optional.
This program was developed for the IDS and EIDS. It converts a data extraction into a rectangular episodes table that is ready for statistical analysis.
It takes as input the file Chronicle.dta, which contains the data extraction, and the file VarSetup.dta, which contains information about that extraction. Both files must be saved in the working directory.
Every date on which any variable considered in the analysis changes must be included in the Chronicle file. The Chronicle file should contain the columns ID_I, Type, Value, Day, Month, Year and DayFrac (used to handle date collisions). It must also contain a variable that defines the period in which the individual is at risk (specified in the syntax below), with Value 1 when the individual first becomes at risk and Value 0 when the individual last stops being at risk. Gaps in the period at risk can also be defined (Value 0 on the date of exit and Value 1 on the date of return). The program deletes from the episodes table all spells in which the individual is not at risk.
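As an illustration of how the at-risk variable can encode entries, exits and gaps, the following Python sketch pairs each Value 1 with the next Value 0 to obtain at-risk intervals. It is not the program's own code: the Type name "AtRisk", the sample rows and the function name are assumptions made for the example.

```python
from datetime import date

# Hypothetical Chronicle rows for one individual:
# (Id_I, Type, Value, Day, Month, Year). "AtRisk" is an assumed Type name.
chronicle = [
    (1, "AtRisk", 1, 1, 1, 1850),    # first becomes at risk
    (1, "AtRisk", 0, 15, 6, 1855),   # exit (start of a gap)
    (1, "AtRisk", 1, 1, 3, 1857),    # return (end of the gap)
    (1, "AtRisk", 0, 31, 12, 1860),  # last stops being at risk
]

def at_risk_intervals(rows, risk_type="AtRisk"):
    """Pair each Value 1 date with the next Value 0 date."""
    events = sorted(
        (date(year, month, day), value)
        for _id, typ, value, day, month, year in rows
        if typ == risk_type
    )
    intervals, start = [], None
    for when, value in events:
        if value == 1 and start is None:
            start = when                      # open an interval
        elif value == 0 and start is not None:
            intervals.append((start, when))   # close it
            start = None
    return intervals

intervals = at_risk_intervals(chronicle)
```

Here the individual is at risk during two intervals, separated by the gap between 15 June 1855 and 1 March 1857.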
Information relating to the variables included in the extraction must be included in the Variable setup file, which should contain the fields Type, Transition and Duration. Transition distinguishes between events (which change value at the end of a spell: Transition = End), time-varying variables (which change value at the start of a spell: Transition = Start) and time-invariant variables (Transition = Invariant). Duration distinguishes whether the Values of a Type are valid only on their date of declaration (Duration = Instant) or between a date of declaration and the next date of declaration/End_date (Duration = Continuous).
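One way to read the Transition field is sketched below in Python. It is an interpretation of the description above, not the program's implementation: the function name, the sample Types ("Occupation", "Death") and the rule for picking a value are assumptions, and Duration = Instant is not handled for time-varying variables.

```python
from datetime import date

def value_for_spell(transition, declarations, spell_start, spell_end):
    """Pick the Value of one Type for one spell.

    declarations: list of (date, value) pairs for one individual and Type.
    """
    if transition == "Invariant":
        # A single value applies to every spell.
        return declarations[0][1] if declarations else None
    if transition == "Start":
        # Time-varying variable: latest value declared on or before spell start.
        earlier = [(d, v) for d, v in declarations if d <= spell_start]
        return max(earlier, key=lambda dv: dv[0])[1] if earlier else None
    if transition == "End":
        # Event: value declared on the date the spell ends.
        for d, v in declarations:
            if d == spell_end:
                return v
        return None
    raise ValueError(f"unknown Transition: {transition!r}")

occupation = [(date(1850, 1, 1), "farmer"), (date(1855, 6, 15), "miller")]
death = [(date(1855, 6, 15), 1)]
spell_1 = (date(1850, 1, 1), date(1855, 6, 15))
spell_2 = (date(1855, 6, 15), date(1860, 12, 31))

occ_1 = value_for_spell("Start", occupation, *spell_1)
occ_2 = value_for_spell("Start", occupation, *spell_2)
death_1 = value_for_spell("End", death, *spell_1)
death_2 = value_for_spell("End", death, *spell_2)
```

In this sketch the change of occupation on 15 June 1855 takes effect in the spell that starts on that date, while the event dated 15 June 1855 is attached to the spell that ends on it.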
When creating an episodes table, the program can add labels to the Values of numerically coded categorical variables. Labels should be stored in an external file containing the fields Type, Value and ValueLabel.
The file can be specified in the syntax when running the program, as shown below. Using labels is optional.
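The structure of a labels file with the fields Type, Value and ValueLabel can be pictured with a small Python sketch. The rows, the codes and the helper name are hypothetical, and the program itself may attach labels differently.

```python
# Hypothetical rows of a labels file with the fields Type, Value and ValueLabel.
label_rows = [
    {"Type": "Sex", "Value": 1, "ValueLabel": "Male"},
    {"Type": "Sex", "Value": 2, "ValueLabel": "Female"},
]

# Look-up keyed by (Type, Value), so equal codes of different Types do not clash.
label_lookup = {(r["Type"], r["Value"]): r["ValueLabel"] for r in label_rows}

def label_for(var_type, value):
    """Return the label for a coded value, or the value itself if unlabelled."""
    return label_lookup.get((var_type, value), value)

labelled = [label_for("Sex", v) for v in (1, 2, 9)]
```

Values without an entry in the labels file are left unchanged, which matches the fact that labels are optional.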
The program rectangularises time-varying and time-invariant variables and events and creates spells, based on the dates of change in any of these variables. In the last steps the program formats the episodes file based on the information specified in the Variable setup file.
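The construction of spells from dates of change can be sketched in Python as follows. This is a simplified illustration under the assumption that each pair of consecutive distinct change dates for an individual delimits one spell (date1, date2); the function name and dates are invented for the example.

```python
from datetime import date

def build_spells(change_dates):
    """Turn the dates of change of one individual into (date1, date2) spells."""
    ordered = sorted(set(change_dates))        # distinct dates, in time order
    return list(zip(ordered, ordered[1:]))     # consecutive pairs

# Dates on which any variable changes for one individual (one date repeated
# because two variables change on the same day).
dates = [
    date(1850, 1, 1),
    date(1855, 6, 15),
    date(1855, 6, 15),
    date(1860, 12, 31),
]
spells = build_spells(dates)
```

Each resulting spell would then carry the values of all variables valid during that interval.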
The output of the program is saved with the name Elisodes_file_time_string.dta (the time string contains the date and time when the final file is saved).
Part 1.- Reading and preparation of the variable setup file.
Part 2.- Reading and preparation of the chronicle file.
Part 3.- Checking that chronicle and variable setup files contain the same types.
Part 4.- Rectangularization of time-varying variables.
Part 5.- Rectangularization of time-invariant variables.
Part 6.- Rectangularization of events.
Part 7.- Construction of spells.
Part 8.- Formatting of the episodes file.
Part 9.- Dropping of spells when the individual is not at risk.
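Two of the simpler steps listed above, the consistency check of Part 3 and the filtering of Part 9, can be sketched in Python. The function names and sample data are assumptions, and the rule that a spell is kept only when it lies entirely inside an at-risk interval is one plausible reading rather than the program's documented behaviour.

```python
from datetime import date

def check_types(chronicle_types, setup_types):
    """Part 3 (sketch): report Types present in one file but not the other."""
    chronicle_types, setup_types = set(chronicle_types), set(setup_types)
    return {
        "missing_in_setup": sorted(chronicle_types - setup_types),
        "missing_in_chronicle": sorted(setup_types - chronicle_types),
    }

def drop_not_at_risk(spells, at_risk):
    """Part 9 (sketch): keep spells lying inside some at-risk interval."""
    return [
        (d1, d2)
        for d1, d2 in spells
        if any(start <= d1 and d2 <= end for start, end in at_risk)
    ]

mismatch = check_types(["AtRisk", "Occupation", "Death"], ["AtRisk", "Occupation"])

spells = [
    (date(1850, 1, 1), date(1855, 6, 15)),
    (date(1855, 6, 15), date(1857, 3, 1)),   # falls in a gap
    (date(1857, 3, 1), date(1860, 12, 31)),
]
at_risk = [
    (date(1850, 1, 1), date(1855, 6, 15)),
    (date(1857, 3, 1), date(1860, 12, 31)),
]
kept = drop_not_at_risk(spells, at_risk)
```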
a. Test data: DemoDatabase (see related documents)
b. Validation tests: The program has been tested using the DemoDatabase and the SEDD.
Chronicle and Variable setup files (see related documents).
Any (source or created variables); must contain a variable specifying the period at risk.
Day, Month, Year, DayFrac. DayFrac is optional and is used to handle date collisions (more than one Value of the same Type on the same date).
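A possible use of DayFrac to order same-day declarations is sketched below in Python. The sample rows are invented, and treating a missing DayFrac as 0 is an assumption of this example, not a documented rule of the program.

```python
# Hypothetical Chronicle rows; two Values of the same Type share one date.
chronicle = [
    {"Type": "Occupation", "Value": "miller", "Day": 15, "Month": 6, "Year": 1855, "DayFrac": 0.5},
    {"Type": "Occupation", "Value": "farmer", "Day": 15, "Month": 6, "Year": 1855, "DayFrac": 0.25},
    {"Type": "Occupation", "Value": "servant", "Day": 1, "Month": 1, "Year": 1850, "DayFrac": None},
]

def sort_key(row):
    """Order by date, then by DayFrac (missing DayFrac assumed to be 0)."""
    frac = row["DayFrac"] if row["DayFrac"] is not None else 0.0
    return (row["Year"], row["Month"], row["Day"], frac)

ordered = sorted(chronicle, key=sort_key)
ordered_values = [row["Value"] for row in ordered]
```

With this ordering, the two declarations dated 15 June 1855 are placed in a definite sequence instead of colliding.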