Updating
January 1, 2022
This is just a reminder to me. There are so many steps to keeping a model like this up-to-date. And no real way of automating the process, sadly.
This tries to bring together everything in the convoluted steps needed to add a year’s worth of data to the model (FRS/SHS), uprate the variables and create a new weighting target dataset.
This is convoluted. I don’t remember Taxben being this hard. Ideally I’d automate much of this - I had a brief go using some of the APIs for grabbing ONS data, but didn’t get far.
Also, paths and filenames are often hard-wired in: add a paths config file.
Always keep running the test suite while you’re doing any of this.
The struct UpdatingInfo
in Definitions.jl
holds static info on when each component below was last updated and should be kept up-to-date.
File paths are held as constants in Definitions.jl
.
Raw survey data from UK Data Service (login in keypass). Unpack these into the $RAW_DATA/[dataset]/[year]
directories and add simlinks to tab
and mrdoc
directories.
- bad thing: the
.rtf
format we needed for loading the documentation into thedictionaries
Postgres database isn’t there anymore. TODO document this db and the Ruby code somewhere and add parsing from.html
files in mrdoc.
General usage.
In what follows I assume using the repl plus Revise. Starting julia in the ScottishTaxBenefitModel
home directory.
1) load a script file:
] activate .
using Revise
includet( "src/[yourfiles])
2) load tests:
] activate .
] test
3) load specific test:
] activate .
using Revise, ScottishTaxBenefitModel
includet( "test/testutils.jl")
includet( "test/[your specific test"])
1. ADDING a new FRS/HBAI
Code is HouseholdMappingFRS_HBAI.jl
(note: not a package). Check the .tab
files and .docs
carefully:
- benefit, income codes may change. Check against enums in
Definitions.jl
; - 2020 FRS
.tab
version had a missing tab which caused wrongly labelled data (See emails to UKDS June-Aug 2022). - date ranges are hard-wired into
HouseholdMappingFRS_HBAI.jl
and need manually changed.
Note that we use HBAI for optional SPI
’ wage and self-employment data so we can only add a year when the HBAI is released.
Note paths wired in to Definitions.jl
.
Then, run create_data()
. This creates a full UK-wide dataset. Run scripts/create_scottish_subset
with ADD_IN_MATCHING
set to false
(initially) to create just scottish bit. ADD_IN_MATCHING
needs to be false
until step (2) below.
2. Matching in a new SHS
Unpack new SHS as above. The matching code is an unholy mess.
In matching/
:
matching_funcs.jl
- librarymatching.jl
- driver code. Note this also has year ranges hardwired in at the top which need manually changed. You may need to execute the code inmatching.jl
in stages as it makes successively coarsened matches. Also some crude hhld totals close to the bottom used as consistency checks - these need to be updated each year (or just deleted). TODO output of this should be lists of candidate SHS donor households.
3. Creating a target weighting dataset
Directory (for 2022) data/targets/aug-2022-updates/
; create something similar for each year. Main workfile is target_generation.ods
which attempts to get counts of people, households, employment, etc. consistent.
Output is (for 2022) at 90 piece target set. Sources:
- NOMIS Standard Scottish Report - employment, social class. These numbers are adjusted manually to match entire popn totals (shouldn’t be different but are);
- Stat XPLORE - benefits (in payment);
- National Records Scotland (NRS) Household Projections. 2018 based data. 2022 projections. Note we use the non-household pcts from
2018-house-proj-source-data-alltabs.xlsx
for scaling down population just to household based popn (excluding students in halls, those in care homes, etc.); - NRS - Housing Stock by Tenure - scaled up to 2022 hhld projections
- NRS Population forecasts. NOte we scale by NRS estimates of Scotland-level proportion of populations in households. TODO: Glasgow,Edinburgh have huge student 16-21 population in halls but we only have hhld counts by LA so are ignoring this.
All this has to be merged together manually on any update, I’m afraid. Note how we change the standard age ranges 10-14 and 15-19 to 10-15 16-19 to better mesh with employment data. Note how everything needs to be scaled to match 2022 hhld/population numbers (popn is all or hhld depending on the question - see the spreadsheet).
4. Uprating
Main uprating file is data/prices/indexes/indexes.tab
. Uprating code is Uprating.jl
; filenames and uprating targets in Settings.jl
. Sources are as in indexes.tab
header rows. Indexes are quarterly. Sources:
- [OBR Economic and fiscal outlook – March 2022: Supplementary Data]https://obr.uk/data/);
- Scottish Fiscal Commission Forecast
FIXME this needs updating urgently.
5. Benefits
There are 3 things here: numbers for the transition to UC, estimates of how many on legacy disability benefits we should move to new benefits and some probits we use to model generosity of disability tests.
5.1 The Legacy/UC transition
This is done very, very crudely using House of Commons Data. We use Scotland-wide approximations, which are then hard-wired into UCTransition.jl
. We could use LA level if someone still produced this (HoC is constituency). Can’t be bothered trying myself.
5.2 Model Transitions to new disable/carer benefits
Code is HistoricBenefits.jl
. It re-assigns DLA recipients to PIP according to proportions on each in the interview month for Scotland as a whole.
Data files are:
data/receipts/[pip|dla]_2002-2020_from_stat_explore.csv
To update these, randomly press buttons on STat Explore until something comes out - DLA/PIP in receipt, including devolved to Scotland, current tables. Note I have a saved table format for PIP. Export as .xlsx
. Transpose in open office to same format as data/receipts/pip_2002-2020_from_stat_explore.csv
. Change filename in HistoricBenefits.jl
.
You also need to update params/historic_benefits.csv
; see section on updating parameters below.
5.3 Benefit Generosity
Main script is regressions/disability_regressions.jl
Creates candidates
files in data/disability/
If the data has been created correctly, just running the script should create these files automatically. A data year dummy for the new year’s data should be automatically added.
6. Adding new default parameters
Most of the individual level tests are based on the system when I started, using the 2020/21 values hard-wired into the parameter definitions using the @with_kw Macro. So, don’t alter the defaults there. Instead, copy
sys_2022-23.jl
and update that. This can be loaded using get_default_system_for_date
in STBParameters.
6.1 Direct Taxes
Note that it’s best to get an updated version of Melville’s Taxation for a consolidated set of parameters and test examples.
But use Mellvile.
6.2 UK Benefits
Only place I know with everything in one place is the CPAG Guide.
But:
6.3 Scottish Benefits
See here. Notes:
- some implementations are incomplete (TODO);
- link may not be permanent.
6.4 Local Housing Allowances
2022/3 values are here.
Note the BRMA definitions are treated as constants.
” for this year (2022-23) all rates have been frozen at the rate last determined on 31st March 2020. This was the 30th percentile at that time.”
So I’ll skip changing this for now.
6.5 Council Tax
[CHECK WALES PROJECT]
This needs parameterised better.
Default alues are hard-wired into default_band_ds
function in STBParameters.
Example loading new values in sys_2022-23.jl
, at the bottom. Values from ScotGov CT Datasets. We just need the band Ds here so long as the relativities don’t change.
7. Updating Tests
7.1 Individual Level Unit Tests
- there are places (e.g.
uprating_tests.jl
,historic_tests.jl
) where I’ve hard wired in test values that will change on each updating. I’m trying to mark all of these with “CHANGEME”.