Configuration File Reference
The mejiro pipeline is driven entirely by a YAML configuration file. Example configurations live in mejiro/data/mejiro_config/ (e.g., simple.yaml, roman_test.yaml, roman_data_challenge_rung_1.yaml); the recommended starting point is to copy one and edit it.
This page documents every top-level section and attribute. Units are given only where they are documented in the YAML comments or in the code. Pipeline scripts referenced below all live in mejiro/pipeline/.
Global attributes
These keys sit at the top level of the YAML file.
data_dir
Base directory under which all pipeline output is written. Required: if null, the --data_dir CLI argument must be supplied to override it. The per-pipeline output directory is <data_dir>/<pipeline_label>.
Used by PipelineHelper to set pipeline_dir for every pipeline script, and read directly in romanisim_pipeline.py to locate input synthetic images and write romanisim output.

pipeline_label
Name of the pipeline run. Used to construct the output directory (<data_dir>/<pipeline_label>/) and as a prefix for output filenames. When dev: True, _dev is appended.
Used in every pipeline script via PipelineHelper. In _01a_generate_galaxy_tables.py and _01b_run_survey_simulation.py the literal value "all" triggers loading of every supported instrument’s speclite filters.

psf_cache_dir
Directory where cached STPSF PSFs are stored as .npy files. May be a path relative to data_dir, an absolute path, or null. When null, PipelineHelper defaults to mejiro/data/psfs/<instrument>.
Used in _00_cache_psfs.py as the write target, and in _01b_run_survey_simulation.py, _04_create_synthetic_images.py, _04_create_synthetic_images_interpol.py, _04_jax_create_synthetic_images.py, calculate_snrs.py, and _06_h5_export.py as the read target when constructing kwargs_psf.

dev
Development-mode flag. When True, _dev is appended to pipeline_label so dev runs do not overwrite production output.
Used by PipelineHelper and explicitly by romanisim_pipeline.py to set the pipeline directory.

nice
Process nice value applied to every pipeline worker. Defaults to 0 if absent.
Set via os.nice(...) in PipelineHelper.__init__.

show_progress_bar
Boolean. When True, tqdm bars are shown during long inner loops (e.g., the per-run candidate loops in _01b_run_survey_simulation.py).
Used in _01b_run_survey_simulation.py to toggle the per-candidate progress bars.

suppress_warnings
Boolean. When True, warnings.filterwarnings("ignore") is installed in each worker process.
Used in _01a_generate_galaxy_tables.py and _01b_run_survey_simulation.py inside the worker functions; also set globally in PipelineHelper to suppress UserWarning.

logging_level
String passed to logging.basicConfig (e.g., INFO, WARNING, DEBUG). Defaults to INFO if absent.
Set in PipelineHelper.__init__.

limit
Maximum number of systems each script should process, or null for no limit.
Used in _01b_run_survey_simulation.py to cap the number of detectable lenses per run, in _02_build_lens_list.py to short-circuit the lens-conversion loop, in _03_generate_subhalos.py, _04_create_synthetic_images.py, _04_create_synthetic_images_interpol.py, _04_jax_create_synthetic_images.py, _05_create_exposures.py, and calculate_snrs.py to subsample input pickles (sequentially with --sequential, otherwise via np.random.choice), and in romanisim_pipeline.py to subsample lens IDs per SCA.

seed
Integer global random seed. Defaults to 42 when accessed via config.get('seed', 42).
Used in _01a_generate_galaxy_tables.py to seed the per-table draw via hash((seed, 'table', table_index)), in _01b_run_survey_simulation.py for the per-run draw via hash((seed, run)), in _03_generate_subhalos.py to seed the substructure assignment mask, and in romanisim_pipeline.py to derive per-batch RNGs (batch_seed = seed + sca_num * 10000 + band_idx * 1000 + batch_idx). Typically also referenced from imaging.engine_params.rng_seed via a YAML anchor.
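
As a rough illustration of how these keys combine, the sketch below resolves the per-pipeline output directory and derives a per-batch seed. The helper names resolve_pipeline_dir and batch_seed are hypothetical and the config values are made up; only the <data_dir>/<pipeline_label> layout, the _dev suffix, the default seed of 42, and the batch-seed formula are taken from the descriptions above. The real logic lives in PipelineHelper and romanisim_pipeline.py and may differ in detail.

```python
from pathlib import Path


def resolve_pipeline_dir(config, cli_data_dir=None):
    """Hypothetical helper: build <data_dir>/<pipeline_label>[_dev]."""
    # A --data_dir CLI value overrides the YAML value.
    data_dir = cli_data_dir if cli_data_dir is not None else config.get("data_dir")
    if data_dir is None:
        raise ValueError("data_dir is null and no --data_dir was supplied")
    label = config["pipeline_label"]
    if config.get("dev", False):
        label += "_dev"  # keep dev runs away from production output
    return Path(data_dir) / label


def batch_seed(seed, sca_num, band_idx, batch_idx):
    """Per-batch seed formula quoted in the seed entry."""
    return seed + sca_num * 10000 + band_idx * 1000 + batch_idx


# Illustrative values only.
config = {"data_dir": "/data/mejiro", "pipeline_label": "roman_test", "dev": True}
pipeline_dir = resolve_pipeline_dir(config)
example_seed = batch_seed(config.get("seed", 42), sca_num=3, band_idx=1, batch_idx=7)
```

With these inputs the directory name ends in roman_test_dev and the derived seed is 42 + 30000 + 1000 + 7.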
cores
Per-script worker counts for ProcessPoolExecutor. Each key maps the script name to its worker count; PipelineHelper.calculate_process_count reads cores.script_<script_name>.
script_00
Workers for _00_cache_psfs.py.

script_01a
Workers for _01a_generate_galaxy_tables.py.

script_01b
Workers for _01b_run_survey_simulation.py.

script_02
Workers for _02_build_lens_list.py (single-process in current scripts; included for completeness).

script_03
Workers for _03_generate_subhalos.py.

script_04
Workers for _04_create_synthetic_images.py (and its interpol/JAX variants).

script_05
Workers for _05_create_exposures.py.

script_05_romanisim
Workers for romanisim_pipeline.py. Read directly as config['cores']['script_05_romanisim']; the per-worker thread count is derived as max(2, 64 // num_workers).

script_snr
Workers for calculate_snrs.py.
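
The lookup convention and the romanisim thread formula can be sketched as follows. The function names are hypothetical and the worker counts are illustrative; the cores.script_<script_name> key pattern and the max(2, 64 // num_workers) expression are the only parts taken from the text above.

```python
def worker_count(config, script_name):
    """Hypothetical lookup mirroring the cores.script_<script_name> convention."""
    return config["cores"][f"script_{script_name}"]


def romanisim_threads_per_worker(num_workers):
    """Thread count per worker, as quoted in the script_05_romanisim entry."""
    return max(2, 64 // num_workers)


# Illustrative worker counts.
config = {"cores": {"script_04": 16, "script_05_romanisim": 8}}

workers_04 = worker_count(config, "04")
threads = romanisim_threads_per_worker(worker_count(config, "05_romanisim"))
```

Because of the floor of 2, requesting 64 workers still yields 2 threads per worker, so the product of workers and threads can exceed 64.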
jaxtronomy
Controls JAX-based acceleration for ray-shooting and image rendering.
use_jax
Boolean. When True, lens objects are built with use_jax=True so that jaxtronomy is used in place of lenstronomy for ray-shooting.
Used in _01b_run_survey_simulation.py, _02_build_lens_list.py, _03_generate_subhalos.py, _04_create_synthetic_images.py, _04_create_synthetic_images_interpol.py, and _04_jax_create_synthetic_images.py. In _04_create_synthetic_images.py it also gates whether JAX_PLATFORM_NAME is exported into the worker environment.

jax_platform
String (cpu or gpu) exported as JAX_PLATFORM_NAME before JAX is imported. Defaults to cpu.
Used in _01b_run_survey_simulation.py, _04_create_synthetic_images.py, _04_create_synthetic_images_interpol.py, and _04_jax_create_synthetic_images.py.

parallel_systems
(Optional; only consumed by _04_jax_create_synthetic_images.py.) Boolean. When True, systems are bucketed by lens_model_list signature so that the first system in a bucket pays the JIT-warmup cost and the rest reuse the cached compilation. Defaults to False.

batch_size
(Optional; only consumed by _04_jax_create_synthetic_images.py.) Integer cap on bucket size when parallel_systems is enabled, useful as a memory ceiling on GPU. Defaults to 8.
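
The bucketing idea behind parallel_systems and batch_size can be sketched roughly as below. This is an assumption about the general technique, not the code in _04_jax_create_synthetic_images.py: the function bucket_systems, the (name, lens_model_list) tuples, and the choice to split oversized buckets into consecutive chunks are all hypothetical.

```python
from collections import defaultdict


def bucket_systems(systems, batch_size=8):
    """Group systems by lens_model_list signature, then cap each batch.

    `systems` is a list of (name, lens_model_list) pairs (hypothetical layout).
    """
    buckets = defaultdict(list)
    for name, lens_model_list in systems:
        # Systems sharing a profile list can share one compiled function.
        buckets[tuple(lens_model_list)].append(name)

    batches = []
    for signature, names in buckets.items():
        # Split each bucket so no batch exceeds batch_size.
        for start in range(0, len(names), batch_size):
            batches.append((signature, names[start:start + batch_size]))
    return batches


systems = [
    ("lens_0", ["SIE", "SHEAR"]),
    ("lens_1", ["EPL", "SHEAR"]),
    ("lens_2", ["SIE", "SHEAR"]),
    ("lens_3", ["SIE", "SHEAR"]),
]
batches = bucket_systems(systems, batch_size=2)
```

Here the three SIE+SHEAR systems are split into batches of two and one, and the single EPL+SHEAR system forms its own batch.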
instrument
Single string identifying the instrument: roman, jwst, or hwo. Lower-cased and validated against the calling script’s SUPPORTED_INSTRUMENTS by PipelineHelper. Drives loading of the appropriate mejiro.instruments class (Roman, JWST, or HWO).
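
A minimal sketch of the lower-case-then-validate step, assuming a per-script list of supported instruments. The function name validate_instrument and the example list are hypothetical; PipelineHelper's actual error handling may differ.

```python
def validate_instrument(name, supported_instruments):
    """Lower-case the configured instrument and check it against a script's list."""
    instrument = name.lower()
    if instrument not in supported_instruments:
        raise ValueError(
            f"Instrument '{instrument}' is not supported; "
            f"expected one of {supported_instruments}"
        )
    return instrument


# Illustrative per-script list; each script defines its own SUPPORTED_INSTRUMENTS.
SUPPORTED_INSTRUMENTS = ["roman", "hwo"]
instrument = validate_instrument("Roman", SUPPORTED_INSTRUMENTS)
```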
survey
Parameters for the survey simulation (population draw, deflector/source cuts, source catalog wiring).
runs
Number of independent simulation runs. Each run gets a unique random seed and processes one galaxy table.
Stored as pipeline.runs by PipelineHelper; used in _01b_run_survey_simulation.py to build the per-run task list (round-robining detectors and pre-computed galaxy tables).

num_galaxy_tables
Number of independent galaxy-population tables generated by _01a_generate_galaxy_tables.py. Higher values give more intrinsic diversity but cost more compute. Defaults to pipeline.runs when absent.
Used in _01a_generate_galaxy_tables.py to build the task list of tables to generate.

speed_factor
Integer >= 1 passed to slsim’s LensPop.draw_population to speed up population draws at the cost of completeness. Defaults to 1.
Used in _01b_run_survey_simulation.py for both the total-population and detectable-population draws.

area
Survey area in deg2 (the unit is asserted in _01b_run_survey_simulation.py via Quantity(value=area, unit='deg2')). Must match the fsky value of the chosen SkyPy config.
Used in _01a_generate_galaxy_tables.py to build sky_area for the SkyPy and SLHammocks pipelines, and in _01b_run_survey_simulation.py for the same purpose plus the per-square-degree lens density logged after the survey.

skypy_config
Name of the SkyPy configuration file (without extension) under mejiro/data/skypy/<skypy_config>/. For Roman, _01a_generate_galaxy_tables.py appends the lower-cased SCA string to find the per-detector YAML.
Used in _01a_generate_galaxy_tables.py to locate the SkyPy config that drives the galaxy-population draw.

write_to_csv
Boolean. When True, _01b_run_survey_simulation.py exports per-run population tables (total_pop_<run_id>.csv and detectable_pop_<run_id>.csv) via slsim_util.write_lens_population_to_csv.

total_population
Boolean. When True, _01b_run_survey_simulation.py additionally draws the full (pre-cut) lens population, computes SNRs for every candidate, and (if write_to_csv is also True) writes the result to CSV.

use_real_sources
Boolean. When True, _01b_run_survey_simulation.py constructs the source population with extended_source_type='catalog_source' and forwards catalog_source_kwargs to slsim’s sources.Galaxies.

catalog_source_kwargs
Dict forwarded under the key extended_source_kwargs to slsim.Sources.Galaxies (and thence to slsim.Lenses.lens_pop.LensPop). See the upstream slsim documentation for the accepted keys (the example configs use catalog_path, catalog_type, sersic_fallback, max_scale).
Used in _01b_run_survey_simulation.py only when use_real_sources is True.

use_slhammocks_pipeline
Boolean. When True, _01a_generate_galaxy_tables.py also runs SLHammocksPipeline to draw a dark-matter halo catalog (halo_galaxies), and _01b_run_survey_simulation.py reconstructs the deflector population from deflectors.CompoundLensHalosGalaxies rather than deflectors.AllLensGalaxies.

slhammocks_pipeline_kwargs
Dict of keyword arguments forwarded to slsim.Pipelines.sl_hammocks_pipeline.SLHammocksPipeline. See the upstream slsim documentation for accepted keys.
Used in _01a_generate_galaxy_tables.py. Note that skypy_config inside this dict is rewritten in-place to an absolute path under mejiro/data/skypy/slhammocks/ (for Roman, the per-detector variant is selected).

detectors
List of detector IDs (for Roman, SCAs 1-18). Stored as pipeline.detectors by PipelineHelper and used to round-robin runs across detectors in _01a_generate_galaxy_tables.py and _01b_run_survey_simulation.py. Typically referenced via a YAML anchor (&detectors) so the same list is reused under psf.detectors.

bands
List of photometric bands for which SkyPy computes magnitudes. These magnitudes are stored on the resulting strong-lens objects (physical_params).
Used in _01b_run_survey_simulation.py for SNR-candidate construction and CSV export, and in _02_build_lens_list.py when calling GalaxyGalaxy.from_slsim(..., bands=bands).

remap_bands
(Optional; only present in the rung-1 challenge configs.) Dict of {destination_band: source_band} pairs that override which catalog cutout backs each band in source_images. Self-mappings are no-ops; omit a band to leave its default in place.
Used in _02_build_lens_list.py: when present, slsim_util.remap_source_images is called on each constructed lens. Defaults to None (no remapping).

deflector_cut_band, deflector_cut_band_max
Band and maximum magnitude used to cut the deflector population before lens drawing. Passed to slsim as kwargs_deflector_cut={'band': ..., 'band_max': ...}.
Used in _01b_run_survey_simulation.py.

deflector_z_min, deflector_z_max
Minimum and maximum deflector redshift. Used in _01a_generate_galaxy_tables.py for the SLHammocks pipeline and in _01b_run_survey_simulation.py for kwargs_deflector_cut.

source_cut_band, source_cut_band_max
Band and maximum magnitude used to cut the source population. Passed to slsim as kwargs_source_cut={'band': ..., 'band_max': ...}.
Used in _01b_run_survey_simulation.py.

source_z_min, source_z_max
Minimum and maximum source redshift. Used in _01b_run_survey_simulation.py for kwargs_source_cut.

min_image_separation, max_image_separation
Minimum and maximum image separation in arcseconds (units documented in the YAML inline comments). Combined into kwargs_lens_detectable_cut and passed to LensPop.draw_population in _01b_run_survey_simulation.py.

mag_arc_limit_band, mag_arc_limit
Band and maximum arc magnitude. Combined as {mag_arc_limit_band: mag_arc_limit} and added to kwargs_lens_detectable_cut in _01b_run_survey_simulation.py.

magnification
Minimum total magnification required of a candidate. Lenses with strong_lens.physical_params['magnification'] below this value are skipped before the (expensive) SNR check in _01b_run_survey_simulation.py.
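
To make the mapping from flat survey keys to the three cut dictionaries concrete, here is a rough sketch. The 'band' and 'band_max' keys and the {mag_arc_limit_band: mag_arc_limit} pairing come from the entries above; the 'z_min', 'z_max', 'min_image_separation', 'max_image_separation', and 'mag_arc_limit' key names are assumptions about what slsim expects, the function build_cut_kwargs is hypothetical, and the numeric values are made up.

```python
def build_cut_kwargs(survey):
    """Hypothetical helper: assemble the cut dictionaries from the survey section."""
    kwargs_deflector_cut = {
        "band": survey["deflector_cut_band"],
        "band_max": survey["deflector_cut_band_max"],
        "z_min": survey["deflector_z_min"],  # assumed key name
        "z_max": survey["deflector_z_max"],  # assumed key name
    }
    kwargs_source_cut = {
        "band": survey["source_cut_band"],
        "band_max": survey["source_cut_band_max"],
        "z_min": survey["source_z_min"],  # assumed key name
        "z_max": survey["source_z_max"],  # assumed key name
    }
    kwargs_lens_detectable_cut = {
        "min_image_separation": survey["min_image_separation"],  # assumed key name
        "max_image_separation": survey["max_image_separation"],  # assumed key name
        # Band and limit are paired into a single-entry dict.
        "mag_arc_limit": {survey["mag_arc_limit_band"]: survey["mag_arc_limit"]},
    }
    return kwargs_deflector_cut, kwargs_source_cut, kwargs_lens_detectable_cut


# Illustrative values only.
survey = {
    "deflector_cut_band": "F129", "deflector_cut_band_max": 24,
    "deflector_z_min": 0.01, "deflector_z_max": 2.0,
    "source_cut_band": "F129", "source_cut_band_max": 27,
    "source_z_min": 0.01, "source_z_max": 5.0,
    "min_image_separation": 0.2, "max_image_separation": 10.0,
    "mag_arc_limit_band": "F129", "mag_arc_limit": 25,
}
deflector_cut, source_cut, detectable_cut = build_cut_kwargs(survey)
```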
subhalos
Dark-matter substructure parameters. Consumed by _03_generate_subhalos.py.
fraction
Fraction of detectable systems (in [0, 1]) that receive a substructure realization. When < 1.0, _03_generate_subhalos.py seeds an np.random mask with seed so the choice of which systems get substructure is reproducible.

pyhalo_model
String name of the pyHalo preset model (e.g., CDM). Resolved via pyHalo.preset_models.preset_model_from_name and called with realization_kwargs.
Used in _03_generate_subhalos.py.

realization_kwargs
Dict forwarded as **realization_kwargs to the pyHalo preset-model constructor. See the upstream pyHalo documentation for the accepted keys. The example configs include log_mlow, log_mhigh, LOS_normalization, concentration_model_subhalos, concentration_model_fieldhalos, shmf_log_slope, and (Roman-only) r_tidal, sigma_sub (see Section 3.1 and Section 6.3 of Gilman et al. 2020).
Used in _03_generate_subhalos.py. If cone_opening_angle_arcsec is not present, _03_generate_subhalos.py injects it as lens.get_einstein_radius() * 3. Also written verbatim as per-system HDF5 attributes in _06_h5_export.py (each key carries the description "See pyHalo documentation").
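
The two behaviours described above, a reproducible substructure mask and the default cone opening angle, can be sketched as follows. This is a dependency-free stand-in: the pipeline uses np.random, whereas the sketch uses the standard-library random module, and the per-system Bernoulli draw is an assumption about how the mask is built. Both function names are hypothetical.

```python
import random


def substructure_mask(num_systems, fraction, seed=42):
    """Reproducible True/False mask; True means the system gets substructure.

    Assumption: each system is selected independently with probability `fraction`.
    """
    rng = random.Random(seed)  # stand-in for the pipeline's seeded np.random
    return [rng.random() < fraction for _ in range(num_systems)]


def with_default_cone_angle(realization_kwargs, einstein_radius):
    """Inject cone_opening_angle_arcsec = 3 * Einstein radius when it is absent."""
    kwargs = dict(realization_kwargs)  # leave the caller's dict untouched
    if "cone_opening_angle_arcsec" not in kwargs:
        kwargs["cone_opening_angle_arcsec"] = einstein_radius * 3
    return kwargs


mask = substructure_mask(num_systems=10, fraction=0.5, seed=42)
kwargs = with_default_cone_angle({"log_mlow": 6, "log_mhigh": 10}, einstein_radius=1.2)
```

Calling substructure_mask twice with the same seed returns the same mask, which is the property the seed is meant to guarantee.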
synthetic_image
Parameters for rendering idealized (PSF-convolved, noise-free) images. Each value here is forwarded to mejiro.synthetic_image.SyntheticImage.
bands
List of photometric bands for which to render synthetic images. May be a subset of survey.bands.
Used in _04_create_synthetic_images.py, _04_create_synthetic_images_interpol.py, _04_jax_create_synthetic_images.py, romanisim_pipeline.py, and _06_h5_export.py.

fov_arcsec
Field of view in arcseconds (documented in the SyntheticImage docstring). Forwarded to SyntheticImage(fov_arcsec=...).
Used in _04_create_synthetic_images.py, _04_create_synthetic_images_interpol.py, and _04_jax_create_synthetic_images.py.

supersampling_compute_mode
String forwarded to lenstronomy’s kwargs_numerics as compute_mode (e.g., adaptive).
Used in _04_create_synthetic_images.py, _04_create_synthetic_images_interpol.py, and _04_jax_create_synthetic_images.py.

supersampling_factor
Integer supersampling factor forwarded to lenstronomy’s kwargs_numerics. Typically referenced via a YAML anchor (&supersampling_factor) so psf.oversamples can include the same value.
Used in _04_create_synthetic_images.py, _04_create_synthetic_images_interpol.py, _04_jax_create_synthetic_images.py, and (for HDF5 metadata) _06_h5_export.py.

pieces
Boolean forwarded to SyntheticImage(pieces=...). When True, lens and source surface brightness are computed and stored separately.
Used in _04_create_synthetic_images.py, _04_create_synthetic_images_interpol.py, and _04_jax_create_synthetic_images.py.

serialization
String selecting the on-disk format for each SyntheticImage written by step 04. One of:
- full (default): pickle the entire SyntheticImage (including the embedded StrongLens and any pyhalo realization). Required by the galsim path (_05_create_exposures.py) and by analysis scripts that need the full lens model (e.g., projects/roman_data_challenge/substructure_snr_histogram.py).
- lightweight: write a compact .npz per (system, band) containing the image as float32 plus a JSON metadata blob with only what the romanisim path consumes downstream (redshifts, Einstein radius, per-band magnitudes, detector position, magnitude_zeropoint, etc.). Roughly 20× smaller per (system, band) than full; incompatible with the galsim path. Loaded transparently by mejiro.utils.util.load_synthetic_image, which auto-detects the extension and returns a mejiro.synthetic_image.LightweightSyntheticImage shim.
Used in _04_create_synthetic_images.py and _04_jax_create_synthetic_images.py (writers); in romanisim_pipeline.py, _06_h5_export_romanisim.py, projects/roman_data_challenge/rung_1.py, and calculate_snrs.py (readers, via the unified loader). _05_create_exposures.py raises at startup when serialization == 'lightweight' because the galsim engine requires the full SyntheticImage.
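
The startup guard described for _05_create_exposures.py might look roughly like the sketch below. The function names, the exception type, and the error messages are assumptions; only the two accepted values, the full default, and the rule that the galsim path rejects lightweight come from the text above.

```python
VALID_SERIALIZATIONS = ("full", "lightweight")


def get_serialization(config):
    """Read synthetic_image.serialization, defaulting to 'full'."""
    mode = config.get("synthetic_image", {}).get("serialization", "full")
    if mode not in VALID_SERIALIZATIONS:
        raise ValueError(f"Unknown serialization '{mode}'")
    return mode


def check_galsim_compatible(config):
    """Hypothetical guard: the galsim path needs the full SyntheticImage pickle."""
    if get_serialization(config) == "lightweight":
        raise ValueError(
            "serialization == 'lightweight' is incompatible with the galsim path; "
            "use 'full' or the romanisim path instead"
        )


# A config that omits the key falls back to 'full' and passes the guard.
check_galsim_compatible({"synthetic_image": {"bands": ["F129"]}})
default_mode = get_serialization({})
```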
exposure
(Roman only.) Romanisim observation metadata. Consumed by romanisim_pipeline.py.
ma_table_number
Integer Multi-Accumulation (MA) table number. Indexes romanisim.parameters.read_pattern to determine the read pattern (and hence the total exposure time, parameters.read_time * read_pattern[-1][-1]); also written to meta['exposure']['ma_table_number'] for the simulated observation.

date
Observation date as an ISO-8601 string (e.g., 2027-04-15T00:00:00). Converted to astropy.time.Time and assigned to meta['exposure']['start_time'].

coordinates.ra, coordinates.dec
Pointing right ascension and declination. romanisim_pipeline.py constructs SkyCoord(ra=ra * u.deg, dec=dec * u.deg), so the values are interpreted in degrees.
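
The exposure-time expression above is easy to misread, so here is a small worked sketch. The read_time value and the read pattern are illustrative placeholders, not values read from romanisim.parameters, and datetime.fromisoformat stands in for astropy.time.Time to keep the example dependency-free.

```python
from datetime import datetime


def total_exposure_time(read_time, read_pattern):
    """read_time multiplied by the index of the last read in the last resultant."""
    return read_time * read_pattern[-1][-1]


# Illustrative placeholders (not taken from romanisim.parameters).
read_time = 3.04  # seconds per read
read_pattern = [[1], [2, 3], [4, 5, 6, 7]]  # resultants, each a list of read indices

exposure_seconds = total_exposure_time(read_time, read_pattern)  # 3.04 * 7

# Stand-in for astropy.time.Time when parsing the ISO-8601 date string.
start_time = datetime.fromisoformat("2027-04-15T00:00:00")
```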
imaging
Parameters for the detector-effects step. Consumed by _05_create_exposures.py and calculate_snrs.py.
exposure_time
Exposure time in seconds (documented in the Exposure docstring). Typically referenced via a YAML anchor (&exposure_time) so snr.snr_exposure_time can reuse it.
Used in _05_create_exposures.py to build each Exposure.

engine
String selecting the detector-effects engine, e.g., galsim (see Engines for the available engines).
Used in _05_create_exposures.py and in calculate_snrs.py for the SNR-rebuild path.

engine_params
Dict of engine-specific parameters forwarded to the simulation engine. See mejiro.exposure.Exposure and the engine modules under mejiro.engines for the accepted keys per engine. For the GalSim Roman engine, the example configs include rng_seed, min_zodi_factor, and boolean toggles sky_background, detector_effects, poisson_noise, reciprocity_failure, dark_noise, nonlinearity, ipc, read_noise.
Used in _01b_run_survey_simulation.py for the SNR-detection Exposure, in _05_create_exposures.py for the production Exposure, and in calculate_snrs.py for the SNR-rebuild Exposure.
snr
Parameters for SNR-based detectability cuts (in _01b_run_survey_simulation.py) and for the standalone SNR calculation (calculate_snrs.py).
snr_band
Band used to render the SNR-evaluation SyntheticImage.
Used in _01b_run_survey_simulation.py.

snr_exposure_time
Exposure time in seconds for the SNR Exposure. Typically a YAML reference (*exposure_time) to imaging.exposure_time.
Used in _01b_run_survey_simulation.py and calculate_snrs.py.

snr_fov_arcsec
Field of view in arcseconds for the SNR SyntheticImage.
Used in _01b_run_survey_simulation.py and calculate_snrs.py.

snr_supersampling_compute_mode
compute_mode for kwargs_numerics of the SNR SyntheticImage.
Used in _01b_run_survey_simulation.py and calculate_snrs.py.

snr_supersampling_factor
supersampling_factor for kwargs_numerics of the SNR SyntheticImage. Typically referenced via a YAML anchor (&snr_supersampling_factor) so psf.oversamples can include the same value.
Used in _01b_run_survey_simulation.py and calculate_snrs.py.

snr_threshold
Minimum total-system SNR for a candidate to be considered detectable. Applied in _01b_run_survey_simulation.py after the per-pixel SNR calculation.

snr_per_pixel_threshold
Per-pixel SNR threshold passed to snr_calculation.get_snr and Exposure.get_snr.
Used in _01b_run_survey_simulation.py, _06_h5_export.py, and calculate_snrs.py.

snr_detector_position
(Optional.) Tuple [x, y] giving the detector position used when building the SNR PSF. Defaults to (2554, 2554) (the detector center) when absent.
Used in _01b_run_survey_simulation.py.
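
The ordering of the cheap magnification cut (survey.magnification) before the expensive SNR check can be sketched as below. The function is_detectable is hypothetical, the SNR computation is replaced by a stub, and treating the threshold as inclusive (>=) is an assumption; the default detector position is the one quoted above.

```python
def is_detectable(magnification, compute_snr, min_magnification, snr_threshold):
    """Two-stage check: skip the SNR computation when magnification is too low."""
    if magnification < min_magnification:
        return False  # cheap cut first; compute_snr is never called
    # Assumption: a candidate exactly at the threshold counts as detectable.
    return compute_snr() >= snr_threshold


snr_calls = []


def fake_snr():
    """Stub standing in for the expensive SyntheticImage + Exposure SNR pipeline."""
    snr_calls.append(1)
    return 25.0


rejected = is_detectable(1.5, fake_snr, min_magnification=2.0, snr_threshold=20.0)
accepted = is_detectable(3.0, fake_snr, min_magnification=2.0, snr_threshold=20.0)

# Optional key with the documented default.
snr_config = {"snr_band": "F129", "snr_threshold": 20.0}
detector_position = tuple(snr_config.get("snr_detector_position", (2554, 2554)))
```

In this run the stub is invoked only once, for the candidate that passes the magnification cut.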
psf
PSF cache and generation parameters. Consumed primarily by _00_cache_psfs.py (Roman) and read elsewhere when constructing kwargs_psf.
bands
List of bands for which PSFs are pre-computed.
Used in _00_cache_psfs.py.

oversamples
List of integer oversampling factors for which PSFs are pre-computed. Typically the YAML references [*snr_supersampling_factor, *supersampling_factor] so both the SNR and synthetic-image paths find cached PSFs.
Used in _00_cache_psfs.py.

num_pixes
List of PSF array sizes in pixels. The first element is used by the pipeline when constructing kwargs_psf (the additional elements are still pre-computed by _00_cache_psfs.py so that ad-hoc analysis scripts can request other sizes from the cache).
Used in _00_cache_psfs.py to enumerate PSFs to generate, and in _01b_run_survey_simulation.py, _04_create_synthetic_images.py, _04_create_synthetic_images_interpol.py, _04_jax_create_synthetic_images.py, calculate_snrs.py, and _06_h5_export.py as num_pixes[0] for the active PSF.

detectors
List of detector IDs for which PSFs are pre-computed. Typically references *detectors so it tracks survey.detectors.
Used in _00_cache_psfs.py and _06_h5_export.py.

divide_up_detector
Integer N controlling how each detector is divided into an N x N grid of PSF-evaluation positions (e.g., 5 -> 25 positions per detector). Roman-specific; psf_config.get('divide_up_detector') is None for HWO.
Used in _00_cache_psfs.py (via roman_util.divide_up_sca), in _04_create_synthetic_images.py, _04_create_synthetic_images_interpol.py, and _04_jax_create_synthetic_images.py to pick a random detector position per image, in _06_h5_export.py to enumerate positions for the PSF HDF5 dataset, and in romanisim_pipeline.py to define the PSF buckets that group tiles within each 4088x4088 detector image (asserted to divide both GRID_SIDE and 4088).
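
To give a feel for how quickly the PSF cache grows, the sketch below enumerates every (band, oversample, num_pix, detector, position) combination for an illustrative psf section. Placing positions at cell centers is an assumption; the actual coordinates come from roman_util.divide_up_sca and may differ. Both function names and all config values are hypothetical.

```python
from itertools import product


def grid_positions(n, detector_side=4088):
    """Assumed layout: centers of an n x n grid of equal cells on the detector."""
    cell = detector_side // n
    centers = [i * cell + cell // 2 for i in range(n)]
    return list(product(centers, centers))


def enumerate_psfs(psf):
    """Every (band, oversample, num_pix, detector, position) combination to cache."""
    positions = grid_positions(psf["divide_up_detector"])
    return list(product(
        psf["bands"], psf["oversamples"], psf["num_pixes"], psf["detectors"], positions
    ))


# Illustrative values only.
psf = {
    "bands": ["F129", "F184"],
    "oversamples": [1, 5],
    "num_pixes": [101],
    "detectors": [1, 2, 3],
    "divide_up_detector": 4,
}
combos = enumerate_psfs(psf)  # 2 * 2 * 1 * 3 * 16 combinations
```

Since the count scales with the square of divide_up_detector, doubling N quadruples the number of PSFs to generate.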
dataset
Output-dataset options. Consumed by _06_h5_export.py.
version
Dataset version string. Embedded in the HDF5 filename as <pipeline_label>_v_<version>.h5 (with . replaced by _) and written to the file-level attribute dataset_version.

labeled
(Challenge-only attribute, present in the roman_data_challenge_* configs.) Boolean indicating whether the exported dataset includes ground-truth labels. Read by downstream consumers of the Roman Data Challenge dataset; not consumed by the pipeline scripts in mejiro/pipeline/ themselves.

include_psfs
Boolean. When True, _06_h5_export.py adds a psfs group to the HDF5 file containing cached PSFs for every (detector, position, band) combination.

include_synthetic_images
Boolean. When True, _06_h5_export.py writes the band-matched synthetic-image arrays alongside each Exposure dataset.
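
The filename convention for the exported dataset can be sketched as follows. The function name h5_filename is hypothetical, and applying the dot-to-underscore replacement to the version string only is an assumption based on the description of the version key.

```python
def h5_filename(pipeline_label, version):
    """Build <pipeline_label>_v_<version>.h5 with dots in the version replaced by underscores."""
    safe_version = str(version).replace(".", "_")
    return f"{pipeline_label}_v_{safe_version}.h5"


filename = h5_filename("roman_test", "1.2")
```

Quoting the version in the YAML file (e.g., "1.0" rather than 1.0) keeps it a string, so trailing zeros are not lost when the value is parsed.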