Tutorial#

Here is a guide to running Prospector fits from the command line using parameter files, and working with the output. This is a generalization of the techniques demonstrated in the quickstart, with more detailed descriptions of how each of the ingredients works.

We assume you have installed Prospector and all its dependencies as laid out in the docs. The next step is to make a temporary work directory, <workdir>, and copy the demo files into it:

cd <workdir>
cp <codedir>/demo/demo_* .

We now have a parameter file or two, and some data. Take a look at the demo_photometry.dat file in an editor; you'll see it is a simple ASCII file with a few rows and several columns. Each row is a different galaxy; each column is a different piece of information about that galaxy.

This is just an example. In practice Prospector can work with a wide variety of data types.

The parameter file#

Open up demo_params.py in an editor, preferably one with syntax highlighting. You’ll see that it’s a python file. It includes some imports, a number of methods that build the ingredients for the fitting, and then an executable portion.

Executable Script

The executable portion of the parameter file, which comes after the if __name__ == "__main__" line, is run when the parameter file is called. Here the possible command line arguments and their default values are defined, including any custom arguments that you might add. In this example we have added several command line arguments that control how the data is read and how the model is constructed. The supplied command line arguments are then parsed and placed in a dictionary. This dictionary is passed to all the ingredient-building methods (described below), which return the data dictionary and the necessary model objects. The data dictionary and model objects are passed to a function that runs the prospector fit (prospect.fitting.fit_model()). Finally, the fit results are written to an output file.
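As a schematic illustration, the argument-parsing step looks roughly like the sketch below. The argument names and defaults here are illustrative, not necessarily those in demo_params.py, and the fitting calls are only indicated in comments:

```python
import argparse

# Illustrative sketch of the executable portion of a parameter file;
# see demo_params.py for the real argument list and defaults.
parser = argparse.ArgumentParser()
parser.add_argument("--objid", type=int, default=0,
                    help="zero-based row of the photometry table to fit")
parser.add_argument("--outfile", type=str, default="demo_output")
parser.add_argument("--optimize", action="store_true")
parser.add_argument("--emcee", action="store_true")
parser.add_argument("--dynesty", action="store_true")

# Parse an example command line (a real script parses sys.argv)
args = parser.parse_args(["--objid", "0", "--emcee"])
run_params = vars(args)  # this dictionary is passed to every build_* function

# The real script would then do something like:
#   obs = build_obs(**run_params)
#   model = build_model(**run_params)
#   sps = build_sps(**run_params)
#   output = fit_model(obs, model, sps, **run_params)
print(run_params)
```

Because every build function receives this same dictionary, any flag you add to the parser is immediately available to all of them.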

Building the fit ingredients: build_model

Several methods must be defined in the parameter file to build the ingredients for the fit. The purpose of these functions and their required output are described here. You will want to modify some of these for your specific model and data. Note that each of these functions will be passed a dictionary of command line arguments. These command line arguments, including any you add to the command line parser in the executable portion of the script, can therefore be used to control the behaviour of the ingredient building functions. For example, a custom command line argument can be used to control the type of model that is fit, or how or from where the data is loaded.

First, the build_model() function is where the model that we will fit will be constructed. The specific model that you choose to construct depends on your data and your scientific question.

We have to specify a dictionary or list of model parameter specifications (see Models). Each specification is a dictionary that describes a single parameter. We can build the model by adjusting predefined sets of model parameter specifications, stored in the models.templates.TemplateLibrary dictionary-like object. In this example we choose the "parametric_sfh" set, which has the parameters necessary for a basic delay-tau SFH fit with simple attenuation by a dust screen. This parameter set can be inspected in any of the following ways:

from prospect.models.templates import TemplateLibrary, describe
# Show basic description of all pre-defined parameter sets
TemplateLibrary.show_contents()
# Method 1: print the whole dictionary of dictionaries
model_params = TemplateLibrary["parametric_sfh"]
print(model_params)
# Method 2: show a prettier summary of the free and fixed parameters
print(describe(model_params))

You’ll see that this model has 5 free parameters. Any parameter with "isfree": True in its specification will be varied during the fit, and any free parameter must have an associated prior, visible as e.g. model_params["mass"]["prior"]. You may wish to change the default priors for your particular science case, using the prior objects in the models.priors module. An example of adjusting the priors for several parameters is given in the build_model() method in demo_params.py.

Other parameters have their value set to the value of the "init" key but do not vary during the fit. They can be made to vary by setting "isfree": True and specifying a prior. Parameters not listed here will be set to their default values; typically this means default values in the fsps.StellarPopulation object (see python-fsps for details). Once you get a set of parameters from the TemplateLibrary you can modify or add parameter specifications. Since model_params is a dictionary (of dictionaries), you can update it with other parameter set dictionaries from the TemplateLibrary.
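To make the specification structure concrete, here is a sketch of what individual parameter specifications look like. The "prior" values below are plain dicts standing in for real objects from prospect.models.priors (e.g. priors.LogUniform(mini=..., maxi=...)), so that the pattern is visible without the package installed:

```python
# Stand-in parameter specifications; in a real parameter file each "prior"
# value would be an object from prospect.models.priors, not a plain dict.
model_params = {
    "mass": {"N": 1, "isfree": True, "init": 1e10,
             "prior": {"name": "LogUniform", "mini": 1e8, "maxi": 1e12}},
    "tage": {"N": 1, "isfree": False, "init": 5.0},  # fixed at its "init" value
}

# Free a fixed parameter by flipping "isfree" and attaching a prior
model_params["tage"]["isfree"] = True
model_params["tage"]["prior"] = {"name": "TopHat", "mini": 0.1, "maxi": 13.8}
```

The same dictionary surgery works on the specifications returned by the TemplateLibrary.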

Finally, the build_model() function takes the model_params dictionary or list that you build and uses it to instantiate a SedModel object.

from prospect.models import SedModel
model_params = TemplateLibrary["parametric_sfh"]
# Turn on nebular emission and add associated parameters
model_params.update(TemplateLibrary["nebular"])
model_params["gas_logu"]["isfree"] = True
model = SedModel(model_params)
print(model)

If you wanted to change the specification of the model using custom command line arguments, you could do it in build_model() by allowing this function to take keyword arguments with the same name as the custom command line argument. This can be useful for example to set the initial value of the redshift "zred" on an object-by-object basis. Such an example is shown in demo_params.py, which also uses command line arguments to control whether nebular and/or dust emission parameters are added to the model.
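Schematically, that plumbing can look like the sketch below. The keyword names zred and add_neb are illustrative, and the stand-in dictionaries take the place of TemplateLibrary entries (see demo_params.py for the real version):

```python
import copy

# Stand-ins for TemplateLibrary["parametric_sfh"] and TemplateLibrary["nebular"]
base_params = {"zred": {"N": 1, "isfree": False, "init": 0.0}}
neb_params = {"gas_logu": {"N": 1, "isfree": False, "init": -2.0}}

def build_model(zred=0.1, add_neb=False, **extras):
    # Keyword names match custom command line arguments, so
    # `--zred=0.05 --add_neb` on the command line lands here.
    model_params = copy.deepcopy(base_params)
    model_params["zred"]["init"] = zred        # fix the redshift per object
    if add_neb:
        model_params.update(copy.deepcopy(neb_params))
    # A real build_model() would finish with:
    #   from prospect.models import SedModel
    #   return SedModel(model_params)
    return model_params

model_params = build_model(zred=0.05, add_neb=True)
```

The **extras catch-all lets the function ignore the many other entries in the run_params dictionary that it does not use.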

Building the fit ingredients: build_obs

The next thing to look at is the build_obs() function. This is where you take the data from whatever format you have and put it into the dictionary format required by Prospector for a single object. This means you will have to modify this function heavily for your own use. But it also means you can use your existing data formats.

Right now, the build_obs() function just reads ascii data from a file, picks out a row (corresponding to the photometry of a single galaxy), and then makes a dictionary using data in that row. You’ll note that both the datafile name and the object number are keyword arguments to this function. That means they can be set at execution time on the command line, by also including those variables in the run_params dictionary. We’ll see an example later.

When you write your own build_obs() function, you can add all sorts of keyword arguments that control its output (for example, an object name or ID number that can be used to choose or find a single object in your data file). You can also import helper functions and modules. These can be either things like astropy, h5py, and sqlite or your own project specific modules and functions. As long as the output dictionary is in the right format (see dataformat.rst), the body of this function can do anything.
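As a minimal sketch, a photometry-only build_obs() might look like the following. The magnitudes are invented placeholders, and the sedpy filter loading that a real function would use is left as a comment so the sketch stands alone:

```python
import numpy as np

def build_obs(objid=0, **extras):
    # A real function would load filter transmission curves, e.g.:
    #   from sedpy.observate import load_filters
    #   filters = load_filters(["sdss_g0", "sdss_r0", "sdss_i0"])
    mags = np.array([18.2, 17.6, 17.3])    # placeholder AB magnitudes
    magerr = np.array([0.05, 0.05, 0.07])  # placeholder uncertainties

    obs = {}
    obs["filters"] = None                  # would be the list of sedpy Filters
    obs["maggies"] = 10**(-0.4 * mags)     # fluxes in maggies (i.e. Jy / 3631)
    obs["maggies_unc"] = obs["maggies"] * magerr / 1.086  # mag err -> flux err
    obs["phot_mask"] = np.isfinite(obs["maggies"])  # True = fit this band
    # No spectrum for this object
    obs["wavelength"] = None
    obs["spectrum"] = None
    obs["unc"] = None
    return obs

obs = build_obs(objid=0)
```

The required and optional keys of the obs dictionary are described in dataformat.rst.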

Building the fit ingredients: the rest

Ok, now we go to the build_sps() function. This one is pretty straightforward: it simply instantiates our sources.CSPSpecBasis object. For nonparametric fits one would use the sources.FastStepBasis object instead. These objects hold all the spectral libraries and produce an SED given a set of parameters. After that is build_noise(), which is for building a more complex noise model; ignore that for now.

Running a fit#

Two fitting packages can be used with Prospector. The first is emcee, which implements ensemble MCMC sampling, and the second is dynesty, which implements dynamic nested sampling. It is also possible to perform optimization. If emcee is used, the result of the optimization will be used to initialize the ensemble of walkers.

The choice of fitting algorithm is made with command line flags (--optimize, --emcee, and --dynesty). If no flags are set, the model and data objects will be generated and stored in the output file, but no fitting will take place. To run the fit on object number 0 using emcee after an initial optimization, we would do the following at the command line

python demo_params.py --objid=0 --emcee --optimize \
--outfile=demo_obj0_emcee

If we wanted to change something about the MCMC parameters, or fit a different object, we could also do that at the command line

python demo_params.py --objid=1 --emcee --optimize \
--outfile=demo_obj1_emcee --nwalkers=32 --niter=1024

And if we want to use nested sampling with dynesty we would do the following

python demo_params.py --objid=0  --dynesty \
--outfile=demo_obj0_dynesty

Finally, it is sometimes useful to run the script from the interpreter to do some checks. This is best done with IPython, the enhanced interactive Python shell.

ipython
In [1]: %run demo_params.py --objid=0 --debug=True

You can then inspect the obsdat dictionary, the model object, and the run_params dictionary to make sure everything is working fine.

To see the full list of available command-line options, you can run the following

python demo_params.py --help

Working with the output#

After the fit is completed we should have a file with a name like demo_obj0_<fitter>_<timestamp>_mcmc.h5. This is an HDF5 file containing the sampling results and various configuration data, as well as the observational data that was fit. By setting run_params["output_pickles"] = True you can also output versions of this information in the less portable pickle format. We will read the HDF5 file with python and make some plots using utilities in Prospector.

To read the data back in from the output files that we’ve generated, use methods in prospect.io.read_results.

import prospect.io.read_results as reader
res, obs, model = reader.results_from("demo_obj0_<fitter>_<timestamp>_mcmc.h5")

The res object is a dictionary containing various useful results. You can look at res.keys() to see a list of what it contains. The obs object is just the obs dictionary that was used in the fitting. The model object is the model object that was used in the fitting.

Diagnostic plots

There are also some methods in this module for basic diagnostic plots. These let you examine the traces (i.e. the evolution of parameter values with MC iteration) and the posterior PDFs of the parameters. Note that the subcorner method requires the corner package to be installed.

# Trace plots
tfig = reader.traceplot(res)
# Corner figure of posterior PDFs
cfig = reader.subcorner(res)

Working with samples

If you want to get the maximum a posteriori sample, or percentiles of the posterior PDF, that can be done as follows (note that for dynesty the weights of each posterior sample must be taken into account when calculating quantiles):

import numpy as np

# Maximum posterior probability sample
imax = np.argmax(res['lnprobability'])
csz = res["chain"].shape
if res["chain"].ndim > 2:
    # emcee
    i, j = np.unravel_index(imax, res['lnprobability'].shape)
    theta_max = res['chain'][i, j, :].copy()
    flatchain = res["chain"].reshape(csz[0] * csz[1], csz[2])
else:
    # dynesty
    theta_max = res['chain'][imax, :].copy()
    flatchain = res["chain"]

# 16th, 50th, and 84th percentiles of the posterior
from prospect.plotting.corner import quantile
weights = res.get("weights", None)
post_pcts = quantile(flatchain.T, q=[0.16, 0.50, 0.84], weights=weights)

Stored “best-fit” model

Further, the prediction of the data for the MAP posterior sample may already be stored for you.

# Plot the stored maximum ln-probability sample
import matplotlib.pyplot as pl

best = res["bestfit"]
a = 1 + model.params["zred"]  # scale restframe wavelengths to the observed frame
pl.plot(a * best["restframe_wavelengths"], best['spectrum'], label="MAP spectrum")
if obs['filters'] is not None:
    pwave = [f.wave_effective for f in obs["filters"]]
    pl.plot(pwave, best['photometry'], 'o', label="MAP photometry")
pl.title(str(best["parameter"]))  # the MAP parameter vector
pl.legend()

This stored best-fit information is only available if the sps object was passed to write_hdf5() after the fit was run. If it isn’t available, you can regenerate the model prediction for the highest-probability sample using the approach below.

Regenerating Model predictions

If necessary, one can regenerate models at any position in the posterior chain. This requires that we have the sps object used in the fitting to generate models, which we can regenerate using the read_results.get_sps() method.

# We need the correct sps object to generate models
sps = reader.get_sps(res)

Now we will choose a specific parameter value from the chain and plot what the observations and the model look like, as well as the uncertainty normalized residual. For emcee results we will use the last iteration of the first walker, while for dynesty results we will just use the last sample in the chain.

# Choose the walker and iteration number by hand.
walker, iteration = 0, -1
if res["chain"].ndim > 2:
    # if you used emcee for the inference
    theta = res['chain'][walker, iteration, :]
else:
    # if you used dynesty
    theta = res['chain'][iteration, :]

# Or get a fair sample from the posterior
from prospect.plotting.utils import sample_posterior
theta = sample_posterior(res["chain"], weights=res.get("weights", None), nsample=1)[0,:]

# Get the modeled spectra and photometry.
# These have the same shape as the obs['spectrum'] and obs['maggies'] arrays.
spec, phot, mfrac = model.predict(theta, obs=res['obs'], sps=sps)
# mfrac is the ratio of the surviving stellar mass to the formed mass (the ``"mass"`` parameter).

# Plot the model SED
import matplotlib.pyplot as pl
wave = [f.wave_effective for f in res['obs']['filters']]
sedfig, sedax = pl.subplots()
sedax.plot(wave, res['obs']['maggies'], '-o', label='Observations')
sedax.plot(wave, phot, '-o', label='Model at {},{}'.format(walker, iteration))
sedax.set_ylabel("Maggies")
sedax.set_xlabel("wavelength")
sedax.set_xscale('log')

# Plot residuals for this walker and iteration
chifig, chiax = pl.subplots()
chi = (res['obs']['maggies'] - phot) / res['obs']['maggies_unc']
chiax.plot(wave, chi, 'o')
chiax.set_ylabel("Chi")
chiax.set_xlabel("wavelength")
chiax.set_xscale('log')