Data Formats#
The Observation class#
Prospector expects the data in the form of list of Observations, preferably
returned by build_obs() (see below). Each Observation instance
corresponds to a single dataset, and is basically a namespace that also supports
dict-like accessing of important attributes. In addition to holding data and
uncertainties thereon, they tell Prospector what data to predict, contain
dataset-specific information for how to predict that data, and can even store
methods for computing likelihoods in the case of complicated, dataset-specific
noise models.
There are two fundamental kinds of data, Photometry and
Spectrum that are each sub-classes of Observation. There
is also also a Lines class for integrated emission line fluxes. They
have the following attributes, most of which can also be accessed as dictionary
keys:
wavelengthThe wavelength vector for a Spectrum` or the effective wavelengths of the filters in a Photometry data set, ndarray. Units are vacuum Angstroms. Generally these should be observed frame wavelengths.
fluxThe flux vector for a
Spectrum, or the broadband fluxes forPhotometryndarray of same length as the wavelength vector.For Photometry the units are maggies. Maggies are a linear flux density unit defined as \({\rm maggie} = 10^{-0.4 \, m_{AB}}\) where \(m_{AB}\) is the AB apparent magnitude. That is, 1 maggie is the flux density in Janskys divided by 3631. If absolute spectrophotometry is available, the units for a
Spectrum`should also be maggies, otherwise photometry must be present and a calibration vector must be supplied or fit. Note that for convenience there is a maggies_to_nJy attribute of Observation that gives the conversion factor. ForLines, the units should be erg/s/cm^2
uncertaintyThe uncertainty vector (sigma), in same units as
flux, ndarray of same length as the wavelength vector.
maskA boolean array of same length as the wavelength vector, where
Falseelements are ignored in the likelihood calculation.
filtersFor a Photometry, this is a list of strings corresponding to filter names in sedpy
line_indFor a Lines instance, the (zero-based) index of the emission line in the FSPS emission line table
In addition to these attributes, several additional aspects of an observation are used to help predict data or to compute likelihoods. The latter is particularly important in the case of complicated noise models, including outlier models, jitter terms, or covariant noise.
nameA string that can be used to identify the dataset. This can be useful for dataset-specfic parameters. By default the name is constructed from the kind and the memory location of the object.
resolutionFor a Spectrum this defines the instrumental resolution. Analagously to the
filtersattribute for Photometry, this knowledge is used to accurately predict the model in the space of the data.
noiseA
NoiseModelinstance. By default this implements a simple chi-square calculation of independent noise, but it can be complexified.
Example#
For a single observation, you might do something like:
def build_obs(N):
from prospect.observation import Spectrum
N = 1000 # number of wavelength points
spec = Spectrum(wavelength=np.linspace(3000, 5000, N), flux=np.zeros(N), uncertainty=np.ones(N))
# ensure that this is a valid observation for fitting
spec = spec.rectify()
observations = [spec]
return observations
Note that build_obs returns a list even if there is only one dataset.
For photometry this might look like:
def build_obs(N):
from prospect.observation import Photometry
# valid sedpy filter names
fnames = list([f"sdss_{b}0" for b in "ugriz"])
Nf = len(fnames)
phot = [Photometry(filters=fnames, flux=np.ones(Nf), uncertainty=np.ones(Nf)/10)]
# ensure that this is a valid observation for fitting
phot = phot.rectify()
observations = [phot]
return observations
Converting from old style obs dictionaries#
A tool exists to convert old combined observation dictionaries to a list of Observation instances:
from prospect.observation import from_oldstyle
# dummy observation dictionary with just a spectrum
N = 1000
obs = dict(wavelength=np.linspace(3000, 5000, N), spectrum=np.zeros(N), unc=np.ones(N),
filters=[f"sdss_{b}0" for b in "ugriz"], maggies=np.zeros(5), maggies_unc=np.ones(5))
# ensure that this is a valid observation for fitting
spec, phot = from_oldstyle(obs)
print(spec.ndata, phot.filternames, phot.wavelength, phot.flux)
The build_obs() function#
The build_obs() function in the parameter file is written by the user.
It should take a dictionary of command line arguments as keyword arguments. It
should return a list of prospect.observation.Observation instances,
described above.
Other than that, the contents can be anything. Within this function you might
open and read FITS files, ascii tables, HDF5 files, or query SQL databases. You
could, using e.g. an objid parameter, dynamically load data (including
filter sets) for different objects in a table. Feel free to import helper
functions, modules, and packages (like astropy, h5py, sqlite, astroquery, etc.)
The point of this function is that you don’t have to externally convert your
data format to be what Prospector expects and keep another version of files
lying around: the conversion happens within the code itself. Again, the only
requirement is that the function can take a run_params dictionary as keyword
arguments and that it return prospect.observation.Observation instances, as
described above. Each observation instance should correspond to a particular dataset (e.g. a broadband photomtric SED, the spectrum from a particular instrument, or the spectrum from a particular night) that shares instrumental and, more importantly, calibration parameters.