yippy.datasets

Public catalog and loader for yield input packages (YIPs).

The CATALOG dict is the source of truth: keys are flat YIP names like eac1_aavc_2d; values carry just enough metadata to drive downloads and filtered discovery (telescope, coronagraph, sampling, md5). Descriptive metadata (designer, wavelengths, dark-zone extent, …) lives in the FITS headers inside each YIP, not here.

Archives are hosted as assets on a tagged GitHub release of this repo (currently data-v2, per DATA_RELEASE_TAG) and fetched via pooch over HTTPS. The release tag is separate from the code-release lifecycle managed by release-please. To publish new YIPs: bump DATA_RELEASE_TAG to a new data-vN, attach the updated zips to that release, and refresh the md5 hashes here.
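
For the "refresh the md5 hashes" step, a helper along these lines could compute the digest of each zip before it is attached to the release. This is a minimal sketch using only the standard library; the name md5_of_file and the chunk size are illustrative assumptions, not part of yippy.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at ``path`` (hypothetical helper)."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks so large archives never have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned hex string is the kind of value a catalog entry's md5 field would hold, assuming the catalog stores plain hex digests.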

The catalog currently includes only the two reference YIPs used by the yippy paper validation pipeline. Long-term YIP hosting will be provided by ExEP, and when that catalog comes online the discovery API here will grow back into a thin proxy over it.

Public API:

  • fetch_yip(name=None, *, telescope=None, coronagraph=None, sampling=None, cache_path=None) -> str

  • cache_dir() -> Path

  • list_yips(**filters) -> list[str]

  • yip_exists(name) -> bool

  • yip_info(name) -> dict
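
A typical call sequence over this API might look like the sketch below. It assumes yippy is installed and that "eac1_aavc_2d" (the example key mentioned in the module description) is present in the live catalog; the wrapper name load_reference_yip is hypothetical. Calling it may trigger a network download on first use.

```python
def load_reference_yip(cache_path=None):
    """Fetch one YIP by flat name and return the unpacked path (hypothetical wrapper)."""
    # Imported lazily so that defining this helper does not itself require yippy.
    from yippy.datasets import fetch_yip, yip_exists

    # Example key taken from the module description; it may not match the live catalog.
    name = "eac1_aavc_2d"
    if not yip_exists(name):
        raise KeyError(name)
    # cache_path=None uses the default cache; a path overrides it for this call only.
    return fetch_yip(name, cache_path=cache_path)
```

The same entry could also be requested in structured form, for example fetch_yip(telescope="eac1", coronagraph="aavc", sampling="2d"), provided those filters resolve to exactly one catalog entry.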

Attributes

Functions

_make_pikachu(cache_dir_path)

Build a pooch instance for the YIP catalog at cache_dir_path.

cache_dir()

Return the directory where yippy caches YIP archives by default.

fetch_yip([name, telescope, coronagraph, sampling, ...])

Download a YIP archive (if not cached), unpack, and return its path.

list_yips(**filters)

Return catalog names matching all filters. No filters returns all names.

yip_exists(name)

Return True if name is an available YIP in the catalog, False otherwise.

yip_info(name)

Return the catalog metadata dict for name.

Module Contents

yippy.datasets.logger
yippy.datasets.CACHE_DIR_ENV_VAR = 'YIPPY_CACHE_DIR'
yippy.datasets.DATA_RELEASE_TAG: str = 'data-v2'
yippy.datasets._DATA_BASE_URL: str = 'https://github.com/CoreySpohn/yippy/releases/download/data-v2/'
yippy.datasets.CATALOG: dict[str, dict[str, Any]]
yippy.datasets._make_pikachu(cache_dir_path)

Build a pooch instance for the YIP catalog at cache_dir_path.

Parameters:

cache_dir_path (str | pathlib.Path)

Return type:

pooch.Pooch

yippy.datasets._PIKACHU
yippy.datasets.cache_dir()

Return the directory where yippy caches YIP archives by default.

Resolution order:
  1. YIPPY_CACHE_DIR environment variable, if set.

  2. pooch.os_cache("yippy") – the OS-conventional cache directory provided by platformdirs (e.g. ~/Library/Caches/yippy on macOS, ~/.cache/yippy on Linux).

Override per call by passing cache_path to fetch_yip().
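
The resolution order above could be sketched as follows. To keep the sketch free of third-party imports, default_dir stands in for pooch.os_cache("yippy"); the function name resolve_cache_dir and the choice to treat an empty variable as unset are assumptions, not documented yippy behavior.

```python
import os
from pathlib import Path

CACHE_DIR_ENV_VAR = "YIPPY_CACHE_DIR"  # same name as the documented constant


def resolve_cache_dir(default_dir, environ=None):
    """Return the env-var override if set, else ``default_dir`` (hypothetical helper)."""
    env = os.environ if environ is None else environ
    override = env.get(CACHE_DIR_ENV_VAR)
    # Assumption: an empty string is treated the same as an unset variable.
    if override:
        return Path(override)
    return Path(default_dir)
```

Passing the environment as an argument keeps the helper easy to test without mutating os.environ.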

Return type:

pathlib.Path

yippy.datasets.fetch_yip(name=None, *, telescope=None, coronagraph=None, sampling=None, cache_path=None)

Download a YIP archive (if not cached), unpack, and return its path.

Pass either name (flat: "eac1_aavc_2d") OR keyword filters (structured: telescope="eac1", coronagraph="aavc", sampling="2d"). The keyword form must resolve to exactly one catalog entry; pass sampling whenever a (telescope, coronagraph) pair has both 1D and 2D variants.

YIPs are cached at cache_dir() (which honors the YIPPY_CACHE_DIR environment variable). Pass cache_path to override the cache location for this call only – useful for shared institutional setups or project-scoped caches.

Raises:

TypeError: if both name and filters are passed (or neither).

KeyError: if name is not in the catalog.

ValueError: if the structured query has zero or multiple matches.
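
The argument rules and the three exceptions can be illustrated with a toy resolver that performs no download. This is a sketch under stated assumptions: TOY_CATALOG, its second entry "eac1_aavc_1d", and resolve_name are hypothetical stand-ins, and the real catalog entries also carry an md5 field.

```python
# Hypothetical two-entry catalog; the real CATALOG contents may differ.
TOY_CATALOG = {
    "eac1_aavc_2d": {"telescope": "eac1", "coronagraph": "aavc", "sampling": "2d"},
    "eac1_aavc_1d": {"telescope": "eac1", "coronagraph": "aavc", "sampling": "1d"},
}


def resolve_name(name=None, **filters):
    """Map either a flat name or keyword filters to exactly one catalog key."""
    active = {k: v for k, v in filters.items() if v is not None}
    # Exactly one of the two calling forms must be used.
    if (name is None) == (not active):
        raise TypeError("pass either name or keyword filters, not both or neither")
    if name is not None:
        if name not in TOY_CATALOG:
            raise KeyError(name)
        return name
    matches = [
        key
        for key, meta in TOY_CATALOG.items()
        if all(meta.get(k) == v for k, v in active.items())
    ]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match, found {len(matches)}: {matches}")
    return matches[0]
```

With this toy catalog, omitting sampling for the ("eac1", "aavc") pair raises ValueError because both the 1D and 2D variants match, which is the situation the description warns about.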

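Parameters:

name – flat catalog name, e.g. "eac1_aavc_2d"

telescope, coronagraph, sampling – keyword-only filters for the structured form

cache_path – keyword-only override of the cache location for this call

Return type: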

str

yippy.datasets._FILTERABLE_FIELDS
yippy.datasets.list_yips(**filters)

Return catalog names matching all filters. No filters returns all names.

Raises:

TypeError: if a filter key is not a valid catalog field.
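
Filter validation and matching might be sketched as below. The contents of FILTERABLE_FIELDS are an assumption inferred from the fetch_yip keyword arguments (the actual value of _FILTERABLE_FIELDS is not shown here), and filter_names and the toy catalog are hypothetical.

```python
# Assumed filterable fields; the real _FILTERABLE_FIELDS may differ.
FILTERABLE_FIELDS = ("telescope", "coronagraph", "sampling")


def filter_names(catalog, **filters):
    """Return catalog keys whose metadata matches every filter (hypothetical helper)."""
    unknown = sorted(set(filters) - set(FILTERABLE_FIELDS))
    if unknown:
        raise TypeError(f"unknown filter field(s): {unknown}")
    # With no filters, all() over an empty sequence is True, so every name is kept.
    return [
        name
        for name, meta in catalog.items()
        if all(meta.get(key) == value for key, value in filters.items())
    ]


# Hypothetical catalog used only for illustration.
toy = {
    "eac1_aavc_2d": {"telescope": "eac1", "coronagraph": "aavc", "sampling": "2d"},
    "eac1_aavc_1d": {"telescope": "eac1", "coronagraph": "aavc", "sampling": "1d"},
}
two_d_only = filter_names(toy, sampling="2d")
```

Rejecting unknown keys up front means a misspelled filter such as telescop="eac1" fails loudly instead of silently returning an empty list.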

Parameters:

filters (str)

Return type:

list[str]

yippy.datasets.yip_exists(name)

Return True if name is an available YIP in the catalog, False otherwise.

Parameters:

name (str)

Return type:

bool

yippy.datasets.yip_info(name)

Return the catalog metadata dict for name.

Raises:

KeyError: if name is not in the catalog.

Parameters:

name (str)

Return type:

dict[str, Any]