utils

Utility Functions and Classes

This module collects small pieces of code used throughout bioconda_utils.

Functions

allowed_env_var(s[, docker])

bin_for([name])

built_package_paths(recipe)

Returns the path to which a recipe would be built.

changed_since_master(recipe_folder)

Return filenames changed since master branch.

check_recipe_skippable(recipe, check_channels)

Return True if the same number of builds (per subdir) defined by the recipe are already in channel_packages.

ellipsize_recipes(recipes, recipe_folder[, n, m])

Logging helper showing recipe list

ensure_list(obj)

Wraps obj in a list if necessary

envstr(env)

file_from_commit(commit, filename)

Returns the contents of a file at a particular commit as a string.

flatten_dict(dict)

get_blacklist(config, recipe_folder)

Return list of recipes to skip from blacklists

get_conda_build_config_files([config])

get_deps([recipe, build])

Generator of dependencies for a single recipe

get_free_space()

Return free space in MB on disk

get_latest_recipes(recipe_folder, config[, …])

Generator of recipes.

get_package_paths(recipe, check_channels[, …])

get_recipes(recipe_folder[, package, exclude])

Generator of recipes.

last_commit_to_master()

Identifies the day of the last commit to master branch.

load_all_meta(recipe[, config, finalize])

For each environment, yield the rendered meta.yaml.

load_conda_build_config([platform, trim_skip])

Load conda build config while considering global pinnings from conda-forge.

load_config(path)

Parses config file, building paths to relevant blacklists

load_first_metadata(recipe[, config, finalize])

Returns just the first of possibly many metadata files.

load_meta_fast(recipe[, env])

Given a package name, find the current meta.yaml file, parse it, and return the dict.

newly_unblacklisted(config_file, …)

Returns the set of recipes that were blacklisted in master branch but have since been removed from the blacklist.

parallel_iter(func, items, desc, *args, **kwargs)

run(cmds[, env, mask, live, mylogger, loglevel])

Run a command (with logging, masking, etc)

sandboxed_env(env)

Context manager to temporarily set os.environ, only allowing env vars from the existing os.environ or the provided env that match ENV_VAR_WHITELIST globs.

set_max_threads(n)

setup_logger([name, loglevel, logfile, …])

Set up logging for bioconda-utils

temp_env(env)

Context manager to temporarily set os.environ.

temp_os(platform)

Context manager to temporarily set sys.platform.

threads_to_use()

Returns the number of cores we are allowed to run on

tqdm(*args, **kwargs)

Wrapper around TQDM handling disable

validate_config(config)

Validate config against schema

wraps(func)

Custom wraps() function for decorators

Classes

AsyncRequests

Download a bunch of files in parallel

CondaBuildConfigFile(arg, path)

Create new instance of CondaBuildConfigFile(arg, path)

EnvMatrix(env)

Intended to be initialized with a YAML file and iterated over to yield all combinations of environments.

JinjaSilentUndefined([hint, obj, name, exc])

LogFuncFilter(func[, trunc_msg, max_lines, …])

Logging filter capping the number of messages emitted from given function

LoggingSourceRenameFilter

Logging filter for abbreviating module name in logs

Progress()

RepoData

Singleton providing access to package directory on anaconda cloud

TqdmHandler(*args, **kwargs)

Tqdm aware logging StreamHandler

Exceptions

BiocondaUtilsWarning

DivergentBuildsError

Documentation

class bioconda_utils.utils.AsyncRequests[source]

Bases: object

Download a bunch of files in parallel

This is not really a class, but rather a namespace encapsulating a set of related calls.

CONNECTIONS_PER_HOST = 4

Max connections to each server

USER_AGENT = 'bioconda/bioconda-utils'

Identify ourselves

classmethod fetch(urls, descs, cb, datas)[source]

Fetch data from URLs.

This will use asyncio to manage a pool of connections at once, significantly speeding up downloads compared to iterative use of requests. It will also retry on non-permanent HTTP error codes (i.e. 429, 502, 503 and 504).

Parameters
  • urls – List of URLs

  • descs – Matching list of descriptions (for progress display)

  • cb – As each download is completed, the data is passed through this function. Use this to, e.g., offload JSON parsing into the download loop.

exception bioconda_utils.utils.BiocondaUtilsWarning[source]

Bases: UserWarning

class bioconda_utils.utils.CondaBuildConfigFile(arg, path)

Bases: tuple

Create new instance of CondaBuildConfigFile(arg, path)

property arg

Alias for field number 0

property path

Alias for field number 1

exception bioconda_utils.utils.DivergentBuildsError[source]

Bases: Exception

class bioconda_utils.utils.EnvMatrix(env)[source]

Bases: object

Intended to be initialized with a YAML file and iterated over to yield all combinations of environments.

YAML file has the following format:

CONDA_PY:
  - "2.7"
  - "3.5"
CONDA_BOOST: "1.60"
CONDA_PERL: "5.22.0"
CONDA_NPY: "110"
CONDA_NCURSES: "5.9"
CONDA_GSL: "1.16"

Parameters

env (str or dict) – If str, assume it’s a path to a YAML-format filename and load it into a dict. If a dict is provided, use it directly.
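
For illustration, a minimal sketch of constructing an EnvMatrix from a dict and iterating over it; the values are hypothetical, and the exact shape of the yielded items is assumed to be one combination of environment variable settings per iteration:

from bioconda_utils.utils import EnvMatrix

# Hypothetical matrix; a path to a YAML file of the format shown above
# would work equally well.
matrix = EnvMatrix({
    "CONDA_PY": ["2.7", "3.5"],
    "CONDA_BOOST": "1.60",
})

# Each iteration is assumed to yield one combination of environment
# variables; here, one per CONDA_PY value.
for combination in matrix:
    print(combination)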

class bioconda_utils.utils.JinjaSilentUndefined(hint=None, obj=missing, name=None, exc=<class 'jinja2.exceptions.UndefinedError'>)[source]

Bases: jinja2.runtime.Undefined

class bioconda_utils.utils.LogFuncFilter(func, trunc_msg=None, max_lines=0, consecutive=True)[source]

Bases: object

Logging filter capping the number of messages emitted from given function

Parameters
  • func – The function for which to filter log messages

  • trunc_msg (Optional[str]) – The message to emit when logging is truncated, informing the user that further messages will be hidden.

  • max_lines (int) – Max number of log messages to allow to pass

  • consecutive (bool) – If True, the filter applies to consecutive messages only and resets when a message from a different source is encountered.

Fixme:

The implementation assumes that func uses a logger initialized with getLogger(__name__).

class bioconda_utils.utils.LoggingSourceRenameFilter[source]

Bases: object

Logging filter for abbreviating module name in logs

Maps bioconda_utils to BIOCONDA; everything else is abbreviated to its top-level package name, uppercased.
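
A hedged usage sketch: as a logging filter, it would typically be attached to a handler or logger via the stock logging API (the handler setup below is illustrative only):

import logging
from bioconda_utils.utils import LoggingSourceRenameFilter

handler = logging.StreamHandler()
# Abbreviate module names in emitted records (bioconda_utils.* -> BIOCONDA).
handler.addFilter(LoggingSourceRenameFilter())
logging.getLogger("bioconda_utils").addHandler(handler)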

class bioconda_utils.utils.RepoData[source]

Bases: object

Singleton providing access to package directory on anaconda cloud

If the first call provides a filename as cache argument, the file is used to cache the directory in CSV format.

Data structure:

Each channel hosted at anaconda cloud comprises a number of subdirs in which the individual package files reside. The subdirs can be one of noarch, osx-64 and linux-64 for Bioconda. (Technically (noarch|(linux|osx|win)-(64|32)) appears to be the schema).

For each channel/subdir (aka channel/platform) combination, a repodata.json contains a package key describing each package file with at least the following information:

name: Package name (lowercase, alphanumeric + dash)

version: Version (no dash, PEP440)

build_number: Non-negative integer indicating packaging revisions

build: String comprising the hash of pinned dependencies and the build number. Used to distinguish different builds of the same package/version combination.

depends: Runtime requirements for the package as a list of strings. We do not currently load this.

arch: Architecture key (x86_64). Not used by conda and not loaded here.

platform: Platform of package (osx, linux, noarch). Optional upstream, not used by conda. We generate this from the subdir information to have it available.

Repodata versions:

The version is indicated by the key repodata_version, with the absence of that key indicating version 0.

In version 0, the info key contains the subdir, platform, arch, default_python_version and default_numpy_version keys. In version 1 it only contains the subdir key.

Version 1 also added a removed key, listing packages removed from the repository.

Makes RepoData a singleton

cache_timeout = 28800

default lifetime for repodata cache

property channels

Return channels to load.

columns = ['build', 'build_number', 'name', 'version', 'depends', 'channel', 'subdir', 'platform']

Columns available in internal dataframe

property df

Internal Pandas DataFrame object

Try not to use this … the point of this class is to be able to change the structure in which the data is held.

get_latest_versions(channel)[source]

Get the latest version for each package in channel

get_package_data(key=None, channels=None, name=None, version=None, build_number=None, platform=None, build=None, native=False)[source]

Get key for each package in channels

If key is not given, returns a bool indicating whether there are matches. If key is a string, returns a list of strings. If key is a list of strings, returns an iterator of tuples.
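
A usage sketch based on the behaviours described above; the package name is hypothetical and the singleton is assumed to be obtained by simply instantiating RepoData:

from bioconda_utils.utils import RepoData

repo = RepoData()  # singleton; repeated calls return the same instance

# No key: bool indicating whether any matching package exists.
exists = repo.get_package_data(name="samtools", platform="linux")

# String key: list of strings, here all known versions of the package.
versions = repo.get_package_data("version", name="samtools")

# List of keys: iterator of tuples, one per matching package file.
for version, build_number in repo.get_package_data(
        ["version", "build_number"], name="samtools"):
    print(version, build_number)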

get_versions(name)[source]

Get versions available for package

Parameters

name – package name

Returns

Dictionary mapping version numbers to lists of architectures, e.g. {'0.1': ['linux'], '0.2': ['linux', 'osx'], '0.3': ['noarch']}

platforms = ['linux', 'osx', 'noarch']

Platforms loaded

set_timeout(timeout)[source]

Set the timeout after which the repodata should be reloaded

class bioconda_utils.utils.TqdmHandler(*args, **kwargs)[source]

Bases: logging.StreamHandler

Tqdm aware logging StreamHandler

Passes all log writes through tqdm to allow progress bars and log messages to coexist without clobbering terminal

emit(record)[source]

Emit a record.

If a formatter is specified, it is used to format the record. The record is then written to the stream with a trailing newline. If exception information is present, it is formatted using traceback.print_exception and appended to the stream. If the stream has an ‘encoding’ attribute, it is used to determine how to do the output to the stream.

bioconda_utils.utils.built_package_paths(recipe)[source]

Returns the path to which a recipe would be built.

Does not necessarily exist; equivalent to conda build --output recipename but without the subprocess.

bioconda_utils.utils.changed_since_master(recipe_folder)[source]

Return filenames changed since master branch.

Note that this uses origin, so if you are working on a fork of the main repo and have added the main repo as upstream, then you’ll have to do a git checkout master && git pull upstream master to update your fork.

bioconda_utils.utils.check_recipe_skippable(recipe, check_channels)[source]

Return True if the same number of builds (per subdir) defined by the recipe are already in channel_packages.

bioconda_utils.utils.ellipsize_recipes(recipes, recipe_folder, n=5, m=50)[source]

Logging helper showing recipe list

Parameters
  • recipes (Collection[str]) – List of recipes

  • recipe_folder (str) – Folder name to strip from recipes.

  • n (int) – Show at most this number of recipes, with “…” if more are found.

  • m (int) – Don't show anything if there are more recipes than this (pointless to show the first 5 of 5000)

Return type

str

Returns

A string like " (htslib, samtools, …)" or ""

bioconda_utils.utils.ensure_list(obj)[source]

Wraps obj in a list if necessary

>>> ensure_list("one")
['one']
>>> ensure_list(["one", "two"])
['one', 'two']
bioconda_utils.utils.file_from_commit(commit, filename)[source]

Returns the contents of a file at a particular commit as a string.

Parameters
  • commit (commit-like string) –

  • filename (str) –
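
A minimal sketch; both the commit and the file path are hypothetical:

from bioconda_utils.utils import file_from_commit

# meta.yaml of a recipe as it looked on the master branch.
meta_text = file_from_commit("master", "recipes/samtools/meta.yaml")
print(meta_text.splitlines()[0])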

bioconda_utils.utils.get_blacklist(config, recipe_folder)[source]

Return list of recipes to skip from blacklists

Return type

set

bioconda_utils.utils.get_deps(recipe=None, build=True)[source]

Generator of dependencies for a single recipe

Only names (not versions) of dependencies are yielded.

If the variant/version matrix yields multiple instances of the metadata, the union of these dependencies is returned.

Parameters
  • recipe (str or MetaData) – If string, it is a path to the recipe; otherwise assume it is a parsed conda_build.metadata.MetaData instance.

  • build (bool) – If True yield build dependencies, if False yield run dependencies.
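
A short sketch using a hypothetical recipe path:

from bioconda_utils.utils import get_deps

# Names (not versions) of build-time dependencies.
build_deps = set(get_deps("recipes/samtools", build=True))

# Names of run-time dependencies.
run_deps = set(get_deps("recipes/samtools", build=False))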

bioconda_utils.utils.get_free_space()[source]

Return free space in MB on disk

bioconda_utils.utils.get_latest_recipes(recipe_folder, config, package='*')[source]

Generator of recipes.

Finds (possibly nested) directories containing a meta.yaml file and returns the latest version of each recipe.

Parameters
  • recipe_folder (str) – Top-level dir of the recipes

  • config (dict or filename) –

  • package (str or iterable) – Pattern or patterns to restrict the results.

bioconda_utils.utils.get_recipes(recipe_folder, package='*', exclude=None)[source]

Generator of recipes.

Finds (possibly nested) directories containing a meta.yaml file.

Parameters
  • recipe_folder (str) – Top-level dir of the recipes

  • package (str or iterable) – Pattern or patterns to restrict the results.
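
A short sketch with a hypothetical recipe folder:

from bioconda_utils.utils import get_recipes

# All (possibly nested) directories containing a meta.yaml file.
for recipe in get_recipes("recipes"):
    print(recipe)

# Restrict results to recipes matching a pattern.
bowtie_recipes = list(get_recipes("recipes", package="bowtie*"))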

bioconda_utils.utils.last_commit_to_master()[source]

Identifies the day of the last commit to master branch.

bioconda_utils.utils.load_all_meta(recipe, config=None, finalize=True)[source]

For each environment, yield the rendered meta.yaml.

Parameters

finalize (bool) – If True, do a full conda-build render. Determines exact package builds of build/host dependencies. It involves costly dependency resolution via conda and also download of those packages (to inspect possible run_exports). For fast-running tasks like linting, set to False.
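
A hedged sketch with a hypothetical recipe path, assuming the yielded objects expose conda-build's MetaData interface (name(), version()):

from bioconda_utils.utils import load_all_meta

# Render every variant without the costly finalization step,
# which is sufficient for fast tasks such as linting.
for meta in load_all_meta("recipes/samtools", finalize=False):
    print(meta.name(), meta.version())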

bioconda_utils.utils.load_conda_build_config(platform=None, trim_skip=True)[source]

Load conda build config while considering global pinnings from conda-forge.

bioconda_utils.utils.load_config(path)[source]

Parses config file, building paths to relevant blacklists

Parameters

path (str) – Path to YAML config file
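
A minimal sketch; the config file name is hypothetical:

from bioconda_utils.utils import load_config

# Parse the YAML config; the result has the referenced blacklist paths built in.
config = load_config("config.yml")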

bioconda_utils.utils.load_first_metadata(recipe, config=None, finalize=True)[source]

Returns just the first of possibly many metadata files. Useful when you need to check things like a package name or version number (which are not expected to change between variants).

If the recipe will be skipped, then returns None

Parameters

finalize (bool) – If True, do a full conda-build render. Determines exact package builds of build/host dependencies. It involves costly dependency resolution via conda and also download of those packages (to inspect possible run_exports). For fast-running tasks like linting, set to False.
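
A hedged sketch with a hypothetical recipe path, assuming the returned object exposes conda-build's MetaData interface:

from bioconda_utils.utils import load_first_metadata

# Quick way to read name/version; None means the recipe would be skipped.
meta = load_first_metadata("recipes/samtools", finalize=False)
if meta is not None:
    print(meta.name(), meta.version())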

bioconda_utils.utils.load_meta_fast(recipe, env=None)[source]

Given a package name, find the current meta.yaml file, parse it, and return the dict.

Parameters
  • recipe (str) – Path to recipe (directory containing the meta.yaml file)

  • env – Optional variables to expand

Returns

Tuple of original recipe string and rendered dict
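
A minimal sketch with a hypothetical recipe path, assuming the rendered dict follows the usual meta.yaml layout:

from bioconda_utils.utils import load_meta_fast

# Fast parse without invoking conda-build.
recipe_text, meta = load_meta_fast("recipes/samtools")
print(meta["package"]["name"], meta["package"]["version"])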

bioconda_utils.utils.newly_unblacklisted(config_file, recipe_folder, git_range)[source]

Returns the set of recipes that were blacklisted in master branch but have since been removed from the blacklist. Considers the contents of all blacklists in the current config file and all blacklists in the same config file in master branch.

Parameters
  • config_file (str) – Needs filename (and not dict) because we check what the contents of the config file were in the master branch.

  • recipe_folder (str) – Path to recipe dir, needed by get_blacklist

  • git_range (str or list) – Either a string or a list of commits. If 'HEAD' or ['HEAD'] or ['master', 'HEAD'], compares the current changes to master. If other commits are specified, they are used directly via git show.

bioconda_utils.utils.run(cmds, env=None, mask=None, live=True, mylogger=<Logger bioconda_utils.utils (WARNING)>, loglevel=20, **kwargs)[source]

Run a command (with logging, masking, etc)

  • Explicitly decodes stdout to avoid UnicodeDecodeErrors that can occur when using the universal_newlines=True argument in the standard subprocess.run.

  • Masks secrets

  • Passes live output to logging

Return type

CompletedProcess

Returns

CompletedProcess object

Raises
  • subprocess.CalledProcessError if the process failed

  • FileNotFoundError if the command could not be found
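
A hedged sketch of a typical call; the command and token are hypothetical, and mask is assumed to accept a list of strings to hide from the logs:

import logging
from bioconda_utils.utils import run

token = "s3cr3t"  # hypothetical secret that must not appear in logs

proc = run(
    ["git", "clone", "https://user:%s@example.com/repo.git" % token],
    mask=[token],          # occurrences of the token are masked in output
    live=True,             # stream output to the logger as it is produced
    loglevel=logging.INFO,
)
print(proc.returncode)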

bioconda_utils.utils.sandboxed_env(env)[source]

Context manager to temporarily set os.environ, only allowing env vars from the existing os.environ or the provided env that match ENV_VAR_WHITELIST globs.

bioconda_utils.utils.setup_logger(name='bioconda_utils', loglevel=20, logfile=None, logfile_level=10, log_command_max_lines=None, prefix='BIOCONDA ', msgfmt='%(asctime)s %(log_color)s%(name)s %(levelname)s%(reset)s %(message)s', datefmt='%H:%M:%S')[source]

Set up logging for bioconda-utils

Parameters
  • name (str) – Module name for which to get a logger (__name__)

  • loglevel (Union[str, int]) – Log level, can be name or int level

  • logfile (Optional[str]) – File to log to as well

  • logfile_level (Union[str, int]) – Log level for file logging

  • prefix (str) – Prefix to add to our log messages

  • msgfmt (str) – Format for messages

  • datefmt (str) – Format for dates

Return type

Logger

Returns

A new logger
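
A short sketch; the log file name is hypothetical:

from bioconda_utils.utils import setup_logger

# Console logging at INFO plus a debug-level log file.
logger = setup_logger("bioconda_utils", loglevel="INFO",
                      logfile="bioconda-build.log", logfile_level="DEBUG")
logger.info("logging configured")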

bioconda_utils.utils.temp_env(env)[source]

Context manager to temporarily set os.environ.

Used to send values in env to processes that only read the os.environ, for example when filling in meta.yaml with jinja2 template variables.

All values are converted to string before sending to os.environ
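
A minimal sketch; the variables are hypothetical:

import os
from bioconda_utils.utils import temp_env

# Values are converted to strings before being placed into os.environ.
with temp_env({"CONDA_PY": 3.8, "GIT_AUTHOR_NAME": "bot"}):
    assert os.environ["CONDA_PY"] == "3.8"
# The previous environment is restored on exit.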

bioconda_utils.utils.temp_os(platform)[source]

Context manager to temporarily set sys.platform.

bioconda_utils.utils.threads_to_use()[source]

Returns the number of cores we are allowed to run on

bioconda_utils.utils.tqdm(*args, **kwargs)[source]

Wrapper around TQDM handling disable

Logging is disabled if:

  • TERM is set to dumb

  • CIRCLECI is set to true

  • the effective log level of the logger is lower than the level set via loglevel

Parameters
  • loglevel – logging loglevel (the number, so logging.INFO)

  • logger – local logger (in case it has different effective log level)

bioconda_utils.utils.validate_config(config)[source]

Validate config against schema

Parameters

config (str or dict) – If str, assume it’s a path to YAML file and load it. If dict, use it directly.

bioconda_utils.utils.wraps(func)[source]

Custom wraps() function for decorators

This one differs from functools.wraps and boltons.funcutils.wraps in that it allows adding keyword arguments to the function signature.

>>> def decorator(func):
...     @wraps(func)
...     def wrapper(*args, extra_param=None, **kwargs):
...         print("Called with extra_param=%s" % extra_param)
...         return func(*args, **kwargs)
...     return wrapper
>>> @decorator
... def test(arg1, arg2, arg3='default'):
...     pass
>>> test('val1', 'val2', extra_param='xyz')
Called with extra_param=xyz