
Antibody-Annotation-to-JSON

This project is a Python-based CLI utility that parses and converts annotations for antibody-based therapeutics from this format, designed by Prof. Andrew Martin, into structured JSON files. Each file (input and output) contains annotations for one antibody-based therapeutic that has been granted an International Nonproprietary Name (INN) by the World Health Organisation.

Internally, the tool executes 3 main steps:

  1. Parse the original flat text format and structure the data into a Python dict
  2. Validate against a JSON schema
  3. Serialise the output into a JSON file
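The three steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `KEY: value` input layout, the made-up name `examplimab`, the `REQUIRED` table and the function names are all hypothetical, and the validation step is a simplified stand-in (required keys and types only) for real JSON schema validation.

```python
import json

# Hypothetical input: one "KEY: value" pair per line. The real annotation
# format is richer; this only illustrates the parse -> validate -> serialise flow.
SAMPLE = """\
INN: examplimab
FORMAT: IgG1
CHAINS: 2
"""

# Simplified stand-in for a JSON schema: required keys and expected Python types.
REQUIRED = {"inn": str, "format": str, "chains": int}


def parse(text):
    """Step 1: turn flat 'KEY: value' lines into a dict."""
    record = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition(":")
        value = value.strip()
        # Store purely numeric values as integers, everything else as text.
        record[key.strip().lower()] = int(value) if value.isdigit() else value
    return record


def validate(record):
    """Step 2: return a list of problems (an empty list means valid)."""
    errors = []
    for key, expected in REQUIRED.items():
        if key not in record:
            errors.append(f"missing key: {key}")
        elif not isinstance(record[key], expected):
            errors.append(f"{key}: expected {expected.__name__}")
    return errors


def serialise(record):
    """Step 3: render the dict as JSON text."""
    return json.dumps(record, indent=2, sort_keys=True)


record = parse(SAMPLE)
errors = validate(record)
output = serialise(record) if not errors else None
```

Returning a list of problems from the validation step, rather than raising on the first one, lets a CLI report every issue in a file in a single run.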

Its ultimate purpose is to allow the converted data to be imported into a MongoDB database for further analysis and made accessible to researchers through a web front-end.

Features include:

  • Built-in validation against a JSON schema

Installation

  1. For users running Ubuntu or any other OS that requires an explicit python3 shell command, either replace python with python3 in the commands below, or (on Ubuntu/Debian) run the following:

    sudo apt install python-is-python3 
  2. Clone the repo and move into it:

    git clone https://github.com/greglv93/Antibody-Annotation-to-JSON.git && cd Antibody-Annotation-to-JSON
  3. (Optional) Create a virtual environment to avoid cluttering your base environment and avoid conflicts with OS package managers such as Homebrew:

    Linux/macOS:

    python -m venv .venv/ && source .venv/bin/activate

    Windows:

    python -m venv .venv\ && .venv\Scripts\activate
  4. Install with pip:

    python -m pip install --upgrade pip

    Option 1: For a regular installation:

    python -m pip install .

    (python -m pip install --upgrade . needs to be run to update the installation with any changes to the source code)

    Option 2: For developers who want an editable install with live feedback on changes to the code (also refer to the CONTRIBUTING guidelines):

    python -m pip install --editable ".[dev]"
    python -m pip install --force-reinstall sourcemeta-jsonschema

    The last line is important because it resolves a CLI name conflict: sourcemeta's tool ships a jsonschema executable, but this can be replaced by the executable of the same name from the Python jsonschema library, which is installed as a runtime dependency of this project. The 'force-reinstall' command guarantees that sourcemeta's jsonschema CLI ends up on top, overwriting the executable that the Python jsonschema library installed. The Python library is needed at runtime, but its command-line executable is not (it is in fact deprecated upstream and slated for removal in a future version), while sourcemeta's tool must be available from the command line, where some pre-commit hooks need it and where it can also be invoked for manual testing and exploration of the JSON schema.

    If the Python jsonschema library accidentally gets reinstalled and 'takes back' the CLI, run the 'force-reinstall' command again inside your project environment.
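To check which jsonschema executable currently wins on your PATH, a heuristic along these lines may help. It is only a sketch under one assumption: that pip-generated console scripts on Linux/macOS are small text files containing an import line for their entry point, which for the Python jsonschema library is `jsonschema.cli:main`. The helper names are hypothetical, and the check says nothing definite about what a non-matching executable actually is.

```python
import shutil

# Assumption: the Python library's pip-generated wrapper contains this import line.
PYTHON_WRAPPER_MARKER = b"from jsonschema.cli import"


def is_python_jsonschema_wrapper(path):
    """Heuristic: True if the file looks like the Python library's CLI wrapper."""
    with open(path, "rb") as handle:
        head = handle.read(4096)  # wrappers are tiny; avoid loading a large binary
    return PYTHON_WRAPPER_MARKER in head


def describe_jsonschema_cli():
    """Report which jsonschema executable the current PATH resolves to."""
    found = shutil.which("jsonschema")
    if found is None:
        return "no jsonschema executable on PATH"
    if is_python_jsonschema_wrapper(found):
        return f"{found}: Python jsonschema wrapper (re-run the force-reinstall)"
    return f"{found}: not the Python wrapper (probably sourcemeta's CLI)"


print(describe_jsonschema_cli())
```

Run it inside the project environment, since `shutil.which` follows whatever PATH is active at the time.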

Usage

The tool is now available as an executable command from anywhere in your system terminal (if you installed it into a virtual environment, that environment must be activated). To get started, run:

antibody-to-json --help

Quick usage without installing the package

For users running Ubuntu or other OSs that require an explicit python3 shell command by default, refer to step 1 of the installation instructions above.

  1. Clone the repository and optionally create a virtual environment, as in steps 2-3 of the installation instructions above.

  2. Install only the dependencies:

    python -m pip install --upgrade pip
    python -m pip install -r requirements.txt
  3. From the project directory (the repository root), run:

    python -m antibody_annotation_to_json.cli --help

More information

Refer to the documentation website (click the link in the sidebar of the repository's main page on GitHub) or view the documentation files here. Examples of the input data and corresponding JSON output produced and validated by the current version of the tool are available to view here and here.


This project is licensed under the MIT License.