Antibody-Annotation-to-JSON
This project is a Python-based CLI utility that parses and converts annotations for antibody-based therapeutics from this format, designed by Prof. Andrew Martin, into structured JSON files. Each file (input and output) contains annotations for one antibody-based therapeutic that has been granted an International Nonproprietary Name (INN) by the World Health Organization.
Internally, the tool executes 3 main steps:
- Parse the original flat text format and structure the data into a Python dict
- Validate against a JSON schema
- Serialise the output into a JSON file
Its ultimate purpose is to allow the converted data to be imported into a MongoDB database for further analysis and to make it accessible to researchers through a web front-end.
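As a rough illustration of the three internal steps listed above, the sketch below parses a flat text record, checks it, and serialises it to JSON. It is not the project's actual code: the "Key: value" input layout, the function names, the example record and the required keys are all assumptions made for illustration, and the required-key check is a deliberately simple stand-in for real JSON Schema validation.

```python
import json

# Hypothetical stand-in for the project's JSON schema: a list of required keys.
REQUIRED_KEYS = ("inn", "format")


def parse_flat_text(text: str) -> dict:
    """Step 1: turn 'Key: value' lines into a dict (illustrative layout only)."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"Cannot parse line: {line!r}")
        record[key.strip().lower()] = value.strip()
    return record


def validate(record: dict) -> None:
    """Step 2: simplified check standing in for JSON Schema validation."""
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        raise ValueError(f"Missing required keys: {missing}")


def serialise(record: dict) -> str:
    """Step 3: render the validated record as JSON text."""
    return json.dumps(record, indent=2)


def convert(text: str) -> str:
    """Run the three steps in order and return the JSON text."""
    record = parse_flat_text(text)
    validate(record)
    return serialise(record)


if __name__ == "__main__":
    # Made-up record, not real annotation data.
    print(convert("INN: examplumab\nFormat: whole mAb\n"))
```

Keeping validation as a separate step between parsing and serialisation means a malformed record fails before any output is produced.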
Features include:
- Built-in validation against a JSON schema
Installation
0. For users running Ubuntu or any other OS that requires an explicit python3 shell command, either replace python with python3 in any of the commands below, or run the following:
       sudo apt install python-is-python3
1. Clone the repo and move into it:
       git clone https://github.com/greglv93/Antibody-Annotation-to-JSON.git && cd Antibody-Annotation-to-JSON
2. (Optional) Create a virtual environment to avoid cluttering your base environment and to avoid conflicts with OS package managers such as Homebrew:
   Linux/macOS:
       python -m venv .venv/ && source .venv/bin/activate
   Windows:
       python -m venv .venv\ && .venv\Scripts\activate
3. Install with pip:
       python -m pip install --upgrade pip
   Option 1: For a regular installation:
       python -m pip install .
   (python -m pip install --upgrade . needs to be run to update the installation with any changes to the source code)
   Option 2: For developers who want an editable install with live feedback on changes to the code (also refer to the CONTRIBUTING guidelines):
       python -m pip install --editable ".[dev]"
       python -m pip install --force-reinstall sourcemeta-jsonschema
   The last line is important because it resolves a CLI name conflict: sourcemeta's tool ships a jsonschema executable, but this can get replaced by the CLI executable from the Python jsonschema library, which is installed as a runtime dependency of this project. The 'force-reinstall' command guarantees that sourcemeta's jsonschema CLI ends up on top, overwriting the executable installed by the Python jsonschema library. The Python library is needed at runtime but its command-line executable is not (and will in fact be deprecated in future versions), while sourcemeta's tool should be available from the CLI, where it is needed for some pre-commit hooks and can also be invoked for manual testing and exploration of the JSON schema.
   If the Python jsonschema library accidentally gets reinstalled and 'takes back' the CLI, run the 'force-reinstall' command again inside your project environment.
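Because the 'force-reinstall' command is meant to be run inside the project environment, it can help to confirm which environment is active and which jsonschema executable is first on the PATH. The helper below is a small sketch using only the standard library; the function name describe_environment is hypothetical and not part of this project. It reports the location of the executable but does not determine which package installed it.

```python
import shutil
import sys


def describe_environment() -> dict:
    """Summarise the active interpreter and the jsonschema executable on PATH."""
    return {
        # True when running inside a venv-style virtual environment.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # Path of the interpreter currently running this script.
        "python": sys.executable,
        # First 'jsonschema' executable found on PATH, or None if absent.
        "jsonschema_cli": shutil.which("jsonschema"),
    }


if __name__ == "__main__":
    for name, value in describe_environment().items():
        print(f"{name}: {value}")
```

If jsonschema_cli points outside the project's .venv directory while in_virtualenv is true, the executable is most likely being picked up from another installation.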
Usage
The tool is now available as an executable command from anywhere in your system terminal. To get started, run:
antibody-to-json --help
Quick usage without installing the package
For users running Ubuntu or other OSs that require an explicit python3 shell command by default, refer to step 0 above.
1. Clone the repository and optionally create a virtualenv as in steps 1-2 above.
2. Install only the dependencies:
       python -m pip install --upgrade pip
       python -m pip install -r requirements.txt
3. From the project directory (the repository root), run:
       python -m antibody_annotation_to_json.cli --help
More information
Refer to the documentation website (click the link in the sidebar of the repository's main page on GitHub) or view the documentation files here. Examples of the input data and corresponding JSON output produced and validated by the current version of the tool are available to view here and here.
This project is licensed under the MIT License.