Model Deployment API#

This service lets you fetch the status of a deployed model and request predictions through the exposed API.

Time to Integrate#

Less than 5 minutes


  1. (Optional) For deployment, we recommend using the Python SDK. However, you can also push your model through the API. To do this, first install truss and create a truss folder locally. Then compress that folder into a .tar.gz file and send a POST request with the compressed file as the body of the request. Save the id from the response object.

  2. Check the status of the model deployment by sending a GET request to the model status endpoint.

  3. Make predictions on the deployed model by sending a POST request to the prediction endpoint. The body should contain a JSON object with a model_input key, which holds the input prompt for the model.

Code Blocks#

Submit model for deployment#

Install truss

pip install truss

You can then create a truss by running the following code from within Python:

# you might have to install transformers and torch
import truss
from transformers import pipeline

def train_model():
    # Bring in model from huggingface
    return pipeline('fill-mask', model='bert-base-uncased')

my_model = train_model()

# save the model
truss.create(my_model, 'my_model')

Then convert the folder into a .tar.gz file

tar -czvf my_model.tar.gz my_model
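If you prefer to stay in Python, the same archive can be produced with the standard-library tarfile module. This is a minimal sketch, assuming the truss folder is named my_model as in the command above; the helper name compress_truss is hypothetical and not part of truss or this API.

```python
import tarfile
from pathlib import Path


def compress_truss(folder: str, archive: str) -> str:
    """Pack `folder` into a gzip-compressed tar archive and return the archive path.

    Hypothetical helper for illustration; not part of truss or the deployment API.
    """
    folder_path = Path(folder)
    if not folder_path.is_dir():
        raise FileNotFoundError(f"truss folder not found: {folder}")
    # arcname keeps only the top-level folder name inside the archive,
    # mirroring `tar -czvf my_model.tar.gz my_model`.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(str(folder_path), arcname=folder_path.name)
    return archive
```

Calling compress_truss("my_model", "my_model.tar.gz") from the directory that contains the truss folder should give an archive equivalent to the one produced by the tar command.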


Then send a POST request to the deployment endpoint with the compressed file as the body of the request. Save the id from the response object.

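import requests

url = ""

payload = {'model_name': 'test-dep-model'}

# The compressed truss is sent as a multipart file upload.
# NOTE: the form field name 'model' is an assumption; check the API reference.
files = {'model': open('my_model.tar.gz', 'rb')}

headers = {
  'Authorization': 'Token YOUR_TOKEN'
}

response = requests.request("POST", url, headers=headers, data=payload, files=files)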


Response (200)#

{
    "id": "a5822206-9680-444c-87ec-4b66a7bcfc26",
    "created": "2023-06-13T06:38:55.311751Z",
    "status": "IN_PROGRESS",
    "name": "'test-dep-model'"
}

Response (400)#

{
    "error" : {
        "message" : "something bad happened"
    }
}

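The two deployment responses above can be handled with a small helper that returns the id on success and surfaces the error message otherwise. This is a sketch under the assumption that the bodies look like the samples shown; the function name extract_deployment_id is hypothetical.

```python
def extract_deployment_id(status_code: int, body: dict) -> str:
    """Return the deployment id on success, or raise with the API's error message.

    Assumes the response shapes shown in the samples above.
    """
    if status_code == 200:
        return body["id"]
    error = body.get("error")
    # The sample error body nests the text under "error" -> "message";
    # fall back to the raw value if the error is not a JSON object.
    message = error.get("message") if isinstance(error, dict) else error
    raise RuntimeError(f"deployment request failed ({status_code}): {message}")
```

With the request code above, this would be called as extract_deployment_id(response.status_code, response.json()).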
Check status of model#



import requests

url = ''

headers = {
  'Authorization': 'Token <YOUR_API_KEY>'
}

# a status check is a plain GET request, so no request body is sent
response = requests.get(url, headers=headers)

Response (200) - MODEL_READY#

Keep track of the id for later.

{
    "id": "ozfv3zim7-9725-4b54-9b71-f527bc21e5ab",
    "created": "2023-06-13T06:38:55.311751Z",
    "name": "test-dep-model",
    "status": "MODEL_READY"
}

Response (400) - Error#

{
    "error" : {
        "message" : "something bad happened"
    }
}

Note: The status moves from ‘QUEUED’ to ‘BUILDING_MODEL’ to ‘DEPLOYING_MODEL’ to ‘MODEL_READY’. If an error occurs while processing your input, the status changes to ‘FAILURE’ and the response JSON includes an ‘ERROR’ key with more information.
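Because the status passes through several intermediate values, a client will usually poll until it reaches a terminal one. The following is a minimal sketch of such a loop, not an official client: the helper name wait_until_ready, the fetch_status callable, and the default interval and timeout are all assumptions chosen for illustration.

```python
import time
from typing import Callable


def wait_until_ready(fetch_status: Callable[[], dict],
                     interval_s: float = 10.0,
                     timeout_s: float = 1800.0) -> dict:
    """Poll `fetch_status` until the model is ready, has failed, or the timeout passes.

    `fetch_status` is any callable returning the parsed status JSON
    (hypothetical; for example a wrapper around the GET request above).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        body = fetch_status()
        status = body.get("status")
        if status == "MODEL_READY":
            return body
        if status == "FAILURE":
            # Per the note above, failure details are reported under an 'ERROR' key.
            raise RuntimeError(f"deployment failed: {body.get('ERROR')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"model still in status {status!r} after {timeout_s}s")
        time.sleep(interval_s)
```

One possible usage is wait_until_ready(lambda: requests.get(url, headers=headers).json()), reusing the url and headers from the status check. Passing the fetch step in as a callable keeps the loop independent of the HTTP client.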

Submit a prediction to the model#


Send a POST request to the prediction endpoint with the model_input as the body of the request.

import requests
import json

url = ""

payload = json.dumps({
  "model_input": [
    "steve jobs is the [MASK] of apple"
  ]
})

headers = {
  'Authorization': 'Token <YOUR_API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)


Response (200)#

{
    "id": "a5822206-9680-444c-87ec-4b66a7bcfc26",
    "model_input": [
        "steve jobs is the [MASK] of apple"
    ],
    "model_response": {
        "predictions": [
            {
                "score": 0.516463041305542,
                "sequence": "steve jobs is the founder of apple",
                "token": 3910,
                "token_str": "founder"
            },
            {
                "score": 0.3604991137981415,
                "sequence": "steve jobs is the ceo of apple",
                "token": 5766,
                "token_str": "ceo"
            },
            {
                "score": 0.04929964989423752,
                "sequence": "steve jobs is the president of apple",
                "token": 2343,
                "token_str": "president"
            },
            {
                "score": 0.021112028509378433,
                "sequence": "steve jobs is the creator of apple",
                "token": 8543,
                "token_str": "creator"
            },
            {
                "score": 0.008550147525966167,
                "sequence": "steve jobs is the father of apple",
                "token": 2269,
                "token_str": "father"
            }
        ]
    }
}

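For the fill-mask example above, a client often only needs the best candidate. This sketch picks the highest-scoring entry, assuming the response has the model_response.predictions shape shown in the sample; other model types may return a different structure, and the helper name top_prediction is hypothetical.

```python
def top_prediction(body: dict) -> dict:
    """Return the highest-scoring entry from model_response.predictions.

    Assumes the fill-mask response shape from the sample above.
    """
    predictions = body["model_response"]["predictions"]
    if not predictions:
        raise ValueError("response contains no predictions")
    # Select by score rather than relying on the list being sorted.
    return max(predictions, key=lambda p: p["score"])
```

With the request code above, top_prediction(response.json())["token_str"] would give the most likely replacement for the [MASK] token.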
Response (400)#

{
    "error": "some error occured when requesting job status",
    "full_message": [
        "{'error': ErrorDetail(string='model not ready', code='permission_denied'), 'reasons': [ErrorDetail(string='model not ready', code='permission_denied'), ErrorDetail(string='model is still being built or deployed', code='permission_denied')]}"
    ]
}