Model Deployment API#

This service allows you to fetch the status of a deployed model and call predictions on the exposed API.

Time to Integrate#

Less than 5 minutes

Instructions#

  1. (Optional) For deployment we recommend using the Python SDK. However, you can also push your model using the API. To do this, first install truss and use it to create a truss folder locally. Then compress that truss folder into a .tar.gz file and send a POST request to https://api.slashml.com/model-deployment/v1/models with the compressed file as the body of the request. Save the id from the response object.

  2. Check the status of the model deployment by sending a GET request to https://api.slashml.com/model-deployment/v1/models/YOUR-MODEL-ID/status

  3. You can make predictions on the deployed model by sending a POST request to https://api.slashml.com/model-deployment/v1/models/YOUR-MODEL-ID/predict. The body should contain a JSON object with a model_input key, which holds the input prompt for the model.

Code Blocks#

Submit model for deployment#

Install truss

pip install truss

You can then create a truss folder by running the following from within Python:

# you might have to install transformers and torch
import truss
from transformers import pipeline

def train_model():
    # Bring in a pretrained model from Hugging Face
    return pipeline('fill-mask', model='bert-base-uncased')

my_model = train_model()

# save the model as a truss folder named 'my_model'
truss.create(my_model, 'my_model')

Then convert the folder into a .tar.gz file

tar -czvf my_model.tar.gz my_model
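If you prefer to stay in Python, the same archive can be produced with the standard library's tarfile module. This is a sketch; the folder and file names here are placeholders standing in for a real truss directory:

```python
import os
import tarfile

# Hypothetical equivalent of `tar -czvf my_model.tar.gz my_model`,
# using Python's tarfile module instead of the shell.
def compress_truss_dir(folder: str, out: str) -> str:
    with tarfile.open(out, "w:gz") as tar:
        # arcname keeps the folder name as the top-level entry in the archive
        tar.add(folder, arcname=os.path.basename(folder))
    return out

# Demo with a throwaway folder standing in for a real truss directory
os.makedirs("my_model_demo", exist_ok=True)
with open(os.path.join("my_model_demo", "config.yaml"), "w") as f:
    f.write("model_name: demo\n")

archive = compress_truss_dir("my_model_demo", "my_model_demo.tar.gz")
print(tarfile.is_tarfile(archive))  # -> True
```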

Request#

Then send a POST request to https://api.slashml.com/model-deployment/v1/models with the compressed file as the body of the request. Save the id from the response object.

import requests

url = "https://api.slashml.com/model-deployment/v1/models/"

payload={'model_name': 'test-dep-model'}

files=[
  ('model_file',('my_model.tar.gz',open('path/to/my_model.tar.gz','rb'),'application/octet-stream'))
]

headers = {
  'Authorization': 'Token YOUR_TOKEN'
}

response = requests.request("POST", url, headers=headers, data=payload, files=files)

print(response.json())

Response (200)#

{
    "id": "a5822206-9680-444c-87ec-4b66a7bcfc26",
    "created": "2023-06-13T06:38:55.311751Z",
    "status": "IN_PROGRESS",
    "name": "test-dep-model"
}

Response (400)#

{
    "error" : {
        "message" : "something bad happened",
    }
}

Check status of model#

GET https://api.slashml.com/model-deployment/v1/models/YOUR-MODEL-ID/status

Request#

import requests

url = 'https://api.slashml.com/model-deployment/v1/models/YOUR-MODEL-ID/status'

headers = {
  'Authorization': 'Token <YOUR_API_KEY>'
}

response = requests.get(url, headers=headers)
print(response.json())

Response (200) - MODEL_READY#

{
    "id": "ozfv3zim7-9725-4b54-9b71-f527bc21e5ab",
    "created": "2023-06-13T06:38:55.311751Z",
    "name": "test-dep-model",
    "status": "MODEL_READY"
}

Response (400) - Error#

{
    "error" : {
        "message" : "something bad happened"
    }
}

Note: The status will go from 'QUEUED' to 'BUILDING_MODEL' to 'DEPLOYING_MODEL' to 'MODEL_READY'. If there's an error processing your input, the status will go to 'FAILURE' and there will be an 'ERROR' key in the response JSON which will contain more information.
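The status transitions above can be handled with a small polling loop. This is a sketch, not part of the API: the wait_for_model helper and its interval/timeout defaults are illustrative, while the endpoint, header, and status names come from the docs above.

```python
import time

import requests

# Terminal statuses per the docs: the model either becomes ready or fails.
TERMINAL_STATUSES = {"MODEL_READY", "FAILURE"}

def wait_for_model(model_id: str, token: str,
                   interval: float = 10.0, timeout: float = 600.0) -> str:
    """Poll the status endpoint until the model reaches a terminal state."""
    url = f"https://api.slashml.com/model-deployment/v1/models/{model_id}/status"
    headers = {"Authorization": f"Token {token}"}
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = requests.get(url, headers=headers).json().get("status")
        if status in TERMINAL_STATUSES:
            return status
        # still QUEUED / BUILDING_MODEL / DEPLOYING_MODEL; wait and retry
        time.sleep(interval)
    raise TimeoutError(f"model {model_id} not ready after {timeout}s")
```

Only call /predict once this returns "MODEL_READY"; predicting earlier yields the "model not ready" error shown at the end of this page.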

Submit a prediction to the model#

Request#

Then send a POST request to https://api.slashml.com/model-deployment/v1/models/YOUR-MODEL-ID/predict with the model_input as the body of the request.

import requests
import json

url = "https://api.slashml.com/model-deployment/v1/models/YOUR-MODEL-ID/predict"

payload = json.dumps({
  "model_input": [
    "steve jobs is the [MASK] of apple"
  ]
})

headers = {
  'Authorization': 'Token YOUR_TOKEN',
  'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)

Response (200)#

{
    "id": "a5822206-9680-444c-87ec-4b66a7bcfc26",
    "model_input": [
        "steve jobs is the [MASK] of apple"
    ],
    "model_response": {
        "predictions": [
            {
                "score": 0.516463041305542,
                "sequence": "steve jobs is the founder of apple",
                "token": 3910,
                "token_str": "founder"
            },
            {
                "score": 0.3604991137981415,
                "sequence": "steve jobs is the ceo of apple",
                "token": 5766,
                "token_str": "ceo"
            },
            {
                "score": 0.04929964989423752,
                "sequence": "steve jobs is the president of apple",
                "token": 2343,
                "token_str": "president"
            },
            {
                "score": 0.021112028509378433,
                "sequence": "steve jobs is the creator of apple",
                "token": 8543,
                "token_str": "creator"
            },
            {
                "score": 0.008550147525966167,
                "sequence": "steve jobs is the father of apple",
                "token": 2269,
                "token_str": "father"
            }
        ]
    }
}

Response (400)#

{
    "error": "some error occured when requesting job status",
    "full_message": [
        "{'error': ErrorDetail(string='model not ready', code='permission_denied'), 'reasons': [ErrorDetail(string='model not ready', code='permission_denied'), ErrorDetail(string='model is still being built or deployed', code='permission_denied')]}"
    ]
}
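Once a prediction succeeds, the predictions list in the 200 response can be ranked client-side by score. A minimal sketch; the response_json literal below is a truncated copy of the 200 example above:

```python
# Pick the highest-scoring fill-mask candidate from a successful
# /predict response. response_json mirrors the 200 example (truncated).
response_json = {
    "model_response": {
        "predictions": [
            {"score": 0.516463041305542,
             "sequence": "steve jobs is the founder of apple",
             "token_str": "founder"},
            {"score": 0.3604991137981415,
             "sequence": "steve jobs is the ceo of apple",
             "token_str": "ceo"},
        ]
    }
}

best = max(response_json["model_response"]["predictions"],
           key=lambda p: p["score"])
print(best["sequence"])  # -> steve jobs is the founder of apple
```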