Model Deployment API#
This service allows you to fetch the status of a deployed model and make predictions against the exposed API.
Time to Integrate#
Less than 5 minutes
Instructions#
(Optional) For deployment we recommend using the Python SDK. However, you can also push your model using the API. To do this, you will first have to install truss and create a truss folder locally. Then compress that truss folder into a .tar.gz file and send a POST request to https://api.slashml.com/model-deployment/v1/models with the compressed file as the body of the request. Save the id in the response object.
Check the status of the model deployment by sending a GET request to https://api.slashml.com/model-deployment/v1/models/YOUR-MODEL-ID/status
You can make predictions on the deployed model by sending a POST request to https://api.slashml.com/model-deployment/v1/models/YOUR-MODEL-ID/predict. The body should contain a JSON object with model_input, which is the input prompt to the model.
Code Blocks#
Submit model for deployment#
Install truss
pip install truss
You can then create a truss by running the following code from within Python
# you might have to install transformers and torch
import truss
from transformers import pipeline

def train_model():
    # Bring in the model from Hugging Face
    return pipeline('fill-mask', model='bert-base-uncased')

my_model = train_model()

# save the model as a truss folder named 'my_model'
truss.create(my_model, 'my_model')
Then convert the folder into a .tar.gz file
tar -czvf my_model.tar.gz my_model
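If you prefer to stay in Python, the same archive can be created with the standard-library tarfile module. This is a minimal sketch; the helper function name is ours, while the folder and archive names mirror the example above.

```python
import tarfile

def compress_truss_folder(folder="my_model", archive="my_model.tar.gz"):
    # Write the truss folder into a gzip-compressed tarball,
    # keeping the folder name as the top-level entry in the archive.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(folder, arcname=folder)
    return archive
```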
Request#
Then send a POST request to https://api.slashml.com/model-deployment/v1/models with the compressed file as the body of the request. Save the id in the response object.
import requests

url = "https://api.slashml.com/model-deployment/v1/models/"

payload = {'model_name': 'test-dep-model'}
files = [
    ('model_file', ('my_model.tar.gz', open('path/to/my_model.tar.gz', 'rb'), 'application/octet-stream'))
]
headers = {
    'Authorization': 'Token YOUR_TOKEN'
}

response = requests.request("POST", url, headers=headers, data=payload, files=files)
print(response.json())
Response (200)#
{
"id": "a5822206-9680-444c-87ec-4b66a7bcfc26",
"created": "2023-06-13T06:38:55.311751Z",
"status": "IN_PROGRESS",
"name": "'test-dep-model'"
}
Response (400)#
{
"error" : {
"message" : "something bad happened",
}
}
Check status of model#
GET https://api.slashml.com/model-deployment/v1/models/YOUR-MODEL-ID/status
Request#
import requests
url = 'https://api.slashml.com/model-deployment/v1/models/YOUR-MODEL-ID/status'
headers = {
'Authorization': 'Token <YOUR_API_KEY>'
}
response = requests.get(url, headers=headers)
print(response.json())
Response (200) - MODEL_READY#
{
    # keep track of the id for later
    "id": "ozfv3zim7-9725-4b54-9b71-f527bc21e5ab",
    "created": "2023-06-13T06:38:55.311751Z",
    "name": "test-dep-model",
    "status": "MODEL_READY"
}
Response (400) - Error#
{
"error" : {
"message" : "something bad happened"
}
}
Note: The status will go from ‘QUEUED’ to ‘BUILDING_MODEL’ to ‘DEPLOYING_MODEL’ to ‘MODEL_READY’. If there’s an error processing your input, the status will go to ‘FAILURE’ and there will be an ‘ERROR’ key in the response JSON which will contain more information.
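Given those status transitions, a small polling helper can wait for the deployment to finish before you start sending predictions. This is a hedged sketch, not part of an official SDK: the endpoint and Token header match the examples above, while the function name, interval, and timeout are our own choices.

```python
import time
import requests

def wait_until_ready(model_id, api_key, interval=10, timeout=600):
    # Poll the status endpoint until the model is ready, fails, or we time out.
    url = f"https://api.slashml.com/model-deployment/v1/models/{model_id}/status"
    headers = {"Authorization": f"Token {api_key}"}
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = requests.get(url, headers=headers).json().get("status")
        if status == "MODEL_READY":
            return True
        if status == "FAILURE":
            return False
        # still QUEUED / BUILDING_MODEL / DEPLOYING_MODEL
        time.sleep(interval)
    raise TimeoutError("model did not become ready in time")
```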
Submit a prediction to the model#
Request#
Then send a POST request to https://api.slashml.com/model-deployment/v1/models/YOUR-MODEL-ID/predict with the model input as the body of the request.
import requests
import json
url = "https://api.slashml.com/model-deployment/v1/models/YOUR-MODEL-ID/predict"
payload = json.dumps({
"model_input": [
"steve jobs is the [MASK] of apple"
]
})
headers = {
    'Authorization': 'Token YOUR_TOKEN',
    'Content-Type': 'application/json'
}
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
Response (200)#
{
"id": "a5822206-9680-444c-87ec-4b66a7bcfc26",
"model_input": [
"steve jobs is the [MASK] of apple"
],
"model_response": {
"predictions": [
{
"score": 0.516463041305542,
"sequence": "steve jobs is the founder of apple",
"token": 3910,
"token_str": "founder"
},
{
"score": 0.3604991137981415,
"sequence": "steve jobs is the ceo of apple",
"token": 5766,
"token_str": "ceo"
},
{
"score": 0.04929964989423752,
"sequence": "steve jobs is the president of apple",
"token": 2343,
"token_str": "president"
},
{
"score": 0.021112028509378433,
"sequence": "steve jobs is the creator of apple",
"token": 8543,
"token_str": "creator"
},
{
"score": 0.008550147525966167,
"sequence": "steve jobs is the father of apple",
"token": 2269,
"token_str": "father"
}
]
}
}
Response (400)#
{
"error": "some error occured when requesting job status",
"full_message": [
"{'error': ErrorDetail(string='model not ready', code='permission_denied'), 'reasons': [ErrorDetail(string='model not ready', code='permission_denied'), ErrorDetail(string='model is still being built or deployed', code='permission_denied')]}"
]
}