Run GPT-2 Inside an Enclave using Python

In this post you will build an app that runs GPT-2 in an Enclave. You can follow along by cloning this repo.

Using language model APIs and services is common; you’ve probably used one in the past week for a programming task or to help write a blog post. However, to use some of the most popular services available, you must relinquish privacy and security: personal information is passed to these providers, and sensitive data within prompts may be shared and stored.

One way to harness the power of language models while protecting personal and prompt data is to run one in a secure enclave, a cloud-based Trusted Execution Environment (TEE). Evervault Enclaves allow you to build, deploy, and scale apps running in Docker containers inside secure enclaves. In this guide, you will use Python to run a language model inside an Enclave.

When the app runs, the result will look like this.

Input:

Output:

As you can see, it’s not a perfect output. We’ve had to use an older, smaller language model, and the results won’t be as detailed as those from more advanced models (though, of course, even newer models make mistakes).

To try training and fine-tuning GPT-2 or minGPT yourself, you can use this Google Colab notebook. It can run on CPUs, but will be faster if you have access to GPUs.

Prerequisites

Set up

Install the Enclaves CLI by running the following command.
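
A sketch of what the install step typically looks like, assuming the standard Evervault install script (check the Evervault docs for the current URL and version):

```sh
curl https://cli.evervault.com/v4/install -sL | sh
```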

Downloading the Model

The model used by default is the pretrained GPT-2 model from Hugging Face. You can switch to another GPT-2 variant, such as gpt2-xl, by verifying that it is available on Hugging Face and then changing the MODEL_NAME environment variable.
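
For illustration, the app can read that variable like so (a minimal sketch; the exact handling in the repo may differ):

```python
import os

# Defaults to the base "gpt2" checkpoint; set MODEL_NAME to e.g. "gpt2-xl"
# to use any other GPT-2 checkpoint available on Hugging Face.
MODEL_NAME = os.environ.get("MODEL_NAME", "gpt2")
```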

Set up the Python app

The back end of the app takes a pretrained model and passes the prompt you submit in a POST request to the model to generate responses.

Load the model

This code sets the device to CPU, since the Enclave won’t have access to GPUs. It then downloads the model weights and loads them.
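
A minimal sketch of that step using the Hugging Face transformers API (the repo’s exact code may differ):

```python
import os

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Enclaves run on CPU, so pin the device explicitly.
device = torch.device("cpu")

MODEL_NAME = os.environ.get("MODEL_NAME", "gpt2")

# Download the pretrained weights and tokenizer, move the model to the CPU,
# and switch it to inference mode.
tokenizer = GPT2Tokenizer.from_pretrained(MODEL_NAME)
model = GPT2LMHeadModel.from_pretrained(MODEL_NAME).to(device)
model.eval()
```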

Get the prompt

The app runs as a simple Flask app that retrieves the prompt from the data sent in a POST request. You can override num_samples to increase or decrease the number of responses sent back. The response will also include the time the text generation took.
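
A sketch of that route, assuming a JSON body with a prompt field and an optional num_samples field (the field names and the generate_responses helper are illustrative, not the repo’s exact code):

```python
import time

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/", methods=["POST"])
def handle_prompt():
    data = request.get_json()
    prompt = data["prompt"]
    # Override num_samples in the request body to get more or fewer responses.
    num_samples = int(data.get("num_samples", 1))

    start = time.time()
    responses = generate_responses(prompt, num_samples)  # defined in the next section
    elapsed = time.time() - start

    # Return the generated text along with how long generation took.
    return jsonify({"responses": responses, "generation_time_seconds": elapsed})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8008)
```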

Generate the response

This code takes the prompt and generates the requested number of responses. It uses the GPT-2 tokenizer provided by Hugging Face and returns the generated responses concatenated into a single string.
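
A sketch of that function using the transformers generate API (the sampling parameters are illustrative; the repo may use a minGPT-style sampling loop instead):

```python
def generate_responses(prompt, num_samples):
    # Encode the prompt with the GPT-2 tokenizer.
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)

    # Sample num_samples continuations from the model.
    with torch.no_grad():
        outputs = model.generate(
            input_ids,
            max_length=100,
            do_sample=True,
            top_k=50,
            num_return_sequences=num_samples,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode each sample and concatenate them into a single string.
    return "\n\n".join(
        tokenizer.decode(output, skip_special_tokens=True) for output in outputs
    )
```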

Open up the Dockerfile. It uses a virtual environment to install the libraries required to run the app. It also tells Docker that your web server will listen on port 8008, which matches the Flask server port defined in app.py.
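
A sketch of what that Dockerfile might look like (the base image, file names, and entrypoint are assumptions; the repo’s Dockerfile is authoritative):

```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Create a virtual environment and put it on the PATH.
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Install the app's dependencies.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# The Flask server in app.py listens on this port.
EXPOSE 8008

CMD ["python", "app.py"]
```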

Initialize the Enclave

First, make sure that you have Docker running. Then, in your terminal, run the following command to initialize the Enclave. You can use the suggested name below or change it to one of your choosing.
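
Assuming the CLI binary is ev-enclave and the flags follow its documented init pattern (check the Evervault docs for the exact command), it looks roughly like this, with gpt2-enclave as a placeholder name:

```sh
ev-enclave init -f ./Dockerfile --name gpt2-enclave
```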

You should see that a cert.pem, a key.pem, and an enclave.toml are generated. Open up the enclave.toml; it contains configuration details that will be used when deploying your Enclave.

Add Environment Variables

You can add environment variables by going to Enclaves > Environment in your Evervault Dashboard. Be sure to check the “secret” box on any sensitive credentials (like the access keys). This will encrypt them and they will only be decrypted in the Enclave.

Build the Enclave

Now build the Enclave using the following command. This also produces a Dockerfile used to build the EIF (Enclave Image File) that will run inside the Enclave (it may take a few minutes).
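
Again assuming the ev-enclave binary, the build step looks roughly like:

```sh
ev-enclave build
```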

Deploy the Enclave

Finally, you can deploy the Enclave.
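
Under the same assumption about the CLI binary, the deploy step looks roughly like:

```sh
ev-enclave deploy
```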

Note that if you are committing your code, you will want to add the .eif file to your .gitignore, as it is a large file.

Make a Request

In your terminal, make a curl request to the Enclave endpoint and pass in a prompt as JSON.
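
For example (the hostname below is a placeholder; use the Enclave URL shown in your Evervault Dashboard, and any prompt you like):

```sh
curl https://<enclave-name>.<app-uuid>.enclave.evervault.com/ \
  -X POST \
  -H "Content-Type: application/json" \
  --data '{"prompt": "Flowers are"}'
```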

When the request succeeds, you should get a response that looks like the one below.

Conclusion

In this guide, you used Python to run GPT-2 inside an Enclave. If you ran into an issue or have any questions about this guide, feel free to raise them on GitHub. Running a different model in an Enclave? Let us know; we'd love to hear about it!