In the past year, ChatGPT has brought AI into the limelight. GPT-3 was the model in use when I first heard of it last year. Just as it did with me, it has captured the interest of many people and businesses.
ChatGPT can be used for many things, such as finding answers to daily questions, solving maths problems, writing blog posts, and writing code, to name a few.
In this tutorial, we’ll connect a SwiftUI app to OpenAI’s ChatGPT API to provide a chatbot for your mobile app. The tutorial will briefly cover how to create a simple user interface and connect to OpenAI’s API to receive answers to questions you ask within the app.
The application here is just a chatbot, but you can use what you learn to add value to any kind of app. For example, in a note-taking app, a user could write some notes and then use AI to add to, summarise, or correct what has been written.
OpenAI’s API is very similar to many other APIs on the internet: you create an HTTP request, such as a POST request, add the bearer token to the header, add items to the body, and then send it. The API then passes back a result that contains the answer you need.
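In Swift terms, the shape of such a request looks roughly like this. This is just a sketch of the pattern; the endpoint, key, and body here are placeholders, not OpenAI's actual values:

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

// Sketch of a bearer-authenticated POST request (placeholder endpoint and key).
let url = URL(string: "https://api.example.com/v1/endpoint")!
var request = URLRequest(url: url)
request.httpMethod = "POST"
request.setValue("Bearer YOUR_API_KEY", forHTTPHeaderField: "Authorization")
request.setValue("application/json", forHTTPHeaderField: "Content-Type")
request.httpBody = Data(#"{"question": "Hello"}"#.utf8)
```

We'll build the real version of this request for OpenAI's endpoint later in the tutorial.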
What is Needed
To create an iPhone app, you’ll need a Mac with Xcode installed. You’ll also need an OpenAI account with some credit added. Be sure to limit your monthly spending, although OpenAI API access is cheap if it’s just you using it: fractions of a penny per request. I have my account capped at $10/month, and so far I have used about 1 cent in tests. You’ll also need to grab your API key and keep it safe, as you will need it for testing the app.
To clarify some of the terminology I have mentioned: OpenAI is the company that created ChatGPT, amongst other things. We will use the API that OpenAI offers to connect to the ChatGPT model(s), ask questions, and get responses.
Creating a new SwiftUI App
We’ll begin by creating a new SwiftUI app.
- Open Xcode
- Click “Create new Xcode Project”
- Select iOS across the top, select App, and click Next
- Give your app a name
- Select your team
- Select SwiftUI and Swift for Interface and Language
- Leave the boxes unchecked
- Choose a location to save the project and save it
When loaded, you can open ContentView.swift in the left sidebar. Rather than go through the whole architecture of an app by breaking it up into classes, we’ll have all of the code in ContentView.swift so that you can see it working together.
You’ll see a body property in this file whose type is some View. This is where we’ll add our user interface. Given that the ChatGPT app is relatively simple, with a text entry box at the bottom and an area for the response that fills the rest of the screen, we’ll go for something similar.
// Note: add `import Combine` at the top of ContentView.swift — AnyCancellable needs it.
@State private var prompt: String = ""
@State private var apiKey: String = ""
@State private var resultText: String = ""
@State private var cancellable: AnyCancellable? = nil
@State private var messages: [Message] = []

var body: some View {
    VStack {
        TextField("API Key", text: $apiKey)
            .textFieldStyle(.roundedBorder)
            .padding()
        TextField("Enter Question", text: $prompt)
            .textFieldStyle(.roundedBorder)
            .padding()
        Button(action: {
            submit()
        }) {
            Image(systemName: "paperplane")
                .font(.title)
        }
        .padding()
        Text(resultText)
            .padding()
    }
}
In the code above we have five properties using the @State property wrapper.
- We have the prompt (meaning the question that the user will write)
- apiKey is where you will paste in your API key from OpenAI (note that this is not how you would handle the key in a production app; it would be stored securely, for example in the Keychain)
- resultText will contain any response from OpenAI
- cancellable will be explained later but is related to the HTTP request that will be made to OpenAI
- messages is an array of messages. When you ask a question it will be added to the array, and when you receive an answer, that will be added to the array. By passing in the previous parts of the conversation it allows ChatGPT to keep context of where the conversation has been.
A quick mention of @State and what it does. @State is a property wrapper that defines mutable state within a view. Whenever the property is modified, @State triggers an update to the view. For example, when we reach out to OpenAI and get a response, our code can set resultText to the answer. Because resultText is marked as @State, the view reloads and the text appears on screen.
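As a minimal illustration of this behaviour (a standalone view, not part of our app), tapping the button below mutates count, and because count is marked @State, SwiftUI re-renders the body with the new value:

```swift
import SwiftUI

struct CounterView: View {
    @State private var count = 0

    var body: some View {
        // Each tap mutates the @State property, which triggers a re-render,
        // so the button's label always shows the current count.
        Button("Tapped \(count) times") {
            count += 1
        }
    }
}
```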
Let’s move on to the body. It is fairly simple: a VStack (vertical stack) containing two text fields, a button, and a Text view at the bottom.
The first TextField provides a place for you to paste your API key in.
Next is our prompt which is the text you enter when asking a question to the chat bot.
You’ll notice that the text is bound to $apiKey on the first text field and $prompt on the second. The $ sign indicates a two-way binding.
Two-way binding means that if you type text into the text field after the app launches, the apiKey property will have that value stored in it. Equally, if you programmatically set the apiKey property, the view will update, hence the name “two-way binding”.
We next have a button that has an action. When you tap the button it calls the submit() function. We haven’t written that yet, but will do so in a moment, so don’t worry about any errors you might see in the code for now.
Finally we have a Text view that contains resultText. Notice that resultText does not have the $ sign. It isn’t needed because a Text view cannot be edited, so there is nothing to bind two ways; it only displays text. Any change to the property will still trigger the view to update.
OpenAI API Documentation
Before moving on, we’ll take a quick look at OpenAI’s API documentation. For this app, we’ll use the Chat Completions endpoint, which is described in OpenAI’s API reference.
This section of the documentation tells us everything we need to know to send a POST request to OpenAI and get a response back. If you were so inclined, you could test it now in an app such as Postman on Mac or Windows: create a POST request, paste the parameter section from the documentation into the body, and add your bearer token. When you make the request, you’ll get a JSON response back.
Let’s take a look at how we can make this request from our SwiftUI app: how to create the bearer token and request body, and then how to receive the response and convert the JSON into types our SwiftUI app can work with. If it sounds overwhelming, it isn’t really; there are just a few moving parts to get used to, which comes with practice.
OpenAI Chat Completion Response
Let’s begin by looking at the response that OpenAI provides for a chat completion request:
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-3.5-turbo-0613",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "\n\nHello there, how may I assist you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
Each response that comes back from OpenAI follows this format for the endpoint we are using. The format is JSON; if you are unfamiliar with it, check out json.org. In short, the response is a dictionary of key/value pairs, where curly braces delimit objects and square brackets delimit arrays.
The outermost curly braces (the first and last characters) mark the top-level object, a dictionary. Several of its values are simple strings or integers (see “chatcmpl-123” or 1677652288 in the sample response above).
Next, the choices key holds a list of objects (shown by the square [] brackets). Each object has an index, which is an int, a message, which is a nested object with its own keys and values, and a finish_reason string.
usage is the last key, and its value is another dictionary containing three integer values.
To work with this data in Swift we need to define a few structs that mirror it, so that we can use Codable and Decodable to convert to and from JSON.
At the highest level, meaning the outer curly braces (the first and last characters), we can represent the data as follows:
struct APIResponse: Decodable {
    let id: String?
    let object: String
    let created: Int
    let model: String
    let choices: [Choice]
    let usage: Usage
}
This struct is called APIResponse. It represents the top-level object that comes back as a result from OpenAI.
Note that it references two types, Choice and Usage, that we haven’t defined yet.
struct Choice: Decodable {
    let index: Int
    let message: Message
    let finish_reason: String
}
This is the Choice struct. It has an index, message, and a finish reason. The first and last are an int and string, respectively. Message is a Message struct which we can define here:
struct Message: Codable {
    let role: Role
    let content: String
}
A message contains a Role and a String content value. We could declare role as a plain string, but because the documentation explains that there are only four possibilities for this property, we can capture them in an enum as follows:
enum Role: String, Codable {
    case system
    case user
    case assistant
    case function
}
This means that role can only be one of the cases above and nothing else.
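Because Role is a String-backed enum conforming to Codable, each case encodes to and decodes from its raw string value. A quick sketch of that behaviour:

```swift
import Foundation

enum Role: String, Codable {
    case system
    case user
    case assistant
    case function
}

// The raw value is exactly the string that appears in the JSON.
let role = Role(rawValue: "assistant")     // .assistant
let unknown = Role(rawValue: "moderator")  // nil — not one of the four cases

// When decoding JSON, each string maps onto a case; an unexpected
// string would throw a DecodingError instead of producing nil.
let decoded = try JSONDecoder().decode([Role].self,
                                       from: Data(#"["user", "system"]"#.utf8))
```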
Moving onto usage, we have the following struct:
struct Usage: Decodable {
    let prompt_tokens: Int
    let completion_tokens: Int
    let total_tokens: Int
}
Notice that all of these types conform to either Codable or Decodable, which is what lets us convert to and from JSON.
When we receive a response from the ChatGPT API, we will now be able to decode it thanks to the structures declared above.
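As a sanity check, the structs can decode the sample response from the documentation. This standalone sketch repeats the declarations from above and uses only Foundation's JSONDecoder; the id and model strings are just the documentation's example values:

```swift
import Foundation

// The structs as declared above.
struct APIResponse: Decodable {
    let id: String?
    let object: String
    let created: Int
    let model: String
    let choices: [Choice]
    let usage: Usage
}

struct Choice: Decodable {
    let index: Int
    let message: Message
    let finish_reason: String
}

struct Message: Codable {
    let role: Role
    let content: String
}

enum Role: String, Codable {
    case system, user, assistant, function
}

struct Usage: Decodable {
    let prompt_tokens: Int
    let completion_tokens: Int
    let total_tokens: Int
}

// The documentation's sample response, pasted in as a string.
let sample = """
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-3.5-turbo-0613",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello there, how may I assist you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21 }
}
"""

// Decoding succeeds because the struct fields match the JSON keys.
let response = try JSONDecoder().decode(APIResponse.self, from: Data(sample.utf8))
print(response.choices.first?.message.content ?? "No response")
// → Hello there, how may I assist you today?
```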
With the response handled, we next need to create the APIRequest. The API expects a JSON body to be passed in, and we need a way to produce it. This is done in almost the same way as handling the response:
struct APIRequest: Encodable {
    let model: Model
    let messages: [Message]
}
Here we have an APIRequest model. We use that name as it follows common conventions, although you can name it whatever you want.
The OpenAI documentation tells us that we need to pass in the model and a list of messages so that we can ask a question.
Model in this example is an enum.
enum Model: String, Codable {
    case gpt_3_5_turbo = "gpt-3.5-turbo"
}
I prefer working with enums here so that if the model name is used in many places and a new model comes out, we can just add a case and it becomes available wherever we need it in the app.
The messages are a list of Messages. We already have a message model for the response, and looking closely at the documentation the message request is the same as the message response which means we can just reuse it.
Note that it is a list of messages. This allows us to put our initial question in the request, and when we get a response we can add that too. We can then append our next message and keep building the list so that OpenAI can follow the context of the conversation. The chat completions API itself is stateless: it only knows about what you send in the messages array, so passing in the previous questions and answers is what preserves the context.
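Putting the request side together, here's a sketch that builds a short conversation and encodes it with JSONEncoder. The struct and enum declarations are repeated from above so the example stands alone, and the conversation text is invented for illustration:

```swift
import Foundation

struct Message: Codable {
    let role: Role
    let content: String
}

enum Role: String, Codable {
    case system, user, assistant, function
}

enum Model: String, Codable {
    case gpt_3_5_turbo = "gpt-3.5-turbo"
}

struct APIRequest: Encodable {
    let model: Model
    let messages: [Message]
}

// A running conversation: each question and answer is appended in turn,
// so the API has the full context when the next question arrives.
let messages: [Message] = [
    Message(role: .user, content: "What is the capital of France?"),
    Message(role: .assistant, content: "The capital of France is Paris."),
    Message(role: .user, content: "What is its population?")
]

let payload = APIRequest(model: .gpt_3_5_turbo, messages: messages)
let data = try JSONEncoder().encode(payload)
let json = String(data: data, encoding: .utf8)!
// json now contains "model":"gpt-3.5-turbo" and all three messages.
```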
Asking a Question and Getting a Response
Now that we have the structs in place, we can start pulling all of this together.
func submit() {
    guard let url = URL(string: "https://api.openai.com/v1/chat/completions") else {
        print("Invalid URL")
        return
    }

    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    messages.append(Message(role: .user, content: prompt))

    do {
        let payload = APIRequest(model: .gpt_3_5_turbo, messages: messages)
        let jsonData = try JSONEncoder().encode(payload)
        request.httpBody = jsonData
    } catch {
        print("Error: \(error)")
        return
    }

    cancellable = URLSession.shared.dataTaskPublisher(for: request)
        .tryMap { $0.data }
        .decode(type: APIResponse.self, decoder: JSONDecoder())
        .receive(on: DispatchQueue.main)
        .sink(
            receiveCompletion: { completion in
                switch completion {
                case .failure(let error):
                    resultText = "Error: \(error.localizedDescription)"
                case .finished:
                    break
                }
            },
            receiveValue: { response in
                resultText = response.choices.first?.message.content ?? "No response"
                messages.append(Message(role: .assistant, content: resultText))
                prompt = ""
            }
        )
}
Here is how it all works behind the scenes. At the beginning of the tutorial we added a button that called a submit function; this is that function. Let’s go through it:
We create a URL from a string to the endpoint we are interested in, in this case we want chat completions.
Assuming the URL is created successfully (and with a hard-coded string there is no reason it wouldn’t be), we create a URLRequest with it.
We then add some items to the header, specifically Authorization and Content-Type. The Authorization header carries a bearer token, populated with the apiKey from the first text field.
We also set the httpMethod to POST.
You may have noticed that at the top, we have an @State property for messages. We now need to append our message, which is whatever is in the second text field.
At this point, the role is set to .user (from the enum) and the message is appended.
Next we need to create a payload that can be added to the http request. Here we add the APIRequest struct with the model specified and list of messages added.
We then use JSONEncoder to encode the payload. Because APIRequest conforms to Encodable, the encoder knows how to turn it into JSON data.
When we have successfully encoded the payload into data, we add that to the http request’s body.
We now move on to making the request.
The first line assigns the subscription to the cancellable @State property. Storing it keeps the subscription alive while the view is present; if cancellable were dropped, the request would be cancelled.
tryMap extracts the data from what URLSession receives. The publisher emits a (data, response) tuple, and $0.data refers to the data element. There is more to say here, but I’ll skip it in this tutorial.
Next is decode. This attempts to decode the received data into an APIResponse which was the first struct that we created. It is set to use the JSONDecoder to attempt this. Notice how APIResponse conformed to Decodable which means that if we declared that structure correctly it should successfully decode. If you ever run into errors with coding/decoding then check to make sure the response you are getting matches the struct that you declare.
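If a decode error does occur, it helps to reproduce it in isolation. This sketch reuses the Usage struct from earlier, deliberately decodes JSON that is missing a key, and catches the resulting DecodingError to see which key was the problem:

```swift
import Foundation

struct Usage: Decodable {
    let prompt_tokens: Int
    let completion_tokens: Int
    let total_tokens: Int
}

// "total_tokens" is missing, so decoding will fail with a keyNotFound error.
let badJSON = #"{"prompt_tokens": 9, "completion_tokens": 12}"#

var decodeFailed = false
do {
    _ = try JSONDecoder().decode(Usage.self, from: Data(badJSON.utf8))
} catch DecodingError.keyNotFound(let key, _) {
    decodeFailed = true
    print("Missing key: \(key.stringValue)")  // → Missing key: total_tokens
} catch {
    decodeFailed = true
    print("Other decoding error: \(error)")
}
```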
Next up is receive(on: DispatchQueue.main). Because this request will update the view, it should be executed on the main queue.
sink comes next and subscribes to the values that are emitted. If there’s a problem, we set resultText to the error. If you got your structs wrong, you’ll see this called with an explanation that something didn’t work out.
Finally we get to receiveValue which is what we hit when all is well. Assuming we got something back from ChatGPT we can set resultText to the response and this, being an @State property wrapper, will update the view to contain the answer. I added an extra line in here that appends the message in the response to the array of messages so that if another question is asked, it will contain the context. I also set prompt to an empty string so that the user doesn’t need to manually delete their last question.
Suggestions
I mentioned earlier that you don’t just need to use this for getting answers to questions. You could adapt the code to have two textfields, one with a note you are taking in a meeting and another to ask ChatGPT to do something with that note. For example, you could write notes for a meeting and then at the end ask ChatGPT to extract all actionable items from your notes. It will then create a list of items for you that it believes are actionable.
To accomplish this you would need to grab both the meeting note and question and concatenate them in a message to send. ChatGPT is smart enough to know that you asked a question and will answer accordingly.
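One way to do the concatenation is a small helper like the one below. The function name, field names, and sample text are all hypothetical, invented for this sketch, but the pattern of combining the instruction and the note into a single user message is the idea described above:

```swift
import Foundation

struct Message: Codable {
    let role: Role
    let content: String
}

enum Role: String, Codable {
    case system, user, assistant, function
}

// Hypothetical helper: combines a meeting note and an instruction
// into one user message, so ChatGPT sees both in a single prompt.
func makeNoteMessage(note: String, instruction: String) -> Message {
    Message(role: .user, content: "\(instruction)\n\nNotes:\n\(note)")
}

let message = makeNoteMessage(
    note: "Discussed Q3 roadmap. Alice to draft the spec by Friday.",
    instruction: "Extract all actionable items from the following meeting notes."
)
```

The resulting message can be appended to the messages array and sent exactly like a typed question.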
Project Download
You can grab the project from here.