3. API & Development Specifics
The OS, programming language, package, framework and API details are given below:

| Item | Value |
| --- | --- |
| OS | Ubuntu 20.10 |
| Language | Go (1.16.x) |
| Package | aws-lambda-go |
| Framework | Serverless |
| Route | /replaceOrganisation |
| Method | POST |
| Content Type | application/json |
The POST request body accepts two parameters in JSON:
- text : Input text to be passed to the API
- add_organisation (optional) : List of organisation names to add to the default list
Example Request:
{
"text" : "Google decided to give it up for Oracle. Facebook introduces new technology",
"add_organisation" : ["Facebook"]
}
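To make the request contract concrete, here is a minimal Go sketch of decoding this body. The type and function names (`replaceRequest`, `parseRequest`) are hypothetical and not taken from main.go; only the JSON keys come from the API description. Rejecting an empty "text" is an assumption about validation, not documented behaviour.

```go
package main

import (
	"encoding/json"
	"errors"
)

// replaceRequest mirrors the JSON body of POST /replaceOrganisation.
// Hypothetical type name; the JSON keys match the documented parameters.
type replaceRequest struct {
	Text            string   `json:"text"`
	AddOrganisation []string `json:"add_organisation,omitempty"` // optional; stays nil when omitted
}

// parseRequest decodes a raw request body. Treating an empty "text" as an
// error is an assumption made for this sketch.
func parseRequest(body string) (replaceRequest, error) {
	var req replaceRequest
	if err := json.Unmarshal([]byte(body), &req); err != nil {
		return replaceRequest{}, err
	}
	if req.Text == "" {
		return replaceRequest{}, errors.New("text is required")
	}
	return req, nil
}
```

Because the optional list decodes to a nil slice when the key is absent, the handler can fall back to the default list with a simple `len(...) == 0` check.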
- The method is POST because the API accepts a JSON request body.
- The API could instead be a GET call, with the input text passed in the query string.
- However, since the input text may be long, POST is used to avoid the overly long query URLs a GET request would need.
- The "add_organisation" property is optional and can be omitted, in which case the API uses only the default list.
- If "add_organisation" is provided, the API matches all of its names in addition to the default list, after removing duplicate values.
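The merge-and-deduplicate step described above can be sketched as follows. This is an illustration, not the code in main.go: `defaultOrganisations` is a hypothetical stand-in for the real default list, and deduplication here is an exact, case-sensitive string comparison to stay consistent with the case-sensitive regex.

```go
package main

// defaultOrganisations is a hypothetical placeholder; the actual default
// list is defined in the project's main.go.
var defaultOrganisations = []string{"Google", "Oracle"}

// mergeOrganisations returns the defaults followed by any extra names,
// skipping empty strings and exact duplicates while keeping first-seen order.
func mergeOrganisations(defaults, extra []string) []string {
	seen := make(map[string]bool, len(defaults)+len(extra))
	merged := make([]string, 0, len(defaults)+len(extra))
	for _, list := range [][]string{defaults, extra} {
		for _, name := range list {
			if name == "" || seen[name] {
				continue
			}
			seen[name] = true
			merged = append(merged, name)
		}
	}
	return merged
}
```

Passing a nil `extra` slice (the "add_organisation" key was omitted) simply returns a copy of the defaults.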
Success Response - 200 OK :
{
"data" : "Google decided to give it up for Oracle. Facebook introduces new technology",
"error" : null
}
Error Response - 500 Internal Server Error :
{
"data" : null,
"error" : "Something went wrong"
}
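Both responses share one envelope in which exactly one field is null. A minimal Go sketch of producing that shape uses pointer fields, since `encoding/json` encodes a nil pointer as `null`. The type and helper names are hypothetical, not taken from main.go.

```go
package main

import "encoding/json"

// replaceResponse mirrors the documented envelope. Pointer fields let either
// field serialise as JSON null, as in the success and error examples.
type replaceResponse struct {
	Data  *string `json:"data"`
	Error *string `json:"error"`
}

// successBody builds the 200 OK body.
func successBody(text string) string {
	return mustMarshal(replaceResponse{Data: &text})
}

// errorBody builds the 500 Internal Server Error body.
func errorBody(msg string) string {
	return mustMarshal(replaceResponse{Error: &msg})
}

func mustMarshal(v replaceResponse) string {
	b, err := json.Marshal(v)
	if err != nil {
		// Marshalling two *string fields cannot fail; this only guards the invariant.
		panic(err)
	}
	return string(b)
}
```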
curl -X POST "https://kepxuc1vi5.execute-api.us-east-1.amazonaws.com/dev/replaceOrganisation" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"text\":\"Facebook, Netflix, Amazon and Google are called the FANG companies.\",\"add_organisation\":[\"Facebook\"]}"
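For callers written in Go, the following sketch builds the same request as the curl command using only the standard library. The function name `newReplaceRequest` is hypothetical. The request is only constructed here; sending it with `http.DefaultClient.Do(req)` assumes the dev deployment above is still reachable, which may no longer be the case.

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
)

// endpoint is the dev-stage URL from the curl example.
const endpoint = "https://kepxuc1vi5.execute-api.us-east-1.amazonaws.com/dev/replaceOrganisation"

// newReplaceRequest builds the POST request with the same headers and body as
// the curl command; "add_organisation" is included only when extra is non-empty.
func newReplaceRequest(text string, extra []string) (*http.Request, error) {
	payload := map[string]interface{}{"text": text}
	if len(extra) > 0 {
		payload["add_organisation"] = extra
	}
	body, err := json.Marshal(payload)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/json")
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}
```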
- The repository contains a single "main.go" file that implements the Lambda handler function for the API.
- A regular expression is built from the organisation names, which include the default list and any additional organisations passed in the request.
- The regex then checks for matches in the input string and replaces each match accordingly.
- A serverless.yml file is used to deploy the Lambda function.
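The regex step can be sketched with Go's standard `regexp` package. This is an assumption-laden illustration rather than the code in main.go: names are escaped with `regexp.QuoteMeta`, wrapped in `\b` word boundaries, and sorted longest-first because Go's alternation prefers the leftmost alternative, so "Google Cloud" must come before "Google". The exact replacement the API applies is not described here, so it is passed in as a function; `buildOrganisationRegex` and `replaceOrganisations` are hypothetical names.

```go
package main

import (
	"regexp"
	"sort"
	"strings"
)

// buildOrganisationRegex compiles one case-sensitive, whole-word pattern for
// all names. It assumes each name starts and ends with a word character,
// since \b only holds next to letters, digits or underscores.
// It returns nil when there are no non-empty names.
func buildOrganisationRegex(names []string) *regexp.Regexp {
	quoted := make([]string, 0, len(names))
	for _, n := range names {
		if n == "" {
			continue // an empty alternative would match everywhere
		}
		quoted = append(quoted, regexp.QuoteMeta(n))
	}
	if len(quoted) == 0 {
		return nil
	}
	// Longest first, so the longer of two overlapping names wins.
	sort.Slice(quoted, func(i, j int) bool { return len(quoted[i]) > len(quoted[j]) })
	return regexp.MustCompile(`\b(?:` + strings.Join(quoted, "|") + `)\b`)
}

// replaceOrganisations rewrites every match with replace(match) and leaves
// all other text, including its case, untouched.
func replaceOrganisations(text string, names []string, replace func(string) string) string {
	re := buildOrganisationRegex(names)
	if re == nil {
		return text
	}
	return re.ReplaceAllStringFunc(text, replace)
}
```

In a handler, `names` would be the merged default-plus-additional list; compiling the pattern once per request keeps the approach dependency-free.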
- The regex is case-sensitive: it assumes the user writes organisation names in title case. It will therefore recognise "Google" but not "google" as a valid match.
- This is intentional, because words like "google" and "oracle" are also generic English words. For example, we want to avoid matching in sentences like:
  - "I google a lot these days"
  - "Delphi was a great oracle in ancient Greece"
- In addition, the original case of the entire text is preserved.
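The trade-off can be seen directly with Go's `regexp` package: the default pattern is case-sensitive, while the `(?i)` flag makes it case-insensitive and therefore also matches the verb. The variable and function names below are illustrative only.

```go
package main

import "regexp"

// Case-sensitive pattern, matching the behaviour described above:
// only title-case "Google" matches.
var caseSensitive = regexp.MustCompile(`\bGoogle\b`)

// With the (?i) flag the same pattern becomes case-insensitive,
// so the generic verb "google" matches too.
var caseInsensitive = regexp.MustCompile(`(?i)\bGoogle\b`)

// countMatches reports how many non-overlapping matches re finds in s.
func countMatches(re *regexp.Regexp, s string) int {
	return len(re.FindAllStringIndex(s, -1))
}
```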
If case-insensitive matching is needed without also matching these generic words, an alternative approach is to use a text processing / NLP library.
- The spaCy library in Python is a production-grade NLP library that can differentiate between the usages mentioned above.
- It can even detect organisations in text like "AWS, Google Cloud, Azure".
- spaCy relies on machine learning models that are downloaded separately from the internet.
- The models, along with the library, are large, which bloats the size of the dependencies. (The language would also change to Python, whose interpreter and runtime add further overhead.)
- It might be a bit too much for this use case! :)
To keep the API lightweight and simple, a regex-based approach with minimal dependencies was used. Go was chosen as the language since it compiles down to a single binary with a low memory footprint.