WWDC 2017 was a buzz of exciting and innovative announcements and presentations. One especially exciting announcement was the introduction of the newly improved Natural Language Processing (NLP) APIs. These APIs can be integrated into existing apps to significantly improve the user experience. Typing can become much more accurately predictive, searching more expansive, and the power of machine learning can now be harnessed to identify and extract text from browsed media. All of this is accomplished through the evaluation of natural language inputs such as typed text, recognized handwriting, or transcribed speech. The result is either the conferral of intelligence or the extraction of information.

Documentation for the NSLinguisticTagger APIs can be found in Apple's developer documentation.

Let's consider the following examples.

Say you were texting a friend using an English keyboard about some faraway location with a name definitely not born of the English language, perhaps a mountain hike to a famous boulder in Norway named Kjeragbolten. If you were to open your messaging app right now and try to type this word, your keyboard might predict knew, knee, or even Khedive, but not Kjeragbolten. However, if you switch over to Safari and browse for Kjeragbolten, perhaps open an article about the attraction, and then head back to your messaging app, upon retyping the beginning of the word, Kjer, you should see Kjeragbolten show up as a predicted option. This is the power of NLP at work. In fact, Apple uses these same NLP APIs in its first-party apps and software, and we now have access to them for the first time.

Beyond search, the NLP APIs also open up data organization opportunities through the use of tagging. Let's consider a financial tracking app. In our app the user enters expense items, and each item has a title, a cost, and a comment. With the NLP APIs we can analyze the user's comments and identify patterns. For example, if a user often enters comments like "dinner with Marty" and "drinks with Dianne", we can use those to build knowledge about their spending habits. By using tokenization and named entity recognition we can identify these comments as being of a similar type; perhaps we group them as social or entertainment events. When a user then looks back at their monthly finances we are able to show them that they spent 30% of their budget on these types of expenses. Furthermore, with these same two tools, tokenization and named entity recognition, we could identify places where our user likes to shop or dine. We could track expense categories that tend to be high and suggest a savings plan to help balance their budget. Additionally, we could build payment reminders if we notice a recurring cost item that includes a specific person or company.
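As a rough sketch of how such categorization might look (the `ExpenseCategory` type and the keyword list here are our own hypothetical constructs, not part of the API), we could pair named entity recognition with tokenization to flag a comment as a social expense:

```swift
import Foundation

enum ExpenseCategory {
    case social, uncategorized
}

// Hypothetical categorizer: flags a comment as "social" when it contains
// both a personal name and a social keyword like "dinner" or "drinks".
func category(forComment comment: String) -> ExpenseCategory {
    let socialKeywords: Set<String> = ["dinner", "drinks", "lunch", "coffee"]

    let tagger = NSLinguisticTagger(tagSchemes: [.tokenType, .nameType], options: 0)
    tagger.string = comment
    let range = NSRange(location: 0, length: comment.utf16.count)
    let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]

    var containsName = false
    var containsKeyword = false

    // named entity recognition: look for a personal name
    tagger.enumerateTags(in: range, unit: .word, scheme: .nameType, options: options) { tag, _, _ in
        if tag == .personalName { containsName = true }
    }

    // tokenization: look for a social keyword among the words
    tagger.enumerateTags(in: range, unit: .word, scheme: .tokenType, options: options) { _, tokenRange, _ in
        let token = (comment as NSString).substring(with: tokenRange).lowercased()
        if socialKeywords.contains(token) { containsKeyword = true }
    }

    return (containsName && containsKeyword) ? .social : .uncategorized
}

// example usage: category(forComment: "dinner with Marty") could be tallied
// against the monthly budget alongside the other expense items
```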

The new APIs are housed under a class in Foundation called NSLinguisticTagger.

This class is used to segment and tag text. In other words we are able to break up text into more meaningful pieces and identify them in useful ways through natural language processing.

Through the NLP APIs we have a rather diverse collection of tools that we can leverage depending on our needs.

We begin with a source of natural language text, and feed this into NLP. This text could be user input, a document file, or even a web article. From here we are able to leverage the new NLP APIs to complete any of the following tasks, known as tagging schemes:

  • Language Identification (Is this phrase in English or Spanish?)
    • Identify the specific language a user is typing in
  • Tokenization (Break text into paragraphs, sentences, and words)
    • Targeting language by its hierarchical nature (words, sentences, paragraphs, etc.)
  • Parts of speech (Break a sentence into nouns, verbs, and adjectives)
    • Nouns, verbs, adjectives
  • Lemmatization (Identify all of the forms of a word)
    • A lemma is the root form of a word which can be used to identify all possible forms of a word
      • preside ⇢ presided, presides, presiding
  • Named entity recognition (Identify the names of people, places, and things within a sentence or text)
    • Person names, Organization names, Object names

Previously we were only able to perform operations on words.

The new NLP APIs allow us to perform operations on multiple units. These tagging units are provided as a public enum. This greatly improves our ability to interpret language and perform much more complex tasks within our applications:

public enum NSLinguisticTaggerUnit: Int {
    case word
    case sentence
    case paragraph
    case document
}
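To illustrate tagging at a unit other than the word, here is a brief sketch (our own, not from Apple's documentation) that uses the .sentence unit to split a passage into sentences:

```swift
import Foundation

// split a passage into sentences by enumerating .sentence units
func sentences(in text: String) -> [String] {
    let tagger = NSLinguisticTagger(tagSchemes: [.tokenType], options: 0)
    tagger.string = text

    let range = NSRange(location: 0, length: text.utf16.count)
    var result: [String] = []

    tagger.enumerateTags(in: range, unit: .sentence, scheme: .tokenType, options: []) { _, tokenRange, _ in
        let sentence = (text as NSString).substring(with: tokenRange)
        result.append(sentence.trimmingCharacters(in: .whitespacesAndNewlines))
    }

    return result
}

// example usage:
// sentences(in: "It was the best of times. It was the worst of times.")
// should yield ["It was the best of times.", "It was the worst of times."]
```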

While we do have access to all of the tag schemes, we cannot apply every scheme to every tagging unit. For example, we cannot apply parts of speech to a paragraph, or lemmatization to a sentence. Apple provides a very useful way to find out which schemes are available for each unit:

class func availableTagSchemes(for unit: NSLinguisticTaggerUnit, language: String) -> [NSLinguisticTagScheme]
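For instance, we might query the schemes supported for word-level tagging in English (the exact list returned depends on the OS version):

```swift
import Foundation

// ask the class which tag schemes can be applied to individual words in English
let wordSchemes = NSLinguisticTagger.availableTagSchemes(for: .word, language: "en")

// on iOS 11 this should include schemes such as .tokenType, .lexicalClass, .nameType, and .lemma
for scheme in wordSchemes {
    print(scheme.rawValue)
}
```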

Let's examine some example implementations to see how this works.

Assume we have an app that provides the ability to perform text-based searching. Maybe this is a photo app with photo tagging, a note-taking app, or even a calendar. Without the NLP APIs our searches will only return one-to-one results. If we search for the word "bike" we will only see results that directly contain the word "bike". This could be very limiting considering the other forms of the word that could be found: biking, bikes, biked. This is where the lemmatization API comes in, processing inflected versions of the root term.

We would start with language identification; this step is important as we may in fact have text in multiple languages within our text resources. From here we would tokenize the text, identifying words, sentences, paragraphs, etc. Finally, we apply lemmatization to match all forms of the search terms.

NLP Processing Diagram

Code example: Language Identification

In this example we look at how we can use NSLinguisticTagger to identify the language of user text. If we want the API to do the heavy lifting for us, we can read the dominantLanguage property after simply passing in a string of text. If we already know the language, we can instead set the orthography ourselves. We could use the following block of code to determine what the language is, and then set it for processing with the NLP APIs.


// initialize the linguistic tagger with the language scheme
var tagger = NSLinguisticTagger(tagSchemes: [.language], options: 0)

func determineLanguage(for text: String) {
    // hand the text to the tagger
    tagger.string = text

    // dominantLanguage returns the language, which we could use if we wanted to, but more
    // importantly the tagger will automatically set the orthography for the language
    _ = tagger.dominantLanguage
}

// example usage:
let text = "I see, said the blind man as he picked up the hammer and saw."
determineLanguage(for: text)
print(tagger.dominantLanguage ?? "unknown") // "en"

// alternatively, if we already know the dominant language, we can tell the tagger what it is
// in this case we change the function to take the name of the known language as well as the text

func determineLanguage(for text: String, with language: String) {
    // hand the text to the tagger
    tagger.string = text

    // set the orthography for the text using the known language
    let range = NSRange(location: 0, length: text.utf16.count)
    tagger.setOrthography(NSOrthography.defaultOrthography(forLanguage: language), range: range)
}

// example usage:
let sentence = "You have just begun reading the sentence you have just finished reading."
let knownLanguage = "en"
determineLanguage(for: sentence, with: knownLanguage)
print(tagger.dominantLanguage ?? "unknown") // "en"

Code example: Lemmatization

Lemmatization is the identification of lemmas, or root forms of a word. Previously we were only able to recognize known forms of a word, but through lemmatization we can now handle abstractions like plurality and tense. In this example we have a function that takes a string of text and returns a collection of lemmatized words. This function could be used to take a search phrase from a user's input and expand results.


func setOfWords(from string: String, with language: inout String?) -> Set<String> {
    var wordSet = Set<String>()

    // note that we are able to use multiple schemes at the same time
    let tagger = NSLinguisticTagger(tagSchemes: [.lemma, .language], options: 0)
    let range = NSRange(location: 0, length: string.utf16.count)
    let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace]

    tagger.string = string

    // since language is an optional param we only set the orthography if the language
    // is provided, otherwise we ask the tagger to determine it for us
    if let language = language {
        tagger.setOrthography(NSOrthography.defaultOrthography(forLanguage: language), range: range)
    } else {
        language = tagger.dominantLanguage
    }

    tagger.enumerateTags(in: range, unit: .word, scheme: .lemma, options: options) {
        tag, tokenRange, _ in
        let token = (string as NSString).substring(with: tokenRange)

        // insert the word itself into the collection
        wordSet.insert(token.lowercased())

        // insert the lemma (root form) into the collection when one is found
        if let lemma = tag?.rawValue {
            wordSet.insert(lemma.lowercased())
        }
    }

    return wordSet
}

// example usage:
var language: String? = "en"
let word = "Biking"
let wordSet: Set<String> = setOfWords(from: word, with: &language)

print(wordSet) // ["biking", "bike"]

Now we have the ability to map any variation of a word a user provides back to its root. One example here would be searching for photos within a photo album app based on user descriptions. If a user added a description to a photo that said "Biking with the family on a lovely spring day", a search for "bike" would return that photo, because both the query and the description lemmatize to "bike".
 

Code example: Named Entity Recognition

Named entity recognition is a powerful tool that allows us to identify the names of people, places, or things. This tool, when paired with tokenization, allows us to truly develop smart applications. In the function below we try to identify elements of these name types to infer the names and locations of important people within the sentence provided.


// simple container for the entities we extract (our own type, not part of the API)
struct NamedEntity {
    let token: String
    let tag: NSLinguisticTag
    let range: NSRange
}

func nameSet(from string: String, with language: inout String?) -> [NamedEntity] {
    let tagger = NSLinguisticTagger(tagSchemes: [.nameType], options: 0)
    tagger.string = string

    let range = NSRange(location: 0, length: string.utf16.count)

    if let language = language {
        tagger.setOrthography(NSOrthography.defaultOrthography(forLanguage: language), range: range)
    } else {
        language = tagger.dominantLanguage
    }

    // .joinNames allows us to evaluate named entities that span multiple words, like Sandy Williamson
    let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]

    var extractedEntities: [NamedEntity] = []
    let tags: [NSLinguisticTag] = [.personalName, .placeName, .organizationName]

    tagger.enumerateTags(in: range, unit: .word, scheme: .nameType, options: options) {
        tag, tokenRange, _ in

        if let tag = tag, tags.contains(tag) {
            let token = (string as NSString).substring(with: tokenRange)
            extractedEntities.append(NamedEntity(token: token, tag: tag, range: tokenRange))
        }
    }

    return extractedEntities
}

// example usage:
let sentence = "In 1997, Sandy Williamson and Slaughter Fitz-Hugh co-founded CapTech Consulting, Inc. in their hometown of Richmond, VA."
var language: String? = nil
let importantNames = nameSet(from: sentence, with: &language)

print(importantNames.map { $0.token }) // e.g. ["Sandy Williamson", "Slaughter Fitz-Hugh", "CapTech Consulting", "Richmond"]
 

Code example: Tokenization

In this final example we see an implementation of tokenization. Tokenization allows us to examine text based on its hierarchical levels: words, phrases, sentences, paragraphs, etc. This API enables us to work with more complex linguistic structures and identify elements within them. In the example below we have a function that examines a user's text and looks for key words. If a key word is found we trigger an interaction for the user. This interaction could be a pop-up that makes a suggestion about the user's next actions.

func detectKeyWords(within string: String) {
    let tagger = NSLinguisticTagger(tagSchemes: [.tokenType], options: 0)
    tagger.string = string

    let range = NSRange(location: 0, length: string.utf16.count)
    let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace]

    // we can now enumerate over a specific unit within our text and do something
    // with it, in this case we are targeting words
    tagger.enumerateTags(in: range, unit: .word, scheme: .tokenType, options: options) {
        _, tokenRange, _ in
        let token = (string as NSString).substring(with: tokenRange)

        // keyWordCollection is a collection of words we care about, most likely a class or app level variable
        if keyWordCollection.contains(token.lowercased()) {
            triggerUserInteraction(basedOn: string)
        }
    }
}

func triggerUserInteraction(basedOn text: String) {
    // this function will ask the user if they would like to set a calendar reminder on their device
    // furthermore, within this function we could use named entity recognition to come up with
    // specific suggestions for what the calendar reminder should be about
}

// example usage:
keyWordCollection = ["occur", "2017", "event"]

let bodyOfText = "On August 21st 2017 there will be a full solar eclipse of the sun! The first of its kind since 1918! This is an event not to be missed!"

detectKeyWords(within: bodyOfText)

// the user's phone pops up an alert saying something like: "We found something that you might be
// interested in. There is an event on August 21st, 2017. Full solar eclipse. Would you like to
// set a reminder for this?"
 

Conclusion

By leveraging these powerful new tools, we are able to provide homogeneous text processing, giving our users a consistent text processing experience across all Apple platforms. These are the exact same NLP APIs that Apple employs in its first party apps, which means that integration will provide a familiar experience to our users.

The new NLP APIs enhance privacy, as all of the processing now takes place on the device as opposed to a server, keeping content and information in place and private. Apple has dramatically increased performance by highly optimizing the on-device processes, and integrating multi-threaded processing.

Existing clients of the NLP APIs will see a significant increase in speed. For example, part of speech tagging has improved its processing rate from approximately 50,000 tags per second to 80,000 tags per second. Named entity recognition has improved its processing rate from 40,000 tags per second up to 65,000 tags per second. This means the improved NLP APIs can handle hundreds of documents per second on the user's device.

Apple has also increased language support to include language identification of 29 scripts and 52 languages. Tokenization includes all iOS and macOS system languages, and lemmatization, part of speech recognition, and named entity recognition are available in eight languages. Accuracy has also improved, bringing the averages to within 85% to 90% for most processes.

Every year we are excited to see the latest and greatest Apple has to offer, and we have come to expect a certain level of wow. WWDC 2017 did not disappoint, offering a new horizon in personal computational power. The introduction of and improvements to the NLP APIs are a game changer, giving developers the ability to create smart, engaging apps that truly personalize themselves to the user. Technology continues to be a key to globalization, allowing us to enhance our communication, increase our worldly reach, and touch new arenas of innovation. Natural language processing is a simple integration with powerful implications, enabling us to truly enhance our ability to communicate while adding a new thread of personality to mobile devices.

Interested in finding out more about CapTech? Connect with us or visit our Career page to learn more about becoming a part of a team that is developing world-class mobile apps for some of the largest institutions in the world.

CapTech is a thought leader in the Mobile and Devices spaces, and has deployed over 300 mobile releases for Fortune 500 companies. Additionally, CapTech was identified by Forrester as one of top Business to Consumer (B2C) Mobile Services Providers in their 2016 Wave.