When it comes to Augmented Reality, most consumers have no idea how to get involved. Due to the complex challenges of building AR, development teams capable of delivering such applications are few and far between.

The sad reality is that the only AR tech most consumers have been exposed to is a game like Pokémon Go, or a HoloLens booth at a Windows Store. At $3,000 for a developer kit, the HoloLens is not exactly consumer-friendly. It's clear that today's AR options are either extremely limited in scope or prohibitively expensive.

Because of this, businesses can't build unique, branded AR experiences. There just isn't a platform with enough users to justify the R&D.

With their announcement of the ARKit framework at WWDC 2017, Apple has taken a huge step toward bringing AR into the mainstream. As of iOS 11, millions of devices are now AR-ready platforms using an API that, true to Apple's tradition, is both powerful and easy to implement.

For retail companies, the IKEA demo at WWDC is a great example of how AR applications can engage consumers with a brand and increase conversion rates. For the entertainment industry, AR opens an entirely new medium of storytelling. For manufacturers & hardware engineers, AR is a new tool in a portable toolkit. For designers, AR is a new interface for crafting 3D content. For educational institutions, AR is a new technique for engaging visual learners.

Similar to other emerging user interfaces like voice (VUIs), Augmented Reality is a frontier that has yet to mature into a core set of principles regarding user experience and usability.

Are you excited to build the next truly innovative AR experience? Before that can happen, you'll need to grasp the fundamentals of the ARKit SDK. That's what this article is all about!

How Does ARKit Work?

ARKit is essentially a fusion of several frameworks, some of them brand new:

  • AVFoundation: Monitors device camera input and renders it on-screen.
  • CoreMotion: Monitors device movement using internal hardware such as the gyroscope, accelerometer, and compass.
  • Vision (New): Applies high-performance computer vision algorithms to identify the interesting features in a scene.
  • CoreML (New): Generates predictions from pre-trained machine learning models.

In addition to these, a rendering library is needed to generate content for the AR experience:

  • SceneKit: Render 3D content into an AR scene.
  • SpriteKit: Render 2D content into an AR scene.
  • Metal: Render 3D content into an AR scene for advanced game development (Apple's replacement for OpenGL).

All of these technologies rely on hardware support in the A9 and later processors.

Checking Compatibility

Since not all devices support ARKit fully, it's important to check the ARWorldTrackingSessionConfiguration.isSupported property before using the framework. The following devices support ARKit 100%:

  • All iPad Pro models
  • 9.7" iPad (2017)
  • iPhone 7/7 Plus
  • iPhone 6S/6S Plus
  • iPhone SE

An ARWorldTrackingSessionConfiguration on these devices provides us with the most accurate AR experience using six degrees of freedom (6DOF):

  • 3 rotational axes: Roll, pitch, yaw
  • 3 translation axes: Movement in X, Y, Z

Only A9 and later devices support 6DOF. Earlier devices only support 3DOF, which are the 3 rotational axes. These devices are configured with ARSessionConfiguration instead. Note that ARSessionConfiguration does not support plane detection.

Another limitation is that ARKit will stop its simulation when an iOS device goes into split-screen mode. When using ARSCNView, the view goes white until the OS goes back into single-app mode. Apple has chosen to limit this ability instead of overloading the processor and risking bad performance. AR-based apps should handle this scenario elegantly.
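
A minimal sketch of how an app might react, assuming the view controller is the ARSCNView's delegate (ARSCNViewDelegate also receives the ARSessionObserver callbacks shown here):

// Sketch only: react to session interruptions such as split-screen multitasking.
func sessionWasInterrupted(_ session: ARSession) {
    // Tracking has stopped; dim the view or show a message until it resumes.
    self.sceneView.alpha = 0.5
}

func sessionInterruptionEnded(_ session: ARSession) {
    // Tracking has resumed, though world alignment may have drifted.
    self.sceneView.alpha = 1.0
}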

If you're writing an app that only supports ARKit, add the arkit value to the UIRequiredDeviceCapabilities array in your Info.plist, and the app won't be made available to non-ARKit devices on the App Store.

ARKit Fundamentals

In this article I'll outline some basic AR techniques that form a foundation for future AR development. At the end of this tutorial, you should walk away with a clear understanding of how to accomplish the following:

  1. Adding a 3D object into a real-world scene
  2. Detecting & visualizing horizontal planes (tables, floors, etc.)
  3. Adding physics to a real-world scene
  4. Anchoring 3D models to real-world objects

All of the code for this tutorial is available on GitHub if you'd like to follow along. There are 3 view controllers in the sample project:

  • SimpleShapeViewController: Tap with 1 finger to add a sphere in front of the camera. Tap with 2 fingers to add a cube in front of the camera.
  • PlaneMapperViewController: Look around the scene to identify planes, highlighted in blue. Tap to drop a cube in front of the camera that will respond to gravity and collide with detected planes.
  • PlaneAnchorViewController: Tap to hit-test a specific point for a horizontal plane. If found, a coffee cup will be anchored to the plane at life-scale.

The scope of this tutorial only includes using SceneKit to manipulate an AR scene. We will not cover using SpriteKit (2D content) or Metal (more advanced 3D content).

If you know nothing about SceneKit, that's ok. You can still follow along without getting stuck. But it doesn't hurt to check out a basic SceneKit walkthrough if you have time...there are plenty online.

Go ahead and launch Xcode 9 and let's get started!

Setting Up an AR Scene

The first thing to do is set up an ARSCNView. This is basically a mashup between ARSession (from ARKit) and SCNView (from SceneKit).

  • The view automatically renders the live video feed from the device camera as the scene background.
  • The world coordinate system of the view's SceneKit scene directly responds to the AR world coordinate system established by the session configuration.
  • The view automatically moves its SceneKit camera to match the real-world movement of the device.

Let's create our ARSCNView inside our viewDidLoad:

self.sceneView = ARSCNView()
self.sceneView.autoenablesDefaultLighting = true
self.sceneView.antialiasingMode = .multisampling4X

Note that you can also create an ARSCNView inside a .xib, which is what I do in the sample project.

The autoenablesDefaultLighting property (from the SCNView superclass) tells the view to add an omnidirectional light source to the scene since we're not adding any lights ourselves. Combined with the automaticallyUpdatesLighting property (defaults to true) from ARKit, it means we'll have a light source that is continuously updated based on the analyzed real-world lighting in the camera feed. Pretty nifty!

Setting the antialiasingMode is optional, but it can help smooth jagged edges for the 3D objects we place on-screen. It defaults to .none.

We start & stop our ARSession based on our view's appearance callbacks:

override func viewWillAppear(_ animated: Bool) {
    super.viewWillAppear(animated)

    if ARWorldTrackingSessionConfiguration.isSupported {
        let configuration = ARWorldTrackingSessionConfiguration()
        self.sceneView.session.run(configuration)
    } else if ARSessionConfiguration.isSupported {
        let configuration = ARSessionConfiguration()
        self.sceneView.session.run(configuration)
    }
}

override func viewWillDisappear(_ animated: Bool) {
    super.viewWillDisappear(animated)
    self.sceneView.session.pause()
}

For the most part, ARSessionConfiguration and its subclasses (currently only 1) can be used without any extra tweaking. If you want horizontal plane detection, you'll have to set the planeDetection property on ARWorldTrackingSessionConfiguration (more on that later). There's also the worldAlignment property, which controls how the AR world is mapped to the real world. Here are the options:

  • .gravity: The default setting. The AR coordinate space will be mapped to the real-world coordinate space as closely as possible. Any realistic AR experience should use this setting.
  • .gravityAndHeading: In addition to the Y-axis being parallel to gravity, the X and Z axes are oriented to the compass heading. This means that if the user rotates left or right, the entire scene will rotate with them.
  • .camera: All axes are locked to the orientation of the camera, so wherever the user looks the scene will follow. This means the 3D content has no connection with the real-world position of objects, they are simply overlaid onto the camera feed.
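
As a quick illustration, here's a sketch of making that choice explicit when running the session (.gravity is already the default, so this only matters if you want one of the other modes):

let configuration = ARWorldTrackingSessionConfiguration()
configuration.worldAlignment = .gravity // or .gravityAndHeading, or .camera
self.sceneView.session.run(configuration)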

Additionally, our ARSCNView exposes a few important properties:

  • scene: The SCNScene that renders 3D content into the view.
  • session: The active AR session, containing configuration and ARAnchor management.
  • delegate: Can provide SCNNode instances for a detected ARAnchor, such as an ARPlaneAnchor (more on this later).

That's all we need to create an AR scene! But without content, there's nothing "augmented" about it. Which leads us to...

Adding 3D Objects to an AR Scene

ARKit automatically matches our SceneKit coordinate space to the real world, so placing objects that appear to maintain a real position is as easy as using the normal methods for setting an object's position in SceneKit. All we have to do is add our 3D content to the scene using real-world units, so they appear to be the correct size. All units in SceneKit are expressed in meters.

In our SimpleShapeViewController, we've added a gesture recognizer to accept taps on the view. When that happens, we'll place a sphere 1 meter in front of the camera. Just for fun, we'll use a random color and radius, and we'll let a 2-finger tap generate a cube instead.
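
The gesture wiring itself is plain UIKit. Here's a sketch of what it might look like in viewDidLoad (the action names are illustrative, not necessarily what the sample project uses):

// Illustrative gesture setup; the selector names here are assumptions.
let oneFingerTap = UITapGestureRecognizer(target: self, action: #selector(didTapWithOneFinger(_:)))
oneFingerTap.numberOfTouchesRequired = 1
self.sceneView.addGestureRecognizer(oneFingerTap)

let twoFingerTap = UITapGestureRecognizer(target: self, action: #selector(didTapWithTwoFingers(_:)))
twoFingerTap.numberOfTouchesRequired = 2
self.sceneView.addGestureRecognizer(twoFingerTap)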

If you've never used SceneKit before, here's a quick crash course:

  • A SCNScene is a hierarchy of SCNNode instances. Each scene has a rootNode and can contain multiple camera nodes, light nodes, a physicsWorld and animated effects.
  • A SCNNode is a 3D object with its own position, transform, scale, rotation, and orientation in its parent coordinate space. Each node can have childNodes and has other properties such as name (typically used to look up nodes within a scene) and isHidden, to name a few.
  • A SCNGeometry is a mesh (collection of vertices) making up a 3D polygon. A SCNNode can have only one geometry. In this tutorial we'll use SCNSphere and SCNBox, but there are a number of other options. You can also build custom geometry yourself in code.
  • A SCNMaterial is a collection of visual attributes that defines how a geometry looks, such as its color or texture. Each SCNGeometry can have multiple materials, which SceneKit applies to the geometry's elements.
  • A SCNPhysicsBody defines how a node interacts with other nodes as part of the physics engine.

We're going to create a simple sphere node, so let's start by creating a geometry:

let radius = (0.02...0.06).random() // keep things interesting
let sphere = SCNSphere(radius: CGFloat(radius))

We need some color, so let's make a material that uses a random color:

let color = SCNMaterial()
color.diffuse.contents = self.randomColor() // returns a UIColor
sphere.materials = [color]

There are a ton of shading attributes you can provide for a material that will affect the look of your geometry. For example, you can set the lightingModel to .physicallyBased to get a more realistic lighting effect. You can set a UIImage to the diffuse.contents property to apply a bitmap texture.
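
For example, a sketch of a more realistic material might look like the following (the texture image name is just a placeholder for an asset in your project):

let textured = SCNMaterial()
textured.lightingModel = .physicallyBased // react more realistically to the scene's lighting
textured.diffuse.contents = UIImage(named: "woodGrain") // placeholder texture asset
textured.roughness.contents = 0.8 // physically based shading input
sphere.materials = [textured]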

There are even more properties to be discovered. I suggest reading the SCNMaterial Documentation to learn more!

Next, let's create a node to contain our geometry:

let sphereNode = SCNNode(geometry: sphere)

We want to position our node 1 meter in front of the camera, which in the camera's local coordinate space means moving along the negative Z axis. We also want it to face the user.

let camera = self.sceneView.pointOfView!
let position = SCNVector3(x: 0, y: 0, z: -1)
sphereNode.position = camera.convertPosition(position, to: nil)
sphereNode.rotation = camera.rotation

First we access the camera, which is the point of reference the user is controlling inside the simulated world.

Our target position is a combination of X,Y,Z coordinates where we "push" the node backwards by 1 meter along the Z axis. This is the node's position relative to the camera.

We then use the convertPosition method to convert our target position to be relative to the world itself (passing nil defaults to the world coordinate space). We also match the camera's rotation to ensure the node is facing us (not visible for spheres, but matters for cubes and other shapes).

The last step is adding the node into our scene:

self.sceneView.scene.rootNode.addChildNode(sphereNode)

Now, when tapping on the screen, we can build clouds of spheres and cubes that stay anchored in real-world positions!

[Image: simple shapes]

Detecting & Visualizing Horizontal Planes

Moving over to our PlaneMapperViewController, we're going to build a way to visualize the horizontal planes that ARKit can detect for us. This functionality can be enabled through the planeDetection property of ARWorldTrackingSessionConfiguration. As of this writing, there is only support for PlaneDetection.horizontal, but I'm sure Apple is already working on the vertical equivalent!
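
Turning it on is a small tweak to the configuration we run in viewWillAppear; a minimal sketch:

let configuration = ARWorldTrackingSessionConfiguration()
configuration.planeDetection = .horizontal // currently the only supported option
self.sceneView.session.run(configuration)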

When using ARSCNView, the underlying SCNSceneRenderer will automatically create nodes for any detected ARPlaneAnchor instances provided by the ARSession. We can react to these through the ARSCNViewDelegate protocol.

We start by conforming our view controller to that protocol and assigning it as the sceneView's delegate:

self.sceneView.delegate = self

When the ARSession detects a plane, we receive the following message:

func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
    guard let planeAnchor = anchor as? ARPlaneAnchor else { return }

    // Handle new plane
}

Our plan is to add a child node to the generated one that covers the plane in a semi-transparent blue. As a child node, all of our positioning is relative to the parent node's coordinate space. We simply have to match the ARPlaneAnchor size and center, and inherit our parent node's position.

Note that the parent node provided does not have any geometry, and its position is only accurate after a subsequent didUpdate callback.

We start by creating a SCNBox geometry matching the plane's extent on the X and Z axes:

let plane = SCNBox(width: CGFloat(planeAnchor.extent.x), height: 0.005, length: CGFloat(planeAnchor.extent.z), chamferRadius: 0)

Note that we could also use a SCNPlane (and apply a transform due to its local orientation being vertical), but I found that since they're so thin, they don't play well with the physics engine we add later on. The 5mm height we provide to our SCNBox is more than enough thickness.
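
For reference, if you do prefer SCNPlane, a sketch of the equivalent setup would rotate the plane to lie flat, since its local orientation is vertical:

// Alternative sketch using SCNPlane instead of a thin SCNBox.
let thinPlane = SCNPlane(width: CGFloat(planeAnchor.extent.x), height: CGFloat(planeAnchor.extent.z))
let thinPlaneNode = SCNNode(geometry: thinPlane)
thinPlaneNode.eulerAngles.x = -.pi / 2 // rotate from vertical to horizontal
thinPlaneNode.position = SCNVector3Make(planeAnchor.center.x, 0, planeAnchor.center.z)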

We add a simple blue material:

let color = SCNMaterial()
color.diffuse.contents = UIColor(red: 0, green: 0, blue: 1, alpha: 0.5)
plane.materials = [color]

Next, we configure the position of our node to match the plane's center X and Z. The Y axis is set to -5mm to account for the height we gave to our geometry. Note that the value of planeAnchor.center.y is always 0.

let planeNode = SCNNode(geometry: plane)
planeNode.position = SCNVector3Make(planeAnchor.center.x, -0.005, planeAnchor.center.z)

Finally, we add our custom planeNode as a child of the node provided from the renderer. This node is associated with the ARPlaneAnchor that was recognized and will be updated as the ARSession gathers new data from the sensors.

node.addChildNode(planeNode)

This code is enough to get a blue plane on-screen, but it's not quite enough to keep it up-to-date.

ARKit continuously monitors existing anchors to determine if they have moved or changed in size. As a user moves the camera around the scene and the computer vision algorithm understands more about the environment, new horizontal planes are recognized and existing planes are merged into larger planes. The ARSCNViewDelegate is invoked throughout this process.

At the end of our didAdd method, we keep a reference to our new node in a dictionary so we can make updates later.

let key = planeAnchor.identifier.uuidString
self.planes[key] = planeNode
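
The backing dictionary is just a property on the view controller; the declaration might look like this (the exact form in the sample project may differ):

// Detected plane nodes, keyed by their ARPlaneAnchor's identifier.
var planes = [String: SCNNode]()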

Once the engine decides to update a registered plane, we simply need to re-assign the proper geometry and position:

func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
    guard let planeAnchor = anchor as? ARPlaneAnchor else { return }

    let key = planeAnchor.identifier.uuidString
    if let existingPlane = self.planes[key] {
        if let geo = existingPlane.geometry as? SCNBox {
            geo.width = CGFloat(planeAnchor.extent.x)
            geo.length = CGFloat(planeAnchor.extent.z)
        }
        existingPlane.position = SCNVector3Make(planeAnchor.center.x, -0.005, planeAnchor.center.z)
    }
}

Since SCNNode and SCNGeometry are reference types, any changes to their properties will be applied on the next frame.

All that's left is handling when planes are removed from the scene. This happens when multiple existing planes are merged into one. We'll invoke removeFromParentNode to clean up.

func renderer(_ renderer: SCNSceneRenderer, didRemove node: SCNNode, for anchor: ARAnchor) {
    guard let planeAnchor = anchor as? ARPlaneAnchor else { return }

    let key = planeAnchor.identifier.uuidString
    if let existingPlane = self.planes[key] {
        existingPlane.removeFromParentNode()
        self.planes.removeValue(forKey: key)
    }
}

Note that the didRemove method is not invoked when the user looks away from the plane, causing it to be off-screen. Even if a user is not looking at a node, it is still in memory and its position is maintained relative to where the user is currently pointing the camera.

This is precisely what makes ARKit so powerful. The engine actively builds a digital representation of your world and doesn't forget what it has seen previously. You can test this out yourself by placing objects in a room, walking to a completely different room or down a hallway, then turning around and walking back to the objects. Impressively, ARKit still has your objects anchored exactly where you left them!

Ok, let's run our PlaneMapperViewController and start looking around.

[Image: plane mapper]

You'll notice that it takes a while for planes to be recognized, but once they are registered they start to grow pretty quickly. You'll also notice that it's not perfect: you can see in the screenshot that the tops of the chairs were counted along with the tabletop. Even still, ARKit can usually recognize multiple tiers of planes, such as a table and the floor below it.

I've found that moving the camera forward and backward helps ARKit recognize the plane more quickly.

Adding Physics to an AR Scene

Now that we have planes detected and nodes in place to visualize them, let's add some physics to the scene. Similar to the first demo, we create some logic to drop a cube 1 meter in front of the camera:

let size = CGFloat((0.06...0.1).random())
let box = SCNBox(width: size, height: size, length: size, chamferRadius: 0)
// configure color
// create node (boxNode)
// set position

I've omitted code we've already covered; scroll up to see the missing parts!

Next, we associate a SCNPhysicsBody to the node:

let physicsBody = SCNPhysicsBody(type: .dynamic, shape: SCNPhysicsShape(geometry: box, options: nil))
physicsBody.mass = 1.25
physicsBody.restitution = 0.25
physicsBody.friction = 0.75
physicsBody.categoryBitMask = CollisionTypes.shape.rawValue
boxNode.physicsBody = physicsBody

The physics API is super easy. Setting .dynamic means that our object will be affected by both forces (gravity) and collisions. The mass, restitution and friction properties are among a number of configuration options for the physics body. The categoryBitMask is required to manage collision detection. More on that later.

We also need to update our generated plane nodes to have physics bodies as well:

let body = SCNPhysicsBody(type: .kinematic, shape: SCNPhysicsShape(geometry: plane, options: nil))
body.restitution = 0.0
body.friction = 1.0
planeNode.physicsBody = body

Bodies with type .kinematic are unmoved by forces or collisions, but can cause collisions with other objects (such as our .dynamic cube), and can be moved (unlike .static bodies).

We also need one more physics body, which I'm calling the "world bottom". This will catch any free-falling cubes that miss the plane nodes, ensuring they are destroyed instead of being simulated forever. This prevents our device from running out of memory if the user starts adding a bunch of cubes to the scene.

// Use a huge size to cover the entire world
let bottomPlane = SCNBox(width: 1000, height: 0.005, length: 1000, chamferRadius: 0)

// Use a clear material so the body is not visible
let material = SCNMaterial()
material.diffuse.contents = UIColor(white: 1.0, alpha: 0.0)
bottomPlane.materials = [material]

// Position 10 meters below the floor
let bottomNode = SCNNode(geometry: bottomPlane)
bottomNode.position = SCNVector3(x: 0, y: -10, z: 0)

// Apply static physics, and detect contacts with shape-category nodes
let physicsBody = SCNPhysicsBody.static()
physicsBody.categoryBitMask = CollisionTypes.bottom.rawValue
physicsBody.contactTestBitMask = CollisionTypes.shape.rawValue
bottomNode.physicsBody = physicsBody

self.sceneView.scene.rootNode.addChildNode(bottomNode)

SceneKit will automatically collide our cubes with the plane nodes we've detected. But when it comes to the world bottom, we need some custom logic to remove the cube nodes when they've collided. To do that, we implement the SCNPhysicsContactDelegate protocol:

self.sceneView.scene.physicsWorld.contactDelegate = self

Once a collision is detected, we get the following callback:

func physicsWorld(_ world: SCNPhysicsWorld, didBegin contact: SCNPhysicsContact) {
    let mask = contact.nodeA.physicsBody!.categoryBitMask | contact.nodeB.physicsBody!.categoryBitMask

    if CollisionTypes(rawValue: mask) == [CollisionTypes.bottom, CollisionTypes.shape] {
        if contact.nodeA.physicsBody!.categoryBitMask == CollisionTypes.bottom.rawValue {
            contact.nodeB.removeFromParentNode()
        } else {
            contact.nodeA.removeFromParentNode()
        }
    }
}

Remember those categoryBitMask and contactTestBitMask properties we added to our physics bodies earlier? We're using those properties to determine what nodes were involved and which ones to remove. Here's a quick peek at the CollisionTypes declaration:

struct CollisionTypes: OptionSet {
    let rawValue: Int

    static let bottom = CollisionTypes(rawValue: 1 << 0)
    static let shape = CollisionTypes(rawValue: 1 << 1)
}

We're now ready to run the scene, and start dropping cubes!

[Image: plane physics]

It's particularly fun to drop cubes at the edges of tables and watch them slide off. In the demo application, you can choose to hide the plane visualization to get a better view.
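
The hide/show toggle is nothing ARKit-specific; a sketch might simply flip isHidden on the tracked plane nodes (the action name here is an assumption, not from the sample project):

// Hypothetical toggle action: hides or shows the blue plane overlays.
@IBAction func togglePlaneVisibility(sender: UIButton) {
    for planeNode in self.planes.values {
        planeNode.isHidden = !planeNode.isHidden
    }
}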

[Image: plane physics, planes hidden]

Anchoring 3D Models to Real-World Objects

It's a fun demo to add random shapes to an AR world, but the next level of immersion is adding 3D models based on real-world objects. Using models with high polygon counts, realistic lighting effects, high-res textures, and accurate scale helps blur the line between real and generated content, making the experience more seamless and engaging.

The first and most difficult part of this process is finding a suitable model. Xcode can import Collada (.dae) and SceneKit (.scn) files, and can convert from .dae to .scn in order to enable more advanced built-in editing features. 3D models and their textures should be placed inside a .scnassets folder, which contains special logic to normalize models and support app thinning & on-demand resources.

There are a number of sites hosting both free and paid models that can be imported into SceneKit scenes. Here's a few:

  • Google 3D Warehouse: Full of free models built using Google's SketchUp software.
  • yobi3D: A large number of low to mid-quality models, many of them free.
  • TurboSquid: A nice collection of high-quality models oriented for professionals.

In our example I'm using a model of a coffee mug, with a custom texture that displays the CapTech logo.

[Image: xcode mug]

Once you've decided on the model you want to place into the AR environment, chances that you'll be able to use it as-is are extremely slim. You'll want to tweak the model to make your life easier when importing it into SceneKit later. I recommend downloading Blender, an open-source 3D modeling program, in order to ensure your model meets the following criteria:

  • Is made up of a single object, so it is easy to import into the AR scene in one fell swoop.
    • In Blender, you can join multiple objects (and their underlying mesh) by shift-selecting all of them and choosing "Join" from the tools menu on the left.
  • Is free of any extra camera or light sources. We don't need these in our scene since ARKit is handling both.
  • Is in real-world scale. This is a crucial one...otherwise you'll end up with a model the size of your house.
    • You can change the unit of measure for the model's world from the Scene -> Units menu on the bottom right.
    • You can measure your model using the Grease Pencil -> Ruler/Protractor option on the left.
    • You can scale your model from the Object -> Transform menu on the bottom right, or the Tools -> Transform menu on the top left.
    • To apply your current scale as the new 100%, select the object and choose Object -> Apply -> Scale.
  • Is positioned properly in the local coordinate space.
    • For the coffee mug, I positioned the model so the bottom of the mug is at Y zero, and the model is centered on the X and Z axes.
  • Is textured properly, to look nice in the real-world environment.
    • The "Physically Based" lighting model (available in Xcode) works well.

It's worth noting that you can perform almost all of these changes from within Xcode itself, with the exception of joining objects and changing the relative scale. If you find yourself having to input extremely tiny scale values in Xcode, you may want to use Blender to re-adjust the model's overall size.

Also, if you find that updating your model in Xcode is not reflected in your scene when running on a device, uninstall the app and clear the build folder. I had to do this frequently to get my changes to apply.

Now that our model is prepped, let's start adding logic to our PlaneAnchorViewController. When the user taps on the view, we want to "hit test" the scene to find horizontal planes behind their finger. This functionality is built directly into ARSCNView.

@IBAction func tapScreen(sender: UITapGestureRecognizer) {
    guard sender.state == .ended else { return }

    let point = sender.location(in: self.sceneView)
    let results = self.sceneView.hitTest(point, types: [.existingPlaneUsingExtent, .estimatedHorizontalPlane])
    self.attemptToInsertMugIn(results: results)
}

We're using the .existingPlaneUsingExtent type to match existing recognized planes, limited to their current size. This is as opposed to using .existingPlane, which doesn't limit by size and could result in a mug floating in mid-air next to a plane. The .estimatedHorizontalPlane option gives us a better chance of finding a plane that hasn't been registered yet.

Once we find a match, we can instantiate a new node from our 3D model. Since we've already scaled and centered our model using Blender, this is the easy part!

if let match = results.first {
    let scene = SCNScene(named: "art.scnassets/mug.scn")!
    let node = scene.rootNode.childNode(withName: "mug", recursively: true)!

    let t = match.worldTransform
    node.position = SCNVector3(x: t.columns.3.x, y: t.columns.3.y, z: t.columns.3.z)

    self.sceneView.scene.rootNode.addChildNode(node)
}

Build it, run it, and you now have unlimited virtual swag.

[Image: mug]
[Image: mugs]

Conclusion

This is just a taste of what's possible with the ARKit framework. The power of ARKit can also be combined with custom code using the SpriteKit, Metal, Vision and CoreML frameworks with fascinating results. With ARKit, Apple has forged a platform that makes 3D experiences truly portable for the first time.

If you haven't already, don't forget to check out the sample code on GitHub!

Interested in finding out more about CapTech? Connect with us or visit our Career page to learn more about becoming a part of a team that is developing world-class mobile apps for some of the largest institutions in the world.

CapTech is a thought leader in the Mobile and Devices spaces, and has deployed over 300 mobile releases for Fortune 500 companies. Additionally, CapTech was identified by Forrester as one of the top Business to Consumer (B2C) Mobile Services Providers in their 2016 Wave.