SwiftUI Coding Katas
As I’ve been teaching myself SwiftUI, I like to run through the following coding katas. Note that they’re katas, not one-time exercises. Run through them repeatedly, from scratch, to start exercising your SwiftUI muscles on a regular basis.
Tic Tac Toe
Create a tic tac toe game in SwiftUI. To save time, you may use the SF Symbols characters “xmark”, “circle”, and “square.dotted” to represent X, O, and empty spaces, respectively. The game should allow two players to alternate moves, and indicate when a player wins (showing the three in a row) or when the game ends in a tie. There should also be a way to start a new game. For extra credit, add a computer player.
Chess
Here are some Unicode glyphs for chess pieces:
♚♛♝♞♜♟ ♔♕♗♘♖♙
Write a SwiftUI app to allow two players to play chess on a chessboard. (Note that each player should have a white square on the right side of their end of the board.)
Master-Detail
Using the JSON file from:
https://jsonlint.com/datasets/us-states-with-detail
Create a master-detail app with an alphabetical list of state names that link to a detail page for each state with the other data. The neighboring states list should link to those states’ detail views. You may include the state data as a static resource, but for extra credit, have your app download the JSON data directly.
API Calls
Using the Wikipedia recent-changes API call:
https://en.wikipedia.org/w/api.php?action=query&format=json&list=recentchanges&formatversion=2
Create a master-detail app listing the recent changes on Wikipedia and letting you click through to the page within the app.
Getting Started Developing for Vision Pro and visionOS
Get an Overview
Start with the last 45 minutes of the WWDC Keynote. This shows how Apple imagines people will interact with the Vision Pro and how it will be used. If that’s TL;DR, at least watch the one-minute Vision Pro ad from the end of the Keynote.
Then watch the last 30 minutes of the Platforms State of the Union. This provides a good overview of the technologies and APIs involved.
Native or Unity?
Right now there are two basic ways to develop for Vision Pro: Apple’s native APIs or using the Unity game engine. Because I ran across several issues trying to develop for Vision Pro with Unity, the rest of this document focuses on native development.
Know How to Program the iPhone
If you’ve never developed for Apple platforms before, I recommend learning how to program for the iPhone first. visionOS builds on iOS, using the Swift programming language and especially frameworks like SwiftUI and RealityKit. Also, if you’ve never used Apple’s device provisioning and code signing before, learning these on the iPhone will be way easier.
If you don’t know iPhone development yet, may I recommend 100 Days of SwiftUI. It’s a 100-day course taking about an hour a day. It’ll teach you the Swift programming language and the SwiftUI user interface framework, along with some Apple development basics. Once you’re done with that course, come back here.
Get the Right Version of Xcode
Right now, developing for visionOS and Vision Pro requires Xcode 15.1 beta 1 (not the version of Xcode 15 you can find in the App Store). There’s a big bold message next to the download link that says, “Note: Developing for visionOS requires a Mac with Apple Silicon.” When you launch Xcode, it’ll ask which platforms you want to develop for so it can download the support files. Check the visionOS checkbox, and wait for the download to complete.
Create and Run a New, Empty visionOS App
To make sure you have everything set up correctly, create a new, empty visionOS project. Use the following settings:
Now make sure the target device is set to “Apple Vision Pro” under “visionOS Simulator”, and hit the “Run” button.
If all goes well, you should see a window showing a pretend living room with your app’s main window floating in the foreground. Click on the “Show Immersive Space” button to show the immersive space. Click again to hide it. Practice moving around the space using the different mode buttons at the bottom right of the screen. (A 3-button mouse helps a lot if you’re used to traditional video game keyboard & mouse WASD navigation.)
Cool! Now you have an application that runs end to end. Start hacking on it and see what happens! If you created a source control repo when creating your project (the default), you can always back up a step or two if you break something.
Unity and Vision Pro: It’s Complicated
I was originally planning on taking the Unity path to development for Vision Pro. But as I dug deeper, I ran into some complications.
First a few definitions:
A Shared Space app shares the space with other apps’ windows and volumes, with the room visible via pass-through video. A Full Space app has the entire space to itself, and can create unbounded content (unlike the strictly bounded volumes in Shared Space). Full Space apps can also access additional information, like polygon meshes for objects in the room, and user head and hand position.
Full Space apps can be either Mixed Immersion, where the real world is visible and can interact with virtual objects, or Full Immersion, which is essentially traditional VR — the outside world is not visible, and the app draws everything the user sees. (There’s also a Progressive Immersion mode where the user can switch between Mixed and Full.)
With those definitions out of the way, we can discuss the roughly three ways to develop for Vision Pro with Unity:
Develop a conventional Unity VR app. This lets you use the full power of Unity, including custom shaders, to create a full-space virtual reality app. But you can’t access the pass-through video to display the real world. This is because Metal-based apps can’t access pass-through, and Unity uses Metal to render. For more details, see the WWDC video “Bring your Unity VR app to a fully immersive space”.
Shared Space apps using Unity’s PolySpatial technology. This technology lets you build a Unity app on top of Apple’s RealityKit instead of Metal. This lets Unity apps participate in the Shared Space at the cost of some restrictions. Materials have to be mapped onto RealityKit materials, and no custom shaders are permitted. 3D is shown inside of Volumes, bounded areas of preset size. You specify what part of your Unity scene appears inside the Volume with a new camera in Unity called the Volume Camera. For more details, see the WWDC video “Create immersive Unity apps”.
Unbounded Volumes with PolySpatial? At 9:27 into the “Create immersive Unity apps” video there’s a tease of something fascinating: an unbounded volume that displays in Full Space, with the ability to use pass-through along with other Full Space features like hand and head pose tracking. See the “input” section of the video for a demo where a user paints a wall with flowers. I haven’t been able to find out much about this functionality so far, but it sounds promising. Note that the PolySpatial restrictions still apply.
Other Considerations
There is a limited beta for Unity on the Vision Pro. I signed up for it almost immediately after it was announced, but have yet to be accepted. Meanwhile, the current versions of Unity 2022.3 (2022.3.9f1 as of this writing) let you download Vision Pro support alongside Mac, Windows, iOS, and Android. However, I haven’t successfully built for Vision Pro in Unity, possibly because Unity’s Vision Pro support appears to require Xcode 15 beta 2 from June, whereas the latest version of Xcode 15 is the release candidate.
As I write this, Unity just announced that their licensing terms will be changing in 2024. Developers will be charged based on both revenue and total number of installs of an app, rather than just revenue. I’ve heard from several upset developers who are considering switching to another game engine as a result.
That’s pretty much all I know about developing for Vision Pro with Unity. Based on what I learned, I’ve switched to developing for Vision Pro in Swift using Apple’s native RealityKit API. I hope the information in this article will help you decide which development path is right for you.
Announcing the Silicon Valley Tech Reading Group
TL;DR: I’ve just started a tech reading group on Meetup: Silicon Valley Tech Reading Group
We meet Tuesdays from 7-9pm Pacific Time.
Back around the turn of the 21st Century there was an amazing group that met every Tuesday night at Hobee’s in Cupertino. Led by Russ Rufer, the Silicon Valley Patterns Group started by reading the “Gang of Four” Design Patterns book, but soon branched out to reading about other software design patterns and eventually process patterns.
It’s in that group that I first heard of Extreme Programming and, later, agile software development. Indeed, if you look in the acknowledgments of Alistair Cockburn’s book Agile Software Development, you’ll find a special thanks to the group for reviewing the book when it was still in draft form.
After more than a decade, the group drifted apart, and there wasn’t any equivalent group for a long time. Until now.
As one of my New Year’s resolutions, this year I formed the Silicon Valley Tech Reading Group on Meetup.com. We meet online every Tuesday night from 7-9 Pacific Time.
My goal is to create a reading group like the one Russ created a quarter century ago. Like the Silicon Valley Patterns Group, we use the reading group patterns from Joshua Kerievsky’s paper “Pools of Insight” (aka Knowledge Hydrants). That is, we use the book as a jumping-off point for everybody in the group to share their own experience on the topic.
As I write this, we’ll be finishing up our study of The Lean Startup this coming Tuesday, and then starting in on the 20th Anniversary Edition of The Pragmatic Programmer the following Tuesday.
I hope you can join us.
Here we go again...
Welcome to the third major redesign of jera.com in its more than 25-year history.
This time the site is built on top of SquareSpace, which lets me appear to be a much better web designer than I actually am.
Some pages may not be there yet. Be patient. Please tell me about any problems you run across.
Thanks!
Computer Vision Challenge 4: OCR
This is a challenge we’re working on in the Silicon Valley Computer Vision Meetup. This challenge is to use OCR to read a receipt. Specifically, this receipt:
We’ll be using an OCR engine called Tesseract. To get started with Tesseract:
1. Install Tesseract using the instructions on the Tesseract site. Be sure to install the appropriate language training data.
2. Download the full-size receipt image.
3. Enter the following at the command line:
tesseract IMG_2288.jpg out
4. Look at the file “out.txt”. You should see (among other things) the text:
SANTA CRUZ HOTEL
Red Restaurant and Bar
Congratulations, you’ve got Tesseract up and running!
Along with the text, you’ll see a lot of garbage. The next step is to tune Tesseract so that it captures all of the text.
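One way to start tuning is to preprocess the receipt with OpenCV before handing it to Tesseract’s C++ API. Here’s a minimal sketch of that idea; the blur and threshold parameters are just starting points to experiment with, and the filename assumes the full-size receipt image from step 2.
#include <opencv2/opencv.hpp>
#include <tesseract/baseapi.h>
#include <iostream>

int main() {
    // Load the receipt photo and convert it to grayscale.
    cv::Mat color = cv::imread("IMG_2288.jpg");
    cv::Mat gray;
    cv::cvtColor(color, gray, cv::COLOR_BGR2GRAY);

    // Clean up the image before OCR: smooth out noise,
    // then binarize with an adaptive threshold.
    cv::GaussianBlur(gray, gray, cv::Size(3, 3), 0);
    cv::adaptiveThreshold(gray, gray, 255, cv::ADAPTIVE_THRESH_MEAN_C,
                          cv::THRESH_BINARY, 31, 15);

    // Run Tesseract on the cleaned-up image.
    tesseract::TessBaseAPI tess;
    tess.Init(nullptr, "eng");
    tess.SetImage(gray.data, gray.cols, gray.rows, 1, static_cast<int>(gray.step));
    char *text = tess.GetUTF8Text();
    std::cout << text;
    delete [] text;
    tess.End();
    return 0;
}
Try different blurs, threshold block sizes, or even cropping to just the text region, and compare how much clean text comes out each time.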
Computer Vision Challenge 3: Play Spot-It™
Our new challenge is to write a program that successfully plays the card game “Spot-It”.
The Game
There are several variations on the game, but the basic Spot-It mechanic is this:
Two circular cards are turned over.
Every pair of cards has precisely one symbol in common.
The first player to point out the common symbol wins the round.
Here is a sample pair of Spot-It cards:
In this example, the common symbol is a 4-leaf clover.
Suggested Setup
Assume the cards will be laid out side by side, like in the above photo. Split the input image in half, assuming one card on the left side, and one card on the right. That way you can use the above photo to develop your algorithm, and then test it with a camera pointed at two real cards.
How to Match Symbols
There are a number of different ways to match symbols:
Identify and extract features for each card, then find the regions on the two cards whose features match each other.
Extract the contours for each symbol, compute the moments for each contour, and then find the contours with the closest moments. The OpenCV call matchShapes might come in handy.
Your idea here?
I’ll be focusing on the feature-based approach. I’ll post more here later as I work on my solution.
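To give a flavor of the feature-based approach, here’s a rough sketch (OpenCV 3-style API, not my finished solution): split the photo into left and right halves as suggested above, detect ORB features on each card, match them, and assume the strongest matches cluster on the shared symbol. The filename and the match cutoff are placeholders.
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <iostream>

int main() {
    // Hypothetical filename for the sample photo of two cards side by side.
    cv::Mat image = cv::imread("spotit_pair.jpg", cv::IMREAD_GRAYSCALE);

    // Assume one card per half of the frame.
    cv::Mat left  = image(cv::Rect(0, 0, image.cols / 2, image.rows));
    cv::Mat right = image(cv::Rect(image.cols / 2, 0,
                                   image.cols - image.cols / 2, image.rows));

    // Detect and describe features on each card.
    cv::Ptr<cv::ORB> orb = cv::ORB::create(1000);
    std::vector<cv::KeyPoint> leftKeys, rightKeys;
    cv::Mat leftDesc, rightDesc;
    orb->detectAndCompute(left, cv::noArray(), leftKeys, leftDesc);
    orb->detectAndCompute(right, cv::noArray(), rightKeys, rightDesc);

    // Match descriptors and keep only the strongest matches.
    cv::BFMatcher matcher(cv::NORM_HAMMING, true /* crossCheck */);
    std::vector<cv::DMatch> matches;
    matcher.match(leftDesc, rightDesc, matches);
    std::sort(matches.begin(), matches.end());   // DMatch sorts by distance
    if (matches.size() > 20) matches.resize(20);

    // The surviving matches should cluster on the common symbol.
    // Average their locations on the left card as a crude estimate.
    cv::Point2f sum(0, 0);
    for (const cv::DMatch &m : matches) sum += leftKeys[m.queryIdx].pt;
    if (!matches.empty()) {
        cv::Point2f center = sum * (1.0f / matches.size());
        std::cout << "Common symbol (left card) near " << center << std::endl;
    }
    return 0;
}
A real solution needs more care (the same symbol appears at different sizes and rotations, and stray matches land on other symbols), but this is the skeleton.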
Update: Several members of the meetup have done some amazing things with this.
Soheil Eizadi has solved the problem for the sample image. His code is available at: https://github.com/seizadi/spotit
JJ Stiff has gotten really nice outlines of the images. His code is available at:
http://jjsland.com/opencv
Computer Vision Challenge 2: Object Tracking
This challenge is much more open-ended than the augmented reality challenge:
Given a somehow-designated object in a scene, track that object as it moves about the scene.
The object could be designated in a number of different ways:
1. The largest object moving in the foreground.
2. The object is a different color than the rest of the scene.
3. Designated with some kind of GUI (e.g. click on the object to track).
4. Your idea here.
Similarly, “tracking” can mean a number of different things:
1. Overlay some kind of marker over the designated object in the scene, and move that marker as the object moves.
2. Move a camera to keep the object centered in the field of view.
3. Move a robot so that it follows the designated object without letting it get too close or too far away.
I suggest you start with either tracking the largest moving object in the foreground, or a uniquely colored object using a marker and a fixed camera.
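If you go the uniquely-colored-object route with a marker overlay and a fixed camera, a minimal sketch looks something like this. The HSV bounds below are placeholders for whatever color your object happens to be, and the area cutoff is just a guess to filter noise.
#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cap(0);
    cv::Mat frame, hsv, mask;
    while (cap.read(frame)) {
        // Convert to HSV and keep only pixels in the target color range.
        cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
        cv::inRange(hsv, cv::Scalar(100, 150, 50),   // placeholder lower bound
                         cv::Scalar(130, 255, 255),  // placeholder upper bound
                    mask);

        // Use image moments to find the centroid of the colored blob.
        cv::Moments m = cv::moments(mask, true /* binaryImage */);
        if (m.m00 > 1000) {  // ignore tiny specks of noise
            cv::Point center(static_cast<int>(m.m10 / m.m00),
                             static_cast<int>(m.m01 / m.m00));
            cv::circle(frame, center, 20, cv::Scalar(0, 255, 0), 3);
        }

        cv::imshow("tracking", frame);
        if (cv::waitKey(30) >= 0) break;   // any key quits
    }
    return 0;
}
From there you can swap in background subtraction for the “largest moving object” variant, or feed the centroid to a pan-tilt camera or robot.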
Computer Vision Challenge 1: Augmented Reality
This is a challenge we’re working through in the Silicon Valley Computer Vision Meetup. Feel free to follow along on your own.
The source and supporting files are available in a GitHub repo.
The first challenge is to implement augmented reality (AR). This means synchronizing the world as seen through a camera and then superimposing computer-generated imagery on top of the real world. You can see an example in the following photo. The 3D graphic of a globe is superimposed over a real-world image. In this case, the position of the globe is based on the position of the black square printed on the piece of paper. (The white square is for orientation, and will only be needed later. To begin with we will use a target with only a solid black square.) The black square is an example of a fiducial marker. Marker-based AR uses markers like these (or more sophisticated ones that encode data) to orient the camera with respect to the world.
Like most computer vision projects, this will require a pipeline. The basic pipeline will do the following:
Run video from a camera through OpenCV and out to the screen.
Separate the black square from the rest of the image.
Find the contours (2d coordinates for the outline of the square).
Get the four vertices of the square from the contours.
Map the 2d coordinates of the corners to their coordinates in 3d.
Draw a 3d object relative to the square’s position in 3d.
Use the second smaller square to orient your 3d drawing.
Each of these steps will teach you something about computer vision and OpenCV. If you don’t complete the whole thing, don’t worry. However far you get, you’ll learn something.
Step 0 Install OpenCV
You can’t use OpenCV until you install it. Unfortunately, OpenCV has a fairly complicated install process due to all the dependencies it includes. Here are some suggestions to make things easier:
Mac: Use your favorite package manager to install OpenCV. For example, to install with MacPorts:
Install MacPorts from the MacPorts website.
Update MacPorts by issuing the command “sudo port selfupdate” from the command line.
Install OpenCV by entering the command “sudo port install opencv”. This will take several hours.
Linux: Use your system package manager to install OpenCV.
Windows: No idea. Try using the Windows installer on the OpenCV home page maybe?
Step 1 Video from Camera to Screen
This step will introduce you to OpenCV’s HighGUI module. The HighGUI module provides a simple cross-platform way of drawing windows and reading from the camera, among other things.
Rather than walk you through this, I’ve written the code for you. Take a look at step1.cpp on GitHub. The line “cap >> image;” creates an OpenCV Mat object from the image capture device. The line “cv::imshow("image", image);” puts up a window titled “image” and shows the image in it.
Virtually all of the code you write for this challenge will be between those two lines of code. You capture the image with “cap >> image”, process it somehow, and show the result with “cv::imshow”.
Important! In addition to calling “cap >> image” and “cv::imshow”, you also need to call “cv::waitKey” to give the system time to process events. If you don’t call cv::waitKey, the image window may mysteriously stop updating under certain circumstances.
Before you go any further, build and run the step1 program. This will show that you’ve got everything hooked up properly.
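If you’d rather type it in than clone it, the whole of step 1 boils down to something like this (the same idea as step1.cpp, not a copy of it):
#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cap(0);          // open the default camera
    if (!cap.isOpened()) return -1;

    cv::Mat image;
    for (;;) {
        cap >> image;                 // grab a frame from the camera
        if (image.empty()) break;

        // ...all of your image processing will eventually go here...

        cv::imshow("image", image);   // show the (processed) frame
        if (cv::waitKey(10) >= 0) break;   // let HighGUI pump events; any key quits
    }
    return 0;
}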
Step 2: Separate the black square from the rest of the image.
Note: steps 2 through 6 use the solid square marker. Click on the link and print it out. You may want to scale your print so there’s plenty of white border around the black square.
In step 3, we’re going to use the findContours call to find the outline of the black square. But findContours wants a single-channel binary image (anything that isn’t 0 is treated as a 1). So we’ll have to modify the image before findContours does its magic.
First, we’ll need to convert the image from color to grayscale. This is done with the cvtColor call. Then we’ll need to threshold the image using either threshold or adaptiveThreshold.
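Continuing inside the capture loop from step 1, that preprocessing might look roughly like this; the adaptiveThreshold block size and offset are starting points to tune for your lighting.
// Between "cap >> image" and "cv::imshow":
cv::Mat gray, binary;
cv::cvtColor(image, gray, cv::COLOR_BGR2GRAY);       // color -> grayscale

// THRESH_BINARY_INV makes the black square come out non-zero (white),
// which is what findContours treats as foreground.
cv::adaptiveThreshold(gray, binary, 255, cv::ADAPTIVE_THRESH_MEAN_C,
                      cv::THRESH_BINARY_INV, 51, 10);

cv::imshow("binary", binary);   // sanity-check the result in its own window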
Step 3: Find the contours.
Now that we have a properly prepared image, we can find the contours. This involves a single call to findContours.
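Continuing the sketch (note that in older OpenCV versions findContours modifies its input, so pass a clone if you still need the binary image):
std::vector<std::vector<cv::Point>> contours;
cv::findContours(binary.clone(), contours, cv::RETR_LIST, cv::CHAIN_APPROX_SIMPLE);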
Step 4: Find the four corners of the square.
findContours returns a vector of contours. From that, we’ll need to extract the polygon for the square. One approach to this is to use approxPolyDP to find polygon approximations of the contours, then search for polygons with only four vertices. See the squares example in your OpenCV samples directory for details.
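A rough version of that search, in the spirit of the squares sample; the epsilon factor and area cutoff are values to experiment with.
std::vector<cv::Point> square;   // will hold the marker's 4 corners if we find them
for (const std::vector<cv::Point> &contour : contours) {
    std::vector<cv::Point> approx;
    double epsilon = 0.02 * cv::arcLength(contour, true /* closed */);
    cv::approxPolyDP(contour, approx, epsilon, true /* closed */);

    // Keep convex quadrilaterals that are big enough to be our marker.
    if (approx.size() == 4 &&
        cv::contourArea(approx) > 1000 &&
        cv::isContourConvex(approx)) {
        square = approx;
        break;
    }
}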
Step 5: Map the 2d coordinates to 3d.
For this step, we’ll use the call solvePnP to map the 2d points onto the 3d points in our model of the world.
Note that two of the parameters solvePnP takes are a camera matrix and a vector of distortion coefficients. To compute these, you’ll need to calibrate your camera, then pass the computed values to solvePnP.
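Here’s roughly what that looks like, assuming a marker 80 mm on a side (any consistent unit works) and placeholder calibration values you’d replace with your own:
// 3d coordinates of the marker corners in our world model,
// with the origin at the marker's center (units: millimeters).
std::vector<cv::Point3f> objectPoints;
objectPoints.push_back(cv::Point3f(-40, -40, 0));
objectPoints.push_back(cv::Point3f( 40, -40, 0));
objectPoints.push_back(cv::Point3f( 40,  40, 0));
objectPoints.push_back(cv::Point3f(-40,  40, 0));

// 2d corners from step 4, converted to floating point, in the same order.
std::vector<cv::Point2f> imagePoints;
for (const cv::Point &p : square) imagePoints.push_back(cv::Point2f(p.x, p.y));

// Placeholder intrinsics -- substitute the values from your camera calibration.
cv::Mat cameraMatrix = (cv::Mat_<double>(3, 3) << 800, 0, 320,
                                                    0, 800, 240,
                                                    0, 0, 1);
cv::Mat distCoeffs = cv::Mat::zeros(5, 1, CV_64F);

cv::Mat rvec, tvec;
cv::solvePnP(objectPoints, imagePoints, cameraMatrix, distCoeffs, rvec, tvec);
// rvec and tvec now describe the marker's pose relative to the camera.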
Step 6: Draw something.
Once you have computed the camera and object position with solvePnP, you can use the results to project 3d points onto the image using projectPoints. Then you can use the OpenCV drawing functions to draw a line, for example, that moves with the image.
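For example, continuing with the rvec and tvec from step 5, you can draw the marker’s coordinate axes on top of the video frame:
// 3d points for the marker center and the tips of three 40 mm axes.
std::vector<cv::Point3f> axes;
axes.push_back(cv::Point3f( 0,  0,   0));
axes.push_back(cv::Point3f(40,  0,   0));   // X axis
axes.push_back(cv::Point3f( 0, 40,   0));   // Y axis
axes.push_back(cv::Point3f( 0,  0, -40));   // Z axis, out of the marker plane

std::vector<cv::Point2f> projected;
cv::projectPoints(axes, rvec, tvec, cameraMatrix, distCoeffs, projected);

cv::line(image, projected[0], projected[1], cv::Scalar(0, 0, 255), 2);   // X in red
cv::line(image, projected[0], projected[2], cv::Scalar(0, 255, 0), 2);   // Y in green
cv::line(image, projected[0], projected[3], cv::Scalar(255, 0, 0), 2);   // Z in blue
If the axes stick to the printed square as you move the paper around, your pipeline is working.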
Step 7: Orient your drawing with the small white square.
Up until now, we’ve been picking which of the square’s vertices to treat as the first one arbitrarily. In real-world situations, you probably want to keep track of the marker’s orientation, so you can keep a complex drawing consistently oriented as the marker rotates from the point of view of the camera.
To do this, we’ll switch from the solid square marker to the marker with the small white square. Print out this marker like you did the solid one.
We start the same way we did before: find the large square. However, once we’ve done that, we then find the small square. We’ll designate the vertex of the large square closest to the small square as “first”.
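In code, “closest to the small square” can be as simple as comparing squared distances. Continuing the sketch, and assuming smallSquare holds the small square’s corners found with the same contour/approxPolyDP approach:
// Center of the small white square.
cv::Point2f smallCenter(0, 0);
for (const cv::Point &p : smallSquare) smallCenter += cv::Point2f(p.x, p.y);
smallCenter *= 0.25f;

// Rotate the big square's corner list so the corner nearest the
// small square comes first. (std::rotate lives in <algorithm>.)
int first = 0;
double best = -1;
for (int i = 0; i < 4; i++) {
    double dx = square[i].x - smallCenter.x;
    double dy = square[i].y - smallCenter.y;
    double d2 = dx * dx + dy * dy;
    if (best < 0 || d2 < best) { best = d2; first = i; }
}
std::rotate(square.begin(), square.begin() + first, square.end());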
That’s all there is to it!
References
This exercise was inspired in part by chapter 2 of Mastering OpenCV with Practical Computer Vision Projects from Packt Publishing. That chapter walks through building a more advanced AR pipeline. The chapter is also available as an article on the Packt website. If you’re not developing for iPhone, you can skip down to the section titled “Marker Detection”, after which point all the code is platform-independent C++.
There is also a full marker-based AR library built on top of OpenCV called ArUco that does almost all of the work for you. I used ArUco to build my iPhone AR demo.
Streaming Video for Pebble
Back in 2012, I participated in the Kickstarter for Pebble, a smart watch that talks to your smart phone via Bluetooth. I was looking forward to writing apps for it. Unfortunately, my first Pebble had a display problem and by the time I got around to getting it exchanged, all the easy watch apps had been written.
I racked my brain for an application that hadn’t already been written. Then it hit me — streaming video! I could take a movie, dither it, and send it over Bluetooth from my iPhone to the Pebble. The only problem was: how would I get the video source?
Then I remembered, “Duh, I just wrote an app for that.” CVFunhouse was ideal for my purposes, since it converts video frames into easier-to-handle OpenCV image types, and then back to UIImages for display. All I had to do was process the incoming video into an image suitable for Pebble display, and then ship it across Bluetooth to the Pebble.
My first iteration just tried to send a buffer of data the size of the screen to the Pebble, and then have the Pebble copy the data to the screen. This failed fairly spectacularly. The hard part about debugging on the Pebble is that there’s no feedback. You build your app, copy it to the watch, and then run it. It either works or it doesn’t. (Internally, your code may receive an error code. But unless you do something to display it, you’ll never know about it.) Also, if your Pebble app crashes several times in rapid succession, it goes into “safe mode” and forces you to reinstall the Pebble OS from scratch. I had to do this several times during this process.
Eventually, I wrote a simple binary display routine, and lo and behold, I was getting errors. APP_MSG_BUFFER_OVERFLOW errors, to be exact, even though my buffer should have been more than sufficiently large to handle the data the watch was receiving. I discovered that there is a maximum allowed value for Bluetooth receive buffer size on Pebble, and if you exceed it, you’ll either get an error, or crash the watch entirely. I wanted to send 3360 bytes of data to the Pebble. I discovered empirically that the most I could send in one packet was 116 bytes. (AFAIK, this is still not documented anywhere.) Once I realized this, I was able to send image data to the Pebble in fairly short order, albeit only 5 scan lines at a time.
All that remained was to dither the image on the iPhone side. From back in the monochrome Mac days, I remembered a name: Floyd-Steinberg dithering. I Googled it, and it turns out that the Wikipedia article includes the algorithm, and it’s all of 10 lines of code. Once I coded that, I had streaming video.
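For the curious, here’s roughly what those lines look like, sketched with OpenCV types rather than my original iPhone code:
#include <opencv2/opencv.hpp>

// Floyd-Steinberg dithering: turn an 8-bit grayscale image into pure
// black-and-white, diffusing each pixel's quantization error to its
// not-yet-visited neighbors.
void floydSteinberg(cv::Mat &gray) {
    for (int y = 0; y < gray.rows; y++) {
        for (int x = 0; x < gray.cols; x++) {
            int oldPixel = gray.at<uchar>(y, x);
            int newPixel = oldPixel < 128 ? 0 : 255;
            gray.at<uchar>(y, x) = static_cast<uchar>(newPixel);
            int error = oldPixel - newPixel;

            // Spread the error to the right and the row below,
            // with the classic 7/16, 3/16, 5/16, 1/16 weights.
            auto spread = [&](int dy, int dx, int weight) {
                int yy = y + dy, xx = x + dx;
                if (yy >= 0 && yy < gray.rows && xx >= 0 && xx < gray.cols) {
                    int v = gray.at<uchar>(yy, xx) + error * weight / 16;
                    gray.at<uchar>(yy, xx) = cv::saturate_cast<uchar>(v);
                }
            };
            spread(0, +1, 7);
            spread(+1, -1, 3);
            spread(+1, 0, 5);
            spread(+1, +1, 1);
        }
    }
}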
Unfortunately, the video only streamed at around 1 FPS on an iPhone 5. How I got it streaming faster is a tale for another day.
CVFunhouse, an iOS Framework for OpenCV
Ever since I took the free online Stanford AI class in fall of 2011, I’ve been fascinated by artificial intelligence, and in particular computer vision.
I’ve spent the past year and a half teaching myself computer vision, and in particular the open source computer vision library OpenCV. OpenCV is a cross-platform library that encapsulates a wide range of computer vision techniques, ranging from simple edge detection, all the way up to 3D scene reconstruction.
But developing primarily for iOS, there was an impedance mismatch. iOS deals with things like UIImages, CGImages and CVImageBuffers. OpenCV deals with things like IplImages and cv::Mats.
So I wrote a framework that takes care of all the iOS stuff, so you can focus on the computer vision stuff.
I call it CVFunhouse. (With apologies to Robert Smigel).
As an app, CVFunhouse displays a number of different applications of computer vision. Behind the scenes, the framework is taking care of a lot of the work, so you can focus on the vision stuff.
To use CVFunhouse, you create a subclass of CVFImageProcessor. You override a single method, “processIplImage:” (or “processMat:” if you’re working in C++). This method will get called once for every frame of video the camera receives. Your method processes the video frame however you like, and outputs the processed image via a callback to imageReady: (or matReady: for C++).
The callback is important, because you’re getting the video frames on the camera thread, but you probably want to use the image in the main UI thread. The imageReady: and matReady: methods take care of getting you a UIImage on the main thread, and also take care of disposing of the pixels when you’re done with them, so you don’t leak image buffers. And you really don’t want to leak image buffers in an app that’s processing about 30 of them per second!
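The vision code you end up writing is plain OpenCV. For example, the guts of a processMat: override that does edge detection might look like the following C++ sketch (assuming the frames arrive as BGR; adjust the color conversion for whatever pixel format you actually get):
// Inside a CVFImageProcessor subclass's processMat: override.
// "mat" is the incoming camera frame; pass "output" to matReady:
// so it comes back to the main thread as a UIImage.
cv::Mat gray, edges, output;
cv::cvtColor(mat, gray, cv::COLOR_BGR2GRAY);      // camera frames arrive in color
cv::Canny(gray, edges, 50, 150);                  // edge detection; tune the thresholds
cv::cvtColor(edges, output, cv::COLOR_GRAY2BGR);  // back to 3 channels for display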
CVFunhouse is dead easy to use. The source is on GitHub at github.com/jeradesign/CVFunhouse. To get started, just run:
git clone https://github.com/jeradesign/CVFunhouse.git
from the command line. Then open the project in Xcode, build and run.
I’ve now built numerous apps on top of CVFunhouse. It’s the framework I use in my day-to-day work, so it’s constantly getting improved. I hope you enjoy it too.
Your iPhone’s Seven Senses
Humans have five senses. Your iPhone has seven:
Touchscreen
Camera
Microphone
GPS (augmented by cell tower and WiFi location)
Accelerometer
Gyroscope
Magnetometer
(The magnetometer is normally used as a compass. But think for a moment — your iPhone can actually sense magnetic fields. That’s something only a few animals can do.)
Now here’s the sad part:
Most of the time we communicate with our iPhones via only one of those senses — touch. Virtually all of our interaction with our iPhones is via touching a screen the size of a business card. We talk with our iPhone like Anne Sullivan talked to Helen Keller.
But the iPhone isn’t blind or deaf. It can see and hear quite well, and it has a better sense of location and direction than most people.
But it’s very rare that apps take advantage of these senses. One of the few that does (other than navigation and photography apps) is the Apple Store app.
Note: I’m not talking about the App Store app. I’m talking about the app you use to purchase Macs and iPhones from Apple, the app that’s normally a friendly front end for the Apple Store website.
But when you run the app while you’re in (or near) an actual physical Apple retail store (like this one in Palo Alto), the Apple Store app gives you a bunch of new options. For example, it knows you’re in an Apple Store, so if you have a Genius Bar appointment there, it automatically checks you in for your appointment, and shows you a picture of the Genius who will be meeting you.
But the coolest thing you can do with the Apple Store app while at an actual Apple Store is self-checkout. You don’t need to find somebody in a blue shirt to help you with your purchase. Instead, you can just grab an item off the shelf, point your iPhone’s camera at its barcode, and enter your iTunes password. Your item is charged to the credit card associated with your iTunes account, and you’re free to walk out the door with it. It’s freaky weird the first time you do it, but also way cool.
And all this is done using just two of the iPhone’s senses — GPS and camera.
Imagine what you could do with all seven!
A Brief Note on WikiPad
On Friday, July 22nd, I removed the app formerly known as WikiPad from the iOS app store. I’ve sold my rights to the name “Wikipad” to Wikipad, Inc., makers of the Wikipad gaming tablet.
While the app is no longer for sale, current owners may still re-download the app through the app store.
Shuttle Launch
So, I went to see the shuttle launch.
It was okay.
I guess you could sum up my feelings with that Peggy Lee song “Is That All There Is?” The reason I went to see the shuttle launch is the essay Penn Jillette wrote about it in Penn and Teller’s “How to Play in Traffic”.
“It’s 3.7 miles away, and you’re looking at this flame and the flame is far away and it’s brighter than watching an arc welder from across a room […] The fluffy smoke clouds of the angels of exploration spill out of your field of vision. They spill out of your peripheral vision.”
“You don’t exactly hear it at first, it almost knocks you over. It’s the loudest most wonderful sound you’ve ever heard. […] You can’t really hear it. It’s too loud to hear. It’s wonderful deep and low. It’s the bottom.”
“This is a real explosion and it’s controlled and it’s doing nothing but good and it makes your unbuttoned shirt flap around your arms. It’s beyond sound, it’s wind. It’s a man-made hurricane.”
The key point there being, “3.7 miles away”. In the VIP section. I was closer to 7 miles away, along the NASA Causeway, in the closest section open to the general public. From there, the Shuttle is a tiny speck without binoculars, and the sound of the launch, when it hits you, is reminiscent of distant thunder in the Midwest. And with the low clouds, the whole show was over in a matter of seconds. I could tell you more, but just watch the movie. That’s pretty much what I saw and heard, and I’m nowhere near as good with words as Penn.
Next time, I’m bringing binoculars.