• yeldarb 5 days ago

    Neat walkthrough!

    Last year I actually made an applied-CoreML app to solve sudoku puzzles where MNIST came in very handy.

    I wrote about it here: https://blog.prototypr.io/behind-the-magic-how-we-built-the-...

  • nothis 5 days ago

    >After I scanned a wide variety of puzzles from each book, my server had stored about 600,000 images

    600,000?!? Even divided by 81 that's over 7000! How long did this take?

  • yeldarb 5 days ago

    A couple of afternoons.

    I just hacked into my app's flow to upload a "scan" of the isolated puzzle to my server instead of slicing it and sending the component images to CoreML.

    Then I sat there and flipped through page after page of Sudoku puzzles and scanned them from a few different angles each, sliced them in bulk on the server, and voila: data!

  • dangero 5 days ago

    Sorry I’m still confused. You took roughly 7000 pictures in two afternoons? What do you mean by sliced them in bulk? If you took them from different angles how do you slice them in bulk?

  • yeldarb 5 days ago

    The app already had the code for "isolate the puzzle and do perspective correction" so the uploaded images all looked something like this: https://magicsudoku.com/example-uploaded-image.png

    By "slicing in bulk" I mean the server was the one that split that out into 81 smaller images rather than the app doing the slicing and uploading 81 small images.

    Taking them from different angles was done because the perspective correction adds distortions that I didn't want my model to be sensitive to.
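    To make the "slicing in bulk" step concrete, here is a minimal sketch of what the server-side split into 81 cell images could look like. The function name and the 900-pixel scan size are illustrative, not from the actual app; the boxes it computes could be fed to any imaging library (e.g. Pillow's Image.crop).

```python
# Hypothetical sketch of server-side bulk slicing: given a
# perspective-corrected, square puzzle scan, compute the 81 crop boxes
# (left, upper, right, lower) for the individual cells.

def cell_boxes(size, grid=9):
    """Return crop boxes for each cell of a square size x size scan."""
    step = size // grid
    boxes = []
    for row in range(grid):
        for col in range(grid):
            left, upper = col * step, row * step
            boxes.append((left, upper, left + step, upper + step))
    return boxes

boxes = cell_boxes(900)
print(len(boxes))   # 81 cells per scan
print(boxes[0])     # (0, 0, 100, 100)
print(boxes[-1])    # (800, 800, 900, 900)
```

    With ~7,400 uploaded scans, 81 crops per scan gets you to ~600,000 cell images without ever photographing a cell individually.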

  • bigmit37 5 days ago

    Interesting stuff! I’m also a little confused as to how you took so many pictures in only a couple of afternoons.

  • jononor 5 days ago

    7000 pictures at 5 seconds per picture is "only" 10 hours of work, and the per-picture time could well be lower than that. Seems quite doable over 2-4 afternoons.

    Props for doing the project end2end, including the non-trivial (and typically skipped) part of collecting training data.

  • rahimnathwani 5 days ago

    "Apple ... provides a ... helper library called coremltools that we can use to ... convert scikit-learn models, Keras and XGBoost models to CoreML"

  • a_c 5 days ago

    As someone with not much experience in ML: how do you handle the case where no number is present in a cell, versus when one is?

  • ericjang 5 days ago

    Great question! This is actually a surprisingly deep problem in ML, known as "anomaly detection" or "out-of-distribution" (OoD) detection.

    Another way to formulate this question: "given training data that only tells you about digits, how do you know whether something is a digit or not?" Given that the training data never actually defines what isn't a digit, how can we ensure that the model actually sees a digit at test time? If we cannot ensure this (e.g. an adversary or the real world supplies inputs), how can we "filter out" bad inputs?

    A quick hack solution that works well in practice is to examine the "predictive distribution" across digit classes. Researchers have empirically found that entropy tends to be higher (i.e. more smooth) when the model sees an OoD input. However, the OoD problem is not fully solved.

    Here's a nice survey paper on the topic: https://arxiv.org/abs/1809.04729

    Note that methods that tie OoD detection to the task at hand (classification) are not actually solving OoD detection; they are solving "predictive uncertainty" of the task.
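    The entropy heuristic above can be sketched in a few lines. This is a minimal illustration, not a full solution: the example distributions and the cutoff value are made up, and in practice the cutoff would be tuned on held-out data.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def looks_out_of_distribution(probs, cutoff=1.5):
    # Higher entropy (a flatter distribution over the 10 digit classes)
    # is empirically associated with out-of-distribution inputs.
    return entropy(probs) > cutoff

confident = [0.9] + [0.1 / 9] * 9  # peaked: model is sure it's one digit
uniform   = [0.1] * 10             # flat: often a sign of an OoD input

print(looks_out_of_distribution(confident))  # False
print(looks_out_of_distribution(uniform))    # True
```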

  • jononor 5 days ago

    You mean to get either 0-9 or 'no number'? Here are two approaches:

    1) Integrated. Represent 'no number' as an eleventh class in the original model. Retrain it with this additional class (requires additional training data).

    2) Cascading. Train a dedicated model for 'number' versus 'no number' (binary classifier), and use that in front of the original model.

    Note that the MNIST data comes already extracted from the original images, with each digit centered in a fixed-size 28x28-pixel image. In a practical ML application these steps would also need to be done before classification can be performed.
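    The cascading approach (2) could look something like this sketch. The two models here are hypothetical stand-ins, not real classifiers: is_digit plays the role of the binary gate and classify_digit the role of the original 0-9 model.

```python
def is_digit(cell):
    # Stand-in binary classifier: here, "any ink at all" counts as a digit.
    # In reality this would be a trained number / no-number model.
    return any(any(row) for row in cell)

def classify_digit(cell):
    # Stand-in for the original 10-class digit model.
    return 7

def read_cell(cell):
    """Run the binary gate first; only classify cells that contain ink."""
    if not is_digit(cell):
        return None  # 'no number'
    return classify_digit(cell)

empty = [[0] * 28 for _ in range(28)]  # blank 28x28 cell
inked = [row[:] for row in empty]
inked[14][14] = 1                      # a single inked pixel

print(read_cell(empty))  # None
print(read_cell(inked))  # 7
```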

  • jononor 5 days ago

    In the work shown in the article, the segmentation and centering of digits looks to be done by the user holding the camera. Which can be workable for some applications!

  • lozenge 5 days ago

    The predictions variable has a confidence value for each digit. You can put a cutoff and say if none is above a certain confidence, assume there's no number at all.
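    A minimal sketch of that cutoff idea, with illustrative confidence values and an illustrative 0.8 threshold (neither is from the article):

```python
def predict_with_cutoff(confidences, threshold=0.8):
    """Return the most likely digit, or None if nothing is confident enough."""
    best = max(range(len(confidences)), key=lambda d: confidences[d])
    if confidences[best] < threshold:
        return None  # treat the cell as containing no number
    return best

# One confidence per digit class 0-9:
clear_five = [0.01, 0.01, 0.01, 0.01, 0.01, 0.92, 0.01, 0.01, 0.005, 0.005]
empty_cell = [0.1] * 10  # nothing stands out

print(predict_with_cutoff(clear_five))  # 5
print(predict_with_cutoff(empty_cell))  # None
```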

  • jefft255 5 days ago

    This could work, but it is important to note that a lot of ML algorithms trained in a closed domain (no "other" class) will be pretty bad at knowing what they don't know. This is an open problem in ML.

  • jononor 5 days ago

    Choosing the threshold will be hard. And (as another poster mentioned) the model is unlikely to generalize well to classes of data it has not seen. I suspect this approach will quite often get things that look similar to numbers wrong, like handwritten characters (a, b, c). Including these in the training set is much more likely to yield a model that successfully discriminates them.

  • gunzor 5 days ago

    You can use a threshold value to detect whether a number is present. If the prediction confidence is below the threshold, you can treat the cell as containing no number.

  • zackmorris 5 days ago

    The scrollbar distance confirms a suspicion that I've held for some time: that writing a machine learning algorithm is of similar complexity to developing an iOS app in Xcode!

  • saagarjha 5 days ago

    What scrollbar distance are you talking about?

  • zackmorris 4 days ago

    It was a joke - the Xcode section starts about halfway down the page. I was just illustrating that the friction we deal with today is of comparable complexity to what might be thought of as advanced programming (AI, VR, AR, physics, etc etc).