Sudoku Solver using Computer Vision and Deep Learning — Part 1

Extract and solve the sudoku from an image.

I, like many others, enjoy solving new puzzles and questions. During my school days, each morning I used to do The Times of India’s sudoku. Everyone knows how to solve sudoku but have you ever wondered that you can get to the solution without even scratching your head once. You just have to click the picture of the sudoku and it should calculate the solution for you.

Peter Norvig in his Solving Every Sudoku Puzzle gives a beautiful summary of the game’s rule in just one sentence.

A puzzle is solved if the squares in each unit are filled with a permutation of the digits 1 to 9.

The rules of the game:

Now that you know the rules, let's do the heavy lifting so that you don’t have to spend minutes or maybe hours solving the sudoku. Again what we have to do? Take this image and somehow extract the sudoku from this image and calculate the solution.

Input image for sudoku solver

We will be dividing the whole process into 3 steps.

A: Extract the sudoku from the image

B: Extract each number present in the image

C: Compute the solution of the sudoku using algorithm

A: Extract the sudoku from the image

1: Pre Processing the image

proc = cv2.GaussianBlur(img.copy(), (9, 9), 0)
proc = cv2.adaptiveThreshold(proc, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)

To make gridlines have non-zero pixel values, we will invert the colours. Also, we will dilate the image to increase the size of gridline.

proc = cv2.bitwise_not(proc, proc)  
kernel = np.array([[0., 1., 0.], [1., 1., 1.], [0., 1., 0.]],np.uint8)
proc = cv2.dilate(proc, kernel)
Sudoku image after thresholding

2: Find the corners of the largest polygon

_, contours, h = cv2.findContours(img.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = sorted(contours, key=cv2.contourArea, reverse=True)
polygon = contours[0]

Use of `operator.itemgetter` with `max` and `min` allows us to get the index of the point. Each point is an array of 1 coordinate, hence the [0] getter, then [0] or [1] used to get x and y respectively.

Bottom-right point has the largest (x + y) value. Top-left has point smallest (x + y) value. The bottom-left point has the smallest (x — y) value. The top-right point has the largest (x — y) value.

bottom_right, _ = max(enumerate([pt[0][0] + pt[0][1] for pt in
polygon]), key=operator.itemgetter(1))
top_left, _ = min(enumerate([pt[0][0] + pt[0][1] for pt in
polygon]), key=operator.itemgetter(1))
bottom_left, _ = min(enumerate([pt[0][0] - pt[0][1] for pt in
polygon]), key=operator.itemgetter(1))
top_right, _ = max(enumerate([pt[0][0] - pt[0][1] for pt in
polygon]), key=operator.itemgetter(1))

Now that we have all the 4 coordinates. We will return an array of all the 4 points using the indices. Each point is in its own array of one coordinate.

[polygon[top_left][0], polygon[top_right][0], polygon[bottom_right][0], polygon[bottom_left][0]]
4 corners of the largest polygon

3: Crop and Warp Image

Note: Explicitly set the data type to float32 or `getPerspectiveTransform` will throw an error.

top_left, top_right, bottom_right, bottom_left = crop_rect[0], crop_rect[1], crop_rect[2], crop_rect[3]
src = np.array([top_left, top_right, bottom_right, bottom_left], dtype='float32')
side = max([ distance_between(bottom_right, top_right),
distance_between(top_left, bottom_left),
distance_between(bottom_right, bottom_left),
distance_between(top_left, top_right) ])

distance_between() function returns the scalar distance between two points.

def distance_between(p1, p2): 
a = p2[0] - p1[0]
b = p2[1] - p1[1]
return np.sqrt((a ** 2) + (b ** 2))

We will describe a square with side of the calculated length, this is the new perspective we want to warp to. Next thing is to get the transformation matrix for skewing the image to fit a square by comparing the 4 before and after points. In the end, we will perform the transformation on the original image.

dst = np.array([[0, 0], [side - 1, 0], [side - 1, side - 1], [0, side - 1]], dtype='float32')
m = cv2.getPerspectiveTransform(src, dst)
cv2.warpPerspective(img, m, (int(side), int(side)))
Sudoku image after crop and warp perspective transform

4: Infer grid from the square image

squares = [] 
side = img.shape[:1]
side = side[0] / 9
for j in range(9):
for i in range(9):
p1 = (i * side, j * side) #Top left corner of a box
p2 = ((i + 1) * side, (j + 1) * side) #Bottom right corner
squares.append((p1, p2)) return squares

5: Get each digit

digits = []
img = pre_process_image(img.copy(), skip_dilate=True)
for square in squares:
digits.append(extract_digit(img, square, size))

extract_digit is a function which extracts a digit (if one exists) from a Sudoku square. It gets the digital box from the whole square, use the fill feature finding to get the largest feature in the middle of the box. We would expect to find a pixel belonging to the digit in the margin which is used to define an area in the middle. Next, we will scale and pad the digit so that it fits a square of the digit size we’re using for machine learning. Also, we have to ignore any small bounding boxes

def extract_digit(img, rect, size):
digit = cut_from_rect(img, rect)
h, w = digit.shape[:2]
margin = int(np.mean([h, w]) / 2.5)
_, bbox, seed = find_largest_feature(digit, [margin, margin], [w
- margin, h - margin])
digit = cut_from_rect(digit, bbox)

w = bbox[1][0] - bbox[0][0]
h = bbox[1][1] - bbox[0][1]

if w > 0 and h > 0 and (w * h) > 100 and len(digit) > 0:
return scale_and_centre(digit, size, 4)
return np.zeros((size, size), np.uint8)
Final sudoku image

Now, we have the final preprocessed image of sudoku, our next job is to extract each digit from the image, store it in a matrix and calculate the solution of sudoku by some algorithm.

We will discuss these 2 processes in the next part.

For the lazy among you who have skipped reading or performing the tutorial yourselves, here’s a link to the source code.

Software Engineer, Machine Learning