Github Repo. All the code I'm referencing is here.
Restart/stack the animation below (the animation is purely for entertainment. See "The Generator" for a demo of the program)
Sudoku is a combinatorial number-based puzzle with complicated origins. I am going to assume the you know what it is in this article. My goal is to create a deterministic Sudoku generator that creates boards with a high amount of entropy, or a large amount of randomness.
The project was originally to create a book of puzzles, which I did a few months ago. The original program was written in C to be compiled into a native executable that outputs a file that is then read by a Javascript program ran in a browser locally and rendered into a printable book. Recently, I thought it'd be a good idea if I converted that piece of code to a web app that can be served over http. I will elaborate on this in subsequent sections.
I wanted the program to generate a puzzle from the ground up for maximum randomness. I will elaborate on this later but just know that philosophically, the source of randomness and how it's applied is very important (to me).
Like with my other programs, the source of randomness in my programs are acquired from the xorshift128+ pseudorandom number generator. This is the same one the V8 JavaScript engine uses; in fact, my implementation is heavily based from this. This is the article that mainly helped me.
The random function simply generates a random 64-bit word, the value is then either masked as a mantissa to a floating point number or modulo'd for a random integer (in this case, the ratio between the word size and range of number I needed is very large and the bias introduced via the non power of two modulo is negligible).
In the native executable version, the PRNG is manually seeded. In the web version, the PRNG is seeded using Math.random() multiplied by Number.MAX_SAFE_INTEGER. (It is theoretically possible for both Math.random calls to return a number less than Number.EPSILON * 0.5, paralyzing the PRNG on the WASM side with both initial states set to 0, but this is impossible for most implementations and statictically extremely improbable)
The heart of the program is the backtracking solver. The generator is just a modified version of the solver. The solver is a simple depth-first brute-force solver. Due to Sudoku's nature, the solution can be found relatively quickly. See this Wikipedia page.
In code, each node in the tree is a recursive function call where the call either generates additional nodes (makes more calls), returns unsolved, where the branch is effectively killed, or return solved, where the entire puzzle is solved. Note that only one "branch" is processed at once. In practive, most puzzles can be solved with between a few hundred and a few hundred thousand calls.
The actual board generation starts with a full board. Which is created by running the backtracking solver on an empty board with the gusses randomized. This works because the backtracking solver finds a solution for boards with more than one solution, and running it with randomized guesses ensures a random board.
The filled board, or answer key, is passed into a function where it attempts to remove each entry (only once) in a random order until either the desired number of removes has been met or removing entries destroys the uniqueness of the solution, or the validity of the board. The function checks whether removing an entry will make the board invalid by another modified backtracking solver, which returns if the board has more than one solution; it does this by trying to find another solution after the initial one has been found.
This is the most time consuming step of the process. I have some ideas to speed it up but ehh.
There are other algorithm to generate boards like this way faster, but I'm not sure if their results will be random enought (that is, has the potential to generate puzzles that cover all solutions.)
The initial filled board generation can be sped up by modifying an existing filled board but the process itself is already extremely fast. Checking unque solutions may be sped up by avoiding guesses that is too close to the known solution, but that's too much effort to implement.
The puzzle generator outputs a large string of tilde (~) delimited values, which is read by some JS, and combined with some horrifying emmet-generated strings, and written into the page.
I originally wanted to manually transpile the C code to JS, but that wasn't trivial because Javascript doesn't pass by reference. I'm sure if I worked on it for a few hours I could get it done, but I am lazy, so I decided to use WebAssembly. Plus, using WASM will help with the performance because that code I have is not very fast.
I made some modifications to original C code. Mainly removing the main function and made the main generating function return a char* that references the string of outputs. I then compiled the code with Emscripten. The new main function can now be called with the ccall method provided via Emscripten. The render script handles the rest.
I made heavy modifications to the new render function, it didn't really change its functionality besides making the answer key the same size as the board itself, the main change was that I made it very functional, in the formal sense. I'm striving to be a functional bro.
Write me an email, it is on the site's homepage. If you're affiliated with Georgia Tech, use the email listed in the directory.
Originally written as a gift for Mikinley