Thursday, July 31, 2014

Progress

Below is the table of my progress on this Language Challenge. The numbers below link to my code for the solution, statements of the problems can be found on projecteuler.net. A * indicates a language that was used at some point but has yet to be reused to solve anything, and P&P denotes the problems that have been solved with pencil and paper, no coding necessary. The rightmost number is the problem I am actually using a language for - the numbers to the left are problems I solved previously in the language but then used other languages to free the language up again. This table is starting to get quite large, and hard to fit in a post. The below table is actually mixing a small bit of information - to fit everything, I shortened the Python section. I also solved 16,24, and 39 in Python. I am looking into ways to express this information better so that I don't have to hide any more information from the table.
BrainFuck 1*
Ook! 6*
Whenever 5,2
Cat 3
FALSE 4
GLSL 7
ArnoldC 9
LOLCODE 10
Io 11
Shakespeare 12
Smalltalk 13
Clojure 14
BASIC 16
INTERCAL 17
WIND 18
F# 19
E 20
COBOL 21
SwiftScript 22
Ceylon 23
Erlang 24
E# 26
Befunge 27
Boo 29
K 8,30
ALGOL68 31
Go 12,32
Bash 33
Batch 34
ChucK 35
Whitespace 36
Haskell 2,37
Lua 38
Gosu 39
Rust 40
Ada 41
sML 10,42
Coffeescript 43
Scala 18,44
Rexx 45
Julia 46
x86-64 47
ELM 17,48
OCaml 49
Postscript 7,50
Cobra 51
APL 52
EEL 53
Chapel 54
Elixir 55
Linotte 56
Racket 7,26,57
WARM 58
C# 26,59
Javascript 9,31,60
Pascal 61
cLisp 13,62
Rebol 63
Tcl 64
Dart 63,65
Python 65,66
Prolog 18,67
Fantom 68
Perl 19,70
Processing 71
J 3,24,35,69,72
Groovy 73
Genie 74
Vala 75
Forth 9,76
Hack 77
R 45,78
CIL 79
Frink 16,80
Dogescript 81
Fortran95 3,15,82
Zimbu 83
Red 84
Idris 85
Squirrel 21,52,55,86
D 22,87
C 48,50,88
PASM 89
JavaBC 90
Kotlin 26,91
X10 92
PHP 11,93
Yeti 94
Ruby 4,40,56,95
Java 20,79,91,96
Pike 97
C++ 14,53,94,98
Mathematica 99
P&P 1,5,6,8
P&P 15,25,28,69

Problem 94 - C++

I had a solution to problem 94 written up in EEL pretty quickly before running into 32-bit integer limitations. Therefore, it was not too hard to then port my solution to C++, especially seeing as I am extremely familiar with the language. Problem 94 was a very different type of problem from the last many. Problems 89-93 seemed to all pretty much be problems of knowing how to enumerate a space. Problem 94 was almost entirely a "do some math to dramatically simplify the problem, then brute force it" approach. The below code probably makes absolutely no sense at first glance, but accompanied by the small page of math that motivated it, this was a nice little problem. Looking at the shortness and simplicity of the below code makes me think I may be resolving this to free up C++ once again in the near future.

Code runs in about 3s on my machine.
#include<stdio.h>
#include<cmath>

long contribution(long x) {
    long c = 0;
    // x as smaller side:
    long temp = 3 * x * x - (2 * x) - 1;
    long root = (long)sqrt((double)temp);
    if (root * root == temp) {
        c += 3 * x + 1;
    }
    // x as larger side:
    temp = 3 * x * x + (2 * x) - 1;
    root = (long)sqrt((double)temp);
    if (root * root == temp) {
        c += 3 * x - 1;
    }
    return c;
}
int main(int argc, char ** argv) {
    long ans = 0;
    //repeated side must be odd
    for (long i = 3; i * 3 < 1000000000; i+=2) {
        ans += contribution(i);
    }
    printf("%ld\n", ans);
    return 0;
}

Problem 53 - EEL

So, the next problem I have to solve is problem 94. 94 didn't seem like it would be too hard, so I put my hand into the bag of languages that is a list of languages on Wikipedia, and pulled out EEL as a language to use. EEL (Extensible Embeddable Language) wasn't too hard to pick up how to use, having a very C-like syntax, though it has next to no documentation, which made things a bit hard. And indeed, the lack of documentation caused me to run into an issue. Problem 94 involves some arithmetic that occasionally goes outside the bounds of 32-bit integer arithmetic (indeed, it involves the squares of rather large 32-bit integers, which could stretch to be around 64-bits). However, EEL only has 32-bit integers, so my attempt at using it to solve problem 94 was cut short.

I may have been able to still do 94 in EEL though, for the same reason I was able to do the problem I ended up using it for - I seem to have used C++ to solve problem 53  a while ago. However, I am not entirely sure why I hadn't redone the problem more recently, as problem 53 was rather simple to solve. The below code is pretty much a translation of my previous C++ code, which was not too hard to do after having just played around in EEL trying to get problem 94 working. The most notable things about the code below are that it follows EEL coding style, giving the curly-braces-get-their-own-line syntax I don't use very often, and one very important part of the below program: The * 1.0. This coerces to a real number, because EEL only supports 32-bit integers, and only supports 64-bit floating point numbers, because why not?

Below code runs in about 8ms (EEL seems to be a fairly fast language, even though it runs on its own VM).
export function main<args>
{
    function fac(n)
    {
        if (n < 2)
        {
            return 1;
        }
        else
        {
            return n * fac(n - 1);
        }
    }

    function isncrbig(n, r)
    {
        local denom = fac(r);
        //very important 1.0 converts to a real;
        //EEL only has 32 bit ints and only has 
        //64 bit reals.
        local rest = n * 1.0;
        local i = n - 1;
        while (i > (n - r))
        {
            rest = rest * i;
            if (rest / denom > 1000000)
            {
                return true;
            }
            i = i - 1;
        }
        return false;
    }

    local min_r = 10;
    local ans = 0;
    local n = 23;
    while (n <= 100)
    {
        while (isncrbig(n, min_r - 1))
        {
            min_r = min_r - 1;
        }
        ans = ans + n - (2 * min_r) + 1;
        n = n + 1;
    }
    print(ans);
    print("\n");
    return 0;
}

Wednesday, July 30, 2014

Problem 93 - PHP

After freeing up PHP with Io, the natural next step was to use PHP to solve problem 93. Now, I am not a huge fan of PHP, but this solution was not too hard to type up. My issues with this problem ended up being hard to fix, but very interesting mistakes. The first big mistake was not noticing that in the statement of the problem, divisions were supposed to be floating point divisions, not integer divisions. (I assumed because we had integer inputs and were only considering integer outputs that the integers would be integer divisions, but assuming that division is integer division is almost always wrong in a math context). My next issue ended up being a very weird issue that basically boils down to "don't assume a language feature exists." In many languages, and most recently in Io, if x is a floating point number, then x % 1 (where % is the modulus operator), returns the floating point portion of a number. I used this operator to test for a whether a number was probably an integer, being off only because of floating point error. So, to check for whether a number was an "integer", I had the line of code if ($n % 1 == 0), and I was surprised when this did not fix my issues, and started trying to hunt down other issues. However, it turns out that, in PHP, $n % 1 returns 0 even if $n is floating point. Therefore, this test that I had placed my trust in was doing absolutely nothing. Replacing the test with one that actually tested what I want then gave me the correct answer, ending a long struggle to answer this problem.

Answer runs in about 12s on my machine (thank god I didn't try to stick with Io.)
<?php
function fac($x) {
    return $x < 2 ? 1 : $x * fac($x - 1);
}

#Nth Lexicographic permutation fcn,
#taken from my solutions to many other
#problems
function NthLP($n, $p, $rem) {
    if (count($rem) == 1) {
        $p[] = $rem[0];
        return $p;
    }
    $k = fac(count($rem) - 1);
    $timesDivided = (int) ($n / $k);
    $p[] = $rem[$timesDivided];
    array_splice($rem,$timesDivided, 1);
    return NthLP($n % $k, $p, $rem);
}

function digits($num) {
    $ret = array();
    while ($num > 0) {
        $ret[] = ($num % 10);
        $num = (int) ($num / 10);
    }
    return array_reverse($ret);
}

#abstraction for calling operations by number
function op($x, $y, $op) {
    if (is_infinite($x) || is_infinite($y)) { 
        return INF; 
    }
    switch($op) {
    case 0:
        return $x * $y;
    case 1:
        return $x - $y;
    case 2:
        return $y == 0 ? INF : ($x / $y);
    case 3:
        return $x + $y;
    default:
        return 3025;
    }
}

#index encodes the permutation of digits,
#parenthization, and operands,
#as a sort of multi-base number
function getNum($dlist, $index) {
    $permut = $index % 24;
    $index  = (int) ($index / 24);
    $parenth= $index % 5;
    $index  = (int) ($index / 5);
    $op1    = $index % 4;
    $index  = (int) ($index / 4);
    $op2    = $index % 4;
    $index  = (int) ($index / 4);
    $op3    = $index % 4;
    $d      = NthLP($permut, array(), $dlist);
    switch($parenth) {
    case 0:
        return op( op( $d[0], $d[1], $op1), op($d[2], $d[3], $op3), $op2);
    case 1:
        return op( $d[0], op($d[1], op($d[2], $d[3], $op3), $op2), $op1);
    case 2:
        return op( $d[0], op( op($d[1], $d[2], $op2), $d[3], $op3), $op1);
    case 3:
        return op( op( $d[0], op( $d[1], $d[2], $op2), $op1), $d[3], $op3);
    case 4:
        return op( op( op( $d[0], $d[1], $op1), $d[2], $op2), $d[3], $op3);
    default:
        return 3025;
    }
}

#So apparently PHP passes arrays by value
#scary
function setBit($lst, $n) {
    #test for integer-ness
    if (($n - (int)$n) < 0.01) {
        $n = (int) $n;
        $lst[(int) ($n / 32)] |= 1 << ($n % 32);
    }
    return $lst;
}

function countConsec($lst) {
    $count = 0;
    for ($k = 1; $k < 3024; $k++) {
        if ((($lst[(int) ($k / 32)] >> ($k % 32)) & 1) == 1) {
            $count += 1;
        } else {
            return $count;
        }
    }
    return $count;
}

function numConsec($x) {
    $digs = digits($x);
    if (!((count($digs) == 4) && ($digs[0] < $digs[1]) &&
        ($digs[1] < $digs[2]) && ($digs[2] < $digs[3]))) {
            return 0;
    }
    $results = array_fill(0, (int) (3024 / 32), 0);
    for ($i = 0; $i < (5 * 4 * 4 * 4 * 24); $i++) {
        $next = getNum($digs, $i);
        if ($next > 0 && $next < 3024) {
           $results = setBit($results, $next);
        }
    }
    return countConsec($results);
}

$soln = 1234;
$maxConsec = 0;
for($i = 1234; $i <= 6789; $i++) {
    $num = numConsec($i);
    if ($num > $maxConsec) {
        $maxConsec = $num;
        $soln = $i;
    }
}
var_dump($soln);
?>

Problem 11 - Io

After writing up a somewhat full solution to Problem 93 in Io, I found out that that solution was far, far too slow to run. Thus, I looked for a much, much simpler problem that I could solve in Io in order to free up another language for 93. Problem 11 was a good choice, as the problem is quite simple, other than requiring working with a 2-dimensional array, which is very doable in Io. So, I replaced my old PHP solution from a long time ago with an Io solution. This was mostly an easy matter of translating, nothing special. Solution runs in about 0.1s on my machine (for what it's worth, my PHP solution runs in about 1/5 of the time, and of course that is worth very little on these time scales).
matrix := list(list( 8, 2,22,97,38,15, 0,40, 0,75, 4, 5, 7,78,52,12,50,77,91, 8),
    list(49,49,99,40,17,81,18,57,60,87,17,40,98,43,69,48, 4,56,62, 0),
    list(81,49,31,73,55,79,14,29,93,71,40,67,53,88,30, 3,49,13,36,65),
    list(52,70,95,23, 4,60,11,42,69,24,68,56, 1,32,56,71,37, 2,36,91),
    list(22,31,16,71,51,67,63,89,41,92,36,54,22,40,40,28,66,33,13,80),
    list(24,47,32,60,99, 3,45, 2,44,75,33,53,78,36,84,20,35,17,12,50),
    list(32,98,81,28,64,23,67,10,26,38,40,67,59,54,70,66,18,38,64,70),
    list(67,26,20,68, 2,62,12,20,95,63,94,39,63, 8,40,91,66,49,94,21),
    list(24,55,58, 5,66,73,99,26,97,17,78,78,96,83,14,88,34,89,63,72),
    list(21,36,23, 9,75, 0,76,44,20,45,35,14, 0,61,33,97,34,31,33,95),
    list(78,17,53,28,22,75,31,67,15,94, 3,80, 4,62,16,14, 9,53,56,92),
    list(16,39, 5,42,96,35,31,47,55,58,88,24, 0,17,54,24,36,29,85,57),
    list(86,56, 0,48,35,71,89, 7, 5,44,44,37,44,60,21,58,51,54,17,58),
    list(19,80,81,68, 5,94,47,69,28,73,92,13,86,52,17,77, 4,89,55,40),
    list( 4,52, 8,83,97,35,99,16, 7,97,57,32,16,26,26,79,33,27,98,66),
    list(88,36,68,87,57,62,20,72, 3,46,33,67,46,55,12,32,63,93,53,69),
    list( 4,42,16,73,38,25,39,11,24,94,72,18, 8,46,29,32,40,62,76,36),
    list(20,69,36,41,72,30,23,88,34,62,99,69,82,67,59,85,74, 4,36,16),
    list(20,73,35,29,78,31,90, 1,74,31,49,71,48,86,81,16,23,57, 5,54),
    list( 1,70,54,71,83,51,54,69,16,92,33,48,61,43,52, 1,89,19,67,48));

max := method(x, y, if(x > y, x, y))

maxProd := 0;
bound := matrix size;
for (i, 0, bound - 1,
    for (j, 0, bound - 1,
    (i + 3 < bound) ifTrue (maxProd := max((matrix at(i) at(j)) * 
    (matrix at(i+1) at(j)) * (matrix at(i+2) at (j)) * (matrix at(i+3) at(j)), maxProd));
    (j + 3 < bound) ifTrue (maxProd := max((matrix at(i) at(j)) * 
    (matrix at(i) at(j+1)) * (matrix at(i) at (j+2)) * (matrix at(i) at(j+3)), maxProd));
    (i + 3 < bound and j + 3 < bound) ifTrue (maxProd := max((matrix at(i) at(j)) * 
    (matrix at(i+1) at(j+1)) * (matrix at(i+2) at (j+2)) * (matrix at(i+3) at(j+3)), maxProd));
    (i + 3 < bound and j > 2) ifTrue (maxProd := max((matrix at(i) at(j)) * 
    (matrix at(i+1) at(j-1)) * (matrix at(i+2) at (j-2)) * (matrix at(i+3) at(j-3)), maxProd));                   
    )
)

maxProd print;

Tuesday, July 29, 2014

Progress on Problem 93-ish, slow languages are fun.

I spent much of this evening coding up a solution to problem 93 in the Io programming languages. After I went through fixing all of the various errors that I had because of not getting the at-times confusing or just odd syntax right, I tried running my program, and it ran, and ran, and ran giving no answer. Turns out, it takes about 15 seconds for it to test just 10 of the couple thousand cases it has to check. After observing this, I decided to run a bit of a test to see just how slow the language is. As I had done all of my work with sieve timing before, that seemed like a good example of something I could code up quickly to test. A prime sieve to find all primes less than 100000 takes about 6.5 seconds to run in Io. That can be compared to about 10ms in C, and primes less than a million took too long for me to wait for in Io (more than a minute), and takes about .1s in C. So, this test showed me that Io could be expected to be at least 600 times slower than C code - and that's assuming no operations are particularly slow, such as array accesses, etc. So, based on this result, I think I will have to retire Io as the language for 93, and instead use it to solve an earlier problem, and use another language on 93. Though I must say, a 600x + slowdown compared to C is kind of impressive, and a sign that it will be hard to solve anything in close to a minute in this language.

Saturday, July 26, 2014

Problem 92 - X10

So, working on another problem. I found X10 through yet another google search, this time specifically for languages that run on the JVM, because why not. X10 is an object oriented languages developed by IBM that, yes, runs on the JVM. I ended up accidentally installing an IDE for x10 from the website (x10dt apparently means "X10 developer tools" which means "an IDE for X10"), so I used that instead of just using a command line compiler, and the IDEs copious red lines resultsd in all my errors being caught while I was writing, meaning that my pace of writing was slowed down. Overall the language wasn't too hard to pick up, having very Java-like syntax, though actually its Javaness led to my first bit of confusion...not knowing to use the new operator...but other than silly things like that, I had working code relatively quickly.

After I had working code, there were still issues. The code ran slowly, and it took me about 5 minutes to figure out how to stop execution of code in the IDE (pressing "run" again only made it run the program twice concurrently!) And in addition pressing run does not auto-save, which took a while to notice. The last big issue was an interesting one...I was accidentally allocating the "prev" array each time through the loop, which turns out to have been incredibly costly. After I moved to allocating it once instead of ten million times, my code's execution time went from about 5 minutes to do 10% of the search space to almost instant (I say almost instant because, again, IDE, so I don't have real timing data). Anyway, my X10 code is below:

(Small indents because I wrote it in the IDE which seems to have unfortunately used tabs instead of spaces).

public class e92 {

 public static def getBit(r:Rail[Long], x:Long) {
  val index    = x / 64;
  val subIndex = x % 64;
  return ((r(index) >> subIndex) & 1) == 1;
 }
 
 public static def setBit(r:Rail[Long], x:Long) {
  val index    = x / 64;
  val subIndex = x % 64;
  r(index) = r(index) | (1L << subIndex);
 }
 
 public static def nextTerm(x: Long) {
  var c:Long = x;
  var s:Long = 0;
  while (c > 0) {
   s += (c % 10) * (c % 10);
   c /= 10;
  }
  return s;
 }
    public static def main(Rail[String]) {
     // Pair of arrays for memoizing to store
     // if we have info, and if it goes to 89
        var isSet:Rail[Long] = new Rail[Long](10000000 / 64);
        isSet.fill(0);
        var is89:Rail[Long]  = new Rail[Long](10000000 / 64);
        is89.fill(0);
        setBit(isSet, 1);
        setBit(isSet, 89);
        setBit(is89, 89);
        var ans:Long = 1;
        var prev:Rail[Long] = new Rail[Long](100000);
        for (i in 1..9999999) {
         var pLength:Long = 0;
         var c:Long = i;
         while (! getBit(isSet, c)) {
          prev(pLength) = c;
          pLength      += 1;
          c = nextTerm(c);
         }
         var was89:Boolean = getBit(is89, c);
         for (j in 0..(pLength - 1)) {
          setBit(isSet, prev(j));
          if (was89) {
           setBit(is89, prev(j));
           ans += 1;
          }
         }
        }
        Console.OUT.println(ans);
    }
}

Sunday, July 20, 2014

Problem 91 - Java

After freeing up Java, I decided to use it. Problem 91 was not particularly hard (most of the recent problems have mostly been a matter o knowing how to enumerate the answer space well, and how to check for the answer easily arithmetically), so I actually did not intend to use a language as useful as Java on it. However, Java was available, and the first two languages I tried I ran into difficulties with (I may use them eventually, but Factor documentation was not very useful, and Dylan is having difficulties building on my machine).

Also, solving 91 in Java results in the nice phenomenon of using Java Byecode and Java to solve consecutive problems. Anyway, the main difficulty I experienced trying to solve this problem was me accidentally expressing the condition for perpendicularity incorrectly. I accidentally took the reciprocal twice, causing me to be checking if slopes were negatives, instead of negative reciprocals. other than that and other small issues, this was not the hardest of problems.

Runs in about .15s on my machine.

public class e91 {

    public static int min (int x, int y) {
        return x < y ? x : y;
    }

    public static void main(String[] args) {

        int ans = 0;
        final int B = 50 + 1;
        for (int P1 = 1; P1 < B*B; ++P1) {
            for (int P2 = P1 + 1; P2 < B*B; ++P2) {
                int x1 = P1 / B;
                int y1 = P1 % B;
                int x2 = P2 / B;
                int y2 = P2 % B;
                int oldans = ans; 
                if(min(x1, x2) == 0 && min(y1, y2) == 0) {
                    //Right Angle at origin
                    ++ans;
                } else if ((y1 == y2) && (x1 == 0 || x2 == 0)) {
                    //undef slope, check explicitly
                    ++ans;
                } else if ((x1 == x2) && (y1 == 0 || y2 == 0)) {
                    //same
                    ++ans;
                } else {
                    //Negative reciprocals => right angle
                    int num   = y1 * (y1 - y2);
                    int denom = x1 * (x1 - x2);
                    if (denom != 0) {
                        int ratio = num / denom;
                        int rem   = num % denom;
                        ans += (ratio == -1 && rem == 0) ? 1 : 0;
                    }
                    if (ans == oldans) {
                        num   = y2 * (y2 - y1);
                        denom = x2 * (x2 - x1);
                        if (denom != 0) {
                            int ratio = num / denom;
                            int rem   = num % denom;
                            ans += (ratio == -1 && rem == 0) ? 1 : 0;
                        }
                    }
                }
            }
        }
        System.out.println(ans);
    }
}

Saturday, July 19, 2014

Problem 79 - CIL, who needs high level languages?

After using Parrot Assembly and Java Bytecode, I figured I might as well continue my low level / "intermediate representation" language stint. CIL is the "Common Intermediate Language": it is essentially Java Bytecode for the .Net VM. So, programming in it was not too bad, maybe even a little bit nicer than Java Bytecode at times: one standard "call" method was nice instead of invokestatic, invokevirtual, etc. And most of the language is very similar to Java Bytecode - both are stack based, so they use very similar syntax. And the problem I chose to solve was pretty similar pro grammatically to problem 90 that I solved in Java Bytecode: both just involved lots of set operations, which are easy to express as ors, ands, etc (also, my code contains an xor, so clearly I am doing fancy stuff...). Also, CIL was nice as, though Java Bytecode exists only for Java, CIL is supposed to be able to support more than just C#, so there was no need in this case to create a class to wrap the main method in.

There was, however, one very important way in which programming in CIL was far less fun than Java Bytecode. I could not find as much good documentation, and the error messages are horrible. At least using mono (basically the open source .Net VM, while Microsoft also has one) there were essentially two errors one could get: "Unrecoverable Syntax Error" if an error happens at compile time, and "Invalid IL code" if the error happens at runtime. So, even though my code was basically correct very early (a couple of or's had to be changed to and's once I had things running...) Forgetting a single branch statement to go tho the top of a loop and a single pop instruction to clear the stack caused very hard to debug run time errors that took lots of "remove code until the error goes away" style debugging, which was not fun.

So basically, CIL is a fine stack-based assembly languages, as long as you don't cause any errors.
Also, it is worth noting that despite the title of this post...high level languages are actually quite nice. Variable names are really nice, even if I have gotten pretty good at remembering which register holds what variable.

Below code runs in 50ms on my machine.
.assembly extern mscorlib {}
.assembly e79 {}

.namespace E79 
{
    .method public static void main() cil managed
    {
        .entrypoint
        .maxstack 100
        .locals([0] int32[],
                [1] int32,
                [2] int32,
                [3] int32,
                [4] int32,
                [5] int32,
                [6] int32)
        ldc.i4 10
        newarr int32
        stloc.0
        ldc.i4.0
        stloc.3
        ldc.i4.0
        stloc.1
LOOP_INIT:
        ldloc.1
        ldloc.0
        ldlen
        bge OVERLOOP_INIT
        ldloc.0
        ldloc.1
        ldc.i4.0
        stelem.i4
        ldloc.1
        ldc.i4.1
        add
        stloc.1
        br LOOP_INIT
OVERLOOP_INIT:

LOOP_MAIN:
        call string [mscorlib]System.Console::ReadLine()
        dup
        brnull OVERLOOP_MAIN
        call int32 [mscorlib]System.Int32::Parse(string)
        stloc.2
        ldloc.2
        ldc.i4 100
        div
        stloc 4
        ldloc.2
        ldc.i4 10
        div
        ldc.i4 10
        rem
        stloc 5
        ldloc.2
        ldc.i4 10
        rem
        stloc 6
        ldloc.0
        ldloc 5
        ldloc.0
        ldloc 5
        ldelem.i4
        ldc.i4.1
        ldloc 4
        shl
        or
        stelem.i4
        ldloc.0
        ldloc 6
        ldloc.0
        ldloc 6
        ldelem.i4
        ldc.i4.1
        ldloc 5
        shl
        or
        ldc.i4.1
        ldloc 4
        shl
        or
        stelem.i4
        ldloc.3
        ldc.i4.1
        ldloc 4
        shl
        or
        ldc.i4.1
        ldloc 5
        shl
        or
        ldc.i4.1
        ldloc 6
        shl
        or
        stloc.3
        br LOOP_MAIN
OVERLOOP_MAIN:
        pop
        ldc.i4.0
        stloc 4
        ldc.i4.0
        stloc.1
LOOP_FINAL:
        ldloc.1
        ldc.i4 10
        bge OVERLOOP_FINAL
        ldloc.0
        ldloc.1
        ldelem.i4
        ldloc.3
        ldloc.1
        shr
        ldc.i4.1
        and
        ldc.i4.1
        xor
        or
        brzero IFPART
ELSEPART:
        ldloc.1
        ldc.i4.1
        add
        stloc.1
        br LOOP_FINAL
IFPART:
        ldloc 4
        ldc.i4 10 
        mul
        ldloc.1
        add
        stloc 4
        ldc.i4.1
        ldloc.1
        shl
        not
        ldloc.3
        and
        stloc.3
        ldc.i4.0
        stloc 5
LOOP_CLEAR:
        ldloc 5
        ldc.i4 10
        bge OVERLOOP_CLEAR
        ldloc.0
        ldloc 5
        ldloc.0
        ldloc 5
        ldelem.i4
        ldc.i4.1
        ldloc.1
        shl
        not
        and
        stelem.i4
        ldloc 5
        ldc.i4.1
        add
        stloc 5
        br LOOP_CLEAR
OVERLOOP_CLEAR:
        ldc.i4.0
        stloc.1
        br LOOP_FINAL
OVERLOOP_FINAL:
        ldloc.s 4
        call void [mscorlib]System.Console::WriteLine(int32)
        ret
    }
}

Monday, July 14, 2014

Problem 90 - Java Bytecode!

So, after solving problem 89 in Parrot Assembly, I claimed that I was interesting in trying out Java Bytecode. So, I did indeed use it. Turns out Java Bytecode is not exactly the world's most user friendly language to write in. Its not a standard assembly language, but is rather stack based, with some commands taking arguments directly, but most only directing with the stack, which makes things certainly different. Also, thank god I didn't have to invoke too many methods that weren't static in my own class...I wrote a helper function that is a wrapper for System.out.println, it was so frustrating to call it for debugging purposes!

Anyway, problem 90 was a rather simple problem, being capable of being expressed in terms of bitwise operations. Just lots of ors and ands and shifts, treating integers as sets, and iterating over all pairs of possible sets of {0...9}. This is only 1024 * 1024 =~ 1 million sets, and as the operations per pair of sets is exceedingly simple (and most can be thrown out very quickly for not having enough elements to represent a die), this was an entirely reasonable amount of work. Almost all the difficulty with this problem came from the problem, and not the language (well, the only problem related issue was me forgetting that, despite 6 and 9 being identified, a die could still have both, but I only forgot that exceedingly briefly, thanks to the example on the project euler page).

Anyway, below is my code...I would apologize for the lack of comments, as (I know) it makes it hard to read, but at least the ByteCode assembler I used didn't seem to have any syntax for allowing comments...so though unfortunate, it is what it is. Code runs in about 150ms, though it is notable that much of that is probably just starting up the JVM. Maybe my next language will be wonderfully low level again, or maybe I will go up to more sophisticated languages soon. Anyway, here is the beautiful code:
.class public e90
.super java/lang/Object

.method public static printInt(I)V
    .limit stack  100
    .limit locals 100
    getstatic     java/lang/System out Ljava/io/PrintStream;
    iload_0
    invokevirtual java/io/PrintStream println (I)V
    return
.end method

.method public static sixBitsSet(I)I
    .limit stack  100
    .limit locals 100
    iconst_1
    istore_1
    iconst_0
    istore_2
BITSETLOOP:
    iload_1
    iload_0
    if_icmpgt   BITSETLOOP_END
    iload_1
    iload_0
    iand
    ifeq        NOADD
    iinc        2 1
NOADD:
    iload_1
    iconst_1
    ishl
    istore_1
    goto        BITSETLOOP
BITSETLOOP_END:
    bipush      6
    iload_2
    if_icmpeq   RETURN_TRUE
    iconst_0
    ireturn
RETURN_TRUE:
    iconst_1
    ireturn
.end method
    
.method public static setContains(II)I
    .limit stack  100
    .limit locals 100
    iconst_1
    iload_1
    ishl
    iload_0
    iand
    ifgt        RETURN_SUCCESS
    iconst_0
    ireturn
RETURN_SUCCESS:
    iconst_1
    ireturn
.end method
          
.method public static check(III)I
    .limit stack  100
    .limit locals 100
    iload_0
    invokestatic e90/sixBitsSet(I)I
    iload_1
    invokestatic e90/sixBitsSet(I)I
    iadd
    iconst_2
    if_icmpne   RETURN_FAILURE
    iload_0
    sipush      512
    iand
    ifeq        SKIP_ADDSIX1
    iload_0
    sipush      64
    ior
    istore_0
SKIP_ADDSIX1:
    iload_1
    sipush      512
    iand
    ifeq        SKIP_ADDSIX2
    iload_1
    sipush      64
    ior
    istore_1
SKIP_ADDSIX2:
    iload_2
    bipush      10
    irem
    istore_3
    iload_3
    bipush      9
    if_icmpne   SKIP_SIXIFY
    bipush      6
    istore_3
SKIP_SIXIFY:
    iload_0
    iload_3
    invokestatic e90/setContains(II)I
    istore      4
    iload_1
    iload_3
    invokestatic e90/setContains(II)I
    istore      5
    iload_2
    bipush      10
    idiv
    istore_3
    iload_3
    bipush      9
    if_icmpne   SKIP_SIXIFY2
    bipush      6
    istore      3
SKIP_SIXIFY2:
    iload_0
    iload_3
    invokestatic e90/setContains(II)I
    istore      6
    iload_1
    iload_3
    invokestatic e90/setContains(II)I
    istore      7
    iload       4
    iload       7
    iand
    iload       5
    iload       6
    iand
    ior
    ireturn
RETURN_FAILURE:
    iconst_0
    ireturn        
.end method   
              
.method public static main : ([Ljava/lang/String;)V
    .limit stack  100
    .limit locals 100
    
    iconst_0
    istore      6
    bipush      9
    newarray    int
    astore      4        
    iconst_0    
    istore_0 
INITLOOP:
    iload_0
    bipush      9
    if_icmpge    INITLOOP_END
    aload       4
    iload_0
    iinc        0 1
    iload_0
    iload_0
    imul
    iastore
    goto        INITLOOP
INITLOOP_END:
    iconst_0
    istore_0
OUTERLOOP:      
    iload_0     
    sipush      1024
    if_icmpge   OUTERLOOP_END
    iload_0     
    istore_1    
INNERLOOP:      
    iload_1     
    sipush      1024
    if_icmpge   INNERLOOP_END
    iconst_0
    istore      5
INNERMOSTLOOP:
    iload       5
    bipush      9
    if_icmpge   INNERMOSTLOOP_END_SUCCESS
    iload_0
    iload_1
    aload       4
    iload       5
    iaload
    invokestatic e90/check(III)I
    ifeq        INNERMOSTLOOP_END_FAILURE
    iinc        5 1
    goto        INNERMOSTLOOP
INNERMOSTLOOP_END_SUCCESS:
    iinc        6 1
INNERMOSTLOOP_END_FAILURE:
    iinc        1 1
    goto        INNERLOOP
INNERLOOP_END:    
    iinc        0 1
    goto        OUTERLOOP
OUTERLOOP_END:
    iload       6
    invokestatic e90/printInt(I)V  
    return       
.end method       

Saturday, July 12, 2014

Progress

Below is the table of my progress on this Language Challenge. The numbers below link to my code for the solution, statements of the problems can be found on projecteuler.net. A * indicates a language that was used at some point but has yet to be reused to solve anything, and P&P denotes the problems that have been solved with pencil and paper, no coding necessary.
BrainFuck 1*
Ook! 6*
Whenever 5,2
Cat 3
FALSE 4
GLSL 7
ArnoldC 9
LOLCODE 10
Io 11
Shakespeare 12
Smalltalk 13
Clojure 14
BASIC 16
INTERCAL 17
WIND 18
F# 19
E 20
COBOL 21
SwiftScript 22
Ceylon 23
Erlang 24
Kotlin 26
Befunge 27
Boo 29
K 8,30
ALGOL68 31
Go 12,32
Bash 33
Batch 34
ChucK 35
Whitespace 36
Haskell 2,37
Lua 38
Gosu 39
Rust 40
Ada 41
sML 10,42
Coffeescript 43
Scala 18,44
Rexx 45
Julia 46
x86-64 47
ELM 17,48
OCaml 49
Postscript 7,50
Cobra 51
APL 52
EEL 53
Chapel 54
Elixir 55
Ruby 4,40,56
Racket 7,26,57
WARM 58
C# 26,59
Javascript 9,31,60
Pascal 61
cLisp 13,62
Rebol 63
Tcl 64
Dart 63,65
Python 16,24,39,65,66
Prolog 18,67
Fantom 68
Perl 19,70
Processing 71
J 3,24,35,69,72
Groovy 73
Genie 74
Vala 75
Forth 9,76
Hack 77
R 45,78
CIL 79
Frink 16,80
Dogescript 81
Fortran95 3,15,82
Zimbu 83
Red 84
Idris 85
Squirrel 21,52,55,86
D 22,87
C 48,50,88
PASM 89
JavaBC 90
Java 20,79,91
X10 92
PHP 11,93
C++ 14,53,94
P&P 1,5,6,8
P&P 15,25,28,69

Problem 89 - PASM

After solving problem 88, problem 89 is not a particularly difficult problem, it involves doing stuff with Roman Numerals. As I expected the problem to not be too difficult, I looked for a new language that would not be super easy to use to solve this problem. So, I stumbled across and decided to use PASM, the Parrot ASseMbly language for the Parrot Virtual Machine (http://www.parrot.org/). Now, though it is an assembly language, it is an assembly language for a virtual machine, so it a CISC language, and it notably has a number of language one does not find in most assembly languages, most notably a number of string manipulation functions (used below are length, substr, ord. Also readline and a simple print command are nice to have). So, the string manipulation niceties were nice, especially as problem 89 involves reading in and processing a large text file. The main difficulties I had working in PASM was 1) figuring out how to declare constants...even though this ended up being unnecessary, .const and .constant are both documented as existing but don't, so this took a while. And 2) Attempting to use an array. One disadvantage of PASM running on a VM is that, unlike a normal assembly language, memory management is hard, and I could not figure out how to allocate/use an array (I could use a stack, but random access in a stack is...not fun). This difficulty cost a lot of time while writing the program, and also resulted in the GET_CHAR_COUNT method not being too nice.

I think a number of the issues described above were due to my choice to use PASM, while Parrot apparently intends for most users to use (and of course by "use" I mean have their compilers output) PIR (Parrot Intermediate Representation) code which is then converted to PASM. And because this is the intent, good PASM documentation was harder to find than PIR. Luckily, there was a list of opcodes, and a whopping 4 examples that came with Parrot, so those two things combined are what I used to get through most of this.

The last issue was stuff with the input file having Windows newlines while I only tested on Unix newlines to begin with. But oh well, that was fixed relatively easily.

My use of PASM has inspired me to look into soon using a similar language...Java ByteCode (and, if it is sufficiently different, PIR).

Anyway, even if it runs on a VM, it is still basically an assembly language, so my solution below runs very quickly: about 25ms on my machine.
.loadlib 'io_ops'

#ASCII values of important letters
.macro_const M 77
.macro_const D 68
.macro_const C 67
.macro_const L 76
.macro_const X 88
.macro_const V 86
.macro_const I 73

#Impromptu convention:
#arguments passed in I/S 1, returned in I/S 0
.pcc_sub :main main:
    #'The Stack'
    new         P10,'ResizableIntegerArray'
    getstdin    P0
    set         I9, 0
MAINLOOP:
    readline    S1, P0
    unless      S1, FINAL
    #get length of string
    length      I8, S1
    #subtract 2 from length because windows newline
    sub         I8, I8, 2
    inc         I8
    #cutoff one char of newline, as methods expect
    #UNIX newlines
    substr      S1, S1, 0, I8
    dec         I8
    #Now that that is done...
    local_branch P10, GET_STRING_VALUE
    local_branch P10, GET_CHAR_COUNT
    sub         I0, I8, I0
    add         I9, I9, I0 
    branch      MAINLOOP
FINAL:
    say         I9
    end

#Basically a switch statement
#input: I0
#output: I0
GET_CHAR_VALUE:
    eq          I0, .M, M
    eq          I0, .D, D
    eq          I0, .C, C
    eq          I0, .L, L
    eq          I0, .X, X
    eq          I0, .V, V
I:
    set         I0, 1
    local_return P10
V:  
    set         I0, 5
    local_return P10
X:  
    set         I0, 10
    local_return P10
L:  
    set         I0, 50
     local_return P10
C:  
    set         I0, 100
    local_return P10
D:  
    set         I0, 500
    local_return P10
M:  
    set         I0, 1000
    local_return P10

#compute integer corresponding to Roman Numeral
#input: S1
#output: I0
#used: I2, I3, I4, I5, S2
GET_STRING_VALUE:
    length      I2, S1
    dec         I2
    set         I3, 0
    set         I4, 0
    set         I5, 0
LOOP:
    ge          I3, I2, ENDLOOP
    substr      S2, S1, I3, 1 
    ord         I0, S2
    local_branch P10, GET_CHAR_VALUE
    set         I4, I0
    inc         I3
    substr      S2, S1, I3, 1
    ord         I0, S2
    local_branch P10, GET_CHAR_VALUE
    lt          I4, I0, _SUB
    add         I5, I5, I4
    branch      LOOP
_SUB:
    sub         I5, I5, I4
    branch      LOOP    
ENDLOOP:
    set         I0, I5
    local_return P10

#This could be written much more nicely
#If I could figure out how to do arrays
#input: I0
#used: I1, I6, I7
GET_CHAR_COUNT:
    #Move input into I1
    #And compute number of M's
    set         I1, I0
    div         I6, I1, 1000
    mul         I0, I6, 1000
    sub         I1, I1, I0
    set         I7, I6
    #CM
    div         I6, I1, 900
    mul         I0, I6, 900
    sub         I1, I1, I0
    mul         I6, I6, 2
    add         I7, I7, I6
    #D
    div         I6, I1, 500
    mul         I0, I6, 500
    sub         I1, I1, I0
    add         I7, I7, I6
    #CD
    div         I6, I1, 400
    mul         I0, I6, 400
    sub         I1, I1, I0
    mul         I6, I6, 2
    add         I7, I7, I6
    #C
    div         I6, I1, 100
    mul         I0, I6, 100
    sub         I1, I1, I0
    add         I7, I7, I6
    #XC
    div         I6, I1, 90
    mul         I0, I6, 90
    sub         I1, I1, I0
    mul         I6, I6, 2
    add         I7, I7, I6
    #L
    div         I6, I1, 50
    mul         I0, I6, 50
    sub         I1, I1, I0
    add         I7, I7, I6
    #XL
    div         I6, I1, 40
    mul         I0, I6, 40
    sub         I1, I1, I0
    mul         I6, I6, 2
    add         I7, I7, I6
    #X
    div         I6, I1, 10
    mul         I0, I6, 10
    sub         I1, I1, I0
    add         I7, I7, I6
    #IX
    div         I6, I1, 9
    mul         I0, I6, 9
    sub         I1, I1, I0
    mul         I6, I6, 2
    add         I7, I7, I6
    #V
    div         I6, I1, 5
    mul         I0, I6, 5
    sub         I1, I1, I0
    add         I7, I7, I6
    #IV
    div         I6, I1, 4
    mul         I0, I6, 4
    sub         I1, I1, I0
    mul         I6, I6, 2
    add         I7, I7, I6
    #I
    add         I0, I7, I1
    local_return P10 

Friday, July 11, 2014

Problem 88 - A struggle completed in C

Problem 88 was quite the problem. I am absolutely certain that the answer I have now is not the best solution, but it is definitely a solution. This problem investigates "Product Sum numbers" - numbers that are both the sum and product of collections of numbers. I got some insight into some of the math going on by reading a paper I found online on the subject: http://www-users.mat.umk.pl/~anow/ps-dvi/si-krl-a.pdf. From what I found in that paper, I wrote up a solution in Python, which gave the correct answer...in 30 minutes. In the search for a faster solution, I realized that I was actually computing all I needed for an answer in the first minute or so of the problem and then used a lot of unnecessary computation to find the answer.

The key to making the problem tractable was to notice that though the problem asks for the minimal Product-Sum numbers for a given k (size of the set for which it is a PS number), the easier domain was to look at numbers and find for what k values they are product sum numbers. When I made this switch, I when from minutes to seconds in the speed of my code.

However, the script that I ran to initially compute the answer was written in python, for ease, but I knew I had to convert to another language eventually for the language challenge. I ended up going with C, and while C works, I ran into...a bit of an issue. In Python I was just using lists and wasn't aware of just how much memory was flying around, but when I wrote a solution in C I found out that the solution actually involves allocating many million-element arrays...so after pushing the size of my arrays and making sure to allocate and free memory well to avoid seg faults, I eventually got to a working solution, which is below. Runs in about 16 seconds in my machine. Much of that time is probably spent in malloc.

#include<stdio.h>
#include<stdlib.h>

int BOUND = 12000;

/** Gets the sums of all factoriations of n, minus the number of terms in the
 *  factorization. This is most of the k-value for which n is a Product-Sum number
 *  k-value = n - factorizationSum*/

/** Admittably not the most efficient of code...this is what I get for
 *  trying to transfer code originally written in python
 */
void getAllFactorizationSums(int n, int includeself,
        int maxlength, int* output, int* outputLength) {

    /**Don't look at exceptionally long factorizations*/
    if (maxlength == 0) {
        output[0]     = -1;
        *outputLength =  1;
        return;
    }
    int*  returnValue = (int*) malloc(sizeof(int) * 15 * BOUND);
    int  rvLength = 0;
    int f;
    for (f = 2; f*f <= n; ++f) {
        if (n % f == 0) {
            getAllFactorizationSums(n / f, 1, maxlength - 1, output, outputLength);
            int j;
            for (j = 0; j < *outputLength; ++j) {
                int fSum = output[j];
                if (fSum != -1) {
                    returnValue[rvLength++] = fSum + f - 1;
                }
            }
        }
    }
    if (includeself) {
        returnValue[rvLength++] = n - 1;
    }
    //copy back to output
    int i;
    for (i = 0; i < rvLength; ++i) {
        output[i] = returnValue[i];
    }
    *outputLength = rvLength;
    free(returnValue);
} 

int min(int x, int y) { return x < y ? x : y; }

int main(int argc, char ** argv) {
    int* productSumsForK = (int*) malloc(sizeof(int) * (BOUND + 1));
    productSumsForK[0] = 0; productSumsForK[1] = 0;
    int i;
    /**Initialize with upper bounds*/
    for (i = 2; i <= BOUND; ++i) {
        productSumsForK[i] = 2 * i;
    }
    /**Allocate lots of space...most of it actually gets used!*/
    int* factorizationSums = (int*) malloc(sizeof(int) * 15 * BOUND);
    int fsLength = 0;
    int N;
    for (N = 4; N <= 2*BOUND; ++N) {
        getAllFactorizationSums(N, 0, 15, factorizationSums, &fsLength);
        for (i = 0; i < fsLength; ++i) {
            int kIndex = N - factorizationSums[i];
            if (kIndex <= BOUND) {
                productSumsForK[kIndex] = min(productSumsForK[kIndex], N);
            }
        }
    }

    //Now, we need to convert to a set to print the answer
    char set[BOUND/4];
    for (i = 0; i < BOUND/4; ++i) { set[i] = 0; }
    for (i = 0; i <= BOUND; ++i) {
        int psNum = productSumsForK[i];
        set[psNum / 8] |= (1 << (psNum % 8));
    }
    int sum = 0;
    for (i = 0; i < 2*BOUND; ++i) {
        if ((set[i / 8] >> (i % 8)) & 1) {
            sum += i;
        }
    }
    printf("%d\n", sum);
    return 0;
}

Thursday, July 10, 2014

Problem 50 - Postscript

I have been working out a solution to the next problem, problem 88, and I figured I wanted a pretty capable language for my solution to that one, so I decided to free up such a language. Now, the language i had free to do this freeing up with was Postscript. Prior to this project, my main experiences with Postscript had been writing an interpreter for a subset of the language, and using the language way back when in my initial 24 hour run to solve problem 7. Now, however, I used Postscript to free up C, which means that I used it to solve problem 50. Once I got into the swing of things, the stack-basedness wasn't too bad, the main issues that arose were due to my inability to parse error messages, causing bugs to be hard to remove (and, as would be expected when using an unfamiliar programming paradigm, bugs abounded). This problem gave me a chance to write an implementation of my smarter sieve method from my analysis before...though in my attempt to also generate an independent isPrime method, I messed up at first (an independent isPrime method cannot assume the input number is not divisible by 2 or 3, it actually needs to check that!). But anyway, other than lots of little mistakes like that, things weren't too bad. Postscript isn't the most blazingly fast language though, so even with a number of optimizations that were not present in my C solution to problem 50, the solution below runs in about 50 seconds on my machine, whereas my C solution ran in about 0.5s. But either way, it runs in under a minute, so it is acceptable. Now, to solve problem 88 in C.
%Implementation of "Smartest Sieve" from Sieve Timing :)
%Assumes symbol /primes is defined
/isPrime {
    /x exch def
    /ip true def
    /i 0 def
    /for2 {
        /p primes i get def
        x p mod 0 eq {
            /ip false def
        } {
            p p mul x le {
                /i i 1 add def
                for2
            } if
        } ifelse
    } def
    for2   
    ip     
} def

/primeSieve {
    /N exch def
    /primes N array def
    primes 0 2 put
    primes 1 3 put
    primes 2 5 put
    /primeLength 3 def
    /n 7 def
    /for1 {
        n isPrime {
            primes primeLength n put
            /primeLength primeLength 1 add def
        } if
        /n 2 n mul 3 mod 2 mul n add def
        n N lt {
            for1
        } if
    } def
    for1
} def

%Assumes symbol /primes is defined
/partialSum {
    /sum 0 def
    /high exch def
    /low exch def
    /k low def
    /for3 {
        /sum sum primes k get add def
        /k k 1 add def
        k high lt {
            for3
        } if
    } def
    for3
    sum
} def

/BOUND 1000000 def
%Method defines symbols /primes and /primeLength
%We do not use any return value
BOUND primeSieve
/ans 0 def
/maxChainLength 1 def
/lindex 0 def
/for4 {
    lindex primeLength lt {
        /hindex lindex maxChainLength add def
    hindex primeLength lt {
        /psum lindex hindex partialSum def
        /for5 {
            hindex primeLength lt psum BOUND lt and {
                psum isPrime {
                    /ans psum ans max def
                    /maxChainLength hindex lindex sub def
                } if
                /psum psum primes hindex get add def
                /hindex hindex 1 add def
                for5
            } if
        } def
        for5
        /lindex lindex 1 add def
        for4
    } if
    } if
} def
for4

ans pstack pop

Sunday, July 6, 2014

Sieve Timing Continued - A simpler method

After writing up all my sieve timing stuff yesterday, I was thinking of things I could do to add more information to my set of sieve times, and in an attempt to implement some additional optimizations, I realized a method that I probably should have realized a long time ago: instead of computing sub-list of primes based on square roots and such, why not just generate a list of primes in place? Somehow this never came to mind before, but this time it did, and lo and behold, it is about as fast or maybe faster than the recursive method (In the data I have it is much faster at 10^8, but I only ran one trial, so that information is sketchy). Most importantly though, the code is much simpler. The code is below:

/** For some reason this didn't occur to me for a really long time */
void smartestSieve(int bound, int* primes, int* primeslength) {
    primes[0] = 2; primes[1] = 3; primes[2] = 5; int mLength = 3;
    int n;
    for (n = 7; n < bound; n += 2 * ((2*n) % 3)) {
        int isprime = 1;
        int i;
        for (i = 2; i < mLength; ++i) {
            int p = primes[i];
            if (p*p > n) {
                break;
            } else if (n % p == 0) {
                isprime = 0;
                break;
            }
        }
        if (isprime) {
            primes[mLength++] = n;
        }
    }
    *primeslength = mLength;
}

Timing information is below, first, I slotted it into the original graph, removing the two slowest methods:

 And, to emphasize the "part that matters", its comparison with the recursive method:

The information used to generate these graphs is now on the second page of the sheet linked to in the previous post, https://docs.google.com/spreadsheets/d/18-avzFTZDDOPMh7p_FRPOVc9SVfEhxPEoVFGQPTyNWk/edit?usp=sharing

I am now considering looking at the advantages of hand tooled assembly and/or comparing the same methods in Python.

Saturday, July 5, 2014

(Something Different) - Sieve Timing

So, most of my posts on this blog are direct solutions to Project Euler problems. However, this time I am writing on something I did which is kind of meta-Project Euler related. A lot of Project Euler problems involve prime numbers, and therefore Prime Sieves often have to be written. For anyone who may not know, a prime sieve is a program that finds primes basically by checking numbers for their divisibility by other numbers: if a number is not divisible by any of the numbers below it (other than 1), then it is prime. However, there are plenty of optimizations to this process that can be done.

For a recent problem (87), I decided to write up a somewhat complicated prime sieve that calls itself recursively to ensure that you only ever "sieve" with prime numbers. That is, to check that a number is prime, instead of checking for divisibility by all numbers less than it, or even all numbers less than the square root, or even all odds less than the square root, you only do divisibility tests with primes. I am pretty happy with my sieve,  which I wrote up first in python quite a while ago. However, after spending the time to rewrite it in D, I wondered whether it was really worth it. Does the improved sieve actually save a significant amount of time over a simpler sieve?

I decided to investigate this question thoroughly. I wrote up a series of sieve methods in C (so that I can feel confident about what code is actually getting executed, unlike in a language like python). The methods are as follows:

  1. isPrimeDumbest: checks primality of a number n by dividing by 2..n-1
  2. isPrimeSqrt: checks primality of a number n by dividing by 2..sqrt(n)
  3. isPrimeSqrt2: checks primality of a number n by dividing by 2 and also all odd numbers 3..sqrt(n).
  4. isPrimeSqrt23: checks primality of a number n by dividing by 2 and 3 and then also all numbers in 5..sqrt(n) not divisible by 2 or 3.
  5. smartSieve: checks primality of a number n by dividing by all primes in 2..sqrt(n)
Obviously checking only up to the square root will be better than checking all numbers, and obviously checking only odd numbers will give an approximately 2x speed up. The main things I was considering was how much also predividing by 3 helps (it lets you check only 2/7 of numbers, but requires a wee bit extra computation), and whether the overhead of recursion and memory accesses allowed the smart sieve method to be as smart as I wanted it to be.

And in the end, yes, it is as smart as I wanted it to be: to compute all primes below 10^8, in the tests I ran, smartSieve is about 5x faster than isPrimeSqrt, and also more than twice as fast as isPrimeSqrt23. So, in conclusion, smartSieve is worth using, at least if the number of primes concerned is at least around 10^7 (and notably, problem 87, which brought up all of this, required finding all primes less than 5*10^7). Also the tests drove home just how bad it is to not at least shorten things to the square root and truly brute force things...the dumbest method takes about 5 minute at 10^6, while the next slowest method still takes under half a second to compute all primes.

Below are graphs of my data: The horizontal axis is the N, the number below which all primes were computed. It is on a log scale, so I ran tests on all powers of 10 from 10^1 to 10^8 (100 million). The vertical axis is time, measured in seconds. For smaller N, as the code would run too quickly to get any meaningful timing results I ran each sieve 1000000 times for N = 10 to 10^3, then for 10^4 to 10^7, each sieve was run 50 times, except for the dumbest sieve, which I ran only once at 10^5 because its just super slow, it doesn't really matter how much, and for all of these the result I show is the computed average running time of multiple trials. Because 10^8 takes so long and the data is spread out enough, I only ran these tests once instead of computing an average.

Now that I have fully described what the data is, here is the data, which shows exactly what one would expect, except for perhaps the magnitudes. The first graph shows all methods (except no data for the dumbest method above 10^6 because that would take too long to compute), and the second graph shows all but the dumbest method run on small N, so that differences can actually be seen (though the differences are oh-so small on an absolute scale).

In case you want to know the exact actual results: all the data is in a google spreadsheet here: https://docs.google.com/spreadsheets/d/18-avzFTZDDOPMh7p_FRPOVc9SVfEhxPEoVFGQPTyNWk/edit?usp=sharing


Lastly, here is the code I ran to generate the information. In order to get code running as quickly as possible, and because it doesn't hurt, the below was compiled with gcc -O3 (highest amount of optimization). I haven't included the testing code, but its a fairly no-frills python script.

Also, it is notable that I am sure there are other prime sieve methods that may be faster out there in this world, and maybe I should eventually check out others, but this analysis is sufficient for my own purposes for now at least.

#include<stdio.h>
#include<stdlib.h>
#include<math.h>

/** Very brute force sieve
 *  Always checks 2..n-1
 */
int isPrimeDumbest(int n) {
    int i;
    for (i = 2; i < n - 1; ++i) {
        if (n % i == 0) {
            return 0;
        }
    }
    return 1;
}

/** Mostly brute force, but checks
 *  2..n-1
 */
int isPrimeSqrt(int n) {
    int i;
    for (i = 2; i*i <= n; ++i) {
        if (n % i == 0) {
            return 0;
        }
    }
    return 1;
}

/** Checks for 2 explicitly, then does a += 2 */
int isPrimeSqrt2(int n) {
    if (n % 2 == 0) { return n == 2; }
    int i;
    for (i = 3; i*i <= n; i+=2) {
        if (n % i == 0) {
            return 0;
        }
    }
    return 1;
}

/** Sieves out both 2 and 3 */
int isPrimeSqrt23(int n) {
    if (n % 2 == 0) { return n == 2; }
    if (n % 3 == 0) { return n == 3; }
    int i;
    for (i = 3; i*i <= n; i += (i % 3 == 1 ? 4 : 2)) {
        if (n % i == 0) {
            return 0;
        }
    }
    return 1;
}

/** Function that takes one of the isPrime Methods above
 *  and sieves:
 *  For consistency, all sieves know 2 is prime
 *  and then check all odds
 */
void mostSieves(int bound, int* primes, int* primeslength, int (*sieveFunc)(int)) {
    primes[0] = 2;
    int c = 1;
    int i;
    for (i = 3; i < bound; i+=2) {
        if (sieveFunc(i)) {
            primes[c++] = i;
        }
    }
    *primeslength = c;
}
/** Recursively determines prime lists
 *  So that mods are only taken for
 *  primes <= sqrt */
void smartSieve(int bound, int* primes, int* primeslength) {
    if (bound < 11) {
        primes[0] = 2; primes[1] = 3; primes[2] = 5; primes[3] = 7;
        *primeslength = 4;
        return;
    }
    int rbound = (int)sqrt(bound * 1.0f);
    smartSieve(rbound, primes, primeslength);
    int partialLength = *primeslength;
    int mLength       = partialLength;
    int current = rbound;
    if (current % 2 == 0) { ++current; }
    while (current < bound) {
        int isprime = 1;
        int i ;
        for (i = 0; i < partialLength; ++i) {
            int p = primes[i];
            if (p*p > current) {
                break;
            } else if (current % p == 0) {
                isprime = 0;
                break;
            }
        }
        if (isprime) {
            primes[mLength++] = current;
        }
        current += 2;
    }
    *primeslength = mLength;
    return; 
}

int main(int argc, char** argv) {
    if (argc < 3) { 
        printf("%s\n", "Not enough arguments");
        return 1;
    }
    int fchoice  = atoi(argv[1]);
    int bound    = atoi(argv[2]);
    int numtimes = atoi(argv[3]);
    /** Allocate an over estimate in the beginning
     *  So that resizing does not effect times
     */
    int* primes = (int*)malloc(bound * sizeof(int));
    // Why not use this place in memory for this...
    int primeslength = 0;
    int i;
    for (i = 0; i < numtimes; ++i) {
    switch(fchoice) {
        case 0:
            mostSieves(bound, primes, &primeslength, &isPrimeDumbest);
            break;
        case 1:
            mostSieves(bound, primes, &primeslength, &isPrimeSqrt);
            break;
        case 2:
            mostSieves(bound, primes, &primeslength, &isPrimeSqrt2);
            break;
        case 3:
            mostSieves(bound, primes, &primeslength, &isPrimeSqrt23);
            break;
        case 4:
            smartSieve(bound, primes, &primeslength);
            break;
        default:
            printf("Not a valid choice\n");
            return 1;
    }
    }
    return 0; 
}

Thursday, July 3, 2014

Problem 7 - GLSL

By this point, I have worked in a lot of programming languages. And, for obvious reasons, just about any programming language that I have used in my life for something, I have used in this blog to solve a Project Euler problem. However, this has not been the case for quite all languages. As I do work in the graphics lab at Williams College, naturally I have worked with GLSL, a shader language. Now, as GLSL is intended for shaders and running on the GPU, there are some issues using it for a Project Euler problem.

Perhaps the largest and most obvious issue is how to output an answer in GLSL. Shaders produce images, so clearly in order to output the answer from GLSL, I had to create an image. There are these things called 7-segment displays that display numbers pretty well, without much complicated logic, so I wrote up some GLSL to make a 7-segment display, or, to be more specific, some webGL. webGL is less featured than GLSL, including some annoying features like how in wbGL for loops must have constant bounds. The 7-segment display can be seen on shadertoy: https://www.shadertoy.com/view/4ssSz2.

Further frustrations with webGL include lack of much support for integer arithmetic (almost all of graphics is done with floating point numbers). See the hand-written modulus function and hand power-of-10 calculations for more info on this particular issue...

Then, when I had my code almost all running, I ran an issue with webGL. Remember how I said that for loops must have constant bounds? Well, the reason why is because the loops must get unfolded in the compiler. Now, unfolding a loop 150000 times...that is not the best, indeed, attempting to compile my shader crashed webGL on the machine I was using. So, because that didn't work, I may yet return to webGL to try to solve a problem, keeping in mind all the additional issues.

GLSL has smart loop bounds and doesn't need to do crazy stuff like unfold loops, so my solution below ran and gave the answer. Now, another interesting thing about using GLSL. This code is a pixel shader. What that means is that it is run once per pixel, using information about where that pixel is on screen. So though GPUs are known for being very good at parallelizing things, this problem used parallelism in the dumbest way possible, which is having every pixel compute the answer by itself, and then check if the pixel should be green or black in order to display it in a 7-segment display. Even though about 1000x more work was done than necessary, this runs in a reasonable time (maybe a couple seconds? I didn't set up anything to time it) on my machine. And here is the output, in all of its glory:

#define NUMBER_TO_PRINT esol
#define DISTANCE_BETWEEN_DIGITS 0.05
#define LINE_WIDTH 0.05
#define ON_COLOR  vec4(0.0, 1.0, 0.0, 1.0);
#define OFF_COLOR vec4(0.0, 0.0, 0.0, 1.0);

float digits(float x) {
    if (x < 2.0) { return 1.0; }
    return ceil(log(x) / log(10.0));
}

int imod(int x, int y) {
    return x - y*(x/y);
}

int euler7() {

    int numprimes = 1;
    int ans = 0;
    for (int i = 3; i < 150000; i += 2) {
        bool isprime = (imod(i, 2) != 0);
        for (int j = 3; j*j <= i; j += 2) {
            isprime = isprime && (imod(i, j) > 0);
        }
        numprimes  = numprimes + int(isprime);
        ans       += (int(numprimes == 10001) * i);
        numprimes += int(numprimes == 10001);
    }
    return ans;
}

uniform vec3 iResolution;

void main(void) {

    int esol = euler7();
    vec2 uv = gl_FragCoord.xy / iResolution.xy;
    //Below line switches between glsl and webgl coordinaes
    //uv.y = 1.0 - uv.y;

    float N = digits(float(NUMBER_TO_PRINT));
    float width = (1.0 - (N-1.0)*DISTANCE_BETWEEN_DIGITS) / N;
    float digit = floor(uv.x / (width + DISTANCE_BETWEEN_DIGITS));

    //coordinates within box
    float digitStartx = (width+DISTANCE_BETWEEN_DIGITS)*digit;
    float digitEndx   = digitStartx + width;
    vec2 sub_uv = vec2((uv.x - digitStartx) * 1.0/(digitEndx - digitStartx), (uv.y - .2) * (1.0/.6));
    //Computing a power of 10 was made harder because webgl
    //Could do something less silly in actual glsl
    int p = 1;
    for (int i = 0; i < 10; ++i) {
        p *= (1 + int(i < int(N) - int(digit)) * 9);
    }
    int currentDigit = imod(NUMBER_TO_PRINT, p) / (p/10);

    //Determining the position
    bool tiptop   = sub_uv.y < LINE_WIDTH && sub_uv.y > 0.0;
    bool top      = sub_uv.y < 0.5 - LINE_WIDTH/2.0&& sub_uv.y > 0.0;
    bool left     = sub_uv.x > 0.0                 && sub_uv.x < LINE_WIDTH;
    bool middle_x = sub_uv.x > LINE_WIDTH          && sub_uv.x < 1.0 - LINE_WIDTH;
    bool middle_y = sub_uv.y > 0.5 - LINE_WIDTH/2.0&& sub_uv.y < 0.5 + LINE_WIDTH/2.0;
    bool right    = sub_uv.x > (1.0 - LINE_WIDTH)  && sub_uv.x < 1.0;
    bool bottom   = sub_uv.y > 0.5 + LINE_WIDTH/2.0&& sub_uv.y < 1.0;
    bool botbot   = sub_uv.y > 1.0 - LINE_WIDTH    && sub_uv.y < 1.0;
    

    //shorthand

    int c = currentDigit;
    bool bit =(top && left  && (c != 1 && c != 2 && c != 3 && c != 7)) ||
              (top && right && (c != 5 && c != 6))                     ||
              (tiptop && middle_x && (c != 1 && c != 4))               ||
              (middle_x && middle_y && (c > 1 && c != 7))              ||
              (bottom && left && (c/2*2 == c) && c != 4)               ||
              (bottom && right && (c != 2))                            ||
              (botbot && middle_x && (c != 1 && c != 4 && c != 7));
    gl_FragColor = float(bit && true) * ON_COLOR + float(!(bit && true)) * OFF_COLOR;
}