AlZimmermannsProgrammingContests (Yahoo Groups)

-----
emertz3 19.03.2013 #5433

Finished the complete search for 12 steps (15!, 16!, 17!). Ran 55 hours and 19 trillion nodes, using 12 threads on a 3930K.

-----
Al Zimmermann 20.03.2013 #5435

You have the number of distinct integers reachable in 12 or fewer steps? You should enter those numbers on the OEIS.

Al

-----
Dima 22.04.2013 #5511

I have updated https://oeis.org/A217032 with these results. Please check it once it has been approved by the OEIS moderators.

-----
(Tomas G. Rokicki) 22.04.2013 #5513

Shall I add a(18) and a(19), since the full search through depth 12 (which I also did in full generality) excludes a solution of length 12, yet we know solutions of length 13?

-tom

-----
(Tomas G. Rokicki) 22.04.2013 #5514

Read "operations" rather than "length" or "depth"; sorry.

-----
emertz3 22.04.2013 #5515

My last version was down to 35.5 hours. There is still lots of room for improvement. How fast did yours run, Tom?

-----
Gil Dogon 22.04.2013 #5516

Tom, are you sure you did a full search? Did you also consider numbers that are not factors of the target factorial? How did you do it? I did something that is also very close to a full search, but excluded numbers greater than 2^64. My algorithm did a full enumeration of SLPs up to length 10, and then I had a nice little algorithm that could check whether an SLP can be extended by three or fewer steps to reach a given target in O(10^2), which I ran for every such node. So, assuming I did not have a bug anywhere, I would say that 20, 21, 22 are also 14 (if there were a 13-step SLP, my algorithm should have detected it).

-----
(Tomas G. Rokicki) 22.04.2013 #5517

Mine took four days on a single thread of a Macbook Air, when it was actually on. (The Macbook Air lives in my backpack, powered off, a lot of the time, so I don't have a *cpu* time, but I'll roughly guess about 80 core hours.) No tricks at all; simple, straightforward code. 23T nodes.
I'm not doing *all* possible duplicate pruning. Arithmetic is mod 2^64. But I just looked at the code and realized I'm not doing it *quite* right: I don't check subtraction correctly for values that overflow. I'm going to need to rerun this. Do you handle this case? Of course 19! << 2^64, and it's very unlikely this actually made a difference, but with this bug the result falls short of being an actual proof. So I have to retract my claim.

-tom

-----
(Tomas G. Rokicki) 22.04.2013 #5518

Howdy, Gil! Full search mod 2^64, except I did not consider subtraction correctly for values that overflowed, so I withdrew my claim. Your trick on the tail is a good one (I used something similar for some earlier results). If you're confident in your result and your program, I won't re-run my code, since yours trumps mine. I think I can fix it easily just by keeping track of which numbers overflow (pretty easy to do) and considering subtraction in both directions if any predecessor overflowed. I don't think this will appreciably increase the search time or node count.

-tom

-----
emertz3 22.04.2013 #5519

12 threads / 6 cores; HT gave me 25%:

35.5 hours * 1.25 * 6 cores = 266.25 core hours

I spend too much time seeing if a number is in a list. I assumed keeping the list in order took more time than was saved by faster lookups.

ed

-----
Gil Dogon 22.04.2013 #5520

Howdy Tom! I have quite high confidence that there is no bug in the algorithm, though it is not that straightforward, as there are quite a few sub-cases. Also, looking at the code now, I saw that it still assumed the last operation is a multiplication, so it does not exclude the extremely unlikely possibility of a 13-step solution finishing with an addition. As mentioned below, I did not bother at all with numbers greater than 2^64, so I would not consider my result a full enumeration. Maybe I should modify it along the lines suggested, i.e.
work mod 2^64, keep track of overflows, do subtraction in both directions, AND also consider a last addition; but right now I am taking time off from the factorials to do my actual job, and spending some time with my neglected family ...

-----
(Tomas G. Rokicki) 22.04.2013 #5521

Okay, great. I've made the change so it now tracks overflows and considers both subtractions in those cases, and I'll rerun. For duplicate elimination, I check to see if a number is already in a list just by using a bitmap of size 1MB (the first 8M integers); if there are duplicates of larger numbers, I will waste some time, but these are relatively rare. Of course, in the new code that tracks overflows I'm careful to permit an overflow that coincides (mod 2^64) with an earlier number, to make sure the search is truly exhaustive. The program is running. Whether I let it complete or not depends on what else I need that Mac for. (My desktops are back on doing cube work at the moment.) This won't let me duplicate Gil's result (and not considering addition/subtraction on the last step means, I think, we haven't been sufficiently exhaustive for a proof), but it will let me confirm Ed's result. Ed, can you explain how you did your arithmetic and how you handled overflow? Did you go to bigints on overflow, or something?

-tom

-----
Gil Dogon 22.04.2013 #5522

Tom, why are you using a bitmap and not some small hash table? Since the set in which we need to check for duplicates is quite small, a small hash is good and compact. But I guess the 1MB bitmap is also not so bad, as it will be very sparsely accessed, so most accesses will not be cache misses ....

Here is a small fragment of my small hash code:

//
// The very simple small_hash here is only used to detect collisions in a small set that is supposed to be
// of size much less than HSIZE.
// This will be true in our case, as the set will be about nstep^2 for nstep quite small anyway ...
//
#define SM_HBITS 14
#define HSIZE (1<<SM_HBITS)
#define SM_HMASK (HSIZE-1)
#define MAXSET (100*100)

static int64 g_small_hash[HSIZE+MAXSET]={0};
static uchar g_small_hash_gens[HSIZE+MAXSET]={0};
static int   g_small_hash_locs[MAXSET]={0};
static int64 g_small_hash_list[MAXSET]={0};
static int   g_small_hash_size=0;

inline int small_hash_insert(int64 v, int gen)
{
    int loc=(v^(v>>15))&SM_HMASK; // idiotic hash function but sufficient for our case ....
    while(g_small_hash[loc]!=0) {
        if(g_small_hash[loc]==v) { g_small_hash_gens[loc]=gen; return 0; }
        loc++;
    }
    g_small_hash[g_small_hash_locs[g_small_hash_size++]=loc]=v; // sorry I could not resist that :)
    g_small_hash_gens[loc]=gen;
    assert(g_small_hash_size < MAXSET);
    return 1;
}

void small_hash_clear()
{
    int i;
    for(i=0;i<g_small_hash_size;i++)
        g_small_hash[g_small_hash_locs[i]]=0;
    g_small_hash_size=0;
}

You can figure out the small_hash_get routine yourself.

-----
(Tomas G. Rokicki) 22.04.2013 #5524

The bitmap is fast; little code, and always in cache. The inner loop of this search is tiny, so every instruction counts. I do use a hash at the leaves to check for good results (I check for n! for n=1..19 or so, and also k*n! where k<=n, to accumulate "statistics" on solutions to numbers that are similar to factorials).

-----
emertz3 22.04.2013 #5525

I did not handle overflows. I also assumed there would not be a large subtraction for the last step. Would an upper limit of 2^128 be enough for 15!, 16!, and 17!? A 128-bit search is not that much slower.

-----
(Tomas G. Rokicki) 22.04.2013 #5526

I see no basis to accept that even using 128-bit numbers constitutes a proof. Perhaps someone else can prove that? I mean, the chances are vanishingly small, but we are talking about a proof, right?

-----
emertz3 22.04.2013 #5527

We are trying to prove there is no solution in 11 steps for 15!, 16!, 17!. We know of solutions in 12 steps. Using the arbitrary-precision cpp_int (Boost), that run shouldn't take that long and removes the error of overflows. Proving there are no solutions in 12 steps for 18! and 19!
is where time is a problem.

-----
(Tomas G. Rokicki) 23.04.2013 #5529

Ahh, my program that detects overflows, does both directions of subtraction, and does everything mod 2^64 has completed through 11 operations and not found anything for 15!, 16!, or 17! (though it did find k*15! for k in {3,10,13,14} in 11 steps, and k*16! for k in {11,14}, also in 11 steps). So 15!, 16!, and 17! are proven in full generality. I'll let the program grind for a while; it shouldn't take too long for 18! and 19!. And then I'll take a look at updating the OEIS entry. Maybe some time later Gil might be motivated to extend it with his endgame strategy (more powerful than mine!) to take it further.

-tom

-----
Dima 23.04.2013 #5530

I already added those; just waiting for the comment to be accepted.

-----
emertz3 23.04.2013 #5538

Added a bit-vector and it doubled my speed. Thank you for that. I actually have two lists. The second list keeps track of values to search. It was the only way to ensure a full search and possibly eliminate some duplicate work (I found 19T nodes, not 23T). If you can find these solutions for 8! without creating the second list, I will have to rethink my approach.

1, 2, 3, 6, 36, 35, 32, 1120, 40320
1, 2, 3, 6, 36, 35, 32, 1152, 40320
1, 2, 3, 6, 36, 35, 32, 1260, 40320

-----
(Tomas G. Rokicki) 23.04.2013 #5539

Can you explain what you mean by the second list? I did indeed find all three of those:

New best 8 1 of 8: 1 2 3 6 36 35 32 1260 40320
New best 8 1 of 8: 1 2 3 6 36 35 32 1152 40320
New best 8 1 of 8: 1 2 3 6 36 35 32 1120 40320

You might try adjusting the size of the bitmap; if you make it bigger, you cut down on redundant search but risk more cache misses. The optimal size will depend on your CPU.

-----
emertz3 23.04.2013 #5540

1, 2, 3, 4, 6, 36: next possible 32 (36-4) and 35 (36-1)
1, 2, 3, 4, 6, 36, 32: next possible 35 (32+3), not 36-1
1, 2, 3, 4, 6, 36, 35: next possible 32 (35-3), not 36-4

I am math stupid.
Will there always be a way to build numbers from the previous level of the recursion so they are not lost? I am just missing something that is not obvious to me.

-----
(Tomas G. Rokicki) 23.04.2013 #5541

I use a pending-number queue that holds numbers that are still to be considered. Each level of recursion gets a pointer into this queue, the current "get" pointer. The bitmap is used to ensure no number goes into the queue more than once (as a duplicate entry) at any given time. Each level of recursion adds all newly possible numbers (due to the main sequence being extended by a new value) to the pending queue. Then it iterates through the queue from the current "get" pointer to the end, considering each number in turn as the next continuation. Then it pops all the numbers it added to the queue, and backtracks. The canonical ordering is enforced by only selecting numbers for the result list in the order they occurred in the pending queue. Because I use a finite bitmap, numbers larger than the bitmap covers can occur in the pending queue more than once, but only if there is more than one way to generate them. I've run it with a large bitmap (1GB, good for numbers through 8G) and a small bitmap (1MB, good for numbers through 8M), and it's slightly faster with 1MB on my Mac despite searching more nodes. I have not tried any intermediate sizes.

-tom

-----
emertz3 23.04.2013 #5543

Exactly what I am doing. The only difference is that I keep numbers that are too large to fit in the bit-vector in a list. My second list is your pending-number queue. I don't have a bit-vector for my queue, just my ignore list. The optimal size for my bit-vector is 3.5MB. (3930K, 2MB L2, 15MB L3.)

-----
(Tomas G. Rokicki) 23.04.2013 #5544

Ahh, so you do exact duplicate pruning (with the bit-vector plus overflow list), where I just do approximate pruning. It would be interesting to see which approach is faster. I probably win on per-node time, but do more nodes.
I believe almost all the real time in the program is spent dealing with numbers that are reasonably small, so it's probably a wash.

-tom
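-----
[Editorial note] For readers following the thread, here is a minimal sketch of the kind of search being discussed. A straight-line program (SLP) here is a chain starting from 1 in which every later entry is the sum, difference, or product of two earlier entries, and A217032 asks for the fewest operations reaching n!. This is not any poster's actual program: the names (`extend`, `is_valid_slp`, `shortest_slp`) are invented for illustration, it uses exact Python integers (so none of the mod-2^64 overflow handling debated above is needed), and the simple repeated-squaring bound stands in for the posters' far more serious pruning. It is only practical for tiny targets, but it can verify the three 8-operation chains for 8! that Ed posted.

```python
def extend(seq):
    """All new positive values reachable from the chain in one operation:
    a+b, a*b, or a-b over pairs of (not necessarily distinct) entries."""
    out = set()
    for a in seq:
        for b in seq:
            out.add(a + b)
            out.add(a * b)
            if a > b:
                out.add(a - b)
    return out - set(seq)          # values already in the chain are not "new"

def is_valid_slp(chain):
    """Check that a chain starts at 1 and each later entry is derivable
    in one operation from the entries before it."""
    if not chain or chain[0] != 1:
        return False
    return all(v in extend(chain[:i]) for i, v in enumerate(chain[1:], 1))

def shortest_slp(target, max_ops):
    """Exhaustive DFS over SLPs of at most max_ops operations; returns a
    shortest chain reaching target, or None.  No duplicate-chain
    elimination (the thread's bitmap/hash tricks), so tiny targets only."""
    best = None

    def dfs(seq, ops_left):
        nonlocal best
        if seq[-1] == target:
            if best is None or len(seq) < len(best):
                best = list(seq)
            return
        if ops_left == 0:
            return
        for c in sorted(extend(seq)):
            # Prune: squaring the current maximum every remaining step is
            # the fastest possible growth; if even that falls short of the
            # target, this branch is dead.
            if max(max(seq), c) ** (2 ** (ops_left - 1)) < target:
                continue
            seq.append(c)
            dfs(seq, ops_left - 1)
            seq.pop()

    dfs([1], max_ops)
    return best

# Ed's three 8-operation chains for 8! = 40320 (#5538) all validate:
for c in ([1, 2, 3, 6, 36, 35, 32, 1120, 40320],
          [1, 2, 3, 6, 36, 35, 32, 1152, 40320],
          [1, 2, 3, 6, 36, 35, 32, 1260, 40320]):
    assert is_valid_slp(c)

# A toy full search: 4! = 24 needs 4 operations, e.g. 1, 2, 4, 6, 24.
print(shortest_slp(24, 4))
```

The real searches in the thread replace the exact arithmetic with mod-2^64 words (tracking which values overflowed so subtraction can be tried in both directions) and replace `extend`'s set with a pending-number queue plus a bitmap or small hash for duplicate detection; the structure of the recursion is the same.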