AlZimmermannsProgrammingContests (Yahoo Groups)

-----
emertz3 19.03.2013 #5433

Finished the complete search for 12 steps (15!, 16!, 17!). Ran 55 hours and 19 trillion nodes, using 12 threads on a 3930K.

-----
Al Zimmermann 20.03.2013 #5435

You have the number of distinct integers reachable in 12 or fewer steps? You should enter those numbers on the OEIS.

Al

-----
Dima 22.04.2013 #5511

I have updated https://oeis.org/A217032 with these results. Please check it once it has been approved by the OEIS moderators.

-----
(Tomas G. Rokicki) 22.04.2013 #5513

Shall I add a(18) and a(19), since the full search through depth 12 (which I also did in full generality) excludes a solution of length 12, yet we know solutions of length 13?

-tom

-----
(Tomas G. Rokicki) 22.04.2013 #5514

Read "operations" rather than "length" or "depth"; sorry.

-----
emertz3 22.04.2013 #5515

My last version was down to 35.5 hours. There is still lots of room for improvement. How fast did yours run, Tom?

-----
Gil Dogon 22.04.2013 #5516

Tom, are you sure you did a full search? Did you also consider numbers that are not factors of the target factorial? How did you do it? I did something that is also very close to a full search, but excluded numbers greater than 2^64. My algorithm did a full enumeration of SLPs up to length 10, and then I had a nice little algorithm that could check whether an SLP can be extended by three or fewer steps to reach a given target in O(10^2), which I ran for every such node. So, assuming I did not have a bug anywhere, I would say that 20, 21, 22 are also 14 (if there were a 13-step SLP, my algorithm should have detected it).

-----
(Tomas G. Rokicki) 22.04.2013 #5517

Mine took four days on a single thread of a Macbook Air, when it was actually on. (The Macbook Air lives in my backpack, powered off, a lot of the time, so I don't have a *cpu* time, but I'll roughly guess about 80 core hours.) No tricks at all; simple, straightforward code. 23T nodes.
I'm not doing *all* possible duplicate pruning. Arithmetic is mod 2^64. But I just looked at the code and realized I'm not doing it *quite* right: I don't check subtraction correctly for values that overflow. I'm going to need to rerun this. Do you handle this case? Of course 19! << 2^64, and it's very unlikely this actually made a difference, but with this bug the result falls short of being an actual proof. So I have to retract my claim.

-tom

-----
(Tomas G. Rokicki) 22.04.2013 #5518

Howdy, Gil! Full search mod 2^64, except I did not consider subtraction correctly for values that overflowed, so I withdrew my claim. Your trick on the tail is a good one (I used something similar for some earlier results). If you're confident in your result and your program, I won't re-run my code, since yours trumps mine. I think I can fix it easily just by keeping track of which numbers overflow (pretty easy to do) and considering subtraction in both directions if any predecessor overflowed. I don't think this will appreciably increase the search time or node count.

-tom

-----
emertz3 22.04.2013 #5519

12 threads / 6 cores; HT gave me 25%:

35.5 hours * 1.25 * 6 cores = 266.25 core hours

I spend too much time seeing if a number is in a list. I assumed keeping the list in order took more time than was saved by faster lookups.

ed

-----
Gil Dogon 22.04.2013 #5520

Howdy Tom! I have quite high confidence that there is no bug in the algorithm, though it is not that straightforward, as there are quite a few sub-cases. Also, looking at the code now, I saw that it still assumed the last operation is a multiplication, so it does not exclude the extremely unlikely possibility of a 13-step solution finishing with an addition. As mentioned below, I did not bother at all with numbers greater than 2^64, so I would not consider my result a full enumeration. Maybe I should modify it along the lines suggested, i.e.
work mod 2^64, keep track of overflows, do subtraction in both directions, AND also consider a last addition; but right now I am taking time off from the factorials to do my actual job, and spending some time with my neglected family ...

-----
(Tomas G. Rokicki) 22.04.2013 #5521

Okay, great. I've made the change so it now tracks overflows and considers both subtractions in those cases, and I'll rerun. For duplicate elimination, I check to see if a number is already in a list just by using a bitmap of size 1MB (the first 8M integers); if there are duplicates of larger numbers, I will waste some time, but these are relatively rare. Of course, in the new code that tracks overflows I'm careful to permit an overflow that coincides (mod 2^64) with an earlier number, to make sure the search is truly exhaustive. The program is running. Whether I let it complete or not depends on what else I need that Mac for. (My desktops are back on doing cube work at the moment.) This won't let me duplicate Gil's result (and not considering addition/subtraction on the last step means, I think, we haven't been sufficiently exhaustive for a proof), but it will let me confirm Ed's result. Ed, can you explain how you did your arithmetic and how you handled overflow? Did you go to bigints on overflow, or something?

-tom

-----
Gil Dogon 22.04.2013 #5522

Tom, why are you using a bitmap and not some small hash table? Since the set in which we need to check for duplicates is quite small, a small hash is good and compact. But I guess the 1MB bitmap is also not so bad, as it will be very sparsely accessed, so most accesses will not be cache misses ....

Here is a small fragment of my small hash code:

//
// The very simple small_hash here is only used to detect collisions in a small set that is supposed to be
// of size much less than HSIZE.
// This will be true in our case, as the set will be about nstep^2 for nstep quite small anyway ...
//
#define SM_HBITS 14
#define HSIZE (1<<SM_HBITS)
#define SM_HMASK (HSIZE-1)
#define MAXSET (100*100)

static int64 g_small_hash[HSIZE+MAXSET]={0};
static uchar g_small_hash_gens[HSIZE+MAXSET]={0};
static int   g_small_hash_locs[MAXSET]={0};
static int64 g_small_hash_list[MAXSET]={0};
static int   g_small_hash_size=0;

inline int small_hash_insert(int64 v, int gen)
{
    int loc=(v^(v>>15))&SM_HMASK; // idiotic hash function but sufficient for our case ....
    while(g_small_hash[loc]!=0) {
        if(g_small_hash[loc]==v) { g_small_hash_gens[loc]=gen; return 0; }
        loc++;
    }
    g_small_hash[g_small_hash_locs[g_small_hash_size++]=loc]=v; // sorry I could not resist that :)
    g_small_hash_gens[loc]=gen;
    assert(g_small_hash_size < MAXSET);
    return 1;
}

void small_hash_clear()
{
    int i;
    for(i=0;i<g_small_hash_size;i++)
        g_small_hash[g_small_hash_locs[i]]=0;
    g_small_hash_size=0;
}

You can figure out the small_hash_get routine yourself.

-----
(Tomas G. Rokicki) 22.04.2013 #5524

The bitmap is fast; little code, and always in cache. The inner loop of this search is tiny, so every instruction counts. I do use a hash at the leaves to check for good results (I check for n! for n=1..19 or so, and also k*n! where k<=n, to accumulate "statistics" on solutions to numbers that are similar to factorials).

-----
emertz3 22.04.2013 #5525

I did not handle overflows. I also assumed there would not be a large subtraction for the last step. Would an upper limit of 2^128 be enough for 15!, 16!, and 17!? A 128-bit search is not that much slower.

-----
(Tomas G. Rokicki) 22.04.2013 #5526

I see no basis to accept that even using 128-bit numbers constitutes a proof. Perhaps someone else can prove that? I mean, the chances are vanishingly small, but we are talking about a proof, right?

-----
emertz3 22.04.2013 #5527

We are trying to prove there is no solution in 11 steps for 15!, 16!, 17!. We know of solutions in 12 steps. Using the arbitrary-precision cpp_int (Boost), that run shouldn't take that long and removes the error of overflows. Proving there are no solutions in 12 steps for 18! and 19!
is where time is a problem.

-----
(Tomas G. Rokicki) 23.04.2013 #5529

Ahh, my program that detects overflows, does both directions of subtraction, and does everything mod 2^64 has completed through 11 operations and not found anything for 15!, 16!, or 17! (though it did find k*15! for k in {3,10,13,14} in 11 steps, and k*16! for k in {11,14}, also in 11 steps). So 15!, 16!, and 17! are proven in full generality. I'll let the program grind for a while; it shouldn't take too long for 18! and 19!. And then I'll take a look at updating the OEIS entry. Maybe some time later Gil might be motivated to extend it with his endgame strategy (more powerful than mine!) to take it further.

-tom

-----
Dima 23.04.2013 #5530

I already added those; just waiting for the comment to be accepted.

-----
emertz3 23.04.2013 #5538

Added a bit-vector and it doubled my speed. Thank you for that. I actually have two lists. The second list keeps track of values to search. It was the only way to ensure a full search and possibly eliminate some duplicate work (I found 19T nodes, not 23T). If you can find these solutions for 8! without creating the second list, I will have to rethink my approach.

1, 2, 3, 6, 36, 35, 32, 1120, 40320
1, 2, 3, 6, 36, 35, 32, 1152, 40320
1, 2, 3, 6, 36, 35, 32, 1260, 40320

-----
(Tomas G. Rokicki) 23.04.2013 #5539

Can you explain what you mean by the second list? I did indeed find all three of those:

New best 8 1 of 8: 1 2 3 6 36 35 32 1260 40320
New best 8 1 of 8: 1 2 3 6 36 35 32 1152 40320
New best 8 1 of 8: 1 2 3 6 36 35 32 1120 40320

You might try adjusting the size of the bitmap; if you make it bigger, you cut down on redundant search but risk more cache misses. The optimal size will depend on your CPU.

-----
emertz3 23.04.2013 #5540

1, 2, 3, 4, 6, 36: next possible 32 (36-4) and 35 (36-1)
1, 2, 3, 4, 6, 36, 32: next possible 35 (32+3), not 36-1
1, 2, 3, 4, 6, 36, 35: next possible 32 (35-3), not 36-4

I am math stupid.
Will there always be a way to build numbers from the previous level of the recursion so they are not lost? I am just missing something that is not obvious to me.

-----
(Tomas G. Rokicki) 23.04.2013 #5541

I use a pending-number queue that holds numbers that are still to be considered. Each level of recursion gets a pointer into this queue, the current "get" pointer. The bitmap is used to ensure no number goes into the queue more than once (as a duplicate entry) at any given time. Each level of recursion adds all newly possible numbers (due to the main sequence being extended by a new value) to the pending queue. Then it iterates through the queue from the current "get" pointer to the end, considering each number in turn as the next continuation. Then it pops all the numbers it added to the queue, and backtracks. The canonical ordering is enforced by only selecting numbers for the result list in the order they occurred in the pending queue. Because I use a finite bitmap, numbers larger than the bitmap covers can occur in the pending queue more than once, but only if there is more than one way to generate them. I've run it with a large bitmap (1GB, good for numbers through 8G) and a small bitmap (1MB, good for numbers through 8M), and it's slightly faster with 1MB on my Mac despite searching more nodes. I have not tried any intermediate sizes.

-tom

-----
emertz3 23.04.2013 #5543

Exactly what I am doing. The only difference is that I keep numbers that are too large to fit in the bit-vector in a list. My second list is your pending-number queue. I don't have a bit-vector for my queue, just my ignore list. The optimal size for my bit-vector is 3.5MB. (3930K, 2MB L2, 15MB L3.)

-----
(Tomas G. Rokicki) 23.04.2013 #5544

Ahh, so you do exact duplicate pruning (with the bit-vector plus overflow list), where I just do approximate pruning. It would be interesting to see which approach is faster. I probably win on per-node time, but do more nodes.
I believe almost all the real time in the program is spent dealing with numbers that are reasonably small, so it's probably a wash.

-tom
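-----
[Editorial note] For readers following the thread, here is a minimal sketch of the kind of search being discussed. A straight-line program (SLP) here is a chain starting from 1 in which every later entry is the sum, difference, or product of two earlier entries, and A217032 asks for the fewest operations reaching n!. This is not any poster's actual program: the names (`extend`, `is_valid_slp`, `shortest_slp`) are invented for illustration, it uses exact Python integers (so none of the mod-2^64 overflow handling debated above is needed), and the simple repeated-squaring bound stands in for the posters' far more serious pruning. It is only practical for tiny targets, but it can verify the three 8-operation chains for 8! that Ed posted.

```python
def extend(seq):
    """All new positive values reachable from the chain in one operation:
    a+b, a*b, or a-b over pairs of (not necessarily distinct) entries."""
    out = set()
    for a in seq:
        for b in seq:
            out.add(a + b)
            out.add(a * b)
            if a > b:
                out.add(a - b)
    return out - set(seq)          # values already in the chain are not "new"

def is_valid_slp(chain):
    """Check that a chain starts at 1 and each later entry is derivable
    in one operation from the entries before it."""
    if not chain or chain[0] != 1:
        return False
    return all(v in extend(chain[:i]) for i, v in enumerate(chain[1:], 1))

def shortest_slp(target, max_ops):
    """Exhaustive DFS over SLPs of at most max_ops operations; returns a
    shortest chain reaching target, or None.  No duplicate-chain
    elimination (the thread's bitmap/hash tricks), so tiny targets only."""
    best = None

    def dfs(seq, ops_left):
        nonlocal best
        if seq[-1] == target:
            if best is None or len(seq) < len(best):
                best = list(seq)
            return
        if ops_left == 0:
            return
        for c in sorted(extend(seq)):
            # Prune: squaring the current maximum every remaining step is
            # the fastest possible growth; if even that falls short of the
            # target, this branch is dead.
            if max(max(seq), c) ** (2 ** (ops_left - 1)) < target:
                continue
            seq.append(c)
            dfs(seq, ops_left - 1)
            seq.pop()

    dfs([1], max_ops)
    return best

# Ed's three 8-operation chains for 8! = 40320 (#5538) all validate:
for c in ([1, 2, 3, 6, 36, 35, 32, 1120, 40320],
          [1, 2, 3, 6, 36, 35, 32, 1152, 40320],
          [1, 2, 3, 6, 36, 35, 32, 1260, 40320]):
    assert is_valid_slp(c)

# A toy full search: 4! = 24 needs 4 operations, e.g. 1, 2, 4, 6, 24.
print(shortest_slp(24, 4))
```

The real searches in the thread replace the exact arithmetic with mod-2^64 words (tracking which values overflowed so subtraction can be tried in both directions) and replace `extend`'s set with a pending-number queue plus a bitmap or small hash for duplicate detection; the structure of the recursion is the same.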