User:Georg Fischer/B-file check

From OeisWiki

At the beginning of 2019 Martin Pedersen sent lists of b-files which differ seriously from the DATA section in the sequences. Martin found various problems, and NJAS asked for assistance. The progress of this small maintenance project is noted here.

References

  • B-files - describing the strict format with comments only at the beginning of the file and of the lines, and ASCII encoding. There is a change proposal to UTF-8 pending, but not approved.
  • Deleted sequences - stating that b-files must also be deleted (by an admin).
  • Charles' b-file formatting rules with the loose format description

Project pages

Signatures

Neil, Martin and I are correcting many b-files. We are using a format like this:

%H Jean-Marc Falcoz, <a href="/A303570/b303570.txt">Table of n, a(n) for n = 1..6625</a> 
  (shortened by _N. J. A. Sloane_, Jan 18 2019)
%H Charles R Greathouse IV, <a href="/A111076/b111076_1.txt">Table of n, a(n) for n = 1..10000</a>
  (a(0)=1 added by _Martin Møller Skarbiniks Pedersen_, Jan 19 2019)

This makes it easier to see which b-files have been changed. So if you see a comment like that, please do not remove it.

Formal problems in b-files

The script bfanalyze.pl matches the non-comment, non-empty lines against the following regular expression (loose format):

\A(-?\d+)\s+(\-?\d{1,})\s*(\#.*)?\Z
   index     term          comment?

Comments after the terms are tolerated (email from NJAS, 2017-01-17). A scan of all b-files showed various (rare) problems:

  • Lines not obeying the loose format for "index term":
    • missing term,
    • more than one term,
    • terms with non-digits, for example 2.7e+11.
  • Lines longer than 1000 characters (over 5000 b-files).
  • Index not strictly increasing.
  • Index and term distributed over 4 separate lines.
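
The checks listed above can be sketched in Python as follows. This is only an illustration and not the code of bfanalyze.pl: the function name check_bfile and the problem codes "format" and "long" are assumptions made for this example (only "ninc" is borrowed from this page), and the regular expression is the loose format without its redundant escapes.

```python
import re

# Loose format: index, whitespace, term, optional trailing comment.
LOOSE = re.compile(r"\A(-?\d+)\s+(-?\d+)\s*(#.*)?\Z")
MAX_LEN = 1000  # line-length limit named in the list above


def check_bfile(lines):
    """Return a list of (line_number, code) pairs for problematic lines.

    Codes (hypothetical, for this sketch only):
      "format" - line does not obey the loose "index term" format
      "ninc"   - index is not strictly greater than the previous index
      "long"   - line is longer than MAX_LEN characters
    """
    problems = []
    prev_index = None
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Empty lines and comment lines are not checked.
        if not line or line.startswith("#"):
            continue
        if len(line) > MAX_LEN:
            problems.append((lineno, "long"))
        m = LOOSE.match(line)
        if m is None:
            problems.append((lineno, "format"))
            continue
        index = int(m.group(1))
        if prev_index is not None and index <= prev_index:
            problems.append((lineno, "ninc"))
        prev_index = index
    return problems
```

Calling check_bfile(open("b303570.txt")) on a local copy of a b-file would return one entry per offending line. A line such as "2 2.7e+11" is reported as "format", because the term contains characters other than digits.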

The following b-files with non-digit (ndig) and/or non-increasing index (ninc) problems were edited (e), deleted (d) or shortened (s) by Neil or Gfis (2019-01-17):

A002975 e   1161   ninc@1160
A078559 s   1703   ninc@1703
A084706 d   1000   ndig@445  13300240106204236588439439029
A103269 e     11   ndig@11 1213121121312121312112131213121
A112927 s   1206   ndig@607 
A288723 e      1   ndig@1 ninc@1 
A288724 e      3   ndig@1 ninc@2 
A288725 e      2   ndig@1 ninc@3 
A300191 e   2500   ninc@1939
A300813 s     70   ndig@70 ninc@70
A302109 e     61   ndig@54 1410996161970523870803793140730
A303002 s    303   ndig@303 111111111111111111111111111111}
A303570 s  10000   ndig@6626 2,5901E+11
A316347 e    719   ndig@719 ninc@719
A319154 e  10000   ndig@1 0 1
A321214 d    200   ndig@47 1946149855069343009842873176425

B-files which have fewer entries than the DATA section

NJAS examined 49 cases, and deleted 42 "shorter" b-files. (2019-01-05)

bextra.txt - 414 unlinked b-files

Martin provided a bigger list of b-files belonging to sequences which were recycled. He manually checked about 800 b-files. The list bextra_combine.txt contains 414 candidates. All entries were classified manually (by replacing the %b line by a code %0..%8). There are also 9 extracts containing the A-numbers only. Neil deleted gf1, gf2 and gf5 (2019-01-13).

gf0 -    22 # ok, corrected
gf1 d    24 # b-file DELETEd since "dead"
gf2 d   145 # b-file DELETEd since severe differences and mentioned
gf3 ?    68 # b-file is longer, %H link to it is missing
gf4 ?     7 # like gf3, but offset differs
gf5 d    34 # b-file DELETEd since terms were the same
gf6 ?     9 # strange cases, some terms seen in the b-file
gf7 ?    85 # severe differences, but not mentioned in Wiki-Deleted
gf8 ?    20 # no terms, b-file only; new allocated
total    414

Extraction of index ranges from the links to the b-files

A typical link record in the internal format is:

%H A000003 N. J. A. Sloane, <a href="/A000003/b000003.txt">Table of n, a(n) for n = 1..20000</a>

GFis generated a tab-separated file for 145323 user-defined b-files. The generated record for the example link line is:

Name           low     high    code        author
b000003.txt    1       20000   a(n)    ..  N. J. A. Sloane

The index range cannot easily be extracted for sequences which are the flattened version of tables, triangles, antidiagonals etc. The file bflink.txt contains special codes in these cases, but some 200 descriptions could not be parsed.
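
For the simple "Table of n, a(n) for n = low..high" case, the extraction can be sketched in Python as follows. This is an illustration only, not the program that produced bflink.txt: the function name parse_link and the returned dictionary keys are assumptions made for this example, and descriptions of any other shape (tables, triangles etc.) are simply reported as not parsed.

```python
import re

# Matches a %H record in the internal format whose link text has the
# standard form "Table of n, a(n) for n = low..high".
LINK = re.compile(
    r'\A%H\s+(A\d{6})\s+(.*?),\s*'
    r'<a href="/A\d{6}/(b\d{6}(?:_\d+)?\.txt)">'
    r'Table of n, a\(n\) for n = (-?\d+)\.\.(-?\d+)</a>'
)


def parse_link(record):
    """Return name, index range and author of a b-file link, or None.

    None means the description does not have the standard form and
    would need special handling.
    """
    m = LINK.match(record)
    if m is None:
        return None
    _anumber, author, name, low, high = m.groups()
    return {"name": name, "low": int(low), "high": int(high), "author": author}
```

Applied to the example link record for A000003, parse_link yields the name b000003.txt, the range 1 to 20000 and the author N. J. A. Sloane.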