User:Charles R Greathouse IV/Metadata

This is a page for my thoughts about metadata (data about data) in the OEIS. In all cases the basic idea is to take some recurrent feature and expose it in some way so that it can be searched for, rendered differently, user-customized, etc.

See also Features Wishlist#Sequence Metadata for requested features and discussion.

Current state

There is a great deal of metadata in the OEIS at present. The keywords are a major part: for example, keyword:tabl allowed the addition of a table output format (see, e.g., A007318/table) and similarly with keyword:cons (e.g., A000796/constant).

Contributors to the OEIS are now tagged with underscores which cause them to be auto-linked. I do not believe it is possible to search for them (other than as plain text, as usual) but this mechanism should allow better automated parsing.

Sequence properties

See User:Charles R Greathouse IV/Properties for thoughts on properties and their relationships.

Keywords

See User:Charles R Greathouse IV/Keywords for information on individual keywords.

Keywords have been the primary form of metadata for the OEIS since its creation. In the current incarnation of the OEIS, keywords can be searched and carry title text (hovertext) that lets new users understand their meanings more readily (though many are intuitive even without it).

Index

Because there are so few keywords, the Index is the fallback method for collecting similar sequences together. Unfortunately, in its present implementation:

The meaning of inclusion is not well-specified. This is appropriate and useful for an index, but limits its utility when using it for other purposes. For example, quadratic form primes by discriminant: would a sequence *about* but not *of* primes of that discriminant be included? As an index, it would be useful to include such other sequences but in other contexts this may be undesirable.
It is hard to search. The entry names are long and many do not get their own name/id attribute. Also, some sequence entries have Index links whose text does not match the spelling of the Index entry.
It is relatively inflexible (new entries are rarely added).

Despite these drawbacks I strongly recommend adding Index links to entries. If this is the only way sequence properties are ever tagged then we should make the best of it we can. If we eventually move to a different system then the Index links can form the start of that system by some automated process.

Of course there are things that an index is good at where it should not be replaced. I see its function, ideally, as cataloging "related to" rather than "is a" relationships. A sequence "is" monotonic, a relationship better described by a keyword than an index, but a sequence related to monotonicity (say, A158939) but which need not be itself monotonic is perfect for the index. (Similarly, A083140 is not actually a permutation of the natural numbers but it should have—and does have!—an index link to permutations of the natural numbers.)

New index entries

There are a number of areas where the index should be expanded. For example, there is an index entry for transcendental numbers, but none yet for algebraic numbers; this should be created and populated. Similarly, there should be an index for periodic sequences. Both of these should be built out like the linear recurrences index entry, organized by degree for algebraic numbers and by period for periodic sequences. Also, there should probably be an entry for polynomial sequences and quasipolynomials.

Other approaches

Besides the existing keywords, there are many properties that seem worth coding such as being monotone, completely multiplicative, additive, sub-/super-additive, or even "a rearrangement of $\mathbb {N}$ ". Also it would be good to have information on the recognizability of sequences: A038772 is a regular language in decimal, and a number of sequences (primes, 2^n-1, etc.) are regular in unary. Similarly, when there are results showing that a particular sequence is/is not context-free, context-sensitive, or decidable/recursive this seems worth mentioning. (Almost all sequences in the OEIS should be at least recursively enumerable; A004147 is one of the rare uncomputable sequences in the OEIS.)

I would also very much like to be able to classify the growth rates of the monotone sequences; this could lend itself to searching very well. I'm not sure what the best way to do this is; some system where the types are meaningful rather than just text would be ideal, so that adding more information would not detract from the entry.

I would also like to be able to mark sequences and sequence properties which are dubious (guessed / open / conjectured) or simply not rigorously proven yet. I prefer, ceteris paribus, to define sequences without reference to conjecture. For example, A059784 could be defined as either $\lfloor k^{2^{n}}\rfloor$ or as a(n+1)=nextprime(a(n)^2). The former relies on the existence of such k ('obvious' but unproven) while the latter exists unconditionally.
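To make the unconditional definition concrete, here is a minimal sketch of the recurrence a(n+1)=nextprime(a(n)^2). The starting value 2 is an assumption made for the illustration, and the helper names are hypothetical. Since a(n)^2 is never prime, "next prime" here is simply the smallest prime above the square.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate only for the small terms shown here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def next_prime(n: int) -> int:
    """Smallest prime strictly greater than n."""
    candidate = n + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


def recurrence_terms(first: int, count: int) -> list[int]:
    """First `count` terms of a(n+1) = nextprime(a(n)^2), starting at `first`.

    The starting value is supplied by the caller; 2 is assumed below.
    """
    terms = [first]
    while len(terms) < count:
        terms.append(next_prime(terms[-1] ** 2))
    return terms


print(recurrence_terms(2, 4))  # [2, 5, 29, 853]
```

No real constant k is needed anywhere in this computation, which is the point of preferring the recurrence as the definition.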

Many sequences have their generating function listed in a standardized form. I'd love to be able to search for sequences by properties implied by these generating functions, like sequences with exponential growth.
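As a small worked illustration (the choice of example is an assumption, not something taken from the database format): the Fibonacci numbers A000045 have the rational generating function

$\frac{x}{1-x-x^{2}}$

The denominator vanishes at $x=(-1\pm \sqrt{5})/2$, and the root nearest the origin is $x=1/\varphi$ with $\varphi =(1+\sqrt{5})/2$. Hence

$a(n)\sim \frac{\varphi ^{n}}{\sqrt{5}}$

so the sequence grows exponentially with rate $\varphi \approx 1.618$. In general, for a rational generating function the exponential growth rate is the reciprocal of the modulus of the pole nearest the origin, which is the kind of property a search could expose.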

Finally, there are many natural equivalence classes of sequences. It would be good to mark these somehow, probably by choosing a representative from each class. (No need for AC, since there are only finitely many sequences in the OEIS...)

Identifiers

People

OEIS contributors are now identified with their standard user name surrounded by underscores. This causes the name to be auto-linked to the user's page, and opens the door at some later point to various forms of automated processing.

It would be good to find user names not marked in this way and mark them, but this is not a high priority.

Perhaps something should be done to mark the names of other people (besides just the contributors). When searching this would make it easier to find people with names that can be spelled differently (Chebyshev), names with accented characters, names that are often abbreviated, names that change (married names? personal, religious, or cultural changes?), and so forth. It should also make it possible to disambiguate common names.
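A minimal sketch of how marked-up names could be resolved to one identifier, assuming a hand-maintained table of variants. The identifier strings and the table contents are hypothetical; only the accent folding uses a real library call (unicodedata from the Python standard library).

```python
import unicodedata
from typing import Optional


def fold(name: str) -> str:
    """Case-fold and strip combining accents, e.g. 'Erdős' -> 'erdos'."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Hypothetical table: folded spelling -> made-up person identifier.
PERSON_IDS = {
    "chebyshev": "chebyshev-p-l",
    "tchebycheff": "chebyshev-p-l",
    "tschebyscheff": "chebyshev-p-l",
    "erdos": "erdos-p",
}


def person_id(name: str) -> Optional[str]:
    """Return the identifier for a spelling, or None if it is not in the table."""
    return PERSON_IDS.get(fold(name))


print(person_id("Tchebycheff") == person_id("Chebyshev"))  # True
print(person_id("Erdős"))  # erdos-p
```

Folding handles accents and case automatically; transliteration variants, abbreviations, and name changes still need explicit table entries, and common names would need more than a surname to disambiguate.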

Programs

The first priority with programs is to distinguish the languages in the "other language" field. This way:

Searching is made easier (look, for example, at the number of variants used to describe Visual Basic or Scheme, or the difficulty of searching for Maxima programs)
There is potential to format programs with, e.g., GeSHi or SyntaxHighlighter.
The entry can be formatted differently, perhaps (e.g.) showing two rows with "Python" and "MAGMA" rather than one with "Program:" in the left column
By exposing this content, scripting the OEIS becomes easier.
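The points above can be sketched as follows, assuming program lines that begin with a parenthesized language label such as "(PARI)". The table of variants is hypothetical and would have to be collected from the labels actually in use.

```python
import re
from typing import Optional

# Hypothetical table: case-folded label as written -> canonical language name.
CANONICAL = {
    "visual basic": "Visual Basic",
    "vb": "Visual Basic",
    "vba": "Visual Basic",
    "scheme": "Scheme",
    "mit scheme": "Scheme",
    "mit/gnu scheme": "Scheme",
    "maxima": "Maxima",
}

# A leading "(Label)" followed by the program text.
LABEL = re.compile(r"^\((?P<label>[^)]+)\)\s*(?P<code>.*)$", re.DOTALL)


def split_program(line: str) -> tuple[Optional[str], str]:
    """Split a program line into (canonical language, code).

    Unknown labels are passed through unchanged; lines without a label
    return None for the language.
    """
    match = LABEL.match(line)
    if match is None:
        return None, line
    raw = match.group("label").strip()
    return CANONICAL.get(raw.casefold(), raw), match.group("code")


print(split_program("(VBA) Sub A()"))  # ('Visual Basic', 'Sub A()')
```

With the language exposed as a separate field, search, syntax highlighting, and per-language rows in the entry all become a matter of looking at that field rather than re-parsing free text.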

Another priority is to distinguish versions from each other. (See, for example, the issues with Maple versions between A006506 and A191779.) What runs in Mathematica 10 may not run in Mathematica 8, etc. This should support multiple versions and/or version ranges: Math'ca 6+ or Pari/GP 2.3.1–2.4.2. Ideally (but this seems more difficult) related languages could share implementations: Octave and Matlab or Excel and OOo Calc.
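One possible shape for version metadata, as a sketch only: a closed or open-ended range per language, assuming purely numeric dotted version strings. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.3.1' into (2, 3, 1). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class VersionRange:
    language: str
    minimum: tuple[int, ...]
    maximum: Optional[tuple[int, ...]] = None  # None means open-ended, like "6+"

    def supports(self, version: str) -> bool:
        """True if `version` lies in the range (both ends inclusive)."""
        v = parse_version(version)
        if v < self.minimum:
            return False
        return self.maximum is None or v <= self.maximum


gp = VersionRange("PARI/GP", parse_version("2.3.1"), parse_version("2.4.2"))
mma = VersionRange("Mathematica", parse_version("6"))

print(gp.supports("2.4.0"), gp.supports("2.5.0"))  # True False
print(mma.supports("10.0"), mma.supports("5.2"))   # True False
```

Tuples compare element by element, so (2, 4, 0) sorts correctly between (2, 3, 1) and (2, 4, 2). Sharing implementations between related languages would need something more than this, for example a list of ranges per program.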

A low priority would be to distinguish comments from program code so that they could be treated differently by, e.g., search.

Other

It could be useful to identify other things uniquely.

Languages: The ISO 639-3 codes, possibly together with IANA subtags per BCP 47, can be used in a tag's lang attribute. (It would be nice to be able to distinguish when sources are in Latin or French, for example.) This could be as simple as permitting the hreflang attribute in anchors in the links section.
Journals: A journal may have several abbreviations or even several names in addition to the full form of its current name. Consider (not a great example...) "Mathematics of Computation" vs. "Math. Comp." vs. "Mathematical Tables and Other Aids to Computation" vs. however that was abbreviated.
Authors: Perhaps ORCiD would be useful?
Books: Different printings, translations, etc. (via ISBN?); distinguish books with similar or identical names; WorldCat or other links; attach other relevant metadata concerning author, language, etc.

General metadata

Dates

The OEIS is in a relatively good position having standardized its date format as either mmm dd yyyy or mmm dd, yyyy. But dates are not easily searched: imagine trying to find sequences from the first half of 2010. That would require a complicated search: [1]. Worse still, try searching for a comment from July 2010. You can't just search for comments with both 2010 and July, because that would match a sequence with one comment from July 2009 and another from April 2010.
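The July 2010 problem goes away once dates are parsed per comment rather than matched as loose words across the whole entry. A minimal sketch, assuming comments are available as separate strings; the comment texts and contributor placeholders below are made up for the illustration.

```python
import re
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Matches both "mmm dd yyyy" and "mmm dd, yyyy".
OEIS_DATE = re.compile(r"\b(" + "|".join(MONTHS) + r") (\d{1,2}),? (\d{4})\b")


def dates_in(text: str) -> list[date]:
    """All OEIS-style dates found in one piece of text."""
    return [date(int(year), MONTHS.index(month) + 1, int(day))
            for month, day, year in OEIS_DATE.findall(text)]


def comments_from(comments: list[str], year: int, month: int) -> list[str]:
    """Comments containing at least one date in the given month and year."""
    return [c for c in comments
            if any(d.year == year and d.month == month for d in dates_in(c))]


# Made-up comments: one from July 2009, one from April 2010.
comments = [
    "First remark. - _Contributor A_, Jul 04 2009",
    "Second remark. - _Contributor B_, Apr 10, 2010",
]
print(comments_from(comments, 2010, 7))  # []
```

A plain text search for "July" and "2010" would match this entry; the per-comment check correctly finds nothing.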

It would be nice to serve dates conforming to the OEIS standard in HTML <time datetime="..."> elements.

<time datetime="2030-01-12">Jan 12, 2030</time>
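Markup like the line above could be produced mechanically from the standardized date format. A minimal sketch, assuming the surrounding text has already been HTML-escaped; the function name is hypothetical.

```python
import re

MONTH_NUMBER = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
                "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}

# Matches both "mmm dd yyyy" and "mmm dd, yyyy".
OEIS_DATE = re.compile(r"\b(" + "|".join(MONTH_NUMBER) + r") (\d{1,2}),? (\d{4})\b")


def wrap_dates(text: str) -> str:
    """Wrap each OEIS-style date in a <time> element with an ISO 8601 datetime.

    The visible text is left exactly as written. No validation is done,
    so an impossible date such as "Feb 30 2010" is wrapped as-is.
    """
    def replace(match: re.Match) -> str:
        month, day, year = match.group(1), int(match.group(2)), int(match.group(3))
        iso = f"{year:04d}-{MONTH_NUMBER[month]:02d}-{day:02d}"
        return f'<time datetime="{iso}">{match.group(0)}</time>'

    return OEIS_DATE.sub(replace, text)


print(wrap_dates("Jan 12, 2030"))
# <time datetime="2030-01-12">Jan 12, 2030</time>
```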

Templating

Certain things show up frequently in the database, like links to MathWorld. In many ways it would be nice to collect these together. For example, what if one changed names, say from "World of Mathematics" to "MathWorld"? For the wiki side there is {{MathWorld}}, but nothing for the sequence side at the moment. (Actually, I'm not even sure what the recommended format for such links is now....)

Some possibilities:

Abramowitz & Stegun
MathWorld
Wikipedia
the Internet Archive
the EIS and HIS

Semantics

POSH, especially rel attributes, would be good:

rel=alternate for links to text, internal format, or JSON (not currently linked from sequence entries, but could be)
rel=help for links to welcome and format (https://oeis.org/wiki/Welcome and https://oeis.org/eishelp2.html)
rel=license for link to https://oeis.org/wiki/Legal_Documents
rel=nofollow could be used for user-submitted external links, either those on a blacklist or those not on a whitelist
rel=stylesheet is probably already used
rel=tag could be used for links to the index and possibly keywords
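For the rel=nofollow item, a whitelist check might look like the following sketch. The list of trusted hosts and the function name are hypothetical; the library calls (urlparse, html.escape) are from the Python standard library.

```python
from html import escape
from urllib.parse import urlparse

# Hypothetical whitelist; the real list would be a policy decision.
TRUSTED_HOSTS = {"oeis.org", "arxiv.org"}


def external_link(url: str, text: str) -> str:
    """Render an anchor, adding rel="nofollow" unless the host is whitelisted."""
    host = (urlparse(url).hostname or "").lower()
    trusted = host in TRUSTED_HOSTS or any(
        host.endswith("." + allowed) for allowed in TRUSTED_HOSTS
    )
    rel = "" if trusted else ' rel="nofollow"'
    return f'<a href="{escape(url, quote=True)}"{rel}>{escape(text)}</a>'


print(external_link("https://oeis.org/wiki/Welcome", "Welcome"))
# <a href="https://oeis.org/wiki/Welcome">Welcome</a>
print(external_link("http://example.com/a?b=1&c=2", "Example"))
# <a href="http://example.com/a?b=1&amp;c=2" rel="nofollow">Example</a>
```

A blacklist variant would invert the test: add the attribute only when the host appears in the list.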

Microformats like citations, hCalendar, hCard would be great, though probably not a high priority. We may already meet WCAG 2.0 (there's also a WCAG 3 draft), but it may be worth checking. (Any accessibility experts want to chime in?)

Subject

It would be nice to have subject identifiers for sequences. This would allow filtering for sequences related to chemistry, quantum physics, number theory, zeta functions, and so on. In general this would require constructing an appropriate ontology, but using the MSC plus ad-hoc additions for subjects outside of mathematics would probably suffice. A simpler alternative would be to use the arXiv classification.

This could be built on the wiki side and simply linked; the category structure seems singularly appropriate, though we should probably impose a DAG requirement on the structure so that descendants and ancestors could be searched without creating loops.
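The DAG requirement is cheap to check. A minimal sketch, assuming the category structure is available as a mapping from each category to its direct subcategories; the category names are made up for the illustration.

```python
def is_dag(children: dict[str, list[str]]) -> bool:
    """True if the category graph has no cycles (depth-first search, three states)."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state: dict[str, int] = {}

    def visit(node: str) -> bool:
        state[node] = IN_PROGRESS
        for child in children.get(node, []):
            child_state = state.get(child, UNSEEN)
            if child_state == IN_PROGRESS:
                return False  # back edge: child is its own ancestor
            if child_state == UNSEEN and not visit(child):
                return False
        state[node] = DONE
        return True

    return all(state.get(n, UNSEEN) != UNSEEN or visit(n) for n in children)


def descendants(children: dict[str, list[str]], root: str) -> set[str]:
    """All categories reachable from `root`, excluding `root` itself."""
    seen: set[str] = set()
    stack = list(children.get(root, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, []))
    return seen


# Made-up category structure.
categories = {
    "Number theory": ["Primes", "Zeta functions"],
    "Primes": ["Quadratic form primes"],
}
print(is_dag(categories))  # True
print(sorted(descendants(categories, "Number theory")))
# ['Primes', 'Quadratic form primes', 'Zeta functions']
```

Running the check whenever a category's parents change would keep ancestor and descendant searches loop-free. A category may still have several parents, which a strict tree would not allow.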

