--- Larry Sanger <lsanger(a)nupedia.com> wrote:
First, let me say that, despite all appearances,
I'm not trying to criticize Magnus (surely no one
should be criticized for doing work for
Wikipedia ;-) ). Or Simon! I just think we need to
think harder about all of these issues.
Maybe Simon can answer the question Magnus couldn't
(or, not very well as far as I could tell): what is
the *purpose* of this feature? What is it supposed
to accomplish?
Well, as I see it, the purpose is to make the
administration of lists of pages easier -- these lists
could be categories (such as Philosophy, or
Mathematics, or so on) -- but I am also thinking of other lists
such as Biographies, Mathematics, U.S. Presidents,
International Intergovernmental Organizations, and so
on.
Consider for example if I was to put the article on
Bill Clinton in the Biography listing, and the U.S.
Presidents listing. At the moment I have to edit two
or three pages -- the Biography and the U.S.
Presidents listing to add a link to the article, and
the article to add a backlink to the lists (if so
desired). With Magnus' proposal, as I understand it,
I'd just have to insert "{{{CATEGORY Biography,
US_Presidents}}}" into the article and I'd be done.
Also, with these categories, it's easier to
automatically extract the articles in Wikipedia on
that category, since software can parse "{{{CATEGORY
...}}}" more easily than links on a category page
(since it can't tell which links point to pages within
the category, and which links point elsewhere --
unless it has AI, of course).
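A minimal sketch of the parsing step described above, assuming the tag syntax quoted in this thread ("{{{CATEGORY Name1, Name2}}}") and assuming names are separated by commas. The function name extract_categories is hypothetical, not part of Magnus' script.

```python
import re

# Assumed syntax, taken from the example in this thread:
# {{{CATEGORY Biography, US_Presidents}}}
CATEGORY_TAG = re.compile(r"\{\{\{CATEGORY\s+(.*?)\}\}\}", re.DOTALL)

def extract_categories(article_text):
    """Return the category names from every CATEGORY tag, in order, without duplicates."""
    found = []
    for match in CATEGORY_TAG.finditer(article_text):
        for name in match.group(1).split(","):
            name = " ".join(name.split())  # collapse whitespace left by line wrapping
            if name and name not in found:
                found.append(name)
    return found

text = "Some article text.\n{{{CATEGORY Biography, US_Presidents}}}"
print(extract_categories(text))  # ['Biography', 'US_Presidents']
```

Ordinary links in the article body are ignored, so the software never has to guess which links are category assignments.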
And supposing we want to divide articles up by
traditional academic discipline, using this lets us
easily see which articles have not yet been assigned.
(We'd simply have to do a query to see which articles
are not in one of the categories which make up our
category scheme.)
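The "not yet assigned" query could be sketched roughly as follows, assuming the categories of each article are already available as a set (for instance from the tags in its text). The names unassigned and article_categories, and the sample data, are made up for illustration.

```python
def unassigned(article_categories, scheme):
    """Return the titles whose categories include none of the names in `scheme`."""
    scheme = set(scheme)
    return sorted(
        title
        for title, cats in article_categories.items()
        if scheme.isdisjoint(cats)
    )

# Made-up sample data.
article_categories = {
    "Bill Clinton": {"Biography", "US_Presidents"},
    "Topology": {"Mathematics"},
    "Banana": set(),
}
scheme = {"Mathematics", "Philosophy", "History"}

print(unassigned(article_categories, scheme))  # ['Banana', 'Bill Clinton']
```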
[snip]
OK, this is helpful: categories could be used to
sort articles into broad compilations (associated
with academic fields like linguistics) that, for
whatever reason (and who can predict what reasons
they would be?), people might like to have.
I can see a use for that, but I think it needs to be
clarified further: is the claim that it would be
useful to have articles sorted only at the
top-level categories (those on the HomePage, just
for example) or at some expanded level? If at some
expanded level, depending on *how* expanded,
the purpose(s) of the feature might change
considerably.
Well, I see two uses for this. Firstly, putting
articles into broad categorisations, such as those
used on the home page. Secondly, maintaining lists
like U.S. Presidents, and so on.
I'm not proposing we try to design a complex
hierarchical classification scheme. For starters, we
can put all the Maths pages into a "Mathematics"
category. But, if someone wants to go to the trouble
of creating additional categories,
"Mathematics--Analysis" and "Mathematics--Topology"
and "Mathematics--Geometry" and so on, why not let
them? Let's create a basic category scheme to start
with, and let it grow finer over time as (and if)
people see the need.
We should also allow people to create low-level
categories without fitting them into a hierarchy -- I
should be able to go ahead and create the category
U.S. Presidents, without having to decide whether it
belongs under History or Politics or what, or where it
should belong in some finer subclassification of those
topics. Later on, if we feel the need, and once the
category system is sufficiently evolved, we can worry
about how to fit these standalone low-level categories
into broader ones.
Moreover, it is *not* at all clear to me
(particularly given, as Simon says, we might want to
support multiple category schemes--though this too
I might doubt, as I'll have to explain) that this
particular implementation is the best. What *would*
be the best way to sort articles into broad
categories? Why not, for example, have special
pages that simply list article titles, so that,
e.g.,
[[linguistics category]] would contain nothing but
links to articles that someone asserts belong to the
category of linguistics? We could just as easily
autogenerate lists of uncategorized pages, and for
each article page we could have the script look at
the category: pages to see whether the article is in
a category.
I'm not saying that this is a better idea; I'm just
saying that there are multiple ways of going about
doing something, and it would be a mistake not to
consider them.
I think the approach you suggest would be inferior to
Magnus' for a number of reasons. Firstly, when looking
at an article, you would not be able to see which
categories it belonged to, unless someone added a
backlink to that category. Secondly, if I want to add
an article to three different categories, I have three
different pages I have to edit -- four if I want to
include a backlink in the article. Thirdly, by using
the "{{{CATEGORY ...}}}" notation, we can store the
categories directly in the database, if we want, as a
CATEGORY table -- which will mean faster access.
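As a rough illustration of the CATEGORY table idea, here is a sketch using Python's sqlite3 module with an in-memory database. The schema (one row per article/category pair) and the helper names are assumptions, not a description of the actual Wikipedia database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: one row per (article, category) pair.
conn.execute(
    "CREATE TABLE category ("
    " article TEXT NOT NULL,"
    " name TEXT NOT NULL,"
    " PRIMARY KEY (article, name))"
)

def set_categories(conn, article, names):
    """Replace the stored categories of one article, e.g. each time it is saved."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM category WHERE article = ?", (article,))
        conn.executemany(
            "INSERT INTO category (article, name) VALUES (?, ?)",
            [(article, n) for n in sorted(set(names))],
        )

def articles_in(conn, name):
    """All articles assigned to one category (the autogenerated list page)."""
    rows = conn.execute(
        "SELECT article FROM category WHERE name = ? ORDER BY article", (name,)
    )
    return [r[0] for r in rows]

def categories_of(conn, article):
    """All categories of one article (the backlinks shown on the article)."""
    rows = conn.execute(
        "SELECT name FROM category WHERE article = ? ORDER BY name", (article,)
    )
    return [r[0] for r in rows]

set_categories(conn, "Bill Clinton", ["Biography", "US_Presidents"])
set_categories(conn, "George Washington", ["US_Presidents"])
print(articles_in(conn, "US_Presidents"))   # ['Bill Clinton', 'George Washington']
print(categories_of(conn, "Bill Clinton"))  # ['Biography', 'US_Presidents']
```

One edit to the article updates both directions at once: the list page and the backlinks come from the same rows.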
[snip]
> Larry suggested having a drop-down box, to limit
> the categories the users could choose from. I
> think Wikipedia should support multiple category
> schemes, and should allow anyone to add their own
> categories.
Well, we could have multiple drop-down boxes, eh?
How else would we individuate our multiple category
schemes? The whole point of a drop-down box is to
disallow a category scheme from metastasizing for
frivolous reasons (such as that somebody didn't know
that an article that goes under XYZ really belongs
under XYZ, and so creates its own category--just an
example).
I agree that we need to find a way to stop frivolous
or accidental creation of categories. But at the same
time, I think we should create tools that can be used
for a wide variety of purposes, rather than putting
things in a straightjacket.
The main point behind having multiple category schemes
is that I can ask the software the question "Does this
page belong to any category in this scheme?" Suppose
we decide we want to place every Wikipedia article in
certain broad categories (Mathematics, Physics,
Chemistry, Biology, Philosophy, History, etc.), then
we want a list of all articles which do not currently
belong to one of those categories. Suppose "Bill
Clinton" belongs to the category "U.S. Presidents" --
that article should still come up in our list, since
although it belongs to a category, it does not belong
to the broad category scheme. So at the least, we'd
want a way of distinguishing subject categories (such
as History, Philosophy, etc.) from other categories
(like U.S. Presidents, Treaties, Roman Emperors,
Popes, etc.).
Secondly, suppose someone wants to create a
subclassification of a pre-existing category. Say
create categories "History--Ancient",
"History--Medieval" and "History--Modern", or
"History--Europe", "History--North America",
"History--Asia" and so on. Then they'd want to ask
"show me all the articles in category History which
haven't been assigned to categories History--Ancient,
History--Modern or History--Europe". This would
involve some way of marking categories as
subcategories of a larger category. However, these
aren't just simple subcategories within the same
category scheme -- these are two orthogonal
categorisations. We want to be able to generate
separate "not yet assigned" lists for each.
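One way to picture the two orthogonal subdivisions is to group the subcategories into named facets and report on each facet separately. The sketch below assumes that grouping is stored somewhere; the facet names "period" and "region", the function name, and the sample articles are all made up for illustration.

```python
# Assumed grouping: each facet is one independent way of subdividing "History".
FACETS = {
    "period": {"History--Ancient", "History--Medieval", "History--Modern"},
    "region": {"History--Europe", "History--North America", "History--Asia"},
}

def not_yet_assigned(article_categories, parent, facets):
    """For each facet, list the articles in `parent` that have none of its subcategories."""
    report = {}
    for facet, subcategories in facets.items():
        report[facet] = sorted(
            title
            for title, cats in article_categories.items()
            if parent in cats and subcategories.isdisjoint(cats)
        )
    return report

# Made-up sample data.
articles = {
    "Roman Empire": {"History", "History--Ancient", "History--Europe"},
    "Magna Carta": {"History", "History--Medieval"},
    "Silk Road": {"History", "History--Asia"},
    "Topology": {"Mathematics"},
}

print(not_yet_assigned(articles, "History", FACETS))
# {'period': ['Silk Road'], 'region': ['Magna Carta']}
```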
To stop people accidentally or frivolously creating
categories, we could make it so that people have to
execute some special procedure (e.g. a create_category
action on the script) to create a category (so they
don't accidentally create one automatically by
misspelling it, say). That procedure should show them
what categories already exist, warn them against
creating them needlessly, explain Wikipedia policy on
categories, and so on.
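A sketch of the explicit creation step, under the assumption that assignment to an unknown category is simply refused. The class and method names here (CategoryRegistry, create_category, assign) are hypothetical; only the idea of a separate create_category action comes from the proposal above.

```python
class UnknownCategory(Exception):
    """Raised when an article is assigned to a category nobody has created."""

class CategoryRegistry:
    def __init__(self):
        self._names = set()

    def create_category(self, name):
        """Explicit creation; returns the existing names so a UI can show them first."""
        existing = sorted(self._names)
        self._names.add(name)
        return existing

    def assign(self, article_categories, article, name):
        """Assign an article to a category that already exists."""
        if name not in self._names:
            raise UnknownCategory(
                f"{name!r} does not exist; existing categories: {sorted(self._names)}"
            )
        article_categories.setdefault(article, set()).add(name)

registry = CategoryRegistry()
registry.create_category("US_Presidents")

article_categories = {}
registry.assign(article_categories, "Bill Clinton", "US_Presidents")
try:
    registry.assign(article_categories, "Bill Clinton", "US_Presidnets")  # typo
except UnknownCategory as err:
    print("rejected:", err)
```

A misspelled name is rejected instead of silently becoming a new category.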
We should also allow categories to be deleted if
they are created in error. Only admins should do that
after consultation -- but we could permit anyone to
delete them, if we had an "undelete" feature (i.e. the
category disappears from the list of categories, but
its data is retained, so it can be undeleted.)
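The "undelete" idea amounts to hiding a category rather than destroying it. A minimal sketch, with a hypothetical in-memory Categories class standing in for whatever storage the script would really use:

```python
class Categories:
    def __init__(self):
        self._members = {}     # category name -> set of article titles
        self._deleted = set()  # names hidden from the listing; data is kept

    def add(self, name, article):
        self._members.setdefault(name, set()).add(article)

    def delete(self, name):
        """Hide the category from the list without discarding its members."""
        if name in self._members:
            self._deleted.add(name)

    def undelete(self, name):
        self._deleted.discard(name)

    def listing(self):
        return sorted(n for n in self._members if n not in self._deleted)

    def members(self, name):
        return sorted(self._members.get(name, ()))

cats = Categories()
cats.add("US_Presidents", "Bill Clinton")
cats.add("Treaties", "Treaty of Versailles")

cats.delete("Treaties")
print(cats.listing())            # ['US_Presidents']
cats.undelete("Treaties")
print(cats.listing())            # ['Treaties', 'US_Presidents']
print(cats.members("Treaties"))  # ['Treaty of Versailles']
```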
As to drop down boxes, they have their advantages and
disadvantages. The advantage is that they are easier
to use than {{{CATEGORY ...}}}. The disadvantage is
that if we ended up with a lot of categories, they'd
become unwieldy. They should be multiple selection
combo-boxes, not drop-down boxes, if we are going to
allow the one article to belong to multiple
categories. Finally, the problem with combo boxes is
that it's easy to accidentally add or remove an article
from a category -- one misplaced click is all it
takes. At least with "{{{CATEGORY ...}}}", they have
to type or delete something.
But whatever user interface we choose, we can still
provide the same backend implementation. That way we
can experiment, and see what works best.
How exactly would we experiment? What would we be
seeing "works best"?
Well, as I said, create little categories for things
like "U.S. Presidents", "Kings of France", "Bible"...
and if someone wants to subcategorize a broader
category (i.e. create "History--Asia" and
"History--United States"), let them. Let the system
evolve (just like how we let Wikipedia articles
evolve). Remove categories that are unnecessary or
stupid. Every now and then, look back over what is
there, and try to move things into a more coherent
system.
I think at this point we need to be very clear on
what we mean by "category scheme." On the one hand,
there are schemes of the sort we have on the Home
Page, or the Library of Congress catalog scheme.
Those schemes (1) list subjects, (2) arrange those
subjects under large headings ("supercategories"),
and even (3) provide an ordering of some
sort within the "supercategories." So when you say
we can experiment and see what works best, which of
these (or what combination) do you mean?
Well, now that I've thought about it more, what I'm really
talking about is groups of categories which fit
together -- e.g. all different subdivisions of the one
broader subject along one aspect. (Kind of like a
faceted library classification, à la the Colon
Classification of S. R. Ranganathan.) So we'd really
only have one category scheme, it would just be, in
part, hierarchical and faceted.
We already do experiment with multiple category
schemes in the sense of a combination of (1) and
(2). But this doesn't list all articles, of
course.
But when Magnus proposes to allow us to list the
category, or categories, of a particular article on
an article's page itself (even within the body
of the article itself), he provides us no particular
category scheme in *any* of the senses in (1), (2),
or (3) (which is fine). What Simon asserts now is
that what Magnus' feature allows is "multiple category
schemes."
Well, it doesn't at the moment allow that, but it
could be extended to do so, which is I suppose what I
am proposing.
[snip]
People with alternative views of how to categorise
things can create their own category schemes (and
categorising things is one area where there are
often as many views as there are people, probably
because there is no one right answer.)
Wouldn't that make categorization particularly
pointless? No one person is going to categorize all
our articles, I imagine--no one person is
competent to do so, probably. That means we have to
work together on this. Now, I can see multiple
competing category schemes (maybe--but I'd
like to know what the purpose of *that* would be).
Say, two or three.
More than that, and, again, we've got a veritable
babel; in that case, I doubt any one scheme would
succeed in categorizing all the articles. Even
two or three is a little confusing: won't
"philosophy" be a category in any plausible scheme?
Similarly with other traditional subjects. So how
will the competing category lists (not schemes,
really) be distinguished?
Now I think about it, I agree with you, so I withdraw
that aspect of what I'm proposing.
On the other hand, if we can agree in advance on one
set of categories, then, *for the purpose of sorting
articles into broad academic fields*
(which, as I said, seems like a clear, reasonably
useful purpose), we can *work together* on sorting
all the articles. That would be a good thing:
it could be a useful, accurate piece of metadata.
I think that would be useful also. But I also think we
should permit the creation of many finer categories,
such as U.S. Presidents, or Kings of England, or
Treaties, or Thailand, or 12th century... and also
subdivisions of subjects, so we can have
"Philosophy--Philosophy of Religion" and
"Mathematics--Analysis" and "Law--criminal law"... I'm
not suggesting we design a whole detailed category
scheme from the top-down, but rather let one grow from
the bottom up...
> I think it would be nice if we could have
> different "category namespaces", to support
> multiple category schemes. There should also be a
> way to lock category namespaces: so I can have my
> own category namespace, and only I am allowed to
> assign pages to categories within it; or so (like
> Larry seems to be suggesting) people can't create
> their own categories, but they can assign pages to
> pre-existing ones.
I'm not sure I understand, exactly, but is the idea
here somewhat like the one I suggested above? Viz.,
we put the metainformation about categories
not on article pages but on special categorization
pages?
It was a badly thought out suggestion, so I withdraw
it.
[snip]
I really don't like to sound contrary (really, I
don't!), but I think that whenever we propose new
features that could potentially complicate the
process of building Wikipedia, and that could be
abused or misused (resulting in confusion if nothing
else), we should think more carefully about what we
are doing and why we're doing it, exactly.
My attitude is different -- build versatile tools,
that can be used for many things, and then see what
useful things people can do with them. I agree though
we should be careful to avoid abuse or misuse, or
having some wizzy new software feature get in the way
of the primary purpose of Wikipedia, which is writing
articles (categorising them is only secondary).
[snip]
(1) It would help sort the Recent Changes page
nicely, so that specialists can, if they want, focus
just on articles in their areas. (Others could
view all categories at once.) If this were all we
needed to accomplish, then we might as well sort
*edits* into categories, not *articles*.
(2) It would allow us to produce a list of all the
articles in one broad area of study, which would no
doubt be useful for a variety of purposes. For this,
of course, we need to sort articles, not
edits.
Let me add:
(3) We need a heap of categories to make it easier to
maintain and manipulate lists of Presidents,
Philosophers, Mathematicians, Countries, International
Organizations, and so on. (One category per list.)
(4) We need categories to group articles on some broad
topic, such as all the articles on the Bible, or
articles on Hinduism, or all articles on the U.S.
Government, or so on. When dealing with a broad topic
like these, it would be nice to see a list of all the
articles on the topic, to help improve the coherency
between the different articles on the topic. However,
although these are broad topics, they are a lot
narrower than the disciplines you suggest.
(5) We need categories to help progressively develop a
more structured category scheme for Wikipedia. (The
bigger we get, the more essential organizing is going
to be, or else everything will just turn into a
mess...)
Now, there are a variety of ways we could accomplish
both of these purposes. The best I think we've
heard so far would work like this:
On each article page, there is a multiple-selection
box that allows us to place articles into one or
more categories from among a set of categories
that is previously decided upon by Wikipedia members
and probably Nupedia as well (it would be nice if
the categories corresponded to Nupedia review
groups). This particular datum is editable like
anything else in the article. There is *also* a set
of pages sorting existing articles into
these categories based on the metadata found on the
articles; these might or might not be editable.
There is, as well, a page or several of
unsorted articles; from that page one could visit
different pages and sort them quickly.
The latter proposal would accomplish purposes (1)
and (2) as follows: the metadata would allow us to
sort the Recent Changes page so we can view
only those categories of articles we're interested
in; it would also allow us to generate (and further
organize, perhaps) broad categories of
articles. One can see an autogenerated Wikipedia
Encyclopedia of Mathematics, for example!
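Purpose (1) could be sketched as a simple filter over the Recent Changes entries, assuming each entry records the article title and that the categories of each article can be looked up. The record layout and the function name are assumptions made for illustration.

```python
def filter_recent_changes(changes, article_categories, wanted):
    """Keep only the changes whose article is in at least one wanted category."""
    wanted = set(wanted)
    return [
        change
        for change in changes
        if wanted & article_categories.get(change["title"], set())
    ]

# Made-up sample data.
article_categories = {
    "Topology": {"Mathematics"},
    "Bill Clinton": {"Biography", "US_Presidents"},
}
changes = [
    {"title": "Topology", "summary": "added definition"},
    {"title": "Bill Clinton", "summary": "fixed date"},
    {"title": "Banana", "summary": "new article"},
]

for change in filter_recent_changes(changes, article_categories, {"Mathematics"}):
    print(change["title"], "-", change["summary"])  # Topology - added definition
```

Leaving the filter out gives the existing view of all categories at once.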
We could have both your proposal and
mine. We have a
fixed set of broad categories for your purposes (1)
and (2). And we have an expandable list of independent
categories or subcategories for my purposes (3)-(5).
Larry
Simon.