[Zthes] Discussion of Extensions to ZThes Elements

Mike Taylor mike at indexdata.com
Fri Jun 18 12:20:42 CEST 2004

> Date: Thu, 17 Jun 2004 13:05:44 -0600
> From: "Dave Clarke" <dclarke at synaptica.com>
> Following is a summary of some extensions to the Zthes DTD that
> my company have identified as desirable. At Mike Taylor's suggestion
> I am circulating these to the ZThes group for comment:

Hello Dave, and welcome to the Zthes list.

A bit of background for everyone else: Dave's company make a thesaurus
authoring tool, which has an option to export its data as XML.  The
XML format they chose was Zthes-with-extensions, which we now want to
look at bringing into the standard Zthes DTD.  Bill Moen, who most of
you know, is the common link who introduced Dave and me.

> <termVocabulary>
>     Name of thesaurus - our users usually have multiple thesauri and
>     authority files with inter-vocabulary mappings, so every term
>     needs to identify the vocabulary that it belongs to.

I _was_ going to say that an alternative approach would be just to
represent the multiple vocabularies as separate Zthes thesauri, but the
need for inter-vocabulary mappings scotches that approach.

But when I think about inter-vocabulary mappings, I think about
linguistic equivalents (the LE relation-type) which induce mappings
between terms in different languages -- that is, with different
termLanguage fields.  Would it be reasonable to characterise the
termVocabulary requirement as a generalisation of this?

> <termStatus>
>     Active/deleted status - our users store and report on terms in
>     logically deleted/deactivated states as well as active
>     terms. Possible states are "Active", "Deleted", "Deactivated".

Seems reasonable enough to me, although I would like to better
understand the difference between Deleted and Deactivated.

However --

> <termApproval>
>     Candidate/Approved status - often a thesaurus contains a mix of
>     work-in-progress terms (candidates) and publishable (approved)
>     terms. Our system supports 5 approval states but we think 2
>     (Candidate & Approved) would be a sufficient minimum.

-- isn't there some semantic overlap between this and termStatus?
Other things being equal, I think we might prefer to introduce one new
field rather than two.  Or do you actually have two orthogonal axes
here, so that both Candidate and Approved terms can have any of the
Active, Deleted and Deactivated statuses?
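To spell out the two alternatives: if the axes really are orthogonal, each term carries one value from each field, and all 2 x 3 = 6 pairings are legal. If they are not, a single merged field listing only the states that actually occur would do. A minimal Python sketch of the orthogonal case follows. The class names `TermStatus` and `TermApproval` are hypothetical mirrors of the proposed elements; only the value strings come from Dave's message.

```python
from enum import Enum
from itertools import product


class TermStatus(Enum):
    """Hypothetical mirror of the proposed <termStatus> values."""
    ACTIVE = "Active"
    DELETED = "Deleted"
    DEACTIVATED = "Deactivated"


class TermApproval(Enum):
    """Hypothetical mirror of the proposed minimal <termApproval> values."""
    CANDIDATE = "Candidate"
    APPROVED = "Approved"


# If the two fields are independent, every (approval, status) pairing
# is a valid state for a term: 2 x 3 = 6 combinations in all.
combinations = list(product(TermApproval, TermStatus))

for approval, status in combinations:
    print(approval.value, status.value)
```

If only some of these six pairings ever make sense in practice, that is an argument for one combined field rather than two.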

> <termSortkey>
>     Sortkey for term - Synaptica creates a sortkey for every term
>     where numbers are parsed to support natural numeric ordering
>     in an alphabetical ORDER BY clause, and to remove certain special
>     characters and leading articles etc.

Can't this be done algorithmically?  Can you give examples of
situations in which a human-authored sort-key field is necessary?
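To make the question concrete, here is a rough Python sketch of the kind of algorithm I have in mind, covering the three things Dave lists: numbers sorted by value, special characters removed, and leading articles dropped. The function name `make_sortkey`, the English-only article list and the ten-digit padding width are my own assumptions, not anything Synaptica or the Zthes DTD defines.

```python
import re

# Assumed, English-only list of leading articles to ignore when sorting.
LEADING_ARTICLES = ("the ", "a ", "an ")


def make_sortkey(term: str, pad: int = 10) -> str:
    """Derive a sort key from a term name (illustrative sketch only)."""
    # Case-fold so that "Item" and "item" sort together.
    key = term.casefold().strip()
    # Drop at most one leading article.
    for article in LEADING_ARTICLES:
        if key.startswith(article):
            key = key[len(article):]
            break
    # Remove punctuation: anything that is neither a word character nor whitespace.
    key = re.sub(r"[^\w\s]", "", key)
    # Collapse runs of whitespace left behind.
    key = re.sub(r"\s+", " ", key).strip()
    # Zero-pad every run of digits so that 2 sorts before 10.
    key = re.sub(r"\d+", lambda m: m.group().zfill(pad), key)
    return key


terms = ["Item 10", "Item 2", "The Beatles", "Abbey Road", "item 1"]
print(sorted(terms, key=make_sortkey))
```

A rule like this is inherently language-specific, so cases it cannot capture (for example, terms that should file under an expanded abbreviation) are exactly the kind of example that would justify a stored, human-authored key.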

I'll reply to the sub-element change requests in a separate message,
so this one doesn't get intimidatingly long.  Experience suggests that
messages longer than somewhat just don't get replied to :-)

 _/|_	 _______________________________________________________________
/o ) \/  Mike Taylor  <mike at indexdata.com>  http://www.miketaylor.org.uk
)_v__/\  "By the time you discover that you need a two-inch-thick
	 book to figure out how to format chapter headings, your
	 check has already been deposited in the Bank of Redmond" --
	 Jakob Nielsen, "Designing Web Usability"