[Zthes] Discussion of Extensions to ZThes Elements

Mike Taylor mike at indexdata.com
Thu Jun 24 11:55:36 CEST 2004

> Date: Fri, 18 Jun 2004 06:33:21 -0600
> From: "Dave Clarke" <dclarke at synaptica.com>
> Thanks so much for your response and good questions. I am adding my
> clarifications inline below...

Hello again, Dave.  I left it a few days before replying in the hope
that someone else would jump in, but it looks like it's just thee and
me for the moment.  (Hello, the rest of you?)

> > > <termVocabulary>
> > >     Name of thesaurus - our users usually have multiple thesauri
> > >     and authority files with inter-vocabulary mappings, so every
> > >     term needs to identify the vocabulary that it belongs to.
> > 
> > When I think about inter-vocabulary mappings, I think about
> > linguistic equivalents (the LE relation-type) which induce
> > mappings between terms in different languages -- that is, with
> > different termLanguage fields.  Would it be reasonable to
> > characterise the termVocabulary requirement as a generalisation of
> > this?
> Most of the inter-vocabulary mappings we encounter are not of the
> type language equivalency.

No indeed -- that's why I suggested characterising your termVocabulary
requirement as a _generalisation_ of linguistic equivalents, rather
than arguing that LEs as they stand in the current Zthes profile meet
your requirement.

> Example 1 - a thesaurus of product and service terms that is then
> cross-referenced into a taxonomy of SIC Codes and NAICS Codes. Both
> the latter vocabularies are discrete thesauri (well not strictly
> thesauri but taxonomies are structurally a subset of thesauri and
> our users manage a lot of taxonomies in their thesaurus management
> application);

I am not sure what is meant by "taxonomy" here.  (Different groups
seem to use "thesaurus", "taxonomy", "authority", "ontology" and other
terms in overlapping and contradictory ways.)

> They will have intra-vocabulary relationships to handle the
> hierarchy and then inter-vocabulary relationships to map together
> equivalencies between the thesaurus and SICs and NAICs.

Good.  Now, again, I think this is _technically_ equivalent to what we
support for linguistic equivalence, where we might have a portion of
thesaurus as follows: term "animal", with narrower terms "dog" and
"cat" (all three of them in English).  And alongside that, we have the
term "animal" (in French) with narrower terms "chien" and "chat".
Then there are LE links between the equivalent terms.  Isn't your
termVocabulary requirement conceptually similar to this?

My point is not to say that you should shoehorn your broad concept of
vocabulary into the existing narrower concept of language, but to
consider discarding the existing language _in favour of_
termVocabulary, and using that to express what's currently done with
the explicit termLanguage tag.

On reflection, though, I don't think that's a good idea.  I can
imagine a situation, and you can probably give examples, where the
same vocabulary is translated into different languages, so that terms
in that vocabulary would need to carry both termLanguage _and_
termVocabulary elements.
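
As a rough illustration of that last point, here is a minimal Python
sketch in which every term carries both fields.  The Term class, the
vocabulary name "Animals" and the term IDs are hypothetical, made up
for this example; only termLanguage and the proposed termVocabulary
come from the discussion above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    term_id: str
    name: str
    language: str    # plays the role of termLanguage
    vocabulary: str  # plays the role of the proposed termVocabulary


# Hypothetical data: one vocabulary, translated into two languages.
terms = [
    Term("1", "animal", "en", "Animals"),
    Term("2", "dog", "en", "Animals"),
    Term("3", "animal", "fr", "Animals"),
    Term("4", "chien", "fr", "Animals"),
]


def index_terms(terms):
    """Group term names by (vocabulary, language)."""
    index = {}
    for term in terms:
        index.setdefault((term.vocabulary, term.language), []).append(term.name)
    return index


index = index_terms(terms)
```

Neither field alone is enough here: "animal" appears twice in the same
vocabulary, and only the language tells the two records apart.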

> > > <termStatus>
> > >     Active/deleted status - our users store and report on terms in
> > >     logically deleted/deactivated states as well as active
> > >     terms. Possible states are "Active", "Deleted", "Deactivated".
> > 
> > Seems reasonable enough to me, although I would like to better
> > understand the difference between Deleted and Deactivated.
> A deactivated term is "suspended" such that a duplicate of that term
> may not be added to the active thesaurus - if an attempt is made to
> do so the only choice the user has would be to reinstate the
> deactivated term. A deleted term is a full logical delete. The term
> can still be retrieved by specifically searching for it in the
> thesaurus "trash can", and reinstated if desired [...]

OK, this makes sense.  Thanks for the clarification.

> [...] but unlike deactivated terms the user can add a duplicate of a
> deleted term as a new active term.

(Why would anyone want to do that?)

> One thing to be aware of here is that our system supports unique and
> persistent numeric identifiers for all terms so there is a functional
> difference between reinstating a term (with the same UID as it once
> had) and creating a replacement that may be identical in everything
> except the UID.

Yup -- and that is, of course, termId in the Zthes model.
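
To check my reading of the three states, here is a small Python sketch
of the behaviour as Dave describes it.  The Thesaurus class and its
method names are hypothetical, not anything from Synaptica or the
Zthes profile, and the sketch assumes that only Active and Deactivated
terms block a new term of the same name.

```python
from itertools import count


class Thesaurus:
    """Toy model of Active / Deactivated / Deleted term states."""

    def __init__(self):
        self._ids = count(1)  # persistent numeric identifiers (termId)
        self.terms = {}       # term_id -> {"name": ..., "status": ...}

    def add(self, name):
        # A duplicate is refused if an active or deactivated term has the
        # same name; a deleted term does not block it.
        for term in self.terms.values():
            if term["name"] == name and term["status"] != "Deleted":
                raise ValueError(f"{name!r} exists with status {term['status']}")
        term_id = next(self._ids)
        self.terms[term_id] = {"name": name, "status": "Active"}
        return term_id

    def deactivate(self, term_id):
        self.terms[term_id]["status"] = "Deactivated"

    def delete(self, term_id):
        # Logical delete: the record stays retrievable (the "trash can").
        self.terms[term_id]["status"] = "Deleted"

    def reinstate(self, term_id):
        # The term comes back under its original identifier.  What should
        # happen if a duplicate was added meanwhile is not covered here.
        self.terms[term_id]["status"] = "Active"
        return term_id
```

Reinstating keeps the original identifier, whereas adding a duplicate
of a deleted term yields a new one, which is the functional difference
Dave points out.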

> > > <termApproval>
> > >     Candidate/Approved status - often a thesaurus contains a mix
> > >     of work-in-progress terms (candidates) and publishable
> > >     (approved) terms. Our system supports 5 approval states but
> > >     we think 2 (Candidate & Approved) would be a sufficient
> > >     minimum.
> > 
> > -- isn't there some semantic overlap between this and termStatus?
> > Other things being equal, I think we might prefer to introduce one
> > new field rather than two.  Or do you actually have two orthogonal
> > axes here, so that both Candidate and Approved terms can have any
> > of the Active, Deleted and Deactivated statuses?
> Agreed there is some semantic overlap between active status
> and candidacy status, but as far as most of our users are concerned
> the subtle difference between these is sufficient to view them as
> being orthogonal axes.

OK, I guess I can buy that.
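
Spelled out as a sketch, two orthogonal axes would look something like
the following.  The enum names are my own, and is_publishable() is an
assumption about how a consumer might combine the two fields; nothing
in the proposal defines it.

```python
from enum import Enum
from itertools import product


class TermStatus(Enum):
    ACTIVE = "Active"
    DELETED = "Deleted"
    DEACTIVATED = "Deactivated"


class TermApproval(Enum):
    CANDIDATE = "Candidate"
    APPROVED = "Approved"


# Orthogonal axes: every approval state can pair with every status.
combinations = list(product(TermApproval, TermStatus))


def is_publishable(status, approval):
    """Assumed rule: only active, approved terms are published."""
    return status is TermStatus.ACTIVE and approval is TermApproval.APPROVED
```

That gives six possible combinations, which a single merged field
would have to enumerate one by one.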

> Further I would also refer to Z39.19 Section 8.6 which describes the
> need for various descriptive term states to exist that are
> conceptually different from the temporary suppression of a term
> (deactivation) and the logical deletion of a term.

Arrgh!  Got me!  :-)

> > > <termSortkey>
> > >     Sortkey for term - Synaptica creates a sortkey for every
> > >     term where numbers are parsed to support natural numeric
> > >     ascendancy in an alphabetical orderby clause and to remove
> > >     certain special characters and leading articles etc.
> > 
> > Can't this be done algorithmically?  Can you give examples of
> > situations in which a human-authored sort-key field is necessary?
> Yes in most cases it can be done algorithmically by the system to
> which the thesaurus extract is handed over, but there are cases
> where one wants different sort rules on a term-by-term basis. One
> example is to handle foreign loan words/phrases that appear in a
> thesaurus that is principally of one specific language. Another
> example would be in an all English thesaurus of scientific terms
> where a subset of chemical names need to be processed according to
> chemical sort rules.

Thanks -- once more, the examples are very helpful.  If these new
fields are added to the next version of the profile, do I have your
permission to include your examples?
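
For what it's worth, here is roughly how I picture a receiving system
combining the two approaches: an algorithmic default, with an explicit
termSortkey taking precedence where one is supplied.  This is a sketch
under my own assumptions (English articles only, ASCII-only cleanup,
zero-padding of numbers), not a description of what Synaptica does,
and the "A la carte" record is an invented example of a loan phrase.

```python
import re

LEADING_ARTICLES = ("the ", "a ", "an ")  # assumption: English only


def default_sortkey(name, width=10):
    """Algorithmic sort key: drop a leading article, strip special
    characters, and zero-pad numbers so that 9 sorts before 10."""
    key = name.lower()
    for article in LEADING_ARTICLES:
        if key.startswith(article):
            key = key[len(article):]
            break
    key = re.sub(r"[^a-z0-9 ]", "", key)
    return re.sub(r"\d+", lambda m: m.group().zfill(width), key)


def sortkey(term):
    """Prefer an explicit termSortkey; fall back to the default rule."""
    return term.get("termSortkey") or default_sortkey(term["termName"])


# Hypothetical records: the second overrides the default, which would
# otherwise treat the leading "A" as an English article and drop it.
records = [
    {"termName": "Item 10"},
    {"termName": "A la carte", "termSortkey": "a la carte"},
    {"termName": "Item 9"},
]
ordered = [r["termName"] for r in sorted(records, key=sortkey)]
```

The override matters exactly in the cases Dave lists, where the
default rule gives the wrong answer for an individual term.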

And please, does anyone else have anything to say about these?

 _/|_	 _______________________________________________________________
/o ) \/  Mike Taylor  <mike at indexdata.com>  http://www.miketaylor.org.uk
)_v__/\  "Keep an open mind, but not so open that your brain falls out"
	 -- attributed to Carl Sagan.

Listen to free demos of soundtrack music for film, TV and radio
