[Ex-plain] Latest DTD (was ZIG)

Robert Sanderson azaroth at liverpool.ac.uk
Thu Mar 7 12:25:57 CET 2002


> * Is it an SGML or XML DTD? I would have assumed it should be for XML.
>   If so, 'implied' must be 'IMPLIED', 'required' must be 'REQUIRED',
>   remove '- -' for omit thingies, and Explain-- in comments causes problems!
>   (-- ends the comment!)

I write SGML DTDs because that's what we use, but the SGML that it 
produces would be XML legal.
(e.g. no <br/>, tags are case sensitive, quotes needed on attributes, etc.)

> * My XML parser also said that XML could not use 
>     <!ELEMENT (a|b|c) (#PCDATA)>

Element lists like this are valid in SGML, but apparently not in XML :/
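
For illustration, the XML-legal equivalent has to declare each element on its own. A minimal sketch, reusing the placeholder names a, b and c from the quoted line:

```xml
<!-- XML form: one declaration per element type, no name groups -->
<!ELEMENT a (#PCDATA)>
<!ELEMENT b (#PCDATA)>
<!ELEMENT c (#PCDATA)>
```

The resulting content model is the same either way; only the DTD syntax differs.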

This brings back up the point of multiple representations of the same
model.  While the elements described in the DTD are XML legal, the DTD
itself does not constitute a proper XML DTD. (XML doing this was daft IMO
as almost nothing is gained, but you get that)

So long as all of the representations have the same resulting content 
model, I don't see a problem in allowing more than one official 
representation.  This would mean that when people can't use one, they 
don't have to do what Alan ended up having to do and rewrite it, 
possibly introducing inconsistencies if they don't fully grok it.

For ease and consistency of implementation, I really think we need an SGML 
DTD, XML DTD, and XML Schema.


> * The same name is being used for search, sort, and scan. The assumption
>   is these belong to the same space of index names. Is this true all the 
>   time? Should the index name be unique in a single XML record?

I don't follow?

Index describes a single index in the database.  This allows for multiple 
attributes to be grouped together as aliases for a single index. This will 
be especially useful (IMO) when we get BIB2, as then we can have an author 
index with both BIB1 and BIB2 attributes associated with it.

In the latest version there is no <name> element in <index> as it was 
irrelevant, so what do you mean by index name? 


> * What does 'primary' mean on indexes?

Primary on any element means that this should be used in preference to any 
other similar elements.

To use the example above, if the database maintainer wanted people to use 
BIB2 rather than BIB1 during a transition phase, then they would put 
primary="true" on the BIB2 map.  The same goes for <host>, 
<title>, <description>, etc.
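
A rough sketch of such an author index, using the element and attribute names from the records quoted further down (the latest DTD may order or name things differently). The BIB1 OID and its use value 1003 (author) are the standard ones. The index id, the BIB2 attributeset value and the BIB2 use value are placeholders made up for illustration:

```xml
<index id="author" search="true" scan="true" sort="false">
  <indexTitle primary="true">Author</indexTitle>
  <!-- BIB1: attribute set 1.2.840.10003.3.1, use attribute 1003 (author) -->
  <map attributeset="1.2.840.10003.3.1" primary="false">
    <use>1003</use>
  </map>
  <!-- BIB2: placeholder OID and use value; primary="true" asks clients to prefer this map -->
  <map attributeset="BIB2-OID-PLACEHOLDER" primary="true">
    <use>0</use>
  </map>
</index>
```

Both maps are aliases for the same underlying index; primary only says which one the maintainer would rather see used.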


> * What is <indexType>?

The 'type' of the index, if this is not verifiable from the attributes. (But 
it can be used even if it is.)
For example, I have my own attribute set for collectable card games.  It 
has an attribute and index for 'card name' and if I were doing a cross db 
title search I would want to search this, but it wouldn't be possible to 
know to do this without some typing mechanism.
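
Something along these lines, as a sketch only. The id, the "title" type label, the attributeset value and the use value are all assumptions for illustration, not taken from the DTD or from a registered attribute set:

```xml
<index id="cardname" search="true" scan="false" sort="false">
  <indexTitle primary="true">Card Name</indexTitle>
  <!-- "title" is an assumed type label; the legal values are not defined here -->
  <indexType>title</indexType>
  <!-- placeholder for a private attribute set OID and its 'card name' use value -->
  <map attributeset="PRIVATE-CCG-SET-OID" primary="true">
    <use>1</use>
  </map>
</index>
```

A client that knows nothing about the private attribute set could still pick this index for a title search by looking at <indexType>.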

> * Do you really want to turn attribute types (numbers) into different
>   schema elements? This would restrict the population space and gives
>   a big long list of element names. Would it be better to use numbers
>   avoiding mapping problems? I was not sure what <hitcount> etc meant.

Yes.  Otherwise you need to put the attribute set on every element that has 
them, as use and access are both '1' (etc).


> * Should attribute values distinguish between string and numeric attribute
>   values?

Why, and if so, how?

> * What is a <sortKeyword>?

A keyword that is accepted for sorting.


> * What are legal format names? (What is the exact format etc)

The OID, or the official name for it?  GRS-1, SUTRS, MARC, XML, SGML, etc


> * Should there be able to be a description for element sets (rather than
>   just the names?)

Why?  I can see a Title for elementset, but if we allow a description, 
then we should also allow a description for each index. 

> * Should this be a record in IR-Explain-1 with a record syntax of XML?
>   I defined a ExplainCategory index with a value of "Explain--" if you
>   want to search for all records.

I think it should be a record in IR-Explain---1 :)

> <indexInfo> 
> <index id="DatabaseName" search="true" scan="true" sort="false"> 
> <indexTitle primary="false">Database Name</indexTitle> 
> <indexType>clever</indexType> 
> <map attributeset="1.2.840.10003.3.1000.62.0.10.1.33541.5992.210"
> primary="false"> 
> <use>numeric 1</use> 
> </map> 
> </index> 

Should be, IMO, <use> 1 </use> as use attributes are only ever numeric.
See indexType and primary explanation above. 


> <index id="F" search="false" scan="false" sort="true"> 
> <indexTitle primary="false"></indexTitle> 
> <indexType>clever</indexType> 
> <map primary="false"> 
> </map> 
> </index> 

This is the old and stupid way of representing a sort by keyword, although 
one which I quite liked.

New way is <sortKeyword> F </sortKeyword>, which I like even more.


Rob

-- 
      ,'/:.          Rob Sanderson (azaroth at liverpool.ac.uk)
    ,'-/::::.        http://www.o-r-g.org/~azaroth/
  ,'--/::(@)::.      Special Collections and Archives, extension 3142
,'---/::::::::::.    Twin Cathedrals:  telnet: liverpool.o-r-g.org 7777
____/:::::::::::::.              WWW:  http://liverpool.o-r-g.org:8000/
I L L U M I N A T I




