From: oleg
Date: Mon, 10 Mar 2003 13:27:38 +0000 (+0000)
Subject: Added standalone document, which describe MARC indexing process.
X-Git-Tag: ZEBRA.1.3.8~20
X-Git-Url: http://lists.indexdata.dk/?a=commitdiff_plain;h=4ec243cf897e7d84cf8860ff3b0086c5364306e7;p=idzebra-moved-to-github.git

Added standalone document, which describe MARC indexing process.
---

diff --git a/doc/marc_indexing.sgml b/doc/marc_indexing.sgml
new file mode 100644
index 0000000..acb20f6
--- /dev/null
+++ b/doc/marc_indexing.sgml
@@ -0,0 +1,317 @@
+
+Indexing of MARC records by Zebra
+
+Zebra is suitable for distributing MARC records via Z39.50. There are
+several ways to describe the indexing process for MARC records.
+This document presents them.
+
+Simple indexing of MARC records
+
+Simple indexing is not described yet.
+
+Extended indexing of MARC records
+
+Extended indexing of MARC records helps if you need to index a
+combination of subfields, index only part of a field, or use the
+embedded fields of a MARC record during indexing.
+
+Extended indexing of MARC records additionally allows:
+
+indexing data in the LEADER of a MARC record
+
+indexing data in control fields (with fixed length)
+
+using the values of indicators during indexing
+
+indexing linked fields for UNIMARC-based formats
+
+Compared with simple indexing, extended indexing may increase the
+indexing time for MARC records by about 2-3 times.
+
+The index-formula
+
+First, we have to define the term index-formula for MARC records.
+This term helps in understanding the notation of extended indexing of
+MARC records by Zebra. Our definition is based on the document "The
+table of conformity for Z39.50 use attributes and RUSMARC fields".
+The document is available only in Russian.
+
+The index-formula is a combination of subfields presented as follows:
+
+  71-00$a, $g, $h ($c){.$b ($c)}    (1)
+
+Zebra supports the Bib-1 right-truncation attribute. In this case,
+index-formula (1) consists of forms defined in the same way as (1):
+
+  71-00$a, $g, $h
+  71-00$a, $g
+  71-00$a
+
+The original MARC record may lack some of the elements included in the
+index-formula.
+
+This notation includes the following operands:
+
+#
+Denotes a whitespace character.
+
+-
+The position may contain any value defined by the MARC format.
+For example, index-formula
+
+  70-#1$a, $g    (2)
+
+includes
+
+  700#1$a, $g
+  701#1$a, $g
+  702#1$a, $g
+
+{...}
+Repeatable elements are enclosed in curly brackets {}. For example,
+index-formula
+
+  71-00$a, $g, $h ($c){.$b ($c)}    (3)
+
+includes
+
+  71-00$a, $g, $h ($c). $b ($c)
+  71-00$a, $g, $h ($c). $b ($c). $b ($c)
+  71-00$a, $g, $h ($c). $b ($c). $b ($c). $b ($c)
+
+All other operands are the same as those accepted in the MARC world.
+
+Notation of <emphasis>index-formula</emphasis> for Zebra
+
+Extended indexing overloads the path of the elm definition in the
+abstract syntax file of Zebra (.abs file). This means that names
+beginning with "mc-" are interpreted by Zebra as an index-formula.
+The database index is created and linked to an access point
+(Bib-1 use attribute) according to this formula.
+
+For example, index-formula
+
+  71-00$a, $g, $h ($c){.$b ($c)}    (4)
+
+in the .abs file looks like:
+
+  mc-71.00_$a,_$g,_$h_(_$c_){.$b_(_$c_)}
+
+The notation of the index-formula uses the following operands:
+
+_
+Denotes a whitespace character.
+
+.
+The position may contain any value defined by the MARC format.
+For example, index-formula
+
+  70-#1$a, $g    (5)
+
+matches mc-70._1_$a,_$g_ and includes
+
+  700_1_$a,_$g_
+  701_1_$a,_$g_
+  702_1_$a,_$g_
+
+{...}
+Repeatable elements are enclosed in curly brackets {}.
+For example, index-formula
+
+  71-00$a, $g, $h ($c){.$b ($c)}    (6)
+
+matches mc-71.00_$a,_$g,_$h_(_$c_){.$b_(_$c_)} and includes
+
+  71.00_$a,_$g,_$h_(_$c_).$b_(_$c_)
+  71.00_$a,_$g,_$h_(_$c_).$b_(_$c_).$b_(_$c_)
+  71.00_$a,_$g,_$h_(_$c_).$b_(_$c_).$b_(_$c_).$b_(_$c_)
+
+<...>
+An embedded index-formula (for linked fields) is placed between <>.
+For example, index-formula
+
+  4--#-$170-#1$a, $g ($c)    (7)
+
+matches mc-4.._._$1<70._1_$a,_$g_(_$c_)>_ and includes
+
+  463_._$1<70._1_$a,_$g_(_$c_)>_
+
+All other operands are the same as those accepted in the MARC world.
+
+Examples
+
+Indexing LEADER
+
+Use the keyword "ldr" to index the leader. For example, to index data
+from the 6th and 7th positions of the LEADER:
+
+  elm mc-ldr[6] Record-type !
+  elm mc-ldr[7] Bib-level !
+
+Indexing data from control fields
+
+Indexing the date (the time the record was added to the database):
+
+  elm mc-008[0-5] Date/time-added-to-db !
+
+or, for RUSMARC (where this data is held in field 100):
+
+  elm mc-100___$a[0-7]_ Date/time-added-to-db !
+
+Using indicators while indexing
+
+For RUSMARC, index-formula 70-#1$a, $g matches
+
+  elm 70._1_$a,_$g_ Author !:w,!:p
+
+When Zebra finds a field matching the "70." pattern, it checks the
+indicators. In this case the value of the first indicator does not
+matter, but the value of the second one must be whitespace; otherwise
+the field is not indexed.
+
+Indexing embedded (linked) fields for UNIMARC-based formats
+
+For RUSMARC, index-formula 4--#-$170-#1$a, $g ($c) matches
+
+  elm mc-4.._._$1<70._1_$a,_$g_(_$c_)>_ Author !:w,!:p
+
+Data are extracted from the record if the field matches the "4.._."
+pattern and the data in the linked field match the embedded
+index-formula 70._1_$a,_$g_(_$c_).
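
As an illustration of the "." and {...} operands described in the document above, here is a minimal Python sketch. It is not taken from Zebra and does not claim to reflect how Zebra matches fields internally. The function names `expand_repeats` and `tag_matches` and the cap on the number of repetitions are hypothetical, introduced only for this example.

```python
import re
from typing import List


def expand_repeats(formula: str, max_repeats: int = 3) -> List[str]:
    """Expand the first {...} group into 1..max_repeats literal copies.

    Assumption: the cap on repetitions is arbitrary and exists only to
    keep the illustration finite.
    """
    match = re.search(r"\{([^{}]*)\}", formula)
    if match is None:
        # No repeatable element: the formula stands for itself.
        return [formula]
    prefix = formula[: match.start()]
    body = match.group(1)
    suffix = formula[match.end():]
    return [prefix + body * n + suffix for n in range(1, max_repeats + 1)]


def tag_matches(pattern: str, tag: str) -> bool:
    """Return True if tag fits pattern, where '.' accepts any character."""
    if len(pattern) != len(tag):
        return False
    return all(p in (".", t) for p, t in zip(pattern, tag))


if __name__ == "__main__":
    # Print the expanded forms of the formula used in the document.
    for form in expand_repeats("71.00_$a,_$g,_$h_(_$c_){.$b_(_$c_)}"):
        print(form)
    # Check a concrete tag against the wildcard pattern "70.".
    print(tag_matches("70.", "701"))
```

With the default cap, `expand_repeats` produces the three expanded forms that the document lists for the mc-71.00 formula, and `tag_matches` accepts tags such as 700, 701 and 702 for the pattern "70.".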