The bibliographic profile supports the ingest of digitised written works consisting of multiple bound or unbound pages that predominantly contain handwritten or printed text, such as books, magazines, manuscripts, letters, notated music or newspapers. Such works are often described and maintained by libraries. This profile dictates how the media files (in formats such as TIFF, ALTO XML and PDF), their metadata, and the relationships between them should be expressed and organized. It applies the MODS XML metadata schema for descriptive metadata.
A SIP MUST contain the content of exactly one written work of one of the following types:
newspaper edition;
book;
letter;
notated music;
magazine issue; or
manuscript.
The content MUST be digitised per page, i.e. each TIFF and/or ALTO XML file contained in their respective representation directories MUST represent exactly one page.
An exception to this requirement MAY be made with regard to PDF files: it is RECOMMENDED to use a single PDF file that contains the entire written work (i.e. all pages are present in that one PDF file).
There MUST be exactly one IE present in the SIP, i.e. the written work.
There MUST be preservation metadata at the package level in the preservation/premis.xml file.
There MUST be preservation metadata at the representation level in the respective preservation/premis.xml files.
Preservation metadata in the SIP MUST be limited to the PREMIS metadata schema.
Only the MD5 hashing algorithm is allowed to compute the fixity, thus:
The value of element premis:premis/premis:object[@xsi:type="premis:file"]/premis:objectCharacteristics/premis:fixity/premis:messageDigestAlgorithm MUST be set to MD5.
The value of attribute premis:premis/premis:object[@xsi:type="premis:file"]/premis:objectCharacteristics/premis:fixity/premis:messageDigestAlgorithm/@valueURI MUST be set to "http://id.loc.gov/vocabulary/preservation/cryptographicHashFunctions/md5".
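The two fixity rules above can be checked mechanically. The following is a minimal sketch, assuming the PREMIS file is available as a string and uses the `http://www.loc.gov/premis/v3` namespace shown in the examples of this profile. The function names `md5_hex` and `fixity_problems` are hypothetical helpers introduced here for illustration; they are not part of the profile. The `xsi:type` value is compared as the literal string `premis:file`, as in the XPath above.

```python
import hashlib
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
MD5_URI = "http://id.loc.gov/vocabulary/preservation/cryptographicHashFunctions/md5"


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the given bytes as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


def fixity_problems(premis_xml: str) -> list[str]:
    """List violations of the MD5-only fixity rule for premis:file objects."""
    root = ET.fromstring(premis_xml)
    algorithm_path = (
        f"{{{PREMIS_NS}}}objectCharacteristics/"
        f"{{{PREMIS_NS}}}fixity/"
        f"{{{PREMIS_NS}}}messageDigestAlgorithm"
    )
    problems = []
    for obj in root.findall(f"{{{PREMIS_NS}}}object"):
        # Only file objects are covered by the rule.
        if obj.get(f"{{{XSI_NS}}}type") != "premis:file":
            continue
        for algorithm in obj.findall(algorithm_path):
            if (algorithm.text or "").strip() != "MD5":
                problems.append("messageDigestAlgorithm must be MD5")
            if algorithm.get("valueURI") != MD5_URI:
                problems.append("messageDigestAlgorithm/@valueURI must be the MD5 URI")
    return problems
```

An empty list means no violation was found; the sketch does not verify that the recorded digest matches the file on disk, which would require comparing `md5_hex` of the file contents with the `premis:messageDigest` value.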
There MAY be descriptive metadata at the representation level (e.g. information about the representations, such as a title or a description).
Package METS
The csip:CONTENTINFORMATIONTYPE attribute MUST be set to OTHER and the csip:OTHERCONTENTINFORMATIONTYPE attribute MUST be set to https://data.hetarchief.be/id/sip/2.1/bibliographic.
The mets/dmdSec/mdRef/@MDTYPE attribute MUST be set to MODS.
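As an illustration of the two package METS requirements above, the sketch below checks them with Python's standard library. It assumes that the `csip:` attributes sit on the `mets` root element (as in the E-ARK CSIP) and that the namespaces match those used in the representation METS example of this profile. `package_mets_problems` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
CSIP_NS = "https://DILCIS.eu/XML/METS/CSIPExtensionMETS"
PROFILE_URI = "https://data.hetarchief.be/id/sip/2.1/bibliographic"


def package_mets_problems(mets_xml: str) -> list[str]:
    """List violations of the bibliographic profile rules for the package METS."""
    root = ET.fromstring(mets_xml)
    problems = []
    # Assumption: both csip attributes are placed on the <mets> root element.
    if root.get(f"{{{CSIP_NS}}}CONTENTINFORMATIONTYPE") != "OTHER":
        problems.append("csip:CONTENTINFORMATIONTYPE must be OTHER")
    if root.get(f"{{{CSIP_NS}}}OTHERCONTENTINFORMATIONTYPE") != PROFILE_URI:
        problems.append("csip:OTHERCONTENTINFORMATIONTYPE must be the profile URI")
    # Every mets/dmdSec/mdRef that is present must declare MODS.
    for md_ref in root.findall(f"{{{METS_NS}}}dmdSec/{{{METS_NS}}}mdRef"):
        if md_ref.get("MDTYPE") != "MODS":
            problems.append("mets/dmdSec/mdRef/@MDTYPE must be MODS")
    return problems
```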
Package Descriptive Metadata
A descriptive/mods.xml descriptive metadata file MUST be present at the package level.
The descriptive/mods.xml file MUST follow the MODS metadata schema (v3.7).
The descriptive/mods.xml file MUST contain a shared identifier with the preservation/premis.xml to indicate which PREMIS object is being described in the descriptive/mods.xml file.
The MODS metadata in descriptive/mods.xml MUST be limited to the elements and attributes outlined below.
General information
Element
mods:mods
Name
MODS root element
Description
This root element MUST contain the XML schema namespace of MODS. It MUST NOT contain any other XML schema namespaces besides MODS.
Cardinality
1..1
Obligation
MUST
Attribute
mods:mods/@version
Name
MODS version attribute
Description
This attribute indicates which version of MODS is being used. It MUST be set to 3.7 to indicate conformance with MODS v3.7.
A unique identifier for the written work. This identifier MUST be shared with the relevant PREMIS object in the preservation/premis.xml file. This metadata element MUST NOT contain any attributes.
This element contains a persistent identifier for the record that describes the written work, which typically originates from the source application. The record identifier is different from the identifier that identifies the written work itself, which is denoted by mods:identifier.
This element contains information about the main title of the written work. This element MUST NOT contain a @type attribute in order to designate the main title and differentiate it from other optional <mods:titleInfo/> elements.
Cardinality
1..1
Obligation
MUST
Element
mods:mods/mods:titleInfo[not(@type)]/mods:title
Name
MODS title element
Description
This element contains the title of the written work. Its parent element (<mods:titleInfo/>) MUST NOT contain a @type attribute.
This element contains alternative information about the title of the written work (e.g., alternative titles for a newspaper or magazine). This element MUST contain a @type attribute set to alternative and the attribute @otherType MUST be present.
This element contains an alternative title of the written work. Its parent element (<mods:titleInfo/>) MUST contain the @type attribute set to alternative.
This element contains a term or terms that designate a category characterizing a particular style, form, or content of the written work, such as artistic, musical, literary composition, etc.
This element contains information about the written work’s origin, e.g., when and where it was created or published.
Cardinality
1..*
Obligation
MUST
Attribute
mods:mods/mods:originInfo/@eventType
Name
MODS issuance date element
Description
This attribute specifies the type of event that should be associated with the originInfo. This attribute is not required, but if present, its value MUST be set to publication, meaning that the origin info is about when the written work was published.
This element contains the date the written work was created. Its value MUST be EDTF-compliant, as indicated by the @encoding attribute which MUST be set to edtf.
This element contains the date the written work was issued. Its value MUST be EDTF-compliant, as indicated by the @encoding attribute which MUST be set to edtf.
This attribute indicates whether the note describes the condition of the written work or gives the statement of responsibility. Its value MUST be either statement of responsibility or condition.
Vocabulary
statement of responsibility, condition
Cardinality
1..1
Obligation
MUST
Element
mods:mods/mods:physicalDescription/mods:extent
Name
MODS extent element
Description
This element is used to express a physical dimension of the written work indicated by the @unit attribute, such as the number of pages, the number of sheets, or its physical measurements. For expressing the physical size of the written work, the metric unit cm (centimeter) or mm (millimeter) is used; the value MUST be in the form {width} X {height}, with {width} and {height} being values of type Integer.
This element contains the number of the series in which the written work was published. The @type attribute of its parent element (i.e. <mods:relatedItem/>) MUST be set to series.
This element contains the Abraham identifier taken from the Abraham Belgian Newspaper Catalog. Note that an Abraham identifier refers to newspaper titles rather than newspaper editions; multiple editions can therefore share the same Abraham identifier.
This element MUST contain the @type attribute, with its value set to abraham_id. The @type attribute of its parent element (i.e. <mods:relatedItem/>) MUST be set to series.
This element contains the Abraham URI taken from the Abraham Belgian Newspaper Catalog. Note that an Abraham URI refers to newspaper titles rather than newspaper editions; multiple editions can therefore share the same Abraham URI.
This element MUST contain the @type attribute, with its value set to abraham_uri. The @type attribute of its parent element (i.e. <mods:relatedItem/>) MUST be set to series. Note that the Abraham URI contains the Abraham identifier.
This element contains the date the series was issued. Its value MUST be EDTF-compliant, as indicated by the @encoding attribute which MUST be set to edtf.
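Several of the MODS rules above lend themselves to an automated check. The sketch below covers three of them: the root element and its @version, the @encoding attribute on date elements, and the `{width} X {height}` form of a physical size. It rests on a few assumptions that are not stated in the profile: the dates are carried by the standard MODS elements `mods:dateCreated` and `mods:dateIssued`, an extent whose @unit is cm or mm always expresses a size, and the size is written with single spaces around the `X`. It only checks that @encoding is set to edtf and does not validate the date value against EDTF itself. `mods_problems` is a hypothetical helper name.

```python
import re
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
# Assumption: integers separated by " X " with single spaces, e.g. "30 X 42".
SIZE_PATTERN = re.compile(r"\d+ X \d+")


def mods_problems(mods_xml: str) -> list[str]:
    """List violations of a subset of the MODS rules of the bibliographic profile."""
    root = ET.fromstring(mods_xml)
    problems = []
    if root.tag != f"{{{MODS_NS}}}mods":
        problems.append("root element must be mods:mods")
    if root.get("version") != "3.7":
        problems.append("mods:mods/@version must be 3.7")
    # Dates anywhere in the record (including series dates) must declare EDTF.
    for name in ("dateCreated", "dateIssued"):
        for date in root.iter(f"{{{MODS_NS}}}{name}"):
            if date.get("encoding") != "edtf":
                problems.append(f"mods:{name}/@encoding must be edtf")
    # Physical sizes must follow the {width} X {height} form.
    for extent in root.iter(f"{{{MODS_NS}}}extent"):
        if extent.get("unit") in ("cm", "mm"):
            if not SIZE_PATTERN.fullmatch((extent.text or "").strip()):
                problems.append("size extent must have the form {width} X {height}")
    return problems
```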
A preservation metadata file preservation/premis.xml MUST be present at the package level.
The preservation/premis.xml file MUST follow the PREMIS metadata schema (v3.0).
If the SIP contains ALTO XML files, the preservation/premis.xml file MUST contain a PREMIS event of type transcription to link the TIFF and ALTO XML files. With this event, the representation containing the TIFF files MUST receive the PREMIS linking object role source and the representation containing the ALTO XML files MUST receive the PREMIS linking object role outcome. See the section about PREMIS events and example 1 below for more information about the structure of PREMIS events.
If the SIP contains a PDF file (which SHOULD contain all pages of the written work, cf. supra), the preservation/premis.xml file MUST contain a PREMIS event of type creation to link the TIFF and ALTO XML files to the PDF file. With this event, the two representations containing the TIFF and the ALTO XML files MUST receive the PREMIS linking object role source and the representation containing the PDF file MUST receive the PREMIS linking object role outcome. See the section about PREMIS events and example 2 below for more information about the structure of PREMIS events.
Example 1: a PREMIS transcription event (linking the TIFF and ALTO XML files)
<premis:premis version="3.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:premis="http://www.loc.gov/premis/v3" xsi:schemaLocation="http://www.loc.gov/premis/v3 https://www.loc.gov/standards/premis/premis.xsd">
  [...]
  <premis:event>
    <premis:eventIdentifier>
      <premis:eventIdentifierType>UUID</premis:eventIdentifierType>
      <premis:eventIdentifierValue>uuid-34ae79f8-a8e7-4768-a269-4d6d895662d6</premis:eventIdentifierValue>
    </premis:eventIdentifier>
    <premis:eventType>transcription</premis:eventType>
    <premis:eventDateTime>2022-02-16T10:01:15.014+02:00</premis:eventDateTime>
    <premis:eventDetailInformation>
      <premis:eventDetail>Generate ALTO XML from TIFF via OCR</premis:eventDetail>
    </premis:eventDetailInformation>
    <premis:linkingObjectIdentifier>
      <premis:linkingObjectIdentifierType>UUID</premis:linkingObjectIdentifierType>
      <premis:linkingObjectIdentifierValue>uuid-d8fd6dde-53a5-4614-823c-32f64588efe6</premis:linkingObjectIdentifierValue>
      <premis:linkingObjectRole>source</premis:linkingObjectRole>
    </premis:linkingObjectIdentifier>
    <premis:linkingObjectIdentifier>
      <premis:linkingObjectIdentifierType>UUID</premis:linkingObjectIdentifierType>
      <premis:linkingObjectIdentifierValue>uuid-1fca6190-a4bd-4773-8529-272b9e7d536a</premis:linkingObjectIdentifierValue>
      <premis:linkingObjectRole>outcome</premis:linkingObjectRole>
    </premis:linkingObjectIdentifier>
  </premis:event>
  [...]
</premis:premis>
Example 2: a PREMIS creation event (linking the TIFF, ALTO XML and PDF files)
<premis:premis version="3.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:premis="http://www.loc.gov/premis/v3" xsi:schemaLocation="http://www.loc.gov/premis/v3 https://www.loc.gov/standards/premis/premis.xsd">
  [...]
  <premis:event>
    <premis:eventIdentifier>
      <premis:eventIdentifierType>UUID</premis:eventIdentifierType>
      <premis:eventIdentifierValue>uuid-16a5c827-e513-4ec5-ad75-f75c7b9bde1f</premis:eventIdentifierValue>
    </premis:eventIdentifier>
    <premis:eventType>creation</premis:eventType>
    <premis:eventDateTime>2022-02-16T10:01:15.014+02:00</premis:eventDateTime>
    <premis:eventDetailInformation>
      <premis:eventDetail>Generate PDF from ALTO XML and TIFF</premis:eventDetail>
    </premis:eventDetailInformation>
    <premis:linkingObjectIdentifier>
      <premis:linkingObjectIdentifierType>UUID</premis:linkingObjectIdentifierType>
      <premis:linkingObjectIdentifierValue>uuid-d8fd6dde-53a5-4614-823c-32f64588efe6</premis:linkingObjectIdentifierValue>
      <premis:linkingObjectRole>source</premis:linkingObjectRole>
    </premis:linkingObjectIdentifier>
    <premis:linkingObjectIdentifier>
      <premis:linkingObjectIdentifierType>UUID</premis:linkingObjectIdentifierType>
      <premis:linkingObjectIdentifierValue>uuid-1fca6190-a4bd-4773-8529-272b9e7d536a</premis:linkingObjectIdentifierValue>
      <premis:linkingObjectRole>source</premis:linkingObjectRole>
    </premis:linkingObjectIdentifier>
    <premis:linkingObjectIdentifier>
      <premis:linkingObjectIdentifierType>UUID</premis:linkingObjectIdentifierType>
      <premis:linkingObjectIdentifierValue>uuid-3d371b39-90af-4655-91e9-d93c55f25da1</premis:linkingObjectIdentifierValue>
      <premis:linkingObjectRole>outcome</premis:linkingObjectRole>
    </premis:linkingObjectIdentifier>
  </premis:event>
  [...]
</premis:premis>
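The linking object roles required for transcription and creation events can be verified with a short script. The sketch below is one possible reading of the rules: for every event of type transcription or creation it expects at least one linked object with role source and exactly one with role outcome. `event_problems` is a hypothetical helper name, and the PREMIS namespace is the one used in the examples above.

```python
import xml.etree.ElementTree as ET

P = "{http://www.loc.gov/premis/v3}"
CHECKED_EVENT_TYPES = ("transcription", "creation")


def event_problems(premis_xml: str) -> list[str]:
    """List transcription/creation events whose linking object roles look wrong."""
    root = ET.fromstring(premis_xml)
    problems = []
    for event in root.findall(f"{P}event"):
        event_type = event.findtext(f"{P}eventType", default="").strip()
        if event_type not in CHECKED_EVENT_TYPES:
            continue
        # Collect the role of every linked object of this event.
        roles = [
            link.findtext(f"{P}linkingObjectRole", default="").strip()
            for link in event.findall(f"{P}linkingObjectIdentifier")
        ]
        if roles.count("source") < 1:
            problems.append(f"{event_type} event needs at least one source")
        if roles.count("outcome") != 1:
            problems.append(f"{event_type} event needs exactly one outcome")
    return problems
```

The sketch does not check whether the linked identifiers actually resolve to the TIFF, ALTO XML or PDF representations; doing so would require the identifiers from the representation-level PREMIS files.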
Representation METS
If each file in a representation corresponds to a single page (as is required for TIFF and ALTO XML files), the corresponding <div/> elements in the structural map MUST contain an @ORDER attribute that indicates the sequence of the pages. Additionally, each <div/> element that corresponds to a file representing a page MUST have a @TYPE attribute that is set to page. See example 3 below for more information.
Example 3: the structural map of a representation METS, with @TYPE and @ORDER attributes
<mets xmlns="http://www.loc.gov/METS/" xmlns:csip="https://DILCIS.eu/XML/METS/CSIPExtensionMETS" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" OBJID="representation_1" TYPE="Textual works - Print" PROFILE="https://earksip.dilcis.eu/profile/E-ARK-SIP.xml" xsi:schemaLocation="http://www.w3.org/1999/xlink http://www.loc.gov/standards/xlink/xlink.xsd http://www.loc.gov/METS/ https://www.loc.gov/standards/mets/mets.xsd https://DILCIS.eu/XML/METS/CSIPExtensionMETS https://earkcsip.dilcis.eu/schema/DILCISExtensionMETS.xsd">
  [...]
  <structMap ID="uuid-04647bb4-f524-435b-b4bf-5fe7a926b9d4" TYPE="PHYSICAL" LABEL="CSIP">
    <div ID="uuid-74e4335c-1d24-42bc-bbd0-864bd216d99c" LABEL="representation_1">
      <div ID="uuid-60d4a0db-769c-42a9-8ef8-c395bb555803" LABEL="Metadata">
        <div ID="uuid-e96b8688-e811-4dd8-83dc-81ae263b9c2a" LABEL="preservation">
          <fptr FILEID="uuid-4482555d-aed7-4066-a211-44429a60a49a"/>
        </div>
      </div>
      <!-- order attributes for page order -->
      <div ID="uuid-41bacec1-1d6c-467a-8020-7114115562a8" LABEL="Representations">
        <div ID="uuid-47e52361-8508-4ae1-ad8c-0e1f5382065e" TYPE="page" ORDER="1">
          <fptr FILEID="uuid-9850cb03-b1fd-4661-a4fb-e3dfcf25e9e5"/>
        </div>
        <div ID="uuid-47e52361-8508-4ae1-ad8c-0e1f5382065e" TYPE="page" ORDER="2">
          <fptr FILEID="uuid-3309e853-bf0f-4d19-ae6a-5e14911e3662"/>
        </div>
        <div ID="uuid-eebd6f2a-f06e-4c5f-9c52-fd58e784eaff" TYPE="page" ORDER="3">
          <fptr FILEID="uuid-4ef96979-4abf-4af0-8156-d04fdd2ff7c3"/>
        </div>
      </div>
    </div>
  </structMap>
  [...]
</mets>
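The @TYPE and @ORDER requirements of the structural map can be checked along the following lines. This is a sketch under stated assumptions: it inspects only <div/> elements that already carry TYPE="page", so it cannot detect a page <div/> that lacks the @TYPE attribute altogether, and it additionally assumes that the pages are numbered 1 to n without gaps or repeats, which is stricter than the requirement that @ORDER indicates the page sequence. `page_order_problems` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

M = "{http://www.loc.gov/METS/}"


def page_order_problems(mets_xml: str) -> list[str]:
    """List problems with the ORDER attributes of page divs in a representation METS."""
    root = ET.fromstring(mets_xml)
    problems = []
    orders = []
    for div in root.iter(f"{M}div"):
        if div.get("TYPE") != "page":
            continue
        try:
            # int(None) raises TypeError, int("abc") raises ValueError.
            orders.append(int(div.get("ORDER")))
        except (TypeError, ValueError):
            problems.append("page div has a missing or non-integer ORDER attribute")
    # Assumption: pages are numbered consecutively starting at 1.
    if sorted(orders) != list(range(1, len(orders) + 1)):
        problems.append("ORDER values are not the consecutive sequence 1..n")
    return problems
```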
Representation Preservation Metadata
If ALTO XML files are present in the SIP, the preservation/premis.xml files of the representation containing the TIFF files and of the representation containing the ALTO XML files MUST contain a PREMIS relationship to establish a link between the two.
In the case of the representation with the TIFF files, this PREMIS relationship MUST be of type derivation and of subtype is source of. The @valueURI attribute of the <premis:relationshipType> element MUST be set to http://id.loc.gov/vocabulary/preservation/relationshipType/der. The @valueURI attribute of the <premis:relationshipSubType> element MUST be set to http://id.loc.gov/vocabulary/preservation/relationshipSubType/iso. Finally, a <premis:relatedEventIdentifier/> element MUST be present that refers to the relevant event (in this case a transcription event) defined in the preservation/premis.xml file of the package level. This is shown in example 4 below.
In the case of the representation with the ALTO XML files, this PREMIS relationship MUST be of type derivation and of subtype has source. The @valueURI attribute of the <premis:relationshipType> element MUST be set to http://id.loc.gov/vocabulary/preservation/relationshipType/der. The @valueURI attribute of the <premis:relationshipSubType> element MUST be set to http://id.loc.gov/vocabulary/preservation/relationshipSubType/hss. Finally, a <premis:relatedEventIdentifier/> element MUST be present that refers to the relevant event (in this case a transcription event) defined in the preservation/premis.xml file of the package level. This is shown in example 5 below.
If a PDF file is present in the SIP, the preservation/premis.xml files of all three representations (i.e. of the TIFF files, of the ALTO XML files and of the PDF file) MUST contain a PREMIS relationship to establish a link between the three.
In the case of the representations with the TIFF and ALTO XML files, this PREMIS relationship MUST be of type derivation and of subtype is source of. The @valueURI attribute of the <premis:relationshipType> element MUST be set to http://id.loc.gov/vocabulary/preservation/relationshipType/der. The @valueURI attribute of the <premis:relationshipSubType> element MUST be set to http://id.loc.gov/vocabulary/preservation/relationshipSubType/iso. Finally, a <premis:relatedEventIdentifier/> element MUST be present that refers to the relevant event (in this case a creation event) defined in the preservation/premis.xml file of the package level. This is similar to example 4 below.
In the case of the representation with the PDF file, this PREMIS relationship MUST be of type derivation and of subtype has source. The @valueURI attribute of the <premis:relationshipType> element MUST be set to http://id.loc.gov/vocabulary/preservation/relationshipType/der. The @valueURI attribute of the <premis:relationshipSubType> element MUST be set to http://id.loc.gov/vocabulary/preservation/relationshipSubType/hss. Finally, a <premis:relatedEventIdentifier/> element MUST be present that refers to the relevant event (in this case a creation event) defined in the preservation/premis.xml file of the package level. This is similar to example 5 below, the difference being that the relationship will usually contain multiple <premis:relatedObjectIdentifier/> elements, since the PDF is derived from all TIFF and ALTO XML files together.
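The relationship rules above share one pattern: a derivation relationship with a fixed subtype, fixed @valueURI values and a reference to an event. The sketch below checks that pattern for one representation-level PREMIS file. It assumes the caller knows which subtype the representation is expected to carry (is source of for a source representation, has source for a derived one). `relationship_problems` is a hypothetical helper name, and the sketch does not verify that the referenced event exists in the package-level preservation/premis.xml file.

```python
import xml.etree.ElementTree as ET

P = "{http://www.loc.gov/premis/v3}"
DER_URI = "http://id.loc.gov/vocabulary/preservation/relationshipType/der"
SUBTYPE_URIS = {
    "is source of": "http://id.loc.gov/vocabulary/preservation/relationshipSubType/iso",
    "has source": "http://id.loc.gov/vocabulary/preservation/relationshipSubType/hss",
}


def relationship_problems(premis_xml: str, expected_subtype: str) -> list[str]:
    """List problems with the derivation relationships of the expected subtype."""
    expected_uri = SUBTYPE_URIS[expected_subtype]
    root = ET.fromstring(premis_xml)
    problems = []
    found = False
    for rel in root.iter(f"{P}relationship"):
        sub_type = rel.find(f"{P}relationshipSubType")
        if sub_type is None or (sub_type.text or "").strip() != expected_subtype:
            continue
        found = True
        rel_type = rel.find(f"{P}relationshipType")
        if rel_type is None or (rel_type.text or "").strip() != "derivation":
            problems.append("relationshipType must be derivation")
        elif rel_type.get("valueURI") != DER_URI:
            problems.append("relationshipType/@valueURI must be the 'der' URI")
        if sub_type.get("valueURI") != expected_uri:
            problems.append(f"relationshipSubType/@valueURI must be {expected_uri}")
        if rel.find(f"{P}relatedEventIdentifier") is None:
            problems.append("relatedEventIdentifier is missing")
    if not found:
        problems.append(f"no relationship with subtype '{expected_subtype}' found")
    return problems
```

For example, the PREMIS file of the TIFF representation would be checked with `expected_subtype="is source of"` and that of the PDF representation with `expected_subtype="has source"`.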