Archaeotools: Data mining, facetted classification and E-archaeology
Grant Holder:
Professor Julian Richards
This two-year project built upon previous ADS tool development (the Common Information Environment - Archaeobrowser project), using advanced data mining and knowledge capture technologies to allow archaeologists to discover, share and analyse datasets and legacy publications that had hitherto been very difficult to integrate into digital frameworks. The project had three interrelated objectives, each represented by a distinct work package.
Generation of meta-data (XML, RDBMS) from semi-structured and unstructured texts (such as HTML, PDF and Word documents)
| Project start date: 2007-09 | Project end date: 2009-09 |
Subject domains:
Era(s):
Country/region(s):
| Methods used | Category |
|---|---|
| Accessibility analysis | Strategy and project management |
| Resource sharing | Communication and collaboration |
| Indexing | Data analysis |
| Content analysis | Data analysis |
| Data mining | Data analysis |
| Documentation | Strategy and project management |
| Human factors analysis | Strategy and project management |
| Iterative design | Strategy and project management |
| Risk management | Strategy and project management |
| Searching and querying | Data analysis |
| Security planning | Strategy and project management |
| Usability analysis | Strategy and project management |
| Interface design | Data publishing and dissemination |
| Spatial | Content types |
| Text mining | Data analysis |
| Spatial data analysis | Data analysis |
| Statistical analysis | Data analysis |
| Collaborative publishing | Data publishing and dissemination |
| Archaeology | Discipline |
| Text | Content types |
Funding sources:
Joint Information Systems Committee (JISC), Arts and Humanities Research Council (AHRC), Engineering and Physical Sciences Research Council (EPSRC)
Content types created:
Dataset/structured data, Spatial, Text
Software tools used:
Java, Solr, JavaServer Faces, Aleph, Runes, T-Rex
Source material used:
The project consists of three work packages each dealing with a particular type of data.
Work Package 1 - The underlying dataset comprises over 1,000,000 records (held in an Oracle RDBMS) aggregated from the National Monuments Records of Scotland, Wales and England, Historic Environment Records from numerous local authorities, and the ADS’s own archive holdings. The facets selected will be the standard hierarchical ‘What’, ‘Where’ and ‘When’ facets plus a ‘Media’ facet to allow the selection of particular subsets of resources. The facets are populated from existing thesauri (e.g. the Thesaurus of Monument Types) in XML format, extended and integrated to allow for geographical variation, such as terminological differences in monument and period types between Scotland and England. The Archaeotools project also integrates thesauri served in XML by web services based on the Simple Knowledge Organization System (SKOS), developed by the AHRC-funded Semantic Tools for Archaeology (STAR) project based at the University of Glamorgan.
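The hierarchical faceting described for Work Package 1 can be illustrated with a minimal sketch in plain Python. It is not the project's implementation (the record lists Java and Solr, and Solr provides faceting natively); the records, terms and function names below are hypothetical and chosen only to show how a selection in one facet narrows the counts shown in the others.

```python
from collections import Counter

# Hypothetical records. Each facet value is a path from a broad term to a
# narrower one, mirroring a hierarchical thesaurus.
RECORDS = [
    {"id": 1, "what": ("Religious", "Church"), "where": ("Scotland", "Fife"),
     "when": ("Medieval",), "media": ("Text",)},
    {"id": 2, "what": ("Religious", "Chapel"), "where": ("England", "North Yorkshire"),
     "when": ("Medieval",), "media": ("Image",)},
    {"id": 3, "what": ("Defence", "Hillfort"), "where": ("Scotland", "Fife"),
     "when": ("Iron Age",), "media": ("Text",)},
]


def matches(record, selection):
    """True if every selected facet path is a prefix of the record's path."""
    return all(record[facet][:len(path)] == path
               for facet, path in selection.items())


def drill_down(records, selection):
    """Keep only the records that fall under all selected facet terms."""
    return [r for r in records if matches(r, selection)]


def facet_counts(records, facet, depth=1):
    """Count records per term at the given level of one facet hierarchy."""
    return Counter(r[facet][:depth] for r in records if len(r[facet]) >= depth)


# Selecting 'Where = Scotland' and then listing the remaining 'What' terms.
scottish = drill_down(RECORDS, {"where": ("Scotland",)})
what_counts = facet_counts(scottish, "what")
```

Storing each value as a path means that choosing a broad term (for example ‘Scotland’) automatically includes every narrower term beneath it, which is the behaviour a hierarchical facet browser relies on.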
Work Package 2 - deals primarily with unpublished archaeological reports (grey literature): approximately 1000 reports in total, ranging in length from 10 to 500 pages. These reports are published by a wide range of archaeological organisations. As an example, the OASIS project actively gathers digital versions of grey literature fieldwork reports and currently holds around 2300. This total grows by around 50-100 reports a month; all reports can be downloaded, free of charge, from the ADS.
Work Package 3 - The system is extended to capture metadata from legacy historical documents, using the PSAS (annual Proceedings of the Society of Antiquaries of Scotland, from 1851 to 1999) as an exemplar corpus and utilising the University of Edinburgh’s geoXwalk service to recast place names and locations extracted from text as national grid references (NGRs), allowing enhanced geospatial searching of the data.
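As a rough illustration of recasting an extracted place name as a national grid reference, the sketch below looks a name up in a small in-memory gazetteer and formats the resulting easting and northing as an NGR. It does not call or reproduce the geoXwalk service, whose interface is not described here; the gazetteer entry, its coordinates and the function names are hypothetical. The letter-pair arithmetic follows the usual layout of the Ordnance Survey National Grid (100 km squares, with the letter I omitted).

```python
# Hypothetical gazetteer: lower-cased place name -> (easting, northing) in metres.
# The entry and its coordinates are illustrative only.
GAZETTEER = {"example cairn": (325100, 673500)}


def to_ngr(easting, northing, digits=6):
    """Format a National Grid easting/northing as e.g. 'NT 251 735'."""
    easting, northing = int(easting), int(northing)
    if not (0 <= easting < 700_000 and 0 <= northing < 1_300_000):
        raise ValueError("coordinates outside the National Grid")
    e100k, n100k = easting // 100_000, northing // 100_000
    # Indices of the 500 km and 100 km square letters in a 5x5 lettering grid.
    l1 = (19 - n100k) - (19 - n100k) % 5 + (e100k + 10) // 5
    l2 = (19 - n100k) * 5 % 25 + e100k % 5
    # The grid lettering omits 'I', so shift indices that fall after 'H'.
    if l1 > 7:
        l1 += 1
    if l2 > 7:
        l2 += 1
    letters = chr(ord("A") + l1) + chr(ord("A") + l2)
    half = digits // 2
    scale = 10 ** (5 - half)  # truncate metres to the requested precision
    e = (easting % 100_000) // scale
    n = (northing % 100_000) // scale
    return f"{letters} {e:0{half}d} {n:0{half}d}"


def place_to_ngr(name):
    """Return the NGR for a known place name, or None if it is not listed."""
    coords = GAZETTEER.get(name.strip().lower())
    return to_ngr(*coords) if coords else None
```

In practice a place name is often ambiguous, so a real service would need to return candidate matches for disambiguation rather than a single lookup result.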
Digital resource created:
The ultimate goal of this project is to create a faceted search, browse and knowledge management system for archaeologists to access, share and re-use archaeological data. The working system will be online by early 2010, and a demonstration system is available at http://archaeologydataservice.ac.uk/. Registration is required to access the demo.
Access to digital resource:
Open Access
Data Formats created:
XML, RDBMS, free text search index
Metadata standards employed:
Dublin Core, simple (DC), Simple Knowledge Organization System (SKOS)
Publications:
Jeffrey, S., Richards, J., Ciravegna, F., Waller, S., Chapman, S. & Zhang, Z. The Archaeotools project, faceted classification and natural language processing in an archaeological context. UK e-Science All Hands Meeting 2008; Philosophical Transactions of the Royal Society A, 2009, 367, 2507-2519. doi: 10.1098/rsta.2009.0038
Jeffrey, S., Richards, J., Ciravegna, F., Waller, S., Chapman, S. & Zhang, Z. When ontology and reality collide: the Archaeotools project, facetted classification and natural language processing in an archaeological context. In 36th Annual Conference on Computer Applications and Quantitative Methods in Archaeology: On the Road to Reconstructing the Past (2008).
Zhang, Z. & Iria, J. A Novel Approach to Automatic Gazetteer Generation using Wikipedia. In Proceedings of the ACL'09 Workshop on Collaboratively Constructed Semantic Resources, Singapore, August 2009.
Institutions affiliated with this project:
| UK HE institutions involved: |
|---|
| University of York |
| University of Sheffield |
Project staff and expertise:
| Principal staff member: | Prof. Julian Richards, Dr Stuart Jeffrey, Prof. Fabio Ciravegna, Stewart Waller, Ziqi Zhang, Sam Chapman, Tony Austin |
|---|---|
| Other staff: | |
| External expertise: | |
| Metadata on this arts-humanities.net record | |
|---|---|
| Author(s) of record | Ziqi Zhang |
| Title | Archaeotools: Data mining, facetted classification and E-archaeology |
| Record created | 2010-02-01 |
| Record updated | 2010-06-11 11:17 |
| URL of record | http://www.arts-humanities.net/node/3005 |
| Citation of record | Ziqi Zhang: Archaeotools: Data mining, facetted classification and E-archaeology. <http://www.arts-humanities.net/node/3005> created: 2010-02-01, last updated 2010-06-11 11:17 |