Deep learning models require large amounts of labeled data to perform well. In this study, we use a Self-Supervised Learning (SSL) method for semantic segmentation of archaeological monuments in Digital Terrain Models (DTMs). The method first uses unlabeled data to pretrain a model (pretext task) and then fine-tunes it with a small labeled dataset (downstream task). We use unlabeled DTMs and Relief Visualizations (RVs) to train an encoder-decoder and a Generative Adversarial Network (GAN) in the pretext task, and an annotated DTM dataset to fine-tune a semantic segmentation model in the downstream task. Experiments indicate that this approach produces better results than training from scratch or using models pretrained on image data such as ImageNet. The code and pretrained weights for the encoder-decoder and GAN models are available on GitHub.
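A minimal PyTorch sketch of this pretext/downstream split is shown below; the layer sizes, patch shapes, class count, and dummy batch are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EncoderDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 2, stride=2))

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Pretext task: predict a relief visualization from an unlabeled DTM patch.
model = EncoderDecoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
dtm, rv = torch.randn(4, 1, 128, 128), torch.randn(4, 1, 128, 128)  # dummy batch
loss = F.mse_loss(model(dtm), rv)
opt.zero_grad()
loss.backward()
opt.step()

# Downstream task: keep the pretrained encoder, attach a segmentation head,
# and fine-tune on the small labeled DTM dataset.
num_classes = 4  # assumption
segmenter = nn.Sequential(
    model.encoder,
    nn.Conv2d(64, num_classes, 1),
    nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False))
logits = segmenter(dtm)  # (4, num_classes, 128, 128)
```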
2022
An active learning approach for the interactive and guided segmentation of tomography data
Bashir Kazimi, Philipp Heuser, Frank Schluenzen, and 7 more authors
The Helmholtz-Zentrum Hereon is operating several tomography end stations at the beamlines P05 and P07 of the synchrotron radiation facility PETRA III at DESY in Hamburg, Germany. Attenuation and phase contrast imaging techniques are provided, as well as sample environments for in situ/operando/in vivo experiments for applications in biology, medicine, materials science, etc. Very large and diverse data sets with varying spatiotemporal resolution, noise levels and artifacts are acquired, which are challenging to process and analyze. Here we report on an active learning approach for the semantic segmentation of tomography data using a guided and interactive framework, and evaluate different acquisition functions for the selection of images to be annotated in the iterative process.
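A minimal sketch of one uncertainty-based acquisition step is given below, assuming a segmentation model that outputs per-pixel class probabilities; scoring images by mean pixel entropy is an illustrative choice, not necessarily one of the acquisition functions evaluated in the paper.

```python
import torch

def mean_entropy(probs: torch.Tensor) -> torch.Tensor:
    """probs: (N, C, H, W) softmax outputs; returns one uncertainty score per image."""
    ent = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)  # per-pixel entropy, (N, H, W)
    return ent.mean(dim=(1, 2))

# Rank the unlabeled pool and pick the k most uncertain images for annotation.
pool_probs = torch.softmax(torch.randn(10, 3, 64, 64), dim=1)  # dummy pool
selected = torch.topk(mean_entropy(pool_probs), k=2).indices
```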
2021
Self Supervised Learning for Detection of Archaeological Monuments in LiDAR Data
The use of deep learning techniques for detection of objects in imagery has spread to many disciplines, including archaeology. Deep learning models are exploited in detection of objects and structures in archaeology using natural and satellite images, as well as aerial and terrestrial laser scanning data. A well-known limitation of such models, specifically deep supervised models, is that they highly depend on large volumes of labelled data. For tasks with a small amount of labelled data and a huge amount of unlabelled data, unsupervised pretraining or transfer learning can be used. In this work, a product of airborne laser scanning data, i.e., Digital Terrain Models (DTM), is used to detect structures such as bomb craters, charcoal kilns, and barrows in the Harz region of Lower Saxony, Germany. Labels are available for only a small area, while the majority of the region is unlabelled. Therefore, the large number of unlabelled examples is used to pretrain an auto-encoder model in an unsupervised fashion, and then supervised training is performed using the labelled data. This combination of unsupervised learning and supervised learning is hereafter referred to as Semi Supervised Learning (SSL). Experiments in this study show that SSL helps gain up to 9% improvement in performance compared to using supervised training alone.
Detection of Terrain Structures in Airborne Laser Scanning Data Using Deep Learning
Bashir Kazimi, Frank Thiemann, and Monika Sester
ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, Nov 2020
Automated recognition of terrain structures is a major research problem in many application areas. These structures can be investigated in raster products such as Digital Elevation Models (DEMs) generated from Airborne Laser Scanning (ALS) data. Following the success of deep learning and computer vision techniques on color images, researchers have focused on the application of such techniques in their respective fields. One example is detection of structures in DEM data. DEM data can be used to train deep learning models, but recently, Du et al. (2019) proposed a multi-modal deep learning approach (hereafter referred to as MM), showing that combining geomorphological information helps improve the performance of deep learning models. They reported that combining DEM, slope, and RGB-shaded relief gives the best result among other combinations consisting of curvature, flow accumulation, topographic wetness index, and grey-shaded relief. In this work, we confirm and build on this approach. First, we use MM and show that combinations of other information such as sky view factors, (simple) local relief models, openness, and local dominance improve model performance even further. Secondly, based on the recently proposed HR-Net (Sun et al., 2019), we build a smaller Multi-Modal High-Resolution network, called MM-HR, that outperforms MM. MM-HR learns with fewer parameters (4 million) and gives an accuracy of 84.2 percent on ZISM50m data, compared to 79.2 percent accuracy by MM, which learns with more parameters (11 million). On the dataset of archaeological mining structures from Harz, the top accuracy by MM-HR is 91.7 percent compared to 90.2 percent by MM.
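A minimal sketch of the multi-modal input idea appears below: derivative rasters are stacked with the DEM as channels of a single tensor, in the same way RGB channels feed an image model. The slope and relief computations here are simple illustrative stand-ins for the derivatives discussed above.

```python
import numpy as np
import torch

dem = np.random.rand(128, 128).astype(np.float32)  # dummy DEM patch

# Simple derivatives used as extra modalities.
dy, dx = np.gradient(dem)
slope = np.sqrt(dx**2 + dy**2)
relief = dem - dem.mean()  # crude stand-in for a local relief model

# Stack modalities as channels and add a batch dimension.
x = torch.from_numpy(np.stack([dem, slope, relief], axis=0)).unsqueeze(0)
# x.shape == (1, 3, 128, 128); feed into a CNN expecting 3 input channels.
```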
2019
Effectiveness of DTM Derivatives for Object Detection Using Deep Learning
Bashir Kazimi, Katharina Malek, Frank Thiemann, and 1 more author
In International Conference on Cultural Heritage and New Technologies 2019, Nov 2019
Deep learning models have achieved significant performances in identification and localization of objects in image data. Researchers in the remote sensing community have adopted such methods for object recognition in remote sensing data, especially raster products of Airborne Laser Scanning (ALS) data such as Digital Terrain Models (DTM). Small patches of larger DTMs, where pixels represent elevations, are cropped to train deep learning models. However, due to the variation in elevation values for the same object in two different regions, deep learning models either fail to converge or take a long time to train. To alleviate the problem, a local preprocessing step such as normalization to a fixed range or local patch standardization is necessary. Another solution is to first calculate other raster products where the pixel values are calculated based on the surrounding pixels within a certain range. Examples of such rasters are Simple Local Relief Models (SLRM), Local Dominance (LD), Sky View Factor (SVF), and Openness (positive and negative). In this research, the effect of using the aforementioned DTM derivatives is studied for detection of historical mining structures in the Harz region in Lower Saxony. The well-known Mask R-CNN model is trained to produce bounding boxes, labels, and segmentation maps for each object in a given input raster.
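As an example of such a derivative, a Simple Local Relief Model can be obtained by subtracting a low-pass filtered surface from the DTM, so that pixel values depend on the local neighbourhood rather than the absolute elevation. The sketch below assumes NumPy/SciPy; the kernel size and the dummy tile are illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def simple_local_relief(dtm: np.ndarray, size: int = 25) -> np.ndarray:
    """Return the DTM minus its local mean surface (a simple local relief model)."""
    return dtm - uniform_filter(dtm, size=size)

slrm = simple_local_relief(np.random.rand(512, 512))  # dummy DTM tile
```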
Semantic Segmentation of Manmade Landscape Structures in Digital Terrain Models
Bashir Kazimi, Frank Thiemann, and Monika Sester
ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, Sep 2019
We explore the use of semantic segmentation in Digital Terrain Models (DTMs) for detecting manmade landscape structures in archaeological sites. DTM data are stored and processed as large matrices of depth 1, as opposed to depth 3 in RGB images. The matrices usually contain continuous real-valued information whose upper bound is not fixed, such as distance or height from a reference surface. This is different from RGB images, which contain integer values in a fixed range of 0 to 255. Additionally, RGB images are usually stored in smaller multidimensional matrices and are more suitable as inputs for a neural network, while large DTMs must be split into smaller sub-matrices before they can be used by neural networks. Thus, while the spatial information of pixels in RGB images is important only locally, within a single image, for DTM data it is important both locally, within a single sub-matrix processed by the network, and globally, in relation to the neighboring sub-matrices. To cope with these two differences, we apply min-max normalization to each input matrix fed to the neural network and use a slightly modified version of the DeepLabv3+ model for semantic segmentation. We show that with the architecture change and the preprocessing, better results are achieved.
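The preprocessing described above can be sketched as follows: a large DTM is split into sub-matrices and each patch is min-max normalized independently, so the network always sees values in [0, 1] regardless of the absolute elevation range. The patch size and the dummy raster are assumptions.

```python
import numpy as np

def to_normalized_patches(dtm: np.ndarray, patch: int = 256) -> np.ndarray:
    """Split a large DTM into patches and min-max normalize each patch."""
    patches = []
    for i in range(0, dtm.shape[0] - patch + 1, patch):
        for j in range(0, dtm.shape[1] - patch + 1, patch):
            p = dtm[i:i + patch, j:j + patch].astype(np.float32)
            lo, hi = p.min(), p.max()
            patches.append((p - lo) / (hi - lo + 1e-8))  # per-patch min-max
    return np.stack(patches)

patches = to_normalized_patches(np.random.rand(1024, 1024))  # shape (16, 256, 256)
```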
Object Instance Segmentation in Digital Terrain Models
Bashir Kazimi, Frank Thiemann, and Monika Sester
In Computer Analysis of Images and Patterns, Sep 2019
We use an object instance segmentation approach in deep learning to detect and outline objects in Digital Terrain Models (DTMs) derived from Airborne Laser Scanning (ALS) data. Object detection methods in computer vision have been extensively applied to RGB images and have achieved excellent results. In this work, we use Mask R-CNN, a well-known object detection and instance segmentation model, to detect objects in archaeological sites by feeding the model with DTM data. Our experiments show successful application of the Mask R-CNN model, originally developed for image data, on DTM data.
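One straightforward way to feed a single-channel DTM patch to an image model is to replicate it across three channels, as sketched below with torchvision's Mask R-CNN; the class count and the dummy patch are illustrative assumptions, not the paper's exact setup.

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(num_classes=4)  # background + 3 object classes (assumed)
model.eval()

dtm = torch.rand(1, 512, 512)        # dummy normalized DTM patch, single channel
image = dtm.repeat(3, 1, 1)          # replicate the channel to mimic an RGB image
with torch.no_grad():
    predictions = model([image])     # list of dicts: boxes, labels, scores, masks
```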
2018
Deep learning for archaeological object detection in airborne laser scanning data
Bashir Kazimi, Frank Thiemann, Katharina Malek, and 2 more authors
In Proceedings of the 2nd Workshop On Computing Techniques For Spatio-Temporal Data in Archaeology And Cultural Heritage co-located with 10th International Conference on Geographical Information Science, Sep 2018
It is important to preserve archaeological monuments as they play a key role in helping us understand human history and accomplishments in periods with little or no written sources. The first step for this purpose is an efficient method for collecting and documenting information about objects of interest for archaeologists. Airborne laser scanning (ALS) is of great use in collecting and documenting detailed measurements from an area of interest. However, it is time consuming for scientists to manually analyze the collected ALS data. One possible way to automate this process is using deep neural networks. In this work, we propose a hierarchical Convolutional Neural Network (CNN) model to classify archaeological objects in ALS data. The data is acquired from the Harz mining region in Lower Saxony, where a high density of different archaeological monuments can be found, including the UNESCO world heritage site Historic Town of Goslar, Mines of Rammelsberg, and the Upper Harz Water Management System. To compare and validate our method, we run experiments on the same data set using two existing deep learning models. The first model is VGG-16, an image classification network pretrained on ImageNet data. The second model is a stacked autoencoders model. The results of the classification as analyzed in this paper show that our model is suitably tuned for this task, as it achieves the best classification accuracy of around 91 percent, compared to 88 percent and 82 percent accuracy by the pretrained and stacked autoencoders models, respectively.
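The hierarchical idea can be sketched as a shared feature extractor with one head for coarse classes and a second head for fine classes conditioned on the coarse prediction. Layer sizes and class counts below are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class HierarchicalCNN(nn.Module):
    def __init__(self, n_coarse: int = 3, n_fine: int = 9):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.coarse_head = nn.Linear(64, n_coarse)
        self.fine_head = nn.Linear(64 + n_coarse, n_fine)

    def forward(self, x):
        f = self.features(x)
        coarse = self.coarse_head(f)
        # The fine prediction sees both the features and the coarse prediction.
        fine = self.fine_head(torch.cat([f, coarse.softmax(dim=1)], dim=1))
        return coarse, fine

coarse_logits, fine_logits = HierarchicalCNN()(torch.rand(2, 1, 64, 64))
```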
Classification of laser scanning data using deep learning
Florian Politz, Bashir Kazimi, and Monika Sester
38th Scientific Technical Annual Meeting of the German Society for Photogrammetry, Remote Sensing and Geoinformation, Sep 2018
In the last couple of years, Deep Learning has gained popularity and shown potential in the field of classification. In contrast to 2D image data, Airborne Laser Scanning data is complex due to its irregular 3D structure, which turns the classification into a difficult task. Classifying point clouds can be separated into pointwise semantic classification and object-based classification. In this paper, we investigate both classification strategies using Convolutional Neural Networks (CNNs). In the first part of this paper, we focus on the semantic classification of 3D point clouds into three classes as required for generating digital terrain models. The second part of this paper deals with classifying archaeological structures in digital terrain models.
2017
Coverage for Character Based Neural Machine Translation
In recent years, Neural Machine Translation (NMT) has achieved state-of-the-art performance in translating from one language (the source language) to another (the target language). However, many of the proposed methods use word embedding techniques to represent a sentence in the source or target language. Character embedding techniques have been suggested for this task to better represent the words in a sentence. Moreover, recent NMT models use an attention mechanism where the most relevant words in a source sentence are used to generate a target word. The problem with this approach is that while some words are translated multiple times, other words are not translated at all. To address this problem, a coverage model has been integrated into NMT to keep track of already-translated words and focus on the untranslated ones. In this research, we present a new architecture in which we use character embedding for representing the source and target languages, and also use a coverage model to make certain that all words are translated. Experiments were performed to compare our model with the coverage and character models, and the results show that our model performs better than the other two.
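The coverage idea can be sketched as an attention layer whose score also sees the attention mass accumulated so far for each source position, so positions that have already been attended to are discouraged. Dimensions and the additive scoring form below are illustrative assumptions, not the exact published model.

```python
import torch
import torch.nn as nn

class CoverageAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.w_enc = nn.Linear(dim, dim)
        self.w_dec = nn.Linear(dim, dim)
        self.w_cov = nn.Linear(1, dim)
        self.v = nn.Linear(dim, 1)

    def forward(self, enc_states, dec_state, coverage):
        # enc_states: (T, dim), dec_state: (dim,), coverage: (T,)
        score = self.v(torch.tanh(
            self.w_enc(enc_states) + self.w_dec(dec_state)
            + self.w_cov(coverage.unsqueeze(-1)))).squeeze(-1)
        attn = torch.softmax(score, dim=0)   # attention over source positions
        coverage = coverage + attn           # accumulate attention mass
        context = attn @ enc_states          # weighted sum of encoder states
        return context, attn, coverage

T, dim = 7, 32
attn_layer = CoverageAttention(dim)
coverage = torch.zeros(T)
context, attn, coverage = attn_layer(torch.randn(T, dim), torch.randn(dim), coverage)
```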