<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://www.selmilab.eu/feed.xml" rel="self" type="application/atom+xml" /><link href="https://www.selmilab.eu/" rel="alternate" type="text/html" /><updated>2026-04-02T13:01:26+00:00</updated><id>https://www.selmilab.eu/feed.xml</id><title type="html">Selmilab</title><subtitle>Write an awesome description for your new site here. You can edit this line in _config.yml. It will appear in your document head meta (for Google search results) and in your feed.xml site description.</subtitle><author><name>Luigi Sellmi</name></author><entry><title type="html">ECRS 2022 Best Presentation Award on Synthetic Aperture Radar Data Processing</title><link href="https://www.selmilab.eu/ecrs2022-best-presentation-award.html" rel="alternate" type="text/html" title="ECRS 2022 Best Presentation Award on Synthetic Aperture Radar Data Processing" /><published>2022-07-27T00:00:00+00:00</published><updated>2022-07-27T00:00:00+00:00</updated><id>https://www.selmilab.eu/ecrs2022-best-presentation-award</id><content type="html" xml:base="https://www.selmilab.eu/ecrs2022-best-presentation-award.html"><![CDATA[<p><img src="/assets/images/ecrs2022/ecrs2022-logo.png" alt="ECRS2022" />
To my great pleasure I have been granted the <a href="https://ecrs-4.sciforum.net/#custom2325">ECRS 2022 Best Presentation
Award of the 4th International Electronic Conference on Remote Sensing</a> for the session on SAR Data Processing. This fact alone might not be of great interest to anyone else, so to help you decide whether to keep reading, these are the topics discussed in this post:</p>

<ol>
  <li>The content of my presentation: I provide some tips for people interested in using Synthetic Aperture Radar imagery to estimate the extent of flooded areas. You do not have to be an expert on electromagnetic theory or digital image processing.</li>
  <li>The delivery: my tips on how to prepare and deliver a successful presentation.</li>
</ol>

<h3 id="the-content">The content</h3>
<p>I have been working on Synthetic Aperture Radar (SAR) imagery from the Copernicus Sentinel-1 satellite for some time and I have found that the gap between the theory that can be learnt from textbooks and the practical knowledge needed to use the data in its many applications is very large. The people interested in using the free SAR data provided by the European Space Agency (ESA) through its Copernicus programme are no longer only electronic engineers or physicists with a solid understanding of electromagnetic theory. Synthetic Aperture Radar is a very powerful technology that allows the monitoring of many natural and human-made phenomena such as floods, oil spills, subsidence, urbanization, volcanic eruptions, landslides, deforestation, and the melting of glaciers and sea ice. Given this broad range of applications, SAR data users have a large variety of backgrounds: GIS analysts, natural resource managers, geologists, environmental researchers, geographers, community managers, land use planners, farmers and economists, just to mention a few categories. The SAR data products that can be downloaded from the <a href="https://scihub.copernicus.eu/dhus/#/home">Copernicus Open Access Hub</a> have to undergo a fairly complex digital image processing chain that depends on the application.</p>

<p><img src="/assets/images/ecrs2022/dip-operators.png" alt="Digital Image Processing Operators available in SNAP" /></p>

<p>In some applications, such as volcanic eruptions, landslides and subsidence, the user needs both the phase and the amplitude of the backscattered signal in order to detect a change that might have occurred on the Earth’s surface. The technique, called SAR interferometry, consists of counting the number of wavelengths of the radar backscatter signal along the path from the target to the antenna on board the satellite, before and after the event that caused the change. The difference in the number of wavelengths is a measure of the change that occurred in the target area. In other applications, such as agriculture, forest mapping, urban area monitoring and oil spill detection, the user might need to estimate the change in the polarization of the backscattered radar signal. As I was working with the ESA <a href="https://step.esa.int/main">Science Toolbox Exploitation Platform</a> to process Sentinel-1 imagery in the middle of July 2021, Germany was hit by a devastating flood that killed 184 people and caused damage estimated at €40 billion. I was living in Bonn at that time, very close to the affected areas, so I decided to use the imagery and the toolbox to assess the extent of the flooded area. The tutorials available on the tool’s website were mostly addressed to people already skilled in remote sensing and digital image processing: they described in detail the mathematical operators required to separate the flooded areas from the rest of the image, but without a clear explanation of why those operators were required and in which order. I wrote some notes for myself putting together the sequence of operators required for the task at hand, among those available in the toolbox, with a short explanation of their function. After a couple of weeks I thought the notes might also be useful to other people who are not experts in SAR digital image processing, and I sent them to the <a href="https://forum.step.esa.int/">STEP Forum</a>.
The notes were quite well received and were added to the list of <a href="https://step.esa.int/main/doc/tutorials/">Sentinel-1 Toolbox tutorials</a>. I shared the link to the tutorial in a LinkedIn group interested in Earth Observation, where it was well appreciated, and I decided to take part in the 4th International Electronic Conference on Remote Sensing with my tutorial. The content was ready, but what about presenting it?</p>

<h3 id="the-delivery">The delivery</h3>
<p>I have given many presentations in my work but I never felt I was good at it. Eventually I came to the conclusion that presenting is something to be learnt, not something we are naturally gifted at. So I looked for textbooks and online courses and finally found the course I was looking for, one that could help me improve my slides and my speech: Matt McGarrity’s <a href="https://www.coursera.org/specializations/public-speaking">Dynamic Public Speaking</a>, freely available on the Coursera platform. It is made up of four courses, not just one: an introductory part on rhetoric and the origin of public speaking, a course on how to organize and structure your presentation and slides, one about persuasive argumentation, and a last course on how to give an inspirational speech. In a few sentences, this is what I learnt from the course:</p>

<ul>
  <li>Make clear what message you want to deliver to your audience.</li>
  <li>Structure your speech. Provide an introduction, state the problem you are going to talk about and say what you have done to solve it. Provide a conclusion.</li>
  <li>Do not add too much text to your slides: better no words at all than too many. Remember the adage “a picture is worth a thousand words”.</li>
  <li>Do not read out your talk, ever. Rehearse your presentation in front of a mirror or with someone kind enough to listen to you.</li>
</ul>

<h3 id="conclusion">Conclusion</h3>
<p>SAR is a powerful technology with many applications and an increasing number of users with different backgrounds, eager to use the data and the tools made available by the Copernicus programme. Each application exploits different properties of electromagnetic radiation and of its interaction with the Earth’s surface, which might be difficult to grasp without long training in electromagnetic theory and digital image processing. The learning curve can be made much less steep by developing tutorials that combine practical information on how to use a tool with contextual information about the theory relevant to the specific application.</p>

<p>You can get my tutorial on the ESA STEP website, section <a href="http://step.esa.int/main/doc/tutorials/">Sentinel-1 Toolbox</a> (SAR Applications): Flood mapping using the Sentinel-1 imagery and the ESA SNAP S1-Toolbox, October 2021, or <a href="/assets/pdf/flood_mapping_using_sentinel-1_imagery_v1.pdf">download it</a> directly from this website.</p>]]></content><author><name>Luigi Sellmi</name></author><category term="Remote_Sensing" /><category term="Synthetic_Aperture_Radar" /><category term="Copernicus" /><category term="Sentinel-1" /><category term="Climate_Change" /><category term="Flooding" /><category term="Natural_Hazard" /><summary type="html"><![CDATA[To my great pleasure I have been granted the ECRS 2022 Best Presentation Award of the 4th International Electronic Conference on Remote Sensing for the session on SAR Data Processing. This post covers the content of my presentation, with some tips for people interested in using Synthetic Aperture Radar imagery to estimate the extent of flooded areas, and my tips on how to prepare and deliver a successful presentation.]]></summary></entry><entry><title type="html">Land Use and Land Cover Classification using a ResNet Deep Learning Architecture</title><link href="https://www.selmilab.eu/lulc-classification.html" rel="alternate" type="text/html" title="Land Use and Land Cover Classification using a ResNet Deep Learning Architecture" /><published>2021-09-01T00:00:00+00:00</published><updated>2021-09-01T00:00:00+00:00</updated><id>https://www.selmilab.eu/land-use-land-cover-classification</id><content type="html" xml:base="https://www.selmilab.eu/lulc-classification.html"><![CDATA[<p>The goal of this experiment was to test how accurately Convolutional Neural Networks can learn the spatial and spectral characteristics of image patches of the Earth’s surface, extracted from satellite images, for Land Use and Land Cover (LULC) classification tasks.</p>

<p><img src="/assets/images/sentinel-2/Sentinel-2-MSI_overview.jpg" alt="Sentinel-2" /></p>

<p>To achieve my goal I used a transfer learning technique: I took a pretrained ResNet CNN architecture [1] and finetuned it with the EuroSAT dataset [2], a collection of labelled image patches extracted from the products of the Copernicus Sentinel-2 satellites. I used the <a href="https://www.fast.ai/">Fastai</a> deep learning library to write the Python code that trains and validates the system, and Google Colab to execute it on a GPU. I also performed some additional LULC classification tests using new images extracted from the Copernicus Sentinel-2 products through the Sentinel-Hub EO-Browser. In the next four sections I provide some information about the LULC classification task, the Sentinel-2 satellites and their products, the EuroSAT dataset, and the finetuning technique. In the coding sections that follow I describe all the steps required to set up a deep learning architecture to accomplish the LULC task and to assess the accuracy of the results.</p>

<h2 id="land-use-and-land-cover-lulc-classification">Land Use and Land Cover (LULC) classification</h2>
<p>Land cover indicates the type of surface, such as forest or river, whereas land use indicates how people are using the land. Land cover can be determined from the reflectance properties of the surface. This information is commonly extracted from aerial or satellite digital images whose pixel values represent the solar energy reflected by the Earth’s surface in different spectral bands. The land cover class of a pixel can be determined by using combinations of spectral bands. For instance, vegetation has a stronger reflectance in the near-infrared region of the spectrum and can be better observed using bands B7, B8, B8a and B9 of the Sentinel-2 MSI than using the bands in the visible region of the spectrum, B4, B3 and B2 (or RGB).</p>
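<p>As an illustration of combining spectral bands, here is a minimal sketch of the Normalized Difference Vegetation Index (NDVI), a standard index that contrasts the near-infrared band (B8 on Sentinel-2) with the red band (B4). NDVI is not used elsewhere in this post, and the reflectance values below are hypothetical, chosen only to show the contrast between vegetation and bare soil.</p>

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from two reflectance values."""
    total = nir + red
    if total == 0:
        return 0.0  # avoid a division by zero on empty pixels
    return (nir - red) / total


# Hypothetical reflectance values (assumptions, not measured data)
vegetation = ndvi(nir=0.45, red=0.05)  # strong near-infrared reflectance
bare_soil = ndvi(nir=0.30, red=0.25)   # similar reflectance in both bands

print(f"vegetation: {vegetation:.2f}, bare soil: {bare_soil:.2f}")
```

<p>Values close to 1 suggest dense vegetation, while values near 0 suggest bare or built-up surfaces.</p>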

<p><img src="/assets/images/sentinel-2/usgs_spectral_bands.png" alt="Sentinel-2 spectral bands" /></p>

<p>In the visible region, dry grass has a stronger reflectance in band B4 (red) than in the other two bands, B2 and B3 (blue and green). Classical machine learning algorithms such as Random Forests or Support Vector Machines are used to improve the accuracy of the classification. On the other hand, spectral data at the pixel level alone cannot provide information about the land use, and an image patch has to be considered in its entirety to infer its use. Additional information is often required to disambiguate among all the possible uses of a piece of land. Different classification systems have been developed over the years whose goal is to define a taxonomy of land cover and land use. One such classification system is the CORINE land cover nomenclature [3], which contains 44 classes. CORINE is a land cover inventory performed every 6 years by the Copernicus Land Monitoring Service to monitor the changes in land use and land cover over the European continent. The maps produced by CORINE are based on Sentinel-2 images classified at the pixel level according to the nomenclature and on information available from national cadastres. Since deep learning algorithms for computer vision became available, researchers have been developing models to perform LULC tasks that can be run whenever new images are available, without using cadastral information that might be expensive, out of date or not publicly available.</p>

<h2 id="the-copernicus-sentinel-2-satellites">The Copernicus Sentinel-2 satellites</h2>
<p>The Copernicus <a href="https://www.esa.int/Applications/Observing_the_Earth/Copernicus/Sentinel-2">Sentinel-2</a> constellation consists of two identical Earth observation satellites, launched and operated by the European Space Agency (ESA). The satellites fly in a Sun-synchronous polar orbit at 786 km altitude, 180° apart from each other, so that the same area can be revisited every 5 days. Both satellites carry a high spatial resolution multispectral imager (MSI) with a 290 km swath that collects the solar energy reflected by the Earth’s surface in 13 spectral bands, from the visible to the short-wave infrared spectral range. The spatial resolution ranges from 10 m per pixel (B2, B3, B4, and B8 bands) to 60 m per pixel. ESA processes the raw data for radiometric calibration, orthorectification, atmospheric correction and georeferencing, and delivers the processed images as Level-1C (L1C) and Level-2A (L2A) <a href="https://sentinels.copernicus.eu/web/sentinel/missions/sentinel-2/data-products">data products</a>. Both products are delivered as 100 km x 100 km tiles in UTM/WGS84 projection. L1C products are not processed for atmospheric correction and are described as Top-Of-Atmosphere reflectance data, while L2A products have received atmospheric correction and are described as Bottom-Of-Atmosphere reflectance data. L1C and L2A products can be obtained under an open access license from the <a href="https://scihub.copernicus.eu/dhus/#/home">Copernicus Open Access Hub</a> or from other providers such as <a href="https://apps.sentinel-hub.com/eo-browser/">Sentinel-Hub</a>.</p>

<h2 id="the-eurosat-dataset">The EuroSAT dataset</h2>
<p>The EuroSAT dataset was created at the Deutsches Forschungszentrum für Künstliche Intelligenz (<a href="https://www.dfki.de/web/">DFKI</a>). The images were extracted from Sentinel-2A L1C products covering cities in 34 European countries, acquired throughout a year. The dataset consists of two subsets: RGB and multispectral. Each subset contains 27000 images divided into 10 classes, with 2000 to 3000 images per class. The classes defined to label the images are a subset of those defined in CORINE: Pasture, HerbaceousVegetation, Industrial, AnnualCrop, Residential, PermanentCrop, Highway, SeaLake, Forest, River. The RGB dataset contains images covering the three bands in the visible region of the spectrum (RGB colors). The multispectral dataset contains images covering all the 13 spectral bands available from the MSI sensor of the Sentinel-2 satellites. The size of each image patch is 64x64 pixels with 10 m resolution. In this notebook I use only the RGB dataset.</p>
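<p>To get a feeling for these numbers, the short sketch below works out the ground footprint of a single patch and the average number of images per class. All figures come from the paragraph above; the variable names are only illustrative.</p>

```python
# Figures taken from the text: 27000 patches, 10 classes, 64x64 pixels at 10 m per pixel
CLASSES = ["AnnualCrop", "Forest", "HerbaceousVegetation", "Highway", "Industrial",
           "Pasture", "PermanentCrop", "Residential", "River", "SeaLake"]
PATCH_PIXELS = 64
RESOLUTION_M = 10
TOTAL_IMAGES = 27000

patch_side_m = PATCH_PIXELS * RESOLUTION_M            # side of a patch on the ground, in metres
patch_area_km2 = (patch_side_m / 1000) ** 2           # area of a patch in square kilometres
mean_images_per_class = TOTAL_IMAGES / len(CLASSES)   # average class size

print(f"Each patch covers {patch_side_m} m x {patch_side_m} m ({patch_area_km2:.4f} km2)")
print(f"Average images per class: {mean_images_per_class:.0f}")
```

<p>A patch therefore shows a square of land 640 m on each side, which is enough context to recognise a land use such as a highway or a residential area.</p>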

<h2 id="finetuning-a-pretrained-resnet-architecture">Finetuning a pretrained ResNet architecture</h2>
<p>A Convolutional Neural Network that can learn how to distinguish different types of land cover, where geometries and reflectance properties can be mixed in many different ways, requires an architecture with many layers to achieve good accuracy. Such architectures are expensive to train from scratch, both in terms of the amount of labeled data needed for training and in terms of time and computing resources. It is nowadays normal practice in computer vision to reuse a model that has been pretrained on a different but large set of examples, such as ImageNet, and finetune the pretrained model with data that is specific to the task at hand, in our case LULC classification. Finetuning is a transfer learning technique in which the parameters of a pretrained neural network architecture are updated using the new data. I will use the ResNet50 architecture, pretrained on the ImageNet dataset, as suggested in the EuroSAT paper. I will use the fastai deep learning library to implement all the functions required in this notebook. Fastai is a high-level library that leverages the underlying PyTorch machine learning framework.</p>

<h2 id="prepare-the-development-environment">Prepare the development environment</h2>
<p>I use Google Colab for development and to test the code on the GPUs that are provided free of charge. The machines available in Google Colab come with an old version of the fastai library that does not support the functions and classes I will be using, so the library must be updated. I use fastai version 2.5.1, but any later version should also work.</p>
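<p>In a Colab cell the library is typically upgraded with pip (for instance <code>!pip install -U fastai</code>). The sketch below is one possible way to check afterwards that the installed version is recent enough. The helper names are hypothetical and the parser only handles plain numeric versions such as 2.5.1.</p>

```python
def parse_version(text: str) -> tuple:
    """Turn a plain 'major.minor.patch' string into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed: str, minimum: str = "2.5.1") -> bool:
    """Return True when the installed version is at least the minimum one."""
    return parse_version(installed) >= parse_version(minimum)


# In the notebook this could be called as:
#   import fastai
#   assert meets_minimum(fastai.__version__)
print(meets_minimum("2.5.1"), meets_minimum("1.0.61"))
```

<p>Comparing tuples of integers rather than strings avoids ranking a version such as 2.10.0 below 2.5.1.</p>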

<h2 id="download-the-eurosat-rgb-dataset">Download the EuroSAT RGB dataset</h2>
<p>I download the EuroSAT RGB <a href="https://madm.dfki.de/downloads">images</a> from the DFKI website. The images are compressed in a zip file and divided into 10 folders named after the classes defined for the classification task.</p>
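<p>Once the archive is unpacked, the folder layout can be checked with a few lines of standard-library Python. This is only a sketch: the function name and the example path are hypothetical, and it assumes one sub-folder per class containing .jpg files, as described above.</p>

```python
from pathlib import Path


def count_images_per_class(root) -> dict:
    """Map each class folder under `root` to the number of .jpg files it holds."""
    counts = {}
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir():
            counts[folder.name] = sum(1 for _ in folder.glob("*.jpg"))
    return counts


# Example call, with a hypothetical location of the unzipped dataset:
#   counts = count_images_per_class("data/eurosat_rgb")
#   print(counts)
```

<p>With the full dataset the result should list 10 folders, each holding between 2000 and 3000 images.</p>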

<h2 id="define-the-dataloaders-the-test-and-validation-set-and-the-transformations-to-be-applied">Define the DataLoaders, the training and validation sets, and the transformations to be applied</h2>
<p>The fastai library provides classes and functions to specify the type of example data (images, text files, tabular data), which transformations must be applied to them (e.g. resizing, cropping), how to find the images in the file system, how to extract their labels (classes), and how to split the EuroSAT dataset into a training set and a validation set. We can also artificially increase the number of images using the data augmentation technique. Fastai provides a convenience function that creates more images from the original EuroSAT dataset by applying different transformations such as flipping, rotation, brightness modification and others. I use the fastai default splitting value, 80% for training and 20% for validation. I do not apply any resizing since all the images in EuroSAT used to finetune the pretrained model have the same shape (64x64 pixels). The main step is to transform all the JPG images into PyTorch tensors, i.e. multidimensional float arrays that can be sent in batches to a GPU for computations. The fastai DataBlock class creates batches of images as 4-dimensional tensors for both the training and the validation steps. A batch is a tensor whose dimensions are B x C x H x W, where B stands for batch size, C for channels (e.g. the 3 RGB channels), H for height, and W for width. The default batch size for a DataBlock is 64.</p>

<figure class="highlight"><pre><code class="language-text" data-lang="text">from fastai.vision.all import *  # provides DataBlock, ImageBlock, aug_transforms, ...

# `path` is the folder with the unzipped EuroSAT RGB images (see the download step)
blocks = DataBlock(blocks=(ImageBlock, CategoryBlock),
                   get_items=get_image_files,  # finds the images in the path
                   splitter=RandomSplitter(seed=42),  # default random split: 80% training, 20% validation
                   get_y=using_attr(RegexLabeller(r'(.+)_\d+\.jpg$'), 'name'),  # extracts the label from the file name, e.g. AnnualCrop_1.jpg
                   batch_tfms=aug_transforms(mult=2))  # data augmentation (mult multiplies the default transformation values)

dls = blocks.dataloaders(path)</code></pre></figure>
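<p>The labelling and splitting steps can be sketched in plain Python to see what happens behind the DataBlock. This is an illustration only: the helper names are hypothetical, and the shuffle below mirrors the idea of a seeded random split without reproducing the exact permutation produced by fastai.</p>

```python
import math
import random
import re

# Label followed by an underscore, a number and the .jpg extension
LABEL_PATTERN = re.compile(r"(.+)_\d+\.jpg$")


def label_from_name(file_name: str) -> str:
    """Extract the class label from a file name such as 'Forest_12.jpg'."""
    match = LABEL_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"unexpected file name: {file_name}")
    return match.group(1)


def random_split(n_items: int, valid_pct: float = 0.2, seed: int = 42):
    """Shuffle the indices 0..n_items-1 and cut them into training and validation lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    cut = int(n_items * valid_pct)
    return indices[cut:], indices[:cut]


train_idx, valid_idx = random_split(27000)
batch_size = 64
batch_shape = (batch_size, 3, 64, 64)  # B x C x H x W for RGB patches of 64x64 pixels
valid_batches = math.ceil(len(valid_idx) / batch_size)

print(label_from_name("AnnualCrop_1.jpg"), len(train_idx), len(valid_idx), valid_batches)
```

<p>With 27000 images the split gives 21600 training and 5400 validation images, and the validation set fits in 85 batches of at most 64 images.</p>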

<h2 id="data-augmentation">Data augmentation</h2>
<p>Additional images are generated from the original ones by the data augmentation technique set up in the previous step with the fastai aug_transforms() convenience function, which applies random transformations to the images of each batch. Here I show the images that are created from one example.
<img src="/assets/images/sentinel-2/data_augmentation.png" alt="Data Augmentation" /></p>
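<p>To make the idea concrete, the toy sketch below applies three of the transformations mentioned earlier (flipping, rotation and brightness modification) to a tiny 2x2 single-channel patch stored as nested lists. It is not how fastai implements augmentation, which works on tensors with randomly drawn parameters; the function names are hypothetical.</p>

```python
def flip_horizontal(patch):
    """Mirror a 2D patch (list of rows) left to right."""
    return [list(reversed(row)) for row in patch]


def rotate_90_clockwise(patch):
    """Rotate a 2D patch by 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


def adjust_brightness(patch, factor, max_value=255):
    """Scale pixel values by `factor`, clipping to the valid range."""
    return [[min(max_value, max(0, round(v * factor))) for v in row] for row in patch]


patch = [[10, 20],
         [30, 40]]

print(flip_horizontal(patch))
print(rotate_90_clockwise(patch))
print(adjust_brightness(patch, 1.5))
```

<p>Each transformed patch keeps the same label as the original, which is why augmentation enlarges the training set without any new labelling effort.</p>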

<h2 id="setup-the-resnet50-convnet-pretrained-architecture">Setup the ResNet50 ConvNet pretrained architecture</h2>
<p>Now we select the pretrained architecture to be finetuned using the EuroSAT dataset and the metric we want to use to check whether our model meets our expectations. In this case we use accuracy as the metric, which can be simply computed from the error rate (accuracy = 1 - error_rate). We use the ResNet50 architecture pretrained on the ImageNet dataset, as in the EuroSAT paper. The ResNet50 CNN architecture contains a sequence of blocks of 3 convolutional layers each, for a total of 50 layers.</p>
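<p>The relation between accuracy and error rate can be written out in a few lines. The labels below are hypothetical and the functions are a plain-Python sketch, not the fastai metric implementations.</p>

```python
def error_rate(predicted, actual):
    """Fraction of predictions that differ from the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


def accuracy(predicted, actual):
    """Fraction of correct predictions, i.e. 1 - error_rate."""
    return 1 - error_rate(predicted, actual)


# Hypothetical labels for five validation images
actual = ["Forest", "River", "Highway", "Forest", "SeaLake"]
predicted = ["Forest", "Highway", "Highway", "Forest", "SeaLake"]

print(f"error rate: {error_rate(predicted, actual):.2f}, accuracy: {accuracy(predicted, actual):.2f}")
```

<p>Here one image out of five is misclassified, so the error rate is 0.2 and the accuracy is 0.8.</p>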

<p><img src="/assets/images/sentinel-2/resnet_bottleneck_building_block.png" alt="ResNet bottleneck block" class="align-center" /></p>

<p>Each block contains a shortcut connection from the input to the output so that it learns the difference between them, that is, the residual. This architectural choice avoids the degradation problem that affects other architectures with many convolutional layers. As said before, a deep architecture is required to learn many complex features from data. Another advantage of the ResNet50 architecture is that the number of parameters does not depend on the size of the images. The convolution kernels and the number of channels, also known as feature maps, are the same at each layer whatever the input size, and only the spatial size of the feature maps changes, so that the original ResNet50 parameters, computed using 224x224 pixel images, can be updated during the finetuning process using images of a different size such as the 64x64 pixel images of the EuroSAT dataset. The final layer reduces the tensor to a one-dimensional vector of size 10, the number of classes, which is finally sent to a softmax layer that computes the probability that an image belongs to each of the 10 classes used to classify the EuroSAT images. The fastai convenience function cnn_learner() is used to customize the learning process by setting different hyperparameters such as the optimizer (e.g. Stochastic Gradient Descent), the loss function, the learning rate and many others. A summary of the architecture is shown before starting the finetuning process.</p>
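<p>As a side note, the softmax step described above can be sketched in plain Python. The raw scores are hypothetical and the order of the classes is arbitrary; in the real model this computation is carried out by PyTorch on whole batches.</p>

```python
import math

CLASSES = ["AnnualCrop", "Forest", "HerbaceousVegetation", "Highway", "Industrial",
           "Pasture", "PermanentCrop", "Residential", "River", "SeaLake"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for one image, one score per class
scores = [0.1, 2.5, 0.3, -1.0, 0.0, 0.2, -0.5, 0.4, 1.2, -0.3]
probabilities = softmax(scores)
best = CLASSES[probabilities.index(max(probabilities))]

print(f"predicted class: {best}")
```

<p>The class with the highest score also gets the highest probability, so the prediction is simply the position of the maximum.</p>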

<figure class="highlight"><pre><code class="language-text" data-lang="text">learn = cnn_learner(dls, resnet50, metrics=accuracy)
learn.summary()</code></pre></figure>
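<p>Some of the figures in the summary output can be reproduced by hand, which also shows why the number of parameters does not depend on the image size. The sketch below assumes the usual ResNet50 stem (a 7x7 convolution with stride 2 and padding 3, followed by a 3x3 max pooling with stride 2 and padding 1) and bias-free convolutions; these details are assumptions taken from the standard architecture, not from this post.</p>

```python
def conv_params(kernel: int, in_channels: int, out_channels: int) -> int:
    """Weights of a bias-free 2D convolution: no dependence on the image size."""
    return kernel * kernel * in_channels * out_channels


def output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial size after a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


stem = conv_params(7, 3, 64)         # first 7x7 convolution on the 3 RGB channels
reduce_1x1 = conv_params(1, 64, 64)  # 1x1 convolution of the first bottleneck block
middle_3x3 = conv_params(3, 64, 64)  # 3x3 convolution of the first bottleneck block
expand_1x1 = conv_params(1, 64, 256) # 1x1 convolution expanding to 256 channels

# Spatial size of the feature maps for EuroSAT patches and for ImageNet images
eurosat_sizes = [output_size(64, 7, 2, 3)]
eurosat_sizes.append(output_size(eurosat_sizes[0], 3, 2, 1))
imagenet_sizes = [output_size(224, 7, 2, 3)]
imagenet_sizes.append(output_size(imagenet_sizes[0], 3, 2, 1))

print(stem, reduce_1x1, middle_3x3, expand_1x1)
print(eurosat_sizes, imagenet_sizes)
```

<p>The four parameter counts match the first Conv2d rows of the summary, and the 32 and 16 pixel feature maps match its output shapes, while the same weights would produce 112 and 56 pixel feature maps for a 224x224 input.</p>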

<figure class="highlight"><pre><code class="language-text" data-lang="text">Sequential (Input shape: 64)
============================================================================
Layer (type)         Output Shape         Param #    Trainable
============================================================================
                     64 x 64 x 32 x 32   
Conv2d                                    9408       False     
BatchNorm2d                               128        True      
ReLU                                                           
MaxPool2d                                                      
Conv2d                                    4096       False     
BatchNorm2d                               128        True      
Conv2d                                    36864      False     
BatchNorm2d                               128        True      
____________________________________________________________________________
                     64 x 256 x 16 x 16  
Conv2d                                    16384      False     
BatchNorm2d                               512        True      
ReLU                                                           
____________________________________________________________________________
                     64 x 256 x 16 x 16  
Conv2d                                    16384      False     
BatchNorm2d                               512        True      
____________________________________________________________________________
                     64 x 64 x 16 x 16   
Conv2d                                    16384      False     
BatchNorm2d                               128        True      
Conv2d                                    36864      False     
BatchNorm2d                               128        True      
____________________________________________________________________________
                     64 x 256 x 16 x 16  
Conv2d                                    16384      False     
BatchNorm2d                               512        True      
ReLU                                                           
____________________________________________________________________________
                     64 x 64 x 16 x 16   
Conv2d                                    16384      False     
BatchNorm2d                               128        True      
Conv2d                                    36864      False     
BatchNorm2d                               128        True      
____________________________________________________________________________
                     64 x 256 x 16 x 16  
Conv2d                                    16384      False     
BatchNorm2d                               512        True      
ReLU                                                           
____________________________________________________________________________
                     64 x 128 x 16 x 16  
Conv2d                                    32768      False     
BatchNorm2d                               256        True      
____________________________________________________________________________
                     64 x 128 x 8 x 8    
Conv2d                                    147456     False     
BatchNorm2d                               256        True      
____________________________________________________________________________
                     64 x 512 x 8 x 8    
Conv2d                                    65536      False     
BatchNorm2d                               1024       True      
ReLU                                                           
____________________________________________________________________________
                     64 x 512 x 8 x 8    
Conv2d                                    131072     False     
BatchNorm2d                               1024       True      
____________________________________________________________________________
                     64 x 128 x 8 x 8    
Conv2d                                    65536      False     
BatchNorm2d                               256        True      
Conv2d                                    147456     False     
BatchNorm2d                               256        True      
____________________________________________________________________________
                     64 x 512 x 8 x 8    
Conv2d                                    65536      False     
BatchNorm2d                               1024       True      
ReLU                                                           
____________________________________________________________________________
                     64 x 128 x 8 x 8    
Conv2d                                    65536      False     
BatchNorm2d                               256        True      
Conv2d                                    147456     False     
BatchNorm2d                               256        True      
____________________________________________________________________________
                     64 x 512 x 8 x 8    
Conv2d                                    65536      False     
BatchNorm2d                               1024       True      
ReLU                                                           
____________________________________________________________________________
                     64 x 128 x 8 x 8    
Conv2d                                    65536      False     
BatchNorm2d                               256        True      
Conv2d                                    147456     False     
BatchNorm2d                               256        True      
____________________________________________________________________________
                     64 x 512 x 8 x 8    
Conv2d                                    65536      False     
BatchNorm2d                               1024       True      
ReLU                                                           
____________________________________________________________________________
                     64 x 256 x 8 x 8    
Conv2d                                    131072     False     
BatchNorm2d                               512        True      
____________________________________________________________________________
                     64 x 256 x 4 x 4    
Conv2d                                    589824     False     
BatchNorm2d                               512        True      
____________________________________________________________________________
                     64 x 1024 x 4 x 4   
Conv2d                                    262144     False     
BatchNorm2d                               2048       True      
ReLU                                                           
____________________________________________________________________________
                     64 x 1024 x 4 x 4   
Conv2d                                    524288     False     
BatchNorm2d                               2048       True      
____________________________________________________________________________
                     64 x 256 x 4 x 4    
Conv2d                                    262144     False     
BatchNorm2d                               512        True      
Conv2d                                    589824     False     
BatchNorm2d                               512        True      
____________________________________________________________________________
                     64 x 1024 x 4 x 4   
Conv2d                                    262144     False     
BatchNorm2d                               2048       True      
ReLU                                                           
____________________________________________________________________________
                     64 x 256 x 4 x 4    
Conv2d                                    262144     False     
BatchNorm2d                               512        True      
Conv2d                                    589824     False     
BatchNorm2d                               512        True      
____________________________________________________________________________
                     64 x 1024 x 4 x 4   
Conv2d                                    262144     False     
BatchNorm2d                               2048       True      
ReLU                                                           
____________________________________________________________________________
                     64 x 256 x 4 x 4    
Conv2d                                    262144     False     
BatchNorm2d                               512        True      
Conv2d                                    589824     False     
BatchNorm2d                               512        True      
____________________________________________________________________________
                     64 x 1024 x 4 x 4   
Conv2d                                    262144     False     
BatchNorm2d                               2048       True      
ReLU                                                           
____________________________________________________________________________
                     64 x 256 x 4 x 4    
Conv2d                                    262144     False     
BatchNorm2d                               512        True      
Conv2d                                    589824     False     
BatchNorm2d                               512        True      
____________________________________________________________________________
                     64 x 1024 x 4 x 4   
Conv2d                                    262144     False     
BatchNorm2d                               2048       True      
ReLU                                                           
____________________________________________________________________________
                     64 x 256 x 4 x 4    
Conv2d                                    262144     False     
BatchNorm2d                               512        True      
Conv2d                                    589824     False     
BatchNorm2d                               512        True      
____________________________________________________________________________
                     64 x 1024 x 4 x 4   
Conv2d                                    262144     False     
BatchNorm2d                               2048       True      
ReLU                                                           
____________________________________________________________________________
                     64 x 512 x 4 x 4    
Conv2d                                    524288     False     
BatchNorm2d                               1024       True      
____________________________________________________________________________
                     64 x 512 x 2 x 2    
Conv2d                                    2359296    False     
BatchNorm2d                               1024       True      
____________________________________________________________________________
                     64 x 2048 x 2 x 2   
Conv2d                                    1048576    False     
BatchNorm2d                               4096       True      
ReLU                                                           
____________________________________________________________________________
                     64 x 2048 x 2 x 2   
Conv2d                                    2097152    False     
BatchNorm2d                               4096       True      
____________________________________________________________________________
                     64 x 512 x 2 x 2    
Conv2d                                    1048576    False     
BatchNorm2d                               1024       True      
Conv2d                                    2359296    False     
BatchNorm2d                               1024       True      
____________________________________________________________________________
                     64 x 2048 x 2 x 2   
Conv2d                                    1048576    False     
BatchNorm2d                               4096       True      
ReLU                                                           
____________________________________________________________________________
                     64 x 512 x 2 x 2    
Conv2d                                    1048576    False     
BatchNorm2d                               1024       True      
Conv2d                                    2359296    False     
BatchNorm2d                               1024       True      
____________________________________________________________________________
                     64 x 2048 x 2 x 2   
Conv2d                                    1048576    False     
BatchNorm2d                               4096       True      
ReLU                                                           
AdaptiveAvgPool2d                                              
AdaptiveMaxPool2d                                              
Flatten                                                        
BatchNorm1d                               8192       True      
Dropout                                                        
____________________________________________________________________________
                     64 x 512            
Linear                                    2097152    True      
ReLU                                                           
BatchNorm1d                               1024       True      
Dropout                                                        
____________________________________________________________________________
                     64 x 10             
Linear                                    5120       True      
____________________________________________________________________________

Total params: 25,619,520
Total trainable params: 2,164,608
Total non-trainable params: 23,454,912</code></pre></figure>
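
<p>The parameter counts in the summary can be checked by hand. The following is a minimal sketch, not part of the original notebook: the helper functions are hypothetical names introduced here, and the assumption that the convolutional and linear layers carry no bias term, as well as the 4096 input features of the head, are inferred from the counts reported above.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python">def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights of a k x k convolution, assuming no bias term."""
    return in_channels * out_channels * kernel_size * kernel_size

def batchnorm_params(channels):
    """One scale and one shift parameter per channel."""
    return 2 * channels

def linear_params(in_features, out_features):
    """Weights of a fully connected layer, assuming no bias term."""
    return in_features * out_features

# Individual layers taken from the summary
print(conv2d_params(64, 256, 1))    # 1x1 convolution, 64 to 256 channels
print(conv2d_params(64, 64, 3))     # 3x3 convolution, 64 to 64 channels
print(batchnorm_params(256))        # BatchNorm2d on 256 channels
print(linear_params(4096, 512))     # first Linear layer of the head
print(linear_params(512, 10))       # final Linear layer, one output per class

# Head added on top of ResNet50: BatchNorm1d, Linear, BatchNorm1d, Linear
head = (batchnorm_params(4096) + linear_params(4096, 512)
        + batchnorm_params(512) + linear_params(512, 10))

trainable_total = 2164608    # "Total trainable params" in the summary
frozen_total = 23454912      # "Total non-trainable params" in the summary

print(head)                            # parameters of the new head
print(trainable_total - head)          # trainable parameters outside the head
print(trainable_total + frozen_total)  # should match "Total params"</code></pre></figure>

<p>Under these assumptions the head accounts for 2,111,488 of the trainable parameters. The remaining 53,120 belong to the BatchNorm2d layers, which the summary marks as trainable, while all the pretrained convolutional weights stay frozen at this stage.</p>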

<h2 id="finetuning">Finetuning</h2>
<p>The ResNet50 architecture is now loaded with the pretrained parameters and the finetuning process can start. I set the number of epochs to a value that achieves a good enough accuracy without wasting too much time and computing resources; we can try different numbers of epochs to find the right one. Within an epoch, each batch of images is used in turn to compute the average value of the loss and to update the model parameters, until all the batches have been used and a new epoch can start. Fastai provides a convenience function, fine_tune(), that does this work for us.</p>

<figure class="highlight"><pre><code class="language-text" data-lang="text">learn.fine_tune(20)</code></pre></figure>

<p>We have achieved an accuracy close to 98%. The exact value changes slightly each time we finetune the model. If that level of accuracy is enough for our purpose we can stop here; otherwise we can run more epochs, try different learning rates, or apply more transformations to further augment the training images. The plot below shows the accuracy of the finetuned model as a function of the number of epochs.</p>

<p><img src="/assets/images/sentinel-2/accuracy.png" alt="accuracy" class="align-center" /></p>
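
<p>To make the relation between epochs and batches concrete, here is a small back-of-the-envelope sketch based on the figures quoted in this post (27000 images, 80%/20% split, batch size 64). The variable names are hypothetical, and whether the last incomplete batch is dropped depends on the DataLoader settings, so both cases are shown.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python">import math

n_images = 27000     # size of the EuroSAT RGB dataset
valid_pct = 0.2      # share of images held out for validation
batch_size = 64      # default batch size
epochs = 20          # value passed to fine_tune()

n_valid = int(valid_pct * n_images)
n_train = n_images - n_valid

# One parameter update is performed per training batch
full_batches = n_train // batch_size            # last incomplete batch dropped
all_batches = math.ceil(n_train / batch_size)   # last incomplete batch kept

print(n_train, n_valid)
print(full_batches, all_batches)
print(epochs * full_batches)   # parameter updates over the 20 epochs</code></pre></figure>

<p>With these numbers each epoch consists of 337 or 338 parameter updates. By default fine_tune() also appears to run one extra epoch with the pretrained layers frozen before the requested ones, so the real number of updates is slightly higher than the last figure printed.</p>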

<h2 id="model-evaluation">Model evaluation</h2>
<p>We want to understand how accurately our model separates images that belong to different classes. For example, rivers (turbid water) and roads are known to have a similar spectral response in the visible part of the spectrum, since both absorb most of the solar radiation, so within a patch of 640 m x 640 m seen from a long distance a river and a highway might be confused. We should therefore not be surprised if the model classifies some images of rivers as roads and vice versa. Likewise, the distinction between Pasture, Permanent Crop, Annual Crop and Herbaceous Vegetation may not be clear using only the three RGB bands, so we can expect a certain number of misclassifications among these classes as well. We plot the confusion matrix computed on the validation data to see which classes are confused with each other.</p>

<figure class="highlight"><pre><code class="language-text" data-lang="text">interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)</code></pre></figure>

<p><img src="/assets/images/sentinel-2/confusion_matrix.png" alt="Confusion matrix" class="align-center" /></p>

<p>As expected, most of the misclassifications happen between the classes related to vegetation. Some misclassifications are also reported between rivers and highways. Still, the great majority of the images in each class are classified correctly.</p>
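
<p>For readers who have not met a confusion matrix before, the sketch below builds one by hand in plain Python. The labels are made up for illustration and are not taken from the EuroSAT validation set; the code only shows how the cells of the matrix and the accuracy relate to each other.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python">from collections import Counter

# Hypothetical true and predicted labels for a handful of validation images
y_true = ["River", "River", "Highway", "Highway", "Pasture", "AnnualCrop", "River"]
y_pred = ["River", "Highway", "Highway", "Highway", "AnnualCrop", "AnnualCrop", "River"]

# Each (true, predicted) pair is one cell of the confusion matrix
cells = Counter(zip(y_true, y_pred))

# Rows are the true classes, columns the predicted classes
classes = sorted(set(y_true) | set(y_pred))
for actual in classes:
    row = [cells[(actual, predicted)] for predicted in classes]
    print(f"{actual:12s}", row)

# The accuracy is the share of the counts on the main diagonal
correct = sum(cells[(c, c)] for c in classes)
accuracy = correct / len(y_true)
print(accuracy)

# Off-diagonal cells show which pairs of classes get confused
confused = [(pair, n) for pair, n in cells.most_common() if pair[0] != pair[1]]
print(confused)</code></pre></figure>

<p>On the real validation data the same kind of list can be obtained from the interp object created above, which in fastai also offers a most_confused() method that reports the off-diagonal cells with the highest counts.</p>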

<h2 id="a-minimalist-application">A minimalist application</h2>
<p>Once we have achieved the accuracy that we wanted, we can use our finetuned model with new images, for example extracted from the Sentinel-2 products available from Sentinel-Hub. I extracted some patch images of the same size as those used to finetune the model (64x64 pixels). The new images were extracted manually from Sentinel-2 L1C products using the Sentinel-Hub <a href="https://www.sentinel-hub.com/explore/eobrowser/">EO-Browser</a>. As an example I use two images of the exact same area in Mesero (Lat. = 45.487971, Lon. = 8.849979), close to Milan, taken two years apart, to show how the application can be used to detect changes in land cover and land use. The 1st image was taken in July 2019 and shows land that is mostly covered by crops (‘Permanent Crop’ or ‘Annual Crop’) but also contains a highway at the bottom.</p>

<p><img src="/assets/images/sentinel-2/2019-07-16-Sentinel-2_L1C_Mesero.jpg" alt="Mesero 2019" class="align-center" /></p>

<p>The probabilities computed by our model for the 1st image (Mesero 2019) are:</p>

<figure class="highlight"><pre><code class="language-text" data-lang="text">Prediction: Highway; Probability: 0.9980

tensor([4.8410e-04, 1.3392e-06, 1.5634e-04, 9.9796e-01, 1.9808e-04, 1.3962e-04,
        8.8622e-04, 4.6300e-05, 1.1766e-04, 1.2730e-05])</code></pre></figure>

<p>The 2nd image was taken in August 2021 and shows an industrial building and still the same highway as in 2019.</p>

<p><img src="/assets/images/sentinel-2/2021-08-14-Sentinel-2_L1C_Mesero.jpg" alt="Mesero 2021" class="align-center" /></p>

<p>The probabilities for the 2nd image (Mesero 2021) are:</p>

<figure class="highlight"><pre><code class="language-text" data-lang="text">Prediction: Industrial; Probability: 0.7301

tensor([2.8591e-04, 3.4131e-06, 1.4427e-04, 2.6680e-01, 7.3014e-01, 3.1067e-05,
        1.4392e-04, 3.6543e-04, 2.0304e-03, 5.0807e-05])</code></pre></figure>

<p>The model was able to detect the change in land use and land cover, even though each image contains objects of two different classes (crop and highway in the 1st image, industrial building and highway in the 2nd image). This mix lowers the confidence of the model, most visibly in the 2nd image, where the probability is split between Industrial (0.73) and Highway (0.27).</p>
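
<p>The tensors printed above contain one probability per class. The sketch below shows how they can be read, assuming that the classes are ordered alphabetically; this ordering is an assumption, but it is consistent with both predictions reported above (index 3 is Highway and index 4 is Industrial). The function top_classes() is a hypothetical helper written for this example.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"># Class order assumed to be alphabetical (consistent with the outputs above)
classes = ["AnnualCrop", "Forest", "HerbaceousVegetation", "Highway", "Industrial",
           "Pasture", "PermanentCrop", "Residential", "River", "SeaLake"]

# Probabilities reported for the Mesero 2021 image
probs_2021 = [2.8591e-04, 3.4131e-06, 1.4427e-04, 2.6680e-01, 7.3014e-01,
              3.1067e-05, 1.4392e-04, 3.6543e-04, 2.0304e-03, 5.0807e-05]

def top_classes(probs, names, k=2):
    """Return the k most probable (class, probability) pairs."""
    ranked = sorted(zip(names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

for name, p in top_classes(probs_2021, classes):
    print(f"{name}: {p:.4f}")</code></pre></figure>

<p>For the 2021 image this prints Industrial (0.7301) followed by Highway (0.2668). Looking at the two most probable classes instead of only the first one is a simple way to notice patches that contain more than one type of land cover.</p>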

<h2 id="conclusion">Conclusion</h2>
<p>Thanks to the fastai deep learning library it took less than 1 hour to set up the code, finetune the ResNet50 architecture with the EuroSAT RGB dataset for 20 epochs, and perform some tests using new images extracted from the Sentinel-2 products. We have seen that the performance on an LULC task is already quite good, but there is certainly room for improvement. The next step is to use the EuroSAT multispectral dataset, with the complete set of the 13 spectral bands of the Sentinel-2 MSI, to see whether it helps to better separate the classes related to crop, pasture and forest, since vegetation reflects more strongly in the Near Infrared (NIR) bands than in the visible bands. A Jupyter notebook with the complete Python code is available on <a href="https://github.com/luigiselmi/datascience/blob/master/python/copernicus/deeplearning_land_use_land_cover_classification.ipynb">my GitHub repository</a>.</p>

<h2 id="references">References</h2>
<ol>
  <li><a href="https://arxiv.org/abs/1512.03385">He K. et al. - Deep Residual Learning for Image Recognition</a></li>
  <li><a href="https://arxiv.org/abs/1709.00029">Helber P. et al. - EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification</a></li>
  <li><a href="https://land.copernicus.eu/pan-european/corine-land-cover">Copernicus Land Monitoring Service - CORINE Land Cover</a></li>
</ol>]]></content><author><name>Luigi Selmi</name></author><category term="Earth_Observation" /><category term="Remote_Sensing" /><category term="Deep_Learning" /><summary type="html"><![CDATA[The goal with this experiment was to test the accuracy of Convolutional Neural Networks to learn the spatial and spectral characteristics of image patches of the Earth surface extracted from satellite images for Land Use and Land Cover (LULC) classification tasks. To achieve my goal I used a transfer learning technique that consists of using a pretrained ResNet CNN architecture [1] and finetune it with the EuroSAT dataset [2], a collection of labelled satellite patch images extracted from the Copernicus Sentinel-2 satellites products. I used the Fastai deep learning library to write the Python code to train and validate the system, and Google Colab to execute it on a GPU. I have also performed some additional LULC classification tests using new images extracted from the Copernicus Sentinel-2 dataset products through the Sentinel-Hub EO-Browser. In the next four sections I provide some information about the LULC classification task, the Sentinel-2 satellites and their products, the EuroSAT dataset, and the finetuning technique. In the coding sections that follow I describe all the steps required to set up a deep learning architecture to accomplish the LULC task and to assess the accuracy of the results. Land Use and Land Cover (LULC) classification Land cover indicates the type of surface, such as forest or river, whereas land use indicates how people are using the land. Land cover can be determined by the reflectance properties of the surface. This information is commonly extracted from aerial or satellite digital images whose pixels values represent the solar energy reflected by the Earth’s surface in different spectral bands. The class of a land cover at the pixel level can be determined by using some combinations of spectral bands. 
For instance, vegetation has a stronger reflectance in the near infrared region of the spectrum and can be better observed using bands B7, B8, B8a and B9 of the Sentinel-2 MSI than the bands in the visible region of the spectrum, B4, B3, B2 (or RGB). In the visible region, dry grass has a stronger reflectance in band 4 (Red) than in the other two bands B2 and B3 (blue and green). Classical machine learning algorithms such as Random Forests or Support Vector Machine are used to improve the accuracy of the classification. On the other hand, spectral data at the pixel level alone cannot provide information about the land use and a patch image has to be considered in its entirety to infer its use. Often also additional information is required to disambiguate among all the possible uses of a land. Different classification systems have been developed over the years whose goal is to define a taxonomy of land covers and land use. One such classification system is the CORINE land cover nomenclature [3] that contains 44 classes. CORINE is a land cover inventory performed every 6 years by the Copernicus Land Monitoring Service to monitor the changes in land use and land cover over the European continent. The maps produced by CORINE are based on the Sentinel-2 images classified at the pixel level according to the nomenclature and on information available from national cadastre. Since the availability of deep learning algorithms for computer vision, researchers have been developing models to perform LULC tasks, to be used at any time that new images are available and without using information from cadastre that might be expensive, not always up to date or not publicly available. The Copernicus Sentinel-2 satellites The Copernicus Sentinel-2 constellation is based on two identical satellites for earth observation, launched and operated by the European Space Agency (ESA). 
Each satellite flies in a Sun-synchronous polar orbit at 786 km altitude, 180° apart from the other, so that the same area can be revisited every 5 days. Both satellites carry a high spatial resolution multispectral imager (MSI) with a 290 km swath that can collect the solar energy reflected by the Earth surface in 13 spectral bands, from the visible to the near-infrared spectral range. The spatial resolution ranges from 10 m per pixel (B2, B3, B4, and B8 bands) down to 60 m per pixel. The ESA processes the raw data for radiometric calibration, orthorectification, atmospheric correction and georeferencing, and delivers the processed images as Level-1C (L1C) and Level-2A (L2A) data products. Both products are delivered as 100 km x 100 km tiles in UTM/WGS84 projection. L1C products are not processed for atmospheric correction and are described as Top-Of-Atmosphere reflectance data, while L2A products have received atmospheric correction and are described as Bottom-Of-Atmosphere reflectance data. L1C and L2A products can be obtained under an open access license from the Copernicus Open Access Hub or from other providers such as Sentinel-Hub. The EuroSAT dataset The EuroSAT dataset was created at the Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI). The images were extracted from the Sentinel-2A L1C products covering cities in 34 European countries all over a year. The dataset consists of two subsets: RGB and multispectral. Each dataset contains 27000 images divided in 10 classes with 2000 to 3000 images per class. The classes defined to label the images are a subset of those defined in CORINE: Pasture, HerbaceousVegetation, Industrial, AnnualCrop, Residential, PermanentCrop, Highway, SeaLake, Forest, River. The RGB dataset contains images covering the three bands in the visible region of the spectrum (RGB colors). The multispectral dataset contains images covering all the 13 spectral bands available from the MSI sensor of the Sentinel-2 satellites. 
The size of each patch image is 64x64 pixels with 10 m resolution. In this notebook I use only the RGB dataset. Finetuning a pretrained ResNet architecture A Convolutional Neural Network that can learn how to distinguish different types of land covers, where geometries and reflectance properties can be mixed in many different ways, requires an architecture with many layers to achieve a good accuracy. Such architectures are expensive to train from scratch in terms of amount of labeled data needed for training, and also in terms of time and computing resources. It is nowadays normal practice in computer vision to reuse a model that has been pretrained on a different but large set of examples, such as ImageNet, and finetune the pretrained model with data that is specific to the task at hand, in our case LULC classification. Finetuning is a transfer learning technique in which the parameters of a pretrained neural network architecture are updated using the new data. I will use the ResNet50 architecture, pretrained on the ImageNet dataset, as suggested in the EuroSAT paper. I will use the fastai deep learning library to implement all the functions required in this notebook. Fastai is a high level library that leverages the underlying PyTorch machine learning framework. Prepare the development environment I use Google Colab for development and to test the code using the GPUs that are provided free of charge. The machines available in Google Colab come with an old version of the fastai library that does not support the functions and classes that I will be using and must be updated. I use fastai version 2.5.1 but any other later version should also work fine. Download the EuroSAT RGB dataset I download the EuroSAT RGB images from the DFKI website. The images are compressed in a zip file and divided in 10 folders named after the classes defined for the classification task. 
Define the DataLoaders, the test and validation set, and the transformations to be applied The fastai library provides classes and functions to specify the type of example data (images, text files, tabular data), which transformations must be applied on them (e.g. resizing, cropping), how to find the images in the file system, how to extract their labels (classes), and how to split the EuroSAT dataset in a training set and a validation set. We can also artificially increase the number of images using the data augmentation technique. Fastai provides a convenience function that creates more images from the original EuroSAT dataset by applying different transformations such as flipping, rotation, brightness modification and others. I use the fastai default splitting value, 80% for training and 20% for validation. I do not apply any resizing since all the images in EuroSAT used to finetune the pretrained model have the same shape (64x64 pixels). The main step is to transform all the JPG images in PyTorch tensors, i.e. multidimensional float arrays that can be sent in batches to a GPU for computations. The fastai DataBlock class creates batches of images as 4 dimensional tensors for both the training and the validation steps. A batch is a tensor whose dimensions are B x C x H x W where B stands for batch size, C for channels (e.g. the 3 RGB channels), H for height, and W for width. The default batch size for a DataBlock is 64. 
blocks = DataBlock(blocks = (ImageBlock, CategoryBlock), get_items=get_image_files, # finds the images in the path splitter=RandomSplitter(seed=42), # default random split 80% training, 20% validation get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'), # extracts the label category from the image's file name batch_tfms=aug_transforms(mult=2)) # data augmentation (mult multiplies the default transformation values) dls = blocks.dataloaders(path) Data augmentation Additional images have been already created in the previous step from the original ones by using the data augmentation technique implemented by the fastai aug_transforms() convenience function. Here I show the images that are created from one example. Setup the ResNet50 ConvNet pretrained architecture Now we select the pretrained architecture to be finetuned using the EuroSAT dataset and the metric we want to use to check whether our model achieves our expectations. In this case we use the accuracy as a metric, that can be simply computed from the error rate (accuracy = 1 - error_rate). We use the ResNet50 architecture pretrained with the ImageNet dataset as in the EuroSAT paper. The ResNet50 CNN architecture contains a sequence of blocks of 3 convolutional layers each, for a total of 50 layers. Each block contains a shortcut connection from the input to the output so that it learns the difference between them, that is the residual. This architectural choice avoids the degradation problem that affects other architectures with many convolutional layers. As said before, a deep architecture is required to learn many complex features from data. Another advantage of the ResNet50 architecture is that the number of parameters doesn’t depend on the size of the images. 
The number of channels, also known as features maps, and their size are always the same at each layer so that the original ResNet50 parameters that have been computed using 224x224 pixels images can be updated during the finetuning process using images of a different size such as the 64x64 pixels images of the EuroSAT dataset. The final layer reduces the tensor to a one dimensional vector of size 10, the number of the classes, that is finally sent to a softmax layer that computes the probabilities of each image to be a member of any of the 10 classes used to classify the EuroSAT images. The fastai convenience function cnn_learner() is used to customize the learning process by setting different hyperparameters such as the optimizer (e.g. Stochastic Gradient Descent), the loss function, the learning rate and many others. A summary of the architecture is shown before starting the finetuning process. learn = cnn_learner(dls, resnet50, metrics=accuracy) learn.summary() Sequential (Input shape: 64) ============================================================================ Layer (type) Output Shape Param # Trainable ============================================================================ 64 x 64 x 32 x 32 Conv2d 9408 False BatchNorm2d 128 True ReLU MaxPool2d Conv2d 4096 False BatchNorm2d 128 True Conv2d 36864 False BatchNorm2d 128 True ____________________________________________________________________________ 64 x 256 x 16 x 16 Conv2d 16384 False BatchNorm2d 512 True ReLU ____________________________________________________________________________ 64 x 256 x 16 x 16 Conv2d 16384 False BatchNorm2d 512 True ____________________________________________________________________________ 64 x 64 x 16 x 16 Conv2d 16384 False BatchNorm2d 128 True Conv2d 36864 False BatchNorm2d 128 True ____________________________________________________________________________ 64 x 256 x 16 x 16 Conv2d 16384 False BatchNorm2d 512 True ReLU 
____________________________________________________________________________ 64 x 64 x 16 x 16 Conv2d 16384 False BatchNorm2d 128 True Conv2d 36864 False BatchNorm2d 128 True ____________________________________________________________________________ 64 x 256 x 16 x 16 Conv2d 16384 False BatchNorm2d 512 True ReLU ____________________________________________________________________________ 64 x 128 x 16 x 16 Conv2d 32768 False BatchNorm2d 256 True ____________________________________________________________________________ 64 x 128 x 8 x 8 Conv2d 147456 False BatchNorm2d 256 True ____________________________________________________________________________ 64 x 512 x 8 x 8 Conv2d 65536 False BatchNorm2d 1024 True ReLU ____________________________________________________________________________ 64 x 512 x 8 x 8 Conv2d 131072 False BatchNorm2d 1024 True ____________________________________________________________________________ 64 x 128 x 8 x 8 Conv2d 65536 False BatchNorm2d 256 True Conv2d 147456 False BatchNorm2d 256 True ____________________________________________________________________________ 64 x 512 x 8 x 8 Conv2d 65536 False BatchNorm2d 1024 True ReLU ____________________________________________________________________________ 64 x 128 x 8 x 8 Conv2d 65536 False BatchNorm2d 256 True Conv2d 147456 False BatchNorm2d 256 True ____________________________________________________________________________ 64 x 512 x 8 x 8 Conv2d 65536 False BatchNorm2d 1024 True ReLU ____________________________________________________________________________ 64 x 128 x 8 x 8 Conv2d 65536 False BatchNorm2d 256 True Conv2d 147456 False BatchNorm2d 256 True ____________________________________________________________________________ 64 x 512 x 8 x 8 Conv2d 65536 False BatchNorm2d 1024 True ReLU ____________________________________________________________________________ 64 x 256 x 8 x 8 Conv2d 131072 False BatchNorm2d 512 True 
____________________________________________________________________________ 64 x 256 x 4 x 4 Conv2d 589824 False BatchNorm2d 512 True ____________________________________________________________________________ 64 x 1024 x 4 x 4 Conv2d 262144 False BatchNorm2d 2048 True ReLU ____________________________________________________________________________ 64 x 1024 x 4 x 4 Conv2d 524288 False BatchNorm2d 2048 True ____________________________________________________________________________ 64 x 256 x 4 x 4 Conv2d 262144 False BatchNorm2d 512 True Conv2d 589824 False BatchNorm2d 512 True ____________________________________________________________________________ 64 x 1024 x 4 x 4 Conv2d 262144 False BatchNorm2d 2048 True ReLU ____________________________________________________________________________ 64 x 256 x 4 x 4 Conv2d 262144 False BatchNorm2d 512 True Conv2d 589824 False BatchNorm2d 512 True ____________________________________________________________________________ 64 x 1024 x 4 x 4 Conv2d 262144 False BatchNorm2d 2048 True ReLU ____________________________________________________________________________ 64 x 256 x 4 x 4 Conv2d 262144 False BatchNorm2d 512 True Conv2d 589824 False BatchNorm2d 512 True ____________________________________________________________________________ 64 x 1024 x 4 x 4 Conv2d 262144 False BatchNorm2d 2048 True ReLU ____________________________________________________________________________ 64 x 256 x 4 x 4 Conv2d 262144 False BatchNorm2d 512 True Conv2d 589824 False BatchNorm2d 512 True ____________________________________________________________________________ 64 x 1024 x 4 x 4 Conv2d 262144 False BatchNorm2d 2048 True ReLU ____________________________________________________________________________ 64 x 256 x 4 x 4 Conv2d 262144 False BatchNorm2d 512 True Conv2d 589824 False BatchNorm2d 512 True ____________________________________________________________________________ 64 x 1024 x 4 x 4 Conv2d 262144 False BatchNorm2d 2048 True 
ReLU ____________________________________________________________________________ 64 x 512 x 4 x 4 Conv2d 524288 False BatchNorm2d 1024 True ____________________________________________________________________________ 64 x 512 x 2 x 2 Conv2d 2359296 False BatchNorm2d 1024 True ____________________________________________________________________________ 64 x 2048 x 2 x 2 Conv2d 1048576 False BatchNorm2d 4096 True ReLU ____________________________________________________________________________ 64 x 2048 x 2 x 2 Conv2d 2097152 False BatchNorm2d 4096 True ____________________________________________________________________________ 64 x 512 x 2 x 2 Conv2d 1048576 False BatchNorm2d 1024 True Conv2d 2359296 False BatchNorm2d 1024 True ____________________________________________________________________________ 64 x 2048 x 2 x 2 Conv2d 1048576 False BatchNorm2d 4096 True ReLU ____________________________________________________________________________ 64 x 512 x 2 x 2 Conv2d 1048576 False BatchNorm2d 1024 True Conv2d 2359296 False BatchNorm2d 1024 True ____________________________________________________________________________ 64 x 2048 x 2 x 2 Conv2d 1048576 False BatchNorm2d 4096 True ReLU AdaptiveAvgPool2d AdaptiveMaxPool2d Flatten BatchNorm1d 8192 True Dropout ____________________________________________________________________________ 64 x 512 Linear 2097152 True ReLU BatchNorm1d 1024 True Dropout ____________________________________________________________________________ 64 x 10 Linear 5120 True ____________________________________________________________________________ Total params: 25,619,520 Total trainable params: 2,164,608 Total non-trainable params: 23,454,912 Finetuning Now the ResNet50 architecture is loaded with the pretrained parameters and the finetuning process can start. I set the number of epochs to a value that achieves a good enough accuracy without wasting too much time and resources in computations. 
We can try different numbers of epochs to find the right one. Fastai provides a convenience function that can do the work for us. In an epoch a batch of images is used to compute an average value of the loss and to update the model parameters before using the next batch till all the batches have been used and a new epoch can start. learn.fine_tune(20) We have achieved an accuracy close to 98%. The exact value can change slightly anytime we finetune the model. If that level of accuracy is enough for our purpose we can stop here otherwise we can run more epochs, try different learning rates or apply more transformations to further increase the number of images. We can plot the accuracy of the finetuned model relative to the number of epochs. Model evaluation We want to understand what is the accuracy of our model in separating images that belong to different classes. For example, it is known that the spectral response of rivers (turbid water) and roads is pretty similar in the visible part of the spectrum as they absorb most of the solar radiation, and seen from a long distance within a patch of 640 m x 640 m a river and a highway might be confused. So we will not be surprised if some images of rivers and roads may be misclassified, that is, an image of a river may be interpreted as an image of a road and vice versa by our model. Also, the distinction between Pasture, Permanent Crop, Annual Crop or Herbaceous Vegetation may not be so clear using only the three RGB bands so we can expect a certain level of misclassifications among these classes as well. We plot the confusion matrix using the validation data to check for which classes there have been misclassifications. interp = ClassificationInterpretation.from_learner(learn) interp.plot_confusion_matrix(figsize=(12,12), dpi=60) As we already figured out, most of the misclassifications happen between classes that are related to the vegetation. Also for rivers and highways some misclassifications are reported. 
Still the great majority of classifications for each class are correct. A minimalist application Once we have achieved the accuracy that we wanted, we can use our finetuned model with new images, for example extracted from the Sentinel-2 products available from Sentinel-Hub. I have extracted some patch images of the same size as those used to finetune the model (64x64 pixels). The new images were extracted manually from Sentinel-2 L1C products using the Sentinel-Hub EO-Browser. As an example I use two images of the exact same area in Mesero (Lat. = 45.487971, Lon. = 8.849979), close to Milan, taken two years apart to show how the application can be used to detect changes in land cover and use. The 1st image of the area was taken in July 2019 and shows a land that is mostly covered by crops (‘Permanent Crop’ or ‘Annual Crop’) but also contains a highway at its bottom. The probabilities computed by our model for the 1st image (Mesero 2019) are: Prediction: Highway; Probability: 0.9980 tensor([4.8410e-04, 1.3392e-06, 1.5634e-04, 9.9796e-01, 1.9808e-04, 1.3962e-04, 8.8622e-04, 4.6300e-05, 1.1766e-04, 1.2730e-05]) The 2nd image was taken in August 2021 and shows an industrial building and still the same highway as in 2019. The probabilities for the 2nd image (Mesero 2021) are: Prediction: Industrial; Probability: 0.7301 tensor([2.8591e-04, 3.4131e-06, 1.4427e-04, 2.6680e-01, 7.3014e-01, 3.1067e-05, 1.4392e-04, 3.6543e-04, 2.0304e-03, 5.0807e-05]) The model has been able to detect the change of land use and land cover even if the presence of objects of two different classes in the same image, i.e. crop and highway in the 1st image and industrial building and highway in the 2nd image, has decreased the confidence of the model. 
Conclusion Thanks to the fastai deep learning library it took less than 1 hour to set up the code, finetune the ResNet50 architecture with the EuroSAT RGB dataset for 20 epochs and also perform some tests using new images extracted from the Sentinel-2 products. We have seen that the performances for an LULC task are already quite good but there is certainly room for improvements. The next step is to use the EuroSAT multispectral dataset with the complete set of the Sentinel-2 MSI 13 spectral bands to see whether it helps in better separating the classes related to crop, pasture and forest where the bands in the Near Infrared (NIR) show a stronger reflectance from vegetation than the visible bands. A Jupyter notebook with the complete Python code is available on my GitHub repository. References He K. et al. - Deep Residual Learning for Image Recognition Helber P. et al. - EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification Copernicus Land Monitoring Service - CORINE Land Cover]]></summary></entry><entry><title type="html">The Hough Transform</title><link href="https://www.selmilab.eu/hough_transform.html" rel="alternate" type="text/html" title="The Hough Transform" /><published>2021-01-12T00:00:00+00:00</published><updated>2021-01-12T00:00:00+00:00</updated><id>https://www.selmilab.eu/hough_transform</id><content type="html" xml:base="https://www.selmilab.eu/hough_transform.html"><![CDATA[<script type="text/x-mathjax-config">
MathJax.Hub.Config({
  tex2jax: {
    inlineMath: [['$','$'], ['\\(','\\)']],
    processEscapes: true
  }
});
</script>

<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script>

<p>The <a href="https://en.wikipedia.org/wiki/Hough_transform">Hough transform</a> is used in digital image processing and computer vision to find geometrical shapes such as lines, circles or ellipses, common in images that contain man-made objects. The Hough transform can be used after an image has been pre-processed by an edge detector to find the edges that reveal the border of objects or regions inside it. In this post I will briefly introduce the theory behind the Hough transform, and then I will present two examples, one with images containing simple geometrical shapes, to better explain the idea, and one with an image containing man-made objects. A Jupyter notebook with the Python code used to implement the functions discussed in the post and to derive the pictures shown here is available on my <a href="https://github.com/luigiselmi/dip/blob/main/hough_transform.ipynb">GitHub repository</a>.</p>

<h2 id="introduction">Introduction</h2>
<p>In digital image processing different filters are available that can detect edges in an image, namely regions in which the intensity value of a set of pixels along a certain direction changes steeply. These regions contain pixels that are members of real edges but also pixels that are due to noise or blurring. Often what we want to know, and what we want a computer to detect, is the shape, namely the analytical description of the object that has been revealed by the edge detector, such as the slope and intercept of a line. This further step, after the edge detection, is called edge linking and consists of connecting together the pixels that are real members of an edge of a certain shape, avoiding the pixels that have been included due to noise or blurring. The most common shape that can be found in pictures, especially those containing man-made objects, is a line. The problem can be solved exhaustively by testing all the pixels in the edge regions against each other. However, the computational complexity of such an approach would be proportional to the square of the number of edge pixels. Another approach was suggested in 1962 by Paul Hough, while trying to automatically determine the trajectories of charged particles crossing a bubble chamber.</p>

<p><img src="/assets/images/hough_transform/mesons.png" alt="Bubble chamber" /></p>

<p>The Hough method was to transform each bubble point $(x_0, y_0)$, represented by a pixel in the image, and the set of all possible lines $y_0 = sx_0 + d$ passing through it, into a line in a parameter space $(s, d)$ whose variables are the slope $s$ and the intercept $d$ with the y axis. If two points in the image belong to the same line, their representations in the parameter space must intersect for a certain value of the slope $s$ and the intercept $d$. We can therefore solve the problem of finding a line that goes through a certain number of pixels in the image by solving the problem of finding the point in the parameter space where the lines that represent each pixel intersect. The more lines intersect at a specific point $(s_0, d_0)$ of the parameter space, the more pixels in the image belong to the same line with slope $s_0$ and intercept $d_0$. The point in the parameter space that lies at the intersection of a high number of lines represents the most “voted” line in the image. Since the slope and the intercept are unbounded for vertical or near vertical lines, a different transformation was introduced by <a href="https://www.cse.unr.edu/~bebis/CS474/Handouts/HoughTransformPaper.pdf">Duda and Hart</a> that uses as parameters the orientation angle $\theta$ and the distance $\rho$ from the origin of the coordinate system to represent the set of lines that can pass through a pixel. We can derive the normal form of a line by computing the slope and the intercept in the frame of reference that is commonly used for images, where the origin is in the upper left corner, with the y axis pointing downward and the x axis pointing to the right.</p>

<p><img src="/assets/images/hough_transform/hough_transform.png" alt="Hough Transform" /></p>

<p>From the diagram we can easily derive the expressions of the slope $s$ and the intercept $d$ for the equation of a line $y = sx + d$:</p>

\[s = \frac{y_2 - y_1}{x_2 - x_1} = \frac{\cos(\theta)}{\sin(\theta)}\]

<p>and</p>

\[d = \frac{\rho}{\sin(\theta)}\]

<p>so that we can represent the set of lines passing through a pixel at $(x_0, y_0)$ with the expression</p>

\[\rho = -x_0 \cos(\theta) + y_0 \sin(\theta)\]
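
<p>The intermediate step can be written out explicitly. Assuming $\sin(\theta) \neq 0$, we substitute the slope and the intercept into the equation of the line evaluated at $(x_0, y_0)$ and multiply both sides by $\sin(\theta)$:</p>

\[y_0 = \frac{\cos(\theta)}{\sin(\theta)} x_0 + \frac{\rho}{\sin(\theta)} \;\Longrightarrow\; y_0 \sin(\theta) = x_0 \cos(\theta) + \rho \;\Longrightarrow\; \rho = -x_0 \cos(\theta) + y_0 \sin(\theta)\]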

<p>With this expression, called the Hesse normal form or simply the normal form, we can represent the set of lines that pass through a pixel at $(x_0, y_0)$ in the image by a sinusoidal function in the parameter space $(\theta, \rho)$. If two points belong to the same line in the image, their representations as sinusoidal functions in the parameter space must intersect at a certain point $(\theta_0, \rho_0)$. As before, the more sinusoidal curves intersect at a point $(\theta_0, \rho_0)$ of the parameter space, the more votes the corresponding line in the image receives and the more likely it is to be a real line. We can count the number of sinusoidal curves that intersect at each point of the parameter space by dividing this space into a grid of cells whose width and height depend on the angular and spatial resolution of the image. For example, if we can distinguish two lines in the image that are rotated by 1 degree and two lines that are separated by one pixel, we can set the width of each cell in the parameter space $(\theta, \rho)$ to 1 degree and the height to 1 pixel. In Python we can use a two-dimensional array to store the number of sinusoidal curves that pass through each cell. The 2D array is called the accumulator matrix. Once we have processed all the edge pixels, computed the corresponding Hough transform and counted the votes for each cell in the accumulator matrix, we can select the cells that contain the highest number of votes, which correspond to straight lines in the image.
In the following section we will see examples of the application of the Hough transform to detect simple geometrical shapes, made up of dotted lines.</p>

<h2 id="images-with-geometrical-shapes">Images with geometrical shapes</h2>
<p>An image in Python is a 2D array in which the intensity value of each pixel is stored. We start by creating an image with shapes composed of lines to test the performance of the Hough algorithm. The first step is to compute the Hough transform, in normal form, for each pixel that belongs to a geometrical shape. The second step is to initialize the accumulator matrix $A$ and, for each pixel that belongs to a shape, increment each cell $A[i_{\rho}, j_{\theta}]$ in the accumulator that is crossed by its Hough transform, represented by a sinusoidal curve. In other words, we store in the accumulator the trace of the Hough transform of every edge pixel in the image. The Hough transform returns the quantized values $j_{\theta}$ and $i_{\rho}$ for $\theta$ and $\rho$. We choose the quantization for the angle $\theta$ based on the accuracy with which the orientation of a line can be determined in the image. We assume the angle $\theta$ lies in the interval $0 \leq \theta \lt 180$ degrees so that each line in the image is represented only once in the parameter space. If the resolution of our image is good enough that we can distinguish lines whose orientations differ by at least one degree, we can set the increment to 1 degree, or $\frac{\pi}{180}$ radians.
In the same way, we can choose the quantization for the distance $\rho$ of a line from the origin. Given an image whose 2D array shape is (M,N), i.e. M rows and N columns, a line that passes through a pixel of the image cannot be farther from the origin than the length of the diagonal of the image, therefore the magnitude of the distance is bounded, $\lvert \rho \rvert \lt \sqrt{M^2 + N^2}$. If the spatial resolution of our image is one pixel, we can set the increment for the distance to 1 pixel as well.
With this quantization we can represent any pixel in an image and the set of lines that pass through it, represented by the parameters $\theta$ and $\rho$ in the parameter space, with the two integer values $j_{\theta}$ and $i_{\rho}$ that range between 0 and 180 degrees and between 0 and the length of the diagonal of the image, respectively. The two integer values are used as indexes of the cell $A[i_{\rho}, j_{\theta}]$ that contains the number of votes for the line in the image whose angle with the y axis is $\theta$ and whose distance from the origin is $\rho$.</p>
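
<p>The two steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the code of the notebook: the function name <code>hough_accumulator</code>, the toy image and the rounding to the nearest integer are assumptions made for the example. It uses 1-degree and 1-pixel cells and the normal form $\rho = -x_0 \cos(\theta) + y_0 \sin(\theta)$ with x as the column index and y as the row index.</p>

```python
import numpy as np

def hough_accumulator(image):
    """Count, for each cell A[i_rho, j_theta], the curves passing through it."""
    M, N = image.shape                            # M rows, N columns
    n_rho = int(np.ceil(np.hypot(M, N))) + 1      # 1-pixel cells up to the diagonal
    j_theta = np.arange(180)                      # 1-degree cells, theta in [0, 180)
    thetas = np.deg2rad(j_theta)
    A = np.zeros((n_rho, j_theta.size), dtype=np.int64)
    rows, cols = np.nonzero(image)                # y = row index, x = column index
    for y, x in zip(rows, cols):
        rho = -x * np.cos(thetas) + y * np.sin(thetas)
        i_rho = np.abs(np.rint(rho)).astype(int)  # quantize and fold negative rho
        A[i_rho, j_theta] += 1                    # one vote per angle for this pixel
    return A

# Toy image (an assumption): a horizontal line of 50 pixels at row 20
image = np.zeros((50, 50), dtype=np.uint8)
image[20, :] = 1
A = hough_accumulator(image)
i_rho, j_theta = np.unravel_index(np.argmax(A), A.shape)
print(i_rho, j_theta, A[i_rho, j_theta])
```

<p>For the horizontal line of the toy image all the 50 votes fall in the cell at 90 degrees and at a distance of 20 pixels, as we expect from the normal form.</p>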

<h3 id="the-hough-curves">The Hough curves</h3>
<p>As an example, we plot the Hough sinusoidal curves of three aligned pixels, to see that they intersect in one point $(\theta_0, \rho_0)$ of the parameter space that corresponds to the angle $\theta_0$ between the line that passes through them and the y axis, and to the distance $\rho_0$ of the line from the origin.</p>

<p><img src="/assets/images/hough_transform/hough_lines.png" alt="Hough Curves" /></p>

<p>We can see from the plot that the three sinusoidal curves that represent the three pixels in the parameter space cross each other at 45 degrees and at a distance of approximately 70 pixels. We will see how to use the accumulator matrix to derive both values with the best accuracy possible. The picture can be seen as a snapshot of the accumulator matrix, after the Hough transforms of the three pixels have been determined and stored.</p>
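
<p>We can check the intersection numerically as well. The sketch below is an illustration under assumptions: the three pixels are hypothetical points taken on the line $y = x + 100$, not necessarily those used for the plot, and the helper <code>hough_curve</code> is a name introduced here. The curves meet at the angle where the spread of their $\rho$ values is smallest.</p>

```python
import numpy as np

def hough_curve(x0, y0, thetas_deg):
    """Return rho = -x0*cos(theta) + y0*sin(theta) for each angle in degrees."""
    thetas = np.deg2rad(thetas_deg)
    return -x0 * np.cos(thetas) + y0 * np.sin(thetas)

thetas_deg = np.arange(0, 180)                   # 1-degree steps, theta in [0, 180)
pixels = [(0, 100), (50, 150), (100, 200)]       # hypothetical (x, y) on y = x + 100
curves = np.array([hough_curve(x, y, thetas_deg) for x, y in pixels])

# The three curves intersect where their rho values coincide
spread = curves.max(axis=0) - curves.min(axis=0)
idx = int(np.argmin(spread))
theta0 = int(thetas_deg[idx])
rho0 = float(curves[:, idx].mean())
print(theta0, round(rho0, 2))
```

<p>For these three pixels the curves meet at 45 degrees and at a distance of about 70.7 pixels, which corresponds to a slope of 1 and an intercept of 100.</p>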

<h3 id="the-accumulator-matrix">The accumulator matrix</h3>
<p>As said before, in Python we can use a 2D array to store the traces of the curves computed for each edge pixel using the Hough transform. We can see that the value of the distance parameter $\rho$ can be negative for certain values of the pixel coordinates and of the angle $\theta$. Since NumPy interprets a negative index as a position counted from the end of the array, a negative distance would be stored in the wrong cell, so we take the absolute value of the distance. In this way we will be able to store the votes for any point in the parameter space.</p>
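
<p>A small check of how NumPy treats a negative row index, using a toy accumulator whose size is an assumption made for the example. A negative index raises no error, it simply addresses a row counted from the end of the array, while the absolute value puts the vote in the row of the corresponding positive distance.</p>

```python
import numpy as np

A = np.zeros((5, 3), dtype=int)    # toy accumulator: 5 rho cells, 3 theta cells
rho = -2                           # a negative distance value
A[rho, 0] += 1                     # no error: the vote lands in row 5 - 2 = 3
wrapped_row = int(np.nonzero(A[:, 0])[0][0])

B = np.zeros((5, 3), dtype=int)
B[abs(rho), 0] += 1                # with the absolute value the vote lands in row 2
folded_row = int(np.nonzero(B[:, 0])[0][0])
print(wrapped_row, folded_row)
```

<p>Note that with the absolute value two lines with opposite distances at the same angle share one cell. An alternative, not used here, would be to shift the distance by the length of the diagonal before using it as an index.</p>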

<p>We create an image with a triangular shape and then we compute the Hough transform of each pixel that belongs to any of the three lines that form the triangle. We store the number of curves that pass through each accumulator cell $A[i_{\rho}, j_{\theta}]$ and, after all the edge pixels are processed, we plot the image and the corresponding accumulator matrix. We notice four points in the Hough transform diagram with the highest values, also called peaks: the point at 135 degrees, which has the highest number of votes, one at 90 degrees, which represents the horizontal line in the image, and two other points at 0 and 180 degrees that represent the same vertical line in the image. We can extract the peaks from the accumulator matrix by setting a minimum vote threshold and taking only the cells whose value lies above it.</p>
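
<p>The extraction of the peaks can be sketched as follows. The function name <code>extract_peaks</code> and the votes stored in the toy accumulator are assumptions made for the example, they are not the values of the triangle image.</p>

```python
import numpy as np

def extract_peaks(A, min_votes):
    """Return (i_rho, j_theta, votes) for the cells with more than min_votes votes."""
    cells = np.argwhere(A > min_votes)
    return [(int(i), int(j), int(A[i, j])) for i, j in cells]

# Toy accumulator with three hypothetical clusters of votes
A = np.zeros((100, 180), dtype=int)
A[70, 45] = 120
A[20, 90] = 60
A[10, 0] = 30

peaks = extract_peaks(A, min_votes=50)
print(peaks)
```

<p>Only the two cells with more than 50 votes are returned, ordered by row index. A higher threshold keeps fewer and longer lines, a lower one lets in more short or spurious lines.</p>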

<p><img src="/assets/images/hough_transform/hough_triangle.png" alt="Triangle Hough Transform" /></p>

<p>After we have got the angle $\theta$ and distance $\rho$ of the peaks in the accumulator matrix, corresponding to the most voted lines in our image, we can compute the respective slopes and intercepts.</p>
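
<p>The conversion follows directly from the two expressions $s = \cos(\theta) / \sin(\theta)$ and $d = \rho / \sin(\theta)$. A minimal sketch, where the function name <code>line_from_peak</code> and the choice of returning <code>None</code> for vertical lines are assumptions made for the example:</p>

```python
import math

def line_from_peak(theta_deg, rho):
    """Convert a peak (theta in degrees, rho in pixels) to (slope, intercept).

    Returns None when sin(theta) is zero, i.e. for a vertical line,
    which has no finite slope or intercept.
    """
    theta = math.radians(theta_deg)
    if math.isclose(math.sin(theta), 0.0, abs_tol=1e-12):
        return None
    return math.cos(theta) / math.sin(theta), rho / math.sin(theta)

slope, intercept = line_from_peak(45, 70.71)
print(round(slope, 3), round(intercept, 1))
print(line_from_peak(0, 10))
```

<p>A peak at 45 degrees and 70.71 pixels gives back a line with slope 1 and intercept close to 100. Vertical lines have to be plotted separately, for example from their distance from the origin alone.</p>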

<p>We show another example with an image that contains a slightly more complex figure with two geometrical shapes, the triangle we have already used and a square box. We create the image and compute the accumulator matrix. We plot the image and the detected lines, setting the minimum threshold for the cells in the accumulator matrix to 50 votes first and then to 200.</p>

<p><img src="/assets/images/hough_transform/shapes_transform.png" alt="Shapes Transform" /></p>

<h2 id="images-with-man-made-objects">Images with man-made objects</h2>
<p>Now that we have tested our implementation of the Hough transform with images containing geometrical shapes made up of dotted lines, we are ready to move on to the next step, namely, applying the algorithm to find lines in pictures containing man-made objects. When we use pictures of real objects, before looking for lines or other geometrical shapes, we have to detect the edges that reveal the border of objects or regions in the image. This step was not necessary in the previous examples because the edges of the geometrical shapes were drawn precisely using the equation of a line. Borders separating man-made or natural objects can be found using a thresholding function or an edge detector. Once edges have been detected, the next step is to link their pixels to find lines for which we can determine the slope and the intercept. We perform the linking step using the Hough transform. We can build a pipeline of functions to find lines in pictures. We add a thresholding step after the edge detector to separate the edges precisely from the background. We also add one more step to take into account the quantization error of the accumulator matrix, because of which the Hough curves may not intersect exactly in one single cell but more likely in a cluster of neighboring cells. The complete steps that we will perform in the next example are the following:</p>

<ol>
  <li>Apply the gradient-based edge detector to an image to get its edge map.</li>
  <li>Apply a threshold to the edge map to obtain a binary representation.</li>
  <li>Apply the Hough transform to the foreground pixels of the binary edge map to build the accumulator matrix.</li>
  <li>Suppress the nonmaximal cells from the accumulator matrix to reduce the quantization error.</li>
  <li>Set the minimum votes threshold to select the peaks in the accumulator matrix that correspond to straight lines in the image.</li>
  <li>Compute the slopes and intercepts of the lines in the image corresponding to the peaks.</li>
  <li>Plot the lines on the image.</li>
</ol>

<p>The quantization error can be addressed by suppressing from the accumulator matrix the nonmaximal cells, that is, the cells whose value is lower than that of any of their neighboring cells. A function is defined in the Jupyter notebook to implement the suppression of the nonmaximal cells.</p>
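
<p>One possible way to write such a function with NumPy only is sketched below. It is not the function of the notebook: the name <code>suppress_nonmaximal</code>, the square neighborhood and the zero padding at the borders are assumptions made for the example, and the votes are assumed to be non-negative.</p>

```python
import numpy as np

def suppress_nonmaximal(A, distance=1):
    """Set to zero every cell that is lower than a cell within `distance` of it."""
    padded = np.pad(A, distance, mode="constant", constant_values=0)
    local_max = np.zeros_like(A)
    size = 2 * distance + 1
    # Slide the accumulator over its padded copy to get the neighborhood maximum
    for di in range(size):
        for dj in range(size):
            window = padded[di:di + A.shape[0], dj:dj + A.shape[1]]
            local_max = np.maximum(local_max, window)
    return np.where(A == local_max, A, 0)

# Toy accumulator with two clusters of votes around the cells (1, 1) and (3, 4)
A = np.array([[0, 1, 0, 0, 0],
              [1, 9, 2, 0, 0],
              [0, 2, 1, 0, 3],
              [0, 0, 0, 3, 7]])
cleaned = suppress_nonmaximal(A, distance=1)
print(cleaned)
```

<p>Only the maximum of each cluster survives, so each line is reported once instead of once per neighboring cell.</p>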

<p>In the next example we use an image of an airport that contains two runways, among other structures. We compute the edge map of the image by applying a gradient filter, and then we create a binary version of the edge map by applying a thresholding function that enhances the separation between the edges and the background. From the binary edge map we can compute the accumulator matrix. We suppress the nonmaximal cells in the accumulator matrix, within a default distance of one pixel from each cell, and finally we select the peaks in the parameter space whose number of votes is above a threshold. The slopes and intercepts corresponding to the peaks are used to plot the detected lines superimposed on the original image.</p>
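
<p>The first two steps can be sketched as follows. The gradient filter used for the airport image is not specified here, so the example uses <code>np.gradient</code> as a stand-in, and the function name <code>binary_edge_map</code>, the toy image and the threshold value are assumptions as well.</p>

```python
import numpy as np

def binary_edge_map(image, threshold):
    """Gradient-magnitude edge map, thresholded to a boolean array."""
    gy, gx = np.gradient(image.astype(float))   # derivatives along rows and columns
    magnitude = np.hypot(gx, gy)
    return magnitude > threshold

# Toy image: a dark left half and a bright right half (a vertical step edge)
image = np.zeros((6, 6))
image[:, 3:] = 10.0
edges = binary_edge_map(image, threshold=2.0)
print(edges.astype(int))
```

<p>The foreground pixels of the binary map, here the two columns next to the step, are the ones that vote in the accumulator matrix.</p>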

<p><img src="/assets/images/hough_transform/runways_hough_transform.png" alt="Runways Hough Transform" /></p>

<p>We can see from the last picture that the Hough transform is able to determine the main lines, with their slopes and intercepts, that correspond to the borders of the runways of the airport. We can also notice that other lines, visible in the binary image, have not been included in the set that resulted from our choice of the vote threshold and neighboring distance. This is mainly due to the fact that those lines are shorter or contain fewer edge pixels than the two runways. This bias towards longer lines can be addressed, for example, by dividing the image into smaller boxes and then applying the Hough transform to each of them, or by finding the pixels that delimit the lines in the binary image and then looking for the corresponding lines in the accumulator matrix.</p>

<h2 id="conclusion">Conclusion</h2>
<p>The Hough transform can be used to extract lines from images with a computational cost that is linear with respect to the number of edge pixels. We have shown the basic steps that are required to implement the Hough transform, for which some manual settings are required, such as the quantization of the parameter space, the vote threshold and the neighboring distance for the accumulator matrix.</p>]]></content><author><name>Luigi Sellmi</name></author><category term="Digital_Image_Processing" /><category term="Computer_Vision" /><summary type="html"><![CDATA[The Hough transform is used in digital image processing and computer vision to find geometrical shapes such as lines, circles or ellipses, common in images that contain man-made objects. The Hough transform can be used after an image has been pre-processed by an edge detector to find the edges that reveal the border of objects or regions inside it. In this post I will briefly introduce the theory behind the Hough transform, and then I will present two examples, one with images containing simple geometrical shapes, to better explain the idea, and one with an image containing man-made objects. A Jupyter notebook with the Python code used to implement the functions discussed in the post and to derive the pictures shown here is available on my GitHub repository. Introduction In digital image processing different filters are available that can detect edges in an image, namely regions in which the intensity value of a set of pixels along a certain direction changes steeply. These regions contain pixels that are members of real edges but also pixels that are due to noise or blurring. Often what we want to know, and what we want a computer to detect, is the shape, namely the analytical description of the object that has been revealed by the edge detector, such as the slope and intercept of a line. 
This further step, after the edge detection, is called edge linking and consists of connecting together the pixels that are real members of an edge of a certain shape, avoiding the pixels that have been included due to noise or blurring. The most common shape that can be found in pictures, especially those containing man-made objects, is a line. The problem can be solved exhaustively by testing all the pixels in the edge regions against each other. However, the computational complexity of such an approach would be proportional to the square of the number of edge pixels. Another approach was suggested in 1962 by Paul Hough, while trying to automatically determine the trajectories of charged particles crossing a bubble chamber. The Hough method was to transform each bubble point $(x_0, y_0)$, represented by a pixel in the image, and the set of all possible lines $y_0 = sx_0 + d$ passing through it, into a line in a parameter space $(s, d)$ whose variables are the slope $s$ and the intercept $d$ with the y axis. If two points in the image belong to the same line, their representations in the parameter space must intersect for a certain value of the slope $s$ and the intercept $d$. We can therefore solve the problem of finding a line that goes through a certain number of pixels in the image by solving the problem of finding the point in the parameter space where the lines that represent each pixel intersect. The more lines intersect at a specific point $(s_0, d_0)$ of the parameter space, the more pixels in the image belong to the same line with slope $s_0$ and intercept $d_0$. The point in the parameter space that lies at the intersection of a high number of lines represents the most “voted” line in the image. 
Since the slope and the intercept are unbounded for vertical or near vertical lines, a different transformation was introduced by Duda and Hart that uses as parameters the orientation angle $\theta$ and the distance $\rho$ from the origin of the coordinate system to represent the set of lines that can pass through a pixel. We can derive the normal form of a line by computing the slope and the intercept in the frame of reference that is commonly used for images, where the origin is in the upper left corner, with the y axis pointing downward and the x axis pointing to the right. From the diagram we can easily derive the expressions of the slope $s$ and the intercept $d$ for the equation of a line $y = sx + d$ \[s = \frac{y_2 - y_1}{x_2 - x_1} = \frac{\cos(\theta)}{\sin(\theta)}\] and \[d = \frac{\rho}{\sin(\theta)}\] so that we can represent the set of lines passing through a pixel at $(x_0, y_0)$ with the expression \[\rho = -x_0 \cos(\theta) + y_0 \sin(\theta)\] With this expression, called the Hesse normal form or simply the normal form, we can represent the set of lines that pass through a pixel at $(x_0, y_0)$ in the image by a sinusoidal function in the parameter space $(\theta, \rho)$. If two points belong to the same line in the image, their representations as sinusoidal functions in the parameter space must intersect at a certain point $(\theta_0, \rho_0)$. As before, the more sinusoidal curves intersect at a point $(\theta_0, \rho_0)$ of the parameter space, the more votes the corresponding line in the image receives and the more likely it is to be a real line. We can count the number of sinusoidal curves that intersect at each point of the parameter space by dividing this space into a grid of cells whose width and height depend on the angular and spatial resolution of the image. 
For example, if we can distinguish two lines in the image that are rotated by 1 degree and two lines that are separated by one pixel, we can set the width of each cell in the parameter space $(\theta, \rho)$ to 1 degree and the height to 1 pixel. In Python we can use a two-dimensional array to store the number of sinusoidal curves that pass through each cell. The 2D array is called the accumulator matrix. Once we have processed all the edge pixels, computed the corresponding Hough transform and counted the votes for each cell in the accumulator matrix, we can select the cells that contain the highest number of votes, which correspond to straight lines in the image. In the following section we will see examples of the application of the Hough transform to detect simple geometrical shapes, made up of dotted lines. Images with geometrical shapes An image in Python is a 2D array in which the intensity value of each pixel is stored. We start by creating an image with shapes composed of lines to test the performance of the Hough algorithm. The first step is to compute the Hough transform, in normal form, for each pixel that belongs to a geometrical shape. The second step is to initialize the accumulator matrix $A$ and, for each pixel that belongs to a shape, increment each cell $A[i_{\rho}, j_{\theta}]$ in the accumulator that is crossed by its Hough transform, represented by a sinusoidal curve. In other words, we store in the accumulator the trace of the Hough transform of every edge pixel in the image. The Hough transform returns the quantized values $j_{\theta}$ and $i_{\rho}$ for $\theta$ and $\rho$. We choose the quantization for the angle $\theta$ based on the accuracy with which the orientation of a line can be determined in the image. We assume the angle $\theta$ lies in the interval $0 \leq \theta \lt 180$ degrees so that each line in the image is represented only once in the parameter space. 
If the resolution of our image is good enough that we can distinguish lines whose orientations differ by at least one degree, we can set the increment to 1 degree, or $\frac{\pi}{180}$ radians. In the same way, we can choose the quantization for the distance $\rho$ of a line from the origin. Given an image whose 2D array shape is (M,N), i.e. M rows and N columns, the distance between any two pixels in the image cannot be bigger than the length of the diagonal of the image, therefore $\lvert \rho \rvert \lt \sqrt{M^2 + N^2}$. If the spatial resolution of our image is one pixel, we can set the increment for the distance to 1 pixel as well. With this quantization we can represent any pixel in an image and the set of lines that pass through it, represented by the parameters $\theta$ and $\rho$ in the parameter space, with the two integer values $j_{\theta}$ and $i_{\rho}$, which range between 0 and 180 degrees and between 0 and the length of the diagonal of the image, respectively. The two integer values are used as indexes of the cell $A[i_{\rho}, j_{\theta}]$ that contains the number of votes for the line in the image whose angle with the y axis is $\theta$ and whose distance from the origin is $\rho$. The Hough curves As an example, we plot the Hough sinusoidal curves of three aligned pixels, to see that they intersect in one point $(\theta_0, \rho_0)$ of the parameter space that corresponds to the angle $\theta_0$ between the line that passes through them and the y axis, and to the distance $\rho_0$ of the line from the origin. We can see from the plot that the three sinusoidal curves that represent the three pixels in the parameter space cross each other at 45 degrees and at a distance of approximately 70 pixels. We will see how to use the accumulator matrix to derive both values as accurately as possible. The picture can be seen as a snapshot of the accumulator matrix, after the Hough transforms of the three pixels have been determined and stored. 
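As a minimal sketch of this intersection, the normal form can be evaluated directly for three aligned pixels. The pixel coordinates below are an assumption chosen for illustration (they lie on the line $y = x + 100$ in image coordinates) and are not necessarily the ones used for the plot; the function name hough_rho is hypothetical.

```python
import math

def hough_rho(x, y, theta_deg):
    """Normal form used in this post: rho = -x*cos(theta) + y*sin(theta)."""
    theta = math.radians(theta_deg)
    return -x * math.cos(theta) + y * math.sin(theta)

# Three aligned pixels (assumed for illustration) on the line y = x + 100.
pixels = [(10, 110), (20, 120), (30, 130)]

# At theta = 45 degrees the three curves take the same value of rho,
# i.e. they intersect; at 30 degrees the three values are different.
rho_at_45 = [hough_rho(x, y, 45) for x, y in pixels]
rho_at_30 = [hough_rho(x, y, 30) for x, y in pixels]

print([round(r, 2) for r in rho_at_45])
print([round(r, 2) for r in rho_at_30])
```

The first list contains the same value three times, about 70.71 pixels, which is the distance of this line from the origin.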
The accumulator matrix As said before, in Python we can use a 2D array to store the traces of the curves computed for each edge pixel using the Hough transform. We can see that the value of the distance parameter $\rho$ can be negative for certain values of the pixel coordinates and of the angle $\theta$. Since a negative index in NumPy does not raise an error but addresses the array from its end, we take the absolute value of the distance before using it as an index. In this way we will be able to store the votes for any point in the parameter space. We create an image with a triangular shape and then we compute the Hough transform of each pixel that belongs to any of the three lines that form the triangle. We store the number of curves that pass through each accumulator cell $A[i_{\rho}, j_{\theta}]$ and, after all the edge pixels are processed, we plot the image and the corresponding accumulator matrix. We notice four points in the Hough transform diagram with the highest values, also called peaks: the point at 135 degrees, which has the highest number of votes, one at 90 degrees, which represents the horizontal line in the image, and two other points at 0 and 180 degrees that represent the same vertical line in the image. We can extract the peaks from the accumulator matrix by setting a minimum vote threshold and taking only the cells whose value lies above it. After we have got the angle $\theta$ and distance $\rho$ of the peaks in the accumulator matrix, corresponding to the most voted lines in our image, we can compute the respective slopes and intercepts. We show another example with an image that contains a slightly more complex figure with two geometrical shapes, the triangle we have already used and a square box. We create the image and compute the accumulator matrix. We plot the image and the detected lines setting the minimum threshold for the cells in the accumulator matrix to 50 votes first and then to 200. 
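The voting procedure described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code of the notebook: nested lists stand in for the NumPy array, the function names hough_accumulator and find_peaks are hypothetical, and the test image is an assumed dotted line $y = x + 100$ in a 200 x 200 image.

```python
import math

def hough_accumulator(edge_pixels, n_rows, n_cols):
    """Vote in acc[i_rho][j_theta] for every edge pixel (x, y)."""
    n_rho = int(math.ceil(math.hypot(n_rows, n_cols))) + 1
    n_theta = 180  # one column per degree, theta from 0 to 179
    acc = [[0] * n_theta for _ in range(n_rho)]
    for x, y in edge_pixels:
        for j_theta in range(n_theta):
            theta = math.radians(j_theta)
            rho = -x * math.cos(theta) + y * math.sin(theta)
            i_rho = int(round(abs(rho)))  # absolute value, as in the text
            acc[i_rho][j_theta] += 1
    return acc

def find_peaks(acc, min_votes):
    """Return (i_rho, j_theta, votes) for the cells with at least min_votes."""
    return [(i, j, v)
            for i, row in enumerate(acc)
            for j, v in enumerate(row)
            if v >= min_votes]

# Assumed test image: 200 x 200 pixels with a dotted line y = x + 100.
line_pixels = [(x, x + 100) for x in range(50)]
acc = hough_accumulator(line_pixels, 200, 200)
peaks = find_peaks(acc, min_votes=50)
print(peaks)
```

With these 50 pixels the only cell that collects all the votes is the one at 71 pixels and 45 degrees. Using the expressions of the slope and the intercept given earlier, this peak corresponds to a slope of 1 and an intercept of about 100.4, close to the line that was drawn; the difference comes from the quantization of $\rho$.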
Images with man-made objects Now that we have tested our implementation of the Hough transform with images containing geometrical shapes made up of dotted lines, we are ready to move on to the next step, namely applying the algorithm to find lines in pictures containing man-made objects. When we use pictures of real objects, before looking for lines or other geometrical shapes, we have to detect the edges that reveal the borders of objects or regions in the image. This step was not necessary in the previous examples because the edges of the geometrical shapes were drawn precisely using the equation of a line. Borders separating man-made or natural objects can be found using a thresholding function or an edge detector. Once edges have been detected, the next step is to link their pixels to find lines for which we can determine the slope and the intercept. We perform the linking step using the Hough transform. We can build a pipeline of functions to find lines in pictures. We add one more step to our pipeline to take into account the quantization error of the accumulator matrix, because of which the Hough curves may not intersect exactly in one single cell but more likely in a cluster of neighboring cells. We also add a thresholding step after the edge detector to separate the edges precisely from the background. The complete steps that we will perform in the next example are the following: (1) Apply the gradient-based edge detector to an image to get its edge map. (2) Apply a threshold to the edge map to obtain a binary representation. (3) Apply the Hough transform to the foreground pixels of the binary edge map to build the accumulator matrix. (4) Suppress the nonmaximal cells from the accumulator matrix to reduce the effect of the quantization error. (5) Set the minimum votes threshold to select the peaks in the accumulator matrix that correspond to straight lines in the image. (6) Compute the slopes and intercepts of the lines in the image corresponding to the peaks. (7) Plot the lines on the image. 
The quantization error can be addressed by suppressing from the accumulator matrix the nonmaximal cells, i.e. the cells whose value is lower than that of any of their neighboring cells. A function is defined in the Jupyter notebook to implement the suppression of the nonmaximal cells. In the next example we use an image of an airport that contains two runways, among other structures. We compute the edge map of the image by applying a gradient filter, and then we create a binary version of the edge map by applying a thresholding function that enhances the separation between the edges and the background. From the binary edge map we can compute the accumulator matrix. We suppress the nonmaximal cells in the accumulator matrix, within a default distance of one pixel from each cell, and finally we select the peaks in the parameter space whose number of votes is above a threshold. The slopes and intercepts corresponding to the peaks are used to plot the detected lines superimposed on the original image. We can see from the last picture that the Hough transform is able to determine the main lines, with their slopes and intercepts, that correspond to the borders of the runways of the airport. We can also notice that other lines, visible in the binary image, have not been included in the set that resulted from our choice of the vote threshold and neighboring distance. This is mainly due to the fact that those lines are shorter or contain fewer edge pixels than the two runways. This bias towards longer lines can be addressed, for example, by dividing the image into smaller boxes and then applying the Hough transform to each of them, or by finding the pixels that delimit the lines in the binary image and then looking for the corresponding lines in the accumulator matrix. Conclusion The Hough transform can be used to extract lines from images with a computational cost that is linear with respect to the number of edge pixels. 
We have shown the basic steps that are required to implement the Hough transform for which some manual settings are required, such as the quantization of the parameter space, the votes threshold and the neighboring distance for the accumulator matrix.]]></summary></entry><entry><title type="html">Public-key cryptography and digital signature using OpenSSL</title><link href="https://www.selmilab.eu/message-encryption-and-signature.html" rel="alternate" type="text/html" title="Public-key cryptography and digital signature using OpenSSL" /><published>2018-12-27T00:00:00+00:00</published><updated>2018-12-27T00:00:00+00:00</updated><id>https://www.selmilab.eu/message-encryption-and-signature</id><content type="html" xml:base="https://www.selmilab.eu/message-encryption-and-signature.html"><![CDATA[<p>The purpose of this post is to explain how to communicate privately over the Internet using public-key cryptography and how to digitally sign a document.</p>

<h2 id="introduction">Introduction</h2>
<p>Being able to communicate privately is a civil right and often a business need. Just as we would not allow anyone to eavesdrop on our conversations, we also have the right to avoid surveillance by companies or governments. There are many tools and protocols, many of them open source and free, that can be used to enhance the security of our communications over the Internet. The aim of this post is to provide a very high-level description of the ideas behind these tools and protocols, and practical guidance on how to use one of them, <a href="https://www.openssl.org/">OpenSSL</a>, which is open source, free and widely used to secure communications over the Internet. In particular, in this post we will show</p>

<ol>
  <li>How to avoid eavesdropping while sending files to our friends or collaborators over the Internet</li>
  <li>How to digitally sign a document</li>
</ol>

<p>It is assumed that you are using a Linux distribution or a Mac with OpenSSL version 1.0.2 installed, and that you know how to use the command line. If you use Windows you might want to install <a href="https://www.cygwin.com/">Cygwin</a> with openssl.</p>
<h2 id="alice-and-bob">Alice and Bob</h2>
<p>We will set up a context for the secure communication problem using two characters, Alice and Bob. We will simulate the transmission of encrypted messages between Alice and Bob by copying files from Alice’s folder to Bob’s and vice versa on our local file system. This simulation is meant to let you easily check what happens on both sides when they send or receive messages using OpenSSL, but keep in mind that it leaves out the very problem that encryption is meant to solve: sending messages over an insecure channel, such as the Internet, where other parties could eavesdrop on or interfere with Alice’s and Bob’s communication. With this warning in mind, let’s start our simulation by creating a folder for Alice’s messages and one for Bob’s</p>

<figure class="highlight"><pre><code class="language-text" data-lang="text">$ mkdir alice

$ mkdir bob</code></pre></figure>

<p>Let’s imagine that Bob can’t remember his bank account details and asks Alice to send them to him by email. Alice is aware that sending the data as plain text over the Internet is risky so she wonders how to send the data to Bob in such a way that nobody else but he can read and use the data. After some investigation, Alice decides that the solution to their problem is public-key cryptography and the OpenSSL tools.</p>

<h2 id="public-key-cryptography">Public-key cryptography</h2>
<p>Public-key cryptography is based on a key pair, namely a private key and a public key, used to encrypt and decrypt messages. The private key is kept secret and is never shared with anyone, while the public key can be given to anybody and even published on the Internet. Alice uses Bob’s public key to encrypt the messages she sends to him, and Bob uses his private key to decrypt them. Only the owner of the private key can decrypt a message encrypted with the corresponding public key. There are different ways of creating a key pair, but all of them rely on mathematical problems that are believed to be very hard to solve in a reasonable time, such as factorizing a number that is the product of two big prime numbers. This problem underlies the Rivest-Shamir-Adleman (RSA) cryptosystem. The idea is to choose two prime numbers big enough, e.g. with more than 150 digits, that computing their product is very easy while recovering them from the product would take even a cluster of computers decades. In RSA, the public key contains the product of the two prime numbers, called the modulus, together with a public exponent, while the private key contains a private exponent that can be computed easily only by someone who knows the two prime numbers. An eavesdropper who wants to decrypt a message would need to derive the private key from the public one and, as far as we know, this requires factorizing the modulus, which is a hard problem in itself. With keys that are long enough and kept safe, we can therefore be reasonably confident that nobody else will be able to decrypt our messages. The algorithm used for the encryption is well known and publicly available. The only thing that is not public, and known only to the owner of the key pair, is the private key. Let’s see what Alice and Bob have to do to keep their communication private:</p>

<ol>
  <li>Alice and Bob create their own private and public keys.</li>
  <li>Bob sends Alice his public key.</li>
  <li>Alice encrypts her message using Bob’s public key and sends it to Bob.</li>
  <li>Bob decrypts Alice’s message using his private key.</li>
</ol>

<p>So, first of all, both Alice and Bob need a key pair.</p>
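<p>To make the roles of the two keys more concrete, here is a minimal sketch of textbook RSA in Python. The primes 61 and 53, the exponent 17 and the message 65 are assumed values chosen only to keep the numbers readable; real keys use primes hundreds of digits long and a padding scheme, and the function names are hypothetical.</p>

```python
# Toy RSA key pair built from two tiny primes (assumed values, illustration only).
p, q = 61, 53
n = p * q                      # modulus, part of the public key
phi = (p - 1) * (q - 1)        # easy to compute only for who knows p and q
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent (needs Python 3.8 or later)

public_key = (n, e)
private_key = (n, d)

def encrypt(message, key):
    """Raise the message to the public exponent, modulo n."""
    modulus, exponent = key
    return pow(message, exponent, modulus)

def decrypt(ciphertext, key):
    """Raise the ciphertext to the private exponent, modulo n."""
    modulus, exponent = key
    return pow(ciphertext, exponent, modulus)

message = 65                   # a message is a number smaller than n
ciphertext = encrypt(message, public_key)
recovered = decrypt(ciphertext, private_key)
print(ciphertext, recovered)
```

<p>Anyone can run encrypt with the public key, but computing d requires phi, and therefore the two primes. With a modulus as small as 3233 the factorization is immediate, which is why real keys are so much longer.</p>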

<h3 id="1-alice-and-bob-create-their-own-private-and-public-keys">1. Alice and Bob create their own private and public keys.</h3>
<p>Alice doesn’t yet have a key pair, so she needs to create one. As an example she may use the RSA cryptosystem. Her private key will be stored in a file, e.g. alice_rsa, and its size will be 2048 bits. Let’s move into Alice’s folder and execute the command</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>openssl genpkey <span class="nt">-algorithm</span> RSA <span class="nt">-out</span> alice_rsa <span class="nt">-pkeyopt</span> rsa_keygen_bits:2048</code></pre></figure>

<p>The private key in alice_rsa is saved in the Privacy-Enhanced Mail (PEM) format and looks like the following</p>

<figure class="highlight"><pre><code class="language-text" data-lang="text">$ cat alice_rsa
-----BEGIN PRIVATE KEY-----
MIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQC68nDsjtWepLcM
pF4zVaMdFsVdg692M5Mj9v/vGvgyyPHpHmH/QKolOB9KtlUZcth6d7fwmFgyaa/m
XN1HjORKrpzm0rysPpFXJymUmIGy9XLzvPP4phJS3oGsnjsQJ6O017uWt8kqgz5U
U+hYc1/AOUCA9vEP1AN+6fT6O11zZQ75aeJOK6aESV/++7ZaM4M8maVrCKFhonBU
L9ByHNDkgQMWIu2iGazb7FZ5xDHWq+wBpfJZe8/3LNjS7VpUNeqoCsBuUE4fWHYo
TSPK3CmhpWHtk9EnMxyho/rpt6/ETYPOQ3QV6Uxz4G9tDLpJzgL2Q4VHVKwqJSnT
BLrvVOnxAgMBAAECggEBAK0VbDHIqMVh0Ux2HfU/U27KN182Xcx9Qbzpodm5yZQT
cc4Y4DhYoW8mP+qHV9DhAMaacwXhtr6uFTqePg1Rx8fRVNlswVxj7WKYkqnObT7I
e25pQiSzdYGeGsc8FIkHek0j870+WZTvwFSI/zRtVXh+SVddyqCR9c6aQ8MuFX6Q
u9LzGNcYTg6Dmv8qsrXlctkJRvLOuKajaAG3AHT6f5GTXRDDhk9/Ab4h9Dorkgen
Z0fg9yLfvlltO6z3z99VHIMlWX4TZ67kGP7L+AqPpzN9Qj9G15h2Blb6InlB5J/q
pgKUVGGVC0TTILLjXLUT3xEvrWvpqHIAj80Xejuz0XUCgYEA5CAIZdKyqTrfNKTg
B7ZxIX23R/YoqU6SGYDJ/8mz8Z+0PdlCrFb/fTvb2e8aQWSfYTccwhpPH9ZGQ2zL
hXImNxPeVPymVYojZ8XQwwsd2KoK1jkuLr7uzcs393P5PB0YoCzB4kvsnlozcyrm
+lTz913eHd6BsXoRx8GeRPs3nx8CgYEA0cpRr8+EBYgWCgJ71cpB8BLZ+kZErz5W
p9GO5CD/wqaJ+Ljzrvr+XmbCFzDaf/KPTcYeFD7bz9aYq3SSavvnNSQQsHhjuphb
CE40eR/fLwubyYhjOdHXjdYrxsI2gF7FyOO25PWx2OLoCqZITBYlaOdgxefQNtlM
1boATrYZJO8CgYBCuX/bUIqDZz3cJxGED//9HMlcGgsAooOnQ/1RfMzOMrlEkeSn
hfbKyZRfpUkXsXfQto8J0yorlMAOfqb0zFOTLpOMZi28vV/nvXt3YSwEsI/k4uq4
L46n0PX4wgo3ZAdM6mp3Z1+5XYbI+9Z9iBWn1+Pc9rUWlS7YL7C8WoKFXwKBgCmI
w7lp/TpXIf3jVf8SpxFPuiYpqUmErwVUoNSbj+dKr4A1pdEb0iaAc6bBvlCchjCg
q63YcA5q7xjq4F4b9z93H3LAswXrSgKP8SWV4Mrgonw462Q0HlfvcgVMyBuMJ95I
7xnPZuGIsuYA28lsjQWC4Y7tATUKuoKJ66ups7qzAoGBAJyKVY2ZqpkEHlzMixnk
BBKZA9sccokOYWtVtnCxWZYnnG7ElOBvojuLtf+/stvIadnCVe7km6f6J50QcqtH
1g6eTMfEoqkXG5plBlcEbjEv+wAGO9RXCiyYNquUuwjMrgv8dqUpHGXdw6XxxGi6
LTf0HIwHOpMNVVyptpRZoCH/
-----END PRIVATE KEY-----</code></pre></figure>

<p>The public key can be created from the private one, and saved in e.g. alice_rsa.pub, with the command</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>openssl rsa <span class="nt">-in</span> alice_rsa <span class="nt">-pubout</span> <span class="nt">-out</span> alice_rsa.pub</code></pre></figure>

<p>Alice’s public key will look like</p>

<figure class="highlight"><pre><code class="language-text" data-lang="text">$ cat alice_rsa.pub
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAuvJw7I7VnqS3DKReM1Wj
HRbFXYOvdjOTI/b/7xr4Msjx6R5h/0CqJTgfSrZVGXLYene38JhYMmmv5lzdR4zk
Sq6c5tK8rD6RVycplJiBsvVy87zz+KYSUt6BrJ47ECejtNe7lrfJKoM+VFPoWHNf
wDlAgPbxD9QDfun0+jtdc2UO+WniTiumhElf/vu2WjODPJmlawihYaJwVC/QchzQ
5IEDFiLtohms2+xWecQx1qvsAaXyWXvP9yzY0u1aVDXqqArAblBOH1h2KE0jytwp
oaVh7ZPRJzMcoaP66bevxE2DzkN0FelMc+BvbQy6Sc4C9kOFR1SsKiUp0wS671Tp
8QIDAQAB
-----END PUBLIC KEY-----</code></pre></figure>
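<p>The PEM format is just the Base64 encoding of a binary (DER) structure, wrapped between a BEGIN and an END line. As a small sketch, the following Python snippet decodes Alice’s public key shown above using only the standard library. It does not parse the whole structure; it only looks at its first byte and at its last three bytes, which for an RSA key like this one hold the public exponent.</p>

```python
import base64

# Alice's public key exactly as printed above.
pem = """-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAuvJw7I7VnqS3DKReM1Wj
HRbFXYOvdjOTI/b/7xr4Msjx6R5h/0CqJTgfSrZVGXLYene38JhYMmmv5lzdR4zk
Sq6c5tK8rD6RVycplJiBsvVy87zz+KYSUt6BrJ47ECejtNe7lrfJKoM+VFPoWHNf
wDlAgPbxD9QDfun0+jtdc2UO+WniTiumhElf/vu2WjODPJmlawihYaJwVC/QchzQ
5IEDFiLtohms2+xWecQx1qvsAaXyWXvP9yzY0u1aVDXqqArAblBOH1h2KE0jytwp
oaVh7ZPRJzMcoaP66bevxE2DzkN0FelMc+BvbQy6Sc4C9kOFR1SsKiUp0wS671Tp
8QIDAQAB
-----END PUBLIC KEY-----"""

lines = pem.strip().splitlines()
body = "".join(lines[1:-1])           # drop the BEGIN and END lines
der = base64.b64decode(body)          # binary DER structure

print(len(der))                       # size of the DER encoding in bytes
print(hex(der[0]))                    # 0x30 marks an ASN.1 SEQUENCE
# For this RSA key the structure ends with the public exponent.
public_exponent = int.from_bytes(der[-3:], "big")
print(public_exponent)
```

<p>The decoded structure is 294 bytes long: 256 of them are the 2048-bit modulus and the rest are headers and the public exponent, 65537.</p>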

<p>Now we have Alice’s key pair in her folder. Let’s do the same for Bob. We move into Bob’s folder and create his key pair, stored in e.g. bob_rsa and bob_rsa.pub, as we did for Alice. Once Alice and Bob have their key pairs we are done with the 1st step of the procedure.</p>

<h3 id="2-bob-sends-alice-his-public-key">2. Bob sends Alice his public key.</h3>
<p>Let’s move to the 2nd step: Bob must send his public key to Alice so that she will be able to encrypt the message she sends him. We simulate this by copying Bob’s public key file, bob_rsa.pub, into Alice’s folder. From Bob’s folder</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span><span class="nb">cp </span>bob_rsa.pub ../alice/</code></pre></figure>

<p>As soon as a copy of Bob’s public key is in Alice’s folder, the 2nd step of the procedure is complete and we can move to the 3rd: Alice will encrypt her message using Bob’s public key and will send it to Bob.</p>

<h3 id="3-alice-encrypts-her-message-using-bobs-public-key-and-sends-it-to-bob">3. Alice encrypts her message using Bob’s public key and sends it to Bob.</h3>
<p>Bob’s public key can now be used by Alice with OpenSSL to encrypt her message stored in a file, e.g. data.txt, containing sensitive information</p>

<figure class="highlight"><pre><code class="language-text" data-lang="text">Bob Bank Account
userid: 123456
password: 276f8%2=0as}
pin: 4657</code></pre></figure>

<p>In our example the size of the file is only 65 bytes. Alice encrypts the file using OpenSSL and Bob’s public key that she has received from him, e.g. by email, which we have simulated by simply copying the file from Bob’s folder to Alice’s. From Alice’s folder</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>openssl rsautl <span class="nt">-encrypt</span> <span class="nt">-pubin</span> <span class="nt">-inkey</span> bob_rsa.pub <span class="nt">-in</span> data.txt <span class="nt">-out</span> data.txt.enc</code></pre></figure>

<p>Now Alice can send her encrypted message, data.txt.enc. The encrypted message is a binary file whose content doesn’t make any sense and can be decrypted only by Bob using his private key. The padding that OpenSSL adds to the message before the RSA encryption is randomized, so executing the same command again will result in a different ciphertext, but decrypting either of them will output exactly the same message. In a real scenario Alice would send the file to Bob, e.g. by email. We will once again simulate the sending of the encrypted message by copying it into Bob’s folder. From Alice’s folder</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span><span class="nb">cp </span>data.txt.enc ../bob/</code></pre></figure>

<p>As soon as the encrypted message has been received by Bob, in our simulation when it has been copied into Bob’s folder, the 3rd step is complete. We can move to the 4th and last step.</p>
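<p>The reason why two encryptions of the same file differ is the padding applied before the RSA operation. The sketch below builds a PKCS#1 v1.5 encryption block in Python, which is the padding rsautl applies when no other padding option is given. It only illustrates the layout of the block: the function name is hypothetical and no encryption is performed.</p>

```python
import secrets

def pkcs1_v15_pad(message, key_bytes):
    """Build the block 00 02 PS 00 M that is fed to the RSA operation."""
    max_len = key_bytes - 11           # 3 marker bytes + at least 8 padding bytes
    if len(message) > max_len:
        raise ValueError("data too large for key size")
    pad_len = key_bytes - 3 - len(message)
    # The padding string PS is random and must not contain zero bytes.
    padding = bytes(secrets.randbelow(255) + 1 for _ in range(pad_len))
    return b"\x00\x02" + padding + b"\x00" + message

message = b"pin: 4657\n"
block_1 = pkcs1_v15_pad(message, 256)  # 256 bytes = 2048-bit key
block_2 = pkcs1_v15_pad(message, 256)

print(len(block_1), block_1 == block_2)
print(block_1.endswith(b"\x00" + message))
```

<p>Each call fills the block with fresh random bytes, so the same message yields a different block, and therefore a different ciphertext, every time. Since the padding takes at least 11 bytes, a 2048-bit key leaves room for at most 245 bytes of message.</p>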

<h3 id="4-bob-decrypts-alices-message-using-his-private-key">4. Bob decrypts Alice’s message using his private key.</h3>
<p>From Bob’s folder</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>openssl rsautl <span class="nt">-decrypt</span> <span class="nt">-inkey</span> bob_rsa <span class="nt">-in</span> data.txt.enc <span class="nt">-out</span> data.txt</code></pre></figure>

<p>Bob can open the file data.txt containing the original message in plain text that Alice wanted to send to him. We can easily verify that Bob’s decrypted message and Alice’s original message are exactly the same. From the root folder</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>diff <span class="nt">-s</span> alice/data.txt bob/data.txt
Files alice/data.txt and bob/data.txt are identical</code></pre></figure>

<p>The procedure that Alice chose to send her message to Bob, without risking anyone else reading it, is complete. In this example Alice did not use her private or public key. In case Bob wanted to send her feedback, he could use Alice’s public key to encrypt his message, so that only she would be able to decrypt it, using her private key. Both Alice and Bob must keep their private keys in a very safe place. The private key we have just created for them can be used by anyone who has access to it. One way to protect the private key is to encrypt it using an algorithm, e.g. AES-256, with a password so that only the person who knows the password can decrypt the private key and use it. For example, Alice could have made her private key safer by creating it with the following command</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>openssl genpkey <span class="nt">-algorithm</span> RSA <span class="nt">-out</span> alice_rsa <span class="nt">-pkeyopt</span> rsa_keygen_bits:2048 <span class="nt">-aes-256-cbc</span> <span class="nt">-pass</span> pass:wT16pB9y</code></pre></figure>

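<p>where wT16pB9y would be Alice’s password. A password that contains characters with a special meaning for the shell must be quoted. Keep in mind that a password passed with pass: on the command line ends up in the shell history and is visible in the process list, so this form is suitable only for demonstrations.</p>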

<h2 id="hybrid-cryptosystem">Hybrid cryptosystem</h2>
<p>Alice has successfully solved Bob’s problem. She has been able to send him his bank account details in a secure way. Now she wants to send Bob a file, e.g. a jpeg picture of a few kilobytes that she doesn’t want anyone else to see. Let’s try to encrypt the image on behalf of Alice</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>openssl rsautl <span class="nt">-encrypt</span> <span class="nt">-pubin</span> <span class="nt">-inkey</span> bob_rsa.pub <span class="nt">-in</span> alice.jpg <span class="nt">-out</span> alice.jpg.enc</code></pre></figure>

<p>This time OpenSSL will raise an error</p>

<figure class="highlight"><pre><code class="language-text" data-lang="text">RSA operation error
4294956672:error:0406D06E:,rsa routines:RSA_padding_add_PKCS1_type_2:data too large for key size:rsa_pk1.c:174:</code></pre></figure>

<p>The problem is that RSA can encrypt only messages that are shorter than the key, more precisely shorter than the modulus. Bob’s key is 2048 bits long, i.e. 256 bytes, and the PKCS#1 padding that rsautl applies by default takes at least 11 of them, so his public key cannot be used to encrypt messages longer than 245 bytes. The usual way to solve this issue is to use a symmetric algorithm, which uses a single key, called a symmetric key, for both encryption and decryption. Once a message has been encrypted with the symmetric key, it can be sent together with the symmetric key, itself encrypted using the public key of the recipient, so that he or she will be able to decrypt the message. One more reason to encrypt a message with a symmetric algorithm is speed: symmetric algorithms are typically orders of magnitude faster than asymmetric ones. The algorithms used in symmetric-key encryption are different from those used in public-key encryption. Symmetric-key algorithms use a key that is based on a pseudo-random value taken from a huge range of possible values. The key is shared only by the two communicating parties. The strength of the algorithm rests on the difficulty of finding the key within a huge key space. The way in which the symmetric key must be created depends on the cryptographic algorithm, also called a cipher. One of the most robust ciphers is AES-256, which we have already used to encrypt Alice’s private key. OpenSSL creates the symmetric key, to be used with the AES-256 cipher, from a secret string, in short a secret, that can be created and stored in a file. Alice defines a new protocol in which she will create the secret that she will use to encrypt her picture and that she will share with Bob. The system that she is going to use is called a hybrid cryptosystem because it uses public-key and symmetric cryptography together.</p>

<ol>
  <li>Alice creates the secret.</li>
  <li>Alice encrypts the data using the AES-256 cipher and the secret.</li>
  <li>Alice encrypts the secret using Bob’s public key.</li>
  <li>Alice sends the encrypted data and the encrypted secret to Bob.</li>
  <li>Bob decrypts the secret using his private key.</li>
  <li>Bob decrypts the data using the AES-256 cipher and the secret.</li>
</ol>

<p>Let’s implement these steps on behalf of Alice and Bob using OpenSSL.</p>

<h3 id="1-alice-creates-the-secret">1. Alice creates the secret.</h3>
<p>First, Alice creates a secret, e.g. a sequence of 32 random bytes, using the pseudo-random byte generator provided by OpenSSL. The longer the sequence of random bytes, the more difficult it is for an eavesdropper to figure it out. It should never be so short that it could be found simply by brute force. For example, a sequence of only 2 bytes (16 bits) can be found in at most 2^16 = 65536 attempts. From Alice’s folder</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>openssl rand <span class="nt">-out</span> secret 32</code></pre></figure>
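<p>The effect of the length of the secret on a brute-force search can be checked with a couple of lines of Python. This is only a sketch of the arithmetic; the helper name keyspace is hypothetical, and the secrets module of the standard library is used here in place of openssl rand.</p>

```python
import secrets

def keyspace(n_bytes):
    """Number of different secrets that are n_bytes bytes long."""
    return 2 ** (8 * n_bytes)

secret = secrets.token_bytes(32)       # 32 random bytes from the OS generator
print(len(secret))
print(keyspace(2))                     # small enough for a brute-force search
print(len(str(keyspace(32))))          # number of decimal digits of 2**256
```

<p>A 2-byte secret can be found after at most 65536 attempts, while the number of possible 32-byte secrets has 78 decimal digits.</p>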

<h3 id="2-alice-encrypts-the-data-using-the-aes-256-cipher-and-the-secret">2. Alice encrypts the data using the AES-256 cipher and the secret.</h3>
<p>The command will encrypt the image</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>openssl enc <span class="nt">-e</span> <span class="nt">-aes-256-cbc</span> <span class="nt">-in</span> alice.jpg <span class="nt">-out</span> alice.jpg.enc <span class="nt">-pass</span> file:secret <span class="nt">-p</span></code></pre></figure>

<p>and will print the key created by OpenSSL from the secret</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">salt</span><span class="o">=</span>469950DBF6FA435A
<span class="nv">key</span><span class="o">=</span>E94C0C70A8BC662DB270C57B642C010910B65118C97DD37088E84F6DC3627225
iv <span class="o">=</span>B83AB9A6A80D67DFA6B3572EB850EE0D</code></pre></figure>

<p>AES-256 is a block cipher that encrypts a fixed-size block of 128 bits of the message at a time with a 256-bit key. The mode of operation used in the example is Cipher Block Chaining (CBC), a scheme that allows a block cipher to encrypt messages longer than the block size (16 bytes). The key, derived by OpenSSL from the secret, is printed as a result of the encryption together with two other parameters, salt and iv. The salt is a random value created by OpenSSL and appended to the secret before the key is derived, to mitigate dictionary attacks in case the secret has not been chosen wisely. The iv parameter is the initialization vector that is combined with the first block of the message before it is encrypted. It ensures that no information can be extracted by an attacker from messages that start with the same header. Since the salt is random, all the parameters, key, salt and iv, are different every time the encryption command is executed, even if the file and the secret are the same.</p>
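<p>How the key and the iv follow from the secret and the salt can be sketched in Python. The function below mimics the derivation performed by OpenSSL’s EVP_BytesToKey with the MD5 digest and a single iteration, which is assumed here to be the default of the enc command in OpenSSL 1.0.2 (later versions default to SHA-256). The password is a made-up value, so the output will not match the key printed above.</p>

```python
import hashlib

def evp_bytes_to_key(password, salt, key_len=32, iv_len=16):
    """Derive key and iv as EVP_BytesToKey does with MD5 and one iteration."""
    derived = b""
    block = b""
    while key_len + iv_len > len(derived):
        # Each round hashes the previous digest, the password and the salt.
        block = hashlib.md5(block + password + salt).digest()
        derived += block
    return derived[:key_len], derived[key_len:key_len + iv_len]

# Hypothetical inputs: in the post the password is read from the file
# "secret" and the salt is chosen at random by OpenSSL.
password = b"example-secret"
salt = bytes.fromhex("469950DBF6FA435A")

key, iv = evp_bytes_to_key(password, salt)
print("key=" + key.hex().upper())
print("iv =" + iv.hex().upper())
```

<p>The derivation is deterministic: the same secret and salt always give the same key and iv. This is why Bob, who reads the salt from the header of the encrypted file, obtains the same key as Alice.</p>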

<h3 id="3-alice-encrypts-the-secret-using-bobs-public-key">3. Alice encrypts the secret using Bob’s public key.</h3>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>openssl rsautl <span class="nt">-encrypt</span> <span class="nt">-pubin</span> <span class="nt">-inkey</span> bob_rsa.pub <span class="nt">-in</span> secret <span class="nt">-out</span> secret.enc</code></pre></figure>

<h3 id="4-alice-sends-the-encrypted-data-and-the-encrypted-secret-to-bob">4. Alice sends the encrypted data and the encrypted secret to Bob.</h3>
<p>We can simulate the sending of the encrypted data and secret by copying them from Alice’s folder to Bob’s.</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span><span class="nb">cp </span>alice.jpg.enc secret.enc ../bob</code></pre></figure>

<h3 id="5-bob-decrypts-the-secret-using-his-private-key">5. Bob decrypts the secret using his private key.</h3>
<p>From Bob’s folder</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>openssl rsautl <span class="nt">-decrypt</span> <span class="nt">-inkey</span> bob_rsa <span class="nt">-in</span> secret.enc <span class="nt">-out</span> secret</code></pre></figure>

<h3 id="6-bob-decrypts-the-data-using-the-aes-256-cipher-and-the-secret">6. Bob decrypts the data using the AES-256 cipher and the secret.</h3>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>openssl enc <span class="nt">-d</span> <span class="nt">-aes-256-cbc</span> <span class="nt">-in</span> alice.jpg.enc <span class="nt">-out</span> alice.jpg <span class="nt">-pass</span> file:secret <span class="nt">-p</span>
<span class="nv">salt</span><span class="o">=</span>469950DBF6FA435A
<span class="nv">key</span><span class="o">=</span>E94C0C70A8BC662DB270C57B642C010910B65118C97DD37088E84F6DC3627225
iv <span class="o">=</span>B83AB9A6A80D67DFA6B3572EB850EE0D</code></pre></figure>

<p>You can verify that the image in Bob’s folder is exactly the same as the image in Alice’s folder by looking at them or by using the following command from the root folder</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>diff <span class="nt">-s</span> alice/alice.jpg bob/alice.jpg
Files alice/alice.jpg and bob/alice.jpg are identical</code></pre></figure>

<p>It can also be verified that the key, created by OpenSSL from the secret for the decryption, is the same as the key created for the encryption. If a wrong secret is used, the decryption will fail.<br />
This 2nd protocol enables Alice and Bob to send each other encrypted files of any size allowed by the channel. Unfortunately it is vulnerable to a man-in-the-middle attack. This is because a message sent over the Internet goes through different routers where a 3rd party, called Mallory in cryptography, can impersonate both Alice and Bob by sending them his public key instead of Bob’s and Alice’s respectively. Alice and Bob can solve this issue by publishing their public keys on a trusted website or by using certificates where their public keys are signed by a trusted 3rd party. The creation of certificates, even if possible with OpenSSL, requires the definition of a certificate authority and is beyond the scope of this post.</p>

<h2 id="message-integrity">Message integrity</h2>
<p>Encrypting a message is almost never enough. We usually also need to be sure that nobody can tamper with our communications by intercepting our messages and dropping or modifying some of them, even when they are encrypted, without being detected. We need message integrity. The integrity of a message can be checked with a hash function, which takes a message as input and outputs a short fixed-length string, the digest or fingerprint. If the input message is modified, even slightly, a hash function will output a completely different value in an unpredictable way. Even though the message space is much larger than the digest space, for a good hash function the chances of a collision, in which the same digest is computed from two different messages, are practically negligible. Hash functions are used to support integrity in protocols such as TLS, SSH and IPsec. OpenSSL provides many hash functions such as SHA256, a standard function that hashes messages of any length into 256-bit digests. In some common use cases, encryption is not needed at all while integrity can be a strong security requirement. As an example, the integrity of a software package downloaded from the Internet can be checked by comparing its fingerprint, provided on the website, with the fingerprint computed locally after the package has been downloaded and before it is executed. If the two fingerprints are the same, and the published fingerprint itself can be trusted, we are assured that the software package has not been modified during the transfer. Hash functions are also used to compute the fingerprint of public keys. For example, Bob can provide his public key’s fingerprint to Alice so that it will be easier for her to verify whether her copy of Bob’s public key is the right one. From Bob’s folder</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>openssl dgst <span class="nt">-sha256</span> <span class="nt">-hex</span> <span class="nt">-c</span> bob_rsa.pub <span class="o">&gt;</span> bob.fingerprint</code></pre></figure>

<p>The fingerprint can be verified more easily than the full public key</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span><span class="nb">cat </span>bob.fingerprint
SHA256<span class="o">(</span>bob_rsa.pub<span class="o">)=</span> 7f:98:0e:4f:a7:e4:5d:5f:bb:fb:f5:80:3a:32:b8:7e:2a:23:22:44:c4:da:8c:4d:eb:95:fa:f8:9c:5f:d9:24   </code></pre></figure>
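<p>As a rough sketch of how the check might look on Alice’s side, the commands below recompute the fingerprint from a copy of the key and compare it with the one Bob published. The scratch directory, the freshly generated stand-in key pair and the file name alice.check are assumptions made only to keep the example self-contained; they are not part of the protocol above.</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"># Scratch directory, so the real alice/ and bob/ folders are left untouched
cd "$(mktemp -d)"

# Stand-in for Bob's key pair (file names follow the post)
openssl genpkey -algorithm RSA -out bob_rsa -pkeyopt rsa_keygen_bits:2048
openssl rsa -in bob_rsa -pubout -out bob_rsa.pub

# Bob's side: fingerprint of his public key, written with -out
openssl dgst -sha256 -hex -c -out bob.fingerprint bob_rsa.pub

# Alice's side: recompute the fingerprint from her copy of the key
# (here the same file; alice.check is a hypothetical file name)
openssl dgst -sha256 -hex -c -out alice.check bob_rsa.pub

# Identical files mean Alice holds the key that Bob fingerprinted
diff -s bob.fingerprint alice.check</code></pre></figure>

<p>The comparison is only meaningful if the fingerprint reaches Alice through a channel that Mallory cannot alter, for instance read aloud over the phone or printed on a business card.</p>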

<p>In the following section we will address another important use case in which a hash function is used to digitally sign a file.</p>

<h2 id="digital-signature">Digital signature</h2>
<p>Alice is a journalist and wants to send Bob an article, e.g. a pdf file, being sure that no one else can claim to be the author. Once again she comes up with a protocol that can solve her problem.</p>

<ol>
  <li>Alice creates a one-way hash of a document, Alice’s digest.</li>
  <li>Alice encrypts the digest with her private key, thereby signing the document.</li>
  <li>Alice sends the document, her public key and the signed digest to Bob.</li>
  <li>Bob decrypts Alice’s digest with her public key.</li>
  <li>Bob creates a one-way hash of the document that Alice has sent, Bob’s digest.</li>
  <li>Bob compares his digest with Alice’s to find out if they match.</li>
</ol>

<p>If the digest recovered from the signature matches the digest Bob generated himself, the signature is valid. Let’s say Alice wants to send a file, e.g. article.pdf, with her digital signature to Bob.</p>

<h3 id="1-alice-creates-a-one-way-hash-of-a-document-alices-digest">1. Alice creates a one-way hash of a document, Alice’s digest.</h3>
<p>Alice first chooses a hash function, e.g. SHA-256. She can then create the one-way hash of the document, also known as the digest, with</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>openssl dgst <span class="nt">-sha256</span> article.pdf <span class="o">&gt;</span> alice.dgst</code></pre></figure>

<p>The content of the digest will be similar to</p>

<figure class="highlight"><pre><code class="language-text" data-lang="text">SHA256(article.pdf)= cb686d3838cba15e5e603b8fa5191759a46227230884e20325abd19fb997f064</code></pre></figure>
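<p>To see why such a short digest can stand in for the whole document, the sketch below hashes two messages that differ by a single character. The two sample strings are made up for illustration, and the input is read from standard input rather than from article.pdf.</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"># Two hypothetical messages that differ in one character only
printf 'Alice owes Bob 10 euros' | openssl dgst -sha256
printf 'Alice owes Bob 90 euros' | openssl dgst -sha256

# Each command prints a 64-character hex digest; the two digests
# have no visible relationship even though the inputs are almost equal</code></pre></figure>

<p>Because any change to the input yields an unrelated digest, signing the digest in the next step effectively binds the signature to the exact content of the document.</p>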

<h3 id="2-alice-encrypts-the-digest-with-her-private-key-thereby-signing-the-document">2. Alice encrypts the digest with her private key, thereby signing the document.</h3>
<p>The next step is to encrypt the digest produced by the hash function, alice.dgst, with her private key</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>openssl rsautl <span class="nt">-sign</span> <span class="nt">-inkey</span> alice_rsa <span class="nt">-keyform</span> PEM <span class="nt">-in</span> alice.dgst <span class="o">&gt;</span> alice.sign</code></pre></figure>

<h3 id="3-alice-sends-the-document-and-the-signed-digest-to-bob">3. Alice sends the document and the signed digest to Bob.</h3>
<p>Alice sends the document, article.pdf, together with her signature, alice.sign, and her public key, alice_rsa.pub, to Bob. Bob can verify Alice’s signature of the document using her public key. Again we will simulate the sending of the files by copying them from Alice’s folder to Bob’s.</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span><span class="nb">cp </span>article.pdf alice.sign alice_rsa.pub ../bob</code></pre></figure>

<h3 id="4-bob-decrypts-alices-digest-with-her-public-key">4. Bob decrypts Alice’s digest with her public key.</h3>
<p>From Bob’s folder</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>openssl rsautl <span class="nt">-verify</span> <span class="nt">-inkey</span> alice_rsa.pub <span class="nt">-pubin</span> <span class="nt">-keyform</span> PEM <span class="nt">-in</span> alice.sign <span class="nt">-out</span> alice.dgst</code></pre></figure>

<p>The output, alice.dgst, is Alice’s digest of the document, extracted from her signature of the document.</p>

<h3 id="5-bob-creates-a-one-way-hash-of-the-document-that-alice-has-sent-bobs-digest">5. Bob creates a one-way hash of the document that Alice has sent, Bob’s digest.</h3>
<p>Bob can now compute the hash of the document article.pdf himself, using the same hash function, SHA-256, that Alice used</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>openssl dgst <span class="nt">-sha256</span> article.pdf <span class="o">&gt;</span> bob.dgst</code></pre></figure>

<h3 id="6-bob-compares-his-digest-with-alices-to-find-out-if-they-match">6. Bob compares his digest with Alice’s to find out if they match</h3>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>diff <span class="nt">-s</span> alice.dgst bob.dgst</code></pre></figure>

<p>The result of the comparison is</p>

<figure class="highlight"><pre><code class="language-text" data-lang="text">Files alice.dgst and bob.dgst are identical</code></pre></figure>

<p>showing that the document was signed with the private key that matches Alice’s public key. Provided that Bob’s copy of her public key is authentic, Alice cannot repudiate the signature, and the document cannot be changed without invalidating it.</p>
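<p>The six steps can also be collapsed into one command on each side, since openssl dgst accepts the -sign and -verify options. The following is a minimal sketch under stated assumptions: it works in a scratch directory, generates a stand-in key pair, uses random bytes in place of a real article.pdf, and the file names article.sig and other.pdf are hypothetical. Note that dgst -sign wraps the hash in a standard encoding before signing, so its signature files are not interchangeable with the alice.sign file produced by rsautl above.</p>

<figure class="highlight"><pre><code class="language-bash" data-lang="bash"># Scratch directory with a stand-in key pair and document
cd "$(mktemp -d)"
openssl genpkey -algorithm RSA -out alice_rsa -pkeyopt rsa_keygen_bits:2048
openssl rsa -in alice_rsa -pubout -out alice_rsa.pub
openssl rand -out article.pdf 1024

# Alice: hash the document and sign the digest in a single command
openssl dgst -sha256 -sign alice_rsa -out article.sig article.pdf

# Bob: hash the document and check it against the signature
# (prints "Verified OK" when the signature matches)
openssl dgst -sha256 -verify alice_rsa.pub -signature article.sig article.pdf

# Any other document (here a second random file) fails to verify
# against the same signature, and openssl exits with a non-zero status
openssl rand -out other.pdf 1024
openssl dgst -sha256 -verify alice_rsa.pub -signature article.sig other.pdf || echo "signature does not match other.pdf"</code></pre></figure>

<p>The second verification failing is the behaviour we want: a signature made for one document cannot be reused for another.</p>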

<h2 id="conclusion">Conclusion</h2>
<p>We have seen how to use OpenSSL to add some level of security to our communications with public-key cryptography and symmetric encryption. As previously cautioned, the protocols we have shown are not completely secure, but they will limit the number of eavesdroppers capable of reading the content of the files you send over the Internet. You can find more information on cryptography, algorithms and how the protocols can be improved to enhance the security of communications in the references below.</p>

<h2 id="acknowledgments">Acknowledgments</h2>
<p>Thanks to Eurydice Prentoulis for proof-reading the text.</p>
<h2 id="references">References</h2>
<ol>
  <li><a href="https://www.schneier.com/books/applied_cryptography/">Bruce Schneier - Applied Cryptography, 2nd Edition</a></li>
  <li><a href="https://wstein.org/ent/">William Stein - Elementary Number Theory: Primes, Congruences, and Secrets</a></li>
  <li><a href="https://crypto.stanford.edu/~dabo/cryptobook/">Dan Boneh, Victor Shoup - A Graduate Course in Applied Cryptography</a></li>
  <li><a href="http://cacr.uwaterloo.ca/hac/">Alfred J. Menezes, Paul C. van Oorschot, Scott A. Vanstone - Handbook of Applied Cryptography</a></li>
  <li><a href="https://www.coursera.org/learn/crypto">Dan Boneh - Cryptography I (Coursera)</a></li>
  <li><a href="http://www.crypto-textbook.com/">Christoph Paar, Jan Pelzl - Understanding Cryptography</a></li>
</ol>]]></content><author><name>Luigi Sellmi</name></author><category term="Security" /><category term="Cryptography" /><summary type="html"><![CDATA[The purpose of this post is to explain how to communicate privately over the Internet using public-key cryptography and how to digitally sign a document.]]></summary></entry></feed>