If you worked in the software development industry, sooner or later you will face projects where you need to transfer part of functionality from one programming language to another. Sometimes the whole projects are translated from one programming language to another. These are expensive endeavors. There is a famous example of how Bank of Australia spent around $750 million and 5 years of work to convert its platform from COBOL to Java. Basically, translating functionality from one language to another is not easy. For big projects, you need to be experienced in both languages. Of course, there are a number of tools that can help you with this, in fact, some of these tools are integrated as a part of some programming languages.
In a nutshell, a Cross-lingual Language Model (XLM) is pretrained with a masked language modeling objective on monolingual source code datasets and as a result, pieces of code that express the same instructions are mapped to the same representation, regardless of the programming language. XLM is used to initialize the TransCoder model. However, this on its own is not enough, because the decoder part of the transformer architecture requires additional attention parameters which are initialized randomly. That is how the first part of unsupervised training (initialization is done). The second part of this training, i.e. language modeling is done by training the model to encode and decode sequences with a Denoising Auto-Encoding (DAE) objective. This means that model is trained to predict a sequence of tokens given a corrupted version of that sequence. Corruption of the sequences is done by randomly masking, removing and shuffling input tokens.
In the end, the authors used Back-translation. The previous two steps of the training would be enough for this model, but in order to increase the quality of generated code authors added this step. In general, this process two models are trained: source-to-target and target-to-source. The purpose of target-to-source model is to generate noisy translations of the source language from the target language. These generated sequences of noisy codes are then used to train the source-to-target model. The two models are trained in parallel until convergence.
By now you have probably seen the results of this paper somewhere on the web. It is truly amazing how the solution proposed in it transforms sketches into face images. This is a very attractive field since it’s applications are many from character design to criminal investigations. In general, it would be cool to have such drawing assistance at your disposal. Thus far similar solutions used sketches as hard constraints, which didn’t always give good results. This is why the authors of this paper suggest the solution that is utilizing recent advances in image-to-image translation and with that use sketches as soft constraints to guide image synthesis. Basically, they form loose points from the sketch and then use deep learning to “fill” missing parts.
The solution relies heavily on some recent advances in deep learning, especially from conditional face generation. To be more precise the authors relied on Condition GANs and pix2pix principles for the image synthesis parts of the architecture. Apart from that, data preparation for this architecture is a bit specific, but it also provides aimed flexibility. The authors couldn’t use datasets with sketches, like CUHK face sketch database, because these contain shading effects, which authors wanted to avoid. They have built dataset form face image data of CelebAMask-HQ, which contains high-resolution facial images with semantic masks of facial attributes and processed with holistically-nested edge detection, APDrawingGAN and Photoshop’s Photocopy filter.
This paper is exploring the understanding of 3D structure on the images which is a challenging, but an integral part of many computer vision applications. In fact, the authors consider this problem under two challenging conditions. The first one is that there are no 2D or 3D ground truth about images, hence the unsupervised learning term in the headline and the second one is that model should not require multiple views of the same instance. Essentially, their main goal is to create a deep learning model that can output 3D shape of any instance given a single image of it, while keeping the unsupervised spirit.
To do so authors created an autoencoder based structure that splits the image into albedo, depth, illumination and viewpoint component. However, as expected, this is not enough and the model has to has some assumptions about the image. This is done in a really cool manner, meaning the model creates these assumptions on its own. One of the most important assumptions is the symmetry of the image. It does so by creating a dense map that contains the probability that a given pixel has a symmetric one. Note that here we talk about Bilateral symmetry, meaning that opposite sides of the image are similar but not identical. The whole thing is done by modeling asymmetric illumination and creating a confidence score that explains the probability of the pixel having a symmetric counterpart in the image for each pixel of the image.
This content was originally published here.