I wanted to name this post “Software Cannibalism”, but then I found out people are already using a friendlier name: “Software is Eating The Software”.
TLDR: deep learning, NLP, language models, F@#$ Programming
In this post I explain my views about why programming might be disrupted and how.
If we think of programming as a language between humans and computers, we can see that this language has evolved through time: from punch cards to Fortran, C, C++ and many more, such as Java, Python, Rust, Go, Groovy, Julia, Kotlin, Wtf, etc.
Actually “Wtf” is not a programming language, but it could sound like one. We have BrainFuck, so why not “Wtf”?
But these languages are for computer programmers and experts. The cost of training a programmer is high and creating a new language which encompasses all other languages might be the worst idea.

Programming languages have a huge number of overlapping concepts where only the syntax and naming differ. For example, if you learn one object-oriented programming language, you learn the next one a lot more easily, as the concepts are the same with minor changes in implementation. An example of these overlaps can be found in ORMs like the Django ORM, SQLAlchemy and Active Record in Ruby on Rails, where the same concept is used similarly in different languages.
So here comes the big question:
Why Not Use Natural Language as a Programming Language?
Why not just use plain English as a language between humans and computers? Maybe we could not do it efficiently in the past, but we might be able to now. I will explain later in this post.
Programming is cool until you learn it. After that, you just code, and if you bump into problems, you search Stack Overflow, and it is not that cool anymore. So just automatically open Stack Overflow when an error happens? Great idea, and there is this meme:

So the problem definition is this:
How to use plain language as a programming language?
Approaches to solve this problem:
- Translation (sequence to sequence)
Just like when we translate English to French. Instead, we can see it as: translate English to Python.
- Abstract Syntax Tree generation (sequence to graph)
We can generate the AST of the code instead of text. This can be a good idea, as graphs might have a better inductive bias.
- Semantic search engine on code
We have good search engines for text. We can build a search engine where you describe what you need and it finds the code that matches.
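To make the third approach a bit more concrete, here is a minimal sketch of searching code by a plain-English query. It is only an illustration under loud assumptions: the `SNIPPETS` table is a made-up mini code base, and the matching is plain bag-of-words cosine similarity, which is keyword search rather than real semantic search. A real system would likely swap in learned embeddings of both the query and the code.

```python
import math
import re
from collections import Counter

# Hypothetical mini "code base": English description -> Python snippet.
# Purely illustrative; a real index would be built from actual repositories.
SNIPPETS = {
    "read a text file into a string": "open(path).read()",
    "sort a list of numbers in descending order": "sorted(xs, reverse=True)",
    "send an http get request": "urllib.request.urlopen(url).read()",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query):
    """Return the snippet whose description is most similar to the query."""
    q = Counter(tokenize(query))
    best = max(SNIPPETS, key=lambda desc: cosine(q, Counter(tokenize(desc))))
    return SNIPPETS[best]


print(search("how do I sort numbers in descending order"))
```

The interesting part is the interface, not the scoring: the user types a sentence and gets code back. Replacing `cosine` over word counts with similarity over learned vectors keeps the same shape.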
In the rest of this post I try to give you a sense of why recent developments in technology might bridge the gap between natural language and programming.
Language Models
If we are gonna use natural language as a programming language, we should have tools to process it. We are seeing better and better language models being developed every day. These language models are able to learn and hold context in long sequences and even generate plausible articles.
You can check the OpenAI blog post to learn about their capabilities: Better Language Models
For more technical details, check this video from Alec Radford (also from OpenAI): Language Models (Deep Unsupervised Learning course, spring 2020, Berkeley)
Working with Graphs (AST)
A syntax tree is a good representation of source code, and some people are working with graph representations of source code (graph generation, graph embedding, sequence to graph, graph to sequence, etc.).
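As a small illustration of what "source code as a graph" means, the sketch below uses Python's built-in `ast` module to parse one line of code and list the parent-to-child edges of its syntax tree. The example statement and the `edges` helper are my own choices for illustration, not something taken from the projects mentioned in this post.

```python
import ast

# An arbitrary example statement to parse.
source = "total = price * quantity + tax"
tree = ast.parse(source)


def edges(node):
    """Yield (parent_type, child_type) pairs, i.e. the edges of the tree."""
    for child in ast.iter_child_nodes(node):
        yield type(node).__name__, type(child).__name__
        yield from edges(child)


edge_list = list(edges(tree))
for parent, child in edge_list:
    print(f"{parent} -> {child}")

# The full nested structure of the assignment, as a string.
print(ast.dump(tree.body[0]))
```

An edge list like this is the kind of input a graph model could consume, and a sequence-to-graph model would be trying to produce such a structure from a sentence instead of producing raw text.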

Also, there are some tools being developed. Babelfish, a library developed by the tech startup source{d}, tries to parse any programming language and produce a Universal Abstract Syntax Tree (UAST). link. If we can develop such a universal syntax tree, we might be able to generate code in one programming language and efficiently translate it to another.
Facebook has been working on this too, to some degree. To quote:
We created Aroma, a code-to-code search and recommendation tool that uses machine learning (ML) to make the process of gaining insights from big codebases much easier.
…
With the advances in this area, we believe that programming should become a semiautomated task in which humans express higher-level ideas and detailed implementation is done by the computers themselves.
source: Aroma: Using machine learning for code recommendation
GitHub
GitHub has direct access to tons of open source code and their business is code. So it is no surprise that they have some insights.
They have worked on building semantic code search. link
And a presentation by a GitHub engineer is amazing and straight to the point:
Code Encodings: Bridging the Gap Between Natural Language and Code. link


TabNine
I love this one. It is based on the GPT-2 language model and helps the programmer by suggesting code completions. These suggestions are somewhat more intelligent than the usual IDE suggestions. link
TabNine was created by Jacob Jackson.
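To give a feel for what "suggesting completions from a language model" means, here is a deliberately tiny stand-in: a bigram model that counts which token tends to follow which in a small code sample. This is not how TabNine or GPT-2 work internally; the `CORPUS` string, the tokenizer and the function names are all assumptions made up for this sketch.

```python
import re
from collections import Counter, defaultdict

# Hypothetical training code; a real tool would learn from far more source.
CORPUS = """
for i in range(10):
    print(i)
for item in items:
    print(item)
for key in data:
    print(key)
"""


def tokenize(code):
    """Split code into identifiers, numbers and single punctuation marks."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)


def train(code):
    """Count, for every token, which tokens follow it and how often."""
    following = defaultdict(Counter)
    tokens = tokenize(code)
    for cur, nxt in zip(tokens, tokens[1:]):
        following[cur][nxt] += 1
    return following


def suggest(model, token, k=2):
    """Return up to k of the most frequent next tokens after `token`."""
    return [t for t, _ in model[token].most_common(k)]


model = train(CORPUS)
print(suggest(model, "print"))
print(suggest(model, "in", k=3))
```

A bigram model only looks one token back, which is why its suggestions are so shallow. The appeal of a large language model is that it conditions on a much longer context, so the same "predict the next token" idea becomes far more useful.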
Final Thoughts
It takes a lot more to build a system that can dominate programming languages. But I believe it can happen in the near future.
There are other aspects too. If I build this technology, would it be a profitable business? Would it be good enough to replace traditional programming? Or would it be an incremental improvement in the shape of an AI programming assistant? Or would it lose out to AI assistants like Siri and Alexa?
Personally, I believe it would be something huge. If I built such a thing, I could go after other sectors like personal assistants. Enabling people to program computers at 10X speed without much programming skill could unleash a lot of potential. What do you think?
