A Forlorn Hope of Fortran Modernisation · Amen Zwa, Esq.

Source: https://amenzwa.github.io/stem/PL/FortranModernisation/

_a proposal for a dependently typed Fortran_

[TOC]

For decades, people in IT have taken delight in drafting Fortran’s obituary. Yet, this old language lives on. But in recent years, the Fortran user community has begun sounding alarms: Fortran shops are having difficulty finding young programmers to replace those who are leaving the workforce, because the young are not willing to devote their careers to this archaic language. At present, no language can rival, let alone surpass, Fortran when it comes to implementing long-lived, large-scale, massively-parallel scientific and engineering applications; not even C and C++. Yet, modern programmers know nothing about Fortran, nor have they any interest in it. Suffice it to say, Fortran has an image problem.

In this article, I explore the causes of Fortran’s diminished popularity and discuss potential remedies. The key points I make here are these:

- Fortran is indispensable for scientific parallel computing

- The industry is facing a shortage of Fortran programmers

- The industry has no actionable plans to replenish the ranks

My intended audience includes the following groups:

- Computer scientists maintaining the Fortran language standard

- Scientists and engineers who implement scientific software using Fortran

- STEMers interested in parallel processing and scientific programming

Given the breadth and depth of topics involved, the reader is expected to be an experienced parallel programmer in both procedural and functional languages and possess a working knowledge of simple type theory, parametric type theory, and dependent type theory.

It would seem that trying to shore up this mid-century language for the grind of the 21st Century verges on insanity. Not so. I contend that Fortran modernisation is worthwhile and necessary. I admit, though, that refreshing Fortran for the 2020s is but a forlorn hope, at present.

Before we delve into the subject, here is some background on my connection to Fortran. Like other electrical engineering undergraduates in the early 1980s, I learned FORTRAN in college. I used FORTRAN 1977 on the DEC VAX-11/780. FORTRAN was then the primary language for performing electronic circuit simulation, digital signal processing (DSP), finite element method (FEM), computational fluid dynamics (CFD), and other engineering computations, so we were required to learn the language.

But in those days, most of us STEMers tried to learn every programming language we could get our hands on. There were only a few languages in popular use in science and engineering: LISP, FORTRAN, C, Pascal, ML, and Prolog; good compilers for these languages were available only on minicomputers; and the only place where we could gain access to these large, expensive machines was at the university computer centre. Scarcity creates demand, I suppose. I did most of my work in my preferred languages: LISP, C, and ML. So, my FORTRAN experience was limited to class assignments.

Later, in computer science graduate school, my research work led me back to Fortran 90 on a CRAY Y-MP with an attached T3D. My professional association with Fortran ended, when I left academia for industry in the mid 1990s. Even in those days, no one in IT was using Fortran any longer. But out of personal interest, I kept up with Fortran’s evolution, through the years. So, my views presented here are born of dated hands-on experience with older FORTRAN, present awareness of modern Fortran’s predicament, and decades-long hands-on experience with numerous programming languages, both modern and ancient. And I admit that although I am no fan of Fortran at a practical level, I truly admire Fortran at an intellectual level, for its originality, longevity, and history.

Now, let us crack on.

_yesterday_

The concept of the modern computer was conceived in the minds of mathematicians in the late 1930s. Then, in the mid 1940s, the modern computer was realised in the hands of electrical engineers. Initially, programming was done by wiring up the circuits. This was followed by entering binary words into registers using hardware switches. Later, symbolic assembly languages were invented. And in 1957, Backus, a computer scientist extraordinaire, and his colleagues at IBM created the world’s first high-level programming language, FORTRAN. Its purpose was to enable scientists to implement scientific applications using their native tongue, mathematical notation, instead of a foreign tongue, assembly language.

Over nearly 70 years, Fortran has continued to evolve, incorporating the advances of each decade. There have been numerous standardised versions of Fortran: 1966, 1977, 1990, 1995, 2003, 2008, 2018, and 2023. The latest standard as of this writing in early 2024, Fortran 2023, is a thoroughly modern language with a strong static type system, user-defined types, recursion, objects, modules, automatic deallocation of allocatable data, and built-in partitioned global address space (PGAS) parallel programming to boot. To date, Fortran is the only standardised language with built-in parallel processing facilities. Despite its age, despite the decades of accretion, Fortran remains essentially a simple language that non-programmers could quickly learn to use. In that sense, Fortran is easy to use, like Python. But unlike Python, Fortran is fast, faster than even the mighty C, when it comes to scientific parallel computing.

Today, modern Fortran is still the primary language of choice in large-scale, massively-parallel applications which are the bread-and-butter of scientific and engineering computations. A few elite engineering colleges round the world still teach Fortran to their engineering undergraduate students, even if their computer science students may have never heard of the language. There simply is no modern language that can compete with Fortran, when it comes to high-performance parallel scientific computing on supercomputers. The same is true of modern Cobol. Although no undergraduate students would ever learn it in college, many large financial institutions still rely heavily on Cobol for handling massive amounts of transactions. No modern language can compete with Cobol, when it comes to high-throughput real-time transaction processing on mainframes.

The first-generation high-level programming languages emerged in the late 1950s and early 1960s: FORTRAN (1957), the first high-level language specifically designed for scientific computing; LISP (1958), the first functional programming (FP) language, designed for symbolic computing and automated theorem proving; ALGOL (1958), the first procedural programming (PP) language; COBOL (1959), the first purpose-built business computing language; and Simula (1962), the first object-oriented (OO) language. These OG (original gangsta) languages live on in one form or another. Modern FORTRAN is Fortran 2023. Modern LISP is Common Lisp 1994. Modern ALGOL is essentially all modern PP languages, the chief exponent of the lot being C 2023. Modern COBOL is Cobol 2023. Modern Simula is all modern OO languages, the most popular one being C++ 2023.

LISP was immensely popular during the heyday of rule-based artificial intelligence (AI) in the 1970s and 1980s. Today, its best known use is as the scripting language for the inimitable Emacs editor. And since LISP was the first FP language, all modern FP languages, like OCaml and Haskell, are its spiritual descendants. ALGOL never escaped academia, but its influence is seen in the design of all PP languages that followed, including Pascal, C, and modern system programming languages like Rust, Odin, and Zig. C is still being used heavily in system programming: compilers, libraries, operating systems, etc. Its most famous use is in the implementation of the Linux kernel. Simula, too, was primarily an academic research language. But it heavily influenced Smalltalk and C++. Smalltalk, in turn, was influential to Objective-C and Java. C++ combined C’s efficiency and Simula’s objects, thereby injecting many useful OO concepts into the traditionally PP domain of system programming. Today, C++ is used in almost every application domain where speed is essential. Hence, LISP, ALGOL, and Simula live on, albeit in quite different guises.

FORTRAN and COBOL, however, have remained essentially unchanged for almost seven decades. Their standard committees opted to maintain full backward compatibility. For example, a Fortran 2023 compiler can still compile FORTRAN 1977 code, without modification!

Note that the original stylised name “FORTRAN” changed to the modern form “Fortran” with the publication of the Fortran 1990 standard. By the way, FORTRAN stands for “formula translator”, which hints at its mathematical lineage.

The advantage of backward compatibility is dependability, both in terms of longevity and of reliability. Most bugs have been eradicated over time, and institutions can depend on the continued existence of these long-lasting languages. The disadvantage of backward compatibility, of course, is that newer concepts that emerged decades after the birth of these old languages had to be shoehorned into their designs, yielding a rather ectopic feel, syntactically and semantically. As such, linguistic extensions for these languages, like modules and objects, though modern, feel awkward and dated nonetheless. Consequently, FORTRAN and COBOL look and feel stale to modern eyes.

For the remainder of this section, the discussions will focus on Fortran. But the arguments and the conclusions proffered apply equally to Cobol, by analogy.

_today_

There is no denying that modern Fortran suffers from an image problem. Young programmers entering IT rightly perceive this pioneering language as old. That perception reflects the reality. But these youngsters wrongly assume that this old language has remained in its ancient, infantile form, that there is no future for them in Fortran, and that they should learn only the newest language currently popular in the industry. That assumption is not merely incorrect, it is the opposite of the reality.

IT became a speciality in the 1950s, with the advent of modern digital computers. But modern IT, as we now know it, did not come about until the 1970s, when minicomputers from DEC, Data General, IBM, and other manufacturers became affordable, due to the widespread use of integrated circuits (ICs). This rapid expansion of IT coincided with the rise of PP languages. C was introduced in the early 1970s. And by the early 1980s, it had established itself as the dominant language in the industry, and its reign continued well into the late 1980s. C++ eventually wrested away the crown in the early 1990s. Java emerged almost overnight less than a decade later, and took the top spot by the early 2000s. The 2010s saw the meteoric rise of JavaScript, fuelled by Web 2.0. Today, in the 2020s, Python is king, due to its popularity in machine learning and data science. It appears that the reign of a popular industrial language is about as long as that of a Roman emperor.

This is in stark contrast to Fortran’s trajectory. Since the rise of C more than 50 years ago, the popularity of Fortran has been in steady decline. This decline accelerated in the mid 1980s, when the free software movement became mainstream and quality compilers for many new languages became freely available. Also, the rise of home computers around this time hatched throngs of new programmers who were seeking new languages to learn. And even at its peak of popularity, Fortran occupied a narrow, albeit deep, specialised domain of scientific computing. Those specialised applications were inseparable from supercomputers, which were inaccessible to all but the most exclusive of universities, the largest of corporations, and the best funded of government agencies. The combination of these factors conspired to deplete Fortran’s popularity. Yet, it remains firmly entrenched in scientific computing.

And therein lies the rub: those organisations that depend on Fortran can no longer find young programmers who are willing to devote their careers to this old language that offends their delicate, Pythonic sensibilities. Also, most programmers today have never seen a supercomputer; colleges no longer operate traditional computer centres, because the schools’ IT needs are now served by the cloud and all students now own laptops, tablets, and mobiles. These factors militate against the students’ exploration of the Fortran ecosystem. Consequently, the organisations that rely on Fortran for their mission can no longer find new programmers to replace the dwindling population of aging programmers, who are about to leave the workforce, or have already left it.

_tomorrow_

If the decline of Fortran’s popularity continues to accelerate, soon there will no longer be enough programmers to maintain the massive amounts of existing mission-critical code. The “tomorrow” in the section title is no hyperbole; this problem is imminent.

And despite the industry leaders’ increasing concerns about this problem over the past three decades, there has been no industry-wide concerted effort to remedy it. Indeed, there are no actionable plans in existence. Transitioning away from an entrenched, long-lasting language like Fortran would take at least a couple of decades. Hence, this situation is insupportable.

There are only a handful of possible courses of action, but ignoring the problem and maintaining the status quo is not a sensible choice:

- Manually translate Fortran into a newer, popular language

- Automatically translate Fortran into a newer, popular language

- Embrace Fortran by systematically promoting it in academia and in industry

- Target Fortran from existing modern languages

- Thoroughly modernise Fortran and shed all emotional attachments to the days of yore

We shall now explore these options, in depth.

OPTIONS

This document was written with a design to promote discussion, but it is not a design document for a new Fortran. In other words, the recommendations given here are fragmentary ideas, not firm dicta. Indeed, there may well be ambiguities in the grammar of the syntax presented below. My aim here is not to be precise, but to lay the foundation upon which to think, talk, and transmit concerns about the looming gloom of Fortran code rot.

_translate manually_

It is possible to translate manually a Fortran programme into a modern language, say C++, Java, Python, or something else. After all, an algorithm is an algorithm, whatever the implementation language.

But such a manual translation is not only uneconomical, it is downright perilous. Most Fortran programmes in use were first implemented decades ago, and the scientists and engineers who designed them have long since left the workforce, and with them went the institutional knowledge of requirements and designs. Such programmes grew out of someone’s research or experiment, and were never designed to grow so large and last so long. As such, the code and the system are usually undocumented. Given the specialist nature of their domain, old-school Fortran programmers are an insular lot; they do not know, or if they do know are not proficient in, the newer languages and libraries necessary to reimplement those massive, complex, parallel scientific applications on modern computing infrastructures. Conversely, most C++ and Java programmers have never seen a piece of Fortran code. And typical Fortran code is highly optimised for parallel execution, but most modern programmers have no experience with parallel computing. Moreover, modern languages do not have standardised, built-in parallel processing facilities. As such, the lead programmer must research, experiment, and select one parallel API from among the many: MPI, OpenMP, OpenCL, OpenACC, Vulkan, CUDA, Metal, and countless others, not to mention the implementation language itself. Also, parallel code is usually encrusted with layers of data distribution and control synchronisation protocols that obscure the underlying algorithm. These algorithms, even in their simpler, sequential form, are so specialised and sophisticated that IT coders, who do not possess the requisite science and engineering background, have no chance of comprehending them.

Hence, the only people who could understand a large, complicated Fortran codebase are those old scientists and engineers who are on the verge of departing the workforce. But in the twilight of their careers, they have no impetus to switch to newer, less-capable languages which would only hinder their work. And the only people who could translate old Fortran programmes into newer languages are those typical IT coders, who have no desire to devote a decade or more of their careers trying to master Fortran. This is a classic Catch-22.

_translate automatically_

A more realistic alternative to manual translation is automatic translation by leveraging existing Fortran tools. For instance, the venerable, freely available f2c translator can translate FORTRAN 1977 code into C. However, no comparable mainstream tool exists for modern, parallel Fortran: the ever-popular GNU Fortran compiler, free software to boot, can compile that newer code, but it cannot translate it into C, and C is, in any case, an unashamedly sequential language. Hence, even the state-of-the-art tools can automatically convert only the old, sequential FORTRAN 1977 code.

The other possible path is to use a large language model (LLM), a technology built on the transformer architecture, which was first published in 2017. In late 2022, when ChatGPT was released, it shook the whole of IT and the world at large. Today, IT is abuzz with LLMs: code generation, automated chat, text summarisation, spreadsheet analysis, image annotation, and the like. And of late, LLMs have all but swamped the field of academic AI research. At present, they are arguably the most capable, and the most adaptable, machine learning algorithms for text manipulation.

There are many LLMs that specialise in programming languages. They are trained on vast corpora of source code written in hundreds of programming languages. But even the most advanced LLMs could do no more than generate isolated code snippets of known algorithms. There are also some LLMs that are able to translate code snippets into a variety of different languages. But they, too, are limited to translating only small amounts of code. As of today, the most capable LLM translators falter when presented with a few thousand lines of code in any language they purportedly know. And they know nothing of parallel programming.

Beyond all that, there is a fundamental weakness of LLMs that may prove to be difficult, if not impossible, to overcome. Nondeterminism is innate to LLMs as they are typically deployed, because their output is sampled stochastically. Moreover, LLMs are prone to hallucinations, which may be amusing in informal business communications but are unacceptable in life-critical engineering applications. The bottom line is that there is no direct statistical measure for assessing the accuracy of LLMs’ code generation and translation capabilities. Hence, even if LLMs miraculously gained the ability to translate parallel Fortran code into some new language, there is no mathematical means to assess the semantic fidelity of the translated code. And it is a well known fact that, when life and limb are at risk, automated testing is wholly inadequate to offer the 100% guarantee of correctness demanded by this application domain.

It may well be that in the future, AI could perform machine translation of programmes. But that may take a few decades, yet. By then, it is unlikely that machine translation of high-level languages would be necessary at all, because AI programmers do not need high-level programming languages. It would then be economically nonsensical to employ human programmers. So, if AI were to be used to program machines in the future—a distinct possibility—it is an eminently more sensible use of resources to focus on creating AI that programs the hardware directly in hardware-specific binary codes, or perhaps even in hardware description languages (HDLs). So, AI translation of Fortran is also a Catch-22.

_embrace fortran_

When I was an undergraduate engineering student a very long time ago, Fortran was the only programming language we were required to learn. Indeed, it was the only language scientists and engineers used in those days. Soon, C took over large swathes of Fortran’s traditional hunting ground. And today, the majority of scientists and engineers use Python, exclusively. Meanwhile, Fortran continued to evolve quietly, incorporating modern programming language concepts and, most significantly, parallel programming facilities collectively known as coarray. Coarray Fortran (CAF) originated at CRAY in the early 1990s, and it was eventually incorporated into the Fortran 2008 standard.

The coarray parallel programming facilities offer an unparalleled, as it were, comfort and convenience in implementing massively parallel code. No other language, to date, has managed to offer such a cultured parallel programming experience. Coarray has distinct advantages over other methods: it is syntactically and semantically bound to Fortran, a language specifically designed for high-performance scientific computing, a language that caters to the way scientists and engineers think about computing, a language that has been in sustained use for seven decades in scientific computing. Also, CAF employs the partitioned global address space (PGAS) model of parallel programming. Of the many parallel programming models, PGAS is generally considered to be one of the most natural and intuitive to use. Thus, CAF is not only natural for programmers to use, it also enjoys field-proven reliability and stability. It is no surprise that Fortran continues to thrive today in scientific parallel computing.
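To make the PGAS idea concrete, here is a toy, sequential Python model of coarray addressing. It is only a sketch under stated assumptions: the `ToyCoarray` class and the `halo_exchange` function are hypothetical names invented for this illustration, the images are simulated one after another rather than run in parallel, and no synchronisation is modelled. In real Fortran, the remote reference `x(i)[p]` denotes element `i` of `x` on image `p`, with image numbering starting at 1; the model keeps both conventions.

```python
class ToyCoarray:
    """Sequential toy model of a one-dimensional coarray (hypothetical class)."""

    def __init__(self, num_images, local_size, fill=0.0):
        # Each image owns one partition of the global address space.
        self.num_images = num_images
        self.blocks = [[fill] * local_size for _ in range(num_images)]

    def get(self, index, image):
        """Model the remote read x(index)[image], using 1-based numbering."""
        return self.blocks[image - 1][index - 1]

    def put(self, index, image, value):
        """Model the remote write x(index)[image] = value."""
        self.blocks[image - 1][index - 1] = value


def halo_exchange(x, local_size):
    """Each image copies its left neighbour's last cell into its own first
    cell (a one-sided read). Image 1 has no left neighbour and is skipped."""
    for me in range(2, x.num_images + 1):
        # In Fortran this would read roughly: x(1) = x(local_size)[me - 1]
        x.put(1, me, x.get(local_size, me - 1))


# Three simulated images, four cells each.
x = ToyCoarray(num_images=3, local_size=4)
for me in range(1, 4):
    x.put(4, me, float(me * 10))  # every image fills its own last cell
halo_exchange(x, local_size=4)
print(x.blocks)
```

The point of the model is the addressing: every image sees the whole array, but each element has an explicit owner, so the programmer can always tell a cheap local access from a potentially expensive remote one. Real coarray code would also need image control statements, such as `sync all`, around the exchange.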

But as I explained above, Fortran’s popularity in broader academia and industry faded a long time ago. That the scientific computing community has ignored the impending problem of Fortran programmer shortage for such a long time is regrettable. The simplest and the most realistic way to remedy this problem is to reintroduce Fortran into broader STEM undergraduate curricula.

At the moment, however, Python occupies that sacred spot in STEM education. Indeed, it would be sacrilegious to foist the crusty old Fortran upon the STEM undergraduates, when most of them already achieved familiarity with Python back in high school. It does not help Fortran’s cause that it is used primarily on supercomputers and supercomputers are inaccessible to most universities.

But there is a glimmer of hope. All computers, including mobile phones, are multicore, shared-memory machines, these days. That is, they are parallel computers. But these powerful parallel machines currently run single-threaded applications written in sequential programming languages, because parallel programming has not caught on in IT. It takes considerable knowledge, skills, guile, and effort to implement parallel code using the current stock of industrial languages. Even though no web developer would ever take up Fortran, it is not inconceivable that some science and engineering undergraduates could be coaxed into using CAF (on their GPU-equipped, multicore laptops, say) to implement and train large, deep-learning AI models with greater efficiency, especially given that their code would run much, much faster on supercomputers, with no modifications at all. And the same educational initiatives could be pursued in the industry, especially with the backing of current power users of Fortran: NVIDIA, IBM, NASA, NOAA, DoD, and the like.

Put crudely, this approach is nothing more than a marketing campaign to restore Fortran’s tarnished popularity to its former lustre. But it is as easy to accomplish as it is crude. All it would take is a few years. Moreover, familiarity with Fortran’s parallel processing concepts will help young programmers immensely in their future careers, where the only type of computing resources available will be multicore, multiprocessor machines, be they on the wrist or in the cloud. This could be a win-win scenario, both for the Fortran shops and for the young programmers, if the marketing campaign can convince the youngsters to give Fortran a chance against the tide of six-figure Python jobs. That, then, is a big “IF”.

_target fortran_

Today, a common practice in programming language design is to target “low-enough” high-level languages, instead of the machine language. The first C++ compiler from the 1980s, called the AT&T Cfront, transpiled C++ down to C, then invoked the platform’s C compiler to produce the native binary. Nim, a new PP language, also transpiles down to C. Elm, the effervescent, purely-functional, web front-end language, transpiles down to JavaScript, the assembly language of the Web. So, it is in keeping with the current industry trends to transpile down to Fortran from newer, popular languages.

It is important to note, though, that transpilation is unidirectional; it only goes from higher abstraction languages to lower abstraction languages. That is, down-transpilation is abstraction lossy: transpiling a low-abstraction language, like C, up to a high-abstraction language, like Haskell, is a practical impossibility, whereas transpiling Haskell down to C is far simpler by comparison. One way to mitigate abstraction loss is to inject metadata into the generated low-level code from which the high-level abstractions can be recovered automatically. But those techniques are neither effective nor efficient. However, it is straightforward to translate between languages that are semantically proximate, say between C and Pascal, between Java and C#, or between OCaml and Reason. Fortran is a low-abstraction language, compared to modern FP languages. And Fortran’s application domain, scientific computing, is intensely mathematical. Hence, existing FP languages with a mathematical bent—Haskell, OCaml, F#, etc.—could target Fortran with relative ease. In fact, these languages already target C and JavaScript, so down-transpilation is not a foreign practice to them. Programmers always favour languages with elegant syntax and cogent semantics. So, they would have no objections to learning popular FP languages that can target Fortran, even if they have to work within the mouldy, old Fortran ecosystem.

A more practicable approach, though, is to leverage Python, the language all scientists and engineers use, a language all junior coders in IT know, a language all high schoolers adore. Python already targets C and JavaScript, so adding a Fortran target is no burden. More importantly, Python’s most popular scientific computing libraries (NumPy, SciPy, etc.) invoke, via foreign function interface, highly optimised low-level libraries written in Fortran, such as LAPACK and BLAS. Instead of a piece of slow, interpreted Python code calling out to fast, compiled Fortran binaries here and there, the Python code can be transpiled into Fortran, and the whole application becomes a fast, compiled Fortran binary. Integrating PGAS into Python and transpiling the parallel Python code directly into parallel Fortran PGAS code would be a win-win situation.
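As a minimal sketch of what down-transpilation from Python to Fortran involves, the following uses Python’s standard `ast` module to turn a restricted arithmetic expression into a Fortran `pure function`. The names `expr_to_fortran` and `function_to_fortran` are hypothetical, invented for this illustration; this is not an existing tool, and it handles only variable names, numeric constants, and the operators `+`, `-`, `*`, `/`, and `**`. One design choice is assumed: numeric constants are emitted as double-precision reals (integer exponents excepted), so that Python’s `b/2` does not silently become Fortran’s integer division.

```python
import ast

# Python AST operator types mapped to Fortran operator tokens.
_BINOPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/", ast.Pow: "**"}


def expr_to_fortran(node):
    """Render a restricted Python arithmetic expression node as Fortran text."""
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        left = expr_to_fortran(node.left)
        right_is_int = (isinstance(node.right, ast.Constant)
                        and type(node.right.value) is int)
        if isinstance(node.op, ast.Pow) and right_is_int:
            right = str(node.right.value)  # keep x**2 an integer power
        else:
            right = expr_to_fortran(node.right)
        return f"({left} {_BINOPS[type(node.op)]} {right})"
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return f"(-{expr_to_fortran(node.operand)})"
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        # Assumption: every constant becomes a real of kind dp.
        return f"{float(node.value)!r}_dp"
    raise ValueError(f"unsupported construct: {ast.dump(node)}")


def function_to_fortran(name, params, expression):
    """Wrap the translated expression in a Fortran pure function definition.

    Assumes a non-empty parameter list, all of type real(dp)."""
    body = expr_to_fortran(ast.parse(expression, mode="eval").body)
    args = ", ".join(params)
    return "\n".join([
        f"pure function {name}({args}) result(res)",
        "  implicit none",
        "  integer, parameter :: dp = kind(1.0d0)",
        f"  real(dp), intent(in) :: {args}",
        "  real(dp) :: res",
        f"  res = {body}",
        f"end function {name}",
    ])


print(function_to_fortran("poly", ["a", "b", "x"], "a*x**2 + b/2"))
```

For the input shown, the generated assignment is `res = ((a * (x ** 2)) + (b / 2.0_dp))`. Even this toy exposes the real difficulty: the syntax maps over almost mechanically, whereas the semantics (numeric types, division, and eventually arrays and parallelism) is where a serious Python-to-Fortran transpiler would have to do its work.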

_modernise fortran_

Throughout its long existence, Fortran has staunchly adhered to those original design concepts and philosophies that gave it life way back in the 1950s. Indeed, all modern Fortran compilers can still compile FORTRAN 1977 code. This level of backward compatibility is a remarkable achievement on the part of those who curated and guided this long series of language standards, through the years.

But much has changed in computing, since 1957. As explained above, backward compatibility is a double-edged blade with a needle-sharp point. Fortran’s single-minded pursuit of backward compatibility extended its longevity: despite being the first ever high-level programming language, it is still being used actively, today. But this design choice also made Fortran verbose and stale. In the early days, before the emergence of the programmer caste, English-like verbosity was valued for its perceived comprehensibility. Back then, no one really knew what high-level programming languages were supposed to look like. Also, hardware limitations constrained programmes to be no larger than a few hundred lines of code, making them essentially one-time-use tools. But today, when projects routinely exceed the million-line mark and last a few decades, verbosity severely diminishes comprehensibility and maintainability. So, the verbose nature of Fortran makes it unpalatable to modern programmers, in whose eyes modern Fortran is anything but modern.

Despite the successive standards’ attempts to modernise Fortran and to expand its application domain, there is no denying that Fortran is not a general-purpose language (GPL), a system language, an enterprise language, a web development language, nor a mobile development language; it is a scientific programming language, through and through. Indeed, it is the only standardised programming language with parallel programming facilities baked into the syntax. Fortran is, thus, the ultimate scientific domain-specific language (DSL).

All modern software development projects employ several different programming languages, each put to a particular purpose. For instance, the following specialisation amongst programming languages is standard practice in any large software development project, today: use HTML augmented with JavaScript or TypeScript for web UI; use modern, industrial-strength, high-level languages that target WebAssembly for front-end background services and back-end remote services; use Python in Jupyter for interactive data analysis; use C, C++, Rust, or Zig for system integration; use GraphQL for data wrangling; and use SQL for data warehousing. Most of those languages are GPLs, yet they are employed for purposes that best fit their propensities. Likewise, a large scientific software development project should follow that industry standard practice of using multiple languages for different purposes. Fortran need not be the only language used in the whole project; instead, Fortran should be used exclusively to implement parallel algorithms from science and engineering.

In this scenario, Fortran’s core could be stripped of the inessentials that it has accumulated through the decades and be restored to its former svelte self, and its syntax could be overhauled to reflect modern thinking in language design. Much of Fortran’s present bloat came from the desperate attempts to keep pace with modernity and the misguided endeavours to turn it into a GPL. Accepting Fortran’s lot in life as the ultimate scientific DSL will make the language more compact, thereby making it better suited to its original purpose, once again. Indeed, it would fit the scientific DSL role better than Python, C++, or any other modern GPL.

Trimming the verbose syntax and the extraneous features of Fortran is not only feasible, it is also the most sensible option. This “new Fortran” will employ the syntax and semantics of modern FP languages, like Haskell, OCaml, and F#, and incorporate features from modern proof assistants, like Agda, Idris, Coq, F*, and Lean. But it will retain Fortran’s array manipulation and parallel programming facilities. Yet, it is but a thin veneer of palatable, alternative syntax atop Fortran’s core semantics. In terms of syntactic differences, this effort is analogous to what C++ is to C. And in terms of semantic differences, it is like what Agda is to Haskell. The compiler transpiles this new, higher-abstraction language down to core Fortran, then invokes the Fortran compiler to produce high-quality native binary. This is how Agda compiles its code, via Haskell, into a binary executable. And just as C++ is able to interact natively with the existing C libraries, this new language will interact natively with existing Fortran libraries.

After this modernisation, the resultant language is still Fortran in essence—a strongly, statically typed scientific DSL with built-in parallel programming facilities—only without the decades-old crust. And it provides familiar modern comforts and accoutrements that today’s programmers adore. More importantly, its cohabitation with Fortran helps sustain Fortran, long into the future. Those who find it irksome to call this language the “new Fortran” may refer to it simply as $\mathcal{F}$ or $\Phi$, after its Fortran and functional roots.

MODERNISATION

Singular among modern programming languages, Fortran is religiously backward compatible. A 50-year-old piece of code is still valid modern Fortran code, and so are the tonnes of syntactic extensions that have been added over time. This obsession with full backward compatibility makes Fortran revolting to modern programmers.

Surely, language designers and maintainers do make mistakes from time to time; it is human nature. The purpose of design updates is to remedy those mistakes, not to amplify them. Staunchly maintaining backward compatibility with a decades-old design, no matter what, is a monstrous mistake, in hindsight. It is senseless to enable application owners who slothfully or parsimoniously avoid updating their ancient, but still mission-critical, codebase. The continued use of a mission-critical code that no one understands any longer endangers that very mission. This behaviour must be deterred. Fortran’s historical crud must, therefore, be cleansed by following the usual planned obsolescence path, and by embracing modern programming concepts without continuing to strut that tired mid-century chic.

There have been examples of languages dramatically altering their syntactic appearance without altering their semantic core. The difference between C 1978 and C 1989 was stark, but planned obsolescence smoothed out the transition. The Python 2 to Python 3 transition was considerably messier, but it was done, eventually. The most recent language to have plastic surgery was Scala, a popular FP language. Scala was released in 2004, at the height of Java’s fame. As such, Scala 1 syntax was nearly identical to that of Java. By the time Scala 3 was released in 2021, though, Python was burning hot, and the syntax had been altered significantly to resemble Python, whose syntax is undeniably cleaner than that of Java. Twenty years is almost an eternity in the life of a programming language. The designers of Scala opted to evolve the language to ensure its continued success, instead of holding on to the old fashions that were not aging gracefully. At some point, a clean break with the past is warranted. If Fortran is to continue to attract new programmers, it, too, must undergo similar plastic surgery.

To the programmer, the syntax is the intimate touchpoint with a language. So, programmers grow emotionally attached to a language, mainly for its attractive, intuitive syntax. Of course, semantics is important to programmers, too, to do the job right. During the three-quarters of a century over which high-level programming languages have been in use, a handful of design philosophies and implementation practices have emerged as dominant ones. Modern languages must adhere to those good principles, to promote execution efficiency and programmer productivity. It is, therefore, important to modernise both the dated syntactic jerks and the antiquated semantic quirks of Fortran.

In this section, I propose a modernised Fortran syntax that resembles the syntax of the ML family of FP languages. This choice is deliberate. First, the ML family uses the clutter-free offside rule (meaningful whitespace) of ISWIM, a seminal FP research language. Although Python is of the OO lineage, it follows the offside rule. Marks for Python, there. Secondly, Fortran leans heavily towards mathematics, much like FP languages, and the most established languages in the FP paradigm belong to the ML family. Thirdly, the offside rule, together with a small collection of carefully chosen syntactic rules, yields a clean, clear, cogent syntax without parentheses, braces, commas, semicolons, and other syntactic noise, as demonstrated by Hope, Clean, Miranda, Haskell, Agda, Idris, Caml, OCaml, Coq, F*, Lean, and other descendants of ML.

The syntactic and semantic modifications proposed in this section, like all such proposals, are always susceptible to being dismissed as “mere preferences”. The conversation, then, quickly devolves into a fight over who has better taste. To avoid that, I have provided my reasoning behind every choice, so that it can be subjected to a reasoned critique. However, rejecting a proposal by saying, “We reject the _new_, because we have always done it the _old_ way” is neither reasonable nor valid. Also, the proposal to remove the unnecessary, irrelevant, old syntax cannot be countered with, “We want that”, because wants are not needs. I have already refuted the claimed supremacy of absolute backward compatibility. Lastly, it will not do to argue that “We could do something similar to that in Fortran, albeit with loads more characters”, because one could just as well substitute “assembly”, or even “`0`s and `1`s”, in place of “Fortran” in that argument.

_scrub the crud_

**_remove fixed format_**—Modern Fortran continues to support the 1950s code formatting convention developed for the punched card, called the fixed format. Stringent backward compatibility requires modern Fortran compilers to support this obsolete convention. Yet, in modern programming practice, this format is arbitrary, onerous, and irrelevant. Hence, this convention must be discarded.

**_remove implicit none_**—The `implicit none` compiler directive disables the old FORTRAN feature that automatically assigns all identifiers that begin with `I`, `J`, `K`, `L`, `M`, and `N` to the `INTEGER` type, and all other identifiers to `REAL`. In the days of the teleprinter, this implicit typing feature was a desirable keystroke saver. But it grates against the strong, static typing philosophy of modern programming languages, including modern Fortran. As such, every modern Fortran module file begins with this incantation to disable implicit typing.

This feature should have been excised long ago through the planned obsolescence process employed by countless other languages, including C and C++. For instance, the original C syntax that appeared in _The C Programming Language_, 1ed (1978) continued to be used well into the early 1990s, but it is no longer accepted by modern C compilers in the early 2020s—nor ought it to be. Indeed, most modern C programmers cannot understand the 1970s style C code any more. But the irrational adherence to absolute backward compatibility obliges modern Fortran programmers to inject `implicit none` into every module.

The `implicit none` is an instance of the many poor design decisions that littered Fortran’s long trek to modernity. Disabling implicit typing should have been cued off a file name extension (`.f90` for instance) or a compiler flag (say `--ImplicitNone`), and it should have been excised from the language a long time ago. But the designers opted to keep this superfluous feature in perpetuity and to force the programmer manually to disable implicit typing in every single module. Consequently, this statement is almost as prevalent in code as `do` and `if`, yet it performs no substantive work. Such pointless and arcane incantations not only make modern Fortran verbose, they also diminish its legibility, without offering any countervailing benefits.
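The hazard that `implicit none` guards against can be sketched in a few lines. The module below is a hypothetical illustration (the names `implicit_demo`, `doubled_sum`, and `totl` are assumptions of mine, not taken from any real codebase). It deliberately omits `implicit none`, so the compiler accepts a misspelt variable without complaint.

```fortran
module implicit_demo
  ! Deliberately no "implicit none": the default implicit typing rules apply.
contains
  function doubled_sum(n) result(total)
    ! n is implicitly INTEGER (initial letter in I..N); total is implicitly REAL.
    total = 0.0
    do i = 1, n            ! i is implicitly INTEGER
      total = total + i
    end do
    totl = total * 2.0     ! typo for "total": silently creates a new REAL variable
  end function doubled_sum
end module implicit_demo
```

Calling `doubled_sum(3)` yields `6.0` rather than the intended `12.0`, because the doubling lands in the accidental variable `totl`. With `implicit none` in force, the same code fails to compile, since `n`, `i`, `total`, and `totl` are all undeclared.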

_trim the bloat_

**_trim type definitions_**—To make programmes readable, it is necessary to eliminate verbosity from the syntax. Modern Fortran supports user-defined product types, which it calls _derived types_. It is not uncommon to see bloated code like the following.

```
! a named constant cannot be declared inside a derived type, so k8 comes first
integer, parameter :: k8 = selected_real_kind(8)

type :: point3d
  real(kind=k8) :: x
  real(kind=k8) :: y
  real(kind=k8) :: z
end type point3d

type(point3d) :: p

p = point3d(3.0, 2.0, 1.0)
```

This verbosity is gratuitous, tedious, confusing, illegible, and error-prone. The above code could be reduced to the following Haskell-like syntax in our new language.

```
Point3D : point3d {x y z : ℝ} ## declare record type
p = point3d {x = 3.0, y = 2.0, z = 1.0} ## define record value
```

In the above declaration `Point3D : point3d {x y z : ℝ}`, the left-side `Point3D` is the type constructor and the right-side `point3d` is the value constructor that takes a record of the shape `{x y z : ℝ}`, which is a record of three named $\mathbb{R}$-typed values. For simplicity, this same syntax is used for type aliases, too. In the second line, the value constructor `point3d` is applied to the field values $3.0$, $2.0$, and $1.0$ to construct the record value `p` of type `Point3D`. The fields of this record are accessed as `p.x`, `p.y`, and `p.z`.

Because in a language with a simple type system, like Haskell, types and values exist in different domains, the same name (`Point3D`) can be given to the type constructor and its associated value constructor. But in a dependently typed language like ours where types can depend on values, the two coexist in the same domain and, as such, we must use different names for the type constructor (`Point3D`) and its associated value constructor (`point3d`).

By convention, we capitalise type constructors, but not the value constructors. Since value constructors are just ordinary functions, this is in keeping with the Haskell convention of not capitalising function names.

**_remove kinds_**—For historical reasons, Fortran supports several different number representations via a convoluted, and convulsive, mechanism it calls _kind selection_. In the `point3d` example above, `selected_real_kind(8)` requests a floating-point representation with at least 8 decimal digits of precision (on most platforms, the 8-byte double-precision format) and is used to define a kind selector named `k8` (whose type is `integer`), and `k8` in turn is used to define three double-precision floating-point variables named `x`, `y`, and `z`. This syntax is excessive, to say the least. The main purpose of this mess is to allow the programmer to choose the smallest number of bytes necessary for the desired precision, thereby conserving memory use. But allowing multiple representations for a number type could create incompatibilities across different hardware platforms. And these byte-level concerns are superfluous in modern programming practice, where memory conservation is no longer the utmost priority.
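A minimal sketch of kind selection in standard Fortran may help here. Note that the argument of `selected_real_kind` is a number of decimal digits, not a byte count. The module name `kinds_demo`, the constant `k8_digits`, and the helper `bytes_of_k8` are hypothetical, and the actual storage size is platform-dependent.

```fortran
module kinds_demo
  implicit none
  ! Request a real kind with at least 8 decimal digits of precision.
  integer, parameter :: k8 = selected_real_kind(8)
  ! Decimal precision actually provided by that kind (at least 8 by definition).
  integer, parameter :: k8_digits = precision(1.0_k8)
contains
  ! Storage size of a k8 real in bytes; storage_size reports bits.
  integer function bytes_of_k8()
    real(kind=k8) :: x
    bytes_of_k8 = storage_size(x) / 8
  end function bytes_of_k8
end module kinds_demo
```

On common IEEE 754 platforms, `k8_digits` is 15 and `bytes_of_k8()` returns 8, but the standard guarantees neither value.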

Moreover, the word “kind” is a term of art in modern type theory. It refers to the “type of a type”, as in Haskell. That is, just as values are classified by their types, types are classified by their kinds. Hence, using the term “kind” to refer to number representations should be abandoned, along with the quaint practice of allowing multiple, incompatible representations for one number type.

**_remove bit twiddling_**—System programming languages like C, C++, Odin, and Zig need to manipulate bits, but a scientific DSL like Fortran does not need bit twiddling facilities, such as its `iand`, `ior`, and `not` intrinsics (the counterparts of C’s `&`, `|`, and `~` operators). Bits are, after all, hardware-level concepts.
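For reference, standard Fortran exposes bit manipulation as intrinsic functions rather than as operators. The sketch below uses only the standard intrinsics `iand`, `ior`, `ishft`, and `btest`; the module and function names are hypothetical.

```fortran
module bits_demo
  implicit none
contains
  ! Keep only the lowest four bits of v (C: v & 15).
  pure integer function low_nibble(v)
    integer, intent(in) :: v
    low_nibble = iand(v, 15)
  end function low_nibble

  ! Set bit number pos, counting from 0 (C: v | (1 << pos)).
  pure integer function with_bit(v, pos)
    integer, intent(in) :: v, pos
    with_bit = ior(v, ishft(1, pos))
  end function with_bit

  ! Report whether bit number pos of v is set.
  pure logical function has_bit(v, pos)
    integer, intent(in) :: v, pos
    has_bit = btest(v, pos)
  end function has_bit
end module bits_demo
```

For example, `low_nibble(171)` is 11, `with_bit(8, 1)` is 10, and `has_bit(10, 3)` is true.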

**_remove objects_**—A scientific DSL like Fortran has no need for OO. The usual assortment of sum types, product types, functions, functionals, type classes, and modules is more than capable of modelling mathematical and scientific concepts. Objects are commonly used to model massive tangles of interacting business processes and for hiding their convoluted, mutual mutations of each other’s states. Objects are convenient (that is, expedient) for representing concepts and organising code that implements a large, sequential business application running on a uniprocessor. But objects’ very nature—their propensity to hide mutating interactions—makes them unsuited to implementing a large, parallel scientific application running on a multiprocessor.

The key feature introduced by the Fortran 2003 standard was objects. OO was the dominant paradigm in the early 2000s, so this extension to the standard was welcomed, at the time. Today, though, it has been well established that hidden state mutations of OO are detrimental to code readability and to execution parallelism. In my view, Fortran’s adoption of OO was a misstep. Fortran has never been known for its OO prowess, unlike Simula, Smalltalk, Objective-C, C++, Java, Python, and loads of other modern languages that are ab initio objective. Moreover, just tossing objects into the standard does not make the language OO; the syntax must make creating and manipulating objects convenient and effortless. Smalltalk and Python excel in that, but Fortran fell way short. Be that as it may, given Fortran’s mathematical propensities, FP is a better suitor for Fortran than OO ever could be.
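To make the discussion concrete, here is a minimal sketch of the object syntax that Fortran 2003 introduced: a derived type with a type-bound procedure that mutates the object's state. The names `counter_mod`, `counter`, `bump`, and `count_twice` are hypothetical.

```fortran
module counter_mod
  implicit none

  type :: counter
    integer :: n = 0          ! state held inside the object
  contains
    procedure :: bump         ! type-bound procedure (method)
  end type counter

contains

  ! Mutates the object's state in place.
  subroutine bump(self)
    class(counter), intent(inout) :: self
    self%n = self%n + 1
  end subroutine bump

  ! Creates a counter, bumps it twice, and returns the resulting count.
  integer function count_twice()
    type(counter) :: c
    call c%bump()
    call c%bump()
    count_twice = c%n
  end function count_twice
end module counter_mod
```

Here `count_twice()` returns 2. The call site `call c%bump()` does not itself reveal that `c` is modified, which is the kind of hidden mutation discussed above.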

**_trim variable declarations_**—The following is a way to declare and define a two-dimensional, floating-point array in modern Fortran.

```
real, dimension(:, :), allocatable :: y ! declare a matrix

allocate(y(2, 3)) ! allocate an uninitialised 2x3 matrix
```

The following is an equivalent array defined in the new syntax, using the standard library’s dependently typed `Matrix ℝ 2 3` (aliased as `[ℝ 2 3]`), which is parameterised with the element type $\mathbb{R}$ and is doubly indexed with size values of type $\mathbb{N}$. Note also that in the code snippet below, the elements of the matrix `y` are automatically initialised to $0.0$, which almost always is what programmers want.

```
y : [ℝ 2 3] ## define an ℝ type 2x3 matrix initialised to 0.0
```
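For comparison, standard Fortran (2003 and later) can allocate and initialise an array in one statement through sourced allocation. This is a minimal sketch; the module name `alloc_demo` and the function `zero_matrix` are hypothetical.

```fortran
module alloc_demo
  implicit none
contains
  ! Returns a rows-by-cols matrix with every element set to 0.0.
  function zero_matrix(rows, cols) result(y)
    integer, intent(in) :: rows, cols
    real, dimension(:, :), allocatable :: y
    allocate(y(rows, cols), source=0.0)
  end function zero_matrix
end module alloc_demo
```

A call such as `zero_matrix(2, 3)` then plays the role of `y : [ℝ 2 3]`, although the shape is a run-time property of the array rather than part of its type.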

**_trim function definitions_**—The following is an implementation of quicksort in modern Fortran. For simplicity, this implementation uses a naïve double recursion and accepts as argument an array of integers, only.

```
recursive function qsort(x) result(y)
  integer, dimension(:), intent(in) :: x
  integer, dimension(1:size(x)) :: y ! a function result takes no intent attribute
  if (size(x) > 1) then
    y = (/ qsort(pack(x(2:), x(2:) < x(1))), &
           x(1), &
           qsort(pack(x(2:), x(2:) >= x(1))) /)
  else
    y = x
  end if
end function qsort
```

As can be seen above, there is much crud in the syntax of this simple function. By contrast, the following equivalent implementation in the new syntax is succinct and, more importantly, legible and comprehensible.

```
qsort : (n : ℕ) ⇒ [ℤ n] → [ℤ n]
| [] → []
| x,xx → qsort [l | l ← xx, l < x] + [x] + qsort [g | g ← xx, g ≥ x]
```

As can be seen above, the quicksort algorithm immediately pops out at the reader in our new syntax, without him having to traipse over Fortran’s syntactic spikes. The notation `x,xx` above refers to the vector whose head is the element `x` and whose tail is the sub-vector `xx`. Because we use the `:` for type annotation and `::` for class instantiation (see below), we use the `,` for separating elements of a vector.

The type declaration `qsort : (n : ℕ) ⇒ [ℤ n] → [ℤ n]` states that the function `qsort` takes an integer vector of size `n` as argument and returns an integer vector of the same size. Here, `[ℤ n]` is the shorthand syntax for the dependent type `Vector ℤ n`. And the double-arrow syntax `(n : ℕ) ⇒` constrains the type of `n` to be $\mathbb{N}$. Hence, this vector type, which is defined in the standard library, is parameterised with the type $\mathbb{Z}$ and is indexed with the value of the variable `n` whose type is $\mathbb{N}$. This single-line, function type declaration replaces the following three lines of busy Fortran code.

```
recursive function qsort(x) result(y)
  integer, dimension(:), intent(in) :: x
  integer, dimension(1:size(x)) :: y
```

In this `qsort` implementation, the formal parameter `[]` in the first clause pattern matches an empty vector argument. Since a sorted version of an empty vector is just an empty vector, the first clause simply returns the value `[]`. The formal parameter `x,xx` in the second clause pattern matches a non-empty vector argument, with the variable `x` bound to the first element, and the variable `xx` bound to the rest of the elements. The expression `[l | l ← xx, l < x]` is the vector comprehension syntax that is an analogue of the $L = \{l \mid l \in X, l < x\}$ set comprehension notation in mathematics. This phrase dynamically constructs a vector that holds the filtered elements of `xx` that are less than the pivot element `x`. Likewise, the vector `[g | g ← xx, g ≥ x]`, or its analogue the set $G = \{g \mid g \in X, g \geq x\}$, holds the filtered elements of `xx` that are greater than or equal to `x`. Then, the sorted lesser vector, the singleton vector `[x]` that contains the pivot element `x`, and the sorted greater vector are concatenated into the resultant vector using the overloaded `+` operator, and this result is returned from the second clause. This version of `qsort` reads like a mathematical description of the algorithm. Proximity to mathematical discourse, not hardware bits, is the driving force behind FP languages. And a scientific DSL, like Fortran, should adopt this design.

In the `x,xx` syntax above, the operator `,` is aliased to the standard library vector constructor function `cons`. Applying this function to the element `x` and the vector `xx`, as in `cons x xx`, prepends `x` to the head of `xx`, as is conventionally done in FP languages since LISP. Because this construction occurs frequently in FP, we have the shorthand syntax `x,xx` for it. Note that `,` is overloaded as the vector literal constructor, as in `[1, 2, 3]`. The arithmetic operator `+` is overloaded, as well. It is aliased to the standard library vector concatenation function `cat`. Hence, `cat u v` is the same as `u + v`. We pronounce `,` as _cons_ (per LISP tradition) and `+` as _cat_ (per UNIX tradition), when these symbols appear in the context of vector operations.
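The correspondence between the two clauses of the new syntax and existing Fortran constructs can be sketched as follows. This is only an illustration under my own assumptions: the module name `sorting` and the use of `associate` to name the head and tail are not part of the proposal.

```fortran
module sorting
  implicit none
contains
  recursive function qsort(v) result(sorted)
    integer, dimension(:), intent(in) :: v
    integer, dimension(size(v)) :: sorted
    ! Clause "[] -> []": an empty vector is already sorted.
    if (size(v) == 0) return
    ! Clause "x,xx": x names the head (pivot), xx names the tail.
    associate (x => v(1), xx => v(2:))
      sorted = [ qsort(pack(xx, xx < x)), &   ! [l | l <- xx, l < x]
                 x, &                         ! [x]
                 qsort(pack(xx, xx >= x)) ]   ! [g | g <- xx, g >= x]
    end associate
  end function qsort
end module sorting
```

With this module in scope, `qsort([3, 1, 4, 1, 5, 9, 2, 6])` evaluates to `[1, 1, 2, 3, 4, 5, 6, 9]`. The intrinsic `pack` plays the role of the vector comprehension, and the array constructor `[ ... ]` plays the role of `+` (cat).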

_forget old memories_

**_remove manual memory management_**—Manual memory management has long been known to be the primary source of runtime errors. Manual memory management was essential in the days when large computers had only a few kilobytes of RAM and before the advent of modern, efficient garbage collection (GC) techniques. Note that LISP was the first to employ a GC. But it was not until the 1990s that GCs became fast enough to be used with mainstream industrial languages.

Today, though, most languages, including Haskell, Java, C#, Kotlin, Go, Swift, etc., have automatic memory management, by GC or by reference counting. The only languages that still rely on manual memory management—out of necessity—are system programming languages, like C, Odin, and Zig. Nim has automatic memory management, but it allows the programmer to take control, if desired. Rust’s ownership model, which is based on affine types (substructural type system), requires the programmer manually to provide hints in the code to enable the compiler automatically to ensure that objects are always possessed (referenced) by one owner. Hence, the compiler can automatically free up the objects that are no longer being used. This behaviour is commonly referred to as “reference borrowing”.

In property law parlance, borrowing is not an ownership interest, but merely a possessory interest. So, the label “ownership model” is technically incorrect. But I digress.

In some ways, Fortran’s present approach to memory management is similar to Nim’s. But I argue that it is time for Fortran to go a step further: abandon manual memory management, completely. First, Fortran is not a system programming language, so it ought not muck about with hardware-level concerns like memory management. Secondly, the memory access patterns of large business applications are driven by external events and hence are disordered, whereas the memory access patterns of large scientific applications are driven by internal logic and thus are ordered. Such simple patterns of memory use can readily be optimised by modern compilers and GCs. Thirdly, modern automatic memory management techniques have advanced to the point where the GC could outperform hand-t