Friday, October 16, 2015

Toolchains in a Key of C

In developing a lively, component-oriented view of software development -- allegorically, as beginning from a location of "the ground," towards a limit of "upwards" -- it may be logically reasonable to begin with a component, "The Toolchain." This is not to propose any singular, ad hoc definition of a topic so broad as toolchains, least of all in this single article. Theoretically, though, a definition of "The Toolchain" begins with a definition of "The operating system." In the present State of the Art, that would likely entail one of: Linux; any single BSD -- for instance FreeBSD, NetBSD, OpenBSD, or any BSD happily derived from any of those three "Main BSDs" -- or OS X, or Microsoft Windows.

Proceeding in a rough estimate along a timeline going backwards from the present, the State of the Art would previously have also included BeOS, NeXT, MS-DOS, IBM-DOS, CP/M, the Lisp Machines of yore, and any number of UNIXes whose development chronologically parallels the same timeline. The Industry has had its trends for a number of years, since well before Social Networking web logs ever became so popular a topic as today -- a topic for advertisers, for Social Networking networkers, and for the rest of the social networking service user community. If we assume that the present State of the Art is the only State of the Art that has ever existed, in all known time, we might likewise be assuming that life proceeds without a sense of historical context. Though that could be quite a trendy way to not view history, perhaps it may be understood that the present State of the Art has developed only out of the previous State of the Art, at any moment of time.

If we may leave aside so many stylistic brand names and endeavor to consider how the present State of the Art has developed, perhaps we can learn more of the present State of the Art, if not of any estimable "Future" State of the Art, by studying works of the previous State of the Art. If that does not tire us fully, it may begin to seem that not all of the State of the Art has developed along any single linear chronological trend. Thus, even in analyzing the architecture of an operating system that is an obvious element of the present State of the Art, there may be a whole lot of "Previous work" available to inform the present discussion -- even so far as the many discrete encodings of program codes onto punch cards, the applications of Teletype machines for other than radio telecommunications, and any trends marking the evolution of terrestrial semiconductor manufacturing methods.

The State of the Art is clearly a material domain, though not exclusively of any single material vocation -- not even singularly of the many works of marketing, of works of media ever apparently seeking to draw a social attention in one way or another across the present State of the Art, if not furthermore to direct the viewer's attention to any single commercial product. Perhaps, then, it cannot all be said to derive back to a material physics and a corresponding mathematics ever developed of any possibly more intuitive laboratory.

Likewise, it might not be said that all of the State of the Art derives back to knowledge, or knowledge back to language, or everything under the sun back to a simple concept of communications. Such naive theses, though presenting an immediate sense of perspective, may seem difficult to prove in any detail, logically and at scale. Perhaps not all of the universe is merely a mote in the eye of a grand, benevolent narcissist, but it would seem that much of the known universe derives, at least, from a sense of information.

So, if we are to begin at toolchains, it might be expedient to skip ahead past the estimable origin of the physical universe, to leap a little ways across the evolutions of mineral mining and tool production techniques, to take a long way around the events of empires, piracy, and war, and hop on up to the present day, in which all operating systems may appear to be constructed of C or a programming language deriving from C, in terms of syntax, semantics, and evaluation procedures. The subtle leaning of Java over to anything like a Lisp -- even so far as the lambda nomenclature of the Java programming language, edition 8 -- might be ignored simply as an aberrant trend, nothing whatsoever arcing around to another method of systems design, nothing in any way suggesting anyone had constructed any microprocessors either wrongly or in a way merely keeping up with the industry's state at any point in time. Surely, every microprocessor must have an Arithmetic and Logic Unit, and every OS must be constructed of C or a dialect of C ... except for those that are not.

So, then -- taking some liberty to try to construct a light-hearted point of view of this thesis -- we may begin with the present state of the art in C toolchains.

...and the author will return to this thesis, shortly, with a reference to the K&R book, section 4.11, and no further aside about a story by -- estimably -- a satirist writing by the name, Ayn Rand.

For the sake of expedience, this article will resume the discussion not at the development of the first C dialect, in 1971 [Raymond2003]; nor with an analysis of any market trends up to the point at which the GNU Compiler Collection (GCC) first addressed the GNU General Public License (GPL) to a Patents Industry; and, thirdly, leaving aside any analysis of the complex interleavings of the LLVM toolchain and non-BSD operating systems including OS X and Android. It will resume, lastly, at an immediate, albeit in ways ad hoc, overview of a generic model of a C toolchain. In this albeit naive model, the toolchain includes a C preprocessor, a C compiler, and a C linker. The linker processes certain intermediate compiled object files produced by the C compiler and produces a loadable binary object file, such as may later be evaluated by an operating system -- whether as a "runnable" software program having a single main() routine as its entry point, or as a library file for linking with other binary object files. This generic model may be difficult to describe in any detail, insofar as it must serve as a model of the components of any single toolchain, with the addition of any more specialized and toolchain-specific components, and with an aside to address compiler components such as may produce an intermediate or loadable object file from a source code language other than C.
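As a minimal sketch of that generic model, the following small translation unit gives each stage something to do. The file name stages.c, the macro, and the function are hypothetical, chosen only for illustration. The commands in the comment assume a conventional UNIX compiler driver installed as cc; GCC and Clang both accept these flags, though other toolchains may spell them differently.

```c
/* stages.c -- hypothetical file name, used only for this illustration.
 *
 * Assuming a conventional UNIX compiler driver named cc:
 *
 *   cc -E stages.c -o stages.i    preprocess only (macros expanded)
 *   cc -S stages.c -o stages.s    compile to assembly source
 *   cc -c stages.c -o stages.o    compile and assemble to an object file
 *   cc stages.o main.o -o stages  link with another object file that
 *                                 defines main() (main.o is hypothetical)
 */
#include <stddef.h>

/* Work for the preprocessor: expanded textually before compilation. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Work for the compiler: data and code emitted into the object file. */
static const int weights[] = { 1, 2, 3, 4 };

/* Work for the linker: an external symbol that another object file,
 * such as one defining main(), may reference. */
int sum_weights(void)
{
    int total = 0;
    for (size_t i = 0; i < ARRAY_LEN(weights); i++)
        total += weights[i];
    return total;
}
```

Because this file defines no main() of its own, it can only become a "runnable" program once linked with another object file that does, which is the distinction the model above draws between a program and a library.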

Of course, as well as those components of a C toolchain -- the preprocessor, the compiler, and the linker -- there is also the inevitable Makefile implementation, which provides instructions to an operating system for how to "Put the pieces together" in producing evaluable programs. A Makefile interpreter, in some regards, might be cast in a metaphor of a mechanical chef.
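To make the mechanical chef a little less metaphorical, here is a rough sketch in C of the one decision a Makefile interpreter makes most often: whether a target is out of date with respect to its prerequisites. The function name and the convention of a negative timestamp for a missing file are assumptions of this sketch, not the interface of any real make implementation, which would also handle phony targets, implicit rules, and more.

```c
#include <stddef.h>

/* Hypothetical helper, not part of any real make implementation.
 *
 * target_mtime   modification time of the target file; a negative
 *                value is this sketch's convention for "missing"
 * prereq_mtimes  modification times of the target's prerequisites
 * count          number of prerequisites
 *
 * Returns 1 when the recipe should run, 0 when the target is current. */
int needs_rebuild(long target_mtime, const long *prereq_mtimes, size_t count)
{
    if (target_mtime < 0)
        return 1;               /* no target yet: always build */

    for (size_t i = 0; i < count; i++) {
        if (prereq_mtimes[i] > target_mtime)
            return 1;           /* a prerequisite is newer than the target */
    }
    return 0;                   /* target is at least as new as its inputs */
}
```

In an actual tool the timestamps would presumably come from the file system, for instance through the POSIX stat() call; they are passed in as plain numbers here so that the rule itself stays visible.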

Aside from the C toolchain, of course there are software programs that may -- in ways -- resemble a Makefile interpreter, such as the Ant program, in a Java toolchain, or the inimitable ASDF, in a Common Lisp toolchain, as of the present state of the art in Common Lisp system definition utilities. The author's novel thesis, that all of these toolchains could be -- theoretically -- translated into a Common Lisp interpreter, might seem too novel to be obviously relevant to the State of the Art. For all of the UNIX architecture developed in C, furthermore, it might not be fortuitous either to abandon such architecture for a Lisp Machine, if without making a comprehensive study of the existing work.

Of course, not all of UNIX is implemented in C. In fact, the FreeBSD operating system uses a bit of Forth in its bootloader. Ever, there are these novel things that so impede a linear introduction of the State of the Art. Forth is a language as much allied to a concept of stack machines as is the Lisp implementation described in AI Memo 514, in which the authors propose to develop a microprocessor absent an ALU, as well as proposing an implementation of Lisp, insofar absent from the going trends of CISC microprocessor designs in industry at the time -- to the author's best understanding of such features of the State of the Art. Well so, but now we have C, the C preprocessor language, Makefiles, and Forth, as well as anything else that may be compiled to a binary loadable object file, insofar as source code languages go -- with a note in regards to intermediate object file formats and loadable object file formats.

The author has read that there are criticisms of Lisp syntax. The author fails to understand, How can this be? Is it too far unlike the linguistic sandwich bar of the modern toolchain? Could it be, perhaps, too far unlike a CISC language?

On top of -- or, in another way below to -- C, of course there is the syntax of any single assembler.  Below the assembler, in a similar arc, any individual Instruction Set Architecture.

Not as though to begin a Lisp Advocacy thesis forthright, but ironically there is something like an assembler defined in one Common Lisp implementation, CMU Common Lisp (CMUCL). The low-level compiler VOP framework of CMUCL was then inherited by Steel Bank Common Lisp (SBCL), SBCL being originally a fork of CMUCL. This may seem to parallel the evolution of a BSD operating system -- moreover, CMUCL's architecture may seem, in some certain ways, curiously BSD-like -- but it might not seem to contribute an obvious whole lot(TM) to the State of the Art, Immediately Today(TM), to make any too lengthy dissertation of such topics of systems evolution, and the author would well go out of depth to speculate on the similarity. No myth, no magic: perhaps an independent operating system can be developed out of Common Lisp, once more, but there is a great deal of existing work to observe, if not to study, in UNIX systems.

Perhaps the author has begun to mistake this English language for Makefile syntax, if not merely a disposable lexicon. Of course, BPMN might be far more succinct, visually -- if not more likewise difficult to reproduce if discarded -- to describe a thesis topic or a recipe.

And so, the author must take another aside, with a glib and/or drab nod to the works of the grand satirists in literature. This article has now breezed across the whole C toolchain, topically, and here it is not even August yet.

Ed. note: This article may be reviewed, at some later time, towards clarifications about compiler architectures, including: the nature of "Intermediate" compiler output (e.g. *.s assembly source files, as distinct from *.o object files), whether present in C compilers, C++ compilers, or otherwise; and the role of the assembler in the procedures of the compiler.

Before commencing to present the hot topic of the evening's article – as of a simple illustration of two ways to produce object files, each of a popular though by no means industry-dominant programming language, and as such, to produce object files as without an immediate application of a C compiler – the author should take care to define, initially, what the term, object file, may denote – as in how the term may be defined, at least in a context of the media object comprising this single article, if not also of how the term may be encountered of other literature.

In a metaphor to granola … non. This thesis shall presently disembark to a discussion of machine architectures, focusing primarily about microprocessor architectures, specifically Intel, MIPS, and ARM microprocessors. This representing an adventurous aspect of the evening's thesis, a food with a suitable proportion of complex carbohydrates may be recommended … if not a draught of the evening's coffee, along with.

This intermission brought to you in a format of lyrical music

 [Article will resume momentarily]

Ed. note: For some intents and purposes, the Executable and Linkable Format (ELF) may seem to be "Enough to know about", as with regards to object files produced by compiler toolchains on UNIX platforms -- at least, so far as up until a point of actually developing a compiler [TO DO: FINALIZE ARTICLE] (NOTES)
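As a small illustration of what "Enough to know about" might look like in practice, the following sketch inspects the first bytes of a buffer and reports which kind of ELF object it appears to hold. The function name and the enum are hypothetical. The byte offsets and values (the 0x7f 'E' 'L' 'F' magic number, the data-encoding byte at index 5, and the two-byte e_type field at offset 16) follow the ELF header layout as the editor understands it; a real tool on a platform that provides <elf.h> would more likely use the definitions there.

```c
#include <stddef.h>

/* Values of the ELF header's e_type field; the enum names are this
 * sketch's own, and ELF_KIND_NOT_ELF is not part of the format. */
enum elf_kind {
    ELF_KIND_NOT_ELF = -1,  /* not recognizable as an ELF header        */
    ELF_KIND_NONE    = 0,   /* no file type                              */
    ELF_KIND_REL     = 1,   /* relocatable object, e.g. compiler output  */
    ELF_KIND_EXEC    = 2,   /* executable file                           */
    ELF_KIND_DYN     = 3,   /* shared object                             */
    ELF_KIND_CORE    = 4    /* core dump                                 */
};

/* Hypothetical helper: returns the e_type of an ELF header held in buf,
 * or ELF_KIND_NOT_ELF when the buffer is too short, lacks the magic
 * number, or names an unknown byte order. */
int elf_file_type(const unsigned char *buf, size_t len)
{
    if (len < 18)
        return ELF_KIND_NOT_ELF;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return ELF_KIND_NOT_ELF;

    if (buf[5] == 1)                        /* little-endian encoding */
        return buf[16] | (buf[17] << 8);
    if (buf[5] == 2)                        /* big-endian encoding    */
        return (buf[16] << 8) | buf[17];
    return ELF_KIND_NOT_ELF;
}
```

Under these assumptions, an object file produced by a compiler but not yet linked would report as relocatable, while the linker's output would report as an executable or a shared object.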

 Ed. note: Though the Embeddable Common Lisp (ECL) Common Lisp implementation can be applied to produce object files, it is not without applying a C compiler as an intermediary component. Thus, the comment -- in the previous -- as if it was possible to generate an object file with ECL does not hold. Neither might it hold as if LuaJIT was not applying a C compiler, itself, in producing object files for the respective machine of its application. As stated in the previous article, the "Hot topic" of the evening might seem to be a "Dud," in such regards.

Ed. note: With regards to how ECL and LuaJIT may be applied with the LLVM toolchain, such a study may be addressed at a later time.

Ed. note: Follow up with documentation about ctags, etags, Exuberant Ctags, and llvm-clang ETags/CTags, as with regards to source code modeling and review. See also: Doxygen; UML; SysML; MARTE

Ed. note: The goal of this article was to develop a singular overview about compiler toolchains, as with regards to (1) how a compiler toolchain is applied as a component of an operating system; (2) how a compiler toolchain extends from any single microcontroller's supported instruction set architectures (e.g. amd64, SSE2, MMX; on GPU microcontrollers, lastly, CUDA). Beyond such a description of existing work in contemporary operating systems design, perhaps it may seem frivolous to endeavor to assert that a reproducibly usable operating system may be constructed for contemporary microcontrollers without an application of a C toolchain.