LLVM IR is a compiler IR

Time: 2023-09-27 14:26:37
In this email, I argue that LLVM IR is a poor system for building a
Platform, by which I mean any system where LLVM IR would be a
format in which programs are stored or transmitted for subsequent
use on multiple underlying architectures.

LLVM IR initially seems like it would work well here. I myself was
once attracted to this idea. I was even motivated to put a bunch of
my own personal time into making some of LLVM's optimization passes
more robust in the absence of TargetData a while ago, even with no
specific project in mind. There are several things still missing,
but one could easily imagine that this is just a matter of people
writing some more code.

However, there are several ways in which LLVM IR differs from actual
platforms, both high-level VMs like Java or .NET and actual low-level
ISAs like x86 or ARM.

First, the boundaries of what capabilities LLVM provides are nebulous.
LLVM IR contains:

 * Explicitly Target-specific features. These aren't secret;
   x86_fp80's reason for being is pretty clear.

 * Target-specific ABI code. In order to interoperate with native
   C ABIs, LLVM requires front-ends to emit target-specific IR.
   Pretty much everyone around here has run into this.

 * Implicitly Target-specific features. The most obvious examples of
   these are all the different Linkage kinds. These are all basically
   just gateways to features in real linkers, and real linkers vary
   quite a lot. LLVM has its own IR-level Linker, but it doesn't
   do all the stuff that native linkers do.

 * Target-specific limitations in seemingly portable features.
   How big can the alignment be on an alloca? Or a GlobalVariable?
   What's the widest supported integer type? LLVM's various backends
   all have different answers to questions like these.
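
A minimal sketch of the ABI point, in Python for brevity. It asks the
host C ABI a few of the questions a C front-end has to settle before
it can emit IR. The helper name host_abi_facts is made up for this
illustration; the ctypes calls are standard Python, not an LLVM API.

```python
import ctypes


def host_abi_facts():
    """Return a few sizes (in bytes) that the host C ABI fixes.

    A C front-end bakes answers like these into the IR it emits,
    so the resulting IR is already specific to one target.
    """
    return {
        "pointer": ctypes.sizeof(ctypes.c_void_p),
        "long": ctypes.sizeof(ctypes.c_long),
        "long double": ctypes.sizeof(ctypes.c_longdouble),
        "long double alignment": ctypes.alignment(ctypes.c_longdouble),
    }


if __name__ == "__main__":
    for name, size in host_abi_facts().items():
        print(f"{name}: {size}")
```

The output depends on where it runs. For example, long is 8 bytes on
x86-64 Linux and 4 bytes on 64-bit Windows, so IR generated from the
same C source differs between the two.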

Even ignoring the fact that the quality of the backends in the
LLVM source tree varies widely, the question of "What can LLVM IR do?"
has numerous backend-specific facets. This can be problematic for
producers as well as consumers.

Second, and more fundamentally, LLVM IR is a fundamentally
vague language. It has:

 * Undefined Behavior. LLVM is, at its heart, a C compiler, and
   Undefined Behavior is one of its cornerstones.

   High-level VMs typically raise predictable exceptions when they
   encounter program errors. Physical machines typically document
   their behavior very extensively. LLVM is fundamentally different
   from both: it presents a bunch of rules to follow and then offers
   no description of what happens if you break them.

   LLVM's optimizers are built on the assumption that the rules
   are never broken, so when rules do get broken, the code just
   goes off the rails and runs into whatever happens to be in
   the way. Sometimes it crashes loudly. Sometimes it silently
   corrupts data and keeps running.

   There are some tools that can help locate violations of the
   rules. Valgrind is a very useful tool. But they can't find
   everything. There are even some kinds of undefined behavior that
   I've never heard anyone even propose a method of detection for.

 * Intentional vagueness. There is a strong preference for defining
   LLVM IR semantics intuitively rather than formally. This is quite
   practical; formalizing a language is a lot of work, it reduces
   future flexibility, and it tends to draw attention to troublesome
   edge cases which could otherwise be largely ignored.

   I've done work to try to formalize parts of LLVM IR, and the
   results have been largely fruitless. I got bogged down in
   edge cases that no one is interested in fixing.

 * Floating-point arithmetic is not always consistent. Some backends
   don't fully implement IEEE-754 arithmetic rules even without
   -ffast-math and friends, to get better performance.
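
One way floating-point results can diverge is subnormal handling. The
sketch below is Python, and it assumes a hypothetical backend that
flushes subnormal results to zero; flush_to_zero and the two subtract
helpers are names invented for this illustration, not LLVM APIs. It
compares that model with IEEE-754 gradual underflow.

```python
import sys

SMALLEST_NORMAL = sys.float_info.min  # 2**-1022 for IEEE-754 doubles


def flush_to_zero(value):
    """Model a hypothetical backend that replaces subnormal results with zero."""
    if value != 0.0 and abs(value) < SMALLEST_NORMAL:
        return 0.0
    return value


def subtract_ieee(a, b):
    """Subtraction with gradual underflow, as Python floats provide."""
    return a - b


def subtract_flushing(a, b):
    """The same subtraction on the hypothetical flushing backend."""
    return flush_to_zero(a - b)


# Two distinct normal numbers whose difference is subnormal (2**-1023).
a = 1.5 * SMALLEST_NORMAL
b = 1.0 * SMALLEST_NORMAL

if __name__ == "__main__":
    print("a == b:", a == b)
    print("IEEE a - b == 0:", subtract_ieee(a, b) == 0.0)
    print("flushing a - b == 0:", subtract_flushing(a, b) == 0.0)
```

With gradual underflow, a != b guarantees a - b != 0. Under the
flushing model that guarantee is gone, so code that relies on it
behaves differently from one backend to the next.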

If you're familiar with "write once, debug everywhere" in Java,
consider the situation in LLVM IR, which is fundamentally opposed
to even trying to provide that level of consistency. And if you allow
the optimizer to do subtarget-specific optimizations, you increase
the chances that some bit of undefined behavior or vagueness will be
exposed.
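
How an optimizer's assumptions interact with a broken rule can be
sketched with a toy model. The code below is Python, not LLVM IR, and
the names (wrap_i32, check_unoptimized, check_optimized) are made up
for illustration. It models a 32-bit signed add whose overflow is
assumed never to happen, which lets the comparison x + 1 > x be
folded to a constant.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def wrap_i32(value):
    """Reduce an integer to 32-bit two's complement, as typical hardware does."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN


def check_unoptimized(x):
    """Evaluate `x + 1 > x` literally, with wrapping 32-bit arithmetic."""
    return wrap_i32(x + 1) > x


def check_optimized(x):
    """What an optimizer may emit if overflow is assumed impossible.

    Under that assumption `x + 1 > x` always holds, so the whole
    comparison folds to a constant and x is never inspected.
    """
    return True


if __name__ == "__main__":
    for x in (0, 41, INT32_MAX):
        print(x, check_unoptimized(x), check_optimized(x))
```

The two versions agree on every input that follows the rule and
disagree only on the one input that breaks it. Which answer you see
then depends on which optimizations happened to run.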

Third, LLVM is a low-level system that doesn't represent high-level
abstractions natively. It forces them to be chopped up into lots of
small low-level instructions.

 * It makes LLVM's Interpreter really slow. The amount of work
   performed by each instruction is relatively small, so the interpreter
   has to execute a relatively large number of instructions to do simple
   tasks, such as virtual method calls. Languages built for interpretation
   do more with fewer instructions, and have lower per-instruction
   overhead.

 * Similarly, it makes really-fast JITing hard. LLVM is fast compared
   to some other static C compilers, but it's not fast compared to
   real JIT compilers. Compiling one LLVM IR level instruction at a
   time can be relatively simple, ignoring the weird stuff, but this
   approach generates comically bad code. Fixing this requires
   recognizing patterns in groups of instructions, and then emitting
   code for the patterns. This works, but it's more involved.

 * Lowering high-level language features into low-level code locks
   in implementation details. This is less severe in native code,
   because a compiled blob is limited to a single hardware platform
   as well. But a platform which advertises architecture independence,
   yet still has all the ABI lock-in of HLL implementation details,
   presents a much more frightening backwards compatibility specter.

 * Apple has some LLVM IR transformations for Objective-C; however,
   the transformations have to reverse-engineer the high-level semantics
   out of the lowered code, which is awkward. Further, they're
   reasoning about high-level semantics in a way that isn't guaranteed
   to be safe by LLVM IR rules alone. It works for the kinds of code
   clang generates for Objective-C, but it wouldn't necessarily be
   correct if run on code produced by other front-ends. LLVM IR
   isn't capable of representing the necessary semantics for this
   unless we start embedding Objective-C into it.
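
The interpreter cost of a chopped-up virtual method call can be
sketched with two toy interpreters in Python. Everything here is
invented for illustration: the instruction names only loosely
resemble what a front-end would emit, and neither interpreter is
modeled on LLVM's. The point is the number of dispatch steps per call.

```python
class Obj:
    """A toy object with an explicit vtable, as lowered code would see it."""

    def __init__(self, vtable):
        self.vtable = vtable


def speak_dog(_obj):
    return "woof"


DOG_VTABLE = [speak_dog]  # slot 0 holds the `speak` method

# The same virtual call in a low-level and a high-level instruction set.
LOW_LEVEL = [
    ("load_vtable", "vt", "obj"),
    ("slot_addr", "addr", "vt", 0),
    ("load_slot", "fn", "addr"),
    ("call", "ret", "fn", "obj"),
]
HIGH_LEVEL = [("invoke_virtual", 0)]


def run_low_level(program, obj):
    """Interpret low-level instructions; return (result, dispatch_count)."""
    regs = {"obj": obj}
    dispatched = 0
    for op, dst, *args in program:
        dispatched += 1
        if op == "load_vtable":
            regs[dst] = regs[args[0]].vtable
        elif op == "slot_addr":
            regs[dst] = (regs[args[0]], args[1])  # (table, index) pair
        elif op == "load_slot":
            table, index = regs[args[0]]
            regs[dst] = table[index]
        elif op == "call":
            regs[dst] = regs[args[0]](regs[args[1]])
        else:
            raise ValueError(f"unknown op {op}")
    return regs.get("ret"), dispatched


def run_high_level(program, obj):
    """Interpret high-level instructions; return (result, dispatch_count)."""
    result = None
    dispatched = 0
    for op, slot in program:
        dispatched += 1
        if op == "invoke_virtual":
            result = obj.vtable[slot](obj)
        else:
            raise ValueError(f"unknown op {op}")
    return result, dispatched


if __name__ == "__main__":
    dog = Obj(DOG_VTABLE)
    print(run_low_level(LOW_LEVEL, dog))
    print(run_high_level(HIGH_LEVEL, dog))
```

Both programs return the same value, but the low-level one pays the
dispatch overhead four times where the high-level one pays it once.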

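The JIT point, that one-instruction-at-a-time translation is simple
but wasteful while pattern recognition is better but more involved,
can be sketched as follows. This is a toy in Python: the IR tuples,
the schematic assembly strings, and the function names are all
assumptions made for the example, and real instruction selectors
match far more general patterns.

```python
# IR instructions are (dest, opcode, operands) tuples.
IR = [
    ("t0", "load", ("p",)),
    ("t1", "add", ("x", "t0")),
]


def emit_one_at_a_time(ir):
    """Translate each IR instruction independently of its neighbors."""
    out = []
    for dest, op, args in ir:
        if op == "load":
            out.append(f"mov {dest}, [{args[0]}]")
        elif op == "add":
            out.append(f"mov {dest}, {args[0]}")
            out.append(f"add {dest}, {args[1]}")
        else:
            raise ValueError(f"unknown op {op}")
    return out


def use_count(ir, name):
    """Count how many times a value is used as an operand."""
    return sum(args.count(name) for _, _, args in ir)


def emit_with_patterns(ir):
    """Fuse a load into the add that immediately follows it.

    The fusion applies only when that add is the load's sole user and
    takes the loaded value as its second operand.
    """
    out = []
    i = 0
    while i < len(ir):
        dest, op, args = ir[i]
        nxt = ir[i + 1] if i + 1 < len(ir) else None
        if (
            op == "load"
            and nxt is not None
            and nxt[1] == "add"
            and nxt[2][1] == dest
            and use_count(ir, dest) == 1
        ):
            out.append(f"mov {nxt[0]}, {nxt[2][0]}")
            out.append(f"add {nxt[0]}, [{args[0]}]")  # memory operand
            i += 2
        else:
            out.extend(emit_one_at_a_time([ir[i]]))
            i += 1
    return out


if __name__ == "__main__":
    print(emit_one_at_a_time(IR))
    print(emit_with_patterns(IR))
```

Even for this single pattern, the matcher has to look ahead and count
uses before it can emit anything, which is the extra work the
one-at-a-time approach avoids.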

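The lock-in from lowering can be sketched with a field access that
has been reduced to a load from a fixed byte offset. The sketch is
Python using the standard struct module; the class layouts, field
names, and lowered_get_balance are hypothetical, chosen only to show
what happens when the layout changes after the code was lowered.

```python
import struct

# Version 1 of a hypothetical class: fields (id: int32, balance: int32).
LAYOUT_V1 = struct.Struct("<ii")
# Version 2 adds a `flags` field in the middle: (id, flags, balance).
LAYOUT_V2 = struct.Struct("<iii")

# Offset of `balance` in version 1, baked into already-lowered code.
BALANCE_OFFSET_V1 = 4


def lowered_get_balance(blob):
    """Field access after lowering: a 32-bit load from a fixed byte offset."""
    return struct.unpack_from("<i", blob, BALANCE_OFFSET_V1)[0]


old_object = LAYOUT_V1.pack(7, 100)
new_object = LAYOUT_V2.pack(7, 1, 100)

if __name__ == "__main__":
    print(lowered_get_balance(old_object))  # reads balance
    print(lowered_get_balance(new_object))  # reads flags instead
```

Against the new layout the lowered code silently returns the flags
field. Nothing in the lowered form records that it meant "balance".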
In conclusion, consider the task of writing an independent
implementation of an LLVM IR Platform. The set of capabilities it
provides depends on who you talk to. Semantic details are left to
chance. There are features which require a bunch of complicated
infrastructure to implement which are rarely used. And if you want
light-weight execution, you'll probably need to translate it into
something else better suited for it first. This all doesn't sound
very appealing.

LLVM isn't actually a virtual machine. It's widely acknowledged that the
name "LLVM" is a historical artifact which doesn't reliably connote what
LLVM actually grew to be. LLVM IR is a compiler IR.

Dan
