
Deep Dive into Clang (5): Clang Lexer Code Reading Notes: Lexer


Time: 2023-09-27 14:26:37


Author: Shi Ningning (史宁宁, snsn1984)

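The source code of Clang's Lexer (lexical analyzer) lives mainly in the following places:

clang/lib/Lex: the main Lexer code;

clang/include/clang/Lex: the Lexer header files.

The Lexer also uses the clangBasic library, so analyzing the Lexer code will also involve some of the code in clangBasic (clang/lib/Basic).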

Let's start with the Lexer itself.

clang/include/clang/Lex/Lexer.h

clang::Lexer:

  //===--------------------------------------------------------------------===//
  // Context-specific lexing flags set by the preprocessor.
  //

  /// ExtendedTokenMode - The lexer can optionally keep comments and whitespace
  /// and return them as tokens. This is used for -C and -CC modes, and
  /// whitespace preservation can be useful for some clients that want to lex
  /// the file in raw mode and get every character from the file.
  ///
  /// When this is set to 2 it returns comments and whitespace. When set to 1
  /// it returns comments, when it is set to 0 it returns normal tokens only.
  unsigned char ExtendedTokenMode;

  //===--------------------------------------------------------------------===//

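This member variable holds one piece of lexer state. Its value, 0, 1 or 2, means respectively: return only normal tokens; return comments and normal tokens; return whitespace, comments and normal tokens.
The following functions operate on this member variable. They essentially get, set and reset its value, and the code is straightforward: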

  /// isKeepWhitespaceMode - Return true if the lexer should return tokens for
  /// every character in the file, including whitespace and comments. This
  /// should only be used in raw mode, as the preprocessor is not prepared to
  /// deal with the excess tokens.
  bool isKeepWhitespaceMode() const {
    return ExtendedTokenMode > 1;
  }

  /// SetKeepWhitespaceMode - This method lets clients enable or disable
  /// whitespace retention mode.
  void SetKeepWhitespaceMode(bool Val) {
    assert((!Val || LexingRawMode || LangOpts.TraditionalCPP) &&
           "Can only retain whitespace in raw mode or -traditional-cpp");
    ExtendedTokenMode = Val ? 2 : 0;
  }

  /// inKeepCommentMode - Return true if the lexer should return comments as
  /// tokens.
  bool inKeepCommentMode() const {
    return ExtendedTokenMode > 0;
  }

  /// SetCommentRetentionMode - Change the comment retention mode of the lexer
  /// to the specified mode. This is really only useful when lexing in raw
  /// mode, because otherwise the lexer needs to manage this.
  void SetCommentRetentionState(bool Mode) {
    assert(!isKeepWhitespaceMode() &&
           "Can't play with comment retention state when retaining whitespace");
    ExtendedTokenMode = Mode ? 1 : 0;
  }

  /// Sets the extended token mode back to its initial value, according to the
  /// language options and preprocessor. This controls whether the lexer
  /// produces comment and whitespace tokens.
  ///
  /// This requires the lexer to have an associated preprocessor. A standalone
  /// lexer has nothing to reset to.
  void resetExtendedTokenMode();
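To make the link between the three values and these accessors concrete, here is a minimal standalone C++ model of them. It is a sketch, not Clang's code: the class name ToyLexerModes, its constructor and the extra mode() accessor are hypothetical, and LexingRawMode and LangOpts.TraditionalCPP are reduced to plain bools. The comparisons and assertions mirror the excerpt above.

```cpp
#include <cassert>

// Hypothetical stand-alone model of the ExtendedTokenMode accessors in
// clang::Lexer. Not Clang code: LangOpts.TraditionalCPP is reduced to a bool,
// and mode() is an extra accessor added only for inspection.
class ToyLexerModes {
public:
  ToyLexerModes(bool rawMode, bool traditionalCPP)
      : LexingRawMode(rawMode), TraditionalCPP(traditionalCPP) {}

  // Mode 2: comments and whitespace are returned as tokens.
  bool isKeepWhitespaceMode() const { return ExtendedTokenMode > 1; }

  // Mode 1 or 2: comments are returned as tokens.
  bool inKeepCommentMode() const { return ExtendedTokenMode > 0; }

  void SetKeepWhitespaceMode(bool Val) {
    // Whitespace retention is only allowed in raw mode or -traditional-cpp.
    assert((!Val || LexingRawMode || TraditionalCPP) &&
           "Can only retain whitespace in raw mode or -traditional-cpp");
    ExtendedTokenMode = Val ? 2 : 0;
  }

  void SetCommentRetentionState(bool Mode) {
    // Comment retention cannot be changed while whitespace is being kept.
    assert(!isKeepWhitespaceMode() &&
           "Can't play with comment retention state when retaining whitespace");
    ExtendedTokenMode = Mode ? 1 : 0;
  }

  unsigned mode() const { return ExtendedTokenMode; }

private:
  unsigned char ExtendedTokenMode = 0; // 0: normal tokens only
  bool LexingRawMode;
  bool TraditionalCPP;
};
```

For example, on an instance built with rawMode = true, SetCommentRetentionState(true) gives mode 1. Switching it back to false and then calling SetKeepWhitespaceMode(true) gives mode 2, where inKeepCommentMode() is also true because it only tests ExtendedTokenMode > 0. Calling SetKeepWhitespaceMode(true) on a non-raw instance with traditionalCPP = false would trip the assertion in a debug build.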

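About raw mode:
Raw mode does not by itself set ExtendedTokenMode to 2. Rather, whitespace-retention mode (ExtendedTokenMode = 2, in which the Lexer outputs every character, including whitespace, comments and normal tokens) may only be enabled in raw mode or under -traditional-cpp, as the assertion in SetKeepWhitespaceMode shows. The Lexer's parent class, clang::PreprocessorLexer, has a member variable that records whether lexing is in raw mode: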

  /// \brief True if in raw mode.
  ///
  /// Raw mode disables interpretation of tokens and is a far faster mode to
  /// lex in than non-raw-mode. This flag:
  ///  1. If EOF of the current lexer is found, the include stack isn't popped.
  ///  2. Identifier information is not looked up for identifier tokens. As an
  ///     effect of this, implicit macro expansion is naturally disabled.
  ///  3. "#" tokens at the start of a line are treated as normal tokens, not
  ///     implicitly transformed by the lexer.
  ///  4. All diagnostic messages are disabled.
  ///  5. No callbacks are made into the preprocessor.
  ///
  /// Note that in raw mode that the PP pointer may be null.
  bool LexingRawMode;

It indicates whether the Lexer is in raw mode, and its comment also explains what raw mode does.
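The comment on ExtendedTokenMode mentions clients that "want to lex the file in raw mode and get every character from the file". The toy C++ function below is a rough sketch of that idea, not Clang's Lexer: it recognizes only whitespace runs, `//` line comments and maximal runs of other characters, interprets nothing (so `#define` stays plain text, in the spirit of point 3 above), and filters the pieces by a mode value with the same 0/1/2 meaning as ExtendedTokenMode. The names toyRawLex, ToyToken, ToyKind and isSpace are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token kinds for a toy raw lexer (not clang::tok::TokenKind).
enum class ToyKind { Whitespace, Comment, Other };

struct ToyToken {
  ToyKind kind;
  std::string text;
};

static bool isSpace(char c) {
  return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Splits `src` into whitespace runs, `//` line comments and maximal runs of
// other characters, without interpreting anything ('#' and identifiers are
// plain text). Pieces are then filtered by `extendedTokenMode`, using the
// same meaning as ExtendedTokenMode: 0 = other text only, 1 = plus comments,
// 2 = plus whitespace.
std::vector<ToyToken> toyRawLex(const std::string &src, int extendedTokenMode) {
  assert(extendedTokenMode >= 0 && extendedTokenMode <= 2);
  std::vector<ToyToken> out;
  std::string::size_type i = 0;
  while (i < src.size()) {
    const std::string::size_type start = i;
    ToyKind kind;
    if (isSpace(src[i])) {
      while (i < src.size() && isSpace(src[i])) ++i;
      kind = ToyKind::Whitespace;
    } else if (src.compare(i, 2, "//") == 0) {
      while (i < src.size() && src[i] != '\n') ++i; // stop before the newline
      kind = ToyKind::Comment;
    } else {
      while (i < src.size() && !isSpace(src[i]) && src.compare(i, 2, "//") != 0)
        ++i;
      kind = ToyKind::Other;
    }
    const bool keep = kind == ToyKind::Other ||
                      (kind == ToyKind::Comment && extendedTokenMode >= 1) ||
                      (kind == ToyKind::Whitespace && extendedTokenMode >= 2);
    if (keep) out.push_back({kind, src.substr(start, i - start)});
  }
  return out;
}
```

With mode 2, concatenating the returned token texts reproduces the input exactly, which is the property whitespace-retention mode is meant to give raw-mode clients; with mode 0, only the non-comment, non-whitespace pieces remain.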

The definition of clang::Lexer shows that it is a subclass of clang::PreprocessorLexer, and the raw-mode discussion above already quoted code from clang::PreprocessorLexer. Let's now look at clang::PreprocessorLexer itself.

clang/include/clang/Lex/PreprocessorLexer.h

namespace clang {

class FileEntry;
class Preprocessor;

This shows that clang::PreprocessorLexer uses the two classes forward-declared above. They appear in the header at the following places:

class PreprocessorLexer {
  virtual void anchor();
protected:
  Preprocessor *PP;              // Preprocessor object controlling lexing.

and

  /// getFileEntry - Return the FileEntry corresponding to this FileID. Like
  /// getFileID(), this only works for lexers with attached preprocessors.
  const FileEntry *getFileEntry() const;

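The code shows that Preprocessor is used through a member variable (the pointer PP), while FileEntry is used as the return type of a member function (getFileEntry()). Let's follow the code to the implementations of these two classes. FileEntry is fairly simple and its contents are easy to understand; the Preprocessor class is much more complex and involves a lot of other material, so it is not analyzed here and will be covered in a later article.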
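PreprocessorLexer.h can get away with forward declarations of Preprocessor and FileEntry because it only names them through pointers: a pointer member and a pointer return type do not require the complete class definition. The self-contained sketch below illustrates this general C++ rule with hypothetical class names (Engine, Part, Holder); it is not taken from Clang.

```cpp
#include <cassert>

// Hypothetical classes illustrating the forward-declaration pattern used in
// PreprocessorLexer.h; the names are made up for this sketch.
class Engine; // forward declarations: incomplete types at this point
class Part;

// Like PreprocessorLexer, Holder refers to Engine and Part only through
// pointers, so the incomplete types are enough to declare it.
class Holder {
public:
  explicit Holder(Engine *e) : E(e) {}
  Engine *getEngine() const { return E; }
  const Part *getPart() const; // pointer return type: incomplete type is fine

private:
  Engine *E; // pointer member: incomplete type is fine
};

// The full definitions are needed only where members are accessed or objects
// are created (in real code, usually in a .cpp file).
class Part {
public:
  int id = 7;
};

class Engine {
public:
  Part part;
};

const Part *Holder::getPart() const {
  assert(E && "Holder needs an Engine");
  return &E->part; // Engine must be complete here to reach its member
}
```

Keeping only forward declarations in a header like this reduces the number of headers every includer has to parse, and the complete types are pulled in only by the source files that actually use their members.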
