1. Background

Recently I have been working on the compiler for KCL, the configuration language in KusionStack, and I needed to build the compiler's error handling module. Since KCL is written in Rust, I decided to study how the Rust compiler (rustc) handles errors.

2. Introduction

Judging from the directory structure of the rustc source code, error handling is concentrated mainly in three directories: rustc_errors, rustc_error_codes and rustc_error_messages. While reading the source, however, I found that the rustc codebase is large and the error handling module touches many other modules. Reading only these three directories is confusing and makes the analysis difficult, so I plan to split it into several articles. This article combines rustc's official documentation with its source code to lay out the overall structure.

The core idea of this article, therefore, is to lay out the structure of the error handling part. The goal is to trace, step by step, how an error found while rustc analyzes a Rust program is passed along and finally printed to the terminal as a diagnostic. Content that is complex and unrelated to printing diagnostics is skipped for now and left for later articles. Laying out the structure first will help us analyze the source more deeply and clearly later, instead of getting lost in rustc's huge codebase. To make the structure easier to see, the code fragments in this article have been simplified: lifetimes and other details unrelated to the execution logic have been removed.

3. What does the diagnostic information look like?

Before reading the source code, let's first look at the format of a Rust diagnostic, as shown below:

[Figure: an example rustc diagnostic]

According to the rustc documentation, a diagnostic like the one above can be divided into the following five parts:

  • Level (error, warning, etc.): describes the severity of the message.

  • Code: an error code such as E0308, which belongs to the "mismatched types" diagnostic. The code is an index that users can look up to find a more complete description of the error. Diagnostics created by lints do not have a code.
    Note: I later found that rustc officially calls the collection of these codes the Rust Compiler Error Index.

  • Message: describes the main problem. It should be generic and stand on its own, so that it is helpful even without any other context.

  • Diagnostic window: displays the source code context related to the problem.

  • Sub-diagnostics: any error can have many sub-diagnostics, and each looks similar to the main part of the diagnostic.
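To make the five parts concrete, here is a small sketch that models them as Rust types. It renders them in a layout that loosely resembles rustc's output. Level, Diagnostic, SubDiagnostic and render() are hypothetical illustrations, not rustc's real API, and the output format is only an approximation.

```rust
// Hypothetical types modelling the five parts of a diagnostic listed above.
// They are illustrations only, not rustc's real Diagnostic API.

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Level {
    Error,
    Warning,
    Note,
}

impl Level {
    fn as_str(self) -> &'static str {
        match self {
            Level::Error => "error",
            Level::Warning => "warning",
            Level::Note => "note",
        }
    }
}

/// Part 5: a sub-diagnostic looks like a smaller copy of the main diagnostic.
pub struct SubDiagnostic {
    pub level: Level,
    pub message: String,
}

pub struct Diagnostic {
    pub level: Level,                 // part 1: severity
    pub code: Option<&'static str>,   // part 2: e.g. Some("E0308"); None for lints
    pub message: String,              // part 3: the main, self-contained message
    pub location: String,             // part 4: diagnostic window (file:line:col)...
    pub snippet: String,              //         ...and the offending source line
    pub children: Vec<SubDiagnostic>, // part 5: sub-diagnostics
}

impl Diagnostic {
    /// Renders the diagnostic in a layout loosely resembling rustc's output.
    pub fn render(&self) -> String {
        let mut out = match self.code {
            Some(code) => format!("{}[{}]: {}\n", self.level.as_str(), code, self.message),
            None => format!("{}: {}\n", self.level.as_str(), self.message),
        };
        out.push_str(&format!("  --> {}\n", self.location));
        out.push_str(&format!("   | {}\n", self.snippet));
        for child in &self.children {
            out.push_str(&format!("   = {}: {}\n", child.level.as_str(), child.message));
        }
        out
    }
}
```

A lint-style warning is simply a Diagnostic whose code is None, so render() prints `warning: ...` with no bracketed number.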

4. Where does the diagnostic information come from?

Now that we know what rustc diagnostics look like, let's see how rustc constructs them. The official rustc documentation describes two ways:

  1. Implement the trait SessionDiagnostic provided by rustc_session by hand.
  2. Let the derive macro in rustc_macros implement that trait automatically, based on attributes placed on the error struct.

Stated this way, the two options are hard to grasp. The overall flow is shown in the following figure:

[Figure: error structs in rustc modules (yellow) implement trait SessionDiagnostic (green)]

In the figure, the yellow part represents the error structs defined in different rustc modules. (Note: enums work too; since this article is an overview, only structs are mentioned below.) The green part represents the trait SessionDiagnostic provided by rustc's error handling module, which the structs defined in the various modules implement. An implementation of SessionDiagnostic extracts the content needed for the diagnostic from the struct, packages it, and hands it back to rustc's error handling module for output.

This is the trait provided by the error handling module, as mentioned above. The source code of trait SessionDiagnostic is as follows:

// rustc/compiler/rustc_session/src/session.rs
pub trait SessionDiagnostic<T: EmissionGuarantee = ErrorGuaranteed> {
    fn into_diagnostic(self, sess: &ParseSess) -> DiagnosticBuilder<T>;
}

Take the error struct given in the rustc documentation as an example:

pub struct FieldAlreadyDeclared {
    pub field_name: Ident,
    pub span: Span,
    pub prev_span: Span,
}

According to the official rustc documentation, struct FieldAlreadyDeclared must implement trait SessionDiagnostic before its error message can be output. The error structs defined in rustc's source code currently all use the second approach.

The official rustc documentation also gives a concrete hand-written implementation of trait SessionDiagnostic:

impl SessionDiagnostic for FieldAlreadyDeclared {
    fn into_diagnostic(self, sess: &ParseSess) -> DiagnosticBuilder<ErrorGuaranteed> {
        let mut diag = sess.struct_err(...);
        diag.set_span(self.span);
        diag.span_label(...);
        ...
        diag
    }
}

The code above shows how to implement trait SessionDiagnostic for struct FieldAlreadyDeclared by hand. Don't worry if the details are unclear: this is only a demonstration, and those details are not the topic of this article. In short, the code extracts the content needed for the diagnostic from FieldAlreadyDeclared, packages it into a DiagnosticBuilder, and returns it.
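The snippet above elides its arguments, so it does not compile on its own. The following self-contained sketch fills in the same shape with simplified stand-ins. Span, Ident, ParseSess and DiagnosticBuilder here are hypothetical toy types, not rustc's. The messages are plain strings written to resemble rustc's E0124 output, not rustc's translatable messages. Treat it as an illustration of the pattern only.

```rust
// Toy stand-ins for rustc types; NOT the real rustc API.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub lo: usize,
    pub hi: usize,
}

#[derive(Debug, Clone)]
pub struct Ident(pub String);

/// Collects everything needed to print one diagnostic.
#[derive(Debug, Default)]
pub struct DiagnosticBuilder {
    pub message: String,
    pub code: Option<String>,
    pub primary_span: Option<Span>,
    pub labels: Vec<(Span, String)>,
}

impl DiagnosticBuilder {
    pub fn code(&mut self, code: &str) -> &mut Self {
        self.code = Some(code.to_string());
        self
    }
    pub fn set_span(&mut self, span: Span) -> &mut Self {
        self.primary_span = Some(span);
        self
    }
    pub fn span_label(&mut self, span: Span, label: &str) -> &mut Self {
        self.labels.push((span, label.to_string()));
        self
    }
}

pub struct ParseSess;

impl ParseSess {
    /// Starts an error diagnostic with the given main message.
    pub fn struct_err(&self, message: &str) -> DiagnosticBuilder {
        DiagnosticBuilder { message: message.to_string(), ..Default::default() }
    }
}

pub trait SessionDiagnostic {
    fn into_diagnostic(self, sess: &ParseSess) -> DiagnosticBuilder;
}

pub struct FieldAlreadyDeclared {
    pub field_name: Ident,
    pub span: Span,
    pub prev_span: Span,
}

impl SessionDiagnostic for FieldAlreadyDeclared {
    // "Moving bricks" by hand: copy each field into the builder.
    fn into_diagnostic(self, sess: &ParseSess) -> DiagnosticBuilder {
        let name = self.field_name.0;
        let mut diag = sess.struct_err(&format!("field `{}` is already declared", name));
        diag.code("E0124");
        diag.set_span(self.span);
        diag.span_label(self.span, "field already declared");
        diag.span_label(self.prev_span, &format!("`{}` first declared here", name));
        diag
    }
}
```

Every line of into_diagnostic is mechanical copying from a field to the builder. That repetition is exactly what the second approach automates.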

So how should we understand the second approach? In the code above, implementing SessionDiagnostic mostly means taking the content that should appear in the diagnostic out of FieldAlreadyDeclared and filling it into a DiagnosticBuilder. This is really just moving bricks from FieldAlreadyDeclared into DiagnosticBuilder, so the process can be automated. When we define a new error struct, we do not have to move the bricks ourselves. A program can move them for us, and we only need to mark, when defining the struct, which bricks should be moved.

rustc implements this brick-moving program as a derive macro. The macro provides a set of attributes: when defining a new error struct, we mark which bricks to move with these attributes, and the macro implements trait SessionDiagnostic for us. Here is the same struct FieldAlreadyDeclared written the second way:

#[derive(SessionDiagnostic)]
#[diag(typeck::field_already_declared, code = "E0124")]
pub struct FieldAlreadyDeclared {
    pub field_name: Ident,
    #[primary_span]
    #[label]
    pub span: Span,
    #[label(typeck::previous_decl_label)]
    pub prev_span: Span,
}

Here, #[derive(SessionDiagnostic)] invokes the derive macro implemented in rustc_macros. #[diag(typeck::field_already_declared, code = "E0124")] specifies the message text of the diagnostic and its error code, as described earlier. Finally, the attributes #[primary_span], #[label] and #[label(typeck::previous_decl_label)] mark the information about the context of the problematic code.
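rustc's #[derive(SessionDiagnostic)] is a procedural macro, which needs its own crate and cannot be shown in a single file. As a rough analogy only, the sketch below uses a declarative macro_rules! macro instead. The caller declares the struct, says which field is the primary span and which fields become labels, and the macro writes the SessionDiagnostic impl: it moves the bricks for us. The macro name session_diagnostic! and the simplified Diagnostic type are hypothetical, and the message does not interpolate the field name as rustc's messages can.

```rust
// A macro_rules! analogy for #[derive(SessionDiagnostic)]; the real derive
// is a procedural macro in rustc_macros and works differently inside.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub lo: usize,
    pub hi: usize,
}

#[derive(Debug, PartialEq)]
pub struct Diagnostic {
    pub message: String,
    pub code: &'static str,
    pub primary_span: Span,
    pub labels: Vec<(Span, String)>,
}

pub trait SessionDiagnostic {
    fn into_diagnostic(self) -> Diagnostic;
}

/// Declares an error struct and generates its SessionDiagnostic impl.
/// `primary` plays the role of #[primary_span]; `labels` the role of #[label].
macro_rules! session_diagnostic {
    (
        struct $name:ident { $($field:ident : $ty:ty),* $(,)? }
        message = $msg:expr, code = $code:expr,
        primary = $primary:ident,
        labels = [ $($lfield:ident => $label:expr),* $(,)? ]
    ) => {
        pub struct $name { $(pub $field: $ty),* }

        impl SessionDiagnostic for $name {
            fn into_diagnostic(self) -> Diagnostic {
                Diagnostic {
                    message: $msg.to_string(),
                    code: $code,
                    primary_span: self.$primary,
                    labels: vec![$((self.$lfield, $label.to_string())),*],
                }
            }
        }
    };
}

// Only the marking is written by hand; the impl is generated.
session_diagnostic! {
    struct FieldAlreadyDeclared { field_name: String, span: Span, prev_span: Span }
    message = "field is already declared", code = "E0124",
    primary = span,
    labels = [span => "field already declared", prev_span => "first declared here"]
}
```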

After annotating a struct, or implementing trait SessionDiagnostic for it by hand, what comes next? The rustc documentation says:

Now that we’ve defined our diagnostic, how do we use it? It’s quite straightforward, just create an instance of the struct and pass it to emit_err (or emit_warning).

tcx.sess.emit_err(FieldAlreadyDeclared {
    field_name: f.ident,
    span: f.span,
    prev_span,
});

I don't fully understand it yet, but it points to a key method, emit_err(), which outputs the error diagnostic to the terminal. So let's search the source for this method:

[Figure: global search results for emit_err()]

The method is defined as follows:

// This method is defined on struct Session.
impl Session {
    pub fn emit_err(
        &self,
        err: impl SessionDiagnostic
    ) -> ErrorGuaranteed {
        self.parse_sess.emit_err(err)
    }
}

Let's follow the method's call chain further:

// self.parse_sess.emit_err(err)
impl ParseSess {
    pub fn emit_err(
        &self,
        err: impl SessionDiagnostic
    ) -> ErrorGuaranteed {
        self.create_err(err).emit()
    }
}

// self.create_err(err)
impl ParseSess {
    pub fn create_err(
        &self,
        err: impl SessionDiagnostic,
    ) -> DiagnosticBuilder<ErrorGuaranteed> {
        err.into_diagnostic(self)
    }
}

// self.create_err(err).emit()
impl<G: EmissionGuarantee> DiagnosticBuilder<G> {
    pub fn emit(&mut self) -> G {
        ......
    }
}

Having read the code, I think I understand the flow. Let's refine the diagram of the error handling process:

[Figure: refined error handling flow, with DiagnosticBuilder (blue) and the Session/ParseSess call chain on the right]

As the figure shows, I added some details on the right side. The yellow part is mostly unchanged: other rustc modules define error structs. The green part gains some content that refines the main job of trait SessionDiagnostic, which is to generate the blue DiagnosticBuilder from the content provided by the yellow struct. The generated DiagnosticBuilder's built-in emit() method outputs the diagnostic to the terminal, and this emit() method is ultimately called from Session.

In rustc, it is Session that drives the generated DiagnosticBuilder to output the diagnostic; the calling process is shown on the right side of the figure above. Session contains a ParseSess, so emit_err() exists at two layers. ParseSess::emit_err() calls ParseSess::create_err(). That method accepts any type implementing trait SessionDiagnostic and calls its into_diagnostic() method to obtain a DiagnosticBuilder. emit_err() then calls the DiagnosticBuilder's built-in emit() method, which outputs the diagnostic to the terminal.
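The call chain just described can be reproduced in miniature. This sketch assumes heavily simplified types. Its Handler records messages instead of rendering them, and the error struct is named after rustc's MoveUnsized but has a made-up field and message. Each method plays the same role as its rustc counterpart:

1. Session::emit_err forwards to ParseSess::emit_err.
2. ParseSess::emit_err calls create_err.
3. create_err calls into_diagnostic.
4. DiagnosticBuilder::emit hands the message to the Handler.

```rust
// Toy model of rustc's emit_err call chain; every type is a simplified stand-in.
use std::cell::RefCell;

/// Stands in for rustc's Handler: records messages instead of rendering them.
pub struct Handler {
    pub emitted: RefCell<Vec<String>>,
}

impl Handler {
    fn emit_diagnostic(&self, message: &str) {
        eprintln!("error: {}", message); // a real Handler renders and writes to stderr
        self.emitted.borrow_mut().push(message.to_string());
    }
}

/// Token returned once an error has been emitted.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ErrorGuaranteed;

pub struct DiagnosticBuilder<'a> {
    handler: &'a Handler,
    message: String,
}

impl DiagnosticBuilder<'_> {
    /// Last step of the chain: hand the diagnostic to the Handler.
    pub fn emit(&mut self) -> ErrorGuaranteed {
        self.handler.emit_diagnostic(&self.message);
        ErrorGuaranteed
    }
}

pub trait SessionDiagnostic {
    fn into_diagnostic(self, sess: &ParseSess) -> DiagnosticBuilder<'_>;
}

pub struct ParseSess {
    pub span_diagnostic: Handler,
}

impl ParseSess {
    pub fn struct_err(&self, message: &str) -> DiagnosticBuilder<'_> {
        DiagnosticBuilder { handler: &self.span_diagnostic, message: message.to_string() }
    }
    pub fn create_err(&self, err: impl SessionDiagnostic) -> DiagnosticBuilder<'_> {
        err.into_diagnostic(self)
    }
    pub fn emit_err(&self, err: impl SessionDiagnostic) -> ErrorGuaranteed {
        self.create_err(err).emit()
    }
}

pub struct Session {
    pub parse_sess: ParseSess,
}

impl Session {
    pub fn emit_err(&self, err: impl SessionDiagnostic) -> ErrorGuaranteed {
        self.parse_sess.emit_err(err)
    }
}

/// Error struct named after rustc's MoveUnsized; field and message are simplified.
pub struct MoveUnsized {
    pub ty: String,
}

impl SessionDiagnostic for MoveUnsized {
    fn into_diagnostic(self, sess: &ParseSess) -> DiagnosticBuilder<'_> {
        sess.struct_err(&format!("cannot move a value of type `{}`", self.ty))
    }
}
```

The DiagnosticBuilder borrows the Handler that lives inside ParseSess. This is why the builder's lifetime is tied to the session it was created from.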

This raises another question. rustc outputs the DiagnosticBuilder's diagnostic through Session, so what is Session? How is it connected with other rustc modules, or in other words, how does the rest of rustc call it?

Session itself is not the focus of this article; to avoid getting lost, let's leave it for a later article. Next, let's see how Session is called to handle errors. Searching the source for sess.emit_err() shows how rustc outputs diagnostics through Session.

As you can see, many places in rustc output error messages through Session.

[Figure: search results for sess.emit_err()]

Looking through the results, I picked a few typical places. The first is rustc's parser, rustc_parse: if an error is encountered during parsing, the diagnostic is output through sess.emit_err().

[Figure: a sess.emit_err() call in rustc_parse]

Next come the borrow checker rustc_borrowck and the type checker rustc_typeck, which also output diagnostics through sess.emit_err() when they detect an error. Unlike rustc_parse, the TypeChecker examined below does not store a Session as a struct member; it obtains one through the context accessor tcx().

[Figures: sess.emit_err() calls in rustc_borrowck and rustc_typeck]

We won't discuss tcx() or the structure of the context yet. For now it is enough to know that TypeChecker also outputs diagnostics through Session. Next, let's see how these modules output error messages with the help of Session.

First, the session in rustc_parse:

pub struct Parser {
    pub sess: &ParseSess,
    ......
}

// While parsing Rust code, Parser calls emit_err to output diagnostics.
self.sess.emit_err(...)

The name misled me a little: Parser holds a ParseSess rather than a Session. Using the structure of the previous figure, we can draw a separate figure for Parser's error handling.

[Figure: error handling in Parser]

The internal details already appear in the previous figure and are omitted here; this figure shows only the relationship between trait SessionDiagnostic and Parser. (Note: Parse() in the figure is a name I made up for the process of parsing a Rust program in rustc, and no such method necessarily exists in the rustc source. Which method is actually used is not the point of this article. Every compiler has a parsing step, though different compilers may name it differently.)

As the figure shows, if an error occurs while parsing a Rust program, Parser instantiates an error struct that implements trait SessionDiagnostic. It passes that struct to the emit_err() method of its ParseSess, which outputs the diagnostic.
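The Parser pattern can be sketched like this. The parser holds a reference to a ParseSess. Whenever it meets a problem, it builds an error struct and hands it to sess.emit_err() instead of printing anything itself. The toy grammar (a run of digits), the error struct ExpectedDigit and this ParseSess are all hypothetical. A real ParseSess forwards to a Handler rather than storing diagnostics in a Vec.

```rust
// Toy parser that holds a &ParseSess, like rustc's `pub sess: &ParseSess`.
// ParseSess, ExpectedDigit and the digit grammar are hypothetical simplifications.
use std::cell::RefCell;

pub struct Diagnostic {
    pub message: String,
    pub span: (usize, usize), // byte range in the source
}

pub trait SessionDiagnostic {
    fn into_diagnostic(self) -> Diagnostic;
}

/// Stands in for ParseSess; a real one forwards to a Handler.
#[derive(Default)]
pub struct ParseSess {
    pub emitted: RefCell<Vec<Diagnostic>>,
}

impl ParseSess {
    pub fn emit_err(&self, err: impl SessionDiagnostic) {
        self.emitted.borrow_mut().push(err.into_diagnostic());
    }
}

/// Error struct owned by the "parser module".
pub struct ExpectedDigit {
    pub found: char,
    pub pos: usize,
}

impl SessionDiagnostic for ExpectedDigit {
    fn into_diagnostic(self) -> Diagnostic {
        Diagnostic {
            message: format!("expected digit, found `{}`", self.found),
            span: (self.pos, self.pos + self.found.len_utf8()),
        }
    }
}

pub struct Parser<'a> {
    pub sess: &'a ParseSess,
    src: &'a str,
}

impl<'a> Parser<'a> {
    pub fn new(sess: &'a ParseSess, src: &'a str) -> Self {
        Parser { sess, src }
    }

    /// Parses a run of ASCII digits. Each bad character is reported through
    /// the session and parsing continues, so one pass can report several errors.
    pub fn parse_number(&self) -> Option<u64> {
        let mut value: u64 = 0;
        let mut ok = true;
        for (pos, c) in self.src.char_indices() {
            match c.to_digit(10) {
                Some(d) => value = value.saturating_mul(10).saturating_add(u64::from(d)),
                None => {
                    ok = false;
                    self.sess.emit_err(ExpectedDigit { found: c, pos });
                }
            }
        }
        if ok { Some(value) } else { None }
    }
}
```

Because emit_err() only needs &self, the sketch uses a RefCell so that a shared ParseSess can record errors while parsing continues. The Handler shown later plays a similar interior-mutability role with its Lock<HandlerInner>.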

Next, rustc_borrowck and rustc_typeck. Judging from how they call it, they do not hold a Session directly. Instead they hold a context struct, which in turn contains the Session.

self.tcx().sess.emit_err(MoveUnsized { ty, span });

Following self, we see that it is the type checker TypeChecker. Starting from its context struct, a depth-first search through the fields eventually reaches a Session or ParseSess. To keep things readable, I skip the search itself and show only the result:

struct TypeChecker {
    infcx: &InferCtxt,
    ......
}

pub struct InferCtxt {
    pub tcx: TyCtxt,
    ......
}

pub struct TyCtxt {
    gcx: &GlobalCtxt,
}

pub struct GlobalCtxt {
    pub sess: &Session, // Session is here
    ......
}

It is buried deep, but we found it. Since we are focusing on error handling, don't worry for now about what these context structs (XXXCtxt) mean.

[Figure: error handling in TypeChecker]

As with Parser, ty_check() in the figure above is a name I made up, here for the process in which TypeChecker type-checks a Rust program. Since the focus is error handling, the context structs InferCtxt, TyCtxt and GlobalCtxt are abbreviated as XXXCtx. The process is the same as for Parser. If an error occurs during type checking, a struct implementing trait SessionDiagnostic is instantiated. It is passed to the emit_err() method of the Session reached through TypeChecker's nested contexts, which outputs the diagnostic.
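The struct listing above hides one detail: how self.tcx().sess compiles when TyCtxt only has a gcx field. As far as I know, rustc's TyCtxt implements Deref to GlobalCtxt, so field accesses on tcx fall through to the global context. The sketch below mimics that. It uses simplified, hypothetical versions of the context structs, a made-up MoveUnsized message, and a check_move() method invented for illustration.

```rust
// Simplified, hypothetical versions of the context chain
// TypeChecker -> InferCtxt -> TyCtxt -> GlobalCtxt -> Session.
use std::cell::RefCell;
use std::ops::Deref;

pub trait SessionDiagnostic {
    fn message(self) -> String; // simplified: real impls build a DiagnosticBuilder
}

pub struct Session {
    pub emitted: RefCell<Vec<String>>,
}

impl Session {
    pub fn emit_err(&self, err: impl SessionDiagnostic) {
        self.emitted.borrow_mut().push(err.message());
    }
}

pub struct GlobalCtxt<'a> {
    pub sess: &'a Session,
}

/// Like rustc's TyCtxt: a cheap Copy handle that only stores `gcx`.
#[derive(Clone, Copy)]
pub struct TyCtxt<'a> {
    gcx: &'a GlobalCtxt<'a>,
}

// Deref lets `tcx.sess` reach `gcx.sess`.
impl<'a> Deref for TyCtxt<'a> {
    type Target = GlobalCtxt<'a>;
    fn deref(&self) -> &GlobalCtxt<'a> {
        self.gcx
    }
}

pub struct InferCtxt<'a> {
    pub tcx: TyCtxt<'a>,
}

pub struct TypeChecker<'a> {
    infcx: &'a InferCtxt<'a>,
}

/// Error struct named after rustc's MoveUnsized; the message is simplified.
pub struct MoveUnsized {
    pub ty: String,
}

impl SessionDiagnostic for MoveUnsized {
    fn message(self) -> String {
        format!("cannot move a value of type `{}`", self.ty)
    }
}

impl<'a> TypeChecker<'a> {
    fn tcx(&self) -> TyCtxt<'a> {
        self.infcx.tcx
    }

    /// Hypothetical check: report an error when moving a value of unknown size.
    pub fn check_move(&self, ty: &str, is_sized: bool) {
        if !is_sized {
            self.tcx().sess.emit_err(MoveUnsized { ty: ty.to_string() });
        }
    }
}
```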

At this point the weight falls on Session and ParseSess. Since everyone throws their errors at them, let's see what is inside:

pub struct Session {
    pub parse_sess: ParseSess,
    ......
}

pub struct ParseSess {
    pub span_diagnostic: Handler,
    ......
}

This alone doesn't explain much, so let's revisit the earlier code:

// self.parse_sess.emit_err(err)
impl ParseSess {
    pub fn emit_err(
        &self,
        err: impl SessionDiagnostic
    ) -> ErrorGuaranteed {
        self.create_err(err).emit()
    }
}

// This method is self.create_err(err)
impl ParseSess {
    pub fn create_err(
        &self,
        err: impl SessionDiagnostic,
    ) -> DiagnosticBuilder<ErrorGuaranteed> {
        err.into_diagnostic(self)
    }
}

// This method is self.create_err(err).emit()
impl<G: EmissionGuarantee> DiagnosticBuilder<G> {
    pub fn emit(&mut self) -> G {
        ...... // Looks like it's time to expand the code elided here...
    }
}

Expanding the elided body of emit() shows that it delegates to an abstract trait interface:

impl<G: EmissionGuarantee> DiagnosticBuilder<G> {
    pub fn emit(&mut self) -> G {
        // the previously elided code
        G::diagnostic_builder_emit_producing_guarantee(self)
    }
}

// The elided code calls this trait's abstract interface.
pub trait EmissionGuarantee: Sized {
    fn diagnostic_builder_emit_producing_guarantee(
        db: &mut DiagnosticBuilder<Self>
    ) -> Self;
    ...
}

To avoid getting lost, let's not dig into what EmissionGuarantee is for. We only care that it provides the function that outputs the diagnostic to the terminal. Searching the source for EmissionGuarantee, we find one of its implementations and can see how it outputs the information:

impl EmissionGuarantee for ErrorGuaranteed {
    fn diagnostic_builder_emit_producing_guarantee(
        db: &mut DiagnosticBuilder<Self>
    ) -> Self {
        match db.inner.state {
            DiagnosticBuilderState::Emittable(handler) => {
                ...
                let guar = handler.emit_diagnostic(&mut db.inner.diagnostic);
                ...
            }
            DiagnosticBuilderState::AlreadyEmittedOrDuringCancellation => {
                ......
            }
        }
    }
}
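The point of the EmissionGuarantee indirection appears to be that the builder's type parameter chooses the return type of emit(). Here is a minimal sketch of that idea. It assumes simplified types: a Handler that only counts emissions, and an ErrorGuaranteed with a private field. Emitting a DiagnosticBuilder<ErrorGuaranteed> yields a token proving that an error was reported. A builder parameterized by () (used here for warnings) yields nothing. This mirrors the shape of rustc's design but is not its implementation.

```rust
// Sketch of the EmissionGuarantee idea: emit() returns whatever type G the
// builder is parameterised with. All types are simplified stand-ins.
use std::cell::Cell;
use std::marker::PhantomData;

pub struct Handler {
    pub err_count: Cell<usize>,
    pub warn_count: Cell<usize>,
}

/// Proof that an error was emitted; the private field keeps other
/// modules from constructing it directly.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ErrorGuaranteed(());

pub struct DiagnosticBuilder<'a, G: EmissionGuarantee> {
    handler: &'a Handler,
    pub message: String,
    _marker: PhantomData<G>,
}

pub trait EmissionGuarantee: Sized {
    fn diagnostic_builder_emit_producing_guarantee(db: &mut DiagnosticBuilder<'_, Self>) -> Self;
}

impl<'a, G: EmissionGuarantee> DiagnosticBuilder<'a, G> {
    pub fn new(handler: &'a Handler, message: &str) -> Self {
        DiagnosticBuilder { handler, message: message.to_string(), _marker: PhantomData }
    }

    /// The type parameter G decides what emitting produces.
    pub fn emit(&mut self) -> G {
        G::diagnostic_builder_emit_producing_guarantee(self)
    }
}

// Errors: count the error and hand back a proof token.
impl EmissionGuarantee for ErrorGuaranteed {
    fn diagnostic_builder_emit_producing_guarantee(db: &mut DiagnosticBuilder<'_, Self>) -> Self {
        db.handler.err_count.set(db.handler.err_count.get() + 1);
        ErrorGuaranteed(())
    }
}

// Warnings (in this sketch): count the warning and produce nothing.
impl EmissionGuarantee for () {
    fn diagnostic_builder_emit_producing_guarantee(db: &mut DiagnosticBuilder<'_, Self>) -> Self {
        db.handler.warn_count.set(db.handler.warn_count.get() + 1);
    }
}
```

Code that returns an ErrorGuaranteed can therefore only do so after an error has actually gone through emit(). The type system records that an error was reported.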

This code shifts the focus to DiagnosticBuilder. Having come this far, we have to look inside it:

// match db.inner.state

pub struct DiagnosticBuilder<G: EmissionGuarantee> {
    inner: DiagnosticBuilderInner,
    ...
}

struct DiagnosticBuilderInner {
    state: DiagnosticBuilderState,
    diagnostic: Box<Diagnostic>,
}

// match db.inner.state
enum DiagnosticBuilderState {
    Emittable(& Handler),
    AlreadyEmittedOrDuringCancellation,
}

As you can see, the diagnostic is ultimately output through the Handler held in DiagnosticBuilderState.

/// A handler deals with errors and other compiler output.
/// Certain errors (fatal, bug, unimpl) may cause immediate exit,
/// others log errors for later reporting.
pub struct Handler {
    flags: HandlerFlags,
    inner: Lock<HandlerInner>,
}

Judging from Handler's doc comment, this is where the trail ends: every error diagnostic is ultimately output to the terminal through a Handler. Now we can refine the figure above:

[Figure: the flow with DiagnosticBuilder's internals (Handler and Diagnostic)]

As shown, the figure now includes some detail inside DiagnosticBuilder, with EmissionGuarantee ignored for now. DiagnosticBuilder contains a Handler, which outputs the diagnostic, and a Diagnostic, which stores its content. Session and ParseSess first call SessionDiagnostic's into_diagnostic() method to obtain a DiagnosticBuilder, then call its emit() method. Inside emit(), the DiagnosticBuilder's Handler outputs its Diagnostic to the terminal.
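The DiagnosticBuilderState enum shown earlier seems to be what keeps a diagnostic from being output twice. The following sketch keeps the Emittable / AlreadyEmittedOrDuringCancellation states but uses simplified, hypothetical Handler and Diagnostic types. The first emit() hands the boxed Diagnostic to the Handler and flips the state, so a second call does nothing. It illustrates the idea only; rustc's real emit() also goes through EmissionGuarantee and does other bookkeeping.

```rust
// Sketch of DiagnosticBuilderState: a diagnostic is emitted through its Handler
// at most once. Handler and Diagnostic are simplified, hypothetical stand-ins.
use std::cell::RefCell;

pub struct Diagnostic {
    pub message: String,
}

pub struct Handler {
    pub output: RefCell<Vec<String>>, // stands in for the terminal
}

impl Handler {
    fn emit_diagnostic(&self, diag: &Diagnostic) {
        self.output.borrow_mut().push(format!("error: {}", diag.message));
    }
}

enum DiagnosticBuilderState<'a> {
    Emittable(&'a Handler),
    AlreadyEmittedOrDuringCancellation,
}

struct DiagnosticBuilderInner<'a> {
    state: DiagnosticBuilderState<'a>,
    diagnostic: Box<Diagnostic>,
}

pub struct DiagnosticBuilder<'a> {
    inner: DiagnosticBuilderInner<'a>,
}

impl<'a> DiagnosticBuilder<'a> {
    pub fn new(handler: &'a Handler, message: &str) -> Self {
        DiagnosticBuilder {
            inner: DiagnosticBuilderInner {
                state: DiagnosticBuilderState::Emittable(handler),
                diagnostic: Box::new(Diagnostic { message: message.to_string() }),
            },
        }
    }

    /// First call: pass the Diagnostic to the Handler and mark it emitted.
    /// Later calls: do nothing.
    pub fn emit(&mut self) {
        match self.inner.state {
            DiagnosticBuilderState::Emittable(handler) => {
                self.inner.state = DiagnosticBuilderState::AlreadyEmittedOrDuringCancellation;
                handler.emit_diagnostic(&self.inner.diagnostic);
            }
            DiagnosticBuilderState::AlreadyEmittedOrDuringCancellation => {}
        }
    }
}
```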

Summary

This article covers only a small part of rustc's error handling module. Still, it gives a general picture of how an error travels from the point where it occurs to a diagnostic printed on the terminal. To finish, here is one figure that uses rustc_parse and the type checker discussed above as examples.

[Figure: overall error flow for rustc_parse and the type checker]

The rustc error handling module has three parts:

  • Each part of the compiler defines its own error structs, which hold the error information.
  • SessionDiagnostic converts each part's custom error struct into a DiagnosticBuilder.
  • Session/ParseSess calls the interface provided by SessionDiagnostic to obtain a DiagnosticBuilder, then calls the DiagnosticBuilder's built-in method to output the diagnostic.

If this is still a little confusing, here is the same figure with one more stroke. The red arrows show the main path that a Rust error's information takes from where the error occurs to the developer:

[Figure: the previous figure with red arrows showing the path of error information]

The right part of the figure shows that the error message does not go straight from DiagnosticBuilder to the developer but passes through Session first. Why? Let's leave that question open for now. Later articles will go deeper into the rustc source, analyze the structure of each part in detail, and examine why the rustc developers added each part.

Open questions from this article

  • What exactly are Session and ParseSess?
  • Why does the search for emit_err() not turn up the Lexer (lexical analysis) or CodeGen (code generation), and how do these two parts handle errors?
  • What role does EmissionGuarantee play in error handling?
