Racer + Rustc update2

(This is the second in a possible series of posts about racer + rustc, the first being here)

Over the last few weeks I've been spending quite a bit of my evening + weekend spare time in rustc trying out ideas and writing code to figure out what's possible and reasonable for real time type inference.

The goal is to get racer to perform perfect rustc type-inferred completion queries in the 100ms ballpark. To give an idea what racer is up against: using rustc to do a type check pass on a sizeable library (I've using the 'rustc_typeck' crate as an example) takes close to 10 seconds, with 8 seconds of that being the actual type checking.

Here's a representative breakdown of a prototype running against rustc_typeck (times are in seconds):


0.182s         parse crate text from files into one big AST
0.711s         expand macros, insert std::prelude
0.123s         read external crates (libraries)
0.164s         resolve crate (i.e. resolve all the static paths to point to actual nodes)
0.046s         create a type context object (mk_ctxt())
0.036s         collect the types of all the items (fns, traits etc..)
0.010s         infer variance
0.018s         coherence check
8.077s         perform a full type inference pass

Originally I was hoping I could find a way to cache a type-checked ast + side tables and then just incrementally fill in functions and re-typecheck only them. Unfortunately this approach conflicts heavily with the current rustc design, which freezes the ast shortly after the macro expansion stage (the 2nd phase above). I suspect changing the rustc design to accomodate modifying the ast after this stage would be a large and invasive change.

So if racer can't cache data after the expand stage, the other potentual avenue is to try and speed up the rest of the phases by typechecking less code. 'rustc' still needs a fully cohesive crate to compile, but there are some transformations that can be made to the code that will still leave a cohesive crate.

Rust has the nice charactistic that a function signature is enough to type-check a caller without the analysing the body, so the first obvious transformation is to remove function bodies except for the one you want to type check.

Here's some timings running a type-check over the same crate, but this time with all the function bodies removed:


0.031s         parse
0.127s         expand
0.085s         read crates
0.032s         resolve crate
0.002s         mk_ctxt
0.016s         collect_item_types
0.001s         infer variance
0.013s         coherence check
0.644s         type check

This is closer, although still way-off the 100ms goal. A sub-second typecheck is a pretty good start though. There are other transformations racer can try - e.g. pruning private declarations that can't be reached from the target code.

So if we take this approach, racer's ultimate performance will depend on its ability to prune the ast. To get an idea of what the performance could be like if racer became really good at this I pruned an ast manually. This is rust_typeck with the body of one function intact (I used 'typeck::check::check_bounds_are_used'), and only the direct dependencies of this function. I cheated a bit and used rustc to tell me what the dependencies should be; The timings assume the initial ast was parsed, expanded and cached, and doesn't include any time racer would take to prune the tree.


0.045s         re-expand the ast based on 
0.000s         ast map
0.073s         read crates
0.008s         resolve crate
0.000s         mk_ctxt
0.000s         collect_item_types
0.000s         infer variance
0.003s         coherence check
0.149s         type check

So there we are. Still almost a couple of hundred millis off the goal, but definitely in right ballpark. I haven't dug much into the actual rustc typechecking code proper yet, so I'm hoping I can get some further (maybe radical) gains there.

I think the next stage for me is to build a prototype that can do interactive type checking of a crate. Something like 'highlight an expression and it tells you the type'. I hope that by building that I'll learn more about rustc and maybe discover some further optimizations or maybe a better approach.

Racer on Rust Beta!

I'm very pleased to announce that Racer now compiles with the Rust 1.0 beta releases! This means that hopefully Racer should be ready to go for Rust 1.0 when it launches later this month.

The big blocker for this release was rustc's libsyntax. I'd like to say a big thanks to Erick Tryzelaar for porting libsyntax into the syntex_syntax crate, and for working hard to remove unstable features from libsyntax. Also I'd like to say a special thanks to Tamir Duberstein for helping to remove the dependency on rust unstable features.

And of course thanks to all the Racer contributors.

(BTW, a side effect of relying on a separate syntax crate is that build times for Racer have gone up dramatically. This is a shame, but I think a small price to pay for Rust 1.0 compatibility)

Rustc noodling for code completion

Background: Racer is a code completion tool for the Rust programming language. This is going to be a bit of a meandering post as I try to decide what to do about type inference.

As I wrote in the previous post, I'm pretty convinced that racer should be using rustc to interrogate the type of expressions. This would enable racer to perform more accurate completions (and potentually complex refactorings in the future). Currently racer does some of its own type inference, but getting this right is error prone and I think long-term probably a dead end.

However the nice thing about racer doing its own type inference is that it can still work in the face of unfinished code, and also be very fast (because it can work incrementally, take shortcuts and avoid doing a lot of the work that rustc does). The downside is that hand-rolling rust type inference is a lot of work and I don't have much spare time.

Initially I was wondering if racer could approach the type inference problem by isolating the target expression and constructing a smaller representative program around it. The program would be quicker to compile by rustc but have same type expression output. This is essentially the approach racer currently takes with parsing.

Unfortunately after attempting a few hand written examples I realised this would be a very complex approach. E.g. consider this piece of code from libsyntax parse/parser.rs and imagine you want to construct a program to extract the type of the |p| argument:

                let tts = try!(self.parse_seq_to_before_end(
                    &token::CloseDelim(delim),
                    seq_sep_none(),
                    |p| p.parse_token_tree()
                ));

Racer would need to incrementaly resolve the types of a whole bunch of expressions before it could reasonably construct a representative program to run through rustc in order to find the type of 'p'. I suspect this work is similar in scope to completing racer's existing hand-rolled type analysis, which kind of defeats the point.

So the other approach is to run the rustc type inference over whole crates, including imports, and somehow get this fast enough for a good user experience. I'm trying to figure out how best to accomplish this. (Ignoring unfinished code for now)

I started by putting together a spike which attempts to use rustc to run a full type-inference pass on some code. I had to create a copy of the rustc_typeck crate (called mytypeck) in order to get access to the private innards.

(N.B. rustc changes frequently so it is likely that the mytypeck crate will get out-of-date very fast)

The approach I currently intend to persue is to maintain an ast and side tables of all the items in a crate, but no function bodies, (a 'skeleton ast' if you will), and then somehow patch the ast with an individual function body and get the side-tables updated in order to obtain the type of an expression contained within. The function body may be need to be artificially constructed by racer to handle unfinished code. The skeleton ast would need to be constantly updated as the item signatures are updated.

This is all uninformed speculation, but I'm a bit limited with spare time and wanted to get something written down to encourage some chat and input from people with more experience. Am I on the right track?

Racer Progress Update 5

Racer is a code completion tool for the Rust programming language. Time for another progress update!

1. Cargo support

The big news is that I recently added some cargo support to racer:

When searching for crates racer now interrogates the nearest Cargo.lock and Cargo.toml file and then searches git checkouts and crates.io repos in the ~/.cargo directory. This was a bit fiddly but luckily I had the source-code to Daniel Trstenjak's excellent rusty-tags project to refer to. Thanks very much Daniel!

2. Rust Beta

Racer relies on some unstable libraries at the moment, which unfortunately means it won't currently build with the Rust Beta channel. Hopefully this situation will be remedied over the next few weeks before 1.0, but in the meantime you need a rust nightly install to build racer.

(Unstable features used are: collections, core, rustc_private, str_char, path_ext, exit_status. Patches very welcome!)

3. Performance improvements

Back in February Renato Zannon noticed that racer was spending a lot of time spawning threads. The spawning was originally designed to protect against libsyntax panics, and preventing crashes has become an important issue recently because there is a lot of demand to provide racer as a library for linking with IDEs. Unfortunately the thread spawning also generates a lot of overhead and removing it more than doubled the performance of racer in a lot of cases.

The thread spawning is currently removed from the codebase, which means that racer will sometimes panic and fail to produce matches. As an alternative to wrapping libsyntax calls I've been spending time removing panics from Rust's libsyntax itself in the last few weeks. If you want to follow along the PR-in-progress is here.

4. Type Inference

Not much progress on the type inference front I'm afraid. I'm now more convinced than ever that racer should be using rustc for type inference rather than continuing to roll its own. The question is how to get there: the rust compiler is currently not the right shape to do incremental type inference. There is plenty of desire in the community to have a compiler library that can support IDEs, so I plan to help out there where I can.

As a start I'm going to try having racer construct small representative pieces of code to compile (in the same way as it does with using libsyntax to parse), and see where that takes me.

5. Thanks!

Once again, a big thanks to all the Racer contributors. Your help and support is much appreciated:
Andrew Andkjar, awdavies, Ben Batha, Björn Steinbrink, Björn Zeutzheim, Dabo Ross, Darin Morrison, Derek Chiang (Enchi Jiang), Emil Lauridsen, Henrik Johansson, Heorhi Valakhanovich, Jake Kerr, Johann Tuffe, Jorge Aparicio, Justin Harper, Keiichiro Ui, krzat, Leonids Maslovs, Marvel Mathew, mathieu _alkama_ m, Michael Gehring, Michael Maurizi, nlordell, oakes, Pyry Kontio, Renato Zannon, rsw0x, Saurabh Rawat, Sebastian Thiel, Siddharth Bhat, Vincent Huang, Vlad Svoka, Zbigniew Siciarz, Ziad Hatahet.

Racer Progress Update 4

Time for another Racer update! Most of the work on racer in the last few months has been about keeping up with the flood of changes to the rust language and internal apis. Aside from that there were a couple of improvements:

  • Racer got some support for generics (connect() returns IoResult<TcpStream>)
  • It got some support for destructuring
  • Racer got better at handling half-finished code, dangling scopes and match statements
  • It also got a bit better at working on windows (in particular carriage returns and BOMs (byte-order-marks) in files).

TODO List

Racer is starting to get some heavier use by people now, which is great, but has thrown up a few deficiencies that need addressing in the coming weeks/months

  • Cargo support

    Racer works well for a small project using the rust standard libraries. Unfortunately using it with a modern cargo project is a bit clunky and sometimes completely unworkable, involving having to fiddle with the RUST_SRC_PATH variable and in some cases make a copy of external sources in a separate directory. Racer needs to be able to read cargo.toml files and locate the sources referenced in them.

  • Racer as a library (+ daemon)

    Some people would like racer to operate as a long-running daemon for performance reasons. Current Racer performance isn't a big deal for me personally (a complete or find-definition is a few hundred ms on my laptop in most cases), but @valloric thinks it will be and he has a lot more experience in this area due to his YouCompleteMe project. The plan is to create a racer library that editors and daemons can embed.

  • Racer only reads code from files on the file system

    This is somewhat connected to the library thing: Currently racer only consumes code from the file system - you cannot pass it text to work on. This simplifies the internals but means that if you're half way through editing a file and you need racer, your editor must save the contents of your file before invoking it. This is currently handled by emacs and vim by saving a temporary file in the current directory. If racer is to work as a library in a long running process it will need the capability to allow editors to pass unsaved code to it directly and not get confused when it doesn't match what's on disk.

  • Type inference + rustc

    This is probably Racer's biggest hurdle: Racer currently does a bunch of custom type inference in order to resolve method and field completions and definitions. Unfortunately this inference is nowhere near 'complete' (ha!), and the rust language is a fast moving target making this a bunch of effort. I could easily spend the next year or two of my spare time replicating rust type inference and getting nothing else done.

    Instead I'd ideally like to lever the rustc compiler to do the type inference. Unfortunately rustc currently isn't designed for type-inferring anything other than an full complete crate, which is slow and also won't work for half-finished programs like the code you typically autocomplete. I need to spend some time researching what can be done in this area and I'm open to any ideas or advice.

    I noticed @sanxiyn has been experimenting with adding an ast node to represent a completion ('ExprCompletion') to enable this to go through the various compiler passes even though the code is 'broken'. This is cool and really promising - I think the meat here is in allowing incomplete programs to get through rustc's type-inference pass. I'm not sure whether having a 'completion' ast node is as important (once you can evaluate the type of an expression, providing completions is possible without the compiler support), but maybe this approach could be generalised to allow the compiler to skip over sections of code that don't parse correctly.

    In the short term I'm wondering if it might be possible/reasonable to generate small programs that represent the inputs to a specific type query but are quicker for rustc to type-infer. Racer currently does this for parsing (constructs a artificial statement similar to the original and then uses rustc's libsyntax) and it works really well there, but parsing is considerably less brittle than a full type-inference pass. I'm unsure of the best place to discuss this sort of thing; Reddit?

Finally I'd like to thank everybody that has contributed fixes and code to Racer so far:

Michael Gehring, Johann Tuffe, rsw0x, Darin Morrison, Zbigniew Siciarz, Michael Maurizi, Björn Steinbrink, Leonids Maslovs, Vincent Huang, mathieu _alkama_ m, Ben Batha, Björn Zeutzheim, Derek Chiang (Enchi Jiang), Emil Lauridsen, Henrik Johansson, Heorhi Valakhanovich, Jorge Aparicio, Justin Harper, Keiichiro Ui, Pyry Kontio, Renato Zannon, Saurabh Rawat, Vlad Svoka, Ziad Hatahet, awdavies, nlordell

I have very limited spare time so it's a real joy to wake up in the morning and find that somebody has submitted a patch to e.g. bring racer up to date with the latest rust. Thanks everyone!

Another Racer progress update

Racer has come on a bit since my last post. Here's a quick rundown of the main changes:

  • Method resolution got better. Racer now hunts around traits for default method implementations. This means that e.g. a completion on a Vec method gives you all the methods available, including 'is_exists()' and other methods implemented in the sub traits (see above). Racer can also resolve the type of 'self' so that it can match methods and fields when you're writing method implementations.
  • More completion information Racer now outputs some contextual info which gets included in matches - see the image above.
    (N.B. unfortunately there's a bug in my emacs bindings somewhere in the interaction with company mode which means that the extra stuff only crops up after you move the cursor. I'm not really sure why this happens, I'll try and fix it at some point.)
  • Module searching got better: Racer does a more accurate job of following 'extern crate' and nested modules. It turned out that modules didn't quite work the way I originally thought they did.
  • Racer now recognises explicit types in let expressions
    let v : Vec<uint> = ...
    This gives you a way to 'tell' Racer the type if it isn't smart enough to deduce it.
  • Internally Racer now uses iterators everywhere instead of callback functions. (I mentioned callbacks as something I didn't like about the design in a previous post).

    I originally thought iterators would solve the 'don't do the whole search if I only need the first match' problem. Unfortunately writing lazy iterators in Rust is non-trivial when the thing you're iterating is nested and recursive (like a code search). For now I've replaced the callbacks with a simpler eager solution: 'collect the results in a vector and return vec.move_iter()'.

    N.B. I did manage to cobble together a lazy iterator in one place by chaining together a bunch of procs in the central 'resolve_name' function; This has improved performance in racer - I might do the same trick for sub-trait searches.

Basically Racer is now pretty good at navigating and completing anything that does not involve lambdas, destructuring or generics. I'm planning to work on generics next.

I should say a big thanks to the people who committed patches to racer in the last couple of months, in particular rsw0x who posted a couple of fixes and has also been working on the vim plugin, Jorge Aparicio who got racer integrated with the travis build system and Micheal Gehring who dilligently kept Racer compiling in the face of breaking Rust changes.

While on the subject of contributors: I've been feeling the need for somewhere to discuss Racer recently, in particular for things that I don't have much experience with - e.g. windows support, vim + other ide plugins. I've created a google group racer-discuss; please join if you're interested in Racer.

Racer progress update (+ some vim support!)

It's been a few weeks since I posted about Racer, my rust code-autocompletion project, so here's another update.

First off, thanks to Michael Gehring who has been regularly submitting patches to bring Racer up to date with breaking Rust master changes. I have a very limited amount of time to work on Racer so I really appreciate this, thanks Michael.

The main news is that I've cobbled together a Racer vim plugin to go with the emacs one. I don't use vim much myself and I was kinda hoping somebody with expertise would show up and do this but I guess racer isn't sufficiently advanced for people to be interested yet. I'm a bit of a vim n00b so there's probably a much better way of structuring the plugin - any feedback or help would be gratefully received.

The other big improvement from last time is that racer can now perform some code completion of fields and methods (in addition to the static lookups I talked about last time). It turns out that type deduction is pretty involved. As an illustration here are the steps racer took in the screenshot above:

  1. In the screenshot the completion is being performed on fields of the 'v' variable. The plan is to find the type of 'v' and then locate all the fields and methods connected with that type beginning with 'pu'.
  2. Racer starts by searching backwards in the current scope, and parses the statement 'let v = Vec::new();'. It needs to evaluate the type of the right hand side of the expression so that it can know the type of 'v'. Vec::new() parses to a 'ExprCall' AST type with path: 'Vec::new'.
  3. Racer splits the path 'Vec::new' and starts searching for 'Vec'. 'Vec' is not defined in the screenshot and there are no 'use' or 'extern crate' statements in the file, so racer takes a look in libstd/prelude.rs. (The prelude module is implicitly included in all rust files).
  4. libstd/prelude.rs contains the line:
           ...
           #[doc(no_inline)] pub use vec::Vec;
           ...
    
    So now Racer needs to find the path 'vec::Vec'. It splits the path and starts searching for 'vec'
  5. 'vec' isn't defined anywhere in the prelude module. The next place to search is in the crate root, since paths in use statements are implicitly relative to the crate root. Racer hunts for the crate root using some heuristics and finds libstd/lib.rs.
  6. libstd/lib.rs (the std lib crate root) contains the line:
           ...
           pub use core_collections::vec;
           ...
    
    So Racer needs to find 'core_collections::vec'. It starts searching for 'core_collections' in the same file.
  7. The std lib crate root also contains the line:
           ...
           extern crate core_collections = "collections";
           ...
    
    Which means in the std lib crate 'core_collections' is an alias for the collections crate. Racer searches for the collections crate root and finds it in libcollections/lib.rs.
  8. The collections crate root contains:
           ...
           pub mod vec;
           ...
    
    Which means the source for the vec module should be in either vec.rs or vec/mod.rs under the libcollections directory. Racer finds vec.rs and searches for 'Vec::new' under it. It splits the path and searches for 'Vec'
  9. The vec module contains (amongst other things):
           impl<T> Vec<T> {
              ...
              ...
           }
    
    Which means racer has located an impl for Vec function. It now searches for 'new' in the impl scope and finds
        #[inline]
        pub fn new() -> Vec<T> {
            Vec { len: 0, cap: 0, ptr: 0 as *mut T }
        }
    
  10. Rather than parse the whole method Racer extracts the method type signature of new() and parses that. 'new()' returns a 'Vec<T>' type. Racer now has the type of the right hand side of the 'let v = Vec::new();' statement from step (2) and so knows that the type of the 'v' variable is 'Vec<T>'. All it needs to do now is look for fields and methods belonging to the Vec type. It starts by looking for the definition of the Vec type.
  11. Racer Locates the Vec struct in the the vec module
    pub struct Vec<T> {
        len: uint,
        cap: uint,
        ptr: *mut T
    }
    
  12. This is starting to get a bit boring so I'll cut the rest short: Racer looks for public fields in the struct (there are none), and then hunts for traits implemented for Vec and lists the methods (which it identifies as functions with 'self' as the first argument) starting with 'pu'. Phew!

Despite this elaborate field completing capability, Racer's type inference facilities are very far from er.. complete. At this point in time it is easy to confuse Racer with generic types, destructuring, lambdas and many other constructs that pop up all the time in rust code.

My plan now is to grind through each of these in some sort of order based on usage, probably starting with generics. My hope is that by incrementally adding smarts Racer will gradually become more and more useful and evolve into a fully fledged decent ide library by the time Rust hits 1.0.

Incidentally, this Thursday (26th June) is the first Rust London meetup, which looks like it is going to be awesome. (I'm not involved with organising it, just excited to be going!)

Racer progress update (Code autocompletion for Rust)

I've recently become a fan of the Rust Language. I think memory safety without garbage collection is an important innovation in the field and I'm excited by the prospect of using a safer alternative to C++ for fast low level code in my job.

Unfortunately the language is currently in heavy development, with major features changing every week or so. This is exciting but makes it unsuitable for production code at the moment. Instead I've started a project in my evenings and weekends to help me get up to speed with the language.

'Racer' is intended to bring code completion facilites to Rust editors and IDEs. It works as a standalone program that takes source code coordinates (filename, character position) and outputs completions. Eventually I hope it will also provide other search and editing capabilities. At present though it provides auto-complete of functions, structs + enums, and also does 'find-definition' for jumping around code.

It works like this:

  1. User presses <tab> or whatever to complete some code
  2. Editor writes out a temporary copy of the edited text and invokes Racer on it
  3. Racer searches the source code and prints the matching completion options
  4. The editor then reads the output and displays nice drop down list or whatever

The only editor support bundled with Racer at the moment is emacs. I also heard that Eric West has written support for the Atom editor. It appears that a lot of the core Rust developers use vi so I'm privately hoping that somebody will come along and add some vi support too.

Under the hood Racer works by parsing the local expression to be completed and then traversing the source code, doing text searches to narrow scope and parsing individual statements and blocks with libsyntax to perform type inference.

Internals

The code is very 'work in progress' (i.e. a bit crappy), mainly because I'm still getting to grips with how best to do things in Rust. There are some good bits and some less good bits:

Good:
  • Racer has a comment and string scrubbing iterator that works well. This is important for text-searching code without getting tricked by doc strings and comments.
  • It also has a statement iterator that steps through blocks of code quickly without requiring the parser (which is relatively expensive). This means statements can be text searched and then sent to libsyntax for full-blown parsing if they look like a match.
  • I'm now up to speed with Rust's parser package libsyntax and use it to parse various statements and expressions. The Rust language has changed a bunch over the last year and libsyntax still has a bunch of old idioms and GC based code so I feel like this is an achievement in itself.
  • I'm quite pleased with the decision to invoke racer as a command for each search rather than running a daemon that the editor talks to. It makes things easier to test from the command line and I hope will make integration with editors easier too. The downside is that I'll have to make code searching fast enough to work without building an up front in-memory model of the code. If it gets too slow I'll probably consider writing a code cache file rather than running Racer as a daemon.
Less good:
  • Internally the various searching functions should really return iterators so that the calling client can stop consuming whenever it wants. Instead I'm passing a callback arg to the methods which they then invoke when they find something, e.g.
    pub fn complete_from_file(src: &str, 
                              filepath: &Path, 
                              position: uint, 
                              callback: &mut |Match|) { 
       ... 
    }
    
    complete_from_file("foo", "myfile.rs", 3, |match| {
         // do something with each match
    });
    
    This means the whole search happens regardless of whether you just want the first match. The main reason for doing this is because iterators are somewhat less easy to write for my brain. I'm hoping that yield functionality shows up to help at some point. In the meantime converting Racer's search functions into iterators is on the todo list.
  • Lots of debug!() calls. I'm using logging as my primary debug tool. This is a habit I've got in practically every language I use, even with java+intellij which has a great debugging environment. Rust's debug story isn't bad actually and I've used GDB a bit to debug stuff in Rust.
  • I'm still not completely comfortable with Option types in Rust; for my brain they still generate a little friction every time I need to think about unpacking a result. In Racer the majority case is to just ignore the None type (because we are searching so 'None' usually means nothing showed up). My current go-to method of doing this in code is .map():
    do_something_that_might_work().map(|res| {
           do_something_with_result(res);
    });
    
    I feel I should be using .iter() instead:
    for res in do_something_that_might_work().iter() {
           do_something_with_result(res);
    }
    
    but I can't look at 'for' without thinking 'loop'. I find this particularly unclear for methods that are named as if they could return plural results (e.g. StrSlice::find_str(), which looks like it could return multiple results, but in fact stops at the first one and returns an Option<uint>).
  • Code positions are referenced by byte offset in the start of the file. Unfortunately some days I prefer calling this the 'point' (like emacs does), and some days it's 'pos'. I really should just pick one.
  • Actually the biggest problem with Racer at the moment is that I'm tracking Rust master and am often a step behind new language changes. This means the Racer build is frequently broken with respect to the latest Rust nightly. I'm not sure how best to address this; Maybe I'll release a version that tracks the next Rust release when it gets done.

That's enough brain dumping for now. Racer's completion is pretty immature at the moment but it is still somewhat useful. I personally use the find-definition feature a lot for jumping around code to look up function arguments and I'd miss coding in Rust without it. The next phase for Racer is beefing up the type inference to better track types and handle methods on traits.