Over the last 6 months I have had the privilege of being a Google Summer of Code (GSoC) contributor with The Rust Foundation. In GSoC, contributors work with a mentor and an open source organisation on a project of value to the community. My project was to improve the performance of cargo-semver-checks (c-s-c). If you haven't heard of it, cargo-semver-checks is a cargo plugin that checks Rust libraries for SemVer-breaking changes. Because the design of c-s-c makes it difficult to analyse performance with traditional external tools, an important project goal, alongside the performance improvements themselves, was to design and build a tracer that could easily identify performance bottlenecks. Overall, I reduced the typical runtime on very large crates from ~8s to ~2s, roughly a 4x speedup (a 75% reduction in runtime), without compromising performance on smaller crates. Along the way, I also reduced test time from ~7min to ~1min.
trustfall-rustdoc#98 Improve efficiency of detect_rustdoc_format_version. Previously, the entire JSON file, which can be several hundred MB, was parsed with serde just to read the format version stored at the end of the file. This PR adds a fast path that skips directly to the end of the file and extracts the version with string manipulation. This saves up to 1% of total runtime on large crates.
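A minimal, std-only sketch of the tail-scan idea. It assumes, as described above, that "format_version" is the last field written to the file. The function names and the tail length are illustrative, not the actual code in trustfall-rustdoc. If the fast path returns None, a real implementation would still fall back to full parsing.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Extract the number after `"format_version":` from a chunk of JSON text.
fn parse_format_version(tail: &str) -> Option<u32> {
    const KEY: &str = "\"format_version\":";
    // rfind: the field we want is the last one in the file.
    let start = tail.rfind(KEY)? + KEY.len();
    let digits: String = tail[start..]
        .trim_start()
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}

/// Fast path: read only the last `tail_len` bytes instead of the whole file.
fn detect_format_version_fast(path: &Path, tail_len: u64) -> io::Result<Option<u32>> {
    let mut file = File::open(path)?;
    let len = file.metadata()?.len();
    file.seek(SeekFrom::Start(len.saturating_sub(tail_len)))?;
    let mut buf = Vec::new();
    file.read_to_end(&mut buf)?;
    // The seek may land in the middle of a multi-byte UTF-8 character, so decode lossily.
    let tail = String::from_utf8_lossy(&buf);
    Ok(parse_format_version(&tail))
}
```

If the chosen tail is too short to contain the whole key, the fast path simply finds nothing. That is why a fallback to the full parse is still needed.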
trustfall-rustdoc-adapter#902 and trustfall-rustdoc-adapter#903 Add a per-item-kind index of public items. This was a follow-up to a 7-month-old PR by Predrag that had stalled because of a lack of visibility into the internal behaviour of c-s-c: the speedup was smaller than expected, and there was no clear reason why. Using a prototype performance tracer, I found that the index was not being applied to types that underwent type coercion. Along the way, I ran into an API gap: with_capacity is supported for hashmaps using the default hasher, but not for hashmaps with a custom hasher, so I implemented a fix. Interestingly, building the index serially was also faster than building it in parallel. A parallel build allocates several partial hashmaps, which then have to be combined and the intermediate maps dropped.
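The shape of such an index can be sketched in std-only Rust. ItemKind, Item, build_kind_index, and CustomHasher are hypothetical stand-ins, not the adapter's types. BuildHasherDefault<DefaultHasher> merely plays the role of "some custom hasher". The sketch also shows how std itself pre-sizes a map with a non-default hasher: through with_capacity_and_hasher, because HashMap::with_capacity only exists for the default RandomState.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum ItemKind {
    Struct,
    Enum,
    Function,
}

struct Item {
    id: u32,
    kind: ItemKind,
    is_public: bool,
}

// Stand-in for a custom (e.g. faster, non-cryptographic) hasher.
type CustomHasher = BuildHasherDefault<DefaultHasher>;
type KindIndex = HashMap<ItemKind, Vec<u32>, CustomHasher>;

fn build_kind_index(items: &[Item]) -> KindIndex {
    // `HashMap::with_capacity` is only defined for `RandomState`;
    // with a custom hasher, std provides `with_capacity_and_hasher`.
    let mut index: KindIndex = HashMap::with_capacity_and_hasher(3, CustomHasher::default());
    // A single serial pass: there are no partial maps to merge or drop afterwards.
    for item in items.iter().filter(|item| item.is_public) {
        index.entry(item.kind).or_default().push(item.id);
    }
    index
}
```

A query that is coerced to a specific kind can then read index[&kind] directly instead of scanning every item and filtering.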
trustfall-rustdoc-adapter#926 Dynamic resolution of enum variants. Tracing with the prototype tracer showed that one particular resolution, used by five queries, was taking a lot of time. When checking whether two enum variants match, the adapter iterated through all of an enum's variants to find the match for each variant. This is quadratic in the number of variants per enum: O(n^2) for an enum with n variants. With an index, each lookup goes directly to the matching variant, so the total work becomes linear.
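To make the complexity difference concrete, here is a std-only sketch that assumes variants are matched by name. Variant, match_variants_scan, and match_variants_indexed are hypothetical, not the adapter's code. Both functions return the same result, but the first does a linear scan per lookup (O(n^2) in total), while the second builds a name index once and then does an average O(1) lookup per variant.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct Variant {
    name: String,
    discriminant: i64,
}

/// Quadratic: for every name we want, scan all of the enum's variants.
fn match_variants_scan<'a>(wanted: &[&str], variants: &'a [Variant]) -> Vec<Option<&'a Variant>> {
    wanted
        .iter()
        .map(|name| variants.iter().find(|v| v.name == *name))
        .collect()
}

/// Linear overall: build a name -> variant index once, then look each name up directly.
fn match_variants_indexed<'a>(wanted: &[&str], variants: &'a [Variant]) -> Vec<Option<&'a Variant>> {
    let index: HashMap<&str, &Variant> = variants.iter().map(|v| (v.name.as_str(), v)).collect();
    wanted.iter().map(|name| index.get(*name).copied()).collect()
}
```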
trustfall-rustdoc-adapter#927 A small refactoring to improve readability.
trustfall-rustdoc-adapter#936 Add an experimental performance tracer to trustfall-rustdoc-adapter. This was my last merged change, and it went through several rounds of iteration before settling on its final design. The tracer's goal is to measure how much time is spent in each resolution with as little overhead as possible.
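One possible approach, sketched with std only: wrap each lazily evaluated iterator so that the time spent inside every next() call is added to a per-resolution total. This is not the design that landed in #936. Tracer, Traced, and traced are hypothetical names, and Rc/RefCell assume single-threaded execution. Because work happens lazily inside next(), timing the iterators themselves attributes cost to the resolution that produced it.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;
use std::time::{Duration, Instant};

/// Accumulated time and number of `next()` calls per resolution name.
#[derive(Default, Debug)]
struct Tracer {
    totals: RefCell<HashMap<&'static str, (Duration, u64)>>,
}

impl Tracer {
    fn record(&self, name: &'static str, elapsed: Duration) {
        let mut totals = self.totals.borrow_mut();
        let entry = totals.entry(name).or_insert((Duration::ZERO, 0));
        entry.0 += elapsed;
        entry.1 += 1;
    }
}

/// Wraps a lazy iterator and times each call to `next()`.
/// Times are inclusive: if this iterator pulls from another traced
/// iterator, the inner time is counted here as well.
struct Traced<I> {
    inner: I,
    name: &'static str,
    tracer: Rc<Tracer>,
}

impl<I: Iterator> Iterator for Traced<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        let start = Instant::now();
        let item = self.inner.next();
        self.tracer.record(self.name, start.elapsed());
        item
    }
}

fn traced<I: Iterator>(inner: I, name: &'static str, tracer: &Rc<Tracer>) -> Traced<I> {
    Traced { inner, name, tracer: Rc::clone(tracer) }
}
```

The overhead per call is two clock reads and a map update. A production tracer would likely avoid the string-keyed map on the hot path.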
trustfall-rustdoc-adapter#1009 A work-in-progress index to reduce the time spent computing the importable paths of each item. So far, its performance impact is neutral. When resolving the importable paths for a type, the adapter looks them up in the index instead of recomputing them. As a side effect, this avoids allocating a new vector of paths each time.
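A std-only sketch of the precompute-then-borrow pattern. ItemId, ImportablePath, ImportablePathIndex, and the compute callback are hypothetical, not the types in #1009. Paths are computed once when the index is built. Afterwards, each lookup returns a borrowed slice, so no vector is allocated per call.

```rust
use std::collections::HashMap;

type ItemId = u32;
/// A path such as ["my_crate", "module", "Type"].
type ImportablePath = Vec<String>;

struct ImportablePathIndex {
    paths: HashMap<ItemId, Vec<ImportablePath>>,
}

impl ImportablePathIndex {
    /// Compute every item's importable paths once, up front.
    fn build(items: &[ItemId], compute: impl Fn(ItemId) -> Vec<ImportablePath>) -> Self {
        let paths = items.iter().map(|&id| (id, compute(id))).collect();
        Self { paths }
    }

    /// Borrow the precomputed paths: no recomputation and no new Vec per call.
    fn importable_paths(&self, id: ItemId) -> &[ImportablePath] {
        self.paths.get(&id).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

The trade-off is memory and up-front build time. That may be one reason such an index can end up performance-neutral overall if most paths are only requested once.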
cargo-semver-checks has an exhaustive test suite that runs each lint thousands of times across different test crates, for ~250k lint executions in total. At that scale, even fast operations add up to noticeable slowdowns, and the full suite took ~7 minutes. Across the following PRs, I reduced that to ~1 minute.
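For a sense of scale, suppose (purely hypothetically) that each lint execution carried just 1 ms of avoidable overhead:

$$
250{,}000 \times 1\ \text{ms} = 250\ \text{s} \approx 4.2\ \text{min}
$$

That alone would be more than half of the original ~7-minute test time.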
trustfall-rustdoc#103 Add run_query_with_indexed_query to trustfall-rustdoc. About 60% of test runtime was spent re-parsing GraphQL queries. This PR adds a function that executes a lint with an already-parsed query, so repeated runs don't need to parse it again.
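The general idea of splitting parsing from execution can be sketched with std only. ParsedQuery, parse_query, run_query, and run_parsed_query are hypothetical placeholders. They are not trustfall-rustdoc's API, and I'm not reproducing the real signature of run_query_with_indexed_query. A thread-local counter makes the number of parses observable.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a parsed and validated query.
struct ParsedQuery {
    text: String,
}

thread_local! {
    /// Counts how many times a query has been parsed on this thread.
    static PARSE_COUNT: Cell<usize> = Cell::new(0);
}

/// Stand-in for the expensive GraphQL parse.
fn parse_query(text: &str) -> ParsedQuery {
    PARSE_COUNT.with(|c| c.set(c.get() + 1));
    ParsedQuery { text: text.to_string() }
}

/// Old shape: every run starts from the query text and parses it again.
fn run_query(text: &str, crate_name: &str) -> String {
    let parsed = parse_query(text);
    run_parsed_query(&parsed, crate_name)
}

/// New shape: the caller parses once and passes the parsed query in.
fn run_parsed_query(query: &ParsedQuery, crate_name: &str) -> String {
    // Placeholder for actual query execution against one crate.
    format!("ran {} bytes of query against {}", query.text.len(), crate_name)
}
```

With the old shape, N crates cost N parses. With the new shape, they cost one.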
cargo-semver-checks#1371 and cargo-semver-checks#1373 Add a local cache for each query and use the newly added function. Each query is now parsed once per lint instead of six times per lint-crate pair, a reduction of several hundred times.
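A minimal sketch of a per-lint query cache, assuming the parsed query can be shared cheaply. Here it uses Rc, which suits a single thread. QueryCache, get_or_parse, and the lint names are hypothetical, not c-s-c's code. Each lint's query is parsed the first time it is requested and reused for every crate after that.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical parsed query; stands in for the real parsed representation.
#[derive(Debug)]
struct ParsedQuery {
    lint: String,
}

/// Per-lint cache: a query is parsed only on its first request.
#[derive(Default)]
struct QueryCache {
    parsed: HashMap<String, Rc<ParsedQuery>>,
    parses: usize,
}

impl QueryCache {
    fn get_or_parse(&mut self, lint: &str) -> Rc<ParsedQuery> {
        if let Some(query) = self.parsed.get(lint) {
            return Rc::clone(query);
        }
        // Stands in for the expensive GraphQL parse.
        self.parses += 1;
        let query = Rc::new(ParsedQuery { lint: lint.to_string() });
        self.parsed.insert(lint.to_string(), Rc::clone(&query));
        query
    }
}
```

Running 2 lints against 100 crates through this cache parses exactly twice.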
trustfall-rustdoc-adapter#911 Lazily initialise the schema and cache it once per version instead of re-parsing it on demand. This has a minor benefit in normal execution, but a large effect on test execution, where constantly re-parsing the schema took a substantial portion of the test runtime.
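One way to get "parse at most once per version" in std Rust is a static OnceLock per supported version. This is a sketch, not the adapter's code. Schema, parse_schema, and schema_for are hypothetical, and versions 1 and 2 are placeholders for real rustdoc format versions.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical parsed schema.
#[derive(Debug)]
struct Schema {
    version: u32,
}

/// Counts real parses so the caching is observable.
static PARSES: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for parsing the schema text for one format version.
fn parse_schema(version: u32) -> Schema {
    PARSES.fetch_add(1, Ordering::Relaxed);
    Schema { version }
}

/// Returns the schema for a supported version, parsing it lazily
/// on first use and at most once per process.
fn schema_for(version: u32) -> Option<&'static Schema> {
    static V1: OnceLock<Schema> = OnceLock::new();
    static V2: OnceLock<Schema> = OnceLock::new();
    let cell = match version {
        1 => &V1,
        2 => &V2,
        _ => return None,
    };
    Some(cell.get_or_init(|| parse_schema(version)))
}
```

OnceLock is thread-safe, so parallel test threads share one parsed schema per version rather than each parsing its own.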
cargo-semver-checks#1497 By default, when a snapshot test fails, insta generates an expression showing the line of source code that produced the snapshot. However, since the tests for c-s-c are generated by macros, hundreds of tests run from the same file with the same expression: Expression: &query_execution_result. This provides no meaningful information, because every one of those tests has the same expression. Worse, generating the expression requires waiting for a lock on the source file, so when several tests fail, it can take up to ~10-15% of a failing test's runtime. This PR removes the expression generation, speeding up test runs.
There are a few things that I couldn't get done over the coding period, but that I would like to work on over the coming years. In no particular order, these include:
cargo-semver-checks.
Debug implementation.
A --perf-report command line flag, so that users can easily report performance issues together with the supporting data.