TODO

Generic

  • It would be nice to have some basic infrastructure for better command line parsing.
  • It would be nice to have some infrastructure for statistics (a la LLVM).
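
  As a very rough illustration of the statistics item above: a minimal sketch assuming we roll our own LLVM-style counter rather than depending on LLVM's Statistic class; the names used here (Statistic, numRulesScanned) are purely illustrative.

    // NOTE: hypothetical sketch only; this is not an existing llbuild API.
    #include <atomic>
    #include <cstdint>
    #include <cstdio>

    class Statistic {
      const char *name;
      const char *desc;
      std::atomic<uint64_t> value{0};

    public:
      Statistic(const char *name, const char *desc) : name(name), desc(desc) {}

      // Cheap, thread-safe increment, usable from hot loops.
      Statistic &operator++() {
        value.fetch_add(1, std::memory_order_relaxed);
        return *this;
      }

      // Dump the counter, e.g. at exit when statistics are enabled.
      void report() const {
        std::fprintf(stderr, "%12llu %s - %s\n",
                     (unsigned long long)value.load(), name, desc);
      }
    };

    // Usage, mirroring the style of LLVM's STATISTIC() macro:
    static Statistic numRulesScanned("num-rules-scanned",
                                     "Number of rules scanned by the engine");
    // ... in the engine loop: ++numRulesScanned;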

Ninja Manifests

  • Diagnose multiple build decls with same output.

Build Engine

  • Generalize key type.
  • Generalize value type (cleanly, current solution is gross).
  • Support multiple outputs.
  • Support active scheduling (?).
  • Consider moving the task model to one where the build engine just invokes callbacks on the delegate (probably using task IDs, and maybe kind IDs for the use of clients that want to follow class-based dispatch models). This has two advantages: (1) it maps more neatly to an obvious C API, and (2) it should make it easier to cleanly generalize the value type, because only the engine and delegate need to be templated and neither needs to support subclassing (see the sketch after this list).
  • Figure out when taskNeedsInput() should be allowed, and whether taskDiscoveredDependency() should be eliminated in favor of loosening the rules about when taskNeedsInput() can be invoked.
  • Figure out if Rule should be subclassed, with virtual methods for action generation and input checking.
  • Implement proper error handling.
  • Think about how to implement dynamic command failure – how should this be communicated downstream? This is another area where flexibility might be nice.
  • Investigate how Ninja handles syncing the build graph with the on-disk state when it has no DB (never mind, it seems to just rebuild). What happens when an output of a compile command is touched?
  • Introspection features for watching build status.
  • Performance
    • Think about # of allocations in engine loop.
    • Think about whether there is any way to get rid of the per-task variable-length list.
    • Think about the making-progress policy: in what order do we prefer starting tasks vs. providing inputs vs. finishing tasks?
    • Should we finalize immediately instead of moving things around on queues?
    • We need to avoid the duplicate stat()ing of input files that we currently do.
    • Implement an efficient shared queue for FinishedTaskInfos, if we stay with the current design.
    • Figure out if we should change the Rule interface so that the engine can manage a free-list of allocated Tasks.
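
  A sketch of the delegate-callback task model from the list above. None of these names are the current llbuild API; they are only meant to show the shape such an interface might take, with the value type kept opaque to the engine (here just a byte vector) so that only the engine and delegate would ever need to be parameterized over it.

    // NOTE: hypothetical sketch only; these names are not the current llbuild API.
    // The engine drives everything through opaque task IDs; only the delegate
    // knows how to map a kind ID back to client-side state.
    #include <cstdint>
    #include <string>
    #include <vector>

    using TaskID = uint64_t;
    using KindID = uint32_t;                  // for clients using class-based dispatch
    using KeyType = std::string;
    using ValueType = std::vector<uint8_t>;   // opaque to the engine

    class BuildEngineDelegate {
    public:
      virtual ~BuildEngineDelegate() = default;

      // The engine created a task to produce `key`; the delegate returns a kind
      // ID it can use later to dispatch the remaining callbacks.
      virtual KindID taskStarted(TaskID task, const KeyType &key) = 0;

      // An input previously requested by the task (identified by `inputID`) is
      // now available.
      virtual void taskInputReady(TaskID task, uintptr_t inputID,
                                  const ValueType &value) = 0;

      // All requested inputs have been delivered; the delegate computes and
      // returns the task's result.
      virtual ValueType taskMustComplete(TaskID task) = 0;
    };

    // The engine side would expose task-scoped requests keyed by the same IDs,
    // e.g. (again, only a sketch):
    //   void BuildEngine::taskNeedsInput(TaskID task, const KeyType &key,
    //                                    uintptr_t inputID);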

Build Database

  • Performance

    • Should we support the DB stashing extra information in the Rule (e.g., its own ID number)? Should the engine officially maintain the ID (maybe useful once we generalize anyway)?

    • Should we support the DB doing a bulk load of the rule data? Probably more efficient for most databases, but on the other hand this would go against moving to a model where the data is only stored in the DB layer, and the BuildEngine always has to go to the DB for it. That model might be slower, but more scalable.

    • Investigate using DB entirely on demand, per above comment. Very scalable.

    • Normalize key and dependency node types to a unique entry to reduce duplication of large key values.

    • Many clients end up having additional information about a key that then gets lost when they serialize it and get it back somewhere else (e.g., one task requests the key as an input, and a different task is started to produce it). For example, the client most likely has some internal state associated with that key that it would be really nice to be able to pass around.

      We can solve this by making the key type richer, and allowing it to have serialization as just one of its methods. We could use a virtual interface or just a simple mechanism to attach a payload.
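
  A minimal sketch of the richer key type described above, assuming a simple attached-payload mechanism; the names (KeyType, ClientKeyInfo) are illustrative only.

    // NOTE: hypothetical sketch only. Serialization is just one view of the key,
    // and clients can attach in-process state that travels with the key but is
    // never written to the database.
    #include <memory>
    #include <string>

    class ClientKeyInfo {
    public:
      virtual ~ClientKeyInfo() = default;
    };

    class KeyType {
      std::string data;                          // canonical, serializable form
      std::shared_ptr<ClientKeyInfo> payload;    // client state; never serialized

    public:
      explicit KeyType(std::string data,
                       std::shared_ptr<ClientKeyInfo> payload = nullptr)
          : data(std::move(data)), payload(std::move(payload)) {}

      // Only `data` is ever stored in the DB.
      const std::string &serialize() const { return data; }

      // A task that receives this key back (e.g., when it starts) can recover
      // the client's state without re-deriving it from the serialized form.
      template <typename T> T *getPayload() const {
        return dynamic_cast<T *>(payload.get());
      }
    };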

Ninja Specific

Tasks for Usable Tool

  • Implement path normalization (for Nodes as well as things like dependencies imported from compiler output); a rough lexical sketch follows this list.
  • Implement Ninja failure semantics for order-only dependencies, which block the downstream command.
  • Handle removed implicit dependencies properly; currently this generates an error and then builds correctly on the next iteration. The latter behavior may indicate a latent issue in the handling of discovered dependencies.
  • Handle rerunning a command on introduction of new dependencies. For example, before we reran on command changes we wouldn’t rebuild a library just because it depended on a new file. We should probably make sure this happens even if the command doesn’t change, although it might be good to check vs Ninja.
  • Support update-if-newer for commands with discovered dependencies.
  • We should probably store a rule type along with the result, and always invalidate if the rule type has changed.
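
  A rough lexical sketch for the path normalization item above; it is purely illustrative, does not consult the file system, and does not resolve symlinks.

    // NOTE: hypothetical sketch only; lexical normalization of node paths,
    // collapsing "./", "//" and "a/.." components.
    #include <cstddef>
    #include <string>
    #include <vector>

    std::string normalizePath(const std::string &path) {
      std::vector<std::string> components;
      bool isAbsolute = !path.empty() && path[0] == '/';
      size_t i = 0;
      while (i < path.size()) {
        size_t j = path.find('/', i);
        if (j == std::string::npos)
          j = path.size();
        std::string component = path.substr(i, j - i);
        if (component == ".." && !components.empty() &&
            components.back() != "..") {
          components.pop_back();               // "a/.." collapses to nothing
        } else if (!component.empty() && component != ".") {
          components.push_back(component);     // keep real components
        }
        i = j + 1;
      }
      std::string result = isAbsolute ? "/" : "";
      for (size_t k = 0; k < components.size(); ++k) {
        if (k) result += "/";
        result += components[k];
      }
      if (result.empty())
        result = ".";
      return result;
    }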

Random Tasks

  • Ninja reruns a command if the restat= flag changes? What triggers this? This behavior probably also applies to depfiles and the like; we should match it.
  • There are some subtle differences in how we handle restat = 0, and generally we don’t do the same thing Ninja would (we can rebuild less). This is because Ninja will just run things downstream if an incoming edge was dirty, but we will do so and also allow the update-if-newer behavior to trigger on interior commands. As an example, look at how the multiple-outputs test case behaves in Ninja with restat = 0.
  • Tasks should have a way to examine their discovered dependencies from a previous run. For example, this is necessary to implement update-if-newer for commands with discovered dependencies.
  • Add support for cleaning up output files (deleting them on failed tasks)?
  • Investigate using pselect mechanisms vs blocked threads.
  • Support traditional style handling of updated outputs (in which consumer commands are rerun but not the producer itself).
  • Performance
    • Need to optimize the evalString() stuff. A good performance test for this is to compare ninja -n on the chromium fake build.ninja file vs. llb ninja build --quiet --no-execute (loading is currently 62% of the llb time, and evalString() alone is 43%).
    • It would be super cool to use the build engine itself for managing the loading of the .ninja file, if we cached the result in a compact binary representation (the build engine would then be used to recompute that as necessary). This would be a nice validation of the generic engine approach too.
    • There is some bad-smelling redundancy between how we check the file info in the *IsValid() functions and how we then recompute that info later as part of the task (the engine internally will compare it to see whether it has changed, to know whether it needs to propagate the change). We need to think about this and figure out what is ideal. There might be a cleaner modeling where we discretely represent each stat-of-file as an input that is then consumed by each item that requires it; this would make it easy to guarantee we compute such things only once (see the sketch after this list).
    • I have heard a claim that one can actually improve performance by strategically purging the OS buffer cache; the claim was that it is faster to build Swift after building LLVM & Clang if there is a purge in between. If true, there may be better things we can do to communicate to the kernel the purpose and lifetime of things like object files.
    • We should consider allowing the write of the target result to go directly into the stored Result field. That would avoid the need for spurious allocations when updating results.
    • We need to switch the Rule Dependencies to be stored using the ID of the rule (which means we need to assign rule IDs, but the DB would like that anyway). This would dramatically reduce the storage required by the database (although a lot of that is because of our subpar phony command implementation, and it would drop significantly if we switched to a specialized implementation for phony commands, because we wouldn't need the clunky giant composite key).
    • We should use a custom task for phony commands; they have a lot of special cases (like the one above about the composite key size).
    • We should move to a compact encoding for the build value. Not worth doing until we address the rule_dependencies table size.
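
  A sketch of the stat-of-file-as-input modeling from the performance list above; the key scheme and types are hypothetical and only intended to show how each stat could be computed once and then consumed as an ordinary input by everything that needs it.

    // NOTE: hypothetical sketch only; the key scheme and FileInfo layout are
    // illustrative, not the current llbuild representation.
    #include <sys/stat.h>
    #include <cstdint>
    #include <string>

    struct FileInfo {
      uint64_t device = 0, inode = 0, size = 0;
      uint64_t modTimeSec = 0, modTimeNsec = 0;
      bool missing = true;
    };

    // The key for a stat rule could simply be a prefixed path.
    inline std::string makeStatKey(const std::string &path) {
      return "stat:" + path;
    }

    // The stat rule's action: run stat(2) once and encode the result as the
    // rule's value.
    inline FileInfo computeFileInfo(const std::string &path) {
      FileInfo info;
      struct stat buf;
      if (::stat(path.c_str(), &buf) != 0)
        return info;                     // leave `missing` set
      info.missing = false;
      info.device = buf.st_dev;
      info.inode = buf.st_ino;
      info.size = buf.st_size;
      info.modTimeSec = buf.st_mtime;
    #if defined(__APPLE__)
      info.modTimeNsec = buf.st_mtimespec.tv_nsec;
    #else
      info.modTimeNsec = buf.st_mtim.tv_nsec;
    #endif
      return info;
    }

    // A command rule would then declare makeStatKey(inputPath) as one of its
    // inputs, instead of re-running stat() in both its isValid() check and its
    // task body.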

Build System

Build File

  • We will probably want some way to define properties shared by groups of tasks (for example, common flags), for efficiency's sake. There are a couple of ways to do this:
    • We could make the build file an “immediate-mode” sort of interface, and allow interleaving of tool and task maps. Then the client could just generate the file with updated information interleaved. This would be similar to how Ninja files get generated in practice by nice generators (gyp, not CMake).
    • We could allow the definition of tool aliases that can define additional properties. This keeps the format better defined and avoids the stateful problems of an immediate-mode interface.
  • We want some way to allow the task name and one of the task's outputs to be the same, without requiring a redundant specification.
  • We might need a mechanism for defining default properties for nodes.
  • We may want to add a notion of types for nodes. We could try to be context-dependent too, but having a type here would make it easier for the client to bind the node to the right type during loading.
  • We may want some provision for providing inline node attributes with the task definitions. Otherwise we cannot really stream the file to the build system in cases where node attributes are required.