Drake: data workflow management tool

In this video we’re introducing Drake – a data workflow management tool designed for Data Scientists and Data Engineers, which has been developed and used …





8 Replies to “Drake: data workflow management tool”

  1. I love the idea and the implementation of Drake. I can clearly see myself using it for my research. However, being new to Clojure, I am not sure how to deal with the slow startup time of the uberscript. Is there any way I can pre-compile all the included libraries? Or just keep clojure.core in memory at all times?
    Thank you for the great tool.

  2. Ivan, you've touched upon an extremely irritating property of Java (and anything else JVM-based, like Clojure). The delay you see is JVM startup time. There are some solutions based on keeping a JVM always running. The most common is Nailgun. The problem with Nailgun is that sharing one JVM has consequences: for example, the current directory is always the one you started Nailgun in, there's no cleanup between runs, and running multiple instances can get quite problematic.

  3. Also, Clojure seems to have its own problems with Nailgun: for example, under certain conditions Clojure's agents don't seem to work very well with it (probably a startup/cleanup issue). There are other solutions, such as an ingenious tool called Drip, which, instead of reusing one running JVM, spins up a "backup" one (or several) so that it's ready for the next run. But we didn't have any luck running Drake under it. Simpler stuff works well, though.

    This is a known issue: bug #1 in Drake's bug tracker.

  4. Thank you for an exhaustive answer. It is not what I had hoped to hear, but it is what it is. I will take a shot with Nailgun.
    Again, thanks for putting Drake out in the open.

  5. Be careful with Nailgun. Drake doesn't work with it very well. First, Drake assumes the current directory is where it was run from. Second, it uses agents to capture the stdin/stderr/stdout of child processes, and that fails under Nailgun (on the second start, not the first one). You're more than welcome to experiment with it; you can post your questions to issue #1, which I referenced before (I can't post links in YouTube comments).

    Alternatively, you can try running 'main' from the REPL.

  6. If you're following along at home, you'll find that the step that runs `join` fails. The solution is to change $INPUTS to $[INPUTS]. I'm guessing the implementation of Drake (mine is 1.0.1) has changed since the video was made.

    I discovered the solution (after an hour or so of hacking through the jungle) by reading the design docs, linked above.
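
The fix described in the last reply can be sketched as a Drake step. This is only a minimal illustration, not the workflow from the video: the file names are hypothetical, and it assumes that Drake replaces `$[INPUTS]` with the step's input files and `$[OUTPUT]` with its output file before running the command.

```
; Hypothetical step: joined.txt is built from two (already sorted) input files.
; $[INPUTS] and $[OUTPUT] are filled in by Drake before the shell command runs.
joined.txt <- left.txt, right.txt
  join $[INPUTS] > $[OUTPUT]
```

If a step like this fails with `$INPUTS`, switching to the bracketed form is the change the reply suggests trying.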
