Understanding JavaScript Async by Hacking Dropbox
When online articles explain JavaScript async behaviour, most immediately gravitate towards the concepts of delayed evaluation, Promises, and async functions. While this does offer value to the pragmatic programmer, it fails to explain just how a single-threaded programming language deals with immediacy and delayed execution. In fact, I only recently had the opportunity to truly delve into how most JavaScript runtimes execute asynchronous behaviour.
(Not all JavaScript implementations deal with asynchronous behaviour the same way; this becomes important for how we expect our code to execute.)
I spoke about async before at Lighthouse Labs, and learning the ins and outs proved a challenging task -- even for a self-taught, intermediate developer such as myself.
The Dropbox Example
I had a problem recently while using rclone: my data and all its corresponding folders accidentally got deleted when I ran:
rclone sync backups dropbox_main:
This rclone command effectively wipes all folders and files on the remote, replacing them with the contents of my local backups folder (not good!) 😞.
So now I had all the previous contents of my Dropbox folder in the trash.
To fix this, I hopped on over to Dropbox's trash page, where I found a randomly-ordered list of all those files and folders I accidentally deleted.
A First Petty Solution
My first attempt involved a petty jQuery solution:
function clickBoxes() {
jQuery(".mc-checkbox.mc-checkbox-unchecked").trigger("click");
}
I soon found myself repeating the menial task of pressing CTRL and UP. It took only a few minutes to realize just how many files I had managed to accidentally delete. I couldn't keep performing this task by hand. On to the first naive solution to mitigate that problem:
for (let i = 0; i < 16; i++) {
  clickBoxes();
}
But it didn't make a difference! Why?
The JavaScript Event Loop
In short: the JavaScript event loop.
Every time we click all the checkboxes, Dropbox's JavaScript client makes an AJAX request to fetch the next items in the infinite list of deleted files and folders. So even though we search for and click all the checkboxes 16 times, we don't give the Dropbox UI enough time to fetch the next files and folders in the list.
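The effect can be reproduced outside the browser. The sketch below is not Dropbox's code: fakeClickBoxes, pagesLoaded, and seen are hypothetical stand-ins, and a zero-delay setTimeout plays the role of the AJAX response. Every "click" in the loop observes the same single page, because no response callback can run until the synchronous loop has finished.

```javascript
// Hypothetical stand-ins: none of these names come from Dropbox's client.
let pagesLoaded = 1; // pages of deleted files currently rendered
const seen = [];     // what each "click" observed

function fakeClickBoxes() {
  seen.push(pagesLoaded);
  // Simulated AJAX response: it can only run once the call stack is empty.
  setTimeout(() => { pagesLoaded += 1; }, 0);
}

for (let i = 0; i < 3; i++) {
  fakeClickBoxes();
}

console.log(seen); // [ 1, 1, 1 ]
```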
Instead, we really want to wait for Dropbox's UI to load the next files and folders before we execute clickBoxes again. How do we do that?
setTimeout
setTimeout allows us to execute code later... kind of.
Let's take a look at the JavaScript event loop diagram.
By default, setTimeout tells the runtime to queue the call to clickBoxes until after all synchronous code has executed. This means that Dropbox's UI should theoretically have an opportunity to fetch the next folders and files in our trash. Since the delay defaults to 0, we can shorten setTimeout(clickBoxes, 0) to setTimeout(clickBoxes), omitting the second argument of setTimeout. Our new solution might look something like this:
for (let i = 0; i < 16; i++) {
setTimeout(clickBoxes);
}
Running this in the console uncovers a strange behaviour of the JavaScript runtime: why don't the async requests execute as we would expect, now that we have moved clickBoxes to the callback queue?
To answer this question, we must look to the diagram again.
Notice how the delay I showcased earlier in the console doesn't happen. Because we have moved the calls to clickBoxes to the callback queue, Dropbox's subsequent AJAX request does happen, but not in time for any subsequent executions of clickBoxes. As a result, just as before, all executions of clickBoxes still happen before we get a chance to load subsequent folders and files.
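To make that ordering visible, here is a minimal sketch with hypothetical stand-ins (fakeClickBoxes and events are not part of Dropbox's client, and the 20 ms "network" delay is an assumption). All three zero-delay callbacks become due at the same moment and run back to back, so every click is recorded before the first simulated response arrives.

```javascript
// Hypothetical stand-ins for the trash page; the 20 ms delay is an assumed latency.
const events = [];

function fakeClickBoxes(id) {
  events.push(`click ${id}`);
  // Simulated AJAX response arriving some time after the click.
  setTimeout(() => events.push(`page loaded after click ${id}`), 20);
}

for (let i = 1; i <= 3; i++) {
  setTimeout(() => fakeClickBoxes(i)); // delay defaults to 0
}

// Inspect the order once everything has settled.
setTimeout(() => console.log(events), 100);
```

The first three entries are the clicks; the "page loaded" entries only follow afterwards.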
So how do we give the UI a chance to load subsequent files and folders? In short, we need to add a delay long enough for the next page to load before the next call to clickBoxes. Turns out we'll have to use that second argument of setTimeout after all:
for (let i = 0; i < 16; i++) {
  setTimeout(clickBoxes, i * 1000); // 1000 = 1 second
}
At last, it works!
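The same idea can be tried without the Dropbox page. In this sketch the names fakeClickBoxes, pagesLoaded, and seen are hypothetical, the delays are scaled down to 100 ms, and the simulated response is assumed to arrive within 5 ms. Because each click is scheduled well after the previous response, each one should observe one more page than the last.

```javascript
// Hypothetical stand-ins; delays are scaled down (100 ms instead of 1000 ms).
let pagesLoaded = 1;
const seen = [];

function fakeClickBoxes() {
  seen.push(pagesLoaded);
  // Simulated AJAX response, assumed to arrive well within the 100 ms gap.
  setTimeout(() => { pagesLoaded += 1; }, 5);
}

for (let i = 0; i < 3; i++) {
  setTimeout(fakeClickBoxes, i * 100); // due at roughly 0 ms, 100 ms, 200 ms
}

// Inspect what each click observed once all timers have fired.
setTimeout(() => console.log(seen), 400);
```

If the response took longer than the gap between clicks, two clicks would see the same page again, which is why the delay has to be chosen generously.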
But now we only execute clickBoxes 16 times. That surely won't suffice for millions of files. To start with, a naive solution might involve increasing the number of times we execute clickBoxes:
for (let i = 0; i < 9000; i++) { // it's over 9000!
  setTimeout(clickBoxes, i * 1000); // 1000 = 1 second
}
However, what we really want is to execute clickBoxes indefinitely.
To do that, we will need our only other friend: setInterval.
setInterval
setInterval allows us to execute a callback function at a specified interval. Thus, we can execute clickBoxes indefinitely. The second parameter specifies the interval, in milliseconds, at which the function should execute.
setInterval(clickBoxes, 1000); // every second, run `clickBoxes`
And voila! It works.
clearInterval
Just how long will we really want to run this, though? When we try to click on the restore button, the clickBoxes function still executes. We need a way to stop the clickBoxes function from executing -- temporarily.
Luckily for us, the clearInterval method allows us to stop a setInterval callback from being called.
We do this by passing in the ID (a positive integer returned from setInterval):
let repeatingClickID = setInterval(clickBoxes, 1000); // 59
Once we're ready to click that "restore" button, we can halt clickBoxes from executing every second:
clearInterval(repeatingClickID);
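As a small self-contained sketch of the same pairing, the snippet below counts ticks instead of clicking checkboxes and cancels itself after the third one. The names ticks and tickID are hypothetical, and the 10 ms interval is chosen only to keep the demo short. Note that browsers return a positive integer from setInterval, while Node.js returns a Timeout object; clearInterval accepts whichever one the runtime handed back.

```javascript
let ticks = 0;

// Keep the handle returned by setInterval so the interval can be cancelled.
const tickID = setInterval(() => {
  ticks += 1;
  if (ticks === 3) {
    clearInterval(tickID); // no further calls are scheduled after this
  }
}, 10);

// Well after the third tick, the count has stopped growing.
setTimeout(() => console.log(ticks), 100); // 3
```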
Automating File Restoration
While the solution described above works well, we can still automate the steps further to prevent having to click on the restore button. How can we restore batches of files by 300 files each? Also, it gets annoying to have to type in each of these commands in our console.
To solve this problem, let's start with a function that lets us configure how many files we want to restore:
const restoreDropboxFiles = function(fileCount) {
// place logic here
};
Leveraging Promises
With our newly-defined function, we can now leverage Promises, incorporating our previous setInterval solution:
const selectFilesToRestore = function(fileCount) {
const clickedBoxesPromise = new Promise((resolve, reject) => {
const $fileCheckboxes = jQuery(".mc-checkbox.mc-checkbox-unchecked").trigger("click");
return $fileCheckboxes.length > 0 ? resolve($fileCheckboxes) : reject(`No more files to recycle! Got ${$fileCheckboxes.length} files!`);
});
return clickedBoxesPromise;
}
The above code creates a new Promise which resolves only when the page has files available to restore (if at least 1 file exists, then we can resolve the Promise); otherwise it rejects.
We can check that this works by running it once (without setInterval) in our console like so:
selectFilesToRestore(10).then(() => alert("files restored!"));
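The resolve-or-reject pattern can also be exercised without jQuery or the Dropbox page. In this sketch, selectItems is a hypothetical helper and a plain array stands in for the jQuery collection of checkboxes; it resolves with the items when there is at least one and rejects otherwise, so both branches can be observed with .then and .catch.

```javascript
// Hypothetical helper: `items` stands in for the jQuery collection of checkboxes.
function selectItems(items) {
  return new Promise((resolve, reject) => {
    if (items.length > 0) {
      resolve(items);
    } else {
      reject(new Error(`Nothing to select! Got ${items.length} items.`));
    }
  });
}

const outcomes = [];

// At least one item: the promise resolves and .then runs.
selectItems(["a.txt", "b.txt"])
  .then((items) => outcomes.push(`selected ${items.length}`));

// No items: the promise rejects and .catch runs.
selectItems([])
  .catch((error) => outcomes.push(`rejected: ${error.message}`));
```

Handling the rejection with .catch matters once this runs repeatedly; otherwise every empty page produces an unhandled rejection.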
One problem, however: the function will indiscriminately click on every file checkbox on the page; we still haven't used fileCount. We need some way to limit the selection to the number of files we asked for.
We can do this with Array.prototype.slice; it allows us to generate a sub-array from the start and end indices we provide (if the array is empty, or the indices run past its end, we simply get back fewer elements, down to an empty array []).
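A quick illustration of that behaviour, using a hypothetical files array in place of the checkbox collection:

```javascript
// Hypothetical data standing in for the list of checkboxes.
const files = ["a.txt", "b.txt", "c.txt"];

console.log(files.slice(0, 2));  // [ 'a.txt', 'b.txt' ]
console.log(files.slice(0, 10)); // [ 'a.txt', 'b.txt', 'c.txt' ] (end index past the array is fine)
console.log([].slice(0, 10));    // []
console.log(files.length);       // 3 (slice does not modify the original)
```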
Let's refactor our previous Promise-based version of selectFilesToRestore to use slice (jQuery collections provide their own slice method, modelled on Array.prototype.slice):
const selectFilesToRestore = function(fileCount) {
const clickedBoxesPromise = new Promise((resolve, reject) => {
const $fileCheckboxes = jQuery(".mc-checkbox.mc-checkbox-unchecked").slice(0, fileCount).trigger("click");
return $fileCheckboxes.length > 0 ? resolve($fileCheckboxes) : reject(`No more files to recycle! Got ${$fileCheckboxes.length} files!`);
});
return clickedBoxesPromise;
}
Great! Now we can provide a number of files to check, but we still have to restore them. Let's create another Promise-based function to do this:
const restoreButtonSelector = ".restore-button";
const restoreButtonClicked = function() {
return jQuery(restoreButtonSelector).trigger("click").promise();
}
The promise()
method provided by jQuery returns a resolved promise (jQuery's implementation of a deferred
object, similar to the native Promise
but with polyfill capabilities to support all browsers). Now, only when we've clicked on the restore button will the promise resolve.
Now, we can use our restoreButtonClicked in conjunction with selectFilesToRestore to generate our final solution:
const restoreButtonSelector = ".restore-button";
const restoreButtonClicked = function() {
return jQuery(restoreButtonSelector).trigger("click").promise();
}
const selectFilesToRestore = function(fileCount) {
const clickedBoxesPromise = new Promise((resolve, reject) => {
const $fileCheckboxes = jQuery(".mc-checkbox.mc-checkbox-unchecked").slice(0, fileCount).trigger("click");
return $fileCheckboxes.length > 0 ? resolve($fileCheckboxes) : reject(`No more files to recycle! Got ${$fileCheckboxes.length} files!`);
});
return clickedBoxesPromise.then(() => restoreButtonClicked());
}
We now have a working solution to restore files!
However, this solution only works for a single page. How can we extend it to restore files across multiple pages?
To do that, we'll need to combine everything we've learned so far about async.
Combining setInterval with Promises
How can we chain Promises within a setInterval? Our first attempt might look something like this:
const restoreDropboxFiles = function(fileCount) {
  let filesDeletedSoFar = 0;
  // Poll once per second; keep the interval ID so we can cancel it later.
  const selectFilesInterval = setInterval(function() {
    selectFilesToRestore(fileCount).then((checkedFiles) => {
      filesDeletedSoFar += checkedFiles.length;
      if (filesDeletedSoFar >= fileCount) {
        // Enough files selected: click restore, then stop polling.
        return restoreButtonClicked().then(() => clearInterval(selectFilesInterval));
      }
    });
  }, 1000);
};
Success! Each second, we select new files to be restored, finally clicking on the restore button once we have selected enough. When that button gets clicked, we call clearInterval() with the ID returned by setInterval. Note that you can call clearInterval from within a setInterval callback.
Introducing async, await
We can further tidy the solution by reducing the amount of .then chaining and placing our code in-line, so that we can reason about it as if it were synchronous. We can do this through JavaScript's async and await keywords:
const restoreDropboxFiles = function(fileCount) {
  let filesDeletedSoFar = 0;
  const selectFilesInterval = setInterval(async function() {
    // await unwraps the Promise, giving us the checkboxes that were clicked.
    const checkedFiles = await selectFilesToRestore(fileCount);
    filesDeletedSoFar += checkedFiles.length;
    if (filesDeletedSoFar >= fileCount) {
      await restoreButtonClicked();
      clearInterval(selectFilesInterval);
    }
  }, 1000);
};
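To try this pattern outside the browser, here is a minimal sketch under stated assumptions: pendingFiles plays the role of the trash list, fakeSelectFiles plays the role of selectFilesToRestore, and restoreInBatches, batchSize, and intervalMs are hypothetical names; none of this is Dropbox's API. It also adds one thing the version above leaves out, a try/catch that stops the interval when the Promise rejects because nothing is left to select.

```javascript
// Hypothetical stand-ins: `pendingFiles` plays the trash list and
// `fakeSelectFiles` plays selectFilesToRestore; neither is Dropbox's API.
const pendingFiles = ["a", "b", "c", "d", "e"];

const fakeSelectFiles = function(batchSize) {
  return new Promise((resolve, reject) => {
    const batch = pendingFiles.splice(0, batchSize); // take the next batch
    return batch.length > 0 ? resolve(batch) : reject(new Error("No more files!"));
  });
};

const restoreInBatches = function(fileCount, batchSize, intervalMs) {
  return new Promise((resolve) => {
    let selectedSoFar = 0;
    const intervalID = setInterval(async function() {
      try {
        const batch = await fakeSelectFiles(batchSize);
        selectedSoFar += batch.length;
        if (selectedSoFar >= fileCount) {
          clearInterval(intervalID); // target reached: stop polling
          resolve(selectedSoFar);
        }
      } catch (error) {
        // Nothing left to select: stop instead of polling forever.
        clearInterval(intervalID);
        resolve(selectedSoFar);
      }
    }, intervalMs);
  });
};

const done = restoreInBatches(4, 2, 10);
done.then((count) => console.log(`selected ${count} files`)); // selected 4 files
```

Wrapping the interval in a Promise is a design choice of this sketch: it lets the caller know when the whole run has finished, rather than only when a single tick has.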
Conclusion
All in all, asynchronous behaviour in JavaScript is predictable because the language operates on a single thread. Though the implementations vary from browser to browser, we can take confidence in understanding how our program flow gets executed by the JavaScript engine (V8, SpiderMonkey, etc.). Understanding how this works not only allows us to write concise code, but also gives us assurance as to how our code will execute, helping us to tidy up loose ends and to remove potentially unreachable code.
In many ways, our adventures in JavaScript async have closely resembled what we may have done while writing acceptance or integration tests (a la QUnit, Protractor, etc.). While the examples above do not specifically test code, we can use the same techniques with async, await, and setInterval to improve our code quality and readability.