I currently run my own high-frequency trading business. Prior to that I was a software engineer at Google Research for 8 years, specializing in deep learning and computer vision.
My entire education and career have been oriented around an interest in AGI, except for the last 18 months, in which I pursued an opportunity in high-frequency trading. Given the recently increased rate of change in AI, I am eager to return to the field.
For a condensed version of this website, please email me for my resume.
Updated 2024 January.
I found a market niche in the high-frequency trading space and built an autonomous trading system that has so far made more than $300K in profit on more than $10M in volume.
Due to the sensitive nature of this work, details are available upon request, but some highlights are:
PyO3. This formulation discovers trades that prior heuristics did not find, can be extended to non-market-neutral strategies, and is amenable to a variety of gradient optimization techniques.
Versions of the system were written in TypeScript, then OCaml, then Rust with an OCaml sidecar. The whole system runs on a geographically distributed Kubernetes cluster on a low-cost cloud provider[2].
In that time I hosted 3 interns:
Note: I quit six months before the first layoffs in 2023.
Here are some selected projects:
In Silico Labeling was a project that used deep learning to predict fluorescence images from transmitted-light images of unlabeled cells. It gives life scientists many of the benefits of fluorescence labeling without most of the costs; see this blog post and this editorial for context.
I originated the idea and led the effort across an 18-person team at Google, Verily, Harvard, and Gladstone. The work consisted of target identification, experimental design, sample creation, data collection using robotic microscopes, large scale distributed image processing, and model development.
At the time, the SOTA for image-to-image models wasn’t good enough, due to limited spatial context, artifacts caused by scale changes, and convolution edge effects. So, I created a new architecture (second two figures) carefully designed to address these issues, resulting in a 25% loss reduction and qualitatively better images.
This work was published in Cell and on the Google Blog, and it was open sourced. It was also patented and led to the creation of two new projects at Verily. Later work automated quality control in similar pipelines.
For this project I used C++, Golang, Python, Flume, and TensorFlow.
I created Google's first hyperparameter tuning API for deep learning by providing a convenient interface to black-box optimizers and infrastructure to manage experiment lifecycles. This was the first version of what became the Vertex AI hyperparameter tuner, a product of the Vizier team led by Daniel Golovin.
At the time, the Vizier team already provided black-box optimizers for other Google products (e.g. Ads), but the API was not immediately suited to deep learning. I created a service and an API the user could use to define a search space along with hooks into their training and evaluation code. My infrastructure then ran the optimization, including selecting the next experiment, scheduling it, collecting evaluation data, dealing with failures, etc.
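The workflow described above (define a search space, supply hooks into training and evaluation, then let the infrastructure pick, run, and record experiments and recover from failures) can be sketched in a few dozen lines. This is a minimal, hypothetical illustration, not the Vizier or Vertex AI API: the names `FloatParam`, `Trial`, `run_study`, `random_search_suggest`, and `train_and_eval` are invented here, random search stands in for the black-box optimizer, and a raised `RuntimeError` stands in for a failed or preempted job.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class FloatParam:
    """One dimension of the search space (hypothetical type)."""
    low: float
    high: float
    log_scale: bool = False

    def sample(self, rng: random.Random) -> float:
        if self.log_scale:
            return math.exp(rng.uniform(math.log(self.low), math.log(self.high)))
        return rng.uniform(self.low, self.high)

@dataclass
class Trial:
    params: Dict[str, float]
    objective: Optional[float] = None
    attempts: int = 0

def random_search_suggest(space: Dict[str, FloatParam], rng: random.Random) -> Dict[str, float]:
    """Stand-in black-box optimizer: propose the next experiment at random."""
    return {name: p.sample(rng) for name, p in space.items()}

def run_study(
    space: Dict[str, FloatParam],
    train_and_eval: Callable[[Dict[str, float]], float],
    num_trials: int = 20,
    max_attempts: int = 3,
    seed: int = 0,
) -> Tuple[Optional[Trial], List[Trial]]:
    """Suggest, run, and record trials, retrying any whose hook raises RuntimeError."""
    rng = random.Random(seed)
    trials: List[Trial] = []
    for _ in range(num_trials):
        trial = Trial(params=random_search_suggest(space, rng))
        while trial.objective is None and trial.attempts < max_attempts:
            trial.attempts += 1
            try:
                trial.objective = train_and_eval(trial.params)
            except RuntimeError:
                pass  # e.g. a preempted job: retry with the same parameters
        trials.append(trial)
    completed = [t for t in trials if t.objective is not None]
    best = min(completed, key=lambda t: t.objective, default=None)
    return best, trials

# User side: a search space plus a hook into training and evaluation.
calls = {"n": 0}

def train_and_eval(params: Dict[str, float]) -> float:
    """Toy stand-in for real training; returns a validation loss to minimize."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("simulated preemption")  # exercises the retry path
    return (math.log10(params["learning_rate"]) + 2) ** 2 + (params["dropout"] - 0.3) ** 2

space = {
    "learning_rate": FloatParam(1e-4, 1e-1, log_scale=True),
    "dropout": FloatParam(0.0, 0.5),
}
best, trials = run_study(space, train_and_eval)
print(best.params, best.objective)
```

Because the optimizer sits behind a single suggest function, a Bayesian or evolutionary optimizer could replace `random_search_suggest` without changing the user's hook or search-space definition.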
Fun fact: At the time I became the biggest user of Brain compute at Google, as I used the system to tune the hyperparameters of my own models. This compute was all low-priority "free" compute obtained by migrating my jobs around the globe to follow the night, taking advantage of overcapacity.
For this project I used C++, Python, and, of course, GCL[3].
I built a system that trains deep networks faster by dynamically adjusting the train set data distribution (cf. curriculum learning), providing a nearly free 30% training speedup on tasks with imbalances in example difficulty, such as image classification.
The main idea was to reduce the variance of the SGD gradient estimate via importance sampling. The importance weights were estimated on the fly by a concurrently trained helper network using the current model parameters. Interestingly, the curricula produced by the system were often human-interpretable and provided insight into the task.
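A minimal sketch of the unbiased importance-sampling estimator, assuming a toy one-parameter least-squares model in plain Python. Example i is drawn with probability p_i and its gradient is scaled by 1/(N p_i), so the expected update equals the full-batch gradient while high-gradient examples are visited more often. The concurrently trained helper network is replaced here by a hypothetical lookup table of each example's most recently observed gradient magnitude, and mixing in a uniform component (the assumed parameter `eps`) bounds the importance weights. All function names are illustrative.

```python
import random

def per_example_grads(w, data):
    """Gradient of 0.5 * (w * x - y) ** 2 with respect to w, per example."""
    return [(w * x - y) * x for x, y in data]

def sampling_probs(scores, eps=0.2):
    """Blend normalized importance scores with a uniform distribution.

    The uniform share keeps every p_i >= eps / N, so the importance
    weight 1 / (N * p_i) never exceeds 1 / eps.
    """
    n = len(scores)
    total = sum(scores)
    if total == 0:
        return [1.0 / n] * n
    return [(1 - eps) * s / total + eps / n for s in scores]

def estimator_variance(grads, probs):
    """Exact variance of the one-sample estimate g_i / (N * p_i) with i ~ probs."""
    n = len(grads)
    mean = sum(grads) / n
    second_moment = sum(g * g / (n * n * p) for g, p in zip(grads, probs))
    return second_moment - mean * mean

def importance_sampled_sgd(data, w=0.0, steps=2000, lr=0.05, eps=0.2, seed=0):
    """SGD that samples examples non-uniformly and reweights them to stay unbiased."""
    rng = random.Random(seed)
    n = len(data)
    # Hypothetical stand-in for the helper network: the most recent gradient
    # magnitude seen for each example, refreshed whenever it is sampled.
    scores = [1.0] * n
    for _ in range(steps):
        probs = sampling_probs(scores, eps)
        i = rng.choices(range(n), weights=probs)[0]
        x, y = data[i]
        g = (w * x - y) * x
        w -= lr * g / (n * probs[i])  # importance weight 1 / (N * p_i)
        scores[i] = abs(g)
    return w

# Imbalanced toy task: 90 "easy" examples with tiny gradients and 10 "hard" ones.
# Every example satisfies y = 3x, so the optimum is w = 3.
data = [(0.1, 0.3)] * 90 + [(2.0, 6.0)] * 10

grads = per_example_grads(0.0, data)
uniform = [1.0 / len(data)] * len(data)
proportional = sampling_probs([abs(g) for g in grads], eps=0.0)
print(estimator_variance(grads, uniform))       # about 12.9
print(estimator_variance(grads, proportional))  # about 0, since all gradients share a sign
print(importance_sampled_sgd(data))             # close to 3.0
```

On this deliberately imbalanced toy set, sampling in proportion to gradient magnitude removes the single-sample variance entirely because every per-example gradient has the same sign; when signs are mixed, the variance can be reduced but not eliminated.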
Other than the proper design of the helper network, the main difficulty was making the system fast, as it needed to feed TPUs without bottlenecking them[4]. The final artifact was a distributed system consisting of data loaders, annotators, the caching sampler, and the concurrently trained helper network, all communicating via a cluster-local DB, implemented in about 50K lines of C++, Python, SQL, and GCL.
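A quick back-of-the-envelope check of the bandwidth figure given in the notes below (about 12.5 GB/s, or 83K ImageNet examples per second). The sketch assumes each example is a decoded 224×224 RGB image with one byte per channel; that image size is an assumption, not something the original states.

```python
# Back-of-the-envelope: examples per second that fit through the host-to-TPU bus.
# Assumption: each ImageNet example is a decoded 224x224 RGB uint8 image.

BUS_BYTES_PER_SEC = 12.5e9                     # ~12.5 GB/s, per the note
HEIGHT, WIDTH, CHANNELS = 224, 224, 3
BYTES_PER_EXAMPLE = HEIGHT * WIDTH * CHANNELS  # 150,528 bytes

def examples_per_second(bus_bytes_per_sec: float, bytes_per_example: int) -> float:
    """Upper bound on throughput if bus bandwidth is the only bottleneck."""
    return bus_bytes_per_sec / bytes_per_example

rate = examples_per_second(BUS_BYTES_PER_SEC, BYTES_PER_EXAMPLE)
print(f"{rate:,.0f} examples/sec")  # roughly 83K
```

At that rate, the input pipeline has to deliver an example about every 12 microseconds on average to keep the bus saturated.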
Unfortunately, at the time TPUs were plentiful and the extra lifting required to integrate the system was not seen by management as worth it, causing it to be deprioritized.
In 2012 June I started a 3-month research internship at Willow Garage, working in robot perception. I liked it so much that I twice extended the internship, finally ending in 2013 March.
While there, I:
I also organized two programs for the wider benefit of the company:
My initial focus was machine learning, working with Charles Elkan. During this time, I attended the Machine Learning Summer School at Cambridge University, where I presented a paper on theoretical machine learning.
In 2009, I switched my focus to computer vision with Serge Belongie. My initial thesis area was in local descriptor methods, e.g. SIFT, which are used in computer vision to compare local regions of images. They are building blocks for many computer vision applications, including structure-from-motion and object detection and recognition.
Between 2010 and 2013, I did a total of three internships at Willow Garage and Google amounting to 16 months.
Inspired by the internships, I developed the Food for Thought (FFT)[8] program, paid for with grants and with the support of my advisor Serge. FFT provided Google-style free food to all members of our lab at UCSD. I believe it significantly improved lab morale and communication. Here's a photo of some of the lab members eating, and here's one of the stocked fridge.
I also did some teaching:
In 2014 I joined Google Research and continued my PhD part-time, while pivoting to focus on deep learning applications in computer vision. I finished in 2018.
While an undergrad at Swarthmore College, I worked in the summers with Gary Cottrell on a variety of cognitive science topics, thanks to whom I developed an interest in computer vision and biologically inspired models.
In 2008 I graduated with honors with a BA in math and a minor in computer science.
During my PhD, I kept sane by working on a number of side-projects, for example:
With Lance Hepler, a friend who joined for 3 months between jobs.
This low-cost cloud provider was the source of one of the most annoying bugs I’ve encountered recently, related to VLAN MTU mismatches and randomly dropped packets.
The Generic Config Language, a Google-internal language for deploying and configuring services.
I believe the bus bandwidth was around 12.5 GB/s at the time, or about 83K ImageNet examples per second.
At the time an OpenCV core contributor, later acquired by Intel.
I’ve had an interest in functional programming ever since I took Ranjit Jhala’s programming languages class at UCSD. I am so happy Rust is now making these ideas mainstream.