In the fall of 2020, I introduced myself in my classes with these video pieces.
I found a sample of my own voice recordings from when I was 4 years old. I used several text-to-speech synthesizers to generate speech from the transcript of those recordings. I also used Google Magenta’s Onsets and Frames to extract MIDI information from the voice recordings and played it back using different tones from a neural synthesizer model.
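Onsets and Frames transcribes audio into MIDI note events directly; as a minimal sketch of the kind of mapping such a pipeline rests on, here is the standard conversion from a detected pitch in Hz to a MIDI note number (the function names are mine, not from Magenta):

```python
import math

def freq_to_midi(freq_hz: float) -> int:
    """Map a detected pitch (Hz) to the nearest MIDI note number,
    using the convention A4 = 440 Hz = MIDI note 69."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))

def midi_to_name(note: int) -> str:
    """Human-readable name for a MIDI note number."""
    names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
    octave = note // 12 - 1
    return f"{names[note % 12]}{octave}"

# A spoken vowel hovering around 220 Hz lands on A3 (MIDI 57).
print(freq_to_midi(220.0), midi_to_name(freq_to_midi(220.0)))  # → 57 A3
```

This is why a transcription model trained on piano audio still produces *something* from speech: any voiced sound with a trackable pitch can be snapped to the nearest note on this grid.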
The result is this audiovisual piece – one that is a little eerie to listen to, but created using only the content of my 1995 voice recording and current A.I. tools.
Every few months, the world of visual deepfakes becomes more convincing, and easier to run in real time. This year I explored early versions of these techniques for real-time deepfaking through first-order motion models.
Most of these models work by extracting certain face-detection landmarks, which is why they completely fail to identify a face if they can’t recognize eyes. I love exploring the areas of an A.I. model where it starts “failing” according to our objective definition of its goals. I think it reveals the textures and implicit patterns the model has managed to capture, away from the local minima of the expected inputs.
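The gating behaviour described above can be sketched in a few lines: many landmark-based pipelines only accept a detection when key landmarks, especially the eyes, are present, so occluding the eyes makes the whole face vanish for the model. Everything here (the `accept_face` function, the landmark names) is illustrative, not taken from any particular library:

```python
from typing import Dict, Optional, Tuple

# landmark name -> (x, y) coordinates, normalized to the image
Landmarks = Dict[str, Tuple[float, float]]

# Landmarks the detector insists on before it will report a face.
REQUIRED = {"left_eye", "right_eye"}

def accept_face(landmarks: Landmarks) -> Optional[Landmarks]:
    """Return the landmarks only if both eyes were detected, else None.
    Covering the eyes therefore makes the face 'disappear' to the model,
    even when the nose and mouth are clearly visible."""
    if REQUIRED.issubset(landmarks):
        return landmarks
    return None

full = {"left_eye": (0.3, 0.4), "right_eye": (0.7, 0.4), "nose": (0.5, 0.6)}
covered = {"nose": (0.5, 0.6), "mouth": (0.5, 0.8)}  # eyes occluded

print(accept_face(full) is not None)     # → True
print(accept_face(covered) is not None)  # → False
```

The interesting artistic territory sits just outside that `REQUIRED` set: inputs the gate still accepts but that the downstream synthesis model has never really seen.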
Here’s an experiment I did, exploring these very non-natural portions of the face synthesis models using an image of Einstein as a puppet.
I also created this short piece controlling my own face, called
Puppeteering Myself (1991 – today).