software engineering student @ polytechnique montreal
mote is like a calculator for your photos
The prototype's design is based on @laurentdelrey's free ideas. He also helped me a lot throughout this process. You'd be surprised how eager people are to help if you just go out and ask. Big thanks to him.
i've been exploring intuitive interaction models for image-to-image generation. users select an image A and an image B, representing structure and style respectively, then "add" them together to produce a new composition.
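a minimal sketch of how the "add" could work under the hood, using diffusers img2img with an IP-Adapter to carry image B's style. the checkpoint, adapter weights, scales, and file names here are my assumptions for illustration, not what the prototype actually runs:

```python
# sketch: "adding" image A (structure) and image B (style).
# assumes diffusers, an SD 1.5 checkpoint, IP-Adapter weights, and a GPU.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# the IP-Adapter injects image B's aesthetic as extra conditioning
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.7)  # how strongly B's style is applied

image_a = load_image("structure.png")  # composition / layout
image_b = load_image("style.png")      # aesthetic reference

result = pipe(
    prompt="",                 # the two images carry the signal
    image=image_a,             # img2img init keeps A's structure
    ip_adapter_image=image_b,  # IP-Adapter carries B's style
    strength=0.6,              # lower = closer to A's structure
).images[0]
result.save("a_plus_b.png")
```

lowering `strength` pulls the result toward A's layout; raising the adapter scale pulls it toward B's look.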
Update 2025-03-28: i started prototyping this idea of generating edits from source videos and aesthetic references. i'd like to apply this to users' camera roll videos.
[demo video of the prototype interaction]
i'm currently exploring a new interaction: division. the idea is to let users divide long-form videos into shorter, curated clips and generate aesthetic edits from them. for example, dividing a 5-minute video using a reference image as a guide could produce a 45-second TikTok-style edit that visually aligns with the reference's aesthetic.
[demo video of the division interaction]
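a rough sketch of one way "divide" could score the source video: sample frames, embed them and the reference image with CLIP, and keep the highest-scoring non-overlapping windows. the model choice, window length, and clip count are assumptions for illustration, not the prototype's actual pipeline:

```python
# sketch: score a long video against a reference image with CLIP,
# then keep the best non-overlapping windows. hypothetical baseline.
import cv2
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def frame_scores(video_path, ref_image_path, sample_every_s=1.0):
    """cosine similarity between each sampled frame and the reference."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    step = max(1, int(fps * sample_every_s))
    frames, times = [], []
    i = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % step == 0:
            frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
            times.append(i / fps)
        i += 1
    cap.release()

    inputs = processor(images=frames + [Image.open(ref_image_path)],
                       return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    ref, frame_embs = emb[-1], emb[:-1]
    return times, (frame_embs @ ref).tolist()

def top_windows(times, scores, window_s=5.0, keep=9):
    """greedily pick non-overlapping windows around the best frames."""
    ranked = sorted(zip(scores, times), reverse=True)
    centers = []
    for _, t in ranked:
        if all(abs(t - c) >= window_s for c in centers):
            centers.append(t)
        if len(centers) == keep:
            break
    # (start, end) clip boundaries, in chronological order
    return sorted((max(0.0, t - window_s / 2), t + window_s / 2)
                  for t in centers)

times, scores = frame_scores("source.mp4", "reference.png")
clips = top_windows(times, scores)  # 9 windows x 5s ≈ a 45-second edit
```

stitching the returned (start, end) spans back together (e.g. with ffmpeg) would yield the 45-second cut from the example above.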
Video gen knowledge is very open, especially compared to the secrecy of the LLM labs. It seems that many of Pika/Runway's techniques are public knowledge. see here
Update 2025-03-26: OpenAI's GPT-4o image generation is actually very good. we won't need IP-Adapters, or ControlNets, or LoRAs, or ComfyUI workflows, or segmentation models. it'll be one model to rule them all. the question now is how long until we get a capable all-purpose model (like 4o) that is open source and runs on consumer GPUs.
it's only going to get better, and it will apply to more and more media: audio and video are next.