Today we’ll turn the first frame into your first SHOT.
For that, let’s start by analyzing the stills we created in our last issue.
For me, they are these 4:




We haven’t covered cinematic composition just yet, but based purely on your gut feeling, tell me what you think:
Which one do you think looks the most “cinematic” while, at the same time, being the most realistic rendition?
In my opinion, still #1, though beautiful, has a few problems:
For one, the framing is way too tight, and the characters and objects we prompted are too close together.
This is possibly why Reve decided to place some objects in spots that don’t make much sense…
Like, what is the bamboo table doing there?
Is it floating, or is it more of a bamboo stool with the teacup on it?
And why is the piano positioned on a diagonal line with the wall?
Same thing with the bamboo table; it looks like every object in the room is in a slightly weird position, completely out of alignment with the others!
Also, the proportions of grandma versus the granddaughter don’t feel quite right…
And I’m not entirely sure about her hand, or what she’s doing with that screwdriver between the piano’s keys!
So, sadly, we can’t really use still #1.
Just too many problems and inconsistencies with it.
Let’s check still #2:
This frame looks much more natural in my opinion.
It has a lot more breathing room, and everything seems to be in place and properly aligned.
I also like the fact that we can see part of a traditional Japanese house in the background, which adds to the overall realism of the scene.
There are a few problems with this image though:
First, Reve decided to draw black stripes at the top and the bottom of the image for some reason.
I’ve found that this happens sometimes when we prompt for a “cinematic” image, likely because, during training, the model saw many images that include those black stripes associated with the concept of “cinematic”.
We could fix that by instructing ChatGPT or Reve to extend the image, or by using something like Photoshop’s generative fill.
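Alternatively, if the bars are solid black, a simpler local fix is to just crop them out instead of extending the image. Here’s a minimal Python sketch using the Pillow library; the file names and the darkness threshold are my own assumptions, so adjust them to your case:

    # A minimal sketch: detect solid-black letterbox bars at the top and
    # bottom of an image and crop them out. Assumes the bars are uniformly
    # dark; the file names and threshold below are placeholders.
    from PIL import Image

    def crop_letterbox(path: str, out_path: str, threshold: int = 10) -> None:
        gray = Image.open(path).convert("L")   # grayscale copy for analysis
        width, height = gray.size
        pixels = gray.load()

        def row_is_black(y: int) -> bool:
            # A row counts as "black" if every pixel is darker than the threshold.
            return all(pixels[x, y] < threshold for x in range(width))

        top = 0
        while top < height and row_is_black(top):
            top += 1
        bottom = height
        while bottom > top and row_is_black(bottom - 1):
            bottom -= 1

        # Crop the original full-color image to the non-black region.
        Image.open(path).crop((0, top, width, bottom)).save(out_path)

    crop_letterbox("still_2.png", "still_2_cropped.png")

Keep in mind that cropping changes the aspect ratio, so extending the image with generative fill is still the better option when you need to keep the full frame.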
But there are other problems with this image:
The scene looks too static.
And I’m not talking about the still having no movement (of course it doesn’t), but rather about the way the characters and objects are positioned within the frame.
And then there’s the obvious issue: she’s not really trying to tune or repair the piano; she’s playing it, and that’s not what we prompted for!
Still #3 is great.
The scene looks super-realistic, the Japanese house looks great, the tea cup looks great, the characters’ positions relative to each other are great, and the body language and expressions are on point.
She’s still not really trying to repair the piano, which is what our story is about… but maybe we could make a case that she’s trying to tune it?
But all of that goes out the window when we look at still #4…
Still #4 is just perfect!
The expression of the characters, the natural lighting, the dusty piano, the tatami floor… the beautiful, steamy ceramic teacup sitting on the rustic bamboo table.
There is also the interesting visual element of having her kneeling while actually trying to repair the piano…
In Japan, kneeling is a very common way of sitting, especially for younger people, so it adds artistic interest and cultural relevance to the picture!
Grandma is not standing like we prompted, but in this case, I think it was the right call from a composition point of view.
The composition of the entire frame looks, indeed, really cinematic!
Nothing more to say: this is the one!
So… how will we bring this first frame to life?
For that, we need to use an image-to-video AI model.
There are many tools we could use for this task, all of which we’ll be using in the future, but today, we’ll use Hailuo AI.
Don’t let the sound of it scare you: this is actually a really easy task!
Simply click the link above and create your free account.
They will give you some free credits every day to try their models, and that’s one of the reasons why I love this platform!
Now go to the VIDEO section, make sure to select the I2V (image-to-video) tab, upload your first-frame image, and you’re ready to go!
In this screen you can pick between different video models, write a prompt to guide the video generation, and even include different camera movements.
For now, we don’t need to do any of that.
So just click the colorful button with the number inside (the number of credits that will be consumed) and the generation will start.
In case you’d like to see a couple of examples at once, you can, as I did above, select 2 as the quantity.
But if you’re on a free account, it may be a good idea to do 1 generation at a time in order to stretch your free credits.
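(A quick aside for the more technical readers: if you ever want to automate this step instead of clicking through the web UI, most image-to-video services expose an HTTP API that follows roughly the shape below. This is only an illustrative sketch; the endpoint URL, parameter names, and response format are placeholders I made up, not Hailuo’s actual API, so check their documentation for the real details.)

    # Purely illustrative sketch of a generic image-to-video API call.
    # The URL, key, and JSON fields are placeholders, NOT Hailuo's real API.
    import base64
    import requests

    API_URL = "https://example.com/v1/image-to-video"  # placeholder endpoint
    API_KEY = "YOUR_API_KEY"                           # placeholder credential

    # Encode the first-frame image (still #4 in our case) as base64.
    with open("first_frame.png", "rb") as f:
        frame_b64 = base64.b64encode(f.read()).decode("ascii")

    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "image": frame_b64,       # the uploaded first frame
            "prompt": "",             # optional text guidance; empty for now
            "num_generations": 1,     # 1 at a time to stretch free credits
        },
        timeout=60,
    )
    response.raise_for_status()
    print(response.json())            # such APIs typically return a job ID to poll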
After some waiting, usually less than 5 minutes, we have the results:
And just like that, we have brought our characters to life!
In our next issue, you’ll start learning the first letters of the wonderful language of visual storytelling, as we convert this moving frame into the first scene of your movie!
See you then, fellow Director-in-training,
Leonardo
P.S. If you happen to know someone special who would LOVE to be part of this exciting journey with us, please share Flawlessly Human with them: