The Seedance 2 model is incredibly powerful, completely overshadowing all other models. This is an original video I created in just one day, though the music was previously made using Suno.
In the past, producing a video like this would have taken me at least a week, and the quality wouldn’t have been nearly as good. Hollywood really needs to start rethinking its approach to content creation.
🔥 When rhythm takes over, power isn’t shown — it’s felt.
OWN THE BEAT is raw Brazilian Funk stripped to its essence — no melody, just command.
Every chant, every breath, every siren hit pulses like a declaration of control. It’s not about dancing to the rhythm — it’s about being the rhythm. Minimal. Hypnotic. Absolute.
youtu.be/rxWNmzQpW2c
To clarify, I didn’t use any real human dance footage as reference for this video—everything was generated and then edited together. Each segment of my video is based on prompts that generally include the following elements:
1. Overall atmosphere description
2. Key actions
3. Scene description: starting pose, mid-sequence body/hand movements over time, and ending pose
4. Dialogue/lyrics/sound effects at specific timestamps
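For example, a segment prompt built from these four elements might read something like this (an illustrative sketch, not an actual prompt from this video):
Dim warehouse, red strobe lights, heavy hypnotic funk atmosphere. Key action: a dancer snaps into sharp arm isolations on every beat. Scene: starting pose with arms crossed and head down; over 0-4s the arms whip outward and the torso rolls through each hit; ending pose frozen mid-turn, facing the lens. At 2.5s, the chant “own the beat” lands together with a siren hit.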
Seedance 2 automatically designs camera angles based on the content, though you can also specify camera movements precisely. In the raw clip below, I didn’t describe camera angles—you can compare it with my final video.
After generating the clips, I edited them by adding lip-sync, syncing them with the music, and adjusting the speed of some segments to match the beat.
This was a habitual mistake I made while working on this video. Initially, I followed the traditional workflow for video models: first generating reference images, then describing the actions, and so on.
However, Seedance supports up to 9 images, 3 video clips, and 3 audio clips as reference materials simultaneously for each generated segment.
This multimodal reference capability is quite rare among current AI video tools. In theory, I could have directly provided the model with edited music or voice clips along with reference images for generation.
But for this project, I generated the clips first and then re-generated them to add lip-sync.
Finally, it’s official: Apple’s next AI leap is… built on Google’s Gemini. 🤯
Apple and Google have signed a multi-year agreement: future Apple Foundation Models will be based on Gemini models and Google Cloud technology.
This will drive upcoming Apple Intelligence features—including a more personalized Siri—while Apple continues to leverage on-device processing and Private Cloud Compute to maintain its industry-leading privacy standards.
A Meta Quest open-source MR app turns your room into a language lab.
Spatial Lingo shows how mixed reality + AI can teach vocab by labeling your real world—now open-source.
Built for Meta Quest Passthrough, it detects objects around you, overlays translated words, and listens as you speak. A playful 3D guide gives real-time pronunciation feedback, turning your room into a dynamic classroom. It’s positioned as an open-source challenger to commercial MR language apps.
It identifies chairs, desks, and more, then overlays nouns/adjectives in your target language.
The app listens and judges pronunciation strictly. That’s useful for serious practice, even if it feels tough. Expect real-time feedback and progression into a “final level” with sharper visuals.
MR-first UX via Passthrough.
You’re learning in your actual environment, not a cartoon room. Roomscale + Hand Tracking + Voice = hands-free practice.
Open-source foundation.
Dev-focused sample from Oculus DevTech. Fork it, swap languages, tune models, and build your own MR learning experiences. It’s a baseline to prototype commercial-grade features without starting from zero.
Github: github.com/oculus-sampl...
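For developers, the core loop is easy to picture. A minimal Python sketch, assuming a detector and a translation lookup (names and structure are my guesses, not the repo’s actual API):

    # Hypothetical sketch of the label-overlay loop; all stand-ins, not the sample's API.
    TARGET_LANG = "es"

    # Stand-in for the MR object detector: (object name, room-space anchor point).
    detections = [("chair", (0.4, 0.0, 1.2)), ("desk", (1.1, 0.0, 0.8))]

    # Stand-in for a translation model or API call.
    DICTIONARY = {"es": {"chair": "la silla", "desk": "el escritorio"}}

    def label_for(name, lang):
        # Fall back to the original noun if no translation is known.
        return DICTIONARY[lang].get(name, name)

    # Overlay one translated label per detected object at its anchor.
    for name, anchor in detections:
        print(f"overlay '{label_for(name, TARGET_LANG)}' at {anchor}")

In the actual sample, these stand-ins correspond to the pieces described above: Quest Passthrough object detection, translation into your target language, and speech input for pronunciation scoring.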
Niji 7 just landed.
The latest Niji focuses on sharper eyes, tighter coherency, and better prompt adherence. It keeps legacy flags and adds sref tweaks for style control. After 18 months of training, this release targets fewer misses and more faithful outputs for anime creators.
Key stats:
Coherency: major improvement vs prior Niji
Prompt following: stricter left/right, color, object placement
Compatibility: backwards support incl. --sv 4; use --niji 7 in Discord or “Version: Niji 7” on web
Core: “Crystal Clarity.”
Sharper reflections and eye details reduce muddiness in faces and highlights. Expect fewer artifacts in glossy surfaces and more readable micro-features—think eyelashes, irises, jewelry.
Coherency: “what you ask is what you get.”
Better compliance with spatial cues (left/right), colors, counts. E.g., “red cube left, blue cube right” renders correctly more often, cutting prompt wrangling.
Prompt following for specifics.
Niji 7 improves on complex, multi-clause requests. It’s more literal with ordering and constraints, so you can stack attributes without losing key elements.
sref: style reference control.
Use sref to steer aesthetic toward a target look while keeping your prompt. Handy for series consistency, brand vibes, or matching a particular artist’s feel.
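For example, a hypothetical prompt combining these features (the reference URL is a placeholder, not a real sref):
girl holding a red umbrella on the left, boy with a blue bicycle on the right, rainy neon alley, glossy reflections, detailed irises --niji 7 --sref https://example.com/style-ref.png
The left/right placement leans on the coherency gains, while --sref pins the aesthetic to your target look.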
If AI video is to truly become a “tool,” I lean toward this path: first, ensure the video makes sense in a 3D world, then focus on style and flair. Controllability, reusability, and credibility—these all stem from spatiotemporal consistency.
The next step for AI video isn’t about being more “flashy,” but more “stable.” LuxReal’s approach is still in its early stages, but the direction is right. Below is the link—feel free to join the beta test. Share the product ads you create with LuxReal and let me know about your experience.
Try it now: www.luxreal.ai
I’ve connected with LuxReal and got three redeem codes for you to try it out. Share your test results and videos in the comments—the first three get the codes via DM.