Multimedia Processing

Generate images, produce video, embed audio, and analyze any media — all inside your workspace

Opulent works across every media type. Generate original images on demand, produce short video clips, embed audio and YouTube content into documents, and extract insights from images and video — all woven directly into your agent workflows and Document panel.


Capabilities Overview

| Capability | What It Does | Example Use |
|---|---|---|
| Image Generation | Create custom images from text descriptions | Product mockups, illustrations, slide visuals |
| Image Understanding | Analyze and extract info from any image | Document scanning, chart re-creation, visual QA |
| Video Generation | Generate short video clips from a description | Product teasers, social content, explainer clips |
| Audio Embed | Embed playable audio clips into documents | Narration, music, recorded walkthroughs |
| Voice Output | Convert text to natural spoken audio | Voiceovers, accessibility, podcast drafts |
| Speech to Text | Transcribe audio or video to text | Meeting notes, interview transcripts |
| YouTube Embed | Embed any YouTube video inline | Reference material, tutorials, demos |
| Video Understanding | Extract insights from uploaded video | Competitor analysis, meeting summaries |

Image Generation

Quick Start

"Generate a product mockup showing our dashboard on a MacBook Pro,
clean white background, professional photography style"

"Create a diagram showing our customer journey from first visit to
paying customer, flat design, brand colors"

"Generate a hero image for our landing page: abstract, dark theme,
glowing blue accents, wide format 1920x1080"

Common Uses

Product Visuals:

  • Product mockups and prototypes
  • Feature illustrations and UI concepts
  • App store screenshots and demos

Marketing Assets:

  • Social media graphics (square, portrait, landscape)
  • Blog post header images
  • Ad creatives with specified aspect ratios

Presentations:

  • Custom slide backgrounds per section
  • Concept illustrations for abstract ideas
  • Visual metaphors that replace stock photos

Diagrams & Architecture:

  • System architecture diagrams
  • Process and workflow flows
  • Infographics combining icons and data

Tips for Better Images

Be specific about style:

  • Good: "Flat design illustration, bright colors, icon-style"
  • Good: "Minimalist, modern, professional photography, shallow depth of field"
  • Too vague: "Make it look good"

Describe composition:

  • Good: "Centered subject, blurred background, natural window lighting"
  • Too vague: "A picture of our app"

Specify use case and format:

  • "For Instagram post, square format, bold text overlay space at bottom"
  • "For presentation slide, 16:9 wide format, subtle texture background"
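When a prompt names an aspect ratio rather than exact pixels, it can help to work out the dimensions first and state both. The helper below is a small illustrative sketch in Python. The function names are made up for this example and are not part of Opulent.

```python
from fractions import Fraction


def dimensions_for(ratio: str, long_edge: int) -> tuple[int, int]:
    """Return (width, height) in pixels for a 'W:H' ratio.

    The longer side of the image is set to long_edge pixels.
    """
    w_part, h_part = ratio.split(":")
    aspect = Fraction(int(w_part), int(h_part))
    if aspect >= 1:
        # Landscape or square: width is the long edge.
        width = long_edge
        height = round(long_edge / aspect)
    else:
        # Portrait: height is the long edge.
        height = long_edge
        width = round(long_edge * aspect)
    return width, height


def describe_format(ratio: str, long_edge: int) -> str:
    """Build a phrase that can be appended to an image prompt."""
    width, height = dimensions_for(ratio, long_edge)
    return f"{ratio} format, {width}x{height}"


if __name__ == "__main__":
    print(describe_format("16:9", 1920))  # 16:9 format, 1920x1080
    print(describe_format("1:1", 1080))   # 1:1 format, 1080x1080
```

Stating both the ratio and the pixel size in the prompt leaves less room for interpretation than either one alone.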

Image Understanding

Quick Start

"Analyze this screenshot and extract all visible text into a document"

(Upload image)

"What products are shown in this catalog page? Extract names, prices,
and any SKU numbers visible."

(Upload image)

"Recreate this chart as an editable data table with the exact values shown"

(Upload image)

Common Uses

Document Processing:

  • Extract text from screenshots and scans
  • Parse receipts, invoices, and purchase orders
  • Read handwritten notes and whiteboard photos

Visual Analysis:

  • Re-create charts and graphs as live data
  • Identify objects, logos, and UI elements
  • Describe and annotate image content

Quality Control:

  • Compare two product photos for differences
  • Verify design specs are met in screenshots
  • Check marketing assets for brand compliance

Example Tasks

"Extract all text from these 10 product images and organize into a spreadsheet"

"Analyze this architecture diagram and create a written description of
every component and how they connect"

"Compare these two app screenshots and list every UI difference"

Video Generation

Opulent supports text-to-video generation for short clips and animations.

Quick Start

"Generate a 5-second product teaser showing our logo animating in
with a dark background and particle effects"

"Create a short explainer clip for this feature: professional style,
no text overlay, abstract motion graphics"

Common Uses

  • Product teasers and social media clips
  • Abstract motion graphics for presentations
  • Animated diagrams and process illustrations
  • Brand intro sequences

Checking Video Status

Video generation runs asynchronously. After requesting a video:

"Check the status of my video generation"

Once complete, the video is available in AI Drive and can be embedded directly into any Document.
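Status checks here happen through a chat prompt, and this page does not describe a programmatic interface. For readers who script around asynchronous jobs in general, the usual pattern is to poll until the job reaches a terminal state or a timeout expires. The sketch below is generic Python under stated assumptions: `check_status` is a hypothetical callable you would supply, and the status strings "complete" and "failed" are placeholders, not documented Opulent values.

```python
import time
from typing import Callable


def wait_for_video(
    check_status: Callable[[], str],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll check_status() until it reports a terminal state.

    check_status is a hypothetical callable returning a status string.
    "complete" and "failed" are assumed terminal states for illustration.
    Raises TimeoutError if no terminal state is seen within timeout seconds.
    """
    waited = 0.0
    while True:
        status = check_status()
        if status in ("complete", "failed"):
            return status
        if waited >= timeout:
            raise TimeoutError(f"video still '{status}' after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

The `sleep` parameter exists only so the loop can be exercised without real delays. In normal use the default `time.sleep` applies.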


Audio Embeds

Embed playable audio directly into documents and presentations — no external tools needed.

Quick Start

"Embed the audio file at this URL as a playable audio player in the document:
https://example.com/narration.mp3"

"Add background music to this document and label it 'Product Demo Walkthrough'"

Using the toolbar: open the Document panel → click the ♪ Music icon → enter the audio URL.
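The Common Questions section notes that embeds need a direct audio URL (mp3, wav, ogg). A quick pre-check can catch links that point at a web page rather than a file. The following is a minimal Python sketch using only the standard library. The function name is illustrative, and the check looks only at the URL's shape; it does not confirm that the file exists or that Opulent will accept it.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions named in the Common Questions section for direct audio URLs.
DIRECT_AUDIO_EXTENSIONS = {".mp3", ".wav", ".ogg"}


def looks_like_direct_audio_url(url: str) -> bool:
    """Return True if url is http(s) and its path ends in a listed audio extension."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # Query strings and fragments are ignored; only the path is inspected.
    suffix = PurePosixPath(parsed.path).suffix.lower()
    return suffix in DIRECT_AUDIO_EXTENSIONS


if __name__ == "__main__":
    print(looks_like_direct_audio_url("https://example.com/narration.mp3"))  # True
    print(looks_like_direct_audio_url("https://example.com/listen"))         # False
```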

Common Uses

  • Meeting recordings embedded alongside notes
  • Product demo narration in proposals
  • Podcast clips embedded in research documents
  • Accessibility versions of written content

Voice Output

Convert any text to a natural audio narration.

Quick Start

"Convert this blog post to an audio narration file, professional tone"

"Create a voiceover for this presentation script: friendly, energetic,
moderate pace"

"Generate audio descriptions of these 10 products for our website"

Common Uses

Content Creation:

  • Blog posts converted to audio
  • Podcast scripts produced as audio files
  • Presentation narration tracks

Accessibility:

  • Audio versions of long documents
  • Voiceover tracks for screen recordings

Marketing:

  • Ad voiceover copy
  • Product demo narration
  • Social media audio content

Voice Options

  • Tone: Professional, friendly, casual, energetic, calm
  • Pace: Fast, moderate, slow
  • Style: Conversational, formal, educational, promotional


Speech to Text

Transcribe any audio or video file to text with speaker labels and timestamps.

Quick Start

"Transcribe this interview recording"

(Upload audio file)

"Convert this 1-hour webinar recording to text. Include:
- Full transcript
- Executive summary
- Key takeaways
- Q&A section extracted separately"

"Transcribe these 20 customer support calls and identify the most
common issues mentioned across all of them"

Common Uses

Meeting Notes:

  • Transcribe calls and standup recordings
  • Extract action items automatically
  • Build searchable meeting archives

Content Repurposing:

  • Convert podcasts to blog posts
  • Create show notes from audio
  • Extract quotable moments with timestamps

Research:

  • Transcribe user interview recordings
  • Analyze customer support calls at scale
  • Process focus group audio

Features

  • Speaker identification — Distinguish between multiple speakers
  • Timestamps — Mark when each segment was said
  • Formatting — Proper punctuation, paragraphs, and capitalization
  • Accuracy — High accuracy across accents, background noise, and technical terminology
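To make the speaker and timestamp features concrete, here is one way a labeled transcript could be modeled and rendered once you have it. This is a hypothetical Python sketch: the `Segment` fields and the output layout are assumptions for illustration and do not describe Opulent's actual transcript format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One stretch of speech (hypothetical shape, for illustration only)."""
    start_seconds: float
    speaker: str
    text: str


def format_timestamp(seconds: float) -> str:
    """Format a second count as HH:MM:SS, dropping fractional seconds."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_transcript(segments: list[Segment]) -> str:
    """Render segments in time order, one '[timestamp] speaker: text' line each."""
    ordered = sorted(segments, key=lambda seg: seg.start_seconds)
    return "\n".join(
        f"[{format_timestamp(seg.start_seconds)}] {seg.speaker}: {seg.text}"
        for seg in ordered
    )


if __name__ == "__main__":
    demo = [
        Segment(65.4, "Speaker 2", "Agreed, let's ship it Friday."),
        Segment(0.0, "Speaker 1", "Thanks for joining, everyone."),
    ]
    print(render_transcript(demo))
    # [00:00:00] Speaker 1: Thanks for joining, everyone.
    # [00:01:05] Speaker 2: Agreed, let's ship it Friday.
```

Keeping the timestamp on every line is what makes it practical to pull quotable moments or jump back to the recording later.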

YouTube Embeds

Embed any YouTube video inline in documents and presentations.

Quick Start

"Embed this YouTube video in the document: https://youtu.be/VIDEO_ID"

Using the toolbar: open the Document panel → click the ▶ YouTube icon → paste the URL or video ID.
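Because the toolbar accepts either a URL or a bare video ID, it can be handy to normalize links to an ID before pasting many of them. The sketch below is illustrative Python using only the standard library. It assumes the commonly observed 11-character ID format and a handful of well-known URL shapes; neither is confirmed by this page, and the function name is made up for the example.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: IDs are 11 characters drawn from letters, digits, '-' and '_'.
_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]{11}$")


def youtube_video_id(value: str) -> Optional[str]:
    """Return the video ID from a bare ID or a common YouTube URL, else None."""
    value = value.strip()
    if _ID_PATTERN.match(value):
        return value  # Already a bare ID.

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    candidate = None
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Watch links: https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith(("/embed/", "/shorts/")):
            candidate = parsed.path.split("/")[2]

    if candidate and _ID_PATTERN.match(candidate):
        return candidate
    return None


if __name__ == "__main__":
    print(youtube_video_id("https://youtu.be/abcDEF12345"))  # abcDEF12345
    print(youtube_video_id("https://example.com/watch?v=abcDEF12345"))  # None
```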

Common Uses

  • Embed tutorial videos alongside written instructions
  • Add reference demos to proposals
  • Include competitor videos in research documents
  • Add explainer videos to onboarding documents

Combining Multiple Modes

Opulent can combine all media capabilities in a single workflow:

Example 1: Video to Document

"Watch this product demo video, transcribe it, extract key features
and pricing mentioned, generate a summary diagram, and write a
blog post with all findings embedded in the Document panel"

Example 2: Presentation with Full Media

"Create a 10-slide presentation about our Q4 results. Generate a custom
illustration for each section. Add a voiceover narration track for the
full deck. Embed a reference YouTube tutorial in the appendix slide."

Example 3: Bulk Image Analysis to Report

"Analyze these 50 product photos, extract visible text and product
details from each, generate a comparison chart, and create a
slide deck with findings and side-by-side image comparisons"

Quick Use Cases

| Use Case | Input | Output |
|---|---|---|
| Product Mockups | Description | Generated image |
| Meeting Notes | Video / audio recording | Transcript + summary |
| Blog Audio | Text article | Audio narration file |
| Document Scanning | Photo of document | Extracted text |
| Video Analysis | Competitor video URL | Feature comparison |
| Podcast Show Notes | Audio file | Transcript + summary |
| Slide Visuals | Topic description | Custom generated images |
| Demo Narration | Presentation script | Voiceover audio |
| YouTube Reference | URL | Embedded player in doc |

Common Questions

What image formats are supported?
PNG, JPG, WEBP, GIF, and SVG for uploads. Generated images are delivered as PNG or JPEG at the requested dimensions.

What video formats work for upload?
MP4, MOV, WEBM, and most common formats. Generated videos are delivered as MP4.

What audio formats work for transcription?
MP3, WAV, M4A, WEBM, OGG, and most common audio formats.

Can I generate images in specific sizes?
Yes. Specify dimensions: "Generate a 1920x1080 image..." or "Square format, 1:1 ratio for Instagram".

How accurate is speech transcription?
Very high accuracy, including with multiple speakers, accents, and technical vocabulary.

Can I embed audio from an external URL?
Yes. Any direct audio URL (mp3, wav, ogg) can be embedded as a playable audio player in Documents.
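When preparing a batch of uploads, the explicitly listed formats can be turned into a simple lookup for sorting files by media type. This is an illustrative Python sketch: the table copies only the extensions named above, and the function name is made up for the example. Since the answers also mention "most common formats", an empty result means a format is not explicitly listed, not that it is unsupported.

```python
from pathlib import PurePath

# Extensions explicitly named in the answers above, keyed by media type.
UPLOAD_FORMATS = {
    "image": {".png", ".jpg", ".webp", ".gif", ".svg"},
    "video": {".mp4", ".mov", ".webm"},
    "audio": {".mp3", ".wav", ".m4a", ".webm", ".ogg"},
}


def media_kinds(filename: str) -> list[str]:
    """Return the media types whose listed extensions match filename.

    A file can match more than one type: WEBM appears under both
    video and audio.
    """
    suffix = PurePath(filename).suffix.lower()
    return sorted(kind for kind, exts in UPLOAD_FORMATS.items() if suffix in exts)


if __name__ == "__main__":
    print(media_kinds("photo.PNG"))   # ['image']
    print(media_kinds("clip.webm"))   # ['audio', 'video']
    print(media_kinds("notes.txt"))   # []
```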


Bottom line: Opulent handles every media type natively — generate images, produce video, embed audio, transcribe speech, and analyze visual content — all integrated into your documents, slides, and agent workflows without leaving the platform.