Monday, October 13, 2025

How Androidify leverages Gemini, Firebase and ML Kit

Posted by Thomas Ezan – Developer Relations Engineer, Rebecca Franks – Developer Relations Engineer, and Avneet Singh – Product Manager

We’re bringing back Androidify later this year, this time powered by Google AI, so you can customize your very own Android bot and share your creativity with the world. Today, we’re releasing a new open source demo app for Androidify as a great example of how Google is using its Gemini AI models to enhance app experiences.

In this post, we’ll dive into how the Androidify app uses Gemini models and Imagen via the Firebase AI Logic SDK, and we’ll share some insights learned along the way to help you incorporate Gemini and AI into your own projects. Read more about the Androidify demo app.

App flow

The overall app functions as follows, with various parts of it using Gemini and Firebase along the way:

flow chart demonstrating Androidify app flow

Gemini and image validation

To get started with Androidify, take a photo or choose an image on your device. The app needs to make sure that the image you upload is suitable for creating an avatar.

Gemini 2.5 Flash via Firebase helps with this by verifying that the image contains a person, that the person is in focus, and assessing image safety, including whether the image contains abusive content.

val jsonSchema = Schema.obj(
    properties = mapOf("success" to Schema.boolean(), "error" to Schema.string()),
    optionalProperties = listOf("error"),
)

val generativeModel = Firebase.ai(backend = GenerativeBackend.googleAI())
    .generativeModel(
        modelName = "gemini-2.5-flash-preview-04-17",
        generationConfig = generationConfig {
            responseMimeType = "application/json"
            responseSchema = jsonSchema
        },
        safetySettings = listOf(
            SafetySetting(HarmCategory.HARASSMENT, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.HATE_SPEECH, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.DANGEROUS_CONTENT, HarmBlockThreshold.LOW_AND_ABOVE),
            SafetySetting(HarmCategory.CIVIC_INTEGRITY, HarmBlockThreshold.LOW_AND_ABOVE),
        ),
    )

val response = generativeModel.generateContent(
    content {
        text("You are to analyze the provided image and determine if it is acceptable and appropriate based on specific criteria.... (more details see the full sample)")
        image(image)
    },
)

val jsonResponse = Json.parseToJsonElement(response.text!!)
val isSuccess = jsonResponse.jsonObject["success"]?.jsonPrimitive?.booleanOrNull == true
val error = jsonResponse.jsonObject["error"]?.jsonPrimitive?.content

In the snippet above, we’re leveraging the structured output capabilities of the model by defining the schema of the response. We’re passing a Schema object via the responseSchema param in the generationConfig.

We want to validate that the image has enough information to generate a nice Android avatar. So we ask the model to return a JSON object with success = true/false and an optional error message explaining why the image doesn’t have enough information.

Structured output is a powerful feature enabling a smoother integration of LLMs into your app by controlling the format of their output, similar to an API response.
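Because the response schema is fixed, the same JSON could also be mapped onto a small data class with kotlinx.serialization instead of being walked manually. Here’s a minimal sketch of that alternative; the ImageValidationResult class is our own illustrative name, not part of the sample app:

import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json

// Mirrors the response schema above: "success" is required, "error" is optional.
@Serializable
data class ImageValidationResult(
    val success: Boolean,
    val error: String? = null,
)

// ignoreUnknownKeys keeps parsing resilient if the schema gains extra fields later.
private val json = Json { ignoreUnknownKeys = true }

val validation = json.decodeFromString<ImageValidationResult>(response.text!!)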

Image captioning with Gemini Flash

Once it’s established that the image contains sufficient information to generate an Android avatar, it is captioned using Gemini 2.5 Flash with structured output.

val jsonSchema = Schema.obj(
    properties = mapOf(
        "success" to Schema.boolean(),
        "user_description" to Schema.string(),
    ),
    optionalProperties = listOf("user_description"),
)
val generativeModel = createGenerativeTextModel(jsonSchema)

val prompt = "You are to create a VERY detailed description of the main person in the given image. This description will be translated into a prompt for a generative image model..."

val response = generativeModel.generateContent(
    content {
        text(prompt)
        image(image)
    },
)
        
val jsonResponse = Json.parseToJsonElement(response.text!!)
val isSuccess = jsonResponse.jsonObject["success"]?.jsonPrimitive?.booleanOrNull == true
val userDescription = jsonResponse.jsonObject["user_description"]?.jsonPrimitive?.content
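
The createGenerativeTextModel() helper isn’t shown in the snippet above. A plausible implementation, mirroring the configuration from the validation step (the model name and settings below are assumptions rather than the sample app’s exact code), would look something like this:

// Hypothetical helper: builds a Gemini text model configured for structured JSON output,
// reusing the same configuration style as the validation snippet earlier in this post.
fun createGenerativeTextModel(jsonSchema: Schema): GenerativeModel =
    Firebase.ai(backend = GenerativeBackend.googleAI())
        .generativeModel(
            modelName = "gemini-2.5-flash-preview-04-17",
            generationConfig = generationConfig {
                responseMimeType = "application/json"
                responseSchema = jsonSchema
            },
        )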

The other option in the app is to start with a text prompt. You can type in details about your accessories, hairstyle, and clothing, and let Imagen be a bit more creative.
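
In that flow the Gemini captioning step is simply skipped and your text plays the role of the description. As a rough sketch of that idea (the types and names below are our own illustration, not the sample app’s API):

import android.graphics.Bitmap

// Hypothetical input model for the two entry points: a photo or a typed description.
sealed interface BotSource {
    data class FromPhoto(val bitmap: Bitmap) : BotSource
    data class FromText(val description: String) : BotSource
}

// For the text path, the user's own words stand in for the Gemini-generated caption
// and feed the same Imagen prompt shown in the next section.
fun descriptionFor(source: BotSource, geminiCaption: String?): String = when (source) {
    is BotSource.FromPhoto -> requireNotNull(geminiCaption) { "Photo path needs a caption" }
    is BotSource.FromText -> source.description
}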

Android generation via Imagen

We’ll use this detailed description of your image to enrich the prompt used for image generation. We’ll add extra details around what we’d like to generate and include the bot color selection as part of this too, along with the skin tone selected by the user.

val imagenPrompt = "A 3D rendered cartoonish Android mascot in a photorealistic style, the pose is relaxed and simple, facing directly forward (...) The bot looks as follows $userDescription (...)"

We then call the Imagen model to create the bot. Using this new prompt, we create a model and call generateImages:

// we supply our own fine-tuned model here but you can use "imagen-3.0-generate-002"
val generativeModel = Firebase.ai(backend = GenerativeBackend.googleAI()).imagenModel(
    "imagen-3.0-generate-002",
    safetySettings =
        ImagenSafetySettings(
            ImagenSafetyFilterLevel.BLOCK_LOW_AND_ABOVE,
            personFilterLevel = ImagenPersonFilterLevel.ALLOW_ALL,
        ),
)

val response = generativeModel.generateImages(imagenPrompt)

val image = response.images.first().asBitmap()

And that’s it! The Imagen model generates a bitmap that we can display on the user’s screen.
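
Displaying the result is then ordinary UI work. As a minimal Jetpack Compose sketch (the composable below is our own illustration, not the sample app’s exact code):

import android.graphics.Bitmap
import androidx.compose.foundation.Image
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier
import androidx.compose.ui.graphics.asImageBitmap

// Renders the Imagen-generated bot; botBitmap is the result of the generateImages() call above.
@Composable
fun BotResult(botBitmap: Bitmap, modifier: Modifier = Modifier) {
    Image(
        bitmap = botBitmap.asImageBitmap(),
        contentDescription = "Generated Android bot",
        modifier = modifier,
    )
}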

Finetuning the Imagen model

The Imagen 3 model was finetuned using Low-Rank Adaptation (LoRA). LoRA is a fine-tuning technique designed to reduce the computational burden of training large models: instead of updating the entire model, LoRA adds smaller, trainable “adapters” that make small adjustments to the model’s behavior. We ran a fine-tuning pipeline on the generally available Imagen 3 model with Android bot assets of different color combinations and different assets for enhanced cuteness and fun. We generated text captions for the training images, and the image-text pairs were used to finetune the model effectively.

The current sample app uses a standard Imagen model, so the results may look a bit different from the visuals in this post. However, the app using the fine-tuned model and a custom version of the Firebase AI Logic SDK was demoed at Google I/O. This app will be released later this year, and we are also planning on adding support for fine-tuned models to the Firebase AI Logic SDK later in the year.

moving image of Androidify app demo turning a selfie image of a bearded man wearing a black tshirt and sunglasses, with a blue back pack into a green 3D bearded droid wearing a black tshirt and sunglasses with a blue backpack

The original image… and Androidifi-ed image

ML Kit

The app also uses the ML Kit Pose Detection SDK to detect a person in the camera view, which triggers the capture button and adds visual indicators.

To do this, we add the SDK to the app and use PoseDetection.getClient(). Then, using the poseDetector, we look at the detectedLandmarks that are in the streaming image coming from the camera, and we set the _uiState.detectedPose to true if a nose and shoulders are visible:

private suspend fun runPoseDetection() {
    PoseDetection.getClient(
        PoseDetectorOptions.Builder()
            .setDetectorMode(PoseDetectorOptions.STREAM_MODE)
            .build(),
    ).use { poseDetector ->
        // Since image analysis is processed by ML Kit asynchronously in its own thread pool,
        // we can run this directly from the calling coroutine scope instead of pushing this
        // work to a background dispatcher.
        cameraImageAnalysisUseCase.analyze { imageProxy ->
            imageProxy.image?.let { image ->
                val poseDetected = poseDetector.detectPersonInFrame(image, imageProxy.imageInfo)
                _uiState.update { it.copy(detectedPose = poseDetected) }
            }
        }
    }
}

private suspend fun PoseDetector.detectPersonInFrame(
    image: Image,
    imageInfo: ImageInfo,
): Boolean {
    val results = process(InputImage.fromMediaImage(image, imageInfo.rotationDegrees)).await()
    val landmarkResults = results.allPoseLandmarks
    val detectedLandmarks = mutableListOf<Int>()
    for (landmark in landmarkResults) {
        if (landmark.inFrameLikelihood > 0.7) {
            detectedLandmarks.add(landmark.landmarkType)
        }
    }

    return detectedLandmarks.containsAll(
        listOf(PoseLandmark.NOSE, PoseLandmark.LEFT_SHOULDER, PoseLandmark.RIGHT_SHOULDER),
    )
}

moving image showing the camera shutter button activating when an orange droid figurine is held in the camera frame

The camera shutter button is activated when a person (or a bot!) enters the frame.
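
On the UI side, the detectedPose value can then gate whether the shutter button is enabled. A short Compose sketch of that idea (the composable name and parameters are our own, not the sample app’s):

import androidx.compose.material3.Button
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable

// Enabled only while ML Kit reports a nose and both shoulders in the frame.
@Composable
fun CaptureButton(detectedPose: Boolean, onCapture: () -> Unit) {
    Button(onClick = onCapture, enabled = detectedPose) {
        Text("Capture")
    }
}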

Get started with AI on Android

The Androidify app makes extensive use of Gemini 2.5 Flash to validate the image and generate a detailed description used to generate the image. It also leverages the specifically fine-tuned Imagen 3 model to generate images of Android bots. Gemini and Imagen models are easily integrated into the app via the Firebase AI Logic SDK. In addition, the ML Kit Pose Detection SDK controls the capture button, enabling it only when a person is present in front of the camera.

To get started with AI on Android, visit the Gemini and Imagen documentation for Android.

Explore this announcement and all Google I/O 2025 updates on io.google starting May 22.
