UNC3944, which overlaps with public reporting on Scattered Spider, is a financially motivated threat actor characterized by its persistent use of social engineering and brazen communications with victims. In early operations, UNC3944 largely targeted telecommunications-related organizations to support SIM swap operations. However, after shifting to ransomware and data theft extortion in early 2023, they impacted organizations in a broader range of industries. Since then, we have regularly observed UNC3944 conduct waves of targeting against a specific sector, such as financial services organizations in late 2023 and food services in May 2024. Notably, UNC3944 has also previously targeted prominent brands, possibly in an attempt to gain prestige and increased attention from news media.
Google Threat Intelligence Group (GTIG) observed a decline in UNC3944 activity after 2024 law enforcement actions against individuals allegedly associated with the group. Threat actors will often temporarily halt or significantly curtail operations after an arrest, possibly to reduce law enforcement attention, rebuild capabilities and/or partnerships, or shift to new tooling to evade detection. UNC3944’s existing ties to a broader community of threat actors could potentially help them recover from law enforcement actions more quickly.
Recent public reporting has suggested that threat actors used tactics consistent with Scattered Spider to target a UK retail organization and deploy DragonForce ransomware. Subsequent reporting by BBC News indicates that actors associated with DragonForce claimed responsibility for attempted attacks at multiple UK retailers. Notably, the operators of DragonForce ransomware recently claimed control of RansomHub, a ransomware-as-a-service (RaaS) that seemingly ceased operations in March of this year. UNC3944 was a RansomHub affiliate in 2024, after the ALPHV (aka Blackcat) RaaS shut down. While GTIG has not independently confirmed the involvement of UNC3944 or the DragonForce RaaS, over the past few years, retail organizations have been increasingly posted on tracked data leak sites (DLS) used by extortion actors to pressure victims and/or leak stolen victim data. Retail organizations accounted for 11 percent of DLS victims in 2025 thus far, up from about 8.5 percent in 2024 and 6 percent in 2022 and 2023. It is plausible that threat actors including UNC3944 view retail organizations as attractive targets, given that they typically possess large quantities of personally identifiable information (PII) and financial data. Further, these companies may be more likely to pay a ransom demand if a ransomware attack impacts their ability to process financial transactions.
UNC3944 global targeting map
We have observed the following patterns in UNC3944 victimology:
Targeted Sectors: The group targets a wide range of sectors, with a notable focus on Technology, Telecommunications, Financial Services, Business Process Outsourcing (BPO), Gaming, Hospitality, Retail, and Media & Entertainment organizations.
Geographical Focus: Targets are primarily located in English-speaking countries, including the United States, Canada, the United Kingdom, and Australia. More recent campaigns have also included targets in Singapore and India.
Victim Organization Size: UNC3944 often targets large enterprise organizations, likely due to the potential for higher impact and ransom demands. They specifically target organizations with large help desk and outsourced IT functions which are susceptible to their social engineering tactics.
A high-level overview of UNC3944 tactics, techniques, and procedures (TTPs) is provided in the following figure.
UNC3944 attack lifecycle
The following provides prioritized recommendations to protect against tactics utilized by UNC3944, organized within the pillars of:
Identity
Endpoints
Applications and Resources
Network Infrastructure
Monitoring / Detections
While implementing the full suite of the recommendations in this guide will generally have some impact on IT and normal operations, Mandiant’s extensive experience supporting organizations to defend against, contain, and eradicate UNC3944 has shown that an effective starting point involves prioritizing specific areas. Organizations should begin by focusing on recommendations that:
Achieve complete visibility across all infrastructure, identity, and critical management services.
Ensure the segregation of identities throughout the infrastructure.
Enhance strong authentication criteria.
Enforce rigorous identity controls for password resets and multi-factor authentication (MFA) registration.
Educate and communicate the importance of remaining vigilant against modern-day social engineering attacks and campaigns (see the Social Engineering Awareness section later in this post). UNC3944 campaigns not only target end users, but also IT and administrative personnel within enterprise environments.
These serve as critical foundational measures upon which other recommendations in this guide can be built.
Google SecOps customers benefit from existing protections that actively detect and alert on UNC3944 activity.
UNC3944 has proven to be very prolific in using social engineering techniques to impersonate users when contacting the help desk. Therefore, further securing the “positive identity” process is critical.
Train help desk personnel to positively identify employees before modifying / providing security information (including initial enrollment). At a minimum, this process should be required for any privileged accounts and should include methods such as:
On-Camera / In-Person verification
ID Verification
Challenge / Response questions
If a suspected compromise is imminent or has occurred, temporarily disable or enhance validation for self-service password reset methods. Any account management activities should require positive identity verification as the first step, and employees should be required to authenticate using strong authentication PRIOR to changing authentication methods (e.g., adding a new MFA device). Additionally, implement use of:
Trusted Locations
Notification of authentication / security changes
Out-of-band verification for high-risk changes. For example, require a call-back to a registered number or confirmation via a known corporate email before proceeding with any sensitive request.
Avoid reliance on publicly available personal data for verification (e.g., DOB, last 4 SSN) as UNC3944 often possesses this information. Use internal-only knowledge or real-time presence verification when possible.
Temporarily disable self-service MFA resets during elevated threat periods, and route all such changes through manual help desk workflows with enhanced scrutiny.
To protect against social engineering or other methods used to bypass authentication controls:
Remove SMS, phone call, and/or email as authentication controls.
Utilize an authenticator app that supports phishing-resistant MFA features (e.g., number matching and/or geo-verification).
If possible, transition to passwordless authentication.
Leverage FIDO2 security keys for authenticating identities that are assigned privileged roles.
Ensure administrative users cannot register or use legacy MFA methods, even if those are permitted for lower-tier users.
Enforce multi-context criteria to enrich the authentication transaction. Examples include not only validating the identity, but also specific device and location attributes as part of the authentication transaction.
For organizations that leverage Google Workspace, these concepts can be enforced by using context-aware access policies.
For organizations that leverage Microsoft Entra ID, these concepts can be enforced by using a Conditional Access Policy.
To prevent compromised credentials from being used to modify or register an attacker-controlled MFA method:
Review authentication methods available for user registration and disallow any unnecessary or duplicative methods.
Restrict MFA registration and modification actions to only be permissible from trusted IP locations and based upon device compliance. For organizations that leverage Microsoft Entra ID, this can be accomplished using a Conditional Access Policy.
If a suspected compromise has occurred, MFA re-registration may be required. This action should only be permissible from corporate locations and/or trusted IP locations.
Review specific IP locations that can bypass the requirement for MFA. If using Microsoft Entra ID, these can be configured in Named Locations and the legacy Service Settings.
Investigate and alert when the same MFA method or phone number is registered across multiple user accounts, which may indicate attacker-controlled device registration (a minimal detection sketch follows).
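The following is a simplified Python example of this detection logic. It assumes MFA registration events have already been exported from the identity provider's audit logs; the field names are illustrative rather than tied to a specific product schema.

from collections import defaultdict

# Hypothetical export of MFA registration events (e.g., from a SIEM).
# Field names are illustrative and will differ per identity provider.
events = [
    {"user": "alice@example.com", "method": "phone", "target": "+1-555-0100"},
    {"user": "bob@example.com", "method": "phone", "target": "+1-555-0100"},
    {"user": "carol@example.com", "method": "authenticator", "target": "device-8f2c"},
]

# Group registrations by (method, registered value) and flag any value
# that appears on more than one account.
registrations = defaultdict(set)
for event in events:
    registrations[(event["method"], event["target"])].add(event["user"])

for (method, target), users in registrations.items():
    if len(users) > 1:
        print(f"ALERT: {method} '{target}' registered to multiple users: {sorted(users)}")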
To protect against privilege escalation and further access to an environment:
For privileged access, decouple the organization's identity store (e.g., Active Directory) from infrastructure platforms, services, and cloud admin consoles. Organizations should create local administrator accounts (e.g., a local VMware vCenter administrator account). Local administrator accounts should adhere to the following principles:
Created with long and complex passwords
Passwords should not be stored, even temporarily, within the organization’s password management or vault solution
Enforcement of Multi-Factor Authentication (MFA)
Restrict administrative portals to only be accessible from trusted locations and with privileged identities.
Leverage just-in-time controls for checking out credentials associated with privileged actions.
Enforce access restrictions and boundaries that follow the principle of least-privilege for accessing and administering cloud resources.
For organizations that leverage Google Cloud, these concepts can be enforced by using IAM deny policies or principal access boundary policies.
For organizations that leverage Microsoft Entra ID, these concepts can be enforced by using Azure RBAC and Entra ID RBAC controls.
Enforce that privileged accounts are hardened to prevent exposure or usage on endpoints that are not Tier 0 assets or privileged access workstations (PAWs).
Modern-day authentication is predicated on more than just a singular password. Therefore, organizations should ensure that processes and associated playbooks include steps to:
Revoke tokens and access keys.
Review MFA device registrations.
Review changes to authentication requirements.
Review newly enrolled devices and endpoints.
An authentication transaction should not only include strong requirements for identity verification, but also require that the device be authenticated and validated. Organizations should consider the ability to:
Enforce posture checks for devices remotely connecting to an environment (e.g., via a VPN). Example posture checks for devices include:
Validating the installation of a required host-based certificate on each endpoint.
Verifying that the endpoint operates on an approved Operating System (OS) and meets version requirements.
Confirming the organization's Endpoint Detection and Response (EDR) agent is installed and actively running. Enforce EDR installation and monitoring for all managed endpoint devices.
To prevent threat actors from leveraging rogue endpoints to access an environment, organizations should:
Monitor for rogue bastion hosts or virtual machines that are either newly created or recently joined to a managed domain.
Harden policies to restrict the ability to join devices to Entra or on-premises Active Directory.
Review authentication logs for devices that contain default Windows host names.
To prevent lateral movement using compromised credentials, organizations should:
Limit the ability for local accounts to be used for remote (network-based) authentication.
Disable or restrict local administrative and/or hidden shares from being remotely accessible.
Enforce local firewall rules to block inbound SMB, RDP, WinRM, PowerShell remoting, and WMI.
For domain-based privileged and service accounts, where possible, organizations should restrict the ability for accounts to be leveraged for remote authentication to endpoints. This can be accomplished using a Group Policy Object (GPO) configuration for the following user rights assignments:
Deny log on locally
Deny log on through Remote Desktop Services
Deny access to this computer from network
Deny log on as a batch job
Deny log on as a service
Threat actors may attempt to change or disable VPN agents to limit network visibility by security teams. Therefore, organizations should:
Disable the ability for end users to modify VPN agent configurations.
Ensure appropriate logging when configuration changes are made to VPN agents.
For managed devices, consider an “Always-On” VPN configuration to ensure continuous protection.
To prevent threat actors from gaining access to privileged access management (PAM) systems, organizations should:
Isolate and enforce network and identity access restrictions for enterprise password managers or privileged access management (PAM) systems. This should also include leveraging dedicated and segmented servers / appliances for PAM systems, which are isolated from enterprise infrastructure and virtualization platforms.
Reduce the scope of accounts that have access to PAM systems, in addition to requiring strong authentication (MFA).
Enforce role-based access controls (RBAC) within PAM systems, restricting the scope of accounts that can be accessed (based upon an assigned role).
Follow the principle of just-in-time (JIT) access for checking-out credentials stored in PAM systems.
To prevent threat actors from gaining access to virtualization infrastructure, organizations should:
Isolate and restrict access to ESXi hosts / vCenter Server Appliances.
Ensure that backups of virtual machines are isolated, secured and immutable if possible.
Unbind the authentication for administrative access to virtualization platforms from the centralized identity provider (IdP). This includes individual ESXi hosts and vCenter Servers.
Proactively rotate local root / administrative passwords for privileged identities associated with virtualization platforms.
If possible, use stronger MFA bound to local SSO for all administrative access to virtualization infrastructure.
Enforce randomized passwords for local root / administrative identities correlating to each virtualized host that is part of an aggregate pool (see the sketch after this list).
Disable / restrict SSH (shell) access to virtualization platforms.
Enable lockdown mode on all ESXi hosts.
Enhance monitoring to identify potential malicious / suspicious authentication attempts and activities associated with virtualization platforms.
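As a minimal sketch of the randomized-password principle above, the following Python snippet generates a unique, cryptographically strong credential per host. The host list and the rotation helper are hypothetical; the actual rotation mechanism depends on the virtualization platform.

import secrets
import string

# Hypothetical host inventory; in practice, pull this from your CMDB or
# virtualization management API.
esxi_hosts = ["esxi-01.corp.local", "esxi-02.corp.local", "esxi-03.corp.local"]

alphabet = string.ascii_letters + string.digits + "!@#$%^&*"

def random_password(length: int = 24) -> str:
    # Use secrets (not random) for cryptographically strong values.
    return "".join(secrets.choice(alphabet) for _ in range(length))

for host in esxi_hosts:
    password = random_password()
    # rotate_root_password(host, password)  # hypothetical, platform-specific helper
    print(f"{host}: new unique root credential generated (length={len(password)})")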
To prevent threat actors from gaining access to backup infrastructure and data, organizations should:
Leverage unique and separate (non-identity provider integrated) credentials for accessing and managing backup infrastructure, in addition to the enforcement of MFA for the accounts.
Ensure that backup servers are isolated from the production environment and reside within a dedicated network. To further protect backups, they should be within an immutable backup solution.
Implement access controls that restrict inbound traffic and protocols for accessing administrative interfaces associated with backup infrastructure.
Periodically validate the protection and integrity of backups by simulating adversarial behaviors (red teaming).
To prevent threat actors from weaponizing endpoint security and management technologies such as EDR and patch management tools, organizations should:
Segment administrative access to endpoint security tooling platforms.
Reduce the scope of identities that have the ability to create, edit, or delete Group Policy Objects (GPOs) in on-premises Active Directory.
If Intune is leveraged, enforce Intune access policies that require multi-admin approval (MAA) to approve and enforce changes.
Monitor and review unauthorized access to EDR and patch management technologies.
Monitor script and application deployment on endpoints and systems using EDR and patch management technologies.
Review and monitor “allow-listed” executables, processes, paths, and applications.
Inventory installed applications on endpoints and review for potential unauthorized installations of remote access tools (RATs) and reconnaissance tools.
To prevent threat actors from leveraging access to cloud infrastructure for additional persistence and access, organizations should:
Monitor and review cloud resource configurations to identify and investigate newly created resources, exposed services, or other unauthorized configurations.
Monitor cloud infrastructure for newly created or modified network security group (NSG) rules, firewall rules, or publicly exposed resources that can be remotely accessed.
Monitor for the creation of programmatic keys and credentials (e.g., access keys).
To proactively identify exposed applications and ingress pathways, and to reduce the risk of unauthorized access, organizations should:
Leverage vulnerability scanning to perform an external unauthenticated scan to identify publicly exposed domains, IPs, and CIDR IP ranges.
Enforce strong authentication (e.g., phishing-resistant MFA) for accessing any applications and services that are publicly accessible.
For sensitive data and applications, enforce connectivity to cloud environments / SaaS applications to only be permissible from specific (trusted) IP ranges.
Block Tor exit node and VPS IP ranges (a sketch for automating retrieval of the Tor exit list follows).
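As one example of automating the Tor portion of this control, the following Python sketch retrieves the Tor Project's published exit node list. The URL is current as of this writing and should be verified; pushing the list into a firewall or WAF deny list is environment-specific and left as a stub.

import urllib.request

# The Tor Project publishes a plain-text list of exit node IPs.
TOR_EXIT_LIST_URL = "https://check.torproject.org/torbulkexitlist"

with urllib.request.urlopen(TOR_EXIT_LIST_URL, timeout=30) as response:
    exit_nodes = {line.strip() for line in response.read().decode().splitlines() if line.strip()}

print(f"Fetched {len(exit_nodes)} Tor exit node IPs")
# push_to_deny_list(exit_nodes)  # hypothetical, environment-specific step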
The term “Trusted Service Infrastructure” (TSI) typically refers to the management interfaces for platforms and technologies that provide core services for an organization. Examples include:
Asset and Patch Management Tools
Network Management Tools and Devices
Virtualization Platforms
Backup Technologies
Security Tooling
Privileged Access Management Systems
To minimize the direct access and exposure of the management plane for TSI, organizations should:
Restrict access to TSI to only originate from internal / hardened network segments or PAWs.
Create detections focused on monitoring network traffic patterns for directly accessing TSI, and alert on anomalies or suspicious traffic.
To restrict command-and-control channels and reduce the potential for mass data exfiltration, organizations should:
Restrict egress communications from all servers. Organizations should prioritize enforcing egress restrictions from servers associated with TSI, Active Directory domain controllers, and crown jewel application and data servers.
Block outbound traffic to malicious domain names, IP addresses, and domain names/addresses associated with remote access tools (RATs).
Upon initial compromise, UNC3944 is known to search for documentation on topics such as: user provisioning, MFA and/or device registration, network diagrams, and shared credentials in documents or spreadsheets.
UNC3944 will also use network reconnaissance tools like ADRecon, ADExplorer, and SharpHound. Therefore, organizations should:
Ensure any sites or portals that include these documents have access restrictions to only required accounts.
Sweep for documents and spreadsheets that may contain shared credentials and remove them.
Implement alerting rules on endpoints with EDR agents for possible execution of known reconnaissance tools.
If utilizing an Identity monitoring solution, ensure detection rules are enabled and alerts are created for any reconnaissance and discovery detections.
Implement an automated mechanism to continuously monitor domain registrations, identifying domains that mimic the organization's naming conventions, for instance: [YourOrganizationName]-helpdesk.com or [YourOrganizationName]-SSO.com (a minimal resolution-check sketch follows).
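The following Python sketch illustrates one lightweight approach: generate common help-desk- and SSO-themed lookalike domains and check whether they resolve. DNS resolution is only a rough proxy for registration; production monitoring should also use registration feeds or certificate transparency logs. All names below are placeholders.

import socket

ORG = "yourorganizationname"  # placeholder; substitute your brand

# Common impersonation patterns observed in help-desk-themed phishing.
suffixes = ["-helpdesk", "-sso", "-servicedesk", "-vpn", "-okta"]
tlds = [".com", ".net", ".help"]

for suffix in suffixes:
    for tld in tlds:
        domain = f"{ORG}{suffix}{tld}"
        try:
            ip = socket.gethostbyname(domain)
            print(f"RESOLVES: {domain} -> {ip} (investigate)")
        except socket.gaierror:
            pass  # does not resolve; likely unregistered or parked without DNS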
To further harden the MFA registration process, organizations should:
Review logs to specifically identify events related to the registration or addition of new MFA devices or methods, including actions similar to:
MFA device registered
Authenticator app added
Phone number added for MFA
The same MFA device / method / phone number being associated with multiple users
Verify the legitimacy of new registrations against expected user behavior and any onboarding or device enrollment records.
Contact users if new registrations are detected to confirm if the activity is intentional.
To protect against social engineering and unauthorized access to or modification of communication platforms, organizations should:
Review organizational policies around communication tools such as Microsoft Teams.
Allow only trusted external domains for expected vendors and partners.
If external domains cannot be blocked, create a baseline of trusted domains and alert on new domains that attempt to contact employees.
Provide awareness training to employees and staff to directly contact the organization’s helpdesk if they receive suspicious calls or messages.
The following is a Microsoft Defender advanced hunting query example, written to detect when an external account (attempting to impersonate the help desk) contacts the organization’s users.
Note: The DisplayName field can be modified to include other relevant fields specific to the organization (such as “IT Support” or “ServiceDesk”).
CloudAppEvents
| where Application == "Microsoft Teams"
| where ActionType == "ChatCreated"
| extend HasForeignTenantUsers =
parse_json(RawEventData)["ParticipantInfo"]["HasForeignTenantUsers"]
| extend DisplayName = parse_json(RawEventData)["Members"][0]["DisplayName"]
| where IsExternalUser == 1 or HasForeignTenantUsers == 'true'
| where DisplayName contains "help" or AccountDisplayName contains "help"
or AccountId contains "help"
The following is a Google SecOps search query example.
Note: The DisplayName field can be modified to include other relevant fields specific to the organization (such as “IT Support” or “ServiceDesk”).
metadata.vendor_name = "Microsoft"
metadata.product_name = "Office 365"
metadata.product_event_type = "ChatCreated"
security_result.detection_fields["ParticipantInfo_HasForeignTenantUsers"] =
"true"
(
principal.user.userid = /help/ OR
principal.user.email_addresses = /help/ OR
about.user.user_display_name = /help/
)
Detections should include:
Authentication from infrequent locations - including from proxy and VPN service providers.
Attempts made to change authentication methods or criteria.
Monitoring and hunting for authentication anomalies consistent with social engineering tactics (a minimal sketch follows).
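The following is a minimal Python sketch of the infrequent-location detection above. It assumes sign-in records have already been enriched with a country and a proxy/VPN flag; both fields are illustrative.

from collections import defaultdict

# Hypothetical sign-in records enriched with geolocation and proxy data.
signins = [
    {"user": "alice", "country": "US", "is_proxy": False},
    {"user": "alice", "country": "US", "is_proxy": False},
    {"user": "alice", "country": "RO", "is_proxy": True},
]

seen_countries = defaultdict(set)
for signin in signins:
    user, country = signin["user"], signin["country"]
    new_location = bool(seen_countries[user]) and country not in seen_countries[user]
    if signin["is_proxy"] or new_location:
        print(f"ALERT: {user} authenticated from {country} (proxy={signin['is_proxy']})")
    seen_countries[user].add(country)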
UNC3944 has been known to modify requirements for the use of multi-factor authentication. Therefore, organizations should:
For Entra ID, monitor for modifications to any Trusted Named Locations that may be used to bypass the requirement for MFA.
For Entra ID, monitor for changes to Conditional Access Policies that enforce MFA, specifically focusing on exclusions of compromised user accounts and/or devices for an associated policy.
Ensure the SOC has visibility into token replay or suspicious device logins, aligning workflows that can trigger step-up (re)authentication when suspicious activity is detected.
For organizations that are using Microsoft Entra ID, monitor for possible abuse of Entra ID Identity Federation:
Check domain names that are registered in the Entra ID tenant, paying particular attention to domains that are marked as Federated.
Review the Federation configuration of these domains to ensure that they are correct.
Monitor for the creation of any new domains within the tenant and for changes to a domain's authentication method to Federated (a polling sketch follows).
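As a sketch of that polling approach, the following Python snippet lists tenant domains via the Microsoft Graph domains endpoint and flags any marked as Federated. The token acquisition step is a placeholder for an OAuth flow (e.g., via MSAL) with the appropriate Domain.Read.All permission.

import requests

def acquire_token() -> str:
    # Placeholder: implement an OAuth client-credentials flow (e.g., via MSAL)
    # for an app registration granted Domain.Read.All.
    raise NotImplementedError

headers = {"Authorization": f"Bearer {acquire_token()}"}
response = requests.get("https://graph.microsoft.com/v1.0/domains", headers=headers)
response.raise_for_status()

for domain in response.json()["value"]:
    if domain.get("authenticationType") == "Federated":
        # Any unexpected federated domain warrants immediate review.
        print(f"FEDERATED: {domain['id']} -- verify this configuration is sanctioned")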
Abuse of Domain Federation requires the account accomplishing the changes to have administrative permissions in Entra ID. Hardening of all administrative accounts, portals, and programmatic access is imperative.
UNC3944 is extremely proficient at using multiple forms of social engineering to convince users to take actions that allow the group to gain access. Organizations should educate users to be aware of, and to notify internal security teams of, attempts that utilize the following tactics:
SMS phishing messages that claim to be from IT requesting users to download and install software on their machine. These may include claims that the user’s machine is out-of-compliance or is failing to report to internal management systems.
SMS messages or emails with links to sites that reference domain names that appear legitimate and reference SSO (single sign-on) and a variation of the company name. Messages may include text informing the user that they need to reset their password and/or MFA.
Phone calls to users purporting to be from IT with requests to reset a password and/or MFA, or requesting that the user provide a valid one-time passcode (OTP) from their device.
SMS messages or emails with requests to be granted access to a particular system, particularly if the organization already has an established method for provisioning access.
MFA fatigue attacks, where attackers may repeatedly send MFA push notifications to a victim’s device until the user unintentionally or out of frustration accepts one. Organizations should train users to reject unexpected MFA prompts and report such activity immediately.
Impersonation via collaboration tools - UNC3944 has used platforms like Microsoft Teams to pose as internal IT support or service desk personnel. Organizations should train users to verify unusual chat messages and avoid sharing credentials or MFA codes over internal collaboration tools like Microsoft Teams. Limiting external domains and monitoring for impersonation attempts (e.g., usernames containing ‘helpdesk’ or ‘support’) is advised.
In rare cases, attackers have used doxxing threats or aggressive language to scare users into compliance. Ensure employees understand this tactic and know that the organization will support them if they report these incidents.
Across industries, enterprises need efficient and proactive solutions. Imagine frontline professionals using voice commands and visual input to diagnose issues, access vital information, and initiate processes in real time. The Gemini 2.0 Flash Live API empowers developers to create next-generation, agentic industry applications.
This API extends these capabilities to complex industrial operations. Unlike solutions relying on single data types, it leverages multimodal data – audio, visual, and text – in a continuous livestream. This enables intelligent assistants that truly understand and respond to the diverse needs of industry professionals across sectors like manufacturing, healthcare, energy, and logistics.
In this post, we’ll walk you through a use case focused on industrial condition monitoring, specifically motor maintenance, powered by the Gemini 2.0 Flash Live API. The Live API enables low-latency, bidirectional voice and video interactions with Gemini. With this API, we can provide end users with natural, human-like voice conversations and the ability to interrupt the model's responses using voice commands. The model can process text, audio, and video input, and it can provide text and audio output. Our use case highlights the API's advantages over conventional AI and its potential for strategic collaborations.
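As a minimal, text-only sketch of a Live API session using the google-genai Python SDK (the model ID and configuration are illustrative; consult the current Live API documentation):

import asyncio
from google import genai  # pip install google-genai

client = genai.Client()  # reads the API key from the environment

MODEL = "gemini-2.0-flash-live-001"  # illustrative model ID
CONFIG = {"response_modalities": ["TEXT"]}

async def main():
    # The Live API is session-based: open a bidirectional connection,
    # send a turn, then stream the model's response as it is generated.
    async with client.aio.live.connect(model=MODEL, config=CONFIG) as session:
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": "Describe this motor's specs."}]},
            turn_complete=True,
        )
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())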
The demonstration features a live, bi-directional multimodal streaming backend driven by Gemini 2.0 Flash Live API, capable of real-time audio and visual processing, enabling advanced reasoning and life-like conversations. Utilizing the API's agentic and function calling capabilities alongside Google Cloud services allows for building powerful live multimodal systems with a clean, mobile-optimized user interface for factory floor operators. The demonstration uses a motor with a visible defect as a real-world anchor.
Here’s a summarized demo flow on a smartphone:
Real-time visual identification: Pointing the camera at a motor, Gemini identifies the model and instantly summarizes relevant information from its manual, providing quick access to crucial equipment details.
Real-time visual defect identification: With a voice command like "Inspect this motor for visual defects," Gemini analyzes the live video, identifies and localizes the defect, and explains its reasoning.
Streamlined repair initiation: Upon identifying defects, the system automatically prepares and sends an email with the highlighted defect image and part information, directly initiating the repair process.
Real-time audio defect identification: Analyzing pre-recorded audio of healthy and defective motors, Gemini accurately distinguishes the faulty one based on its sound profile and explains its analysis.
Multimodal QA on operations: Operators can ask complex questions about the motor while pointing the camera at specific components. Gemini intelligently combines visual context with information from the motor manual to provide accurate voice-based answers.
The demonstration leverages the Gemini Multimodal Livestreaming API on Google Cloud Vertex AI. The API manages the core workflow and agentic function calling, while the regular Gemini API handles visual and audio feature extraction.
The workflow involves:
Agentic function calling: The API interprets user voice and visual input to determine the desired action.
Audio defect detection: Upon user intent, the system records motor sounds, stores them in GCS, and triggers a function that uses a prompt with examples of healthy and defective sounds, analyzed by the Gemini 2.0 Flash API to diagnose the motor's health.
Visual inspection: The API recognizes the intent to detect visual defects, captures images, and calls a function that uses zero-shot detection with a text prompt, leveraging the spatial understanding of the Gemini 2.0 Flash API to identify and highlight defects.
Multimodal QA: When users ask questions, the API identifies the intent for information retrieval, performs RAG on the motor manual, combines it with multimodal context, and uses the Gemini API to provide accurate answers.
Sending repair orders: Recognizing the intent to initiate a repair, the API extracts the part number and defect image, using a pre-defined template to automatically send a repair order via email (a sketch of such a tool declaration follows this list).
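As a sketch of how such a tool might be declared for the Live API (the function and parameter names here are illustrative, not taken from the demo's actual code):

# Hypothetical declaration for the repair-order step.
send_repair_order = {
    "name": "send_repair_order",
    "description": "Email a repair order with the defect image and part information.",
    "parameters": {
        "type": "object",
        "properties": {
            "part_number": {"type": "string", "description": "Motor part number"},
            "defect_summary": {"type": "string", "description": "Short defect description"},
        },
        "required": ["part_number"],
    },
}

# Passed in the Live API session config so the model can invoke the tool
# when it detects repair intent in the operator's voice request.
CONFIG = {
    "response_modalities": ["AUDIO"],
    "tools": [{"function_declarations": [send_repair_order]}],
}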
Such a demo can be easily built with minimal custom integration by referring to the guide here and incorporating the features mentioned in the diagram above. The majority of the effort would be in adding custom function calls for various use cases.
This demonstration highlights the Gemini Multimodal Livestreaming API's key capabilities and their transformative industrial benefits:
Real-time multimodal processing: The API's ability to simultaneously process live audio and visual streams provides immediate insights in dynamic environments, crucial for preventing downtime and ensuring operational continuity.
Use case: In healthcare, a remote medical assistant could use live video and audio to guide a field paramedic, receiving real-time vital signs and visual information to provide expert support during emergencies.
Advanced audio & visual reasoning: Gemini's sophisticated reasoning interprets complex visual scenes and subtle auditory cues for accurate diagnostics.
Use Case: In manufacturing, AI can analyze the sounds and visuals of machinery to predict failures before they occur, minimizing production disruptions.
Agentic function calling for automated workflows: The API's agentic nature enables intelligent assistants to proactively trigger actions, like generating reports or initiating processes, streamlining workflows.
Use case: In logistics, a voice command and visual confirmation of a damaged package could automatically trigger a claim process and notify relevant parties.
Seamless integration and scalability: Built on Vertex AI, the API integrates with other Google Cloud services, ensuring scalability and reliability for large-scale deployments.
Use case: In agriculture, drones equipped with cameras and microphones could stream live data to the API for real-time analysis of crop health and pest detection across vast farmlands.
Mobile-optimized user experience: The mobile-first design ensures accessibility for frontline workers, allowing interaction with the AI assistant at the point of need using familiar devices.
Use case: In retail, store associates could use voice and image recognition to quickly check inventory, locate products, or access product information for customers directly on the store floor.
Proactive maintenance and efficiency gains: By enabling real-time condition monitoring, industries can shift from reactive to predictive maintenance, reducing downtime, optimizing asset utilization, and improving overall efficiency across sectors.
Use case: In the energy sector, field technicians can use the API to diagnose issues with remote equipment like wind turbines through live audio and visual streams, reducing the need for costly and time-consuming site visits.
Explore the cutting edge of AI interaction with the Gemini Live API, as showcased by this solution. Developers can leverage its codebase – featuring low-latency voice, webcam/screen integration, interruptible streaming audio, and a modular tool system via Cloud Functions – as a robust starting point. Clone the project, adapt the components, and begin creating transformative, multimodal AI solutions that feel truly conversational and aware. The future of the intelligent industry is live, multimodal, and within reach for all sectors.
For AI developers building cutting-edge applications with large model sizes, a reliable foundation is non-negotiable. You need your AI to perform consistently, delivering results without hiccups, even under pressure. This means having dedicated resources that won't get bogged down by other users' activity. While existing Vertex AI Prediction Endpoints – managed pools of resources to deploy AI models for online inference – provide a capable serving solution, developers need better ways to reach consistent performance and resource isolation in case of shared resource contention.
Today, we are pleased to announce Vertex AI Prediction Dedicated Endpoints, a new family of Vertex AI Prediction endpoints designed to address the needs of modern AI applications, including those related to large-scale generative AI models.
Serving generative AI and other large-scale models introduces unique challenges related to payload size, inference time, interactivity, and performance demands. The new Vertex AI Prediction Dedicated Endpoints have been specifically engineered to help you build more reliably with the following new integrated features:
Native support for streaming inference: Essential for interactive applications like chatbots or real-time content generation, Vertex AI Endpoints now provide native support for streaming, simplifying development and architecture, via the following APIs:
streamRawPredict: Utilize this dedicated API method for bidirectional streaming to send prompts and receive sequences of responses (e.g., tokens) as they become available.
OpenAI Chat Completion: To facilitate interoperability and ease migration, endpoints serving compatible models can optionally expose an interface conforming to the widely used OpenAI Chat Completion streaming API standard (see the sketch after this list).
gRPC protocol support: For latency-sensitive applications or high-throughput scenarios often encountered with large models, endpoints now natively support gRPC. Leveraging HTTP/2 and Protocol Buffers, gRPC can offer performance advantages over standard REST/HTTP.
Customizable request timeouts: Large models can have significantly longer inference times. We now provide the flexibility, via API, to configure custom timeouts for prediction requests, accommodating a wider range of model processing durations beyond the default settings.
Optimized resource handling: The underlying infrastructure is designed to better handle the resource demands (CPU/GPU, memory, network bandwidth) of large models, contributing to the overall stability and performance, especially when paired with Private Endpoints.
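As a sketch of the OpenAI-compatible streaming interface mentioned above, using the openai Python package pointed at a dedicated endpoint (the project, region, endpoint ID, and base URL shape are placeholders; confirm the exact URL format in the Vertex AI documentation):

import google.auth
import google.auth.transport.requests
from openai import OpenAI  # pip install openai

# Authenticate with a Google Cloud access token.
credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

PROJECT, REGION, ENDPOINT_ID = "my-project", "us-central1", "1234567890"  # placeholders
client = OpenAI(
    base_url=(f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT}"
              f"/locations/{REGION}/endpoints/{ENDPOINT_ID}"),
    api_key=credentials.token,
)

# Stream tokens as they arrive, as with any OpenAI-compatible chat API.
stream = client.chat.completions.create(
    model="",  # the deployed model is implied by the endpoint
    messages=[{"role": "user", "content": "Summarize dedicated endpoints."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")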
The newly integrated capabilities of Vertex AI Prediction Dedicated Endpoints offer a unified and robust serving solution tailored for demanding modern AI workloads. From today, Vertex AI Model Garden will use Vertex AI Prediction Dedicated Endpoints as the standard serving method for self-deployed models.
While public Dedicated Endpoints remain available for models accessible over the public internet, we are enhancing networking options on Dedicated Endpoints utilizing Google Cloud Private Service Connect (PSC). The new Dedicated Private Endpoints (via PSC) provide a secure and performance-optimized path for prediction requests. By leveraging PSC, traffic routes entirely within Google Cloud's network, offering significant benefits:
Enhanced security: Requests originate from within your Virtual Private Cloud (VPC) network, eliminating public internet exposure for the endpoint.
Improved performance consistency: Bypassing the public internet reduces latency variability.
Reduced performance interference: PSC facilitates better network traffic isolation, mitigating potential "noisy neighbor" effects and leading to more predictable performance, especially for demanding workloads.
For production workloads with strict security requirements and predictable latency, Private Endpoints using Private Service Connect are the recommended configuration.
Sojern is a marketing company focusing on the hospitality industry, matching potential customers to travel businesses around the globe. As part of their growth plans, Sojern turned to Vertex AI. Leaving their self-managed ML stack behind, Sojern can focus more on innovation, while scaling out far beyond their historical footprint.
Given the nature of Sojern’s business, their ML deployments follow a unique deployment model, requiring several high throughput endpoints to be available and agile at all times, allowing for constant model evolution. Using Public Endpoints would cause rate limiting and ultimately degrade user experience; moving to a Shared VPC model would have required a major design change for existing consumers of the models.
With Private Service Connect (PSC) and Dedicated Endpoint, Sojern avoided hitting the quotas / limits enforced on Public Endpoints, while also avoiding a network redesign to accommodate Shared VPC.
The ability to quickly promote tested models, take advantage of Dedicated Endpoint’s enhanced featureset, and improve latency for their customers strongly aligned with Sojern’s goals. The Sojern team continues to onboard new models, always improving accuracy and customer satisfaction, powered by Private Service Connect and Dedicated Endpoint.
Are you struggling to scale your prediction workloads on Vertex AI? Check out the resources below to start using the new Vertex AI Prediction Dedicated Endpoints:
Documentation
Github samples
Your experience and feedback are important as we continue to evolve Vertex AI. We encourage you to explore these new endpoint capabilities and share your insights through the Google Cloud community forum.
When’s the last time you watched a race for the braking?
It’s the heart-pounding acceleration and death-defying maneuvers that keep most motorsport fans on the edge of their seats. Especially when it comes to Formula E — and really all EVs — the explosive, near-instantaneous acceleration of an electric motor is part of the appeal.
A less considered, yet no less important, feature is how EVs can regeneratively brake, turning friction into fuel. Part of Formula E’s mission is to make EVs a compelling automotive choice for consumers, not just world-class racers; highlighting this powerful aspect of the vehicles has become a priority. The question remained: How do you get others to feel the same exhilaration from deceleration?
The answer came from the mountains above Monaco, as well as some prompts in Gemini 2.5.
In the lead up to the Monaco E-Prix, Formula E and Google undertook a project dubbed Mountain Recharge. The challenge: Whether a Formula E GENBETA race car, starting with only 1% battery, could regenerate enough energy from braking during a descent through France’s coastal Alps to then complete a full lap of the iconic Monaco circuit.
More than just a stunt, this experiment is testing the boundaries of technology — and not just in EVs, but on the cloud, too. Without the live analytics and plenty of AI-powered planning, the Mountain Recharge might not have come to pass. In fact, AI even helped determine which mountain pass would be best suited for this effort. (Read on to find out which one, and see if we made it to the bottom.)
Mountain Recharge is exciting not only for the thrills on the course but also for the potential it shows for AI across industries. In addition to its role in helping to execute tasks, AI proved valuable to the brainstorming, experimentation, and rapid-fire simulations that helped get Mountain Recharge to the finish line.
Before even setting foot or wheel to the course, the team at Formula E and Google Cloud turned to Gemini to try and figure out if such an endeavor was possible.
To answer the fundamental question of feasibility, the team entered a straightforward prompt into Google’s AI Studio: “Starting with just 1% battery, could the GENBETA car potentially generate enough recharge by descending a high mountain pass to do a lap of the Circuit of Monaco?”
The AI Studio validator, running Gemini 2.5 Pro with its deep reasoning functionality, analyzed first-party data that had been uploaded by Formula E on the GENBETA’s capabilities; we then grounded the model with Google Search to further improve accuracy and reliability by connecting to the universe of information available online.
AI Studio shared its “thinking” in a detailed eight-step process, which included identifying the key information needed; consulting the provided documents; gathering external information through a simulated search; performing calculations and analysis; and finally synthesizing the answer based on the core question.
The final output: “theoretically feasible.” In other words, the perfect challenge.
Navigating the steep turns above Monaco helped generate plenty of power for Mountain Recharge.
Still working in AI Studio, we then used a new feature, the ability to build custom apps such as the Maps Explorer, to determine the best route, which turned out to be the Col de Braus. AI Studio then mapped out a route for the challenge. This rigorous, data-backed validation, facilitated by AI Studio and Gemini's ability to incorporate technical specifications and estimations, transformed the project from a speculative what-if into something Formula E felt confident attempting.
AI played an important role away from the course, as well. To aid in coordination and planning, teams at Formula E and Google Cloud used NotebookLM to digest the technical regulations and battery specifications and locate relevant information within them, which, given the complexity of the challenge and the number of parties involved, helped ensure cross-functional teams were kept up to date and grounded with sourced data to help make informed decisions.
During the mountain descent, real-time monitoring of the car's progress and energy regeneration would be crucial. Firebase and BigQuery were instrumental in visualizing this real-time telemetry. Data from both multiple sensors and Google Maps was streamed to BigQuery, Google Cloud's data warehouse, from a high-performance mobile phone connected to the car (a Pixel 9 was well suited to the task).
This data stream proved to be yet another challenge to overcome, because of the patchy mobile signal in the mountainous terrain of the Maritime Alps. When data couldn’t be sent, it was cached locally on the phone until the signal was available again.
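A minimal Python sketch of that cache-and-retry pattern, using the BigQuery streaming insert API (the table ID is a placeholder):

from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()
TABLE_ID = "my-project.telemetry.car_metrics"  # placeholder

buffer = []  # telemetry rows cached locally while the signal is down

def on_sample(sample: dict):
    buffer.append(sample)
    try:
        # Streaming insert; returns a list of per-row errors on partial failure.
        errors = client.insert_rows_json(TABLE_ID, buffer)
        if not errors:
            buffer.clear()
    except Exception:
        # No connectivity: keep rows buffered and retry on the next sample,
        # mirroring the cache-until-signal-returns behavior described above.
        pass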
BigQuery's capacity for real-time data ingestion and in-platform AI model creation enabled speedy analysis and the calculation of essential metrics. A web-based dashboard was developed using Firebase that connected to BigQuery to display both data and insights. AI Studio greatly facilitated the development of the application by translating a picture of a dashboard mockup into fully functional code.
“From figuring out if our crazy Mountain Recharge idea was even possible, to giving us live insights during the descent, AI was our guide,” said Alex Aidan, Formula E’s VP of Marketing. “It’s what turned an ambitious ‘what if' into a reality we could track moment by moment.”
After completing its descent, the car had stored enough energy that it is expected to complete its lap of the Monaco circuit on Saturday, as part of the E-Prix’s pre-race festivities.
A different kind of push start.
Both the success and the development of the Mountain Recharge campaign offer valuable lessons to others pursuing ambitious projects. It shows that AI doesn’t have to be central to a project — it can be just as powerful at facilitating and optimizing something we’ve been doing for years, like racing cars. Our results in the Mountain Recharge only underscore the potential benefits of AI for a wide range of industries:
Enhanced planning and exploration: Just as Gemini helped Formula E explore unconventional ideas and identify the optimal route, businesses can leverage large language models for innovative problem-solving, market analysis, and strategic planning, uncovering unexpected angles and accelerating the journey from "what if" to "we can do that".
Streamlined project management: NotebookLM's ability to centralize and organize vast amounts of information demonstrates how AI can significantly improve efficiency in complex projects, from logistics and resource allocation to research and compliance. This reduces the risk of errors and ensures smoother coordination across teams.
Data-driven decision making: The real-time data analysis capabilities showcased in the Mountain Recharge underscore the power of cloud-based data platforms like BigQuery. Organizations can leverage these tools to gain immediate insights from their data, enabling them to make agile adjustments and optimize performance on the fly. This is invaluable in dynamic environments where rapid responses are critical.
Deeper understanding of complex systems: By applying AI to analyze intricate data streams, teams can gain a more profound understanding of the factors influencing performance.
Such capabilities certainly impressed James Rossiter, a former Formula E Team Principal, current test driver, and broadcaster for the series. "I was really surprised at the detail of the advice and things to consider,” Rossiter said. “We always talk about these things as a team, but as this is so different to racing, I had to totally rethink the drive."
The Formula E Mountain Recharge campaign is more than just an exciting piece of content; it's a testament to the power of human ingenuity amplified by intelligent technology. It’s also the latest collaboration between Formula E and Google Cloud and our shared commitment to use AI to push the boundaries of what’s possible in the sport and in the world.
We’ve already developed an AI-powered digital driving coach to help level the field for EV racing. Now, with the Mountain Recharge, we can inspire everyday drivers well beyond the track with the capabilities of electric vehicles.
It’s thinking big, even if it all starts with a simple prompt on a screen. You just have to ask the right questions, starting with the most important ones: Is this possible, and how can we make it so?
Want to know the latest from Google Cloud? Find it here in one handy location. Check back regularly for our newest updates, announcements, resources, events, learning opportunities, and more.
Tip: Not sure where to find what you’re looking for on the Google Cloud blog? Start here: Google Cloud blog 101: Full list of topics, links, and resources.
Iceland’s Magic: Reliving Solo Adventure through Gemini
Embark on a journey through Iceland's stunning landscapes, as experienced on Gauti's Icelandic solo trip. From majestic waterfalls to the enchanting Northern Lights, Gauti then takes these cherished memories a step further, using Google's multimodal AI, specifically Veo 2, to bring static photos to life. Discover how technology can enhance and dynamically relive travel experiences, turning precious moments into immersive short videos. This innovative approach showcases the power of AI in preserving and enriching our memories from Gauti's unforgettable Icelandic travels. Read more.
What’s new in Database Center
With general availability, Database Center now provides enhanced performance and health monitoring for all Google Cloud databases, including Cloud SQL, AlloyDB, Spanner, Bigtable, Memorystore, and Firestore. It delivers richer metrics and actionable recommendations, helps you optimize database performance and reliability, and lets you customize your experience. Database Center also leverages Gemini to deliver an assistive performance troubleshooting experience. Finally, you can track the weekly progress of your database inventory and health issues.
Get started with Database Center today
Protecting your APIs from OWASP’s top 10 security threats: We compare OWASP’s top 10 API security threats list to the security capabilities of Apigee. Here’s how we hold up.
Project Shield makes it easier to sign up, set up, automate DDoS protection: It’s now easier than ever for vulnerable organizations to apply to Project Shield, set up protection, and automate their defenses. Here’s how.
How Google Does It: Red teaming at Google scale: The best red teams are creative sparring partners for defenders, probing for weaknesses. Here’s how we do red teaming at Google scale.
AI Hypercomputer is a fully integrated supercomputing architecture for AI workloads – and it’s easier to use than you think. Check out this blog, where we break down four common use cases, including reference architectures and tutorials, representing just a few of the many ways you can use AI Hypercomputer today.
Join us for a new webinar, Smarter CX, Bigger Impact: Transforming Customer Experiences with Google AI, where we'll explore how Google AI can help you deliver exceptional customer experiences and drive business growth. You'll learn how to:
Transform Customer Experiences: With conversational AI agents that provide personalized customer engagements.
Improve Employee Productivity & Experience: With AI that monitors customer sentiment in real time and assists customer service representatives to raise customer satisfaction scores.
Deliver Value Faster: With 30+ data connectors and 70+ action connectors to the most commonly used CRMs and information systems.
Register here