#AmazonEmr
Posts tagged #AmazonEmr on Bluesky
Amazon S3 Storage Lens adds performance metrics, support for billions of prefixes, and export to S3 Tables

New capabilities help optimize application performance, analyze unlimited prefixes, and simplify metrics analysis through S3 Tables integration.


#AWS #AmazonAthena #AmazonCloudwatch #AmazonEmr #AmazonQuickSight #AmazonRedshift #AmazonS3Tables #AmazonSimpleStorageService(S3) #Analytics #Storage

Amazon EMR Serverless now supports AWS KMS customer managed keys for encrypting local disks

Amazon EMR Serverless (https://aws.amazon.com/emr/serverless/) now supports encrypting local disks with AWS Key Management Service (KMS) customer managed keys (CMKs). You can now meet strict regulatory and compliance requirements with encryption options beyond the default AWS-owned keys, giving you greater control over your encryption strategy.

Amazon EMR Serverless is a deployment option in Amazon EMR that makes it simple for data engineers and data scientists to run open-source big data analytics frameworks without configuring, managing, and scaling clusters or servers. Local disks on EMR Serverless workers are encrypted by default using AWS-owned keys. With this launch, customers with strict regulatory and compliance needs can encrypt local disks with CMKs from the same account or from another account. This integration works on new and existing EMR Serverless applications. You can specify the CMK at the application level, where it applies to all workloads submitted to the application, or for a specific job run or interactive session.

This feature is available on all supported EMR release versions and in all AWS Regions where Amazon EMR Serverless is available, including the AWS GovCloud (US) and China Regions. To learn more, see Local Disk Encryption with AWS KMS CMK (https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/disk-encryption-cmk.html) in the Amazon EMR Serverless User Guide.
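A rough sketch of where the key could be supplied, built as plain request dictionaries rather than live API calls. The key ARN, application name, and especially the field name carrying the disk-encryption key are assumptions for illustration; check the EMR Serverless API reference for the exact request shape.

```python
# Sketch: supplying a customer managed key (CMK) at the application level,
# where it applies to every workload submitted to the application.
# All ARNs/IDs are placeholders; the "diskEncryptionKeyArn" field name is
# an ASSUMPTION and must be verified against the EMR Serverless API docs.

CMK_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"  # placeholder

create_application_request = {
    "name": "etl-app",
    "releaseLabel": "emr-7.12.0",
    "type": "SPARK",
    "diskEncryptionKeyArn": CMK_ARN,  # assumed field name for the CMK
}

# Per-job-run override: the same key reference attached to a single run
# instead of the whole application (field placement again assumed).
start_job_run_override = {
    "applicationId": "00example",
    "diskEncryptionKeyArn": CMK_ARN,  # assumed field name
}
```

With boto3, these dictionaries would be passed to the `emr-serverless` client's `create_application` / `start_job_run` calls once the real field names are confirmed.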


#AWS #AmazonEmr


🆕 Amazon EMR Serverless now supports AWS KMS CMKs for local disk encryption, enhancing compliance and control over encryption strategy, available in all regions where EMR Serverless operates.

#AWS #AmazonEmr

Preview
Amazon EMR Serverless Adds Job-Run Level Controls for Greater Operational Precision -- AWSInsider

New capability gives data teams more granular visibility and management of individual analytics jobs without managing clusters.

Cleaner ops for serverless analytics.

Amazon Web Services (AWS) added job-run level controls to Amazon EMR Serverless, giving teams more granular visibility to troubleshoot runs, tune performance, and control costs without managing clusters.

Read more: https://ow.ly/VAUi50XW7BC

#AWS #AmazonEMR #EMRServerless

Amazon EMR Serverless adds support for job run level cost allocation

Amazon EMR Serverless (https://aws.amazon.com/emr/serverless/) now supports job run-level cost allocation, giving you better visibility into charges for individual job runs by letting you configure granular billing attribution at the individual job run level. You can filter and track costs in AWS Cost Explorer and in Cost and Usage Reports by specific job run IDs and by the cost allocation tags associated with job runs.

Amazon EMR Serverless is a deployment option in Amazon EMR that makes it simple for data engineers and data scientists to run open-source big data analytics frameworks without configuring, managing, and scaling clusters or servers. Previously, you could assign cost allocation tags only to EMR Serverless applications, limiting cost attribution to the application level. With job run-level cost allocation, you can now assign cost allocation tags to each job run, enabling fine-grained billing attribution. Job run-level tags also let you track costs by domain within a single application: for example, one application could run jobs for both finance and marketing, with costs tracked separately for each domain. Tracking costs for individual job runs makes it easier to benchmark each run and focus cost optimization efforts more precisely, giving deeper insight into resource utilization and spending patterns across different jobs and domains.

This feature is available in all AWS Regions where Amazon EMR Serverless is available, including the AWS GovCloud (US) and China Regions. To learn more, see Enabling Job Level Cost Allocation (https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/jobs-job-level-cost-allocation.html) in the Amazon EMR Serverless User Guide.
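A minimal sketch of the pattern: tag a job run at submission time, then aggregate exported cost rows by tag. The `StartJobRun` payload shape with a `tags` map follows the EMR Serverless API; the tag keys, bucket path, ARNs, and the sample cost rows are invented for illustration.

```python
# Sketch: cost allocation tags attached to an individual job run.
# IDs, ARNs, and the S3 path are placeholders.
start_job_run_request = {
    "applicationId": "00example",
    "executionRoleArn": "arn:aws:iam::111122223333:role/EmrServerlessJobRole",
    "jobDriver": {
        "sparkSubmit": {"entryPoint": "s3://my-bucket/jobs/daily_report.py"}
    },
    "tags": {"domain": "finance", "pipeline": "daily-report"},
}

# Illustrative Cost and Usage Report-style rows:
# (job run ID, value of the "domain" cost allocation tag, unblended USD cost).
rows = [
    ("jr-001", "finance", 1.40),
    ("jr-002", "marketing", 0.75),
    ("jr-003", "finance", 2.10),
]

# Aggregate spend per domain, the kind of per-tag rollup Cost Explorer
# filtering would give you.
cost_by_domain = {}
for _, domain, cost in rows:
    cost_by_domain[domain] = cost_by_domain.get(domain, 0.0) + cost
```

In practice the same rollup is done directly in Cost Explorer by activating the tag keys as cost allocation tags and filtering on them.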


#AWS #AmazonEmr


🆕 Amazon EMR Serverless now provides job run-level cost allocation for granular billing. This enhances tracking and cost optimization by domain, improving visibility on resource use and spending patterns. Available in all EMR Serverless regions.

#AWS #AmazonEmr

Amazon EMR Managed Scaling is now available in 7 additional AWS Regions

We are excited to announce that Amazon EMR Managed Scaling (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-scaling.html) is now available for EMR on EC2 customers in the Asia Pacific (Malaysia, New Zealand, Taipei, Thailand), Canada West (Calgary), Mexico (Central), and US Gameday Northeast (Illinois) AWS Regions.

Amazon EMR Managed Scaling automatically resizes the EC2 instances in your EMR cluster for the best performance at the lowest possible cost. You simply specify the minimum and maximum compute limits for your cluster, and Amazon EMR on EC2 automatically resizes it for optimal performance and resource utilization. Managed Scaling continuously monitors key workload-related metrics and uses an algorithm that optimizes cluster size for the best resource utilization, scaling the cluster up during peaks and down during idle periods to reduce costs. Managed Scaling can also be used with Amazon EC2 Spot Instances, which let you take advantage of unused EC2 capacity at a discount compared to On-Demand prices.

With this launch, Amazon EMR Managed Scaling is now available in all AWS commercial Regions. It is supported for Apache Spark, Apache Hive, and YARN-based workloads on Amazon EMR on EC2 versions 6.14 and above. To learn more and get started, visit the Amazon EMR Managed Scaling user guide: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-scaling.html.
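A sketch of the min/max compute limits described above, in the policy shape EMR's `PutManagedScalingPolicy` API accepts. The cluster ID and the specific limit values are placeholders chosen for illustration.

```python
# Sketch: a managed scaling policy. EMR keeps the cluster between the
# minimum and maximum capacity units, scaling up at peaks and down when idle.
managed_scaling_policy = {
    "ComputeLimits": {
        "UnitType": "Instances",             # or "VCPU" / "InstanceFleetUnits"
        "MinimumCapacityUnits": 2,           # never shrink below 2 instances
        "MaximumCapacityUnits": 20,          # cap scale-out at 20 instances
        "MaximumOnDemandCapacityUnits": 10,  # remaining capacity may be Spot
    }
}

# With boto3 this would be applied as (left unexecuted here):
# boto3.client("emr").put_managed_scaling_policy(
#     ClusterId="j-EXAMPLE", ManagedScalingPolicy=managed_scaling_policy)

limits = managed_scaling_policy["ComputeLimits"]
```

Setting `MaximumOnDemandCapacityUnits` below the overall maximum is one way to let the scale-out headroom come from Spot capacity, matching the Spot discount point above.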


#AWS #AmazonEmr


🆕 Amazon EMR Managed Scaling now available in 7 more regions: Malaysia, New Zealand, Taipei, Thailand, Calgary, Central Mexico, and Illinois. It optimizes EC2 instance resizing for cost-effective, high-performance clusters. Supports Spark, Hive, and YARN on EMR 6.14+.

#AWS #AmazonEmr

Announcing the Apache Spark upgrade agent for Amazon EMR

AWS announces the Apache Spark upgrade agent, a new capability that accelerates Apache Spark version upgrades for Amazon EMR on EC2 and EMR Serverless. Through automated code analysis and transformation, the agent turns upgrade processes that typically take months into projects spanning weeks.

Organizations invest substantial engineering resources in analyzing API changes, resolving conflicts, and validating applications during Spark upgrades. The agent introduces a conversational interface where engineers express upgrade requirements in natural language while keeping full control over code modifications. It automatically identifies API changes and behavioral modifications across PySpark and Scala applications. Engineers can initiate upgrades directly from SageMaker Unified Studio, the Kiro CLI, or the IDE of their choice via MCP (Model Context Protocol) compatibility. During the upgrade process, the agent analyzes existing code and suggests specific changes, which engineers can review and approve before implementation. The agent validates functional correctness through data quality validations, currently supports upgrades from Spark 2.4 to 3.5, and maintains data processing accuracy throughout the upgrade.

The Apache Spark upgrade agent is now available in all AWS Regions where SageMaker Unified Studio is available. To start using the agent, visit SageMaker Unified Studio and select IDE Spaces, or install the Kiro CLI. For detailed implementation guidance, reference documentation, and migration examples, see https://docs.aws.amazon.com/emr/latest/ReleaseGuide/spark-upgrades.html.
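To make "automated code analysis and transformation" concrete, here is a deliberately tiny sketch of the mechanical rewrite step such an agent automates: mapping PySpark APIs deprecated between Spark 2.4 and 3.x to their replacements. The real agent does far more (behavioral analysis, review workflow, data validation); the rename table below covers just two well-known deprecations and is not the agent's actual rule set.

```python
# Toy sketch: rewrite deprecated Spark 2.x DataFrame calls to their
# Spark 3.x replacements in PySpark source text.
import re

# Two real deprecations: registerTempTable -> createOrReplaceTempView,
# unionAll -> union. (Illustrative subset only.)
SPARK_24_TO_3X_RENAMES = {
    r"\.registerTempTable\(": ".createOrReplaceTempView(",
    r"\.unionAll\(": ".union(",
}

def suggest_upgrade(source: str) -> str:
    """Return the source with deprecated call names rewritten."""
    for old_pattern, replacement in SPARK_24_TO_3X_RENAMES.items():
        source = re.sub(old_pattern, replacement, source)
    return source

legacy = 'df.registerTempTable("events")\nall_df = a.unionAll(b)'
upgraded = suggest_upgrade(legacy)
```

A text-level rename like this is only the easy part; the months-to-weeks claim in the announcement rests on also catching behavioral changes that no regex can see, which is where the agent's analysis and validation steps come in.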


#AWS #AwsGovcloudUs #AwsGlue #AmazonEmr


🆕 AWS's Apache Spark upgrade agent for Amazon EMR speeds up Spark upgrades from 2.4 to 3.5 via automated code analysis, reducing months to weeks. Available globally with SageMaker Unified Studio.

#AWS #AwsGovcloudUs #AwsGlue #AmazonEmr

Amazon EMR Serverless eliminates local storage provisioning for Apache Spark workloads

Amazon EMR Serverless now offers serverless storage that eliminates local storage provisioning for Apache Spark workloads, reducing data processing costs by up to 20% and preventing job failures caused by disk capacity constraints. You no longer need to configure local disk type and size for each application: EMR Serverless automatically handles intermediate data operations such as shuffle, with no local storage charges. You pay only for the compute and memory resources your job consumes.

EMR Serverless offloads intermediate data operations to a fully managed, auto-scaling serverless storage layer that encrypts data in transit and at rest with job-level isolation. Serverless storage decouples storage from compute, allowing Spark to release workers immediately when they go idle rather than keeping them active to preserve temporary data. It eliminates job failures from insufficient disk capacity and reduces costs by avoiding idle worker charges. This is particularly valuable for jobs using dynamic resource allocation, such as recommendation engines processing millions of customer interactions, where initial stages process large datasets with high parallelism and later stages narrow as data aggregates.

This feature is generally available for EMR release 7.12 and later. See https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/jobs-serverless-storage.html#jobs-serverless-storage-regions for Region availability. To get started, see the serverless storage documentation for EMR Serverless.
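A sketch of the dynamic-allocation pattern the announcement calls out as the main beneficiary: with shuffle data off local disks, executors released by dynamic allocation stop incurring charges immediately. The payload follows the EMR Serverless `StartJobRun` shape; the entry point, ARNs, and the specific Spark conf values are placeholders.

```python
# Sketch: a StartJobRun payload for a Spark job using dynamic allocation,
# which pairs naturally with serverless storage (idle workers can be
# released without losing shuffle data). IDs and paths are placeholders.
start_job_run_request = {
    "applicationId": "00example",
    "executionRoleArn": "arn:aws:iam::111122223333:role/EmrServerlessJobRole",
    "jobDriver": {
        "sparkSubmit": {
            "entryPoint": "s3://my-bucket/jobs/recommendations.py",
            "sparkSubmitParameters": (
                "--conf spark.dynamicAllocation.enabled=true "
                "--conf spark.dynamicAllocation.maxExecutors=100"
            ),
        }
    },
}

spark_params = start_job_run_request["jobDriver"]["sparkSubmit"]["sparkSubmitParameters"]
```

Note that no disk size or type appears anywhere in the request; that is the provisioning step the feature removes.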


#AWS #AmazonEmr #AwsGovcloudUs

AWS announces support for Apache Iceberg V3 deletion vectors and row lineage

AWS now supports deletion vectors and row lineage as defined in the Apache Iceberg Version 3 (V3) specification. These new features are available with Apache Spark on Amazon EMR 7.12, AWS Glue, Amazon SageMaker notebooks, Amazon S3 Tables, and the AWS Glue Data Catalog. The Iceberg V3 capabilities help customers build petabyte-scale data lakes with improved performance for data modifications and an easy way to track changed records. Deletion vectors write optimized delete files that speed up data pipelines and reduce data compaction costs. Row lineage adds metadata fields to each record so changes can be tracked with a simple SQL query, eliminating the computational expense of finding small changes in large tables.

To create V3 tables, set the table property 'format-version' = '3' in the CREATE TABLE command in Spark or a SageMaker notebook. To upgrade existing tables, simply update the same table property in the table metadata; AWS query engines that support V3 will then automatically begin to use deletion vectors and row lineage.

Iceberg V3 deletion vectors and row lineage are now available in all AWS Regions where each respective service (Amazon EMR, AWS Glue, SageMaker notebooks, S3 Tables, and the AWS Glue Data Catalog) is supported. To learn more, visit https://docs.aws.amazon.com/AmazonS3/latest/userguide/working-with-apache-iceberg-v3.html and read the blog post at https://aws.amazon.com/blogs/big-data/accelerate-data-lake-operations-with-apache-iceberg-v3-deletion-vectors-and-row-lineage/.
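The two statements above can be sketched as Spark SQL strings you would hand to `spark.sql(...)` in an EMR or SageMaker notebook session. The catalog, database, and table names are placeholders; the `'format-version' = '3'` property is the piece that selects Iceberg V3.

```python
# Sketch: Spark SQL for creating a new Iceberg V3 table and for upgrading
# an existing one. Catalog/table names are placeholders.

create_v3_table = """
CREATE TABLE glue_catalog.sales.orders (
    order_id BIGINT,
    amount   DECIMAL(10, 2)
)
USING iceberg
TBLPROPERTIES ('format-version' = '3')
"""

# Upgrading an existing table is a metadata-only property change; engines
# that support V3 then start using deletion vectors and row lineage.
upgrade_existing_table = """
ALTER TABLE glue_catalog.sales.orders
SET TBLPROPERTIES ('format-version' = '3')
"""
```

In a live session these would be executed as `spark.sql(create_v3_table)` and `spark.sql(upgrade_existing_table)`.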


#AWS #AmazonS3 #AwsGlue #AmazonEmr


🆕 AWS supports Apache Iceberg V3 in EMR, Glue, SageMaker, S3 Tables, and Glue Data Catalog, boosting data lake performance and change tracking. Set 'format-version = 3' in CREATE TABLE. Available in all supported regions.

#AWS #AmazonS3 #AwsGlue #AmazonEmr

Amazon EMR and AWS Glue now support audit context with Lake Formation

Amazon EMR and AWS Glue now provide comprehensive audit context support for AWS Lake Formation credential vending APIs and for AWS Glue Data Catalog GetTable and GetTables API calls. This auditing capability helps you maintain compliance with regulatory frameworks, including the Digital Markets Act (DMA) and data protection regulations. The feature is enabled by default, integrating seamlessly into existing workflows while strengthening security and compliance monitoring across your data lake infrastructure.

You can view the audit context information in AWS CloudTrail logs, enabling enhanced security auditing, regulatory compliance, and improved troubleshooting for EMR Apache Spark native fine-grained access control (FGAC) and full table access jobs. The audit logging automatically records the platform type (EMR on EC2, EMR on EKS, EMR Serverless, or AWS Glue) and its corresponding identifiers, such as cluster ID, step ID, job run ID, and virtual cluster ID. This lets security teams track and correlate API calls from individual Spark jobs, streamline compliance reporting, and analyze historical data access patterns. Additionally, data engineers can quickly troubleshoot access-related issues by connecting them to specific job executions, resolve FGAC permission challenges, and monitor access patterns across different compute platforms.

This feature is available in all AWS Regions that support Amazon EMR, AWS Glue, and AWS Lake Formation, and requires EMR version 7.12+ or AWS Glue version 5.1+.
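A sketch of the correlation step described above: pulling the platform type and job identifier out of a CloudTrail record for a Lake Formation credential-vending call. The event below is illustrative only; the exact field names and the encoding of the audit context should be verified against a real CloudTrail record (they are assumptions here), though `GetDataAccess` is the real Lake Formation credential-vending operation.

```python
# Sketch: extract job identifiers from a CloudTrail event carrying audit
# context. The event shape and audit-context encoding are ASSUMED for
# illustration; verify against actual CloudTrail records.
import json

sample_event = json.loads("""
{
  "eventSource": "lakeformation.amazonaws.com",
  "eventName": "GetDataAccess",
  "requestParameters": {
    "auditContext": {
      "additionalAuditContext": "platform=EMR-Serverless;jobRunId=jr-123"
    }
  }
}
""")

ctx = sample_event["requestParameters"]["auditContext"]["additionalAuditContext"]
# Parse "key=value;key=value" pairs into a dict for correlation.
fields = dict(pair.split("=", 1) for pair in ctx.split(";"))
```

With the identifiers in hand, a security team can join CloudTrail entries against job-run records to attribute each data-access call to a specific Spark job.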


#AWS #AmazonEmr #AwsGlue

Amazon EMR and AWS Glue now support write operations with AWS Lake Formation fine-grained access controls

Amazon EMR and AWS Glue now enable you to enforce fine-grained access control (FGAC) on both read and write operations for AWS Lake Formation registered tables in your Apache Spark jobs. Previously, you could apply Lake Formation's table-, column-, and row-level permissions only to read operations (SELECT, DESCRIBE). This simplifies data workflows by allowing both read and write tasks in a single Spark job, eliminating the need for separate clusters or applications. Organizations can now execute end-to-end data workflows with consistent security controls, streamlining operations and reducing infrastructure costs.

With this launch, administrators can control who is authorized to insert new data, update specific records, or merge changes through data modification operations (CREATE, ALTER, INSERT, UPDATE, DELETE, MERGE INTO, DROP), ensuring that all data modifications adhere to the specified security policies and mitigating the risk of unauthorized data modification or misuse. The launch also simplifies data governance and security frameworks by providing a single place to define access rules in AWS Lake Formation and enforce them in Spark for both read and write operations.

This feature is available in all AWS Regions where Amazon EMR (on EC2, on EKS, and Serverless), AWS Glue, and AWS Lake Formation are available. To learn more, visit the open table format support documentation: https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/emr-serverless-lf-enable.html#emr-serverless-lf-enable-open-table-format-support.
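A sketch of the administrative side: the Lake Formation grant that authorizes a Spark job role to modify, not just read, a governed table. The payload follows the shape of the Lake Formation `GrantPermissions` API; the account ID, role, database, and table names are placeholders.

```python
# Sketch: granting write-side Lake Formation permissions to the role a
# Spark job runs under. All names/ARNs are placeholders.
grant_write_request = {
    "Principal": {
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/SparkJobRole"
    },
    "Resource": {
        "Table": {"DatabaseName": "sales", "Name": "orders"}
    },
    # Read permissions plus the write permissions this launch enforces.
    "Permissions": ["SELECT", "INSERT", "DELETE", "ALTER"],
}

# Applied with boto3 as (left unexecuted here):
# boto3.client("lakeformation").grant_permissions(**grant_write_request)
```

A job running under a role without INSERT/DELETE on the table would then have its write attempts rejected by FGAC enforcement, while its reads remain governed by the existing SELECT permissions.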


#AWS #AmazonEmr #AwsGlue


🆕 Amazon EMR and AWS Glue now support audit context for AWS Lake Formation, aiding compliance with regulations like DMA. Default enabled, it logs platform details in CloudTrail, enhancing security and troubleshooting. Available in all regions with EMR 7.12+ or Glue 5.1+.

#AWS #AmazonEmr #AwsGlue


🆕 Amazon EMR and AWS Glue now support write operations with AWS Lake Formation's fine-grained access controls, enabling consistent security for both read and write tasks in Spark jobs, simplifying data workflows and reducing infrastructure costs.

#AWS #AmazonEmr #AwsGlue

Amazon EMR Serverless now supports Apache Spark 4.0.1 (preview)

Amazon EMR Serverless now supports Apache Spark 4.0.1 in preview. With Spark 4.0.1, you can build and maintain data pipelines more easily with ANSI SQL and VARIANT data types, strengthen compliance and governance frameworks with the Apache Iceberg V3 table format, and deploy new real-time applications faster with enhanced streaming capabilities. This lets your teams reduce technical debt and iterate more quickly while ensuring data accuracy and consistency.

With Spark 4.0.1, you can build data pipelines in standard ANSI SQL, making them accessible to a larger set of users who don't know programming languages like Python or Scala. Spark 4.0.1 natively supports JSON and semi-structured data through VARIANT data types, providing flexibility for handling diverse data formats. You can strengthen compliance and governance through the Apache Iceberg V3 table format, which provides transaction guarantees and tracks how your data changes over time, creating the audit trails you need for regulatory requirements. Improved streaming controls let you manage complex stateful operations and monitor streaming jobs more easily, supporting use cases like fraud detection and real-time personalization.

Apache Spark 4.0.1 is available in preview in all Regions where EMR Serverless is available, excluding the China and AWS GovCloud (US) Regions. To learn more about Apache Spark 4.0.1 on Amazon EMR, see the release notes at https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/release-version-emr-spark-8.0-preview.html, or get started by creating an EMR Serverless application with Spark 4.0.1 from the console: https://console.aws.amazon.com/emr/serverless.
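Getting onto the preview comes down to choosing the preview release label when creating the application. A minimal sketch of that request follows; the release label is an assumption inferred from the linked release-notes URL (`emr-spark-8.0-preview`) and should be confirmed in the console or documentation before use.

```python
# Sketch: an EMR Serverless application on the Spark 4.0.1 preview release.
# The releaseLabel value is ASSUMED from the release-notes URL; verify it.
preview_app_request = {
    "name": "spark4-preview-app",
    "releaseLabel": "emr-8.0-preview",  # assumed preview label
    "type": "SPARK",
}

# With boto3 (left unexecuted here):
# boto3.client("emr-serverless").create_application(**preview_app_request)
```

Jobs submitted to this application would then run on Spark 4.0.1 and can use the ANSI SQL and VARIANT features described above.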

#AWS #AmazonEmr

0 0 0 0

🆕 Amazon EMR Serverless adds Apache Spark 4.0.1 (preview) for easier data pipelines with ANSI SQL, VARIANT types, Apache Iceberg v3, and enhanced streaming. Available in all regions except China and AWS GovCloud (US).

#AWS #AmazonEmr

1 0 0 0
Amazon EMR 7.12 now supports the Apache Iceberg v3 table format

Amazon EMR 7.12 is now available, featuring the new Apache Iceberg v3 table format with Apache Iceberg 1.10. This release enables you to reduce costs when deleting data, strengthen governance and compliance through better tracking of row-level changes, and enhance data security with more granular data access control. With Iceberg v3, you can delete data cost-effectively because Iceberg v3 marks deleted rows without rewriting entire files, speeding up your data pipelines while reducing storage costs. You get better governance and compliance capabilities through automatic tracking of every row's creation and modification history, creating the audit trails needed for regulatory requirements and change data capture. You can enhance data security with table-level encryption, helping you meet privacy regulations for your most sensitive data. With Apache Spark 3.5.6 included in this release, you can leverage these Iceberg 1.10 capabilities for building robust data lakehouse architectures on Amazon S3. This release also adds support for data governance operations across your Iceberg tables using AWS Lake Formation, and includes Apache Trino 476. Amazon EMR 7.12 is available in all AWS Regions that support Amazon EMR. To learn more about the Amazon EMR 7.12 release, see https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-7120-release.html.
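To make the cluster-side setup concrete, here is a minimal sketch of an EMR Configurations list that enables Iceberg and registers a Spark catalog for Iceberg tables on S3. The `iceberg-defaults` classification is the one EMR documents for enabling Iceberg; the catalog name and warehouse path are illustrative placeholders.

```python
import json

# Sketch: EMR cluster Configurations enabling Iceberg and defining a
# Spark catalog backed by an S3 warehouse. Values marked "hypothetical"
# are placeholders, not required settings.
configurations = [
    {
        "Classification": "iceberg-defaults",
        "Properties": {"iceberg.enabled": "true"},
    },
    {
        "Classification": "spark-defaults",
        "Properties": {
            "spark.sql.catalog.my_catalog": "org.apache.iceberg.spark.SparkCatalog",
            "spark.sql.catalog.my_catalog.warehouse": "s3://my-bucket/warehouse/",  # hypothetical
        },
    },
]

# Render as the JSON you would pass to RunJobFlow / the EMR console.
print(json.dumps(configurations, indent=2))
```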

#AWS #AmazonEmr

1 0 0 0

🆕 Amazon EMR 7.12 supports Apache Iceberg v3 for cost-effective data deletion, better governance, and enhanced security with table-level encryption, plus Spark 3.5.6 and Trino 476. Available in all AWS Regions.

#AWS #AmazonEmr

1 0 0 0
Amazon EMR announces S3A as the default connector

AWS announces Amazon EMR S3A, a new Amazon S3 connector that optimizes performance for Apache Hadoop, Apache Spark, and Apache Hive workloads on https://aws.amazon.com/emr/. This new connector enhances the open-source S3A architecture with AWS-specific optimizations to help organizations process large-scale data more efficiently. With direct integration support for S3 Express One Zone, S3 Glacier, and AWS Outposts, EMR S3A helps customers leverage different storage options in AWS to optimize both data access speed and storage cost for their EMR workloads. Additionally, the EMR S3A connector delivers advanced security features and performance capabilities that extend beyond open-source S3A. Key improvements include built-in fine-grained access control support for Apache Spark, an enhanced S3A credentials resolver, MagicCommitter V2 for optimized file writes, and accelerated S3 prefix listing for columnar file formats. These enhancements are available starting with EMR release 7.10 and maintain compatibility with existing applications. The Amazon EMR S3A connector is available in all AWS Regions where Amazon EMR is available and comes pre-configured with Amazon EMR release version 7.10 and later. To learn more about Amazon EMR S3A, see https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-s3a-file.html.
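For orientation, the sketch below shows a few standard Hadoop `fs.s3a.*` properties of the kind involved here (magic committer, connection pool size), rendered as spark-submit arguments. EMR 7.10+ ships its own optimized defaults, so treat these names and values as illustrative overrides, not required configuration.

```python
# Sketch: standard open-source S3A properties, expressed as Spark conf.
# EMR pre-configures its optimized connector, so these are illustrative.
s3a_props = {
    "spark.hadoop.fs.s3a.committer.name": "magic",          # magic committer for direct S3 writes
    "spark.hadoop.fs.s3a.committer.magic.enabled": "true",  # allow magic committer paths
    "spark.hadoop.fs.s3a.connection.maximum": "100",        # S3 connection pool size
}

def as_conf_args(props: dict) -> list:
    """Render a property map as spark-submit --conf arguments."""
    return [f"--conf {k}={v}" for k, v in props.items()]

args = as_conf_args(s3a_props)
```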

#AWS #AmazonEmr

1 0 0 0
Amazon EMR on EC2 Adds Apache Spark native FGAC and AWS Glue Data Catalog Views Support

Amazon EMR on EC2 announces two significant enhancements for governance: Apache Spark native fine-grained access control (FGAC) via AWS Lake Formation, and support for AWS Glue Data Catalog views. These features allow organizations to improve data security, simplify access management, and enhance data-sharing capabilities across their analytics environments. The Apache Spark native FGAC implementation allows customers to define granular access policies once in AWS Lake Formation and apply them consistently across EMR clusters. This reduces security risks and administrative overhead while providing a unified approach to data governance. Customers can now use familiar Lake Formation grant and revoke statements to manage access controls for their Spark jobs and interactive sessions on EMR on EC2, similar to how this works for other AWS analytics services. AWS Glue Data Catalog views enable customers to create, manage, and query multi-engine SQL views across AWS Regions, accounts, and organizations. This feature allows administrators to create views from Spark jobs that can be queried from multiple engines, while controlling data access through Lake Formation permissions. These permissions include named resource grants, data filters, and tags, with all access requests automatically logged in AWS CloudTrail for comprehensive auditing. The Apache Spark native FGAC and Glue Data Catalog views features are available with Amazon EMR release 7.10 in all AWS Regions where EMR on EC2 is available. To learn more, visit https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-lake-formation.html and https://docs.aws.amazon.com/emr/latest/ManagementGuide/SECTION-jobs-glue-data-catalog-views-ec2.html in the Amazon EMR documentation.
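As a sketch of the "define once in Lake Formation" model, here is a GrantPermissions-style request body of the shape used by the boto3 `lakeformation` client. The account ID, database, and table names are hypothetical placeholders; SELECT and INSERT are standard Lake Formation table permissions.

```python
# Sketch: Lake Formation GrantPermissions request granting a Spark job
# role access to one table. All identifiers below are placeholders.

def lf_grant_request(principal_arn: str, database: str, table: str) -> dict:
    """Build a request body of the shape used by boto3 lakeformation grant_permissions."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "Table": {
                "CatalogId": "111122223333",  # hypothetical account id
                "DatabaseName": database,
                "Name": table,
            }
        },
        "Permissions": ["SELECT", "INSERT"],
    }

grant = lf_grant_request(
    "arn:aws:iam::111122223333:role/SparkJobRole",  # hypothetical job role
    "sales",
    "orders",
)
```

Once granted this way, the same policy applies to Spark jobs and interactive sessions on any EMR on EC2 cluster with FGAC enabled, rather than being configured per cluster.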

#AWS #AmazonEmr

1 0 0 0

🆕 Amazon EMR on EC2 now supports Apache Spark FGAC and AWS Glue Data Catalog views for enhanced data security and governance, simplifying access management and improving data sharing across analytics environments. Available in EMR 7.10.

#AWS #AmazonEmr

1 0 0 0

🆕 Amazon EMR now uses S3A as the default connector, optimizing Hadoop, Spark, and Hive workloads. It supports S3 Express One Zone, Glacier, and Outposts, with advanced security and performance features, available from EMR 7.10.

#AWS #AmazonEmr

1 0 0 0
Amazon EMR on EKS now supports Service Quotas

Today, Amazon EMR on EKS announces support for Service Quotas, improving visibility and control over EMR on EKS quotas. Previously, to request an increase for EMR on EKS quotas, such as the maximum number of StartJobRun API calls per second, customers had to open a support ticket and wait for the support team to process the increase. Now, customers can view and manage their EMR on EKS quota limits directly in the Service Quotas console (https://us-east-1.console.aws.amazon.com/servicequotas/home/services/emr-containers/quotas). This enables automated limit-increase approvals for eligible requests, improving response times and reducing the number of support tickets. Customers can also set up Amazon CloudWatch alarms to be automatically notified when their usage reaches a certain percentage of a maximum quota. Amazon EMR on EKS support for Service Quotas is available in all Regions where Amazon EMR on EKS is currently available. To get started, visit the Service Quotas User Guide (https://docs.aws.amazon.com/servicequotas/latest/userguide/intro.html).
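The "alarm at a percentage of the quota" idea can be sketched as a CloudWatch PutMetricAlarm request against the `AWS/Usage` metrics that Service Quotas publishes. The dimension values below (service and resource names) are illustrative assumptions; check the quota's page in the Service Quotas console for the exact metric dimensions before using them.

```python
# Sketch: CloudWatch alarm at 80% of an EMR on EKS quota, built on the
# AWS/Usage namespace Service Quotas publishes to. Dimension values are
# assumptions for illustration, not verified names.

def quota_alarm(quota_value: float, resource: str) -> dict:
    """Build a request body of the shape used by boto3 cloudwatch put_metric_alarm."""
    return {
        "AlarmName": f"emr-containers-{resource}-80pct",
        "Namespace": "AWS/Usage",
        "MetricName": "CallCount",
        "Dimensions": [
            {"Name": "Service", "Value": "EMR containers"},  # hypothetical dimension value
            {"Name": "Resource", "Value": resource},
            {"Name": "Type", "Value": "API"},
        ],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": quota_value * 0.8,  # fire at 80% of the quota
        "ComparisonOperator": "GreaterThanThreshold",
    }

alarm = quota_alarm(25.0, "StartJobRun")  # e.g. a 25 calls/sec quota
```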

#AWS #AwsGovcloudUs #AmazonEks #AmazonEmr

1 0 0 0

🆕 Amazon EMR on EKS now supports Service Quotas, allowing direct management of quota limits in the Service Quotas console, reducing support tickets and enabling automated limit increases and CloudWatch alarms. Available in all EMR on EKS regions.

#AWS #AwsGovcloudUs #AmazonEks #AmazonEmr

0 0 0 0
Amazon EMR enables enhanced Apache Spark capabilities for Lake Formation tables with full table access

Amazon EMR now supports read and write operations from Apache Spark jobs on AWS Lake Formation registered tables when the job role has full table access. This capability enables DML statements such as DELETE, UPDATE, and MERGE INTO, as well as DDL statements such as CREATE and ALTER, on Apache Hive and Iceberg tables from within the same Apache Spark application. While Lake Formation's fine-grained access control (FGAC) offers granular security controls at the row, column, and cell levels, many ETL workloads simply need full table access. This new feature enables Apache Spark to directly read and write data when full table access is granted, removing FGAC limitations that previously restricted certain ETL operations. You can now leverage advanced Spark capabilities including RDDs, custom libraries, UDFs, and custom images (AMIs for EMR on EC2, custom images for EMR Serverless) with Lake Formation tables. Additionally, data teams can run complex, interactive Spark applications through SageMaker Unified Studio in compatibility mode while maintaining Lake Formation's table-level security boundaries. This feature is available in all AWS Regions where Amazon EMR and AWS Lake Formation are supported. To learn more, see https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/lake-formation-unfiltered-access.html in the Amazon EMR Serverless User Guide.
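To illustrate the kind of write operation this unlocks, the helper below renders a Spark SQL MERGE INTO upsert of the sort a job would pass to `spark.sql()`. The table and column names are placeholders; `UPDATE SET *` / `INSERT *` is the Spark shorthand for copying all matching columns, supported for Iceberg tables.

```python
# Sketch: a MERGE INTO upsert now runnable from a Spark job with full
# table access on a Lake Formation table. Names are placeholders.

def merge_upsert_sql(target: str, source: str, key: str) -> str:
    """Render a Spark SQL MERGE statement upserting source rows into target."""
    return (
        f"MERGE INTO {target} t USING {source} s ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )

# In a real job this string would be executed via spark.sql(sql).
sql = merge_upsert_sql("sales.orders", "staging.orders_delta", "order_id")
```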

#AWS #AmazonEmr #AwsGlue

0 0 0 0