Introduction
AWS DynamoDB is a NoSQL database that scales to your requirements while providing high availability and durability. A good introduction can be found here.
Problem statement
We hit a stumbling block: one of our tables contains nearly a million records and is expected to grow to several times its current size over time. We needed queries that return quickly from this data set. Until now we used scan, which reads the entire table to produce its results regardless of how many records the table holds. The problem with this approach is that responses are very slow, and they are guaranteed to get slower as the table grows.
The Event table collects event data over time. It is a legacy table, so we cannot modify its keys or introduce new hash keys.
Attempts to improve the performance
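This is how our scan initially looked against the Event table:
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBScanExpression;
import com.amazonaws.services.dynamodbv2.datamodeling.PaginatedScanList;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ComparisonOperator;
import com.amazonaws.services.dynamodbv2.model.Condition;
import java.util.HashMap;
import java.util.Map;

DynamoDBScanExpression scanExpression = new DynamoDBScanExpression();
Map<String, Condition> scanFilter = new HashMap<String, Condition>();
// Keep only items whose attribute (attributeType) equals attributeValue
Condition condition = new Condition()
        .withComparisonOperator(ComparisonOperator.EQ.toString())
        .withAttributeValueList(new AttributeValue().withS(attributeValue));
scanFilter.put(attributeType, condition);
scanExpression.setScanFilter(scanFilter);
// Event is the model class mapped to the Event table in DynamoDB
PaginatedScanList<Event> scan = mapper.scan(Event.class, scanExpression);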
Results with scan
On average, responses took 25-30 seconds, which is very sluggish.
Parallel Scan
We rewrote some of the queries using parallel scan, which improved performance considerably, but the application still felt sluggish compared with the responsiveness we wanted. Details on writing parallel scans can be found here.
DynamoDBScanExpression scanExpression = new DynamoDBScanExpression();
Map<String, Condition> scanFilter = new HashMap<String, Condition>();
Condition condition = new Condition()
        .withComparisonOperator(ComparisonOperator.EQ.toString())
        .withAttributeValueList(new AttributeValue().withS(attributeValue));
scanFilter.put(attributeType, condition);
scanExpression.setScanFilter(scanFilter);
// classType is the mapped model class (a Class<T>); totalSegments is the number of segments
PaginatedParallelScanList<T> scan = mapper.parallelScan(classType, scanExpression, totalSegments);
As you can see, a parallel scan splits the work into separate jobs, each covering one slice of the data; the table is divided by the number of segments. When issuing a parallel scan you must specify the number of segments the table should be split into.
Another caveat with this approach is that we have to keep fine-tuning the number of segments as the table grows, which becomes a maintenance burden for the application.
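To make the segment idea concrete, here is a minimal in-memory sketch in plain Java. It does not call DynamoDB: the class SegmentedScanSketch and its slicing rule (item position modulo the segment count) are assumptions made for illustration, whereas DynamoDB decides internally which items belong to which segment. What it does show is that every worker still reads all items in its slice, so the total work is unchanged and only the elapsed time shrinks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Predicate;

public class SegmentedScanSketch {

    /** Reads one slice of the list and returns the items that pass the filter. */
    static <T> List<T> scanSegment(List<T> items, Predicate<T> filter, int segment, int totalSegments) {
        List<T> matches = new ArrayList<>();
        // Illustrative slicing rule: every totalSegments-th item, starting at `segment`.
        for (int i = segment; i < items.size(); i += totalSegments) {
            T item = items.get(i);
            if (filter.test(item)) {
                matches.add(item);
            }
        }
        return matches;
    }

    /** Runs one worker per segment and merges the partial results in segment order. */
    public static <T> List<T> parallelScan(List<T> items, Predicate<T> filter, int totalSegments) {
        ExecutorService pool = Executors.newFixedThreadPool(totalSegments);
        try {
            List<CompletableFuture<List<T>>> futures = new ArrayList<>();
            for (int segment = 0; segment < totalSegments; segment++) {
                final int seg = segment;
                futures.add(CompletableFuture.supplyAsync(
                        () -> scanSegment(items, filter, seg, totalSegments), pool));
            }
            List<T> merged = new ArrayList<>();
            for (CompletableFuture<List<T>> future : futures) {
                merged.addAll(future.join());
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> table = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            table.add(i);
        }
        List<Integer> hits = parallelScan(table, n -> n % 100 == 0, 4);
        System.out.println(hits.size() + " matches");
    }
}
```

Changing the last argument of parallelScan is the in-memory equivalent of tuning totalSegments: more segments means more concurrent workers, but never fewer items read.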
Results with parallel scan
On average, responses took 10-12 seconds with a fine-tuned number of segments (20-25).
A Solution
So the real solution had to be found elsewhere, and that turned out to be secondary indexes. How secondary indexes work is explained in the link above. You can create an index over a subset of the table's (Event) fields and issue a query or scan against that index. Since most of our queries look up activity information for the current day or the previous day, we created a global secondary index that projects all the fields needed in the query results and uses "occurDate" as its hash key.
Eg. occurDate = "2015-06-04"
When you create an index with a hash key, all records that share a hash key value are stored together in a separate bucket for that value. For example, all activity records for 2015-06-04 are stored under the hash key "2015-06-04", all activity records for 2015-06-05 under the hash key "2015-06-05", and so on. You can also define a composite key for an index by adding a range key attribute alongside the hash key, such as another date field ("valid_until") or a number ("hits"). In our case there was no need for one.
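The bucket idea can be sketched with a plain Java map. This is only an in-memory analogy, not how DynamoDB is implemented: the HashKeyBucketSketch class, its trimmed-down Event type and the examined counter are all hypothetical names introduced for illustration. The point is the difference in how many items each access path has to look at.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashKeyBucketSketch {

    /** Hypothetical event with only the two fields this sketch needs. */
    public static final class Event {
        public final String occurDate;
        public final String serialNumber;

        public Event(String occurDate, String serialNumber) {
            this.occurDate = occurDate;
            this.serialNumber = serialNumber;
        }
    }

    private final List<Event> table = new ArrayList<>();
    // One bucket per hash key value, e.g. "2015-06-04" -> all events of that day.
    private final Map<String, List<Event>> occurDateIndex = new HashMap<>();
    private int examined;

    public void put(Event event) {
        table.add(event);
        occurDateIndex.computeIfAbsent(event.occurDate, key -> new ArrayList<>()).add(event);
    }

    /** Scan: looks at every item in the table, whatever the date. */
    public List<Event> scan(String occurDate) {
        examined = 0;
        List<Event> matches = new ArrayList<>();
        for (Event event : table) {
            examined++;
            if (event.occurDate.equals(occurDate)) {
                matches.add(event);
            }
        }
        return matches;
    }

    /** Query: goes straight to the bucket for the given hash key value. */
    public List<Event> query(String occurDate) {
        List<Event> bucket = occurDateIndex.getOrDefault(occurDate, Collections.emptyList());
        examined = bucket.size();
        return bucket;
    }

    /** Number of items the most recent scan or query had to look at. */
    public int examined() {
        return examined;
    }

    public static void main(String[] args) {
        HashKeyBucketSketch store = new HashKeyBucketSketch();
        for (String day : new String[] {"2015-06-03", "2015-06-04", "2015-06-05"}) {
            for (int i = 0; i < 10; i++) {
                store.put(new Event(day, "SN" + i));
            }
        }
        store.scan("2015-06-04");
        System.out.println("scan examined " + store.examined() + " items");
        store.query("2015-06-04");
        System.out.println("query examined " + store.examined() + " items");
    }
}
```

As more days are added, the scan count grows with the whole table while the query count depends only on the size of one day's bucket, which is why occurDate works well as a hash key for day-based lookups.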
Creating Secondary Index
Creating index via code
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
import com.amazonaws.services.dynamodbv2.model.*;
import java.util.ArrayList;

AmazonDynamoDBClient ddbClient = AmazonDynamoDBConnection.getDynamoDBClient();

// Attribute definitions. DynamoDB only needs definitions for attributes that appear in a
// key schema (here, occurDate); if the service rejects the others as unused, drop them.
ArrayList<AttributeDefinition> attributeDefinitions = new ArrayList<AttributeDefinition>();
attributeDefinitions.add(new AttributeDefinition().withAttributeName("occurDate").withAttributeType("S"));
attributeDefinitions.add(new AttributeDefinition().withAttributeName("channelNo").withAttributeType("S"));
attributeDefinitions.add(new AttributeDefinition().withAttributeName("eventType").withAttributeType("S"));
attributeDefinitions.add(new AttributeDefinition().withAttributeName("eventStatus").withAttributeType("S"));
attributeDefinitions.add(new AttributeDefinition().withAttributeName("isActive").withAttributeType("S"));
attributeDefinitions.add(new AttributeDefinition().withAttributeName("isAlarm").withAttributeType("S"));
attributeDefinitions.add(new AttributeDefinition().withAttributeName("serialNumber").withAttributeType("S"));
attributeDefinitions.add(new AttributeDefinition().withAttributeName("occurTime_24").withAttributeType("S"));
attributeDefinitions.add(new AttributeDefinition().withAttributeName("failureCount").withAttributeType("S"));
attributeDefinitions.add(new AttributeDefinition().withAttributeName("occurTime").withAttributeType("S"));

/* Tell DynamoDB explicitly which attributes to project into the index. Alternatively you can
   use ProjectionType.ALL, but be aware that additional attributes take up space and add cost
   on reads and writes. */
Projection p = new Projection()
        .withProjectionType(ProjectionType.INCLUDE)
        .withNonKeyAttributes("channelNo", "eventType", "eventStatus", "isActive", "isAlarm",
                "serialNumber", "occurTime_24", "failureCount", "occurTime");

// Define the hash key
ArrayList<KeySchemaElement> indexKeySchema = new ArrayList<KeySchemaElement>();
indexKeySchema.add(new KeySchemaElement().withAttributeName("occurDate").withKeyType(KeyType.HASH));

// You can also specify the read and write capacity
CreateGlobalSecondaryIndexAction action = new CreateGlobalSecondaryIndexAction()
        .withIndexName("occurDateIndex")
        .withProjection(p)
        .withKeySchema(indexKeySchema)
        .withProvisionedThroughput(new ProvisionedThroughput()
                .withReadCapacityUnits(500L)
                .withWriteCapacityUnits(100L));
GlobalSecondaryIndexUpdate gsiu = new GlobalSecondaryIndexUpdate().withCreate(action);

// Tell DynamoDB which table the index is created on
UpdateTableRequest uReq = new UpdateTableRequest()
        .withGlobalSecondaryIndexUpdates(gsiu)
        .withTableName("Event")
        .withAttributeDefinitions(attributeDefinitions);

// All good, finally create the index
UpdateTableResult updateTable = ddbClient.updateTable(uReq);
You should see the index creation start on your table; you can follow it in the AWS console. Depending on the number of records, this process can take a while. Once the index is ready, its status is shown as "Active" in the table's list of indexes.
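If you would rather wait for the index in code than watch the console, a simple polling loop is one option. The sketch below is plain Java and makes no AWS calls: IndexStatusPoller and its statusSource parameter are hypothetical, and in a real application the supplier would wrap a DescribeTable call and return the index's reported status. It assumes the status string to wait for is "ACTIVE", which is how the API spells the state the console displays as "Active".

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

public class IndexStatusPoller {

    /**
     * Polls statusSource until it reports "ACTIVE" or maxAttempts is reached.
     * Returns true if the index became active, false on timeout or interruption.
     */
    public static boolean waitUntilActive(Supplier<String> statusSource, int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if ("ACTIVE".equals(statusSource.get())) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated status sequence standing in for repeated DescribeTable calls.
        Iterator<String> simulated = Arrays.asList("CREATING", "CREATING", "ACTIVE").iterator();
        boolean ready = waitUntilActive(simulated::next, 5, 10);
        System.out.println("index ready: " + ready);
    }
}
```

For a large table the build can take a long time, so a generous sleep interval and attempt limit are sensible.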
Issuing query against index
Now that our index ("occurDateIndex") is created, we can issue queries against it and see how well it responds.
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Index;
import com.amazonaws.services.dynamodbv2.document.ItemCollection;
import com.amazonaws.services.dynamodbv2.document.QueryOutcome;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.document.spec.QuerySpec;
import com.amazonaws.services.dynamodbv2.document.utils.ValueMap;
import java.util.ArrayList;
import java.util.List;

AmazonDynamoDBClient ddbClient = AmazonDynamoDBConnection.getDynamoDBClient();
DynamoDB dynamoDB = new DynamoDB(ddbClient);
Table table = dynamoDB.getTable("Event");
// Must match the name the index was created with
Index index = table.getIndex("occurDateIndex");

/* Since we know which date we are querying, we pass it as the hash key value so the query
   can go straight to the bucket it needs to work on. */
QuerySpec querySpec = new QuerySpec()
        .withHashKey("occurDate", "2015-06-04")
        .withMaxResultSize(20000)
        .withFilterExpression("serialNumber = :v_serialNumber and channelNo = :v_channel")
        .withValueMap(new ValueMap()
                .withString(":v_serialNumber", "1B0111DPAYF8TG6")
                .withString(":v_channel", "1"));

ItemCollection<QueryOutcome> items = index.query(querySpec);
// Collect the eventStatus of every matching item
List<String> list = new ArrayList<>();
items.forEach(t -> list.add(t.getJSONPretty("eventStatus")));
With a million records in our database, responses average 3 to 3.5 seconds with the index, compared with about 25 seconds for the table scans. This is a dramatic performance gain. We also don't have to worry about query performance as the database grows, since our typical queries are narrowed to a single bucket, so response times are likely to stay in the 3-4 second range.
NB: All time measurements include network latency, even though we used an AWS node close to our location. The average timings are shown only to demonstrate the improvement, not as benchmarks.