
  • About Quinbay Publications

    Quinbay Publication

    We follow our passion for digital innovation. Our high-performing team of talented and committed engineers is building the future of business tech.

Monday, July 25, 2022

Understand Code Refactoring Techniques to Improve your Code Quality

Code Refactoring Techniques
Photo by Danial Igdery

Nowadays, agile teams are under tremendous pressure to write code faster, with more functionality, in less time. Some functionality is often added at the last moment or just before a release. Under this pressure, engineers may implement features in a sloppy manner that technically works but leads to dirty code.

Dirty code usually results from a developer’s inexperience, shortcuts taken to meet increasingly tight deadlines, poor management of the project, several different developers working on a project over time, or some combination of all of the above.

Bad code comes at a price, and writing good code isn’t that complicated. Let's understand what's code refactoring.

Code Refactoring

In computer programming and software design, code refactoring is the process of restructuring existing computer code—changing the factoring—without changing its external behavior. Refactoring is intended to improve the design, structure, and/or implementation of the software (its non-functional attributes), while preserving its functionality. 

Potential advantages of refactoring may include improved code readability and reduced complexity; these can improve the source code's maintainability and create a simpler, cleaner, or more expressive internal architecture or object model to improve extensibility. 

Another potential goal for refactoring is improved performance; software engineers face an ongoing challenge to write programs that perform faster or use less memory.

Code refactoring turns this dirty code into clean code, making it easier to extend the code and add new features in the future. It also improves the more objective attributes of code, such as code length, code duplication, coupling, and cohesion, all of which correlate with ease of maintenance, and the refactored code may use less memory and run faster.

How to Perform Code Refactoring?

Now that you know what code refactoring is, and you know many of its potential benefits, how exactly do you refactor code?

There are many approaches and techniques to refactor the code. Let’s discuss some popular ones.

Red-Green Refactor

Red-Green is the most popular and widely used code refactoring technique in the Agile software development process. This technique follows the test-first approach to design and implementation, which lays the foundation for all forms of refactoring.

Red - The first step starts with writing a failing test. You stop and check what needs to be developed.

Green - In the second step, you write just enough code to make the test pass.

Refactor - Find ways to improve the code and implement those improvements, without adding new functionality.
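The three steps above can be sketched in Java. The PriceCalculator class and the 18% tax rate here are hypothetical, chosen only to illustrate the cycle; in a real project the red step would be a JUnit test:

```java
// Red: the check in main() is written first and fails while totalWithTax()
//      does not exist or returns the wrong value.
// Green: totalWithTax() is the simplest code that makes the check pass.
// Refactor: the magic number 0.18 is extracted into a named constant
//      without changing behavior.

class PriceCalculator {
    private static final double TAX_RATE = 0.18; // refactor step: named constant

    double totalWithTax(double net) {
        return net + net * TAX_RATE; // green step: simplest passing code
    }
}

public class RedGreenDemo {
    public static void main(String[] args) {
        // The "red" test, expressed as a plain check (in practice a JUnit test).
        double total = new PriceCalculator().totalWithTax(100.0);
        if (Math.abs(total - 118.0) > 1e-9) {
            throw new AssertionError("expected 118.0, got " + total);
        }
        System.out.println("test passed");
    }
}
```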

Refactoring by Abstraction

This technique is mostly used by developers when a large amount of refactoring is needed. We mainly use it to reduce redundancy (duplication) in our code. It involves class inheritance and hierarchy, creating new classes and interfaces, extraction, and replacing inheritance with delegation and vice versa.

Pull-Up Method - It pulls code parts into a superclass and helps in the elimination of code duplication.

Push-Down Method - It takes the code part from a superclass and moves it down into the subclasses.
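A minimal Java sketch of the Pull-Up Method, using hypothetical Employee, Engineer, and Manager classes: the fullName() logic that used to be copy-pasted into every subclass is pulled up into the superclass.

```java
// Pull-Up Method: fullName() was duplicated in Engineer and Manager,
// so it is pulled up into the Employee superclass.
abstract class Employee {
    final String first, last;

    Employee(String first, String last) { this.first = first; this.last = last; }

    // Pulled up: previously an identical copy lived in each subclass.
    String fullName() { return first + " " + last; }
}

class Engineer extends Employee {
    Engineer(String first, String last) { super(first, last); }
}

class Manager extends Employee {
    Manager(String first, String last) { super(first, last); }
}

public class PullUpDemo {
    public static void main(String[] args) {
        System.out.println(new Engineer("Ada", "Lovelace").fullName()); // Ada Lovelace
    }
}
```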

Refactoring by abstraction allows you to make big changes to large chunks of code gradually. In this way, you can still release the system regularly, even with the change still in progress.

Composing Method

Code that is too long is difficult to understand and difficult to change. The composing method is a code refactoring approach that helps to streamline code and remove code duplication. This is done through extraction and inline techniques.

Extraction - We break the code into smaller chunks, identify the fragments worth extracting, and create separate methods for them; each original fragment is then replaced with a call to the new method. Extraction can target classes, interfaces, and local variables.

Inline - Inlining works in the opposite direction and also helps to create simpler, more streamlined code. It removes unnecessary methods by replacing each call with the body of the method, after which the method itself is deleted from the program.
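A small Java sketch of the inline technique, using a hypothetical DeliveryRating class adapted from a common textbook example: the one-line helper is replaced by its body and then deleted.

```java
// Inline technique: moreThanFiveDeliveries() did nothing but wrap a single
// expression, so its body replaces the call and the method is deleted.
class DeliveryRating {
    int deliveries;

    DeliveryRating(int deliveries) { this.deliveries = deliveries; }

    // Before inlining:
    //   int rating() { return moreThanFiveDeliveries() ? 2 : 1; }
    //   boolean moreThanFiveDeliveries() { return deliveries > 5; }
    // After inlining:
    int rating() { return deliveries > 5 ? 2 : 1; }
}

public class InlineDemo {
    public static void main(String[] args) {
        System.out.println(new DeliveryRating(6).rating()); // prints 2
        System.out.println(new DeliveryRating(2).rating()); // prints 1
    }
}
```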

Simplifying Methods

As legacy code gets older, it tends to become more polluted and complex. Simplifying methods help to simplify that logic. These methods include adjusting the interaction between different classes, along with adding new parameters or removing and replacing certain parameters with explicit methods.

Simplifying Conditional Expressions - Conditional statements tend to grow more convoluted over time, and you need to simplify this logic to understand the whole program. There are many ways to refactor conditionals. Some of them are: consolidate conditional expressions and duplicate conditional fragments, decompose conditionals, replace a conditional with polymorphism, remove control flags, replace nested conditionals with guard clauses, etc.
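As one concrete illustration, replacing a nested conditional with guard clauses might look like this in Java. The PayoutCalculator class and the payout amounts are made up for the example:

```java
// Guard clauses: each special case returns immediately, so the "normal"
// case is no longer buried at the bottom of a nested if/else ladder.
class PayoutCalculator {
    // Before: if (dead) {...} else { if (separated) {...} else { ... } }
    double payout(boolean dead, boolean separated, boolean retired) {
        if (dead) return 0.0;        // guard clause
        if (separated) return 50.0;  // guard clause
        if (retired) return 75.0;    // guard clause
        return 100.0;                // normal case, now unindented and obvious
    }
}

public class GuardClauseDemo {
    public static void main(String[] args) {
        System.out.println(new PayoutCalculator().payout(false, false, true)); // prints 75.0
    }
}
```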

Simplifying Method Calls - In this approach, we make method calls simpler and easier to understand by working on the interaction between classes and simplifying their interfaces. Examples include: adding, removing, or introducing new parameters, replacing a parameter with an explicit method or a method call, parameterizing a method, separating a query from a modifier, preserving the whole object, removing a setting method, etc.

Extract Method

The extract method is one of the techniques for code refactoring that helps to decrease complexity, while also increasing the overall readability of the code.

When a class has too many responsibilities and too much going on, or when a class is unnecessary and doing nothing in the application, you can move its code to another class and remove it from the existing class entirely. The extract method itself involves moving a fragment or block of code out of its existing method and into a newly created method, which is clearly named in order to explain its function.
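A small Java sketch of the extract method, with a hypothetical InvoicePrinter class: the banner logic is moved out of render() into its own clearly named method.

```java
// Extract method: the banner string that used to be built inline in render()
// now lives in its own method whose name explains what it does.
class InvoicePrinter {
    String render(String customer, double amount) {
        return banner() + "\nname: " + customer + "\namount: " + amount;
    }

    // Extracted from render(): the banner logic now has a descriptive name.
    private String banner() {
        return "**** Customer Invoice ****";
    }
}

public class ExtractMethodDemo {
    public static void main(String[] args) {
        System.out.println(new InvoicePrinter().render("Asha", 42.0));
    }
}
```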

Conclusion

Engineers are ultimately responsible for writing good-quality code. We should all make it a habit to write good code from the very beginning. Writing clean code isn’t complicated, and doing so helps both you and your colleagues. Clean, well-organized code is always easy to change, easy to understand, and easy to maintain.

Clean Code: A Handbook of Agile Software Craftsmanship by Robert C. Martin

Even bad code can function. Every year, countless hours and significant resources are lost because of poorly written code but it doesn't have to be that way.

Refactoring: Improving the Design of Existing Code by Martin Fowler

Any fool can write code that a computer can understand. Good programmers write code that humans can understand.


Quick Tips

  • Always perform code refactoring in small chunks, making your code slightly better while leaving the application in a working state. Run your JUnit tests after each small change in the refactoring process; without running these tests, you risk introducing new bugs.

  • Do not create any new features or functionality during the refactoring process. You should refactor the code before adding any updates or new features into your existing code.

  • The refactoring process should always be followed by a complete regression run, so don't forget to involve your QA team in the process.

Wednesday, October 20, 2021

Understand and Analyze Java Thread Dump

Thread Dump Image
Photo by Mel Poole

Microservices

Also known as the microservices architecture, it is an architectural style that structures an application as a collection of services that are:
  • Easily Maintainable and Testable
  • Loosely Coupled
  • Independently Deployable
  • Organized around Business Capabilities
  • Owned by a Small Team

The microservices architecture enables the rapid, frequent and reliable delivery of large, complex applications. It also enables an organization to evolve its technology stack.

The decentralization of business logic increases flexibility and, most importantly, decouples the dependencies between components, which is one of the major reasons why many companies are moving from a monolithic architecture to a microservices architecture.


What is a Thread?

Most of us have written a program that displays "Hello World!!" or checks whether a given word is a palindrome. These are sequential programs: they have a beginning, an execution sequence, and an end, and at any given point during execution there is a single point of execution.

A single thread is similar: it has a beginning, an execution sequence, and an end. However, a thread is not a program in itself; it cannot run on its own and instead runs within a program.

A program can consist of many lightweight processes called threads. The real excitement surrounding threads is not about a single sequence: threads help achieve parallelism, where a program is divided into multiple threads, resulting in better performance. All threads within a process share the same memory space and may depend on each other in some cases.


Lifecycle of a Thread

To understand a thread dump in detail, it is essential to know all the states a thread passes through during its lifecycle. A thread can be in one of the following states at any given point in time:

NEW

The initial state of a thread, when we create an instance of the Thread class. It remains in this state until the program starts the thread.

RUNNABLE

The thread becomes runnable after a new thread is started. A thread in this state is considered to be executing its task.

BLOCKED

A thread is in the blocked state when it tries to access an object that is currently locked by some other thread. When the locked object is unlocked and hence available for the thread, the thread moves back to the runnable state.

WAITING

A thread transitions to the waiting state while waiting for another thread to perform a task and transitions back to the runnable state only when another thread signals the waiting thread to resume execution.

TIMED_WAITING

A thread in the timed waiting state is waiting for another thread to do some work for up to a specified interval of time, and it transitions back to the runnable state when that interval expires.

TERMINATED

A runnable thread enters the terminated state after it finishes its task.
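These states can be observed directly with the standard Thread.getState() API. This small sketch walks one thread through NEW, TIMED_WAITING, and TERMINATED:

```java
// Observing thread states with the standard java.lang.Thread API.
public class ThreadStates {
    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(500); // keeps the thread in TIMED_WAITING
            } catch (InterruptedException ignored) { }
        });

        System.out.println(t.getState()); // NEW: created but not yet started
        t.start();
        Thread.sleep(100);                // give it time to reach sleep()
        System.out.println(t.getState()); // typically TIMED_WAITING while in sleep()
        t.join();
        System.out.println(t.getState()); // TERMINATED: run() has finished
    }
}
```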


Thread Dumps

A thread dump contains a snapshot of all the threads active at a particular point during the execution of a program. It contains all relevant information about the thread and its current state.

Modern application development involves many threads. Each thread requires certain resources and performs certain tasks related to the program. This can boost the performance of an application, as threads can utilize the available CPU cores. But there are trade-offs: sometimes multiple threads do not coordinate well with each other, and depending on the program, a deadlock may arise. So, if something goes wrong, we can use thread dumps to identify the state of our threads.

As Java is among the most popular languages for application development, let's consider an application built with Spring Boot. To take a snapshot of the application's threads, we take a thread dump. A JVM thread dump is a listing of the state of all threads that are part of the process at that particular point in time. It contains information about each thread’s stack along with other important details. The dump is in plain text, so the contents can be saved, and analysis can be done either manually or using one of the available UI tools.

Analysis of thread dumps can help in the following areas:
  • Tweak JVM performance
  • Tweak application performance
  • Identify thread-related problems within the application

Now we know the basics of threads and their life cycle. Let's get to the next stage, where we will explore how to take a thread dump from a running Java application.

There are multiple ways to take a thread dump. I am going to discuss some JVM-based tools that can be executed from the CLI, along with GUI tools.

Java Stack Trace

One of the easiest ways to generate a thread dump is with jstack. jstack is a utility that ships with the JDK; it is used from the CLI and expects the PID of the process for which we want to generate the thread dump.

jstack -l 1129 > thread_dump.txt


Java Command

jcmd is a command-line utility that ships with the JDK and is used to send diagnostic command requests to the JVM. These requests are useful for controlling Java Flight Recordings and for troubleshooting and diagnosing the JVM and Java applications. It must be used on the same machine where the JVM is running, with the same effective user and group identifiers that were used to launch the JVM.

We can use jcmd's Thread.print command to get a thread dump for a particular process, specified by its PID.

jcmd 1129 Thread.print > thread_dump.txt
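Besides external tools like jstack and jcmd, a JVM can also capture its own thread dump programmatically through the standard ThreadMXBean JMX API. This can be handy, for example, for exposing dumps from an admin endpoint:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Capturing a thread dump of the current JVM from inside the process itself.
public class ProgrammaticDump {
    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // dumpAllThreads(lockedMonitors, lockedSynchronizers) returns one
        // ThreadInfo per live thread; its toString() is a dump-like block
        // with name, state, and (a truncated) stack trace.
        for (ThreadInfo info : bean.dumpAllThreads(true, true)) {
            System.out.print(info);
        }
    }
}
```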


Java Console

The jconsole GUI is a monitoring tool that conforms to the Java Management Extensions (JMX) specification. It ships with the JDK and uses the extensive instrumentation of the Java VM to provide information about the performance and resource consumption of applications running on the Java platform.

Using jconsole, we can inspect each thread’s stack trace after connecting it to a running Java process. In the Threads tab, we can see the names of all running threads. To detect a deadlock, click Detect Deadlock at the bottom right of the window; if a deadlock is found, it appears in a new tab, otherwise "No Deadlock Detected" is displayed.

To launch the GUI tool, just type the below command on CLI.

jconsole


VisualVM

VisualVM is a GUI tool that helps us troubleshoot, monitor, and profile Java applications. It fits the needs of application developers, system administrators, quality engineers, and end users.

As it's an external program, you need to download and install it on your machine. The GUI is easy to use, and it lets you monitor and troubleshoot many aspects of Java applications.


Understanding Thread Dump Contents

Now, let’s see what we can explore using a thread dump. At first glance it contains a lot of information, but if we take it one step at a time, it can be fairly simple to understand.

1129:
2021-10-13 12:57:15
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.261-b12 mixed mode):

"Attach Listener" #142 daemon prio=9 os_prio=31 tid=0x00007f8dc7146000 nid=0x440b waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"http-nio-8080-Acceptor" #138 daemon prio=5 os_prio=31 tid=0x00007f8dc7fab800 nid=0x9c03 runnable [0x000070000c59f000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)

  • The thread dump entry shown above starts with the name of the thread, Attach Listener, whose ID is 142 (indicated by #142); this thread was created by the JVM after the application started.
  • The daemon keyword after the thread number indicates that it's a daemon thread, which means it will not prevent the JVM from shutting down if it is the last running thread.
  • After that come less important pieces of metadata about the thread, such as its priority, OS priority, thread identifier, and native identifier.
  • The last piece of information is the most important: the state of the thread and its address in the JVM. The thread can be in one of the states explained earlier in the thread life cycle.

Most of us may not want to analyze a thread dump as a plain text file; GUI tools are available for analyzing thread dumps as well.


Conclusion

Now you know what a thread dump is, how it can be generated, and how useful it is in understanding and diagnosing problems in multithreaded applications. With proper knowledge of thread dumps, their structure, and the information they contain, you can identify the root cause of problems quickly.


Monday, September 13, 2021

Redis Overview and Benchmark

What is Redis?
Image Courtesy Morioh

Redis, which stands for Remote Dictionary Server, is an open source in-memory data store used as a database and as a cache. Redis provides data structures such as strings, hashes, lists, sets, and sorted sets. Redis has built-in replication, Lua scripting, LRU eviction, transactions, and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.

Redis is an open source, advanced key-value store and an apt solution for building high-performance, scalable web applications.

Redis has three main features that set it apart from others:

  • Redis holds its database entirely in memory, using the disk only for persistence.
  • Redis has a relatively rich set of data types compared to many key-value data stores.
  • Redis can replicate data to any number of replicas.


Following are certain advantages of Redis:

  • Exceptionally fast − Redis is very fast and can perform about 110,000 SETs per second and about 81,000 GETs per second.
  • Supports rich data types − Redis natively supports most of the datatypes that developers already know such as list, set, sorted set, and hashes. This makes it easy to solve a variety of problems as we know which problem can be handled better by which data type.
  • Operations are atomic − All Redis operations are atomic, which ensures that if two clients access the same key concurrently, the Redis server will always serve the latest updated value.
  • Multi-utility tool − Redis is a multi-utility tool and can be used in a number of use cases such as caching, messaging-queues (Redis natively supports Publish/Subscribe), any short-lived data in your application, such as web application sessions, web page hit counts, etc.

Redis Monitoring

Availability − the Redis server responds to the PING command when it's running smoothly.

$ redis-cli -h 127.0.0.1 ping
PONG

Cache Hit Rate

This information can be calculated with the help of INFO command.

$ redis-cli -h 127.0.0.1 info stats | grep keyspace
keyspace_hits:1069963628
keyspace_misses:2243422165
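From these two counters, the cache hit rate is hits / (hits + misses). A tiny Java helper to compute it; plugging in the numbers from the INFO output above gives roughly a 32% hit rate:

```java
// Cache hit rate from Redis INFO stats: hits / (hits + misses).
public class HitRate {
    static double hitRate(long hits, long misses) {
        return (double) hits / (hits + misses);
    }

    public static void main(String[] args) {
        // Counters taken from the INFO output shown above (≈ 32%).
        System.out.printf("hit rate: %.1f%%%n",
                100 * hitRate(1069963628L, 2243422165L));
    }
}
```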

Workload Statistics

The first two stats describe connections and commands processed, while the last two describe bytes received and sent by the Redis server.

$ redis-cli -h 127.0.0.1 info stats | grep "^total"
total_connections_received:1687889
total_commands_processed:5602955422
total_net_input_bytes:198210899161
total_net_output_bytes:309040592973

Key Space

To find out the number of keys in the database at any time, use this command. A quick drop or spike in the size of the keyspace is a good indicator of issues.

$ redis-cli -h 127.0.0.1 info keyspace
# Keyspace
db0:keys=3857884,expires=277,avg_ttl=259237

Clear Keys

We can clear all the keys from the Redis, using the below command.

$ redis-cli -h 127.0.0.1
127.0.0.1:6379> flushall


How to Perform Redis Benchmark?

redis-benchmark is a utility to check the performance of Redis by simulating a configurable number of clients issuing commands simultaneously.

redis-benchmark [option] [option value]

Option  Description
-h      Server host name (default 127.0.0.1)
-p      Server port (default 6379)
-c      Number of parallel connections (default 50)
-n      Total number of requests (default 100000)
-d      Data size of SET/GET value in bytes (default 3)
-r      Use random keys for SET/GET/INCR
-q      Quiet mode; shows only the query/sec values
-l      Loop; run the tests forever
-t      Only run the comma-separated list of tests
--csv   Output in CSV format

$ redis-benchmark -h 127.0.0.1 -n 100000 -q
PING_INLINE: 57306.59 requests per second
PING_BULK: 57273.77 requests per second
SET: 56657.22 requests per second
GET: 57012.54 requests per second
INCR: 57240.98 requests per second
LPUSH: 57045.07 requests per second
RPUSH: 56657.22 requests per second
LPOP: 57142.86 requests per second
RPOP: 57175.53 requests per second
SADD: 56369.79 requests per second
HSET: 55679.29 requests per second
SPOP: 54704.60 requests per second
LPUSH (needed to benchmark LRANGE): 52798.31 requests per second
LRANGE_100 (first 100 elements): 35448.42 requests per second
LRANGE_300 (first 300 elements): 17618.04 requests per second
LRANGE_500 (first 450 elements): 12812.30 requests per second
LRANGE_600 (first 600 elements): 10036.13 requests per second
MSET (10 keys): 47281.32 requests per second


Wednesday, August 4, 2021

Retry or Stability Pattern

Retry or Stability Pattern Image
Photo by Brett Jordan

One of the key characteristics of a microservices architecture is inter-service communication. We can split a monolithic application into multiple smaller applications called microservices. Each microservice is responsible for a single feature or domain and can be deployed, scaled, and maintained independently.

Since microservices are distributed in nature, various things can go wrong at any point in time. The network over which we access other services, or the services themselves, can fail. There can be intermittent network connectivity errors or firewall issues. Individual services can fail due to unavailability, coding issues, out-of-memory errors, deployment failures, hardware failures, and so on. To make our services resilient to these failures, we adopt the retry pattern, which is also known as a stability pattern.

Retry Pattern

The idea behind the retry pattern is quite simple. If service A makes a call to service B and receives an unexpected response for a request, then service A will send the same request to service B again hoping to get an expected response.

Retry Pattern Image
Retry Pattern Representation

There are several retry strategies that can be applied depending on the failure type or nature of the requirements.

Immediate Retry

This is the most basic strategy: the calling service handles the unexpected failure and immediately makes the request again. It can be useful for unusual failures that occur only intermittently, where the chances of success on a simple retry are high.

Retry After Delay

In this strategy, we introduce a delay before retrying the service call, hoping that the cause of the fault will have been rectified. Retry after delay is an appropriate strategy when a request times out due to a busy service or a network-related issue.

Sliding Retry

In this strategy, the service keeps retrying the call, adding an incremental time delay on each subsequent attempt. For example, the first retry may wait 500 ms, the second 1000 ms, and the third 1500 ms, until the retry limit is reached. By adding an increasing delay, we reduce the number of retries to the service and avoid adding load to a service that is already overloaded.

Retry with Exponential Backoff

In this strategy, we take the sliding retry strategy and ramp the delay up exponentially. If we started with a 500 ms delay, we would retry after 1000 ms, then 2000 ms, doubling each time. Here we give the service progressively more time to recover before we invoke it again.

Abort Retry

As we know, the retry process can't go on forever. We need a threshold on the maximum number of retry attempts for a failed service call: we maintain a counter, and when it reaches the threshold, the best strategy is to abort the retry process and let the error propagate to the calling service.
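The strategies above can be combined into one small helper: a retry loop with exponential backoff and a maximum attempt count, after which the error propagates. This is only a sketch; the withRetry helper and its parameters are made up for illustration.

```java
import java.util.concurrent.Callable;

public class Retry {
    // Exponential backoff with a maximum attempt count (abort retry).
    static <T> T withRetry(Callable<T> call, int maxAttempts, long initialDelayMs)
            throws Exception {
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts) throw e; // abort: propagate the error
                Thread.sleep(delay);                 // back off before retrying
                delay *= 2;                          // exponential growth
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated flaky service: fails twice, then succeeds.
        String result = withRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("transient failure");
            return "ok";
        }, 5, 100);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```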

Conclusion

The retry pattern allows the calling service to retry failed attempts with a hope that the service will respond within an acceptable time.

With the varying interval between retries we provide the dependent service more time to recover and respond for our request.

It is recommended to keep track of failed operations, as this is very useful information for finding recurring errors, and to plan the required infrastructure such as thread pools and threading strategy.

At some point, we just need to abort the retry, acknowledge that the service is not responding, and notify the calling service with an error.


Monday, July 19, 2021

Understanding Load Balancer

Load Balancer Image
Photo by Jon Flobrant

A load balancer is an important component of any distributed system. It helps to distribute the client requests within a cluster of servers to improve the responsiveness and availability of applications or websites.

It distributes workloads uniformly across servers or other compute resources to optimize network efficiency, reliability, and capacity. Load balancing is performed by an appliance, either physical or virtual, that identifies in real time which server (or pod, in the case of Kubernetes) in a pool can best serve a given client request, while ensuring heavy network traffic doesn't overwhelm any single server or pod. Another important task of a load balancer is to carry out continuous health checks on servers or pods to ensure they can handle requests. It makes better use of system resources by balancing user requests and helps keep the service highly available.

Load Balancer Image
Reverse Proxy/Load Balancer Communication Flow

During system design, horizontal scaling is a very common strategy for scaling a system when the user base is huge. It also improves the overall throughput of the application or website: latency is reduced because requests are not blocked, and users do not have to wait for their requests to be processed and served.

Availability is a key characteristic of any distributed system. In the case of a full server failure, there is no impact on the user experience, as the load balancer simply sends the client request to a healthy server. Instead of a single resource taking the entire load, the load balancer ensures that several resources each perform a bearable amount of work.


Categories of Load Balancer

Layer 4 Category Load Balancer

Layer 4 load balancers distribute traffic based on transport-layer data, such as IP addresses and Transmission Control Protocol (TCP) port numbers. Examples: the Network Load Balancer in AWS and the Internal Load Balancer in GCP.

Layer 7 Category Load Balancer

Layer 7 load balancers make routing decisions based on application-level characteristics, including HTTP header information or the actual contents of the message, such as URLs, cookies, etc. Examples: the Application Load Balancer in AWS and the Global Load Balancer in GCP.


Types of Load Balancing

Hardware Load Balancing Type

Vendors of hardware-based solutions load proprietary software onto the machines they provide, which often use specialized components or resources. To handle increasing traffic to the application or website, one has to buy additional specific hardware from the vendor. Example: the F5 load balancer from F5 Networks.

Software Load Balancing Type

Software solutions generally run on regular hardware, making them economical and more flexible. You can install the software on the hardware of your choice or in cloud environments like AWS, GCP, Azure etc.


Load Balancing Techniques

There are various types of load balancing methods and every type uses different algorithms for distributing the requests. Here is a list of load balancing techniques:

Random Selection

As the name says, servers are selected at random, with no other factors considered in the selection. This method can cause a problem where some servers get overloaded with requests while others sit idle.

Round Robin

One of the most commonly used load balancing methods. The load balancer redirects incoming traffic to a set of servers in a fixed order. With three application servers, the first request goes to App Server 1, the second to App Server 2, and so on; when the load balancer reaches the end of the server list, it starts over from App Server 1. This balances the traffic almost evenly between the servers. All servers need to have the same specification for this method to work well; otherwise, a low-specification server may receive the same load as a high-capacity server.
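A minimal thread-safe round-robin selector in Java might look like this; the RoundRobin class and server names are hypothetical:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Round-robin selection: each call hands out the next server in order,
// wrapping back to the first one after the end of the list.
public class RoundRobin {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobin(List<String> servers) { this.servers = servers; }

    String pick() {
        // floorMod keeps the index valid even if the counter overflows.
        int i = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(i);
    }

    public static void main(String[] args) {
        RoundRobin lb = new RoundRobin(List.of("app1", "app2", "app3"));
        for (int i = 0; i < 4; i++) {
            System.out.println(lb.pick()); // app1, app2, app3, app1
        }
    }
}
```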

Weighted Round Robin

It's a bit more complex than round robin, as this method is designed to handle servers with different characteristics. A weight is assigned to each server in the configuration: an integer value that varies according to the server's specification. Higher-specification servers get more weight, which is the key parameter for traffic redirection.

Least Response Time

This algorithm sends the client requests to the server with the least active connections and the lowest average response time. The backend server that responds the fastest receives the next request.

Least Connections

In this method, the traffic redirection happens based on the server with the least number of active connections.

IP Hash

In this method, a hash of the source (client) IP address is generated and used to select a server. Once a server is allocated, the same server is used for that client’s subsequent requests. This makes it a kind of sticky routing: a client's requests are sent to the same server regardless of how busy that server is. In some use cases, this method comes in very handy and can even improve performance.
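A simplified sketch of IP-hash selection in Java; real load balancers use more robust hash functions (and often consistent hashing so that pool changes move fewer clients), but the idea is the same:

```java
// IP-hash selection: the same client IP always maps to the same server index.
public class IpHash {
    static int serverFor(String clientIp, int poolSize) {
        // floorMod guards against a negative hashCode().
        return Math.floorMod(clientIp.hashCode(), poolSize);
    }

    public static void main(String[] args) {
        System.out.println(serverFor("203.0.113.7", 3));
        System.out.println(serverFor("203.0.113.7", 3)); // same server every time
    }
}
```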


Conclusion

Availability is a key characteristic of a distributed system. In the case of a single server failure, the end-user experience is not affected, as the load balancer simply sends the client request to another healthy server.

While designing a distributed system, one of the important tasks is to choose the load balancing strategy according to the application or website requirements.

HAProxy (High Availability Proxy) is open source proxy and load balancing server software. It provides high availability at the network (TCP) and application (HTTP/S) layers, improving speed and performance by distributing workload across multiple servers.

Nginx is a very efficient HTTP load balancer to distribute traffic to several application servers and to improve performance, scalability and reliability of web applications.


