Architecture Practice for a High-Capacity, Highly Available Cloud Call Center Platform
The core value of a call center is connecting people with services. As the Internet continues to transform traditional industries, many interactive scenarios that bridge online and offline have emerged, such as ordering meals, food delivery, and hotel booking. The simplest and most efficient tool for linking the online and offline information chains is the telephone call. As a result, the call center has evolved. It used to provide only customer service and marketing. It is now a channel through which enterprises communicate with their customers, and it is deeply and comprehensively integrated with enterprise business processes. The Tianrun RONGTONG cloud call center is an open call center capability platform. It lets enterprises build low-cost, highly reliable voice services through a very simple API or SDK.
Combining an open voice platform with scenario-based applications places much higher demands on the capacity and stability of a cloud call center platform. How can it meet customers' elastic business needs and cope with peak hours? Below, we use a meal-ordering business model as an example to discuss how the cloud call center is architected to meet these demands.
A takeout business model
Takeout business flow chart
Ordering traffic peaks every day at 11:30 (lunch) and 17:00 (dinner), and it is extremely uneven across the day.
Design principles
At the outset of designing the Intelligent Cloud call center platform, we set the following architectural principles based on the business needs of the platform's customers:
1. The platform architecture should be built on open, mature cloud IaaS services.
2. When designing an architecture in the cloud, be pessimistic and assume that everything will fail. The architecture must therefore be designed, implemented, and deployed for automated fault recovery. Every module of the platform must use a high-availability (HA) architecture, with no single points of failure.
3. Cloud IaaS services and the IDC machine rooms are connected by DX dedicated lines to form a hybrid cloud architecture.
4. The distributed architecture must be easy to expand and must support automatic elastic scaling.
5. Coupling between the platform's modules should be minimized so that the business can evolve rapidly.
6. Platform operations should be built around business monitoring, logging, and statistics.
7. High availability across machine rooms.
8. A complete protection mechanism, with self-protection and service-degradation capabilities.
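Principle 8 (self-protection and service degradation) can be illustrated with a token-bucket rate limiter. This is a common technique chosen for this sketch; the source does not say which mechanism the platform actually uses. Requests within the configured rate get full service. Excess requests get a cheap degraded response instead of overloading the backend. `TokenBucket`, `handle_call`, and the `"busy-prompt"` fallback are hypothetical names.

```python
import time
from typing import Callable


class TokenBucket:
    """Admits `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def handle_call(bucket: TokenBucket, full_service: Callable[[], str],
                degraded: Callable[[], str]) -> str:
    # Self-protection: over capacity, serve a degraded response instead of failing.
    if bucket.try_acquire():
        return full_service()
    return degraded()


# Demo with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate=2.0, capacity=2.0, clock=lambda: fake_now[0])
burst = [handle_call(bucket, lambda: "routed", lambda: "busy-prompt") for _ in range(3)]
fake_now[0] = 1.0  # one second later, two tokens have been refilled
after_refill = handle_call(bucket, lambda: "routed", lambda: "busy-prompt")
print(burst, after_refill)  # ['routed', 'routed', 'busy-prompt'] routed
```

Injecting the clock makes the limiter testable. In production, the default `time.monotonic` is used so that wall-clock adjustments cannot distort the refill rate.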
Putting it into practice
Building the system cluster on the strengths of the cloud
A cloud-based cluster architecture has clear business advantages. These include near-zero start-up cost, a flexible pay-as-you-go resource model, and the ability to expand capacity and bring it online quickly.
A cloud platform architecture also has clear technical advantages. It enables automated build and deployment and automatic scaling without human intervention. It also allows testing to be injected continuously into every stage of development, which improves predictability.
The Tianrun RONGTONG Intelligent Cloud call center platform runs on a hybrid cloud formed by AWS/Alibaba Cloud plus DX direct connections to its IDC. This lets it take advantage of the cloud while remaining compatible with special-purpose applications, so that platform operations can be switched online seamlessly. On the network side, dedicated lines connect the core machine room and the landing machine rooms into a loop. If any dedicated line fails, network-wide scheduling reroutes traffic through the other dedicated lines or the public interconnection, so normal business operation is not affected.
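The loop topology matters because a ring has two disjoint paths between any two nodes, so a single line failure always leaves a route. The sketch below models machine rooms as graph nodes and dedicated lines as edges. It then uses breadth-first search to find the fewest-hop path over the healthy lines. The node names `core-idc`, `aws`, `aliyun`, and `landing-idc` are hypothetical. In a real network, this is done by routing protocols and the carriers' network scheduling, not by application code.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Link = Tuple[str, str]


def build_graph(links: List[Link], failed: Set[Link]) -> Dict[str, List[str]]:
    """Adjacency list over healthy links only (links are bidirectional)."""
    graph: Dict[str, List[str]] = {}
    for a, b in links:
        if (a, b) in failed or (b, a) in failed:
            continue  # this dedicated line is down
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def find_path(links: List[Link], failed: Set[Link], src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first search for the fewest-hop path; None if dst is unreachable."""
    graph = build_graph(links, failed)
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:  # walk predecessors back to the source
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical four-site loop: core IDC, AWS, Alibaba Cloud, landing IDC.
RING = [("core-idc", "aws"), ("aws", "landing-idc"),
        ("landing-idc", "aliyun"), ("aliyun", "core-idc")]
normal = find_path(RING, set(), "core-idc", "landing-idc")
rerouted = find_path(RING, {("aws", "landing-idc")}, "core-idc", "landing-idc")
print(normal)    # ['core-idc', 'aws', 'landing-idc']
print(rerouted)  # ['core-idc', 'aliyun', 'landing-idc']
```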
Highly available cluster architecture diagram
Building a high-capacity, highly available system on IaaS cloud services
For basic IaaS cloud services, AWS and Alibaba Cloud do not differ much. The following uses AWS as an example to show how to build a high-capacity, highly available system on top of basic IaaS services.
The current Intelligent Cloud call center platform architecture is built on three tiers of basic services provided by AWS:
AWS cloud platform component services
Tier 1: basic compute, storage, and networking components, including EC2, S3, EBS, VPC, and DX. AWS designs S3 for eleven 9s (99.999999999%) of durability. The DX dedicated connection uses two 1 Gbps direct connections that back each other up to ensure network performance.
Tier 2: application components such as the highly available RDS database, caching, SNS, and SQS. These support cross-machine-room high availability and flexible capacity expansion. A Redis cache is used for real-time processing to reduce pressure on the database. SQS is used extensively for asynchronous processing to achieve peak shaving and valley filling.
Tier 3: the application layer, including the ELB load balancer, Auto Scaling, and comprehensive monitoring and logging services. To begin with, all modules of the system are stateless. Auto Scaling combines the current load collected through ELB with the scaling policy to adjust the number of EC2 instances dynamically. At business peaks, many instances are launched to carry the traffic. Off-peak, instances are removed to reduce cost.
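An Auto Scaling decision can be approximated by a proportional target-tracking rule: choose the instance count that would bring the per-instance load metric back to its target. The formula below is the one documented for the Kubernetes Horizontal Pod Autoscaler. AWS target tracking follows the same idea, but its exact algorithm is internal to AWS, so treat this as a hypothetical sketch. `desired_capacity` and the CPU-percentage figures are illustrative.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Proportional target tracking: desired = ceil(current * metric / target).

    `metric` is the current average per-instance load (e.g. CPU %), and
    `target` is the level the policy tries to hold it at.
    """
    if current <= 0:
        return min_size  # nothing running yet: start at the floor
    desired = math.ceil(current * metric / target)
    return max(min_size, min(max_size, desired))  # clamp to the group's limits


scale_out = desired_capacity(10, metric=90.0, target=60.0, min_size=2, max_size=50)
scale_in = desired_capacity(10, metric=15.0, target=60.0, min_size=2, max_size=50)
capped = desired_capacity(40, metric=90.0, target=60.0, min_size=2, max_size=50)
print(scale_out, scale_in, capped)  # 15 3 50
```

Production autoscalers also apply cooldowns or stabilization windows. These prevent brief spikes from causing instances to be launched and terminated repeatedly.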
In platform architecture design, we must accept that failure and failover are part of the system architecture. The fault-tolerant architecture provided by cloud environments such as AWS and Alibaba Cloud greatly reduces the complexity of operations and maintenance; in effect, the cloud environment handles this part of the architecture. The platform software must be designed for failover in the same way the underlying hardware is designed for failure. For example, if a module goes down, what happens to the applications on the platform? How should interface request timeouts or exceptions be handled? What if a sudden burst of requests exceeds the system's capacity?
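One common answer to interface timeouts and exceptions has three parts. Bound every remote call with a timeout, retry a few times with exponential backoff, and fall back to a degraded result when retries are exhausted. The sketch below uses Python's standard `concurrent.futures`. `call_with_retry`, `flaky_tts`, and the `"cached-prompt"` fallback are hypothetical, and the source does not specify the platform's actual retry policy.

```python
import concurrent.futures
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Shared worker pool so each remote call can be bounded by a timeout.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def call_with_retry(fn: Callable[[], T], fallback: Callable[[], T],
                    timeout: float = 0.5, attempts: int = 3,
                    base_delay: float = 0.05) -> T:
    """Call fn with a per-attempt timeout, retrying with exponential backoff.

    Returns fallback() if every attempt times out or raises.
    Note: Python cannot cancel a running thread, so a timed-out call keeps
    running in the background; real clients should also set socket timeouts.
    """
    for attempt in range(attempts):
        future = _pool.submit(fn)
        try:
            return future.result(timeout=timeout)
        except Exception:  # a timeout, or any error raised inside fn
            if attempt < attempts - 1:
                time.sleep(base_delay * (2 ** attempt))  # delays grow 1x, 2x, 4x, ...
    return fallback()


# Hypothetical upstream that fails twice, then recovers.
calls = {"n": 0}


def flaky_tts() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("upstream unavailable")
    return "audio-bytes"


def always_down() -> str:
    raise ConnectionError("upstream down")


ok = call_with_retry(flaky_tts, fallback=lambda: "cached-prompt", base_delay=0.01)
degraded = call_with_retry(always_down, fallback=lambda: "cached-prompt", base_delay=0.01)
print(ok, degraded)  # audio-bytes cached-prompt
```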
Our approach is based on the SOA (service-oriented architecture) concept. The key to building components is reducing the dependencies between them. If one component hangs, fails to respond, or responds too slowly, the other components in the system should continue to work. Components should be as independent of each other as possible, and the interfaces between them should use message queues for asynchronous interaction. This way, even if some functions are temporarily unavailable, the system as a whole keeps running. When the faulty component recovers, it can use the data retained in the message queue to restore its running state.
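A small simulation shows the effect of putting a message queue between components. Producers enqueue messages every tick regardless of the consumer's state. The consumer drains up to a fixed capacity per tick while it is up. A `deque` stands in for a durable queue such as SQS, and `simulate` and all traffic numbers are hypothetical. The first run shows peak shaving: a burst above capacity is absorbed and drained later. The second run shows a consumer outage, after which the backlog is processed on recovery.

```python
from collections import deque
from typing import Deque, List


def simulate(arrivals: List[int], capacity: int, consumer_up: List[bool]) -> List[int]:
    """Return the queue backlog at the end of each tick.

    Producers enqueue arrivals[t] messages per tick whether or not the consumer
    is healthy; while up, the consumer processes at most `capacity` per tick.
    """
    queue: Deque[int] = deque()
    backlog: List[int] = []
    next_id = 0
    for t, n in enumerate(arrivals):
        for _ in range(n):
            queue.append(next_id)  # the producer never waits on the consumer
            next_id += 1
        if consumer_up[t]:
            for _ in range(min(capacity, len(queue))):
                queue.popleft()  # "process" the oldest message (FIFO)
        backlog.append(len(queue))
    return backlog


# Peak shaving: a burst of 30/tick against a consumer that handles 15/tick.
peak_backlog = simulate([5, 5, 30, 30, 5, 5, 5, 5], capacity=15, consumer_up=[True] * 8)
# Fault isolation: the consumer is down for two ticks, then drains the backlog.
outage_backlog = simulate([5] * 6, capacity=15,
                          consumer_up=[True, False, False, True, True, True])
print(peak_backlog)    # [0, 0, 15, 30, 20, 10, 0, 0]
print(outage_backlog)  # [0, 5, 10, 0, 0, 0]
```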
Following the SOA concept, we decoupled and split out a large number of ecosystem subsystems. API calls between them form a complete functional ecosystem chain. Examples include the NOS management center, the BOSS accounting center, the NMC code center, the TTS proxy speech synthesis center, and the SMSC SMS platform. The overall architecture is shown in the figure below:
Overall architecture diagram
Beyond the decoupling and microservice-oriented splitting at the ecosystem level, the core switching platform of the Intelligent Cloud call center has itself been split into many micro-modules. There are 25 subsystems in total, and the main ones are as follows:
All of the above subsystems implement stateless logic and achieve high availability and high performance through clustering. The key points of the implementation are:
1. Provide unified interface services to the upper layer, with interface versions that can be extended flexibly.
2. nfdb and cachedb are completely separated. Real-time services do not depend on the configuration database and use only the high-performance cache database.
3. Storage for very large volumes of data is completely separated from runtime data storage. Cloud object storage and NoSQL databases are used to store and process massive data.
4. During Auto Scaling, each new instance bootstraps itself by asking the control service: who am I, and what should I do? This minimizes human deployment errors and creates a self-healing environment.
5. The open-source Dubbo framework is used for automated service management.
6. A complete monitoring service must be in place.
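Key point 4 describes instances that bootstrap themselves by asking a control service who they are. A minimal sketch of that handshake follows. `ControlService`, `Assignment`, `bootstrap`, the group names, and the instance ID are all hypothetical. On EC2, an instance would typically learn its own ID from the instance metadata service before calling such a service.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Assignment:
    role: str            # what the instance should run
    config_version: str  # which configuration snapshot to load


class ControlService:
    """Hypothetical control service: maps a launch group to a role and config."""

    def __init__(self, roles_by_group: Dict[str, Assignment]) -> None:
        self._roles = roles_by_group
        self.registered: Dict[str, Assignment] = {}

    def who_am_i(self, instance_id: str, group: str) -> Assignment:
        assignment = self._roles[group]  # KeyError for an unknown group
        self.registered[instance_id] = assignment  # track live instances
        return assignment


def bootstrap(instance_id: str, group: str, control: ControlService) -> str:
    # "Who am I? What should I do?": nothing role-specific is baked into the image.
    a = control.who_am_i(instance_id, group)
    return f"{instance_id} starting {a.role} with config {a.config_version}"


control = ControlService({
    "media-asg": Assignment("media-server", "v42"),
    "acd-asg": Assignment("acd-router", "v17"),
})
msg = bootstrap("i-0abc123", "media-asg", control)
print(msg)  # i-0abc123 starting media-server with config v42
```

The role comes from the control service rather than a hand-edited config file, so every instance in a group starts identically. That is what makes replacement by Auto Scaling safe.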
Module architecture diagram of the core switching platform
Security mechanisms of the cloud services
Security is extremely important in the cloud era. The architecture of the Tianrun RONGTONG Intelligent Cloud call center platform includes a triple backup mechanism. First, on the AWS cloud platform, the data centers in AWS machine rooms A and B run active-active. Second, business data is kept in hot standby in the core machine room. If AWS suffers a global outage, the business is switched to the core machine room immediately so that service continues. Third, data is backed up offline (cold backup) to ensure it can always be recovered.
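The failover order of the triple backup mechanism can be expressed as a small decision function. Both AWS machine rooms serve traffic while they are healthy. The hot-standby core machine room takes over only when both are down, and beyond that the data is restored from cold backup. The site names are hypothetical. The sketch also ignores what a real switchover involves, such as DNS or routing changes and data replication lag.

```python
from typing import Dict, List


def route_traffic(health: Dict[str, bool]) -> List[str]:
    """Return the sites that should receive traffic.

    Active-active: both AWS machine rooms serve while healthy; if neither is,
    fail over to the hot-standby core machine room.
    """
    active = [site for site in ("aws-a", "aws-b") if health.get(site, False)]
    if active:
        return active
    if health.get("core-idc", False):
        return ["core-idc"]
    return []  # nothing healthy: recover from the offline cold backup


print(route_traffic({"aws-a": False, "aws-b": False, "core-idc": True}))  # ['core-idc']
```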
The security architecture goes beyond technical defenses against SQL injection, web vulnerabilities, brute-force attacks, and similar threats. A range of additional measures also provide protection, including an external intrusion detection system, WAF protection, network firewalls, and internal account-permission management and auditing.
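The standard defense against the SQL injection mentioned above is the parameterized query. The database driver binds user input as a value instead of splicing it into the SQL text. A self-contained illustration using Python's built-in `sqlite3` follows; the `agents` table and the payload are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agents (login TEXT PRIMARY KEY, tenant TEXT)")
conn.executemany("INSERT INTO agents VALUES (?, ?)", [("alice", "t1"), ("bob", "t2")])


def find_agent_unsafe(login: str):
    # VULNERABLE: user input becomes part of the SQL statement itself.
    return conn.execute(f"SELECT login, tenant FROM agents WHERE login = '{login}'").fetchall()


def find_agent_safe(login: str):
    # Safe: the driver binds `login` as a value, never as SQL syntax.
    return conn.execute("SELECT login, tenant FROM agents WHERE login = ?", (login,)).fetchall()


payload = "x' OR '1'='1"
print(find_agent_unsafe(payload))  # every row leaks
print(find_agent_safe(payload))    # []
```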
Practical achievements
Thanks to Tianrun RONGTONG's high-capacity, highly available platform architecture, the cloud call center can match, and even surpass, the performance of traditional call center systems built around dedicated hardware and computers. This overturns the common impression that a cloud call center can only serve small customers. The specific results are as follows:
1. Solving high-capacity concurrency
Basic metrics: call concurrency capacity exceeds 10,000 lines; more than 20,000 concurrent agent seats; CPS (calls processed per second) capacity is between; a single platform supports up to 1,000 tenants; call response time is under 1 second; 2 million call minutes are handled every day; average TTS response time is under 1 second; message response time is under 1 second; recordings are available less than 1 minute after a call ends; and 800 GB of recordings (after compression) are processed every day.
2. Solving platform high availability: no single points of failure, load balancing across machine rooms, and very high stability.
3. Elastic scalability solves the problem of business peaks.
4. A complete ecosystem solves the problem of operating costs.
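The daily figures above can be cross-checked with simple arithmetic. By Little's law, average concurrency equals arrival rate times mean call duration. Over a full day, this reduces to total call-minutes divided by the 1,440 minutes in a day. The bitrate line relies on two hypothetical assumptions that the source does not state: every call minute is recorded, and GB means 10^9 bytes.

```python
MINUTES_PER_DAY = 24 * 60

daily_call_minutes = 2_000_000       # from the metrics above
daily_recording_bytes = 800 * 10**9  # 800 GB compressed; assumes decimal GB
concurrency_ceiling = 10_000         # lines, from the metrics above

# Little's law: average concurrent calls = arrival rate x mean duration,
# which over a whole day reduces to total call-minutes / minutes in a day.
avg_concurrency = daily_call_minutes / MINUTES_PER_DAY

# Headroom of the concurrency ceiling over the daily average.
peak_to_avg_headroom = concurrency_ceiling / avg_concurrency

# Implied compressed recording bitrate, assuming every call minute is recorded.
bitrate_kbps = daily_recording_bytes * 8 / (daily_call_minutes * 60) / 1000

print(f"{avg_concurrency:.0f} {peak_to_avg_headroom:.1f} {bitrate_kbps:.1f}")  # 1389 7.2 53.3
```

The average of roughly 1,389 concurrent calls sits well below the 10,000-line ceiling, about 7.2 times headroom. That leaves room for traffic that peaks as sharply as the lunch and dinner spikes described earlier.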
With its high-capacity, highly available Intelligent Cloud call center platform, Tianrun RONGTONG has earned recognition from customers across many industries. The fast, flexible, and scalable cloud model is also better suited to future technology and business growth, so call center capacity can keep growing.
Copyright © 2011 JIN SHI