Otrs6 with periods of CPU at 100%

Moderator: crythias

Post Reply
jlima
OTRS newbie
Posts: 34
Joined: 03 Jul 2014, 13:48
OTRS Version?: 6.0.18
Real Name: Jorge Lima
Location: Braga, Portugal

Otrs6 with periods of CPU at 100%

Post by jlima » 31 Oct 2019, 20:36

Hello

I have the same problem as described here (viewtopic.php?f=53&p=166163#p166163) , sudden 100% CPU spikes on all four CPUs that can last for 15min.
As you can see, I have several Daemon.pl as well as /opt/otrs/bin/c -DFOREGROUND processes choking the server.

I don't know what they're doing nor how to dig it out.

/var/log/httpd/error_log has several entries like these but I don't know if there's any connection
Use of uninitialized value in concatenation (.) or string at /opt/otrs//Kernel/Modules/AgentTicketSearch.pm line 1306.
Use of uninitialized value in concatenation (.) or string at /opt/otrs//Kernel/Modules/CustomerTicketZoom.pm line 586.
Use of uninitialized value $CustomerData{"UserFullname"} in concatenation (.) or string at /opt/otrs//Kernel/Output/PDF/Ticket.pm line 258.
Use of uninitialized value in concatenation (.) or string at /opt/otrs//Kernel/Modules/AgentTicketQueue.pm line 190.

Any help would be greatly appreciated.

Thanks,
Jorge

Image
OTRS 6.0.18 (public/testing) on CentOS with Postgres database

root
Moderator
Posts: 1895
Joined: 18 Dec 2007, 12:23
OTRS Version?: 4/5/6
Real Name: Roy Kaldung
Company: Znuny Inc.
Contact:

Re: Otrs6 with periods of CPU at 100%

Post by root » 01 Nov 2019, 07:37

Hi,

Next time when the load is about 100% check what the output of bin/otrs.Console.pl Maint::Daemon::Summary is and let us know.

- Roy
OTRS 5/6 CentOS / RHEL / Debian / SLES / MySQL / PostgreSQL / Oracle / OpenLDAP / Active Directory / SSO

Do you need professional services for the ((OTRS)) Community Edition? Check out https://www.znuny.com/

jlima
OTRS newbie
Posts: 34
Joined: 03 Jul 2014, 13:48
OTRS Version?: 6.0.18
Real Name: Jorge Lima
Location: Braga, Portugal

Re: Otrs6 with periods of CPU at 100%

Post by jlima » 04 Nov 2019, 11:35

Hi Roy
thanks for your reply
Here follows the Daemon Summary plus screen captures of htop and top
Hope they provide any light on this
Thanks
Jorge


Gathering summary for all daemons...

System configuration sync:
Daemon is active.

Tasks to be executed in future:
There are currently no tasks to be executed in future.

Recurrent cron tasks:
+---------------------------------+---------------------+-------------+----------------+---------------------+
| Name | Last Execution | Last Status | Last Duration | Next Execution |
+---------------------------------+---------------------+-------------+----------------+---------------------+
| CoreCacheCleanup | 2019-11-03 00:20:00 | Success | 50.3 Minute(s) | 2019-11-10 00:20:00 |
| EscalationCheck | 2019-11-01 00:01:00 | Success | 7.0 Second(s) | 2019-12-01 00:01:00 |
| GenerateDashboardStats | 2019-11-04 09:05:00 | Success | < 1 Second | 2019-11-04 10:05:00 |
| LoaderCacheDelete | 2019-11-03 00:30:00 | Success | 2.0 Second(s) | 2019-11-10 00:30:00 |
| MailAccountFetch | 2019-11-04 09:20:00 | Success | 2.2 Minute(s) | 2019-11-04 09:30:00 |
| OTRSBusinessAvailabilityCheck | 2019-11-04 09:15:00 | Success | < 1 Second | 2019-11-04 09:35:00 |
| OTRSBusinessEntitlementCheck | 2019-11-04 08:45:00 | Success | < 1 Second | 2019-11-04 09:25:00 |
| RegistrationUpdateSend | 2019-11-04 08:30:00 | Success | < 1 Second | 2019-11-04 09:30:00 |
| RenewCustomerSMIMECertificates | 2019-11-04 02:02:00 | Success | < 1 Second | 2019-11-05 02:02:00 |
| SessionDeleteExpired | 2019-11-04 08:55:00 | Success | < 1 Second | 2019-11-04 10:55:00 |
| SpoolMailsReprocess | 2019-11-04 00:10:00 | Success | < 1 Second | 2019-11-05 00:10:00 |
| SupportDataCollectAsynchronous | 2019-11-04 09:01:00 | Success | < 1 Second | 2019-11-04 10:01:00 |
| TicketAcceleratorRebuild | 2019-11-04 01:01:00 | Success | 45.0 Second(s) | 2019-11-05 01:01:00 |
| TicketPendingCheck | 2019-11-04 08:45:00 | Success | 5.0 Second(s) | 2019-11-04 10:45:00 |
| TicketUnlockTimeout | 2019-11-04 08:35:00 | Success | < 1 Second | 2019-11-04 09:35:00 |
| ArticleSearchIndexRebuild | 2019-11-04 09:23:00 | Success | 2.9 Minute(s) | 2019-11-04 09:24:00 |
| CommunicationLogDelete | 2019-11-04 03:00:00 | Success | 2.4 Minute(s) | 2019-11-05 03:00:00 |
| ConfigurationDeploymentCleanup | 2019-11-03 00:40:00 | Success | 19.0 Second(s) | 2019-11-10 00:40:00 |
| GeneticInterfaceDebugLogCleanup | 2019-11-04 03:02:00 | Success | < 1 Second | 2019-11-05 03:02:00 |
| MailQueueSend | 2019-11-04 09:23:00 | Success | 2.2 Minute(s) | 2019-11-04 09:24:00 |
| TicketDraftDeleteExpired | 2019-11-04 08:55:00 | Success | < 1 Second | 2019-11-04 09:55:00 |
| TicketNumberCounterCleanup | 2019-11-04 09:20:00 | Success | < 1 Second | 2019-11-04 09:30:00 |
| WebUploadCacheCleanup | 2019-11-04 08:46:00 | Success | < 1 Second | 2019-11-04 09:46:00 |
+---------------------------------+---------------------+-------------+----------------+---------------------+

Recurrent generic agent tasks:
+--------------------------------------------------+---------------------+-------------+----------------+---------------------+
| Name | Last Execution | Last Status | Last Duration | Next Execution |
+--------------------------------------------------+---------------------+-------------+----------------+---------------------+
| tickets novos com mais de 3 dias | 2019-11-01 16:50:00 | Success | 1.0 Second(s) | 2019-11-04 16:50:00 |
| ___Admin nao novos | 2019-11-04 05:00:00 | Success | 1.0 Second(s) | 2019-11-05 05:00:00 |
| ___Unlocked tickets | 2019-11-04 05:15:00 | Success | < 1 Second | 2019-11-05 05:15:00 |
| move webrequests com mais de 30 min para entrada | 2019-11-04 09:00:00 | Success | < 1 Second | 2019-11-04 10:00:00 |
| spam qq.com | 2019-11-04 03:30:00 | Success | 16.0 Second(s) | 2019-11-05 03:30:00 |
| apagar spam | 2019-11-04 00:00:00 | Success | 1.0 Second(s) | 2019-11-05 00:00:00 |
| arquivar tickets | 2019-11-02 04:10:00 | Success | 6.0 Minute(s) | 2019-11-09 04:10:00 |
+--------------------------------------------------+---------------------+-------------+----------------+---------------------+

Unhandled Worker Tasks:
+---------------+------+---------------------+
| Name | Type | Create Time |
+---------------+------+---------------------+
| MailQueueSend | Cron | 2019-11-04 09:23:09 |
+---------------+------+---------------------+

Handled Worker Tasks:
+---------------------------+------+-----+----------+---------------+
| Name | Type | NID | PID | Duration |
+---------------------------+------+-----+----------+---------------+
| ArticleSearchIndexRebuild | Cron | 001 | 00002284 | 9.0 Second(s) |
+---------------------------+------+-----+----------+---------------+

Done.

Image
Image
OTRS 6.0.18 (public/testing) on CentOS with Postgres database

root
Moderator
Posts: 1895
Joined: 18 Dec 2007, 12:23
OTRS Version?: 4/5/6
Real Name: Roy Kaldung
Company: Znuny Inc.
Contact:

Re: Otrs6 with periods of CPU at 100%

Post by root » 04 Nov 2019, 14:44

Hi Jorge,

I would check more on the I/O, even on my largest systems the CoreCacheCleanup never takes more than 3-4 minutes. 50 minutes seems pretty much to me, especially when you consider the date and time when the task is executed (Sunday night). My guess is based on the fact that there are many, small files which were deleted.

- Roy
OTRS 5/6 CentOS / RHEL / Debian / SLES / MySQL / PostgreSQL / Oracle / OpenLDAP / Active Directory / SSO

Do you need professional services for the ((OTRS)) Community Edition? Check out https://www.znuny.com/

jlima
OTRS newbie
Posts: 34
Joined: 03 Jul 2014, 13:48
OTRS Version?: 6.0.18
Real Name: Jorge Lima
Location: Braga, Portugal

Re: Otrs6 with periods of CPU at 100%

Post by jlima » 04 Nov 2019, 15:15

Hi Roy
thanks for the hint. Will ask the guy responsible for the infrastructure to check it out.
But, why isn't it permanent? Why does it happen at random times with no specific pattern?
Thanks
jlima
OTRS 6.0.18 (public/testing) on CentOS with Postgres database

jlima
OTRS newbie
Posts: 34
Joined: 03 Jul 2014, 13:48
OTRS Version?: 6.0.18
Real Name: Jorge Lima
Location: Braga, Portugal

Re: Otrs6 with periods of CPU at 100%

Post by jlima » 07 Nov 2019, 20:13

Hi everyone,
Some news here
First, no evidence of IO issues.
What we found was that CoreCacheCleanup was running at the same time of the system backup. We moved it so it won't overlap. Let's wait for this week's cleanup to see the difference.

Additionally, I found an article that induces the reported behavior. So far, it seems identical to every other article except that in article_data_mime_attachment there are attachments that aren't displayed by otrs.
Questions
1: why aren't these attachments (2 jpg and 1 gif) reflected on otrs (neither on the bottom of the article nor on the right side under the clip column?
2: could all that cpu be related to this?
3: has anyone experienced this?
4: what might have happened?
5: any suggestion on what to do?

Hope someone can help
best regards
jorge
OTRS 6.0.18 (public/testing) on CentOS with Postgres database

jlima
OTRS newbie
Posts: 34
Joined: 03 Jul 2014, 13:48
OTRS Version?: 6.0.18
Real Name: Jorge Lima
Location: Braga, Portugal

Re: Otrs6 with periods of CPU at 100%

Post by jlima » 21 Nov 2019, 17:45

Hello everyone

following my previous post, I changed the CacheCleanup schedule.
Regarding the cpu pikes, they are less frequent and last "only" for a minute or two.

Apart from that specific ticket, I still haven't found any trigger for what is happening.
I have just revisited one of the two articles, in that ticket, that hangup everything. It topped the four cpus to 100% for 110 seconds and then almost another 30 seconds to display the contents of the article. The other article took almost 4 minutes of 4x100% CPU and 25 seconds additional time.
Don't really know what it is with these two articles. The other articles are ok.
Then I copied (ArticleCopyMoveDelete extension) these articles to a new test ticket and the behavior still problematic.

After two executions of the cleanup here are the results.
CoreCacheCleanup is still taking too much time by Roy's referential.

+---------------------------------+---------------------+-------------+----------------+---------------------+
| Name | Last Execution | Last Status | Last Duration | Next Execution |
+---------------------------------+---------------------+-------------+----------------+---------------------+
| CoreCacheCleanup | 2019-11-09 21:00:00 | Success | 8.3 Minute(s) | 2019-11-16 21:00:00 |
| EscalationCheck | 2019-11-01 00:01:00 | Success | 7.0 Second(s) | 2019-12-01 00:01:00 |
| GenerateDashboardStats | 2019-11-11 09:05:00 | Success | < 1 Second | 2019-11-11 10:05:00 |
| LoaderCacheDelete | 2019-11-09 20:00:00 | Success | 4.0 Second(s) | 2019-11-16 20:00:00 |
| MailAccountFetch | 2019-11-11 10:00:00 | Success | 41.0 Second(s) | 2019-11-11 10:10:00 |
| OTRSBusinessAvailabilityCheck | 2019-11-11 09:55:00 | Success | < 1 Second | 2019-11-11 10:15:00 |
| OTRSBusinessEntitlementCheck | 2019-11-11 09:45:00 | Success | < 1 Second | 2019-11-11 10:25:00 |
| RegistrationUpdateSend | 2019-11-11 09:30:00 | Success | < 1 Second | 2019-11-11 10:30:00 |
| RenewCustomerSMIMECertificates | 2019-11-11 02:02:00 | Success | < 1 Second | 2019-11-12 02:02:00 |
| SessionDeleteExpired | 2019-11-11 08:55:00 | Success | < 1 Second | 2019-11-11 10:55:00 |
| SpoolMailsReprocess | 2019-11-11 00:10:00 | Success | < 1 Second | 2019-11-12 00:10:00 |
| SupportDataCollectAsynchronous | 2019-11-11 10:01:00 | Success | 1.0 Second(s) | 2019-11-11 11:01:00 |
| TicketAcceleratorRebuild | 2019-11-11 01:01:00 | Success | 2.1 Minute(s) | 2019-11-12 01:01:00 |
| TicketPendingCheck | 2019-11-11 08:45:00 | Success | 8.0 Second(s) | 2019-11-11 10:45:00 |
| TicketUnlockTimeout | 2019-11-11 09:35:00 | Success | < 1 Second | 2019-11-11 10:35:00 |
| ArticleSearchIndexRebuild | 2019-11-11 10:02:00 | Success | 1.0 Second(s) | 2019-11-11 10:03:00 |
| CommunicationLogDelete | 2019-11-11 03:00:00 | Success | 1.1 Minute(s) | 2019-11-12 03:00:00 |
| ConfigurationDeploymentCleanup | 2019-11-09 20:10:00 | Success | 9.0 Second(s) | 2019-11-16 20:10:00 |
| GeneticInterfaceDebugLogCleanup | 2019-11-11 03:02:00 | Success | 1.0 Second(s) | 2019-11-12 03:02:00 |
| MailQueueSend | 2019-11-11 10:01:00 | Success | < 1 Second | 2019-11-11 10:03:00 |
| TicketDraftDeleteExpired | 2019-11-11 09:55:00 | Success | < 1 Second | 2019-11-11 10:55:00 |
| TicketNumberCounterCleanup | 2019-11-11 10:00:00 | Success | < 1 Second | 2019-11-11 10:10:00 |
| WebUploadCacheCleanup | 2019-11-11 09:46:00 | Success | < 1 Second | 2019-11-11 10:46:00 |
+---------------------------------+---------------------+-------------+----------------+---------------------+


+---------------------------------+---------------------+-------------+----------------+---------------------+
| Name | Last Execution | Last Status | Last Duration | Next Execution |
+---------------------------------+---------------------+-------------+----------------+---------------------+
| CoreCacheCleanup | 2019-11-16 21:00:00 | Success | 28.8 Minute(s) | 2019-11-23 21:00:00 |
| EscalationCheck | 2019-11-01 00:01:00 | Success | 7.0 Second(s) | 2019-12-01 00:01:00 |
| GenerateDashboardStats | 2019-11-20 15:05:00 | Success | 1.0 Second(s) | 2019-11-20 16:05:00 |
| LoaderCacheDelete | 2019-11-16 20:00:00 | Success | 2.0 Second(s) | 2019-11-23 20:00:00 |
| MailAccountFetch | 2019-11-20 15:30:00 | Success | 7.0 Second(s) | 2019-11-20 15:40:00 |
| OTRSBusinessAvailabilityCheck | 2019-11-20 15:15:00 | Success | < 1 Second | 2019-11-20 15:35:00 |
| OTRSBusinessEntitlementCheck | 2019-11-20 15:25:00 | Success | < 1 Second | 2019-11-20 15:45:00 |
| RegistrationUpdateSend | 2019-11-20 15:30:00 | Success | < 1 Second | 2019-11-20 16:30:00 |
| RenewCustomerSMIMECertificates | 2019-11-20 02:02:00 | Success | < 1 Second | 2019-11-21 02:02:00 |
| SessionDeleteExpired | 2019-11-20 14:55:00 | Success | < 1 Second | 2019-11-20 16:55:00 |
| SpoolMailsReprocess | 2019-11-20 00:10:00 | Success | < 1 Second | 2019-11-21 00:10:00 |
| SupportDataCollectAsynchronous | 2019-11-20 15:01:00 | Success | < 1 Second | 2019-11-20 16:01:00 |
| TicketAcceleratorRebuild | 2019-11-20 01:01:00 | Success | 31.0 Second(s) | 2019-11-21 01:01:00 |
| TicketPendingCheck | 2019-11-20 14:45:00 | Success | 2.0 Second(s) | 2019-11-20 16:45:00 |
| TicketUnlockTimeout | 2019-11-20 14:35:00 | Success | < 1 Second | 2019-11-20 15:35:00 |
| ArticleSearchIndexRebuild | 2019-11-20 15:31:00 | Success | < 1 Second | 2019-11-20 15:33:00 |
| CommunicationLogDelete | 2019-11-20 03:00:00 | Success | 1.6 Minute(s) | 2019-11-21 03:00:00 |
| ConfigurationDeploymentCleanup | 2019-11-16 20:10:00 | Success | 8.0 Second(s) | 2019-11-23 20:10:00 |
| GeneticInterfaceDebugLogCleanup | 2019-11-20 03:02:00 | Success | < 1 Second | 2019-11-21 03:02:00 |
| MailQueueSend | 2019-11-20 15:31:00 | Success | 1.0 Second(s) | 2019-11-20 15:33:00 |
| TicketDraftDeleteExpired | 2019-11-20 14:55:00 | Success | < 1 Second | 2019-11-20 15:55:00 |
| TicketNumberCounterCleanup | 2019-11-20 15:30:00 | Success | < 1 Second | 2019-11-20 15:40:00 |
| WebUploadCacheCleanup | 2019-11-20 14:46:00 | Success | < 1 Second | 2019-11-20 15:46:00 |
+---------------------------------+---------------------+-------------+----------------+---------------------+

I really hope someone can shred some light on my path ...
cheers
OTRS 6.0.18 (public/testing) on CentOS with Postgres database

root
Moderator
Posts: 1895
Joined: 18 Dec 2007, 12:23
OTRS Version?: 4/5/6
Real Name: Roy Kaldung
Company: Znuny Inc.
Contact:

Re: Otrs6 with periods of CPU at 100%

Post by root » 22 Nov 2019, 08:48

Hi,

Anything "uncommon" on your setup? Using NFS or sth like that? What happens if you move the cache into an ram disk?

- Roy
OTRS 5/6 CentOS / RHEL / Debian / SLES / MySQL / PostgreSQL / Oracle / OpenLDAP / Active Directory / SSO

Do you need professional services for the ((OTRS)) Community Edition? Check out https://www.znuny.com/

Post Reply