<div dir="ltr">I get the impression there is an established pattern for doing this, and probably someone on this list has some good input on it.<div><br></div><div>We've been thinking about how to handle failures in internal services while integrating with third-party services: trading off robustness against the ability to debug complex requests, while still noticing genuine errors in our codebase (e.g. avoiding things like 'try', 'rescue nil' or 'rescue Exception').<div>


<br><div style>Let's say we have three internal services A, B, C and some external API providers X, Y, Z.</div><div style><br></div><div style>Some object may be responsible for communicating with Z, but this object doesn't have access to the original incoming request.</div>


</div><div style>Also, it's absolutely critical that if this request to Z fails, the rest of the request can still complete, the failure is hidden from our external API user, and some manual or separate automated process resolves the issue.</div>


<div style><br class=""><span style="font-family:arial,helvetica,sans-serif">To illustrate how critical this is: a user has paid for a service, and one part of fulfilling the customer's purchase is an API call to external provider Z. If that call doesn't happen we'd have angry customers, so we make sure the request is fulfilled by any means possible (manually if necessary), while still assuring the customer that we have fulfilled their order.</span><br>


</div><div style><br></div><div style>We've been toying with the idea of generating a unique identifier for each incoming request and passing it to all our other internal services, so that we can include the id in every log statement. Ideally we'd also include these ids in what we send to Airbrake.</div>


</div><div style><br></div><div style>We could pretty easily create middleware that generates the ids and sends/receives them in headers to our other services; the difficulty is getting access to this info in our models. </div>
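<div style><br></div><div style>A minimal sketch of what that middleware could look like, assuming a Rack-style stack (which both Sinatra and Rails sit on). The class name RequestId, the x-request-id header and the env key app.request_id are placeholders picked for illustration, not anything we've settled on:</div>

```ruby
require "securerandom"

# Hypothetical Rack-style middleware: reuses an incoming X-Request-Id header
# if an upstream service sent one, otherwise generates a fresh id.
class RequestId
  HEADER_KEY = "HTTP_X_REQUEST_ID" # how Rack exposes an X-Request-Id request header
  ENV_KEY    = "app.request_id"    # assumed key name, not a Rack standard

  def initialize(app)
    @app = app
  end

  def call(env)
    id = env[HEADER_KEY].to_s
    id = SecureRandom.uuid if id.empty?
    env[ENV_KEY] = id # visible to routes/controllers further down the stack

    status, headers, body = @app.call(env)
    # Echo the id back so callers can quote it when reporting a problem.
    [status, headers.merge("x-request-id" => id), body]
  end
end
```

<div style>This only gets the id as far as the route/controller via env; it does nothing for the model code further down the stack, which is the part we're stuck on.</div>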


<div style><br></div><div style><font face="courier new, monospace">sinatra route/rails controller code</font></div><div style><font face="courier new, monospace"> --   some long</font></div><div style><font face="courier new, monospace"> --   stack frame</font></div>


<div style><font face="courier new, monospace"> --  model code communicating with Z</font></div><div style><font face="courier new, monospace"><br></font></div><div style><font face="arial, helvetica, sans-serif">One solution that would get us to our controller/route code where we can access the request info would be throwing or raising</font></div>


<div style><font face="arial, helvetica, sans-serif">exceptions, but this then prevents us from continuing the request in the normal way and completing the required tasks after a call to Z fails. It's also horrendous goto-style flow control.</font></div>


<div style><font face="arial, helvetica, sans-serif"><br></font></div><div style><font face="arial, helvetica, sans-serif">Other undesirable hacks would be sticking something on the thread itself, or a global variable.</font></div>
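<div style><font face="arial, helvetica, sans-serif"><br></font></div><div style><font face="arial, helvetica, sans-serif">For reference, the thread variant would look roughly like this. It's a sketch only; the RequestContext module and the :request_id key are made-up names:</font></div>

```ruby
# Hypothetical helper: holds the current request id for the duration of a block.
module RequestContext
  KEY = :request_id # assumed key name

  # Sets the id, yields, and always restores the previous value, so an id
  # cannot leak into the next request served by the same thread.
  def self.with(id)
    previous = Thread.current[KEY]
    Thread.current[KEY] = id
    yield
  ensure
    Thread.current[KEY] = previous
  end

  # Readable from anywhere in the call stack, including model code.
  def self.current
    Thread.current[KEY]
  end
end
```

<div style><font face="arial, helvetica, sans-serif">A middleware would wrap the downstream call in RequestContext.with(id) { ... } and the model would read RequestContext.current. Note that Thread.current[] is actually fiber-local in Ruby; Thread#thread_variable_get and Thread#thread_variable_set are the thread-wide versions. Either way it's hidden global state, which is why it feels like a hack.</font></div>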


<div style><font face="arial, helvetica, sans-serif"><br></font></div><div style><font face="arial, helvetica, sans-serif">Another option is to explicitly pass the request info all the way down the stack, but this completely breaks single responsibility and is going to result in complex spaghetti code.</font></div>


<div style><font face="arial, helvetica, sans-serif"><br></font></div><div style><font face="arial, helvetica, sans-serif">There has to be some kind of intelligent solution to this that is elegant, readable and maintainable and isn't any of the things I've mentioned.</font></div>


<div style><font face="arial, helvetica, sans-serif"><br></font></div><div style><font face="arial, helvetica, sans-serif">Oh, and another idea is to swallow all exceptions in our protective blocks around API calls only in production mode, and not in dev or test, i.e. surface programming errors, but give as cast-iron a guarantee as we can that no failure of Z can result in non-completion of the rest of the request in production. </font></div>
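<div style><font face="arial, helvetica, sans-serif"><br></font></div><div style><font face="arial, helvetica, sans-serif">Roughly what I have in mind for such a protective block, as a sketch under assumptions: Guard, protect and the swallow flag are invented names, and the caller is assumed to derive swallow from something like RACK_ENV being "production":</font></div>

```ruby
require "logger"

# Hypothetical guard around a call to an external provider such as Z.
module Guard
  # swallow: true in production, false in dev/test (decided by the caller).
  # Only StandardError is rescued, so this is not a 'rescue Exception'.
  def self.protect(label, swallow:, logger: Logger.new($stderr), request_id: nil)
    yield
  rescue StandardError => e
    raise unless swallow # dev/test: let programming errors surface
    logger.error("[#{request_id}] #{label} failed: #{e.class}: #{e.message}")
    nil # production: caller carries on and treats nil as "needs follow-up"
  end
end

# Example: with swallow enabled the failure is logged and the request goes on.
result = Guard.protect("Z fulfilment", swallow: true, request_id: "req-1") do
  raise IOError, "Z timed out"
end
# result is nil here
```

<div style><font face="arial, helvetica, sans-serif">This still needs the request id handed to it from somewhere, so it doesn't solve the problem of getting request info down to the model; it only covers the swallow-in-production part.</font></div>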


<div style><br></div><div style>Will appreciate hearing your thoughts,</div><div style><br></div><div style>Mark</div></div>