# Parsing raw standard output

If your tool does not output JSON lines or JSON files, integrating it with `secator` requires a bit more effort.

Depending on how you want to parse the output, read one of:

* [Using regular expressions](#using-regular-expressions "mention")
* [Writing a custom item loader](#writing-a-custom-item-loader "mention")

***

## Using regular expressions

**Steps:**

* Use the `RegexSerializer` item loader.
* Use the `on_regex_loaded` hook to yield `secator` output types.

#### **Example:**

Assume `mytool` writes the following to stdout:

```bash
mytool -u mytarget.com
[INF] This is an info message
[ERR] This is an error message
[FOUND] https://mytarget.com/api [type=url] [status=200] [content_type=application/json] [title=MyAwesomeWebPage]
[FOUND] https://mytarget.com/api/metrics [type=url] [status=403]
[FOUND] A3TBABCD1234EFGH5678 [type=aws_api_key] [matched_at=https://mytarget/api/.aws_key.json]
[FOUND] <-- an HTML comment --> [type=html_comment] [matched_at=https://mytarget/api/.aws_key.json]
[FOUND] CVE-2021-44228 [type=vulnerability] [matched_at=https://mytarget/api/sensitive_api_path]
```

First, we need a regular expression that matches the lines marked with `[FOUND]` and captures the individual values with named groups (you can use [Pythex](https://pythex.org) to test it).

Here is the one we came up with:

{% code overflow="wrap" %}

```python
OUTPUT_REGEX = r'\[\w+]\s(?P<value>.*)\s\[type=(?P<type>[\w_]+)\](\s\[status=(?P<status>\d+)\])?(\s\[content_type=(?P<content_type>[\w\/]+)\])?(\s\[title=(?P<title>.*)\])?(\s\[matched_at=(?P<matched_at>.*)\])?'
```

{% endcode %}
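Before wiring the expression into a task, it can help to check it against a few sample lines with Python's standard `re` module. The snippet below is a minimal sketch: `SAMPLE_LINES` and `parse_line` are hypothetical names used only for illustration, and the way `RegexSerializer` applies the pattern internally may differ.

```python
import re

OUTPUT_REGEX = r'\[\w+]\s(?P<value>.*)\s\[type=(?P<type>[\w_]+)\](\s\[status=(?P<status>\d+)\])?(\s\[content_type=(?P<content_type>[\w\/]+)\])?(\s\[title=(?P<title>.*)\])?(\s\[matched_at=(?P<matched_at>.*)\])?'

# A few lines taken from the sample output above
SAMPLE_LINES = [
    '[INF] This is an info message',
    '[FOUND] https://mytarget.com/api [type=url] [status=200] [content_type=application/json] [title=MyAwesomeWebPage]',
    '[FOUND] https://mytarget.com/api/metrics [type=url] [status=403]',
    '[FOUND] A3TBABCD1234EFGH5678 [type=aws_api_key] [matched_at=https://mytarget/api/.aws_key.json]',
]


def parse_line(line):
    """Return the named groups for a matching line, or None for any other line."""
    match = re.match(OUTPUT_REGEX, line)
    return match.groupdict() if match else None


parsed = [parse_line(line) for line in SAMPLE_LINES]
for fields in parsed:
    print(fields)
```

The `[INF]` line does not match because it has no `[type=...]` marker. With plain `re`, optional groups that do not take part in a match (for example `title` on the second URL) come back as `None`.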

An integration of `mytool` with `secator` would look like:

{% code title="secator/tasks/mytool.py" %}

```python
from secator.decorators import task
from secator.runners import Command
from secator.output_types import Url, Tag, Vulnerability
from secator.serializers import RegexSerializer
from secator.tasks._categories import Vuln

OUTPUT_REGEX = r'\[\w+]\s(?P<value>.*)\s\[type=(?P<type>[\w_]+)\](\s\[status=(?P<status>\d+)\])?(\s\[content_type=(?P<content_type>[\w\/]+)\])?(\s\[title=(?P<title>.*)\])?(\s\[matched_at=(?P<matched_at>.*)\])?'


@task()
class mytool(Command):
  cmd = '/home/osboxes/.local/bin/mytool'
  input_flag = '-u'
  json_flag = '-jsonl'
  output_types = [Url, Tag, Vulnerability]

  # Use the RegexSerializer to parse each line of stdout
  item_loaders = [
    RegexSerializer(
      OUTPUT_REGEX,
      fields=['value', 'type', 'status', 'content_type', 'title', 'matched_at']
    )
  ]

  # React to items loaded by the RegexSerializer, and yield secator output types
  # like Url, Vulnerability, and Tag.
  @staticmethod
  def on_regex_loaded(self, item):
    # This hook runs after the RegexSerializer, so item is a dict
    # holding the values captured by the named groups.
    if item['type'] == 'url':
      yield Url(
        url=item['value'],
        status_code=int(item['status']),
        content_type=item['content_type'],
        title=item['title']
      )
    elif item['type'] == 'vulnerability':
      cve_id = item['value']
      lookup_data = Vuln.lookup_cve(cve_id)  # perform vulnerability search
      vuln = {
        'matched_at': item['matched_at']
      }
      if lookup_data:
        vuln.update(**lookup_data)
      yield Vulnerability(**vuln)
    else:
      yield Tag(
        name=item['type'],
        match=item['matched_at'],
        extra_data={
          'secret': item['value']
        }
      )

```

{% endcode %}
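Some sample lines omit the optional fields (the second URL has no `content_type` or `title`). Assuming the loaded item carries `None` for groups that did not match, as Python's `re` module does, a hook may want to normalize those values before building an output type. The helpers below, `to_int` and `url_kwargs`, are hypothetical and not part of `secator`; they only sketch the idea.

```python
def to_int(value, default=0):
    """Convert a captured string to int, falling back to default when it is missing or invalid."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def url_kwargs(item):
    """Build keyword arguments for a Url-like output type from a parsed item dict."""
    return {
        'url': item['value'],
        'status_code': to_int(item.get('status')),
        'content_type': item.get('content_type') or '',
        'title': item.get('title') or '',
    }


# Example item, shaped like the named groups of the second [FOUND] line
item = {
    'value': 'https://mytarget.com/api/metrics',
    'type': 'url',
    'status': '403',
    'content_type': None,
    'title': None,
    'matched_at': None,
}
kwargs = url_kwargs(item)
print(kwargs)
```

Inside the hook, the result could then be passed on as `Url(**url_kwargs(item))`.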

Run it with `secator`:

{% tabs %}
{% tab title="CLI" %}

```bash
$ secator x mytool mytarget.com

                         __            
   ________  _________ _/ /_____  _____
  / ___/ _ \/ ___/ __ `/ __/ __ \/ ___/
 (__  /  __/ /__/ /_/ / /_/ /_/ / /    
/____/\___/\___/\__,_/\__/\____/_/     v0.6.0

                        freelabz.com

No Celery worker alive.
/home/osboxes/.local/bin/mytool -u mytarget.com -jsonl
[INF] This is an info message
[ERR] This is an error message
🔗 https://mytarget.com/api [200] [MyAwesomeWebPage] [application/json]
🔗 https://mytarget.com/api/metrics [403]
🏷️ aws_api_key found @ https://mytarget/api/.aws_key.json
    secret: A3TBABCD1234EFGH5678
🚨 [Object Injection 🡕] [critical] https://mytarget/api/sensitive_api_path
```

{% endtab %}

{% tab title="Python" %}

```python
from secator.tasks import mytool
task = mytool('mytarget.com')
for item in task:
    print(item)  # this will output Url, Vulnerability, or Tag items.

```

{% endtab %}
{% endtabs %}

***

## Writing a custom item loader

**Steps:**

* Override the `item_loader` static method to parse the standard output with custom code.

**Example:**

Assume `mytool` writes the following to stdout:

```bash
mytool -u mytarget.com
https://mytarget.com/api | url | 200 | application/json | MyAwesomePage
https://mytarget.com/api/metrics | url | 403
A3TBABCD1234EFGH5678 | aws_api_key | http://mytarget/api/.aws_key.json
<-- an HTML comment --> | html_comment | http://mytarget/api/.aws_key.json
CVE-2021-44228 | vulnerability | http://mytarget/api/sensitive_ap
```

```python
from secator.decorators import task
from secator.runners import Command
from secator.output_types import Url, Tag, Vulnerability
from secator.tasks._categories import Vuln


@task()
class mytool(Command):
  cmd = '/home/osboxes/.local/bin/mytool'
  input_flag = '-u'
  json_flag = '-jsonl'
  output_types = [Url, Tag, Vulnerability]

  # Parse each stdout line with custom code and yield secator output types
  @staticmethod
  def item_loader(self, line):
    # Columns: value | type | type-specific fields
    items = [c.strip() for c in line.split('|')]
    if len(items) < 3:
      return  # not a result line
    value, item_type = items[0:2]
    if item_type == 'url':
      # URL lines: value | url | status [| content_type [| title]]
      yield Url(
        url=value,
        status_code=int(items[2]),
        content_type=items[3] if len(items) > 3 else '',
        title=items[4] if len(items) > 4 else ''
      )
    elif item_type == 'vulnerability':
      cve_id = value
      lookup_data = Vuln.lookup_cve(cve_id)  # perform vulnerability search
      vuln = {
        'matched_at': items[2]
      }
      if lookup_data:
        vuln.update(**lookup_data)
      yield Vulnerability(**vuln)
    else:  # tag
      yield Tag(
        name=item_type,
        match=items[2],
        extra_data={
          'value': value
        }
      )
```
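The column-splitting logic can be tried on its own, without `secator` installed, by returning plain dicts instead of output types. This is a minimal sketch: `parse_pipe_line` is a hypothetical helper, and it assumes that field values never contain a `|` character.

```python
def parse_pipe_line(line):
    """Split a '|'-delimited line into a dict, or return None if it is not a result line."""
    items = [c.strip() for c in line.split('|')]
    if len(items) < 3:
        return None
    value, item_type = items[0], items[1]
    if item_type == 'url':
        # URL lines: value | url | status [| content_type [| title]]
        return {
            'type': 'url',
            'url': value,
            'status_code': int(items[2]),
            'content_type': items[3] if len(items) > 3 else '',
            'title': items[4] if len(items) > 4 else '',
        }
    # Every other type: value | type | matched_at
    return {'type': item_type, 'value': value, 'matched_at': items[2]}


lines = [
    'https://mytarget.com/api | url | 200 | application/json | MyAwesomePage',
    'https://mytarget.com/api/metrics | url | 403',
    'A3TBABCD1234EFGH5678 | aws_api_key | http://mytarget/api/.aws_key.json',
]
results = [parse_pipe_line(line) for line in lines]
for result in results:
    print(result)
```

Checking the length of `items` before indexing keeps short lines (such as the URL without a content type or title) from raising an `IndexError`.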

Run it with `secator`:

{% tabs %}
{% tab title="CLI" %}

```bash
$ secator x mytool mytarget.com

                         __            
   ________  _________ _/ /_____  _____
  / ___/ _ \/ ___/ __ `/ __/ __ \/ ___/
 (__  /  __/ /__/ /_/ / /_/ /_/ / /    
/____/\___/\___/\__,_/\__/\____/_/     v0.6.0

                        freelabz.com

No Celery worker alive.
/home/osboxes/.local/bin/mytool -u mytarget.com -jsonl
[INF] This is an info message
[ERR] This is an error message
🔗 https://mytarget.com/api [200] [MyAwesomeWebPage] [application/json]
🔗 https://mytarget.com/api/metrics [403]
🏷️ aws_api_key found @ https://mytarget/api/.aws_key.json
    secret: A3TBABCD1234EFGH5678
🚨 [Object Injection 🡕] [critical] https://mytarget/api/sensitive_api_path
```

{% endtab %}

{% tab title="Python" %}

```python
from secator.tasks import mytool
task = mytool('mytarget.com')
for item in task:
    print(item)  # this will output Url, Vulnerability, or Tag items.

```

{% endtab %}
{% endtabs %}

***

