Label Clusters API

The Label Clusters API provides access to aggregated sidewalk accessibility features and barriers. Aggregation matters because, in Project Sidewalk, several people can label the same sidewalk feature or barrier, and a single user can label it more than once from different street view images. Similar labels that are geographically close together are therefore grouped into a single cluster.

For access to individual, unclustered label data, use the Raw Labels API instead.

To ensure data quality, we apply some data cleaning before grouping labels into clusters. You can read more about this in the Data Cleaning section below.

Label Clusters API Preview

Below is a live preview of label clusters from a sample region in St. Louis, Missouri retrieved directly from the API. Compare this map to the Raw Labels equivalent to see the difference between clustered labels vs. raw labels. Depending on the number of clusters in the region, this visualization may take a moment to load.

The size of each circle is proportional to the number of labels in the cluster. The color of each circle corresponds to the Label Type.

Note: This example visualizes Project Sidewalk data from a single region only. The Label Clusters API can, however, return data from all regions in the city or from one selected region; see the Query Parameters section below.

Endpoint

Retrieve a list of label clusters, optionally filtered by various criteria. See Query Parameters below.

GET /v3/api/labelClusters

Examples

/v3/api/labelClusters?filetype=geojson Get all label clusters in GeoJSON (default)

/v3/api/labelClusters?filetype=geojson&inline=true Get all label clusters in GeoJSON, displayed in the browser instead of downloaded as a file

/v3/api/labelClusters?filetype=csv Get all label clusters in a CSV

/v3/api/labelClusters?includeRawLabels=true Get all label clusters and the raw labels within each cluster.

/v3/api/labelClusters?labelType=CurbRamp Get all label clusters of type CurbRamp. The available label types match those in the Label Types API

/v3/api/labelClusters?labelType=SurfaceProblem&minSeverity=2 Get all label clusters of type SurfaceProblem with a minimum median severity of 2.
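
The example URLs above can also be assembled programmatically. The following is a minimal sketch using only the Python standard library. The host in BASE_URL is a placeholder and the helper name build_label_clusters_url is hypothetical; neither comes from the API itself.

```python
from urllib.parse import urlencode

# Assumption: placeholder host. Substitute the Project Sidewalk server you query.
BASE_URL = "https://example.org"
ENDPOINT = "/v3/api/labelClusters"


def build_label_clusters_url(base_url=BASE_URL, **params):
    """Return the endpoint URL with the given query parameters appended."""
    cleaned = {}
    for key, value in params.items():
        # Skip parameters that were left unset.
        if value is None:
            continue
        # The examples above write boolean flags as lowercase true/false.
        if isinstance(value, bool):
            value = "true" if value else "false"
        cleaned[key] = value
    # Keep commas unescaped so lists such as CurbRamp,Obstacle stay readable.
    query = urlencode(cleaned, safe=",")
    url = f"{base_url}{ENDPOINT}"
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    print(build_label_clusters_url(labelType="SurfaceProblem", minSeverity=2))
```

Note that the first parameter is always introduced with ? and every later one with &, which the helper handles automatically.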

Quick Download

Download label clusters data directly in your preferred format:

Note: This downloads all label clusters. For filtered data, use the API query parameters described below.

Query Parameters

Filter the label clusters returned by this endpoint using the following query parameters. All parameters are optional. Combine multiple filter parameters to narrow down results (filters are applied using AND logic). When multiple location filters are provided (bbox, regionId, and regionName), bbox takes precedence over region filters, and regionId takes precedence over regionName.

Parameter Type Description
bbox string Filter clusters by bounding box. Coordinates should be provided as a comma-separated string in the format: minLongitude,minLatitude,maxLongitude,maxLatitude (e.g., -74.01,40.71,-74.00,40.72). Uses WGS84 (EPSG:4326) coordinates. If omitted, results are not spatially filtered (potentially very large response).
regionId integer Filter clusters by region ID. Returns only clusters within the specified region. Note: If both bbox and regionId are provided, bbox takes precedence.
regionName string Filter clusters by region name. Returns only clusters within the specified region. Note: If bbox or regionId are provided, they take precedence over regionName.
labelType string Filter by one or more label types. Provide comma-separated values (e.g., labelType=CurbRamp,Obstacle). See Label Types Reference for available types.
includeRawLabels boolean Whether to include detailed information about the individual raw labels that make up each cluster. Default: false. Setting to true will increase response size substantially. Not available when using filetype=csv.
clusterSize integer Filter for clusters with at least this many labels. Useful for focusing on features with higher confirmation counts.
avgImageCaptureDate string Filter clusters by minimum average image capture date. Format: ISO 8601 (e.g., 2020-01-01T00:00:00Z). Only includes clusters where the average image capture date is on or after this date.
avgLabelDate string Filter clusters by minimum average label creation date. Format: ISO 8601 (e.g., 2020-01-01T00:00:00Z). Only includes clusters where the average label creation date is on or after this date.
minSeverity integer Filter clusters with a median severity rating greater than or equal to this value (1-3).
maxSeverity integer Filter clusters with a median severity rating less than or equal to this value (1-3).
filetype string Specify the output format. Options: geojson (default), csv, shapefile, geopackage.
inline boolean Whether to display the file inline or as an attachment. Default: false (attachment). Set to true to view data in the browser instead of downloading.
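
The precedence among the location filters (bbox over regionId, regionId over regionName) and the bbox format can be mirrored on the client before sending a request. This is a sketch only: the function names are hypothetical, and the range checks are a client-side assumption, not a description of the server's own validation.

```python
def parse_bbox(bbox):
    """Parse 'minLongitude,minLatitude,maxLongitude,maxLatitude' into floats."""
    parts = bbox.split(",")
    if len(parts) != 4:
        raise ValueError("bbox needs exactly four comma-separated numbers")
    # float() raises ValueError for non-numeric parts.
    min_lng, min_lat, max_lng, max_lat = (float(p) for p in parts)
    # Assumed sanity checks for WGS84 coordinates.
    if not (-180 <= min_lng <= max_lng <= 180):
        raise ValueError("longitudes must satisfy -180 <= min <= max <= 180")
    if not (-90 <= min_lat <= max_lat <= 90):
        raise ValueError("latitudes must satisfy -90 <= min <= max <= 90")
    return min_lng, min_lat, max_lng, max_lat


def effective_location_filter(bbox=None, region_id=None, region_name=None):
    """Return (parameter name, value) for the filter that wins, or None."""
    if bbox is not None:
        return "bbox", parse_bbox(bbox)
    if region_id is not None:
        return "regionId", region_id
    if region_name is not None:
        return "regionName", region_name
    return None


if __name__ == "__main__":
    # bbox wins even though regionId is also given.
    print(effective_location_filter(bbox="-74.01,40.71,-74.00,40.72", region_id=8))
```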

Responses

Success Response (200 OK)

On success, the API returns an HTTP 200 OK status code and the requested data in the specified filetype format.

GeoJSON Format (Default)

Returns a GeoJSON FeatureCollection where each feature represents a single label cluster. Coordinate Reference System (CRS) is WGS84 (EPSG:4326).

{
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [-74.0243606567383, 40.8839912414551]
            },
            "properties": {
                "label_cluster_id": 124,
                "label_type": "CurbRamp",
                "street_edge_id": 951,
                "osm_way_id": 11584845,
                "region_id": 8,
                "region_name": "Teaneck Community Charter School",
                "avg_image_capture_date": "2012-08-15T00:00:00Z",
                "avg_label_date": "2023-06-20T14:32:45Z",
                "median_severity": 1,
                "agree_count": 18,
                "disagree_count": 2,
                "unsure_count": 0,
                "cluster_size": 5,
                "users": [
                    "18b26a38-24ab-402d-a64e-158fc0bb8a8a",
                    "53ad4d79-9a7b-4d3c-a753-63bbfca34c9b"
                ],
                "labels": [
                    {
                        "label_id": 8,
                        "user_id": "18b26a38-24ab-402d-a64e-158fc0bb8a8a",
                        "gsv_panorama_id": "DsCvWstZYz9JL81V9NloOQ",
                        "severity": 1,
                        "time_created": 1692227245041,
                        "latitude": 40.8839912414551,
                        "longitude": -74.0243606567383,
                        "correct": true,
                        "image_capture_date": "2012-08"
                    },
                    {
                        "label_id": 12,
                        "user_id": "53ad4d79-9a7b-4d3c-a753-63bbfca34c9b",
                        "gsv_panorama_id": "DsCvWstZYz9JL81V9NloOQ",
                        "severity": 1,
                        "time_created": 1692228103532,
                        "latitude": 40.8839912414551,
                        "longitude": -74.0243606567383,
                        "correct": true,
                        "image_capture_date": "2012-08"
                    }
                ]
            }
        },
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [-74.0243835449219, 40.8839416503906]
            },
            "properties": {
                "label_cluster_id": 125,
                "label_type": "NoCurbRamp",
                "street_edge_id": 952,
                "osm_way_id": 11566031,
                "region_id": 8,
                "region_name": "Teaneck Community Charter School",
                "avg_image_capture_date": "2012-08-15T00:00:00Z",
                "avg_label_date": "2023-06-22T11:12:24Z",
                "median_severity": 3,
                "agree_count": 9,
                "disagree_count": 0,
                "unsure_count": 1,
                "cluster_size": 3,
                "users": [
                    "8af92eb8-fb84-4aa6-9539-abc95216dcd7",
                    "be481045-4448-42ae-bbac-3455ce914202",
                    "549187e0-82c9-4014-a48d-31f18083d575"
                ],
                "labels": [
                    ...
                ]
            }
        },
        ...
    ]
}

GeoJSON Field Descriptions

Each feature in the GeoJSON response represents a single label cluster with point geometry and detailed properties:

Field Path Type Description
geometry.coordinates array[number] Geographic coordinates representing the centroid of the cluster in [longitude, latitude] format using WGS84 (EPSG:4326) coordinate system.
properties.label_cluster_id integer Unique identifier for the cluster in the Project Sidewalk database.
properties.label_type string Type of sidewalk feature or barrier represented by this cluster. Possible values: CurbRamp, NoCurbRamp, Crosswalk, SurfaceProblem, Obstacle, Signal, NoSidewalk, or Other.
properties.median_severity integer Median severity rating of all labels in the cluster, from 1 (minor issue) to 3 (major barrier). For accessibility features like curb ramps, a low severity indicates good condition, while for barriers, it indicates a less significant obstacle.
properties.cluster_size integer Number of individual labels that make up this cluster. Higher numbers generally indicate higher confidence in the existence of the feature.
properties.street_edge_id integer Project Sidewalk internal identifier for the street segment the cluster is associated with.
properties.osm_way_id integer OpenStreetMap Way ID for the street segment, if available.
properties.region_id integer Identifier for the region where the cluster is located.
properties.region_name string Name of the region where the cluster is located, as defined in Project Sidewalk's regions.
properties.avg_image_capture_date string Average date when the Street View imagery was captured for the labels in this cluster, in ISO 8601 format.
properties.avg_label_date string Average date when the labels in this cluster were created, in ISO 8601 format.
properties.agree_count integer Total number of users who agreed with (confirmed) the labels in this cluster during validation tasks.
properties.disagree_count integer Total number of users who disagreed with (disputed) the labels in this cluster during validation tasks.
properties.unsure_count integer Total number of users who marked "unsure" for the labels in this cluster during validation tasks.
properties.users array[string] Array of anonymized user identifiers (UUIDs) of the users who contributed labels to this cluster.
properties.labels array Array of raw label objects that make up this cluster. Only included if includeRawLabels=true is specified in the request. Each object contains core information about the individual label including its ID, user ID, severity, coordinates, and validation status.
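
To illustrate how these fields might be consumed, the sketch below parses a trimmed copy of the example response with Python's json module. The helper names are hypothetical, and the "agreement ratio" is an illustrative metric computed from agree_count and disagree_count, not something the API returns.

```python
import json
from collections import Counter

# Trimmed copy of the two clusters from the example response.
SAMPLE_RESPONSE = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point",
                  "coordinates": [-74.0243606567383, 40.8839912414551]},
     "properties": {"label_cluster_id": 124, "label_type": "CurbRamp",
                    "median_severity": 1, "agree_count": 18,
                    "disagree_count": 2, "cluster_size": 5}},
    {"type": "Feature",
     "geometry": {"type": "Point",
                  "coordinates": [-74.0243835449219, 40.8839416503906]},
     "properties": {"label_cluster_id": 125, "label_type": "NoCurbRamp",
                    "median_severity": 3, "agree_count": 9,
                    "disagree_count": 0, "cluster_size": 3}}
  ]
}
"""


def lat_lng(feature):
    """Return (latitude, longitude); GeoJSON stores [longitude, latitude]."""
    lng, lat = feature["geometry"]["coordinates"]
    return lat, lng


def agreement_ratio(props):
    """Share of agree votes among agree + disagree votes, or None if no votes."""
    decided = props["agree_count"] + props["disagree_count"]
    return props["agree_count"] / decided if decided else None


def count_by_label_type(feature_collection, min_cluster_size=1):
    """Count clusters per label type, keeping only clusters of a minimum size."""
    counts = Counter()
    for feature in feature_collection["features"]:
        props = feature["properties"]
        if props["cluster_size"] >= min_cluster_size:
            counts[props["label_type"]] += 1
    return dict(counts)


if __name__ == "__main__":
    data = json.loads(SAMPLE_RESPONSE)
    print(count_by_label_type(data, min_cluster_size=4))
```

The coordinate swap in lat_lng is worth keeping explicit, since many mapping libraries expect latitude first while GeoJSON puts longitude first.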

CSV Format

If filetype=csv is specified, the response body will be CSV data. The first row contains the headers, corresponding to the fields in the GeoJSON properties object, plus avg_latitude and avg_longitude columns derived from the geometry. CRS is WGS84 (EPSG:4326).

label_cluster_id,label_type,street_edge_id,osm_way_id,region_id,region_name,avg_image_capture_date,avg_label_date,median_severity,agree_count,disagree_count,unsure_count,cluster_size,users,avg_latitude,avg_longitude
124,CurbRamp,951,11584845,8,Teaneck Community Charter School,2012-08-15T00:00:00Z,2023-06-20T14:32:45Z,1,18,2,0,5,"[18b26a38-24ab-402d-a64e-158fc0bb8a8a,53ad4d79-9a7b-4d3c-a753-63bbfca34c9b]",40.8839912414551,-74.0243606567383
125,NoCurbRamp,952,11566031,8,Teaneck Community Charter School,2012-08-15T00:00:00Z,2023-06-22T11:12:24Z,3,9,0,1,3,"[8af92eb8-fb84-4aa6-9539-abc95216dcd7,be481045-4448-42ae-bbac-3455ce914202,549187e0-82c9-4014-a48d-31f18083d575]",40.8839416503906,-74.0243835449219
...

Note: The labels array is not included in the CSV output even when includeRawLabels=true is specified, as the nested data structure is not suitable for the CSV format.
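
A CSV response of this shape can be read with Python's csv module. The sketch below assumes the users column is always a bracketed, comma-separated list as in the sample rows above. The helper names are hypothetical, and only a few columns are converted to numeric types.

```python
import csv
import io

# Header and rows copied from the sample CSV output.
SAMPLE_CSV = """\
label_cluster_id,label_type,street_edge_id,osm_way_id,region_id,region_name,avg_image_capture_date,avg_label_date,median_severity,agree_count,disagree_count,unsure_count,cluster_size,users,avg_latitude,avg_longitude
124,CurbRamp,951,11584845,8,Teaneck Community Charter School,2012-08-15T00:00:00Z,2023-06-20T14:32:45Z,1,18,2,0,5,"[18b26a38-24ab-402d-a64e-158fc0bb8a8a,53ad4d79-9a7b-4d3c-a753-63bbfca34c9b]",40.8839912414551,-74.0243606567383
125,NoCurbRamp,952,11566031,8,Teaneck Community Charter School,2012-08-15T00:00:00Z,2023-06-22T11:12:24Z,3,9,0,1,3,"[8af92eb8-fb84-4aa6-9539-abc95216dcd7,be481045-4448-42ae-bbac-3455ce914202,549187e0-82c9-4014-a48d-31f18083d575]",40.8839416503906,-74.0243835449219
"""


def parse_users(cell):
    """Turn '[id1,id2]' into ['id1', 'id2'] (assumed format of the users column)."""
    inner = cell.strip().strip("[]")
    return [user.strip() for user in inner.split(",") if user.strip()]


def read_clusters(text):
    """Parse CSV text into dictionaries with a few typed columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["cluster_size"] = int(row["cluster_size"])
        row["avg_latitude"] = float(row["avg_latitude"])
        row["avg_longitude"] = float(row["avg_longitude"])
        row["users"] = parse_users(row["users"])
        rows.append(row)
    return rows


if __name__ == "__main__":
    for cluster in read_clusters(SAMPLE_CSV):
        print(cluster["label_cluster_id"], cluster["label_type"], len(cluster["users"]))
```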

Shapefile Format

If filetype=shapefile is specified, the response body will be a ZIP archive containing the Shapefile components (.shp, .shx, .dbf, .prj). The attribute table (.dbf) contains fields corresponding to the GeoJSON properties object (field names may be truncated due to Shapefile limitations). The included .prj file defines the Coordinate Reference System (CRS), typically WGS84 (EPSG:4326).

GeoPackage Format

If filetype=geopackage is specified, the response body will be a GeoPackage file (.gpkg). This is an open standard format based on SQLite that contains both geometry and attributes in a single file, generally without the field name limitations of Shapefiles. CRS is typically WGS84 (EPSG:4326).

Error Responses

If an error occurs, the API will return an appropriate HTTP status code and a JSON response body containing details about the error.

  • 400 Bad Request: Invalid parameter values (e.g., malformed bounding box, invalid date format).
  • 404 Not Found: The requested resource does not exist (e.g., incorrect base URL path).
  • 500 Internal Server Error: An unexpected error occurred on the server.

Error Response Body

Error responses include a JSON body with the following structure:

{
    "status": 400, // HTTP Status Code repeated
    "code": "INVALID_PARAMETER", // Machine-readable error code
    "message": "Invalid value for bbox parameter. Expected format: minLng,minLat,maxLng,maxLat.", // Human-readable description
    "parameter": "bbox" // Optional: The specific parameter causing the error
}
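
A client might convert such an error body into an exception. The sketch below is one way to do that, assuming the // annotations shown above are documentation only and that the actual body is plain JSON. The class and function names are hypothetical.

```python
import json


class LabelClustersApiError(Exception):
    """Hypothetical exception carrying the fields of an error response body."""

    def __init__(self, status, code, message, parameter=None):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.parameter = parameter


def error_from_body(body_text):
    """Build an exception from the JSON text of an error response."""
    data = json.loads(body_text)
    return LabelClustersApiError(
        status=data["status"],
        code=data["code"],
        message=data["message"],
        # "parameter" is optional, so fall back to None when it is absent.
        parameter=data.get("parameter"),
    )


if __name__ == "__main__":
    body = (
        '{"status": 400, "code": "INVALID_PARAMETER", '
        '"message": "Invalid value for bbox parameter.", "parameter": "bbox"}'
    )
    print(error_from_body(body))
```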

Data Cleaning

Before grouping labels into the clusters provided by this API, we apply a few filters to clean the data and ensure data quality. We filter out labels that meet any of these criteria:

  • The label has been validated as incorrect. If a label has more "No" votes than "Yes" votes, it won't be included.
  • The label is on a street (or larger area) that an Admin has flagged, through manual review, as being labeled incorrectly. If the label was validated as correct, however, it will still be included.
  • The user who created the label has been flagged as a low-quality contributor, either through an algorithmic assessment or through manual review by an Admin. If the label was validated as correct, however, it will still be included. A user may be automatically flagged as providing low-quality data if they meet any of the following criteria:
    • They have an accuracy rating below 60% based on validations from other users (with at least 50 of their labels validated).
    • They have a "labeling frequency" below 37.5 labels per kilometer. This cutoff point was determined experimentally and helps to ensure full data coverage.
    • They have been flagged by Admins as providing low quality data through manual review.
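
The filtering rules above can be sketched as two small predicates. This is an illustration of the described logic, not the project's actual implementation. The function names and inputs are hypothetical, and "validated as correct" is assumed here to mean more "Yes" votes than "No" votes.

```python
ACCURACY_CUTOFF = 0.60       # accuracy below 60% flags the user
MIN_VALIDATED_LABELS = 50    # accuracy only counts once 50 labels are validated
MIN_LABELS_PER_KM = 37.5     # labeling frequency cutoff


def is_user_flagged(correct_validated, total_validated, label_count,
                    km_audited, admin_flagged=False):
    """Return True if the user meets any of the low-quality criteria."""
    if admin_flagged:
        return True
    if (total_validated >= MIN_VALIDATED_LABELS
            and correct_validated / total_validated < ACCURACY_CUTOFF):
        return True
    if km_audited > 0 and label_count / km_audited < MIN_LABELS_PER_KM:
        return True
    return False


def keep_label(yes_votes, no_votes, street_flagged, user_flagged):
    """Return True if the label would pass the cleaning filters."""
    # Labels validated as incorrect are always dropped.
    if no_votes > yes_votes:
        return False
    # Assumption: more "Yes" than "No" votes counts as validated correct,
    # which overrides the street and user flags.
    if yes_votes > no_votes:
        return True
    # Unvalidated or tied labels are dropped when a flag applies.
    return not (street_flagged or user_flagged)


if __name__ == "__main__":
    flagged = is_user_flagged(20, 50, label_count=500, km_audited=5.0)
    print(flagged, keep_label(0, 0, street_flagged=False, user_flagged=flagged))
```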

Best Practices

When working with the Label Clusters API, consider these recommendations:

  • Use spatial filtering: Always provide either a bbox or regionId/regionName parameter to limit results to a specific area, especially for cities with large datasets.
  • Control response size: Only set includeRawLabels=true when you specifically need the detailed label data, as it significantly increases response size.
  • Combine with other filters: Use multiple filters together (e.g., labelType, minSeverity) to narrow down results to specific accessibility issues of interest.
  • Choose the right format: Use geojson for web mapping applications, csv for data analysis, and shapefile for GIS software.

Contribute

Project Sidewalk is an open-source project created by the Makeability Lab and hosted on GitHub. We welcome your contributions! If you found a bug or have a feature request, please open an issue on GitHub.

You can also email us at sidewalk@cs.uw.edu

Project Sidewalk in Your City!

If you are interested in bringing Project Sidewalk to your city, please read our Wiki page.
