Five Ways to Fine-Tune Chronos-2, the Time Series Foundation Model

In Part 1 of this series, we introduced Chronos-2, a time-series foundation model. We got our hands dirty by walking through a real case study and saw what Chronos-2 can do straight out of the box, with no training.

But as we noted at the end of Part 1, zero-shot isn’t always enough.

This happens when:

Your data may look unlike anything in the pretraining mix.
The model keeps making systematic errors.
You do have rich historical data that can be leveraged.
Your downstream objective may be misaligned with the objective that Chronos-2’s training optimizes for.

In those cases, fine-tuning is the natural next step.

In this post, we’ll continue with the same building electricity-demand case study from Part 1 and walk through five fine-tuning scenarios for Chronos-2:

Single-building adaptation: how to fine-tune on a single asset.
Portfolio fine-tuning: how to pool history across the fleet for a shared adapter.
Covariate-informed fine-tuning: how to fine-tune with known-future signals.
Portfolio + covariates: how to leverage both covariate and fleet information.
Held-out transfer: how to adapt once, then deploy on assets the model never saw during fine-tuning.

By the end, you’ll have a working template for fine-tuning a TSFM that is ready to adapt to your own data.

Part 1 of this series shows how to use Chronos-2 for forecasting in univariate, multivariate, covariate-informed, and cross-learning scenarios. If you want to use Chronos-2 out of the box, check the post here.

1. The case study, recapped

Let’s quickly revisit the setup from Part 1.

We have a synthetic dataset of eight commercial buildings that records hourly electricity demand. The task is to forecast the total electricity load one week ahead, i.e., 168 hours. The dataset is generated by a physical simulator, in which the total load is decomposed into base, plug, lighting, and HVAC loads. Physically, plug and lighting loads are determined by weekday occupancy patterns, while HVAC load is determined by outdoor temperature.

What’s new in Part 2 is that we simulate a longer time span so that we have data for fine-tuning, and we keep a clean separation between fine-tuning data and inference data. Specifically, we divide the timeline into four contiguous windows:

Train (12 weeks): 2025-03-01 to 2025-05-22, the only window used to update the adapter.
Validation (1 week): 2025-05-23 to 2025-05-29, used for checkpoint selection and early stopping.
Inference context (45 days): 2025-05-30 to 2025-07-13, the window used as context when making forecasts. The zero-shot pipeline in Part 1 also consumed 45 days of context.
Test (1 week): 2025-07-14 to 2025-07-20, the forecast horizon for testing the fine-tuned model.

Note that the fine-tuning process only sees data in the train and validation windows, so there is no leakage in the evaluation.
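To make the four windows concrete, here is a minimal sketch of how such a split could be sliced with pandas. The frame `demo_df` and the helper `window` are hypothetical stand-ins (the real dataset comes from the simulator), so treat this as an illustration of the date boundaries rather than the notebook's actual code.

```python
import pandas as pd

# Hypothetical stand-in for the simulated dataset: one building, hourly rows.
timestamps = pd.date_range(
    "2025-03-01 00:00", "2025-07-20 23:00", freq=pd.Timedelta(hours=1)
)
demo_df = pd.DataFrame({"timestamp": timestamps, "total_load_kw": 0.0})


def window(df, first_day, last_day):
    """Return the rows from first_day 00:00 through last_day 23:00 (inclusive)."""
    start = pd.Timestamp(first_day)
    stop = pd.Timestamp(last_day) + pd.Timedelta(days=1)
    return df[(df["timestamp"] >= start) & (df["timestamp"] < stop)]


splits = {
    "train": window(demo_df, "2025-03-01", "2025-05-22"),
    "validation": window(demo_df, "2025-05-23", "2025-05-29"),
    "context": window(demo_df, "2025-05-30", "2025-07-13"),
    "test": window(demo_df, "2025-07-14", "2025-07-20"),
}

# Number of hourly rows per window.
hours = {name: len(part) for name, part in splits.items()}

# The windows are contiguous, so together they cover every row exactly once.
assert sum(hours.values()) == len(demo_df)
```

The 45-day context window comes out at 1080 hourly rows, which is the value used for context_length later on, and the validation and test windows each hold 168 rows, matching the forecast horizon.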

Figure 1. Train/val/context/test split. (Image by author)

2. Brief on fine-tuning and LoRA

Before our walk-through, let’s first briefly discuss the concept of fine-tuning and one of its specific techniques, i.e., LoRA.

2.1 What is fine-tuning?

Fine-tuning means we continue training a pretrained model on our own data. Effectively, we’re adapting the weights of the pretrained model so that it understands and follows the patterns specific to our problem.

Chronos-2 specifically is a 120M-parameter Transformer that has already learned a lot of generic time-series structure. Fine-tuning allows us to further nudge its behavior in the direction of our data.

But should we update all 120M parameters?

Probably not.

This would be expensive in both compute and storage. Also, in practice, we might not have enough data to support adjusting all 120M parameters.

We need a more efficient way to do the fine-tuning. One such solution is LoRA.

2.2 What is LoRA?

LoRA stands for Low-Rank Adaptation [1]. Its core idea is simple: instead of updating the full weight matrices, we freeze the original pretrained model and only learn a small set of additional parameters that slightly modify its behavior.

To give an example, suppose one layer in the pretrained model contains a weight matrix W with a shape of d_out x d_in, where d_out = d_in = 1024.

A full update of the weight matrix would read:

$$W' = W + \Delta W$$

Here, ΔW would also need to be 1024 x 1024. A full update would therefore mean training about one million parameters (1024 x 1024 = 1,048,576) for this one matrix.

The trick that LoRA adopts is that ΔW is not learned as a full matrix. Instead, LoRA represents it as the product of two much smaller matrices:

$$\Delta W = B A$$

where A has a shape of r x d_in, B has a shape of d_out x r, and r is the rank of the adapter. It is called a low-rank method because r is usually quite small, such as 4, 8, 16, or 32.

What this implies is that LoRA does not allow fine-tuning to make an arbitrary full-dimensional change to W. The updates are restricted to a lower-dimensional subspace, and that restriction is exactly where the efficiency comes from.

This works in practice because many downstream adaptations do not really require changing the model in every possible direction. Often, the useful change lives in a much smaller subspace. LoRA directly exploits this assumption.

In practice, this gives us several advantages. Since we have far fewer trainable parameters, the GPU memory consumed by gradients and optimizer states is much lower. We also get smaller checkpoints, because we don’t need to save a full copy of the 120M-parameter model for every experiment; we only save the adapter. And it reduces overfitting risk, especially when the downstream dataset is not large.
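The parameter savings are easy to check numerically. The sketch below uses the 1024 x 1024 example layer and an assumed rank of r = 8 (one of the typical values mentioned earlier); the random matrices are placeholders, not trained weights.

```python
import numpy as np

d_out, d_in, r = 1024, 1024, 8  # r = 8 is an assumed, typical adapter rank

rng = np.random.default_rng(0)
A = rng.normal(size=(r, d_in))   # shape (r, d_in)
B = rng.normal(size=(d_out, r))  # shape (d_out, r)

# The LoRA update has the same shape as W but is built from two thin matrices.
delta_W = B @ A

full_params = d_out * d_in       # parameters in a full update of W
lora_params = A.size + B.size    # parameters LoRA actually trains
rank = np.linalg.matrix_rank(delta_W)

print(f"full update: {full_params:,} parameters")
print(f"LoRA update: {lora_params:,} parameters")
print(f"rank of delta_W: {rank} (at most r = {r})")
```

For this layer the adapter trains 16,384 parameters instead of 1,048,576, a 64x reduction, while the resulting update can never have rank higher than r.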

3. How to do LoRA for Chronos-2?

To apply LoRA to the Chronos-2 model, the first thing we need to decide is which layers of Chronos-2 we want to adapt.

To answer this question, we should first take a look at how the model is built.

In Part 1, we explained that Chronos-2 is a Transformer encoder organized around three building blocks:

An input patch embedding.
A stack of attention layers, alternating between time attention and group attention.
An output patch embedding.

Our LoRA configuration adapts two of these three blocks:

The Q, K, V, and O projections in every attention layer. This is where we can fine-tune how the model attends both temporally within each series and across series within a group.

In Chronos-2, each attention layer involves four linear projections that map the layer’s input to its output. The query (Q), key (K), and value (V) projections produce three different views of the input; the attention mechanism then computes a similarity score between every query and every key, and uses these scores to compute a weighted aggregation of the values. The result then passes through the output projection (O), which mixes information across attention heads and reshapes it back to match the layer’s standard output dimensions.

The output patch embedding. This allows us to fine-tune the way the model projects its internal states into final forecasts.
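To see where the four projections sit, here is a generic, textbook-style multi-head self-attention in NumPy. It is a sketch for intuition only: the dimensions, the random weights, and the scaling by the square root of the head size are assumptions of this illustration, not a description of Chronos-2's actual implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, W_q, W_k, W_v, W_o, n_heads):
    """x: (seq_len, d_model); every W_*: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    def split_heads(t):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    # Q, K, V: three different linear views of the same input.
    q, k, v = split_heads(x @ W_q), split_heads(x @ W_k), split_heads(x @ W_v)

    # Similarity between every query and every key, per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)          # (n_heads, seq_len, seq_len)

    # Weighted aggregation of the values, then merge the heads back together.
    per_head = weights @ v                      # (n_heads, seq_len, d_head)
    merged = per_head.transpose(1, 0, 2).reshape(seq_len, d_model)

    # O: mixes information across heads and restores the layer's output size.
    return merged @ W_o, weights


rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 6, 16, 4
x = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))

out, weights = multi_head_self_attention(x, W_q, W_k, W_v, W_o, n_heads)
```

Attaching LoRA to Q, K, V, and O means each of the four weight matrices above gets its own low-rank update while the original matrices stay frozen.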

In code, the LoRA configuration is:

LORA_CONFIG = {
    "r": 8,
    "lora_alpha": 16,
    "target_modules": [
        "self_attention.q",
        "self_attention.v",
        "self_attention.k",
        "self_attention.o",
        "output_patch_embedding.output_layer",
    ],
}

where lora_alpha is a scaling factor. It controls how strongly the LoRA update is applied: a larger α means a more aggressive adaptation.
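As a rough illustration of how the scaling factor enters, the sketch below applies the standard LoRA convention in which the low-rank update is multiplied by lora_alpha / r and B starts at zero. The layer sizes and random weights are made up for the example, and the "trained" B is simply a random perturbation standing in for the result of fine-tuning.

```python
import numpy as np

r, lora_alpha = 8, 16           # values from LORA_CONFIG
d_out, d_in = 32, 32            # hypothetical small layer for illustration
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                # trainable, zero init
scaling = lora_alpha / r                # standard LoRA scaling factor


def adapted_forward(x, W, A, B, scaling):
    """Base output plus the scaled low-rank correction."""
    return W @ x + scaling * (B @ (A @ x))


x = rng.normal(size=d_in)

# With B = 0 the adapter contributes nothing: fine-tuning starts at the base model.
y_start = adapted_forward(x, W, A, B, scaling)

# Stand-in for a trained adapter: any nonzero B shifts the output.
B_trained = rng.normal(size=(d_out, r)) * 0.01
y_later = adapted_forward(x, W, A, B_trained, scaling)
```

With r = 8 and lora_alpha = 16, the correction is scaled by 2. Doubling lora_alpha at fixed r would double the size of the same learned correction.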

In our current study, we use the Hugging Face peft library to fine-tune Chronos-2.

Now we’re ready to get hands-on.

4. Five fine-tuning scenarios

For the following experiments, we always start from the same base model, i.e., the amazon/chronos-2 checkpoint, with the same LoRA configuration. What changes is the data we expose to fine-tuning.

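The main metric we’ll use is the weighted absolute percentage error (WAPE):

$$\mathrm{WAPE} = \frac{\sum_t |y_t - \hat{y}_t|}{\sum_t |y_t|}$$

where y_t is the actual load and ŷ_t is the forecast at hour t.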
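Assuming the common definition of WAPE (total absolute error divided by total absolute actuals), the metric can be sketched in a few lines. The function name `wape` and the toy numbers are illustrative and not taken from the notebook.

```python
import numpy as np


def wape(y_true, y_pred):
    """Weighted absolute percentage error, returned as a fraction (0.05 = 5%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.abs(y_true - y_pred).sum() / np.abs(y_true).sum()


actual = [100.0, 200.0, 300.0]
forecast = [110.0, 190.0, 300.0]

# Absolute errors are 10, 10, 0 and the actuals sum to 600.
score = wape(actual, forecast)
print(f"WAPE = {score:.1%}")
```

Because the errors are normalized by the total load rather than hour by hour, high-demand hours carry more weight than low-demand hours.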

With that setup, let’s walk through the five scenarios one by one.

If you haven’t yet set up the correct Chronos environment, please refer to Part 1: 4.1 Setting up the Chronos-2 model.

4.1 Single-building adaptation

Can we fine-tune on one asset?

Suppose we only care about one building, say Building 03. We do have its historical load data, and we want to adapt Chronos-2 to this particular building’s patterns.

This is the simplest fine-tuning setup. No covariates, no portfolio information, just one target series.

As mentioned earlier, we start from the amazon/chronos-2 checkpoint, leave the base model frozen, and only learn a small LoRA adapter on top of it.

Chronos-2’s fine-tuning API expects training data as a list of task dictionaries. For our current target-only univariate task, each dictionary only needs one key: target.

For Building 03, we can prepare the fine-tuning input like this:

story_building = "Building 03"
train_df = full_df[full_df["timestamp"] < "2025-05-23"]

single_building_train = train_df[
    train_df["building"].eq(story_building)
].sort_values("timestamp")

train_inputs = [
    {
        "target": single_building_train[["total_load_kw"]]
        .to_numpy(dtype="float32")
        .T
    }
]

The reason we need a transpose above is that Chronos-2 expects the target array to have shape:

(num_target_series, time_steps)

Since we only have a single univariate target, we have:

(1, T)
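A quick way to convince yourself of the shapes is to run the same selection and transpose on a tiny frame. The five-row frame below is a hypothetical miniature of one building's history.

```python
import pandas as pd

# Hypothetical miniature frame: 5 hourly readings for one building.
demo = pd.DataFrame({"total_load_kw": [10.0, 11.0, 12.0, 11.5, 10.5]})

# Double brackets keep a 2-D frame, so to_numpy() yields shape (T, 1).
column_as_array = demo[["total_load_kw"]].to_numpy(dtype="float32")

# Transposing gives (num_target_series, time_steps) = (1, T).
target = column_as_array.T

print(column_as_array.shape, "->", target.shape)
```

Selecting with single brackets instead would give a 1-D array of shape (T,), which is why the double brackets and the transpose go together.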

In addition to training data, we should prepare validation data in the same format:

validation_df = full_df[full_df["timestamp"] < "2025-05-30"]
single_building_validation = validation_df[
    validation_df["building"].eq(story_building)
].sort_values("timestamp")

validation_inputs = [
    {
        "target": single_building_validation[["total_load_kw"]]
        .to_numpy(dtype="float32")
        .T
    }
]

There are two things worth mentioning here:

First, a reminder: the validation data here is not used to update the LoRA adapter; it is used to decide which adapter checkpoint to keep. It’s the same pattern you would normally use when training a neural network.

Second, you might notice that validation_df is not only May 23-29 but also contains everything before that. We need that because Chronos-2 needs context to make forecasts. Based on the configured prediction_length, Chronos-2 internally treats the last prediction_length hours of validation_df as the true validation forecast target. The preceding values are the context.

In the current case, we only configured one validation task in validation_inputs. This means we effectively have only one validation forecast window, because internally Chronos-2 always uses the task’s last prediction_length steps as the target window and the preceding context_length steps as the context, no matter how many additional steps you feed in. In other words, merely feeding a longer validation series does not automatically create more validation windows.

In practice, if you want more validation forecast windows, e.g., for rolling-window validation, you would need to create multiple validation tasks, each ending at a different cutoff date. This way, Chronos-2 would validate on the last 168 hours of each task.
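One possible way to build such rolling validation tasks is sketched below. The helper `rolling_validation_tasks` and its `stride` argument are hypothetical names introduced for this illustration, and the integer series is a placeholder for a real load history.

```python
import numpy as np


def rolling_validation_tasks(series, prediction_length, n_windows, stride):
    """Build target-only tasks whose cutoffs step back by `stride` time steps.

    Each task ends at a different cutoff, so its last `prediction_length`
    steps form a distinct validation window.
    """
    tasks = []
    for i in range(n_windows):
        end = len(series) - i * stride
        if end <= prediction_length:
            break  # nothing would be left to serve as context
        target = series[:end][np.newaxis, :].astype("float32")  # shape (1, end)
        tasks.append({"target": target})
    return tasks


# Placeholder history: 2000 hourly values, where the value equals its position.
series = np.arange(2000, dtype="float32")

tasks = rolling_validation_tasks(
    series, prediction_length=168, n_windows=3, stride=168
)
for task in tasks:
    print(task["target"].shape)
```

Keep in mind that earlier cutoffs push validation targets further back in time, so the training window would have to end correspondingly earlier to keep training and validation data separate.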

For training, though, we don’t need any special treatment, as we can simply pass Chronos-2 a long historical series and let it sample many training windows internally.

Now we can fine-tune:

from transformers import EarlyStoppingCallback

# Fine-tune a LoRA adapter on the single-building task; the base weights stay frozen.
fine_tuned_model = base_model.fit(
    train_inputs,
    prediction_length=168,
    validation_inputs=validation_inputs,
    finetune_mode="lora",
    lora_config=LORA_CONFIG,
    context_length=1080,         # 45-day context window
    learning_rate=2e-5,
    num_steps=1000,
    batch_size=32,
    output_dir="finetuned_models/fine_tuning_modes/single_target",
    finetuned_ckpt_name="checkpoint",
    callbacks=[EarlyStoppingCallback(early_stopping_patience=6)],
    save_steps=25,
    eval_steps=25,
)

Here, we set prediction_length=168 so that the model is trained for the same task we care about at test time, i.e., one-week-ahead hourly forecasting. We also set context_length=1080 (45 * 24), which represents a 45-day context window, the same context length we used in Part 1. Finally, since we passed validation_inputs, checkpoint selection is activated. Every 25 training steps, Chronos-2 evaluates the validation loss, and if it stops improving for six validation checks in a row (early_stopping_patience=6), early stopping kicks in and ends the fine-tuning.
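The interplay between eval_steps and early_stopping_patience can be illustrated with a small simulation. This is a simplified model of patience-based early stopping, not the actual callback implementation, and the validation losses are invented for the example.

```python
def simulate_early_stopping(val_losses, eval_steps=25, patience=6):
    """Return (best_step, stop_step) for a sequence of validation losses.

    One loss is observed every `eval_steps` training steps. Training stops
    once `patience` consecutive checks fail to improve on the best loss.
    """
    best_loss = float("inf")
    best_step = None
    bad_checks = 0
    step = 0
    for i, loss in enumerate(val_losses, start=1):
        step = i * eval_steps
        if loss < best_loss:
            best_loss, best_step, bad_checks = loss, step, 0
        else:
            bad_checks += 1
            if bad_checks >= patience:
                break
    return best_step, step


# Invented losses: the first check is the best, everything after is worse.
losses = [1.00, 1.10, 1.20, 1.15, 1.30, 1.25, 1.40, 1.50]
best_step, stop_step = simulate_early_stopping(losses)
print(f"best checkpoint at step {best_step}, training stopped at step {stop_step}")
```

In this toy run, the best checkpoint is the one at step 25, and training halts at step 175 after six non-improving checks, far short of the 1000-step budget.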

Figure 2. Training loss keeps falling, but validation loss rises after the first checkpoint. (Image by author)

I ran the fine-tuning job on an NVIDIA RTX 2000 Ada Laptop GPU with 8 GB VRAM. This run finished in about 42 s.

Once the adapter is trained, inference looks almost the same as zero-shot forecasting:

single_context = test_context_df[
    test_context_df["building"].eq(story_building)
][["building", "timestamp", "total_load_kw"]]

pred_single_finetuned = fine_tuned_model.predict_df(
    single_context,
    prediction_length=168,
    quantile_levels=[0.025, 0.5, 0.975],
    id_column="building",
    timestamp_column="timestamp",
    target="total_load_kw",
)

For Building 03, the target-only zero-shot baseline has a WAPE of 8.3%. After fine-tuning on Building 03 only, WAPE falls to 7.6%, so fine-tuning does bring a modest improvement.

4.2 Portfolio fine-tuning

Can we pool history across the fleet for a shared adapter?

In practice, we often have multiple related assets in a portfolio.

In our case, that means eight buildings. They are not identical, but they follow similar daily and weekly demand patterns.

So the next natural question is: can we fine-tune one adapter on the whole building portfolio, instead of just one building at a time?

Here, we still forecast only total_load_kw, which means the setup is almost the same as before:

target_column = "total_load_kw"

train_inputs = [
    {
        "target": building_df[[target_column]].to_numpy(dtype="float32").T,
    }
    for _, building_df in train_df.groupby("building", sort=True)
]

validation_inputs = [
    {
        "target": building_df[[target_column]].to_numpy(dtype="float32").T,
    }
    for _, building_df in validation_df.groupby("building", sort=True)
]

Effectively, each building becomes one training task. Then we fine-tune Chronos-2 with the same LoRA configuration as before:

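fine_tuned_model = base_model.fit(
    inputs=train_inputs,
    validation_inputs=validation_inputs,
    prediction_length=168,
    context_length=1080,
    finetune_mode="lora",        # train a LoRA adapter, as in Section 4.1
    lora_config=LORA_CONFIG,
    learning_rate=2e-5,
    num_steps=1000,              # same keyword as the other fit calls in this post
)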

It’s worth emphasizing that here we are not training eight separate adapters. Instead, we’re asking Chronos-2 to learn one shared adaptation that works across the fleet. In practice, if there are recurring patterns across buildings, the adapter will have more chances to learn them. However, if each building is completely independent, this strategy may not help much.

The fine-tuning results are shown below, where we compare the forecasting quality of the zero-shot and fine-tuned Chronos-2:

Building       Zero-shot WAPE    Fine-tuned WAPE
Building 01    8.0%              7.4%
Building 02    12.2%             11.3%
Building 03    8.3%              7.5%
Building 04    8.0%              7.6%
Building 05    7.2%              6.8%
Building 06    10.9%             9.9%
Building 07    7.7%              7.2%
Building 08    6.6%              6.3%

We see improvements across all eight buildings, which is a good sign that every building benefits from the shared adapter.
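The per-building gains can be read off the reported numbers with a few lines of Python. The dictionary below simply transcribes the table (zero-shot WAPE, fine-tuned WAPE, both in percent); the variable names are illustrative.

```python
# (zero-shot WAPE %, fine-tuned WAPE %) per building, transcribed from the table.
wape_table = {
    "Building 01": (8.0, 7.4),
    "Building 02": (12.2, 11.3),
    "Building 03": (8.3, 7.5),
    "Building 04": (8.0, 7.6),
    "Building 05": (7.2, 6.8),
    "Building 06": (10.9, 9.9),
    "Building 07": (7.7, 7.2),
    "Building 08": (6.6, 6.3),
}

# Absolute improvement in percentage points for each building.
gains = {name: round(zs - ft, 1) for name, (zs, ft) in wape_table.items()}

mean_zero_shot = sum(zs for zs, _ in wape_table.values()) / len(wape_table)
mean_fine_tuned = sum(ft for _, ft in wape_table.values()) / len(wape_table)
largest_gain = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: -{gain:.1f} pp")
print(f"simple mean: {mean_zero_shot:.2f}% -> {mean_fine_tuned:.2f}%")
print(f"largest gain: {largest_gain}")
```

Note that a simple average of per-building WAPEs is not the same thing as a pooled WAPE computed over all buildings at once, since the pooled version weights buildings by their load.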

4.3 Covariate-informed fine-tuning

Can we give Chronos-2 the known covariates during fine-tuning?

So far, Chronos-2 has only seen the target series itself, i.e., historical total_load_kw.

But in our building-demand case, we do know, or can forecast reasonably well, the underlying driving factors, including outdoor temperature, occupancy pattern, solar irradiance, and a weekend indicator. These are the covariates that drive the changes in total_load_kw.

Therefore, in this fine-tuning scenario, we would like to know whether we can fine-tune Chronos-2 not only on the target history, but also on the relationship between the target and known-future covariates.

This is where the fine-tuning input has to be modified. Instead of only passing the target, each training task should now also contain past_covariates and future_covariates:

known_future_columns = [
    "outdoor_temp_c",
    "occupancy",
    "solar_irradiance",
    "is_weekend",
]

single_building_train = train_df[
    train_df["building"].eq(story_building)
].sort_values("timestamp")

train_inputs = [
    {
        "target": single_building_train[["total_load_kw"]]
        .to_numpy(dtype="float32")
        .T,
        "past_covariates": {
            column: single_building_train[column].to_numpy(dtype="float32")
            for column in known_future_columns
        },
        "future_covariates": {
            column: None
            for column in known_future_columns
        },
    }
]

The past_covariates part contains the historical values of the covariate series. During fine-tuning, Chronos-2 can see how temperature, occupancy, solar irradiance, and weekends change the load.

The future_covariates part tells Chronos-2 that these covariates are also available over the forecast horizon. We set them to None here because Chronos-2 constructs the future windows internally from the same historical series. Later, at inference time, we’ll provide the actual future covariate values via future_df, just like we did in Part 1.
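Since the same task structure is reused in the later scenarios, it can be convenient to wrap it in a small helper. The function `build_task` below is a hypothetical convenience wrapper around the dictionary layout described above, demonstrated on a made-up four-row frame.

```python
import pandas as pd


def build_task(building_df, target_column, covariate_columns):
    """Turn one building's history into a fine-tuning task dictionary.

    Covariates listed in `covariate_columns` are treated as known in the
    future, so each appears in both past_covariates and future_covariates.
    """
    ordered = building_df.sort_values("timestamp")
    return {
        # Target needs shape (num_target_series, time_steps) = (1, T).
        "target": ordered[[target_column]].to_numpy(dtype="float32").T,
        "past_covariates": {
            column: ordered[column].to_numpy(dtype="float32")
            for column in covariate_columns
        },
        # None marks the covariate as known-future without supplying values.
        "future_covariates": {column: None for column in covariate_columns},
    }


# Made-up miniature history for illustration only.
demo = pd.DataFrame(
    {
        "timestamp": pd.date_range(
            "2025-03-01", periods=4, freq=pd.Timedelta(hours=1)
        ),
        "total_load_kw": [50.0, 48.0, 47.5, 49.0],
        "outdoor_temp_c": [8.0, 7.5, 7.0, 7.2],
        "is_weekend": [1, 1, 1, 1],
    }
)

task = build_task(demo, "total_load_kw", ["outdoor_temp_c", "is_weekend"])
print({key: type(value).__name__ for key, value in task.items()})
```

A portfolio version is then just a list comprehension over the per-building groups, calling the helper once per building.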

The fine-tuning call itself stays almost the same:

fine_tuned_model = base_model.fit(
    train_inputs,
    prediction_length=168,
    validation_inputs=validation_inputs,
    finetune_mode="lora",
    lora_config=LORA_CONFIG,
    context_length=1080,
    learning_rate=2e-5,
    num_steps=1000,
    batch_size=32,
    output_dir="finetuned_models/fine_tuning_modes/single_covariate",
    finetuned_ckpt_name="checkpoint",
    callbacks=[EarlyStoppingCallback(early_stopping_patience=6)],
    save_steps=25,
    eval_steps=25,
)

After fine-tuning is done, at inference time, we pass both the historical context and the known future covariates:

context_with_covariates = test_context_df[
    ["building", "timestamp", "total_load_kw"] + known_future_columns
]

future_covariates_df = test_truth_df[
    ["building", "timestamp"] + known_future_columns
]

pred_single_covariate = fine_tuned_model.predict_df(
    context_with_covariates,
    future_df=future_covariates_df,
    prediction_length=168,
    quantile_levels=[0.025, 0.5, 0.975],
    id_column="building",
    timestamp_column="timestamp",
    target="total_load_kw",
)

For Building 03, the covariate-informed zero-shot WAPE is 4.0%. After fine-tuning the covariate-informed adapter on Building 03, WAPE drops to 2.8%, a 30.7% relative reduction.

This is a much larger gain than target-only fine-tuning.

There is also an interesting practical lesson here: sometimes the biggest win is not “fine-tuning” by itself. It’s fine-tuning the model with the right information.

4.4 Portfolio + covariates

Can we leverage both covariate and fleet information for fine-tuning?

The previous two scenarios added the “portfolio” ingredient and the “covariate” ingredient separately. Naturally, we want to use both.

This is the setup I believe to be most relevant in many real use cases, because in practice we rarely have just one asset, and more often than not we do have known or forecastable external signals that can help forecast the target series. Using both for fine-tuning is not only logical, but probably also preferable.

Concretely, for our current case, we fine-tune on all eight buildings, and for each building, we provide total_load_kw as the target and outdoor_temp_c, occupancy, solar_irradiance, and is_weekend as known-future covariates:

train_inputs = []

for building, building_df in train_df.groupby("building", sort=True):
    building_df = building_df.sort_values("timestamp")

    train_inputs.append(
        {
            "target": building_df[["total_load_kw"]]
            .to_numpy(dtype="float32")
            .T,
            "past_covariates": {
                column: building_df[column].to_numpy(dtype="float32")
                for column in known_future_columns
            },
            "future_covariates": {
                column: None
                for column in known_future_columns
            },
        }
    )

In the code snippet above, we create one task per building. The same idea applies to the validation data as well. Each building is associated with one validation task, and Chronos-2 uses the last 168 hours of each task as the validation forecast window.

The fine-tuning call itself again stays the same:

fine_tuned_model = base_model.fit(
    train_inputs,
    prediction_length=168,
    validation_inputs=validation_inputs,
    finetune_mode="lora",
    lora_config=LORA_CONFIG,
    context_length=1080,
    learning_rate=2e-5,
    num_steps=1000,
    batch_size=32,
    output_dir="finetuned_models/fine_tuning_modes/portfolio_covariate",
    finetuned_ckpt_name="checkpoint",
    callbacks=[EarlyStoppingCallback(early_stopping_patience=6)],
    save_steps=25,
    eval_steps=25,
)

For inference, we pass the 45-day historical context, as well as the known future covariates for the forecast week:

context_with_covariates = test_context_df[
    ["building", "timestamp", "total_load_kw"] + known_future_columns
]

future_covariates_df = test_truth_df[
    ["building", "timestamp"] + known_future_columns
]

pred_portfolio_covariate = fine_tuned_model.predict_df(
    context_with_covariates,
    future_df=future_covariates_df,
    prediction_length=168,
    quantile_levels=[0.025, 0.5, 0.975],
    id_column="building",
    timestamp_column="timestamp",
    target="total_load_kw",
)

The figure below shows the fine-tuning results for Building 03, where we can clearly see the improvement brought by fine-tuning:

Figure 3. Portfolio + covariate fine-tuning compared with the plain zero-shot forecast for Building 03. (Image by author)

Across all eight buildings, the plain zero-shot baseline has a WAPE of 8.4%. After portfolio + covariate fine-tuning, WAPE drops to 2.8%, a 66.8% relative reduction.

4.5 Held-out transfer

Can we adapt once, then deploy on assets the model never saw during fine-tuning?

So far, every fine-tuning scenario has used the same buildings that later appear at inference time.

But there is one more important question: what if a new building came online only very recently?

So in this final scenario, we hold out Building 06 during fine-tuning, so that Chronos-2 never sees its data while learning the LoRA adapter. We fine-tune on the other seven buildings, using both target histories and known-future covariates. Then, at inference time, we apply the adapter to Building 06.

The code change is small:

held_out_building = "Building 06"

train_buildings = [
    building
    for building in sorted(train_df["building"].unique())
    if building != held_out_building
]

train_inputs = []

for building in train_buildings:
    building_df = train_df[
        train_df["building"].eq(building)
    ].sort_values("timestamp")

    train_inputs.append(
        {
            "target": building_df[["total_load_kw"]]
            .to_numpy(dtype="float32")
            .T,
            "past_covariates": {
                column: building_df[column].to_numpy(dtype="float32")
                for column in known_future_columns
            },
            "future_covariates": {
                column: None
                for column in known_future_columns
            },
        }
    )

Then, at inference time, we target Building 06 for forecasting:

building_06_context = test_context_df[
    test_context_df["building"].eq(held_out_building)
][["building", "timestamp", "total_load_kw"] + known_future_columns]

building_06_future_covariates = test_truth_df[
    test_truth_df["building"].eq(held_out_building)
][["building", "timestamp"] + known_future_columns]

pred_heldout = fine_tuned_model.predict_df(
    building_06_context,
    future_df=building_06_future_covariates,
    prediction_length=168,
    quantile_levels=[0.025, 0.5, 0.975],
    id_column="building",
    timestamp_column="timestamp",
    target="total_load_kw",
)

For Building 06, the covariate-informed zero-shot baseline has a WAPE of 4.2%. After applying the adapter fine-tuned on the other seven buildings, WAPE drops to 3.1%. That’s a 26.8% relative reduction.

For real deployment, this fifth scenario represents a more scalable pattern: we fine-tune an adapter on a representative portfolio, then deploy it to related assets as they come online. For each new asset, we still provide its recent context and known-future covariates, but we do not have to fine-tune again immediately. We wouldn’t have enough data for that anyway.
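When running this kind of held-out experiment, a cheap guard against accidental leakage is to split the building IDs explicitly and assert that the two sets are disjoint. The helper `split_fleet` and the generated building names below are hypothetical and only mirror the naming used in this case study.

```python
def split_fleet(buildings, held_out):
    """Split building IDs into fine-tuning and held-out lists."""
    held_out = set(held_out)
    unknown = held_out - set(buildings)
    if unknown:
        raise ValueError(f"Held-out buildings not in fleet: {sorted(unknown)}")
    train = [b for b in buildings if b not in held_out]
    return train, sorted(held_out)


# Hypothetical fleet of eight buildings, named as in the case study.
all_buildings = [f"Building {i:02d}" for i in range(1, 9)]

train_buildings, heldout_buildings = split_fleet(all_buildings, ["Building 06"])

# Guard against leakage: no held-out building may appear in the fine-tuning set.
assert not set(train_buildings) & set(heldout_buildings)
print(len(train_buildings), "buildings for fine-tuning;", heldout_buildings, "held out")
```

The same check scales to holding out several buildings at once, which gives a less noisy estimate of how well the adapter transfers.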

5. What did we learn?

After walking through the five scenarios one by one, let’s put their results side by side.

For each row, I compare the fine-tuned model against the matching zero-shot baseline. Concretely, that means target-only fine-tuning is compared with target-only zero-shot, and covariate-informed fine-tuning is compared with covariate-informed zero-shot:

Figure 4. Fine-tuning improves all five scenarios. Covariate-informed setups brought the largest gains. (Image by author)

The pattern is fairly clear. Target-only fine-tuning helps, but only modestly. The larger gains appear when we give Chronos-2 the known-future covariates and then fine-tune the adapter around them. The held-out transfer result is also encouraging: even for a building excluded from fine-tuning, the adapter can learn from related buildings and still improve over the covariate-informed zero-shot baseline.
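The relative reductions quoted in this post follow from a one-line formula. The sketch below recomputes them from the rounded WAPE values stated in the text; the scenario labels are shorthand introduced here.

```python
def relative_reduction(baseline, fine_tuned):
    """Relative WAPE reduction in percent: how much of the baseline error is removed."""
    return 100.0 * (baseline - fine_tuned) / baseline


# (baseline WAPE %, fine-tuned WAPE %) as quoted in the text, rounded to 0.1.
reported = {
    "4.1 single building, target-only": (8.3, 7.6),
    "4.3 single building + covariates": (4.0, 2.8),
    "4.4 portfolio + covariates": (8.4, 2.8),
    "4.5 held-out Building 06": (4.2, 3.1),
}

reductions = {
    name: round(relative_reduction(base, tuned), 1)
    for name, (base, tuned) in reported.items()
}

for name, value in reductions.items():
    print(f"{name}: {value}% relative reduction")
```

Recomputing from the rounded values gives figures slightly different from the 30.7%, 66.8%, and 26.8% quoted above, presumably because those were derived from unrounded WAPEs.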

You can find the full notebook here: https://github.com/ShuaiGuo16/chronos-2-forecasting/blob/main/02_chronos2_fine_tuning_building_demand.ipynb

Reference

[1] LoRA: Low-Rank Adaptation of Large Language Models. arXiv, 2021.
